Lec02-Syntax Analysis and LL
SECTION 2.1: CONTEXT FREE GRAMMAR
SYNTAX ANALYZER
PARSERS (CONT.)
1. Top-Down Parser
The parse tree is built from the root to the leaves (top to bottom).
Input to the parser is scanned from left to right, one symbol at a time.
2. Bottom-Up Parser
The parse tree is built from the leaves up to the root.
Input to the parser is scanned from left to right, one symbol at a time.
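CONTEXT-FREE GRAMMARS (CFG)
G = (T, V, S, P), where T is the set of terminals, V the set of non-terminals, S the start symbol, and P the set of productions.
Example:
T = {a, b}
V = {S, A}
P: S → aAa
   S → b
   A → a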
TERMINAL SYMBOLS
Terminals include:
Lowercase letters early in the alphabet, such as a, b, c
Operator symbols such as +, %
Punctuation symbols such as ( ) , ;
Digits 0, 1, 2, …
Boldface strings such as id or if
DERIVATION OF A STRING
αAβ ⇒ αγβ if there is a production A → γ in the grammar,
where α and β are arbitrary strings of terminal and non-terminal symbols.
⇒  : derives in one step
⇒* : derives in zero or more steps
⇒+ : derives in one or more steps
Example, derived in multiple steps with the productions S → aSa and S → b:
S ⇒ aSa ⇒ aaSaa ⇒ aaaSaaa ⇒ aaabaaa
SENTENCE AND SENTENTIAL FORM
Left-Most Derivation
Right-Most Derivation
In a left-most derivation of the string 'w', the left-most non-terminal of the sentential form is replaced at every step, until only terminals remain.
In a right-most derivation of the string 'w', the right-most non-terminal of the sentential form is replaced at every step, until only terminals remain.
LEFT-MOST DERIVATIONS
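Consider the grammar:
E → E+E/E-E/E*E/E/(E)/id
Derive the string 'id+id*id':
E ⇒ E+E        (E → E+E)
  ⇒ id+E       (E → id)
  ⇒ id+E*E     (E → E*E)
  ⇒ id+id*E    (E → id)
  ⇒ id+id*id   (E → id)
[Figure: parse tree with E → E + E at the root; the left E derives id and the right E expands to E * E, whose operands both derive id]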
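The left-most derivation above can be replayed mechanically. The following is a minimal Python sketch, not part of the original slides: the names GRAMMAR, leftmost_step and derive are illustrative, and only the three alternatives used in the derivation are included.

```python
# Only the alternatives used in the slide's derivation (an assumption for brevity).
GRAMMAR = {"E": [["E", "+", "E"], ["E", "*", "E"], ["id"]]}

def leftmost_step(form, rhs):
    """Replace the left-most non-terminal of the sentential form with rhs."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            if rhs not in GRAMMAR[sym]:
                raise ValueError(f"{sym} has no production with this right-hand side")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to expand")

def derive(start, choices):
    """Apply the chosen right-hand sides one by one, always at the left-most non-terminal."""
    forms = [[start]]
    for rhs in choices:
        forms.append(leftmost_step(forms[-1], rhs))
    return forms

steps = derive("E", [["E", "+", "E"], ["id"], ["E", "*", "E"], ["id"], ["id"]])
for form in steps:
    print("".join(form))
```

Each printed line is one sentential form of the derivation; the last one contains only terminals and is therefore a sentence.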
RIGHT-MOST DERIVATIONS
Consider the grammar:
E → E+E/E-E/E*E/E/(E)/id
Derive the string 'id+id*id':
E ⇒ E*E        (E → E*E)
  ⇒ E*id       (E → id)
  ⇒ E+E*id     (E → E+E)
  ⇒ E+id*id    (E → id)
  ⇒ id+id*id   (E → id)
[Figure: parse tree with E → E * E at the root; the left E expands to E + E, whose operands both derive id, and the right E derives id]
SECTION 2.2: AMBIGUOUS GRAMMAR
AMBIGUOUS GRAMMAR
A grammar is ambiguous if some sentence has:
more than one left-most derivation or more than one right-most derivation, i.e. the sentence can be derived in more than one way by LMD or by RMD.
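Ambiguity can be checked for a concrete sentence by counting its parse trees. The sketch below is an illustration under assumptions, not part of the slides: it restricts the expression grammar to E → E+E | E*E | id, and the function name count_parse_trees is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E+E | E*E | id (assumed sub-grammar)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees that derive tokens[i:j] from E.
        total = 1 if j - i == 1 and tokens[i] == "id" else 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

A count greater than one means the sentence has more than one parse tree, so the grammar is ambiguous.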
[Figure: two parse trees, labelled 1 and 2, for the same stmt, built from the condition E2 and the statements S1 and S2]
AMBIGUITY (CONT.)
• We prefer the second parse tree (each else matches the closest if).
• So, we have to disambiguate our grammar to reflect this choice.
SECTION 2.3: LEFT RECURSION AND LEFT FACTORING
LEFT RECURSION
A grammar is left recursive if it has a non-terminal A such that there is a derivation
A ⇒+ Aα for some string α.
Immediate left recursion:
A → Aα | β        where β does not start with A
Eliminate immediate left recursion to get an equivalent grammar:
A → βA'
A' → αA' | ε
In general,
A → Aα1 | ... | Aαm | β1 | ... | βn        where β1 ... βn do not start with A
becomes
A → β1A' | ... | βnA'
A' → α1A' | ... | αmA' | ε
Example:
E → E+T | T        (A → Aα | β)
A is E; α is +T and β is T
Applying the rule we get
E → TE'            (A → βA')
E' → +TE' | ε      (A' → αA' | ε)
T → T*F | F        (A → Aα | β)
A is T; α is *F and β is F
Applying the rule we get
T → FT'            (A → βA')
T' → *FT' | ε      (A' → αA' | ε)
Final output:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → id | (E)
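The rule A → Aα | β ⇒ A → βA', A' → αA' | ε can be sketched in Python as follows. This is a minimal illustration, assuming a production is a list of symbols and ε is the empty list; the function name is hypothetical.

```python
def eliminate_immediate_left_recursion(nt, productions):
    """Split the alternatives of `nt` into A -> A alpha and A -> beta, then rewrite them.

    productions: list of right-hand sides, each a list of symbols ([] stands for ε).
    Returns a dict with the new productions for nt and, if needed, for nt + "'".
    """
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not alphas:                      # nothing to do: no immediate left recursion
        return {nt: productions}
    new = nt + "'"
    return {
        nt: [beta + [new] for beta in betas],                 # A  -> beta A'
        new: [alpha + [new] for alpha in alphas] + [[]],      # A' -> alpha A' | ε
    }

print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
print(eliminate_immediate_left_recursion("T", [["T", "*", "F"], ["F"]]))
```

The two calls correspond to the E and T steps of the example above.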
NO IMMEDIATE LEFT-RECURSION BUT GRAMMAR IS LEFT RECURSIVE
Consider the grammar
S → Aa | b
A → Sc | d
There is no immediate left recursion in the grammar.
Substitution: S ⇒ Aa ⇒ Sca, so S is left recursive.
We need to check for and eliminate both immediate left recursion and left recursion that arises through substitution.
NO IMMEDIATE LEFT-RECURSION BUT GRAMMAR IS LEFT RECURSIVE
Consider the grammar
S → Aa | b        (no immediate left recursion in S)
A → Ac | Sd | f
Order of non-terminals: S, A
for S:
- there is no immediate left recursion in S.
for A:
- Substitute A → Sd with A → Aad | bd, giving A → Ac | Aad | bd | f
- Applying the rule, we get:
A → bdA' | fA'
A' → cA' | adA' | ε
Final output:
S → Aa | b
A → bdA' | fA'
A' → cA' | adA' | ε
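NO IMMEDIATE LEFT-RECURSION BUT GRAMMAR IS LEFT RECURSIVE
Rule: A → Aα | β becomes
A → βA'
A' → αA' | ε
The same grammar, taking the non-terminals in the order A, S:
S → Aa | b
A → Ac | Sd | f
for A:
- Eliminate the immediate left recursion in A → Ac | Sd | f
  α is c; β1 is Sd and β2 is f
A → SdA' | fA'
A' → cA' | ε
for S:
- Replace S → Aa with S → SdA'a | fA'a, giving S → SdA'a | fA'a | b
- Eliminate the immediate left recursion in S (α is dA'a; β1 is fA'a and β2 is b):
S → fA'aS' | bS'
S' → dA'aS' | ε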
A → Bxy | x
B → CD
C → A | c
D → d
ELIMINATE LEFT-RECURSION -- ALGORITHM
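A minimal Python sketch of the standard procedure, consistent with the worked examples above: process the non-terminals in a fixed order, substitute the productions of earlier non-terminals into the current one, then remove immediate left recursion. The grammar representation (dict of lists of symbol lists, [] for ε) and the function name are assumptions for illustration.

```python
def eliminate_left_recursion(grammar, order):
    """grammar: {non-terminal: [right-hand side, ...]}, each right-hand side a list of symbols."""
    g = {nt: [list(rhs) for rhs in rhss] for nt, rhss in grammar.items()}
    for i, ai in enumerate(order):
        # Step 1: replace Ai -> Aj gamma (Aj earlier in the order) by Aj's alternatives.
        for aj in order[:i]:
            expanded = []
            for rhs in g[ai]:
                if rhs and rhs[0] == aj:
                    expanded += [delta + rhs[1:] for delta in g[aj]]
                else:
                    expanded.append(rhs)
            g[ai] = expanded
        # Step 2: eliminate immediate left recursion among the Ai productions.
        alphas = [rhs[1:] for rhs in g[ai] if rhs and rhs[0] == ai]
        if alphas:
            betas = [rhs for rhs in g[ai] if not rhs or rhs[0] != ai]
            new = ai + "'"
            g[ai] = [beta + [new] for beta in betas]
            g[new] = [alpha + [new] for alpha in alphas] + [[]]
    return g

grammar = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], ["f"]]}
print(eliminate_left_recursion(grammar, ["S", "A"]))
print(eliminate_left_recursion(grammar, ["A", "S"]))
```

The two calls use the two orderings (S, A and A, S) and show that the resulting grammar depends on the chosen order.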
LEFT-FACTORING
Consider the grammar
S → Aa | Ab
or, in general, a grammar
A → αβ1 | αβ2
where α is non-empty and the first symbols of β1 and β2 (if they have one) are different.
For A → αβ1 | ... | αβn | γ1 | ... | γm, convert it into
A → αA' | γ1 | ... | γm
A' → β1 | ... | βn
LEFT-FACTORING – EXAMPLE 1
A → abB | aB | cdg | cdeB | cdfB
becomes
A → aA' | cdA''
A' → bB | B
A'' → g | eB | fB
LEFT-FACTORING – EXAMPLE 2
A → ad | a | ab | abc | b
α is a; β1 is d; β2 is ε; β3 is b; β4 is bc
A → aA' | b
A' → d | ε | b | bc
α is b; β1 is ε; β2 is c
A → aA' | b
A' → d | ε | bA''
A'' → ε | c
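Left factoring can be sketched in Python as below. This is one possible formulation, not the slides' algorithm: alternatives are grouped by their first symbol, and the longest prefix common to the whole group is factored out. The names common_prefix, fresh_name and left_factor are hypothetical; ε is the empty list.

```python
def common_prefix(seqs):
    """Longest prefix shared by all symbol lists in seqs."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def fresh_name(nt, grammar):
    """A, A', A'', ... : first primed name not used yet."""
    candidate = nt.rstrip("'") + "'"
    while candidate in grammar:
        candidate += "'"
    return candidate

def left_factor(grammar):
    g = {nt: [list(rhs) for rhs in rhss] for nt, rhss in grammar.items()}
    work = list(g)
    while work:
        nt = work.pop(0)
        groups = {}                      # first symbol -> alternatives starting with it
        for rhs in g[nt]:
            groups.setdefault(rhs[0] if rhs else None, []).append(rhs)
        new_rhss = []
        for first, group in groups.items():
            if first is None or len(group) == 1:
                new_rhss += group        # nothing to factor
                continue
            alpha = common_prefix(group)
            new = fresh_name(nt, g)
            g[new] = [rhs[len(alpha):] for rhs in group]   # A' -> beta1 | ... | betan
            new_rhss.append(alpha + [new])                 # A  -> alpha A'
            work.append(new)             # the new non-terminal may need factoring too
        g[nt] = new_rhss
    return g

example2 = {"A": [["a", "d"], ["a"], ["a", "b"], ["a", "b", "c"], ["b"]]}
print(left_factor(example2))
```

Run on Example 2, the sketch factors out a first and then b, giving the same three non-terminals A, A' and A'' as the slide.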
NON-CONTEXT FREE LANGUAGE
CONSTRUCTS
Some constructs in programming languages are not context-free: we cannot write a context-free grammar for them.
For example, checking that an identifier is declared before it is used corresponds to the language { wcw | w is in (a|b)* }, which is not context-free.
TOP-DOWN PARSING
Beginning with the start symbol, try to guess the productions
to apply to end up at the user's program.
CHALLENGES IN TOP-DOWN PARSING
How do we know which productions to apply?
In general, we can't.
There are some grammars for which the best we can do is guess and backtrack if we're wrong.
If we have to guess, how do we do it?
TOP-DOWN PARSING
Top-down parser
  Recursive-Descent Parsing
    Backtracking is needed (if a choice of a production rule does not work, we backtrack and try another alternative).
    Not efficient
  Predictive Parsing
    No backtracking
    Efficient
    Recursive predictive parsing is recursive-descent parsing without backtracking.
    Non-recursive (table-driven) predictive parsing is also known as LL(1) parsing.
RECURSIVE-DESCENT PARSING
(USES BACKTRACKING)
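Backtracking is needed.
It tries to find the left-most derivation.
S → aBc
B → bc | b
Input: abc
[Figure: two parse trees for S → aBc. The first expands B → bc, fails on the input, and the parser backtracks; the second expands B → b]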
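The backtracking behaviour on input abc can be reproduced with a small Python sketch. It is an illustration under assumptions (single-character tokens, the hypothetical names parse and accepts), and it would loop forever on a left-recursive grammar.

```python
GRAMMAR = {"S": [["a", "B", "c"]], "B": [["b", "c"], ["b"]]}

def parse(symbols, tokens, pos, log):
    """Yield every input position reachable after matching `symbols` starting at `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Try the alternatives in order; a later one is tried only if the
        # caller keeps asking, i.e. after the earlier choice has failed.
        for rhs in GRAMMAR[head]:
            log.append(f"try {head} -> {' '.join(rhs)}")
            yield from parse(rhs + rest, tokens, pos, log)
    elif pos < len(tokens) and tokens[pos] == head:
        yield from parse(rest, tokens, pos + 1, log)

def accepts(text):
    """Return (accepted?, list of productions tried) for a string of one-character tokens."""
    tokens = list(text)
    log = []
    ok = any(end == len(tokens) for end in parse(["S"], tokens, 0, log))
    return ok, log

print(accepts("abc"))
```

For abc the log shows B → b c being tried first and B → b being tried after the first choice fails, matching the figure.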
RECURSIVE PREDICTIVE PARSING
Each non-terminal corresponds to a procedure.
For example, for A → aBb:
proc A {
    - match the current token with a, and move to the next token;
    - call 'B';
    - match the current token with b, and move to the next token;
}
RECURSIVE PREDICTIVE PARSING (CONT.)
A → aBb | bAB
proc A {
    case of the current token
    {
        'a': - match the current token with a, and move to the next token;
             - call 'B';
             - match the current token with b, and move to the next token;
        'b': - match the current token with b, and move to the next token;
             - call 'A';
             - call 'B';
    }
}
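The procedure above translates almost line by line into Python. In this sketch the production B → c is hypothetical (the slides do not define B), tokens are single characters, and the class and method names are illustrative.

```python
class PredictiveParser:
    """Recursive predictive parser for A -> aBb | bAB, with the assumed rule B -> c."""

    def __init__(self, text):
        self.tokens = list(text) + ["$"]   # $ marks the end of the input
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.current != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.current!r}")
        self.pos += 1

    def A(self):
        if self.current == "a":            # A -> a B b
            self.match("a")
            self.B()
            self.match("b")
        elif self.current == "b":          # A -> b A B
            self.match("b")
            self.A()
            self.B()
        else:
            raise SyntaxError(f"unexpected token {self.current!r} while parsing A")

    def B(self):                           # hypothetical production B -> c
        self.match("c")

def accepts(text):
    parser = PredictiveParser(text)
    try:
        parser.A()
        parser.match("$")                  # the whole input must be consumed
        return True
    except SyntaxError:
        return False

print(accepts("acb"), accepts("bacbc"), accepts("ab"))
```

The current token alone selects the alternative in A, so no choice is ever undone.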
RECURSIVE PREDICTIVE PARSING (CONT.)
When to apply ε-productions:
A → aA | bB | ε
If all other productions fail, we should apply an ε-production. For example, if the current token is not a or b, we may apply the ε-production.
More precisely, the ε-production for A should be applied only when the current token is in FOLLOW(A).
TOP-DOWN, PREDICTIVE PARSING: LL(1)
stmt → if ...... |
       while ...... |
       begin ...... |
       for .....
When we are trying to expand the non-terminal stmt, if the current token is if, we have to choose the first production rule.
When we are trying to expand the non-terminal stmt, we can uniquely choose the production rule by just looking at the current token.
Even after we eliminate the left recursion in a grammar and left factor it, the grammar may still not be suitable for predictive parsing (it may not be an LL(1) grammar).
NON-RECURSIVE PREDICTIVE
PARSING -- LL(1) PARSER
Non-Recursive predictive parsing is a table-driven parser.
It is a top-down parser.
It is also known as LL(1) Parser.
[Figure: LL(1) parser model showing the input buffer and the parsing table]
LL(1) PARSER
Input buffer
Contains the string to be parsed. We will assume that its end is marked with a special symbol $.
Output
A production rule representing a step of the derivation sequence (left-most derivation) of the
string in the input buffer.
Stack
Contains the grammar symbols
At the bottom of the stack, there is a special end marker symbol $.
Initially the stack contains only the symbol $ and the starting symbol S.
$S initial stack
When the stack is emptied (i.e. only $ is left on the stack), parsing is complete.
Parsing table
A two-dimensional array M[A,a]
Each row is a non-terminal symbol
Each column is a terminal symbol or the special symbol $
Each entry holds a production rule.
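LL(1) PARSER – PARSER ACTIONS
The symbol at the top of the stack (say X) and the current symbol in the input string (say a) determine the parser action.
There are four possible parser actions.
1. If X and a are both $
   the parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $)
   the parser pops X from the stack and moves to the next symbol in the input buffer.
3. If X is a non-terminal
   the parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule X → Y1Y2...Yk, it pops X from the stack and pushes Yk, Yk-1, ..., Y1 onto the stack, so that Y1 ends up on top. The parser also outputs the production rule X → Y1Y2...Yk to represent a step of the derivation.
4. None of the above
   error: M[X,a] is empty, or X is a terminal that differs from a.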
CONSTRUCTING LL(1) PARSING
TABLES
Two functions are used in the construction of LL(1) parsing tables: FIRST and FOLLOW.
COMPUTE FIRST FOR ANY STRING X
Initially, for all non-terminals A, set
FIRST(A) = { t | A → tω for some ω }
Consider the grammar:
S → aC | bB
B → b
C → c
FIRST(S) = {a, b}; FIRST(B) = {b} and FIRST(C) = {c}
FIRST COMPUTATION WITH EPSILON
For all non-terminals A where A → ε is a production, add ε to FIRST(A).
E.g. S → a | ε
FIRST(S) = {a, ε}
For each production A → α, where α is a string of non-terminals whose FIRST sets contain ε, set
FIRST(A) = FIRST(A) ∪ { ε }.
E.g. S → AB | c ; A → a | ε ; B → b | ε
FIRST(S) = {a, b, c, ε} ; FIRST(A) = {a, ε} ; FIRST(B) = {b, ε}
For each production A → αtω, where α is a string of non-terminals whose FIRST sets contain ε, set
FIRST(A) = FIRST(A) ∪ { t }.
E.g. S → ABcD ; A → a | ε ; B → b | ε ; D → d
FIRST(S) = {a, b, c} ; FIRST(A) = {a, ε} ; FIRST(B) = {b, ε} ; FIRST(D) = {d}
For each production A → αBω, where α is a string of non-terminals whose FIRST sets contain ε, set
FIRST(A) = FIRST(A) ∪ (FIRST(B) - { ε }).
E.g. S → ABDc | f ; A → a | ε ; B → b | ε ; D → d
FIRST(S) = {a, b, d, f} ; FIRST(A) = {a, ε} ; FIRST(B) = {b, ε} ; FIRST(D) = {d}
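The FIRST rules above can be applied repeatedly until no set changes. The following is a minimal Python sketch of that fixed-point iteration; the grammar representation (dict of lists of symbol lists, [] for ε) and the name compute_first are assumptions for illustration.

```python
EPS = "ε"

def compute_first(grammar):
    """Return FIRST(A) for every non-terminal A of the grammar."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                before = len(first[nt])
                for sym in rhs:
                    if sym not in grammar:            # terminal t: A -> alpha t omega
                        first[nt].add(sym)
                        break
                    first[nt] |= first[sym] - {EPS}   # non-terminal B: add FIRST(B) - {ε}
                    if EPS not in first[sym]:
                        break
                else:
                    first[nt].add(EPS)                # every symbol can vanish (or rhs is ε)
                if len(first[nt]) != before:
                    changed = True
    return first

grammar = {
    "S": [["A", "B", "D", "c"], ["f"]],
    "A": [["a"], []],
    "B": [["b"], []],
    "D": [["d"]],
}
print(compute_first(grammar))
```

The grammar used here is the last example above, S → ABDc | f.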
FOLLOW SET
The FOLLOW set represents the set of terminals that might
come after a given nonterminal
Formally:
FOLLOW(A) = { t | S ⇒* αAtω for some α, ω }
COMPUTE FOLLOW FOR ANY STRING X
We apply the FOLLOW rules until nothing more can be added to any FOLLOW set.
FIRST AND FOLLOW SET EXAMPLE
Consider the grammar
C → P F class id X Y
P → public | ε
F → final | ε
X → extends id | ε
Y → implements I | ε
I → id J
J → , I | ε
FIRST EXAMPLE
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FIRST(F) = {(, id}
FIRST(T') = {*, ε}
FIRST(T) = {(, id}
FIRST(E') = {+, ε}
FIRST(E) = {(, id}
FOLLOW EXAMPLE
1. If S is the start symbol, $ is in FOLLOW(S).
2. If A → αBβ is a production rule,
   everything in FIRST(β) is in FOLLOW(B) except ε.
3(i) If A → αB is a production rule, or
3(ii) A → αBβ is a production rule and ε is in FIRST(β),
   everything in FOLLOW(A) is in FOLLOW(B).
Consider the following grammar:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FIRST(F) = {(, id}; FIRST(T') = {*, ε}; FIRST(T) = {(, id}; FIRST(E') = {+, ε}; FIRST(E) = {(, id}
E → TE'
   Rule 1: $ is in FOLLOW(E)
   Rule 2: A → αBβ: α is ε; B is T and β is E'
   Rule 3(i): A → αB: α is T; B is E'
   Rule 3(ii): A → αBβ: α is ε; B is T and β is E'; FIRST of β has ε
E' → +TE' | ε
   Rule 2: A → αBβ: α is +; B is T and β is E'
   Rule 3(i): A → αB: α is +T; B is E'
   Rule 3(ii): A → αBβ: α is +; B is T; β is E'; FIRST of β has ε
T → FT'
   Rule 2: A → αBβ: α is ε; B is F and β is T'
   Rule 3(i): A → αB: α is F; B is T'
   Rule 3(ii): A → αBβ: α is ε; B is F and β is T'; FIRST of β has ε
T' → *FT' | ε
   Rule 2: A → αBβ: α is *; B is F and β is T'
   Rule 3(i): A → αB: α is *F; B is T'
   Rule 3(ii): A → αBβ: α is *; B is F; β is T'; FIRST of β has ε
F → (E) | id
   Rule 2: A → αBβ: α is '('; B is E and β is ')'
FOLLOW(E) = { $, ) }
FOLLOW(E') = { $, ) }
FOLLOW(T) = { +, ), $ }
FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
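The three FOLLOW rules can be iterated to a fixed point in the same way as FIRST. The sketch below is an illustration under assumptions: the FIRST sets are copied from the example rather than computed, the grammar is a dict of symbol lists with [] for ε, and the names first_of and compute_follow are hypothetical.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# FIRST sets taken from the example above.
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"}, "T'": {"*", EPS}, "F": {"(", "id"}}

def first_of(symbols):
    """FIRST of a string of grammar symbols; a terminal t has FIRST(t) = {t}."""
    out = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)                                   # the whole string can derive ε
    return out

def compute_follow(start):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")                         # rule 1
    changed = True
    while changed:
        changed = False
        for a, rhss in GRAMMAR.items():
            for rhs in rhss:
                for i, b in enumerate(rhs):
                    if b not in GRAMMAR:
                        continue                   # FOLLOW is only defined for non-terminals
                    beta_first = first_of(rhs[i + 1:])
                    new = beta_first - {EPS}       # rule 2
                    if EPS in beta_first:          # rules 3(i) and 3(ii)
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow

print(compute_follow("E"))
```

Rule 3(i) is covered by the same test as 3(ii), because FIRST of the empty string after B is {ε}.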
CONSTRUCTING LL(1) PARSING TABLE -- ALGORITHM
for each production rule A → α of a grammar G
    for each terminal a in FIRST(α)
        add A → α to M[A,a]
    if ε is in FIRST(α)
        for each terminal a in FOLLOW(A) add A → α to M[A,a]
    if ε is in FIRST(α) and $ is in FOLLOW(A)
        add A → α to M[A,$]
Example:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FIRST(E') has ε, so add E' → ε under every symbol in FOLLOW(E')
FIRST(T') has ε, so add T' → ε under every symbol in FOLLOW(T')
      id         +            *            (          )          $
E     E → TE'                              E → TE'
E'               E' → +TE'                            E' → ε     E' → ε
T     T → FT'                              T → FT'
T'               T' → ε       T' → *FT'               T' → ε     T' → ε
F     F → id                               F → (E)
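The table construction can be sketched in Python as follows. The FIRST and FOLLOW sets are copied from the examples rather than computed, and the names first_of and build_table are illustrative; each table entry is kept as a list so that multiply defined entries would be visible.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# FIRST and FOLLOW sets taken from the worked examples.
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"}, "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", ")", "$"}, "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of(symbols, first):
    """FIRST of a string of symbols; a terminal t has FIRST(t) = {t}."""
    out = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out

def build_table(grammar, first, follow):
    """Return {(A, a): [right-hand sides]}; more than one entry in a list is a conflict."""
    table = {}
    for a, rhss in grammar.items():
        for rhs in rhss:
            alpha_first = first_of(rhs, first)
            lookaheads = alpha_first - {EPS}
            if EPS in alpha_first:
                lookaheads |= follow[a]      # FOLLOW(A) already contains $ where it applies
            for t in lookaheads:
                table.setdefault((a, t), []).append(rhs)
    return table

TABLE = build_table(GRAMMAR, FIRST, FOLLOW)
for (nt, t), rhss in sorted(TABLE.items()):
    print(f"M[{nt},{t}] = {nt} -> {' '.join(rhss[0]) or EPS}")
```

Because $ is stored in FOLLOW like any other lookahead symbol, the separate $ case of the algorithm is handled by the same line.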
LL(1) PARSER – EXAMPLE 1
Stack      Input     Output
$E         id+id$    E → TE'
$E'T       id+id$    T → FT'
$E'T'F     id+id$    F → id
$E'T'id    id+id$
$E'T'      +id$      T' → ε
$E'        +id$      E' → +TE'
$E'T+      +id$
$E'T       id$       T → FT'
$E'T'F     id$       F → id
$E'T'id    id$
$E'T'      $         T' → ε
$E'        $         E' → ε
$          $         Accept
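The trace above can be reproduced with a small table-driven driver. This is a minimal Python sketch, assuming the parsing table of the expression grammar is written out by hand as a dict and that the input is already split into tokens; the names TABLE, NONTERMINALS and ll1_parse are illustrative.

```python
# Parsing table of the expression grammar; [] stands for an ε right-hand side.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens, start="E"):
    """Return the productions of the left-most derivation, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]                    # top of the stack is the end of the list
    output = []
    pos = 0
    while True:
        top, cur = stack[-1], tokens[pos]
        if top == "$" and cur == "$":       # action 1: accept
            return output
        if top in NONTERMINALS:             # action 3: expand using M[top, cur]
            rhs = TABLE.get((top, cur))
            if rhs is None:
                raise SyntaxError(f"no table entry M[{top},{cur}]")
            stack.pop()
            stack.extend(reversed(rhs))     # push Yk ... Y1 so that Y1 is on top
            output.append(f"{top} -> {' '.join(rhs) or 'ε'}")
        elif top == cur:                    # action 2: match a terminal
            stack.pop()
            pos += 1
        else:                               # action 4: error
            raise SyntaxError(f"expected {top!r}, found {cur!r}")

for production in ll1_parse(["id", "+", "id"]):
    print(production)
```

The printed productions appear in the same order as the Output column of the trace.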
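LL(1) PARSER – EXAMPLE 2
S → aBa
B → bB | ε
      a          b          $
S     S → aBa
B     B → ε      B → bB
Stack    Input    Output
$S       abba$    S → aBa
$aBa     abba$
$aB      bba$     B → bB
$aBb     bba$
$aB      ba$      B → bB
$aBb     ba$
$aB      a$       B → ε
$a       a$
$        $        Accept, successful completion
LL(1) PARSER – EXAMPLE 2 (CONT.)
Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba
[Figure: parse tree. S has the children a, B, a; B expands to b B twice, and the innermost B derives ε]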
A GRAMMAR WHICH IS NOT LL(1)
Problem: ambiguity
A GRAMMAR WHICH IS NOT LL(1)
(CONT.)
What do we have to do if the resulting parsing table contains multiply defined entries?
If we didn't eliminate left recursion, eliminate the left recursion in the grammar.
If the grammar is not left factored, we have to left factor the grammar.
If the new grammar's parsing table still contains multiply defined entries, that grammar is ambiguous or it is inherently not an LL(1) grammar.
A left recursive grammar cannot be an LL(1) grammar.
A → Aα | β
Any terminal that appears in FIRST(β) also appears in FIRST(Aα), because Aα ⇒ βα.
If β is ε, any terminal that appears in FIRST(α) also appears in FIRST(Aα) and FOLLOW(A).
A grammar that is not left factored cannot be an LL(1) grammar.
• A → αβ1 | αβ2
Any terminal that appears in FIRST(αβ1) also appears in FIRST(αβ2).
An ambiguous grammar cannot be an LL(1) grammar.
PROPERTIES OF LL(1) GRAMMARS
A grammar G is LL(1) if and only if the following conditions hold for every two distinct production rules A → α and A → β:
1. α and β cannot both derive strings starting with the same terminal.
2. At most one of α and β can derive ε.
3. If β can derive ε, then α cannot derive any string starting with a terminal in FOLLOW(A).
In other words, a grammar G is LL(1) iff for any productions A → ω1 and A → ω2, the sets FIRST(ω1 FOLLOW(A)) and FIRST(ω2 FOLLOW(A)) are disjoint.
This condition is equivalent to saying that there are no conflicts in the table.
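The disjointness condition can be checked pairwise. The sketch below is an illustration under assumptions: FIRST and FOLLOW are supplied as precomputed dicts, ε is the empty right-hand side, and the names select_set and is_ll1 are hypothetical. select_set(A, ω) plays the role of FIRST(ω FOLLOW(A)).

```python
from itertools import combinations

EPS = "ε"

def select_set(a, rhs, first, follow):
    """Terminals that can start rhs, plus FOLLOW(a) when rhs can derive ε."""
    out = set()
    for sym in rhs:
        sym_first = first.get(sym, {sym})      # a terminal t has FIRST(t) = {t}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    return out | follow[a]

def is_ll1(grammar, first, follow):
    """True if, for every non-terminal, the select sets of its alternatives are pairwise disjoint."""
    for a, rhss in grammar.items():
        for r1, r2 in combinations(rhss, 2):
            if select_set(a, r1, first, follow) & select_set(a, r2, first, follow):
                return False
    return True

expr = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
expr_first = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"}, "T'": {"*", EPS}, "F": {"(", "id"}}
expr_follow = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", ")", "$"}, "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}
print(is_ll1(expr, expr_first, expr_follow))

# A grammar that is not left factored: both alternatives start with a.
not_factored = {"S": [["a", "b"], ["a", "c"]]}
print(is_ll1(not_factored, {"S": {"a"}}, {"S": {"$"}}))
```

A False result corresponds to a multiply defined entry in the parsing table.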