Unit-II CD
Syntax Analysis
Syllabus
• Role of Parser – Grammars – Error Handling – Context-free grammars
– Writing a grammar – Top Down Parsing – General Strategies –
Recursive Descent Parser – Predictive Parser – LL(1) Parser – Shift Reduce
Parser – LR Parser – LR(0) Item – Construction of SLR Parsing Table –
Introduction to LALR Parser – Error Handling and Recovery in Syntax
Analyzer – YACC.
The role of the Parser
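[Figure: position of the parser in the compiler front end. The lexical analyzer reads the source program and returns a token each time the parser issues a "get next token" request. The parser passes its result to the rest of the front end, which produces the intermediate representation. Both the lexical analyzer and the parser use the symbol table; the diagram also carries a "Lexical error" label.]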
• Syntax Analyzer creates the syntactic structure of the given source
program.
• This syntactic structure is mostly a parse tree.
• Syntax Analyzer is also known as parser.
• The syntax of a programming language is described by a context-free grammar
(CFG).
• The syntax analyzer (parser) checks whether a given source program
satisfies the rules implied by a context-free grammar or not.
• If it does, the parser creates the parse tree of that program.
• Otherwise the parser reports error messages.
Parsers fall into two groups:
1. Top-Down Parser
• the parse tree is created top to bottom, starting from the root.
2. Bottom-Up Parser
• the parse tree is created bottom to top, starting from the leaves.
• Both top-down and bottom-up parsers scan the input from left to right (one symbol at a
time).
• Efficient top-down and bottom-up parsers can be implemented only for sub-classes of
context-free grammars.
• LL for top-down parsing
• LR for bottom-up parsing
Context-Free Grammars
• Inherently recursive structures of a programming language are defined by a
context-free grammar
G = (V, T, P, S)
• A context-free grammar consists of:
• A finite set of terminals T (in our case, this will be the set of tokens)
• A finite set of non-terminals V (syntactic variables)
• A finite set of production rules P of the form
A → α where A is a non-terminal and
α is a string of terminals and non-terminals
(including the empty string)
• A start symbol S (one of the non-terminal symbols)
• Example:
E → E+E | E-E | E*E | E/E | -E
E → (E)
E → id
Derivations
E ⇒ E+E
α1 ⇒ α2 ⇒ ... ⇒ αn (α1 derives αn)
⇒ : derives in one step
⇒* : derives in zero or more steps
⇒+ : derives in one or more steps
Derivation Example
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
OR
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
• At each derivation step, we can choose any of the non-terminals in the sentential form of G for the
replacement.
• If we always choose the left-most non-terminal in each derivation step, the derivation is called a
left-most derivation.
• If we always choose the right-most non-terminal in each derivation step, the derivation is called a
right-most derivation.
Left-Most and Right-Most Derivations
Left-Most Derivation
E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)
Right-Most Derivation
E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)
• We will see that the top-down parsers try to find the left-most derivation of the given
source program.
• We will see that the bottom-up parsers try to find the right-most derivation of the given
source program in the reverse order.
Parse Tree
• Inner nodes of a parse tree are non-terminal symbols.
• The leaves of a parse tree are terminal symbols.
[Figure: the parse tree for -(id+id), built up step by step with one tree per sentential form of the derivation E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id).]
Ambiguity
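• A grammar that produces more than one parse tree for some sentence is
called an ambiguous grammar.
• An unambiguous grammar gives a unique parse tree for every sentence.
• Example: the sentence id+id*id has two left-most derivations, and therefore two parse trees:
E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
(the root is E + E; its right operand is the subtree E * E)
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
(the root is E * E; its left operand is the subtree E + E)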
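The ambiguity can also be checked mechanically. The sketch below is only an illustration and assumes the cut-down grammar E → E+E | E*E | id (the function name `count_parse_trees` is made up for this example). It counts parse trees by trying every operator as the root of the tree.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees of `tokens` under E -> E+E | E*E | id."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees deriving toks[i:j] from E.
        if j - i == 1:
            return 1 if toks[i] == "id" else 0
        total = 0
        # Each operator strictly inside the span may serve as the root.
        for k in range(i + 1, j - 1):
            if toks[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks))


print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
```

A count greater than 1 for any sentence shows that the grammar is ambiguous.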
Notational Conventions
Elimination of Left Recursion
• A grammar is left recursive if it has a non-terminal A such that there is a
derivation
A ⇒+ Aα for some string α
Immediate Left-Recursion
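A → Aα | β where β does not start with A
⇓ eliminate immediate left recursion
A → βA’
A’ → αA’ | ε
In general,
A → Aα1 | ... | Aαm | β1 | ... | βn where β1 ... βn do not start with A
⇓ eliminate immediate left recursion
A → β1A’ | ... | βnA’
A’ → α1A’ | ... | αmA’ | ε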
Immediate Left-Recursion -- Example
E → E+T | T
T → T*F | F
F → id | (E)
⇓ eliminate immediate left recursion
E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)
Left-Recursion -- Problem
• A grammar may not be immediately left-recursive and still be
left-recursive.
• So eliminating only the immediate left-recursion does not guarantee
a grammar that is free of left-recursion.
S → Aa | b
A → Sc | d
This grammar is not immediately left-recursive, but it is still left-recursive:
S ⇒ Aa ⇒ Sca or
A ⇒ Sc ⇒ Aac causes a left-recursion
Eliminate Left-Recursion – Example
S → Aa | b
A → Ac | Sd | f
- Order of non-terminals: A, S
for A:
- we do not enter the inner loop.
- Eliminate the immediate left-recursion in A
A → SdA’ | fA’
A’ → cA’ | ε
for S:
- Replace S → Aa with S → SdA’a | fA’a
So, we will have S → SdA’a | fA’a | b
- Eliminate the immediate left-recursion in S
S → fA’aS’ | bS’
S’ → dA’aS’ | ε
Eliminate Left-Recursion -- Algorithm
- Arrange non-terminals in some order: A1 ... An
- for i from 1 to n do {
- for j from 1 to i-1 do {
replace each production
Ai → Aj γ
by
Ai → α1 γ | ... | αk γ
where Aj → α1 | ... | αk
}
- eliminate immediate left-recursions among Ai productions
}
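The algorithm can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation. It assumes a grammar stored as a dict from non-terminal to a list of productions (each a list of symbols, with `[]` standing for ε), a grammar with no cycles and no ε-productions, and primed names such as `A'` that are not already in use. The function names are hypothetical.

```python
def eliminate_immediate(nt, prods):
    """Rewrite A -> A a | b as A -> b A', A' -> a A' | epsilon."""
    alphas = [p[1:] for p in prods if p and p[0] == nt]   # tails of A -> A alpha
    betas = [p for p in prods if not p or p[0] != nt]     # all other productions
    if not alphas:
        return {nt: prods}
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],  # [] is epsilon
    }


def eliminate_left_recursion(grammar, order):
    """Apply the ordering-based algorithm; `order` lists A1 ... An."""
    g = {nt: [list(p) for p in prods] for nt, prods in grammar.items()}
    result = {}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for p in g[ai]:
                if p and p[0] == aj:
                    # Replace Ai -> Aj gamma by Ai -> alpha gamma for each Aj -> alpha.
                    expanded += [alpha + p[1:] for alpha in g[aj]]
                else:
                    expanded.append(p)
            g[ai] = expanded
        new_rules = eliminate_immediate(ai, g[ai])
        g.update(new_rules)
        result.update(new_rules)
    return result


GRAMMAR = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], ["f"]]}
RESULT = eliminate_left_recursion(GRAMMAR, ["A", "S"])
for head, prods in RESULT.items():
    print(head, "->", " | ".join(" ".join(p) or "ε" for p in prods))
```

With the order A, S this reproduces the grammar obtained in the worked example for S → Aa | b, A → Ac | Sd | f.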
Left Factoring
• A predictive parser requires the grammar to be
left-factored.
• Given a non-terminal A, represent its rules as:
A → αβ1 | αβ2 | … | γ
• α is the longest prefix shared by several A productions
• γ stands for the other productions, which do not start with α
• the common prefix α must be factored out to make predictive parsing possible
• Rewrite the production rules as
A → αA’ | γ
A’ → β1 | β2 | …
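A minimal Python sketch of left factoring, under stated assumptions: productions are lists of symbols (`[]` is ε), alternatives are grouped by their first symbol, and new non-terminals are named by appending primes. The helper names `common_prefix` and `left_factor` are invented for this illustration.

```python
def common_prefix(seqs):
    """Longest prefix shared by every sequence in `seqs`."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(grammar):
    """Repeat A -> alpha b1 | alpha b2 | gamma  =>  A -> alpha A' | gamma."""
    g = {nt: [list(p) for p in prods] for nt, prods in grammar.items()}
    work = list(g)
    while work:
        nt = work.pop(0)
        groups = {}
        for p in g[nt]:
            # Alternatives that start with the same symbol share a prefix.
            groups.setdefault(p[0] if p else None, []).append(p)
        new_prods = []
        for first, group in groups.items():
            if first is None or len(group) == 1:
                new_prods += group          # nothing to factor
                continue
            alpha = common_prefix(group)
            new_nt = nt + "'"
            while new_nt in g:              # pick an unused primed name
                new_nt += "'"
            g[new_nt] = [p[len(alpha):] for p in group]
            new_prods.append(alpha + [new_nt])
            work.append(new_nt)             # the new rules may need factoring too
        g[nt] = new_prods
    return g


FACTORED = left_factor(
    {"A": [["a", "b", "B"], ["a", "B"], ["c", "d", "g"],
           ["c", "d", "e", "B"], ["c", "d", "f", "B"]]}
)
for head, prods in FACTORED.items():
    print(head, "->", " | ".join(" ".join(p) or "ε" for p in prods))
```

The sample grammar is A → abB | aB | cdg | cdeB | cdfB; the sketch factors out the prefixes a and cd.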
11/08/2024 21
Left Factoring-Example1
S → iEtS | iEtSeS | a
E → b
( i stands for “if”; t stands for “then”; and e stands for “else”)
• Left factored, this grammar becomes:
S → iEtSS’ | a
S’ → eS | ε
E → b
Left-Factoring – Example2
A → abB | aB | cdg | cdeB | cdfB
⇓
A → aA’ | cdg | cdeB | cdfB
A’ → bB | B
⇓
A → aA’ | cdA’’
A’ → bB | B
A’’ → g | eB | fB
Left-Factoring – Example3
A → ad | a | ab | abc | b
⇓
A → aA’ | b
A’ → d | ε | b | bc
⇓
A → aA’ | b
A’ → d | ε | bA’’
A’’ → ε | c
Top-Down Parsing
• The parse tree is created top to bottom.
• Top-down parser
• Recursive-Descent Parsing
• Backtracking is needed (If a choice of a production rule does not work,
we backtrack to try other alternatives.)
• It is a general parsing technique, but not widely used.
• Not efficient
• Predictive Parsing
• no backtracking
• efficient
• needs a special form of grammars (LL(1) grammars).
• Recursive Predictive Parsing is a special form of Recursive Descent
parsing without backtracking.
• Non-Recursive (Table Driven) Predictive Parser is also known as LL(1)
parser.
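A recursive predictive parser can be sketched with one procedure per non-terminal. The example below is an illustration only: it assumes the left-recursion-free expression grammar E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id, and tokens given as plain strings. The class and method names are made up.

```python
class ParseError(Exception):
    pass


class Parser:
    """Recursive predictive parser: one lookahead token, no backtracking."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks the end of input
        self.pos = 0

    @property
    def look(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.look != expected:
            raise ParseError(f"expected {expected!r}, found {self.look!r}")
        self.pos += 1

    def parse(self):
        self.E()
        self.match("$")
        return True

    def E(self):                 # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):           # E' -> + T E' | epsilon
        if self.look == "+":
            self.match("+")
            self.T()
            self.E_prime()

    def T(self):                 # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):           # T' -> * F T' | epsilon
        if self.look == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):                 # F -> ( E ) | id
        if self.look == "(":
            self.match("(")
            self.E()
            self.match(")")
        elif self.look == "id":
            self.match("id")
        else:
            raise ParseError(f"unexpected {self.look!r}")


print(Parser(["id", "+", "id", "*", "id"]).parse())
```

Each procedure picks its production by looking only at the current token, which is why the grammar has to be LL(1) for this style to work.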
Constructing a Predictive Parsing Table
FIRST & FOLLOW
• Computing First:
• If X is a terminal, then First(X) is {X}
• If X → ε is a production, then add ε to First(X)
• If X is a non-terminal and X → Y1 Y2 … Yk is a production, then place a in
First(X) if for some i, a is in First(Yi) and ε is in all of First(Y1)…First(Yi-1);
if ε is in all of First(Y1)…First(Yk), then add ε to First(X)
• Computing Follow:
• Place $ in Follow(S), where S is the start symbol and $ is the input right end marker.
• If there is a production A → αBβ, then everything in First(β) except for ε is placed in Follow(B).
• If there is a production A → αB, or a production A → αBβ where First(β) contains ε, then
everything in Follow(A) is in Follow(B)
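The First and Follow rules are fixed-point computations: apply the rules until no set grows. A possible Python sketch follows, assuming a grammar stored as a dict of productions (lists of symbols, `[]` for ε) and the expression grammar E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id as sample input. All names are hypothetical.

```python
EPS, END = "ε", "$"


def first_of_string(symbols, first):
    """First of a sequence of grammar symbols."""
    out = set()
    for x in symbols:
        out |= first[x] - {EPS}
        if EPS not in first[x]:
            return out
    out.add(EPS)              # every symbol (or the empty string) derives epsilon
    return out


def compute_first(grammar, terminals):
    first = {t: {t} for t in terminals}
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:            # iterate until no First set grows
        changed = False
        for nt, prods in grammar.items():
            for p in prods:
                new = first_of_string(p, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:            # iterate until no Follow set grows
        changed = False
        for a, prods in grammar.items():
            for p in prods:
                for i, b in enumerate(p):
                    if b not in grammar:
                        continue                      # b is a terminal
                    rest = first_of_string(p[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[a]              # beta can vanish
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = compute_first(GRAMMAR, {"+", "*", "(", ")", "id"})
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
print(sorted(FIRST["E"]), sorted(FOLLOW["F"]))
```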
Example
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
FIRST(F) = {(, id}         FOLLOW(E) = {$, )}
FIRST(T’) = {*, ε}         FOLLOW(E’) = {$, )}
FIRST(T) = {(, id}         FOLLOW(T) = {+, ), $}
FIRST(E’) = {+, ε}         FOLLOW(T’) = {+, ), $}
FIRST(E) = {(, id}         FOLLOW(F) = {+, *, ), $}
Non-recursive Predictive Parser
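[Figure: model of a non-recursive predictive parser. The predictive parsing program reads the input buffer (a + b $) and a stack (X Y Z, with $ at the bottom), consults the parsing table M, and produces the output.]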
Constructing the Parsing Table
• Algorithm for constructing a predictive parsing table:
1. For each production A → α of the grammar, do steps 2 and 3
2. For each terminal a in First(α), add A → α to M[A, a]
3. If ε is in First(α), add A → α to M[A, b] for each terminal b in Follow(A). If ε is
in First(α) and $ is in Follow(A), add A → α to M[A, $].
4. Make each undefined entry of M be an error.
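The four steps translate almost directly into code. The sketch below is illustrative: it hard-codes the FIRST and FOLLOW sets of the expression grammar rather than computing them, stores the table as a dict keyed by (non-terminal, terminal), and treats a missing key as an error entry. The names are assumptions made for the example.

```python
EPS, END = "ε", "$"

PRODUCTIONS = [
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]),
    ("E'", []),                      # E' -> epsilon
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]),
    ("T'", []),                      # T' -> epsilon
    ("F", ["(", "E", ")"]),
    ("F", ["id"]),
]
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {END, ")"}, "E'": {END, ")"}, "T": {"+", ")", END},
          "T'": {"+", ")", END}, "F": {"+", "*", ")", END}}


def first_of(alpha):
    """First of a right-hand side; a terminal's First set is itself."""
    out = set()
    for x in alpha:
        fx = FIRST.get(x, {x})
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    return out | {EPS}


def build_table(productions):
    table = {}
    for head, body in productions:
        fa = first_of(body)
        targets = fa - {EPS}                 # step 2
        if EPS in fa:
            targets |= FOLLOW[head]          # step 3 ($ is included via FOLLOW)
        for a in targets:
            if (head, a) in table:
                raise ValueError(f"conflict at M[{head}, {a}]: not LL(1)")
            table[(head, a)] = body
    return table                             # step 4: absent keys are errors


M = build_table(PRODUCTIONS)
print(len(M), "entries; M[E', )] =", M[("E'", ")")] or "ε")
```

Raising on a doubly defined entry is one way to detect that a grammar is not LL(1); the original algorithm simply adds both productions to the entry.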
Predictive parsing table M:
      id        +          *          (         )        $
E     E→TE’                           E→TE’
E’              E’→+TE’                         E’→ε     E’→ε
T     T→FT’                           T→FT’
T’              T’→ε       T’→*FT’              T’→ε     T’→ε
F     F→id                            F→(E)
Parsing of Input String id+id*id
Stack        Input        Output
$E           id+id*id$
$E’T         id+id*id$    E → TE’
$E’T’F       id+id*id$    T → FT’
$E’T’id      id+id*id$    F → id
$E’T’        +id*id$
$E’          +id*id$      T’ → ε
$E’T+        +id*id$      E’ → +TE’
$E’T         id*id$
$E’T’F       id*id$       T → FT’
$E’T’id      id*id$       F → id
$E’T’        *id$
$E’T’F*      *id$         T’ → *FT’
$E’T’F       id$
$E’T’id      id$          F → id
$E’T’        $
$E’          $            T’ → ε
$            $            E’ → ε
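The trace can be reproduced with a small table-driven driver. This is a sketch under stated assumptions: the parsing table is hard-coded as a Python dict (missing keys are error entries), tokens are plain strings, and the function name `ll1_parse` is invented here.

```python
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return the productions applied, in order (a left-most derivation)."""
    inp = list(tokens) + ["$"]
    stack = ["$", start]              # top of the stack is the end of the list
    pos, output = 0, []
    while stack:
        top, a = stack.pop(), inp[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, a))
            if body is None:
                raise SyntaxError(f"no entry M[{top}, {a}]")
            output.append((top, body))
            stack.extend(reversed(body))   # push the body, leftmost symbol on top
        elif top == a:
            pos += 1                       # matched a terminal (or the final $)
        else:
            raise SyntaxError(f"expected {top!r}, found {a!r}")
    return output


for head, body in ll1_parse(["id", "+", "id", "*", "id"]):
    print(head, "->", " ".join(body) or "ε")
```

The productions come out in the same order as the Output column of the trace.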
Bottom Up Parsing
• A bottom-up parser creates the parse tree of the given input starting
from leaves towards the root.
• A bottom-up parser tries to find the right-most derivation of the given
input in the reverse order.
• Bottom-up parsing is also known as shift-reduce parsing because its
two main actions are shift and reduce.
• Two methods of shift-reduce parsing:
• Operator Precedence Parsing
• LR Parsing (Left-to-right scan, Rightmost derivation in reverse)
• SLR
• Canonical LR
• LALR
Shift-Reduce Parsing
Grammar:
S → aABe
A → Abc | b
B → d
Reducing a sentence:
abbcde
aAbcde
aAde
aABe
S
At each step, the substring that is replaced matches a production’s right-hand side.
Shift-reduce corresponds to a rightmost derivation:
S ⇒rm aABe
⇒rm aAde
⇒rm aAbcde
⇒rm abbcde
Handles
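A handle is a substring of grammar symbols in a
right-sentential form that matches the right-hand side
of a production, and whose reduction is one step along the reverse of a rightmost derivation
Grammar:
S → aABe
A → Abc | b
B → d
Right-sentential forms and their handles:
abbcde (handle: the first b)
aAbcde (handle: Abc)
aAde (handle: d)
aABe (handle: aABe)
S
Handle Pruning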
Stack Implementation of Shift-Reduce Parsing
LR-Parser Example: The Parsing Table
State |            Action             |    Goto
      | id    +    *    (    )    $   | E    T    F
0     | s5              s4            | 1    2    3
1     |       s6                  acc |
2     |       r2   s7        r2   r2  |
3     |       r4   r4        r4   r4  |
4     | s5              s4            | 8    2    3
5     |       r6   r6        r6   r6  |
6     | s5              s4            |      9    3
7     | s5              s4            |           10
8     |       s6             s11      |
9     |       r1   s7        r1   r1  |
10    |       r3   r3        r3   r3  |
11    |       r5   r5        r5   r5  |
LR Parsers
• LR(k) parsing is the most powerful shift-reduce parsing method that is
still efficient.
The LR Parser Algorithm
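[Figure: model of an LR parser. The input buffer holds a1 a2 … ai … an $; the stack holds states and grammar symbols (s0 … Xm-1 sm-1 Xm); the parsing program consults the action table (shift, reduce, accept, error) and the goto table.]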
Constructing LR(0) Item
• An LR(0) item of a grammar G is a production of G with a
dot at some position of the right side.
Ex: A → aBb
Possible LR(0) items (four different possibilities):
A → .aBb
A → a.Bb
A → aB.b
A → aBb.
1. Augmented Grammar
• G’ is G with a new production rule S’ → S, where S’ is the new starting
symbol.
2. The Closure Operation
• If I is a set of LR(0) items for a grammar G, then closure(I) is the
set of LR(0) items constructed from I by the two rules:
1. Initially, every LR(0) item in I is added to closure(I).
2. If A → α.Bβ is in closure(I) and B → γ is a production rule of G, then B → .γ
will be in closure(I). We apply this rule until no more new LR(0)
items can be added to closure(I).
Closure-Example
Grammar:
E’ → E
E → E+T
E → T
T → T*F
T → F
F → (E)
F → id
closure({E’ → .E}) =
{ E’ → .E
E → .E+T
E → .T
T → .T*F
T → .F
F → .(E)
F → .id }
3. goto Operation
• If I is a set of LR(0) items and X is a grammar symbol (terminal or non-terminal),
then goto(I,X) is defined as follows:
• If A → α.Xβ is in I, then every item in closure({A → αX.β}) will be in goto(I,X).
goto-example
Example:
I = { E’ → .E, E → .E+T, E → .T,
T → .T*F, T → .F,
F → .(E), F → .id }
goto(I,E) = { E’ → E., E → E.+T }
goto(I,T) = { E → T., T → T.*F }
goto(I,F) = { T → F. }
goto(I,() = { F → (.E), E → .E+T, E → .T, T → .T*F, T → .F,
F → .(E), F → .id }
goto(I,id) = { F → id. }
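The closure and goto operations can be sketched in a few lines of Python. The representation is an assumption made for the example: an LR(0) item is a tuple (head, body, dot position), and the grammar is the augmented expression grammar used above.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}


def closure(items):
    """Add B -> .gamma for every non-terminal B that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for h, b in GRAMMAR:
                item = (h, b, 0)
                if h == body[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved)


def show(item):
    head, body, dot = item
    return f"{head} -> {' '.join(body[:dot] + ('.',) + body[dot:])}"


I0 = closure({("E'", ("E",), 0)})
for item in sorted(goto(I0, "E")):
    print(show(item))
```

Returning a `frozenset` lets a set of items serve as a dictionary key later, when the canonical collection is built.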
SLR Parsing Table
Action table columns: id + * ( ) $
Goto table columns: E T F
States (rows): 0 to 11, entries still to be filled in
Constructing SLR Parsing Table
(of an augmented grammar G’)
1. Construct the canonical collection of sets of LR(0) items for G’: C = {I0,...,In}
2. Create the parsing action table as follows
• If a is a terminal, A → α.aβ is in Ii and goto(Ii,a)=Ij, then action[i,a] is shift j.
• If A → α. is in Ii, then action[i,a] is reduce A → α for all a in FOLLOW(A), where A ≠ S’.
• If S’ → S. is in Ii, then action[i,$] is accept.
• If any conflicting actions are generated by these rules, the grammar is not SLR(1).
3. Create the parsing goto table
• for all non-terminals A, if goto(Ii,A)=Ij then goto[i,A]=j
4. All entries not defined by (2) and (3) are errors.
5. The initial state of the parser is the one containing S’ → .S
state |         Action Table          | Goto Table
      | id    +    *    (    )    $   | E    T    F
0     | s5              s4            | 1    2    3
1     |       s6                  acc |
2     |       r2   s7        r2   r2  |
3     |       r4   r4        r4   r4  |
4     | s5              s4            | 8    2    3
5     |       r6   r6        r6   r6  |
6     | s5              s4            |      9    3
7     | s5              s4            |           10
8     |       s6             s11      |
9     |       r1   s7        r1   r1  |
10    |       r3   r3        r3   r3  |
11    |       r5   r5        r5   r5  |
Kernel Item & Non Kernel Item
LR Parsing Algorithm
• The parsing table consists of two parts: a parsing action function and a
goto function.
• The LR parsing program determines sm, the state on top of the stack,
and ai, the current input symbol. It then consults action[sm, ai], which can take
one of four values:
• Shift
• Reduce
• Accept
• Error
LR Parsing Algorithm
• If action[sm, ai] = shift s, where s is a state, then the parser executes a
shift move.
• If action[sm, ai] = reduce A → β, then the parser executes a reduce
move.
• pop 2*|β| items from the stack, then push A and goto[s, A], where s is the state now on top
• If action[sm, ai] = accept, parsing is completed.
• If action[sm, ai] = error, then the parser has discovered an error.
LR Parsing Algorithm
set ip to point to the first symbol in w$
initialize stack to 0
repeat forever
    let s be the topmost state on the stack and a the symbol pointed to by ip
    if action[s, a] = shift s’
        push a then s’ onto the stack
        advance ip to the next input symbol
    else if action[s, a] = reduce A → β
        pop 2*|β| symbols off the stack
        let s’ be the state now on top of the stack
        push A then goto[s’, A] onto the stack
        output the production A → β
    else if action[s, a] = accept
        return success
    else
        error()
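The driver loop can be run directly against the SLR table for the expression grammar. The sketch below is an illustration. It assumes the production numbering 1: E → E+T, 2: E → T, 3: T → T*F, 4: T → F, 5: F → (E), 6: F → id (inferred from the reduce entries of the table), encodes the table as Python dicts, and uses the made-up name `lr_parse`.

```python
# Production number -> (head, length of the right-hand side); numbering assumed.
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}


def lr_parse(tokens):
    """Return the numbers of the productions used, in order of reduction."""
    inp = list(tokens) + ["$"]
    stack = [0]                        # alternates states and grammar symbols
    ip, output = 0, []
    while True:
        s, a = stack[-1], inp[ip]
        act = ACTION[s].get(a)         # a missing entry is an error entry
        if act is None:
            raise SyntaxError(f"unexpected {a!r} in state {s}")
        if act == "acc":
            return output
        if act[0] == "s":              # shift: push the symbol, then the state
            stack += [a, int(act[1:])]
            ip += 1
        else:                          # reduce by production number act[1:]
            number = int(act[1:])
            head, length = PRODUCTIONS[number]
            del stack[len(stack) - 2 * length:]          # pop 2*|beta| entries
            stack += [head, GOTO[stack[-1]][head]]       # push A, then goto[s', A]
            output.append(number)


print(lr_parse(["id", "+", "id", "*", "id"]))
```

Read backwards, the list of reductions is a right-most derivation of the input.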
Canonical Collection of Sets of LR(1) Items
• The construction of the canonical collection of the sets of LR(1) items is similar to the construction of
the canonical collection of the sets of LR(0) items, except that the closure and goto operations work a little
differently.
Construction of The Canonical LR(1)
Collection
items(G’)
{
C = { closure({[S’ → .S, $]}) }
repeat
for each I in C
for each grammar symbol X
if goto(I,X) is not empty and not in C
add goto(I,X) to C;
until no new sets of items are added to C;
}
An Example
S’ → S
1. S → C C
2. C → c C
3. C → d
I0: closure({(S’ → .S, $)}) =
(S’ → .S, $)
(S → .C C, $)
(C → .c C, c/d)
(C → .d, c/d)
I1: goto(I0, S) =
(S’ → S., $)
I2: goto(I0, C) =
(S → C.C, $)
(C → .c C, $)
(C → .d, $)
I3: goto(I0, c) =
(C → c.C, c/d)
(C → .c C, c/d)
(C → .d, c/d)
I4: goto(I0, d) =
(C → d., c/d)
I5: goto(I2, C) =
(S → C C., $)
I6: goto(I2, c) =
(C → c.C, $)
(C → .c C, $)
(C → .d, $)
I7: goto(I2, d) =
(C → d., $)
I8: goto(I3, C) =
(C → c C., c/d)
goto(I3, c) = I3
goto(I3, d) = I4
I9: goto(I6, C) =
(C → c C., $)
goto(I6, c) = I6
goto(I6, d) = I7
[Figure: the goto graph of the canonical LR(1) collection for this grammar. Transitions: I0 on S to I1, on C to I2, on c to I3, on d to I4; I2 on C to I5, on c to I6, on d to I7; I3 on C to I8, on c to I3, on d to I4; I6 on C to I9, on c to I6, on d to I7.]
An Example
state | c    d    $   | S    C
0     | s3   s4       | g1   g2
1     |           acc |
2     | s6   s7       |      g5
3     | s3   s4       |      g8
4     | r3   r3       |
5     |           r1  |
6     | s6   s7       |      g9
7     |           r3  |
8     | r2   r2       |
9     |           r2  |
Construction of LR(1) Parsing Tables
1. Construct C’ = {I0,...,In}, the canonical collection of sets of LR(1) items for G’.
2. Create the parsing action table as follows
• If a is a terminal, [A → α.aβ, b] is in Ii and goto(Ii,a)=Ij, then action[i,a] is shift j.
• If [A → α., a] is in Ii, then action[i,a] is reduce A → α, where A ≠ S’.
• If [S’ → S., $] is in Ii, then action[i,$] is accept.
• If any conflicting actions are generated by these rules, the grammar is not LR(1).
Construction of LALR Parsing Tables
1. Create the canonical LR(1) collection of the sets of LR(1) items for the given
grammar.
2. For each core present, find all sets having that same core and replace those sets
with a single set which is their union. C={I0,...,In} becomes
C’={J1,...,Jm} where m ≤ n
3. Create the parsing tables (action and goto tables) in the same way as for the
LR(1) parser.
Note that: if J = I1 ∪ ... ∪ Ik, then since I1,...,Ik have the same cores, the
cores of goto(I1,X),...,goto(Ik,X) must be the same.
So, goto(J,X)=K where K is the union of all sets of items having the same core as goto(I1,X).
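Step 2 (merging sets with the same core) can be sketched as follows. The code is illustrative only. It assumes an LR(1) item is a tuple (head, body, dot, lookahead) and includes just the six sets I3, I4, I6, I7, I8, I9 from the S → CC example, since the remaining sets have unique cores. All function names are hypothetical.

```python
from collections import defaultdict


def lr1_items(head, body, dot, lookaheads):
    """Expand a compact item such as (C -> c.C, c/d) into one item per lookahead."""
    return {(head, body, dot, la) for la in lookaheads}


def core(item_set):
    """The core of a set of LR(1) items: the items with lookaheads stripped."""
    return frozenset((head, body, dot) for head, body, dot, _ in item_set)


def merge_by_core(collection):
    """Union all item sets sharing a core; keys record which sets were merged."""
    merged, names = defaultdict(set), defaultdict(list)
    for name, item_set in collection.items():
        key = core(item_set)
        merged[key] |= item_set
        names[key].append(name)
    return {tuple(names[key]): frozenset(items) for key, items in merged.items()}


cC, d = ("c", "C"), ("d",)
COLLECTION = {
    "I3": lr1_items("C", cC, 1, "cd") | lr1_items("C", cC, 0, "cd")
          | lr1_items("C", d, 0, "cd"),
    "I4": lr1_items("C", d, 1, "cd"),
    "I6": lr1_items("C", cC, 1, "$") | lr1_items("C", cC, 0, "$")
          | lr1_items("C", d, 0, "$"),
    "I7": lr1_items("C", d, 1, "$"),
    "I8": lr1_items("C", cC, 2, "cd"),
    "I9": lr1_items("C", cC, 2, "$"),
}
LALR = merge_by_core(COLLECTION)
for merged_names, items in LALR.items():
    print("+".join(merged_names), "has", len(items), "items")
```

In this example the six sets collapse into three (I3 with I6, I4 with I7, I8 with I9), so the ten LR(1) states become seven LALR states.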
Error Recovery Strategies
• Panic mode
• Phrase-level recovery
• Error productions
• Global correction
Error Recovery in Predictive Parsing
• An error may occur in the predictive parsing (LL(1) parsing)
• if the terminal symbol on the top of stack does not match with the current
input symbol.
• if the top of stack is a non-terminal A, the current input symbol is a, and the
parsing table entry M[A,a] is empty.
• What should the parser do in an error case?
• The parser should be able to give an error message (as meaningful
as possible).
• It should recover from that error case and be able to continue
parsing the rest of the input.
Panic-Mode Error Recovery in LL(1) Parsing
• In panic-mode error recovery, we skip all the input symbols until a
synchronizing token is found.
• What is the synchronizing token?
• All the terminal-symbols in the follow set of a non-terminal can be used as a
synchronizing token set for that non-terminal.
• So, a simple panic-mode error recovery for the LL(1) parsing:
• All the empty entries are marked as synch to indicate that the parser will skip
input symbols until it finds a symbol in the follow set of the non-terminal A on top
of the stack. Then the parser pops A from the stack and parsing
continues from that state.
• To handle an unmatched terminal symbol, the parser pops that terminal
from the stack and issues an error message saying that the unmatched terminal
was inserted.
Phrase-Level Error Recovery
• Each empty entry in the parsing table is filled with a pointer to a
special error routine which takes care of that error case.
• These error routines may:
• change, insert, or delete input symbols.
• issue appropriate error messages
• pop items from the stack.
• We should be careful when we design these error routines, because
we may put the parser into an infinite loop.
Error Recovery in LR Parsing
• An LR parser will detect an error when it consults the parsing action
table and finds an error entry. All empty entries in the action table are
error entries.
• Errors are never detected by consulting the goto table.
• A canonical LR parser (LR(1) parser) will never make even a single
reduction before announcing an error.
• The SLR and LALR parsers may make several reductions before
announcing an error.
• But, all LR parsers (LR(1), LALR and SLR parsers) will never shift an
erroneous input symbol onto the stack.
Panic Mode Error Recovery in LR Parsing
• Scan down the stack until a state s with a goto on a particular nonterminal A is
found.
• Discard zero or more input symbols until a symbol a is found that can legitimately
follow A.
• The symbol a is simply in FOLLOW(A), but this may not work for all situations.
• The parser stacks the nonterminal A and the state goto[s,A], and it resumes the
normal parsing.
Example grammar:
T → T+F | F
F → ( E ) | id
• The parser scans the stack to find a state that has a goto entry for some nonterminal (T in this case).
• It discards * (the unexpected symbol) and looks for the next symbol (id) that can follow T.
• The parser resumes by pushing T onto the stack, as if the invalid input * had never been encountered,
and continues parsing from there.
Phrase-Level Error Recovery in LR Parsing
• Each empty entry in the action table is marked with a specific error
routine.
• An error routine reflects the error that the user most likely will make
in that case.
• An error routine inserts the symbols into the stack or the input (or it
deletes the symbols from the stack and the input, or it can do both
insertion and deletion).
• Missing operand (e1)
• Unbalanced right parenthesis (e2)
• Missing operator (e3)
• Missing right parenthesis (e4)
Example
E → E+E
| E*E
| (E)
| id
Error routines: missing operand (e1), unbalanced right parenthesis (e2), missing operator (e3), missing right parenthesis (e4)
state |           action            | goto
      | id   +    *    (    )    $  | E
0     | s3   e1   e1   s2   e2   e1 | 1
1     | e3   s4   s5   e3   e2   acc|
2     | s3   e1   e1   s2   e2   e1 | 6
3     | r4   r4   r4   r4   r4   r4 |
4     | s3   e1   e1   s2   e2   e1 | 7
5     | s3   e1   e1   s2   e2   e1 | 8
6     | e3   s4   s5   e3   s9   e4 |
7     | r1   r1   s5   r1   r1   r1 |
8     | r2   r2   r2   r2   r2   r2 |
YACC
• Yacc is an LALR parser generator
• The name stands for “Yet another compiler-compiler”
• Available on different platforms
• UNIX, Linux
Creating an Input/Output Translator with Yacc
[Figure: the Yacc specification translate.y is fed to the Yacc compiler, which produces y.tab.c.]
Linking lex & yacc
• A Yacc source program has three parts:
declarations
%%
translation rules
%%
supporting C functions
• Ex:
E → E+T | T
T → T*F | F
F → (E) | digit
%{
#include <stdio.h>
%}
%token DIGIT
%%
expr   : expr '+' term      { $$ = $1 + $3; }
       | term               /* default action: $$ = $1 */
       ;
term   : term '*' factor    { $$ = $1 * $3; }
       | factor
       ;
factor : '(' expr ')'       { $$ = $2; }
       | DIGIT
       ;
%%
/* Auxiliary procedures (such as yylex and yyerror) go here */