Ch3_Syntax Analysis
Syntax Analysis
Left Factoring
Context-Free Grammars versus Regular Expressions
E. Boldface strings such as id or if, each of which represents a single terminal symbol.
2. These symbols are non-terminals:
by E, T, and F, respectively.
3. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols; that is,
either non-terminals or terminals.
4. Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent (possibly empty) strings
of terminals.
5. Lowercase Greek letters, α, β, γ for example, represent (possibly empty) strings of grammar
symbols.
Thus, a generic production can be written as A → α, where A is the head and α the body.
6. A set of productions A → α1, A → α2, A → α3, ..., A → αk with a common head A (call them
A-productions) may be written A → α1 | α2 | α3 | ... | αk.
7. Unless stated otherwise, the head of the first production is the start symbol.
Derivation
A sequence of replacements of non-terminal symbols to obtain
strings/sentences is called a derivation
• If we have a grammar E → E+E then we can replace E by
E+E
• In general a derivation step is αAβ ⇒ αγβ if there is a
production rule A → γ in the grammar,
• where α and β are arbitrary strings of terminal and non-
terminal symbols
The derivation of a string must start from a production with the start
symbol on the left:
S ⇒* α
where α is a sentential form (terminals and non-terminals mixed)
RMD for -(id+id)
[Figure: parse trees built step by step during the derivation; the sentential forms that survive from the figure include -E, -(id+E) and -(id+id).]
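The difference between a rightmost and a leftmost derivation of -(id+id) can be traced with a small sketch. It assumes the usual ambiguous expression grammar E → E+E | E*E | -E | (E) | id, which the slides do not state in full; the function name `derive` and the token-list representation are illustrative choices only.

```python
# Assumption: the only non-terminal of the example grammar is E.
NONTERMINALS = {"E"}

def derive(steps, start="E", rightmost=True):
    """Apply (head, body) productions in order, each time rewriting the
    rightmost (or leftmost) non-terminal of the current sentential form."""
    form = [start]
    trace = [" ".join(form)]
    for head, body in steps:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        pos = positions[-1] if rightmost else positions[0]
        assert form[pos] == head, "production head must match the chosen non-terminal"
        form = form[:pos] + body + form[pos + 1:]
        trace.append(" ".join(form))
    return trace

# The same production sequence drives both derivations of -(id+id).
STEPS = [
    ("E", ["-", "E"]),
    ("E", ["(", "E", ")"]),
    ("E", ["E", "+", "E"]),
    ("E", ["id"]),
    ("E", ["id"]),
]

rmd = derive(STEPS, rightmost=True)
lmd = derive(STEPS, rightmost=False)

for r, l in zip(rmd, lmd):
    print(f"{r:<16} {l}")
```

Both traces end in the same sentence; they differ only in the next-to-last sentential form, where the rightmost derivation has replaced the right operand first.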
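Immediate left recursion in A → Aα1 | … | Aαm | β1 | … | βn (where no βi begins with A) is eliminated by rewriting the A-productions as:
A → β1A’ | … | βnA’
A’ → α1A’ | … | αmA’ | ε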
Eliminating left recursion…
Example
E→E+T | T
T→T*F | F
F→(E) | id
After eliminating the left-recursion the grammar becomes,
E → TE’
E’→+TE’|ε
T → FT’
T’→*FT’|ε
F→ (E) |id
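The transformation used in this example can be sketched in a few lines of Python. This is a minimal sketch that handles only immediate left recursion; the dictionary representation of the grammar and the function name are assumptions made for illustration.

```python
EPSILON = "ε"

def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn as
    A -> b1 A' | ... | bn A'  and  A' -> a1 A' | ... | am A' | ε."""
    # Tails (the alphas) of the left-recursive alternatives.
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    # Alternatives that do not start with the head (the betas).
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: bodies}          # nothing to do
    new_head = head + "'"
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[EPSILON]],
    }

grammar = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}

result = {}
for head, bodies in grammar.items():
    result.update(eliminate_immediate_left_recursion(head, bodies))

for head, bodies in result.items():
    print(head, "→", " | ".join(" ".join(body) for body in bodies))
```

Grammars with indirect left recursion need the more general algorithm that first orders the non-terminals; the sketch above does not attempt that.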
Left factoring
When a non-terminal has two or more productions
whose right-hand sides start with the same grammar
symbols, the grammar is not LL(1) and cannot be used for
predictive parsing
A predictive parser (a top-down parser without
backtracking) insists that the grammar must be left-
factored.
Left factoring…
Suppose A → αβ1 | αβ2. When processing α we do not know whether to expand A to
αβ1 or to αβ2, but if we rewrite the grammar as follows:
A → αA’
A’ → β1 | β2
then we can immediately expand A to αA’ and postpone the choice until α has been seen.
Left factoring…
Example 1
A → abB | aB | cdg | cdeB | cdfB
After left factoring:
A → aA’ | cdA’’
A’ → bB | B
A’’ → g | eB | fB
Example 2
A → ad | a | ab | abc | b
After left factoring:
A → aA’ | b
A’ → d | ε | bA’’
A’’ → ε | c
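Both examples can be reproduced mechanically. The following is a minimal sketch of left factoring, assuming bodies are tuples of symbols, ε is the empty tuple, and fresh non-terminals are named by appending primes to the original head; these conventions are illustrative and not taken from the slides.

```python
def common_prefix(bodies):
    """Longest sequence of symbols shared by the start of every body."""
    prefix = []
    for symbols in zip(*bodies):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)

def left_factor(head, bodies):
    grammar = {}
    primes = 0
    pending = [(head, bodies)]
    while pending:
        name, alternatives = pending.pop(0)
        # Group the alternatives by their first symbol.
        groups = {}
        for body in alternatives:
            groups.setdefault(body[:1], []).append(body)
        factored = []
        for first, group in groups.items():
            if len(group) == 1 or first == ():
                factored.extend(group)      # nothing to factor
                continue
            prefix = common_prefix(group)
            primes += 1
            fresh = head + "'" * primes     # A', A'', ...
            factored.append(prefix + (fresh,))
            # The remainders may themselves need factoring later.
            pending.append((fresh, [body[len(prefix):] for body in group]))
        grammar[name] = factored
    return grammar

example1 = left_factor("A", [("a", "b", "B"), ("a", "B"), ("c", "d", "g"),
                             ("c", "d", "e", "B"), ("c", "d", "f", "B")])
example2 = left_factor("A", [("a", "d"), ("a",), ("a", "b"),
                             ("a", "b", "c"), ("b",)])

for g in (example1, example2):
    for name, alts in g.items():
        print(name, "→", " | ".join("".join(b) or "ε" for b in alts))
    print()
```

In Example 2 the remainders of the first factoring step (d, ε, b, bc) still share the prefix b, which is why a second new non-terminal appears.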
Context-Free Grammars versus Regular Expressions
Every regular language is a context-free language, but not vice
versa.
Example: the regular expression (a|b)*abb and a context-free grammar
written for it describe the same language, the set of strings of a's and b's
ending in abb. Such a language can be recognized either by a
finite automaton or by a PDA.
On the other hand, the language L = {a^n b^n | n ≥ 1}, with an equal
number of a's and b's, is a prototypical example of a language that
can be described by a grammar but not by a regular expression.
We say that "finite automata cannot count", meaning that a
finite automaton cannot accept a language like {a^n b^n | n ≥ 1} that
would require it to keep count of the number of a's before it sees
the b's.
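The contrast can be illustrated with a short sketch: the first language is matched directly by a regular expression, while the second needs an unbounded counter (which is what a PDA's stack provides). The function names are hypothetical; `re.fullmatch` is the standard-library call.

```python
import re

def ends_in_abb(s):
    """Strings over {a, b} ending in abb: a regular language."""
    return re.fullmatch(r"(a|b)*abb", s) is not None

def is_anbn(s):
    """Recognise {a^n b^n | n >= 1} with an explicit counter.
    The counter can grow without bound, so no fixed, finite number
    of automaton states can replace it."""
    count = 0
    i = 0
    while i < len(s) and s[i] == "a":   # count the leading a's
        count += 1
        i += 1
    while i < len(s) and s[i] == "b":   # cancel one a per b
        count -= 1
        i += 1
    return len(s) > 0 and i == len(s) and count == 0

for word in ["abb", "babb", "ab", "aabb", "aab", "abab"]:
    print(word, ends_in_abb(word), is_anbn(word))
```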
[Figure: classification of parsers]
• Top-down parsing
  - Recursive descent (involves backtracking)
  - Predictive parsing (parsing without backtracking)
      Recursive predictive
      Non-recursive predictive, or LL(1)
• Bottom-up parsing
  - Operator precedence
  - LR parsing
      SLR
      CLR
      LALR
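Predictive parsing table M for the grammar E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id:
     | id       +          *          (        )        $
E    | E→TE’                          E→TE’
E’   |          E’→+TE’                        E’→ε     E’→ε
T    | T→FT’                          T→FT’
T’   |          T’→ε       T’→*FT’             T’→ε     T’→ε
F    | F→id                           F→(E)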
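For each production A → α of the grammar:
1. For each terminal a in FIRST(α), add A → α to M[A, a].
2. If ε is in FIRST(α), then
for each terminal a in FOLLOW(A) (including $), add A → α to M[A, a].
All other undefined entries of the parsing table are error entries.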
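The table-filling rule can be checked against the left-factored expression grammar with a short sketch. It computes FIRST and FOLLOW by fixed-point iteration and then fills M; the data layout and function names are assumptions chosen for this illustration.

```python
EPS = "ε"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"

def first_of(symbols, first):
    """FIRST of a string of grammar symbols (the empty string yields {ε})."""
    out = set()
    for sym in symbols:
        f = first[sym] if sym in GRAMMAR else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:                      # iterate until nothing new is added
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:     # what follows head also follows sym
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

def build_table(first, follow):
    table = {}
    for head, bodies in GRAMMAR.items():
        for body in bodies:
            f = first_of(body, first)
            targets = f - {EPS}         # rule 1
            if EPS in f:                # rule 2
                targets |= follow[head]
            for a in targets:
                assert (head, a) not in table, "conflict: grammar is not LL(1)"
                table[head, a] = body
    return table

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
M = build_table(FIRST, FOLLOW)
for (head, a), body in sorted(M.items()):
    print(f"M[{head}, {a}] = {head} → {' '.join(body)}")
```

Any pair (A, a) missing from `M` corresponds to an error entry of the table.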
Handle: A “handle” of a string is a substring that matches the right
side of a production, and whose reduction to the non-terminal of that production
is one step along the reverse of a rightmost derivation.
Handle pruning: The process of discovering a handle and reducing it to the
appropriate left-hand-side non-terminal is known as handle pruning.
E → E+E
E → E*E          String: id1+id2*id3
E → id
Rightmost Derivation Right sentential form Handle Production
NB: If a shift-reduce parser cannot be used for a grammar, that grammar is called a
non-LR(k) grammar. An ambiguous grammar can never be an LR grammar.
SLR, canonical LR and LALR parsers use the same parsing algorithm; only their
parsing tables are different.
Chapter – 3 : Syntax Analysis 69 Bahir Dar Institute of Technology
LR Parsers
LR parsing is attractive because:
• LR parsers can be constructed to recognize virtually all
programming-language constructs for which context-free
grammars can be written.
• LR parsing is the most general non-backtracking shift-reduce
parsing method, yet it is still efficient.
• The class of grammars that can be parsed using LR methods
is a proper superset of the class of grammars that can be
parsed with predictive parsers.
• LL(1) grammars ⊂ LR(1) grammars
• An LR parser can detect a syntactic error as soon as it is
possible to do so on a left-to-right scan of the input.
The drawback of the LR method is that it is too much work
to construct an LR parser by hand.
• Use tools, e.g. yacc
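LR Parsing Algorithm
[Figure: model of an LR parser. The stack holds S0 X1 S1 ... Xm-1 Sm-1 Xm Sm, with Sm on top; the LR parsing algorithm reads the input and the stack and produces the output. The Action table is indexed by states and by terminals and $; each entry is one of four different actions. The Goto table is indexed by states and by non-terminals; each entry is a state number.]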
LR parser configuration
A configuration describes the complete state of
the parser.
A configuration of an LR parser is a pair:
(S0 X1 S1 X2 S2 … Xm Sm , ai ai+1 … an $)
where the first component is the stack contents and the second is the remaining input.
This configuration represents the right-sentential form
X1 X2 … Xm ai ai+1 … an
Behavior of LR parser…
LR-parsing algorithm
METHOD: Initially, the parser has s0 on its stack, where s0 is the initial state,
and w$ in the input buffer.
Let a be the first symbol of w$;
while(1)
{ /* repeat forever */
let S be the state on top of the stack;
if ( ACTION[S, a] = shift t )
{ push t onto the stack;
let a be the next input symbol;
}
else if ( ACTION[S, a] = reduce A → β ) // reduce the handle β on top of the stack to its head A
{ pop |β| symbols off the stack;
let state t now be on top of the stack;
push GOTO[t, A] onto the stack;
output the production A → β;
} else if ( ACTION[S, a] = accept ) break; /* parsing is done */
else call error-recovery routine;
}
Steps for Constructing SLR Parsing Tables
NB: I1, I4, I5, I6 are called final items. They lead to fill the
‘reduce’/ri action in specific row of action part in a
NB: In the LR(0) parsing table, whenever a state contains a final item, fill
the whole action part of that row with Ri.
E.g. in row 4, put R3, where 3 is the number that labels the production in G.
Example of LR(0) parsing Table
Step 6: check the parser by tracing its stack on the input string abb$
Constructing SLR parsing tables…
I0 = {[E’ → .E], [E → .E + T], [E → .T], [T → .T * F],
      [T → .F], [F → .(E)], [F → .id]}
I1 = Goto (I0, E) = {[E’ → E.], [E → E. + T]}
I2 = Goto (I0, T) = {[E → T.], [T → T. * F]}
I3 = Goto (I0, F) = {[T → F.]}
I4 = Goto (I0, () = {[F → (.E)], [E → .E + T], [E → .T],
      [T → .T * F], [T → .F], [F → .(E)], [F → .id]}
I5 = Goto (I0, id) = {[F → id.]}
I6 = Goto (I1, +) = {[E → E + .T], [T → .T * F], [T → .F],
      [F → .(E)], [F → .id]}
I7 = Goto (I2, *) = {[T → T * .F], [F → .(E)],
      [F → .id]}
I8 = Goto (I4, E) = {[F → (E.)], [E → E. + T]}
     Goto (I4, T) = {[E → T.], [T → T. * F]} = I2;
     Goto (I4, F) = {[T → F.]} = I3;
     Goto (I4, () = I4;
     Goto (I4, id) = I5;
I9 = Goto (I6, T) = {[E → E + T.], [T → T. * F]}
     Goto (I6, F) = I3;
     Goto (I6, () = I4;
     Goto (I6, id) = I5;
I10 = Goto (I7, F) = {[T → T * F.]}
     Goto (I7, () = I4;
     Goto (I7, id) = I5;
I11 = Goto (I8, )) = {[F → (E).]}
     Goto (I8, +) = I6;
     Goto (I9, *) = I7;
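The collection of item sets listed above can be reproduced with a short sketch of the closure and goto operations. An item is represented here as a pair (production index, dot position); that encoding, the symbol order and the function names are assumptions made for the illustration.

```python
PRODUCTIONS = [
    ("E'", ("E",)),              # 0: augmented start production
    ("E", ("E", "+", "T")),      # 1
    ("E", ("T",)),               # 2
    ("T", ("T", "*", "F")),      # 3
    ("T", ("F",)),               # 4
    ("F", ("(", "E", ")")),      # 5
    ("F", ("id",)),              # 6
]
NONTERMINALS = {head for head, _ in PRODUCTIONS}
# Symbol order chosen so that states are discovered in the order I0 ... I11.
SYMBOLS = ["E", "T", "F", "(", "id", "+", "*", ")"]

def closure(items):
    """Add [B -> .γ] for every non-terminal B that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        body = PRODUCTIONS[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for i, (head, _) in enumerate(PRODUCTIONS):
                if head == body[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in items
             if d < len(PRODUCTIONS[p][1]) and PRODUCTIONS[p][1][d] == symbol}
    return closure(moved)

def canonical_collection():
    states = [closure({(0, 0)})]
    transitions = {}
    i = 0
    while i < len(states):              # states grows while we scan it
        for symbol in SYMBOLS:
            target = goto(states[i], symbol)
            if not target:
                continue
            if target not in states:
                states.append(target)
            transitions[i, symbol] = states.index(target)
        i += 1
    return states, transitions

STATES, TRANSITIONS = canonical_collection()
print(len(STATES), "item sets")
for (state, symbol), target in sorted(TRANSITIONS.items()):
    print(f"Goto(I{state}, {symbol}) = I{target}")
```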
LR(0) automaton
SLR table construction method…
Construct the SLR parsing table for the grammar G1’
Follow (E) = {+, ), $} Follow (T) = {+, ), $, *}
Follow (F) = {+, ), $,*}
E’ → E
1 E → E+T
2 E → T
3 T → T*F
4 T → F
5 F → (E)
6 F → id
State |              action                      |     goto
      | id     +      *      (      )      $      | E    T    F
0     | S5                   S4                   | 1    2    3
1     |        S6                          accept |
2     |        R2     S7            R2     R2     |
3     |        R4     R4            R4     R4     |
4     | S5                   S4                   | 8    2    3
5     |        R6     R6            R6     R6     |
6     | S5                   S4                   |      9    3
7     | S5                   S4                   |           10
8     |        S6                   S11           |
9     |        R1     S7            R1     R1     |
10    |        R3     R3            R3     R3     |
11    |        R5     R5            R5     R5     |
SLR parser…
How a shift/reduce parser parses an input string w = id * id + id using
the parsing table shown above.
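The parse of w = id * id + id can be traced with a minimal table-driven sketch. The ACTION and GOTO entries below are transcribed from the SLR table for the expression grammar (productions numbered 1 to 6); the stack holds states only, and the dictionary encoding and the function name `parse` are assumptions for illustration.

```python
# Production number -> (head, length of body)
PRODUCTIONS = {
    1: ("E", 3),   # E -> E + T
    2: ("E", 1),   # E -> T
    3: ("T", 3),   # T -> T * F
    4: ("T", 1),   # T -> F
    5: ("F", 3),   # F -> ( E )
    6: ("F", 1),   # F -> id
}

ACTION = {}
for state in (0, 4, 6, 7):                       # states that shift id and (
    ACTION[state, "id"] = ("s", 5)
    ACTION[state, "("] = ("s", 4)
ACTION[1, "+"] = ("s", 6)
ACTION[1, "$"] = ("acc", None)
ACTION[2, "*"] = ("s", 7)
ACTION[9, "*"] = ("s", 7)
ACTION[8, "+"] = ("s", 6)
ACTION[8, ")"] = ("s", 11)
REDUCTIONS = [(2, 2, "+)$"), (3, 4, "+*)$"), (5, 6, "+*)$"),
              (9, 1, "+)$"), (10, 3, "+*)$"), (11, 5, "+*)$")]
for state, production, lookaheads in REDUCTIONS:
    for a in lookaheads:
        ACTION[state, a] = ("r", production)

GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def parse(tokens):
    """Return the production numbers in the order they are reduced."""
    tokens = tokens + ["$"]
    stack = [0]                                  # stack of states only
    pos = 0
    output = []
    while True:
        kind, arg = ACTION.get((stack[-1], tokens[pos]), ("err", None))
        if kind == "s":                          # shift: push state, advance
            stack.append(arg)
            pos += 1
        elif kind == "r":                        # reduce by production arg
            head, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]      # pop |β| states
            stack.append(GOTO[stack[-1], head])
            output.append(arg)
        elif kind == "acc":
            return output
        else:
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {stack[-1]}")

reductions = parse(["id", "*", "id", "+", "id"])
print(reductions)
```

Read in reverse, the printed production numbers give a rightmost derivation of the input.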
LR parsing: Exercise
LALR and CLR parser
NB: LR(0) and SLR(1) parsers use LR(0) items to build the parsing table,
whereas LALR and CLR parsers use LR(1) items to construct the
parsing table.
Reading assignment
• LALR parser and
• CLR parser