Toc Unit 3
SYNTAX ANALYSIS
Topics
Role of the parser
Writing Grammars
Context-Free Grammars
Top Down parsing
Recursive Descent Parsing
Predictive Parsing
Bottom-up parsing
Shift Reduce Parsing
Operator Precedence Parsing
LR Parsers
SLR Parser
Canonical LR Parser
LALR Parser
The role of Parser
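[Figure: position of the parser in the compiler front end. Source program → Lexical Analyzer → token → Parser → parse tree → Rest of front end → IC. The parser requests each token with getNextToken; a symbol table is also shown.]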
Parsing Techniques
⚫ There are two types of parsing:
⚫ Top Down parsing
    Recursive Descent Parsing
    Predictive Parsing
⚫ Bottom-up parsing
    Shift Reduce Parsing
    Operator Precedence Parsing
    LR Parsers
        SLR Parser
        Canonical LR Parser
        LALR Parser
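Recursive-Descent Parser
⚫ Steps are:
⚫ Check whether the grammar (G) is left-recursive or has productions with a common prefix (i.e. needs left factoring).
⚫ If so, eliminate the left recursion and left-factor the grammar.
⚫ Draw the transition diagram for each non-terminal.
⚫ Minimize the transition diagrams.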
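The steps above lead to one procedure per non-terminal. The following is a minimal sketch of such a recognizer, assuming the left-recursion-free expression grammar used in the LL(1) examples of this unit (E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id). The class, method, and helper names are illustrative only, and the input is assumed to be an already tokenized list.

```python
class ParseError(Exception):
    """Raised when the token stream does not match the grammar."""


class RecursiveDescentRecognizer:
    """One method per non-terminal of the assumed grammar:
         E  -> T E'
         E' -> + T E' | ε
         T  -> F T'
         T' -> * F T' | ε
         F  -> ( E ) | id
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def parse(self):
        self.E()
        self.match("$")  # the whole input must be consumed

    def E(self):
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.peek() == "+":
            self.match("+")
            self.T()
            self.E_prime()
        # otherwise choose E' -> ε and consume nothing

    def T(self):
        self.F()
        self.T_prime()

    def T_prime(self):
        if self.peek() == "*":
            self.match("*")
            self.F()
            self.T_prime()
        # otherwise choose T' -> ε and consume nothing

    def F(self):
        if self.peek() == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")


def accepts(tokens):
    """Return True if the token list is a sentence of the grammar."""
    try:
        RecursiveDescentRecognizer(tokens).parse()
        return True
    except ParseError:
        return False
```

For example, `accepts(["id", "+", "id", "*", "id"])` succeeds, while `accepts(["id", "+"])` fails because F finds the end marker where it needs `id` or `(`.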
Elimination of left recursion
⚫ T → F T’
⚫ T’ → * F T’ | ε
⚫ F → ( E ) | id
[Figure: a transition diagram for each of the productions above.]
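The rewrite behind these productions is the standard one for immediate left recursion: A → Aα | β becomes A → βA’ and A’ → αA’ | ε. The sketch below applies it to a single non-terminal, assuming the original rule was the usual T → T * F | F (the slide shows only the result). The function name and the list-of-symbols representation are illustrative choices.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A α | β as A -> β A', A' -> α A' | ε.

    alternatives: list of right-hand sides, each a list of symbols;
    the empty list [] stands for ε.
    Returns a dict mapping each non-terminal to its new alternatives.
    Assumes the primed name (nonterminal + "'") is not already in use.
    """
    # α parts: what follows the leading A in each left-recursive alternative
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # β parts: alternatives that do not start with A
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]

    if not recursive:
        return {nonterminal: alternatives}  # nothing to do

    new_nt = nonterminal + "'"
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],  # [] is ε
    }


result = eliminate_immediate_left_recursion("T", [["T", "*", "F"], ["F"]])
# result["T"]  holds T  -> F T'
# result["T'"] holds T' -> * F T' | ε
```

This handles only immediate left recursion in one non-terminal; indirect left recursion needs the more general ordering-based algorithm.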
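Optimization
After optimization, the transition diagrams reduce to the following DFA-like structures:
[Figure: minimized transition diagrams with START and FINAL states; edges labelled T, F, +, *, ε, ( E ) and id.]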
Definition of First(X)
⚫ Let X be a grammar symbol (a terminal or nonterminal) or ε. Then the set First(X), consisting of terminals and possibly ε, is defined as follows:
⚫ If X is a terminal or ε, then First(X) = {X}.
⚫ If X is a nonterminal, then for each production X → X1X2...Xn, First(X) contains First(X1) - {ε}.
⚫ If also, for some i < n, all the sets First(X1),…,First(Xi) contain ε, then First(X) contains First(Xi+1) - {ε}.
⚫ If every one of First(X1),…,First(Xn) contains ε (in particular, if X → ε is a production), then First(X) contains ε.
Definition of First(α)
⚫ Now, define First(α) for a string α = X1X2…Xn (a string of terminals and nonterminals), as follows:
⚫ First(α) contains First(X1) - {ε}.
⚫ For each i = 2,…,n, if First(Xk) contains ε for all k = 1,…,i - 1, then First(α) contains First(Xi) - {ε}.
⚫ Finally, if First(Xi) contains ε for all i = 1,…,n, then add ε to First(α).
Example: First Sets
⚫ Grammar: S → AB, A → a | ε, B → b
⚫ First(a) = {a}, First(ε) = {ε}, First(b) = {b}
⚫ To calculate First(B), look at rules for B:
⚫ B → b ⇒ First(B) = First(b) = {b}.
⚫ To calculate First(A), look at rules for A:
⚫ A → a ⇒ First(A) includes {a}
⚫ A → ε ⇒ First(A) includes {ε}
⚫ We conclude that First(A) = {a, ε}
⚫ To calculate First(S), look at rule S → AB.
⚫ First(S) = First(AB), which includes First(A) - {ε} = {a}
⚫ Since First(A) contains ε, First(AB) also includes First(B) = {b}
⚫ We conclude that First(S) = {a, b}
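The First definitions can be turned into a fixed-point computation: keep applying the rules to every production until no set grows. The following is a minimal sketch, not a definitive implementation. The grammar is assumed to be a dict from nonterminal to a list of right-hand sides, any symbol that is not a key is treated as a terminal, and the function names are illustrative.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """First(α) for a list of symbols, given the First sets found so far."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # a terminal's First set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result                  # sym cannot vanish: stop here
    result.add(EPS)                        # every symbol can vanish (or α is empty)
    return result


def compute_first(grammar):
    """grammar: dict nonterminal -> list of right-hand sides; [] stands for ε."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                         # iterate until a fixed point is reached
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


# The grammar of the example: S -> AB, A -> a | ε, B -> b
grammar = {"S": [["A", "B"]], "A": [["a"], []], "B": [["b"]]}
first = compute_first(grammar)
# first["A"] == {"a", "ε"}, first["B"] == {"b"}, first["S"] == {"a", "b"}
```

The result agrees with the hand calculation: First(S) picks up b only because First(A) contains ε.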
Definition of Follow(A)
⚫ Given a nonterminal A, the set Follow(A), consisting of terminals and possibly $ (the end-of-input symbol), is defined as follows:
1. If A is the start symbol, then $ is in Follow(A).
2. If there is a production B → α A γ, then everything in First(γ) - {ε} is in Follow(A).
3. If there is a production B → α A γ such that ε is in First(γ), then Follow(A) contains Follow(B).
Example: Follow Sets
⚫ By rule 2, if there is a production B → α A γ, then everything in First(γ) - {ε} is in Follow(A). Here are examples showing how right-hand sides of rules match the pattern above (α, A and γ are listed beside each rule):
⚫ X → abYmZ    α = ab, A = Y, γ = mZ
⚫ X → abYmZ    α = abYm, A = Z, γ = ε
⚫ B → Axy    α = ε, A = A, γ = xy
Example: Follow Sets
⚫ By rule 3, if there is a production B → α A γ such that ε is in First(γ), then Follow(A) contains Follow(B).
⚫ Examples of right-hand sides that match this pattern:
⚫ A → aX    α = a, A = X, γ = ε
⚫ A → aXY    α = a, A = X, γ = Y (ε ∈ First(Y))
⚫ A → Y    α = ε, A = Y, γ = ε
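The three Follow rules can be applied repeatedly until no set changes. The sketch below assumes the First sets of the nonterminals are already known and passed in; the grammar representation (dict from nonterminal to a list of right-hand sides) and the function names are illustrative, not taken from the slides.

```python
EPS, END = "ε", "$"


def first_of_string(symbols, first):
    """First(γ) for a list of symbols; a symbol missing from `first` is a terminal."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, start, first):
    """grammar: dict nonterminal -> list of right-hand sides; [] stands for ε."""
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                        # rule 1
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:        # Follow is defined for nonterminals only
                        continue
                    gamma_first = first_of_string(rhs[i + 1:], first)
                    new = gamma_first - {EPS}     # rule 2
                    if EPS in gamma_first:        # rule 3 (covers γ = ε as well)
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


# Grammar S -> AB, A -> a | ε, B -> b with its First sets
grammar = {"S": [["A", "B"]], "A": [["a"], []], "B": [["b"]]}
first = {"S": {"a", "b"}, "A": {"a", EPS}, "B": {"b"}}
follow = compute_follow(grammar, "S", first)
# follow["S"] == {"$"}, follow["A"] == {"b"}, follow["B"] == {"$"}
```

Here Follow(A) = {b} comes from rule 2 applied to S → AB, and Follow(B) = {$} comes from rule 3, because B ends the right-hand side of the start symbol's production.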
LL(1) Parser
input buffer
⚫ the string to be parsed. We assume that its end is marked with the special symbol $.
output
⚫ a production rule representing one step of the derivation sequence (left-most derivation) of the string in the input buffer.
stack
⚫ contains the grammar symbols
⚫ at the bottom of the stack, there is a special end-marker symbol $.
⚫ initially the stack contains only the symbol $ and the start symbol S. Initial stack: $S
⚫ when the stack is emptied (i.e. only $ is left on the stack), parsing is complete.
parsing table
⚫ a two-dimensional array M[A,a]
⚫ each row is a non-terminal symbol
⚫ each column is a terminal symbol or the special symbol $
⚫ each entry holds a production rule.
LL(1) Parser – Parser Actions
⚫ The symbol at the top of the stack (say X) and the current symbol in the input string (say a) determine the parser action.
⚫ There are four possible parser actions.
1. If X and a are both $ 🡺 the parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $) 🡺 the parser pops X from the stack and advances to the next symbol in the input buffer.
3. If X is a non-terminal 🡺 the parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule X→Y1Y2...Yk, it pops X from the stack and pushes Yk,Yk-1,...,Y1 onto the stack, so that Y1 ends up on top. The parser also outputs the production rule X→Y1Y2...Yk to represent a step of the derivation.
4. None of the above 🡺 error
⚫ All empty entries in the parsing table are errors.
⚫ If X is a terminal symbol different from a, this is also an error case.
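The four actions map directly onto a small driver loop. The sketch below is one possible rendering, assuming the parsing table is a dict keyed by (non-terminal, terminal) whose values are right-hand sides (the empty list stands for ε). The function name and the table encoding are illustrative. The sample table is for the grammar S → aBa, B → bB | ε.

```python
def ll1_parse(table, start, tokens, nonterminals):
    """Table-driven LL(1) parse.

    Returns the productions applied, in order (a left-most derivation).
    Raises SyntaxError on a mismatch or an empty table entry.
    """
    stack = ["$", start]                 # the top of the stack is the end of the list
    buffer = list(tokens) + ["$"]        # "$" marks the end of the input
    pos = 0
    output = []
    while True:
        top, current = stack[-1], buffer[pos]
        if top == "$" and current == "$":
            return output                # action 1: accept
        if top not in nonterminals:
            if top != current:           # terminal mismatch: error
                raise SyntaxError(f"expected {top!r}, found {current!r}")
            stack.pop()                  # action 2: match and advance
            pos += 1
        else:
            rhs = table.get((top, current))
            if rhs is None:              # action 4: empty entry is an error
                raise SyntaxError(f"no entry M[{top},{current}]")
            stack.pop()                  # action 3: replace X by Yk ... Y1
            stack.extend(reversed(rhs))  # Y1 ends up on top
            output.append((top, rhs))


# Parsing table for S -> aBa, B -> bB | ε
table = {
    ("S", "a"): ["a", "B", "a"],
    ("B", "b"): ["b", "B"],
    ("B", "a"): [],                      # B -> ε
}
steps = ll1_parse(table, "S", "abba", {"S", "B"})
for head, rhs in steps:
    print(head, "→", "".join(rhs) or "ε")
```

For the input abba the driver applies S → aBa, B → bB, B → bB, B → ε, which is the left-most derivation S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba.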
LL(1) Parser – Example1
Grammar:
S → aBa
B → bB | ε
LL(1) parsing table:
     a           b           $
S    S → aBa
B    B → ε       B → bB
Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba
Parse tree (children indented under their parent):
S
    a
    B
        b
        B
            b
            B
                ε
    a
LL(1) Parser – Example2
Grammar:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
LL(1) parsing table:
     id          +           *           (           )           $
E    E → TE’                             E → TE’
E’               E’ → +TE’                           E’ → ε      E’ → ε
T    T → FT’                             T → FT’
T’               T’ → ε      T’ → *FT’               T’ → ε      T’ → ε
F    F → id                              F → (E)
LL(1) Parser – Example2
stack       input     output
$E          id+id$    E → TE’
$E’T        id+id$    T → FT’
$E’T’F      id+id$    F → id
$E’T’id     id+id$
$E’T’       +id$      T’ → ε
$E’         +id$      E’ → +TE’
$E’T+       +id$
$E’T        id$       T → FT’
$E’T’F      id$       F → id
$E’T’id     id$
$E’T’       $         T’ → ε
$E’         $         E’ → ε
$           $         accept
Constructing LL(1) Parsing Tables
⚫ Two functions are used in the construction of LL(1) parsing tables:
⚫ FIRST & FOLLOW
FIRST(iCtSE) = {i}
FIRST(a) = {a}
FIRST(eS) = {e}
FIRST(ε) = {ε}
FIRST(b) = {b}
Parsing table (the entry M[E,e] holds two productions):
     a        b        e          i            t        $
S    S → a                        S → iCtSE
E                      E → eS                           E → ε
                       E → ε
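The way FIRST and FOLLOW fill the table can be sketched as follows: for each production A → α, place it in M[A,a] for every terminal a in FIRST(α), and, if ε is in FIRST(α), also in M[A,b] for every b in FOLLOW(A). This is a minimal sketch under stated assumptions. It uses the expression grammar E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id, with its FIRST and FOLLOW sets written out by hand, and the function names are illustrative.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of a list of symbols; a symbol missing from `first` is a terminal."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_ll1_table(grammar, first, follow):
    """Return (table, conflicts).

    table maps (non-terminal, terminal) to the list of right-hand sides
    placed in that cell; conflicts is the set of cells holding more than one.
    """
    table = {}
    for head, alternatives in grammar.items():
        for rhs in alternatives:
            rhs_first = first_of_string(rhs, first)
            lookaheads = rhs_first - {EPS}
            if EPS in rhs_first:              # α can vanish: use FOLLOW(A) too
                lookaheads |= follow[head]
            for a in lookaheads:
                table.setdefault((head, a), []).append(rhs)
    conflicts = {cell for cell, entries in table.items() if len(entries) > 1}
    return table, conflicts


grammar = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
first = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}
follow = {"E": {")", "$"}, "E'": {")", "$"},
          "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
          "F": {"+", "*", ")", "$"}}

table, conflicts = build_ll1_table(grammar, first, follow)
```

A grammar is LL(1) exactly when no cell receives more than one production, so a non-empty `conflicts` set signals that the grammar needs rewriting or that the conflict must be resolved by hand.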
S ⇒ αAω ⇒ αβω
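LR Parsing Algorithm
[Figure: model of an LR parser. A stack holding S0 X1 S1 … Xm-1 Sm-1 Xm Sm (Sm on top) feeds the LR parsing algorithm, which produces the output and consults two tables indexed by state. Action table: columns are terminals and $; four different actions. Goto table: columns are non-terminals; each item is a state number.]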
A Configuration of LR Parsing Algorithm
⚫ A configuration of an LR parser consists of the stack contents together with the remaining input.
⚫ An LR(0) item of a grammar is a production with a dot at some position of the right side.
⚫ Ex: A → aBb has four possible LR(0) items:
A → .aBb
A → a.Bb
A → aB.b
A → aBb.
⚫ Sets of LR(0) items will be the states of the action and goto tables of the SLR parser.
⚫ A collection of sets of LR(0) items (the canonical
LR(0) collection) is the basis for constructing SLR
parsers.
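Enumerating the LR(0) items of a production amounts to placing the dot at every position from 0 to the length of the right side. The sketch below illustrates this, assuming single-character grammar symbols for display; the function names and the tuple encoding of an item are illustrative.

```python
def lr0_items(head, rhs):
    """All LR(0) items of one production, as (head, rhs, dot_position) tuples."""
    return [(head, tuple(rhs), dot) for dot in range(len(rhs) + 1)]


def show(item):
    """Render an item with '.' marking the dot position."""
    head, rhs, dot = item
    symbols = list(rhs[:dot]) + ["."] + list(rhs[dot:])
    return f"{head} → {''.join(symbols)}"


items = [show(item) for item in lr0_items("A", "aBb")]
# items lists the four items of A → aBb, from dot-at-start to dot-at-end
```

A production with k symbols on its right side has k + 1 items, and an ε-production has exactly one (the dot alone).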
⚫ Augmented Grammar:
G’ is G with a new production rule S’→S where S’ is