0% found this document useful (0 votes)
8 views49 pages

Toc Unit 3

Uploaded by

Kashish Bhalla
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views49 pages

Toc Unit 3

Uploaded by

Kashish Bhalla
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 49

UNIT III

SYNTAX ANALYSIS
Topics
Role of the parser
Writing Grammars
Context-Free Grammars
Top Down parsing
Recursive Descent Parsing
Predictive Parsing
Bottom-up parsing
Shift Reduce Parsing
Operator Precedence Parsing
LR Parsers
SLR Parser
Canonical LRParser
LALR Parser
The role of Parser

Lexical toke
Parse Rest of
Sourc Analyze n Parser Tree front end IC
e r getNextTok
G
progr en
am

Symbol
table
Parsing Techniques
⚫ There are two types of parsing
Top Down parsing
Recursive Descent Parsing
Predictive Parsing

Bottom-up parsing
Shift Reduce Parsing
Operator Precedence Parsing
LR Parsers
SLR Parser
Canonical LRParser
LALR Parser
Recursive –Descent Parser
⚫ Steps are
⚫ Check whether the
Grammar (G) has Left
Recursion / Left Factoring
⚫ If it has Eliminate LR /LF
⚫ Draw the transition Diagram
⚫ Minimize the transition
diagram
Elimination of left recursion

⚫ We will consider a grammar with


productions:
Left Recursion : A → Aα / β
where α and β are sequences of
terminals and nonterminals that do not
start with A.
⚫ Elimination :
A → βA’
A’ → αA/ ε
Example: An Expression
Grammar
⚫ Let's eliminate left
recursion from the
grammar below:
E→E+T|T
T→T*F |F
F→(E)|a

⚫ The left associativity for


+ is illustrated by the
parse tree for a + a + a:
Example: An Expression
Grammar
⚫ Eliminating left
recursion gives the
grammar below:
E → TE‘
E' → +TE' | ε
T → FT‘
T' → *FT' | ε
F→(E)|a
⚫ Note the right
associativity of + in the
parse tree for
Algorithm for Eliminating General
Left Recursion
Arrange nonterminals in some order A1, A2, ... , An.
for i := 1 to n do begin
for j := 1 to i - 1 do begin
Replace each production of the form Ai -> Ajβ by
the productions:
Ai → α1β | α2β | ... | αkβ
where Aj → α1 | α2 | . . . | αk are all the current Aj
productions.
end { for j }

Remove immediate left recursion from the Ai


productions, if necessary
end { for i }
Left Factoring
⚫ In general, if A → αβ1 | αβ2 are two
A-productions, and the input
begins with a nonempty string
derived from α, we do not know
whether to expand to αβ1 or to αβ2.
⚫ Instead, the grammar may be
changed. The formal technique is
to change the grammar to
A → αA'
A' → β1 | β2
Left-Factoring – Example1
A → abB | aB | cdg | cdeB | cdfB

A → aA’ | cdg | cdeB | cdfB
A’ → bB | B

A → aA’ | cdA’’
A’ → bB | B
A’’ → g | eB | fB
Left-Factoring – Example2
A → ad | a | ab | abc | b

A → aA’ | b
A’ → d | ε | b | bc

A → aA’ | b
A’ → d | ε | bA’’
A’’ → ε | c
Example
The grammar is as follows
⚫ E -> E + T | T
⚫ T-> T*F|F
⚫ F -> (E) | id
After removing left-recursion , left-factoring
The rules are :
Rules and their transition
diagrams
E T T’
⚫ E->T T’
START
ε
⚫ T’ -> +T T’ | ε T’ +
T T

⚫ T -> F T’’
T F T’’

⚫ T -> *F T’’ | ε ε
T’ +
T T
⚫ T -> (E) |id
T ( E )

id
Optimization
After optimization it yields the following DFA
like structures:

+ *
T F
STAR
T ε
FINAL ε

T ( E )

id
Definition of First(x)
⚫ Let X be a grammar symbol (a terminal
or nonterminal) or ε. Then the set First(X)
consisting of terminals, and possibly ε, is
defined as follows:
⚫ If X is a terminal or ε, then First(X) = {X}.
⚫ If X is a nonterminal, then for each
production
X → X1X2...Xn, First(X) contains First(X1) -
{ε}.
⚫ If also for some i < n, all the sets First(X1),…,
First(Xi) contain ε, then First(X) contains
First(Xi+1) - {ε}.
Definition of First(α)
⚫ Now, define First(α) for string α =
X1X2…Xn (a string of terminals and
nonterminals), as follows:
⚫ First(α) contains First(X1) - {ε}.
⚫ For each i = 2,…,n, if First(Xk)
contains ε for all k = 1,…i - 1, then
First(α) contains First(Xi) - {ε}.
⚫ Finally, if for all i = 1,…,n, First(Xi)
contains ε, then add ε to First(α).
Example: First Sets
⚫ First(a) = {a}, First(ε) = {ε}, First(b) = {b}
⚫ To calculate First(B), look at rules for B:
⚫ B → b ⇒ First(B) = First(b) = {b}.
⚫ To calculate First(A), look at rules for A:
S → AB
⚫ A → a ⇒ First(A) includes {a} A→a|ε
⚫ A → ε ⇒ First(A) includes {ε} B→b
⚫ We conclude that First(A) = {a, ε}
⚫ To calculate First(S), look at rule S → AB.
⚫ First(S) = First(AB), which includes First(A) – {ε} =
{a}
⚫ Since First(A) contains ε, First(AB) also includes
First(B) = {b}
⚫ We conclude that First(S) = {a, b}
Definition of Follow(A)
⚫ Given a nonterminal A, the set
Follow(A), consisting of terminals,
and possibly $ (the end of input
symbol), is defined as follows:
1. If A is the start symbol, then $ is
in Follow(A).
2. If there is a production B → α A γ,
then First(γ) - {ε} is in Follow(A).
3. If there is a production B → α A γ
such that ε is in First(γ), then
Example: Follow Sets
⚫ By rule 2, if there is a production B
→ α A γ, then First(γ) - {ε} is in
Follow(A). Here are examples
showing how right hand sides of
rules match the pattern above(red
is α, green is γ)
⚫ X → abYmZ α = ab A = Y γ = mZ
⚫ X → abYmZ α = abYm A = Z γ = ε
⚫ B → Axy α=ε A=A γ = xy
Example: Follow Sets
⚫ By rule 3, if there is a production B
→ α A γ such that ε is in First(γ),
then Follow(A) contains Follow(B).
⚫ Examples of right hand sides that
match this pattern.
⚫ A → aX α=a A=X γ=ε
⚫ A → aXY α=a A=X γ = Y (ε ∈
First(Y))
⚫A→Y α=ε A=Y γ=ε
LL(1) Parser
input buffer
⚫ our string to be parsed. We will assume that its end is marked
with a special symbol $.

output
⚫ a production rule representing a step of the derivation
sequence (left-most derivation) of the string in the input buffer.

stack
⚫ contains the grammar symbols
⚫ at the bottom of the stack, there is a special end marker symbol
$.
⚫ initially the stack contains only the symbol $ and the starting
symbol S. $S 🡺 initial stack
⚫ when the stack is emptied (ie. only $ left in the stack), the
parsing is completed.

parsing table
⚫ a two-dimensional array M[A,a]
⚫ each row is a non-terminal symbol
⚫ each column is a terminal symbol or the special symbol $
⚫ each entry holds a production rule.
LL(1) Parser – Parser Actions
⚫ The symbol at the top of the stack (say X) and the current
symbol in the input string (say a) determine the parser
action.
⚫ There are four possible parser actions.
1. If X and a are $ 🡺 parser halts (successful completion)
2. If X and a are the same terminal symbol (different from $)
🡺 parser pops X from the stack, and moves the next symbol
in the input buffer.
3. If X is a non-terminal
🡺 parser looks at the parsing table entry M[X,a]. If M[X,a]
holds a production rule X→Y1Y2...Yk, it pops X from the
stack and pushes Yk,Yk-1,...,Y1 into the stack. The parser also
outputs the production rule X→Y1Y2...Yk to represent a step
of the derivation.
4. none of the above 🡺 error
⚫ all empty entries in the parsing table are errors.
⚫ If X is a terminal symbol different from a, this is also an error case.
LL(1) Parser – Example1
S → aBa a b $
B → bB | ε S S → aBa
B B→ε B → bB

stack input output


$S abba$ S → aBa
$aBa abba$
$aB bba$ B → bB
$aBb bba$
$aB ba$ B → bB
$aBb ba$
$aB a$ B→ε
$a a$
$ $ accept, successful completion
LL(1) Parser – Example1 (cont.)
Outputs: S → aBa B → bB B → bB B→ε

Derivation(left-most): S⇒aBa⇒abBa⇒abbBa⇒abba

S
parse tree

a B a

b B

b B

ε
LL(1) Parser – Example2
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id

id + * ( ) $
E E→ E → TE’
TE’
E’ E’ → +TE’ E’ → ε E’ → ε
T T → FT’ T → FT’
T’ T’ → ε T’ → *FT’ T’ → ε T’ → ε
F F → id F → (E)
LL(1) Parser – Example2
stack input output
$E id+id$ E → TE’
$E’T id+id$ T → FT’
$E’ T’F id+id$ F → id
$ E’ T’id id+id$
$ E’ T’ +id$ T’ → ε
$ E’ +id$ E’ → +TE’
$ E’ T+ +id$
$ E’ T id$ T → FT’
$ E’ T’ F id$ F → id
$ E’ T’id id$
$ E’ T’ $ T’ → ε
$ E’ $ E’ → ε
$ $ accept
Constructing LL(1) Parsing
Tables
⚫ Two functions are used in the construction of
LL(1) parsing tables:
⚫ FIRST & FOLLOW

⚫ FIRST(α) is a set of the terminal symbols which


occur as first symbols in strings derived from α
where α is any string of grammar symbols.
⚫ if α derives to ε, then ε is also in FIRST(α) .

⚫ FOLLOW(A) is the set of the terminals which


*
occur immediately after (follow) the non-terminal
A in the strings derived* from the starting symbol.
⚫ a terminal a is in FOLLOW(A) if S ⇒ αAaβ
⚫ $ is in FOLLOW(A) if S ⇒ αA
Constructing LL(1) Parsing Table --
Algorithm
⚫ for each production rule A → α of a
grammar G
⚫ for each terminal a in FIRST(α)
🡺 add A → α to M[A,a]
⚫ If ε in FIRST(α)
🡺 for each terminal a in FOLLOW(A) add A
→ α to M[A,a]
⚫ If ε in FIRST(α) and $ in FOLLOW(A)
🡺 add A → α to M[A,$]

⚫ All other undefined entries of the parsing


table are error entries.
LL(1) Grammars
⚫ A grammar whose parsing table has no
multiply-defined entries is said to be LL(1)
grammar.

one input symbol used as a look-head symbol do


determine parser action

LL(1) left most derivation


input scanned from left to right

⚫ The parsing table of a grammar may


contain more than one production rule. In
this case, we say that it is not a LL(1)
grammar.
A Grammar which is not LL(1)
S→iCtSE | a FOLLOW(S) = { $,e }
E→eS | ε FOLLOW(E) = { $,e }
C→b FOLLOW(C) = { t }

FIRST(iCtSE) = {i}
a b e i t $
FIRST(a) = {a}
S→
FIRST(eS) = {e} S S → a iCtSE
FIRST(ε) = {ε} E→eS E→
E
FIRST(b) = {b} E→ε ε

two production rules for M[E,e]


C C→b
Problem 🡺 ambiguity
Properties of LL(1) Grammars
⚫ A grammar G is LL(1) if and only if the
following conditions hold for two
distinctive production rules A → α and
A→β

1. Both α and β cannot derive strings starting


with same terminals.

2. At most one of α and β can derive to ε.

3. If β can derive to ε, then α cannot derive to


any string starting with a terminal in
FOLLOW(A).
Bottom-Up Parsing
⚫ A bottom-up parser creates the parse tree of the
given input starting from leaves towards the root.
⚫ A bottom-up parser tries to find the right-most
derivation of the given input in the reverse order.
S ⇒ ... ⇒ ω (the right-most derivation of ω)
← (the bottom-up parser finds the right-most
derivation in the reverse order)
⚫ Bottom-up parsing is also known as shift-reduce
parsing because its two main actions are shift and
reduce.
⚫ At each shift action, the current symbol in the input
string is pushed to a stack.
⚫ At each reduction step, the symbols at the top of the
stack (this symbol sequence is the right side of a
production) will replaced by the non-terminal at the
left side of that production.
⚫ There are also two more actions: accept and error.
Shift-Reduce Parsing
⚫ A shift-reduce parser tries to reduce the given
input string into the starting symbol.

a string 🡺 the starting symbol


reduced to
⚫ At each reduction step, a substring of the input
matching to the right side of a production rule is
replaced by the non-terminal at the left side of that
production rule. *
rm
⚫ If the substring is chosen correctly, the right most
derivation of that string is created in the reverse
rm rm
order.

Rightmost Derivation: S⇒ω


Shift-Reduce Parser finds: ω ⇐ ... ⇐ S
Shift-Reduce Parsing --
Example
S → aABb input string: aaabb
A → aA | a aaAbb
B → bB | b aAbb ⇓ reduction
aABb
S

S ⇒ aABb ⇒ aAbb ⇒ aaAbb ⇒ aaabb


rm rm rm rm
Right Sentential Forms

⚫ How do we know which substring to be replaced at


each reduction step?
Handle
⚫ Informally, a handle of a string is a substring that
matches the right side of a production rule.
⚫ But not every substring matches the right side of a
production rule is handle

⚫ A handle of a right sentential form γ (≡ αβω) is


a production rule A → β and a position of γ
where the string β may be found and replaced by A to
produce
the previous* right-sentential form in a rightmost
derivation of
rm γ. rm

S ⇒ αAω ⇒ αβω

⚫ If the grammar is unambiguous, then every


right-sentential form of the grammar has exactly one
handle.
Handle Pruning
⚫ A right-most derivation in reverse can be
obtained by handle-pruning.
S=γ0 ⇒ γ1 ⇒ γ2 ⇒ ... ⇒ γn-1 ⇒ γn= ω
rm rm rm rm rm
input string
⚫ Start from γn, find a handle An→βn in γn,
and replace βn in by An to get γn-1.
⚫ Then find a handle An-1→βn-1 in γn-1,
and replace βn-1 in by An-1 to get γn-2.
⚫ Repeat this, until we reach S.
A Shift-Reduce Parser
E → E+T | T Right-Most Derivation of id+id*id
T → T*F | F E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id
F → (E) | id ⇒ E+id*id ⇒ T+id*id ⇒ F+id*id ⇒ id+id*id

Right-Most Sentential FormReducing Production


id+id*id F → id
F+id*id T→F
T+id*id E→T
E+id*id F → id
E+F*id T→F
E+T*id F → id
E+T*F T → T*F
E+T E → E+T
E
Handles are red and underlined in the right-sentential
forms.
A Stack Implementation of A
Shift-Reduce Parser
⚫ There are four possible actions of a
shift-parser action:

1. Shift : The next input symbol is shifted onto


the top of the stack.
2. Reduce: Replace the handle on the top of the
stack by the non-terminal.
3. Accept: Successful completion of parsing.
4. Error: Parser discovers a syntax error, and calls
an error recovery routine.

⚫ Initial stack just contains only the


end-marker $.
A Stack Implementation of A
Shift-Reduce Parser
Stack Input Action
$ id+id*id$ shift
$id +id*id$ reduce by F → id Parse Tree
$F +id*id$ reduce by T → F
$T +id*id$ reduce by E → T E 8
$E +id*id$ shift
$E+ id*id$ shift E 3 + T 7
$E+id *id$ reduce by F → id
$E+F *id$ reduce by T → F T 2 T 5 * F6
$E+T *id$ shift
$E+T* id$ shift F 1 F 4 id
$E+T*id $ reduce by F → id
$E+T*F $ reduce by T → T*F id id
$E+T $ reduce by E → E+T
$E $ accept
LR Parsers
⚫ The most powerful shift-reduce parsing (yet
efficient) is:
LR(k) parsing.

left to right right-most k lookhead


scanning derivation (k is omitted 🡺 it is 1)
⚫ LR parsing is attractive because:
⚫ LR parsing is most general non-backtracking
shift-reduce parsing, yet it is still efficient.
⚫ The class of grammars that can be parsed using
LR methods is a proper superset of the class of
grammars that can be parsed with predictive
parsers.
LL(1)-Grammars ⊂ LR(1)-Grammars
⚫ An LR-parser can detect a syntactic error as soon
LR Parsers
⚫ LR-Parsers
⚫ covers wide range of grammars.
⚫ SLR – simple LR parser
⚫ LR – most general LR parser
⚫ LALR – intermediate LR parser (look-head
LR parser)
⚫ SLR, LR and LALR work same (they used
the same algorithm), only their parsing
tables are different.
LR Parsing Algorithm
input a1 ... ai ... an $
stack

Sm
Xm output
LR Parsing Algorithm
Sm-1
Xm-1
.
.
Action Table Goto Table
S1 terminals and $ non-terminal
X1 s s
t four different t each item is
S0 a actions a a state number
t t
e e
s s
A Configuration of LR Parsing
Algorithm
⚫ A configuration of a LR parsing is:

( So X1 S1 ... Xm Sm, ai ai+1 ... an $ )

Stack Rest of Input

⚫ Sm and ai decides the parser action by


consulting the parsing action table. (Initial
Stack contains just So )

⚫ A configuration of a LR parsing represents the


right sentential form:
X1 ... Xm ai ai+1 ... an $
Actions of A LR-Parser
1. shift s -- shifts the next input symbol and the
state s onto the stack
( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) 🡺 ( So X1 S1 ... Xm Sm ai s, ai+1 ... an $
)

2. reduce A→β (or rn where n is a production


number)
⚫ pop 2|β| (=r) items from the stack;
⚫ then push A and s where s=goto[sm-r,A]
( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) 🡺 ( So X1 S1 ... Xm-r Sm-r A s, ai ... an
$)

⚫ Output is the reducing production reduce A→β

3. Accept – Parsing successfully completed

4. Error -- Parser detected an error (an empty entry


in the action table)
Reduce Action
⚫ pop 2|β| (=r) items from the stack; let us
assume that β = Y1Y2...Yr
⚫ then push A and s where s=goto[sm-r,A]

( So X1 S1 ... Xm-r Sm-r Y1 Sm-r ...Yr Sm, ai ai+1 ...


an $ )
🡺 ( So X1 S1 ... Xm-r Sm-r A s, ai ... an $ )

⚫ In fact, Y1Y2...Yr is a handle.

X1 ... Xm-r A ai ... an $ ⇒ X1 ... Xm Y1...Yr ai ai+1


... an $
(SLR) Parsing Tables for Expression
Grammar Action Table Goto Table
1) E → E+T state id + * ( ) $ E T F
2) E→T 0 s5 s4 1 2 3
3) T → T*F 1 s6 acc
4) T→F
2 r2 s7 r2 r2
5) F → (E)
3 r4 r4 r4 r4
6) F → id
4 s5 s4 8 2 3
5 r6 r6 r6 r6
6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5
Actions of A (S)LR-Parser --
Example
stack input action output
0 id*id+id$ shift 5
0id5 *id+id$ reduce by F→id F→id
0F3 *id+id$ reduce by T→F T→F
0T2 *id+id$ shift 7
0T2*7 id+id$ shift 5
0T2*7id5 +id$ reduce by F→id F→id
0T2*7F10 +id$ reduce by T→T*F T→T*F
0T2 +id$ reduce by E→T E→T
0E1 +id$ shift 6
0E1+6 id$ shift 5
0E1+6id5 $ reduce by F→id F→id
0E1+6F3 $ reduce by T→F T→F
0E1+6T9 $ reduce by E→E+T E→E+T
0E1 $ accept
Constructing SLR Parsing Tables –
LR(0) Item
⚫ An LR(0) item of a grammar G is a production of G

.
a dot at the some position of the right side.

.
⚫ Ex: A → aBb Possible LR(0) Items: A → aBb

.
(four different possibility) A → a Bb

.
A → aB b
A → aBb
⚫ Sets of LR(0) items will be the states of action and
goto table of the SLR parser.
⚫ A collection of sets of LR(0) items (the canonical
LR(0) collection) is the basis for constructing SLR
parsers.
⚫ Augmented Grammar:
G’ is G with a new production rule S’→S where S’ is

You might also like