SLR Parsing

This document discusses several key concepts related to LR(0) parsing: 1. It defines LR(0) items as productions with a dot indicating parsing progress, and describes the initial item set I0 for a sample grammar. 2. It introduces closure of item sets, the GOTO function used to derive new item sets, and defines the canonical LR(0) collection and LR(0) automaton derived from these operations. 3. It briefly describes shift-reduce parsing and the stack/input configuration used by LR parsers to represent parsing progress.

Uploaded by

Anurag Das

SLR Parsing

Dr. Rahul Das Gupta


Augmented Grammar:
Given Grammar G = {V, T, P, S}
Augmented Grammar G’ = {V’, T’, P’, S’}
                     = {V U {S’}, T, P U {S’->S}, S’}
Note that L(G) = L(G’): the only change is the extra unit production S’->S,
i.e., P’ = P U {S’->S}, so G’ derives exactly the same strings as G.
S’ is the start symbol of the augmented grammar G’.
Example:
G = { {E,T,F}, {+,*,(,),id}, {E->E+T, E->T, T->T*F, T->F, F->(E), F->id}, E}

G’ = { {E’,E,T,F}, {+,*,(,),id}, {E’->E, E->E+T, E->T, T->T*F, T->F, F->(E), F->id}, E’}
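
The augmentation step can be sketched in a few lines of Python. This is only an illustration: the function name augment and the tuple encoding of productions are assumptions made here, not notation from the slides.

```python
def augment(nonterminals, terminals, productions, start):
    """Return G' = (V U {S'}, T, P U {S' -> S}, S') for a grammar G = (V, T, P, S)."""
    new_start = start + "'"
    # Keep adding primes until the new start symbol is genuinely fresh.
    while new_start in nonterminals or new_start in terminals:
        new_start += "'"
    new_productions = [(new_start, (start,))] + list(productions)
    return nonterminals | {new_start}, terminals, new_productions, new_start


V = {"E", "T", "F"}
T = {"+", "*", "(", ")", "id"}
P = [
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]

V2, T2, P2, S2 = augment(V, T, P, "E")
print(S2, P2[0])
```

The only new production is the unit production from the fresh start symbol to the old one, so the language is unchanged.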

LR(0) Items

An LR(0) item of a grammar G is a production of G with a dot at some
position of the right side. Thus, production A → XYZ yields the four
items

A→.XYZ, A→X.YZ, A→XY.Z, A→XYZ.

Intuitively, an item indicates how much of a production
we have seen at a given point in the parsing process.
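
As a small illustration, an item can be represented as a production plus a dot position. The triple encoding (lhs, rhs, dot) and the helper names below are assumptions made for this sketch.

```python
def lr0_items(lhs, rhs):
    """All LR(0) items of the production lhs -> rhs: one per dot position."""
    return [(lhs, rhs, dot) for dot in range(len(rhs) + 1)]


def show(item):
    """Render an item with an explicit '.' marking the dot."""
    lhs, rhs, dot = item
    return lhs + " -> " + " ".join(rhs[:dot] + (".",) + rhs[dot:])


items = lr0_items("A", ("X", "Y", "Z"))
for item in items:
    print(show(item))
```

A production with k symbols on its right side yields k+1 items, here four.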

I0:
E’->.E
E->.E+T
E->.T
T->.T*F
T->.F
F->.(E)
F->.id

I1: delta(I0, E)=I1


E’->E.
E->E.+T

I2: delta(I0, T)=I2


E->T.
T->T.*F
I3: delta(I0, F)= I3
T->F.

I4: delta(I0, ()= I4


F->(.E)
E->.E+T
E->.T
T->.T*F
T->.F
F->.(E)
F->.id

I5: delta (I0, id)= I5


F->id.

I6: delta (I1, +)= I6


E->E+.T
T->.T*F
T->.F
F->.(E)
F->.id

Viable Prefixes

The prefixes of right-sentential forms that can appear
on the stack of a shift-reduce parser are called viable
prefixes.
An equivalent definition of a viable prefix is that it is a prefix
of a right-sentential form that does not continue past the
right end of the rightmost handle of that sentential form.

Right-sentential Form

A right-sentential form of a grammar G is a sequence of tokens and
nonterminals (variables) that can be derived from the starting nonterminal
(variable) S in a rightmost derivation.

Remember that the parser constructs a rightmost derivation backwards.

At any given point, it finds the right-hand side of a production, called a handle,
and replaces the handle by the left-hand side of the production.

Handle

A handle of a string is a substring that matches the
right side of a production, and whose reduction to
the nonterminal on the left side of the production
represents one step along the reverse of a
rightmost derivation.

A handle of a right-sentential form γ is a production A→β and a
position of γ where the string β may be found and replaced by A to
produce the previous right-sentential form in a rightmost derivation of
γ. That is, if S =>* αAw => αβw by rightmost derivation steps, then A→β
in the position following α is a handle of αβw.
Handle Pruning

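A rightmost derivation in reverse order can be obtained by
handle pruning,
i.e., start with a string of terminals w that is to be parsed. If w is a
sentence of the grammar at hand, then w = γn, where γn is the nth right-
sentential form of some as yet unknown rightmost derivation
S = γ0 =>rm γ1 =>rm γ2 =>rm … =>rm γn-1 =>rm γn = w.
To reconstruct this derivation in reverse, locate the handle in γn and
replace it by the left side of the corresponding production to obtain γn-1;
repeat until the start symbol S is reached.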

Example of right-sentential forms and handles for the grammar

E→E+E
E→E*E
E→(E)
E→id

RIGHT-SENTENTIAL FORM    HANDLE    REDUCING PRODUCTION
id1 + id2 * id3          id1       E->id
E + id2 * id3            id2       E->id
E + E * id3              id3       E->id
E + E * E                E*E       E->E*E
E + E                    E+E       E->E+E
E
Shift-Reduce Parsing

▪ It is a bottom-up parsing method with the following two actions:

▪ Shift: Move the terminal symbol immediately to the right of the
dot into the substring on its left.
If the string before the shift action is a.pqr
then the string after the shift action will be ap.qr
▪ Reduce: Immediately to the left of ‘.’ identify a string that is the
same as the RHS of a production and replace it by the LHS.
If the string before the reduce action is aβ.pqr and
A→β is a production rule in the given grammar
then the string after the reduce action will be aA.pqr

Example of Shift-Reduce Parsing:

Production rules of the given grammar are
E->E+E, E->E*E, E->id
Production rules of the augmented grammar are
E’->E, E->E+E, E->E*E, E->id

String                 Action
.id * id + id $        shift
id .* id + id $        reduce E->id
E .* id + id $         shift
E *. id + id $         shift
E * id . + id $        reduce E->id
E * E . + id $         reduce E->E*E
E .+ id $              shift
E + . id $             shift
E + id . $             reduce E->id
E + E . $              reduce E->E+E
E . $                  reduce E’->E
E’                     Accept
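
The trace above can be replayed mechanically. The sketch below does not decide when to shift or reduce (that is what the parsing tables are for); it only applies a given action sequence and checks that every reduce really finds the right side of its production immediately to the left of the dot. The function name replay and the action encoding are assumptions made for this illustration.

```python
def replay(tokens, actions):
    """Apply a list of shift/reduce actions; return the final stack and remaining input."""
    stack, rest = [], list(tokens) + ["$"]
    for action in actions:
        if action == "shift":
            stack.append(rest.pop(0))          # move one terminal across the dot
        else:
            lhs, rhs = action                  # reduce by lhs -> rhs
            start = len(stack) - len(rhs)
            assert stack[start:] == list(rhs), "RHS is not immediately left of the dot"
            del stack[start:]
            stack.append(lhs)
    return stack, rest


E_ID = ("E", ("id",))
actions = [
    "shift", E_ID, "shift", "shift", E_ID, ("E", ("E", "*", "E")),
    "shift", "shift", E_ID, ("E", ("E", "+", "E")), ("E'", ("E",)),
]
stack, rest = replay(["id", "*", "id", "+", "id"], actions)
print(stack, rest)
```

The part left of the dot behaves as a stack: shifts push one terminal and reduces replace the top symbols by a nonterminal.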

How do we detect the presence of a handle immediately to the
left of ‘.’?

Configuration of an LR parser to
represent the right-sentential form
X_1 X_2 ... X_m a_i a_{i+1} ... a_n

A configuration of an LR parser is a pair whose first
component is the stack and whose second component
is the remaining (not yet consumed) input:

(s_0 X_1 s_1 X_2 s_2 ... X_m s_m, a_i a_{i+1} ... a_n $)
a_i : current input symbol (terminal)
s_m : current state on top of the stack
s_0 : initial state
X_1 ... X_m : grammar symbols (terminals or nonterminals)
on the stack

This configuration represents the following right-
sentential form
X_1 X_2 ... X_m a_i a_{i+1} ... a_n
Shift

If action[s_m, a_i] = shift s and the current configuration
is (s_0 X_1 s_1 X_2 s_2 ... X_m s_m, a_i a_{i+1} ... a_n $), then the
configuration after the shift operation will be the
following
(s_0 X_1 s_1 X_2 s_2 ... X_m s_m a_i s, a_{i+1} ... a_n $)

Reduce

If action[s_m, a_i] = reduce A→β, the current
configuration is
(s_0 X_1 s_1 X_2 s_2 ... X_m s_m, a_i a_{i+1} ... a_n $) and a_i is a
member of Follow(A),
then
the configuration after the reduce operation will be the
following
(s_0 X_1 s_1 X_2 s_2 ... X_{m-r} s_{m-r} A s, a_i a_{i+1} ... a_n $)
where GOTO[s_{m-r}, A] = s and r = length of β.

Here, X_{m-r+1} X_{m-r+2} ... X_m = β is the handle on top of the
stack, and X_1 X_2 ... X_m is a viable prefix.

Kernel and Non-kernel items

Kernel items, which include the initial item, S’→ .S, and all
items whose dots are not at the left end.

Non-kernel items, which have their dots at the left end (all such items
except S’→ .S).

Closure of Item Sets


Say I is a set of items and one of these items is A→α·Bβ. This item represents the
parser having seen α and records that the parser might soon see the remainder of
the RHS. For that to happen the parser must first see a string derivable from B.
Now consider any production starting with B, say B→γ. If the parser is to make
progress on A→α·Bβ, it will need to be making progress on one such B→·γ.
Hence we want to add all the latter productions to any state that contains the
former. We formalize this into the notion of closure.

For any set of items I, CLOSURE(I) is formed as follows.

Algorithm to find the closure of an item set I

1. Initialize CLOSURE(I) = I
2. If A → α · B β is in CLOSURE(I) and B → γ is a
production, then add B → · γ to the closure i.e.,
CLOSURE(I) = CLOSURE(I) U { B → · γ }
3. Repeat step 2 until no more items can be added
to CLOSURE(I).
Example: Recall our main example

E' → E
E → E + T | T
T → T * F | F
F → ( E ) | id
CLOSURE({E' → · E}) contains 7 elements. The 6 new elements are the 6 original
productions each with a dot right after the arrow. Make sure you understand why
all 6 original productions are added. It is not because the E'→E production is
special.
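
The closure algorithm can be sketched with a worklist. The dictionary GRAMMAR and the triple encoding (lhs, rhs, dot) are assumptions made for this illustration; the grammar itself is the main example above.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """CLOSURE(I): keep adding B -> . gamma while some item has the dot before B."""
    result = set(items)                  # step 1: start from I itself
    work = list(items)
    while work:                          # step 3: repeat until nothing is added
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:   # dot is before a nonterminal B
            for gamma in GRAMMAR[rhs[dot]]:          # step 2: add B -> . gamma
                new_item = (rhs[dot], gamma, 0)
                if new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


I0 = closure({("E'", ("E",), 0)})
print(len(I0))
```

Starting from E' → ·E, the item E → ·T pulls in the T productions, and T → ·F pulls in the F productions, which is why all six original productions appear.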

The Function GOTO

If X is a grammar symbol, then moving from A→α·Xβ to A→αX·β signifies that
the parser has just processed (input derivable from) X. The parser was in the
former position and (input derivable from) X was on the input; this caused the
parser to go to the latter position. We (almost) indicate this by writing
GOTO(A→α·Xβ,X) is A→αX·β. I said almost because GOTO is actually defined
from item sets to item sets, not from items to items.

Definition of GOTO:

If I is an item set and X is a grammar symbol, then
GOTO(I, X) is the closure of the set of all items A→αX·β
such that A→α·Xβ is in I, i.e.,
GOTO(I, X) = CLOSURE({ A→αX·β | A→α·Xβ is in I }).
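
A minimal sketch of GOTO on item sets follows. It repeats a closure helper so that the block runs on its own; the names GRAMMAR, closure and goto and the triple encoding of items are assumptions made for this illustration.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """Add B -> . gamma for every item whose dot stands before a nonterminal B."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new_item = (rhs[dot], gamma, 0)
                if new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


def goto(items, symbol):
    """GOTO(I, X): move the dot over X in every item that allows it, then close."""
    moved = {
        (lhs, rhs, dot + 1)
        for (lhs, rhs, dot) in items
        if dot < len(rhs) and rhs[dot] == symbol
    }
    return closure(moved)


I0 = closure({("E'", ("E",), 0)})
I1 = goto(I0, "E")      # {E' -> E. , E -> E.+T}
I4 = goto(I0, "(")      # F -> (.E) plus the closure items
print(len(I1), len(I4), len(goto(I0, ")")))
```

GOTO(I0, ")") is empty because no item in I0 has the dot before ")"; empty results are simply not states.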
The Canonical Collection of LR(0) Items—the LR(0) Automaton

I really believe this is very clear, but I understand that the formalism makes it seem
confusing. Let me begin with the idea.
We augment the grammar and get this one new production; take its closure. That is
the first element of the collection; call it I0. Try GOTOing from I0, i.e., for each
grammar symbol X, consider GOTO(I0,X); each of these (almost) is another element
of the collection. Now try GOTOing from each of these new elements of the
collection, etc. Start with Jane Smith, add all her friends F, then add the friends of
everyone in F, called FF, then add all the friends of everyone in FF, etc.

The (almost) is because GOTO(I0,X) could be empty, so formally we construct the
canonical collection of LR(0) items, C, as follows.

1. Initialize C = { CLOSURE({S' → ·S}) }

2. If I is in C, X is a grammar symbol, and GOTO(I,X)≠φ, then add GOTO(I,X) to C
(if it is not already there); repeat until nothing new can be added.

This GOTO gives exactly the arcs in the DFA I constructed earlier. The formal
treatment does not include the NFA, but works with the DFA from the beginning.

Definition: The above collection of item sets (so this is a set of sets) is called
the canonical LR(0) collection and the DFA having this collection as nodes and
the GOTO function as arcs is called the LR(0) automaton.
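
The construction can be sketched as a loop over a growing list of states. Everything below (GRAMMAR, SYMBOLS, the helper names, the triple encoding of items) is an assumption made for this illustration; the state numbers depend on the order in which symbols are tried.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
SYMBOLS = ["E", "T", "F", "(", "id", "+", "*", ")"]   # order fixes the state numbering


def closure(items):
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new_item = (rhs[dot], gamma, 0)
                if new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


def goto(items, symbol):
    return closure({(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == symbol})


def canonical_collection():
    """Return the list of item sets and the GOTO arcs of the LR(0) automaton."""
    states = [closure({("E'", ("E",), 0)})]           # step 1: C = {I0}
    arcs = {}
    i = 0
    while i < len(states):                            # step 2: keep GOTOing
        for X in SYMBOLS:
            target = goto(states[i], X)
            if target:                                # empty sets are not states
                if target not in states:
                    states.append(target)
                arcs[(i, X)] = states.index(target)
        i += 1
    return states, arcs


states, arcs = canonical_collection()
print(len(states), len(arcs))
```

For the main example this yields 12 item sets, and the arcs dictionary is exactly the transition function of the LR(0) automaton.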

Homework: Construct the LR(0) automaton for the following grammar (which
produces simple postfix expressions).
S → S S + | S S * | a
Don't forget to augment the grammar.

The DFA for our Main Example

Our main example

E' → E
E → E + T | T
T → T * F | F
F → ( E ) | id
is larger than the toy I did before. The NFA would have 2+4+2+4+2+4+2=20
states (a production with k symbols on the RHS gives k+1 N-states since there are
k+1 places to place the dot). This gives rise to 12 D-states. However, the development
in the book, which we are following now, constructs the DFA directly. The
resulting diagram is on the right.
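
The count of 20 N-states can be checked with a one-line sum over the productions (the list name PRODUCTIONS is just a label chosen for this sketch):

```python
PRODUCTIONS = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]

# A production with k right-side symbols has k + 1 dot positions, hence k + 1 items.
per_production = [len(rhs) + 1 for _, rhs in PRODUCTIONS]
nfa_states = sum(per_production)
print(per_production, nfa_states)
```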

Start constructing the diagram on the board:
Begin with {E' → ·E}, take the closure, and then keep applying GOTO.

Use of the LR(0) Automaton

A state of the automaton is an item set as described previously. The transition
function is GOTO. If during a parse we are up to item set Ij (often called state sj or
simply state j) and the next input symbol is b (it of course must be a terminal), then
the parser shifts in b if the state j has an outgoing transition labeled b. If there is no
such transition, then the parser performs a reduction; choosing which reduction to
use is determined by the items in Ij and the FOLLOW sets. (It is also possible that
the parser will now accept the input string or announce that the input string is not
in the language.)
4.6.3: The LR-Parsing Algorithm

The LR-parsing algorithm must decide when to shift and when to reduce (and in
the latter case, by which production). It does this by consulting two tables,
ACTION and GOTO. The basic algorithm is the same for all LR parsers; what
changes are the tables ACTION and GOTO.

The LR Parsing Tables

We have already seen GOTO (for SLR).

Technical point that may, and probably should, be ignored: our GOTO was defined
on pairs [item-set,grammar-symbol]. The new GOTO is defined on pairs
[state,nonterminal]. A state is simply an item set (so nothing is new here). We will
not use the new GOTO on terminals so we just define it on nonterminals.

Given a state i and a terminal a (or the endmarker), ACTION[i,a] can be

1. Shift j. The terminal a is shifted on to the stack and the parser enters state j.
2. Reduce A → α. The parser reduces α on the TOS to A.
3. Accept.
4. Error

So ACTION is the key to deciding shift vs. reduce. We will soon see how this
table is computed for SLR.

Since ACTION is defined on [state,terminal] pairs and GOTO is defined on
[state,nonterminal] pairs, we can combine these tables into one defined on
[state,grammar-symbol] pairs.

LR-Parser Configurations (formalism)

This formalism is useful for stating the actions of the parser precisely, but I believe
the parser can be explained without this formalism. The essential idea of the
formalism is that the entire state of the parser can be represented by the vector of
states on the stack and input symbols not yet processed.

As mentioned above the Symbols column is redundant so a configuration of the
parser consists of the current stack and the remainder of the input. Formally it is

(s_0 s_1 ... s_m, a_i a_{i+1} ... a_n $)
where the s's are states and the a's input symbols. This configuration could also be
represented by the right-sentential form
X_1 ... X_m a_i ... a_n
where X_j is the symbol associated with state s_j. X_j is either the terminal that was
shifted in or the LHS of the reduction that was performed when s_j was pushed.

Behavior of the LR Parser

The parser consults the combined ACTION-GOTO table for its current state (TOS)
and next input symbol, formally this is ACTION[s_m, a_i], and proceeds based on the
value in the table. If the action is a shift, the next state is clear from the DFA. (We
have done this informally just above; here we use the formal treatment.)

1. Shift s. The input symbol a_i is pushed and s becomes the new state. The new
configuration is

(s_0 ... s_m s, a_{i+1} ... a_n $)

2. Reduce A → α. Let r be the number of symbols in the RHS of the
production. The parser pops r items off the stack (backing up r states) and
enters the state GOTO(s_{m-r}, A). That is, after backing up it goes where A says
to go. The new configuration is

(s_0 ... s_{m-r} s, a_i a_{i+1} ... a_n $) where s = GOTO(s_{m-r}, A)

A real parser would now probably do something, e.g., build a tree
node or perform a semantic action. Although we know about this from the
chapter 2 overview, we don't officially know about it here. So for now
simply print the production the parser reduced by.
3. Accept.
4. Error.

4.6.4 Constructing SLR-Parsing Tables

The missing piece of the puzzle is finally revealed.

A Simple Example

Before defining the ACTION and GOTO tables precisely, I want to do it
informally via the simple example on the right. I produced that diagram without
starting from a grammar so I really don't know if it is realistic, but it does illustrate
how the tables are constructed directly from the diagram.

For convenience number the productions of the grammar to make them easy to
reference. Assume that the production B → a+b is numbered 2.

IMPORTANT: In order to construct the ACTION table, you do need something
not in the diagram. You need to know the FOLLOW sets, the same sets that we
constructed for top-down parsing.
For this example let us assume FOLLOW(B)={b} and all other FOLLOW sets are
empty. Again, I am not claiming that there is a grammar with this diagram and
these FOLLOW sets.

The ACTION table is defined with states (item sets) as rows and terminals and the $
endmarker as columns. GOTO has the same rows, but has nonterminals as
columns. So we construct a combined ACTION-GOTO table, with states as rows
and grammar symbols (terminals + nonterminals) as columns.

State |  a     b     +     $     |  A    B    C
  7   |                    acc   |
  8   |  s11   s10               |  9    7
  9   |                          |
 10   |                          |
 11   |              s12         |
 12   |        s13               |
 13   |        r2                |

1. Each arc in the diagram labeled with a terminal indicates a shift. In the
entry with row the state at the tail of the arc and column the labeling
terminal place sn, where n is the state at the head of the arc. This indicates
that if we are in the given state and the input is the given terminal, we
shift to new state n.
2. Each arc in the diagram labeled with a nonterminal informs us what state to
enter if we reduce. In the entry with row the state at the tail of the arc and
column the labeling nonterminal place n, where n is the state at the head of
the arc.
3. Each completed item (dot at the extreme right) indicates
a possible reduction. In each entry with row the state containing the
completed item and column a terminal in the FOLLOW set of the LHS of
the production corresponding to this item, place rn, where n is the number of
the production. (In particular, since FOLLOW(B)={b}, at entry [13,b] place an r2.)
4. In the entry with row (i.e., state) containing S'→S· and column $,
place accept.
5. If any entry is labelled twice (i.e., a conflict) the grammar is not SLR(1).
6. Any unlabeled entry corresponds to an input error. If the parser accesses this
entry, the input sentence is not in the language generated by the grammar.

A Terminology Point

The book (both editions) and the rest of the world seem to use GOTO for both the
function defined on item sets and the derived function on states. As a result we will
be defining GOTO in terms of GOTO. Item sets are denoted by I or Ij, etc. States
are denoted by s or si or i. Indeed both editions use i in this section. The advantage is
that on the stack we placed integers (i.e., i's) so this is consistent. The disadvantage
is that we are defining GOTO[i,A] in terms of GOTO(Ii,A), which looks confusing.
Actually, we view the old GOTO as a function and the new one as an array
(mathematically, they are the same) so we actually write GOTO(Ii,A) for the old one
and GOTO[i,A] for the new one.
Parsing Table Construction Algorithm
We start with an augmented grammar (i.e., we added
S' → S).
1. Construct C = {I0,...,In}, the canonical collection of sets of LR(0) items.
2. The parsing actions for state i are determined as follows.
   a. If A→α·bβ is in Ii for b a terminal, then
      ACTION[i, b] = shift j, where GOTO(Ii, b) = Ij.
   b. If A→α· is in Ii, for A≠S', then, for all b in
      FOLLOW(A), ACTION[i, b] = reduce A→α.
   c. If S'→S· is in Ii, then ACTION[i, $] = accept.
   d. If any conflicts occurred, the grammar is not
      SLR(1).
3. If GOTO(Ii,A)=Ij, for a nonterminal A, then
   GOTO[i,A]=j.
4. All entries not yet defined are error.
5. The initial state is the one containing S'→·S.
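
The algorithm can be sketched for the main expression grammar as follows. This is an illustration under stated assumptions, not a definitive implementation: items are encoded as (production number, dot) pairs, the FOLLOW sets are hard-coded rather than computed, the names PRODS, SYMBOLS and build_slr_table are invented here, and the state numbers come from the order in which symbols are tried.

```python
PRODS = [
    ("E'", ("E",)),                                   # production 0: S' -> S
    ("E", ("E", "+", "T")), ("E", ("T",)),            # 1, 2
    ("T", ("T", "*", "F")), ("T", ("F",)),            # 3, 4
    ("F", ("(", "E", ")")), ("F", ("id",)),           # 5, 6
]
TERMINALS = {"id", "+", "*", "(", ")"}
SYMBOLS = ["E", "T", "F", "(", "id", "+", "*", ")"]  # order fixes the state numbering
FOLLOW = {                                            # assumed given, not computed here
    "E": {"+", ")", "$"},
    "T": {"+", "*", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def closure(items):
    result, work = set(items), list(items)
    while work:
        p, dot = work.pop()
        rhs = PRODS[p][1]
        if dot < len(rhs):
            for q, (lhs, _) in enumerate(PRODS):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)


def goto(items, X):
    return closure({(p, d + 1) for p, d in items
                    if d < len(PRODS[p][1]) and PRODS[p][1][d] == X})


def build_slr_table():
    states = [closure({(0, 0)})]                      # rule 5: initial state
    action, goto_table = {}, {}

    def set_action(i, a, value):                      # rule 2d: detect conflicts
        if action.get((i, a), value) != value:
            raise ValueError(f"conflict in state {i} on {a}: not SLR(1)")
        action[(i, a)] = value

    i = 0
    while i < len(states):                            # rule 1, built on the fly
        for X in SYMBOLS:
            target = goto(states[i], X)
            if not target:
                continue
            if target not in states:
                states.append(target)
            j = states.index(target)
            if X in TERMINALS:
                set_action(i, X, ("shift", j))        # rule 2a
            else:
                goto_table[(i, X)] = j                # rule 3
        for p, dot in states[i]:
            lhs, rhs = PRODS[p]
            if dot == len(rhs):                       # completed item
                if p == 0:
                    set_action(i, "$", ("accept",))   # rule 2c
                else:
                    for b in FOLLOW[lhs]:
                        set_action(i, b, ("reduce", p))   # rule 2b
        i += 1
    return action, goto_table, len(states)            # missing entries are errors (rule 4)


action, goto_table, n_states = build_slr_table()
print(n_states, action[(0, "id")], action[(2, "+")], action[(1, "$")])
```

Entries that are absent from the action dictionary play the role of the blank (error) entries of the table.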
Given Grammar G = {{B, A, C}, {a, b, +}, {B->a+b, A->b, C->A}, B}

Augmented Grammar G’ = {{B’, B, A, C}, {a, b, +}, {B’->B, B->a+b, A->b, C->A}, B’}

Example: Our main example gives the table below. The entry s5 abbreviates shift
and go to state 5. The entry r2 abbreviates reduce by production number 2, where
we have numbered the productions as follows.

1. E→E+T
2. E→T
3. T→T*F
4. T→F
5. F→(E)
6. F→id

      |               ACTION               |     GOTO
State |  id    +     *     (     )     $     |  E    T    F
  0   |  s5                s4                |  1    2    3
  1   |        s6                      acc   |
  2   |        r2    s7          r2    r2    |
  3   |        r4    r4          r4    r4    |
  4   |  s5                s4                |  8    2    3
  5   |        r6    r6          r6    r6    |
  6   |  s5                s4                |       9    3
  7   |  s5                s4                |            10
  8   |        s6                s11         |
  9   |        r1    s7          r1    r1    |
 10   |        r3    r3          r3    r3    |
 11   |        r5    r5          r5    r5    |

The shift actions can be read directly off the DFA.
For example I1 with a + goes to I6, I6 with an id
goes to I5, and I9 with a * goes to I7.

The reduce actions require FOLLOW. Consider I5={F→id·}. Since the dot is at the
end, we are ready to reduce, but we must check if the next symbol can follow the F
we are reducing to. Since FOLLOW(F)={+,*,),$}, in row 5 (for I5) we put r6
(for reduce by production 6) in the columns for +, *, ), and $.

The GOTO columns can also be read directly off the DFA. Since there is an E-
transition (arc labeled E) from I0 to I1, the column labeled E in row 0 contains a 1.

Since the column labeled + is blank for row 7, we see that it would be an error if
we arrived in state 7 when the next input character is +.

Finally, if we are in state 1 when the input is exhausted ($ is the next input
character), then we have successfully parsed the input.
Example: The diagram on the right shows the actions when SLR is parsing
id*id+id. On the blackboard let's do id+id*id and see how the precedence is
handled.

Given grammar:

G = { {E,T,F}, {+,*,(,),id}, {E->E+T, E->T, T->T*F, T->F, F->(E), F->id}, E}

Augmented Grammar:

G’ = { {E’,E,T,F}, {+,*,(,),id}, {E’->E, E->E+T, E->T, T->T*F, T->F, F->(E), F->id}, E’}

Non-terminal   First      Follow
E’             {(, id}    {$}
E              {(, id}    {$, +, )}
T              {(, id}    {+, *, ), $}
F              {(, id}    {+, *, ), $}
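
The FIRST and FOLLOW sets in the table above can be reproduced with the usual fixed-point iteration. The sketch below is deliberately simplified: it assumes the grammar has no ε-productions (true for this example), so FIRST of a right side is just FIRST of its first symbol. The names PRODS, first_sets and follow_sets are chosen for this illustration.

```python
PRODS = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMS = {"E'", "E", "T", "F"}


def first_sets():
    """FIRST for each nonterminal; valid only because there are no epsilon productions."""
    first = {A: set() for A in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in PRODS:
            X = rhs[0]
            new = first[X] if X in NONTERMS else {X}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(first):
    """FOLLOW for each nonterminal, again assuming no epsilon productions."""
    follow = {A: set() for A in NONTERMS}
    follow["E'"].add("$")                      # endmarker follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in PRODS:
            for k, B in enumerate(rhs):
                if B not in NONTERMS:
                    continue
                if k + 1 < len(rhs):           # B is followed by a symbol
                    nxt = rhs[k + 1]
                    new = first[nxt] if nxt in NONTERMS else {nxt}
                else:                          # B ends the RHS: inherit FOLLOW(lhs)
                    new = follow[lhs]
                if not new <= follow[B]:
                    follow[B] |= new
                    changed = True
    return follow


first = first_sets()
follow = follow_sets(first)
for A in ["E'", "E", "T", "F"]:
    print(A, sorted(first[A]), sorted(follow[A]))
```

A grammar with ε-productions would need the nullable-symbol handling that this sketch leaves out.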
      |               ACTION               |     GOTO
State |  id    +     *     (     )     $     |  E    T    F
  0   |  s5                s4                |  1    2    3
  1   |        s6                      Accept|
  2   |        r2    s7          r2    r2    |
  3   |        r4    r4          r4    r4    |
  4   |  s5                s4                |  8    2    3
  5   |        r6    r6          r6    r6    |
  6   |  s5                s4                |       9    3
  7   |  s5                s4                |            10
  8   |        s6                s11         |
  9   |        r1    s7          r1    r1    |
 10   |        r3    r3          r3    r3    |
 11   |        r5    r5          r5    r5    |

Productions (numbered as used by the reduce entries):
    E’->E
#1: E->E+T
#2: E->T
#3: T->T*F
#4: T->F
#5: F->(E)
#6: F->id
Stack                Remaining input (from the      Action
                     current input pointer)
0                    id*id+id$                      shift
0 id 5               *id+id$                        reduce by F→id
0 F 3                *id+id$                        reduce by T→F
0 T 2                *id+id$                        shift
0 T 2 * 7            id+id$                         shift
0 T 2 * 7 id 5       +id$                           reduce by F→id
0 T 2 * 7 F 10       +id$                           reduce by T→T*F
0 T 2                +id$                           reduce by E→T
0 E 1                +id$                           shift
0 E 1 + 6            id$                            shift
0 E 1 + 6 id 5       $                              reduce by F→id
0 E 1 + 6 F 3        $                              reduce by T→F
0 E 1 + 6 T 9        $                              reduce by E→E+T
0 E 1                $                              accept
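
The trace above can be reproduced with a small table-driven driver. The ACTION and GOTO entries below are copied from the SLR table for the main example; the dictionary layout, the names RULES, ACTION, GOTO and parse, and the choice to keep only states on the stack (the grammar symbols are redundant) are assumptions made for this sketch.

```python
# Production number -> (LHS, length of RHS), numbered as in the text.
RULES = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

_shift_open = {"id": "s5", "(": "s4"}


def _reduce_row(n):
    """Row that reduces by production n on +, * (if given), ) and $."""
    return {"+": f"r{n}", "*": f"r{n}", ")": f"r{n}", "$": f"r{n}"}


ACTION = {
    0: _shift_open,
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: _reduce_row(4),
    4: _shift_open,
    5: _reduce_row(6),
    6: _shift_open,
    7: _shift_open,
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: _reduce_row(3),
    11: _reduce_row(5),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}


def parse(tokens):
    """Return the list of production numbers reduced by, in order."""
    stack = [0]                                # stack of states, s_0 at the bottom
    rest = list(tokens) + ["$"]
    output = []
    while True:
        entry = ACTION[stack[-1]].get(rest[0])
        if entry is None:                      # blank table entry
            raise ValueError(f"syntax error in state {stack[-1]} on {rest[0]!r}")
        if entry == "acc":
            return output
        if entry[0] == "s":                    # shift: push the new state, consume input
            stack.append(int(entry[1:]))
            rest.pop(0)
        else:                                  # reduce by production n
            n = int(entry[1:])
            lhs, length = RULES[n]
            del stack[-length:]                # pop r states (r >= 1 in this grammar)
            stack.append(GOTO[(stack[-1], lhs)])
            output.append(n)


print(parse(["id", "*", "id", "+", "id"]))
print(parse(["id", "+", "id", "*", "id"]))
```

For id+id*id the reduction by T→T*F (production 3) happens before the reduction by E→E+T (production 1), which is how the table encodes the higher precedence of *.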
Homework: 2 (you already constructed the LR(0) automaton for this example in
the previous homework), 3, 4 (this problem refers to 4.2.2(a-g); only use 4.2.2(a-c)).

Example: What about ε-productions? Let's do


A → B D
B → b B | ε
D → d

Reducing by the ε-production actually adds a state (pops ZERO states since zero
symbols on RHS and pushes one).
