SLR Parsing
LR(0) Items
I0:
E’->.E
E->.E+T
E->.T
T->.T*F
T->.F
F->.(E)
F->.id
Viable Prefixes
Right-sentential Form
At any given point, a bottom-up (shift-reduce) parser finds the right-hand side of a
production, called a handle, and replaces the handle by the left-hand side of the production.
Handle
String                  Action
. id * id + id $        shift
id . * id + id $        reduce E->id
E . * id + id $         shift
E * . id + id $         shift
E * id . + id $         reduce E->id
E * E . + id $          reduce E->E*E
E . + id $              shift
E + . id $              shift
E + id . $              reduce E->id
E + E . $               reduce E->E+E
E . $                   reduce E’->E
E’                      Accept
Configuration of an LR parser representing the right-sentential form
X1 X2 ... Xm ai ai+1 ... an:
(s0 X1 s1 X2 s2 ... Xm sm, ai ai+1 ... an $)
ai: Current input symbol (a terminal)
sm: Current state, on top of the stack
s0: Initial state
Xi: Grammar symbols (terminals or nonterminals) on the stack
Reduce
Kernel items, which include the initial item, S’→ .S, and all
items whose dots are not at the left end.
E' → E
E → E + T | T
T → T * F | F
F → ( E ) | id
CLOSURE({E' → · E}) contains 7 elements. The 6 new elements are the 6 original
productions each with a dot right after the arrow. Make sure you understand why
all 6 original productions are added. It is not because the E'→E production is
special.
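To see why all 6 productions are added, here is a minimal Python sketch of CLOSURE. It is only an illustration: the encoding of an item as a (production index, dot position) pair and the names GRAMMAR, closure, and show are assumptions, not something from the book. E' → · E has the dot in front of E, which pulls in the E-productions; E → · T pulls in the T-productions; T → · F pulls in the F-productions.

```python
# Augmented expression grammar; each production is (head, body).
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}


def closure(items):
    """Return the LR(0) closure of a set of (production index, dot) items."""
    result = set(items)
    worklist = list(items)
    while worklist:
        prod, dot = worklist.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            # The dot is in front of a nonterminal B: add B -> . gamma
            # for every B-production not already present.
            for index, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    worklist.append((index, 0))
    return result


def show(item):
    """Format an item with a '.' marking the dot position."""
    prod, dot = item
    head, body = GRAMMAR[prod]
    return f"{head} -> {' '.join(body[:dot] + ('.',) + body[dot:])}"


I0 = closure({(0, 0)})
for item in sorted(I0):
    print(show(item))
print(len(I0))  # 7
```

Nothing in closure treats production 0 specially; the other six items appear only because each dot sits in front of a nonterminal whose productions have not been added yet.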
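Definition of GOTO: If X is a grammar symbol and I is an item set, then GOTO(I,X) is
the closure of the set of items A → αX·β such that A → α·Xβ is in I.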
I really believe this is very clear, but I understand that the formalism makes it seem
confusing. Let me begin with the idea.
We augment the grammar and get this one new production; take its closure. That is
the first element of the collection; call it I0. Try GOTOing from I0, i.e., for each
grammar symbol, consider GOTO(I0,X); each of these (almost) is another element
of the collection. Now try GOTOing from each of these new elements of the
collection, etc. This is like a friends-of-friends search: start with Jane Smith, add all
her friends F, then add the friends of everyone in F, called FF, then add all the friends
of everyone in FF, etc.
This GOTO gives exactly the arcs in the DFA I constructed earlier. The formal
treatment does not include the NFA, but works with the DFA from the beginning.
Definition: The above collection of item sets (so this is a set of sets) is called
the canonical LR(0) collection and the DFA having this collection as nodes and
the GOTO function as arcs is called the LR(0) automaton.
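The construction just described can be sketched in Python as a worklist loop. This is an illustration under assumptions: items are (production index, dot position) pairs, and the names GRAMMAR, SYMBOLS, closure, goto, and canonical_collection are hypothetical. The "(almost)" in the description shows up as two checks: an empty GOTO is skipped, and an item set that is already in the collection is not added again.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}
# The order only affects how the resulting states are numbered.
SYMBOLS = ["E", "T", "F", "(", "id", "+", "*", ")"]


def closure(items):
    """LR(0) closure of a set of (production index, dot) items."""
    result = set(items)
    worklist = list(items)
    while worklist:
        prod, dot = worklist.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for index, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    worklist.append((index, 0))
    return result


def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(prod, dot + 1) for prod, dot in items
             if dot < len(GRAMMAR[prod][1]) and GRAMMAR[prod][1][dot] == symbol}
    return frozenset(closure(moved))


def canonical_collection():
    """Return (item sets, arcs) of the LR(0) automaton."""
    states = [frozenset(closure({(0, 0)}))]  # I0
    arcs = {}
    i = 0
    while i < len(states):  # states found later are processed later
        for symbol in SYMBOLS:
            target = goto(states[i], symbol)
            if not target:
                continue  # empty GOTO: not an element of the collection
            if target not in states:
                states.append(target)
            arcs[(i, symbol)] = states.index(target)
        i += 1
    return states, arcs


states, arcs = canonical_collection()
print(len(states))     # 12
print(arcs[(0, "E")])  # 1
```

The item sets are the nodes of the LR(0) automaton and arcs holds the GOTO function; the numbering of the nodes depends only on the order in which symbols are tried.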
Homework: Construct the LR(0) automaton for the following grammar (which
produces simple postfix expressions).
S → S S + | S S * | a
Don't forget to augment the grammar.
Our main example
E' → E
E → E + T | T
T → T * F | F
F → ( E ) | id
is larger than the toy I did before. The NFA would have 2+4+2+4+2+4+2=20
states (a production with k symbols on the RHS gives k+1 N-states since there are k+1
places to place the dot). This gives rise to 12 D-states. However, the development
in the book, which we are following now, constructs the DFA directly. The
resulting diagram is on the right.
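The count of N-states can be checked mechanically. A small Python sketch (the list-of-pairs encoding of the grammar and the names GRAMMAR and items are assumptions made for illustration) that enumerates every production with every possible dot position:

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]

# A body with k symbols has k+1 dot positions: 0, 1, ..., k.
items = [(head, body, dot)
         for head, body in GRAMMAR
         for dot in range(len(body) + 1)]

print([len(body) + 1 for _, body in GRAMMAR])  # [2, 4, 2, 4, 2, 4, 2]
print(len(items))                              # 20
```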
The LR-parsing algorithm must decide when to shift and when to reduce (and in
the latter case, by which production). It does this by consulting two tables,
ACTION and GOTO. The basic algorithm is the same for all LR parsers; what
changes are the tables ACTION and GOTO.
Technical point that may, and probably should, be ignored: our GOTO was defined
on pairs [item-set,grammar-symbol]. The new GOTO is defined on pairs
[state,nonterminal]. A state is simply an item set (so nothing is new here). We will
not use the new GOTO on terminals so we just define it on nonterminals.
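The ACTION table is defined on pairs [state,terminal]. For a state i and a terminal a
(or the endmarker $), ACTION[i,a] is one of the following.
1. Shift j. The terminal a is shifted on to the stack and the parser enters state j.
2. Reduce A → α. The parser reduces α on the TOS to A.
3. Accept.
4. Error.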
So ACTION is the key to deciding shift vs. reduce. We will soon see how this
table is computed for SLR.
This formalism is useful for stating the actions of the parser precisely, but I believe
the parser can be explained without this formalism. The essential idea of the
formalism is that the entire state of the parser can be represented by the vector of
states on the stack and input symbols not yet processed.
(s0 s1 ... sm, ai ai+1 ... an $)
where the s's are states and the a's input symbols. This configuration could also be
represented by the right-sentential form
X1 ... Xm ai ... an
where Xi is the grammar symbol associated with state si, i.e., the terminal that was
shifted or the LHS of the reduction that was performed when si was pushed.
The parser consults the combined ACTION-GOTO table for its current state (TOS)
and next input symbol, formally this is ACTION[sm,ai], and proceeds based on the
value in the table. If the action is a shift, the next state is clear from the DFA. (We
have done this informally just above; here we use the formal treatment.)
1. Shift s. The input symbol a is pushed and s becomes the new state. The new
configuration is
(s0 ... sm s, ai+1 ... an $)
A Simple Example
For convenience number the productions of the grammar to make them easy to
reference. Assume that the production B → a+b is numbered 2.
The action table is defined with states (item sets) as rows and terminals and the $
endmarker as columns. GOTO has the same rows, but has nonterminals as
columns. So we construct a combined ACTION-GOTO table, with states as rows
and grammar symbols (terminals + nonterminals) as
columns.
State | a   | b   | +   | $   | A | B | C
7     |     |     |     | acc |   |   |
8     | s11 | s10 |     |     | 9 | 7 |
9     |     |     |     |     |   |   |
10    |     |     |     |     |   |   |
11    |     |     | s12 |     |   |   |
12    |     | s13 |     |     |   |   |
13    |     |     |     | r2  |   |   |
1. Each arc in the diagram labeled with a terminal indicates a shift. In the entry
with row the state at the tail of the arc and column the labeling terminal place sn,
where n is the state at the head of the arc. This indicates that if we are in the given
state and the input is the given terminal, we shift to new state n.
2. Each arc in the diagram labeled with a nonterminal informs us what state to enter
if we reduce. In the entry with row the state at the tail of the arc and column the
labeling nonterminal place n, where n is the state at the head of the arc.
3. Each completed item (dot at the extreme right) indicates a possible reduction. In
each entry with row the state containing the completed item and column a terminal
in the FOLLOW set of the LHS of the production corresponding to this item, place rn,
where n is the number of the production. (In particular, at entry [13,$] place an r2.)
4. In the entry with row (i.e., state) containing S'→S· and column $,
place accept.
5. If any entry is labelled twice (i.e., a conflict) the grammar is not SLR(1).
6. Any unlabeled entry corresponds to an input error. If the parser accesses this
entry, the input sentence is not in the language generated by the grammar.
A Terminology Point
The book (both editions) and the rest of the world seem to use GOTO for both the
function defined on item sets and the derived function on states. As a result we will
be defining GOTO in terms of GOTO. Item sets are denoted by I or Ij, etc. States
are denoted by s or si or i. Indeed both books use i in this section. The advantage is
that on the stack we placed integers (i.e., i's) so this is consistent. The disadvantage
is that we are defining GOTO(i,A) in terms of GOTO(Ii,A), which looks confusing.
Actually, we view the old GOTO as a function and the new one as an array
(mathematically, they are the same), so we write GOTO(Ii,A) for the old one and
GOTO[i,A] for the new one.
Parsing Table Construction Algorithm
We start with an augmented grammar (i.e., we added
S' → S).
1. Construct {I0,...,In}, the canonical collection of sets of LR(0) items.
2. Determine the parsing actions for state i from Ii.
   a. If A→α·bβ is in Ii for b a terminal, then ACTION[i,b] = shift j, where GOTO(Ii,b) = Ij.
   b. If A→α· is in Ii, for A≠S', then, for all b in FOLLOW(A), ACTION[i,b] = reduce A→α.
   c. If S'→S· is in Ii, then ACTION[i,$] = accept.
   d. If any conflicts occurred, the grammar is not SLR(1).
3. If GOTO(Ii,A)=Ij, for a nonterminal A, then GOTO[i,A]=j.
4. All entries not yet defined are error.
5. The initial state is the one containing S'→·S.
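Steps 2 through 4 can be sketched in Python for the expression grammar (E → E+T | T, T → T*F | F, F → (E) | id, productions numbered 1 to 6 in that order). To keep the sketch short I assume step 1 has already been done and write the automaton out by hand: ARCS holds GOTO on item sets, and COMPLETED lists, for each state, the productions whose item has the dot at the right end (0 stands for E'→E). The FOLLOW sets are also written out by hand. All of these names are hypothetical.

```python
NONTERMINALS = {"E", "T", "F"}
HEADS = {1: "E", 2: "E", 3: "T", 4: "T", 5: "F", 6: "F"}
FOLLOW = {
    "E": {"+", ")", "$"},
    "T": {"+", "*", ")", "$"},
    "F": {"+", "*", ")", "$"},
}
# Assumed LR(0) automaton: (state, grammar symbol) -> state.
ARCS = {
    (0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (0, "("): 4, (0, "id"): 5,
    (1, "+"): 6, (2, "*"): 7,
    (4, "E"): 8, (4, "T"): 2, (4, "F"): 3, (4, "("): 4, (4, "id"): 5,
    (6, "T"): 9, (6, "F"): 3, (6, "("): 4, (6, "id"): 5,
    (7, "F"): 10, (7, "("): 4, (7, "id"): 5,
    (8, "+"): 6, (8, ")"): 11, (9, "*"): 7,
}
# state -> productions with a completed item in that state.
COMPLETED = {1: [0], 2: [2], 3: [4], 5: [6], 9: [1], 10: [3], 11: [5]}


def build_slr_table():
    """Fill ACTION and GOTO; raise ValueError on an SLR(1) conflict."""
    action, goto_table = {}, {}

    def set_action(state, terminal, value):
        # Step 2d: two different values for one entry is a conflict.
        if action.get((state, terminal), value) != value:
            raise ValueError(f"conflict at [{state},{terminal}]: not SLR(1)")
        action[(state, terminal)] = value

    for (i, symbol), j in ARCS.items():
        if symbol in NONTERMINALS:
            goto_table[(i, symbol)] = j       # step 3
        else:
            set_action(i, symbol, f"s{j}")    # step 2a
    for i, productions in COMPLETED.items():
        for number in productions:
            if number == 0:
                set_action(i, "$", "acc")     # step 2c
            else:
                for terminal in FOLLOW[HEADS[number]]:
                    set_action(i, terminal, f"r{number}")  # step 2b
    return action, goto_table


action, goto_table = build_slr_table()
print(action[(0, "id")])      # s5
print(action[(5, "+")])       # r6
print((7, "+") in action)     # False: step 4, an undefined entry is an error
```

Errors are represented simply by missing keys, which is one possible reading of step 4, not the only one.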
Given Grammar G = {{B, A, C},{a, b, +},{B->a+b,
A->b, C->A}, B}
The reduce actions require FOLLOW. Consider I5={F→id·}. Since the dot is at the
end, we are ready to reduce, but we must check if the next symbol can follow the F
we are reducing to. Since FOLLOW(F)={+,*,),$}, in row 5 (for I5) we put r6
(for reduce by production 6) in the columns for +, *, ), and $.
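The FOLLOW set used here can be checked with a short fixed-point computation. This Python sketch assumes the grammar has no ε-productions (true for our expression grammar), which lets FIRST of a body be just FIRST of its first symbol; the names GRAMMAR, first_sets, and follow_sets are hypothetical.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}


def first_sets():
    """FIRST for each nonterminal, assuming no epsilon-productions."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            x = body[0]
            new = first[x] if x in NONTERMINALS else {x}
            if not new <= first[head]:
                first[head] |= new
                changed = True
    return first


def follow_sets():
    """FOLLOW for each nonterminal, assuming no epsilon-productions."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow["E'"].add("$")  # the endmarker follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            for i, x in enumerate(body):
                if x not in NONTERMINALS:
                    continue
                if i + 1 < len(body):
                    # x is followed by a symbol: add FIRST of that symbol.
                    nxt = body[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    # x ends the body: whatever follows head follows x.
                    new = follow[head]
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow


follow = follow_sets()
print(sorted(follow["F"]))  # ['$', ')', '*', '+']
```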
The GOTO columns can also be read directly off the DFA. Since there is an E-
transition (arc labeled E) from I0 to I1, the column labeled E in row 0 contains a 1.
Since the column labeled + is blank for row 7, we see that it would be an error if
we arrived in state 7 when the next input character is +.
Finally, if we are in state 1 when the input is exhausted ($ is the next input
character), then we have successfully parsed the input.
Example: The diagram on the right shows the actions when SLR is parsing
id*id+id. On the blackboard let's do id+id*id and see how the precedence is
handled.
Given grammar:
E’->E,
#1: E->E+T,
#2: E->T,
#3: T->T*F,
#4: T->F,
#5: F->(E),
#6: F->id
Stack                  Remaining Input (after Current Input Pointer)   Action
0                      id*id+id$                                       shift
0 id 5                 *id+id$                                         reduce by F→id
0 F 3                  *id+id$                                         reduce by T→F
0 T 2                  *id+id$                                         shift
0 T 2 * 7              id+id$                                          shift
0 T 2 * 7 id 5         +id$                                            reduce by F→id
0 T 2 * 7 F 10         +id$                                            reduce by T→T*F
0 T 2                  +id$                                            reduce by E→T
0 E 1                  +id$                                            shift
0 E 1 + 6              id$                                             shift
0 E 1 + 6 id 5         $                                               reduce by F→id
0 E 1 + 6 F 3          $                                               reduce by T→F
0 E 1 + 6 T 9          $                                               reduce by E→E+T
0 E 1                  $                                               accept
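The trace above can be reproduced with a small table-driven driver. In this Python sketch the ACTION and GOTO entries are the usual SLR table for this grammar, typed in by hand; I am assuming it matches the diagram, and its state numbers do agree with the stack column of the trace. The names PRODUCTIONS, ACTION, GOTO, and parse are hypothetical. Only states are kept on the stack, since the grammar symbols are implied by them.

```python
PRODUCTIONS = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]),
    4: ("T", ["F"]),
    5: ("F", ["(", "E", ")"]),
    6: ("F", ["id"]),
}
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {
    (0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
    (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
    (6, "T"): 9, (6, "F"): 3,
    (7, "F"): 10,
}


def parse(tokens):
    """Run the LR driver; return the list of actions taken."""
    tokens = list(tokens) + ["$"]
    stack = [0]  # stack of states, initial state at the bottom
    pos = 0
    trace = []
    while True:
        entry = ACTION[stack[-1]].get(tokens[pos])
        if entry is None:  # blank table entry
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {stack[-1]}")
        if entry == "acc":
            trace.append("accept")
            return trace
        number = int(entry[1:])
        if entry[0] == "s":
            stack.append(number)  # shift: push the new state, consume input
            pos += 1
            trace.append("shift")
        else:
            head, body = PRODUCTIONS[number]
            del stack[len(stack) - len(body):]        # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], head)])     # then consult GOTO
            trace.append(f"reduce by {head} -> {' '.join(body)}")


for step in parse("id * id + id".split()):
    print(step)
```

Running parse on id + id * id instead shows how precedence is handled: in state 9 with * as the next input the table says shift rather than reduce by E→E+T, so the reduction by T→T*F happens first.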
Homework: 2 (you already constructed the LR(0) automaton for this example in
the previous homework), 3, 4 (this problem refers to 4.2.2(a-g); only use 4.2.2(a-c)).
Reducing by an ε-production actually adds a state: it pops ZERO states (since there are
zero symbols on the RHS) and pushes one.
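The zero-pop case is easy to get wrong in code. A minimal Python sketch of the stack part of a reduce step (the function name reduce_step and the state numbers in the usage lines are hypothetical, chosen only for illustration):

```python
def reduce_step(stack, body_length, goto_state):
    """Pop one state per RHS symbol, then push the state given by GOTO."""
    # Caution: stack[:-body_length] would be wrong when body_length == 0,
    # because stack[:-0] is stack[:0], i.e., the empty list.
    del stack[len(stack) - body_length:]
    stack.append(goto_state)
    return stack


print(reduce_step([0, 2, 7, 10], 3, 2))  # [0, 2]      three popped, one pushed
print(reduce_step([0, 2], 0, 4))         # [0, 2, 4]   none popped, one pushed
```

With an RHS of length zero the stack grows by one state, which is exactly the observation above.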