Lecture 7 - Introduction to LR Parsing: Simple LR (part 1)
Dr. Raheel Siddiqi
Compiler Construction
The most prevalent type of bottom-up parser today is based
on a concept called LR(k) parsing; the "L" is for left-to-right
scanning of the input, the "R" for constructing a rightmost
derivation in reverse, and the k for the number of input
symbols of lookahead that are used in making parsing
decisions.
This lecture introduces the basic concepts of LR parsing and
the easiest method for constructing shift-reduce parsers,
called "simple LR" (or SLR, for short).
Why LR Parsers?
LR parsers are table-driven, much like the non-recursive LL parser.
LR parsing is attractive for a variety of reasons:
1. LR parsers can be constructed to recognize virtually all programming
language constructs for which context-free grammars can be written.
2. The class of grammars that can be parsed using LR methods is a proper
superset of the class of grammars that can be parsed with predictive or LL
methods.
The principal drawback of the LR method is that it is too much work
to construct an LR parser by hand for a typical programming-language
grammar.
Items and the LR(0) Automaton
How does a shift-reduce parser know when to shift and when to
reduce?
An LR parser makes shift-reduce decisions by maintaining states to
keep track of where we are in a parse.
States represent sets of "items."
An LR(0) item (item for short) of a grammar G is a production of G
with a dot at some position of the body.
Thus, production A → XYZ yields the four items
A → ·XYZ
A → X·YZ
A → XY·Z
A → XYZ·
The production A → ε generates only one item, A → ·.
Intuitively, an item indicates how much of a production we have seen
at a given point in the parsing process.
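The item definition above can be sketched in a few lines of Python. The representation is an assumption made for illustration: an item is a `(head, body, dot)` triple, and the helper names `lr0_items` and `show` are hypothetical.

```python
def lr0_items(head, body):
    """Return every LR(0) item of the production head -> body.

    An item is modeled as (head, body, dot), where dot is the position
    of the dot inside the body (0 = left end, len(body) = right end).
    """
    body = tuple(body)
    # A body of length n has n + 1 dot positions; an empty body
    # (an epsilon production) therefore yields exactly one item.
    return [(head, body, dot) for dot in range(len(body) + 1)]


def show(item):
    """Render an item in the usual notation, with '·' marking the dot."""
    head, body, dot = item
    return f"{head} -> {' '.join(body[:dot] + ('·',) + body[dot:])}"


for item in lr0_items("A", ["X", "Y", "Z"]):
    print(show(item))

# The epsilon production A -> ε has an empty body.
print(show(lr0_items("A", [])[0]))
```

Storing only the dot position, rather than a copy of the body with a marker symbol in it, keeps items hashable and cheap to compare, which matters once they are collected into sets.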
One collection of sets of LR(0) items, called the canonical LR(0)
collection, provides the basis for constructing a deterministic finite
automaton that is used to make parsing decisions.
Such an automaton is called an LR(0) automaton.
To construct the canonical LR(0) collection for a grammar, we define
an augmented grammar and two functions, CLOSURE and GOTO.
If G is a grammar with start symbol S, then G', the augmented
grammar for G, is G with a new start symbol S' and production S' → S.
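As a rough illustration, augmenting a grammar is a one-step transformation. The sketch below assumes a grammar is stored as a dict that maps each nonterminal to a list of production bodies (tuples of symbols); that layout and the function name `augment` are choices made here, not part of the lecture.

```python
def augment(grammar, start):
    """Return (augmented_grammar, new_start) for a grammar with start symbol `start`."""
    new_start = start + "'"
    # Keep appending primes in the unlikely case the name is already taken.
    while new_start in grammar:
        new_start += "'"
    augmented = {new_start: [(start,)]}   # the single new production S' -> S
    augmented.update(grammar)             # all original productions are kept
    return augmented, new_start


G = {"S": [("a", "S"), ("b",)]}
G_aug, S_prime = augment(G, "S")
print(S_prime, "->", " ".join(G_aug[S_prime][0]))
```

The new start symbol has exactly one production, so reducing by it is an unambiguous signal that the parser should accept.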
Closure of Item Sets
If I is a set of items for a grammar G, then CLOSURE(I) is the set of
items constructed from I by the two rules:
1. Initially, add every item in I to CLOSURE(I).
2. If A → α·Bβ is in CLOSURE(I) and B → γ is a production, then add the
item B → ·γ to CLOSURE(I), if it is not already there. Apply this rule until
no more new items can be added to CLOSURE(I).
Example
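Consider the augmented expression grammar:
E' → E
E → E + T | T
T → T * F | F
F → (E) | id
If I is the set of one item {[E' → ·E]}, then CLOSURE(I) contains the set
of items I0 as shown in Figure 1:
E' → ·E
E → ·E + T
E → ·T
T → ·T * F
T → ·F
F → ·(E)
F → ·id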
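The two CLOSURE rules and the example above can be sketched as a worklist loop. This is a minimal sketch under stated assumptions: the grammar is taken to be the usual augmented expression grammar (E' → E, E → E + T | T, T → T * F | F, F → (E) | id), items are `(head, body, dot)` triples, and the names `GRAMMAR` and `closure` are illustrative.

```python
# Assumed augmented expression grammar: nonterminal -> list of bodies.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items, grammar):
    """Compute CLOSURE(items) for a set of (head, body, dot) items."""
    result = set(items)            # rule 1: every item of I is in CLOSURE(I)
    worklist = list(items)
    while worklist:                # rule 2, repeated until nothing new appears
        head, body, dot = worklist.pop()
        # Only items with the dot directly before a nonterminal B matter.
        if dot < len(body) and body[dot] in grammar:
            b = body[dot]
            for gamma in grammar[b]:
                new_item = (b, gamma, 0)       # B -> ·γ
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return frozenset(result)


I0 = closure({("E'", ("E",), 0)}, GRAMMAR)
for head, body, dot in sorted(I0):
    print(head, "->", " ".join(body[:dot] + ("·",) + body[dot:]))
```

Returning a `frozenset` lets a closed item set serve directly as a dictionary key or set member, which is convenient when item sets become the states of the automaton.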
Figure 1: LR(0) automaton for the expression grammar
We divide all the sets of items of interest into two
classes:
1. Kernel items: the initial item, S' → ·S, and all items
whose dots are not at the left end.
2. Non-kernel items: all items with their dots at the
left end, except for S' → ·S.
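The split into kernel and non-kernel items is easy to express as a predicate. The sketch below assumes items are `(head, body, dot)` triples and that the augmented start symbol is passed in by name; `is_kernel` and `split_items` are hypothetical helper names, and the item set shown is the closure of E' → ·E for the usual expression grammar.

```python
def is_kernel(item, start_symbol="S'"):
    """True for the initial item S' -> ·S and for items whose dot is not at the left end."""
    head, body, dot = item
    return dot > 0 or head == start_symbol


def split_items(items, start_symbol="S'"):
    """Partition a set of items into (kernel, non_kernel)."""
    kernel = {item for item in items if is_kernel(item, start_symbol)}
    return kernel, set(items) - kernel


# Assumed example: the initial item set of the expression grammar.
I0 = {
    ("E'", ("E",), 0),
    ("E", ("E", "+", "T"), 0),
    ("E", ("T",), 0),
    ("T", ("T", "*", "F"), 0),
    ("T", ("F",), 0),
    ("F", ("(", "E", ")"), 0),
    ("F", ("id",), 0),
}
kernel, non_kernel = split_items(I0, start_symbol="E'")
print(len(kernel), "kernel item,", len(non_kernel), "non-kernel items")
```

Because every non-kernel item can be regenerated by applying CLOSURE to the kernel items, an implementation can store only the kernel of each set to save space.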
The Function GOTO
The second useful function is GOTO(I, X) where I is a set of items and
X is a grammar symbol.
GOTO(I, X) is defined to be the closure of the set of all items
[A → αX·β] such that [A → α·Xβ] is in I.
Intuitively, the GOTO function is used to define the transitions in the
LR(0) automaton for a grammar.
Example
If I is the set of two items {[E' → E·], [E → E·+T]}, then GOTO(I, +)
contains the items
E → E + ·T
T → ·T * F
T → ·F
F → ·(E)
F → ·id
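The GOTO example above can be checked with a short sketch. It assumes the usual augmented expression grammar and the `(head, body, dot)` item representation; `GRAMMAR`, `closure`, and `goto` are names chosen for this illustration.

```python
# Assumed augmented expression grammar: nonterminal -> list of bodies.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items, grammar):
    """CLOSURE: add B -> ·γ for every nonterminal B that follows a dot."""
    result, worklist = set(items), list(items)
    while worklist:
        head, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in grammar:
            for gamma in grammar[body[dot]]:
                new_item = (body[dot], gamma, 0)
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return frozenset(result)


def goto(items, symbol, grammar):
    """GOTO(I, X): move the dot over X wherever possible, then take the closure."""
    moved = {
        (head, body, dot + 1)
        for head, body, dot in items
        if dot < len(body) and body[dot] == symbol
    }
    return closure(moved, grammar)


I = {("E'", ("E",), 1), ("E", ("E", "+", "T"), 1)}
for head, body, dot in sorted(goto(I, "+", GRAMMAR)):
    print(head, "->", " ".join(body[:dot] + ("·",) + body[dot:]))
```

When no item in I has the dot before X, `goto` returns an empty set, which corresponds to the automaton having no transition on X from that state.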
Use of the LR(0) Automaton
How can LR(0) automata help with shift-reduce decisions?
Shift-reduce decisions can be made as follows.
Suppose that the string γ of grammar symbols takes the LR(0) automaton
from the start state 0 to some state j.
Then, shift on the next input symbol a if state j has a transition on a.
Otherwise, we choose to reduce; the items in state j will tell us which
production to use.
Figure 2 illustrates the actions of a shift-reduce parser on input id *
id, using the LR(0) automaton in Figure 1.
We use a stack to hold states; for clarity, the grammar symbols
corresponding to the states on the stack appear in column SYMBOLS.
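The shift-reduce rule described above can be sketched as a small driver that keeps a stack of states and computes transitions on demand. This is only an illustration under several assumptions: the grammar is the usual augmented expression grammar, states are represented directly as item sets rather than numbered, the input ends with an end marker `$`, and the names `closure`, `goto`, and `parse` are hypothetical. The simple "shift if possible, otherwise reduce" rule is enough for this grammar and input; in general, choosing between actions needs the parsing tables of the SLR method.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
START = "E'"


def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        head, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for gamma in GRAMMAR[body[dot]]:
                new_item = (body[dot], gamma, 0)
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return frozenset(result)


def goto(items, symbol):
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == symbol})


def parse(tokens):
    """Parse tokens (ending in '$') and return the reductions in the order made."""
    stack = [closure({(START, GRAMMAR[START][0], 0)})]   # start state
    reductions, pos = [], 0
    while True:
        state, a = stack[-1], tokens[pos]
        target = goto(state, a)
        if target:                        # transition on a exists: shift
            stack.append(target)
            pos += 1
            continue
        # No transition: reduce by the completed item in this state.
        complete = [(h, b) for h, b, d in state if d == len(b)]
        if len(complete) != 1:
            raise SyntaxError(f"no unique reduction at token {a!r}")
        head, body = complete[0]
        if head == START:                 # reducing by E' -> E means accept
            if a == "$":
                return reductions
            raise SyntaxError(f"unexpected token {a!r}")
        del stack[len(stack) - len(body):]        # pop one state per body symbol
        stack.append(goto(stack[-1], head))       # then follow the transition on head
        reductions.append(f"{head} -> {' '.join(body)}")


print(parse(["id", "*", "id", "$"]))
# ['F -> id', 'T -> F', 'F -> id', 'T -> T * F', 'E -> T']
```

Read from last to first, the printed reductions form a rightmost derivation of id * id, which is the "R" in LR.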