Unit 3

Unit - III

Chapter 4
Syntax Analysis
Bottom – Up Parsing :
Shift Reduce Parsing and
Operator Precedence Parsing

Partha Sarathi Chakraborty


Assistant Professor
Department of Computer Science and Engineering
SRM University, Delhi – NCR Campus

Outline
• Bottom-Up Parsing
– Shift Reduce Parsing
– Operator Precedence Parsing.
– LR parsers: (next Presentation)
• Simple LR (SLR)
• Canonical LR
• Lookahead LR (LALR)

(C) 2014, Prepared by Partha Sarathi Chakraborty

Bottom-Up Parsing
• Start at the leaves and grow toward root.
• We can think of the process as reducing the input
string to the start symbol.
• At each reduction step a particular substring matching
the right side of a production is replaced by the
symbol on the left side of the production.
• Bottom-up parsers handle a large class of grammars.

Bottom-Up Parsing
• A general style of bottom-up syntax analysis is known as
shift-reduce parsing.
• The main actions are shift and reduce.
• At each shift action, the current symbol in the input
string is pushed onto a stack.
• At each reduction step, the symbols at the top of the
stack (this symbol sequence is the right side of a
production) are replaced by the non-terminal on the
left side of that production.
• There are also two more actions: accept and error.

Shift – Reduce Parsing


• “Shift-Reduce” Parsing
• Reduce a string to the start symbol of the grammar.
• At every step a particular substring is matched (in left-to-right
fashion) to the right side of some production and then it is
substituted by the non-terminal on the left-hand side of the
production.

Consider the grammar:
S → aABe
A → Abc | b
B → d

Reductions (the rightmost derivation in reverse order):
abbcde
aAbcde
aAde
aABe
S

Rightmost Derivation:
S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
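The reduction sequence for abbcde can be replayed mechanically. The sketch below is only an illustration: the function name prune_handles and the (position, left side, right side) step format are assumptions made here, not part of the slides, and the handle positions are supplied by hand rather than discovered by a parser.

```python
# Each step is (start index of the handle, left-hand side, right-hand side).
# The positions are chosen by hand to follow the slide's reduction order.
STEPS = [
    (1, "A", "b"),     # a[b]bcde -> aAbcde
    (1, "A", "Abc"),   # a[Abc]de -> aAde
    (2, "B", "d"),     # aA[d]e   -> aABe
    (0, "S", "aABe"),  # [aABe]   -> S
]


def prune_handles(sentence, steps):
    """Return every sentential form visited while reducing `sentence`."""
    forms = [sentence]
    for pos, lhs, rhs in steps:
        current = forms[-1]
        if current[pos:pos + len(rhs)] != rhs:
            raise ValueError(f"{rhs!r} is not at position {pos} of {current!r}")
        forms.append(current[:pos] + lhs + current[pos + len(rhs):])
    return forms


forms = prune_handles("abbcde", STEPS)
print(forms)
```

Reading the returned list backwards gives the rightmost derivation of abbcde from S.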

Shift-Reduce Parsing

Grammar:
S → aABe
A → Abc | b
B → d

Reducing a sentence (each reduced substring matches a production's right-hand side):
abbcde
aAbcde
aAde
aABe
S

Shift-reduce corresponds to a rightmost derivation:
S ⇒rm a A B e
  ⇒rm a A d e
  ⇒rm a A b c d e
  ⇒rm a b b c d e

[Figure: the parse tree over the leaves a b b c d e, growing bottom-up after each reduction]

Handles
A handle is a substring of grammar symbols in a right-sentential
form that matches a right-hand side of a production.

Grammar:
S → aABe
A → Abc | b
B → d

Reducing the handle at each step:
abbcde
aAbcde
aAde
aABe
S

A matching substring is not always a handle:
abbcde
aAbcde
aAAcde
…?
Here the b in aAbcde was reduced by A → b. It is NOT a handle, because
further reductions will fail (the result is not a sentential form).

Handles
• A handle of a right-sentential form γ (≡ αβω) is a
production rule A → β and a position of γ where the string β
may be found and replaced by A to produce the previous
right-sentential form in a rightmost derivation of γ.
S ⇒*rm αAω ⇒rm αβω
i.e. A → β is a handle of αβω at the location immediately after
the end of α.
• If the grammar is unambiguous, then every right-sentential
form of the grammar has exactly one handle.
• ω is a string of terminals.

Handle Pruning
• The process of discovering a handle and reducing it to
the appropriate left-hand side is called handle
pruning. Handle pruning forms the basis for a
bottom-up parsing method.
• To construct a rightmost derivation
S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = ω (input string)
the shift-reduce parser makes the reductions in reverse order, from γn back to γ0.

Shift – Reduce Parser


• There are four possible actions of a shift-reduce
parser:
– Shift: The next input symbol is shifted onto the top of the
stack.
– Reduce: Replace the handle on the top of the stack by the
non-terminal.
– Accept: Successful completion of parsing.
– Error: Parser discovers a syntax error, and calls an error
recovery routine.

Stack Implementation of Shift – Reduce Parser

• Initial State
STACK    INPUT
$        w$
• Final State
STACK    INPUT
$S       $

Stack Implementation of
Shift-Reduce Parsing

Grammar:
E → E + E
E → E * E
E → ( E )
E → id

Stack      Input        Action
$          id+id*id$    shift
$id        +id*id$      reduce E → id
$E         +id*id$      shift
$E+        id*id$       shift
$E+id      *id$         reduce E → id
$E+E       *id$         shift (or reduce?)
$E+E*      id$          shift
$E+E*id    $            reduce E → id
$E+E*E     $            reduce E → E * E
$E+E       $            reduce E → E + E
$E         $            accept

Find handles to reduce. How to resolve conflicts?

Justifies the use of stack in shift – reduce parsing
• The handle will always appear on top of the
stack, never inside.
• Consider the possible forms of two successive
steps in any rightmost derivation.
• These steps can be of the form:
1. S ⇒*rm αAz ⇒rm αβByz ⇒rm αβγyz
2. S ⇒*rm αBxAz ⇒rm αBxyz ⇒rm αγxyz

Justifies the use of stack in shift – reduce parsing (contd…)
• Let us consider case (1) in reverse, where a shift –
reduce parser has just reached the configuration
STACK     INPUT
$αβγ      yz$     handle identified
$αβB      yz$     reduce
$αβBy     z$      handle identified
$αA       z$      reduce

Justifies the use of stack in shift – reduce parsing (contd…)
• Case (2) configuration,
STACK     INPUT
$αγ       xyz$    handle identified
$αB       xyz$    reduce
$αBx      yz$     shift
$αBxy     z$      handle identified
$αBxA     z$      reduce
Note: The parser never had to go into the stack to find the handle. It is this
aspect of handle pruning that makes a stack a particularly
convenient data structure for implementing a shift-reduce parser.

Conflicts
• Shift-reduce and reduce-reduce conflicts are
caused by
– The limitations of the LR parsing method (even
when the grammar is unambiguous)
– Ambiguity of the grammar

Shift-Reduce Parsing:
Shift-Reduce Conflicts

Ambiguous grammar:
S → if E then S
  | if E then S else S
  | other

Stack                 Input      Action
$…                    …$         …
$…if E then S         else…$     shift or reduce?

Resolve in favor of shift, so else matches closest if

Shift-Reduce Parsing:
Reduce-Reduce Conflicts

Grammar:
C → A B
A → a
B → a

Stack    Input    Action
$        aa$      shift
$a       a$       reduce A → a or B → a ?

Resolve in favor of reduce A → a, otherwise we’re stuck!

Operator – Precedence Parsing


• Operator grammars are a small but important class of
grammars.
• We can easily construct an efficient operator precedence
parser (a shift-reduce parser) for an operator grammar.
• Properties: In an operator grammar, no production rule
can have:
– ε at the right side
– two adjacent non-terminals at the right side.
• Example
E → E + E | E * E | ( E ) | −E | id

Operator – Precedence Parsing


• Disadvantages
– It is hard to handle tokens like the minus sign, which
has two different precedences (depending on
whether it is unary or binary).
– Worse, since the relationship between a grammar
for the language being parsed and the operator-precedence
parser itself is tenuous, one cannot
always be sure the parser accepts exactly the
desired language.
– Only a small class of grammars can be parsed.

Operator – Precedence Parsing


• In operator-precedence parsing, we define three
disjoint precedence relations between certain pairs
of terminals.
• Relations    Meaning
a ⋖ b          b has higher precedence than a
a ≐ b          b has same precedence as a
a ⋗ b          b has lower precedence than a
• There are two common ways of determining what
precedence relations should hold between a pair of
terminals.

Operator – Precedence Parsing


• Two common ways:
– The first method is intuitive and is based on the traditional
notions of associativity and precedence of operators.
(Unary minus causes a problem).
• Example: If * has higher precedence than +, then the
relationships are + ⋖ * and * ⋗ +
– The second method of selecting operator-precedence
relations is first to construct an unambiguous grammar
for the language, a grammar that reflects the correct
associativity and precedence in its parse trees.
• Example: the dangling else grammar.

Using Operator – Precedence Relations


• The intention of the precedence relations is to
find the handle of a right-sentential form,
with ⋖ marking the left end,
≐ appearing in the interior of the handle, and
⋗ marking the right end.
• In our input string $ β0 a1 β1 a2 ... βn-1 an βn $, where each ai is a
terminal and each βi is ε or a single nonterminal, we
insert the precedence relation between the
pairs of terminals (the precedence relation
that holds between the terminals in that pair).

Using Operator – Precedence Relations


• Consider the grammar
E → E + E | E * E | id
• Operator – Precedence Relations
       id    +     *     $
id           ⋗     ⋗     ⋗
+      ⋖     ⋗     ⋖     ⋗
*      ⋖     ⋗     ⋗     ⋗
$      ⋖     ⋖     ⋖
• Then the input string id + id * id with the precedence
relations inserted will be:
$ ⋖ id ⋗ + ⋖ id ⋗ * ⋖ id ⋗ $

Using Operator – Precedence Relations: To find handle
• The handle can be found by the following process:
1. Scan the string from the left end until the first ⋗ is encountered.
2. Then scan backwards (to the left) over any ≐ until a ⋖ is
encountered.
3. The handle contains everything to the left of the first ⋗ and to the
right of the ⋖ encountered in step (2), including any
intervening or surrounding non-terminals.

String with relations                     Reduction      Sentential form
$ ⋖ id ⋗ + ⋖ id ⋗ * ⋖ id ⋗ $              E → id         $ id + id * id $
$ ⋖ + ⋖ id ⋗ * ⋖ id ⋗ $                   E → id         $ E + id * id $
$ ⋖ + ⋖ * ⋖ id ⋗ $                        E → id         $ E + E * id $
$ ⋖ + ⋖ * ⋗ $                             E → E * E      $ E + E * E $
$ ⋖ + ⋗ $                                 E → E + E      $ E + E $
$ $                                                      $ E $

Using Operator – Precedence Relations: LEADING and TRAILING
• Consider the grammar
E → E + T | T
T → T * F | F
F → ( E ) | id
• LEADING(A) and TRAILING(A) for each
nonterminal A are defined by
– LEADING(A) = { a | A ⇒+ γaδ, where γ is ε or a single
nonterminal }
– TRAILING(A) = { a | A ⇒+ γaδ, where δ is ε or a single
nonterminal }

LEADING and TRAILING Algorithm


• LEADING(A) Algorithm
– a is in LEADING(A) if there is a production of the form
A → γaδ, where γ is ε or a single nonterminal.
– If a is in LEADING(B), and there is a production of the
form A → Bα, then a is in LEADING(A).
• TRAILING(A) Algorithm
– a is in TRAILING(A) if there is a production of the form
A → γaδ, where δ is ε or a single nonterminal.
– If a is in TRAILING(B), and there is a production of the
form A → αB, then a is in TRAILING(A).

LEADING and TRAILING


• The LEADING and TRAILING terminals for the
previous grammar.
NONTERMINAL    LEADING            TRAILING
E              * , + , ( , id     * , + , ) , id
T              * , ( , id         * , ) , id
F              ( , id             ) , id
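The two LEADING/TRAILING rules can be run as a fixed-point iteration. The following is a minimal sketch, assuming the grammar is stored as a dictionary of right-hand sides and contains no ε-productions (as required of an operator grammar); the names GRAMMAR, seed and leading_trailing are illustrative only.

```python
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}


def leading_trailing(grammar):
    nonterminals = set(grammar)
    leading = {a: set() for a in grammar}
    trailing = {a: set() for a in grammar}

    def seed(rhs):
        # First terminal of rhs, provided at most one nonterminal precedes it.
        for sym in rhs[:2]:
            if sym not in nonterminals:
                return {sym}
        return set()

    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for a, productions in grammar.items():
            for rhs in productions:
                new_lead = seed(rhs)
                new_trail = seed(rhs[::-1])  # same rule, read right to left
                if rhs[0] in nonterminals:   # A -> B alpha
                    new_lead |= leading[rhs[0]]
                if rhs[-1] in nonterminals:  # A -> alpha B
                    new_trail |= trailing[rhs[-1]]
                if not new_lead <= leading[a] or not new_trail <= trailing[a]:
                    leading[a] |= new_lead
                    trailing[a] |= new_trail
                    changed = True
    return leading, trailing


leading, trailing = leading_trailing(GRAMMAR)
```

For the expression grammar the computed sets agree with the LEADING and TRAILING table on the slide.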

Computing Operator - Precedence Relations


Input: An operator grammar G.
Output: The relations ⋖ , ≐ , and ⋗ for G.
Method:
1. Compute LEADING(A) and TRAILING(A) for each
nonterminal A.
2. Execute the algorithm below, examining each position of the right
side of each production.
3. Set $ ⋖ a for all a in LEADING(S) and set b ⋗ $ for all b
in TRAILING(S), where S is the start symbol of G.

for each production A → X1X2…Xn do
  for i ≔ 1 to n – 1 do
  begin
    if Xi and Xi+1 are both terminals then set Xi ≐ Xi+1 ;
    if i ≤ (n − 2) and Xi and Xi+2 are terminals
       and Xi+1 is a nonterminal then
      set Xi ≐ Xi+2 ;
    if Xi is a terminal and Xi+1 is a nonterminal then
      for all a in LEADING(Xi+1) do set Xi ⋖ a ;
    if Xi is a nonterminal and Xi+1 is a terminal then
      for all a in TRAILING(Xi) do set a ⋗ Xi+1 ;
  end
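A direct transcription of this method is sketched below for the grammar E → E + T | T, T → T * F | F, F → ( E ) | id. The LEADING and TRAILING sets are copied from the earlier slide rather than recomputed, and the function name precedence_relations and the dictionary layout are assumptions of this sketch.

```python
LEADING = {"E": {"*", "+", "(", "id"}, "T": {"*", "(", "id"}, "F": {"(", "id"}}
TRAILING = {"E": {"*", "+", ")", "id"}, "T": {"*", ")", "id"}, "F": {")", "id"}}
PRODUCTIONS = [
    ["E", "+", "T"], ["T"],
    ["T", "*", "F"], ["F"],
    ["(", "E", ")"], ["id"],
]


def precedence_relations(productions, leading, trailing, start):
    nonterminals = set(leading)
    rel = {}

    def put(a, b, r):
        # An operator-precedence grammar allows at most one relation per pair.
        if rel.setdefault((a, b), r) != r:
            raise ValueError(f"two relations between {a} and {b}")

    for rhs in productions:
        for i in range(len(rhs) - 1):
            x, y = rhs[i], rhs[i + 1]
            if x not in nonterminals and y not in nonterminals:
                put(x, y, "≐")
            if (i + 2 < len(rhs) and x not in nonterminals
                    and y in nonterminals and rhs[i + 2] not in nonterminals):
                put(x, rhs[i + 2], "≐")
            if x not in nonterminals and y in nonterminals:
                for a in leading[y]:
                    put(x, a, "⋖")
            if x in nonterminals and y not in nonterminals:
                for a in trailing[x]:
                    put(a, y, "⋗")
    for a in leading[start]:   # step 3 of the method
        put("$", a, "⋖")
    for b in trailing[start]:
        put(b, "$", "⋗")
    return rel


rel = precedence_relations(PRODUCTIONS, LEADING, TRAILING, "E")
```

Pairs that never receive a relation, such as (id, id), are simply absent from the dictionary; they correspond to the blank (error) entries of the table.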

Operator – Precedence Relations


The grammar
E → E + T | T
T → T * F | F
F → ( E ) | id

Operator-precedence relations (rows: left part or f(); columns: right part or g()):
       +     *     (     )     id    $
+      ⋗     ⋖     ⋖     ⋗     ⋖     ⋗
*      ⋗     ⋗     ⋖     ⋗     ⋖     ⋗
(      ⋖     ⋖     ⋖     ≐     ⋖
)      ⋗     ⋗           ⋗           ⋗
id     ⋗     ⋗           ⋗           ⋗
$      ⋖     ⋖     ⋖           ⋖

Operator – Precedence Parsing Algorithm


Input: An input string w and a table of precedence relations.
Output: If w is well formed, a skeletal parse tree, with a
placeholder nonterminal E labeling all interior nodes; otherwise,
an error indication.
Method: Initially, the stack contains $ and the input buffer the
string w$.

set ip to point to the first symbol of w$ ;
repeat forever
  if $ is on top of the stack and ip points to $ then
    accept and return
  else
  begin
    let a be the topmost terminal symbol on the stack
    and let b be the current symbol pointed to by ip ;
    if a ⋖ b or a ≐ b then /* shift */
    begin
      shift/push b onto the stack ;
      advance ip to the next input symbol ;
    end
    else if a ⋗ b then /* reduce */
      repeat
        pop the stack
      until the top stack terminal is related by ⋖
      to the terminal most recently popped
    else call error recovery routine or error()
  end
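The parsing loop can be tried out with a small sketch. It assumes the relation table for E → E + E | E * E | id given earlier, keeps only terminals on the stack (nonterminals are left implicit), and raises SyntaxError instead of calling an error recovery routine; REL and parse are names chosen for this illustration.

```python
# Relation table for the terminals id, +, * and $ (missing pair = error).
REL = {
    "id": {"+": "⋗", "*": "⋗", "$": "⋗"},
    "+": {"id": "⋖", "+": "⋗", "*": "⋖", "$": "⋗"},
    "*": {"id": "⋖", "+": "⋗", "*": "⋗", "$": "⋗"},
    "$": {"id": "⋖", "+": "⋖", "*": "⋖"},
}


def parse(tokens):
    """Return the list of moves made on `tokens`, or raise SyntaxError."""
    stack = ["$"]                 # terminals only
    buf = list(tokens) + ["$"]
    ip = 0
    actions = []
    while True:
        a, b = stack[-1], buf[ip]
        if a == "$" and b == "$":
            actions.append("accept")
            return actions
        r = REL[a].get(b)
        if r in ("⋖", "≐"):       # shift
            stack.append(b)
            ip += 1
            actions.append(f"shift {b}")
        elif r == "⋗":            # reduce: pop back to the matching ⋖
            popped = stack.pop()
            handle = [popped]
            while REL[stack[-1]].get(popped) != "⋖":
                popped = stack.pop()
                handle.append(popped)
            actions.append("reduce " + " ".join(reversed(handle)))
        else:
            raise SyntaxError(f"no precedence relation between {a} and {b}")


moves = parse(["id", "+", "id", "*", "id"])
```

Each "reduce" entry records only the terminals of the handle, which is enough to tell E → id, E → E * E and E → E + E apart in this grammar.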

Moves made by Operator – Precedence Parsing

Precedence relations used:
       id    +     *     $
id           ⋗     ⋗     ⋗
+      ⋖     ⋗     ⋖     ⋗
*      ⋖     ⋗     ⋗     ⋗
$      ⋖     ⋖     ⋖

STACK       Precedence   INPUT            ACTION
$           ⋖            id + id * id$    shift
$ id        ⋗            + id * id$       reduce E → id
$           ⋖            + id * id$       shift
$ +         ⋖            id * id$         shift
$ + id      ⋗            * id$            reduce E → id
$ +         ⋖            * id$            shift
$ + *       ⋖            id$              shift
$ + * id    ⋗            $                reduce E → id
$ + *       ⋗            $                reduce E → E * E
$ +         ⋗            $                reduce E → E + E
$                        $                Accept

Actions of Operator – Precedence Parsing


input string: id + id

Panel   Stack      Input
(a)     $          id + id $
(b)     $ id       + id $
(c)     $ ●        + id $
(d)     $ ● +      id $
(e)     $ ● + id   $
(f)     $ ● + ●    $
(g)     $ ●        $

[Figure: panels (a) to (g) also show the partial skeletal parse tree after each move; ● marks a placeholder nonterminal node above an id leaf or above the subtree ● + ●]

Operator-Precedence Relations from Associativity and Precedence
Consider the grammar for arithmetic expressions
E → E + E | E − E | E * E | E / E | E ↑ E | ( E ) | −E | id
Note: The grammar is ambiguous, and right-sentential
forms could have many handles.
1. If operator θ1 has higher precedence than operator θ2,
make
θ1 ⋗ θ2 and θ2 ⋖ θ1
Example: * and +, input: E + E * E + E
Operator-Precedence Relations from
Associativity and Precedence (Contd…)
2. If operator θ1 and operator θ2 have equal precedence, and
they are left-associative, make θ1 ⋗ θ2 and θ2 ⋗ θ1
they are right-associative, make θ1 ⋖ θ2 and θ2 ⋖ θ1
Example: left-associative input: E − E + E
+ ⋗ + , + ⋗ − , − ⋗ − , and − ⋗ +
right-associative ↑ ⋖ ↑
3. For all operators θ,
θ ⋖ id , id ⋗ θ , θ ⋖ ( , ( ⋖ θ ,
) ⋗ θ , θ ⋗ ) , θ ⋗ $, and $ ⋖ θ
Also, let
( ≐ )      $ ⋖ (      $ ⋖ id
( ⋖ (      id ⋗ $     ) ⋗ $
( ⋖ id     id ⋗ )     ) ⋗ )

Operator-Precedence Relations from


Associativity and Precedence (Contd…)
Consider the grammar
E → E + E | E − E | E * E | E / E | E ↑ E | ( E ) | −E | id
Assuming,
• ↑ is of highest precedence and right-associative,
• * and / are of next highest precedence and left-associative, and
• + and − are of lowest precedence and left-associative
Input: id * ( id ↑ id ) − id / id
Try the input string with the table in the next slide.

Operator-Precedence Relations from


Associativity and Precedence (Contd…)
       +     −     *     /     ↑     id    (     )     $
+      ⋗     ⋗     ⋖     ⋖     ⋖     ⋖     ⋖     ⋗     ⋗
−      ⋗     ⋗     ⋖     ⋖     ⋖     ⋖     ⋖     ⋗     ⋗
*      ⋗     ⋗     ⋗     ⋗     ⋖     ⋖     ⋖     ⋗     ⋗
/      ⋗     ⋗     ⋗     ⋗     ⋖     ⋖     ⋖     ⋗     ⋗
↑      ⋗     ⋗     ⋗     ⋗     ⋖     ⋖     ⋖     ⋗     ⋗
id     ⋗     ⋗     ⋗     ⋗     ⋗                 ⋗     ⋗
(      ⋖     ⋖     ⋖     ⋖     ⋖     ⋖     ⋖     ≐     error
)      ⋗     ⋗     ⋗     ⋗     ⋗                 ⋗     ⋗
$      ⋖     ⋖     ⋖     ⋖     ⋖     ⋖     ⋖

Handling Unary Operators


• Operator-precedence parsing cannot handle the unary minus
when we also have the binary minus in our grammar.
• The best approach to this problem is to let the lexical
analyzer handle it.
– The lexical analyzer will return two different tokens for the unary minus
and the binary minus.
– The lexical analyzer will need a lookahead to distinguish the binary
minus from the unary minus.
• Then, we make
θ ⋖ unary-minus for any operator θ
unary-minus ⋗ θ if unary-minus has higher precedence than θ
unary-minus ⋖ θ if unary-minus has lower (or equal)
precedence than θ

Precedence Functions
• Compilers using operator precedence parsers do not need to
store the table of precedence relations.
• The table can be encoded by two precedence functions f and g
that map terminal symbols to integers.
• For symbols a and b,
(1) f(a) < g(b) whenever a ⋖ b
(2) f(a) = g(b) whenever a ≐ b
(3) f(a) > g(b) whenever a ⋗ b
• The precedence relation between a and b can be determined by
a numerical comparison between f(a) and g(b).
• Note: Error entries in the precedence matrix are obscured,
since one of (1), (2), or (3) holds no matter what f(a) and g(b)
are.

Precedence Functions
Consider the grammar
E → E + E | E − E | E * E | E / E | E ↑ E | ( E ) | −E | id

Precedence Functions
       +    −    *    /    ↑    (    )    id   $
f      2    2    4    4    4    0    6    6    0
g      1    1    3    3    5    5    0    5    0

For example:
* ⋖ id, and f(*) < g(id)
Note: f(id) > g(id) suggests that id ⋗ id;
in fact no precedence relation holds between id and id.
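Looking up a relation through f and g is a plain integer comparison. In this sketch the values are copied from the table above; the keys "-" and "^" are ASCII stand-ins for − and ↑, and the function name relation is an assumption.

```python
# f and g from the slide; "-" and "^" stand in for the minus and arrow symbols.
F = {"+": 2, "-": 2, "*": 4, "/": 4, "^": 4, "(": 0, ")": 6, "id": 6, "$": 0}
G = {"+": 1, "-": 1, "*": 3, "/": 3, "^": 5, "(": 5, ")": 0, "id": 5, "$": 0}


def relation(a, b):
    """Relation between stack terminal a and input terminal b implied by f, g."""
    if F[a] < G[b]:
        return "⋖"
    if F[a] > G[b]:
        return "⋗"
    return "≐"
```

Because some comparison always succeeds, relation("id", "id") still returns ⋗ even though the matrix has an error entry there, which is the loss of error detection mentioned in the note.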

Constructing Precedence Functions


Input: An operator precedence matrix.
Output: Precedence functions representing the input matrix,
or an indication that none exist.

Method:
1. Create symbols fa and ga for each a that is a terminal or $.
2. Partition the created symbols into as many groups as
possible, in such a way that if a ≐ b, then fa and gb are in the
same group.
3. Create a directed graph whose nodes are the groups
found in (2). For any a and b,
• If a ⋖ b, place an edge from the group of gb to the group of fa.
• If a ⋗ b, place an edge from the group of fa to that of gb.
4. If the graph constructed in (3) has a cycle, then no
precedence functions exist. If there are no cycles, let
f(a) be the length of the longest path beginning at the
group of fa; let g(a) be the length of the longest path
beginning at the group of ga.
Example

[Figure: graph representing precedence functions]

       +    *    id   $
f      2    4    4    0
g      1    3    5    0
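The four construction steps can be sketched as follows, using the relation matrix for id, +, * and $ from the earlier slides. The function name precedence_functions, the (kind, terminal) node encoding and the simple parent-pointer grouping are choices made for this illustration.

```python
TABLE = {
    "id": {"+": "⋗", "*": "⋗", "$": "⋗"},
    "+": {"id": "⋖", "+": "⋗", "*": "⋖", "$": "⋗"},
    "*": {"id": "⋖", "+": "⋗", "*": "⋗", "$": "⋗"},
    "$": {"id": "⋖", "+": "⋖", "*": "⋖"},
}


def precedence_functions(rel, terminals):
    # Step 1: one node per f_a and g_a; each starts as its own group.
    group = {(kind, a): (kind, a) for kind in "fg" for a in terminals}

    def find(node):
        while group[node] != node:
            node = group[node]
        return node

    # Step 2: f_a and g_b share a group whenever a ≐ b.
    for (a, b), r in rel.items():
        if r == "≐":
            group[find(("f", a))] = find(("g", b))

    # Step 3: edges between groups.
    edges = {}
    for (a, b), r in rel.items():
        fa, gb = find(("f", a)), find(("g", b))
        if r == "⋖":
            edges.setdefault(gb, set()).add(fa)
        elif r == "⋗":
            edges.setdefault(fa, set()).add(gb)

    # Step 4: longest path from each group; a cycle means no functions exist.
    state, length = {}, {}

    def longest(node):
        if state.get(node) == "active":
            raise ValueError("cycle found: no precedence functions exist")
        if node not in length:
            state[node] = "active"
            length[node] = max(
                (1 + longest(m) for m in edges.get(node, ())), default=0)
            state[node] = "done"
        return length[node]

    f = {a: longest(find(("f", a))) for a in terminals}
    g = {a: longest(find(("g", a))) for a in terminals}
    return f, g


rel = {(a, b): r for a, row in TABLE.items() for b, r in row.items()}
f, g = precedence_functions(rel, list(TABLE))
```

For this matrix the result reproduces the f and g rows of the example table.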

Error Recovery in Operator-Precedence Parsing
• The parser can discover syntactic errors:
– If no precedence relation holds between the
terminal on top of the stack and the current
input.
– If a handle has been found, but there is no
production with this handle as a right side.

Handling Errors during Reductions


Consider the grammar
E → E + E | E − E | E * E | E / E | E ↑ E | ( E ) | −E | id
• The error checker for reductions need only check that the proper
set of nonterminal markers appears among the terminal strings
being reduced.
• Specifically, the checker does the following:
– If + , − , * , / , or ↑ is reduced, it checks that nonterminals appear on both
sides. If not, it issues the diagnostic
missing operand
– If id is reduced, it checks that there is no nonterminal to the right or left. If
there is, it can warn
missing operator
– If ( ) is reduced, it checks that there is a nonterminal between the
parentheses. If not, it can say
no expression between parentheses
Handling Shift/Reduce Errors: Error handling routines

       id    (     )     $
id     e3    e3    ⋗     ⋗
(      ⋖     ⋖     ≐     e4
)      e3    e3    ⋗     ⋗
$      ⋖     ⋖     e2    e1

• e1: /* called when whole expression is missing */
– insert id onto the input
– issue diagnostic: “missing operand”
• e2: /* called when expression begins with a right parenthesis */
– delete ) from the input
– issue diagnostic: “unbalanced right parenthesis”
• e3: /* called when id or ) is followed by id or ( */
– insert + onto the input
– issue diagnostic: “missing operator”
• e4: /* called when expression ends with a left parenthesis */
– pop ( from stack
– issue diagnostic: “missing right parenthesis”
Error Handling Mechanism
Erroneous input: id id ) ( $ Original String: ( id + id )

STACK     Precedence   INPUT         ACTION
$         ⋖            id id )($     shift
$ id      blank        id )($        error, e3: missing operator
$ id                   + id )($      insert ‘+’ in INPUT
$ id      ⋗            + id )($      reduce
$         ⋖            + id )($      shift
$ +       ⋖            id )($        shift
$ + id    ⋗            )($           reduce
$ +       ⋗            )($           reduce
$         blank        )($           error, e2: unbalanced right parenthesis
$                      ($            delete ‘)’ from the INPUT
$         ⋖            ($            shift
$ (       blank        $             error, e4: missing right parenthesis
$                      $             pop ‘(’ from STACK
$                      $             accept

Unit - II
Chapter 4
Syntax Analysis
Bottom – Up Parsing : LR parsers
Partha Sarathi Chakraborty


Assistant Professor
Department of Computer Science and Engineering
SRM University, Delhi – NCR Campus

Outline
• Bottom-Up Parsing
– LR parsers:
• Simple LR (SLR)
• Canonical LR
• Lookahead LR (LALR)

LR Parsers
• An efficient bottom-up syntax analysis technique
that can be used to parse a large class of context-free grammars.
• The technique is called LR(k) parsing; ‘L’ is
for left-to-right scanning of the input, ‘R’ for
constructing a rightmost derivation in reverse,
and the k for the number of input symbols of
lookahead that are used in making parsing
decisions.
• When (k) is omitted, k is assumed to be 1.

LR Parsers: Attractive
• LR parsing is attractive for a variety of reasons.
– LR parsers can be constructed to recognize virtually all
programming language constructs for which context-free
grammars can be written.
– The LR-parsing method is the most general nonbacktracking
shift-reduce parsing method known, yet it can be
implemented as efficiently as other, more primitive
shift-reduce methods.
– An LR parser can detect a syntactic error as soon as it is
possible to do so on a left-to-right scan of the input.
– The class of grammars that can be parsed using LR methods
is a proper superset of the class of grammars that can be
parsed with predictive or LL methods.

LR Parsers: Drawback
• The principal drawback of the LR method is that it is
too much work to construct an LR parser by hand for a
typical programming-language grammar.
• A specialized tool, an LR parser generator, is needed.
• YACC: Such a generator takes a context-free grammar
and automatically produces a parser for that grammar.
– If the grammar contains ambiguities or other constructs that
are difficult to parse in a left-to-right scan of the input, then
the parser generator locates these constructs and provides
detailed diagnostic messages.

Three Techniques for creating LR Parsing Table
• Simple LR (in short SLR) – It is the easiest to
implement, but the least powerful, as it may fail to
produce a parsing table for certain grammars.
• Canonical LR – It is the most powerful and the most
expensive.
• Lookahead LR (in short LALR) – It is intermediate
in power and cost between the other two. It will work on
most programming-language grammars and, with
some effort, can be implemented efficiently.
• Powerful: Canonical LR > LALR > SLR

LR(0) Items of a Grammar


• An LR(0) item of a grammar G is a production of G
with a • at some position of the right-hand side
• Thus, a production
A → X Y Z
has four items:
[A → • X Y Z]
[A → X • Y Z]
[A → X Y • Z]
[A → X Y Z •]
• Note that production A → ε has one item [A → •]
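Enumerating the items of a production amounts to placing the dot at every position. A minimal sketch, in which an item is represented as a (left side, symbols before the dot, symbols after the dot) tuple; this representation and the names lr0_items and show are assumptions of the example.

```python
def lr0_items(lhs, rhs):
    """All LR(0) items of the production lhs -> rhs (rhs is a sequence)."""
    rhs = tuple(rhs)
    return [(lhs, rhs[:i], rhs[i:]) for i in range(len(rhs) + 1)]


def show(item):
    lhs, before, after = item
    return f"[{lhs} → {' '.join(before + ('•',) + after)}]"


for item in lr0_items("A", ["X", "Y", "Z"]):
    print(show(item))
```

An empty right-hand side, as in A → ε, yields the single item [A → •].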

Constructing the set of LR(0) Items of a Grammar
• If G is a grammar with start symbol S, then G’, the
augmented grammar for G, is G with a new start
symbol S’ and production S’ → S.
The purpose of the new starting production is to indicate to the parser when it should
stop parsing and announce acceptance of the input, i.e. acceptance occurs when
and only when the parser is about to reduce by S’ → S.
• The Closure Operation
Example
• Consider the Grammar G,
E → E + T | T
T → T * F | F
F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }
FOLLOW(E) = { $ , ) , + }
FOLLOW(T) = { * , $ , ) , + }
FOLLOW(F) = { * , $ , ) , + }
• Create augmented expression grammar G’
E’ → E
E → E + T | T
T → T * F | F
F → ( E ) | id
• If I is the set of one item {[E’ → •E]}, then closure(I) contains
the items E’ → •E, E → •E + T, E → •T, T → •T * F, T → •F, F → •( E ), F → •id.
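The closure and goto operations for LR(0) items can be sketched as below. The slides' own definitions were on figures that did not survive, so this follows the standard construction: closure repeatedly adds B → •γ for every item with the dot in front of a nonterminal B, and goto moves the dot over one symbol and closes the result. Items are (left side, right side, dot position) tuples, a representation chosen for this sketch.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items, grammar):
    result = list(items)              # a list keeps the order items are added
    for lhs, rhs, dot in result:      # the list grows while it is scanned
        if dot < len(rhs) and rhs[dot] in grammar:
            for production in grammar[rhs[dot]]:
                item = (rhs[dot], production, 0)
                if item not in result:
                    result.append(item)
    return result


def goto(items, symbol, grammar):
    moved = [(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol]
    return closure(moved, grammar)


I0 = closure([("E'", ("E",), 0)], GRAMMAR)
I1 = goto(I0, "E", GRAMMAR)
```

Here I0 holds the seven items listed for closure({[E’ → •E]}), and I1 holds E’ → E• and E → E• + T.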

The Closure Operation (Example)

Final Closure I0 Items

The Goto Operation for LR(0)

The Sets-of-Items Construction

Kernel and Nonkernel Items

LR(0) collection for Grammar G (goto transitions)
goto(I0, E) = I1     goto(I0, T) = I2     goto(I0, F) = I3
goto(I0, ( ) = I4    goto(I0, id) = I5
goto(I1, +) = I6
goto(I2, *) = I7
goto(I4, E) = I8     goto(I4, F) = I3
goto(I4, ( ) = I4    goto(I4, id) = I5
goto(I6, T) = I9     goto(I6, F) = I3
goto(I6, ( ) = I4    goto(I6, id) = I5
goto(I7, F) = I10    goto(I7, ( ) = I4    goto(I7, id) = I5
goto(I8, ) ) = I11   goto(I8, +) = I6
goto(I9, *) = I7
Transition diagram for the grammar G representing the Goto operation

Transition diagram of DFA D for viable prefixes.

Constructing SLR parser table

Parsing table SLR(1) for expression grammar G


Parse table Action and Goto function


• The parsing table shows the parsing Action and Goto functions for the
following grammar G, repeated here with the productions numbered.

Model of an LR Parser

LR Parsing Algorithm

Example of LR parsing

Moves of LR parser on id * id + id
Another Example – SLR parsing
• Consider the following grammar G
(1) S → S ( S )
(2) S → ε
• Augmented Grammar G’
S’ → S
S → S ( S )
S → ε
• FIRST(S) = { ( , ε }
• FOLLOW(S) = { $ , ( , ) }

Construction of LR(0) items: Closure Set I
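The FIRST and FOLLOW sets for S → S ( S ) | ε can be checked with a small fixed-point computation. This is a generic textbook-style sketch; the function name first_follow and the use of an empty list for the ε-production are assumptions made here.

```python
EPS = "ε"
GRAMMAR = {"S": [["S", "(", "S", ")"], []]}   # [] encodes S -> ε


def first_follow(grammar, start):
    first = {a: set() for a in grammar}
    follow = {a: set() for a in grammar}
    follow[start].add("$")

    def first_of(symbols):
        """FIRST of a string of grammar symbols, using the sets found so far."""
        out = set()
        for s in symbols:
            if s not in grammar:          # terminal
                out.add(s)
                return out
            out |= first[s] - {EPS}
            if EPS not in first[s]:
                return out
        out.add(EPS)                      # every symbol can vanish
        return out

    changed = True
    while changed:
        changed = False
        for a, productions in grammar.items():
            for rhs in productions:
                before = len(first[a])
                first[a] |= first_of(rhs)
                changed |= len(first[a]) != before
                for i, s in enumerate(rhs):
                    if s in grammar:
                        tail = first_of(rhs[i + 1:])
                        new = tail - {EPS}
                        if EPS in tail:
                            new |= follow[a]
                        if not new <= follow[s]:
                            follow[s] |= new
                            changed = True
    return first, follow


first, follow = first_follow(GRAMMAR, "S")
```

The computed sets match the FIRST(S) and FOLLOW(S) stated on the slide.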

SLR(1) Parsing Table and Parsing

Now, the moves of LR parser on parsing the input string ( ) ( )



LL vs. LR Grammars
• For a grammar to be LR(k), we must be able to
recognize the occurrence of the right side of a
production in a right-sentential form, with k input
symbols of lookahead.
• This requirement is far less stringent than that for
LL(k) grammars where we must be able to recognize
the use of a production seeing only the first k symbols
of what its right side derives.
• Thus, it should not be surprising that LR grammars
can describe more languages than LL grammars.

Unambiguous grammars that are not SLR(1)
• Every SLR(1) grammar is unambiguous, but
there are many unambiguous grammars that
are not SLR(1).
• Consider the grammar X with productions:
S → L = R | R
L → * R | id
R → L
FIRST(S) = FIRST(L) = FIRST(R) = { * , id }
FOLLOW(S) = { $ } , FOLLOW(L) = { = , $ }
FOLLOW(R) = { $ , = }

Canonical LR(0) collection for Grammar X (goto transitions)
goto(I0, S) = I1     goto(I0, L) = I2     goto(I0, R) = I3
goto(I0, *) = I4     goto(I0, id) = I5
goto(I2, =) = I6
goto(I4, R) = I7     goto(I4, L) = I8
goto(I4, *) = I4     goto(I4, id) = I5
goto(I6, R) = I9     goto(I6, L) = I8
goto(I6, *) = I4     goto(I6, id) = I5
Parsing table SLR(1) for grammar X shows S-R conflict

Grammar X,
(1) S → L = R
(2) S → R
(3) L → * R
(4) L → id
(5) R → L

Conflict: in state I2, on input =, the table calls for both a shift to I6 (from S → L • = R)
and a reduce by R → L (because = is in FOLLOW(R)).

Grammar X is not ambiguous. This shift/reduce conflict arises from the fact that the
SLR parser construction method is not powerful enough to remember enough left
context to decide what action the parser should take on input =, having seen a string
reducible to L.

LR(1) Grammars

• SLR is too simple
• LR(1) parsing uses lookahead to avoid
unnecessary conflicts in the parsing table
• LR(1) item = LR(0) item + lookahead
LR(0) item: [A → α•β]
LR(1) item: [A → α•β, a]

SLR Versus LR(1)


• Split the SLR states by
adding LR(1) lookahead
• Unambiguous grammar
S → L = R | R
L → * R | id
R → L

LR(1) Items
• An LR(1) item
[A → α•β, a]
contains a lookahead terminal a, meaning α is already
on top of the stack, expect to see βa
• For items of the form
[A → α•, a]
the lookahead a is used to reduce A → α only if the
next input is a
• For items of the form
[A → α•β, a]
with β ≠ ε the lookahead has no effect

Constructing LR(1) Sets of Items
• Unambiguous LR(1) grammar:
(1) S → L = R
(2) S → R
(3) L → * R
(4) L → id
(5) R → L
• Augment with S’ → S
• LR(1) items (next slide) have the form
[A → α•Bβ , a]
where A → α•Bβ is the core (first component) and a is the lookahead (second component).
For such an item the closure adds [B → •γ , b] for every production B → γ and every b in FIRST(βa).
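The role of FIRST(βa) in the LR(1) closure can be seen by computing I0 for grammar X. This is a sketch under two stated simplifications: the FIRST sets are copied from the earlier slide, and because grammar X has no ε-productions, FIRST of a string is taken to be FIRST of its first symbol. Items are (left side, right side, dot position, lookahead) tuples, a representation chosen here.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}
# FIRST sets of the nonterminals, as stated on the slide for grammar X.
FIRST = {"S": {"*", "id"}, "L": {"*", "id"}, "R": {"*", "id"}}


def first_of(symbols):
    # Valid only because grammar X has no ε-productions.
    head = symbols[0]
    return FIRST.get(head, {head})


def closure(items):
    result = list(items)
    for lhs, rhs, dot, lookahead in result:   # the list grows while scanned
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            beta_a = rhs[dot + 1:] + (lookahead,)
            for b in sorted(first_of(beta_a)):
                for production in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], production, 0, b)
                    if item not in result:
                        result.append(item)
    return result


I0 = closure([("S'", ("S",), 0, "$")])
```

The items L → •* R and L → •id appear twice in I0, once with lookahead = (from S → •L = R, $) and once with lookahead $ (from R → •L, $).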

Construct Closure I0
[Figure: closure I0 built item by item, showing the core, the lookahead, and FIRST(βa) for each step, then I0 rewritten in compact form]

Constructing LR(1) Sets of Items for grammar X

Construction of canonical-LR parsing tables.

Canonical LR(1) parsing table for grammar X


LALR(1) Grammars
• LR(1) parsing tables have many states
• LALR(1) parsing (Look-Ahead LR) combines LR(1)
states to reduce table size
• Less powerful than LR(1)
– Will not introduce shift-reduce conflicts, because shifts do
not use lookaheads
– May introduce reduce-reduce conflicts, but seldom does so
for grammars of programming languages

Merging of states with a common core may produce a
reduce-reduce conflict, but not a shift-reduce conflict

Constructing LALR(1) Parsing Tables
1. Construct sets of LR(1) items
2. Combine LR(1) sets with sets of items that share
the same first component or core, I4 = I11
I4:  [L → *•R, =]
     [R → •L, =]
     [L → •*R, =]
     [L → •id, =]
I11: [L → *•R, $]
     [R → •L, $]
     [L → •*R, $]
     [L → •id, $]
Merged (=/$ is shorthand for two items in the same set):
     [L → *•R, =/$]
     [R → •L, =/$]
     [L → •*R, =/$]
     [L → •id, =/$]
Similarly, the following closures share the same core or first component.
I5 = I12 , I7 = I13 , I8 = I10
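Merging by core can be sketched in a few lines. The two states below are I4 and I11 as listed above; the tuple representation (left side, right side, dot position, lookahead) and the names core, merge_by_core and lr1_state are assumptions of this illustration.

```python
def core(state):
    """The set of LR(0) items obtained by dropping the lookaheads."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)


def merge_by_core(states):
    merged = {}
    for name, items in states.items():
        names, union = merged.setdefault(core(items), ([], set()))
        names.append(name)
        union.update(items)
    return {tuple(names): union for names, union in merged.values()}


def lr1_state(lookahead):
    # The four items shared by I4 and I11, with the given lookahead.
    return {
        ("L", ("*", "R"), 1, lookahead),
        ("R", ("L",), 0, lookahead),
        ("L", ("*", "R"), 0, lookahead),
        ("L", ("id",), 0, lookahead),
    }


lalr = merge_by_core({"I4": lr1_state("="), "I11": lr1_state("$")})
```

The single merged state carries eight LR(1) items, that is, the four cores each with lookaheads = and $.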

LALR parsing table construction

LALR(1) parsing table for grammar X


Analysis
• The LR and LALR parsers will mimic one
another on correct inputs.
• When presented with erroneous input, the
LALR parser may proceed to do some
reductions after the LR parser has declared an
error. However, the LALR parser will never
shift another symbol after the LR parser
declares an error.

Another Example – canonical LR and LALR
• Consider the following grammar G
(1) S → S ( S )
(2) S → ε
• Augmented Grammar G’
S’ → S
S → S ( S )
S → ε
• LR(1) items (next slide)

LR(1) items for grammar G

Canonical LR parsing table


LALR(1) Parsing Table


• The following closures share the same core or first
component of LR(1) items. I4 = I7 , I3 = I6 , I2 = I5
• Therefore, union of states results in
I4 ∪ I7 , I3 ∪ I6 , I2 ∪ I5

Parsing canonical LR

LALR parsing


Another Example – canonical LR and LALR
• Consider the following grammar G
(1) S → CC
(2) C → cC
(3) C → d
FIRST(S) = FIRST(C) = { c , d }
FOLLOW(S) = { $ }
FOLLOW(C) = { $ , c , d }
• Augmented Grammar G’
S’ → S
S → CC
C → cC
C → d
• LR(1) items (next slide)

LR(1) items for grammar G

Canonical LR parsing table


LALR(1) Parsing Table


• The following closures share the same core or first component of
LR(1) items. I3 = I6 , I4 = I7 , I8 = I9
• Therefore, union of states results in
I3 ∪ I6 , I4 ∪ I7 , I8 ∪ I9

I36: C → c•C , c | d | $
     C → •cC , c | d | $
     C → •d , c | d | $

I47: C → d• , c | d | $

I89: C → cC• , c | d | $

LALR(1) Parsing Table


LL, SLR, LR, LALR Summary


• LL parse tables computed using FIRST/FOLLOW
– Nonterminals × terminals → productions
– Computed using FIRST/FOLLOW
• LR parsing tables computed using closure/goto
– LR states × terminals → shift/reduce actions
– LR states × nonterminals → goto state transitions
• A grammar is
– LL(1) if its LL(1) parse table has no conflicts
– SLR if its SLR parse table has no conflicts
– LALR(1) if its LALR(1) parse table has no conflicts
– LR(1) if its LR(1) parse table has no conflicts

LL, SLR, LR, LALR Grammars


Assignment
Consider the following grammar G,
E → E + T | T
T → TF | F
F → F* | a | b
i. Construct the sets of LR(0) items.
ii. Construct the SLR(1) parsing table for this grammar.
iii. Construct the sets of LR(1) items.
iv. Construct Canonical LR(1) parsing table.
v. Construct the LALR parsing table.