Chp3 Syntax Analysis

The document discusses syntax analysis in compilers, including syntax trees, context-free grammars, push-down automata, and various parsing techniques like top-down parsing and bottom-up parsing. Syntax analysis recognizes the syntactic structure of a programming language and transforms tokens into a parse tree.

Uploaded by

Batool Khan
Chapter 3 Syntax Analysis

Syntax Analysis

• Syntax analysis recognizes the syntactic structure of the programming language and transforms a string of tokens into a tree of tokens and syntactic categories
• The parser is the program that performs syntax analysis
Outline

• Introduction to parsers
• Syntax trees
• Context-free grammars
• Push-down automata
• Top-down parsing
• Bison - a parser generator
• Bottom-up parsing
Introduction to Parsers

[Figure: source code → Scanner → token → Parser → syntax tree → Semantic Analyzer. The Parser asks the Scanner for the next token. A Symbol Table is drawn below the three phases.]
Syntax Trees

• A syntax tree represents the syntactic structure of tokens in a program defined by the grammar of the programming language

:=
├─ id1
└─ +
   ├─ id2
   └─ *
      ├─ id3
      └─ 60
Context-Free Grammars (CFG)

• A set of terminals: basic symbols (token types) from which strings are formed
• A set of nonterminals: syntactic categories each of which denotes a set of strings
• A set of productions: rules specifying how the terminals and nonterminals can be combined to form strings
• The start symbol: a distinguished nonterminal that denotes the whole language
An Example: Arithmetic Expressions

• Terminals: id, ‘+’, ‘-’, ‘*’, ‘/’, ‘(’, ‘)’
• Nonterminals: expr, op
• Productions:
    expr → expr op expr
    expr → ‘(’ expr ‘)’
    expr → ‘-’ expr
    expr → id
    op → ‘+’ | ‘-’ | ‘*’ | ‘/’
• Start symbol: expr
An Example: Arithmetic Expressions

Each symbol denotes a set of strings:
id denotes { id },
‘+’ denotes { + },
‘-’ denotes { - },
‘*’ denotes { * },
‘/’ denotes { / },
‘(’ denotes { ( },
‘)’ denotes { ) },
op denotes { +, -, *, / },
expr denotes { id, - id, ( id ), id + id, id - id, … }.
Derivations

• A derivation step is an application of a production as a rewriting rule, namely, replacing a nonterminal in the string by one of its right-hand sides. Given N → α:
    … N … ⇒ … α …
• Starting with the start symbol, a sequence of derivation steps is called a derivation
    S ⇒ … ⇒ α
    or S ⇒* α
An Example

Grammar:
1. expr → expr op expr
2. expr → ‘(’ expr ‘)’
3. expr → ‘-’ expr
4. expr → id
5. op → ‘+’
6. op → ‘-’
7. op → ‘*’
8. op → ‘/’

Derivation:
expr
⇒ - expr
⇒ - ( expr )
⇒ - ( expr op expr )
⇒ - ( id op expr )
⇒ - ( id + expr )
⇒ - ( id + id )
Left- & Right-Most Derivations

• If there is more than one nonterminal in the string, many choices are possible
• A leftmost derivation always chooses the leftmost nonterminal to rewrite
• A rightmost derivation always chooses the rightmost nonterminal to rewrite
An Example

Leftmost derivation:          Rightmost derivation:
expr                          expr
⇒ - expr                      ⇒ - expr
⇒ - ( expr )                  ⇒ - ( expr )
⇒ - ( expr op expr )          ⇒ - ( expr op expr )
⇒ - ( id op expr )            ⇒ - ( expr op id )
⇒ - ( id + expr )             ⇒ - ( expr + id )
⇒ - ( id + id )               ⇒ - ( id + id )
Parse Trees

• A parse tree is a graphical representation for a derivation that filters out the order of choosing nonterminals for rewriting
• Many derivations may correspond to the same parse tree, but every parse tree has associated with it a unique leftmost and a unique rightmost derivation
An Example

Leftmost derivation:          Rightmost derivation:
expr                          expr
⇒ - expr                      ⇒ - expr
⇒ - ( expr )                  ⇒ - ( expr )
⇒ - ( expr op expr )          ⇒ - ( expr op expr )
⇒ - ( id op expr )            ⇒ - ( expr op id )
⇒ - ( id + expr )             ⇒ - ( expr + id )
⇒ - ( id + id )               ⇒ - ( id + id )

Both derivations correspond to the same parse tree:
expr
├─ -
└─ expr
   ├─ (
   ├─ expr
   │  ├─ expr ─ id
   │  ├─ op ─ +
   │  └─ expr ─ id
   └─ )
Ambiguous Grammars

• A grammar is ambiguous if it can derive a string with two different parse trees
• If we use the syntactic structure of a parse tree to interpret the meaning of the string, the two parse trees have different meanings
• Since compilers do use parse trees to derive meaning, we would prefer to have unambiguous grammars
An Example

Two parse trees for id + id * id:

expr                          expr
├─ expr ─ id                  ├─ expr
├─ +                          │  ├─ expr ─ id
└─ expr                       │  ├─ +
   ├─ expr ─ id               │  └─ expr ─ id
   ├─ *                       ├─ *
   └─ expr ─ id               └─ expr ─ id
Transform Ambiguous Grammars

Ambiguous grammar:
expr → expr op expr
expr → ‘(’ expr ‘)’
expr → ‘-’ expr
expr → id
op → ‘+’ | ‘-’ | ‘*’ | ‘/’

Unambiguous grammar:
expr → expr ‘+’ term
expr → expr ‘-’ term
expr → term
term → term ‘*’ factor
term → term ‘/’ factor
term → factor
factor → ‘(’ expr ‘)’
factor → ‘-’ factor
factor → id

Not every ambiguous grammar can be transformed to an unambiguous one!
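Push-Down Automata

[Figure: a push-down automaton. A Finite Automaton reads the Input, which ends with $, works on a Stack whose bottom is marked with $, and produces the Output.]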
End-Of-File and Bottom-of-Stack Markers

• Parsers must read not only terminal symbols but also the end-of-file marker and the bottom-of-stack marker
• We will use $ to represent the end-of-file marker
• We will also use $ to represent the bottom-of-stack marker
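An Example

S → a S b
S → ε

Transitions, labeled (input symbol, top of stack) with the stack action:
start → 1
1 → 2 on (a, $): push a
2 → 2 on (a, a): push a
2 → 3 on (b, a): pop a
3 → 3 on (b, a): pop a
3 → 4 on ($, $)
1 → 4 on ($, $)

Run on the input a a b b $:
State:  1      2      2      3      3      4
Input:     a      a      b      b      $
Stack:  $      a $    a a $  a $    $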
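
The automaton above can be simulated directly. The following is a minimal sketch in C, under the assumption that the stack only ever holds copies of a above the bottom marker, so a counter can stand in for it. The function name pda_accepts is hypothetical, and the end of the C string plays the role of the input marker $.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simulates the four-state push-down automaton for { a^n b^n | n >= 0 }.
 * Sketch assumption: the stack holds only 'a' symbols above '$', so the
 * counter `depth` represents it. '\0' stands for the input marker '$'. */
bool pda_accepts(const char *input)
{
    int state = 1;      /* current state: 1, 2, or 3 (4 means accept) */
    size_t depth = 0;   /* number of 'a' symbols above '$' on the stack */

    for (const char *p = input; ; p++) {
        char c = *p;
        switch (state) {
        case 1:
            if (c == 'a') { depth++; state = 2; }          /* (a, $): push a */
            else if (c == '\0') return true;               /* ($, $): accept */
            else return false;
            break;
        case 2:
            if (c == 'a') { depth++; }                     /* (a, a): push a */
            else if (c == 'b') { depth--; state = 3; }     /* (b, a): pop a  */
            else return false;
            break;
        case 3:
            if (c == 'b' && depth > 0) { depth--; }        /* (b, a): pop a  */
            else if (c == '\0' && depth == 0) return true; /* ($, $): accept */
            else return false;
            break;
        default:
            return false;
        }
    }
}
```

For instance, pda_accepts("aabb") follows the same state sequence 1, 2, 2, 3, 3, 4 as the run shown above, while "aab" is rejected in state 3 because the stack is not empty at the end of the input.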
CFG versus RE

• Every language defined by a RE can also be defined by a CFG
• Why use REs for lexical syntax?
  – Lexical rules do not need a notation as powerful as CFGs
  – REs are more concise and easier to understand than CFGs
  – More efficient lexical analyzers can be constructed from REs than from CFGs
  – Separating lexical from syntactic rules modularizes the front end into two manageable-sized components
Nonregular Languages

• REs can denote only a fixed number of repetitions or an unspecified number of repetitions of one given construct
    a^n (for a fixed n), a*
• A nonregular language: L = { a^n b^n | n ≥ 0 }
    S → a S b
    S → ε
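Top-Down Parsing

• Construct a parse tree from the root to the leaves using leftmost derivation

S → c A B
A → a b          input: cad
A → a
B → d

[Figure: four successive parse trees for the input cad. (1) S is expanded to c A B. (2) A is expanded by A → a b, which does not match because the next input symbol is d. (3) A is expanded by A → a instead. (4) B is expanded by B → d.]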
Predictive Parsing

• Predictive parsing is top-down parsing without backtracking
• Namely, according to the next token, there is only one production to choose at each derivation step

stmt → if expr then stmt else stmt
     | while expr do stmt
     | begin stmt_list end
LL(k) Parsing

• Predictive parsing is also called LL(k) parsing
• The first L stands for scanning the input from left to right
• The second L stands for producing a leftmost derivation
• The k stands for using k lookahead input symbols to choose alternative productions at each derivation step
LL(1) Parsing

• We will only describe LL(1) parsing from now on, namely, parsing using only one lookahead input symbol
• Recursive-descent parsing: hand written or tool (e.g. PCCTS and CoCo/R) generated
• Table-driven predictive parsing: tool (e.g. LISA and LLGEN) generated
Recursive Descent Parsing

• A procedure is associated with each nonterminal of the grammar
• An alternative case in the procedure is associated with each production of that nonterminal
• A match of a token is associated with each terminal in the right-hand side of the production
• A procedure call is associated with each nonterminal in the right-hand side of the production
Recursive Descent Parsing

S → if E then S else S
  | begin L end
  | print E
L → S ; L
  | ε
E → num = num

Input: begin print num = num ; end

Parse tree:
S
├─ begin
├─ L
│  ├─ S
│  │  ├─ print
│  │  └─ E
│  │     ├─ num
│  │     ├─ =
│  │     └─ num
│  ├─ ;
│  └─ L ─ ε
└─ end
Choosing the Alternative Case

S → if E then S else S
  | begin L end
  | print E
L → S ; L          FIRST(S ; L) = {if, begin, print}
  | ε              FOLLOW(L) = {end}
E → num = num
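An Example

/* An enum (rather than const int) so the names can be used as case labels in C. */
enum { IF = 1, THEN, ELSE, BEGIN, END, PRINT, SEMI, NUM, EQ };

int yylex(void);                            /* scanner: returns the next token */
void error(void);                           /* reports a syntax error */
void S(void); void L(void); void E(void);   /* one procedure per nonterminal */

int token;   /* lookahead token; set token = yylex() before calling S() */

void match(int t)
{
    if (token == t) token = yylex(); else error();
}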
An Example

void S() {
    switch (token) {
    case IF:    match(IF); E(); match(THEN); S();
                match(ELSE); S(); break;
    case BEGIN: match(BEGIN); L();
                match(END); break;
    case PRINT: match(PRINT); E(); break;
    default:    error();
    }
}
An Example

void L() {
    switch (token) {
    case END: break;                    /* L → ε, chosen on FOLLOW(L) */
    case IF: case BEGIN: case PRINT:    /* L → S ; L, chosen on FIRST(S ; L) */
        S(); match(SEMI); L(); break;
    default: error();
    }
}
An Example

void E() {
    switch (token) {
    case NUM:
        match(NUM); match(EQ); match(NUM);
        break;
    default: error();
    }
}
First and Follow Sets

• The first set of a string α, FIRST(α), is the set of terminals that can begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α)
• The follow set of a nonterminal X, FOLLOW(X), is the set of terminals that can immediately follow X
Computing First Sets

• If X is a terminal, then FIRST(X) is {X}
• If X is a nonterminal and X → ε is a production, then add ε to FIRST(X)
• If X is a nonterminal and X → Y1 Y2 ... Yk is a production, then add a to FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1). If ε is in FIRST(Yj) for all j, then add ε to FIRST(X)
An Example

S → if E then S else S | begin L end | print E
L → S ; L | ε
E → num = num

FIRST(S) = { if, begin, print }
FIRST(L) = { if, begin, print, ε }
FIRST(E) = { num }
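
The three rules for computing first sets amount to a fixed-point iteration over the productions. A minimal sketch in C for the example grammar follows. The symbol names, the bit-mask representation of sets, and the function compute_first are assumptions made for illustration, not part of the slides.

```c
#include <stdbool.h>

/* Symbols of the example grammar. Terminals come first so that a set of
 * terminals fits in one unsigned bit mask; EPS is the extra bit for ε. */
enum { T_IF, T_THEN, T_ELSE, T_BEGIN, T_END, T_PRINT, T_SEMI, T_NUM, T_EQ,
       EPS, N_S, N_L, N_E, NSYM };
#define BIT(x) (1u << (x))
#define MAXRHS 6

struct prod { int lhs; int len; int rhs[MAXRHS]; };

static const struct prod grammar[] = {
    { N_S, 6, { T_IF, N_E, T_THEN, N_S, T_ELSE, N_S } },
    { N_S, 3, { T_BEGIN, N_L, T_END } },
    { N_S, 2, { T_PRINT, N_E } },
    { N_L, 3, { N_S, T_SEMI, N_L } },
    { N_L, 0, { 0 } },                        /* L → ε */
    { N_E, 3, { T_NUM, T_EQ, T_NUM } },
};
enum { NPROD = sizeof grammar / sizeof grammar[0] };

unsigned first[NSYM];   /* first[X] holds FIRST(X) as a bit mask */

void compute_first(void)
{
    for (int x = 0; x < NSYM; x++)
        first[x] = (x < EPS) ? BIT(x) : 0;    /* FIRST(terminal) = {terminal} */

    bool changed = true;
    while (changed) {                         /* iterate to a fixed point */
        changed = false;
        for (int p = 0; p < NPROD; p++) {
            unsigned add = 0;
            int i = 0;
            /* Add FIRST(Yi) minus ε while Y1 ... Yi-1 can all derive ε. */
            for (; i < grammar[p].len; i++) {
                unsigned f = first[grammar[p].rhs[i]];
                add |= f & ~BIT(EPS);
                if (!(f & BIT(EPS))) break;
            }
            if (i == grammar[p].len) add |= BIT(EPS);  /* whole RHS derives ε */
            unsigned old = first[grammar[p].lhs];
            first[grammar[p].lhs] |= add;
            if (first[grammar[p].lhs] != old) changed = true;
        }
    }
}
```

After compute_first() returns, first[N_L] contains the bits for if, begin, print, and ε, which matches FIRST(L) in the example.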
Computing Follow Sets

• Place $ in FOLLOW(S), where S is the start symbol and $ is the end-of-file marker
• If there is a production A → α B β, then everything in FIRST(β) except for ε is placed in FOLLOW(B)
• If there is a production A → α B, or A → α B β where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B)
An Example

S → if E then S else S | begin L end | print E
L → S ; L | ε
E → num = num

FOLLOW(S) = { $, else, ; }
FOLLOW(L) = { end }
FOLLOW(E) = { then, $, else, ; }
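
The follow-set rules can be sketched the same way. In the C sketch below, the FIRST sets are hard-coded from the earlier example rather than computed, which is an assumption made to keep the code short. All identifiers (first_of, compute_follow, the T_ and N_ names) are illustrative.

```c
#include <stdbool.h>

enum { T_IF, T_THEN, T_ELSE, T_BEGIN, T_END, T_PRINT, T_SEMI, T_NUM, T_EQ,
       T_EOF, EPS, N_S, N_L, N_E, NSYM };     /* T_EOF is the marker $ */
#define BIT(x) (1u << (x))

struct rule { int lhs; int len; int rhs[6]; };
static const struct rule grammar[] = {
    { N_S, 6, { T_IF, N_E, T_THEN, N_S, T_ELSE, N_S } },
    { N_S, 3, { T_BEGIN, N_L, T_END } },
    { N_S, 2, { T_PRINT, N_E } },
    { N_L, 3, { N_S, T_SEMI, N_L } },
    { N_L, 0, { 0 } },                        /* L → ε */
    { N_E, 3, { T_NUM, T_EQ, T_NUM } },
};
enum { NPROD = sizeof grammar / sizeof grammar[0] };

/* FIRST sets copied from the FIRST example (assumed, not recomputed). */
static unsigned first_of(int x)
{
    if (x < EPS) return BIT(x);
    if (x == N_S) return BIT(T_IF) | BIT(T_BEGIN) | BIT(T_PRINT);
    if (x == N_L) return BIT(T_IF) | BIT(T_BEGIN) | BIT(T_PRINT) | BIT(EPS);
    return BIT(T_NUM);                        /* N_E */
}

unsigned follow[NSYM];

void compute_follow(void)
{
    for (int x = 0; x < NSYM; x++) follow[x] = 0;
    follow[N_S] = BIT(T_EOF);                 /* rule 1: $ is in FOLLOW(S) */

    bool changed = true;
    while (changed) {
        changed = false;
        for (int p = 0; p < NPROD; p++) {
            const struct rule *r = &grammar[p];
            for (int i = 0; i < r->len; i++) {
                int B = r->rhs[i];
                if (B < N_S) continue;        /* only nonterminals have FOLLOW */
                unsigned add = 0;
                int j = i + 1;
                for (; j < r->len; j++) {     /* rule 2: FIRST(β) minus ε */
                    unsigned f = first_of(r->rhs[j]);
                    add |= f & ~BIT(EPS);
                    if (!(f & BIT(EPS))) break;
                }
                if (j == r->len) add |= follow[r->lhs];   /* rule 3 */
                if ((follow[B] | add) != follow[B]) {
                    follow[B] |= add;
                    changed = true;
                }
            }
        }
    }
}
```

The iteration is needed because rule 3 copies FOLLOW(A) into FOLLOW(B), and FOLLOW(A) itself may still be growing; here FOLLOW(E) only receives else and ; after FOLLOW(S) has received them.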
Table-Driven Predictive Parsing

Input. Grammar G. Output. Parsing Table M.

Method.
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.
An Example

        S                          L            E
if      S → if E then S else S     L → S ; L
then
else
begin   S → begin L end            L → S ; L
end                                L → ε
print   S → print E                L → S ; L
num                                             E → num = num
;
$
An Example

Stack                      Input
$ S                        begin print num = num ; end $
$ end L begin              begin print num = num ; end $
$ end L                    print num = num ; end $
$ end L ; S                print num = num ; end $
$ end L ; E print          print num = num ; end $
$ end L ; E                num = num ; end $
$ end L ; num = num        num = num ; end $
$ end L ;                  ; end $
$ end L                    end $
$ end                      end $
$                          $
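
The stack trace above is what a table-driven driver produces. The following C sketch shows one way such a driver could look for this grammar. The token codes, the function names table_entry and ll1_parse, and the choice to store each right-hand side reversed are assumptions of the sketch; the table is written as a function rather than an array only for brevity.

```c
#include <stdbool.h>

/* Token and nonterminal codes (illustrative); TK_EOF is the marker $. */
enum { TK_IF, TK_THEN, TK_ELSE, TK_BEGIN, TK_END, TK_PRINT, TK_SEMI,
       TK_NUM, TK_EQ, TK_EOF, NT_S, NT_L, NT_E };
#define NONE (-1)         /* empty table entry: syntax error */

/* Right-hand sides stored in reverse, so pushing them left to right
 * leaves the leftmost symbol on top of the stack. -1 ends a row. */
static const int rhs[][7] = {
    { NT_S, TK_ELSE, NT_S, TK_THEN, NT_E, TK_IF, -1 }, /* 0: S → if E then S else S */
    { TK_END, NT_L, TK_BEGIN, -1 },                    /* 1: S → begin L end */
    { NT_E, TK_PRINT, -1 },                            /* 2: S → print E */
    { NT_L, TK_SEMI, NT_S, -1 },                       /* 3: L → S ; L */
    { -1 },                                            /* 4: L → ε */
    { TK_NUM, TK_EQ, TK_NUM, -1 },                     /* 5: E → num = num */
};

/* M[A, a]: index of the production to use, or NONE. */
static int table_entry(int nonterm, int tok)
{
    switch (nonterm) {
    case NT_S:
        return tok == TK_IF ? 0 : tok == TK_BEGIN ? 1 : tok == TK_PRINT ? 2 : NONE;
    case NT_L:
        if (tok == TK_IF || tok == TK_BEGIN || tok == TK_PRINT) return 3;
        return tok == TK_END ? 4 : NONE;
    case NT_E:
        return tok == TK_NUM ? 5 : NONE;
    }
    return NONE;
}

/* Returns true iff tokens[] (terminated by TK_EOF) is derivable from S. */
bool ll1_parse(const int *tokens)
{
    int stack[256], sp = 0;
    stack[sp++] = TK_EOF;                     /* bottom-of-stack marker $ */
    stack[sp++] = NT_S;
    while (sp > 0) {
        int top = stack[--sp];
        if (top < NT_S) {                     /* terminal: must match input */
            if (top != *tokens) return false;
            if (top == TK_EOF) return true;   /* $ matches $: accept */
            tokens++;
        } else {                              /* nonterminal: expand via M */
            int p = table_entry(top, *tokens);
            if (p == NONE) return false;
            for (int i = 0; rhs[p][i] != -1; i++) {
                if (sp >= 256) return false;  /* stack limit of this sketch */
                stack[sp++] = rhs[p][i];
            }
        }
    }
    return false;
}
```

Given the tokens for begin print num = num ; end $, the stack contents after each step correspond to the rows of the trace, read with the top of the stack on the right.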
LL(1) Grammars

• A grammar is LL(1) iff its predictive parsing table has no multiply-defined entries
• A grammar G is LL(1) iff whenever A → α | β are two distinct productions of G, the following conditions hold:
  (1) FIRST(α) ∩ FIRST(β) = ∅,
  (2) If ε ∈ FIRST(α), FOLLOW(A) ∩ FIRST(β) = ∅,
  (3) If ε ∈ FIRST(β), FOLLOW(A) ∩ FIRST(α) = ∅.
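A Counter Example

S → i E t S S' | a
S' → e S | ε
E → b

Parsing table (columns a, b, e, i, t, $), non-error entries only:
M[S, a] = S → a
M[S, i] = S → i E t S S'
M[S', e] = S' → ε and S' → e S (multiply defined)
M[S', $] = S' → ε
M[E, b] = E → b

• ε ∈ FIRST(ε) and FOLLOW(S') ∩ FIRST(e S) = {e} ≠ ∅
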

Left Recursive Grammars

• A grammar is left recursive if it has a nonterminal A such that A ⇒+ A α
• Left recursive grammars are not LL(1) because
    A → A α
    A → β
  will cause FIRST(A α) ∩ FIRST(β) ≠ ∅
• Eliminating left recursion is the first step in transforming them into LL(1) grammars
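Eliminating Left Recursion

A → A α | β      becomes      A → β R
                              R → α R | ε

[Figure: parse trees of β α α α under both grammars. With A → A α | β the tree descends to the left through A; with A → β R and R → α R | ε it descends to the right through R and ends in ε.]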
Direct Left Recursion

A → A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn

A → β1 A' | β2 A' | ... | βn A'

A' → α1 A' | α2 A' | ... | αm A' | ε


An Example

E → E + T | T
T → T * F | F
F → ( E ) | id

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
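
Once left recursion is removed, the grammar can be parsed by recursive descent with one procedure per nonterminal. The sketch below is a recognizer in C under simplifying assumptions: the token id is any single letter, there is no separate scanner, and white space is not allowed. The function names are hypothetical.

```c
#include <stdbool.h>
#include <ctype.h>

static const char *cur;          /* next unread input character */

static bool parse_E(void);

static bool parse_F(void)        /* F → ( E ) | id */
{
    if (*cur == '(') {
        cur++;
        if (!parse_E() || *cur != ')') return false;
        cur++;
        return true;
    }
    if (isalpha((unsigned char)*cur)) { cur++; return true; }
    return false;
}

static bool parse_Tprime(void)   /* T' → * F T' | ε */
{
    if (*cur == '*') {
        cur++;
        return parse_F() && parse_Tprime();
    }
    return true;                 /* ε on any other lookahead */
}

static bool parse_T(void)        /* T → F T' */
{
    return parse_F() && parse_Tprime();
}

static bool parse_Eprime(void)   /* E' → + T E' | ε */
{
    if (*cur == '+') {
        cur++;
        return parse_T() && parse_Eprime();
    }
    return true;                 /* ε on any other lookahead */
}

static bool parse_E(void)        /* E → T E' */
{
    return parse_T() && parse_Eprime();
}

/* Returns true iff s is a sentence of the expression grammar. */
bool accepts_expr(const char *s)
{
    cur = s;
    return parse_E() && *cur == '\0';   /* all input must be consumed */
}
```

With the original left recursive production E → E + T, the procedure for E would call itself before consuming any input and never terminate, which is why the transformation is needed for this style of parser.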
Indirect Left Recursion

S → A a | b
A → A c | S d | ε

S ⇒ A a ⇒ S d a

A → A c | A a d | b d | ε

S → A a | b
A → b d A' | A'
A' → c A' | a d A' | ε
Left factoring

• A grammar is not LL(1) if two productions of a nonterminal A have a nontrivial common prefix. For example, if α ≠ ε, and A → α β1 | α β2, then FIRST(α β1) ∩ FIRST(α β2) ≠ ∅
• We can transform them into LL(1) by performing left factoring
    A → α A'
    A' → β1 | β2
An Example

S → i E t S | i E t S e S | a
E → b

S → i E t S S' | a
S' → e S | ε
E → b
Bottom-Up Parsing

• Construct a parse tree from the leaves to the root using rightmost derivation in reverse

S → a A B e          input: abbcde
A → A b c | b
B → d

[Figure: five snapshots of the parse tree growing upward from the leaves a b b c d e, one per reduction.]
abbcde ⇐ aAbcde ⇐ aAde ⇐ aABe ⇐ S
LR(k) Parsing

• The L stands for scanning the input from left to right
• The R stands for producing a rightmost derivation in reverse
• The k stands for using k lookahead input symbols to choose alternative productions at each derivation step
An Example

1. S' → S
2. S → if E then S else S
3. S → begin L end
4. S → print E
5. L → ε
6. L → S ; L
7. E → num = num
An Example

Stack                        Input                              Action
$                            begin print num = num ; end $      shift
$ begin                      print num = num ; end $            shift
$ begin print                num = num ; end $                  shift
$ begin print num            = num ; end $                      shift
$ begin print num =          num ; end $                        shift
$ begin print num = num      ; end $                            reduce
$ begin print E              ; end $                            reduce
$ begin S                    ; end $                            shift
$ begin S ;                  end $                              reduce
$ begin S ; L                end $                              reduce
$ begin L                    end $                              shift
$ begin L end                $                                  reduce
$ S                          $                                  accept
LL(k) versus LR(k)

• LL(k) parsing must predict which production to use after seeing only the first k tokens of the right-hand side
• LR(k) parsing is able to postpone the decision until it has seen tokens corresponding to the entire right-hand side and k more tokens beyond
• LR(k) parsing thus can handle more grammars than LL(k) parsing
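LR Parsers

[Figure: an LR parser. The parsing driver (a finite automaton) reads the Input, which ends with $, consults the Parsing table, and produces the Output. Its Stack holds grammar symbols interleaved with states (… X s1 Y s2) above the bottom marker $.]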
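LR Parsing Tables

Terminals: if then else begin end print ; num = $      Nonterminals: S L E

State  Action (terminal: entry)                   Goto (nonterminal: entry)
1      if: s3, begin: s4, print: s5               S: g2
2      $: a
3      num: s7                                    E: g6
4      if: s3, begin: s4, end: r5, print: s5      S: g9, L: g8
5      num: s7                                    E: g10
6      then: s11
7      =: s12
8      end: s13
9      ;: s14
10     else: r4, ;: r4, $: r4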
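LR Parsing Tables

Terminals: if then else begin end print ; num = $      Nonterminals: S L E

State  Action (terminal: entry)                   Goto (nonterminal: entry)
11     if: s3, begin: s4, print: s5               S: g15
12     num: s16
13     else: r3, ;: r3, $: r3
14     if: s3, begin: s4, end: r5, print: s5      S: g9, L: g17
15     else: s18
16     then: r7, else: r7, ;: r7, $: r7
17     end: r6
18     if: s3, begin: s4, print: s5               S: g19
19     else: r2, ;: r2, $: r2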
An Example

1. S' → S
2. S → if E then S else S
3. S → begin L end
4. S → print E
5. L → ε
6. L → S ; L
7. E → num = num
An Example

Stack                                    Input                              Action
$ 1                                      begin print num = num ; end $      s4
$ 1 begin 4                              print num = num ; end $            s5
$ 1 begin 4 print 5                      num = num ; end $                  s7
$ 1 begin 4 print 5 num 7                = num ; end $                      s12
$ 1 begin 4 print 5 num 7 = 12           num ; end $                        s16
$ 1 begin 4 print 5 num 7 = 12 num 16    ; end $                            r7
$ 1 begin 4 print 5 E 10                 ; end $                            r4
$ 1 begin 4 S 9                          ; end $                            s14
$ 1 begin 4 S 9 ; 14                     end $                              r5
$ 1 begin 4 S 9 ; 14 L 17                end $                              r6
$ 1 begin 4 L 8                          end $                              s13
$ 1 begin 4 L 8 end 13                   $                                  r3
$ 1 S 2                                  $                                  a
LR Parsing Driver

a = gettoken();
while (true) {
    s = top();
    if (action[s, a] == shift s') {
        push(a); push(s');
        a = gettoken();            /* read a new token only after a shift */
    }
    else if (action[s, a] == reduce A → β) {
        pop 2 * |β| symbols off the stack;
        s' = goto[top(), A]; push(A); push(s');
    }
    else if (action[s, a] == accept) { return; }
    else { error(); }
}
Bison – A Parser Generator

A language for specifying parsers and semantic analyzers

[Figure: lang.y → Bison compiler → lang.tab.c, plus lang.tab.h with the -d option; lang.tab.c → C compiler → a.out; tokens → a.out → syntax tree]

Bison Programs

%{
C declarations
%}
Bison declarations
%%
Grammar rules
%%
Additional C code
An Example

line → expr '\n'
expr → expr '+' term | term
term → term '*' factor | factor
factor → '(' expr ')' | DIGIT
An Example - expr.y

%token DIGIT
%start line
%%
line: expr '\n'
    ;
expr: expr '+' term
    | term
    ;
term: term '*' factor
    | factor
    ;
factor: '(' expr ')'
    | DIGIT
    ;
An Example - expr.y

%token NEWLINE
%token ADD
%token MUL
%token LP
%token RP
%token DIGIT
%start line
%%
line: expr NEWLINE
    ;
expr: expr ADD term
    | term
    ;
term: term MUL factor
    | factor
    ;
factor: LP expr RP
    | DIGIT
    ;
An Example - expr.tab.h

#define NEWLINE 278


#define ADD 279
#define MUL 280
#define LP 281
#define RP 282
#define DIGIT 283
Semantic Actions

The C code in braces after each rule is its semantic action.

line: expr '\n'         { printf("line: expr \\n\n"); }
    ;
expr: expr '+' term     { printf("expr: expr + term\n"); }
    | term              { printf("expr: term\n"); }
    ;
term: term '*' factor   { printf("term: term * factor\n"); }
    | factor            { printf("term: factor\n"); }
    ;
factor: '(' expr ')'    { printf("factor: ( expr )\n"); }
    | DIGIT             { printf("factor: DIGIT\n"); }
    ;
Functions

• yyparse(): the parser function
• yylex(): the lexical analyzer function. Bison recognizes any non-positive value as indicating the end of the input
Variables

• yylval: the attribute value of a token. Its default type is int. It can be declared to hold one of several types by a %union in the declarations section:
    %union {
        int ival;
        double dval;
    }
• Tokens with an attribute value can then be declared as
    %token <ival> intcon
    %token <dval> doublecon
Conflict Resolutions

• A reduce/reduce conflict is resolved by choosing the production listed first
• A shift/reduce conflict is resolved in favor of shift
• Bison also provides a mechanism for assigning precedences and associativities to terminals
Precedence and Associativity

• The precedence and associativity of operators are declared simultaneously
    %nonassoc '<'      /* lowest */
    %left '+' '-'
    %right '^'         /* highest */
• The precedence of a rule is determined by the precedence of its rightmost terminal
• The precedence of a rule can be modified by adding %prec <terminal> to its right end
An Example

%{
#include <stdio.h>
%}

%token NUMBER
%left '+' '-'
%left '*' '/'
%right UMINUS
%%
An Example

line: expr '\n'
    ;
expr: expr '+' expr
    | expr '-' expr
    | expr '*' expr
    | expr '/' expr
    | '-' expr %prec UMINUS
    | '(' expr ')'
    | NUMBER
    ;
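Error Report

• The parser can report a syntax error by calling the user-provided function yyerror(const char *)

extern int yylineno;   /* line counter kept by a Flex scanner built with %option yylineno */

void yyerror(const char *s)
{
    fprintf(stderr, "%s: line %d\n", s, yylineno);
}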
LR Parsing Table Generation

• An LR parsing table generation algorithm transforms a CFG to an LR parsing table
• SLR(1) parsing table generation
• LR(1) parsing table generation
• LALR(1) parsing table generation
From CFG to NPDA

• An LR(0) item of a grammar G is a production of G with a dot at some position of the right-hand side, A → α • β
• The production A → X Y Z yields the following four LR(0) items
    A → • X Y Z,   A → X • Y Z,
    A → X Y • Z,   A → X Y Z •
• An LR(0) item represents a state in an NPDA indicating how much of a production we have seen at a given point in the parsing process
An Example

1. E' → E
2. E → E + T
3. E → T
4. T → T * F
5. T → F
6. F → ( E )
7. F → id
An Example

[Figure: the NPDA for this grammar. Its 20 states are the LR(0) items of productions 1 to 7. A transition on a symbol X moves the dot past X, and ε-transitions lead from each item with the dot before a nonterminal to the initial items of that nonterminal.]
From NPDA to DPDA

• There are two functions performed on sets of LR(0) items (states)
• The function closure(I) adds more items to I when there is a dot to the left of a nonterminal
• The function goto(I, X) moves the dot past the symbol X in all items in I in which the dot is immediately before X
The Closure Function

closure(I) =
    repeat
        for any item A → α • X β in I
            for any production X → γ
                I = I ∪ { X → • γ }
    until I does not change
    return I
An Example

1. E' → E
2. E → E + T
3. E → T
4. T → T * F
5. T → F
6. F → ( E )
7. F → id

s1 = E' → • E,
I1 = closure({s1}) = {
    E' → • E,
    E → • E + T,
    E → • T,
    T → • T * F,
    T → • F,
    F → • ( E ),
    F → • id }
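
The closure computation can be tried out on this grammar with a small bit-set encoding. The C sketch below packs an LR(0) item (production p, dot position d) into bit 4*p + d of a 32-bit word. This encoding, the zero-based production numbering, and all identifiers are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Grammar symbols (illustrative names); nonterminals come last. */
enum { SYM_PLUS, SYM_STAR, SYM_LPAR, SYM_RPAR, SYM_ID,
       SYM_EP, SYM_E, SYM_T, SYM_F };

struct lr_prod { int lhs; int len; int rhs[3]; };
static const struct lr_prod g[7] = {
    { SYM_EP, 1, { SYM_E } },                     /* 0: E' → E     */
    { SYM_E,  3, { SYM_E, SYM_PLUS, SYM_T } },    /* 1: E → E + T  */
    { SYM_E,  1, { SYM_T } },                     /* 2: E → T      */
    { SYM_T,  3, { SYM_T, SYM_STAR, SYM_F } },    /* 3: T → T * F  */
    { SYM_T,  1, { SYM_F } },                     /* 4: T → F      */
    { SYM_F,  3, { SYM_LPAR, SYM_E, SYM_RPAR } }, /* 5: F → ( E )  */
    { SYM_F,  1, { SYM_ID } },                    /* 6: F → id     */
};

/* Item (p, d) as a one-bit set; 7 productions * 4 dot positions fit in 32 bits. */
static uint32_t item(int p, int d) { return UINT32_C(1) << (4 * p + d); }

uint32_t closure(uint32_t I)
{
    bool changed = true;
    while (changed) {                          /* repeat until I is stable */
        changed = false;
        for (int p = 0; p < 7; p++)
            for (int d = 0; d < g[p].len; d++) {
                if (!(I & item(p, d))) continue;
                int X = g[p].rhs[d];           /* symbol right after the dot */
                for (int q = 0; q < 7; q++)    /* add X → • γ for each X → γ */
                    if (g[q].lhs == X && !(I & item(q, 0))) {
                        I |= item(q, 0);
                        changed = true;
                    }
            }
    }
    return I;
}
```

Calling closure(item(0, 0)) yields the seven items of I1 listed above. When the symbol after the dot is a terminal, no production has it as left-hand side, so nothing is added.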
The Goto Function

goto(I, X) =
    set J to the empty set
    for any item A → α • X β in I
        add A → α X • β to J
    return closure(J)
An Example

I1 = {E' → • E,
      E → • E + T, E → • T,
      T → • T * F, T → • F,
      F → • ( E ), F → • id }

goto(I1, E)
= closure({E' → E •, E → E • + T })
= {E' → E •, E → E • + T }
The Subset Construction Function

subset-construction(cfg) =
    initialize T to {closure({S' → • S})}
    repeat
        for each state I in T and each symbol X
            let J be goto(I, X)
            if J is not empty and not in T then
                T = T ∪ { J }
    until T does not change
    return T
An Example

I1 : {E' → • E, E → • E + T, E → • T, T → • T * F,
      T → • F, F → • ( E ), F → • id}

goto(I1, E) = I2 : {E' → E •, E → E • + T}
goto(I1, T) = I3 : {E → T •, T → T • * F}
goto(I1, F) = I4 : {T → F •}
goto(I1, '(') = I5 : {F → ( • E ), E → • E + T, E → • T,
                      T → • T * F, T → • F, F → • ( E ), F → • id}
goto(I1, id) = I6 : {F → id •}

goto(I2, '+') = I7 : {E → E + • T, T → • T * F, T → • F,
                      F → • ( E ), F → • id}
An Example

goto(I3, '*') = I8 : {T → T * • F, F → • ( E ), F → • id}

goto(I5, E) = I9 : {F → ( E • ), E → E • + T}
goto(I5, T) = I3
goto(I5, F) = I4
goto(I5, '(') = I5
goto(I5, id) = I6

goto(I7, T) = I10 : {E → E + T •, T → T • * F}
goto(I7, F) = I4
goto(I7, '(') = I5
goto(I7, id) = I6
An Example

goto(I8, F) = I11 : {T → T * F •}
goto(I8, '(') = I5
goto(I8, id) = I6

goto(I9, ')') = I12 : {F → ( E ) •}
goto(I9, '+') = I7

goto(I10, '*') = I8
An Example

[Figure: the resulting DFA. Its states 1 to 12 hold the item sets I1 to I12, and its edges are the goto transitions on E, T, F, +, *, (, ) and id listed above.]
SLR(1) Parsing Table Generation
SLR(cfg) =
    for each state I in subset-construction(cfg)
        if A → α • a β in I and goto(I, a) = J for a terminal a then
            action[I, a] = "shift J"
        if A → α • in I and A ≠ S' then
            action[I, a] = "reduce A → α" for all a in Follow(A)
        if S' → S • in I then action[I, $] = "accept"
        if A → α • X β in I and goto(I, X) = J for a nonterminal X
            then goto[I, X] = J
    all other entries in action and goto are made error
An Example

State  +    *    (    )    id   $    | E    T    F
1                s5        s6        | g2   g3   g4
2      s7                       a    |
3      r3   s8        r3        r3   |
4      r5   r5        r5        r5   |
5                s5        s6        | g9   g3   g4
6      r7   r7        r7        r7   |
7                s5        s6        |      g10  g4
8                s5        s6        |           g11
9      s7             s12            |
10     r2   s8        r2        r2   |
11     r4   r4        r4        r4   |
12     r6   r6        r6        r6   |
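
The SLR(1) table for the expression grammar can drive a small LR parser. The C sketch below encodes the table with a convention assumed for this example: a positive entry n means shift to state n, a negative entry -p means reduce by production p, ACC means accept, and 0 means error. As a deliberate simplification, the stack holds states only, so a reduction pops |β| entries instead of the 2 * |β| of a driver that also stacks grammar symbols. All identifiers are illustrative.

```c
#include <stdbool.h>

enum { PLUS, STAR, LPAR, RPAR, ID, EOI };    /* EOI is the marker $ */
enum { NT_E, NT_T, NT_F };
#define ACC 99                               /* accept */

/* action[s][a]; row 0 is unused because states are numbered from 1. */
static const int action[13][6] = {
    /*         +   *   (   )  id    $ */
    /*  0 */ { 0,  0,  0,  0,  0,   0 },
    /*  1 */ { 0,  0,  5,  0,  6,   0 },
    /*  2 */ { 7,  0,  0,  0,  0, ACC },
    /*  3 */ {-3,  8,  0, -3,  0,  -3 },
    /*  4 */ {-5, -5,  0, -5,  0,  -5 },
    /*  5 */ { 0,  0,  5,  0,  6,   0 },
    /*  6 */ {-7, -7,  0, -7,  0,  -7 },
    /*  7 */ { 0,  0,  5,  0,  6,   0 },
    /*  8 */ { 0,  0,  5,  0,  6,   0 },
    /*  9 */ { 7,  0,  0, 12,  0,   0 },
    /* 10 */ {-2,  8,  0, -2,  0,  -2 },
    /* 11 */ {-4, -4,  0, -4,  0,  -4 },
    /* 12 */ {-6, -6,  0, -6,  0,  -6 },
};
static const int go_to[13][3] = {            /* columns E, T, F */
    {0, 0, 0}, {2, 3, 4}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {9, 3, 4},
    {0, 0, 0}, {0, 10, 4}, {0, 0, 11}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0},
    {0, 0, 0},
};
/* Right-hand-side length and left-hand side of productions 1..7. */
static const int rhs_len[8] = { 0, 1, 3, 1, 3, 1, 3, 1 };
static const int lhs[8]     = { 0, 0, NT_E, NT_E, NT_T, NT_T, NT_F, NT_F };

/* Returns true iff tokens[] (terminated by EOI) is a valid expression. */
bool lr_parse(const int *tokens)
{
    int stack[256], sp = 0;                  /* stack of states */
    stack[sp++] = 1;
    int a = *tokens++;
    for (;;) {
        int act = action[stack[sp - 1]][a];
        if (act == ACC) return true;
        if (act > 0) {                       /* shift */
            if (sp >= 256) return false;     /* stack limit of this sketch */
            stack[sp++] = act;
            a = *tokens++;                   /* next token only after a shift */
        } else if (act < 0) {                /* reduce A → β */
            sp -= rhs_len[-act];             /* pop |β| states */
            stack[sp] = go_to[stack[sp - 1]][lhs[-act]];
            sp++;
        } else {
            return false;                    /* error entry */
        }
    }
}
```

Because $ is never shifted, the driver never reads past the EOI token at the end of the array.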
LR(1) Items

• An LR(1) item of a grammar G is a pair, ( A → α • β, a ), of an LR(0) item A → α • β and a lookahead symbol a
• The lookahead has no effect in an LR(1) item of the form ( A → α • β, a ), where β is not ε
• An LR(1) item of the form ( A → α •, a ) calls for a reduction by A → α only if the next input symbol is a
The Closure Function

closure(I) =
    repeat
        for any item (A → α • X β, a) in I
            for any production X → γ
                for any b ∈ First(β a)
                    I = I ∪ { (X → • γ, b) }
    until I does not change
    return I
An Example

1. S' → S
2. S → C C
3. C → c C
4. C → d

I1 = closure({(S' → • S, $)}) =
    {(S' → • S, $),
     (S → • C C, $),                     First($) = {$}
     (C → • c C, c), (C → • c C, d),     First(C$) = {c, d}
     (C → • d, c), (C → • d, d)}
The Goto Function

goto(I, X) =
    set J to the empty set
    for any item (A → α • X β, a) in I
        add (A → α X • β, a) to J
    return closure(J)
An Example

goto(I1, C)
= closure({(S → C • C, $)})
= {(S → C • C, $), (C → • c C, $), (C → • d, $)}
The Subset Construction Function

subset-construction(cfg) =
    initialize T to {closure({(S' → • S, $)})}
    repeat
        for each state I in T and each symbol X
            let J be goto(I, X)
            if J is not empty and not in T then
                T = T ∪ { J }
    until T does not change
    return T
An Example

1. S' → S
2. S → C C
3. C → c C
4. C → d
An Example

I1: closure({(S' → • S, $)}) =
    (S' → • S, $)
    (S → • C C, $)
    (C → • c C, c/d)
    (C → • d, c/d)

I2: goto(I1, S) =
    (S' → S •, $)

I3: goto(I1, C) =
    (S → C • C, $)
    (C → • c C, $)
    (C → • d, $)

I4: goto(I1, c) =
    (C → c • C, c/d)
    (C → • c C, c/d)
    (C → • d, c/d)

I5: goto(I1, d) =
    (C → d •, c/d)

I6: goto(I3, C) =
    (S → C C •, $)
An Example

I7: goto(I3, c) =
    (C → c • C, $)
    (C → • c C, $)
    (C → • d, $)

I8: goto(I3, d) =
    (C → d •, $)

I9: goto(I4, C) =
    (C → c C •, c/d)

goto(I4, c) = I4
goto(I4, d) = I5

I10: goto(I7, C) =
    (C → c C •, $)

goto(I7, c) = I7
goto(I7, d) = I8
LR(1) Parsing Table Generation
LR(cfg) =
    for each state I in subset-construction(cfg)
        if (A → α • a β, b) in I and goto(I, a) = J for a terminal a
            then action[I, a] = "shift J"
        if (A → α •, a) in I and A ≠ S'
            then action[I, a] = "reduce A → α"
        if (S' → S •, $) in I then action[I, $] = "accept"
        if (A → α • X β, a) in I and goto(I, X) = J for a nonterminal X
            then goto[I, X] = J
    all other entries in action and goto are made error
An Example

State  c    d    $    | S    C
1      s4   s5        | g2   g3
2                a    |
3      s7   s8        |      g6
4      s4   s5        |      g9
5      r4   r4        |
6                r2   |
7      s7   s8        |      g10
8                r4   |
9      r3   r3        |
10               r3   |
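An Example

[Figure: the LR(1) DFA for this grammar, with states 1 to 10 and transitions on S, C, c and d as given by the goto sets above. Final states are annotated with their lookaheads and reductions: state 2: $, r1; state 6: $, r2; state 10: $, r3; state 8: $, r4; state 9: c/d, r3; state 5: c/d, r4.]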
The Core of LR(1) Items

• The core of a set of LR(1) items is the set of their first components (i.e., LR(0) items)
• The core of the set of LR(1) items
    { (C → c • C, c/d),
      (C → • c C, c/d),
      (C → • d, c/d) }
  is
    { C → c • C,
      C → • c C,
      C → • d }
Merging Cores

I4: { (C → c • C, c/d), (C → • c C, c/d), (C → • d, c/d) }
merged with I7: { (C → c • C, $), (C → • c C, $), (C → • d, $) }
gives I47: { (C → c • C, c/d/$), (C → • c C, c/d/$), (C → • d, c/d/$) }

I5: { (C → d •, c/d) } merged with I8: { (C → d •, $) }
gives I58: { (C → d •, c/d/$) }

I9: { (C → c C •, c/d) } merged with I10: { (C → c C •, $) }
gives I910: { (C → c C •, c/d/$) }
LALR(1) Parsing Table Generation
LALR(cfg) =
    for each state I in merge-core(subset-construction(cfg))
        if (A → α • a β, b) in I and goto(I, a) = J for a terminal a
            then action[I, a] = "shift J"
        if (A → α •, a) in I and A ≠ S'
            then action[I, a] = "reduce A → α"
        if (S' → S •, $) in I then action[I, $] = "accept"
        if (A → α • X β, a) in I and goto(I, X) = J for a nonterminal X
            then goto[I, X] = J
    all other entries in action and goto are made error
An Example

State  c     d     $     | S     C
1      s47   s58         | g2    g3
2                  a     |
3      s47   s58         |       g6
47     s47   s58         |       g910
58     r4    r4    r4    |
6                  r2    |
910    r3    r3    r3    |
Shift/Reduce Conflicts

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

Stack                             Input
$ - - - if expr then stmt         else - - - $

Shift:  stmt → if expr then stmt else stmt
Reduce: stmt → if expr then stmt
Reduce/Reduce Conflicts

stmt → id ( para_list ) | expr := expr
para_list → para_list , para | para
para → id
expr_list → expr_list , expr | expr
expr → id ( expr_list ) | id

Stack                        Input
$ - - - id ( id              , id ) - - - $
$ - - - procid ( id          , id ) - - - $
LR Grammars

• A grammar is SLR(1) iff its SLR(1) parsing table has no multiply-defined entries
• A grammar is LR(1) iff its LR(1) parsing table has no multiply-defined entries
• A grammar is LALR(1) iff its LALR(1) parsing table has no multiply-defined entries
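Hierarchy of Grammar Classes

[Figure: nested grammar classes. All grammars divide into Unambiguous Grammars and Ambiguous Grammars. Inside the unambiguous grammars lie LL(k) and LR(k); LR(k) contains LR(1), which contains LALR(1), which contains SLR(1); LL(k) contains LL(1).]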
