Chp3 Syntax Analysis

The document discusses syntax analysis in compilers, including syntax trees, context-free grammars, push-down automata, and various parsing techniques like top-down parsing and bottom-up parsing. Syntax analysis recognizes the syntactic structure of a programming language and transforms tokens into a parse tree.

Uploaded by Batool Khan

Chapter 3 Syntax Analysis

Syntax Analysis

- Syntax analysis recognizes the syntactic structure of the programming language and transforms a string of tokens into a tree of tokens and syntactic categories
- A parser is the program that performs syntax analysis
Outline

- Introduction to parsers
- Syntax trees
- Context-free grammars
- Push-down automata
- Top-down parsing
- Bison - a parser generator
- Bottom-up parsing
Introduction to Parsers

[Figure: source code → Scanner → token → Parser → syntax tree → Semantic Analyzer; the Parser asks the Scanner for the next token, and the phases share a Symbol Table.]
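Syntax Trees

- A syntax tree represents the syntactic structure of tokens in a program defined by the grammar of the programming language

[Figure: syntax tree for id1 := id2 + id3 * 60, with := at the root, id1 and + as its children, id2 and * as the children of +, and id3 and 60 as the children of *.]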
Context-Free Grammars (CFG)

- A set of terminals: basic symbols (token types) from which strings are formed
- A set of nonterminals: syntactic categories, each of which denotes a set of strings
- A set of productions: rules specifying how the terminals and nonterminals can be combined to form strings
- The start symbol: a distinguished nonterminal that denotes the whole language
An Example: Arithmetic Expressions

- Terminals: id, ‘+’, ‘-’, ‘*’, ‘/’, ‘(’, ‘)’
- Nonterminals: expr, op
- Productions:
    expr → expr op expr
    expr → ‘(’ expr ‘)’
    expr → ‘-’ expr
    expr → id
    op → ‘+’ | ‘-’ | ‘*’ | ‘/’
- Start symbol: expr
An Example: Arithmetic Expressions

id denotes { id }
‘+’ denotes { + }
‘-’ denotes { - }
‘*’ denotes { * }
‘/’ denotes { / }
‘(’ denotes { ( }
‘)’ denotes { ) }
op denotes { +, -, *, / }
expr denotes { id, - id, ( id ), id + id, id - id, … }
Derivations

- A derivation step is an application of a production as a rewriting rule, namely, replacing a nonterminal in the string by one of its right-hand sides; with N → β:
    … N … ⇒ … β …
- Starting with the start symbol, a sequence of derivation steps is called a derivation:
    S ⇒ … ⇒ α, or S ⇒* α
An Example

Grammar:
1. expr → expr op expr
2. expr → ‘(’ expr ‘)’
3. expr → ‘-’ expr
4. expr → id
5. op → ‘+’
6. op → ‘-’
7. op → ‘*’
8. op → ‘/’

Derivation:
expr
⇒ - expr
⇒ - ( expr )
⇒ - ( expr op expr )
⇒ - ( id op expr )
⇒ - ( id + expr )
⇒ - ( id + id )
Left- & Right-Most Derivations

- If there is more than one nonterminal in the string, many choices are possible
- A leftmost derivation always chooses the leftmost nonterminal to rewrite
- A rightmost derivation always chooses the rightmost nonterminal to rewrite
An Example

Leftmost derivation:
expr ⇒ - expr ⇒ - ( expr ) ⇒ - ( expr op expr ) ⇒ - ( id op expr ) ⇒ - ( id + expr ) ⇒ - ( id + id )

Rightmost derivation:
expr ⇒ - expr ⇒ - ( expr ) ⇒ - ( expr op expr ) ⇒ - ( expr op id ) ⇒ - ( expr + id ) ⇒ - ( id + id )
Parse Trees

- A parse tree is a graphical representation for a derivation that filters out the order of choosing nonterminals for rewriting
- Many derivations may correspond to the same parse tree, but every parse tree has associated with it a unique leftmost and a unique rightmost derivation
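An Example

Leftmost derivation:
expr ⇒ - expr ⇒ - ( expr ) ⇒ - ( expr op expr ) ⇒ - ( id op expr ) ⇒ - ( id + expr ) ⇒ - ( id + id )

Rightmost derivation:
expr ⇒ - expr ⇒ - ( expr ) ⇒ - ( expr op expr ) ⇒ - ( expr op id ) ⇒ - ( expr + id ) ⇒ - ( id + id )

[Figure: the parse tree shared by both derivations: expr has children - and expr; that expr has children (, expr, ); the inner expr has children expr, op, expr, which derive id, +, id.]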
Ambiguous Grammars

- A grammar is ambiguous if it can derive a string with two different parse trees
- If we use the syntactic structure of a parse tree to interpret the meaning of the string, the two parse trees have different meanings
- Since compilers do use parse trees to derive meaning, we prefer unambiguous grammars
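An Example

id + id * id

[Figure: two parse trees for id + id * id. In the first, the root expr has children expr + expr, and the right expr expands to expr * expr. In the second, the root expr has children expr * expr, and the left expr expands to expr + expr.]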
Transform Ambiguous Grammars

Ambiguous grammar:
expr → expr op expr
expr → ‘(’ expr ‘)’
expr → ‘-’ expr
expr → id
op → ‘+’ | ‘-’ | ‘*’ | ‘/’

Unambiguous grammar:
expr → expr ‘+’ term
expr → expr ‘-’ term
expr → term
term → term ‘*’ factor
term → term ‘/’ factor
term → factor
factor → ‘(’ expr ‘)’
factor → ‘-’ factor
factor → id

Not every ambiguous grammar can be transformed to an unambiguous one!
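The effect of the unambiguous grammar can be checked with a small recursive-descent evaluator. The C sketch below is illustrative only: it assumes single digits stand in for id, treats unary minus as ‘-’ factor, and writes the left-recursive rules as loops, which keeps +, -, * and / left-associative. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/* Sketch: evaluates the unambiguous expression grammar by recursive descent.
   Assumptions: single decimal digits stand in for id, the input has no spaces,
   and it is well formed (the asserts catch malformed input). */
static const char *p;                      /* cursor into the input string */

static int expr(void);

/* factor → '(' expr ')' | '-' factor | id */
static int factor(void) {
    if (*p == '(') {
        p++;
        int v = expr();
        assert(*p == ')');
        p++;
        return v;
    }
    if (*p == '-') { p++; return -factor(); }
    assert(isdigit((unsigned char)*p));
    return *p++ - '0';
}

/* term → term '*' factor | term '/' factor | factor;
   the left recursion is written as a loop so operators group to the left */
static int term(void) {
    int v = factor();
    while (*p == '*' || *p == '/') {
        char op = *p++;
        int r = factor();
        v = (op == '*') ? v * r : v / r;
    }
    return v;
}

/* expr → expr '+' term | expr '-' term | term */
static int expr(void) {
    int v = term();
    while (*p == '+' || *p == '-') {
        char op = *p++;
        int r = term();
        v = (op == '+') ? v + r : v - r;
    }
    return v;
}

int evaluate(const char *s) {
    p = s;
    int v = expr();
    assert(*p == '\0');                    /* the whole string must be consumed */
    return v;
}
```

With this grammar, evaluate("2+3*4") yields 14, the value of the tree with * below +, and evaluate("8-3-2") yields 3.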
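Push-Down Automata

[Figure: a push-down automaton: a finite automaton reads the input (ending with $), uses a stack (with $ at the bottom), and produces output.]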
End-Of-File and Bottom-of-Stack Markers

- Parsers must read not only terminal symbols but also the end-of-file marker and the bottom-of-stack marker
- We will use $ to represent the end-of-file marker
- We will also use $ to represent the bottom-of-stack marker
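An Example

S → a S b

[Figure: a PDA with states 1 to 4: the start state 1 moves to 2 on (a, $), state 2 loops on (a, a), 2 moves to 3 on (b, a), 3 loops on (b, a), and 3 moves to the final state 4 on ($, $); each label pairs the next input symbol with the symbol on top of the stack.]

Run on input a a b b $: 1 → 2 → 2 → 3 → 3 → 4. The stack grows from $ to a a $ while the a's are read and shrinks back to $ while the b's are read.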
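A PDA like the one above can be simulated directly. This C sketch assumes the transition labels mean (input symbol, stack top), pushes an a for every a read and pops one for every b, and treats the end of the C string as the $ marker; the function name is made up for illustration. As a further assumption, it accepts the empty input from state 1 on ($, $) so that n = 0 is covered.

```c
#include <assert.h>

/* Sketch: simulates the four-state PDA for { aⁿbⁿ | n ≥ 0 }.
   The end of the C string plays the role of the $ input marker and the
   stack is a fixed-size array with $ at the bottom. */
int pda_accepts(const char *input) {
    char stack[256] = { '$' };                /* stack[0] is the bottom marker */
    int top = 0, state = 1;
    for (const char *c = input; ; c++) {
        char a = *c ? *c : '$';               /* next input symbol */
        char x = stack[top];                  /* symbol on top of the stack */
        if (state == 1 && a == 'a' && x == '$') {
            stack[++top] = 'a';               /* 1 → 2 on (a, $): push a */
            state = 2;
        } else if (state == 2 && a == 'a' && x == 'a') {
            assert(top < 255);                /* stack limit of this sketch */
            stack[++top] = 'a';               /* loop on (a, a): push a */
        } else if ((state == 2 || state == 3) && a == 'b' && x == 'a') {
            top--;                            /* on (b, a): pop one a */
            state = 3;
        } else if ((state == 1 || state == 3) && a == '$' && x == '$') {
            return 1;                         /* ($, $): move to state 4, accept */
        } else {
            return 0;                         /* no transition applies: reject */
        }
    }
}
```

For example, pda_accepts("aabb") returns 1, while pda_accepts("aab") returns 0 because an a is still on the stack when the input ends.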
CFG versus RE

- Every language defined by a RE can also be defined by a CFG
- Why use REs for lexical syntax?
  – Lexical syntax does not need a notation as powerful as CFGs
  – REs are more concise and easier to understand than CFGs
  – More efficient lexical analyzers can be constructed from REs than from CFGs
  – Separating lexical from syntactic analysis modularizes the front end into two manageable-sized components
Nonregular Languages

- REs can denote only a fixed number of repetitions or an unspecified number of repetitions of one given construct: aⁿ, a*
- A nonregular language: L = {aⁿbⁿ | n ≥ 0}
    S → a S b
    S → ε
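Top-Down Parsing

- Construct a parse tree from the root to the leaves using a leftmost derivation

S → c A B
A → a b
A → a
B → d
input: cad

[Figure: four snapshots of the parse tree for cad: S alone; S expanded to c A B; A expanded to a b, which fails because b does not match d, so the parser backtracks; A expanded to a and B to d, which matches the whole input.]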
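The backtracking in the figure can be sketched in C. Each helper recognizes one nonterminal starting at a given input position; S tries A's alternatives in the order listed and falls back to the next one when the rest of the input does not match. This is an illustrative sketch for this tiny grammar only, not a general backtracking parser, and the function names are hypothetical.

```c
#include <assert.h>

/* Sketch: top-down parsing with backtracking for
   S → c A B,  A → a b | a,  B → d.
   Each helper returns the input position just after the symbol it
   recognized, or -1 if it fails. */
static const char *in;

static int B(int i) { return in[i] == 'd' ? i + 1 : -1; }        /* B → d */

static int A(int i, int alt) {
    if (alt == 0)                                                 /* A → a b */
        return (in[i] == 'a' && in[i + 1] == 'b') ? i + 2 : -1;
    return in[i] == 'a' ? i + 1 : -1;                             /* A → a */
}

/* S → c A B: try A's alternatives in order, backtracking to the next
   alternative whenever the rest of the input cannot be matched. */
int parse(const char *s) {
    assert(s != 0);
    in = s;
    if (in[0] != 'c') return 0;
    for (int alt = 0; alt < 2; alt++) {
        int j = A(1, alt);
        if (j < 0) continue;                    /* this alternative failed */
        int k = B(j);
        if (k >= 0 && in[k] == '\0') return 1;  /* whole input matched */
    }
    return 0;
}
```

On cad, A → a b fails at the d, the parser backs up, and A → a followed by B → d succeeds.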
Predictive Parsing

- Predictive parsing is top-down parsing without backtracking
- That is, the next token determines the only production that can be chosen at each derivation step

stmt → if expr then stmt else stmt
     | while expr do stmt
     | begin stmt_list end
LL(k) Parsing

- Predictive parsing is also called LL(k) parsing
- The first L stands for scanning the input from left to right
- The second L stands for producing a leftmost derivation
- The k stands for using k lookahead input symbols to choose alternative productions at each derivation step
LL(1) Parsing

- We will only describe LL(1) parsing from now on, namely, parsing using only one lookahead input symbol
- Recursive-descent parsing: hand-written or generated by a tool (e.g., PCCTS and Coco/R)
- Table-driven predictive parsing: generated by a tool (e.g., LISA and LLGEN)
Recursive Descent Parsing

- A procedure is associated with each nonterminal of the grammar
- An alternative case in the procedure is associated with each production of that nonterminal
- A match of a token is associated with each terminal in the right-hand side of the production
- A procedure call is associated with each nonterminal in the right-hand side of the production
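Recursive Descent Parsing

Input: begin print num = num ; end

S → if E then S else S
  | begin L end
  | print E
L → S ; L
  | ε
E → num = num

[Figure: parse tree for the input: S expands to begin L end; L to S ; L; that S to print E; E to num = num; and the last L to ε.]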
Choosing the Alternative Case

S → if E then S else S
  | begin L end
  | print E
L → S ; L        FIRST(S ; L) = {if, begin, print}
  | ε            FOLLOW(L) = {end}
E → num = num
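An Example

enum {   /* token codes; unlike const int, enum constants can be case labels in C */
    IF = 1, THEN = 2, ELSE = 3, BEGIN = 4,
    END = 5, PRINT = 6, SEMI = 7, NUM = 8,
    EQ = 9
};
int yylex(void);    /* scanner, defined elsewhere */
void error(void);   /* error reporting, defined elsewhere */
int token;          /* current lookahead; initialize with token = yylex() */

void match(int t)
{
    if (token == t) token = yylex();   /* consume t, read the next token */
    else error();
}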
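An Example

void E(void);   /* defined below */
void L(void);

void S() {
    switch (token) {
    case IF:      /* S → if E then S else S */
        match(IF); E(); match(THEN); S();
        match(ELSE); S(); break;
    case BEGIN:   /* S → begin L end */
        match(BEGIN); L(); match(END); break;
    case PRINT:   /* S → print E */
        match(PRINT); E(); break;
    default: error();
    }
}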
An Example

void L() {
    switch (token) {
    case END:                          /* L → ε: end is in FOLLOW(L) */
        break;
    case IF: case BEGIN: case PRINT:   /* L → S ; L: FIRST(S ; L) */
        S(); match(SEMI); L(); break;
    default: error();
    }
}
An Example

void E() {
    switch (token) {
    case NUM:                          /* E → num = num */
        match(NUM); match(EQ); match(NUM);
        break;
    default: error();
    }
}
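For experimenting, the three procedures can be assembled into one C unit. In this sketch the scanner is replaced by a hypothetical array-based yylex(), EOI (code 0) is an assumed end-of-input token, and error() only sets a flag, so treat it as a test harness rather than the course's own code.

```c
#include <assert.h>

/* Sketch: S, L and E assembled into one runnable unit. The token codes
   match the slides; EOI = 0 is an assumed end-of-input code. */
enum { EOI = 0, IF, THEN, ELSE, BEGIN, END, PRINT, SEMI, NUM, EQ };

static const int *toks;   /* remaining token stream */
static int token;         /* current lookahead token */
static int failed;        /* set by error() */

static int yylex(void) { return *toks ? *toks++ : EOI; }   /* hypothetical scanner */
static void error(void) { failed = 1; }
static void match(int t) { if (token == t) token = yylex(); else error(); }

static void S(void);

static void E(void) {                        /* E → num = num */
    switch (token) {
    case NUM: match(NUM); match(EQ); match(NUM); break;
    default: error();
    }
}

static void L(void) {
    switch (token) {
    case END: break;                          /* L → ε, chosen on FOLLOW(L) */
    case IF: case BEGIN: case PRINT:          /* L → S ; L, chosen on FIRST(S ; L) */
        S(); match(SEMI); L(); break;
    default: error();
    }
}

static void S(void) {
    switch (token) {
    case IF:    match(IF); E(); match(THEN); S(); match(ELSE); S(); break;
    case BEGIN: match(BEGIN); L(); match(END); break;
    case PRINT: match(PRINT); E(); break;
    default: error();
    }
}

/* Returns 1 if the EOI-terminated token array is a sentence derived from S. */
int parse_tokens(const int *t) {
    assert(t != 0);
    toks = t;
    failed = 0;
    token = yylex();
    S();
    return !failed && token == EOI;
}
```

For the input begin print num = num ; end, the call is parse_tokens((int[]){ BEGIN, PRINT, NUM, EQ, NUM, SEMI, END, EOI }), which returns 1.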
First and Follow Sets

- The first set of a string α, FIRST(α), is the set of terminals that can begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α)
- The follow set of a nonterminal X, FOLLOW(X), is the set of terminals that can immediately follow X in some sentential form
Computing First Sets

- If X is a terminal, then FIRST(X) is {X}
- If X is a nonterminal and X → ε is a production, then add ε to FIRST(X)
- If X is a nonterminal and X → Y1 Y2 ... Yk is a production, then add a to FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1). If ε is in FIRST(Yj) for all j, then add ε to FIRST(X)
An Example

S → if E then S else S | begin L end | print E
L → S ; L | ε
E → num = num

FIRST(S) = { if, begin, print }
FIRST(L) = { if, begin, print, ε }
FIRST(E) = { num }
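The FIRST rules are applied repeatedly until no set changes. In this C sketch the encoding is an assumption, not from the slides: terminals are bit positions in an unsigned mask and a separate EPS bit stands for ε. It recomputes the example above.

```c
#include <assert.h>

/* Terminals are bit positions 0..9; nonterminals are numbered after them. */
enum { IF, THEN, ELSE, BEGIN, END, PRINT, SEMI, NUM, EQ, DOLLAR, NTERM };
enum { S = NTERM, L, E, NSYM };
#define EPS (1u << NTERM)                         /* bit that stands for ε */

typedef struct { int lhs, len, rhs[6]; } Prod;
static const Prod prods[] = {
    { S, 6, { IF, E, THEN, S, ELSE, S } },
    { S, 3, { BEGIN, L, END } },
    { S, 2, { PRINT, E } },
    { L, 3, { S, SEMI, L } },
    { L, 0, { 0 } },                              /* L → ε */
    { E, 3, { NUM, EQ, NUM } },
};
enum { NPROD = sizeof prods / sizeof prods[0] };

static unsigned first[NSYM];                      /* FIRST sets of the nonterminals */

/* FIRST(X): {X} for a terminal, the current approximation for a nonterminal. */
static unsigned first_of(int x) { return x < NTERM ? 1u << x : first[x]; }

/* Applies the three rules until no FIRST set changes. */
void compute_first(void) {
    int changed;
    do {
        changed = 0;
        for (int p = 0; p < NPROD; p++) {
            assert(prods[p].len <= 6);
            unsigned add = 0;
            int all_eps = 1;                      /* do Y1 ... Yi-1 all derive ε? */
            for (int i = 0; i < prods[p].len && all_eps; i++) {
                unsigned f = first_of(prods[p].rhs[i]);
                add |= f & ~EPS;
                all_eps = (f & EPS) != 0;
            }
            if (all_eps) add |= EPS;              /* also covers X → ε */
            unsigned *set = &first[prods[p].lhs];
            if ((*set | add) != *set) { *set |= add; changed = 1; }
        }
    } while (changed);
}
```

After compute_first(), first[L] holds the bits for if, begin and print plus EPS, matching FIRST(L) above.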
Computing Follow Sets

- Place $ in FOLLOW(S), where S is the start symbol and $ is the end-of-file marker
- If there is a production A → α B β, then everything in FIRST(β) except for ε is placed in FOLLOW(B)
- If there is a production A → α B, or a production A → α B β where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B)
An Example

S → if E then S else S | begin L end | print E
L → S ; L | ε
E → num = num

FOLLOW(S) = { $, else, ; }
FOLLOW(L) = { end }
FOLLOW(E) = { then, $, else, ; }
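FOLLOW sets are computed the same way, iterating the three rules to a fixed point. This C sketch encodes terminal sets as bit masks (an illustrative choice) and hard-codes the FIRST sets from the previous example instead of recomputing them.

```c
#include <assert.h>

/* Terminals are bit positions 0..9 in an unsigned mask; EPS marks ε. */
enum { IF, THEN, ELSE, BEGIN, END, PRINT, SEMI, NUM, EQ, DOLLAR, NTERM };
enum { S = NTERM, L, E, NSYM };
#define BIT(t) (1u << (t))
#define EPS BIT(NTERM)

typedef struct { int lhs, len, rhs[6]; } Prod;
static const Prod prods[] = {
    { S, 6, { IF, E, THEN, S, ELSE, S } },
    { S, 3, { BEGIN, L, END } },
    { S, 2, { PRINT, E } },
    { L, 3, { S, SEMI, L } },
    { L, 0, { 0 } },                              /* L → ε */
    { E, 3, { NUM, EQ, NUM } },
};
enum { NPROD = sizeof prods / sizeof prods[0] };

/* FIRST sets of the nonterminals, taken from the FIRST example. */
static const unsigned first_nt[NSYM] = {
    [S] = BIT(IF) | BIT(BEGIN) | BIT(PRINT),
    [L] = BIT(IF) | BIT(BEGIN) | BIT(PRINT) | EPS,
    [E] = BIT(NUM),
};
static unsigned first_of(int x) { return x < NTERM ? BIT(x) : first_nt[x]; }

static unsigned follow[NSYM];

void compute_follow(void) {
    follow[S] = BIT(DOLLAR);                      /* rule 1: $ follows the start symbol */
    int changed;
    do {
        changed = 0;
        for (int p = 0; p < NPROD; p++) {
            const Prod *pr = &prods[p];
            assert(pr->len <= 6);
            for (int i = 0; i < pr->len; i++) {
                int nt = pr->rhs[i];
                if (nt < NTERM) continue;         /* only nonterminals have FOLLOW sets */
                unsigned add = 0;
                int beta_eps = 1;                 /* can the suffix β derive ε? */
                for (int j = i + 1; j < pr->len && beta_eps; j++) {
                    unsigned f = first_of(pr->rhs[j]);
                    add |= f & ~EPS;              /* rule 2: FIRST(β) minus ε */
                    beta_eps = (f & EPS) != 0;
                }
                if (beta_eps) add |= follow[pr->lhs];   /* rule 3 */
                if ((follow[nt] | add) != follow[nt]) { follow[nt] |= add; changed = 1; }
            }
        }
    } while (changed);
}
```

FOLLOW(E) needs a second pass: ; reaches it only after FOLLOW(S) has picked ; up from L → S ; L.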
Table-Driven Predictive Parsing

Input. Grammar G.
Output. Parsing table M.
Method.
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.
An Example

        S                           L              E
if      S → if E then S else S      L → S ; L
then
else
begin   S → begin L end             L → S ; L
end                                 L → ε
print   S → print E                 L → S ; L
num                                                E → num = num
;
$
An Example

Stack                  Input
$ S                    begin print num = num ; end $
$ end L begin          begin print num = num ; end $
$ end L                print num = num ; end $
$ end L ; S            print num = num ; end $
$ end L ; E print      print num = num ; end $
$ end L ; E            num = num ; end $
$ end L ; num = num    num = num ; end $
$ end L ;              ; end $
$ end L                end $
$ end                  end $
$                      $
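The trace above can be reproduced with a small table-driven driver. In this C sketch, the parsing table from the previous example is encoded as a switch (a hypothetical encoding), the right-hand side of the chosen production is pushed in reverse so that its leftmost symbol ends up on top, and the input is an array of token codes ending in DOLLAR.

```c
#include <assert.h>

/* Token codes double as grammar symbols; nonterminals follow the terminals. */
enum { IF, THEN, ELSE, BEGIN, END, PRINT, SEMI, NUM, EQ, DOLLAR, NTERM };
enum { S = NTERM, L, E };

typedef struct { int lhs, len, rhs[6]; } Prod;
static const Prod prods[] = {
    { S, 6, { IF, E, THEN, S, ELSE, S } },   /* 0 */
    { S, 3, { BEGIN, L, END } },             /* 1 */
    { S, 2, { PRINT, E } },                  /* 2 */
    { L, 3, { S, SEMI, L } },                /* 3 */
    { L, 0, { 0 } },                         /* 4: L → ε */
    { E, 3, { NUM, EQ, NUM } },              /* 5 */
};

/* M[X, a] from the example table: a production index, or -1 for error. */
static int table(int X, int a) {
    switch (X) {
    case S: return a == IF ? 0 : a == BEGIN ? 1 : a == PRINT ? 2 : -1;
    case L: return (a == IF || a == BEGIN || a == PRINT) ? 3 : a == END ? 4 : -1;
    case E: return a == NUM ? 5 : -1;
    default: return -1;
    }
}

/* Returns 1 if the DOLLAR-terminated token array is accepted. */
int ll1_parse(const int *input) {
    int stack[64], top = 0;
    stack[top++] = DOLLAR;                   /* bottom-of-stack marker */
    stack[top++] = S;                        /* start symbol on top */
    for (;;) {
        int X = stack[--top], a = *input;
        if (X == DOLLAR) return a == DOLLAR; /* accept only at end of input */
        if (X < NTERM) {                     /* terminal on top: must match */
            if (X != a) return 0;
            input++;
            continue;
        }
        int p = table(X, a);
        if (p < 0) return 0;                 /* error entry */
        for (int i = prods[p].len - 1; i >= 0; i--) {   /* push RHS reversed */
            assert(top < 64);
            stack[top++] = prods[p].rhs[i];
        }
    }
}
```

Popping the expanded nonterminal and pushing its right-hand side in one step gives exactly the stack column of the trace, with the top of the stack on the right.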
LL(1) Grammars

- A grammar is LL(1) iff its predictive parsing table has no multiply-defined entries
- A grammar G is LL(1) iff whenever A → α | β are two distinct productions of G, the following conditions hold:
    (1) FIRST(α) ∩ FIRST(β) = ∅
    (2) If ε ∈ FIRST(β), FOLLOW(A) ∩ FIRST(α) = ∅
    (3) If ε ∈ FIRST(α), FOLLOW(A) ∩ FIRST(β) = ∅
A Counter Example

S → i E t S S' | a
S' → e S | ε
E → b

      a        b        e                    i                  t    $
S     S → a                                  S → i E t S S'
S'                      S' → ε, S' → e S                             S' → ε
E              E → b

ε ∈ FIRST(ε) and FOLLOW(S') ∩ FIRST(e S) = {e} ≠ ∅, so M[S', e] is multiply defined

Left Recursive Grammars

- A grammar is left recursive if it has a nonterminal A such that A ⇒+ A α
- Left recursive grammars are not LL(1) because
    A → A α
    A → β
  will cause FIRST(A α) ∩ FIRST(β) ≠ ∅
- We can remove this obstacle to LL(1) parsing by eliminating left recursion
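Eliminating Left Recursion

A → A α | β    becomes    A → β R
                          R → α R | ε

[Figure: parse trees for β α α … α: the left-recursive grammar grows the tree down the left with A nodes, and the transformed grammar grows it down the right with R nodes, ending in R → ε.]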
Direct Left Recursion

A → A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn

A → β1 A' | β2 A' | ... | βn A'

A' → α1 A' | α2 A' | ... | αm A' | ε

An Example

E → E + T | T
T → T * F | F
F → ( E ) | id

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
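The direct left-recursion rewrite is mechanical enough to automate. This C sketch is a hypothetical helper, not a library routine: it assumes each alternative is given as a string of space-separated symbols, names the new nonterminal A', and handles one nonterminal with direct left recursion only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: removes direct left recursion from one nonterminal.
   Input: the alternatives of A, each a string of space-separated symbols
   (at least one alternative is assumed not to start with A).
   Output (in out): "A → β1 A' | ... | βn A'\nA' → α1 A' | ... | αm A' | ε\n".
   Returns 0 (leaving out untouched) if no alternative starts with A. */
int eliminate_left_recursion(const char *A, const char *alts[], int n,
                             char *out, size_t cap) {
    assert(A != NULL && alts != NULL && out != NULL);
    char betas[256] = "", alphas[256] = "";
    size_t alen = strlen(A);
    int m = 0;                                        /* number of A α alternatives */
    for (int i = 0; i < n; i++) {
        const char *s = alts[i];
        if (strncmp(s, A, alen) == 0 && s[alen] == ' ') {
            size_t used = strlen(alphas);             /* A → A α: emit α A' */
            snprintf(alphas + used, sizeof alphas - used, "%s %s' | ", s + alen + 1, A);
            m++;
        } else {
            size_t used = strlen(betas);              /* A → β: emit β A' */
            snprintf(betas + used, sizeof betas - used, "%s%s %s'",
                     used ? " | " : "", s, A);
        }
    }
    if (m == 0) return 0;
    snprintf(out, cap, "%s → %s\n%s' → %sε\n", A, betas, A, alphas);
    return 1;
}
```

Applied to E → E + T | T and T → T * F | F, it produces the E' and T' rules of the example; F → ( E ) | id is left alone because it has no left recursion.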
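Indirect Left Recursion

S → A a | b
A → A c | S d | ε

S ⇒ A a ⇒ S d a

Substituting S into the productions of A:
A → A c | A a d | b d | ε

Eliminating the direct left recursion:
S → A a | b
A → b d A' | A'
A' → c A' | a d A' | ε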
Left factoring

- A grammar is not LL(1) if two productions of a nonterminal A have a nontrivial common prefix. For example, if α ≠ ε and A → α β1 | α β2, then FIRST(α β1) ∩ FIRST(α β2) ≠ ∅
- We can transform such a grammar toward LL(1) by performing left factoring:
    A → α A'
    A' → β1 | β2
An Example

S → i E t S | i E t S e S | a
E → b

S → i E t S S' | a
S' → e S | ε
E → b
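Bottom-Up Parsing

- Construct a parse tree from the leaves to the root using a rightmost derivation in reverse

S → a A B e
A → A b c | b
B → d
input: abbcde

[Figure: the parse tree for abbcde built bottom-up in five snapshots, one per reduction.]

Reductions: abbcde → aAbcde → aAde → aABe → S
Read backwards, they form the rightmost derivation S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde.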
LR(k) Parsing

- The L stands for scanning the input from left to right
- The R stands for producing a rightmost derivation (in reverse)
- The k stands for using k lookahead input symbols to choose alternative productions at each derivation step
An Example

1. S' → S
2. S → if E then S else S
3. S → begin L end
4. S → print E
5. L → ε
6. L → S ; L
7. E → num = num
An Example

Stack                    Input                          Action
$                        begin print num = num ; end $  shift
$ begin                  print num = num ; end $        shift
$ begin print            num = num ; end $              shift
$ begin print num        = num ; end $                  shift
$ begin print num =      num ; end $                    shift
$ begin print num = num  ; end $                        reduce
$ begin print E          ; end $                        reduce
$ begin S                ; end $                        shift
$ begin S ;              end $                          reduce
$ begin S ; L            end $                          reduce
$ begin L                end $                          shift
$ begin L end            $                              reduce
$ S                      $                              accept
LL(k) versus LR(k)

- LL(k) parsing must predict which production to use after seeing only the first k tokens of the right-hand side
- LR(k) parsing is able to postpone the decision until it has seen tokens corresponding to the entire right-hand side and k more tokens beyond
- LR(k) parsing thus can handle more grammars than LL(k) parsing
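LR Parsers

[Figure: an LR parser: a parsing driver reads the input (ending with $), keeps a stack of alternating grammar symbols and states (…, X, s1, Y, s2) above the bottom marker $, consults a parsing table that encodes a finite automaton, and produces output.]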
LR Parsing Tables

(columns if through $ form the action table; columns S, L, E form the goto table)

      if    then  else  begin end   print ;     num   =     $     S     L     E
1     s3                s4          s5                            g2
2                                                           a
3                                                s7                            g6
4     s3                s4    r5    s5                            g9    g8
5                                                s7                            g10
6           s11
7                                                      s12
8                             s13
9                                          s14
10                r4                      r4                r4
LR Parsing Tables

(columns if through $ form the action table; columns S, L, E form the goto table)

      if    then  else  begin end   print ;     num   =     $     S     L     E
11    s3                s4          s5                            g15
12                                              s16
13                r3                      r3                r3
14    s3                s4    r5    s5                            g9    g17
15                s18
16          r7    r7                      r7                r7
17                            r6
18    s3                s4          s5                            g19
19                r2                      r2                r2
An Example

1. S' → S
2. S → if E then S else S
3. S → begin L end
4. S → print E
5. L → ε
6. L → S ; L
7. E → num = num
An Example

Stack                            Input                          Action
$1                               begin print num = num ; end $  s4
$1 begin4                        print num = num ; end $        s5
$1 begin4 print5                 num = num ; end $              s7
$1 begin4 print5 num7            = num ; end $                  s12
$1 begin4 print5 num7 =12        num ; end $                    s16
$1 begin4 print5 num7 =12 num16  ; end $                        r7
$1 begin4 print5 E10             ; end $                        r4
$1 begin4 S9                     ; end $                        s14
$1 begin4 S9 ;14                 end $                          r5
$1 begin4 S9 ;14 L17             end $                          r6
$1 begin4 L8                     end $                          s13
$1 begin4 L8 end13               $                              r3
$1 S2                            $                              a
LR Parsing Driver

a = gettoken();
while (true) {
    s = top();
    if (action[s, a] == shift s') {
        push(a); push(s');
        a = gettoken();               /* read the next token only after a shift */
    }
    else if (action[s, a] == reduce A → β) {
        pop 2 * |β| symbols off the stack;
        s' = goto[top(), A]; push(A); push(s');
    }
    else if (action[s, a] == accept) { return; }
    else { error(); }
}
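The driver can be made concrete with the action and goto tables of the earlier example. In this C sketch the table encoding (positive = shift to that state, negative = reduce by that rule number, ACC = accept, 0 = error) is an assumption, and grammar symbols are pushed alongside states as in the pseudocode even though only the states are consulted. State 14 holds L → S ; • L, so it gets the same shift entries as state 4.

```c
#include <assert.h>

/* Sketch: the LR driver on the example tables; the encoding is illustrative. */
enum { IF, THEN, ELSE, BEGIN, END, PRINT, SEMI, NUM, EQ, DOLLAR, NTOK };
enum { NT_S, NT_L, NT_E };                 /* goto columns */
enum { ACC = 100 };

static const int action[20][NTOK] = {
    [1]  = { [IF] = 3, [BEGIN] = 4, [PRINT] = 5 },
    [2]  = { [DOLLAR] = ACC },
    [3]  = { [NUM] = 7 },
    [4]  = { [IF] = 3, [BEGIN] = 4, [END] = -5, [PRINT] = 5 },
    [5]  = { [NUM] = 7 },
    [6]  = { [THEN] = 11 },
    [7]  = { [EQ] = 12 },
    [8]  = { [END] = 13 },
    [9]  = { [SEMI] = 14 },
    [10] = { [ELSE] = -4, [SEMI] = -4, [DOLLAR] = -4 },
    [11] = { [IF] = 3, [BEGIN] = 4, [PRINT] = 5 },
    [12] = { [NUM] = 16 },
    [13] = { [ELSE] = -3, [SEMI] = -3, [DOLLAR] = -3 },
    [14] = { [IF] = 3, [BEGIN] = 4, [END] = -5, [PRINT] = 5 },
    [15] = { [ELSE] = 18 },
    [16] = { [THEN] = -7, [ELSE] = -7, [SEMI] = -7, [DOLLAR] = -7 },
    [17] = { [END] = -6 },
    [18] = { [IF] = 3, [BEGIN] = 4, [PRINT] = 5 },
    [19] = { [ELSE] = -2, [SEMI] = -2, [DOLLAR] = -2 },
};

static const int go_to[20][3] = {          /* "goto" is a C keyword */
    [1] = { [NT_S] = 2 },  [3] = { [NT_E] = 6 },
    [4] = { [NT_S] = 9, [NT_L] = 8 },  [5] = { [NT_E] = 10 },
    [11] = { [NT_S] = 15 }, [14] = { [NT_S] = 9, [NT_L] = 17 },
    [18] = { [NT_S] = 19 },
};

typedef struct { int lhs, len; } Rule;     /* left-hand side and |β| */
static const Rule rules[8] = {
    [2] = { NT_S, 6 },  /* S → if E then S else S */
    [3] = { NT_S, 3 },  /* S → begin L end */
    [4] = { NT_S, 2 },  /* S → print E */
    [5] = { NT_L, 0 },  /* L → ε */
    [6] = { NT_L, 3 },  /* L → S ; L */
    [7] = { NT_E, 3 },  /* E → num = num */
};

/* Returns 1 if the DOLLAR-terminated token array is accepted. */
int lr_parse(const int *input) {
    int stack[128], top = 0;               /* symbols and states interleaved */
    stack[top++] = 1;                      /* start state */
    int a = *input++;                      /* current lookahead */
    for (;;) {
        assert(top + 2 <= 128);
        int s = stack[top - 1];
        int act = action[s][a];
        if (act == ACC) return 1;
        if (act > 0) {                     /* shift: push token and state */
            stack[top++] = a;
            stack[top++] = act;
            a = *input++;                  /* next token only after a shift */
        } else if (act < 0) {              /* reduce by rule -act */
            const Rule *r = &rules[-act];
            top -= 2 * r->len;             /* pop 2 * |β| entries */
            int t = stack[top - 1];
            stack[top++] = NTOK + r->lhs;  /* push A, coded above the tokens */
            stack[top++] = go_to[t][r->lhs];
        } else {
            return 0;                      /* empty entry: syntax error */
        }
    }
}
```

Run on begin print num = num ; end $, it performs the same sequence of actions as the trace on the previous slide.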
Bison – A Parser Generator

A language for specifying parsers and semantic analyzers

lang.y → Bison compiler → lang.tab.c (and lang.tab.h with the -d option)
lang.tab.c → C compiler → a.out
tokens → a.out → syntax tree
Bison Programs

%{
C declarations
%}
Bison declarations
%%
Grammar rules
%%
Additional C code
An Example

line → expr '\n'
expr → expr '+' term | term
term → term '*' factor | factor
factor → '(' expr ')' | DIGIT
An Example - expr.y

%token DIGIT
%start line
%%
line: expr '\n'
    ;
expr: expr '+' term
    | term
    ;
term: term '*' factor
    | factor
    ;
factor: '(' expr ')'
    | DIGIT
    ;
An Example - expr.y

%token NEWLINE
%token ADD
%token MUL
%token LP
%token RP
%token DIGIT
%start line
%%
line: expr NEWLINE
    ;
expr: expr ADD term
    | term
    ;
term: term MUL factor
    | factor
    ;
factor: LP expr RP
    | DIGIT
    ;
An Example - expr.tab.h

#define NEWLINE 278

#define ADD 279
#define MUL 280
#define LP 281
#define RP 282
#define DIGIT 283
Semantic Actions

Each brace-enclosed block is a semantic action, executed when the parser reduces by that rule:

line: expr '\n'       {printf("line: expr \\n\n");}
    ;
expr: expr '+' term   {printf("expr: expr + term\n");}
    | term            {printf("expr: term\n");}
    ;
term: term '*' factor {printf("term: term * factor\n");}
    | factor          {printf("term: factor\n");}
    ;
factor: '(' expr ')'  {printf("factor: ( expr )\n");}
    | DIGIT           {printf("factor: DIGIT\n");}
    ;
Functions

- yyparse(): the parser function
- yylex(): the lexical analyzer function. Bison recognizes any non-positive value as indicating the end of the input
Variables

- yylval: the attribute value of a token. Its default type is int; it can be declared to have multiple types in the first (declarations) section using
    %union {
        int ival;
        double dval;
    }
- Tokens with attribute values can be declared as
    %token <ival> intcon
    %token <dval> doublecon
Conflict Resolutions

- A reduce/reduce conflict is resolved by choosing the production listed first
- A shift/reduce conflict is resolved in favor of shift
- Bison also provides a mechanism for assigning precedences and associativities to terminals
Precedence and Associativity

- The precedence and associativity of operators are declared simultaneously:
    %nonassoc '<'   /* lowest */
    %left '+' '-'
    %right '^'      /* highest */
- The precedence of a rule is determined by the precedence of its rightmost terminal
- The precedence of a rule can be modified by adding %prec <terminal> to its right end
An Example

%{
#include <stdio.h>
%}

%token NUMBER
%left '+' '-'
%left '*' '/'
%right UMINUS

%%
An Example

line : expr '\n'

- The parser reports a syntax error by calling the user-provided function yyerror(char *)

extern int yylineno;   /* line counter kept by a flex scanner with %option yylineno */

void yyerror(char *s)
{
    fprintf(stderr, "%s: line %d\n", s, yylineno);
}
LR Parsing Table Generation

- An LR parsing table generation algorithm transforms a CFG to an LR parsing table
  – SLR(1) parsing table generation
  – LR(1) parsing table generation
  – LALR(1) parsing table generation
From CFG to NPDA

- An LR(0) item of a grammar G is a production of G with a dot at some position of the right-hand side, A → α • β
- The production A → X Y Z yields the following four LR(0) items:
    A → • X Y Z, A → X • Y Z,
    A → X Y • Z, A → X Y Z •
- An LR(0) item represents a state in an NPDA, indicating how much of a production we have seen at a given point in the parsing process
An Example

1. E' → E
2. E → E + T
3. E → T
4. T → T * F
5. T → F
6. F → ( E )
7. F → id
An Example

[Figure: NPDA for the expression grammar whose 20 states are its LR(0) items (E' → • E, E → • E + T, …, F → id •); each item moves on the symbol after its dot to the item with the dot advanced past that symbol, and an item with the dot before a nonterminal X has ε-moves to every item X → • γ.]
From NPDA to DPDA

- There are two functions performed on sets of LR(0) items (states)
- The function closure(I) adds more items to I when there is a dot to the left of a nonterminal
- The function goto(I, X) moves the dot past the symbol X in all items in I where X immediately follows the dot
The Closure Function

closure(I) =
  repeat
    for any item A → α • X β in I
      for any production X → γ
        I = I ∪ {X → • γ}
  until I does not change
  return I
An Example

1. E' → E
2. E → E + T
3. E → T
4. T → T * F
5. T → F
6. F → ( E )
7. F → id

s1 = E' → • E
I1 = closure({s1}) = {
    E' → • E,
    E → • E + T,
    E → • T,
    T → • T * F,
    T → • F,
    F → • ( E ),
    F → • id }
The Goto Function

goto(I, X) =
  set J to the empty set
  for any item A → α • X β in I
    add A → α X • β to J
  return closure(J)
An Example

I1 = {E' → • E,
      E → • E + T, E → • T,
      T → • T * F, T → • F,
      F → • ( E ), F → • id }

goto(I1, E)
  = closure({E' → E •, E → E • + T })
  = {E' → E •, E → E • + T }
The Subset Construction Function

subset-construction(cfg) =
  initialize T to {closure({S' → • S})}
  repeat
    for each state I in T and each symbol X
      let J be goto(I, X)
      if J is not empty and not in T then
        T = T ∪ {J}
  until T does not change
  return T
An Example

I1 : {E' → • E, E → • E + T, E → • T, T → • T * F,
      T → • F, F → • ( E ), F → • id}

goto(I1, E) = I2 : {E' → E •, E → E • + T}
goto(I1, T) = I3 : {E → T •, T → T • * F}
goto(I1, F) = I4 : {T → F •}
goto(I1, '(') = I5 : {F → ( • E ), E → • E + T, E → • T,
                      T → • T * F, T → • F, F → • ( E ), F → • id}
goto(I1, id) = I6 : {F → id •}

goto(I2, '+') = I7 : {E → E + • T, T → • T * F, T → • F,
                      F → • ( E ), F → • id}
An Example

goto(I3, '*') = I8 : {T → T * • F, F → • ( E ), F → • id}

goto(I5, E) = I9 : {F → ( E • ), E → E • + T}
goto(I5, T) = I3
goto(I5, F) = I4
goto(I5, '(') = I5
goto(I5, id) = I6

goto(I7, T) = I10 : {E → E + T •, T → T • * F}
goto(I7, F) = I4
goto(I7, '(') = I5
goto(I7, id) = I6
An Example

goto(I8, F) = I11 : {T → T * F •}
goto(I8, '(') = I5
goto(I8, id) = I6

goto(I9, ')') = I12 : {F → ( E ) •}

goto(I9, '+') = I7

goto(I10, '*') = I8
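The three functions can be sketched in C for the expression grammar. The representation is an assumption chosen for brevity: the grammar has only 20 LR(0) items, so each state is a 32-bit mask with one bit per item, and the construction loop visits each state once as it is discovered instead of repeating until T stops changing, which yields the same collection.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: closure, goto and subset construction for the LR(0) items of the
   expression grammar; a state (set of items) is a uint32_t bit mask. */
enum { PLUS, STAR, LP, RP, ID, EP, E, T, F, NSYM };   /* EP stands for E' */

typedef struct { int lhs, len, rhs[3]; } Prod;
static const Prod prods[] = {
    { EP, 1, { E } },            /* E' → E     */
    { E,  3, { E, PLUS, T } },   /* E → E + T  */
    { E,  1, { T } },            /* E → T      */
    { T,  3, { T, STAR, F } },   /* T → T * F  */
    { T,  1, { F } },            /* T → F      */
    { F,  3, { LP, E, RP } },    /* F → ( E )  */
    { F,  1, { ID } },           /* F → id     */
};
enum { NPROD = sizeof prods / sizeof prods[0] };

#define BIT(k) ((uint32_t)1 << (k))
static int base[NPROD];          /* item (p, dot) is bit base[p] + dot */

static uint32_t closure(uint32_t I) {
    uint32_t old;
    do {                                        /* repeat until I does not change */
        old = I;
        for (int p = 0; p < NPROD; p++)
            for (int d = 0; d < prods[p].len; d++) {
                if (!(I & BIT(base[p] + d))) continue;
                int X = prods[p].rhs[d];        /* symbol after the dot */
                for (int q = 0; q < NPROD; q++)
                    if (prods[q].lhs == X) I |= BIT(base[q]);   /* add X → • γ */
            }
    } while (I != old);
    return I;
}

static uint32_t go_to(uint32_t I, int X) {
    uint32_t J = 0;
    for (int p = 0; p < NPROD; p++)
        for (int d = 0; d < prods[p].len; d++)
            if ((I & BIT(base[p] + d)) && prods[p].rhs[d] == X)
                J |= BIT(base[p] + d + 1);      /* move the dot past X */
    return J ? closure(J) : 0;
}

/* Fills states[] (states[0] is the start state I1) and returns their number. */
int subset_construction(uint32_t states[], int max) {
    int n = 0;
    for (int p = 0; p < NPROD; p++) { base[p] = n; n += prods[p].len + 1; }
    assert(n <= 32);
    n = 0;
    states[n++] = closure(BIT(base[0]));        /* closure({E' → • E}) */
    for (int i = 0; i < n; i++)                 /* n grows as new states appear */
        for (int X = 0; X < NSYM; X++) {
            uint32_t J = go_to(states[i], X);
            if (J == 0) continue;
            int k = 0;
            while (k < n && states[k] != J) k++;
            if (k == n) { assert(n < max); states[n++] = J; }
        }
    return n;
}
```

Running it finds 12 states, matching I1 to I12 above (numbered from 0 in the array, in discovery order), and the start state has the 7 items of closure({E' → • E}).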
An Example

[Figure: the LR(0) automaton (DFA) for the expression grammar, with states 1 to 12 connected by the goto transitions computed in the previous examples.]
SLR(1) Parsing Table Generation
SLR(cfg) =
  for each state I in subset-construction(cfg)
    if A → α • a β in I and goto(I, a) = J for a terminal a then
      action[I, a] = “shift J”
    if A → α • in I and A ≠ S' then
      action[I, a] = “reduce A → α” for all a in Follow(A)
    if S' → S • in I then action[I, $] = “accept”
    if A → α • X β in I and goto(I, X) = J for a nonterminal X
      then goto[I, X] = J
  all other entries in action and goto are made error
An Example

     +    *    (    )    id   $    E    T    F
1              s5        s6        g2   g3   g4
2    s7                       a
3    r3   s8        r3        r3
4    r5   r5        r5        r5
5              s5        s6        g9   g3   g4
6    r7   r7        r7        r7
7              s5        s6             g10  g4
8              s5        s6                  g11
9    s7             s12
10   r2   s8        r2        r2
An Example

     +    *    (    )    id   $    E    T    F
11   r4   r4        r4        r4
12   r6   r6        r6        r6
LR(1) Items

- An LR(1) item of a grammar G is a pair, (A → α • β, a), of an LR(0) item A → α • β and a lookahead symbol a
- The lookahead has no effect in an LR(1) item of the form (A → α • β, a), where β is not ε
- An LR(1) item of the form (A → α •, a) calls for a reduction by A → α only if the next input symbol is a
The Closure Function

closure(I) =
  repeat
    for any item (A → α • X β, a) in I
      for any production X → γ
        for any b ∈ First(β a)
          I = I ∪ { (X → • γ, b) }
  until I does not change
  return I
An Example

1. S' → S
2. S → C C
3. C → c C
4. C → d

I1 = closure({(S' → • S, $)}) =
    {(S' → • S, $),
     (S → • C C, $),                    since First($) = {$}
     (C → • c C, c), (C → • c C, d),
     (C → • d, c), (C → • d, d)}        since First(C $) = {c, d}
The Goto Function

goto(I, X) =
  set J to the empty set
  for any item (A → α • X β, a) in I
    add (A → α X • β, a) to J
  return closure(J)
An Example

goto(I1, C)
  = closure({(S → C • C, $)})
  = {(S → C • C, $), (C → • c C, $), (C → • d, $)}
The Subset Construction Function

subset-construction(cfg) =
  initialize T to {closure({(S' → • S, $)})}
  repeat
    for each state I in T and each symbol X
      let J be goto(I, X)
      if J is not empty and not in T then
        T = T ∪ {J}
  until T does not change
  return T
An Example

1. S' → S
2. S → C C
3. C → c C
4. C → d
An Example

I1: closure({(S' → • S, $)}) =
    (S' → • S, $)
    (S → • C C, $)
    (C → • c C, c/d)
    (C → • d, c/d)
I2: goto(I1, S) =
    (S' → S •, $)
I3: goto(I1, C) =
    (S → C • C, $)
    (C → • c C, $)
    (C → • d, $)
I4: goto(I1, c) =
    (C → c • C, c/d)
    (C → • c C, c/d)
    (C → • d, c/d)
I5: goto(I1, d) =
    (C → d •, c/d)
I6: goto(I3, C) =
    (S → C C •, $)
An Example

I7: goto(I3, c) =
    (C → c • C, $)
    (C → • c C, $)
    (C → • d, $)
I8: goto(I3, d) =
    (C → d •, $)
I9: goto(I4, C) =
    (C → c C •, c/d)
goto(I4, c) = I4
goto(I4, d) = I5
I10: goto(I7, C) =
    (C → c C •, $)
goto(I7, c) = I7
goto(I7, d) = I8
LR(1) Parsing Table Generation
LR(cfg) =
  for each state I in subset-construction(cfg)
    if (A → α • a β, b) in I and goto(I, a) = J for a terminal a
      then action[I, a] = “shift J”
    if (A → α •, a) in I and A ≠ S'
      then action[I, a] = “reduce A → α”
    if (S' → S •, $) in I then action[I, $] = “accept”
    if (A → α • X β, a) in I and goto(I, X) = J for a nonterminal X
      then goto[I, X] = J
  all other entries in action and goto are made error
An Example

     c    d    $    S    C
1    s4   s5        g2   g3
2              a
3    s7   s8             g6
4    s4   s5             g9
5    r4   r4
6              r2
7    s7   s8             g10
8              r4
9    r3   r3
10             r3
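An Example

[Figure: the LR(1) automaton with states 1 to 10 and transitions 1 -S→ 2, 1 -C→ 3, 1 -c→ 4, 1 -d→ 5, 3 -C→ 6, 3 -c→ 7, 3 -d→ 8, 4 -C→ 9, 4 -c→ 4, 4 -d→ 5, 7 -C→ 10, 7 -c→ 7, 7 -d→ 8. The reducing states are labeled with their lookaheads and rules: 2 ($, r1), 6 ($, r2), 9 (c/d, r3), 10 ($, r3), 5 (c/d, r4), 8 ($, r4).]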
The Core of LR(1) Items

- The core of a set of LR(1) items is the set of their first components (i.e., LR(0) items)
- The core of the set of LR(1) items
    { (C → c • C, c/d), (C → • c C, c/d), (C → • d, c/d) }
  is
    { C → c • C, C → • c C, C → • d }
Merging Cores

I4: { (C → c • C, c/d), (C → • c C, c/d), (C → • d, c/d) }
I7: { (C → c • C, $), (C → • c C, $), (C → • d, $) }
merge into I47: { (C → c • C, c/d/$), (C → • c C, c/d/$), (C → • d, c/d/$) }

I5: { (C → d •, c/d) } and I8: { (C → d •, $) }
merge into I58: { (C → d •, c/d/$) }

I9: { (C → c C •, c/d) } and I10: { (C → c C •, $) }
merge into I910: { (C → c C •, c/d/$) }
LALR(1) Parsing Table Generation
LALR(cfg) =
  for each state I in merge-core(subset-construction(cfg))
    if (A → α • a β, b) in I and goto(I, a) = J for a terminal a
      then action[I, a] = “shift J”
    if (A → α •, a) in I and A ≠ S'
      then action[I, a] = “reduce A → α”
    if (S' → S •, $) in I then action[I, $] = “accept”
    if (A → α • X β, a) in I and goto(I, X) = J for a nonterminal X
      then goto[I, X] = J
  all other entries in action and goto are made error
An Example

      c     d     $     S     C
1     s47   s58         g2    g3
2                 a
3     s47   s58               g6
47    s47   s58               g910
58    r4    r4    r4
6                 r2
910   r3    r3    r3
Shift/Reduce Conflicts

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

Stack                      Input
$ … if expr then stmt      else … $

Shift: if expr then stmt else stmt
Reduce: if expr then stmt
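Reduce/Reduce Conflicts

stmt → id ( para_list ) | expr := expr
para_list → para_list , para | para
para → id
expr_list → expr_list , expr | expr
expr → id ( expr_list ) | id

Stack                      Input
$ … id ( id                , id ) … $

$ … procid ( id            , id ) … $

After id ( id, the parser cannot tell whether to reduce the last id to para or to expr; if the scanner returns procid for procedure names, the stack shows which rule applies.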
LR Grammars

- A grammar is SLR(1) iff its SLR(1) parsing table has no multiply-defined entries
- A grammar is LR(1) iff its LR(1) parsing table has no multiply-defined entries
- A grammar is LALR(1) iff its LALR(1) parsing table has no multiply-defined entries
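Hierarchy of Grammar Classes

[Figure: hierarchy of grammar classes. Ambiguous grammars lie outside every class shown. Among the unambiguous grammars, LL(1) is contained in LL(k), and SLR(1) ⊂ LALR(1) ⊂ LR(1) ⊂ LR(k).]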
