Parsing Notes

This document discusses parsing and context-free grammars. It defines parsing as checking whether a sequence of tokens matches the language specification. Context-free grammars are used to specify languages; they consist of terminals, non-terminals, productions, and a start symbol. Productions define syntactic rules, and a grammar derives strings by repeatedly replacing non-terminals. The terminal strings derivable from the start symbol form the language. Parse trees pictorially represent how operators will be evaluated. Precedence determines evaluation order between different operators, and associativity determines it when operators have equal precedence; a grammar must encode both to be unambiguous.


Parsing

Rupesh Nasre.

CS3300 Compiler Design


IIT Madras
Aug 2015
(figure: phases of a compiler)

Character stream
  → Lexical Analyzer → Token stream
  → Syntax Analyzer → Syntax tree
  → Semantic Analyzer → Syntax tree
  → Intermediate Code Generator → Intermediate representation       [Frontend]
  → Machine-Independent Code Optimizer → Intermediate representation
  → Code Generator → Target machine code
  → Machine-Dependent Code Optimizer → Target machine code          [Backend]

The Symbol Table is shared by all phases.
Jobs of a Parser
● Read the specification given by the language implementor.
● Get help from the lexer to collect tokens.
● Check if the sequence of tokens matches the specification.
● Declare successful program structure or report errors in a useful manner.
● Later: Also identify some semantic errors.
Parsing Specification
● In general, one can write a string manipulation program to recognize program structures (e.g., Lab 2).
● However, the string manipulation / recognition can be generated from a higher-level description.
● We use Context-Free Grammars for the specification.
  – Precise, easy to understand and modify, correct translation + error detection, incremental language development.
CFG
1. A set of terminals called tokens.
   – Terminals are elementary symbols of the parsing language.
2. A set of non-terminals called variables.
   – A non-terminal represents a set of strings of terminals.
3. A set of productions.
   – They define the syntactic rules.
4. A start symbol designated by a non-terminal.

Example grammar:
  list  → list + digit
  list  → list - digit
  list  → digit
  digit → 0 | 1 | ... | 8 | 9
Productions, Derivations and Languages
  list  → list + digit
  list  → list - digit
  list  → digit
  digit → 0 | 1 | ... | 8 | 9

The left side of a production is called its head; the right side is called its body.
● We say a production is for a non-terminal if the non-terminal is the head of the production (the first production above is for list).
● A grammar derives strings by beginning with the start symbol and repeatedly replacing a non-terminal by the body of a production for that non-terminal (the grammar derives 3+1-0+8-2+0+1+5).
● The terminal strings that can be derived from the start symbol form the language defined by the grammar (here 0, 1, ..., 9, 0+0, 0-0, ..., i.e., infix expressions on digits involving plus and minus).
Parse Tree
  list  → list + digit
  list  → list - digit
  list  → digit
  digit → 0 | 1 | ... | 8 | 9

(figure: parse tree for 3+1-0+8-2+0+1+5, a left-leaning chain of +/- nodes corresponding to ((((((3+1)-0)+8)-2)+0)+1)+5)

● A parse tree is a pictorial representation of operator evaluation.
Precedence
What if both the operators are not the same?
● x # y @ z
  – How does a compiler know whether to execute # first or @ first? Executing # first corresponds to the parse tree for (x # y) @ z; executing @ first corresponds to x # (y @ z).
  – Think about x+y*z vs. x/y-z
  – A similar situation arises in if-else.
● Humans and compilers may “see” different parse trees.

#define MULT(x) x*x
int main() {
    printf("%d", MULT(3 + 1));
}
Same Precedence
● x+y+z: Order of evaluation doesn't matter; (x + y) + z and x + (y + z) give the same result.
● x-y-z: Order of evaluation matters; (x - y) - z and x - (y - z) differ.
Associativity
● Associativity decides the order in which multiple instances of same-priority operations are executed.
  – Binary minus is left associative, hence x-y-z is equal to (x-y)-z.

Homework: Write a C program to find out that the assignment operator = is right associative.
Grammar for Expressions
Why is the grammar of expressions written this way?

E → E + T | E - T | T
T → T * F | T / F | F
F → (E) | number | name
Ambiguous / Unambiguous Grammars
Grammar for simple arithmetic expressions:

E → E + E | E * E | E - E | E / E | (E) | number | name
    (Precedence not encoded: a+b*c)

E → E + E | E - E | T
T → T * T | T / T | F
F → (E) | number | name
    (Associativity not encoded: a-b-c)

E → E + T | E - T | T
T → T * F | T / F | F
F → (E) | number | name
    (Unambiguous grammar)

Homework: Find out the issue with the final grammar.
Ambiguous / Unambiguous Grammars (contd.)
The unambiguous grammar above is left recursive, and hence not suitable for top-down parsing. An equivalent non-left-recursive grammar can be used for top-down parsing:

E  → T E'
E' → + T E' | - T E' | ϵ
T  → F T'
T' → * F T' | / F T' | ϵ
F  → (E) | number | name
Sentential Forms
● Example grammar: E → E + E | E * E | – E | (E) | id
● Sentence / string: - (id + id)
● Derivation: E => - E => - (E) => - (E + E) => - (id + E) => - (id + id)
● Sentential forms: E, -E, -(E), ..., - (id + id)
  – At each derivation step we make two choices:
  – One, which non-terminal to replace.
  – Two, which production to pick with that non-terminal as the head.
● Another derivation of the same string: E => -E => - (E) => - (E + E) => - (E + id) => - (id + id)
● Wouldn't it be nice if a parser didn't have this confusion?
Leftmost, Rightmost
● Two special ways to choose the non-terminal:
  – Leftmost: the leftmost non-terminal is replaced.
    E => -E => - (E) => - (E + E) => - (id + E) => - (id + id)
  – Rightmost: the rightmost non-terminal is replaced.
    E => -E => - (E) => - (E + E) => - (E + id) => - (id + id)
● Thus, we can talk about left-sentential forms and right-sentential forms.
● Rightmost derivations are sometimes called canonical derivations.
Parse Trees
● Two special ways to choose the non-terminal:
  – Leftmost: the leftmost non-terminal is replaced.
    E => -E => - (E) => - (E + E) => - (id + E) => - (id + id)

(figure: the sequence of partial parse trees built as this derivation proceeds, ending in the complete tree for - (id + id))
Parse Trees
● Given a parse tree, it is unclear which order was used to derive it.
  – Thus, a parse tree is a pictorial representation of future operator order.
  – It is oblivious to a specific derivation order.
● Every parse tree has a unique leftmost derivation and a unique rightmost derivation.
  – We will use them in uniquely identifying a parse tree.

(figure: parse tree for - (id + id))
Context-Free vs Regular
● We can write grammars for regular expressions.
  – Consider our regular expression (a|b)*abb.
  – We can write a grammar for it:

    A → aA | bA | aB
    B → bC
    C → bD
    D → ϵ

  – This grammar can be mechanically generated from an NFA.
Classwork
● Write a CFG for postfix expressions over {a,+,-,*,/}.
  – Give the leftmost derivation for aa-aa*/a+.
  – Is your grammar ambiguous or unambiguous?
● What is this language: S → aSbS | bSaS | ϵ ?
  – Draw a parse tree for aabbab.
  – Give the rightmost derivation for aabbab.
● Write grammars for: palindromes, unequal number of as and bs, no substring 011.
● Homework: Section 4.2.8.
Error Recovery, viable prefix
● Panic-mode recovery
– Discard input symbols until synchronizing tokens e.g. } or ;.
– Does not result in infinite loop.
● Phrase-level recovery
– Local correction on the remaining input
– e.g., replace comma by semicolon, delete a char
● Error productions
– Augment grammar with error productions by anticipating
common errors [I differ in opinion]
● Global correction
– Minimal changes for least-cost input correction
– Mainly of theoretical interest
– Useful to gauge efficacy of an error-recovery technique
Parsing and Context
● Most languages have reserved keywords.
● PL/I doesn't have reserved keywords. The following is a legal statement, in which if, then and else are used as variable names:

  if if = else then then = else else then = if + else

● Meaning is derived from the context in which a word is used.
● Needs support from the lexer – it would return token IDENT for all words, or IDENTKEYWORD.
● It is believed that PL/I syntax is notoriously difficult to parse.
if-else Ambiguity
stmt → if expr then stmt
     | if expr then stmt else stmt
     | otherstmt

There are two parse trees for the following string:

  if E1 then if E2 then S1 else S2

(figure: one tree attaches the else to the inner if, i.e., if E1 then (if E2 then S1 else S2); the other attaches it to the outer if, i.e., if E1 then (if E2 then S1) else S2)
if-else Ambiguity
1. One way to resolve the ambiguity is to make yacc decide the precedence: shift over reduce.
   – Recall lex prioritizing the longer match over the shorter.
2. The second way is to change the grammar itself to not have any ambiguity.

stmt         → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
             | otherstmt
open_stmt    → if expr then stmt
             | if expr then matched_stmt else open_stmt
if-else Ambiguity
With the unambiguous grammar, if E1 then if E2 then S1 else S2 has a single parse tree, in which the else associates with the closest (inner) if:

(figure: the outer if E1 then ... derives an open_stmt, and the inner if E2 then S1 else S2 forms a matched_stmt)

stmt         → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
             | otherstmt
open_stmt    → if expr then stmt
             | if expr then matched_stmt else open_stmt

Classwork: Write an unambiguous grammar for associating else with the first if.
Left Recursion
A grammar is left-recursive if it has a non-terminal A such that there is a derivation A =>+ Aα for some string α.
● Top-down parsing methods cannot handle left-recursive grammars.

A → Aα | β

(figure: the parse tree for βαα...α grows down-left through a chain of A's, with β at the bottom left and the α's to the right)

Can we eliminate left recursion?
Left Recursion
The left-recursive grammar

  A → Aα | β

can be rewritten as the right-recursive grammar

  A → βB
  B → αB | ϵ

which derives the same strings βαα...α.

(figure: the left-recursive parse tree for βαα...α next to the corresponding right-recursive tree, whose down-right spine of B's ends in ϵ)
Left Recursion
In general,

  A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn

becomes

  A → β1B | β2B | ... | βnB
  B → α1B | α2B | ... | αmB | ϵ
Algorithm for Eliminating Left Recursion
arrange the non-terminals in some order A1, ..., An
for i = 1 to n {
    for j = 1 to i-1 {
        replace each Ai → Ajα by Ai → β1α | ... | βkα,
            where Aj → β1 | ... | βk are the current Aj productions
    }
    eliminate immediate left recursion among the Ai productions
}
Classwork
● Remove left recursion from the following grammar.

  E → E + T | T
  T → T * F | F
  F → (E) | name | number

Solution:
  E  → T E'
  E' → + T E' | ϵ
  T  → F T'
  T' → * F T' | ϵ
  F  → (E) | name | number
Classwork
● Remove left recursion from the following grammar.

  S → Aa | b
  A → Ac | Sd | ϵ

Substituting S into A's productions:
  S → Aa | b
  A → Ac | Aad | bd | ϵ

Eliminating the immediate left recursion:
  S  → Aa | b
  A  → bdA' | A'
  A' → cA' | adA' | ϵ
Left Factoring
● When the choice between two alternative productions is unclear, rewrite the grammar to defer the decision until enough input is seen.
  – Useful for predictive or top-down parsing.
● A → αβ1 | αβ2
  – Here, the common prefix α can be left factored:
  – A → αA'
  – A' → β1 | β2
● Left factoring doesn't change ambiguity, e.g., in the dangling if-else.
Non-Context-Free Language Constructs
● wcw is an example of a language that is not CF.
● In the context of C, what does this language indicate?
● It indicates that declarations of variables (w) followed by arbitrary program text (c), and then use of the declared variable (w) cannot be specified in general by a CFG.
● Additional rules or passes (semantic phase) are required to identify declare-before-use cases.
● What does the language a^n b^m c^n d^m indicate in C?
Q1 Paper Discussion
● And attendance.
● And assignment marks.
Top-Down Parsing
● Constructs the parse tree for the input string, starting from the root and creating nodes.
● Follows preorder (depth-first).
● Finds leftmost derivation.
● General method: recursive descent.
– Backtracks
● Special case: Predictive (also called LL(k))
– Does not backtrack
– Fixed lookahead
Recursive Descent Parsing
void A() {                                   // nonterminal A, e.g., A -> BC | Aa | b
    saved = current input position;
    for each A-production A -> X1 X2 ... Xk {
        for (i = 1 to k) {                   // terms in the body
            if (Xi is a nonterminal) call Xi();
            else if (Xi == next symbol) advance-input();  // term match
            else { yyless(); break; }        // term mismatch
        }
        if (A matched) break;                // production match
        else current input position = saved; // production mismatch: backtrack
    }
}
● Backtracking is rarely needed to parse PL constructs.
● Sometimes necessary in NLP, but is very inefficient. Tabular methods are used to avoid repeated input processing.
Recursive Descent Parsing
Example grammar:
  S → c A d
  A → a b | a
Input string: cad

(figure: the parser expands S → c A d, first tries A → a b, fails to match b against d, backtracks, and succeeds with A → a)
Classwork: Generate Parse Tree
E  → T E'
E' → + T E' | ϵ
T  → F T'
T' → * F T' | ϵ
F  → (E) | id

(figure: the sequence of partial parse trees produced while top-down parsing an input such as id + id * id with this grammar)
FIRST and FOLLOW
● Top-down (as well as bottom-up) parsing is aided
by FIRST and FOLLOW sets.
– Recall firstpos, followpos from lexing.
● First and Follow allow a parser to choose which
production to apply, based on lookahead.
● Follow can be used in error recovery.
– While matching a production for A→ α, if the input
doesn't match FIRST(α), use FOLLOW(A) as the
synchronizing token.
FIRST and FOLLOW

FIRST(α) is the set of terminals that begin strings
derived from α, where α is any string of symbols
– If α =>* ϵ, ϵ is also in FIRST(α)
– If A → α | β and FIRST(α) and FIRST(β) are disjoint sets,
then the lookahead decides the production to be applied.
● FOLLOW(A) is the set of terminals that can appear
immediately to the right of A in some sentential form,
where A is a nonterminal.
– If S =>* αAaβ, then FOLLOW(A) contains a.
– If S =>* αABaβ and B =>* ϵ then FOLLOW(A) contains a.
– If A can be the rightmost symbol, we add $ to FOLLOW(A).
This means FOLLOW(S) always contains $.
FIRST and FOLLOW
For the grammar
  E  → T E'
  E' → + T E' | ϵ
  T  → F T'
  T' → * F T' | ϵ
  F  → (E) | id

Non-terminal   FIRST     FOLLOW
E              (, id     ), $
E'             +, ϵ      ), $
T              (, id     +, ), $
T'             *, ϵ      +, ), $
F              (, id     +, *, ), $
Predictive Parsing Table
Fill in the predictive parsing table, one row per non-terminal (E, E', T, T', F) and one column per terminal (id, +, *, (, ), $), using the FIRST and FOLLOW sets above.
Predictive Parsing Table
Non-terminal  id         +           *           (          )        $
E             E → T E'                           E → T E'
E'                       E' → +TE'                          E' → ϵ   E' → ϵ
T             T → F T'                           T → F T'
T'                       T' → ϵ     T' → *FT'               T' → ϵ   T' → ϵ
F             F → id                             F → (E)
Predictive Parsing Table
for each production A → α
    for each terminal a in FIRST(α)           // process terminals using FIRST
        Table[A][a].add(A → α)
    if ϵ is in FIRST(α) then
        for each terminal b in FOLLOW(A)      // process nullable using FOLLOW
            Table[A][b].add(A → α)
        if $ is in FOLLOW(A) then             // process $ on nullable using FOLLOW
            Table[A][$].add(A → α)
LL(1) Grammars
● Predictive parsers needing no backtracking
can be constructed for LL(1) grammars.
– First L is left-to-right input scanning.
– Second L is leftmost derivation.
– 1 is the maximum lookahead.
– In general, LL(k) grammars.
– LL(1) covers most programming constructs.
– No left-recursive grammar can be LL(1).
– No ambiguous grammar can be LL(1).
Any example of RR grammar?
LL(1) Grammars
A grammar is LL(1) iff whenever A → α | β are two distinct productions, the following hold:
  – FIRST(α) and FIRST(β) are disjoint sets.
  – If ϵ is in FIRST(β), then FIRST(α) and FOLLOW(A) are disjoint sets; likewise, if ϵ is in FIRST(α), then FIRST(β) and FOLLOW(A) are disjoint sets.
Predictive Parsing Table
Non-terminal  id         +           *           (          )        $
E             E → T E'                           E → T E'
E'                       E' → +TE'                          E' → ϵ   E' → ϵ
T             T → F T'                           T → F T'
T'                       T' → ϵ     T' → *FT'               T' → ϵ   T' → ϵ
F             F → id                             F → (E)

● Each entry contains a single production.
● Empty entries correspond to error states.
● For an LL(1) grammar, each entry uniquely identifies a production or signals an error.
● If there are multiple productions in an entry, then that grammar is not LL(1). However, this does not guarantee that the language produced is not LL(1). We may be able to transform the grammar into an LL(1) grammar (by eliminating left recursion and by left factoring).
● There exist languages for which no LL(1) grammar exists.
Classwork: Parsing Table
Grammar:
  S  → i E t S S' | a
  S' → e S | ϵ
  E  → b

Non-terminal   FIRST    FOLLOW
S              i, a     e, $
S'             e, ϵ     e, $
E              b        t

Non-terminal  i               t    a      e                    b       $
S             S → i E t S S'       S → a
S'                                        S' → e S and S' → ϵ          S' → ϵ
E                                                              E → b

What is this grammar?
Need for Beautification
● Due to a human programmer, sometimes beautification is essential in the language (well, the language itself is due to a human).
  – e.g., it suffices for correct parsing not to provide an opening parenthesis, but it doesn't “look” good.

No opening parenthesis:
  for i = 0; i < 10; ++i)
      a[i+1] = a[i];

(slide: example YACC grammar)
Homework
● Consider a finite domain (one..twenty), and four operators plus, minus, mult, div. Write a parser to parse the following.

for num1 in {one..twenty} {
  for num2 in {one..twenty} {
    for num3 in {one..twenty} {
      for num4 in {one..twenty} {
        for op1 in {plus, minus, mult, div} {
          for op2 in {plus, minus, mult, div} {
            if num1 op1 num2 == num3 op2 num4 {
              print num1 op1 num2 “==” num3 op2 num4;
            }
          }
        }
      }
    }
  }
}

● Change the meaning of == from numeric equality to anagram / shuffle, and see the output.
Bottom-Up Parsing
● The parse tree is constructed bottom-up.
  – In reality, an explicit tree may not be constructed.
  – It is also called a reduction.
  – At each reduction step, a specific substring matching the body of a production is replaced by the nonterminal at the head of the production.

Reduction sequence for id * id:
  id * id  =>  F * id  =>  T * id  =>  T * F  =>  T  =>  E
Bottom-Up Parsing
● A reduction is the reverse of a derivation.
● Therefore, the goal of bottom-up parsing is to construct a derivation in reverse.

  id * id  =>  F * id  =>  T * id  =>  T * F  =>  T  =>  E

● This, in fact, is a rightmost derivation (read right to left).
● Thus, scan the input from Left, and construct a Rightmost derivation in reverse.
Handle Pruning
● A handle is a substring that matches the body of a production.
● Reduction of a handle represents one step in the reverse of a rightmost derivation.

  id * id  =>  F * id  =>  T * id  =>  T * F  =>  T  =>  E

Right-Sentential Form   Handle   Reducing Production
id1 * id2               id1      F -> id
F * id2                 F        T -> F
T * id2                 id2      F -> id
T * F                   T * F    T -> T * F
T                       T        E -> T

We say a handle rather than the handle because the grammar could be ambiguous.
Shift-Reduce Parsing
● A type of bottom-up parsing
● Uses a stack (to hold grammar symbols)
● The handle appears at the stack top prior to pruning.

Stack       Input        Action
$           id1 * id2 $  shift
$ id1       * id2 $      reduce by F -> id
$ F         * id2 $      reduce by T -> F
$ T         * id2 $      shift
$ T *       id2 $        shift
$ T * id2   $            reduce by F -> id
$ T * F     $            reduce by T -> T * F
$ T         $            reduce by E -> T
$ E         $            accept
Shift-Reduce Parsing
● A type of bottom-up parsing
● Uses a stack (to hold grammar symbols)
● The handle appears at the stack top prior to pruning.
1. Initially, the stack is empty ($) and string w is on the input (w$).
2. During the left-to-right input scan, the parser shifts zero or more input symbols onto the stack.
3. The parser reduces a string to the head of a production (handle pruning).
4. This cycle is repeated until error or accept (stack contains the start symbol and the input is empty).
Conflicts
● There exist CFGs for which shift-reduce
parsing cannot be used.
● Even with the knowledge of the whole stack
(not only the stack top) and k lookahead
– The parser doesn't know whether to shift (be lazy)
or reduce (be eager) (shift-reduce conflict).
– The parser doesn't know which of the several
reductions to make (reduce-reduce conflict).
Shift-Reduce Conflict
● Stack: $ ... if expr then stmt
● Input: else ... $
– Depending upon what the programmer intended, it
may be correct to reduce if expr then stmt to stmt,
or it may be correct to shift else.
– One may direct the parser to prioritize shift over
reduce (recall longest match rule of lex).
– Shift-Reduce conflict is often not a show-stopper.
Reduce-Reduce Conflict
● Stack: $ ... id ( id
● Input: , id ) ... $
– Consider a language where arrays are accessed as
arr(i, j) and functions are invoked as fun(a, b).
– Lexer may return id for both the array and the function.
– Thus, by looking at the stack top and the input, a
parser cannot deduce whether to reduce the handle as
an array expression or a function call.
– Parser needs to consult the symbol table to deduce the
type of id (semantic analysis).
– Alternatively, lexer may consult the symbol table and
may return different tokens (array and function).
Ambiguity
A film-song example, where Uparwala (“the one above”) is ambiguous:

  Apni to har aah ek tufaan hai
  Uparwala jaan kar anjaan hai...
  (Every sigh of mine is a storm;
   the one above knows, yet acts unaware...)
LR Parsing
● Left-to-right scanning, Rightmost derivation in
reverse.
● Type of bottom-up parsers.
– SLR (Simple LR)
– CLR (Canonical LR)
– LALR (LookAhead LR)
● LR(k) for k symbol lookahead.
– k = 0 and k = 1 are of practical interest.
● Most prevalent in use today.
Why LR?
● LR > LL
● Recognizes almost all programming language
constructs (structure, not semantics).
● Most general non-backtracking shift-reduce
parsing method known.
Simple LR (SLR)
● We saw that a shift-reduce parser looks at the stack and
the next input symbol to decide the action.
● But how does it know whether to shift or reduce?
– In LL, we had a nice parsing table; and we knew what
action to take based on it.
● For instance, if stack contains $ T and the next input
symbol is *, should it shift (anticipating T * F) or reduce
(E → T)?
● The goal, thus, is to build a parsing table similar to LL.
Items and Itemsets
● An LR parser makes shift-reduce decisions by maintaining states to keep track of where we are in a parse.
● For instance, A → XYZ yields four LR(0) items:
  1. A → . XYZ
  2. A → X . YZ
  3. A → XY . Z
  4. A → XYZ .
  (Each of these is an LR(0) item; an itemset == a state.)
● A → ϵ generates a single item A → .
● An item indicates how much of a production the parser has seen so far.
LR(0) Automaton
1. Find sets of LR(0) items.
2. Build the canonical LR(0) collection.
   – Grammar augmentation (new start symbol)
   – CLOSURE (similar in concept to ϵ-closure in FA)
   – GOTO (similar to state transitions in FA)
3. Construct the FA.

Grammar (augmented):
  E' → E
  E → E + T | T
  T → T * F | F
  F → (E) | id

I0 (initial state):
  E' → . E          (kernel item)
  E  → . E + T      (non-kernel items, added by the item closure)
  E  → . T
  T  → . T * F
  T  → . F
  F  → . (E)
  F  → . id

Classwork:
  Find the closure set for T → T * . F
  Find the closure set for F → ( E ) .
LR(0) Automaton
● GOTO(I, X) is the closure of the set of items [A → α X . β] such that [A → α . X β] is in I.
● For instance, GOTO(I0, E) is {E' → E ., E → E . + T}; this itemset is state I1.

I0:                    I1 = GOTO(I0, E):
  E' → . E               E' → E .
  E  → . E + T           E  → E . + T
  E  → . T
  T  → . T * F
  T  → . F
  F  → . (E)
  F  → . id

Classwork:
● Find GOTO(I, +) where I = {E' → E ., E → E . + T}.

  E' → E
  E → E + T | T
  T → T * F | F
  F → (E) | id
LR(0) Automaton
The canonical LR(0) collection for the augmented expression grammar:

I0: E' → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id
I1 = GOTO(I0, E): E' → E., E → E.+T          (accept on $)
I2 = GOTO(I0, T): E → T., T → T.*F
I3 = GOTO(I0, F): T → F.
I4 = GOTO(I0, (): F → (.E), E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id
I5 = GOTO(I0, id): F → id.
I6 = GOTO(I1, +): E → E+.T, T → .T*F, T → .F, F → .(E), F → .id
I7 = GOTO(I2, *): T → T*.F, F → .(E), F → .id
I8 = GOTO(I4, E): F → (E.), E → E.+T
I9 = GOTO(I6, T): E → E+T., T → T.*F
I10 = GOTO(I7, F): T → T*F.
I11 = GOTO(I8, )): F → (E).

Is the automaton complete?
LR(0) Automaton
● Initially, the state is 0 (for I0).
● On seeing input symbol id, the state changes to 5 (for I5).
● On seeing input *, there is no action out of state 5.
SLR Parsing using the Automaton
The stack now contains states like I0, I1, ...

Sr No  Stack      Symbols    Input       Action
1      0          $          id * id $   shift to 5
2      0 5        $ id       * id $      reduce by F -> id
3      0 3        $ F        * id $      reduce by T -> F
4      0 2        $ T        * id $      shift to 7
5      0 2 7      $ T *      id $        shift to 5
6      0 2 7 5    $ T * id   $           reduce by F -> id
7      0 2 7 10   $ T * F    $           reduce by T -> T * F
8      0 2        $ T        $           reduce by E -> T
9      0 1        $ E        $           accept

Homework: Construct such a table for parsing id * id + id.

E' → E
E → E + T | T
T → T * F | F
F → (E) | id
SLR(1) Parsing Table
State  id   +    *    (    )     $       E   T   F
0      s5             s4                 1   2   3
1           s6                   accept
2           r2   s7        r2    r2
3           r4   r4        r4    r4
4      s5             s4                 8   2   3
5           r6   r6        r6    r6
6      s5             s4                     9   3
7      s5             s4                         10
8           s6             s11
9           r1   s7        r1    r1
10          r3   r3        r3    r3
11          r5   r5        r5    r5

E' → E
E → E + T | T
T → T * F | F
F → (E) | id
LR Parsing
let a be the first symbol of w$
push state 0 on the stack
while (true) {
    let s be the state on top of the stack
    if ACTION[s, a] == shift t {
        push t onto the stack
        let a be the next input symbol
    } else if ACTION[s, a] == reduce A → β {
        pop |β| states off the stack
        let state t now be on top of the stack
        push GOTO[t, A] onto the stack
        output the production A → β
    } else if ACTION[s, a] == accept { break }
    else yyerror()
}
Classwork
● Construct LR(0) automaton and SLR(1)
parsing table for the following grammar.

S→AS|b
A→ SA| a

● Run it on string abab.


SLR(1) Parsing Table (revisited)
Look at the SLR(1) parsing table again. Why do we not have a transition out of state 5 on (?
Reduce Entries in the Parsing Table
● Columns for reduce entries are lookaheads.
● Therefore, they need to be in the FOLLOW of the head of the production.
● Thus, if A → α. is the production to be applied (that is, α is being reduced to A), then the lookahead (next input symbol) should be in FOLLOW(A).

I5: F → id .
The reduction F -> id should be applied only if the next input symbol is in FOLLOW(F), which is {+, *, ), $}:

State  id   +    *    (    )    $
5           r6   r6        r6   r6
l-values and r-values

S→L=R|R
L → *R | id
R→L
l-values and r-values
S' → S
S → L = R | R
L → *R | id
R → L

Consider state I2 = {S → L . = R, R → L .}.
● Due to the first item (S → L . = R), ACTION[2, =] is shift 6.
● Due to the second item (R → L .), and because FOLLOW(R) contains =, ACTION[2, =] is reduce R → L.

Thus, there is a shift-reduce conflict. Does that mean the grammar is ambiguous? Not necessarily; in this case, no. However, our SLR parser is not able to handle it.

(figure: the LR(0) automaton for this grammar, states I0 through I9)
LR(0) Automaton and
Shift-Reduce Parsing
● Why can LR(0) automaton be used to make
shift-reduce decisions?
● LR(0) automaton characterizes the strings of
grammar symbols that can appear on the
stack of a shift-reduce parser.
● The stack contents must be a prefix of a right-
sentential form [but not all prefixes are valid].
● If the stack holds β and the rest of the input is x, then a sequence of reductions will take βx to S. Thus, S =>* βx.
Viable Prefixes
● Example
– E =>* F * id => (E) * id
– At various times during the parse, the stack holds
(, (E and (E).
– However, it must not hold (E)*. Why?
– Because (E) is a handle, which must be reduced.
– Thus, (E) is reduced to F before shifting *.
● Thus, not all prefixes of right-sentential forms
can appear on the stack.
● Only those that can appear are viable.
Viable Prefixes
● SLR parsing is based on the fact that LR(0)
automata recognize viable prefixes.
● Item A → β1.β2 is valid for a viable prefix αβ1 if there is a derivation S =>* αAw => αβ1β2w.
● Thus, when αβ1 is on the parsing stack, it suggests we have not yet shifted the whole handle, so shift (not reduce).
  – Assuming β2 ≠ ϵ.
Homework
● Exercises in Section 4.6.6.
LR(1) Parsing
● Lookahead of 1 symbol.
● We will use a similar construction (automaton), but with lookahead.
● This should increase the power of the parser.

S' → S
S → L = R | R
L → *R | id
R → L
LR(1) Parsing
● Lookahead of 1 symbol.
● We will use a similar construction (automaton), but with lookahead.
● This should increase the power of the parser.

S' → S
S → C C
C → c C | d
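The lookaheads in the LR(1) construction come from the closure rule: for an item [A → α.Bβ, a], each production B → γ is added with lookaheads FIRST(βa). A sketch for itemset I0 of this grammar (FIRST sets are hard-coded since the grammar is tiny, and no symbol derives ϵ):

```python
# LR(1) closure for the grammar S' -> S, S -> C C, C -> c C | d,
# computing the lookaheads of itemset I0.
# An item is a tuple (head, body, dot position, lookahead).
GRAMMAR = {"S'": [["S"]], "S": [["C", "C"]], "C": [["c", "C"], ["d"]]}
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}, "c": {"c"}, "d": {"d"}}

def closure(items):
    items = set(items)
    work = list(items)
    while work:
        head, body, dot, la = work.pop()
        if dot == len(body):
            continue
        sym = body[dot]
        if sym not in GRAMMAR:            # terminal after the dot: nothing to add
            continue
        # Lookaheads are FIRST of (rest of body, then current lookahead);
        # since nothing derives epsilon here, only the first symbol matters.
        rest = body[dot + 1:]
        las = FIRST[rest[0]] if rest else {la}
        for prod in GRAMMAR[sym]:
            for b in las:
                item = (sym, tuple(prod), 0, b)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return items

I0 = closure({("S'", ("S",), 0, "$")})
for head, body, dot, la in sorted(I0):
    print(head, "->", " ".join(body[:dot]) + "." + " ".join(body[dot:]), ",", la)
```

The C-items come out with lookaheads c and d, matching the c/d annotations in I0 of the automaton.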
LR(1) Automaton

I0: S' → .S, $;  S → .CC, $;  C → .cC, c/d;  C → .d, c/d
I1: S' → S., $        (accept on $)
I2: S → C.C, $;  C → .cC, $;  C → .d, $
I3: C → c.C, c/d;  C → .cC, c/d;  C → .d, c/d
I4: C → d., c/d
I5: S → CC., $
I6: C → c.C, $;  C → .cC, $;  C → .d, $
I7: C → d., $
I8: C → cC., c/d
I9: C → cC., $

Transitions: I0 -S-> I1, I0 -C-> I2, I0 -c-> I3, I0 -d-> I4;
I2 -C-> I5, I2 -c-> I6, I2 -d-> I7;
I3 -C-> I8, I3 -c-> I3, I3 -d-> I4;
I6 -C-> I9, I6 -c-> I6, I6 -d-> I7.

Same LR(0) item, but different LR(1) items (e.g., I3 and I6).

S' → S
S → C C
C → c C | d
LR(1) Grammars
● Using LR(1) items and GOTO functions, we can build the canonical LR(1) parsing table.
● An LR parser using this parsing table is a canonical LR(1) parser.
● If the parsing table does not have multiple actions in any entry, then the given grammar is an LR(1) grammar.
● Every SLR(1) grammar is also LR(1).
  – SLR(1) < LR(1)
  – The corresponding CLR parser may have more states.
CLR(1) Parsing Table

State |  c     d     $   |  S   C
  0   | s3    s4         |  1   2
  1   |             acc  |
  2   | s6    s7         |      5
  3   | s3    s4         |      8
  4   | r3    r3         |
  5   |             r1   |
  6   | s6    s7         |      9
  7   |             r3   |
  8   | r2    r2         |
  9   |             r2   |

S' → S
S → C C
C → c C | d
LR(1) Automaton

I0: S' → .S, $;  S → .CC, $;  C → .cC, c/d;  C → .d, c/d
I1: S' → S., $        (accept on $)
I2: S → C.C, $;  C → .cC, $;  C → .d, $
I3: C → c.C, c/d;  C → .cC, c/d;  C → .d, c/d
I4: C → d., c/d
I5: S → CC., $
I6: C → c.C, $;  C → .cC, $;  C → .d, $
I7: C → d., $
I8: C → cC., c/d
I9: C → cC., $

● I8 and I9, I4 and I7, I3 and I6 share the same LR(0) cores.
● The corresponding SLR parser has seven states.
● Lookahead makes parsing precise.

S' → S
S → C C
C → c C | d
LALR Parsing
● Can we have the memory efficiency of SLR and the precision of LR(1)?
● For C, SLR would have a few hundred states.
● For C, LR(1) would have a few thousand states.
● How about merging states with the same LR(0) items?
● Knuth invented LR in 1965, but it was considered impractical due to memory requirements.
● Frank DeRemer invented SLR and LALR in 1969 (LALR as part of his PhD thesis).
● YACC generates an LALR parser.
State |  c     d     $   |  S   C
  0   | s3    s4         |  1   2
  1   |             acc  |
  2   | s6    s7         |      5
  3   | s3    s4         |      8
  4   | r3    r3         |
  5   |             r1   |
  6   | s6    s7         |      9
  7   |             r3   |
  8   | r2    r2         |
  9   |             r2   |

● I8 and I9, I4 and I7, I3 and I6 share the same LR(0) cores. The corresponding SLR parser has seven states. Lookahead makes parsing precise.
● The LALR parser mimics the LR parser on correct inputs.
● On erroneous inputs, LALR may proceed with reductions while LR has declared an error.
● However, eventually, LALR is guaranteed to give the error.
LALR(1) Parsing Table (obtained from the CLR(1) table by merging)

State |  c      d      $   |  S    C
  0   | s36    s47         |  1    2
  1   |               acc  |
  2   | s36    s47         |       5
 36   | s36    s47         |       89
 47   | r3     r3     r3   |
  5   |               r1   |
 89   | r2     r2     r2   |

S' → S
S → C C
C → c C | d
State Merging in LALR
● State merging with common kernel items does not produce shift-reduce conflicts.
● A merge may produce a reduce-reduce conflict.

S' → S
S → aAd | bBd | aBe | bAe
A → c
B → c

Merged state:
A → c., d/e
B → c., d/e

● This grammar is LR(1).
● Itemset {[A → c., d], [B → c., e]} is valid for viable prefix ac (due to acd and ace).
● Itemset {[A → c., e], [B → c., d]} is valid for viable prefix bc (due to bcd and bce).
● Neither of these states has a conflict. Their kernel items are the same.
● Their union / merge generates a reduce-reduce conflict.
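The merge and the resulting conflict can be checked mechanically: group itemsets by their LR(0) core, union the lookaheads, and look for two distinct completed items sharing a lookahead. A sketch (the item encoding is ad hoc):

```python
# Reduce-reduce conflict created by LALR merging for the grammar
# S -> aAd | bBd | aBe | bAe, A -> c, B -> c.  The two LR(1) states
# below are conflict-free, share the same LR(0) core, and conflict
# once merged.  An item is (head, body, dot position, lookahead).
def core(itemset):
    """LR(0) core: the items with lookaheads stripped."""
    return frozenset((h, b, d) for h, b, d, _ in itemset)

def reduce_conflicts(itemset):
    """Lookaheads on which two different completed items both ask to reduce."""
    seen, conflicts = {}, set()
    for head, body, dot, la in itemset:
        if dot == len(body):               # completed item: reduce on la
            if la in seen and seen[la] != (head, body):
                conflicts.add(la)
            seen[la] = (head, body)
    return conflicts

ac_state = {("A", ("c",), 1, "d"), ("B", ("c",), 1, "e")}   # valid after 'ac'
bc_state = {("A", ("c",), 1, "e"), ("B", ("c",), 1, "d")}   # valid after 'bc'

assert core(ac_state) == core(bc_state)     # same LR(0) core, so LALR merges them
assert not reduce_conflicts(ac_state)       # each state alone is conflict-free
assert not reduce_conflicts(bc_state)

print(sorted(reduce_conflicts(ac_state | bc_state)))   # -> ['d', 'e']
```

The merged state asks to reduce to both A and B on each of d and e, exactly the reduce-reduce conflict described above.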
Using Ambiguous Grammars
● Ambiguous grammars should be used sparingly.
● They can sometimes be more natural to specify (e.g., expressions).
● Additional rules may be specified to resolve ambiguity.

S' → S
S → iSeS | iS | a
Using Ambiguous Grammars

I0: S' → .S;  S → .iSeS;  S → .iS;  S → .a
I1: S' → S.        (accept on $)
I2: S → i.SeS;  S → i.S;  S → .iSeS;  S → .iS;  S → .a
I3: S → a.
I4: S → iS.eS;  S → iS.
I5: S → iSe.S;  S → .iSeS;  S → .iS;  S → .a
I6: S → iSeS.

Transitions: I0 -S-> I1, I0 -i-> I2, I0 -a-> I3;
I2 -S-> I4, I2 -i-> I2, I2 -a-> I3;
I4 -e-> I5;  I5 -S-> I6, I5 -i-> I2, I5 -a-> I3.

State |  i     e       a    $   |  S
  0   | s2            s3        |  1
  1   |                    acc  |
  2   | s2            s3        |  4
  3   |       r3           r3   |
  4   |       s5/r2        r2   |
  5   | s2            s3        |  6
  6   |       r1           r1   |

S' → S
S → iSeS | iS | a
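The s5/r2 conflict in ACTION[4, e] is the dangling-else ambiguity. Resolving it in favor of the shift (the usual rule, and yacc's default for shift-reduce conflicts) attaches each else to the nearest unmatched if. A sketch of the resolved parser, using the state numbers from the table above:

```python
# The dangling-else parser with ACTION[4, e] resolved to shift.
# Productions: 1: S -> iSeS, 2: S -> iS, 3: S -> a.
PRODS = {1: ("S", 4, "S -> iSeS"), 2: ("S", 2, "S -> iS"), 3: ("S", 1, "S -> a")}
ACTION = {(0, "i"): "s2", (0, "a"): "s3", (1, "$"): "acc",
          (2, "i"): "s2", (2, "a"): "s3",
          (3, "e"): "r3", (3, "$"): "r3",
          (4, "e"): "s5",                 # was s5/r2: resolved to shift
          (4, "$"): "r2",
          (5, "i"): "s2", (5, "a"): "s3",
          (6, "e"): "r1", (6, "$"): "r1"}
GOTO = {(0, "S"): 1, (2, "S"): 4, (5, "S"): 6}

def parse(tokens):
    """Shift-reduce loop; return the reductions in the order applied."""
    stack, i, out = [0], 0, []
    while True:
        act = ACTION[(stack[-1], tokens[i])]
        if act == "acc":
            return out
        if act.startswith("s"):
            stack.append(int(act[1:]))
            i += 1
        else:
            head, body_len, text = PRODS[int(act[1:])]
            del stack[len(stack) - body_len:]
            stack.append(GOTO[(stack[-1], head)])
            out.append(text)

print(parse(list("iiaea") + ["$"]))
# -> ['S -> a', 'S -> a', 'S -> iSeS', 'S -> iS']
```

On i i a e a, the inner if takes the else: S → iSeS is reduced before the outer S → iS, which is exactly the interpretation the shift preference encodes.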
Summary
● Precedence / Associativity
● Parse Trees
● Left Recursion
● Left Factoring
● Top-Down Parsing
● LL(1) Grammars
● Bottom-Up Parsing
● Shift-Reduce Parsers
● LR(0), SLR
● LR(1), LALR