Topic 4

This document discusses syntax analysis, which is the second phase of compilation. It examines grammars, context-free grammars, derivation trees, and pushdown machines. Grammars are used to formally specify languages and consist of terminal symbols, non-terminal symbols, and rewriting rules. Context-free grammars restrict rules to have a single non-terminal on the left-hand side. Derivation trees represent grammars and show how strings are generated. Pushdown machines are used for syntax analysis similar to how finite state machines are used for lexical analysis.

Uploaded by

aeisyah

Topic 4: Syntax & Semantic Analysis

Text Book: Compiler Design: Theory, Tools & Examples (JAVA Edition)
by Bergmann, Seth D. (Chapter 3)
Introduction
• Syntax analysis: the second phase of the compiler
• INPUT to this phase: a stream of tokens from lexical analysis
• The input is checked for proper syntax
• If there is an error, the compiler should generate an informative message
• OUTPUT of this phase:
• Stream of atoms
• Syntax trees
Introduction
• Atom: a primitive operation found in most computer architectures, which can be implemented using only a few machine language operations
• Syntax tree: a data structure in which the interior nodes are operations and the leaves are operands
Introduction
• The parser not only checks syntax but is also capable of producing output
• This capability is called syntax-directed translation
1. Grammars
• Previously learned formal methods for specifying a language:
1. Finite State Machine
2. Regular Expression
• A third way to specify a language and implement the syntax analysis phase: a grammar
• Grammar: a list of rules that generates all the strings of a language and generates no strings outside the language
• A grammar consists of:
1. Input/terminal symbols: usually lowercase letters (a, b) and also numbers
2. Non-terminal symbols: usually uppercase letters (A, B, S); one is designated the starting non-terminal, usually S
3. Rewriting rules: define how strings may be generated; each rule is written with a rewrite arrow -->
1. Grammars
• How does a grammar specify a language?
1. Start with the starting non-terminal
2. Choose any applicable rewriting rule and apply rules repeatedly to produce a sentential form
3. If the sentential form contains no non-terminal symbols, then that string is in the language of the grammar
4. If G is a grammar, the language specified by this grammar is written L(G)
• Derivation: a sequence of rewriting rules applied to the starting non-terminal, ending with a string of terminals
1. Grammars
Grammar G1, consists of four rules, the terminal symbols {0,1},
and the starting non-terminal, S.

G1:
1. S → 0S0
2. S → 1S1
3. S → 0
4. S → 1

An example of a derivation using this grammar is:


S ⇒ 0S0 ⇒ 00S00 ⇒ 001S100 ⇒ 0010100
Thus, 0010100 is in L(G1), i.e. it is one of the strings in the
language of grammar G1.
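The derivation of 0010100 can be replayed mechanically. The following is a minimal Java sketch, not part of the original slides: the class name G1Derivation and the method derive are hypothetical, and it assumes each sentential form of G1 contains at most one S, which holds for this grammar.

```java
import java.util.List;

public class G1Derivation {
    // Right sides of the four rules of G1, indexed by rule number - 1.
    static final String[] RULES = {"0S0", "1S1", "0", "1"};

    // Applies the given rule numbers in order, each time rewriting the single S.
    static String derive(List<Integer> ruleNumbers) {
        String form = "S";
        for (int n : ruleNumbers) {
            int pos = form.indexOf('S');
            if (pos < 0) {
                throw new IllegalStateException("no non-terminal left to rewrite");
            }
            form = form.substring(0, pos) + RULES[n - 1] + form.substring(pos + 1);
        }
        return form;
    }

    public static void main(String[] args) {
        // Rules 1, 1, 2, 3 reproduce the derivation shown on the slide.
        System.out.println(derive(List.of(1, 1, 2, 3)));
    }
}
```

Applying rules 1, 1, 2, 3 in that order follows the same steps as the derivation above.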
1. Grammars
G2:
1. S → ASB
2. S → ε
3. A → a
4. B → b
S ⇒ ASB ⇒ AASBB ⇒ AaSBB ⇒ AaBB ⇒ AaBb ⇒ Aabb
⇒ aabb

➢ aabb is in L(G2).
➢ G2 specifies the set of all strings of a’s and b’s which
contain the same number of a’s as b’s and in which all
the a’s precede all the b’s
➢ The null string is permitted in a rewriting rule.

L(G2) = {ε, ab, aabb, aaabbb, aaaabbbb, aaaaabbbbb, …}
= {aⁿbⁿ | n ≥ 0}
1. Grammars

Two grammars, G1 and G2, are said to be equivalent if L(G1) = L(G2), i.e., they specify the same language.
2. Classes of Grammars
In referring to grammars, the following symbols will have particular meanings:

Symbols      Meaning
A, B, C, …   Single non-terminal
a, b, c, …   Single terminal
…, X, Y, Z   Single terminal or non-terminal
…, x, y, z   String of terminals
α, β, γ, …   String of terminals and non-terminals
2. Classes of Grammars
Chomsky’s classification of grammars:

0. Unrestricted
• No restrictions on rewriting rules
• Each rule can contain an arbitrary string of terminals/non-terminals on both sides of the arrow
• Example: SaB → cS
• ε permitted on the right side

1. Context-Sensitive
• Each rule must be of this form: αAγ → αβγ
• Example: SaB → caB (the left context is null)

2. Context-Free
• Each rule must be of this form: A → α
• Most programming languages are of this type
• Example: A → aABb

3. Right Linear
• Each rule must be of this form: A → aB or A → a
• Can be used to define lexical items, e.g. keywords and identifiers
2. Classes of Grammars

[Figure: Classes of Grammars, drawn as nested circles. From outermost to innermost: Unrestricted, Context-Sensitive, Context-Free, Right Linear]

➢ The figure depicts the classes of grammars as circles.
➢ All points in a circle belong to the class of that circle.
2. Classes of Grammars
Exercise:
Classify each of the following grammar rules according to
Chomsky’s classification of grammars (in each case give the
largest – i.e., most restricted – classification type that applies):

14
3. Context-Free Grammar
• A CFG can be represented in Backus-Naur Form (BNF).
• Non-terminals are enclosed in angle brackets < >
• The rewrite arrow is written as two colons and an equal sign ::=
• Multiple definitions can be written on one line using the alternation vertical bar |
• Examples:
• S --> aSb is written as <S> ::= a<S>b
• S --> aSb together with S --> ε is written as <S> ::= a<S>b | ε
3. Context-Free Grammar
Derivation Tree
• A tree which:
• Interior node: nonterminal in sentential form
• Leaf node: terminal symbol in the derived string

G2:
1. S → ASB
2. S → ε
3. A → a
4. B → b

[Figure: A Derivation Tree for aaabbb Using Grammar G2]
3. Context-Free Grammar
Ambiguous Context Free Grammar
• CFG is ambiguous when there is more than one derivation
tree for a particular string

G4:
1. Expr → Expr + Expr
2. Expr → Expr ∗ Expr
3. Expr → ( Expr )
4. Expr → var
5. Expr → const

[Figure: Two Different Derivation Trees (left and right) for the String var + var ∗ var using grammar G4]
3. Context-Free Grammar
Left-most derivation
• A derivation in which the rewriting rule is always applied to the left-most non-terminal of the sentential form
• A left-most (or right-most) derivation is a normal form for derivations

G2:
1. S → ASB
2. S → ε
3. A → a
4. B → b

S ⇒ ASB ⇒ aSB ⇒ aASBB ⇒ aaSBB ⇒ aaBB ⇒ aabB ⇒ aabb
3. Context-Free Grammar
• Sample Problem
• Determine whether the following grammar is ambiguous.
If so, show two different derivation trees for the same
string of terminals, and show a left-most derivation
corresponding to each tree.

• 1. S → a S b S
• 2. S → a S
• 3. S → c
19
3. Context-Free Grammar
Solution:

Tree 1 (rule 1 at the root):      Tree 2 (rule 2 at the root):

        S                                 S
     / | | \                             / \
    a  S b  S                           a   S
      / \   |                            / | | \
     a   S  c                           a  S b  S
         |                                 |    |
         c                                 c    c

Tree 1: S ⇒ a S b S ⇒ a a S b S ⇒ a a c b S ⇒ a a c b c
Tree 2: S ⇒ a S ⇒ a a S b S ⇒ a a c b S ⇒ a a c b c

Note that the two derivation trees correspond to two different left-most
derivations, and the grammar is ambiguous.
3. Context-Free Grammar
Exercises:

21
3. Context-Free Grammar
Exercises:

22
4. Pushdown Machine
• A pushdown machine is a type of abstract/theoretic machine
• A pushdown machine can be used for syntax analysis (just as a finite state machine is used for lexical analysis)
4. Pushdown Machine
• A pushdown machine is made up of:
1. A finite set of states, with one designated as the starting state
2. A finite set of input symbols, also known as the input alphabet
3. An infinite stack and a finite set of stack symbols
• Stack symbols may be pushed on top of the stack or removed from the top in a Last-In-First-Out (LIFO) manner
• Stack symbols don’t have to be different from input symbols
• The stack must be initialized with at least 1 stack symbol before the first input symbol is read
4. A state transition function which takes as arguments the current state, the current input symbol and the symbol currently on top of the stack. The result of this function is the new state of the machine
4. Pushdown Machine
• Pushdown Machine is made up of (cont):
5. On each state transition, the machine may advance to next
input symbol or retain input pointer (not advance)
6. On each state transition, the machine may perform one of the
stack operations: push(X) or pop where X is one of the stack
symbols
7. A state transition may include an exit labeled as ‘Accept’ or
‘Reject’ to determine if input string is part of the specified
language

25
4. Pushdown Machine
Example of a pushdown machine for grammar G2

State S1:
       a                      b                    ↵
X      Push(X) Advance S1     Pop Advance S2       Reject
▽      Push(X) Advance S1     Reject               Accept

State S2:
       a                      b                    ↵
X      Reject                 Pop Advance S2       Reject
▽      Reject                 Reject               Accept

• Each table represents one state
• Column headers: input symbols
• Row headers: stack symbols
• End marker ↵: end of the input string
• ▽: bottom of the stack
• Each cell can have:
• Input pointer: advance or retain
• Stack operation: push( ) or pop
• Next state: S1 / S2
• Exits: Accept or Reject

G2:
1. S → ASB
2. S → ε
3. A → a
4. B → b
4. Pushdown Machine
• Sequence of stacks as the pushdown machine for grammar G2 accepts the input string aabb
• The stack is initialized with a stack symbol (▽) that marks the bottom of the stack

Input read:   (initial)    a       a       b       b       ↵
Stack:        ▽            X       X       X       ▽       ▽
(top first)                ▽       X       ▽
                                   ▽
State:        S1           S1      S1      S2      S2      S2 (Accept)

Try these strings:
1. aaabbb
2. aaabb
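The two-state machine for G2 can be simulated directly from its tables. The following is a minimal Java sketch under stated assumptions, not code from the slides: the class and method names are hypothetical, the character '$' stands in for the bottom marker, and '\n' stands in for the end marker.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class G2PushdownMachine {
    // Returns true if the machine exits with Accept, false if it exits with Reject.
    static boolean accepts(String input) {
        Deque<Character> stack = new ArrayDeque<>();
        stack.push('$');                 // '$' stands in for the bottom marker
        int state = 1;                   // 1 = state S1, 2 = state S2
        for (int i = 0; i <= input.length(); i++) {
            char in = i < input.length() ? input.charAt(i) : '\n'; // '\n' = end marker
            char top = stack.peek();
            if (in == '\n') {
                return top == '$';       // Accept only if just the bottom marker remains
            }
            if (state == 1 && in == 'a') {
                stack.push('X');         // Push(X), Advance, stay in S1
            } else if (in == 'b' && top == 'X') {
                stack.pop();             // Pop, Advance, go to (or stay in) S2
                state = 2;
            } else {
                return false;            // every other cell is Reject
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(accepts("aabb"));
        System.out.println(accepts("aaabb"));
    }
}
```

Each pass through the loop corresponds to one cell lookup in the state tables: the current state, input symbol, and top stack symbol select the action.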
4. Pushdown Machine
• Exercise no. 1: Try the input string aaabbb based on the pushdown machine for grammar G2

Sample solution:

Input read:   (initial)   a     a     a     b     b     b
Stack:        ▽           X     X     X     X     X     ▽
(top first)               ▽     X     X     X     ▽
                                ▽     X     ▽
                                      ▽
State:        S1          S1    S1    S1    S2    S2    S2

On the end marker the machine is in S2 with only ▽ on the stack: Accept.
4. Pushdown Machine
• Pushdown machine to accept any string of well-balanced parentheses

State S1:
       (                      )                    ↵
X      Push(X) Advance S1     Pop Advance S1       Reject
▽      Push(X) Advance S1     Reject               Accept

Try these strings:
1. ab)(cd)
2. (a)(bc)(

Example solution for no. 1: in state S1 the input symbol ) leads to Reject.

4. Pushdown Machine
1. Pushdown Machine
2. Pushdown Translator
- A pushdown machine that has an output function
3. Extended Pushdown Machine
- A machine that has a replace operation
- Replace: replaces the top stack symbol with all the symbols in its argument list
4. Extended Pushdown Translator
- A pushdown translator that has a replace function
5. Ambiguities in Programming
Languages
• Ambiguities should be avoided
• HOW?
• Rewrite the grammar
• The new grammar must be equivalent with the original
grammar

Example of ambiguities:

var + var ∗ var

G4:
1. Expr → Expr + Expr
2. Expr → Expr ∗ Expr
3. Expr → ( Expr )
4. Expr → var
5. Expr → const
5. Ambiguities in Programming
Languages
• G4 rewritten into G5:
G4:
1. Expr → Expr + Expr
2. Expr → Expr ∗ Expr
3. Expr → ( Expr )
4. Expr → var
5. Expr → const

G5:
1. Expr → Expr + Term
2. Expr → Term
3. Term → Term ∗ Factor
4. Term → Factor
5. Factor → ( Expr )
6. Factor → var
7. Factor → const
var + var ∗ var
6. Parsing Problem
• Given a grammar and a string of input symbols:
• Check whether the string belongs to the language
• IF YES: determine its structure
• Atoms
• Syntax tree
• IF NO: give an appropriate error message
• Parsing algorithm: solves the parsing problem by using the grammar
• For context-free grammars:
• Top-down algorithm
• Bottom-up algorithm
TOPIC 4: SYNTAX & SEMANTIC ANALYSIS – PART B

TEXT BOOK: COMPILER DESIGN: THEORY, TOOLS & EXAMPLES (JAVA EDITION) BY BERGMANN, SETH D. (CHAPTER 4)
INTRODUCTION

• Parsing problem: given a grammar and an input string, use a parsing algorithm to:
1. Determine if the string is in the language of the grammar
2. Determine its structure
• Parsing algorithms are classified as:
• Top Down Parsing
• Bottom Up Parsing
INTRODUCTION – TOP DOWN PARSING
• Top Down Parsing algorithm: grammar rules are applied from the top of the derivation tree downward.
• Input string: abbbaccb

G8:
1. S → a S b
2. S → b A c
3. A → b S
4. A → a

Derivation Tree (children are indented under their parent):
S
  a
  S
    b
    A
      b
      S
        b
        A
          a
        c
    c
  b
INTRODUCTION – TOP DOWN PARSING
• Start with the starting nonterminal and decide which rule of the grammar should be applied:
• Examine a single input symbol and compare it with the first symbol on the right side of the rules
• Input string: abbbaccb

G8:
1. S → a S b
2. S → b A c
3. A → b S
4. A → a

S ⇒ aSb ⇒ abAcb ⇒ abbScb ⇒ abbbAccb ⇒ abbbaccb
Rules applied, in order: 1, 2, 3, 2, 4
RELATIONS AND CLOSURE

• From a grammar, we can automate the process of producing a parser
• This requires the use of mathematical theories involving sets and relations.
• A relation is a set of ordered pairs
• Each pair may be listed in parentheses and separated by commas
RELATIONS AND CLOSURE

• If R is a relation, then the reflexive transitive closure of R is designated as R*
• R* is a relation made up of the same elements as R, with the following properties:
1) All pairs of R are also in R*
2) If (a,b) and (b,c) are in R*, then (a,c) is in R* [TRANSITIVE]
3) If a is in one of the pairs in R, then (a,a) is in R* [REFLEXIVE]

R1:
(a,b)
(c,d)
(b,a)
(b,c)
(c,c)
RELATIONS AND CLOSURE

Example: Show R1*, the reflexive transitive closure of R1.

Solution:

All pairs of R1:
• (a,b)
• (c,d)
• (b,a)
• (b,c)
• (c,c)

Transitive:
• (a,c)
• (b,d)
• (a,d)

Reflexive:
• (a,a)
• (b,b)
• (d,d)

Try this:
Show the reflexive transitive closure of R2:
(a,b)
(c,d)
(b,c)
(d,a)
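The three closure properties translate into a simple fixed-point loop. The following Java sketch is an illustration only, with hypothetical names (Closure, closure): it assumes element names contain no commas, since each pair is stored as the string "x,y".

```java
import java.util.ArrayList;
import java.util.Set;
import java.util.TreeSet;

public class Closure {
    // Computes the reflexive transitive closure of a relation given as {x, y} pairs.
    static Set<String> closure(String[][] pairs) {
        Set<String> result = new TreeSet<>();
        Set<String> elements = new TreeSet<>();
        for (String[] p : pairs) {
            result.add(p[0] + "," + p[1]);        // property 1: all pairs of R
            elements.add(p[0]);
            elements.add(p[1]);
        }
        for (String e : elements) {
            result.add(e + "," + e);              // property 3: reflexive pairs
        }
        boolean changed = true;
        while (changed) {                         // property 2: repeat until nothing new
            changed = false;
            for (String x : new ArrayList<>(result)) {
                for (String y : new ArrayList<>(result)) {
                    String[] p = x.split(",");
                    String[] q = y.split(",");
                    if (p[1].equals(q[0]) && result.add(p[0] + "," + q[1])) {
                        changed = true;
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] r1 = {{"a", "b"}, {"c", "d"}, {"b", "a"}, {"b", "c"}, {"c", "c"}};
        System.out.println(closure(r1));
    }
}
```

Running it on R1 yields the eleven pairs listed in the solution; the same call can be used to check an answer for R2.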
SIMPLE GRAMMARS

• A grammar is a simple grammar IF
1. Every rule is of the form A → aα, where
• A is any non terminal
• a is any terminal
• α is any string of terminals & non terminals
2. Every pair of rules defining the same non terminal begin with different terminals on the right side of the arrow
SIMPLE GRAMMARS

• Example: consider the following grammars:


G9: G10: G11:
1. S → aSb 1. S → aSb 1. S → aSb
2. S → b 2. S → ε 2. S → a

• Which one is a simple grammar?
• Answer: G9. G10 has an epsilon rule, and in G11 both rules defining S begin with the same terminal, a.
SIMPLE
GRAMMARS
Exercise:
Determine which of the
following grammars are
simple:

10
SELECTION SET

• A parsing algorithm must decide which rule in the grammar has to be applied when parsing a string
• HOW? By using selection sets
• SELECTION SET: the set of input symbols which imply the application of a grammar rule
• In a simple grammar, the selection set of each rule is exactly one terminal symbol: the first one on the right hand side
SELECTION SET

• What is the selection set for each of the rules in G9?

G9:              G11:
1. S → aSb       1. S → aSb
2. S → b         2. S → a

• G9, Rule 1: { a }
• G9, Rule 2: { b }

• In top down parsing, rules defining the same non terminal must have disjoint (non-intersecting) selection sets. In G11 both rules have the selection set { a }, so G11 can’t be used.
SELECTION SET

• Example: abbaddd

G12:              Selection Sets
1. S → a b S d    1. { a }
2. S → b a S d    2. { b }
3. S → d          3. { d }

Input read    Rule applied    Sentential form
a             rule 1          a b S d
abb           rule 2          a b b a S d d
abbad         rule 3          a b b a d d d
EXTENDED PUSHDOWN MACHINE CONSTRUCTION FOR SIMPLE GRAMMARS
1. Build a table with:
a. Each column labeled by a terminal symbol (and the end marker ↵)
b. Each row labeled by a non terminal or terminal (and the bottom marker ▽)
2. For each grammar rule of the form A → aα, fill in the cell in row A and column a with: Rep(αʳa), Retain, where αʳ is α reversed
3. For each terminal symbol a, fill in the cell in row a and column a with Pop, Advance
4. Fill in the cell in row ▽ and column ↵ with Accept
5. Fill in all other cells with Reject
6. Initialize the stack with ▽ and the starting non terminal
EXTENDED PUSHDOWN MACHINE CONSTRUCTION FOR SIMPLE GRAMMARS

• Build an extended pushdown machine for G13:

G13:
1. S → aSB
2. S → b
3. B → a
4. B → bBa

       a                    b                    ↵
S      Rep(BSa) Retain      Rep(b) Retain        Reject
B      Rep(a) Retain        Rep(aBb) Retain      Reject
a      Pop Advance          Reject               Reject
b      Reject               Pop Advance          Reject
▽      Reject               Reject               Accept
EXTENDED PUSHDOWN MACHINE CONSTRUCTION FOR SIMPLE GRAMMARS

• Once each cell of the pushdown machine table has been filled in, the machine works as follows:
• When the top stack symbol is a non terminal, replace it with the symbols on the right side of the selected rule in reverse order and retain the input pointer
• When the top stack symbol is a terminal, check that the current input symbol matches that stack symbol.
• If yes -> pop the stack and advance the input pointer
• If no -> reject the input string
• When the end of the input string is encountered:
• The stack should be empty (except for ▽) in order for the string to be accepted.
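The behaviour described above can be sketched as a table-driven loop. The following Java sketch for grammar G13 is an illustration under stated assumptions, not code from the slides: the names G13Machine and parse are hypothetical, '$' stands in for the bottom marker, '\n' stands in for the end marker, and the table cells are written as if-tests rather than stored in an array.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class G13Machine {
    // Returns true on Accept and false on Reject for the one-state machine of G13.
    static boolean parse(String input) {
        String in = input + "\n";                 // '\n' plays the end marker
        Deque<Character> stack = new ArrayDeque<>();
        stack.push('$');                          // '$' plays the bottom marker
        stack.push('S');                          // starting non terminal
        int pos = 0;
        while (true) {
            char top = stack.peek();
            char c = in.charAt(pos);
            if (top == '$') {
                return c == '\n';                 // Accept only at the end of input
            }
            String rep = null;                    // argument list of Rep(...), if any
            if (top == 'S' && c == 'a') rep = "BSa";
            else if (top == 'S' && c == 'b') rep = "b";
            else if (top == 'B' && c == 'a') rep = "a";
            else if (top == 'B' && c == 'b') rep = "aBb";

            if (rep != null) {                    // Rep(...), Retain
                stack.pop();
                for (char s : rep.toCharArray()) {
                    stack.push(s);                // last symbol listed ends up on top
                }
            } else if ((top == 'a' || top == 'b') && top == c) {
                stack.pop();                      // Pop, Advance
                pos++;
            } else {
                return false;                     // all other cells are Reject
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("aba"));
    }
}
```

Because every Rep cell retains the input pointer and pushes a terminal that matches the current input, each replace is immediately followed by a Pop, Advance step.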
EXTENDED PUSHDOWN MACHINE CONSTRUCTION FOR SIMPLE GRAMMARS

• Show the sequence of stacks for the input string: aba

Current input:   a      a      b      b      a      a      ↵
Stack:           S  →   a  →   S  →   b  →   B  →   a  →   ▽  → Accept
(top first)      ▽      S      B      B      ▽      ▽
                        B      ▽      ▽
                        ▽

Show the sequence of stacks for the input strings:
1. aab↵
2. abbaa↵
EXTENDED PUSHDOWN MACHINE CONSTRUCTION FOR SIMPLE GRAMMARS

Exercise:
Construct the extended one-state pushdown machine to accept the language of each grammar:

Try to parse:
1. aab
2. abba
PARSER IMPLEMENTATION

• From a grammar, we could write a program that accepts any string in the language of that grammar and rejects any string not in the language of that grammar.
• There is software that can automatically generate a parser from a grammar. This kind of software is called a compiler-compiler.
• An example of a compiler-compiler is SableCC
RECURSIVE DESCENT PARSER FOR SIMPLE GRAMMARS
• A second way of implementing a parser for simple grammars is to use a methodology known as recursive descent.
• Recursive descent parser: the parser is written in a traditional programming language such as Java or C++
• A method is written for each non terminal in the grammar:
• it handles non terminals on the right side by calling the corresponding method
• it handles terminals by reading another input symbol
RECURSIVE DESCENT PARSER FOR SIMPLE GRAMMARS
G13:
1. S → aSB
2. S → b
3. B → a
4. B → bBa

void S()
{
    if (inp=='a')              // apply rule 1
    {
        inp = getInp();
        S();
        B();
    }                          // end rule 1
    else if (inp=='b')
        inp = getInp();        // apply rule 2
    else
        reject();
}

void B()
{
    if (inp=='a')
        inp = getInp();        // apply rule 3
    else if (inp=='b')         // apply rule 4
    {
        inp = getInp();
        B();
        if (inp=='a')
            inp = getInp();
        else
            reject();
    }                          // end rule 4
    else
        reject();
}
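The methods S() and B() rely on inp, getInp() and reject(), which the slide does not define. The following self-contained Java sketch shows one possible way to supply them; it is an assumption, not the textbook's implementation. Here reject() is modelled by throwing a hypothetical Reject exception, '\n' is used as the end marker, and the wrapper accepts() is a hypothetical helper.

```java
public class G13RecursiveDescent {
    // Hypothetical exception standing in for the slide's reject().
    static class Reject extends RuntimeException {}

    private final String input;   // input string followed by the end marker '\n'
    private int pos = 0;
    private char inp;             // current input symbol

    G13RecursiveDescent(String s) {
        input = s + "\n";
        inp = getInp();
    }

    // Reads the next input symbol; only called after a terminal has been matched.
    private char getInp() {
        return input.charAt(pos++);
    }

    void S() {
        if (inp == 'a') {              // apply rule 1: S -> aSB
            inp = getInp();
            S();
            B();
        } else if (inp == 'b') {       // apply rule 2: S -> b
            inp = getInp();
        } else {
            throw new Reject();
        }
    }

    void B() {
        if (inp == 'a') {              // apply rule 3: B -> a
            inp = getInp();
        } else if (inp == 'b') {       // apply rule 4: B -> bBa
            inp = getInp();
            B();
            if (inp == 'a') inp = getInp();
            else throw new Reject();
        } else {
            throw new Reject();
        }
    }

    // Accepts only if S() succeeds and the whole input has been consumed.
    static boolean accepts(String s) {
        G13RecursiveDescent p = new G13RecursiveDescent(s);
        try {
            p.S();
            return p.inp == '\n';
        } catch (Reject e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("aba"));
    }
}
```

The final check for the end marker matters: without it, a string such as abab would be accepted after its prefix aba had been parsed.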
QUASI-SIMPLE GRAMMARS

• A quasi-simple grammar is a grammar which obeys the restrictions of simple grammars, but may also contain rules of the form N → ε, where
• N is any non terminal
• ε is epsilon, the empty string
• This holds as long as all rules defining the same nonterminal have disjoint selection sets.
QUASI-SIMPLE GRAMMARS

• For example, G14 is a quasi-simple grammar

G14:
1. S → a A S
2. S → b
3. A → c A S
4. A → ε

• In order to do a top-down parse for this grammar, we again have to find the selection set for each rule.
• To find the selection set for ε rules, we need to find the follow set.
FOLLOW SETS

1. For a nonterminal A, Fol(A) = the set of all terminals that can appear immediately to the right of A in some partial derivation.
2. The derivation starts from S↵, where S is the starting non terminal.
3. If A is the rightmost symbol in a derivation, then the endmarker (↵) is in Fol(A).
4. ε is never in a follow set.
FOLLOW SETS

• For grammar G14, the follow sets for S and A are:

G14:
1. S → a A S
2. S → b
3. A → c A S
4. A → ε

S↵ ⇒ aAS↵ ⇒ acASS↵ ⇒ acASaAS↵
S↵ ⇒ aAS↵ ⇒ acASS↵ ⇒ acASb↵
Fol(S) = {a,b,↵}

S↵ ⇒ aAS↵ ⇒ aAaAS↵
S↵ ⇒ aAS↵ ⇒ aAb↵
Fol(A) = {a,b}
QUASI-SIMPLE GRAMMARS: SELECTION
SET
• The selection set for an ε rule is simply the follow set
of the nonterminal on the left side of the arrow.
G14:
1. S → a A S Fol(S) = {a,b,↵}
2. S → b
3. A → c A S Fol(A) = {a,b}
4. A → ε

• In G14, the selection set for rule 4 is:


• Sel(4) = Fol(A) = {a,b}

26
QUASI-SIMPLE GRAMMAR

• Example: acbb

G14: Selection Sets:


1. S → a A S 1. { a }
2. S → b 2. { b }
3. A → c A S 3. { c }
4. A → ε 4. { a,b }

27
QUASI-SIMPLE GRAMMAR
• Example: acbb

Input read    Rule applied    Sentential form
a             rule 1          a A S
ac            rule 3          a c A S S
acb           rule 4          a c S S        (A ⇒ ε, because b is in Sel(4))
acb           rule 2          a c b S
acbb          rule 2          a c b b
EXTENDED PUSHDOWN MACHINES CONSTRUCTION FOR QUASI-SIMPLE GRAMMARS
1. Build a table with:
a. Each column labeled by a terminal symbol (and the endmarker ↵)
b. Each row labeled by a non terminal or terminal (and the bottom marker ▽)
2. For each grammar rule of the form A → aα, fill in the cell in row A and column a with: Rep(αʳa), Retain, where αʳ is α reversed
3. For each terminal symbol a, fill in the cell in row a and column a with Pop, Advance
4. Fill in the cell in row ▽ and column ↵ with Accept
5. For each ε rule in the grammar, fill in the cells of the row corresponding to the nonterminal on the left side of the arrow and the columns corresponding to elements of the follow set of that nonterminal. Fill in these cells with Pop, Retain
6. Fill in all other cells with Reject
7. Initialize the stack with ▽ and the starting non terminal
EXTENDED PUSHDOWN MACHINES CONSTRUCTION FOR QUASI-SIMPLE GRAMMARS

       a                   b                 c                   ↵
S      Rep(SAa) Retain     Rep(b) Retain     Reject              Reject
A      Pop Retain          Pop Retain        Rep(SAc) Retain     Reject
a      Pop Advance         Reject            Reject              Reject
b      Reject              Pop Advance       Reject              Reject
c      Reject              Reject            Pop Advance         Reject
▽      Reject              Reject            Reject              Accept

Initial stack: S on top of ▽

G14:                 Selection sets:
1. S → a A S         Sel(1) = {a}
2. S → b             Sel(2) = {b}
3. A → c A S         Sel(3) = {c}
4. A → ε             Sel(4) = {a,b}

Parse the following:
1. ab↵
2. acbb↵
3. aab↵

Fadzlin Ahmadon - UiTM Jasin
RECURSIVE DESCENT PARSER FOR QUASI-SIMPLE GRAMMARS
• Recursive descent parsers for quasi-simple grammars are similar to those for simple grammars.
• The only difference is that if the current input symbol is in the selection set of an ε rule, we simply return to the calling method without reading any input.
RECURSIVE DESCENT PARSER FOR QUASI-SIMPLE GRAMMARS

void S ()
{
    if (inp=='a')              // apply rule 1
    {
        inp = getInp();
        A ();
        S ();
    }                          // end rule 1
    else if (inp=='b')
        inp = getInp();        // apply rule 2
    else
        reject();
}

void A ()
{
    if (inp=='c')              // apply rule 3
    {
        inp = getInp();
        A();
        S();
    }                          // end rule 3
    else if (inp=='a'||inp=='b')
        ;                      // apply rule 4
    else
        reject();
}
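To see the ε rule in action, the two methods can be wrapped in a small runnable class. The following Java sketch is one possible arrangement and not the textbook's code: the Reject exception, the '\n' end marker, and the accepts() helper are assumptions introduced for illustration.

```java
public class G14RecursiveDescent {
    // Hypothetical exception standing in for the slide's reject().
    static class Reject extends RuntimeException {}

    private final String input;   // input string followed by the end marker '\n'
    private int pos = 0;
    private char inp;             // current input symbol

    G14RecursiveDescent(String s) {
        input = s + "\n";
        inp = getInp();
    }

    private char getInp() {
        return input.charAt(pos++);
    }

    void S() {
        if (inp == 'a') {                      // rule 1: S -> aAS
            inp = getInp();
            A();
            S();
        } else if (inp == 'b') {               // rule 2: S -> b
            inp = getInp();
        } else {
            throw new Reject();
        }
    }

    void A() {
        if (inp == 'c') {                      // rule 3: A -> cAS
            inp = getInp();
            A();
            S();
        } else if (inp == 'a' || inp == 'b') { // rule 4: A -> epsilon, Sel(4) = {a,b}
            return;                            // return without reading any input
        } else {
            throw new Reject();
        }
    }

    // Accepts only if S() succeeds and the whole input has been consumed.
    static boolean accepts(String s) {
        G14RecursiveDescent p = new G14RecursiveDescent(s);
        try {
            p.S();
            return p.inp == '\n';
        } catch (Reject e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("acbb"));
    }
}
```

The three strings from the pushdown machine exercise (ab, acbb, aab) can be passed to accepts() to compare against a hand trace.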
EXERCISE

• Find the selection sets for the following grammar. Is the grammar quasi-simple? If so, show a pushdown machine and a recursive descent parser (show methods S() and A() only) corresponding to this grammar.

1. S → bAb
2. S → a
3. A → ε
4. A → aSa
LL(1) GRAMMARS

• Grammars that can be parsed top down:
• Simple Grammar (A → aα)
• Quasi-simple Grammar (A → aα) with (N → ε)
• LL(1) Grammar (A → α)
• Called an LL(1) grammar because:
• The parser finds a left-most derivation when scanning the input from left to right, looking ahead no more than one input symbol.
• The grammar must be such that any two rules defining the same non terminal have disjoint selection sets.
LL(1) GRAMMARS

• As with simple and quasi-simple grammars, we can construct a one-state pushdown machine parser or a recursive descent parser for an LL(1) grammar.
• We have to find the selection sets for the LL(1) grammar before we can construct the pushdown machine / recursive descent parser
• There are 12 steps that we have to do to find the selection sets
LL(1) GRAMMARS

G15:
1. S → ABc
2. A → bA
3. A → ε
4. B → c

Step 1.
Find all nullable rules and nullable nonterminals:

a. Remove, temporarily, all rules containing a terminal.
b. All ε rules are nullable rules.
c. The nonterminal defined in a nullable rule is a nullable nonterminal.
d. All rules of the form A → B C D ..., where B, C, D, ... are all nullable nonterminals, are nullable rules, and the nonterminals they define are nullable nonterminals.
e. A nonterminal is nullable if ε can be derived from it, and a rule is nullable if ε can be derived from its right side.

For G15: Nullable rules: rule 3; Nullable nonterminals: A
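Step 1 can be sketched as a loop that keeps adding nullable nonterminals until nothing changes. The following Java sketch is an illustration with hypothetical names (Nullable, nullableNonterminals); it assumes every grammar symbol is a single character, nonterminals are upper-case letters, and ε is written as the empty string.

```java
import java.util.Set;
import java.util.TreeSet;

public class Nullable {
    // Each rule is {left side, right side}; an empty right side is an epsilon rule.
    static Set<Character> nullableNonterminals(String[][] rules) {
        Set<Character> nullable = new TreeSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] r : rules) {
                boolean allNullable = true;
                for (char sym : r[1].toCharArray()) {
                    // Terminals are never in the set, so rules with a terminal fail here.
                    if (!nullable.contains(sym)) {
                        allNullable = false;
                        break;
                    }
                }
                if (allNullable && nullable.add(r[0].charAt(0))) {
                    changed = true;
                }
            }
        }
        return nullable;
    }

    public static void main(String[] args) {
        String[][] g15 = {{"S", "ABc"}, {"A", "bA"}, {"A", ""}, {"B", "c"}};
        System.out.println(nullableNonterminals(g15));
    }
}
```

For G15 only the ε rule for A has a right side made up entirely of nullable symbols, which matches the result above.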


LL(1) GRAMMARS

Step 2.
Compute the relation “Begins Directly With” for each nonterminal:

A BDW X if there is a rule A → α X β such that
• α is a nullable string (a string of nullable nonterminals).
• A represents a nonterminal.
• X represents a terminal or nonterminal.
• β represents any string of terminals and nonterminals.

G15:
1. S → ABc
2. A → bA
3. A → ε
4. B → c

For G15 –
S BDW A (from rule 1)
S BDW B (also from rule 1, because A is nullable)
A BDW b (from rule 2)
B BDW c (from rule 4)
LL(1) GRAMMARS

Step 3.
Compute the relation “Begins With”:

a. X BW Y if there is a string beginning with Y that can be derived from X.
b. BW is the reflexive transitive closure of BDW. In addition, BW should contain pairs of the form a BW a for each terminal a in the grammar.

G15:
1. S → ABc
2. A → bA
3. A → ε
4. B → c

For G15 –
From BDW:
S BW A
S BW B
A BW b
B BW c

Transitive:
S BW b
S BW c

Reflexive:
S BW S
A BW A
B BW B
b BW b
c BW c
LL(1) GRAMMARS

Step 4.
Compute the set of terminals "First(x)" for each symbol x in the grammar.

a. At this point, we can find the set of all terminals which can begin a sentential form when starting with a given symbol of the grammar.
b. First(A) = set of all terminals b, such that A BW b for each nonterminal A.
c. First(t) = {t} for each terminal.

G15:
1. S → ABc
2. A → bA
3. A → ε
4. B → c

For G15 –
First(S) = {b,c}
First(A) = {b}
First(B) = {c}
First(b) = {b}
First(c) = {c}
LL(1) GRAMMARS

Step 5.
Compute "First" of the right side of each rule:
a. Compute the set of terminals which can begin a sentential form derivable from the right side of each rule.
First(XYZ...) = First(X)
∪ First(Y) if X is nullable
∪ First(Z) if Y is also nullable . . .
b. Find the union of the First(x) sets for each symbol on the right side of a rule, but stop when reaching a non-nullable symbol.

G15:
1. S → ABc
2. A → bA
3. A → ε
4. B → c

For G15 –
1. First(ABc) = First(A) ∪ First(B) = {b,c} (because A is nullable)
2. First(bA) = {b}
3. First(ε) = {}
4. First(c) = {c}

* If the grammar contains no nullable rules, skip to step 12 at this point.
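Step 5 is a left-to-right scan of the right side that stops at the first non-nullable symbol. The following Java sketch illustrates this under stated assumptions: the names FirstOfRightSide and first are hypothetical, symbols are single characters, and the caller supplies the First(x) sets from Step 4 (for terminals as well as nonterminals) and the nullable set from Step 1.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class FirstOfRightSide {
    // Union of First(x) over the symbols of rhs, stopping after the first non-nullable one.
    static Set<Character> first(String rhs,
                                Map<Character, Set<Character>> firstOf,
                                Set<Character> nullable) {
        Set<Character> result = new TreeSet<>();
        for (char sym : rhs.toCharArray()) {
            result.addAll(firstOf.get(sym));   // assumes firstOf has an entry for every symbol
            if (!nullable.contains(sym)) {
                break;                         // a non-nullable symbol ends the scan
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Character, Set<Character>> firstOf = Map.of(
            'S', Set.of('b', 'c'),
            'A', Set.of('b'),
            'B', Set.of('c'),
            'b', Set.of('b'),
            'c', Set.of('c'));
        Set<Character> nullable = Set.of('A');
        System.out.println(first("ABc", firstOf, nullable));
    }
}
```

An empty right side (the ε rule) never enters the loop, so its First set is empty, as in the G15 results above.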
LL(1) GRAMMARS

Step 6. Compute the relation “Is Followed Directly By”:

✓ B FDB X if there is a rule of the form A → α B β X γ, where β is a string of nullable nonterminals, α and γ are strings of symbols, X is any symbol, and A and B are nonterminals.

G15:
1. S → ABc
2. A → bA
3. A → ε
4. B → c

✓ For G15 –
A FDB B (from rule 1)
B FDB c (from rule 1)

✓ If B were a nullable nonterminal we would also have A FDB c.
LL(1) GRAMMARS

Step 7.
Compute the relation “Is Direct End Of”:

✓ X DEO A if there is a rule of the form A → α X β, where β is a string of nullable nonterminals, α is a string of symbols, and X is a single grammar symbol.

G15:
1. S → ABc
2. A → bA
3. A → ε
4. B → c

For G15 –
c DEO S (from rule 1)
A DEO A (from rule 2)
b DEO A (from rule 2, since A is nullable)
c DEO B (from rule 4)
LL(1) GRAMMARS

Step 8.
Compute the relation “Is End Of”:

a. X EO Y if there is a string derived from Y that ends with X.
b. EO is the reflexive transitive closure of DEO.
c. EO should contain pairs of the form N EO N for each nullable nonterminal, N, in the grammar.

G15:
1. S → ABc
2. A → bA
3. A → ε
4. B → c

For G15 –
From DEO:
c EO S
A EO A
b EO A
c EO B

(no transitive entries)

Reflexive:
c EO c
S EO S
b EO b
B EO B
LL(1) GRAMMARS

Step 9.
Compute the relation “Is Followed By”:

✓ W FB Z if there is a string derived from S↵ in which W is immediately followed by Z.
✓ If there are symbols X and Y such that
W EO X (Step 8)
X FDB Y (Step 6)
Y BW Z (Step 3)
then
W FB Z

For G15 –
W EO X (Step 8)    X FDB Y (Step 6)    Y BW Z (Step 3)    W FB Z
A EO A             A FDB B             B BW B             A FB B
                                       B BW c             A FB c
b EO A             A FDB B             B BW B             b FB B
                                       B BW c             b FB c
B EO B             B FDB c             c BW c             B FB c
c EO B             B FDB c             c BW c             c FB c
LL(1) GRAMMARS

Step 10.
Extend the FB relation to include the end marker:

✓ A FB ↵ if A EO S, where A represents any non terminal and S represents the starting non terminal.

For G15 –
S FB ↵ because S EO S

✓ There are now seven pairs in the FB relation for grammar G15.
LL(1) GRAMMARS

Step 11.
Compute the Follow Set for each nullable non terminal:

✓ The follow set of any non terminal A is the set of all terminals, t, for which A FB t.
Fol(A) = {t | A FB t}

✓ To find selection sets, we need to find follow sets for nullable non terminals only.

For G15 –
Fol(A) = {c} since A is the only nullable nonterminal and A FB c.
LL(1) GRAMMARS

Step 12.
Compute the selection set for each rule:

For rule i of the form A → α:
if rule i is not a nullable rule, then
Sel(i) = First(α) (from Step 5)
if rule i is a nullable rule, then
Sel(i) = First(α) ∪ Fol(A)

G15:
1. S → ABc
2. A → bA
3. A → ε
4. B → c
LL(1) GRAMMARS

Step 12.
Compute the selection set for each rule:

For G15 –
Sel(1) = First(ABc) = {b,c}
Sel(2) = First(bA) = {b}
Sel(3) = First(ε) U Fol(A) = {} U {c} = {c}
Sel(4) = First(c) = {c}
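Once the selection sets are known, the LL(1) condition is a pairwise disjointness check over rules that define the same nonterminal. The following Java sketch is an illustration with hypothetical names (Ll1Check, isLl1); it assumes single-character symbols and takes the selection sets as already computed.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class Ll1Check {
    // lhs[i] is the nonterminal defined by rule i+1; sel.get(i) is that rule's selection set.
    static boolean isLl1(char[] lhs, List<Set<Character>> sel) {
        for (int i = 0; i < lhs.length; i++) {
            for (int j = i + 1; j < lhs.length; j++) {
                // Two rules for the same nonterminal must not share a selection symbol.
                if (lhs[i] == lhs[j] && !Collections.disjoint(sel.get(i), sel.get(j))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] lhs = {'S', 'A', 'A', 'B'};
        List<Set<Character>> sel = List.of(
            Set.of('b', 'c'), Set.of('b'), Set.of('c'), Set.of('c'));
        System.out.println(isLl1(lhs, sel));
    }
}
```

For G15 the only nonterminal with two rules is A, and Sel(2) = {b} and Sel(3) = {c} do not intersect, so the check succeeds.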
Pushdown machines for LL(1) grammars are constructed exactly as pushdown machines for quasi-simple grammars.

G15:
1. S → ABc
2. A → bA
3. A → ε
4. B → c

       b                    c                    ↵
S      Rep(cBA) Retain      Rep(cBA) Retain      Reject
A      Rep(Ab) Retain       Pop Retain           Reject
B      Reject               Rep(c) Retain        Reject
b      Pop Advance          Reject               Reject
c      Reject               Pop Advance          Reject
▽      Reject               Reject               Accept

Initial stack: S on top of ▽

Pushdown Machine for Grammar G15

S  →  A  →  b  →  A  →  B  →  c  →  c  →  ▽  → Accept
▽     B     A     B     c     c     ▽
      c     B     c     ▽     ▽
      ▽     c     ▽
            ▽

Sequence of Stacks for Machine for Grammar G15. Input bcc↵
Recursive Descent for LL(1) Grammars

void S()
{
    if (inp=='b' || inp=='c')  // apply rule 1
    {
        A ();
        B ();
        if (inp=='c')
            inp=getInp();
        else
            reject();
    }                          // end rule 1
    else
        reject();
}

void A()
{
    if (inp=='b')              // apply rule 2
    {
        inp=getInp();
        A ();
    }                          // end rule 2
    else if (inp=='c')
        ;                      // apply rule 3
    else
        reject();
}

void B()
{
    if (inp=='c')
        inp=getInp();          // apply rule 4
    else
        reject();
}

The construction of the recursive descent parser is exactly as for quasi-simple grammars.
➢ Check for the input symbols in the selection set of each rule of the grammar
Dependency Graph for the Steps in the Algorithm for Finding Selection Set

1. Find nullable rules and nullable non terminals
2. Find “Begins Directly With” relation (BDW).
3. Find “Begins With” relation (BW).
4. Find “First(x)” for each symbol, x.
5. Find “First(n)” for the right side of each rule, n.
6. Find “Followed Directly By” relation (FDB).
7. Find “Is Direct End Of” relation (DEO).
8. Find “Is End Of” relation (EO).
9. Find “Is Followed By” relation (FB).
10. Extend FB to include endmarker.
11. Find Follow Set, Fol(A), for each nullable nonterminal, A.
12. Find Selection Set, Sel(n), for each rule, n.

[Figure: dependency graph connecting steps 1 to 12]
EXERCISES

PARSING ARITHMETIC EXPRESSIONS TOP DOWN
• We now understand how to determine whether a grammar can be parsed top down and how to construct a top down parser
• Now we can begin to study how to parse arithmetic expressions, which are used widely in programming languages
PARSING ARITHMETIC EXPRESSIONS
TOP DOWN
• Check if this grammar is LL(1):
G 5:
1. Expr → Expr + Term
2. Expr → Term
3. Term → Term ∗ Factor
4. Term → Factor
5. Factor → ( Expr )
6. Factor → var

• The twelve steps algorithm


1. Nullable rule & nonterminal
• Nullable rules: none
• Nullable nonterminals: none

65
PARSING ARITHMETIC EXPRESSIONS
TOP DOWN
2. Begins Directly With Relation
• Expr BDW Expr
• Expr BDW Term
• Term BDW Term
• Term BDW Factor
• Factor BDW (
• Factor BDW var

66
PARSING ARITHMETIC EXPRESSIONS
TOP DOWN
3. Begins With Relation
• Expr BW Expr
• Expr BW Term
• Term BW Term
• Term BW Factor
• Factor BW (
• Factor BW var
• Expr BW Factor
• Expr BW (
• Expr BW var
• Term BW (
• Term BW var
• Factor BW Factor
• ( BW (
• var BW var
• ∗ BW ∗
• + BW +
• ) BW )
67
PARSING ARITHMETIC EXPRESSIONS
TOP DOWN
4. First(x)
• First(Expr) = {(,var}
• First(Term) = {(,var}
• First(Factor) = {(,var}

5. First( ) right side of each rule


1) First(Expr + Term) = {(,var}
2) First(Term) = {(,var}
3) First(Term ∗ Factor) = {(,var}
4) First(Factor) = {(,var}
5) First( ( Expr ) ) = {(}
6) First (var) = {var}

68
PARSING ARITHMETIC EXPRESSIONS
TOP DOWN
G 5:
1. Expr → Expr + Term Sel(1) = {(,var}
2. Expr → Term Sel(2) = {(,var}
3. Term → Term ∗ Factor Sel(3) = {(,var}
4. Term → Factor Sel(4) = {(,var}
5. Factor → ( Expr ) Sel(5) = {(}
6. Factor → var Sel(6) = {var}

• This grammar is not LL(1) because rules 1 and 2 define the same nonterminal, Expr, and their selection sets intersect
• This is also true for rules 3 and 4
• Rules 1 and 3 of G5 both have a property known as left recursion:
1. Expr → Expr + Term
3. Term → Term ∗ Factor

• They are in the form: A → Aα


• In a recursive descent parser, a left recursive rule makes the method for A call itself before any input is consumed, which would produce infinite recursion.

• The left recursion can be eliminated by rewriting the grammar with an equivalent grammar that does not have left recursion

• The offending rules might be in the form:

A → Aα
A → β
in which we assume that β is a string of terminals and nonterminals that does not begin with an A.
• The left recursion can be eliminated by introducing a new nonterminal, R, and rewriting the rules as:

Original rules:      Rewritten rules:
A → Aα               A → βR
A → β                R → αR
                     R → ε
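The rewriting above is mechanical, so it can be sketched in a few lines of Java. This is only an illustration, not code from the textbook: the class name LeftRecursion, the method eliminate and the string format of the output are assumptions made here, and the sketch handles only immediate left recursion with non-empty β.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: remove immediate left recursion from the rules of one nonterminal. */
final class LeftRecursion {

    /**
     * a     : the nonterminal being defined, e.g. "Expr"
     * r     : name chosen by the caller for the new nonterminal R, e.g. "Elist"
     * rules : right sides of all rules for a, each given as a list of symbols
     * Returns the rewritten rules as strings of the form "Left -> right side".
     */
    static List<String> eliminate(String a, String r, List<List<String>> rules) {
        List<String> out = new ArrayList<>();
        List<String> tails = new ArrayList<>();
        for (List<String> rhs : rules) {
            if (!rhs.isEmpty() && rhs.get(0).equals(a)) {
                // A -> A alpha   becomes   R -> alpha R
                String alpha = String.join(" ", rhs.subList(1, rhs.size()));
                tails.add(r + " -> " + alpha + " " + r);
            } else {
                // A -> beta      becomes   A -> beta R
                out.add(a + " -> " + String.join(" ", rhs) + " " + r);
            }
        }
        out.addAll(tails);
        out.add(r + " -> \u03B5");   // R -> epsilon
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> exprRules = List.of(
                List.of("Expr", "+", "Term"),   // rule 1 of G5
                List.of("Term"));               // rule 2 of G5
        eliminate("Expr", "Elist", exprRules).forEach(System.out::println);
    }
}
```

Applied to rules 1 and 2 of G5 with Elist as the new nonterminal, the sketch produces the three rules Expr → Term Elist, Elist → + Term Elist and Elist → ε.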
Parsing Arithmetic Expressions Top Down

G5:                            G16:
1. Expr → Expr + Term          1. Expr → Term Elist
2. Expr → Term                 2. Elist → + Term Elist
3. Term → Term ∗ Factor        3. Elist → ε
4. Term → Factor               4. Term → Factor Tlist
5. Factor → ( Expr )           5. Tlist → ∗ Factor Tlist
6. Factor → var                6. Tlist → ε
                               7. Factor → ( Expr )
                               8. Factor → var

Step 1.
Nullable rules: 3, 6
Nullable nonterminals: Elist, Tlist

Step 2.
Expr BDW Term
Elist BDW +
Term BDW Factor
Tlist BDW ∗
Factor BDW (
Factor BDW var
Step 3.
From BDW:
Expr BW Term
Elist BW +
Term BW Factor
Tlist BW ∗
Factor BW (
Factor BW var

Transitive:
Expr BW Factor
Term BW (
Term BW var
Expr BW (
Expr BW var

Reflexive:
Expr BW Expr
Term BW Term
Factor BW Factor
Elist BW Elist
Tlist BW Tlist
+ BW +
∗ BW ∗
( BW (
var BW var
) BW )
Step 4.
First (Expr) = {(,var}
First (Elist) = {+}
First (Term) = {(,var}
First (Tlist) = {∗}
First (Factor) = {(,var}

Step 5.
1. First(Term Elist) = {(,var}
2. First(+ Term Elist) = {+}
3. First(ε) = {}
4. First(Factor Tlist) = {(,var}
5. First(∗ Factor Tlist) = {∗}
6. First(ε) = {}
7. First(( Expr )) = {(}
8. First(var) = {var}
Step 6.
Term FDB Elist
Factor FDB Tlist
Expr FDB )

Step 7.
Elist DEO Expr
Term DEO Expr
Elist DEO Elist
Term DEO Elist
Tlist DEO Term
Factor DEO Term
Tlist DEO Tlist
Factor DEO Tlist
) DEO Factor
var DEO Factor
Step 8.
From DEO:
Elist EO Expr
Term EO Expr
Elist EO Elist
Term EO Elist
Tlist EO Term
Factor EO Term
Tlist EO Tlist
Factor EO Tlist
) EO Factor
var EO Factor

Transitive:
Tlist EO Expr
Tlist EO Elist
Factor EO Expr
Factor EO Elist
) EO Term
) EO Tlist
) EO Expr
) EO Elist
var EO Term
var EO Tlist
var EO Expr
var EO Elist

Reflexive:
Expr EO Expr
Term EO Term
Factor EO Factor
Elist EO Elist
Tlist EO Tlist
) EO )
var EO var
+ EO +
∗ EO ∗
( EO (
Step 9.
For each X in {Tlist, Factor, var, Term, )}:
    X EO Term, Term FDB Elist, Elist BW +          giving Tlist FB +
    X EO Term, Term FDB Elist, Elist BW Elist
For each X in {), var, Factor}:
    X EO Factor, Factor FDB Tlist, Tlist BW ∗
    X EO Factor, Factor FDB Tlist, Tlist BW Tlist
Elist EO Expr, Expr FDB ), ) BW )                  giving Elist FB )
Tlist EO Expr, Expr FDB ), ) BW )                  giving Tlist FB )
Step 10.
Elist FB ↵
Term FB ↵
Expr FB ↵
Tlist FB ↵
Factor FB ↵

Step 11.
Fol (Elist) = {), ↵}
Fol (Tlist) = {+,), ↵}

Step 12.
Sel(1) = First(Term Elist) = {(,var}
Sel(2) = First(+ Term Elist) = {+}
Sel(3) = Fol(Elist) = {), ↵}
Sel(4) = First(Factor Tlist) = {(,var}
Sel(5) = First(∗ Factor Tlist) = {∗}
Sel(6) = Fol(Tlist) = {+,), ↵}
Sel(7) = First( ( Expr ) ) = {(}
Sel(8) = First(var) = {var}
PUSHDOWN MACHINES FOR LL(1)
GRAMMARS
G16:
1. Expr → Term Elist Sel(1) = {(,var}
2. Elist → + Term Elist Sel(2) = {+}
3. Elist → ε Sel(3) = {), ↵}
4. Term → Factor Tlist Sel(4) = {(,var}
5. Tlist → ∗ Factor Tlist Sel(5) = {∗}
6. Tlist → ε Sel(6) = {+,), ↵}
7. Factor → ( Expr ) Sel(7) = {(}
8. Factor → var Sel(8) = {var}

Since all rules defining the same nonterminal (rules 2 and 3, rules 5 and 6, and rules 7 and 8) have disjoint selection sets, the grammar G16 is an LL(1) grammar.
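The disjointness test can be written as a small check over the selection sets. The sketch below is an illustration only: the class name Ll1Check and the array layout are assumptions made here, the sets are copied from the slides, and the endmarker ↵ is written as the escape \u21B5.

```java
import java.util.Arrays;
import java.util.Collections;

/** Sketch: a grammar is LL(1) if rules with the same left side have disjoint selection sets. */
final class Ll1Check {

    /** left[i] is the nonterminal defined by rule i+1; sel[i] is that rule's selection set. */
    static boolean isLl1(String[] left, String[][] sel) {
        for (int i = 0; i < left.length; i++) {
            for (int j = i + 1; j < left.length; j++) {
                boolean sameLeftSide = left[i].equals(left[j]);
                boolean overlap = !Collections.disjoint(Arrays.asList(sel[i]), Arrays.asList(sel[j]));
                if (sameLeftSide && overlap) {
                    return false;   // the parser could not choose between rules i+1 and j+1
                }
            }
        }
        return true;
    }

    static final String END = "\u21B5";   // the endmarker symbol used in the slides

    static final String[] G16_LEFT = {"Expr", "Elist", "Elist", "Term", "Tlist", "Tlist", "Factor", "Factor"};
    static final String[][] G16_SEL = {
        {"(", "var"}, {"+"}, {")", END}, {"(", "var"}, {"*"}, {"+", ")", END}, {"("}, {"var"}};

    static final String[] G5_LEFT = {"Expr", "Expr", "Term", "Term", "Factor", "Factor"};
    static final String[][] G5_SEL = {
        {"(", "var"}, {"(", "var"}, {"(", "var"}, {"(", "var"}, {"("}, {"var"}};

    public static void main(String[] args) {
        System.out.println("G16 is LL(1): " + isLl1(G16_LEFT, G16_SEL));
        System.out.println("G5 is LL(1): " + isLl1(G5_LEFT, G5_SEL));
    }
}
```

With the sets from the slides, the check succeeds for G16 and fails for G5, where rules 1 and 2 (and rules 3 and 4) share the set {(,var}.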
Rows: top stack symbol. Columns: current input symbol.

       | +                          | *                           | (                         | )            | var                       | ↵
Expr   | Reject                     | Reject                      | Rep(Elist,Term), Retain   | Reject       | Rep(Elist,Term), Retain   | Reject
Elist  | Rep(Elist,Term,+), Retain  | Reject                      | Reject                    | Pop, Retain  | Reject                    | Pop, Retain
Term   | Reject                     | Reject                      | Rep(Tlist,Factor), Retain | Reject       | Rep(Tlist,Factor), Retain | Reject
Tlist  | Pop, Retain                | Rep(Tlist,Factor,*), Retain | Reject                    | Pop, Retain  | Reject                    | Pop, Retain
Factor | Reject                     | Reject                      | Rep(),Expr,(), Retain     | Reject       | Rep(var), Retain          | Reject
+      | Pop, Advance               | Reject                      | Reject                    | Reject       | Reject                    | Reject
*      | Reject                     | Pop, Advance                | Reject                    | Reject       | Reject                    | Reject
(      | Reject                     | Reject                      | Pop, Advance              | Reject       | Reject                    | Reject
)      | Reject                     | Reject                      | Reject                    | Pop, Advance | Reject                    | Reject
var    | Reject                     | Reject                      | Reject                    | Reject       | Pop, Advance              | Reject
▽      | Reject                     | Reject                      | Reject                    | Reject       | Reject                    | Accept

Initial stack: ▽ with the starting nonterminal S (here Expr) on top
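A pushdown machine of this kind can be simulated with an explicit stack. The following is a minimal sketch rather than the textbook's implementation: the class name PushdownG16, the token representation (one string per token) and the encoding of the table as rules plus selection sets are assumptions made here. The endmarker ↵ and the bottom marker ▽ are written as Unicode escapes.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Sketch: stack-driven simulation of the pushdown machine for G16. */
final class PushdownG16 {
    static final String END = "\u21B5";      // endmarker appended to every input
    static final String BOTTOM = "\u25BD";   // bottom-of-stack marker

    // Each row: left side, selection set, right side (symbols separated by spaces).
    static final String[][] RULES = {
        {"Expr",   "( var",      "Term Elist"},
        {"Elist",  "+",          "+ Term Elist"},
        {"Elist",  ") " + END,   ""},
        {"Term",   "( var",      "Factor Tlist"},
        {"Tlist",  "*",          "* Factor Tlist"},
        {"Tlist",  "+ ) " + END, ""},
        {"Factor", "(",          "( Expr )"},
        {"Factor", "var",        "var"},
    };
    static final List<String> NONTERMINALS = List.of("Expr", "Elist", "Term", "Tlist", "Factor");

    /** Returns true if the token list (which must end with END) is accepted. */
    static boolean accepts(List<String> tokens) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push(BOTTOM);
        stack.push("Expr");                          // initial stack
        int pos = 0;
        while (pos < tokens.size()) {
            String top = stack.peek();
            String inp = tokens.get(pos);
            if (top.equals(BOTTOM)) {
                return inp.equals(END);              // Accept only on the endmarker
            } else if (NONTERMINALS.contains(top)) {
                String[] rule = findRule(top, inp);
                if (rule == null) return false;      // Reject
                stack.pop();                         // Pop or Rep(...), input retained
                if (!rule[2].isEmpty()) {
                    String[] rhs = rule[2].split(" ");
                    for (int i = rhs.length - 1; i >= 0; i--) stack.push(rhs[i]);
                }
            } else if (top.equals(inp)) {
                stack.pop();                         // Pop, Advance
                pos++;
            } else {
                return false;                        // Reject
            }
        }
        return false;                                // input ran out before Accept
    }

    /** Finds the rule for this nonterminal whose selection set contains the input symbol. */
    static String[] findRule(String left, String inp) {
        for (String[] r : RULES) {
            if (r[0].equals(left) && Arrays.asList(r[1].split(" ")).contains(inp)) return r;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(accepts(List.of("var", "+", "var", "*", "var", END)));
        System.out.println(accepts(List.of("var", "+", END)));
    }
}
```

Pushing the right side in reverse order is what Rep(Elist,Term) expresses in the table: the first symbol of the right side ends up on top of the stack.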
Recursive Descent for LL(1) Grammars

G16:
1. Expr → Term Elist
2. Elist → + Term Elist
3. Elist → ε
4. Term → Factor Tlist
5. Tlist → ∗ Factor Tlist
6. Tlist → ε
7. Factor → ( Expr )
8. Factor → var

void parse ()
{
    inp = getInp();
    Expr ();                     // call start nonterminal
    if (inp=='\r')               // end of string marker
        accept();
    else
        reject();
}

void Expr ()
{
    if (inp=='(' || inp=='v')    // apply rule 1 ('v' stands for the token var)
    {
        Term ();
        Elist ();
    } // end rule 1
    else
        reject();
}

void Elist ()
{
    if (inp=='+')                // apply rule 2
    {
        inp=getInp();
        Term ();
        Elist ();
    } // end rule 2
    else
        if (inp==')' || inp=='\r')
            ;                    // apply rule 3
        else
            reject ();
}
void Term ()
{
    if (inp=='(' || inp=='v')    // apply rule 4
    {
        Factor ();
        Tlist ();
    } // end rule 4
    else
        reject();
}

void Tlist ()
{
    if (inp=='*')                // apply rule 5
    {
        inp=getInp();
        Factor ();
        Tlist ();
    } // end rule 5
    else
        if (inp=='+' || inp==')' || inp=='\r')
            ;                    // apply rule 6
        else
            reject();
}

void Factor ()
{
    if (inp=='(')                // apply rule 7
    {
        inp=getInp();
        Expr ();
        if (inp==')')
            inp=getInp();
        else
            reject();
    } // end rule 7
    else
        if (inp=='v')
            inp=getInp();        // apply rule 8
        else
            reject();
}
TOPIC 4:
SYNTAX & SEMANTIC
ANALYSIS
TEXT BOOK: COMPILER DESIGN: THEORY, TOOLS &
EXAMPLES (JAVA EDITION) BY BERGMANN, SETH D.
(CHAPTER 4)

1
SYNTAX DIRECTED TRANSLATION
• The programs we have developed can check only for syntax errors; they cannot produce output, and do not deal at all with semantics
• Semantics – the intent or meaning of the source program
• To determine the semantics, action symbols are introduced
• Action symbols – give the parser the capability of producing output and/or calling other methods while parsing an input string
• Action symbols in a grammar are enclosed in curly braces {}
• Translation grammar – a grammar containing action symbols

2
SYNTAX DIRECTED TRANSLATION
• The capability of a parser to produce output is called Syntax Directed Translation

Example of translation grammar:

G17:
1. Expr → Term Elist
2. Elist → + Term {+} Elist
3. Elist → ε
4. Term → Factor Tlist
5. Tlist → ∗ Factor {∗} Tlist
6. Tlist → ε
7. Factor → ( Expr )
8. Factor → var {var}

G17 translates infix expressions involving addition and multiplication to postfix form
SYNTAX DIRECTED
TRANSLATION
• To find the selection sets in a translation grammar, simply remove
the action symbols
• Underlying grammar – translation grammar with action symbols
removed
• Underlying grammar for G17 is G16 (from previous lesson)
G16:
1. Expr → Term Elist Sel(1) = {(,var}
2. Elist → + Term Elist Sel(2) = {+}
3. Elist → ε Sel(3) = {), ↵}
4. Term → Factor Tlist Sel(4) = {(,var}
5. Tlist → ∗ Factor Tlist Sel(5) = {∗}
6. Tlist → ε Sel(6) = {+,), ↵}
7. Factor → ( Expr ) Sel(7) = {(}
8. Factor → var Sel(8) = {var}

4
SYNTAX DIRECTED
TRANSLATION
Question:
• This grammar is of this form:
• A → {action}
• What is the underlying grammar of this
translation grammar?

5
SYNTAX DIRECTED TRANSLATION
• Draw a derivation tree for the expression var + var * var using Grammar G17:

Expr
    Term
        Factor
            var {var}
        Tlist
            ε
    Elist
        +
        Term
            Factor
                var {var}
            Tlist
                ∗
                Factor
                    var {var}
                {∗}
                Tlist
                    ε
        {+}
        Elist
            ε
SYNTAX DIRECTED
TRANSLATION
• List the leaves of the derivation tree from left to right and we get:
var {var} + var {var} ∗ var {var} {∗} {+}

• To get the output defined by the translation grammar, keep only the action symbols:

{var} {var} {var} {∗} {+}
EXTENDED PUSHDOWN
TRANSLATOR FOR
TRANSLATION GRAMMAR
Rules to construct a pushdown machine for a translation grammar:
• Each action symbol {A} representing output has its own row in the pushdown machine
• Fill every column in that row with Out(A), Pop, Retain.
• Action symbols are treated as stack symbols and are pushed onto the stack in exactly the same way as terminals and nonterminals occurring on the right side of a grammar rule.

8
Extended Pushdown Machines for G17

An Extended Pushdown Infix to Postfix Translator Constructed from Grammar G17

       | +                             | *                 | (                         | )                 | var                       | ↵
Expr   | Reject                        | Reject            | Rep(Elist,Term), Retain   | Reject            | Rep(Elist,Term), Retain   | Reject
Elist  | Rep(Elist,{+},Term,+), Retain | Reject            | Reject                    | Pop, Retain       | Reject                    | Pop, Retain
Term   | Reject                        | Reject            | Rep(Tlist,Factor), Retain | Reject            | Rep(Tlist,Factor), Retain | Reject
. . .
{var}  | Pop, Retain, Out(var)         | Pop, Retain, Out(var) | Pop, Retain, Out(var) | Pop, Retain, Out(var) | Pop, Retain, Out(var) | Pop, Retain, Out(var)
{+}    | Pop, Retain, Out(+)           | Pop, Retain, Out(+)   | Pop, Retain, Out(+)   | Pop, Retain, Out(+)   | Pop, Retain, Out(+)   | Pop, Retain, Out(+)
{*}    | Pop, Retain, Out(*)           | Pop, Retain, Out(*)   | Pop, Retain, Out(*)   | Pop, Retain, Out(*)   | Pop, Retain, Out(*)   | Pop, Retain, Out(*)
▽      | Reject                        | Reject            | Reject                    | Reject            | Reject                    | Accept

Initial stack: ▽ with Expr on top
Implementing Translation Grammars with Recursive Descent

G17:
1. Expr → Term Elist                 Sel(1) = {(,var}
2. Elist → + Term {+} Elist          Sel(2) = {+}
3. Elist → ε                         Sel(3) = {), ↵}
4. Term → Factor Tlist               Sel(4) = {(,var}
5. Tlist → ∗ Factor {∗} Tlist        Sel(5) = {∗}
6. Tlist → ε                         Sel(6) = {+,), ↵}
7. Factor → ( Expr )                 Sel(7) = {(}
8. Factor → var {var}                Sel(8) = {var}

void Expr()
{
    if (inp=='(' || inp=='v')        // apply rule 1 ('v' stands for the token var)
    {
        Term();
        Elist();
    } // end rule 1
    else
        reject();
}

void Term()
{
    if (inp=='(' || inp=='v')        // apply rule 4
    {
        Factor();
        Tlist();
    } // end rule 4
    else
        reject();
}

void Elist()
{
    if (inp=='+')                    // apply rule 2
    {
        inp=getInp();
        Term();
        System.out.print("+ ");      // action symbol {+}
        Elist();
    } // end rule 2
    else
        if (inp==')' || inp=='\r')
            ;                        // apply rule 3
        else
            reject();
}

void Tlist()
{
    if (inp=='*')                    // apply rule 5
    {
        inp=getInp();
        Factor();
        System.out.print("* ");      // action symbol {∗}
        Tlist();
    } // end rule 5
    else
        if (inp=='+' || inp==')' || inp=='\r')
            ;                        // apply rule 6
        else
            reject();
}

void Factor()
{
    if (inp=='(')                    // apply rule 7
    {
        inp=getInp();
        Expr();
        if (inp==')')
            inp=getInp();
        else
            reject();
    } // end rule 7
    else
        if (inp=='v')                // apply rule 8
        {
            inp=getInp();
            System.out.print("var "); // action symbol {var}
        }
        else
            reject();
}
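The methods above depend on inp, getInp() and reject(), which the slides do not define. A self-contained variant might look like the sketch below. It is an assumption-laden illustration, not the textbook's code: tokens are passed in as a list of strings, output is collected in a StringBuilder instead of being printed, reject() is replaced by an exception, and the class name PostfixTranslator is made up here.

```java
import java.util.List;

/** Sketch: recursive descent translator for G17 (infix to postfix). */
final class PostfixTranslator {
    static final String END = "\u21B5";   // endmarker, assumed once the tokens run out

    private final List<String> tokens;
    private final StringBuilder out = new StringBuilder();
    private int pos = 0;

    private PostfixTranslator(List<String> tokens) { this.tokens = tokens; }

    /** Translates a token list such as [var, +, var, *, var] to a postfix string. */
    static String translate(List<String> tokens) {
        PostfixTranslator t = new PostfixTranslator(tokens);
        t.expr();
        if (!t.inp().equals(END)) throw t.error();
        return t.out.toString().trim();
    }

    private String inp() { return pos < tokens.size() ? tokens.get(pos) : END; }

    private IllegalArgumentException error() {
        return new IllegalArgumentException("syntax error at token " + pos);
    }

    private void expr() {                                   // rule 1
        if (inp().equals("(") || inp().equals("var")) { term(); elist(); }
        else throw error();
    }

    private void elist() {
        if (inp().equals("+")) {                            // rule 2
            pos++; term(); out.append("+ "); elist();       // {+} is emitted after Term
        } else if (!(inp().equals(")") || inp().equals(END))) {
            throw error();                                  // not in Sel(3)
        }
    }

    private void term() {                                   // rule 4
        if (inp().equals("(") || inp().equals("var")) { factor(); tlist(); }
        else throw error();
    }

    private void tlist() {
        if (inp().equals("*")) {                            // rule 5
            pos++; factor(); out.append("* "); tlist();     // {*} is emitted after Factor
        } else if (!(inp().equals("+") || inp().equals(")") || inp().equals(END))) {
            throw error();                                  // not in Sel(6)
        }
    }

    private void factor() {
        if (inp().equals("(")) {                            // rule 7
            pos++; expr();
            if (!inp().equals(")")) throw error();
            pos++;
        } else if (inp().equals("var")) {                   // rule 8
            pos++; out.append("var ");
        } else {
            throw error();
        }
    }

    public static void main(String[] args) {
        System.out.println(translate(List.of("var", "+", "var", "*", "var")));
    }
}
```

For var + var * var the sketch yields var var var * +, the same sequence as the action symbols read off the derivation tree.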
EXERCISE 1

Show the recursive descent translator and the extended


pushdown translator based on the G18 grammar above.

15
EXERCISE 2

Based on the grammar above:

1. Show a derivation tree and the output string for the input bacb.
2. Show an extended pushdown translator for the translation
grammar.
3. Show a recursive descent translator for the translation
grammar

16
ATTRIBUTED
GRAMMARS

Attributed grammar:
Context-free grammar that has been
extended to provide context-sensitive
information by appending attributes to
some of its terminals and non terminals.

17
ATTRIBUTED
GRAMMARS
• Many programming language constructs cannot be adequately described by a context-free grammar.
• Examples:
• A loop control variable must not be altered within the scope of the loop
• Information such as the location of the temporary result of a subexpression must be transmitted
• To handle the situations above we further extend grammars by introducing attributed grammars

18
ATTRIBUTED GRAMMARS
ATTRIBUTED GRAMMARS:
• Each of the terminals and non terminals in the
grammar may have zero or more attributes
• Attributes are normally designated by subscripts
• There may be zero or more attribute computation
rules associated with each grammar rule
• These attribute computation rules show how the attributes
are assigned values when the rule is being parsed
G19:
Grammar rules                        Computation rules
1. Expr_p → + Expr_q Expr_r          p ← q + r
2. Expr_p → ∗ Expr_q Expr_r          p ← q ∗ r
3. Expr_p → const_q                  p ← q
ATTRIBUTED
GRAMMARS
• To understand attributed grammar we can build
derivation trees.
• Steps:
1. Eliminate all attributes from grammar and build the
derivation tree
2. Enter attribute values into the tree according to attribute
computation rules
3. Know that attributed grammars can pass values through
two ways:
1. From child node to parent (synthesized attributes)
2. From parent node to child (inherited attributes)

20
ATTRIBUTED GRAMMARS
An Attributed Derivation Tree using G19 for the Prefix Expression: + ∗ 3 4 + 5 6

Expr_23
    +
    Expr_12
        ∗
        Expr_3
            const_3
        Expr_4
            const_4
    Expr_11
        +
        Expr_5
            const_5
        Expr_6
            const_6

Synthesized attributes: all attribute values are taken from a lower node in the tree.
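Synthesized attributes map naturally onto return values in a recursive descent parser: each call to the method for Expr returns the attribute p of the node it parsed. The sketch below illustrates this for G19; the class name PrefixEval and the token-list input are assumptions made for the example.

```java
import java.util.List;

/** Sketch: evaluating G19 (prefix expressions) with synthesized attributes as return values. */
final class PrefixEval {
    private final List<String> tokens;
    private int pos = 0;

    private PrefixEval(List<String> tokens) { this.tokens = tokens; }

    static int evaluate(List<String> tokens) {
        PrefixEval e = new PrefixEval(tokens);
        int p = e.expr();
        if (e.pos != tokens.size()) {
            throw new IllegalArgumentException("unexpected trailing input");
        }
        return p;
    }

    /** Parses one Expr and returns its synthesized attribute p. */
    private int expr() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        String tok = tokens.get(pos++);
        switch (tok) {
            case "+": { int q = expr(); int r = expr(); return q + r; }   // rule 1: p <- q + r
            case "*": { int q = expr(); int r = expr(); return q * r; }   // rule 2: p <- q * r
            default:  return Integer.parseInt(tok);                       // rule 3: p <- q
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(List.of("+", "*", "3", "4", "+", "5", "6")));
    }
}
```

For + ∗ 3 4 + 5 6 the two subexpressions return 12 and 11, so the root receives 23, matching the attributed derivation tree.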
ATTRIBUTED GRAMMARS
An Attributed Derivation Tree using G20 to specify type declarations for: Integer a, b, c

G20:
1. Dcl → type_t Varlist_v                    v ← t
2. Varlist_v → Varlist_w , ident_x           w ← v
3. Varlist_v → ident_x

Dcl
    Type_integer
    Varlist_integer
        Varlist_integer
            Varlist_integer
                ident_a
            ,
            ident_b
        ,
        ident_c

Inherited attributes: all attribute values are taken from higher nodes in the tree.
ATTRIBUTED
TRANSLATION GRAMMAR
FOR EXPRESSION
• What have we learned so far on top down parsing?
• LL(1) Grammar
• Translation Grammar
• Attributed Grammar
• Now we will combine what we’ve learned and apply
it to an attributed translation grammar
• The example we are discussing is for infix
operations involving addition and multiplication
• The output of this is a stream of atoms, which can
be easily translated to machine language

23
ATTRIBUTED
TRANSLATION GRAMMAR
FOR EXPRESSION
• The atoms for this translator will consist of four
parts:
1. an operation: ADD or MULT
2. a left operand
3. a right operand
4. a result
• Example for this string: A + B * C + D
• Output: MULT B C Temp1
ADD A Temp1 Temp2
ADD Temp2 D Temp3

24
ATTRIBUTED TRANSLATION GRAMMAR FOR EXPRESSION

G21:
1. Expr_p → Term_q Elist_q,p
2. Elist_p,q → + Term_r {ADD}_p,r,s Elist_s,q        s ← Alloc()
3. Elist_p,q → ε                                     q ← p
4. Term_p → Factor_q Tlist_q,p
5. Tlist_p,q → ∗ Factor_r {MULT}_p,r,s Tlist_s,q     s ← Alloc()
6. Tlist_p,q → ε                                     q ← p
7. Factor_p → ( Expr_p )
8. Factor_p → ident_p

• {ADD}_p,r,s: put out an ADD atom with operands p and r and result s
• {MULT}_p,r,s: put out a MULT atom with operands p and r and result s
• Alloc(): method that allocates space for a temporary result and returns a pointer to it
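In a recursive descent implementation of G21, one common arrangement is to pass inherited attributes as parameters and return synthesized attributes as results. The sketch below follows that arrangement. It is an illustration under stated assumptions: atoms are plain strings, Alloc() simply returns the names Temp1, Temp2, ..., identifiers are any tokens starting with a letter, and the class name AtomTranslator is made up here.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: attributed translation grammar G21 as a recursive descent translator. */
final class AtomTranslator {
    static final String END = "\u21B5";   // endmarker, assumed once the tokens run out

    private final List<String> tokens;
    private final List<String> atoms = new ArrayList<>();
    private int pos = 0;
    private int temps = 0;

    private AtomTranslator(List<String> tokens) { this.tokens = tokens; }

    static List<String> translate(List<String> tokens) {
        AtomTranslator t = new AtomTranslator(tokens);
        t.expr();
        if (!t.inp().equals(END)) throw t.error();
        return t.atoms;
    }

    private String inp() { return pos < tokens.size() ? tokens.get(pos) : END; }
    private String alloc() { return "Temp" + (++temps); }   // stand-in for Alloc()
    private IllegalArgumentException error() {
        return new IllegalArgumentException("syntax error at token " + pos);
    }

    private String expr() {                       // rule 1: Expr_p -> Term_q Elist_q,p
        String q = term();
        return elist(q);
    }

    /** p is inherited (parameter); the returned value is the synthesized attribute q. */
    private String elist(String p) {
        if (inp().equals("+")) {                  // rule 2
            pos++;
            String r = term();
            String s = alloc();
            atoms.add("ADD " + p + " " + r + " " + s);
            return elist(s);
        }
        if (inp().equals(")") || inp().equals(END)) return p;   // rule 3: q <- p
        throw error();
    }

    private String term() {                       // rule 4: Term_p -> Factor_q Tlist_q,p
        String q = factor();
        return tlist(q);
    }

    private String tlist(String p) {
        if (inp().equals("*")) {                  // rule 5
            pos++;
            String r = factor();
            String s = alloc();
            atoms.add("MULT " + p + " " + r + " " + s);
            return tlist(s);
        }
        if (inp().equals("+") || inp().equals(")") || inp().equals(END)) return p;   // rule 6
        throw error();
    }

    private String factor() {
        String tok = inp();
        if (tok.equals("(")) {                    // rule 7
            pos++;
            String p = expr();
            if (!inp().equals(")")) throw error();
            pos++;
            return p;
        }
        if (!tok.isEmpty() && Character.isLetter(tok.charAt(0))) {   // rule 8
            pos++;
            return tok;
        }
        throw error();
    }

    public static void main(String[] args) {
        translate(List.of("A", "+", "B", "*", "C", "+", "D")).forEach(System.out::println);
    }
}
```

For A + B * C + D this produces the three atoms listed earlier: MULT B C Temp1, ADD A Temp1 Temp2, ADD Temp2 D Temp3.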
ATTRIBUTED TRANSLATION GRAMMAR FOR EXPRESSION

Show an attributed derivation tree for the expression a+b using grammar G21:

Expr_T1
    Term_a
        Factor_a
            ident_a
        Tlist_a,a
            ε
    Elist_a,T1
        +
        Term_b
            Factor_b
                ident_b
            Tlist_b,b
                ε
        {ADD}_a,b,T1
        Elist_T1,T1
            ε
EXERCISE

Show an attributed derivation tree for the following expressions, using


grammar G21. Assume that the Alloc method returns a new temporary
location each time it is called (Temp1, Temp2, Temp3, ...).

27
DECAF EXPRESSIONS
• We have looked at how to translate infix expressions involving addition and multiplication
• Subtraction and division are straightforward because they have the same precedence as addition and multiplication, respectively
• We will now extend the expressions to include:
• Comparisons (boolean expressions)
• Assignment
• To do this we will introduce three new atoms

28
DECAF EXPRESSIONS
Atoms used for Transfer of Control

Atom | Attributes                                | Purpose
LBL  | label name                                | Mark a spot to be used as a branch destination
JMP  | label name                                | Unconditional branch to the label specified
TST  | Expr1, Expr2, comparison code, label name | Compare Expr1 and Expr2 using the comparison code. Branch to the label if the comparison holds. The translator passes the complement of the source comparison, so the branch is taken when the source condition is false.
Boolean expression
• uses the TST atom, which branches when the source comparison is false

if
    (x==3)
    [TST]        // Branch to the Label only if x==3 is false
    stmt
    [Label]

while
    [Label1]
    (x>2)
    [TST]        // Branch to Label2 only if x>2 is false
    stmt
    [JMP]        // Unconditional branch to Label1
    [Label2]

BoolExpr_Lbl → Expr_p compare_c Expr_q {TST}_p,q,,7-c,Lbl
BoolExpr_Lbl → Expr_p compare_c Expr_q {TST}_p,q,,7-c,Lbl

• The attributed grammar for a boolean expression is as above.
• {TST}_a,b,,c,x
• a, b = values being compared
• c = comparison code
• x = label to branch to when a and b satisfy comparison code c; because the grammar passes 7-c, the branch is taken when the source comparison is false

Comparison | Code | Logical Complement | Code for complement
==         | 1    | !=                 | 6
<          | 2    | >=                 | 5
>          | 3    | <=                 | 4
<=         | 4    | >                  | 3
>=         | 5    | <                  | 2
!=         | 6    | ==                 | 1
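The codes are numbered so that a comparison and its logical complement always add up to 7, which is why the grammar can write 7-c. A small sketch of that lookup follows; the class name CompareCodes and its helper methods are assumptions made for this example.

```java
/** Sketch: comparison codes from the table and their complements (7 - c). */
final class CompareCodes {
    // Index i holds the operator with code i + 1.
    static final String[] OPS = {"==", "<", ">", "<=", ">=", "!="};

    static int code(String op) {
        for (int i = 0; i < OPS.length; i++) {
            if (OPS[i].equals(op)) return i + 1;
        }
        throw new IllegalArgumentException("unknown comparison: " + op);
    }

    static String op(int code) { return OPS[code - 1]; }

    /** Code of the logical complement, as used in {TST}_p,q,,7-c,Lbl. */
    static int complement(int code) { return 7 - code; }

    public static void main(String[] args) {
        for (String o : OPS) {
            int c = code(o);
            System.out.println(o + " " + c + " -> " + op(complement(c)) + " " + complement(c));
        }
    }
}
```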
DECAF EXPRESSIONS
Assignment
• Assignment is an operator which returns a result that can be used as part of a larger expression
x = (y = 2) + (z = 3); // y is 2, z is 3, x is 5

• This means we need to use the MOV atom to implement assignment
• The attributed grammar for an assignment expression can be as below:

Expr_p → AssignExpr_p
AssignExpr_p → ident_p = Expr_q {MOV}_q,,p
An attributed translation grammar for Decaf expressions:

1. BoolExpr_Lbl → Expr_p compare_c Expr_q {TST}_p,q,,7-c,Lbl     Lbl ← newLabel()
2. Expr_p → AssignExpr_p
3. Expr_p → Rvalue_p
4. AssignExpr_p → ident_p = Expr_q {MOV}_q,,p
5. Rvalue_p → Term_q Elist_q,p
6. Elist_p,q → + Term_r {ADD}_p,r,s Elist_s,q                    s ← alloc()
7. Elist_p,q → - Term_r {SUB}_p,r,s Elist_s,q                    s ← alloc()
8. Elist_p,q → ε                                                 q ← p
9. Term_p → Factor_q Tlist_q,p
10. Tlist_p,q → * Factor_r {MUL}_p,r,s Tlist_s,q                 s ← alloc()
11. Tlist_p,q → / Factor_r {DIV}_p,r,s Tlist_s,q                 s ← alloc()
12. Tlist_p,q → ε                                                q ← p
13. Factor_p → ( Expr_p )
14. Factor_p → + Factor_p
15. Factor_p → - Factor_q {Neg}_q,,p                             p ← alloc()
16. Factor_p → num_p
17. Factor_p → ident_p
EXERCISE
Using the grammar for Decaf expressions given in this section,
show an attributed derivation tree for the boolean expressions:
1) a==(b=3)+c
2) a=b=c
3) a == b+c
4) (a=3) <= (b=2)
5) a * (b=3) + c != 9

34
DECAF EXPRESSIONS
ANSWER FOR (1)

a==(b=3)+c

(Answer in
book page
158)

35
TRANSLATING CONTROL
STRUCTURES
• What are control structures?
• for statement
• while statement
• if statement
• In order to translate these structures, we have to bear in mind
the limitation of machine’s control operations. Usually they are
only capable of simple operations such as:
• Unconditional Jump
• Goto
• Compare
• Conditional Jump

36
TRANSLATING CONTROL STRUCTURES
• What we can do is to use atoms such as JMP, TST, LBL and MOV

Atoms used for Transfer of Control

Atom | Attributes                                | Purpose
LBL  | label name                                | Mark a spot to be used as a branch destination
JMP  | label name                                | Unconditional branch to the label specified
TST  | Expr1, Expr2, comparison code, label name | Compare Expr1 and Expr2 using the comparison code. Branch to the label if the comparison holds. The translator passes the complement of the source comparison, so the branch is taken when the source condition is false.
MOV  | Src, Trg                                  | Trg = Src
TRANSLATING CONTROL STRUCTURES

In the diagrams, arrows indicate flow of control at runtime; ADD and JMP atoms are drawn as rectangles, LBL atoms as ovals, and TST atoms as diamonds. Read from top to bottom, each diagram gives the following sequence.

while stmt:
    while
    LBL Lbl1
    ( BoolExpr_Lbl2 )        [branch to Lbl2 when false]
    Stmt
    JMP Lbl1
    LBL Lbl2

for stmt:
    for ( Expr ;
    LBL Lbl1
    BoolExpr_Lbl2 ;          [branch to Lbl2 when false]
    JMP Lbl3
    LBL Lbl4
    Expr_q )
    JMP Lbl1
    LBL Lbl3
    Stmt
    JMP Lbl4
    LBL Lbl2

if stmt:
    if ( BoolExpr_Lbl1 )     [branch to Lbl1 when false]
    Stmt
    JMP Lbl2
    LBL Lbl1
    ElsePart                 [else Stmt; may be omitted]
    LBL Lbl2

boolean expr_Lbl:
    Expr_p
    compare_c
    Expr_q
    TST_p,q,,7-c,Lbl
TRANSLATING CONTROL STRUCTURES
Example: Show the atom string which would be put out that corresponds to the following Java statement: while (x>0) Stmt

Answer:
(LBL, L1)
(TST, x, 0, , 4, L2) // Branch to L2 if x<=0

Atoms for Stmt

(JMP, L1)
(LBL, L2)
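The atom sequence for a while statement follows a fixed pattern, so it can be generated by a small helper. The sketch below is illustrative only: the class name WhileAtoms, the string form of the atoms and the order in which labels are allocated are assumptions made here, and the atoms for Stmt are passed in as an already generated list.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: emitting the atoms for "while (left op right) Stmt". */
final class WhileAtoms {
    private static final List<String> OPS = List.of("==", "<", ">", "<=", ">=", "!=");
    private int labels = 0;

    private String newLabel() { return "L" + (++labels); }

    /** Comparison code from the table (== is 1, ..., != is 6). */
    static int code(String op) {
        int i = OPS.indexOf(op);
        if (i < 0) throw new IllegalArgumentException("unknown comparison: " + op);
        return i + 1;
    }

    /** body holds the atoms already generated for Stmt. */
    List<String> whileStmt(String left, String op, String right, List<String> body) {
        String top = newLabel();    // loop entry
        String exit = newLabel();   // first atom after the loop
        List<String> atoms = new ArrayList<>();
        atoms.add("(LBL, " + top + ")");
        // The complemented code 7 - c makes TST leave the loop when the condition fails.
        atoms.add("(TST, " + left + ", " + right + ", , " + (7 - code(op)) + ", " + exit + ")");
        atoms.addAll(body);
        atoms.add("(JMP, " + top + ")");
        atoms.add("(LBL, " + exit + ")");
        return atoms;
    }

    public static void main(String[] args) {
        new WhileAtoms().whileStmt("x", ">", "0", List.of("Atoms for Stmt"))
                .forEach(System.out::println);
    }
}
```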

41
TRANSLATING CONTROL
STRUCTURES
Example: Show the atom string which would be put out that
corresponds to the following Java statement: if (y==5) k=k+2;

Answer:
(TST, y, 5, , 6, L1)
(ADD, k, 2, T1)
(MOV, T1, k)
(JMP, L2)
(LBL, L1)
(LBL, L2)

42
TRANSLATING CONTROL
STRUCTURES
Control structures can also be derived from a grammar. The example below shows how the attributed translation grammar for Decaf defines the ‘if’ and ‘while’ statements:
SUMMARY
• A parsing algorithm reads an input string one symbol at a time
• It checks whether the string is a member of the language defined by the grammar
• Top down parsing applies grammar rules in a downward direction in the derivation tree
• We use selection sets to direct the parser in deciding which grammar rule to apply
• There are two techniques for top down parsing that can be used if the rules defining the same nonterminal have disjoint selection sets:
• Pushdown machine
• Recursive descent
• Extensions of context-free grammars:
• Translation grammar – can specify output
• Attributed grammar – can pass values from one rule to another
• Attributed translation grammar – combination of the two
TOPIC 5 – BOTTOM UP PARSING

TEXT BOOK: COMPILER DESIGN: THEORY, TOOLS & EXAMPLES (JAVA EDITION) BY BERGMANN, SETH D. (CHAPTER 5)

Introduction

• Top Down Parsing: parse from the top of the derivation tree to the bottom
• Bottom Up Parsing: parse from the bottom of the derivation tree to the top
• Apply grammar rules in reverse

Shift Reduce Parsing

• Bottom up parsing involves TWO fundamental operations:
• SHIFT operation: move an input symbol to the stack
• REDUCE operation: replace symbols on top of the stack with a nonterminal
• These two operations perform derivation steps in reverse
• Bottom up parsers are also called shift reduce parsers because they use these two operations

• Shift reduce parsing steps:
1. Begin with an empty stack
2. Input symbols are moved to the stack (SHIFT)
3. Symbols on the stack are replaced by nonterminals according to grammar rules (REDUCE)
4. When all input symbols have been read and the symbol on top of the stack is the starting nonterminal: ACCEPT
Shift Reduce Parsing

• Sequence of stack frames parsing aabb

G:
1. A → a B
2. B → A b
3. B → b

Stack   | Input  | Action
▽       | aabb↵  | shift
▽a      | abb↵   | shift
▽aa     | bb↵    | shift
▽aab    | b↵     | reduce using rule 3
▽aaB    | b↵     | reduce using rule 1
▽aA     | b↵     | shift
▽aAb    | ↵      | reduce using rule 2
▽aB     | ↵      | reduce using rule 1
▽A      | ↵      | Accept

• The string of symbols being reduced is called a handle

• Sequence of stack frames parsing caabaab using grammar G19

G19:
1. S → S a B
2. S → c
3. B → a b

Stack   | Input     | Action
▽       | caabaab↵  | shift
▽c      | aabaab↵   | reduce using rule 2
▽S      | aabaab↵   | shift
▽Sa     | abaab↵    | shift
▽Saa    | baab↵     | shift
▽Saab   | aab↵      | reduce using rule 3
▽SaB    | aab↵      | reduce using rule 1
▽S      | aab↵      | shift
▽Sa     | ab↵       | shift
▽Saa    | b↵        | shift
▽Saab   | ↵         | reduce using rule 3
▽SaB    | ↵         | reduce using rule 1
▽S      | ↵         | Accept
LR Grammar

• For top down parsing we were introduced to the concept of an LL(1) grammar.
• LL(1) grammar:
• The parser finds a left-most derivation when scanning the input from left to right
• It looks ahead at only ONE input symbol at a time
• A grammar that can be parsed using a top down parser
• We will now learn the concept of an LR grammar:
• LR (L indicates reading the input from the left, R indicates finding a right-most derivation)
• The parser reads input from the left while finding a right-most derivation
• A grammar that can be parsed using a shift reduce parser
Shift/Reduce Conflict

• A shift/reduce conflict happens when the parser does not know whether to:
• Shift an input symbol OR
• Reduce the handle on the stack

• When this happens, the grammar is not LR and we must either rewrite the grammar or use a different parsing algorithm

• An example of a shift/reduce conflict for the string aaab:

G20:
1. S → S a B
2. S → a
3. B → a b

Stack   | Input  | Action
▽       | aaab↵  | shift
▽a      | aab↵   | reduce using rule 2
▽S      | aab↵   | shift
▽Sa     | ab↵    | shift/reduce conflict; reduce using rule 2 (incorrect)
▽SS     | ab↵    | shift
▽SSa    | b↵     | shift
▽SSab   | ↵      | reduce using rule 3
▽SSB    | ↵      | Syntax error (incorrect)
Reduce/Reduce Conflict

• A reduce/reduce conflict happens when it is clear that a reduce operation should be performed BUT:
• There is more than one grammar rule whose right hand side matches the top of the stack AND
• It is not clear which rule should be used

• A grammar that causes a reduce/reduce conflict is an ambiguous grammar, and not LR.

• Reduce/reduce conflict for the string aa using G21:

G21:
1. S → S A
2. S → a
3. A → a

Stack   | Input  | Action
▽       | aa↵    | shift
▽a      | a↵     | reduce/reduce conflict (rules 2 and 3); reduce using rule 3 (incorrect)
▽A      | a↵     | shift
▽Aa     | ↵      | reduce/reduce conflict (rules 2 and 3); reduce using rule 2 (rule 3 will also yield a syntax error)
▽AS     | ↵      | Syntax error
How to Resolve the Conflicts?

1. Make an assumption:
• Example: if all shift/reduce conflicts could be resolved by shifting rather than reducing, there is no need to rewrite the grammar
2. Rewrite the grammar
3. Look ahead at additional input characters
• Example: look ahead at k input symbols. (The grammar would be called LR(k))

For implementing programming languages using a bottom up parser, we usually use only LR(1) grammars, meaning we don’t look ahead beyond the current input symbol
Exercises
13

Book:page 177
LR Parsing with Tables

• One way to implement shift reduce parsing is by using tables that determine whether to shift or reduce, and which grammar rule to reduce by.
• This technique uses two tables:
• ACTION TABLE: determines whether a shift or a reduce is to be invoked
• GOTO TABLE: indicates which stack symbol is to be pushed on the stack after a reduction

Shift action:
• Implemented by a push operation
• Followed by an advance input (pointer) operation

Reduce action:
• Implemented by a replace operation
• Stack symbols on the right side of the grammar rule are replaced by a stack symbol from the goto table
• Input pointer is retained

• LR parser operation:
1. Find the action corresponding to the current input and the top stack symbol
2. If that action is a shift action:
    a) Push the input symbol onto the stack
    b) Advance the input pointer
3. If that action is a reduce action:
    a) Find the grammar rule specified by the reduce action
    b) Pop all the symbols on the right side of the rule
    c) In the goto table, take the COLUMN for the left side of the rule and the ROW for the symbol now on top of the stack, and push the symbol in that CELL onto the stack
    d) Retain the input pointer
4. If that action is blank, a syntax error has been detected
5. If that action is ACCEPT, terminate
6. Repeat from step 1

• For example, suppose we have the following stack and input configuration:
Stack    Input
▽S       ab↵

• The action shift in the action table will result in the following configuration:
Stack    Input
▽Sa      b↵

• The a has been shifted from the input to the stack.

• Suppose, then, that in the grammar, rule 7 is:
7. B → Sa
• Select the row of the goto table labeled ▽, and the column labeled B.
• If the entry in this cell is push X, then the action reduce 7 would result in the following configuration:
Stack    Input
▽X       b↵

• LR parsing tables for grammar G5 (arithmetic expressions involving only addition and multiplication):

G5
1. Expr → Expr + Term
2. Expr → Term
3. Term → Term * Factor
4. Term → Factor
5. Factor → ( Expr )
6. Factor → var

Initial stack: ▽
LR Parsing with Tables

• Action Table for grammar G5 (rows: top stack symbol; columns: current input symbol)

        | +        | *        | (       | )        | var       | ↵
▽       |          |          | shift ( |          | shift var |
Expr1   | shift +  |          |         |          |           | Accept
Term1   | reduce 1 | shift *  |         | reduce 1 |           | reduce 1
Factor3 | reduce 3 | reduce 3 |         | reduce 3 |           | reduce 3
(       |          |          | shift ( |          | shift var |
Expr5   | shift +  |          |         | shift )  |           |
)       | reduce 5 | reduce 5 |         | reduce 5 |           | reduce 5
+       |          |          | shift ( |          | shift var |
Term2   | reduce 2 | shift *  |         | reduce 2 |           | reduce 2
*       |          |          | shift ( |          | shift var |
Factor4 | reduce 4 | reduce 4 |         | reduce 4 |           | reduce 4
var     | reduce 6 | reduce 6 |         | reduce 6 |           | reduce 6

• Goto Table for grammar G5 (rows: top of stack after pop; columns: left side of rule)

        | Expr       | Term       | Factor
▽       | push Expr1 | push Term2 | push Factor4
Expr1   |            |            |
Term1   |            |            |
Factor3 |            |            |
(       | push Expr5 | push Term2 | push Factor4
Expr5   |            |            |
)       |            |            |
+       |            | push Term1 | push Factor4
Term2   |            |            |
*       |            |            | push Factor3
Factor4 |            |            |
var     |            |            |
LR Parsing with Tables

• Show the sequence of stack, input, action, and goto configurations for the input var∗var using the parsing tables in the previous slides

Stack           | Input     | Action    | Goto
▽               | var*var↵  | shift var |
▽var            | *var↵     | reduce 6  | push Factor4
▽Factor4        | *var↵     | reduce 4  | push Term2
▽Term2          | *var↵     | shift *   |
▽Term2*         | var↵      | shift var |
▽Term2*var      | ↵         | reduce 6  | push Factor3
▽Term2*Factor3  | ↵         | reduce 3  | push Term2
▽Term2          | ↵         | reduce 2  | push Expr1
▽Expr1          | ↵         | Accept    |
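The LR parser operation can be sketched directly from the two tables. The following Java sketch is an illustration, not the textbook's implementation: the class name LrG5, the string keys of the form "row column" and the helper methods act and go are assumptions made here. The table entries themselves are copied from the action and goto tables for G5, with ↵ and ▽ written as Unicode escapes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: table-driven shift reduce parser for G5. */
final class LrG5 {
    static final String END = "\u21B5";      // endmarker
    static final String BOTTOM = "\u25BD";   // bottom-of-stack marker
    static final Map<String, String> ACTION = new HashMap<>();
    static final Map<String, String> GOTO = new HashMap<>();
    // Index n describes rule n of G5: its left side and the length of its right side.
    static final String[] LEFT = {null, "Expr", "Expr", "Term", "Term", "Factor", "Factor"};
    static final int[] RHS_LEN = {0, 3, 1, 3, 1, 3, 1};

    static void act(String row, String cols, String action) {
        for (String c : cols.split(" ")) ACTION.put(row + " " + c, action);
    }

    static void go(String row, String expr, String term, String factor) {
        if (expr != null) GOTO.put(row + " Expr", expr);
        if (term != null) GOTO.put(row + " Term", term);
        if (factor != null) GOTO.put(row + " Factor", factor);
    }

    static {
        for (String row : new String[] {BOTTOM, "(", "+", "*"}) act(row, "( var", "shift");
        act("Expr1", "+", "shift");
        act("Expr1", END, "accept");
        act("Expr5", "+ )", "shift");
        act("Term1", "*", "shift");
        act("Term1", "+ ) " + END, "reduce 1");
        act("Term2", "*", "shift");
        act("Term2", "+ ) " + END, "reduce 2");
        act("Factor3", "+ * ) " + END, "reduce 3");
        act("Factor4", "+ * ) " + END, "reduce 4");
        act(")", "+ * ) " + END, "reduce 5");
        act("var", "+ * ) " + END, "reduce 6");
        go(BOTTOM, "Expr1", "Term2", "Factor4");
        go("(", "Expr5", "Term2", "Factor4");
        go("+", null, "Term1", "Factor4");
        go("*", null, null, "Factor3");
    }

    /** Returns true if the token list (which must end with END) is accepted. */
    static boolean parse(List<String> tokens) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push(BOTTOM);
        int pos = 0;
        while (pos < tokens.size()) {
            String inp = tokens.get(pos);
            String action = ACTION.get(stack.peek() + " " + inp);
            if (action == null) return false;              // blank cell: syntax error
            if (action.equals("accept")) return true;
            if (action.equals("shift")) {
                stack.push(inp);                           // push the input symbol
                pos++;                                     // advance the input pointer
            } else {
                int n = Integer.parseInt(action.substring("reduce ".length()));
                for (int i = 0; i < RHS_LEN[n]; i++) stack.pop();   // pop the right side
                String pushed = GOTO.get(stack.peek() + " " + LEFT[n]);
                if (pushed == null) return false;
                stack.push(pushed);                        // input pointer is retained
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("var", "*", "var", END)));
        System.out.println(parse(List.of("(", "var", "*", "var", END)));
    }
}
```

The subscripted stack symbols (Term1, Term2, and so on) are what let the tables tell apart a Term that ends rule 1 from a Term that could still be extended by ∗ Factor.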
G5 LR Parsing with Tables
1. Expr → Expr + Term
2. Expr → Term
3. Term → Term * Factor Show the sequence of
4. Term → Factor
configurations when parsing:
5. Factor →( Expr )
6. Factor → var (var+var)∗var

Stack              Input             Action       Goto
▽                  (var+var)*var↵    shift (
▽(                 var+var)*var↵     shift var
▽(var              +var)*var↵        reduce 6     push Factor4
▽(Factor4          +var)*var↵        reduce 4     push Term2
▽(Term2            +var)*var↵        reduce 2     push Expr5
▽(Expr5            +var)*var↵        shift +
▽(Expr5+           var)*var↵         shift var
▽(Expr5+var        )*var↵            reduce 6     push Factor4
▽(Expr5+Factor4    )*var↵            reduce 4     push Term1
▽(Expr5+Term1      )*var↵            reduce 1     push Expr5

LR Parsing with Tables

G5
1. Expr → Expr + Term
2. Expr → Term
3. Term → Term * Factor
4. Term → Factor
5. Factor → ( Expr )
6. Factor → var

• Show the sequence of configurations when parsing (continued): (var+var)∗var

• Try parsing:
1. (var)
2. (var *var

Stack              Input             Action       Goto
▽(Expr5            )*var↵            shift )
▽(Expr5)           *var↵             reduce 5     push Factor4
▽Factor4           *var↵             reduce 4     push Term2
▽Term2             *var↵             shift *
▽Term2*            var↵              shift var
▽Term2*var         ↵                 reduce 6     push Factor3
▽Term2*Factor3     ↵                 reduce 3     push Term2
▽Term2             ↵                 reduce 2     push Expr1
▽Expr1             ↵                 Accept

SableCC

• There are software systems that can generate a parser automatically from specifications
• In this example we will again look at SableCC, which was developed at McGill University
• We will look at how to use SableCC for bottom-up parsing
SableCC - Overview

• The user of SableCC prepares a grammar file as well as two Java classes:
  - Translation
  - Compiler
• Using the grammar file as input, SableCC generates Java code to compile source code and produce an abstract syntax tree as output

Grammar File → SableCC → New Compiler (Java code)
Source Code → New Compiler → Abstract Syntax Tree
SableCC - Overview

• Generation and compilation of a compiler using SableCC

language.grammar → SableCC → parser, lexer, node, analysis
Translation.java → javac → Translation.class
Compiler.java → javac → Compiler.class
SableCC Grammar File

• There are six sections in the grammar file:
1. Package
2. Helpers
3. States
4. Tokens
5. Ignored Tokens
6. Productions
• The first four sections were discussed in Topic 3
• Ignored Tokens: specifies the tokens that the parser ignores
SableCC - Productions

• Productions: contains the grammar rules for the language being defined
• Each definition consists of the name of the nonterminal being defined, an equal sign, an EBNF definition, and a semicolon to terminate the production
• Example of a production defining a while statement:

stmt = while l_par bool_expr r_par stmt ;

*l_par = left parenthesis
*r_par = right parenthesis
SableCC - Productions

• Productions may use EBNF-like constructs. For example, if x is any grammar symbol, then:

x? // An optional x (0 or 1 occurrences of x)
x* // 0 or more occurrences of x
x+ // 1 or more occurrences of x
SableCC - Productions

• Alternative definitions using | are also permitted
• Alternatives must be labeled with names enclosed in braces
• The following defines an argument list as 1 or more identifiers, separated by commas:

arg_list = {single} identifier
         | {multiple} identifier ( comma identifier ) +
         ;
SableCC - Productions

• Labels must be used when two identical names appear in a grammar rule
• Each item label must be enclosed in brackets and followed by a colon

for_stmt = for l_par [init]: assign_expr [s1]: semi bool_expr
           [s2]: semi [incr]: assign_expr r_par stmt ;

*semi = semicolon
*[init] = initialize
*[incr] = increment
An Example using SableCC

• Grammar to translate infix to postfix

Grammar:
1. Expr → Expr + Term
2. Expr → Expr - Term
3. Expr → Term
4. Term → Term ∗ Factor
5. Term → Term / Factor
6. Term → Factor
7. Factor → ( Expr )
8. Factor → number
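
To make the intended translation concrete, the following Java sketch converts infix to postfix for the grammar above. It is a hand-written stand-in, not SableCC output and not the `Translation` class. It uses recursive descent, with the left-recursive rules (1, 2, 4, 5) rewritten as loops, rather than the bottom-up parser SableCC would generate. The class and method names (`InfixToPostfix`, `translate`) are assumptions for illustration. Each operator is emitted after both of its operands have been translated, which corresponds to the point where a bottom-up parser would reduce by that rule.

```java
final class InfixToPostfix {
    private final String src;
    private int pos = 0;
    private final StringBuilder out = new StringBuilder();

    private InfixToPostfix(String src) { this.src = src; }

    /** Translates an infix expression to space-separated postfix. */
    static String translate(String infix) {
        InfixToPostfix t = new InfixToPostfix(infix);
        t.expr();
        if (t.peek() != '\0') throw new IllegalArgumentException("unexpected input at " + t.pos);
        return t.out.toString().trim();
    }

    // Skips blanks and returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    // Rules 1-3: Expr -> Term { (+ | -) Term }
    private void expr() {
        term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            term();
            out.append(op).append(' ');   // operator follows both operands
        }
    }

    // Rules 4-6: Term -> Factor { (* | /) Factor }
    private void term() {
        factor();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            factor();
            out.append(op).append(' ');
        }
    }

    // Rules 7-8: Factor -> ( Expr ) | number
    private void factor() {
        char c = peek();
        if (c == '(') {
            pos++;
            expr();
            if (peek() != ')') throw new IllegalArgumentException("expected ) at " + pos);
            pos++;
        } else if (c >= '0' && c <= '9') {
            int start = pos;
            while (pos < src.length() && src.charAt(pos) >= '0' && src.charAt(pos) <= '9') pos++;
            out.append(src, start, pos).append(' ');
        } else {
            throw new IllegalArgumentException("expected number or ( at " + pos);
        }
    }

    public static void main(String[] args) {
        System.out.println(translate("2 + 3 * 4"));   // prints: 2 3 4 * +
    }
}
```

Because `*` and `/` are handled one level below `+` and `-`, they bind more tightly, and the loops apply operators left to right, so `8 - 3 - 2` becomes `8 3 - 2 -`.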
An Example using SableCC

• Tokens & Ignored Tokens

Tokens
  number = ['0'..'9']+;
  plus = '+';
  minus = '-';
  mult = '*';
  div = '/';
  l_par = '(';
  r_par = ')';
  blank = (' ' | 10 | 13 | 9)+;
  semi = ';' ;

Ignored Tokens
  blank;
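
To show what these token definitions mean operationally, here is a small Java sketch that recognises the same tokens with `java.util.regex`. It does not reproduce the lexer SableCC generates. The class name `InfixLexer` and the `name:text` output format are assumptions for illustration. Java named groups cannot contain underscores, so the groups are called `lpar` and `rpar` and are mapped back to `l_par` and `r_par`. As in the Ignored Tokens section, `blank` is matched but not passed on.

```java
import java.util.*;
import java.util.regex.*;

final class InfixLexer {
    // One named group per token definition; 10, 13 and 9 are newline, carriage return and tab.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<number>[0-9]+)|(?<plus>\\+)|(?<minus>-)|(?<mult>\\*)|(?<div>/)"
        + "|(?<lpar>\\()|(?<rpar>\\))|(?<blank>[ \\n\\r\\t]+)|(?<semi>;)");
    private static final String[] GROUPS =
        {"number", "plus", "minus", "mult", "div", "lpar", "rpar", "blank", "semi"};
    private static final String[] TOKENS =
        {"number", "plus", "minus", "mult", "div", "l_par", "r_par", "blank", "semi"};

    /** Returns tokens as "name:text"; throws IllegalArgumentException on an unknown character. */
    static List<String> tokenize(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        int pos = 0;
        while (pos < text.length()) {
            m.region(pos, text.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unknown character at " + pos);
            }
            for (int i = 0; i < GROUPS.length; i++) {
                if (m.group(GROUPS[i]) != null) {
                    // Ignored token: recognised, but not handed to the parser.
                    if (!TOKENS[i].equals("blank")) result.add(TOKENS[i] + ":" + m.group());
                    break;
                }
            }
            pos = m.end();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12 + (3*4);"));
    }
}
```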
An Example using SableCC

• Productions (alternative definitions use '|', and each alternative has a name)

Productions
  expr =
      {term} term |
      {plus} expr plus term |
      {minus} expr minus term
      ;
  term =
      {factor} factor |
      {mult} term mult factor |
      {div} term div factor
      ;
  factor =
      {number} number |
      {paren} l_par expr r_par
      ;

Grammar:
1. Expr → Expr + Term
2. Expr → Expr - Term
3. Expr → Term
4. Term → Term ∗ Factor
5. Term → Term / Factor
6. Term → Factor
7. Factor → ( Expr )
8. Factor → number
Exercises
• Question 1
Exercises
• Question 2
• Question 3
