Lect 4 IntroSyntaxAnalysis


BS-MS & MS-PhD in Maths and Computing: SMCS (IACS) 1

Introduction to Syntax Analysis

The Second Phase of the Front End

Lect IV: COM 5202: Compiler Construction Goutam Biswas

Syntax Analysis

• The syntactic (structural) correctness of a program is checked during the syntax analysis phase of compilation.
• Structural properties of language constructs can be specified in different ways.
• Different styles of specification are useful for different purposes.

Different Formalisms

• Syntax diagram (SD),
• Backus-Naur form (BNF), and
• Context-free grammar (CFG).


Example

We take as an example a simple variable declaration in the C language [a]:
int a, b, c;
float x, y;
[a] This part of the syntax can be expressed as a regular expression, but we shall treat it as a context-free language.


Syntax Diagram

(Figure: syntax diagram of varDclr, built from the symbols type, id and ;.)


Exercise

How will the diagram change if the variables are one- or multidimensional arrays?


Context-Free Grammar

<VDP> → ε | <VD> <VD_OPT>
<VD> → <TYPE> id <ID_OPT>
<ID_OPT> → ε | , id <ID_OPT>
<VD_OPT> → ; | ; <VD> <VD_OPT>
<TYPE> → int | float | · · ·
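A minimal recursive-descent recognizer, sketched in Python, may make the structure of this grammar concrete. It assumes the scanner has already turned the input into token classes ('int', 'float', 'id', ',' and ';'); these token names and the function name parse_vdp are illustrative, not part of the slides. Each non-terminal becomes one procedure, and <VD_OPT> chooses between its two rules by looking at the token after the ';'.

```python
TYPES = {"int", "float"}  # <TYPE>; other type names are omitted in this sketch


def parse_vdp(tokens):
    """Recognize <VDP>, with one procedure per non-terminal.

    `tokens` is a list of token classes, e.g. ['int', 'id', ',', 'id', ';'].
    Returns True if the tokens form a valid declaration part.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind!r} at position {pos}, found {peek()!r}")
        pos += 1

    def vd():
        # <VD> -> <TYPE> id <ID_OPT>
        nonlocal pos
        if peek() not in TYPES:
            raise SyntaxError(f"expected a type at position {pos}, found {peek()!r}")
        pos += 1
        expect("id")
        id_opt()

    def id_opt():
        # <ID_OPT> -> eps | , id <ID_OPT>
        if peek() == ",":
            expect(",")
            expect("id")
            id_opt()

    def vd_opt():
        # <VD_OPT> -> ; | ; <VD> <VD_OPT>  (decided by the token after ';')
        expect(";")
        if peek() in TYPES:
            vd()
            vd_opt()

    try:
        if tokens:  # <VDP> -> <VD> <VD_OPT>; an empty list is <VDP> -> eps
            vd()
            vd_opt()
        return pos == len(tokens)
    except SyntaxError:
        return False
```

For `int a, b, c; float x, y;` the scanner would produce ['int', 'id', ',', 'id', ',', 'id', ';', 'float', 'id', ',', 'id', ';'], which parse_vdp accepts.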


Exercise

Modify the grammar so that the variables can be one- or multidimensional arrays.


Backus-Naur Form

<VDP> ::= ε | <VD> ; { <VD> ; }
<VD> ::= <TYPE> id { , id }

This formalism (strictly, the extended BNF) is a mixture of CFG and regular expression. Here the Kleene closure x∗ is written as {x}.


Exercise

Introduce multidimensional arrays in the Backus-Naur form.


Note

Our variable declaration is actually a regular language, with the following state-transition diagram (start state 0, accepting state 4):
0 --type--> 1,  1 --id--> 2,  2 --,--> 3,  3 --id--> 2,  2 --;--> 4,  4 --type--> 1
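Because the language is regular, the diagram can be simulated directly with a transition table. The sketch below encodes the transitions of the diagram; treating 4 as the only accepting state is an assumption (the final-state marking did not survive extraction), and the token class names are illustrative.

```python
# Transitions of the state-transition diagram: state -> {token class: next state}.
# 0 --type--> 1, 1 --id--> 2, 2 --,--> 3, 3 --id--> 2, 2 --;--> 4, 4 --type--> 1
DELTA = {
    0: {"type": 1},
    1: {"id": 2},
    2: {",": 3, ";": 4},
    3: {"id": 2},
    4: {"type": 1},
}
ACCEPTING = {4}  # assumed; add 0 to also accept an empty declaration part


def accepts(tokens):
    """Run the DFA over a list of token classes and report acceptance."""
    state = 0
    for tok in tokens:
        state = DELTA[state].get(tok)
        if state is None:  # no transition: reject at once
            return False
    return state in ACCEPTING
```

Unlike the recursive grammar, the automaton needs no stack: the state alone records how much of a declaration has been seen.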


Exercise

Is the language of variable declarations with multidimensional arrays regular?


Note

• Why go for a context-free grammar? Why is a regular expression not good enough?
• Consider arithmetic expressions (AE) with integer constants (IC), identifiers (ID) and the four basic operators +, -, * and /.
• There are regular expressions corresponding to ID and IC.

Exercise

A regular expression corresponding to AE is as follows:
(IC|ID)((+ | - | * | /)(IC|ID))∗
Why is it not good enough?


Note

• SD is good for human understanding and visualization.
• BNF is very compact. It is used for theoretical analysis and also in automatic parser-generating software.
• But for most of our discussion we shall consider structural specifications in the form of a context-free grammar (CFG).

Note

There are non-context-free structural features of a programming language that are handled outside the formalism of the grammar.
• Declaration and use of a variable: ... int sum ... sum = ... has the form xwywz, where the same identifier w must occur at both places, and such a language is not context-free.
• Matching of actual and formal parameters of a function, matching of a print format with the corresponding expressions, etc.

Specification to Recognizer

The syntactic specification of a programming language, written as a context-free grammar, can be used to construct its parser by synthesizing a push-down automaton (PDA) [a].
[a] This is similar to the synthesis of a scanner from the regular expressions of the token classes.


Context-Free Grammar

• A context-free grammar (CFG) G is defined by a 4-tuple (Σ, N, P, S), where Σ is a finite set of terminals and N is a finite set of non-terminals. P is a finite subset of N × (Σ ∪ N)∗. Elements of P are called production or rewriting rules.
• The fourth element S is a distinguished member of N, called the start symbol (axiom) of the grammar.

Derivation and Reduction

• If p = (A, α) ∈ P, we write it as A → α (“A produces α” or “A can be replaced by α”).
• If x = uAv ∈ (Σ ∪ N)∗, then we can rewrite x as y = uαv using the rule p ∈ P. Similarly, y = uαv can be reduced to x = uAv.
• The first process is called derivation and the second is called reduction.

Language of a Grammar

• The language of a grammar G is denoted by L(G) ⊆ Σ∗.
• x ∈ Σ∗ is an element of L(G) if, starting from the start symbol S, a finite sequence of rewritings [a] can produce x.
• The sequence of derivations of x may be written as S → x [b].
[a] In other words, x can be reduced to the start symbol S.
[b] In fact, this is the reflexive-transitive closure of the single-step derivation; we abuse the same notation.


Sentence and Sentential Form

• Any α ∈ (N ∪ Σ)∗ derivable from the start symbol S is called a sentential form of the grammar.
• If α ∈ Σ∗, i.e. α ∈ L(G), then α is called a sentence of the grammar.


Parse Tree

Given a grammar G = (Σ, N, P, S), the parse tree of a sentential form x of the grammar is a rooted ordered tree with the following properties:
• The root of the tree is labeled by the start symbol S.
• The leaf nodes, from left to right, are labeled by the symbols of x.

Parse Tree

• Internal nodes are labeled by non-terminals so that if an internal node is labeled by A ∈ N and its children from left to right are A1 A2 · · · An, then A → A1 A2 · · · An ∈ P.
• A leaf node may be labeled by ε if there is a rule A → ε ∈ P and the parent of the leaf node has label A.

Example

Consider the following grammar for arithmetic expressions:
G = ({id, ic, (, ), +, −, ∗, /}, {E, T, F}, P, E).
The set of production rules, P, is:
E → E + T | E − T | T
T → T ∗ F | T / F | F
F → id | ic | (E)


Example

Two derivations of the sentence id + ic ∗ id are:
d1: E → E + T → E + T ∗ F → E + F ∗ F → T + F ∗ F → F + F ∗ F → F + ic ∗ F → id + ic ∗ F → id + ic ∗ id
d2: E → E + T → T + T → F + T → id + T → id + T ∗ F → id + F ∗ F → id + ic ∗ F → id + ic ∗ id
Clearly, the derivation sequence of a sentential form need not be unique.


Leftmost and Rightmost Derivations

• A derivation is leftmost if at every step the leftmost non-terminal of the sentential form is rewritten to get the next sentential form.
• A rightmost derivation is defined similarly.
• For a context-free grammar, any string that can be derived without restriction can also be derived by a leftmost (or a rightmost) derivation.
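To see a leftmost derivation mechanically, the sketch below rewrites the leftmost non-terminal of a sentential form at each step. The caller supplies the rule bodies in the order they are used (a hypothetical interface chosen for illustration); feeding it the bodies used in d2 reproduces that derivation of id + ic ∗ id. ASCII '-' and '*' stand for − and ∗.

```python
# Grammar G of the slides: non-terminal -> list of rule bodies.
GRAMMAR = {
    "E": [["E", "+", "T"], ["E", "-", "T"], ["T"]],
    "T": [["T", "*", "F"], ["T", "/", "F"], ["F"]],
    "F": [["id"], ["ic"], ["(", "E", ")"]],
}


def leftmost_step(form, rhs):
    """Rewrite the leftmost non-terminal of `form` by the rule whose body is `rhs`."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:  # first non-terminal from the left
            if list(rhs) not in GRAMMAR[sym]:
                raise ValueError(f"{sym} -> {' '.join(rhs)} is not a rule of G")
            return form[:i] + list(rhs) + form[i + 1:]
    raise ValueError("no non-terminal left: the form is already a sentence")


def leftmost_derivation(start, bodies):
    """Apply the chosen rule bodies in order and return every sentential form."""
    forms = [[start]]
    for rhs in bodies:
        forms.append(leftmost_step(forms[-1], rhs))
    return forms


# Rule bodies of derivation d2, in the order they are applied.
d2 = [["E", "+", "T"], ["T"], ["F"], ["id"], ["T", "*", "F"], ["F"], ["ic"], ["id"]]
print(" -> ".join(" ".join(form) for form in leftmost_derivation("E", d2)))
```

Since the position to rewrite is fixed, a leftmost derivation is fully described by the sequence of rules alone; this is exactly what a top-down parser has to discover.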

Ambiguous Grammar

A grammar G is said to be ambiguous if there is a sentence x ∈ L(G) that has two distinct parse trees.


Example

Our previous grammar of arithmetic expressions is unambiguous. The following is an ambiguous grammar for the same language:
G′ = ({id, ic, (, ), +, −, ∗, /}, {E}, P, E). The production rules are:
E → E + E | E − E | E ∗ E | E / E | id | ic | (E)
An ambiguous grammar may need fewer non-terminals.
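One way to see the ambiguity of G′ is to count parse trees. The sketch below counts, by dynamic programming over token spans, the parse trees of a parenthesis-free token list under the operator rules of G′ (E → E op E | id | ic): any operator position can be the root. It is an illustration written for these notes, not a general parser, and the function name is hypothetical.

```python
from functools import lru_cache

OPERATORS = {"+", "-", "*", "/"}


def count_parse_trees(tokens):
    """Number of parse trees of `tokens` under E -> E op E | id | ic.

    Assumes a parenthesis-free token list such as ['id', '+', 'ic', '*', 'id'].
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # number of ways E derives tokens[i:j]
        if j - i == 1:
            return 1 if tokens[i] in ("id", "ic") else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in OPERATORS:
                # root rule E -> E tokens[k] E, split at operator position k
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

For id + ic ∗ id it returns 2; with the unambiguous grammar G the same sentence has exactly one tree. For n operators the count grows as the nth Catalan number (5 for three operators).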

Unique Parse Tree

(Figure: the unique parse tree of id + ic ∗ id under G, in bracket notation: [E [E [T [F id]]] + [T [T [F ic]] ∗ [F id]]].)

Non-Unique Parse Tree

(Figure: two distinct parse trees of id + ic ∗ id under G′: [E [E id] + [E [E ic] ∗ [E id]]] and [E [E [E id] + [E ic]] ∗ [E id]].)


Note

• A leftmost (rightmost) derivation is unique in an unambiguous grammar, but not in an ambiguous grammar.
• d3: E → E + E → id + E → id + E ∗ E → id + ic ∗ E → id + ic ∗ id
  d4: E → E ∗ E → E + E ∗ E → id + E ∗ E → id + ic ∗ E → id + ic ∗ id
• A derivation with an ambiguous grammar may be shorter (5 steps for d3 and d4 with G′, against 8 for d2 with G).


if-else Ambiguity

Consider the following production rules:
S → if(E) S | if(E) S else S | · · ·
A statement of the form
if(E1) if(E2) S2 else S3
can be parsed in two different ways. Normally we associate the else with the nearest if [a].
[a] A C compiler may give a warning suggesting curly braces to disambiguate.


if-else Ambiguity

(Figure: the two parse trees of if(E1) if(E2) S2 else S3: in one, the else belongs to the inner if; in the other, it belongs to the outer if.)


if-else Modified

Consider the following production rules:
S → if(E) S | if(E) ES else S | · · ·
ES → if(E) ES else ES | · · ·
We restrict the statements that can appear in the then-part: ES is meant to generate only statements in which every if has its else. Now the following statement has a unique parse tree:
if(E1) if(E2) S2 else S3


if-else Unambiguous

(Figure: the unique parse tree, in which the else is attached to the inner, nearest, if.)


Note

Consider the following grammar G1 for arithmetic expressions:
E → T + E | T − E | T
T → F ∗ T | F / T | F
F → id | ic | (E)
Is L(G) = L(G1)? What difference does the grammar make?

Problem

Consider another version of the grammar, G2:
E → E ∗ T | E / T | T
T → T + F | T − F | F
F → id | ic | (E)
What is the difference in this case? Is L(G) = L(G2)?


Problem

• Construct parse trees and abstract syntax trees corresponding to the input 25-2-10 for G and G1.
• Similarly, construct parse trees and abstract syntax trees corresponding to the input 5+2*10 for G and G2.
• In both cases, find the evaluation orders.

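G and G1
(Figure: parse trees of 25-2-10, with leaves <ic,25>, <ic,2> and <ic,10>. Under G the tree grows to the left (E → E − T), grouping (25 − 2) − 10; under G1 it grows to the right (E → T − E), grouping 25 − (2 − 10).)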


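G and G2
(Figure: parse trees of 5+2*10, with leaves <ic,5>, <ic,2> and <ic,10>. Under G the ∗ sits lower in the tree (T → T ∗ F), grouping 5 + (2 ∗ 10); under G2 the + sits lower (T → T + F), grouping (5 + 2) ∗ 10.)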

• G: (25 − 2) − 10 = 13
G1 : 25 − (2 − 10) = 33
• G: 5 + (2 ∗ 10) = 25
G2 : (5 + 2) ∗ 10 = 70
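These values can be checked with a small two-level recursive-descent evaluator. In this sketch (parentheses are omitted; the function name and parameters are illustrative), left-recursive rules such as E → E − T become a loop that folds to the left, right-recursive rules such as E → T − E become a right-recursive call, and G2 is modeled by swapping which operators belong to the outer level E and the inner level T.

```python
import re


def evaluate(text, outer_ops, inner_ops, right_assoc=False):
    """Evaluate an integer expression with a two-level grammar E / T / F.

    outer_ops: operators of E (outer level, bind loosest);
    inner_ops: operators of T (inner level, bind tightest).
    right_assoc=False mimics left-recursive rules (grammars G and G2);
    right_assoc=True mimics right-recursive rules (grammar G1).
    """
    tokens = re.findall(r"\d+|[-+*/]", text)
    pos = 0
    apply = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: int(a / b),  # truncates toward zero, as C does
    }

    def factor():
        # F -> ic
        nonlocal pos
        value = int(tokens[pos])
        pos += 1
        return value

    def level(ops, operand):
        nonlocal pos
        left = operand()
        if right_assoc:
            # X -> operand op X | operand: the rest is evaluated first
            if pos < len(tokens) and tokens[pos] in ops:
                op = tokens[pos]
                pos += 1
                return apply[op](left, level(ops, operand))
            return left
        # X -> X op operand | operand, written as a loop folding to the left
        while pos < len(tokens) and tokens[pos] in ops:
            op = tokens[pos]
            pos += 1
            left = apply[op](left, operand())
        return left

    return level(outer_ops, lambda: level(inner_ops, factor))


ADD, MUL = {"+", "-"}, {"*", "/"}
print(evaluate("25-2-10", ADD, MUL))                    # grammar G
print(evaluate("25-2-10", ADD, MUL, right_assoc=True))  # grammar G1
print(evaluate("5+2*10", ADD, MUL))                     # grammar G
print(evaluate("5+2*10", MUL, ADD))                     # grammar G2
```

The shape of the grammar alone fixes the grouping: the language is the same in each case, but the evaluation order, and hence the value, differs.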


A Few Important Transformations


Useless Symbols

A grammar may have useless symbols that can be removed to produce a simpler grammar. A symbol X is useless if it does not appear in any derivation of a sentence, i.e. there is no derivation S ⇒∗ αXβ ⇒∗ w with w ∈ Σ∗.


Useless Symbols

We first remove all non-terminals that do not produce any terminal string; then we remove all symbols (terminal or non-terminal) that do not appear in any sentential form. These two steps must be followed in the given order [a].
[a] As an example (HU), not all useless symbols are removed if the steps are applied in the reverse order to the grammar S → AB | a and A → a.
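The two steps can be sketched in Python, with a grammar given as a dictionary from each non-terminal to its list of rule bodies (tuples of symbols); this representation and the function names are assumptions made for illustration. Step 1 keeps only generating symbols; step 2 keeps only symbols reachable from the start symbol.

```python
def generating(grammar, terminals):
    """Non-terminals that derive some terminal string (fixed-point iteration)."""
    gen = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in gen and any(
                all(s in terminals or s in gen for s in body) for body in bodies
            ):
                gen.add(head)
                changed = True
    return gen


def reachable(grammar, start):
    """Symbols that appear in some sentential form derived from `start`."""
    seen, stack = {start}, [start]
    while stack:
        for body in grammar.get(stack.pop(), []):
            for s in body:
                if s not in seen:
                    seen.add(s)
                    stack.append(s)
    return seen


def remove_useless(grammar, terminals, start):
    # Step 1: drop non-generating non-terminals and every rule that uses them.
    gen = generating(grammar, terminals)
    g1 = {
        head: [b for b in bodies if all(s in terminals or s in gen for s in b)]
        for head, bodies in grammar.items()
        if head in gen
    }
    # Step 2: drop symbols that are unreachable from the start symbol.
    reach = reachable(g1, start)
    return {head: bodies for head, bodies in g1.items() if head in reach}
```

On the grammar of the footnote it returns {'S': [('a',)]}: B (which has no rules) is non-generating and is removed with S → AB in step 1, after which A becomes unreachable and is removed in step 2.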


ε-Production

If the language of the grammar does not contain ε, then we can free the grammar of ε-production rules. If ε is in the language, we can keep an ε-production rule for the start symbol only and make the rest of the grammar free of them.


Example

S → 0A0 | 1B1 | BB
A → C
B → S | A
C → S | ε
All non-terminals are nullable, i.e. each of them derives ε.


Example

After removal of ε-productions:

S → 0A0 | 1B1 | BB | 00 | 11 | B | ε
A → C
B → S | A
C → S
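The construction can be sketched as follows, using the same example. The nullable set is computed as a fixed point; then every body is replaced by all the variants obtained by keeping or dropping each nullable occurrence, empty variants are discarded, and ε is kept only for the start symbol. The dictionary representation (bodies as tuples, () for ε) and the function names are illustrative.

```python
from itertools import product


def nullable_set(grammar):
    """Non-terminals that derive ε, computed as a fixed point."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


def remove_epsilon(grammar, start):
    """Return an equivalent grammar whose only ε-rule (if any) is for `start`."""
    nullable = nullable_set(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # each nullable occurrence may be kept or dropped; others are always kept
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                variant = tuple(sym for part in choice for sym in part)
                if variant and variant not in new_bodies:
                    new_bodies.append(variant)
        result[head] = new_bodies
    if start in nullable:
        result[start].append(())  # ε stays only on the start symbol
    return result, nullable


example = {
    "S": [("0", "A", "0"), ("1", "B", "1"), ("B", "B")],
    "A": [("C",)],
    "B": [("S",), ("A",)],
    "C": [("S",), ()],
}
new_grammar, nullable = remove_epsilon(example, "S")
```

Here new_grammar["S"] holds 0A0, 00, 1B1, 11, BB, B and ε, matching the grammar above.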


Unit Production

A production of the form A → B (a unit production) may be removed; otherwise the attributes of B have to be propagated to A.


Normal Forms

A context-free grammar can be converted into different normal forms, e.g. the Chomsky normal form. These are useful for some decision procedures, e.g. the CKY algorithm, but are not of much importance for compilation.


Left and Right Recursion

A CFG is called left-recursive if there is a non-terminal A such that A ⇒+ Aα, i.e. A derives Aα in a finite number (at least one) of steps. Left recursion must be removed for a top-down parser [a], since such a parser could keep expanding A to Aα without consuming any input.
[a] Right recursion can be defined similarly. It does not cause as much trouble, as we do not read the input from right to left; but in a bottom-up parser the stack may grow large due to right recursion.


Immediate Left-Recursion

A left recursion is called immediate if a production rule of the form A → Aα is present in the grammar. An immediate left recursion is easy to eliminate. We certainly also have production rules of the form
A → Aα | β
where the first symbol of β does not derive a string beginning with A [a].
[a] Otherwise A will be a useless symbol.


Parse Tree

The parse tree with this pair of production rules looks as follows:
[A [A β] α]
(the root A has children A and α, and the inner A derives β). The yield is βα.

Rotation

We can rotate the parse tree to get the same yield, but without the left recursion:
[A β [A′ α]]
(the root A has children β and A′, and A′ derives α). The new rules are A → βA′ and A′ → αA′ | ε.

Removal of Immediate Left-Recursion

The original grammar is

A → Aα1 | Aα2 | · · · | Aαk
A → β1 | β2 | · · · | βl

where no βi begins with A. The transformed grammar is

A → β1A′ | β2A′ | · · · | βlA′
A′ → α1A′ | α2A′ | · · · | αkA′ | ε


Example

Original grammar:

E → E + T | T
T → T ∗ F | F
F → (E) | ic

The transformed grammar is

E → TE′
E′ → +TE′ | ε
T → FT′
T′ → ∗FT′ | ε
F → (E) | ic
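The transformation for a single non-terminal can be sketched in Python as below. Bodies are tuples of symbols and the empty tuple stands for ε (a representation chosen for this sketch); naming the new non-terminal by appending a prime to the head is also an assumption.

```python
def remove_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | A ak | b1 | ... | bl as
    A -> b1 A' | ... | bl A'  and  A' -> a1 A' | ... | ak A' | eps.

    Bodies are tuples of symbols; the empty tuple () stands for eps.
    """
    prime = head + "'"
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:  # no immediate left recursion: leave the rules alone
        return {head: list(bodies)}
    return {
        head: [beta + (prime,) for beta in betas],
        prime: [alpha + (prime,) for alpha in alphas] + [()],
    }
```

Applied to E → E + T | T it returns E → TE′ and E′ → +TE′ | ε, and to T → T ∗ F | F it returns T → FT′ and T′ → ∗FT′ | ε, as in the example; F is returned unchanged.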


Change in the Parse Tree

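Consider the input ic+ic*ic:
(Figure: its parse tree under the original grammar, where E → E + T and T → T ∗ F grow to the left, next to its parse tree under the transformed grammar, where the E′ and T′ chains grow to the right and end in ε.)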


Removal of Indirect Left-Recursion

Consider the following grammar:

A → Aab | Ba | Cb | b
B → Aa | Db
C → Ab | Da
D → Bb | Ca
The grammar has indirect left-recursion:
A → Ba → Aaa etc.

Removal of Indirect Left-Recursion

• First we order the non-terminals: A1 < A2 < · · · < An.
• The following algorithm eliminates both direct and indirect left recursion.


Algorithm

for i = 1 to n
    for j = 1 to i − 1
        replace each rule of the form Ai → Aj γ
        by Ai → δ1 γ | · · · | δk γ, where
        Aj → δ1 | · · · | δk are the current Aj-productions
    remove the immediate left recursion of the Ai-productions
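A Python sketch of this algorithm follows. It assumes, as the algorithm requires, that the input grammar has no cycles (A ⇒+ A) and no ε-productions; the representation (bodies as tuples, () for ε) and the primed names of the new non-terminals are choices made for this sketch.

```python
def remove_left_recursion(grammar, order):
    """Eliminate direct and indirect left recursion.

    grammar: dict mapping each non-terminal to a list of bodies (tuples; () is eps).
    order:   the non-terminals A1 < A2 < ... < An.
    Assumes no cycles (A =>+ A) and no eps-rules among A1..An.
    """
    g = {head: list(bodies) for head, bodies in grammar.items()}
    for i, ai in enumerate(order):
        # Inner loop: expose recursion of Ai through A1 .. A(i-1).
        for aj in order[:i]:
            new_bodies = []
            for body in g[ai]:
                if body and body[0] == aj:
                    # Ai -> Aj gamma becomes Ai -> delta gamma for each current Aj -> delta
                    new_bodies.extend(delta + body[1:] for delta in g[aj])
                else:
                    new_bodies.append(body)
            g[ai] = new_bodies
        # Remove the immediate left recursion of Ai.
        alphas = [body[1:] for body in g[ai] if body and body[0] == ai]
        if alphas:
            betas = [body for body in g[ai] if not body or body[0] != ai]
            prime = ai + "'"
            g[ai] = [beta + (prime,) for beta in betas]
            g[prime] = [alpha + (prime,) for alpha in alphas] + [()]
    return g


example = {
    "A": [("A", "a", "b"), ("B", "a"), ("C", "b"), ("b",)],
    "B": [("A", "a"), ("D", "b")],
    "C": [("A", "b"), ("D", "a")],
    "D": [("B", "b"), ("C", "a")],
}
result = remove_left_recursion(example, ["A", "B", "C", "D"])
```

Run on the grammar of the example with the order A < B < C < D, it yields the A-, A′-, B- and B′-productions shown in the passes that follow (up to the order of alternatives).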

Removal of Indirect Left-Recursion

• In the first iteration of the outer loop (i = 1), the immediate left recursions of A1 are removed.
• After this iteration, any production rule of the form A1 → Al β has l > 1.
• Similarly, after the (i − 1)th iteration of the outer loop, no Ak (k = 1, · · · , i − 1) has a production rule of the form Ak → Al γ with k ≥ l.

Removal of Indirect Left-Recursion

• In the ith iteration, the inner loop exposes any left recursion of Ai through the Aj, j = 1, · · · , i − 1.
• It progressively transforms (j = 1, · · · , i − 1) every production Ai → Aj β until j ≥ i.
• Then the last step of the outer loop removes the immediate left recursions of Ai.

Example

Let A < B < C < D. In the first pass (i = 1) of the outer loop, the immediate left recursion of A is removed:
A → BaA′ | CbA′ | bA′
A′ → abA′ | ε
B → Aa | Db
· · ·


Example

In the second pass (i = 2) of the outer loop, the rule B → Aa is replaced first, and then the immediate left recursions of B are removed:
A → BaA′ | CbA′ | bA′
A′ → abA′ | ε
B → BaA′a | CbA′a | bA′a | Db
· · ·


Example

After the immediate left recursion of B is removed:
A → BaA′ | CbA′ | bA′
A′ → abA′ | ε
B → DbB′ | bA′aB′ | CbA′aB′
B′ → aA′aB′ | ε
C → Ab | Da
· · ·


Example

In the third pass (i = 3) of the outer loop, C → Ab is first replaced using the A-productions:
A → BaA′ | CbA′ | bA′
A′ → abA′ | ε
B → DbB′ | bA′aB′ | CbA′aB′
B′ → aA′aB′ | ε
C → BaA′b | CbA′b | bA′b | Da
· · ·


Example

Next, C → BaA′b is replaced using the current B-productions:
A → BaA′ | CbA′ | bA′
A′ → abA′ | ε
B → DbB′ | bA′aB′ | CbA′aB′
B′ → aA′aB′ | ε
C → DbB′aA′b | bA′aB′aA′b | CbA′aB′aA′b | CbA′b | bA′b | Da
· · ·


Left Factoring

• Several production rules of a non-terminal with the same prefix on their right-hand sides create a rule-selection problem for a top-down parser.
• The grammar is transformed by left factoring so that the right-hand sides of different rules of a non-terminal begin with different prefixes.

Example

If we have production rules of the form A → xBα, A → xCβ, A → xDγ, we transform them to A → xE and E → Bα | Cβ | Dγ, where x ∈ Σ∗ is the common prefix and E is a new non-terminal.


Substitution

• The left factor may not be visible due to the presence of non-terminals.
• It may be necessary to substitute for the leftmost non-terminals of the right-hand sides of production rules.


Example

• Let A → Bb | Cd, B → abB | b, C → adC | d before substitution.
• After the substitution we get A → abBb | bb | adCd | dd, B → abB | b, C → adC | d.
• Now the rules of A can be factored.
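Left factoring can be sketched as a work-list loop: the bodies of a non-terminal are grouped by their first symbol, and each group of two or more bodies is replaced by its longest common prefix followed by a fresh non-terminal, which is itself factored later. The primed naming (A', A'', ...) and the dictionary representation are choices of this sketch; the example input is the substituted grammar above.

```python
def common_prefix(bodies):
    """Longest common prefix of several tuples of symbols."""
    prefix = []
    for column in zip(*bodies):
        if all(sym == column[0] for sym in column):
            prefix.append(column[0])
        else:
            break
    return tuple(prefix)


def left_factor(grammar):
    """Left-factor until no two bodies of a non-terminal share a first symbol."""
    g = {head: list(bodies) for head, bodies in grammar.items()}
    work = list(g)
    while work:
        head = work.pop()
        groups = {}  # first symbol -> bodies starting with it
        for body in g[head]:
            groups.setdefault(body[:1], []).append(body)
        new_bodies = []
        for first, members in groups.items():
            if len(members) == 1 or first == ():
                new_bodies.extend(members)
                continue
            prefix = common_prefix(members)
            fresh = head + "'"
            while fresh in g:  # avoid clashing with existing names
                fresh += "'"
            g[fresh] = [body[len(prefix):] for body in members]
            new_bodies.append(prefix + (fresh,))
            work.append(fresh)  # the remainders may need factoring too
        g[head] = new_bodies
    return g


substituted = {
    "A": [("a", "b", "B", "b"), ("b", "b"), ("a", "d", "C", "d"), ("d", "d")],
    "B": [("a", "b", "B"), ("b",)],
    "C": [("a", "d", "C"), ("d",)],
}
factored = left_factor(substituted)
```

Here it produces A → aA′ | bb | dd and A′ → bBb | dCd, while B and C are already factored and stay unchanged.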


Parsing

• Using the grammar as a specification, a parser tries to construct the parse tree corresponding to the input (a program to compile). This construction may be top-down or bottom-up.
• Top-down parsing may be viewed as a pre-order construction of the parse tree, and bottom-up parsing as a post-order construction.

Top-Down Parsing

• A top-down parser starts from the start symbol (S) to generate the input string of tokens (x).
• When a top-down parser tries to build the subtree of an internal node, the non-terminal A at that node is known.
• It chooses the appropriate production rule of A using information from the input.

Top-Down Parsing

• The node is expanded to its children, which are labeled by the symbols of the chosen production rule of A.
• The parser continues the construction of the tree from the leftmost child of A, proceeding from left to right.
• If the child is a terminal, it is matched with the leftmost token of the token stream.

Top-Down Parsing

• Once a terminal is matched with the token, the parser continues with the next node in pre-order.
• For a context-free grammar, the choice of the appropriate rule of a non-terminal may not be deterministic, and it may be necessary to backtrack.

Top-Down Parsing

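Consider the grammar S → aSa | bSb | c and the input abcba.
(Figure: the parse tree grown top-down in three snapshots: the root S; S expanded by S → aSa; the inner S expanded by S → bSb.)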
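Because each alternative of S begins with a different terminal, one token of lookahead is enough to choose the rule, so a recursive-descent parser for this grammar never needs to backtrack. A minimal sketch (the function name is illustrative) that returns the rules in the order a leftmost derivation applies them:

```python
def parse_s(text):
    """Predictive recursive descent for S -> aSa | bSb | c.

    Returns the rules used, in leftmost-derivation order; raises SyntaxError.
    """
    pos = 0
    rules = []

    def match(ch):
        nonlocal pos
        if pos < len(text) and text[pos] == ch:
            pos += 1
        else:
            raise SyntaxError(f"expected {ch!r} at position {pos}")

    def S():
        look = text[pos] if pos < len(text) else None  # one-token lookahead
        if look in ("a", "b"):  # S -> aSa or S -> bSb
            rules.append(f"S -> {look}S{look}")
            match(look)
            S()
            match(look)
        elif look == "c":  # S -> c
            rules.append("S -> c")
            match("c")
        else:
            raise SyntaxError(f"unexpected {look!r} at position {pos}")

    S()
    if pos != len(text):
        raise SyntaxError("trailing input")
    return rules
```

On abcba it applies S → aSa, S → bSb and S → c, in that order, which is the pre-order construction of the tree shown above.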


Non-Determinism

• The situation will be different if the rule S → c is replaced by S → a, or S → b, or S → ε.
• Then, by looking at a fixed number of incoming tokens, we cannot decide which rule to use to expand S.


Note: Top-Down

• Input is always read (consumed) from left to right.
• A snapshot of a top-down parser on an input x is as follows.
• A part u of the input has already been generated (its tokens consumed), i.e. x = uv, and the parser has the sentential form uAα.

Note: Top-Down

• The parser tries to decide the correct rule for A to get the next sentential form.
• It always expands the leftmost non-terminal, following a leftmost derivation.
• The choice of rule depends on the initial part of the remaining input.
• A choice of production rule may lead to a dead end and backtracking.

Example

Consider the following grammar:
S → aSa | bSb | a | b
Given the sentential form aabaSabaa and the remaining portion of the input ab· · ·, it is impossible to decide, by looking at one, two or any finite number of input symbols, whether to use the first or the third production rule to generate the ‘a’ of the input.
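For this grammar a top-down parser must be prepared to backtrack. The sketch below does so with Python generators: parse yields every input position at which a symbol can finish, and moving on to the next alternative once a generator is exhausted is the backtracking. It works only for grammars without left recursion, its running time can be exponential, and all names are illustrative.

```python
GRAMMAR = {"S": [("a", "S", "a"), ("b", "S", "b"), ("a",), ("b",)]}


def parse(symbol, text, pos):
    """Yield every position where `symbol` can finish when started at `pos`."""
    if symbol not in GRAMMAR:  # terminal: must match the next input symbol
        if pos < len(text) and text[pos] == symbol:
            yield pos + 1
        return
    for body in GRAMMAR[symbol]:  # try the alternatives in order
        yield from parse_sequence(body, text, pos)


def parse_sequence(body, text, pos):
    """Yield every position where the symbols of `body` can finish in sequence."""
    if not body:
        yield pos
        return
    for mid in parse(body[0], text, pos):
        yield from parse_sequence(body[1:], text, mid)


def accepts(text, start="S"):
    return any(end == len(text) for end in parse(start, text, 0))
```

For example, accepts("abbba") is True and accepts("abba") is False; in both cases the parser explores several dead ends before it can decide.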


Example

Consider the following grammar:
S → aSa | bSb | c
Given the sentential form aabaSabaa and the remaining portion of the input abc· · ·, it is clear from the first symbol of the remaining input that the first production rule must be applied to get the next sentential form.


Bottom-Up Parsing

• A bottom-up parser starts from the input x and tries to reduce it to the start symbol S.
• The internal nodes of the syntax tree are constructed in post-order.
• The root of a subtree is constructed after its children have been constructed and labeled.
• Each token is a single-node subtree (a leaf).

Bottom-Up Parsing

Consider the grammar S → aSa | bSb | c and the input abcba.
(Figure: the parse tree of abcba built bottom-up, with the construction steps numbered 1 to 5.)
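A shift-reduce sketch for this grammar: tokens are shifted onto a stack, and whenever the top of the stack matches a right-hand side it is reduced. This greedy rule is correct only because of the special shape of this grammar (the single c marks the middle); in general the parser has to choose among several possible reductions, as discussed next. The reductions it records are a rightmost derivation in reverse; the function name is illustrative.

```python
RULES = [("S", ("a", "S", "a")), ("S", ("b", "S", "b")), ("S", ("c",))]


def shift_reduce(text):
    """Greedy shift-reduce recognizer for S -> aSa | bSb | c.

    Returns the reductions performed, in order; raises SyntaxError on failure.
    """
    stack, reductions = [], []
    for token in text:
        stack.append(token)  # shift
        reduced = True
        while reduced:  # reduce as long as some right-hand side is on top
            reduced = False
            for head, body in RULES:
                n = len(body)
                if tuple(stack[-n:]) == body:
                    del stack[-n:]
                    stack.append(head)  # reduce
                    reductions.append(f"{head} -> {''.join(body)}")
                    reduced = True
                    break
    if stack != ["S"]:
        raise SyntaxError(f"cannot reduce to S, stack = {stack}")
    return reductions
```

On abcba it reduces S → c, then S → bSb, then S → aSa: the parse tree is completed from the leaves upward, in post-order.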


Note: Bottom-Up

• In a bottom-up parser on the input x, parsing proceeds as follows.
• The current sentential form is αv, where α ∈ (Σ ∪ N)∗ and v is the remaining portion of the input. If x = uv, then α ⇒∗ u.
• At this point the parser tries to find a β such that α′βv′ = αv and A → β ∈ P, so that α′Av′ is the previous sentential form.

Note: Bottom-Up

There may be more than one such choice, and some of them may be incorrect. If β is always a suffix of α, then we are following a rightmost derivation in reverse order (a sequence of reductions).


Example

Consider the grammar:
E → E + E | E ∗ E | ic
Given the input ic+ic*ic· · ·, many reductions are possible, and in this case all of them will finally lead to the start symbol. The previous sentential form can be any one of the following three, among many more:
E+ic*ic· · ·, ic+E*ic· · ·, ic+ic*E· · ·
Only the first one is a right-sentential form, i.e. one that occurs in a rightmost derivation.