Unit 3 Class

A parser checks its input against a grammar and builds a parse tree. There are two types of parsing: top-down and bottom-up. Top-down parsing constructs the tree starting from the root node, while bottom-up parsing starts from the leaves. A context-free grammar has four components: variables, terminals, production rules, and a start symbol. LL(1) parsing uses a table and one token of lookahead. It constructs parsers from grammars that are unambiguous, free of left recursion, and left-factored. The First and Follow functions are used to populate the parsing table.

Uploaded by

Shoaib Sidd

PARSER

A parser checks whether the input it is given conforms to the grammar and, if it does, builds a data structure in the form of a parse tree or syntax tree.

Parsing is of two types: top-down parsing and bottom-up parsing.

Context-Free Grammar
A context-free grammar has four components:
G=(V,T,P,S)
● G is a grammar, which consists of a set of production rules. It is used to generate the
strings of a language.
● T is the finite set of terminal symbols. They are denoted by lower case letters.
● V is the finite set of non-terminal symbols (variables). They are denoted by capital letters.
● P is a set of production rules, each of which replaces a non-terminal symbol (on the
left side of the production) with a string of terminals and non-terminals (on the right side of the production).
● S is the start symbol, from which every string is derived.
Example 1
Construct CFG for the language having any number of a's over the set ∑={a}
Regular Expression= a*
Production rule for the Regular Expression is as follows −
S->aS → rule 1
S-> ε → rule 2
Now, to derive the string "aaaaaa", start with the start symbol S and apply the rules as follows:

Sentential form     Rule applied
S                   (start symbol)
aS                  1
aaS                 1
aaaS                1
aaaaS               1
aaaaaS              1
aaaaaaS             1
aaaaaa              2

Example 2:
Let a CFG {V,T,P,S} be,
V = {S},
T = {a, b},
Starting symbol = S,
P = S → SS | aSb | ε
One string derivable from the above CFG is “abaabb”. Derive it step by step.

Solution:
S → SS → aSbS → abS → abaSb → abaaSbb → abaabb
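The derivation above always rewrites the leftmost non-terminal. A minimal sketch of replaying such a derivation is shown below. The function name `leftmost_derivation` and the representation of productions as (lhs, rhs) string pairs are assumptions made for illustration, with the empty string standing for ε.

```python
def leftmost_derivation(start, steps, nonterminals):
    """Apply each (lhs, rhs) production to the leftmost non-terminal in turn."""
    form = start
    forms = [form]
    for lhs, rhs in steps:
        # Locate the leftmost non-terminal in the current sentential form.
        pos = next(i for i, ch in enumerate(form) if ch in nonterminals)
        if form[pos] != lhs:
            raise ValueError(f"leftmost non-terminal is {form[pos]}, not {lhs}")
        form = form[:pos] + rhs + form[pos + 1:]
        forms.append(form)
    return forms


# Example 2: P = S -> SS | aSb | ε, deriving "abaabb".
steps = [("S", "SS"), ("S", "aSb"), ("S", ""),
         ("S", "aSb"), ("S", "aSb"), ("S", "")]
forms = leftmost_derivation("S", steps, {"S"})
print(" => ".join(forms))
```

The same helper replays Example 1 by applying S → aS six times followed by S → ε.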
Types of Parsers in Compiler Design
Top-Down Parsing

Top-Down Parsing is based on Left Most Derivation.


The process of constructing the parse tree which starts from the root and goes down to the leaf is
Top-Down Parsing.
A top-down parser is constructed from a grammar that is free from ambiguity, free from left
recursion, and left-factored (free from common prefixes).

A. Ambiguity in Grammar

A grammar is said to be ambiguous if there exists more than one leftmost derivation or more
than one rightmost derivation or more than one parse tree for the given input string.

Example 1:

Derive the string "id + id - id" from the given grammar and check whether the grammar G is
ambiguous or not.

1. E → E + E
2. E → E - E
3. E → id

Solution: The string can be derived in two ways.

First Leftmost derivation


1. E → E + E
2.   → id + E
3.   → id + E - E
4.   → id + id - E
5.   → id + id - id

Second Leftmost derivation

1. E → E - E
2.   → E + E - E
3.   → id + E - E
4.   → id + id - E
5.   → id + id - id

Since there are two leftmost derivations for the single string "id + id - id", the grammar G is
ambiguous.
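Because every parse tree corresponds to exactly one leftmost derivation, ambiguity for a particular string can be checked by counting its parse trees. The sketch below does this for the grammar E → E + E | E - E | id only. The function name `count_parse_trees` and the token-list input are assumptions made for illustration.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for E -> E + E | E - E | id over a list of tokens."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "-"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


trees = count_parse_trees(["id", "+", "id", "-", "id"])
print(trees)  # more than one tree means the string exposes an ambiguity
```

A count greater than one for any string is enough to show the grammar is ambiguous. A count of one for a single string proves nothing about the grammar as a whole.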

B. Left recursion

A production is said to have left recursion if the leftmost symbol of its RHS is the
same as the variable on its LHS. When this happens within a single production, it is called immediate left recursion.

A → A α | β
The above grammar is left recursive because the variable on the left side of the production occurs at the first position
on the right side of the production.
Elimination of Left Recursion
Left recursion can be eliminated by introducing a new non-terminal A′ such that:
A → βA′
A′ → αA′ | ε

Example1 −
Consider the following left-recursive grammar.
E → E + T|T
T → T * F|F
F → (E)|id
Eliminate immediate left recursion from the Grammar.
Solution:
First, comparing E → E + T | T
with A → A α | β gives α = + T and β = T.
Second, comparing T → T * F | F
with A → A α | β gives α = * F and β = F.
Third, the production F → (E) | id does not have any left recursion.
The resulting grammar is:
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id

Example2 −
Eliminate the left recursion for the following Grammar.
S → a|^|(T)
T → T, S|S
Solution:
S → a | ^ | (T)
T → ST′
T′ → ,ST′ | ε
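The transformation used in both examples can be sketched as a small function. This is only an illustration that handles immediate left recursion for one non-terminal. The function name, the list-of-symbols representation, and the use of an empty list for ε are assumptions.

```python
def eliminate_immediate_left_recursion(nt, alternatives):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | ε.

    `alternatives` is a list of right-hand sides, each a list of symbols;
    the empty list stands for ε.
    """
    # The alphas: what follows the recursive occurrence of `nt`.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # The betas: alternatives that do not start with `nt`.
    betas = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not alphas:
        return {nt: alternatives}  # nothing to do
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }


result = eliminate_immediate_left_recursion("T", [["T", ",", "S"], ["S"]])
for lhs, alts in result.items():
    print(lhs, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```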
C. Left Factoring
Grammar With Common Prefixes-
If the RHS of more than one production of the same non-terminal starts with the same symbol, then such a grammar is called
a grammar with common prefixes.

Example-
A → αβ1 / αβ2 / αβ3 (Grammar with common prefixes)

● A top-down parser cannot decide from the common prefix α alone which alternative to choose. To remove this confusion, we use left factoring.

Rule:
If A → αβ1 / αβ2, then replace it with:
1) A → αA’
2) A’ → β1 / β2
Problem-01:
Do left factoring in the following grammar-

S → iEtS / iEtSeS / a

E→b

Solution:

The left factored grammar is-

S → iEtSS’ / a

S’ → eS / ∈

E→b

Problem-02:
Do left factoring in the following grammar-

A → aAB / aBc / aAc

Solution-
Step-01:
A → aA’

A’ → AB / Bc / Ac

Again, this is a grammar with common prefixes.


Step-02:
A → aA’

A’ → AD / Bc

D→B/c

This is a left factored grammar.
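The two problems above can be reproduced with a short sketch of left factoring. It groups the alternatives of each non-terminal by their first symbol and pulls out the longest common prefix of each group. The function names and the naming of new non-terminals (appending a prime, so the D of Problem-02 comes out as A'') are assumptions made for illustration.

```python
from collections import defaultdict


def common_prefix(seqs):
    """Longest sequence of symbols shared at the start of every alternative."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(grammar):
    """grammar: dict non-terminal -> list of alternatives (lists; [] is ε)."""
    grammar = {nt: [list(a) for a in alts] for nt, alts in grammar.items()}
    work = list(grammar)
    while work:
        nt = work.pop(0)
        groups = defaultdict(list)
        for alt in grammar[nt]:
            groups[alt[0] if alt else None].append(alt)
        new_alts = []
        for first, alts in groups.items():
            if first is None or len(alts) == 1:
                new_alts.extend(alts)  # no common prefix in this group
                continue
            prefix = common_prefix(alts)
            new_nt = nt + "'"
            while new_nt in grammar:  # keep new names unique
                new_nt += "'"
            grammar[new_nt] = [alt[len(prefix):] for alt in alts]
            new_alts.append(prefix + [new_nt])
            work.append(new_nt)  # the new rules may need factoring too
        grammar[nt] = new_alts
    return grammar


factored = left_factor({"A": [["a", "A", "B"], ["a", "B", "c"], ["a", "A", "c"]]})
for lhs, alts in factored.items():
    print(lhs, "->", " / ".join(" ".join(alt) or "∈" for alt in alts))
```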

1. Top Down parsing:

1.1 Backtracking:

1. Whenever a non-terminal is expanded for the first time, go with its first alternative and compare it
with the given I/P string.
2. If matching doesn’t occur, then go with the second alternative and compare it with the given I/P
string.
3. If matching is not found again, then go with the next alternative, and so on.
4. If matching occurs for at least one alternative, then the I/P string is parsed
successfully.
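The four steps above can be sketched as a backtracking recursive-descent recognizer. The grammar S → cAd, A → ab | a used here is a hypothetical example, not taken from these notes, and the function names are assumptions. The sketch loops forever on a left-recursive grammar, which is one reason left recursion is removed first.

```python
def parse(grammar, symbol, tokens, pos):
    """Yield every position at which `symbol` can finish matching from `pos`."""
    if symbol not in grammar:  # terminal symbol
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    # Try the alternatives in order; a failed alternative simply yields
    # nothing, and the loop falls through to the next one (backtracking).
    for alternative in grammar[symbol]:
        yield from match_sequence(grammar, alternative, tokens, pos)


def match_sequence(grammar, symbols, tokens, pos):
    if not symbols:
        yield pos
        return
    for mid in parse(grammar, symbols[0], tokens, pos):
        yield from match_sequence(grammar, symbols[1:], tokens, mid)


def accepts(grammar, start, tokens):
    return any(end == len(tokens) for end in parse(grammar, start, tokens, 0))


# Hypothetical grammar: S -> c A d, A -> a b | a
grammar = {"S": [["c", "A", "d"]], "A": [["a", "b"], ["a"]]}
print(accepts(grammar, "S", list("cad")))
```

For the input "cad", the first alternative A → ab fails at d, so the recognizer backs up and succeeds with A → a.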

1.2. LL(1) or Table-Driven or Predictive Parser –


In LL(1), the first L stands for scanning the input Left to Right and the second L stands for Left-most Derivation. 1 stands for
the number of lookahead tokens used by the parser while parsing a sentence.
An LL(1) parser is constructed from a grammar which is free from left recursion, common prefixes,
and ambiguity.
First Function:
First(A) is the set of terminal symbols that begin the strings derived from A. If A can derive ε, then ε is also in First(A).
Example 1-
Consider the production rule-

A → abc / def / ghi

Then, we have-

First(A) = { a , d , g }
Example 2-
Consider the production rule-

S → ABCD

A→ b

B→ c

C→d

D→e

Then, we have-

First(S) = { b }

First(A) = { b }

First(B) = { c }

First(C) = { d }

First(D) = { e }

Example 3-
Consider the production rule-

S → ABCD

A→ b/ε

B→ c

C→d

D→e

Then, we have-

First(S) = { b,c }

First(A) = { b,ε }

First(B) = { c }

First(C) = { d }

First(D) = { e }
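The First sets of Example 3 can be computed by repeating the definition until nothing changes. The sketch below is one way to do it. The function names and the dictionary representation (an empty list for an ε alternative) are assumptions made for illustration.

```python
EPS = "ε"


def first_of_sequence(symbols, first):
    """First set of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        # For a terminal, First is the terminal itself.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can vanish, or the sequence is empty
    return result


def first_sets(grammar):
    """grammar: dict non-terminal -> list of alternatives (lists; [] is ε)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                before = len(first[nt])
                first[nt] |= first_of_sequence(alt, first)
                changed |= len(first[nt]) != before
    return first


# Example 3: S -> ABCD, A -> b / ε, B -> c, C -> d, D -> e
grammar = {
    "S": [["A", "B", "C", "D"]],
    "A": [["b"], []],
    "B": [["c"]],
    "C": [["d"]],
    "D": [["e"]],
}
first = first_sets(grammar)
print(first["S"], first["A"])
```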
Follow Function-
Follow(A) is the set of terminal symbols that can appear immediately to the right of A in some sentential form.

Rules For Calculating Follow Function-


1. For the start symbol S, place $ in Follow(S).
2. For any production rule A → αB,
everything in Follow(A) is added to Follow(B).
3. For any production rule A → αBβ,

● If ∈ ∉ First(β), then First(β) is added to Follow(B).

● If ∈ ∈ First(β), then { First(β) – ∈ } ∪ Follow(A) is added to Follow(B).

Example 1-
Consider the production rule-

S → ABCD

A→ b/ε

B→ c

C→d

D→e

Solution-
Then, we have-

First(S) = { b,c } Follow(S) = {$}

First(A) = { b,ε } Follow(A) = First(B)= {c}

First(B) = { c } Follow(B) = First(C)= {d}

First(C) = { d } Follow(C) = First(D)= {e}

First(D) = { e } Follow(D) = Follow(S)= {$}

Example 2-
Calculate the first and follow functions for the given grammar-
S → aBDh

B → cC

C → bC / ∈

D → EF

E→g/∈

F→f/∈

Solution-
The first and follow functions are as follows-

First Functions- Follow Functions-

First(S) = { a } Follow(S) = { $ }

First(B) = { c } Follow(B) = { g , f , h }

First(C) = { b , ∈ } Follow(C) = Follow(B) = { g , f , h }

First(D) = { g , f , ∈ } Follow(D) = { h }

First(E) = { g , ∈ } Follow(E) = { f , h }

First(F) = { f , ∈ } Follow(F) = Follow(D) = { h }
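The First and Follow sets of Example 2 can be cross-checked with a fixed-point computation that applies the three Follow rules repeatedly. This is a sketch under assumptions: the function names are invented here, and an empty list represents an ∈ alternative.

```python
EPS, END = "∈", "$"


def first_of(symbols, first):
    """First set of a sequence; terminals are their own First set."""
    out = set()
    for s in symbols:
        f = first.get(s, {s})
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def compute_first(g):
    first = {nt: set() for nt in g}
    changed = True
    while changed:
        changed = False
        for nt, alts in g.items():
            for alt in alts:
                new = first_of(alt, first) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(g, start, first):
    follow = {nt: set() for nt in g}
    follow[start].add(END)  # rule 1
    changed = True
    while changed:
        changed = False
        for a, alts in g.items():
            for alt in alts:
                for i, b in enumerate(alt):
                    if b not in g:
                        continue  # only non-terminals have Follow sets
                    beta_first = first_of(alt[i + 1:], first)
                    new = beta_first - {EPS}  # rule 3
                    if EPS in beta_first:  # rule 2, and rule 3 when β can vanish
                        new |= follow[a]
                    new -= follow[b]
                    if new:
                        follow[b] |= new
                        changed = True
    return follow


g = {
    "S": [["a", "B", "D", "h"]],
    "B": [["c", "C"]],
    "C": [["b", "C"], []],
    "D": [["E", "F"]],
    "E": [["g"], []],
    "F": [["f"], []],
}
first = compute_first(g)
follow = compute_follow(g, "S", first)
print(follow)
```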

Example 3-
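Calculate the first and follow functions for the given grammar and construct the parsing table-

E→E+T/T

T→TxF/F

F → (E) / id

The grammar is left recursive, so left recursion is eliminated first:

E → TE’

E’ → +TE’ / ∈

T → FT’

T’ → xFT’ / ∈

F → (E) / id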

First Functions- Follow Functions-

First(E) = First(T) = First(F) = { ( , id } Follow(E) = { $ , ) }


First(E’) = { + , ∈ } Follow(E’) = Follow(E) = { $ , ) }

First(T) = First(F) = { ( , id } Follow(T) = { + , $ , ) }

First(T’) = { x , ∈ } Follow(T’) = Follow(T) = { + , $ , ) }

First(F) = { ( , id } Follow(F) = { x , + , $ , ) }

Parsing Table:

        id          +             x             (           )          $
E       E → TE’                                 E → TE’
E’                  E’ → +TE’                               E’ → ε     E’ → ε
T       T → FT’                                 T → FT’
T’                  T’ → ε        T’ → xFT’                 T’ → ε     T’ → ε
F       F → id                                  F → (E)
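A table like this drives a simple stack-based predictive parser. The sketch below hard-codes the entries of the table, writing x for multiplication as in the grammar of this example, and only reports acceptance or rejection. The names `TABLE` and `ll1_parse` are assumptions made for illustration.

```python
# (non-terminal, lookahead) -> right-hand side to push; [] stands for ε.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "x"): ["x", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return True if the token list is accepted by the table above."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                return False  # blank table entry: syntax error
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == look:
            pos += 1  # matched a terminal (or the end marker $)
        else:
            return False
    return pos == len(tokens)


print(ll1_parse(["id", "+", "id", "x", "id"]))
```

Because each table cell holds at most one production, the parser never has to backtrack: one token of lookahead selects the production.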


2. Bottom Up Parsers

Build the parse tree from leaves to root.

2.1 Operator Precedence Parser/ Shift Reduce Parsers:

A parser that uses the precedence relations between operators to decide when to shift and when to reduce is called an Operator Precedence
Parser.
Ambiguous grammars are not allowed in any parser except the operator precedence parser.
A grammar G is an operator grammar if it has the following properties −
● No production has ϵ on its right-hand side.
● No production has two adjacent non-terminals (variables) on its right-hand side.

Example1 − Verify whether the following grammar is an operator grammar or not.


E → E A E | id
A→+|∗
Solution
No, it is not an operator grammar, as it does not satisfy property 2:
the production E → E A E contains adjacent non-terminals on its R.H.S.
We can convert it into an operator grammar by substituting the alternatives of A into E → E A E.
E → E + E |E * E | id.
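The two properties can be checked mechanically. A minimal sketch follows, assuming the grammar is stored as a dictionary from each non-terminal to its alternatives, with an empty list standing for ϵ. The function name is an assumption.

```python
def is_operator_grammar(grammar):
    """Check the two operator-grammar properties on every production."""
    nonterminals = set(grammar)
    for alternatives in grammar.values():
        for alt in alternatives:
            if not alt:
                return False  # property 1 violated: ϵ-production
            for left, right in zip(alt, alt[1:]):
                if left in nonterminals and right in nonterminals:
                    return False  # property 2 violated: adjacent non-terminals
    return True


original = {"E": [["E", "A", "E"], ["id"]], "A": [["+"], ["*"]]}
converted = {"E": [["E", "+", "E"], ["E", "*", "E"], ["id"]]}
print(is_operator_grammar(original), is_operator_grammar(converted))
```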
Defining Precedence Relations-
The precedence relations are defined using the following rules-
Rule-01:
● a ⋗ b means that terminal "a" has the higher precedence than terminal "b".
● a ⋖ b means that terminal "a" has the lower precedence than terminal "b".
● a ≐ b means that the terminal "a" and "b" both have same precedence.

Rule-02:
● An identifier is always given the highest precedence, higher than any other symbol.
● $ symbol is always given the lowest precedence.

Rule-03:
● If two operators have the same precedence, then we go by checking their associativity.

Example 1:

E → EAE | id

A→+|x

1. Construct the operator precedence parser and parse the string id + id x id.
2. Construct the operator precedence parser and parse the string id + id + id.

Solution for 1:

E → E + E | E x E | id

The terminal symbols in the grammar are { id, + , x , $ }

We construct the operator precedence table as-

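        id      +       x       $

id              ⋗       ⋗       ⋗

+       ⋖       ⋗       ⋖       ⋗

x       ⋖       ⋗       ⋗       ⋗

$       ⋖       ⋖       ⋖

(Each row is the terminal on top of the stack; each column is the next input symbol.)

Draw the stack of the operator precedence parser for the string id + id x id.

Draw the shift-reduce parsing of the string id + id + id.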
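The stack moves for both strings can be traced with a simplified sketch of the operator precedence algorithm. It keeps only terminals on the stack and leaves the non-terminal E implicit, so it does not verify that every operator has its operands. The names `PREC` and `operator_precedence_parse` are assumptions, and "<" and ">" stand for ⋖ and ⋗.

```python
# PREC[top_of_stack][next_input]; a missing entry is an error.
PREC = {
    "id": {"+": ">", "x": ">", "$": ">"},
    "+": {"id": "<", "+": ">", "x": "<", "$": ">"},
    "x": {"id": "<", "+": ">", "x": ">", "$": ">"},
    "$": {"id": "<", "+": "<", "x": "<"},
}


def operator_precedence_parse(tokens):
    """Return the terminals in the order they are reduced, or None on error."""
    tokens = list(tokens) + ["$"]
    stack = ["$"]  # terminals only; non-terminals are left implicit
    reductions = []
    pos = 0
    while True:
        top, look = stack[-1], tokens[pos]
        if top == "$" and look == "$":
            return reductions  # accept
        relation = PREC.get(top, {}).get(look)
        if relation == "<":
            stack.append(look)  # shift
            pos += 1
        elif relation == ">":
            # Reduce: pop until the new top is < the terminal just popped.
            while True:
                popped = stack.pop()
                if PREC[stack[-1]].get(popped) == "<":
                    break
            reductions.append(popped)
        else:
            return None  # no relation defined: syntax error


print(operator_precedence_parse(["id", "+", "id", "x", "id"]))
print(operator_precedence_parse(["id", "+", "id", "+", "id"]))
```

In the first string, x is reduced before +, which reflects its higher precedence. In the second, the left + is reduced first, which reflects left associativity.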

2.2 LR Parser

In LR(k) parsing, "L" stands for left-to-right scanning of the input, "R" stands for constructing a
rightmost derivation in reverse, and "k" is the number of lookahead input symbols used to
make parsing decisions.

LR parsing is divided into four types: LR(0) parsing, SLR parsing, CLR parsing and LALR
parsing.

LR(0) and SLR(1) use the canonical collection of LR(0) items, while LALR(1) and
CLR(1) use the canonical collection of LR(1) items.

Augment Grammar

The augmented grammar G` is generated by adding one more production, S` → S, to the given grammar
G, where S is the start symbol of G. It helps the parser identify when to stop parsing and announce the acceptance of the
input.

2. 2. 1 : LR(0)

An LR(0) item is a production of G with a dot at some position on the right side of the production.

LR(0) items are useful for indicating how much of the input has been scanned up to a given point
in the process of parsing.

In LR(0), we place the reduce move in the entire row of the state.

Example1

Given grammar:
1. S → AA
2. A → aA | b

Add Augment Production and insert '•' symbol at the first position for every production in G

1. S` → •S
2. S → •AA
3. A → •aA
4. A → •b
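The four items above are the closure of S` → •S. A minimal sketch of the closure and goto operations, and of building the canonical collection of LR(0) item sets for this grammar, is shown below. Items are represented as (production index, dot position) pairs. That representation and the function names are assumptions made for illustration.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add B -> •γ for every non-terminal B that appears right after a dot."""
    items = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in items:
                    items.add((i, 0))
                    work.append((i, 0))
    return frozenset(items)


def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def canonical_collection():
    start = closure({(0, 0)})
    states, work = [start], [start]
    while work:
        state = work.pop(0)
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for sym in sorted(symbols):
            nxt = goto(state, sym)
            if nxt not in states:
                states.append(nxt)
                work.append(nxt)
    return states


states = canonical_collection()
print(len(states), "item sets")
```

A goto on a terminal corresponds to a shift entry in the table, and a goto on a non-terminal corresponds to a goto entry.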

LR(0) Table

● If a state goes to some other state on a terminal, then it corresponds to a shift move.

● If a state goes to some other state on a variable, then it corresponds to a goto move.

● If a state contains a final item (an item with the dot at the right end), then write the reduce move
in the entire row of that state.
SLR(1) Table:

2.2.2: SLR (1) Parsing

SLR(1) refers to simple LR parsing. To construct the SLR(1) parsing table, we use the canonical
collection of LR(0) items. The difference is that in SLR(1) parsing, we place the reduce move only in
the columns of the terminals in the Follow set of the production's left-hand side.

Example 1:

E→E+T|T

T→T*F|F

F → id

Add Augment Production and insert '•' symbol at the first position for every production in G

E′ → .E
E → .E + T
E→ . T
T → .T * F
T→ .F
F → .id

First (F) = {id} Follow (E) = {+, $}

First (T) = {id} Follow (T) = {*, +, $}

First (E) = {id} Follow (F) = {*, +, $}

Parsing Table:
There is no conflict in the table, so this is an SLR(1) grammar.

Assignment Example 2: SLR(1)

Find the canonical collection of sets of LR (0) items for Grammar -

E→E+T|T

T→T*F|F

F → (E) | id

Step1− Construct the Augmented Grammar and number the productions


(0) E′ → E
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → (E)
(6) F → id
2.2.3: CLR(1) Parsing:

CLR refers to canonical LR. CLR parsing uses the canonical collection of LR(1) items to
build the CLR(1) parsing table. The CLR(1) parsing table has more states than
the SLR(1) parsing table.

In CLR(1), we place the reduce move only under the lookahead symbols of the item.

LR (1) item

An LR(1) item is an LR(0) item together with a lookahead symbol.

LR (1) item = LR (0) item + look ahead

The lookahead for the augmented production is always the $ symbol.

Example

CLR ( 1 ) Grammar

1. S → AA
2. A → aA
3. A → b

Add Augment Production, insert '•' symbol at the first position for every production in G and
also add the lookahead.

1. S` → •S, $
2. S → •AA, $
3. A → •aA, a/b
4. A → •b, a/b
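The lookaheads a/b above come from the closure rule for LR(1) items: for an item [A → α•Bβ, a], every production B → γ is added with each terminal of First(βa) as lookahead. A minimal sketch for this grammar is given below. It assumes a grammar without ε-productions, so First(βa) is First of the first symbol of β, or a itself when β is empty. The names and the (production index, dot, lookahead) representation are assumptions.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first(symbol, seen=frozenset()):
    """First set of one symbol; assumes the grammar has no ε-productions."""
    if symbol not in NONTERMINALS:
        return {symbol}
    result = set()
    for lhs, rhs in GRAMMAR:
        # Skip symbols already being expanded to avoid infinite recursion.
        if lhs == symbol and rhs[0] not in seen | {symbol}:
            result |= first(rhs[0], seen | {symbol})
    return result


def closure_lr1(items):
    items = set(items)
    work = list(items)
    while work:
        prod, dot, look = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot >= len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        # Lookaheads of the new items: First(βa), β being what follows B.
        lookaheads = first(rhs[dot + 1]) if dot + 1 < len(rhs) else {look}
        for i, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot]:
                for la in lookaheads:
                    item = (i, 0, la)
                    if item not in items:
                        items.add(item)
                        work.append(item)
    return items


initial = closure_lr1({(0, 0, "$")})
for prod, dot, la in sorted(initial):
    lhs, rhs = GRAMMAR[prod]
    print(lhs, "->", " ".join(rhs[:dot]) + "•" + " ".join(rhs[dot:]), ",", la)
```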
CLR (1) Parsing table:

2.2.4 : LALR (1) Parsing:

LALR refers to lookahead LR. To construct the LALR(1) parsing table, we use the canonical
collection of LR(1) items.
In LALR(1) parsing, the sets of LR(1) items which have the same productions and dot positions but different lookaheads
are combined to form a single set of items.

LALR(1) parsing is the same as CLR(1) parsing; the only difference is in the parsing table, which has fewer states.
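The merging step can be sketched as grouping item sets by their core, that is, by their items with the lookaheads removed. The two item sets below are written out by hand for the grammar S → AA, A → aA | b. They are the sets reached after reading a, once with lookaheads a/b and once with lookahead $. The function name and the tuple representation are assumptions made for illustration.

```python
def merge_by_core(states):
    """Merge LR(1) item sets whose items differ only in the lookahead."""
    merged = {}
    for state in states:
        # The core is the set of items with the lookahead dropped.
        core = frozenset((prod, dot) for prod, dot, _ in state)
        merged.setdefault(core, set()).update(state)
    return list(merged.values())


# Items are (production, dot position, lookahead).
state_ab = {
    ("A -> aA", 1, "a"), ("A -> aA", 1, "b"),
    ("A -> aA", 0, "a"), ("A -> aA", 0, "b"),
    ("A -> b", 0, "a"), ("A -> b", 0, "b"),
}
state_end = {("A -> aA", 1, "$"), ("A -> aA", 0, "$"), ("A -> b", 0, "$")}
other = {("S -> AA", 2, "$")}

lalr_states = merge_by_core([state_ab, state_end, other])
print(len(lalr_states), "states after merging")
```

The merged set carries the lookaheads a, b and $ for each of its items, which is how the LALR(1) table ends up smaller than the CLR(1) table.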

Example:
LALR ( 1 ) Grammar

1. S → AA
2. A → aA
3. A → b

Add Augment Production, insert '•' symbol at the first position for every production in G and
also add the look ahead.

1. S` → •S, $
2. S → •AA, $
3. A → •aA, a/b
4. A → •b, a/b