Unit 3 Class
A parser checks whether the input it is given conforms to the grammar and, if it does, builds a
data structure in the form of a parse tree or syntax tree.
Context-Free Grammar
A context-free grammar has four components:
G=(V,T,P,S)
● G is a grammar, which consists of a set of production rules. It is used to generate the
strings of a language.
● T is the finite set of terminal symbols. Terminals are denoted by lower case letters.
● V is the finite set of non-terminal symbols. Non-terminals are denoted by capital letters.
● P is a set of production rules. Each rule replaces a non-terminal symbol (on the left side
of the production) in a string with a string of terminals and/or non-terminals (on the right
side of the production).
● S is the start symbol, from which every derivation begins.
Example 1
Construct CFG for the language having any number of a's over the set ∑={a}
Regular Expression= a*
The production rules for this regular expression are as follows −
S → aS   (rule 1)
S → ε    (rule 2)
To derive the string "aaaaaa", begin with the start symbol and apply the rules as shown:
Sentential form    Rule applied
S                  (start symbol)
aS                 1
aaS                1
aaaS               1
aaaaS              1
aaaaaS             1
aaaaaaS            1
aaaaaa             2
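The derivation above can be replayed mechanically. The following is a minimal sketch, assuming the two rules are stored in a dictionary keyed by rule number; the names RULES and derive are illustrative and not part of the notes.

```python
# Productions of the grammar for a*: rule number -> (left side, right side).
RULES = {1: ("S", "aS"), 2: ("S", "")}  # "" stands for ε


def derive(start, rule_sequence):
    """Apply each rule to the leftmost occurrence of its left side.

    Returns the list of sentential forms, beginning with the start symbol.
    """
    forms = [start]
    current = start
    for number in rule_sequence:
        lhs, rhs = RULES[number]
        if lhs not in current:
            raise ValueError(f"rule {number} does not apply to {current!r}")
        current = current.replace(lhs, rhs, 1)  # rewrite one occurrence only
        forms.append(current)
    return forms


steps = derive("S", [1, 1, 1, 1, 1, 1, 2])
print(" => ".join(steps))
```

Six applications of rule 1 followed by one application of rule 2 end in the string aaaaaa, matching the table.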
Example 2:
Let a CFG G = (V, T, P, S) be,
V = {S},
T = {a, b},
Starting symbol = S,
P = {S → SS | aSb | ε}
One string that can be derived from the above CFG is "abaabb". Derive it step by step.
Solution:
S ⇒ SS ⇒ aSbS ⇒ abS ⇒ abaSb ⇒ abaaSbb ⇒ abaabb
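One way to check a derivation like this is to confirm that every step rewrites exactly one non-terminal using one production. The sketch below assumes single-character symbols and uses the illustrative names PRODUCTIONS and one_step.

```python
# Productions of Example 2; "" stands for ε.
PRODUCTIONS = {"S": ["SS", "aSb", ""]}


def one_step(form):
    """Return every sentential form reachable from `form` by rewriting one non-terminal."""
    results = set()
    for i, symbol in enumerate(form):
        for rhs in PRODUCTIONS.get(symbol, []):
            results.add(form[:i] + rhs + form[i + 1:])
    return results


derivation = ["S", "SS", "aSbS", "abS", "abaSb", "abaaSbb", "abaabb"]

# Each form must be reachable from the previous one in a single rewriting step.
valid = all(after in one_step(before)
            for before, after in zip(derivation, derivation[1:]))
print(valid)
```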
Types of Parsers in Compiler Design
Top-Down Parsing
A. Ambiguity in Grammar
A grammar is said to be ambiguous if there exists more than one leftmost derivation or more
than one rightmost derivation or more than one parse tree for the given input string.
Example 1:
Derive the string "id + id - id" from the given grammar and check whether the grammar G is
ambiguous or not.
1. E → E + E
2. E → E - E
3. E → id
First leftmost derivation:
E ⇒ E - E
  ⇒ E + E - E
  ⇒ id + E - E
  ⇒ id + id - E
  ⇒ id + id - id
Second leftmost derivation:
E ⇒ E + E
  ⇒ id + E
  ⇒ id + E - E
  ⇒ id + id - E
  ⇒ id + id - id
Since there are two leftmost derivations for the single string "id + id - id", the grammar G is
ambiguous.
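Because each parse tree corresponds to exactly one leftmost derivation, ambiguity for a given string can be checked by counting parse trees. The following is a rough sketch written only for this particular grammar (E → E + E | E - E | id); the function name count_parse_trees is an assumption made for illustration.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees of `tokens` under E -> E + E | E - E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        # Try every operator position as the root of the tree.
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "-"):
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "-", "id"]))  # 2
```

A count greater than 1 for any string is enough to show that the grammar is ambiguous.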
B. Left recursion
A production of a grammar is said to have left recursion if the leftmost symbol of its RHS is
the same non-terminal as its LHS. This form is also called immediate left recursion.
A → A α | β
The above production is left recursive because the non-terminal on its left side, A, occurs in the
first position on its right side.
Elimination of Left Recursion
Left recursion can be eliminated by introducing a new non-terminal A′ and rewriting A → A α | β as:
A → βA′
A′ → αA′ | ε
Example1 −
Consider the following left-recursive grammar.
E → E + T | T
T → T * F | F
F → (E) | id
Eliminate immediate left recursion from the grammar.
Solution:
First, comparing E → E + T | T
with A → A α | β gives A = E, α = + T, β = T.
Second, comparing T → T * F | F
with A → A α | β gives A = T, α = * F, β = F.
Third, the production F → (E) | id does not have any left recursion.
The resulting grammar is:
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
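The rewriting of A → A α | β into A → βA′ and A′ → αA′ | ε can be sketched in code. This is a minimal sketch that handles immediate left recursion only; the function name and the representation of right-hand sides as lists of symbols are assumptions for illustration, and the new non-terminal is written with a plain apostrophe (E') in place of E′.

```python
def eliminate_immediate_left_recursion(head, alternatives):
    """Remove immediate left recursion from the productions of `head`.

    `alternatives` is a list of right-hand sides, each a list of symbols.
    Returns a dict mapping each non-terminal to its new alternatives.
    """
    # The α parts: alternatives of the form head α, with the leading head removed.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    # The β parts: alternatives that do not start with head.
    betas = [alt for alt in alternatives if not alt or alt[0] != head]
    if not alphas:
        return {head: alternatives}  # nothing to eliminate
    new_head = head + "'"
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [["ε"]],
    }


print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
print(eliminate_immediate_left_recursion("T", [["T", "*", "F"], ["F"]]))
```

A production without left recursion, such as F → (E) | id, is returned unchanged.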
Example2 −
Eliminate the left recursion for the following Grammar.
S → a|^|(T)
T → T, S|S
Solution:
S → a | ^ | (T)
T → ST′
T′ → , ST′ | ε
C. Left Factoring
Grammar With Common Prefixes-
If the RHS of more than one production of the same non-terminal starts with the same symbol(s),
then such a grammar is called a Grammar With Common Prefixes.
Example-
A → αβ1 / αβ2 / αβ3 (Grammar with common prefixes)
Rule:
A → αβ1 / αβ2 is left factored into:
1) A → αA’
2) A’ → β1 / β2
Problem-01:
Do left factoring in the following grammar-
S → iEtS / iEtSeS / a
E→b
Solution:
S → iEtSS’ / a
S’ → eS / ε
E → b
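A single round of left factoring can be sketched as follows. This is an illustrative sketch, not a complete algorithm: it groups alternatives by their first symbol, factors out the longest common prefix of each group, and would have to be applied again to the new non-terminals if they still share prefixes. The name left_factor and the list-of-symbols representation are assumptions, and S' stands for S’.

```python
def left_factor(head, alternatives):
    """Perform one round of left factoring on the alternatives of `head`."""
    # Group alternatives by their first symbol.
    groups = {}
    for alt in alternatives:
        key = alt[0] if alt else None
        groups.setdefault(key, []).append(alt)

    result = {head: []}
    primes = 0
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[head].extend(group)  # no common prefix to factor out
            continue
        # Longest common prefix of all alternatives in the group.
        prefix = group[0]
        for alt in group[1:]:
            n = 0
            while n < min(len(prefix), len(alt)) and prefix[n] == alt[n]:
                n += 1
            prefix = prefix[:n]
        primes += 1
        new_head = head + "'" * primes
        result[head].append(prefix + [new_head])
        # What remains after the prefix; an empty remainder becomes ε.
        result[new_head] = [alt[len(prefix):] or ["ε"] for alt in group]
    return result


print(left_factor("S", [["i", "E", "t", "S"],
                        ["i", "E", "t", "S", "e", "S"],
                        ["a"]]))
```

For Problem-01 this yields S → iEtSS' / a and S' → ε / eS.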
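Problem-02:
Do left factoring in the following grammar-
A → aAB / aBc / aAc
Solution-
Step-01:
A → aA’
A’ → AB / Bc / Ac
Step-02 (the alternatives AB and Ac of A’ still share the common prefix A):
A → aA’
A’ → AD / Bc
D → B / c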
1.1 Backtracking:
1. Whenever a non-terminal is expanded for the first time, go with its first alternative and compare
it with the given input string.
2. If matching doesn't occur, go with the second alternative and compare it with the given input
string.
3. If matching is not found again, go with the next alternative, and so on.
4. If matching occurs for at least one alternative, the input string is parsed
successfully.
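The steps above can be sketched as a small backtracking recursive-descent recogniser. The grammar used here (S → cAd, A → ab | a) is a hypothetical example chosen for illustration and does not come from these notes; the function names are also assumptions. Python generators provide the backtracking: when a later symbol fails to match, the generator resumes and tries the next alternative.

```python
# Hypothetical grammar for illustration: S -> c A d, A -> a b | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def parse(symbol, text, pos):
    """Yield every input position reachable after matching `symbol` from `pos`."""
    if symbol not in GRAMMAR:  # terminal: must match the input directly
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in GRAMMAR[symbol]:  # try the alternatives in order
        yield from match_sequence(alternative, text, pos)


def match_sequence(symbols, text, pos):
    """Match a sequence of grammar symbols, backtracking over earlier choices."""
    if not symbols:
        yield pos
        return
    for middle in parse(symbols[0], text, pos):
        yield from match_sequence(symbols[1:], text, middle)


def accepts(text):
    """True if some choice of alternatives consumes the whole input."""
    return any(end == len(text) for end in parse("S", text, 0))


print(accepts("cad"), accepts("cabd"), accepts("cbd"))
```

For the input cad, the first alternative A → ab fails at b, so the parser backtracks and succeeds with A → a. This sketch would loop forever on a left-recursive grammar, which is one reason left recursion is eliminated before top-down parsing.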
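First Function-
First(α) is the set of terminal symbols that can begin the strings derived from α.
Example 1-
For a production rule of A whose alternatives begin with the terminals a, d and g, we have-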
First(A) = { a , d , g }
Example 2-
Consider the production rule-
S → ABCD
A→ b
B→ c
C→d
D→e
Then, we have-
First(S) = { b }
First(A) = { b }
First(B) = { c }
First(C) = { d }
First(D) = { e }
Example 3-
Consider the production rule-
S → ABCD
A→ b/ε
B→ c
C→d
D→e
Then, we have-
First(S) = { b,c }
First(A) = { b,ε }
First(B) = { c }
First(C) = { d }
First(D) = { e }
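The First sets in these examples can be computed by repeating the rules until nothing changes. The following is a minimal sketch, assuming a grammar is stored as a dictionary from each non-terminal to a list of alternatives (an empty list stands for ε); the name first_sets is illustrative.

```python
EPS = "ε"


def first_sets(grammar):
    """Compute First for every non-terminal by fixed-point iteration."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                before = len(first[head])
                all_nullable = True
                for symbol in alt:
                    if symbol not in grammar:  # terminal symbol
                        first[head].add(symbol)
                        all_nullable = False
                        break
                    first[head] |= first[symbol] - {EPS}
                    if EPS not in first[symbol]:
                        all_nullable = False
                        break
                if all_nullable:  # every symbol can vanish (or alt is ε)
                    first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first


# Grammar of Example 3: S -> ABCD, A -> b / ε, B -> c, C -> d, D -> e
example3 = {
    "S": [["A", "B", "C", "D"]],
    "A": [["b"], []],
    "B": [["c"]],
    "C": [["d"]],
    "D": [["e"]],
}
print(first_sets(example3))
```

Because A can derive ε, First(S) also picks up the terminal c from First(B).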
Follow Function-
Follow(A) is the set of terminal symbols that can appear immediately to the right of A in some
sentential form. The end marker $ is always in the Follow set of the start symbol.
Example 1-
Consider the production rule-
S → ABCD
A→ b/ε
B→ c
C→d
D→e
Solution-
Then, we have-
Follow(S) = { $ }
Follow(A) = { c }
Follow(B) = { d }
Follow(C) = { e }
Follow(D) = { $ }
Example 2-
Calculate the first and follow functions for the given grammar-
S → aBDh
B → cC
C → bC / ∈
D → EF
E→g/∈
F→f/∈
Solution-
The first and follow functions are as follows-
First(S) = { a }            Follow(S) = { $ }
First(B) = { c }            Follow(B) = { g , f , h }
First(C) = { b , ∈ }        Follow(C) = { g , f , h }
First(D) = { g , f , ∈ }    Follow(D) = { h }
First(E) = { g , ∈ }        Follow(E) = { f , h }
First(F) = { f , ∈ }        Follow(F) = { h }
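The Follow sets of this example can be checked with a short program. This is a sketch under the same assumptions as before (a grammar stored as a dictionary, an empty list standing for ∈); the function names are illustrative.

```python
EPS, END = "∈", "$"

GRAMMAR = {
    "S": [["a", "B", "D", "h"]],
    "B": [["c", "C"]],
    "C": [["b", "C"], []],
    "D": [["E", "F"]],
    "E": [["g"], []],
    "F": [["f"], []],
}


def first_of_sequence(symbols, first):
    """First of a string of symbols, given First of each non-terminal."""
    result = set()
    for symbol in symbols:
        symbol_first = first.get(symbol, {symbol})  # a terminal's First is itself
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)  # every symbol can derive ∈ (or the string is empty)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # $ always follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                for i, symbol in enumerate(alt):
                    if symbol not in grammar:
                        continue  # Follow is defined for non-terminals only
                    rest = first_of_sequence(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # the rest can vanish: inherit Follow(head)
                        new |= follow[head]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


print(compute_follow(GRAMMAR, "S"))
```

Follow(E) shows both rules at work: f comes from First(F), and h comes from Follow(D) because F can derive ∈.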
Example 3-
Calculate the first and follow functions for the given grammar and construct parsing table-
E→E+T/T
T→TxF/F
F → (E) / id
First(F) = { ( , id } Follow(F) = { x , + , $ , ) }
Parsing Table (column headings):
id    +    x    (    )    $
A parser that reads and understands operator precedence relations is called an Operator Precedence
Parser.
Ambiguous grammars are not allowed in any parser except the operator precedence parser.
A grammar G is an Operator Grammar if it has the following properties −
● No production contains ϵ on its right side.
● No production has two adjacent non-terminals (variables) on its right side.
Rule-02:
● An identifier is always given higher precedence than any other symbol.
● $ symbol is always given the lowest precedence.
Rule-03:
● If two operators have the same precedence, then we go by checking their associativity.
Example 1:
E → EAE | id
A→+|x
1. Construct the operator precedence parser and parse the string id + id x id.
2. Construct the operator precedence parser and parse the string id + id + id.
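Solution for 1:
The given grammar has two adjacent non-terminals in E → EAE, so A is substituted to obtain an
operator grammar:
E → E + E | E x E | id
Operator precedence table (row = topmost terminal on the stack, column = next input symbol):
      id    +     x     $
id          ⋗     ⋗     ⋗
+     ⋖     ⋗     ⋖     ⋗
x     ⋖     ⋗     ⋗     ⋗
$     ⋖     ⋖     ⋖
Trace the stack of the operator precedence parser for the string id + id x id.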
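The stack trace can be explored with a small table-driven sketch. It is a simplification made for illustration: the stack holds terminals only, so it does not check that operands are present, and it returns the terminals in the order in which their handles are reduced. The names PRECEDENCE and operator_precedence_parse are assumptions.

```python
# Precedence relations: PRECEDENCE[top of stack][next input symbol].
PRECEDENCE = {
    "id": {"+": "⋗", "x": "⋗", "$": "⋗"},
    "+": {"id": "⋖", "+": "⋗", "x": "⋖", "$": "⋗"},
    "x": {"id": "⋖", "+": "⋗", "x": "⋗", "$": "⋗"},
    "$": {"id": "⋖", "+": "⋖", "x": "⋖"},
}


def operator_precedence_parse(tokens):
    """Return the terminals in the order their handles are reduced."""
    stack = ["$"]
    remaining = list(tokens) + ["$"]
    position = 0
    reduced = []
    while True:
        top, lookahead = stack[-1], remaining[position]
        if top == "$" and lookahead == "$":
            return reduced  # accept
        relation = PRECEDENCE[top].get(lookahead)
        if relation == "⋖":  # shift the input symbol
            stack.append(lookahead)
            position += 1
        elif relation == "⋗":  # reduce: pop back to the matching ⋖
            while True:
                popped = stack.pop()
                reduced.append(popped)
                if PRECEDENCE[stack[-1]].get(popped) == "⋖":
                    break
        else:
            raise SyntaxError(f"no relation between {top!r} and {lookahead!r}")


print(operator_precedence_parse(["id", "+", "id", "x", "id"]))
print(operator_precedence_parse(["id", "+", "id", "+", "id"]))
```

In the first string, x is reduced before + because + ⋖ x, so multiplication binds tighter. In the second string, the left + is reduced first because + ⋗ +, which makes + left associative.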
2.2 LR Parser
In LR(k) parsing, "L" stands for left-to-right scanning of the input, "R" stands for constructing a
rightmost derivation in reverse, and "k" is the number of lookahead input symbols used to
make parsing decisions.
LR parsing is divided into four types: LR(0) parsing, SLR parsing, CLR parsing and LALR
parsing.
LR(0) and SLR(1) use the canonical collection of LR(0) items, while LALR(1) and
CLR(1) use the canonical collection of LR(1) items.
Augment Grammar
The augmented grammar G` is obtained by adding one more production, S` → S, to the given grammar
G, where S is the start symbol of G. It helps the parser to identify when to stop parsing and
announce the acceptance of the input.
2. 2. 1 : LR(0)
An LR(0) item is a production of G with a dot at some position on the right side of the production.
LR(0) items indicate how much of the input has been scanned up to a given point
in the process of parsing.
Example1
Given grammar:
1. S → AA
2. A → aA | b
Add Augment Production and insert '•' symbol at the first position for every production in G
1. S` → •S
2. S → •AA
3. A → •aA
4. A → •b
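The four items above form the closure of S` → •S, and the remaining states of the canonical collection follow from the goto operation. The sketch below illustrates both; it is not a full table generator. Encoding an item as a (production index, dot position) pair is an assumption made for illustration, and S' stands for S`.

```python
# Augmented grammar: production index -> (head, body).
GRAMMAR = [
    ("S'", ("S",)),  # augmented production
    ("S", ("A", "A")),
    ("A", ("a", "A")),
    ("A", ("b",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}


def closure(items):
    """Add B -> •γ for every non-terminal B that appears right after a dot."""
    result = set(items)
    pending = list(items)
    while pending:
        prod, dot = pending.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for index, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    pending.append((index, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def canonical_collection():
    start = closure({(0, 0)})
    states, pending = [start], [start]
    while pending:
        state = pending.pop(0)
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
                pending.append(target)
    return states


print(sorted(closure({(0, 0)})))
print(len(canonical_collection()))
```

For this grammar the collection has seven states, usually labelled I0 to I6.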
LR(0) Table
● If a state goes to another state on a terminal, the move is a shift move.
● If a state goes to another state on a variable, the move is a goto move.
● If a state contains a final item (an item with the dot at the right end), write the reduce move
in every terminal column of that state's row.
SLR(1) Table:
SLR(1) refers to simple LR parsing. To construct the SLR(1) parsing table, we use the canonical
collection of LR(0) items. The difference from LR(0) is that in SLR(1) parsing we place a reduce
move only in the columns of the terminals in the Follow set of the production's left-hand side.
Example 1:
E→E+T|T
T→T*F|F
F → id
Add the augmented production and insert the '•' symbol at the first position of every production in G
E` → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •id
Parsing Table:
The parsing table has no conflicts, so this is an SLR(1) grammar.
E→E+T|T
T→T*F|F
F → (E) | id
CLR refers to canonical LR. CLR parsing uses the canonical collection of LR(1) items to
build the CLR(1) parsing table. The CLR(1) parsing table has more states than
the SLR(1) parsing table.
In CLR(1), we place a reduce move only in the columns of the lookahead symbols of the final item.
LR (1) item
An LR(1) item is an LR(0) item together with a lookahead symbol.
The lookahead for the augmented production is always the $ symbol.
Example
CLR ( 1 ) Grammar
1. S → AA
2. A → aA
3. A → b
Add Augment Production, insert '•' symbol at the first position for every production in G and
also add the lookahead.
1. S` → •S, $
2. S → •AA, $
3. A → •aA, a/b
4. A → •b, a/b
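The lookaheads a/b in items 3 and 4 come from the closure rule for LR(1) items: for an item A → α•Bβ, a the new items for B carry the lookaheads First(βa). The sketch below illustrates this under simplifying assumptions: items are (production index, dot position, lookahead) triples, the First sets are written out by hand, and no symbol of this grammar derives ε, so First(βa) is just First of the symbol after B, or the lookahead itself when nothing follows B. S' stands for S`.

```python
GRAMMAR = [
    ("S'", ("S",)),  # augmented production
    ("S", ("A", "A")),
    ("A", ("a", "A")),
    ("A", ("b",)),
]
NONTERMINALS = {"S'", "S", "A"}
# Hand-written First sets (assumption: valid because nothing derives ε here).
FIRST = {"S": {"a", "b"}, "A": {"a", "b"}, "a": {"a"}, "b": {"b"}, "$": {"$"}}


def closure(items):
    result = set(items)
    pending = list(items)
    while pending:
        prod, dot, lookahead = pending.pop()
        body = GRAMMAR[prod][1]
        if dot >= len(body) or body[dot] not in NONTERMINALS:
            continue
        # Symbol that determines the new lookaheads: First(β a).
        following = body[dot + 1] if dot + 1 < len(body) else lookahead
        for index, (head, _) in enumerate(GRAMMAR):
            if head != body[dot]:
                continue
            for terminal in FIRST[following]:
                item = (index, 0, terminal)
                if item not in result:
                    result.add(item)
                    pending.append(item)
    return frozenset(result)


def goto(items, symbol):
    moved = {(p, d + 1, la) for p, d, la in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def canonical_collection():
    start = closure({(0, 0, "$")})
    states, pending = [start], [start]
    while pending:
        state = pending.pop(0)
        symbols = {GRAMMAR[p][1][d] for p, d, _ in state if d < len(GRAMMAR[p][1])}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
                pending.append(target)
    return states


print(sorted(closure({(0, 0, "$")})))
print(len(canonical_collection()))
```

The collection has ten states for this grammar, compared with seven for LR(0), which illustrates why CLR(1) tables are larger.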
CLR (1) Parsing table:
LALR refers to lookahead LR. To construct the LALR(1) parsing table, we use the canonical
collection of LR(1) items.
In LALR(1) parsing, the sets of LR(1) items that have the same productions and dot positions but
different lookaheads are combined to form a single set of items.
LALR(1) parsing is the same as CLR(1) parsing; the only difference is in the parsing table.
Example:
LALR ( 1 ) Grammar
1. S → AA
2. A → aA
3. A → b
Add Augment Production, insert '•' symbol at the first position for every production in G and
also add the look ahead.
1. S` → •S, $
2. S → •AA, $
3. A → •aA, a/b
4. A → •b, a/b
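Combining item sets that differ only in their lookaheads can be sketched as a grouping by core, where the core of a set is its items with the lookaheads removed. The three states below are written out by hand for illustration; the labels I4, I5 and I7 follow a common textbook numbering for this grammar and are an assumption, as are the function names.

```python
def core(state):
    """The items of a state with their lookaheads removed."""
    return frozenset((production, dot) for production, dot, _ in state)


def merge_by_core(states):
    """Union all LR(1) states that share the same core."""
    merged = {}
    for state in states:
        merged.setdefault(core(state), set()).update(state)
    return [frozenset(items) for items in merged.values()]


# Items are (production, dot position, lookahead).
I4 = frozenset({("A -> b", 1, "a"), ("A -> b", 1, "b")})  # A -> b•, a/b
I5 = frozenset({("S -> A A", 2, "$")})                    # S -> AA•, $
I7 = frozenset({("A -> b", 1, "$")})                      # A -> b•, $

lalr_states = merge_by_core([I4, I5, I7])
print(len(lalr_states))
for state in lalr_states:
    print(sorted(state))
```

I4 and I7 share the core A → b• and are merged into one state with lookaheads a/b/$, while I5 is left as it is.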