Unit - 3 Mid - 1
CD : COMPILER DESIGN (01CE0714)
Unit 3 : Parsing

Scanner – Parser Interaction
For well-formed programs, the parser constructs a
parse tree and passes it to the rest of the compiler for
further processing.
If a compiler had to process only correct programs,
its design and implementation would be greatly
simplified.
But programmers frequently write incorrect
programs, and a good compiler should assist the
programmer in identifying and locating errors.
Syntax Error Handling
Programs can contain errors at many different levels. For example, errors can be:
Lexical : such as misspelling an identifier, keyword, or operator
Syntactic : such as an arithmetic expression with unbalanced parentheses
Semantic : such as an operator applied to incompatible operands
Logical : such as an infinitely recursive call
The error handler in a parser has simple-to-state goals:
It should report the presence of errors clearly and accurately.
It should recover from each error quickly enough to be able to detect subsequent errors.
It should not significantly slow down the processing of correct programs.
Parse Tree
A parse tree is a graphical representation of a derivation; its nodes are labelled with symbols, which can be terminals as well as non-terminals.
The root of the parse tree is the start symbol of the grammar.
A parse tree follows the precedence of operators: the deepest sub-tree is evaluated first, so the operator in a parent node has lower precedence than the operator in its sub-tree.
Example :- (parse tree diagram)

Parse Tree v/s Syntax Tree
(diagram comparing the syntax tree and the parse tree for the same expression)
Classification of Grammar
Grammars are classified on the basis of the productions they use (Chomsky, 1963).
Given below are the classes of grammar, where each class has its own characteristics and limitations.

1. Type-0 Grammar :- Recursively Enumerable Grammar
These grammars are known as phrase structure grammars.
Their productions are of the form α → β, where α and β are strings of terminal and non-terminal symbols.
This type of grammar is not relevant to the specification of programming languages.
2. Type-1 Grammar :- Context Sensitive Grammar
These grammars have rules of the form αAβ → αγβ, with A a nonterminal and α, β, γ strings of terminal and nonterminal symbols. The strings α and β may be empty, but γ must not be empty.
Eg :- AB → CDB
Ab → Cdb
A → b
3. Type-2 Grammar :- Context Free Grammar
These are defined by rules of the form A → γ, with A a nonterminal and γ a string of terminal and nonterminal symbols. Such a rule can be applied independently of the context of A, so this is a Context Free Grammar (CFG). CFGs are ideally suited for programming language specification.
Eg :- A → aBc
4. Type-3 Grammar :- Regular Grammar
It restricts its rules to a single nonterminal on the left-hand side and a right-hand side consisting of a single terminal, possibly followed by a single nonterminal. The rule S → ε is also allowed if S does not appear on the right side of any rule.
Eg :- A → ε
A → a
A → aB
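As a rough illustration (not part of the slides), the Python sketch below classifies a single production by Chomsky type. It assumes upper-case letters are non-terminals and everything else is a terminal, uses "" for ε, and applies the non-contracting (monotone) characterization of Type-1, which has the same generative power as the αAβ → αγβ form; the function name classify is my own.

# Sketch only: classify one production lhs -> rhs by Chomsky type.
# Assumptions: upper-case letters are non-terminals, "" is epsilon.

def is_nonterminal(ch):
    return ch.isupper()

def classify(lhs, rhs):
    """Return the most restrictive type (3, 2, 1 or 0) that admits lhs -> rhs."""
    if len(lhs) == 1 and is_nonterminal(lhs):
        # Type-3 (regular): one terminal, optionally followed by one non-terminal
        # (epsilon is accepted here without checking the extra condition on S).
        if rhs == "" or (len(rhs) == 1 and not is_nonterminal(rhs[0])):
            return 3
        if len(rhs) == 2 and not is_nonterminal(rhs[0]) and is_nonterminal(rhs[1]):
            return 3
        return 2          # Type-2 (context free): single non-terminal on the left
    if any(is_nonterminal(c) for c in lhs) and 0 < len(lhs) <= len(rhs):
        return 1          # Type-1 (context sensitive), non-contracting criterion
    return 0              # Type-0 (unrestricted)

print(classify("A", "aB"))    # 3
print(classify("A", "aBc"))   # 2
print(classify("AB", "CDB"))  # 1

The type of a whole grammar is then the minimum type over all of its productions.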
Derivation
Let production P1 of grammar G be of the form
P1 : A ::= α
and let β be a string such that β = γAθ. Then replacement of A by α in string β constitutes a derivation according to production P1.
• Example
<Sentence> ::= <Noun Phrase> <Verb Phrase>
<Noun Phrase> ::= <Article> <Noun>
<Verb Phrase> ::= <Verb><Noun Phrase>
<Article> ::= a | an | the
<Noun> ::= boy | apple
<Verb> ::= ate
The following strings are sentential forms:
<Sentence>
<Noun Phrase> <Verb Phrase>
the boy <Verb Phrase>
the boy <Verb> <Noun Phrase>
the boy ate <Noun Phrase>
the boy ate an apple
Example string : id + id * id (derivation / parse tree diagram)
The process of deriving a string is called derivation, and the graphical representation of a derivation is called a derivation tree or parse tree.
A derivation is a sequence of production rules used to obtain the input string.
During parsing we take two decisions:
1) Deciding which non-terminal is to be replaced.
2) Deciding the production rule by which the non-terminal will be replaced.
For this we have:
1) Leftmost derivation
2) Rightmost derivation
Leftmost Derivation
A derivation of a string in a grammar G is a leftmost derivation if at every step the leftmost non-terminal is replaced.
Example:
Productions:
S → S + S
S → S * S
S → id
String :- id + id * id
S ⇒ S * S
⇒ S + S * S
⇒ id + S * S
⇒ id + id * S
⇒ id + id * id
Rightmost Derivation
A derivation of a string in a grammar G is a rightmost derivation if at every step the rightmost non-terminal is replaced.
Example:
Productions:
S → S + S
S → S * S
S → id
String :- id + id * id
S ⇒ S + S
⇒ S + S * S
⇒ S + S * id
⇒ S + id * id
⇒ id + id * id
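The two derivations above can be replayed mechanically. The Python sketch below (the helper derive and its compact string representation are assumptions of mine, not from the slides) replaces the leftmost or rightmost non-terminal at each step; non-terminals are single upper-case letters.

# Sketch only: replay the leftmost and rightmost derivations of id+id*id.

def derive(start, steps, leftmost=True):
    """Apply each (nonterminal, replacement) step to the leftmost or
    rightmost occurrence of that nonterminal and print every sentential form."""
    form = start
    print(form)
    for nt, repl in steps:
        i = form.find(nt) if leftmost else form.rfind(nt)
        form = form[:i] + repl + form[i + 1:]
        print("=>", form)

# Leftmost:  S => S*S => S+S*S => id+S*S => id+id*S => id+id*id
derive("S", [("S", "S*S"), ("S", "S+S"), ("S", "id"), ("S", "id"), ("S", "id")])

# Rightmost: S => S+S => S+S*S => S+S*id => S+id*id => id+id*id
derive("S", [("S", "S+S"), ("S", "S*S"), ("S", "id"), ("S", "id"), ("S", "id")],
       leftmost=False)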
Reduction
Let production P1 of grammar G be of the form
P1 : A ::= α
and let σ be a string such that σ = γαθ. Then replacement of α by A in string σ constitutes a reduction according to production P1.
Step String
0 the boy ate an apple
1 <Article> boy ate an apple
2 <Article> <Noun> ate an apple
3 <Article> <Noun> <Verb> an apple
4 <Article> <Noun> <Verb> <Article> apple
5 <Article> <Noun> <Verb> <Article> <Noun>
6 <Noun Phrase> <Verb> <Article> <Noun>
7 <Noun Phrase> <Verb> <Noun Phrase>
8 <Noun Phrase> <Verb Phrase>
9 <Sentence>
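The table above can be replayed with plain string replacement. The Python sketch below (the list of handles mirrors the table; the flat string representation is an assumption of mine, not from the slides) reduces the sentence step by step until only the start symbol remains.

# Sketch only: replay the reduction of "the boy ate an apple" to <Sentence>.

steps = [
    ("the", "<Article>"), ("boy", "<Noun>"), ("ate", "<Verb>"),
    ("an", "<Article>"), ("apple", "<Noun>"),
    ("<Article> <Noun>", "<Noun Phrase>"),          # reduce the first Article Noun
    ("<Article> <Noun>", "<Noun Phrase>"),          # reduce the remaining one
    ("<Verb> <Noun Phrase>", "<Verb Phrase>"),
    ("<Noun Phrase> <Verb Phrase>", "<Sentence>"),
]

form = "the boy ate an apple"
print(form)
for handle, lhs in steps:
    form = form.replace(handle, lhs, 1)             # replace first occurrence only
    print("=>", form)
# The input is accepted because the final form is the start symbol <Sentence>.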
Ambiguous Grammar
• Ambiguity implies the possibility of different interpretations of a source string.
• Existence of ambiguity at the level of the syntactic structure of a string means that more than one parse tree can be built for the string, so the string can have more than one meaning associated with it.
• A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
Example:
E → Id | E + E | E * E
Id → a | b | c
Both parse trees derive the same string a + b * c: one groups it as a + (b * c), the other as (a + b) * c. (two parse tree diagrams)
Another example :
E → E + E | E * E | id
By parse tree :-
Parse tree-1 : E expands to E + E, with the right E expanded to E * E, giving id + (id * id).
Parse tree-2 : E expands to E * E, with the left E expanded to E + E, giving (id + id) * id.
Both parse trees derive the same string id + id * id, so the grammar is ambiguous. (two parse tree diagrams)
Example :- Prove that the given grammar is ambiguous:
E → a | Ea | bEE | EEb | EbE
Ans :-
Assume the string baaab.
Leftmost derivation-1 :
E ⇒ bEE
⇒ baE
⇒ baEEb
⇒ baaEb
⇒ baaab
OR
Leftmost derivation-2 :
E ⇒ EEb
⇒ bEEEb
⇒ baEEb
⇒ baaEb
⇒ baaab
Two distinct leftmost derivations yield the same string baaab, hence the grammar is ambiguous.
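To double-check such a proof, one can search for leftmost derivations mechanically. The Python sketch below is a brute-force search of my own (not from the slides; 'E' is the only non-terminal): it enumerates leftmost derivations of baaab and reports how many distinct ones exist, and finding two or more confirms ambiguity.

# Sketch only: brute-force search for leftmost derivations of "baaab"
# in the grammar E -> a | Ea | bEE | EEb | EbE.

PRODUCTIONS = ["a", "Ea", "bEE", "EEb", "EbE"]
TARGET = "baaab"

def search(form, history, found):
    if len(form) > len(TARGET):                 # too long, prune
        return
    if "E" not in form:
        if form == TARGET:
            found.append(history + [form])
        return
    i = form.index("E")                         # leftmost non-terminal
    if form[:i] != TARGET[:i]:                  # terminal prefix must match, prune
        return
    for rhs in PRODUCTIONS:
        search(form[:i] + rhs + form[i + 1:], history + [form], found)

found = []
search("E", [], found)
print(len(found), "distinct leftmost derivations of", TARGET)
for derivation in found[:2]:                    # show two of them
    print(" => ".join(derivation))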
Left Recursion
In a leftmost derivation, scanning the input from left to right, grammars with productions of the form A → A x may cause endless recursion.
Such grammars are called left-recursive, and they must be transformed if we want to use a top-down parser.
Example:
E → Ea | E + b | c
Algorithm
Assign an ordering A1, …, An to the non-terminals of the grammar;
for i = 1 to n do
begin
    for j = 1 to i-1 do
    begin
        replace each production of the form Ai → Aj γ
        by the productions Ai → δ1γ | δ2γ | …… | δkγ,
        where Aj → δ1 | δ2 | …… | δk are all the current Aj-productions.
    end
    eliminate the immediate left recursion among the Ai-productions.
end
• There are three types of left recursion:
direct (A → A x)
indirect (A → B C, B → A α)
hidden (A → B A, B → ε)
To eliminate direct left recursion, replace
A → A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn
with
A → β1 A’ | β2 A’ | ... | βn A’
A’ → α1 A’ | α2 A’ | ... | αm A’ | ε
Example
1. E → E + T | T
T → T * F | F
F → (E) | id
Ans.
A → Aα | β is replaced with
A → βA’
A’ → αA’ | ε
So,
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
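The transformation used in the answer above can be written down directly. The Python sketch below uses a representation and function name of my own (non-terminals are a single letter, '' stands for ε): it applies the rule A → Aα | β ⇒ A → βA’, A’ → αA’ | ε to one non-terminal.

# Sketch only: eliminate immediate left recursion for one non-terminal.
# '' stands for epsilon; alternatives are plain strings.

def eliminate_immediate(nt, alternatives):
    alphas = [alt[len(nt):] for alt in alternatives if alt.startswith(nt)]
    betas  = [alt for alt in alternatives if not alt.startswith(nt)]
    if not alphas:                       # no immediate left recursion
        return {nt: alternatives}
    new = nt + "'"
    return {
        nt:  [beta + new for beta in betas],            # A  -> beta A'
        new: [alpha + new for alpha in alphas] + [""],  # A' -> alpha A' | epsilon
    }

# E -> E+T | T   gives   E -> TE',  E' -> +TE' | epsilon
print(eliminate_immediate("E", ["E+T", "T"]))
# {'E': ["TE'"], "E'": ["+TE'", '']}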
1. A → Aad | Afg | b
Ans :-
Remove left recursion:
A → bA’
A’ → adA’ | fgA’ | ε
2. A → Acd | Ab | jk
B → Bh | n
Ans :-
Remove left recursion:
A → jkA’
A’ → cdA’ | bA’ | ε
B → nB’
B’ → hB’ | ε
3. E → Aa | b
A → Ac | Ed | ε
Ans :-
Replace E in the A-productions:
E → Aa | b
A → Ac | Aad | bd | ε
Remove left recursion:
E → Aa | b
A → bdA’ | A’
A’ → cA’ | adA’ | ε
Left Factoring
Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive parsing.
Consider,
S → if E then S else S | if E then S
Which of the two productions should we use to expand non-terminal S when the next token is if?
We can solve this problem by factoring out the common part in these rules. This way, we are postponing the decision about which rule to choose until we have more information (namely, whether there is an else or not).
This is called left factoring.
Algorithm
For each non-terminal A, find the longest prefix α common to two or more of its alternatives.
If α ≠ ε, i.e. there is a non-trivial common prefix, replace all the A-productions
A → αβ1 | αβ2 | ….. | αβn | γ,
where γ represents all the alternatives which do not start with α, by
A → αA’ | γ
A’ → β1 | β2 | …… | βn
Here, A’ is a new non-terminal. Repeatedly apply this transformation until no two alternatives for a non-terminal have a common prefix.
In short,
A → αβ1 | αβ2 | ... | αβn | γ
becomes
A → αA” | γ
A” → β1 | β2 | ... | βn
Example
E → T + E | T
T → V * T | V
V → id
Ans.
E → TE’
E’ → +E | ε
T → VT’
T’ → *T | ε
V → id
1. S → cdLk | cdk | cd
L → mn | ε
Ans.
S → cdS’
S’ → Lk | k | ε
L → mn | ε
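One round of the left-factoring transformation can be sketched the same way. The Python function below is an assumption of mine (not the slides' algorithm verbatim): it groups alternatives by their first symbol, pulls out the longest common prefix of the first group that shares one, and introduces a new non-terminal; '' stands for ε.

# Sketch only: one round of left factoring for a single non-terminal.

from itertools import groupby
from os.path import commonprefix

def left_factor(nt, alternatives):
    alternatives = sorted(alternatives)
    for _, group in groupby(alternatives, key=lambda a: a[:1]):
        group = list(group)
        prefix = commonprefix(group)               # longest common prefix of the group
        if len(group) >= 2 and prefix:
            new = nt + "'"
            rest = [a for a in alternatives if a not in group]
            return {
                nt:  [prefix + new] + rest,              # A  -> alpha A' | gamma
                new: [a[len(prefix):] for a in group],   # A' -> beta1 | ... (may include '')
            }
    return {nt: alternatives}                      # nothing to factor

# S -> cdLk | cdk | cd   gives   S -> cdS',  S' -> Lk | k | epsilon
print(left_factor("S", ["cdLk", "cdk", "cd"]))
# {'S': ["cdS'"], "S'": ['', 'Lk', 'k']}

Repeating the call until nothing changes gives the "repeatedly apply" step of the algorithm above.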
Compute First
Example :
A → bcg | gh
FIRST(A) = {b, g}
Example :
A → Bcd | gh
B → m | ε
FIRST(A) = {m, c, g}
FIRST(B) = {m, ε}
Example :
A → BCD | Cx
B → b | ε
C → c | ε
D → d | ε
FIRST(A) = {b, c, d, x, ε}
FIRST(B) = {b, ε}
FIRST(C) = {c, ε}
FIRST(D) = {d, ε}
Example :
S → PQr | s
P → Abc | ε
Q → d | ε
A → a | ε
Example :
A → mn | Xy | Z
X → x | ε
Z → ε
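The FIRST sets in these examples can be computed by the usual fixed-point iteration. The Python sketch below uses a representation of my own (single-character symbols, a dict from each non-terminal to its alternatives, '' for ε); it is not taken from the slides.

# Sketch only: compute FIRST sets by iterating to a fixed point.
# Symbols are single characters; '' stands for epsilon.

def first_sets(grammar):
    FIRST = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                before = len(FIRST[nt])
                if alt == "":
                    FIRST[nt].add("")                 # epsilon production
                for sym in alt:
                    if sym not in grammar:            # terminal: add it and stop
                        FIRST[nt].add(sym)
                        break
                    FIRST[nt] |= FIRST[sym] - {""}    # non-terminal: add its FIRST
                    if "" not in FIRST[sym]:
                        break
                else:
                    if alt:                           # every symbol was nullable
                        FIRST[nt].add("")
                changed |= len(FIRST[nt]) != before
    return FIRST

# A -> BCD | Cx,  B -> b | eps,  C -> c | eps,  D -> d | eps
grammar = {"A": ["BCD", "Cx"], "B": ["b", ""], "C": ["c", ""], "D": ["d", ""]}
print(first_sets(grammar))
# expected: FIRST(A) = {b, c, d, x, eps}, FIRST(B) = {b, eps},
#           FIRST(C) = {c, eps}, FIRST(D) = {d, eps}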
Compute Follow
Example :
A → bcg | gh
FOLLOW(A) = {$}
Example :
A → Bcd | gh
B → mA | ε
FOLLOW(A) = {$, c}
FOLLOW(B) = {c}
Example :
A → BCD | Cx
B → b | ε
C → c | ε
D → d | ε
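FOLLOW sets can be computed the same way, on top of FIRST. The Python sketch below reuses the first_sets helper from the FIRST sketch above (so it is not self-standing) and keeps the same assumed representation; '$' marks the end of input.

# Sketch only: compute FOLLOW sets; requires first_sets() from the FIRST sketch.

def follow_sets(grammar, start):
    FIRST = first_sets(grammar)
    FOLLOW = {nt: set() for nt in grammar}
    FOLLOW[start].add("$")                            # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:            # only non-terminals get FOLLOW
                        continue
                    before = len(FOLLOW[sym])
                    tail_nullable = True
                    for nxt in alt[i + 1:]:           # FIRST of whatever follows sym
                        if nxt not in grammar:
                            FOLLOW[sym].add(nxt)
                            tail_nullable = False
                            break
                        FOLLOW[sym] |= FIRST[nxt] - {""}
                        if "" not in FIRST[nxt]:
                            tail_nullable = False
                            break
                    if tail_nullable:                 # sym can end the production
                        FOLLOW[sym] |= FOLLOW[nt]
                    changed |= len(FOLLOW[sym]) != before
    return FOLLOW

# A -> Bcd | gh,  B -> mA | eps   gives   FOLLOW(A) = {$, c}, FOLLOW(B) = {c}
grammar = {"A": ["Bcd", "gh"], "B": ["mA", ""]}
print(follow_sets(grammar, "A"))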
Parser types : LL(1), SLR(1), LALR(1)
End of the Mid-1 syllabus.