CD UNIT-II Syntax Analysis
Syntax Analysis
This is the 2nd phase of the compiler. It checks the syntax of the input and constructs the
syntax/parse tree.
The input of the parser is a stream of tokens and its output is a parse/syntax tree.
Constructing Parse Tree: The derivation tree constructed for a given input string by
using the productions of the grammar is called a parse tree.
Consider the grammar
E → E + E|E * E
E → id
The parse tree for the string
ω = id + id * id is
Panic Mode: On discovering an error, the parser discards input symbols one at a time
until one of a designated set of synchronizing tokens is found.
Phrase Level: The parser may perform a local correction on the remaining input, for
example by replacing a prefix of the remaining input with a string that lets parsing continue.
Error Productions: The grammar is augmented with productions that generate common
erroneous constructs, so that the parser can issue an appropriate error message when such a
construct is recognized in the input.
Global Correction: There are algorithms that choose a minimal sequence of changes
to the input to obtain a globally least-cost correction.
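The panic-mode strategy described above can be sketched in a few lines. This is only an illustration: the function name skip_to_sync, the token list, and the choice of ";" and "}" as synchronizing tokens are assumptions made for the example, not part of the notes.

```python
def skip_to_sync(tokens, pos, sync_tokens):
    """Discard tokens one at a time, starting at pos, until a synchronizing
    token is found. Returns the index of that token, or len(tokens) if
    the input ends first."""
    while pos < len(tokens) and tokens[pos] not in sync_tokens:
        pos += 1
    return pos

# Hypothetical token stream with an error detected at index 2 ("+" after "=").
tokens = ["x", "=", "+", "*", "3", ";", "y", "=", "1", ";"]
resume = skip_to_sync(tokens, 2, {";", "}"})
print(resume, tokens[resume])
```

The parser would resume from the returned position, so the rest of the input can still be checked for further errors.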
Context free Grammars and Ambiguity
A grammar is a set of rules or productions which generates a collection of finite/infinite
strings.
It is a 4-tuple defined as G = (V, T, P, S)
Where V = set of variables / Non-Terminals
T = set of terminals
P = set of productions / rules
S = start symbol
Ex: S → (S)/e
S → (S) --------- (i)
S → e ----------- (ii)
Here S is start symbol and the only variable.
( , ), e are terminals.
(i) and (ii) are production rules.
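As a rough illustration, the 4-tuple G = (V, T, P, S) can be written down directly as a data structure. The class name Grammar, its field names, and the helper is_well_formed are assumptions introduced for this sketch.

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    variables: frozenset   # V: non-terminals
    terminals: frozenset   # T
    productions: dict      # P: head -> list of bodies (tuples of symbols)
    start: str             # S

# The example grammar S -> (S) / e from the text.
G = Grammar(
    variables=frozenset({"S"}),
    terminals=frozenset({"(", ")", "e"}),
    productions={"S": [("(", "S", ")"), ("e",)]},
    start="S",
)

def is_well_formed(g):
    """Check that V and T are disjoint, S is in V, and every production
    uses only known symbols with a variable as its head."""
    symbols = g.variables | g.terminals
    return (
        g.start in g.variables
        and not (g.variables & g.terminals)
        and all(
            head in g.variables and all(s in symbols for s in body)
            for head, bodies in g.productions.items()
            for body in bodies
        )
    )

print(is_well_formed(G))
```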
Sentential Forms: If S ⇒* α, where α may contain non-terminals, then we say that α is a
sentential form of G.
Sentence: A sentence is a sentential form with no non-terminals.
Ex: – (id + id) is a sentence of the grammar
E → E + E | – E | (E) | id.
Leftmost Derivation (LMD): A derivation in which, at every step, the leftmost non-terminal
of the sentential form is replaced is called a leftmost derivation.
Rightmost Derivation (RMD): A derivation in which, at every step, the rightmost
non-terminal of the sentential form is replaced is called a rightmost derivation.
Ex: Consider the grammar
E → E + E | E * E | – E | (E) | id.
The LMD and RMD for the string – (id + id) are as follows.
Left Most Derivation
E => – E => – (E)
=> – (E + E)
=> – (id + E)
=> – (id + id)
Right Most Derivation
E => – E => – (E)
=> – (E + E)
=> – (E + id)
=> – (id + id)
Note: Rightmost derivations are also known as canonical derivations.
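The two derivations of – (id + id) can be replayed mechanically. The sketch below is a small illustration: the helper derive and the list of production bodies are assumptions, and the minus sign is written as an ASCII "-". The same sequence of production bodies is applied in both cases; only the choice of which non-terminal to replace differs.

```python
NONTERMINALS = {"E"}

def derive(start, bodies, leftmost=True):
    """Apply each production body in turn to the leftmost (or rightmost)
    non-terminal and return every sentential form produced."""
    form = [start]
    steps = [" ".join(form)]
    for body in bodies:
        positions = [i for i, s in enumerate(form) if s in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = body            # replace the chosen non-terminal
        steps.append(" ".join(form))
    return steps

bodies = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
lmd = derive("E", bodies, leftmost=True)
rmd = derive("E", bodies, leftmost=False)
print(" => ".join(lmd))
print(" => ".join(rmd))
```

The two runs differ only in the fifth sentential form, where the leftmost derivation has - ( id + E ) and the rightmost has - ( E + id ).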
Ambiguous Grammar: A grammar ‘G’ is said to be ambiguous if there exists more than
one derivation tree for some input string. (OR)
A grammar that produces more than one leftmost or more than one rightmost
derivation for the same string is ambiguous.
For example, consider the following grammar:
String → String + String | String – String | 0 | 1 | 2 | ... | 9
The string 9 – 5 + 2 has two parse trees: one groups it as (9 – 5) + 2 and the other as
9 – (5 + 2).
• Ambiguity is problematic because the meaning of the program can be incorrect.
• The problem of deciding whether a CFG is ambiguous is undecidable.
• Ambiguity can be handled in several ways
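To see why ambiguity changes the meaning of a program, one can enumerate every parse tree of 9 – 5 + 2 under the String grammar and evaluate each tree. The functions parse_trees and evaluate are hypothetical helpers written for this sketch, and the minus sign is written as an ASCII "-".

```python
def parse_trees(tokens):
    """Return every parse tree of tokens under
    String -> String + String | String - String | 0 | 1 | ... | 9."""
    if len(tokens) == 1:
        t = tokens[0]
        return [t] if len(t) == 1 and t.isdigit() else []
    trees = []
    for i in range(1, len(tokens) - 1, 2):       # candidate operator positions
        op = tokens[i]
        if op in ("+", "-"):
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, op, right))
    return trees

def evaluate(tree):
    """Evaluate a parse tree built by parse_trees."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l - r

trees = parse_trees(["9", "-", "5", "+", "2"])
values = sorted(evaluate(t) for t in trees)
print(len(trees), values)
```

The two trees evaluate to different results, 6 for (9 - 5) + 2 and 2 for 9 - (5 + 2), which is exactly the problem the first bullet describes.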
Removal of Ambiguity: Although ambiguity is undecidable in general, the ambiguity of a
given grammar can often be eliminated by rewriting the grammar.
Example: E → E + E | E * E | id
The above grammar is ambiguous because it does not encode the associativity and
precedence rules of the operators.
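Associativity is fixed by the direction of recursion: a left-recursive production such as
E → E + T makes ‘+’ left associative, whereas a right-recursive production such as
E → T * E makes ‘*’ right associative.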
Let the precedence of ‘*’ be higher than that of ‘+’.
The operator with higher precedence should appear at a lower level of the grammar.
E → E + T | T
T → T * F | F
F → id
The above grammar is unambiguous.
Left Recursion: In a grammar, if the leftmost symbol on the right-hand side of a production
is the same as the variable on its left-hand side, the production is left recursive. Left
recursion can send a top-down parser into an infinite loop, so it must be removed.
Elimination of Left Recursion
A → Aα | β is left recursive.
It needs to be converted into an equivalent right-recursive grammar:
A → βA1
A1 → αA1 | ϵ
In general, for
A → Aα1 / Aα2 / ... / Aαm / β1 / β2 / ... / βn
we can replace the A-productions by
A → β1A1 / β2A1 / ... / βnA1
A1 → α1A1 / α2A1 / ... / αmA1 / ϵ
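The general rule for removing immediate left recursion can be sketched as a small function. This is a minimal sketch under some assumptions: the function name is made up for the example, each production body is a list of symbols, the new non-terminal is named by appending "1" to the head, and no β is itself ϵ. It handles only immediate left recursion, as in the rule above.

```python
EPSILON = "ϵ"

def eliminate_immediate_left_recursion(head, bodies):
    """Split the bodies of `head` into A -> A alpha (left recursive) and
    A -> beta (the rest), then rebuild them as
    A -> beta A1 and A1 -> alpha A1 | epsilon."""
    new_head = head + "1"
    alphas = [body[1:] for body in bodies if body[0] == head]
    betas = [body for body in bodies if body[0] != head]
    if not alphas:                       # nothing to eliminate
        return {head: bodies}
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[EPSILON]],
    }

# E -> E + T / T
result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
for h, bs in result.items():
    print(h, "->", " / ".join(" ".join(b) for b in bs))
```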
Examples:
Eg 1. Eliminate left recursion from
E → E + T/T
T → T * F/F
F → (E) /id
Sol. E → E + T / T is in the form
A → Aα / β
so we can write it as
E → TE1
E1 → +TE1 / ϵ
Similarly, the other productions are written as
T → FT1
T1 → *FT1 / ϵ
F → (E) / id
Eg 2. Eliminate left recursion from the grammar
S → (L) /a
L → LS/b
Sol. S → (L) / a
L → bL1
L1 → SL1 / ϵ
Left Factoring: A grammar in which two or more productions of the same non-terminal share
a common prefix is called a non-deterministic grammar.
To make it deterministic we need to remove the common prefixes. This process is called
left factoring.
The grammar A → αβ1 / αβ2 has the common prefix α; it is transformed into
A → αA1
A1 → β1 / β2
Eg 1. What is the resultant grammar after left factoring the following grammar?
S → iEtS / iEtSeS / a
E → b
Sol. S → iEtSS1 / a
S1 → eS / ϵ
E → b
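One round of left factoring can be sketched as follows. The helper names common_prefix and left_factor are assumptions, as is the naming of new non-terminals by appending a counter to the head. The sketch groups bodies by their first symbol and factors out the longest common prefix of each group. A full algorithm would repeat this until no common prefixes remain.

```python
EPSILON = "ϵ"

def common_prefix(bodies):
    """Longest sequence of symbols shared at the start of all bodies."""
    prefix = []
    for symbols in zip(*bodies):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(head, bodies):
    """Apply A -> alpha beta1 / alpha beta2  ==>  A -> alpha A1, A1 -> beta1 / beta2."""
    groups = {}
    for body in bodies:
        groups.setdefault(body[0], []).append(body)
    result = {head: []}
    count = 0
    for group in groups.values():
        if len(group) == 1:              # no common prefix to remove
            result[head].append(group[0])
            continue
        count += 1
        new_head = head + str(count)
        prefix = common_prefix(group)
        result[head].append(prefix + [new_head])
        # An empty remainder becomes an epsilon body.
        result[new_head] = [body[len(prefix):] or [EPSILON] for body in group]
    return result

# S -> iEtS / iEtSeS / a
factored = left_factor("S", [list("iEtS"), list("iEtSeS"), ["a"]])
for h, bs in factored.items():
    print(h, "->", " / ".join("".join(b) for b in bs))
```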
Non Recursive Descent Parser
Example: (table driven parsing)
• It maintains a stack explicitly, rather than implicitly via recursive calls.
• A table driven predictive parser has
1. an input buffer
2. a stack
3. a parsing table
4. an output stream
Recursive Descent Parser
Predictive Parsers
By eliminating left recursion and left factoring the grammar, we can construct the parse
tree without backtracking. To construct a predictive parser, we must know
(a) the current input symbol
(b) the non-terminal which is to be expanded
The parser is controlled by a program. The program considers x, the symbol on top of
the stack, and ‘a’, the current input symbol.
1. If x = a = $, the parser halts and announces successful completion of parsing.
2. If x = a ≠ $, the parser pops x off the stack and advances the input pointer to the next
input symbol.
3. If x is a non-terminal, the program consults entry M[x, a] of the parsing table M. This
entry will be either an x-production of the grammar or an error entry. If M[x, a] = {x →
UVW}, the parser replaces x on top of the stack by WVU, with U on top.
If M[x, a] = error, the parser calls an error recovery routine.
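The three rules above can be sketched as a short driver loop. This is a minimal sketch, not a definitive implementation. The table TABLE is written by hand for the grammar E → TE1, E1 → +TE1 / ϵ, T → FT1, T1 → *FT1 / ϵ, F → (E) / id and is an assumption of this example, as are the names predictive_parse and NONTERMINALS. An empty list stands for an ϵ body, and a missing entry stands for an error entry.

```python
NONTERMINALS = {"E", "E1", "T", "T1", "F"}

# Hand-written parsing table M[x, a] (assumed for this sketch).
TABLE = {
    ("E", "id"): ["T", "E1"], ("E", "("): ["T", "E1"],
    ("E1", "+"): ["+", "T", "E1"], ("E1", ")"): [], ("E1", "$"): [],
    ("T", "id"): ["F", "T1"], ("T", "("): ["F", "T1"],
    ("T1", "+"): [], ("T1", "*"): ["*", "F", "T1"],
    ("T1", ")"): [], ("T1", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}

def predictive_parse(tokens, start="E"):
    """Return the productions output by the parser; raise SyntaxError
    where a real parser would call its error recovery routine."""
    tokens = tokens + ["$"]
    stack = ["$", start]                 # top of the stack is the list's end
    pos = 0
    output = []
    while True:
        x, a = stack[-1], tokens[pos]
        if x == a == "$":                # rule 1: successful completion
            return output
        if x == a:                       # rule 2: match a terminal
            stack.pop()
            pos += 1
        elif x in NONTERMINALS and (x, a) in TABLE:   # rule 3: expand x
            body = TABLE[(x, a)]
            stack.pop()
            stack.extend(reversed(body))  # leftmost body symbol ends on top
            output.append(f"{x} -> {' '.join(body) or 'ϵ'}")
        else:
            raise SyntaxError(f"unexpected {a!r} with {x!r} on the stack")

moves = predictive_parse(["id", "+", "id", "*", "id"])
for m in moves:
    print(m)
```

Pushing the body in reverse order is what keeps the leftmost symbol of the production on top of the stack, as rule 3 requires.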
For example, consider the moves made by the predictive parser on input id + id * id, which
are shown below.
Matched         Stack        Input            Action
                E$           id + id * id$
                TE1$         id + id * id$    Output E → TE1
                FT1E1$       id + id * id$    Output T → FT1
                idT1E1$      id + id * id$    Output F → id
id              T1E1$        + id * id$       Match id
id              E1$          + id * id$       Output T1 → ϵ
id              +TE1$        + id * id$       Output E1 → +TE1
id +            TE1$         id * id$         Match +
id +            FT1E1$       id * id$         Output T → FT1
id +            idT1E1$      id * id$         Output F → id
id + id         T1E1$        * id$            Match id
id + id         *FT1E1$      * id$            Output T1 → *FT1
id + id *       FT1E1$       id$              Match *
id + id *       idT1E1$      id$              Output F → id
id + id * id    T1E1$        $                Match id
id + id * id    E1$          $                Output T1 → ϵ
id + id * id    $            $                Output E1 → ϵ
Eg.8. What will be the entries in the predictive parsing table for the given grammar?
S → iBfSS1 / d
S1 → eS / ϵ
B → b
Sol. First(S) = {i, d}
First(S1) = {e, ϵ}
First(B) = {b}
Follow(S) = {$} ∪ (First(S1) – {ϵ})
= {$, e}
Follow(S1) = Follow(S) = {$, e}
Follow(B) = {f}
Predictive Parsing table
      d        b        e                 i             f     $
S     S→d                                 S→iBfSS1
S1                      S1→ϵ, S1→eS                           S1→ϵ
B              B→b
The given grammar has multiple entries in M[S1, e]; therefore the given grammar is not
LL(1).
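The First and Follow sets and the table entries for this grammar can be checked with a short fixed-point computation. This is a sketch under some assumptions: the function names, the dictionary encoding of the grammar, and the use of "$" as the end marker are choices made for this example. It is written for this small grammar and not tuned for efficiency.

```python
EPS = "ϵ"
GRAMMAR = {
    "S": [["i", "B", "f", "S", "S1"], ["d"]],
    "S1": [["e", "S"], [EPS]],
    "B": [["b"]],
}
START = "S"

def first_of(symbols, first):
    """First set of a sequence of grammar symbols."""
    result = set()
    for s in symbols:
        f = first[s] if s in GRAMMAR else {s}
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)                      # every symbol can derive epsilon
    return result

def compute_first():
    first = {a: set() for a in GRAMMAR}
    changed = True
    while changed:                       # iterate until nothing new is added
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

def compute_follow(first):
    follow = {a: set() for a in GRAMMAR}
    follow[START].add("$")
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                for i, s in enumerate(body):
                    if s not in GRAMMAR:
                        continue
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:      # s can end the body: add Follow(head)
                        new |= follow[head]
                    if not new <= follow[s]:
                        follow[s] |= new
                        changed = True
    return follow

def build_table(first, follow):
    """M[A, a] collects every production A -> body that applies on input a."""
    table = {}
    for head, bodies in GRAMMAR.items():
        for body in bodies:
            f = first_of(body, first)
            lookaheads = (f - {EPS}) | (follow[head] if EPS in f else set())
            for a in lookaheads:
                table.setdefault((head, a), []).append(body)
    return table

first = compute_first()
follow = compute_follow(first)
table = build_table(first, follow)
conflicts = sorted(key for key, bodies in table.items() if len(bodies) > 1)
print(first, follow, conflicts)
```

A table cell that collects more than one production is a conflict, and the only such cell here is M[S1, e], which matches the conclusion that the grammar is not LL(1).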