Syntax Analyser
Resource: Textbook
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman,
"Compilers: Principles, Techniques, and Tools",
Addison-Wesley, 1986.
Syntax Analyzer
Syntax Analyzer: creates the syntactic structure of the given source program
Parser
Syntactic structure: parse tree
Syntax of a programming language:
is described by a context-free grammar (CFG)
Steps
Parser checks whether a given source program satisfies the rules implied by a CFG or not
If it satisfies, the parser creates the parse tree of that program
Otherwise, the parser gives the error messages
Syntax Analyzer
CFG
gives a precise syntactic specification of a programming language
• Smallest item: token
1. Top-Down Parser
the parse tree is created top to bottom, starting from the root
2. Bottom-Up Parser
the parse tree is created bottom to top, starting from the leaves
Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time)
Efficient top-down and bottom-up parsers can be implemented only for subclasses of CFG
LL for top-down parsing
LR for bottom-up parsing
Context-Free Grammars (CFG)
Inherently recursive structures of a programming language are defined by a
CFG
In a CFG, we have:
A finite set of terminals (in our case, this will be the set of tokens)
A finite set of non-terminals (syntactic variables)
A finite set of production rules in the following form
A → α where A is a non-terminal and
α is a string of terminals and non-terminals (including the empty string);
|A| ≤ |α|, except when α is the empty string
A start symbol: one of the non-terminal symbols
Example:
E → E+E | E-E | E*E | E/E | -E
E → (E)
E → id
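As a minimal sketch, the four parts of a CFG can be written down as plain Python data. The dictionary layout and the helper names `non_terminals` and `terminals` are assumptions made for illustration; they are not part of the slides.

```python
# Each non-terminal maps to its alternatives; each alternative is a tuple of symbols.
# This layout is an illustrative assumption, not a standard API.
GRAMMAR = {
    "E": [
        ("E", "+", "E"),
        ("E", "-", "E"),
        ("E", "*", "E"),
        ("E", "/", "E"),
        ("-", "E"),
        ("(", "E", ")"),
        ("id",),
    ],
}
START = "E"  # the start symbol is one of the non-terminals


def non_terminals(grammar):
    """Non-terminals are exactly the symbols that have productions."""
    return set(grammar)


def terminals(grammar):
    """Terminals are the symbols that appear on a right-hand side but have no productions."""
    symbols = {s for alts in grammar.values() for alt in alts for s in alt}
    return symbols - set(grammar)


print(sorted(terminals(GRAMMAR)))
```

Here the terminals play the role of tokens: the operators, the parentheses, and `id`.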
Derivations
E ⇒ E+E
E ⇒ E+E ⇒ id+E ⇒ id+id
S ⇒* α
- If α contains non-terminals, it is called a sentential form of G
- If α does not contain non-terminals, it is called a sentence of G
Derivation: Example
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
OR
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
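A derivation step replaces one non-terminal in the current sentential form with the right-hand side of one of its productions. The sketch below always expands the leftmost non-terminal, which reproduces the first derivation above. The function names `leftmost_step` and `derive` and the reduced grammar are assumptions chosen for illustration.

```python
# Reduced version of the expression grammar (only the productions used here).
GRAMMAR = {"E": [("E", "+", "E"), ("-", "E"), ("(", "E", ")"), ("id",)]}


def leftmost_step(form, alternative):
    """Replace the leftmost non-terminal in `form` with `alternative`."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            if alternative not in GRAMMAR[sym]:
                raise ValueError("not a production of " + sym)
            return form[:i] + alternative + form[i + 1:]
    raise ValueError("form is already a sentence")


def derive(start, alternatives):
    """Apply the chosen alternatives one after another, recording every sentential form."""
    forms = [(start,)]
    for alt in alternatives:
        forms.append(leftmost_step(forms[-1], alt))
    return forms


steps = derive("E", [("-", "E"), ("(", "E", ")"), ("E", "+", "E"), ("id",), ("id",)])
print(" => ".join("".join(form) for form in steps))
# prints: E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
```

Every intermediate tuple is a sentential form; the last one contains no non-terminals, so it is a sentence.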
Right-Most Derivation
E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)
[Figure: parse trees built step by step for E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)]
Ambiguity
• A grammar that produces more than one parse tree for a sentence is called an ambiguous grammar
E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
Parse tree 1: E( E(id) + E( E(id) * E(id) ) )
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
Parse tree 2: E( E( E(id) + E(id) ) * E(id) )
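One way to see the ambiguity concretely is to count the parse trees of a sentence. The sketch below does this for the reduced grammar E → E+E | E*E | id only; the function name `count_parse_trees` and the token-list input format are assumptions for illustration.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for `tokens` under E -> E+E | E*E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees deriving tokens[i:j] from E.
        total = 0
        if j - i == 1 and tokens[i] == "id":
            total += 1  # E -> id
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # E -> E op E with the operator at position k as the root
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))  # prints: 2
```

A count greater than one for any sentence means the grammar is ambiguous; `id + id` alone has exactly one tree.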
Ambiguity (cont.)
For most parsers, the grammar must be unambiguous
Unambiguous grammar
unique selection of the parse tree for a sentence
• Disambiguation
-- Necessary to eliminate the ambiguity in the grammar during the design phase of the compiler
Design an unambiguous grammar
Choose one of the parse trees of a sentence and restrict the grammar to this choice
Ambiguity (cont.)
stmt → if expr then stmt |
       if expr then stmt else stmt | otherstmts
if E1 then if E2 then S1 else S2
Interpretation 1: S2 is executed when E1 is false (thus attaching the else to the first if)
if E1 then (if E2 then S1) else S2
Interpretation 2: S2 is executed when E1 is true and E2 is false (thus attaching the else to the second if)
if E1 then (if E2 then S1 else S2)
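Ambiguity (cont.)
stmt → if expr then stmt |
       if expr then stmt else stmt | otherstmts
[Figure: the two parse trees for "if E1 then if E2 then S1 else S2". Tree 1: the root is "if expr then stmt else stmt" with expr = E1, the inner stmt is "if E2 then S1", and the else branch is S2. Tree 2: the root is "if expr then stmt" with expr = E1, and the inner stmt is "if E2 then S1 else S2".]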
Ambiguity (cont.)
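• We prefer the second parse tree (else matches with the closest if)
• Unambiguous grammar:
stmt → matchedstmt | unmatchedstmt
matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts
unmatchedstmt → if expr then stmt |
                if expr then matchedstmt else unmatchedstmt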
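The closest-if rule can also be sketched as a tiny recursive-descent parser that greedily consumes an else whenever one follows the then-branch. This is an illustrative sketch, not the grammar-based solution above: the function name `parse_stmt`, the token-list input, and the tuple-shaped trees are all assumptions, and the code expects well-formed input.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at `pos`; return (tree, next_pos).

    An `else` is attached to the nearest `if` that does not yet have one.
    Conditions and other statements are treated as single opaque tokens.
    """
    if pos < len(tokens) and tokens[pos] == "if":
        cond = tokens[pos + 1]
        if tokens[pos + 2] != "then":
            raise SyntaxError("expected 'then'")
        then_branch, pos = parse_stmt(tokens, pos + 3)
        if pos < len(tokens) and tokens[pos] == "else":
            # Greedy choice: the innermost open `if` takes the `else`.
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_branch, else_branch), pos
        return ("if", cond, then_branch), pos
    return tokens[pos], pos + 1


tokens = ["if", "E1", "then", "if", "E2", "then", "S1", "else", "S2"]
tree, end = parse_stmt(tokens)
print(tree)
# prints: ('if', 'E1', ('if', 'E2', 'S1', 'S2'))
```

The resulting tree corresponds to `if E1 then (if E2 then S1 else S2)`, the preferred interpretation.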
Ambiguity – Operator Precedence
Ambiguous grammars (because of ambiguous operators) can be
disambiguated according to the precedence and associativity rules
In general, for immediate left recursion:
A → A α1 | ... | A αm | β1 | ... | βn    where β1 ... βn do not start with A
⇓ eliminate immediate left recursion
A → β1 A’ | ... | βn A’
A’ → α1 A’ | ... | αm A’ | ε    (an equivalent grammar)
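The transformation can be sketched in a few lines of Python. The function name `eliminate_immediate_left_recursion`, the tuple representation of alternatives (with `()` standing for the empty string), and naming the new non-terminal by appending `'` are assumptions for illustration; the sketch also assumes the primed name is not already in use.

```python
def eliminate_immediate_left_recursion(nt, alternatives):
    """Return productions for `nt` with immediate left recursion removed.

    `alternatives` is a list of tuples of symbols; () is the empty string.
    """
    # The alphas: what follows the leading `nt` in each left-recursive alternative.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # The betas: alternatives that do not start with `nt`.
    betas = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not alphas:
        return {nt: list(alternatives)}  # nothing to do
    new_nt = nt + "'"
    return {
        nt: [beta + (new_nt,) for beta in betas],
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [()],
    }


# E -> E+T | T  becomes  E -> T E'  and  E' -> +T E' | (empty)
print(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))
```

Applied to T → T*F | F, the same function gives T → F T’ and T’ → *F T’ | ε, while F → id | (E) is returned unchanged.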
Immediate Left-Recursion -- Example
E → E+T | T
T → T*F | F
F → id | (E)
S → Aa | b
A → Sc | d
This grammar is not immediately left-recursive, but it is still left-recursive:
S ⇒ Aa ⇒ Sca or
A ⇒ Sc ⇒ Aac causes a left-recursion
• Solution: eliminate all left-recursions from the grammar
Eliminate Left-Recursion -- Algorithm
- Arrange non-terminals in some order: A1 ... An
- for i from 1 to n do {
    - for j from 1 to i-1 do {
        replace each production
            Ai → Aj γ
        by
            Ai → α1 γ | ... | αk γ
        where Aj → α1 | ... | αk
      }
    - eliminate immediate left-recursions among Ai productions
  }
Eliminate Left-Recursion -- Example
S → Aa | b
A → Ac | Sd | f
- Order of non-terminals: S, A
for S:
- we do not enter the inner loop.
- there is no immediate left recursion in S.
for A:
- Replace A → Sd with A → Aad | bd
So, we will have A → Ac | Aad | bd | f
- Eliminate the immediate left-recursion in A
A → bdA’ | fA’
A’ → cA’ | adA’ | ε
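The algorithm and the worked example above can be checked with a short Python sketch. The function name `eliminate_left_recursion`, the dictionary-of-tuples grammar representation (with `()` as the empty string), and the primed naming of new non-terminals are assumptions for illustration. As with the textbook algorithm, the sketch assumes the input grammar has no cycles and no empty productions.

```python
def eliminate_left_recursion(grammar, order):
    """Remove all left recursion, processing non-terminals in the given order."""
    g = {nt: list(alts) for nt, alts in grammar.items()}
    for i, ai in enumerate(order):
        # Inner loop: substitute earlier non-terminals that appear at the front.
        for aj in order[:i]:
            expanded = []
            for alt in g[ai]:
                if alt and alt[0] == aj:
                    # Ai -> Aj gamma  becomes  Ai -> alpha1 gamma | ... | alphak gamma
                    expanded.extend(alpha + alt[1:] for alpha in g[aj])
                else:
                    expanded.append(alt)
            g[ai] = expanded
        # Eliminate immediate left recursion among the Ai productions.
        alphas = [alt[1:] for alt in g[ai] if alt and alt[0] == ai]
        betas = [alt for alt in g[ai] if not alt or alt[0] != ai]
        if alphas:
            new_nt = ai + "'"  # assumed not to clash with an existing name
            g[ai] = [beta + (new_nt,) for beta in betas]
            g[new_nt] = [alpha + (new_nt,) for alpha in alphas] + [()]
    return g


grammar = {
    "S": [("A", "a"), ("b",)],
    "A": [("A", "c"), ("S", "d"), ("f",)],
}
result = eliminate_left_recursion(grammar, ["S", "A"])
for nt, alts in result.items():
    print(nt, "->", " | ".join(" ".join(alt) or "(empty)" for alt in alts))
```

With the order S, A the result has S unchanged, A with the alternatives `b d A'` and `f A'`, and A' with `c A'`, `a d A'`, and the empty string, matching the hand computation.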
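For each non-terminal A with two or more alternatives that share a common prefix α:
A → αβ1 | ... | αβn | γ1 | ... | γm
convert it into
A → αA’ | γ1 | ... | γm
A’ → β1 | ... | βn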
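One left-factoring step can be sketched as follows. The function names `common_prefix` and `left_factor_once`, the tuple representation of alternatives (with `()` as the empty string), and the primed name for the new non-terminal are assumptions for illustration. The sketch factors only the first group of alternatives that share a first symbol; a full implementation would repeat until no two alternatives share a prefix.

```python
def common_prefix(sequences):
    """Longest prefix shared by all the given tuples."""
    prefix = []
    for symbols in zip(*sequences):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)


def left_factor_once(nt, alternatives):
    """Factor out one common prefix: A -> alpha A' | gammas, A' -> betas."""
    groups = {}
    for alt in alternatives:
        if alt:
            groups.setdefault(alt[0], []).append(alt)
    for group in groups.values():
        if len(group) >= 2:
            alpha = common_prefix(group)
            new_nt = nt + "'"  # assumed not to clash with an existing name
            gammas = [alt for alt in alternatives if alt not in group]
            return {
                nt: [alpha + (new_nt,)] + gammas,
                new_nt: [alt[len(alpha):] for alt in group],
            }
    return {nt: list(alternatives)}  # no common prefix found


factored = left_factor_once("stmt", [
    ("if", "expr", "then", "stmt"),
    ("if", "expr", "then", "stmt", "else", "stmt"),
    ("otherstmts",),
])
print(factored)
```

For the if-then-else productions this yields stmt → if expr then stmt stmt’ | otherstmts and stmt’ → ε | else stmt, so a top-down parser can postpone the choice until it has seen whether an else follows.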
Left-Factoring – Example1
L1 = { ωcω | ω is in (a|b)* }
Example 1:
declaring an identifier and checking later whether it is declared or not. We cannot do this with a context-free language. We need a semantic analyzer (which is not context-free)
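The declared-before-use check is typically done after parsing, with a table of declared names. The sketch below is a minimal illustration; the function name `undeclared_uses` and the `(kind, name)` statement format are hypothetical and not taken from the slides.

```python
def undeclared_uses(statements):
    """Return the names that are used before any declaration of them.

    `statements` is a list of (kind, name) pairs, where kind is
    "declare" or "use" (a hypothetical, simplified program form).
    """
    declared = set()
    errors = []
    for kind, name in statements:
        if kind == "declare":
            declared.add(name)
        elif kind == "use" and name not in declared:
            errors.append(name)
    return errors


program = [("declare", "x"), ("use", "x"), ("use", "y"), ("declare", "y")]
print(undeclared_uses(program))  # prints: ['y']
```

The check depends on remembering an unbounded set of earlier names, which is the kind of context a CFG cannot express.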