Syntax Analyzer

The document discusses a syntax analyzer, which creates the syntactic structure of a source program using a parser. The parser checks if a program satisfies the rules of a context-free grammar (CFG) and creates a parse tree if it does. The CFG precisely specifies a language's syntax. Parsers can work top-down or bottom-up on a token stream to find a parse tree through derivations. Ambiguous grammars with more than one parse tree must be disambiguated for most parsers.


CS 346: Syntax Analyzer

Resource: Textbook
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman,
"Compilers: Principles, Techniques, and Tools",
Addison-Wesley, 1986.
Syntax Analyzer
• Syntax Analyzer: creates the syntactic structure of the given source program
  - also called the Parser
  - syntactic structure: parse tree
• Syntax of a programming language is described by a context-free grammar (CFG)

• Steps
  - The parser checks whether a given source program satisfies the rules implied by a CFG or not
  - If it does, the parser creates the parse tree of that program
  - Otherwise, the parser gives error messages
Syntax Analyzer

• CFG
  - gives a precise syntactic specification of a programming language
  - the design of the grammar is an initial phase of the design of a compiler
  - a grammar can be directly converted into a parser by some tools


Parser

• Parser works on a stream of tokens
• Smallest item: token

  [Figure: source program → Lexical Analyzer → token → Parser → parse tree;
   the Parser repeatedly asks the Lexical Analyzer to "get next token"]
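A minimal Python sketch of this interaction (illustrative only; the Lexer and Parser class names and the next_token method are assumptions made for this sketch, not an interface defined in these slides):

    class Lexer:
        """Illustrative stand-in for the lexical analyzer; next_token() plays the
        role of "get next token" in the figure above (assumed names)."""
        def __init__(self, tokens):
            self.tokens = tokens          # pre-tokenized input, for illustration only
            self.pos = 0

        def next_token(self):
            tok = self.tokens[self.pos] if self.pos < len(self.tokens) else "$"
            self.pos += 1
            return tok

    class Parser:
        """The parser pulls tokens from the lexer one at a time."""
        def __init__(self, lexer):
            self.lexer = lexer
            self.lookahead = lexer.next_token()

        def match(self, expected):
            # Consume the lookahead if it is the expected token, otherwise report an error.
            if self.lookahead == expected:
                self.lookahead = self.lexer.next_token()
            else:
                raise SyntaxError(f"expected {expected}, got {self.lookahead}")

    lex = Lexer(["id", "+", "id"])
    p = Parser(lex)       # p.lookahead is now "id"; grammar routines would call p.match(...)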
Parsers (cont.)
• Well-known categories of parsers:
  1. Top-Down Parser
     - the parse tree is created top to bottom, starting from the root
  2. Bottom-Up Parser
     - the parse tree is created bottom to top, starting from the leaves

• Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time)
• Efficient top-down and bottom-up parsers can be implemented only for sub-classes of CFGs
  - LL for top-down parsing
  - LR for bottom-up parsing
Context-Free Grammars (CFG)
• Inherently recursive structures of a programming language are defined by a CFG
• In a CFG, we have:
  - A finite set of terminals (in our case, this will be the set of tokens)
  - A finite set of non-terminals (syntactic variables)
  - A finite set of production rules of the following form:
      A → α    where A is a non-terminal and α is a string of terminals and
               non-terminals (including the empty string)
  - A start symbol: one of the non-terminal symbols
• Example:
    E → E+E | E-E | E*E | E/E | -E
    E → (E)
    E → id
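For illustration (not part of the slides), the example grammar can be written down as plain Python data: sets of terminals and non-terminals, a list of productions, and a start symbol.

    # The example expression grammar written as plain data (an illustrative representation).
    terminals     = {"id", "+", "-", "*", "/", "(", ")"}
    non_terminals = {"E"}
    start_symbol  = "E"

    # Each production A → α is stored as a pair (A, list of the symbols of α).
    productions = [
        ("E", ["E", "+", "E"]),
        ("E", ["E", "-", "E"]),
        ("E", ["E", "*", "E"]),
        ("E", ["E", "/", "E"]),
        ("E", ["-", "E"]),
        ("E", ["(", "E", ")"]),
        ("E", ["id"]),
    ]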
Derivations
E ⇒ E+E
• E+E derives from E
  - i.e., we can replace E by E+E

E ⇒ E+E ⇒ id+E ⇒ id+id
• A sequence of replacements of non-terminal symbols is called a derivation of id+id from E

• In general, a derivation step is
    αAβ ⇒ αγβ    if there is a production rule A → γ in our grammar,
                 where α and β are arbitrary strings of terminal and non-terminal symbols

• α1 ⇒ α2 ⇒ ... ⇒ αn   (αn derives from α1, or α1 derives αn)

• ⇒  : derives in one step
• ⇒* : derives in zero or more steps
• ⇒+ : derives in one or more steps
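A small illustrative sketch (not from the slides) of a single derivation step, reusing the (head, body) production representation above; here the left-most occurrence of the non-terminal is replaced, although in general any occurrence may be chosen.

    def derive_one_step(sentential_form, production, non_terminals):
        """Apply production (A, α) to the left-most occurrence of A,
        turning γAδ into γαδ (illustrative sketch)."""
        head, body = production
        for i, sym in enumerate(sentential_form):
            if sym == head and sym in non_terminals:
                return sentential_form[:i] + body + sentential_form[i + 1:]
        raise ValueError(f"{head} does not occur in the sentential form")

    # E ⇒ E+E ⇒ id+E ⇒ id+id
    form = ["E"]
    form = derive_one_step(form, ("E", ["E", "+", "E"]), {"E"})
    form = derive_one_step(form, ("E", ["id"]), {"E"})
    form = derive_one_step(form, ("E", ["id"]), {"E"})
    print(form)    # ['id', '+', 'id']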
CFG - Terminology
• L(G) is the language of G (the language generated by G), which is a set of sentences
• A sentence of L(G) is a string of terminal symbols of G
• If S is the start symbol of G, then
    ω is a sentence of L(G) iff S ⇒+ ω, where ω is a string of terminals of G
• If G is a context-free grammar, L(G) is a context-free language
• Two grammars are equivalent if they produce the same language
• If S ⇒* α
  - if α contains non-terminals, it is called a sentential form of G
  - if α does not contain non-terminals, it is called a sentence of G
Derivation: Example
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
OR
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)

• At each derivation step, we can choose any of the non-terminals in the sentential form of G for the replacement
• Left-most derivation: always chooses the left-most non-terminal in each derivation step
• Right-most derivation: always chooses the right-most non-terminal in each derivation step
Left-Most and Right-Most Derivations

Left-Most Derivation:
  E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)

Right-Most Derivation:
  E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)

• Top-down parsers find the left-most derivation of the given source program
• Bottom-up parsers find the right-most derivation of the given source program, in reverse order
Parse Tree
• Intermediate nodes: inner nodes of a parse tree
• Leaves: terminal symbols
• A parse tree can be seen as a graphical representation of a derivation

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)

[Figure: the parse tree for -(id+id), grown step by step alongside the derivation above;
 the root E has children "-" and E, that E has children "(", E, ")", and the inner E has
 children E, "+", E, each of which derives id]
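As an illustration (not from the slides), a parse tree can be stored with a simple node class; the tree for -(id+id) is built by hand below.

    class Node:
        """A parse-tree node: an inner node is labelled with a non-terminal,
        a leaf with a terminal symbol (illustrative)."""
        def __init__(self, label, children=None):
            self.label = label
            self.children = children or []   # empty list means a leaf

    # Parse tree of -(id+id), following the derivation
    # E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
    tree = Node("E", [
        Node("-"),
        Node("E", [
            Node("("),
            Node("E", [Node("E", [Node("id")]),
                       Node("+"),
                       Node("E", [Node("id")])]),
            Node(")"),
        ]),
    ])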
Ambiguity
• A grammar that produces more than one parse tree for a sentence is called an ambiguous grammar

E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
  [Parse tree 1: root E with children E, +, E; the right E expands to E*E, so * is nested below +]

E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
  [Parse tree 2: root E with children E, *, E; the left E expands to E+E, so + is nested below *]
Ambiguity (cont.)
• For most parsers, the grammar must be unambiguous
• Unambiguous grammar
  - unique selection of the parse tree for a sentence
• Disambiguation
  - necessary to eliminate the ambiguity in the grammar during the design phase of the compiler
  - either design an unambiguous grammar, or
  - choose one of the parse trees of a sentence and restrict the grammar to this choice
Ambiguity (cont.)

stmt → if expr then stmt |
       if expr then stmt else stmt | otherstmts

if E1 then if E2 then S1 else S2

Interpretation 1: S2 is executed when E1 is false (thus attaching the else to the first if)
  if E1 then (if E2 then S1) else S2

Interpretation 2: S2 is executed when E1 is true and E2 is false (thus attaching the else to the second if)
  if E1 then (if E2 then S1 else S2)
Ambiguity (cont.)
stmt → if expr then stmt |
       if expr then stmt else stmt | otherstmts

if E1 then if E2 then S1 else S2

[Parse tree 1: the else belongs to the outer if, i.e. if E1 then (if E2 then S1) else S2]
[Parse tree 2: the else belongs to the inner if, i.e. if E1 then (if E2 then S1 else S2)]
Ambiguity (cont.)
• We prefer the second parse tree (the else matches the closest if)
• So, we have to disambiguate our grammar to reflect this choice

• Unambiguous grammar:

  stmt → matchedstmt | unmatchedstmt
  matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts
  unmatchedstmt → if expr then stmt |
                  if expr then matchedstmt else unmatchedstmt
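As a side note not taken from the slides: in a hand-written recursive-descent parser, the same "else matches the closest if" behaviour is often obtained simply by greedily consuming an else right after parsing the inner statement. A hypothetical, heavily simplified Python fragment:

    def parse_stmt(toks):
        """Hypothetical, heavily simplified statement parser: toks is a list of tokens
        consumed from the front; 'E1', 'S1', ... stand in for real expressions and
        statements (illustrative only)."""
        if toks and toks[0] == "if":
            toks.pop(0)                        # 'if'
            toks.pop(0)                        # the condition, e.g. 'E1' (kept trivial here)
            assert toks.pop(0) == "then"
            parse_stmt(toks)                   # the inner statement
            if toks and toks[0] == "else":     # greedy: the else attaches to the closest if
                toks.pop(0)
                parse_stmt(toks)
        else:
            toks.pop(0)                        # otherstmts

    parse_stmt(["if", "E1", "then", "if", "E2", "then", "S1", "else", "S2"])
    # The else is consumed while parsing the inner if, matching the preferred parse tree.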
Ambiguity – Operator Precedence
• Ambiguous grammars (because of ambiguous operators) can be disambiguated according to the precedence and associativity rules

  E → E+E | E*E | E^E | id | (E)

  disambiguate the grammar using precedence (highest first) and associativity:
    ^  (right to left)
    *  (left to right)
    +  (left to right)

  E → E+T | T
  T → T*F | F
  F → G^F | G
  G → id | (E)
Left Recursion

• A grammar is left recursive if it has a non-terminal A such that there is a derivation
    A ⇒+ Aα    for some string α
• Top-down parsing techniques cannot handle left-recursive grammars
• Conversion of a left-recursive grammar into an equivalent grammar that is not left-recursive is essential
• Left-recursion may appear
  - in a single step of the derivation (immediate left-recursion), or
  - in more than one step of the derivation
Immediate Left-Recursion
A A  |  where  does not start with A
 eliminate immediate left recursion
A   A’
A’   A’ |  an equivalent grammar

In general,
A  A 1 | ... | A m | 1 | ... | n h 1 ... n do
where d not start with
i hA
 eliminate immediate left recursion
A  1 A’ | ... | n A’
A’  1 A’ | ... | m A’ |  an equivalent grammar
Immediate Left-Recursion -- Example

E → E+T | T
T → T*F | F
F → id | (E)

  ⇓ eliminate immediate left recursion

E  → T E'
E' → +T E' | ε
T  → F T'
T' → *F T' | ε
F  → id | (E)
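For illustration (not from the slides), the transformed grammar is directly usable by a top-down, recursive-descent parser, with one function per non-terminal and the ε-alternatives handled by simply returning. A minimal Python sketch, assuming the input is already tokenized into strings such as "id", "+", "*", "(" and ")":

    # Recursive-descent sketch for the left-recursion-free grammar above:
    #   E  → T E'      E' → +T E' | ε
    #   T  → F T'      T' → *F T' | ε
    #   F  → id | ( E )
    # Illustrative only; a real parser would also build the parse tree.

    def parse(tokens):
        pos = 0

        def peek():
            return tokens[pos] if pos < len(tokens) else "$"

        def match(expected):
            nonlocal pos
            if peek() != expected:
                raise SyntaxError(f"expected {expected}, got {peek()}")
            pos += 1

        def E():                      # E → T E'
            T(); E_prime()

        def E_prime():                # E' → +T E' | ε
            if peek() == "+":
                match("+"); T(); E_prime()
            # otherwise take the ε production

        def T():                      # T → F T'
            F(); T_prime()

        def T_prime():                # T' → *F T' | ε
            if peek() == "*":
                match("*"); F(); T_prime()
            # otherwise take the ε production

        def F():                      # F → id | ( E )
            if peek() == "id":
                match("id")
            elif peek() == "(":
                match("("); E(); match(")")
            else:
                raise SyntaxError(f"unexpected token {peek()}")

        E()
        if peek() != "$":
            raise SyntaxError("extra input after the expression")

    parse(["id", "+", "id", "*", "id"])   # accepts; raises SyntaxError on ill-formed input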
Left-Recursion -- Problem
• A grammar may not be immediately left-recursive, but it can still be left-recursive
• Eliminating only the immediate left-recursion does not guarantee a grammar which is not left-recursive

S → Aa | b
A → Sc | d        This grammar is not immediately left-recursive, but it is still left-recursive:

S ⇒ Aa ⇒ Sca   or
A ⇒ Sc ⇒ Aac
both cause a left-recursion

• Solution: eliminate all left-recursions from the grammar
Eliminate Left-Recursion -- Algorithm
- Arrange non-terminals in some order: A1 ... An
- for i from 1 to n do {
    for j from 1 to i-1 do {
      replace each production
        Ai → Aj γ
      by
        Ai → α1 γ | ... | αk γ
      where Aj → α1 | ... | αk
    }
    eliminate immediate left-recursions among the Ai productions
  }
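An illustrative Python sketch of this algorithm (not from the slides). Productions are kept as a dict mapping each non-terminal to a list of bodies, where a body is a list of symbols and the empty list stands for ε; primed names such as A' are generated for the new non-terminals.

    def eliminate_left_recursion(non_terminals, productions):
        """non_terminals: list A1 ... An in the chosen order.
        productions: dict from each non-terminal to a list of bodies,
        where a body is a list of symbols and [] stands for ε.
        Returns an equivalent grammar without left recursion (illustrative sketch)."""
        prods = {a: [list(b) for b in bodies] for a, bodies in productions.items()}
        for i, Ai in enumerate(non_terminals):
            # Inner loop: replace Ai → Aj γ (j < i) by expanding Aj's bodies.
            for Aj in non_terminals[:i]:
                new_bodies = []
                for body in prods[Ai]:
                    if body and body[0] == Aj:
                        new_bodies += [b + body[1:] for b in prods[Aj]]
                    else:
                        new_bodies.append(body)
                prods[Ai] = new_bodies
            # Eliminate immediate left recursion among the Ai productions.
            recursive = [b[1:] for b in prods[Ai] if b and b[0] == Ai]
            others    = [b for b in prods[Ai] if not (b and b[0] == Ai)]
            if recursive:
                Ai_new = Ai + "'"
                prods[Ai] = [b + [Ai_new] for b in others]
                prods[Ai_new] = [b + [Ai_new] for b in recursive] + [[]]   # [] = ε
        return prods

    # Example from the slides: S → Aa | b, A → Ac | Sd | f, order S, A
    g = eliminate_left_recursion(["S", "A"],
                                 {"S": [["A", "a"], ["b"]],
                                  "A": [["A", "c"], ["S", "d"], ["f"]]})
    # g["A"] == [['b','d',"A'"], ['f',"A'"]] and g["A'"] == [['c',"A'"], ['a','d',"A'"], []],
    # i.e. A → bdA' | fA' and A' → cA' | adA' | ε, matching the slides.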
Eliminate Left-Recursion -- Example
S → Aa | b
A → Ac | Sd | f

- Order of non-terminals: S, A

for S:
  - we do not enter the inner loop
  - there is no immediate left recursion in S

for A:
  - Replace A → Sd with A → Aad | bd
    So, we will have A → Ac | Aad | bd | f
  - Eliminate the immediate left-recursion in A:
      A  → bdA' | fA'
      A' → cA' | adA' | ε

So, the resulting equivalent grammar which is not left-recursive is:
  S  → Aa | b
  A  → bdA' | fA'
  A' → cA' | adA' | ε
Eliminate Left-Recursion – Example2
S → Aa | b
A → Ac | Sd | f

- Order of non-terminals: A, S

for A:
  - we do not enter the inner loop
  - Eliminate the immediate left-recursion in A:
      A  → SdA' | fA'
      A' → cA' | ε

for S:
  - Replace S → Aa with S → SdA'a | fA'a
    So, we will have S → SdA'a | fA'a | b
  - Eliminate the immediate left-recursion in S:
      S  → fA'aS' | bS'
      S' → dA'aS' | ε

So, the resulting equivalent grammar which is not left-recursive is:
  S  → fA'aS' | bS'
  S' → dA'aS' | ε
  A  → SdA' | fA'
  A' → cA' | ε
Left-Factoring
• A top-down parser without backtracking (a predictive parser) insists that the grammar must be left-factored

  grammar → a new equivalent grammar suitable for predictive parsing

  stmt → if expr then stmt else stmt |
         if expr then stmt

• After seeing if, we cannot decide which production rule to choose to re-write stmt in the derivation
Left-Factoring (cont.)
• In general,

    A → αβ1 | αβ2    where α is non-empty and the first symbols of β1 and β2 (if they have one) are different

• While processing α we cannot decide whether to expand
    A to αβ1   or
    A to αβ2

• Re-write the grammar as follows:
    A  → αA'
    A' → β1 | β2      so, we can immediately expand A to αA'
Left-Factoring -- Algorithm
• For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix, let us say

    A → αβ1 | ... | αβn | γ1 | ... | γm

  convert it into

    A  → αA' | γ1 | ... | γm
    A' → β1 | ... | βn
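An illustrative Python sketch of one left-factoring step (not from the slides). Unlike the slides' examples, which factor a whole common prefix such as cd at once, this sketch factors on the first symbol only, so it may need to be applied repeatedly; new non-terminal names such as A' and A'' are generated mechanically.

    from itertools import groupby

    def left_factor_once(head, bodies):
        """Factor the alternatives of `head` that share a common first symbol.
        bodies: list of right-hand sides (lists of symbols); [] denotes ε.
        Returns (new bodies for head, dict of new non-terminals). Illustrative sketch."""
        new_bodies, new_rules = [], {}
        keyed = sorted(bodies, key=lambda b: b[0] if b else "")
        for first, group in groupby(keyed, key=lambda b: b[0] if b else ""):
            group = list(group)
            if first and len(group) > 1:                   # common non-empty prefix found
                fresh = head + "'" * (len(new_rules) + 1)  # A', A'', ...
                new_bodies.append([first, fresh])          # A → a A'
                new_rules[fresh] = [b[1:] for b in group]  # A' → remainders (possibly ε)
            else:
                new_bodies.extend(group)
        return new_bodies, new_rules

    # Example 1 from the slides: A → abB | aB | cdg | cdeB | cdfB
    bodies = [["a","b","B"], ["a","B"], ["c","d","g"], ["c","d","e","B"], ["c","d","f","B"]]
    print(left_factor_once("A", bodies))
    # One step gives A → aA' | cA'' with A' → bB | B and A'' → dg | deB | dfB;
    # applying the step again to A'' factors out d, reaching the slides' final grammar
    # (the slides factor the whole prefix cd in a single step instead).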
Left-Factoring – Example1

A → abB | aB | cdg | cdeB | cdfB

  ⇓

A  → aA' | cdg | cdeB | cdfB
A' → bB | B

  ⇓

A   → aA' | cdA''
A'  → bB | B
A'' → g | eB | fB
Left-Factoring – Example2

A → ad | a | ab | abc | b

  ⇓

A  → aA' | b
A' → d | ε | b | bc

  ⇓

A   → aA' | b
A'  → d | ε | bA''
A'' → ε | c
Non-Context-Free Language Constructs

• Some language constructs in programming languages are not context-free

• Example-1: L1 = { ωcω | ω is in (a|b)* }
  - declaring an identifier and later checking whether it has been declared. We cannot do this with a context-free language; we need the semantic analyzer (which is not context-free)

• Example-2: L2 = { aⁿbᵐcⁿdᵐ | n ≥ 1 and m ≥ 1 }
  - declaring two functions (one with n parameters, the other one with m parameters), and then calling them with actual parameters
