Syntax Analyzer

The document discusses a syntax analyzer, which creates the syntactic structure of a source program using a parser. The parser checks if a program satisfies the rules of a context-free grammar (CFG) and creates a parse tree if it does. The CFG precisely specifies a language's syntax. Parsers can work top-down or bottom-up on a token stream to find a parse tree through derivations. Ambiguous grammars with more than one parse tree must be disambiguated for most parsers.


CS 346: Syntax Analyzer

Resource: Textbook
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman,
"Compilers: Principles, Techniques, and Tools",
Addison-Wesley, 1986.
Syntax Analyzer
• Syntax Analyzer: creates the syntactic structure of the given source program
  - also called the Parser
  - syntactic structure: parse tree
• Syntax of a programming language is described by a context-free grammar (CFG)

• Steps
  - The parser checks whether a given source program satisfies the rules implied by a CFG or not
  - If it does, the parser creates the parse tree of that program
  - Otherwise, the parser gives error messages
Syntax Analyzer

• CFG
  - gives a precise syntactic specification of a programming language
  - the design of the grammar is an initial phase of the design of a compiler
  - a grammar can be directly converted into a parser by some tools


Parser

• Parser works on a stream of tokens
• Smallest item: token

  [Figure: source program → Lexical Analyzer → token → Parser → parse tree;
   the Parser repeatedly asks the Lexical Analyzer to "get next token"]
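A minimal Python sketch of this interaction (illustrative only; the Lexer and Parser class names and the next_token method are assumptions made for this sketch, not an interface defined in these slides):

    class Lexer:
        """Illustrative stand-in for the lexical analyzer; next_token() plays the
        role of "get next token" in the figure above (assumed names)."""
        def __init__(self, tokens):
            self.tokens = tokens          # pre-tokenized input, for illustration only
            self.pos = 0

        def next_token(self):
            tok = self.tokens[self.pos] if self.pos < len(self.tokens) else "$"
            self.pos += 1
            return tok

    class Parser:
        """The parser pulls tokens from the lexer one at a time."""
        def __init__(self, lexer):
            self.lexer = lexer
            self.lookahead = lexer.next_token()

        def match(self, expected):
            # Consume the lookahead if it is the expected token, otherwise report an error.
            if self.lookahead == expected:
                self.lookahead = self.lexer.next_token()
            else:
                raise SyntaxError(f"expected {expected}, got {self.lookahead}")

    lex = Lexer(["id", "+", "id"])
    p = Parser(lex)       # p.lookahead is now "id"; grammar routines would call p.match(...)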
Parsers (cont.)
• Well-known categories of parsers:
  1. Top-Down Parser
     - the parse tree is created top to bottom, starting from the root
  2. Bottom-Up Parser
     - the parse tree is created bottom to top, starting from the leaves

• Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time)
• Efficient top-down and bottom-up parsers can be implemented only for sub-classes of CFGs
  - LL for top-down parsing
  - LR for bottom-up parsing
Context-Free Grammars (CFG)
• Inherently recursive structures of a programming language are defined by a CFG
• In a CFG, we have:
  - A finite set of terminals (in our case, this will be the set of tokens)
  - A finite set of non-terminals (syntactic variables)
  - A finite set of production rules of the following form:
      A → α    where A is a non-terminal and α is a string of terminals and
               non-terminals (including the empty string)
  - A start symbol: one of the non-terminal symbols
• Example:
    E → E+E | E-E | E*E | E/E | -E
    E → (E)
    E → id
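For illustration (not part of the slides), the example grammar can be written down as plain Python data: sets of terminals and non-terminals, a list of productions, and a start symbol.

    # The example expression grammar written as plain data (an illustrative representation).
    terminals     = {"id", "+", "-", "*", "/", "(", ")"}
    non_terminals = {"E"}
    start_symbol  = "E"

    # Each production A → α is stored as a pair (A, list of the symbols of α).
    productions = [
        ("E", ["E", "+", "E"]),
        ("E", ["E", "-", "E"]),
        ("E", ["E", "*", "E"]),
        ("E", ["E", "/", "E"]),
        ("E", ["-", "E"]),
        ("E", ["(", "E", ")"]),
        ("E", ["id"]),
    ]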
Derivations
E ⇒ E+E
• E+E derives from E
  - i.e., we can replace E by E+E

E ⇒ E+E ⇒ id+E ⇒ id+id
• A sequence of replacements of non-terminal symbols is called a derivation of id+id from E

• In general, a derivation step is
    αAβ ⇒ αγβ    if there is a production rule A → γ in our grammar,
                 where α and β are arbitrary strings of terminal and non-terminal symbols

• α1 ⇒ α2 ⇒ ... ⇒ αn   (αn derives from α1, or α1 derives αn)

• ⇒  : derives in one step
• ⇒* : derives in zero or more steps
• ⇒+ : derives in one or more steps
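A small illustrative sketch (not from the slides) of a single derivation step, reusing the (head, body) production representation above; here the left-most occurrence of the non-terminal is replaced, although in general any occurrence may be chosen.

    def derive_one_step(sentential_form, production, non_terminals):
        """Apply production (A, α) to the left-most occurrence of A,
        turning γAδ into γαδ (illustrative sketch)."""
        head, body = production
        for i, sym in enumerate(sentential_form):
            if sym == head and sym in non_terminals:
                return sentential_form[:i] + body + sentential_form[i + 1:]
        raise ValueError(f"{head} does not occur in the sentential form")

    # E ⇒ E+E ⇒ id+E ⇒ id+id
    form = ["E"]
    form = derive_one_step(form, ("E", ["E", "+", "E"]), {"E"})
    form = derive_one_step(form, ("E", ["id"]), {"E"})
    form = derive_one_step(form, ("E", ["id"]), {"E"})
    print(form)    # ['id', '+', 'id']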
CFG - Terminology
• L(G) is the language of G (the language generated by G), which is a set of sentences
• A sentence of L(G) is a string of terminal symbols of G
• If S is the start symbol of G, then
    ω is a sentence of L(G) iff S ⇒+ ω, where ω is a string of terminals of G
• If G is a context-free grammar, L(G) is a context-free language
• Two grammars are equivalent if they produce the same language
• If S ⇒* α
  - if α contains non-terminals, it is called a sentential form of G
  - if α does not contain non-terminals, it is called a sentence of G
Derivation: Example
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
OR
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)

• At each derivation step, we can choose any of the non-terminals in the sentential form of G for the replacement
• Left-most derivation: always chooses the left-most non-terminal in each derivation step
• Right-most derivation: always chooses the right-most non-terminal in each derivation step
Left-Most and Right-Most Derivations

Left-Most Derivation:
  E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)

Right-Most Derivation:
  E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)

• Top-down parsers find the left-most derivation of the given source program
• Bottom-up parsers find the right-most derivation of the given source program, in reverse order
Parse Tree
• Intermediate nodes: inner nodes of a parse tree
• Leaves: terminal symbols
• A parse tree can be seen as a graphical representation of a derivation

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)

[Figure: the parse tree for -(id+id), grown step by step alongside the derivation above;
 the root E has children "-" and E, that E has children "(", E, ")", and the inner E has
 children E, "+", E, each of which derives id]
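As an illustration (not from the slides), a parse tree can be stored with a simple node class; the tree for -(id+id) is built by hand below.

    class Node:
        """A parse-tree node: an inner node is labelled with a non-terminal,
        a leaf with a terminal symbol (illustrative)."""
        def __init__(self, label, children=None):
            self.label = label
            self.children = children or []   # empty list means a leaf

    # Parse tree of -(id+id), following the derivation
    # E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
    tree = Node("E", [
        Node("-"),
        Node("E", [
            Node("("),
            Node("E", [Node("E", [Node("id")]),
                       Node("+"),
                       Node("E", [Node("id")])]),
            Node(")"),
        ]),
    ])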
Ambiguity
• A grammar that produces more than one parse tree for a sentence is called an ambiguous grammar

E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
  [Parse tree 1: root E with children E, +, E; the right E expands to E*E, so * is nested below +]

E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
  [Parse tree 2: root E with children E, *, E; the left E expands to E+E, so + is nested below *]
Ambiguity (cont.)
• For most parsers, the grammar must be unambiguous
• Unambiguous grammar
  - unique selection of the parse tree for a sentence
• Disambiguation
  - necessary to eliminate the ambiguity in the grammar during the design phase of the compiler
  - either design an unambiguous grammar, or
  - choose one of the parse trees of a sentence and restrict the grammar to this choice
Ambiguity (cont.)

stmt → if expr then stmt |
       if expr then stmt else stmt | otherstmts

if E1 then if E2 then S1 else S2

Interpretation 1: S2 is executed when E1 is false (thus attaching the else to the first if)
  if E1 then (if E2 then S1) else S2

Interpretation 2: S2 is executed when E1 is true and E2 is false (thus attaching the else to the second if)
  if E1 then (if E2 then S1 else S2)
Ambiguity (cont.)
stmt → if expr then stmt |
       if expr then stmt else stmt | otherstmts

if E1 then if E2 then S1 else S2

[Parse tree 1: the else belongs to the outer if, i.e. if E1 then (if E2 then S1) else S2]
[Parse tree 2: the else belongs to the inner if, i.e. if E1 then (if E2 then S1 else S2)]
Ambiguity (cont.)
• We prefer the second parse tree (the else matches the closest if)
• So, we have to disambiguate our grammar to reflect this choice

• Unambiguous grammar:

  stmt → matchedstmt | unmatchedstmt
  matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts
  unmatchedstmt → if expr then stmt |
                  if expr then matchedstmt else unmatchedstmt
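As a side note not taken from the slides: in a hand-written recursive-descent parser, the same "else matches the closest if" behaviour is often obtained simply by greedily consuming an else right after parsing the inner statement. A hypothetical, heavily simplified Python fragment:

    def parse_stmt(toks):
        """Hypothetical, heavily simplified statement parser: toks is a list of tokens
        consumed from the front; 'E1', 'S1', ... stand in for real expressions and
        statements (illustrative only)."""
        if toks and toks[0] == "if":
            toks.pop(0)                        # 'if'
            toks.pop(0)                        # the condition, e.g. 'E1' (kept trivial here)
            assert toks.pop(0) == "then"
            parse_stmt(toks)                   # the inner statement
            if toks and toks[0] == "else":     # greedy: the else attaches to the closest if
                toks.pop(0)
                parse_stmt(toks)
        else:
            toks.pop(0)                        # otherstmts

    parse_stmt(["if", "E1", "then", "if", "E2", "then", "S1", "else", "S2"])
    # The else is consumed while parsing the inner if, matching the preferred parse tree.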
Ambiguity – Operator Precedence
• Ambiguous grammars (because of ambiguous operators) can be disambiguated according to the precedence and associativity rules

  E → E+E | E*E | E^E | id | (E)

  disambiguate the grammar using precedence (highest first) and associativity:
    ^  (right to left)
    *  (left to right)
    +  (left to right)

  E → E+T | T
  T → T*F | F
  F → G^F | G
  G → id | (E)
Left Recursion

• A grammar is left recursive if it has a non-terminal A such that there is a derivation
    A ⇒+ Aα    for some string α
• Top-down parsing techniques cannot handle left-recursive grammars
• Conversion of a left-recursive grammar into an equivalent grammar that is not left-recursive is essential
• Left-recursion may appear
  - in a single step of the derivation (immediate left-recursion), or
  - in more than one step of the derivation
Immediate Left-Recursion
A A  |  where  does not start with A
 eliminate immediate left recursion
A   A’
A’   A’ |  an equivalent grammar

In general,
A  A 1 | ... | A m | 1 | ... | n h 1 ... n do
where d not start with
i hA
 eliminate immediate left recursion
A  1 A’ | ... | n A’
A’  1 A’ | ... | m A’ |  an equivalent grammar
Immediate Left-Recursion -- Example

E → E+T | T
T → T*F | F
F → id | (E)

  ⇓ eliminate immediate left recursion

E  → T E'
E' → +T E' | ε
T  → F T'
T' → *F T' | ε
F  → id | (E)
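For illustration (not from the slides), the transformed grammar is directly usable by a top-down, recursive-descent parser, with one function per non-terminal and the ε-alternatives handled by simply returning. A minimal Python sketch, assuming the input is already tokenized into strings such as "id", "+", "*", "(" and ")":

    # Recursive-descent sketch for the left-recursion-free grammar above:
    #   E  → T E'      E' → +T E' | ε
    #   T  → F T'      T' → *F T' | ε
    #   F  → id | ( E )
    # Illustrative only; a real parser would also build the parse tree.

    def parse(tokens):
        pos = 0

        def peek():
            return tokens[pos] if pos < len(tokens) else "$"

        def match(expected):
            nonlocal pos
            if peek() != expected:
                raise SyntaxError(f"expected {expected}, got {peek()}")
            pos += 1

        def E():                      # E → T E'
            T(); E_prime()

        def E_prime():                # E' → +T E' | ε
            if peek() == "+":
                match("+"); T(); E_prime()
            # otherwise take the ε production

        def T():                      # T → F T'
            F(); T_prime()

        def T_prime():                # T' → *F T' | ε
            if peek() == "*":
                match("*"); F(); T_prime()
            # otherwise take the ε production

        def F():                      # F → id | ( E )
            if peek() == "id":
                match("id")
            elif peek() == "(":
                match("("); E(); match(")")
            else:
                raise SyntaxError(f"unexpected token {peek()}")

        E()
        if peek() != "$":
            raise SyntaxError("extra input after the expression")

    parse(["id", "+", "id", "*", "id"])   # accepts; raises SyntaxError on ill-formed input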
Left-Recursion -- Problem
• A grammar may not be immediately left-recursive, but it can still be left-recursive
• Eliminating only the immediate left-recursion does not guarantee a grammar which is not left-recursive

S → Aa | b
A → Sc | d        This grammar is not immediately left-recursive, but it is still left-recursive:

S ⇒ Aa ⇒ Sca   or
A ⇒ Sc ⇒ Aac
both cause a left-recursion

• Solution: eliminate all left-recursions from the grammar
Eliminate Left-Recursion -- Algorithm
- Arrange non-terminals in some order: A1 ... An
- for i from 1 to n do {
    for j from 1 to i-1 do {
      replace each production
        Ai → Aj γ
      by
        Ai → α1 γ | ... | αk γ
      where Aj → α1 | ... | αk
    }
    eliminate immediate left-recursions among the Ai productions
  }
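An illustrative Python sketch of this algorithm (not from the slides). Productions are kept as a dict mapping each non-terminal to a list of bodies, where a body is a list of symbols and the empty list stands for ε; primed names such as A' are generated for the new non-terminals.

    def eliminate_left_recursion(non_terminals, productions):
        """non_terminals: list A1 ... An in the chosen order.
        productions: dict from each non-terminal to a list of bodies,
        where a body is a list of symbols and [] stands for ε.
        Returns an equivalent grammar without left recursion (illustrative sketch)."""
        prods = {a: [list(b) for b in bodies] for a, bodies in productions.items()}
        for i, Ai in enumerate(non_terminals):
            # Inner loop: replace Ai → Aj γ (j < i) by expanding Aj's bodies.
            for Aj in non_terminals[:i]:
                new_bodies = []
                for body in prods[Ai]:
                    if body and body[0] == Aj:
                        new_bodies += [b + body[1:] for b in prods[Aj]]
                    else:
                        new_bodies.append(body)
                prods[Ai] = new_bodies
            # Eliminate immediate left recursion among the Ai productions.
            recursive = [b[1:] for b in prods[Ai] if b and b[0] == Ai]
            others    = [b for b in prods[Ai] if not (b and b[0] == Ai)]
            if recursive:
                Ai_new = Ai + "'"
                prods[Ai] = [b + [Ai_new] for b in others]
                prods[Ai_new] = [b + [Ai_new] for b in recursive] + [[]]   # [] = ε
        return prods

    # Example from the slides: S → Aa | b, A → Ac | Sd | f, order S, A
    g = eliminate_left_recursion(["S", "A"],
                                 {"S": [["A", "a"], ["b"]],
                                  "A": [["A", "c"], ["S", "d"], ["f"]]})
    # g["A"] == [['b','d',"A'"], ['f',"A'"]] and g["A'"] == [['c',"A'"], ['a','d',"A'"], []],
    # i.e. A → bdA' | fA' and A' → cA' | adA' | ε, matching the slides.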
Eliminate Left-Recursion -- Example
S → Aa | b
A → Ac | Sd | f

- Order of non-terminals: S, A

for S:
  - we do not enter the inner loop
  - there is no immediate left recursion in S

for A:
  - Replace A → Sd with A → Aad | bd
    So, we will have A → Ac | Aad | bd | f
  - Eliminate the immediate left-recursion in A:
      A  → bdA' | fA'
      A' → cA' | adA' | ε

So, the resulting equivalent grammar which is not left-recursive is:
  S  → Aa | b
  A  → bdA' | fA'
  A' → cA' | adA' | ε
Eliminate Left-Recursion – Example2
S → Aa | b
A → Ac | Sd | f

- Order of non-terminals: A, S

for A:
  - we do not enter the inner loop
  - Eliminate the immediate left-recursion in A:
      A  → SdA' | fA'
      A' → cA' | ε

for S:
  - Replace S → Aa with S → SdA'a | fA'a
    So, we will have S → SdA'a | fA'a | b
  - Eliminate the immediate left-recursion in S:
      S  → fA'aS' | bS'
      S' → dA'aS' | ε

So, the resulting equivalent grammar which is not left-recursive is:
  S  → fA'aS' | bS'
  S' → dA'aS' | ε
  A  → SdA' | fA'
  A' → cA' | ε
Left-Factoring
• A top-down parser without backtracking (a predictive parser) insists that the grammar must be left-factored

  grammar → a new equivalent grammar suitable for predictive parsing

  stmt → if expr then stmt else stmt |
         if expr then stmt

• After seeing if, we cannot decide which production rule to choose to re-write stmt in the derivation
Left-Factoring (cont.)
• In general,

    A → αβ1 | αβ2    where α is non-empty and the first symbols of β1 and β2 (if they have one) are different

• While processing α we cannot decide whether to expand
    A to αβ1   or
    A to αβ2

• Re-write the grammar as follows:
    A  → αA'
    A' → β1 | β2      so, we can immediately expand A to αA'
Left-Factoring -- Algorithm
• For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix, let us say

    A → αβ1 | ... | αβn | γ1 | ... | γm

  convert it into

    A  → αA' | γ1 | ... | γm
    A' → β1 | ... | βn
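An illustrative Python sketch of one left-factoring step (not from the slides). Unlike the slides' examples, which factor a whole common prefix such as cd at once, this sketch factors on the first symbol only, so it may need to be applied repeatedly; new non-terminal names such as A' and A'' are generated mechanically.

    from itertools import groupby

    def left_factor_once(head, bodies):
        """Factor the alternatives of `head` that share a common first symbol.
        bodies: list of right-hand sides (lists of symbols); [] denotes ε.
        Returns (new bodies for head, dict of new non-terminals). Illustrative sketch."""
        new_bodies, new_rules = [], {}
        keyed = sorted(bodies, key=lambda b: b[0] if b else "")
        for first, group in groupby(keyed, key=lambda b: b[0] if b else ""):
            group = list(group)
            if first and len(group) > 1:                   # common non-empty prefix found
                fresh = head + "'" * (len(new_rules) + 1)  # A', A'', ...
                new_bodies.append([first, fresh])          # A → a A'
                new_rules[fresh] = [b[1:] for b in group]  # A' → remainders (possibly ε)
            else:
                new_bodies.extend(group)
        return new_bodies, new_rules

    # Example 1 from the slides: A → abB | aB | cdg | cdeB | cdfB
    bodies = [["a","b","B"], ["a","B"], ["c","d","g"], ["c","d","e","B"], ["c","d","f","B"]]
    print(left_factor_once("A", bodies))
    # One step gives A → aA' | cA'' with A' → bB | B and A'' → dg | deB | dfB;
    # applying the step again to A'' factors out d, reaching the slides' final grammar
    # (the slides factor the whole prefix cd in a single step instead).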
Left-Factoring – Example1

A → abB | aB | cdg | cdeB | cdfB

  ⇓

A  → aA' | cdg | cdeB | cdfB
A' → bB | B

  ⇓

A   → aA' | cdA''
A'  → bB | B
A'' → g | eB | fB
Left-Factoring – Example2

A → ad | a | ab | abc | b

  ⇓

A  → aA' | b
A' → d | ε | b | bc

  ⇓

A   → aA' | b
A'  → d | ε | bA''
A'' → ε | c
Non-Context-Free Language Constructs

• Some language constructs in programming languages are not context-free

• Example-1: L1 = { ωcω | ω is in (a|b)* }
  - declaring an identifier and later checking whether it has been declared. We cannot do this with a context-free language; we need the semantic analyzer (which is not context-free)

• Example-2: L2 = { aⁿbᵐcⁿdᵐ | n ≥ 1 and m ≥ 1 }
  - declaring two functions (one with n parameters, the other one with m parameters), and then calling them with actual parameters
