Chapter 4 Syntax Analysis
Contents – Plan of attack
• Introduction to syntax analysis
• Role of the parser
• Derivation
• Ambiguity
• Classification of parsing
– Bottom up parsing
Class Discussion
1. What is the role of syntax analysis in the compilation process?
2. How does syntax analysis differ from lexical analysis?
3. What is a context-free grammar, and how is it used to describe the
syntax of a language?
4. What are the components of a context-free grammar?
5. Can a grammar have multiple derivations for the same sentence? If
so, what are the implications?
6. How can we identify and resolve ambiguity in a grammar?
7. How do top-down and bottom-up parsing differ in their approach?
Syntax Analysis
The syntax analyzer receives the source program from the lexical analyzer as a stream of tokens.
Syntax analysis is also called parsing.
• The tokens are then checked for proper syntax
– the compiler checks that the statements and expressions are correctly formed.
– It checks whether the given input conforms to the syntax of the programming language.
– It constructs the parse tree for this check.
• Type-3 grammars must have a single non-terminal on the left-hand side and, on the right-hand side, either a single terminal or a single terminal followed by a single non-terminal.
– X → a or X → aY
• The rule S → ε is allowed if S does not appear on the right-hand side of any rule.
Type-2 Grammar
G: E → E O E | (E) | -E | id
O → + | - | * | / | ↑
• Write the terminals, non-terminals, start symbol, and productions for the following grammar.
– Terminals: id + - * / ↑ ( )
– Non terminals: E, O
– Start symbol: E
– Productions: E → E O E | (E) | -E | id
O → + | - | * | / | ↑
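The classification above can be checked mechanically. Below is a minimal Python sketch, assuming a grammar is stored as a dictionary from each non-terminal to its list of alternatives, and that the left-hand side of the first rule is the start symbol. The names `GRAMMAR`, `START` and `classify` are illustrative only.

```python
# Grammar G from the slide: each non-terminal maps to its alternatives,
# and each alternative is a list of grammar symbols.
GRAMMAR = {
    "E": [["E", "O", "E"], ["(", "E", ")"], ["-", "E"], ["id"]],
    "O": [["+"], ["-"], ["*"], ["/"], ["↑"]],
}
START = "E"  # assumption: the first rule's left-hand side is the start symbol


def classify(grammar):
    """Return (non-terminals, terminals) of a grammar given as a dict."""
    nonterminals = set(grammar)  # every left-hand side is a non-terminal
    terminals = {
        sym
        for alts in grammar.values()
        for alt in alts
        for sym in alt
        if sym not in nonterminals  # anything else on a right-hand side
    }
    return nonterminals, terminals


nts, ts = classify(GRAMMAR)
print("Non-terminals:", sorted(nts))
print("Terminals:", sorted(ts))
print("Start symbol:", START)
```

Note that `-` is a single terminal here even though it is used both as unary minus (E → -E) and as a binary operator (O → -).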
Example #2 - Context-Free Grammars
• G: S → AB
A → aAA
A → aA
A → a
B → bB
B → b
Q1. Identify the start variable, terminal symbols, non-terminals and production rules.
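CFG: Converting a left linear grammar to a right linear grammar
• Algorithm
1) If the left linear grammar has a rule with the start symbol S on the right-hand side, add the rule S0 → S and make S0 the new start symbol.
2) If the left linear grammar has a rule A → p, then add the following rule to the right linear grammar: S → pA, where S is the start symbol.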
Left linear:
S → Aa
A → ab
Right linear:
S → abA
A → a
Both grammars generate this language: {aba}
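Example: Convert this left linear grammar to a right linear grammar.
S → Ab
S → Sb
A → Aa
A → a
Step 1: The start symbol S appears on the right-hand side (S → Sb), so add S0 → S; S0 becomes the start symbol.
Step 2 (right-hand side has only terminals): If the left linear grammar has a rule A → p, then add the following rule to the right linear grammar: S → pA (S being the start symbol, here S0). So A → a gives S0 → aA.
Step 3 (right-hand side begins with a non-terminal): converting the remaining rules gives
Left linear      Right linear
S0 → S           S → ε
S → Ab           A → bS
S → Sb           S → bS
A → Aa           A → aA
A → a            S0 → aA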
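The conversion can be sketched in Python. This is a minimal sketch, assuming productions are `(lhs, rhs)` pairs with `rhs` a tuple of symbols, non-terminals are exactly the symbols that appear on a left-hand side, and `()` stands for ε. Rules 1 and 2 are the ones listed above; the handling of rules of the form B → Ap (giving A → pB) and S → Ap with S the start symbol (giving A → p) is an assumption, taken from the standard construction and chosen because it reproduces both examples. The function name `left_to_right_linear` is hypothetical.

```python
def left_to_right_linear(productions, start):
    """Convert a left-linear grammar to an equivalent right-linear one.

    productions: list of (lhs, rhs) pairs, rhs a tuple of symbols (() is ε).
    Returns (new_productions, new_start).
    """
    nonterminals = {lhs for lhs, _ in productions}

    # Rule 1: if the start symbol appears on a right-hand side,
    # add a fresh start symbol S0 with S0 -> S (assumes S0 is unused).
    if any(start in rhs for _, rhs in productions):
        new_start = start + "0"
        productions = [(new_start, (start,))] + list(productions)
        nonterminals.add(new_start)
        start = new_start

    result = []
    for lhs, rhs in productions:
        if rhs and rhs[0] in nonterminals:
            head, p = rhs[0], rhs[1:]
            if lhs == start:
                # assumed rule: S -> A p  becomes  A -> p
                result.append((head, p))
            else:
                # assumed rule: B -> A p  becomes  A -> p B
                result.append((head, p + (lhs,)))
        elif lhs == start:
            # assumed edge case: S -> p (only terminals) stays S -> p
            result.append((start, rhs))
        else:
            # Rule 2: A -> p  becomes  S -> p A
            result.append((start, rhs + (lhs,)))
    return result, start


def show(rules):
    for lhs, rhs in rules:
        print(lhs, "→", "".join(rhs) or "ε")


rules1, start1 = left_to_right_linear([("S", ("A", "a")), ("A", ("a", "b"))], "S")
show(rules1)   # A → a, S → abA

rules2, start2 = left_to_right_linear(
    [("S", ("A", "b")), ("S", ("S", "b")), ("A", ("A", "a")), ("A", ("a",))], "S")
show(rules2)   # S → ε, A → bS, S → bS, A → aA, S0 → aA
```

The second call reproduces the five right linear rules of the worked example, with S0 as the new start symbol.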
Types of derivation:
1. Leftmost derivation
2. Rightmost derivation
Leftmost Derivation
Example:
Rules: E → E+E | E*E | -E | (E) | id
Input: -(id + id)
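A leftmost derivation always rewrites the leftmost non-terminal of the current sentential form. The small Python sketch below applies a given sequence of productions in that way and records every sentential form. The function name `leftmost_derivation` and the list-of-steps input are illustrative assumptions; choosing which production to apply at each step is left to the caller.

```python
def leftmost_derivation(start, steps, nonterminals):
    """Apply each (lhs, rhs) step to the leftmost non-terminal and
    return the list of sentential forms, beginning with `start`."""
    form = [start]
    forms = ["".join(form)]
    for lhs, rhs in steps:
        # locate the leftmost non-terminal in the current sentential form
        i = next((k for k, sym in enumerate(form) if sym in nonterminals), None)
        if i is None or form[i] != lhs:
            raise ValueError(f"{lhs} → {''.join(rhs)} does not rewrite the leftmost non-terminal")
        form = form[:i] + list(rhs) + form[i + 1:]
        forms.append("".join(form))
    return forms


steps = [
    ("E", ["-", "E"]),
    ("E", ["(", "E", ")"]),
    ("E", ["E", "+", "E"]),
    ("E", ["id"]),
    ("E", ["id"]),
]
print(" ⇒ ".join(leftmost_derivation("E", steps, {"E"})))
# E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
```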
DERIVATION TREES
• Example -1: A grammar G which is context-free has the
productions
S → aAB
A → Bba
B → bB
B→c
• The word w = acbabc is derived as follows:
S ⇒ aAB
⇒ a(Bba)B
⇒ acbaB
⇒ acba(bB)
⇒ acbabc.
• Obtain the derivation tree.
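Because a leftmost derivation expands non-terminals in preorder (left to right, top down), the derivation tree can be rebuilt directly from the sequence of rules used. The Python sketch below does this for w = acbabc. The tuple representation `(label, children)` and the helper names `build_tree`, `tree_yield` and `show` are illustrative assumptions.

```python
def build_tree(symbol, steps, nonterminals):
    """Rebuild the derivation tree from an iterator over the (lhs, rhs)
    rules of a leftmost derivation. Children are built left to right,
    each non-terminal consuming the next rule."""
    if symbol not in nonterminals:
        return symbol                      # leaf: a terminal
    lhs, rhs = next(steps)
    if lhs != symbol:
        raise ValueError(f"expected a rule for {symbol}, got {lhs}")
    return (symbol, [build_tree(s, steps, nonterminals) for s in rhs])


def tree_yield(tree):
    """Concatenate the leaves from left to right."""
    if isinstance(tree, str):
        return tree
    return "".join(tree_yield(child) for child in tree[1])


def show(tree, indent=0):
    """Print the tree with one node per line, children indented."""
    label = tree if isinstance(tree, str) else tree[0]
    print("  " * indent + label)
    if not isinstance(tree, str):
        for child in tree[1]:
            show(child, indent + 1)


# Rules of the leftmost derivation S ⇒ aAB ⇒ aBbaB ⇒ acbaB ⇒ acbabB ⇒ acbabc
steps = iter([
    ("S", ["a", "A", "B"]),
    ("A", ["B", "b", "a"]),
    ("B", ["c"]),
    ("B", ["b", "B"]),
    ("B", ["c"]),
])
tree = build_tree("S", steps, {"S", "A", "B"})
show(tree)
print(tree_yield(tree))  # acbabc
```

Reading the leaves of the tree from left to right gives back the word w.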
Exercise- Derivation
1. Perform a leftmost derivation and draw the parse tree.
S → A1B
A → 0A | ε
B → 0B | 1B | ε
Output string: id + id * id
Ambiguity
• Ambiguity is the property of a word, phrase, or statement that has more than one meaning.
A grammar that produces more than one parse tree for some sentence is said to be ambiguous. Equivalently, an ambiguous grammar is one that produces more than one leftmost derivation, or more than one rightmost derivation, for the same sentence.
Ambiguous grammar
Grammar: S→S+S | (S) | a Output string: a+a+a
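One way to confirm the ambiguity is to count parse trees. The Python sketch below counts, with memoized dynamic programming over substrings, how many distinct parse trees the grammar S → S+S | (S) | a gives a token string. It assumes single-character tokens; the function name `count_parses` is illustrative.

```python
from functools import lru_cache


def count_parses(tokens):
    """Number of distinct parse trees of S → S+S | (S) | a for `tokens`."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Parse trees rooted at S whose yield is toks[i:j]."""
        n = 0
        if j - i == 1 and toks[i] == "a":                         # S → a
            n += 1
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":  # S → (S)
            n += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):                             # S → S+S, '+' at k
            if toks[k] == "+":
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(toks))


print(count_parses("a+a+a"))    # 2: (a+a)+a and a+(a+a)
print(count_parses("(a+a)+a"))  # 1: the parentheses remove the choice
```

Any sentence with two or more parse trees is a witness that the grammar is ambiguous; a+a+a has two.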
In other words, if, in the derivation process starting from some non-terminal A, a sentential form begins with the same non-terminal A, then we say that the grammar is left recursive. For example:
E → E+T | T
T → T*F | F
F → (E) | id
Left Recursion Elimination
Example #2: eliminate left recursion
S → Ab | a
A → Ab | Aba | aa
Solution:
S → Ab | a
A → aaA1
A1 → bA1 | baA1 | ϵ
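The transformation used in Example #2 (immediate left recursion only) can be sketched in Python: A → Aα1 | … | Aαm | β1 | … | βn becomes A → β1A' | … | βnA' and A' → α1A' | … | αmA' | ε. The sketch assumes alternatives are tuples of symbols with `()` for ε; the function name and the `new_nt` parameter are illustrative. It does not handle indirect left recursion.

```python
def eliminate_immediate_left_recursion(nt, alternatives, new_nt=None):
    """Remove immediate left recursion from the rules of non-terminal `nt`.

    A → Aα1 | … | Aαm | β1 | … | βn   becomes
    A  → β1A' | … | βnA'
    A' → α1A' | … | αmA' | ε
    """
    new_nt = new_nt or nt + "'"
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    betas = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not alphas:
        return {nt: list(alternatives)}   # nothing to do
    return {
        nt: [beta + (new_nt,) for beta in betas],
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [()],   # () is ε
    }


# Example #2: A → Ab | Aba | aa   (S → Ab | a is not left recursive)
result = eliminate_immediate_left_recursion(
    "A", [("A", "b"), ("A", "b", "a"), ("a", "a")], new_nt="A1")
for lhs, alts in result.items():
    print(lhs, "→", " | ".join("".join(a) or "ϵ" for a in alts))
# A → aaA1
# A1 → bA1 | baA1 | ϵ
```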
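Eliminating Ambiguity - Left Factoring
If A → αβ1 | αβ2 | … | αβn | γ1 | γ2 | … | γm, where no γi begins with α, then the productions are rewritten as
A → αA’ | γ1 | γ2 | … | γm
A’ → β1 | β2 | β3 | … | βn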
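The rule above can be sketched as one round of left factoring in Python. The sketch groups alternatives by their first symbol, factors out the longest common prefix α of each group, and introduces a new non-terminal A’ (named by appending `'`). The example grammar S → iEtS | iEtSeS | a is the classic dangling-else grammar, used here as an assumption since the slide's own example is not shown. Alternatives are tuples of symbols, with `()` for ε; the function name is illustrative. A new A’ may itself need factoring, so the step can be repeated.

```python
def left_factor(nt, alternatives):
    """One round of left factoring for non-terminal `nt`.

    Alternatives sharing a first symbol are replaced by A → αA'
    with A' → β1 | β2 | …, where α is their longest common prefix.
    """
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[:1], []).append(alt)   # group by first symbol
    rules = {nt: []}
    primes = 0
    for alts in groups.values():
        if len(alts) == 1 or not alts[0]:
            rules[nt].extend(alts)                   # nothing to factor
            continue
        alpha = []                                   # longest common prefix α
        for column in zip(*alts):
            if len(set(column)) != 1:
                break
            alpha.append(column[0])
        primes += 1
        new_nt = nt + "'" * primes                   # A', A'', ...
        rules[nt].append(tuple(alpha) + (new_nt,))
        rules[new_nt] = [alt[len(alpha):] for alt in alts]   # () is ε
    return rules


rules = left_factor("S", [("i", "E", "t", "S"), ("i", "E", "t", "S", "e", "S"), ("a",)])
for lhs, alts in rules.items():
    print(lhs, "→", " | ".join("".join(a) or "ε" for a in alts))
# S → iEtSS' | a
# S' → ε | eS
```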
Left Factoring - Elimination
Example #1:
Definition of parser
Ways of generating Parse tree
Classification of parsers
Top down Parsing - Backtracking
• Backtracking is a top-down parsing method that may involve repeated scans of the input.
• If a mismatch occurs, the parser tries another alternative.
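A backtracking top-down parser can be sketched with Python generators: each non-terminal tries its alternatives in order, and a failure later in the input makes the parser resume with the next alternative. The grammar S → cAd, A → ab | a is a common textbook illustration and is an assumption here, not taken from these slides. Terminals are single characters, and the function name `backtracking_parse` is illustrative. Like any naive top-down parser, this sketch loops forever on a left-recursive grammar.

```python
GRAMMAR = {
    "S": [("c", "A", "d")],
    "A": [("a", "b"), ("a",)],   # alternatives are tried in this order
}


def backtracking_parse(grammar, start, text):
    """Return True if `text` (one character per terminal) derives from `start`."""

    def match(symbols, pos):
        # Yield every input position reachable after matching `symbols` from `pos`.
        if not symbols:
            yield pos
            return
        first, rest = symbols[0], symbols[1:]
        if first in grammar:                      # non-terminal: try each alternative
            for alt in grammar[first]:
                for mid in match(alt, pos):
                    yield from match(rest, mid)   # if this fails, the next choice is tried
        elif pos < len(text) and text[pos] == first:
            yield from match(rest, pos + 1)       # terminal matched: advance the input
        # otherwise: mismatch, nothing is yielded and the caller backtracks

    return any(end == len(text) for end in match((start,), 0))


print(backtracking_parse(GRAMMAR, "S", "cad"))   # True, after A → ab fails on 'd'
print(backtracking_parse(GRAMMAR, "S", "cd"))    # False
```

For the input cad, the parser first tries A → ab, fails when it sees d instead of b, rescans from the same position with A → a, and then succeeds.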
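Top down Parsing - LL(1) parsing (predictive parsing)
Simplification of Rule 3
• If A → Y1Y2 … YK,
– FIRST(A) = FIRST(Y1), if Y1 does not derive ε.
– If Y1 derives ε, then FIRST(A) = (FIRST(Y1) − {ε}) ∪ FIRST(Y2).
– If Y1 and Y2 derive ε, then FIRST(A) = (FIRST(Y1) − {ε}) ∪ (FIRST(Y2) − {ε}) ∪ FIRST(Y3).
– If Y1, Y2 and Y3 derive ε, then FIRST(A) = (FIRST(Y1) − {ε}) ∪ (FIRST(Y2) − {ε}) ∪ (FIRST(Y3) − {ε}) ∪ FIRST(Y4), and so on; ε is in FIRST(A) only if every Yi derives ε.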
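Rule 3 and the usual FOLLOW rules can be computed by iterating until nothing changes. The Python sketch below does this for the expression grammar used in the LL(1) example (E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id). It assumes grammars are dictionaries of alternatives, writes E’ as `E'`, and uses an empty list for an ε-production; the function names are illustrative.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],     # [] is an ε-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_seq(seq, first):
    """FIRST of Y1 Y2 … Yk using Rule 3: keep adding FIRST(Yi) − {ε}
    while the preceding symbols all derive ε."""
    result = set()
    for sym in seq:
        f = first[sym] if sym in first else {sym}   # terminal a: FIRST(a) = {a}
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)                                 # every Yi derives ε (or seq is empty)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                                  # iterate to a fixed point
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_seq(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                          # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue                    # only non-terminals have FOLLOW sets
                    tail = first_of_seq(alt[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:                 # what follows nt can follow sym
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


first = compute_first(GRAMMAR)
follow = compute_follow(GRAMMAR, "E", first)
for nt in GRAMMAR:
    print(f"FIRST({nt}) = {sorted(first[nt])}   FOLLOW({nt}) = {sorted(follow[nt])}")
```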
Example - predictive parsing - LL(1) parsing
     | +       | *       | (     | )    | id    | $
E    |         |         | E→TE’ |      | E→TE’ |
E’   | E’→+TE’ |         |       | E’→ε |       | E’→ε
T    |         |         | T→FT’ |      | T→FT’ |
T’   | T’→ε    | T’→*FT’ |       | T’→ε |       | T’→ε
F    |         |         | F→(E) |      | F→id  |
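Explanations: FIRST(E), FIRST(T) and FIRST(F) contain {(, id}, hence place E → TE’ and T → FT’ under both ( and id, F → (E) under ( and F → id under id. E’ → +TE’ and T’ → *FT’ go under + and * respectively. FIRST(E’) and FIRST(T’) contain ε, so place E’ → ε and T’ → ε under the terminals in FOLLOW(E’) = {), $} and FOLLOW(T’) = {+, ), $}.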
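The table construction and the predictive parsing loop can be sketched in Python. To keep the block self-contained, the FIRST and FOLLOW sets of the expression grammar are written out by hand (they are the standard values for this grammar) rather than computed. A production A → α goes into M[A, a] for each a in FIRST(α) − {ε}, and into M[A, b] for each b in FOLLOW(A) when ε ∈ FIRST(α). The function names are illustrative.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],     # [] is an ε-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
# FIRST and FOLLOW of each non-terminal (assumed precomputed)
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}


def first_of_seq(seq):
    out = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})         # terminal a: FIRST(a) = {a}
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def build_table():
    """Fill M[A, a]; raise if any cell gets two productions (not LL(1))."""
    table = {}
    for nt, alts in GRAMMAR.items():
        for alt in alts:
            f = first_of_seq(alt)
            targets = f - {EPS}
            if EPS in f:
                targets |= FOLLOW[nt]     # ε-case: use FOLLOW(A)
            for a in targets:
                if (nt, a) in table:
                    raise ValueError(f"not LL(1): conflict at M[{nt}, {a}]")
                table[(nt, a)] = alt
    return table


def predictive_parse(tokens, table, start="E"):
    """Stack-based LL(1) parser; returns (accepted, productions used)."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]
    pos, used = 0, []
    while stack:
        top, a = stack.pop(), tokens[pos]
        if top not in GRAMMAR:            # terminal or $: must match the input
            if top != a:
                return False, used
            pos += 1
            continue
        alt = table.get((top, a))
        if alt is None:                   # blank table entry: syntax error
            return False, used
        used.append((top, alt))
        stack.extend(reversed(alt))       # push the right-hand side, leftmost on top
    return pos == len(tokens), used


table = build_table()
ok, used = predictive_parse(["id", "+", "id", "*", "id"], table)
print(ok)
for lhs, rhs in used:
    print(lhs, "→", "".join(rhs) or "ε")
```

The list of productions printed for id + id * id is exactly its leftmost derivation, which is what a predictive parser produces.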
Top down Parsing - Recursive Descent Parsing
• A recursive descent parser executes a set of recursive procedures to process the input without backtracking.
– There is a procedure for each non-terminal in the grammar.
– The RHS of each production rule serves as the definition of the procedure.
– As it reads an expected input symbol, it advances the input pointer to the next position.
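The points above can be sketched as a small recursive descent parser in Python. It assumes the same left-factored expression grammar as the LL(1) example (E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id) and a pre-tokenized input. The class, method and exception names are illustrative; each method mirrors the right-hand side of its non-terminal, and one token of lookahead picks the alternative, so no backtracking is needed.

```python
class ParseError(Exception):
    """Raised when the lookahead does not match what the grammar expects."""


class RecursiveDescentParser:
    # Assumed grammar:
    #   E -> T E'     E' -> + T E' | ε
    #   T -> F T'     T' -> * F T' | ε
    #   F -> ( E ) | id
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0                          # the input pointer

    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead() != expected:
            raise ParseError(f"expected {expected!r}, found {self.lookahead()!r}")
        self.pos += 1                         # advance the input pointer

    # One procedure per non-terminal; each body follows its rule's right-hand side.
    def E(self):
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.lookahead() == "+":           # E' -> + T E'
            self.match("+")
            self.T()
            self.E_prime()
        # otherwise E' -> ε: consume nothing

    def T(self):
        self.F()
        self.T_prime()

    def T_prime(self):
        if self.lookahead() == "*":           # T' -> * F T'
            self.match("*")
            self.F()
            self.T_prime()
        # otherwise T' -> ε

    def F(self):
        if self.lookahead() == "(":           # F -> ( E )
            self.match("(")
            self.E()
            self.match(")")
        else:                                 # F -> id
            self.match("id")


def accepts(tokens):
    parser = RecursiveDescentParser(tokens)
    try:
        parser.E()
        parser.match("$")                     # the whole input must be consumed
        return True
    except ParseError:
        return False


print(accepts(["id", "+", "id", "*", "id"]))   # True
print(accepts(["id", "+"]))                    # False
```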
In bottom-up parsing, we start from the sentence and apply the production rules in reverse in order to reach the start symbol.
Shift-Reduce
LR Parsing
SLR Parsing
CLR Parsing
LALR Parsing
Bottom up parsing - Shift-Reduce Parsers
Shift-reduce parsers are a type of bottom-up parser used in the syntax analysis of programming languages.
They operate by shifting input symbols onto a stack and, when the symbols on top of the stack match the right-hand side of a grammar rule, reducing them to that rule's left-hand side non-terminal.
Bottom up parsing - SHIFT-REDUCE PARSING
Example: parsing id*(id+id) with the grammar E → E+E | E*E | (E) | id
Step | Stack    | Input string | Action
1    | $        | id*(id+id)$  | Shift
2    | $id      | *(id+id)$    | Reduce (E → id)
3    | $E       | *(id+id)$    | Shift
4    | $E*      | (id+id)$     | Shift
5    | $E*(     | id+id)$      | Shift
6    | $E*(id   | +id)$        | Reduce (E → id)
7    | $E*(E    | +id)$        | Shift
8    | $E*(E+   | id)$         | Shift
9    | $E*(E+id | )$           | Reduce (E → id)
10   | $E*(E+E  | )$           | Reduce (E → E+E)
11   | $E*(E    | )$           | Shift
12   | $E*(E)   | $            | Reduce (E → (E))
13   | $E*E     | $            | Reduce (E → E*E)
14   | $E       | $            | Accept
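The trace can be reproduced by a small Python sketch. It assumes the grammar E → E+E | E*E | (E) | id and uses a deliberately naive strategy: reduce whenever the top of the stack matches a right-hand side, otherwise shift. That strategy happens to produce the correct trace for id*(id+id), but it is not a general way to find handles (for id+id*id it would reduce E+E too early); LR and operator-precedence parsers make that decision properly. The function name and the trace format are illustrative.

```python
# Grammar assumed for this trace
RULES = [
    ("E", ["E", "+", "E"]),
    ("E", ["E", "*", "E"]),
    ("E", ["(", "E", ")"]),
    ("E", ["id"]),
]


def shift_reduce(tokens, start="E"):
    """Naive shift-reduce parser: reduce whenever the top of the stack
    matches a right-hand side, otherwise shift. Returns (accepted, trace)."""
    stack, inp = ["$"], list(tokens) + ["$"]
    trace = []
    while True:
        config = ("".join(stack), "".join(inp))
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:          # right-hand side on top of stack
                trace.append(config + (f"Reduce {lhs} → {''.join(rhs)}",))
                del stack[-len(rhs):]
                stack.append(lhs)
                break
        else:                                     # no reduction: accept, shift or fail
            if stack == ["$", start] and inp == ["$"]:
                trace.append(config + ("Accept",))
                return True, trace
            if inp == ["$"]:
                trace.append(config + ("Error",))
                return False, trace
            trace.append(config + ("Shift",))
            stack.append(inp.pop(0))


ok, trace = shift_reduce(["id", "*", "(", "id", "+", "id", ")"])
for step, (stack, rest, action) in enumerate(trace, 1):
    print(f"{step:>2}  {stack:<10} {rest:>12}  {action}")
```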
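LR Parser
Types of LR Parser
– Example#1: LR(0) parsing for the grammar G:
(1) S → AA
(2) A → aA
(3) A → b
– Prepare the LR(0) parsing table.
Canonical collection of LR(0) items (S1 is the augmented start symbol):
I0: S1 → .S, S → .AA, A → .aA, A → .b
I1 = goto(I0, S): S1 → S.
I2 = goto(I0, A): S → A.A, A → .aA, A → .b
I3 = goto(I0, a): A → a.A, A → .aA, A → .b
I4 = goto(I0, b): A → b.
I5 = goto(I2, A): S → AA.
I6 = goto(I3, A): A → aA.
Also goto(I2, a) = goto(I3, a) = I3 and goto(I2, b) = goto(I3, b) = I4.
In the table, S denotes a shift action and the number that follows indicates the state number (S4 means shift and go to state 4); rN means reduce by production (N).
      | Action             | Goto
State | a  | b  | $      | A | S
0     | S3 | S4 |        | 2 | 1
1     |    |    | Accept |   |
2     | S3 | S4 |        | 5 |
3     | S3 | S4 |        | 6 |
4     | r3 | r3 | r3     |   |
5     | r1 | r1 | r1     |   |
6     | r2 | r2 | r2     |   |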
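The LR(0) table above can be driven by a short Python loop: look up ACTION[state, lookahead]; on a shift push the new state; on a reduce pop one state per right-hand-side symbol and push GOTO[state, lhs]. The dictionary encoding of the table and the function name `lr_parse` are illustrative assumptions; the table entries themselves are copied from the slide.

```python
# Productions numbered as on the slide: (1) S → AA, (2) A → aA, (3) A → b
PRODUCTIONS = {1: ("S", 2), 2: ("A", 2), 3: ("A", 1)}   # number -> (lhs, |rhs|)

# ACTION part: ("s", n) = shift to state n, ("r", n) = reduce by production n
ACTION = {
    (0, "a"): ("s", 3), (0, "b"): ("s", 4),
    (1, "$"): ("acc", None),
    (2, "a"): ("s", 3), (2, "b"): ("s", 4),
    (3, "a"): ("s", 3), (3, "b"): ("s", 4),
}
for t in ("a", "b", "$"):        # LR(0): reduce entries fill the whole row
    ACTION[(4, t)] = ("r", 3)
    ACTION[(5, t)] = ("r", 1)
    ACTION[(6, t)] = ("r", 2)

# GOTO part
GOTO = {(0, "S"): 1, (0, "A"): 2, (2, "A"): 5, (3, "A"): 6}


def lr_parse(text):
    """Table-driven LR parse; returns (accepted, production numbers reduced)."""
    tokens = list(text) + ["$"]
    stack = [0]                  # stack of states, starting in state 0
    pos = 0
    reductions = []
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:        # blank cell: syntax error
            return False, reductions
        kind, n = entry
        if kind == "s":
            stack.append(n)
            pos += 1
        elif kind == "r":
            lhs, length = PRODUCTIONS[n]
            del stack[-length:]  # pop one state per right-hand-side symbol
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(n)
        else:
            return True, reductions


print(lr_parse("abb"))   # (True, [3, 2, 3, 1])
print(lr_parse("ab"))    # rejected: a string of A A needs two b's
```

The reductions 3, 2, 3, 1 for abb are the rightmost derivation S ⇒ AA ⇒ Ab ⇒ aAb ⇒ abb read in reverse.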
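States for the grammar S → A | a, A → a (augmented with S1 → S):
I0: S1 → .S, S → .A, S → .a, A → .a
I1: S1 → S.
I2: S → A.
I3: S → a., A → a.
State | a  | $      | S | A
0     | s3 |        | 1 | 2
1     |    | accept |   |
2     |    | r1     |   |
3     |    | r1/r2  |   |
If a table cell contains two entries, we call it a conflict. Here state 3 has a reduce-reduce conflict (reduce by S → a or by A → a), meaning the grammar is not SLR(1) and cannot be parsed by an SLR parser.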
Types of LR Parser – CLR(1) & LALR(1)
– Example#1: CLR(1) parsing for the grammar:
S → AA
A → aA | b
– Find the augmented grammar.
Operator-precedence Parsing
The operator-precedence parser is a shift-reduce parser that can be easily constructed by hand.
An operator-precedence parser can be constructed from a grammar called an operator grammar.
These grammars have the property that no production right-hand side is ε or has two adjacent non-terminals.
Example:
Given the production rules:
1. E → E + T | T
2. T → T * F | F
3. F → ( E ) | id
Leading(E): { +, *, (, id }     Trailing(E): { +, *, ), id }
Leading(T): { *, (, id }        Trailing(T): { *, ), id }
Leading(F): { (, id }           Trailing(F): { ), id }
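LEADING(A) holds the terminals that can appear first in a string derived from A, possibly after one leading non-terminal; TRAILING(A) is the mirror image at the right end. The Python sketch below computes both by iterating to a fixed point: for A → aα or A → Baα, add a; for A → Bα, add LEADING(B) (and symmetrically for TRAILING). It assumes the grammar is a dictionary of alternatives and uses illustrative names.

```python
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}


def leading_trailing(grammar):
    nts = set(grammar)
    lead = {a: set() for a in grammar}
    trail = {a: set() for a in grammar}
    changed = True
    while changed:                                   # iterate to a fixed point
        changed = False
        for a, alts in grammar.items():
            for rhs in alts:
                if not rhs:
                    continue
                new_lead, new_trail = set(), set()
                # LEADING: first terminal, possibly after one leading non-terminal
                if rhs[0] not in nts:
                    new_lead.add(rhs[0])             # A → aα
                else:
                    new_lead |= lead[rhs[0]]         # A → Bα: LEADING(B) ⊆ LEADING(A)
                    if len(rhs) > 1 and rhs[1] not in nts:
                        new_lead.add(rhs[1])         # A → Baα
                # TRAILING: the mirror image at the right end
                if rhs[-1] not in nts:
                    new_trail.add(rhs[-1])           # A → αa
                else:
                    new_trail |= trail[rhs[-1]]      # A → αB: TRAILING(B) ⊆ TRAILING(A)
                    if len(rhs) > 1 and rhs[-2] not in nts:
                        new_trail.add(rhs[-2])       # A → αaB
                if not new_lead <= lead[a] or not new_trail <= trail[a]:
                    lead[a] |= new_lead
                    trail[a] |= new_trail
                    changed = True
    return lead, trail


lead, trail = leading_trailing(GRAMMAR)
for nt in GRAMMAR:
    print(f"Leading({nt}) = {sorted(lead[nt])}   Trailing({nt}) = {sorted(trail[nt])}")
```

Because of E → E + T and T → T * F, the operators + and * appear in LEADING(E) and LEADING(T) as well as in the TRAILING sets.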
Bottom up parsing - Operator-precedence Parsing
Example#2: Operator-precedence relations for the grammar
S → a | ↑ | (T)
T → T,S | S
are given in the following table.
Step 01: Compute LEADING
– LEADING(S) = {a, ↑, (}
– LEADING(T) = {,, a, ↑, (}
Step 02: Compute TRAILING
– TRAILING(S) = {a, ↑, )}
– TRAILING(T) = {,, a, ↑, )}
Operator Precedence Relation Table
    | a  | ↑  | (  | )  | ,  | $
a   |    |    |    | .> | .> | .>
↑   |    |    |    | .> | .> | .>
(   | <. | <. | <. | =. | <. |
)   |    |    |    | .> | .> | .>
,   | <. | <. | <. | .> | .> |
$   | <. | <. | <. |    |    |
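The relation table can be derived from LEADING and TRAILING with the usual rules: adjacent terminals (or terminals separated by one non-terminal) are =.; a terminal followed by a non-terminal B yields <. for every symbol in LEADING(B); a non-terminal B followed by a terminal b gives TRAILING(B) .> b; and $ <. LEADING(S), TRAILING(S) .> $. The Python sketch below applies these rules to Example#2, taking the LEADING and TRAILING sets from Steps 01 and 02 as given. It writes `<`, `>` and `=` for <., .> and =.; the names are illustrative, and a conflicting cell raises an error.

```python
NONTERMINALS = {"S", "T"}
PRODUCTIONS = [
    ("S", ["a"]), ("S", ["↑"]), ("S", ["(", "T", ")"]),
    ("T", ["T", ",", "S"]), ("T", ["S"]),
]
# From Step 01 and Step 02
LEADING = {"S": {"a", "↑", "("}, "T": {",", "a", "↑", "("}}
TRAILING = {"S": {"a", "↑", ")"}, "T": {",", "a", "↑", ")"}}


def precedence_relations(start="S"):
    rel = {}

    def put(a, b, r):
        if rel.get((a, b), r) != r:
            raise ValueError(f"conflict at ({a}, {b}): not an operator-precedence grammar")
        rel[(a, b)] = r

    for _, rhs in PRODUCTIONS:
        for i in range(len(rhs) - 1):
            x, y = rhs[i], rhs[i + 1]
            x_t, y_t = x not in NONTERMINALS, y not in NONTERMINALS
            if x_t and y_t:
                put(x, y, "=")                          # ... a b ...
            if x_t and not y_t and i + 2 < len(rhs) and rhs[i + 2] not in NONTERMINALS:
                put(x, rhs[i + 2], "=")                 # ... a B b ...
            if x_t and not y_t:
                for b in LEADING[y]:
                    put(x, b, "<")                      # a <. LEADING(B)
            if not x_t and y_t:
                for a in TRAILING[x]:
                    put(a, y, ">")                      # TRAILING(B) .> b
    for b in LEADING[start]:
        put("$", b, "<")
    for a in TRAILING[start]:
        put(a, "$", ">")
    return rel


rel = precedence_relations()
cols = ["a", "↑", "(", ")", ",", "$"]
print("   " + "".join(f"{c:>3}" for c in cols))
for r in cols:
    print(f"{r:>3}" + "".join(f"{rel.get((r, c), ''):>3}" for c in cols))
```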
Syntax Error Handling
Most programming language specifications do not describe how a compiler should respond to errors.
Planning the error handling right from the start can both
1) Lexical errors: occur when the compiler does not recognize a sequence of characters as a proper lexical token.
– Example: printf("Geeksforgeeks");$ (the stray $ is not a valid token)
2) Syntax errors: misplaced semicolons, or extra or missing braces, that is, "{" or "}".
• Typical syntax errors are:
– Errors in structure
– Missing operator
– Misspelled keywords
– Unbalanced parenthesis
Example:
swich(ch)
{
.......
.......
}
The keyword switch is incorrectly written as swich. Hence, an "Unidentified keyword/identifier" error is reported.
• Example: int 2;
3) Semantic errors: type mismatches between operators and operands.
– Undeclared variables, e.g. d = e + f
2) Phrase-level recovery
I. G is ambiguous
II. G produces all strings with equal number of a’s and b’s
III. G can be accepted by a deterministic PDA
Which combination below expresses all the true statements about G?
A. I only
D. I, II and III
Exercises
Solution: There are different LMDs for the string abab:
S ⇒ SS ⇒ SSS ⇒ abSS ⇒ ababS ⇒ abab
S ⇒ SS ⇒ abS ⇒ abab
So the grammar is ambiguous. Therefore statement I is true.
Statement II states that the grammar G produces all strings with an equal number of a’s and b’s, but it can’t generate the string aabb. So statement II is incorrect.
Statement III is also correct, as the language can be accepted by a deterministic PDA. So the correct option is (B).
Solution: (A) is correct because, for an inherently ambiguous CFL, every CFG that generates it is ambiguous.
(B) is also correct, as an unambiguous CFG has a unique parse tree for each string of the language it generates.
(C) is false, as some languages are accepted by a non-deterministic PDA but not by any deterministic PDA.