
Compiler Design

Syntax Analysis
6th Semester B.Tech. (CSE)
Course Code: 18CS1T08

Dr. Jasaswi Prasad Mohanty


School of Computer Engineering
KIIT Deemed to be University
Bhubaneswar, India
Parser Interface

CD / Module-III / Jasaswi 2 20 February 2024


The Role of the Parser
 Given an input program, the scanner generates a stream of tokens, each classified
according to its syntactic category.
 The parser determines whether the input program, represented by the token stream, is a
valid sentence in the programming language.
 The parser attempts to build a derivation for the input program, using a grammar
for the programming language.
• If the input stream is a valid program, the parser builds a model of the program for later
phases.
• If the input stream is invalid, the parser reports the problem and diagnostic
information to the user.



Syntax Analysis
 Given a programming language grammar 𝐺 and a stream of tokens 𝑠, parser tries
to find a derivation in 𝐺 that produces 𝑠.
 In addition, a syntax analyzer
• Forwards the information as an Intermediate Representation (IR) to the next
compilation phases.
• Handles errors if the input string is not in 𝐿(𝐺).



Examples of Syntax Errors
1. Missing semicolon at the end of a statement:
   #include <stdio.h>
   int main()
   {
       printf("Hello, World")
       printf("Missing Semicolon")
       return 0;
   }
2. Missing parenthesis:
   #include <stdio.h>
   int main()
   {
       if (x > 0
           printf("Positive");
       return 0;
   }
3. Missing closing brace:
   void printHelloNinja( String s )
   {
       // function - body
4. Errors in expressions:
   #include <stdio.h>
   int main()
   {
       int a, b=3, c=2, d=1, i, j;
       a = ( b + c * ( c + d );   // missing closing parenthesis
       i = j * + c ;              // missing operand between “*” and “+”
       return 0;
   }
Syntax Error Recovery Strategies
 The errors found in the syntax analysis phase are known as syntactic errors.
• E.g., missing or extra parentheses, missing or extra braces, mismatched quotes, missing or misplaced
semicolons, missing operators.
 These are some common error-recovery strategies that can be implemented in the parser to deal with errors in the
code.
1. Panic mode Recovery:
• This is the easiest error-recovery method to implement.
• When the parser encounters an error anywhere in a statement, it ignores the rest of the statement, not
processing input from the erroneous token up to a delimiter.
• In this method, the parser discards input symbols one at a time.
• This process continues until one of a designated set of synchronizing tokens (typically delimiters such
as semicolons or closing braces) is found.
• We can correct the error by deleting extra semicolons, replacing commas with semicolons, or reintroducing
missing semicolons.
• This technique may lead to semantic errors or runtime errors being detected only in later stages.
• Its advantage is simplicity, and because it always moves forward in the input, it cannot go into an
infinite loop.
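The discard-until-synchronizing-token step can be sketched in Python (a minimal sketch; the token stream and the choice of synchronizing tokens here are illustrative assumptions):

```python
# Panic-mode sketch: on an error, discard input symbols one at a time
# until a token from a designated synchronizing set is reached.
SYNC_TOKENS = {";", "}"}  # assumed synchronizing tokens (statement/block delimiters)

def panic_mode_skip(tokens, error_pos):
    """Return the position where parsing can resume after an error at error_pos."""
    pos = error_pos
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1                                   # discard one input symbol at a time
    return pos + 1 if pos < len(tokens) else pos   # resume just past the delimiter
```

For example, for the token stream x = @ 1 ; y = 2 ; an error at the stray @ skips ahead to just past the ; so the parser can resume at y.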
Syntax Error Recovery Strategies – contd..
2. Phrase Level Recovery
• In this strategy, on discovering an error, the parser performs local correction on the
remaining input.
• It may replace a prefix of the remaining input with some string that allows the
parser to continue.
• The local correction can be replacing a comma with a semicolon, deleting an extraneous
semicolon, or inserting a missing semicolon.
• Parser designers have to be careful to choose replacements that do not lead to
infinite loops.


Syntax Error Recovery Strategies – contd..
3. Error productions:
• Works only for common mistakes known to the compiler designers, and it complicates the
grammar.
• The designers create an augmented grammar whose extra productions generate the
erroneous constructs, so these errors are recognized when they are encountered.
• The method is difficult to maintain, because if we change the grammar then it
becomes necessary to change the corresponding error productions.
• Example: Consider the grammar
Grammar: S → A
         A → aA | bA | a | b
         B → cd
Suppose the input string is “abcd”. The input string is not obtainable from the
above grammar, so we need to add a production to the existing grammar.
Augmented Grammar: E → SB
                   S → A
                   A → aA | bA | a | b
                   B → cd
Now it is possible to obtain the string “abcd”.
Syntax Error Recovery Strategies – contd..
4. Global correction
• Given an incorrect input string 𝑥 and grammar 𝐺, find a parse tree for a related
string 𝑦 such that the number of modifications (insertions, deletions, and
changes) of tokens required to transform 𝑥 into 𝑦 is as small as possible.
• This would allow the parser to make minimal changes to the source code,
but due to the time and space complexity of this strategy, it has not been
implemented in practice.
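The “smallest number of insertions, deletions, and changes” that global correction minimizes is an edit distance over token sequences. A standard dynamic-programming sketch of that distance (illustrative only; a real global-correction parser would additionally search over all valid strings 𝑦 in the language):

```python
def token_edit_distance(x, y):
    """Minimum number of token insertions, deletions, and changes turning x into y."""
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                                  # delete all of x[:i]
    for j in range(n + 1):
        d[0][j] = j                                  # insert all of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            change = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,           # deletion
                          d[i][j - 1] + 1,           # insertion
                          d[i - 1][j - 1] + change)  # change (or match)
    return d[m][n]
```

For instance, the distance between the erroneous token string id + id ) and the valid string id + id is 1 (delete the stray parenthesis). Minimizing this over every valid 𝑦 is what makes the strategy too expensive in practice.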



Limitations of Syntax Analysis
 Cannot determine whether
• A variable has been declared before use.
• A variable has been initialized.
• Variables are of types on which the applied operations are allowed.
• The numbers of formal and actual arguments of a function match.
 These limitations are handled during semantic analysis.



Context-Free Grammars
 Describe the syntax of programming language constructs like expressions and
statements.
 A context-free grammar (CFG) 𝐺 is a quadruple (𝑇, 𝑁𝑇, 𝑆, 𝑃)
• 𝑇: Set of terminal symbols (also called words) in the language 𝐿(𝐺)
• 𝑁𝑇: Set of nonterminal symbols that appear in the productions of 𝐺
• 𝑆: Goal or start symbol of the grammar 𝐺
• 𝑃: Set of productions (or rules) in G
 Terminal symbols correspond to syntactic categories returned by the scanner.
• Terminal symbol is a word that can occur in a sentence.
 Non-terminals are syntactic variables introduced to provide abstraction and structure in
the productions.
 S represents the set of sentences in 𝐿(𝐺).
 Each rule in 𝑃 has the form 𝐴 → 𝛼, where 𝐴 ∈ 𝑁𝑇 and 𝛼 is a string in (𝑇 ∪ 𝑁𝑇)*

CFG: Example
 The following grammar defines simple arithmetic expressions:
expression → expression + term | expression − term | term
term → term * factor | term / factor | factor
factor → ( expression ) | id
 In this grammar, the terminal symbols are id, +, −, *, /, (, )
 The nonterminal symbols are expression, term and factor, and expression is the
start symbol.



Notational Conventions
1. These symbols are terminals:
a) Lowercase letters early in the alphabet, such as a, b, c.
b) Operator symbols such as +, *, and so on.
c) Punctuation symbols such as parentheses, comma, and so on.
d) The digits 0, 1, . . . , 9.
e) Boldface strings such as id or if, each of which represents a single terminal symbol.
2. These symbols are non-terminals:
a) Uppercase letters early in the alphabet, such as A, B, C.
b) The letter S, which, when it appears, is usually the start symbol.
c) Lowercase, italic names such as expr or stmt.
d) When discussing programming constructs, uppercase letters may be used to represent
non-terminals for the constructs. For example, non-terminals for expressions, terms, and
factors are often represented by E, T, and F, respectively.



Notational Conventions
3. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols;
that is, either non-terminals or terminals.
4. Lowercase letters late in the alphabet, chiefly u, v, . . . , z, represent (possibly empty)
strings of terminals.
5. Lowercase Greek letters, 𝛼, 𝛽, 𝛾 for example, represent (possibly empty) strings of
grammar symbols. Thus, a generic production can be written as 𝐴 → 𝛼, where 𝐴 is the
head and 𝛼 the body.
6. A set of productions 𝐴 → 𝛼1, 𝐴 → 𝛼2, . . . , 𝐴 → 𝛼k with a common head 𝐴 (call them
A-productions), may be written 𝐴 → 𝛼1 | 𝛼2 | . . . | 𝛼k. Call 𝛼1, 𝛼2, . . . , 𝛼k the alternatives
for A.
7. Unless stated otherwise, the head of the first production is the start symbol.



CFG: Example using Notational Conventions
 Using the conventions above, the arithmetic expression grammar can be written compactly as:
E → E + T | E − T | T
T → T * F | T / F | F
F → ( E ) | id


Derivation
 Consider a nonterminal A in the middle of a sequence of grammar symbols, as in
𝛼𝐴𝛽, where 𝛼 and 𝛽 are arbitrary strings of grammar symbols.
 Then we can write 𝛼𝐴𝛽 ⇒ 𝛼𝛾𝛽 if 𝐴 → 𝛾 is a production. The symbol ⇒ means
“derives in one step”.
 When a sequence of derivation steps 𝛼1 ⇒ 𝛼2 ⇒ . . . ⇒ 𝛼n rewrites 𝛼1 to 𝛼n, we
say 𝛼1 derives 𝛼n.
 We write 𝛼1 ⇒* 𝛼n, where ⇒* means “derives in zero or more steps”:
• For any string 𝛼, 𝛼 ⇒* 𝛼
• If 𝛼 ⇒* 𝛽 and 𝛽 ⇒* 𝛾, then 𝛼 ⇒* 𝛾.
 Likewise, ⇒⁺ means “derives in one or more steps”.



Derivation

 If S ⇒* 𝛼, where S is the start symbol of grammar G, we say that 𝛼 is a sentential form of
G.
• Note that a sentential form may contain both terminals and non-terminals, and may
be empty. A sentential form can be derived from the start symbol in zero or more steps.
 A derivation is a sequence of rewriting steps that begins with the grammar 𝐺’s start
symbol S and ends with a sentence in the language:
𝑆 ⇒⁺ 𝑤, where 𝑤 ∈ 𝐿(𝐺)
 The language generated by a grammar is its set of sentences. Thus, a string of terminals
𝑤 is in L(G), the language generated by G, if and only if 𝑤 is a sentence of G (i.e., 𝑆 ⇒* 𝑤).
 A language that can be generated by a context-free grammar is said to be a context-free
language.
 If two grammars generate the same language, the grammars are said to be equivalent.



Leftmost and Rightmost Derivations
 At each step during a derivation, we have two choices to make:
• Which non-terminal to rewrite?
• Which production rule to pick?
 A rightmost (or canonical) derivation rewrites the rightmost nonterminal at each step,
denoted by 𝛼 ⇒rm 𝛽.
• Similarly, a leftmost derivation rewrites the leftmost nonterminal at each step,
denoted by 𝛼 ⇒lm 𝛽.
 Every leftmost derivation step can be written as 𝑤𝐴𝛾 ⇒lm 𝑤𝛿𝛾, where w consists of terminals
only, 𝐴 → 𝛿 is the production applied, and 𝛾 is a string of grammar symbols.
 If 𝑆 ⇒lm* 𝛼, then we say that 𝛼 is a left-sentential form of the grammar at hand.



Leftmost and Rightmost Derivations: Example
 Consider the following grammar:
E → E + E | − E | E * E | ( E ) | id
 The string −(id + id) is a sentence of the above grammar because there is a leftmost
derivation
𝐸 ⇒ −𝐸 ⇒ −(𝐸) ⇒ −(𝐸 + 𝐸) ⇒ −(𝒊𝒅 + 𝐸) ⇒ −(𝒊𝒅 + 𝒊𝒅)
 For the same string −(id + id) there is a rightmost derivation
𝐸 ⇒ −𝐸 ⇒ −(𝐸) ⇒ −(𝐸 + 𝐸) ⇒ −(𝐸 + 𝒊𝒅) ⇒ −(𝒊𝒅 + 𝒊𝒅)



Parse Tree
 A parse tree is a graphical representation of a derivation.
• The root is labeled with the start symbol 𝑆.
• Each internal node is labeled with a nonterminal and represents the application of a
production.
• Leaves are labeled with terminals and, read from left to right, constitute a sentential
form, called the yield or frontier of the tree.
 A parse tree filters out the order in which productions are applied to replace non-
terminals.
• It just represents the rules applied.



Parse Tree: Example
CFG: E → E + E | − E | E * E | ( E ) | id
Parse Tree:



Example of Derivation, Sentential Form and Parse Tree
 CFG:
𝐸𝑥𝑝𝑟 → (𝐸𝑥𝑝𝑟) | 𝐸𝑥𝑝𝑟 𝑂𝑝 name | name
𝑂𝑝 → + | − | × | ÷
 Derivation of (𝒂 + 𝒃) × c:
𝐸𝑥𝑝𝑟 ⇒ 𝐸𝑥𝑝𝑟 𝑂𝑝 name
     ⇒ 𝐸𝑥𝑝𝑟 × name
     ⇒ (𝐸𝑥𝑝𝑟) × name
     ⇒ (𝐸𝑥𝑝𝑟 𝑂𝑝 name) × name
     ⇒ (𝐸𝑥𝑝𝑟 + name) × name
     ⇒ (name + name) × name
Parse Tree
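The derivation above can be replayed mechanically as string rewriting: each step replaces one occurrence of a nonterminal by the body of a production. A small Python sketch (symbols as list elements; the helper name is ours, and the steps follow the derivation shown, which rewrites the rightmost nonterminal at every step):

```python
def rewrite(sentential, lhs, rhs, rightmost=True):
    """Replace one occurrence of nonterminal lhs in the sentential form with rhs."""
    if rightmost:
        i = len(sentential) - 1 - sentential[::-1].index(lhs)
    else:
        i = sentential.index(lhs)
    return sentential[:i] + rhs + sentential[i + 1:]

# Replay the derivation of (a + b) × c step by step.
s = ["Expr"]
s = rewrite(s, "Expr", ["Expr", "Op", "name"])   # Expr -> Expr Op name
s = rewrite(s, "Op",   ["×"])                    # Op   -> ×
s = rewrite(s, "Expr", ["(", "Expr", ")"])       # Expr -> ( Expr )
s = rewrite(s, "Expr", ["Expr", "Op", "name"])   # Expr -> Expr Op name
s = rewrite(s, "Op",   ["+"])                    # Op   -> +
s = rewrite(s, "Expr", ["name"])                 # Expr -> name
```

After the last step, s holds the sentence ( name + name ) × name, matching the final line of the derivation.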



Ambiguity
 A grammar that produces more than one parse tree for some sentence is said to
be ambiguous.
• i.e. An ambiguous grammar is one that produces more than one leftmost
derivation or more than one rightmost derivation for the same sentence.
 A grammar 𝐺 is ambiguous if some sentence in 𝐿(𝐺) has more than one
rightmost (or leftmost) derivation
• An ambiguous grammar can produce multiple derivations and parse trees
 To show that a grammar is ambiguous, all we need to do is find a terminal string
that is the yield of more than one parse tree.



Ambiguity: Example
 Consider the following arithmetic expression grammar:
E  E + E | E * E | ( E ) | id
 This grammar permits two distinct leftmost derivations for the sentence id + id * id.

 The corresponding parse trees appear in the following figure:



Ambiguity: Example (Dangling Else Problem)
 Consider the following grammar for conditional statements:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
Note: Here "other" stands for any statement other than an if statement.
 This grammar permits two distinct leftmost derivations for the following sentence:
if E1 then if E2 then S1 else S2
 The corresponding parse trees appear in the following figure:
Note: Every else should match the closest unmatched then.
Lexical Versus Syntactic Analysis
 Everything that can be described by a regular expression can also be described by a
grammar.
• Why, then, do we use regular expressions to define the lexical syntax of a language?
 There are several reasons:
• Separating the syntactic structure of a language into lexical and non-lexical parts
provides a convenient way of modularizing the front end of a compiler into two
manageable-sized components.
• The lexical rules of a language are frequently quite simple, and to describe them we
do not need a notation as powerful as grammars.
• Regular expressions generally provide a more concise and easier-to-understand
notation for tokens than grammars.
• More efficient lexical analyzers can be constructed automatically from regular
expressions than from arbitrary grammars.



Dealing with Ambiguous Grammars
 Ambiguous grammars are problematic for compilers
• Compilers use parse trees to interpret the meaning of the expressions during later stages
• Multiple parse trees can give rise to multiple interpretations
 Fixing ambiguous grammars
• Transform the grammar to remove the ambiguity.
• Include rules to disambiguate during derivations, e.g., associativity and precedence rules.



Removal of Ambiguity
We can remove ambiguity solely on the basis of the following two properties:
 Precedence: If different operators are used, we consider the precedence of the operators. The three
important characteristics are:
1. The level at which a production appears denotes the priority of the operator used in it.
2. Productions at higher levels have operators with lower priority. In the parse tree, the nodes
which are at top levels or close to the root node contain the lower-priority operators.
3. Productions at lower levels have operators with higher priority. In the parse tree, the nodes
which are at lower levels or close to the leaf nodes contain the higher-priority operators.
 Associativity: If operators of the same precedence appear in a production, then we have to consider their
associativity.
1. If the associativity is left to right, then we use left recursion in the production. The
parse tree will also be left recursive and grow on the left side.
Example: +, −, *, / are left-associative operators.
2. If the associativity is right to left, then we use right recursion in the productions. The
parse tree will also be right recursive and grow on the right side.
Example: ^ is a right-associative operator.



Removal of Ambiguity: Example 1
 Consider the grammar shown below, which has two different operators + and *:
E → E + E | E * E | id
 The above grammar is ambiguous, as we can draw two parse trees for the string id + id * id.
 Consider the expression 3 + 2 * 5, in which the operator * has higher priority than +.
 The “+”, having the least priority, has to be at the upper level of the parse tree and has
to wait for the result produced by the “*” operator, which is at the lower level.
 So, the parse tree with “*” below “+” is the correct one and gives the expected result.



Removal of Ambiguity: Example 1
 The unambiguous grammar will contain the productions having the highest-priority operator (“*”
in the example) at the lower level, and vice versa.
 The associativity of both operators is left to right, so the unambiguous grammar has to
be left recursive.
 The correct grammar will be:
E → E + P | P
P → P * Q | Q
Q → id
(equivalently written as the separate productions E → E + P, E → P, P → P * Q, P → Q, Q → id)
 The parse tree for the string “id+id*id+id” is shown here.



Removal of Ambiguity: Example 2
 The unambiguous grammar for an expression having the operators +, −, *, ^ is:
E → E + P | E − P | P   // + and − are at the highest level due to least priority; left associative
P → P * Q | Q           // * has higher priority than + and −, and lower priority than ^; left associative
Q → R ^ Q | R           // ^ is at the lowest level due to highest priority; right associative
R → id
 NOTE:
• While converting an ambiguous grammar to an unambiguous grammar, we must not change
the language generated by the original ambiguous grammar.
• So, the non-terminals in the ambiguous grammar have to be replaced with other variables in
such a way that we derive the same language as before while simultaneously maintaining the
precedence and associativity rules.
• There are some ambiguous grammars which cannot be converted into unambiguous
grammars.



Removal of Ambiguity: Example 3
 Convert the following ambiguous grammar into an unambiguous grammar:
bexp → bexp or bexp | bexp and bexp | not bexp | T | F
where bexp represents a Boolean expression, T represents True and F represents False.
 The given grammar consists of the following operators: or, and, not
 The grammar consists of the following operands: T, F
 The priority order is: not > and > or, where both the and operator and the or operator are left
associative.
 The unambiguous grammar is:
bexp → bexp or M | M
M → M and N | N
N → not N | G
G → T | F



Solving Dangling Else Problem
 Idea: A statement appearing between a then and an else must be “matched”; i.e., the
interior statement must not end with an unmatched, or open, then.
 The modified grammar is as follows:
stmt → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
open_stmt → if expr then stmt | if expr then matched_stmt else open_stmt
 Required sentence: if E1 then if E2 then S1 else S2
 For the above grammar, the parse tree derives the outer if E1 then … as an open_stmt
whose body is the matched statement if E2 then S1 else S2, so the else is attached to the
closest then.


Evaluation of Expression: Example 1
 Consider the given grammar:
𝐸 → 𝐸 + 𝑇 | 𝑇
𝑇 → 𝐹 × 𝑇 | 𝐹
𝐹 → 𝑖𝑑
 Evaluate the following expression in accordance with the given grammar:
2 + 3 × 5 × 6 + 2
 The priority order and associativity of operators on the basis of the given grammar
is: 𝑖𝑑 > × > +
 The operator × is right associative and the operator + is left associative.
 Now, we parenthesize the given expression based on the precedence and associativity
of operators as:
(2 + (3 × (5 × 6))) + 2
 We evaluate the parenthesized expression as:
(2 + (3 × (5 × 6))) + 2 = (2 + (3 × 30)) + 2 = (2 + 90) + 2 = 92 + 2 = 94
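A tiny evaluator makes the grammar's precedence and associativity concrete (a sketch assuming id stands for an integer literal, with * playing the role of ×): the loop in expr mirrors the left-recursive 𝐸 → 𝐸 + 𝑇, making + left associative, while the recursion on the right in term mirrors 𝑇 → 𝐹 × 𝑇, making × right associative.

```python
def evaluate(tokens):
    """Evaluate per E -> E + T | T,  T -> F * T | F,  F -> id (integers here)."""
    pos = [0]
    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None
    def expr():                  # E -> T { + T } : '+' via a loop  => left-associative
        v = term()
        while peek() == "+":
            pos[0] += 1
            v = v + term()
        return v
    def term():                  # T -> F [* T] : recursion on the right => right-associative
        v = factor()
        if peek() == "*":
            pos[0] += 1
            v = v * term()
        return v
    def factor():                # F -> id
        v = int(tokens[pos[0]])
        pos[0] += 1
        return v
    return expr()
```

Evaluating the tokens of 2 + 3 * 5 * 6 + 2 reproduces the hand computation above (for this expression the associativity of × changes the parenthesization but not the numeric result).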



Evaluation of Expression: Example 2
 Consider the given grammar:
𝐸 → 𝐸 + 𝑇 | 𝐸 − 𝑇 | 𝑇
𝑇 → 𝑇 × 𝐹 | 𝑇 ÷ 𝐹 | 𝐹
𝐹 → 𝐺 ↑ 𝐹 | 𝐺
𝐺 → 𝑖𝑑
 Evaluate the following expression in accordance with the given grammar:
2 × 1 + 4 ↑ 2 ↑ 1 × 1 + 3
 The priority order and associativity of operators on the basis of the given grammar is:
𝑖𝑑 > ↑ > (×, ÷) > (+, −)
 The operator ↑ is right associative and the operators +, −, ×, ÷ are left associative.
 Now, we parenthesize the given expression based on the precedence and associativity of
operators as:
((2 × 1) + ((4 ↑ (2 ↑ 1)) × 1)) + 3
 We evaluate the parenthesized expression as:
((2 × 1) + ((4 ↑ (2 ↑ 1)) × 1)) + 3
= ((2 × 1) + ((4 ↑ 2) × 1)) + 3
= (2 + (16 × 1)) + 3
= (2 + 16) + 3 = 18 + 3 = 21



Evaluation of Expression: Example 3
 Consider the given grammar:
𝐸 → 𝐸 + 𝑇 | 𝐸 − 𝑇 | 𝑇
𝑇 → 𝑇 × 𝐹 | 𝑇 ÷ 𝐹 | 𝐹
𝐹 → 𝐹 ↑ 𝐺 | 𝐺
𝐺 → 𝑖𝑑
 Evaluate the following expression in accordance with the given grammar:
2 ↑ 1 ↑ 4 + 3 × 5 × 6 ↑ 1 + 2 ↑ 3
 The priority order and associativity of operators on the basis of the given grammar is:
𝑖𝑑 > ↑ > (×, ÷) > (+, −)
 All the operators ↑, +, −, ×, ÷ are left associative.
 Now, we parenthesize the given expression based on the precedence and associativity
of operators as:
(((2 ↑ 1) ↑ 4) + ((3 × 5) × (6 ↑ 1))) + (2 ↑ 3)
 We evaluate the parenthesized expression as:
(((2 ↑ 1) ↑ 4) + ((3 × 5) × (6 ↑ 1))) + (2 ↑ 3)
= ((2 ↑ 4) + ((3 × 5) × (6 ↑ 1))) + (2 ↑ 3)
= (16 + (15 × 6)) + (2 ↑ 3)
= (16 + 90) + 8 = 106 + 8 = 114



Evaluation of Expression: Example 4
 Consider the given grammar:
𝐸 → 𝐸 ↑ 𝑇 | 𝑇
𝑇 → 𝑇 + 𝐹 | 𝐹
𝐹 → 𝐺 − 𝐹 | 𝐺
𝐺 → 𝑖𝑑
 Evaluate the following expression in accordance with the given grammar:
2 ↑ 1 ↑ 3 + 5 − 6 − 8 − 5 + 10 + 11 ↑ 2
 The priority order and associativity of operators on the basis of the given grammar is:
𝑖𝑑 > − > + > ↑
 The operators ↑ and + are left associative and the operator − is right associative.
 Now, we parenthesize the given expression based on the precedence and associativity of
operators as:
((2 ↑ 1) ↑ (((3 + (5 − (6 − (8 − 5)))) + 10) + 11)) ↑ 2
 We evaluate the parenthesized expression as:
((2 ↑ 1) ↑ (((3 + (5 − (6 − (8 − 5)))) + 10) + 11)) ↑ 2
= ((2 ↑ 1) ↑ (((3 + (5 − (6 − 3))) + 10) + 11)) ↑ 2
= ((2 ↑ 1) ↑ (((3 + (5 − 3)) + 10) + 11)) ↑ 2
= ((2 ↑ 1) ↑ (((3 + 2) + 10) + 11)) ↑ 2
= ((2 ↑ 1) ↑ ((5 + 10) + 11)) ↑ 2
= ((2 ↑ 1) ↑ (15 + 11)) ↑ 2
= ((2 ↑ 1) ↑ 26) ↑ 2 = (2 ↑ 26) ↑ 2 = (2²⁶)² = 2⁵²
Left Recursion

 A grammar is left-recursive if it has a nonterminal A such that there is a derivation A ⇒⁺ A𝛼 for some
string 𝛼.
• Direct left recursion (immediate left recursion): there is a production of the form 𝑨 → 𝑨𝜶.
• Indirect left recursion (left recursion involving derivations of two or more steps): the first symbol
on the right-hand side of a rule can derive the symbol on the left.
 We need to eliminate left recursion, since top-down parsing methods cannot handle left-recursive
grammars. We can often reformulate a grammar to avoid left recursion.
 Removal of Left Recursion: immediate left recursion of the form 𝑨 → 𝑨𝜶 | 𝜷 can be removed by
rewriting it as:
𝑨 → 𝜷𝑨′
𝑨′ → 𝜶𝑨′ | 𝝐


Non-Left-Recursive Expression Grammar: Example 2
CFG with left recursion:
E → E + T | T
T → T * F | F
F → ( E ) | id
CFG without left recursion:
E → T E′
E′ → + T E′ | 𝜖
T → F T′
T′ → * F T′ | 𝜖
F → ( E ) | id



Removing Left Recursion: Example 3
 Consider the following grammar and eliminate left recursion:
S → ( L ) | a
L → L , S | S
 After eliminating left recursion:
S → ( L ) | a
L → S 𝑳′
𝑳′ → , S 𝑳′ | 𝝐



Removing Left Recursion: Example 4
 Consider the following grammar and eliminate left recursion:
A → B a | A a | c
B → B b | A b | d
 Eliminating left recursion from A → B a | A a | c:
A → B a 𝐀′ | c 𝐀′
𝐀′ → a 𝐀′ | 𝝐
 Now the resultant grammar is:
A → B a 𝐀′ | c 𝐀′
𝐀′ → a 𝐀′ | 𝝐
B → B b | A b | d
 Substituting the productions of A in B → A b, the resultant grammar is:
B → B b | B a 𝐀′ b | c 𝐀′ b | d
 Eliminating left recursion from the productions of B, the final grammar is:
A → B a 𝐀′ | c 𝐀′
𝐀′ → a 𝐀′ | 𝝐
B → c 𝐀′ b 𝐁′ | d 𝐁′
𝐁′ → b 𝐁′ | a 𝐀′ b 𝐁′ | 𝝐



Removing Left Recursion: Example 5
 Consider the following grammar and eliminate left recursion:
X → X S b | S a | b
S → S b | X a | a
 Eliminating left recursion from X → X S b | S a | b:
X → S a 𝐗′ | b 𝐗′
𝐗′ → S b 𝐗′ | 𝝐
 Now the resultant grammar is:
X → S a 𝐗′ | b 𝐗′
𝐗′ → S b 𝐗′ | 𝝐
S → S b | X a | a
 Substituting the productions of X in S → X a:
S → S b | S a 𝐗′ a | b 𝐗′ a | a
 Eliminating left recursion from the productions of S, the final grammar is:
X → S a 𝐗′ | b 𝐗′
𝐗′ → S b 𝐗′ | 𝝐
S → b 𝐗′ a 𝐒′ | a 𝐒′
𝐒′ → b 𝐒′ | a 𝐗′ a 𝐒′ | 𝝐



Eliminating Indirect Left Recursion
 The rule for removing immediate left recursion does not eliminate left recursion involving
derivations of two or more steps.
 Example:
𝑺 → 𝑨𝒂 | 𝒃
𝑨 → 𝑨𝒄 | 𝑺𝒅 | 𝝐
 The nonterminal 𝑆 is left recursive because 𝑆 ⟹ 𝐴𝑎 ⟹ 𝑆𝑑𝑎, but it is not immediately left
recursive.
 The next slide shows an algorithm which systematically eliminates left recursion from a
grammar.
 It is guaranteed to work if the grammar has no cycles (derivations of the form 𝐴 ⇒⁺ 𝐴)
or 𝜖-productions (productions of the form 𝐴 → 𝜖).



Algorithm for Eliminating Left Recursion
 INPUT: Grammar 𝐺 with no cycles or 𝜖-productions
 OUTPUT: An equivalent grammar with no left recursion.
 METHOD: Apply the following algorithm to G. Note that the resulting non-left-recursive grammar
may have 𝜖-productions.
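The method's two nested steps — for each 𝐴ᵢ, first replace productions 𝐴ᵢ → 𝐴ⱼ𝛾 (for j < i) using the current 𝐴ⱼ-productions, then eliminate the immediate left recursion among the 𝐴ᵢ-productions — can be sketched in Python (the grammar encoding and the priming convention for new nonterminals are assumptions; an empty symbol list stands for 𝜖):

```python
def eliminate_immediate(nt, prods):
    """Rewrite A -> A a | b  as  A -> b A',  A' -> a A' | eps (eps = empty list)."""
    recursive = [p[1:] for p in prods[nt] if p[:1] == [nt]]   # tails of A -> A alpha
    rest = [p for p in prods[nt] if p[:1] != [nt]]            # the A -> beta alternatives
    if not recursive:
        return
    new = nt + "'"
    prods[nt] = [beta + [new] for beta in rest]
    prods[new] = [alpha + [new] for alpha in recursive] + [[]]

def eliminate_left_recursion(order, prods):
    """Systematic elimination over an ordering A1..An; assumes the grammar has no cycles."""
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for p in prods[ai]:
                if p[:1] == [aj]:   # Ai -> Aj gamma: substitute every Aj-production
                    expanded += [delta + p[1:] for delta in prods[aj]]
                else:
                    expanded.append(p)
            prods[ai] = expanded
        eliminate_immediate(ai, prods)
    return prods
```

Running it on 𝑆 → 𝑨𝒂 | 𝒃, 𝑨 → 𝑨𝒄 | 𝑺𝒅 | 𝝐 with the ordering 𝑺, 𝑨 reproduces the result worked out on the next slide.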



Example of Eliminating Left Recursion
 Let us apply the algorithm to the following grammar:
𝑺 → 𝑨𝒂 | 𝒃
𝑨 → 𝑨𝒄 | 𝑺𝒅 | 𝝐
 Technically, the algorithm is not guaranteed to work because of the 𝜖-production, but in
this case the production 𝑨 → 𝝐 turns out to be harmless.
 We order the nonterminals 𝑺, 𝑨; that is, 𝑨𝟏 = 𝑺 and 𝑨𝟐 = 𝑨.
 For i = 1: there is no immediate left recursion among the 𝑺-productions, so nothing
happens during the outer loop.
 For i = 2 and j = 1: we replace each production of the form 𝑨𝟐 → 𝑨𝟏𝜸 (here 𝑨 → 𝑺𝒅)
by 𝑨𝟐 → 𝜹𝟏𝜸 | 𝜹𝟐𝜸 | . . . | 𝜹𝒌𝜸, where 𝑨𝟏 → 𝜹𝟏 | 𝜹𝟐 | . . . | 𝜹𝒌 are all the current
𝑨𝟏-productions (𝑺 → 𝑨𝒂 | 𝒃). This replaces 𝑨 → 𝑺𝒅 with 𝑨 → 𝑨𝒂𝒅 | 𝒃𝒅, so the
𝑨-productions become:
𝑨 → 𝑨𝒄 | 𝑨𝒂𝒅 | 𝒃𝒅 | 𝝐
 Eliminating the immediate left recursion among these 𝑨-productions yields the final
grammar:
𝑺 → 𝑨𝒂 | 𝒃
𝑨 → 𝒃𝒅𝑨′ | 𝑨′
𝑨′ → 𝒄𝑨′ | 𝒂𝒅𝑨′ | 𝝐
Left Factoring
 Left factoring is a process by which a grammar with common prefixes is transformed into a
grammar suitable for predictive, or top-down, parsing.
 When the choice between two alternative A-productions is not clear, we may be able to rewrite
the productions to defer the decision until enough of the input has been seen that we can make
the right choice.
 Example: Consider the following productions
𝑠𝑡𝑚𝑡 → 𝐢𝐟 𝑒𝑥𝑝𝑟 𝐭𝐡𝐞𝐧 𝑠𝑡𝑚𝑡 𝐞𝐥𝐬𝐞 𝑠𝑡𝑚𝑡 | 𝐢𝐟 𝑒𝑥𝑝𝑟 𝐭𝐡𝐞𝐧 𝑠𝑡𝑚𝑡
 On seeing the input 𝐢𝐟, we cannot immediately tell which production to choose to expand stmt.
 In general, if 𝑨 → 𝜶𝜷𝟏 | 𝜶𝜷𝟐 are two A-productions, and the input begins with a nonempty string
derived from 𝛼, we do not know whether to expand 𝐴 to 𝛼𝛽1 or 𝛼𝛽2 .
 We may defer the decision by expanding 𝐴 to 𝛼𝐴′ . Then, after seeing the input derived from 𝛼, we
expand 𝐴′ to 𝛽1 or to 𝛽2 .
 That is, left-factored, the original productions become
𝑨 → 𝜶𝑨′
𝑨′ → 𝜷𝟏 | 𝜷𝟐
Left Factoring
 Left factoring is the process of extracting and isolating common prefixes in a set of productions.



Left Factoring a grammar
 INPUT: Grammar 𝐺
 OUTPUT: An equivalent left-factored grammar.
 METHOD:
• For each nonterminal 𝑨, find the longest prefix 𝜶 common to two or more of its alternatives.
• If 𝜶 ≠ 𝝐, i.e., there is a nontrivial common prefix, replace all of the 𝑨-productions
𝑨 → 𝜶𝜷𝟏 | 𝜶𝜷𝟐 | . . . | 𝜶𝜷𝒏 | 𝜸, where 𝜸 represents all alternatives that do not begin with 𝜶, by
𝑨 → 𝜶𝑨′ | 𝜸
𝑨′ → 𝜷𝟏 | 𝜷𝟐 | . . . | 𝜷𝒏
Here 𝑨′ is a new nonterminal.
• Repeatedly apply this transformation until no two alternatives for a nonterminal have a
common prefix.
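The method above can be sketched in Python (alternatives encoded as lists of symbols; the priming convention for the new nonterminal 𝑨′ is an assumption):

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    p = []
    for x, y in zip(a, b):
        if x != y:
            break
        p.append(x)
    return p

def left_factor(prods):
    """Repeatedly factor out the longest prefix shared by two or more alternatives."""
    changed = True
    while changed:
        changed = False
        for nt in list(prods):
            alts = prods[nt]
            best = []
            for i in range(len(alts)):            # longest prefix common to >= 2 alternatives
                for j in range(i + 1, len(alts)):
                    p = common_prefix(alts[i], alts[j])
                    if len(p) > len(best):
                        best = p
            if best:                              # nontrivial prefix alpha found
                new = nt + "'"
                while new in prods:               # pick a fresh primed name
                    new += "'"
                factored = [a[len(best):] for a in alts if a[:len(best)] == best]
                others = [a for a in alts if a[:len(best)] != best]
                prods[nt] = [best + [new]] + others   # A  -> alpha A' | gamma
                prods[new] = factored                 # A' -> beta1 | ... | betan
                changed = True
    return prods
```

Applied to the dangling-else abstraction 𝑺 → 𝒊𝑬𝒕𝑺 | 𝒊𝑬𝒕𝑺𝒆𝑺 | 𝒂, it factors out the prefix 𝒊𝑬𝒕𝑺, producing 𝑺 → 𝒊𝑬𝒕𝑺𝑺′ | 𝒂 and 𝑺′ → 𝝐 | 𝒆𝑺.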



Left Factoring a grammar: Example 1
 The following grammar abstracts the "dangling-else" problem:
𝑺 → 𝒊𝑬𝒕𝑺 | 𝒊𝑬𝒕𝑺𝒆𝑺 | 𝒂
𝑬 → 𝒃
 Here, 𝒊, 𝒕, and 𝒆 stand for if, then, and else; 𝑬 and 𝑺 stand for “conditional
expression” and “statement”.
 Left-factored, this grammar becomes:
𝑺 → 𝒊𝑬𝒕𝑺𝑺′ | 𝒂
𝑺′ → 𝒆𝑺 | 𝝐
𝑬 → 𝒃



Left Factoring a grammar: Example 2
 Do the left factoring in the following grammar:
𝑨 → 𝒂𝑨𝑩 | 𝒂𝑩𝑪 | 𝒂𝑨𝒄
 S𝐭𝐞𝐩 𝟏:
𝑨 → 𝒂𝑨′
𝑨′ → 𝑨𝑩 | 𝑩𝑪 | 𝑨𝒄
 S𝐭𝐞𝐩 𝟐:
𝑨 → 𝒂𝑨′
𝑨′ → 𝑨𝑫 | 𝑩𝑪
𝑫 → 𝑩 | 𝒄



Left Factoring a grammar: Example 3
 Do the left factoring in the following grammar:
𝑺 → 𝒃𝑺𝑺𝒂𝒂𝑺 | 𝒃𝑺𝑺𝒂𝑺𝒃 | 𝒃𝑺𝒃 | 𝒂
 S𝐭𝐞𝐩 𝟏:
𝑺 → 𝒃𝑺𝑺′ | 𝒂
𝑺′ → 𝑺𝒂𝒂𝑺 | 𝑺𝒂𝑺𝒃 | 𝒃
 S𝐭𝐞𝐩 𝟐:
𝑺 → 𝒃𝑺𝑺′ | 𝒂
𝑺′ → 𝑺𝒂𝑨 | 𝒃
𝑨 → 𝒂𝑺 | 𝑺𝒃



Left Factoring a grammar: Example 4
 Do the left factoring in the following grammar:
𝑺 → 𝒂𝑺𝑺𝒃𝑺 | 𝒂𝑺𝒂𝑺𝒃 | 𝒂𝒃𝒃 | 𝒃
 S𝐭𝐞𝐩 𝟏:
𝑺 → 𝒂𝑺′ | 𝒃
𝑺′ → 𝑺𝑺𝒃𝑺 | 𝑺𝒂𝑺𝒃 | 𝒃𝒃
 S𝐭𝐞𝐩 𝟐:
𝑺 → 𝒂𝑺′ | 𝒃
𝑺′ → 𝑺𝑨 | 𝒃𝒃
𝑨 → 𝑺𝒃𝑺 | 𝒂𝑺𝒃



Left Factoring a grammar: Example 5
 Do the left factoring in the following grammar:
𝑺 → 𝒂 | 𝒂𝒃 | 𝒂𝒃𝒄 | 𝒂𝒃𝒄𝒅
 S𝐭𝐞𝐩 𝟏:
𝑺 → 𝒂𝑺′
𝑺′ → 𝒃 | 𝒃𝒄 | 𝒃𝒄𝒅 | 𝝐
 S𝐭𝐞𝐩 𝟐:
𝑺 → 𝒂𝑺′
𝑺′ → 𝒃𝑨 | 𝝐
𝑨 → 𝒄 | 𝒄𝒅 | 𝝐
 S𝐭𝐞𝐩 𝟑:
𝑺 → 𝒂𝑺′
𝑺′ → 𝒃𝑨 | 𝝐
𝑨 → 𝒄𝑩 | 𝝐
𝑩 → 𝒅 | 𝝐
Left Factoring a grammar: Example 6
 Do the left factoring in the following grammar:
𝑺 → 𝒂𝑨𝒅 | 𝒂𝑩
𝑨 → 𝒂 | 𝒂𝒃
𝑩 → 𝒄𝒄𝒅 | 𝒅𝒅𝒄
 S𝐭𝐞𝐩 𝟏:
𝑺 → 𝒂𝑺′
𝑺′ → 𝑨𝒅 | 𝑩
𝑨 → 𝒂𝑨′
𝑨′ → 𝒃 | 𝝐
𝑩 → 𝒄𝒄𝒅 | 𝒅𝒅𝒄



FIRST and FOLLOW
 The construction of both top-down and bottom-up parsers is aided by two
functions, FIRST and FOLLOW, associated with a grammar G.
 During top-down parsing, FIRST and FOLLOW allow us to choose which
production to apply, based on the next input symbol.
 During panic-mode error recovery, sets of tokens produced by FOLLOW can be
used as synchronizing tokens.



FIRST Set
 Intuition
• Each alternative for the leftmost nonterminal leads to a distinct terminal
symbol.
• Which rule to choose becomes obvious by comparing the next word in the
input stream.
 Given a string 𝛾 of terminal and nonterminal symbols, FIRST(𝛾) is the set of all
terminal symbols that can begin any string derived from 𝛾.
• We also need to keep track of which symbols can produce the empty string
• FIRST : (𝑁𝑇 ∪ 𝑇 ∪ {𝜖, EOF}) → 2^(𝑇 ∪ {𝜖, EOF}), i.e., each symbol maps to a set of terminals

CD / Module-III / Jasaswi 56 20 February 2024


Rules to Compute FIRST Set
1. If a is a terminal, then FIRST(a) = {a}
2. If 𝑋 → 𝜖 is a production, then 𝜖 ∈ FIRST(𝑋)
3. If 𝑋 is a nonterminal and 𝑋 → 𝑌1𝑌2 … 𝑌𝑘 is a production
a) Everything in FIRST(𝑌1) is in FIRST(𝑋)
b) If for some 𝑖, 𝑎 ∈ FIRST(𝑌𝑖) and ∀1 ≤ 𝑗 < 𝑖, 𝜖 ∈ FIRST(𝑌𝑗), then 𝑎 ∈ FIRST(𝑋)
c) If 𝜖 ∈ FIRST(𝑌1 … 𝑌𝑘), then 𝜖 ∈ FIRST(X)
Generalize the FIRST relation to strings of symbols:
 FIRST(𝑋𝛾) = FIRST(𝑋) if 𝑋 ↛ 𝜖
 FIRST(𝑋𝛾) = (FIRST(𝑋) − {𝜖}) ∪ FIRST(𝛾) if 𝑋 ⇒ 𝜖
NOTE:
• Before calculating the FIRST functions, eliminate Left Recursion from the grammar, if present.
• 𝜖 may appear in the FIRST function of a non-terminal.

CD / Module-III / Jasaswi 57 20 February 2024


Finding FIRST Set: Example-1
 CFG:
• 𝑺 → 𝑨𝑩𝑪
• 𝑨 → 𝒂 | 𝒃 | 𝝐
• 𝑩 → 𝒄 | 𝒅 | 𝝐
• 𝑪 → 𝒆 | 𝒇 | 𝝐
 FIRST(C) = { e, f, 𝜖 }
 FIRST(B) = { c, d, 𝜖 }
 FIRST(A) = { a, b, 𝜖 }
 FIRST(S) =
{ a, b, by using FIRST(A)
c, d, by putting A as 𝜖 and using FIRST(B)
e, f, by putting A, B as 𝜖 and using FIRST(C)
𝜖 putting A, B, C by 𝜖
} 58
CD / Module-III / Jasaswi 20 February 2024
Finding FIRST Set: Example-2
 CFG:
• 𝑺 → 𝑨𝑩𝑪 | 𝒈𝒉𝒊 | 𝒋𝒌𝒍
• 𝑨 → 𝒂 | 𝒃 | 𝒄
• 𝑩→𝒃
• 𝑫→𝒅
 FIRST(D) = { d }
 FIRST(B) = { b }
 FIRST(A) = { a, b, c }
 FIRST(S) =
{ a, b, c by using FIRST(A)
g, j
}

CD / Module-III / Jasaswi 59 20 February 2024


Finding FIRST Set: Example-3
 CFG:
• 𝑬 → 𝑻 𝑬′
• 𝑬′ → + 𝑻 𝑬′ | 𝝐
• 𝑻 → 𝑭 𝑻′
• 𝑻′ →∗ 𝑭 𝑻′ | 𝝐
• 𝑭 → ( 𝑬 ) | 𝒊𝒅
 FIRST(F) = { (, id }
 FIRST(T ′ ) = { *, 𝜖 }
 FIRST(T) = { (, id } by FIRST(F)
 FIRST(E′ ) = { +, 𝜖 }
 FIRST(E) = { (, id } by FIRST(T)

CD / Module-III / Jasaswi 60 20 February 2024


Finding FIRST Set: Example-4

CD / Module-III / Jasaswi 61 20 February 2024


FOLLOW Set
 FOLLOW(𝑋) is the set of terminals that can immediately follow 𝑋.
 FOLLOW(𝑋) contains set of all terminals present immediately in right of 𝑋.
 That is, 𝑡 ∈ FOLLOW(𝑋) if there is any derivation containing 𝑋t.

CD / Module-III / Jasaswi 62 20 February 2024


Rules to Compute FOLLOW Set
1. Place $ in FOLLOW(𝑆) where 𝑆 is the start symbol and $ is the end marker.
2. If there is a production 𝐴 → 𝛼𝐵𝛽, then everything in FIRST(𝛽) except 𝜖 is in
FOLLOW(𝐵).
a) If 𝜖 ∉ First(𝛽), then Follow(B) = First(𝛽)
b) If 𝜖 ∈ First(𝛽), then Follow(B) = { First(𝛽) – 𝜖 } ∪ Follow(A)
3. If there is a production 𝐴 → 𝛼𝐵, or a production 𝐴 → 𝛼𝐵𝛽 where FIRST(𝛽)
contains 𝜖, then everything in FOLLOW(𝐴) is in FOLLOW(𝐵).
4. FOLLOW never contains 𝜖.
NOTE:
• Before calculating the FIRST and FOLLOW functions, eliminate Left Recursion from
the grammar, if present.
• We calculate the FOLLOW function of a non-terminal by looking where it is present on
the RHS of a production rule.
CD / Module-III / Jasaswi 63 20 February 2024
Finding FOLLOW Set: Examples
 CFG 1 (Example - 1):
• 𝑺 → 𝑨𝑪𝑫
• 𝑪 → 𝒂 | 𝒃
 FOLLOW(S) = { $ }
 FOLLOW(A) = FIRST(C) = { a, b }
 FOLLOW(D) = FOLLOW(S) = { $ }

 CFG 2 (Example - 2):
• 𝑺 → 𝑨𝒂𝑨𝒃 | 𝑩𝒃𝑩𝒂
• 𝑨 → 𝝐
• 𝑩 → 𝝐
 FOLLOW(S) = { $ }
 FOLLOW(A) = { a, b }
 FOLLOW(B) = { a, b }

 CFG 3 (Example - 3):
• 𝑺 → 𝑨𝑩𝑪
• 𝑨 → 𝑫𝑬𝑭
• 𝑩 → 𝝐
• 𝑪 → 𝝐
• 𝑫 → 𝝐
• 𝑬 → 𝝐
• 𝑭 → 𝝐
 FOLLOW(S) = { $ }
 FOLLOW(A) = FOLLOW(B) = FOLLOW(C) = FOLLOW(S) = { $ }
 FOLLOW(D) = FOLLOW(E) = FOLLOW(F) = FOLLOW(A) = { $ }
CD / Module-III / Jasaswi 64 20 February 2024
Finding FOLLOW Set: Example - 4

CD / Module-III / Jasaswi 65 20 February 2024


Finding FIRST and FOLLOW Set: Example - 1

CFG: FIRST:
 𝑺 → 𝒊𝑬𝒕𝑺𝑺′ | 𝒂  FIRST(S) = { i, a }
 𝑺′ → 𝒆𝑺 | 𝝐  FIRST(S ′ ) = { e, 𝝐 }
 𝑬→𝒃  FIRST(E) = { b }

FOLLOW:
 FOLLOW(S) = { $ } ∪ { FIRST(S′) − 𝝐 } = { $, e }

 FOLLOW(S ′ )= FOLLOW(S) = { $, e }
 FOLLOW(E) = { t }

CD / Module-III / Jasaswi 66 20 February 2024


Finding FIRST and FOLLOW Set: Example - 2
CFG: FIRST:
 𝑬 → 𝑻 𝑬′  FIRST(F) = { (, id }
 𝑬′ → + 𝑻 𝑬′ | 𝝐  FIRST(T ′ ) = { *, 𝜖 }
 𝑻 → 𝑭 𝑻′  FIRST(T) = { (, id } by FIRST(F)
 𝑻′ →∗ 𝑭 𝑻′ | 𝝐  FIRST(E′ ) = { +, 𝜖 }
 𝑭 → ( 𝑬 ) | 𝒊𝒅  FIRST(E) = { (, id } by FIRST(T)

FOLLOW:
 FOLLOW(E) = { $, ) }
 FOLLOW(E′ ) = FOLLOW(E) = { $, ) }
 FOLLOW(T) = {FIRST(E ′ ) }  FOLLOW(E)  FOLLOW(E′ ) = { +, $, )}
 FOLLOW(T ′ ) = FOLLOW(T) = { +, $, )}
 FOLLOW(F) = {FIRST(T ′ ) }  FOLLOW(T)  FOLLOW(T ′ ) = { *, +, $, )}

CD / Module-III / Jasaswi 67 20 February 2024


Finding FIRST and FOLLOW Set: Example - 3
CFG:
 𝑺 → 𝒂𝑩𝑫𝒉
 𝑩 → 𝒄𝑪
 𝑪 → 𝒃𝑪 | 𝝐
 𝑫 → 𝑬𝑭
 𝑬 → 𝒈 | 𝝐
 𝑭 → 𝒇 | 𝝐
FIRST:
 FIRST(S) = { a }
 FIRST(B) = { c }
 FIRST(C) = { b, 𝜖 }
 FIRST(D) = { FIRST(E) − 𝜖 } ∪ FIRST(F) = { g, f, 𝜖 }
 FIRST(E) = { g, 𝜖 }
 FIRST(F) = { f, 𝜖 }
FOLLOW:
 FOLLOW(S) = { $ }
 FOLLOW(B) = { FIRST(D) − 𝜖 } ∪ FIRST(h) = { g, f, h }
 FOLLOW(C) = FOLLOW(𝐵) = { g, f, h}
 FOLLOW(D) = FIRST(h) = { h }
 FOLLOW(E) = { FIRST(F) − 𝜖 } ∪ FOLLOW(D) = { f, h }
 FOLLOW(F) = FOLLOW(D) = { h }
CD / Module-III / Jasaswi 68 20 February 2024
Finding FIRST and FOLLOW Set: Example - 4
CFG:
 𝑺 → 𝑨
 𝑨 → 𝒂𝑩 | 𝑨𝒅
 𝑩 → 𝒃
 𝑪 → 𝒈
Eliminating Left Recursion:
 𝑺 → 𝑨
 𝑨 → 𝒂𝑩𝑨′
 𝑨′ → 𝒅𝑨′ | 𝝐
 𝑩 → 𝒃
 𝑪 → 𝒈
FIRST:
 FIRST(S) = FIRST(A) = { a }
 FIRST(𝑨′) = { d, 𝝐 }
 FIRST(B) = { b }
 FIRST(C) = { g }
FOLLOW:
 FOLLOW(S) = { $ }
 FOLLOW(A) = FOLLOW(S) = { $ }
 FOLLOW(𝑨′) = FOLLOW(A) = { $ }
 FOLLOW(B) = { FIRST(𝑨′) − 𝝐 } ∪ FOLLOW(A) = { d, $ }
 FOLLOW(C) = ∅ (C is unreachable from the start symbol)

CD / Module-III / Jasaswi 69 20 February 2024


Finding FIRST and FOLLOW Set: Example - 5
CFG: FIRST:
 𝑺 → ( 𝑳 ) | 𝒂  FIRST(S) = { (, a }
 𝑳 → 𝑺𝑳′  FIRST(𝐿) = FIRST(S) = { (, a }
 𝑳′ → , 𝑺𝑳′ | 𝝐  FIRST(𝑳′ )= { , , 𝝐}

FOLLOW:
 FOLLOW(S) = { $ } ∪ { FIRST(𝑳′) − 𝝐 } ∪ FOLLOW(L) ∪ FOLLOW(𝑳′) = { $, , , ) }
 FOLLOW(𝐿) = { ) }
 FOLLOW(𝑳′ )= FOLLOW(𝐿) = { ) }

CD / Module-III / Jasaswi 70 20 February 2024


Finding FIRST and FOLLOW Set: Example - 6
CFG:
 𝑬 → 𝑬 + 𝑻 | 𝑻
 𝑻 → 𝑻 × 𝑭 | 𝑭
 𝑭 → ( 𝑬 ) | 𝒊𝒅
FIRST:
 FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
 FIRST(𝑬′) = { +, 𝝐 }
 FIRST(𝑻′) = { ×, 𝝐 }

Eliminating Left Recursion:
 𝑬 → 𝑻𝑬′
 𝑬′ → +𝑻𝑬′ | 𝝐
 𝑻 → 𝑭𝑻′
 𝑻′ → × 𝑭 𝑻′ | 𝝐
 𝑭 → ( 𝑬 ) | 𝒊𝒅
FOLLOW:
 FOLLOW(E) = { $, ) }
 FOLLOW(𝑬′) = FOLLOW(E) = { $, ) }
 FOLLOW(𝑻) = { FIRST(𝑬′) − 𝝐 } ∪ FOLLOW(E) ∪ FOLLOW(𝑬′) = { +, $, ) }
 FOLLOW(𝑻′) = FOLLOW(𝑻) = { +, $, ) }
 FOLLOW(F) = { FIRST(𝑻′) − 𝝐 } ∪ FOLLOW(T) ∪ FOLLOW(𝑻′) = { ×, +, $, ) }

CD / Module-III / Jasaswi 71 20 February 2024


Finding FIRST and FOLLOW Set: Example - 7
CFG:
 𝑺 → 𝑨𝑪𝑩 | 𝑪𝒃𝑩 | 𝑩𝒂
 𝑨 → 𝒅𝒂 | 𝑩𝑪
 𝑩 → 𝒈 | 𝝐
 𝑪 → 𝒉 | 𝝐
FIRST:
 FIRST(S) = { FIRST(𝑨) − 𝝐 } ∪ { FIRST(C) − 𝝐 } ∪ FIRST(B) ∪ FIRST(b) ∪ { FIRST(𝑩) − 𝝐 } ∪ FIRST(a) = { d, g, h, 𝝐, b, a }
 FIRST(A) = FIRST(d) ∪ { FIRST(B) − 𝝐 } ∪ FIRST(C) = { d, g, h, 𝝐 }
 FIRST(B) = { g, 𝝐 }
 FIRST(C) = { h, 𝝐 }

FOLLOW:
 FOLLOW(S) = { $ }
 FOLLOW(A) = { FIRST(𝑪) − 𝝐 } ∪ { FIRST(𝑩) − 𝝐 } ∪ FOLLOW(S) = { h, g, $ }
 FOLLOW(B) = FOLLOW(S) ∪ FIRST(a) ∪ { FIRST(C) − 𝝐 } ∪ FOLLOW(A) = { $, a, h, g }
 FOLLOW(C) = { FIRST(𝑩) − 𝝐 } ∪ FOLLOW(S) ∪ FIRST(b) ∪ FOLLOW(A) = { g, $, b, h }

CD / Module-III / Jasaswi 72 20 February 2024


Parsing
 In the syntax analysis phase, a compiler verifies whether or not the tokens
generated by the lexical analyzer are grouped according to the syntactic rules of
the language.
 Parsing is the process of determining how a string of terminals can be generated
by a grammar. This is done by a parser.
 A parser is a program that generates a parse tree for the given string, if the string
is generated from the underlying grammar.
 The parser of a programming language has following functions:
• verifies that the string of tokens for a program in language can be generated
from the grammar of the language.
• detects and reports any syntax errors in the program
• constructs a parse tree which is given to the remaining phases of compiler for
further processing.
CD / Module-III / Jasaswi 73 20 February 2024
Types of Parsers

Parsers
• Top-down Parser
   TDP with full backtracking (Brute Force Method)
   TDP without backtracking
     Recursive Descent Parser
     Non-Recursive Descent Parser (Predictive Parser): LL(1) Parser
• Bottom-up Parser (Shift Reduce Parser)
   Operator Precedence Parser (only used for operator Grammars)
   LR Parser: LR(0), SLR(1), LALR(1), CLR(1)

CD / Module-III / Jasaswi 74 20 February 2024


Example Expression Grammar

The current input terminal being scanned is called the lookahead symbol.

CD / Module-III / Jasaswi 75 20 February 2024


Top-down vs Bottom-up Parsers
Productions: String: aabcde

CD / Module-III / Jasaswi 76 20 February 2024


Top-down Parsing
 Top-down parsing can be viewed as the problem of constructing a parse tree for
the input string, starting from the root and creating the nodes of the parse tree in
preorder (depth-first).
 Equivalently, top-down parsing can be viewed as finding a leftmost derivation for
an input string.
 Key problem: Determining the production to be applied for a nonterminal, say A.
 After choosing an A-production, the rest of the parsing process consists of
“matching” the terminal symbols in the production body with the input string.

CD / Module-III / Jasaswi 77 20 February 2024


Example of Top-down Parsing
CFG

Input: id+ id * id

CD / Module-III / Jasaswi 78 20 February 2024


General Idea of Top-down Parsing
1. Start with the root (start symbol) of the parse tree.
2. Grow the tree downwards by expanding productions at the lower levels of the tree
• Select a nonterminal and extend it by adding children corresponding to the right side of some
production for the nonterminal
3. Repeat till
• The lower fringe consists of only terminals and the input is consumed
• A mismatch occurs between the lower fringe and the remaining input stream
 Selection of a production may involve trial-and-error, because of:
• a wrong choice of productions while expanding nonterminals, or
• an input character stream that is not part of the language.

NOTE: Top down parsing basically finds a leftmost derivation for an input string

CD / Module-III / Jasaswi 79 20 February 2024


Leftmost Top-down Parsing Algorithm

CD / Module-III / Jasaswi 80 20 February 2024


Implementing Backtracking

CD / Module-III / Jasaswi 81 20 February 2024


Example of Backtracking
 Consider the following grammar:

 Let us construct a parse tree top-down for the input string w = cad

CD / Module-III / Jasaswi 82 20 February 2024


Cost of Backtracking
 Backtracking is expensive
• Parser expands a nonterminal with the wrong rule
• Mismatch between the lower fringe of the parse tree and the input is detected
• Parser undoes the last few actions
• Parser tries other productions if any

CD / Module-III / Jasaswi 83 20 February 2024


Avoid Backtracking
 Parser is to select the next rule
• Compare the current symbol and the next input symbol called the lookahead
• Use the lookahead to disambiguate the possible production rules
 Backtrack-free grammar is a CFG for which the leftmost, top-down parser can
always predict the correct rule with one word lookahead
• Also called a predictive grammar

CD / Module-III / Jasaswi 84 20 February 2024


Example of Top-down Parsing
CFG

How does a top-down parser choose which rule to apply?

CD / Module-III / Jasaswi 85 20 February 2024


Example of Top-down Parsing
CFG

A top-down parser can loop indefinitely with a left-recursive grammar.

CD / Module-III / Jasaswi 86 20 February 2024


Conditions for Backtrack-Free Grammar

CD / Module-III / Jasaswi 87 20 February 2024


Backtracking

CD / Module-III / Jasaswi 88 20 February 2024


Key Insight in Using Top-Down Parsing
 Efficiency depends on the accuracy of selecting the correct production for
expanding a nonterminal
• Parser may not terminate in the worst case
 A large subset of the context-free grammars can be parsed without backtracking

CD / Module-III / Jasaswi 89 20 February 2024


Recursive-Descent Parsing
 Recursive-descent parsing is a form of top-down parsing that may require backtracking
 Consists of a set of procedures, one for each nonterminal.
 Execution begins with the procedure for the start symbol, which halts and announces success if
its procedure body scans the entire input string.
 Pseudocode for a typical nonterminal appears below.

 Note that this pseudocode is non-deterministic, since it begins by choosing the A-production to
apply in a manner that is not specified.
CD / Module-III / Jasaswi 90 20 February 2024
Recursive-Descent Parsing – contd…
 General recursive descent may require backtracking; that is, it may require
repeated scans over the input.
 The previous code needs to be modified to allow backtracking.
 In general form it cannot choose an appropriate production easily. We need to try
all alternatives.
 If one fails, the input pointer needs to be reset and another alternative has to be
tried.
 Recursive descent parsers cannot be used for left-recursive grammars since it
can go into an infinite loop.
• When we try to expand a nonterminal A, we may eventually find ourselves
again trying to expand A without having consumed any input.

CD / Module-III / Jasaswi 91 20 February 2024


Recursive-Descent Parsing – contd…
 In order to construct Top-down parser without backtracking the CFG should not
have:
• Left Recursion (either direct or indirect)
• Non-determinism
• Ambiguity
 A recursive descent parser is a top-down parser built from a set of mutually
recursive procedures (or a non-recursive equivalent) where each such procedure
implements one of the non-terminals of the grammar. Thus the structure of the
resulting program closely mirrors that of the grammar it recognizes.
 Execution begins with the procedure for the start symbol, which halts and
announces success if its procedure body scans the entire input string.

CD / Module-III / Jasaswi 92 20 February 2024


Recursive Descent Parser: Example 1
#include <stdio.h>
#include <string.h>
#define SUCCESS 1
#define FAILED 0
int E(), Edash();
const char *look_ahead;
char string[64];

int main()
{
    puts("Enter the input string");
    scanf("%s", string);
    look_ahead = string;
    puts("");
    puts("Input Action");
    puts("--------------------------------");
    if (E() && *look_ahead == '\0') {
        puts("--------------------------------");
        puts("String is successfully parsed");
        return 0;
    }
    else {
        puts("--------------------------------");
        puts("Error in parsing String");
        return 1;
    }
}

// Grammar rule: 𝑬 → 𝒊𝑬′
int E()
{
    if (*look_ahead == 'i') {
        printf("%-16s E -> iE' \n", look_ahead);
        look_ahead++;
        if (Edash())
            return SUCCESS;
        else
            return FAILED;
    }
    else
        return FAILED;
}

// Grammar rule: 𝑬′ → +𝒊𝑬′ | 𝝐
int Edash()
{
    if (*look_ahead == '+') {
        look_ahead++;
        if (*look_ahead == 'i') {
            printf("%-16s E' -> + iE' \n", look_ahead);
            look_ahead++;
            if (Edash())
                return SUCCESS;
            else
                return FAILED;
        }
        else
            return FAILED;
    }
    else {
        printf("%-16s E' -> epsilon \n", look_ahead);
        return SUCCESS;
    }
}

CFG
𝑬 → 𝒊𝑬′
𝑬′ → +𝒊𝑬′ | 𝝐

CD / Module-III / Jasaswi 93 20 February 2024


Recursive Descent Parser: Example 2
#include <stdio.h> else { // Grammar rule: 𝑬′ → +𝑻𝑬′ | 𝝐
#include <string.h> puts("--------------------------------"); int Edash()
#define SUCCESS 1 puts("Error in parsing String"); {
#define FAILED 0 return 1; if (*look_ahead == '+') {
int E(), Edash(), T(), Tdash(), F(); } printf("%-16s E' -> + T E'\n", look_ahead);
const char *look_ahead; } look_ahead++;
char string[64]; // Grammar rule: 𝑬 → 𝑻𝑬′ if (T()) {
int main() { int E() if (Edash())
puts("Enter the input string"); { return SUCCESS; CFG
scanf("%s", string); printf("%-16s E -> T E'\n", look_ahead); else 𝑬 → 𝑻𝑬′
look_ahead = string; if (T()) { return FAILED; 𝑬′ → +𝑻𝑬′ | 𝝐
puts(""); if (Edash()) } 𝑻 → 𝑭𝑻′
puts("Input Action"); return SUCCESS; else 𝑻′ →∗ 𝑭𝑻′ | 𝝐
puts("--------------------------------"); else return FAILED; 𝑭→ 𝑬 |𝒊
if (E() && *look_ahead == '\0') { return FAILED; }
puts("--------------------------------"); } else {
puts("String is successfully parsed"); else printf("%-16s E'->epsilon \n", look_ahead);
return 0; return FAILED; return SUCCESS;
} } }
}
CD / Module-III / Jasaswi 94 20 February 2024
Recursive Descent Parser: Example 2
// Grammar rule: 𝑻 → 𝑭𝑻′
int T()
{
    printf("%-16s T -> F T'\n", look_ahead);
    if (F()) {
        if (Tdash())
            return SUCCESS;
        else
            return FAILED;
    }
    else
        return FAILED;
}

// Grammar rule: 𝑻′ →∗ 𝑭𝑻′ | 𝝐
int Tdash()
{
    if (*look_ahead == '*') {
        printf("%-16s T' -> * F T'\n", look_ahead);
        look_ahead++;
        if (F()) {
            if (Tdash())
                return SUCCESS;
            else
                return FAILED;
        }
        else
            return FAILED;
    }
    else {
        printf("%-16s T'->epsilon\n", look_ahead);
        return SUCCESS;
    }
}

// Grammar rule: 𝑭 → ( 𝑬 ) | 𝒊
int F()
{
    if (*look_ahead == '(') {
        printf("%-16s F -> ( E )\n", look_ahead);
        look_ahead++;
        if (E()) {
            if (*look_ahead == ')') {
                look_ahead++;
                return SUCCESS;
            }
            else
                return FAILED;
        }
        else
            return FAILED;
    }
    else if (*look_ahead == 'i') {
        printf("%-16s F -> i\n", look_ahead);
        look_ahead++;
        return SUCCESS;
    }
    else
        return FAILED;
}

CFG
𝑬 → 𝑻𝑬′
𝑬′ → +𝑻𝑬′ | 𝝐
𝑻 → 𝑭𝑻′
𝑻′ →∗ 𝑭𝑻′ | 𝝐
𝑭 → ( 𝑬 ) | 𝒊

CD / Module-III / Jasaswi 95 20 February 2024


Recursive Descent Parser: Exercise 1
Write down the procedures for the recursive descent parser for the given grammar:
E  0E*A | E+A | E%0
A  1A01 | 1
Answer:
Modified Grammar (post removal of Left Recursion and Application of Left Factoring):
𝐸 → 0 𝐸 ∗ 𝐴 𝐸′
𝐸′ → + 𝐴 𝐸′ | % 0 𝐸′ | 𝜖
𝐴 → 1 𝐵
𝐵 → 𝐴 0 1 | 𝜖
Then write the pseudocode or program using the method discussed in the previous slide.

CD / Module-III / Jasaswi 96 20 February 2024


Limitations with Recursive-Descent Parsing
 Consider a grammar with two productions 𝑋 → 𝛾1 and 𝑋 → 𝛾2
 Suppose FIRST(𝛾1) ∩ FIRST(𝛾2) ≠ 𝜙
• Say 𝑎 is the common terminal symbol
 Function corresponding to 𝑋 will not know which production to use on input token
𝑎

CD / Module-III / Jasaswi 97 20 February 2024


Recursive-Descent Parsing with Backtracking
 To support backtracking
• All productions should be tried in some order
• Failure for some production implies we need to try remaining productions
• Report an error only when there are no other rules

CD / Module-III / Jasaswi 98 20 February 2024


Predictive Parsing
 Special case of recursive-descent parsing that does not require backtracking
 Lookahead symbol unambiguously determines which production rule to use
 Advantage is that the algorithm is simple and the parser can be constructed by
hand

CD / Module-III / Jasaswi 99 20 February 2024


Pseudocode for a Predictive Parser

CD / Module-III / Jasaswi 100 20 February 2024


LL(1) Grammars
 Class of grammars for which no backtracking is required
• First L stands for left-to-right scan, second L stands for leftmost derivation
• There is one lookahead token
 No left-recursive or ambiguous grammar can be LL(1)
 In LL(k), k stands for k lookahead tokens
• Predictive parsers accept LL(k) grammars
• The LL(1) parser uses one input symbol of lookahead at each step to make
parsing action decisions.
• Every LL(1) grammar is an LL(2) grammar.

CD / Module-III / Jasaswi 101 20 February 2024


LL(1) Grammars
 A grammar 𝐺 is LL(1) if and only if whenever 𝐴 → 𝛼 | 𝛽 are two distinct
productions of 𝐺, the following conditions hold:
1. For no terminal a do both 𝛼 and 𝛽 derive strings beginning with a.
2. At most one of 𝛼 and 𝛽 can derive the empty string.
3. If 𝛽 ⇒∗ 𝜖, then 𝛼 does not derive any string beginning with a terminal in
FOLLOW(A). Likewise, if 𝛼 ⇒∗ 𝜖, then 𝛽 does not derive any string beginning
with a terminal in FOLLOW(A).
 The first two conditions are equivalent to the statement that FIRST(𝛼) and
FIRST(𝛽) are disjoint sets.
 The third condition is equivalent to stating that if 𝜖 is in FIRST(𝛽), then FIRST(𝛼)
and FOLLOW(𝐴) are disjoint sets, and likewise if 𝜖 is in FIRST(𝛼).
CD / Module-III / Jasaswi 102 20 February 2024
Non-recursive Table-Driven Predictive Parser

CD / Module-III / Jasaswi 103 20 February 2024


Construction of a predictive parsing table
INPUT: Grammar G
OUTPUT: Parsing Table M
METHOD: For each production A → α of the grammar do the following:
1. For each terminal 𝑎 in FIRST(𝛼), add A → α to 𝑀[𝐴, 𝑎].
2. If 𝜖 is in FIRST(𝛼), then for each terminal 𝑏 in FOLLOW(𝐴), add 𝐴 → 𝛼 to
𝑀[𝐴, 𝑏]. If 𝜖 is in FIRST(𝛼) and $ is in FOLLOW(𝐴), add 𝐴 → 𝛼 to 𝑀[𝐴, $].
NOTE: If after performing the above, there is no production at all in 𝑀[𝐴, 𝑎], then
set 𝑀[𝐴, 𝑎] to error. No production in 𝑀[𝐴, 𝑎] indicates error.

CD / Module-III / Jasaswi 104 20 February 2024


Construction of predictive parsing table: Example
Non-Terminal    FIRST       FOLLOW
E               (, id       $, )
𝑬′              +, 𝝐        $, )
T               (, id       +, $, )
𝑻′              ∗, 𝝐        +, $, )
F               (, id       *, +, $, )
CD / Module-III / Jasaswi 105 20 February 2024
Non-recursive Table-Driven Predictive Parser
Predictive Parsing Algorithm

• Output: If 𝑤 is in 𝐿(𝐺), a leftmost derivation of 𝑤; otherwise an error indication.

CD / Module-III / Jasaswi 106 20 February 2024


Working of Predictive Parser

Parsing Table

CFG

Moves made by a predictive parser on input id+id*id

CD / Module-III / Jasaswi 107 20 February 2024


Predictive Parsing
 Grammars whose predictive parsing tables contain no multiply-defined entries
are called LL(1).
 If grammar 𝐺 is left-recursive or is ambiguous, then parsing table 𝑀 will have at
least one multiply-defined cell
 Some grammars cannot be transformed into LL(1)
• The following grammar is ambiguous

CD / Module-III / Jasaswi 108 20 February 2024


Error Recovery in Predictive Parsing
 Error conditions
• Terminal on top of the stack does not match the next input symbol.
• Nonterminal 𝐴 is on top of the stack, 𝑎 is the next input symbol, and 𝑀[𝐴, 𝑎] is
error.
 Choices
• Raise an error and quit parsing.
• Print an error message, try to recover from the error, and continue with
compilation.

CD / Module-III / Jasaswi 109 20 February 2024


Error Recovery in Predictive Parsing
Panic mode:
 Skip over symbols on the input until a token in a selected set of synchronizing (synch) tokens
appears.
 The elements of the synchronizing tokens are chosen so that the parser recovers quickly
from errors.
 Some heuristics of how to chose the synchronizing tokens are:
• Add all tokens in FOLLOW(𝐴) to the synch set for nonterminal 𝐴.
 If we skip tokens until an element of FOLLOW(𝐴) is seen and pop A from the stack,
then parsing can continue.
• Add symbols in FIRST(𝐴) to the synch set for 𝐴.
 It may be possible to resume parsing according to A if a symbol in FIRST(A) appears
in the input.
• Add keywords that can begin statements to the synchronizing sets for the nonterminals
generating expressions.
CD / Module-III / Jasaswi 110 20 February 2024
Error Recovery: Example
 The following parsing table for the already considered grammar is given with synchronizing tokens
obtained from the FOLLOW set of the nonterminal.
 How to use the Table?
• If the parser looks up entry M[A,a] and finds that it is blank, then the input symbol a is skipped.
• If the entry is “synch”, then the nonterminal on top of the stack is popped in an attempt to resume
parsing.
• If the token on top of the stack does not match the input symbol, then pop the token from the stack.

CFG

Non-Terminal    FOLLOW
E               $, )
𝑬′              $, )
T               +, $, )
𝑻′              +, $, )
F               *, +, $, )

The parsing table with "synch" indicating synchronizing tokens obtained from the FOLLOW set of the nonterminals.

CD / Module-III / Jasaswi 111 20 February 2024


Error Recovery Moves: Example

The parsing table with "synch" entries

CFG

Parsing and error recovery moves made by a predictive parser on the erroneous input )id *+id
CD / Module-III / Jasaswi 112 20 February 2024
Error Recovery in Predictive Parsing
Phrase-Level Recovery:
 Phrase-level error recovery is implemented by filling in the blank entries in the predictive parsing
table with pointers to error routines.
 These routines may change, insert, or delete symbols on the input and issue appropriate error
messages.
• They may also pop from the stack.
• They may also alter stack symbols or push new symbols onto the stack. This is
questionable for the following reasons:
 The steps carried out by the parser might then not correspond to the derivation of any word
in the language at all.
 We must ensure that there is no possibility of an infinite loop.

CD / Module-III / Jasaswi 113 20 February 2024