Exp 6
Session 6:
Experiment Title: Implementation using a program to compute the FOLLOW Set
Aim/Objective:
To make the student conversant with why a predictive (top-down) parser needs the FOLLOW set in
order to parse without backtracking.
Description:
The experiment involves the following steps:
i. Understand the situations in which a top-down parser needs the FOLLOW set, and why a
top-down search might need to backtrack in the absence of the FOLLOW set.
ii. Learn how to compute the FOLLOW set.
Pre-Requisites:
Students should have a basic understanding of the top-down search process and of the left-to-right
(leftmost) derivation it produces during parsing, in which the left-hand-side symbol of a production is
replaced by its right-hand-side string.
Pre-Lab Experiments:
1) Given the definition of the FOLLOW set of a non-terminal X, written Follow(X):
All terminals that can immediately follow X in some sentential form (X need not be the leftmost
symbol), i.e., t ∈ Follow(X) if there is a derivation containing Xt.
The Follow set of the start symbol always contains the end-of-input marker (#).
For a predictive LL parser, having computed the FIRST sets, elaborate on the need
to compute the FOLLOW set, based on the following:
For a nullable non-terminal Yi, the parser needs to know what can follow Yi, because the production
Yi → ε should be chosen when the next input token is in Follow(Yi).
2) State why the FOLLOW set of a non-terminal can never contain the empty string (ε).
The FOLLOW set of a non-terminal never contains ε because its members are the terminals that
can appear immediately after the non-terminal in some derivation, and ε is not a terminal. The case
where nothing more follows is recorded with the end-of-input marker (#) rather than with ε.
3) Why, for a given set of productions of a grammar G, does the FOLLOW set of the start symbol
contain the end-of-input marker (#)?
The FOLLOW set of the start symbol contains the end-of-input marker (#) because the start
symbol derives the entire input string, so the only thing that can come after it is the end of the
input. Seeing # after the start symbol has been fully expanded indicates that parsing is complete.
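4) List the rules used to compute the FOLLOW function for every non-terminal of a CFG.
The rules to compute the FOLLOW set for every non-terminal in a CFG are:
i. Place the end-of-input marker # in FOLLOW(S), where S is the start symbol.
ii. For every production A → αBβ, add every terminal in FIRST(β) except ε to FOLLOW(B).
iii. For every production A → αB, or A → αBβ where FIRST(β) contains ε, add everything in
FOLLOW(A) to FOLLOW(B).
Rules ii and iii are applied repeatedly until no FOLLOW set changes.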
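The standard FOLLOW computation (seed the start symbol with #, add FIRST of whatever comes next, and inherit the FOLLOW of the left-hand side when whatever comes next can vanish) can be sketched as a fixed-point iteration. This is a minimal sketch, not a reference implementation: the function names `first_of`, `compute_first` and `compute_follow`, the dictionary encoding of the grammar, and the small example grammar are all assumptions made for illustration.

```python
EPS, END = "ε", "#"  # ε = empty string, # = end-of-input marker

def first_of(symbols, first):
    """FIRST of a symbol string; contains EPS only if every symbol is nullable."""
    out = set()
    for sym in symbols:
        f = first.get(sym, {sym})      # a terminal is its own FIRST set
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}                 # the whole string can derive ε

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                     # repeat until no FIRST set grows
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)             # start symbol is followed by #
    changed = True
    while changed:                     # repeat until no FOLLOW set grows
        changed = False
        for lhs, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue       # terminals have no FOLLOW set
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}         # FIRST of what follows sym
                    if EPS in rest:
                        new |= follow[lhs]     # what follows can vanish
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

# Hypothetical grammar: S -> A B c, A -> a | ε, B -> b | ε ([] encodes ε)
grammar = {"S": [["A", "B", "c"]], "A": [["a"], []], "B": [["b"], []]}
follow = compute_follow(grammar, "S")
for nt in grammar:
    print(f"FOLLOW({nt}) = {sorted(follow[nt])}")
```

In this example B is nullable, so the terminal c that follows B also ends up in FOLLOW(A).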
Also, show how the C code below exhibits ambiguity since, unlike Python, C has no spacing-based
indentation:
if (x != 0)
if (y == 1/x) ok = TRUE;
else z = 1/x;
Also show how to remove the ambiguity in the given C code.
#include <stdio.h>
#include <stdbool.h>

int main(void) {
    int x = 2;
    int y = 0;
    bool ok = false;
    int z = 0;   /* initialised so that printing z is always defined */

    /* As written in the question: C attaches the else to the nearest if. */
    if (x != 0)
        if (y == 1/x)   /* 1/x is integer division */
            ok = true;
        else
            z = 1/x;
    printf("Ambiguous code:\n");
    printf("ok: %d\n", ok);
    printf("z: %d\n\n", z);

    /* Reading 1: braces bind the else to the second (inner) if. */
    ok = false;
    z = 0;
    if (x != 0) {
        if (y == 1/x) {
            ok = true;
        } else {
            z = 1/x;
        }
    }
    printf("Disambiguated code (else with second if):\n");
    printf("ok: %d\n", ok);
    printf("z: %d\n\n", z);

    /* Reading 2: braces bind the else to the first (outer) if. */
    ok = false;
    z = 0;
    if (x != 0) {
        if (y == 1/x) {
            ok = true;
        }
    } else {
        z = 1/x;   /* only reached when x == 0, where it would divide by zero */
    }
    printf("Disambiguated code (else with first if):\n");
    printf("ok: %d\n", ok);
    printf("z: %d\n", z);
    return 0;
}
Course Title COMPILER DESIGN ACADEMIC YEAR: 2023-24
Course Code(s) 22CS2235 Page 3 of 86
Experiment <TO BE FILLED BY STUDENT> Student ID <TO BE FILLED BY STUDENT>
Date <TO BE FILLED BY STUDENT> Student Name <TO BE FILLED BY STUDENT>
OUTPUT:
In-Lab Experiments:
1. FOLLOW is used only if the current non-terminal can derive ε; then we are interested in what
could have followed it in a sentential form. (NB: A string can derive ε if and only if ε is in
its FIRST set.) FOLLOW can be applied only to a non-terminal and returns a set of terminals,
possibly including the end-of-input marker (#).
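#include <stdio.h>
#include <stdbool.h>
#include <string.h>
#define MAX_SYMBOLS 10
#define MAX_NONTERMINALS 10
#define MAX_FIRST_FOLLOW 20
int main(void) {
    int num_nonterminals = 2;
    int num_productions = 2;
    char nonterminals[] = "ST"; // Non-terminals in the grammar; S is the start symbol
    // Productions: S → aT, T → b (the first character is the left-hand side)
    char productions[][MAX_SYMBOLS] = { "SaT", "Tb" };
    // FOLLOW sets; FOLLOW(S) starts with the end-of-input marker #
    char follow_set[MAX_NONTERMINALS][MAX_FIRST_FOLLOW] = { "#" };
    bool changed = true;
    while (changed) { // repeat until no FOLLOW set grows
        changed = false;
        for (int p = 0; p < num_productions; p++) {
            // Simplified for this grammar: only a non-terminal B that ends the
            // body of A → αB is handled, by copying FOLLOW(A) into FOLLOW(B).
            char *last = strchr(nonterminals, productions[p][strlen(productions[p]) - 1]);
            if (!last) continue; // the body ends in a terminal
            char *fb = follow_set[last - nonterminals];
            for (char *fa = follow_set[strchr(nonterminals, productions[p][0]) - nonterminals]; *fa; fa++)
                if (!strchr(fb, *fa)) { fb[strlen(fb)] = *fa; changed = true; }
        }
    }
    for (int i = 0; i < num_nonterminals; i++)
        printf("FOLLOW(%c) = { %s }\n", nonterminals[i], follow_set[i]);
    return 0;
}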
Output:
Post-Lab Experiments:
1. Write a program to compute the FOLLOW set of each non-terminal for the following grammar:
E → E+T | T
T → T*F | F
F → (E) | i
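#include <stdio.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SYMBOLS 10
#define MAX_NONTERMINALS 10
#define MAX_FIRST_FOLLOW 20

/* Add terminal c to set s if absent; return true when the set grew. */
static bool add(char *s, char c) {
    if (strchr(s, c)) return false;
    s[strlen(s)] = c;   /* sets start zero-filled, so s stays terminated */
    return true;
}

/* Add every terminal of src to dst; return true when dst grew. */
static bool add_all(char *dst, const char *src) {
    bool grew = false;
    for (; *src; src++) grew |= add(dst, *src);
    return grew;
}

int main(void) {
    int num_nonterminals = 3;
    int num_productions = 6;
    char nonterminals[] = "ETF"; // Non-terminals in the grammar; E is the start symbol
    // Productions; the first character of each string is the left-hand side
    char productions[][MAX_SYMBOLS] = {
        "EE+T",   /* E -> E+T */
        "ET",     /* E -> T   */
        "TT*F",   /* T -> T*F */
        "TF",     /* T -> F   */
        "F(E)",   /* F -> (E) */
        "Fi"      /* F -> i   */
    };
    // FIRST sets of E, T and F (no symbol of this grammar is nullable)
    char first_set[][MAX_FIRST_FOLLOW] = { "(i", "(i", "(i" };
    // FOLLOW sets; FOLLOW(E) starts with the end-of-input marker #
    char follow_set[MAX_NONTERMINALS][MAX_FIRST_FOLLOW] = { "#" };
    for (bool changed = true; changed; ) {   // repeat until no set grows
        changed = false;
        for (int p = 0; p < num_productions; p++) {
            char *lhs = follow_set[strchr(nonterminals, productions[p][0]) - nonterminals];
            for (int i = 1; productions[p][i]; i++) {
                char *b = strchr(nonterminals, productions[p][i]);
                if (!b) continue;                  // terminals have no FOLLOW set
                char *fb = follow_set[b - nonterminals];
                char next = productions[p][i + 1];
                char *n = next ? strchr(nonterminals, next) : NULL;
                if (!next)  changed |= add_all(fb, lhs);   // A -> αB: add FOLLOW(A)
                else if (n) changed |= add_all(fb, first_set[n - nonterminals]);
                else        changed |= add(fb, next);      // a terminal follows B
            }
        }
    }
    for (int i = 0; i < num_nonterminals; i++)
        printf("FOLLOW(%c) = { %s }\n", nonterminals[i], follow_set[i]);
    return 0;
}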
OUTPUT:
2. Write a program to compute the FOLLOW set of the non-terminals for the ‘Dangling else’ problem.
#include <stdio.h>
#include <stdbool.h>

int main(void) {
    int x = 2;
    int y = 0;
    bool ok = false;
    int z = 0;   /* initialised so that printing z is always defined */

    /* Dangling else: C attaches the else to the nearest if. */
    if (x != 0)
        if (y == 1/x)   /* 1/x is integer division */
            ok = true;
        else
            z = 1/x;
    printf("Ambiguous code:\n");
    printf("ok: %d\n", ok);
    printf("z: %d\n\n", z);

    /* Reading 1: braces bind the else to the second (inner) if. */
    ok = false;
    z = 0;
    if (x != 0) {
        if (y == 1/x) {
            ok = true;
        } else {
            z = 1/x;
        }
    }
    printf("Disambiguated code (else with second if):\n");
    printf("ok: %d\n", ok);
    printf("z: %d\n\n", z);

    /* Reading 2: braces bind the else to the first (outer) if. */
    ok = false;
    z = 0;
    if (x != 0) {
        if (y == 1/x) {
            ok = true;
        }
    } else {
        z = 1/x;   /* only reached when x == 0, where it would divide by zero */
    }
    printf("Disambiguated code (else with first if):\n");
    printf("ok: %d\n", ok);
    printf("z: %d\n", z);
    return 0;
}
OUTPUT:
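The dangling-else problem can also be viewed through FOLLOW sets. The sketch below assumes the usual left-factored textbook grammar for if-then-else (it is not given in this worksheet) and encodes each symbol as one character; the names `GRAMMAR`, `first_of`, `fixed_point`, `first_step` and `follow_step` are illustrative. It computes the FOLLOW sets and shows that `else` lies in both FIRST and FOLLOW of the optional else-part, which is the LL(1) conflict behind the ambiguity.

```python
# Dangling-else grammar after left factoring (assumed textbook form):
#   S -> i E t S R | a      i = if, t = then, a = any other statement
#   R -> e S | ε            e = else, R plays the role of S'
#   E -> b                  b = a condition
GRAMMAR = {"S": ["iEtSR", "a"], "R": ["eS", ""], "E": ["b"]}
EPS, END = "ε", "#"

def first_of(body, first):
    """FIRST of a symbol string; EPS is included only if all of it is nullable."""
    out = set()
    for sym in body:
        f = first[sym] if sym in GRAMMAR else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}

def fixed_point(step, sets):
    """Apply step repeatedly until no set grows any more."""
    while True:
        before = sum(len(s) for s in sets.values())
        step(sets)
        if sum(len(s) for s in sets.values()) == before:
            return sets

def first_step(first):
    for nt, bodies in GRAMMAR.items():
        for body in bodies:
            first[nt] |= first_of(body, first)

first = fixed_point(first_step, {nt: set() for nt in GRAMMAR})

def follow_step(follow):
    for lhs, bodies in GRAMMAR.items():
        for body in bodies:
            for i, sym in enumerate(body):
                if sym in GRAMMAR:                 # only non-terminals have FOLLOW
                    rest = first_of(body[i + 1:], first)
                    follow[sym] |= rest - {EPS}
                    if EPS in rest:                # the rest can vanish
                        follow[sym] |= follow[lhs]

follow = fixed_point(follow_step, {"S": {END}, "R": set(), "E": set()})
conflict = (first["R"] - {EPS}) & follow["R"]
print("FOLLOW:", {nt: sorted(s) for nt, s in follow.items()})
print("LL(1) conflict on R for:", sorted(conflict))
```

A predictive parser usually resolves this conflict by always choosing R → e S when it sees `else`, which matches the C rule of attaching an else to the nearest if.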
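Viva-Voce Questions:
Q1. Calculate FOLLOW(S) and FOLLOW(A) for the given CFG:
S → Aa | Ac, A → b
Ans) FOLLOW(S) = { # }, since S is the start symbol and appears on no right-hand side.
FOLLOW(A) = { a, c }, since A is followed by a in S → Aa and by c in S → Ac.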
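Q2. What is a Recursive Descent Parser? Why is it called a top-down parser rather than a bottom-up one,
even though the base case(s) occur only at the leaf-node level?
Ans)
**Recursive Descent Parser** is a top-down parser that uses one recursive procedure per non-terminal to parse
the input string. It is called top-down because parsing starts by calling the procedure for the start symbol
(the root of the parse tree), and each call chooses a production to expand before any of its leaves are
matched. The leaves are only where the recursion bottoms out, not where construction of the tree begins.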
Q.3 a) Consider the evaluation of unparenthesized arithmetic expressions, with two groups (having
equal precedence, inside a group) operators: *, /, and +, -, with the first group having higher precedence
than the second one; using either suffix (reverse polish notation/form), or prefix (polish notation/form).
Which form is amenable to construction by bottom-up parsing, and why?
Which form is amenable to top-down parsing?
Which data structure is used for each of the two types of parsing?
- **Bottom-Up Parsing**:
  - **Amenable Form**: Suffix (Reverse Polish Notation).
  - **Why**: It processes operators after their operands, aligning with shift-reduce parsing techniques.
  - **Data Structure**: Stack.
- **Top-Down Parsing**:
  - **Amenable Form**: Prefix (Polish Notation).
  - **Why**: It processes operators before their operands, fitting the recursive descent method.
  - **Data Structure**: Recursive function call stack.
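The contrast between the two forms can be illustrated with two small evaluators. This is a sketch under simple assumptions (space-separated tokens, numeric operands, the four binary operators only); `eval_suffix` and `eval_prefix` are names chosen for the example. The suffix evaluator shifts operands onto an explicit stack and reduces when an operator arrives, while the prefix evaluator sees the operator first and recursively requests its two operands.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_suffix(tokens):
    """Bottom-up flavour: shift operands, reduce when an operator arrives."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))   # reduce two operands to one
        else:
            stack.append(float(tok))              # shift an operand
    return stack.pop()

def eval_prefix(tokens):
    """Top-down flavour: the operator comes first and predicts two operands."""
    it = iter(tokens)

    def expr():
        tok = next(it)
        if tok in OPS:
            left = expr()      # the call stack holds the pending operators
            right = expr()
            return OPS[tok](left, right)
        return float(tok)

    return expr()

# (1 + 2) * (3 + 4) written in both notations
print(eval_suffix("1 2 + 3 4 + *".split()))
print(eval_prefix("* + 1 2 + 3 4".split()))
```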
Q.3 b) Given the arithmetic expressions below, construct the equivalent suffix and prefix forms, without
parentheses. What is the use of operator precedence in the conversion?
(i) (w + x) * (y +z)
(ii) (x+y)/z
(iii) x/y/z
Show how to extend the precedence functions to handle :
1. relop (relational operators),
2. conditional statements,
3. Unconditional transfers (goto),
4. subscripted variables,
with code below to show the implementation of arithmetic expressions:
class Precedence:
    # Larger number = binds tighter. The arithmetic rows cover Q.3 b); the
    # remaining rows are one possible extension, not a fixed standard.
    precedence = {
        '+': 1,
        '-': 1,
        '*': 2,
        '/': 2,
        '>': 0.5,
        '<': 0.5,
        '==': 0.5,
        '=': 0.25,  # assignment (assumed level, needed for "a [ i ] = b")
        'if': 0,
        'goto': -1,
        '[': 3,  # For subscripted variables (handled like a bracket below)
    }

    @staticmethod
    def get_precedence(op):
        return Precedence.precedence.get(op, -1)


def _convert(tokens, pops):
    """Stack-based conversion; pops(top, tok) says whether top is output first."""
    stack = []
    output = []
    for token in tokens:
        if token in ('(', '['):  # opening bracket: wait for its partner
            stack.append(token)
        elif token in (')', ']'):
            opener = '(' if token == ')' else '['
            while stack and stack[-1] != opener:
                output.append(stack.pop())
            stack.pop()  # Pop the opening bracket
            if token == ']':
                output.append('[]')  # subscript operator: a [ i ] -> a i []
        elif token in Precedence.precedence:  # operators, 'if' and 'goto'
            while stack and stack[-1] not in ('(', '[') and pops(
                    Precedence.get_precedence(stack[-1]),
                    Precedence.get_precedence(token)):
                output.append(stack.pop())
            stack.append(token)
        else:  # operand or label
            output.append(token)
    while stack:
        output.append(stack.pop())
    return output


def infix_to_postfix(expression):
    # '>=' outputs operators of equal precedence first: left associativity.
    return ' '.join(_convert(expression.split(), lambda top, tok: top >= tok))


def infix_to_prefix(expression):
    # Reverse the tokens, swap the parentheses, convert, reverse the result.
    # Intended for the arithmetic expressions only (no subscripts or keywords).
    swap = {'(': ')', ')': '('}
    tokens = [swap.get(t, t) for t in reversed(expression.split())]
    return ' '.join(reversed(_convert(tokens, lambda top, tok: top > tok)))


# Example usage
expressions = [
    "( w + x ) * ( y + z )",
    "( x + y ) / z",
    "x / y / z",
    "if x > y goto L1",
    "a [ i ] = b"
]
for expr in expressions[:3]:
    print(expr, '| suffix:', infix_to_postfix(expr), '| prefix:', infix_to_prefix(expr))
for expr in expressions[3:]:
    print(expr, '| suffix:', infix_to_postfix(expr))
Q.3 c) The below CFG grammar handles the precedence (mulop above addop) , but not the left-
associativity of mulop (*, /), & addop (+,-).
But, this grammar is not suitable for top-down parsing, due to left recursion.
Bottom-up parsers, such as those generated by the YACC tool (traditionally used to build C compilers),
have no difficulty with left recursion. State the modifications needed so that the above grammar becomes
fit for top-down parsing while still enabling left-associativity.
To make the grammar suitable for top-down parsing while enabling left-associativity, we need to
eliminate left recursion. Here's the modified grammar:
```plaintext
exp → term exp'
exp' → addop term exp' | ε
addop → + | -
term → factor term'
term' → mulop factor term' | ε
mulop → * | /
factor → ( exp ) | number
```
### Explanation:
- **Eliminated Left Recursion**: By introducing new non-terminals `exp'` and `term'`, we remove the
left recursion.
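- **Left-Associativity**: The parse tree built from `exp'` and `term'` leans to the right, so left-associativity is restored during evaluation (or tree building) by passing the result computed so far into `exp'` / `term'` as the left operand of the next operator.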
- **Suitable for Top-Down Parsing**: This form can be parsed using recursive descent or predictive
parsing techniques.
This modified grammar supports left-associativity and is suitable for top-down parsing.
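A recursive-descent evaluator for the modified grammar can make the left-associativity point concrete. The following is a sketch under assumptions: the function names, the regular-expression tokenizer, the `#` end marker and the restriction to non-negative integers are choices made for this example. Each `*_rest` function corresponds to a primed non-terminal and receives the value computed so far as its left operand, so `8 - 3 - 2` is evaluated as `(8 - 3) - 2`.

```python
import re

def evaluate(text):
    """Recursive-descent evaluator with one function per non-terminal."""
    tokens = re.findall(r"\d+|[-+*/()]", text) + ["#"]   # '#' marks end of input
    pos = 0

    def peek():
        return tokens[pos]

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def exp():                       # exp -> term exp'
        return exp_rest(term())

    def exp_rest(left):              # exp' -> addop term exp' | ε
        if peek() in ("+", "-"):
            op = advance()
            right = term()
            # Combine with the left operand first, then continue the chain.
            return exp_rest(left + right if op == "+" else left - right)
        return left                  # ε: lookahead belongs to FOLLOW(exp')

    def term():                      # term -> factor term'
        return term_rest(factor())

    def term_rest(left):             # term' -> mulop factor term' | ε
        if peek() in ("*", "/"):
            op = advance()
            right = factor()
            return term_rest(left * right if op == "*" else left / right)
        return left                  # ε: lookahead belongs to FOLLOW(term')

    def factor():                    # factor -> ( exp ) | number
        tok = advance()
        if tok == "(":
            value = exp()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    result = exp()
    if peek() != "#":
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result

print(evaluate("8 - 3 - 2"))   # a right-associative reading would give 7 instead
print(evaluate("2 + 3 * 4"))
```

The ε alternatives are taken without consuming input, which is exactly the decision the FOLLOW sets of `exp'` and `term'` justify in a predictive parser.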
Q. 4. Why is top-down parsing said to be based on a left-to-right search for applicable derivations,
while a bottom-up parser is said to be based on a right-to-left search for applicable reductions? Explain
with an example, using the productions of the CFG below:
0. Goal → Expr
1. Expr → Term Expr'
2. Expr' → + Term Expr'
3.       | - Term Expr'
4.       | ε
5. Term → Factor Term'
6. Term' → * Factor Term'
7.       | / Factor Term'
8.       | ε
9. Factor → ( Expr )
10.       | num
11.       | name
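A table-driven LL(1) parse of this grammar shows the top-down side of the question: the input is read left to right, and at every step the leftmost unexpanded non-terminal (the top of the stack) is replaced, which yields a leftmost derivation. The sketch below is illustrative only: the parsing table was filled in by hand from the FIRST and FOLLOW sets of the grammar, `#` is used as the end marker, and the names `PRODUCTIONS`, `TABLE` and `parse` are assumptions of this example. It returns the production numbers in the order they are applied.

```python
EOF = "#"
PRODUCTIONS = {
    0: ("Goal", ["Expr"]),
    1: ("Expr", ["Term", "Expr'"]),
    2: ("Expr'", ["+", "Term", "Expr'"]),
    3: ("Expr'", ["-", "Term", "Expr'"]),
    4: ("Expr'", []),
    5: ("Term", ["Factor", "Term'"]),
    6: ("Term'", ["*", "Factor", "Term'"]),
    7: ("Term'", ["/", "Factor", "Term'"]),
    8: ("Term'", []),
    9: ("Factor", ["(", "Expr", ")"]),
    10: ("Factor", ["num"]),
    11: ("Factor", ["name"]),
}
STARTERS = ["(", "num", "name"]        # FIRST(Expr) = FIRST(Term) = FIRST(Factor)
TABLE = {}
for nt, prod in (("Goal", 0), ("Expr", 1), ("Term", 5)):
    for tok in STARTERS:
        TABLE[nt, tok] = prod
TABLE.update({("Expr'", "+"): 2, ("Expr'", "-"): 3,
              ("Term'", "*"): 6, ("Term'", "/"): 7,
              ("Factor", "("): 9, ("Factor", "num"): 10, ("Factor", "name"): 11})
for tok in (")", EOF):                 # FOLLOW(Expr') = { ), # } selects Expr' -> ε
    TABLE["Expr'", tok] = 4
for tok in ("+", "-", ")", EOF):       # FOLLOW(Term') = { +, -, ), # } selects Term' -> ε
    TABLE["Term'", tok] = 8
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS.values()}

def parse(tokens):
    """Return the production numbers used, in leftmost-derivation order."""
    tokens = tokens + [EOF]
    stack = [EOF, "Goal"]              # the top of the stack is the end of the list
    pos, used = 0, []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            if (top, look) not in TABLE:
                raise SyntaxError(f"no rule for {top} on {look!r}")
            number = TABLE[top, look]
            used.append(number)
            # Push the body reversed so that its leftmost symbol is on top.
            stack.extend(reversed(PRODUCTIONS[number][1]))
        elif top == look:
            pos += 1                   # match a terminal (or the end marker)
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    return used

print(parse(["name", "+", "num", "*", "name"]))
```

A bottom-up parser would discover the same productions in the reverse order of a rightmost derivation, reducing handles as they appear on its stack.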
Q.5. List the various error detection and error recovery strategies for predictive LL parsers.
Error detection:
1. **Unexpected Token**: The current token does not match the terminal on top of the stack, or the parsing table has no entry for the current non-terminal and token.
2. **Missing Tokens**: A token required by the grammar is absent from the input.
Error recovery:
1. **Panic Mode**: Skip tokens until a synchronizing token (like a semicolon or closing bracket) is found.
2. **Phrase-Level Recovery**: Replace a bad token with a plausible token and continue parsing.
3. **Error Productions**: Include specific error rules in the grammar to handle common mistakes.
4. **Synchronization Points**: Define points in the grammar where the parser can safely resume after detecting an error (e.g., after statement boundaries).
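Panic-mode recovery can be sketched on a deliberately tiny language. Everything here is an assumption made for illustration: statements have the hypothetical form `id = num ;`, tokens are given as token kinds, `;` is the only synchronizing token, and `parse_statements` is a made-up name. On an error the parser records a message, discards input up to and including the next `;`, and resumes with the next statement.

```python
SYNC = {";"}          # synchronizing tokens (an assumption for this sketch)

def parse_statements(tokens):
    """Parse statements of the form `id = num ;`, recovering in panic mode."""
    errors, good, pos = [], 0, 0
    expected = ["id", "=", "num", ";"]
    while pos < len(tokens):
        start = pos
        for want in expected:
            if pos < len(tokens) and tokens[pos] == want:
                pos += 1
                continue
            found = tokens[pos] if pos < len(tokens) else "end of input"
            errors.append(f"statement at token {start}: expected {want!r}, found {found!r}")
            # Panic mode: discard tokens up to and including the next sync token.
            while pos < len(tokens) and tokens[pos] not in SYNC:
                pos += 1
            pos += 1
            break
        else:
            good += 1                 # all four tokens matched
    return good, errors

good, errors = parse_statements("id = num ; id num ; id = num ;".split())
print(good, "statements parsed")
for message in errors:
    print(message)
```

Because the parser resynchronizes at the statement boundary, the single missing `=` produces one error message and the following statement is still parsed.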