

University of Engineering and Technology, Lahore


(Narowal Campus)

Experiments

Compiler Construction (Lab)


Teacher: Dr. Iqra Munir
Teaching Assistant: Muzamil Yousaf

Department of Computer Science


University of Engineering and Technology, Lahore (Narowal Campus)

Table of Contents

EXPERIMENT NAME

EXP NO. 1   DFA to NFA Conversion
EXP NO. 2   LEX as Standalone
EXP NO. 3   Predictive Parsing Table
EXP NO. 4   SLR Parser Table
EXP NO. 5   Unification Algorithm
EXP NO. 6   LR Parser Table
EXP NO. 7   Code Generation
EXP NO. 8   Code Optimization
EXP NO. 9   Basic LEX
EXP NO. 10  Recursive Descent Parser
EXP NO. 11  FIRST & FOLLOW Algorithm


Experiment # 1
Finite automata:

A finite automaton (FA) is a simple idealized machine used to recognize patterns within
input taken from some character set (or alphabet) C. The job of an FA is
to accept or reject an input depending on whether the pattern defined by the FA occurs in
the input.

A finite automaton consists of:

 a finite set S of N states
 a special start state
 a set of final (or accepting) states
 a set of transitions T from one state to another, labeled with chars in C

As noted above, we can represent a FA graphically, with nodes for states, and arcs for
transitions.

We execute our FA on an input sequence as follows:

 Begin in the start state
 If the next input char matches the label on a transition from the current state to a
new state, go to that new state
 Continue making transitions on each input char
o If no move is possible, then stop
o If in accepting state, then accept

Types

The different types of Finite Automata are as follows −

 Finite Automata without output
o Deterministic Finite Automata (DFA)
o Non-Deterministic Finite Automata (NFA or NDFA)

Deterministic Finite Automata (DFA)


o DFA stands for deterministic finite automaton. "Deterministic" refers to the
uniqueness of the computation: for each state and input symbol the machine has
exactly one move, and it reads the input string one symbol at a time.
o In a DFA, there is only one path for a specific input from the current state to the
next state.
o A DFA does not accept the null move, i.e., a DFA cannot change state without
consuming an input character.
o A DFA can contain multiple final states. DFAs are used in lexical analysis in a
compiler.

In the following diagram, we can see that from state q0 for input a, there is only one path
which is going to q1. Similarly, from q0, there is only one path for input b going to q2.

Non-deterministic finite automata (NFA)

o NFA stands for non-deterministic finite automaton. It is easier to construct an
NFA than a DFA for a given regular language.
o A finite automaton is called an NFA when there can be several paths for a specific
input from the current state to the next state.
o Not every NFA is a DFA, but every NFA can be translated into an equivalent DFA.
o An NFA is defined in the same way as a DFA, with two exceptions: a state may
have multiple next states for the same input, and ε-transitions are allowed.

In the following image, we can see that from state q0 for input a, there are two next states,
q1 and q2; similarly, from q0 for input b, the next states are q0 and q1. Thus it is not fixed
where the machine goes next on a particular input. Hence this FA is called a non-
deterministic finite automaton.


Code
Code for the conversion of DFA to NFA.
#include <iostream>
#include <vector>
#include <set>
#include <map>

// Define a structure for NFA transitions


struct NFATransition {
char inputSymbol;
std::set<int> targetStates;
};

// Function to convert DFA to NFA


std::map<int, std::vector<NFATransition>> convertDFAToNFA(const
std::vector<std::vector<int>>& dfatransitions) {
int numStates = dfatransitions.size();

// Initialize the NFA transition table


std::map<int, std::vector<NFATransition>> nfatransitions;


for (int state = 0; state < numStates; ++state) {


nfatransitions[state]; // Initialize the state in the NFA transition table

// Iterate only over the symbols the DFA row actually defines,
// to avoid reading past the end of the transition row
for (std::size_t symIdx = 0; symIdx < dfatransitions[state].size(); ++symIdx) {
char symbol = 'a' + static_cast<char>(symIdx);

std::set<int> targetStates;
int nextState = dfatransitions[state][symIdx];

if (nextState != -1) {
targetStates.insert(nextState);
}

nfatransitions[state].push_back({symbol, targetStates});
}
}

return nfatransitions;
}

int main() {
// Define a DFA transition table (example)
std::vector<std::vector<int>> dfatransitions = {
{1, 2}, // State 0, transitions for 'a' and 'b'
{0, 2}, // State 1, transitions for 'a' and 'b'
{2, 0} // State 2, transitions for 'a' and 'b'
};

// Convert DFA to NFA


std::map<int, std::vector<NFATransition>> nfatransitions =


convertDFAToNFA(dfatransitions);

// Print the NFA transition table


for (const auto& stateTransitions : nfatransitions) {
int state = stateTransitions.first;
const std::vector<NFATransition>& transitions = stateTransitions.second;

for (const NFATransition& transition : transitions) {


std::cout << "State " << state << " on input '" << transition.inputSymbol << "' goes to
states: ";
for (int targetState : transition.targetStates) {
std::cout << targetState << " ";
}
std::cout << std::endl;
}
}

return 0;
}


Experiment # 2
Standalone Lexical Analyzer:
A standalone lexical analyzer, also known as a lexer or tokenizer, is a program that
operates independently and is designed to analyze the lexical structure of a given input.
The primary purpose of a lexical analyzer is to break down the input into a stream of tokens,
which are the smallest units of meaning in a programming language. This process is a
fundamental step in the compilation of source code.
Definition:
Here's a general definition of a standalone lexical analyzer:


A standalone lexical analyzer is a self-contained program or module that takes source


code as input and produces a sequence of tokens as output. It is responsible for scanning
the input character by character, recognizing and categorizing the elements based on
predefined rules and patterns. These rules often involve regular expressions that describe
the syntactic structure of the programming language.
Key Features:
The key features of a standalone lexical analyzer include:
1. Input Handling:
 Accepts source code or a character stream as input.
2. Tokenization:
 Breaks down the input into tokens, where each token represents a meaningful
unit in the programming language (e.g., keywords, identifiers, operators,
literals).
3. Pattern Matching:
 Uses regular expressions or similar mechanisms to define the patterns for
different types of tokens.
4. Token Output:
 Generates a stream of tokens, associating each token with its type and, in
some cases, additional information like the token's value or position in the
source code.
5. Error Handling:
 Provides mechanisms for handling and reporting lexical errors when the
input does not conform to expected token patterns.
6. Independence:
 Operates independently of other components in a compiler or interpreter. A
standalone lexer can be used in various contexts without relying on a specific
compiler or interpreter implementation.
7. Configurability:
 Allows for configurability to adapt to different programming languages or
dialects by adjusting the tokenization rules.


8. Integration:
 Can be integrated into larger systems, such as compilers or interpreters, to
perform the initial phase of lexical analysis.
Standalone lexical analyzers are foundational components in the compilation process
and are often employed in combination with parsers and other components to transform
source code into executable or intermediate code. They play a crucial role in understanding
the structure of programming languages and converting human-readable code into a form
that is easier for machines to process.

CODE
A C++ program that implements a standalone lexical analyzer (written by hand rather than generated with LEX).
#include <iostream>
#include <cctype>
#include <vector>
#include <string>

// Token types
enum TokenType {
IDENTIFIER,
NUMBER,
OPERATOR,
PUNCTUATION,
KEYWORD,
END_OF_FILE
};

// Token structure
struct Token {


TokenType type;
std::string value;
};

class Lexer {
public:
Lexer(const std::string& input) : input(input), position(0) {}

Token getNextToken() {
if (position >= input.length()) {
return {END_OF_FILE, ""};
}

char currentChar = input[position++];

// Skip whitespace, guarding against running off the end of the input
while (std::isspace(static_cast<unsigned char>(currentChar))) {
if (position >= input.length()) {
return {END_OF_FILE, ""};
}
currentChar = input[position++];
}

// Identify token type


if (std::isalpha(currentChar)) {
return scanIdentifier(currentChar);
} else if (std::isdigit(currentChar)) {
return scanNumber(currentChar);
} else if (isOperator(currentChar)) {
return {OPERATOR, std::string(1, currentChar)};


} else if (isPunctuation(currentChar)) {
return {PUNCTUATION, std::string(1, currentChar)};
}

// Handle unknown characters


return {END_OF_FILE, ""};
}

private:
std::string input;
std::size_t position;

Token scanIdentifier(char initialChar) {


std::string identifier(1, initialChar);

while (position < input.length() && (std::isalnum(input[position]) || input[position] == '_'))


{
identifier += input[position++];
}

// Check if the identifier is a keyword


if (isKeyword(identifier)) {
return {KEYWORD, identifier};
} else {
return {IDENTIFIER, identifier};
}
}


Token scanNumber(char initialChar) {


std::string number(1, initialChar);

while (position < input.length() && std::isdigit(input[position])) {


number += input[position++];
}

return {NUMBER, number};


}

bool isOperator(char c) {
return c == '+' || c == '-' || c == '*' || c == '/';
}

bool isPunctuation(char c) {
return c == '(' || c == ')' || c == '{' || c == '}' || c == ',' || c == ';';
}

bool isKeyword(const std::string& identifier) {


// For simplicity, we'll consider a few keywords
return identifier == "if" || identifier == "else" || identifier == "while" || identifier == "int";
}
};

int main() {
std::string sourceCode = "int main() { return 0; }";


Lexer lexer(sourceCode);

Token token = lexer.getNextToken();


while (token.type != END_OF_FILE) {
std::cout << "Token Type: " << token.type << ", Value: '" << token.value << "'\n";
token = lexer.getNextToken();
}

return 0;
}

Experiment # 3
Predictive Parser:


Predictive parsing is a parsing technique used in compiler design to analyze and validate
the syntactic structure of a given input string based on a grammar. It predicts the production
rules to apply without backtracking, making it efficient and deterministic.
Components of Predictive Parsing
1. Input Buffer
The input buffer is a temporary storage area that holds the input symbols yet to be
processed by the parser. It provides a lookahead mechanism, allowing the parser to
examine the next input symbol without consuming it.

In this diagram, the input buffer is shown as a series of unprocessed tokens (T) and non-
terminals (N). The end-of-input marker is denoted by the "$" symbol. The parsing table is
a data structure that directs the parser's actions based on the top of the parsing stack and
the current input symbol.
2. Stack
The stack, also known as the parsing stack or the pushdown stack, is a data structure used
by the parser to keep track of the current state and to guide the parsing process. It stores
grammar symbols, both terminals and non-terminals, during the parsing process.


3. Predictive Parsing Table


The predictive parsing table is a data structure used in predictive parsing. It maps the
combination of a non-terminal and a lookahead symbol to a production rule. It guides the
parser's decisions, determining which production to apply based on the current non-
terminal and the next input symbol.

This diagram represents the parsing table as a grid with rows and columns. The rows
correspond to the non-terminals in the grammar, and the columns represent the input
symbols (terminals and the end-of-input marker $).
Each cell in the table contains information about the action or the production to be applied
when a specific non-terminal and input symbol combination is encountered. Here's what
each entry in the table represents:
 S, A, B: Non-terminals in the grammar.
 S->, A->, B->: Indicates a production rule to be applied.
 a, b: Terminal symbols from the input alphabet.
 ε: Represents an empty or epsilon production.
 Empty cells: Indicate that no action or production is defined for that combination of
non-terminal and input symbols.


For example, if the parser is in state S (row) and the next input symbol is "a" (column), the
corresponding cell entry "S->aAB" instructs the parser to apply the production rule "S ->
aAB".
CODE
C++ code for the construction of a predictive parsing table.
#include <iostream>
#include <unordered_map>
#include <vector>
#include <set>

using namespace std;

// Grammar rule right-hand sides (for reference; the table entries below are added by hand)
vector<string> rules = {"E$", "TR", "+TR", "FY", "*FY", "(E)", "id"};

// Define the non-terminals


set<char> nonTerminals = {'S', 'E', 'R', 'T', 'Y', 'F'};

// Define the terminals


set<string> terminals = {"$", "+", "*", "(", ")", "id"};

// Define the parsing table


unordered_map<char, unordered_map<string, string>> parsingTable;

// Function to add entries to the parsing table


void addEntry(char nonTerminal, const string &terminal, const string &rule) {
parsingTable[nonTerminal][terminal] = rule;
}

// Function to print the parsing table


void printParsingTable() {
cout << "Predictive Parsing Table:\n";
for (char nonTerminal : nonTerminals) {
for (const string &terminal : terminals) {
if (parsingTable[nonTerminal].find(terminal) != parsingTable[nonTerminal].end()) {
cout << "M[" << nonTerminal << ", " << terminal << "] = " <<
parsingTable[nonTerminal][terminal] << endl;
}
}
}
}

int main() {
// Initialize the parsing table
for (char nonTerminal : nonTerminals) {
for (const string &terminal : terminals) {
parsingTable[nonTerminal][terminal] = "";
}
}

// Add entries to the parsing table based on the grammar rules


addEntry('S', "(", "E$");
addEntry('S', "id", "E$");
addEntry('E', "(", "TR");
addEntry('E', "id", "TR");
addEntry('R', "+", "+TR");

addEntry('R', ")", "ε");


addEntry('R', "$", "ε");
addEntry('T', "(", "FY");
addEntry('T', "id", "FY");
addEntry('Y', "+", "ε");
addEntry('Y', "*", "*FY");
addEntry('Y', ")", "ε");
addEntry('Y', "$", "ε");
addEntry('F', "(", "(E)");
addEntry('F', "id", "id");

// Print the parsing table


printParsingTable();

return 0;
}


Experiment # 4
SLR Parser :
SLR stands for Simple LR. It is the smallest class of LR grammars and produces tables
with a small number of states. An SLR parser is easy to construct and is similar to LR(0)
parsing. The only difference between an SLR parser and an LR(0) parser is that in the
LR(0) parsing table there is a chance of a 'shift-reduce' conflict, because 'reduce' is
entered under every terminal in a reducing state. We can solve this problem by entering
'reduce' only under the terminals in FOLLOW of the LHS of the production in that state.
This is called the SLR(1) collection of items.

Steps for constructing the SLR parsing table:

1. Write the augmented grammar
2. Find the LR(0) collection of items
3. Find FOLLOW of the LHS of each production
4. Fill in the two table functions: action[terminals] and goto[non-terminals] in the
parsing table
CODE
C++ code for SLR parser table generation.
#include <iostream>
#include <vector>
#include <unordered_map>
#include <set>
#include <stack>
#include <algorithm>

using namespace std;

// Define the grammar rules


vector<string> rules = {"S -> A", "A -> B C", "B -> b", "C -> c"};

// Define the terminals and non-terminals


set<string> terminals = {"b", "c", "$"};

set<string> nonTerminals = {"S", "A", "B", "C"};

// Define the augmented grammar


string augmentedGrammar = "S' -> S";

// Define the productions for each non-terminal


unordered_map<string, vector<vector<string>>> productions;

// Define the LR(0) items


vector<set<string>> lr0Items;

// Define the canonical LR(0) collection of sets


vector<set<string>> canonicalSets;

// Define the action and goto tables


unordered_map<int, unordered_map<string, string>> actionTable;
unordered_map<int, unordered_map<string, int>> gotoTable;

// Function to compute the closure of a set of items


set<string> closure(const set<string>& items) {
set<string> closureSet = items;

bool added;
do {
added = false;
for (const string& item : closureSet) {
size_t dotPos = item.find('.');


if (dotPos != string::npos && dotPos + 1 < item.size() && nonTerminals.count(string(1,


item[dotPos + 1]))) {
for (const vector<string>& production : productions[string(1, item[dotPos + 1])]) {
string newItem = production[0] + " -> ." + production[1];
if (closureSet.count(newItem) == 0) {
closureSet.insert(newItem);
added = true;
}
}
}
}
} while (added);

return closureSet;
}

// Function to compute the goto of a set of items on a symbol


set<string> goTo(const set<string>& items, const string& symbol) {
set<string> goToSet;

for (const string& item : items) {


size_t dotPos = item.find('.');
if (dotPos != string::npos && dotPos + 1 < item.size() && item[dotPos + 1] == symbol[0]) {
string newItem = item;
// Move the dot one position to the right, past the matched symbol
swap(newItem[dotPos], newItem[dotPos + 1]);
goToSet.insert(newItem);
}
}


return closure(goToSet);
}

// Function to generate LR(0) items for each production


void generateLR0Items() {
// I0 starts from the closure of the augmented item S' -> .S
lr0Items.push_back(closure({"S' -> .S"}));
for (const string& rule : rules) {
size_t arrowPos = rule.find("->");
string item = rule.substr(0, arrowPos - 1) + " -> ." + rule.substr(arrowPos + 3);
lr0Items.push_back(closure({item}));
}
}

// Function to generate the canonical LR(0) collection of sets


void generateCanonicalSets() {
stack<set<string>> itemStack;
itemStack.push(lr0Items[0]);
canonicalSets.push_back(lr0Items[0]);

while (!itemStack.empty()) {
set<string> currentItem = itemStack.top();
itemStack.pop();

set<string> symbols;
for (const string& item : currentItem) {
size_t dotPos = item.find('.');


if (dotPos != string::npos && dotPos + 1 < item.size()) {


symbols.insert(string(1, item[dotPos + 1]));
}
}

for (const string& symbol : symbols) {


set<string> goToSet = goTo(currentItem, symbol);
if (goToSet.size() > 0 && find(canonicalSets.begin(), canonicalSets.end(), goToSet) ==
canonicalSets.end()) {
canonicalSets.push_back(goToSet);
itemStack.push(goToSet);
}
}
}
}

// Function to generate the SLR(1) parsing tables


void generateParsingTables() {
for (int i = 0; i < canonicalSets.size(); ++i) {
for (const string& symbol : terminals) {
set<string> goToSet = goTo(canonicalSets[i], symbol);
if (goToSet.size() > 0) {
auto it = find(canonicalSets.begin(), canonicalSets.end(), goToSet);
if (it != canonicalSets.end()) {
int j = distance(canonicalSets.begin(), it);
actionTable[i][symbol] = "S" + to_string(j);
}
}
}

for (const string& symbol : nonTerminals) {


set<string> goToSet = goTo(canonicalSets[i], symbol);
if (goToSet.size() > 0) {
auto it = find(canonicalSets.begin(), canonicalSets.end(), goToSet);
if (it != canonicalSets.end()) {
int j = distance(canonicalSets.begin(), it);
gotoTable[i][symbol] = j;
}
}
}

for (const string& item : canonicalSets[i]) {
size_t dotPos = item.find('.');
// A completed item has the dot at the very end: "A -> alpha."
if (dotPos != string::npos && dotPos + 1 == item.size()) {
string completed = item.substr(0, dotPos);
if (completed == augmentedGrammar) {
actionTable[i]["$"] = "Accept";
} else {
for (int j = 0; j < (int)rules.size(); ++j) {
if (completed == rules[j]) {
// Enter the reduce move; a full SLR(1) table would restrict
// this to the terminals in FOLLOW of the left-hand side
for (const string& t : terminals) {
actionTable[i][t] = "R" + to_string(j);
}
}
}
}
}
}


}
}

// Function to print the LR(0) items


void printLR0Items() {
cout << "LR(0) Items:\n";
for (int i = 0; i < lr0Items.size(); ++i) {
cout << "I" << i << ":\n";
for (const string& item : lr0Items[i]) {
cout << " " << item << endl;
}
cout << endl;
}
}

// Function to print the canonical LR(0) collection of sets


void printCanonicalSets() {
cout << "Canonical LR(0) Collection of Sets:\n";
for (int i = 0; i < canonicalSets.size(); ++i) {
cout << "I" << i << ":\n";
for (const string& item : canonicalSets[i]) {
cout << " " << item << endl;
}
cout << endl;
}
}


int main() {
// Initialize the productions map
for (const string& rule : rules) {
size_t arrowPos = rule.find("->");
string nonTerminal = rule.substr(0, arrowPos - 1);
// Skip "-> " so the right-hand side carries no leading space
productions[nonTerminal].push_back({nonTerminal, rule.substr(arrowPos + 3)});
}

generateLR0Items();
generateCanonicalSets();
generateParsingTables();

// Print LR(0) items and canonical LR(0) sets


printLR0Items();
printCanonicalSets();

return 0;
}


Experiment # 5
Unification Algorithm :
Unification is a process used in various areas of computer science, including artificial
intelligence, logic programming, and automated theorem proving. The goal of unification
is to find a common substitution that makes two expressions identical. The process involves
resolving variables and terms to make different expressions equivalent. Here is a
conceptual overview of the unification algorithm:
Terms and Variables
 Terms: In the context of unification, a term is a basic unit of expression. It can be a
constant (like a name or a number) or a variable. Examples of terms include "John,"
"42," or "$X" (where $X is a variable).
 Variables: Variables are symbols that represent unknown values. They are
placeholders that can be substituted with other terms during the unification process.
Substitution
 Substitution: A substitution is a mapping from variables to terms. It defines how
variables should be replaced to make two expressions identical. For example, if we
have a substitution $X → "John," it means that wherever $X appears in an
expression, it can be replaced with "John."
Unification Process
The unification process involves recursively comparing and matching terms. Here's a step-
by-step overview:
1. Base Case: If both terms are constants and are the same, unification is successful.
If they are different, unification fails.
2. Variable Unification: If one term is a variable and the other is a constant, unify the
variable with the constant by creating a substitution.
3. Variable-Variable Unification: If both terms are variables, unify them by creating
a substitution that maps one variable to the other.
4. Recursive Unification: If both terms are compound (e.g., functions or structures),
recursively unify their sub-terms.
5. Failure Conditions: If at any point during the process a conflict arises (e.g.,
variable occurring in its own definition, occurs-check failure), unification fails.

Example
Consider the following expressions:
 Expression 1: $X + 5
 Expression 2: 7
The unification process might proceed as follows:
1. Attempt to unify $X and 7. Create a substitution: $X → 7.
2. The expressions are now unified.
CODE
Implementation of the unification algorithm in C++.
#include <iostream>
#include <map>
#include <vector>
#include <string>

// Define a type for variables and constants


using Term = std::string;

// Define a type for substitutions (variable -> term)


using Substitution = std::map<Term, Term>;

// Function to apply a substitution to a term


Term applySubstitution(const Term &term, const Substitution &subst) {
auto it = subst.find(term);
if (it != subst.end()) {
// If term is a variable, return its substitution, otherwise return the term itself
return applySubstitution(it->second, subst);
}

return term;
}

// Function to unify two terms


bool unify(const Term &term1, const Term &term2, Substitution &subst) {
Term t1 = applySubstitution(term1, subst);
Term t2 = applySubstitution(term2, subst);

if (t1 == t2) {
// Terms are already the same, no further unification needed
return true;
} else if (t1[0] == '$' && t2[0] != '$') {
// Unify variable with term
subst[t1] = t2;
return true;
} else if (t1[0] != '$' && t2[0] == '$') {
// Unify term with variable
subst[t2] = t1;
return true;
} else if (t1[0] == '$' && t2[0] == '$' && t1 != t2) {
// Unify two variables
subst[t1] = t2;
return true;
} else if (t1[0] != '$' && t2[0] != '$') {
// Unify two constants
return t1 == t2;
}


return false; // Unable to unify


}

int main() {
// Example usage
Term term1 = "$X";
Term term2 = "John";
Substitution subst;

if (unify(term1, term2, subst)) {


std::cout << "Unification successful. Substitution:\n";
for (const auto &entry : subst) {
std::cout << entry.first << " -> " << entry.second << "\n";
}
} else {
std::cout << "Unification failed.\n";
}

return 0;
}


Experiment # 6
LR Parser:
An LR parser (Left-to-right scan, Rightmost derivation in reverse) is a type of bottom-up
parser used in compiler design for syntax analysis. LR parsers are capable of parsing a
broad class of context-free grammars, making them popular choices for implementing
programming language compilers.
Here are the key concepts associated with LR parsers:
1. Grammar and Productions:
 LR parsers operate based on a context-free grammar, typically expressed in
Backus-Naur Form (BNF) or Extended Backus-Naur Form (EBNF).
 The grammar consists of a set of productions or rules that describe how valid
sentences or programs can be constructed.
2. Items and States:
 LR parsers use items, which are partially completed productions with a "dot"
indicating the current position in the production.
 States represent sets of items, and the parser moves between states during the
parsing process.
3. LR(0), SLR(1), LR(1), and LALR(1):

 LR parsers come in different flavors, such as LR(0), SLR(1), LR(1), and


LALR(1), each representing a different level of lookahead and efficiency in
parsing.
 The number in LR(k) denotes the amount of lookahead symbols considered
when making parsing decisions.
4. Parsing Tables:
 LR parsers use parsing tables to make decisions during the parsing process.
These tables include action and goto entries.
 The action table guides the parser on whether to shift a symbol, reduce using
a production, or accept the input.
 The goto table indicates which state to transition to after a reduction.
5. Shift-Reduce and Reduce-Reduce Actions:
 The LR parser operates by repeatedly performing shift and reduce actions.
 A shift action involves pushing a symbol onto the stack and transitioning to
a new state.
 A reduce action involves popping items off the stack and replacing them with
a non-terminal symbol according to a production.
6. Handle and Handle Pruning:
 A handle is a substring in the rightmost derivation that matches the right-
hand side of a production.
 Handle pruning is the process of replacing a handle with the corresponding
non-terminal symbol during reduce actions.
7. Bottom-Up Parsing:
 LR parsers are bottom-up parsers because they start parsing from the input
symbols and build the parse tree from the bottom to the top.
 The parser attempts to reduce the input to the start symbol through a series
of shift and reduce actions.
8. LR Parsing Algorithm:
 The LR parsing algorithm involves maintaining a stack of states and a stack
of input symbols.

 The parser transitions between states based on the current state, input symbol,
and the parsing tables until it either accepts the input or detects an error.
LR parsers are efficient and capable of handling a wide range of grammars. They are widely
used in practice for implementing compilers and parser generators. The most common
variants used in practice are SLR(1), LR(1), and LALR(1) parsers.
CODE
Code to implement the LR parser table.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PROD 10
#define MAX_LEN 10

typedef struct {
char left;
char right[MAX_LEN];
} Production;

typedef struct {
char symbol;
int state;
} Action;

typedef struct {
char symbol;
int state;
} Goto;

Production productions[MAX_PROD];
Action actionTable[MAX_PROD][MAX_LEN];
Goto gotoTable[MAX_PROD][MAX_LEN];

int numProductions = 0;

void addProduction(char left, const char* right) {


productions[numProductions].left = left;
strcpy(productions[numProductions].right, right);
numProductions++;
}

void printTable(int numStates, char terminals[], char nonTerminals[]) {


// Print header
printf("\nLR(0) Parser Table:\n\n");
printf("State\t");

for (int i = 0; i < strlen(terminals); i++) {


printf("%c\t", terminals[i]);
}

for (int i = 0; i < strlen(nonTerminals); i++) {


printf("%c\t", nonTerminals[i]);
}

printf("\n");

Department of Computer Science


University of Engineering and Technology, Lahore (Narowal Campus)
38

// Print action and goto tables


for (int i = 0; i < numStates; i++) {
printf("%d\t", i);

for (int j = 0; j < strlen(terminals); j++) {


if (actionTable[i][j].symbol == 'a') {
printf("acc\t");
} else if (actionTable[i][j].symbol == 's') {
printf("s%d\t", actionTable[i][j].state);
} else if (actionTable[i][j].symbol == 'r') {
printf("r%d\t", actionTable[i][j].state);
} else {
printf("\t");
}
}

for (int j = 0; j < strlen(nonTerminals); j++) {


if (gotoTable[i][j].state != -1) {
printf("%d\t", gotoTable[i][j].state);
} else {
printf("\t");
}
}

printf("\n");
}

Department of Computer Science


University of Engineering and Technology, Lahore (Narowal Campus)
39

int main() {
// Example grammar: E -> E+T | T, T -> T*F | F, F -> (E) | id

addProduction('E', "E+T");
addProduction('E', "T");
addProduction('T', "T*F");
addProduction('T', "F");
addProduction('F', "(E)");
addProduction('F', "id");

// Construct LR(0) parser tables


int numStates = 5; // Number of states in LR(0) automaton
char terminals[] = {'+', '*', '(', ')', 'i', 'd', '$'};
char nonTerminals[] = {'E', 'T', 'F'};

// Example LR(0) parser tables


actionTable[0][4] = {'s', 5};
actionTable[0][2] = {'s', 4};
actionTable[1][0] = {'r', 2};
actionTable[1][1] = {'s', 6};
actionTable[1][3] = {'r', 2};
actionTable[1][6] = {'r', 2};
actionTable[2][0] = {'r', 4};
actionTable[2][1] = {'r', 4};
actionTable[2][3] = {'s', 7};

Department of Computer Science


University of Engineering and Technology, Lahore (Narowal Campus)
40

actionTable[2][6] = {'r', 4};


actionTable[3][0] = {'s', 5};
actionTable[3][3] = {'s', 8};
actionTable[4][0] = {'r', 1};
actionTable[4][1] = {'r', 1};
actionTable[4][3] = {'r', 1};
actionTable[4][6] = {'r', 1};

gotoTable[0][0] = {1, 2};


gotoTable[0][1] = {2, 3};
gotoTable[0][2] = {3, 4};
gotoTable[3][0] = {-1, -1};
gotoTable[3][1] = {-1, -1};
gotoTable[3][2] = {-1, -1};
gotoTable[4][0] = {-1, -1};
gotoTable[4][1] = {-1, -1};
gotoTable[4][2] = {-1, -1};

printTable(numStates, terminals, nonTerminals);

return 0;
}

Experiment # 7
Code Generation:

Code generation is a crucial phase in a compiler that translates an intermediate
representation of the source code (typically an abstract syntax tree, intermediate
code, or some other representation) into machine code or another target language.
Here's an overview of the concept of code generation in a compiler:
1. Intermediate Representation (IR):
 Before generating target code, a compiler usually translates the source code
into an intermediate representation. This representation simplifies code
analysis and transformations.
2. Code Generation Process:
 Selection of Target Machine: The code generator takes the intermediate
representation and translates it into machine code or another target code
suitable for a specific architecture or platform.
 Mapping to Instructions: The generator maps the IR to sequences of
instructions or operations that the target machine can execute.
 Optimization: Code generation often involves optimization techniques to
enhance the efficiency of the generated code. These optimizations can
include instruction scheduling, register allocation, loop optimization, and
more.
3. Techniques in Code Generation:
 Syntax-Directed Translation: This technique uses rules associated with the
grammar of the source language to generate code.
 Tree Rewriting Systems: Based on a syntax tree, the generator applies
rewriting rules to transform the tree into target code.
 Peephole Optimization: Examines and refines small sequences of generated
instructions to optimize code quality.
 Register Allocation: Determines which values to store in processor registers
to minimize memory access.
4. Output Code Quality and Performance:
 The effectiveness of a code generator can significantly impact the efficiency
and performance of the resulting compiled program. A good code generator
aims to produce efficient code in terms of execution speed, memory usage,
and other metrics.
5. Debugging and Diagnostics:


 Debugging can be more challenging with generated code, as it might not
directly relate to the original source code. Good compilers often include
features to map generated code back to the source for debugging purposes.
6. Backend vs. Frontend:
 Code generation is part of the compiler's backend, which comes after the
frontend (lexical analysis, syntax analysis, semantic analysis) and before the
code optimization phase.
Overall, code generation plays a vital role in the translation process, converting high-level
source code into machine-readable code that can be executed on the target machine,
ensuring efficient and correct execution of the program.
CODE
Write a C++ program for code generation.
#include <iostream>
#include <string>

// Abstract syntax tree node
struct Node {
    std::string value;
    Node* left;
    Node* right;

    Node(const std::string& value) : value(value), left(nullptr), right(nullptr) {}
};

// Function to generate code from the AST (post-order traversal)
void generateCode(Node* root) {
    if (root == nullptr) {
        return;
    }

    generateCode(root->left);
    generateCode(root->right);

    if (root->value == "+") {
        std::cout << "Add operation code here" << std::endl;
    } else if (root->value == "-") {
        std::cout << "Subtract operation code here" << std::endl;
    } else if (root->value == "*") {
        std::cout << "Multiply operation code here" << std::endl;
    } else if (root->value == "/") {
        std::cout << "Divide operation code here" << std::endl;
    } else {
        // Assuming it's a numeric value, generate code to load the value
        std::cout << "Load " << root->value << " into register or memory" << std::endl;
    }
}

int main() {
    // Example AST for the expression: (5 * 3) + (7 - 2)
    Node* rootNode = new Node("+");
    rootNode->left = new Node("*");
    rootNode->right = new Node("-");

    rootNode->left->left = new Node("5");
    rootNode->left->right = new Node("3");
    rootNode->right->left = new Node("7");
    rootNode->right->right = new Node("2");

    // Generate code for the expression
    generateCode(rootNode);

    // Clean up the allocated memory (in a real scenario, you'd need proper memory management)
    delete rootNode->left->left;
    delete rootNode->left->right;
    delete rootNode->left;
    delete rootNode->right->left;
    delete rootNode->right->right;
    delete rootNode->right;
    delete rootNode;

    return 0;
}

Output:
Load 5 into register or memory
Load 3 into register or memory
Multiply operation code here
Load 7 into register or memory
Load 2 into register or memory
Subtract operation code here
Add operation code here
Experiment # 8
Code Optimization:
Code optimization in a compiler is the process of improving the generated code to make it
more efficient in terms of speed, memory usage, and overall performance without altering
its functionality. It's a crucial phase in the compilation process that aims to produce
optimized code from the intermediate representation of the source code. Here's an overview
of code optimization in compilers:
1. Types of Code Optimization:
 Local Optimizations: These optimizations focus on improving code within a
small portion of the program, such as basic blocks or individual instructions.
Examples include constant folding, common subexpression elimination, and
strength reduction.
 Global Optimizations: These optimizations consider the entire program or
significant parts of it to identify and exploit opportunities for improvement
across multiple code blocks or functions. Techniques like loop optimization,
inlining, and code motion fall under this category.
2. Common Code Optimization Techniques:
 Constant Folding and Propagation: Evaluate constant expressions during
compile-time rather than runtime.
 Dead Code Elimination: Remove code that will never be executed, such as
unreachable code or variables that are not used.
 Loop Optimization: Improve the efficiency of loops by reducing loop
overhead, eliminating redundancies, applying loop unrolling, or
vectorization.
 Inlining: Replace function calls with the actual body of the function to reduce
the overhead of function call instructions.
 Register Allocation: Optimize the usage of CPU registers to minimize
memory access and improve performance.
 Code Motion: Move computations or instructions outside loops or repeated
blocks to reduce redundancy and improve efficiency.

 Peephole Optimization: Analyze small sequences of generated instructions
and replace them with more efficient sequences.
3. Optimization Levels:
 Compilers often offer different optimization levels (e.g., -O1, -O2, -O3 in
GCC) that trade off between compilation time and the level of optimization
applied. Higher optimization levels generally result in more optimized but
potentially longer compilation times.
4. Impact on Program Performance:
 Efficient code optimization can significantly improve program execution
speed, reduce memory usage, and overall enhance the performance of the
compiled code.
5. Challenges:
 Balancing optimization efforts with compilation time: Aggressive
optimizations may significantly increase compilation time, which might not
be desirable in certain scenarios.
 Correctness vs. Optimality tradeoff: Extremely aggressive optimizations
might lead to incorrect code if not handled properly.
6. Debugging and Maintainability:
 Highly optimized code might be harder to debug and understand, as the
optimized code may differ significantly from the original source code.
Overall, code optimization in compilers aims to transform code to achieve better
performance while maintaining correctness, and it's an integral part of the compilation
process that significantly impacts the behavior of the final executable.

CODE
Write a C++ program for code optimization.
#include <iostream>

int main() {
    // Original code: the multiplication is redone on every loop iteration
    int a = 5;
    int b = 10;
    int result = 0;

    for (int i = 0; i < 1000; ++i) {
        result += a * b;
    }

    std::cout << "Original Result: " << result << std::endl;

    // Optimized code: the product is folded into a compile-time constant
    constexpr int opt_a = 5;
    constexpr int opt_b = 10;
    constexpr int opt_result = opt_a * opt_b;

    int finalResult = 0;

    for (int i = 0; i < 1000; ++i) {
        finalResult += opt_result;
    }

    std::cout << "Optimized Result: " << finalResult << std::endl;

    return 0;
}

Output:
Original Result: 50000
Optimized Result: 50000
Experiment # 9
LEX:
LEX (Lexical Analyzer Generator) is a widely used tool for generating lexical analyzers
or tokenizers in the context of compiler construction. It helps in recognizing and generating
tokens from the input stream of characters, which are then used by parsers to understand
the structure of the programming language. Below is a basic explanation of how LEX
works:
1. Defining Tokens:
 LEX operates based on regular expressions and corresponding actions
defined for these patterns. Tokens such as keywords, identifiers, operators,
and literals are specified using regular expressions.
2. Lexical Rules:
 In a LEX file, you define patterns of characters (using regular expressions)
and corresponding actions to be taken when these patterns are matched.
 For example, a pattern for recognizing integers might be defined as [0-9]+,
which represents one or more digits. When LEX encounters this pattern, it
performs an action like returning the token for an integer.
3. LEX Syntax:
 A simple LEX file consists of a series of rules in the form of regular
expression patterns followed by actions.
 Each rule typically follows the structure: pattern action.
4. Actions:
 Actions in LEX are code fragments written in the programming language of
the target compiler. These actions are executed when a corresponding pattern
is matched.

5. LEX Compilation:
 The LEX file is processed by the LEX tool to generate a C code file (e.g.,
lex.yy.c) that includes the logic to recognize and handle tokens based on the
defined rules.
6. Integration with YACC/Bison:
 LEX is often used in conjunction with YACC (or Bison) to create a complete
compiler. While LEX handles the lexical analysis (tokenizing), YACC deals
with parsing and syntax analysis.

CODE
Write a C++ program that emulates a basic LEX-style tokenizer.
#include <iostream>
#include <cctype>
#include <string>

enum TokenType {
    NUMBER,
    OPERATOR,
    INVALID
};

TokenType getTokenType(char c) {
    if (isdigit(static_cast<unsigned char>(c))) {
        return NUMBER;
    } else if (c == '+' || c == '-' || c == '*' || c == '/' || c == '=' || c == '(' || c == ')') {
        return OPERATOR;
    } else {
        return INVALID;
    }
}

void tokenize(const std::string& input) {
    std::string token;
    for (char c : input) {
        TokenType currentTokenType = getTokenType(c);

        if (currentTokenType == NUMBER) {
            token += c; // accumulate consecutive digits into one token
        } else {
            if (!token.empty()) { // emit any pending number first
                std::cout << "NUMBER: " << token << std::endl;
                token = "";
            }

            if (currentTokenType == OPERATOR) {
                std::cout << "OPERATOR: " << c << std::endl;
            } else if (!isspace(static_cast<unsigned char>(c))) {
                // whitespace is skipped silently; anything else is reported
                std::cout << "INVALID CHARACTER: " << c << std::endl;
            }
        }
    }

    if (!token.empty()) {
        std::cout << "NUMBER: " << token << std::endl;
    }
}

int main() {
    std::string expression;
    std::cout << "Enter an arithmetic expression: ";
    std::getline(std::cin, expression);

    tokenize(expression);

    return 0;
}

Output:

Experiment # 10
Recursive Descent Parsing:
Recursive Descent Parsing is a top-down parsing technique used in compiler construction
to analyze the syntax of a given input based on a set of production rules. It's called
"recursive descent" because the parser recursively descends through the input by
recursively invoking procedures corresponding to the grammar rules.
Here's an overview of how Recursive Descent Parsing works:
1. Grammar Representation:
 Recursive Descent Parsing is typically implemented for grammars expressed
in a context-free grammar (CFG) notation.
2. Parsing Procedures:
 Each non-terminal in the grammar is represented by a parsing procedure or
function.
 For each non-terminal symbol in the grammar, there is a corresponding
parsing function that recognizes and handles that particular construct in the
input.
3. Procedure Invocation:
 When parsing starts, the procedure corresponding to the start symbol of the
grammar is invoked.
 The parsing functions recursively call each other based on the rules of the
grammar and the input being processed.
4. Handling Terminals and Non-terminals:
 Parsing functions for terminals handle the actual recognition of input tokens
(such as identifiers, numbers, operators).
 Parsing functions for non-terminals are responsible for recognizing higher-
level constructs by calling other parsing functions or performing further
analysis.
5. Recursive Nature:
 The parsing functions use recursive calls to handle nested or complex
grammar rules. For example, a parsing function for an expression might call
itself to handle sub-expressions.
6. Backtracking:
 Recursive Descent Parsing can backtrack when a parsing function encounters
an incorrect path in the input. This backtracking is often handled using
lookahead techniques or explicit error handling.
7. Implementation in Code:
 Recursive Descent Parsing is often implemented using a series of functions
or methods, each corresponding to a grammar rule.
 The parser reads tokens from the input and matches them against the
expected constructs based on the grammar rules.
8. Limitations:
 Recursive Descent Parsing works well for LL(k) grammars (left-to-right
scan, leftmost derivation, k tokens lookahead) but might face limitations
when dealing with certain ambiguous or left-recursive grammars.

CODE
Write a C++ program for recursive descent parsing.
#include <iostream>
#include <string>
#include <cctype>

class RecursiveDescentParser {
private:
    std::string input;
    size_t position;

    char peek() {
        return position < input.size() ? input[position] : '\0';
    }

    bool isEOF() {
        return position >= input.size();
    }

    void advance() {
        if (!isEOF()) {
            position++;
        }
    }

    bool match(char expected) {
        if (peek() == expected) {
            advance();
            return true;
        }
        return false;
    }

    // Check whether a character is a digit
    bool isDigit(char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    }

    // Check whether a character is an arithmetic operator
    bool isOperator(char c) {
        return c == '+' || c == '-' || c == '*' || c == '/';
    }

    // Consume a run of digits and return it as a string
    std::string extractDigits() {
        size_t initialPos = position;
        while (!isEOF() && isDigit(peek())) {
            advance();
        }
        return input.substr(initialPos, position - initialPos);
    }

    // Parse an (optionally fractional) number
    double parseNumber() {
        std::string numStr = extractDigits();
        if (peek() == '.' && isDigit(input[position + 1])) {
            advance(); // consume '.'
            numStr += ".";
            numStr += extractDigits();
        }
        return std::stod(numStr);
    }

    // factor -> number | '(' expr ')'
    double factor() {
        if (isDigit(peek())) {
            return parseNumber();
        } else if (peek() == '(') {
            match('(');
            double result = expr();
            match(')');
            return result;
        } else {
            std::cerr << "Unexpected character: " << peek() << std::endl;
            return 0;
        }
    }

    // term -> factor (('*' | '/') factor)*
    double term() {
        double result = factor();
        while (peek() == '*' || peek() == '/') {
            char op = peek();
            advance();
            double nextFactor = factor();
            if (op == '*') {
                result *= nextFactor;
            } else {
                result /= nextFactor;
            }
        }
        return result;
    }

    // expr -> term (('+' | '-') term)*
    double expr() {
        double result = term();
        while (peek() == '+' || peek() == '-') {
            char op = peek();
            advance();
            double nextTerm = term();
            if (op == '+') {
                result += nextTerm;
            } else {
                result -= nextTerm;
            }
        }
        return result;
    }

public:
    RecursiveDescentParser(const std::string& inputStr) : input(inputStr), position(0) {}

    void classifyInput() {
        while (!isEOF()) {
            char currentChar = peek();
            if (isDigit(currentChar)) {
                std::cout << currentChar << " is a digit." << std::endl;
            } else if (isOperator(currentChar)) {
                std::cout << currentChar << " is an operator." << std::endl;
            } else {
                std::cerr << "Unexpected character: " << currentChar << std::endl;
            }
            advance();
        }
    }

    void parse() {
        classifyInput();
        position = 0; // rewind so the grammar procedures can parse the input
        std::cout << "Value: " << expr() << std::endl;
    }
};

int main() {
    std::string inputExpression;

    // Prompt the user to enter an expression (without spaces)
    std::cout << "Enter an arithmetic expression: ";
    std::getline(std::cin, inputExpression);

    // Create a parser instance and parse the input expression
    RecursiveDescentParser parser(inputExpression);
    parser.parse();

    return 0;
}
Output:

Experiment # 11
FIRST and FOLLOW:
In compiler theory and formal language theory, FIRST and FOLLOW are sets used in
parsing techniques, specifically in constructing predictive parsers for context-free
grammars (CFGs).
1. FIRST Set:
 The FIRST set of a grammar symbol in a context-free grammar is the set of
terminal symbols that can begin the strings derived from that symbol. In other
words, it represents the set of terminals that can appear as the first symbol of
a production derived from a non-terminal.
 Algorithmically, the FIRST set of a symbol X can be defined as follows:
 If X is a terminal, FIRST(X) = {X}.
 If X is a non-terminal, and there is a production X → Y1Y2...Yn, then:
 Add FIRST(Y1) to FIRST(X) excluding ε (empty string), and
 If ε is in FIRST(Y1), add FIRST(Y2) to FIRST(X), and so on,
until ε is not in FIRST(Yi), or if all Yi have ε in their FIRST
sets, then add ε to FIRST(X).

2. FOLLOW Set:
 The FOLLOW set of a non-terminal symbol in a context-free grammar is the
set of terminals that can appear immediately to the right of that non-terminal
in some derivation.
 Algorithmically, the FOLLOW set of a non-terminal A can be defined as
follows:
 Initially, add $ (end of input marker) to FOLLOW(S), where S is the
start symbol and appears in the start rule.
 For each production A → αBβ (where α and β are possibly empty
sequences of grammar symbols), add FIRST(β) to FOLLOW(B)
excluding ε.
 If ε is in FIRST(β) or β is ε (i.e., A → αB), then add FOLLOW(A) to
FOLLOW(B).

CODE
Write a C++ program to find the FIRST and FOLLOW sets of a given grammar.
#include <iostream>
#include <unordered_map>
#include <set>
#include <vector>
#include <string>

using namespace std;

unordered_map<char, set<char>> first;
unordered_map<char, set<char>> follow;

// Add elements to a set, excluding epsilon ('e'); returns true if the set grew
bool addToSet(set<char>& targetSet, const set<char>& sourceSet) {
    size_t before = targetSet.size();
    for (char c : sourceSet) {
        if (c != 'e') {
            targetSet.insert(c);
        }
    }
    return targetSet.size() != before;
}

// Compute the FIRST set for each non-terminal. The loop repeats until no set
// changes, so the result does not depend on the order the rules are visited.
void computeFirst(const unordered_map<char, vector<string>>& grammar) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& rule : grammar) {
            char nonTerminal = rule.first;
            for (const string& production : rule.second) {
                char symbol = production[0];
                if (isupper(symbol)) { // non-terminal: copy its FIRST set
                    if (addToSet(first[nonTerminal], first[symbol])) {
                        changed = true;
                    }
                } else { // terminal (or 'e' for epsilon)
                    changed = first[nonTerminal].insert(symbol).second || changed;
                }
            }
        }
    }
}

// Compute the FOLLOW set for each non-terminal, again iterating to a fixed
// point. (Simplification kept from the original version: FOLLOW is not
// propagated through nullable symbols.)
void computeFollow(const unordered_map<char, vector<string>>& grammar, char startSymbol) {
    follow[startSymbol].insert('$'); // $ is the marker for end of input
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& rule : grammar) {
            char nonTerminal = rule.first;
            for (const string& production : rule.second) {
                for (size_t i = 0; i < production.size(); ++i) {
                    char symbol = production[i];
                    if (!isupper(symbol)) continue; // only non-terminals have FOLLOW
                    if (i < production.size() - 1) { // not the last symbol
                        char nextSymbol = production[i + 1];
                        if (isupper(nextSymbol)) { // next symbol is a non-terminal
                            if (addToSet(follow[symbol], first[nextSymbol])) {
                                changed = true;
                            }
                        } else { // next symbol is a terminal
                            changed = follow[symbol].insert(nextSymbol).second || changed;
                        }
                    } else { // last symbol: inherit FOLLOW of the left-hand side
                        if (addToSet(follow[symbol], follow[nonTerminal])) {
                            changed = true;
                        }
                    }
                }
            }
        }
    }
}

int main() {
    unordered_map<char, vector<string>> grammar = {
        {'S', {"AB", "BC"}},
        {'A', {"aA", "e"}},
        {'B', {"bB", "c"}},
        {'C', {"d"}}
    };

    char startSymbol = 'S';

    computeFirst(grammar);
    computeFollow(grammar, startSymbol);

    // Display FIRST sets
    cout << "FIRST sets:\n";
    for (const auto& nonTerminal : first) {
        cout << nonTerminal.first << ": ";
        for (char c : nonTerminal.second) {
            cout << c << " ";
        }
        cout << endl;
    }

    // Display FOLLOW sets
    cout << "\nFOLLOW sets:\n";
    for (const auto& nonTerminal : follow) {
        cout << nonTerminal.first << ": ";
        for (char c : nonTerminal.second) {
            cout << c << " ";
        }
        cout << endl;
    }

    return 0;
}

Output:
