CC Exp CS 20
Experiments
Experiment # 1
Finite automata:
A finite automaton (FA) is a simple idealized machine used to recognize patterns within
input taken from some character set (or alphabet) C. The job of an FA is
to accept or reject an input depending on whether the pattern defined by the FA occurs in
the input.
An FA can be represented graphically, with nodes for states and arcs for
transitions.
Types
Deterministic finite automata (DFA): from state q0 on input a there is exactly one path,
going to q1; likewise, from q0 on input b there is only one path, going to q2. Every
(state, input) pair has a single, fixed next state.
Non-deterministic finite automata (NFA): from state q0 on input a there are two possible
next states, q1 and q2; similarly, from q0 on input b the next states are q0 and q1. Since it
is not fixed where the machine goes next on a particular input, such an FA is called non-
deterministic finite automata.
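The difference is easiest to see in the shape of the transition function. A minimal sketch
(the concrete states and symbols are the ones described above, encoded as an assumption
for illustration):
#include <map>
#include <set>
#include <utility>

// DFA: each (state, symbol) pair maps to exactly one next state.
std::map<std::pair<int, char>, int> dfaDelta = {
    {{0, 'a'}, 1}, {{0, 'b'}, 2}};

// NFA: each (state, symbol) pair maps to a set of possible next states.
std::map<std::pair<int, char>, std::set<int>> nfaDelta = {
    {{0, 'a'}, {1, 2}}, {{0, 'b'}, {0, 1}}};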
CODE
Code for converting a DFA transition table into an NFA-style representation.
#include <iostream>
#include <vector>
#include <set>
#include <map>

// Convert a DFA transition table into NFA-style transitions:
// every DFA is a valid NFA, so each DFA target becomes a one-element state set.
std::map<int, std::vector<std::pair<int, std::set<int>>>>
convertDfaToNfa(const std::vector<std::vector<int>>& dfatransitions) {
    std::map<int, std::vector<std::pair<int, std::set<int>>>> nfatransitions;
    for (int state = 0; state < (int)dfatransitions.size(); ++state) {
        for (int symbol = 0; symbol < (int)dfatransitions[state].size(); ++symbol) {
            std::set<int> targetStates;
            int nextState = dfatransitions[state][symbol];
            if (nextState != -1) {
                targetStates.insert(nextState);
            }
            nfatransitions[state].push_back({symbol, targetStates});
        }
    }
    return nfatransitions;
}

int main() {
    // Define a DFA transition table (example)
    std::vector<std::vector<int>> dfatransitions = {
        {1, 2}, // State 0, transitions for 'a' and 'b'
        {0, 2}, // State 1, transitions for 'a' and 'b'
        {2, 0}  // State 2, transitions for 'a' and 'b'
    };
    // Print the resulting NFA-style transitions
    for (const auto& [state, transitions] : convertDfaToNfa(dfatransitions)) {
        for (const auto& [symbol, targets] : transitions) {
            std::cout << "state " << state << " --" << char('a' + symbol) << "--> { ";
            for (int t : targets) std::cout << t << ' ';
            std::cout << "}\n";
        }
    }
    return 0;
}
Experiment # 2
Stand-Alone Lexical Analyzer:
A standalone lexical analyzer, also known as a lexer or tokenizer, is a program that
operates independently and is designed to analyze the lexical structure of a given input.
The primary purpose of a lexical analyzer is to break down the input into a stream of tokens,
which are the smallest units of meaning in a programming language. This process is a
fundamental step in the compilation of source code.
Definition:
Key characteristics of a standalone lexical analyzer include:
Integration:
Can be integrated into larger systems, such as compilers or interpreters, to
perform the initial phase of lexical analysis.
Standalone lexical analyzers are foundational components in the compilation process
and are often employed in combination with parsers and other components to transform
source code into executable or intermediate code. They play a crucial role in understanding
the structure of programming languages and converting human-readable code into a form
that is easier for machines to process.
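For example, given the input int main() { return 0; }, a lexer of this kind emits the
token stream KEYWORD(int), IDENTIFIER(main), PUNCTUATION((), PUNCTUATION()),
PUNCTUATION({), KEYWORD(return), NUMBER(0), PUNCTUATION(;), PUNCTUATION(}) — exactly
the breakdown the sketch below produces.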
CODE
Write a program that implements a stand-alone lexical analyzer.
#include <iostream>
#include <cctype>
#include <vector>
#include <string>

// Token types
enum TokenType {
    IDENTIFIER,
    NUMBER,
    OPERATOR,
    PUNCTUATION,
    KEYWORD,
    END_OF_FILE
};

// Token structure
struct Token {
    TokenType type;
    std::string value;
};

class Lexer {
public:
    Lexer(const std::string& input) : input(input), position(0) {}

    Token getNextToken() {
        // Skip whitespace
        while (position < input.length() && std::isspace(input[position])) {
            position++;
        }
        if (position >= input.length()) {
            return {END_OF_FILE, ""};
        }
        char currentChar = input[position];
        if (std::isalpha(currentChar)) {
            // Identifiers and keywords: a letter followed by letters or digits
            std::string value;
            while (position < input.length() && std::isalnum(input[position])) {
                value += input[position++];
            }
            return {isKeyword(value) ? KEYWORD : IDENTIFIER, value};
        } else if (std::isdigit(currentChar)) {
            // Numbers: a run of digits
            std::string value;
            while (position < input.length() && std::isdigit(input[position])) {
                value += input[position++];
            }
            return {NUMBER, value};
        } else if (isOperator(currentChar)) {
            position++;
            return {OPERATOR, std::string(1, currentChar)};
        } else if (isPunctuation(currentChar)) {
            position++;
            return {PUNCTUATION, std::string(1, currentChar)};
        }
        // Unknown character: skip it and continue
        position++;
        return getNextToken();
    }

private:
    std::string input;
    std::size_t position;

    bool isKeyword(const std::string& s) {
        return s == "int" || s == "return" || s == "if" || s == "while";
    }
    bool isOperator(char c) {
        return c == '+' || c == '-' || c == '*' || c == '/';
    }
    bool isPunctuation(char c) {
        return c == '(' || c == ')' || c == '{' || c == '}' || c == ',' || c == ';';
    }
};

int main() {
    std::string sourceCode = "int main() { return 0; }";
    Lexer lexer(sourceCode);
    // Print every token in the input
    for (Token t = lexer.getNextToken(); t.type != END_OF_FILE; t = lexer.getNextToken()) {
        std::cout << t.type << " : " << t.value << std::endl;
    }
    return 0;
}
Experiment # 3
Predictive Parser:
Predictive parsing is a parsing technique used in compiler design to analyze and validate
the syntactic structure of a given input string based on a grammar. It predicts the production
rules to apply without backtracking, making it efficient and deterministic.
Components of Predictive Parsing
1. Input Buffer
The input buffer is a temporary storage area that holds the input symbols yet to be
processed by the parser. It provides a lookahead mechanism, allowing the parser to
examine the next input symbol without consuming it.
The buffer can be pictured as the sequence of tokens still to be processed, terminated by
the end-of-input marker, denoted by the "$" symbol.
2. Stack
The stack, also known as the parsing stack or the pushdown stack, is a data structure used
by the parser to keep track of the current state and to guide the parsing process. It stores
grammar symbols, both terminals and non-terminals, during the parsing process.
3. Parsing Table
The parsing table is a data structure that, based on the non-terminal on top of the parsing
stack and the current input symbol, directs the parser's actions. It can be pictured as a
grid: the rows correspond to the non-terminals in the grammar, and the columns represent
the input symbols (terminals and the end-of-input marker $).
Each cell in the table contains information about the action or the production to be applied
when a specific non-terminal and input symbol combination is encountered. Here's what
each entry in the table represents:
S, A, B: Non-terminals in the grammar.
S->, A->, B->: Indicates a production rule to be applied.
a, b: Terminal symbols from the input alphabet.
ε: Represents an empty or epsilon production.
Empty cells: Indicate that no action or production is defined for that combination of
non-terminal and input symbols.
For example, if the parser is in state S (row) and the next input symbol is "a" (column), the
corresponding cell entry "S->aAB" instructs the parser to apply the production rule "S ->
aAB".
CODE
Write a C++ program to construct a predictive parsing table.
#include <iostream>
#include <unordered_map>
#include <vector>
#include <set>
#include <string>
using namespace std;

int main() {
    // Grammar symbols for the example table
    vector<char> nonTerminals = {'S', 'A', 'B'};
    vector<string> terminals = {"a", "b", "$"};
    // parsingTable[nonTerminal][terminal] holds the production to apply
    unordered_map<char, unordered_map<string, string>> parsingTable;
    // Initialize the parsing table
    for (char nonTerminal : nonTerminals) {
        for (const string &terminal : terminals) {
            parsingTable[nonTerminal][terminal] = "";
        }
    }
    // Fill entries from FIRST/FOLLOW analysis, e.g. M[S, a] = S->aAB
    parsingTable['S']["a"] = "S->aAB";
    cout << "M[S, a] = " << parsingTable['S']["a"] << endl;
    return 0;
}
Experiment # 4
SLR Parser:
SLR stands for Simple LR. It accepts the smallest class of grammars among the LR
variants and yields parsers with few states. An SLR parser is easy to construct and
operates like any LR parser. The difference from an LR(0) parser is that the LR(0)
parsing table risks shift-reduce conflicts, because a 'reduce' entry is placed under
every terminal in a reducing state. SLR(1) avoids this by entering 'reduce' only under
the terminals in FOLLOW of the left-hand side of the production being reduced. The item
sets used to build the table are called the SLR(1) collection of items.
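For instance, with the productions S -> aS | b (the same grammar used in the program
below) we have FOLLOW(S) = {$}, so an SLR(1) table enters the reduce actions for
S -> aS and S -> b only in the $ column of the corresponding states, whereas an LR(0)
table would enter them under every terminal.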
CODE
Write a C++ program to build the canonical collection of LR(0) item sets used by an SLR parser.
#include <iostream>
#include <map>
#include <set>
#include <stack>
#include <string>
#include <vector>
using namespace std;

map<string, vector<pair<string, string>>> productions;
set<set<string>> canonicalSets;

// Closure: add B->.gamma for every item whose dot stands before non-terminal B
set<string> closure(set<string> closureSet) {
    for (bool added = true; added; ) {
        added = false;
        for (const string& item : set<string>(closureSet)) {
            size_t dotPos = item.find('.');
            if (dotPos + 1 >= item.size()) continue;
            for (const auto& prod : productions[string(1, item[dotPos + 1])])
                if (closureSet.insert(prod.first + "->." + prod.second).second) added = true;
        }
    }
    return closureSet;
}

// GOTO: advance the dot over `symbol` in every item that allows it
set<string> goTo(const set<string>& items, char symbol) {
    set<string> goToSet;
    for (string item : items) {
        size_t dotPos = item.find('.');
        if (dotPos + 1 < item.size() && item[dotPos + 1] == symbol) {
            swap(item[dotPos], item[dotPos + 1]);
            goToSet.insert(item);
        }
    }
    return closure(goToSet);
}

// Worklist construction of the canonical collection of item sets
void generateCanonicalSets(const set<string>& startItems) {
    stack<set<string>> itemStack;
    itemStack.push(closure(startItems));
    while (!itemStack.empty()) {
        set<string> currentItem = itemStack.top();
        itemStack.pop();
        if (!canonicalSets.insert(currentItem).second) continue;
        set<char> symbols;
        for (const string& item : currentItem) {
            size_t dotPos = item.find('.');
            if (dotPos + 1 < item.size()) symbols.insert(item[dotPos + 1]);
        }
        for (char s : symbols) itemStack.push(goTo(currentItem, s));
    }
}

int main() {
    // Augmented grammar: Z -> S, S -> aS | b
    vector<string> rules = {"Z->S", "S->aS", "S->b"};
    // Initialize the productions map
    for (const string& rule : rules) {
        size_t arrowPos = rule.find("->");
        string nonTerminal = rule.substr(0, arrowPos);
        productions[nonTerminal].push_back({nonTerminal, rule.substr(arrowPos + 2)});
    }
    generateCanonicalSets({"Z->.S"});
    cout << "Canonical LR(0) item sets: " << canonicalSets.size() << endl;
    return 0;
}
Experiment # 5
Unification Algorithm :
Unification is a process used in various areas of computer science, including artificial
intelligence, logic programming, and automated theorem proving. The goal of unification
is to find a common substitution that makes two expressions identical. The process involves
resolving variables and terms to make different expressions equivalent. Here is a
conceptual overview of the unification algorithm:
Terms and Variables
Terms: In the context of unification, a term is a basic unit of expression. It can be a
constant (like a name or a number) or a variable. Examples of terms include "John,"
"42," or "$X" (where $X is a variable).
Variables: Variables are symbols that represent unknown values. They are
placeholders that can be substituted with other terms during the unification process.
Substitution
Substitution: A substitution is a mapping from variables to terms. It defines how
variables should be replaced to make two expressions identical. For example, if we
have a substitution $X → "John," it means that wherever $X appears in an
expression, it can be replaced with "John."
Unification Process
The unification process involves recursively comparing and matching terms. Here's a step-
by-step overview:
1. Base Case: If both terms are constants and are the same, unification is successful.
If they are different, unification fails.
2. Variable Unification: If one term is a variable and the other is a constant, unify the
variable with the constant by creating a substitution.
3. Variable-Variable Unification: If both terms are variables, unify them by creating
a substitution that maps one variable to the other.
4. Recursive Unification: If both terms are compound (e.g., functions or structures),
recursively unify their sub-terms.
5. Failure Conditions: If at any point during the process a conflict arises (e.g., a
variable occurring inside the very term it is being unified with, which the occurs
check detects), unification fails.
Example
Consider the following expressions:
Expression 1: $X + 5
Expression 2: 7
The unification process might proceed as follows:
1. Attempt to unify $X and 7. Create a substitution: $X → 7.
2. The expressions are now unified.
CODE
Implementation of the unification algorithm in C++.
#include <iostream>
#include <map>
#include <string>
using namespace std;

using Term = string;                  // a constant ("John") or a variable ("$X")
using Substitution = map<Term, Term>; // bindings from variables to terms

// Follow variable bindings until the term has no further substitution
Term applySubstitution(Term term, const Substitution& subst) {
    auto it = subst.find(term);
    while (it != subst.end()) {
        term = it->second;
        it = subst.find(term);
    }
    return term;
}

// Try to unify two terms, extending subst with new bindings on success
bool unify(Term t1, Term t2, Substitution& subst) {
    t1 = applySubstitution(t1, subst);
    t2 = applySubstitution(t2, subst);
    if (t1 == t2) {
        // Terms are already the same, no further unification needed
        return true;
    } else if (t1[0] == '$' && t2[0] != '$') {
        // Unify variable with term
        subst[t1] = t2;
        return true;
    } else if (t1[0] != '$' && t2[0] == '$') {
        // Unify term with variable
        subst[t2] = t1;
        return true;
    } else if (t1[0] == '$' && t2[0] == '$' && t1 != t2) {
        // Unify two variables
        subst[t1] = t2;
        return true;
    } else {
        // Two distinct constants never unify
        return false;
    }
}

int main() {
    // Example usage
    Term term1 = "$X";
    Term term2 = "John";
    Substitution subst;
    if (unify(term1, term2, subst)) {
        cout << "Unified with substitution $X -> " << subst["$X"] << endl;
    } else {
        cout << "Unification failed" << endl;
    }
    return 0;
}
Experiment # 6
LR Parser:
An LR parser (Left-to-right, Rightmost derivation) is a type of bottom-up parser used in
the field of compiler design for syntax analysis. LR parsers are capable of parsing a broad
class of context-free grammars, making them popular choices for implementing
programming language compilers.
Here are the key concepts associated with LR parsers:
1. Grammar and Productions:
LR parsers operate based on a context-free grammar, typically expressed in
Backus-Naur Form (BNF) or Extended Backus-Naur Form (EBNF).
The grammar consists of a set of productions or rules that describe how valid
sentences or programs can be constructed.
2. Items and States:
LR parsers use items, which are partially completed productions with a "dot"
indicating the current position in the production.
States represent sets of items, and the parser moves between states during the
parsing process.
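For example, for the production E -> E+T (from the grammar used in the program
below), E -> E.+T is an item: the dot records that an E has already been recognized
and that '+' is expected next.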
3. LR(0), SLR(1), LR(1), and LALR(1):
These are the common table-construction variants. They differ in how much
lookahead information is attached to the items and in how reduce actions are
placed in the parsing tables, trading table size against the class of grammars
handled.
The parser transitions between states based on the current state, input symbol,
and the parsing tables until it either accepts the input or detects an error.
LR parsers are efficient and capable of handling a wide range of grammars. They are widely
used in practice for implementing compilers and parser generators. The most common
variants used in practice are SLR(1), LR(1), and LALR(1) parsers.
CODE
Code to implement the LR parser table.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PROD 10
#define MAX_LEN 10

typedef struct {
    char left;
    char right[MAX_LEN];
} Production;

typedef struct {
    char symbol;
    int state;
} Action;

typedef struct {
    char symbol;
    int state;
} Goto;

Production productions[MAX_PROD];
Action actionTable[MAX_PROD][MAX_LEN];
Goto gotoTable[MAX_PROD][MAX_LEN];
int numProductions = 0;

/* Store one production of the grammar. */
void addProduction(char left, const char *right) {
    productions[numProductions].left = left;
    strcpy(productions[numProductions].right, right);
    numProductions++;
}

/* Print the numbered productions the parser tables are built from. */
void printProductions(void) {
    for (int i = 0; i < numProductions; i++) {
        printf("%d: %c -> %s", i, productions[i].left, productions[i].right);
        printf("\n");
    }
    printf("\n");
}

int main() {
    // Example grammar: E -> E+T | T, T -> T*F | F, F -> (E) | id
    addProduction('E', "E+T");
    addProduction('E', "T");
    addProduction('T', "T*F");
    addProduction('T', "F");
    addProduction('F', "(E)");
    addProduction('F', "id");
    printProductions();
    return 0;
}
Experiment # 7
Code Generation:
Code generation is the compiler phase that translates the intermediate representation,
typically an abstract syntax tree (AST), into target code. A simple scheme walks the AST
in post-order, emitting code for a node's operands before emitting the instruction for
the operator itself.
CODE
Write a C++ program that generates (placeholder) code from an expression AST.
#include <iostream>
#include <string>

// One AST node: an operator ("+", "-", "*") or an operand such as "5"
struct Node {
    std::string value;
    Node* left = nullptr;
    Node* right = nullptr;
    Node(const std::string& v) : value(v) {}
};

// Post-order walk: emit code for both children, then for the operator
void generateCode(Node* root) {
    if (root == nullptr) return;
    generateCode(root->left);
    generateCode(root->right);
    if (root->value == "+") {
        std::cout << "Add operation code here" << std::endl;
    } else if (root->value == "-") {
        std::cout << "Subtract operation code here" << std::endl;
    } else if (root->value == "*") {
        std::cout << "Multiply operation code here" << std::endl;
    } else {
        std::cout << "Load " << root->value << std::endl;
    }
}

int main() {
    // Example AST for the expression: (5 * 3) + (7 - 2)
    Node* rootNode = new Node("+");
    rootNode->left = new Node("*");
    rootNode->right = new Node("-");
    rootNode->left->left = new Node("5");
    rootNode->left->right = new Node("3");
    rootNode->right->left = new Node("7");
    rootNode->right->right = new Node("2");
    generateCode(rootNode);
    // Clean up the allocated memory (in a real scenario, you'd need proper memory management)
    delete rootNode->left->left;
    delete rootNode->left->right;
    delete rootNode->left;
    delete rootNode->right->left;
    delete rootNode->right->right;
    delete rootNode->right;
    delete rootNode;
    return 0;
}
Output:
Experiment # 8
Code Optimization:
Code optimization in a compiler is the process of improving the generated code to make it
more efficient in terms of speed, memory usage, and overall performance without altering
its functionality. It's a crucial phase in the compilation process that aims to produce
optimized code from the intermediate representation of the source code. Here's an overview
of code optimization in compilers:
1. Types of Code Optimization:
Local Optimizations: These optimizations focus on improving code within a
small portion of the program, such as basic blocks or individual instructions.
Examples include constant folding, common subexpression elimination, and
strength reduction.
Global Optimizations: These optimizations consider the entire program or
significant parts of it to identify and exploit opportunities for improvement
across multiple code blocks or functions. Techniques like loop optimization,
inlining, and code motion fall under this category.
2. Common Code Optimization Techniques:
Constant Folding and Propagation: Evaluate constant expressions during
compile-time rather than runtime.
Dead Code Elimination: Remove code that will never be executed, such as
unreachable code or variables that are not used.
Loop Optimization: Improve the efficiency of loops by reducing loop
overhead, eliminating redundancies, applying loop unrolling, or
vectorization.
Inlining: Replace function calls with the actual body of the function to reduce
the overhead of function call instructions.
Register Allocation: Optimize the usage of CPU registers to minimize
memory access and improve performance.
Code Motion: Move computations or instructions outside loops or repeated
blocks to reduce redundancy and improve efficiency.
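As a small, illustrative sketch of two of these local optimizations (the function names
are made up for the example):
#include <iostream>

// Strength reduction: replace multiplication by a power of two with a shift.
int twice(int x) { return x * 2; }      // before
int twiceFast(int x) { return x << 1; } // after

// Dead code elimination: a value that is never used can be removed entirely.
int addOne(int x) {
    int dead = x * 100; // never used: an optimizer deletes this statement
    return x + 1;
}

int main() {
    std::cout << twice(21) << " " << twiceFast(21) << " " << addOne(4) << std::endl;
    return 0;
}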
CODE
Write a C++ program for code optimization.
#include <iostream>

int main() {
    // Original Code: the product is computed at run time
    int a = 5;
    int b = 10;
    int result = a * b;
    std::cout << "Unoptimized result: " << result << std::endl;
    // Optimized Code: constant folding does the multiplication at compile time
    constexpr int opt_a = 5;
    constexpr int opt_b = 10;
    constexpr int opt_result = opt_a * opt_b;
    std::cout << "Optimized result: " << opt_result << std::endl;
    return 0;
}
Output:
Experiment # 9
LEX:
LEX (Lexical Analyzer Generator) is a widely used tool for generating lexical analyzers
or tokenizers in the context of compiler construction. It helps in recognizing and generating
tokens from the input stream of characters, which are then used by parsers to understand
the structure of the programming language. Below is a basic explanation of how LEX
works:
1. Defining Tokens:
LEX operates based on regular expressions and corresponding actions
defined for these patterns. Tokens such as keywords, identifiers, operators,
and literals are specified using regular expressions.
2. Lexical Rules:
In a LEX file, you define patterns of characters (using regular expressions)
and corresponding actions to be taken when these patterns are matched.
For example, a pattern for recognizing integers might be defined as [0-9]+,
which represents one or more digits. When LEX encounters this pattern, it
performs an action like returning the token for an integer.
3. LEX Syntax:
A simple LEX file consists of a series of rules in the form of regular
expression patterns followed by actions.
Each rule typically follows the structure: pattern action.
4. Actions:
Actions in LEX are code fragments written in the programming language of
the target compiler. These actions are executed when a corresponding pattern
is matched.
5. LEX Compilation:
The LEX file is processed by the LEX tool to generate a C code file (e.g.,
lex.yy.c) that includes the logic to recognize and handle tokens based on the
defined rules.
6. Integration with YACC/Bison:
LEX is often used in conjunction with YACC (or Bison) to create a complete
compiler. While LEX handles the lexical analysis (tokenizing), YACC deals
with parsing and syntax analysis.
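For instance, a minimal LEX rules section pairs each pattern with an action (the
actions here are illustrative; yytext is the matched text):
%%
[0-9]+      { printf("NUMBER: %s\n", yytext); }
[a-zA-Z]+   { printf("IDENTIFIER: %s\n", yytext); }
[ \t\n]+    { /* skip whitespace */ }
.           { printf("INVALID: %s\n", yytext); }
%%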
CODE
Write a C++ program that imitates a basic LEX-style tokenizer.
#include <iostream>
#include <cctype>
#include <string>
enum TokenType {
NUMBER,
OPERATOR,
INVALID
};
TokenType getTokenType(char c) {
if (isdigit(c)) {
return NUMBER;
} else if (c == '+' || c == '-' || c == '*' || c == '/' || c == '=' || c == '(' || c == ')') {
return OPERATOR;
} else {
return INVALID;
}
}
// Scan the expression left to right, grouping consecutive digits into numbers
void tokenize(const std::string& expression) {
    std::string token;
    for (char c : expression) {
        TokenType currentTokenType = getTokenType(c);
        if (currentTokenType == NUMBER) {
            token += c;
        } else {
            // A non-digit ends any number currently being built
            if (!token.empty()) {
                std::cout << "NUMBER: " << token << std::endl;
                token = "";
            }
            if (std::isspace(static_cast<unsigned char>(c))) {
                continue; // ignore whitespace between tokens
            }
            if (currentTokenType == OPERATOR) {
                std::cout << "OPERATOR: " << c << std::endl;
            } else {
                std::cout << "INVALID CHARACTER: " << c << std::endl;
            }
        }
    }
    // Flush a number that runs to the end of the input
    if (!token.empty()) {
        std::cout << "NUMBER: " << token << std::endl;
    }
}
int main() {
std::string expression;
std::cout << "Enter an arithmetic expression: ";
std::getline(std::cin, expression);
tokenize(expression);
return 0;
}
Output:
Experiment # 10
Recursive Descent Parsing:
Recursive Descent Parsing is a top-down parsing technique used in compiler construction
to analyze the syntax of a given input based on a set of production rules. It's called
"recursive descent" because the parser recursively descends through the input by
recursively invoking procedures corresponding to the grammar rules.
Here's an overview of how Recursive Descent Parsing works:
1. Grammar Representation:
Recursive Descent Parsing is typically implemented for grammars expressed
in a context-free grammar (CFG) notation.
2. Parsing Procedures:
Each non-terminal in the grammar is represented by a parsing procedure or
function.
For each non-terminal symbol in the grammar, there is a corresponding
parsing function that recognizes and handles that particular construct in the
input.
3. Procedure Invocation:
When parsing starts, the procedure corresponding to the start symbol of the
grammar is invoked.
The parsing functions recursively call each other based on the rules of the
grammar and the input being processed.
4. Handling Terminals and Non-terminals:
Department of Computer Science
University of Engineering and Technology, Lahore (Narowal Campus)
53
Parsing functions for terminals handle the actual recognition of input tokens
(such as identifiers, numbers, operators).
Parsing functions for non-terminals are responsible for recognizing higher-
level constructs by calling other parsing functions or performing further
analysis.
5. Recursive Nature:
The parsing functions use recursive calls to handle nested or complex
grammar rules. For example, a parsing function for an expression might call
itself to handle sub-expressions.
6. Backtracking:
A naive recursive descent parser may backtrack when a parsing function
discovers it has followed a wrong path through the input. In practice,
backtracking is usually avoided through lookahead (predictive parsing) or
handled with explicit error reporting.
7. Implementation in Code:
Recursive Descent Parsing is often implemented using a series of functions
or methods, each corresponding to a grammar rule.
The parser reads tokens from the input and matches them against the
expected constructs based on the grammar rules.
8. Limitations:
Recursive Descent Parsing works well for LL(k) grammars (left-to-right
scan, leftmost derivation, k tokens lookahead) but might face limitations
when dealing with certain ambiguous or left-recursive grammars.
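The program below follows this expression grammar, with one parsing function per rule:
expr   -> term (('+' | '-') term)*
term   -> factor (('*' | '/') factor)*
factor -> NUMBER | '(' expr ')'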
CODE
Write a C++ program for recursive descent parsing.
#include <iostream>
#include <string>
#include <cctype>
class RecursiveDescentParser {
private:
std::string input;
size_t position;
char peek() {
return position < input.size() ? input[position] : '\0';
}
bool isEOF() {
return position >= input.size();
}
void advance() {
    if (!isEOF()) {
        position++;
    }
}
// Consume the expected character, or report a syntax error
void match(char expected) {
    if (peek() == expected) {
        advance();
    } else {
        std::cerr << "Expected '" << expected << "' but found '" << peek() << "'" << std::endl;
    }
}
// Read a run of digits and return its numeric value
double parseNumber() {
    double value = 0;
    while (std::isdigit(peek())) {
        value = value * 10 + (peek() - '0');
        advance();
    }
    return value;
}
bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
bool isOperator(char c) {
    return c == '+' || c == '-' || c == '*' || c == '/' || c == '(' || c == ')';
}
double factor() {
if (std::isdigit(peek())) {
return parseNumber();
} else if (peek() == '(') {
match('(');
double result = expr();
match(')');
return result;
} else {
std::cerr << "Unexpected character: " << peek() << std::endl;
return 0;
}
}
double term() {
double result = factor();
while (peek() == '*' || peek() == '/') {
char op = peek();
advance();
double nextFactor = factor();
if (op == '*') {
result *= nextFactor;
} else {
result /= nextFactor;
}
}
return result;
}
double expr() {
double result = term();
while (peek() == '+' || peek() == '-') {
char op = peek();
advance();
double nextTerm = term();
if (op == '+') {
result += nextTerm;
} else {
result -= nextTerm;
}
}
return result;
}
public:
RecursiveDescentParser(const std::string& inputStr) : input(inputStr), position(0) {}
void classifyInput() {
while (!isEOF()) {
char currentChar = peek();
if (isDigit(currentChar)) {
std::cout << currentChar << " is a digit." << std::endl;
advance();
} else if (isOperator(currentChar)) {
std::cout << currentChar << " is an operator." << std::endl;
advance();
} else {
std::cerr << "Unexpected character: " << currentChar << std::endl;
advance();
}
}
}
void parse() {
classifyInput();
}
};
int main() {
    std::string inputExpression;
    std::cout << "Enter an expression: ";
    std::getline(std::cin, inputExpression);
    RecursiveDescentParser parser(inputExpression);
    parser.parse();
    return 0;
}
Output:
Experiment # 11
FIRST and FOLLOW:
In compiler theory and formal language theory, FIRST and FOLLOW are sets used in
parsing techniques, specifically in constructing predictive parsers for context-free
grammars (CFGs).
1. FIRST Set:
The FIRST set of a grammar symbol in a context-free grammar is the set of
terminal symbols that can begin the strings derived from that symbol. In other
words, it represents the set of terminals that can appear as the first symbol of
a production derived from a non-terminal.
Algorithmically, the FIRST set of a symbol X can be defined as follows:
If X is a terminal, FIRST(X) = {X}.
If X is a non-terminal, and there is a production X → Y1Y2...Yn, then:
Add FIRST(Y1) to FIRST(X) excluding ε (empty string), and
If ε is in FIRST(Y1), add FIRST(Y2) to FIRST(X), and so on,
until ε is not in FIRST(Yi), or if all Yi have ε in their FIRST
sets, then add ε to FIRST(X).
2. FOLLOW Set:
The FOLLOW set of a non-terminal symbol in a context-free grammar is the
set of terminals that can appear immediately to the right of that non-terminal
in some derivation.
Algorithmically, the FOLLOW set of a non-terminal A can be defined as
follows:
Initially, add $ (end of input marker) to FOLLOW(S), where S is the
start symbol and appears in the start rule.
For each production A → αBβ (where α and β are possibly empty
sequences of grammar symbols), add FIRST(β) to FOLLOW(B)
excluding ε.
If ε is in FIRST(β) or β is ε (i.e., A → αB), then add FOLLOW(A) to
FOLLOW(B).
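For the grammar used in the program below (with e standing for ε): S → AB | BC,
A → aA | e, B → bB | c, C → d, the sets work out to:
FIRST(A) = {a, ε}, FIRST(B) = {b, c}, FIRST(C) = {d}, FIRST(S) = {a, b, c}
FOLLOW(S) = {$}, FOLLOW(A) = {b, c}, FOLLOW(B) = {d, $}, FOLLOW(C) = {$}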
CODE
Write a C++ program to find the FIRST and FOLLOW sets of a given grammar.
#include <iostream>
#include <unordered_map>
#include <set>
#include <vector>
#include <string>
using namespace std;

unordered_map<char, vector<string>> grammar;
unordered_map<char, set<char>> followSets;

// FIRST of a symbol; 'e' stands for epsilon, terminals are their own FIRST
set<char> first(char sym) {
    if (!grammar.count(sym)) return {sym};
    set<char> result;
    for (const string& rule : grammar[sym]) {
        size_t i = 0;
        for (; i < rule.size(); ++i) {
            set<char> f = first(rule[i]);
            bool hasEps = f.count('e') > 0;
            f.erase('e');
            result.insert(f.begin(), f.end());
            if (!hasEps) break;
        }
        if (i == rule.size()) result.insert('e'); // every symbol can vanish
    }
    return result;
}

// FOLLOW of every non-terminal, iterating the FOLLOW rules to a fixpoint
void computeFollow(char startSymbol) {
    followSets[startSymbol].insert('$');
    for (bool changed = true; changed; ) {
        changed = false;
        for (auto& [lhs, rules] : grammar)
            for (const string& rule : rules)
                for (size_t i = 0; i < rule.size(); ++i) {
                    if (!grammar.count(rule[i])) continue; // only non-terminals
                    bool epsTail = true; // can everything after rule[i] vanish?
                    for (size_t j = i + 1; j < rule.size() && epsTail; ++j) {
                        set<char> f = first(rule[j]);
                        epsTail = f.count('e') > 0;
                        f.erase('e');
                        for (char t : f)
                            if (followSets[rule[i]].insert(t).second) changed = true;
                    }
                    if (epsTail)
                        for (char t : followSets[lhs])
                            if (followSets[rule[i]].insert(t).second) changed = true;
                }
    }
}

int main() {
    grammar = {
        {'S', {"AB", "BC"}},
        {'A', {"aA", "e"}},
        {'B', {"bB", "c"}},
        {'C', {"d"}}
    };
    for (const auto& [nt, rules] : grammar) {
        cout << "FIRST(" << nt << ") = { ";
        for (char t : first(nt)) cout << t << ' ';
        cout << "}\n";
    }
    computeFollow('S');
    for (const auto& [nt, f] : followSets) {
        cout << "FOLLOW(" << nt << ") = { ";
        for (char t : f) cout << t << ' ';
        cout << "}\n";
    }
    return 0;
}
Output: