CompilerDesign
Uploaded by tejsvibhat

Practical-1

 Objective : Practice of LEX/YACC in Compiler Design

 Introduction
In compiler design, two critical components are the lexical analyzer and the syntax analyzer. LEX and
YACC are tools that help automate the creation of these components.
 LEX (Lexical Analyzer Generator): Used to generate a lexical analyzer, or scanner, that identifies
tokens in the input code.
 YACC (Yet Another Compiler Compiler): Used to generate a parser that analyzes the syntactic
structure of the tokens.

 LEX (Lexical Analyzer Generator):


 Purpose:
LEX is a tool that generates lexical analyzers. It simplifies the process of tokenizing input by
converting specified patterns of characters (such as keywords, operators, and identifiers) into
recognizable tokens that the parser can use.

 How LEX Works:


LEX translates patterns (defined using regular expressions) into C code that forms a scanner. This
scanner reads the input stream, identifies tokens, and passes them to the parser.
Process Flow:
1. Input: A specification file with regular expressions and corresponding actions.
2. Output: C code for the lexical analyzer (scanner).
3. Execution: The generated lexical analyzer reads the input, matches patterns to identify tokens,
and passes them to the parser.

 Structure of a LEX Program:


A typical LEX program has three main sections:
1. Definition Section: Includes library imports, declarations of variables, and constants.
2. Rules Section: Contains regular expressions & actions to be executed when patterns are matched.
3. User Code Section: Contains helper functions written in C.

Example:
%{
#include <stdio.h>
%}
%%
[0-9]+ { printf("NUMBER\n"); }
[a-zA-Z]+ { printf("IDENTIFIER\n"); }
"+" { printf("PLUS\n"); }
\n { /* ignore newline */ }
. { printf("UNKNOWN\n"); }
%%
int main() {
yylex();
return 0;
}
Explanation:
- The above example identifies numbers, identifiers, and the plus operator.
- The pattern `[0-9]+` matches sequences of digits, and `[a-zA-Z]+` matches sequences of letters.

 Advantages of LEX:
 Automated Lexical Analysis: Automates the process of recognizing tokens using regular
expressions.
 Efficiency: Generates fast and optimized C code for token recognition.
 Ease of Use: Simplifies the development of lexical analyzers.

 YACC (Yet Another Compiler Compiler) :


 Purpose:
YACC is a tool used to generate a syntax analyzer (parser). It takes a formal grammar as input and
generates C code that can parse sequences of tokens produced by a lexical analyzer like LEX.

 How YACC Works:


YACC uses context-free grammar to create a parser. It typically generates a bottom-up parser
(LALR(1) parser), which reads tokens from the lexical analyzer and applies grammar rules to
validate the input's syntax.
Process Flow:
1. Input: A specification file with grammar rules.
2. Output: C code for the syntax analyzer (parser).
3. Execution: The parser reads tokens, applies grammar rules, and either reports syntax errors or
builds an Abstract Syntax Tree (AST).

 Structure of a YACC Program:


A typical YACC program consists of three sections:
1. Definition Section: Token declarations, type definitions, and library imports.
2. Rules Section: Grammar rules, each followed by corresponding actions in C code.
3. User Code Section: Additional C functions for auxiliary tasks.

Example:
%{
#include <stdio.h>
%}
%token NUMBER
%%
expression: NUMBER '+' NUMBER { printf("Sum: %d\n", $1 + $3); }
;
%%
int yyerror(const char *s) { fprintf(stderr, "%s\n", s); return 0; }
int main() {
yyparse();
return 0;
}
Explanation:
- The rule `expression: NUMBER '+' NUMBER` defines an expression that consists of two numbers
separated by a plus sign.
- `$1` and `$3` refer to the semantic values of the first and third symbols (the numbers); `$2` would
refer to the plus sign.

 Advantages of YACC:
-Automated Parser Generation: Simplifies parser creation from complex grammars.
- Error Detection and Handling: Efficiently identifies and handles syntax errors.
- Flexibility: Allows defining complex grammar rules for different programming languages.

 Integration of LEX and YACC


To use LEX and YACC together:
1. Create a LEX file (e.g., `scanner.l`) to identify tokens.
2. Create a YACC file (e.g., `parser.y`) to define grammar rules.
3. Compile both using commands:
lex scanner.l
yacc -d parser.y
gcc lex.yy.c y.tab.c -o parser
./parser

Example LEX File (`scanner.l`):


%{
#include "y.tab.h"
#include <stdlib.h> /* for atoi */
int yywrap(void) { return 1; } /* avoids needing the lex library (-ll/-lfl) at link time */
%}
%%
[0-9]+ { yylval = atoi(yytext); return NUMBER; }
"+" { return '+'; }
[ \t\n] { /* Ignore whitespace */ }
%%

Example YACC File (`parser.y`):


%{
#include <stdio.h>
%}
%token NUMBER
%%
expression: NUMBER '+' NUMBER { printf("Result: %d\n", $1 + $3); }
;
%%
int yyerror(const char *s) { fprintf(stderr, "%s\n", s); return 0; }
int main() {
yyparse();
return 0;
}
Compilation and Execution Steps:
1. Generate scanner code: `lex scanner.l`
2. Generate parser code: `yacc -d parser.y`
3. Compile: `gcc lex.yy.c y.tab.c -o parser`
4. Execute: `./parser`

Output:
Input: 4 + 5
Output: Result: 9

 Advantages of Using LEX and YACC Together:


- Modular Design: Separates lexical analysis (LEX) and syntax analysis (YACC), making the code
more organized and maintainable.
- Efficiency: Both tools generate optimized C code, which enhances performance.
- Automation: Reduces manual coding effort by automatically generating lexical analyzers and
parsers.

 Conclusion
Using LEX and YACC in compiler design offers a streamlined approach to implementing lexical and
syntax analysis. By automating token recognition and grammar parsing, these tools help developers
create efficient and reliable compilers for different programming languages.

 References
- "Lex & Yacc" by John R. Levine, Tony Mason, and Doug Brown
- "Compiler Construction: Principles and Practice" by Kenneth C. Louden
- Official documentation for Flex and GNU Bison
Practical-2
 Objective: Write a program to check whether a string includes a keyword or not.
 Code:
#include <iostream>
#include <string>

int main() {
std::string str, keyword;
std::cout << "Enter a string: ";
std::getline(std::cin, str);
std::cout << "Enter a keyword: ";
std::getline(std::cin, keyword);

if (str.find(keyword) != std::string::npos) {
std::cout << "The keyword is present in the string." << std::endl;
} else {
std::cout << "The keyword is not present in the string." << std::endl;
}
return 0;
}

 Output:
Practical-3
 Objective: Write a program to check whether a string contains an alphabetic character or not.
 Code:
#include <iostream>
#include <string>
#include <cctype> // For isalpha function

int main() {
std::string str;
bool hasAlphabet = false;

std::cout << "Enter a string: ";


std::getline(std::cin, str);

for (char c : str) {


if (isalpha(c)) {
hasAlphabet = true;
break;
}
}

if (hasAlphabet) {
std::cout << "The string contains at least one alphabet character." << std::endl;
} else {
std::cout << "The string does not contain any alphabet characters." << std::endl;
}
return 0;
}

 Output:
Practical-4
 Objective: Write a program to show all the operations of a stack.
 Code:
#include <iostream>
#include <stack>

int main() {
std::stack<int> stack;
int choice, value;

do {
std::cout << "\nStack Operations Menu:";
std::cout << "\n1. Push";
std::cout << "\n2. Pop";
std::cout << "\n3. Top";
std::cout << "\n4. Is Empty";
std::cout << "\n5. Size";
std::cout << "\n6. Exit";
std::cout << "\nEnter your choice: ";
std::cin >> choice;

switch (choice) {
case 1:
std::cout << "Enter value to push: ";
std::cin >> value;
stack.push(value);
std::cout << value << " pushed into the stack." << std::endl;
break;
case 2:
if (!stack.empty()) {
std::cout << "Popped value: " << stack.top() << std::endl;
stack.pop();
} else {
std::cout << "Stack is empty." << std::endl;
}
break;
case 3:
if (!stack.empty()) {
std::cout << "Top value: " << stack.top() << std::endl;
} else {
std::cout << "Stack is empty." << std::endl;
}
break;
case 4:
std::cout << (stack.empty() ? "Stack is empty." : "Stack is not empty.") << std::endl;
break;
case 5:
std::cout << "Stack size: " << stack.size() << std::endl;
break;
case 6:
std::cout << "Exiting..." << std::endl;
break;
default:
std::cout << "Invalid choice. Please try again." << std::endl;
}
} while (choice != 6);

return 0;
}
 Output:
Practical-5
 Objective: Write a program to remove left recursion from a grammar.
 Code:
#include <iostream>
#include <vector>
#include <string>
using namespace std;

// Structure to represent a production rule


struct Production {
char nonTerminal;
vector<string> symbols;
};

// Function to remove left recursion


vector<Production> removeLeftRecursion(const vector<Production>
&productions) {
vector<Production> newProductions;

for (const auto &prod : productions) {


vector<string> nonRecursive, recursive;

// Split symbols into recursive and non-recursive


for (const auto &symbol : prod.symbols) {
if (symbol[0] == prod.nonTerminal) {
recursive.push_back(symbol.substr(1));
} else {
nonRecursive.push_back(symbol);
}
}

// If there is left recursion, create new production rules


if (!recursive.empty()) {
// New non-terminal named with the next ASCII character (e.g., E becomes F, standing in for E')
char newNonTerminal = prod.nonTerminal + 1;
vector<string> updatedNonRecursive;

// Update non-recursive productions


for (const auto &symbol : nonRecursive) {
updatedNonRecursive.push_back(symbol + newNonTerminal);
}

newProductions.push_back({prod.nonTerminal, updatedNonRecursive});

// Add epsilon and recursive productions


for (auto &recSymbol : recursive) {
recSymbol += newNonTerminal;
}
recursive.push_back(""); // epsilon production
newProductions.push_back({newNonTerminal, recursive});
} else {
newProductions.push_back(prod);
}
}
return newProductions;
}

int main() {
// Define original productions
vector<Production> productions = {
{'E', {"E+T", "T"}},
{'T', {"T*F", "F"}},
{'F', {"(E)", "id"}}
};

// Display original productions


cout << "Original Productions:\n";
for (const auto &prod : productions) {
cout << prod.nonTerminal << " -> ";
for (const auto &symbol : prod.symbols) {
cout << symbol << " | ";
}
cout << "\b\b \n"; // Remove trailing "| "
}

// Remove left recursion


auto newProductions = removeLeftRecursion(productions);

// Display new productions after removing left recursion


cout << "\nProductions after removing left recursion:\n";
for (const auto &prod : newProductions) {
cout << prod.nonTerminal << " -> ";
for (const auto &symbol : prod.symbols) {
cout << symbol << " | ";
}
cout << "\b\b \n"; // Remove trailing "| "
}

return 0;
}
 Output:
Practical-6
 Objective: Write a program to perform left factoring on a grammar.
 Code:
#include <iostream>
#include <vector>
#include <string>
#include <map>
using namespace std;

// Function to find the longest common prefix of two strings


string longestCommonPrefix(string s1, string s2) {
int len = min(s1.length(), s2.length());
string result = "";
for (int i = 0; i < len; i++) {
if (s1[i] == s2[i]) {
result += s1[i];
} else {
break;
}
}
return result;
}

// Function to perform left factoring on a grammar


void leftFactorGrammar(map<string, vector<string>> &grammar) {
for (auto &rule : grammar) {
string nonTerminal = rule.first;
vector<string> productions = rule.second;

if (productions.size() > 1) {
string commonPrefix = productions[0];
for (int i = 1; i < productions.size(); i++) {
commonPrefix = longestCommonPrefix(commonPrefix, productions[i]);
if (commonPrefix.empty()) break;
}

// If there's a common prefix, perform left factoring


if (!commonPrefix.empty()) {
cout << nonTerminal << " -> " << commonPrefix << nonTerminal << "'\n";
vector<string> newProductions;
for (string &prod : productions) {
string suffix = prod.substr(commonPrefix.length());
if (suffix.empty()) suffix = "ε"; // ε represents an empty production
newProductions.push_back(suffix);
}

cout << nonTerminal << "' -> ";


for (int i = 0; i < newProductions.size(); i++) {
if (i > 0) cout << " | ";
cout << newProductions[i];
}
cout << "\n";
} else {
cout << nonTerminal << " -> ";
for (int i = 0; i < productions.size(); i++) {
if (i > 0) cout << " | ";
cout << productions[i];
}
cout << "\n";
}
} else {
cout << nonTerminal << " -> " << productions[0] << "\n";
}
}
}

int main() {
map<string, vector<string>> grammar;
int n;
cout << "Enter the number of grammar rules: ";
cin >> n;

// Input grammar
for (int i = 0; i < n; i++) {
string nonTerminal, arrow, production;
cout << "Enter non-terminal: ";
cin >> nonTerminal >> arrow; // Arrow input: '->'
vector<string> productions;
cout << "Enter productions (separated by space, end with newline): ";
while (cin >> production) {
productions.push_back(production);
if (cin.peek() == '\n') break; // End input when newline is encountered
}
grammar[nonTerminal] = productions;
}

cout << "\nLeft Factored Grammar:\n";


leftFactorGrammar(grammar);
return 0;
}

 Output:
Practical-7
 Objective: Write a program to find out the FIRST of the Nonterminals in a grammar.
 Code:
#include <iostream>
#include <vector>
#include <unordered_set>
#include <unordered_map>
#include <string>
using namespace std;

// Production rules of the grammar


unordered_map<string, vector<string>> productions;
unordered_map<string, unordered_set<char>> firstSets;

// Calculate the FIRST set for a non-terminal symbol


void calculateFirstSet(const string &nonTerminal) {
if (firstSets.find(nonTerminal) != firstSets.end()) return; // Already calculated

unordered_set<char> firstSet;
// Simplified: only the first symbol of each production is inspected;
// ε-productions and left-recursive grammars are not handled.
for (const string &production : productions[nonTerminal]) {
char symbol = production[0];
if (isupper(symbol)) { // Non-terminal symbol
calculateFirstSet(string(1, symbol));
firstSet.insert(firstSets[string(1, symbol)].begin(), firstSets[string(1, symbol)].end());
} else {
firstSet.insert(symbol); // Terminal symbol
}
}
firstSets[nonTerminal] = firstSet;
}

int main() {
productions["S"] = {"aBC", "b"};
productions["B"] = {"b", "C"};
productions["C"] = {"c", "e"};

// Calculate FIRST sets for each non-terminal


for (const auto &production : productions) {
calculateFirstSet(production.first);
}

// Print the FIRST sets


for (const auto &nonTerminal : firstSets) {
cout << "FIRST(" << nonTerminal.first << "): {";
for (char symbol : nonTerminal.second) {
cout << symbol << " ";
}
cout << "}\n";
}

return 0;
}

 Output:
Practical-8
 Objective: Implementing Programs using Flex (Lexical analyzer tool).
 Theory:
Introduction to Flex:
Flex is a lexical analyzer generator: a programming tool that recognizes lexical
patterns in its input according to a flex specification. The programs below
illustrate its use.

Structure of a Flex program


%{
/* Definition Section: C declarations and includes */
%}
%%
/* Rules Section: patterns and actions */
%%
/* User Code Section: supporting C functions */

Program (a): Print ‘Hello World’


%{
#undef yywrap
#define yywrap() 1
%}
%%
[\n] { printf("Hello World\n"); }
%%
int main() {
yylex();
return 0;
}

Output:
Program (b): Program to check if the given letter is a vowel or not.
%{
#undef yywrap
#define yywrap() 1
void display(int);
%}
%%
[aeiou] { display(1); }
. { display(0); }
%%
void display(int flag) {
if(flag == 1)
printf("The given letter [%s] is a vowel\n", yytext);
else
printf("The given letter [%s] is NOT a vowel\n", yytext);
}
int main() {
printf("Enter a letter: ");
yylex();
return 0;
}

Output:
Practical-9
 Objective: Elaborate DAG Representation with examples.
 Theory:
 Directed Acyclic Graph:
The Directed Acyclic Graph (DAG) is used to represent the structure of basic blocks, to visualize
the flow of values between basic blocks, and to support optimization techniques within a basic
block. To apply an optimization technique to a basic block, a DAG is constructed from the
three-address code produced during intermediate code generation.
• Directed acyclic graphs are a type of data structure used to apply transformations to basic
blocks, and they facilitate those transformations.
• A DAG is an efficient method for identifying common subexpressions.
• It demonstrates how a statement’s computed value is used in subsequent statements.
Examples of directed acyclic graph:

Directed Acyclic Graph Characteristics:


A Directed Acyclic Graph for a basic block is a directed acyclic graph with the following
properties:
• Each leaf of the graph is labelled with a unique identifier, which can be a variable name or a
constant.
• The interior nodes of the graph are labelled with an operator symbol.
• In addition, nodes are given a string of identifiers to use as labels for storing the
computed value.
• Directed Acyclic Graphs have their own definitions for transitive closure and transitive
reduction.
• Directed Acyclic Graphs have topological orderings defined.
Algorithm for construction of Directed Acyclic Graph:
There are three possible scenarios for building a DAG on three address codes:
Case 1 – x = y op z
Case 2 – x = op y
Case 3 – x = y

A Directed Acyclic Graph for the above cases can be built as follows:
Step 1 –
• If the operand y is not already defined, create node(y).
• For case (1), if the operand z is not already defined, create node(z).
Step 2 –
• For case (1), create node(OP) with node(y) as its left child and node(z) as its right child.
• For case (2), check whether a node(OP) with the single child node(y) already exists; create it if
not.
• For case (3), the node n is simply node(y).
Step 3 –
Remove x from the list of identifiers attached to any other node, then append x to the list of
attached identifiers for the node n found in Step 2.

Example 1:
T0 = a + b : Expression 1
T1 = T0 + c : Expression 2
d = T0 + T1 : Expression 3

(Figures: DAGs for Expression 1 (T0 = a + b), Expression 2 (T1 = T0 + c), and Expression 3
(d = T0 + T1).)

Example 2:
T1 = a + b
T2 = T1 + c
T3 = T1 x T2

Example 3 :
T1 = a + b
T2 = a – b
T3 = T1 * T2
T4 = T1 – T3
T5 = T4 + T3
