Compiler Design Lab File
IT-302
COMPILER DESIGN
Batch :- IT-B
INDEX
1. Program to implement an NFA that recognizes a given string
2. Program to implement an NFA corresponding to the ‘for’ keyword
3. Program to implement a lexical analyzer to issue tokens for “for(”
4. Program to implement all the categories of lexical analysis
EXPERIMENT-1
AIM: Write a program to implement an NFA that recognizes a given string ("JATIN")
THEORY :
Non-deterministic Finite Automaton (NFA): An NFA is a finite automaton in which, for
some state and input symbol, the machine can move to more than one state; that is, some
of the moves cannot be uniquely determined by the present state and the present input
symbol.
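For illustration, the following sketch (separate from the experiment code below) shows how genuine non-determinism can be simulated by tracking the set of all states the machine could currently be in. The automaton is assumed, for this example only, to accept strings over {a, b} that end in "ab"; from state 0 the symbol 'a' leads to both state 0 and state 1, so the move is not uniquely determined.

#include <iostream>
#include <set>
#include <string>
using namespace std;
int main(){
    // NFA for strings over {a,b} ending in "ab":
    // delta(0,'a') = {0,1}, delta(0,'b') = {0}, delta(1,'b') = {2}; state 2 is final.
    string s;
    cin >> s;
    set<int> current = {0};            // all states the NFA could be in right now
    for (char c : s){
        set<int> next;
        for (int q : current){
            if (q == 0){
                if (c == 'a'){ next.insert(0); next.insert(1); }
                else if (c == 'b'){ next.insert(0); }
            } else if (q == 1){
                if (c == 'b'){ next.insert(2); }
            }
            // state 2 has no outgoing moves
        }
        current = next;
    }
    if (current.count(2))
        cout<<"Reached Final state"<<endl;
    else
        cout<<"Not Reached"<<endl;
    return 0;
}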
CODE :
#include <iostream>
#include <string>
using namespace std;
int main(){
    string s;
    cin >> s;                  // input string to be checked, e.g. "JATIN"
    int n = s.size();
    char input;
    int state = 0;
    bool flag = false;
for(int i = 1 ; i <= n ; i++){
input = s[i-1];
switch (state)
{
case 0:
if(input == 'J'){
state++;
cout<<"Current state is: "<<state<<endl;
}else{
cout<<"Not Reached"<<endl;
return 0;
}
break;
case 1:
if(input == 'A'){
state++;
cout<<"Current state is: "<<state<<endl;
}else{
cout<<"Not Reached"<<endl;
return 0;
}
break;
case 2:
if(input == 'T'){
state++;
cout<<"Current state is: "<<state<<endl;
}else{
cout<<"Not Reached"<<endl;
return 0;
}
break;
case 3:
if(input == 'I'){
state++;
cout<<"Current state is: "<<state<<endl;
}else{
cout<<"Not Reached"<<endl;
return 0;
}
break;
case 4:
if(input == 'N'){
state++;
flag = true;               // final (accepting) state reached
cout<<"Current state is: "<<state<<endl;
}else{
cout<<"Not Reached"<<endl;
return 0;
}
break;
case 5:
cout<<"Not Reached"<<endl;
return 0;
default:
break;
}
}
if (flag){
cout<<"Reached Final state: "<<state<<endl;
} else{
cout<<"Not Reached"<<endl;
}
return 0;
}
OUTPUT :
Explanation:
The provided C++ code is a simple lexical analyzer that recognizes the string
"JATIN" using a finite automaton based on the NFA model described above. The
program reads a string from the user and iterates through the characters of the
input, transitioning between states based on the characters encountered.
If the input matches the sequence "JATIN," it reaches the final state, and the
message "Reached Final state" is printed. Otherwise, it outputs "Not Reached."
The code uses a switch statement to handle state transitions and checks for the
correct sequence of characters to match the desired string.
EXPERIMENT- 2
AIM: Write a program to implement an NFA corresponding to the ‘for’
keyword
THEORY:
Lexical analysis is the first phase of a compiler and is also known as scanning. It
converts the input program into a sequence of tokens.
A C program consists of various tokens; a token is either a keyword, an
identifier, a constant, a string literal, or a symbol.
For Example:
1) Keywords:
Examples- for, while, if etc.
2) Identifier
Examples- Variable name, function name etc.
3) Operators:
Examples- '+', '++', '-' etc.
4) Separators:
Examples- ',', ';', etc.
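As a small illustration of this classification (a sketch separate from the experiment code below; the keyword list is deliberately abbreviated for the example), a lexeme can be recognized as a keyword by comparing it against a keyword table and otherwise treated as an identifier if it begins with a letter or underscore:

#include <cctype>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
// Classify one lexeme as KEYWORD, IDENTIFIER or UNKNOWN (sample keyword list only).
string classify(const string& lexeme){
    vector<string> keywords = {"for", "while", "if", "else", "return"};
    for (const string& kw : keywords)
        if (lexeme == kw)
            return "KEYWORD";
    if (!lexeme.empty() && (isalpha((unsigned char)lexeme[0]) || lexeme[0] == '_'))
        return "IDENTIFIER";
    return "UNKNOWN";
}
int main(){
    cout << "for   -> " << classify("for") << endl;    // KEYWORD
    cout << "count -> " << classify("count") << endl;  // IDENTIFIER
    cout << "123   -> " << classify("123") << endl;    // UNKNOWN (a number, handled separately)
    return 0;
}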
CODE :
#include <iostream>
#include <string>
using namespace std;
int main() {
    string s;
    cin >> s;                  // lexeme to be examined
    int n = s.size();
    char input;
    int state = 0;
    bool flag = true;
    // scan one position past the end so state 3 can verify the delimiter after "for"
    for (int i = 0; i <= n && flag; i++) {
        input = (i < n) ? s[i] : ' ';
        switch (state)
        {
case 0:
if (input == 'f') {
state++;
cout << "Current state is: " << state << endl;
}else{
flag = false;
}
break;
case 1:
if (input == 'o') {
state++;
cout << "Current state is: " << state << endl;
}else{
flag = false;
}
break;
case 2:
if (input == 'r') {
state++;
cout << "Current state is: " << state << endl;
}else{
flag = false;
}
break;
case 3:
if ((input >= 'a' && input <= 'z') ||
(input >= 'A' && input <= 'Z') ||
(input >= '0' && input <= '9'))
{
flag = false;
} else {
state++;
cout << "Current state is: " << state << endl;
}
break;
default:
break;
        }
    }
    if (flag && state == 4) {
        cout << "Token issued: <for>" << endl;
    } else {
        cout << "Not Reached" << endl;
    }
    return 0;
}
OUTPUT :
Explanation:
The provided C++ code is a simple lexical analyzer that recognizes the keyword "for"
in an input string using a basic nondeterministic finite automaton (NFA) approach.
The automaton starts in state 0 and moves through states 1, 2, and 3 on the characters
'f', 'o', and 'r'. The analyzer iterates through the input string character by character,
transitioning between states based on the characters encountered. If a character is not
the expected one, the flag is set to false, indicating that the string does not match the
keyword. After 'r' has been read, the analyzer additionally checks that the next
character is not a letter or digit, so that longer identifiers such as "form" are not
mistaken for the keyword; only then is the final state reached and the keyword
recognized.
The code outputs the current state at each transition and issues a token if the final state
is reached, denoting the recognition of the "for" keyword. This lexical analyzer serves
as a basic example of string recognition using a finite automaton. However, for a more
comprehensive lexical analysis, tools like Lex or Flex are commonly used in compiler
design.
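For comparison, the same "for"-keyword check can be expressed as a regular expression, the notation that tools such as Lex or Flex compile into automata. The snippet below is only an illustrative sketch (the pattern assumes the keyword must be followed by the end of the lexeme or by a non-alphanumeric character):

#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main(){
    // "for" followed by nothing, or by one non-alphanumeric character and anything after it
    regex forKeyword("for([^A-Za-z0-9_].*)?");
    string s;
    cin >> s;                                    // lexeme to be examined
    if (regex_match(s, forKeyword))
        cout << "Token issued: <for>" << endl;   // e.g. "for" or "for("
    else
        cout << "Not Reached" << endl;           // e.g. "form"
    return 0;
}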
EXPERIMENT-3
AIM: Write a Program to implement a lexical analyzer to issue tokens
for “for(”
THEORY:
The provided code is a C++ implementation of a simple lexical analyzer. It utilises a
finite state machine approach to recognize and categorise tokens in a given input
string. The code reads characters from the input and transitions between states based
on the observed characters, identifying and printing tokens such as identifiers,
numbers, and operators. It employs a buffer to accumulate characters for token
formation and recognizes whitespace to separate tokens.
Lexical Analysis, the initial compiler phase, transforms input code into Tokens. In C
programs, tokens include keywords (e.g., for, while), identifiers (e.g., variable
names), operators (e.g., +, ++), and separators (e.g., ',', ';'). The process categorises
elements to facilitate subsequent compilation phases.
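The lexemeBegin/forward idea used by the program can be sketched on its own first. In the illustrative snippet below (the sample input and the delimiter set are assumptions made only for this example), forward scans ahead while lexemeBegin marks the start of the current lexeme; whenever a delimiter is seen, the substring between the two pointers is emitted as one lexeme:

#include <iostream>
#include <string>
using namespace std;
int main(){
    // Two-pointer scan: spaces and parentheses act as delimiters in this sketch.
    string s = "for ( i )";
    size_t lexemeBegin = 0;
    for (size_t forward = 0; forward <= s.size(); forward++){
        bool atDelimiter = (forward == s.size()) || s[forward] == ' ' ||
                           s[forward] == '(' || s[forward] == ')';
        if (atDelimiter){
            if (forward > lexemeBegin)
                cout << "Lexeme: " << s.substr(lexemeBegin, forward - lexemeBegin) << endl;
            if (forward < s.size() && s[forward] != ' ')
                cout << "Lexeme: " << s[forward] << endl;   // '(' or ')' is a lexeme of its own
            lexemeBegin = forward + 1;
        }
    }
    return 0;
}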
CODE :
#include <bits/stdc++.h>
using namespace std;
// returns true if the lexeme s is the keyword "for"
bool isKeyword(string s)
{
int state = 0;
switch (state)
{
case 0:
if (s[state] == 'f')
{
state = 1;
}
else
{
return false;
}
case 1:
if (s[state] == 'o')
{
state = 2;
}
else
{
return false;
}
case 2:
if (s[state] == 'r')
{
state = 3;
}
else
{
return false;
}
case 3:
if (state < s.size() && (s[state] < '0' || s[state] > '9') &&
(s[state] < 'a' || s[state] > 'z') && (s[state] < 'A' || s[state] > 'Z'))
{
cout << "Token Sucessfully Issued <for>" << endl;
return true;
}
else
{
return false;
}
default:
return false;
}
cout << "Token Sucessfully Issued <for>" << endl;
return true;
}
bool isParanthesis(string s)
{
int state = 0;
switch (state)
{
case 0:
if (s[state] == '(')
{
cout << "Token Sucessfully Issued <PAR,S-OP>" << endl;
return true;
}
else if (s[state] == ')')
{
cout << "Token Sucessfully Issued <PAR,S-CL>" << endl;
return true;
}
else
{
cout << "Input invalid" << endl;
return false;
}
}
return false;
}
void mainProgram(string s)
{
int lexemeBegin = 0, forward = 0;
while (lexemeBegin < s.length() - 1)
{
while (forward < s.length())
{
string s1 = s.substr(lexemeBegin, forward - lexemeBegin + 1);
if (isKeyword(s1))
{
lexemeBegin = forward;
break;
}
else
{
forward++;
}
}
forward = lexemeBegin;
while (forward < s.length())
{
string s1 = s.substr(lexemeBegin, forward - lexemeBegin + 1);
if (isParanthesis(s1))
{
lexemeBegin = forward + 1;
break;
}
else
{
forward++;
}
}
forward = lexemeBegin;
}
}
int main()
{
string s;
getline(cin, s);
mainProgram(s);
return 0;
}
OUTPUT :
Explanation:
The given C++ code implements a simple lexical analyzer that recognizes keywords
and parentheses in an input string. Let's break down the key components and the flow
of the program:
1. isKeyword() Function:
- This function checks whether a given string `s` represents the keyword "for."
- It uses a state machine with different cases representing the sequence of characters
expected for the keyword "for."
- The function returns `true` if the string matches the keyword, and it prints a
success message. Otherwise, it returns `false`.
2. isParanthesis() Function:
- This function checks whether a given string `s` represents an open parenthesis '(' or
a close parenthesis ')'.
- It uses a simple case to distinguish between '(' and ')'.
- If the input is a valid parenthesis, it prints a corresponding success message and
returns `true`. Otherwise, it returns `false`.
3. mainProgram() Function:
- This function is the main driver for the lexical analysis process.
- It utilizes two pointers, `lexemeBegin` and `forward`, to extract substrings from
the input string.
- The function iterates through the input string and, for each substring, checks
whether it is a keyword using `isKeyword`. If a keyword is found, the `lexemeBegin`
pointer is updated to the next position.
- Subsequently, it checks for parentheses using `isParanthesis` in a similar manner.
4. main() Function:
- In the `main` function, the user inputs a string, and `mainProgram` is called to
perform lexical analysis on the input string.
- The program terminates after processing the input string.
It's important to note that the code currently handles only keywords and parentheses.
A comprehensive lexical analyzer in a compiler would typically handle a broader set
of token types, including identifiers, operators, literals, and separators.
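To make the last point concrete, a broader analyzer would tag every lexeme with a token class. The sketch below is purely illustrative (the enum names and the hand-built token stream are assumptions for the example, not output produced by the program above):

#include <iostream>
#include <string>
#include <vector>
using namespace std;
// Hypothetical token classes a fuller lexical analyzer would distinguish.
enum class TokenType { KEYWORD, IDENTIFIER, NUMBER, OPERATOR, SEPARATOR };
struct Token {
    TokenType type;    // which class the lexeme belongs to
    string lexeme;     // the characters taken from the source
};
int main(){
    // Hand-built token stream for the fragment "for(i=0;" (illustration only).
    vector<Token> tokens = {
        {TokenType::KEYWORD, "for"}, {TokenType::SEPARATOR, "("},
        {TokenType::IDENTIFIER, "i"}, {TokenType::OPERATOR, "="},
        {TokenType::NUMBER, "0"}, {TokenType::SEPARATOR, ";"}
    };
    for (const Token& t : tokens)
        cout << t.lexeme << endl;
    return 0;
}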
EXPERIMENT-4
AIM: Write a program to implement all the categories of
lexical analysis
THEORY:
Lexical Analysis:
● Lexical analysis, also known as scanning or tokenization, marks the
initial phase of compilation.
● Its primary task is to divide the source code into a sequence of tokens.
Tokens:
● Tokens are the smallest units of meaning in a programming language.
● They serve as fundamental building blocks that represent various
elements in a program.
Types of Tokens:
● Tokens include keywords, identifiers, operators, literals, and punctuation.
● These categories cover the essential components that form the structure of
a program.
OVERVIEW:
CODE:
#include <bits/stdc++.h>
using namespace std;
// symbol table for identifiers (name -> entry number) and a running counter
map<string, int> mp;
int ct = 1;
bool isDigit(char s){
return (s >= '0' && s <= '9');
}
// To check if the given character is a letter
bool isLetter(char s){
return (s >= 'a' && s <= 'z') || (s >= 'A' && s <= 'Z');
}
// To check if the given string is a keyword or not
bool isKeyword(string s){
int state = 0;
switch (state){
case 0:
if (s[state] == 'f')
{
state = 1;
}
else
{
return false;
}
case 1:
if (s[state] == 'o')
{
state = 2;
}
else
{
return false;
}
case 2:
if (s[state] == 'r'){
state = 3;
}
else{
return false;
}
case 3:
if (state < s.size() && (!isLetter(s[state]) && !isDigit(s[state]))){
cout << "Token Issued Successfully <for>" << endl;
return true;
}
else{
return false;
}
default:
return false;
}
cout << "<for>" << endl;
return true;
}
bool isArith(string s)
{
int state = 0;
switch (state)
{
case 0:
if (s[state] == '+')
{
cout << "Token Issued Successfully <arith,PL>" << endl;
return true;
}
else if (s[state] == '-')
{
cout << "Token Issued Successfully <arith,MN>" << endl;
return true;
}
else if (s[state] == '*')
{
cout << "Token Issued Successfully <arith,ML>" << endl;
return true;
}
else if (s[state] == '/')
{
cout << "Token Issued Successfully <arith,DV>" << endl;
return true;
}
else
{
return false;
}
}
return false;
}
bool isNumber(string s)
{
int state = 0;
int val = 0;
switch (state)
{
case 0:
if (s[state] >= '0' && s[state] <= '9')
{
val = val * 10 + (s[state] - '0');
state = 1;
}
else
{
return false;
}
case 1:
while (s[state] >= '0' && s[state] <= '9')
{
val = val * 10 + (s[state] - '0');
state++;
}
// the number ends here; issue its token if it is not followed by a letter
if (state >= (int)s.size() || !isLetter(s[state]))
{
cout << "Token Issued Successfully <num," << val << ">" << endl;
return true;
}
else
{
return false;
}
default:
return false;
}
return false;
}
bool isVariable(string s)
{
int state = 0;
switch (state)
{
case 0:
if ((s[state] >= 'a' && s[state] <= 'z') || (s[state] >= 'A' && s[state] <=
'Z'))
{
state = 1;
}
else
{
return false;
}
case 1:
while ((s[state] >= 'a' && s[state] <= 'z') || (s[state] >= 'A' && s[state]
<= 'Z') || (s[state] >= '0' && s[state] <= '9'))
{
state++;
continue;
}
if ((s[state] < '0' || s[state] > '9') && (s[state] < 'a' || s[state] >
'z') && ((s[state] < 'A' || s[state] > 'Z')))
{
string ss = s.substr(0, s.size() - 1);
if (mp.find(ss) == mp.end())
{
cout << "Token Issued Successfully <id," << ct << ">" << endl;
mp[ss] = ct;
ct++;
}
else
{
cout << "Token Issued Successfully id," << mp[ss] << ">" << endl;
}
return true;
}
else
{
return false;
}
}
return false;
}
int main()
{
string s;
getline(cin, s);
forward++;
}
}
forward = lexemeBegin;
if (lexemeBegin < s.length() && s[lexemeBegin] != '=' && s[lexemeBegin] != ' ' &&
    s[lexemeBegin] != '+' && s[lexemeBegin] != '-' && s[lexemeBegin] != '<' &&
    s[lexemeBegin] != '>' && s[lexemeBegin] != '$' && s[lexemeBegin] != ';' &&
    s[lexemeBegin] != ':' && s[lexemeBegin] != '(' && s[lexemeBegin] != ')' &&
    (s[lexemeBegin] < '0' || s[lexemeBegin] > '9') &&
    (s[lexemeBegin] < 'a' || s[lexemeBegin] > 'z') &&
    (s[lexemeBegin] < 'A' || s[lexemeBegin] > 'Z'))
{
cout << "Error " << s[lexemeBegin] << " is not identified so token not
issued" << endl;
28
return 0;
}
if (forward == 0)
{
break;
}
}
return 0;
}
OUTPUT :