BCSE307P - Compiler Lab Manual
Experiment 1
Aim: Study of Phases of Compiler
COMPILER
A compiler is a program that translates source code written in a high-level programming
language into machine code that can be executed by a computer.
The process of translating the source code into machine code involves several distinct
phases, each of which performs a specific task. Here are the main phases of a compiler:
1. Lexical Analysis:
The first phase of a compiler is lexical analysis, which is also known as scanning.
The lexical analyser, often generated by a tool like LEX, reads the source code character by
character and groups them into meaningful units called tokens.
[Tokens are the smallest units of syntax in a programming language and can be keywords,
identifiers, operators, or constants.]
2. Syntax Analysis:
The parser, often generated by a tool like YACC, takes the tokens generated by the lexical
analyser and groups them into a hierarchical structure called a parse tree.
The parse tree represents the syntactic structure of the source code and can be used to
check for syntax errors.
3. Semantic Analysis:
The semantic analyser checks the source code for semantic errors, such as type mismatches
or undeclared variables. It also assigns meanings to the symbols used in the source code,
such as variables, functions, and classes.
4. Intermediate Code Generation:
The compiler generates an intermediate representation of the source code that is more
abstract than the source code itself. The intermediate code can be optimized for efficiency
and translated into different target languages.
5. Code Optimization:
The compiler analyses the intermediate code and applies transformations that improve the
efficiency of the generated code. This can include eliminating redundant operations,
reordering instructions, and simplifying expressions.
6. Code Generation:
The compiler generates the target code, which is the actual machine code that can be
executed by a computer. The target code can be optimized for the specific platform or
architecture for which it is intended.
Experiment 2
Aim: Implementation of Token Separation (Lexical Analyzer)
Algorithm:
For every space-separated string (token)
    For each character c in token
        If isOperator(c)
            StoreOperator(c)
    For each piece of token between operator characters
        If isKeyword(piece)
            StoreKeyword(piece)
        Else if isConstant(piece)
            StoreConstant(piece)
        Else
            StoreIdentifier(piece)
Source Code:
// identifiers, constants, operators, keywords
#include <stdio.h>
#include <string.h>
int isOperator(char c)
{
if (c == '+' || c == '-' || c == '/' || c == '*' || c == '=' || c == ';' ||
c == '(' || c == ')' || c == '{' || c == '}')
{
return 1;
}
return 0;
}
int main()
{
char keywordList[20][255];
int keyword_count = 0;
char identList[20][255];
int ident_count = 0;
char constList[20][255];
int const_count = 0;
char symbols[20];
int symb_count = 0;
Input Text:
int a;
int b;
a = 5;
b = 10;
int c = a + b;
print(c);
Output:
Experiment 3
Aim: Study of Lex and Yacc tools
LEX and YACC are tools that are commonly used in the construction of compilers. LEX is
a lexical analyser generator, while YACC is a parser generator. Together, they generate the
front end (scanner and parser) of a compiler for a programming language.
LEX:
LEX is used to generate a lexical analyser, which is responsible for breaking up the input
source code into individual tokens.
LEX generates a C program that performs this analysis on the input source code.
YACC
A parser is responsible for analysing the sequence of tokens generated by the lexical
analyzer and checking whether it conforms to the grammar of the programming language
being compiled.
YACC generates a C program that implements a parsing algorithm that is based on the
grammar of the programming language.
2. Defining the grammar of the programming language. This involves specifying the
rules that govern the syntax of the programming language.
3. Writing the LEX input file that defines the regular expressions for the various types
of tokens in the programming language.
4. Writing the YACC input file that defines the grammar of the programming language
and specifies the actions that should be taken when a valid sequence of tokens is
recognized.
5. Running LEX and YACC on their respective input files to generate the C programs
that will form the basis of the compiler.
6. Writing additional code to handle semantic analysis, code generation, and other
tasks that are necessary to transform the input source code into executable code.
LEX VS YACC -
“In summary, LEX is used to generate a lexical analyzer, while YACC is used to generate a
parser, and together they form the basis of a compiler for a programming language.”
The resulting compiler can then be used to compile programs written in the
programming language for which it was designed.
SUMMARY:-
LEX and YACC are powerful tools that can be used to generate the lexical analyzer and
parser components of a compiler.
They allow for the construction of compilers that can handle complex programming
languages and automate much of the process of transforming source code into executable
code.
Experiment 4
Aim: Implementation of Lexical analyser using Lex Tool
a.) Program to recognize Integer, Real and Exponential
%{
#include <stdio.h>
%}
sign [+-]?
digit [0-9]+
exp ([eE]{sign}{digit})
%%
\+?[0-9]+ {printf("Positive Number\n");}
\-[0-9]+ {printf("Negative number\n");}
{sign}({digit}\.{digit}?|\.{digit}) {printf("Fractional value\n");}
{sign}{digit}(\.{digit}?)?{exp} {printf("Exponential value\n");}
%%
int yywrap(void){
return 1;
}
int main(){
yylex();
return 0;
}
b.) C program to implement lexical analyser using lex tool for a simple statement
%{
#include <stdio.h>
#include <stdlib.h>
int l_count = 1;
%}
sign [+-]?
digit [0-9\.]+
alpha [a-zA-Z_]+
op [\+\=\-\/\<\>\*\(\)\%\^]
identifier {alpha}+[a-zA-Z_0-9.]*
keyword (include|void|main|int|float|for|define|scanf|if|printf|else|pow)
%%
[ \t] { /* skip white space */ }
\{ {printf("%s \t: block start\n", yytext);}
\} {printf("%s \t: block end\n", yytext);}
^# {printf("%s \t: preprocessor symbol\n", yytext);}
{keyword} {printf("%s \t: keyword\n", yytext);}
{op} {printf("%s \t: operator\n", yytext);}
\"[^"\n]*\" {printf("%s \t: string\n", yytext);}
[,;&] { /* ignore separators */ }
{identifier} {printf("%s \t: identifier\n", yytext);}
{digit} {printf("%s \t: constant\n", yytext);}
[\n] {printf("End of line %d \n\n", l_count); l_count++; }
%%
int yywrap(void){
return 1;
}
int main(){
yyin=fopen("sample.c","r");
if(yyin==NULL){
printf("cannot open sample.c\n");
return 1;
}
yylex();
printf("Line count : %d\n", l_count - 1);
fclose(yyin);
return 0;
}
Q1. Recognizing a^n b^n (n >= 1)
Algorithm:
The grammar
S -> A Q B N
Q-> A Q B | ε
Yacc code:
%{
#include <stdio.h>
#include <stdlib.h>
#include "lex.yy.c"
int yywrap();
int yylex();
int yyerror(char *msg);
%}
%token A B N
%%
S: A Q B N { printf("valid string\n");
exit(0); }
;
Q: A Q B |
;
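%%
int main()
{
printf("enter the string\n");
yyparse();
return 0;
}
int yyerror(char *msg)
{
printf("invalid string\n");
exit(0);
}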
Lex code:
%{
#include "y.tab.h"
%}
%%
[aA] {return A;}
[bB] {return B;}
\n {return N;}
. {return yytext[0];}
%%
int yywrap()
{
return 1;
}
Output:
%{
#include <stdio.h>
#include <stdlib.h>
#include "lex.yy.c"
int yywrap();
int yylex();
int yyerror(char *msg);
%}
%token A B N
%%
S: A Q B N { printf("valid string\n");
exit(0); }
;
Q: A Q |
;
%%
int main()
{
printf("enter the string\n");
yyparse();
return 0;
}
int yyerror(char *msg)
{
printf("invalid string\n");
exit(0);
}
Lex code:
%{
#include "y.tab.h"
%}
%%
[aA] {return A;}
[bB] {return B;}
\n {return N;}
%%
int yywrap()
{
return 1;
}
Output:
%{
#include <stdio.h>
#include <stdlib.h>
#include "lex.yy.c"
int yywrap();
int yylex();
int yyerror(char *msg);
%}
%token A B N
%%
S: A Q N { printf("valid string\n");
exit(0); }
;
Q: Q B |
;
%%
int main()
{
printf("enter the string\n");
yyparse();
return 0;
}
int yyerror(char *msg)
{
printf("invalid string\n");
exit(0);
}
Lex Code:
%{
#include "y.tab.h"
%}
%%
[aA] {return A;}
[bB] {return B;}
\n {return N;}
%%
int yywrap()
{
return 1;
}
Output:
%{
#include <stdio.h>
#include <stdlib.h>
#include "lex.yy.c"
int yywrap();
int yylex();
int yyerror(char *msg);
%}
%token A B N
%%
S: A B Q N { printf("valid string\n");
exit(0); }
;
Q: A B Q |
;
%%
int main()
{
printf("enter the string\n");
yyparse();
return 0;
}
int yyerror(char *msg)
{
printf("invalid string\n");
exit(0);
}
Lex Code:
%{
#include "y.tab.h"
%}
%%
[aA] {return A;}
[bB] {return B;}
\n {return N;}
%%
int yywrap()
{
return 1;
}
Output:
Algorithm:
S -> E
E -> E + E   {E.val = E1.val + E2.val}
E -> E - E   {E.val = E1.val - E2.val}
E -> E * E   {E.val = E1.val * E2.val}
E -> E / E   {E.val = E1.val / E2.val}
E -> E % E   {E.val = E1.val % E2.val}
Yacc code:
%{
#include <stdio.h>
#include <stdlib.h>
#include "lex.yy.c"
int yywrap();
int yylex();
int yyerror(char *msg);
int flag = 0;
%}
%token N
%left '+' '-'
%left '*' '/' '%'
%%
S: E { printf("Answer is : %d\n", $1); return 0; }
;
E: E'+'E {$$=$1+$3;}
| E'-'E {$$=$1-$3;}
| E'*'E {$$=$1*$3;}
| E'/'E {$$=$1/$3;}
| E'%'E {$$=$1%$3;}
| '('E')' {$$=$2;}
| N {$$=$1;}
;
%%
int main()
{
printf("enter the arithmetic operation\n");
yyparse();
if(flag==0){
printf("Valid arithmetic operation\n");
}
return 0;
}
int yyerror(char *msg)
{
printf("Invalid arithmetic operation\n");
flag = 1;
return 0;
}
Lex code:
%{
#include "y.tab.h"
#include <stdio.h>
#include <stdlib.h>
%}
%%
[0-9]+ {yylval=atoi(yytext); return N;}
[\n] {return 0;}
. {return yytext[0];}
%%
int yywrap()
{
return 1;
}
Output:
Yacc code:
%{
#include <stdio.h>
#include <stdlib.h>
#include "lex.yy.c"
int yywrap();
int yylex();
int yyerror(char *msg);
int flag = 0;
char st[100];
int top=0;
void A1()
{
st[top++]=yytext[0];
}
void A2()
{
printf("%c", st[--top]);
}
void A3()
{
printf("%s", yytext);
}
%}
%token ID
%left '+' '-' '*' '/' '%' '(' ')' UMINUS
%%
S: E
E: E'+'{A1();}T{A2();}
| E'-'{A1();}T{A2();}
| T
;
T: T'*'{A1();}F{A2();}
| T'/'{A1();}F{A2();}
| F
;
F: '('E')'
| '-'{A1();}F{A2();}
| ID{A3();}
;
%%
//driver code
int main()
{
printf("enter the arithmetic operation\n");
yyparse();
printf("\n");
return 0;
}
int yyerror(char *msg)
{
printf("invalid expression\n");
flag = 1;
return 0;
}
Lex code:
%{
#include "y.tab.h"
#include <stdio.h>
#include <stdlib.h>
%}
alpha [A-Za-z]
digit [0-9]
%%
{alpha}+({alpha}|{digit})* {return ID;}
{digit}+ {yylval=atoi(yytext); return ID;}
[\n] {return 0;}
. {return yytext[0];}
%%
int yywrap()
{
return 1;
}
Output:
char pop()
{
temp = top;
if (temp == NULL)
return -1;
else
temp = temp->ptr;
int popped = top->info;
free(top);
top = temp;
return popped;
}
char peek()
{
return top->info;
}
int isEmpty()
{
if (top == NULL)
return 1;
return 0;
}
{
postfix[j++] = infix[i];
}
else if (infix[i] == '(')
{
push(infix[i]);
}
else if (infix[i] == ')')
{
while (!isEmpty() && peek() != '(')
postfix[j++] = pop();
if (isEmpty())
return "Invalid Expression"; /* no matching '(' */
else
pop(); /* discard the '(' */
}
else if (isOperator(infix[i]))
{
while (!isEmpty() && precedence(peek()) >= precedence(infix[i]))
postfix[j++] = pop();
push(infix[i]);
}
}
while (!isEmpty())
{
if (peek() == '(')
{
return "Invalid Expression";
}
postfix[j++] = pop();
}
postfix[j] = '\0';
return postfix;
}
}
else if (oper == '*')
{
printf("MUL ");
}
else if (oper == '/')
{
printf("DIV ");
}
else if (oper == '^')
{
printf("EXP ");
}
}
int isAlpha(char c)
{
return (c >= 65 && c <= 90) || (c >= 97 && c <= 122);
}
void printVal(char c)
{
if (isAlpha(c))
{
printf("%c ", c);
}
else
{
printf("t%d ", c + 1);
}
}
int main()
{
char infix[MAX_EXPR_SIZE];
printf("Enter an infix expression: ");
fgets(infix, MAX_EXPR_SIZE, stdin);
char *postfix = infixToPostfix(infix);
printf("Postfix expression : %s\n", postfix);
int n = strlen(postfix);
int count = 0;
char t1, t2;
char quad[50][4];
top = NULL;
for (int i = 0; i < n; i++)
{
if (isalnum(postfix[i]))
{
push(postfix[i]);
}
else
{
t1 = pop();
t2 = pop();
quad[count][0] = (count);
quad[count][1] = postfix[i];
quad[count][2] = t2;
quad[count][3] = t1;
push(count);
count++;
}
}
printf("Quadruple table :\n");
for (int i = 0; i < count; i++)
{
printf("%d) ", i);
printf("%c ", quad[i][1]);
printVal(quad[i][2]);
printVal(quad[i][3]);
printf("t%d\n", i + 1);
}
top = NULL;
int reg_count = 0;
for (int i = 0; i < count; i++)
{
if (isAlpha(quad[i][2]))
{
printf("MOV R%d, %c\n", reg_count, quad[i][2]);
push(reg_count);
reg_count++;
}
if (isAlpha(quad[i][3]))
{
printf("MOV R%d, %c\n", reg_count, quad[i][3]);
push(reg_count);
reg_count++;
}
printInstruction(quad[i][1]);
t1 = pop();
t2 = pop();
Output:
int main()
{
vector<vector<string>> vs;
int u;
cin >> u;
for (int i = 0; i < u + 1; i++)
{
string S, T;
getline(cin, S);
stringstream X(S);
vector<string> v1;
while (getline(X, T, ' '))
v1.push_back(T);
vs.push_back(v1);
}
vs.erase(vs.begin());
vector<int> buff(u, -1);
int i = 0;
for (auto &v : vs)
{
if (isdigit(v[1][0]))
{
int n = stoi(v[1]);
if (buff[n] != -1)
v[1] = "#" + to_string(buff[n]);
}
if (isdigit(v[2][0]))
{
int n = stoi(v[2]);
if (buff[n] != -1)
v[2] = "#" + to_string(buff[n]);
}
if ((v[1].rfind("#", 0) == 0) && (v[2].rfind("#", 0) == 0))
{
v[1].erase(v[1].begin());
v[2].erase(v[2].begin());
buff[i] = calc(stoi(v[1]), stoi(v[2]), v[3]);
cout << buff[i] << endl;
}
i++;
}
vector<vector<string>> cp;
for (int i = 0; i < u; i++)
{
if (buff[i] == -1)
{
cp.push_back(vs[i]);
}
}
for (auto v : cp)
{
for (auto v1 : v)
cout << v1 << " ";
cout << "\n";
}
return 0;
}
Output:
What is LLVM?
LLVM originally stood for "Low Level Virtual Machine", although the project no longer treats the
name as an acronym. The name also refers to the LLVM project, which is a collection of modular
and reusable compiler and toolchain technologies.
It is a compiler infrastructure designed to optimise code and generate high-performance
machine code. LLVM is an open-source project that is used in a variety of software,
including the LLVM tools themselves, the Clang C/C++ compiler, and the Swift programming language.
We will explore the role of LLVM in modern compiler design, the architecture of LLVM, the
LLVM intermediate representation (IR), and the benefits of using LLVM in compiler design.
LLVM has become an important tool for modern compiler design due to its powerful
optimization features and its ability to generate code for multiple architectures. LLVM
provides a flexible and modular framework that allows developers to easily add new
optimizations or target new architectures.
LLVM is also designed to work with other tools in the compiler toolchain: Clang acts as its
C/C++ front end and accepts most GCC command-line options, allowing for seamless integration
of different components in the compilation process.
THE OPTIMIZER:
The optimizer is the core component of LLVM and is responsible for performing a wide range
of optimizations on the LLVM IR. These optimizations include dead code elimination,
constant propagation, loop optimization, and many others. LLVM uses a modular
architecture that allows developers to easily add new optimizations or modify existing ones.
LLVM IR is designed to be easy to work with and understand. The syntax of LLVM IR is
simple and concise, and it is easy to generate LLVM IR from other languages. LLVM IR is
also designed to be easy to optimize.
LLVM IR includes a rich set of instructions that can be used to perform a wide range of
optimizations such as:
● loop optimization
● constant propagation
● and many others.
3.> LLVM is open-source and has a large and active community. This means that developers
can easily find help and support when using LLVM.
4.> LLVM is designed to work well with other tools in the compiler toolchain, such as Clang
or GCC. This means that compilers built with LLVM can easily integrate with other tools in
the toolchain.
5.> LLVM provides a flexible and modular framework that allows developers to easily add
new optimizations or target new architectures.
Conclusion
LLVM is a powerful and flexible compiler infrastructure that is used in a wide range of
software, including compilers for C/C++, Swift, and many others.
LLVM helps build new computer languages and improve existing languages. It automates
many of the difficult and unpleasant tasks involved in language creation, such as porting the
outputted code to multiple platforms and architectures.