CC Projectgroup 2

SUPERIOR UNIVERSITY LAHORE

Faculty of Computer Science & IT


Compiler Construction

Project Final Documentation

[C++ compiler]

Project Team
Student Name      Student ID     Program   Contact Number   Email Address
Muhammad Taha     Bcsm-f17-110   BSCS                       Bcsm-f17-[email protected]
Ahmar Awan        Bcsm-f17-098   BSCS                       Bcsm-f17-098@superior.edu.pk
Ahsen Ejaz        Bcsm-f17-314   BSCS                       Bcsm-f17-314@superior.edu.pk
Asad Ali Rana     Bcsm-s17-036   BSCS                       Bcsm-s17-036@superior.edu.pk
Acknowledgements

We are thankful to our lecturer, Miss Maryam, for her invaluable guidance, continuous encouragement and
constant support in making this project possible. We appreciate her guidance from the initial to the final stage,
which enabled us to develop a thorough understanding of this project. Without her advice and assistance, it
would have been much harder to complete this project.
Executive Summary
In this course we studied the compiler, a program that translates code written in a high-level language
into an equivalent low-level target language. Compilers are utility programs that take your code and
transform it into executable machine code files. When you run a compiler on your code, the
preprocessor first reads the source code (the C++ file you just wrote) and searches for
preprocessor directives (lines of code starting with a #). Preprocessor directives cause the preprocessor
to change your code in some way, usually by adding a library header or another C++ file. The compiler
then works through the preprocessed code and translates it into the appropriate machine
language instructions. This process also uncovers any syntax errors present in your source code and
reports them on the command line.
Table of Contents
Acknowledgements
Executive Summary
Table of Contents
1. Introduction
2. Scope of Project
3. Problem Statement
4. Proposed Solution
5. Existing System
6. Lexical Analysis Phase
   Lexical Analysis (Tokenization) Code
   Regular Expression
   Deterministic Finite Automata (DFA)
   Lexical Code (for implementing DFA)
Project Title: C++ Compiler

1. Introduction
The C and C++ programming languages are closely related. C++ grew out of C and was designed to be
source-and-link compatible with it, so it retains a great deal of C's functionality.
The C++ language also provides mechanisms for mixing code compiled by compatible C and C++
compilers in the same program. As a result, a C++ compiler accepts most C code, while a C compiler
rejects most C++ code.
The purpose of compatibility with C is to give C++ programs convenient access to the billions
of lines of existing C code in the world.
Although C and C++ code are almost compatible, there are still many incompatibilities or conflicts
between them. The conflicts can be of two types:
Incompatible C feature - valid as C code but not as C++ code.
Incompatible C++ feature - valid as C++ code but not as C code.
In this project we focus on a different domain: compatible C/C++ features, i.e. features of C code that
are also valid in C++. We aim to detect such snippets of code in the input program and report an error
if C code is detected. If no C code is detected, we check the program for minor errors.
The result is a mini compiler strictly for C++.
Scope of the Project
The purpose of this project is to design a convenient and easy-to-use compiler for the C++ language. In
addition to detecting C code inside C++ code, our mini C++ compiler will also be able to report the following
errors to the user:
Invalid variable name.
Invalid basic arithmetic expression.
Syntax errors in While loop.
Syntax errors in For loop.
Syntax errors in If-Then-Else.

Problem Statement

Compiler:
It is difficult to find the exact error in a large body of code.
Existing compilers often do not tell us the exact error.
We need to construct a mini C++ compiler.
It should strictly accept only C++ code.
It should report an error for any C code, even C code that a C++ compiler would accept.

Proposed Solution

The project is implemented in the following steps:

1. Read the given input.
2. Tokenize the input using Lex rules.
3. Parse using Yacc rules.
4. Run the algorithm described below.
Algorithm:
Goal: detect C code that a C++ compiler would normally compile successfully, and
accept small C++ programs.
Steps:
1. Detect header files (C programs generally include .h header files).
2. Detect C language functions and keywords that are compatible with C++ compilers.
3. If C code is detected, return ERROR and STOP. Otherwise go to step 4.
4. Check for errors such as:
   Invalid variable name.
   Invalid basic arithmetic expression.
   Syntax errors in While loop.
   Syntax errors in For loop.
   Syntax errors in If-Then-Else.
Lexical Analysis Phase
Lexical analysis, also called lexing or tokenization, is the process of converting a high-level input
program, read as a stream of characters, into a sequence of tokens. It is the first phase of the compiler,
and the component that performs it is known as the scanner. The resulting sequence of tokens is sent to
the parser for syntax analysis.

Lexical Analysis (Tokenization) Code

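%{
#include <stdio.h>
%}

%%

 /* An integer or a decimal number such as 12, .5 or 3.14 */
[0-9]+|[0-9]*\.[0-9]+ {printf("Number");}
 /* Arithmetic operators */
[\+\-\*\/\^] {printf("Operator");}
 /* Parentheses */
[()] {printf("Punctuation");}

%%

/* Tell the scanner that there is no further input after end of file. */
int yywrap(void)
{
return 1;
}

int main(void)
{
printf("Enter an Expression: ");
yylex();
return 0;
}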
Final Project

Regular Expression

letter_ ( letter_ | digit )*

Lexical Code (for implementing DFA)

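%{
#include <stdio.h>
/* Start conditions act as DFA states:
   A = a number has been read, D = a function name (log, sqrt, sin, cos, tan)
   has been read, M = "(" read after a function name, N = argument read,
   X = error. The remaining states track partially read function names. */
%}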

%s A B C D E F G H I J K L M N X

%%
<INITIAL>([-+]?[0-9]*[\.]?[0-9]+)+ BEGIN A;
<INITIAL>l BEGIN B;
<INITIAL>s BEGIN E;
<INITIAL>c BEGIN I;
<INITIAL>t BEGIN K;
<INITIAL>[^lsct0123456789\.\n] BEGIN X;
<INITIAL>\n BEGIN INITIAL; {printf("Accepted");}

<A>\^ BEGIN INITIAL;


<A>\+ BEGIN INITIAL;
<A>\- BEGIN INITIAL;
<A>\* BEGIN INITIAL;
<A>\/ BEGIN INITIAL;
<A>[^\+\-\*\/\^\n] BEGIN X;
<A>\n BEGIN INITIAL; {printf("Accepted");}

<B>o BEGIN C;
<B>[^o\n] BEGIN X;
<B>\n BEGIN INITIAL; {printf("Not Accepted");}

<C>g BEGIN D;
<C>[^g\n] BEGIN X;

<C>\n BEGIN INITIAL; {printf("Not Accepted");}

<E>q BEGIN F;
<E>i BEGIN H;
<E>[^qi\n] BEGIN X;
<E>\n BEGIN INITIAL; {printf("Not Accepted");}

<F>r BEGIN G;
<F>[^r\n] BEGIN X;
<F>\n BEGIN INITIAL; {printf("Not Accepted");}

<G>t BEGIN D;
<G>[^t\n] BEGIN X;
<G>\n BEGIN INITIAL; {printf("Not Accepted");}

<H>n BEGIN D;
<H>[^n\n] BEGIN X;
<H>\n BEGIN INITIAL; {printf("Not Accepted");}

<I>o BEGIN J;
<I>[^o\n] BEGIN X;
<I>\n BEGIN INITIAL; {printf("Not Accepted");}

<J>s BEGIN D;
<J>[^s\n] BEGIN X;
<J>\n BEGIN INITIAL; {printf("Not Accepted");}

<K>a BEGIN L;
<K>[^a\n] BEGIN X;
<K>\n BEGIN INITIAL; {printf("Not Accepted");}

<L>n BEGIN D;
<L>[^n\n] BEGIN X;
<L>\n BEGIN INITIAL; {printf("Not Accepted");}

<D>\( BEGIN M;
<D>[^\(\n] BEGIN X;
<D>\n BEGIN INITIAL; {printf("Not Accepted");}

<M>([-+]?[0-9]*[\.]?[0-9]+)+ BEGIN N;
<M>[^0123456789\.\n] BEGIN X;
<M>\n BEGIN INITIAL; {printf("Not Accepted");}

<N>\) BEGIN A;
<N>[^\)\n] BEGIN X;
<N>\n BEGIN INITIAL; {printf("Not Accepted");}

<X>[^\n] BEGIN X;
<X>\n BEGIN INITIAL; {printf("Invalid");}

%%

int yywrap()
{
return 1;
}

int main()
{
printf("Enter an expression: ");
yylex();
return 0;
}

Lex Program to Count the Total Number of Tokens

%{
#include <stdio.h>
int n = 0;   /* running count of tokens */
%}
  
%% 
  
"while"|"if"|"else" {n++;printf("\t keywords : %s", yytext);}  
  
"int"|"float" {n++;printf("\t keywords : %s", yytext);}   
  
[a-zA-Z_][a-zA-Z0-9_]* {n++;printf("\t identifier : %s", yytext);} 
  
"<="|"=="|"="|"++"|"-"|"*"|"+" {n++;printf("\t operator : %s", yytext);}
  
[(){}|,;]    {n++;printf("\t separator : %s", yytext);}
  
[0-9]*"."[0-9]+ {n++;printf("\t float : %s", yytext);}  
  
[0-9]+ {n++;printf("\t integer : %s", yytext);}                        

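 /* Ignore any other character */
.    ;
%%

/* No further input after end of file. */
int yywrap(void)
{
    return 1;
}

int main(void)
{
    yylex();
    printf("\n total no. of token = %d\n", n);
    return 0;
}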

Lex Program to Count the Number of Words


%{
#include<stdio.h>
#include<string.h>
int i = 0;
%}
  
/* Rules Section */
%%
 /* Count each run of letters and digits as one word */
[a-zA-Z0-9]+    {i++;}

"\n" {printf("%d\n", i); i = 0;}
%%

int yywrap(void) { return 1; }
  
int main()
{   
    // The function that starts the analysis
    yylex();
  
    return 0;
}

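Lex Program (C++) to Square Integers or Multiply Them by a Mode

%option noyywrap
%{
#include <iostream>  // for I/O
#include <cstdlib>   // for atoi
#include <cstring>   // for string handling
#include <cctype>    // for character predicates
using namespace std;
int x, product, mode, sq; // mode must be global
%}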

%%

[-+]?[0-9]+ {
cout << "*** " << yytext << " is an integer.\n";
x = atoi(yytext);
product = x * mode;
sq = x * x;
if (mode == 1)
cout << "Its square is " << sq << ".\n";
else
cout << "Its product with your mode is "
<< product << ".\n"; }

exit|quit {
cout << "\nBye\n";
return 0; }
[\t ]+ cout << " ";
\n cout << endl;
. cout << yytext;

%%

int main(int argc, char *argv[]) {


mode = 1;
if (argc==2 && isdigit(argv[1][0]) ) {
mode = argv[1][0] - '0';
cout << "From the command line, your mode is " << mode << ".\n";
if (strlen(argv[1]) > 1 )
cout << "Characters after the first are ignored.\n"
<< "Ignoring: \"" << ++argv[1] << "\".\n";
}
yylex();
return 0;
}

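Lexical Analyzer Source Code:

%{
  #include <stdio.h>
  #include <stdlib.h>   /* for atoi */
  #include "y.tab.h"

  extern int yylval;
%}

%%

[0-9]+ {
          yylval = atoi(yytext);
          return NUMBER;
       }

[ \t] ;   /* skip spaces and tabs */

[\n] return 0;   /* a newline ends the expression */

. return yytext[0];   /* operators and brackets are their own tokens */

%%

int yywrap(void)
{
 return 1;
}
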

Parser Source Code:

%{
   /* Definition section */
  #include <stdio.h>

  int yylex(void);
  void yyerror(const char *msg);

  int flag = 0;   /* set to 1 by yyerror() on a syntax error */
%}

%token NUMBER

/* Lowest precedence first; all operators are left associative. */
%left '+' '-'
%left '*' '/' '%'
%left '(' ')'

/* Rule Section */
%%

ArithmeticExpression: E {
         printf("\nResult=%d\n", $1);
         return 0;
        };

 E: E '+' E   { $$ = $1 + $3; }
  | E '-' E   { $$ = $1 - $3; }
  | E '*' E   { $$ = $1 * $3; }
  | E '/' E   { $$ = $1 / $3; }
  | E '%' E   { $$ = $1 % $3; }
  | '(' E ')' { $$ = $2; }
  | NUMBER    { $$ = $1; }
  ;

%%

/* Driver code */
int main(void)
{
   printf("\nEnter any arithmetic expression, which can use addition, "
          "subtraction, multiplication, division, modulus and round brackets:\n");

   yyparse();
   if (flag == 0)
      printf("\nEntered arithmetic expression is Valid\n\n");
   return 0;
}

/* Called by yyparse() when the input does not match the grammar. */
void yyerror(const char *msg)
{
   (void)msg;
   printf("\nEntered arithmetic expression is Invalid\n\n");
   flag = 1;
}


2. Syntax Analysis Phase


Parsing, also called syntax analysis or syntactic analysis, is the process of analyzing the syntactic
structure of the given input. It checks whether the input follows the syntax of the
programming language in which it is written. Syntax analysis is the
second phase of the compiler and comes after lexical analysis.

Context Free Grammar (CFG)


E -> E+T
E -> E-T
E -> T
T -> T*F
T -> T/F
T -> F
F -> F^G
F -> G
G -> H
H -> number

This grammar is left recursive, which a top-down parser cannot handle. After eliminating left
recursion by rewriting it as a right-recursive grammar, it becomes:
E -> TE’
E’ -> ε|+TE’|-TE’
T -> FT’
T’ -> ε|*FT’|/FT’
F -> GF’
F’ -> ε|^GF’
G -> H
H -> number

Operator Precedence Parser


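An operator precedence parser is a bottom-up parser that interprets an operator grammar, i.e. a
grammar with no ε-productions and no two adjacent nonterminals on any right-hand side.
This parser is only used for operator grammars. It decides between shifting and reducing by
comparing the precedence of operators, so it can also handle ambiguous operator grammars, which
most other parsing methods do not accept, and it does not need left recursion to be removed.
So, we will use our original left-recursive CFG in this method, extended with function calls:
E -> E+T|E-T|T
T -> T*F|T/F|F
F -> F^G|G
G -> sqrt(H)|log(H)|sin(H)|cos(H)|tan(H)|H
H -> number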

Operator Precedence Table (Operator Relation Table)

Rows give the operator on top of the stack; columns give the next input symbol.

     ^    /    *    -    +    $
^    ·>   ·>   ·>   ·>   ·>   ·>
/    <·   ·>   ·>   ·>   ·>   ·>
*    <·   ·>   ·>   ·>   ·>   ·>
-    <·   <·   <·   ·>   ·>   ·>
+    <·   <·   <·   ·>   ·>   ·>
$    <·   <·   <·   <·   <·   accept

Stack Implementation

CFG:
E -> E+T|E-T|T
T -> T*F|T/F|F
F -> F^G|G
G -> sqrt(H)|log(H)|sin(H)|cos(H)|tan(H)|H
H -> number

Context Free Grammar (CFG)

E -> E+L
E -> E-L
E -> L
L -> L*F
L -> L/F
L -> F
F -> F^G
F -> G
G -> H
H -> number

This grammar is left recursive, which a top-down parser cannot handle. After eliminating left
recursion by rewriting it as a right-recursive grammar, it becomes:
E -> LE’
E’ -> ε|+LE’|-LE’
L -> FL’
L’ -> ε|*FL’|/FL’
F -> GF’
F’ -> ε|^GF’
G -> H
H -> number

Operator Precedence Parser

An operator precedence parser is a bottom-up parser that interprets an operator grammar, i.e. a
grammar with no ε-productions and no two adjacent nonterminals on any right-hand side.
This parser is only used for operator grammars. It decides between shifting and reducing by
comparing the precedence of operators, so it can also handle ambiguous operator grammars, which
most other parsing methods do not accept, and it does not need left recursion to be removed.
So, we will use our original left-recursive CFG in this method, extended with function calls:
E -> E+L|E-L|L
L -> L*F|L/F|F
F -> F^G|G
G -> sqrt(H)|log(H)|sin(H)|cos(H)|tan(H)|H
H -> number

Operator Precedence Table (Operator Relation Table)

Rows give the operator on top of the stack; columns give the next input symbol.

     ^    /    *    -    +    $
^    ·>   ·>   ·>   ·>   ·>   ·>
/    <·   ·>   ·>   ·>   ·>   ·>
*    <·   ·>   ·>   ·>   ·>   ·>
-    <·   <·   <·   ·>   ·>   ·>
+    <·   <·   <·   ·>   ·>   ·>
$    <·   <·   <·   <·   <·   accept

Stack Implementation

CFG:
E -> E+L|E-L|L
L -> L*F|L/F|F
F -> F^G|G
G -> sqrt(H)|log(H)|sin(H)|cos(H)|tan(H)|H
H -> number

3. Semantic Analysis Phase


Semantic analysis, or context-sensitive analysis, is a process in compiler construction, usually
performed after parsing, that gathers the necessary semantic information from the source code and
ensures that the declarations and statements of a program are semantically correct, i.e. it
verifies whether the parse tree is meaningful.
Synthesized Attribute (S attribute)
A synthesized attribute of a nonterminal is computed only from the attributes of its children in the
parse tree. In each rule below, the value on the left of the assignment belongs to the head of the
production and the values on the right belong to the symbols of its body.
E -> E+T {E.value = E.value + T.value;}
E -> E-T {E.value = E.value - T.value;}
E -> T {E.value = T.value;}
T -> T*F {T.value = T.value * F.value;}
T -> T/F {T.value = T.value / F.value;}
T -> F {T.value = F.value;}
F -> F^G {F.value = pow(F.value, G.value);}
F -> G {F.value = G.value;}
G -> sqrt(H) {G.value = sqrt(H.value);}
G -> log(H) {G.value = log(H.value);}
G -> sin(H) {G.value = sin(H.value);}
G -> cos(H) {G.value = cos(H.value);}
G -> tan(H) {G.value = tan(H.value);}
G -> H {G.value = H.value;}
H -> number {H.value = number.value;}


Conclusion

A compiler operates in several phases, and each phase transforms the source program from one
representation to another. Every phase takes its input from the previous phase and feeds its
output to the next phase of the compiler. Together, these phases process the source code by
dividing it into tokens, creating parse trees, and optimizing the code in later
phases. The front end includes all the analysis phases and needs a considerable amount of space
to store tokens and trees.
