Compiler Design Lab Manual
This lab manual is intended to aid third-year undergraduate computer science and
engineering students in their course Compiler Design Lab [B20CS3108].
Preface
A compiler is a program that takes as input a program written in a high-level language
and translates it into an equivalent program in a low-level language. Studying compilers
shows how real-world applications work and how they are designed. It provides both the
theoretical and the practical knowledge needed to implement a programming language, and
it gives a deeper understanding of a language so that the language can be used more
effectively. Sometimes just using a compiler is not enough, and the compiler itself has
to be optimized for the application.
Compilers have a general structure that carries over to many other applications, from
debuggers and simulators to 3D applications, browsers, and command shells.
Understanding how compilers work makes these other systems much easier to understand.
Simply using a tool is usually enough when everything goes as expected, but when
something goes wrong, only a real understanding of its inner workings will help to fix
it. Compilers are elaborate, sophisticated systems, and having written one yourself is
strong evidence of your capabilities as a programmer. Every computer scientist benefits
from knowing compilers in addition to domain and technical knowledge.
The Compiler Design lab provides a deep understanding of how the syntax and semantics of
a programming language are used in translation to machine equivalents, along with
knowledge of compiler generation tools such as JLex, LEX, and YACC.
This manual helps students prepare for the Compiler Design Lab. It gives them an
opportunity to learn the phases of a compiler and apply them to a source program.
B N V NARASIMHA RAJU
K V S S R MURTHY
CH RAGHURAM
K HARI KRISHNA
L V SRINIVAS
Compiler Design Lab Evaluation Scheme
Examination          Marks
Exercise Programs    5
Record               5
Internal Exam        5
External Exam        35
Compiler Design Lab
Course Objectives
Course Outcomes
List of Experiments
1. A. Write a program to construct DFA from the given regular expression and test
whether the given string is accepted or not
B. Write a program to construct NFA from the given regular expression and test
whether the given string is accepted or not
2. A. Write a program for a lexical analyzer that reads if, for, and while statements,
separates them into characters, and then groups the characters to form tokens.
B. Write a program for a lexical analyzer that recognizes the identifiers, constants,
operators, and keywords of the mini language.
C. Write a program for a lexical analyzer that reads an expression and identifies the
tokens (variables, constants, and operators) in it.
5. A. Implement the lexical analyzer using JLex, flex, lex, or another lexical analyzer
generating tool.
B. Write a LEX specification program for the tokens of the C language.
C. Write a YACC program to implement a calculator and find the value of an arithmetic
expression.
Session #1
Construction of DFA and NFA
Learning Objective
To construct DFA and NFA from the given regular expression and test whether the given
string is accepted or not
Learning Context
Finite Automata
Finite automata are characterized by having no temporary storage. Since an input tape
cannot be rewritten, a finite automaton is severely limited in its capacity to "remember"
things during the computation. A finite amount of information can be retained in the
finite control by placing the unit into a specific state. But since the number of such
states is finite, a finite automaton can only deal with situations in which the information
to be stored at any time is strictly bounded. Finite automata consist of a finite set of
states and a set of transitions from one state to another state that occur after applying
input symbols from the alphabet Σ.
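The first type of automaton we study in detail is the finite automaton that is deterministic
in its operation. In common with all automata, a deterministic finite automaton (DFA) has
internal states, rules for transitions from one state to another, some input, and ways of
making decisions. All of these are incorporated in the following definition.
A deterministic finite automaton is a quintuple M = (Q, Σ, δ, q0, F), where Q is a finite
set of internal states, Σ is a finite input alphabet, δ: Q × Σ → Q is the transition
function, q0 ∈ Q is the initial state, and F ⊆ Q is the set of final states.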
A deterministic finite automaton operates in the following manner. At the initial time, it
is assumed to be in the initial state q0, with its input mechanism on the leftmost symbol
of the input string. During each move of the automaton, the input mechanism advances
one position to the right, so each move consumes one input symbol. When the end of the
string is reached, the string is accepted if the automaton is in one of its final states.
Otherwise, the string is rejected. The input mechanism can move only from left to right
and reads exactly one symbol on each step. The transitions from one internal state to
another are governed by the transition function δ. A string x is said to be accepted by a
finite automaton M = (Q, Σ, δ, q0, F) if δ*(q0, x) = p for some p in F, where δ* is δ
extended from single symbols to strings.
Note that there are three major differences between the definition of a nondeterministic
finite automaton (NFA) and the definition of a DFA. First, in an NFA the range of δ is the
powerset 2^Q, so that its value is not a single element of Q but a subset of it. This
subset defines the set of all possible states that can be reached by the transition.
Second, the empty string λ is allowed as the second argument of δ, so an NFA can make a
transition without consuming an input symbol. Finally, in an NFA, the set δ(qi, a) may be
empty, meaning
that there is no transition defined for this specific situation. A string is accepted by an
NFA if there is some sequence of possible moves that will put the machine in a final state
at the end of the string. A string is rejected (that is, not accepted) only if there is no
possible sequence of moves by which a final state can be reached.
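The acceptance rule for an NFA can be sketched by tracking the set of all states the machine could be in after each symbol. The block below is a minimal illustration, not the manual's solution: it assumes a hypothetical four-state NFA for (a+b)*abb with no λ-transitions, and the names `delta` and `nfa_accepts` are chosen only for this example. Each set of states is stored as a bitmask.

```c
#define NUM_STATES 4   /* hypothetical NFA with states q0..q3; q3 is final */

/* delta[q][s] is the set of next states as a bitmask (bit i = state qi).
   Symbol index 0 stands for 'a' and 1 for 'b'; 0x0 is the empty set. */
static const unsigned delta[NUM_STATES][2] = {
    /* q0 */ { 0x3 /* {q0,q1} */, 0x1 /* {q0} */ },
    /* q1 */ { 0x0,               0x4 /* {q2} */ },
    /* q2 */ { 0x0,               0x8 /* {q3} */ },
    /* q3 */ { 0x0,               0x0 }
};

/* Returns 1 if some sequence of moves ends in the final state, else 0. */
int nfa_accepts(const char *s)
{
    unsigned current = 0x1;                 /* start in the set {q0} */
    for (; *s != '\0'; s++) {
        unsigned next = 0;
        int q, sym;
        if (*s == 'a')
            sym = 0;
        else if (*s == 'b')
            sym = 1;
        else
            return 0;                       /* symbol outside the alphabet */
        /* Take the union of delta(q, symbol) over every state q we may be in. */
        for (q = 0; q < NUM_STATES; q++)
            if (current & (1u << q))
                next |= delta[q][sym];
        current = next;                     /* may become the empty set */
    }
    return (current & 0x8) != 0;            /* accepted if q3 is reachable */
}
```

For example, `nfa_accepts("aabb")` returns 1 and `nfa_accepts("ab")` returns 0. Supporting λ-transitions would require taking a λ-closure of `current` after each step, which this sketch leaves out.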
Regular Expressions
The languages accepted by finite automata are easily described by simple expressions
called regular expressions.
Exercises
1. A. Write a program to construct DFA from the given regular expression and test
whether the given string is accepted or not.
B. Write a program to construct NFA from the given regular expression and test
whether the given string is accepted or not.
Solutions
1A. Write a program to construct DFA from the given regular expression and test
whether the given string is accepted or not
The following C program implements a DFA for the regular expression
(a+aa*b)*
PROGRAM
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
    /* table[state][symbol]: column 0 is 'a', column 1 is 'b'; -1 is the dead state */
    int table[2][2], i, l, status = 0;
    char input[100];
    printf("Implementing DFA of language (a+aa*b)*\nEnter Input String:");
    table[0][0] = 1;
    table[0][1] = -1;
    table[1][0] = 1;
    table[1][1] = 0;
    scanf("%99s", input);
    l = (int)strlen(input);
    for (i = 0; i < l; i++) {
        if (input[i] != 'a' && input[i] != 'b') {
            printf("The entered Value is wrong");
            exit(0);
        }
        if (input[i] == 'a')
            status = table[status][0];
        else
            status = table[status][1];
        if (status == -1) {
            printf("String not Accepted");
            break;
        }
    }
    /* States 0 and 1 are both final, so the string is accepted
       whenever the dead state was never reached */
    if (i == l)
        printf("String Accepted");
    return 0;
}
Output:
Run 1:
Implementing DFA of language (a+aa*b)*
Enter Input String:cbsd
The entered Value is wrong
Run 2:
Implementing DFA of language (a+aa*b)*
Enter Input String:abbababa
String not Accepted
Run 3:
Implementing DFA of language (a+aa*b)*
Enter Input String:babbaab
String not Accepted
1B. Write a program to construct NFA from the given regular expression and test
whether the given string is accepted or not.
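The following C program tests strings against the regular expression
(a+b)*abb
PROGRAM
#include <stdio.h>
#include <string.h>
/* Accepts strings over {a, b} that end in "abb". The program tracks a single
   current state (P, Q, R, S or T); S is the only final state. */
int main(void)
{
    char str[100];
    char state = 'P';
    int i = 0;
    printf("Enter input string: \n");
    scanf("%99s", str);
    while (str[i] != '\0')
    {
        if (str[i] != 'a' && str[i] != 'b')
        {
            /* symbol outside the alphabet {a, b} */
            printf("String not Accepted\n");
            return 0;
        }
        switch (state)
        {
        case 'P':
            if (str[i] == 'a') state = 'Q';
            else state = 'P';
            break;
        case 'Q':
            if (str[i] == 'b') state = 'R';
            else state = 'T';
            break;
        case 'R':
            if (str[i] == 'b') state = 'S';
            else state = 'T';
            break;
        case 'S':
            if (str[i] == 'b') state = 'P';
            else state = 'T';
            break;
        case 'T':
            if (str[i] == 'b') state = 'R';
            else state = 'T';
            break;
        }
        i++;
    }
    if (state == 'S')
    {
        printf("String Accepted\n");
    }
    else
    {
        printf("String not Accepted\n");
    }
    return 0;
}
OUTPUT
Enter input string:
aabb
String Accepted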
Session #2
Learning Objective
To implement lexical analyzer to identify and generate tokens from the source program
Learning Context
The lexical analyzer reads the input characters of the source program and produces a
sequence of tokens as output. These tokens are sent to the parser for syntax analysis.
Whenever the lexical analyzer finds an identifier, it enters that identifier into the symbol
table. It also consults the symbol table to find out the kind of identifier, which helps in
determining the proper token. These interactions are shown in Fig. 1. The parser
calls the lexical analyzer with the getNextToken command; the lexical analyzer then reads
characters from its input until it identifies the next token, which it returns to the
parser.
The lexical analyzer also performs other tasks, such as stripping out comments and
whitespace and correlating the error messages generated by the compiler with the source
program. Sometimes, lexical analyzers are divided into a cascade of two processes:
• Scanning, which consists of simple processes such as deleting comments and compacting
consecutive whitespace characters into one.
• Lexical analysis, which produces tokens from the output of the scanner.
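The scanning stage described above can be sketched as a single pass over the text. This is an illustrative sketch under stated assumptions, not part of the manual's solutions: it handles only C-style `/* ... */` comments, treats a comment like whitespace, and the names `put_space` and `strip_and_compact` are hypothetical. The caller is assumed to supply an output buffer at least as large as the input.

```c
#include <ctype.h>
#include <stddef.h>

/* Appends one space to dst unless dst is empty or already ends in a space.
   Returns the new length of dst. */
static size_t put_space(char *dst, size_t j)
{
    if (j > 0 && dst[j - 1] != ' ')
        dst[j++] = ' ';
    return j;
}

/* Copies src to dst, dropping block comments and compacting every run of
   whitespace (or comments) into a single space. */
void strip_and_compact(const char *src, char *dst)
{
    size_t i = 0, j = 0;
    while (src[i] != '\0') {
        if (src[i] == '/' && src[i + 1] == '*') {
            i += 2;                                   /* skip the opener */
            while (src[i] != '\0' && !(src[i] == '*' && src[i + 1] == '/'))
                i++;
            if (src[i] != '\0')
                i += 2;                               /* skip the closer */
            j = put_space(dst, j);
        } else if (isspace((unsigned char)src[i])) {
            while (isspace((unsigned char)src[i]))
                i++;
            j = put_space(dst, j);
        } else {
            dst[j++] = src[i++];                      /* ordinary character */
        }
    }
    dst[j] = '\0';
}
```

The output of this stage would then be handed to the token-producing stage, which no longer needs to deal with comments or runs of blanks.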
Exercise
2. A. Write a program for a lexical analyzer that reads if, for, and while statements,
separates them into characters, and then groups the characters to form tokens.
B. Write a program for a lexical analyzer that recognizes the identifiers, constants,
operators, and keywords of the mini language.
C. Write a program for a lexical analyzer that reads an expression and identifies the
tokens (variables, constants, and operators) in it.
Solution
2A. Write a Program for lexical analyzer to read if, for, while statements and separate
them to characters, and then group them to form the tokens.
PROGRAM
#include <stdio.h>
#include <string.h>
/* Returns 1 if ch may follow a keyword: '(' or a space. */
int isdel(char ch)
{
    if (ch == '(' || ch == ' ')
        return 1;
    return 0;
}
int main(void)
{
    int count = 0;
    char s[50];
    size_t i = 0;
    printf("\n Enter the string:");
    /* fgets keeps embedded spaces, which scanf("%s") would stop at */
    if (fgets(s, sizeof s, stdin) == NULL)
        return 1;
    while (i < strlen(s))
    {
        if (strncmp(&s[i], "if", 2) == 0 && isdel(s[i + 2]))
        {
            printf("if\n");
            i = i + 3;
            count++;
        }
        else if (strncmp(&s[i], "for", 3) == 0 && isdel(s[i + 3]))
        {
            printf("for\n");
            i = i + 4;
            count++;
        }
        else if (strncmp(&s[i], "while", 5) == 0 && isdel(s[i + 5]))
        {
            printf("while\n");
            i = i + 6;
            count++;
        }
        else
        {
            i++;
        }
    }
    printf("total number of keywords= %d", count);
    return 0;
}
OUTPUT
 Enter the string:if(a==0)
if
total number of keywords= 1
2B. Write a program for a lexical analyzer that recognizes the identifiers, constants,
operators, and keywords of the mini language.
2C. Write a program for a lexical analyzer that reads an expression and identifies the
tokens (variables, constants, and operators) in it.
PROGRAM
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
// Returns 'true' if the character is a DELIMITER.
bool isDelimiter(char ch)
{
if (ch == ' ' || ch == '+' || ch == '-' || ch == '*' ||
ch == '/' || ch == ',' || ch == ';' || ch == '>' ||
ch == '<' || ch == '=' || ch == '(' || ch == ')' ||
ch == '[' || ch == ']' || ch == '{' || ch == '}')
return (true);
return (false);
}
// Returns 'true' if the character is an OPERATOR.
bool isOperator(char ch)
{
if (ch == '+' || ch == '-' || ch == '*' ||
ch == '/' || ch == '>' || ch == '<' ||
ch == '=')
return (true);
return (false);
}
// Returns 'true' if the string is a VALID IDENTIFIER.
bool validIdentifier(char* str)
{
if (str[0] == '0' || str[0] == '1' || str[0] == '2' ||
str[0] == '3' || str[0] == '4' || str[0] == '5' ||
str[0] == '6' || str[0] == '7' || str[0] == '8' ||
str[0] == '9' || isDelimiter(str[0]) == true)
return (false);
return (true);
}
// Returns 'true' if the string is an INTEGER.
bool isInteger(char* str)
{
int i, len = strlen(str);
if (len == 0)
return (false);
for (i = 0; i < len; i++) {
if (str[i] != '0' && str[i] != '1' && str[i] != '2'
&& str[i] != '3' && str[i] != '4' && str[i] != '5'
&& str[i] != '6' && str[i] != '7' && str[i] != '8'
&& str[i] != '9' || (str[i] == '-' && i > 0))
return (false);
}
return (true);
}
// Returns 'true' if the string is a REAL NUMBER.
bool isRealNumber(char* str)
{
int i, len = strlen(str);
bool hasDecimal = false;
if (len == 0)
return (false);
for (i = 0; i < len; i++) {
if (str[i] != '0' && str[i] != '1' && str[i] != '2'
&& str[i] != '3' && str[i] != '4' && str[i] != '5'
&& str[i] != '6' && str[i] != '7' && str[i] != '8'
&& str[i] != '9' && str[i] != '.' ||
(str[i] == '-' && i > 0))
return (false);
if (str[i] == '.')
hasDecimal = true;
}
return (hasDecimal);
}
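// Extracts the SUBSTRING str[left..right] into a newly allocated string.
char* subString(char* str, int left, int right)
{
int i;
char* subStr = (char*)malloc(
sizeof(char) * (right - left + 2));
for (i = left; i <= right; i++)
subStr[i - left] = str[i];
subStr[right - left + 1] = '\0';
return (subStr);
}
// Parses the input STRING and prints each token with its kind.
void parse(char* str)
{
int left = 0, right = 0;
int len = strlen(str);
while (right <= len && left <= right) {
if (isDelimiter(str[right]) == false)
right++;
if (isDelimiter(str[right]) == true && left == right) {
if (isOperator(str[right]) == true)
printf("'%c' IS AN OPERATOR\n", str[right]);
right++;
left = right;
} else if ((isDelimiter(str[right]) == true && left != right)
|| (right == len && left != right)) {
char* subStr = subString(str, left, right - 1);
if (isInteger(subStr) == true)
printf("'%s' IS AN INTEGER\n", subStr);
else if (isRealNumber(subStr) == true)
printf("'%s' IS A REAL NUMBER\n", subStr);
else if (validIdentifier(subStr) == true
&& isDelimiter(str[right - 1]) == false)
printf("'%s' IS A VALID IDENTIFIER\n", subStr);
else if (validIdentifier(subStr) == false
&& isDelimiter(str[right - 1]) == false)
printf("'%s' IS NOT A VALID IDENTIFIER\n",
subStr);
free(subStr);
left = right;
}
}
return;
}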
// DRIVER FUNCTION
int main()
{
// maximum length of string is 100 here
char str[100] = "a = 10";
parse(str); // calling the parse function
return (0);
}
Output:
'a' IS A VALID IDENTIFIER
'=' IS AN OPERATOR
'10' IS AN INTEGER
Session #3
Implementation of Parsing Techniques
Learning Objective
To identify valid expressions and implement various parsing techniques like Shift reduce,
recursive descent and predictive parsing.
Learning Context
A parser for a grammar G is a program that takes an input string w and produces a parse
tree as output if w is generated by G; otherwise, it produces an error message. The parse
tree is often produced only in a figurative sense. In reality, it exists as the sequence
of actions made by the parser.
The parser obtains a string of tokens from the lexical analyzer, as shown in Fig.
1, and verifies that the string of token names can be generated by the grammar of the
source language. The parser must report any syntax errors, and it must recover from
common errors so that it can process the rest of the program. For well-formed programs,
the parser constructs a parse tree and passes it to the rest of the compiler for further
processing. In fact, the parse tree need not be constructed explicitly, because checking
and translation actions can be mixed with parsing. Thus, the parser and the rest of the
front end could well be implemented by a single module.
There are three general types of parsers: universal, top-down, and bottom-up.
The methods commonly used in compilers are top-down and bottom-up. Top-down
methods build parse trees from the top (root) to the bottom (leaves), while bottom-up
methods start from the leaves and work their way up to the root. In both top-down
and bottom-up parsers, the input is scanned from left to right, one symbol at a time.
The most efficient top-down and bottom-up methods work only for subclasses of
grammars. Here, we assume that the output of the parser is the parse tree. In practice,
a number of tasks can be carried out during parsing, such as collecting information about
various tokens, performing type checking and other kinds of semantic analysis, and
generating intermediate code. We have combined all of these activities into the "rest of
the front end" in Fig. 1.
Exercise
3 A. Write a parsing program to identify whether the given expression is valid or
not.
B. Write a program to implement shift reduce parser for the simple CFG.
C. Implement recursive descent parser by creating a separate function for each
variable from the given CFG.
D. Write a program to determine FIRST sets for all variables and terminals from
the given CFG.
E. Write a program to determine FOLLOW sets for all variables from the given
CFG.
F. Write a program that takes a predictive parsing table as input and determines
whether the input string is accepted or not.
Solutions
3A. Write a parsing program to identify whether the given expression is valid or not.
PROGRAM
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include<stdbool.h>
#include<ctype.h>
void main()
{
int i,j,k,l,s;
int a[50];
char ch[50];
printf("enter expression:");
scanf("%s",ch);
l=strlen(ch);
for(i=0;i<l;i++)
{
if(isalpha(ch[i]) && isalpha(ch[i+1]) || isalnum(ch[i]) && ch[i-1]==')' &&
isalnum(ch[i]) && ch[i+1]=='(')
{
s=0;
break;
}
if(isalnum(ch[k-1]) && isalnum(ch[k+1]) || ch[k-1]==')' || ch[k+1]=='(')
{
s=1;
}
else
{
s=0;
}
}
}
if(s==1)
{
printf("valid");
}
else
{
printf("invalid");
}
}
OUTPUT
Valid
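As a point of comparison for this exercise, the validity test can be sketched as a single left-to-right scan that tracks whether an operand or an operator is expected next. This is a hedged, minimal sketch and not the listing above: it assumes single-character operands (letters or digits), the binary operators + - * /, and parentheses, and the name `is_valid_expression` is hypothetical.

```c
#include <ctype.h>

/* Returns 1 if expr is a well-formed infix expression under the
   assumptions stated above, and 0 otherwise. */
int is_valid_expression(const char *expr)
{
    int depth = 0;           /* number of '(' not yet closed */
    int expect_operand = 1;  /* 1 when an operand or '(' must come next */
    const char *p;
    for (p = expr; *p != '\0'; p++) {
        if (isalnum((unsigned char)*p)) {
            if (!expect_operand)
                return 0;                    /* two operands in a row */
            expect_operand = 0;
        } else if (*p == '(') {
            if (!expect_operand)
                return 0;                    /* operand directly before '(' */
            depth++;
        } else if (*p == ')') {
            if (expect_operand || depth == 0)
                return 0;                    /* "()" , "+)" or unmatched ')' */
            depth--;
        } else if (*p == '+' || *p == '-' || *p == '*' || *p == '/') {
            if (expect_operand)
                return 0;                    /* operator without left operand */
            expect_operand = 1;
        } else {
            return 0;                        /* unknown character */
        }
    }
    /* Valid only if every '(' was closed and the last item was an operand. */
    return depth == 0 && !expect_operand;
}
```

For example, `is_valid_expression("(a+b)*c")` returns 1, while `"a+"`, `"ab+c"`, and `"(a+b"` are rejected.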
3B. Write a program to implement shift reduce parser for the simple CFG.
PROGRAM
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
//Global Variables
int z = 0, i = 0, j = 0, c = 0;
// Modify array size to increase
// length of string to be parsed
char a[16], ac[20], stk[15], act[10];
//printing action
printf("\n$%s\t%s$\t", stk, a);
}
}
return ; //return to main
}
//Driver Function
int main()
{
printf("GRAMMAR is -\nE->2E2 \nE->3E3 \nE->4\n");
// a is input string
strcpy(a,"32423");
// Moving the pointer
a[j]=' ';
// Printing action
printf("\n$%s\t%s$\t", stk, a);
Output
GRAMMAR is -
E->2E2
E->3E3
E->4
stack input action
$ 32423$ SHIFT
$3 2423$ SHIFT
$3E 3$ SHIFT
$E $ Accept
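The shift and reduce steps for the grammar E->2E2, E->3E3, E->4 can be sketched compactly as follows. This is a minimal illustration rather than the listing above: the function names `reduce` and `sr_accepts` and the 64-character stack limit are assumptions made for this example, and the sketch only reports acceptance instead of printing the action trace.

```c
#include <string.h>

/* Tries to reduce a handle on top of the stack.
   Returns 1 if a production was applied, 0 otherwise. */
static int reduce(char *stack, int *top)
{
    int n = *top;
    if (n >= 1 && stack[n - 1] == '4') {              /* E -> 4 */
        stack[n - 1] = 'E';
        return 1;
    }
    if (n >= 3 && stack[n - 2] == 'E'
        && stack[n - 3] == stack[n - 1]
        && (stack[n - 1] == '2' || stack[n - 1] == '3')) {
        stack[n - 3] = 'E';                           /* E -> 2E2 or E -> 3E3 */
        *top = n - 2;
        return 1;
    }
    return 0;
}

/* Returns 1 if input can be reduced to the single symbol E, else 0. */
int sr_accepts(const char *input)
{
    char stack[64];
    int top = 0;
    size_t i;
    if (strlen(input) >= sizeof stack)
        return 0;                                     /* too long for this sketch */
    for (i = 0; input[i] != '\0'; i++) {
        stack[top++] = input[i];                      /* SHIFT */
        while (reduce(stack, &top))                   /* REDUCE while possible */
            ;
    }
    return top == 1 && stack[0] == 'E';               /* ACCEPT */
}
```

With the input 32423 the stack evolves as 3, 32, 324, 32E, 32E2, 3E, 3E3, E, so `sr_accepts("32423")` returns 1.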
3C. Implement recursive descent parser by creating a separate function for each variable
from the given CFG.
PROGRAM
#include<stdio.h>
#include<string.h>
int E(),Edash(),T(),Tdash(),F();
char *ip;
char string[50];
int main()
{
printf("Enter the string\n");
scanf("%s",string);
ip=string;
printf("\n\nInput\tAction\n--------------------------------\n");
if(E() && *ip=='\0'){ /* whole input must be consumed */
printf("\n--------------------------------\n");
printf("\n String is successfully parsed\n");
}
else{
printf("\n--------------------------------\n");
printf("Error in parsing String\n");
}
}
int E()
{
printf("%s\tE->TE' \n",ip);
if(T())
{
if(Edash())
{
return 1;
}
else
return 0;
}
else
return 0;
}
int Edash()
{
if(*ip=='+')
{
printf("%s\tE'->+TE' \n",ip);
ip++;
if(T())
{
if(Edash())
{
return 1;
}
else
return 0;
}
else
return 0;
}
else
{
printf("%s\tE'->^ \n",ip);
return 1;
}
}
int T()
{
printf("%s\tT->FT' \n",ip);
if(F())
{
if(Tdash())
{
return 1;
}
else
return 0;
}
else
return 0;
}
int Tdash()
{
if(*ip=='*')
{
printf("%s\tT'->*FT' \n",ip);
ip++;
if(F())
{
if(Tdash())
{
return 1;
}
else
return 0;
}
else
return 0;
}
else
{
printf("%s\tT'->^ \n",ip);
return 1;
}
}
int F()
{
if(*ip=='(')
{
printf("%s\tF->(E) \n",ip);
ip++;
if(E())
{
if(*ip==')')
{
ip++;
return 1; /* matched ( E ) */
}
else
return 0;
}
else
return 0;
}
else if(*ip=='i')
{
printf("%s\tF->id \n",ip);
ip++;
return 1;
}
else
return 0;
}
Output
Enter the string
Input Action
--------------------------------
E->TE'
T->FT'
--------------------------------
Error in parsing String
3D. Write a program to determine FIRST sets for all variables and terminals from the
given CFG.
PROGRAM
#include<stdio.h>
#include<ctype.h>
void FIRST(char[],char );
void addToResultSet(char[],char);
int numOfProductions;
char productionSet[10][10];
int main()
{
int i;
char choice;
char c;
char result[20];
printf("How many number of productions ? :");
scanf(" %d",&numOfProductions);
for(i=0;i<numOfProductions;i++)//read production string eg: E=E+T
{
printf("Enter productions Number %d : ",i+1);
scanf(" %s",productionSet[i]);
}
do
{
printf("\n Find the FIRST of :");
scanf(" %c",&c);
FIRST(result,c); //Compute FIRST; Get Answer in 'result' array
printf("\n FIRST(%c)= { ",c);
for(i=0;result[i]!='\0';i++)
printf(" %c ",result[i]); //Display result
printf("}\n");
printf("press 'y' to continue : ");
scanf(" %c",&choice);
}
while(choice=='y'||choice =='Y');
}
void FIRST(char* Result,char c)
{
int i,j,k;
char subResult[20];
int foundEpsilon;
subResult[0]='\0';
Result[0]='\0';
if(!(isupper(c)))
{
addToResultSet(Result,c);
return ;
}
for(i=0;i<numOfProductions;i++)
{
if(productionSet[i][0]==c)
{
if(productionSet[i][2]=='$') addToResultSet(Result,'$');
else
{
j=2;
while(productionSet[i][j]!='\0')
{
foundEpsilon=0;
FIRST(subResult,productionSet[i][j]);
for(k=0;subResult[k]!='\0';k++)
addToResultSet(Result,subResult[k]);
for(k=0;subResult[k]!='\0';k++)
if(subResult[k]=='$')
{
foundEpsilon=1;
break;
}
if(!foundEpsilon)
break;
j++;
}
}
}
}
return ;
}
void addToResultSet(char Result[],char val)
{
int k;
for(k=0 ;Result[k]!='\0';k++)
if(Result[k]==val)
return;
Result[k]=val;
Result[k+1]='\0';
}
OUTPUT
How many number of productions ? :4
Enter productions Number 1 : E=TR
Enter productions Number 2 : R=+TR
Enter productions Number 3 : T=a
Enter productions Number 4 : Y=s
Find the FIRST of :E
FIRST(E)= { a }
press 'y' to continue : y
Find the FIRST of :R
FIRST(R)= { + }
press 'y' to continue : y
Find the FIRST of :Y
FIRST(Y)= { s }
press 'y' to continue :
3E. Write a program to determine FOLLOW sets for all variables from the given CFG.
PROGRAM
#include<stdio.h>
#include<string.h>
#include<ctype.h>
int n,m=0,p,i=0,j=0;
char a[10][10],followResult[10];
void follow(char c);
void first(char c);
void addToResult(char);
int main()
{
int i;
int choice;
char c,ch;
printf("Enter the no. of productions: ");
scanf("%d", &n);
printf(" Enter %d productions\n Production with multiple terms should be give as separate productions \n", n);
for(i=0;i<n;i++)
scanf("%s%c",a[i],&ch);
// gets(a[i]);
do
{
m=0;
printf("Find FOLLOW of -->");
scanf(" %c",&c);
follow(c);
printf("FOLLOW(%c) = { ",c);
for(i=0;i<m;i++)
printf("%c ",followResult[i]);
printf(" }\n");
printf("Do you want to continue(Press 1 to continue....)?");
scanf("%d%c",&choice,&ch);
}
while(choice==1);
}
void follow(char c)
{
if(a[0][0]==c)addToResult('$');
for(i=0;i<n;i++)
{
for(j=2;j<strlen(a[i]);j++)
{
if(a[i][j]==c)
{
if(a[i][j+1]!='\0')first(a[i][j+1]);
if(a[i][j+1]=='\0'&&c!=a[i][0])
follow(a[i][0]);
}
}
}
}
void first(char c)
{
int k;
if(!(isupper(c)))
//f[m++]=c;
addToResult(c);
for(k=0;k<n;k++)
{
if(a[k][0]==c)
{
if(a[k][2]=='$') follow(a[i][0]);
else if(islower(a[k][2]))
//f[m++]=a[k][2];
addToResult(a[k][2]);
else first(a[k][2]);
}
}
}
void addToResult(char c)
{
int i;
/* compare only against the m entries stored for the current query */
for( i=0;i<m;i++)
if(followResult[i]==c)
return;
followResult[m++]=c;
}
OUTPUT
Enter the no. of productions: 6
Enter 6 productions
Production with multiple terms should be give as separate productions
E=TR
R=+TR
T=FY
Y=*FY
F=(E)
F=a
Find FOLLOW of -->E
FOLLOW(E) = { $ ) }
Do you want to continue(Press 1 to continue....)?1
Find FOLLOW of -->R
FOLLOW(R) = { $ ) }
Do you want to continue(Press 1 to continue....)?
3F. Write a program that takes a predictive parsing table as input and determines
whether the input string is accepted or not.
PROGRAM:
#include <stdio.h>
#include <string.h>
char prol[7][10] = {"S", "A", "A", "B", "B", "C", "C"};
char pror[7][10] = {"A", "Bb", "Cd", "aB", "@", "Cc", "@"};
char prod[7][10] = {"S->A", "A->Bb", "A->Cd", "B->aB", "B->@", "C->Cc", "C->@"};
char first[7][10] = {"abcd", "ab", "cd", "a@", "@", "c@", "@"};
char follow[7][10] = {"$", "$", "$", "a$", "b$", "c$", "d$"};
char table[5][6][10];
int numr(char c) {
switch (c) {
case 'S': return 0;
case 'A': return 1;
case 'B': return 2;
case 'C': return 3;
case 'a': return 0;
case 'b': return 1;
case 'c': return 2;
case 'd': return 3;
case '$': return 4;
}
return 2;
}
int main() {
int i, j, k;
clrscr();
printf("\nThe following is the predictive parsing table for the following grammar:\n");
for (j = 0; j < 6; j++) {
printf("%-10s", table[i][j]);
if (j == 5)
printf("\n--------------------------------------------------------\n");
}
getchar();
return 0;
}
The following is the predictive parsing table for the following grammar:
S->A
A->Bb
A->Cd
B->aB
B->@
C->Cc
C->@
Predictive parsing table is
------------------------------------------------------------------
a b c d $
------------------------------------------------------------------
S S->A S->A S->A S->A
-----------------------------------------------------------------
A A->Bb A->Bb A->Cd A->Cd
------------------------------------------------------------------
B B->aB B->@ B->@ B->@
------------------------------------------------------------------
C C->@ C->@ C->@
------------------------------------------------------------------
Session #4
Learning Objective
To generate Three-Address Statements, perform loop unrolling and constant
propagation, implement simple code generator
Learning Context
Three-Address Code
In three-address code, there is at most one operator on the right side of the
instruction. Thus, a source-language expression like x + y * z might be translated
into the sequence of three-address instructions
t1 = y * z
t2 = x + t1
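The translation above can be sketched with a small recursive-descent translator that issues a new temporary for every operator. This is an illustrative sketch under stated assumptions, not the manual's solution: operands are single characters, the input is well formed and contains no spaces, the output buffer is large enough, and the names `gen_tac`, `emit`, `parse_expr`, `parse_term`, and `parse_factor` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

static const char *src;   /* next unread character of the expression */
static int temp_count;    /* number of temporaries issued so far */
static char *out;         /* position where the next instruction is written */

static void parse_expr(char *place);

/* Writes "tN = left op right" and replaces left with the name tN. */
static void emit(char *left, char op, const char *right)
{
    char temp[16];
    sprintf(temp, "t%d", ++temp_count);
    out += sprintf(out, "%s = %s %c %s\n", temp, left, op, right);
    strcpy(left, temp);
}

/* factor -> ( expr ) | single-character operand */
static void parse_factor(char *place)
{
    if (*src == '(') {
        src++;
        parse_expr(place);
        if (*src == ')')
            src++;
    } else {
        place[0] = *src;
        place[1] = '\0';
        if (*src != '\0')
            src++;
    }
}

/* term -> factor { (*|/) factor } */
static void parse_term(char *place)
{
    char right[16];
    parse_factor(place);
    while (*src == '*' || *src == '/') {
        char op = *src++;
        parse_factor(right);
        emit(place, op, right);
    }
}

/* expr -> term { (+|-) term } */
static void parse_expr(char *place)
{
    char right[16];
    parse_term(place);
    while (*src == '+' || *src == '-') {
        char op = *src++;
        parse_term(right);
        emit(place, op, right);
    }
}

/* Translates expr into three-address instructions stored in buf. */
void gen_tac(const char *expr, char *buf)
{
    char place[16];
    src = expr;
    temp_count = 0;
    out = buf;
    buf[0] = '\0';
    parse_expr(place);
}
```

Calling `gen_tac("x+y*z", buf)` leaves the two instructions `t1 = y * z` and `t2 = x + t1` in `buf`, because the term y*z is translated before the addition that uses it.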
Loop unrolling
• Increases program efficiency.
• Reduces loop overhead.
• If statements in loop are not dependent on each other, they can be executed in
parallel.
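Loop unrolling replicates the loop body so that each pass does the work of several original iterations, which lowers the number of loop tests and branches. The sketch below is an assumed example, not taken from the manual: it sums an array with a plain loop and with a loop unrolled by a factor of four, and the names `sum_rolled` and `sum_unrolled` are hypothetical.

```c
#include <stddef.h>

/* Rolled version: one addition and one loop test per element. */
long sum_rolled(const int *a, size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Unrolled by four: one loop test per four elements. */
long sum_unrolled(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* The four additions do not depend on the loop test,
           so the loop overhead is paid once for all of them. */
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    for (; i < n; i++)      /* leftover elements when n is not a multiple of 4 */
        total += a[i];
    return total;
}
```

Both functions return the same sum for any input; the second loop in `sum_unrolled` handles the remainder that the unrolled loop cannot cover.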
Constant propagation
Constant propagation is one of the local code optimization techniques in compiler design.
It is the process of replacing a variable in an expression with the constant value it is
known to hold. In simpler words, if a variable is assigned a known constant, then later
uses of that variable can be replaced by the constant. Constants assigned to a variable
can be propagated through the flow graph and substituted wherever the variable is used.
pi = 22/7 = 3.14
In the above code, the division must first be performed, which is an
expensive operation, and the computed result (approximately 3.14) is then assigned to the
variable pi. Each time this value is needed later, the variable has to be looked up,
or the division and the assignment have to be performed again, before the value can be
used. It is better to assign the value 3.14 to pi directly, thus
reducing the time needed for the code to run.
Constant propagation also reduces the number of cases where values are copied directly
from one location or variable to another merely to assign their value to another
variable.
A simple code generator generates code for a single basic block. It considers each
three-address instruction in turn and keeps track of what values are in what registers, so
that it can avoid unnecessary loads and stores. One of the primary issues during code
generation is deciding how to use registers in the best way. There are four principal
uses of registers:
• In most machine architectures, some or all of the operands of an operation must
be in registers in order to perform the operation.
• Registers make good temporaries - places to hold the result of a sub expression
while a larger expression is being evaluated.
• Registers are used to hold (global) values that are computed in one basic block
and used in other blocks.
• Registers are often used to help with run-time storage management.
An essential part of the algorithm is a function getReg(I), which selects registers for
each memory location associated with the three-address instruction I. Function getReg
has access to the register and address descriptors for all the variables of the basic
block, and may also have access to the variables that are live on exit from the block. We
assume that there are enough registers so that, after freeing all available registers by
storing their values in memory, any three-address operation can be carried out.
As the code-generation algorithm issues load, store, and other machine instructions, it
needs to update the register and address descriptors. The rules are as follows:
• For the instruction LD R, x
o Change the register descriptor for register R so it holds only x.
o Change the address descriptor for x by adding register R as an additional
location.
• For the instruction ST x, R, change the address descriptor for x to include its
own memory location.
• For an operation such as ADD Rx, Ry, Rz implementing a three-address instruction
x=y+z
o Change the register descriptor for Rx so that it holds only x.
o Change the address descriptor for x so that its only location is Rx.
o Remove Rx from address descriptor of any variable other than x.
• When we process a copy statement x = y, after generating the load for y into
register Ry, if needed, and after managing descriptors as for all load statements
o Add x to the register descriptor for Ry.
o Change the address descriptor for x so that its only location is Ry.
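The descriptor rules above can be sketched with bitmasks: a register descriptor records which variables a register currently holds, and an address descriptor records where the current value of a variable can be found. This is a minimal sketch under stated assumptions, not a full code generator: variables are the single letters 'a' to 'z', there are four registers, and all names (`reg_desc`, `addr_reg`, `addr_mem`, `on_load`, `on_store`, `on_op`, `on_copy`) are hypothetical. It applies the listed rules literally and nothing more.

```c
#define NUM_REGS 4
#define NUM_VARS 26   /* variables 'a'..'z' */

static unsigned reg_desc[NUM_REGS];  /* bit v set: register holds variable v */
static unsigned addr_reg[NUM_VARS];  /* bit r set: variable is in register r */
static int addr_mem[NUM_VARS];       /* 1: variable's own memory location is current */

static unsigned var_bit(char v) { return 1u << (v - 'a'); }

/* Initially every variable lives only in its memory location. */
void init_descriptors(void)
{
    int r, v;
    for (r = 0; r < NUM_REGS; r++)
        reg_desc[r] = 0;
    for (v = 0; v < NUM_VARS; v++) {
        addr_reg[v] = 0;
        addr_mem[v] = 1;
    }
}

/* LD R, x: R holds only x; R becomes an additional location of x. */
void on_load(int r, char x)
{
    reg_desc[r] = var_bit(x);
    addr_reg[x - 'a'] |= 1u << r;
}

/* ST x, R: the memory location of x now holds its current value. */
void on_store(char x)
{
    addr_mem[x - 'a'] = 1;
}

/* ADD Rx, Ry, Rz for x = y + z: Rx holds only x, x lives only in Rx,
   and Rx is removed from the address descriptor of every other variable. */
void on_op(int rx, char x)
{
    int v;
    reg_desc[rx] = var_bit(x);
    for (v = 0; v < NUM_VARS; v++)
        if (v != x - 'a')
            addr_reg[v] &= ~(1u << rx);
    addr_reg[x - 'a'] = 1u << rx;
    addr_mem[x - 'a'] = 0;
}

/* Copy x = y with y already in Ry: Ry also holds x, and x lives only in Ry. */
void on_copy(int ry, char x)
{
    reg_desc[ry] |= var_bit(x);
    addr_reg[x - 'a'] = 1u << ry;
    addr_mem[x - 'a'] = 0;
}
```

A typical sequence for x = y + z would call `on_load(0, 'y')`, `on_load(1, 'z')`, and `on_op(2, 'x')`, after which x is recorded as living only in register 2 until `on_store('x')` marks its memory copy as current again.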
Exercise
4 A. Write a program to take simple expressions and generate the corresponding
three address statements.
B. Write a program to perform loop unrolling.
C. Write a program to perform constant propagation
D. Write a program to implement simple code generator from the given three
address statements.
Solutions
4A. Write a program to take simple expressions and generate the corresponding three
address statements.
PROGRAM
#include<stdio.h>
#include<string.h>
void pm();
void plus();
void div();
int i,ch,j,l,addr=100;
char ex[10], exp[10] ,exp1[10],exp2[10],id1[5],op[5],id2[5];
void main()
{
clrscr();
while(1)
{
printf("\n1.assignment\n2.arithmetic\n3.relational\n4.Exit\nEnter the choice:");
scanf("%d",&ch);
switch(ch)
{
case 1:
printf("\nEnter the expression with assignment operator:");
scanf("%s",exp);
l=strlen(exp);
exp2[0]='\0';
i=0;
while(exp[i]!='=')
{
i++;
}
strncat(exp2,exp,i);
strrev(exp);
exp1[0]='\0';
strncat(exp1,exp,l-(i+1));
strrev(exp1);
printf("Three address code:\ntemp=%s\n%s=temp\n",exp1,exp2);
break;
case 2:
printf("\nEnter the expression with arithmetic operator:");
scanf("%s",ex);
strcpy(exp,ex);
l=strlen(exp);
exp1[0]='\0';
for(i=0;i<l;i++)
{
if(exp[i]=='+'||exp[i]=='-')
{
if(exp[i+2]=='/'||exp[i+2]=='*')
{
pm();
break;
}
else
{
plus();
break;
}
}
else if(exp[i]=='/'||exp[i]=='*')
{
div();
break;
}
}
break;
case 3:
printf("Enter the expression with relational operator");
scanf("%s%s%s",id1,op,id2);
if(((strcmp(op,"<")==0)||(strcmp(op,">")==0)||(strcmp(op,"<=")==
0)||(strcmp(op,">=")==0)||(strcmp(op,"==")==0)||(strcmp(op,"!="
)==0))==0)
printf("Expression is error");
else
{
printf("\n%d\tif %s%s%s goto %d",addr,id1,op,id2,addr+3);
addr++;
printf("\n%d\t T:=0",addr);
addr++;
printf("\n%d\t goto %d",addr,addr+2);
addr++;
printf("\n%d\t T:=1",addr);
}
break;
case 4:
exit(0);
}
}
}
void pm()
{
strrev(exp);
j=l-i-1;
strncat(exp1,exp,j);
strrev(exp1);
printf("Three address code:\ntemp=%s\ntemp1=%c%ctemp\n",exp1,exp[j+1],exp[j]);
}
void div()
{
strncat(exp1,exp,i+2);
printf("Three address code:\ntemp=%s\ntemp1=temp%c%c\n",exp1,exp[i+2],exp[i+3]);
}
void plus()
{
strncat(exp1,exp,i+2);
printf("Three address code:\ntemp=%s\ntemp1=temp%c%c\n",exp1,exp[i+2],exp[i+3]);
}
OUTPUT
1.assignment
2.arithmetic
3.relational
4.Exit
Enter the choice:2
Enter the expression with arithmetic operator:
a+b-c
Three address code:
temp=a+b
temp1=temp-c
1.assignment
2.arithmetic
3.relational
4.Exit
Enter the choice:2
Enter the expression with arithmetic operator:
a-b/c
Three address code:
temp=b/c
temp1=a-temp
1.assignment
2.arithmetic
3.relational
4.Exit
Enter the choice:2
Enter the expression with arithmetic operator:
a*b-c
Three address code:
temp=a*b
temp1=temp-c
1.assignment
2.arithmetic
3.relational
4.Exit
Enter the choice:2
Enter the expression with arithmetic operator:a/b*c
Three address code:
temp=a/b
temp1=temp*c
1.assignment
2.arithmetic
3.relational
4.Exit
Enter the choice:3
Enter the expression with relational operator
a<=b
1.assignment
2.arithmetic
3.relational
4.Exit
Enter the choice:4
4B. Write a program to perform loop unrolling.
PROGRAM
#include<stdio.h>
/* countbit1 and countbit2 are defined after main, so declare them first */
int countbit1(unsigned int n);
int countbit2(unsigned int n);
int main(void) {
unsigned int n;
int x;
char ch;
printf("\nEnter N\n");
scanf("%u", & n);
printf("\n1. Loop Roll\n2. Loop UnRoll\n");
printf("\nEnter ur choice\n");
scanf(" %c", & ch);
switch (ch) {
case '1':
x = countbit1(n);
printf("\nLoop Roll: Count of 1's : %d", x);
break;
case '2':
x = countbit2(n);
printf("\nLoop UnRoll: Count of 1's : %d", x);
break;
default:
printf("\n Wrong Choice\n");
}
return 0;
}
int countbit1(unsigned int n) {
int bits = 0, i = 0;
while (n != 0) {
if (n & 1) bits++;
n >>= 1;
i++;
}
printf("\n no of iterations %d", i);
return bits;
}
int countbit2(unsigned int n) {
int bits = 0, i = 0;
while (n != 0) {
if (n & 1) bits++;
if (n & 2) bits++;
if (n & 4) bits++;
if (n & 8) bits++;
n >>= 4;
i++;
}
printf("\n no of iterations %d", i);
return bits;
}
Input/Output
Enter N
3
1. Loop Roll
2. Loop UnRoll
Enter ur choice
2
no of iterations 1
Loop UnRoll: Count of 1's :2
4C. Write a program to perform constant propagation
PROGRAM
#include<stdio.h>
#include<string.h>
#include<ctype.h>
#include<stdlib.h>
#include<conio.h>
void input();
void output();
void change(int p,char *res);
void constant();
struct expr
{
char op[2],op1[5],op2[5],res[5];
int flag;
}arr[10];
int n;
void main()
{
clrscr();
input();
constant();
output();
getch();
}
void input()
{
int i;
printf("\n\nEnter the maximum number of expressions : ");
scanf("%d",&n);
printf("\nEnter the input : \n");
for(i=0;i<n;i++)
{
scanf("%s",arr[i].op);
scanf("%s",arr[i].op1);
scanf("%s",arr[i].op2);
scanf("%s",arr[i].res);
arr[i].flag=0;
}
}
void constant()
{
int i;
int op1,op2,res;
char op,res1[5];
for(i=0;i<n;i++)
{
if(isdigit(arr[i].op1[0]) && (isdigit(arr[i].op2[0]) || strcmp(arr[i].op,"=")==0))
/*if both operands are constants, or a constant is being assigned, evaluate the expression*/
{
op1=atoi(arr[i].op1);
op2=atoi(arr[i].op2);
op=arr[i].op[0];
switch(op)
{
case '+':
res=op1+op2;
break;
case '-':
res=op1-op2;
break;
case '*':
res=op1*op2;
break;
case '/':
res=op1/op2;
break;
case '=':
res=op1;
break;
}
sprintf(res1,"%d",res);
arr[i].flag=1; /*eliminate expr and replace any operand below that uses
result of this expr */
change(i,res1);
}
}
}
void output()
{
int i=0;
printf("\nOptimized code is : ");
for(i=0;i<n;i++)
{
if(!arr[i].flag)
{
printf("\n%s %s %s %s",arr[i].op,arr[i].op1,arr[i].op2,arr[i].res);
}
}
}
void change(int p,char *res)
{
int i;
for(i=p+1;i<n;i++)
{
if(strcmp(arr[p].res,arr[i].op1)==0)
strcpy(arr[i].op1,res);
/* not else-if: both operands may name the propagated result */
if(strcmp(arr[p].res,arr[i].op2)==0)
strcpy(arr[i].op2,res);
}
}
INPUT:
Enter the maximum number of expressions: 4
OUTPUT:
Optimized code is :
+ 3 b t1
+ 3 c t2
+ t1 t2 t3
4D. Write a program to implement simple code generator from the given three address
statements
PROGRAM
#include<stdio.h>
#include<conio.h>
#include<string.h>
struct three
{
char data[10],temp[7];
}s[30];
void main()
{
char *d1,*d2;
int i=0,len=0;
FILE *f1,*f2;
clrscr();
f1=fopen("exe1.txt","r");
f2=fopen("exe2.txt","w");
while(fscanf(f1,"%s",s[len].data)!=EOF)
len++;
for(i=0;i<len;i++)
{
if(!strcmp(s[i].data,"="))
{
fprintf(f2,"\nLDA\t%s",s[i+1].data);
if(!strcmp(s[i+2].data,"+"))
fprintf(f2,"\nADD\t%s",s[i+3].data);
if(!strcmp(s[i+2].data,"-"))
fprintf(f2,"\nSUB\t%s",s[i+3].data);
fprintf(f2,"\nSTA\t%s",s[i-1].data);
}
}
fclose(f1);
fclose(f2);
getch();
}
Input: exe1.txt
t1 = in1 + in2
t2 = t1 + in3
t3 = t2 - in4
out = t3
Output: exe2.txt
LDA in1
ADD in2
STA t1
LDA t1
ADD in3
STA t2
LDA t2
SUB in4
STA t3
LDA t3
STA out
Session #5
Learning Objective
To implement a lexical analyzer using JLex, a Lex program for the tokens of the C
language, and a YACC program that implements a calculator
Learning Context
A lexical analyzer breaks an input stream of characters into tokens. Writing lexical
analyzers by hand can be a tedious process, so software tools have been developed to
ease this task.
Perhaps the best known such utility is Lex. Lex is a lexical analyzer generator for the
UNIX operating system, targeted to the C programming language. Lex takes a specially
formatted specification file containing the details of a lexical analyzer. This tool then
creates a C source file for the associated table-driven lexer.
The JLex utility is based upon the Lex lexical analyzer generator model. JLex takes a
specification file similar to that accepted by Lex, then creates a Java source file for the
corresponding lexical analyzer.
A JLex specification file is organized into three sections, separated by ``%%'' directives:
user code
%%
JLex directives
%%
regular expression rules
The ``%%'' directives distinguish sections of the input file and must be placed at the
beginning of their line. The remainder of the line containing the ``%%'' directives may
be discarded and should not be used to house additional declarations or code.
The user code section - the first section of the specification file - is copied directly into
the resulting output file. This area of the specification provides space for the
implementation of utility classes or return types.
The JLex directives section is the second part of the input file. Here, macro definitions
are given and state names are declared.
The third section contains the rules of lexical analysis, each of which consists of three
parts: an optional state list, a regular expression, and an action.
A Lex program is separated into three sections by %% delimiters. The format of a Lex
source file is as follows:
{ definitions }
%%
{ rules }
%%
{ user subroutines }
/* definitions */
....
%%
/* rules */
....
%%
/* auxiliary routines */
Exercise
5 A. Implement the lexical analyzer using JLex, flex, lex, or other lexical analyzer
generating tools.
B. Write a LEX specification program for the tokens of C language.
C. Write a YACC program to implement a calculator and find the value of an arithmetic
expression.
Solutions
5A. Implement the lexical analyzer using JLex, flex, lex, or other lexical analyzer
generating tools.
PROGRAM
int main(int argc,char **argv)
{
if (argc > 1)
{
FILE *file;
file = fopen(argv[1],"r");
if(!file)
{
printf("could not open %s \n",argv[1]);
exit(0);
}
yyin = file;
}
yylex();
printf("\n\n");
return 0;
}
int yywrap()
{
return 1; /* 1 = no more input; returning 0 would make yylex() keep scanning at end of file */
}
Input
$vi var.c
#include
main()
{
int a,b;
}
Output
$lex lex.l
$cc lex.yy.c
$./a.out var.c
#include is a PREPROCESSOR DIRECTIVE
FUNCTION
main (
)
BLOCK BEGINS
int is a KEYWORD
a IDENTIFIER
b IDENTIFIER
BLOCK ENDS
5B. Write a LEX specification program for the tokens of C language.
PROGRAM
/* Lex code to count the total number of tokens */
%{
#include <stdio.h>
int n = 0;
%}
%%
 /* Comments in the rules section must be indented so that lex does not read them as patterns. */
 /* keywords */
"while"|"if"|"else" {n++;printf("\t keywords : %s", yytext);}
"int"|"float" {n++;printf("\t keywords : %s", yytext);}
 /* identifiers */
[a-zA-Z_][a-zA-Z0-9_]* {n++;printf("\t identifier : %s", yytext);}
 /* operators */
"<="|"=="|"="|"++"|"-"|"*"|"+" {n++;printf("\t operator : %s", yytext);}
 /* separators */
[(){}|, ;] {n++;printf("\t separator : %s", yytext);}
 /* floating-point numbers */
[0-9]*"."[0-9]+ {n++;printf("\t float : %s", yytext);}
 /* integers */
[0-9]+ {n++;printf("\t integer : %s", yytext);}
 /* ignore any other character */
. ;
%%
int yywrap()
{
return 1;
}
int main()
{
yylex();
printf("\n total no. of tokens = %d\n", n);
return 0;
}
Input:
Output:
total no. of tokens = 13
5C. Write a YACC program to implement a calculator and find the value of an arithmetic
expression.
PROGRAM
%{
/* Definition section */
#include<stdio.h>
#include<stdlib.h>
#include "y.tab.h"
extern int yylval;
%}
/* Rule Section */
%%
[0-9]+ {
yylval=atoi(yytext);
return NUMBER;
}
[\t] ;
[\n] return 0;
. return yytext[0];
%%
int yywrap()
{
return 1;
}
Input:
10+5-
Output:
Entered arithmetic expression is Invalid