Compiler Design
(3170701)
Enrolment No 210210107052
Name Vivek Chetan Jadhav
Branch Computer Engineering
Academic Term 2024-25(ODD)
Institute Name Government Engineering College Bhavnagar
Directorate of Technical Education, Gandhinagar, Gujarat
Government Engineering College, Bhavnagar
Computer Engineering Department
CERTIFICATE
This is to certify that Mr./Ms. _______________ Enrolment No. _______________ of B.E. Semester VII from Computer Engineering Department of this
Institute (GTU Code: 021) has satisfactorily completed the Practical work for the subject Compiler Design (3170701) during the academic term 2024-25 (ODD).
Place: ___________
Date: ___________
Preface
Compiler Design is an essential subject for computer science and engineering students. It deals with
the theory and practice of developing a program that can translate source code written in one
programming language into another language. The main objective of this subject is to teach students
how to design and implement a compiler, which is a complex software system that converts high-
level language code into machine code that can be executed on a computer. The design of compilers
is an essential aspect of computer science, as it helps in bridging the gap between human-readable
code and machine-executable code.
This lab manual is designed to help students understand the concepts of compiler design and
develop hands-on skills in building a compiler. The manual provides step-by-step instructions for
implementing a simple compiler using C and other applicable programming languages, covering all
the essential components such as lexical analyzer, parser, symbol table, intermediate code
generator, and code optimizer.
The manual is divided into several sections, each focusing on a specific aspect of compiler design.
The first section provides an introduction to finite automata and the phases of a compiler, covering the
basic concepts of lexical analysis. The subsequent sections cover parsing, code generation, and
a study of learning basic block scheduling heuristics. Each section includes detailed instructions for
completing the lab exercises and programming assignments, along with examples and code
snippets.
The lab manual also includes a set of challenging programming assignments and quizzes that will
help students test their understanding of the subject matter. Additionally, the manual provides a list
of recommended books and online resources for further study.
This manual is intended for students studying Compiler Design and related courses. It is also useful
for software developers and engineers who want to gain a deeper understanding of compiler design
and implementation. We hope that this manual will be a valuable resource for students and
instructors alike and will contribute to the learning and understanding of compiler design.
OBJECTIVE:
This laboratory course is intended to make the students experiment on the basic
techniques of compiler construction and tools that can be used to perform syntax-
directed translation of a high-level programming language into an executable code.
Students will design and implement language processors in C by using tools to
automate parts of the implementation process. This will provide deeper insights into
the more advanced semantics aspects of programming languages, code generation,
machine independent optimizations, dynamic memory allocation, and object
orientation.
OUTCOMES:
Upon the completion of Compiler Design practical course, the student will be able
to:
1. Understand the working of the Lex and YACC tools for debugging of programs.
2. Understand and define the role of lexical analyzer, use of regular expression and
transition diagrams.
3. Understand and use context-free grammars and parse tree construction.
4. Learn & use the new tools and technologies used for designing a compiler.
5. Develop program for solving parser problems.
6. Learn how to write programs that execute faster.
DTE’s Vision
Institute’s Vision
● To transform students into good human beings, responsible citizens and employable
engineering graduates through imbibing human values and excellence in technical
education.
Institute’s Mission
● To educate students from the local and rural areas, so they become enlightened individuals,
improving the living standards of their families, industry and society. We will provide
individual attention, quality education and take care of character building.
Department’s Vision
● To achieve excellence for providing value based education in computer science and
Information Technology through innovation, team work, and ethical practices.
Department’s Mission
● To produce graduates according to the need of industry, government, society and scientific
community and to develop partnership with industries, government agencies and R & D
Organizations for knowledge sharing and overall development of faculties and students.
● To motivate students/graduates to be entrepreneurs.
● To motivate students to participate in reputed conferences, workshops, symposiums,
seminars and related technical activities.
● To impart human and ethical values in our students for better serving of society.
Sr. No. | Title of experiment | CO1 | CO2 | CO3 | CO4
1 | Implementation of Finite Automata and String Validation. | √ |  |  | 
Industry Relevant Skills:
Compiler Design is a vital subject in computer science and engineering that focuses on the
design and implementation of compilers. Here are some industry-relevant skills that students
can develop while studying Compiler Design:
● Proficiency in programming languages: A good understanding of programming languages
is essential for building compilers. Students should be proficient in programming
languages such as C/C++, Java, and Python.
● Knowledge of data structures and algorithms: Compiler Design involves the
implementation of various data structures and algorithms. Students should have a good
understanding of data structures such as stacks, queues, trees, and graphs, and algorithms
such as lexical analysis, parsing, and code generation.
● Familiarity with compiler tools: Students should be familiar with compiler tools such as
Lex and Yacc. These tools can help automate the process of creating a compiler, making
it more efficient and error-free.
● Debugging skills: Debugging is an essential skill for any programmer, and it is particularly
important in Compiler Design. Students should be able to use debugging tools to find and
fix errors in their code.
● Optimization techniques: Code optimization is a critical component of Compiler Design.
Students should be familiar with optimization techniques such as constant folding, dead
code elimination, and loop unrolling, which can significantly improve the performance of
the compiled code.
● Collaboration and communication skills: Compiler Design is a complex subject that
requires collaboration and communication between team members. Students should
develop good communication and collaboration skills to work effectively with their peers
and instructors.
By developing these industry-relevant skills, students can become proficient in Compiler
Design and be better equipped to meet the demands of the industry.
Guidelines for Faculty Members:
1. Teacher should provide the guidelines with a demonstration of the practical to the students with
all features.
2. Teacher shall explain basic concepts/theory related to the experiment to the students before
starting each practical.
3. Involve all the students in performance of each experiment.
4. Teacher is expected to share the skills and competencies to be developed in the students
and ensure that the respective skills and competencies are developed in the students after
the completion of the experimentation.
5. Teachers should give opportunity to students for hands-on experience after the
demonstration.
6. Teacher may provide additional knowledge and skills to the students, even though not
covered in the manual, that are expected from the students by the concerned industry.
7. Give practical assignment and assess the performance of students based on task assigned
to check whether it is as per the instructions or not.
8. Teacher is expected to refer complete curriculum of the course and follow the guidelines
for implementation.
Instructions for Students:
1. Students are expected to carefully listen to all the theory classes delivered by the faculty
members and understand the COs, content of the course, teaching and examination scheme, skill
set to be developed etc.
2. Students will have to perform experiments considering C or another applicable programming
language, using the Lex tool or YACC.
3. Students are instructed to submit the practical list as per the given sample list shown on the next
page. Students have to show the output of each program in their practical file.
4. Students should develop a habit of submitting the experimentation work as per the schedule.

Common Safety Instructions:
1. Handle equipment with care: When working in the lab, students should handle equipment
and peripherals with care. This includes using the mouse and keyboard gently, avoiding
pulling or twisting network cables, and handling any hardware devices carefully.
2. Avoid water and liquids: Students should avoid using wet hands or having any liquids near
the computer equipment. This will help prevent damage to the devices and avoid any safety
hazards.
3. Shut down the PC properly: At the end of the lab session, students should shut down the
computer properly. This includes closing all programs and applications, saving any work,
and following the correct shutdown procedure for the operating system.
4. Obtain permission for laptops: If a student wishes to use their personal laptop in the lab, they
should first obtain permission from the Lab Faculty or Lab Assistant. They should follow
all lab rules and guidelines and ensure that their laptop is properly configured for the lab
environment.
Index
(Progressive Assessment Sheet)
Sr. No. | Title of experiment | Page No. | Date of performance | Date of submission | Assessment Marks | Sign. of Teacher with date | Remarks
1 Implementation of Finite Automata and String
Validation.
2 Introduction to Lex Tool. Implement following
Programs Using Lex:
a. Generate Histogram of words
b. Caesar Cypher
c. Extract single and multiline comments
from C Program
3 Implement following Programs Using Lex:
a. Convert Roman to Decimal
b. Check whether given statement is
compound or simple
c. Extract html tags from .html file
4 Introduction to YACC and generate Calculator
Program.
5 Implement a program for constructing
a. LL(1) Parser
b. Predictive Parser
6 Implement a program for constructing
a. Recursive Descent Parser (RDP)
b. LALR Parser
7 Implement a program for constructing Operator
Precedence Parsing.
8 Generate 3-tuple intermediate code for given infix
expression.
9 Extract Predecessor and Successor from given
Control Flow Graph.
10 Study of Learning Basic Block Scheduling
Heuristics from Optimal Data.
Total
Experiment No - 1
Aim: To study and implement Finite Automata and validate strings using it.
Date:
Objectives:
1. To understand the concept of Finite Automata.
2. To implement Finite Automata using programming language.
3. To validate strings using Finite Automata.
Theory:
Finite Automata is a mathematical model that consists of a finite set of states and a set of
transitions between these states. It is used to recognize patterns or validate strings. A Finite
Automaton consists of five components:
1. A set of states
2. An input alphabet
3. A transition function
4. A start state
5. A set of final (or accepting) states
In the implementation of Finite Automata and string validation, we need to create a Finite
Automata that recognizes a specific pattern or set of patterns. The Finite Automata consists of
states, transitions between the states, and a set of accepting states. The input string is then
validated by passing it through the Finite Automata, starting at the initial state, and following
the transitions until the string is either accepted or rejected.
String validation using Finite Automata is useful in a variety of applications, including pattern
matching, text processing, and lexical analysis in programming languages. It is an efficient
method for validating strings and can handle large inputs with minimal memory and time
complexity.
Solution:
Program:
Sample Program-1: Create a program in Python that implements a Finite Automata to
validate strings that start with 'a' and end with 'b'.
Code:
# Define the Finite Automata
states = {'q0', 'q1', 'q2'}
alphabet = {'a', 'b'}
start_state = 'q0'
accept_states = {'q2'}
transitions = {
('q0', 'a'): 'q1',
('q1', 'a'): 'q1',
('q1', 'b'): 'q2'
}
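The validation routine itself is not shown in the listing above; a minimal sketch that drives the machine defined by these tables (reusing the same names) is:

def validate(s):
    state = start_state
    for ch in s:
        if (state, ch) not in transitions:
            return False                    # no transition defined: reject
        state = transitions[(state, ch)]
    return state in accept_states           # accept only if we end in a final state

# Example usage
for s in ["ab", "aab", "ba", "a"]:
    print(s, "->", "Accepted" if validate(s) else "Rejected")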
Sample Program-1:
In this implementation of Finite Automata and string validation, I have learned how to create
a Finite Automata that recognizes a specific pattern or set of patterns, and how to validate
strings using the Finite Automata. I have also learned how to implement a Finite Automata
using a programming language, and how to test it with different inputs. By using Finite
Automata, I can efficiently validate strings and recognize patterns, making it a powerful tool
in computer science and related fields.
Sample Program-2:
In this example, the program searches for the pattern “abacaba” in the text
abcabacabacabacaba. It computes the Finite Automata using the compute_transition_function
function, and then uses it to search for the pattern in the text using the search function. It outputs
that the pattern is found at index 3.
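The full listing for this sample is not reproduced in the manual. A minimal sketch of the two functions the description names (a standard finite-automaton string matcher; all names are illustrative) could look like the following — on the text above it reports matches at indices 3, 7 and 11, the first being index 3:

def compute_transition_function(pattern, alphabet):
    m = len(pattern)
    delta = {}
    for q in range(m + 1):
        for ch in alphabet:
            # longest prefix of the pattern that is a suffix of pattern[:q] + ch
            k = min(m, q + 1)
            while k > 0 and not (pattern[:q] + ch).endswith(pattern[:k]):
                k -= 1
            delta[(q, ch)] = k
    return delta

def search(text, pattern):
    delta = compute_transition_function(pattern, set(text))
    state = 0
    for i, ch in enumerate(text):
        state = delta[(state, ch)]
        if state == len(pattern):
            print("Pattern found at index", i - len(pattern) + 1)

search("abcabacabacabacaba", "abacaba")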
Quiz:
1. What is a Finite Automaton?
Ans:
• A finite automaton is a mathematical model of a system with discrete inputs, outputs, states
and a set of transitions from state to state that occur on input symbols from the alphabet Σ.
• Also it is used to analyze and recognize Natural language Expressions. The finite
automata or finite state machine is an abstract machine that has five elements or tuples.
# Example usage:
input_string = "abaca"
result = accept_even_as(input_string)
if result:
print("Accepted: The string has an even number of 'a's.")
else:
print("Rejected: The string does not have an even number of 'a's.")
4. What is the process of building a finite automaton to search a pattern in a text string?
Ans:
• Define the Problem: Clearly specify the pattern you want to find in the text.
• Construct the Transition Table: Create a table that defines how the automaton transitions
between states based on input characters. Each state represents a partial match of the
pattern.
• Processing the Text: Start processing the input text, following transitions in the DFA.
When you reach an accepting state, you've found a match.
• Reporting Matches: Record the positions of matches as they occur in the text.
• Optimizations: Apply optimizations like failure functions for efficiency.
• Testing and Validation: Test the DFA with different patterns and texts to ensure it works
correctly.
• Application: Integrate the DFA into your application for pattern searching.
• This process allows efficient pattern matching in text strings using a finite automaton.
• Algorithm:
FINITE-AUTOMATA-MATCHER(T, P)
state ← 0
for i ← 1 to n            // n = length of the text T
state ← δ(state, t_i)
if state == m then        // m = length of the pattern P
match found
end
end
5. What are the advantages of using a finite automaton to search for patterns in text
strings?
Ans:
• Using a finite automaton (specifically a deterministic finite automaton or DFA) to search
for patterns in text strings offers several advantages:
• Efficiency: DFAs can perform pattern matching in linear time with respect to the length
of the text. This efficiency makes them well-suited for large-scale text processing tasks.
• Constant Space: DFAs have a fixed memory footprint regardless of the size of the pattern
or text. This means they are memory-efficient and can be used for real-time applications
and with limited resources.
• Determinism: DFAs provide deterministic behavior, ensuring that for a given pattern and
input text the matching result is always the same and reproducible.
Suggested Reference:
1. https://www.tutorialspoint.com/what-is-finite-automata
Rubrics: Knowledge (2) | Problem Recognition (2) | Implementation (2) | Testing & Debugging (2) | Creativity in logic/code (2) | Total
Each rubric is graded Good (2) or Avg. (1).
Marks
Experiment No - 2
Aim: Introduction to Lex Tool. Implement following Programs Using Lex
a. Generate Histogram of words
b. Caesar Cypher
c. Extract single and multiline comments from C Program
Date:
Competency and Practical Skills: Understanding of the Lex tool and its usage in compiler design,
understanding of regular expressions and data structures, improving programming skills to
develop programs using the Lex tool
Objectives:
1. To introduce students to Lex tool and its usage in compiler design
2. To provide practical knowledge of regular expressions and their use in pattern matching
3. To enhance students' understanding of data structures such as arrays, lists, and trees
4. To develop students' problem-solving skills in developing and implementing programs
using Lex tool
5. To develop students' debugging skills to identify and resolve program errors and issues
Theory:
❖ COMPILER:
• A compiler is a translator that converts the high-level language into the machine language.
• High-level language is written by a developer and machine language can be understood by
the processor. Compiler is used to show errors to the programmer.
• The main purpose of a compiler is to translate code written in one language into another
without changing the meaning of the program.
• When you execute a program written in a high-level language, it executes in two parts.
• In the first part, the source program is compiled and translated into the object program (low-
level language).
• In the second part, the object program is translated into the target program through the
assembler.
❖ LEX:
• Lex is a program that generates lexical analyzers. It is used with a YACC parser generator.
• The lexical analyzer is a program that transforms an input stream into a sequence of tokens.
• It reads the input stream and produces C source code that implements the lexical
analyzer.
• During the first phase the compiler reads the input and converts strings in the source to
tokens.
• With regular expressions we can specify patterns to lex so it can generate code that will
allow it to scan and match strings in the input. Each pattern specified in the input to lex has
an associated action.
• Typically an action returns a token that represents the matched string for subsequent use by
the parser. Initially we will simply print the matched string rather than return a token value.
⮚ Function of LEX:
• Firstly, the programmer writes a lexical analyzer specification lex.l in the Lex language. Then the Lex
compiler runs the lex.l program and produces a C program lex.yy.c.
• Finally, the C compiler runs the lex.yy.c program and produces an object program a.out.
• a.out is a lexical analyzer that transforms an input stream into a sequence of tokens.
STEPS:
• Step 1: An input file named lex.l, which describes the lexical analyzer to be generated, is written
in the Lex language. The Lex compiler transforms lex.l into a C program, in a file that is always
named lex.yy.c.
• Step 2: The C compiler compiles the lex.yy.c file into an executable file called a.out.
• Step 3: The executable a.out takes a stream of input characters and produces a stream of
tokens.
• Program Structure:
Rules Section: The rules section contains a series of rules of the form: pattern action. The
pattern must be unindented and the action must begin on the same line in {} brackets. The rule
section is enclosed in “%% %%”.
Syntax:
%%
pattern action
%%
User Code Section: This section contains C statements and additional functions. We can also
compile these functions separately and load them with the lexical analyzer.
Program:
Code:
%{
#include<stdio.h>
#include<string.h>
#define MAX 1000
/* Declarations */
int count = 0;
char words[MAX][MAX];
%}
/* Rule Section */
%%
[a-zA-Z]+ {
int i, flag = 0;
for(i=0; i<count; i++) {
if(strcmp(words[i], yytext) == 0) {
flag = 1;
break;
}
}
if(flag == 0) {
strcpy(words[count++], yytext);
}
}
. ;
%%
/* Code Section */
int main(int argc, char **argv) {
if(argc != 2) {
printf("Usage: ./a.out <filename>\n");
return 1;
}
FILE *fp = fopen(argv[1], "r");
if(fp == NULL) {
printf("Cannot open file!\n");
return 1;
}
yyin = fp;
yylex();
int i, j;
printf("\nWord\t\tFrequency\n");
for(i=0; i<count; i++) {
int freq = 0;
rewind(fp);
while(fscanf(fp, "%s", words[MAX-1]) == 1) {
if(strcmp(words[MAX-1], words[i]) == 0) {
freq++;
}
}
printf("%-15s %d\n", words[i], freq);
}
fclose(fp);
return 0;
}
Code:
%{
#include<stdio.h>
int shift;
%}
%%
[a-z] {char ch = yytext[0];
ch += shift;
if (ch> 'z') ch -= 26;
printf ("%c" ,ch );
}
[A-Z] { char ch = yytext[0] ;
ch += shift;
if (ch> 'Z') ch -= 26;
printf("%c",ch);
}
. {exit(0);}
%%
int main()
{
printf("Enter an no. of alphabet to shift: \n");
scanf("%d", &shift);
printf("Enter the string: \n");
yylex();
return 0;
}
int yywrap(){
return 1;
}
Code :
%{
#include <stdio.h>
%}
%%
%%
/* Main function */
int main(int argc, char **argv) {
if(argc != 2) {
printf("Usage: %s <filename>\n", argv[0]);
return 1;
}
Program-1:
$ lex histogram.l
$ cc lex.yy.c -o histogram -ll
$ ./histogram sample.txt
In this program, I have implemented a histogram of words using the Lex tool. The program counts
the frequency of each word in a given input file. It uses an array words to store all the distinct
words and counts the frequency of each word by iterating through the words array and
comparing it with the input file. The program also checks for errors such as an invalid input
file. This program can be used to analyze the most frequent words in a text file or a
document. It can be extended to handle large files by implementing a dynamic array
to store the distinct words instead of a fixed-size array.
Program-2:
lex CeasarCypher.l
cc lex.yy.c
a.exe
Input File:
Output:
This Lex program implements a simple Caesar cipher encryption scheme for alphabetic
characters. It takes an integer input shift to determine the number of positions to shift each
alphabetic character. During lexical analysis of the input string, it identifies lowercase and
uppercase letters, shifts them by the specified amount while maintaining their case, and prints
the encrypted result. The program prompts the user to input both the shift value and the string
to be encrypted. It provides a basic demonstration of character manipulation within Lex,
offering a practical tool for encoding text with a variable Caesar cipher.
Program-3:
lex ExtractComment.l
cc lex.yy.c
a.exe input.c
Input File:
Output File:
This Lex program is designed for extracting comments from a C source code file. It performs
lexical analysis on the input file ("input.c") and identifies both single-line (//) and multi-line
(/* ... */) comments. The program utilizes regular expressions to recognize and extract
comments, printing them to an output file ("out.c"). It is intended for analyzing and
documenting comments within C code, offering a practical tool for code analysis and
documentation generation. The code provides straightforward input and output handling,
making it suitable for basic comment extraction tasks.
Quiz:
3. Which tools and algorithms can be used to perform text encryption?
Ans:
• To perform text encryption, you would typically use encryption libraries or cryptographic
tools available in programming languages like Python, Java, C++, or specialized
cryptographic software like OpenSSL. These libraries and tools offer various encryption
algorithms and methods for secure data encryption.
• Common encryption algorithms include:
1. AES (Advanced Encryption Standard): A widely used symmetric-key encryption
algorithm.
2. RSA: An asymmetric-key encryption algorithm often used for secure communication
and digital signatures.
3. DES (Data Encryption Standard): An older symmetric-key encryption algorithm,
now considered relatively weak.
4. Triple DES (3DES): A symmetric-key encryption algorithm that applies DES three
times for increased security.
5. Blowfish: A symmetric-key encryption algorithm known for its speed and security.
6. Public Key Cryptography: Asymmetric-key encryption algorithms like RSA and
ECC (Elliptic Curve Cryptography) are used for public key encryption.
4. What is the purpose of the "Extract single and multiline comments from C Program"
program in Lex?
Ans:
• A "Extract single and multiline comments from C Program" program written in Lex
serves the purpose of parsing a C programming language source code file and extracting
both single-line comments (e.g., // comment) and multi-line comments (e.g., /* comment
*/) from the code. Here's why such a program can be useful:
1. Documentation: It retrieves comments, which contain explanations and
documentation for the code, aiding developers in understanding the code's
functionality.
2. Code Analysis: By extracting comments, it assists code analysis tools in identifying
code sections that require attention, such as bug reports or code review requests.
3. Documentation Generation: Extracted comments can be used to automatically
generate documentation, improving code documentation and maintainability.
Suggested Reference:
1. https://www.geeksforgeeks.org/caesar-cipher-in-cryptography/
2. https://www.geeksforgeeks.org/lex-program-to-count-the-frequency-of-the-given-word-in-a-file/
Marks
Experiment No - 3
Aim: Implement following Programs Using Lex
a. Convert Roman to Decimal
b. Check whether given statement is
c. Extract html tags from .html file
Date:
Competency and Practical Skills: Understanding of the Lex tool and its usage in compiler design,
understanding of regular expressions and data structures, improving programming skills to
develop programs using the Lex tool
Objectives:
1. To introduce students to Lex tool and its usage in compiler design
2. To provide practical knowledge of regular expressions and their use in pattern matching
3. To enhance students' understanding of data structures such as arrays, lists, and trees
4. To develop students' problem-solving skills in developing and implementing programs
using Lex tool
5. To develop students' debugging skills to identify and resolve program errors and issues
Theory:
❖ LEX:
• Lex is a program that generates lexical analyzers. It is used with a YACC parser generator.
• The lexical analyzer is a program that transforms an input stream into a sequence of tokens.
• It reads the input stream and produces C source code that implements the lexical
analyzer.
• During the first phase the compiler reads the input and converts strings in the source to
tokens.
• With regular expressions we can specify patterns to lex so it can generate code that will
allow it to scan and match strings in the input. Each pattern specified in the input to lex has
an associated action.
• Typically an action returns a token that represents the matched string for subsequent use by
the parser. Initially we will simply print the matched string rather than return a token value.
⮚ Function of LEX:
• Firstly, the programmer writes a lexical analyzer specification lex.l in the Lex language. Then the Lex
compiler runs the lex.l program and produces a C program lex.yy.c.
• Finally, the C compiler runs the lex.yy.c program and produces an object program a.out.
• a.out is a lexical analyzer that transforms an input stream into a sequence of tokens.
• A Lex program is separated into three sections by %% delimiters. The format of Lex source
is as follows:
%{ definitions %}
%%
{ rules }
%%
{ user subroutines }
• Definitions include declarations of constants, variables and regular definitions.
• Rules define statements of the form p1 {action1} p2 {action2} … pn {actionn}.
• Where pi describes the regular expression and actioni describes the action the lexical
analyzer should take when pattern pi matches a lexeme.
• User subroutines are auxiliary procedures needed by the actions. The subroutines can be
compiled separately and loaded with the lexical analyzer.
STEPS:
• Step 1: An input file named lex.l, which describes the lexical analyzer to be generated, is written
in the Lex language. The Lex compiler transforms lex.l into a C program, in a file that is always
named lex.yy.c.
• Step 2: The C compiler compiles the lex.yy.c file into an executable file called a.out.
• Step 3: The executable a.out takes a stream of input characters and produces a stream of
tokens.
• Program Structure:
Rules Section: The rules section contains a series of rules of the form: pattern action. The
pattern must be unindented and the action must begin on the same line in {} brackets. The rule
section is enclosed in “%% %%”.
Syntax:
%%
pattern action
%%
User Code Section: This section contains C statements and additional functions. We can also
compile these functions separately and load them with the lexical analyzer.
Program:
Code:
%{
#include <stdio.h>
#include <stdlib.h>
int decimal = 0;   /* running decimal value */
%}
/* Rules section */
%%
I { decimal += 1; /* 'I' adds 1 */ }
IV { decimal += 4; /* 'IV' adds 4 */ }
V { decimal += 5; /* 'V' adds 5 */ }
IX { decimal += 9; /* 'IX' adds 9 */ }
X { decimal += 10; /* 'X' adds 10 */ }
XL { decimal += 40; /* 'XL' adds 40 */ }
L { decimal += 50; /* 'L' adds 50 */ }
XC { decimal += 90; /* 'XC' adds 90 */ }
C { decimal += 100; /* 'C' adds 100 */ }
CD { decimal += 400; /* 'CD' adds 400 */ }
D { decimal += 500; /* 'D' adds 500 */ }
CM { decimal += 900; /* 'CM' adds 900 */ }
M { decimal += 1000; /* 'M' adds 1000 */ }
\n { return 0; /* end of input */ }
. { printf("Invalid Roman numeral\n"); exit(1); } /* any other symbol is an error */
%%
/* Code section */
int yywrap() { return 1; }
int main()
{
yylex();
printf("Decimal value: %d\n", decimal);
return 0;
}
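Assuming the program is saved as roman.l, it can be built and run in the same way as the other Lex programs in this manual:

$ lex roman.l
$ cc lex.yy.c -o roman
$ echo "XIV" | ./roman
Decimal value: 14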
Code:
%{
#include <stdio.h>
int flag = 0;
%}
%%
and|or|but|because|if|then|nevertheless { flag = 1; }
. ;
\n { return 0; }
%%
int main() {
printf("Enter the sentence:\n");
yylex();
if (flag == 0)
printf("Simple sentence\n");
else
printf("Compound sentence\n");
return 0;
}
int yywrap() {
return 1;
}
Code:
%{
#include<stdio.h>
%}
%%
\<[^>]*\> { fprintf(yyout, "%s\n", yytext); }
.|\n ;
%%
int yywrap()
{
return 1;
}
int main()
{
yyin = fopen("html_page.html", "r");
yylex();
return 0;
}
Program-1:
A Lex program efficiently converts Roman numerals to decimal values by matching each
numeral symbol to its corresponding decimal equivalent while identifying and handling invalid
symbols. Lex simplifies this conversion process by defining rules for recognition and
conversion, making it a convenient tool for this task.
Program-2:
Program-3:
The Lex program is crafted to identify and extract HTML tags from an input HTML file. It
relies on regular expressions to spot HTML tags enclosed within angle brackets ("<" and ">")
and prints each matched tag, while discarding all other text, including tabs, spaces, and
newline characters, from the input data stream.
Quiz:
1. What is Lex tool?
• Lex is a program that generates lexical analyzer. It is used with a YACC parser generator.
• The lexical analyzer is a program that transforms an input stream into a sequence of
tokens.
• It reads the input stream and produces the source code as output through implementing
the lexical analyzer in the C program.
4. How does the program check whether a given statement is compound or simple?
• The program scans the sentence for conjunctions such as "and", "or", "but", "because",
"if", "then" and "nevertheless". If any of these is found, the flag is set and the statement
is reported as compound; otherwise it is reported as simple.
5. What is the purpose of the program to extract HTML tags from an HTML file?
• The purpose of the program is to extract all HTML markup tags from an HTML file,
discarding the plain text content. This is useful for tasks that analyze the structure of an
HTML document, such as checking tag usage or collecting the set of tags present,
without interference from the document's textual content.
Suggested Reference:
1. Aho, A.V., Sethi, R., & Ullman, J.D. (1986). Compilers: Principles, Techniques, and
Tools.
Addison-Wesley.
2. Levine, J.R., Mason, T., & Brown, D. (2009). lex & yacc. O'Reilly Media, Inc.
3. Lex - A Lexical Analyzer Generator. Retrieved
from https://www.gnu.org/software/flex/manual/
4. Lexical Analysis with Flex. Retrieved from https://www.geeksforgeeks.org/flex-fast-
lexicalanalyzer-generator/
5. The Flex Manual. Retrieved from https://westes.github.io/flex/manual/
Marks
Experiment No - 4
Aim: Introduction to YACC and generate Calculator Program
Date:
Objectives:
By the end of this experiment, the students should be able to:
⮚ Understand the concept of YACC and its significance in compiler construction
⮚ Write grammar rules for a given language
⮚ Implement a calculator program using YACC
Theory:
YACC (Yet Another Compiler Compiler) is a tool that is used for generating parsers. It is used
in combination with Lex to generate compilers and interpreters. YACC takes a set of rules and
generates a parser that can recognize and process the input according to those rules.
The grammar rules that are defined using YACC are written in BNF (Backus-Naur Form)
notation. These rules describe the syntax of a programming language.
INPUT FILE:
→ The YACC input file is divided into three parts.
/* definitions */
....
%%
/* rules */
....
%%
/* auxiliary routines */
....
Definition Part:
→ The definition part includes information about the tokens used in the syntax definition.
Rule Part:
→ The rules part contains grammar definitions in a modified BNF form. Actions are C code in { }
and can be embedded inside (translation schemes), as in the sketch below.
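For instance, a rule with an embedded action computing the value of an addition might look like the following (illustrative names):

expr : expr '+' term   { $$ = $1 + $3; }
     | term            { $$ = $1; }
     ;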
Auxiliary Routines Part:
→ The auxiliary routines part holds the supporting C functions (such as main() and yyerror()) needed by the parser.
The program for generating a calculator using YACC involves the following steps:
⮚ Defining the grammar rules for the calculator program
⮚ Writing the Lex code for tokenizing the input
⮚ Writing the YACC code for parsing the input and generating the output
Program:
%{
#include<stdio.h>
#include<ctype.h>
#include "y.tab.h"   /* token definitions (INTEGER, EOL) generated by YACC */
%}
%%
[0-9]+ { yylval = atoi(yytext); return INTEGER; }
[ \t] ; /* skip whitespace */
\n { return EOL; }
. { return yytext[0]; }
%%
int yywrap(void) {
return 1;}
%{
#include<stdio.h>
%}
%%
line: /* empty */
| line exp EOL { printf("= %d\n", $2); }
;
%%
int main(void) {
yyparse();
return 0;}
void yyerror(char* s) {
fprintf(stderr, "error: %s\n", s);
}
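The exp productions themselves do not appear in the fragment above. A minimal sketch of the complete grammar file, consistent with the observations below and assuming the INTEGER and EOL tokens returned by the Lex part, might be:

%{
#include <stdio.h>
int yylex(void);
void yyerror(char *);
%}
%token INTEGER EOL
%left '+' '-'
%left '*' '/'
%nonassoc UMINUS
%%
line: /* empty */
    | line exp EOL          { printf("= %d\n", $2); }
    ;
exp : INTEGER               { $$ = $1; }
    | exp '+' exp           { $$ = $1 + $3; }
    | exp '-' exp           { $$ = $1 - $3; }
    | exp '*' exp           { $$ = $1 * $3; }
    | exp '/' exp           { $$ = $1 / $3; }
    | '-' exp %prec UMINUS  { $$ = -$2; }   /* unary minus */
    | '(' exp ')'           { $$ = $2; }
    ;
%%
/* main() and yyerror() as shown above */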
After executing the program, we observed that the calculator program was successfully
generated using YACC. It was able to perform simple arithmetic operations such as addition,
subtraction, multiplication, and division. The program was also able to handle negative
numbers and brackets.
Quiz:
1. What is YACC?
• YACC, short for Yet Another Compiler Compiler, is a tool used for generating parsers
and compilers based on formal grammars.
Suggested Reference:
1. "Lex & Yacc" by John R. Levine, Tony Mason, and Doug Brown
2. "The Unix Programming Environment" by Brian W. Kernighan and Rob
https://www.geeksforgeeks.org/introduction-to-yacc/
https://www.geeksforgeeks.org/yacc-program-to-implement-a-calculator-and-recognize-a-
valid-arithmetic-expression/
Each rubric is graded Good (2) or Avg. (1).
Marks
Experiment No - 5
Aim: Implement a program for constructing
a. LL(1) Parser
b. Predictive Parser
Date:
Objectives:
By the end of this experiment, the students should be able to:
⮚ Understand the concept of parsers and their significance in compiler construction
⮚ Write the FIRST and FOLLOW sets for a given grammar
⮚ Implement LL(1) and predictive parsing for a grammar using a top-down parser
Software/Equipment: C compiler
Theory:
❖ LL(1) Parsing: Here the first L represents that the scanning of the input will be done
from left to right, and the second L shows that this parsing technique uses the leftmost
derivation tree. Finally, the 1 represents the number of look-ahead symbols, i.e. how many
symbols the parser examines when making a decision.
Essential conditions to check first are as follows:
1. The grammar is free from left recursion.
2. The grammar should not be ambiguous.
3. The grammar has to be left factored so that it becomes deterministic.
These conditions are necessary but not sufficient for a grammar to be LL(1).
In the parsing table, rows contain the non-terminals and columns contain the terminal
symbols. All the null productions of the grammar go under the Follow elements and
the remaining productions lie under the elements of the First set.
❖ Predictive Parser
Predictive parser is a recursive descent parser, which has the capability to predict which
production is to be used to replace the input string. The predictive parser does not suffer from
backtracking.
To accomplish its tasks, the predictive parser uses a look-ahead pointer, which points to the
next input symbols. To make the parser back-tracking free, the predictive parser puts some
constraints on the grammar and accepts only a class of grammar known as LL(k) grammar.
Predictive parsing uses a stack and a parsing table to parse the input and generate a parse tree.
Both the stack and the input contains an end symbol $ to denote that the stack is empty and the
input is consumed. The parser refers to the parsing table to take any decision on the input and
stack element combination.
In recursive descent parsing, the parser may have more than one production to choose from for
a single instance of input, whereas in a predictive parser, each step has at most one production
to choose. There might be instances where there is no production matching the input string,
causing the parsing procedure to fail.
Program-1:
#include<stdio.h>
#include<string.h>
add_FIRST_A_to_FOLLOW_B(sc, nt);
if (first[sc - 'A']['^'])
continue;
}
else {
follow[nt - 'A'][sc] = 1;
}
break;
}
if (x == pro[i].len)
add_FOLLOW_A_to_FOLLOW_B(pro[i].str[0]
, nt);
}
}
}
}
}
}
void add_FIRST_A_to_FIRST_B(char A, char B) {
int i;
for (i = 0; i < TSIZE; ++i) {
if (i != '^') {
first[B - 'A'][i] = first[A - 'A'][i] || first[B -
'A'][i];
}
}
}
void FIRST() {
int i, j;
int t = 0;
while (t < no_pro) {
for (i = 0; i < no_pro; ++i) {
for (j = 3; j < pro[i].len; ++j) {
char sc = pro[i].str[j];
if (isNT(sc)) {
add_FIRST_A_to_FIRST_B(sc, pro[i].str[0]);
if (first[sc - 'A']['^'])
continue;
}
else {
first[pro[i].str[0] - 'A'][sc] = 1;
}
break;
}
if (j == pro[i].len)
first[pro[i].str[0] - 'A']['^'] = 1;
}
++t;
}
}
}
}
// display follow of each variable
printf("\n");
for (i = 0; i < no_pro; ++i) {
if (i == 0 || (pro[i - 1].str[0] != pro[i].str[0])) {
char c = pro[i].str[0];
printf("FOLLOW OF %c: ", c);
for (j = 0; j < TSIZE; ++j) {
if (follow[c - 'A'][j]) {
printf("%c ", j);
}
}
printf("\n");
}
}
// display FIRST of each production's RHS β
// in form A->β
printf("\n");
for (i = 0; i < no_pro; ++i) {
printf("FIRST OF %s: ", pro[i].str);
for (j = 0; j < TSIZE; ++j) {
if (first_rhs[i][j]) {
printf("%c ", j);
}
}
printf("\n");
}
terminal['$'] = 1;
terminal['^'] = 0;
// printing parse table
printf("\n");
printf("\n\t**************** LL(1) PARSING TABLE *******************\n");
printf("\t--------------------------------------------------------\n");
printf("%-10s", "");
for (i = 0; i < TSIZE; ++i) {
if (terminal[i]) printf("%-10c", i);
}
printf("\n");
int p = 0;
for (i = 0; i < no_pro; ++i) {
if (i != 0 && (pro[i].str[0] != pro[i - 1].str[0]))
p = p + 1;
for (j = 0; j < TSIZE; ++j) {
if (first_rhs[i][j] && j != '^') {
table[p][j] = i + 1;
}
else if (first_rhs[i]['^']) {
for (k = 0; k < TSIZE; ++k) {
if (follow[pro[i].str[0] - 'A'][k]) {
table[p][k] = i + 1;
}
}
}
}
}
k = 0;
for (i = 0; i < no_pro; ++i) {
if (i == 0 || (pro[i - 1].str[0] != pro[i].str[0])) {
printf("%-10c", pro[i].str[0]);
for (j = 0; j < TSIZE; ++j) {
if (table[k][j]) {
printf("%-10s", pro[table[k][j] - 1].str);
}
else if (terminal[j]) {
printf("%-10s", "");
}
}
++k;
printf("\n");
}
}
}
Program-2:
#include <stdio.h>

char input[100];
int pos = 0;

// Functions to parse the non-terminals S and L
void S();
void L();

int main()
{
    printf("Enter the input string: ");
    scanf("%99s", input);   /* safer replacement for gets() */

    // Start parsing with non-terminal S
    S();

    // Check if the entire string is parsed
    if (input[pos] == '\0')
    { printf("\nParsing Successful!\n"); }
    else
    { printf("\nParsing Failed!\n"); }
    return 0;
}

void S() {
    if (input[pos] == '(')
    {
        pos++;
        L();
        if (input[pos] == ')') {
            pos++;
        } else {
            printf("Error: Expected ')'\n");
            return;
        }
    }
    else if (input[pos] == 'a') {
        pos++;
    }
    else
    {
        printf("Error: Invalid input\n");
        return;
    }
}

void L()
{
    S();
    if (input[pos] == ',')
    { pos++; L(); }
}
In the above example, the grammar is given as input and the FIRST and FOLLOW sets of the
non-terminals are identified. Further, the LL(1) parsing table is constructed.
Program-2:
S → (L) | a
L → SL | ε
The code is a simple demonstration of how a recursive-descent parser works for a specific context-
free grammar. It's important to note that the code only handles a limited grammar, and for more
complex languages, more sophisticated parsing techniques such as LL or LR parsing may be required.
Quiz:
4. How do you calculate the FIRST() and FOLLOW() sets used in parsing table construction?
5. Name the most powerful parser.
Calculating the FIRST() and FOLLOW() sets in the context of constructing a parsing table
involves a systematic process that aids in the parsing of context-free grammars. Here are the
basic steps for each:
1. FIRST() Set Calculation:
• Initialize the FIRST() set for each non-terminal symbol to an empty set.
• For each production A → α, where α is a string of terminals and nonterminals:
• If α starts with a terminal, add it to FIRST(A).
• If α starts with a non-terminal B, add FIRST(B) to FIRST(A) and continue until a
terminal is found.
• If α derives the empty string, add ε to FIRST(A).
• Repeat the process until no further changes occur.
2. FOLLOW() Set Calculation:
• Place $ in FOLLOW(S), where S is the start symbol.
• For each production A → αBβ, add everything in FIRST(β) except ε to FOLLOW(B).
• For each production A → αB, or A → αBβ where FIRST(β) contains ε, add FOLLOW(A)
to FOLLOW(B).
• Repeat the process until no further changes occur.
5. The most powerful of these parsers is the canonical LR(1) (CLR) parser, since it handles
the largest class of grammars among LL(1), SLR, LALR and CLR.
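As a concrete illustration of these rules (a standard textbook grammar, not one of the lab exercises), consider:

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

Applying the steps above gives:
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }    FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }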
Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft, Rajeev
Motwani, and Jeffrey D. Ullman.
2. Geeks for geeks: https://www.geeksforgeeks.org/construction-of-ll1-parsing-
table/
3. http://www.cs.ecu.edu/karl/5220/spr16/Notes/Top-down/LL1.html
https://slaystudy.com/ll1-parsing-table-program-in-c/
Marks
Experiment No - 06
Aim: Implement a program for constructing
a. Recursive Descent Parser (RDP)
b. LALR Parser
Date:
Objectives:
By the end of this experiment, the students should be able to:
⮚ Understand the RDP, the broad classification of bottom-up parsers, and their significance in
compiler construction
⮚ Verify whether a string is accepted by the RDP and parse a given grammar using LR
parsers
⮚ Implement an RDP and an LALR parser
Software/Equipment: C compiler
Theory:
❖ Recursive Descent Parser:
Recursive Descent Parser uses the technique of Top-Down Parsing without backtracking. It
can be defined as a parser that uses various recursive procedures to process the input string
with no backtracking. It can be implemented simply in any language supporting recursion. The
first symbol of the R.H.S. of a production uniquely determines the correct alternative to choose.
The major approach of recursive-descent parsing is to relate each non-terminal with a
procedure. The objective of each procedure is to read a sequence of input characters that can
be produced by the corresponding non-terminal, and return a pointer to the root of the parse
tree for the non-terminal. The structure of the procedure is prescribed by the productions for
the equivalent non-terminal.
The recursive procedures are simple to write and adequately effective if written in a
language that implements procedure calls efficiently. There is a procedure for each non-
terminal in the grammar. It maintains a global variable lookahead, holding the current input
token, and a procedure match (expected token) performs the action of recognizing the next token in
the parsing process and advancing the input stream pointer, such that lookahead points to the
next token to be parsed. match() is effectively a call to the lexical analyzer to get the next
token.
For example, input stream is a + b$.
lookahead == a
match()
lookahead == +
match ()
lookahead == b
……………………….
……………………….
In this manner, parsing can be done.
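A minimal C sketch of this lookahead/match mechanism (illustrative, not one of the lab programs) is:

#include <stdio.h>

char lookahead;                     /* the current input token */

void match(char expected)
{
    if (lookahead == expected)
        lookahead = getchar();      /* advance the input pointer */
    else
        printf("syntax error: expected %c\n", expected);
}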
❖ LALR (1) Parsing:
LALR refers to the lookahead LR. To construct the LALR (1) parsing table, we use
the canonical collection of LR (1) items.
In the LALR (1) parsing, the LR (1) items which have same productions but different
look ahead are combined to form a single set of items
LALR (1) parsing is the same as CLR (1) parsing; the only difference is in the parsing table.
Example
S → AA
A → aA
A→b
Add Augment Production, insert '•' symbol at the first position for every production in
G and also add the look ahead.
S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I0 State:
Add the Augment production to the I0 State and compute the Closure:
I0 = Closure (S` → •S)
Add all productions starting with S in to I0 State because "•" is followed by the non-
terminal. So, the I0 State becomes
I0 = S` → •S, $
S → •AA, $
Add all productions starting with A in modified I0 State because "•" is followed by
the non-terminal. So, the I0 State becomes.
I0= S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I1= Go to (I0, S) = closure (S` → S•, $) = S` → S•, $
I2= Go to (I0, A) = closure ( S → A•A, $ )
Add all productions starting with A in I2 State because "•" is followed by the non-
terminal. So, the I2 State becomes
I2= S → A•A, $
A → •aA, $
A → •b, $
I3= Go to (I0, a) = Closure ( A → a•A, a/b )
Add all productions starting with A in I3 State because "•" is followed by the non-
terminal. So, the I3 State becomes
I3 = A → a•A, a/b
A → •aA, a/b
A → •b, a/b
Program-1:
#include<stdio.h>
#include<string.h>
#include<ctype.h>
char input[10];
int i,error;
void E();
void T();
void Eprime();
void Tprime();
void F();
int main()
{
i=0;
error=0;
printf("Enter an arithmetic expression : "); // Eg: a+a*a
scanf("%9s", input);   /* safer replacement for gets() */
E();
if(strlen(input)==i&&error==0)
printf("\nAccepted..!!!\n");
else printf("\nRejected..!!!\n");
}
void E()
{
T();
Eprime();
}
void Eprime()
{
if(input[i]=='+')
{
i++;
T();
Eprime();
}
}
void T()
{
F();
Tprime();
}
void Tprime()
{
if(input[i]=='*')
{
i++;
F();
Tprime();
}
}
void F()
{
if(isalnum(input[i]))i++;
else if(input[i]=='(')
{
i++;
E();
if(input[i]==')')
i++;
else error=1;
}
else error=1;
}
Program-2:
• Calculator.l
%{
#include<stdio.h>
#include<ctype.h>
#include "calculator.tab.h"
%}
%%
[0-9]+ { yylval = atoi(yytext); return INTEGER; }
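The remaining rules of Calculator.l were lost at the page break; a minimal sketch of the missing tail, following the same pattern as the Experiment 4 scanner, would be:

[ \t]   ;                /* skip whitespace */
\n      { return EOL; }
.       { return yytext[0]; }
%%
int yywrap(void) { return 1; }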
• Calculator.y:
%{
#include <stdio.h>
extern int yylex();
extern void yyerror(char *);
%}
int main(void)
{ yyparse();
return 0;
}
void yyerror(char* s)
{
fprintf(stderr, "error: %s\n", s);
}
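Calculator.y above lacks its token declarations and grammar rules, which were lost in the original. A minimal sketch of the missing middle section (assuming the INTEGER and EOL tokens produced by Calculator.l) is:

%token INTEGER EOL
%left '+' '-'
%left '*' '/'
%%
line : /* empty */
     | line expr EOL   { printf("= %d\n", $2); }
     ;
expr : INTEGER         { $$ = $1; }
     | expr '+' expr   { $$ = $1 + $3; }
     | expr '-' expr   { $$ = $1 - $3; }
     | expr '*' expr   { $$ = $1 * $3; }
     | expr '/' expr   { $$ = $1 / $3; }
     | '(' expr ')'    { $$ = $2; }
     ;
%%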
Program -1:
In the above output, as per the grammar provided, the input string is parsed by the
corresponding calling procedures; the strings that are successfully parsed are accepted and the
others are rejected.
Program-2:
The code defines a simple calculator that can perform basic arithmetic operations on integers. It supports
addition, subtraction, multiplication, and division, along with parentheses for grouping expressions. You
can compile the code using appropriate tools for Lex and Yacc, such as Flex and Bison. Once compiled,
you can run the executable and input mathematical expressions to get the corresponding results.
Quiz:
1. What do you mean by shift reduce parsing?
• Shift-reduce parsing is a bottom-up parsing technique where the parser shifts the input onto
a stack until it can reduce it according to the production rules of a grammar. It involves
shifting symbols onto a stack and reducing them according to the grammar rules until the
entire input is parsed.
3. Adjust the transitions in the parsing table accordingly, ensuring that transitions from
the merged item sets are appropriately updated to reflect the merged state.
4. Update the lookahead sets for the merged item sets to ensure that the parser can
handle the combined grammar rules accurately.
5. Reconstruct the parsing table based on the updated item sets and transitions, taking
care to resolve any potential conflicts that may arise during the merging process.
• By following these steps, you can effectively merge item sets and streamline the
construction of parsing tables, facilitating more efficient and optimized parsing of input
strings based on the given grammar rules.
Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft,
Rajeev Motwani, and Jeffrey D. Ullman.
2. Geeks for geeks: https://www.geeksforgeeks.org/recursive-descent-parser/
3. https://www.youtube.com/watch?v=odoHgcoombw
4. https://www.geeksforgeeks.org/lalr-parser-with-examples/
Marks
Experiment No - 07
Aim: Implement a program for constructing Operator Precedence Parsing.
Date:
Objectives:
By the end of this experiment, the students should be able to:
⮚ Understand the concept of OPG and its significance in compiler construction
⮚ Write precedence relations for a grammar
⮚ Implement an OPG checker in C
Software/Equipment: C compiler
Theory:
Operator Precedence Parsing is also a type of Bottom-Up Parsing that can be applied to a class of
Grammars known as Operator Grammars.
A Grammar G is Operator Grammar if it has the following
properties −
● Production should not contain ϵ on its right side.
● There should not be two adjacent non-terminals at the right side of production.
Example 1 − Verify whether the following Grammar is an operator Grammar or not.
E → E A E |(E)|id
A → +| − | ∗
Solution
No, it is not an operator Grammar, as it does not satisfy property 2 of operator Grammars:
it contains two adjacent non-terminals on the R.H.S. of the production E → E A E.
We can convert it into the operator Grammar by substituting the value of A in E → E A E.
E → E + E |E − E |E * E |(E) | id.
Operator Precedence Relations
Three precedence relations exist between the pair of terminals.
Relation    Meaning
a ⋖ b       a yields precedence to b
a ≐ b       a has the same precedence as b
a ⋗ b       a takes precedence over b
Program:
#include<stdlib.h>
#include<stdio.h>
#include<string.h>
#include<ctype.h>

char grm[20][20], c;
int i, n, j, flag = 0;

/* report failure and stop */
void f()
{
printf("Not operator grammar\n");
exit(0);
}

void main()
{
printf("Enter the number of productions: ");
scanf("%d", &n);
for (i = 0; i < n; i++)
scanf("%s", grm[i]);               /* productions of the form A=A+A */
for (i = 0; i < n; i++) {
j = 2;                             /* the RHS begins after "X=" */
c = grm[i][j];
while (c != '\0') {
/* two adjacent non-terminals violate the operator grammar rules */
if (isupper(c) && isupper(grm[i][j + 1])) {
flag = 0;
f();
}
else
flag = 1;
/* epsilon ('$') on the right side is not allowed */
if (c == '$') {
flag = 0;
f();
}
c = grm[i][++j];
}
}
if (flag == 1)
printf("Operator grammar");
}
In the above example, the grammar is analysed as per the operator grammar rules and the input
violates the rules of OPG, so it is not an operator grammar.
Input 2:
A=A/A
B=A+A
In the above example, the grammar is analysed as per the operator grammar rules and the input
satisfies the rules of OPG (an operator is present between every two operands), so it is an
operator grammar.
Quiz:
1. Define operator grammar.
• An operator grammar is a type of context-free grammar that defines the syntax rules for
expressions involving operators. It specifically focuses on the rules governing the usage
and combination of operators within expressions. Operator grammars are commonly used
in the context of defining the syntax of programming languages, mathematical
expressions, and formal language specifications. They typically specify the valid
arrangements of operators and operands, along with any associated precedence and
associativity rules. The grammar outlines how operators can be combined to form valid
expressions, providing a foundation for parsing and evaluating complex expressions in
various computational contexts.
Suggested Reference:
1. https://www.gatevidyalay.com/operator-precedence-parsing/
2. https://www.geeksforgeeks.org/role-of-operator-precedence-parser/
Marks
Experiment No - 08
Aim: Generate 3-tuple intermediate code for given infix expression.
Date:
Objectives:
By the end of this experiment, the students should be able to:
⮚ Understand the different intermediate code representations and its significance in
compiler construction
⮚ Write intermediate code for given infix expression
Software/Equipment: C compiler
Theory:
Three address code is a type of intermediate code which is easy to generate and can be easily
converted to machine code. It makes use of at most three addresses and one operator to
represent an expression and the value computed at each instruction is stored in temporary
variable generated by compiler. The compiler decides the order of operation given by three
address code.
1. Quadruple – This representation divides each instruction into four fields: op, arg1, arg2 and
result, where result holds the compiler-generated temporary that receives the value.
Example – Consider the expression a = b * – c + b * – c. The three address code is:
t1 = minus c
t2 = b * t1
t3 = minus c
t4 = b * t3
t5 = t2 + t4
a = t5
2. Triples – This representation doesn't make use of an extra temporary variable to represent a
single operation; instead, when a reference to another triple's value is needed, a pointer to that
triple is used. So, it consists of only three fields, namely op, arg1 and arg2.
Disadvantage –
● Temporaries are implicit and difficult to rearrange code.
● It is difficult to optimize because optimization involves moving intermediate code.
When a triple is moved, any other triple referring to it must be updated also. With
help of pointer one can directly access symbol table entry.
Example – Consider expression a = b * – c + b * – c
3. Indirect Triples – This representation makes use of pointers to a separate listing of all references
to computations, which is made separately and stored. It is similar in utility to the quadruple
representation but requires less space. Temporaries are implicit and it is easier to
rearrange code.
Example – Consider expression a = b * – c + b * – c
Question – Write quadruple, triples and indirect triples for following expression : (x + y) * (y
+ z) + (x + y + z)
Explanation – The three address code is:
t1 = x + y
t2 = y + z
t3 = t1 * t2
t4 = t1 + z
t5 = t3 + t4
Program:
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
/* Globals assumed from the portion of the listing lost in the original:
   j holds the expression, a and b hold operand strings, c numbers the
   temporaries, sw lists the operators and p gives their precedences. */
char j[50], a[10], b[10], ch[5];
char sw[5] = { '=', '+', '-', '*', '/' };
int p[5] = { 0, 1, 1, 2, 2 };
int i, c = 1, k, l, m, pi;
void dove(int i);
void small();

void main()
{
printf("Enter the expression:");
scanf("%s",j);
printf("\tThe Intermediate code is:\n");
small();
}
void dove(int i)
{
/* a and b are assumed to hold the operand strings on either side of the
   operator j[i]; the code that filled them was lost from the listing */
if(j[i]=='*')
printf("\tt%d=%s*%s\n",c,a,b);
if(j[i]=='/')
printf("\tt%d=%s/%s\n",c,a,b);
if(j[i]=='+')
printf("\tt%d=%s+%s\n",c,a,b);
if(j[i]=='-')
printf("\tt%d=%s-%s\n",c,a,b);
if(j[i]=='=')
printf("\t%c=t%d",j[i-1],--c);
sprintf(ch,"%d",c);
j[i]=ch[0];
c++;
small();
}
void small()
{
pi=0;l=0;
for(i=0;i<strlen(j);i++)
{
for(m=0;m<5;m++)
if(j[i]==sw[m])
if(pi<=p[m])
{
pi=p[m];
l=1;
k=i;
}}
if(l==1)
dove(k);
else
exit(0);}
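The listing above survives only in part: its declarations and the operand-handling half of dove() were lost. A compact, self-contained C sketch of the same idea — repeatedly reducing the highest-precedence operator and emitting one temporary per reduction; all names are illustrative — is:

#include <stdio.h>
#include <string.h>

char expr[40];
int temp = 1;

/* precedence: * and / bind tighter than + and - */
int prec(char op) {
    if (op == '*' || op == '/') return 2;
    if (op == '+' || op == '-') return 1;
    return 0;
}

int main(void) {
    printf("Enter the expression: ");
    scanf("%39s", expr);
    printf("\tThe Intermediate code is:\n");
    for (;;) {
        int best = -1, i;
        /* pick the leftmost operator of highest precedence */
        for (i = 0; expr[i]; i++)
            if (prec(expr[i]) > (best < 0 ? 0 : prec(expr[best])))
                best = i;
        if (best < 0) break;                 /* only '=' (or nothing) is left */
        printf("\tt%d = %c %c %c\n", temp, expr[best - 1], expr[best], expr[best + 1]);
        expr[best - 1] = '0' + temp;         /* the temporary's digit replaces the subexpression */
        memmove(&expr[best], &expr[best + 2], strlen(&expr[best + 2]) + 1);
        temp++;
    }
    if (expr[1] == '=')                      /* emit the final assignment */
        printf("\t%c = t%d\n", expr[0], temp - 1);
    return 0;
}

For the input a=b*c+d this prints t1 = b * c, then t2 = 1 + d (where the digit 1 stands for t1), and finally a = t2; it supports up to nine temporaries because each temporary is named by a single digit.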
Observations and Conclusion:
In the above example, the user is asked to enter an infix expression and the generated
intermediate (three-address) code is printed as output.
Quiz:
1. What are the different implementation methods for three-address code?
• Three-address code is a form of intermediate code that represents the code in a simple
and linear form, using at most three operands per instruction. There are several
implementation methods for generating and representing three-address code, including:
• Quadruples: Quadruples represent three-address code using four fields: an operation
code, two source operands, and a result operand. It is a straightforward way to represent
instructions and is commonly used in compilers.
• Triples: Triples are similar to quadruples but use only three fields: an operation code and
two operands. They can be converted into quadruples by adding an extra field for the
result.
• Indirect triples: Indirect triples keep a separate list of pointers to triples, so statements
can be reordered by rearranging the pointer list, which gives more flexibility in handling
complex expressions.
• Syntax trees: Syntax trees can be used to represent three-address code. They are
hierarchical structures that break down the code into a more abstract and visual
representation, enabling easier manipulation and translation into other forms of code.
• These methods help in the conversion of high-level languages into low-level languages,
making it easier to perform optimizations and generate efficient machine code during the
compilation process. They are crucial for facilitating the translation and optimization
phases in the design and implementation of compilers and interpreters.
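As an illustration of the first two representations, here is a minimal sketch in C (the type and
field names are illustrative, not taken from the manual):

#include <stdio.h>

/* Quadruple: the result is named explicitly. */
struct Quad {
    char op;           /* operator, e.g. '+' or '*'             */
    char arg1[8];      /* first source operand                  */
    char arg2[8];      /* second source operand                 */
    char result[8];    /* explicit temporary holding the value  */
};

/* Triple: no result field; other triples refer to this one by its
   index, written here as a string such as "(0)". */
struct Triple {
    char op;
    char arg1[8];      /* a name like "x" or a reference "(0)"  */
    char arg2[8];
};

int main(void)
{
    struct Quad   q = { '+', "x", "y", "t1" };   /* t1 = x + y */
    struct Triple t = { '*', "(0)", "(1)" };     /* (0) * (1)  */
    printf("quad:   (%c, %s, %s, %s)\n", q.op, q.arg1, q.arg2, q.result);
    printf("triple: (%c, %s, %s)\n", t.op, t.arg1, t.arg2);
    return 0;
}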
• Intermediate representations such as these act as a bridge between the source code and the
various phases of the compiler, such as parsing, semantic analysis, optimization, and code
generation. They enable efficient manipulation and analysis of code, making it easier to
implement various program analysis and transformation techniques.
Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft,
Rajeev Motwani, and Jeffrey D. Ullman.
2. https://www.geeksforgeeks.org/introduction-to-intermediate-representationir/
3. https://cs.lmu.edu/~ray/notes/ir/
Marks
Experiment No - 09
Aim: Extract Predecessor and Successor from given Control Flow Graph.
Date:
Objectives:
By the end of this experiment, the students should be able to:
⮚ Understand the concept of control structure (basic blocks) in a compiler
⮚ Write the predecessors and successors for a given graph.
First, we compute the basic blocks (which is already done above). Second, we assign the
control flow information.
Program:
// C++ program to find predecessor and successor in a
BST#include<iostream>
usingnamespacestd;
// BST Node
structNode
{
intkey;
structNode*left,*right;
};
// If key is present at
rootif(root->key
==key)
{
// the maximum value in left subtree is
predecessorif(root->left !=NULL)
66
Compiler Design (3170701) 210210107052
{
Node* tmp = root-
>left;while(tmp-
>right)
tmp = tmp-
>right;pre=tmp ;
}
}
return;
}
67
Compiler Design (3170701) 210210107052
Node *root =
NULL;root =
insert(root,
50);insert(root,30);
insert(root,20);
insert(root,40);
insert(root,75);
insert(root,60);
insert(root,80);
if(suc!=NULL)
cout << "Successor is " << suc-
68
Compiler Design (3170701) 210210107052
>key;else
cout << "No
Successor";return0;
}
Observations and Conclusion:
In the above example, the user obtains the predecessor and successor of a given node.
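Relating this back to the experiment's aim, the following minimal C sketch extracts the
predecessors and successors of each basic block from a control flow graph (the four-block
graph and its edge matrix are assumed purely for illustration):

#include <stdio.h>

#define N 4   /* number of basic blocks in the assumed example */

/* adjacency matrix: edge[i][j] = 1 means block Bi can transfer
   control to block Bj */
int edge[N][N] = {
    {0, 1, 0, 0},   /* B0 -> B1           */
    {0, 0, 1, 1},   /* B1 -> B2, B1 -> B3 */
    {0, 1, 0, 0},   /* B2 -> B1 (loop)    */
    {0, 0, 0, 0},   /* B3: exit block     */
};

int main(void)
{
    for (int b = 0; b < N; b++) {
        printf("B%d successors:", b);
        for (int s = 0; s < N; s++)
            if (edge[b][s]) printf(" B%d", s);
        printf("   predecessors:");
        for (int p = 0; p < N; p++)
            if (edge[p][b]) printf(" B%d", p);
        printf("\n");
    }
    return 0;
}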
Quiz:
1. What is flowgraph?
• A flowgraph, also known as a control flow graph, is a graphical representation of the
control flow or execution flow of a program. It depicts the sequence of instructions and
the paths that the program can take during its execution. In a flowgraph, nodes represent
individual instructions or basic blocks, and directed edges between nodes represent the
possible control transfers between these instructions or blocks.
• Key characteristics of a flowgraph include:
• Nodes: Nodes represent specific program statements, basic blocks, or individual
instructions.
• Edges: Edges between nodes represent the control flow between different parts of the
program, illustrating the possible paths that the program can follow during execution.
• Flowgraphs are essential for program analysis, optimization, and understanding the
behavior of a program during its execution. They are commonly used in various stages
of the compilation process and play a vital role in identifying control dependencies,
optimizing code, and detecting potential issues within the program's control flow.
2. Define DAG.
• DAG stands for Directed Acyclic Graph. It is a finite directed graph that contains no
directed cycles: it is not possible to traverse the graph and return to the same node by
following the direction of the edges.
• Key characteristics of a DAG include:
• Directed Edges: Edges in a DAG have a specific direction associated with them,
indicating the relationship between the nodes.
• Acyclic Property: DAGs do not contain any directed cycles.
• DAGs find applications in various fields, including computer science, mathematics, and
scheduling algorithms. They are commonly used for representing dependencies between
tasks, scheduling jobs, representing arithmetic expressions, and optimizing computations.
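As a small illustration (the node layout is a hypothetical sketch, not code from the manual),
the expression a = b * – c + b * – c from Experiment 8 can be represented as a DAG in which
the common subexpression b * – c is a single shared node:

#include <stdio.h>

/* A DAG node for an expression: leaves carry a name, interior
   nodes carry an operator and point to their operand nodes. */
struct DagNode {
    char op;                      /* 0 for a leaf                */
    const char *name;             /* identifier, for leaves only */
    struct DagNode *lhs, *rhs;    /* operand nodes               */
};

int main(void)
{
    struct DagNode b    = { 0,   "b", NULL, NULL };
    struct DagNode c    = { 0,   "c", NULL, NULL };
    struct DagNode neg  = { '-', NULL, &c,   NULL };  /* uminus c */
    struct DagNode mul  = { '*', NULL, &b,   &neg };  /* b * -c   */
    /* '+' points at the SAME mul node twice; this sharing is what
       makes the structure a DAG rather than a tree. */
    struct DagNode plus = { '+', NULL, &mul, &mul };
    printf("root op: %c, shared operand op: %c\n", plus.op, plus.lhs->op);
    return 0;
}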
69
Compiler Design (3170701) 210210107052
3. Define Backpatching
• Backpatching is a technique used in compilers and interpreters to handle the translation
of high-level programming constructs into lower-level code. It involves delaying the
generation of certain code instructions until additional information becomes available.
Specifically, it is used for filling in the target addresses of control flow statements such
as jumps and branches after the addresses become known during the code generation
process.
• Key points about backpatching:
• Delayed Address Resolution: Backpatching delays the assignment of actual addresses or
locations for target instructions until all the necessary information is available.
• Temporary Placeholder: During the compilation or interpretation process, backpatching
may use temporary placeholders for the addresses, which are later updated with the
correct target addresses.
• Optimization and Efficiency: Backpatching aids in code optimization and efficiency by
allowing the compiler or interpreter to handle control flow instructions more effectively.
• Overall, backpatching is an essential technique for managing control flow statements
efficiently during the translation of high-level code into lower-level machine code,
helping to improve the overall performance and efficiency of the compilation process.
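A minimal sketch of the idea in C (the names makelist, merge and backpatch follow the usual
textbook presentation; the target array is an assumed stand-in for real generated code):

#include <stdio.h>
#include <stdlib.h>

#define MAXINSTR 100
int target[MAXINSTR];        /* jump target of each emitted goto */

typedef struct ListNode {    /* list of instruction indices whose */
    int instr;               /* targets still need filling in     */
    struct ListNode *next;
} ListNode;

/* new list holding just instruction i */
ListNode *makelist(int i)
{
    ListNode *n = malloc(sizeof *n);
    n->instr = i;
    n->next = NULL;
    return n;
}

/* concatenate two backpatch lists */
ListNode *merge(ListNode *a, ListNode *b)
{
    if (!a) return b;
    ListNode *t = a;
    while (t->next) t = t->next;
    t->next = b;
    return a;
}

/* fill in the now-known target for every instruction on the list */
void backpatch(ListNode *l, int label)
{
    for (; l; l = l->next)
        target[l->instr] = label;
}

int main(void)
{
    /* suppose instructions 2 and 5 are "goto _" with unknown targets */
    ListNode *truelist = merge(makelist(2), makelist(5));
    backpatch(truelist, 9);   /* the label becomes known later */
    printf("instr 2 -> L%d, instr 5 -> L%d\n", target[2], target[5]);
    return 0;
}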
Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft,
Rajeev Motwani, and Jeffrey D. Ullman
2. https://www.geeksforgeeks.org/data-flow-analysis-compiler/
Marks
Experiment No - 10
Aim: Study of Learning Basic Block Scheduling Heuristics from Optimal Data.
Date:
Objectives:
By the end of this experiment, the students should be able to:
⮚ Understanding the concept of basic block scheduling and its importance in compiler
optimization.
⮚ Understanding the various heuristics used for basic block scheduling.
⮚ Analyzing optimal data to learn the basic block scheduling heuristics.
⮚ Comparing the performance of the implemented basic block scheduler with other
commonly used basic block schedulers.
Theory:
Instruction scheduling is an important step for improving the performance of object code
produced by a compiler. Basic block scheduling is important in its own right and also as a
building block for scheduling larger groups of instructions such as superblocks. The basic block
instruction scheduling problem is to find a minimum-length schedule for a basic block (a
straight-line sequence of code with a single entry point and a single exit point) subject to
precedence, latency, and resource constraints. Solving the problem exactly is known to be
difficult, and most compilers use a greedy list scheduling algorithm coupled with a heuristic.
The heuristic is usually hand-crafted, a potentially time-consuming process. Modern
architectures are pipelined and can issue multiple instructions per clock cycle. On such
processors, the order in which the instructions are scheduled can significantly impact
performance.
For example, consider multiple-issue pipelined processors. On such processors, there are
multiple functional units, and multiple instructions can be issued (begin execution) each clock
cycle. Associated with each instruction is a delay or latency between when the instruction is
issued and when the result is available for other instructions which use the result. Here, we
assume that all functional units are fully pipelined and that instructions are typed. Examples
of types of instructions are load/store, integer, floating point, and branch instructions. We use
the standard labelled directed acyclic graph (DAG) representation of a basic block (see Figure
1(a)). Each node corresponds to an instruction and there is an edge from i to j labelled with a
positive integer l (i, j) if j must not be issued until i has executed for l (i, j) cycles. Given a
labelled dependency DAG for
a basic block, a schedule for a multiple-issue processor specifies an issue or start time for each
instruction or node such that the latency constraints are satisfied and the resource constraints
are satisfied. The latter are satisfied if, at every time cycle, the number of instructions of each
type issued at that cycle does not exceed the number of functional units that can execute
instructions of that type. The length of a schedule is the number of cycles needed for the
schedule to complete; i.e., each instruction has been issued at its start time and, for each
instruction with no successors, enough cycles have elapsed that the result for the instruction is
available. The basic block instruction scheduling problem is to construct a schedule with
minimum length.
Instruction scheduling for basic blocks is known to be NP-complete for realistic architectures.
The most popular method for scheduling basic blocks continues to be list scheduling. A list
scheduler takes a set of instructions as represented by a dependency DAG and builds a schedule
using a best-first greedy heuristic. A list scheduler generates the schedule by determining all
instructions that can be scheduled at that time step, called the ready list, and uses the heuristic
to determine the best instruction on the list. The selected instruction is then added to the partial
schedule and the scheduler determines if any new instructions can be added to the ready list.
The heuristic in a list scheduler generally consists of a set of features and an order for testing
the features. Some standard features are as follows. The path length from a node i to a node j
in a DAG is the maximum number of edges along any path from i to j. The critical-path distance
from a node i to a node j in a DAG is the maximum sum of the latencies along any path from i
to j. Note that both the path length and the critical-path distance from a node i to itself are zero.
A node j is a descendant of a node i if there is a directed path from i to j; if the path consists of
a single edge, j is also called an immediate successor of i. The earliest start time of a node i is
a lower bound on the earliest cycle in which the instruction i can be scheduled.
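As a concrete illustration of list scheduling, here is a minimal sketch in C for a single-issue
machine (the four-instruction DAG, its latencies, and the critical-path heuristic are assumed
for illustration):

#include <stdio.h>

#define N 4                      /* instructions in the basic block */

/* dep[i][j] = latency l(i, j) if j depends on i, else 0 (assumed DAG) */
int dep[N][N] = {
    {0, 1, 2, 0},                /* 0 -> 1 (lat 1), 0 -> 2 (lat 2)  */
    {0, 0, 0, 2},                /* 1 -> 3 (lat 2)                  */
    {0, 0, 0, 1},                /* 2 -> 3 (lat 1)                  */
    {0, 0, 0, 0},
};
int cp[N];                       /* critical-path distance to a sink */

/* critical-path distance: maximum sum of latencies to any sink */
int critpath(int i)
{
    int best = 0;
    for (int j = 0; j < N; j++)
        if (dep[i][j]) {
            int d = dep[i][j] + critpath(j);
            if (d > best) best = d;
        }
    return best;
}

int main(void)
{
    int start[N], done[N] = {0};
    for (int i = 0; i < N; i++) cp[i] = critpath(i);

    for (int cycle = 0, scheduled = 0; scheduled < N; cycle++) {
        int pick = -1;
        for (int i = 0; i < N; i++) {        /* build the ready list */
            if (done[i]) continue;
            int ready = 1;
            for (int p = 0; p < N; p++)      /* all predecessors done */
                if (dep[p][i] && (!done[p] || start[p] + dep[p][i] > cycle))
                    ready = 0;
            /* greedy heuristic: prefer the largest critical path */
            if (ready && (pick == -1 || cp[i] > cp[pick]))
                pick = i;
        }
        if (pick != -1) {                    /* issue one per cycle */
            start[pick] = cycle;
            done[pick] = 1;
            scheduled++;
            printf("cycle %d: issue instruction %d\n", cycle, pick);
        }
    }
    return 0;
}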
In supervised learning of a classifier from examples, one is given a training set of instances,
where each instance is a vector of feature values and the correct classification for that instance,
and is to induce a classifier from the instances. The classifier is then used to predict the class
of instances that it has not seen before. Many algorithms have been proposed for supervised
learning. One of the most widely used is decision tree learning. In a decision tree the internal
nodes of the tree are labelled with features, the edges to the children of a node are labelled with
the possible values of the feature, and the leaves of the tree are labelled with a classification.
To classify a new example, one starts at the root and repeatedly tests the feature at a node and
follows the appropriate branch until a leaf is reached. The label of the leaf is the predicted
classification of the new example.
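To make the classification step concrete, here is a toy decision tree in C (the features, the
tie-breaking rule, and the class labels are invented for illustration; they are not the learned
heuristic from the referenced work):

#include <stdio.h>

/* A tiny hard-coded decision tree that, given two features of a
   candidate instruction pair (difference in critical-path distance
   and difference in earliest start time), predicts which of the two
   instructions the heuristic should prefer. */
struct Instance {
    int cp_diff;    /* critical-path distance of A minus that of B */
    int est_diff;   /* earliest start time of A minus that of B    */
};

const char *classify(struct Instance x)
{
    /* internal nodes test a feature; leaves carry the class label */
    if (x.cp_diff > 0)
        return "prefer A";
    if (x.cp_diff < 0)
        return "prefer B";
    /* tie on critical path: fall back to earliest start time */
    return (x.est_diff <= 0) ? "prefer A" : "prefer B";
}

int main(void)
{
    struct Instance x = { 0, -3 };
    printf("%s\n", classify(x));  /* prints: prefer A */
    return 0;
}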
Algorithm:
This experiment studies how to automatically learn a good heuristic for basic block scheduling
using supervised machine learning techniques. The novelty of the approach lies in the quality
of the training data (training instances were obtained from very large basic blocks, and an
extensive and systematic analysis was performed to identify the best features and to synthesize
new features) and in the emphasis on learning a simple yet accurate heuristic.
● Instruction scheduling is an important step for improving the performance of object code
produced by a compiler.
● Basic block scheduling is important as a building block for scheduling larger groups of
instructions such as superblocks.
● The basic block instruction scheduling problem is to find a minimum-length schedule for a
basic block (a straight-line sequence of code with a single entry point and a single exit point)
subject to precedence, latency, and resource constraints.
● Solving the problem exactly is known to be difficult, and most compilers use a greedy list
scheduling algorithm coupled with a heuristic.
Quiz:
1. What is the basic block instruction scheduling problem?
• The basic block instruction scheduling problem is a crucial optimization task in compiler
design that focuses on rearranging the order of instructions within a basic block to
improve the performance of the generated code. The goal is to minimize the overall
execution time by reducing pipeline stalls, data hazards, and other dependencies that can
lead to inefficient processor utilization.
• Key points about the basic block instruction scheduling problem include:
1. Dependency Management: The problem involves analyzing data dependencies and
control flow constraints within a basic block to determine the most efficient sequence
of instructions.
2. Optimization Objectives: The primary objective is to minimize pipeline stalls, such
as data hazards, structural hazards, and control hazards, to maximize the utilization
of hardware resources.
3. Scheduling Heuristics: Various heuristics and algorithms, such as list scheduling,
dynamic programming, and greedy algorithms, are employed to determine an
optimal or near-optimal schedule for the instructions within the basic block.
4. Compiler Efficiency: Efficient instruction scheduling can significantly improve the
overall performance of the generated code, enabling the compiler to produce
executable programs that utilize the underlying hardware more effectively and
execute tasks with improved speed and efficiency.
2. Why is instruction scheduling important for improving the performance of object code
produced by a compiler?
• Instruction scheduling is crucial for improving the performance of object code generated
by a compiler due to the following reasons:
1. Resource Utilization: Efficient instruction scheduling reduces stalls and hazards,
ensuring better utilization of hardware resources such as the CPU and memory.
2. Pipeline Optimization: Proper scheduling minimizes pipeline stalls, enabling the
processor to execute instructions more efficiently and effectively, thereby
maximizing its throughput.
3. Dependency Management: By managing data and control dependencies, instruction
scheduling minimizes the impact of hazards, improving the overall execution time
of the program.
4. Improved Parallelism: Optimal instruction scheduling facilitates better exploitation
of instruction-level parallelism, enabling the processor to execute multiple
instructions simultaneously.
5. Enhanced Performance: Effective instruction scheduling results in faster execution
times and improved overall performance of the compiled code, leading to enhanced
system responsiveness and efficiency.
3. What are the constraints that need to be considered in solving the basic block
instruction scheduling problem?
• In solving the basic block instruction scheduling problem, several constraints need to be
considered to ensure that the optimized schedule complies with the dependencies and
limitations of the underlying hardware. These constraints include:
• Data Dependencies: Dependencies arise from the data flow between instructions, such as
read-after-write (RAW), write-after-read (WAR), and write-after-write (WAW)
dependencies. Scheduling must ensure that instructions dependent on the results of
previous instructions are executed in the correct order.
• Resource Conflicts: These constraints involve managing the limited hardware resources,
including processor units, functional units, and memory access. Scheduling should avoid
conflicts that may occur due to resource sharing.
• Control Dependencies: Control flow instructions, such as branches and jumps, introduce
constraints that must be accounted for during scheduling to ensure correct program
execution and maintain the integrity of the program's control flow.
• Hardware Limitations: Various hardware-specific limitations, such as pipeline length,
pipeline stages, and latency, must be considered to avoid pipeline stalls and other
performance bottlenecks.
• Instruction Set Architecture (ISA) Constraints: Compliance with the specific ISA of the
target processor is essential to ensure that the scheduled instructions can be executed
correctly on the target hardware.
• Considering these constraints is critical for devising an efficient scheduling strategy that
optimizes the performance of the generated code without compromising the correctness
and functionality of the program.
Suggested Reference:
1. https://dl.acm.org/doi/10.5555/1105634.1105652
2. https://www.worldcat.org/title/1032888564
3. https://www.researchgate.net/publication/221501045_Learning_basic_block_scheduling_heuristics_from_optimal_data
Rubrics    Knowledge (2)   Problem Recognition (2)   Documentation (2)   Presentation (2)   Ethics (2)   Total
           Good    Avg.    Good    Avg.              Good    Avg.        Good    Avg.       Good   Avg.
           (2)     (1)     (2)     (1)               (2)     (1)         (2)     (1)        (2)    (1)
Marks