
Degree Engineering

A Laboratory Manual for

Compiler Design
(3170701)

[ B.E. (Computer Engineering) : Semester - 7 ]

Enrolment No 210210107052
Name Vivek Chetan Jadhav
Branch Computer Engineering
Academic Term 2024-25(ODD)
Institute Name Government Engineering College Bhavnagar

Directorate of Technical Education, Gandhinagar, Gujarat
Government Engineering College, Bhavnagar
Computer Engineering Department

CERTIFICATE

This is to certify that Mr./Ms. _____________________________________________ Enrollment

No. _______________ of B.E. Semester VII from the Computer Engineering Department of this

Institute (GTU Code: 021) has satisfactorily completed the Practical work for the subject Compiler

Design (3170701) for the academic year 2024-25.

Place: ___________

Date: ___________

Signature of Course Faculty Head of the Department



Preface

Compiler Design is an essential subject for computer science and engineering students. It deals with
the theory and practice of developing a program that can translate source code written in one
programming language into another language. The main objective of this subject is to teach students
how to design and implement a compiler, which is a complex software system that converts high-
level language code into machine code that can be executed on a computer. The design of compilers
is an essential aspect of computer science, as it helps in bridging the gap between human-readable
code and machine-executable code.

This lab manual is designed to help students understand the concepts of compiler design and
develop hands-on skills in building a compiler. The manual provides step-by-step instructions for
implementing a simple compiler using C and other applicable programming languages, covering all
the essential components such as lexical analyzer, parser, symbol table, intermediate code
generator, and code optimizer.

The manual is divided into several sections, each focusing on a specific aspect of compiler design.
The first section provides an introduction to finite automata and the phases of a compiler, covering
the basic concepts of lexical analysis. The subsequent sections cover parsing, code generation, and a
study of Learning Basic Block Scheduling Heuristics. Each section includes detailed instructions for
completing the lab exercises and programming assignments, along with examples and code
snippets.

The lab manual also includes a set of challenging programming assignments and quizzes that will
help students test their understanding of the subject matter. Additionally, the manual provides a list
of recommended books and online resources for further study.

This manual is intended for students studying Compiler Design and related courses. It is also useful
for software developers and engineers who want to gain a deeper understanding of compiler design
and implementation. We hope that this manual will be a valuable resource for students and
instructors alike and will contribute to the learning and understanding of compiler design.

OBJECTIVE:

This laboratory course is intended to make the students experiment on the basic
techniques of compiler construction and tools that can be used to perform syntax-
directed translation of a high-level programming language into executable code.
Students will design and implement language processors in C by using tools to
automate parts of the implementation process. This will provide deeper insights into
the more advanced semantic aspects of programming languages, code generation,
machine independent optimizations, dynamic memory allocation, and object
orientation.

OUTCOMES:

Upon the completion of Compiler Design practical course, the student will be able
to:

1. Understand the working of the lex and yacc compiler tools for debugging of programs.
2. Understand and define the role of lexical analyzer, use of regular expression and
transition diagrams.
3. Understand and use Context free grammar, and parse tree construction.
4. Learn & use the new tools and technologies used for designing a compiler.
5. Develop program for solving parser problems.
6. Learn how to write programs that execute faster.

DTE’s Vision

● To provide globally competitive technical education
● Remove geographical imbalances and inconsistencies
● Develop student friendly resources with a special focus on girls’ education and support to weaker sections
● Develop programs relevant to industry and create a vibrant pool of technical professionals

Institute’s Vision

● To transform students into good human beings, responsible citizens and employable
engineering graduates through imbibing human values and excellence in technical
education.

Institute’s Mission

● To educate students from the local and rural areas, so they become enlightened individuals,
improving the living standards of their families, industry and society. We will provide
individual attention, quality education and take care of character building.

Department’s Vision

● To achieve excellence for providing value based education in computer science and
Information Technology through innovation, team work, and ethical practices.

Department’s Mission

● To produce graduates according to the need of industry, government, society and scientific
community and to develop partnership with industries, government agencies and R & D
Organizations for knowledge sharing and overall development of faculties and students.
● To motivate students/graduates to be entrepreneurs.
● To motivate students to participate in reputed conferences, workshops, symposiums,
seminars and related technical activities.
● To impart human and ethical values in our students for better serving of society.

Programme Outcomes (POs)

1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering
problems.
2. Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of mathematics,
natural sciences, and engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems and
design system components or processes that meet the specified needs with appropriate
consideration for the public health and safety, and the cultural, societal, and environmental
considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research
methods including design of experiments, analysis and interpretation of data, and synthesis of
the information to provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities
with an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to
the professional engineering practice.
7. Environment and sustainability: Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and need
for sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or leader
in diverse teams, and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and write
effective reports and design documentation, make effective presentations, and give and receive
clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.

Program Specific Outcomes (PSOs)

● Sound knowledge of fundamentals of computer science and engineering including software and hardware.
● Develop the software using sound software engineering principles having web based/mobile based interface.
● Use various tools and technology supporting modern software frameworks for solving
problems having large volume of data in the domain of data science and machine learning.

Program Educational Objectives (PEOs)

● Possess technical competence in solving real life problems related to Computing.
● Acquire good analysis, design, development, implementation and testing skills to formulate
simple computing solutions to the business and societal needs.
● Provide requisite skills to pursue entrepreneurship, higher studies, research, and
development and imbibe high degree of professionalism in the fields of computing.
● Embrace life-long learning and remain continuously employable.
● Work and excel in a highly competitive, supportive, multicultural and professional
environment while abiding by the legal and ethical responsibilities.

Practical – Course Outcome matrix

Course Outcomes (COs)


CO_3170701.1: Understand the basic concepts; ability to apply automata theory and knowledge on formal languages.
CO_3170701.2: Ability to identify and select suitable parsing strategies for a compiler for various cases. Knowledge in alternative methods (top-down or bottom-up, etc.).
CO_3170701.3: Understand backend of compiler: intermediate code, code optimization techniques and error recovery mechanisms.
CO_3170701.4: Understand issues of run time environments and scheduling for instruction level parallelism.

Sr. No. | Title of experiment | CO1 | CO2 | CO3 | CO4
1 | Implementation of Finite Automata and String Validation. | √ | | |
2 | Introduction to Lex Tool. Implement following Programs Using Lex: (a) Generate Histogram of words (b) Caesar Cypher (c) Extract single and multiline comments from C Program | √ | | |
3 | Implement following Programs Using Lex: (a) Convert Roman to Decimal (b) Check whether given statement is compound or simple (c) Extract html tags from .html file | √ | | |
4 | Introduction to YACC and generate Calculator Program. | | √ | |
5 | Implement a program for constructing (a) LL(1) Parser (b) Predictive Parser | | √ | |
6 | Implement a program for constructing (a) Recursive Descent Parser (RDP) (b) LALR Parser | | √ | |
7 | Implement a program for constructing Operator Precedence Parsing. | | √ | |
8 | Generate 3-tuple intermediate code for given infix expression. | | | √ |
9 | Extract Predecessor and Successor from given Control Flow Graph. | | | √ |
10 | Study of Learning Basic Block Scheduling Heuristics from Optimal Data. | | | | √

Industry Relevant Skills

Compiler Design is a vital subject in computer science and engineering that focuses on the
design and implementation of compilers. Here are some industry-relevant skills that students
can develop while studying Compiler Design:
● Proficiency in programming languages: A good understanding of programming languages
is essential for building compilers. Students should be proficient in programming
languages such as C/C++, Java, and Python.
● Knowledge of data structures and algorithms: Compiler Design involves the
implementation of various data structures and algorithms. Students should have a good
understanding of data structures such as stacks, queues, trees, and graphs, and algorithms
such as lexical analysis, parsing, and code generation.
● Familiarity with compiler tools: Students should be familiar with compiler tools such as

Lex and Yacc. These tools can help automate the process of creating a compiler, making
it more efficient and error-free.
● Debugging skills: Debugging is an essential skill for any programmer, and it is particularly
important in Compiler Design. Students should be able to use debugging tools to find and
fix errors in their code.
● Optimization techniques: Code optimization is a critical component of Compiler Design.
Students should be familiar with optimization techniques such as constant folding, dead
code elimination, and loop unrolling, which can significantly improve the performance of
the compiled code.
● Collaboration and communication skills: Compiler Design is a complex subject that
requires collaboration and communication between team members. Students should
develop good communication and collaboration skills to work effectively with their peers
and instructors.
By developing these industry-relevant skills, students can become proficient in Compiler
Design and be better equipped to meet the demands of the industry.

Guidelines for Faculty Members

1. Teachers should provide guidelines along with a demonstration of the practical to the students,
covering all features.
2. Teachers shall explain the basic concepts/theory related to the experiment to the students before
the start of each practical.
3. Involve all the students in performance of each experiment.
4. Teacher is expected to share the skills and competencies to be developed in the students
and ensure that the respective skills and competencies are developed in the students after
the completion of the experimentation.
5. Teachers should give opportunity to students for hands-on experience after the
demonstration.
6. Teacher may provide additional knowledge and skills to the students even though not
covered in the manual but are expected from the students by concerned industry.
7. Give practical assignment and assess the performance of students based on task assigned
to check whether it is as per the instructions or not.
8. Teacher is expected to refer complete curriculum of the course and follow the guidelines
for implementation.

Instructions for Students

1. Students are expected to carefully listen to all the theory classes delivered by the faculty
members and understand the COs, content of the course, teaching and examination scheme, skill
set to be developed etc.
2. Students will have to perform experiments considering C or other applicable programming
language using Lex tool or Yacc.
3. Students are instructed to submit the practical list as per the given sample list shown on the next
page. Students have to show the output of each program in their practical file.
4. Students should develop a habit of submitting the experimentation work as per the schedule, and
they should be well prepared for the same.

Common Safety Instructions

Students are expected to

1. Handle equipment with care: When working in the lab, students should handle equipment
and peripherals with care. This includes using the mouse and keyboard gently, avoiding
pulling or twisting network cables, and handling any hardware devices carefully.
2. Avoid water and liquids: Students should avoid using wet hands or having any liquids near
the computer equipment. This will help prevent damage to the devices and avoid any safety
hazards.
3. Shut down the PC properly: At the end of the lab session, students should shut down the
computer properly. This includes closing all programs and applications, saving any work,
and following the correct shutdown procedure for the operating system.
4. Obtain permission for laptops: If a student wishes to use their personal laptop in the lab, they
should first obtain permission from the Lab Faculty or Lab Assistant. They should follow
all lab rules and guidelines and ensure that their laptop is properly configured for the lab
environment.

Index
(Progressive Assessment Sheet)

Sr. No. | Title of experiment | Page No. | Date of performance | Date of submission | Assessment Marks | Sign. of Teacher with date | Remarks
1 | Implementation of Finite Automata and String Validation.
2 | Introduction to Lex Tool. Implement following Programs Using Lex: (a) Generate Histogram of words (b) Caesar Cypher (c) Extract single and multiline comments from C Program
3 | Implement following Programs Using Lex: (a) Convert Roman to Decimal (b) Check whether given statement is compound or simple (c) Extract html tags from .html file
4 | Introduction to YACC and generate Calculator Program.
5 | Implement a program for constructing (a) LL(1) Parser (b) Predictive Parser
6 | Implement a program for constructing (a) Recursive Descent Parser (RDP) (b) LALR Parser
7 | Implement a program for constructing Operator Precedence Parsing.
8 | Generate 3-tuple intermediate code for given infix expression.
9 | Extract Predecessor and Successor from given Control Flow Graph.
10 | Study of Learning Basic Block Scheduling Heuristics from Optimal Data.
Total

Experiment No - 1
Aim: To study and implement Finite Automata and validate strings using it.

Date:

Competency and Practical Skills: Understanding and implementing Finite Automata, string validation

Relevant CO: CO1

Objectives:
1. To understand the concept of Finite Automata.
2. To implement Finite Automata using programming language.
3. To validate strings using Finite Automata.

Software/Equipment: Computer system, Text editor, Programming language.

Theory:

Finite Automata is a mathematical model that consists of a finite set of states and a set of
transitions between these states. It is used to recognize patterns or validate strings. In a Finite
Automata, there are five components:
1. A set of states
2. An input alphabet
3. A transition function
4. A start state
5. A set of final (or accepting) states

In the implementation of Finite Automata and string validation, we need to create a Finite
Automata that recognizes a specific pattern or set of patterns. The Finite Automata consists of
states, transitions between the states, and a set of accepting states. The input string is then
validated by passing it through the Finite Automata, starting at the initial state, and following
the transitions until the string is either accepted or rejected.

String validation using Finite Automata is useful in a variety of applications, including pattern
matching, text processing, and lexical analysis in programming languages. It is an efficient
method for validating strings and can handle large inputs with minimal memory and time
complexity.

Example: Consider a finite automaton that accepts strings containing an even number of a's,

where ∑ = {a, b, c}


Solution:

q0 is the initial state.
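Since the transition diagram cannot be reproduced here, the same automaton can be summarised as a transition table (a sketch consistent with the quiz solution given later in this experiment; q0 is both the start state and the only accepting state):

State                  | a  | b  | c
q0 (start, accepting)  | q1 | q0 | q0
q1                     | q0 | q1 | q1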

Program:
Sample Program-1: Create a program in Python that implements a Finite Automata to
validate strings that start with 'a' and end with 'b'.

Code:
# Define the Finite Automata
states = {'q0', 'q1', 'q2'}
alphabet = {'a', 'b'}
start_state = 'q0'


accept_states = {'q2'}
transitions = {
    ('q0', 'a'): 'q1',
    ('q1', 'a'): 'q1',
    ('q1', 'b'): 'q2',
    # keep tracking input after the first 'b' so strings like 'abbb' are handled
    ('q2', 'a'): 'q1',
    ('q2', 'b'): 'q2'
}

# Validate a string using the Finite Automata
def validate_string(string):
    current_state = start_state
    for char in string:
        if (current_state, char) not in transitions:
            return False
        current_state = transitions[(current_state, char)]
    return current_state in accept_states

# Test the program with sample strings
strings_to_test = ['ab', 'aaab', 'abbb', 'a', 'b']
for string in strings_to_test:
    if validate_string(string):
        print(string, "is accepted")
    else:
        print(string, "is rejected")

Sample Program-2: Implement a program to search a pattern in a string using finite


automata
Code:
# Define the Finite Automata
def compute_transition_function(pattern, alphabet):
    m = len(pattern)
    transitions = {}
    for q in range(m + 1):
        for a in alphabet:
            # delta(q, a) = length of the longest prefix of the pattern
            # that is a suffix of pattern[:q] + a
            k = min(m, q + 1)
            while k > 0 and not (pattern[:q] + a).endswith(pattern[:k]):
                k -= 1
            transitions[(q, a)] = k
    return transitions

# Search for pattern in text using Finite Automata


def search(pattern, text):
    alphabet = set(text)
    transitions = compute_transition_function(pattern, alphabet)
    q = 0
    for i, char in enumerate(text):
        q = transitions[(q, char)]
        if q == len(pattern):
            return i - len(pattern) + 1
    return -1


# Test the program with user input


pattern = input("Enter the pattern to search for: ")
text = input("Enter the text to search in: ")
index = search(pattern, text)
if index != -1:
    print("Pattern found at index", index)
else:
    print("Pattern not found")

Observations and Conclusion:

Sample Program-1:

In this implementation of Finite Automata and string validation, I have learned how to create
a Finite Automata that recognizes a specific pattern or set of patterns, and how to validate
strings using the Finite Automata. I have also learned how to implement a Finite Automata
using programming language, and how to test it with different inputs. By using Finite
Automata, I can efficiently validate strings and recognize patterns, making it a powerful tool
in computer science and related fields.

Sample Program-2:

In this example, the program searches for the pattern “abacaba” in the text
abcabacabacabacaba. It computes the Finite Automata using the compute_transition_function
function, and then uses it to search for the pattern in the text using the search function. It outputs
that the pattern is found at index 3.

Quiz:

1. What is a Finite Automata?


Ans:
• Finite automata is an abstract computing device. It is a mathematical model of a system


with discrete inputs, outputs, states and a set of transitions from state to state that occurs
on input symbols from the alphabet Σ.
• Also it is used to analyze and recognize Natural language Expressions. The finite
automata or finite state machine is an abstract machine that has five elements or tuples.

2. What are the five components of a Finite Automata?


Ans:
• A finite automaton M is a 5-tuple (Q, q0, A, ∑, δ), where
Q is a finite set of states,
q0 ∈ Q is the start state,
A ⊆ Q is a notable set of accepting states,
∑ is a finite input alphabet,
δ is a function from Q x ∑ into Q called the transition function of M.

3. How is a Finite Automata implemented using a programming language?


Ans:
def accept_even_as(input_string):
    # Define the states of the finite automaton
    states = {
        'q0': {'a': 'q1', 'b': 'q0', 'c': 'q0'},
        'q1': {'a': 'q0', 'b': 'q1', 'c': 'q1'}
    }

    # Initialize the current state
    current_state = 'q0'

    # Iterate over each character in the input string
    for char in input_string:
        if char in states[current_state]:
            current_state = states[current_state][char]
        else:
            # If the character is not in the alphabet, transition to an error state
            current_state = 'error'
            break

    # Check if the final state is q0 (even number of 'a's)
    return current_state == 'q0'

# Example usage:
input_string = "abaca"
result = accept_even_as(input_string)
if result:
    print("Accepted: The string has an even number of 'a's.")
else:
    print("Rejected: The string does not have an even number of 'a's.")


4. What is the process of building a finite automaton to search a pattern in a text string?
Ans:

• Define the Problem: Clearly specify the pattern you want to find in the text.
• Construct the Transition Table: Create a table that defines how the automaton transitions
between states based on input characters. Each state represents a partial match of the
pattern.
• Processing the Text: Start processing the input text, following transitions in the DFA.
When you reach an accepting state, you've found a match.
• Reporting Matches: Record the positions of matches as they occur in the text.
• Optimizations: Apply optimizations like failure functions for efficiency.
• Testing and Validation: Test the DFA with different patterns and texts to ensure it works
correctly.
• Application: Integrate the DFA into your application for pattern searching.
• This process allows efficient pattern matching in text strings using a finite automaton.

• Algorithm:
FINITE AUTOMATA (T, P)
    State <- 0
    for i <- 1 to n
        State <- δ(State, t_i)
        If State == m then
            Match Found
        end
    end

5. What are the advantages of using a finite automaton to search for patterns in text
strings?
Ans:
• Using a finite automaton (specifically a deterministic finite automaton or DFA) to search
for patterns in text strings offers several advantages:
• Efficiency: DFAs can perform pattern matching in linear time with respect to the length
of the text. This efficiency makes them well-suited for large-scale text processing tasks.
• Constant Space: DFAs have a fixed memory footprint regardless of the size of the pattern
or text. This means they are memory-efficient and can be used for real-time applications
and with limited resources.
• Determinism: DFAs provide deterministic behavior, ensuring that for a given pattern and


text, the search results are consistent and predictable.


• Parallelism: The nature of DFA operations allows for easy parallelization, making them
suitable for high-performance computing environments.
• Flexibility: DFAs can be constructed for a wide range of pattern-matching tasks,
including exact matching, substring searching, and more complex text processing
requirements.
• Multiple Patterns: DFAs can be extended to efficiently search for multiple patterns
simultaneously, making them useful in applications like keyword searching and virus
scanning.
• Pattern Preprocessing: DFAs can be preprocessed before searching, enabling efficient
pattern searches for multiple texts without the need to recreate the automaton for each
text.
• Optimizations: Various optimization techniques, such as the use of failure functions (e.g.,
in the Knuth-Morris-Pratt algorithm) or the Aho-Corasick algorithm for multiple pattern
matching, can further improve the search performance.
• Low Overhead: The overhead associated with using DFAs is typically low, making them
suitable for embedded systems and resource-constrained environments.
• Determining Positions: DFAs can easily determine the positions in the text where
matches occur, which is important in many text processing applications.
• Language Recognition: DFAs are fundamental in the theory of formal languages and can
be used for tasks beyond pattern matching, such as parsing and syntax analysis.
• Well-Defined Theory: DFAs are well-studied and have a strong theoretical foundation,
which makes them a reliable choice for pattern matching.

Suggested Reference:

1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft,


Rajeev Motwani, and Jeffrey D. Ullman.
2. https://www.youtube.com/watch?v=58N2N7zJGrQ&list=PLBlnK6fEyqRgp46KUv4ZY69yXmpwKOIev
3. GeeksforGeeks: Finite Automata Introduction -
https://www.geeksforgeeks.org/introduction-of-finite-automata/

References used by the students:

1. https://www.tutorialspoint.com/what-is-finite-automata

Rubric wise marks obtained:

Rubrics: Knowledge (2) | Problem Recognition (2) | Implementation (2) | Testing & Debugging (2) | Creativity in logic/code (2) | Total
Each rubric is graded Good (2) or Avg. (1).

Marks:


Experiment No - 2
Aim: Introduction to Lex Tool. Implement following Programs Using Lex
a. Generate Histogram of words
b. Caesar Cypher
c. Extract single and multiline comments from C Program

Date:

Competency and Practical Skills: Understanding of Lex tool and its usage in compiler design,
understanding of regular expressions and data structures, improving programming skill to
develop programs using lex tool

Relevant CO: CO1

Objectives:
1. To introduce students to Lex tool and its usage in compiler design
2. To provide practical knowledge of regular expressions and their use in pattern matching
3. To enhance students' understanding of data structures such as arrays, lists, and trees
4. To develop students' problem-solving skills in developing and implementing programs
using Lex tool
5. To develop students' debugging skills to identify and resolve program errors and issues

Software/Equipment: Computer system, Text editor, Lex tool, C compiler, Terminal or


Command prompt.

Theory:

❖ COMPILER:
• A compiler is a translator that converts the high-level language into the machine language.
• High-level language is written by a developer and machine language can be understood by
the processor. Compiler is used to show errors to the programmer.
• The main purpose of a compiler is to change the code written in one language without
changing the meaning of the program.
• When you execute a program written in a high-level (HLL) programming language, the execution
happens in two parts.
• In the first part, the source program is compiled and translated into the object program (low-level
language).
• In the second part, the object program is translated into the target program through the
assembler.


❖ LEX:
• Lex is a program that generates lexical analyzers. It is used with a YACC parser generator.
• The lexical analyzer is a program that transforms an input stream into a sequence of tokens.
• Lex reads the given specification and produces, as output, C source code that implements the
lexical analyzer.
• During the first phase the compiler reads the input and converts strings in the source to
tokens.
• With regular expressions we can specify patterns to lex so it can generate code that will
allow it to scan and match strings in the input. Each pattern specified in the input to lex has
an associated action.
• Typically an action returns a token that represents the matched string for subsequent use by
the parser. Initially we will simply print the matched string rather than return a token value.

⮚ Function of LEX:
• Firstly, the lexical analyzer specification is written as a program lex.l in the Lex language. Then the Lex compiler
runs the lex.l program and produces a C program lex.yy.c.
• Finally, the C compiler runs the lex.yy.c program and produces an object program a.out.
• a.out is a lexical analyzer that transforms an input stream into a sequence of tokens.

⮚ LEX File Format:


• A Lex program is separated into three sections by %% delimiters. The format of a Lex source file
is as follows:
%{ definitions %}
%%
{ rules }
%%
{ user subroutines }
• Definitions include declarations of constant, variable and regular definitions.


• Rules define statements of the form p1 {action1} p2 {action2} ... pn {actionn},
• where pi describes a regular expression and actioni describes the action the lexical
analyzer should take when pattern pi matches a lexeme.
• User subroutines are auxiliary procedures needed by the actions. The subroutines can be
loaded with the lexical analyzer and compiled separately.

❖FLEX: Fast Lexical Analyzer Generator


• FLEX is a tool/computer program for generating lexical analyzers (scanners or
lexers) written by Vern Paxson in C around 1987. It is used together with Berkeley Yacc
parser generator or GNU Bison parser generator. Flex and Bison both are more flexible
than Lex and Yacc and produce faster code.
• Bison produces parsers from the input file provided by the user. The function yylex() is
automatically generated by flex when it is provided with a .l file, and the parser calls this yylex()
function to retrieve tokens from the input token stream.

STEPS:
• Step 1 : An input file named lex.l, describing the lexical analyzer to be generated, is written
in the lex language. The lex compiler transforms lex.l into a C program, in a file that is always
named lex.yy.c.
• Step 2 : The C compiler compiles the lex.yy.c file into an executable file called a.out.
• Step 3 : The output file a.out takes a stream of input characters and produces a stream of
tokens.
• Program Structure:

In the input file, there are 3 sections:


Definition Section: The definition section contains the declaration of variables, regular
definitions, and manifest constants. In the definition section, text is enclosed in “%{ %}”
brackets. Anything written in these brackets is copied directly to the file lex.yy.c
Syntax:
%{
// Definitions
%}

Rules Section: The rules section contains a series of rules of the form: pattern action. The pattern
must be unindented and the action must begin on the same line in {} brackets. The rule section
is enclosed in “%% %%”.
Syntax:
%%
pattern action
%%

User Code Section: This section contains C statements and additional functions. We can also
compile these functions separately and load them with the lexical analyzer.

How to run the program:


To run the program, it should first be saved with the extension .l or .lex. Run the commands below
on the terminal in order to run the program file.
• Step 1: lex filename.l or lex filename.lex, depending on the extension with which the file is
saved.


• Step 2: gcc lex.yy.c


• Step 3: ./a.out
• Step 4: Provide the input to program in case it is required
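As a quick illustration of the three-section layout described above (a minimal sketch, not one of the graded programs of this experiment), the following specification counts the lines and words in its input; it could be saved as, say, count.l and built with the steps listed above:

%{
/* Illustrative example: count lines and words in the input */
#include <stdio.h>
int lines = 0, words = 0;
%}

%option noyywrap

%%
[a-zA-Z]+   { words++; }
\n          { lines++; }
.           { /* ignore every other character */ }
%%

int main(void)
{
    yylex();
    printf("lines = %d, words = %d\n", lines, words);
    return 0;
}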

Program:

Program-1: Generate Histogram of words

Code:
%{
#include<stdio.h>
#include<string.h>
#define MAX 1000

/* Declarations */
int count = 0;
char words[MAX][MAX];
%}

/* Rule Section */
%%

[a-zA-Z]+ {
int i, flag = 0;
for(i=0; i<count; i++) {
if(strcmp(words[i], yytext) == 0) {
flag = 1;
break;
}
}
if(flag == 0) {
strcpy(words[count++], yytext);
}
}

.|\n    { /* ignore all other characters */ }

%%

/* Code Section */
int main(int argc, char **argv) {
if(argc != 2) {
printf("Usage: ./a.out <filename>\n");
return 1;
}
FILE *fp = fopen(argv[1], "r");
if(fp == NULL) {
printf("Cannot open file!\n");
return 1;
}


yyin = fp;
yylex();

int i, j;
printf("\nWord\t\tFrequency\n");
for(i=0; i<count; i++) {
int freq = 0;
rewind(fp);
while(fscanf(fp, "%s", words[MAX-1]) == 1) {
if(strcmp(words[MAX-1], words[i]) == 0) {
freq++;
}
}
printf("%-15s %d\n", words[i], freq);
}

fclose(fp);
return 0;
}

Program-2: Caesar Cypher

Code:
%{
#include<stdio.h>
#include<stdlib.h>  /* for exit() */
int shift;
%}

%%
[a-z]  { char ch = yytext[0];
         ch += shift;
         if (ch > 'z') ch -= 26;
         printf("%c", ch);
       }
[A-Z]  { char ch = yytext[0];
         ch += shift;
         if (ch > 'Z') ch -= 26;
         printf("%c", ch);
       }
.      { exit(0); }
%%

int main()
{
printf("Enter an no. of alphabet to shift: \n");

scanf("%d", &shift);
printf("Enter the string: \n");
yylex();


return 0;
}
int yywrap(){
return 1;
}

Program-3: Extract single and multiline comments from C Program

Code :
%{
#include <stdio.h>
%}

/* Define patterns for single-line and multi-line comments */


%option noyywrap

%%

\/\/[^\n]* { printf("Single-line comment: %s\n", yytext); }

\/\*([^*]|\*+[^/])*\*+\/ { printf("Multi-line comment: %s\n", yytext); }

%%

/* Main function */
int main(int argc, char **argv) {
if(argc != 2) {
printf("Usage: %s <filename>\n", argv[0]);
return 1;
}

FILE *file = fopen(argv[1], "r");


if(!file) {
printf("Cannot open file %s\n", argv[1]);
return 1;
}

yyin = file; /* Set the input file for lex */


yylex(); /* Start lexing */

fclose(file); /* Close the input file */


return 0;
}

Observations and Conclusion:

Program-1:


$ lex histogram.l
$ cc lex.yy.c -o histogram -ll
$ ./histogram sample.txt

In this program, I have implemented a histogram of words using the lex tool. The program counts
the frequency of each word in a given input file. It uses an array words to store all the distinct
words and counts the frequency of each word by iterating through the words array and
comparing it with the input file. The program also checks for errors such as an invalid input
file. This program can be used to analyze the most frequent words in a text file or a
document. It can be extended to handle large files by implementing a dynamic array
to store the distinct words instead of a fixed-size array.

Program-2:

lex CeasarCypher.l
cc lex.yy.c
a.exe

Input File:

Output:


This Lex program implements a simple Caesar cipher encryption scheme for alphabetic
characters. It takes an integer input shift to determine the number of positions to shift each
alphabetic character. During lexical analysis of the input string, it identifies lowercase and
uppercase letters, shifts them by the specified amount while maintaining their case, and prints
the encrypted result. The program prompts the user to input both the shift value and the string
to be encrypted. It provides a basic demonstration of character manipulation within Lex,
offering a practical tool for encoding text with a variable Caesar cipher.

Program-3:

lex ExtractComment.l
gcc lex.yy.c
a.exe input.c

Input File:

Output File:

This Lex program is designed for extracting comments from a C source code file. It performs
lexical analysis on the input file ("input.c") and identifies both single-line (//) and multi-line
(/* ... */) comments. The program utilizes regular expressions to recognize and extract
comments, printing them to an output file ("out.c"). It is intended for analyzing and
documenting comments within C code, offering a practical tool for code analysis and
documentation generation. The code provides straightforward input and output handling,
making it suitable for basic comment extraction tasks.

Quiz:

1. What is Lex tool used for?


Ans:
• The term "Lex tool" is often used to refer to a lexical analyzer generator tool known as
"Lex." Lex is a program used in the field of computer science and compiler construction.
Its primary purpose is to generate lexical analyzers, which are components of a compiler
or interpreter.


• Here's what Lex is used for:


1. Lexical Analysis: Lexical analysis is the first phase of compiling a program written
in a programming language. The role of Lex is to take the source code of a program
and break it down into a sequence of tokens. Tokens are the smallest units of meaning
in a programming language, such as keywords, identifiers, literals, and operators. Lex
generates code for recognizing and categorizing these tokens based on user-defined
patterns and rules.
2. Compiler and Interpreter Construction: Lex is often used in conjunction with other
tools like Yacc (Yet Another Compiler Compiler) to build compilers and interpreters
for programming languages. Lexical analyzers generated by Lex are essential for
parsing and understanding the structure of the source code.
3. Text Processing: While Lex is primarily designed for use in compiler construction, it
can also be applied to other text processing tasks where pattern matching and
tokenization are required. It can be useful for tasks like processing log files,
extracting specific information from text, or creating custom scripting languages.

2. What is the purpose of the "Generate Histogram of words" program in Lex?


Ans:
• A "Generate Histogram of words" program in Lex serves the purpose of analyzing a text
document and creating a histogram or frequency distribution of words within that
document. Here's why such a program can be useful:
1. Word Frequency Analysis: The program counts the occurrences of each unique word
in the text. This analysis helps to identify which words are used most frequently in
the document.
2. Content Summary: By creating a word frequency histogram, you can get a quick
overview of the document's content. The most frequently occurring words often
represent the document's main topics or themes.
3. Keyword Extraction: For information retrieval or search engine optimization (SEO)
purposes, identifying keywords based on their frequency can be important. The words
that appear most often may be relevant keywords.
4. Text Cleaning: The program can also be used as a preprocessing step to clean and
preprocess text data. For example, you can use it to remove common stop words (e.g.,
"the," "and," "in") from the text before further analysis.
5. Content Analysis: Word frequency histograms are valuable in content analysis and
research. Researchers can use them to identify patterns, trends, and important terms
in a collection of documents.
6. Comparative Analysis: You can compare word frequency histograms across different
documents or versions of a document to track changes in word usage over time or to
compare the content of multiple texts.
7. Data Visualization: Histograms are a visual way to represent data, making it easier to
understand and interpret word frequency distributions.

3. Which program in Lex is used for encrypting text?


Ans:
• Lex is primarily a tool for generating lexical analyzers for text processing and is not
typically used for encryption or cryptography. Encryption involves the secure
transformation of plaintext into ciphertext to protect the confidentiality and integrity of
data.


• To perform text encryption, you would typically use encryption libraries or cryptographic
tools available in programming languages like Python, Java, C++, or specialized
cryptographic software like OpenSSL. These libraries and tools offer various encryption
algorithms and methods for secure data encryption.
• Common encryption algorithms include:
1. AES (Advanced Encryption Standard): A widely used symmetric-key encryption
algorithm.
2. RSA: An asymmetric-key encryption algorithm often used for secure communication
and digital signatures.
3. DES (Data Encryption Standard): An older symmetric-key encryption algorithm,
now considered relatively weak.
4. Triple DES (3DES): A symmetric-key encryption algorithm that applies DES three
times for increased security.
5. Blowfish: A symmetric-key encryption algorithm known for its speed and security.
6. Public Key Cryptography: Asymmetric-key encryption algorithms like RSA and
ECC (Elliptic Curve Cryptography) are used for public key encryption.

4. What is the purpose of the "Extract single and multiline comments from C Program"
program in Lex?
Ans:
• A "Extract single and multiline comments from C Program" program written in Lex
serves the purpose of parsing a C programming language source code file and extracting
both single-line comments (e.g., // comment) and multi-line comments (e.g., /* comment
*/) from the code. Here's why such a program can be useful:
1. Documentation: It retrieves comments, which contain explanations and
documentation for the code, aiding developers in understanding the code's
functionality.
2. Code Analysis: By extracting comments, it assists code analysis tools in identifying
code sections that require attention, such as bug reports or code review requests.
3. Documentation Generation: Extracted comments can be used to automatically
generate documentation, improving code documentation and maintainability.

5. How does the Caesar Cypher program work?


Ans:
• The Caesar cipher, also known as the Caesar shift or Caesar code, is one of the simplest
and oldest methods of encryption. It's a type of substitution cipher where each letter in
the plaintext is shifted a certain number of places down or up the alphabet. This shift
value is known as the "key" or "shift."
• Here's how the Caesar cipher program works:
1. Choose a Shift Value: The first step is to decide on a shift value (the key) that
determines how much each letter in the plaintext will be shifted. For example, if the
key is 3, then "A" would become "D," "B" would become "E," and so on.
2. Input the Plain Text: The program takes the plain text message that you want to
encrypt as input. The plain text should consist of letters (uppercase or lowercase) and
possibly other characters like spaces or punctuation, depending on the
implementation.
3. Encrypt the Text: For each letter in the plain text, the program applies the Caesar
cipher by shifting the letter by the specified key. The shifting is done by moving the


letter a certain number of positions forward or backward in the alphabet, depending


on whether you're encrypting (forward shift) or decrypting (backward shift).
4. If the letter is uppercase, it will remain uppercase after encryption.
5. If the letter is lowercase, it will remain lowercase after encryption.
6. Non-alphabet characters (like spaces or punctuation) are typically left unchanged in
most implementations.
7. Output the Cipher Text: The program generates the cipher text, which is the result of
the encryption process. This cipher text can then be transmitted or stored securely, as
it is less readable without knowledge of the key.
8. Decryption (Optional): If you want to decrypt the cipher text, you would use the same
key but apply it in the opposite direction. For example, if you encrypted with a key
of 3, you would decrypt by shifting each letter 3 positions backward in the alphabet.
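To make the shifting step concrete, here is a minimal plain-C sketch of the same idea (the function name caesar_encrypt and the sample text are illustrative only, not part of the graded Lex program above):

#include <stdio.h>
#include <ctype.h>

/* Shift every letter of s forward by 'shift' positions (0-25), wrapping around;
   non-alphabetic characters are left unchanged. */
void caesar_encrypt(char *s, int shift)
{
    for (; *s; s++) {
        if (isupper((unsigned char)*s))
            *s = 'A' + (*s - 'A' + shift) % 26;
        else if (islower((unsigned char)*s))
            *s = 'a' + (*s - 'a' + shift) % 26;
    }
}

int main(void)
{
    char text[] = "Attack at dawn";
    caesar_encrypt(text, 3);
    printf("%s\n", text);   /* prints: Dwwdfn dw gdzq */
    return 0;
}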

Suggested Reference:

1. "Flex & Bison: Text Processing Tools" by John Levine


2. "Lex and Yacc" by Tom Niemann
3. "Introduction to Compiler Construction with Unix" by Axel T. Schreiner
4. https://www.youtube.com/watch?v=ArtBEUvS3PI
5. Lex - A Lexical Analyzer Generator. Retrieved from https://www.gnu.org/software/flex/manual
References used by the students:

1. https://www.geeksforgeeks.org/caesar-cipher-in-cryptography/
2. https://www.geeksforgeeks.org/lex-program-to-count-the-frequency-of-the-given-word-in-a-file/

Rubrics: Understanding of Lex tool (2) | Problem Recognition (2) | Logic Building (2) | Completeness and accuracy (2) | Ethics (2) | Total
Each rubric is graded Good (2) or Avg. (1).

Marks:


Experiment No - 3
Aim: Implement following Programs Using Lex
a. Convert Roman to Decimal
b. Check whether given statement is compound or simple
c. Extract html tags from .html file

Date:

Competency and Practical Skills: Understanding of Lex tool and its usage in compiler design,
understanding of regular expressions and data structures, improving programming skill to
develop programs using lex tool

Relevant CO: CO1

Objectives:
1. To introduce students to Lex tool and its usage in compiler design
2. To provide practical knowledge of regular expressions and their use in pattern matching
3. To enhance students' understanding of data structures such as arrays, lists, and trees
4. To develop students' problem-solving skills in developing and implementing programs
using Lex tool
5. To develop students' debugging skills to identify and resolve program errors and issues

Software/Equipment: Computer system, Text editor, Lex tool, C compiler, Terminal or


Command prompt.

Theory:

❖ LEX:
• Lex is a program that generates lexical analyzers. It is used with a YACC parser generator.
• The lexical analyzer is a program that transforms an input stream into a sequence of tokens.
• Lex reads the given specification and produces, as output, C source code that implements the
lexical analyzer.
• During the first phase the compiler reads the input and converts strings in the source to
tokens.
• With regular expressions we can specify patterns to lex so it can generate code that will
allow it to scan and match strings in the input. Each pattern specified in the input to lex has
an associated action.
• Typically an action returns a token that represents the matched string for subsequent use by
the parser. Initially we will simply print the matched string rather than return a token value.

⮚ Function of LEX:
• Firstly, the lexical analyzer specification is written as a program lex.l in the Lex language. Then the Lex compiler
runs the lex.l program and produces a C program lex.yy.c.
• Finally, the C compiler runs the lex.yy.c program and produces an object program a.out.
• a.out is a lexical analyzer that transforms an input stream into a sequence of tokens.

⮚ LEX File Format:


• A Lex program is separated into three sections by %% delimiters. The format of a Lex source file
is as follows:
%{ definitions %}
%%
{ rules }
%%
{ user subroutines }
• Definitions include declarations of constant, variable and regular definitions.
• Rules define statements of the form p1 {action1} p2 {action2} ... pn {actionn},
• where pi describes a regular expression and actioni describes the action the lexical
analyzer should take when pattern pi matches a lexeme.
• User subroutines are auxiliary procedures needed by the actions. The subroutines can be
loaded with the lexical analyzer and compiled separately.

❖ FLEX: Fast Lexical Analyzer Generator


• FLEX is a tool/computer program for generating lexical analyzers (scanners or
lexers) written by Vern Paxson in C around 1987. It is used together with Berkeley Yacc
parser generator or GNU Bison parser generator. Flex and Bison both are more flexible
than Lex and Yacc and produce faster code.
• Bison produces parsers from the input file provided by the user. The function yylex() is
automatically generated by flex when it is provided with a .l file, and the parser calls this yylex()
function to retrieve tokens from the input token stream.

STEPS:
• Step 1 : An input file named lex.l, describing the lexical analyzer to be generated, is written
in the lex language. The lex compiler transforms lex.l into a C program, in a file that is always
named lex.yy.c.
• Step 2 : The C compiler compiles the lex.yy.c file into an executable file called a.out.
• Step 3 : The output file a.out takes a stream of input characters and produces a stream of
tokens.
• Program Structure:

In the input file, there are 3 sections:


Definition Section: The definition section contains the declaration of variables, regular
definitions, and manifest constants. In the definition section, text is enclosed in “%{ %}”
brackets. Anything written in these brackets is copied directly to the file lex.yy.c
Syntax:
%{
// Definitions
%}

Rules Section: The rules section contains a series of rules of the form: pattern action. The pattern
must be unindented and the action must begin on the same line in {} brackets. The rule section
is enclosed in “%% %%”.
Syntax:
%%
pattern action
%%


User Code Section: This section contains C statements and additional functions. We can also
compile these functions separately and load them with the lexical analyzer.

How to run the program:


To run the program, it should first be saved with the extension .l or .lex. Run the commands below
on the terminal in order to run the program file.
• Step 1: lex filename.l or lex filename.lex, depending on the extension with which the file is
saved.
• Step 2: gcc lex.yy.c
• Step 3: ./a.out
• Step 4: Provide the input to program in case it is required

Program:

Program-1: Convert Roman to Decimal

Code:
%{
#include <stdio.h>
#include <stdlib.h>  /* for exit() */

int decimal = 0; // to store the decimal value
%}

/* Rules section */
%%
I  { decimal += 1; }    // If the symbol is 'I', add 1 to the decimal value
IV { decimal += 4; }    // 'IV' is matched as one token (longest match) and adds 4
V  { decimal += 5; }    // If the symbol is 'V', add 5 to the decimal value
IX { decimal += 9; }    // If the symbol is 'IX', add 9 to the decimal value
X  { decimal += 10; }   // If the symbol is 'X', add 10 to the decimal value
XL { decimal += 40; }   // If the symbol is 'XL', add 40 to the decimal value
L  { decimal += 50; }   // If the symbol is 'L', add 50 to the decimal value
XC { decimal += 90; }   // If the symbol is 'XC', add 90 to the decimal value
C  { decimal += 100; }  // If the symbol is 'C', add 100 to the decimal value
CD { decimal += 400; }  // If the symbol is 'CD', add 400 to the decimal value
D  { decimal += 500; }  // If the symbol is 'D', add 500 to the decimal value
CM { decimal += 900; }  // If the symbol is 'CM', add 900 to the decimal value
M  { decimal += 1000; } // If the symbol is 'M', add 1000 to the decimal value
\n { return 0; }        // stop scanning at the end of the line
.  { printf("Invalid Roman numeral\n"); exit(1); } // any other symbol is an error
%%

/* Code section */
int main()
{
    yylex();
    printf("Decimal value: %d\n", decimal);
    return 0;
}

int yywrap()
{
    return 1;
}
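For example, for the input MCMXCIV the scanner's longest-match rule picks out M, CM, XC and IV, so the program should print Decimal value: 1994 (1000 + 900 + 90 + 4).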


Program-2: Check whether the given statement is compound or simple.

Code:
%{
#include <stdio.h>
int flag = 0;
%}

%%
and|or|but|because|if|then|nevertheless { flag = 1; }
. ;
\n { return 0; }

%%

int main() {
printf("Enter the sentence:\n");
yylex();
if (flag == 0)
printf("Simple sentence\n");
else
printf("Compound sentence\n");
return 0;
}

int yywrap() {
return 1;
}
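For instance, with the rules above, the input "I stayed home because it was raining" should be reported as a compound sentence, while "I stayed home" should be reported as a simple sentence.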

Program-3: Extract html tags from .html file

Code:
%{
#include<stdio.h>
%}
%%
\<[^>]*\>   { fprintf(yyout, "%s\n", yytext); }
.|\n        ;
%%
int yywrap()
{
    return 1;
}

int main()
{
    yyin = fopen("html_page.html", "r");
    yylex();
    return 0;
}

Observations and Conclusion:

Program-1:


A Lex program efficiently converts Roman numerals to decimal values by matching each
numeral symbol to its corresponding decimal equivalent while identifying and handling invalid
symbols. Lex simplifies this conversion process by defining rules for recognition and
conversion, making it a convenient tool for this task.

Program-2:

To summarize, a Lex program can effectively assess whether a provided statement is
compound or simple by examining its wording, in particular the presence of conjunctions
such as "and", "or", "but" and "because". This program establishes rules for recognizing and
categorizing statements, allowing for a clear and concise means of differentiating them within
the associated program actions.

Program-3:

The Lex program is crafted to identify and extract HTML tags from an input HTML file. It


relies on regular expressions to spot HTML tags enclosed within angle brackets ("<" and ">") and
prints each matched tag on its own line. Everything else, including tabs, spaces, and newline
characters, is discarded from the input data stream.

Quiz:
1. What is Lex tool?
• Lex is a program that generates lexical analyzer. It is used with a YACC parser generator.
• The lexical analyzer is a program that transforms an input stream into a sequence of
tokens.
• It reads the input stream and produces the source code as output through implementing
the lexical analyzer in the C program.

2. What is the purpose of Lex tool?


• The purpose of the Lex tool is to generate lexical analyzers (parsers) that tokenize input
text or code into meaningful units, making it easier to process and analyze structured
data, such as programming languages or configuration files.

3. What is the aim of the Roman to Decimal conversion program?


• The aim of the Roman to Decimal conversion program is to take a Roman numeral as
input and convert it into its equivalent decimal number. This program recognizes valid
Roman numeral symbols and their combinations and calculates the corresponding
decimal value. If an invalid symbol or format is encountered, the program may exit with
an error message. The program provides an efficient and automated way to perform this
conversion, simplifying the process of working with Roman numerals in various
applications.

4. How does the program check whether a given statement is compound or simple?
• The program checks whether a given statement is compound or simple by scanning it for
connecting words such as "and", "or", "but", "because", "if", "then" and "nevertheless". If any
of these conjunctions occurs, the sentence is reported as compound; otherwise it is
reported as simple.

5. What is the purpose of the program to extract HTML tags from an HTML file?
• The purpose of the program to remove HTML tags from an HTML file is to extract and
eliminate all HTML markup tags and their content, leaving behind only the plain text
content of the file. This program is useful for tasks that require processing or analyzing
the text within an HTML document without interference from HTML formatting and
structure.

6. Explain the Rule Section of a Lex program.


• The Rule Section of a Lex program defines patterns and corresponding actions for
recognizing and processing text. It consists of a series of rules in the format "pattern
action," where patterns are regular expressions describing text to match, and actions are
code blocks executed when a match occurs.

7. What is the purpose of the Definition Section in a Lex program?


• The purpose of the Definition Section in a Lex program is to define macros and regular
expression patterns that can be used throughout the program. It allows for the creation of
reusable and more readable patterns and simplifies the specification of lexical rules in


the Rule Section.

8. How can you declare and initialize a variable in a Lex program?


• In a Lex program, you can declare and initialize variables using C code sections,
specifically within the "Definitions Section." Here's how you can declare and initialize
variables in a Lex program:
• Using the %{ and %} Delimiters: You can enclose C code within the %{ and %}
delimiters in the Definitions Section. This C code is typically placed at the beginning of
the Definitions Section and can include variable declarations and initializations.
• %{ #include <stdio.h> int token_count = 0; %}

9. How can you compile and execute a Lex program?


• First, create your Lex program file with a .l extension.
• Generate the Lexical Analyzer Code using: lex myprogram.l
• Compile the C Code using: gcc lex.yy.c
• Execute the Program using: ./a.out (on Linux) or a.exe (on Windows)

Suggested Reference:
1. Aho, A.V., Sethi, R., & Ullman, J.D. (1986). Compilers: Principles, Techniques, and
Tools.
Addison-Wesley.
2. Levine, J.R., Mason, T., & Brown, D. (2009). lex & yacc. O'Reilly Media, Inc.
3. Lex - A Lexical Analyzer Generator. Retrieved from https://www.gnu.org/software/flex/manual/
4. Lexical Analysis with Flex. Retrieved from https://www.geeksforgeeks.org/flex-fast-lexicalanalyzer-generator/
5. The Flex Manual. Retrieved from https://westes.github.io/flex/manual/

References used by the students:


1. https://www.geeksforgeeks.org/converting-roman-numerals-decimal-lying-1-3999/
2. https://www.vtupulse.com/lex-and-yacc/lex-program-to-simple-or-compound-sentence/
3. https://www.geeksforgeeks.org/lex-code-extract-html-tags-file/

Rubric wise marks obtained:


Rubrics: Understanding of Lex tool (2) | Problem Recognition (2) | Logic Building (2) |
Completeness and accuracy (2) | Ethics (2) | Total
Each criterion is graded Good (2) or Avg. (1).

Marks


Experiment No - 4
Aim: Introduction to YACC and generate Calculator Program

Date:

Competency and Practical Skills:


• Understanding of YACC and its role in compiler construction
• Ability to write grammar rules and YACC programs for a given language
• Ability to develop programs using YACC

Relevant CO: CO2

Objectives:
By the end of this experiment, the students should be able to:
⮚ Understand the concept of YACC and its significance in compiler construction
⮚ Write grammar rules for a given language
⮚ Implement a calculator program using YACC

Software/Equipment: Windows/Linux Operating System, YACC Compiler, Text editor,


Command prompt or terminal

Theory:
YACC (Yet Another Compiler Compiler) is a tool that is used for generating parsers. It is used
in combination with Lex to generate compilers and interpreters. YACC takes a set of rules and
generates a parser that can recognize and process the input according to those rules.

The grammar rules that are defined using YACC are written in BNF (Backus-Naur Form)
notation. These rules describe the syntax of a programming language.

INPUT FILE:
→ The YACC input file is divided into three parts.
/* definitions */
....
%%
/* rules */
....
%%
/* auxiliary routines */
....

Definition Part:
→ The definition part includes information about the tokens used in the syntax definition.

Rule Part:
→ The rules part contains grammar definition in a modified BNF form. Actions is C code in { }
and can be embedded inside (Translation schemes).

Auxiliary Routines Part:


→ The auxiliary routines part is only C code.


→ It includes function definitions for every function needed in the rules part.
→ It can also contain the main() function definition if the parser is going to be run as a
program.
→ The main() function must call the function yyparse().

For Compiling YACC Program:


1. Write lex program in a file file.l and yacc in a file file.y
2. Open Terminal and Navigate to the Directory where you have saved the files.
3. type lex file.l
4. type yacc -d file.y (the -d option also generates the header y.tab.h with the token definitions)
5. type cc lex.yy.c y.tab.c -ll
6. type ./a.out

The program for generating a calculator using YACC involves the following steps:
⮚ Defining the grammar rules for the calculator program
⮚ Writing the Lex code for tokenizing the input
⮚ Writing the YACC code for parsing the input and generating the output

Program:

1. Create a file named "calculator.l":

%{
#include<stdio.h>
#include<ctype.h>
#include "y.tab.h"   /* token definitions (INTEGER, EOL) generated by yacc -d */
int result;
%}

%%
[0-9]+ { yylval = atoi(yytext); return INTEGER; }
[ \t] ; /* skip whitespace */
\n { return EOL; }
. { return yytext[0]; }
%%

int yywrap(void) {
return 1;}


2. Create a file named "calculator.y":

%{
#include<stdio.h>
int yylex(void);
void yyerror(char* s);
%}

%token INTEGER EOL

/* precedence declarations: without them the grammar below is ambiguous;
   these make +, -, * and / left-associative, with * and / binding tighter */
%left '+' '-'
%left '*' '/'

%%

line: /* empty */
| line exp EOL { printf("= %d\n", $2); }
;

exp: INTEGER { $$ = $1; }


| exp '+' exp { $$ = $1 + $3; }
| exp '-' exp { $$ = $1 - $3; }
| exp '*' exp { $$ = $1 * $3; }
| exp '/' exp { $$ = $1 / $3; }
| '(' exp ')' { $$ = $2; }
;
%%

int main(void) {
yyparse();
return 0;}

void yyerror(char* s) {
fprintf(stderr, "error: %s\n", s);
}
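The two files can be built and tried out with the command sequence from the theory section
(file names assumed here to be calculator.l and calculator.y):

lex calculator.l
yacc -d calculator.y
cc lex.yy.c y.tab.c -ll
./a.out
2+3*4
= 14
(2+3)*4
= 20

The -d option makes yacc emit y.tab.h, which the Lex file includes so that the token codes
INTEGER and EOL are shared between the scanner and the parser; the results shown assume the
precedence declarations added above.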

Observations and Conclusion:

After executing the program, we observed that the calculator program was successfully
generated using YACC. It was able to perform simple arithmetic operations such as addition,
subtraction, multiplication, and division. The program was also able to handle negative
numbers and brackets.


Quiz:

1. What is YACC?
• YACC, short for Yet Another Compiler Compiler, is a tool used for generating parsers
and compilers based on formal grammars.

2. What is the purpose of YACC?


• The purpose of YACC is to automate the creation of parsers for programming languages
or formal grammars, generating parsers in programming languages like C or C++ to
analyze and understand the syntax of input code.

3. What is the output of YACC?


• The primary output of YACC is a parser implemented in a programming language,
responsible for analyzing the structure and syntax of input code based on provided
grammar rules.

4. What is a syntax analyzer?


• A syntax analyzer, also known as a parser, checks the syntactic correctness of source
code by ensuring it adheres to the specified grammar rules of a programming language,
verifying the correct structure of the code.

5. What is the role of a lexical analyzer in YACC?


• The role of a lexical analyzer in YACC is to tokenize the input source code, breaking it
into meaningful units known as tokens. These tokens represent language elements like
keywords, identifiers, and literals. The lexical analyzer categorizes these tokens and
provides them to the YACC parser for further syntactic analysis, facilitating the
understanding and validation of the source code's structure.

Suggested Reference:
1. "Lex & Yacc" by John R. Levine, Tony Mason, and Doug Brown
2. "The Unix Programming Environment" by Brian W. Kernighan and Rob

Pike References used by the students:

https://www.geeksforgeeks.org/introduction-to-yacc/
https://www.geeksforgeeks.org/yacc-program-to-implement-a-calculator-and-recognize-a-valid-arithmetic-expression/

Rubric wise marks obtained:


Rubrics: Understanding of YACC (2) | Grammar Generation (2) | Implementation (2) |
Testing & Debugging (2) | Ethics (2) | Total
Each criterion is graded Good (2) or Avg. (1).

Marks


Experiment No - 5
Aim: Implement a program for constructing
a. LL(1) Parser
b. Predictive Parser

Date:

Competency and Practical Skills:


• Understanding Parsers and its role in compiler construction
• Ability to write FIRST and FOLLOW sets for a given grammar
• Ability to develop LL(1) and predictive parsers using the top-down parsing approach

Relevant CO: CO2

Objectives:
By the end of this experiment, the students should be able to:
⮚ Understand the concept of parsers and their significance in compiler construction
⮚ Write FIRST and FOLLOW sets for a given grammar
⮚ Implement an LL(1) and a predictive parser using the top-down approach

Software/Equipment: C compiler
Theory:
❖ LL(1) Parsing: Here the 1st L represents that the scanning of the Input will be done
from the Left to Right manner and the second L shows that in this parsing technique,
we are going to use the Left most Derivation Tree. And finally, the 1 represents the
number of look-ahead, which means how many symbols are you going to see when
you want to make a decision.
Essential conditions to check first are as follows:
1. The grammar is free from left recursion.
2. The grammar should not be ambiguous.
3. The grammar has to be left factored so that it becomes a deterministic grammar.
These conditions are necessary but not sufficient for a grammar to be LL(1).

Algorithm to construct LL(1) Parsing Table:


Step 1: First check all the essential conditions mentioned above and go to step 2.
Step 2: Calculate First() and Follow() for all non-terminals.
1. First(): If there is a variable, and from that variable, if we try to drive all the strings
then the beginning Terminal Symbol is called the First.
2. Follow(): What is the Terminal Symbol which follows a variable in the process of
derivation.
Step 3: For each production A –> α. (A tends to alpha)
1. Find First(α) and for each terminal in First(α), make entry A –> α in the table.
2. If First(α) contains ε (epsilon) as terminal, then find the Follow(A) and for each terminal
in Follow(A), make entry A –> ε in the table.
3. If the First(α) contains ε and Follow(A) contains $ as terminal, then make entry A –> ε
in the table for the $.
To construct the parsing table, we use the two functions FIRST() and FOLLOW().
In the table, rows will contain the Non-Terminals and the columns will contain the Terminal


Symbols. All the Null Productions of the Grammars will go under the Follow elements and
the remaining productions will lie under the elements of the First set.
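As an illustration, consider the standard non-left-recursive expression grammar below
(^ denotes ε, the same convention used by Program-1 that follows):

E -> TX        X -> +TX | ^
T -> FY        Y -> *FY | ^
F -> (E) | id

FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }
FIRST(X) = { + , ^ }        FIRST(Y) = { * , ^ }
FOLLOW(E) = FOLLOW(X) = { ) , $ }
FOLLOW(T) = FOLLOW(Y) = { + , ) , $ }
FOLLOW(F) = { + , * , ) , $ }

For example, the table entry for (E, id) is E -> TX because id is in FIRST(TX), while the
entry for (X, $) is X -> ^ because ^ is in FIRST(X) and $ is in FOLLOW(X).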

❖ Predictive Parser
Predictive parser is a recursive descent parser, which has the capability to predict which
production is to be used to replace the input string. The predictive parser does not suffer from
backtracking.
To accomplish its tasks, the predictive parser uses a look-ahead pointer, which points to the
next input symbols. To make the parser back-tracking free, the predictive parser puts some
constraints on the grammar and accepts only a class of grammar known as LL(k) grammar.

Predictive parsing uses a stack and a parsing table to parse the input and generate a parse tree.
Both the stack and the input contain an end symbol $ to denote that the stack is empty and the
input is consumed. The parser refers to the parsing table to take any decision on the input and
stack element combination.

In recursive descent parsing, the parser may have more than one production to choose from for
a single instance of input, whereas in predictive parser, each step has at most one production
to choose. There might be instances where there is no production matching the input string,
causing the parsing procedure to fail.

Program-1:
#include<stdio.h>
#include<string.h>


#define TSIZE 128


// table[i][j] stores
// the index of the production that must be applied on the
// ith variable if the input is the
// jth terminal
int table[100][TSIZE];
// stores the list of terminals
// the ASCII value is used to index terminals
// terminal[i] = 1 means the character with
// ASCII value i is a terminal
char terminal[TSIZE];
// stores the list of nonterminals
// only upper case letters from 'A' to 'Z'
// can be nonterminals
// nonterminal[i] = 1 means the ith alphabet is present as a
// nonterminal in the grammar
char nonterminal[26];
// structure to hold each production
// str[] stores the production
// len is the length of production
struct product {
char str[100];
int len;
}pro[20];
// no of productions in form A->ß
int no_pro;
char first[26][TSIZE];
char follow[26][TSIZE];
// stores first of each production in form A->ß
char first_rhs[100][TSIZE];
// check if the symbol is nonterminal
int isNT(char c) {
return c >= 'A' && c <= 'Z';
}
// reading data from the file
void readFromFile() {
FILE* fptr;
fptr = fopen("text.txt", "r");
char buffer[255];
int i;
int j;
while (fgets(buffer, sizeof(buffer), fptr)) {
printf("%s", buffer);
j = 0;
nonterminal[buffer[0] - 'A'] = 1;
for (i = 0; i < strlen(buffer) - 1; ++i) {
if (buffer[i] == '|') {
++no_pro;
pro[no_pro - 1].str[j] = '\0';
pro[no_pro - 1].len = j;


pro[no_pro].str[0] = pro[no_pro - 1].str[0];


pro[no_pro].str[1] = pro[no_pro - 1].str[1];
pro[no_pro].str[2] = pro[no_pro - 1].str[2];
j = 3;
}
else {
pro[no_pro].str[j] = buffer[i];
++j;
if (!isNT(buffer[i]) && buffer[i] != '-' && buffer[i] != '>') {
terminal[buffer[i]] = 1;
}
}
}
pro[no_pro].len = j;
++no_pro;
}
}
void add_FIRST_A_to_FOLLOW_B(char A, char B)
{
int i;
for (i = 0; i < TSIZE; ++i)
{
if (i != '^') {
follow[B - 'A'][i] = follow[B - 'A'][i] || first[A - 'A'][i];
}
}
}
void add_FOLLOW_A_to_FOLLOW_B(char A, char B)
{
int i;
for (i = 0; i < TSIZE; ++i)
{
if (i != '^')
follow[B - 'A'][i] = follow[B - 'A'][i] || follow[A - 'A'][i];
}
}
void FOLLOW()
{
int t = 0;
int i, j, k, x;
while (t++ < no_pro)
{
for (k = 0; k < 26; ++k) {
if (!nonterminal[k]) continue;
char nt = k + 'A';
for (i = 0; i < no_pro; ++i) {
for (j = 3; j < pro[i].len; ++j) {
if (nt == pro[i].str[j]) {
for (x = j + 1; x < pro[i].len; ++x) {
char sc = pro[i].str[x];
if (isNT(sc)) {


add_FIRST_A_to_FOLLOW_B(sc, nt);
if (first[sc - 'A']['^'])
continue;
}
else {
follow[nt - 'A'][sc] = 1;
}
break;
}
if (x == pro[i].len)
add_FOLLOW_A_to_FOLLOW_B(pro[i].str[0]
, nt);
}
}
}
}
}
}
void add_FIRST_A_to_FIRST_B(char A, char B) {
int i;
for (i = 0; i < TSIZE; ++i) {
if (i != '^') {
first[B - 'A'][i] = first[A - 'A'][i] || first[B -
'A'][i];
}
}
}
void FIRST() {
int i, j;
int t = 0;
while (t < no_pro) {
for (i = 0; i < no_pro; ++i) {
for (j = 3; j < pro[i].len; ++j) {
char sc = pro[i].str[j];
if (isNT(sc)) {
add_FIRST_A_to_FIRST_B(sc, pro[i].str[0]);
if (first[sc - 'A']['^'])
continue;
}
else {
first[pro[i].str[0] - 'A'][sc] = 1;
}
break;
}
if (j == pro[i].len)
first[pro[i].str[0] - 'A']['^'] = 1;
}
++t;
}
}


void add_FIRST_A_to_FIRST_RHS__B(char A, int B) {


int i;
for (i = 0; i < TSIZE; ++i) {
if (i != '^')
first_rhs[B][i] = first[A - 'A'][i] || first_rhs[B][i];
}
}
// Calculates FIRST(ß) for each A->ß
void FIRST_RHS() {
int i, j;
int t = 0;
while (t < no_pro) {
for (i = 0; i < no_pro; ++i) {
for (j = 3; j < pro[i].len; ++j) {
char sc = pro[i].str[j];
if (isNT(sc)) {
add_FIRST_A_to_FIRST_RHS__B(sc, i);
if (first[sc - 'A']['^'])
continue;
}
else {
first_rhs[i][sc] = 1;
}
break;
}
if (j == pro[i].len)
first_rhs[i]['^'] = 1;
}
++t;
}
}
int main() {
readFromFile();
follow[pro[0].str[0] - 'A']['$'] = 1;
FIRST();
FOLLOW();
FIRST_RHS();
int i, j, k;
// display first of each variable
printf("\n");
for (i = 0; i < no_pro; ++i) {
if (i == 0 || (pro[i - 1].str[0] != pro[i].str[0])) {
char c = pro[i].str[0];
printf("FIRST OF %c: ", c);
for (j = 0; j < TSIZE; ++j) {
if (first[c - 'A'][j]) {
printf("%c ", j);
}
}
printf("\n");


}
}
// display follow of each variable
printf("\n");
for (i = 0; i < no_pro; ++i) {
if (i == 0 || (pro[i - 1].str[0] != pro[i].str[0])) {
char c = pro[i].str[0];
printf("FOLLOW OF %c: ", c);
for (j = 0; j < TSIZE; ++j) {
if (follow[c - 'A'][j]) {
printf("%c ", j);
}
}
printf("\n");
}
}
// display first of each variable ß
// in form A->ß
printf("\n");
for (i = 0; i < no_pro; ++i) {
printf("FIRST OF %s: ", pro[i].str);
for (j = 0; j < TSIZE; ++j) {
if (first_rhs[i][j]) {
printf("%c ", j);
}
}
printf("\n");
}
terminal['$'] = 1;
terminal['^'] = 0;
// printing parse table
printf("\n");
printf("\n\t**************** LL(1) PARSING TABLE *******************\n");
printf("\t--------------------------------------------------------\n");
printf("%-10s", "");
for (i = 0; i < TSIZE; ++i) {
if (terminal[i]) printf("%-10c", i);
}
printf("\n");
int p = 0;
for (i = 0; i < no_pro; ++i) {
if (i != 0 && (pro[i].str[0] != pro[i - 1].str[0]))
p = p + 1;
for (j = 0; j < TSIZE; ++j) {
if (first_rhs[i][j] && j != '^') {
table[p][j] = i + 1;
}
else if (first_rhs[i]['^']) {
for (k = 0; k < TSIZE; ++k) {
if (follow[pro[i].str[0] - 'A'][k]) {


table[p][k] = i + 1;
}
}
}
}
}
k = 0;
for (i = 0; i < no_pro; ++i) {
if (i == 0 || (pro[i - 1].str[0] != pro[i].str[0])) {
printf("%-10c", pro[i].str[0]);
for (j = 0; j < TSIZE; ++j) {
if (table[k][j]) {
printf("%-10s", pro[table[k][j] - 1].str);
}
else if (terminal[j]) {
printf("%-10s", "");
}
}
++k;
printf("\n");
}
}
}
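The program expects the grammar in a file named text.txt, one non-terminal per line, with the
alternatives separated by | and ^ standing for ε. A possible input file (the grammar is chosen
here only for illustration) is:

E->TX
X->+TX|^
T->FY
Y->*FY|^
F->(E)|i

Compiling the program with any C compiler (for example, gcc ll1.c) and running it prints the
grammar, the FIRST and FOLLOW sets of E, X, T, Y and F, the FIRST set of every right-hand
side, and finally the LL(1) parsing table for this grammar.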

Program -2:
#include <stdio.h>
#include <string.h>

// Function to parse the non-terminal S
void S();

// Function to parse the non-terminal L
void L();

// Input string and position pointer
char input[100];
int pos = 0;

int main()
{
    printf("Enter the input string: ");
    gets(input);
    // Start parsing with non-terminal S
    S();
    // Check if the entire string is parsed
    if (input[pos] == '\0')
        printf("\nParsing Successful!\n");
    else
        printf("\nParsing Failed!\n");
    return 0;
}


void S()
{
    if (input[pos] == '(') {
        pos++;
        L();
        if (input[pos] == ')') {
            pos++;
        } else {
            printf("Error: Expected ')'\n");
            return;
        }
    } else if (input[pos] == 'a') {
        pos++;
    } else {
        printf("Error: Invalid input\n");
        return;
    }
}
void L()
{
    S();
    if (input[pos] == ',') {
        pos++;
        L();
    }
}

Observations and Conclusion:


Program -1:


In the above example, the grammar is given as input and the FIRST and FOLLOW sets of the
nonterminals are identified. Further, the LL(1) parsing table is constructed.

Program-2:

S → (L) | a

L → S , L | S

The code is a simple demonstration of how a recursive-descent parser works for a specific context-
free grammar. It's important to note that the code only handles a limited grammar, and for more
complex languages, more sophisticated parsing techniques such as LL or LR parsing may be required.
Quiz:

1. What is a parser and state the Role of it?


• A parser is a software component that interprets and analyzes the structure of a program
or a piece of text according to the rules of a formal grammar. Its primary role is to break
down a complex input into its constituent parts, identifying the syntactic elements and
their relationships based on the grammar rules. It helps in understanding the hierarchical
structure of the input, enabling the extraction of meaningful information and facilitating
further processing or interpretation. In programming languages, parsers play a crucial
role in code compilation, interpretation, and execution, ensuring that the syntax of the
code adheres to the language's grammar rules.

2. Types of parsers? Examples to each.


1. Recursive Descent Parser:
• Example: Simple parsers for small grammars in programming languages.
2. LL(1) Parser:


• Example: Parser used in the processing of computer languages such as JavaScript.


3. LALR Parser (Look-Ahead LR):
• Example: Yacc (Yet Another Compiler Compiler) tool for generating parsers,
commonly used in the development of compilers.
4. LR Parser (Left-to-right, Rightmost Derivation):
• Example: Bison, a general-purpose parser generator that converts an annotated
context-free grammar into a deterministic LR or generalized LR parser.
5. SLR Parser (Simple LR):
• Example: Used for simple grammars and often implemented in parser generators
like Yacc.
6. Earley Parser:
• Example: Parsing natural language, particularly in cases where the grammar is
ambiguous or not context-free.
• These parsers serve as crucial tools for various applications, including language
processing, compiler development, and natural language understanding. They
provide efficient ways to analyze and understand the structure of complex inputs

3. What are the Tools available for implementation?


• There are several tools available for implementing parsers and other language processing
tasks. Some of the most commonly used tools include:
1. Lex and Yacc (or Flex and Bison): Lex and Yacc are tools for generating lexical
analyzers and parsers, respectively. Flex and Bison are the GNU versions of these
tools.
2. ANTLR (ANother Tool for Language Recognition): ANTLR is a powerful parser
generator for reading, processing, executing, or translating structured text or binary
files.
3. Jison: Jison is a robust parser generator for JavaScript that builds parsers based on a
context-free grammar.
4. PLY (Python Lex-Yacc): PLY is a pure-Python implementation of the lex and yacc
tools that provides a more Pythonic way of defining grammars.
5. ANTLRWorks: ANTLRWorks is a powerful IDE for ANTLR that includes a
grammar editor and debugger for creating and testing language grammars.
• These tools significantly simplify the development of parsers and other language
processing components, enabling developers to focus more on the language's logic
and structure rather than low-level parsing details.

4. How do you calculate FIRST() and FOLLOW() sets used in Parsing Table construction?
5. Name the most powerful parser.
Calculating the FIRST() and FOLLOW() sets in the context of constructing a parsing table
involves a systematic process that aids in the parsing of context-free grammars. Here are the
basic steps for each:
1. FIRST() Set Calculation:
• Initialize the FIRST() set for each non-terminal symbol to an empty set.
• For each production A → α, where α is a string of terminals and nonterminals:
• If α starts with a terminal, add it to FIRST(A).
• If α starts with a non-terminal B, add FIRST(B) to FIRST(A) and continue until a
terminal is found.
• If α derives the empty string, add ε to FIRST(A).
• Repeat the process until no further changes occur.


2. FOLLOW() Set Calculation:


• Initialize the FOLLOW() set for each non-terminal symbol to an empty set.
• Identify the start symbol and set the FOLLOW() set of the start symbol to include
the end-of-input marker ($).
• For each production A → αBβ, add FIRST(β) (excluding ε) to FOLLOW(B). If ε
is in FIRST(β), add FOLLOW(A) to FOLLOW(B).
• Repeat the process until no further changes occur.
3. The most powerful parser is the Earley Parser. It is a powerful parsing algorithm
that can parse any context-free grammar, including ambiguous grammars, in O(n^3)
time in the general case, making it more powerful than other parsers like the LL and
LR parsers, which are restricted to certain classes of grammars

Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft, Rajeev
Motwani, and Jeffrey D. Ullman.
2. Geeks for geeks: https://www.geeksforgeeks.org/construction-of-ll1-parsing-table/
3. http://www.cs.ecu.edu/karl/5220/spr16/Notes/Top-down/LL1.html

References used by the students:

https://slaystudy.com/ll1-parsing-table-program-in-c/

Rubric wise marks obtained:


Rubrics: Knowledge of Parsing techniques (2) | Problem implementation (2) | Code Quality (2) |
Completeness and accuracy (2) | Presentation (2) | Total
Each criterion is graded Good (2) or Avg. (1).

Marks


Experiment No - 06

Aim: Implement a program for constructing


a. Recursive Decent Parser (RDP)
b. LALR Parser

Date:

Competency and Practical Skills:


• Understanding of RDP and bottom up parsers and its role in compiler construction
• Ability to check acceptance of a string through RDP and to parse strings using LALR parsers
for a given grammar
• Ability to develop an RDP (top-down) and an LALR parser (bottom-up)

Relevant CO: CO2

Objectives:
By the end of this experiment, the students should be able to:
⮚ Understand the RDP, the broad classification of bottom-up parsers and their significance in
compiler construction
⮚ Verify whether a string is accepted by the RDP and parse strings of a given grammar using LR
parsers
⮚ Implement a RDP and LALR parser

Software/Equipment: C compiler
Theory:
❖ Recursive Descent Parser:
Recursive Descent Parser uses the technique of Top-Down Parsing without backtracking. It
can be defined as a Parser that uses the various recursive procedure to process the input string
with no backtracking. It can be simply performed using a Recursive language. The first symbol
of the string of R.H.S of production will uniquely determine the correct alternative to choose.
The major approach of recursive-descent parsing is to relate each non-terminal with a
procedure. The objective of each procedure is to read a sequence of input characters that can
be produced by the corresponding non-terminal, and return a pointer to the root of the parse
tree for the non-terminal. The structure of the procedure is prescribed by the productions for
the equivalent non-terminal.
The recursive procedures are simple to write and adequately effective if written in a
language that implements procedure calls efficiently. There is a procedure for each non-
terminal in the grammar. The parser keeps a global variable lookahead, holding the current input
token, and a procedure match(ExpectedToken) recognizes the next token in
the parsing process and advances the input stream pointer, so that lookahead points to the
next token to be parsed. match() is effectively a call to the lexical analyzer to get the next
token.
For example, input stream is a + b$.
lookahead == a
match()


lookahead == +
match ()
lookahead == b
……………………….
……………………….
In this manner, parsing can be done.
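A minimal, self-contained sketch of this scheme in C is shown below; the toy grammar
S -> a S b | c and all of the names are chosen only for illustration and are not part of the
manual's programs.

#include <stdio.h>
#include <stdlib.h>

const char *input;      /* the string being parsed           */
int pos = 0;            /* input stream pointer              */
char lookahead;         /* current input token (a character) */

/* match(): recognize the expected token and advance the input pointer */
void match(char expected)
{
    if (lookahead == expected)
        lookahead = input[++pos];
    else {
        printf("Rejected\n");
        exit(0);
    }
}

/* one procedure per non-terminal: S -> a S b | c */
void S(void)
{
    if (lookahead == 'a') {
        match('a');
        S();
        match('b');
    } else {
        match('c');
    }
}

int main(void)
{
    input = "aacbb";            /* sample input        */
    lookahead = input[pos];     /* prime the lookahead */
    S();
    if (lookahead == '\0')
        printf("Accepted\n");   /* entire input consumed */
    else
        printf("Rejected\n");
    return 0;
}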
❖ LALR (1) Parsing:
LALR refers to the lookahead LR. To construct the LALR (1) parsing table, we use
the canonical collection of LR (1) items.
In the LALR (1) parsing, the LR (1) items which have same productions but different
look ahead are combined to form a single set of items
LALR (1) parsing is same as the CLR (1) parsing, only difference in the parsing table.
Example
S → AA
A → aA
A→b
Add Augment Production, insert '•' symbol at the first position for every production in
G and also add the look ahead.
S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I0 State:
Add the Augment production to the I0 State and compute the Closure.
I0 = Closure (S` → •S)
Add all productions starting with S in to I0 State because "•" is followed by the non-
terminal. So, the I0 State becomes
I0 = S` → •S, $
S → •AA, $
Add all productions starting with A in modified I0 State because "•" is followed by
the non-terminal. So, the I0 State becomes.
I0= S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I1= Go to (I0, S) = closure (S` → S•, $) = S` → S•, $
I2= Go to (I0, A) = closure ( S → A•A, $ )
Add all productions starting with A in I2 State because "•" is followed by the non-
terminal. So, the I2 State becomes
I2= S → A•A, $
A → •aA, $
A → •b, $
I3= Go to (I0, a) = Closure ( A → a•A, a/b )
Add all productions starting with A in I3 State because "•" is followed by the non-
terminal. So, the I3 State becomes


I3= A → a•A, a/b


A → •aA, a/b
A → •b, a/b
Go to (I3, a) = Closure (A → a•A, a/b) = (same as I3)
Go to (I3, b) = Closure (A → b•, a/b) = (same as I4)
I4= Go to (I0, b) = closure ( A → b•, a/b) = A → b•, a/b
I5= Go to (I2, A) = Closure (S → AA•, $) =S → AA•, $
I6= Go to (I2, a) = Closure (A → a•A, $)
Add all productions starting with A in I6 State because "•" is followed by the non-
terminal. So, the I6 State becomes
I6 = A → a•A, $
A → •aA, $
A → •b, $
Go to (I6, a) = Closure (A → a•A, $) = (same as I6)
Go to (I6, b) = Closure (A → b•, $) = (same as I7)
I7= Go to (I2, b) = Closure (A → b•, $) = A → b•, $
I8= Go to (I3, A) = Closure (A → aA•, a/b) = A → aA•, a/b
I9= Go to (I6, A) = Closure (A → aA•, $) A → aA•, $
If we analyze then LR (0) items of I3 and I6 are same but they differ only in their
lookahead.
I3 = { A → a•A, a/b
A → •aA, a/b
A → •b, a/b
}
I6= { A → a•A, $
A → •aA, $
A → •b, $
}
Clearly I3 and I6 are same in their LR (0) items but differ in their lookahead, so we
can combine them and called as I36.
I36 = { A → a•A, a/b/$
A → •aA, a/b/$
A → •b, a/b/$
}
The I4 and I7 are same but they differ only in their look ahead, so we can combine
them and called as I47.
I47 = {A → b•, a/b/$}
The I8 and I9 are same but they differ only in their look ahead, so we can combine
them and called as I89.
I89 = {A → aA•, a/b/$}
Drawing DFA:
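The GOTO transitions of the DFA, read off from the item sets derived above, are:

I0  --S-->  I1      I0  --A-->  I2      I0  --a-->  I36     I0  --b-->  I47
I2  --A-->  I5      I2  --a-->  I36     I2  --b-->  I47
I36 --A-->  I89     I36 --a-->  I36     I36 --b-->  I47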


LALR (1) Parsing table:
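Reconstructed from the item sets above (s = shift, r1 = reduce by S -> AA, r2 = reduce by
A -> aA, r3 = reduce by A -> b):

State       a        b        $            S       A
I0          s I36    s I47                 1       2
I1                            accept
I2          s I36    s I47                         5
I36         s I36    s I47                         89
I47         r3       r3       r3
I5                            r1
I89         r2       r2       r2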

Program-1:
#include<stdio.h>
#include<string.h>
#include<ctype.h>
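
/* Program-1 implements a recursive descent parser for the left-recursion-free
   expression grammar (a standard textbook form that matches the procedures below):
       E  -> T E'        E' -> + T E' | epsilon
       T  -> F T'        T' -> * F T' | epsilon
       F  -> ( E ) | id      (id here is any single alphanumeric character) */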

char input[10];
int i,error;
void E();
void T();
void Eprime();
void Tprime();
void F();
main()
{
i=0;
error=0;
printf("Enter an arithmetic expression : "); // Eg: a+a*a
gets(input);
E();
if(strlen(input)==i&&error==0)
printf("\nAccepted..!!!\n");
else printf("\nRejected..!!!\n");
}

void E()
{
T();
Eprime();


}
void Eprime()
{
if(input[i]=='+')
{
i++;
T();
Eprime();
}
}
void T()
{
F();
Tprime();
}
void Tprime()
{
if(input[i]=='*')
{
i++;
F();
Tprime();
}
}
void F()
{
if(isalnum(input[i]))i++;
else if(input[i]=='(')
{
i++;
E();
if(input[i]==')')
i++;

else error=1;
}

else error=1;
}

Program-2:
• Calculator.l

%{
#include<stdio.h>
#include<ctype.h>
#include "calculator.tab.h"
%}

%%
[0-9]+ { yylval = atoi(yytext); return INTEGER; }


[ \t] ; /* skip whitespace */


\n { return EOL; }
. { return yytext[0]; }
%%
int yywrap(void) { return 1;
}

• Calculator.y:

%{
#include<stdio.h>
extern int yylex();
extern void yyerror(char *);
%}

%token INTEGER EOL


%left '+' '-'
%left '*' '/'
%%
line: /* empty */
| line exp EOL { printf("= %d\n", $2); }
;
exp: INTEGER { $$ = $1; }
| exp '+' exp { $$ = $1 + $3; }
| exp '-' exp { $$ = $1 - $3; }
| exp '*' exp { $$ = $1 * $3; }
| exp '/' exp { $$ = $1 / $3; }
| '(' exp ')' { $$ = $2; }
;
%%

int main(void)
{ yyparse();
return 0;
}
void yyerror(char* s)
{
fprintf(stderr, "error: %s\n", s);
}

Observations and Conclusion:

Program -1:

a+(a*a), a+a*a, (a), a , a+a+a*a+a.... etc are accepted


++a, a***a, +a, a*, ((a . . . etc are rejected.


In the above output, as per the grammar provided and the corresponding calling procedures, the
input strings are parsed; the strings that are successfully parsed are accepted and the others
are rejected.

Program-2:

The code defines a simple calculator that can perform basic arithmetic operations on integers. It supports
addition, subtraction, multiplication, and division, along with parentheses for grouping expressions. You
can compile the code using appropriate tools for Lex and Yacc, such as Flex and Bison. Once compiled,
you can run the executable and input mathematical expressions to get the corresponding results.


Quiz:
1. What do you mean by shift reduce parsing?
• Shift-reduce parsing is a bottom-up parsing technique where the parser shifts the input onto
a stack until it can reduce it according to the production rules of a grammar. It involves
shifting symbols onto a stack and reducing them according to the grammar rules until the
entire input is parsed.

2. Provide broad classification of LR parsers.


• LR parsers are a type of bottom-up parsers that use left-to-right scanning of the input and
perform a rightmost derivation in reverse. They are classified into different types based
on the look-ahead they consider and the actions they take. The broad classification of LR
parsers includes:
• SLR (Simple LR) Parser: SLR parsers are simple to construct and use a minimal amount
of information to make parsing decisions. They have a simple set of states and use a
subset of the LR(0) items to construct the parsing table. However, SLR parsers may not
be able to handle all grammars due to their limited lookahead.
• LALR (Look-Ahead LR) Parser: LALR parsers are an improvement over SLR parsers.
They use a more sophisticated lookahead mechanism to handle a larger set of grammars
while still maintaining a manageable parsing table. LALR parsers are more powerful than
SLR parsers but less powerful than canonical LR parsers.
• LR(1) Parser: LR(1) parsers have a lookahead of one symbol and are more powerful than
SLR and LALR parsers. They can handle a broader class of grammars, making them
more widely applicable. However, constructing LR(1) parsing tables can be complex,
and the parsing tables can be large.
• Canonical LR (LR(1)) Parser: Canonical LR parsers are the most powerful among LR
parsers. They consider the entire right context of any handle and have the most substantial
lookahead. Canonical LR parsers can handle a wide range of context-free grammars but
require more computational resources for construction and may result in large parsing
tables.
• These different classifications represent a trade-off between the complexity of the parser
construction and the power of the grammar it can handle. They aim to balance the
efficiency and practicality of parsing algorithms for various types of context-free
grammars.

3. Differentiate RDP and LALR parser.


• RDP is a top-down parsing technique that uses recursive procedures, while LALR is a
bottom-up parsing technique that employs a state machine with a stack and a more
sophisticated look-ahead mechanism. RDP is suitable for LL(k) grammars and is easy to
implement, while LALR is more powerful and practical for handling a broader class of
grammars, albeit with more complex parsing table construction.

4. How to do merging of itemsets?


• To merge item sets in the context of parsing, specifically in the construction of LR(0),
SLR, LALR, or LR(1) parsing tables, follow these steps:
1. Identify the item sets that can be merged based on their core.
2. Merge the item sets by combining their contents while taking care to handle any
duplicate items.


3. Adjust the transitions in the parsing table accordingly, ensuring that transitions from
the merged item sets are appropriately updated to reflect the merged state.
4. Update the lookahead sets for the merged item sets to ensure that the parser can
handle the combined grammar rules accurately.
5. Reconstruct the parsing table based on the updated item sets and transitions, taking
care to resolve any potential conflicts that may arise during the merging process.
• By following these steps, you can effectively merge item sets and streamline the
construction of parsing tables, facilitating more efficient and optimized parsing of input
strings based on the given grammar rules.

Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft,
Rajeev Motwani, and Jeffrey D. Ullman.
2. Geeks for geeks: https://www.geeksforgeeks.org/recursive-descent-parser/
3. https://www.youtube.com/watch?v=odoHgcoombw
4. https://www.geeksforgeeks.org/lalr-parser-with-examples/

References used by the students:


• https://www.geeksforgeeks.org/recursive-descent-parser/
• https://datatrained.com/post/recursive-descent-parser/
• https://www.geeksforgeeks.org/lalr-parser-with-examples/
• https://en.wikipedia.org/wiki/LALR_parser

Rubric wise marks obtained:


Rubrics: Knowledge of Parsing techniques (2) | Problem implementation (2) | Code Quality (2) |
Completeness and accuracy (2) | Presentation (2) | Total
Each criterion is graded Good (2) or Avg. (1).

Marks


Experiment No - 07

Aim: Implement a program for constructing Operator Precedence Parsing.


Date:

Competency and Practical Skills:


• Understanding of OPG and its role in compiler construction
• Ability to write precedence table for a given language
• Ability to develop an OPG program using a C compiler

Relevant CO: CO2

Objectives:
By the end of this experiment, the students should be able to:
⮚ Understand the concept of OPG and its significance in compiler construction
⮚ Write precedence relations for grammar
⮚ Implement a OPG using C compiler

Software/Equipment: C compiler
Theory:
Operator Precedence Parsing is also a type of Bottom-Up Parsing that can be used to a class of
Grammars known as Operator Grammar.
A Grammar G is Operator Grammar if it has the following
properties −
● Production should not contain ϵ on its right side.
● There should not be two adjacent non-terminals at the right side of production.
Example1 − Verify whether the following Grammar is operator
Grammar or not.
E → E A E |(E)|id
A → +| − | ∗
Solution
No, it is not an operator Grammar as it does not satisfy property 2 of operator Grammar.
As it contains two adjacent Non-terminals on R.H.S of production E → E A E.
We can convert it into the operator Grammar by substituting the value of A in E → E A E.
E → E + E |E − E |E * E |(E) | id.
Operator Precedence Relations
Three precedence relations exist between the pair of terminals.

Relation    Meaning

p <. q      p has lower precedence than q.

p >. q      p has higher precedence than q.

p =. q      p has the same precedence as q.


Depending upon these precedence Relations, we can decide which operations will be
executed or parsed first.


Association and Precedence Rules


● If operators have different precedence
Since * has higher precedence than +
Example−
In a statement a + b * c
∴ + <. *
In statement a * b + c
∴∗ . > +
● If operators have Equal precedence, then use Association rules.
(a) Example − In the statement a + b + c, the + operators have equal precedence.
As '+' is left Associative in a + b + c
∴ (a + b) will be computed first, and then it will be added to c.
i.e., (a + b) + c
+ .> +
Similarly, '*' is left Associative in a * b * c
(b) Example − In a statement a ↑ b ↑ c here, ↑ is the Right
Associative operator
∴ It will become a ↑ (b ↑ c)
∴ (b ↑ c) will be computed first.
∴ ↑<. ↑
● Identifier has higher precedence than all operators and symbols.
θ <. id        $ <. id        ( <. id
id .> θ        id .> $        id .> )
● $ has lower precedence than all other operators and symbols.
$ <. (         $ <. +         $ <. *
id .> $        ) .> $

Example 2– Construct the Precedence Relation table for the Grammar.


E → E + E | E ∗ E | id
Solution
Operator-Precedence Relations

            id        +         *         $
id                    .>        .>        .>
+           <.        .>        <.        .>
*           <.        .>        .>        .>
$           <.        <.        <.


Advantages of Operator Precedence Parsing
● It is simple and easy to implement.
Disadvantages of Operator Precedence Parsing
● An operator like minus can be unary or binary, so it can have different
precedences in different statements.
● Operator Precedence Parsing applies to only a small class of Grammars.


Program:
#include<stdlib.h>
#include<stdio.h>
#include<string.h>

// function f to exit from the loop


// if given condition is not true
void f()
{
printf("Not operator grammar");
exit(0);
}

void main()
{
char grm[20][20], c;

// Here using flag variable,


// considering grammar is not operator grammar
int i, n, j = 2, flag = 0;

// taking number of productions from user


scanf("%d", &n);
for (i = 0; i < n; i++)
scanf("%s", grm[i]);

for (i = 0; i < n; i++) {


c = grm[i][2];

while (c != '\0') {

if (grm[i][3] == '+' || grm[i][3] == '-'


|| grm[i][3] == '*' || grm[i][3] == '/')

flag = 1;

else {

flag = 0;
f();
}

if (c == '$') {
flag = 0;
f();
}

c = grm[i][++j];
}
}

if (flag == 1)
printf("Operator grammar");
}


Observations and Conclusion:


Input :3
A=A*A
B=AA
A=$

Output : Not operator grammar

In the above example, the grammar is analysed as per the operator grammar rules; since the
productions violate them (for example, B=AA has two adjacent non-terminals), it is not an
operator grammar.
Input :2
A=A/A
B=A+A

Output : Operator grammar

In the above example, the grammar is analysed as per the operator grammar rules and it satisfies
them (an operator appears between the non-terminals on every right-hand side), so it is an
operator grammar.

Quiz:
1. Define operator grammar.
• An operator grammar is a type of context-free grammar that defines the syntax rules for
expressions involving operators. It specifically focuses on the rules governing the usage
and combination of operators within expressions. Operator grammars are commonly used
in the context of defining the syntax of programming languages, mathematical
expressions, and formal language specifications. They typically specify the valid
arrangements of operators and operands, along with any associated precedence and
associativity rules. The grammar outlines how operators can be combined to form valid
expressions, providing a foundation for parsing and evaluating complex expressions in
various computational contexts.

2. Define operator precedence grammar.


• An operator precedence grammar is a type of context-free grammar that includes
additional rules for specifying the precedence and associativity of operators within
expressions. It is a technique used to resolve ambiguities that arise from the presence of
multiple operators in an expression. This grammar type defines the priority or precedence
of different operators, ensuring that expressions are parsed and evaluated according to


specific precedence rules.


• In an operator precedence grammar, each operator is assigned a precedence level, and
rules are defined to handle the interactions between operators of different precedence
levels. These rules help the parser determine the correct order of operations when parsing
and evaluating an expression. Operator precedence grammars are commonly used in the
design of parsers for programming languages and mathematical expressions to ensure
that expressions are evaluated correctly based on predefined precedence and associativity
rules.

3. What are the different precedence relations in operator precedence parser?


• In an operator precedence parser, the following precedence relations are defined to
resolve ambiguities and determine the order of operations during parsing:
• Precedence Relation (<): This relation defines that an operator with a lower precedence
value binds tighter or has higher priority than an operator with a higher precedence value.
It helps in deciding the order of operations when parsing an expression.
• Precedence Relation (=): This relation specifies that two operators with the same
precedence value can appear adjacent to each other in an expression. It ensures that
operators of the same precedence level are handled correctly, considering their
associativity.
• Precedence Relation (>): This relation indicates that an operator with a higher precedence
value binds tighter or has higher priority than an operator with a lower precedence value.
It is crucial for determining the correct order of operations when parsing complex
expressions with multiple operators.
• These precedence relations play a vital role in guiding the operator precedence parser to
interpret and evaluate expressions correctly, ensuring that the parsing process follows the
defined precedence and associativity rules for the operators in the grammar.

4. What are the different methods are available to determine relations?


• Different methods can be employed to determine relations, particularly in the context of
operator precedence parsers. Some common methods include:
• Parsing Table Construction: Constructing a parsing table based on the operator
precedence relations defined in the grammar. This method involves systematically
analyzing the precedence and associativity of operators to populate the parsing table,
which is then used to guide the parsing process.
• Operator Precedence Functions: Defining specific functions that establish the precedence
relations between operators. These functions can be designed to compare the precedence
levels of different operators and determine their relative priority in the parsing process.
• Precedence Climbing Algorithm: Implementing a precedence climbing algorithm that
systematically processes the input expression while adhering to the defined precedence
relations. This algorithm helps determine the appropriate precedence and associativity of
operators, enabling the parser to parse and evaluate the expression accurately.
• By employing these methods, developers can effectively establish and utilize precedence
relations to ensure that the operator precedence parser accurately interprets and processes
complex expressions based on the defined rules and priorities.

5. What do you mean by precedence function


• A precedence function, in the context of operator precedence parsing, is a function that


is designed to determine the precedence relationship between different operators. It is


used to compare the relative priority of operators and establish their order of operations
within an expression.
• This function typically takes two operators as input and returns a value that indicates the
precedence relationship between them. The return value can be used to determine
whether one operator has higher precedence, lower precedence, or the same precedence
as the other.
• Precedence functions are essential for establishing the hierarchy of operators in the
grammar and ensuring that the parser correctly processes the input expression according
to the specified precedence and associativity rules. They play a critical role in guiding
the parsing process and resolving any potential ambiguities that may arise from the
presence of multiple operators in an expression.

Suggested Reference:
1. https://www.gatevidyalay.com/operator-precedence-parsing/
2. https://www.geeksforgeeks.org/role-of-operator-precedence-parser/

Rubric wise marks obtained:


Rubrics: Knowledge of parsing techniques (2) | Knowledge of precedence table (2) |
Implementation (2) | Completeness and accuracy (2) | Presentation (2) | Total
Each criterion is graded Good (2) or Avg. (1).

Marks


Experiment No - 08

Aim: Generate 3-tuple intermediate code for given infix expression.

Date:

Competency and Practical Skills:


• Understanding of intermediate code generation and its role in compiler construction
• Ability to write intermediate code for given infix expression.

Relevant CO: CO2

Objectives:
By the end of this experiment, the students should be able to:
⮚ Understand the different intermediate code representations and their significance in
compiler construction
⮚ Write intermediate code for given infix expression

Software/Equipment: C compiler
Theory:
Three address code is a type of intermediate code which is easy to generate and can be easily
converted to machine code. It makes use of at most three addresses and one operator to
represent an expression and the value computed at each instruction is stored in temporary
variable generated by compiler. The compiler decides the order of operation given by three
address code.

Three address code is used in compiler applications:

Optimization: Three address code is often used as an intermediate representation of code


during optimization phases of the compilation process. The three address code allows the
compiler to analyze the code and perform optimizations that can improve the performance of
the generated code.
Code generation: Three address code can also be used as an intermediate representation of
code during the code generation phase of the compilation process. The three address code
allows the compiler to generate code that is specific to the target platform, while also ensuring
that the generated code is correct and efficient.
Debugging: Three address code can be helpful in debugging the code generated by the
compiler. Since three address code is a low-level language, it is often easier to read and
understand than the final generated code. Developers can use the three address code to trace
the execution of the program and identify errors or issues that may be present.
Language translation: Three address code can also be used to translate code from one
programming language to another. By translating code to a common intermediate
representation, it becomes easier to translate the code to multiple target languages.
General representation –
a = b op c
Where a, b or c represents operands like names, constants or compiler generated temporaries
and op represents the operator


Example-1: Convert the expression a * – (b + c) into three address code.
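The corresponding three address code is:

t1 = b + c
t2 = uminus t1
t3 = a * t2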

Example-2: Write three address code for following code


for(i = 1; i<=10; i++)
{
a[i] = x * 5;
}
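
One possible three address code for this loop (assuming 4-byte array elements for the address
computation of a[i]) is:

      i = 1
L1:   if i > 10 goto L2
      t1 = x * 5
      t2 = i * 4
      a[t2] = t1
      i = i + 1
      goto L1
L2: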

Implementation of Three Address Code –


There are 3 representations of three address code namely
1. Quadruple
2. Triples
3. Indirect Triples
1. Quadruple – It is a structure which consists of 4 fields namely op, arg1, arg2 and result.
op denotes the operator and arg1 and arg2 denotes the two operands and result is used to store
the result of the expression.
Advantage –
● Easy to rearrange code for global optimization.
● One can quickly access value of temporary variables using symbol table.
Disadvantage –
● Contain lot of temporaries.
● Temporary variable creation increases time and space complexity.
Example – Consider expression a = b * – c + b * – c. The three address code is:
t1 = uminus c
t2 = b * t1
t3 = uminus c


t4 = b * t3
t5 = t2 + t4
a = t5
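
In quadruple form this becomes:

        op        arg1    arg2    result
(0)     uminus    c               t1
(1)     *         b       t1      t2
(2)     uminus    c               t3
(3)     *         b       t3      t4
(4)     +         t2      t4      t5
(5)     =         t5              a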

2. Triples – This representation doesn’t make use of extra temporary variable to represent a
single operation instead when a reference to another triple’s value is needed, a pointer to that
triple is used. So, it consist of only three fields namely op, arg1 and arg2.
Disadvantage –
● Temporaries are implicit and difficult to rearrange code.
● It is difficult to optimize because optimization involves moving intermediate code.
When a triple is moved, any other triple referring to it must be updated also. With
help of pointer one can directly access symbol table entry.
Example – Consider expression a = b * – c + b * – c
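Its triple representation is:

        op        arg1    arg2
(0)     uminus    c
(1)     *         b       (0)
(2)     uminus    c
(3)     *         b       (2)
(4)     +         (1)     (3)
(5)     =         a       (4)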

3. Indirect Triples – This representation makes use of pointer to the listing of all references
to computations which is made separately and stored. Its similar in utility as compared to
quadruple representation but requires less space than it. Temporaries are implicit and easier to
rearrange code.
Example – Consider expression a = b * – c + b * – c
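With indirect triples, a separate statement list points to the triples; the numbering of the
list below is arbitrary:

Statement list          Triples
(35)   (0)              (0)   uminus   c
(36)   (1)              (1)   *        b      (0)
(37)   (2)              (2)   uminus   c
(38)   (3)              (3)   *        b      (2)
(39)   (4)              (4)   +        (1)    (3)
(40)   (5)              (5)   =        a      (4)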

Question – Write quadruple, triples and indirect triples for following expression : (x + y) * (y
+ z) + (x + y + z)
Explanation – The three address code is:
t1 = x + y
t2 = y + z
t3 = t1 * t2


t4 = t1 + z
t5 = t3 + t4
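
For this code the quadruples are shown below; the triples and indirect triples follow the same
pattern as in the examples above:

        op      arg1    arg2    result
(0)     +       x       y       t1
(1)     +       y       z       t2
(2)     *       t1      t2      t3
(3)     +       t1      z       t4
(4)     +       t3      t4      t5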

Program:

/* The headers, global declarations and the header of the dove() function are not shown
   in the manual's listing; they are reconstructed here (marked "reconstructed") so that
   the program compiles and runs. As in the original, only the operator character is
   replaced by the temporary's number, so the generated code is meaningful only for
   simple expressions such as a=b+c. */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

char j[20], a[5], b[5], ch[2];                 /* reconstructed globals */
int i, c = 1, k, l, m, pi;
char sw[5] = { '=', '-', '+', '/', '*' };      /* operators             */
int p[5]   = {  0,   1,   1,   2,   2  };      /* their precedences     */

void small();
void dove(int i);

int main()
{
    printf("Enter the expression:");
    scanf("%s", j);
    printf("\tThe Intermediate code is:\n");
    small();
    return 0;
}

/* emit one three-address statement for the operator at position i */
void dove(int i)
{
    /* reconstructed: build the operand strings a and b; a digit in j
       stands for an already generated temporary tN */
    if (isdigit(j[i - 1])) sprintf(a, "t%c", j[i - 1]); else sprintf(a, "%c", j[i - 1]);
    if (isdigit(j[i + 1])) sprintf(b, "t%c", j[i + 1]); else sprintf(b, "%c", j[i + 1]);

    if (j[i] == '*')
        printf("\tt%d=%s*%s\n", c, a, b);
    if (j[i] == '/')
        printf("\tt%d=%s/%s\n", c, a, b);
    if (j[i] == '+')
        printf("\tt%d=%s+%s\n", c, a, b);
    if (j[i] == '-')
        printf("\tt%d=%s-%s\n", c, a, b);
    if (j[i] == '=')
        printf("\t%c=t%d\n", j[i - 1], --c);
    sprintf(ch, "%d", c);
    j[i] = ch[0];
    c++;
    small();
}

void small()
{
    pi = 0; l = 0;
    for (i = 0; i < strlen(j); i++)
    {
        for (m = 0; m < 5; m++)
            if (j[i] == sw[m])
                if (pi <= p[m])
                {
                    pi = p[m];
                    l = 1;
                    k = i;
                }
    }
    if (l == 1)
        dove(k);
    else
        exit(0);
}
Observations and Conclusion:

In the above example the user is asked to enter an infix expression and the generated
intermediate code (three-address code) is printed.
Quiz:
1. What are the different implementation methods for three-address code?
• Three-address code is a form of intermediate code that represents the code in a simple
and linear form, using at most three operands per instruction. There are several
implementation methods for generating and representing three-address code, including:
• Quadruples: Quadruples represent three-address code using four fields: an operation
code, and three operands (source operands and result operand). It's a straightforward
method to represent instructions and is commonly used in compilers.
• Triples: Triples are similar to quadruples but use only three fields: an operation code and
two operands. They can be converted into quadruples by adding an extra field for the
result.
• Indirect triples: Indirect triples store the operands as memory locations, allowing for
more flexibility in handling complex expressions.
• Syntax trees: Syntax trees can be used to represent three-address code. They are
hierarchical structures that break down the code into a more abstract and visual
representation, enabling easier manipulation and translation into other forms of code.
• These methods help in the conversion of high-level languages into low-level languages,
making it easier to perform optimizations and generate efficient machine code during the
compilation process. They are crucial for facilitating the translation and optimization
phases in the design and implementation of compilers and interpreters.

2. What do you mean by quadruple?


• In the context of compilers and programming language translation, a quadruple is a data
structure used to represent a single executable statement in three-address code. It consists


of four fields, including:


• Operator: The operation code specifying the operation to be performed, such as addition,
subtraction, multiplication, division, or assignment.
• Argument 1: The first operand or argument involved in the operation.
• Argument 2: The second operand or argument involved in the operation.
• Result: The variable or memory location where the result of the operation is stored.
• Quadruples are commonly used as an intermediate representation during the compilation
process. They simplify the manipulation and optimization of code and provide a clear
and structured way to represent complex operations in a more manageable and
interpretable form. They are instrumental in facilitating the generation of machine code
from higher-level programming languages.

3. Define triple and indirect triple.


• Triple: A triple is a data structure used in the representation of three-address code. It
consists of three fields and represents a single executable statement. The fields in a triple
typically include an operator and two operands, where the operands can be variables,
constants, or memory locations. Triples are a simplified form of representing code and
are often used as an intermediate step in the compilation process before generating the
final machine code.
• Indirect Triple: An indirect triple is a variation of the triple data structure that uses
memory locations as operands rather than variables or constants directly. In an indirect
triple, the operands are represented as memory addresses or locations, allowing for more
flexibility in handling complex expressions and data structures. Indirect triples are
particularly useful in cases where the specific value of an operand is not known at
compile time or when dealing with data structures that require dynamic memory
allocation. They enable the representation of indirect addressing in the generated code.

4. What is abstract syntax tree?


• An Abstract Syntax Tree (AST) is a hierarchical data structure used in computer science
and programming language theory to represent the syntactic structure of a program or a
piece of code. It provides a more abstract and structured representation of the code,
capturing
its essential elements and their relationships.
• Key features of an AST include:
• Hierarchical Structure: ASTs are hierarchical, with each node representing a syntactic
construct in the program, such as expressions, statements, or declarations. The nodes are
interconnected through parent-child relationships that reflect the nesting and composition
of code elements.
• Abstract Representation: ASTs abstract away from the specifics of the source code's
textual representation and focus on its underlying structure. They capture the essential
semantics and grammar of the code while omitting irrelevant details like formatting,
comments, and unnecessary syntax.
• Language Independence: ASTs are language-independent, making them valuable in the
design of compilers and interpreters. They serve as an intermediary representation that
facilitates various program analysis and transformation tasks, including optimization,
translation, and code generation.
• ASTs play a crucial role in the compilation process, serving as an essential bridge


between the source code and the various phases of the compiler, such as parsing, semantic
analysis, optimization, and code generation. They enable efficient manipulation and
analysis of code, making it easier to implement various program analysis and
transformation techniques.
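• A minimal sketch in C++ (the node layout, field names, and the printing helper are
assumptions made for illustration, not a prescribed design) showing how an AST for
a = b + c * d could be built and inspected:

#include <iostream>
#include <string>

// One node of an expression AST: an operator with children, or a leaf
// holding an identifier or constant.
struct ASTNode {
    std::string label;          // e.g. "+", "*", "a"
    ASTNode *left, *right;
    ASTNode(std::string l, ASTNode* lt = nullptr, ASTNode* rt = nullptr)
        : label(std::move(l)), left(lt), right(rt) {}
};

// Print the tree in infix form to check its shape.
void show(const ASTNode* n) {
    if (!n) return;
    if (n->left) std::cout << "(";
    show(n->left);
    std::cout << n->label;
    show(n->right);
    if (n->left) std::cout << ")";
}

int main() {
    // AST for: a = b + c * d
    ASTNode* tree = new ASTNode("=",
        new ASTNode("a"),
        new ASTNode("+", new ASTNode("b"),
                         new ASTNode("*", new ASTNode("c"), new ASTNode("d"))));
    show(tree);                 // prints (a=(b+(c*d)))
    std::cout << "\n";
    return 0;
}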

5. Differentiate parse tree and abstract syntax tree.


• The main differences between a parse tree and an abstract syntax tree (AST) are as
follows:
• Parse Tree: A parse tree, also known as a concrete syntax tree, represents the syntactic
structure of the input according to the grammar rules of a language. It includes all the
syntactic details, such as parentheses, brackets, and other tokens, along with the
grammatical relationships between the elements. Parse trees capture the entire syntactic
structure, including elements that may not be semantically relevant, resulting in a more
detailed representation of the code's syntax.
• Abstract Syntax Tree (AST): An AST, on the other hand, represents the abstract
syntactic structure of the input, focusing on the essential elements and their relationships
while abstracting away from unnecessary details. ASTs omit specific syntactic details
that are not essential for understanding the code's structure and semantics, such as
parentheses and other grammatical constructs that do not affect the program's logic. ASTs
are designed to be more compact and language-independent, making them well-suited for
performing various program analysis and transformation tasks during the compilation
process.

Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft,
Rajeev Motwani, and Jeffrey D. Ullman.
2. https://www.geeksforgeeks.org/introduction-to-intermediate-representationir/
3. https://cs.lmu.edu/~ray/notes/ir/

References used by the students:


• https://www.geeksforgeeks.org/three-address-code-compiler/
• https://www.geeksforgeeks.org/intermediate-code-generation-in-compiler-design/
• https://www.javatpoint.com/three-address-code
• https://www.gatevidyalay.com/implementation-of-three-address-code/

Rubric wise marks obtained:


Rubrics   Knowledge (2) | Problem Recognition (2) | Logic Building (2) | Completeness and accuracy (2) | Ethics (2) | Total
          (each criterion is marked Good = 2 or Avg. = 1)
Marks

63
Compiler Design (3170701) 210210107052

Experiment No - 09

Aim: Extract Predecessor and Successor from given Control Flow Graph.

Date:

Competency and Practical Skills:


• Understanding of Control flow structure in compilers
• Ability to write predecessor and successor for given graph

Relevant CO: CO3

Objectives:
By the end of this experiment, the students should be able to:
⮚ Understand the concept of control flow structure (basic blocks) in a compiler
⮚ Write the predecessors and successors for a given graph.

Software/Equipment: C/C++ compiler


Theory:
A basic block is a straight-line sequence of statements. Apart from its entry and its exit, a
basic block has no branches into or out of it: control enters at the beginning and always leaves
at the end without halting or branching in between. The instructions of a basic block therefore
always execute in sequence.
The first step is to divide the three-address code into basic blocks. A new basic block always
begins with the first instruction and keeps adding instructions until it reaches a jump or a
label. If no jumps or labels are identified, control flows from one instruction to the next in
sequential order.
The algorithm for the construction of the basic block is described below step by step:
Algorithm: The algorithm used here is partitioning the three-address code into basic blocks.
Input: A sequence of three-address codes will be the input for the basic blocks.
Output: A list of basic blocks with each three address statements, in exactly one block, is
considered as the output.
Method: We’ll start by identifying the intermediate code’s leaders. The following are some
guidelines for identifying leaders:
1. The first instruction in the intermediate code is generally considered as a leader.
2. The instructions that target a conditional or unconditional jump statement can be
considered as a leader.
3. Any instructions that are just after a conditional or unconditional jump statement
can be considered as a leader.
Each leader’s basic block will contain all of the instructions from the leader until the instruction
right before the following leader’s start.
Example of basic block:
Three Address Code for the expression a = b + c – d is:
T1 = b + c
T2 = T1 - d
a = T2
This represents a basic block in which all the statements execute in a sequence one after the
other.

64
Compiler Design (3170701) 210210107052

Basic Block Construction:


Let us understand the construction of basic blocks with an example:
Example:
1. PROD = 0
2. I = 1
3. T2 = addr(A) – 4
4. T4 = addr(B) – 4
5. T1 = 4 x I
6. T3 = T2[T1]
7. T5 = T4[T1]
8. T6 = T3 x T5
9. PROD = PROD + T6
10. I = I + 1
11. IF I <=20 GOTO (5)
Using the algorithm given above, we can easily identify the basic blocks in the above
three-address code.
There are two Basic Blocks in the above three-address code:
● B1 – Statement 1 to 4
● B2 – Statement 5 to 11
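The leader rules above can be turned into a small program, as shown below. This is only a
sketch (it assumes each three-address statement is stored as a plain string and that jump
targets are written in the form GOTO (n)); it reproduces the partition of the example into
B1 and B2:

#include <iostream>
#include <set>
#include <string>
#include <vector>

int main() {
    // The eleven statements of the example above (1-based in the output).
    std::vector<std::string> code = {
        "PROD = 0", "I = 1", "T2 = addr(A) - 4", "T4 = addr(B) - 4",
        "T1 = 4 x I", "T3 = T2[T1]", "T5 = T4[T1]", "T6 = T3 x T5",
        "PROD = PROD + T6", "I = I + 1", "IF I <= 20 GOTO (5)"
    };

    std::set<int> leaders;
    leaders.insert(0);                                   // rule 1: first statement
    for (std::size_t i = 0; i < code.size(); ++i) {
        std::size_t p = code[i].find("GOTO (");
        if (p != std::string::npos) {
            int target = std::stoi(code[i].substr(p + 6));
            leaders.insert(target - 1);                  // rule 2: target of a jump
            if (i + 1 < code.size())
                leaders.insert((int)(i + 1));            // rule 3: statement after a jump
        }
    }

    // Each basic block runs from one leader up to (but not including) the next leader.
    std::vector<int> L(leaders.begin(), leaders.end());
    for (std::size_t b = 0; b < L.size(); ++b) {
        int end = (b + 1 < L.size()) ? L[b + 1] : (int)code.size();
        std::cout << "B" << b + 1 << ": statements " << L[b] + 1
                  << " to " << end << "\n";
    }
    return 0;
}

For the eleven statements above this prints "B1: statements 1 to 4" and "B2: statements 5 to
11", matching the partition worked out by hand.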
Transformations on Basic blocks:
A transformation can be applied to a basic block without changing the set of expressions
computed by the block.
There are two types of basic block transformations. These are as follows:
1. Structure-Preserving Transformations
Structure preserving transformations can be achieved by the following methods:
1. Common sub-expression elimination
2. Dead code elimination
3. Renaming of temporary variables
4. Interchange of two independent adjacent statements
2. Algebraic Transformations
In the case of algebraic transformation, we basically change the set of expressions into an
algebraically equivalent set.
For example, an expression such as
x := x + 0
or x := x * 1
can be eliminated from a basic block without changing the set of expressions it computes.
Flow Graph:
A flow graph is simply a directed graph. For a set of basic blocks, a flow graph shows how
control flows among them. A control flow graph is used to depict how program control is
passed among the blocks. Once the intermediate code has been partitioned into basic blocks,
a flow graph illustrates the flow of control between them: an edge is drawn from a block X
to a block Y when the first instruction of block Y can immediately follow the last instruction
of block X.
Let’s make the flow graph of the example that we used for basic block formation:

65
Compiler Design (3170701) 210210107052

Flow Graph for above Example

Firstly, we compute the basic blocks (which is already done above). Secondly, we assign the
flow control information.
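Before the program below, here is a minimal sketch of how the predecessor and successor
sets of each basic block can be read off the flow graph built above. The block numbers and
the two edges (B1 to B2, and B2 back to itself via the conditional jump) are hard-coded
purely for illustration:

#include <iostream>
#include <utility>
#include <vector>

int main() {
    // Flow graph of the example above: B1 -> B2 and B2 -> B2.
    int n = 2;
    std::vector<std::vector<int>> succ(n + 1), pred(n + 1);
    std::vector<std::pair<int, int>> edges = { {1, 2}, {2, 2} };

    for (const auto& e : edges) {
        succ[e.first].push_back(e.second);    // e.second is a successor of e.first
        pred[e.second].push_back(e.first);    // e.first is a predecessor of e.second
    }

    for (int b = 1; b <= n; ++b) {
        std::cout << "B" << b << "  successors:";
        for (int s : succ[b]) std::cout << " B" << s;
        std::cout << "   predecessors:";
        for (int p : pred[b]) std::cout << " B" << p;
        std::cout << "\n";
    }
    return 0;
}

The program listed next applies the same predecessor/successor idea to a binary search tree
rather than to a control flow graph.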

Program:
// C++ program to find predecessor and successor of a given key in a BST
#include <iostream>
using namespace std;

// BST Node
struct Node
{
    int key;
    struct Node *left, *right;
};

// This function finds the predecessor and successor of key in the BST.
// It sets pre and suc as predecessor and successor respectively.
void findPreSuc(Node* root, Node*& pre, Node*& suc, int key)
{
    // Base case
    if (root == NULL)
        return;

    // If key is present at root
    if (root->key == key)
    {
        // the maximum value in the left subtree is the predecessor
        if (root->left != NULL)
        {
            Node* tmp = root->left;
            while (tmp->right)
                tmp = tmp->right;
            pre = tmp;
        }

        // the minimum value in the right subtree is the successor
        if (root->right != NULL)
        {
            Node* tmp = root->right;
            while (tmp->left)
                tmp = tmp->left;
            suc = tmp;
        }
        return;
    }

    // If key is smaller than root's key, go to the left subtree
    if (root->key > key)
    {
        suc = root;
        findPreSuc(root->left, pre, suc, key);
    }
    else // go to the right subtree
    {
        pre = root;
        findPreSuc(root->right, pre, suc, key);
    }
}

// A utility function to create a new BST node
Node* newNode(int item)
{
    Node* temp = new Node;
    temp->key = item;
    temp->left = temp->right = NULL;
    return temp;
}

// A utility function to insert a new node with the given key in the BST
Node* insert(Node* node, int key)
{
    if (node == NULL)
        return newNode(key);
    if (key < node->key)
        node->left = insert(node->left, key);
    else
        node->right = insert(node->right, key);
    return node;
}

// Driver program to test the above functions
int main()
{
    int key = 65;   // Key to be searched in the BST

    /* Let us create the following BST
              50
            /    \
          30      70
         /  \    /  \
        20   40 60   80  */

    Node* root = NULL;
    root = insert(root, 50);
    insert(root, 30);
    insert(root, 20);
    insert(root, 40);
    insert(root, 70);
    insert(root, 60);
    insert(root, 80);

    Node *pre = NULL, *suc = NULL;
    findPreSuc(root, pre, suc, key);

    if (pre != NULL)
        cout << "Predecessor is " << pre->key << endl;
    else
        cout << "No Predecessor";

    if (suc != NULL)
        cout << "Successor is " << suc->key;
    else
        cout << "No Successor";
    return 0;
}

Observations and Conclusions:

In the above example, the program finds the predecessor and successor of the given key; for
the sample tree and key 65, the predecessor is 60 and the successor is 70.

Quiz:
1. What is flowgraph?
• A flowgraph, also known as a control flow graph, is a graphical representation of the
control flow or execution flow of a program. It depicts the sequence of instructions and
the paths that the program can take during its execution. In a flowgraph, nodes represent
individual instructions or basic blocks, and directed edges between nodes represent the
possible control transfers between these instructions or blocks.
• Key characteristics of a flowgraph include:
• Nodes: Nodes represent specific program statements, basic blocks, or individual
instructions.
• Edges: Edges between nodes represent the control flow between different parts of the
program, illustrating the possible paths that the program can follow during execution.
• Flowgraphs are essential for program analysis, optimization, and understanding the
behavior of a program during its execution. They are commonly used in various stages
of the compilation process and play a vital role in identifying control dependencies,
optimizing code, and detecting potential issues within the program's control flow.

2. Define DAG.
• DAG stands for Directed Acyclic Graph. It is a finite directed graph that contains no
directed cycles: every edge has a direction, and it is not possible to start at a node, follow
the edges in their direction, and return to the same node.
• Key characteristics of a DAG include:
• Directed Edges: Edges in a DAG have a specific direction associated with them,
indicating the relationship between the nodes.
• Acyclic Property: A DAG contains no directed cycles, so no traversal that follows the
edge directions can return to its starting node.
• DAGs find applications in various fields, including computer science, mathematics, and
scheduling algorithms. They are commonly used for representing dependencies between
tasks, scheduling jobs, representing arithmetic expressions, and optimizing computations
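• A small worked example (the block is chosen only for illustration): for the statements
t1 = b * c, t2 = b * c, d = t1 + t2, a DAG shares a single node for the common
sub-expression b * c:

      n1: b                  (leaf)
      n2: c                  (leaf)
      n3: * (n1, n2)         used by both t1 and t2
      n4: + (n3, n3)         labelled d

  Because both multiplications map to the same node n3, the DAG exposes the common
  sub-expression and the redundant computation can be removed.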

69
Compiler Design (3170701) 210210107052

3. Define Backpatching
• Backpatching is a technique used in compilers and interpreters to handle the translation
of high-level programming constructs into lower-level code. It involves delaying the
generation of certain code instructions until additional information becomes available.
Specifically, it is used for filling in the target addresses of control flow statements such
as jumps and branches after the addresses become known during the code generation
process.
• Key points about backpatching:
• Delayed Address Resolution: Backpatching delays the assignment of actual addresses or
locations for target instructions until all the necessary information is available.
• Temporary Placeholder: During the compilation or interpretation process, backpatching
may use temporary placeholders for the addresses, which are later updated with the
correct target addresses.
• Optimization and Efficiency: Backpatching aids in code optimization and efficiency by
allowing the compiler or interpreter to handle control flow instructions more effectively.
• Overall, backpatching is an essential technique for managing control flow statements
efficiently during the translation of high-level code into lower-level machine code,
helping to improve the overall performance and efficiency of the compilation process.
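• A minimal sketch of the idea in C++ (the list-based helpers makelist, merge and backpatch
follow the usual textbook scheme; the instruction layout and quad numbers are assumed for
illustration only):

#include <iostream>
#include <vector>

// For each emitted jump quadruple, targets[i] holds its goto target; -1 means not yet known.
std::vector<int> targets;

std::vector<int> makelist(int i) { return {i}; }           // list holding one unfilled quad

std::vector<int> merge(std::vector<int> a, const std::vector<int>& b) {
    a.insert(a.end(), b.begin(), b.end());                 // concatenate two lists
    return a;
}

void backpatch(const std::vector<int>& list, int label) {
    for (int i : list) targets[i] = label;                 // fill in the real target later
}

int main() {
    targets = { -1, -1, -1 };                              // three jumps with unknown targets
    std::vector<int> truelist = merge(makelist(0), makelist(2));
    backpatch(truelist, 7);                                // target becomes known: quad 7
    for (std::size_t i = 0; i < targets.size(); ++i)
        std::cout << "quad " << i << ": goto " << targets[i] << "\n";
    return 0;
}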

4. What is the concept of local transformation on a block of code?


• The concept of local transformation on a block of code refers to the application of specific
optimizations or modifications to a particular section or block of code within a program,
without affecting the overall structure or behavior of the program. Local transformations
typically target a limited scope, such as a basic block, a set of consecutive instructions,
or a small portion of the code, aiming to improve the efficiency, performance, or
readability of that specific segment.
• Key characteristics of local transformations include:
• Limited Scope: Local transformations focus on a confined portion of the code, such as a
basic block, a single function, or a small segment of instructions, rather than the entire
program.
• Specific Optimizations: These transformations encompass various specific
optimizations, such as constant folding, common subexpression elimination, strength
reduction, and loop optimizations, that aim to enhance the performance and efficiency of
the code within the targeted block.
• Minimal Impact: Local transformations are designed to have minimal impact on the
overall structure and behavior of the program, ensuring that the changes applied to the
specific block of code do not alter the functionality of the entire program.
• By employing local transformations, developers and compilers can focus on improving
the efficiency and performance of specific segments of the code without affecting the
broader context, thereby optimizing the code at a granular level while maintaining the
overall integrity of the program.
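• For instance, a small before/after sketch of common sub-expression elimination inside one
block (the temporaries are illustrative):

      Before:                After the local transformation:
        t1 = a + b             t1 = a + b
        t2 = a + b             t2 = t1
        x  = t1 * t2           x  = t1 * t2

  Only this block changes; the values it computes, and the rest of the program, stay the same.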

Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft,
Rajeev Motwani, and Jeffrey D. Ullman
2. https://www.geeksforgeeks.org/data-flow-analysis-compiler/

70
Compiler Design (3170701) 210210107052

References used by the students:


• https://www.geeksforgeeks.org/inorder-predecessor-successor-given-key-bst/
• https://www.pepcoding.com/resources/online-java-foundation/generictree/predecessor_and_successor_of_an_element/topic

Rubric wise marks obtained:


Rubrics   Knowledge (2) | Problem Recognition (2) | Implementation (2) | Correctness (2) | Documentation and Presentation (2) | Total
          (each criterion is marked Good = 2 or Avg. = 1)

Marks

71
Compiler Design (3170701) 210210107052

Experiment No - 10
Aim: Study of Learning Basic Block Scheduling Heuristics from Optimal Data.

Date:

Competency and Practical Skills:


● Knowledge of basic computer architecture concepts
● Understanding of compiler design and optimization techniques
● Ability to analyze and interpret performance data
● Proficiency in programming languages such as C/C++ and assembly language

Relevant CO: CO4

Objectives:
By the end of this experiment, the students should be able to:
⮚ Understanding the concept of basic block scheduling and its importance in compiler
optimization.
⮚ Understanding the various heuristics used for basic block scheduling.
⮚ Analyzing optimal data to learn the basic block scheduling heuristics.
⮚ Comparing the performance of the implemented basic block scheduler with other
commonly used basic block schedulers.

Software/Equipment: Computer system

Theory:
Instruction scheduling is an important step for improving the performance of object code
produced by a compiler. Basic block scheduling is important in its own right and also as a
building block for scheduling larger groups of instructions such as superblocks. The basic block
instruction scheduling problem is to find a minimum length schedule for a basic block (a
straight-line sequence of code with a single entry point and a single exit point), subject to
precedence, latency, and resource constraints. Solving the problem exactly is known to be
difficult, and most compilers use a greedy list scheduling algorithm coupled with a heuristic.
The heuristic is usually hand-crafted, a potentially time-consuming process. Modern
architectures are pipelined and can issue multiple instructions per time cycle. On such
processors, the order that the instructions are scheduled can significantly impact performance.

The basic block instruction scheduling problem is to find a minimum length schedule for a
basic block (a straight-line sequence of code with a single entry point and a single exit point),
subject to precedence, latency, and resource constraints. Instruction scheduling for basic blocks
is known to be NP-complete for realistic architectures. The most popular method for scheduling
basic blocks continues to be list scheduling.

For example, consider multiple-issue pipelined processors. On such processors, there are
multiple functional units and multiple instructions can be issued (begin execution) each clock
cycle. Associated with each instruction is a delay or latency between when the instruction is
issued and when the result is available for other instructions which use the result. In this paper,
we assume that all functional units are fully pipelined and that instructions are typed. Examples
of types of instructions are load/store, integer, floating point, and branch instructions. We use


the standard labelled directed acyclic graph (DAG) representation of a basic block (see Figure
1(a)). Each node corresponds to an instruction and there is an edge from i to j labelled with a
positive integer l (i, j) if j must not be issued until i has executed for l (i, j) cycles. Given a
labelled dependency DAG for

a basic block, a schedule for a multiple-issue processor specifies an issue or start time for each
instruction or node such that the latency constraints are satisfied and the resource constraints
are satisfied. The latter are satisfied if, at every time cycle, the number of instructions of each
type issued at that cycle does not exceed the number of functional units that can execute
instructions of that type. The length of a schedule is the number of cycles needed for the
schedule to complete; i.e., each instruction has been issued at its start time and, for each
instruction with no successors, enough cycles have elapsed that the result for the instruction is
available. The basic block instruction scheduling problem is to construct a schedule with
minimum length.

Instruction scheduling for basic blocks is known to be NP-complete for realistic architectures.
The most popular method for scheduling basic blocks continues to be list scheduling. A list
scheduler takes a set of instructions as represented by a dependency DAG and builds a schedule
using a best-first greedy heuristic. A list scheduler generates the schedule by determining all
instructions that can be scheduled at that time step, called the ready list, and uses the heuristic
to determine the best instruction on the list. The selected instruction is then added to the partial
schedule and the scheduler determines if any new instructions can be added to the ready list.
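As a hedged sketch of this loop in C++ (a single fully pipelined functional unit, a hand-written
dependency DAG, and a heuristic that simply prefers the ready instruction with the larger
critical-path distance are assumed; real schedulers use richer features):

#include <algorithm>
#include <iostream>
#include <vector>

struct Edge { int to, latency; };

int main() {
    // Tiny dependency DAG: instructions 0 and 1 feed 2, and 2 feeds 3.
    int n = 4;
    std::vector<std::vector<Edge>> succ = { { {2, 3} }, { {2, 1} }, { {3, 2} }, {} };

    std::vector<int> indeg(n, 0), ready_at(n, 0), cp(n, 0);
    for (int u = 0; u < n; ++u)
        for (const Edge& e : succ[u]) indeg[e.to]++;

    // Critical-path distance to the end of the block (the ids happen to be in
    // topological order, so a reverse sweep is enough for this small example).
    for (int u = n - 1; u >= 0; --u)
        for (const Edge& e : succ[u]) cp[u] = std::max(cp[u], e.latency + cp[e.to]);

    std::vector<bool> done(n, false);
    int cycle = 0, scheduled = 0;
    while (scheduled < n) {
        // Ready list: every predecessor has been issued and its latency has elapsed.
        int best = -1;
        for (int u = 0; u < n; ++u)
            if (!done[u] && indeg[u] == 0 && ready_at[u] <= cycle)
                if (best == -1 || cp[u] > cp[best]) best = u;   // greedy heuristic
        if (best != -1) {
            std::cout << "cycle " << cycle << ": issue instruction " << best << "\n";
            done[best] = true;
            ++scheduled;
            for (const Edge& e : succ[best]) {
                indeg[e.to]--;
                ready_at[e.to] = std::max(ready_at[e.to], cycle + e.latency);
            }
        }
        ++cycle;
    }
    return 0;
}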

The heuristic in a list scheduler generally consists of a set of features and an order for testing
the features. Some standard features are as follows. The path length from a node i to a node j
in a DAG is the maximum number of edges along any path from i to j. The critical-path distance
from a node i to a node j in a DAG is the maximum sum of the latencies along any path from i
to j. Note that both the path length and the critical-path distance from a node i to itself is zero.
A node j is a descendant of a node i if there is a directed path from i to j; if the path consists of
a single edge, j is also called an immediate successor of i. The earliest start time of a node i is
a lower bound on the earliest cycle in which the instruction i can be scheduled.

In supervised learning of a classifier from examples, one is given a training set of instances,
where each instance is a vector of feature values and the correct classification for that instance,
and is to induce a classifier from the instances. The classifier is then used to predict the class
of instances that it has not seen before. Many algorithms have been proposed for supervised
learning. One of the most widely used is decision tree learning. In a decision tree the internal
nodes of the tree are labelled with features, the edges to the children of a node are labelled with
the possible values of the feature, and the leaves of the tree are labelled with a classification.
To classify a new example, one starts at the root and repeatedly tests the feature at a node and
follows the appropriate branch until a leaf is reached. The label of the leaf is the predicted
classification of the new example.
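A hedged illustration of how such a learned decision tree could be plugged into a list scheduler
as its heuristic (the features, their order, and the thresholds are invented for the example and
are not taken from the paper):

#include <iostream>

// Features of a candidate instruction on the ready list (illustrative only).
struct Features {
    int critical_path;    // critical-path distance to the end of the block
    int descendants;      // number of descendants in the DAG
    int earliest_start;   // lower bound on the earliest issue cycle
};

// A small decision tree written out as nested tests: each internal node tests one
// feature, each leaf decides which of the two candidates should be scheduled first.
bool preferAoverB(const Features& a, const Features& b) {
    if (a.critical_path != b.critical_path)
        return a.critical_path > b.critical_path;     // larger critical path first
    if (a.earliest_start != b.earliest_start)
        return a.earliest_start < b.earliest_start;   // then the one that can start sooner
    return a.descendants >= b.descendants;            // then the one with more descendants
}

int main() {
    Features a{9, 4, 0}, b{7, 6, 0};
    std::cout << (preferAoverB(a, b) ? "schedule a first" : "schedule b first") << "\n";
    return 0;
}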

73
Compiler Design (3170701) 210210107052

Algorithm:

This experiment studies the automatic learning of a good heuristic for basic block scheduling
using supervised machine learning techniques. The novelty of the approach lies in the quality
of the training data (training instances are obtained from very large basic blocks, and an
extensive and systematic analysis is performed to identify the best features and to synthesize
new ones) and in the emphasis on learning a simple yet accurate heuristic.

Observations and Conclusion:

● Instruction scheduling is an important step for improving the performance of object code
produced by a compiler.
● Basic block scheduling is important as a building block for scheduling larger groups of
instructions such as superblocks.
● The basic block instruction scheduling problem is to find a minimum length schedule for a
basic block (a straight-line sequence of code with a single entry point and a single exit point)
subject to precedence, latency, and resource constraints.
● Solving the problem exactly is known to be difficult, and most compilers use a greedy list


scheduling algorithm coupled with a heuristic.


● Modern architectures are pipelined and can issue multiple instructions per time cycle. The
order that the instructions are scheduled can significantly impact performance.
● Instruction scheduling for basic blocks is known to be NP-complete for realistic
architectures.
● The most popular method for scheduling basic blocks continues to be list scheduling, which
takes a set of instructions as represented by a dependency DAG and builds a schedule using
a best-first greedy heuristic.
● The heuristic in a list scheduler generally consists of a set of features and an order for
testing the features.
● Decision tree learning is one of the most widely used algorithms for supervised learning.

Quiz:
1. What is the basic block instruction scheduling problem?
• The basic block instruction scheduling problem is a crucial optimization task in compiler
design that focuses on rearranging the order of instructions within a basic block to
improve the performance of the generated code. The goal is to minimize the overall
execution time by reducing pipeline stalls, data hazards, and other dependencies that can
lead to inefficient processor utilization.
• Key points about the basic block instruction scheduling problem include:
1. Dependency Management: The problem involves analyzing data dependencies and
control flow constraints within a basic block to determine the most efficient sequence
of instructions.
2. Optimization Objectives: The primary objective is to minimize pipeline stalls, such
as data hazards, structural hazards, and control hazards, to maximize the utilization
of hardware resources.
3. Scheduling Heuristics: Various heuristics and algorithms, such as list scheduling,
dynamic programming, and greedy algorithms, are employed to determine an
optimal or near-optimal schedule for the instructions within the basic block.
4. Compiler Efficiency: Efficient instruction scheduling can significantly improve the
overall performance of the generated code, enabling the compiler to produce
executable programs that utilize the underlying hardware more effectively and
execute tasks with improved speed and efficiency.

2. Why is instruction scheduling important for improving the performance of object code
produced by a compiler?
• Instruction scheduling is crucial for improving the performance of object code generated
by a compiler due to the following reasons:
1. Resource Utilization: Efficient instruction scheduling reduces stalls and hazards,
ensuring better utilization of hardware resources such as the CPU and memory.
2. Pipeline Optimization: Proper scheduling minimizes pipeline stalls, enabling the
processor to execute instructions more efficiently and effectively, thereby
maximizing its throughput.
3. Dependency Management: By managing data and control dependencies, instruction
scheduling minimizes the impact of hazards, improving the overall execution time
of the program.
4. Improved Parallelism: Optimal instruction scheduling facilitates better exploitation
of instruction-level parallelism, enabling the processor to execute multiple


instructions simultaneously.
5. Enhanced Performance: Effective instruction scheduling results in faster execution
times and improved overall performance of the compiled code, leading to enhanced
system responsiveness and efficiency.

3. What are the constraints that need to be considered in solving the basic block
instruction scheduling problem?
• In solving the basic block instruction scheduling problem, several constraints need to be
considered to ensure that the optimized schedule complies with the dependencies and
limitations of the underlying hardware. These constraints include:
• Data Dependencies: Dependencies arise from the data flow between instructions, such as
read-after-write (RAW), write-after-read (WAR), and write-after-write (WAW)
dependencies; a short example is shown after this list. Scheduling must ensure that
instructions dependent on the results of previous instructions are executed in the correct order.
• Resource Conflicts: These constraints involve managing the limited hardware resources,
including processor units, functional units, and memory access. Scheduling should avoid
conflicts that may occur due to resource sharing.
• Control Dependencies: Control flow instructions, such as branches and jumps, introduce
constraints that must be accounted for during scheduling to ensure correct program
execution and maintain the integrity of the program's control flow.
• Hardware Limitations: Various hardware-specific limitations, such as pipeline length,
pipeline stages, and latency, must be considered to avoid pipeline stalls and other
performance bottlenecks.
• Instruction Set Architecture (ISA) Constraints: Compliance with the specific ISA of the
target processor is essential to ensure that the scheduled instructions can be executed
correctly on the target hardware.
• Considering these constraints is critical for devising an efficient scheduling strategy that
optimizes the performance of the generated code without compromising the correctness
and functionality of the program.
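• For instance, a short illustrative sequence showing the three kinds of data dependencies
named above (the register names are assumed):

      I1: R1 = R2 + R3      ; writes R1
      I2: R4 = R1 * R5      ; RAW on R1: I2 must follow I1
      I3: R1 = R6 - R7      ; WAR on R1 with respect to I2, and WAW on R1 with respect to I1

  A scheduler may not move I2 before I1, and may not move I3 above I2 (or I1) unless the
  register is renamed.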

4. What is the most popular method for scheduling basic blocks?


• List scheduling is one of the most popular and widely used methods for scheduling basic
blocks in the context of compiler optimization. It is a heuristic-based algorithm that aims
to minimize the critical path length and maximize resource utilization by efficiently
scheduling instructions within a basic block. List scheduling works by creating a list of
instructions and their corresponding data and resource dependencies and then scheduling
these instructions based on their priorities and available resources. It is known for its
simplicity and effectiveness in achieving near-optimal schedules for basic blocks in the
context of compiler optimization.

5. What is the heuristic used in a list scheduler?


• In list scheduling, a common heuristic used is the "as-soon-as-possible" (ASAP)
heuristic. This heuristic prioritizes the scheduling of instructions as early as possible,
taking into account the availability of resources and the dependencies between
instructions. It aims to minimize the critical path length by scheduling instructions to
execute at the earliest available opportunity while ensuring that all data dependencies are
satisfied. The ASAP heuristic is effective in reducing the overall execution time of a
basic block by maximizing the utilization of resources and minimizing stalls and idle


cycles within the pipeline.

Suggested Reference:

1. https://dl.acm.org/doi/10.5555/1105634.1105652
2. https://www.worldcat.org/title/1032888564

References used by the students:

• https://dl.acm.org/doi/10.5555/1105634.1105652
• https://www.researchgate.net/publication/221501045_Learning_basic_block_scheduling_heuristics_from_optimal_data

Rubric wise marks obtained:

Rubrics   Knowledge (2) | Problem Recognition (2) | Documentation (2) | Presentation (2) | Ethics (2) | Total
          (each criterion is marked Good = 2 or Avg. = 1)

Marks

77
