Compiler Design Lab Manual
COLLEGE OF TECHNOLOGY
B.TECH
COMPUTER ENGINEERING
Name of Student:
Roll No:
Semester: Division:
COMPUTER ENGINEERING
VISION
To develop skilled professionals who can effectively tackle various challenges, equipped with
advanced capabilities and ethical principles to innovate responsibly for the benefit of society.
MISSION

Course Outcomes:
CO1: Understand the basic concepts; apply automata theory and knowledge of formal languages.
CO2: Identify and select suitable parsing strategies for a compiler in various cases, with knowledge of alternative methods (top-down, bottom-up, etc.).
CO3: Understand the back end of a compiler.
CO4: Understand run-time environments.
CO-PO-Matrix:
CO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1 3 3 3 3 3 2 1 1 1 2 1 1
CO2 2 3 3 2 3 1 2 2 2
CO3 2 1 2 1 2 1 1
CO4 2 1 2 1
BONAFIDE CERTIFICATE
This is to certify that Mr./Ms ..........................................................................................
with Roll No. .............................................. from Semester ……… Division .…….
has successfully completed his/her laboratory experiments in the Compiler Design
(1010043418) from the department of ...................................................................
during the academic year ......... -........
Date: ___________
PRACTICAL 1
AIM: Implement string matching using a finite automaton.
String Matching:
The string-matching automaton is a very useful tool used in string-matching algorithms. A
string-matching algorithm builds a finite automaton that scans the text string T for all
occurrences of the pattern P.
Finite Automata:
A finite automaton M is a 5-tuple (Q, q0, A, Σ, δ), where:
● Q is a finite set of states,
● q0 ∈ Q is the start state,
● A ⊆ Q is the set of accepting states,
● Σ is a finite input alphabet,
● δ is the transition function, a function from Q × Σ to Q.
The finite automaton starts in state q0 and reads the characters of its input string one at a time. If
the automaton is in state q and reads input character a, it moves from state q to state δ(q, a).
Whenever its current state q is a member of A, the machine M has accepted the string read so far.
An input that is not accepted is rejected.
A finite automaton M induces a function φ, called the final-state function, from Σ* to
Q such that φ(w) is the state M ends up in after scanning the string w. Thus, M accepts a string w
if and only if φ(w) ∈ A.
Program:
Output:
Question Answers:
1. Consider the following C declaration: int x[10], y, z; What kind of error does the
following statement contain? z == x + y;
a. Syntax error
b. Semantic error
c. Logical error
d. All of the above
3. All syntax errors and some of the semantic errors (the static semantic errors) are
detected by the ____.
4. Errors that occur when you violate the rules of writing C/C++ syntax are known as
____.
Conclusion:
Marks out of 10
Signature with Date of Completion
Date: ___________
PRACTICAL 2
AIM: Introduction to the Lex tool.
Lex tool:
Lex is a program designed to generate scanners, also known as tokenizers, which recognize
lexical patterns in text. Lex is an acronym that stands for "lexical analyzer generator." It is
intended primarily for Unix-based systems. The code for Lex was originally developed by Eric
Schmidt and Mike Lesk.
lex file.l
cc lex.yy.c -ll
./a.out
Installation steps:
Example of Lex Program:
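As a concrete illustration, a minimal Lex program in the classic three-section layout is sketched below. It counts the words and lines on standard input (the file name file.l is an assumption):

```lex
%{
#include <stdio.h>
int words = 0, lines = 0;
%}
%%
[a-zA-Z]+   { words++; }
\n          { lines++; }
.           { /* ignore all other characters */ }
%%
int main(void) { yylex(); printf("words = %d, lines = %d\n", words, lines); return 0; }
int yywrap(void) { return 1; }
```

Build and run with: lex file.l && cc lex.yy.c -ll && ./a.out < input.txt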
Output:
Question Answers:
a. Scans the entire program first and then translate it into machine code
b. When all the syntax errors are removed execution takes place
c. Slow for debugging
d. Execution time is more
a. Semantic analysis
b. Syntax analysis
c. Regular analysis
d. General analysis
Conclusion:
Marks out of 10
Date: ___________
PRACTICAL 3
AIM: Implement the following programs using Lex:
a. Generate histogram of words
b. Caesar cipher
c. Extract single-line and multi-line comments from a C program
%{
Definition section
%}
%%
Rules section
%%
User subroutines section
The Definition section is the place to define macros and import header files written in C. It is
also possible to write any C code here, which will be copied verbatim into the generated source
file. It is bracketed with %{ and %}.
The Rules section is the most important section. Each rule is made up of two parts: a pattern and
an action, separated by whitespace. The lexer that lex generates will execute the action when it
recognizes the pattern. Patterns are simply regular expressions. When the lexer sees some text in
the input matching a given pattern, it executes the associated C code. This section is bracketed
between %% and %%.
The User Subroutines section is where all the required procedures are defined. It contains main()
along with C statements and functions that are copied verbatim to the generated source file.
These statements typically contain code called by the rules in the rules section.
Program:
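As a sketch of task (b), a Caesar-cipher Lex program could look like the following (a fixed shift of 3 is an assumption):

```lex
%{
#include <stdio.h>
%}
%%
[a-z]   { putchar('a' + (yytext[0] - 'a' + 3) % 26); }
[A-Z]   { putchar('A' + (yytext[0] - 'A' + 3) % 26); }
.|\n    { putchar(yytext[0]); }   /* pass everything else through unchanged */
%%
int main(void) { yylex(); return 0; }
int yywrap(void) { return 1; }
```

The two character-class rules rotate letters by 3 positions while wrapping around the alphabet; non-letters are copied as-is.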
Output:
a. Statement
b. Numeric data
c. ASCII data
d. set to regular expression
a. Yacc
b. Lex
c. Both A and B
d. None of the above.
4. The action part of a lex program is included in the ____ section.
a. Rule
b. Declaration
c. Procedure
d. Body
a. Designing compilers
b. simulating sequential circuits
c. Developing text editors
d. All of the above.
Conclusion:
Marks out of 10
Signature with Date of Completion
Date: ___________
PRACTICAL 4
AIM: Implement the following programs using Lex:
a. Convert Roman Numerals to Decimal
Roman numerals are a numeral system originating from ancient Rome, using combinations of
letters from the Latin alphabet (I, V, X, L, C, D, M) to represent numbers. Each letter has a
specific value:
● I=1
● V=5
● X = 10
● L = 50
● C = 100
● D = 500
● M = 1000
The conversion involves parsing a Roman numeral string and computing its decimal value based
on positional values and subtraction rules:
1. Add the value of a numeral if it is greater than or equal to the numeral to its right.
2. Subtract the value if it is less than the numeral to its right.
Lex is used to tokenize the input Roman numeral and apply the conversion logic. The patterns in
Lex identify Roman numerals, and the associated action performs the conversion using an
algorithm.
Lex Program:
Program:
Output:
b. Check Whether Given Statement is Compound or Simple
A Lex program can analyze text to identify and categorize statements based on keywords and
delimiters. A statement containing a control-flow construct such as if or while is classified as
compound; otherwise it is simple. The Lex program uses regular expressions to identify these
keywords and delimiters and classify the statements accordingly.
Lex Program:
Program:
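A minimal sketch of such a classifier is shown below; the assumption that a statement is "compound" exactly when it contains one of the listed control-flow keywords is mine, and the keyword list is illustrative:

```lex
%{
#include <stdio.h>
int compound = 0;
%}
%%
"if"|"while"|"for"|"do"|"switch"   { compound = 1; }   /* control-flow keyword seen */
.|\n                               { ; }               /* ignore everything else    */
%%
int main(void) { yylex(); printf("%s statement\n", compound ? "Compound" : "Simple"); return 0; }
int yywrap(void) { return 1; }
```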
Output:
c. Extract HTML Tags from .html File
HTML tags are enclosed in angle brackets, and they define the structure and content of
web pages. Tags are represented as <tagname> and may also have attributes, resulting in formats
like <tagname attribute="value">.
1. Identify Tags: Match patterns that start with <, include tag names and optional attributes,
and end with >.
2. Extract Tags: Capture the text matching the tag pattern.
Lex can be used to create a lexical analyzer that identifies and extracts these tags from HTML
content based on regular expressions.
Lex Program:
Program:
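A minimal sketch using the tag pattern described above (and matching the pattern given in the question section below):

```lex
%{
#include <stdio.h>
%}
%%
"<"[^>]+">"   { printf("TAG: %s\n", yytext); }   /* an angle-bracketed tag */
.|\n          { /* skip text between tags */ }
%%
int main(void) { yylex(); return 0; }
int yywrap(void) { return 1; }
```

The pattern <[^>]+> matches an opening <, one or more characters that are not >, and the closing >, so it captures opening tags, closing tags, and tags with attributes alike.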
Output:
Question and Answers:
1. What is the primary purpose of the Lex program used to convert Roman numerals
to decimal?
A. To parse and validate HTML documents
B. To translate Roman numeral characters into their decimal equivalent
C. To determine if a statement is simple or compound
D. To identify and extract HTML tags from a file
2. In the Lex program for checking whether a statement is compound or simple, which
of the following is considered a compound statement?
A. x = 5;
B. if (x > 0) { y = x; }
C. y = x + 2;
D. while (x < 10) { x++; }
3. In the Lex program to extract HTML tags, which regular expression pattern would
match the following HTML tag: <div class="content">?
A. [^>]+
B. <[^>]+>
C. <![^>]+>
D. </[^>]+>
4. In the Lex program for converting Roman numerals to decimal, how is the
subtraction rule applied?
A. If a numeral is followed by a larger numeral, its value is added.
B. If a numeral is followed by a larger numeral, its value is subtracted.
C. If a numeral is followed by a smaller numeral, its value is ignored.
D. All values are added regardless of the numeral's position.
Conclusion:
Marks out of 10
Signature with Date of Completion
Date: ___________
PRACTICAL 5
AIM: Implement a recursive descent parser.
A Recursive Descent Parser uses the technique of top-down parsing without backtracking. It can
be defined as a parser that uses recursive procedures to process the input string with no
backtracking. It can be implemented simply in any language that supports recursion. The first
symbol of the R.H.S. string of a production uniquely determines the correct alternative to choose.
The major approach of recursive-descent parsing is to relate each non-terminal with a procedure.
The objective of each procedure is to read a sequence of input characters that can be produced by
the corresponding non-terminal, and return a pointer to the root of the parse tree for the non-
terminal. The structure of the procedure is prescribed by the productions for the equivalent non-
terminal.
The recursive procedures are simple to write and adequately effective if written in a language
that implements procedure calls efficiently. There is a procedure for each non-terminal in the
grammar. The parser maintains a global variable lookahead, holding the current input token, and
a procedure match(ExpectedToken) recognizes the next token in the parsing process and
advances the input stream pointer, so that lookahead points to the next token to be parsed.
match() is effectively a call to the lexical analyzer to get the next token.
For example, if the input stream is a + b$:
lookahead == a
match()
lookahead == +
match()
lookahead == b
In this manner, parsing can be done.
Example − In the following grammar, the first symbol (if, while, or begin) uniquely determines
which alternative to choose, since a statement starting with if will be a conditional statement and
a statement starting with while will be an iterative statement.
Stmt → If condition then Stmt else Stmt
| While condition do Stmt
| begin Stmt end.
One major drawback of recursive-descent parsing is that it can be
implemented only for those languages which support recursive procedure
calls, and it suffers from the problem of left-recursion.
Example − Write down the algorithm using recursive procedures to
implement the following grammar:
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
Program:
Output:
Question and Answers:
a. 2
b. 3
c. 4
d. 5
2. When the parser starts constructing the parse tree from the start
symbol and then tries to transform the start symbol to the input, it is called?
a. bottom-up parsing
b. top-down parsing
c. Both A and B
d. None of the above
a. Bottom-up Parsing
b. Recursive descent parsing
c. Backtracking
d. All of the above
a. predictive parsing
b. non-predictive parsing
c. recursive parsing
d. non-recursive parsing
Conclusion:
Marks out of 10
Signature with Date of Completion
Date: ______________
PRACTICAL 6
AIM: Implement a program to find the FIRST set of a given grammar.
FIRST() is a function that specifies the set of terminals that can begin a string
derived from a grammar symbol. For example, consider the two productions
T′ -> *FT′ and T′ -> ε: the first symbol of each alternative gives First(T′) = { *, ε }.
Rule 1:
● If X is a terminal, then First(X) = { X }.
Rule 2:
● If X -> ε is a production, then add ε to First(X).
Rule 3:
● If ε does not belong to First(E), then First(EF) = First(E)
● If ε belongs to First(E), then First(EF) = { First(E) – ε } ∪ First(F)
Production Rule / CFG    First
S -> E F                 First(S) = { g, f, o }
E -> g | ε               First(E) = { g, ε }
F -> f | o               First(F) = { f, o }
1. E → E + T / T
T → T x F / F
F → (E) / id
2. S → ACB / CbB / Ba
A → da / BC
B → g / ε
C → h / ε
Marks out of 10
Signature with Date of Completion
Date: ______________
PRACTICAL 7
AIM: Implement a program to find the FOLLOW set of a given grammar.
Follow():
To find the "Follow" set of a grammar, you need to follow a systematic approach that
involves analyzing the production rules of the grammar. The "Follow" set of a
non-terminal symbol contains all the terminals that can appear immediately after that
non-terminal in any valid derivation of the grammar. Here are the steps to find the
"Follow" set of a non-terminal symbol:
1. Start by initializing the "Follow" sets of all non-terminal symbols to empty sets.
2. The "Follow" of the start symbol (usually denoted by S) is set to contain the
end-of-input symbol ($). This indicates the end of the input.
3. Go through each production rule of the grammar and apply the following rules:
● For a production A → αBβ, add every terminal in First(β) except ε to Follow(B).
● For a production A → αB, or A → αBβ where ε belongs to First(β), add everything
in Follow(A) to Follow(B).
4. Continue applying Step 3 until no new elements are added to any "Follow" set.
1. A -> a | ε
B -> b
C -> c | ε
2. S -> A B
A -> a | ε
B -> b C
C -> c | ε
Question And Answers
4. Which data structure is commonly used to store the "First" and "Follow" sets in a
compiler?
a. Stack
b. Queue
c. Set
d. Array
Conclusion:
Marks out of 10
Signature with Date of Completion
Date: ___________
PRACTICAL 8
AIM: Implement a program to generate three address code for a given expression.
Three address code is a type of intermediate code which is easy to generate and can be easily
converted to machine code. It uses at most three addresses and one operator to represent an
expression, and the value computed at each instruction is stored in a temporary variable
generated by the compiler. The compiler decides the order of operations given by the three
address code.
Optimization: Three address code is often used as an intermediate representation of code during
optimization phases of the compilation process. The three address code allows the compiler to
analyze the code and perform optimizations that can improve the performance of the generated
code.
Code generation: Three address code can also be used as an intermediate representation of code
during the code generation phase of the compilation process. The three address code allows the
compiler to generate code that is specific to the target platform, while also ensuring that the
generated code is correct and efficient.
Debugging: Three address code can be helpful in debugging the code generated by the compiler.
Since three address code is a low-level language, it is often easier to read and understand than the
final generated code. Developers can use the three address code to trace the execution of the
program and identify errors or issues that may be present.
Algorithm:
Program:
OUTPUT:
3. In intermediate code generation, what is the role of a quadruple?
Conclusion:
Marks out of 10
Signature with Date of Completion
Date: ___________
PRACTICAL 9
AIM: Introduction to the Yacc tool; implement a simple calculator using Yacc.
Yacc (for "yet another compiler compiler") is the standard parser generator for the Unix
operating system. An open-source program, yacc generates code for the parser in the C
programming language. The acronym is usually rendered in lowercase but is occasionally seen as
YACC or Yacc.
Step1: A Yacc source program has three parts, as follows: Declarations %% translation rules %%
supporting C routines
Step2: Declarations Section: this section defines the tokens used by the parser, and the operators
and their precedence.
Step3: Rules Section: The rules section defines the rules that parse the input stream. Each rule
consists of a grammar production and the associated semantic action.
Step4: Programs Section: The programs section contains the following subroutines. Because
these subroutines are included in this file, it is not necessary to use the yacc library when
processing this file.
Step5: Main- The required main program that calls the yyparse subroutine to start the program.
Step6: yyerror(s) -This error-handling subroutine only prints a syntax error message.
Step7: yywrap -The wrap-up subroutine that returns a value of 1 when the end of input occurs.
The calc.lex file contains include statements for standard input and output, as well as for the
y.tab.h file, which yacc generates when we use the -d flag with the yacc command. The y.tab.h
file contains definitions for the tokens that the parser program uses.
Step8: calc.lex contains the rules to generate these tokens from the input stream.
Program:
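Following the steps above, a minimal calculator can be sketched as a pair of files. The file names calc.y and calc.lex follow the text; the grammar itself and the single-line input format are assumptions:

```yacc
/* calc.y */
%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s) { printf("%s\n", s); }   /* Step6 */
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%%
line : expr '\n'       { printf("= %d\n", $1); }
     ;
expr : expr '+' expr   { $$ = $1 + $3; }
     | expr '-' expr   { $$ = $1 - $3; }
     | expr '*' expr   { $$ = $1 * $3; }
     | expr '/' expr   { $$ = $1 / $3; }
     | NUMBER          { $$ = $1; }
     ;
%%
int main(void) { return yyparse(); }                 /* Step5 */
```

```lex
/* calc.lex */
%{
#include <stdlib.h>
#include "y.tab.h"   /* token definitions generated by yacc -d */
%}
%%
[0-9]+   { yylval = atoi(yytext); return NUMBER; }
[ \t]    ;                          /* skip blanks */
.|\n     { return yytext[0]; }      /* operators, parentheses, newline */
%%
int yywrap(void) { return 1; }
```

Build with: yacc -d calc.y && lex calc.lex && cc y.tab.c lex.yy.c -ly -ll && ./a.out. The %left declarations resolve the ambiguity of the expr grammar by giving * and / higher precedence than + and -.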
Output:
Question and Answers
a. Lexical analysis
b. Syntax analysis
c. Semantic analysis
d. Code optimization
a. C
b. Java
c. Python
d. JavaScript
Conclusion:
Marks out of 10
Signature with Date of Completion
Date: ___________
PRACTICAL 10
AIM: Implement a C program for operator precedence parsing.
Operator Precedence Parsing:
A grammar that is used to define mathematical operators is called an operator grammar or
operator precedence grammar. Such grammars have the restriction that no production has either
an empty right-hand side (null productions) or two adjacent non-terminals in its right-hand side.
Examples – This is an example of operator grammar:
E->E+E/E*E/id
However, the grammar given below is not an operator grammar, because two non-terminals are
adjacent to each other:
S->SAS/a
A->bSb/b
We can, however, convert it into an operator grammar:
S->SbSbS/SbS/a
A->bSb/b
Operator precedence parser – An operator precedence parser is a bottom-up parser that interprets
an operator grammar. This parser is only used for operator grammars. Ambiguous grammars are
not allowed in any parser except the operator precedence parser. There are two methods for
determining what precedence relations should hold between a pair of terminals.
This parser relies on the following three precedence relations: ⋖, ≐, ⋗
a ⋖ b means a "yields precedence to" b.
a ≐ b means a "has the same precedence as" b.
a ⋗ b means a "takes precedence over" b.
Program:
Output:
QUESTIONS AND ANSWERS
1. What is an operator precedence parser?
a. A type of parser that uses operator precedence to resolve shift-reduce conflicts.
b. A parser that uses regular expressions to match operators in the input.
c. A parser that handles arithmetic expressions but cannot handle logical expressions.
d. A parser that assigns precedence levels to operators based on their appearance in the
input.
2. In an operator precedence parser, which of the following operators has the highest
precedence?
a. Addition (+)
b. Multiplication (*)
c. Exponentiation (^)
d. Parentheses ()
Conclusion:
Marks out of 10
Signature with Date of Completion