21CS601
Compiler Design

Department: CSE
Batch/Year: 2021-25 / III

Created by:
Dr. P. EZHUMALAI, Prof & Head/RMDEC
Dr. A. K. JAITHUNBI, Associate Professor/RMDEC
V.SHARMILA, Assistant Professor/RMDEC

Date: 05.01.2024

1. CONTENTS

S.No  Contents
1     Course Objectives
2     Pre Requisites
3     Syllabus
4     Course Outcomes
5     CO-PO/PSO Mapping
6     Lecture Plan
7     Activity Based Learning
8     Lecture Notes
9     Assignments
10    Part A Questions & Answers
11    Part B Questions
12    Supportive Online Certification Courses
13    Real Time Applications
14    Contents beyond the Syllabus
15    Assessment Schedule
16    Prescribed Text Books & Reference Books
17    Mini Project Suggestions

2. COURSE OBJECTIVES

• To study the different phases of a compiler.

• To understand the techniques for tokenization and parsing.

• To understand the conversion of a source program into an intermediate representation.

• To learn the different techniques used for assembly code generation.

• To analyze various code optimization techniques.

3. PRE REQUISITES
• Pre-requisite Chart

21CS601 - COMPILER DESIGN
  ↑
21CS503 - THEORY OF COMPUTATION
  ↑
21MA302 - Discrete Mathematics        21CS201 - Data Structures
  ↑
21CS02 - Python Programming (Lab Integrated)        21GE101 - Problem Solving and C Programming

4. SYLLABUS
21CS601 COMPILER DESIGN (Lab Integrated)        L T P C: 3 0 2 4
OBJECTIVES

• To study the different phases of a compiler.
• To understand the techniques for tokenization and parsing.
• To understand the conversion of a source program into an intermediate representation.
• To learn the different techniques used for assembly code generation.
• To analyze various code optimization techniques.

UNIT I INTRODUCTION TO COMPILERS 9

Introduction - Structure of a Compiler - Role of the Lexical Analyzer - Input
Buffering - Specification of Tokens - Recognition of Tokens - The Lexical Analyzer
Generator LEX - Finite Automata - From Regular Expressions to Automata -
Conversion from NFA to DFA, Epsilon NFA to DFA - Minimization of Automata.
UNIT II SYNTAX ANALYSIS 9
Role of the Parser - Context-free grammars – Derivation Trees – Ambiguity in
Grammars and Languages- Writing a grammar – Top-Down Parsing –Bottom Up
Parsing -LR Parser-SLR, CLR - Introduction to LALR Parser -Parser Generators –
Design of a parser generator –YACC.
UNIT III INTERMEDIATE CODE GENERATION 9
Syntax Directed Definitions - Evaluation Orders for Syntax Directed Definitions–
Application of Syntax Directed Translation - Intermediate Languages - Syntax Tree
-Three address code – Types and Declarations - Translation of Expressions - Type
Checking.

UNIT IV RUN-TIME ENVIRONMENT AND CODE GENERATION 9

Run Time Environment: Storage Organization - Stack Allocation of Space - Access
to Nonlocal Data on Stack - Heap Management - Parameter Passing - Issues in Code
Generation - Design of a Simple Code Generator - Code Generator using DAG -
Dynamic Programming Based Code Generation.
UNIT V CODE OPTIMIZATION 9
Principal Sources of Optimization – Peep-hole optimization - Register allocation
and assignment - DAG -Basic blocks and flow graph - Optimization in Basic
blocks – Data Flow Analysis.

4. SYLLABUS

LIST OF EXPERIMENTS:
1. Develop a lexical analyzer to recognize a few patterns in C. (Ex.
identifiers, constants, comments, operators etc.). Create a symbol
table, while recognizing identifiers.
2. Design a lexical analyzer for the given language. The lexical analyzer
should ignore redundant spaces, tabs and new lines, comments etc.
3. Implement a Lexical Analyzer using Lex Tool.
4. Design Predictive Parser for the given language.
5. Implement an Arithmetic Calculator using LEX and YACC.
6. Generate three address code for a simple program using LEX and YACC.
7. Implement simple code optimization techniques (Constant folding,
Strength reduction and Algebraic transformation).
8. Implement the back-end of the compiler, for which the three address code
is given as input and the 8086 assembly language code is produced as
output.

5. COURSE OUTCOMES

• At the end of the course, the student should be able to:

CO1 (K2): Understand the different phases of a compiler and identify the tokens using automata and the LEX tool.
CO2 (K3): Construct the parse tree and check the syntax of the given source program using a parser and the YACC tool.
CO3 (K4): Generate an intermediate code representation for the given source program after syntax directed translation and type checking.
CO4 (K4): Analyze the run-time environment and design a simple code generator for assembly code generation.
CO5 (K3): Implement the various code optimization techniques to reduce the size of the code.

• HKL = Highest Knowledge Level (given in parentheses above)

6. CO - PO / PSO MAPPING

CO    HKL   PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12  PSO1  PSO2  PSO3
CO1   K2     3    2    1    -    -    -    -    1    1    1     -     1     2     -     -
CO2   K3     3    2    1    -    -    -    -    1    1    1     -     1     2     -     -
CO3   K4     3    2    1    -    -    -    -    1    1    1     -     1     2     -     -
CO4   K4     3    2    1    -    -    -    -    1    1    1     -     1     2     -     -
CO5   K3     3    2    1    -    -    -    -    1    1    1     -     1     2     -     -

• Correlation Level: 1. Slight (Low), 2. Moderate (Medium), 3. Substantial (High). If there is no correlation, put "-".

7. LECTURE PLAN : UNIT – I

UNIT – I INTRODUCTION TO COMPILERS

Each entry lists: proposed lecture date, topic (CO, highest cognitive level; mode of delivery; delivery resource), and LU outcome.

1. 02/1/2024 - Structure of a Compiler (CO1, K2; MD1 & MD5; T1): Define the structure of a compiler.
2. 05/1/2024 - Lexical Analysis: Role of the Lexical Analyzer (CO1, K3; MD1 & MD5; T1): Defines the role of the lexical analyzer.
3. 08/1/2024 - Input Buffering (CO1, K2; MD1 & MD5; T1): Explains how input is provided to the lexical analyzer.
4. 10/1/2024 - Specification of Tokens (CO1, K3; MD1 & MD5; T1): Explains the token specification.
5. 12/1/2024 - Recognition of Tokens (CO1, K3; MD1 & MD5; T1): Explains the recognition of tokens during the lexical analysis phase.
6. 22/1/2024 - Finite Automata (CO1, K3; MD1 & MD5; T1): Defines finite automata and its types.
7. 23/1/2024 - Regular Expressions to Automata (CO1, K3; MD1 & MD5; T1): Explains the ways to convert an RE to automata.
8. 24/1/2024 - Minimizing DFA (CO1, K3; MD1 & MD5; T1): Apply the minimization algorithm to reduce the states in a DFA.
9. 25/1/2024 - Lex (CO1, K3; MD1 & MD5; T1): Explains how to use the LEX tool to design a lexical analyzer.

• ASSESSMENT COMPONENTS
  AC 1. Unit Test
  AC 2. Assignment
  AC 3. Course Seminar
  AC 4. Course Quiz
  AC 5. Case Study
  AC 6. Record Work
  AC 7. Lab / Mini Project
  AC 8. Lab Model Exam
  AC 9. Project Review

• MODE OF DELIVERY
  MD 1. Oral Presentation
  MD 2. Tutorial
  MD 3. Seminar
  MD 4. Hands On
  MD 5. Videos
  MD 6. Field Visit
8. ACTIVITY BASED LEARNING : UNIT – I

UNIT – I
• To understand the basic concepts of compilers, students can take the quiz below as an activity.

• LINK for the quiz is given below.

https://sites.google.com/site/wonsunahn/teaching/cs-0449-systems-software/kahoot-quiz

Video link

https://youtu.be/l-JTDDCRBss

https://youtu.be/AdXideQrkPE

https://youtu.be/XPH_hoP9z40

https://youtu.be/NmOEbti9gzY

https://youtu.be/IFBb84Ec_Cg

• Hands On - Assignment:

1. Illustrate diagrammatically the output of each phase of the compiler.
2. Demonstrate the RE to DFA process using JFLAP:
   a. a(b+c)*a
   b. 10 + (0 + 11)0* 1
   To run JFLAP on Windows, you may simply use one of the following methods:
   • In a command/console window, go to the directory with JFLAP7.jar and execute: java -jar JFLAP7.jar
   • Right-click on the file and open it with "Java(TM) Platform SE binary"
   • Double-click on the file (how to run a Jar file in Windows by double-clicking)
3. Use an online LEX compiler and execute the program:
   a. To create a simple calculator with variables.
   b. To count the number of characters in a string.
4. Use the Regular Expression Tester - https://www.freeformatter.com/regex-tester.html - to test the regular expressions ab*, (a|b)*abb and show the results.

9. LECTURE NOTES
UNIT – I

INTRODUCTION TO COMPILERS

1.1 OVERVIEW OF LANGUAGE PROCESSING SYSTEM

Preprocessor

A preprocessor produces input to compilers. It may perform the following functions:

1. Macro processing: A preprocessor may allow a user to define macros that are shorthands for longer constructs.

2. File inclusion: A preprocessor may include header files into the program text.

3. Rational preprocessor: These preprocessors augment older languages with more modern flow-of-control and data-structuring facilities.

4. Language extensions: These preprocessors attempt to add capabilities to the language by means of certain built-in macros.
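As a minimal illustration of the first two functions, the C preprocessor performs both macro expansion and file inclusion before the compiler proper runs (the macro name here is purely illustrative):

#include <stdio.h>             /* file inclusion: the contents of stdio.h
                                  are spliced into the program text */

#define SQUARE(x) ((x) * (x))  /* macro processing: a shorthand for a
                                  longer construct */

int main(void) {
    /* SQUARE(5) is expanded to ((5) * (5)) before compilation proper */
    printf("%d\n", SQUARE(5));
    return 0;
}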

COMPILER

A compiler is a translator program that takes a program written in a high-level language (HLL), the source program, and translates it into an equivalent program in a machine-level language (MLL), the target program. An important part of a compiler is reporting errors in the source program to the programmer.

Source Program → COMPILER → Target Program
                     ↓
               Error Messages

Executing a program written in an HLL programming language basically has two parts. The source program must first be compiled (translated) into an object program. Then the resulting object program is loaded into memory and executed.

Source Program → COMPILER → Object Program
Input → OBJECT PROGRAM → Output

ASSEMBLER: Programmers found it difficult to write or read programs in machine
language. They began to use mnemonics (symbols) for each machine instruction,
which they would subsequently translate into machine language. Such a mnemonic
machine language is now called an assembly language. Programs known as
assemblers were written to automate the translation of assembly language into
machine language. The input to an assembler program is called the source program;
the output is a machine language translation (object program).

INTERPRETER: An interpreter is a program that appears to execute a source
program as if it were machine language.

Source Program + Data → INTERPRETER → Program Output

Languages such as BASIC, SNOBOL and LISP can be translated using interpreters.
JAVA also uses an interpreter. The process of interpretation can be carried out in the
following phases:

1. Lexical analysis

2. Syntax analysis

3. Semantic analysis

4. Direct Execution

Advantages:

• Modifications to the user program can easily be made and implemented as execution proceeds.
• The type of object that a name denotes may change dynamically.
• Debugging a program and finding errors is a simpler task for an interpreted program.
• The interpreter for the language makes it machine independent.

Disadvantages:

• The execution of the program is slower, and memory consumption is higher.

Loader and Link-editor:

Once the assembler produces an object program, that program must be placed
into memory and executed. The assembler could place the object program directly
in memory and transfer control to it, thereby causing the machine language
program to be executed. However, this would waste core by leaving the assembler in
memory while the user's program was being executed. Also, the programmer would
have to retranslate his program with each execution, thus wasting translation time.
To overcome this problem of wasted translation time and memory, system
programmers developed another component called a loader.
"A loader is a program that places programs into memory and prepares them for
execution." It would be more efficient if subroutines could be translated into object
form that the loader could "relocate" directly behind the user's program. The task of
adjusting programs so that they may be placed in arbitrary core locations is called
relocation. Relocating loaders perform four functions.

TRANSLATOR
A translator is a program that takes as input a program written in one language
and produces as output a program in another language. Besides program
translation, the translator performs another very important role: error detection.
Any violation of the HLL (High Level Language) specification is detected and
reported to the programmer. The important roles of a translator are:
• Translating the HLL program input into an equivalent machine language (ML) program.
• Providing diagnostic messages wherever the programmer violates the specification of the HLL.

TYPES OF TRANSLATORS:

1) INTERPRETER 2) COMPILER 3) PREPROCESSOR

1.2 STRUCTURE OF THE COMPILER DESIGN
Phases of a compiler: A compiler operates in phases. A phase is a logically
interrelated operation that takes the source program in one representation and
produces output in another representation. The phases of a compiler are shown
below.
There are two parts of compilation:
a. Analysis (Machine Independent/Language Dependent)
b. Synthesis (Machine Dependent/Language Independent)

PHASES OF A COMPILER
The compilation process is partitioned into a number of sub-processes called 'phases'.
Lexical Analysis:-
The lexical analyzer (LA) or scanner reads the source program one character at a time,
carving the source program into a sequence of atomic units called tokens.
Syntax Analysis:-
The second stage of translation is called syntax analysis or parsing. In this phase
expressions, statements, declarations etc. are identified by using the results of
lexical analysis. Syntax analysis is aided by using techniques based on the formal
grammar of the programming language.
Intermediate Code Generation:-
An intermediate representation of the final machine language code is produced.
This phase bridges the analysis and synthesis phases of translation.
Code Optimization:-
This is an optional phase designed to improve the intermediate code so that the output
runs faster and takes less space.

Code Generation:-
The last phase of translation is code generation. A number of optimizations to
reduce the length of machine language program are carried out during this
phase. The output of the code generator is the machine language program of the
specified computer.

Table Management (or) Book-keeping:-

This is the portion that keeps the names used by the program and records essential
information about each. The data structure used to record this information is called a
"Symbol Table".
Error Handlers:-
It is invoked when a flaw (error) in the source program is detected.
The output of the LA is a stream of tokens, which is passed to the next phase, the
syntax analyzer or parser. The syntax analyzer (SA) groups the tokens together into
syntactic structures such as expressions. Expressions may further be combined to form
statements. The syntactic structure can be regarded as a tree whose leaves are
tokens; such trees are called parse trees.

The parser has two functions. It checks whether the tokens from the lexical analyzer
occur in patterns that are permitted by the specification for the source language. It
also imposes on the tokens a tree-like structure that is used by the subsequent phases
of the compiler.

For example, if a program contains the expression A+/B, then after lexical analysis this
expression might appear to the syntax analyzer as the token sequence id+/id. On
seeing the /, the syntax analyzer should detect an error situation, because the
presence of these two adjacent binary operators violates the formation rules of
an expression.
Syntax analysis is to make explicit the hierarchical structure of the incoming token
stream by identifying which parts of the token stream should be grouped.

For example, A/B*C has two possible interpretations:
1. divide A by B and then multiply by C, or
2. multiply B by C and then use the result to divide A.
Each of these two interpretations can be represented in terms of a parse tree.

Intermediate Code Generation:-

The intermediate code generation uses the structure produced by the syntax
analyzer to create a stream of simple instructions. Many styles of intermediate code
are possible. One common style uses instruction with one operator and a small
number of operands.
The output of the syntax analyzer is some representation of a parse tree. The
intermediate code generation phase transforms this parse tree into an intermediate
language representation of the source program.

Code Optimization

This is an optional phase designed to improve the intermediate code so that the
output runs faster and takes less space. Its output is another intermediate code
program that does the same job as the original, but in a way that saves time and/or space.

1. Local Optimization:-
There are local transformations that can be applied to a program to make an
improvement. For example,

If A > B goto L2
Goto L3
L2:

can be replaced by the single statement

If A <= B goto L3

Another important local optimization is the elimination of common sub-expressions. For example,

A := B + C + D
E := B + C + F

might be evaluated as

T1 := B + C
A := T1 + D
E := T1 + F

taking advantage of the common sub-expression B + C.

2. Loop Optimization:-
Another important source of optimization concerns increasing the speed of
loops. A typical loop improvement is to move a computation that produces the
same result each time around the loop to a point in the program just before the
loop is entered, as the sketch below illustrates.
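A small C sketch of this loop improvement (the variable names are illustrative):

void fill(int *a, int n, int x, int y) {
    /* before: the product x * y is loop-invariant but is
       recomputed on every iteration */
    for (int i = 0; i < n; i++)
        a[i] = x * y + i;
}

void fill_optimized(int *a, int n, int x, int y) {
    /* after loop optimization: the invariant computation is
       moved to a point just before the loop is entered */
    int t = x * y;
    for (int i = 0; i < n; i++)
        a[i] = t + i;
}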
Code Generator:-
The code generator produces the object code by deciding on the memory locations
for data, selecting code to access each datum, and selecting the registers in which each
computation is to be done. Many computers have only a few high-speed registers in
which computations can be performed quickly. A good code generator would
attempt to utilize registers as efficiently as possible.

Symbol Table Management OR Book-keeping :-


A compiler needs to collect information about all the data objects that appear in the
source program. The information about data objects is collected by the early phases
of the compiler: the lexical and syntactic analyzers. The data structure used to record
this information is called the Symbol Table.

Error Handling:-
One of the most important functions of a compiler is the detection and reporting of
errors in the source program. The error message should allow the programmer to
determine exactly where the errors have occurred. Errors may occur in any of the
phases of a compiler.
Whenever a phase of the compiler discovers an error, it must report the error to
the error handler, which issues an appropriate diagnostic message. Both the table-
management and error-handling routines interact with all phases of the compiler.

1.3 LEXICAL ANALYSIS

Lexical analysis is the process of converting a sequence of characters into a


sequence of tokens. A program or function which performs lexical analysis is called a
lexical analyzer or scanner. A lexer often exists as a single function which is called by
a parser or another function.

THE ROLE OF THE LEXICAL ANALYZER


The lexical analyzer is the first phase of a compiler.
Its main task is to read the input characters and produce as output a
sequence of tokens that the parser uses for syntax analysis.

Upon receiving a ‘get next token’ command from the parser, the lexical analyzer
reads input characters until it can identify the next token.

ISSUES OF LEXICAL ANALYZER

There are three issues in (i.e., reasons for separating out) lexical analysis:

• To make the design simpler.
• To improve the efficiency of the compiler.
• To enhance compiler portability.

TOKENS

A token is a string of characters, categorized according to the rules as a symbol


(e.g., IDENTIFIER, NUMBER, COMMA). The process of forming tokens from an input
stream of characters is called tokenization.

A token can look like anything that is useful for processing an input text stream or
text file. Consider this expression in the C programming language: sum=3+2;

Lexeme   Token type
sum      Identifier
=        Assignment operator
3        Number
+        Addition operator
2        Number
;        End of statement

LEXEME:
A lexeme is a collection or group of characters that forms a token.
PATTERN:
A pattern is a description of the form that the lexemes of a token may take.
In the case of a keyword as a token, the pattern is just the sequence of characters
that form the keyword. For identifiers and some other tokens, the pattern is a more
complex structure that is matched by many strings.
Attributes for Tokens
Some tokens have attributes that can be passed back to the parser. The lexical
analyzer collects information about tokens into their associated attributes. The
attributes influence the translation of tokens.
a. Constant : value of the constant
b. Identifiers: pointer to the corresponding symbol table
entry.
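For instance, a token together with its attribute might be represented in C as follows (a sketch; the type and field names are illustrative, not from the text):

/* a token is a name (e.g. ID, NUM, RELOP) plus an attribute */
struct token {
    int name;               /* integer code of the token name */
    union {
        long  value;        /* for a constant: its value */
        void *sym_entry;    /* for an identifier: pointer to the
                               corresponding symbol-table entry */
    } attr;
};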

ERROR RECOVERY STRATEGIES IN LEXICAL ANALYSIS:
The following are the error-recovery actions in lexical analysis:
1) Deleting an extraneous character.
2) Inserting a missing character.
3) Replacing an incorrect character by a correct character.
4) Transposing two adjacent characters.
5) Panic mode recovery: deletion of successive characters from the
token until the error is resolved (sketched below).
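A sketch of panic-mode recovery in C, assuming hypothetical helpers peek_char(), skip_char() and can_start_token():

#include <stdio.h>   /* for EOF */

extern int  peek_char(void);          /* assumed: look at the next input char */
extern void skip_char(void);          /* assumed: consume one input char */
extern int  can_start_token(int c);   /* assumed: can c begin a token? */

/* panic mode: delete successive characters from the input
   until one that can legally start a token is found */
void panic_recover(void) {
    int c;
    while ((c = peek_char()) != EOF && !can_start_token(c))
        skip_char();
}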
INPUT BUFFERING
We often have to look one or more characters beyond the next lexeme before we
can be sure we have the right lexeme. As characters are read from left to right,
each character is stored in the buffer to form a meaningful token as shown below:
A = B + C
↑        ↑
beginning of the token    forward (look ahead) pointer


We introduce a two-buffer scheme that handles large lookaheads safely. We then
consider an improvement involving "sentinels" that saves time checking for the
ends of buffers.
BUFFER PAIRS
A buffer is divided into two N-character halves, as shown below.

Each buffer is of the same size N, and N is usually the number of characters on one
disk block. E.g., 1024 or 4096 bytes.
Using one system read command we can read N characters into a buffer.
If fewer than N characters remain in the input file, then a special character,
represented by eof, marks the end of the source file.
Two pointers to the input are maintained:
1. Pointer lexeme_beginning, marks the beginning of the current lexeme,
whose extent we are attempting to determine.
2. Pointer forward scans ahead until a pattern match is found.

Once the next lexeme is determined, forward is set to the character at its right end.
The string of characters between the two pointers is the current lexeme. After the
lexeme is recorded as an attribute value of a token returned to the parser,
lexeme_beginning is set to the character immediately after the lexeme just found.

Advancing forward pointer:


Advancing forward pointer requires that we first test whether we have reached the
end of one of the buffers, and if so, we must reload the other buffer from the
input, and move forward to the beginning of the newly loaded buffer. If the end of
second buffer is reached, we must again reload the first buffer with input and the
pointer wraps to the beginning of the buffer.
Code to advance forward pointer:

if forward at end of first half then begin
    reload second half;
    forward := forward + 1
end
else if forward at end of second half then begin
    reload first half;
    move forward to beginning of first half
end
else forward := forward + 1;
SENTINELS
For each character read, we make two tests: one for the end of the buffer, and one
to determine what character is read. We can combine the buffer-end test with the
test for the current character if we extend each buffer to hold a sentinel character
at the end.
The sentinel is a special character that cannot be part of the source program, and a
natural choice is the character eof.
The sentinel arrangement is as shown below:

Note that eof retains its use as a marker for the end of the entire input. Any eof
that appears other than at the end of a buffer means that the input is at an end.
Code to advance forward pointer:

forward := forward + 1;
if forward↑ = eof then begin
    if forward at end of first half then begin
        reload second half;
        forward := forward + 1
    end
    else if forward at end of second half then begin
        reload first half;
        move forward to beginning of first half
    end
    else /* eof within a buffer signifying end of input */
        terminate lexical analysis
end
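The same sentinel logic can be sketched in C as follows; the buffer size, the reload() helper and the use of '\0' as the eof sentinel are assumptions of the sketch, and reload(buf) is assumed to have been called once to prime the first half:

#define N 4096                 /* characters per buffer half */
#define SENTINEL '\0'          /* stands in for eof; assumed absent
                                  from the source program */

static char buf[2 * N + 2];    /* two halves, each followed by a sentinel */
static char *forward = buf;

extern void reload(char *half); /* assumed: reads up to N chars into the
                                   half and writes SENTINEL after them */

int advance(void) {
    char c = *forward++;
    if (c != SENTINEL)
        return c;                          /* common case: one test only */
    if (forward == buf + N + 1) {          /* hit end of first half */
        reload(buf + N + 1);               /* reload second half */
        return *forward++;
    }
    if (forward == buf + 2 * N + 2) {      /* hit end of second half */
        reload(buf);                       /* reload first half */
        forward = buf;                     /* wrap to the beginning */
        return *forward++;
    }
    return -1;  /* sentinel inside a buffer: real end of input */
}

As in the pseudocode, the sentinel lets the common case get by with a single test per character.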

1.4 SPECIFICATION OF TOKENS
There are 3 specifications of tokens:
1) Strings
2) Language
3) Regular expression
Strings and Languages
An alphabet or character class is a finite set of symbols.
A string over an alphabet is a finite sequence of symbols drawn from that alphabet. A
language is any countable set of strings over some fixed alphabet.
In language theory, the terms "sentence" and "word" are often used as synonyms for
"string." The length of a string s, usually written |s|, is the number of occurrences of
symbols in s. For example, banana is a string of length six. The empty string, denoted ε, is
the string of length zero.
Operations on strings
The following string-related terms are commonly used:
1. A prefix of string s is any string obtained by removing zero or more symbols
from the end of string s.
For example, ban is a prefix of banana.
2. A suffix of string s is any string obtained by removing zero or more symbols
from the beginning of s. For example, nana is a suffix of banana.

3. A substring of s is obtained by deleting any prefix and any suffix from s.


For example, nan is a substring of banana.

4. The proper prefixes, suffixes, and substrings of a string s are those

prefixes, suffixes, and substrings, respectively, of s that are not ε and not equal to s itself.

5. A subsequence of s is any string formed by deleting zero or more not


necessarily consecutive positions of s. For example, baan is a subsequence of banana.

Operations on languages:
The following are the operations that can be applied to languages:
1. Union  2. Concatenation  3. Kleene closure  4. Positive closure
The following example shows these operations on languages. Let L = {0,1} and S = {a,b,c}:
1. Union: L ∪ S = {0, 1, a, b, c}
2. Concatenation: L.S = {0a, 1a, 0b, 1b, 0c, 1c}
3. Kleene closure: L* = { ε, 0, 1, 00, 01, 10, 11, ... }
4. Positive closure: L+ = { 0, 1, 00, 01, 10, 11, ... }
Regular Expressions
Each regular expression r denotes a language L(r).
Here are the rules that define the regular expressions over some alphabet Σ and
the languages that those expressions denote:
1. ε is a regular expression, and L(ε) is { ε }, that is, the language whose sole
member is the empty string.
2. If 'a' is a symbol in Σ, then 'a' is a regular expression, and L(a) = {a}, that
is, the language with one string, of length one, with 'a' in its one position.
3. Suppose r and s are regular expressions denoting the languages L(r) and
L(s). Then,
   a) (r)|(s) is a regular expression denoting the language L(r) U L(s).
   b) (r)(s) is a regular expression denoting the language L(r)L(s).
   c) (r)* is a regular expression denoting (L(r))*.
   d) (r) is a regular expression denoting L(r).
4. The unary operator * has highest precedence and is left associative.
5. Concatenation has second highest precedence and is left associative.
6. | has lowest precedence and is left associative.
Under these conventions, for example, a|b*c is equivalent to (a)|((b)*(c)).

Regular set
A language that can be defined by a regular expression is called a regular set.
If two regular expressions r and s denote the same regular set, we say they are
equivalent and Write r = s.
There are a number of algebraic laws for regular expressions that can be used to
manipulate into equivalent forms.
For instance, r|s = s|r is commutative; r|(s|t)=(r|s)|t is associative.
Regular Definitions
Giving names to regular expressions is referred to as a regular definition. If Σ is an
alphabet of basic symbols, then a regular definition is a sequence of definitions of
the form

d1 → r1
d2 → r2
.........
dn → rn

1. Each di is a distinct name.
2. Each ri is a regular expression over the alphabet Σ ∪ {d1, d2, ..., di-1}.

Example: Identifiers form the set of strings of letters and digits beginning with a letter.
A regular definition for this set:

letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter ( letter | digit )*
Shorthands
Certain constructs occur so frequently in regular expressions that it is convenient to
introduce notational shorthands for them.

1. One or more instances (+):
· The unary postfix operator + means "one or more instances of".
· If r is a regular expression that denotes the language L(r), then (r)+
is a regular expression that denotes the language (L(r))+.
· Thus the regular expression a+ denotes the set of all strings of one
or more a's.
· The operator + has the same precedence and associativity as the
operator *.
2. Zero or one instance (?):
· The unary postfix operator ? means "zero or one instance of".
· The notation r? is a shorthand for r | ε.
· If r is a regular expression, then (r)? is a regular expression that
denotes the language L(r) U { ε }.
3. Character Classes:

· The notation [abc], where a, b and c are alphabet symbols, denotes
the regular expression a | b | c.
· A character class such as [a-z] denotes the regular expression
a | b | c | d | ... | z.
· We can describe identifiers as being strings generated by the regular
expression [A-Za-z][A-Za-z0-9]*.
Non-regular Set
A language which cannot be described by any regular expression is a non-regular
set. Example: The set of all strings of balanced parentheses and repeating strings
cannot be described by a regular expression. This set can be specified by a context-
free grammar.

1.5 RECOGNITION OF TOKENS
Consider the following grammar fragment:

stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε
expr → term relop term | term
term → id | num

where the terminals if, then, else, relop, id and num generate sets of strings given
by the following regular definitions:

if → if
then → then
else → else
relop → < | <= | = | <> | > | >=
id → letter ( letter | digit )*
num → digit+ (. digit+)? ( E (+|-)? digit+ )?
For this language fragment the lexical analyzer will recognize the keywords if, then,
else,
as well as the lexemes denoted by relop, id, and num. To simplify matters, we
assume keywords are reserved; that is, they cannot be used as identifiers.
Transition diagrams
A transition diagram is a diagrammatic representation that depicts the action taking
place when a lexical analyzer is called by the parser to get the next token. It is used
to keep track of information about the characters that are seen as the forward
pointer scans the input. A sketch of how such a diagram can be simulated in code follows.
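A C sketch that simulates the transition diagram for relop; the helper functions and the token codes are illustrative assumptions:

extern int  next_char(void);   /* assumed: returns the next input character */
extern void retract(void);     /* assumed: gives one lookahead character back */

enum { LT, LE, EQ, NE, GT, GE, NOT_RELOP };

/* walks the relop transition diagram: each if-level corresponds
   to one state, each comparison to one labeled edge */
int relop(void) {
    int c = next_char();
    if (c == '<') {
        c = next_char();
        if (c == '=') return LE;     /* <= */
        if (c == '>') return NE;     /* <> */
        retract();    return LT;     /* < followed by something else */
    }
    if (c == '=') return EQ;
    if (c == '>') {
        c = next_char();
        if (c == '=') return GE;     /* >= */
        retract();    return GT;
    }
    retract();
    return NOT_RELOP;
}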

1.6 FINITE AUTOMATA
Finite Automata is one of the mathematical models that consist of a number of
states and edges. It is a transition diagram that recognizes a regular expression or
grammar.
Types of Finite Automata
There are two types of finite automata:
• Non-deterministic Finite Automata (NFA)
• Deterministic Finite Automata (DFA)
Non-deterministic Finite Automata
An NFA is a mathematical model that consists of five tuples, denoted by
M = (Qn, Σ, δ, q0, fn):
Qn – finite set of states
Σ – finite set of input symbols
δ – transition function that maps state-symbol pairs to sets of states
q0 – starting state
fn – final state

Deterministic Finite Automata
A DFA is a special case of an NFA in which
i) no state has an ε-transition, and
ii) there is at most one transition from each state on any input.
A DFA has five tuples, denoted by M = (Qd, Σ, δ, q0, fd):
Qd – finite set of states
Σ – finite set of input symbols
δ – transition function that maps each state-symbol pair to a single state
q0 – starting state
fd – final state
1.7 Converting a Regular Expression into a Non-Deterministic Finite
Automaton (Thompson's Algorithm)
There are only 5 rules, one for each type of RE: ε, a single symbol a,
concatenation AB, alternation A|B, and Kleene star A*.

The algorithm constructs NFAs with only one final state. For example, the third rule
indicates that, to construct the NFA for the RE AB, we construct the NFAs for A and
B which are represented as two boxes with one start and one final state for each
box. Then the NFA for AB is constructed by connecting the final state of A to the
start state of B using an empty transition.
For example, the RE (a|b)c is mapped to the following NFA:
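In the same spirit, here is a C sketch of how such one-start/one-final NFA fragments can be composed; the state representation and the new_state() helper are assumptions of the sketch, not part of the original text:

/* each state has at most two out-edges; label 0 marks an ε-edge */
struct state { struct state *out[2]; int label[2]; int n; };
struct frag  { struct state *start, *final; };

extern struct state *new_state(void);   /* assumed: fresh, edge-free state */

static void add_edge(struct state *from, struct state *to, int c) {
    from->label[from->n] = c;           /* c == 0 means an ε-transition */
    from->out[from->n++] = to;
}

/* rule for AB: an ε-edge from A's final state to B's start state */
struct frag concat(struct frag a, struct frag b) {
    add_edge(a.final, b.start, 0);
    return (struct frag){ a.start, b.final };
}

/* rule for A|B: a new start state with ε-edges into A and B, and a
   new final state reached by ε-edges from A's and B's final states */
struct frag alternate(struct frag a, struct frag b) {
    struct frag r = { new_state(), new_state() };
    add_edge(r.start, a.start, 0);
    add_edge(r.start, b.start, 0);
    add_edge(a.final, r.final, 0);
    add_edge(b.final, r.final, 0);
    return r;
}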

1.8 Non Deterministic Finite Automaton into a Deterministic Finite


Automaton: (Subset Construction Algorithm)
The next step is to convert a NFA to a DFA (called subset construction). Suppose
that you assign a number to each NFA state. The DFA states generated by subset
construction have sets of numbers, instead of just one number. For example, a DFA
state may have been assigned the set {5,6,8}. This indicates that arriving to the
state labeled {5,6,8} in the DFA is the same as arriving to the state 5, the state 6,
or the state 8 in the NFA when parsing the same input. (Recall that a particular
input sequence when parsed by a DFA, leads to a unique state, while when parsed
by a NFA it may lead to multiple states.)
First we need to handle transitions that lead to other states for free (without
consuming any input). These are the ε-transitions. We define the ε-closure of an NFA
node as the set of all the nodes reachable from this node using zero, one, or more
ε-transitions. For example, if node 1 has an ε-transition to node 2, the ε-closure of
node 1 is the set {1,2}. The start state of the constructed DFA is labeled by the ε-closure
of the NFA start state. For every DFA state labeled by some set {s1, ..., sn} and for every
character c in the language alphabet, you find all the states reachable from s1, s2, ..., or sn
using c-arrows and you union together the ε-closures of these nodes. If this set is not the
label of any other node in the DFA constructed so far, you create a new DFA node with this
label. For example, a DFA node {1,2} would have an a-arrow to {3,4,5} if NFA node 3 can be
reached from 1 on a, and nodes 4 and 5 can be reached from 2. The b-arrow for node {1,2}
would then go to the error node, which is associated with an empty set of NFA nodes.

Converting NFAs to DFAs

To convert an NFA to a DFA, we must find a way to remove all ε-transitions and to ensure
that there is one transition per symbol in each state. We do this by constructing a DFA in
which each state corresponds to a set of some states from the NFA. In the DFA, transitions
from a state S by some symbol go to the state S' that consists of all the possible NFA states
that could be reached by that symbol from some NFA state q contained in the present DFA
state S. The resulting DFA "simulates" the given NFA in the sense that a single DFA transition
represents many simultaneous NFA transitions. The first concept we need is the ε-closure
(pronounced "epsilon closure"). The ε-closure of an NFA state q is the set containing q along
with all states in the automaton that are reachable by any number of ε-transitions from q.

Likewise, we can define the ε-closure of a set of states to be the states reachable by
ε-transitions from its members; in other words, this is the union of the ε-closures of its
elements. To convert our NFA to its DFA counterpart, we begin by taking the ε-closure of the
start state q0 of our NFA and constructing a new start state S0 in our DFA corresponding to
that ε-closure. Next, for each symbol in our alphabet, we record the set of NFA states that
we can reach from S0 on that symbol. For each such set, we make a DFA state corresponding
to its ε-closure, taking care to do this only once for each set. A sketch of this construction
in code follows.
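A compact C sketch of these two ingredients, packing NFA states 0..n-1 into a 64-bit set; the precomputed eps[] and move[][] edge tables are assumptions of the sketch:

#include <stdint.h>

typedef uint64_t set;   /* bit i set  <=>  NFA state i is in the set */

/* ε-closure: keep adding states reachable over ε-edges until
   nothing changes; eps[i] is the set of ε-successors of state i */
set eps_closure(set s, set eps[], int n) {
    set r = s, old;
    do {
        old = r;
        for (int i = 0; i < n; i++)
            if ((r >> i) & 1) r |= eps[i];
    } while (r != old);
    return r;
}

/* one DFA transition: union the c-moves of every NFA state in d,
   then take the ε-closure of the result */
set dfa_move(set d, int c, set move[][128], set eps[], int n) {
    set t = 0;
    for (int i = 0; i < n; i++)
        if ((d >> i) & 1) t |= move[i][c];
    return eps_closure(t, eps, n);
}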

Building the Syntax Tree
Example
Step 1: Augment the RE with an end marker: (a|b)*abb → (a|b)*abb #
(Positions are numbered left to right: a:1, b:2, a:3, b:4, b:5, #:6.)

Node followpos

1 {1, 2, 3}
2 {1, 2, 3}
3 {4}
4 {5}
5 {6}
6 -

Step 4

A = firstpos(n0) = {1,2,3}
Move[A,a] = {1,3}
  = followpos(1) U followpos(3) = {1,2,3,4} = B (new state)
Move[A,b] = {2}
  = followpos(2) = {1,2,3} = A
Move[B,a] = {1,3}
  = followpos(1) U followpos(3) = B
Move[B,b] = {2,4}
  = followpos(2) U followpos(4) = {1,2,3,5} = C
Move[C,a] = {1,3} = B
Move[C,b] = {2,5}
  = followpos(2) U followpos(5) = {1,2,3,6} = D
Move[D,a] = {1,3} = B
Move[D,b] = {2}
  = followpos(2) = {1,2,3} = A

Required DFA for (a|b)*abb#:

State   a   b
→A      B   A
B       B   C
C       B   D
*D      B   A

D is the accepting state, since it contains position 6 (the position of #).

1.9 A LANGUAGE FOR SPECIFYING LEXICAL ANALYZER
There is a wide range of tools for constructing lexical analyzers:
· Lex
· YACC
LEX
Lex is a computer program that generates lexical analyzers. Lex is commonly used
with the yacc parser generator.
Creating a lexical analyzer
First, a specification of a lexical analyzer is prepared by creating a program lex.l
in the Lex language.
Then, lex.l is run through the Lex compiler to produce a C program lex.yy.c.

Finally, lex.yy.c is run through the C compiler to produce an object program


a.out, which is the lexical analyzer that transforms an input stream into a
sequence of tokens.

lex.l → LEX Compiler → lex.yy.c
lex.yy.c → C Compiler → a.out
input stream → a.out → sequence of tokens

Lex Specification
A Lex program consists of three parts:
{ definitions }
%%
{ rules }
%%
{ user subroutines }

Definitions include declarations of variables, constants, and regular


definitions
Rules are statements of the form
p1 {action 1}
p2 {action 2}
...
pn {action n}

where pi is regular expression and action i describes what action the lexical
analyzer should take when pattern pi matches a lexeme.
Actions are written in C code.
User subroutines are auxiliary procedures needed by the actions. These
can be compiled separately and loaded with the lexical analyzer.
YACC- YET ANOTHER COMPILER-COMPILER
Yacc provides a general tool for describing the input to a computer program.
The Yacc user specifies the structures of his input, together with code to be
invoked as each such structure is recognized. Yacc turns such a specification
into a subroutine that handles the input process; frequently, it is convenient
and appropriate to have most of the flow of control in the user's application
handled by this subroutine.

In the case two sets are equal, we simply reuse the existing DFA state that we
already constructed. This process is then repeated for each of the new DFA states
(that is, set of NFA states) until we run out of DFA states to process. Finally, every
DFA state whose corresponding set of NFA states contains an accepting state is itself
marked as an accepting state.
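Continuing the earlier subset-construction sketch, the worklist driver below creates one DFA state per distinct set, reuses an existing state when the set is equal to one already constructed, and marks a DFA state accepting exactly when its set contains an NFA accepting state. It reuses eps_closure() and dfa_move() from the sketch above; MAXD and the 128-symbol alphabet are assumptions:

#define MAXD 256    /* maximum number of DFA states in the sketch */

/* returns the number of DFA states; dstates[i] is the NFA-state set
   of DFA state i, trans[i][c] its successor set, accept[i] its flag */
int build_dfa(set start, set nfa_accept,
              set move[][128], set eps[], int n,
              set dstates[MAXD], set trans[MAXD][128], int accept[MAXD]) {
    int nd = 0, done = 0;
    dstates[nd++] = eps_closure(start, eps, n);
    while (done < nd) {                  /* process each unprocessed state */
        set d = dstates[done];
        for (int c = 0; c < 128; c++) {
            set t = dfa_move(d, c, move, eps, n);
            int j;
            for (j = 0; j < nd; j++)     /* set already seen? reuse it */
                if (dstates[j] == t) break;
            if (j == nd && nd < MAXD)
                dstates[nd++] = t;       /* otherwise create a new state */
            trans[done][c] = t;
        }
        /* accepting iff the set contains an NFA accepting state */
        accept[done] = (d & nfa_accept) != 0;
        done++;
    }
    return nd;
}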
The Lexical-Analyzer Generator Lex
Lexical Analyzer tool is called Lex, or in a more recent implementation Flex, that
allows one to specify a lexical analyzer by specifying regular expressions to describe
patterns for tokens.
The input notation for the Lex tool is referred to as the Lex language and the tool
itself is the Lex compiler.
Behind the scenes, the Lex compiler transforms the input patterns into a transition
diagram and generates code, in a file called lex.yy.c, that simulates this
transition diagram.
Use of Lex
Figure 3.22 suggests how Lex is used. An input file, which we call lex.l, is written in
the Lex language and describes the lexical analyzer to be generated. The Lex
compiler transforms lex.l to a C program, in a file that is always named lex.yy.c.
The latter file is compiled by the C compiler into a file called a.out. The C-compiler
output is a working lexical analyzer that can take a stream of input characters and
produce a stream of tokens.
The normal use of the compiled C program, referred to as a.out in Fig. 3.22, is as a
subroutine of the parser. It is a C function that returns an integer, which is a code
for one of the possible token names. The attribute value, whether it be another
numeric code, a pointer to the symbol table, or nothing, is placed in a global variable
yylval, which is shared between the lexical analyzer and parser, thereby making it
simple to return both the name and an attribute value of a token.

Structure of Lex Programs
A Lex program has the following form:
declarations
%%
translation rules
%%
auxiliary functions
· The declarations section includes declarations of variables, manifest constants
(identifiers declared to stand for a constant, e.g., the name of a token), and regular
definitions.
· The translation rules each have the form
Pattern { Action }
· Each pattern is a regular expression, which may use the regular definitions of
the declaration section. The actions are fragments of code, typically written in C,
although many variants of Lex using other languages have been created.
· The third section holds whatever additional functions are used in the actions.
· Alternatively, these functions can be compiled separately and loaded with the
lexical analyzer.
· When called by the parser, the lexical analyzer begins reading its remaining
input, one character at a time, until it finds the longest prefix of the input that
matches one of the patterns Pi. It then executes the associated action Ai. Typically,
Ai will return to the parser, but if it does not (e.g., because Pi describes whitespace
or comments), then the lexical analyzer proceeds to find additional lexemes, until
one of the corresponding actions causes a return to the parser.

The lexical analyzer returns a single value, the token name, to the parser, but uses the
shared, integer variable yylval to pass additional information about the lexeme found, if
needed.
Example: Figure 3.23 is a Lex program that recognizes the tokens of Fig. 3.12 and returns
the token found.

Declarations section:
In the declarations section we see a pair of special brackets, %{ and %}. Anything
within these brackets is copied directly to the file lex.yy.c, and is not treated as a regular
definition. It is common to place there the definitions of the manifest constants, using C
#define statements to associate unique integer codes with each of the manifest constants.

Also in the declarations section is a sequence of regular definitions. Regular definitions


that are used in later definitions or in the patterns of the translation rules are surrounded
by curly braces. Thus, for instance, delim is defined to be a shorthand for the character
class consisting of the blank, the tab, and the newline; the latter two are represented, as in
all UNIX commands, by backslash followed by t or n, respectively. Then, ws is defined to be
one or more delimiters, by the regular expression {delim}+.

Notice that in the definition of id and number, parentheses are used as grouping
metasymbols and do not stand for themselves. In contrast, E in the definition of number
stands for itself. If we wish to use one of the Lex metasymbols, such as any of the
parentheses, +, *, or ?, to stand for themselves, we may precede them with a backslash.
For instance, we see \. in the definition of number, to represent the dot, since that character
is a metasymbol representing "any character," as usual in UNIX regular expressions.

Auxiliary-function section:
In the auxiliary-function section, we see two such functions, installID() and
installNum(). Like the portion of the declaration section that appears between %{ ... %},
everything in the auxiliary section is copied directly to file lex.yy.c, but may be used in
the actions.

Translation rules:
Finally, let us examine some of the patterns and rules in the middle section of
Fig. 3.23. First, ws, an identifier declared in the first section, has an associated
empty action. If we find whitespace, we do not return to the parser, but look for
another lexeme.
The second token has the simple regular expression pattern if. Should we see the
two letters if on the input, and they are not followed by another letter or digit
(which would cause the lexical analyzer to find a longer prefix of the input matching
the pattern for id), then the lexical analyzer consumes these two letters from the
input and returns the token name IF, that is, the integer for which the manifest
constant IF stands. Keywords then and else are treated similarly.
%{
/* definitions of manifest constants
#define LT 260
#define LE 261
EQ, NE, GT, GE, IF, THEN, ELSE, ID, NUMBER, RELOP */
%}
/* regular definitions */
delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
number   {digit}+(\.{digit}+)?(E[+-]?{digit}+)?
%%
{ws}     {/* no action and no return */}
if       {return(IF);}
then     {return(THEN);}
else     {return(ELSE);}
{id}     {yylval = (int) installID(); return(ID);}
{number} {yylval = (int) installNum(); return(NUMBER);}
"<"      {yylval = LT; return(RELOP);}
"<="     {yylval = LE; return(RELOP);}
"="      {yylval = EQ; return(RELOP);}
"<>"     {yylval = NE; return(RELOP);}
">"      {yylval = GT; return(RELOP);}
">="     {yylval = GE; return(RELOP);}
%%

int installID() {/* function to install the lexeme, whose first
character is pointed to by yytext, and whose length is yyleng, into
the symbol table, and return a pointer thereto */
}
int installNum() {/* similar to installID, but puts numerical
constants into a separate table */
}
Figure 3.23: Lex program for the tokens of Fig. 3.12
The fifth token has the pattern defined by id. Note that, although keywords like if
match this pattern as well as an earlier pattern, Lex chooses whichever pattern is
listed first in situations where the longest matching prefix matches two or more
patterns. The action taken when id is matched is threefold:
1. Function installID() is called to place the lexeme found in the symbol
table.
2. This function returns a pointer to the symbol table, which is placed in the global
variable yylval, where it can be used by the parser or a later component of the
compiler. Note that installID() has available to it two variables that are set
automatically by the lexical analyzer that Lex generates:
(a) yytext is a pointer to the beginning of the lexeme, analogous to lexeme_beginning
in Fig. 3.3.
(b) yyleng is the length of the lexeme found.
3. The token name ID is returned to the parser. The action taken when a lexeme
matching the pattern number is similar, using the auxiliary function installNum().
Conflict Resolution in Lex
Two rules that Lex uses to decide on the proper lexeme to select, when several
prefixes of the input match one or more patterns:
1. Always prefer a longer prefix to a shorter prefix.
2. If the longest possible prefix matches two or more patterns, prefer the
pattern listed first in the Lex program.

Example:
• The first rule tells us to continue reading letters and digits to find the longest
prefix of these characters to group as an identifier. It also tells us to treat <=
as a single lexeme, rather than selecting < as one lexeme and = as the next
lexeme.
• The second rule makes keywords reserved, if we list the keywords before id
in the program.
For instance, if then is determined to be the longest prefix of the input that
matches any pattern, and the pattern then precedes {id}, as it does in Fig. 3.23,
then the token THEN is returned, rather than ID.
The Lookahead Operator
· Lex automatically reads one character ahead of the last character that
forms the selected lexeme, and then retracts the input so only the lexeme
itself is consumed from the input.
· When we want a certain pattern to be matched to the input only when
it is followed by a certain other characters. If so, we may use the slash in a
pattern to indicate the end of the part of the pattern that matches the
lexeme. What follows / is additional pattern that must be matched before we
can decide that the token in question was seen, but what matches this
second pattern is not part of the lexeme.

Example : In Fortran and some other languages, keywords are not reserved. That
situation creates problems, such as a statement
IF(I,J) = 3
where IF is the name of an array, not a keyword. This statement contrasts with
statements of the form
IF( condition ) THEN ...
where IF is a keyword.
To recognize the keyword IF, note that it is always followed by a left parenthesis, some
text (the condition, which may contain parentheses), a right parenthesis, and a letter.
Then, we could write a Lex rule for the keyword IF like:
IF / \( .* \) {letter}
This rule says that the pattern the lexeme matches is just the two letters IF. The
slash says that additional pattern follows but does not match the lexeme.
In this pattern, the first character is the left parenthesis. Since that character is a
Lex metasymbol, it must be preceded by a backslash to indicate that it has its literal
meaning. The dot and star match "any string without a newline". Note that the dot
is a Lex metasymbol meaning "any character except newline". It is followed by a
right parenthesis, again with a backslash to give that character its literal meaning.
The additional pattern is followed by the symbol letter, which is a regular definition
representing the character class of all letters.
For instance, suppose this pattern is asked to match a prefix of the input:
IF(A<(B+C)*D)THEN...
The first two characters match IF, the next character matches \(, the next nine
characters match .*, and the next two match \) and letter. Note that the fact that the
first right parenthesis (after C) is not followed by a letter is irrelevant; we only need
to find some way of matching the input to the pattern. We conclude that the letters
IF constitute the lexeme, and they are an instance of token if.

Conversion of NFA with ε to DFA

Step 1:
Consider M = (Q, Σ, δ, q0, F), an NFA with ε. We have to convert this NFA with ε to an
equivalent DFA, denoted by MD = (QD, Σ, δD, q0D, FD).

Then obtain ε-closure(q0) = {p1, p2, ..., pn}; [p1, p2, ..., pn] becomes the start
state of the DFA, and [p1, p2, ..., pn] ∈ QD.
Step 2:
We obtain the δ transitions on [p1, p2, ..., pn] for each input:
δD([p1, p2, ..., pn], a) = ε-closure(δ(p1, a) ∪ δ(p2, a) ∪ ... ∪ δ(pn, a))

                        = ⋃ (i = 1 to n) ε-closure(δ(pi, a)), where 'a' is an input symbol in Σ.

Step 3:
Each state [p1, p2, ..., pn] obtained belongs to QD. A state that contains a final
state pi of the NFA is a final state of the DFA.

Theorem:
A language L is accepted by some ε-NFA if and only if (iff) L is accepted by some
DFA.

Proof:
Suppose L = L(D) for some DFA.
Convert D into an ε-NFA E by adding the transitions δ(q, ε) = ∅ for all states q of D.
We must also convert the transitions of D on input symbols.
Example:
δD(p, a) = p becomes the NFA transition to the set containing only p,
i.e. δE(p, a) = {p}.

Thus the transitions of E and D are the same, but E explicitly states that there are no
transitions out of any state on ε.
Let E = (QE, Σ, δE, q0, FE) be an ε-NFA. Apply the modified subset construction
described above to produce the DFA

D = (QD, Σ, δD, qD, FD)

We need to show that L(D) = L(E), and we do so by showing that the extended
transition functions of E and D are the same.

Formally:
δ'E(q0, w) = δ'D(qD, w), by induction on the length of w.

Basis:
If |w| = 0, then w = ε.
We know δ'E(q0, ε) = ECLOSE(q0).
We also know that qD = ECLOSE(q0), because this is how the start state of D is
defined.
For a DFA, we know that δ'(p, ε) = p for any state p, so in particular
δ'D(qD, ε) = ECLOSE(q0).
We have thus proved that δ'E(q0, ε) = δ'D(qD, ε).

Induction:
Suppose w = xa, where a is the final symbol of w, and assume that the statement
holds for x,
i.e. δ'E(q0, x) = δ'D(qD, x).
Let both these sets of states be {p1, p2, ..., pk}.
By the definition of δ' for ε-NFAs, we compute δ'E(q0, w) as follows:
let {r1, r2, ..., rm} be ⋃ (i = 1 to k) δE(pi, a).
Then δ'E(q0, w) = ECLOSE({r1, r2, ..., rm}).

If we examine the construction of DFA D in the modified subset construction, we

see that δD({p1, p2, ..., pk}, a) is the same set as δ'E(q0, w).
We have now proved that δ'E(q0, w) = δ'D(qD, w) and completed the inductive
part.

Theorem:
If L is accepted by an NFA with ε-transitions, then L is accepted by an NFA without
ε-transitions. That is, L(M) = L(M').

Proof:

Let M = (Q, Σ, δ, q0, F) be an NFA with ε-transitions.

Construct M', which is an NFA without ε-transitions: M' = (Q, Σ, δ', q0, F'), where

F' = F ∪ {q0}, if ε-closure(q0) contains a state of F
F' = F, otherwise

The proof is by induction on |w|, comparing the extended transition functions of M and M'.
Extended transition function

DFA:
Basis: |w| = 0
δ̂(q, ε) = q
If we are in a state q and read no input, then we are still in state q.
Induction: |w| ≥ 1
Suppose w is a string of the form xa.
Then: δ̂(q, w) = δ̂(q, xa)
            = δ(δ̂(q, x), a)
            = δ(p, a)        (letting δ̂(q, x) = p)
            = r
NFA:
Basis: δ̂(q, ε) = {q}
If we are in a state q and read no input, then we are still in state q.
Induction:
Suppose w is a string of the form xa.
Then: δ̂(q, w) = δ(δ̂(q, x), a)
            = δ({p1, p2, ..., pk}, a)      (letting δ̂(q, x) = {p1, p2, ..., pk})

            = ⋃ δ(pi, a)

            = {r1, r2, ..., rm}
NFA-ε:
Basis: δ̂(q, ε) = ε-closure(q)
If we are in a state q and read no input, then we may be in any state of ε-closure(q).
Induction:
Suppose w is a string of the form xa.
Then: δ̂(q, w) = ε-closure(δ(δ̂(q, x), a))
            = ε-closure(δ({p1, p2, ..., pk}, a))      (letting δ̂(q, x) = {p1, p2, ..., pk})

            = ε-closure(⋃ δ(pi, a))

            = ε-closure({r1, r2, ..., rm})
1. a. Convert the given NFA to DFA.

Solution:
The initial state of the given NFA is {q0}.
Since the NFA is equivalent to a DFA,
let the initial state of the DFA be [q0] ------------- A

Now we obtain the δ transitions for state A:

δ'(A, 0) = δ'({q0}, 0) = {q0} = [q0] -------- A
δ'(A, 1) = δ'({q0}, 1) = {q1} = [q1] -------- B (new state generated)

The δ transitions for state B are obtained as:

δ'(B, 0) = δ'({q1}, 0) = {q1, q2} = [q1, q2] -------- C
δ'(B, 1) = δ'({q1}, 1) = {q1} = [q1] -------- B

Now we obtain the δ' transitions on C:


δ’(C, 0) =δ’({q1, q2}, 0)
= δ(q1, 0) ∪ δ(q2, 0)
= {q1, q2} ∪ {q2}
= [q1, q2] -------- C

δ’(C, 1) =δ’({q1, q2}, 1)


= δ(q1, 1) ∪ δ(q2, 1)
= {q1} ∪ {q1, q2}
= {q1, q2}
= [q1, q2] -------- C
Resultant DFA

DFA - Transition table

State   0   1
→A      A   B
B       C   B
*C      C   C

DFA - Transition diagram


Convert the following NFA-ε into its equivalent NFA.

a)

Note:
• No change in the initial state
• No change in the total number of states
• There may be a change in the final states
• Changes in the transitions

Solution:
ε-Closure(q0)={q0,q1}
ε-Closure(q1)={q1}

Initial state of given nfa-ε is {q0}


Now, initial state of nfa will be {q0}

Now find the transition of q0 on inputs 0,1


δ(q0,0) = ε-Closure(δ(ε-Closure(q0),0))
        = ε-Closure(δ({q0,q1},0))
        = ε-Closure(q0)
        = {q0,q1}
δ(q0,1) = ε-Closure(δ(ε-Closure(q0),1))
        = ε-Closure(δ({q0,q1},1))
        = ε-Closure(q1)
        = {q1}
Now find the transitions of q1 on inputs 0, 1:
δ(q1,0) = ε-Closure(δ(ε-Closure(q1),0))
        = ε-Closure(δ({q1},0))
        = ε-Closure(Φ)
        = Φ
δ(q1,1) = ε-Closure(δ(ε-Closure(q1),1))
        = ε-Closure(δ({q1},1))
        = ε-Closure(q1)
        = {q1}
Final States
Both q0 and q1 are final states, because the final state q1 of the given NFA-ε lies in both:
ε-Closure(q0) = {q0,q1}
ε-Closure(q1) = {q1}
Resultant NFA:

Transition table

State Input symbol

0 1

*q0 {q0,q1} {q1}

* q1 Φ {q1}

Transition diagram
Conversion of NFA-ε into its equivalent NFA.

Convert the following NFA-ε into its equivalent NFA.

b)

Solution:
ε-Closure(A)={A,B}
ε-Closure(B)={B,D}
ε-Closure(C)={C}
ε-Closure(D)={D}

Initial state of given nfa-ε is {A}


Now, initial state of nfa will be {A}

Now find the transition of A on inputs 0,1


δ(A,0)=ε-Closure(δ(ε-Closure(A),0))
=ε-Closure(δ({A,B},0))
=ε-Closure(δ(A,0)U δ(B,0))
= ε-Closure(A,C)
={A,B,C}
δ(A,1)=ε-Closure(δ(ε-Closure(A),1))
=ε-Closure(δ({A,B},1))
=ε-Closure(δ(A,1)U δ(B,1))
=ε-Closure(∅)
= ∅
Now find the transition of B on inputs 0,1
δ(B,0)=ε-Closure(δ(ε-Closure(B),0))
=ε-Closure(δ({B,D},0))
=ε-Closure(δ(B,0)U δ(D,0))
= ε-Closure(C,D)
={C,D}
δ(B,1)=ε-Closure(δ(ε-Closure(B),1))
=ε-Closure(δ({B,D},1))
=ε-Closure(δ(B,1)U δ(D,1))
=ε-Closure(D)
={D}
Now find the transition of C on inputs 0,1
δ(C,0)=ε-Closure(δ(ε-Closure(C),0))
=ε-Closure(δ(C,0))
= ε-Closure(∅)
= ∅
δ(C,1)=ε-Closure(δ(ε-Closure(C),1))
=ε-Closure(δ(C,1))
=ε-Closure(∅)
= ∅
Now find the transition of D on inputs 0,1
δ(D,0)=ε-Closure(δ(ε-Closure(D),0))
=ε-Closure(δ(D,0))
= ε-Closure(D)
={D}
δ(D,1)=ε-Closure(δ(ε-Closure(D),1))
=ε-Closure(δ(D,1))
=ε-Closure(D)
={D}
Resultant NFA:

Transition table
State Input symbol
0 1
->A {A,B,C} Φ

*B {C,D} {D}

C Φ Φ

*D {D} {D}

Note:
B & D are final states because final state of given NFA-ε D lies in both
ε-Closure(B)={B,D}
ε-Closure(D)={D}
Conversion of NFA-ε to DFA

Convert the following NFA-ε into its equivalent DFA.


a)

Solution:

Let us obtain the ε-closure of each state.


ε-closure(q0) = {q0, q1, q2}
ε-closure(q1) = {q1, q2}
ε-closure(q2) = {q2}

Let the initial state of DFA be ε-closure({q0})


= {q0, q1, q2}
= [q0, q1, q2] ----- A
Transitions of A on inputs 0,1,2

δ'(A, 0) = ε-closure(δ((q0, q1, q2), 0))


= ε-closure(δ(q0, 0) ∪ δ(q1, 0) ∪ δ(q2, 0))
= ε-closure({q0})
= {q0, q1, q2}
= [q0, q1, q2] ----- A

δ'(A, 1) = ε-closure(δ((q0, q1, q2), 1))


= ε-closure(δ(q0, 1) ∪ δ(q1, 1) ∪ δ(q2, 1))
= ε-closure({q1})
= {q1, q2}
= [q1, q2] ------ B

δ'(A, 2) = ε-closure(δ((q0, q1, q2), 2))


= ε-closure(δ(q0, 2) ∪ δ(q1, 2) ∪ δ(q2, 2))
= ε-closure({q2})
= {q2}
= [q2] ------C
Transitions of B on inputs 0,1,2

δ'(B, 0) = ε-closure{δ((q1, q2), 0)}


= ε-closure{δ(q1, 0) ∪ δ(q2, 0)}
= ε-closure{∅}
= ∅ ------- ∅

δ'(B, 1) = ε-closure{δ((q1, q2), 1)}


= ε-closure{δ(q1, 1) ∪ δ(q2, 1)}
= ε-closure{q1}
= {q1, q2}
= [q1, q2] ------- B

δ'(B, 2) = ε-closure{δ((q1, q2), 2)}


= ε-closure{δ(q1, 2) ∪ δ(q2, 2)}
= ε-closure{q2}
= {q2}
= [q2] ------- C

Transitions of C on inputs 0,1,2

δ'(C, 0) = ε-closure{δ(q2, 0)}


= ε-closure{∅ }
= ∅ ------- ∅

δ'(C, 1) = ε-closure{δ(q2, 1)}


= ε-closure{∅ }
= ∅------- ∅

δ'(C, 2) = ε-closure{δ(q2, 2)}


= {q2}
= [q2] ------- C
Resultant DFA :

States     0    1    2
→*A        A    B    C
*B         ∅    B    C
*C         ∅    ∅    C

Note:

Final state of the given NFA-ε is {q2}

A = {q0, q1, q2} B = {q1, q2} C = {q2}

Since q2 lies in the DFA states A, B and C, all three become final states
of the resultant DFA.
Convert the following NFA-ε into its equivalent DFA.

b)
States     0       1      ε
→A         {B}     {A}    {B}
B          ∅       {B}    {C}
*C         {C}     {C}    ∅

Solution:

Let us obtain the ε-closure of each state.


ε-closure(A) = {A,B,C}
ε-closure(B) = {B,C}
ε-closure(C) = {C}

Let the initial state of DFA be ε-closure({A})


= {A,B,C}
= [A,B,C] ----- X
Transitions of X on inputs 0,1
δ’(X, 0)= ε-closure{δ(X, 0)}
= ε-closure{δ((A,B,C), 0)}
= {B,C}
= [B,C] -------- Y

δ’(X, 1) =ε-closure{δ(X, 1)}


=ε-closure{δ((A,B,C), 1)}
={A,B,C}
= [A,B,C] ------- X
Transitions of Y on inputs 0,1
δ’(Y, 0) =ε-closure{δ(Y, 0)}

=ε-closure{δ((B,C), 0)}
= {C}
= [C] ---------Z

δ’(Y, 1) =ε-closure{δ(Y, 1)}


=ε-closure{δ((B,C), 1)}

= {B,C}
=[B,C] ------Y

Transitions of Z on inputs 0,1

δ’(Z, 0)=ε-closure{δ(Z, 0)}


=ε-closure{δ(C, 0)}
= {C}

= [C] -----------Z

δ’(Z, 1)= ε-closure{δ(Z, 1)}


= ε-closure{δ((C), 1)}
= {C}

= [C] ----------Z

Resultant DFA:

States    0    1
→*X       Y    X
*Y        Z    Y
*Z        Z    Z
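Both ε-NFA-to-DFA conversions above follow one pattern: start from the
ε-closure of the initial state and, on each symbol, take the ε-closure of the
union of moves. Below is a Python sketch combining the epsilon_closure helper
from the earlier example with the subset construction; the encoding of example
(b)'s table at the end is our own.

from collections import deque

def enfa_to_dfa(delta, eps, start, finals, alphabet):
    # Subset construction for an ε-NFA. `delta` holds the non-ε moves,
    # (state, symbol) -> set of states; `eps` holds the ε-edges.
    start_set = frozenset(epsilon_closure({start}, eps))
    dfa, seen = {}, {start_set}
    work = deque([start_set])
    while work:
        S = work.popleft()
        for a in alphabet:
            move = set()
            for q in S:
                move |= delta.get((q, a), set())
            T = frozenset(epsilon_closure(move, eps))
            dfa[(S, a)] = T
            if T not in seen:
                seen.add(T)
                work.append(T)
    dfa_finals = {S for S in seen if S & finals}
    return dfa, start_set, dfa_finals

# Example (b) above:
delta = {('A', '0'): {'B'}, ('A', '1'): {'A'},
         ('B', '1'): {'B'},
         ('C', '0'): {'C'}, ('C', '1'): {'C'}}
eps = {'A': {'B'}, 'B': {'C'}}
dfa, start, finals = enfa_to_dfa(delta, eps, 'A', {'C'}, '01')
# start == frozenset({'A','B','C'}) (state X); every reachable subset
# contains C, so all three DFA states are final, as in the table above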
1.10 MINIMIZATION OF DFA

• For storage efficiency, it is desirable to reduce the number of states as far as
possible.
• One method for reducing the states of a DFA is based on finding and combining
indistinguishable (equivalent) states.

Two states p and q of a DFA are called indistinguishable/equivalent


if δ∗(p, w) ∈F implies δ∗(q, w) ∈F, and
δ∗(p, w) ∉F implies δ∗(q, w) ∉F, for all w ∈Σ∗

If, on the other hand, there exists some string w ∈Σ∗ such that
δ∗(p, w) ∈F and
δ∗(q, w) ∉F, or vice versa,
then the states p and q are said to be distinguishable by a string w.

Algorithm
Step 1 − All the states Q are divided into two partitions, final states and non-final
states, denoted by P0. All the states within a partition are 0-equivalent. Take a
counter k and initialize it with 0.

Step 2 − Increment k by 1. Split each set in Pk-1 wherever its states are
k-distinguishable: two states X and Y within a set are k-distinguishable if there
is an input S such that δ(X, S) and δ(Y, S) are (k-1)-distinguishable. The
resulting partition is Pk.

Step 3 − If Pk ≠ Pk-1, repeat Step 2, otherwise go to Step 4.

Step 4 − Combine kth equivalent sets and make them the new states of the
reduced DFA.
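The steps above translate almost directly into code. A minimal Python sketch of
the partition refinement (our illustration; it assumes a total transition
function over reachable states):

def minimize_dfa(states, alphabet, delta, finals):
    # Step 1: P0 = { final states, non-final states }
    partition = [set(finals), set(states) - set(finals)]
    partition = [blk for blk in partition if blk]
    while True:
        def block_of(q):
            return next(i for i, blk in enumerate(partition) if q in blk)
        new_partition = []
        for blk in partition:
            # Step 2: split a block when two of its states move to
            # different blocks on some input (k-distinguishable)
            groups = {}
            for q in blk:
                key = tuple(block_of(delta[(q, a)]) for a in alphabet)
                groups.setdefault(key, set()).add(q)
            new_partition.extend(groups.values())
        if len(new_partition) == len(partition):   # Step 3: Pk equals Pk-1
            return new_partition                   # Step 4: blocks become the new states
        partition = new_partition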
Problem 1: Find the minimized DFA for the given DFA:

State 0 1
-> q0 q1 q5
q1 q6 q2
* q2 q0 q2
q3 q2 q6
q4 q7 q5
q5 q2 q6
q6 q6 q4
q7 q6 q2
Solution :
Partition the states into final and non-final
Q1={q2} Q2={q0,q1,q3,q4,q5,q6,q7}
Now check for equivalent states by comparing the states with each other.

• Now compare the transitions of q0 with the others, excluding q2

δ( q0, 0 ) = q1    δ( q0, 1 ) = q5
δ( q1, 0 ) = q6    δ( q1, 1 ) = q2

On 0, q1 and q6 are in the same group (Q2); on 1, q5 ∈ Q2 but q2 ∈ Q1.

So q0, q1 are distinguishable states.

• Now compare the transitions of q0 with q3

δ( q0, 0 ) = q1    δ( q0, 1 ) = q5
δ( q3, 0 ) = q2    δ( q3, 1 ) = q6

On 0, q1 ∈ Q2 but q2 ∈ Q1; on 1, q5 and q6 are in the same group (Q2).

So q0, q3 are distinguishable states.


• Now compare the transitions of q0 with q4
δ( q0, 0 ) = q1    δ( q0, 1 ) = q5
δ( q4, 0 ) = q7    δ( q4, 1 ) = q5

On 0, q1 and q7 are in the same group (Q2); on 1, both go to q5 ∈ Q2.

So q0, q4 are equivalent states.

• Now compare the transitions of q0 with q5

δ( q0, 0 ) = q1    δ( q0, 1 ) = q5
δ( q5, 0 ) = q2    δ( q5, 1 ) = q6

On 0, q1 ∈ Q2 but q2 ∈ Q1; on 1, q5 and q6 are in the same group (Q2).

So q0, q5 are distinguishable states.

• Now compare the transitions of q0 with q6

δ( q0, 0 ) = q1    δ( q0, 1 ) = q5
δ( q6, 0 ) = q6    δ( q6, 1 ) = q4

On both inputs the targets lie in the same group (Q2).

So q0, q6 are equivalent states.

• Now compare the transitions of q0 with q7

δ( q0, 0 ) = q1    δ( q0, 1 ) = q5
δ( q7, 0 ) = q6    δ( q7, 1 ) = q2

On 0, q1 and q6 are in the same group (Q2); on 1, q5 ∈ Q2 but q2 ∈ Q1.

So q0, q7 are distinguishable states.

Therefore: Q1={q2} Q2= {q0,q4,q6} Q3={q1,q3,q5,q7}


Then check for equivalent states in Q3 = {q1,q3,q5,q7}; compare the transitions
of q1 with the remaining states q3, q5 and q7.
• Now compare the transitions of q1 with q3
δ( q1, 0 ) = q6    δ( q1, 1 ) = q2
δ( q3, 0 ) = q2    δ( q3, 1 ) = q6

On 0, q6 ∈ Q2 but q2 ∈ Q1; on 1, q2 ∈ Q1 but q6 ∈ Q2 (different groups on both inputs).

So q1, q3 are distinguishable states.

• Now compare the transitions of q1 with q5

δ( q1, 0 ) = q6    δ( q1, 1 ) = q2
δ( q5, 0 ) = q2    δ( q5, 1 ) = q6

On 0, q6 ∈ Q2 but q2 ∈ Q1; on 1, q2 ∈ Q1 but q6 ∈ Q2 (different groups on both inputs).

So q1, q5 are distinguishable states.

• Now compare the transitions of q1 with q7

δ( q1, 0 ) = q6    δ( q1, 1 ) = q2
δ( q7, 0 ) = q6    δ( q7, 1 ) = q2

The targets agree on both inputs (same groups).

So q1, q7 are equivalent states.

Therefore , Q1={q2} Q2= {q0,q4,q6} Q3={q1,q7} Q4= {q3,q5}


Then check for equivalent states in Q4 by comparing the transitions of q3 and q5.
• Now compare the transitions of q3 with q5
δ( q3, 0 ) = q2    δ( q3, 1 ) = q6
δ( q5, 0 ) = q2    δ( q5, 1 ) = q6

The targets agree on both inputs (same groups).

So q3, q5 are equivalent states.


Therefore : Q1={q2} Q2={q0,q4,q6} Q3={q1,q7} Q4= {q3,q5}
Now check whether Q2 can still be partitioned; compare q0 with q4 and q6.
• Now compare the transitions of q0 with q4
δ( q0, 0 ) = q1    δ( q0, 1 ) = q5
δ( q4, 0 ) = q7    δ( q4, 1 ) = q5

On 0, q1 and q7 are in the same group (Q3); on 1, both go to q5 ∈ Q4.

So q0, q4 are equivalent states.


• Now compare the transitions of q0 with q6
δ( q0, 0 ) = q1    δ( q0, 1 ) = q5
δ( q6, 0 ) = q6    δ( q6, 1 ) = q4

On 0, q1 ∈ Q3 but q6 ∈ Q2; on 1, q5 ∈ Q4 but q4 ∈ Q2 (different groups).

So q0, q6 are distinguishable states.


Therefore: Q1=[q2] Q2=[q0,q4] Q3=[q1,q7] Q4= [q3,q5] Q5=[q6]
The equivalence classes are
Q1=[q2] Q2=[q0,q4] Q3=[q1,q7] Q4=[q3,q5] Q5=[q6]
The minimized DFA M is defined by the 5-tuple M = (Q, Σ, δ, Q2, Q1), where
Q = {Q1, Q2, Q3, Q4, Q5}
Σ = {0, 1}
δ is defined in the table below
Q2 is the initial state (it contains the original initial state q0)
Q1 is the final state (it contains the original final state q2)
After merging equivalent states and removing duplicate rows, the DFA becomes:

State        0          1
→[q0,q4]     [q1,q7]    [q3,q5]
[q1,q7]      [q6]       [q2]
*[q2]        [q0,q4]    [q2]
[q3,q5]      [q2]       [q6]
[q6]         [q6]       [q0,q4]

Since Q1=[q2] Q2=[q0,q4] Q3=[q1,q7] Q4= [q3,q5] Q5=[q6]


Then the minimized DFA is:

State    0     1
→Q2      Q3    Q4
Q3       Q5    Q1
*Q1      Q2    Q1
Q4       Q1    Q5
Q5       Q5    Q2
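As a cross-check, feeding the transition table of Problem 1 to the minimize_dfa
sketch above (with our own string encoding of the states) reproduces the same
five blocks:

delta = {('q0', '0'): 'q1', ('q0', '1'): 'q5',
         ('q1', '0'): 'q6', ('q1', '1'): 'q2',
         ('q2', '0'): 'q0', ('q2', '1'): 'q2',
         ('q3', '0'): 'q2', ('q3', '1'): 'q6',
         ('q4', '0'): 'q7', ('q4', '1'): 'q5',
         ('q5', '0'): 'q2', ('q5', '1'): 'q6',
         ('q6', '0'): 'q6', ('q6', '1'): 'q4',
         ('q7', '0'): 'q6', ('q7', '1'): 'q2'}
states = {'q0', 'q1', 'q2', 'q3', 'q4', 'q5', 'q6', 'q7'}
blocks = minimize_dfa(states, '01', delta, {'q2'})
# blocks: {q2}, {q0,q4}, {q1,q7}, {q3,q5}, {q6}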
Problem 2: Find the minimized DFA for the given DFA.

Solution:
Partition the states into final and non-final:
one set will contain q1, q2, q4, the final states of the DFA;
the other set will contain the remaining states.

So P0 = { { q1, q2, q4 }, { q0, q3, q5 } }


• Check for equivalence in { q0, q3, q5 } :
• Now check { q0, q3 }:
δ( q0, 0 ) = q3 and δ( q0, 1 ) = q1
δ( q3, 0 ) = q0 and δ( q3, 1 ) = q4
On both inputs the targets fall in the same blocks, so q0, q3 are equivalent states.
• Now check { q0, q5 }:
δ( q0, 0 ) = q3 and δ( q0, 1 ) = q1
δ( q5, 0 ) = q5 and δ( q5, 1 ) = q5
On input 1, q1 is final but q5 is not, so q0 and q5 are distinguishable.
So, set { q0, q3, q5 } will be partitioned into { q0, q3 } and { q5 }.
So, Q = { { q1, q2, q4 }, { q0, q3}, { q5 } }

Hence Minimized DFA is:


9. ASSIGNMENT : UNIT – I

S.No   Assignment Questions (all map to CO1)                         K Level
1 Explain in detail the phases of a compiler. K2

2 Find the minimized DFA for the regular expression K3


ab*|ab.
3 Compare the number of states of DFA and minimized K4
DFA using Thompson Construction (a|b)*abb(a|b)*.
4 Prove the following regular expressions are equivalent by K5
constructing minimized DFAs: (a*|b*)* and (a|b)*.
5 Recall the various buffering methods in Lexical analyzer. K1

6 Write an algorithm for minimization of DFA. Give an K2
example.
7 Develop a lexical analyzer to recognize few patterns in K6
C.(Ex. Identifiers, constants, comments, operators
etc.). Create a symbol table, while recognizing
identifiers.
8 Organize the phases into different groups based on K3
the task.

10. PART A : Q & A : UNIT – I
SNo Questions and Answers CO K
What are the two parts of a compilation? Explain
briefly.
Analysis and Synthesis are the two parts of compilation.
(Front-end & Back-end)
1 ● The analysis part breaks up the source program K1
into constituent pieces and creates an
intermediate representation of the source program.
● The synthesis part constructs the desired target
program from the intermediate representation.
List the various compiler construction tools.
● Parser generators
● Scanner Generator
2 ● Syntax-directed translation engines K2
● Automatic code generators
● Dataflow Engines
● Compiler construction tool kits
Differentiate compiler and interpreter.
● The machine-language target program produced
by a compiler is usually much faster than an
3 interpreter at mapping inputs to outputs. K1
● An interpreter, however, can usually give better CO1
error diagnostics than a compiler, because it executes
the source program statement by statement.
Define tokens, Patterns, and lexemes.
● Tokens- Sequence of characters that have a collective
meaning. A token of a language is a category of its
lexemes.
● Patterns- There is a set of strings in the input for
4 K2
which the same token is produced as output. This
set of strings is described by a rule called a
pattern associated with the token.
● Lexeme- A sequence of characters in the source
program that is matched by the pattern for a token.
Describe the possible error recovery actions in lexical
analyzer.
·Panic mode recovery
·Deleting an extraneous character
5 K1
·Inserting a missing character
·Replacing an incorrect character by a correct character
·Transposing two adjacent characters

Write the regular expression for identifier.
letter -> A | B | ……| Z | a | b | …..| z
6 K2
digit -> 0 | 1 |…….|9
id -> letter (letter | digit ) *
Write the regular expression for Number.
digit -> 0 | 1 |…….|9
digits -> digit digit*
optional_fraction -> . digits | ε
optional_exponent -> ( E (+ | - | ε) digits ) | ε
num -> digits optional_fraction optional_exponent
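For instance, both definitions can be tried out with Python's re module (a
sketch; the translation of the grammar rules into regex syntax is ours):

import re

ID = re.compile(r'[A-Za-z][A-Za-z0-9]*')               # letter (letter | digit)*
NUM = re.compile(r'[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?')  # digits fraction? exponent?

assert ID.fullmatch('count1') and not ID.fullmatch('1count')
assert NUM.fullmatch('12.5E-3') and not NUM.fullmatch('.5')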
List the phases that constitute the front end of a
compiler.
The front end consists of those phases or parts of
phases that depend primarily on the source language
and are largely independent of the target
machine. These include
8 ● Lexical and Syntactic analysis K2
● The creation of symbol table
CO1
● Semantic analysis
● Generation of intermediate code
A certain amount of code optimization can be done by
the front end as well, along with the error handling that
accompanies each of these phases.
Mention the back-end phases of a compiler.
The back end of compiler includes those portions
that depend on the target machine and generally
those portions do not depend on the source
9 K2
language, just the intermediate language. These
include code optimization and code generation, along
with error handling and symbol-table operations.
What is the role of lexical analysis phase?
Lexical analyzer reads the source program one
character at a time, and grouped into a sequence
of atomic units called tokens. Identifiers, keywords,
constants, operators and punctuation symbols such as
10 commas, parenthesis, are typical tokens. K2
Identifiers are entered into the Symbol Table.
Removes comments, whitespaces (blanks, newline , tab)
Keeps track of the line numbers.
Identifies Lexical Errors and appropriate error messages.



What is Sentinel ?
For each character read, we make two tests in lexical
analysis phase: one for the end of the buffer, and one
to determine what character is read (the latter may be
a multiway branch). We can combine the buffer-end
11 K1
test with the test for the current character if we extend
each buffer to hold a sentinel character at the end. The
sentinel is a special character that cannot be part of
the source program, and a natural choice is the
character eof.
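A compact Python sketch of the sentinel idea (illustrative only; a real scanner
uses a pair of input buffers, which this sketch omits):

SENTINEL = '\0'   # assumed: a character that cannot appear in the source

def scan(source):
    buf = source + SENTINEL      # sentinel marks the end of the buffer
    i = 0
    while True:
        ch = buf[i]
        if ch == SENTINEL:       # one combined end-of-buffer / eof test
            break
        # ... multiway branch on ch to recognize tokens ...
        i += 1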
Define Symbol Table.
A symbol table is a data structure containing a record for
each identifier, with fields for the attributes of the
identifier. The data structure allows us to find the record
12 for each identifier quickly and to store or retrieve data K2
from that record quickly.
Whenever an identifier is detected by a lexical analyzer,
it is entered into the symbol table. The attributes of an
identifier cannot be determined by the lexical analyzer. CO1
Define a preprocessor.
A source program may be divided into modules
stored in separate files. The task of collecting the
13 source program is sometimes entrusted to a separate K2
program, called a preprocessor. The preprocessor may
also expand shorthands, called macros, into source
language statements.
What are the functions of the pre-processors?
The task of collecting the source program is sometimes
14 entrusted to a separate program, called a preprocessor. K2
The preprocessor may also expand shorthands, called
macros, into source language statements.
What are the various parts in LEX program?
A Lex program has the following form:
declarations
15 %% k1
translation rules
%%
auxiliary functions

How will you group the phases of the compiler?


The front-end phases of lexical analysis, syntax analysis,
semantic analysis, and intermediate code generation
16 might be grouped together into one pass. Code K1
optimization might be an optional pass.
Then there could be a back-end pass consisting of code
generation for a particular target machine.

Name some varieties of intermediate forms.


· Postfix notation or polish notation.
· Syntax tree
17 K1
· Three address code
· Quadruple
· Triple
Write a regular expression for the language of all
strings of even length made of a and b.
18 K1
r = ((a|b)(a|b))* CO1
Give the transition diagram for an identifier.

19 K1

Write a short notes on LEX.


A LEX source program is a specification of lexical
analyzer consisting of set of regular expressions
together with an action for each regular expression. The
action is a piece of code, which is to be executed
20
whenever a token specified by the corresponding
regular expression is recognized. The output of a LEX is
a lexical analyzer program constructed from the LEX
source specification.

73
11. PART B QUESTIONS : UNIT – I

1. Describe the various phases of a compiler and trace them with the program
segment (t = b * -c + b * -c). (CO1,K4)
2.Explain in detail the process of compilation. Illustrate the output of each phase of
the compilation for the input “a = (b+c) * (b+c) *2”. (CO1,K3)
3. Discuss various buffering techniques in detail. (CO1,K5)
4. Construct an NFA from the regular expression (a|b)*a. (CO1,K3)
5. Construct an NFA using the regular expression (a|b)*abb. (CO1,K3)
6. Construct the NFA from (a|b)*a(a|b) using Thompson’s construction
algorithm. (CO1,K6)
7. Construct the DFA for the augmented regular expression (a | b )* # directly
using syntax tree. (CO1,K3)
8. Write an algorithm for minimizing the number of states of a DFA. (CO1,K3)

PART C QUESTIONS

1. Explain the structure of a LEX program with an example. (CO1,K3)


2. Illustrate how LEX works. (CO1,K2)
3.Consider the regular expression below which can be used as part of a
specification of the definition of exponents in floating-point numbers. Assume that
the alphabet consists of numeric digits (‘0’ through ‘9’) and alphanumeric
characters (‘a’ through ‘z’ and ‘A’ through ‘Z’) with the addition of a selected small
set of punctuation and special characters (say, in this example, only the characters
‘+’ and ‘-’ are relevant). Also, in this representation of regular expressions the
character ‘.’ denotes concatenation. (CO1,K6)
Exponent = (+ | - | ε) . (E | e) . (digit)+
(i) Derive an NFA capable of recognizing its language using Thompson’s
construction.
(ii) Derive the DFA for the NFA found in i) above using subset construction.
(iii) Minimize the DFA found in (ii) above.
4. Write a Lex program to identify the following tokens - relational operators,
arithmetic operators and keywords (if, while, do, switch, for ). (CO1,K4)

12. Supportive online Certification courses

NPTEL : https://nptel.ac.in/courses/106/105/106105190/
Swayam : https://www.classcentral.com/course/swayam-compiler-design-12926
coursera : https://www.coursera.org/learn/nand2tetris2
Udemy : https://www.udemy.com/course/introduction-to-compiler-construction-and-design/
Mooc : https://www.mooc-list.com/course/compilers-coursera
Edx : https://www.edx.org/course/compilers

13. Real time Applications in day to day life and to Industry

1. Characterizing Tokens in the JavaCC Grammar File


2. Use the StreamTokenizer object to implement an interactive calculator.
3. Regular expressions which can be useful in the real-world applications
○ Email validation
○ Password validation
○ Valid date format
○ Empty string validation
○ Phone number validation
○ Credit card number Validation

4. Regular expressions are useful in a wide variety of text processing


tasks, including data validation, data scraping (especially web scraping),
data wrangling, simple parsing, the production of syntax-highlighting
systems, and many other tasks.
5. While regexps would be useful on Internet search engines,
processing them across the entire database could consume excessive
computer resources depending on the complexity and design of the
regex.
6. One salient example of a commercial, real-life application of
deterministic finite automata (DFAs) is Microsoft's historical reliance
on finite-state technology from Xerox PARC. Microsoft has used two-level
automata for spell-checking and other functions in various products for
three decades.
7. Other applications of automata include sequential machines and
vending machines. Simple video games and text matching can also
be implemented with automata.

14. CONTENTS BEYOND SYLLABUS : UNIT – I

Lexical analysis and Java:


Learn how to convert human readable text into machine readable data using
the StringTokenizer and StreamTokenizer classes

Java's lexical analyzers


The Java Language Specification, version 1.0.2, defines two lexical analyzer
classes, StringTokenizer and StreamTokenizer. From their names you can deduce
that StringTokenizer uses String objects as its input, and StreamTokenizer uses
InputStream objects.

As a lexical analyzer, StringTokenizer could be formally defined as shown below.

[~delim1,delim2,...,delimN] :: Token

Detailed description can be obtained from the link,


https://www.infoworld.com/article/2076874/lexical-analysis-and-java--part-1.html

Text Parsing:
Text parsing is a technique used to derive a text string using the
production rules of a grammar, in order to check whether the string is acceptable.

Regular Expression Matching:


This is a technique for checking whether two or more regular expressions
are equivalent. A finite state machine is useful for checking whether the
strings described by an expression are accepted by a machine.
Speech Recognition:
Speech recognition is technology capable of identifying words and phrases
in spoken language and converting them to a machine-readable format.
Receiving words and phrases from the real world and then converting
them into a machine-readable form automatically can be solved
effectively using finite state machines.
Detailed description can be obtained from the link,
https://yashindiasoni.medium.com/real-world-applications-of-automata-88c7ba254e80

15. ASSESSMENT SCHEDULE

• Tentative schedule for the assessments during the 2023-2024 even semester

S.No   Name of the Assessment   Start Date   End Date    Portion
1      Unit Test 1              29.1.2024    3.2.2024    UNIT 1
2      IAT 1                    10.2.2024    16.2.2024   UNIT 1 & 2
3      Unit Test 2              11.3.2024    16.3.2024   UNIT 3
4      IAT 2                    1.4.2024     6.4.2024    UNIT 3 & 4
5      Revision 1               13.5.2024    16.5.2024   UNIT 5, 1 & 2
6      Revision 2               17.4.2024    19.4.2024   UNIT 3 & 4
7      Model                    20.4.2024    30.4.2024   ALL 5 UNITS

16. PRESCRIBED TEXT BOOKS & REFERENCE BOOKS

• TEXT BOOKS:

• Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman,
"Compilers: Principles, Techniques and Tools", Second Edition,
Pearson Education, 2014.

• REFERENCE BOOKS:

• Randy Allen, Ken Kennedy, Optimizing Compilers for Modern


Architectures: A Dependence based Approach, Morgan
Kaufmann Publishers, 2002.
• Steven S. Muchnick, "Advanced Compiler Design and Implementation",
Morgan Kaufmann Publishers - Elsevier Science, India, Indian Reprint
2003.
• Keith D Cooper and Linda Torczon, "Engineering a Compiler",
Morgan Kaufmann Publishers Elsevier Science, 2004.
• V. Raghavan, "Principles of Compiler Design", Tata McGraw Hill Education
Publishers, 2010.
• Allen I. Holub, "Compiler Design in C", Prentice-Hall Software Series, 1993.

17. MINI PROJECT SUGGESTION

• Objective:
Design of lexical analyzer Generator
Design of an Automata for pattern matching

• Planning:
• This method is mostly used to improve the ability of students in application
domain and also to reinforce knowledge imparted during the lecture.
• Students are asked to prepare mini projects involving application of the
concepts, principles or laws learnt.
• The faculty guides the students at various stages of developing the project
and gives timely inputs for the development of the model.
• Students convert their ideas into real time applications.

Projects:
1. C Mini Project: Creating a Lexical Analyzer.(CO1,K6)
2. Use Flex to create a lexical analyzer for C. (CO1,K2)
3. Regular Expression matching - to check the two or more regular
expression are similar to each other or not. (CO1,K4)
4. Construct a vending machine as an automated machine that uses finite
state automata to control its operation. (CO1,K3)
5. Design a simple tool for Symbol Table Management. (CO1,K6)

Thank you

Disclaimer:

This document is confidential and intended solely for the educational purpose of RMK Group of Educational
Institutions. If you have received this document through email in error, please notify the system manager. This
document contains proprietary information and is intended only to the respective group / learning community as
intended. If you are not the addressee you should not disseminate, distribute or copy through e-mail. Please notify
the sender immediately by e-mail if you have received this document by mistake and delete this document from
your system. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any
action in reliance on the contents of this information is strictly prohibited.

