
Unit I - Automata and Lexical Analyzer

The document discusses the subject of automata and compiler design. It covers topics like finite automata, regular expressions, context free grammars, parsing, semantic analysis, code generation, and code optimization. It also provides objectives and outcomes of learning this subject.

Uploaded by

zebra

Subject Name : AUTOMATA AND COMPILER DESIGN

Subject Code : 67302


Topic : Unit I - Automata and Lexical Analyzer
Class : III YEAR - I Sem
Date : 25.08.2021

Faculty Name : Dr. R RAJA


Designation : Associate Professor
Department : CSIT
AUTOMATA AND COMPILER DESIGN
Prerequisite:
1. Problem Solving through C
2. Data Structures
Course Objectives:
1. To demonstrate the interplay between different models and formal languages.
2. Employ finite state machines to solve problems in computing.
3. Classify machines by their power to recognize the languages
4. Explain deterministic and non-deterministic machines.
5. Emphasize the concepts learnt in lexical analysis, syntax analysis, semantic analysis,
intermediate code generation and type checking process through several programming
exercises.
6. To provide an understanding of language translation peculiarities by designing a complete
translator for a mini language.
Unit I - Automata and Lexical Analyzer
Languages, regular expressions, finite automata and state diagrams: DFA, NFA, conversion of
regular expression to NFA-ε, NFA to DFA, NFA-ε to DFA, Phases of the Compiler, lexical
analysis, LEX tool.
Unit II – Parsing
Context free grammars and parsing: context free grammars, derivation, parse trees, ambiguity,
LL (k) grammars, LL (1) parsing. Bottom-up parsing and handle pruning, LR (k) grammar
parsing, LALR (k) grammars, parsing ambiguous grammars, YACC programming
specification.
Unit III - Semantic Analysis
Syntax directed definition and translation, S-attributed and L-attributed grammars, type checking,
type conversion, equivalence of type expressions, overloading of functions and operators,
Chomsky hierarchy of languages and recognizers.
Unit IV - Intermediate Code Generation and Runtime Storage
Intermediate code- abstract syntax tree, translation of simple statements and control flow
statements, storage organizations, storage allocation strategies, access to non-local
names, parameter passing techniques, language facilities for dynamic storage allocation,
symbol table and implementation.

Unit V - Code Optimization and Code Generation


Principal sources of optimization, optimization of basic blocks, flow graphs, data flow
analysis. Machine dependent code generation: object code forms, generic code
generation algorithm, register allocation and assignment, using DAG representation of
blocks, peephole optimization.

10/22/2021 Prepared by Dr R Raja 4


Course Outcomes: At the end of the course, the student will be able to

CO 1 : Thoroughly understand formal language principles, employ finite
state machines to solve problems in computing for recognizing
languages and work with the LEX tool to write a lexical analyzer for
programming languages like C, C++ and Java.
CO 2 : Understand various parsing techniques and the YACC tool to write a parser
for programming languages like C, C++.
CO 3 : Understand how to incorporate semantic actions and type
information for identifiers and use them in performing type
checking.
CO 4 : Understand various storage organizations, allocation strategies,
intermediate code representations and generation for various
programming language constructs.


Automata Theory

Why bother with automata theory?

Historical perspective of Automata theory


Introduction to Finite Automata

Finite automata are a useful model for many important kinds of hardware and
software. Some of the most important kinds:
• Software for designing and checking the behavior of digital circuits.
• The lexical analyzer of a typical compiler, that is, the compiler component that
breaks the input text into logical units such as identifiers, keywords and
punctuation.
• Software for scanning large bodies of text, such as collections of Web pages, to
find occurrences of words, phrases or other patterns.
• Software for verifying systems of all types that have a finite number of distinct
states, such as communications protocols or protocols for secure exchange of
information.
Structural Representations
There are two important notations that are not automaton-like but play an important role in
the study of automata and their applications:
1. Grammars are useful models when designing software that processes data with a
recursive structure. The best-known example is a parser, the component of a compiler
that deals with the recursively nested features of the typical programming language,
such as expressions: arithmetic, conditional, and so on. For instance, a grammatical rule
like E → E + E states that an expression can be formed by taking any two expressions
and connecting them by a plus sign.
2. Regular Expressions also denote the structure of data, especially text strings. The
patterns of strings they describe are exactly the same as what can be described by
finite automata. The style of these expressions differs significantly from that of
grammars. The UNIX-style regular expression [A-Z][a-z]*[ ][A-Z][A-Z] represents capitalized
words followed by a space and two capital letters. This expression represents patterns
in text that could be a city and state. It does not cover multiword city names.


Automata and Complexity
Automata are essential for the study of the limits of computation. There are two
important issues:

1. What can a computer do at all? This study is called decidability, and the
problems that can be solved by computer are called decidable.

2. What can a computer do efficiently? This study is called intractability, and the
problems that can be solved by a computer using no more time than some slowly
growing function of the size of the input are called tractable. Often we take all
polynomial functions to be slowly growing, while functions that grow faster than
any polynomial are deemed to grow too fast.
The Central Concepts of Automata Theory

The most important terms in the theory of automata are defined below. These
concepts include the Alphabet (a set of symbols), Strings (a list of symbols
from an alphabet) and Language (a set of strings from the same alphabet).

Symbol: A symbol is an abstract entity, e.g., a letter or a digit.

Example: 0,1

Alphabet (∑): An alphabet is a finite, nonempty set of symbols.

Example: Binary alphabet ∑ = {0, 1}

String: It is a sequence of symbols.

Example: 0101 is a string



Finite String: It is a finite sequence of symbols.
Example: 010 is a finite string which has length of 3.
Infinite String: It is an infinite sequence of symbols.
Example: 011111… is an infinite string which has infinite length. (Infinite strings are
not used in any formal language.)
Language: A language is a collection of sentences of finite length, all constructed
from a finite alphabet of symbols.
Example: L = {00, 010, 00000, 110000} is a language over input alphabet ∑ = {0, 1}
Formal Language: It is a language where the form of the strings is restricted over a given
alphabet.
Example:
Set of all strings where each string starts with 1 over the binary alphabet.
L = {1, 10, 11, …} over 0’s and 1’s.
Empty String (Λ or ε or λ): If the length of a string is zero, it is called the empty string or void
string.
Concatenation of two strings:
If x, y ∈ ∑*, then x concatenated with y is the word formed by the symbols of x followed by the
symbols of y.
This is denoted by x.y, which is the same as xy.
Substring of a string:
A string v is a substring of a string ω if and only if there are some strings x and y such that ω = xvy.
Suffix of a string:
If ω = xv for some string x, then v is a suffix of ω.
Prefix of a string:
If ω = vy for some string y, then v is a prefix of ω.
Reversal of a string:
Given a string ω, its reversal, denoted by ω^R, is the string spelled backwards.
Length of a String
Definition − It is the number of symbols present in a string. (Denoted by |S|).
Examples −
If S = ‘cabcad’, |S|= 6
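The string operations defined above map directly onto built-in string operations in most languages. The following is a minimal Python sketch; the helper names (is_prefix, is_suffix, is_substring, reversal) are illustrative and not part of the course material.

```python
def is_prefix(v: str, w: str) -> bool:
    """True if w = v y for some string y."""
    return w.startswith(v)


def is_suffix(v: str, w: str) -> bool:
    """True if w = x v for some string x."""
    return w.endswith(v)


def is_substring(v: str, w: str) -> bool:
    """True if w = x v y for some strings x and y."""
    return v in w


def reversal(w: str) -> str:
    """The string w spelled backwards."""
    return w[::-1]


x, y = "01", "101"
w = x + y  # concatenation x.y
print(w, len(w), reversal(w))
print(is_prefix("011", w), is_suffix("101", w), is_substring("110", w))
```

Note that the empty string is a prefix, a suffix and a substring of every string, which matches the definitions when x or y is taken to be ε.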



Kleene Closure:
If ∑ is the alphabet, then there is a language in which any string of letters from ∑ is a
word, even the null string. We call this language the closure of the alphabet.
It is denoted by writing * (asterisk) after the name of the alphabet: ∑*. This notation is also
known as the Kleene Star.
If ∑ = {a, b}, then ∑* = {ε, a, b, aa, ab, ba, bb, …}
∑* = ∑^0 ∪ ∑^1 ∪ ∑^2 ∪ …, where ∑^k is the set of strings of length k
∑* = ∑+ ∪ {ε}

Positive Closure:
The '+' (plus operation) is sometimes called positive closure.
If ∑ = {a}, then ∑+ = {a, aa, aaa, …} = the set of nonempty strings from ∑
∑+ = ∑* – {ε}
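Since ∑* is infinite, a program can only enumerate it up to a length bound. The sketch below lists the strings of length 0 to 2 over {a, b} in order of length and derives the matching part of ∑+ by dropping ε. The function name strings_up_to is an assumption made for this illustration.

```python
from itertools import product


def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        # product(..., repeat=0) yields one empty tuple, i.e. the empty string
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


sigma = ["a", "b"]
star = list(strings_up_to(sigma, 2))   # finite part of the Kleene closure
plus = [w for w in star if w != ""]    # positive closure: remove the empty string
print(star)
print(plus)
```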
Finite Automaton(FA) or Finite State Machine (FSM)
An automaton with a finite number of states is called a Finite Automaton(FA) or Finite
State Machine (FSM).
Formal definition of a Finite Automaton
An automaton can be represented by a 5-tuple (Q, ∑, δ, q0, F), where:
Q is a finite set of states.
∑ is a finite set of symbols, called the alphabet of the automaton.
δ is the transition function.
q0 is the initial state from where any input is processed (q0 ∈ Q).
F is a set of final state/states of Q (F ⊆ Q).
Finite Automaton can be classified into two types:
• Deterministic Finite Automaton (DFA)
• Non-deterministic Finite Automaton (NDFA / NFA)


Deterministic Finite Automaton (DFA)
In DFA, for each input symbol, one can determine the state to which the machine
will move. Hence, it is called Deterministic Automaton. As it has a finite number
of states, the machine is called Deterministic Finite Machine or Deterministic
Finite Automaton.
Formal Definition of a DFA
A DFA can be represented by a 5-tuple (Q, ∑, δ, q0, F)
where −
Q is a finite set of states.
∑ is a finite set of symbols called the alphabet.
δ is the transition function where δ: Q × ∑ → Q
q0 is the initial state from where any input is processed (q0 ∈ Q).
F is a set of final state/states of Q (F ⊆ Q).
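The 5-tuple can be written down almost literally as a data structure. The following is a minimal Python sketch, not a reference implementation: the class name DFA, its field names, and the example machine (strings over {0, 1} with an even number of 1's) are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: frozenset     # Q
    alphabet: frozenset   # the input alphabet
    delta: dict           # transition function as {(state, symbol): state}
    start: str            # q0
    finals: frozenset     # F

    def __post_init__(self):
        # delta must be a total function from Q x alphabet to Q
        for q in self.states:
            for a in self.alphabet:
                if self.delta.get((q, a)) not in self.states:
                    raise ValueError(f"delta is undefined for ({q!r}, {a!r})")

    def accepts(self, word: str) -> bool:
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]  # exactly one move per symbol
        return state in self.finals


# Hypothetical example machine: accepts strings with an even number of 1's.
even_ones = DFA(
    states=frozenset({"even", "odd"}),
    alphabet=frozenset("01"),
    delta={("even", "0"): "even", ("even", "1"): "odd",
           ("odd", "0"): "odd", ("odd", "1"): "even"},
    start="even",
    finals=frozenset({"even"}),
)
print(even_ones.accepts("1001"), even_ones.accepts("1011"))
```

Checking totality in the constructor reflects the requirement δ: Q × ∑ → Q; an NFA would relax exactly this check.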
Graphical Representation of a DFA

A DFA is represented by a digraph called a state diagram.
• The vertices represent the states.
• The arcs labelled with an input symbol show the transitions.
• The initial state is denoted by a single incoming arc that has no source state.
• The final state is indicated by double circles.
Examples of DFA
Example 1:
Design an FA with ∑ = {0, 1} that accepts those strings which start with 1 and end with 0.
Solution:
The FA will have a start state q0 from which only the edge with input 1 will go to the next
state, q1.

In state q1, if we read 1, we stay in state q1, but if we read 0 in state q1, we reach
state q2, which is the final state. In state q2, if we read 0 we stay in q2, and if we read 1
we go back to q1. Note that if the input ends with 0, the machine will be in the final
state.
Example 2:

Design an FA with ∑ = {0, 1} that accepts only the input 101.

Solution:

In the given solution, we can see that only the input 101 will be accepted. Hence, apart from the
path for 101, no other path is shown for any other input.


Example 3:
Design an FA with ∑ = {0, 1} that accepts strings with an even number of 0's and an even number of 1's.
Solution:
This FA will consider four different states for input 0 and input 1. The states could be:

Here q0 is the start state and also the final state. Note carefully that a symmetry of 0's and 1's is
maintained. We can associate meanings to each state as:
q0: state of even number of 0's and even number of 1's.
q1: state of odd number of 0's and even number of 1's.
q2: state of odd number of 0's and odd number of 1's.
q3: state of even number of 0's and odd number of 1's.
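The meanings of q0 to q3 determine every transition: reading a 0 flips the parity of 0's, reading a 1 flips the parity of 1's. The sketch below writes the table out under that reading (the diagram itself is not reproduced here, so the table is an inference from the state meanings) and checks it exhaustively against a direct count.

```python
from itertools import product

# Transition table inferred from the state meanings given above.
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q3",
    ("q1", "0"): "q0", ("q1", "1"): "q2",
    ("q2", "0"): "q3", ("q2", "1"): "q1",
    ("q3", "0"): "q2", ("q3", "1"): "q0",
}


def accepts(word: str) -> bool:
    state = "q0"                      # start state
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state == "q0"              # q0 is also the only final state


def check(max_len: int) -> bool:
    """Compare the DFA with a direct parity count on all strings up to max_len."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            w = "".join(letters)
            expected = w.count("0") % 2 == 0 and w.count("1") % 2 == 0
            if accepts(w) != expected:
                return False
    return True


print(accepts("0011"), accepts("010"), check(8))
```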
Example 4:
Design an FA with ∑ = {0, 1} that accepts the set of all strings with three consecutive 0's.
Solution:
The strings accepted for this particular language are 000, 0001,
1000, 10001, ...., in which 0 appears in a clump of 3. The transition graph is as
follows:


Example 5:
Design a DFA with L(M) = {w | w ∈ {0, 1}*} where w is a string that does not contain
three consecutive 1's.
Solution:
When three consecutive 1's occur the DFA will be:

Here two consecutive 1's or a single 1 is acceptable, hence

The states q0, q1, q2 are the final states. The DFA will accept the strings that do not
contain three consecutive 1's, like 10, 110, 101, etc.
Example 6:
Design an FA with ∑ = {0, 1} that accepts the strings with an even number of 0's followed by
a single 1.
Solution:
The DFA can be shown by a transition diagram as:


Non-Deterministic Finite Automata (NFA / NDFA) :
• NFA stands for non-deterministic finite automata. It is easier to construct an NFA than a DFA
for a given regular language.
• A finite automaton is called an NFA when there may exist many paths for a specific input from the
current state to the next state.
• Not every NFA is a DFA, but each NFA can be translated into an equivalent DFA.
• An NFA is defined in the same way as a DFA but with two exceptions: it may have multiple
next states for one input, and it may contain ε transitions.


In the following diagram, we can see that from state q0 for input a, there are two next
states q1 and q2. Similarly, from q0 for input b, the next states are q0 and q1. Thus it is
not fixed or determined where to go next on a particular input. Hence this FA is
called a non-deterministic finite automaton.
Design an NFA for the transition table as given below:

Present State    0         1
→q0              q0, q1    q0, q2
q1               q3        ε
q2               q2, q3    q3
→q3              q3        q3


Design an NFA with ∑ = {0, 1} that accepts all strings ending with 01.

Solution:
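An NFA can be simulated by tracking the set of all states it could be in after each symbol. The sketch below assumes the usual three-state machine for this language (q0 loops on 0 and 1, q0 goes to q1 on 0, q1 goes to q2 on 1, q2 is final); since the diagram is not reproduced here, this table is an assumption.

```python
# Assumed NFA for "ends with 01"; a missing entry means no move.
NFA_DELTA = {
    ("q0", "0"): {"q0", "q1"},   # stay in q0, or guess that the final "01" starts here
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
START = "q0"
FINALS = {"q2"}


def nfa_accepts(word: str) -> bool:
    current = {START}                       # all states the NFA may be in
    for symbol in word:
        current = {t for s in current
                   for t in NFA_DELTA.get((s, symbol), set())}
    return bool(current & FINALS)           # accept if any path ends in a final state


print(nfa_accepts("1101"), nfa_accepts("010"))
```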



Design an NFA with ∑ = {0, 1} in which double '1' is followed by double '0'.

The FA with double 1 is as follows:

It should be immediately followed by double 0. Then,

Now before double 1, there can be any string of 0 and 1. Similarly, after double 0, there
can be any string of 0 and 1. Hence the NFA becomes:

Now considering the string 01100011


Design an NFA in which all the strings contain the substring 1110.

The language consists of all the strings containing the substring 1110. The partial transition
diagram can be:

Now 1110 could be a substring anywhere in the input. Hence we add loops on the inputs 0 and 1 so that the
substring 1110 can be preceded and followed by any symbols. Hence the NFA becomes:

We can process the string 111010:
δ(q1, 111010) = δ(q2, 11010)
= δ(q3, 1010)
= δ(q4, 010)
= δ(q5, 10)
= δ(q5, 0)
= δ(q5, ε)
Design an NFA with ∑ = {0, 1} that accepts all strings in which the third symbol from the
right end is always 0.

Thus we get the third symbol from the right end as '0' always. The NFA can be:

Example strings: 000, 011, 1011, 11011


An Application: Text Search
NFA’s for Text Search
Example : use an NFA to search for the two keywords “web” and “eBay” in
text
∑ = set of all printable ASCII characters
[Figure: NFA for the text search. State 1 is the start state and loops on ∑; the path 1 → 2 → 3 → 4 is labelled w, e, b and the path 1 → 5 → 6 → 7 → 8 is labelled e, b, a, y.]


Finite Automata with Epsilon-Transitions

Use of ε-transitions
• We allow the automaton to make a transition on the empty string ε.
• This means that a transition is allowed to occur without
reading an input symbol.
• The resulting NFA is called an ε-NFA.
• It adds “programming (design) convenience” (more
intuitive for use in designing FA’s)


Example : An ε -NFA accepting decimal numbers like 2.15, .125, +1.4, -0.501…



Formal Notation for an ε-NFA
Definition: an ε-NFA A is denoted by A = (Q, ∑, δ, q0, F)
where the transition function δ takes as arguments:
• a state in Q, and
• a member of ∑ ∪ {ε}


Epsilon-Closures (ε -closures)
Formal recursive definition of the set ECLOSE(q) for q:
• State q is in ECLOSE(q) (including the state itself);
• If p is in ECLOSE(q), then all states accessible from p
through paths of ε’s are also in ECLOSE(q).


Find the ε-closure of all the states

ε-closure(1) = {1, 2, 3, 4, 6}
ε-closure(2) = {2, 3, 6}
ε-closure(3) = {3, 6}
ε-closure(4) = {4}
ε-closure(5) = {5, 7}
ε-closure(6) = {6}
ε-closure(7) = {7}
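The recursive definition of ECLOSE is a graph reachability search along ε-edges only. The sketch below assumes the ε-edges 1→2, 1→4, 2→3, 3→6 and 5→7; the diagram is not reproduced here, so this edge set is an assumption, chosen because it yields the closures listed above.

```python
# Assumed ε-edges of the example automaton (state -> states reachable by one ε-move).
EPS_EDGES = {1: {2, 4}, 2: {3}, 3: {6}, 5: {7}}


def eclose(state, eps_edges):
    """Return ECLOSE(state): the state itself plus everything reachable via ε-paths."""
    closure = {state}            # basis: q is in ECLOSE(q)
    stack = [state]
    while stack:
        p = stack.pop()
        for r in eps_edges.get(p, ()):
            if r not in closure:  # induction: follow ε-edges out of states already found
                closure.add(r)
                stack.append(r)
    return closure


for q in range(1, 8):
    print(q, sorted(eclose(q, EPS_EDGES)))
```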



Eliminating ε Transitions
An NFA with ε can be converted to an NFA without ε, and this NFA without ε can be
converted to a DFA. To do this, we use a method which removes all the ε
transitions from the given NFA. The method is:
1. Find all the ε transitions from each state in Q. The set of states reached is called
ε-closure(qi), where qi ∈ Q.
2. Then the δ' transitions can be obtained. A δ' transition means an ε-closure on δ
moves.
3. Repeat Step 2 for each input symbol and each state of the given NFA.
4. Using the resultant states, the transition table for the equivalent NFA without ε
can be built.


Conversion of NFA with ε-transitions to NFA without
ε-transitions
Steps to convert an NFA with ε-transitions to an NFA without ε-transitions:
1. Find the ε-closure of all the states
2. Find the extended transition function δ' on each input symbol for each and
every state
a) δ'(q0, a) = ε-closure(δ(δ^(q0, ε), a))
b) δ^(q0, ε) = ε-closure(q0)
3. Summarize all the computed δ' transitions
4. Construct the transition table
5. Construct the transition diagram (any state whose ε-closure contains a final
state becomes a final state)
Example: Convert the following NFA with ε to NFA without ε.

Solution: We will first obtain ε-closures of q0, q1 and q2 as follows:
ε-closure(q0) = {q0}
ε-closure(q1) = {q1, q2}
ε-closure(q2) = {q2}
Now the δ' transition on each input symbol is obtained as:
δ'(q0, a) = ε-closure(δ(δ^(q0, ε), a))
= ε-closure(δ(ε-closure(q0), a))
= ε-closure(δ(q0, a))
= ε-closure(q1)
= {q1, q2}
δ'(q0, b) = ε-closure(δ(δ^(q0, ε), b))
= ε-closure(δ(ε-closure(q0), b))
= ε-closure(δ(q0, b))
= Ф
δ'(q1, a) = ε-closure(δ(δ^(q1, ε), a))
= ε-closure(δ(ε-closure(q1), a))
= ε-closure(δ({q1, q2}, a))
= ε-closure(δ(q1, a) ∪ δ(q2, a))
= ε-closure(Ф ∪ Ф)
= Ф
δ'(q1, b) = ε-closure(δ(δ^(q1, ε), b))
= ε-closure(δ(ε-closure(q1), b))
= ε-closure(δ({q1, q2}, b))
= ε-closure(δ(q1, b) ∪ δ(q2, b))
= ε-closure(Ф ∪ q2)
= {q2}
δ'(q2, a) = ε-closure(δ(δ^(q2, ε), a))
= ε-closure(δ(ε-closure(q2), a))
= ε-closure(δ(q2, a))
= ε-closure(Ф)
= Ф
δ'(q2, b) = ε-closure(δ(δ^(q2, ε), b))
= ε-closure(δ(ε-closure(q2), b))
= ε-closure(δ(q2, b))
= ε-closure(q2)
= {q2}
Now we will summarize all the computed δ' transitions:
δ'(q0, a) = {q1, q2}
δ'(q0, b) = Ф
δ'(q1, a) = Ф
δ'(q1, b) = {q2}
δ'(q2, a) = Ф
δ'(q2, b) = {q2}
State q1 and q2 become the final states as the ε-closures of q1 and q2 contain the final state q2.
The transition table can be:
State    a           b
→q0      {q1, q2}    Ф
*q1      Ф           {q2}
*q2      Ф           {q2}
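The worked example can be checked mechanically. The sketch below encodes the NFA implied by the computations above (q0 goes to q1 on a, q1 goes to q2 on ε, q2 loops on b, q2 is final) and applies δ'(q, x) = ε-closure(δ(ε-closure(q), x)) to every state and symbol. The function names are assumptions made for this illustration.

```python
EPS = "ε"
STATES = ["q0", "q1", "q2"]
ALPHABET = ["a", "b"]
FINALS = {"q2"}
# ε-NFA read off from the worked example; missing entries mean the empty set.
DELTA = {("q0", "a"): {"q1"}, ("q1", EPS): {"q2"}, ("q2", "b"): {"q2"}}


def eclose(states):
    """ε-closure of a set of states."""
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for r in DELTA.get((p, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def move(states, symbol):
    """Union of δ(s, symbol) over all s in states."""
    return {t for s in states for t in DELTA.get((s, symbol), ())}


def remove_epsilon():
    new_delta = {(q, x): eclose(move(eclose({q}), x))
                 for q in STATES for x in ALPHABET}
    # a state becomes final if its ε-closure contains a final state
    new_finals = {q for q in STATES if eclose({q}) & FINALS}
    return new_delta, new_finals


new_delta, new_finals = remove_epsilon()
for key in sorted(new_delta):
    print(key, sorted(new_delta[key]))
print(sorted(new_finals))
```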
Convert the following NFA with ε to NFA without ε.

Conversion of NFA to DFA

• In an NFA, when a specific input is given to the current state, the machine may go to multiple
states. It can have zero, one or more than one move on a given input symbol.
• On the other hand, in a DFA, when a specific input is given to the current state, the
machine goes to only one state. A DFA has exactly one move on a given input symbol.
• Let M = (Q, ∑, δ, q0, F) be an NFA which accepts the language L(M). There is an
equivalent DFA denoted by M' = (Q', ∑', δ', q0', F') such that L(M) = L(M').


Steps for converting NFA to DFA:

Step 1: Initially Q' = ϕ
Step 2: Add q0 of the NFA to Q'. Then find the transitions from this start state.
Step 3: For each state in Q', find the possible set of states for each input symbol. If this
set of states is not in Q', then add it to Q'.
Step 4: In the DFA, the final states are all the states which contain a final state of the NFA (a member of F).
Example : Convert the given NFA to DFA.

Solution: For the given transition diagram we will first construct the transition table.
State    0           1
→q0      q0          q1
q1       {q1, q2}    q1
*q2      q2          {q1, q2}
Now we will obtain δ' transition for state q0.
δ'([q0], 0) = [q0]
δ'([q0], 1) = [q1]
The δ' transition for state q1 is obtained as:
δ'([q1], 0) = [q1, q2] (new state generated)
δ'([q1], 1) = [q1]
The δ' transition for state q2 is obtained as:
δ'([q2], 0) = [q2]
δ'([q2], 1) = [q1, q2]
Now we will obtain δ' transition on [q1, q2].
δ'([q1, q2], 0) = δ(q1, 0) ∪ δ(q2, 0)
= {q1, q2} ∪ {q2}
= {q1, q2}
= [q1, q2]
δ'([q1, q2], 1) = δ(q1, 1) ∪ δ(q2, 1)
= {q1} ∪ {q1, q2}
= {q1, q2}
= [q1, q2]
The state [q1, q2] is a final state as well because it contains the final state q2. The transition
table for the constructed DFA will be:
State        0           1
→[q0]        [q0]        [q1]
[q1]         [q1, q2]    [q1]
*[q2]        [q2]        [q1, q2]
*[q1, q2]    [q1, q2]    [q1, q2]
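The conversion steps can be sketched as a breadth-first search over sets of NFA states. The code below uses the NFA transition table of this example; the function name subset_construction is an assumption. Because it only explores subsets reachable from [q0], the state [q2], which appears in the table above but cannot be reached from [q0], does not show up in its output.

```python
from collections import deque

# NFA transition table of the example (start q0, final q2).
NFA = {
    ("q0", "0"): {"q0"},       ("q0", "1"): {"q1"},
    ("q1", "0"): {"q1", "q2"}, ("q1", "1"): {"q1"},
    ("q2", "0"): {"q2"},       ("q2", "1"): {"q1", "q2"},
}


def subset_construction(nfa, start, finals, alphabet):
    start_set = frozenset({start})
    table = {}                       # DFA transitions {(subset, symbol): subset}
    seen = {start_set}               # Q' so far
    queue = deque([start_set])
    while queue:
        subset = queue.popleft()
        for symbol in alphabet:
            target = frozenset(t for s in subset
                               for t in nfa.get((s, symbol), ()))
            table[(subset, symbol)] = target
            if target not in seen:   # new DFA state generated
                seen.add(target)
                queue.append(target)
    dfa_finals = {s for s in seen if s & finals}
    return table, seen, dfa_finals


table, dfa_states, dfa_finals = subset_construction(NFA, "q0", {"q2"}, "01")
for (subset, symbol), target in table.items():
    print(sorted(subset), symbol, "->", sorted(target))
```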

The Transition diagram will be:



Example : Convert the given NFA to DFA.

Solution: For the given transition diagram we will first construct the transition table.
State    0           1
→q0      {q0, q1}    {q1}
*q1      ϕ           {q0, q1}
Now we will obtain δ' transition for state q0.
δ'([q0], 0) = {q0, q1}
= [q0, q1] (new state generated)
δ'([q0], 1) = {q1} = [q1]
The δ' transition for state q1 is obtained as:
δ'([q1], 0) = ϕ
δ'([q1], 1) = [q0, q1]
Now we will obtain δ' transition on [q0, q1].
δ'([q0, q1], 0) = δ(q0, 0) ∪ δ(q1, 0)
= {q0, q1} ∪ ϕ
= {q0, q1}
= [q0, q1]
δ'([q0, q1], 1) = δ(q0, 1) ∪ δ(q1, 1)
= {q1} ∪ {q0, q1}
= {q0, q1}
= [q0, q1]
As q1 is a final state in the given NFA, every DFA state that contains q1 becomes a
final state. Hence in the DFA, the final states are [q1] and [q0, q1]. Therefore the set of final states is
F = {[q1], [q0, q1]}.
The transition table for the constructed DFA will be:
State        0           1
→[q0]        [q0, q1]    [q1]
*[q1]        ϕ           [q0, q1]
*[q0, q1]    [q0, q1]    [q0, q1]
The Transition diagram will be:

We can also rename the states of the DFA. Suppose
A = [q0]
B = [q1]
C = [q0, q1]
Example Problems for Conversion of NFA to DFA
Construct an NFA accepting the set of strings over {a, b} ending with “aba”. Use it to
construct a DFA accepting the same set of strings.
Solution : NFA for accepting the strings ending with “aba”

Now we will obtain the δ' transitions in turn for the states q0, {q0,q1}, {q0,q2} and {q0,q1,q3}.

Transition Table and Transition Diagram
Conversion of NFA to DFA Using Subset Construction Method

Conversion of NFA with ε to DFA:
Steps for converting NFA with ε to DFA:

Step 1: Take the ε-closure of the starting state of the NFA as the starting state of the DFA.
Step 2: For each input symbol, find the states that can be reached from the present state. That
means the union of the transition values, and their ε-closures, for each state of the NFA
present in the current state of the DFA.
Step 3: If we find a new state, take it as the current state and repeat Step 2.
Step 4: Repeat Step 2 and Step 3 until there is no new state present in the transition table
of the DFA.
Step 5: Mark as final every state of the DFA which contains a final state of the NFA.
Convert the NFA with ε into its equivalent DFA.

Solution: Let us obtain the ε-closure of each state.
ε-closure {q0} = {q0, q1, q2}
ε-closure {q1} = {q1}
ε-closure {q2} = {q2}
ε-closure {q3} = {q3}
ε-closure {q4} = {q4}
Now, let ε-closure {q0} = {q0, q1, q2} be state A.
Hence
δ'(A, 0) = ε-closure {δ((q0, q1, q2), 0)}
= ε-closure {δ(q0, 0) ∪ δ(q1, 0) ∪ δ(q2, 0)}
= ε-closure {q3}
= {q3} call it as state B.
δ'(A, 1) = ε-closure {δ((q0, q1, q2), 1)}
= ε-closure {δ(q0, 1) ∪ δ(q1, 1) ∪ δ(q2, 1)}
= ε-closure {q3}
= {q3} = B.
Now,
δ'(B, 0) = ε-closure {δ(q3, 0)}
= ϕ
δ'(B, 1) = ε-closure {δ(q3, 1)}
= ε-closure {q4}
= {q4} i.e. state C
For state C:
δ'(C, 0) = ε-closure {δ(q4, 0)}
= ϕ
δ'(C, 1) = ε-closure {δ(q4, 1)}
= ϕ

The DFA will be,


Convert the given NFA into its equivalent DFA.

Solution: Let us obtain the ε-closure of each state.
ε-closure(q0) = {q0, q1, q2}
ε-closure(q1) = {q1, q2}
ε-closure(q2) = {q2}
Now we will obtain δ' transition.
Let ε-closure(q0) = {q0, q1, q2} call it as state A.
δ'(A, 0) = ε-closure{δ((q0, q1, q2), 0)}
= ε-closure{δ(q0, 0) ∪ δ(q1, 0) ∪ δ(q2, 0)}
= ε-closure{q0}
= {q0, q1, q2} → A
δ'(A, 1) = ε-closure{δ((q0, q1, q2), 1)}
= ε-closure{δ(q0, 1) ∪ δ(q1, 1) ∪ δ(q2, 1)}
= ε-closure{q1}
= {q1, q2} call it as state B
δ'(A, 2) = ε-closure{δ((q0, q1, q2), 2)}
= ε-closure{δ(q0, 2) ∪ δ(q1, 2) ∪ δ(q2, 2)}
= ε-closure{q2}
= {q2} call it state C
δ'(B, 0) = ε-closure{δ((q1, q2), 0)}
= ε-closure{δ(q1, 0) ∪ δ(q2, 0)}
= ε-closure{ϕ}
= ϕ
δ'(B, 1) = ε-closure{δ((q1, q2), 1)}
= ε-closure{δ(q1, 1) ∪ δ(q2, 1)}
= ε-closure{q1}
= {q1, q2} i.e. state B itself
δ'(B, 2) = ε-closure{δ((q1, q2), 2)}
= ε-closure{δ(q1, 2) ∪ δ(q2, 2)}
= ε-closure{q2}
= {q2} i.e. state C
δ'(C, 0) = ε-closure{δ(q2, 0)}
= ε-closure{ϕ} = ϕ
δ'(C, 1) = ε-closure{δ(q2, 1)}
= ε-closure{ϕ} = ϕ
δ'(C, 2) = ε-closure{δ(q2, 2)}
= {q2} → C
Hence the DFA is

• As A = {q0, q1, q2} contains the final state q2, A is a final state.
• B = {q1, q2} contains the state q2, hence B is also a final state.
• C = {q2} contains the state q2, hence C is also a final state.
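The five steps can be sketched as a subset construction in which every set of states is replaced by its ε-closure. The code below assumes the ε-NFA implied by the computations above: q0 loops on 0, q1 loops on 1, q2 loops on 2, with ε-moves q0→q1 and q1→q2 and final state q2. The diagram is not reproduced here, so this table is an inference. Following the worked solution, ϕ is recorded as "no move" rather than added as a dead state.

```python
from collections import deque

EPS = "ε"
# ε-NFA inferred from the worked example; missing entries mean the empty set.
DELTA = {
    ("q0", "0"): {"q0"}, ("q0", EPS): {"q1"},
    ("q1", "1"): {"q1"}, ("q1", EPS): {"q2"},
    ("q2", "2"): {"q2"},
}


def eclose(states):
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for r in DELTA.get((p, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def epsilon_nfa_to_dfa(start, finals, alphabet):
    start_set = frozenset(eclose({start}))        # Step 1
    table, seen, queue = {}, {start_set}, deque([start_set])
    while queue:                                  # Steps 3 and 4
        subset = queue.popleft()
        for symbol in alphabet:                   # Step 2
            moved = {t for s in subset for t in DELTA.get((s, symbol), ())}
            target = frozenset(eclose(moved))
            table[(subset, symbol)] = target
            if target and target not in seen:     # the empty set is not kept as a state
                seen.add(target)
                queue.append(target)
    dfa_finals = {s for s in seen if s & finals}  # Step 5
    return start_set, table, seen, dfa_finals


start_set, table, dfa_states, dfa_finals = epsilon_nfa_to_dfa("q0", {"q2"}, "012")
for (subset, symbol), target in table.items():
    print(sorted(subset), symbol, "->", sorted(target))
```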
Conversion of Epsilon NFA to DFA

Convert the Following Epsilon NFA into DFA

1.

2.

3.

Conversion of Regular Expression to Finite Automata : Direct Method
Example 1: Design an FA from the given regular expression 10 + (0 + 11)0* 1
[Figures: Steps 1 to 5 of the construction]
Example 2: Design an FA from the given regular expression 1 (1* 01* 01*)*
[Figures: Steps 1 to 3 of the construction]


Example 3: Design an FA from the given regular expression 0*1 + 10
[Figures: Steps 1 to 3 of the construction]
Example 4: Design an FA from the given regular expression (0+1)*(00+11)(0+1)*
[Figures: Steps 1 to 5 of the construction]


Conversion of Regular Expression to Finite Automata : Thompson’s Construction
Method
[Figures: Thompson constructions for UNION, CONCATENATION and CLOSURE]
[Figures: Thompson constructions for the examples ((ab)+c)*, (a|b)*abb, (0|1)(0|2), (12|53) and (abc)*]
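The three Thompson building blocks (union, concatenation, closure) can be sketched as functions that combine small ε-NFA fragments, each with one start and one accept state. The code below is an illustrative sketch, not the slides' exact figures: the names Fragment, symbol, union, concat and star are assumptions, and concatenation links the two fragments with an ε-edge, a common variant of merging the two states. It builds the NFA for (a|b)*abb by hand rather than parsing the expression.

```python
import itertools

EPS = None                      # label used for ε-edges
_ids = itertools.count()        # fresh state numbers


class Fragment:
    """An ε-NFA fragment with a single start and a single accept state."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges  # edges: (src, label, dst)


def symbol(c):
    s, f = next(_ids), next(_ids)
    return Fragment(s, f, [(s, c, f)])


def concat(a, b):
    return Fragment(a.start, b.accept,
                    a.edges + b.edges + [(a.accept, EPS, b.start)])


def union(a, b):
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, a.start), (s, EPS, b.start),
             (a.accept, EPS, f), (b.accept, EPS, f)]
    return Fragment(s, f, a.edges + b.edges + extra)


def star(a):
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, a.start), (s, EPS, f),             # enter the loop, or skip it
             (a.accept, EPS, a.start), (a.accept, EPS, f)]  # repeat, or leave
    return Fragment(s, f, a.edges + extra)


def eclose(states, edges):
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for src, label, dst in edges:
            if src == p and label is EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure


def accepts(frag, word):
    current = eclose({frag.start}, frag.edges)
    for c in word:
        moved = {dst for src, label, dst in frag.edges
                 if src in current and label == c}
        current = eclose(moved, frag.edges)
    return frag.accept in current


# (a|b)*abb, built bottom-up; each symbol() call creates fresh states.
nfa = concat(concat(concat(star(union(symbol("a"), symbol("b"))),
                           symbol("a")), symbol("b")), symbol("b"))
print(accepts(nfa, "babb"), accepts(nfa, "abba"))
```

Each fragment should be used only once when composing, since reusing the same fragment object in two places would share its states.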
COMPILER



What is a Compiler?
• A compiler translates (or compiles) a program written in a high-level programming
language that is suitable for human programmers into the low-level machine language that
is required by computers.
• Using a high-level language for programming has a large impact on how fast programs can
be developed. The main reasons for this are:
  • Compared to machine language, the notation used by programming languages is
  closer to the way humans think about problems.
  • The compiler can spot some obvious programming mistakes.
  • Programs written in a high-level language tend to be shorter than equivalent
  programs written in machine language.
• Another advantage of using a high-level language is that the same program can be
compiled to many different machine languages and, hence, be brought to run on many
different machines.
• A good compiler will be able to get very close to the speed of hand-written
machine code when translating well-structured programs.
Other Applications
• In addition to the development of a compiler, the techniques used in compiler
design can be applicable to many problems in computer science.
  • Techniques used in a lexical analyzer can be used in text editors, information
  retrieval systems, and pattern recognition programs.
  • Techniques used in a parser can be used in a query processing system such as
  SQL.
  • Much software with a complex front end may need techniques used in
  compiler design.
  • An example is a symbolic equation solver, which takes an equation as input. That program should
  parse the given input equation.
  • Most of the techniques used in compiler design can be used in Natural
  Language Processing (NLP) systems.
Major Parts of Compilers
There are two major parts of a compiler: Analysis and Synthesis
• In the analysis phase, an intermediate representation is created from the given source
program.
Lexical Analyzer, Syntax Analyzer and Semantic Analyzer are the parts of this phase.
• In the synthesis phase, the equivalent target program is created from this intermediate
representation.
Intermediate Code Generator, Code Generator, and Code Optimizer are the parts of this
phase.
The Analysis-Synthesis Model of Compilation
There are two parts to compilation:
• Analysis determines the operations implied by the source program, which are recorded
in a tree structure
• Synthesis takes the tree structure and translates the operations therein into the target
program
Other Tools that Use the Analysis-Synthesis Model
• Editors (syntax highlighting)
• Pretty printers (e.g. Doxygen)
• Static checkers (e.g. Lint and Splint)
• Interpreters
• Text formatters (e.g. TeX and LaTeX)
• Silicon compilers (e.g. VHDL)
• Query interpreters/compilers (Databases)

Analysis of the Source Program
In compiling, analysis of the source program consists of three phases:
• Lexical analysis
• Syntax analysis
• Semantic analysis
Language Processors



Phases of the Compiler

10/22/2021 110
Lexical Analyzer
 Lexical Analyzer reads the source program character by character and returns the
tokens of the source program.
 A token describes a pattern of characters having the same meaning in the source program
(such as identifiers, operators, keywords, numbers, delimiters and so on).
Ex: newval := oldval + 12 => tokens:
    newval    identifier
    :=        assignment operator
    oldval    identifier
    +         add operator
    12        number

 Puts information about identifiers into the symbol table.


 Regular expressions are used to describe tokens (lexical constructs).
 A (Deterministic) Finite State Automaton can be used in the implementation of a
lexical analyzer.
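As a minimal sketch of the points above, the following Python fragment describes each token by a regular expression and splits the example statement into (token, lexeme) pairs. The token names and the TOKEN_SPEC table are assumptions made for illustration, and Python's re module stands in for a hand-built finite automaton.

```python
import re

# Hypothetical token table: (token name, regular expression), tried in order.
TOKEN_SPEC = [
    ("number",     r"\d+"),
    ("identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("assign_op",  r":="),
    ("add_op",     r"\+"),
    ("skip",       r"\s+"),   # white space is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token name, lexeme) pairs for text."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # No pattern matches here: a lexical error.
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("newval := oldval + 12"))
```

A production scanner would also enter each identifier into the symbol table; that step is left out here to keep the sketch short.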



Syntax Analyzer
 A Syntax Analyzer creates the syntactic structure (generally a parse tree) of the
given program.
 A syntax analyzer is also called a parser.
 A parse tree describes a syntactic structure.

                 assgstmt
               /     |     \
     identifier     :=     expression
         |               /     |     \
      newval    expression     +     expression
                    |                    |
                identifier             number
                    |                    |
                 oldval                  12

 In a parse tree, all terminals are at the leaves.

 All inner nodes are non-terminals of a context-free grammar.



Syntax Analyzer (CFG)

 The syntax of a language is specified by a context free grammar (CFG).


 The rules in a CFG are mostly recursive.
 A syntax analyzer checks whether a given program satisfies the rules implied by a CFG.
– If it does, the syntax analyzer creates a parse tree for the given program.

 Ex: We use BNF (Backus-Naur Form) to specify a CFG:


assgstmt -> identifier := expression
expression -> identifier
expression -> number
expression -> expression + expression
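The grammar above can be turned into a small hand-written parser. The Python sketch below is one possible illustration, not the only way to do it: because the rule expression -> expression + expression is left-recursive (and ambiguous), the sketch replaces it with a loop and groups + to the left. The token format, a list of (token name, lexeme) pairs, and the nested-tuple tree are assumptions made for this example.

```python
def parse_assgstmt(tokens):
    """tokens: list of (token name, lexeme) pairs; returns a nested-tuple parse tree."""
    pos = 0

    def expect(name):
        # Consume one token of the given kind or report a syntax error.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != name:
            raise SyntaxError(f"expected {name} at token {pos}")
        lexeme = tokens[pos][1]
        pos += 1
        return lexeme

    def operand():
        # expression -> identifier | number
        if pos < len(tokens) and tokens[pos][0] == "identifier":
            return ("expression", ("identifier", expect("identifier")))
        return ("expression", ("number", expect("number")))

    def expression():
        # expression -> operand { + operand }: left recursion replaced by a loop
        tree = operand()
        while pos < len(tokens) and tokens[pos][0] == "+":
            expect("+")
            tree = ("expression", tree, "+", operand())
        return tree

    # assgstmt -> identifier := expression
    target = expect("identifier")
    expect(":=")
    tree = ("assgstmt", ("identifier", target), ":=", expression())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at {pos}")
    return tree

example = [("identifier", "newval"), (":=", ":="),
           ("identifier", "oldval"), ("+", "+"), ("number", "12")]
print(parse_assgstmt(example))
```

For the statement newval := oldval + 12 the result has assgstmt at the root and the lexemes newval, oldval and 12 at the leaves.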



Syntax Analyzer versus Lexical Analyzer
 Which constructs of a program should be recognized by the lexical analyzer, and which
ones by the syntax analyzer?
 Both of them do similar things, but the lexical analyzer deals with simple non-recursive
constructs of the language.
 The syntax analyzer deals with recursive constructs of the language.
 The lexical analyzer simplifies the job of the syntax analyzer.
 The lexical analyzer recognizes the smallest meaningful units (tokens) in a
source program.
 The syntax analyzer works on those tokens to recognize meaningful structures
of the programming language.



Parsing Techniques
 Depending on how the parse tree is created, there are different parsing techniques.
 These parsing techniques are categorized into two groups:
    Top-Down Parsing
    Bottom-Up Parsing
 Top-Down Parsing:
    Construction of the parse tree starts at the root and proceeds towards the leaves.
    Efficient top-down parsers can be easily constructed by hand.
    Examples: Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing).
 Bottom-Up Parsing:
    Construction of the parse tree starts at the leaves and proceeds towards the root.
    Normally, efficient bottom-up parsers are created with the help of software tools.
    Bottom-up parsing is also known as shift-reduce parsing.
    Operator-Precedence Parsing: simple, restrictive, easy to implement
    LR Parsing: a much more general form of shift-reduce parsing (LR, SLR, LALR)



Semantic Analyzer
 A semantic analyzer checks the source program for semantic errors and collects the type
information for code generation.
 Type checking is an important part of the semantic analyzer.
 Normally, semantic information cannot be represented by the context-free grammars used in
syntax analyzers.
 Context-free grammars used in syntax analysis are therefore augmented with attributes (semantic
rules):
    the result is a syntax-directed translation,
    attribute grammars
 Ex:
newval := oldval + 12

• The type of the identifier newval must match the type of the expression
(oldval+12)
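The check in this example can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: types are plain strings, the names expr_type and check_assignment are hypothetical, and "match" is taken to mean strict equality (real languages usually also allow some conversions).

```python
def expr_type(expr, symbol_types):
    """expr is an identifier name (str), a numeric literal, or ('+', left, right)."""
    if isinstance(expr, tuple):
        op, left, right = expr
        lt = expr_type(left, symbol_types)
        rt = expr_type(right, symbol_types)
        if lt != rt:
            raise TypeError(f"operands of {op} have types {lt} and {rt}")
        return lt
    if isinstance(expr, int):
        return "integer"
    if isinstance(expr, float):
        return "real"
    if expr not in symbol_types:
        raise NameError(f"undeclared identifier {expr}")
    return symbol_types[expr]

def check_assignment(target, expr, symbol_types):
    """Return the common type, or raise TypeError if the two sides differ."""
    target_type = symbol_types[target]
    value_type = expr_type(expr, symbol_types)
    if target_type != value_type:
        raise TypeError(f"cannot assign {value_type} to {target} ({target_type})")
    return target_type

types = {"newval": "integer", "oldval": "integer"}
print(check_assignment("newval", ("+", "oldval", 12), types))
```

The dictionary of declared types plays the role of the symbol table that the semantic analyzer consults.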



Intermediate Code Generation
 A compiler may produce explicit intermediate code representing the source program.
 This intermediate code is generally machine (architecture) independent, but its level
is close to that of machine code.
 Ex:
newval := oldval * fact + 1

id1 := id2 * id3 + 1

Intermediate code (quadruples):
MULT id2,id3,temp1
ADD temp1,#1,temp2
MOV temp2,,id1
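One way to see where the three instructions come from is to walk the expression tree bottom-up and give every operator node a fresh temporary. The Python sketch below assumes a nested-tuple tree and the opcode names used in this example; gen_quadruples is a hypothetical helper, not part of any standard tool.

```python
OPCODES = {"*": "MULT", "+": "ADD"}  # opcode names taken from the example above

def gen_quadruples(target, expr):
    """Return a list of (op, arg1, arg2, result) tuples for target := expr."""
    quads = []
    counter = 0

    def operand(node):
        # Emit code for node (if needed) and return the name holding its value.
        nonlocal counter
        if isinstance(node, tuple):
            op, left, right = node
            a, b = operand(left), operand(right)
            counter += 1
            temp = f"temp{counter}"
            quads.append((OPCODES[op], a, b, temp))
            return temp
        if isinstance(node, int):
            return f"#{node}"   # constants are written with a leading #
        return node             # identifiers are used as they are

    quads.append(("MOV", operand(expr), "", target))
    return quads

for quad in gen_quadruples("id1", ("+", ("*", "id2", "id3"), 1)):
    print(quad)
```

For id1 := id2 * id3 + 1 this produces the MULT, ADD and MOV quadruples in the order listed in the example.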



Code Optimizer (for Intermediate Code Generator)
 The code optimizer improves the code produced by the intermediate code generator in
terms of time and space.

 Ex:

MULT id2,id3,temp1
ADD temp1,#1,id1
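The improvement shown here (the final MOV disappears and ADD writes straight into id1) can be sketched as a tiny peephole pass. The Python below is an illustration only; it assumes quadruples are (op, arg1, arg2, result) tuples and that a temporary copied by MOV is not used anywhere else, which a real optimizer would have to verify.

```python
def fold_final_move(quads):
    """Merge 'op a,b,tempN' followed by 'MOV tempN,,x' into 'op a,b,x'."""
    out = []
    for quad in quads:
        op, a, b, result = quad
        copies_previous_temp = (
            op == "MOV" and b == "" and out
            and out[-1][3] == a and a.startswith("temp")
        )
        if copies_previous_temp:
            prev = out.pop()
            out.append((prev[0], prev[1], prev[2], result))
        else:
            out.append(quad)
    return out

before = [
    ("MULT", "id2", "id3", "temp1"),
    ("ADD", "temp1", "#1", "temp2"),
    ("MOV", "temp2", "", "id1"),
]
for quad in fold_final_move(before):
    print(quad)
```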
Code Generator
 Produces the target language for a specific architecture.
 The target program is normally a relocatable object file containing the machine
code.
 Ex:
(assume an architecture in which at least one operand of each instruction is a
machine register)

MOVE id2,R1
MULT id3,R1
ADD #1,R1
MOVE R1,id1
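A very naive generator for this one-register machine can be sketched as follows. It is an illustration under strong assumptions: there is a single register R1, quadruples are (op, arg1, arg2, result) tuples, and every temporary is consumed as the first operand of the very next quadruple. The function name gen_target is hypothetical.

```python
def gen_target(quads, reg="R1"):
    """Translate quadruples into two-address instructions using one register."""
    code = []
    in_reg = None                      # name of the value currently held in reg
    for op, a, b, result in quads:
        if a != in_reg:
            code.append(f"MOVE {a},{reg}")     # load the first operand
        code.append(f"{op} {b},{reg}")         # reg := reg op b
        in_reg = result
        if not result.startswith("temp"):
            code.append(f"MOVE {reg},{result}")  # store named results to memory
    return code

optimized = [("MULT", "id2", "id3", "temp1"), ("ADD", "temp1", "#1", "id1")]
print("\n".join(gen_target(optimized)))
```

Applied to the two optimized quadruples, the sketch emits the same four instructions as the example.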
Symbol-Table Management
 An essential function of a compiler is to record the identifiers used in the source
program and collect information about various attributes of each identifier.
 These attributes may provide information about the storage allocated for an identifier,
its type, its scope, and,
 in the case of procedure names, such things as the number and types of its arguments,
the method of passing each argument, and the type returned, if any.
 A symbol table is a data structure containing a record for each identifier, with fields
for the attributes of the identifier.
 When an identifier in the source program is detected by the lexical analyzer, the
identifier is entered into the symbol table.
 The remaining phases enter information about identifiers into the symbol table and
then use this information in various ways.
 Ex. when doing semantic analysis and intermediate code generation,
 we need to know what the types of identifiers are, so we can check that the source
program uses them in valid ways,
 and so that we can generate the proper operations on them.
 The code generator typically enters and uses detailed information about the storage
assigned to identifiers.
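A symbol table of this kind can be sketched as a dictionary of records. The class and method names below (SymbolTable, enter, set_attribute, lookup) are assumptions for illustration; scopes are ignored to keep the example small.

```python
class SymbolTable:
    """One record per identifier; each record is a dict of attributes."""

    def __init__(self):
        self._records = {}

    def enter(self, name):
        # Called when the lexical analyzer sees an identifier:
        # create the record on first sight, otherwise return the existing one.
        return self._records.setdefault(name, {"name": name})

    def set_attribute(self, name, key, value):
        # Later phases add information such as type, scope or storage.
        self._records[name][key] = value

    def lookup(self, name):
        return self._records.get(name)

table = SymbolTable()
table.enter("newval")
table.set_attribute("newval", "type", "integer")   # from semantic analysis
table.set_attribute("newval", "offset", 0)         # from the code generator
print(table.lookup("newval"))
```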
Error Detection and Reporting

 Each phase can encounter errors.


 However, after detecting an error, a phase must somehow deal with that error,
so that compilation can proceed, allowing further errors in the source program to be
detected.
 A compiler that stops when it finds the first error is not as helpful as it could be.
 The syntax and semantic analysis phases usually handle a large fraction of the errors
detectable by the compiler.
 The lexical phase can detect errors where the characters remaining in the input do not
form any token of the language.
 Errors where the token stream violates the structure rules (syntax) of the language are
determined by the syntax analysis phase.
 During semantic analysis the compiler tries to detect constructs that have the right
syntactic structure but no meaning for the operation involved,
 e.g. an attempt to add two identifiers, one of which is the name of an array and the other the name of
a procedure.

Cousins of the Compiler
The term "cousins of the compiler" refers to the context in which a compiler typically
operates.
The cousins of the compiler are:
 Preprocessor.
 Assembler.
 Loader and Link-editor.

Preprocessor: A preprocessor is a program that processes its input data to produce
output that is used as input to another program. The preprocessor is executed before
the actual compilation of code begins. Preprocessors may perform the following functions:
1. Macro processing
2. File Inclusion
3. Rational Preprocessors
4. Language extension



 ASSEMBLERS take assembly code and convert it to machine code.

 Some compilers go directly to machine code; others produce assembly code
and then call a separate assembler.

 Either way, the output machine code is usually RELOCATABLE, with
memory addresses starting at location 0.

 Assembly code: names are used for instructions, and names are used for
memory addresses.

 Two-pass Assembly:
    First Pass: all identifiers are assigned to memory addresses (0-offset),
   e.g. substitute 0 for a, and 4 for b
    Second Pass: produce relocatable machine code:
   0001 01 00 00000000 *
   0011 01 10 00000010
   0010 01 00 00000100 *
   (* marks the relocation bit)
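The two passes can be sketched in Python as follows. Everything specific here is an assumption read off the three machine words above: a 4-bit opcode, a 2-bit register number, a 2-bit tag (00 for a memory address, 10 for an immediate constant), an 8-bit operand, 4-byte data items, and the mnemonics LOAD, ADD and STORE for the three opcodes. The input program is likewise a hypothetical example.

```python
WORD = 4  # assumed size of each data item in bytes
OPCODES = {"LOAD": "0001", "STORE": "0010", "ADD": "0011"}  # assumed encoding

def first_pass(program):
    """Assign each identifier operand an offset from 0, in order of first use."""
    table = {}
    for _op, _reg, operand in program:
        if not operand.startswith("#") and operand not in table:
            table[operand] = len(table) * WORD
    return table

def second_pass(program, table):
    """Produce one relocatable machine word per instruction."""
    words = []
    for op, reg, operand in program:
        if operand.startswith("#"):
            tag, value, mark = "10", int(operand[1:]), ""    # immediate constant
        else:
            tag, value, mark = "00", table[operand], " *"    # relocatable address
        words.append(f"{OPCODES[op]} {reg:02b} {tag} {value:08b}{mark}")
    return words

program = [("LOAD", 1, "a"), ("ADD", 1, "#2"), ("STORE", 1, "b")]
addresses = first_pass(program)
print(addresses)
print("\n".join(second_pass(program, addresses)))
```

The first pass yields offset 0 for a and 4 for b, and the second pass marks the two address-bearing words with * so that a loader can later adjust them.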
LOADER AND LINK-EDITOR

 LOADERS take relocatable machine code and alter the addresses, putting the
instructions and data in a particular location in memory.

 The LINK EDITOR (part of the loader) pieces together a complete program from
several independently compiled parts.

 Loader: takes relocatable machine code, alters the addresses, and places the
altered instructions into memory.
 Link-editor: takes many (relocatable) machine code programs (with cross-references)
and produces a single file.
 The link-editor needs to keep track of the correspondence between variable names and
the corresponding addresses in each piece of code.



The Grouping of Phases into Passes
 Logical organization of a compiler
 Activities from several phases may be grouped together into a pass that reads an input
file and writes an output file.
 For example,
 The front-end phases of lexical analysis, syntactic analysis, semantic analysis,
and intermediate code generation might be grouped into one pass.
 Code optimization might be an optional pass.
 There could be a back-end pass consisting of code generation for a particular
machine.
 Compiler front and back ends:
 Front end: analysis (machine independent)
 Back end: synthesis (machine dependent)
 Compiler passes:
A collection of phases is done only once (single pass) or multiple times (multi pass)
 Single pass: usually requires everything to be defined before being used in
source program
 Multi pass: compiler may have to keep entire program representation in
memory



Compiler-Construction Tools:
Some commonly used compiler-construction tools include
1. Parser generators
• Automatically produce syntax analyzers from a grammatical description
of a programming language.
2. Scanner generators
• Produce lexical analyzers from a regular-expression description of the
tokens of a language.
3. Syntax-directed translation engines
• Produce a collection of routines for walking a parse tree and generating
intermediate code.
4. Code-generator generators
• Produce a code generator from a collection of rules for translating each
operation of the intermediate language into the machine language of the
target machine.
5. Data-flow analysis engines
• Facilitate the gathering of information about how values are transmitted
from one part of a program to each other part. Key part of code
optimization.
6. Compiler-construction toolkits
• Provide an integrated set of routines for constructing various phases of a
compiler.
LEXICAL ANALYSIS

Typical tasks of the lexical analyzer:


 Remove white space and comments
 Encode constants as tokens
 Recognize keywords
 Recognize identifiers and store identifier names in a global symbol table

Issues in Lexical Analysis


 There are several reasons for separating the analysis phase of compiling into
lexical analysis and parsing:
 Simpler design
 Compiler efficiency
 Compiler portability
 Specialized tools have been designed to help automate the construction of lexical
analyzer and parser when they are separated.



The Role of The Lexical Analyzer

                           token
Source program --> Lexical Analyzer -----------> Parser --> To semantic analysis
                          |         <-----------    |
                          |         getNextToken    |
                          +------ Symbol table -----+

 Lexical Analyzer reads the source program character by character to
produce tokens.
 Normally a lexical analyzer doesn't return a list of tokens in one shot;
it returns a token when the parser asks for one.
 Each time the parser needs a token, it sends a request to the scanner.
 The scanner reads as many characters from the input stream as necessary to
construct a single token.
 When a single token is formed, the scanner is suspended and returns the token to
the parser.
 The parser repeatedly calls the scanner to read all the tokens from the input
stream.
 The main job of the lexical analyzer is to scan the input program and break it into
the sequence of tokens used by the syntax analyzer.
 A secondary task of the lexical analyzer is to remove white space.
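The request-driven interaction described above can be sketched with a small scanner object whose get_next_token method reads only as far as one token. The class name Scanner, the token names and the three token patterns are assumptions chosen to fit the earlier example statement.

```python
import re

# Optional white space followed by exactly one token.
TOKEN = re.compile(
    r"\s*(?:(?P<number>\d+)|(?P<identifier>[A-Za-z]\w*)|(?P<operator>:=|\+))"
)

class Scanner:
    def __init__(self, text):
        self.text = text
        self.pos = 0          # how far the input has been consumed so far

    def get_next_token(self):
        """Read just enough characters to form one token; None at end of input."""
        m = TOKEN.match(self.text, self.pos)
        if m is None:
            rest = self.text[self.pos:].lstrip()
            if rest:
                raise SyntaxError(f"unexpected character {rest[0]!r}")
            return None
        self.pos = m.end()
        return (m.lastgroup, m.group(m.lastgroup))

# The "parser" side: keep asking for tokens until the input is exhausted.
scanner = Scanner("newval := oldval + 12")
token = scanner.get_next_token()
while token is not None:
    print(token)
    token = scanner.get_next_token()
```

Between two calls the scanner keeps only its position in the input, so no complete token list is ever built.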
Why Separate Lexical Analysis and Parsing

 Simplicity of design
 Improving compiler efficiency
 Enhancing compiler portability

Tokens, Patterns and Lexemes

 A token is a pair consisting of a token name and an optional token value


 A pattern is a description of the form that the lexemes of a token may take
 A lexeme is a sequence of characters in the source program that matches
the pattern for a token

Example

const pi = 3.1416;
The substring pi is a lexeme for the token “identifier.”
Operations on Languages



Regular expressions
• A regular expression is a compact notation for describing a set of strings.
• In Pascal, an identifier is a letter followed by zero or more letters or digits
→ letter(letter|digit)*
• |: or
• *: zero or more instances of
• a(a|d)*
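The identifier pattern letter(letter|digit)* corresponds to a two-state deterministic finite automaton. The Python sketch below spells that automaton out as a transition table; the state names start and in_id and the helper char_class are assumptions for illustration, and only ASCII letters and digits are considered.

```python
import string

def char_class(ch):
    """Map a character to the input symbol the automaton works with."""
    if ch in string.ascii_letters:
        return "letter"
    if ch in string.digits:
        return "digit"
    return "other"

# (state, input class) -> next state; missing entries mean "reject".
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
}

def is_identifier(text):
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state == "in_id"   # in_id is the only accepting state

for sample in ["newval", "x1", "1x", "", "a_b"]:
    print(repr(sample), is_identifier(sample))
```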

Rules
• ε is a regular expression that denotes {ε}, the set containing the empty string.
• If a is a symbol in Σ, then a is a regular expression that denotes {a}, the set containing the
string a.
• Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then
• (r)|(s) is a regular expression denoting L(r) ∪ L(s).
• (r)(s) is a regular expression denoting L(r)L(s).
• (r)* is a regular expression denoting (L(r))*.
• (r) is a regular expression denoting L(r).

Precedence Conventions
• The unary operator * has the highest precedence and is left associative.
• Concatenation has the second highest precedence and is left associative.
• | has the lowest precedence and is left associative.
• (a)|(b)*(c) → a|b*c
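These conventions can be checked experimentally with Python's re module, whose operators follow the same precedence for |, concatenation and *. The sample strings are arbitrary choices for this sketch; fullmatch is used so that the whole string must belong to the language.

```python
import re

plain = re.compile(r"a|b*c")              # relies on the precedence conventions
explicit = re.compile(r"(a)|((b)*(c))")   # the same expression, fully parenthesized
grouped = re.compile(r"(a|b)*c")          # a different language, for contrast

samples = ["a", "c", "bc", "bbbc", "ac", "ab", "", "b"]

# The plain and fully parenthesized forms accept exactly the same samples.
for s in samples:
    assert bool(plain.fullmatch(s)) == bool(explicit.fullmatch(s))

accepted = [s for s in samples if plain.fullmatch(s)]
print(accepted)

# Changing the grouping changes the language: "ac" is now accepted.
print(bool(grouped.fullmatch("ac")), bool(plain.fullmatch("ac")))
```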
Example of Regular Expressions



Properties of Regular Expression

Regular Definitions
• If Σ is an alphabet of basic symbols, then a regular definition is a sequence of definitions
of the form:
d1→r1
d2→r2
...
dn→rn
• where each di is a distinct name, and each ri is a regular expression over the symbols in
Σ ∪ {d1, d2, …, di-1}, i.e., the basic symbols and the previously defined names.
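A regular definition can be mimicked in Python by expanding each name into the expressions defined before it. In this sketch the {name} reference syntax, the helper expand and the three definitions letter, digit and id are assumptions chosen for illustration.

```python
import re

def expand(ordered_definitions):
    """Replace each {name} by the already expanded expression of an earlier definition."""
    expanded = {}
    for name, body in ordered_definitions:
        for earlier, rx in expanded.items():
            body = body.replace("{" + earlier + "}", "(?:" + rx + ")")
        expanded[name] = body
    return expanded

definitions = [
    ("letter", "[A-Za-z]"),
    ("digit", "[0-9]"),
    ("id", "{letter}({letter}|{digit})*"),   # may use only letter and digit
]

patterns = expand(definitions)
identifier = re.compile(patterns["id"])
print(patterns["id"])
print(bool(identifier.fullmatch("newval")), bool(identifier.fullmatch("12")))
```

Because each definition may refer only to earlier names, a single pass in definition order is enough and no definition can be recursive.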
Examples of Regular Definitions

LEX Tool
