CD ch2

The document discusses lexical analysis which involves breaking a program into tokens. A lexical analyzer reads the source code character by character and returns tokens one by one to the parser. Regular expressions are used to specify patterns to identify tokens. A transition diagram is also used which shows the states involved in recognizing tokens based on the input characters.

Uploaded by

Riya


SECTION 1.1
LEXICAL ANALYSIS- INTRODUCTION
LEXICAL ANALYZER

• The lexical analyzer reads the source program character by character to produce tokens.
• Normally a lexical analyzer does not return a list of all tokens in one shot; it returns a token each time the parser asks for one.

[Figure: source program → Lexical Analyzer → (token) → Parser; the parser sends "get next token" requests back to the lexical analyzer, and both use the Symbol Table]
ROLES OF THE LEXICAL ANALYSER

The lexical analyzer performs the following tasks:

• Reads input characters from the source program
• Identifies tokens and enters them into the symbol table
• Removes white space and comments from the source program
• Correlates error messages with the source program
• Expands macros if any are found in the source program

TOKENS, LEXEMES AND PATTERNS

• Token: a sequence of characters that can be treated as a single logical entity. Typical tokens are:
  1) identifiers 2) keywords 3) operators 4) special symbols 5) constants

• Lexeme: a sequence of characters in the source program that is matched by the pattern for a token.

• Pattern: a rule describing the set of input strings for which the same token is produced as output.

Token     Lexeme (element of a kind)   Pattern
ID        x  y  n_0                    letter followed by letters and digits
NUM       -123  1.456e-5               any numeric constant
IF        if                           if
LPAREN    (                            (
LITERAL   ``Hello''                    any string of characters (except ``) between `` and ``
 Regular expressions are widely used to specify patterns.

EXAMPLE

#include <stdio.h>
int maximum(int x, int y){
// This will compare 2 numbers

Tokens generated:

Lexeme     Token
int        Keyword
maximum    Identifier
(          Operator
int        Keyword
x          Identifier
,          Operator
int        Keyword
y          Identifier
)          Operator
{          Operator

Non-tokens:

Type                      Examples
Comment                   // This will compare 2 numbers
Pre-processor directive   #include <stdio.h>
Whitespace                \n \b \t
TERMINOLOGY OF LANGUAGES

• Alphabet: a finite set of symbols (e.g. the ASCII characters)
• String:
  • A finite sequence of symbols over an alphabet
  • "Sentence" and "word" are also used to mean string
  • ε is the empty string
  • |s| is the length of string s
• Language: a set of strings over some fixed alphabet
  • ∅, the empty set, is a language
  • {ε}, the set containing only the empty string, is a language
  • The set of well-formed C programs is a language
  • The set of all possible identifiers is a language
• Operators on strings:
  • Concatenation: xy represents the concatenation of strings x and y
OPERATIONS ON LANGUAGES

• Concatenation:
  L1L2 = { s1s2 | s1 ∈ L1 and s2 ∈ L2 }
• Union:
  L1 ∪ L2 = { s | s ∈ L1 or s ∈ L2 }
• Exponentiation:
  L^0 = {ε}, L^1 = L, L^2 = LL
• Kleene closure:
  L^* = \bigcup_{i=0}^{\infty} L^i
• Positive closure:
  L^+ = \bigcup_{i=1}^{\infty} L^i
EXAMPLE
• L1 = {a,b,c,d}, L2 = {1,2}
• L1L2 = {a1,a2,b1,b2,c1,c2,d1,d2}
• L1 ∪ L2 = {a,b,c,d,1,2}
• L1^3 = all strings of length three over {a,b,c,d}
• L1* = all strings over {a,b,c,d}, including the empty string
• L1+ = all strings over {a,b,c,d}, excluding the empty string

REGULAR EXPRESSIONS

• We use regular expressions to describe the tokens of a programming language.
• A regular expression is built up from simpler regular expressions (using defining rules).
• Each regular expression denotes a language.
• A language denoted by a regular expression is called a regular set.
REGULAR EXPRESSIONS (RULES)
Regular expressions over alphabet Σ

Reg. Expr      Language it denotes
ε              {ε}
a ∈ Σ          {a}
(r1) | (r2)    L(r1) ∪ L(r2)
(r1) (r2)      L(r1) L(r2)
(r)*           (L(r))*
(r)            L(r)

• (r)+ = (r)(r)*
• (r)? = (r) | ε
REGULAR EXPRESSIONS (CONT.)
• We may remove parentheses by using precedence rules:
  • * highest
  • concatenation next
  • | lowest
• ab*|c means (a(b)*)|(c)

• Ex:
  • Σ = {0,1}
  • 0|1 => {0,1}
  • (0|1)(0|1) => {00,01,10,11}
  • 0* => {ε,0,00,000,0000,....}
  • (0|1)* => all strings of 0s and 1s, including the empty string
REGULAR DEFINITIONS
• Writing a regular expression for some languages can be difficult because the expression becomes quite complex. In those cases, we may use regular definitions.
• We can give names to regular expressions, and we can use these names as symbols to define other regular expressions.
• A regular definition is a sequence of definitions of the form:

  d1 → r1
  d2 → r2
  ...
  dn → rn

  where each di is a distinct name and each ri is a regular expression over the symbols in Σ ∪ {d1,d2,...,di-1}, i.e. the basic symbols plus the previously defined names.
REGULAR DEFINITIONS (CONT.)

 Ex: Identifiers in Pascal

letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter (letter | digit ) *
 If we try to write the regular expression representing identifiers without using regular
definitions, that regular expression will be complex.
(A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) ) *

 Ex: Unsigned numbers in Pascal

digit → 0 | 1 | ... | 9
digits → digit +
opt-fraction → ( . digits ) ?
opt-exponent → ( E (+|-)? digits ) ?
unsigned-num → digits opt-fraction opt-exponent
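
The regular definition for unsigned numbers can be turned into a hand-written matcher almost one rule at a time. The following is a minimal sketch, not a generated scanner: the function names `digits` and `is_unsigned_num` are illustrative assumptions, and the function checks a whole string rather than finding the longest matching prefix as a real lexer would.

```c
#include <ctype.h>
#include <stdbool.h>

/* digits -> digit+ : consume one or more digits at *p.
 * Returns false (and leaves *p unchanged) if there is no digit. */
static bool digits(const char **p)
{
    if (!isdigit((unsigned char)**p))
        return false;
    while (isdigit((unsigned char)**p))
        (*p)++;
    return true;
}

/* unsigned-num -> digits opt-fraction opt-exponent
 * Returns true only if the whole string s matches. */
bool is_unsigned_num(const char *s)
{
    const char *p = s;
    if (!digits(&p))
        return false;
    if (*p == '.') {                    /* opt-fraction -> ( . digits )? */
        p++;
        if (!digits(&p))
            return false;
    }
    if (*p == 'E') {                    /* opt-exponent -> ( E (+|-)? digits )? */
        p++;
        if (*p == '+' || *p == '-')
            p++;
        if (!digits(&p))
            return false;
    }
    return *p == '\0';                  /* nothing may be left over */
}
```

Under these assumptions, `is_unsigned_num("1.456E-5")` succeeds, while `"1."` and `".5"` are rejected because both `digits` parts are mandatory once their leading symbol is present.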
NOTATIONAL SHORTHAND
 The following shorthand are often used:

r+ = rr*
r? = r│ε
[a-z] = a │ b │ c │ … │ z

 Examples:
digit → [0-9]
digits → digit+
optional_fraction → (. digits)?
optional_exponent → ( E (+ │ -)? digit+ )?
num → digits optional_fraction optional_exponent
RECOGNITION OF TOKENS
• e.g. a grammar fragment:
  stmt → if expr then stmt
       │ if expr then stmt else stmt
       │ ε
  expr → term relop term
       │ term
  term → id
       │ num

• Regular definitions for its tokens:
  if    → if
  then  → then
  else  → else
  relop → < │ <= │ = │ <> │ > │ >=
  id    → letter (letter │ digit)*
  num   → digits optional_fraction optional_exponent

Assumptions
  delim → blank │ tab │ newline
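TRANSITION DIAGRAMS

relop → < │ <= │ <> │ > │ >= │ =

[Figure: transition diagram for relop, start state 0. States marked * retract the input by one character.]
  0 --'<'--> 1
  1 --'='--> 2          return(relop, LE)
  1 --'>'--> 3          return(relop, NE)
  1 --other--> 4 *      return(relop, LT)
  0 --'='--> 5          return(relop, EQ)
  0 --'>'--> 6
  6 --'='--> 7          return(relop, GE)
  6 --other--> 8 *      return(relop, GT)

id → letter ( letter │ digit )*

[Figure: transition diagram for id, start state 9.]
  9 --letter--> 10
  10 --letter or digit--> 10
  10 --other--> 11 *    return(gettoken(), install_id())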
TRANSITION DIAGRAMS: CODE

token nexttoken()
{   while (1) {
        switch (state) {
        case 0: c = nextchar();
            if (c==blank || c==tab || c==newline) {
                state = 0;
                lexeme_beginning++;
            }
            else if (c=='<') state = 1;
            else if (c=='=') state = 5;
            else if (c=='>') state = 6;
            else state = fail();
            break;
        case 1:
            …
        case 9: c = nextchar();
            if (isletter(c)) state = 10;
            else state = fail();
            break;
        case 10: c = nextchar();
            if (isletter(c)) state = 10;
            else if (isdigit(c)) state = 10;
            else state = 11;
            break;
        …

/* Decides the next start state to check */
int fail()
{   forward = token_beginning;
    switch (start) {
    case 0: start = 9; break;
    case 9: start = 12; break;
    case 12: start = 20; break;
    case 20: start = 25; break;
    case 25: recover(); break;
    default: /* error */ ;
    }
    return start;
}
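
States 0 to 8 of the relop diagram can also be written as a small self-contained function. This is only a sketch under stated assumptions: the names `Relop`, `NOT_RELOP` and `scan_relop` are hypothetical, the input is a NUL-terminated string rather than the buffered input of `nexttoken()`, and "retracting" for the starred states is modelled by simply not counting the lookahead character in the lexeme length.

```c
#include <stddef.h>

typedef enum { LT, LE, EQ, NE, GT, GE, NOT_RELOP } Relop;

/* Follow the relop transition diagram on the start of s.
 * On success, *len receives the lexeme length and the attribute is returned.
 * On failure, *len is 0 and NOT_RELOP is returned (the caller would then
 * try the next diagram, as fail() does in the slides). */
Relop scan_relop(const char *s, size_t *len)
{
    switch (s[0]) {                                  /* state 0 */
    case '<':                                        /* state 1 */
        if (s[1] == '=') { *len = 2; return LE; }    /* state 2 */
        if (s[1] == '>') { *len = 2; return NE; }    /* state 3 */
        *len = 1; return LT;                         /* state 4 (*) */
    case '=':
        *len = 1; return EQ;                         /* state 5 */
    case '>':                                        /* state 6 */
        if (s[1] == '=') { *len = 2; return GE; }    /* state 7 */
        *len = 1; return GT;                         /* state 8 (*) */
    default:
        *len = 0; return NOT_RELOP;
    }
}
```

For the input `"<5"` the function returns `LT` with a length of 1: the character `5` was inspected but is left for the next token, which is what the `*` on state 4 expresses.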
THE LEX AND FLEX SCANNER GENERATORS

 Lex and its newer cousin flex are scanner generators

 Systematically translate regular definitions into C source code

for efficient scanning

 Generated code is easy to integrate in C applications

CREATING A LEXICAL ANALYZER WITH LEX AND FLEX

lex source program (lex.l) → [lex or flex compiler] → lex.yy.c
lex.yy.c → [C compiler] → a.out
input stream → [a.out] → sequence of tokens
LEX SPECIFICATION

 A lex specification consists of three parts:

regular definitions, C declarations in %{ %}
%%
translation rules
%%
user-defined auxiliary procedures
 The translation rules are of the form:
p1 { action1 }
p2 { action2 }
…
pn { actionn }
REGULAR EXPRESSIONS IN LEX
x         match the character x
\.        match the character .
"string"  match the contents of the string of characters
.         match any character except newline
^         match the beginning of a line
$         match the end of a line
[xyz]     match one character x, y, or z (use \ to escape -)
[^xyz]    match any character except x, y, and z
[a-z]     match one of a to z
r*        closure (match zero or more occurrences)
r+        positive closure (match one or more occurrences)
r?        optional (match zero or one occurrence)
r1r2      match r1 then r2 (concatenation)
r1|r2     match r1 or r2 (union)
(r)       grouping
r1/r2     match r1 when followed by r2
{d}       match the regular expression defined by d
STAR OPERATION (KLEENE CLOSURE)
a* = {a^0, a^1, a^2, a^3, a^4, …} = {ε, a, aa, aaa, aaaa, …}
Important Characteristics
➢ The exponent of a ranges from 0 upward without bound, i.e. the set a* includes a^0, a^1, a^2, a^3, a^4, a^5, …
➢ a^0 means zero a’s, and this is represented by ε.
➢ * is represented in a finite automaton by a loop on a particular state; for a^3 the loop iterates 3 times.
➢ For a^0 the loop does not iterate at all.

[Figure: machine for a*]

POSITIVE CLOSURE
a+ = {a^1, a^2, a^3, a^4, …} = {a, aa, aaa, aaaa, …}
Important Characteristics
➢ The exponent of a ranges from 1 upward without bound, i.e. the set a+ includes a^1, a^2, a^3, a^4, a^5, …
➢ There is no a^0, i.e. ε is not part of this set.
➢ There is always at least one a, which can be followed by zero or more a’s.
➢ Please remember: a+ = a.a*

[Figure: machine for a+: q0 --a--> qf, with a self-loop on a at qf]
CONCATENATION OPERATION
Concatenation means joining (a.b)
Important Note: a.b ≠ b.a, i.e. the order of joining changes the design of the automaton

[Figure: machine for a: q0 --a--> qf; machine for b: q0 --b--> qf]
[Figure: machine for a.b: q0 --a--> q1 --b--> qf; machine for b.a: q0 --b--> q1 --a--> qf]

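OR OPERATION

[Figure: machine for a: q0 --a--> qf; machine for b: q0 --b--> qf]

NFA for a+b (a/b)

[Figure: machine for a/b: q0 has an a-edge to one final state and a b-edge to another final state]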
SECTION 1.2
INTRODUCTION TO FINITE AUTOMATA
FINITE AUTOMATA
An automaton is an abstract machine.
A finite automaton is a 5-tuple:
M = (Q, Σ, δ, q0, F)
Q    A finite set of states
Σ    A finite input alphabet
δ    A transition function
q0   The initial/starting state; q0 is in Q
F    The set of final/accepting states, which is a subset of Q
TYPES OF AUTOMATA

There are two types of finite Automata:

➢ Deterministic Finite Automata (DFA)

➢ Non-deterministic finite Automata (NFA)

DETERMINISTIC FINITE AUTOMATA
A Deterministic Finite Automaton is a machine where, for every input symbol of Σ, there is exactly one transition out of every state.

[Figure: example DFA over states q0, q1, q2 and qf]

Here Σ = {a, b}, and every state has one transition on ‘a’ and one transition on ‘b’. No state has more than one transition on a or on b.
NON-DETERMINISTIC FINITE AUTOMATA

A Non-Deterministic Finite Automaton is a machine where, for a single input symbol of Σ = {a, b}, there can be more than one transition out of a particular state.

[Figure: example NFA over states q0, q1, q2 and qf]

Here state q0 has two moves on ‘a’, one to q1 and the other to q2; likewise state q2 has two moves on ‘b’, one to q1 and another to qf.
TYPES OF NFA
There are two type of NFA

i. NFA without ε -move

ii. NFA with ε -move

NFA WITH Ε-MOVE

Consider the following NFA; here state q1 has an ε-move.

[Figure: NFA over states q0, q1 and qf with edges labelled a and b and one ε-edge]
DIFFERENCE BETWEEN DFA AND NFA

Deterministic Finite Automata
• For every input symbol of Σ, there can be only one transition out of every state.
• A DFA will not have ε-moves.

Non-Deterministic Finite Automata
• For a single input symbol of Σ = {a, b}, there can be more than one transition out of a particular state.
• An NFA can have ε-moves.
SECTION 1.3
THOMPSON’S CONSTRUCTION
We have three operations on Regular Expressions:

i) Star operation

ii) Concatenation

iii) OR operation

For each operation we have defined rules to build a NFA with ε-move
Thompson’s Construction for Star Operation

a* = {ε, a, aa, aaa, aaaa, …}

NFA for a* using Thompson’s Construction (start state q0, final state qf):

  q0 --ε--> q1
  q1 --a--> q2
  q2 --ε--> qf
  q0 --ε--> qf    (skip a entirely)
  q2 --ε--> q1    (repeat a)

Paths through this NFA:
  Only ε:    q0→qf
  Single a:  q0→q1→q2→qf
  Two a’s:   q0→q1→q2→q1→q2→qf
  N a’s:     q0→q1→q2→q1→q2→qf, where q1→q2→q1 loops N times and N varies from 2 to ∞
THOMPSON’S CONSTRUCTION FOR CONCATENATION OPERATION

NFA for a:  q0 --a--> qf
NFA for b:  q0 --b--> qf

NFA for ab using Thompson’s Construction:
  q0 --a--> q1 --b--> qf
THOMPSON’S CONSTRUCTION FOR OR OPERATION

NFA for a:  q0 --a--> qf
NFA for b:  q0 --b--> qf

NFA for a+b (a/b) using Thompson’s Construction:
  q0 --ε--> q1 --a--> q2 --ε--> qf
  q0 --ε--> q3 --b--> q4 --ε--> qf
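
The three rules (star, concatenation, OR) compose mechanically, which a short program can make concrete. The sketch below is an illustration under assumptions, not the slides' exact construction: all names (`Frag`, `sym`, `cat`, `alt`, `star`, `nfa_accepts`) are hypothetical, fixed-size arrays stand in for a real data structure, and concatenation links the two fragments with an extra ε-edge instead of merging the middle state as the slides do. The accepted language is the same either way.

```c
#include <stdbool.h>

#define MAX_STATES 64
#define MAX_EDGES  128
#define EPS 0                            /* label 0 marks an ε-edge */

typedef struct { int from, to; char label; } Edge;
typedef struct { int start, accept; } Frag;   /* one NFA fragment */

static Edge edges[MAX_EDGES];
static int n_edges, n_states;

static void add_edge(int from, int to, char label)
{
    edges[n_edges++] = (Edge){ from, to, label };
}

static Frag new_frag(void)               /* two fresh states */
{
    Frag f = { n_states, n_states + 1 };
    n_states += 2;
    return f;
}

Frag sym(char c)                         /* start --c--> accept */
{
    Frag f = new_frag();
    add_edge(f.start, f.accept, c);
    return f;
}

Frag cat(Frag x, Frag y)                 /* x then y, linked by an ε-edge */
{
    add_edge(x.accept, y.start, EPS);
    return (Frag){ x.start, y.accept };
}

Frag alt(Frag x, Frag y)                 /* OR: new start and accept states */
{
    Frag f = new_frag();
    add_edge(f.start, x.start, EPS);
    add_edge(f.start, y.start, EPS);
    add_edge(x.accept, f.accept, EPS);
    add_edge(y.accept, f.accept, EPS);
    return f;
}

Frag star(Frag x)                        /* same four ε-edges as the a* slide */
{
    Frag f = new_frag();
    add_edge(f.start, x.start, EPS);     /* enter the body */
    add_edge(f.start, f.accept, EPS);    /* zero repetitions */
    add_edge(x.accept, x.start, EPS);    /* repeat */
    add_edge(x.accept, f.accept, EPS);   /* leave */
    return f;
}

static void close_eps(bool in[])         /* add all ε-reachable states */
{
    for (bool changed = true; changed; ) {
        changed = false;
        for (int e = 0; e < n_edges; e++)
            if (edges[e].label == EPS && in[edges[e].from] && !in[edges[e].to])
                in[edges[e].to] = changed = true;
    }
}

/* Simulate the NFA fragment f on s by tracking the set of current states. */
bool nfa_accepts(Frag f, const char *s)
{
    bool cur[MAX_STATES] = { false };
    cur[f.start] = true;
    close_eps(cur);
    for (; *s != '\0'; s++) {
        bool next[MAX_STATES] = { false };
        for (int e = 0; e < n_edges; e++)
            if (edges[e].label == *s && cur[edges[e].from])
                next[edges[e].to] = true;
        close_eps(next);
        for (int i = 0; i < MAX_STATES; i++)
            cur[i] = next[i];
    }
    return cur[f.accept];
}
```

As a usage example, `cat(star(alt(sym('a'), sym('b'))), cat(sym('a'), sym('b')))` builds an NFA for (a/b)*ab, and `nfa_accepts` can then be called on it with candidate strings.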
THOMPSON’S CONSTRUCTION FOR aa*b
Question 1

Thompson’s for a:   q0 --a--> qf
Thompson’s for b:   q0 --b--> qf
Thompson’s for a*:  q0 --ε--> q1 --a--> q2 --ε--> qf, plus q0 --ε--> qf and q2 --ε--> q1

Thompson’s Construction for aa*b:
  q0 --a--> q1 --ε--> q2 --a--> q3 --ε--> q4 --b--> qf
  q1 --ε--> q4    (skip the a* part)
  q3 --ε--> q2    (repeat a)

NFA without Thompson’s:
  q0 --a--> q1 --b--> qf, with a self-loop on a at q1
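THOMPSON’S CONSTRUCTION FOR a*b(a/b)
Question 2

Thompson’s for a*:   q0 --ε--> q1 --a--> q2 --ε--> qf, plus q0 --ε--> qf and q2 --ε--> q1
Thompson’s for b:    q0 --b--> qf
Thompson’s for a/b:  q0 --ε--> q1 --a--> q2 --ε--> qf and q0 --ε--> q3 --b--> q4 --ε--> qf

NFA using Thompson’s Construction:
  q0 --ε--> q1 --a--> q2 --ε--> q3 --b--> q4
  q0 --ε--> q3    (skip the a* part)
  q2 --ε--> q1    (repeat a)
  q4 --ε--> q5 --a--> q6 --ε--> qf
  q4 --ε--> q7 --b--> q8 --ε--> qf

NFA without Thompson’s:
  q0 --b--> q1 --a,b--> qf, with a self-loop on a at q0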

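THOMPSON’S CONSTRUCTION FOR (a/b/c)
Question 3

[Figure: a first attempt in which q0 has three outgoing ε-moves, one to each of the branches for a, b and c. Three ε out-moves from a state are not allowed.]

[Figure: final output, in which the three branches are combined by nesting OR constructions so that no state has more than two outgoing ε-moves.]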
THOMPSON’S CONSTRUCTION FOR ab(a/b)*
Question 4

Thompson’s for a:    q0 --a--> qf
Thompson’s for b:    q0 --b--> qf
Thompson’s for a/b:  q0 --ε--> q1 --a--> q2 --ε--> qf and q0 --ε--> q3 --b--> q4 --ε--> qf

Thompson’s for (a/b)*:
  q0 --ε--> q1
  q1 --ε--> q2 --a--> q3 --ε--> q6
  q1 --ε--> q4 --b--> q5 --ε--> q6
  q6 --ε--> qf
  q0 --ε--> qf    (zero repetitions)
  q6 --ε--> q1    (repeat)

NFA for ab(a/b)* using Thompson’s Construction:
  q0 --a--> q1 --b--> q2 --ε--> q3
  q3 --ε--> q4 --a--> q5 --ε--> q8
  q3 --ε--> q6 --b--> q7 --ε--> q8
  q8 --ε--> qf
  q2 --ε--> qf    (zero repetitions)
  q8 --ε--> q3    (repeat)

NFA without Thompson’s:
  q0 --a--> q1 --b--> qf, with a self-loop on a,b at qf

SECTION 1.4
SUBSET CONSTRUCTION
HOW TO WORK WITH Ε-CLOSURE FUNCTION

Steps for ε-Closure function:

➢ First step is to take ε-Closure of the start state , for e.g. if the start
state is 0 so take ε-Closure(0).

➢ ε-Closure(n) will include set of all the states which can be

traversed from state n without consuming any input i.e. through ε
move only.

➢ Most Imp.- “ε-Closure of a state will include that state itself in the
set”, i.e. ε-Closure(n) will include n in its set of states.
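SUBSET CONSTRUCTION FOR (a/b)*ab

NFA for (a/b)*ab from Thompson’s construction (start state 0, final state 9):
  0 --ε--> 1,  0 --ε--> 7
  1 --ε--> 2,  1 --ε--> 4
  2 --a--> 3,  4 --b--> 5
  3 --ε--> 6,  5 --ε--> 6
  6 --ε--> 1,  6 --ε--> 7
  7 --a--> 8,  8 --b--> 9

Start with the start state: state 0
ε-closure(0) = {0,1,2,4,7} = A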
Transition of A on a, with A = ε-closure(0) = {0,1,2,4,7}:

move(A, a) = move({0,1,2,4,7}, a)
           = move(0,a) ⋃ move(1,a) ⋃ move(2,a) ⋃ move(4,a) ⋃ move(7,a)
           = Φ ⋃ Φ ⋃ {3} ⋃ Φ ⋃ {8}
ε-closure(move(A, a)) = ε-closure(3) ⋃ ε-closure(8)
           = {1,2,3,4,6,7} ⋃ {8}
           = {1,2,3,4,6,7,8} = B

Table entry: A on a goes to B (1,2,3,4,6,7,8)
Transition of A on b:

move(A, b) = move({0,1,2,4,7}, b)
           = move(0,b) ⋃ move(1,b) ⋃ move(2,b) ⋃ move(4,b) ⋃ move(7,b)
           = Φ ⋃ Φ ⋃ Φ ⋃ {5} ⋃ Φ
ε-closure(move(A, b)) = ε-closure(5)
           = {1,2,4,5,6,7} = C

Table entry: A on b goes to C (1,2,4,5,6,7)
Transition of B on a:

move(B, a) = move({1,2,3,4,6,7,8}, a)
           = move(1,a) ⋃ move(2,a) ⋃ move(3,a) ⋃ move(4,a) ⋃ move(6,a) ⋃ move(7,a) ⋃ move(8,a)
           = Φ ⋃ {3} ⋃ Φ ⋃ Φ ⋃ Φ ⋃ {8} ⋃ Φ
ε-closure(move(B, a)) = ε-closure(3) ⋃ ε-closure(8)
           = {1,2,3,4,6,7,8} = B (as computed for A on a)

Table entry: B on a goes to B
Transition of B on b:

move(B, b) = move({1,2,3,4,6,7,8}, b)
           = move(1,b) ⋃ move(2,b) ⋃ move(3,b) ⋃ move(4,b) ⋃ move(6,b) ⋃ move(7,b) ⋃ move(8,b)
           = Φ ⋃ Φ ⋃ Φ ⋃ {5} ⋃ Φ ⋃ Φ ⋃ {9}
ε-closure(move(B, b)) = ε-closure(5) ⋃ ε-closure(9)
           = {1,2,4,5,6,7} ⋃ {9}
           = {1,2,4,5,6,7,9} = D

Table entry: B on b goes to D (1,2,4,5,6,7,9)
Transition of C on a:

move(C, a) = move({1,2,4,5,6,7}, a)
           = move(1,a) ⋃ move(2,a) ⋃ move(4,a) ⋃ move(5,a) ⋃ move(6,a) ⋃ move(7,a)
           = Φ ⋃ {3} ⋃ Φ ⋃ Φ ⋃ Φ ⋃ {8}
ε-closure(move(C, a)) = ε-closure(3) ⋃ ε-closure(8)
           = {1,2,3,4,6,7,8} = B (as computed for A on a)

Transition of C on b:

move(C, b) = move({1,2,4,5,6,7}, b)
           = move(1,b) ⋃ move(2,b) ⋃ move(4,b) ⋃ move(5,b) ⋃ move(6,b) ⋃ move(7,b)
           = Φ ⋃ Φ ⋃ {5} ⋃ Φ ⋃ Φ ⋃ Φ
ε-closure(move(C, b)) = ε-closure(5) = {1,2,4,5,6,7} = C (as computed for A on b)

Table entries: C on a goes to B, C on b goes to C
Transition of D on a:

move(D, a) = move({1,2,4,5,6,7,9}, a)
           = move(1,a) ⋃ move(2,a) ⋃ move(4,a) ⋃ move(5,a) ⋃ move(6,a) ⋃ move(7,a) ⋃ move(9,a)
           = Φ ⋃ {3} ⋃ Φ ⋃ Φ ⋃ Φ ⋃ {8} ⋃ Φ
ε-closure(move(D, a)) = ε-closure(3) ⋃ ε-closure(8)
           = {1,2,3,4,6,7,8} = B (as computed for A on a)

Transition of D on b:

move(D, b) = move({1,2,4,5,6,7,9}, b)
           = move(1,b) ⋃ move(2,b) ⋃ move(4,b) ⋃ move(5,b) ⋃ move(6,b) ⋃ move(7,b) ⋃ move(9,b)
           = Φ ⋃ Φ ⋃ {5} ⋃ Φ ⋃ Φ ⋃ Φ ⋃ Φ
ε-closure(move(D, b)) = ε-closure(5) = {1,2,4,5,6,7} = C (as computed for A on b)

State                 a    b
A (0,1,2,4,7)         B    C
B (1,2,3,4,6,7,8)     B    D
C (1,2,4,5,6,7)       B    C
D (1,2,4,5,6,7,9)     B    C
SUBSET CONSTRUCTION FOR (a/b)*ab

Final Output

State                   a    b
→ A (0,1,2,4,7)         B    C
  B (1,2,3,4,6,7,8)     B    D
  C (1,2,4,5,6,7)       B    C
* D (1,2,4,5,6,7,9)     B    C

[Figure: transition diagram of the resulting DFA with states A, B, C and D]

➢ State A is the start state, since set ‘A’ contains state ‘0’, which is the start state of the NFA built with Thompson’s construction.
➢ D is the final state, since set D contains state ‘9’, which is the final state of the NFA built with Thompson’s Construction.
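
Once the transition table is known, running the DFA on an input is a simple table lookup per character. The following sketch assumes the table derived above (A to D, with D accepting); the names `dtran` and `dfa_accepts` are illustrative only.

```c
#include <stdbool.h>

enum { A, B, C, D, NSTATES };

/* Transition table from the subset construction:
 * column 0 is input 'a', column 1 is input 'b'. */
static const int dtran[NSTATES][2] = {
    [A] = { B, C },
    [B] = { B, D },
    [C] = { B, C },
    [D] = { B, C },
};

/* Return true if the DFA accepts s, i.e. s is in (a/b)*ab. */
bool dfa_accepts(const char *s)
{
    int state = A;                       /* A is the start state */
    for (; *s != '\0'; s++) {
        if (*s != 'a' && *s != 'b')
            return false;                /* symbol outside the alphabet */
        state = dtran[state][*s == 'b'];
    }
    return state == D;                   /* D is the only accepting state */
}
```

For example, on input `bbaab` the DFA visits A, C, C, B, B, D and accepts, while on `aba` it ends in B and rejects.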
Ε-CLOSURE(T)

push all states of T onto stack
initialize ϵ-closure(T) to T
while (stack is not empty) do
begin
    pop t, the top element, off stack;
    for (each state u with an edge from t to u labelled ϵ) do
    begin
        if (u is not in ϵ-closure(T)) do
        begin
            add u to ϵ-closure(T)
            push u onto stack
        end
    end
end
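
The pseudocode translates almost line for line into C if state sets are stored as bitmasks. This is a sketch under assumptions: the ε-edges are those of the Thompson NFA for (a/b)*ab used in the subset-construction slides (states 0 to 9), and the names `eps` and `eps_closure` are hypothetical.

```c
#include <stdint.h>

#define NFA_STATES 10

/* eps[s] = bitmask of the states reachable from s by a single ε-edge. */
static const uint16_t eps[NFA_STATES] = {
    [0] = (1u << 1) | (1u << 7),
    [1] = (1u << 2) | (1u << 4),
    [3] = (1u << 6),
    [5] = (1u << 6),
    [6] = (1u << 1) | (1u << 7),
};

/* Stack-based ε-closure of the state set T (bit s set means state s is in T). */
uint16_t eps_closure(uint16_t T)
{
    int stack[NFA_STATES];                /* each state is pushed at most once */
    int top = 0;
    uint16_t closure = T;                 /* initialize ε-closure(T) to T */

    for (int s = 0; s < NFA_STATES; s++)  /* push all states of T */
        if (T & (1u << s))
            stack[top++] = s;

    while (top > 0) {
        int t = stack[--top];             /* pop t off the stack */
        for (int u = 0; u < NFA_STATES; u++) {
            uint16_t bit = (uint16_t)(1u << u);
            if ((eps[t] & bit) && !(closure & bit)) {
                closure |= bit;           /* add u to ε-closure(T) */
                stack[top++] = u;         /* push u onto the stack */
            }
        }
    }
    return closure;
}
```

With these edges, `eps_closure(1u << 0)` yields the bits for {0,1,2,4,7}, which is the set A from the worked example.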
CONVERTING A NFA INTO A DFA (SUBSET CONSTRUCTION)
put ε-closure({s0}) as an unmarked state into the set of DFA states (DS)
while (there is one unmarked S1 in DS) do
begin
    mark S1
    for each input symbol a do
    begin
        S2 ← ε-closure(move(S1,a))
        if (S2 is not in DS) then
            add S2 into DS as an unmarked state
        transfunc[S1,a] ← S2
    end
end

• ε-closure({s0}) is the set of all states accessible from s0 by ε-transitions
• move(S1,a) is the set of states to which there is a transition on a from a state s in S1
• a state S in DS is an accepting state of the DFA if a state s in S is an accepting state of the NFA
• the start state of the DFA is ε-closure({s0})
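
The whole algorithm fits in a short C program when NFA state sets are bitmasks. The sketch below makes several assumptions: it hard-codes the Thompson NFA for (a/b)*ab from the worked example, computes ε-closure by iterating to a fixed point rather than with an explicit stack, skips the empty set instead of adding a dead state, and uses hypothetical names (`dstates`, `dtran`, `subset_construction`). `MAX_DFA` is simply assumed to be large enough for this NFA.

```c
#include <stdint.h>

#define NFA_STATES 10
#define MAX_DFA    16

typedef uint16_t Set;                     /* bitmask over NFA states 0..9 */
#define BIT(s) ((Set)(1u << (s)))

/* Thompson NFA for (a/b)*ab; in on_sym, -1 means "no edge". */
static const Set eps[NFA_STATES] = {
    [0] = BIT(1) | BIT(7), [1] = BIT(2) | BIT(4),
    [3] = BIT(6), [5] = BIT(6), [6] = BIT(1) | BIT(7),
};
static const int on_sym[2][NFA_STATES] = {
    { -1, -1,  3, -1, -1, -1, -1,  8, -1, -1 },   /* edges labelled a */
    { -1, -1, -1, -1,  5, -1, -1, -1,  9, -1 },   /* edges labelled b */
};

static Set closure(Set T)                 /* ε-closure by fixed-point iteration */
{
    Set result = T, prev;
    do {
        prev = result;
        for (int s = 0; s < NFA_STATES; s++)
            if (result & BIT(s))
                result |= eps[s];
    } while (result != prev);
    return result;
}

static Set move(Set S, int sym)           /* states reached from S on sym */
{
    Set out = 0;
    for (int s = 0; s < NFA_STATES; s++)
        if ((S & BIT(s)) && on_sym[sym][s] >= 0)
            out |= BIT(on_sym[sym][s]);
    return out;
}

Set dstates[MAX_DFA];                     /* DS: each DFA state is an NFA-state set */
int dtran[MAX_DFA][2];                    /* transfunc; -1 means no transition */

/* Build the DFA and return the number of DFA states. */
int subset_construction(void)
{
    int n = 0, marked = 0;
    dstates[n++] = closure(BIT(0));       /* start state: ε-closure({s0}) */
    while (marked < n) {                  /* some state is still unmarked */
        Set S1 = dstates[marked];
        for (int sym = 0; sym < 2; sym++) {
            Set S2 = closure(move(S1, sym));
            int j = -1;
            if (S2 != 0) {
                for (j = 0; j < n && dstates[j] != S2; j++)
                    ;                     /* look for S2 in DS */
                if (j == n)
                    dstates[n++] = S2;    /* add S2 as an unmarked state */
            }
            dtran[marked][sym] = j;
        }
        marked++;                         /* mark S1 */
    }
    return n;
}
```

States are numbered in order of discovery, so indices 0 to 3 correspond to A, B, C and D of the worked example, and a DFA state is accepting when its set contains NFA state 9.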
SECTION 1.5
RE TO DFA THROUGH SYNTAX TREE
METHOD OR DIRECT METHOD
CONVERTING REGULAR EXPRESSIONS DIRECTLY TO
DFAS
 Important state
 We may convert a regular expression into a DFA (without creating a
NFA first).
 First we augment the given regular expression by concatenating it
with a special symbol #.
r ➔ (r)# augmented regular expression
 Then, we create a syntax tree for this augmented regular expression.
 In this syntax tree, all alphabet symbols (plus # and the empty
string) in the augmented regular expression will be on the leaves,
and all inner nodes will be the operators in that augmented regular
expression.
 Then each alphabet symbol (plus #) will be numbered (position
numbers).
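FROM REGULAR EXPRESSION TO DFA DIRECTLY:
SYNTAX TREE OF (a/b)*abb#

[Figure: syntax tree; the number after each leaf is its position number]

concatenation
├── concatenation
│   ├── concatenation
│   │   ├── concatenation
│   │   │   ├── closure *
│   │   │   │   └── alternation |
│   │   │   │       ├── a (1)
│   │   │   │       └── b (2)
│   │   │   └── a (3)
│   │   └── b (4)
│   └── b (5)
└── # (6)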
FROM REGULAR EXPRESSION TO DFA DIRECTLY:
ANNOTATING THE TREE

• nullable(n): true if the subtree at node n generates a language that includes the empty string
• firstpos(n): the set of positions that can match the first symbol of a string generated by the subtree at node n
• lastpos(n): the set of positions that can match the last symbol of a string generated by the subtree at node n
• followpos(i): the set of positions that can follow position i in the tree
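FROM REGULAR EXPRESSION TO DFA DIRECTLY: ANNOTATING THE TREE

Node n: Leaf ε
  nullable(n): true
  firstpos(n): ∅
  lastpos(n):  ∅

Node n: Leaf i
  nullable(n): false
  firstpos(n): {i}
  lastpos(n):  {i}

Node n: | with children c1 and c2
  nullable(n): nullable(c1) or nullable(c2)
  firstpos(n): firstpos(c1) ꓴ firstpos(c2)
  lastpos(n):  lastpos(c1) ꓴ lastpos(c2)

Node n: • (concatenation) with children c1 and c2
  nullable(n): nullable(c1) and nullable(c2)
  firstpos(n): if nullable(c1) then firstpos(c1) ꓴ firstpos(c2) else firstpos(c1)
  lastpos(n):  if nullable(c2) then lastpos(c1) ꓴ lastpos(c2) else lastpos(c2)

Node n: * with child c1
  nullable(n): true
  firstpos(n): firstpos(c1)
  lastpos(n):  lastpos(c1)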
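FROM REGULAR EXPRESSION TO DFA DIRECTLY:
SYNTAX TREE OF (a/b)*abb#

Annotated syntax tree, listed from the root down (only the * node is nullable):

Node                        firstpos     lastpos
concatenation (root)        {1, 2, 3}    {6}
  # (6)                     {6}          {6}
  concatenation             {1, 2, 3}    {5}
    b (5)                   {5}          {5}
    concatenation           {1, 2, 3}    {4}
      b (4)                 {4}          {4}
      concatenation         {1, 2, 3}    {3}
        a (3)               {3}          {3}
        * (nullable)        {1, 2}       {1, 2}
          |                 {1, 2}       {1, 2}
            a (1)           {1}          {1}
            b (2)           {2}          {2}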
FROM REGULAR EXPRESSION TO DFA DIRECTLY: EXAMPLE

Positions in (a/b)*abb#:  a = 1, b = 2, a = 3, b = 4, b = 5, # = 6

Node   followpos
1      {1, 2, 3}
2      {1, 2, 3}
3      {4}
4      {5}
5      {6}
6      -
FROM RE TO DFA DIRECTLY

Positions in (a/b)*abb#:

Node   Symbol   followpos
1      a        {1, 2, 3}
2      b        {1, 2, 3}
3      a        {4}
4      b        {5}
5      b        {6}
6      #        -

Let {1,2,3} = A

A,a   ({1,2,3},a)     followpos(1) ꓴ followpos(3)   {1,2,3,4}   B
A,b   ({1,2,3},b)     followpos(2)                  {1,2,3}     A
B,a   ({1,2,3,4},a)   followpos(1) ꓴ followpos(3)   {1,2,3,4}   B
B,b   ({1,2,3,4},b)   followpos(2) ꓴ followpos(4)   {1,2,3,5}   C
C,a   ({1,2,3,5},a)   followpos(1) ꓴ followpos(3)   {1,2,3,4}   B
C,b   ({1,2,3,5},b)   followpos(2) ꓴ followpos(5)   {1,2,3,6}   D
D,a   ({1,2,3,6},a)   followpos(1) ꓴ followpos(3)   {1,2,3,4}   B
D,b   ({1,2,3,6},b)   followpos(2)                  {1,2,3}     A

State   a   b
A       B   A
B       B   C
C       B   D
D       B   A
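FROM REGULAR EXPRESSION TO DFA DIRECTLY: EXAMPLE

[Figure: resulting DFA; {1,2,3} is the start state]
  {1,2,3}   --a--> {1,2,3,4}      {1,2,3}   --b--> {1,2,3}
  {1,2,3,4} --a--> {1,2,3,4}      {1,2,3,4} --b--> {1,2,3,5}
  {1,2,3,5} --a--> {1,2,3,4}      {1,2,3,5} --b--> {1,2,3,6}
  {1,2,3,6} --a--> {1,2,3,4}      {1,2,3,6} --b--> {1,2,3}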
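DIFFERENT DFA’S FOR (a/b)*abb

DFA with five states, A to E:

State   a   b
A       B   C
B       B   D
C       B   C
D       B   E
E       B   C

[Figure: transition diagram of the five-state DFA]

DFA with four states, obtained by the direct method:

State           a   b
A {1,2,3}       B   A
B {1,2,3,4}     B   C
C {1,2,3,5}     B   D
D {1,2,3,6}     B   A

[Figure: transition diagram of the four-state DFA; {1,2,3} is the start state]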
FROM REGULAR EXPRESSION TO DFA DIRECTLY:
FOLLOWPOS
for each node n in the tree do
    if n is a cat-node with left child c1 and right child c2 then
        for each i in lastpos(c1) do
            followpos(i) := followpos(i) ∪ firstpos(c2)
        end do
    else if n is a star-node
        for each i in lastpos(n) do
            followpos(i) := followpos(i) ∪ firstpos(n)
        end do
    end if
end do
FROM REGULAR EXPRESSION TO DFA DIRECTLY:
ALGORITHM

s0 := firstpos(root) where root is the root of the syntax tree
Dstates := {s0} and is unmarked
while there is an unmarked state T in Dstates do
    mark T
    for each input symbol a ∈ Σ do
        let U be the set of positions that are in followpos(p)
            for some position p in T,
            such that the symbol at position p is a
        if U is not empty and not in Dstates then
            add U as an unmarked state to Dstates
        end if
        Dtran[T,a] := U
    end do
end do
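
The algorithm can be tried out on the followpos table computed earlier for (a/b)*abb#. The sketch below is illustrative only: it takes firstpos(root) = {1,2,3} and the followpos sets as given data rather than computing them from a syntax tree, stores position sets as bitmasks, and uses hypothetical names (`symbol_at`, `dstates`, `dtran`, `build_dfa`).

```c
#include <stdint.h>

#define NPOS 6
typedef uint8_t Set;                          /* bit (p-1) set means position p */
#define POS(p) ((Set)(1u << ((p) - 1)))

/* Data for (a/b)*abb# taken from the followpos table; index 0 is unused. */
static const char symbol_at[NPOS + 1] = { 0, 'a', 'b', 'a', 'b', 'b', '#' };
static const Set followpos[NPOS + 1] = {
    0,
    POS(1) | POS(2) | POS(3),                 /* followpos(1) */
    POS(1) | POS(2) | POS(3),                 /* followpos(2) */
    POS(4),                                   /* followpos(3) */
    POS(5),                                   /* followpos(4) */
    POS(6),                                   /* followpos(5) */
    0,                                        /* followpos(6): nothing follows # */
};

Set dstates[16];                              /* Dstates */
int dtran[16][2];                             /* Dtran; columns a, b; -1 = none */

/* Build the DFA and return the number of states. */
int build_dfa(void)
{
    const char alphabet[2] = { 'a', 'b' };
    int n = 0, marked = 0;
    dstates[n++] = POS(1) | POS(2) | POS(3);  /* s0 = firstpos(root) */
    while (marked < n) {                      /* an unmarked state T exists */
        Set T = dstates[marked];
        for (int k = 0; k < 2; k++) {
            Set U = 0;                        /* union of followpos(p) for p in T */
            for (int p = 1; p <= NPOS; p++)   /* where the symbol at p is alphabet[k] */
                if ((T & POS(p)) && symbol_at[p] == alphabet[k])
                    U |= followpos[p];
            int j = -1;
            if (U != 0) {
                for (j = 0; j < n && dstates[j] != U; j++)
                    ;                         /* look for U in Dstates */
                if (j == n)
                    dstates[n++] = U;         /* add U as an unmarked state */
            }
            dtran[marked][k] = j;
        }
        marked++;                             /* mark T */
    }
    return n;
}
```

States are numbered in discovery order, so indices 0 to 3 correspond to A, B, C and D of the worked example; the accepting state is the one whose set contains position 6 (the # marker).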
SECTION 1.6
MINIMIZATION OF DFA
Question 1
MINIMIZE THE FOLLOWING DFA, IF POSSIBLE

[Figure: DFA with states A, B, C, D and E over the alphabet {a, b}]
USING FINAL AND NON FINAL STATE

Divide the entire set of states into two subsets: Set of final
States and set of non final states.

Consider each sub-set as a separate entity and identify if they

need to be split further or can they be combined together
Question 1

DFA MINIMIZATION USING PARTITIONING METHOD

Draw the transition table corresponding to the given DFA:

State   a   b
→ A     B   C
  B     B   D
  C     B   C
  D     B   E
* E     B   C

Divide the states into two subsets, final and non-final:

Set of Non-Final States (NF): {A,B,C,D}
Set of Final States (F): {E}
Check O/P of all clubbed states (A,B,C,D) with Σ=a:
  A, B, C and D all go to B, which lies within {A,B,C,D}. NO SPLIT.
  NF = {A,B,C,D}, F = {E}

Check O/P of all clubbed states (A,B,C,D) with Σ=b:
  A goes to C, B goes to D and C goes to C, all within {A,B,C,D}, while D goes to E in the final set {E}.
  Split into two: NF = ({A,B,C}, {D}), F = {E}

Check O/P of all clubbed states (A,B,C) with Σ=a:
  A, B and C all go to B. NO SPLIT.
  NF = ({A,B,C}, {D})

Check O/P of all clubbed states (A,B,C) with Σ=b:
  A and C go to state C, which lies within {A,B,C}, while B goes to state D, which is already separated.
  Split into two: NF = ({A,C}, {B}, {D})
Check O/P of all clubbed states (A,C) with Σ=a:
  Both A and C go to state B, which is already separated. NO SPLIT.

Check O/P of all clubbed states (A,C) with Σ=b:
  Both A and C go to C, i.e. to the same group {A,C}. NO SPLIT.

NF = ({A,C}, {B}, {D}), F = {E}

Since the subset {A,C} remains a single combined state until the end, both states are joined together as a single state.

DFA MINIMIZATION USING PARTITIONING METHOD

Original DFA:

State   a   b
→ A     B   C
  B     B   D
  C     B   C
  D     B   E
* E     B   C

Minimized DFA:

State   a   b
→ A,C   B   A,C
  B     B   D
  D     B   E
* E     B   A,C

[Figure: transition diagrams of the original DFA and of the minimized DFA]

Final Output
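
The partitioning method can be checked mechanically on the Question 1 table. The sketch below is a variant, stated plainly: instead of testing one input symbol at a time as the slides do, each round compares states on both symbols at once, which reaches the same final partition. The names `delta`, `is_final`, `group` and `minimize` are hypothetical, and states A to E are numbered 0 to 4.

```c
#define N 5                                /* states A..E as 0..4 */

static const int delta[N][2] = {           /* columns: a, b */
    { 1, 2 },                              /* A: a->B, b->C */
    { 1, 3 },                              /* B: a->B, b->D */
    { 1, 2 },                              /* C: a->B, b->C */
    { 1, 4 },                              /* D: a->B, b->E */
    { 1, 2 },                              /* E: a->B, b->C */
};
static const int is_final[N] = { 0, 0, 0, 0, 1 };

int group[N];                              /* group[s] = block number of state s */

/* Start from {non-final, final} and refine until no block splits.
 * Returns the number of blocks, i.e. states of the minimized DFA. */
int minimize(void)
{
    int count = 2;
    for (int s = 0; s < N; s++)
        group[s] = is_final[s];
    for (;;) {
        int next[N], new_count = 0;
        for (int s = 0; s < N; s++) {
            next[s] = -1;
            /* s stays with an earlier state t if both are in the same block
             * and go to the same blocks on a and on b. */
            for (int t = 0; t < s; t++) {
                if (group[t] == group[s] &&
                    group[delta[t][0]] == group[delta[s][0]] &&
                    group[delta[t][1]] == group[delta[s][1]]) {
                    next[s] = next[t];
                    break;
                }
            }
            if (next[s] < 0)
                next[s] = new_count++;     /* s starts a new block */
        }
        for (int s = 0; s < N; s++)
            group[s] = next[s];
        if (new_count == count)
            return count;                  /* nothing split in this round */
        count = new_count;
    }
}
```

For this table the refinement ends with four blocks, and A and C share a block, matching the minimized DFA above.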
Question 2
MINIMIZE THE FOLLOWING DFA, IF POSSIBLE

[Figure: DFA with states A to H over the alphabet {a, b}]

DFA MINIMIZATION USING PARTITIONING METHOD

Draw the transition table corresponding to the given DFA:

State   a   b
→ A     B   F
  B     G   C
* C     A   C
  D     C   G
  E     H   F
  F     C   G
  G     G   E
  H     G   C

Divide the states into two subsets, final and non-final:

Set of Non-Final States (NF): {A,B,D,E,F,G,H}
Set of Final States (F): {C}
Check O/P of all clubbed states (A,B,D,E,F,G,H) with Σ=a:
  A, B, E, G and H go to states within the non-final set, while D and F go to the final state C.
  Split into two: NF = {A,B,E,G,H}, {D,F}

Check O/P of all clubbed states (A,B,E,G,H) with Σ=a:
  A goes to B, B goes to G, E goes to H, G goes to G and H goes to G, all within {A,B,E,G,H}. NO SPLIT.

Check O/P of all clubbed states (A,B,E,G,H) with Σ=b:
  A and E go to F in {D,F}; B and H go to the final state C; G goes to E within {A,B,E,G,H}.
  Split into three: NF = {A,E},{G},{B,H},{D,F}
Check O/P of all clubbed states (A,E) with Σ=a:
  A goes to B and E goes to H, both in {B,H}. NO SPLIT.

Check O/P of all clubbed states (A,E) with Σ=b:
  Both A and E go to F. NO SPLIT.

Check O/P of all clubbed states (B,H) with Σ=a:
  Both B and H go to G. NO SPLIT.

Check O/P of all clubbed states (B,H) with Σ=b:
  Both B and H go to C. NO SPLIT.

Check O/P of all clubbed states (D,F) with Σ=a:
  Both D and F go to C. NO SPLIT.

Check O/P of all clubbed states (D,F) with Σ=b:
  Both D and F go to G. NO SPLIT.

NF = {A,E},{G},{B,H},{D,F}, F = {C}
DFA MINIMIZATION USING PARTITIONING METHOD

Original DFA:

State   a   b
→ A     B   F
  B     G   C
* C     A   C
  D     C   G
  E     H   F
  F     C   G
  G     G   E
  H     G   C

Minimized DFA:

State   a     b
→ A,E   B,H   D,F
  B,H   G     C
* C     A,E   C
  D,F   C     G
  G     G     A,E

[Figure: transition diagrams of the original DFA and of the minimized DFA]

Final Output
THANKS
