Compiler Design CS - 3
Compiler Design CS - 3
BITS Pilani
Pilani | Dubai | Goa | Hyderabad
BITS Pilani
Pilani | Dubai | Goa | Hyderabad
Contact Session - 3
Introduction to Lexical Analyzer
IMP Note to Self
5
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Hierarchy of languages
Recursive Languages
Context-Free Languages
Regular Languages
6
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Deterministic Finite State Automata
(DFA)
……..
0 1 1 0 0
Finite
Control
• One-way, infinite tape, broken into cells
• One-way, read-only tape head.
• Finite control, i.e.,
– finite number of states, and
– transition rules between them, i.e.,
– a program, containing the position of the read head, current symbol being
scanned, and the current “state.”
• A string is placed on the tape, read head is positioned at the left end,
and the DFA will read the string one symbol at a time until all symbols
have been read. The DFA will then either accept or reject the string.
7
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• The finite control can be described by a transition diagram or table:
• Example #1:
1
0
q0 q1 1
0
1 0 0 1 1
q0 q0 q1 q0 q0 q0
M1 M2 …. M-inf
L
• Id’s of the characters or states are irrelevant,
you can call them by any names!
Sigma = {0, 1} ≡ {a, b}
States = {q0, q1} ≡ {u, v}, as long as they have
identical (isomorphic) transition table
9
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• An equivalent machine to the previous example (DFA for even num-
ber of 0’s):
1 1
q3 q2
1 0 0 0
q0 0 q1 1
1 0 0 1 1
q0 q3 q1 q2 q2 q2 accept string
a a a/b/c
c c
q0 q1 q2
b b
a c c c b ac-
cepted
q0 q0 q1 q2 q2 q2
a a c re-
jected
q0 q0 q0 q1
11
• Accepts those strings that contain at least two
BITS Pilani, c’s to be University under Section 3 of UGC Act, 1956
Deemed
a a a/b/c
q0 c q1 c q2
b b
Inductive Proof (sketch): that the machine correctly accepts strings with at least two c’s
Proof goes over the length of the string.
Inductive steps: Each case for symbol p, for string xp (|xp| = k+1), the last symbol p = a, b or c
xa xb xc
12
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Formal Definition of a DFA
• A DFA is a five-tuple:
M = (Q, Σ, δ, q0, F)
Q = {q0, q1} 0
q0 q1 1
Σ = {0, 1}
Start state is q0 0
F = {q0}
δ:
0 1
q0 q1 q0
q1 q0 q1
14
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Revisit example #2:
a a a/b/c
Q = {q0, q1, q2}
Σ = {a, b, c} c c
q0 q1 q2
Start state is q0
F = {q2} b b
δ: a b c
q0 q0 q0 q1
q1 q1 q1 q2
q2 q2 q2 q2
δ^ : (Q x Σ*) –> Q
δ^(q,w) – The state entered after reading string w having started in state
q.
Formally:
1) δ^(q, ε) = q, and
2) For all w in Σ* and a in Σ
δ^(q,wa) = δ (δ^(q,w), a)
16
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Recall Example #1: 1
0
q0 q1 1
0
• What is δ^(q0, 011)? Informally, it is the state entered by M after pro-
cessing 011 having started in state q0.
• Formally:
• Therefore:
• Is 10 accepted? No, since δ(q0, 10) = q1 is not a final state. The fact
that δ(q1, 10) = q2 is irrelevant, q1 is not the start state!
20
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Definitions related to DFAs
• Let M = (Q, Σ, δ,q0,F) be a DFA and let w be in Σ*. Then w is accepted by M iff
δ(q0,w) = p for some state p in F.
• Let M = (Q, Σ, δ,q0,F) be a DFA. Then the language accepted by M is the set:
• Let M1 = (Q1, Σ1, δ1, q0, F1) and M2 = (Q2, Σ2, δ2, p0, F2) be DFAs. Then M1 and
M2 are equivalent iff L(M1) = L(M2).
21
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Notes:
– A DFA M = (Q, Σ, δ,q0,F) partitions the set Σ* into two sets: L(M) and
Σ* - L(M).
– If L = L(M) then L is a subset of L(M) and L(M) is a subset of L (def. of set equality).
– Similarly, if L(M1) = L(M2) then L(M1) is a subset of L(M2) and L(M2) is a subset of
L(M1).
• Can you write a program to “simulate” a given DFA, or any arbitrary input DFA?
22
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Give a DFA M such that:
0/1
0/1 0/1
q0 q1 q2
23
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Give a DFA M such that:
b/c a/b/c
a
a
q0 q1 q2
b/c
Logic:
In Start state (q0): b’s and c’s: ignore – stay in same state
q0 is also “accept” state
First ‘a’ appears: get ready (q1) to reject
But followed by a ‘b’ or ‘c’: go back to start state q0
When second ‘a’ appears after the “ready” state: go to reject state q2
Ignore everything after getting to the “reject” state q2
24
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Give a DFA M such that:
b/c a a/b/c
a a
b
q0 q1 q2 q3
c
b/c
Logic: acceptance is straight forward, progressing on each expected
symbol
However, rejection needs special care, in each state (for DFA, we will see
this becomes easier in NFA, non-deterministic machine)
25
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Give a DFA M such that:
q0 a b q7
b
b a
b a
q4 q5 q6
b
Remember, you may have multiple “final” states, but only one “start”
state
26
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Let Σ = {0, 1}. Give DFAs for {}, {ε}, Σ*, and Σ+.
0/1 0/1
0/1
q0 q0 q1
27
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Problem: Third symbol from last is 1
0/1
1 0/1 0/1
q0 q1 q2 q3
Is this a DFA?
28
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Practice problem 1
• Draw a DFA for the language accepting strings starting with ‘ab’ over input
alphabets ∑ = {a, b}
– Solution:
– Step-01:
• All strings of the language starts with substring “ab”. So, length of sub-
string = 2.
– Thus, Minimum number of states required in the DFA = 2 + 2 = 4.
– It suggests that minimized DFA will have 4 states.
– Step-02:
– We will construct DFA for the following strings-
• ab
• aba
• abab
29
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Step 3
30
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Practice problem 2
• Construct a DFA that accepts a language L over input al-
phabets ∑ = {a, b} such that L is the set of all strings
starting with ‘aa’ or ‘bb’.
• Step-01:
• Minimum number of states required in the DFA = 5.
• It suggests that minimized DFA will have 5 states.
• Step-02:
• We will construct DFA for the following strings-
• aa
• aaa
• aaaa
• bb
• bbb
• bbbb
31
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Step 3
32
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Practice problem 3
• Let Σ = {a, b} and let L = { w ∈ Σ* | w ≠ ε and the first and last
character of w are the same }. Design a DFA for L.
33
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Practice problem 4
• Let Σ = {a, b} and let L = { w ∈ Σ* | w is a nonempty string
whose characters alternate between a's and b's }. Design a
DFA whose language is L.
34
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Nondeterministic Finite Au-
tomata
Nondeterminism
Subset Construction
35
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Nondeterminism
• A nondeterministic finite automaton has the
ability to be in several states at once.
• Transitions from a state on an input symbol
can be to any set of states.
36
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Nondeterminism – (2)
• Start in one start state.
• Accept if any sequence of choices leads to a
final state.
• Intuitively: the NFA always “guesses right.”
37
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Formal NFA
• A finite set of states, typically Q.
• An input alphabet, typically Σ.
• A transition function, typically δ.
• A start state in Q, typically q0.
• A set of final states F ⊆ Q.
38
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Transition Function of an
NFA
• δ(q, a) is a set of states.
• Extend to strings as follows:
• Basis: δ(q, ε) = {q}
• Induction: δ(q, wa) = the union over all states
p in δ(q, w) of δ(p, a)
39
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Language of an NFA
• A string w is accepted by an NFA if δ(q0, w)
contains at least one final state.
• The language of the NFA is the set of strings it
accepts.
40
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Equivalence of DFA’s, NFA’s
41
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Equivalence – (2)
43
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Critical Point
44
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Subset Construction – (2)
• The transition function δD is defined by:
δD({q1,…,qk}, a) is the union over all i = 1,…,k of
δN(qi, a).
• Example: We’ll construct the DFA equivalent
of our “chessboard” NFA.
45
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Example: Subset Construc-
tion
r b r b
47
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Example: Subset Construc-
tion
r b r b
48
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Example: Subset Construc-
tion
r b r b
49
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Example: Subset Construc-
tion
r b r b
50
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Example: Subset Construc-
tion
r b r b
51
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Example: Subset Construc-
tion
r b r b
52
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
NFA’s With ε-Transitions
• We can allow state-to-state transitions on ε
input.
• These transitions are done spontaneously,
without looking at the input string.
• A convenience at times, but still only regular
languages are accepted.
53
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Example: ε-NFA
ε 0 1 ε
A {E} {B} ∅
1 1 B ∅ {C} {D}
1 B C D C ∅ {D} ∅
D ∅ ∅ ∅
A ε ε 0
* E
F
{F} ∅ {B, C}
{D} ∅ ∅
0
E F
0
54
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Closure of States
A
ε ε 0
0 E F
• Closure of a set of states = union of the
0
clo-
sure of each state.
55
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Equivalence of NFA, ε-NFA
• Every NFA is an ε-NFA.
– It just has no transitions on ε.
• Converse requires us to take an ε-NFA and
construct an NFA that accepts the same lan-
guage.
• We do so by combining ε–transitions with the
next transition on a real input.
a
Transitions
on ε
Transitions
on ε
57
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Picture of ε-Transition Removal
To here, and performs
Text goes
the subset construction
from here
a
Transitions
on ε
Transitions
on ε
58
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Picture of ε-Transition Removal
To here, with no
subset construction
a
We’ll go
from here
a
a
Transitions
on ε
Transitions
on ε
59
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Equivalence – (2)
• Start with an ε-NFA with states Q, inputs Σ,
start state q0, final states F, and transition
function δE.
• Construct an “ordinary” NFA with states Q, in-
puts Σ, start state q0, final states F’, and transi-
tion function δN.
60
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Equivalence – (3)
61
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Example: ε-NFA-to-
Interesting closures:
CL(B) = {B,D};
CL(E) = {B,C,D,E}
NFA
0 1 ε 0 1
A {E} {B} ∅ A {E} {B}
B ∅ {C} {D} *B ∅ {C}
C ∅ {D} ∅ C ∅ {D}
D ∅ ∅ ∅ ∅ ∅
*D
* E {F} ∅ {B, C} E {F} {C, D}
F {D} ∅ ∅ * F {D} ∅
Since closure of
ε-NFA E includes B and
Since closures of C; which have
B and E include transitions on 1
final state D. to C and D.
62
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Summary
• DFA’s, NFA’s, and ε–NFA’s all accept exactly the
same set of languages: the regular languages.
• The NFA types are easier to design and may
have exponentially fewer states than a DFA.
• But only a DFA can be implemented!
63
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• https://www.gatevidyalay.com/tag/dfa-examp
les-with-solutions-pdf/
• https://web.stanford.edu/class/cs103a/ass
n/070S%20Solutions%20for%20Practice%
20with%20Automata.pdf
• https://www.cse.iitd.ac.in/~naveen/courses/C
OL352/slides/fa3.ppt
64
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
IMP Note to Self