Flat Notes 2
smartworlD.asia
Formal Languages & Automata Theory Complete Notes
Smartzworld.com 1 jntuworldupdates.org
Smartworld.asia Specworld.in
Contents
1 Mathematical Preliminaries
2 Formal Languages
2.1 Strings
2.2 Languages
2.3 Properties
2.4 Finite Representation
2.4.1 Regular Expressions
3 Grammars
3.1 Context-Free Grammars
3.2 Derivation Trees
3.2.1 Ambiguity
3.3 Regular Grammars
3.4 Digraph Representation
4 Finite Automata
4.1 Deterministic Finite Automata
4.2 Nondeterministic Finite Automata
4.3 Equivalence of NFA and DFA
4.3.1 Heuristics to Convert NFA to DFA
4.4 Minimization of DFA
4.4.1 Myhill-Nerode Theorem
4.4.2 Algorithmic Procedure for Minimization
4.5 Regular Languages
4.5.1 Equivalence of Finite Automata and Regular Languages
4.5.2 Equivalence of Finite Automata and Regular Grammars
4.6 Variants of Finite Automata
4.6.1 Two-way Finite Automaton
4.6.2 Mealy Machines
Chapter 1
Mathematical Preliminaries
Chapter 2
Formal Languages
siders a language that has a script, then it can be observed that a word is a
sequence of symbols of its underlying alphabet. It is observed that a formal
learning of a language has the following three steps.
1. Learning its alphabet - the symbols that are used in the language.
In this learning, step 3 is the most difficult part. Let us postpone the
discussion of how sentences are constructed and concentrate on steps 1 and 2.
For the time being, instead of ignoring sentences completely, one may look at
the common features of a word and a sentence and agree that both are just
sequences of symbols of the underlying alphabet. For example, the English
sentence
"The English articles - a, an and the - are
categorized into two types: indefinite and definite."
may be treated as a sequence of symbols from the Roman alphabet along
with enough punctuation marks such as comma, full-stop, colon and further
one more special symbol, namely blank-space which is used to separate two
words. Thus, abstractly, a sentence or a word may be interchangeably used
2.1 Strings
We formally define an alphabet as a non-empty finite set. We normally use
the symbols a, b, c, . . . with or without subscripts or 0, 1, 2, . . ., etc. for the
elements of an alphabet.
A string over an alphabet Σ is a finite sequence of symbols of Σ. Although
one writes a sequence as $(a_1, a_2, \ldots, a_n)$, in the present context we
prefer to write it as $a_1 a_2 \cdots a_n$, i.e. by juxtaposing the symbols in
that order. Thus, a string is also known as a word or a sentence. Normally, we
use lower case letters towards the end of the English alphabet, namely z, y, x,
w, etc., to denote strings.
Example 2.1.1. Let Σ = {a, b} be an alphabet; then aa, ab, bba, baaba, . . .
are some examples of strings over Σ.
$a_1 a_2 \cdots a_n b_1 b_2 \cdots b_m$.
a∈Σ
2.2 Languages
We have got acquainted with the formal notion of strings that are basic
elements of a language. In order to define the notion of a language in a
broad spectrum, it is felt that it can be any collection of strings over an
alphabet.
Thus we define a language over an alphabet Σ as a subset of Σ∗ .
Example 2.2.1.
Remark 2.2.2. Note that ∅ ≠ {ε}, because the language ∅ does not contain
any string but {ε} contains a string, namely ε. Also it is evident that |∅| = 0;
whereas, |{ε}| = 1.
Since languages are sets, we can apply various well known set operations
such as union, intersection, complement, difference on languages. The notion
of concatenation of strings can be extended to languages as follows.
The concatenation of a pair of languages L1 , L2 is
L1 L2 = {xy | x ∈ L1 ∧ y ∈ L2 }.
Example 2.2.3.
1. If L1 = {0, 1, 01} and L2 = {1, 00}, then
L1 L2 = {01, 11, 011, 000, 100, 0100}.
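The definition of concatenation can be checked mechanically on finite languages. The following is a minimal sketch, assuming languages are modelled as Python sets of strings with ε written as the empty string; the function name `concat` is chosen here for illustration only.

```python
def concat(l1, l2):
    """Return the concatenation L1 L2 = {xy | x in L1 and y in L2}."""
    return {x + y for x in l1 for y in l2}


L1 = {"0", "1", "01"}
L2 = {"1", "00"}

# Sort by length, then alphabetically, to make the result easy to read.
print(sorted(concat(L1, L2), key=lambda s: (len(s), s)))
# ['01', '11', '000', '011', '100', '0100']
```

The same function also illustrates two boundary cases: concatenating with the empty language gives the empty language, and concatenating with {ε} gives the language back.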
Remark 2.2.4.
3. L1 ⊆ L1 L2 if and only if ε ∈ L2 .
Example 2.2.5.
1. Kleene star of the language {01} is
$\{\varepsilon, 01, 0101, 010101, \ldots\} = \{(01)^n \mid n \ge 0\}$.
2. If L = {0, 10}, then L∗ = {ε, 0, 10, 00, 010, 100, 1010, 000, . . .}
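Since an arbitrary string in $L^n$ is of the form $x_1 x_2 \cdots x_n$, for
$x_i \in L$, and $L^* = \bigcup_{n \ge 0} L^n$, one can easily observe that
every string in $L^*$ is a concatenation of finitely many strings of $L$.
For instance, for the language $L = \{0, 1\}$ over the alphabet $\Sigma = \{0, 1\}$,
$L^* = L^0 \cup L \cup L^2 \cup \cdots$
$= \{\varepsilon\} \cup \{0, 1\} \cup \{00, 01, 10, 11\} \cup \cdots$
$= \{\varepsilon, 0, 1, 00, 01, 10, 11, \ldots\}$
= the set of all strings over Σ.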
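Since the Kleene star of a non-trivial language is infinite, a program can only list a finite part of it. The sketch below, under the assumption that languages are finite Python sets of strings, enumerates the strings of the Kleene star up to a chosen length; the name `kleene_star_up_to` and the length bound are illustrative choices, not part of the notes.

```python
def kleene_star_up_to(language, max_len):
    """Return all strings of L* whose length is at most max_len."""
    result = {""}       # L^0 = {ε}
    frontier = {""}     # strings found in the previous round
    while frontier:
        # Extend every newly found string by one more factor from L.
        frontier = {
            x + y
            for x in frontier
            for y in language
            if len(x) + len(y) <= max_len
        } - result
        result |= frontier
    return result


print(sorted(kleene_star_up_to({"0", "10"}, 3), key=lambda s: (len(s), s)))
# ['', '0', '00', '10', '000', '010', '100']
```

With L = {0, 1} and bound 2 the function returns all seven binary strings of length at most 2, in line with the observation that the star of the alphabet is the set of all strings.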
Thus, L∗ = L+ ∪ {ε}.
We often can easily describe various formal languages in English by stat-
ing the property that is to be satisfied by the strings in the respective lan-
guages. It is not only for elegant representation but also to understand the
properties of languages better, describing the languages in set builder form
is desired.
Consider the set of all strings over {0, 1} that start with 0. Note that
each such string can be seen as 0x for some x ∈ {0, 1}∗ . Thus the language
can be represented by
{0x | x ∈ {0, 1}∗ }.
Examples
1. The set of all strings over {a, b, c} that have ac as a substring can be
written as
$\{xacy \mid x, y \in \{a, b, c\}^*\}$.
This can also be written as
$\{x \in \{a, b, c\}^* \mid |x|_{ac} \ge 1\}$,
stating that it is the set of all strings over {a, b, c} in which the number
of occurrences of the substring ac is at least 1.
2. The set of all strings over some alphabet Σ with an even number of a's is
$\{x \in \Sigma^* \mid |x|_a = 2n, \text{ for some } n \in \mathbb{N}\}$.
Equivalently,
$\{x \in \Sigma^* \mid |x|_a \equiv 0 \bmod 2\}$.
3. The set of all strings over some alphabet Σ with an equal number of a's
and b's can be written as
$\{x \in \Sigma^* \mid |x|_a = |x|_b\}$.
5. The set of all strings over some alphabet Σ that have an a in the 5th
position from the right can be written as
6. The set of all strings over some alphabet Σ with no consecutive a's can
be written as
$\{x \in \Sigma^* \mid |x|_{aa} = 0\}$.
7. The set of all strings over {a, b} in which no occurrence of b comes
before an occurrence of a can be written as
$\{a^m b^n \mid m, n \ge 0\}$.
Note that this is the set of all strings over {a, b} which do not contain
ba as a substring.
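The set-builder descriptions above translate directly into membership tests. The following sketch assumes strings are ordinary Python strings and counts occurrences of a substring with overlaps allowed; all function names are hypothetical and chosen only for this illustration. The last lines compare the two descriptions given in item 7 on all short strings.

```python
from itertools import product


def occurrences(x, w):
    """Return |x|_w, the number of (possibly overlapping) occurrences of w in x."""
    return sum(1 for i in range(len(x) - len(w) + 1) if x.startswith(w, i))


def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length at most max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)


def has_ac(x):                 # |x|_ac >= 1
    return occurrences(x, "ac") >= 1


def even_number_of_a(x):       # |x|_a = 0 mod 2
    return occurrences(x, "a") % 2 == 0


def no_consecutive_a(x):       # |x|_aa = 0
    return occurrences(x, "aa") == 0


def a_block_then_b_block(x):   # x = a^m b^n for some m, n >= 0
    m = len(x) - len(x.lstrip("a"))
    return set(x[m:]) <= {"b"}


# Item 7: a^m b^n coincides with "ba does not occur", checked up to length 6.
mismatches = [
    x for x in strings_up_to("ab", 6)
    if a_block_then_b_block(x) != (occurrences(x, "ba") == 0)
]
print(mismatches)  # []
```

Such a bounded check is not a proof, but it is a quick way to catch a wrong set-builder description.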
2.3 Properties
The usual set theoretic properties with respect to union, intersection, comple-
ment, difference, etc. hold even in the context of languages. Now we observe
certain properties of languages with respect to the newly introduced oper-
ations concatenation, Kleene closure, and positive closure. In what follows,
L, L1 , L2 , L3 and L4 are languages.
P3 L{ε} = {ε}L = L.
P4 L∅ = ∅L = ∅.
P5 Distributive Properties:
1. (L1 ∪ L2 )L3 = L1 L3 ∪ L2 L3 .
Conversely, suppose x ∈ L1 L3 ∪ L2 L3 . Then x ∈ L1 L3 or x ∈ L2 L3 .
Without loss of generality, assume x ∉ L1 L3 . Then x ∈ L2 L3 .
2. L1 (L2 ∪ L3 ) = L1 L2 ∪ L1 L3 .
P6 If L1 ⊆ L2 and L3 ⊆ L4 , then L1 L3 ⊆ L2 L4 .
P7 ∅∗ = {ε}.
P8 {ε}∗ = {ε}.
P9 If ε ∈ L, then L∗ = L+ .
P10 L∗ L = LL∗ = L+ .
P11 (L∗ )∗ = L∗ .
P12 L∗ L∗ = L∗ .
$y_1 \cdots y_n \in (L_1 L_2)^*$ with $y_i \in L_1 L_2$. Now each $y_i = u_i v_i$,
for $u_i \in L_1$ and $v_i \in L_2$. Note that $v_i u_{i+1} \in L_2 L_1$, for all
$i$ with $1 \le i \le n - 1$. Hence,
$x = yz = (y_1 \cdots y_n) z = (u_1 v_1 \cdots u_n v_n) z = u_1 (v_1 u_2 \cdots v_{n-1} u_n v_n z) \in L_1 (L_2 L_1)^*$.
The converse is similar. Hence, $(L_1 L_2)^* L_1 = L_1 (L_2 L_1)^*$.
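Properties of this kind can be sanity-checked on small finite languages. The sketch below is only an illustration under stated assumptions: languages are Python sets of strings, the Kleene star is cut off at a length bound N, and the sample languages L1, L2, L3 are arbitrary choices made here. It checks P3, P4, one distributive law, and the identity just proved, restricted to strings of length at most N.

```python
def concat(l1, l2):
    """L1 L2 = {xy | x in L1, y in L2}."""
    return {x + y for x in l1 for y in l2}


def star_up_to(language, max_len):
    """All strings of L* of length at most max_len."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {
            x + y for x in frontier for y in language
            if len(x) + len(y) <= max_len
        } - result
        result |= frontier
    return result


def truncate(language, max_len):
    """Keep only the strings of length at most max_len."""
    return {x for x in language if len(x) <= max_len}


L1, L2, L3 = {"a", "ab"}, {"b", ""}, {"ba"}   # arbitrary sample languages
N = 6                                         # length bound for the check

p3 = concat(L1, {""}) == concat({""}, L1) == L1
p4 = concat(L1, set()) == concat(set(), L1) == set()
p5 = concat(L1 | L2, L3) == concat(L1, L3) | concat(L2, L3)

# (L1 L2)* L1 versus L1 (L2 L1)*, compared on strings of length <= N.
lhs = truncate(concat(star_up_to(concat(L1, L2), N), L1), N)
rhs = truncate(concat(L1, star_up_to(concat(L2, L1), N)), N)

print(p3, p4, p5, lhs == rhs)  # True True True True
```

The truncation is sound for this comparison because any string of length at most N in either language is built from a starred part of length at most N.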
Kleene star operation we can have finite representation for some infinite lan-
guages.
While operations are under consideration, to give finite representation for
languages one may first look at the indivisible languages, namely ∅, {ε}, and
{a}, for all a ∈ Σ, as basis elements.
To construct {x}, for x ∈ Σ∗ , we can use the operation concatenation
over the basis elements. For example, if x = aba then choose {a} and {b};
and concatenate {a}{b}{a} to get {aba}. Any finite language over Σ, say
{x1 , . . . , xn } can be obtained by considering the union {x1 } ∪ · · · ∪ {xn }.
In this section, we look at the aspects of considering operations over
basis elements to represent a language. This is one aspect of representing a
language. There are many other aspects to give finite representations; some
such aspects will be considered in the later chapters.
from the emptyset, {ε}, and {a}, for a ∈ Σ, by finitely many applica-
tions of union, concatenation and Kleene star.
2. The smallest class of languages over an alphabet Σ which contains
∅, {ε}, and {a} and is closed with respect to union, concatenation,
and Kleene star is the class of all regular languages over Σ.
Example 2.4.4. As we observed earlier, the languages ∅, {ε}, {a}, and
all finite sets are regular.
Example 2.4.5. $\{a^n \mid n \ge 0\}$ is regular as it can be represented by the
expression a∗ .
Example 2.4.6. Σ∗ , the set of all strings over an alphabet Σ, is regular. For
instance, if Σ = {a1 , a2 , . . . , an }, then Σ∗ can be represented as (a1 + a2 +
· · · + an )∗ .
Example 2.4.7. The set of all strings over {a, b} which contain ab as a
substring is regular. For instance, the set can be written as
{x ∈ {a, b}∗ | ab is a substring of x}
= {yabz | y, z ∈ {a, b}∗ }
= {a, b}∗ {ab}{a, b}∗
Hence, the corresponding regular expression is (a + b)∗ ab(a + b)∗ .
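A regular expression of this kind can be tried out with Python's `re` module, keeping in mind that the union written + in these notes is written | in Python's syntax. The following is a rough sketch, not a proof: it compares the expression with the plain substring test on all strings over {a, b} up to an arbitrarily chosen length of 8. The helper names are illustrative.

```python
import re
from itertools import product

# (a + b)* ab (a + b)* in the notation of the notes.
CONTAINS_AB = re.compile(r"(a|b)*ab(a|b)*")


def in_language(x):
    """True if the whole string x matches the regular expression."""
    return CONTAINS_AB.fullmatch(x) is not None


def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length at most max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)


mismatches = [x for x in strings_up_to("ab", 8) if in_language(x) != ("ab" in x)]
print(mismatches)  # []
```

Note that `fullmatch` is used rather than `search`, since membership in the language requires the entire string to match.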
L = {x | 01 is a substring of x} ∪ {x | 10 is a substring of x}
= {y01z | y, z ∈ Σ∗ } ∪ {u10v | u, v ∈ Σ∗ }
= Σ∗ {01}Σ∗ ∪ Σ∗ {10}Σ∗
Since Σ∗ , {01}, and {10} are regular we have L to be regular. In fact, at this
point, one can easily notice that
Example 2.4.9. Consider the set of all strings over {a, b} which do not
contain ab as a substring. By analyzing the language one can observe that it
is precisely
$\{b^n a^m \mid m, n \ge 0\}$.
Thus, b∗ a∗ is a regular expression for the language, and hence the language
is regular.
Example 2.4.10. The set of strings over {a, b} which contain an odd number
of a's is regular. Although the set can be represented in set builder form as
$\{x \in \{a, b\}^* \mid |x|_a = 2n + 1, \text{ for some } n\}$,
writing a regular expression for the language is a little tricky. Hence, we
postpone the argument to Chapter 3 (see Example 3.3.6), where we construct
a regular grammar for the language. A regular grammar is a tool to generate
regular languages.
Example 2.4.11. The set of strings over {a, b} which contain an odd number
of a's and an even number of b's is regular. As above, a set builder form of
the set is:
$\{x \in \{a, b\}^* \mid |x|_a = 2n + 1 \text{ for some } n, \text{ and } |x|_b = 2m \text{ for some } m\}$.
Writing a regular expression for the language is even trickier than in the
earlier example. This will be handled in Chapter 4 using finite automata,
yet another tool to represent regular languages.
Example 2.4.13. The regular expressions $(10+1)^*$ and $((10)^* 1^*)^*$ are
equivalent.
Since $L((10)^*) = \{(10)^n \mid n \ge 0\}$ and $L(1^*) = \{1^m \mid m \ge 0\}$, we have
$L((10)^* 1^*) = \{(10)^n 1^m \mid m, n \ge 0\}$. This implies
$x = x_1 x_2 \cdots x_p$ where $x_i = 10$ or $1$
$\Longrightarrow x = (10)^{p_1} 1^{q_1} (10)^{p_2} 1^{q_2} \cdots (10)^{p_r} 1^{q_r}$ for $p_i, q_j \ge 0$
$\Longrightarrow x \in L(((10)^* 1^*)^*)$.
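The equivalence of the two expressions can also be tested empirically. The sketch below assumes Python's `re` syntax (union written as |, non-capturing groups written as `(?:...)`) and compares the two expressions on every binary string up to length 8; the bound and the names are arbitrary choices for illustration. Agreement on short strings supports the claim but does not replace the argument above.

```python
import re
from itertools import product

R1 = re.compile(r"(?:10|1)*")            # (10 + 1)*
R2 = re.compile(r"(?:(?:10)*1*)*")       # ((10)* 1*)*


def binary_strings_up_to(max_len):
    """Yield every string over {0, 1} of length at most max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


disagreements = [
    x for x in binary_strings_up_to(8)
    if (R1.fullmatch(x) is None) != (R2.fullmatch(x) is None)
]
print(disagreements)  # []
```

For example, 1101 = 1 · 10 · 1 is matched by both expressions, while 100 is matched by neither, since a 0 can only appear immediately after a 1.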
From property P14, by choosing L1 = {10} and L2 = {1}, one may notice
that
({10} ∪ {1})∗ = ({10}∗ {1}∗ )∗ .
Since 10 and 1 represent the regular languages {10} and {1}, respectively,
from the above equation we get
Since those properties hold good for all languages, by specializing those prop-
erties to regular languages and in turn replacing by the corresponding regular
expressions we get the following identities for regular expressions.
Let r, r1 , r2 , and r3 be any regular expressions
1. rε ≈ εr ≈ r.
2. r1 r2 ≉ r2 r1 , in general.
4. r∅ ≈ ∅r ≈ ∅.
5. ∅∗ ≈ ε.
6. ε∗ ≈ ε.
7. If ε ∈ L(r), then r∗ ≈ r+ .
8. rr∗ ≈ r∗ r ≈ r+ .
9. (r1 + r2 )r3 ≈ r1 r3 + r2 r3 .
10. r1 (r2 + r3 ) ≈ r1 r2 + r1 r3 .
11. (r∗ )∗ ≈ r∗ .
Example 2.4.14.
Proof.
$\approx b^+ a^* b^* b + b^+ b$
$\approx b^+ a^* b^+ + b^+ b$
$\approx b^+ a^* b^+$, since $L(b^+ b) \subseteq L(b^+ a^* b^+)$.
Proof.
Chapter 3
Grammars
is presented in Chapter 7.
In the context of natural languages, the grammar of a language is a set of
rules used to construct or validate sentences of the language. As pointed
out in the introduction of Chapter 2, this is the third step in a formal
learning of a language. We now draw the reader's attention to the general
features of grammars (of natural languages), in order to formalize the notion
in the present context and to facilitate a better understanding of formal
languages. Consider the English sentence
In this process, we observe that two types of words are in the discussion.
1. The words like the, study, students.
appearing, then you need not say anything more about them. In case you
arrive at a stage where you find a word of type (2), then you are expected
to say something more about the word. For example, if the word Article comes,
then one should say which article needs to be chosen among a, an and the.
Let us call the type (1) and type (2) words terminals and nonterminals,
respectively, as per their features.
Thus, a grammar should include terminals and nonterminals along with a
set of rules which attribute some information regarding nonterminal symbols.
• A set of rules.
[Figure: derivation tree with root Sentence and two Noun-phrase nodes]
With this, we formally define the notion of grammar as below.
Definition 3.1.1. A grammar is a quadruple
G = (N, Σ, P, S)
where
1. N is a finite set of nonterminals,
2. Σ is a finite set of terminals,
3. S ∈ N is the start symbol, and
4. P is a finite subset of N × V ∗ called the set of production rules. Here,
V = N ∪ Σ.
It is convenient to write A → α, for the production rule (A, α) ∈ P .
To define a formal notion of validating or deriving a sentence using a
grammar, we require the following concepts.
Definition 3.1.2. Let G = (N, Σ, P, S) be a grammar with V = N ∪ Σ.
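1. We define a binary relation $\Rightarrow_G$ on $V^*$ by
$\alpha \Rightarrow_G \beta$ if and only if $\alpha = \alpha_1 A \alpha_2$, $\beta = \alpha_1 \gamma \alpha_2$ and $A \to \gamma \in P$,
for all $\alpha, \beta \in V^*$.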
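2. The relation $\Rightarrow_G$ is called the one step relation on G. If
$\alpha \Rightarrow_G \beta$, then we say that α yields β in one step in G.
5. If $\alpha = \alpha_0 \Rightarrow_G \alpha_1 \Rightarrow_G \cdots \Rightarrow_G \alpha_{n-1} \Rightarrow_G \alpha_n = \beta$
is a derivation, then the length of the derivation is n and it may be written
as $\alpha \Rightarrow_G^n \beta$.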
from the start symbol S of G. That is, $S \Rightarrow^* \alpha$.
10. The language generated by G, denoted by L(G), is the set of all sentences
generated by G. That is,
$L(G) = \{x \in \Sigma^* \mid S \Rightarrow^* x\}$.
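The one step relation and the language generated by a grammar can be explored with a small program. The following is a minimal sketch under several assumptions made only for illustration: every symbol is a single character, sentential forms are Python strings, productions are stored as a dictionary from a nonterminal to its right-hand sides, and ε is the empty string. The search is cut off by `max_len` and by a hypothetical cap `max_form_len` on the length of sentential forms, so for grammars whose forms grow without producing terminals it may miss strings.

```python
from collections import deque


def one_step(rules, form):
    """All β with form ⇒ β: rewrite one occurrence of a nonterminal A by γ, for A → γ."""
    results = set()
    for i, symbol in enumerate(form):
        for rhs in rules.get(symbol, ()):
            results.add(form[:i] + rhs + form[i + 1:])
    return results


def language_up_to(rules, start, max_len, max_form_len=None):
    """Terminal strings x with S ⇒* x and |x| <= max_len, found breadth-first."""
    if max_form_len is None:
        max_form_len = 2 * max_len + 2          # heuristic cap, an assumption
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if not any(symbol in rules for symbol in form):
            words.add(form)                      # no nonterminal left: a sentence
            continue
        for nxt in one_step(rules, form):
            terminals = sum(symbol not in rules for symbol in nxt)
            # Terminals never disappear, so too many of them is a dead end.
            if terminals <= max_len and len(nxt) <= max_form_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words


RULES = {"S": ["0S", "1S", ""]}                  # S → 0S | 1S | ε
print(sorted(one_step(RULES, "0S")))             # ['0', '00S', '01S']
print(sorted(language_up_to(RULES, "S", 2), key=lambda s: (len(s), s)))
# ['', '0', '1', '00', '01', '10', '11']
```

Breadth-first search is used so that short derivations are found first; a depth-first search would work equally well here because the set of admissible forms is finite.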
1. S ⇒ ab
2. S ⇒ bb
3. S ⇒ aba
4. S ⇒ aab
Notation 3.1.4.
1. A → α1 , A → α2 can be written as A → α1 | α2 .
2. Normally we use S as the start symbol of a grammar, unless otherwise
specified.
1. S → 0S
2. S → 1S
3. S → ε
({S}, Σ, P, S) does not generate any string and hence L(G) = ∅.
Method-II. Consider a CFG G in which each production rule has some non-
terminal symbol on its right hand side. Clearly, no terminal string can be
generated in G so that L(G) = ∅.
1. S → aS
2. S → ε
S ⇒ aS ⇒ aε = a
If we use the rule (1), then one may notice that there will always be the
nonterminal S in the resulting sentential form. A derivation can only be
terminated by using rule (2). Thus, for any derivation of length k, that
derives some string x, we would have used rule (1) k − 1 times and rule
(2) once at the end. Precisely, the derivation will be of the form
$S \Rightarrow aS$
$\Rightarrow aaS$
$\vdots$
$\Rightarrow \overbrace{aa \cdots a}^{k-1} S$
$\Rightarrow \overbrace{aa \cdots a}^{k-1} \varepsilon = a^{k-1}$
Hence, it is clear that $L(G) = \{a^k \mid k \ge 0\}$.
In the following, we give some more examples of typical CFGs.
Example 3.1.9. Consider the grammar having the following production
rules:
S → aSb | ε
One may notice that the rule S → aSb should be used to derive strings other
than ε, and the derivation shall always be terminated by S → ε. Thus, a
typical derivation is of the form
$S \Rightarrow aSb$
$\Rightarrow aaSbb$
$\vdots$
$\Rightarrow a^n S b^n$
$\Rightarrow a^n \varepsilon b^n = a^n b^n$
Hence, $L(G) = \{a^n b^n \mid n \ge 0\}$.
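The shape of this derivation is easy to reproduce in code. The sketch below, with the hypothetical helper name `derive_anbn`, applies the rule S → aSb exactly n times and then S → ε once, and returns the list of sentential forms; sentential forms are modelled as Python strings.

```python
def derive_anbn(n):
    """Return the sentential forms of S ⇒ aSb ⇒ ... ⇒ a^n S b^n ⇒ a^n b^n."""
    forms = ["S"]
    for _ in range(n):
        forms.append(forms[-1].replace("S", "aSb"))   # apply S → aSb
    forms.append(forms[-1].replace("S", ""))           # apply S → ε
    return forms


print(derive_anbn(2))  # ['S', 'aSb', 'aaSbb', 'aabb']
```

The derivation has length n + 1, matching the observation that the rule S → ε is used exactly once, at the end.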
Example 3.1.10. The grammar
S → aSa | bSb | a | b | ε
generates the set of all palindromes over {a, b}. For instance, the rules S →
aSa and S → bSb will produce same terminals at the same positions towards
left and right sides. While terminating the derivation the rules S → a | b
or S → ε will produce odd or even length palindromes, respectively. For
example, the palindrome abbabba can be derived as follows.
S ⇒ aSa
⇒ abSba
⇒ abbSbba
⇒ abbabba
A → bAc
to produce b's and c's on either side. Eventually, the rule
A → ε
can be introduced, which terminates the derivation. Thus, we have the
following production rules of a CFG, say G.
S → aSc | A | ε
A → bAc | ε
Now, one can easily observe that L(G) = L.
Example 3.1.12. For the language $\{a^m b^{m+n} c^n \mid m, n \ge 0\}$, one may think
along the lines of Example 3.1.11 and produce a CFG as given below.
S → AB
A → aAb | ε
B → bBc | ε
Figure 3.3 gives three derivations for the string a + b ∗ a. Note that the
three derivations are different, because they apply different sequences of
rules. Nevertheless, the derivations (1) and (2) share the following feature.
A nonterminal that appears at a particular common position in both the
derivations derives the same substring of a + b ∗ a in both the derivations.
In contrast, the derivations (2) and (3) do not share this feature.
For example, the second S in step 2 of derivations (1) and (2) derives the
substring a; whereas, the second S in step 2 of derivation (3) derives the
substring b ∗ a. In order to distinguish this feature between the derivations
of a string, we introduce a graphical representation of a derivation called
derivation tree, which will be a useful tool for several other purposes also.
Definition 3.2.1. Let G = (N, Σ, P, S) be a CFG and V = N ∪ Σ. For
A ∈ N and α ∈ V ∗ , suppose
$A \Rightarrow^* \alpha$
Example 3.2.2. Consider the following derivation in the CFG given in Ex-
ample 3.1.9.
S ⇒ aSb
⇒ aaSbb
⇒ aaεbb
= aabb
The derivation tree of the above derivation $S \Rightarrow^* aabb$ is shown below.
        S
      / | \
     a  S  b
      / | \
     a  S  b
        |
        ε
Example 3.2.3. Consider the following derivation in the CFG given in Ex-
ample 3.1.12.
S ⇒ AB
⇒ aAbB
⇒ aaAbbB
⇒ aaAbbbBc
⇒ aaAbbbεc
= aaAbbbc
The derivation tree of the above derivation $S \Rightarrow^* aaAbbbc$ is shown below.
            S
          /   \
        A       B
      / | \   / | \
     a  A  b b  B  c
      / | \     |
     a  A  b    ε
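[Figure 3.4: derivation trees (1), (2) and (3) for a + b ∗ a. Trees (1) and
(2) have root S with children S, ∗, S, where the left S has children S, +, S
(yielding a and b) and the right S yields a. Tree (3) has root S with children
S, +, S, where the left S yields a and the right S has children S, ∗, S
(yielding b and a).]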
rules in a derivation can be permuted to get an equivalent derivation. For
example, as derivation trees (1) and (2) of Figure 3.4 are same, the derivations
(1) and (2) of Figure 3.3 are equivalent. Thus, a derivation tree may represent
several equivalent derivations. However, for a given derivation tree, whose
yield is a terminal string, there is a unique special type of derivation, viz.
leftmost derivation (or rightmost derivation).
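The passage from a derivation tree to its leftmost derivation can be sketched in code. The representation below is an assumption made for illustration: a tree is a pair (label, children), a leaf has an empty list of children, and a leaf labelled ε is written with the empty string. At every step the leftmost node that still has children is replaced by its children, which is exactly a leftmost substitution.

```python
def node(label, *children):
    """Build a tree node; a leaf is a node without children."""
    return (label, list(children))


def tree_yield(tree):
    """Concatenate the leaf labels from left to right."""
    label, children = tree
    if not children:
        return label
    return "".join(tree_yield(child) for child in children)


def leftmost_derivation(tree):
    """Return the sentential forms of the leftmost derivation the tree represents."""
    current = [tree]                        # subtrees forming the sentential form
    forms = ["".join(n[0] for n in current)]
    while True:
        # Index of the leftmost node that has been expanded in the tree.
        index = next((i for i, n in enumerate(current) if n[1]), None)
        if index is None:
            return forms
        current[index:index + 1] = current[index][1]
        forms.append("".join(n[0] for n in current))


# Tree for a + b * a with root S → S * S and left subtree S → S + S.
TREE = node(
    "S",
    node("S", node("S", node("a")), node("+"), node("S", node("b"))),
    node("*"),
    node("S", node("a")),
)

print(tree_yield(TREE))            # a+b*a
print(leftmost_derivation(TREE))
# ['S', 'S*S', 'S+S*S', 'a+S*S', 'a+b*S', 'a+b*a']
```

Replacing "leftmost" by "rightmost" in the index search would give the rightmost derivation of the same tree.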
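Proof. The “only if” part is straightforward, as every leftmost derivation
is, anyway, a derivation.
For the “if” part, let
$A = \alpha_0 \Rightarrow \alpha_1 \Rightarrow \alpha_2 \Rightarrow \cdots \Rightarrow \alpha_{n-1} \Rightarrow \alpha_n = x$
be a derivation. If $\alpha_i \Rightarrow_L \alpha_{i+1}$, for all $0 \le i < n$, then we are through.
Otherwise, there is an $i$ such that $\alpha_i \not\Rightarrow_L \alpha_{i+1}$. Let $k$ be the least such that
$\alpha_k \not\Rightarrow_L \alpha_{k+1}$. Then, we have $\alpha_i \Rightarrow_L \alpha_{i+1}$, for all $i < k$, i.e. we have leftmost
substitutions in the first $k$ steps. We now demonstrate how to extend the
leftmost substitution to the $(k + 1)$th step. That is, we show how to convert
the derivation
$A = \alpha_0 \Rightarrow_L \alpha_1 \Rightarrow_L \cdots \Rightarrow_L \alpha_k \Rightarrow \alpha_{k+1} \Rightarrow \cdots \Rightarrow \alpha_{n-1} \Rightarrow \alpha_n = x$
into a derivation
$A = \alpha_0 \Rightarrow_L \alpha_1 \Rightarrow_L \cdots \Rightarrow_L \alpha_k \Rightarrow_L \alpha'_{k+1} \Rightarrow \alpha'_{k+2} \Rightarrow \cdots \Rightarrow \alpha'_{n-1} \Rightarrow \alpha'_n = x$
in which there are leftmost substitutions in the first $(k + 1)$ steps and the
derivation has the same length as the original. Hence, by induction one can
extend the given derivation to a leftmost derivation $A \Rightarrow_L^* x$.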
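Since $\alpha_k \Rightarrow \alpha_{k+1}$ but $\alpha_k \not\Rightarrow_L \alpha_{k+1}$, we have
$\alpha_k = y A_1 \beta_1 A_2 \beta_2$,
for some $y \in \Sigma^*$, $A_1, A_2 \in N$ and $\beta_1, \beta_2 \in V^*$, and $A_2 \to \gamma_2 \in P$ such that
$\alpha_{k+1} = y A_1 \beta_1 \gamma_2 \beta_2$.
But, since the derivation eventually yields the terminal string x, at a later
step, say the pth step (for p > k), $A_1$ would have been substituted by some
string $\gamma_1 \in V^*$ using the rule $A_1 \to \gamma_1 \in P$. Thus the original derivation
looks as follows.
$A = \alpha_0 \Rightarrow_L \alpha_1 \Rightarrow_L \cdots \Rightarrow_L \alpha_k = y A_1 \beta_1 A_2 \beta_2$
$\Rightarrow \alpha_{k+1} = y A_1 \beta_1 \gamma_2 \beta_2 = y A_1 \xi_1$, with $\xi_1 = \beta_1 \gamma_2 \beta_2$
$\Rightarrow \alpha_{k+2} = y A_1 \xi_2$ (here $\xi_1 \Rightarrow \xi_2$)
$\vdots$
$\Rightarrow \alpha_{p-1} = y A_1 \xi_{p-k-1}$
$\Rightarrow \alpha_p = y \gamma_1 \xi_{p-k-1}$
$\Rightarrow \alpha_{p+1}$
$\vdots$
$\Rightarrow \alpha_n = x$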
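$\vdots$
$\Rightarrow \alpha'_{p-1} = y \gamma_1 \xi_{p-k-2}$
$\Rightarrow \alpha'_p = y \gamma_1 \xi_{p-k-1} = \alpha_p$
$\Rightarrow \alpha'_{p+1} = \alpha_{p+1}$
$\vdots$
$\Rightarrow \alpha'_n = \alpha_n = x$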
As stated earlier, we have the theorem by induction.
Proposition 3.2.7. Two equivalent leftmost derivations are identical.
Proof. Let T be the derivation tree of two leftmost derivations D1 and D2 .
Note that the production rule applied at each nonterminal symbol is precisely
represented by its children in the derivation tree. Since the derivation
tree is the same for D1 and D2 , the production rules applied in both the
derivations are the same. Moreover, as D1 and D2 are leftmost derivations, the
order of application of the production rules is also the same; that is, each
production is applied to the leftmost nonterminal symbol. Hence, the
derivations D1 and D2 are identical.
Now we are ready to establish the correspondence between derivation
trees and leftmost derivations.
κ = max{|α| | A → α ∈ P }.
$|x| \le \kappa^h$
Hence, T is a κ-ary tree. Now, along the lines of Theorem 1.2.3, which is
given for binary trees, one can easily prove that T has at most $\kappa^h$ leaf
nodes. Hence, $|x| \le \kappa^h$.
3.2.1 Ambiguity
Let G be a context-free grammar. It may be the case that G gives two or
more inequivalent derivations for a string x ∈ L(G). This can be identified
by their different derivation trees. While deriving the string, if there are
multiple possibilities for applying production rules to the same symbol, one
may have difficulty in choosing the correct rule. In the context of a compiler
constructed from a grammar, this difficulty leads to an ambiguity in parsing.
Thus, a grammar with such a property is said to be ambiguous.
Remark 3.2.11. One can equivalently say that a CFG G is ambiguous, if G has
two different rightmost derivations or derivation trees for a string in L(G).
S → S ∗ S | S + S | (S) | a | b
S → S+T | T
T → T ∗R | R
R → (S) | a | b
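Ambiguity of a grammar on a particular string can be tested by counting derivation trees. The following sketch is an illustration under stated assumptions: productions are stored as tuples of single-character symbols, the grammar has no ε-rules and no cyclic unit rules (both hold for the two grammars above), and the function name `count_trees` is hypothetical. It counts, for each symbol and each substring, the number of derivation trees with that yield.

```python
from functools import lru_cache


def count_trees(rules, start, text):
    """Count derivation trees with root `start` and yield `text`.

    Assumes no ε-rules and no cyclic unit rules, so every symbol of a
    right-hand side covers at least one character of the text.
    """

    @lru_cache(maxsize=None)
    def trees_for_symbol(symbol, i, j):
        if symbol not in rules:                       # terminal symbol
            return 1 if text[i:j] == symbol else 0
        return sum(trees_for_rhs(rhs, i, j) for rhs in rules[symbol])

    @lru_cache(maxsize=None)
    def trees_for_rhs(rhs, i, j):
        if len(rhs) == 1:
            return trees_for_symbol(rhs[0], i, j)
        total = 0
        # The first symbol covers text[i:k]; leave one character per remaining symbol.
        for k in range(i + 1, j - len(rhs) + 2):
            first = trees_for_symbol(rhs[0], i, k)
            if first:
                total += first * trees_for_rhs(rhs[1:], k, j)
        return total

    return trees_for_symbol(start, 0, len(text))


G1 = {"S": [("S", "*", "S"), ("S", "+", "S"), ("(", "S", ")"), ("a",), ("b",)]}
G2 = {
    "S": [("S", "+", "T"), ("T",)],
    "T": [("T", "*", "R"), ("R",)],
    "R": [("(", "S", ")"), ("a",), ("b",)],
}

print(count_trees(G1, "S", "a+b*a"))  # 2
print(count_trees(G2, "S", "a+b*a"))  # 1
```

A count greater than 1 for some string shows that the grammar is ambiguous; a count of 1 on a few strings is only evidence, not a proof, that a grammar is unambiguous.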
Example 3.2.14. The context-free language
$\{a^m b^m c^n d^n \mid m, n \ge 1\} \cup \{a^m b^n c^n d^m \mid m, n \ge 1\}$
is inherently ambiguous. For a proof, one may refer to Hopcroft and Ullman
[1979].
Example 3.3.2. The CFGs given in Examples 3.1.8, 3.1.9 and 3.1.11 are
clearly linear. Whereas, the CFG given in Example 3.1.12 is not linear.
Remark 3.3.3. If G is a linear grammar, then every derivation in G is a
leftmost derivation as well as rightmost derivation. This is because there is
exactly one nonterminal symbol in the sentential form of each internal step
of the derivation.
A → x or A → xB
A → x or A → Bx
Because of the similarity in the definitions of left linear grammar and right
linear grammar, every result which is true for one can be imitated to obtain
a parallel result for the other. In fact, the notion of left linear grammar is
equivalent to that of right linear grammar (see Exercise ??). Here, by
equivalence we mean that if L is generated by a right linear grammar, then
there exists a left linear grammar that generates L; and vice versa.
Example 3.3.5. The CFG given in Example 3.1.8 is a right linear grammar.
For the language $\{a^k \mid k \ge 0\}$, an equivalent left linear grammar is given
by the following production rules.
S → Sa | ε
the actual number, but rather the parity (even or odd) of the number of a's
that are generated so far. So, to keep track of the parity information, we
require two nonterminal symbols, say O and E, for odd and even respectively.
At the beginning of any derivation, we would have generated zero symbols, and
in particular an even number of a's. Thus, the nonterminal symbol E is
expected to be the start symbol of the desired grammar. While E generates
the terminal symbol b, the derivation can continue with the nonterminal
symbol E. Whereas, if one a is generated, then we change to the symbol O,
as the number of a's generated so far changes from even to odd. Similarly, we
switch to E on generating an a with the symbol O, and continue with O on
generating any number of b's. Precisely, we have obtained the following
production rules.
E → bE | aO
O → bO | aE
Since our criterion is to generate a string with an odd number of a's, we can
terminate a derivation while in O. That is, we introduce the
production rule
O→ε
Hence, we have the right linear grammar G = ({E, O}, {a, b}, P, E), where P
consists of the production rules defined above. Now, one can easily observe
that L(G) = L.
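The role of the nonterminals E and O as parity trackers can be made concrete. The sketch below assumes input strings over {a, b} only and stores each production A → xB as `RULES[A][x] = B`; the names `RULES`, `EPSILON_RULES` and `derivation` are chosen here for illustration. Because each nonterminal has exactly one rule per terminal symbol, the derivation of a given string, if it exists, is forced.

```python
from itertools import product

# E → bE | aO and O → bO | aE, stored as RULES[A][x] = B for A → xB.
RULES = {"E": {"b": "E", "a": "O"}, "O": {"b": "O", "a": "E"}}
EPSILON_RULES = {"O"}   # O → ε


def derivation(x):
    """Return the sentential forms of a derivation E ⇒* x, or None if there is none."""
    prefix, nonterminal, forms = "", "E", ["E"]
    for symbol in x:
        nonterminal = RULES[nonterminal][symbol]   # the only applicable rule
        prefix += symbol
        forms.append(prefix + nonterminal)
    if nonterminal not in EPSILON_RULES:
        return None                                 # cannot terminate here
    forms.append(prefix)                            # apply O → ε
    return forms


print(derivation("ba"))   # ['E', 'bE', 'baO', 'ba']
print(derivation("aa"))   # None

# Compare with the parity condition on all strings up to length 6.
mismatches = [
    "".join(w)
    for n in range(7)
    for w in product("ab", repeat=n)
    if (derivation("".join(w)) is not None) != ("".join(w).count("a") % 2 == 1)
]
print(mismatches)  # []
```

The nonterminal carried along by `derivation` plays the same role as the current node in the digraph representation discussed later.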
An elegant proof of this theorem would require some more concepts and
is hence postponed to later chapters. For a proof of the theorem, one may
refer to Chapter 4. In view of the theorem, we have the following definition.
Definition 3.3.8. Right linear grammars are also called regular grammars.
Remark 3.3.9. Since left linear grammars and right linear grammars are
equivalent, left linear grammars also generate precisely the regular
languages. Hence, left linear grammars are also called regular grammars.
S → aS | bS | abA
A → aA | bA | ε
Example 3.3.11. Consider the regular grammar with the following produc-
tion rules.
S → aA | bS | ε
A → aS | bA
Note that the grammar generates the set of all strings over {a, b} having
an even number of a's.
It can easily be observed that the following regular grammar generates the
language.
S → aS | B
B → bB | b
Example 3.3.13. The grammar with the following production rules is clearly
regular.
S → bS | aA
A → bA | aB
B → bB | aS | ε
Example 3.4.2. The digraph for the grammar presented in Example 3.3.10
is as follows.
→ (S) --ab--> (A) --ε--> ($)
with a loop labeled a, b at S and a loop labeled a, b at A.
Remark 3.4.3. From a digraph one can easily give the corresponding right
linear grammar.
Example 3.4.4. The digraph for the grammar presented in Example 3.3.12
is as follows.
→ (S) --ε--> (B) --b--> ($)
with a loop labeled a at S and a loop labeled b at B.
Remark 3.4.5. A derivation in a right linear grammar can be represented, in
its digraph, by a path from the starting node S to the special node $; and
conversely. We illustrate this through the following.
Consider the following derivation for the string aab in the grammar given
in Example 3.3.12.
S ⇒ aS ⇒ aaS ⇒ aaB ⇒ aab
The derivation can be traced by a path from S to $ in the corresponding
digraph (refer to Example 3.4.4) as shown below.
S --a--> S --a--> S --ε--> B --b--> $
One may notice that the concatenation of the labels on the path, called label
of the path, gives the desired string. Conversely, it is easy to see that the
label of any path from S to $ can be derived in G.
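The correspondence between paths and derivations can be explored by enumerating path labels. The sketch below encodes the digraph of Example 3.4.4 as a list of labelled edges, an encoding assumed here for illustration, with ε written as the empty string, and collects the labels of all paths from S to $ up to a chosen length.

```python
# Edges (source, label, target) of the digraph for S → aS | B, B → bB | b.
EDGES = [("S", "a", "S"), ("S", "", "B"), ("B", "b", "B"), ("B", "b", "$")]


def path_labels(edges, start, end, max_len):
    """Labels of all paths from `start` to `end` whose label has length <= max_len."""
    seen = {(start, "")}
    stack = [(start, "")]
    labels = set()
    while stack:
        current, label = stack.pop()
        if current == end:
            labels.add(label)
        for source, symbol, target in edges:
            candidate = (target, label + symbol)
            if source == current and len(candidate[1]) <= max_len and candidate not in seen:
                seen.add(candidate)
                stack.append(candidate)
    return labels


print(sorted(path_labels(EDGES, "S", "$", 3), key=lambda s: (len(s), s)))
# ['b', 'ab', 'bb', 'aab', 'abb', 'bbb']
```

The string aab appears among the labels, in agreement with the derivation S ⇒ aS ⇒ aaS ⇒ aaB ⇒ aab traced above.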
Chapter 4
Finite Automata
representation of the grammar is given below:
→ (E) --a--> (O) --ε--> ($)
with a loop labeled b at E, a loop labeled b at O, and an arc labeled a
from O back to E.
Let us traverse the digraph via a sequence of a's and b's starting at the node
E. We notice that, at a given point of time, if we are at the node E, then
so far we have encountered an even number of a's. Whereas, if we are at the
node O, then so far we have traversed through an odd number of a's. Of course,
being at the node $ has the same effect as being at the node O, regarding the
number of a's; but once we reach $, we will not have any further move.
Thus, in a digraph that models a system which understands a language,
nodes hold some information about the traversal. As each node holds
some information, it can be considered as a state of the system, and hence a
state can be considered as a memory creating unit. As we are interested in the
languages having finite representation, we restrict ourselves to those systems
with a finite number of states only. In such a system we have transitions
between the states on symbols of the alphabet. Thus, we may call them
finite state transition systems. As the transitions are predefined in a finite
state transition system, it automatically changes states based on the symbols
given as input. Thus a finite state transition system can also be called a
δ : Q × Σ −→ Q is a function called the transition function or next-state
function.
Note that, for every state and an input symbol, the transition function δ
assigns a unique next state.
Example 4.1.1. Let Q = {p, q, r}, Σ = {a, b}, F = {r} and δ is given by
the following table:
δ a b
p q p
q r p
r r r
Clearly, A = (Q, Σ, δ, p, F ) is a DFA.
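The quintuple of this example can be written down directly as data. The following sketch assumes states and symbols are single-character strings and the transition function is a dictionary keyed by (state, symbol) pairs; the function `is_dfa` is a hypothetical helper that checks the defining condition that δ is a total function from Q × Σ to Q.

```python
Q = {"p", "q", "r"}
SIGMA = {"a", "b"}
START = "p"
FINAL = {"r"}
DELTA = {
    ("p", "a"): "q", ("p", "b"): "p",
    ("q", "a"): "r", ("q", "b"): "p",
    ("r", "a"): "r", ("r", "b"): "r",
}


def is_dfa(states, alphabet, delta, start, finals):
    """Check that (states, alphabet, delta, start, finals) is a well-formed DFA."""
    return (
        start in states
        and finals <= states
        # δ must be defined exactly on Q × Σ ...
        and set(delta) == {(s, a) for s in states for a in alphabet}
        # ... and must take values in Q.
        and all(target in states for target in delta.values())
    )


print(is_dfa(Q, SIGMA, DELTA, START, FINAL))  # True
```

Dropping any single entry of `DELTA` makes the check fail, which reflects the requirement that every state has a unique next state on every input symbol.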
Transition Table
Instead of explicitly giving all the components of the quintuple of a DFA, we
may simply point out the initial state and the final states of the DFA in the
table of transition function, called transition table. For instance, we use an
arrow to point the initial state and we encircle all the final states. Thus, we
can have an alternative representation of a DFA, as all the components of
the DFA now can be interpreted from this representation. For example, the
DFA in Example 4.1.1 can be denoted by the following transition table.
δ      a   b
→p     q   p
 q     r   p
(r)    r   r
Transition Diagram
Normally, we associate some graphical representation to understand abstract
concepts better. In the present context also we have a digraph representa-
tion for a DFA, (Q, Σ, δ, q0 , F ), called a state transition diagram or simply a
transition diagram which can be constructed as follows:
3. If there are multiple arcs, labeled a1 , . . . , ak−1 and ak , from one state to
another state, then we simply put only one arc labeled a1 , . . . , ak−1 , ak .
The transition diagram for the DFA given in Example 4.1.1 is as below:
[Transition diagram: p --a--> q, q --a--> r, q --b--> p, a loop on b at p and a loop on a, b at r; p is the initial state and r is the final state.]
Note that there are two transitions from the state r to itself, on the symbols
a and b. As stated in point 3 above, these are represented by a single
arc from r to r labeled a, b.
the path labeled aba from p to reach to q as shown below:
p --a--> q --b--> p --a--> q
Language of a DFA
Now, we are in a position to define the notion of acceptance of a string, and
consequently acceptance of a language, by a DFA.
A string x ∈ Σ∗ is said to be accepted by a DFA A = (Q, Σ, δ, q0 , F ) if
δ̂(q0 , x) ∈ F . That is, when the string x is applied in the initial state, the
DFA reaches a final state.
The set of all strings accepted by the DFA A is said to be the language
accepted by A and is denoted by L(A ). That is,
L(A ) = {x ∈ Σ∗ | δ̂(q0 , x) ∈ F }.
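The acceptance condition can be sketched in a few lines of Python. The sketch assumes the usual definition of the extended transition function, δ̂(q, ε) = q and δ̂(q, xa) = δ(δ̂(q, x), a), and uses the DFA of Example 4.1.1 as data; the function names delta_hat and accepts are illustrative.

```python
# Transition function of the DFA in Example 4.1.1.
DELTA = {
    ("p", "a"): "q", ("p", "b"): "p",
    ("q", "a"): "r", ("q", "b"): "p",
    ("r", "a"): "r", ("r", "b"): "r",
}
START = "p"
FINAL = {"r"}


def delta_hat(delta, state, x):
    """Extended transition function: apply delta symbol by symbol."""
    for symbol in x:
        state = delta[(state, symbol)]
    return state


def accepts(delta, start, final, x):
    """x is accepted iff delta_hat(start, x) is a final state."""
    return delta_hat(delta, start, x) in final


print(delta_hat(DELTA, "p", "aba"))
print(accepts(DELTA, START, FINAL, "baab"))
```

The first call follows the path labeled aba from p, and the second tests membership of baab in L(A).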
Example 4.1.2. Consider the following DFA
[Transition diagram: q0 --a--> q1 --b--> q2 --b--> q3, where q0 is the initial state and q2 and q3 are the final states; all remaining transitions lead to the trap state t, which has a loop on a, b.]
The only way to reach the final state q2 from the initial state q0 is through
the string ab, and the only way to reach the other final state q3 is through abb.
Thus, the language accepted by the DFA is
{ab, abb}.
Example 4.1.3. As shown below, let us recall the transition diagram of the
DFA given in Example 4.1.1.
[Transition diagram: p --a--> q, q --a--> r, q --b--> p, a loop on b at p and a loop on a, b at r; p is the initial state and r is the final state.]
1. One may notice that if there is aa in the input, then the DFA leads us
from the initial state p to the final state r. Since r is also a trap state,
after reaching r we continue to be at r on any subsequent input.
2. If the input does not contain aa, then we will be shuffling between p
and q but never reach the final state r.
Hence, the language accepted by the DFA is the set of all strings over {a, b}
having aa as a substring, i.e. {xaay | x, y ∈ {a, b}∗ }.
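The claim can also be checked mechanically on short inputs. The sketch below runs the DFA of Example 4.1.1 on every string over {a, b} up to a chosen length and compares the outcome with a direct substring test; the helper name first_mismatch and the length bound are assumptions made for illustration, and such a finite check supports the argument but does not replace it.

```python
from itertools import product

# Transition function of the DFA in Example 4.1.1; p is initial, r is final.
DELTA = {
    ("p", "a"): "q", ("p", "b"): "p",
    ("q", "a"): "r", ("q", "b"): "p",
    ("r", "a"): "r", ("r", "b"): "r",
}


def accepts(x):
    state = "p"
    for symbol in x:
        state = DELTA[(state, symbol)]
    return state == "r"


def first_mismatch(max_len):
    """Return a string on which the DFA and the substring test disagree, if any."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            x = "".join(letters)
            if accepts(x) != ("aa" in x):
                return x
    return None


print(first_mismatch(8))
```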
Description of a DFA
Note that a DFA is an abstract (computing) device. The depiction given in
Figure 4.1 shall facilitate one to understand its behavior. As shown in the
figure, there are mainly three components namely input tape, reading head,
and finite control. It is assumed that a DFA has a left-justified infinite tape
to accommodate an input of any length. The input tape is divided into cells
such that each cell accommodates a single input symbol. The reading head,
which can read one symbol at a time, connects the finite control to the input
tape. The finite control has the states and the information of the transition
function along with a pointer that points to exactly one state.
At a given point of time, the DFA will be in some internal state, say p,
called the current state, pointed to by the pointer, and the reading head will be
reading a symbol, say a, from the input tape, called the current symbol. If
δ(p, a) = q, then at the next point of time the DFA will change its internal
state from p to q (now the pointer will point to q) and the reading head will
move one cell to the right.
state, then the input x is accepted by the DFA. Otherwise, x is rejected by
the DFA.
Configurations
A configuration or an instantaneous description of a DFA gives the informa-
tion about the current state and the portion of the input string that is right
from and on to the reading head, i.e. the portion yet to be read. Formally,
a configuration is an element of Q × Σ∗ .
Observe that for a given input string x the initial configuration is (q0 , x)
and a final configuration of a DFA is of the form (p, ε).
The notion of computation in a DFA A can be described through
configurations. To this end, we first define the one-step relation as follows.
C0 , C1 , . . . , Cn such that
C = C0 ⊢ C1 ⊢ C2 ⊢ · · · ⊢ Cn = C′.
Definition 4.1.5. A computation of A on the input x is of the form
C ⊢* C′
where C = (q0 , x) and C′ = (p, ε), for some p.
Remark 4.1.6. Given a DFA A = (Q, Σ, δ, q0 , F ), x ∈ L(A ) if and only if
(q0 , x) ⊢* (p, ε) for some p ∈ F .
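A computation can be listed configuration by configuration. The sketch below assumes the usual one-step relation, in which (p, ax) is followed by (δ(p, a), x), and uses the DFA of Example 4.1.1 as data; the function names step and computation are illustrative.

```python
# Transition function of the DFA in Example 4.1.1.
DELTA = {
    ("p", "a"): "q", ("p", "b"): "p",
    ("q", "a"): "r", ("q", "b"): "p",
    ("r", "a"): "r", ("r", "b"): "r",
}


def step(delta, config):
    """One step: (p, a x) is followed by (delta(p, a), x); None if nothing is left."""
    state, rest = config
    if not rest:
        return None
    return (delta[(state, rest[0])], rest[1:])


def computation(delta, start, x):
    """All configurations from (start, x) down to (p, empty string)."""
    configs = [(start, x)]
    while configs[-1][1]:
        configs.append(step(delta, configs[-1]))
    return configs


for config in computation(DELTA, "p", "abaa"):
    print(config)
```

The last configuration has the empty string as its second component; by Remark 4.1.6 the input is accepted exactly when the state in that configuration is final.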
Example 4.1.7. Consider the following DFA
[Transition diagram: q0 --a--> q1 --b--> q2 --a--> q3 --b--> q4, with loops on b at q0, on a at q1, on b at q2, on a at q3 and on a, b at q4; q0 is the initial state, and q2 and q3 are the final states.]
the initial state q0 only.
2. On the other hand, if the input has an a, then the DFA transits from
q0 to q1 . On any subsequent a's, it remains in q1 only. Thus, the role
of q1 in the DFA is to understand that the input has at least one a.
3. Further, the DFA goes from q1 to q2 via a b and remains at q2 on
subsequent b's. Thus, q2 recognizes that the input has an occurrence
of an ab.
Since q2 is a final state, the DFA accepts all those strings which have
one occurrence of ab.
4. Subsequently, if we have a number of a's, the DFA will reach q3 ,
which is a final state, and remain there; so that all such strings will
also be accepted.
5. But, from then, via b the DFA goes to the trap state q4 and since it is
not a final state, all those strings will not be accepted. Here, note that
the role of q4 is to remember the second occurrence of ab in the input.
Thus, the language accepted by the DFA is the set of all strings over {a, b}
which have exactly one occurrence of ab. That is,
{x ∈ {a, b}∗ | |x|ab = 1}.
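The role of each state can be checked on short inputs. In the sketch below the transition table is reconstructed from the description in the points above (it is an assumption that this matches the diagram exactly), and the DFA is compared with a direct count of occurrences of ab; the helper name first_mismatch is illustrative.

```python
from itertools import product

# Transitions reconstructed from the description of Example 4.1.7.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q3", ("q2", "b"): "q2",
    ("q3", "a"): "q3", ("q3", "b"): "q4",
    ("q4", "a"): "q4", ("q4", "b"): "q4",
}
FINAL = {"q2", "q3"}


def accepts(x):
    state = "q0"
    for symbol in x:
        state = DELTA[(state, symbol)]
    return state in FINAL


def first_mismatch(max_len):
    """Return a string where the DFA disagrees with 'exactly one ab', if any."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            x = "".join(letters)
            if accepts(x) != (x.count("ab") == 1):
                return x
    return None


print(first_mismatch(8))
```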
2. Whereas, on input a or b, every state will lead the DFA to its next
state as q0 to q1 , q1 to q2 and q2 to q0 .
4. On the other hand, if any string violates the above stated condition,
then the DFA will be either in q1 or in q2 , and hence the string will not be
accepted. More precisely, if the total number of a's and b's in the input
leaves the remainder 1 or 2, when it is divided by 3, then the DFA will
be in the state q1 or q2 , respectively.
1. Instead of q0 , if q1 is the only final state (as shown in (i), below), then
the language will be
{x ∈ {a, b, c}∗ | |x|a + |x|b ≡ 1 mod 3}.
[Transition diagrams (i) and (ii): in both, q0 --a,b--> q1 --a,b--> q2 --a,b--> q0, with a loop on c at every state and q0 as the initial state. In (i) the only final state is q1 ; in (ii) the only final state is q2 .]
3. Along similar lines, one may observe that the language
{x ∈ {a, b, c}∗ | |x|a + |x|b ≢ 1 mod 3}
= {x ∈ {a, b, c}∗ | |x|a + |x|b ≡ 0 or 2 mod 3}
can be accepted by the DFA shown below, by making both q0 and q2
final states.
[Transition diagram: q0 --a,b--> q1 --a,b--> q2 --a,b--> q0, with a loop on c at every state; q0 is the initial state, and q0 and q2 are the final states.]
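The effect of changing the set of final states can be sketched as follows. The states q0, q1, q2 are represented by the integers 0, 1, 2 (a choice made only for this illustration), so that reading a or b advances the state by one modulo 3 and reading c leaves it unchanged; the helper names run and language_matches are hypothetical.

```python
from itertools import product

# State i stands for q_i; a and b advance the counter modulo 3, c does not.
DELTA = {}
for i in range(3):
    DELTA[(i, "a")] = (i + 1) % 3
    DELTA[(i, "b")] = (i + 1) % 3
    DELTA[(i, "c")] = i


def run(x):
    """Return the state reached from q0 on input x."""
    state = 0
    for symbol in x:
        state = DELTA[(state, symbol)]
    return state


def language_matches(final, predicate, max_len=6):
    """Check, on all strings up to max_len, that acceptance equals predicate."""
    for n in range(max_len + 1):
        for letters in product("abc", repeat=n):
            x = "".join(letters)
            if (run(x) in final) != predicate(x):
                return False
    return True


def count_ab(x):
    return x.count("a") + x.count("b")


print(language_matches({1}, lambda x: count_ab(x) % 3 == 1))
print(language_matches({0, 2}, lambda x: count_ab(x) % 3 != 1))
```

Only the set of final states differs between the two calls; the transition function is the same in both.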
Clearly, on any input over {a, b}, being in the state q2 means that, so
far, we have read a portion of length 2.
[Transition diagram on the states q0 , q1 , q2 , q3 and q4 , with every arc labeled a, b and q2 as the final state.]
[Transition diagram on the states q0 , q1 , . . . , qk0 , qk0+1 , . . . , qk0+k−1 , with every arc labeled Σ and qk0 as the final state.]
Thus, we have the following DFA which accepts the given language.
[Transition diagram on the states q0 and q1 , with arcs labeled a and b and q1 as the final state.]
[Transition diagram of a DFA over {a, b} with four unnamed states, two of which are final.]
Unlike the above examples, it is a little tricky to ascertain the language
accepted by this DFA. After spending some time, one may possibly
report that the language is the set of all strings over {a, b} with last but one
symbol as b.
But for this language, if we consider the following type of finite automaton,
one can easily be convinced (with an appropriate notion of language
acceptance) that the language is so.
[Transition diagram: p --b--> q and q --a,b--> r, with a loop on a, b at p; p is the initial state and r is the final state.]
Note that, in this type of finite automaton we are considering multiple (pos-
sibly zero) number of transitions for an input symbol in a state. Thus, if
a string is given as input, then one may observe that there can be multiple
next states for the string. For example, in the above finite automaton, if
abbaba is given as input then the following two traces can be identified.
p --a--> p --b--> p --b--> p --a--> p --b--> p --a--> p
p --a--> p --b--> p --b--> p --a--> p --b--> q --a--> r
Clearly, p and r are the next states after processing the string abbaba. Since
it reaches a final state, viz. r, in one of the possibilities, we may
say that the string is accepted. So, by considering this notion of language
acceptance, the language accepted by the finite automaton can be quickly
reported as the set of all strings with the last but one symbol as b, i.e.
{xb(a + b) | x ∈ (a + b)∗ }.
Thus, the corresponding regular expression is (a + b)∗ b(a + b).
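This notion of acceptance can be simulated by keeping track of the set of all possible current states. The sketch below encodes the automaton on p, q, r described above as a dictionary from (state, symbol) to a set of next states; the encoding and the function names are assumptions made for illustration.

```python
from itertools import product

# The automaton on p, q, r: missing entries mean "no transition".
NFA = {
    ("p", "a"): {"p"}, ("p", "b"): {"p", "q"},
    ("q", "a"): {"r"}, ("q", "b"): {"r"},
}
FINAL = {"r"}


def next_states(states, x):
    """Set of states reachable from any state in `states` by reading x."""
    current = set(states)
    for symbol in x:
        moved = set()
        for s in current:
            moved |= NFA.get((s, symbol), set())
        current = moved
    return current


def accepts(x):
    """Accepted if at least one possible trace ends in a final state."""
    return bool(next_states({"p"}, x) & FINAL)


print(sorted(next_states({"p"}, "abbaba")))
```

For the input abbaba the computed set consists of p and r, in agreement with the two traces shown above.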
This type of automaton with some additional features is known as non-
deterministic finite automaton. This concept will formally be introduced in
the following section.
a state on an input symbol, now we consider a finite automaton with
nondeterministic transitions. A transition is nondeterministic if there are several
(possibly zero) next states from a state on an input symbol or without any
input. A transition without input is called an ε-transition. A nondeterministic
finite automaton is defined along the lines of a DFA, except that its transitions
may be nondeterministic.
Formally, a nondeterministic finite automaton (NFA) is a quintuple N =
(Q, Σ, δ, q0 , F ), where Q, Σ, q0 and F are as in a DFA; whereas, the transition
function δ is as below:
δ : Q × (Σ ∪ {ε}) −→ ℘(Q)
is a function so that, for a given state and an input symbol (possibly ε), δ
assigns a set of next states, possibly empty set.
Remark 4.2.1. Clearly, every DFA can be treated as an NFA.
Example 4.2.2. Let Q = {q0 , q1 , q2 , q3 , q4 }, Σ = {a, b}, F = {q1 , q3 } and δ
be given by the following transition table.
δ a b ε
q0 {q1 } ∅ {q4 }
q1 ∅ {q1 } {q2 }
q2 {q2 , q3 } {q3 } ∅
q3 ∅ ∅ ∅
q4 {q4 } {q3 } ∅
Consider the traces for the string ab from the state q0 . Clearly, the following
four are the possible traces.
(i) q0 --a--> q1 --b--> q1
(ii) q0 --ε--> q4 --a--> q4 --b--> q3
(iii) q0 --a--> q1 --ε--> q2 --b--> q3
(iv) q0 --a--> q1 --b--> q1 --ε--> q2
Note that three distinct states, viz. q1 , q2 and q3 are reachable from q0 via
the string ab. That means, while tracing a path from q0 for ab we consider
possible insertion of ε in ab, wherever ε-transitions are defined. For example,
in trace (ii) we have included an ε-transition from q0 to q4 , considering ab as
εab, as it is defined. Whereas, in trace (iii) we consider ab as aεb. It is clear
that, if we process the input string ab at the state q0 , then the set of next
states is {q1 , q2 , q3 }.
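The enumeration of traces can be automated by a search over pairs (state, number of symbols read so far), where an ε-transition changes the state without consuming input. The following Python sketch encodes the transition table of Example 4.2.2, writing the empty string for ε; the function name reachable and the encoding are illustrative assumptions.

```python
EPS = ""  # stands for an ε-transition in the table below

# Transition table of Example 4.2.2; missing entries are the empty set.
NFA = {
    ("q0", "a"): {"q1"}, ("q0", EPS): {"q4"},
    ("q1", "b"): {"q1"}, ("q1", EPS): {"q2"},
    ("q2", "a"): {"q2", "q3"}, ("q2", "b"): {"q3"},
    ("q4", "a"): {"q4"}, ("q4", "b"): {"q3"},
}


def reachable(state, x):
    """States reachable from `state` by reading all of x, with ε inserted freely."""
    result = set()
    seen = set()
    stack = [(state, 0)]
    while stack:
        s, i = stack.pop()
        if (s, i) in seen:
            continue
        seen.add((s, i))
        if i == len(x):
            result.add(s)
        # ε-moves keep the position in the input unchanged.
        for t in NFA.get((s, EPS), set()):
            stack.append((t, i))
        # A move on the next input symbol advances the position.
        if i < len(x):
            for t in NFA.get((s, x[i]), set()):
                stack.append((t, i + 1))
    return result


print(sorted(reachable("q0", "ab")))
```

For the string ab the search returns q1, q2 and q3, the same set found above by listing the four traces.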
Definition 4.2.3. Let N = (Q, Σ, δ, q0 , F ) be an NFA. Given an input string
x = a1 a2 · · · ak and a state p of N , the set of next states δ̂(p, x) can be easily
computed using a tree structure, called a computation tree of δ̂(p, x), which
is defined in the following way:
2. Children of the root are precisely those nodes which are having transi-
tions from p via ε or a1 .
3. For any node, whose branch (from the root) is labeled a1 a2 · · · ai (as
a resultant string by possible insertions of ε), its children are precisely
those nodes having transitions via ε or ai+1 .
4. If there is a final state whose branch from the root is labeled x (as a
resultant string), then mark the node by a tick mark ✓.
5. If the label of the branch of a leaf node is not the full string x, i.e.
some proper prefix of x, then it is marked by a cross ✕, indicating
that the branch has reached a dead-end before completely processing
the string x.
Example 4.2.4. The computation tree of δ̂(q0 , ab) in the NFA given in
Example 4.2.2 is shown in Figure 4.2
q0
  --a--> q1
    --b--> q1 ✓
      --ε--> q2
    --ε--> q2
      --b--> q3 ✓
  --ε--> q4
    --a--> q4
      --b--> q3 ✓
Example 4.2.5. The computation tree of δ̂(q0 , abb) in the NFA given in
Example 4.2.2 is shown in Figure 4.3. In it, notice that the branch
q0 − q4 − q4 − q3 has the label ab, as a resultant string, and as there are no
further transitions at q3 , the branch terminates without completely
processing the string abb. Thus, it is indicated by marking a cross ✕ at the
end of the branch.
q0
  --a--> q1
    --b--> q1
      --b--> q1 ✓
        --ε--> q2
      --ε--> q2
        --b--> q3 ✓
    --ε--> q2
      --b--> q3 ✕
  --ε--> q4
    --a--> q4
      --b--> q3 ✕
As the automaton given in Example 4.2.2 is nondeterministic, if a string
is processed at a state, then there may be multiple next states (unlike DFA),
possibly empty. For example, if we apply the string bba at the state q0 , then
the only possible way to process the first b is going via ε-transition from q0 to
q4 and then from q4 to q3 via b. As there are no transitions from q3 , the string
cannot be processed further. Hence, the set of next states for the string bba
at q0 is empty.
Thus, given a string x = a1 a2 · · · an and a state p, by treating x as
εa1 εa2 ε · · · εan ε and by looking at the possible complete branches starting at
p, we find the set of next states for p via x. To introduce the notion of δ̂ in
an NFA, we first introduce the notion called ε-closure of a state.
Example 4.2.7. In the following we enlist the ε-closures of all the states of
the NFA given in Example 4.2.2.
E(q0 ) = {q0 , q4 }; E(q1 ) = {q1 , q2 }; E(q2 ) = {q2 }; E(q3 ) = {q3 }; E(q4 ) = {q4 }.
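These closures can be recomputed with a small search. The sketch assumes the usual definition, namely that E(q) consists of q together with every state reachable from q using ε-transitions only; the function name epsilon_closure and the dictionary encoding of Example 4.2.2 are illustrative.

```python
EPS = ""  # stands for an ε-transition

# Only the ε-column of the table in Example 4.2.2 matters here.
NFA = {
    ("q0", "a"): {"q1"}, ("q0", EPS): {"q4"},
    ("q1", "b"): {"q1"}, ("q1", EPS): {"q2"},
    ("q2", "a"): {"q2", "q3"}, ("q2", "b"): {"q3"},
    ("q4", "a"): {"q4"}, ("q4", "b"): {"q3"},
}


def epsilon_closure(state):
    """All states reachable from `state` via zero or more ε-transitions."""
    closure = {state}
    stack = [state]
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


for q in ["q0", "q1", "q2", "q3", "q4"]:
    print(q, sorted(epsilon_closure(q)))
```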
Now we are ready to formally define the set of next states for a state via
a string.
δ̂ : Q × Σ∗ −→ ℘(Q)
by
L(N ) = {x ∈ Σ∗ | δ̂(q0 , x) ∩ F ≠ ∅}.
Example 4.2.10. Note that q1 and q3 are the final states for the NFA given
in Example 4.2.2.
1. The possible ways of reaching q1 from the initial state q0 is via the set
of strings represented by the regular expression ab∗ .
(a) Via the state q1 : With the strings of ab∗ we can clearly reach q1
from q0 , then using the ε-transition we can reach q2 . Then the
strings of a∗ (a + b) will lead us from state q2 to q3 . Thus, the set
of strings in this case can be represented by ab∗ a∗ (a + b).
(b) Via the state q4 : Initially, we use the ε-transition to reach q4 from
q0 , then clearly the strings of a∗ b will precisely be useful to reach
from q4 to q3 . And hence a∗ b itself represents the set of strings in
this case.
ab∗ + a∗ b + ab∗ a∗ (a + b)
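The regular expression can be compared with the NFA of Example 4.2.2 on all short strings. The sketch below simulates the NFA with ε-closures and uses Python's re module, writing the union + as | and (a + b) as [ab]; the helper names and the length bound are assumptions for illustration, and the check only covers strings up to that bound.

```python
import re
from itertools import product

EPS = ""  # stands for an ε-transition

# NFA of Example 4.2.2 with initial state q0 and final states q1, q3.
NFA = {
    ("q0", "a"): {"q1"}, ("q0", EPS): {"q4"},
    ("q1", "b"): {"q1"}, ("q1", EPS): {"q2"},
    ("q2", "a"): {"q2", "q3"}, ("q2", "b"): {"q3"},
    ("q4", "a"): {"q4"}, ("q4", "b"): {"q3"},
}
FINAL = {"q1", "q3"}

# ab* + a*b + ab*a*(a + b) in Python's regular expression syntax.
PATTERN = re.compile(r"ab*|a*b|ab*a*[ab]")


def closure(states):
    """ε-closure of a set of states."""
    result = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), set()):
            if t not in result:
                result.add(t)
                stack.append(t)
    return result


def nfa_accepts(x):
    current = closure({"q0"})
    for symbol in x:
        moved = set()
        for s in current:
            moved |= NFA.get((s, symbol), set())
        current = closure(moved)
    return bool(current & FINAL)


def first_mismatch(max_len):
    """Return a string on which the NFA and the expression disagree, if any."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            x = "".join(letters)
            if nfa_accepts(x) != (PATTERN.fullmatch(x) is not None):
                return x
    return None


print(first_mismatch(8))
```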
1. From the initial state q0 one can reach back to q0 via strings from a∗ or
from aεb∗ b, i.e. ab+ , or via a string which is a mixture of strings from
the above two sets. That is, the strings of (a + ab+ )∗ will lead us from
q0 to q0 .
2. Also, note that the strings of ab∗ will lead us from the initial state q0
to the final state q2 .
3. Thus, any string accepted by the NFA can be of the form – a string
from the set (a + ab+ )∗ followed by a string from the set ab∗ .
(a + ab+ )∗ ab∗
Lemma 4.3.1. Given an NFA in which there are some ε-transitions, there
exists an equivalent NFA without ε-transitions.
δ̂(q0 , x) ∩ F ≠ ∅. That is, x ∈ L(N ). Hence L(N ) = L(N′).
So, it is enough to prove that δ̂′(q0 , x) = δ̂(q0 , x) for each x ∈ Σ+ . We
prove this by induction on |x|. If x ∈ Σ, then by definition of δ′ we have
δ̂′(q0 , x) = δ′(q0 , x) = δ̂(q0 , x).
Assume the result for all strings of length less than or equal to m. For a ∈ Σ,
let |xa| = m + 1. Since |x| = m, by inductive hypothesis,
δ̂′(q0 , x) = δ̂(q0 , x).
Now
δ̂′(q0 , xa) = δ′(δ̂′(q0 , x), a)
= ⋃_{p ∈ δ̂′(q0 ,x)} δ′(p, a)
= ⋃_{p ∈ δ̂(q0 ,x)} δ̂(p, a)
= δ̂(q0 , xa).
Hence by induction we have δ̂′(q0 , x) = δ̂(q0 , x) ∀x ∈ Σ+ . This completes
the proof.
Lemma 4.3.2. For every NFA N′ without ε-transitions, there exists a DFA
A such that L(N′) = L(A ).
Proof. Let N′ = (Q, Σ, δ′, q0 , F′) be an NFA without ε-transitions.
Construct A = (P, Σ, µ, p0 , E), where
P = {p{i1 ,...,ik } | {qi1 , . . . , qik } ⊆ Q},
p0 = p{0} ,
E = {p{i1 ,...,ik } ∈ P | {qi1 , . . . , qik } ∩ F′ ≠ ∅}, and
µ : P × Σ −→ P defined by
µ(p{i1 ,...,ik } , a) = p{j1 ,...,jm } if and only if ⋃_{il ∈ {i1 ,...,ik }} δ′(qil , a) = {qj1 , . . . , qjm }.
This suffices for the result, because
x ∈ L(N′) ⇐⇒ δ̂′(q0 , x) ∩ F′ ≠ ∅
⇐⇒ µ̂(p0 , x) ∈ E
⇐⇒ x ∈ L(A ).
By inductive hypothesis,
Now by definition of µ,
µ(p{i1 ,...,ik } , a) = p{j1 ,...,jm } if and only if ⋃_{il ∈ {i1 ,...,ik }} δ′(qil , a) = {qj1 , . . . , qjm }.
Thus,
µ̂(p0 , xa) = p{j1 ,...,jm } if and only if δ̂′(q0 , xa) = {qj1 , . . . , qjm }.
We illustrate the constructions for Lemma 4.3.1 and Lemma 4.3.2 through
the following example.
[Transition diagram: q0 --ε, b--> q1 , q0 --ε--> q2 and q1 --b--> q2 , with a loop on b at q0 and a loop on a, b at q1 ; q0 is the initial state and q2 is the final state.]
That is, the transition function, say δ, can be displayed as the following table
δ a b ε
q0 ∅ {q0 , q1 } {q1 , q2 }
q1 {q1 } {q1 , q2 } ∅
q2 ∅ ∅ ∅
δ′ = δ̂   a      b
q0       {q1 }  {q0 , q1 , q2 }
q1       {q1 }  {q1 , q2 }
q2       ∅      ∅
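The table of δ′ can be recomputed from the table of δ. The sketch assumes that δ′(q, a) is obtained by taking the ε-closure of q, applying δ on a to every state in it, and taking the ε-closure of the result; this is the standard way of computing δ̂(q, a), and the function names are illustrative.

```python
EPS = ""  # stands for an ε-transition

# Transition table δ of the example NFA; missing entries are the empty set.
DELTA = {
    ("q0", "b"): {"q0", "q1"}, ("q0", EPS): {"q1", "q2"},
    ("q1", "a"): {"q1"}, ("q1", "b"): {"q1", "q2"},
}
STATES = ["q0", "q1", "q2"]


def closure(states):
    """ε-closure of a set of states."""
    result = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in DELTA.get((s, EPS), set()):
            if t not in result:
                result.add(t)
                stack.append(t)
    return result


def delta_prime(state, symbol):
    """ε-free transition: closure, then one move on `symbol`, then closure."""
    moved = set()
    for s in closure({state}):
        moved |= DELTA.get((s, symbol), set())
    return closure(moved)


for q in STATES:
    print(q, sorted(delta_prime(q, "a")), sorted(delta_prime(q, "b")))
```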
which are in the subset under consideration. That is, for any input symbol
a and a subset X of {0, 1, 2} (the index set of the states)
µ(pX , a) = ⋃_{i∈X} δ′(qi , a).
where
P = {pX | X ⊆ {0, 1, 2}},
E = {pX | X ∩ {0, 2} ≠ ∅}; in fact there are six states in E, and
the transition map µ is given in the following table:
µ          a      b
p∅         p∅     p∅
p{0}       p{1}   p{0,1,2}
p{1}       p{1}   p{1,2}
p{2}       p∅     p∅
p{0,1}     p{1}   p{0,1,2}
p{1,2}     p{1}   p{1,2}
p{0,2}     p{1}   p{0,1,2}
p{0,1,2}   p{1}   p{0,1,2}
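The table of µ follows mechanically from the table of δ′. In the sketch below a state pX is represented by the frozenset X of indices, and µ(pX , a) is computed as the union of δ′(qi , a) over i ∈ X; the function names and this representation are assumptions made for illustration. The sketch also lists the subsets that can actually be reached from p{0}.

```python
from itertools import combinations

# δ′ from the example, with state q_i represented by its index i.
DELTA_PRIME = {
    (0, "a"): {1}, (0, "b"): {0, 1, 2},
    (1, "a"): {1}, (1, "b"): {1, 2},
    (2, "a"): set(), (2, "b"): set(),
}


def mu(X, symbol):
    """Union of δ′(q_i, symbol) over all i in X."""
    result = set()
    for i in X:
        result |= DELTA_PRIME[(i, symbol)]
    return frozenset(result)


def all_subsets(indices):
    return [frozenset(c) for r in range(len(indices) + 1)
            for c in combinations(indices, r)]


def reachable_subsets(start):
    """Subsets reachable from `start` under mu on a and b."""
    seen = {start}
    stack = [start]
    while stack:
        X = stack.pop()
        for symbol in "ab":
            Y = mu(X, symbol)
            if Y not in seen:
                seen.add(Y)
                stack.append(Y)
    return seen


for X in all_subsets([0, 1, 2]):
    print(sorted(X), sorted(mu(X, "a")), sorted(mu(X, "b")))
```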
[Transition diagram on the states q and p, with an arc labeled a.]
In case there is another loop at q, say with the input symbol b, then scenario
will be as under:
[Transition diagram on the states q and p, with arcs labeled a, b and a.]
Example 4.3.6. One can construct the following NFA for the language
{a^n x b^m | x = baa, n ≥ 1 and m ≥ 0}.
[Transition diagram with unnamed states and arcs labeled a and b.]
Example 4.3.7. We consider the following NFA, which accepts the language
{x ∈ {a, b}∗ | bb is a substring of x}.
[Transition diagram: p --b--> q --b--> r, with loops on a, b at p and at r; p is the initial state and r is the final state.]
By applying a heuristic discussed in Case 2, as shown in the following finite
automaton, we can remove nondeterminism at p, whereas we get nondeter-
minism at q, as a result.
[Transition diagram on the states p, q and r, with arcs labeled a and b; p is the initial state and r is the final state.]
Example 4.3.8. Consider the language L over {a, b} containing those strings
x with the property that either x starts with aa and ends with b, or x starts
with ab and ends with a. That is,
By applying the heuristics on this NFA, the following DFA can be obtained,
which accepts L.
[Transition diagram of the resulting DFA, with unnamed states and arcs labeled a and b.]
x ∼ y =⇒ ∀z(xz ∼ yz).
δ̂(q0 , y). Now, for any z ∈ Σ∗ ,
1. L is accepted by a DFA.
Cq = {x ∈ Σ∗ | δ̂(q0 , x) = q}
L = {x ∈ Σ∗ | δ̂(q0 , x) ∈ F }
= ⋃_{p∈F} {x ∈ Σ∗ | δ̂(q0 , x) = p}
= ⋃_{p∈F} Cp .
as desired.
(2) ⇒ (3): Suppose ∼ is an equivalence relation with the criterion given in (2).
We show that ∼ is a refinement of ∼L so that the number of equivalence
classes of ∼L is less than or equal to the number of equivalence classes of
∼. For x, y ∈ Σ∗ , suppose x ∼ y. To show x ∼L y, we have to show that
∀z(xz ∈ L ⇐⇒ yz ∈ L). Since ∼ is right invariant, we have xz ∼ yz, for all
z, i.e. xz and yz are in the same equivalence class of ∼. As L is the union of
some of the equivalence classes of ∼, we have
∀z(xz ∈ L ⇐⇒ yz ∈ L).
AL = (Q, Σ, δ, q0 , F )
by setting
Q = Σ∗ /∼L = {[x] | x ∈ Σ∗ }, the set of all equivalence classes of Σ∗ with
respect to ∼L ,
q0 = [ε],
F = {[x] ∈ Q | x ∈ L} and
Since ∼L is of finite index, Q is a finite set; further, for [x], [y] ∈ Q and a ∈ Σ,
states of any DFA accepting L is greater than or equal to the index of ∼L
and the proof of (3) ⇒ (1) provides us the DFA AL with number of states
equal to the index of ∼L . Hence, AL is a minimum state DFA accepting L.
Example 4.4.6. Consider the language L = {x ∈ {a, b}∗ | ab is a substring of x}.
We calculate the equivalence classes of ∼L . First observe that the strings ε,
a and ab are not equivalent to each other, because of the following.
1. The string b distinguishes the pair ε and a: εb = b ∉ L, whereas ab ∈ L.
2. Any string which does not contain ab distinguishes ε and ab. For
instance, εb = b ∉ L, whereas abb ∈ L.
– If x has some a’s and some b’s, then x must be of the form b^n a^m ,
for n, m ≥ 1. In this case, x will be in [a].
Thus, ∼L has exactly three equivalence classes and hence it is of finite index.
By the Myhill-Nerode theorem, there exists a DFA accepting L.
Example 4.4.7. Consider the language L = {a^n b^n | n ≥ 1} over the alphabet
{a, b}. We show that the index of ∼L is not finite. Hence, by the Myhill-Nerode
theorem, there exists no DFA accepting L. Indeed, consider a^n , a^m ∈
{a, b}∗ , for m ≠ n and n ≥ 1. They are not equivalent with respect to ∼L ,
because, for b^n ∈ {a, b}∗ , we have
a^n b^n ∈ L, whereas a^m b^n ∉ L.
Thus the strings a, a^2 , a^3 , . . . lie in pairwise distinct equivalence classes of ∼L .
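The distinguishing argument can be tried out for small exponents. The sketch below implements a membership test for L = {a^n b^n | n ≥ 1} and checks that b^n separates a^n from a^m whenever m ≠ n; the function names in_L and distinguishes are illustrative, and the finite check only illustrates the general argument.

```python
def in_L(x):
    """Membership test for L = {a^n b^n | n >= 1}."""
    n = len(x) // 2
    return n >= 1 and x == "a" * n + "b" * n


def distinguishes(z, x, y):
    """True if exactly one of xz and yz belongs to L."""
    return in_L(x + z) != in_L(y + z)


# For every pair m != n (with n >= 1), the suffix b^n separates a^n from a^m.
pairs_ok = all(
    distinguishes("b" * n, "a" * n, "a" * m)
    for n in range(1, 10)
    for m in range(0, 10)
    if m != n
)
print(pairs_ok)
```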
that their roles in the language acceptance are the same. Here, two states are
said to have the same role if, for every input string, they lead us either both to
final states or both to non-final states, so that they contribute in the same manner
to language acceptance. Among such a group of states, whose roles are the same,
only one state need be kept and the others can be discarded to reduce the
number of states without affecting the language. Now, we formulate this idea
and present an algorithmic procedure to minimize the number of states of
a DFA. In fact, we obtain an equivalent DFA whose number of states is
minimum.
Definition 4.4.8. Two states p and q of a DFA A are said to be equivalent,
denoted by p ≡ q, if for all x ∈ Σ∗ the states δ̂(p, x) and δ̂(q, x) are
either both final states or both non-final states.
Clearly, ≡ is an equivalence relation on the set of states of A . Given
two states p and q, to test whether p ≡ q we need to check the condition for
all strings in Σ∗ . This is practically difficult, since Σ∗ is an infinite set. So,
in the following, we introduce a notion called k-equivalence and build up a
technique to test the equivalence of states via k-equivalence.
Definition 4.4.9. For k ≥ 0, two states p and q of a DFA A are said to be
k-equivalent, denoted by p ≡_k q, if for all x ∈ Σ∗ with |x| ≤ k the states
δ̂(p, x) and δ̂(q, x) are either both final states or both non-final states.
Clearly, ≡_k is also an equivalence relation on the set of states of A . Since
there are only finitely many strings of length up to k over an alphabet, one may
easily test the k-equivalence between the states. Let us denote the partition
of Q under the relation ≡ by Π, whereas it is Π_k under the relation ≡_k.
Remark 4.4.10. For any p, q ∈ Q,
p ≡ q if and only if p ≡_k q for all k ≥ 0.
Also, for any k ≥ 1 and p, q ∈ Q, if p ≡_k q, then p ≡_{k−1} q.
Given a k-equivalence relation over the set of states, the following theorem
provides us a criterion to calculate the (k + 1)-equivalence relation.
k and a ∈ Σ be arbitrary. Then since p ≡_{k+1} q, δ(δ(p, a), x) and δ(δ(q, a), x)
both are either final states or non-final states, so that δ(p, a) ≡_k δ(q, a).
Conversely, for k ≥ 0, suppose p ≡_0 q and δ(p, a) ≡_k δ(q, a) ∀a ∈ Σ. Note
that for x ∈ Σ∗ with |x| ≤ k and a ∈ Σ, δ(δ(p, a), x) and δ(δ(q, a), x) both are
either final states or non-final states, i.e. for all y ∈ Σ∗ and 1 ≤ |y| ≤ k + 1,
both the states δ(p, y) and δ(q, y) are final or non-final states. But since
p ≡_0 q we have p ≡_{k+1} q.
Remark 4.4.12. Two k-equivalent states p and q will further be (k + 1)-
equivalent if δ(p, a) ≡_k δ(q, a) ∀a ∈ Σ, i.e. from Π_k we can obtain Π_{k+1} .
Theorem 4.4.13. If Π_k = Π_{k+1} , for some k ≥ 0, then Π_k = Π.
Proof. Suppose Π_k = Π_{k+1} . To prove that, for p, q ∈ Q,
p ≡_{k+1} q =⇒ p ≡_n q ∀n ≥ 0
Now,
p ≡_{k+1} q =⇒ δ(p, a) ≡_k δ(q, a) ∀a ∈ Σ
=⇒ δ(p, a) ≡_{k+1} δ(q, a) ∀a ∈ Σ
=⇒ p ≡_{k+2} q
δ       a    b
→ q0    q3   q2
  q1    q6   q2
  q2    q8   q5
 (q3)   q0   q1
 (q4)   q2   q5
  q5    q4   q3
 (q6)   q1   q0
  q7    q4   q6
 (q8)   q2   q7
  q9    q7   q10
  q10   q5   q9
From the definition, two states p and q are 0-equivalent if both δ̂(p, x) and
δ̂(q, x) are either final states or non-final states, for all |x| = 0. That is, both
δ̂(p, ε) and δ̂(q, ε) are either final states or non-final states. That is, both p
and q are either final states or non-final states.
Thus, under the equivalence relation ≡_0, all final states are equivalent and
all non-final states are equivalent. Hence, there are precisely two equivalence
classes in the partition Π_0 as given below:
Π_0 = {{q0 , q1 , q2 , q5 , q7 , q9 , q10 }, {q3 , q4 , q6 , q8 }}
From Theorem 4.4.11, we know that any two 0-equivalent states, p
and q, will further be 1-equivalent if
δ(p, a) ≡_0 δ(q, a) ∀a ∈ Σ.
Using this condition we can evaluate Π_1 by checking every two 0-equivalent
states whether they are further 1-equivalent or not. If they are 1-equivalent
they continue to be in the same equivalence class. Otherwise, they will be put
in different equivalence classes. The following shall illustrate the computation
of Π_1 :
(i) δ(q0 , a) = q3 and δ(q1 , a) = q6 are in the same equivalence class of
Π_0 ; and also
(ii) δ(q0 , b) = q2 and δ(q1 , b) = q2 are in the same equivalence class of
Π_0 .
Thus, q0 ≡_1 q1 so that they will continue to be in the same equivalence class
in Π_1 also.
(i) δ(q2 , a) and δ(q5 , a) are, respectively, q8 and q4 ; which are in the
same equivalence class of Π_0 .
(ii) Whereas, δ(q2 , b) = q5 and δ(q5 , b) = q3 are in different equivalence
classes of Π_0 .
Since any two states which are not k-equivalent cannot be (k + 1)-equivalent,
we check for those pairs belonging to the same equivalence class of Π_0 whether they
are further 1-equivalent. Thus, we obtain
Π_1 = {{q0 , q1 , q2 }, {q5 , q7 }, {q9 , q10 }, {q3 , q4 , q6 , q8 }}
Similarly, we continue to compute Π_2 , Π_3 , etc. Note that δ(q0 , b) = q2 and
δ(q2 , b) = q5 are not 1-equivalent (the string bb leads q0 to the non-final state
q5 , but q2 to the final state q3 ), so q2 is separated from q0 and q1 in Π_2 .
Π_2 = {{q0 , q1 }, {q2 }, {q5 , q7 }, {q9 , q10 }, {q3 , q6 }, {q4 , q8 }}
Π_3 = {{q0 , q1 }, {q2 }, {q5 , q7 }, {q9 , q10 }, {q3 , q6 }, {q4 , q8 }}
Note that the process for a DFA will always terminate at a finite stage
and get Π_k = Π_{k+1} , for some k. This is because there are a finite number
of states and in the worst case the equivalence classes may end up as singletons.
Thereafter no further refinement is possible.
In the present context, Π_2 = Π_3 . Thus it is the partition corresponding
to ≡. Now we construct a DFA with these equivalence classes as states (by
renaming them, for simplicity) and with the induced transitions. Thus we
have an equivalent DFA with fewer states than the given DFA.
The DFA with the equivalence classes as states is constructed below:
Let P = {p1 , . . . , p6 }, where p1 = {q0 , q1 }, p2 = {q2 }, p3 = {q5 , q7 },
p4 = {q9 , q10 }, p5 = {q3 , q6 }, p6 = {q4 , q8 }. As p1 contains the initial state
q0 of the given DFA, p1 will be the initial state. Since p5 and p6 contain the
final states of the given DFA, these two will form the set of final states. The
induced transition function δ′ is given in the following table.
δ′      a    b
→ p1    p5   p2
  p2    p6   p3
  p3    p6   p5
  p4    p3   p4
 (p5)   p1   p1
 (p6)   p2   p3
Here, note that the state p4 is inaccessible from the initial state p1 . This will
also be removed and the following further simplified DFA can be produced
with minimum number of states.
[Transition diagram of the simplified DFA: p1 --a--> p5 , p1 --b--> p2 , p2 --a--> p6 , p2 --b--> p3 , p3 --a--> p6 , p3 --b--> p5 , p5 --a,b--> p1 , p6 --a--> p2 and p6 --b--> p3 ; p1 is the initial state, and p5 and p6 are the final states.]
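The whole procedure of this example can be sketched in Python. The sketch starts from the partition into final and non-final states, repeatedly splits classes according to the classes of the a- and b-successors (the criterion of Theorem 4.4.11), stops when a step changes nothing, and finally discards the classes that contain no state accessible from q0 . The function names refine, state_equivalence, accessible and minimize are illustrative choices.

```python
# Transition table of the example DFA: state -> (next on a, next on b).
DELTA = {
    "q0": ("q3", "q2"), "q1": ("q6", "q2"), "q2": ("q8", "q5"),
    "q3": ("q0", "q1"), "q4": ("q2", "q5"), "q5": ("q4", "q3"),
    "q6": ("q1", "q0"), "q7": ("q4", "q6"), "q8": ("q2", "q7"),
    "q9": ("q7", "q10"), "q10": ("q5", "q9"),
}
START = "q0"
FINAL = {"q3", "q4", "q6", "q8"}


def refine(partition):
    """One step: states stay together iff they and their successors share classes."""
    block_of = {s: block for block in partition for s in block}
    groups = {}
    for s in DELTA:
        key = (block_of[s],) + tuple(block_of[t] for t in DELTA[s])
        groups.setdefault(key, set()).add(s)
    return {frozenset(g) for g in groups.values()}


def state_equivalence():
    """Refine the final/non-final partition until it no longer changes."""
    partition = {frozenset(FINAL), frozenset(set(DELTA) - FINAL)}
    while True:
        finer = refine(partition)
        if finer == partition:
            return partition
        partition = finer


def accessible(start):
    seen = {start}
    stack = [start]
    while stack:
        s = stack.pop()
        for t in DELTA[s]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def minimize():
    """Return (states, transitions, initial state, final states) of the quotient DFA."""
    classes = state_equivalence()
    block_of = {s: block for block in classes for s in block}
    states = {block_of[s] for s in accessible(START)}
    delta = {b: tuple(block_of[t] for t in DELTA[next(iter(b))]) for b in states}
    final = {b for b in states if b & FINAL}
    return states, delta, block_of[START], final


states, delta, start, final = minimize()
for block in sorted(states, key=sorted):
    print(sorted(block), [sorted(t) for t in delta[block]])
```

Each state of the resulting DFA is printed as the set of original states it stands for, followed by its successors on a and on b.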
The following theorem confirms that the DFA obtained by this procedure
in fact has the minimum number of states.
Theorem 4.4.15. For every DFA A , there is an equivalent minimum state
DFA A′.
A′ = (Q′, Σ, δ′, q0′, F′)
where
Q′ = {[q] | q is accessible from q0 }, the set of equivalence classes of Q
with respect to ≡ that contain the states accessible from q0 ,
q0′ = [q0 ],
F′ = {[q] ∈ Q′ | q ∈ F } and
δ′ : Q′ × Σ −→ Q′ is defined by δ′([q], a) = [δ(q, a)].
For [p], [q] ∈ Q′, suppose [p] = [q], i.e. p ≡ q. Now given a ∈ Σ, for each
x ∈ Σ∗ , both δ̂(δ(p, a), x) = δ̂(p, ax) and δ̂(δ(q, a), x) = δ̂(q, ax) are final or
non-final states, as p ≡ q. Thus, δ(p, a) and δ(q, a) are equivalent. Hence, δ′
is well-defined and A′ is a DFA.
Claim 1: L(A ) = L(A′).
Proof of Claim 1: In fact, we show that δ̂′([q0 ], x) = [δ̂(q0 , x)], for all
x ∈ Σ∗ . This suffices because
δ̂′([q0 ], x) ∈ F′ ⇐⇒ δ̂(q0 , x) ∈ F.
as desired.
Claim 2: A′ is a minimal state DFA accepting L.
Proof of Claim 2: We prove that the number of states of A′ is equal to
the number of states of AL , a minimal DFA accepting L. Since the states of
AL are the equivalence classes of ∼L , it is sufficient to prove that
Recall the proof (2) ⇒ (3) of the Myhill-Nerode theorem and observe that
On the other hand, suppose there are more equivalence classes for
∼A′ than for ∼L . That is, there exist x, y ∈ Σ∗ such that x ∼L y,
but not x ∼A′ y.
That is, δ̂′([q0 ], x) ≠ δ̂′([q0 ], y).
That is, [δ̂(q0 , x)] ≠ [δ̂(q0 , y)].
That is, one among δ̂(q0 , x) and δ̂(q0 , y) is a final state and the other is
a non-final state.
That is, x ∈ L ⇐⇒ y ∉ L. But this contradicts the hypothesis that
x ∼L y.
Hence, index of ∼L = index of ∼A′ .
Example 4.4.16. Consider the DFA given in the following transition table.
   δ     a    b
→ *q0    q1   q0
   q1    q0   q3
  *q2    q4   q5
  *q3    q4   q1
(* marks a final state)
[Transition diagram of a DFA with states p0, p1, p2, p3 and p4 over {a, b}, in which p0, p2 and p3 are drawn as final states; the arrows are not recoverable from the source.]
1. Given a regular expression, we construct an equivalent finite automa-
ton.
To prove (1), we need the following. Let r1 and r2 be two regular ex-
pressions. Suppose there exist finite automata A1 = (Q1 , Σ, δ1 , q1 , F1 ) and
A2 = (Q2 , Σ, δ2 , q2 , F2 ) which accept L(r1 ) and L(r2 ), respectively. Under
this hypothesis, we prove the following three lemmas.
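[Figure: the automaton A, with a new initial state q0 and ε-transitions from q0 to the initial state q1 of A1 and to the initial state q2 of A2; the final states are those of F1 and F2.]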
Now, for x ∈ Σ∗,
x ∈ L(A) ⇐⇒ (δ̂(q0, x) ∩ F) ≠ ∅
         ⇐⇒ (δ̂(q0, εx) ∩ F) ≠ ∅
         ⇐⇒ (δ̂(δ̂(q0, ε), x) ∩ F) ≠ ∅
         ⇐⇒ (δ̂({q1, q2}, x) ∩ F) ≠ ∅
         ⇐⇒ ((δ̂(q1, x) ∪ δ̂(q2, x)) ∩ F) ≠ ∅
         ⇐⇒ ((δ̂(q1, x) ∩ F) ∪ (δ̂(q2, x) ∩ F)) ≠ ∅
         ⇐⇒ (δ̂(q1, x) ∩ F1) ≠ ∅ or (δ̂(q2, x) ∩ F2) ≠ ∅
         ⇐⇒ x ∈ L(A1) or x ∈ L(A2)
         ⇐⇒ x ∈ L(A1) ∪ L(A2).
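[Figure: the automaton A, in which A1 (initial state q1, final states F1) is followed by A2 (initial state q2, final states F2), with an ε-transition from each state of F1 to q2.]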
Then
x1 = a1 a2 ... ak ∈ L(A1) and x2 = ak+1 ak+2 ... an ∈ L(A2)
Now,
(q1, x) = (q1, x1 x2)
        ⊢∗ (p, x2), for some p ∈ F1
        = (p, εx2)
        ⊢ (q2, x2), since δ(p, ε) = {q2}
        ⊢∗ (p′, ε), for some p′ ∈ F2
Proof. Construct
A = (Q, Σ, δ, q0 , F ),
where Q = Q1 ∪ {q0 , p} with new states q0 and p, F = {p} and define δ by
δ(q, a) = { {q1, p},    if q ∈ F1 ∪ {q0} and a = ε
          { δ1(q, a),   if q ∈ Q1 and a ∈ Σ ∪ {ε}
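[Figure: the automaton A, consisting of A1 (initial state q1, final states F1), a new initial state q0 and a new final state p, with ε-transitions from q0 and from each state of F1 to q1 and to p.]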
Figure 4.6: Kleene Star of a Finite Automaton
p1 , p2 , . . . , pk ∈ F1 and x1 , x2 , . . . , xk
such that
p1 ∈ δ̂(q1 , x1 ), p2 ∈ δ̂(q1 , x2 ), . . . , pk ∈ δ̂(q1 , xk )
with x = x1 x2 . . . xk . Thus, for all 1 ≤ i ≤ k, xi ∈ L(A1 ) so that x ∈ L(A1 )∗ .
Conversely, suppose x ∈ L(A1 )∗ . Then x = x1 x2 · · · xl with xi ∈ L(A1 )
for all i and for some l ≥ 0. If l = 0, then x = ε and clearly, x ∈ L(A ).
Otherwise, we have
regular expression r. Suppose r has zero operators, then r must be ε, ∅ or a
for some a ∈ Σ.
If r = ε, then the finite automaton as depicted below serves the pur-
pose.
[Figure: a finite automaton with a single state q that is both the initial state and a final state.]
If r = ∅, then (i) any finite automaton with no final state will do;
or (ii) one may consider a finite automaton in which final states are
not accessible from the initial state. For instance, the following two
automata, one of each of the two types indicated above,
serve the purpose.
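[Figure: (i) a single non-final initial state p with a loop on every a ∈ Σ; (ii) an initial state p and a final state q, with transitions labelled by every a ∈ Σ, in which q is not accessible from p.]
(i) (ii)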
Suppose the result is true for regular expressions with k or fewer operators
and assume r has k + 1 operators. There are three cases according to the
operators involved. (1) r = r1 +r2 , (2) r = r1 r2 , or (3) r = r1∗ , for some regular
expressions r1 and r2 . In any case, note that both the regular expressions r1
and r2 must have k or fewer operators. Thus by inductive hypothesis, there
exist finite automata A1 and A2 which accept L(r1 ) and L(r2 ), respectively.
Then, in each case, we have a finite automaton accepting L(r), by Lemma
4.5.1, 4.5.2 or 4.5.3, respectively.
[Figures: ε-NFAs for a∗, for b, and for a∗b, the last obtained by joining the NFA for a∗ to the NFA for b with an ε-transition.]
Finally, an NFA for a∗b + a is:
[Figure: an ε-NFA with a new initial state that has ε-transitions to the initial state of the NFA for a∗b and to the initial state of the NFA for a.]
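The three constructions used in the induction (union, concatenation and Kleene star) can be sketched in Python and applied to a∗b + a. This is a minimal sketch rather than a definitive implementation: the class `NFA`, the helper names and the encoding of ε as the empty string are our own choices, and states are numbered by a global counter.

```python
from itertools import count

_ids = count()          # supplies fresh state names 0, 1, 2, ...
EPS = ""                # the empty string stands for ε

class NFA:
    def __init__(self, start, finals, trans):
        self.start, self.finals, self.trans = start, finals, trans

def symbol(a):
    """NFA for the one-symbol expression a."""
    s, f = next(_ids), next(_ids)
    return NFA(s, {f}, {(s, a): {f}})

def _merged(t1, t2):
    out = {k: set(v) for k, v in t1.items()}
    for k, v in t2.items():
        out.setdefault(k, set()).update(v)
    return out

def union(n1, n2):
    """New initial state with ε-transitions to both initial states."""
    q0 = next(_ids)
    t = _merged(n1.trans, n2.trans)
    t[(q0, EPS)] = {n1.start, n2.start}
    return NFA(q0, n1.finals | n2.finals, t)

def concat(n1, n2):
    """ε-transition from every final state of n1 to the initial state of n2."""
    t = _merged(n1.trans, n2.trans)
    for p in n1.finals:
        t.setdefault((p, EPS), set()).add(n2.start)
    return NFA(n1.start, set(n2.finals), t)

def star(n1):
    """New states q0 and p; ε from q0 and from each final state to q1 and to p."""
    q0, p = next(_ids), next(_ids)
    t = _merged(n1.trans, {})
    for q in n1.finals | {q0}:
        t.setdefault((q, EPS), set()).update({n1.start, p})
    return NFA(q0, {p}, t)

def eps_closure(nfa, states):
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in nfa.trans.get((q, EPS), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(nfa, word):
    current = eps_closure(nfa, {nfa.start})
    for a in word:
        moved = set()
        for q in current:
            moved |= nfa.trans.get((q, a), set())
        current = eps_closure(nfa, moved)
    return bool(current & nfa.finals)

nfa = union(concat(star(symbol("a")), symbol("b")), symbol("a"))
print([w for w in ["", "a", "b", "ab", "aab", "aa", "ba"] if accepts(nfa, w)])
```

The acceptance test simulates the ε-NFA directly through ε-closures, so no conversion to a DFA is needed for small experiments.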
Proof. We prove the result by induction on the number of states of DFA. For
base case, let A = (Q, Σ, δ, q0 , F ) be a DFA with only one state. Then there
are two possibilities for the set of final states F .
[Figure: the initial state q0 with a loop labelled L1 and an outgoing arc labelled L3.]
Assume that the result is true for all those DFA whose number of states is
less than n. Now, let A = (Q, Σ, δ, q0 , F ) be a DFA with |Q| = n. First note
that the language L = L(A ) can be written as
L = L1∗ L2
where
L1 is the set of strings that start and end in the initial state q0
L2 is the set of strings that start in q0 and end in some final state. We
include ε in L2 if q0 is also a final state. Further, we add the restriction
that q0 is not revisited while traversing those strings. This is
justified because the portion of a string from its start at q0 up to
its last return to q0 is part of L1, and the remaining
portion is in L2.
Using the inductive hypothesis, we prove that both L1 and L2 are regular.
Since regular languages are closed with respect to Kleene star and concate-
nation it follows that L is regular.
The following notation is useful for defining the languages L1 and
L2 formally and for showing that they are regular. For q ∈ Q and x ∈ Σ∗, let us
denote the set of states on the path of x from q that come after q by P(q,x).
That is, if x = a1 ··· an,
P(q,x) = ⋃_{i=1}^{n} {δ̂(q, a1 ··· ai)}.
Define
L1 = {x ∈ Σ∗ | δ̂(q0, x) = q0}; and
L2 = { L3,          if q0 ∉ F;
     { L3 ∪ {ε},    if q0 ∈ F,
A(a,b) = (Q′, Σ, δ′, qa, F′)
then, since q0 ∉ Q′, q0 ∉ P(qa,x) and δ̂′(qa, x) ∈ F′. This implies,
B = {a ∈ Σ | δ(q0, a) = q0} ∪ {ε}
then clearly,
L1 = B ∪ ⋃_{(a,b)∈A} a L(a,b) b.
Hence L1 is regular.
Claim 2: L3 is regular.
Proof of Claim 2: Consider the following set
C = {a ∈ Σ | δ(q0, a) ≠ q0}
But, clearly,
L3 = ⋃_{a∈C} a La
1. Note that the following strings bring the DFA from q0 back to q0.
(a) a (via the path ⟨q0, q0⟩)
(b) ba (via the path ⟨q0, q1, q0⟩)
(c) For n ≥ 0, bbb^n a (via the path ⟨q0, q1, q2, q2, ..., q2, q0⟩)
Thus L1 = {a, ba, bbb^n a | n ≥ 0} = a + ba + bbb∗a.
2. Again, since q0 is not a final state, L2 – the set of strings which take
the DFA from q0 to the final state q2 – is
{bbb^n | n ≥ 0} = bbb∗.
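The resulting expression L1∗L2 = (a + ba + bbb∗a)∗bbb∗ can be checked against the DFA by brute force. In the sketch below the transition table is an assumption: it is inferred from the paths listed above, taking q0, q1, q2 to be the only states, q0 the initial state and q2 the only final state.

```python
import re
from itertools import product

# Assumed DFA, inferred from the paths listed above.
delta = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q0", ("q1", "b"): "q2",
    ("q2", "a"): "q0", ("q2", "b"): "q2",
}

def dfa_accepts(word):
    state = "q0"
    for symbol in word:
        state = delta[state, symbol]
    return state == "q2"

# L = L1* L2 with L1 = a + ba + bbb*a and L2 = bbb*  ('+' is written '|' in Python's re).
pattern = re.compile(r"(a|ba|bbb*a)*bbb*")

# Compare the DFA and the expression on every string over {a, b} of length at most 8.
mismatches = [
    "".join(w)
    for n in range(9)
    for w in product("ab", repeat=n)
    if dfa_accepts("".join(w)) != bool(pattern.fullmatch("".join(w)))
]
print(mismatches)
```

An empty list of mismatches is evidence, not a proof, that the expression describes the language of the DFA.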
Ri = {x ∈ Σ∗ | δ̂(q0 , x) = qi }
unknown expression for each Ri, say ri. We intend to show that ri,
for all i, is a regular expression so that
And in case of R0 , it is
as ε takes the DFA from q0 to itself. Thus, for each j, we have an equation
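[Figure: the states q0, q1, ..., qi, ..., qj, ..., qn, reached from q0 by the strings of R0, R1, ..., Ri, ..., Rj, ..., Rn respectively, each with a transition into qj labelled Σ(0,j), Σ(1,j), ..., Σ(i,j), ..., Σ(j,j), ..., Σ(n,j).]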
r1 = r0 s(0,1) + r1 s(1,1) + ··· + ri s(i,1) + ··· + rn s(n,1)
   ⋮
rj = r0 s(0,j) + r1 s(1,j) + ··· + ri s(i,j) + ··· + rn s(n,j)
   ⋮
rn = r0 s(0,n) + r1 s(1,n) + ··· + ri s(i,n) + ··· + rn s(n,n)
The system can be solved for the rfi’s, the expressions for the final states, by
straightforward substitution, except when the same unknown appears on both the
left- and right-hand sides of an equation. This situation can be handled using
Arden’s principle (see Exercise ??), which states the following.
Let s and t be regular expressions and let r be an unknown. An equation
of the form r = t + rs, where ε ∉ L(s), has a unique solution given by
r = ts∗.
By successive substitutions and application of Arden’s principle we evaluate
the expressions for final states purely in terms of symbols from Σ. Since the
operations involved here are admissible for regular expressions, we eventually
obtain regular expressions for each rfi.
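Arden’s principle can be illustrated numerically. The following sketch takes the hypothetical instance t = b and s = a, computes the solution of R = L(t) ∪ R·L(s) restricted to strings of length at most N by fixed-point iteration, and compares it with the language of ts∗ = ba∗. The bound N and the helper `step` are our own devices for keeping the sets finite.

```python
import re
from itertools import product

N = 6            # only strings of length <= N are compared
T = {"b"}        # L(t) for t = b
S = {"a"}        # L(s) for s = a; note that ε is not in L(s)

def step(X):
    """One application of X -> L(t) ∪ X·L(s), truncated to length N."""
    return {w for w in T | {x + s for x in X for s in S} if len(w) <= N}

X = set()
while step(X) != X:      # iterate until the truncated equation is satisfied
    X = step(X)

expected = {
    "".join(w)
    for n in range(N + 1)
    for w in product("ab", repeat=n)
    if re.fullmatch(r"ba*", "".join(w))
}
print(sorted(X, key=len))
```

Up to the length bound, the fixed point coincides with L(ba∗), as the principle predicts for r = t + rs.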
We demonstrate the method through the following examples.
Example 4.5.9. Consider the following DFA
[Transition diagram: initial state q1; final states q1 and q2; transitions q1 −a→ q2, q1 −b→ q1, q2 −a→ q3, q2 −b→ q1 and q3 −a,b→ q3.]
In addition, if the initial state q0 ∈ F , then we include S → ε in P. Clearly
G is a regular grammar. We claim that L(G) = L(A ).
From the construction of G, it is clear that ε ∈ L(A ) if and only if
ε ∈ L(G). Now, for n ≥ 1, let x = a1 a2 . . . an ∈ L(A ) be arbitrary. That is
δ̂(q0 , a1 a2 . . . an ) ∈ F . This implies, there exists a sequence of states
q1, q2, ..., qn
such that
δ(qi−1 , ai ) = qi , for 1 ≤ i ≤ n, and qn ∈ F.
As per construction of G, we have
S = q0 ⇒ a1 q1
       ⇒ a1 a2 q2
       ⋮
       ⇒ a1 a2 ··· an−1 qn−1
       ⇒ a1 a2 ··· an = x.
Thus x ∈ L(G).
Conversely, suppose y = b1 ··· bm ∈ L(G), for m ≥ 1, i.e. S ⇒∗ y in G.
Since every production rule of G is of the form A → aB or A → a, the derivation
S ⇒∗ y has exactly m steps: the first m − 1 steps use production
rules of the type A → aB and the last, mth, step uses a rule of the
form A → a. Thus, every step of the derivation produces one bi of y,
in sequence. Precisely, the derivation can be written as
S ⇒ b1 B1
  ⇒ b1 b2 B2
  ⋮
  ⇒ b1 b2 ··· bm−1 Bm−1
  ⇒ b1 b2 ··· bm = y.
From the construction of G, it can be observed that
δ(Bi−1, bi) = Bi, for 1 ≤ i ≤ m − 1, and B0 = S
in A. Moreover, δ(Bm−1, bm) ∈ F. Thus,
δ̂(q0, y) = δ̂(S, b1 ··· bm) = δ̂(δ(S, b1), b2 ··· bm)
         = δ̂(B1, b2 ··· bm)
         ⋮
         = δ̂(Bm−1, bm)
         = δ(Bm−1, bm) ∈ F
so that y ∈ L(A). Hence L(A) = L(G).
Example 4.5.11. Consider the DFA given in Example 4.5.8. Set N =
{q0 , q1 }, Σ = {a, b}, S = q0 and P has the following production rules:
q0 → aq1 | bq0 | a
q1 → aq0 | bq1 | b
Now G = (N , Σ, P, S) is a regular grammar that is equivalent to the given
DFA.
Example 4.5.12. Consider the DFA given in Example 4.5.9. The regular
grammar G = (N , Σ, P, S), where N = {q1 , q2 , q3 }, Σ = {a, b}, S = q1 and
P has the following rules
q1 → aq2 | bq1 | a | b | ε
q2 → aq3 | bq1 | b
q3 → aq3 | bq3
is equivalent to the given DFA. Here, note that q3 is a trap state. So, the
production rules in which q3 is involved can safely be removed to get a simpler
but equivalent regular grammar with the following production rules.
q1 → aq2 | bq1 | a | b | ε
q2 → bq1 | b
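The construction of a regular grammar from a DFA can be sketched in Python as follows. The function name `dfa_to_grammar` is hypothetical, and the transition table is our reading of the DFA of Example 4.5.9, reconstructed from the production rules of Example 4.5.12.

```python
def dfa_to_grammar(delta, start, finals):
    """Return {nonterminal: [right-hand sides]} for the DFA (delta, start, finals)."""
    rules = {}
    for (q, a), p in delta.items():
        rhs = rules.setdefault(q, [])
        rhs.append(a + p)            # q -> a p  for every transition delta(q, a) = p
        if p in finals:
            rhs.append(a)            # q -> a    whenever delta(q, a) is a final state
    if start in finals:
        rules.setdefault(start, []).append("ε")   # S -> ε if the initial state is final
    return rules

# DFA of Example 4.5.9 (initial state q1, final states q1 and q2), as read off
# from the rules of Example 4.5.12.
delta = {
    ("q1", "a"): "q2", ("q1", "b"): "q1",
    ("q2", "a"): "q3", ("q2", "b"): "q1",
    ("q3", "a"): "q3", ("q3", "b"): "q3",
}
rules = dfa_to_grammar(delta, "q1", {"q1", "q2"})
for head, bodies in rules.items():
    print(head, "→", " | ".join(bodies))
```

The rules printed for q1, q2 and q3 agree, up to order, with those of Example 4.5.12 before the trap state q3 is removed.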
One can easily prove that the GFA is no more powerful than an NFA.
That is, the language accepted by a GFA is regular. This can be done
by converting each transition of a GFA into transitions of an NFA. For
instance, suppose there is a transition from a state p to a state q via a string
x = a1 ··· ak, for k ≥ 2, in a GFA. Choose k − 1 new states that are not already
in the GFA, say p1, ..., pk−1, and replace the transition
p −x→ q
by the chain of transitions p −a1→ p1 −a2→ p2 ··· pk−1 −ak→ q.
In a similar way, all the transitions via strings of length at least 2 can be
replaced in a GFA to convert it into an NFA without disturbing its language.
B ∈ δ(A, x) ⇐⇒ A → xB ∈ P
and
$ ∈ δ(A, x) ⇐⇒ A → x ∈ P
We claim that L(A ) = L.
Let w ∈ L, i.e. there is a derivation for w in G. Assume the derivation
has k steps, obtained by applying the following k − 1 production rules in the
first k − 1 steps
Ai−1 → xi Ai, for 1 ≤ i ≤ k − 1, with S = A0,
and, at the end, in the kth step, the production rule
Ak−1 → xk.
Thus, w = x1 x2 ··· xk and the derivation is as shown below:
S = A0 ⇒ x1 A1
       ⇒ x1 x2 A2
       ⋮
       ⇒ x1 x2 ··· xk−1 Ak−1
       ⇒ x1 x2 ··· xk = w.
Example 4.5.15. Consider the regular grammar given in the Example 3.3.10.
Let Q = {S, A, $}, Σ = {a, b}, X = {ε, a, b, ab} and F = {$}. Set A =
(Q, Σ, X, δ, S, F), where δ : Q × X −→ ℘(Q) is defined by the following
table.
δ    ε      a      b      ab
S    ∅      {S}    {S}    {A}
A    {$}    {A}    {A}    ∅
$    ∅      ∅      ∅      ∅
Clearly, A is a GFA. Now, we convert the GFA A to an equivalent NFA.
Consider a new symbol B, split the production rule
S → abA
into
S → aB and B → bA
and use these in place of the earlier one. Note that, in the resulting
grammar, the terminal strings occurring on the right-hand sides of
production rules have length at most one. Hence, in a straightforward
manner, we have the following NFA A′ that is equivalent to the above GFA
and also equivalent to the given regular grammar. A′ = (Q′, Σ, δ′, S, F),
where Q′ = {S, A, B, $} and δ′ : Q′ × Σ −→ ℘(Q′) is defined by the following
table.
δ′   ε      a        b
S    ∅      {S, B}   {S}
A    {$}    {A}      {A}
B    ∅      ∅        {A}
$    ∅      ∅        ∅
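The conversion of a GFA into an NFA by splitting string-labelled transitions can be sketched as follows. The function `gfa_to_nfa` and its `fresh` argument (a supply of unused state names) are hypothetical; ε is encoded as the empty string, and the input is the GFA of this example.

```python
def gfa_to_nfa(delta, fresh):
    """delta: {(state, string): set of states}; fresh: iterator of unused state names."""
    nfa = {}
    for (p, x), targets in delta.items():
        for q in targets:
            if len(x) <= 1:                      # ε or a single symbol: keep as is
                nfa.setdefault((p, x), set()).add(q)
                continue
            current = p
            for a in x[:-1]:                     # one new state per inner position
                new_state = next(fresh)
                nfa.setdefault((current, a), set()).add(new_state)
                current = new_state
            nfa.setdefault((current, x[-1]), set()).add(q)
    return nfa

gfa = {
    ("S", "a"): {"S"}, ("S", "b"): {"S"}, ("S", "ab"): {"A"},
    ("A", ""): {"$"}, ("A", "a"): {"A"}, ("A", "b"): {"A"},
}
nfa = gfa_to_nfa(gfa, iter(["B"]))
for key in sorted(nfa):
    print(key, sorted(nfa[key]))
```

With B as the only fresh name, the result is the transition table of A′ given above (empty entries are simply absent from the dictionary).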
Example 4.5.16. The following is an equivalent NFA for the regular grammar
given in Example 3.3.13.
δ    ε      a      b
S    ∅      {A}    {S}
A    ∅      {B}    {A}
B    {$}    {S}    {B}
$    ∅      ∅      ∅
where Q, Σ, q0 and F are as in DFA, but the transition function is
δ : Q × Σ −→ Q × {L, R},
Case-I: X = R.
C′ = { (q, xa),      if y = ε;
     { (q, xaby′),   whenever y = by′, for some b ∈ Σ.
Case-II: X = L.
C′ = { undefined,    if x = ε;
     { (q, x′bay′),  whenever x = x′b, for some b ∈ Σ.
   δ     0          1
→ *q0    (q1, R)    (q0, R)
  *q1    (q1, R)    (q2, L)
   q2    (q3, L)    (q3, L)
   q3    (q4, R)    (q5, R)
   q4    (q6, R)    (q6, R)
   q5    (q5, R)    (q5, R)
   q6    (q0, R)    (q0, R)
(* marks a final state)
The following computation on 10011 shows that the string is accepted by the
2DFA.
(q0, 10011) ⊢ (q0, 10011) ⊢ (q1, 10011) ⊢ (q1, 10011)
⊢ (q2, 10011) ⊢ (q3, 10011) ⊢ (q4, 10011)
⊢ (q6, 10011) ⊢ (q0, 10011) ⊢ (q0, 10011).
Given 1101 as input to the 2DFA, the state component of the final
configuration of the computation
(q0, 1101) ⊢ (q0, 1101) ⊢ (q0, 1101) ⊢ (q1, 1101)
⊢ (q2, 1101) ⊢ (q3, 1101) ⊢ (q5, 1101)
⊢ (q5, 1101) ⊢ (q5, 1101).
is a non-final state; hence the string 1101 is not accepted by the
2DFA.
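The two computations can be replayed with a small simulator that also records the position of the reading head. The sketch assumes the following conventions, which are ours: q0 and q1 are the final states, a string is accepted when the head leaves the input on the right in a final state, and a move off the left end or a repeated configuration means rejection.

```python
# Transition table of the example above: delta[state][symbol] = (next state, move).
delta = {
    "q0": {"0": ("q1", "R"), "1": ("q0", "R")},
    "q1": {"0": ("q1", "R"), "1": ("q2", "L")},
    "q2": {"0": ("q3", "L"), "1": ("q3", "L")},
    "q3": {"0": ("q4", "R"), "1": ("q5", "R")},
    "q4": {"0": ("q6", "R"), "1": ("q6", "R")},
    "q5": {"0": ("q5", "R"), "1": ("q5", "R")},
    "q6": {"0": ("q0", "R"), "1": ("q0", "R")},
}
finals = {"q0", "q1"}   # assumption: the states drawn as final in the table

def run(word, start="q0"):
    """Return (accepted, trace); trace lists (state, head position) pairs."""
    state, pos = start, 0
    trace = [(state, pos)]
    seen = set()
    while 0 <= pos < len(word):
        if (state, pos) in seen:          # same configuration twice: the 2DFA loops
            return False, trace
        seen.add((state, pos))
        state, move = delta[state][word[pos]]
        pos += 1 if move == "R" else -1
        trace.append((state, pos))
    # Accept only if the head left the input on the right in a final state.
    return pos == len(word) and state in finals, trace

for word in ["10011", "1101"]:
    accepted, trace = run(word)
    print(word, accepted, trace)
```

The sequence of states in each trace matches the sequence of configurations shown in the text.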
Example 4.6.3. Consider the language over {0, 1} that contains all those
strings with no consecutive 0’s. That is, any two occurrences of 0 have to
be separated by at least one 1. In the following we design a 2DFA which
checks this property and accepts the desired language. We show the 2DFA
using a transition diagram, where the left or right move of the reading head
is indicated over the transitions.
[Transition diagram of the 2DFA with states q0, q1 and q2; the transitions carry the labels (1,R), (1,R), (0,R), (1,R), (0,L) and (0,L), each giving the input symbol read and the move of the reading head.]
output function. In case of Mealy machine, the output is associated to each
transition, i.e. given an input symbol in a state, while changing to the next
state, the machine emits an output symbol. Thus, formally, a Mealy machine
is defined as follows.
λ̂ : Q × Σ∗ −→ ∆∗
that is defined by
1. λ̂(q, ε) = ε, and
where q1 = q and qi+1 = δ(qi , ai ), for 1 ≤ i < n. Clearly, |x| = |λ̂(q, x)|.
Example 4.6.6. Let Q = {q0 , q1 , q2 }, Σ = {a, b}, ∆ = {0, 1} and define the
transition function δ and the output function λ through the following tables.
δ    a    b        λ    a    b
q0   q1   q0       q0   0    0
q1   q1   q2       q1   0    1
q2   q1   q0       q2   0    0
For instance, output of M for the input string baababa is 0001010. In fact,
this Mealy machine prints 1 for each occurrence of ab in the input; otherwise,
it prints 0.
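The output of this Mealy machine can be reproduced with a few lines of Python. The tables below transcribe δ and λ from the example; taking q0 as the initial state is our assumption, and the function name `mealy_output` is hypothetical.

```python
# Transition function and output function of Example 4.6.6.
delta = {"q0": {"a": "q1", "b": "q0"},
         "q1": {"a": "q1", "b": "q2"},
         "q2": {"a": "q1", "b": "q0"}}
lam = {"q0": {"a": "0", "b": "0"},
       "q1": {"a": "0", "b": "1"},
       "q2": {"a": "0", "b": "0"}}

def mealy_output(word, state="q0"):
    """Emit one output symbol per input symbol, then move to the next state."""
    out = []
    for symbol in word:
        out.append(lam[state][symbol])
        state = delta[state][symbol]
    return "".join(out)

print(mealy_output("baababa"))
```

The output has the same length as the input, and a 1 appears exactly at the positions where an occurrence of ab ends.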
Example 4.6.7. In the following we construct a Mealy machine that performs
binary addition. Given two binary numbers a1 ··· an and b1 ··· bn (if
they are of different lengths, then we add leading 0’s to the shorter one),
the input sequence is read column by column, as in manual addition,
as shown below.
( an ) ( an−1 )     ( a1 )
( bn ) ( bn−1 ) ··· ( b1 )
Here, we reserve a1 = b1 = 0 so as to accommodate the extra bit, if any,
during addition. The expected output
cn cn−1 ··· c1
is such that
a1 ··· an + b1 ··· bn = c1 ··· cn.
Note that there are four input symbols, viz.
( 0 )  ( 0 )  ( 1 )      ( 1 )
( 0 ), ( 1 ), ( 0 )  and ( 1 ).
For notational convenience, let us denote the above symbols by a, b, c and d,
respectively. Now, consider the desired Mealy machine in its initial state,
say q0. If the input is d, i.e. while adding 1 + 1, it emits the output 0 and
remembers the carry 1 through a new state, say q1. For the other input symbols,
viz. a, b and c, as there is no carry, it continues in q0 and performs the
addition. Similarly, while the machine is in q1, for the input a, i.e.
while adding 0 + 0, it changes to the state q0, indicating that the carry is
0, and emits 1 as output. Following this mechanism, the following Mealy
machine is designed to perform binary addition.
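A sketch of this adder in Python is given below. Only the behaviour of q0 and the transition of q1 on a are described in the text; the remaining entries for q1 are filled in by us following ordinary binary addition with carry, and the wrapper `add` (padding, reversing) is a hypothetical convenience.

```python
# (state, (bit of first number, bit of second number)) -> (next state, output bit)
# q0 means "no carry", q1 means "carry 1".
adder = {
    ("q0", ("0", "0")): ("q0", "0"),   # input a
    ("q0", ("0", "1")): ("q0", "1"),   # input b
    ("q0", ("1", "0")): ("q0", "1"),   # input c
    ("q0", ("1", "1")): ("q1", "0"),   # input d
    ("q1", ("0", "0")): ("q0", "1"),   # input a: the carry is consumed
    ("q1", ("0", "1")): ("q1", "0"),   # assumed: 0 + 1 + carry
    ("q1", ("1", "0")): ("q1", "0"),   # assumed: 1 + 0 + carry
    ("q1", ("1", "1")): ("q1", "1"),   # assumed: 1 + 1 + carry
}

def add(x, y):
    """Add two binary strings by feeding their columns to the Mealy machine."""
    n = max(len(x), len(y)) + 1                  # one reserved leading 0 for the extra bit
    x, y = x.zfill(n), y.zfill(n)
    state, out = "q0", []
    for pair in zip(reversed(x), reversed(y)):   # least significant column first
        state, bit = adder[state, pair]
        out.append(bit)
    return "".join(reversed(out))

print(add("1011", "0010"))
```

Because of the reserved leading 0, the machine always ends in q0 and the output has exactly as many bits as the padded inputs.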
Chapter 5
Properties of Regular Languages
determine whether a given language is regular or not? If a given language
is regular, then to prove the same we need to use regular expression, regular
grammar, finite automata or Myhill-Nerode theorem. Is there any other way
to prove that a language is regular? The answer is “Yes”. If a given lan-
guage can be obtained from some known regular languages by applying those
operations which preserve regularity, then one can ascertain that the given
language is regular. If a language is not regular, then, although we have the Myhill-
Nerode theorem, a better and more practical tool, viz. the pumping lemma, will
be introduced to ascertain that the language is not regular. If we somehow
know that some languages are not regular, then again closure properties
might be helpful to establish that some more languages are not regular. Thus,
closure properties play an important role not only in proving that certain languages
are regular, but also in establishing non-regularity of languages. Hence, we
intend to explore further closure properties of regular languages.
Alternative Proof by Construction. For i = 1, 2, let Ai = (Qi , Σ, δi , qi , Fi ) be
two DFA accepting Li . That is, L(A1 ) = L1 and L(A2 ) = L2 . Set the DFA
A = (Q1 × Q2 , Σ, δ, (q1 , q2 ), F1 × F2 ),
where δ is defined point-wise by
δ((p, q), a) = (δ1 (p, a), δ2 (q, a)),
for all (p, q) ∈ Q1 × Q2 and a ∈ Σ. We claim that L(A) = L1 ∩ L2. Using
induction on |x|, first observe that δ̂((p, q), x) = (δ̂1(p, x), δ̂2(q, x)), for all
x ∈ Σ∗.
Now it clearly follows that
x ∈ L(A) ⇐⇒ δ̂((q1, q2), x) ∈ F1 × F2
         ⇐⇒ (δ̂1(q1, x), δ̂2(q2, x)) ∈ F1 × F2
         ⇐⇒ δ̂1(q1, x) ∈ F1 and δ̂2(q2, x) ∈ F2
         ⇐⇒ x ∈ L1 and x ∈ L2
         ⇐⇒ x ∈ L1 ∩ L2.
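The product construction can be sketched in Python and checked by brute force. The two component DFAs used here (an even number of 0’s, an odd number of 1’s) are small illustrations chosen by us, and the helper names are hypothetical.

```python
from itertools import product

def product_dfa(d1, s1, f1, d2, s2, f2, alphabet):
    """DFA on Q1 x Q2 with point-wise transitions and final states F1 x F2."""
    states1 = {q for q, _ in d1}
    states2 = {q for q, _ in d2}
    delta = {((p, q), a): (d1[p, a], d2[q, a])
             for p in states1 for q in states2 for a in alphabet}
    finals = {(p, q) for p in f1 for q in f2}
    return delta, (s1, s2), finals

def accepts(delta, start, finals, word):
    state = start
    for a in word:
        state = delta[state, a]
    return state in finals

# d1 accepts strings with an even number of 0's, d2 strings with an odd number of 1's.
d1 = {("q1", "0"): "q2", ("q1", "1"): "q1", ("q2", "0"): "q1", ("q2", "1"): "q2"}
d2 = {("p1", "0"): "p1", ("p1", "1"): "p2", ("p2", "0"): "p2", ("p2", "1"): "p1"}
delta, start, finals = product_dfa(d1, "q1", {"q1"}, d2, "p1", {"p2"}, "01")

words = ["".join(w) for n in range(8) for w in product("01", repeat=n)]
wrong = [w for w in words
         if accepts(delta, start, finals, w)
         != (w.count("0") % 2 == 0 and w.count("1") % 2 == 1)]
print(len(delta), start, finals, wrong)
```

The product has four states, and on all strings of length at most 7 it accepts exactly the strings lying in both languages.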
Example 5.1.3. Using the construction given in the above proof, we design
a DFA that accepts the language
so that L is regular. Note that the following DFA accepts the language
L1 = {x ∈ (0 + 1)∗ | |x|0 is even}.
[Transition diagram: states q1 (initial and final) and q2; each state has a loop on 1, and 0 takes q1 to q2 and q2 to q1.]
Also, the following DFA accepts the language L2 = {x ∈ (0+1)∗ | |x|1 is odd}.
[Transition diagram: states p1 (initial) and p2 (final); each state has a loop on 0, and 1 takes p1 to p2 and p2 to p1.]
below.
[Transition diagram of the product DFA with states s1 (initial), s2 (final), s3 and s4: 1 interchanges s1 with s2 and s3 with s4; 0 interchanges s1 with s3 and s2 with s4.]
is regular, we choose s2 and s3 as final states and obtain the following DFA
which accepts L′.
[Transition diagram: the same four-state DFA on s1, s2, s3 and s4 as before, now with s2 and s3 as final states.]
Corollary 5.1.4. The class of regular languages is closed under set differ-
ence.
Remark 5.1.6. In general, one may conclude that the removal of finitely many
strings from a regular language leaves a regular language.
Lemma 5.1.8. For every regular language L, there exists a finite automaton
A with a single final state such that L(A ) = L.
Proof of Theorem 5.1.7. Let A be a finite automaton with the initial
state q0 and single final state qf that accepts L. Construct a finite automaton
A^R by reversing the arcs in A with the same labels and by interchanging
the roles of initial and final states. If x ∈ Σ∗ is accepted by A, then there is
a path from q0 to qf labeled x in A. Therefore, there will be a path from qf to q0
in A^R labeled x^R so that x^R ∈ L(A^R). Conversely, if x is accepted by A^R,
then using a similar argument one may notice that its reversal x^R ∈ L(A).
Thus, L(A^R) = L^R so that L^R is regular.
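The reversal construction can be sketched in Python for an NFA without ε-transitions and with a single final state. The sample NFA (strings over {a, b} ending in ab) and the helper names are hypothetical choices for illustration.

```python
from itertools import product

def reverse_nfa(trans, start, final):
    """trans: {(state, symbol): set of states}; a single final state is assumed."""
    rev = {}
    for (p, a), targets in trans.items():
        for q in targets:
            rev.setdefault((q, a), set()).add(p)   # the arc p -a-> q becomes q -a-> p
    return rev, final, start                        # initial and final states swap roles

def accepts(trans, start, final, word):
    current = {start}
    for a in word:
        current = set().union(*(trans.get((q, a), set()) for q in current))
    return final in current

# Hypothetical NFA accepting the strings over {a, b} that end in ab.
trans = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
rev, rev_start, rev_final = reverse_nfa(trans, 0, 2)

words = ["".join(w) for n in range(7) for w in product("ab", repeat=n)]
wrong = [w for w in words
         if accepts(trans, 0, 2, w) != accepts(rev, rev_start, rev_final, w[::-1])]
print(wrong)
```

The reversed automaton accepts exactly the strings that begin with ba, i.e. the reversals of the strings ending in ab.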
Example 5.1.9. Consider the alphabet Σ = {a0, a1, ..., a7}, where ai is the
column vector (b′i, b″i, b‴i)ᵀ and b′i b″i b‴i is the binary representation of the
decimal number i, for 0 ≤ i ≤ 7. That is, a0 = (0, 0, 0)ᵀ, a1 = (0, 0, 1)ᵀ,
a2 = (0, 1, 0)ᵀ, ..., a7 = (1, 1, 1)ᵀ.
Now a string x = ai1 ai2 ··· ain over Σ is said to represent correct binary
addition if
b′i1 b′i2 ··· b′in + b″i1 b″i2 ··· b″in = b‴i1 b‴i2 ··· b‴in.
For example, the string a5 a1 a6 a5 represents correct addition, because 1011 +
0010 = 1101. Whereas, a5 a0 a6 a5 does not represent a correct addition,
because 1011 + 0010 ≠ 1001.
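The condition can be tested directly. In the following sketch a string over Σ is given as the list of indices of its symbols; the function names are hypothetical, and treating the empty string as not representing an addition is our assumption.

```python
def column(i):
    """Bits b', b'', b''' of the symbol a_i, i.e. the 3-bit binary form of i."""
    return format(i, "03b")

def represents_correct_addition(indices):
    cols = [column(i) for i in indices]
    if not cols:
        return False          # assumption: the empty string is excluded
    first = int("".join(c[0] for c in cols), 2)    # top row
    second = int("".join(c[1] for c in cols), 2)   # middle row
    third = int("".join(c[2] for c in cols), 2)    # bottom row
    return first + second == third

print(represents_correct_addition([5, 1, 6, 5]))   # a5 a1 a6 a5
print(represents_correct_addition([5, 0, 6, 5]))   # a5 a0 a6 a5
```

This checker works on whole strings at once; the point of the example in the text is that the same test can be performed by a finite automaton reading the reversed string.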
We observe that the language L over Σ which contains all strings that
represent correct addition, i.e.
L = {ai1 ai2 ··· ain ∈ Σ∗ | b′i1 b′i2 ··· b′in + b″i1 b″i2 ··· b″in = b‴i1 b‴i2 ··· b‴in},
Note that the NFA accepts L^R ∪ {ε}. Hence, by Remark 5.1.6, L^R is regular.
Now, by Theorem 5.1.7, L is regular, as desired.
{x ∈ Σ∗ | ∃y ∈ L2 such that xy ∈ L1 }.
Example 5.1.11. 1. Let L1 = {a, ab, bab, baba} and L2 = {a, ab}; then
L1 /L2 = {ε, bab, b}.
3. Let L5 = 0∗ 10∗ .
(a) L5 /0∗ = L5 .
(b) L5 /10∗ = 0∗ .
(c) L5 /1 = 0∗ .
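For finite languages the quotient can be computed directly from the definition {x ∈ Σ∗ | ∃y ∈ L2 such that xy ∈ L1}. The function name `right_quotient` below is hypothetical.

```python
def right_quotient(l1, l2):
    """All x such that x + y is in l1 for some y in l2 (finite languages only)."""
    return {x[: len(x) - len(y)] for x in l1 for y in l2 if x.endswith(y)}

L1 = {"a", "ab", "bab", "baba"}
L2 = {"a", "ab"}
print(right_quotient(L1, L2))
```

For infinite regular languages such as L5 this enumeration is not available, and one works with an automaton instead.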
so that A′ is a DFA. We claim that L(A′) = L/L′. For w ∈ Σ∗,
w ∈ L(A′) ⇐⇒ δ̂(q0, w) ∈ F′
          ⇐⇒ δ̂(q0, wx) ∈ F, for some x ∈ L′
          ⇐⇒ w ∈ L/L′.
h : Σ1∗ −→ Σ2∗
h(xy) = h(x)h(y).
One may notice that to give a homomorphism from Σ1∗ to Σ2∗, it is enough to
give images for the elements of Σ1. This is because, for a
homomorphism h, the image of any x = a1 a2 ··· an ∈ Σ1∗ is given
by
h(a1)h(a2) ··· h(an).
Therefore, a homomorphism from Σ1∗ to Σ2∗ is determined by a mapping from Σ1 to Σ2∗.
Then, h is a homomorphism from Σ1∗ to Σ2∗, which for example assigns the
image 10010010 for the string abb.
We can generalize the concept of homomorphism by substituting a language
instead of a string for symbols of the domain. Formally, a substitution
is a mapping from Σ1 to P(Σ2∗).
Example 5.1.14. Let Σ1 = {a, b} and Σ2 = {0, 1}. Define h : Σ1 −→
P(Σ2∗) by
h(a) = {0^n | n ≥ 0}, say L1;
h(b) = {1^n | n ≥ 0}, say L2.
Then, h is a substitution. Now, for any string a1 a2 ··· an ∈ Σ1∗, its image
under the above substitution h is
h(a1)h(a2) ··· h(an),
the concatenation of languages. For example, h(ab) is the language
L1 L2 = {0^m 1^n | m, n ≥ 0} = L(0∗1∗).
((0 + 1)∗1)^+ (0(0 + 1)∗)^+ = ((0 + 1)∗1)∗((0 + 1)∗1)(0(0 + 1)∗)(0(0 + 1)∗)∗
                            = ((0 + 1)∗1)∗(0 + 1)∗10(0 + 1)∗(0(0 + 1)∗)∗
                            = (0 + 1)∗10(0 + 1)∗.
The following Theorem 5.1.17 confirms that the expression obtained in this
process represents h(L). Thus, from the expression of h(L), we can conclude
that the language h(L) is the set of all strings over {0, 1} that have 10 as
substring.
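The simplification can be checked by brute force with Python's `re` module, where the union + is written | and the superscript + is the one-or-more quantifier. The length bound of 8 is an arbitrary choice that keeps the check fast.

```python
import re
from itertools import product

lhs = re.compile(r"((0|1)*1)+(0(0|1)*)+")   # ((0+1)*1)^+ (0(0+1)*)^+
rhs = re.compile(r"(0|1)*10(0|1)*")         # (0+1)*10(0+1)*

words = ["".join(w) for n in range(9) for w in product("01", repeat=n)]
disagreements = [w for w in words
                 if bool(lhs.fullmatch(w)) != bool(rhs.fullmatch(w))]
print(disagreements)
```

Both expressions match exactly the strings that contain 10 as a substring, in line with the description of h(L) above.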
r = r1 + r2 or r = r1 r2 or r = r1∗
for some regular expressions r1 and r2. Note that both r1 and r2 have k or
fewer operators. Hence, by inductive hypothesis, we have
where r1′ and r2′ are the regular expressions which are obtained from r1 and
r2 by substituting ra for each a in r1 and r2, respectively.
Consider the case where r = r1 + r2. The expression r′ (that is obtained
from r) is nothing but the result of substituting ra for each a in r1 and in r2 individually, so that
r′ = r1′ + r2′.
Hence,
as desired, in this case. Similarly, the other two cases, viz. r = r1 r2 and r = r1∗,
can be handled.
Hence, the class of regular languages is closed under substitutions by
regular languages.
Corollary 5.1.18. The class of regular languages is closed under homomor-
phisms.
is regular.
δ′ : Q × Σ1 −→ Q
is defined by
δ′(q, a) = δ̂(q, h(a))
for all q ∈ Q and a ∈ Σ1. Note that A′ is a DFA. Now, for all x ∈ Σ1∗, we
prove that
δ̂′(q0, x) = δ̂(q0, h(x)).
This gives us L(A′) = h⁻¹(L), because, for x ∈ Σ1∗,
We prove our assertion by induction on |x|. For basis, suppose |x| = 0. That
is x = ε. Then clearly,
for all x ∈ Σ1∗ with |x| = k. Let x ∈ Σ1∗ with |x| = k and a ∈ Σ1. Now,
by f (0) = 01 and f (1) = 10 so that f is a homomorphism. Now note that
f (L) = {f (x) | x ∈ L}
= {f (a1 · · · an ) | a1 · · · an ∈ L}
= {f (a1 ) · · · f (an ) | a1 · · · an ∈ L}
= {a1 b1 · · · an bn | a1 · · · an ∈ L and ai = 0 iff bi = 1}
= L0
1. v ≠ ε, and
2. uv^i w ∈ L, for all i ≥ 0.
q0, q1, ..., qn
[Figure: the path q0 −a1→ q1 ··· qr−1 −ar→ qr −as+1→ qs+1 ··· qn−1 −an→ qn, with a loop at qr = qs that passes through qr+1, ..., qs−1 and reads ar+1 ··· as.]
Thus, for i ≥ 0, uv^i w ∈ L.
Remark 5.2.2. If L is finite, then by choosing κ = 1 + max{|x| | x ∈ L}
one may notice that L vacuously satisfies the pumping lemma, as there is no
string of length greater than or equal to κ in L. Thus, the pumping lemma
holds good for all regular languages.
This statement can be better explained via the following adversarial game.
Given a language L, if we want to show that L is not regular, then we play
as given in the following steps.
1. An opponent will give us an arbitrary number κ.
Through the following points, in this case, we demonstrate a
contradiction to pumping lemma.
1. In this case, v is in the first block of 0’s and p < n.
2. Suppose v is pumped for i = 2.
3. If the resultant string uv^i w is of odd length, then clearly it is not
in L.
4. Otherwise, suppose uv^i w = yz with |y| = |z|.
5. Then, clearly, |y| = |z| = (4n + p)/2 = 2n + p/2.
6. Since z is the suffix of the resultant string and |z| > 2n, we have
z = 1^(p/2) 0^n 1^n.
7. Hence, clearly, y = 0^(n+p) 1^(n − p/2) ≠ z so that yz ∉ L.
Using a similar argument as given in Case-1 and the arguments shown
in Example 5.2.3, one can demonstrate contradictions to pumping lemma
in each of the following remaining cases.
Case-2 (For q ≥ 1, v = 1^q with x = 0^n 1^k1 v 1^k2 0^n 1^n). That is, v is in the first
block of 1’s.
Case-3 (For p ≥ 1, v = 0^p with x = 0^n 1^n 0^k1 v 0^k2 1^n). That is, v is in the
second block of 0’s.
Hence, L is not regular.
Remark 5.2.5. Although it is sufficient to choose a particular string to counter
the pumping lemma, it is often observed that depending on the string chosen
there can be several possibilities of partitions as uvw that are to be considered
as we have to check for all possibilities. For instance, in Example 5.2.3 we
have discussed three cases. On the other hand, in Example 5.2.4 instead of
choosing a typical string we have chosen a string which reduces the number
of possibilities to discuss. Even then, there are ten possibilities to discuss.
In the following, we show how the number of possibilities, to be con-
sidered, can be reduced further. In fact, we observe that it is sufficient to
consider the occurrence of v within the first κ symbols of the string under
consideration. More precisely, we state the assertion through the following
theorem, a restricted version of pumping lemma.
Theorem 5.2.6 (Pumping Lemma – A Restricted Version). If L is an infi-
nite regular language, then there exists a number κ (associated to L) such that
for all x ∈ L with |x| ≥ κ, x can be written as uvw satisfying the following:
1. v ≠ ε,
2. |uv| ≤ κ, and
3. uv^i w ∈ L, for all i ≥ 0.
Proof. The proof of the theorem is exactly the same as that of Theorem 5.2.1,
except that, when the pigeon-hole principle is applied to find the repetition
of states in the accepting sequence, we find the repetition of states within
the first κ + 1 states of the sequence. As |x| ≥ κ, there will be at least κ + 1
states in the sequence and since |Q| = κ, there will be a repetition in the first
κ + 1 states. Hence, we have the desired extra condition |uv| ≤ κ.
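The pigeon-hole argument of this proof can be turned into a small procedure that finds u, v and w from the run of a DFA. The sketch below is an illustration under our own assumptions: the function name `pumping_split` is hypothetical, and the sample DFA (number of a’s divisible by 3, with κ = |Q| = 3) is not taken from the notes.

```python
def pumping_split(delta, start, word):
    """Split word as (u, v, w) at the first repeated state of the run on word."""
    state, first_seen = start, {start: 0}
    for i, a in enumerate(word, start=1):
        state = delta[state, a]
        if state in first_seen:               # the state after i symbols was seen before
            r = first_seen[state]
            return word[:r], word[r:i], word[i:]
        first_seen[state] = i
    raise ValueError("no state repeats: the word is shorter than the number of states")

def accepts(delta, start, finals, word):
    state = start
    for a in word:
        state = delta[state, a]
    return state in finals

# Hypothetical DFA: strings over {a, b} whose number of a's is divisible by 3.
delta = {(i, "a"): (i + 1) % 3 for i in range(3)}
delta.update({(i, "b"): i for i in range(3)})

u, v, w = pumping_split(delta, 0, "abaab")
print(u, v, w, [accepts(delta, 0, {0}, u + v * i + w) for i in range(5)])
```

Since the first repetition occurs among the first κ + 1 states of the run, the split always satisfies v ≠ ε and |uv| ≤ κ.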
v ≠ ε then there is only one possibility for v in x, as in the previous example.
Now, pumping v results in a string that is not a palindrome so that L is not
regular.
we have
h(L′) = {0^n 1^n | n ≥ 0}.
Since regular languages are closed under homomorphisms, h(L′) is also
regular. But we know that {0^n 1^n | n ≥ 0} is not regular. Thus, we arrive at a
contradiction. Hence, L is not regular.