
Theory of Finite Automata

with an Introduction to Formal Languages

John Carroll and Darrell Long

August 2, 2016
Contents

0 Preliminaries
0.1 Logic and Set Theory
0.2 Relations
0.3 Functions
0.4 Cardinality and Induction
0.5 Recursion
0.6 Backus-Naur Form

1 Introduction and Basic Definitions
1.1 Alphabets and Words
1.2 Definition of a Finite Automaton
1.3 Examples of Finite Automata
1.4 Circuit Implementation of Finite Automata
1.5 Applications of Finite Automata

2 Characterization of Finite Automaton Definable Languages
2.1 Right Congruences
2.2 Nerode’s Theorem
2.3 Pumping Lemmas

3 Minimization of Finite Automata
3.1 Homomorphisms and Isomorphisms
3.2 Minimization Algorithms

4 Nondeterministic Finite Automata
4.1 Definitions and Basic Theorems
4.2 Circuit Implementation of NDFAs
4.3 NDFAs with Lambda-Transitions

5 Closure Properties
5.1 FAD Languages and Basic Closure Theorems
5.2 Further Closure Properties

6 Regular Expressions
6.1 Algebra of Regular Expressions
6.2 Regular Sets as FAD Languages
6.3 Language Equations
6.4 FAD Languages as Regular Sets; Closure Properties

7 Finite-State Transducers
7.1 Basic Definitions
7.2 Minimization of Finite-State Transducers
7.3 Moore Sequential Machines
7.4 Transducer Applications and Circuit Implementation

8 Regular Grammars
8.1 Overview of the Grammar Hierarchy
8.2 Right-Linear Grammars and Automata
8.3 Regular Grammars and Regular Expressions

9 Context-Free Grammars
9.1 Parse Trees
9.2 Ambiguity
9.3 Canonical Forms
9.4 Pumping Theorem
9.5 Closure Properties

10 Pushdown Automata
10.1 Definitions and Examples
10.2 Equivalence of PDAs and CFGs
10.3 Equivalence of Acceptance by Final State and Empty Stack
10.4 Closure Properties and Deterministic Pushdown Automata

11 Turing Machines
11.1 Definitions and Examples
11.2 Variants of Turing Machines
11.3 Turing Machines, LBAs, and Grammars
11.4 Closure Properties and the Hierarchy Theorem

12 Decidability
12.1 Decidable Questions About Regular Languages
12.2 Other Decidable Questions
12.3 An Undecidable Problem
12.4 Turing Decidability
12.5 Turing-Decidable Languages

Chapter 0

Preliminaries

This chapter reviews some of the basic concepts used in this text. Many can be found in standard texts
on discrete mathematics. Much of the notation employed in later chapters is also presented here.

0.1 Logic and Set Theory


A basic familiarity with the nature of formal proofs is assumed; most proofs given in this text are complete
and rigorous, and the reader is encouraged to work the exercises in similar detail. A knowledge of logic
circuits would be necessary to construct the machines discussed in this text. Important terminology and
techniques are reviewed here.
Unambiguous statements that can take on the values True or False (denoted by 1 and 0, respectively)
can be combined with connectives such as and (∧), or (∨), and not (¬) to form more complex state-
ments. The truth tables for several useful connectives are given in Figure 0.1, along with the symbols
representing the physical devices that implement these connectives.
As an example of a complex statement, consider the assertion that two statements p and q take on
the same value. This can be rephrased as: Either (p is true and q is true) or (p is false and q is false). As
the truth table for not shows, a statement r is false exactly when ¬r is true; the above assertion could be
further refined to: Either (p is true and q is true) or (¬p is true and ¬q is true).
In symbols, this can be abbreviated as:

(p ∧ q) ∨ (¬p ∧ ¬q)

NOT gate       AND gate          OR gate           NAND gate         NOR gate

p  ¬p       p  q  p ∧ q      p  q  p ∨ q      p  q  p ↑ q      p  q  p ↓ q
1   0       1  1    1        1  1    1        1  1    0        1  1    0
0   1       1  0    0        1  0    1        1  0    1        1  0    0
            0  1    0        0  1    1        0  1    1        0  1    0
            0  0    0        0  0    0        0  0    1        0  0    1

Figure 0.1: Common logic gates and their truth tables

p q ¬p ¬q ¬p ∧ ¬q p ∧q (p ∧ q) ∨ (¬p ∧ ¬q)
1 1 0 0 0 1 1
1 0 0 1 0 0 0
0 1 1 0 0 0 0
0 0 1 1 1 0 1

Figure 0.2: Truth tables for various compound expressions

Figure 0.3: Functionally equivalent circuits

The truth table covering the four combinations of truth values of p and q can be built from the truth
tables defining ∧, ∨, and ¬, as shown in Figure 0.2. The truth table shows that the assertion is indeed true in the
two cases where p and q reflect the same values, and false in the two cases where the values assigned to
p and q differ. When the statement that r and s always take on the same value is indeed true, we often
write r iff s (“r if and only if s”). The biconditional can also be denoted by r ⇔ s (“r is equivalent to s”).
Consider the statement (p ∧ q) ∨ (p ↓ q). Truth tables can be constructed to verify that (p ∧ q) ∨ (¬p ∧
¬q) and (p ∧ q) ∨ (p ↓ q) have identical truth tables, and thus (p ∧ q) ∨ (¬p ∧ ¬q) ⇔ (p ∧ q) ∨ (p ↓ q).
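This check can be mechanized. The following Python sketch (an added illustration; the function names are ours, not the book's notation) enumerates all four truth assignments and confirms the equivalence:

```python
# Brute-force equivalence check over all truth assignments to p and q.
from itertools import product

def nor(p, q):                       # p ↓ q
    return not (p or q)

def lhs(p, q):                       # (p ∧ q) ∨ (¬p ∧ ¬q)
    return (p and q) or (not p and not q)

def rhs(p, q):                       # (p ∧ q) ∨ (p ↓ q)
    return (p and q) or nor(p, q)

# Equivalent iff the two formulas agree on every assignment.
print(all(lhs(p, q) == rhs(p, q) for p, q in product([True, False], repeat=2)))
# -> True
```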

Example 0.1
Circuitry for realizing each of the above statements is displayed in Figure 0.3. Since the two statements
were equivalent, the circuits will exhibit the same behavior for all combinations of input signals p and
q. The second circuit would be less costly to build since it contains fewer components, and tangible
benefits therefore arise when equivalent but less cumbersome statements can be derived. Techniques
for minimizing such circuitry are presented in most discrete mathematics texts.
Example 0.1 shows that it is straightforward to implement statement formulas by circuitry. Recall that
the location of the 1 values in the truth table can be used to find the corresponding principal disjunctive
normal form (PDNF) for the expression represented by the truth table. For example, the truth table
corresponding to NAND has three rows with 1 values (p = 1, q = 0; p = 0, q = 1; p = 0, q = 0), leading to
three terms in the PDNF expression: (p ∧ ¬q) ∨ (¬p ∧ q) ∨ (¬p ∧ ¬q). This formula can be implemented
as the circuit illustrated in Figure 0.4 and thus a NAND gate can be replaced by this combination of three
ANDs and one OR gate. This circuit relies on the assurance that a quantity of interest (such as p) will
generally be available in both its negated and unnegated forms. Hence we can count on access to an
input line representing ¬p (rather than feeding the input for p into a NOT gate).
In a similar fashion, any statement formula can be represented as a group of AND gates feeding
a single OR gate. In larger truth tables, there may be many more 1 values, and hence more complex
statements may need many AND gates. Regardless of the statement complexity, however, circuits based

Figure 0.4: A circuit equivalent to a single NAND gate

(p ∨ q) ∧ r ⇔ (p ∧ r ) ∨ (q ∧ r ) (p ∧ q) ∨ r ⇔ (p ∨ r ) ∧ (q ∨ r ) (distributive laws)
(p ∨ q) ∨ r ⇔ p ∨ (q ∨ r ) (p ∧ q) ∧ r ⇔ p ∧ (q ∧ r ) (associative laws)
p ∨q ⇔ q ∨p p ∧q ⇔ q ∧p (commutative laws)
¬(p ∨ q) ⇔ ¬p ∧ ¬q ¬(p ∧ q) ⇔ ¬p ∨ ¬q (De Morgan’s laws)
(p ∨ q) ∧ p ⇔ p (p ∧ q) ∨ p ⇔ p (absorption laws)
p ∨ ¬p ⇔ True p ∧ ¬p ⇔ False (mutual exclusion)

Figure 0.5: Some useful equivalences and their duals

on the PDNF of an expression will allow for a fast response to changing input signals, since no signal
must propagate through more than two gates.
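As a concrete illustration of this construction, the short Python sketch below (the row encoding and function name are assumptions of ours) derives the PDNF of the NAND connective from its truth-table rows:

```python
# Build a PDNF: one AND term per truth-table row whose output value is 1.
nand_rows = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 0, 1)]  # (p, q, p↑q)

def pdnf(rows):
    terms = [f"({'p' if p else '¬p'} ∧ {'q' if q else '¬q'})"
             for p, q, value in rows if value == 1]
    return " ∨ ".join(terms)

print(pdnf(nand_rows))
# -> (p ∧ ¬q) ∨ (¬p ∧ q) ∨ (¬p ∧ ¬q)
```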
Other useful equivalences are given in Figure 0.5. Each rule has a dual, written on the same line.
Predicates are often used to make statements about certain objects, such as the numbers in the
set (Z) of integers. For example, Q might represent the property of being less than 5, in which case
Q(x) will represent the statement “x is less than 5.” Thus, Q(3) is true, while Q(7) is false. It is often
necessary to make global statements such as: All integers have the property P , which can be denoted by
(∀x ∈ Z)P (x). Note that the dummy variable x was used to state the concept in a convenient form; x is not
meant to represent a particular object, and the statement could be equivalently phrased as (∀i ∈ Z)P (i ).
For the predicate Q defined above, the statement (∀x ∈ Z)Q(x) is false, while when applied to more
restricted domains, (∀x ∈ {1, 2, 3})Q(x) is true, since it is in this case equivalent to Q(1) ∧ Q(2) ∧ Q(3), or
(1 < 5) ∧ (2 < 5) ∧ (3 < 5).
In a similar fashion, the statement that some integers have the property P will be denoted by (∃i ∈
Z)P (i ). For the predicate Q defined above, (∃i ∈ {4, 5, 6})Q(i ) is true, since it is equivalent to Q(4)∨Q(5)∨
Q(6), or (4 < 5) ∨ (5 < 5) ∨ (6 < 5). The statement (∃y ∈ {7, 8, 9})Q(y) is false.
Note that asserting that it is not the case that all objects have the property P is equivalent to saying
that there is at least one object that does not have the property P . In symbols, we have

¬(∀x ∈ Z)P (x) ⇔ (∃x ∈ Z)(¬P (x))

Similarly,
¬(∃x ∈ Z)P (x) ⇔ (∀x ∈ Z)(¬P (x))
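On finite domains these quantifiers correspond directly to Python's all and any. The following sketch (an added illustration, with Q as defined above) also spot-checks the two negation laws on a finite slice of Z:

```python
Q = lambda x: x < 5                   # Q(x): "x is less than 5"

print(all(Q(x) for x in {1, 2, 3}))   # (∀x ∈ {1,2,3})Q(x)  -> True
print(any(Q(i) for i in {4, 5, 6}))   # (∃i ∈ {4,5,6})Q(i)  -> True
print(any(Q(y) for y in {7, 8, 9}))   # (∃y ∈ {7,8,9})Q(y)  -> False

D = range(-10, 10)                    # a finite stand-in for Z
print((not all(Q(x) for x in D)) == any(not Q(x) for x in D))  # -> True
print((not any(Q(x) for x in D)) == all(not Q(x) for x in D))  # -> True
```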

Given two statements A and B , if B is true whenever A is true, we will say that A implies B , and write
A ⇒ B . For example, the truth tables show that p ∧ q ⇒ p ∨ q, since for the case where p ∧ q is true
(p = 1, q = 1), p ∨ q is true, also. In the cases where p ∧ q is false, the value of p ∨ q is immaterial.

A basic knowledge of set theory is assumed. Some standard special symbols will be repeatedly used
to designate common sets.

Definition 0.1 The set of natural numbers is given by N = {0, 1, 2, 3, 4, . . .}.


The set of integers is given by Z = {. . . , −2, −1, 0, 1, 2, . . .}.
The set of rational numbers is given by Q = {a/b | a ∈ Z ∧ b ∈ Z ∧ b ≠ 0}.
The set of real numbers (points on the number line) will be denoted by R.

The following concepts and notation will be used frequently throughout the text.

Definition 0.2 Let A and B be sets. A is a subset of B if every element of A also belongs to B ; that is,
A ⊆ B ⇔ (∀x)(x ∈ A ⇒ x ∈ B ).

Definition 0.3 Two sets A and B are said to be equal if they contain exactly the same elements; that is,
A = B ⇔ (∀x)(x ∈ A ⇔ x ∈ B ).

Thus, two sets A and B are equal iff A ⊆ B and B ⊆ A. The symbol ⊂ will be used to denote a proper
subset: A ⊂ B iff A ⊆ B and A ≠ B.

Definition 0.4 For sets A and B , the cross product of A with B , is the set of all ordered pairs from A and
B ; that is, A × B = {〈a, b〉 | a ∈ A ∧ b ∈ B }.

0.2 Relations
Relations are used to describe relationships between members of sets of objects. Formally, a relation is
just a subset of a cross product of two sets.

Definition 0.5 Let X and Y be sets. A relation R from X to Y is simply a subset of X × Y . If 〈a, b〉 ∈ R, we
write aRb. If 〈a, b〉 ∉ R, we write a R̸ b. If X = Y , we say R is a relation in X .

Example 0.2
Let X = {1, 2, 3}. The familiar relation < (less than) would then consist of the following ordered pairs:
<: {〈1, 2〉, 〈1, 3〉, 〈2, 3〉}, by which we mean to indicate that 1 < 2, 1 < 3, and 2 < 3. 〈3, 3〉 ∉ < since 3 ≮ 3.
Some relations have special properties. For example, the relation “less than” is transitive, by which
we mean that for any numbers x, y, and z, if x < y and y < z, then x < z. Definition 0.6 describes an
important class of relations that have some familiar properties.

Definition 0.6 A relation is reflexive iff (∀x)(xR x).


A relation is symmetric iff (∀x)(∀y)(xR y ⇒ yR x).
A relation is transitive iff (∀x)(∀y)(∀z)((xR y ∧ yR z) ⇒ xR z).
An equivalence relation is a relation that is reflexive, symmetric, and transitive.

Example 0.3
< is not an equivalence relation; while it is transitive, it is not reflexive since 3 ≮ 3. (It is also not symmetric,
since 2 < 3, but 3 ≮ 2.)

Example 0.4
Let X = N. The familiar relation = (equality) is an equivalence relation.

=: {〈0, 0〉, 〈1, 1〉, 〈2, 2〉, 〈3, 3〉, 〈4, 4〉, . . .},

and it is clear that (∀x)(∀y)(x = y ⇒ y = x). The equality relation is therefore symmetric, and it is likewise
obvious that = is also reflexive and transitive.

Definition 0.7 Let R be an equivalence relation in X , and let h ∈ X . Then [h]R refers to the equivalence
class consisting of all entities that are related to h by the equivalence relation R; that is, [h]R = {y | y R h}.

Example 0.5
The equivalence classes for = are singleton sets: [1]= = {1}, [5]= = {5}, and so on.

Example 0.6
Let X = Z × (Z − {0}), and define the relation R in X by

〈u, v〉 R 〈w, x〉 iff u · x = v · w

If 〈x, y〉 is viewed as the fraction x/y, then R is the relation that identifies equivalent fractions: 2/3 R 14/21,
since 2 · 21 = 3 · 14. In this sense, R can be viewed as the equality operator on the set of rational numbers Q.
Note that in this context the equivalence class [2/8]R represents the set of all “names” for the point
one-fourth of the way between 0 and 1; that is,

[2/8]R = {. . . , −3/−12, −2/−8, −1/−4, 1/4, 2/8, 3/12, 4/16, 5/20, . . .}

There are therefore many other ways of designating this same set; for example,

[1/4]R = {. . . , −3/−12, −2/−8, −1/−4, 1/4, 2/8, 3/12, 4/16, 5/20, . . .}

Example 0.7
Let X = N and choose an n ∈ N. Define R n by

xR n y iff (∃i ∈ Z)(x − y = i · n)

That is, two numbers are related if their difference is a multiple of n. Equivalently, x and y must have
the same remainder upon dividing each of them by n if we are to have xR n y . R n can be shown to be an
equivalence relation for each natural number n. The equivalence classes of R 2 , for example, are the two
familiar sets, the even numbers and the odd numbers. The equivalence classes for R 3 are

[0]R3 = {0, 3, 6, 9, 12, 15, . . .}


[1]R3 = {1, 4, 7, 10, 13, . . .}
[2]R3 = {2, 5, 8, 11, 14, . . .}

R n is often called congruence modulo n, and xR n y is commonly denoted by x ≡ y (mod n) or x ≡n y.
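The classes of R n are easy to generate mechanically by grouping numbers according to their remainders. Here is a Python sketch (an added illustration with names of our choosing; N is truncated to an initial segment) that reproduces the classes of R 3 listed above:

```python
def classes_mod(n, limit=16):
    """Group 0..limit-1 by remainder mod n: one class per remainder."""
    classes = {}
    for x in range(limit):
        classes.setdefault(x % n, []).append(x)
    return classes

for rep, members in sorted(classes_mod(3).items()):
    print(f"[{rep}]R3 = {members} ...")
# [0]R3 = [0, 3, 6, 9, 12, 15] ...
# [1]R3 = [1, 4, 7, 10, 13] ...
# [2]R3 = [2, 5, 8, 11, 14] ...
```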
If R is an equivalence relation in X , then every element of X belongs to exactly one equivalence class
of R. X is therefore comprised of the union of the equivalence classes of R, and in this sense R partitions
the set X into disjoint subsets. Conversely, a partition of X defines an equivalence relation in X ; the sets
of the partition can be thought of as the equivalence classes of the resulting relation.

Definition 0.8 Given a set X and sets A 1 , A 2 , . . . , A n , P = {A 1 , A 2 , . . . , A n } is a partition of X if the sets in P


are all subsets of X , they cover X , and are pairwise disjoint. That is, the following three conditions are
satisfied:
(∀i ∈ {1, 2, . . . , n})(A i ⊆ X )
(∀x ∈ X )(∃i ∈ {1, 2, . . . , n} ∋ x ∈ A i )
(∀i , j ∈ {1, 2, . . . , n})(i ≠ j ⇒ A i ∩ A j = ∅)

Definition 0.9 Given a set X and a partition P = {A 1 , A 2 , . . . , A n } of X , the relation R(P ) in X induced by P
is given by
(∀x ∈ X )(∀y ∈ X )(x R(P ) y ⇔ (∃i ∈ {1, 2, . . . , n} ∋ x ∈ A i ∧ y ∈ A i ))

R(P ) thus relates elements that belong to the same subset of P .

Example 0.8

Let X = {1, 2, 3, 4, 5} and consider the relation Q = R(S) induced by the partition S = {{1, 2}, {3, 5}, {4}}.
Since 1 and 2 are in the same set, they should be related by Q, while 1 Q̸ 4 because 1 and 4 belong to
different sets of the partition. Q can be described by

Q = {〈1, 1〉, 〈1, 2〉, 〈2, 1〉, 〈2, 2〉, 〈3, 3〉, 〈3, 5〉, 〈4, 4〉, 〈5, 3〉, 〈5, 5〉}

It is straightforward to check that Q satisfies the three properties needed to qualify as an equivalence
relation, and the equivalence classes of Q are

[1]Q = {1, 2}
[2]Q = {1, 2}
[3]Q = {3, 5}
[4]Q = {4}
[5]Q = {3, 5}

The set of distinct equivalence classes of Q can be used to partition X ; note that these three classes
comprise S. In a similar manner, the three distinct equivalence classes of R 3 in Example 0.7 form a
partition of N.
A “finer” partition of X can be obtained by breaking up the equivalence classes of Q into smaller (and
hence more numerous) sets. The resulting relation is called a refinement of Q.

Definition 0.10 Given two equivalence relations R and Q in a set X , R is a refinement of Q iff R ⊆ Q; that
is, (∀x ∈ X )(∀y ∈ X )(〈x, y〉 ∈ R ⇒ 〈x, y〉 ∈ Q).

Example 0.9
Consider Q = {〈1, 1〉, 〈1, 2〉, 〈2, 1〉, 〈2, 2〉, 〈3, 3〉, 〈3, 5〉, 〈4, 4〉, 〈5, 3〉, 〈5, 5〉} and S = {〈1, 1〉, 〈2, 2〉, 〈3, 3〉,
〈3, 5〉, 〈4, 4〉, 〈5, 3〉, 〈5, 5〉}. S is clearly a subset of Q, and hence S refines Q. Note that the partition induced
by S, {{1}, {2}, {3, 5}, {4}}, indeed splits up the partition induced by Q, which was {{1, 2}, {3, 5}, {4}}. While it
may at first seem strange, the fact that S contained fewer ordered pairs than Q guarantees that S will
yield more equivalence classes than Q.
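Both the induced relation of Definition 0.9 and the refinement test of Definition 0.10 can be checked by machine for small sets. In the Python sketch below (an added illustration; the helper name is ours), S and Q are the relations from this example:

```python
def induced_relation(partition):
    """R(P): relate x and y exactly when they share a block of the partition."""
    return {(x, y) for block in partition for x in block for y in block}

Q = induced_relation([{1, 2}, {3, 5}, {4}])      # partition from Example 0.8
S = induced_relation([{1}, {2}, {3, 5}, {4}])    # the finer partition above

print(sorted(Q))
# [(1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (3, 5), (4, 4), (5, 3), (5, 5)]
print(S <= Q)                                    # S ⊆ Q, so S refines Q -> True
```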

0.3 Functions
A function f is a special type of relation in which each first coordinate is associated with one and only one
second coordinate, in which case we can use functional notation f (x) to indicate the unique element f
associates with a given first coordinate x. In the previous section we concentrated on relations in X , that
is, subsets of X × X . The set of first coordinates of a function f (the domain X ) is often different from the
set of possible second coordinates (the codomain Y ), and hence f will be a subset of X × Y .

Definition 0.11 A function f: X → Y is a subset of X × Y for which

1. (∀x ∈ X )(∃y ∈ Y ∋ x f y).

2. (∀x ∈ X )((x f y 1 ∧ x f y 2 ) ⇒ y 1 = y 2 ).

When a pair of elements are related by a function, we will write f (a) = b instead of a f b or 〈a, b〉 ∈ f .
The criteria for being a function could then be rephrased as (∀x ∈ X )(∃y ∈ Y ∋ f (x) = y), and (∀x 1 ∈
X )(∀x 2 ∈ X )(x 1 = x 2 ⇒ f (x 1 ) = f (x 2 )).

Example 0.10
Let n be a positive integer. Define f n : N → N by f n ( j ) = the smallest natural number i for which j ≡ i
(mod n). f 3 , for example, is a function and is represented by the ordered pairs f 3 : {〈0, 0〉, 〈1, 1〉, 〈2, 2〉, 〈3, 0〉,
〈4, 1〉, . . .}. This implies that f 3 (0) = 0, f 3 (1) = 1, f 3 (2) = 2, f 3 (3) = 0, and so on.
Note that f 3 is a subset of the relation R 3 given in Example 0.7. If R 3 were presented as a function,
it would not be well defined; that is, R 3 does not satisfy Definition 0.11. For example, 2R 3 5 and 2R 3 8,
but 5 ≠ 8, and so R 3 (2) is not a meaningful expression, since there is no unique object that R 3 associates
with 2. In this case, R 3 violated Definition 0.11 by associating more than one object with a given first
coordinate; in general, a proposed relation may also fail to be well defined by associating no objects with
a potential first coordinate.

Example 0.11
Consider the “function” g : Q → N defined by g (m/n) = m. This apparently straightforward definition is
fundamentally flawed. According to the formula, g (2/8) = 2, g (7/9) = 7, g (5/10) = 5, and so forth. However,
2/8 = 5/20, but g (2/8) = 2 ≠ 5 = g (5/20), and Definition 0.11 is again violated; g (0.25) is not a well-defined
quantity, and thus the “function” g is not well defined. Had g truly been a function, it would have passed
the test: if x = y, then g (x) = g (y).
The problem with this seemingly innocent definition is that the domain element 0.25 is actually an
equivalence class of fractions (recall Example 0.6), and the definition of g was based on just one repre-
sentative of that class. We observed that two representatives (2/8 and 5/20) of the same class gave conflicting
answers (2 and 5) for the value that g associated with their class (0.25). While it is possible to define
functions on a set of equivalence classes in a consistent manner, it will always be important to verify that
such functions are single valued.
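The single-valued test is mechanical: evaluate the proposed rule on two representatives of the same class and compare the results. A Python sketch of the failure above (an added illustration, using fractions.Fraction to stand in for the equivalence classes):

```python
from fractions import Fraction

def g(m, n):
    return m                            # the flawed rule g(m/n) = m

a, b = (2, 8), (5, 20)
print(Fraction(*a) == Fraction(*b))     # 2/8 = 5/20, same class     -> True
print(g(*a) == g(*b))                   # but g yields 2 versus 5    -> False,
                                        # so g is not well defined
```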
Selection criteria, which determine whether a candidate does or does not belong to a given set, are
special types of functions.

Definition 0.12 Given a set A, the characteristic function χ A associated with A is defined by

χ A (x) = 1 if x ∈ A and χ A (x) = 0 if x ∉ A

Example 0.12
The characteristic function for the set of odd numbers is the function f 2 given in Example 0.10.
To say that a set is well defined essentially means that the characteristic function associated with that
set is a well-defined function. A set of equivalence classes can be ill defined if the definition is based on
the representatives of those equivalence classes.

Example 0.13
Consider the “set” of fractions that have odd numerators, whose characteristic “function” is defined by:

χB (m/n) = 1 if m is odd

and

χB (m/n) = 0 if m is even

This characteristic function suffers from flaws similar to those found in the function g in Example
0.11. 1/4 = 2/8, and yet χB (1/4) = 1 while χB (2/8) = 0, which implies that the fraction 1/4 belongs to B , while 2/8 is
not an element of B . Due to this ambiguous definition of set membership, B is not a well-defined set. B
failed to pass the test: if x = y, then (x ∈ B iff y ∈ B ).
The definition of a relation requires the specification of the domain, codomain, and the ordered
pairs comprising the relation. For relations that are functions, every domain element must occur as a
first coordinate. However, the set of elements that occur as second coordinates need not include all the
codomain (as was the case in the function f n in Example 0.10).

Definition 0.13 The range of a function f : X → Y is given by

{y ∈ Y | ∃x ∈ X ∋ f (x) = y}

Conditions similar to those imposed on the behavior of first coordinates of a function may also be
placed on second coordinates, yielding specialized types of functions. Functions for which the range
encompasses all the codomain, for example, are called surjective.

Definition 0.14 A function f : X → Y is onto or surjective iff

(∀y ∈ Y )(∃x ∈ X ∋ f (x) = y); that is,

a set of ordered pairs representing an onto function must have at least one first coordinate associated with
any given second coordinate.

Example 0.14
The function g : {1, 2, 3} → {a, b} defined by g (1) = a, g (2) = b, and g (3) = a is onto since both codomain
elements are part of the range of g . However, the function h : {1, 2, 3} → {a, b, c} defined by h(1) = a,
h(2) = b, and h(3) = a is not onto since no domain element maps to c.
The function f : N → N defined by f (i ) = i + 1 (∀i = 0, 1, 2, . . .) is not onto since there is no element x
for which f (x) = 0.
Definition 0.15 A function f : X → Y is one to one or injective iff
(∀x 1 ∈ X )(∀x 2 ∈ X )( f (x 1 ) = f (x 2 ) ⇒ x 1 = x 2 ); that is,
an injective function must not have more than one first coordinate associated with any given second
coordinate.

Example 0.15
The function f : N → N defined by f (i ) = i + 1 (∀i = 0, 1, 2, . . .) is clearly injective since if f (i ) = f ( j ) then
i + 1 = j + 1, and so i must equal j .
The function g : {1, 2, 3} → {a, b} defined by g (1) = a, g (2) = b, and g (3) = a is not one to one since
g (1) = g (3), but 1 ≠ 3.
Definition 0.16 A function is a bijection iff it is one to one and onto (injective and surjective); that is, it
must satisfy
1. (∀x 1 ∈ X )(∀x 2 ∈ X )( f (x 1 ) = f (x 2 ) ⇒ x 1 = x 2 ).

2. (∀y ∈ Y )(∃x ∈ X ∋ f (x) = y).


A bijective function must therefore have exactly one first coordinate associated with any given second
coordinate.

Example 0.16
The function f : N → N defined by f (i ) = i + 1 (∀i = 0, 1, 2, . . .) is injective but not surjective, so it is not a
bijection. However, the function b: Z → Z defined by b(i ) = i + 1 (∀i = . . . , −2, −1, 0, 1, 2, . . .) is a bijection.
Note that while the rule for b remains the same as for f , both the domain and range have been expanded,
and many more ordered pairs have been added to form the function b.
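For functions over finite sets, the onto and one-to-one conditions reduce to simple checks on the collection of second coordinates. A Python sketch (an added illustration; functions are encoded as dicts and the helper names are ours):

```python
def is_onto(f, codomain):
    return set(f.values()) == set(codomain)   # every codomain element is hit

def is_one_to_one(f):
    return len(set(f.values())) == len(f)     # no second coordinate repeats

g = {1: 'a', 2: 'b', 3: 'a'}                  # g from Examples 0.14 and 0.15
print(is_onto(g, {'a', 'b'}), is_one_to_one(g))      # True False

b = {i: i + 1 for i in range(-3, 3)}          # a finite slice of b(i) = i + 1
print(is_onto(b, range(-2, 4)), is_one_to_one(b))    # True True: a bijection
```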
It is often appropriate to take the results produced by one function and apply the rule specified by
a second function. For example, we may have a list associating students with their height in inches
(that is, we have a function relating names with numbers). The conversion rule for changing inches into
centimeters is also a function (associating any given number of inches with the corresponding length in
centimeters), which can be applied to the heights given in the student list to produce a new list matching
student names with their height in centimeters. This new list is referred to as the composition of the
original two functions.
Definition 0.17 The composition of two functions f : X → Y and g : Y → Z is given by
g ◦ f = {〈x, z〉 | ∃y ∈ Y ∋ 〈x, y〉 ∈ f and 〈y, z〉 ∈ g }
Note that the composition is not defined unless the codomain of the first function matches the do-
main of the second function. In functional notation, g ◦ f = {〈x, z〉 | ∃y ∈ Y ∋ f (x) = y and g (y) = z}, and
therefore when g ◦ f is defined, it can be described by the rule g ◦ f (x) = g ( f (x)).

Example 0.17
Consider the functions f 3 from Example 0.10 and f from Example 0.14, where f 3 : N → N was defined
by f 3 ( j ) = the smallest natural number i for which j ≡ i (mod 3), and the function f : N → N is defined by
f (i ) = i + 1. f ◦ f 3 consists of the ordered pairs {〈0, 1〉, 〈1, 2〉, 〈2, 3〉, 〈3, 1〉, 〈4, 2〉, 〈5, 3〉, . . .} and is represented
by the rule f ◦ f 3 ( j ) = f 3 ( j ) + 1, which happens to be the smallest positive number that is congruent to j +
1 mod 3. Note that f 3 ◦ f ( j ) = f 3 ( j + 1), which happens to be the smallest natural number that is congru-
ent to j + 1 mod 3. This represents a different set of ordered pairs {〈0, 1〉, 〈1, 2〉, 〈2, 0〉, 〈3, 1〉, 〈4, 2〉, 〈5, 0〉, . . .}.
In most cases, f ◦ g ≠ g ◦ f .
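The order-sensitivity of composition is easy to exhibit numerically. The Python sketch below (an added illustration, with f 3 rendered as the remainder operator) reproduces the two sets of ordered pairs from this example:

```python
f3 = lambda j: j % 3          # smallest natural number congruent to j mod 3
f = lambda i: i + 1

print([(j, f(f3(j))) for j in range(6)])   # f ∘ f3
# [(0, 1), (1, 2), (2, 3), (3, 1), (4, 2), (5, 3)]
print([(j, f3(f(j))) for j in range(6)])   # f3 ∘ f
# [(0, 1), (1, 2), (2, 0), (3, 1), (4, 2), (5, 0)]
```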

Theorem 0.1 Let the functions f : X → Y and g : Y → Z be onto. Then g ◦ f is onto.


Proof. See the exercises.

Theorem 0.2 Let the functions f : X → Y and g : Y → Z be one to one. Then g ◦ f is one to one.
Proof. See the exercises.

Definition 0.18 The converse of a relation R, written ∼R, is defined by

∼R = {〈y, x〉 | 〈x, y〉 ∈ R}

The converse of a function f is likewise

∼f = {〈y, x〉 | 〈x, y〉 ∈ f }

If ∼f happens to be a function, it is called the inverse of f and is denoted by f −1 .

When the inverse exists, it is appropriate to use functional notation f −1 also, and we therefore have,
for any elements a and b, f −1 (b) = a iff f (a) = b. Note that if f : X → Y then f −1 : Y → X .

Example 0.18
Consider the ordered pairs for the relation <: {〈1, 2〉, 〈1, 3〉, 〈2, 3〉}. The converse is then ∼<: {〈2, 1〉, 〈3, 1〉, 〈3, 2〉}.
Thus, the converse of “less than” is the relation “greater than.”
The function b:Z → Z defined by b(i ) = i + 1(∀i = . . . , −2, −1, 0, 1, 2, . . .) has the inverse b −1 : Z → Z
defined by b −1 (i ) = i − 1(∀i = . . . , −2, −1, 0, 1, 2, . . .). The inverse of the function that increments integers
by 1 is the function that decrements integers by the same amount.
The function f : Z → Z defined by f (i ) = i^2 (∀i = . . . , −2, −1, 0, 1, 2, . . .) has a converse that is not a
function over the given domain and codomain; the inverse notation is inappropriate, since f −1 (3) is not
defined, nor is f −1 (−4).
Not surprisingly, if the converse of f is to be a function, the codomain of f (which will be the new
domain of f −1 ) must satisfy conditions similar to those imposed on the domain of f . In particular:

Theorem 0.3 Let f : X → Y be a function. The converse of f is a function iff f is a bijection.


Proof. See the exercises.

If f is a bijection, f −1 must exist and will also be a bijection. In fact, the compositions f ◦ f −1 and
f −1 ◦ f are the identity functions on the domain and codomain, respectively (see the exercises).

0.4 Cardinality and Induction
The size of various sets will frequently be of interest in the topics covered in this text, and it will occasion-
ally be necessary to consider the set of all subsets of a given set.

Definition 0.19 Given a set A, the power set of A, denoted by ℘(A) or 2^A , is

℘(A) = {X | X ⊆ A}

Example 0.19
℘({a, b, c}) = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}
and
℘({ }) = {∅}.
Note that {∅} ≠ ∅.

Definition 0.20 Two sets X and Y are equipotent if there exists a bijection f : X → Y , and we will write
‖X ‖ = ‖Y ‖. ‖X ‖ denotes the cardinality of X , that is, the number of elements in X .

That is, sets with the same cardinality or “size” are equipotent. The equipotent relation is reflexive,
symmetric, and transitive and is therefore an equivalence relation.

Example 0.20
The function g : {a, b, c} → {x, y, z} defined by g (a) = z, g (b) = y, and g (c) = x is a bijection, and thus
‖{a, b, c}‖ = ‖{x, y, z}‖. The equivalence class consisting of all sets that are equipotent to {a, b, c} is gener-
ally associated with the cardinal number 3. Thus, ‖{a, b, c}‖ = 3; ‖{ }‖ = 0. {a, b, c} is not equipotent to { },
and hence 3 ≠ 0.
The subset relation allows the sizes of sets to be ordered: ‖A‖ ≤ ‖B ‖ iff (∃C )(C ⊆ B ∧ ‖A‖ = ‖C ‖). We
will write ‖A‖ < ‖B ‖ iff (‖A‖ ≤ ‖B ‖ and ‖A‖ ≠ ‖B ‖). The observations about {a, b, c} and { } imply that
0 < 3.
For N = {0, 1, 2, 3, 4, 5, 6, . . .} and E = {0, 2, 4, 6, . . .}, the function f : N → E, defined by f (x) = 2x, is a
bijection. The set of natural numbers N is countably infinite, and its size is often denoted by ℵ0 = ‖N‖.
The doubling function f shows that ‖N‖ = ‖E‖. Similarly, it can be shown that Z and N × N are also
countably infinite (see the exercises). A set that is equipotent to one of its proper subsets is called an
infinite set. Since ‖N‖ = ‖E‖ and yet E ⊂ N, we know that N must be infinite. No such correspondence
between {a, b, c} and any of its proper subsets is possible, so {a, b, c} is a finite set. 3 is therefore a finite
cardinal number, while ℵ0 represents an infinite cardinal number.
Theorem 0.4 compares the size of a set A with the number of subsets of A and shows that ‖A‖ <
‖℘(A)‖. For the sets in Example 0.19, we see that 3 < 8 and 0 < 1, which is not unexpected. It is perhaps
surprising to find that the theorem will also apply to infinite sets, for example, ‖N‖ < ‖℘(N)‖. This means
that there are cardinal numbers larger than ℵ0 ; there are infinite sets that are not countably infinite.
Indeed, the next theorem implies that there is an unending progression of infinite cardinal numbers.

Theorem 0.4 Let A be any set. Then ‖A‖ < ‖℘(A)‖.


Proof. There is a bijection between A and the set of all singleton subsets of A, as shown by the function
s: A → {{x} | x ∈ A} defined by s(z) = {z} for each z ∈ A. Since {{x} | x ∈ A} ⊆ ℘(A), we have ‖A‖ ≤ ‖℘(A)‖.
It remains to show that ‖A‖ ≠ ‖℘(A)‖. By definition of cardinality, we must show that there cannot exist a
bijection between A and ℘(A). The following proof by contradiction will show this.
Assume f : A → ℘(A) is a function; we will demonstrate that there must exist a set in ℘(A) that is not in
the range of f , and hence f cannot be onto. Consider an element z of A and the set f (z) to which it maps.
f (z) is a subset of A, and hence z may or may not belong to f (z). Define B to be the set {y ∈ A | y ∉ f (y)}. B
is then the set of all elements of A that do not appear in the set corresponding to their image under f . It is
impossible for B to be in the range of f , for if it were then there would be an element of A that maps to this
subset: assume w ∈ A and f (w) = B . Since w is an element of A, it might belong to B , which is a subset
of A. If w ∈ B , then w ∈ f (w), since f (w) = B ; but the elements for which y ∈ f (y) were exactly the ones
omitted from B , and thus we would have w ∉ B , which is a contradiction. Our speculation that w might
belong to B is therefore incorrect. The only other option is that w does not belong to B . But if w ∉ B = f (w),
then w is one of the elements that are supposed to be in B and we are again faced with the impossibility
that w ∉ B and w ∈ B . In all cases, we reach a contradiction if we assume that there exists an element w
for which f (w) = B . Thus, B was a member of the codomain that is not in the range of f , and f is therefore
not a bijection.
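The diagonal set B from this proof can be computed explicitly for any attempted map on a small finite set. In the Python sketch below (an added illustration; the choice of f is arbitrary), B never appears in the range:

```python
A = {0, 1, 2}
f = {0: {0, 1}, 1: set(), 2: {0, 2}}    # any attempted f: A -> P(A)

B = {y for y in A if y not in f[y]}     # B = {y ∈ A | y ∉ f(y)}
print(B)                                # {1}
print(any(f[x] == B for x in A))        # False: B is not in the range of f
```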

Sets that are finite or are countably infinite are called countable or denumerable because their ele-
ments can be arranged one after the other (enumerated). We will often need to prove that a given state-
ment is true in an infinite variety of cases that can be enumerated by the natural numbers 0, 1, 2, . . .. The
assertion that the sum of the first n positive numbers can be predicted by multiplying n by the number
one larger than n and dividing the result by 2 seems to be true for various test values of n:
1 + 2 + 3 = 3(3+1)/2
1 + 2 + 3 + 4 + 5 = 5(5+1)/2

and so on. We would like to show that the assertion is true for all values of n = 1, 2, 3, . . ., but we clearly
could never check the arithmetic individually for an infinite number of cases. The assertion, which varies
according to the particular number n we choose, can be represented by the statement

P (n): 1 + 2 + 3 + · · · + (n − 2) + (n − 1) + n adds up to n(n + 1)/2.

Note that P (n) is not a number; it is the assertion that two numbers are the same and therefore will only
take on the values True and False. We would like to show that P (n) is true for each positive integer n; that
is, (∀n)P (n). Notice that if you were to attempt to check out whether P (101) was true your work would be
considerably simplified if you already knew how the first 100 numbers added up. If the first 100 summed
to 5050, it is clear that 1 + 2 + · · · + 99 + 100 + 101 = (1 + 2 + · · · + 99 + 100) + 101 = 5050 + 101 = 5151; the
hard part of the calculation can be done without doing arithmetic with 101 separate numbers. Checking
that (101 + 1) · 101/2 agrees with 5151 shows that P (101) is indeed true [that is, as long as we are sure that our
calculations in verifying P (100) are correct]. Essentially, the same technique could have been used to
show that P (6) followed from P (5). This trick of using the results of previous cases to help verify further
cases is reflected in the principle of mathematical induction.

Theorem 0.5 Let P (n) be a statement for each natural number n ∈ N. From the two hypotheses

i. P (0)

ii. (∀m ∈ N)(P (m) ⇒ P (m + 1))

we can conclude (∀n ∈ N)P (n).

The fundamental soundness of the principle is obvious in light of the following analogy: Assume you
can reach the basement of some building (hypothesis i). If you were assured that from any floor m you
could reach the next higher floor (hypothesis ii), you would then be assured that you could reach any
floor you wished ((∀n ∈ N)P (n)).
Similar statements can be made from other starting points; for example, beginning with P (4) and
(∀m ≥ 4)(P (m) ⇒ P (m + 1)), we can derive the conclusion (∀n ≥ 4)P (n); had we started on the fourth
floor of the building, we could reach any of the higher floors.

Example 0.21
Consider the statement discussed above, where P (n) was the assertion that 1 + 2 + 3 + · · · + (n − 2) + (n − 1) + n
adds up to n(n + 1)/2. We will begin with P (1) (the basis step) and note that 1 = (1 + 1) · 1/2, so P (1) is indeed true.
For the inductive step, let m be an arbitrary (but fixed) positive integer, and assume P (m) is true; that is,
1 + 2 + 3 + · · · + (m − 2) + (m − 1) + m adds up to m(m + 1)/2. We need to show P (m + 1): 1 + 2 + 3 + · · · + ((m +
1) − 2) + ((m + 1) − 1) + (m + 1) adds up to ((m + 1) + 1)(m + 1)/2. As in the case of proceeding from 100 to 101, we
will use the fact that the first m integers add up correctly (the induction assumption) to see how the first
m + 1 integers add up. We have:

1 + 2 + 3 + · · · + ((m + 1) − 2) + ((m + 1) − 1) + (m + 1)
= (1 + 2 + 3 + · · · + (m − 1) + m) + (m + 1)
= m(m + 1)/2 + (m + 1)
= m(m + 1)/2 + 2(m + 1)/2
= (m(m + 1) + 2(m + 1))/2
= (m + 1)(m + 2)/2
= ((m + 1) + 1)(m + 1)/2

P (m+1) is therefore true, and P (m+1) indeed follows from P (m). Since m was arbitrary, (∀m)(P (m) ⇒
P (m + 1)) and, by induction, (∀n ≥ 1)P (n). The formula is therefore true for every positive integer n. It
is interesting to note that, with the usual convention of defining the sum of no integers to be zero, the
formula also holds for n = 0, and P (0) could have been used as the basis step to prove (∀n ∈ N)P (n).
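A finite computation is no substitute for the induction proof, but a quick mechanical check (a Python sketch added here for illustration) confirms that the closed form agrees with direct summation on small cases:

```python
for n in range(0, 8):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2   # P(n) holds for this n
print("closed form agrees with direct summation for n = 0..7")
```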

Example 0.22
Consider the statement

Any statement formula using the n variables p 1 , p 2 , . . . , p n has an equivalent expression that
contains less than n · 2^n operators.

This can be proved by induction on the statement

P (n) : Any statement formula using n or fewer variables has an equivalent expression that
contains less than n · 2^n operators.

Basis step: A statement formula in one variable must either be p, ¬p, True, or False, each of which requires
at most one operator, and since 1 < 1 · 2^1 , P (1) is true.
Inductive step: Assume P (m) is true; we need to prove that P (m + 1) is true, which is to say that
we need to ensure that the statement holds not just for formulas with m or fewer variables, but also for

formulas with m + 1 variables. Thus, choose an expression S containing the variables p 1 , p 2 , . . . , p m , p m+1 .
Consider the principal disjunctive normal form (PDNF) of S. This expression is equivalent to S and has
terms that can be separated into two categories: (1) those that contain the term p m+1 , and (2) those
that contain the term ¬p m+1 . While the PDNF may very well contain more than the desired number of
terms, the distributive law can be used to factor p m+1 out of all the terms in (1), leaving an expression of
the form C ∧ p m+1 , where C is a formula containing only the terms p 1 , p 2 , . . . , p m . Similarly, ¬p m+1 can
be factored out of all the terms in (2), leaving an expression of the form D ∧ ¬p m+1 , where D is also a
formula containing only the terms p 1 , p 2 , . . . , p m .
S can therefore be written as (C ∧ p m+1 ) ∨ (D ∧ ¬p m+1 ), which contains the four operators ∧, ∨, ∧,
and ¬ and the operators that comprise the formulas for C and D. However, since both C and D only
contain the m variables p 1 , p 2 , . . . , p m , the induction assumption ensures that they each have equivalent
representations using no more than m · 2^m operators. S can therefore be written in a form containing at
most 4 + m · 2^m + m · 2^m operators, which can be shown to be less than (m + 1) · 2^(m+1) for all positive numbers
m. Since S was an arbitrary expression with m + 1 variables, we have shown that any statement formula
using exactly m + 1 variables has an equivalent expression that contains no more than (m + 1) · 2^(m+1)
operators.
Since P (m) was assumed true, we likewise know that any statement formula using m or fewer vari-
ables also has an equivalent expression that contains no more than m · 2^m operators. P (m + 1) is therefore
true, and P (m + 1) indeed follows from P (m). Since m was an arbitrary positive integer, (∀m ≥ 1)(P (m) ⇒
P (m + 1)) and by induction (∀n ≥ 1)P (n). The result is therefore true for every positive integer n.

0.5 Recursion
Since this text will be dealing with devices that repeatedly perform certain operations, it is important to
understand the recursive definition of functions and how to effectively investigate the properties of such
functions. Recall that the factorial function ( f (n) = n!) is defined to be the product of the first n positive integers.
Thus,
f (1) = 1
f (2) = 1 · 2 = 2
f (3) = 1 · 2 · 3 = 6
f (4) = 1 · 2 · 3 · 4 = 24
and so on. Note that individual definitions get longer as n increases. If we adopt the convention that
f (0) = 1, the factorial function can be recursively defined in terms of other values produced by the func-
tion.

Definition 0.21 For x ∈ N, define


f (x) = 1, if x = 0
f (x) = x · f (x − 1), if x > 0

This definition implies that f (3) = 3 · f (2) = 3 · 2 · f (1) = 3 · 2 · 1 · f (0) = 3 · 2 · 1 · 1 = 6.
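The definition transcribes directly into a recursive Python function (an added sketch mirroring the two clauses of Definition 0.21):

```python
def f(x: int) -> int:
    if x == 0:
        return 1              # basis: f(0) = 1
    return x * f(x - 1)       # recursion: f(x) = x · f(x − 1)

print([f(n) for n in range(5)])   # [1, 1, 2, 6, 24]
```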

0.6 Backus-Naur Form


The syntax of programming languages is often illustrated with syntax diagrams or described in Backus-
Naur Form (BNF) notation.

Figure 0.6: Syntax diagrams for the components of integer constants

Example 0.23
The constraints for integer constants, which may begin with a sign and must consist of one or more
digits, are succinctly described by the following productions (replacement rules):

<sign> ::= + | −
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<natural> ::= <digit> | <digit><natural>
<integer> ::= <natural> | <sign><natural>

The symbol | represents “or,” and the rule

<sign> ::= + | −

should be interpreted to mean that the token <sign> can be replaced by either the symbol + or the symbol
−. A typical integer constant is therefore +12, since it can be derived by applying the above rules in the
following fashion:

<integer> → <sign><natural>
<sign><natural> → +<natural>
+<natural> → +<digit><natural>
+<digit><natural> → +1<natural>
+1<natural> → +1<digit>
+1<digit> → +12

Figure 0.7: A syntax diagram for integer constants

Syntax diagrams for each of the four productions are shown in Figure 0.6. These can be combined to
form a diagram that does not involve the intermediate tokens <sign>, <digit>, and <natural> (see Figure
0.7).
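The productions above also translate naturally into a recursive-descent recognizer. In the Python sketch below (an added illustration; the function names mirror the tokens but are otherwise ours), a string is accepted exactly when it is derivable from <integer>:

```python
def is_natural(s: str) -> bool:
    # <natural> ::= <digit> | <digit><natural>: one or more digits
    return len(s) > 0 and all(c in "0123456789" for c in s)

def is_integer(s: str) -> bool:
    # <integer> ::= <natural> | <sign><natural>
    if s and s[0] in "+-":
        s = s[1:]
    return is_natural(s)

print(is_integer("+12"), is_integer("42"), is_integer("+"), is_integer("1a"))
# True True False False
```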

Exercises
0.1. Construct truth tables for:

(a) ¬r ∨ (¬p ↓ ¬q)


(b) (p ∧ ¬q) ∨ ¬(p ↑ q)

0.2. Draw circuit diagrams for:

(a) ¬(r ∨ (¬p ↓ ¬q)) ↑ (s ∧ p)


(b) (p ∧ ¬q) ∨ ¬(p ↑ q)

0.3. Show that the sets {1, 2} × {a, b} and {a, b} × {1, 2} are not equal.

0.4. Let X = {1, 2, 3, 4}.

(a) Determine the set of ordered pairs comprising the relation <.
(b) Determine the set of ordered pairs comprising the relation =.
(c) Since relations are sets of ordered pairs, it makes sense to union them together. Determine
the set = ∪ <.
(d) Determine the set of ordered pairs comprising the relation ≤.

0.5. Let n ∈ N be a natural number. Show that congruence modulo n, ≡n , is an equivalence relation.

0.6. Let X = N. Determine the equivalence classes for congruence modulo 0.

0.7. Let X = N. Determine the equivalence classes for congruence modulo 1.

0.8. Let X = R. Determine the equivalence classes for congruence modulo 1.

0.9. Let R be an arbitrary equivalence relation in X . Prove that the distinct equivalence classes of R form
a partition of X .

0.10. Given a set X and a partition P = {A 1 , A 2 , . . . , A n } of X , prove that X equals the union of the sets in
P.

0.11. Given a set X and a partition P = {A 1 , A 2 , . . . , A n } of X , prove that the relation R(P ) in X induced by
P is an equivalence relation.

0.12. Let X = {1, 2, 3, 4}.

(a) Give an example of a partition P for which R(P ) is a function.


(b) Give an example of a partition P for which R(P ) is not a function.

0.13. The following “proof” seems to indicate that a relation that is symmetric and transitive must also
be reflexive:

By symmetry, xR y ⇒ yRx.
Thus we have (xR y ∧ yRx).
By transitivity, (xR y ∧ yRx) ⇒ xR x. Hence (∀x)(xR x).

Find the flaw in this “proof” and give an example of a relation that is symmetric and transitive but
not reflexive.

0.14. Let R be an arbitrary equivalence relation in X . Prove that the equality relation on X refines R.

0.15. Consider the “function” t : R → R defined by pairing x with the real number whose cosine is x.

(a) Show that t is not well defined.


(b) Adjust the domain and range of t to produce a valid function.

0.16. Consider the function s 0 : R → R defined by s 0 (x) = x^2 . Show that the converse of s 0 is not a function.

0.17. Let P be the set of nonnegative real numbers, and consider the function s: P → P defined by s(x) =
x^2 . Show that s −1 exists.

0.18. Let f : X → Y be an arbitrary function. Prove that the converse of f is a function iff f is a bijection.

0.19. (a) Let ∼A denote the complement of a set A. Prove that ∼(∼A) = A.
(b) Let ∼R denote the converse of a relation R. Prove that ∼(∼R) = R.

0.20. Let the functions f : X → Y and g : Y → Z be one to one. Prove that g ◦ f is one to one.

0.21. Let the functions f : X → Y and g : Y → Z be onto. Prove that g ◦ f is onto.

0.22. Define two functions f and g for which f ◦ g = g ◦ f .

0.23. Define, if possible, a bijection between:

(a) N and Z
(b) N and N × N
(c) N and Q
(d) N and {a, b, c}
0.24. Use induction to prove that the sum of the cubes of the first n positive integers adds up to n^2 (n + 1)^2 /4.

0.25. Use induction to prove that the sum of the first n positive integers is less than n^2 (for n > 1).

0.26. Use induction to prove that, for n > 3, n! > n^2 .

0.27. Use induction to prove that, for n > 3, n! > 2^n .

0.28. Use induction to prove that 1^2 + 2^2 + · · · + n^2 = n(n + 1)(2n + 1)/6.

0.29. Prove by induction that X ∩ (X 1 ∪ X 2 ∪ · · · ∪ X n ) = (X ∩ X 1 ) ∪ (X ∩ X 2 ) ∪ · · · ∪ (X ∩ X n ).

0.30. Let ∼ A denote the complement of the set A. Prove by induction that ∼(X 1 ∪ X 2 ∪· · ·∪ X n ) = (∼X 1 )∩
(∼X 2 ) ∩ · · · ∩ (∼X n ).

0.31. Use induction to prove that there are 2^n subsets of a set of size n; that is, for any finite set A,
‖℘(A)‖ = 2^‖A‖ .

0.32. The principle of mathematical induction is often stated in the following form, which requires (ap-
parently) stronger hypotheses to reach the desired conclusion: Let P (n) be a statement for each
natural number n ∈ N. From the two hypotheses

i. P (0)
ii. (∀m ∈ N)(((∀i ≤ m)P (i )) ⇒ P (m + 1))

we can conclude (∀n ∈ N)P (n). Prove that the strong form of induction is equivalent to the state-
ment of induction given in the text. Hint: Consider the restatement of the hypothesis given in
Example 0.22.

0.33. Determine what types of strings are defined by the following BNF:

<sign> ::= + | −
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<natural> ::= <digit> | <digit><natural>
<integer> ::= <natural> | <sign><natural>
<real constant> ::= <integer> | <integer>. | <integer>.<natural> | <integer>.<natural>E<integer>

0.34. A set X is cofinite if the complement of X (with respect to some generally understood universal set)
is finite. Let the universal set be Z. Give an example of

p1 p2 p3 q
0 0 0 0
0 0 1 0
0 1 0 1
0 1 1 0
1 0 0 1
1 0 1 0
1 1 0 0
1 1 1 1

Figure 0.8: The truth table for Exercise 0.41

(a) A finite set


(b) A cofinite set
(c) A set that is neither finite nor cofinite

0.35. Consider the equipotent relation, which relates sets to other sets.

(a) Prove that this relation is reflexive.


(b) Prove that this relation is symmetric.
(c) Prove that this relation is transitive.

0.36. Define a function that will show that ‖N‖ = ‖N × N‖.

0.37. Show that Z is equipotent to N.

0.38. Show that N is equipotent to Q.

0.39. Show that ℘(N) is equipotent to { f : N → {Yes, No} | f is a function}.

0.40. Show that ℘(N) is equipotent to R.

0.41. Draw a circuit diagram that will implement the function q given by the truth table shown in Figure
0.8.

0.42. (a) Draw a circuit diagram that will implement the function q 1 given by the truth table shown in
Figure 0.9.
(b) Draw a circuit diagram that will implement the function q 2 given by the truth table shown in
Figure 0.9.
(c) Draw a circuit diagram that will implement the function q 3 given by the truth table shown in
Figure 0.9.

p1 p2 p3 p4 q1 q2 q3
0 0 0 0 1 1 1
0 0 0 1 0 1 0
0 0 1 0 1 0 0
0 0 1 1 1 1 0
0 1 0 0 0 1 1
0 1 0 1 1 0 0
0 1 1 0 1 1 0
0 1 1 1 0 0 0
1 0 0 0 1 0 1
1 0 0 1 1 0 0
1 0 1 0 0 1 0
1 0 1 1 0 0 0
1 1 0 0 1 1 1
1 1 0 1 0 1 0
1 1 1 0 0 1 1
1 1 1 1 0 0 0

Figure 0.9: The truth table for Exercise 0.42

Chapter 1

Introduction and Basic Definitions

This chapter introduces the concept of a finite automaton, which is perhaps the simplest form of abstract
computing device. Although finite automata theory is concerned with relatively simple machines, it is an
important foundation of a large number of concrete and abstract applications. The finite-state control of
a finite automaton is also at the heart of more complex computing devices such as finite-state transducers
(Chapter 7), pushdown automata (Chapter 10), and Turing machines (Chapter 11).
Applications for finite automata can be found in the algorithms used for string matching in text ed-
itors and spelling checkers and in the lexical analyzers used by assemblers and compilers. In fact, the
best known string matching algorithms are based on finite automata. Although finite automata are gen-
erally thought of as abstract computing devices, other non-computer applications are possible. These
applications include traffic signals and vending machines, or any device in which there is a finite set of
inputs and a finite set of things that must be “remembered” by the device.
Briefly, a deterministic finite automaton, also called a recognizer or acceptor, is a mathematical model
of a finite-state computing device that recognizes a set of words over some alphabet; this set of words is
called the language accepted by the automaton. For each word over the alphabet of the automaton, there
is a unique path through the automaton; if the path ends in what is called a final or accepting state, then
the word traversing this path is in the language accepted by the automaton.
Finite automata represent one attempt at employing a finite description to rigorously define a (pos-
sibly) infinite set of words (that is, a language). Given such a description, the criterion for membership in
the language is straightforward and well-defined; there are simple algorithms for ascertaining whether
a given word belongs to the set. In this respect, such devices model one of the behaviors we require of
a compiler: recognizing syntactically correct programs. Actually, finite automata have inherent limita-
tions that make them unsuitable for modeling the compilers of modern programming languages, but
they serve as an instructive first approximation. Compilers must also be capable of producing object
code from source code, and a model of a simple translation device is presented in Chapter 7 and en-
hanced in later chapters.
Logic circuitry can easily be devised to implement these automata in hardware. With appropriate
data structures, these devices can likewise be modeled with software. An example is the highly inter-
active Turing’s World© , developed at Stanford University by Jon Barwise and John Etchemendy. This
Apple® Macintosh graphics package and the accompanying tutorial are particularly useful in experi-
menting with many forms of automata. Both hardware and software approaches will be explored in this
chapter. We begin our formal treatment with some fundamental definitions.

1.1 Alphabets and Words
The devices we will consider are meant to react to and manipulate symbols. Different applications may
employ different character sets, and we will therefore take care to explicitly mention the alphabet under
consideration.

Definition 1.1 Σ is an alphabet iff Σ is a finite nonempty set of symbols.

An element of an alphabet is often called a letter, although there is no reason to restrict symbols in
an alphabet to consist solely of single characters. Some familiar examples of alphabets are the 26-letter
English alphabet and the ASCII character set, which represents a standard set of computer codes. In this
text we will usually make use of shorter, simpler alphabets, like those given in Example 1.1.

Example 1.1
i. {0, 1}

ii. {a, b, c}

iii. {〈0, 0〉, 〈0, 1〉, 〈1, 0〉, 〈1, 1〉}

It is important to emphasize that the elements (letters) of an alphabet are not restricted to single
characters. In example (iii.) above, the alphabet is composed of the ordered pairs in {0, 1} × {0, 1}. Such
an alphabet will be utilized in Chapter 7 when we use sequential machines to construct a simple binary
adder.
Based on the definition of an alphabet, we can define composite entities called words or strings,
which are finite sequences of symbols from the alphabet.

Definition 1.2 For a given alphabet Σ and a natural number n, a sequence of symbols a 1 a 2 . . . a n is a word
(or string) over the alphabet Σ of length n iff for each i = 1, 2, . . . , n, a i ∈ Σ.

As formally specified in Definition 1.5, the order in which the symbols of the word occur will be
deemed significant, and therefore a word of length 3 can be identified with an ordered triple belonging
to Σ × Σ × Σ. Indeed, one may view the three-letter word bca as a convenient shorthand for the ordered
triple 〈b, c, a〉. A word over an alphabet is thus an ordered string of symbols, where each symbol in the
string is an element of the given alphabet. An obvious example of words are what you are reading right
now, which are words (or strings) over the standard English alphabet. In some contexts, these strings of
symbols are occasionally called sentences.

Example 1.2
Let Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; some examples of words over this alphabet are

i. 42

ii. 242342

Even though only three different members of Σ occur in the second example, the length of 242342 is 6, as each symbol is counted each time it occurs. To easily and succinctly express these concepts, the absolute value notation will be employed to denote the length of a string. Thus, |42| = 2, |242342| = 6, and |a₁a₂a₃a₄| = 4.

Definition 1.3 For a given alphabet Σ and a word x = a₁a₂ … aₙ over Σ, |x| denotes the length of x. That is, |a₁a₂ … aₙ| = n.

It is possible to join together two strings to form a composite word; this process is called concatena-
tion. The concatenation of two strings of symbols produces one longer string of symbols, which is made
up of the characters in the first string, followed immediately by the symbols of the second string.

Definition 1.4 Given an alphabet Σ, let x = a₁ … aₙ and y = b₁ … bₘ be strings where each aᵢ ∈ Σ and each bⱼ ∈ Σ. The concatenation of the strings x and y, denoted by x · y, is the juxtaposition of x and y; that is, x · y = a₁ … aₙb₁ … bₘ.

Note in Definition 1.4 that |x · y| = n + m = |x| + |y|. Some examples of string concatenation are

i. aaa · bbb = aaabbb

ii. home · run = homerun

iii. a² · b³ = aabbb

Example (iii.) illustrates a shorthand for denoting strings. Placing a superscript after a symbol means that this entity is a string made by concatenating it to itself the specified number of times. In a similar fashion, (ac)³ is meant to express acacac. Note that an equal sign was used in the above examples. Formally, two strings are equal if they have the same number of symbols and these symbols match, character for character.

Definition 1.5 Given an alphabet Σ, let x = a₁ … aₙ and y = b₁ … bₘ be strings over Σ. x and y are equal iff n = m and for each i = 1, 2, …, n, aᵢ = bᵢ.

The operation of concatenation has certain algebraic properties: it is associative, and it is not com-
mutative. That is,

i. (∀x ∈ Σ∗ )(∀y ∈ Σ∗ )(∀z ∈ Σ∗ )x · (y · z) = (x · y) · z.

ii. For most strings x and y, x · y ≠ y · x.

When the operation of concatenation is clear from the context, we will adopt the convention of omitting the symbol for the operator (as is done in arithmetic with the multiplication operator). Thus xyz refers to x · y · z. In fact, in Chapter 6 it will be seen that the operation of concatenation has many algebraic properties that are similar to those of arithmetic multiplication.
It is often necessary to count the number of occurrences of a given symbol within a word. The nota-
tion described in the next definition will be an especially useful shorthand in many contexts.

Definition 1.6 Given an alphabet Σ and some b ∈ Σ, the length of a word w with respect to b, denoted |w|b, is the number of occurrences of the letter b within that word.

Example 1.3
i. |abb|b = 2

ii. |abb|c = 0

iii. |1000000001118881888888|1 = 5

Definition 1.7 Given an alphabet Σ, the empty word, denoted by λ, is defined to be the (unique) word
consisting of zero letters.

The empty word is often denoted by ε in many formal language texts. The empty string serves as the
identity element for concatenation. That is, for all strings x,

x · λ = λ · x = x

Even though the empty word is represented by a single character, λ is a string but is not a member of any
alphabet: λ ∉ Σ. (Symbols in an alphabet all have length one; λ has length zero.)
A particular string x can be divided into substrings in several ways. If we choose to break x up into three substrings u, v, and w, there are many ways to accomplish this. For example, if x = abccdbc, it could be written as ab · ccd · bc; that is, x = uvw, where u = ab, v = ccd, and w = bc. This x could also be written as abc · λ · cdbc, where u = abc, v = λ, and w = cdbc. In this second case, |x| = 7 = 3 + 0 + 4 = |u| + |v| + |w|.
A fundamental structure in formal languages involves sets of words. A simple example of such a set is Σᵏ, the collection of all words of length exactly k (for some k ∈ N) that can be constructed from the letters of Σ.

Definition 1.8 Given an alphabet Σ and a nonnegative integer k ∈ N, we define

Σᵏ = {x | x is a word over Σ and |x| = k}

Example 1.4
If

Σ = {0, 1}

then

Σ⁰ = {λ}
Σ¹ = {0, 1}
Σ² = {00, 01, 10, 11}
Σ³ = {000, 001, 010, 011, 100, 101, 110, 111}

λ is the only element of Σ⁰, the set of all words containing zero letters from Σ. There is no difficulty in letting λ be an element (and the only element) of Σ⁰, since each Σᵏ is not necessarily an alphabet, but is instead a set of words; λ, according to the definition, is indeed a word consisting of zero letters.

Definition 1.9 Given an alphabet Σ, define

Σ∗ = ⋃_{k≥0} Σᵏ = Σ⁰ ∪ Σ¹ ∪ Σ² ∪ Σ³ ∪ …

and

Σ⁺ = ⋃_{k≥1} Σᵏ = Σ¹ ∪ Σ² ∪ Σ³ ∪ …

Σ∗ is the set of all words that may be constructed from the letters of an alphabet Σ. Σ⁺ is the set of all nonempty words that may be constructed from Σ.
Σ∗, like the set of natural numbers, is an infinite set. Although Σ∗ is infinite, each word in Σ∗ is of finite length. This property follows from the definition of Σ∗ and a property of natural numbers: any k ∈ N must by definition be a finite number. Σ∗ is defined to be the union of all Σᵏ, k ∈ N. Since each such k is a finite number and every word in Σᵏ is of length k, every word in Σᵏ must be of finite length. Furthermore, since Σ∗ is the union of all such Σᵏ, every word in Σ∗ must also be of finite length. While Σ∗ can contain arbitrarily long words, each of these words must be finite, just as every number in N is finite.
Since Σ∗ is the union of all Σᵏ for k ∈ N, Σ∗ must also contain Σ⁰. In other words, besides containing all words that can be constructed from one or more letters of Σ, Σ∗ also contains the empty word λ. While λ ∉ Σ, λ ∈ Σ∗. λ represents a string and not a symbol, and thus the empty string cannot be in the alphabet Σ. However, λ is included in Σ∗, since Σ∗ is not just an alphabet, but a collection of words over the alphabet Σ. Note, however, that Σ⁺ is Σ∗ − {λ}; Σ⁺ specifically excludes λ.
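Since each Σᵏ is finite (it contains exactly |Σ|ᵏ words), the sets Σ⁰, Σ¹, Σ², … can be enumerated mechanically, and the listing in Example 1.4 can be reproduced by a short program. The following C fragment is a minimal sketch of such an enumeration for Σ = {0, 1}; the function name print_sigma_k and the bound on k are ours, introduced purely for illustration.

#include <stdio.h>
#include <string.h>

static const char sigma[] = "01";           /* the alphabet {0, 1} */

/* Recursively print every word in Sigma^k: choose each letter of
   sigma for position pos, then fill in the remaining positions.  */
void print_sigma_k(char *word, int pos, int k)
{
    if (pos == k) {                  /* a complete word of length k */
        word[k] = '\0';
        printf("%s\n", word);        /* for k = 0 this prints lambda,
                                        i.e., an empty line */
        return;
    }
    for (size_t i = 0; i < strlen(sigma); i++) {
        word[pos] = sigma[i];
        print_sigma_k(word, pos + 1, k);
    }
}

int main(void)
{
    char word[8];
    for (int k = 0; k <= 2; k++) {   /* Sigma^0, Sigma^1, Sigma^2 */
        printf("Sigma^%d:\n", k);
        print_sigma_k(word, 0, k);
    }
    return 0;
}

Running the fragment prints one line per word, seven words in all (counting λ), in agreement with |Σ⁰| + |Σ¹| + |Σ²| = 1 + 2 + 4.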

1.2 Definition of a Finite Automaton


We now have the building blocks necessary to define deterministic finite automata. A deterministic fi-
nite automaton is a mathematical model of a machine that accepts a particular set of words over some
alphabet Σ.
A useful visualization of this concept might be referred to as the black box model. This con-
ceptualization is built around a black box that houses the finite-state control. This control reacts to the
information provided by the read head, which extracts data from the input tape. The control also governs
the operation of the output indicator, often depicted as an acceptance light, as shown in Figure 1.1.
There is no limit to the number of symbols that can be on the tape (although each individual word
must be of finite length). As the input tape is read by the machine, state transitions, which alter the
current state of the automaton, take place within the black box. Depending on the word contained on
the input tape, the light bulb either lights or remains dark when the end of the input string is reached,
indicating acceptance or rejection of the word, respectively. We assume that the input head can sense
when it has passed the last symbol on the tape. In some sense, a personal computer fits the finite-state
control model; it reacts to each keystroke entered from the keyboard according to the current state of
the CPU and its own internal memory. However, the number of possible bit patterns that even a small
computer can assume is so astronomically large that it is totally impractical to model a computer in this
fashion. Finite-state machines can be profitably used to describe portions of a computer (such as parts
of the arithmetic/logic unit, as discussed in Chapter 7, Example 7.15) and other devices that assume a
reasonable number of states.

Figure 1.1: A model of a finite-state acceptor (DFA)

Although finite automata are usually thought of as processing strings of letters over some alphabet,
the input can conceptually be elements from any finite set. A useful example is the “brain” of a vending
machine, which, say, dispenses 30¢ candy bars.

Example 1.5
The input to the vending machine is the set of coins {nickel, dime, quarter}, represented by n , d , and
q in Figure 1.2. The machine may only “remember” a finite number of things; in this case, it will keep
track of the amount of money that has been dropped into the machine. Thus, the machine may be in
the “state” of remembering that no money has yet been deposited (denoted in this example by <0¢>), or that a single nickel has been inserted (the state labeled <5¢>), or that either a dime or two nickels have been deposited (<10¢>), and so on. Note that from state <0¢> there is an arrow labeled by the dime token d pointing to the state <10¢>, indicating that, at a time when the machine “believes” that no money has been deposited, the insertion of a dime causes the machine to transfer to the state that remembers that ten cents has been deposited. From the <0¢> state, the arrows in the diagram show that if two nickels (n) are input the machine moves through the <5¢> state and likewise ends in the state labeled <10¢>.
The vending machine thus counts the amount of change dropped into the machine (up to 50¢). The
machine begins in the state labeled <0¢> and follows the arrows to higher-numbered states as coins are
inserted. For example, depositing a nickel, a dime, and then a quarter would move the machine to the
states <5¢>, <15¢>, and then <40¢>. The states labeled 30¢ and above are doubly encircled to indicate
that enough money has been deposited; if 30¢ or more has been deposited, then the machine “accepts,”
indicating that a candy bar may be selected.
Finite automata are appropriate whenever there are a finite number of inputs and only a finite num-
ber of situations must be distinguished by the machine. Other applications include traffic signals and
elevators (as discussed in Chapter 7). We now present a formal mathematical definition of a finite-state
machine.

Definition 1.10 A deterministic finite automaton or deterministic finite acceptor (DFA) is a quintuple
〈Σ, S, s 0 , δ, F 〉, where

i. Σ is the input alphabet (a finite nonempty set of symbols).

ii. S is a finite nonempty set of states.

Figure 1.2: An implementation of a vending machine

iii. s 0 is the start (or initial) state, an element of S.

iv. δ is the state transition function; δ: S × Σ → S.

v. F is the set of final (or accepting) states, a (possibly empty) subset of S.

The input alphabet, Σ, for any deterministic finite automaton A, is the set of symbols that can appear
on the input tape. Each successive symbol in a word will cause a transition from the present state to
another state in the machine. As specified by the δ function, there is exactly one such state transition for
each combination of a symbol a ∈ Σ and a state s ∈ S. This is the origin of the word “deterministic” in the
phrase “deterministic finite automaton.”
The various states represent the memory of the machine. Since the number of states in the machine
is finite, the number of distinguishable situations that can be remembered by the machine is also finite.
This limitation of the device’s ability to store its past history is the origin of the word “finite” in the phrase
“deterministic finite automaton.” At any given time during processing, if the previous history of the
machine is considered to be the reactions of the DFA to the letters that have already been read, then the
current state represents all that is known about the history of the machine.
The start state of the machine is the state in which the machine always begins processing a string.
From this state, successive input symbols from Σ are used by the δ function to arrive at successive states
in the machine. Processing stops when the string of symbols is exhausted. The state in which the ma-
chine is left can either be a final state, in which case the word is accepted, or it can be any one of the other
states of S, in which case the word is rejected.
To produce a formal description of the concepts defined above, it is necessary to enumerate each
part of the quintuple that comprises the DFA. Σ, S, s 0 , and F are easily enumerated, but the function
δ can often be tedious to describe. One device used to display the mapping δ is the state transition
diagram. Besides graphically displaying the transitions of the δ function, the state transition diagram for
a deterministic finite automaton also illustrates the other four parts of the quintuple.
A finite automaton state transition diagram is a directed graph. The states of the machine represent
the vertices of the graph, while the mapping of the δ function describes the edges. Final states are de-

Figure 1.3: The DFA described in Example 1.6

type
  Sigma = 'a'..'c';
  State = (s0, s1, s2);
var
  TransitionTable : array [State, Sigma] of State;

function Delta(S : State; A : Sigma) : State;
begin
  Delta := TransitionTable[S, A]
end; {Delta}

Figure 1.4: A Pascal implementation of a state transition function

noted by a doubly encircled state, and the start state is identified by a straight incoming arrow. Each
domain element of the transition function corresponds to an edge in the directed graph. We formally
define a finite automaton state transition diagram for 〈Σ, S, s 0 , δ, F 〉 as a directed graph G = 〈V, E 〉, as
follows:

i. V = S,

ii. E = {〈s, t, a〉 | s, t ∈ S, a ∈ Σ ∧ δ(s, a) = t},

where V is the set of vertices of the graph, and E is the set of edges connecting these vertices. Each element of E is an ordered triple, 〈s, t, a〉, such that s is the origin vertex, t is the terminus, and a is the letter from Σ labeling the edge. Thus, for any vertex there is exactly one edge leaving that vertex for each element of Σ.

Example 1.6
In the DFA shown in Figure 1.3, the set of edges E of the graph G is given by E = {〈s₀, s₁, a〉, 〈s₀, s₂, b〉, 〈s₁, s₁, a〉, 〈s₁, s₂, b〉, 〈s₂, s₁, a〉, 〈s₂, s₀, b〉}. The figure also shows that s₀ is the designated start state and that s₁ is the only final state. The state transition function for a finite automaton is often represented in the form of a state transition table. A state transition table is a matrix with the rows of the matrix labeled and

indexed by the states of the machine, and the columns of the matrix labeled and indexed by the elements
of the input alphabet; the entries in the table are the states to which the DFA will move. Formally, let T be
a state transition table for some deterministic finite automaton A = 〈Σ, S, s 0 , δ, F 〉, and let s ∈ S and a ∈ Σ.
Then the value of each matrix entry is given by the equation

(∀s ∈ S)(∀a ∈ Σ) Tₛₐ = δ(s, a)

For the automaton in Example 1.6, the state transition table is

δ   a   b
s₀  s₁  s₂
s₁  s₁  s₂
s₂  s₁  s₀

This table represents the following transitions:

δ(s₀, a) = s₁    δ(s₀, b) = s₂
δ(s₁, a) = s₁    δ(s₁, b) = s₂
δ(s₂, a) = s₁    δ(s₂, b) = s₀

State transition tables are the most common method of representing the basic structure of an au-
tomaton within a computer. When represented as an array in the memory of the computer, access is very
fast and the structure lends itself easily to manipulation by the computer. Techniques such as depth-first
search are easily and efficiently implemented when the state transition diagram is represented as a table.
Figure 1.4 illustrates an implementation of the δ function via transition tables in Pascal.
With δ, we can describe the state in which we will find ourselves after processing a single letter. We also want to be able to describe the state at which we will arrive after processing an entire string. We will extend the δ function to cover entire strings rather than just single letters; δ̄(s, x) will be the state we wind up at when starting at s and processing, in order, all the letters of the string x. While this is a relatively easy concept to (vaguely) state in English, it is somewhat awkward to formally define. To facilitate formal proofs concerning DFAs, we use the following recursive definition.

Definition 1.11 Given a DFA A = 〈Σ, S, s₀, δ, F 〉, the extended state transition function for A, denoted δ̄, is a function δ̄: S × Σ∗ → S defined recursively as follows:

i. (∀s ∈ S)(∀a ∈ Σ)  δ̄(s, a) = δ(s, a)
ii. (∀s ∈ S)  δ̄(s, λ) = s
iii. (∀s ∈ S)(∀x ∈ Σ∗)(∀a ∈ Σ)  δ̄(s, ax) = δ̄(δ(s, a), x)

The δ̄ function extends the δ function from single letters to words. Whereas the δ function maps pairs of states and letters to other states, the δ̄ function maps pairs of states and words to other states. (i.) is the observation that δ and δ̄ treat single letters the same; this fact is not really essential to the definition of δ̄, since it can be deduced from (ii.) and (iii.) (see the exercises).
The δ̄ function maps the current state s and the first letter a₁ of a word w = a₁ … aₙ via the δ function to some other state t. It is then recursively applied with the new state t and the remainder of the word, that is, with a₂ … aₙ. The recursion stops when the remainder of the word is the empty word λ. See Examples 1.7 and 1.11 for illustrations of computations using this recursive definition.

Since the recursion of the δ̄ function all takes place at the end of the string, δ̄ is called tail recursive. Tail recursion is easily transformed into iteration by applying δ to successive letters of the input word and using the result of the previous application of δ as an input to the current application.
Figure 1.5 gives an implementation of the δ̄ function in Pascal. Recursion has been replaced by iteration, and previous function results are saved in an auxiliary variable T. The function Delta, the input alphabet Sigma, and the state set State agree with the definitions given in Figure 1.4.
It stands to reason that if we start in state s and word y takes us to state r, and if we start in state r and word x takes us to state t, then the word yx should take us from state s all the way to t. That is, if δ̄(s, y) = r and δ̄(r, x) = t, then δ̄(s, yx) should equal t also. We can indeed prove this, as shown in the following theorem.

Theorem 1.1 Let A = 〈Σ, S, s₀, δ, F 〉 be a DFA. Then

(∀x ∈ Σ∗)(∀y ∈ Σ∗)(∀s ∈ S)(δ̄(s, yx) = δ̄(δ̄(s, y), x))

Proof. Define P(n) by

(∀x ∈ Σ∗)(∀y ∈ Σⁿ)(∀s ∈ S)(δ̄(s, yx) = δ̄(δ̄(s, y), x))

Basis step: P(0): Let y ∈ Σ⁰ (⇒ y = λ).

δ̄(δ̄(s, y), x)
= δ̄(δ̄(s, λ), x)    (since y = λ)
= δ̄(s, x)          (by Definition 1.11ii)
= δ̄(s, λx)         (since x = λ · x)
= δ̄(s, yx)         (since y = λ)

Inductive step: Assume P(m):

(∀x ∈ Σ∗)(∀y ∈ Σᵐ)(∀s ∈ S)(δ̄(s, yx) = δ̄(δ̄(s, y), x))

For any z ∈ Σᵐ⁺¹, (∃a ∈ Σ¹)(∃y ∈ Σᵐ) ∋ z = ay. Then

δ̄(s, zx)
= δ̄(s, ayx)               (by definition of z)
= δ̄(δ(s, a), yx)          (by Definition 1.11iii)
= δ̄(t, yx)                (since (∃t ∈ S) ∋ δ(s, a) = t)
= δ̄(δ̄(t, y), x)           (by the induction assumption)
= δ̄(δ̄(δ(s, a), y), x)     (by definition of t)
= δ̄(δ̄(s, ay), x)          (by Definition 1.11iii)
= δ̄(δ̄(s, z), x)           (by definition of z)

Therefore, P(m) ⇒ P(m + 1), and since this implication holds for any nonnegative integer m, by the principle of mathematical induction we can say that P(n) is true for all n ∈ N. Since the statement therefore holds for any string y of any length, the assertion is indeed true for all y in Σ∗. This completes the proof of the theorem.

Note that the statement of Theorem 1.1 is very similar to rule iii of the recursive definition of the extended state transition function (Definition 1.11), with the string y replacing the single letter a. We will see a remarkable number of situations like this, where a recursive rule defined for a single symbol extends in a natural manner to a similar rule for arbitrary strings.

const
  MaxWordLength = 25; {an arbitrary constraint}
type
  Word = record
    Length : 0..MaxWordLength;
    Letters : packed array [0..MaxWordLength] of Sigma
  end; {Word}

function DeltaBar(S : State; W : Word) : State;
{uses the function Delta defined previously}
var
  T : State;
  I : 0..MaxWordLength;
begin
  T := S;
  if W.Length > 0 then
    for I := 1 to W.Length do
      T := Delta(T, W.Letters[I]);
  DeltaBar := T
end; {DeltaBar}

Figure 1.5: A Pascal implementation of the extended state transition function

As alluded to earlier, the state in which a string terminates is significant; in particular, it is important
to determine whether the terminal state for a string happens to be one of the states that was designated
to be a final state.

Definition 1.12 Given a DFA A = 〈Σ, S, s₀, δ, F 〉, A accepts a word w ∈ Σ∗ iff δ̄(s₀, w) ∈ F.

We say a word w is accepted by a machine A = 〈Σ, S, s₀, δ, F 〉 iff the extended state transition function δ̄ associated with A maps to a final state from s₀ when processing the word w. This means that the path from the start state ultimately leads to a final state when the word w is presented to the machine. We will occasionally say that A recognizes w; a DFA is sometimes referred to as a recognizer.

Definition 1.13 Given a DFA A = 〈Σ, S, s₀, δ, F 〉, A rejects a word w ∈ Σ∗ iff δ̄(s₀, w) ∉ F.

In other words, a word w is rejected by a machine A = 〈Σ, S, s₀, δ, F 〉 iff the δ̄ function associated with A maps to a nonfinal state from s₀ when processing the word w.

Example 1.7
Let

A = 〈Σ, S, s 0 , δ, F 〉

where

Σ = {0, 1}
S = {q₀, q₁}
s₀ = q₀
F = {q₁}

and δ is given by the transition table

δ   0   1
q₀  q₀  q₁
q₁  q₁  q₀

The structure of this automaton is shown in Figure 1.6.


To see how some of the above definitions apply, let x = 0100:

δ̄(q₀, x) = δ̄(q₀, 0100)
= δ̄(δ(q₀, 0), 100)
= δ̄(q₀, 100)
= δ̄(δ(q₀, 1), 00)
= δ̄(q₁, 00)
= δ̄(δ(q₁, 0), 0)
= δ̄(q₁, 0)
= δ(q₁, 0)
= q₁

Thus, δ̄(q₀, x) = q₁ ∈ F, which means that x is accepted by A; A recognizes x.


Now let y = 1100:

δ̄(q₀, y) = δ̄(q₀, 1100)
= δ̄(δ(q₀, 1), 100)
= δ̄(q₁, 100)
= δ̄(δ(q₁, 1), 00)
= δ̄(q₀, 00)
= δ̄(δ(q₀, 0), 0)
= δ̄(q₀, 0)
= δ(q₀, 0)
= q₀

Therefore, δ̄(q₀, y) = q₀ ∉ F, which means that y is not accepted by A.


Following the Pascal conventions defined in the previous programming fragments, the function Ac-
cept defined in Figure 1.7 tests for acceptance of a string by consulting a FinalState set and using DeltaBar
to refer to the TransitionTable.
The functions Delta, DeltaBar, and Accept can be combined to form a Pascal program that models
a DFA. The sample fragments given in Figures 1.4, 1.5, and 1.7 rightly pass the candidate string as a

Figure 1.6: The DFA discussed in Example 1.7

parameter. A full program would be complicated by several constraints, including the awkward way in
which strings must be handled in Pascal. To highlight the correspondence between the code modules
and the automata definitions, the program given in Figure 1.8 handles input at the character level rather
than at the word level. The definitions in the procedure Initialize reflect the structure of the DFA
shown in Figure 1.9. Invoking this program will produce a response to a single input word. For example,
a typical exchange would be

cba
Rejected

Running this program again might produce

cccc
Accepted

This behavior is essentially the same as that of the C program shown in Figure 1.10. The succinct coding
clearly shows the relationship between the components of the quintuple for the DFA and the correspond-
ing code.

Definition 1.14 Given an alphabet Σ, L is a language over the alphabet Σ iff L ⊆ Σ∗ .

A language is a collection of words over some alphabet. If the alphabet is denoted by Σ, then a lan-
guage L over Σ is a subset of Σ∗ . Since L ⊆ Σ∗ , L may be finite or infinite. Clearly, the words used in the
English language are a subset of words over the Roman alphabet and this collection is therefore a lan-
guage according to our definition. Note that a language L, in this context, is simply a list of words; neither
syntax nor semantics are involved in the specification of L. Thus, a language as defined by Definition 1.14
has little of the structure or relationships one would normally expect of either a natural language (like
English) or a programming language (like Pascal).

Example 1.8
Some other examples of valid languages are

i. ∅

ii. {w ∈ {0, 1}∗ | |w| > 5}

function Accept(W : Word) : Boolean;
{returns TRUE iff W is accepted by the DFA}
begin
  Accept := DeltaBar(s0, W) in FinalState
end; {Accept}

Figure 1.7: A Pascal implementation of a test for acceptance

iii. {λ}

iv. {λ, bilbo, frodo, samwise}

v. {x ∈ {a, b}∗ | |x|a = |x|b}

The empty language, denoted by ∅ or { }, is different from {λ}, the language consisting of only the empty word λ. Whereas the empty language consists of zero words, the language consisting of λ contains one word (which contains zero letters). The distinction is analogous to an example involving sets of numbers: the set {0}, containing only the integer 0, is still a larger set than the empty set.
Every DFA differentiates between words that do not reach final states and words that do. In this sense,
each automaton defines a language.

Definition 1.15 Given a DFA A = 〈Σ, S, s₀, δ, F 〉, the language accepted by A, denoted L(A), is defined to be

L(A) = {w ∈ Σ∗ | δ̄(s₀, w) ∈ F }

L(A), the language accepted by a finite automaton A, is the set of all words w from Σ∗ for which δ̄(s₀, w) ∈ F. In order for a word w to be contained in L(B), the path through the finite automaton B, as determined by the letters in w, must lead from the start state to one of the final states.
For deterministic finite automata, the path for a given word w is unique: there is only one path since,
at any given state in the automaton, there is exactly one transition for each a ∈ Σ. This is not necessarily
the case for another variety of finite automaton, the nondeterministic finite automaton, as will be seen
in Chapter 4.

Definition 1.16 Given an alphabet Σ, a language L ⊆ Σ∗ is finite automaton definable (FAD) iff there exists some DFA B = 〈Σ, S, s₀, δ, F 〉 such that L = L(B).

The set of all words over {0, 1} that contain an odd number of 1s is finite automaton definable, as evidenced by the automaton in Example 1.7, which accepts exactly this set of words.

1.3 Examples of Finite Automata


This section illustrates the definitions of the quintuples and the state transition diagrams for some non-
trivial automata. The following example and Example 1.11 deal with the recognition of tokens, an im-
portant issue in the construction of compilers.

program DFA(input, output);
{This program tests whether input strings are accepted by the }
{automaton displayed in Figure 1.9. The program expects input from}
{the keyboard, delimited by a carriage return. No error checking }
{is done; letters outside ['a' .. 'c'] cause a range error. }
type
  Sigma = 'a'..'c';
  State = (s0, s1, s2);
var
  TransitionTable : array [State, Sigma] of State;
  FinalState : set of State;
function Delta(s : State; c : Sigma) : State;
begin
  Delta := TransitionTable[s, c]
end; { Delta }
function DeltaBar(s : State) : State;
var
  t : State;
begin
  t := s;
  { Step through the keyboard input one letter at a time. }
  while not eoln(input) do
    begin
      t := Delta(t, input^);
      get(input)
    end;
  DeltaBar := t
end; { DeltaBar }
function Accept : boolean;
begin
  Accept := DeltaBar(s0) in FinalState
end; { Accept }
procedure Initialize;
begin
  FinalState := [s2];
  { Set up the state transition table. }
  TransitionTable[s0,'a'] := s1;  TransitionTable[s0,'b'] := s0;
  TransitionTable[s0,'c'] := s2;  TransitionTable[s1,'a'] := s2;
  TransitionTable[s1,'b'] := s0;  TransitionTable[s1,'c'] := s0;
  TransitionTable[s2,'a'] := s0;  TransitionTable[s2,'b'] := s0;
  TransitionTable[s2,'c'] := s1
end; { Initialize }
begin { DFA }
  Initialize;
  if Accept then
    writeln(output, 'Accepted')
  else
    writeln(output, 'Rejected')
end. { DFA }

Figure 1.8: A Pascal program that emulates the DFA shown in Figure 1.9
Figure 1.9: The DFA emulated by the programs in Figures 1.8 and 1.10

Example 1.9
The set of FORTRAN identifiers is a finite automaton definable language. This statement can be proved
by verifying that the following machine accepts the set of all valid FORTRAN 66 identifiers. These identi-
fiers, which represent variable, subroutine, and array names, can contain from 1 to 6 (nonblank) charac-
ters, must begin with an alphabetic character, can be followed by up to 5 letters or digits, and may have
embedded blanks. In this example, we have ignored the difference between capital and lowercase letters,
and ¦ represents a blank.

Σ = ASCII
Γ = ASCII − {a, b, c, …, x, y, z, 0, 1, 2, …, 9, ¦}
S = {s₀, s₁, s₂, s₃, s₄, s₅, s₆, s₇}
s₀ = s₀

δ    a   b   c   …   y   z   0   1   …   8   9   ¦   Γ
s₀   s₁  s₁  s₁  …   s₁  s₁  s₇  s₇  …   s₇  s₇  s₀  s₇
s₁   s₂  s₂  s₂  …   s₂  s₂  s₂  s₂  …   s₂  s₂  s₁  s₇
s₂   s₃  s₃  s₃  …   s₃  s₃  s₃  s₃  …   s₃  s₃  s₂  s₇
s₃   s₄  s₄  s₄  …   s₄  s₄  s₄  s₄  …   s₄  s₄  s₃  s₇
s₄   s₅  s₅  s₅  …   s₅  s₅  s₅  s₅  …   s₅  s₅  s₄  s₇
s₅   s₆  s₆  s₆  …   s₆  s₆  s₆  s₆  …   s₆  s₆  s₅  s₇
s₆   s₇  s₇  s₇  …   s₇  s₇  s₇  s₇  …   s₇  s₇  s₆  s₇
s₇   s₇  s₇  s₇  …   s₇  s₇  s₇  s₇  …   s₇  s₇  s₇  s₇

F = {s₁, s₂, s₃, s₄, s₅, s₆}

The entries under the column labeled Γ show the transitions taken for each member of the set Γ. The state transition diagram of the machine corresponding to this quintuple is displayed in Figure 1.11. Note that, while each of the 26 letters causes a transition from s₀ to s₁, a single arrow labeled a–z is sufficient to denote all these transitions. Similarly, the transition labeled Σ from s₇ indicates that every element of the alphabet follows the same path.
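Because the states of this machine simply count how many nonblank characters have been accepted so far, the transition table need not be stored explicitly. The following C fragment is a sketch that mimics the DFA of Example 1.9; the function name is_fortran_identifier is ours, and uppercase and lowercase letters are both accepted, matching the example's decision to ignore that distinction.

#include <stdio.h>
#include <ctype.h>

/* State i (0..6) records that i nonblank characters have been
   accepted; state 7 is the dead state s7 of Example 1.9. */
int is_fortran_identifier(const char *w)
{
    int s = 0;
    for (; *w; ++w) {
        if (*w == ' ')
            ;                                  /* embedded blank: stay put */
        else if (s == 0 && isalpha((unsigned char) *w))
            s = 1;                             /* must begin with a letter */
        else if (s >= 1 && s <= 5 && isalnum((unsigned char) *w))
            s = s + 1;                         /* up to 5 more letters or digits */
        else
            s = 7;                             /* anything else is fatal */
    }
    return s >= 1 && s <= 6;                   /* F = {s1, ..., s6} */
}

int main(void)
{
    printf("%d %d %d\n",
           is_fortran_identifier("X1"),        /* 1 */
           is_fortran_identifier("I AM OK"),   /* 1: blanks are ignored */
           is_fortran_identifier("2COOL"));    /* 0: begins with a digit */
    return 0;
}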

# include <stdio.h>

# define to_int(c) ((int) c - (int) 'a')

# define FINAL_STATE s_2

enum state { s_0, s_1, s_2 };

/*
** This table implements the state transition function and is indexed by
** the current state and the current input letter
*/
enum state transition_table[3][3] = { { s_1, s_0, s_2 },
                                      { s_2, s_0, s_0 },
                                      { s_0, s_0, s_1 }
                                    };

enum state delta(s, c)
enum state s;
char c;
{
    return transition_table[s][to_int(c)];
}

enum state delta_bar(s)
enum state s;
{
    enum state t;
    char c;

    t = s;

    /*
    ** Step through the input one letter at a time.
    */
    while ((char) (c = getchar()) != '\n')
        t = delta(t, c);
    return t;
}

main() {
    if (delta_bar(s_0) == FINAL_STATE)
        printf("Accepted\n");
    else
        printf("Rejected\n");
    exit(0);
}

Figure 1.10: A C program that emulates the DFA shown in Figure 1.9

Figure 1.11: A DFA that recognizes valid FORTRAN identifiers

Figure 1.12: The DFA M discussed in Example 1.10

Table 1.1:

<sign> ::= +|−
<digit> ::= 0|1|2|3|4|5|6|7|8|9
<natural> ::= <digit>|<digit><natural>
<integer> ::= <natural>|<sign><natural>
<real constant> ::= <integer>
                    <integer>.
                    <integer>.<natural>
                    <integer>.<natural>E<integer>

Example 1.10

The DFA M shown in Figure 1.12 accepts only those strings that have an even number of b's and an even number of a's. Thus

L(M) = {x ∈ {a, b}∗ | |x|a ≡ 0 mod 2 ∧ |x|b ≡ 0 mod 2}

The corresponding quintuple for M = 〈Σ, S, s 0 , δ, F 〉 has the following components:

Σ = {a, b}
S = {〈0, 0〉, 〈0, 1〉, 〈1, 0〉, 〈1, 1〉}
s₀ = 〈0, 0〉

δ a b
〈0, 0〉 〈1, 0〉 〈0, 1〉
〈0, 1〉 〈1, 1〉 〈0, 0〉
〈1, 0〉 〈0, 0〉 〈1, 1〉
〈1, 1〉 〈0, 1〉 〈1, 0〉

F = {〈0, 0〉}

Note that the transition function can be succinctly specified by

δ(〈i, j〉, a) = 〈1 − i, j〉 and δ(〈i, j〉, b) = 〈i, 1 − j〉 for all i, j ∈ {0, 1}

See the exercises for some other problems involving congruence modulo 2.
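The succinct form of the transition function translates directly into code: each input letter simply flips one of two parity bits. The following C sketch (the function name accepts_even_even is ours) carries the state 〈i, j〉 in two integer variables.

#include <stdio.h>

/* The machine M of Example 1.10: i and j hold the parities of the
   number of a's and b's read so far, beginning in state <0, 0>. */
int accepts_even_even(const char *x)
{
    int i = 0, j = 0;
    for (; *x; ++x) {
        if (*x == 'a') i = 1 - i;    /* delta(<i,j>, a) = <1-i, j> */
        else           j = 1 - j;    /* delta(<i,j>, b) = <i, 1-j> */
    }
    return i == 0 && j == 0;         /* F = {<0, 0>} */
}

int main(void)
{
    printf("%d %d\n",
           accepts_even_even("abba"),   /* 1: two a's and two b's */
           accepts_even_even("aab"));   /* 0: an odd number of b's */
    return 0;
}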

Example 1.11

Consider a typical set of all real number constants in modified scientific notation format described by
the BNF in Table 1.1.

This set of productions defines real number constants like +192., since

<real constant> ⇒ <integer>.
<integer>. ⇒ <sign><natural>.
<sign><natural>. ⇒ +<natural>.
+<natural>. ⇒ +<digit><natural>.
+<digit><natural>. ⇒ +1<natural>.
+1<natural>. ⇒ +1<digit><natural>.
+1<digit><natural>. ⇒ +1<digit><digit>.
+1<digit><digit>. ⇒ +1<digit>2.
+1<digit>2. ⇒ +192.

Other possibilities are

1
3.1415
2.718281828
27.
42.42
1.0E−32

while the following strings do not qualify:

.01
1.+1
8.E−10

The set of all real number constants that can be derived from the productions given in Table 1.1 is
a FAD language. Let R be the deterministic finite automaton defined below. The corresponding state
transition diagram is given in Figure 1.13.

Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, −, E, .}
S = {s₀, s₁, s₂, s₃, s₄, s₅, s₆, s₇, s₈}
s₀ = s₀

δ    0   1   2   3   4   5   6   7   8   9   +   −   E   .
s₀   s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₁  s₁  s₇  s₇
s₁   s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₇  s₇  s₇  s₇
s₂   s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₇  s₇  s₇  s₃
s₃   s₈  s₈  s₈  s₈  s₈  s₈  s₈  s₈  s₈  s₈  s₇  s₇  s₇  s₇
s₄   s₅  s₅  s₅  s₅  s₅  s₅  s₅  s₅  s₅  s₅  s₆  s₆  s₇  s₇
s₅   s₅  s₅  s₅  s₅  s₅  s₅  s₅  s₅  s₅  s₅  s₇  s₇  s₇  s₇
s₆   s₅  s₅  s₅  s₅  s₅  s₅  s₅  s₅  s₅  s₅  s₇  s₇  s₇  s₇
s₇   s₇  s₇  s₇  s₇  s₇  s₇  s₇  s₇  s₇  s₇  s₇  s₇  s₇  s₇
s₈   s₈  s₈  s₈  s₈  s₈  s₈  s₈  s₈  s₈  s₈  s₇  s₇  s₄  s₇

F = {s₂, s₃, s₅, s₈}


Figure 1.13: A DFA that recognizes real number constants

The language accepted by R, that is L(R), is exactly the set of all real number constants in modified
scientific notation format described by the BNF in Table 1.1.
For example, let x = 3.1415:

δ̄(s₀, x) = δ̄(s₀, 3.1415)
= δ̄(δ(s₀, 3), .1415)
= δ̄(s₂, .1415)
= δ̄(δ(s₂, .), 1415)
= δ̄(s₃, 1415)
= δ̄(δ(s₃, 1), 415)
= δ̄(s₈, 415)
= δ̄(δ(s₈, 4), 15)
= δ̄(s₈, 15)
= δ̄(δ(s₈, 1), 5)
= δ̄(s₈, 5)
= δ(s₈, 5)
= s₈

s₈ ∈ F, and therefore 3.1415 ∈ L(R).
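The table for R is easily carried into software. The following C sketch (the function names is_real_constant and column are ours) collapses the ten digit columns into one, since every digit behaves identically; mapping characters outside Σ into the period column is an assumption of this sketch, not part of the definition of R.

#include <stdio.h>
#include <ctype.h>

/* States s0..s8 of the DFA R from Example 1.11; s7 is the dead state.
   Columns: 0 = digit, 1 = sign (+ or -), 2 = E, 3 = period. */
enum { S0, S1, S2, S3, S4, S5, S6, S7, S8 };
static const int delta[9][4] = {
    /*        digit sign  E    .  */
    /* s0 */ { S2,  S1,  S7,  S7 },
    /* s1 */ { S2,  S7,  S7,  S7 },
    /* s2 */ { S2,  S7,  S7,  S3 },
    /* s3 */ { S8,  S7,  S7,  S7 },
    /* s4 */ { S5,  S6,  S7,  S7 },
    /* s5 */ { S5,  S7,  S7,  S7 },
    /* s6 */ { S5,  S7,  S7,  S7 },
    /* s7 */ { S7,  S7,  S7,  S7 },
    /* s8 */ { S8,  S7,  S4,  S7 }
};

static int column(char c)
{
    if (isdigit((unsigned char) c)) return 0;
    if (c == '+' || c == '-') return 1;
    if (c == 'E') return 2;
    return 3;                       /* '.' and anything else */
}

int is_real_constant(const char *w)
{
    int s = S0;
    for (; *w; ++w)
        s = delta[s][column(*w)];
    return s == S2 || s == S3 || s == S5 || s == S8;   /* F */
}

int main(void)
{
    printf("%d %d %d\n",
           is_real_constant("3.1415"),    /* 1 */
           is_real_constant("1.0E-32"),   /* 1 */
           is_real_constant(".01"));      /* 0 */
    return 0;
}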

NOT gate       AND gate        OR gate         NAND gate       NOR gate
p  ¬p          p  q  p∧q       p  q  p∨q       p  q  p↑q       p  q  p↓q
1  0           1  1   1        1  1   1        1  1   0        1  1   0
0  1           1  0   0        1  0   1        1  0   1        1  0   0
               0  1   0        0  1   1        0  1   1        0  1   0
               0  0   0        0  0   0        0  0   1        0  0   1

Figure 1.14: Common logic gates and their truth tables

While many important classes of strings such as numerical constants (Example 1.11) and identifiers
(Example 1.9) are FAD, not all languages that can be described by BNF can be recognized by DFAs. These
limitations will be investigated in Chapters 8 and 9, and a more capable type of automaton will be defined
in Chapter 10.

1.4 Circuit Implementation of Finite Automata


Now that we have described the mathematical nature of deterministic finite automata, let us turn to
the physical implementation of such devices. We will investigate the sort of physical components that
actually go into the “brain” of, say, a vending machine. Recall that the basic building blocks of digital logic
circuits are logic gates; using 0 or False to represent a low voltage (ground) and 1 or True to represent a
higher voltage (often +5 volts), the basic gates have the truth tables shown in Figure 1.14.
Since our DFA will examine one letter at a time, we will generally need some type of timing mecha-
nism, which will be regulated by a clock pulse; we will read one letter per pulse and allow enough interim
time for transient signals to propagate through our network as we change states and move to the next
letter on the input tape. The clock pulse will alternate between high and low voltages, as shown in Figure
1.15. For applications such as vending machines, the periodic clock pulse would be replaced by a device
that pulsed whenever a new input (such as the insertion of a coin) was detected.
We need to retain the present status of the network (current state, letter, and so forth) as we move on to the next input symbol. This is achieved through the use of a D flip-flop (D stands for data or delay), which uses NAND gates and the clock signal to store the current value of, say, p′, between clock pulses. The symbol for a D flip-flop (sometimes called a latch) is shown in Figure 1.16, along with the actual gates that comprise the circuit.
The output, p and ¬p, will reflect the value of the input signal p′ only after the high clock pulse is received and will retain that value after the clock drops to low (even if p′ subsequently changes) until the next clock pulse comes along, at which time the output will reflect the new current value of p′. This is best illustrated by referring to the NAND truth table and tracing the changes in the circuit. Begin with clock = p = p′ = 0 and ¬p = 1, and verify that the circuit is stable. Now assume that p′ changes to 1, and note that, although some internal values may change, p and ¬p remain at 0 and 1, respectively; the old value of p′ has been “remembered” by the D flip-flop. Contrast this with the behavior when we strobe the clock: assume that the clock now also changes to 1 so that we now have clock = p′ = 1 and p = 0. When the signal propagates through the network, we find that p and ¬p have changed to reflect the new value of p′: clock = p = p′ = 1, and ¬p = 0.
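The stated behavior can be summarized in a few lines of C. This is only a behavioral sketch (the type and function names are ours), not a gate-level model: while the clock is high the stored value follows p′, and while the clock is low the old value is retained, whatever p′ does in the meantime.

#include <stdio.h>

typedef struct { int p; } d_latch;   /* the remembered output p */

int latch_step(d_latch *L, int clock, int p_in)
{
    if (clock)
        L->p = p_in;   /* strobe: capture the current value of p' */
    return L->p;       /* between pulses the old value is remembered */
}

int main(void)
{
    d_latch L = { 0 };
    latch_step(&L, 0, 1);    /* p' changes while the clock is low ... */
    printf("%d\n", L.p);     /* ... so p is still 0                   */
    latch_step(&L, 1, 1);    /* the clock strobes high                */
    printf("%d\n", L.p);     /* p now reflects p' = 1                 */
    return 0;
}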

Figure 1.15: A typical clock pulse pattern for latched circuits

We will also have to represent the letters of our input alphabet by high and low voltages (that is, combinations of 0s and 1s). The ASCII alphabet, for example, is quite naturally represented by 8 bits, a₁a₂a₃a₄a₅a₆a₇a₈, where B, for example, has the bit pattern 01000010 (binary 66). One of these bit patterns should be reserved for indicating the end of our input string <EOS>. Our convention will be to reserve binary zero for this role, which means our ASCII end of string symbol would be 00000000 (or NULL). In actual applications using the ASCII alphabet, however, a more appropriate choice for <EOS> might be 00001101 (a carriage return) or 00001010 (a line feed) or 00100000 (a space).
Our alphabets are likely to be far smaller than the ASCII character set, and we will hence need fewer than 8 bits of information to encode our letters. For example, if Σ = {b, c}, 2 bits, a₁ and a₂, will suffice. Our choice of encoding could be 00 = <EOS>, 01 = b, 10 = c, and 11 is unused.
In a similar fashion, we must encode state names. A machine with S = {r₀, r₁, r₂, r₃, r₄, r₅} would need 3 bits (denoted by t₁, t₂, and t₃) to represent the six states. The most natural encoding would be r₀ = 000, r₁ = 001, r₂ = 010, r₃ = 011, r₄ = 100, and r₅ = 101, with the combinations 110 and 111 left unused.
Finally, a mechanism for differentiating between final and nonfinal states must be implemented (al-
though this need not be engaged until the <EOS> symbol is encountered). Recall that we must illuminate
the “acceptance light” if the machine terminates in a final state and leave it unlit if the string on the input
tape is instead rejected by the DFA. A second “rejection light” can be added to the physical model, and
exactly one of the two will light when <EOS> is scanned by the input head.

Example 1.12

When building a logical circuit from the definition of a DFA, we will find it convenient to treat <EOS> as an input symbol, and define the state transition function for it by (∀s ∈ S)(δ(s, <EOS>) = s). Thus, the DFA in Figure 1.17a should be thought of as shown in Figure 1.17b. As we have only two states, a single state bit will suffice, representing s₀ by t₁ = 0 and s₁ by t₁ = 1. Since Σ = {b, c}, we will again use 2 bits, a₁ and a₂, to represent the input symbols. As before, 00 = <EOS>, 01 = b, 10 = c, and 11 is unused.

Figure 1.16: (a) A data flip-flop or latch (b) The circuitry for a D flip-flop

Figure 1.17: (a) The DFA discussed in Example 1.12 (b) The expanded state transition diagram for the
DFA implemented in Figure 1.18

Determining the state transition function will require knowledge of the current state (represented by the status of t₁) and the current input symbol (represented by the pair of bits a₁ and a₂). These three input values will allow the next state t₁′ to be calculated. From the δ function, we know that

δ(s₀, b) = s₀
δ(s₀, c) = s₁
δ(s₁, b) = s₀
δ(s₁, c) = s₀

These specifications correspond to the following four rows of the truth table for t₁′:

t₁ a₁ a₂  t₁′                        state  input    next state
0  0  1   0                          s₀     01 = b   s₀
0  1  0   1    which represents      s₀     10 = c   s₁
1  0  1   0                          s₁     01 = b   s₀
1  1  0   0                          s₁     10 = c   s₀

Adding the state transitions for <EOS> and using * to represent the outcome for the two rows corresponding to the unused combination a₁a₂ = 11 fills out the eight rows of the complete truth table, as shown in Table 1.2.

Table 1.2:

t₁ a₁ a₂ t₁′
0 0 0 0
0 0 1 0
0 1 0 1
0 1 1 *
1 0 0 1
1 0 1 0
1 1 0 0
1 1 1 *

If we arbitrarily assume that the two don't-care combinations (*) are zero, the principal disjunctive normal form of t₁′ contains just two terms: (¬t₁ ∧ a₁ ∧ ¬a₂) ∨ (t₁ ∧ ¬a₁ ∧ ¬a₂). It is profitable to reassign the don't-care value in the fourth row to 1, since the expression can then be shortened to (¬t₁ ∧ a₁) ∨ (t₁ ∧ ¬a₁ ∧ ¬a₂) by applying standard techniques for minimizing Boolean functions. Incorporating this into a feedback loop with a D flip-flop provides the heart of the digital logic circuit representing the DFA, as shown in Figure 1.18.
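It is easy to check mechanically that the minimized expression agrees with Table 1.2 on every row that is not a don't-care. The following C fragment performs this check; the table encoding, with −1 marking the don't-care rows, is our own convention.

#include <stdio.h>

int main(void)
{
    /* the t1' column of Table 1.2, indexed by (t1, a1, a2);
       -1 marks the two don't-care rows a1 = a2 = 1 */
    int table[2][2][2] = { { {0, 0}, {1, -1} },
                           { {1, 0}, {0, -1} } };

    for (int t1 = 0; t1 <= 1; t1++)
        for (int a1 = 0; a1 <= 1; a1++)
            for (int a2 = 0; a2 <= 1; a2++) {
                int minimized = (!t1 && a1) || (t1 && !a1 && !a2);
                if (table[t1][a1][a2] >= 0 && minimized != table[t1][a1][a2])
                    printf("mismatch at %d%d%d\n", t1, a1, a2);
            }
    printf("done\n");   /* only "done" is printed: the forms agree */
    return 0;
}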
The accept portion of the circuitry ensures that we do not indicate acceptance when passing through
the final state; it is only activated when we are in a final state while scanning the <EOS> symbol. Similarly,
the reject circuitry can only be activated when the <EOS> symbol is encountered. When there are several
final states, this part of the circuitry becomes correspondingly more complex. It is instructive to follow
the effect a string such as bcc has on the above circuit. Define aᵢ(j) as the jth value the bit aᵢ takes on as the string bcc is processed; that is, aᵢ(j) is the value of aᵢ during the jth clock pulse. We then have

Figure 1.18: The circuitry implementing the DFA discussed in Example 1.12

a₁(1) = 0   a₂(1) = 1   ⇒ b
a₁(2) = 1   a₂(2) = 0   ⇒ c
a₁(3) = 1   a₂(3) = 0   ⇒ c
a₁(4) = 0   a₂(4) = 0   ⇒ <EOS>

Trace the circuit through four clock pulses (starting with t 1 = 0 ), and observe the current values that
t 1 assumes, noting that it corresponds to the appropriate state of the machine as each input symbol is
scanned.

Note that a six-state machine would require more, and substantially larger, truth tables. Since a state encoding would now need to specify t₁, t₂, and t₃, three different truth tables (for t₁′, t₂′, and t₃′) must be constructed to predict the next state transition. More significantly, the input variables would include t₁, t₂, t₃, a₁, and a₂, making each table 32 rows long. Three D flip-flop feedback loops would be necessary to store the three values t₁, t₂, and t₃.

Also, physical logic circuits of this type have the disconcerting habit of initializing to some random configuration the first time power is applied to the network. A true working model would thus need a reset circuit to initialize each tᵢ to 0 in order to ensure that the machine started in state s₀. Slightly more complex set-reset flip-flops can be used to provide a hardware solution to this problem. However, a simple algorithmic solution would require the input tape to have a leading start-of-string symbol <SOS>. The definition of the state transition function should be expanded so that scanning the <SOS> symbol from any state will automatically transfer control to s₀. We will adopt the convention that <SOS> will be represented by the highest binary code; in ASCII, for example, this would be 11111111, while in the preceding example it would be 11. To promote uniformity in the exercises, it is suggested that <SOS> should always be given the highest binary code and <EOS> be represented by binary zero; as in the examples given here, the symbols in Σ should be numbered sequentially according to their natural alphabetical order. In a similar fashion, numbered states should be given their corresponding binary codes. The reader should note, however, that other encodings might result in less complex circuitry.

Figure 1.19: The DFA discussed in Example 1.13

Example 1.13
As a more complex example of automaton circuitry, consider the DFA displayed in Figure 1.19. Two flip-flops t₁ and t₂ will be necessary to represent the three states, most naturally encoded as s₀ = 00, s₁ = 01, s₂ = 10, with 11 unused. Employing both <SOS> and <EOS> encodings yields the DFA in Figure 1.20. Note that we must account for the possibility that the circuitry might be randomly initialized to t₁ = 1 and t₂ = 1; we must ensure that scanning the <SOS> symbol moves us back into the “real” part of the machine. Two bits of information (a₁ and a₂) are also needed to describe the input symbols. Following our conventions, we assign <EOS> = 00, a = 01, b = 10, and <SOS> = 11. The truth table for both the transition function and the conditions for acceptance is given in Table 1.3.
In the first row, t₁ = 0 and t₂ = 0 indicate state s₀, while a₁ = 0 and a₂ = 0 denote the <EOS> symbol. Since δ(s₀, <EOS>) = s₀, t₁′ = 0 and t₂′ = 0. We do not want to accept a string that ends in s₀, so accept = 0 also. The remaining rows are determined similarly. The (nonminimized) circuitry for this DFA is shown in Figure 1.21.

1.5 Applications of Finite Automata


In this chapter we have described the simplest form of finite automaton, the DFA. Other forms of au-
tomata, such as nondeterministic finite automata, pushdown automata, and Turing machines, are in-
troduced later in the text. We close this chapter with three examples to motivate the material in the
succeeding chapters.
When presenting automata in this chapter, we made no effort to construct the minimal machine. A
minimal machine for a given language is one that has the least number of states required to accept that
language.

Example 1.14
In Example 1.5, the vending machine kept track of the amount of change that had been deposited up to
50¢. Since the candy bars cost only 30¢, there is no need to count up to 50¢. In this sense, the machine
is not optimal, since a less complex machine can perform the same task, as shown in Figure 1.22. The

Table 1.3:

t₁ t₂ a₁ a₂ t₁′ t₂′ accept
0 0 0 0 0 0 0
0 0 0 1 0 1 0
0 0 1 0 1 0 0
0 0 1 1 0 0 0
0 1 0 0 0 1 1
0 1 0 1 1 0 0
0 1 1 0 0 0 0
0 1 1 1 0 0 0
1 0 0 0 1 0 1
1 0 0 1 0 0 0
1 0 1 0 0 1 0
1 0 1 1 0 0 0
1 1 0 0 * * *
1 1 0 1 * * *
1 1 1 0 * * *
1 1 1 1 0 0 0

Figure 1.20: The expanded state transition diagram for the DFA implemented in Figure 1.21

Figure 1.21: The circuitry implementing the DFA discussed in Example 1.13

corresponding quintuple is 〈{n, d, q}, {s₀, s₅, s₁₀, s₁₅, s₂₀, s₂₅, s₃₀}, s₀, δ, {s₃₀}〉, where for each state sᵢ, δ is defined by

δ(sᵢ, n) = s_min{30, i+5}
δ(sᵢ, d) = s_min{30, i+10}
δ(sᵢ, q) = s_min{30, i+25}

Note that the higher-numbered states in Example 1.5 were all effectively “remembering” the same
thing, that enough coins had been deposited. These final states have been coalesced into a single final
state to produce the more efficient machine in Figure 1.22. In the next two chapters, we develop the the-
oretical background and algorithms necessary to construct from an arbitrary DFA the minimal machine
that accepts the same language.
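Because the states of the improved machine are just the capped totals, its transition function can be written down directly. The C sketch below (the function name step is ours) implements δ(sᵢ, coin) = s_min{30, i+value} and accepts exactly when the total reaches 30¢.

#include <stdio.h>

/* One transition of the vending machine of Example 1.14: the state
   is the number of cents deposited so far, capped at 30. */
int step(int state, char coin)
{
    int value = (coin == 'n') ? 5 : (coin == 'd') ? 10 : 25;
    int next = state + value;
    return next > 30 ? 30 : next;
}

int main(void)
{
    const char *coins = "ndq";   /* a nickel, a dime, then a quarter */
    int s = 0;                   /* start state s0 */
    for (int i = 0; coins[i]; i++)
        s = step(s, coins[i]);
    printf("%d %s\n", s, s == 30 ? "accept" : "reject");  /* 30 accept */
    return 0;
}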
As another illustration of the utility of concepts relating to finite-state machines, we will consider the
formalism used by many text editors to search for a particular target string pattern in a text file. To find
ababb in a file, for example, a naive approach might consist of checking whether the first five characters
of the file fit this pattern, and next checking characters 2 through 6 to find a match, and so on. This
results in examining file characters more than once; it ought to be possible to remember past values, and
avoid such duplication. Consider the text string aabababbb. By the time the fifth character is scanned, we have matched the first four characters of ababb. Unfortunately, a, the sixth character of aabababbb, does not produce the final match; however, since characters 4, 5, and 6 (aba) now match the first three characters of the target string, it does allow for the possibility of characters 4 through 8 matching (as is indeed the case in this example). This leads to a general rule: If we have matched the first four letters of

Figure 1.22: The automaton discussed in Example 1.14

Figure 1.23: A DFA that accepts strings containing ababb

the target string, and the next character happens to be a (rather than the desired b ), we must remember
that we have now matched the first three letters of the target string.
“Rules” such as these are actually the state transitions in the DFA given in the next example. State sᵢ represents having matched the first i characters of the target string, and the rule developed above is succinctly stated as δ(s₄, a) = s₃.

Example 1.15
A DFA that accepts all strings that contain ababb as a substring is displayed in Figure 1.23. The corresponding quintuple is

〈{a, b}, {s₀, s₁, s₂, s₃, s₄, s₅}, s₀, δ, {s₅}〉,

where δ is defined by

δ   a   b
s₀  s₁  s₀
s₁  s₁  s₂
s₂  s₃  s₀
s₃  s₁  s₄
s₄  s₃  s₅
s₅  s₅  s₅

It is a worthwhile exercise to test the operation of this DFA on several text strings and verify that the automaton is indeed in state sᵢ exactly when it has matched the first i characters of the target string. Note that if we did not care what the third character of the substring was (that is, if we were searching for occurrences of ababb or abbbb), a trivial modification of the above machine would allow us to search for both substrings at once, as shown in Figure 1.24. The corresponding quintuple is 〈{a, b}, {s₀, s₁, s₂, s₃, s₄, s₅}, s₀, δ, {s₅}〉, where δ is defined by

δ   a   b
s₀  s₁  s₀
s₁  s₁  s₂
s₂  s₃  s₃
s₃  s₁  s₄
s₄  s₃  s₅
s₅  s₅  s₅

Figure 1.24: A DFA that accepts strings that contain either ababb or abbbb

In this case, we required one letter between the initial part of the search string (ab) and the terminal part (bb). It is possible to modify the machine to accept strings that contain ab, followed by any number of letters, followed by bb. This type of machine would be useful for identifying comments in many programming languages. For example, a Pascal comment is essentially of the form (∗, followed by most combinations of letters, followed by the first occurrence of ∗).
It should be noted that the machine in Example 1.15 is highly specialized and tailored for the specific string ababb; other target strings would require completely different recognizers. While it appears to require much thought to generate the appropriate DFA for a given string, we will see how the tools presented in Chapter 4 can be used to automate the entire process.
Example 1.15 indicates how automata can be used to guide the construction of software for matching
designated patterns. Finite-state machines are also useful in designing hardware that detects designated
sequences. Example 4.7 will explore a communications application, and the following discussion illus-
trates how these concepts can be applied to help evaluate the performance of computers.
A computer program is essentially a linear list of machine instructions, stored in consecutive mem-
ory locations. Each memory location holds a sequence of bits that can be thought of as words comprised
of 0 s and 1 s. Different types of instructions are represented by different patterns of bits. The CPU se-
quentially fetches these instructions and chooses its next action by examining the incoming bit pattern
to determine the type of instruction that should be executed. The sequences of bits that encode the
instruction type are called opcodes.
Various performance advantages can be attained when one part of the CPU prefetches the next in-
struction while another part executes the current instruction. However, computers must have the ca-
pability of altering the order in which instructions are executed; branch instructions allow the CPU to
avoid the anticipated next instruction and instead begin executing the instructions stored in some other

area of memory. When a branch occurs, the prefetched instruction will generally need to be replaced by
the proper instruction from the new area of memory. The consequent delay can degrade the speed with
which instructions are executed.
Irrespective of prefetching problems, it should be clear that a branch instruction followed immedi-
ately by another branch instruction is inefficient. If a CPU is found to be regularly executing two or more
consecutive branch instructions, it may be worthwhile to consider replacing such series of branches with
a single branch to the ultimate destination [FERR]. Such information would be determined by monitor-
ing the instruction stream and searching for patterns that represented consecutive branch opcodes. This
activity is essentially the pattern recognition problem discussed in Example 1.15.
It is unwise to try to collect the data representing the contents of the instruction stream on secondary
storage so that it can be analyzed later. The volume of information and the speed with which it is gener-
ated preclude the collection of a sufficiently large set of data points. Instead, the preferred solution uses
a specially tailored piece of hardware to monitor the contents of the CPU opcode register and increment
a hardware counter each time the appropriate patterns are detected. The heart of this monitor can be
built by transforming the appropriate automaton into the corresponding logic circuitry, as outlined in
Section 1.4. Unlike the automaton in Example 1.15, the automaton model for this application would
allow transitions out of the final state, so that it may continue to search for successive patterns. The re-
sulting logic circuitry would accept as input the bit patterns currently present in the opcode register, and
send a pulse to the counter mechanism each time the accept circuitry was energized.
Note that in this case we would not want to inhibit the accept circuitry by requiring an <EOS> symbol
to be scanned. Indeed, we want the light on our conceptual black box to flicker as we process the data,
since we are intent on counting the number of times it flickers during the course of our monitoring.
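A software analogue of this hardware monitor is easy to sketch. The C fragment below counts occurrences of ababb in a character stream using the transition table of Example 1.15, with one modification of the kind just described: instead of looping forever, the final state s₅ is redirected (here to δ(s₅, a) = s₁ and δ(s₅, b) = s₀, our choice, which counts nonoverlapping matches), and a counter is incremented each time s₅ is entered.

#include <stdio.h>

/* States s0..s5 from Example 1.15; column 0 is 'a', column 1 is 'b'. */
enum { S0, S1, S2, S3, S4, S5 };
static const int delta[6][2] = {
    { S1, S0 },   /* s0 */
    { S1, S2 },   /* s1 */
    { S3, S0 },   /* s2 */
    { S1, S4 },   /* s3 */
    { S3, S5 },   /* s4 */
    { S1, S0 }    /* s5: keep searching instead of looping */
};

/* Count how many times ababb occurs in text. */
int count_matches(const char *text)
{
    int state = S0, count = 0;
    for (; *text; ++text) {
        state = delta[state][*text == 'b'];
        if (state == S5)
            count++;       /* the "acceptance light" flickers here */
    }
    return count;
}

int main(void)
{
    printf("%d\n", count_matches("aabababbbababb"));   /* prints 2 */
    return 0;
}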

Example 1.16
We close this chapter with an illustration of the manner in which computational algorithms can prof-
itably use the automaton abstraction. Network communications between independent processors are
governed by a protocol that implements a finite state control [TANE]. The Kermit protocol, developed
at Columbia University, was widely employed to communicate between processors and is still most of-
ten used for its original purpose: to transfer files between micros and mainframes [DACR]. During a file
transfer, the send portion of Kermit on the source host is responsible for delivering data to the receive
portion of the Kermit process on the destination host. The receive portion of Kermit reacts to incoming
data in much the same way as the machines presented in this chapter. The receive program starts in a
state of waiting for a transfer request (in the form of an initialization packet) to signal the commence-
ment of a file transfer (state R in Figure 1.25). When such a packet is received, Kermit transitions to the RF
state, where it awaits a file-header packet (which specifies the name of the file about to be transferred).
Upon receipt of the file-header packet, it enters the RD state, where it processes a succession of data
packets (which comprise the body of the file being transferred). An EOF packet should arrive after all the
data are sent, which can then be followed by another file-header packet (if there is a sequence of files to
be transferred) or by a break packet (if the transfer is complete). In the latter case, Kermit reverts to the
start state R and awaits the next transfer request. The send portion of the Kermit process on the source
host follows the behavior of a slightly more complex automaton. The state transition diagram given in
Figure 1.25 succinctly describes the logic of the receive portion of the Kermit protocol; for simplicity,
timeouts and error conditions are not reflected in the diagram. The input alphabet is {B,D,Z,H,S}, where
B represents a break, D is a data packet, Z is EOF, H is a file-header packet, and S is a send-intention
packet. The state set is {A,R,RF,RD}, where A denotes the abort state, R signifies receive, RF is receive
file-header, and RD is receive data.

Figure 1.25: The state transition diagram for the receive portion of Kermit, as discussed in Example 1.16

Note that unexpected packets (such as a data packet received in the
start state R or a break packet received when data packets are expected in state RD) cause a transition to
the abort state A.
In actuality, the receive protocol does more than just observe the incoming packets; Kermit sends an
acknowledgment (ACK or NAK) of each packet back to the source host. Receipt of the file header should
also cause an appropriate file to be created and opened, and each succeeding data packet should be
verified and its contents placed sequentially in the new file. A machine model that incorporates actions
in response to input is the subject of Chapter 7, where automata with output are explored.
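
Since the receive logic is nothing more than a finite state control, it can be rendered directly as a transition table. The Python sketch below is an illustration of Figure 1.25 as described in the prose, not of the full protocol: acknowledgments, timeouts, and error conditions are omitted, and the placement of the break (B) transition out of the RF state is inferred from the discussion above.

    # Table-driven sketch of the receive portion of Kermit (Example 1.16).
    # States: R (receive), RF (await file-header), RD (receive data),
    # A (abort). Inputs: S, H, D, Z, B, as defined in the text.
    TRANS = {
        ('R',  'S'): 'RF',   # transfer request starts a session
        ('RF', 'H'): 'RD',   # file-header packet opens a file
        ('RF', 'B'): 'R',    # break packet: transfer complete
        ('RD', 'D'): 'RD',   # data packets form the body of the file
        ('RD', 'Z'): 'RF',   # EOF: await the next header or a break
    }

    def receive(packets):
        state = 'R'
        for p in packets:
            state = TRANS.get((state, p), 'A')   # unexpected packet: abort
            if state == 'A':
                break
        return state

    print(receive("SHDDDZB"))   # 'R': one file transferred cleanly
    print(receive("SHDDB"))     # 'A': break arrived while data was expected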
EXERCISES
1.1. Recall how we defined δ in this chapter:

(∀s ∈ S)(∀a ∈ Σ)  δt(s, a) = δ(s, a)
(∀s ∈ S)  δt(s, λ) = s
(∀s ∈ S)(∀x ∈ Σ∗)(∀a ∈ Σ)  δt(s, ax) = δt(δ(s, a), x)

δ, here denoted δt, was tail recursive; that is, all of the recursion takes place on the tail (the end) of the string. Let us now define an alternative extended transition function, δh, as follows:

(∀s ∈ S)(∀a ∈ Σ)  δh(s, a) = δ(s, a)
(∀s ∈ S)  δh(s, λ) = s
(∀s ∈ S)(∀a ∈ Σ)(∀x ∈ Σ∗)  δh(s, xa) = δ(δh(s, x), a)

It is clear from the definition of δh that all the recursion takes place at the head of the string. For this reason, δh is called head recursive. Show that the two definitions result in the same extension of δ, that is, prove by mathematical induction that

(∀s ∈ S)(∀x ∈ Σ∗)(δt(s, x) = δh(s, x))
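
(A numerical experiment may help before attempting the proof. The Python sketch below, with an arbitrarily chosen transition table, implements both recursions and confirms that they agree on all short strings; this is evidence, of course, and not a substitute for the induction.)

    from itertools import product

    # An arbitrary DFA transition function to experiment with.
    delta = {('s0', 'a'): 's1', ('s0', 'b'): 's0',
             ('s1', 'a'): 's0', ('s1', 'b'): 's1'}

    def delta_t(s, x):            # tail recursive: consume the first letter
        if x == '':
            return s
        return delta_t(delta[(s, x[0])], x[1:])

    def delta_h(s, x):            # head recursive: consume the last letter
        if x == '':
            return s
        return delta[(delta_h(s, x[:-1]), x[-1])]

    for n in range(7):            # compare on every string of length <= 6
        for w in map(''.join, product('ab', repeat=n)):
            assert delta_t('s0', w) == delta_h('s0', w)
    print("delta_t and delta_h agree on all strings of length <= 6")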

1.2. Consider Example 1.14. The vending machine accepts coins as input, but if you change your mind
(or find you do not have enough change), it will not refund your money. Modify this example to have
another input, <coin-return>, which is represented by r and which will conceptually return all your
coins.
1.3. (a) Specify the quintuple corresponding to the DFA displayed in Figure 1.26.
(b) Describe the language defined by the DFA displayed in Figure 1.26.

Figure 1.26: The automaton discussed in Exercise 1.3

1.4. Construct a state transition diagram and enumerate all five parts of a deterministic finite automaton A = 〈{a, b, c}, S, s0, δ, F〉 such that

L(A) = {x | |x| is a multiple of 2 or 3}.

1.5. Let Σ = {0, 1}. Construct deterministic finite automata that will accept each of the following languages, if possible.

(a) L1 = {x | |x| mod 7 = 4}
(b) L2 = Σ∗ − {w | ∃n ≥ 1 ∋ w = a1 . . . an ∧ an = 1}
(c) L3 = {y | |y|0 = |y|1}

1.6. Let Σ = {a, b}.

(a) Construct deterministic finite automata A1, A2, A3, and A4 such that:

i. L(A1) = {x | (|x|a is odd) ∧ (|x|b is even)}
ii. L(A2) = {y | (|y|a is even) ∨ (|y|b is odd)}
iii. L(A3) = {z | (|z|a is even) ⊕ (|z|b is even)} (⊕ represents exclusive-or)
iv. L(A4) = {z | |z|a is even}

(b) How does the structure of each of these machines relate to the one defined in Example 1.10?

1.7. Modify the machine M defined in Example 1.10 so that the language accepted by the machine consists of strings x ∈ {a, b}∗ where both |x|a and |x|b are even and |x| > 0; that is, the new machine should accept L(M) − {λ}.

1.8. Let M = 〈Σ, S, s 0 , δ, F 〉 be an (arbitrary) DFA that accepts the language L(M ). Write down a general
procedure for modifying this machine so that it will accept L(M ) − {λ}. (Specify the five parts of the
new machine and justify your statements.) It may be helpful to do this for a specific machine (as in
Exercise 1.7) before attempting the general case.

1.9. Let M = 〈Σ, S, s 0 , δ, F 〉 be an (arbitrary) DFA that accepts the language L(M ). Write down a general
procedure for modifying this machine so that it will accept L(M ) ∪ {λ}. (Specify the five parts of the
new machine and justify your statements.)

1.10. Let Σ = {a, b, d} and Ψ = {x ∈ Σ∗ | (x begins with d) ∨ (x contains two consecutive b s)}.

(a) Draw a machine that will accept Ψ.
(b) Formally specify the five parts of the DFA from part (a).

1.11. Let Σ = {a, b, c} and Φ = {x ∈ Σ∗ | every b in x is immediately followed by c}.

(a) Draw a machine that will accept Φ.
(b) Formally specify the five parts of the DFA from part (a).

1.12. Let Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Consider the base 10 numbers formed by strings from Σ∗: 14 represents fourteen, the three-digit string 205 represents two hundred and five, and so on. Let Ω = {x ∈ Σ∗ | the number represented by x is evenly divisible by 7} = {λ, 0, 00, 000, . . . , 7, 07, 007, . . . , 14, 21, 28, 35, . . .}.

(a) Draw a machine that will accept Ω.
(b) Formally specify the five parts of the DFA from part (a).

1.13. Let Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Let Γ = {x ∈ Σ∗ | the number represented by x is evenly divisible by 3}.

(a) Draw a three-state machine that will accept Γ.
(b) Formally specify the five parts of the DFA from part (a).

1.14.

1.15. Let Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Let K = {x ∈ Σ∗ | the number represented by x is evenly divisible by 5}.

(a) Draw a five-state DFA that accepts K.
(b) Formally specify the five parts of the DFA from part (a).
(c) Draw a two-state DFA that accepts K.
(d) Formally specify the five parts of the DFA from part (c).

1.16. Let Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Draw a DFA that accepts the first eight primes.

1.17. (a) Find all ten combinations of u, v, and w such that uvw = cab (one such combination is u = c, v = λ, w = ab).
(b) In general, if x is of length n, and uvw = x, how many distinct combinations of u, v, and w will satisfy this constraint?

1.18. Let Σ = {a, b} and Ξ = {x ∈ Σ∗ | x contains (at least) two consecutive b s ∧ x does not contain two consecutive a s}. Draw a machine that will accept Ξ.

1.19. The FORTRAN identifier recognizer in Example 1.9 accepted all alphabetic words, including those like DO, DATA, END, and STOP, which have different uses in FORTRAN. Modify Figure 1.11 to produce a DFA that will reject the words DO and DATA while still accepting all other valid FORTRAN identifiers.

1.20. Consider the machine defined in Example 1.11. This machine accepts most real number constants
in scientific notation. However, this machine does have some (possibly desirable) limitations. These
limitations include requiring that a 0 precede the decimal point when specifying a number with a
mantissa less than 1.

(a) Modify Figure 1.13 so that it will accept the set of real-number constants described by the following BNF.

<sign> ::= + | −
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<natural> ::= <digit> | <digit><natural>
<integer> ::= <natural> | <sign><natural>
<real constant> ::= <integer>
                  | <integer>.
                  | .<natural>
                  | <sign>.<natural>
                  | .<natural>E<integer>
                  | <sign>.<natural>E<integer>
                  | <integer>.<natural>
                  | <integer>.<natural>E<integer>

(b) Write a program in your favorite programming language to implement the automaton derived in part (a). The program should read a line of text and state whether or not the word on that line was accepted.

1.21. Show that part (i) of Definition 1.11 is implied by parts (ii) and (iii) of that definition.

1.22. Develop a more succinct description of the transition function given in Example 1.9 (compare with
the description in Example 1.10).

1.23. Let the universal set be {a, b}∗. Give an example of

(a) A finite set.
(b) A cofinite set.
(c) A set that is neither finite nor cofinite.

1.24. Consider the DFA given in Figure 1.27.

(a) Specify the quintuple for this machine.
(b) Describe the language defined by this machine.

1.25. Consider the set consisting of the names of everyone in China. Is this set a FAD language?

1.26. Consider the set of all legal infix arithmetic expressions over the alphabet {A, B, +, −, ∗, /} without parentheses (assume normal precedence rules apply). Is this set a FAD language? If so, draw the machine.

Figure 1.27: The DFA discussed in Exercise 1.24

1.27. Consider an arbitrary deterministic finite automaton M .

(a) What aspect of the machine determines whether λ ∈ L(M)?
(b) Specify a condition that would guarantee that L(M) = Σ∗.
(c) Specify a condition that would guarantee that L(M) = ∅.

1.28. Construct deterministic finite automata to accept each of the following languages.

(a) {x ∈ {a, b, c}∗ | abc is a substring of x}
(b) {x ∈ {a, b, c}∗ | acaba is a substring of x}

1.29. Consider Example 1.14. The vending machine had as input nickels, dimes, and quarters. When 30¢
had been deposited, a candy bar could be selected. Modify this machine to also accept pennies,
denoted by p, as an additional input. How does this affect the number of states in the machine?

1.30. (a) Describe the language defined by the following quintuple (compare with Figure 1.28).

Σ = {a, b}      δ(t0, a) = t0
S = {t0, t1}    δ(t0, b) = t1
s0 = t0         δ(t1, a) = t1
F = {t1}        δ(t1, b) = t0

(b) Rigorously prove the statement you made in part (a). Hint: First prove the inductive statement

P(n): (∀x ∈ Σ^n)((δ(t0, x) = t0 ⇔ |x|b is even) ∧ (δ(t0, x) = t1 ⇔ |x|b is odd)).

1.31. Consider a vending machine that accepts as input pennies, nickels, dimes, and quarters and dis-
penses 10¢ candy bars.

(a) Draw a DFA that models this machine.

Figure 1.28: The DFA discussed in Exercise 1.30

(b) Define the quintuple for this machine.
(c) How many states are absolutely necessary to build this machine?

1.32. Consider a vending machine that accepts as input nickels, dimes, and quarters and dispenses 10¢
candy bars.

(a) Draw a DFA that models this machine.
(b) How many states are absolutely necessary to build this machine?
(c) Using the standard encoding conventions, draw a circuit diagram for this machine (include
<EOS> but not <SOS> in the input alphabet).

1.33. Using the standard encoding conventions, draw a circuit diagram that will implement the machine
given in Exercise 1.29, as follows:

(a) Implements both <EOS> and <SOS>.
(b) Uses neither <EOS> nor <SOS>.

1.34. Using the standard encoding conventions, draw a circuit diagram that will implement the machine
given in Exercise 1.7, as follows:

(a) Implements both <EOS> and <SOS>.
(b) Uses neither <EOS> nor <SOS>.

1.35. Modify Example 1.12 so that it correctly handles the <SOS> symbol; draw the new circuit diagram.

1.36. Using the standard encoding conventions, draw a circuit diagram that will implement the machine
given in Example 1.6, as follows:

(a) Implements both <EOS> and <SOS>.
(b) Uses neither <EOS> nor <SOS>.

1.37. Using the standard encoding conventions, draw a circuit diagram that will implement the machine
given in Example 1.10, as follows:

(a) Implements both <EOS> and <SOS>.
(b) Uses neither <EOS> nor <SOS>.

1.38. Using the standard encoding conventions, draw a circuit diagram that will implement the machine
given in Example 1.14 (include <EOS> but not <SOS> in the input alphabet).

1.39. Using the standard encoding conventions, draw a circuit diagram that will implement the machine
given in Example 1.16; include the <SOS> and <EOS> symbols.

1.40. Let Σ = {a, b, c}. Let L = {x ∈ Σ∗ | |x|b = 2}.

(a) Draw a DFA that accepts L.
(b) Formally specify the five parts of a DFA that accepts L.

1.41. Draw a DFA accepting {x ∈ {a, b, c}∗ | every b in x is eventually followed by c}; that is, x might look like baabacaa, or bcacc, and so on.

1.42. Let Σ = {a, b}. Consider the language L consisting of all words that have neither consecutive a s nor consecutive b s.

(a) Draw a DFA that accepts this language.
(b) Formally specify the five parts of a DFA that accepts L.

1.43. Let Σ = {a, b, c}. Let L = {x ∈ Σ∗ | |x|a ≡ 0 mod 3}.

(a) Draw a DFA that accepts L.
(b) Formally specify the five parts of a DFA that accepts L.

1.44. Let Σ = {a, b, (, ∗, )}. Recall that a Pascal comment is essentially of the form (∗ followed by most combinations of letters followed by the first occurrence of ∗). While the appropriate alphabet for Pascal is the ASCII character set, for simplicity we will let Σ = {a, b, (, ∗, )}. Note that (∗b(∗b(a)b∗) is a single valid comment, since all characters prior to the first ∗) (including the second (∗) are considered part of the comment. Consequently, comments cannot be nested.

(a) Draw a DFA that recognizes all strings that contain exactly one valid Pascal comment (and no illegal portions of comments, as in aa(∗b∗)b(∗a), which should be rejected).
(b) Draw a DFA that recognizes all strings that contain zero or more valid (that is, unnested) Pascal comments. For example, a(∗b(∗bb∗)ba∗)aa and a(∗b are not valid, while a()a(∗∗)b(∗ab∗) is valid.

1.45. (a) Is the set of all postfix expressions over {A, B, +, −, ∗, /} with two or fewer operators a FAD language? If it is, draw a machine.
(b) Is the set of all postfix expressions over {A, B, +, −, ∗, /} with four or fewer operators a FAD language? If it is, draw a machine.
(c) Is the set of all postfix expressions over {A, B, +, −, ∗, /} with eight or fewer operators a FAD language? If it is, draw a machine.
(d) Do you think the set of all postfix expressions over {A, B, +, −, ∗, /} is a FAD language? Why or why not?

1.46. Let Σ = {a, b, c}. Consider the language consisting of all words that begin and end with different letters.

(a) Draw a DFA that accepts this language.
(b) Formally specify the five parts of a DFA that accepts this language.

1.47. Let Σ = {a, b, c}.

(a) Draw a DFA that rejects all words for which the last two letters match.
(b) Formally specify the five parts of the DFA.

1.48. Let Σ = {a, b, c}.

(a) Draw a DFA that rejects all words for which the first two letters match.
(b) Formally specify the five parts of the DFA.

1.49. Prove that the empty word is unique; that is, using the definition of equality of strings, show that if
x and y are empty words then x = y.

1.50. For any two strings x and y, show that |xy| = |x| + |y|.

1.51. (a) Draw the DFA corresponding to C = 〈{a, b, c}, {t0, t1}, t0, δ, {t1}〉, where

δ(t0, a) = t0    δ(t1, a) = t0
δ(t0, b) = t1    δ(t1, b) = t1
δ(t0, c) = t1    δ(t1, c) = t0

(b) Describe L(C).
(c) Using the standard encoding conventions, draw a circuit diagram for this machine (include <EOS> but not <SOS> in the input alphabet).

1.52. Let Σ = {I, V, X, L, C, D, M}. Recall that VVI is not considered to be a Roman numeral.

(a) Draw a DFA that recognizes strict-order Roman numerals; that is, 9 must be represented by VIIII rather than IX, and so on.
(b) Draw a DFA that recognizes the set of all Roman numerals; that is, 9 can be represented by IX, 40 by XL, and so on.
(c) Write a Pascal program based on your answer to part (b) that recognizes the set of all Roman numerals.

1.53. Describe the set of words accepted by the DFA in Figure 1.9.

1.54. Let Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Let Ln = {x ∈ Σ∗ | the sum of the digits of x is evenly divisible by n}. Thus,

L7 = {λ, 0, 7, 00, 07, 16, 25, 34, 43, 52, 59, 61, 68, 70, 77, 86, 95, 000, 007, . . .}.

(a) Draw a machine that will accept L7.
(b) Formally specify the five parts of the DFA given in part (a).
(c) Draw a machine that will accept L3.
(d) Formally specify the five parts of the DFA given in part (c).
(e) Formally specify the five parts of a DFA that will recognize Ln.
(e) Formally specify the five parts of a DFA that will recognize L n .

1.55. Consider the last row of Table 1.3. Unlike the preceding three rows, the outputs in this row are not
marked with the don’t-care symbol. Explain.

Chapter 2

Characterization of Finite Automaton Definable Languages

Programming languages can be thought of, in a limited sense, as conforming to the definition of a lan-
guage given in Chapter 1. We can consider a text file as being one long “word,” that is, a string of charac-
ters (including spaces, carriage returns, and so on). In this sense, each Pascal program can be thought of
as a single word over the ASCII alphabet. We might define the language Pascal as the set of all valid Pascal
programs (that is, the valid “words” are those text files that would compile with no compiler errors). This
and many other languages are too complicated to be represented by the machines described in Chapter
1. Indeed, even reliably matching an unlimited number of begin and end statements in a file is beyond
the capabilities of a DFA.
The goals for this chapter are to develop some tools for identifying these non-FAD languages and to
investigate the underlying structure of finite automaton definable languages. We begin with the explo-
ration of the relations that describe that structure.

2.1 Right Congruences

To characterize the structure of FAD languages, we will be dealing with relations over Σ∗ , that is, we
will relate strings to other strings. Recall that an equivalence relation must be reflexive, symmetric, and
transitive. The identity relation over Σ∗ , in which each string is related to itself but to no other string, is
an example of an equivalence relation.
The main tool we will need to understand which kinds of languages can be represented by finite au-
tomata is the concept of a right congruence. If we allow the set of all strings that terminate in some given
state to define an equivalence class, the states of a DFA naturally partition Σ∗ into equivalence classes (as
formally presented later in Definition 2.4). Due to the structure imposed on the machine, these classes
have special relationships that are not found in ordinary equivalence relations. For example, if δ(s, a) = t ,
then, given any word x in the class corresponding to the state s, appending an a to this word to form xa is
guaranteed to produce a word listed in the class corresponding to the state t . Right congruences, defined
below, allow us to break up Σ∗ in the same fashion that a DFA breaks up Σ∗ .

Definition 2.1 Given an alphabet Σ, a relation R between pairs of strings (R ⊆ Σ∗ × Σ∗ ) is a right congru-
ence in Σ∗ iff the following four conditions hold:

(∀x ∈ Σ∗) (x R x)   (R)
(∀x, y ∈ Σ∗) (x R y ⇒ y R x)   (S)
(∀x, y, z ∈ Σ∗) (x R y ∧ y R z ⇒ x R z)   (T)
(∀x, y ∈ Σ∗) (x R y ⇒ (∀u ∈ Σ∗)(xu R yu))   (RC)

Note that if P is a right congruence, then the first three conditions imply that P must be an equivalence relation; for example, if Σ = {a, b}, aa P aa by reflexivity, and if 〈abb, aba〉 ∈ P, then by symmetry 〈aba, abb〉 ∈ P, and so forth. Furthermore, if abb P aba, then the right congruence property guarantees that

abba P abaa   if u = a
abbb P abab   if u = b
abbaa P abaaa   if u = aa
abbbbaabb P ababbaabb   if u = bbaabb

and so on. Thus, the presence of just one ordered pair in P requires the existence of many, many more
ordered pairs. This might seem to make right congruences rather rare objects; there are, however, an
infinite number of them, many of them rather simple, as shown by the following examples.

Example 2.1
Let Σ = {a, b}, and let R be defined by x R y ⇔ |x| − |y| is even. It is easy to show that this R is an equivalence relation (see the exercises) and partitions Σ∗ into two equivalence classes: the even-length words and the odd-length words. Furthermore, R is a right congruence: for example, if x = abb and y = baabb, then abb R baabb, since |x| − |y| = 3 − 5 = −2, which is even. Note that for any choice of u, |xu| − |yu| will also be −2, and so abbu R baabbu for every choice of u. The same is true for any other pair of words x and y that are related by R, and so R is indeed a right congruence.
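
The defining quantity |x| − |y| is undisturbed when the same u is appended to both strings, which is precisely why (RC) holds. The Python sketch below spot-checks the (RC) condition on all short strings over {a, b}; like any finite test, it is merely supporting evidence for the proof requested in the exercises.

    from itertools import product

    def related(x, y):                     # x R y  iff  |x| - |y| is even
        return (len(x) - len(y)) % 2 == 0

    words = [''.join(w) for n in range(4) for w in product('ab', repeat=n)]

    # (RC): whenever x R y, xu R yu for every u (checked for short u).
    for x, y in product(words, repeat=2):
        if related(x, y):
            for u in words:
                assert related(x + u, y + u)
    print("(RC) holds on all tested pairs")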

Example 2.2
Let Σ = {a, b, c}, and let R2 be defined by x R2 y ⇔ x and y end with the same letter. It is straightforward to show that R2 is a right congruence (see the exercises) and partitions Σ∗ into four equivalence classes: those words ending in a, those words ending in b, those words ending in c, and {λ}.
The relation R2 was based on the placement of letters within words, while the relation in Example 2.1 was based solely on the length of the words. The following definition illustrates a way to produce a relation in Σ∗ based on a given set of words.

Definition 2.2 Given an alphabet Σ and a language L ⊆ Σ∗, the relation induced by L in Σ∗, denoted by R_L, is defined by

(∀x, y ∈ Σ∗)(x R_L y ⇔ (∀u ∈ Σ∗)(xu ∈ L ⇔ yu ∈ L))

Example 2.3
Let K be the set of all words over {a, b} that are of odd length. Those strings that are in K are used to define exactly which pairs of strings are in R_K. For example, we can determine that ab R_K bbaa, since it is true that, for any u ∈ Σ∗, either abu ∉ K and bbaau ∉ K (when |u| is even) or abu ∈ K and bbaau ∈ K (when |u| is odd). Note that ab and a are not related by R_K, since there are choices for u that would violate the definition of R_K: abλ ∉ K and yet aλ ∈ K. In this case, R_K turns out to be the same as the relation R defined in Example 2.1.
Recall that relations are sets of ordered pairs, and thus the claim that these two relations are equal means that they are equal as sets; an ordered pair belongs to R exactly when it belongs to R_K:

R = R_K iff (∀x, y ∈ Σ∗)(x R y ⇔ x R_K y)

The strings ab and bbaa are related by R in Example 2.1, and they are likewise related by R_K. A similar statement is true for any other pair that was in the relation R; it will be in R_K also. Additionally, it can be shown that elements that were not in R will not be in R_K either.
Notice that R_K relates more than just the words in K; neither ab nor bbaa belongs to K, and yet they were related to each other. This simple language K happens to partition Σ∗ into two equivalence classes, corresponding to the language itself and its complement. Less trivial languages will often form many equivalence classes. The relation R_L defined by a language L has all the properties given in Definition 2.1.
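
Because the definition of R_L quantifies over every continuation u, membership in R_L cannot in general be decided by a finite test; for building intuition, however, one can examine all continuations up to a bound. The Python sketch below does this for the odd-length language K of Example 2.3 (a bounded test can refute x R_K y outright, but can only suggest that the relation holds).

    from itertools import product

    def in_K(w):                           # K: the odd-length words over {a, b}
        return len(w) % 2 == 1

    def maybe_related(x, y, max_u=5):
        """Test x R_K y against all continuations u with |u| <= max_u."""
        for n in range(max_u + 1):
            for u in map(''.join, product('ab', repeat=n)):
                if in_K(x + u) != in_K(y + u):
                    return False           # a witness u separates x and y
        return True

    print(maybe_related('ab', 'bbaa'))     # True: both have even length
    print(maybe_related('ab', 'a'))        # False: u = λ already separates them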

Theorem 2.1 Let Σ be an alphabet. If L is any language over Σ (that is, L ⊆ Σ∗), the relation R_L given in Definition 2.2 must be a right congruence.
Proof. See the exercises.

Note that the above theorem is very broad in scope: any language, no matter how complex, always induces a relation that satisfies all four properties of a right congruence. Thus, R_L always partitions Σ∗ into equivalence classes. One useful measure of the complexity of a language L is the degree to which it fragments Σ∗, that is, the number of equivalence classes in R_L.

Definition 2.3 Given an equivalence relation P, the rank of P, denoted rk(P), is defined to be the number of distinct (and nonempty) equivalence classes of P.

The rank of the relation in Example 2.3 was 2, since there were two equivalence classes, the set of even-length words and the set of odd-length words. In Example 2.2, rk(R2) = 4. The rank of R_L can be thought of as a measure of the complexity of the underlying language L. Thus, for K in Example 2.3, rk(R_K) = 2, and K might consequently be considered to be a relatively simple language. Some languages are too complex to be recognized by finite automata; this relationship will be explored in the subsequent sections.
While the way in which a language gives rise to a partition of Σ∗ may seem mysterious and highly
nonintuitive, a deterministic finite automaton naturally distributes the words of Σ∗ into equivalence
classes. The following definition describes the manner in which a DFA partitions Σ∗ .

Definition 2.4 Given a DFA M = 〈Σ, S, s0, δ, F〉, define a relation R^M on Σ∗ as follows:

(∀x, y ∈ Σ∗)(x R^M y ⇔ δ(s0, x) = δ(s0, y))

R^M relates all strings that, when starting at s0, wind up at the same state. It is easy to show that R^M will be an equivalence relation with (usually) one equivalence class for each state of M (remember that equivalence classes are by definition nonempty; what type of state might not have an equivalence class associated with it?). It is also straightforward to show that the properties of the state transition function guarantee that R^M is in fact a right congruence (see the exercises).
The equivalence classes of R^M are called initial sets and will be of further interest in later chapters. For a DFA M = 〈Σ, S, s0, δ, F〉 and a given state t from M, I(M, t) = {x | δ(s0, x) = t}. This initial set can be thought of as the language accepted by a machine similar to M, but which has t as its only final state. That is, if we define M_t = 〈Σ, S, s0, δ, {t}〉, then I(M, t) = L(M_t).
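
For a small machine, the initial sets can simply be tabulated: run each short string through δ and bucket it by the state where it lands. The sketch below does this in Python for an assumed two-state machine that sends even-length words to s0 and odd-length words to s1 (the machine of the upcoming Example 2.5); each bucket is a finite portion of one initial set I(M, t), that is, of one equivalence class of R^M.

    from itertools import product
    from collections import defaultdict

    delta = {('s0', 'a'): 's1', ('s0', 'b'): 's1',
             ('s1', 'a'): 's0', ('s1', 'b'): 's0'}

    def run(state, x):                 # the extended transition function
        for letter in x:
            state = delta[(state, letter)]
        return state

    initial = defaultdict(list)        # state -> strings landing there
    for n in range(4):
        for w in map(''.join, product('ab', repeat=n)):
            initial[run('s0', w)].append(w)

    for state, bucket in sorted(initial.items()):
        print(state, bucket)           # I(M, s0) and I(M, s1), cut off at length 3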
The notation presented here allows a concise method of denoting both relations defined by languages and relations defined by automata. It is helpful to observe that, even in the absence of context, R_X indicates that a relation based on the language X is being described (since X occurs as a subscript), while the relation R^Y identifies Y as a machine (since Y occurs as a superscript).
Just as each DFA M gives rise to a right congruence R^M, each right congruence Q of finite rank can be associated with a DFA, which will be called A_Q. It can be shown that, if some of the equivalence classes of Q are singled out to form a language L, A_Q will recognize L.

Definition 2.5 Given a right congruence Q of finite rank and a language L that is the union of some of the equivalence classes of Q, A_Q is defined by

A_Q = 〈Σ, S_Q, s0_Q, δ_Q, F_Q〉

where

S_Q = {[x]_Q | x ∈ Σ∗}
s0_Q = [λ]_Q
F_Q = {[x]_Q | x ∈ L}

and δ_Q is defined by

(∀x ∈ Σ∗)(∀a ∈ Σ)(δ_Q([x]_Q, a) = [xa]_Q)

Note that this is a finite-state machine since rk(Q) < ∞, and that if L1 were a different collection of equivalence classes of Q, A_Q would remain the same except for the placement of the final states. In other words, F_Q is the only aspect of this machine that depends on the language L (or L1). As small as this change might be, it should be noted that A_Q is defined both by Q and the language L. It is left for the reader to show that A_Q is well defined and that L(A_Q) = L (see the exercises). The corresponding statements will be proved in detail in the next section for the important special case where Q = R_L.

Example 2.4
Let Q ⊆ {a}∗ × {a}∗ be the equivalence relation with the following equivalence classes:

[λ]_Q = {λ} = {a}^0
[a]_Q = {a} = {a}^1
[aa]_Q = {a}^2 ∪ {a}^3 ∪ {a}^4 ∪ {a}^5 ∪ . . .

It is easy to show that Q is a right congruence (see the exercises). If L1 were defined to be [λ]_Q ∪ [a]_Q, then A_Q would have the structure shown in Figure 2.1a. For the language defined by the different combination of equivalence classes given by L2 = [λ]_Q ∪ [aa]_Q, A_Q would look like the DFA given in Figure 2.1b. This example illustrates that it is the right congruence Q that establishes the start state and the transitions, while the language L determines the final state set. It should also be clear why L must be a union of equivalence classes from Q. The figure shows that a machine with the structure imposed by Q cannot possibly both reject aaa and accept aaaaaaaa. Either the entire equivalence class [aa]_Q must belong to L, or none of the strings from [aa]_Q can belong to L.
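
In programming terms, A_Q here is just a three-state transition table whose states are the classes [λ]_Q, [a]_Q, and [aa]_Q; only the set of final states changes between L1 and L2. A minimal Python sketch of this machine:

    # delta_Q sends the class [x] on input a to the class [xa];
    # the class [aa]_Q absorbs all longer strings.
    DELTA_Q = {'[λ]': '[a]', '[a]': '[aa]', '[aa]': '[aa]'}

    def accepts(finals, word):          # word is over the alphabet {a}
        state = '[λ]'                   # the start state is the class of λ
        for _ in word:
            state = DELTA_Q[state]
        return state in finals

    F1 = {'[λ]', '[a]'}                 # L1 = [λ]_Q ∪ [a]_Q   (Figure 2.1a)
    F2 = {'[λ]', '[aa]'}                # L2 = [λ]_Q ∪ [aa]_Q  (Figure 2.1b)

    print([accepts(F1, 'a' * n) for n in range(5)])   # accepts λ and a only
    print([accepts(F2, 'a' * n) for n in range(5)])   # rejects only a itself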

Figure 2.1: (a) The automaton for L1 in Example 2.4 (b) The automaton for L2 in Example 2.4

Figure 2.2: The DFA discussed in Example 2.5

2.2 Nerode’s Theorem


In this section, we will show that languages that partition Σ∗ into a finite number of equivalence classes
can be represented by finite automata, while those that yield an infinite number of classes would require
a machine with an infinite number of states.

Example 2.5
The language K given in Example 2.3 can be represented by a finite automaton with two states; all words
that have an even number of letters eventually wind up at state s 0 , while all the odd words are taken by δ
to s 1 . This machine is shown in Figure 2.2.
It is no coincidence that these states split up the words of Σ∗ into the same equivalence classes that
R K does. There is an intimate relationship between languages that can be represented by a machine with
a finite number of states and languages that induce right congruences with a finite number of equiva-
lence classes, as outlined in the proof of the following theorem.

Theorem 2.2 Nerode’s Theorem. Let L be a language over an alphabet Σ; the following statements are all
equivalent:

1. L is FAD.

2. There exists a right congruence R on Σ∗ for which L is the (possibly empty) union of some of the
equivalence classes of R and r k(R) < ∞.

3. r k(R L ) < ∞.

Proof. Because of the transitivity of ⇒, it will be sufficient to show only the three implications (1) ⇒ (2), (2) ⇒ (3), and (3) ⇒ (1), rather than all six of them.
Proof of (1) ⇒ (2): Assume (1); that is, let L be FAD. Then there is a machine that accepts L; that is, there exists a finite automaton M = 〈Σ, S, s0, δ, F〉 such that L(M) = L. Consider the relation R^M on Σ∗ based on this machine M as given in Definition 2.4: (∀x, y ∈ Σ∗)(x R^M y ⇔ δ(s0, x) = δ(s0, y)).
This R^M will be the relation R we need to prove (2). For each s ∈ S, consider I(M, s) = {x ∈ Σ∗ | δ(s0, x) = s}, which represents all strings that wind up at state s (from s0). Note that it is easy to define automata for which it is impossible to reach certain states from the start state; for such states, I(M, s) would be empty. Then ∀s ∈ S, I(M, s) is either an equivalence class of R^M or I(M, s) = ∅. Since there is at most one equivalence class per state, and there are a finite number of states, it follows that rk(R^M) is also finite: rk(R^M) ≤ ‖S‖ < ∞.
However, we have

L = L(M) = {x ∈ Σ∗ | δ(s0, x) ∈ F} = ⋃_{f∈F} {x ∈ Σ∗ | δ(s0, x) = f} = ⋃_{f∈F} I(M, f)

That is, L is the union of some of the equivalence classes of the right congruence R^M, and R^M is indeed of finite rank, and hence (2) is satisfied. Thus (1) ⇒ (2).
Proof of (2) ⇒ (3): Assume that (2) holds; that is, there is a right congruence R for which L is the union of some of the equivalence classes of the right congruence R, and rk(R) < ∞. Note that we no longer have (1) as an assumption; there is no machine (as yet) associated with L.
Case 1: It could be that L is the empty union; that is, that L = ∅. In this case, it is easy to show that R_L has only one equivalence class (Σ∗), and thus rk(R_L) = 1 < ∞ and (3) will be satisfied.
Case 2: In the nontrivial case, L is the union of one or more of the equivalence classes of the given right congruence R, and it is possible to show that this R must then be closely related to the R_L induced by the original language L. In particular, for any strings x and y,

x R y ⇒ (∀u ∈ Σ∗)(xu R yu)   (since R is a right congruence)
      ⇒ (∀u ∈ Σ∗)([xu]_R = [yu]_R)   (by definition of [ ])
      ⇒ (∀u ∈ Σ∗)(xu ∈ L ⇔ yu ∈ L)   (by definition of L as a union of [ ]'s)
      ⇒ x R_L y   (by definition of R_L)

(∀x ∈ Σ∗)(∀y ∈ Σ∗)(x R y ⇒ x R_L y) means that R refines R_L, and thus each equivalence class of R is entirely contained in an equivalence class of R_L; that is, each equivalence class of R_L must be a union of one or more equivalence classes of R. Thus, there are at least as many equivalence classes in R as in R_L, and so rk(R_L) ≤ rk(R). But by hypothesis, rk(R) is finite, and so R_L must be of finite rank also, and (3) is satisfied. Thus, in either case, (2) ⇒ (3).
Proof of (3) ⇒ (1): Assume now that condition (3) holds; that is, L is a language for which R_L is of finite rank. Once again, note that all we know is that R_L has a finite number of equivalence classes; we do not have either (1) or (2) as a hypothesis. Indeed, we wish to show (1) by proving that L is accepted by some finite automaton. We will base the structure of this automaton on the right congruence R_L, using Definition 2.5 with Q = R_L. A_RL is then defined by

A_RL = 〈Σ, S_RL, s0_RL, δ_RL, F_RL〉

where

S_RL = {[x]_RL | x ∈ Σ∗}
s0_RL = [λ]_RL
F_RL = {[x]_RL | x ∈ L}

and δ_RL is defined by

(∀x ∈ Σ∗)(∀a ∈ Σ)(δ_RL([x]_RL, a) = [xa]_RL)

The basic idea in this construction is to define one state for each equivalence class in R_L, use the equivalence class containing λ as the start state, use those classes that were made up of words in L as final states, and define δ in a natural manner. We claim that this machine is really a well-defined finite automaton and that it does behave as we wish it to; that is, the language accepted by A_RL really is L. In other words, L(A_RL) = L.
First, note that S_RL is a finite set, since [by the only assumption we have in (3)] R_L consists of only a finite number of equivalence classes. It can be shown that F_RL is well defined; if [z]_RL = [y]_RL, then either (both z ∈ L and y ∈ L) or (neither z nor y belongs to L) (why?). The reader should show that δ_RL is similarly well defined; that is, if [z]_RL = [y]_RL, it follows that δ_RL is forced to take both transitions to the same state ([za]_RL = [ya]_RL). Also, a straightforward induction on |y| shows that the single-letter rule for δ_RL extends to a similar rule for arbitrary strings:

(∀x ∈ Σ∗)(∀y ∈ Σ∗)(δ_RL([x]_RL, y) = [xy]_RL)

With this preliminary work out of the way, it is possible to easily show that L(A_RL) = L. Let x be any element of Σ∗. Then

x ∈ L(A_RL) ⇔ δ_RL(s0_RL, x) ∈ F_RL   (by definition of L(A_RL))
            ⇔ δ_RL([λ]_RL, x) ∈ F_RL   (by definition of s0_RL)
            ⇔ [λx]_RL ∈ F_RL   (by definition of δ_RL and induction)
            ⇔ [x]_RL ∈ F_RL   (by definition of λ)
            ⇔ x ∈ L   (by definition of F_RL)

Consequently, L is exactly the language accepted by this finite automaton; so L must be FAD, and (1) is satisfied. Thus (3) ⇒ (1). We have therefore come full circle, and all three conditions are equivalent.
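
The construction in (3) ⇒ (1) is concrete enough to prototype. The Python sketch below approximates R_L by fingerprinting each prefix with the membership pattern of its short continuations, then counts the distinct classes; the bounds are heuristic, so this illustrates the construction rather than providing an algorithm that is guaranteed correct for every L. It is applied here to L = Σ+, the language of the next example.

    from itertools import product

    SIGMA = '01'
    def in_L(w):                         # L = Σ+: every nonempty word
        return len(w) > 0

    def signature(x, max_u=4):
        """Approximate [x] in R_L: membership of xu for all short u."""
        return tuple(in_L(x + ''.join(u))
                     for n in range(max_u + 1)
                     for u in product(SIGMA, repeat=n))

    reps = {}                            # one representative per class found
    for n in range(5):
        for w in map(''.join, product(SIGMA, repeat=n)):
            reps.setdefault(signature(w), w)

    print(len(reps), "classes found")    # 2: the classes {λ} and Σ+
    # A_RL then has one state per class, start state [λ], and final
    # states exactly those classes whose strings belong to L.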

The correspondence described by Nerode’s theorem can best be illustrated by an example.

Example 2.6
Let L be the following FAD language: L = Σ∗ − {λ} = Σ+. There are many finite automata that accept L, one of which is the DFA N given in Figure 2.3. This four-state machine gives rise to a four-equivalence-class right congruence as described in (1) ⇒ (2), where

[λ]_RN = I(N, s0) = {λ}, since λ is the only string that ends up at s0
[1]_RN = I(N, s1) = {y | |y| is odd, and y ends with a 1} = {z | δ(s0, z) = s1}
[11]_RN = I(N, s2) = {y | |y| is even} − {λ} = {z | δ(s0, z) = s2}
[000]_RN = I(N, s3) = {y | |y| is odd, and y ends with a 0} = {z | δ(s0, z) = s3}

Figure 2.3: The DFA N discussed in Example 2.6

Figure 2.4: The DFA P discussed in Example 2.6

Note that L is indeed I(N, s1) ∪ I(N, s2) ∪ I(N, s3), which is the union of all the equivalence classes that correspond to final states in N, as required by (2). To illustrate (2) ⇒ (3), let R be the equivalence relation R^N defined above, let L again be Σ+, and note that (2) is satisfied: L = [1]_R ∪ [11]_R ∪ [000]_R, the union of 3 of the equivalence classes of a right congruence of rank 4 (which is finite).
As in the proof of (2) ⇒ (3), R_L is refined by R, but in this case R and R_L are not equal. All the relations from R still hold, such as 11 R 1111, so 11 R_L 1111; 0 R 000, and thus 0 R_L 000, and so forth. It can also be shown that 11 R_L 000, even though 11 and 000 were not related by R (apply the definition of R_L to convince yourself of this). Thus, everything in [11]_R is related by R_L to everything in [000]_R; that is, all the strings belong to the same equivalence class of R_L, even though they formed separate equivalence classes in R. It may at first appear strange, but the fact that there are more related pairs in R_L means that there are fewer equivalence classes in R_L than in R. Indeed, R_L has only two equivalence classes, {λ} and L. In this case, three equivalence classes of R collapse to form one large equivalence class of R_L. Thus {λ} = [λ]_RL = [λ]_R and L = [11]_RL = [1]_R ∪ [11]_R ∪ [000]_R and, as we were assured by (2) ⇒ (3), R refines R_L.
To illustrate (3) ⇒ (1), let's continue to use the L and R_L given above. Since R_L is of rank 2, we are assured of finding a two-state machine that will accept L. A_RL in this case would take the form of the automaton P displayed in Figure 2.4. In this DFA, for example, δ_RL([11]_RL, 0) = [110]_RL = [11]_RL, and [λ]_RL is the start state. [11]_RL is a final state since 11 ∈ L. Verify that this machine accepts all words except λ; that is, L(A_RL) = L.

Example 2.7
Assume Q is defined to be the right congruence R^N given in Example 2.6, and L is again Σ+, which is the union of three of the equivalence classes of Q: [1]_Q, [11]_Q, and [000]_Q. The automaton A_Q is given in Figure 2.5.

Figure 2.5: The automaton discussed in Example 2.7

If we were instead to begin with the same language L, but use the two-state machine P at the end of Example 2.6 to represent L, we would find that L would consist of only one equivalence class, R^P would have only two equivalence classes, and R^P would in this case be the same as R_L (see the exercises). R^P turns out to be as simple as R_L because the machine we started with was as "simple" as we could get and still represent L. In Chapter 3 we will characterize the idea of a machine being "as simple as possible," that is, minimal.
The two machines given in Example 2.6 accept the same language. It will be convenient to formal-
ize this notion of distinct machines “performing the same task,” and we therefore make the following
definition.

Definition 2.6 Two DFAs A and B are equivalent iff L(A) = L(B ).

Example 2.8
The DFAs N from Example 2.6 and AQ from Example 2.7 are equivalent since L(N ) = Σ+ = L(AQ ) .

Definition 2.7 A DFA A = 〈Σ, S_A, s0_A, δ_A, F_A〉 is minimal iff for every DFA B = 〈Σ, S_B, s0_B, δ_B, F_B〉 for which L(A) = L(B), |S_A| ≤ |S_B|.

An automaton is therefore minimal if no equivalent machine has fewer states.

Example 2.9
The DFA N from Example 2.6 is clearly not minimal, since the automaton P from the same example is equiva-
lent and has fewer states than N . The techniques from Chapter 3 can be used to verify that the automaton
P from Example 2.6 is minimal. More importantly, minimization techniques will be explored in Chapter
3 that will allow an optimal machine (like this P ) to be produced from an inefficient automaton (like N ).

2.3 Pumping Lemmas


As you have probably noticed by now, finding R L and counting the equivalence classes is not a very
practical way of verifying that a suspected language cannot be defined by a finite automaton. It would
be nice to have a better way to determine if a given language is unwieldy. The pumping lemma will
supply us with such a technique. It is based on the observation that if your automaton processes a “long
enough” word it must eventually visit (at least) one state more than once.
Let A = 〈Σ, S, s 0 , δ, F 〉, and consider starting at some state s and processing a word x of length 5. We
will pass through state s and perhaps five other states, although these states may not all be distinct if

Figure 2.6: A path with a loop

we visit some of them repeatedly while processing the five letters in x; thus the total will be six states (or fewer). Note that if A has 10 states (‖S‖ = 10) and |x| = 12, we cannot go to 13 different states while processing x; (at least) one state must be visited more than once.
In general, if n = ‖S‖, then any string x whose length is equal to or greater than n must pass through some state q twice while being processed by A, as shown in Figure 2.6. Here the arrows are meant to represent the path taken while processing several letters, and the intermediate states are not shown. The strings u, v, and w are defined as

u = first few letters of x that take us to the state q
v = next few letters of x that will again take us back to q
w = rest of the string x

Then, with x = uvw, we have δ(s, u) = q, δ(s, uv) = q, and in fact δ(q, v) = q. Also, δ(s, x) = δ(s, uvw) = f and δ(q, w) = f, as is clear from the diagram. Now consider the string uw, that is, the string x with the v part "removed":

δ(s, uw) = δ(δ(s, u), w)   (why?)
         = δ(q, w)
         = f

That is, the string uw winds up in the same place uvw does; this is illustrated in Figure 2.7a. Note that a similar thing happens if uv^2 w is processed:

δ(s, uvvw) = δ(δ(s, u), vvw)
           = δ(q, vvw)
           = δ(δ(q, v), vw)
           = δ(q, vw)
           = δ(δ(q, v), w)
           = δ(q, w)
           = f

This behavior is illustrated in Figure 2.7b.
In general, it can be proved by induction that (∀i ∈ N)(δ(s, uv^i w) = f = δ(s, uvw)). Notice that we do reach q two distinct times, which implies that the string v contains at least one letter; that is, |v| ≥ 1. Also, after the first n letters of x, we must have already repeated a state, and thus some state q can be found such that |uv| ≤ n. If s happens to be the start state s0 and f is a final state, we have now shown that:
If A = 〈Σ, S, s0, δ, F〉, where ‖S‖ = n, then, given any string x = a1a2a3 . . . am, where m ≥ n and δ(s0, x) = f ∈ F [which implies x ∈ L(A)], the states δ(s0, λ), δ(s0, a1), δ(s0, a1a2), δ(s0, a1a2a3), . . . , δ(s0, a1a2 . . . an) cannot all be distinct, and so x can be broken up into strings u, v, and w such that

Figure 2.7: (a) The path that bypasses the loop (b) The path that traverses the loop twice

x = uvw
|uv| ≤ n
|v| ≥ 1

and (∀i ∈ N)(δ(s0, uv^i w) = f), that is, (∀i ∈ N)(uv^i w ∈ L(A)). In other words, given any "long" string in L(A), there is a part of the string that can be "pumped" to produce even longer strings in L(A).
Thus, if L is FAD, there exists an automaton (with a finite number n of states), and thus for some
n ∈ N, the above statement should hold. We have just proved what is generally known as the pumping
lemma, which we now state formally.

Theorem 2.3 The Pumping Lemma. Let L be an FAD language over Σ. Then (∃n ∈ N)(∀x ∈ L ∋ |x| ≥ n)(∃u, v, w ∈ Σ∗) ∋

x = uvw
|uv| ≤ n
|v| ≥ 1

and
(∀i ∈ N)(uv^i w ∈ L).

Proof. Given above.

Example 2.10
Let E be the set of all even-length words over {a, b}. There is a two-state machine that accepts E, so E is FAD, and the pumping lemma applies if n is, say, 5. Then ∀x ∋ |x| > 5, if x = a1a2a3 . . . aj ∈ E (that is, j is even), we can choose u = λ, v = a1a2, and w = a3a4 . . . aj. Note that |uv| = 2 ≤ 5, |v| = 2 ≥ 1, and |uv^i w| = j + 2(i − 1), which is even, and so (∀i ∈ N)(uv^i w ∈ E).
If Example 2.10 does not appear truly exciting, there is good reason: The pumping lemma is generally
not applied to FAD languages! (Note: We will see an application later.) The pumping lemma is often ap-
plied to show languages are not FAD (by proving that the language does not satisfy the pumping lemma).
Note that the contrapositive of Theorem 2.3 is:

Theorem 2.4 Let L be a language over Σ. If

(∀n ∈ N)(∃x ∈ L ∋ |x| ≥ n)(∀u, v, w ∈ Σ∗ ∋ x = uvw, |uv| ≤ n, |v| ≥ 1)(∃i ∈ N ∋ uv^i w ∉ L),

then L is not FAD.
Proof. See the exercises.

Example 2.11
Consider L4 = {y ∈ {0, 1}∗ | |y|0 = |y|1}. We will use Theorem 2.4 to show L4 is not FAD: Let n be given, and choose x = 0^n 1^n. Then x ∈ L4, since |x|0 = n = |x|1. It should be observed that x must be dependent on n, and we have no control over n (in particular, n cannot be replaced by some constant; similarly, while i may be chosen to be a convenient fixed constant, a proof that covers all possible combinations of u, v, and w must be given).
Note that this choice of x is "long enough" in that |x| = 2n ≥ n, as required by Theorem 2.4. For any combination of u, v, w ∈ Σ∗ such that x = uvw, |uv| ≤ n, |v| ≥ 1, we hope to find a value for i such that uv^i w ∉ L4. Since |uv| ≤ n and the first n letters of x are all zeros, this narrows down the choices for u, v, and w. They must be of the form u = 0^j and v = 0^k (since |uv| ≤ n and x starts with n zeros), and w must be the "rest of the string" and look something like w = 0^m 1^n. The constraints on u, v, and w imply that j + k ≤ n, k ≥ 1, and j + k + m = n. If i = 2, we have that uv^2 w = 0^(n+k) 1^n ∉ L4 (why?). Thus, by Theorem 2.4, L4 is not FAD [or, alternately, because the conclusion of the pumping lemma (Theorem 2.3) does not hold, L4 cannot be FAD].
It is instructive to endeavor to build a DFA that attempts to recognize the language L4. As you begin to see what such a machine must look like, it will become clear that no matter how many states you add (that is, no matter how large n becomes) there will always be some strings ("long" strings) that would require even more states. Your construction may also suggest what the equivalence classes of R_L4 must look like (see the exercises). How many equivalence classes are there? (You should be able to answer this last question without referring to any constructions.)
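
For any one fixed n, the case analysis above can even be confirmed exhaustively by machine: every legal decomposition of 0^n 1^n is ejected from L4 at i = 2. The Python sketch below checks this for a small n; the algebra above is still needed, of course, because the theorem quantifies over all n.

    def in_L4(w):
        return w.count('0') == w.count('1')

    n = 6
    x = '0' * n + '1' * n                  # the chosen witness; x is in L4
    assert in_L4(x)

    # Every decomposition x = uvw with |uv| <= n and |v| >= 1 must leave
    # L4 for some i; here i = 2 works in every single case.
    for uv_len in range(1, n + 1):
        for u_len in range(uv_len):
            u, v, w = x[:u_len], x[u_len:uv_len], x[uv_len:]
            assert not in_L4(u + v + v + w)   # uv^2 w has extra zeros
    print("i = 2 ejects every decomposition of", x, "from L4")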
A similar argument can be made to show that no DFA can recognize the set of all fully parenthesized
infix expressions (see the exercises). Matching parentheses, like matching 0 s and 1 s in the last example,
requires unlimited storage. We have seen that DFAs are adequate vehicles for pattern matching and
token identification, but a more complex model is clearly needed to implement functions like the parsing
of arithmetic expressions. Pushdown automata, discussed in Chapter 10, augment the finite memory
with an unbounded stack, allowing more complex languages to be recognized.
Intuitively, we would not expect finite-state machines to be able to differentiate between arbitrarily long integers. While modular arithmetic, which only differentiates between a finite number of remainders, should be representable by finite automata, unrestricted arithmetic is likely to be impossible. For example, {a^i b^j c^k | i, j, k ∈ N and i + j = k} cannot be recognized by any DFA, while the language {a^i b^j c^k | i, j, k ∈ N and i + j ≡ k mod 3} is FAD. Checking whether two numbers are relatively prime is likewise too difficult for a DFA, as shown by the proof in the following example.

Example 2.12
Consider L = {a^i b^j | i, j ∈ N and i and j are relatively prime}. We will use Theorem 2.4 to show L is not FAD: Let n be given, and choose a prime p larger than n + 1 (we can be assured such a p exists since there are an infinite number of primes). Let x = a^p b^((p−1)!). Since p has no factors other than 1 and p, it has no nontrivial factor in common with (p − 1) · (p − 2) · . . . · 3 · 2 · 1, and so p and (p − 1)! are relatively prime, which guarantees that x ∈ L. The length of x is clearly greater than n, so Theorem 2.3 should apply, which implies that there must exist a combination u, v, w ∈ Σ∗ such that x = uvw, |uv| ≤ n, |v| ≥ 1; we hope to find a value for i such that uv^i w ∉ L. Since |uv| ≤ n and the first n letters of x are all a s, there must exist integers j, k, and m for which u = a^j and v = a^k, and w must be the "rest of the string"; that is, w = a^m b^((p−1)!). The constraints on u, v, and w imply that j + k ≤ n, k ≥ 1, and j + k + m = p. If i = 0, we have that uv^0 w = a^(p−k) b^((p−1)!). But p − k is a number between p − 1 and p − n and hence must match one of the nontrivial factors in (p − 1)!, which means that uv^0 w ∉ L (why?). Therefore, Theorem 2.3 has been violated, so L could not have been FAD.
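
The arithmetic in this argument is easy to check concretely for a single small prime using Python's gcd; pumping down removes k ≥ 1 a s, and p − k then shares a nontrivial factor with (p − 1)!. This fixed-p experiment illustrates, but does not replace, the general proof.

    from math import gcd, factorial

    n, p = 5, 7                           # a prime p larger than n + 1
    exp_b = factorial(p - 1)              # the number of b s in x

    def in_L(num_a, num_b):               # a^i b^j with i and j relatively prime
        return gcd(num_a, num_b) == 1

    assert in_L(p, exp_b)                 # x = a^p b^((p-1)!) is in L

    # For every split with v = a^k, 1 <= k <= n, pumping down (i = 0)
    # leaves a^(p-k) b^((p-1)!), and p - k is a nontrivial factor of (p-1)!.
    for k in range(1, n + 1):
        assert not in_L(p - k, exp_b)
    print("pumping down ejects a^p b^((p-1)!) from L for every k <= n")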
The details of the basic argument used to prove the pumping lemma can be varied to produce other theorems of a similar nature: for example, when processing x, there must be a state q′ repeated within the last n letters. This gives rise to the following variation of the pumping lemma.

Theorem 2.5 Let L be a FAD language over Σ. Then

(∃n ∈ N)(∀x ∈ L ∋ |x| ≥ n)(∃u, v, w ∈ Σ∗) ∋

x = uvw
|vw| ≤ n
|v| ≥ 1

and
(∀i ∈ N)(uv^i w ∈ L)

Proof. See the exercises.

The new condition |v w| ≤ n reflects the constraint that some state must be repeated within the last
n letters. The contrapositive of Theorem 2.5 can be useful in demonstrating that certain languages are
not FAD. By repeating our original reasoning while assuming the string x takes us to a nonfinal state, we
obtain yet another variation.

Theorem 2.6 Let L be a FAD language over Σ. Then

(∃n ∈ N)(∀x ∉ L ∋ |x| ≥ n)(∃u, v, w ∈ Σ∗) ∋

x = uvw
|uv| ≤ n
|v| ≥ 1

and
(∀i ∈ N)(uv^i w ∉ L)

Proof. See the exercises.

Notice that Theorem 2.6 guarantees that if one “long” string is not in the language then there is an
entire sequence of strings that cannot be in the language. There are some examples of languages in the
exercises where Theorem 2.4 is hard to apply, but where Theorem 2.5 (or Theorem 2.6) is appropriate.
When i = 0, the pumping lemma states that given a “long” string (uv w) in L there is a shorter string
(uw) that is also in L. If this new string is still of length greater than n, the pumping lemma can be
reapplied to find a still shorter string, and so on. This technique is the basis for proving the following
theorem.

Theorem 2.7 Let M be an n-state DFA accepting L. Then

(∀x ∈ L ∋ x = a1a2 . . . am, and m ≥ n)(∃ an increasing sequence i1, i2, . . . , ij)

for which a_i1 a_i2 . . . a_ij ∈ L, and j < n.
Proof. See the exercises.

Note that a_i1 a_i2 . . . a_ij represents a string formed by "removing" letters from perhaps several places in x, and that this new string has length less than n. Theorem 2.7 can be applied in areas that do not initially seem to relate to DFAs. Consider an arbitrary right congruence R of (finite) rank n. It can be shown that each equivalence class of R is guaranteed to contain a representative of length less than n. For example, consider the relation R given by

[λ]_R = {λ}
[11111]_R = {y | |y| is odd, and y ends with a 1}
[0101]_R = {y | |y| is even and |y| > 0}
[00000]_R = {y | |y| is odd, and y ends with a 0}

In this relation, rk(R) = 4, and appropriate representatives of length less than 4 are λ, 1, 11, and 100, respectively. That is, [λ]_R = [λ]_R, [1]_R = [11111]_R, [11]_R = [0101]_R, and [100]_R = [00000]_R. By constructing a DFA based on the right congruence R, Theorem 2.7 can be used to prove that every equivalence class of R has a "short" representative (see the exercises).
We have seen that deterministic finite automata are limited in their cognitive powers, that is, there
are languages that are too complex to be recognized by DFAs. When only a finite set of previous histories
can be distinguished, the resulting languages must have a certain repetitious nature. Allowing automata
to instead have an infinite number of states is uninteresting for several reasons. On the practical side, it
would be inconvenient (to say the least) to physically construct such a machine. Infinite automata are
also of little theoretical interest as they do not distinguish between simple and complex languages: any
language can be accepted by the infinite analog of a DFA. With an infinite number of states available, the
state transition diagrams can look like trees, with a unique state corresponding to each word in Σ∗ . The
states corresponding to desired words can simply be made final states.
More reasonable enhancements to automata will be explored later. Non-determinism will be pre-
sented in Chapter 4, and machines with extended capabilities will be defined and investigated in Chap-
ters 10 and 11.

Exercises
2.1. Let Σ = {a, b, c}. Show that the relation Ψ ⊆ Σ∗ × Σ∗ defined by

x Ψ y ⇔ |x| − |y| is odd

is not a right congruence. (Is it an equivalence relation?)

2.2. Let Σ = {a, b, c}. Consider the relation Q ⊆ Σ∗ × Σ∗ defined by

x Q y ⇔ |x|a − |y|a ≡ 0 mod 3

(a) Show that Q is an equivalence relation.
(b) Assume that part (a) is true, and show that Q is a right congruence.

2.3. Let Σ = {a, b, c}. Find all languages L such that rk(R_L) = 1. Justify your conclusions.

2.4. Let P ⊆ {0, 1}∗ × {0, 1}∗ be the equivalence relation with the following equivalence classes:

[λ]_P = {λ} = {0, 1}^0
[1]_P = {0, 1} = {0, 1}^1
[00]_P = {0, 1}^2 ∪ {0, 1}^3 ∪ {0, 1}^4 ∪ {0, 1}^5 ∪ . . .

Show that P is a right congruence.

2.5. For the relation P defined in Exercise 2.4, find all languages L ∋ R_L = P.

2.6. For the relation Q defined in Exercise 2.2, find all languages L ∋ R_L = Q.

2.7. Let Σ = {a, b}. Define the relation Q by λ Q λ and (∀x ≠ λ)(¬(λ Q x)), and

(∀x ≠ λ)(∀y ≠ λ)[x Q y ⇔ (|x| is even ∧ |y| is even) ∨ (|x| is odd ∧ |y| is odd)],

which implies that

(∀x ≠ λ)(∀y ≠ λ)[¬(x Q y) ⇔ (|x| is even ∧ |y| is odd) ∨ (|x| is odd ∧ |y| is even)].

(a) Show that Q is a right congruence, and list the equivalence classes.
(b) Define L = [λ]_Q ∪ [aa]_Q. Find a simple description for L, and list the equivalence classes of R_L. (Note that Q does refine R_L.)
(c) Draw a machine with states corresponding to the equivalence classes of Q. Arrange the final states so that the machine accepts L (that is, find A_Q).
(d) Draw A_RL.
(e) Consider the machine in part (c) above (A_Q). Does it look like A_RL? Can you rearrange the final states in A_Q (producing a new language K) so that A_RK looks like your new A_Q? Illustrate.
(f) Consider all eight languages found by taking unions of equivalence classes from Q, and see which ones would satisfy the criteria of part (e).

2.8. Let Σ = {a}. Let I be the identity relation on Σ∗.

(a) Show that I is a right congruence.
(b) What do the equivalence classes of I look like?

2.9. Let Σ = {a}, and let I be the identity relation on Σ∗. Let L = {λ} ∪ {a} ∪ {aa}, which is the union of three of the equivalence classes of I. I has infinite rank. Does Nerode's theorem imply that L is not FAD? Explain.

2.10. Define a machine M = 〈Σ, S_M, s0, δ, F_M〉 for which rk(R_L(M)) ≠ ‖S_M‖.

2.11. Carefully show that F_RL is a well-defined set; that is, show that the rule that assigns equivalence classes to F_RL is unambiguous.

2.12. Carefully show that δ_RL is well defined, that is, that δ_RL is a function.

2.13. Use induction to show that (∀x ∈ Σ∗)(∀y ∈ Σ∗)(δ_RL([x]_RL, y) = [xy]_RL).

2.14. Consider the automaton P derived in Example 2.6; find R^P and notice that R^P = R_L.

2.15. Find R^A for each machine A you built in the exercises of Chapter 1; compare R^A to R_L(A).

2.16. Prove by induction that, for the strings defined in the discussion of the pumping lemma, (∀i ∈ N)(δ(s, uv^i w) = f = δ(s, uvw)).

2.17. Prove Theorem 2.1.

2.18. (a) Find a language that gives rise to the relation I defined in Exercise 2.8.
(b) Could such a language be FAD? Explain.

2.19. Starting with Theorem 2.3 as a given hypothesis, prove Theorem 2.4.

2.20. Prove Theorem 2.5 by constructing an argument similar to that given for Theorem 2.3.

2.21. Prove Theorem 2.6 by constructing an argument similar to that given for Theorem 2.3.

2.22. Prove Theorem 2.7.

2.23. Let L = {x ∈ {a, b}∗ | |x|a < |x|b}. Show L is not FAD.

2.24. Let G = {x ∈ {a, b}∗ | |x|a ≥ |x|b}. Show G is not FAD.

2.25. Let P = {y ∈ {d}∗ | ∃ prime p ∋ y = d^p} = {dd, ddd, ddddd, d^7, d^11, d^13, . . .}. Prove that P is not FAD.

2.26. Let Γ = {x ∈ {0, 1, 2}∗ | ∃w ∈ {0, 1}∗ ∋ x = w · 2 · w} = {2, 121, 020, 11211, 10210, . . .}. Prove that Γ is not FAD.

2.27. Let Ψ = {x ∈ {0, 1}∗ | ∃w ∈ {0, 1}∗ ∋ x = w · w} = {λ, 00, 11, 0000, 1010, 1111, . . .}. Prove that Ψ is not FAD.

2.28. Define the reverse of a string w as follows: If w = a1a2a3a4 . . . a_(n−1)a_n, then w′ = a_n a_(n−1) . . . a4a3a2a1. Let K = {w ∈ {0, 1}∗ | w = w′} = {λ, 0, 1, 00, 11, 000, 010, 101, 111, 0000, 0110, . . .}. Prove K is not FAD.

2.29. Let Φ = {x ∈ {a, b, c}∗ | ∃ j, k, m ∈ N ∋ x = a^j b^k c^m, where j ≥ 3 and k = m}. Prove Φ is not FAD. Hint: The first version of the pumping lemma is hard to apply here (why?).

2.30. Let C = {y ∈ {d}∗ | ∃ nonprime q ∋ y = d^q} = {λ, d, d^4, d^6, d^8, d^9, d^10, . . .}. Show C is not FAD. Hint: The first version of the pumping lemma is hard to apply here (why?).

2.31. Assume Σ = {a, b} and L is a language for which R_L has the following three equivalence classes: {λ}, {all odd-length words}, {all even-length words except λ}.

(a) Why couldn't L = {x | |x| is odd}? (Hint: Recompute R_{x | |x| is odd}.)
(b) List the languages L that could give rise to this R_L.

2.32. Let Σ = {a, b} and let Ψ = {x ∈ Σ∗ | x has an even number of a s and ends with (at least) one b}. Describe R_Ψ, and draw a machine accepting Ψ.

2.33. Let Ξ = {x ∈ {a}∗ | ∃ j ∈ N ∋ |x| = j^2 } = {λ, a, aaaa, a^9 , a^16 , a^25 , . . .}. Prove that Ξ is not FAD.

2.34. Let Φ = {x ∈ {b}∗ | ∃ j ∈ N ∋ |x| = 2^j } = {b, bb, bbbb, b^8 , b^16 , b^32 , . . .}. Prove that Φ is not FAD.

2.35. Let Σ = {a,b}. Assume R L has the following five equivalence classes: {λ}, {a}, {aa}, {a^3 , a^4 , a^5 , a^6 , . . .}, {x | x contains (at least) one b}. Also assume that L consists of exactly one of these equivalence classes.

(a) Which equivalence class is L?


(b) List the other languages L that could give rise to this R L (and note that they might consist of
several equivalence classes).

2.36. Let Ω = {y ∈ {0,1}∗ | (y contains exactly one 0 ) ∨ (y contains an even number of 1 s)}. Find R Ω .

2.37. Let Σ = {a,b} and L 1 = {x ∈ Σ∗ | |x|a > |x|b } and L 2 = {x ∈ Σ∗ | |x|a < 3}. Which of the following are FAD? Support your answers.
(a) L 1 (b) L 2 (c) L 1 ∩ L 2 (d) ∼L 2 (e) L 1 ∪ L 2

2.38. Let m ∈ N and let R m be defined by x R m y ⇔ |x| − |y| is a multiple of m.

(a) Prove that R m is a right congruence.


(b) Show that R 2 ∩ R 3 is R 6 , and hence also a right congruence. (Note, for example, that 〈λ, aaaaaa〉 ∈ R 6 since 〈λ, aaaaaa〉 ∈ R 3 and 〈λ, aaaaaa〉 ∈ R 2 ; how do the equivalence classes of R 2 and R 6 compare?)
(c) Show that, in general, if R and S are right congruences, then so is R ∩ S.
(d) Now consider R 2 ∪ R 3 , and show that this is not a right congruence because it is not even an
equivalence relation.
(e) Prove that if R and S are right congruences and R ∪ S happens to be an equivalence relation
then R ∪ S will be a right congruence, also.

2.39. Give an example of two right congruences R 1 and R 2 over Σ∗ for which R 1 ∪ R 2 is not a right congru-
ence.

2.40. Let Σ = {a,b} and let Γ = {λ, a, b, ab, ba, bb, bbb} ∪ {x ∈ Σ∗ | |x| ≥ 4}.

(a) Use the definition of R Γ to show ab R Γ ba.
(b) Use the definition of R Γ to show that ab is not related by R Γ to bb.
(c) Show that the equivalence classes of R Γ are {λ}, {a}, {b}, {aa}, {bb}, {ab, ba}, {x | x ≠ bbb ∧ |x| = 3}, and {x | x = bbb ∨ |x| ≥ 4}.
(d) Draw the minimal state DFA which accepts Γ.

2.41. Prove that the relation R M given in Definition 2.4 is a right congruence.

2.42. We can view a text file as being one long “word,” that is, a string of characters (including spaces,
carriage returns, and so on). In this sense, each Pascal program can be considered to be a single word
over the ASCII alphabet. We can define the language Pascal as the set of all valid Pascal programs
(that is, the valid words are those text files that would compile with no compiler errors). Is this
language FAD?

2.43. Define “Short Pascal” as the collection of valid Pascal programs that are composed of less than 1
million characters. Is Short Pascal FAD? Any volunteers for building the appropriate DFA?

2.44. Let Σ = {a,b,c}, and define L = {a^n b^k c^j | n < 3 or (n ≥ 3 and k = j )}.

(a) Show that for this language the conclusion of Theorem 2.3 holds, but the hypothesis of Theo-
rem 2.3 does not hold.
(b) Is the contrapositive of Theorem 2.3 true? Explain.

2.45. Carefully show that FQ in Definition 2.5 is a well-defined set.

2.46. Carefully show that δQ in Definition 2.5 is well defined.

2.47. For δQ in Definition 2.5, use induction to show that

(∀x ∈ Σ∗ )(∀y ∈ Σ∗ )(δQ ([x]Q , y) = [x y]Q ).

2.48. For the L and Q in Definition 2.5, prove that L(AQ ) = L.

2.49. Given L and Q as in Definition 2.5, AQ is a machine to which we can apply Definition 2.4. Prove or
give a counterexample: Q = R (AQ ) .

2.50. Given L and Q = R L as in Definition 2.5, A RL is a machine to which we can apply Definition 2.4.
Prove or give a counterexample: R L = R (A RL ) .

2.51. Show that the converse of Theorem 2.3 is false (Hint: See Exercise 2.29 and let L = Φ).

2.52. Let L = {x ∈ {a,b}∗ | |x|a = 2|x|b }. Prove that L is not FAD.

2.53. Consider the language K defined in Exercise 2.28.

(a) Find [110]RK ; that is, find all strings y for which y R K 110.

2.54. Prove or give a counterexample: R L 1 ∩ R L 2 = R (L 1 ∩L 2 ) .

2.55. Let R be a right congruence over Σ∗ for which rk(R) = n. Prove that each equivalence class of R contains a representative whose length is less than n.

2.56. For the R given in Example 2.6, find all languages L for which R L = R N .

2.57. Consider the languages defined in Exercise 1.6. Find the right congruences induced by each of these
four languages.

2.58. Assume L ⊆ {a}∗ and λ R L a. List all possible choices for the language L.

2.59. Assume L ⊆ {a}∗ and a R L aa. List all possible choices for the language L.

2.60. Assume L ⊆ {a}∗ and λ R L aa. List all possible choices for the language L.

2.61. (a) Give an example of a DFA M for which R M = R L(M ) .


(b) Give an example of a DFA M for which R M ≠ R L(M ) .

(c) For every DFA M , show that R M refines R L(M ) .

2.62. Find R M and R L(M ) for the machine M described in Example 1.5.

2.63. Is A RL(A) always equivalent to A? Explain.

2.64. Consider L 4 = {y ∈ {0,1}∗ | |y|0 = |y|1 } as given in Example 2.11. Let n be given and consider x = (01)^n = 010101 . . . 0101. Then |x| = 2n > n; but if u = 0101, v = 01, and w = (01)^{n−3} , then (∀i ∈ N)(uv^i w ∈ L 4 ). Does this mean L 4 is FAD? Explain.

2.65. Consider L 4 = {y ∈ {0,1}∗ | |y|0 = |y|1 } as given in Example 2.11. Find R L 4 .

2.66. Show that the set of all postfix expressions over the alphabet {A, B, +, −} is not FAD.

2.67. Show that the set of all parenthesized infix expressions over the alphabet {A, B, +, −} is not FAD.

2.68. For a given language L, how does R L compare to R ∼L ; that is, how does the right congruence generated by a language compare to the right congruence generated by its complement? Justify your statement.

2.69. Let Σ = {a,b}. Assume the right congruence Q has the following equivalence classes: {λ}, {a}, {b}, {x | |x| ≥ 2}. Show that there is no language L such that R L = Q.

2.70. Let Q be the equivalence relation with the two equivalence classes {λ, a, aa} and {a^3 , a^4 , a^5 , . . .}.

(a) Show that Q is not a right congruence.


(b) Attempt to build AQ (ignoring FQ for the moment), and describe any difficulties that you en-
counter.
(c) Explain how the failure in part (a) is related to the difficulties found in part (b).

2.71. Let Σ = {a,b,c}. Show that {a^i b^j c^k | i , j , k ∈ N and i + j = k} is not FAD.

2.72. Let Q ⊆ {a}∗ × {a}∗ be the equivalence relation with the following equivalence classes:

[λ]Q = {λ} = {a}^0
[a]Q = {a} = {a}^1
[aa]Q = {a}^2 ∪ {a}^3 ∪ {a}^4 ∪ · · ·

Show that Q is a right congruence.

2.73. Let Σ = {a,b,c}, and let R 2 be defined by x R 2 y ⇔ x and y end with the same letter.

(a) Show that R 2 is an equivalence relation.

(b) Assume that part (a) is true, and show that R 2 is a right congruence.

2.74. Let Σ = {a,b,c}, and let R 3 be defined by x R 3 y ⇔ x and y begin with the same letter.

(a) Show that R 3 is an equivalence relation.

(b) Assume that part (a) is true, and show that R 3 is a right congruence.

2.75. Let Σ = {a,b}. Which of the following languages are FAD? (Support your answers.)

(a) L 1 = all words over Σ∗ for which the last letter matches the first letter.
(b) L 2 = all odd-length words over Σ∗ for which the first letter matches the center letter.
(c) L 3 = all words over Σ∗ for which the last letter matches none of the other letters.
(d) L 4 = all even-length words over Σ∗ for which the two center letters match.
(e) L 5 = all odd-length words over Σ∗ for which the center letter matches none of the other letters.

2.76. In the proof of (2) ⇒ (3) in Nerode’s theorem:

(a) Complete the proof of case 1.


(b) Could case 1 actually be included under case 2?

2.77. Consider the right congruence property (RC) in Definition 2.1. Show that the implication could be
replaced by an equivalence; that is, property (RC) could be rephrased as

(∀x, y ∈ Σ∗ )(x R y ⇔ (∀u ∈ Σ∗ )(xu R yu))

2.78. Given a DFA M = 〈Σ, S, s 0 , δ, F 〉, assume that δ(s, u) = q, and δ(q, v) = q. Use induction to show that (∀i ∈ N)(δ(s, uv^i ) = q).

2.79. Let L be the set of all strings that agree with some initial part of the pattern 0^1 1 0^2 1 0^3 1 0^4 1 . . . = 0100100010000100000100 . . . . Thus, L = {0, 01, 010, 0100, 01001, 010010, 0100100, . . .}. Prove that L is not
FAD.

2.80. Consider the following BNF over the three-symbol alphabet {a, (, )}, and show that the resulting language is not FAD.
<simple> := a | (<simple>)

2.81. (a) Let Σ = {0,1}. Let L 2 = {x ∈ Σ∗ | the base 2 number represented by x is a power of 2}. Show that L 2 is FAD.
(b) Let Σ = {0,1,2,3,4,5,6,7,8,9}. Let L 10 = {x ∈ Σ∗ | the base 10 number represented by x is a power of 2}. Prove that L 10 is not FAD.

Chapter 3

Minimization of Finite Automata

We have seen that there are many different automata that can be used to represent a given language. We would like to be able to find an automaton for a language L that is minimal, that is, a machine that represents the language with the fewest possible states.
Finding such an optimal DFA will involve transforming a given automaton into the most efficient
equivalent machine. To effectively accomplish this transformation, we must have a set of clear, unequiv-
ocal directions specifying how to proceed. A procedure is a finite set of instructions that unambiguously
defines deterministic, discrete steps for performing some task. As anyone who has programmed a com-
puter knows, it is possible to generate procedures that will never halt for some inputs (or perhaps for
all inputs if the program is seriously flawed). An algorithm is a procedure that is guaranteed to halt on
all (legal) inputs. In this chapter we will specify a procedure for finding a minimal machine and then
justify that this procedure is actually an algorithm. Thus, the theorems and definitions will show how to
transform an inefficient DFA into an optimal automaton in a straightforward manner that can be easily
programmed.

3.1 Homomorphisms and Isomorphisms


One of our goals for this chapter can be stated as follows: Given a language L, we wish to survey all the
machines that recognize L and choose the machine (or machines) that is “smallest.” It will be seen that
there is indeed a unique smallest machine: A RL . The automaton A RL will be unique in the sense that any
other optimal DFA looks exactly like A RL except for a trivial relabeling of the state names. The concept
of two automata “looking alike” will have to be formalized to provide a basis for our rigorous statements.
Machines that “look alike” will be called isomorphic, and the relabeling specification will be called an
isomorphism.
We have already learned some facts about A RL , which stem from the proof of Nerode’s theorem.
These are summarized below and show that A RL is indeed one of the optimal machines for the language
L.

Corollary 3.1 For any FAD language L, L(A RL ) = L.


Proof. This was shown when (3) ⇒ (1) in Theorem 2.2 was proved.

Also, in the proof of (1) ⇒ (2) in Nerode's theorem, the relation R M (defined by a given DFA M = 〈Σ, S, s 0 , δ, F 〉 for which L(M ) = L) was used to show ‖S‖ ≥ rk(R M ). Furthermore, in (2) ⇒ (3), right congruences such as R M that satisfied property (2) must be refinements of R L , and so rk(R M ) ≥ rk(R L ). Thus ‖S‖ ≥ rk(R M ) ≥ rk(R L ) = ‖S RL ‖, which leads immediately to the following corollary.

Corollary 3.2 For any FAD language L, A RL is a minimal deterministic finite automaton that accepts L.
Proof. The proof follows from the definition of a minimal DFA (Definition 2.7); that is, if M = 〈Σ, S, s 0 , δ, F 〉
is any machine that also accepts L, then ‖S‖ ≥ ‖S RL ‖.

Besides being in some sense “the simplest,” the minimal machine has some other nice properties.
For example, if A is minimal, then the right congruence generated by A is identical to the right congru-
ence generated by the language recognized by A; that is, R A = R L(A) (see the exercises). Examples 3.1 and
3.2 illustrate the two basic ways a DFA can have superfluous states.

Definition 3.1 A state s in a finite automaton A = 〈Σ, S, s 0 , δ, F 〉 is called accessible iff

∃x s ∈ Σ∗ ∋ δ(s 0 , x s ) = s

The automaton A is called connected iff

(∀s ∈ S)(∃x s ∈ Σ∗ )(δ(s 0 , x s ) = s)

That is, a connected machine requires all states to be accessible; every state s of S must be “reachable”
from s 0 by some string (x s ) in Σ∗ (different states will require different strings, and hence it is convenient
to associate an appropriate string x s with the state s). States that are not accessible are sometimes called
disconnected, inaccessible, or unreachable.

Example 3.1
The machine defined in Figure 3.1 satisfies the definition of a deterministic finite automaton, but is
disconnected since r cannot be reached by any string from the start state q. Note that x q could be λ or 10, while x t might be 0 or 111. There is no candidate for x r . Furthermore, r could be "thrown away"
without affecting the language that this machine accepts. This will be one of the techniques we will use
to minimize finite automata: removing the inaccessible states.
There is a second way for an automaton to have superfluous states, as shown by the automata in the
following examples. An overabundance of states may be present, recording nonessential information
and consequently distinguishing between strings in ways that are unnecessary.

Example 3.2
Consider the four-state DFA over {a, b} in which s 0 is the start state and the only final state, defined in Figure 3.2. This automaton is clearly connected, but it is still not optimal. This machine accepts all strings whose length is a multiple of 3, and s 1 and s 2 are really "remembering" the same information, that is, that we have currently read a string whose length is one more than a multiple of 3. The fact that some strings
that end in a are sent to s 1 , while those that end in b may be sent to s 2 , is of no real importance; we
do not have to “remember” what the last letter in the string actually was in order to correctly accept the
given language. The states s 1 and s 2 are in some sense equivalent, since they are performing the same
function. The careful reader may have noticed that this language could have been recognized with a
three-state machine, in which a single state combines the functions of s 1 and s 2 .

Figure 3.1: The automaton discussed in Example 3.1

Figure 3.2: The first automaton discussed in Example 3.2

Now consider the automaton shown in Figure 3.3, in which there are three superfluous states. This
automaton accepts the same language as the DFA in Figure 3.2, but this time not only are s 1 and s 2
performing the same function, but s 3 and s 4 are “equivalent,” and s 0 and s 5 are both “remembering” that
there has been a multiple of three letters seen so far. Note that it is not enough to check that s 1 and s 2 take
you to exactly the same places (as was the case in the first example); in this example, the arrows coming
out of s 1 and s 2 do not point to the same places. The important thing is that, when leaving s 1 or s 2 , when
a is seen, we go to equivalent states, and when processing b from s 1 or s 2 , we also go to equivalent states.
However, deciding whether two states are equivalent or not is perhaps a little less straightforward than it
may at first seem. This sets the stage for the appropriate definition of equivalence.

Definition 3.2 Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉, there is a relation between the states of A called
E A , the state equivalence relation on A, defined by
(∀s ∈ S)(∀t ∈ S)(sE A t ⇔ (∀x ∈ Σ∗ )(δ(s, x) ∈ F ⇔ δ(t , x) ∈ F ))

In other words, we will relate s and t iff it is not possible to distinguish whether we are starting from state s or state t ; each string x ∈ Σ∗ must either lead to a final state both from s and from t , or lead to a nonfinal state from both.

Figure 3.3: The second automaton discussed in Example 3.2
Another way of looking at this concept is to define new machines that “look like” A, but have different
start states. Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉 and two states s, t ∈ S, define a new automaton A t = 〈Σ, S, t , δ, F 〉 that has t as a start state, and another automaton A s = 〈Σ, S, s, δ, F 〉 having s as a start
state. Then sE A t ⇔ L(A s ) = L(A t ). (Why is this an equivalent definition?) These sets of words will be
used in later chapters and are referred to as terminal sets. T (A, t ) will denote the set of all words that
reach final states from t , and thus T (A, t ) = L(A t ) = {x |δ(t , x) ∈ F }.
In terms of the black box model presented in Chapter 1, we see that we cannot distinguish between
A s and A t by placing matching strings on the input tapes and observing the acceptance lights of the
two machines. For any string, both A s and A t will accept, or both will reject; without looking inside the
black boxes, there is no way to tell whether we are starting in state s or state t . This highlights the sense in
which s and t are deemed equivalent: we cannot distinguish between s and t by the subsequent behavior
of the automaton.
The modified automaton A t , which gives rise to the terminal set T (A, t ), can be contrasted with the modified automaton 〈Σ, S, s 0 , δ, {t }〉 from Chapter 2, which recognized the initial set I (A, t ) = {x | δ(s 0 , x) = t }. Notice that initial sets are comprised of strings that move from the start state to the distinguished state t , while terminal sets are made up of strings that go from t to a final state.
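Both kinds of sets admit a direct membership test, since deciding whether x ∈ I (A, t ) or x ∈ T (A, t ) only requires running the machine. The following sketch is ours, not the text's; it assumes a DFA stored as a Python dictionary with fields sigma, states, start, delta (a map from (state, letter) pairs to states), and finals, a convention reused in the later sketches.

# A minimal sketch (not from the text): membership tests for the initial
# set I(A, t) and the terminal set T(A, t) of a DFA A.

def run(A, state, x):
    """The extended transition function: process the string x from `state`."""
    for a in x:
        state = A["delta"][(state, a)]
    return state

def in_initial_set(A, t, x):
    """x is in I(A, t) iff x drives the start state to t."""
    return run(A, A["start"], x) == t

def in_terminal_set(A, t, x):
    """x is in T(A, t) iff x drives t into a final state."""
    return run(A, t, x) in A["finals"]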

Example 3.3
The automaton N discussed in Example 2.6 (Figure 3.4) has the following relations comprising E N :

s0 E N s0
s1 E N s1 , s1 E N s2 , s1 E N s3
s2 E N s1 , s2 E N s2 , s2 E N s3
s3 E N s1 , s3 E N s2 , s3 E N s3

Figure 3.4: The automaton N discussed in Example 3.3

This can be succinctly described by listing the equivalence classes:

[s 0 ]E N = {s 0 }
[s 1 ]E N = [s 2 ]E N = [s 3 ]E N = {s 1 , s 2 , s 3 },
and we will abuse our notation slightly, blurring the distinction between the relation E N and the partition it generates by writing E N = {{s 0 }, {s 1 , s 2 , s 3 }}.
Recall that Example 2.6 showed that the minimal machine that accepted L(N ) had two states; it will
be seen that it is no coincidence that E N has exactly the same number of equivalence classes.

Definition 3.3 A finite automaton A = 〈Σ, S, s 0 , δ, F 〉 is called reduced iff (∀s, t ∈ S)(sE A t ⇔ s = t ).

In a reduced machine, E A must be the identity relation on S, and in this case each equivalence class
will contain only a single element.

Example 3.4
The automaton N in Figure 3.4 is not reduced, since Example 3.3 shows that [s 2 ]E N contains three states.
On the other hand, the automaton A displayed in Figure 3.5a is reduced since [s 0 ]E A = {s 0 } and [s 1 ]E A =
{s 1 }. The concepts of homomorphism and isomorphism will play an integral part in justifying the cor-
rectness of the algorithms that produce the optimal DFA for a given language. We need to formalize what
we mean when we say that two automata are “the same.” The following examples illustrate the criteria
that must exist between similar machines.

Example 3.5
We now consider the automaton B shown in Figure 3.5b, which looks suspiciously like the DFA A given
in Figure 3.5a. In fact, it is basically the "same" machine. While it has been oriented differently (which has no effect on the δ function), and the start state has been labeled q 0 rather than s 0 , and the final state
is called q 1 rather than s 1 , A and B are otherwise “identical.” For such a relabeling to truly reflect the
same automaton structure, certain conditions must be met, as illustrated in the following examples.

Example 3.6
Consider machine C , defined by the state transition diagram given in Figure 3.5c. This machine is iden-
tical to B , except for the position of the start state. However, it is not the same machine as B, since it

89
Figure 3.5: (a) The automaton A (b) The automaton B(c) The automaton C (d) The automaton D (e) The
automaton E

behaves differently (and in fact accepts a different language). Thus we see that it is important for the
start state of one machine to correspond to the start state of the other machine. Note that we cannot cir-
cumvent this by letting q 0 correspond to s 1 , and q 1 correspond to s 0 , since other problems will develop,
as shown next in Example 3.7.

Example 3.7

Let machine D be defined by the state transition diagram given in Figure 3.5d. The automata B and
D (Figures 3.5b and 3.5d) look much the same, with start states corresponding, but they are not the
same (and will in fact accept different languages), because we cannot get the final states to correspond
correctly. Even if we do get the start and final states to agree, we still have to make sure that the transitions
correspond. This is illustrated in the next example.

Figure 3.6: (a) The DFA M discussed in Example 3.10 (b) The DFA N discussed in Example 3.10

Example 3.8
Consider the machine E given in Figure 3.5e. In this automaton, when leaving the start state, we travel
to a final state if we see 0 , and remain at the start state (which is nonfinal) if we see 1 ; this is different
from what happened in machine A, where we traveled to a final state regardless of whether we saw 0 or
1 . Thus it is seen that we not only have to find a correspondence (which can be thought of as a function
µ) between the states of our two machines, but we must do this in a way that satisfies the above three
conditions (or else we cannot claim the machines are “the same”). This is summed up in the following
definition of a homomorphism.

Definition 3.4 Given two finite automata, A = 〈Σ, S A , s 0 A , δ A , F A 〉 and B = 〈Σ, S B , s 0B , δB , F B 〉, and a func-
tion µ : S A → S B . µ is called a finite automata homomorphism from A to B iff the following three condi-
tions hold:

i. µ(s 0 A ) = s 0B .

ii. (∀s ∈ S A )(s ∈ F A ⇔ µ(s) ∈ F B ).

iii. (∀s ∈ S A )(∀a ∈ Σ)(µ(δ A (s, a)) = δB (µ(s), a)).

Note that µ is called a homomorphism from A to B , but it is actually a function between state sets,
that is, from S A to S B .
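Because both machines are finite, conditions (i) through (iii) can be checked mechanically. Here is a small sketch of such a check (ours, not the text's; it uses the dictionary encoding of a DFA introduced earlier, and the name is_homomorphism is our own).

# A minimal sketch (not from the text): testing conditions (i)-(iii) of
# Definition 3.4 for a candidate map mu : S_A -> S_B.

def is_homomorphism(mu, A, B):
    if mu[A["start"]] != B["start"]:              # (i) start maps to start
        return False
    for s in A["states"]:
        if (s in A["finals"]) != (mu[s] in B["finals"]):
            return False                          # (ii) finals correspond
        for a in A["sigma"]:
            # (iii) mu(delta_A(s, a)) must equal delta_B(mu(s), a)
            if mu[A["delta"][(s, a)]] != B["delta"][(mu[s], a)]:
                return False
    return True

Checking an isomorphism (Definition 3.5 below) would additionally require µ to be a bijection, which for finite state sets amounts to verifying that mu is one to one and that every state of B appears among its values.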

Example 3.9
Machines A and B in Example 3.5 are homomorphic, since the homomorphism µ: {s 0 , s 1 } → {q 0 , q 1 } de-
fined by µ(s 0 ) = q 0 and µ(s 1 ) = q 1 satisfies the three conditions.
The following example shows that, even if we can find a homomorphism satisfying the three conditions, the machines might not be the same.

Example 3.10
Let M = 〈Σ, {s 0 , s 1 , s 2 }, s 0 , δM , {s 1 }〉 and N = 〈Σ, {q 0 , q 1 }, q 0 , δN , {q 1 }〉 be given by the state transition dia-
grams in Figures 3.6a and 3.6b. Define a homomorphism ψ: {s 0 , s 1 , s 2 } → {q 0 , q 1 } by ψ(s 0 ) = q 0 , ψ(s 1 ) = q 1 , and ψ(s 2 ) = q 0 . Note that the start state maps to the start state, final states map to final states (and
nonfinal states map to nonfinal states), and, furthermore, the transitions agree. Here is the statement
that the 0 transition out of state s 0 is consistent:

ψ(δM (s 0 , 0)) = ψ(s 1 ) = q 1 = δN (q 0 , 0) = δN (ψ(s 0 ), 0)

Note that this really does say that the transition labeled 0 leaving the start state conforms; s 0 has a 0 -
transition pointing to s 1 , and so the 0 -transition from ψ(s 0 ) should point to ψ(s 1 ) (that is, q 0 should
point to q 1 ). But the transition taken from s 0 upon seeing a 0 , in our notation, is δM (s 0 , 0), and the place
q 0 goes to is δN (q 0 , 0). We wish to make sure that the state in N corresponding to where the 0 -transition
from s 0 points, denoted by ψ(δM (s 0 , 0)), agrees with the state to which q 0 points. Hence we require
ψ(δM (s 0 , 0)) = δN (q 0 , 0). In the last formula, q 0 was chosen because that was the state corresponding to
s 0 ; that is, ψ(s 0 ) = q 0 . Hence, in our formal notation, we were really checking ψ(δM (s 0 , 0)) = δN (ψ(s 0 ), 0).
Hence, we see that rule (iii) requires us to check all transitions leading out of all states for all letters; that
is, (∀s ∈ S M )(∀a ∈ Σ)(ψ(δM (s, a)) = δN (ψ(s), a)). Applying this rule to each choice of letters a and states
s, we have
ψ(δM (s 0 , 0)) = ψ(s 1 ) = q 1 = δN (q 0 , 0) = δN (ψ(s 0 ), 0)
ψ(δM (s 0 , 1)) = ψ(s 1 ) = q 1 = δN (q 0 , 1) = δN (ψ(s 0 ), 1)
ψ(δM (s 1 , 0)) = ψ(s 1 ) = q 1 = δN (q 1 , 0) = δN (ψ(s 1 ), 0)
ψ(δM (s 1 , 1)) = ψ(s 0 ) = q 0 = δN (q 1 , 1) = δN (ψ(s 1 ), 1)
ψ(δM (s 2 , 0)) = ψ(s 1 ) = q 1 = δN (q 0 , 0) = δN (ψ(s 2 ), 0)
ψ(δM (s 2 , 1)) = ψ(s 1 ) = q 1 = δN (q 0 , 1) = δN (ψ(s 2 ), 1)

Hence ψ is a homomorphism between M and N even though M has three states and N has two states.
While the existence of a homomorphism is not enough to ensure that the machines are “the same,” the
exercises for this chapter indicate that the existence of a homomorphism is enough to ensure that the
machines are equivalent. The extra condition we need to guarantee that the machines are identical
(except for a trivial renaming of the states) is that ψ be a bijection.

Definition 3.5 Given two finite automata A = 〈Σ, S A , s 0 A , δ A , F A 〉 and B = 〈Σ, S B , s 0B , δB , F B 〉, and a func-
tion µ : S A → S B , µ is called a finite automata isomorphism from A to B iff the following five conditions
hold:

i. µ(s 0 A ) = s 0B .

ii. (∀s ∈ S A )(s ∈ F A ⇔ µ(s) ∈ F B ).

iii. (∀s ∈ S A )(∀a ∈ Σ)(µ(δ A (s, a)) = δB (µ(s), a)).

iv. µ is a one-to-one function from S A to S B .

v. µ is onto S B .

Example 3.11
µ from Example 3.9 is an isomorphism. Example 3.5 illustrated that the automaton A was essentially
“the same” as B except for the way the states were named. Note that µ can be thought of as the recipe
for relabeling the states of A to form a machine that would then be in the very strictest sense absolutely
identical to B . The map ψ from Example 3.10 is not an isomorphism because it is not one to one.

Definition 3.6 Given two finite automata A = 〈Σ, S A , s 0 A , δ A , F A 〉 and B = 〈Σ, S B , s 0B , δB , F B 〉, A is said to be isomorphic to B iff there exists a finite automata isomorphism between A and B , and we will write A ≅ B .

Example 3.12
Machines A and B from Examples 3.4 and 3.5 are isomorphic. Machines M and N from Example 3.10 are
not isomorphic (and not just because the particular function ψ fails to satisfy the conditions; we must
actually prove that no function exists that qualifies as an isomorphism between M and N ).
Now that we have rigorously defined the concept of two machines being “essentially identical,” we
can prove that, given a language L, any reduced and connected machine A accepting L must be minimal,
that is, have as few states as possible for that particular language. We will prove this assertion by showing
that any such A is isomorphic to A RL , which was shown in Corollary 3.2 to be the “smallest” possible
machine for L.

Theorem 3.1 Let L be any FAD language over an alphabet Σ, and let A = 〈Σ, S, s 0 , δ, F 〉 be any reduced and connected automaton that accepts L. Then A ≅ A RL .
Proof. We must try to define a reasonable function µ from the states of A to the states of A RL (which
you should recall corresponded to equivalence classes of R L ). A natural way to define µ (which happens to
work!) is: For each s ∈ S, find a string x s ∈ Σ∗ 3 δ(s 0 , x s ) = s. (Since A is connected, we are guaranteed to
find such an x s . In fact, there may be many strings that take us from s 0 to s; choose any one of them, and
call it x s .) We need to map s to some equivalence class of R L ; the logical choice is the class containing x s .
Thus we define
µ(s) = [x s ]RL

An immediate question comes to mind: There may be several strings that we could use for x s ; does it matter which one we choose to find the equivalence class? It would not do if, say, R L consisted of two equivalence classes, the even-length strings = [11]RL and the odd-length strings = [0]RL , and both δ(s 0 , 0) and δ(s 0 , 11) equaled s. Then, on the one hand, µ(s) should be [0]RL and, on the other hand, it should be [11]RL . µ must be a function; it cannot send s to two different equivalence classes. Note that there would be no problem if δ(s 0 , 11) = s and δ(s 0 , 1111) = s, since [11]RL = [1111]RL , both of which represent the set of all even-length strings. Here x s could be 11 , or it could be 1111 , and there is no inconsistency in the way in which µ(s) is defined; in either case, s is mapped by µ to the class of even-length strings. Thus we must first show:

1. µ is well-defined (which means it is defined everywhere, and the definitions are consistent; that is, if
there are two choices for x s , say, y and z, then [y]RL = [z]RL ). Since A is connected, each state s can
be reached by some string x s ; that is, (∀s ∈ S)(∃x s ∈ Σ∗ )(δ(s 0 , x s ) = s), and so there is indeed (at least)
one equivalence class ([x s ]RL ) to which s maps under µ. We therefore have µ(s) = [x s ]RL . Thus µ is
defined everywhere. We must still make sure that µ is not multiply defined: Let x, y ∈ Σ∗ and assume
δ(s 0 , x) = δ(s 0 , y). Then

δ(s 0 , x) = δ(s 0 , y) ⇒ (by definition of =)
(∀u ∈ Σ∗ )(δ(δ(s 0 , x), u) ∈ F ⇔ δ(δ(s 0 , y), u) ∈ F ) ⇒ (by Theorem 1.1)
(∀u ∈ Σ∗ )(δ(s 0 , xu) ∈ F ⇔ δ(s 0 , yu) ∈ F ) ⇒ (by definition of L)
(∀u ∈ Σ∗ )(xu ∈ L ⇔ yu ∈ L) ⇒ (by definition of R L )
x R L y ⇒ (by definition of [ ])
[x]RL = [y]RL

Thus, if both x and y take us from s 0 to s, then it does not matter whether we let µ(s) equal [x]RL or
[y]RL , since they are identical. µ is therefore a bona fide function.

2. µ is onto S RL . Every equivalence class must be the image of some state in S, since (∀[x]RL ∈ S RL )([x]RL =
µ(δ(s 0 , x))), and so δ(s 0 , x) maps to [x]RL .

3. µ(s 0 ) = s 0RL
s 0RL = [λ]RL = µ(δ(s 0 , λ)) = µ(s 0 )

4. Final states map to final states; that is, (∀s ∈ S)(µ(s) ∈ F RL ⇔ s ∈ F ). Choose an s ∈ S and pick a
corresponding x s ∈ Σ∗ such that δ(s 0 , x s ) = s. Then

s ∈F ⇔ (by definition of x s , L)
x s ∈ L(A) ⇔ (by definition of L)
xs ∈ L ⇔ (by definition of F RL )
[x s ]RL ∈ F RL ⇔ (by definition of µ)
µ(s) ∈ F RL

5. The transitions match up; that is, (∀s ∈ S)(∀a ∈ Σ)(µ(δ(s, a)) = δRL (µ(s), a)). Choose an s ∈ S and
pick a corresponding x s ∈ Σ∗ such that δ(s 0 , x s ) = s. Note that this implies that [x s ] = µ(s) = µ(δ(s 0 , x s )).
Then

µ(δ(s, a)) = (by definition of x s )


µ(δ(δ(s 0 , x s ), a)) = (by Theorem 1.1)
µ(δ(s 0 , x s a)) = (by definition of µ)
[x s a]RL = (by definition of δRL )
δRL ([x s ]RL , a) = (by definition of µ and x s )
δRL (µ(s), a)

So far we have not needed the fact that A was reduced. In fact, we have now proved that µ is a
homomorphism from A to A RL as long as A is merely connected. However, if A is reduced, we can
show:

6. µ is one to one; that is, if µ(s) = µ(t ), then s = t . Let s, t ∈ S and assume µ(s) = µ(t ).

µ(s) = µ(t ) ⇒ (by definition of =)
(∀u ∈ Σ∗ )(δRL (µ(s), u) = δRL (µ(t ), u)) ⇔ [by property (5), induction]
(∀u ∈ Σ∗ )(µ(δ(s, u)) = µ(δ(t , u))) ⇒ (by definition of =)
(∀u ∈ Σ∗ )(µ(δ(s, u)) ∈ F RL ⇔ µ(δ(t , u)) ∈ F RL ) ⇔ [by property (4) above]
(∀u ∈ Σ∗ )(δ(s, u) ∈ F ⇔ δ(t , u) ∈ F ) ⇔ (by definition of E A )
sE A t ⇔ (since A is reduced)
s = t

Thus, by results (1) through (6), µ is a well-defined homomorphism that is also a bijection; so µ is an isomorphism and therefore A ≅ A RL .

Corollary 3.3 Let A and B be reduced and connected finite automata. Under these conditions, A is equivalent to B iff A ≅ B .
Proof. If A ≅ B , it is easy to show that A is equivalent to B (as indicated in the exercises, this implication is true even if A and B are not reduced and connected). Now assume the hypothesis that A and B are reduced and connected does hold, and that A is equivalent to B . By Theorem 3.1, A ≅ A RL(A) ; similarly, B ≅ A RL(B ) . Since L(A) = L(B ), A RL(A) = A RL(B ) . Therefore, A ≅ A RL(A) = A RL(B ) ≅ B .

3.2 Minimization Algorithms


From the results in the previous section, it follows that a reduced and connected finite automaton must
be minimal. This section demonstrates how to transform an existing DFA into an equivalent machine
that is both reduced and connected and hence is the most efficient machine possible for the given lan-
guage. The designer of an automaton can therefore focus solely on producing a machine that recognizes
the correct set of strings (without regard for efficiency), knowing that the techniques presented in this
section can later be employed to shrink the DFA to its optimal size. The concepts explored in Chapters 4,
5, and 6 will provide further tools to aid in the design process and corresponding techniques to achieve
optimality.

Corollary 3.4 A reduced and connected deterministic finite automaton A = 〈Σ, S, s 0 , δ, F 〉 is minimal.
Proof. By Theorem 3.1, there is an isomorphism between A and A RL . Since an isomorphism is a bi-
jection between the state sets, ‖S‖ = ‖S RL ‖. By Corollary 3.2, A RL has the smallest number of states, and
therefore so does A.

Thus, if we had a machine for L that we could verify was reduced and connected, we would be able to
state that we had found the minimal machine accepting L. We therefore would like some algorithms for
determining if a machine M has these properties. We would also like to find a method for transforming
a nonoptimal machine into one with the desired properties. The simplest transformation is from a disconnected machine to a connected machine: given any machine A, we will define a connected machine A c that accepts the same language that A did; that is, L(A) = L(A c ).

Definition 3.7 Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉, define a new automaton A c = 〈Σ, S c , s 0c , δc , F c 〉,
called A connected, by
S c = {s ∈ S | ∃x ∈ Σ∗ ∋ δ(s 0 , x) = s}
s 0c = s 0
F c = F ∩ S c = { f ∈ F | ∃x ∈ Σ∗ 3 δ(s 0 , x) = f }
and δc is derived from the restriction of δ to S c × Σ:

(∀a ∈ Σ)(∀s ∈ S c )(δc (s, a) = δ(s, a))

A c is thus simply the machine A with the unreachable states “thrown away”; s 0 can be reached by
x = λ, so it is a valid choice for the start state in A c . F c is simply the final states that can be reached from
s 0 , and δc is the collection of transitions that still come from (and consequently point to) states in the
connected portion. Actually, δc was defined to be the transitions that merely come from states in S c , with
no mention of any restrictions on the range of δc . We must have, however, δc : S c × Σ → S c ; in order for A c to be well defined, δc must be shown to map into the proper range. It would not do to have a transition
leading from a state in A c to a state that is not in the new state set of A c . The fact that δc does indeed
have the desired properties is relegated to the exercises.

Figure 3.7: (a) The DFA M discussed in Example 3.13 (b) The DFA M c discussed in Example 3.13

Example 3.13
Let M = 〈{a, b}, {q 0 , q 1 , q 2 , q 3 }, q 0 , δ, {q 1 , q 3 }〉, as illustrated in Figure 3.7a. By inspection, the only states
that can be reached from the start state are q 0 and q 3 . Hence M c = 〈{a, b}, {q 0 , q 3 }, q 0 , δc , {q 3 }〉. The
resulting automaton is shown in Figure 3.7b. An algorithm for effectively computing S c will be presented
later.

Theorem 3.2 Given any finite automaton A, the new machine A c is indeed connected.
Proof. This is an immediate consequence of the way S c was defined.

Definition 3.7 and Theorem 3.2 would be of little consequence if it were not for the fact that A and
A c accept the same language. A c is in fact equivalent to A, as proved in Theorem 3.3.

Theorem 3.3 Given any finite automaton A = 〈Σ, S, s 0 , δ, F 〉, A and A c are equivalent, that is, L(A c ) =
L(A).
Proof. Let x ∈ Σ∗ . Then:

x ∈ L(A) ⇔ (by definition of L)
∃s ∈ S ∋ (δ(s 0 , x) = s ∧ s ∈ F ) ⇔ (by definition of S c )
s ∈ S c ∧ s ∈ F ⇔ (by definition of ∩)
s ∈ (S c ∩ F ) ⇔ (by definition of F c )
s ∈ F c ⇔ (by definition of s [above, on line 2])
δ(s 0 , x) ∈ F c ⇔ (by definition of δc and induction)
δc (s 0 , x) ∈ F c ⇔ (by definition of s 0c )
δc (s 0c , x) ∈ F c ⇔ (by definition of L)
x ∈ L(A c )

Thus, given any machine A, we can find an equivalent machine (that is, a machine that accepts the
same language as A) that is connected. Furthermore, there is an algorithm that can be applied to find A c
(that is, we don’t just know that such a machine exists, we actually have a method for calculating what
it is). The definition of S c implies that there is a procedure for finding S c : one can begin enumerating
the strings x in Σ∗ , and by applying the transition function to each x, the new states that are reached can
be included in S c . This is not a very satisfactory process because there are an infinite number of strings
in Σ∗ to check. However, the indicated proof for Theorem 2.7 shows that, if a state can be reached by
a “long” string, then it can be reached by a “short” string. Thus, we will only need to check the “short”
strings. In particular,
S c = ⋃_{x ∈ Σ∗} δ(s 0 , x) = ⋃_{x ∈ Q} δ(s 0 , x)

where Q consists of the "short" strings: Q = {x ∈ Σ∗ | |x| < ‖S‖}. Thus, Q is the set of all strings of length
less than the number of states in the DFA. Q is a finite set, and therefore we can check all strings x in
Q in a finite amount of time; we therefore have an algorithm (that is, a procedure that is guaranteed to
halt) for finding S c , and consequently an algorithm for constructing A c . Thus, given any machine, we
can find an equivalent machine that is connected. The above method is not very efficient because many
calculations are constantly repeated. A better algorithm based on Definition 3.10 will be presented later.
We now turn our attention to building a reduced machine from an arbitrary machine. The following
definition gives a consistent way to combine the redundant states identified by the state equivalence
relation E A .

Definition 3.8 Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉, define a new finite automaton A/E A , called A
modulo its state equivalence relation, by

A/E A = 〈Σ, S E A , s 0E A , δE A , F E A 〉

where

SE A = {[s]E A | s ∈ S}
s 0E A = [s 0 ]E A
FE A = {[s]E A | s ∈ F }

and δE A is defined by
(∀a ∈ Σ)(∀[s]E A ∈ S E A )(δE A ([s]E A , a) = [δ(s, a)]E A )

Thus, there is one state in A/E A for each equivalence class in E A , the new start state is the equivalence
class containing s 0 , and the final states are those equivalence classes that are made up of states from F .
The transition function is also defined in a natural manner: Given an equivalence class [t ]E A and a letter
a, choose one state, say t , from the class and see what state the old transition specified (δ(t , a)). The
new transition function will choose the equivalence class containing this new state ([δ(t , a)]E A ). Once
again, there may be several states in an equivalence class and thus several states from which to choose.
We must make sure that the definition of δE A does not depend on which state of [t ]E A we choose (that is,
we must ascertain that δE A is well defined). Similarly, F E A should be shown to be well defined (see the
exercises).
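Once the equivalence classes are in hand, the construction of Definition 3.8 is mechanical. The following sketch is ours, not the text's; it assumes the dictionary encoding used in the earlier sketches, together with a map cls sending each state to its equivalence class represented as a frozenset.

# A minimal sketch (not from the text) of Definition 3.8: build A/E_A from
# a DFA A and the classes cls[s] = [s] under E_A.

def quotient(A, cls):
    delta = {}
    for s in A["states"]:
        for a in A["sigma"]:
            # because E_A is well behaved, the class of delta(s, a) is the
            # same no matter which representative s of cls[s] is used
            delta[(cls[s], a)] = cls[A["delta"][(s, a)]]
    return {
        "sigma": A["sigma"],
        "states": set(cls.values()),
        "start": cls[A["start"]],
        "delta": delta,
        "finals": {cls[f] for f in A["finals"]},
    }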
It stands to reason that if we coalesce all the states that performed the same function (that is, were
related by E A ) into a single state the resulting machine should no longer have distinct states that perform
the same function. We can indeed prove that this is the case, that is, that A/E A is reduced.

Theorem 3.4 Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉, A/E A is reduced.
Proof. Note that the state equivalence relation for A/E A is E (A/E A ), not E A . We need to show that if two
states of A/E A are related by the state equivalence relation for A/E A then those two states are identical; that
is,
(∀s, t ∈ S E A )(sE (A/E A ) t ⇔ s = t )

Assume s, t ∈ S E A . Then ∃s′, t′ ∈ S ∋ s = [s′]E A and t = [t′]E A ; furthermore,

sE (A/E A ) t ⇔ (by definition of s′, t′)
[s′]E A E (A/E A ) [t′]E A ⇔ (by definition of E (A/E A ) )
(∀x ∈ Σ∗ )(δE A ([s′], x) ∈ F E A ⇔ δE A ([t′], x) ∈ F E A ) ⇔ (by δE A and induction)
(∀x ∈ Σ∗ )([δ(s′, x)]E A ∈ F E A ⇔ [δ(t′, x)]E A ∈ F E A ) ⇔ (by definition of F E A )
(∀x ∈ Σ∗ )(δ(s′, x) ∈ F ⇔ δ(t′, x) ∈ F ) ⇔ (by definition of E A )
s′E A t′ ⇔ (by definition of [ ])
[s′]E A = [t′]E A ⇔ (by definition of s, t )
s = t

Since we ultimately want to first apply Definition 3.7 to find a connected DFA and then apply Defini-
tion 3.8 to reduce that DFA, we wish to show that this process of obtaining a reduced machine does not
destroy connectedness. We can be assured that if Definition 3.8 is applied to a connected machine the
result will then be both connected (Theorem 3.5) and reduced (Theorem 3.4).

Theorem 3.5 If A = 〈Σ, S, s 0 , δ, F 〉 is connected, then A/E A is connected.


Proof. We need to show that every state in A/E A can be reached from the start state of A/E A . Assume
s ∈ S E A . Then ∃s′ ∈ S ∋ s = [s′]E A ; but A was connected, and so there exists an x ∈ Σ∗ such that δ(s 0 , x) = s′; that is, there is a string that will take us from s 0 to s′ in the original machine A. This same string will take us from s 0E A to s in A/E A since

δ(s 0 , x) = s′ ⇒ (by definition of =)
[δ(s 0 , x)]E A = [s′]E A ⇔ (by definition of δE A and induction)
δE A ([s 0 ]E A , x) = [s′]E A ⇔ (by definition of s 0E A )
δE A (s 0E A , x) = [s′]E A

Therefore, every state s ∈ S E A can be reached from the start state and A/E A is thus connected.

Finally, we want to show that we do not change the language by reducing the machine. The following
theorem proves that A/E A and A are indeed equivalent.

Theorem 3.6 Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉, then L(A/E A ) = L(A).


Proof.

x ∈ L(A/E A ) ⇔ (by definition of L)


δE A (s 0E A , x) ∈ F E A ⇔ (by definition of s 0E A )
δE A ([s 0 ]E A , x) ∈ F E A ⇔ (by definition of δE A and induction)
[δ(s 0 , x)]E A ∈ F E A ⇔ (by definition of F E A )
δ(s 0 , x) ∈ F ⇔ (by definition of L)
x ∈ L(A)

Theorem 3.7 Given a finite automaton definable language L and any finite automaton A that accepts
L, then there exists an algorithm for constructing the unique (up to isomorphism) minimum-state finite
automaton accepting L.
Proof. For the finite automaton A that accepts L, there is an algorithm for finding the set of connected
states in A, and therefore there exists an algorithm for constructing A c , which is a connected automaton
with the property that L(A c ) = L(A) = L.
Furthermore, there exists an algorithm for computing E Ac , the state equivalence relation on A c ; consequently, there is an algorithm for constructing A c /E Ac , which is a reduced, connected automaton with the property that L(A c /E Ac ) = L(A c ) = L(A) = L.
From the main theorem on minimization (Theorem 3.1), we know that A c /E Ac ≅ A RL , and A RL is the
unique (up to isomorphism) minimum-state finite automaton accepting L. Consequently, the derived
automaton A c /E Ac is likewise a minimum-state automaton.

The remainder of the chapter is devoted to developing the methods for computing S c and E A and
justifying that the resulting algorithms are indeed correct.
Our formal definition of E A requires that an infinite number of strings be checked before we can find
the equivalence classes upon which A c /E Ac is based. If we could find an algorithm to generate E A , we
would then have an algorithm for building the minimal machine. This is the motivation for Definition
3.9.

Definition 3.9 Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉 and an integer i , define the ith partial state
equivalence relation on A, a relation between the states of A denoted by E i A , by

(∀s, t ∈ S)(sE i A t iff (∀x ∈ Σ∗ ∋ |x| ≤ i )(δ(s, x) ∈ F ⇔ δ(t , x) ∈ F ))

Thus E i A relates states that cannot be distinguished by strings of length i or less. Contrast this to the
definition of E A , which related states that could not be distinguished by any string of any length. E 0A
denotes a relatively weak criterion that is progressively strengthened with successive E i A relations. As
illustrated by Example 3.14, these relations culminate in the relation we seek, E A .

Example 3.14
Let B be the DFA illustrated in Figure 3.8. Consider the relation E 0B . The empty string λ can differentiate
between q 0 and the final states, but cannot differentiate between q 1 , q 2 , q 3 , and q 4 . Thus E 0B has two
equivalence classes, {q 0 } and {q 1 , q 2 , q 3 , q 4 }.
In E 1B , λ still differentiates q 0 from the other states, but the string 1 can distinguish q 3 from q 1 , q 2 ,
and q 4 since δ(q 3 , 1) ∉ F , but δ(q i , 1) ∈ F for i = 1, 2, and 4. We still cannot distinguish between q 1 , q 2 ,
and q 4 with strings of length 0 or 1, so these remain together and E 1B = {{q 0 }, {q 3 }, {q 1 , q 2 , q 4 }}. Similarly,
since δ(q 1 , 11) ∈ F but δ(q 2 , 11) ∉ F and δ(q 4 , 11) ∉ F , E 2B = {{q 0 }, {q 3 }, {q 1 }, {q 2 , q 4 }}. Further investigation
shows E 2B = E 3B = E 4B = E 5B = · · · , and indeed E B = E 2B .
The i th state equivalence relation provides a convenient vehicle for computing E A . The behavior
exhibited by the relations in Example 3.14 follows a pattern that is similar for all deterministic finite au-
tomata. The following observations will culminate in a proof that the calculation of successive partial
state equivalence relations is guaranteed to lead to the relation E A .
Given an integer i and a finite alphabet Σ, there is clearly an algorithm for finding E i A since there
are only a finite number of strings in Σ0 ∪ Σ1 ∪ Σ2 ∪ · · · ∪ Σi . Furthermore, given every E i A , there is an expression for E A :

E A = E 0A ∩ E 1A ∩ E 2A ∩ E 3A ∩ · · · ∩ E n A ∩ · · · = ⋂_{j=0}^{∞} E j A

Figure 3.8: The DFA B discussed in Example 3.14

The proof is relegated to the exercises and is related to the fact that

Σ∗ = Σ0 ∪ Σ1 ∪ Σ2 ∪ · · · ∪ Σn ∪ · · ·

Finally, it should be clear that if two states cannot be distinguished by strings of length 7 or less, they
cannot be distinguished by strings of length 6 or less, which means E 7A is a refinement of E 6A . This
principle generalizes, as formalized below.

Lemma 3.1 Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉 and an integer m, E m+1A is a refinement of E m A ,
which means
(∀s, t ∈ S)(sE m+1A t ⇒ sE m A t )
or
E m+1A ⊆ E m A
Proof. See the exercises.

Lemma 3.2 shows that each E m A is related to the desired E A . Lemma 3.1 thus shows that successive
E m A relations come closer to “looking like” E A .

Lemma 3.2 Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉 and an integer m, E A is a refinement of E m A , and
so
(∀s, t ∈ S)(sE A t ⇒ sE m A t )
That is,
E A ⊆ Em A
Proof. Let s, t ∈ S. Then
sE A t ⇒ (by definition of E A )
(∀x ∈ Σ∗ )(δ(s, x) ∈ F ⇔ δ(t , x) ∈ F ) ⇒ (true for all x, so it is true for all "short" x)
(∀x ∈ Σ∗ ∋ |x| ≤ m)(δ(s, x) ∈ F ⇔ δ(t , x) ∈ F ) ⇒ (by definition of E m A )
sE m A t

While it is clearly possible to find a given E m A by applying the definition to each of the strings in Σ0 ∪ Σ1 ∪ Σ2 ∪ · · · ∪ Σm , there is a much more efficient way if E m−1A is already known, as outlined in Theorem 3.8. A starting point is provided by E 0A , which can be found very easily, as shown by Lemma 3.3. From E 0A , E 1A can then be found using Theorem 3.8, and then E 2A , and so on.

Lemma 3.3 Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉, E 0A has two equivalence classes, F and S − F
(unless either F or S − F is empty, in which case there is only one equivalence class, S).
Proof. The proof follows immediately from the definition of E 0A ; the empty string λ differentiates be-
tween final and nonfinal states, producing the equivalence classes outlined above.

Given E 0A as a starting point, Theorem 3.8 shows how successive relations can be efficiently calcu-
lated.

Theorem 3.8 Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉,

(∀s ∈ S)(∀t ∈ S)(∀i ∈ N)(sE i +1A t ⇔ sE i A t ∧ (∀a ∈ Σ)(δ(s, a)E i A δ(t , a)))

Proof. Let s ∈ S, t ∈ S. Then

sE i +1A t ⇔ (∀x ∈ Σ∗ ∋ |x| ≤ i + 1)(δ(s, x) ∈ F ⇔ δ(t , x) ∈ F )
⇔ (∀x ∈ Σ∗ ∋ |x| ≤ i )[δ(s, x) ∈ F ⇔ δ(t , x) ∈ F ] ∧
(∀y ∈ Σ∗ ∋ |y| = i + 1)[δ(s, y) ∈ F ⇔ δ(t , y) ∈ F ]
⇔ (∀x ∈ Σ∗ ∋ |x| ≤ i )[δ(s, x) ∈ F ⇔ δ(t , x) ∈ F ] ∧
(∀y ∈ Σ∗ ∋ 1 ≤ |y| ≤ i + 1)[δ(s, y) ∈ F ⇔ δ(t , y) ∈ F ]
⇔ (∀x ∈ Σ∗ ∋ |x| ≤ i )[δ(s, x) ∈ F ⇔ δ(t , x) ∈ F ] ∧
(∀a ∈ Σ)(∀x ∈ Σ∗ ∋ |x| ≤ i )[δ(s, ax) ∈ F ⇔ δ(t , ax) ∈ F ]
⇔ sE i A t ∧ (∀a ∈ Σ)(∀x ∈ Σ∗ ∋ |x| ≤ i )(δ(s, ax) ∈ F ⇔ δ(t , ax) ∈ F )
⇔ sE i A t ∧ (∀a ∈ Σ)(∀x ∈ Σ∗ ∋ |x| ≤ i )(δ(δ(s, a), x) ∈ F ⇔ δ(δ(t , a), x) ∈ F )
⇔ sE i A t ∧ (∀a ∈ Σ)(δ(s, a)E i A δ(t , a))

Note that Theorem 3.8 gives a far superior method for determining successive E i A relations. The definition required the examination of many (long) strings using the extended transition function; Theorem 3.8 allows us to simply check a few letters using the one-step δ function, without needing the extended function at all. Theorems 3.9, 3.10, and 3.11 will assure us that E A will eventually be found. The following theorem guarantees that the relations, should they ever begin to look alike, will continue to look alike as successive relations are computed.

Theorem 3.9 Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉,

(∃m ∈ N 3 E m A = E m+1A ) ⇒ (∀k ∈ N)(E m+k A = E m A )

Proof. By induction on k; see the exercises.

The result in Theorem 3.9 is essential to the proof of the next theorem, which guarantees that when
successive relations look alike they are identical to E A .

Theorem 3.10 Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉,

(∃m ∈ N 3 E m A = E m+1A ) ⇒ E m A = E A

Proof. Assume ∃m ∈ N 3 E m A = E m+1A and let q, r ∈ S:

1. By Lemma 3.2, qE A r ⇒ qE m A r .

2. Conversely, assume qE m A r . Then

qE m A r ⇒ (by assumption)
qE m+1A r ⇒ (by Theorem 3.9)
(∀ j ≥ m)(qE j A r )

Furthermore, by Lemma 3.1, (∀ j ≤ m)(qE j A r ), and so (∀ j ∈ N)(qE j A r ); but by definition of E A , this


implies qE A r . We have just shown that qE m A r ⇒ qE A r .

3. Combining (1) and (2), we have (∀q, r ∈ S)(qE m A r ⇔ qE A r ), and so E m A = E A .

The next theorem guarantees that these relations will eventually look alike (and so by Theorem 3.10,
we are assured that successive computations of E i A will yield an expression representing the relation
E A ).

Theorem 3.11 Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉,

(∃m ∈ N ∋ m ≤ ‖S‖ ∧ E m A = E m+1A ).

Proof. Assume the conclusion is false; that is, that E 0A , E 1A , . . . , E ‖S‖A are all distinct. Since E ‖S‖A ⊆ · · · ⊆ E 1A ⊆ E 0A , the only way for two successive relations to be different is for the number of equivalence classes to increase. Thus,

0 < rk(E 0A ) < rk(E 1A ) < rk(E 2A ) < · · · < rk(E ‖S‖A ),

which means that rk(E ‖S‖A ) > ‖S‖, which is a contradiction (why?). Therefore, not all these relations can
be distinct, and so there is some index m for which E m A = E m+1A .

Corollary 3.5 Given a DFA A = 〈Σ, S, s 0 , δ, F 〉, there is an algorithm for computing E A .


Proof. E A can be found by using Lemma 3.3 to find E 0A , and computing successive E i A relations using Theorem 3.8 until E i A = E i +1A ; this E i A will equal E A , and this will all happen before i reaches ‖S‖, the
number of states in S. The procedure is therefore guaranteed to halt.
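The proof is constructive, and the loop it describes is easy to program. The sketch below is ours, not the text's; it continues the dictionary encoding of the earlier sketches and groups states by a "signature," so that two states share a block of E i+1A exactly when they share a block of E i A and, for every letter, their successors share a block of E i A (the test of Theorem 3.8).

# A minimal sketch (not from the text) of Corollary 3.5: compute E_A by
# refining the partition E_0, E_1, ... until it stabilizes.

def state_equivalence(A):
    # E_0: separate final from nonfinal states (Lemma 3.3)
    block = {s: (s in A["finals"]) for s in A["states"]}
    while True:
        # signature of s: its current block plus the blocks of its successors
        sig = {s: (block[s],) + tuple(block[A["delta"][(s, a)]]
                                      for a in sorted(A["sigma"]))
               for s in A["states"]}
        if len(set(sig.values())) == len(set(block.values())):
            break            # E_{i+1} = E_i, so this is E_A (Theorem 3.10)
        block = sig          # refine and continue (Theorem 3.8)
    classes = {}
    for s in A["states"]:
        classes.setdefault(block[s], set()).add(s)
    return {s: frozenset(classes[block[s]]) for s in A["states"]}

By Theorem 3.11, the loop refines at most ‖S‖ times, so this is an algorithm rather than merely a procedure; its result can be fed directly to the quotient sketch given after Definition 3.8.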

Since E A was the key to producing a reduced machine, we now have an algorithm for taking a DFA
and finding an equivalent DFA that is reduced. The other necessary step needed to find the minimal
machine was to produce a connected DFA from a given automaton. This construction hinged on the
calculation of S c , the set of connected states.
The algorithm suggested by the definition of S c is by no means the most efficient; it involves checking
long strings with the δ function and hence massive duplication of effort. Furthermore, the definition
seems to imply that all the strings in Σ∗ must be checked, which certainly cannot be completed if it is
done one string at a time. Theorem 2.7 can be used to justify that it is unnecessary to check any strings
longer than ‖S‖ (see the exercises). Thus S c = {δ(s 0 , x) | |x| < ‖S‖}. While this set, being based on a finite number of words, justifies that there is an algorithm for finding S c (and hence there exists an algorithm
for constructing A c ), it is still a very inefficient way to calculate the set of accessible states.
As with the calculation of E A , there is a way to avoid using δ to process long strings when computing
S c . In this case, a better strategy is to begin with s 0 and find all the new states that can be reached from
s 0 with just one transition. Note that this can be done by simply examining the row of the state transition
table corresponding to s 0 , and hence the computation can be accomplished quite fast. Each of these new
states should then be examined in the same fashion to see if they lead to still more states, and this process
can continue until all connected states are found. A sequence of state sets is thereby constructed, in a
similar manner to the way successive partial state equivalence relations E i A were built. This approach is
reflected in Definition 3.10.

Definition 3.10 Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉, the ith partial state set C i is defined by the
following rules: Let C 0 = {s 0 } and recursively define

C i+1 = C i ∪ ⋃_{q ∈ C i , a ∈ Σ} δ(q, a).

C ‖S‖ must equal S c (why?), and we will often arrive at the final answer long before ‖S‖ iterations have been calculated (see the exercises and refer to the treatment of E i A ). It can also be proved (by induction) that C i represents the set of all states that can be reached from s 0 by strings of length i or less (see the
exercises).
Recall that the definition of S c involved the extended state transition function δ. Definition 3.10
instead uses the information found in the previous iteration to avoid calculating paths for long strings.
As suggested earlier, there is an even more efficient method of calculating C i+1 from C i , since only
paths from the newly added states need be explored anew.
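That refinement amounts to a breadth-first search of the transition diagram. A minimal sketch (ours, not the text's, in the same dictionary encoding as before) in which each state's row of the transition table is examined only once:

# A minimal sketch (not from the text) of Definition 3.10: grow the partial
# state sets C_0, C_1, ... but expand only the newly added states.

def connected_states(A):
    reached = {A["start"]}     # C_0 = {s_0}
    frontier = {A["start"]}    # states added in the previous round
    while frontier:
        successors = {A["delta"][(q, a)] for q in frontier for a in A["sigma"]}
        frontier = successors - reached   # only genuinely new states
        reached |= frontier               # C_{i+1}
    return reached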

Example 3.15
Consider the DFA D given in Figure 3.9.
C 0 = {s 0 }
and since δ(s 0 , a) = s 1 , and δ(s 0 , b) = s 3 ,
C 1 = {s 0 , s 1 , s 3 }
Note that there is no need to check s 0 again, but s 1 and s 3 generate

C 2 = {s 0 , s 1 , s 3 , s 2 , s 4 }

Checking these two new states generates one more state, so

C 3 = {s 0 , s 1 , s 3 , s 2 , s 4 , s 5 }

and since s 5 leads to no new states, we have C 4 = C 3 ; as with E i A , we will now find C 3 = C 4 = C 5 = C 6 =
· · · = S c . The exercises will develop the parallels between the generation of the partial state sets C i and
the generation of the partial state equivalence relations E i A .
The procedure for recursively calculating successive C i s to determine S c provides the final algorithm
needed to efficiently find the minimal machine corresponding to a given automaton A. From A, we use
the C i s to calculate S c and thereby define A c . Theorem 3.8 and the related results suggest an efficient algorithm for computing E Ac , from which we can construct A c /E Ac .

Figure 3.9: The DFA D discussed in Example 3.15

A c /E Ac is indeed the minimal machine equivalent to A, as shown by the results in this chapter. Theorems 3.3 and 3.6 show that A c /E Ac is equiv-
alent to A. By Theorems 3.2, 3.4, and 3.5, this automaton is reduced and connected, and Corollary 3.4
guarantees that A c /E Ac must therefore be minimal.
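Assembling the sketches developed in this chapter gives the complete pipeline promised by Theorem 3.7. The helper restrict below (ours, not the text's) simply discards the unreachable states to form A c ; it relies on the fact that S c is closed under δ.

# A minimal sketch (not from the text) combining the earlier sketches in
# the order recommended here: first A^c, then (A^c) modulo its state
# equivalence relation.

def restrict(A, keep):
    return {
        "sigma": A["sigma"],
        "states": set(keep),
        "start": A["start"],
        "delta": {(s, a): t for (s, a), t in A["delta"].items() if s in keep},
        "finals": A["finals"] & keep,
    }

def minimize(A):
    Ac = restrict(A, connected_states(A))         # Definition 3.7
    return quotient(Ac, state_equivalence(Ac))    # Definition 3.8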
The proof of Theorem 3.7 suggests building a minimal equivalent deterministic finite automaton for
A by first shrinking to a connected machine and then reducing modulo the state equivalence relation,
that is, by finding A c /E Ac . Theorem 3.5 assures us that when we reduce a connected machine it will still
be connected. An alternate strategy would be to first reduce modulo E A and then shrink to a connected
machine, that is, to find (A/E A )c . In this case, we would want to make sure that connecting a reduced
machine will still leave us with a reduced machine. It can be shown that if A is reduced then A c is reduced
(see the exercises), and hence this method could also be used to find the minimal equivalent DFA.
Finding the minimal equivalent DFA by reducing A first and then eliminating the disconnected states
is, however, less efficient than applying the algorithms in the opposite order. Finding the connected set
of states is simpler than finding the state equivalence relation, so it is best to eliminate as many states as
possible by finding S c before embarking on the more complex search for the state equivalence relation.
It should be clear that the algorithms in this chapter are presented in sufficient detail to easily allow
them to be programmed. As suggested in Chapter 1, the final states can be represented as a set and the
transition function as a matrix. The minimization procedures would then return the minimized matrix
and new final state set.
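As a sketch of what such a program might look like (the state numbering, matrix layout, and function name are assumptions of this sketch, not prescriptions of the text), the state equivalence relation E A can be computed by repeated refinement:

    # A sketch of computing E_A by refinement; states are assumed to be
    # numbered 0..n-1, 'delta' is a matrix indexed by [state][letter], and
    # 'final' is the set of final states.
    def state_equivalence(n, delta, final, num_letters):
        block = [0 if s in final else 1 for s in range(n)]   # E_0A
        while True:
            # s and t remain together iff they are currently together and
            # their successors under every letter are currently together
            sig = [(block[s], tuple(block[delta[s][a]] for a in range(num_letters)))
                   for s in range(n)]
            labels = {}
            new_block = [labels.setdefault(sig[s], len(labels)) for s in range(n)]
            if new_block == block:    # E_{i+1}A = E_iA, so this is E_A
                return block
            block = new_block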
As a practical matter then, when generating an automaton to perform a given task, our concern can
be limited to defining a machine that works. No further creative insight is then necessary to find the
minimal machine. Once a machine that recognizes the desired language is found (however inefficient
it may be), the minimization algorithms can then be applied to produce a machine that is both correct
and efficient.
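Combining the two sketches above (whose names carry over as assumptions), the complete workflow might be programmed as follows:

    # A sketch combining the two routines above: shrink to the connected
    # part first, then reduce modulo E_A, as the text recommends.
    def minimize(n, delta, final, num_letters, s0=0):
        flat = {(s, a): delta[s][a] for s in range(n) for a in range(num_letters)}
        keep = sorted(connected_states(s0, flat, range(num_letters)))
        index = {s: i for i, s in enumerate(keep)}           # renumber A_c
        sub_delta = [[index[delta[s][a]] for a in range(num_letters)] for s in keep]
        sub_final = {index[s] for s in final if s in index}
        block = state_equivalence(len(keep), sub_delta, sub_final, num_letters)
        min_delta = {(block[s], a): block[sub_delta[s][a]]
                     for s in range(len(keep)) for a in range(num_letters)}
        return min_delta, {block[s] for s in sub_final}, block[index[s0]]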
The proof that a reduced and connected machine is the most efficient was based on the properties
of the automaton A RL obtained from the right congruence R L . This can be proved without relying on

the existence of A RL . We close this chapter with an outline of such a proof. The details are similar to the
proofs given in Chapter 7 for finite-state transducers.
Theorem 3.3, which was not based in any way on R L , implies that a minimal DFA must be connected.
Similarly, an immediate corollary of Theorem 3.6 is that a minimal DFA must be reduced. Thus, a min-
imal machine is forced to be both reduced and connected. We now must justify that a reduced and
connected machine is minimal. This result will follow from Corollary 3.3, which can also be proved with-
out relying on A RL . The implication (A ∼ = B ⇒ A is equivalent to B ) is due solely to the properties of
isomorphisms and is actually true irrespective of any other hypotheses (see the exercises). Conversely, if
A is equivalent to B , then the fact that A and B are both reduced and connected allows an isomorphism
to be defined from A to B (see the exercises).
Corollary 3.3 allows us to argue that any reduced and connected automaton A is isomorphic to a
minimal automaton M , and hence A has as few states as M and is minimal. The argument would proceed
as follows: Since M is minimal, we already know that Theorems 3.3 and 3.6 imply that M is reduced
and connected. Thus, M and A are two reduced and connected equivalent automata, and Corollary 3.3
ensures that A ∼= M . Thus, minimal machines are exactly those that are reduced and connected.

Exercises
3.1. Use induction to show (∀s ∈ S)(∀x ∈ Σ∗ )(µ(δ(s, x)) = δRL (µ(s), x)) for the mapping µ defined in The-
orem 3.1.

3.2. Consider the state transition function given in Definition 3.8 and use induction to show

(∀x ∈ Σ∗ )(∀[s]E A ∈ S E A )(δE A ([s]E A , x) = [δ(s, x)]E A )

3.3. Prove that

E A = E 0A ∩ E 1A ∩ E 2A ∩ E 3A ∩ · · · ∩ E n A ∩ · · · = ⋂_{j = 0}^{∞} E j A

3.4. Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉, show that the function δE A given in Definition 3.8 is well
defined.

3.5. Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉, show that the set F E A given in Definition 3.8 is a well-
defined set.

3.6. Show that the range of the function δc given in Definition 3.7 is contained in S c .

3.7. Prove Lemma 3.1.

3.8. Prove Lemma 3.3.

3.9. Prove Theorem 3.9.

3.10. Given a homomorphism µ from the finite automaton A = 〈Σ, S A , s 0A , δ A , F A 〉 to the DFA B = 〈Σ, S B , s 0B , δB , F B 〉,
prove by induction that

(∀s ∈ S A )(∀x ∈ Σ∗ )(µ(δ A (s, x)) = δB (µ(s), x))

3.11. Given a homomorphism µ from the finite automaton A = 〈Σ, S A , s 0A , δ A , F A 〉 to the DFA B = 〈Σ, S B , s 0B , δB , F B 〉,
prove that L(A) = L(B ). As long as it is explicitly cited, the result of Exercise 3.10 may be used with-
out proof.

3.12. (a) Give an example of a DFA for which A is not connected and A/E A is not connected.
(b) Give an example of a DFA for which A is not connected but A/E A is connected.

3.13. Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉 and the state equivalence relation E A , show there exists
a homomorphism from A to A/E A .

3.14. Given a connected finite automaton A = 〈Σ, S, s 0 , δ, F 〉, show there exists a homomorphism from A
to A RL (A) by:

(a) Define a mapping ψ from A to A RL(A) . (No justification need be given.)


(b) Prove that your ψ is well defined.
(c) Prove that ψ is a homomorphism.

3.15. Give an example to show that there may not exist a homomorphism from A to A RL(A) if A is not
connected (see Exercise 3.14).

3.16. Give an example to show that there may still exist a homomorphism from A to A RL(A) even if A is not
connected (see Exercise 3.14).

3.17. Give an example to show that, for the relations R and R L given in Theorem 2.2, there need not exist
a homomorphism from A RL to A R .

3.18. ≅ is an equivalence relation; in Chapter 2 we saw some equivalence relations were also right con-
gruences. Comment on the appropriateness of asking whether ≅ is a right congruence.

3.19. Is E A a right congruence? Explain your answer.

3.20. Prove that if A is reduced then A c is reduced.

3.21. For a homomorphism µ from S A to S B for the two finite automata A = 〈Σ, S A , s 0A , δ A , F A 〉 and B =
〈Σ, S B , s 0B , δB , F B 〉, prove (∀s, t ∈ S A )(µ(s)E B µ(t ) ⇔ sE A t ).

3.22. Let M be a DFA, and let L = L(M ).

(a) Define an appropriate mapping ψ from M c to A (R M ) . (No justification need be given.)


(b) Prove that your ψ is well defined.
(c) Prove that ψ is a homomorphism.
(d) Prove that ψ is a bijection.
(e) Argue that M c ≅ A (R M ) .

3.23. For the machine A given in Figure 3.10a, find:

(a) E A (list each E i A )


(b) L(A)
(c) A RL(A)

(d) R L(A)
(e) A/E A

3.24. For the machine B given in Figure 3.10b, find:

(a) E B (list each E i B )


(b) L(B )
(c) A RL(B )
(d) R L(B )
(e) B /E B

Note that your answer to part (e) might contain some disconnected states.

3.25. For the machine C given in Figure 3.10c, find:

(a) EC (list each E iC )


(b) L(C )
(c) A RL(C )
(d) R L(C )
(e) C /EC

Note that your answer to part (e) might contain some disconnected states.

3.26. For the machine D given in Figure 3.10d, find:

(a) E D (list each E i D )


(b) L(D)
(c) A RL(D)
(d) R L(D)
(e) D/E D

Note that your answer to part (e) might contain some disconnected states.

3.27. Without relying on A RL , prove that if A and B are both reduced and connected equivalent DFAs
then A ≅ B . Give the details for the following steps:

(a) Define an appropriate function ψ between the states of A and the states of B .
(b) Show that ψ is well defined.
(c) Show that ψ is a homomorphism.
(d) Show that ψ is a bijection.

3.28. In the proof of (6) in Theorem 3.1, the transition from line 3 to line 4 only involved ⇒ rather than
⇔. Show by means of an example that the two expressions involved in this transition are not equiv-
alent.

3.29. Supply reasons for each of the equivalences in the proof of Theorem 3.8.

Figure 3.10: (a) The DFA A discussed in Exercise 3.23 (b) The DFA B discussed in Exercise 3.24 (c) The DFA
C discussed in Exercise 3.25 (d) The DFA D discussed in Exercise 3.26

3.30. Minimize the machine defined in Figure 3.3.

3.31. (a) Give an example of a DFA for which A is not reduced and A c is not reduced.
(b) Give an example of a DFA for which A is not reduced and A c is reduced.

3.32. Note that ≅ relates some automata to other automata, and therefore ≅ is a relation over the set of
all deterministic finite automata.

(a) For automata A, B , and C , show that if g is an isomorphism from A to B and f is an isomor-
phism from B to C , then f ◦ g is an isomorphism from A to C .
(b) Prove that ≅ is a symmetric relation; that is, formally justify that if there is an isomorphism
from A to B then there is an isomorphism from B to A.
(c) Prove that ≅ is a reflexive relation.
(d) From the results in parts (a), (b), and (c), prove that ≅ is an equivalence relation over the set of
all deterministic finite automata.

3.33. Show that homomorphism is not an equivalence relation over the set of all deterministic finite
automata.

3.34. For the relations R and R L given in Theorem 2.2, show that there exists a homomorphism from A R
to A RL .

3.35. Prove that if there is a homomorphism from A to B then R A refines R B .

3.36. Prove that if A is isomorphic to B then R A = R B

(a) By appealing to Exercise 3.35.


(b) Without appealing to Exercise 3.35.

3.37. Consider two deterministic finite automata for which A is not homomorphic to B , but R A = R B .

(a) Give an example of such automata for which L(A) = L(B ).


(b) Give an example of such automata for which L(A) 6= L(B ).
(c) Can such examples be found if both A and B are connected and L(A) = L(B )?
(d) Can such examples be found if both A and B are reduced and L(A) = L(B )?

3.38. Disprove that if A is homomorphic to B then R A = R B .

3.39. Prove or give a counterexample [assume L = L(M )].

(a) For any DFA M , there exists a homomorphism ψ from A (R M ) to M .


(b) For any DFA M , there exists an isomorphism ψ from A (R M ) to M .
(c) For any DFA M , there exists a homomorphism ψ from M to A (R M ) .

3.40. Prove that if A is a minimal DFA then R A = R L(A) .

3.41. Give an example to show that Exercise 3.40 can be false if A is not minimal.

3.42. Give an example to show that Exercise 3.40 may still hold if A is not minimal.

3.43. Definition 3.8 takes an equivalence relation on the set of states S and defines a machine based on
that relation. In general, we could choose a relation R on S and define a machine A/R (as we did
when we defined A/E A when the relation R was E A ).

(a) Consider R = E 0A . Is A/E 0A always well defined? Give an example to illustrate your answer.
(b) Assume R is a refinement of E A . Is A/R always well defined? For the cases where it is well
defined, consider the theorems that would correspond to Theorems 3.4, 3.5, and 3.6 if E A
were replaced by such a refinement R. Which of these theorems would still be true?

3.44. Given a DFA M , prove or give a counterexample.

(a) There exists a homomorphism from M /E M to A RL(M ) .


(b) There exists a homomorphism from A RL(M ) to M /E M .

3.45. Prove that the bound given for Theorem 3.11 can be sharpened: given a finite automaton A =
〈Σ, S, s 0 , δ, F 〉, (∃m ∈ N ∋ m < ‖S‖ ∧ E m A = E m+1 A ).

3.46. Prove or give a counterexample:

(a) If A and B are equivalent, then A and B are isomorphic.


(b) If A and B are isomorphic, then A and B are equivalent.

3.47. Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉, prove that the C i s given in Definition 3.10 are nested:
(∀i ∈ N)(C i ⊆ C i +1 ).

3.48. Prove (by induction) that C i does indeed represent the set of all states that can be reached from s 0
by strings of length i or less.

3.49. Prove that, given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉,

(∃i ∈ N ∋ C i = C i +1 ) ⇒ (∀k ∈ N)(C i = C i +k ).

3.50. Prove that, given a DFA A = 〈Σ, S, s 0 , δ, F 〉, (∃i ∈ N ∋ C i = C i +1 ) ⇒ (C i = S c ).

3.51. Prove that, given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉, ∃i ∈ N ∋ C i = C i +1 .

3.52. Prove that, given a DFA A = 〈Σ, S, s 0 , δ, F 〉, (∃i ∈ N ∋ i ≤ ‖S‖ ∧ C i = S c ).

3.53. Use the results of Exercises 3.47 through 3.52 to argue that the procedure for generating S c from
successive calculations of C i is correct and is actually an algorithm.

3.54. Give an example of two DFAs A and B that simultaneously satisfy the following three criteria:

1. There is a homomorphism from A to B .


2. There is a homomorphism from B to A.
3. There does not exist any isomorphism between A and B .

3.55. Assume R and Q are both right congruences of finite rank, R refines Q, and L is a union of equiva-
lence classes of Q.

(a) Show that L is also a union of equivalence classes of R.
(b) Show that there exists a homomorphism from A R to AQ . (Hint: Do not use the µ given in
Theorem 3.1; there is a far more straightforward way to define a mapping.)
(c) Give an example to show that there need not be a homomorphism from AQ to A R .

3.56. Prove that AQ must be connected.

3.57. Prove that if there is an isomorphism from A to B and A is connected then B must also be con-
nected.

3.58. Prove that if there is an isomorphism from A to B and B is connected then A must also be con-
nected.

3.59. Disprove that if there is a homomorphism from A to B and A is connected then B must also be
connected.

3.60. Disprove that if there is a homomorphism from A to B and B is connected then A must also be
connected.

3.61. Given a DFA A, recall the relation R A on Σ∗ induced by A. This relation gives rise to another DFA
A (R A ) [with Q = R A and L = L(A)]. Consider also the connected version of A, A c .

(a) Define an isomorphism ψ from A (R A ) to A c . (No justification need be given.)


(b) Prove that your ψ is well defined.
(c) Prove that ψ is a homomorphism.
(d) Prove that ψ is an isomorphism.

3.62. Assume that A and B are connected DFAs. Assume that there exists an isomorphism ψ from A to B
and an isomorphism µ from B to A. Prove that ψ = µ−1 .

3.63. Assume that A and B are DFAs. Assume that there exists an isomorphism ψ from A to B and an
isomorphism µ from B to A. Give an example for which ψ 6= µ−1 .

3.64. Give an example of a three-state DFA for which E 0A has only one equivalence class. Is it possible
for E 0A to be different from E 1A in such a machine? Explain.

3.65. Assume A and B are both reduced and connected. If ψ is a homomorphism from A to B , does ψ
have to be an isomorphism? Justify your conclusions.

3.66. Prove Corollary 3.2.

3.67. Prove that S c = {δ(s 0 , x) | |x| ≤ ‖S‖}.

3.68. Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉, two states s, t ∈ S, and the automata A t = 〈Σ, S, t , δ, F 〉
and A s = 〈Σ, S, s, δ, F 〉, prove that sE A t ⇔ L(A s ) = L(A t ).

3.69. Given a finite automaton A = 〈Σ, S, s 0 , δ, F 〉, consider the terminal sets T (A, t ) = {x | δ(t , x) ∈ F } and
initial sets I (A, t ) = {x | δ(s 0 , x) = t } for each t ∈ S.

(a) Prove that the initial sets of A must form a partition of Σ∗ .


(b) Give an example to show that the terminal sets of A might not partition Σ∗ .
(c) Give an example to show that the terminal sets of A might partition Σ∗ .

Chapter 4

Nondeterministic Finite Automata

A nondeterministic finite automaton, abbreviated NDFA, is a generalization of the deterministic
machines that we have studied in previous chapters. Although nondeterministic machines lack some of
the restrictions imposed on their deterministic cousins, the class of languages recognized by nondeter-
ministic finite automata is exactly the same as the class of languages recognized by deterministic finite
automata. In this sense, the recognition power of nondeterministic finite automata is equivalent to that
of deterministic finite automata.
In this chapter we will show the correspondence of nondeterministic finite automata to determin-
istic finite automata, and we will prove that both types of machines accept the same class of languages.
In a later section, we will again generalize our computational model to allow nondeterministic finite au-
tomata that make transitions spontaneously, without an input symbol being processed. It will be shown
that the class of languages recognized by these new machines is exactly the same as the class of lan-
guages recognized by our first type of nondeterministic finite automata and is thus the same as the class
of languages recognized by deterministic finite automata.

4.1 Definitions and Basic Theorems


Whereas deterministic finite automata are restricted to having exactly one transition from a state for
each a ∈ Σ, a nondeterministic finite automaton may have any number of transitions for a given input
symbol, including zero transitions.
When processing an input string, if an NDFA comes to a state from which there is no transition arc
labeled with the next input symbol, the path through the machine which is being followed is terminated.
Termination can take the place of the “garbage state” (a permanent rejection state) found in many deter-
ministic finite automata, which is used to reject some strings that are not in the language recognized by
the automaton (the state s 7 played this role in Examples 1.11 and 1.9).

Example 4.1
Let L = {w ∈ {a , b , c }∗ | ∃y ∈ {a , b }∗ ∋ w = a y c }. We can easily build a nondeterministic finite automaton that
accepts this set of words. One such automaton is displayed in Figure 4.1. In this example there are no
transitions out of s 0 labeled with either b or c , nor are there any transitions from s 1 labeled with a . From
state s 2 there are no transitions at all. This means that if either b or c is encountered in state s 0 or a
is encountered in state s 1 , or any input letter is encountered once we reach state s 2 , the word on the

Figure 4.1: The NDFA described in Example 4.1

Figure 4.2: A deterministic version of the NDFA in Example 4.1

input tape will not be able to follow this particular path through the machine. Thus, if a word is not fully
processed by the NDFA, it will not be considered accepted (even if the state in which it was prematurely
“stuck” was a final state).
An equivalent, although more complicated, deterministic finite automaton is given in Figure 4.2.
Note that this deterministic finite automaton requires the introduction of an extra state, a dead state or
garbage state, to continue the processing of strings that are not in the language.
A nondeterministic finite automaton may also have multiple transitions from any state for a given
input symbol. For example, consider the following construction of a nondeterministic acceptor for the
language L, which consists of all even-length strings along with all strings whose number of 1 s is a
multiple of 3. That is, L = {x ∈ {0 , 1 }∗ | |x| = 0 mod 2 ∨ |x|1 = 0 mod 3}.

Example 4.2

In the NDFA given in Figure 4.3, there are multiple transitions from state s 0 : processing the symbol 0
causes the machine to enter states s 1 and s 2 , whereas processing a 1 causes the machine to enter both
state s 3 and state s 2 .
Within a nondeterministic finite automaton there can be multiple paths that are labeled with the
components of a string. For example, if we let w = 01 , then there are two paths labeled by the compo-
nents of w : (s 0 → s 1 → s 3 ) and (s 0 → s 2 → s 4 ). The second path leads to a final state, s 4 , while the first
path does not. We will adopt the convention that this word w is accepted by the automaton since at least
one of the paths does terminate in a final state. These concepts will be formalized later in Definition 4.3.
This ability to make multiple transitions from a given state can simplify the construction of the ma-
chine, but adds no more power to our computational model. The deterministic machine equivalent to
Example 4.2 is substantially more complex, and its construction is left as an exercise for the reader.
Another restriction that is relaxed when we talk about nondeterministic finite automata is the num-
ber of initial states. While a deterministic machine is constrained to having exactly one start state, a

Figure 4.3: The NDFA discussed in Example 4.2

Figure 4.4: The NDFA discussed in Example 4.3

nondeterministic finite automaton may have any number, other than zero, up to ‖S‖. Indeed, some
applications will be seen in which all the states are start states.

Example 4.3
We can build a machine that will accept the same language as in Example 4.2, but in a slightly different
way. Note that in Figure 4.4 the multiplicity of start states simplifies the construction considerably.
As before, the addition of multiple start states to our computational model facilitates machine con-
struction but adds no more recognition power. We turn now to the formal definition of nondeterministic
finite automata.

Definition 4.1 A nondeterministic finite automaton(NDFA) is a quintuple A = 〈Σ, S, S 0 , δ, F 〉 where:

i. Σ is an alphabet.

ii. S is a finite nonempty set of states.

iii. S 0 is a set of initial states, a nonempty subset of S.

iv. δ : S × Σ → ℘(S) is the state transition function.

v. F is the set of accepting states, a (possibly empty) subset of S.

The input alphabet, the state space, and even the set of final states are the same as for deterministic
finite automata. The important differences are contained in the definitions of the initial states and of the
δ function.
The set of initial states can be any nonempty subset of the state space. These can be viewed as mul-
tiple entry points into the machine, with each start state beginning distinct, although not necessarily
disjoint, paths through the machine.
The δ function for nondeterministic finite automata differs from the δ function of deterministic ma-
chines in that it maps a single state and a letter to a set of states. In some texts, one will find δ defined
as simply a relation with range S and not as a function; without any loss of generality we define δ as a
function with range ℘(S), which makes the formal proofs of relevant theorems considerably easier.

Example 4.4
Consider the machine A = 〈Σ, S, S 0 , δ, F 〉, where

Σ = {a , b }
S = {r, s, t }
S 0 = {r, s}
F = {t }

and δ : {r, s, t } × {a , b } → {∅, {r }, {s}, {t }, {s, t }, {r, s}, {r, t }, {r, s, t }} is given in Figure 4.5.

δ    a      b
r    {s}    ∅
s    ∅      {r, t }
t    ∅      {t }

Figure 4.5: An NDFA state transition diagram corresponding to the formal definition given in Example
4.4

We will see later that this machine accepts strings that begin with alternating a s and b s and end with
one or more consecutive b s.

Definition 4.2 Given an NDFA A = 〈Σ, S, S 0 , δ, F 〉, the extended state transition function for A is the func-
tion δ : S × Σ∗ → ℘(S) defined recursively as follows:

(∀s ∈ S) δ(s, λ) = {s}
(∀s ∈ S)(∀a ∈ Σ)(∀x ∈ Σ∗ ) δ(s, xa ) = ⋃_{q ∈ δ(s,x)} δ(q, a )

Once again, δ(s, x) is meant to represent where we arrive after starting at a state s and processing all
the letters of the string x. In the case of nondeterministic finite automata, δ does not map to a single
state but to a set of states because of the possible multiplicity of paths.

Example 4.5
Consider again the NDFA displayed in Example 4.4. To find all the places a string such as bb can reach
from s, we would first determine what can be reached by the first b . The reachable states are r and t ,
since δ(s, b ) = {r, t }. From these states, we would then determine what could be reached by the second b
(from r , no progress is possible, but from t , we can again reach t ). These calculations are reflected in the
recursive definition of δ:

δ(s, bb ) = ⋃_{q ∈ δ(s,b )} δ(q, b ) = ⋃_{q ∈ {r,t }} δ(q, b ) = δ(r, b ) ∪ δ(t , b ) = ∅ ∪ {t } = {t }
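The recursive definition translates directly into code. In the following sketch, the dictionary encoding of δ is an assumption of the sketch, with missing entries standing for the empty set:

    # A sketch of the extended transition function of Definition 4.2; 'delta'
    # is assumed to map (state, letter) pairs to sets of states.
    def delta_bar(delta, s, x):
        current = {s}                          # delta_bar(s, lambda) = {s}
        for a in x:
            nxt = set()
            for q in current:
                nxt |= delta.get((q, a), set())
            current = nxt
        return current

    # The machine of Example 4.4:
    delta = {('r', 'a'): {'s'}, ('s', 'b'): {'r', 't'}, ('t', 'b'): {'t'}}
    print(delta_bar(delta, 's', 'bb'))         # prints {'t'}, as in Example 4.5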

Because of the multiplicity of initial states and because the δ function is now set-valued, it is possible
for a nondeterministic finite automaton to be active in more than a single state at one time. Whereas in
all deterministic finite automata there is a unique path through the machine labeled with components
of w for each w ∈ Σ∗ , this is not necessarily the case for nondeterministic finite automata. At any point
in the processing of a string, the δ function maps the input symbol and the current state to a set of
states. This implies that multiple paths through the machine are possible or that the machine can get
“stuck” and be unable to process the remainder of the string if there is no transition from a state labeled
with the appropriate letter. There is no more than one path for each word if there is exactly one start
state and the δ function always maps to a singleton set (or ;). If we were to further require that the δ
function have a defined transition to another state for every input symbol, then the machine that we
have would essentially be a deterministic finite automaton. Thus, all deterministic finite automata are
simply a special class of nondeterministic finite automata; with some trivial changes in notation, any
DFA can be thought of as an NDFA. Indeed, the state transition diagram of a DFA could be a picture of a
well-behaved NDFA. Therefore, any language accepted by a DFA can be accepted by an NDFA.

Example 4.6
Consider the machine given in Example 4.4 and let x = b ; the possible paths through the machine include
(1) starting at s and proceeding to r , and (2) starting at s and proceeding to t . Note that it is not possible
to start from t (since t ∉ S 0 ), and there is no way to proceed with x = b by starting at r , the other start
state.
Now let x = ba and consider the possibilities. The only path through the machine requires that we
start at s, proceed to r , and return to s; starting at s and proceeding to t leaves no way to process the
second letter of x. Starting from r is again hopeless (what types of strings are good candidates for starting
at r ?).
Now let x = bab ; the possible paths through the machine include (1) starting at s, proceeding to r ,
returning to s, and then moving again to r , and (2) starting at s, proceeding to r , returning to s, and
then proceeding to t . Note that starting at s and moving immediately to t again leaves us with no way to

process the remainder of the string. Both b and bab included paths that terminated at the final state t
(among other places). These strings will be said to be recognized by this NDFA (compare with Definition
4.3). ba had no path that led to a final state, and as a consequence we will consider ba to be rejected by
this machine.
There are a number of ways in which to conceptualize a nondeterministic finite automaton. Among
the most useful are the following two schemes:

1. At each state where a multiple transition occurs, the machine replicates into identical copies of
itself, with each copy following one of the possible paths.

2. Multiple states of the machine are allowed to be active, and each of the active states reacts to each
input letter.

It happens that the second viewpoint is the most useful for our purposes. From a theoretical point
of view, we use this as the basis for proving the equivalence of deterministic and nondeterministic finite
automata. It is also a useful model upon which to base the circuits that implement NDFAs.
The concept of a language for nondeterministic finite automata is different from that for determin-
istic machines. Recall that the requirement for a word to be contained in the language accepted by a
deterministic finite automaton was that the processing of a string would terminate in a final state. This
is also the condition for belonging to the language accepted by a nondeterministic finite automaton;
however, since the path through a nondeterministic finite automaton is not necessarily unique, only one
of the many possible paths need terminate in a final state for the string to be accepted.

Definition 4.3 Let A = 〈Σ, S, S 0 , δ, F 〉 be a nondeterministic finite automaton and w be a word in Σ∗ . A
accepts w iff ( ⋃_{q ∈ S 0} δ(q, w)) ∩ F ≠ ∅.

Again conforming with our previous usage, a word that is not accepted is rejected. The use of the
operator L will be consistent with its usage in previous chapters, although it does have a different formal
definition. As before, L(A) is used to designate all those strings that are accepted by a finite automaton
A. Since the concept of acceptance must be modified for nondeterministic finite automata, the formal
definition of L is necessarily different (contrast Definitions 4.3 and 1.12).

Definition 4.4 Given an NDFA A = 〈Σ, S, S 0 , δ, F 〉, the language accepted by A, denoted L(A), is given by

L(A) = {x ∈ Σ∗ | ( ⋃_{q ∈ S 0} δ(q, x)) ∩ F ≠ ∅}

Occasionally, it will be more convenient to express L(A) in the following fashion: L(A) = {x ∈ Σ∗ | ∃t ∈
S 0 ∋ δ(t , x) ∩ F ≠ ∅}. The concept of equivalent automata is unchanged: two machines are equivalent iff
they accept the same language. Thus, if one or both of the machines happen to be nondeterministic, the
definition still applies. For example, the NDFAs given in Figures 4.3 and 4.4 are equivalent.
The language recognized by a nondeterministic finite automaton is the set of all words where at least
one of the paths through the machine labeled with components of that word ends in a final state. In
other words, the set of terminal states at the ends of the paths labeled by components of a word w must
have a state in common with the set of final states in order for w to belong to L(A).
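In terms of the delta_bar sketch given after Example 4.5, acceptance might be tested as follows (again a sketch under the same assumed representation):

    # A sketch of Definition 4.4: w is accepted iff the union of the reachable
    # sets over all start states meets the set of final states.
    def accepts(s0_set, final, delta, w):
        reached = set()
        for s in s0_set:
            reached |= delta_bar(delta, s, w)
        return bool(reached & set(final))

    # For the machine of Example 4.4: accepts({'r', 's'}, {'t'}, delta, 'bb')
    # returns True, since one path ends in the final state t.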
As a first example, refer to the NDFA defined in Example 4.4. As illustrated in Example 4.6, this ma-
chine accepts strings that begin with alternating a s and b s and end with one or more consecutive b s.

Figure 4.6: An NDFA for pattern recognition

Example 4.7

For a more concrete example, consider the problem of a ship attempting to transmit data to shore at
random intervals. The receiver must continually listen, usually to noise, and recognize when an actual
transmission starts so that it can record the data that follow. Let us assume that the start of a transmission
is signaled by the string 010010 (in practice, such a signal string should be much longer to minimize the
possibility of random noise triggering the recording mechanism). In essence, we wish to build an NDFA
that will monitor a bit stream and move to a final state when the substring 010010 is detected (note
that nonfinal states correspond to having the recording mechanism off, and final states signify that the
current data should be recorded). The reader is encouraged to discover firsthand how hard it is to build
a DFA that correctly implements this machine and contrast that solution to the NDFA T given in Figure
4.6.
Since the transitions leading to higher states are labeled by the symbols in 010010 , it is clear that
the last state cannot be reached unless the sequence 010010 is actually scanned at some point during
the last state cannot be reached unless the sequence 010010 is actually scanned at some point during
the processing of the input string. Thus, the NDFA clearly accepts no word that should be rejected.
Conversely, since all possible legal paths are explored by an NDFA, valid strings will find a way to the
final state. It is sometimes helpful to think of the NDFA as remaining in s 0 while the initial part of the
input string is being processed and then “guessing” when it is the right time to move to s 1 .
It is also possible to model an end-of-transmission signal that turns the recording device off (see the
exercises). The device would remain in various final states until a valid end-of-transmission string was
scanned, at which point it would return to the (nonfinal) start state.
While the NDFA given in Example 4.7 is very straightforward, it appears to be hard to simulate this
nondeterminism in real time with a deterministic computer. It has not been difficult to keep track of the
multiple paths in the simple machines seen so far. However, if each state has multiple transitions for a
given symbol, the number of distinct paths a single word can take through an NDFA grows exponentially
as the length of the word increases. For example, if each transition allowed a choice of three destination
states, a word of length m would have 3ᵐ possible paths from one single start state. An improvement can
be made by calculating, as each letter is processed, the set of possible destinations (rather than recording
all the paths). Still, in an n-state NDFA, there are potentially 2ⁿ such combinations of states. This repre-
sents an improvement over the path set, since now the number of state combinations is independent of
the length of the particular word being processed; it depends only on the number of states in the NDFA,
which is fixed. We will see that keeping track of the set of possible destination states is indeed the best
way to handle an NDFA in a deterministic manner.
Since we have seen in Chapter 1 that it is easy to implement a DFA, we now explore methods to
convert an NDFA to an equivalent DFA. Suppose that we are given a nondeterministic finite automaton
A and that we want to construct a corresponding deterministic finite automaton A d . Using the concepts

in Definitions 4.1 through 4.4, we can proceed in the following fashion. Our general strategy will be to
keep track of all the states that can be reached by some string in the nondeterministic finite automaton.
Since we can arbitrarily label the states of an automaton, we let the state space of A d be the power set of
S. Thus, S d = ℘(S), and each state in the new machine will be labeled by some subset of S. Furthermore,
let the start state of A d , denoted s 0d , be labeled by the member of ℘(S) containing those states that are
initial states in A; that is, s 0d = S 0 .
Since our general strategy is to “remember” all the states that can be reached for some string, we can
define the δ function in the following natural manner: For every letter in Σ, let the new state transition
function, δd , map to the subset of ℘(S) labeled by the union of all those states that are reached from
some state contained in the current state name (according to the old nondeterministic state transition
function δ).
According to Definition 4.4, for a word to be contained in the language accepted by some nondeter-
ministic finite automaton, at least one of the terminal states was required to be contained in the set of
final states. Thus, let the set of final states in the corresponding deterministic finite automaton be labeled
by the subsets of S that contain at least one of the accepting states in the nondeterministic counterpart.
The formal definition of our corresponding deterministic finite automaton is given in Definition 4.5.

Definition 4.5 Given an NDFA A = 〈Σ, S, S 0 , δ, F 〉, the corresponding deterministic finite automaton,
A d = 〈Σ, S d , s 0d , δd , F d 〉, is defined as follows:

S d = ℘(S)
s 0d = S 0
F d = {Q ∈ S d | Q ∩ F ≠ ∅}

and δd is the state transition function, δd : S d × Σ → S d , defined by

(∀Q ∈ S d )(∀a ∈ Σ) δd (Q, a ) = ⋃_{q ∈ Q} δ(q, a )

δd extends to the function δd : S d × Σ∗ → S d as suggested by Theorem 1.1:

(∀Q ∈ S d ) δd (Q, λ) = Q
(∀Q ∈ S d )(∀a ∈ Σ)(∀x ∈ Σ∗ ) δd (Q, xa ) = δd (δd (Q, x), a )

Definition 4.5 describes a deterministic finite automaton that observes the same restrictions as all
other deterministic finite automata (a single start state, a finite state set, a well-defined transition func-
tion, and so on). The only peculiarity is the labeling of the states. Note that the definition implies that
the state labeled by the empty set is never a final state and that all transitions from this state lead back
to itself. This is the dead state, which is reached by strings that are always prematurely terminated in the
corresponding nondeterministic machine.

Example 4.8
Consider the NDFA B given in Figure 4.7. As specified by Definition 4.5, the corresponding DFA B d
would look like the machine shown in Figure 4.8. Note that all the states happen to be accessible in this
particular example.
Since the construction of the corresponding deterministic machine involves ℘(S), it should be ob-
vious to the reader that the size of this deterministic finite automaton can grow exponentially larger as

Figure 4.7: The NDFA B discussed in Example 4.8

Figure 4.8: The deterministic equivalent of the NDFA given in Example 4.8

the number of states in the associated nondeterministic finite automaton increases. In general, how-
ever, there are often many inaccessible states. Thus, only the states that are found to be reachable during
the construction process need to be included. The reader is encouraged to exploit this fact when con-
structing corresponding deterministic finite automata. The language accepted by the DFA A d follows
the definition given in Chapter 1.
To show that the deterministic finite automaton that we have just defined accepts the same language
as the corresponding nondeterministic finite automaton, we must first show that the δd function be-
haves in the same manner for strings as the δd function does for single letters. For any state Q ∈ S d , the
δd function maps this state and an input letter a ∈ Σ according to the mapping of the δ function for each
q ∈ Q and the letter a . The following lemma establishes that δd performs the corresponding mapping
for strings.

Lemma 4.1 Let A = 〈Σ, S, S 0 , δ, F 〉 be a nondeterministic finite automaton, and let A d = 〈Σ, S d , s 0d , δd , F d 〉
represent the corresponding deterministic finite automaton. Then

(∀Q ∈ S d )(∀x ∈ Σ∗ )(δd (Q, x) = ⋃_{q ∈ Q} δ(q, x))

Proof. By induction on |x|: Let P (k) be defined by

P (k) : (∀Q ∈ S d )(∀x ∈ Σᵏ )(δd (Q, x) = ⋃_{q ∈ Q} δ(q, x))

Basis step: |x| = 0 ⇒ x = λ and therefore

δd (Q, λ) = Q = ⋃_{q ∈ Q} {q} = ⋃_{q ∈ Q} δ(q, λ)

Inductive step: Suppose that the result holds for all x ∋ |x| = k; that is, P (k) is true. Let y ∈ Σᵏ⁺¹ . Then
∃x ∈ Σᵏ and ∃a ∈ Σ ∋ y = xa . Then

δd (Q, y) = δd (Q, xa )                             (by definition of y)
          = δd (δd (Q, x), a )                      (by Theorem 1.1)
          = δd ( ⋃_{q ∈ Q} δ(q, x), a )             (by the induction hypothesis)
          = ⋃_{q ∈ Q} δd (δ(q, x), a )              (since (∀A, B ∈ ℘(S))(∀a ∈ Σ)(δd (A ∪ B, a ) = δd (A, a ) ∪ δd (B, a )))
          = ⋃_{q ∈ Q} ( ⋃_{p ∈ δ(q,x)} δ(p, a ))    (by Definition 4.5)
          = ⋃_{q ∈ Q} δ(q, xa )                     (by Definition 4.2)
          = ⋃_{q ∈ Q} δ(q, y)                       (by definition of y)

Therefore, P (k) ⇒ P (k + 1) for all k ≥ 0, and thus by the principle of mathematical induction we can say
that the result holds for all x ∈ Σ∗ .

Having established Lemma 4.1, proving that the language accepted by a nondeterministic finite au-
tomaton and the corresponding deterministic machine are the same language becomes a straightfor-
ward task. The equivalence of A and A d is given in the following theorem.

Theorem 4.1 Let A = 〈Σ, S, S 0 , δ, F 〉 be a nondeterministic finite automaton, and let A d = 〈Σ, S d , s 0d , δd , F d 〉
represent its corresponding deterministic finite automaton. Then A and A d are equivalent; that is, L(A) =
L(A d ).
Proof. Let x ∈ Σ∗ . Then

x ∈ L(A) ⇔ ( ⋃_{s ∈ S 0} δ(s, x)) ∩ F ≠ ∅    (by Definition 4.4)
         ⇔ ( ⋃_{s ∈ S 0} δ(s, x)) ∈ F d      (by Definition 4.5)
         ⇔ δd (S 0 , x) ∈ F d                (by Lemma 4.1)
         ⇔ δd (s 0d , x) ∈ F d               (by Definition 4.5)
         ⇔ x ∈ L(A d )                       (by Definition 1.15)

Now that we have established that nondeterministic finite automata and deterministic finite au-
tomata are equal in computing power, the reader might wonder why we bother with nondeterministic
finite automata. Even though nondeterministic finite automata cannot recognize any language that can-
not be defined by a DFA, they are very useful both in theory and in machine construction (as illustrated
by Example 4.7). The following examples further illustrate that NDFAs often yield more natural (and less
complex) solutions to a given problem.

Figure 4.9: The NDFA discussed in Example 4.9

Example 4.9
Recall the machine from Chapter 1 that accepted a subset of real constants in scientific notation accord-
ing to the following BNF:

<sign> ::= + | −
<digit> ::= 0|1|2|3|4|5|6|7|8|9
<natural> ::= <digit> | <digit><natural>
<integer> ::= <natural> | <sign><natural>
<real constant> ::= <integer>
                  | <integer>.<natural>
                  | <integer>.<natural>E<integer>

By using nondeterministic finite automata, it is easy to construct a machine that will recognize this lan-
guage (compare with the deterministic version given in Example 1.11). One such NDFA is shown in
Figure 4.9.

Example 4.10
Let L = {x ∈ {a , b }∗ | x begins with a ∨ x contains ba as a substring}. We can easily build a machine that will
accept this language, as illustrated in Figure 4.10. Now suppose we wanted to construct a machine that
would accept the reverse of this language, that is, to accept L ′ = {x ∈ {a , b }∗ | x ends with a ∨ x contains
ab }. The machine that will accept this language can be built using nondeterministic finite automata by
simply exchanging the initial states and the final states and by reversing the arrows of the δ function. The
automaton (definitely an NDFA in this case!) arising in this fashion is shown in Figure 4.11.
It can be shown that the technique employed in Example 4.10, when applied to any automaton, will
yield a new NDFA that is guaranteed to accept the reverse of the original language. The material in Chap-
ter 5 will reveal many instances where the ability to define multiple start states and multiple transitions
will be of great value.
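Under the same assumed dictionary representation used in the earlier sketches, the reversal technique might be programmed as follows:

    # A sketch of the reversal technique of Example 4.10: exchange initial and
    # final states and reverse every arrow. 'delta' maps (state, letter) pairs
    # to sets of states; the result is in general an NDFA even for a DFA input.
    def reverse_nfa(s0_set, delta, final):
        rev = {}
        for (s, a), targets in delta.items():
            for t in targets:
                # the arrow s --a--> t becomes t --a--> s
                rev.setdefault((t, a), set()).add(s)
        return set(final), rev, set(s0_set)    # new starts, new arrows, new finals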

Figure 4.10: An NDFA accepting the language given in Example 4.10

Figure 4.11: An NDFA representing the reverse of the language represented in Figure 4.10

Example 4.11
Assume we wish to identify all words that contain at least one of the three strings 10110 , 1010 , or 01101
as substrings. Consequently, we let L be the set of all words that are made up of some characters, followed
by one of our three target strings, followed by some other characters. That is,

L = {w ∈ {0 , 1 }∗ | w = x y z, x ∈ {0 , 1 }∗ , y ∈ {10110 , 1010 , 01101 }, z ∈ {0 , 1 }∗ }

We can construct a nondeterministic finite automaton that will accept this language as follows. First
construct three machines each of which will accept one of the candidates for y. Next, prepend a single
state (s 0 in Figure 4.12) that loops on Σ∗ ; make this state an initial state and draw arrows from it which
mimic the transitions from each of the other three initial states (as shown in Figure 4.12). Finally, append
a single state machine (s 18 ) that accepts Σ∗ ; draw arrows from each of the final states to this state. The
machine that accepts this language is given in Figure 4.12. The reader is encouraged to try to construct a
deterministic version of this machine in order to appreciate the simplicity of the above solution. (States
s 1 , s 7 , and s 12 are shown for clarity, but they are no longer needed.)

Example 4.12
Recall the application in Chapter 1 involving string searching (Example 1.15). The construction of DFAs
involved much thought, but there is an NDFA that solves the problem in an obvious and straightforward
manner. For example, an automaton that recognizes all strings over the alphabet {a , b } containing the
substring aab might look like the NDFA in Figure 4.13.

Figure 4.12: An NDFA for recognizing any of several substrings

Figure 4.13: An automaton recognizing the substring aab

Figure 4.14: The connected portion of the DFA equivalent to the NDFA given in Example 4.12

Figure 4.15: A reduced equivalent of the DFA given in Figure 4.14

As is the case for this NDFA, it may be impossible for certain sets of states to all be active at once.
These combinations can never be achieved during the normal operation of the NDFA. The DFA states
corresponding to these combinations will not be in the connected part of A d . Applying Definition 4.5
to find the entire deterministic version and then pruning it down to just the relevant portion is very
inefficient. A better solution is to begin at the start state and “follow transitions” to new states until no
further new states are uncovered. At this point, the relevant states and their transitions will have all been
defined; the remainder of the machine can be safely ignored. For the NDFA in Figure 4.13, the connected
portion of the equivalent DFA is shown in Figure 4.14. This automaton is still not reduced; the last three
states are all equivalent and can be coalesced to form the minimal machine given in Figure 4.15.

The above process can be easily automated; an interesting but frustrating exercise might involve
producing an appropriate set of rules for generating, given a specific string y, a DFA that will recognize
all strings containing the substring y. Definition 4.5 can be used to generate the appropriate DFA from
the obvious NDFA without subjecting the designer to such frustrations!
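A sketch of this reachable-states-only construction appears below; using frozensets of NDFA states as the DFA's state names is an implementation choice of the sketch, not part of Definition 4.5.

    # A sketch of Definition 4.5 restricted to reachable states: start from
    # S_0 and follow transitions until no new subset-state appears.
    def subset_construction(s0_set, delta, sigma, final):
        start = frozenset(s0_set)
        dfa_delta, seen, todo = {}, {start}, [start]
        while todo:
            q = todo.pop()
            for a in sigma:
                target = set()
                for s in q:                    # union of delta(s, a) over s in q
                    target |= delta.get((s, a), set())
                nxt = frozenset(target)        # the empty frozenset is the dead state
                dfa_delta[(q, a)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        dfa_final = {q for q in seen if q & set(final)}
        return start, dfa_delta, dfa_final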

Figure 4.16: The NDFA discussed in Example 4.13

Figure 4.17: The expanded state transition diagram for the NDFA in Figure 4.16

4.2 Circuit Implementation of NDFAs


As mentioned earlier, the presence of multiple paths within an NDFA for a single word characterizes the
nondeterministic nature of these automata. The most profitable way to view the operation of an NDFA
is to consider the automaton as having (potentially) several active states, with each of the active states
reacting to the next letter to determine a new set of active states. In fact, by using one D flip-flop per state,
this viewpoint can be directly translated into hardware. When a given state is active, the corresponding
flip-flop will be on, and when it is inactive (that is, it cannot be reached by the substring that has been
processed at this point), it will be off. As a new letter is processed, a state will be activated (that is, be
placed in the new set of active states) if it can be reached from one of the previously active states. Thus,
the state transition function will again determine the circuitry that feeds into each flip-flop.
Following the same conventions given for DFAs, the input tape will be assumed to be bounded by
special start-of-string <SOS> and end-of-string <EOS> symbols. The <EOS> character is again used to
activate the accept circuitry so that acceptance is not indicated until all letters on the tape have been
processed. As before, the <SOS> symbol can be employed at the beginning of the string to ensure that
the circuitry begins processing the string from the appropriate start state(s). Alternately, SR (set-reset)
flip-flops can be used to initialize the configuration without relying on the <SOS> conventions.

Example 4.13
Consider the NDFA D given in Figure 4.16. With the <SOS> and <EOS> transitions illustrated, the com-
plete model would appear as in Figure 4.17.
Two bits of input data (a 1 and a 2 ) are required to represent the symbols <EOS>, a , b , and <SOS>.
The standard encodings described in Chapter 1 would produce <EOS> = 00 , a = 01 , b = 10 , and <SOS>
= 11 . If the flip-flop t 1 is used to represent the activity of s 1 , and t 2 is used to record the status of s 2 , then
the subsequent activity of the two flip-flops can be determined from the current state activity and the
current letter being scanned, as shown in Table 4.1.
The first four rows of Table 4.1 reflect the situation in which a string is hopelessly stuck, and no states
are active. Processing subsequent symbols from Σ will not change this; both t 1 and t 2 remain 0 . The one
exception is when the <SOS> symbol is scanned; in this case, each of the start states is activated (t 1 ′ = 1
and t 2 ′ = 1 ). This corrects the situation in which both flip-flops happen to initialize to 0 when power is

Table 4.1:

t 1   t 2   a 1   a 2   t 1 ′   t 2 ′   accept
0 0 0 0 0 0 0
0 0 0 1 0 0 0
0 0 1 0 0 0 0
0 0 1 1 1 1 0
0 1 0 0 0 1 1
0 1 0 1 1 0 0
0 1 1 0 1 0 0
0 1 1 1 1 1 0
1 0 0 0 1 0 0
1 0 0 1 1 1 0
1 0 1 0 0 0 0
1 0 1 1 1 1 0
1 1 0 0 1 1 1
1 1 0 1 1 1 0
1 1 1 0 1 0 0
1 1 1 1 1 1 0

first applied to the circuitry. Scanning the <SOS> symbol changes the state of the flip-flops to reflect the
appropriate starting conditions (in this machine, both states are start states, and therefore both should
be active as processing is begun). Note that each of the rows of Table 4.1 that correspond to scanning
<SOS> show that t 1 and t 2 are reset in the same fashion.
Determining the circuit behavior for the symbols in Σ closely parallels the definition of δd in Def-
inition 4.5. For example, when state s 1 is active but s 2 is inactive (t 1 = 1 and t 2 = 0) and a is scanned
(a 1 = 0 and a 2 = 1 ), transitions from s 1 cause both states to next be active (t 1 ′ = 1 and t 2 ′ = 1 ). The other
combinations are calculated similarly. Minimized expressions for the new values of each of the flip-flops
and the accept circuitry are:

t 1 ′ = (t 1 ∧ ¬a 1 ) ∨ (t 2 ∧ a 2 ) ∨ (t 2 ∧ a 1 ) ∨ (a 1 ∧ a 2 )
t 2 ′ = (t 2 ∧ ¬a 1 ∧ ¬a 2 ) ∨ (t 1 ∧ a 2 ) ∨ (a 1 ∧ a 2 )
accept = (t 2 ∧ ¬a 1 ∧ ¬a 2 )

Since similar terms appear in these expressions, these three subcircuits can “share” the common com-
ponents, as shown in Figure 4.18.
Note that the accept circuitry reflects that a string should be recognized when some final state is
active (s 2 in this example) and <EOS> is scanned. In more complex machines with several final states,
lines leading from each of the flip-flops corresponding to final states would be joined by an OR gate
before being ANDed with the <EOS> condition.
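Although the expressions describe hardware, their behavior can be checked in software. In the following sketch, framing the tape with '<' and '>' for <SOS> and <EOS> is an assumed encoding:

    # A sketch that steps the flip-flop equations of this example through an
    # encoded tape, updating t1 and t2 simultaneously on each clock cycle.
    ENCODE = {'>': (0, 0), 'a': (0, 1), 'b': (1, 0), '<': (1, 1)}

    def run_circuit(tape):
        t1 = t2 = 0                            # arbitrary power-up values
        for ch in tape:
            a1, a2 = ENCODE[ch]
            if ch == '>':                      # accept = t2 AND (not a1) AND (not a2)
                return bool(t2)
            t1, t2 = (int((t1 and not a1) or (t2 and a2) or (t2 and a1) or (a1 and a2)),
                      int((t2 and not a1 and not a2) or (t1 and a2) or (a1 and a2)))
        return False

    print(run_circuit('<a>'))                  # True: row 1100 of Table 4.1 accepts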
An interesting exercise involves converting the NDFA D given in Example 4.13 to the equivalent DFA
D d , which will have four states: ∅, {s 1 }, {s 2 }, and {s 1 , s 2 }. The deterministic automaton D d can be realized
by a circuit diagram as specified in Chapter 1. This four-state DFA will require 2 bits of state data. If the
state encoding conventions ∅ = 00 , {s 1 } = 10 , {s 2 } = 01 , and {s 1 , s 2 } = 11 are used, the circuitry for the
DFA D d will be identical to that for the NDFA D.

Figure 4.18: Circuitry for the automaton discussed in Example 4.13

For DFAs, m bits of state data (t 1 , t 2 , . . . , t m ) can encode up to 2ᵐ distinct states. With NDFAs, an n-
state machine required a full n bits of state data (1 bit per state). This apparently "extravagant" use of
state data is offset by the fact that an n-state NDFA may require 2ⁿ states to form an equivalent DFA. This
was the case in the preceding example, in which n and m were equal to 2; the two-state NDFA D required
two flip-flops, and the equivalent four-state DFA also required two flip-flops; the savings induced by the
DFA state encoding was exactly offset by the multiplicity of states needed by the NDFA.
A DFA may turn out to need less hardware than an equivalent NDFA, as illustrated by Example 4.12.
The four-state NDFA C needs four flip-flops, and the (nonminimal, 16-state) DFA C d would also need
four. However, the minimal equivalent DFA derived in Example 4.12 has only four states and therefore
can be encoded with just 2 bits of state data. Hence only two flip-flops are necessary to implement a
recognizer for L(C ).

4.3 NDFAs With Lambda-Transitions


We now extend our computational model to include the nondeterministic finite automata that allow
transitions between states to occur “spontaneously,” without any input being processed. A transition
that occurs without an input symbol being processed is called a λ-transition or lambda-move. In texts
that denote the empty string by the symbol ε, such a transition is usually referred to as an ε-move.

Definition 4.6 A nondeterministic finite automaton with λ-transitions is a quintuple A λ = 〈Σ, S, S 0 , δλ , F 〉,


where

i. Σ is an alphabet.

ii. S is a finite nonempty set of states.

iii. S 0 is a set of initial states, a nonempty subset of S.

iv. δλ : (S × (Σ ∪ {λ})) → ℘(S) is the state transition function.

v. F is the set of accepting states, a (possibly empty) subset of S.

A nondeterministic finite automaton with λ-transitions is very similar in structure to an NDFA that
does not have λ-transitions. The only different aspect is the definition of the δ function. Instead of
mapping state/letter pairs [from S × Σ] to ℘(S), it maps pairs consisting of a state and either a letter
or the empty string [from S × (Σ ∪ {λ}) to ℘(S)]. From any state that has a λ-transition, we adopt the
convention that the machine is capable of making a spontaneous transition to the new state specified by
that λ-transition without processing an input symbol. However, the machine may also “choose” not to
follow this path and instead remain in the original state. Before we can extend the δλ function to operate
on strings from Σ∗ , we need the very useful concept of λ-closure.

Definition 4.7 Given a nondeterministic finite automaton

A λ = 〈Σ, S, S 0 , δλ , F 〉

with λ-transitions, the λ-closure of a state t ∈ S, denoted Λ(t ), is the set of all states that are reachable
from t without processing any input symbols. The λ-closure of a set of states T is then Λ(T ) = ⋃_{t ∈ T} Λ(t ).

Figure 4.19: An NDFA with lambda-moves

The λ-closure of a state is the set of all the states that can be reached from that state, including itself,
by following λ-transitions only. Obviously, one can always reach the state currently occupied without
having to move. Consequently, even if there are no explicit arcs labeled by λ going back to state t , t is
always in the λ-closure of itself.

Example 4.14
Consider the machine given in Figure 4.19, which contains λ-transitions from s 0 to s 1 and from s 1 to s 2 .
By Definition 4.7,

Λ(s 0 ) = {s 0 , s 1 , s 2 }
Λ(s 1 ) = {s 1 , s 2 }
Λ(s 2 ) = {s 2 }
Λ(s 3 ) = {s 3 }
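Computing Λ(t ) is a simple graph search. In the sketch below, using the symbol None to stand for λ in the transition dictionary is an assumed encoding:

    # A sketch of computing lambda-closures by following lambda-arrows until
    # no new state appears.
    def lambda_closure(t, delta):
        closure, stack = {t}, [t]
        while stack:
            q = stack.pop()
            for r in delta.get((q, None), set()):
                if r not in closure:
                    closure.add(r)             # reachable by lambda-moves only
                    stack.append(r)
        return closure

    # The lambda-arrows of the machine in Figure 4.19:
    delta = {('s0', None): {'s1'}, ('s1', None): {'s2'}}
    print(lambda_closure('s0', delta))         # {'s0', 's1', 's2'}, as in Example 4.14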

Definition 4.8 Given a nondeterministic finite automaton

A λ = 〈Σ, S, S 0 , δλ , F 〉

with λ-transitions, the extended state transition function for A λ is a function δλ : S ×Σ∗ → ℘(S) defined as
follows:

i. (∀s ∈ S) δλ (s, λ) = Λ(s)

ii. (∀s ∈ S)(∀a ∈ Σ) δλ (s, a ) = Λ( ⋃_{q ∈ Λ(s)} δλ (q, a ))

iii. (∀s ∈ S)(∀x ∈ Σ∗ )(∀a ∈ Σ) δλ (s, xa ) = Λ( ⋃_{q ∈ δλ (s,x)} δλ (q, a ))

The δλ function is not extended in the same way as for the nondeterministic finite automata given
in Definition 4.2. Most importantly, due to the effects of the λ-closure, the extended function evaluated
at a single letter a need not equal the one-step value δλ (s, a ). Thus, not only
does the δλ function map to a set of states based on a single letter, but it also includes the λ-closure of

those states. This may seem strange for single letters (strings of length 1), but it is required for consistency
when the δλ function is presented with strings of length greater than 1, since at each state along the path
there can be λ-transitions. Each λ-transition maps to a new state (which may have λ-transitions of its
own) that must be included in this path and processed by the δλ function.
The nondeterministic finite automaton without λ-transitions that corresponds to a nondeterministic
finite automaton with λ-transitions is given in Definition 4.9.

Definition 4.9 Given a nondeterministic finite automaton with λ-transitions, A λ = 〈Σ, S, S 0 , δλ , F 〉, the
corresponding nondeterministic finite automaton without λ-transitions, A λ σ = 〈Σ, S ∪ {q 0 }, S 0 ∪ {q 0 }, δλ σ , F σ 〉,
is defined as follows:

F σ = F           iff λ ∉ L(A λ )
F σ = F ∪ {q 0 }   iff λ ∈ L(A λ )

(∀a ∈ Σ) δλ σ (q 0 , a ) = ∅
(∀s ∈ S)(∀a ∈ Σ) δλ σ (s, a ) = Λ( ⋃_{q ∈ Λ(s)} δλ (q, a ))

and which is extended in the "usual" way for nondeterministic finite automata to the function
δλ σ : (S ∪ {q 0 }) × Σ∗ → ℘(S ∪ {q 0 }).

Note that from a state in A λ several λ-transitions may be taken, then a single letter a may be pro-
cessed, and then several more λ-moves may occur; all this activity can result from just a single symbol
on the input tape being processed. The definition of δλ σ reflects these types of transitions. The δλ σ
function is defined to be the same as the δλ function for all single letters (strings of length 1), which
adjusts for the λ-closure of A λ . The δλ σ function can then be extended in the usual nondeterministic
manner.
To account for the case that λ might be in the language accepted by the automaton A λ , we add an
extra start state q 0 to the corresponding machine A λ σ , which is disconnected from the rest of the machine.
If λ ∈ L(A λ ), we also make q 0 a final state.
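A sketch of this construction, built on the lambda_closure sketch above (all names are assumptions of these sketches):

    # A sketch of Definition 4.9: build delta-lambda-sigma from delta-lambda.
    def remove_lambda(states, sigma, s0_set, delta, final):
        new_delta = {}
        for s in states:
            for a in sigma:
                reach = set()
                for q in lambda_closure(s, delta):
                    reach |= delta.get((q, a), set())
                closed = set()                 # close what was reached under lambda
                for r in reach:
                    closed |= lambda_closure(r, delta)
                new_delta[(s, a)] = closed
        # lambda is accepted iff some start state's closure meets F
        accepts_lambda = any(lambda_closure(s, delta) & set(final) for s in s0_set)
        q0 = 'q0'                              # assumed fresh state name; no arrows
        new_final = set(final) | ({q0} if accepts_lambda else set())
        return set(states) | {q0}, set(s0_set) | {q0}, new_delta, new_final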

Example 4.15
Let A λ represent the NDFA given in Example 4.14. A λ σ would then be given by the NDFA shown in Figure
4.20. This new NDFA does indeed accept the same language as A λ . To show in general that L(A λ ) = L(A λ σ ),
we must first show that the respective extended state transition functions behave in similar fashions.
However, these two functions can be equivalent only for strings of nonzero length (because of the effects
of the λ-closure in the definition of δλ σ ). This result is established in Lemma 4.2.

Lemma 4.2 Given a nondeterministic finite automaton A λ with λ-transitions and the corresponding
nondeterministic finite automaton A λ σ without λ-transitions, then

(∀s ∈ S)(∀x ∈ Σ⁺ )(δλ σ (s, x) = δλ (s, x))

Proof. The proof is by induction on |x|; see the exercises.

Once we have shown that the extended state transition functions behave (almost) identically, we can
proceed to show that the languages accepted by these two machines are the same.

Figure 4.20: An NDFA without lambda-moves that is equivalent to the automaton in Figure 4.19

Theorem 4.2 Given a nondeterministic finite automaton that contains λ-transitions, there exists an equiv-
alent nondeterministic finite automaton that does not have λ-transitions.
Proof. Assume A λ = 〈Σ, S, S 0 , δλ , F 〉 is an NDFA with λ-transitions. Construct the corresponding NDFA
A λ = 〈Σ, S ∪ {q 0 }, S 0 ∪ {q 0 }, δσ
σ
λ
, F σ 〈, which has no λ-transitions. We will show L(A λ ) = L(A σ
λ
), and thereby
prove that the two machines are equivalent. Because the way A λ was constructed limits the scope of
Lemma 4.2, the proof is divided into two cases.
Case 1: If x = λ, then by Definition 4.9

(q₀ ∈ F^σ iff λ ∈ L(A_λ))

and so

(λ ∈ L(A^σ_λ) ⇔ λ ∈ L(A_λ))

Case 2: Assume x ≠ λ. Since there are no transitions leaving q₀, it may be disregarded as one of the start states of A^σ_λ. Then

x ∈ L(A^σ_λ) ⇒ (by definition of L)
(⋃_{s₀ ∈ S₀} δ̄^σ_λ(s₀, x)) ∩ F^σ ≠ ∅ ⇒ (by Lemma 4.2)
(⋃_{s₀ ∈ S₀} δ̄_λ(s₀, x)) ∩ F^σ ≠ ∅ ⇒ (since if q₀ were the common element, then x would have to be λ, which violates the assumption)
(⋃_{s₀ ∈ S₀} δ̄_λ(s₀, x)) ∩ F ≠ ∅ ⇒ (by definition of L)
x ∈ L(A_λ)

Conversely, and for many of the same reasons, we have

x ∈ L(A_λ) ⇒ (by definition of L)
(⋃_{s₀ ∈ S₀} δ̄_λ(s₀, x)) ∩ F ≠ ∅ ⇒ (by Lemma 4.2)
(⋃_{s₀ ∈ S₀} δ̄^σ_λ(s₀, x)) ∩ F ≠ ∅ ⇒ (since F ⊆ F^σ)
(⋃_{s₀ ∈ S₀} δ̄^σ_λ(s₀, x)) ∩ F^σ ≠ ∅ ⇒ (by definition of L)
x ∈ L(A^σ_λ)

Consequently, (∀x ∈ Σ*)(x ∈ L(A^σ_λ) ⇔ x ∈ L(A_λ)).

Figure 4.21: The NDFA described in Example 4.16

Figure 4.22: The modification of the NDFA in Figure 4.21

Although nondeterministic finite automata with λ-transitions are no more powerful than nondeterministic finite automata without λ-transitions, and consequently recognize the same class of languages as deterministic finite automata, they have their place in theory and machine construction. Because such machines can be constructed very easily from regular expressions (see Chapter 6), NDFAs are used by the UNIX™ text editor and by lexical analyzer generators such as LEX for pattern-matching applications. Example 4.16 involves the regular expression (a ∪ c)*bc(a ∪ c)*, which describes the set of words composed of any number of a s and c s, followed by a single b, followed by a single c, followed by any number of a s and c s.

Example 4.16
Suppose that we wanted to construct a machine that will accept the language L = {x ∈ {a, b, c}* | x contains exactly one b, which is immediately followed by c}. A machine that accepts this language is given in Figure 4.21.
Suppose we now wish to build a machine that will accept any positive number of occurrences of
various strings from this language concatenated together. In this case, the resulting language would
include all strings (with at least one b ) with the property that each and every b is immediately followed
by c . By simply adding a λ-transition from every final state to the start state, we achieve our objective.
The machine that accepts this new language is shown in Figure 4.22.
The previous section outlined how to implement nondeterministic finite automata without λ-transitions;
accommodating λ-moves is in fact quite straightforward. A λ-transition from state s to state t indicates
that state t should be considered active whenever state s is active. This can be assured by an obvious
modification, as shown by the following example.

Figure 4.23: A simple NDFA with lambda moves

Figure 4.24: Circuitry for the automaton discussed in Example 4.17

Example 4.17

As an illustration of how circuitry can be defined for machines with λ-transitions, consider the NDFA E given in Figure 4.23. This machine is similar to the NDFA D in Example 4.13, but a λ-transition has been added from s₁ to s₂; that is, δ(s₁, λ) = {s₂}. This transition implies that s₂ should be considered active whenever s₁ is active. Consequently, the circuit diagram produced in Example 4.13 need only be slightly modified by establishing the extra connection indicated by the dotted line shown in Figure 4.24.

In general, the need for such “extra” connections leaving a given flip-flop input t i is determined by
examining δ(s i , λ), the set of λ-transitions for s i . Note that the propagation delay in this circuit has been
increased; there are signals that must now propagate through an extra gate during a single clock cycle.
The delay will be exacerbated in automata that contain sequences of λ-transitions. In such cases, the
length of the clock cycle may need to be increased to ensure proper operation. This problem can be
minimized by adding all the connections indicated by Λ(s i ), rather than just adding those implied by
δ(s i , λ).
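The same bookkeeping can be simulated in software. The following Python sketch, a rough analog of the circuit rather than a description of it, treats each state as one flip-flop bit; the dictionaries delta and lam (the per-state sets δ(sᵢ, λ)) are an assumed representation.

# One clock cycle of the nondeterministic machine: ordinary transitions on
# the input symbol, followed by activity propagating along lambda wires.

def propagate_lambda(active, lam):
    """Add states reachable by lambda-connections until nothing changes;
    the loop mirrors a signal rippling through a chain of lambda-moves."""
    changed = True
    while changed:
        changed = False
        for s in list(active):
            for t in lam.get(s, set()):
                if t not in active:
                    active.add(t)
                    changed = True
    return active

def clock_tick(active, symbol, delta, lam):
    nxt = set()
    for s in active:
        nxt |= delta.get((s, symbol), set())
    return propagate_lambda(nxt, lam)

Looping until the active set stabilizes corresponds to a signal rippling through a chain of λ-gates; precomputing the full closure Λ(sᵢ) and wiring all of its connections at once, as suggested above, removes that ripple from the clock path.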

Exercises
4.1. Draw the deterministic versions of each of the nondeterministic finite automata shown in Figure
4.25. In each part, assume Σ = {a, b, c}.

4.2. Consider the automaton given in Example 4.17.

(a) Convert this automaton into an NDFA without λ-transitions using Definition 4.9.
(b) Convert this NDFA into a DFA using Definition 4.5.

4.3. Consider the automaton given in Example 4.4.

(a) Using the standard encodings, draw a circuit diagram for this NDFA (include neither <SOS>
nor <EOS>)
(b) Using the standard encodings, draw a circuit diagram for this NDFA (include both <SOS> and
<EOS>)
(c) Convert the NDFA into a DFA using Definition 4.5 (draw only the connected portion of the
machine).
(d) Convert the NDFA into a DFA using Definition 4.5 (draw the entire machine, including the
disconnected portion).

4.4. Consider the automaton given in Example 4.2.

(a) Using the standard encodings, draw a circuit diagram for this NDFA (include both <SOS> and
<EOS>)
(b) Convert the NDFA into a DFA using Definition 4.5.

4.5. Consider the automaton given in Example 4.3.

(a) Using the standard encodings, draw a circuit diagram for this NDFA (include neither <SOS>
nor <EOS>).
(b) Using the standard encodings, draw a circuit diagram for this NDFA (include both <SOS> and
<EOS>)
(c) Convert the NDFA into a DFA using Definition 4.5 (draw only the connected portion of the
machine).
(d) Is this DFA isomorphic to any of the automata constructed in Exercise 4.4?

4.6. Consider the automaton given in Example 4.14.

(a) Using the standard encodings, draw a circuit diagram for this NDFA (include neither <SOS> nor <EOS>).
(b) Build the equivalent automaton without λ-transitions using Definition 4.9.
(c) Using the standard encodings, draw a circuit diagram for the NDFA in part (b) (include neither <SOS> nor <EOS>).

4.7. Consider the automaton given in the second part of Example 4.16.

(a) Using the standard encodings, draw a circuit diagram for this NDFA (include <EOS> but not
<SOS>)

Figure 4.25: Automata for Exercise 4.1

(b) Build the equivalent automaton without λ-transitions using Definition 4.9.
(c) Using the standard encodings, draw a circuit diagram for the NDFA in part (b) (include <EOS>
but not <SOS>)
(d) Convert the NDFA into a DFA using Definition 4.5 (draw only the connected portion of the
machine).

4.8. It is possible to build a deterministic finite automaton A^∼ such that the language accepted by this machine is the absolute complement of the language accepted by a machine A [that is, L(A^∼) = Σ* − L(A)] by simply complementing the set of final states (see Theorem 5.1). Can a similar thing be done for nondeterministic finite automata? If not, why not? Give an example to support your statements.

4.9. Given a nondeterministic finite automaton A without λ-transitions, show that it is possible to construct a nondeterministic finite automaton with λ-transitions A′ with the properties

(1) A′ has exactly one start state and exactly one final state and
(2) L(A′) = L(A).

4.10. Consider (ii) in Definition 4.8. Can this fact be deduced from parts (i) and (iii)? Justify your answer.

4.11. If we wanted another way to construct a nondeterministic finite automaton without λ-transitions corresponding to one that does have them, we could try the following: Let S′ = S, S′₀ = Λ(S₀), F′ = F, and δ′(s, a) = δ̄_λ(s, a) for all a ∈ Σ, s ∈ S. Show that this works (or if it does not work, explain why not and give an example).

4.12. Using nondeterministic machines with λ-transitions, give an algorithm for constructing a λ-NDFA
having one start state and one final state that will accept the union of two FAD languages.

4.13. Give an example of an NDFA A for which:

(a) A^d is not connected.
(b) A^d is not reduced.
(c) A^d is minimal.

4.14. Why was it necessary to include an "extra" state q₀ in the construction of A^σ_λ in Definition 4.9? Support your answer with an example.

4.15. (a) Using nondeterministic machines without λ-transitions, give an algorithm for constructing a
machine that will accept the union of two languages.
(b) Is this easier or more difficult than using machines with λ-transitions?
(c) Is it possible to ensure that this machine both (i) has exactly one start state and (ii) has exactly
one final state?

4.16. Consider the automaton A^σ_λ given in Example 4.15.

(a) Using the standard encodings, draw a circuit diagram for A^σ_λ (include neither <SOS> nor <EOS>).
(b) Convert A^σ_λ into A^σd_λ using Definition 4.5 (draw only the connected portion of the machine).

4.17. (a) Prove that for any NDFA without λ-transitions the definitions of δ and δ̄ agree for single letters; that is, (∀s ∈ S)(∀a ∈ Σ)(δ(s, a) = δ̄(s, a)).
(b) Give an example to show that this need not be true for an NDFA with λ-transitions.

4.18. Consider the NDFA that accepts the original language L in Example 4.10.

(a) Using the standard encodings, draw a circuit diagram for this NDFA (include neither <SOS>
nor <EOS>)
(b) Convert the NDFA into a DFA using Definition 4.5 (draw only the connected portion of the
machine).

4.19. Consider the NDFA which accepts the modified language L′ in Example 4.10.

(a) Using the standard encodings, draw a circuit diagram for this NDFA (include neither <SOS>
nor <EOS>)
(b) Convert the NDFA into a DFA using Definition 4.5 (draw only the connected portion of the
machine).

4.20. Consider the arguments leading up to the pumping lemma in Chapter 2. Are they still valid when
applied to NDFAs?

4.21. Consider Theorem 2.7. Does the conclusion still hold if applied to an NDFA?

4.22. Given a nondeterministic finite automaton A (without λ-transitions) for which λ ∉ L(A), show that it is possible to construct a nondeterministic finite automaton (also without λ-transitions) A″ with the properties:

1. A″ has exactly one start state.
2. A″ has exactly one final state.
3. L(A″) = L(A).

4.23. Give an example to show that if λ ∈ L(A) it may not be possible to construct an NDFA without λ-
transitions satisfying all three properties listed in Exercise 4.22.

4.24. Prove Lemma 4.2.

4.25. Given a DFA A, show that it can be thought of as an NDFA Aₙ and that, furthermore, L(Aₙ) = L(A). Hint: Carefully define your "new" machine Aₙ, justify that it is indeed an NDFA, make the appropriate inductive statement, and argue that L(Aₙ) = L(A).

4.26. Give an example to show that the domain of Lemma 4.2 cannot be expanded to include λ; that is, show that δ̄^σ_λ(s, λ) ≠ δ̄_λ(s, λ).

4.27. Refer to Definition 4.5 and prove the fact used in Lemma 4.1:

(∀A ∈ ℘(S))(∀B ∈ ℘(S))(∀a ∈ Σ)(δ_d(A ∪ B, a) = δ_d(A, a) ∪ δ_d(B, a))

4.28. Recall that if a word can reach several states in an NDFA, some of which are final and some nonfinal,
Definition 4.4 requires us to accept that word.

(a) Change the definition of L(A) so that a word is accepted only if every state the word can reach
is final.
(b) Change the definition of A d to produce a deterministic machine that accepts only those words
specified in part (a).

4.29. Draw the connected part of T^d, the deterministic equivalent of the NDFA T in Example 4.7.

4.30. Refer to Example 4.7 and modify the NDFA T so that the machine reverts to a nonfinal state (that
is, turns the recorder off) when the substring 000111 is detected. Note that 000111 functions as the
EOT (end of transmission) signal.

4.31. Consider the automaton A given in Example 4.14.

(a) Draw a diagram of A^σ_λ.
(b) Draw A^σd_λ (draw only the connected portion of the machine).

4.32. What is wrong with the following "proof" of Lemma 4.2? Let P(k) be defined by P(k): (∀s ∈ S)(∀x ∈ Σᵏ)(δ̄^σ_λ(s, x) = δ̄_λ(s, x)).
Basis step (k = 1): (∀s ∈ S)(∀a ∈ Σ)(δ̄^σ_λ(s, a) = δ^σ_λ(s, a) = δ̄_λ(s, a)).
Inductive step: Suppose that the result holds for all x ∈ Σᵏ and let y ∈ Σᵏ⁺¹. Then (∃x ∈ Σᵏ)(∃a ∈ Σ) ∋ y = xa. Then

δ̄^σ_λ(s, y) = δ̄^σ_λ(s, xa)
= δ^σ_λ(δ̄^σ_λ(s, x), a)
= δ^σ_λ(δ̄_λ(s, x), a)
= δ̄_λ(δ̄_λ(s, x), a)
= δ̄_λ(s, xa)
= δ̄_λ(s, y)

Therefore, P(k) ⇒ P(k + 1) for all k ≥ 1, and by the principle of mathematical induction, we are assured that the equation holds for all x ∈ Σ⁺.

4.33. Consider the automaton given in Example 4.7.

(a) Using the standard encodings, draw a circuit diagram for this NDFA (include neither <SOS>
nor <EOS>)
(b) Convert the NDFA into a DFA using Definition 4.5 (draw only the connected portion of the
machine).

4.34. Consider the automaton given in Example 4.11.

(a) Using the standard encodings, draw a circuit diagram for this NDFA (include neither <SOS>
nor <EOS>)
(b) Convert the NDFA into a DFA using Definition 4.5 (draw only the connected portion of the
machine).

4.35. Consider the automaton B given in Example 4.8.

(a) Using the standard encodings, draw a circuit diagram for B (include neither <SOS> nor <EOS>).
(b) Using the standard encodings, draw a circuit diagram for B^d (include neither <SOS> nor <EOS>). Encode the states in such a way that your circuit is similar to the one found in part (a).

4.36. Draw a circuit diagram for each NDFA given in Figure 4.25 (include neither <SOS> nor <EOS>). Use the standard encodings.

4.37. Draw a circuit diagram for each NDFA given in Figure 4.25 (include both <SOS> and <EOS>). Use the standard encodings. (See Example 4.13 for the solution corresponding to Figure 4.25e.)

4.38. Definition 3.10 and the associated algorithms were used in Chapter 3 for finding the connected
portion of a DFA.

(a) Adapt Definition 3.10 so that it applies to NDFAs.


(b) Prove that there is an algorithm for finding the connected portion of an NDFA.

Chapter 5

Closure Properties

In this chapter we will look at ways to combine languages that are recognized by finite automata (that
is, FAD languages) and consider whether the combinations result in other FAD languages. These results
will provide insights into the construction of finite automata and will establish useful information that
will have bearing on the topics covered in later chapters. After the properties of the collection of FAD
languages have been fully explored, other classes of languages will be investigated.
We begin with a review of the concept of closure.

5.1 FAD Languages and Basic Closure Theorems

Notice that when many everyday operators combine objects of a given type they produce an object of the
same type. In arithmetic, for example, the multiplication of any two whole numbers produces another
whole number. Recall that this property is described by saying that the set of whole numbers is closed
under the operation of multiplication. In contrast, the quotient of two whole numbers is likely to produce
a fraction: the whole numbers are not closed under division. The formal definition of closure, both
for operators that combine two other objects (binary operators) and those that modify only one object
(unary operators) is given below.

Definition 5.1 The set K is closed under the (binary) operator Θ iff (∀x, y ∈ K )(x Θ y ∈ K ).

Definition 5.2 The set K is closed under the (unary) operator η iff (∀x ∈ K )(η(x) ∈ K ).

Example 5.1

N is closed under + since, if x and y are nonnegative integers, then x + y is another nonnegative integer;
that is, if x, y ∈ N, then x + y ∈ N.

Example 5.2

I is closed under | | (absolute value), since if x is an integer, |x| is also an integer.

Example 5.3
Let ρ = {X | X is a finite subset of N}; then ρ is closed under ∪, since the union of two finite sets is still finite. (If Y and Z are subsets for which ‖Y‖ = n < ∞ and ‖Z‖ = m < ∞, then ‖Y ∪ Z‖ ≤ n + m < ∞. Under what conditions would ‖Y ∪ Z‖ < n + m?)
To show a set K is not closed under a binary operator Θ, we must show ¬[(∀x, y ∈ K)(x Θ y ∈ K)], which means ∃x, y ∈ K ∋ x Θ y ∉ K.

Example 5.4
N is not closed under − (subtraction) since 3 − 5 = −2 ∉ N, even though both 3 ∈ N and 5 ∈ N.
Notice that the set as well as the operator is important when discussing closure properties; unlike N,
the set of all integers Z is closed under subtraction. As with the binary operator in Example 5.4, a single
counterexample is sufficient to show that a given set is not closed under a unary operator.

Example 5.5
N is not closed under √ (square root), since 7 ∈ N but √7 ∉ N.
We will not be concerned so much with sets of numbers as with sets of languages. As in Example 5.3,
the collection will be a set of sets. Of prime concern are those languages that are related to automata.

Definition 5.3 Let Σ be an alphabet. The symbol DΣ is used to denote the set of all FAD languages over Σ; that is,
DΣ = {L ⊆ Σ* | ∃ deterministic finite automaton M ∋ L(M) = L}

DΣ is the set of all languages that can be recognized by finite automata. In this chapter, it is this set
whose closure properties with respect to various operations in Σ∗ we are most interested in investigating.
For example, if there exists a machine that accepts a language K , then there is also a machine that accepts
the complement of K . That is, if K is FAD, then ∼K is FAD: DΣ is closed under ∼.

Theorem 5.1 For any alphabet Σ, DΣ is closed under ∼ (complementation).


Proof. Let K ∈ DΣ. We must show ∼K ∈ DΣ also; that is, there is a machine that recognizes ∼K. But K ∈ DΣ, and thus there is a deterministic finite automaton that recognizes K: Let A = 〈Σ, S, s₀, δ, F〉 and L(A) = K. Define a new machine A^∼ as follows: A^∼ = 〈Σ, S^∼, s₀^∼, δ^∼, F^∼〉 = 〈Σ, S, s₀, δ, S − F〉, which looks just like A except that the final and nonfinal states have been interchanged. We claim that L(A^∼) = ∼K. To show this, let x be an arbitrary element of Σ*. Then

x ∈ L(A^∼) ⇔ (by definition of L)
δ̄^∼(s₀^∼, x) ∈ F^∼ ⇔ (by induction and the fact that δ = δ^∼)
δ̄(s₀^∼, x) ∈ F^∼ ⇔ (by definition of s₀^∼)
δ̄(s₀, x) ∈ F^∼ ⇔ (by definition of F^∼)
δ̄(s₀, x) ∈ S − F ⇔ (by definition of complement)
δ̄(s₀, x) ∉ F ⇔ (by definition of L)
x ∉ L(A) ⇔ (by definition of K)
x ∉ K ⇔ (by definition of complement)
x ∈ ∼K

Figure 5.1: The two automata discussed in Example 5.6

Thus L(A^∼) = ∼K as claimed, and therefore the complement of any FAD language can also be recognized by a machine and is consequently also FAD. Thus DΣ is closed under complementation.
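As a sanity check, the construction in this proof amounts to a one-line program. The tuple representation of a DFA used below is an assumption of this sketch, not notation from the text.

# Theorem 5.1 in Python: interchange final and nonfinal states.
# A DFA is (states, alphabet, start, delta, finals), with delta a dict
# from (state, symbol) to a single state.

def complement_dfa(dfa):
    states, alphabet, start, delta, finals = dfa
    return (states, alphabet, start, delta, states - finals)

def accepts(dfa, word):
    states, alphabet, start, delta, finals = dfa
    s = start
    for a in word:
        s = delta[(s, a)]   # delta is assumed total
    return s in finals

For every word x, accepts(complement_dfa(A), x) should equal not accepts(A, x), provided the transition function is total.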

It turns out that DΣ is closed under all the common set operators. Notice that the definition of DΣ
implies that we are working with only one alphabet; if we combine two machines in some way, it is
understood that both automata use exactly the same input alphabet. This turns out to be not much of
a restriction, however, for if we wish to consider two machines that use different alphabets Σ1 and Σ2 ,
we can simply modify each machine so that it is able to process the new common alphabet Σ = Σ1 ∪ Σ2 .
It should be clear that this can be done in such a way as not to affect the language accepted by either
machine (see the exercises).
We will now prove that the union of two FAD languages is also FAD. This can be shown by demonstrat-
ing that, given two automata M 1 and M 2 , it is possible to construct another automaton that recognizes
the union of the languages accepted by M 1 and M 2 .

Example 5.6
Consider the two machines M 1 and M 2 displayed in Figure 5.1. These two machines can easily be em-
ployed to construct a nondeterministic finite automaton that clearly accepts the appropriate union. We
simply need to combine them into a single machine, which in this case will have two start states, as
shown in Figure 5.2.
The structure inside the dotted box should be viewed as a single NDFA with two start states. Any
string that would be accepted by M 1 will reach a final state if it starts in the “upper half” of the new
machine, while strings that are recognized by M 2 will be accepted by the “lower half” of the machine.
Recall that the definition of acceptance by a nondeterministic finite automaton implies that the NDFA
in Figure 5.2 will accept a string if any path leads to a final state. This new NDFA will therefore accept all
the strings that M 1 accepted and all the strings that M 2 accepted. Furthermore, these are the only strings
that will be accepted. This trick is the basis of the following proof, which demonstrates the convenience
of using the NDFA concept; a proof involving only DFAs would be both longer and less obvious (see the
exercises).

Theorem 5.2 For any alphabet Σ, DΣ is closed under ∪.


Proof. Let L₁ and L₂ belong to DΣ. Then there are nondeterministic finite automata A₁ = 〈Σ, S₁, S₀₁, δ₁, F₁〉 and A₂ = 〈Σ, S₂, S₀₂, δ₂, F₂〉 such that L(A₁) = L₁ and L(A₂) = L₂ (why?). Define A^∪ = 〈Σ, S^∪, S₀^∪, δ^∪, F^∪〉, where

Figure 5.2: The resulting automaton in Example 5.6

S^∪ = S₁ ∪ S₂   (without loss of generality, we can assume S₁ ∩ S₂ = ∅)
S₀^∪ = S₀₁ ∪ S₀₂
F^∪ = F₁ ∪ F₂

and δ^∪: (S₁ ∪ S₂) × Σ → ℘(S₁ ∪ S₂) is defined by

δ^∪(s, a) = δ₁(s, a) if s ∈ S₁,   δ^∪(s, a) = δ₂(s, a) if s ∈ S₂,   ∀s ∈ S₁ ∪ S₂, ∀a ∈ Σ

We claim that L(A^∪) = L(A₁) ∪ L(A₂) = L₁ ∪ L₂. This must be proved using the definition of L from Chapter 4, since A₁, A₂, and A^∪ are all NDFAs.

x ∈ L(A^∪) ⇔ (from Definition 4.4)
(∃s ∈ S₀^∪)[δ̄^∪(s, x) ∩ F^∪ ≠ ∅] ⇔ (by definition of S₀^∪)
(∃s ∈ S₀₁ ∪ S₀₂)[δ̄^∪(s, x) ∩ F^∪ ≠ ∅] ⇔ (by definition of ∪)
(∃s ∈ S₀₁)[δ̄^∪(s, x) ∩ F^∪ ≠ ∅] ∨ (∃s ∈ S₀₂)[δ̄^∪(s, x) ∩ F^∪ ≠ ∅] ⇔ (by definition of δ^∪ and induction)
(∃s ∈ S₀₁)[δ̄₁(s, x) ∩ F^∪ ≠ ∅] ∨ (∃s ∈ S₀₂)[δ̄₂(s, x) ∩ F^∪ ≠ ∅] ⇔ (by definition of F^∪)
(∃s ∈ S₀₁)[δ̄₁(s, x) ∩ F₁ ≠ ∅] ∨ (∃s ∈ S₀₂)[δ̄₂(s, x) ∩ F₂ ≠ ∅] ⇔ (from Definition 4.4)
x ∈ L(A₁) ∨ x ∈ L(A₂) ⇔ (by definition of ∪)
x ∈ (L(A₁) ∪ L(A₂)) ⇔ (by definition of L₁, L₂)
x ∈ L₁ ∪ L₂

The above "proof" is actually incomplete; the transition from line 4 to line 5 actually depends on the assumed properties of δ̄^∪, and not the known properties of δ^∪. A rigorous justification should include an inductive proof of (or at least a reference to) the fact that δ̄^∪ reflects the same sort of behavior that δ^∪ does; that is,

δ̄^∪(s, x) = δ̄₁(s, x) if s ∈ S₁,   δ̄^∪(s, x) = δ̄₂(s, x) if s ∈ S₂,   ∀s ∈ S₁ ∪ S₂, ∀x ∈ Σ*

The above rule essentially states that the definition that applies to the single letter a also applies to the string x, and it is easy to prove by induction on the length of x (see the exercises).
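In code, the union construction is little more than placing the two machines side by side. The following Python sketch assumes each NDFA is given as a tuple (states, starts, delta, finals), with delta mapping (state, symbol) to a set of states; this representation is an assumption of the sketch.

# Theorem 5.2 in Python: the two halves of the new machine never interact.

def union_ndfa(n1, n2):
    s1, start1, d1, f1 = n1
    s2, start2, d2, f2 = n2
    assert not (s1 & s2), "state sets must be disjoint (rename states first)"
    delta = dict(d1)
    delta.update(d2)   # disjoint states mean disjoint keys
    return (s1 | s2, start1 | start2, delta, f1 | f2)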

The following theorem, which states that DΣ is closed under ∩, will be justified in two separate ways.
The first proof will argue that the closure property must hold due to previous results; no new DFA need
be constructed. The drawback to this type of proof is that we have no suitable guide for actually combin-
ing two existing DFAs into a new machine that will recognize the appropriate intersection (although, as
outlined in the exercises, in this case a construction based on the first proof is fairly easy to generate).
Some operators are so bizarre that a nonconstructive proof of closure is the best we can hope for; in-
tersection is definitely not that strange, however. In a second proof of the closure of DΣ under ∩, Lemma
5.1 will explicitly outline how an intersection machine could be built. When such constructions can be
demonstrated, we will say that DΣ is effectively closed under the operator in question (see Theorem 5.12
for a discussion of an operator that is not effectively closed).

Theorem 5.3 For any alphabet Σ, DΣ is closed under ∩.


Proof. Let L 1 and L 2 belong to DΣ . Then by Theorem 5.1, ∼L 1 and ∼L 2 are also FAD. By Theorem
5.2, ∼L 1 ∪ ∼L 2 is also FAD. By Theorem 5.1 again, ∼(∼L 1 ∪ ∼L 2 ) is also FAD. By De Morgan’s law, this last
expression is equivalent to L 1 ∩ L 2 , so L 1 ∩ L 2 is FAD, and thus L 1 ∩ L 2 ∈ DΣ .

Note that the above argument could be made to apply to any collection C of sets that were known to
be closed under union and complementation. A second proof of Theorem 5.3 might rely on the following
lemma, using the “direct” method of constructing a deterministic machine that accepts L 1 ∩ L 2 . This
would show that DΣ is effectively closed under the intersection operator.

Lemma 5.1 Given deterministic finite automata A₁ = 〈Σ, S₁, s₀₁, δ₁, F₁〉 and A₂ = 〈Σ, S₂, s₀₂, δ₂, F₂〉 such that L(A₁) = L₁ and L(A₂) = L₂, define a new DFA A^∩ = 〈Σ, S^∩, s₀^∩, δ^∩, F^∩〉, where

S^∩ = S₁ × S₂
s₀^∩ = 〈s₀₁, s₀₂〉
F^∩ = F₁ × F₂

and δ^∩: (S₁ × S₂) × Σ → S₁ × S₂ is defined by

δ^∩(〈s, t〉, a) = 〈δ₁(s, a), δ₂(t, a)〉   ∀s ∈ S₁, ∀t ∈ S₂, ∀a ∈ Σ

Then L(A^∩) = L₁ ∩ L₂.
Proof. As usual, the key is to show that x ∈ L(A^∩) ⇔ x ∈ L₁ ∩ L₂. The proof hinges on the inductive statement that δ̄^∩ obeys the same rule that defines δ^∩; that is, (∀s ∈ S₁)(∀t ∈ S₂)(∀x ∈ Σ*)[δ̄^∩(〈s, t〉, x) = 〈δ̄₁(s, x), δ̄₂(t, x)〉]. The details are left for the reader (see the exercises).

The idea behind the above construction is to build a machine that "remembers" the state changes that both A₁ and A₂ make as they each process the same string, and hence the state set consists of all possible pairs of states from A₁ and A₂. The goal was to design the transition function δ^∩ so that being in state 〈s, t〉 in A^∩ indicates that A₁ would currently be in state s and A₂ would be in state t. This goal also motivates the definition of the new start state; we want to begin in the start states of A₁ and A₂, and hence s₀^∩ = 〈s₀₁, s₀₂〉. We only wish to accept strings that are common to both languages, which means that the terminating state in A₁ must belong to F₁ and the last state reached in A₂ must likewise be a final state. This requirement naturally leads to the definition of F^∩, where 〈s, t〉 is a final state if and only if both s and t were final states in their respective machines.
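The product construction also translates directly into code. In the Python sketch below, each DFA is assumed to be a tuple (states, alphabet, start, delta, finals) with a total transition function, and the pair 〈s, t〉 becomes the Python tuple (s, t).

# Lemma 5.1 in Python: run both machines in lockstep.

def intersect_dfa(a1, a2):
    s1, sigma, q1, d1, f1 = a1
    s2, _,     q2, d2, f2 = a2
    states = {(s, t) for s in s1 for t in s2}
    delta = {((s, t), a): (d1[(s, a)], d2[(t, a)])
             for s in s1 for t in s2 for a in sigma}
    finals = {(s, t) for s in f1 for t in f2}
    return (states, sigma, (q1, q2), delta, finals)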

Figure 5.3: The automata discussed in Example 5.7

Example 5.7
Consider the two machines A 1 and A 2 displayed in Figure 5.3. Note that A 2 “remembers” whether there
have been an even or an odd number of b s, while A 1 “counts” the number of letters ( mod 3). We now
demonstrate how the definition in Lemma 5.1 can be applied to form a deterministic machine that ac-
cepts the intersection of L(A 1 ) and L(A 2 ). The structure of A ∩ would in this case look like the automaton
shown in Figure 5.4. Note that A ∩ does indeed keep track of the criteria that both A 1 and A 2 use to ac-
cept or reject strings. We will be in a state on the right side of A ∩ if an odd number of b s have been seen
and on the left side when an even number of b s have been processed. At the same time, we will be in
the upper, middle, or lower row of states depending on the total number of letters (mod 3) that have
been processed. There is but one final state, corresponding to the situation where we have both an odd
number of b s and the letter count is 0 (mod 3).
The operations used in the previous three theorems are common to set theory. We now present some
new operators that are special to string algebra. We have defined concatenation (·) for individual strings,
but there is a natural extension of the definition to languages, as indicated by the next definition.

Definition 5.4 Let L 1 and L 2 be languages. The concatenation of L 1 with L 2 , written L 1 · L 2 , is defined by

L 1 · L 2 = {x · y | x ∈ L 1 ∧ y ∈ L 2 }

Example 5.8
If L₁ = {λ, b, cc} and L₂ = {λ, aa, baa}, then

L₁ · L₂ = {λ, b, cc, aa, baa, ccaa, bbaa, ccbaa}.

Figure 5.4: The resulting DFA for Example 5.7

Note that baa qualifies to be in L₁ · L₂ for two reasons: baa = λ · baa and baa = b · aa. Thus we see that the concatenation contains only eight words rather than the expected 9 (= 3 · 3). In general, L₁ · L₂ consists of all words that can be formed by the concatenation of a word from L₁ with a word from L₂; for finite sets, concatenating an n-word set with an m-word set results in no more than n · m words. As shown in this example, the number of words can actually be less than n · m. Larger languages can be concatenated, also. For example, Σ* · Σ = Σ⁺.
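For finite languages, Definition 5.4 can be checked directly; in the small Python fragment below (an illustration only, with "" playing the role of λ), the duplicate derivations of baa collapse automatically because the result is a set.

def concat(l1, l2):
    return {x + y for x in l1 for y in l2}

L1 = {"", "b", "cc"}
L2 = {"", "aa", "baa"}
print(len(concat(L1, L2)))   # prints 8, not 9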
The concatenation of two FAD languages is also FAD, as can easily be seen by employing NDFAs with
λ-transitions.

Example 5.9
Figure 5.5 illustrates two nondeterministic finite automata B 1 and B 2 that accept the languages L 1 and
L 2 given in Example 5.8. Combining these two machines and linking the final states of B 1 to the start
states of B 2 with λ-transitions yields a new NDFA that accepts L 1 · L 2 , as shown in Figure 5.6.

Example 5.10
Consider the deterministic finite automata A 1 and A 2 displayed in Figure 5.7. These can similarly be
linked together to form an NDFA that accepts the concatenation of the languages accepted by A 1 and
A 2 , as shown in Figure 5.8.
It is also possible to directly build a machine for concatenation without using any λ-transitions, al-
though the penalty for limiting our attention to less exotic machines is a loss of clarity in the construction.
While the proof of the following theorem does not depend on λ-transitions, the resulting machine is still
nondeterministic.

Figure 5.5: Two candidates for concatenation

Figure 5.6: An NDFA which accepts the concatenation of the machines discussed in Example 5.9

Figure 5.7: A pair of candidates for concatenation

Figure 5.8: Concatenation of the machines in Example 5.10 via lambda-moves

Theorem 5.4 For any alphabet Σ, DΣ is closed under · (concatenation).


Proof. Let L₁ and L₂ belong to DΣ. Then there are deterministic finite automata A₁ = 〈Σ, S₁, s₀₁, δ₁, F₁〉 and A₂ = 〈Σ, S₂, s₀₂, δ₂, F₂〉 such that L(A₁) = L₁ and L(A₂) = L₂. Without loss of generality, assume S₁ ∩ S₂ = ∅. Define a nondeterministic machine A^• = 〈Σ, S^•, S₀^•, δ^•, F^•〉, where

S^• = S₁ ∪ S₂
S₀^• = {s₀₁}
F^• = F₂          if λ ∉ L₂
F^• = F₁ ∪ F₂     if λ ∈ L₂

and δ^•: (S₁ ∪ S₂) × Σ → ℘(S₁ ∪ S₂) is defined by

δ^•(s, a) = {δ₁(s, a)}               if s ∈ S₁ − F₁
δ^•(s, a) = {δ₁(s, a), δ₂(s₀₂, a)}   if s ∈ F₁
δ^•(s, a) = {δ₂(s, a)}               if s ∈ S₂

It can be shown that L(A^•) = L(A₁) · L(A₂) = L₁ · L₂ (see the exercises).
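A Python sketch of this construction follows. Each DFA is assumed to be a tuple (states, start, delta, finals) over a common alphabet, with disjoint state sets; the test for λ ∈ L₂ uses the fact that a DFA accepts λ exactly when its start state is final.

# Theorem 5.4 in Python: final states of A1 also mimic the start state of A2.

def concat_machine(a1, a2, alphabet):
    s1, q1, d1, f1 = a1
    s2, q2, d2, f2 = a2
    delta = {}
    for s in s1:
        for a in alphabet:
            image = {d1[(s, a)]}
            if s in f1:
                image.add(d2[(q2, a)])   # mimic A2's start state
            delta[(s, a)] = image
    for s in s2:
        for a in alphabet:
            delta[(s, a)] = {d2[(s, a)]}
    finals = (f1 | f2) if q2 in f2 else set(f2)   # lambda in L2 iff q2 final
    return (s1 | s2, {q1}, delta, finals)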

Example 5.11
Consider the deterministic finite automata A₁ and A₂ in Example 5.10. These can be linked together to form the NDFA A^•, and the reader can indeed verify that the machine illustrated in Figure 5.9 accepts the concatenation of the languages accepted by A₁ and A₂. Notice that the new transitions from the final states of A₁ mimic the transitions out of the start state of A₂.
Thus we see that avoiding λ-transitions while defining a concatenation machine is relatively sim-
ple. Unfortunately, avoiding the nondeterministic aspects of the construction is relatively impractical
and would basically entail re-creating the construction in Definition 4.5 (which outlined the method for
converting an NDFA into a DFA). Whereas it was merely convenient (rather than necessary) to employ
NDFAs to demonstrate that DΣ is closed under union, the use of nondeterminism is essential to the proof
of closure under concatenation.

Example 5.12
Consider the nondeterministic finite automata B 1 and B 2 from Example 5.9. Applying the analog of
Theorem 5.4 (see Exercise 5.43) yields the automaton shown in Figure 5.10. Notice that each final state
of B 1 now also mimics the start state of B 2 , and t 0 has become a disconnected state. Both s 0 and s 1 are
still final states since λ ∈ L(B 2 ).

Figure 5.9: Concatenation of the machines in Example 5.10 without lambda-moves (Example 5.11)

Figure 5.10: An NDFA without lambda-moves which accepts the concatenation of the languages dis-
cussed in Example 5.8

Example 5.13

Consider the nondeterministic finite automata B₁ and B₃ shown in Figure 5.11, where B₁ is the same as that given in Example 5.9, while B₃ differs just slightly from B₂ (t₀ is no longer a final state). Note that L(B₃) = {aa, baa}, and λ ∉ L(B₃). Applying Theorem 5.4 in this case yields the automaton shown in Figure 5.12. In this construction, s₀ and s₁ are no longer final states since the definition of F^• must follow a different rule when λ ∉ L(B₃). By examining the resulting machine, the reader can verify that having t₃ as the only final state is indeed the correct strategy for this case.
Besides concatenation, string algebra allows other new operators on languages. The operators ∗
and + , which have at this point only been defined for alphabets, likewise have natural extensions to
languages. Loosely, we would expect L ∗ to consist of all words that can be formed by the concatenation
of several words from L.

Definition 5.5 Let L be a language over some alphabet Σ. Define

Figure 5.11: Candidates for concatenation in which the second machine does not accept λ (Example
5.13)

Figure 5.12: The concatenation of the NDFAs in Example 5.13

L⁰ = {λ}
L¹ = L
L² = L · L
L³ = L · L² = L · L · L

and in general

Lⁿ = L · Lⁿ⁻¹, for n = 1, 2, 3, . . .

L* = ⋃_{i=0}^∞ Lⁱ = L⁰ ∪ L¹ ∪ L² ∪ · · · = {λ} ∪ L ∪ L · L ∪ L · L · L ∪ . . .

L⁺ = ⋃_{i=1}^∞ Lⁱ = L¹ ∪ L² ∪ L³ ∪ · · · = L ∪ L · L ∪ L · L · L ∪ . . .

L* is called the Kleene closure of the language L.

Example 5.14
If L = {aa, c}, then L* = {λ, aa, c, aac, caa, aaaa, cc, aaaaaa, aaaac, aacaa, . . .}.

Example 5.15
If K = {db, b, c}, then K* consists of all words (over {b, c, d}) for which each occurrence of d is immediately followed by (at least) one b.
DΣ is closed under both ∗ and + . The technique for Kleene closure is outlined in Theorem 5.5. The
construction for L + is similar (see the exercises).

Theorem 5.5 For any alphabet Σ, DΣ is closed under ∗ (Kleene closure).


Proof. Let L belong to DΣ. Then there is a nondeterministic finite automaton A = 〈Σ, S, S₀, δ, F〉 such that L(A) = L. Define a nondeterministic machine A* = 〈Σ, S*, S₀*, δ*, F*〉, where

S* = S ∪ {q₀}   (where q₀ is some new state; q₀ ∉ S)
S₀* = S₀ ∪ {q₀}
F* = F ∪ {q₀}

and δ*: (S ∪ {q₀}) × Σ → ℘(S ∪ {q₀}) is defined by

δ*(s, a) = δ(s, a)                          if s ∉ F ∪ {q₀}
δ*(s, a) = δ(s, a) ∪ (⋃_{t ∈ S₀} δ(t, a))   if s ∈ F
δ*(s, a) = ∅                                if s = q₀

We claim that L(A*) = L(A)* = L* (see the exercises).
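The following Python sketch mirrors the construction; the dictionary representation of the NDFA and the fresh state name q0 are assumptions of the sketch.

# Theorem 5.5 in Python: final states additionally mimic the start states.

def star_machine(ndfa, alphabet, q0="q0"):
    states, starts, delta, finals = ndfa
    assert q0 not in states
    d_star = {(q0, a): set() for a in alphabet}   # q0 has no transitions
    for s in states:
        for a in alphabet:
            image = set(delta.get((s, a), set()))
            if s in finals:
                for t in starts:
                    image |= delta.get((t, a), set())
            d_star[(s, a)] = image
    return (states | {q0}, starts | {q0}, d_star, finals | {q0})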

Example 5.16
Consider the nondeterministic finite automaton B displayed in Figure 5.13, which accepts all words that contain exactly two (consecutive) b s. Using the modifications described above, the new NDFA B* would look like the automaton shown in Figure 5.14. Notice that the new automaton does indeed accept L(B)*, the set of all words in which the b s always occur in side-by-side pairs. This example also demonstrates the need for the special extra start (and final) state q₀ (see the exercises).

Figure 5.13: The NDFA B in Example 5.16.

Figure 5.14: The resulting NDFA for Example 5.16

It is instructive to compare the different approaches taken in the proofs of Theorems 5.4 and 5.5. In
both cases, nondeterministic automata were built, but Theorem 5.4 began with deterministic machines,
while Theorem 5.5 assumed that an NDFA was provided. Note that, in the construction of δ• in Theo-
rem 5.4, δ1 was a deterministic transition function and as such produced a single state, whereas δ• , on
the other hand, must adhere to the nondeterministic definition and produce a set of states. As a con-
sequence, the definition of δ• involved expressions like {δ1 (s, a)}, which indicated that the single state
given by δ1 (s, a) should be viewed as a singleton set.
By contrast, Theorem 5.5 specified the nondeterministic transition function δ* in terms of δ, which was also assumed to be nondeterministic. This gave rise to definitions of the form δ*(s, a) = δ(s, a). In this case, no set brackets { } were necessary since δ(s, a) by assumption already represented a set (rather than just a single element as in the deterministic case).
The definition of the new set of start states S₀* is also affected by the type of machine from which the new NDFA is formed. In reviewing Theorems 5.4 and 5.5, the reader should be able to see the parallel between the differences in the specifications of the δ function and the differences in the definitions of S₀^• and S₀*. It is also instructive to compare and contrast the proof of Theorem 5.2 to those discussed above.

5.2 Further Closure Properties


The operators discussed in this section, while not as fundamental as those presented earlier, illustrate
some useful techniques for constructing modified automata. Also explored are techniques that provide
existence proofs rather than constructive proofs.

Theorem 5.6 For any alphabet Σ, DΣ is closed under the operator Z, where Z is defined by

Z (L) = {x | x is formed by deleting zero or more letters from a word in L}.

Figure 5.15: The automaton C discussed in Example 5.17

Figure 5.16: An automaton accepting Z (C ) in Example 5.17

Proof. See the exercises and the following example.

Example 5.17

Consider the deterministic finite automaton C displayed in Figure 5.15, which accepts the language {aⁿbᵐ | n ≥ 1, m ≥ 1}. Z(L(C)) would then be

{aⁿbᵐ | n ≥ 0, m ≥ 0}

and can be accepted by modifying C so that every transition in the diagram has a corresponding λ-move (allowing that particular letter to be skipped), as shown in Figure 5.16.

Theorem 5.7 For any alphabet Σ, DΣ is closed under the operator Y , where Y is defined by

Y (L) = {x | x is formed by deleting exactly one letter from a word in L}.

Proof. See the exercises and the following example.

Figure 5.17: The automaton D discussed in Example 5.18

Figure 5.18: The modified machine in Example 5.18

Example 5.18
We need a way to skip a letter as was done in Example 5.17, but we must now skip one and only one
letter. The technique for accomplishing this involves using copies of the original machine. Consider
the deterministic finite automaton D displayed in Figure 5.17. We will use λ-moves to mimic normal
transitions, but in this case we will move from one copy of the machine to an appropriate state in a
second copy. Being in the first copy of the machine will indicate that we have yet to skip a letter, and
being in the second copy will signify that we have followed exactly one λ-move and have thus skipped
exactly one letter. Hence the second copy will be the only one in which states are deemed final, and the
first copy will contain the only start state. The modified machine for this example might look like the
NDFA shown in Figure 5.18. The string aba, which is accepted by the original machine, should cause ab, aa, and ba to be accepted by the new machine. Each of these three is indeed accepted, by following the
correct λ-move at the appropriate time. A similar technique, with the state transition function slightly
redefined, could be used to accept words in which every other letter was deleted. If one wished only
to acknowledge every third letter, three copies of the machine could be suitably connected together to
achieve the desired result (see the exercises).
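The two-copy technique can be sketched in Python as follows; the encoding of the copies as pairs (s, k), with k counting the letters skipped so far, is an illustrative choice rather than the text's notation.

# Example 5.18 in Python: lambda-moves (stored under "") mimic ordinary
# transitions but cross from copy 0 ("nothing skipped yet") to copy 1.

def delete_one_letter_machine(dfa, alphabet):
    states, start, delta, finals = dfa
    d = {}
    for s in states:
        for a in alphabet:
            d[((s, 0), a)] = {(delta[(s, a)], 0)}   # copy 1: nothing skipped
            d[((s, 1), a)] = {(delta[(s, a)], 1)}   # copy 2: one letter skipped
            # the lambda-move consumes no input but crosses to copy 2
            d.setdefault(((s, 0), ""), set()).add((delta[(s, a)], 1))
    new_states = {(s, k) for s in states for k in (0, 1)}
    return (new_states, {(start, 0)}, d, {(f, 1) for f in finals})

Only second-copy states are final and only the first copy contains the start state, exactly as described above.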
While DΣ is certainly the most important class of languages we have seen so far, we will now consider
some other classes whose properties can be investigated. The closure properties of other collections of
languages will be considered in the exercises and in later chapters.

Definition 5.6 Let Σ be an alphabet. Then WΣ is defined to be the set of all languages over Σ recognized by NDFAs; that is,
WΣ = {L ⊆ Σ* | ∃ NDFA N ∋ L(N) = L}.

Lemma 5.2 Let Σ be an alphabet. Then WΣ = DΣ .


Proof. The proof follows immediately from Theorem 4.1 and Exercise 4.25.

The reader should note that Lemma 5.2 simply restates in new terms the conclusion reached in Chap-
ter 4, where it was proved that NDFAs were exactly as powerful as DFAs. More specifically, it was shown

that any language that could be recognized by an NDFA could also be recognized by a DFA, and con-
versely. While every subset of Σ∗ represents a language, those in DΣ have exhibited many nice properties
owing to the convenient representation afforded by finite automata. We now focus our attention on “the
other languages,” that is, those that are not in DΣ .

Definition 5.7 Let Σ be an alphabet. Then NΣ is defined to be the set of all non-FAD languages over Σ; that is,
NΣ = {L ⊆ Σ* | there does not exist any finite automaton M ∋ L(M) = L}

NΣ consists of all the "complicated" languages (subsets) that can be formed from Σ*; that is, NΣ = ℘(Σ*) − DΣ. Be careful not to confuse NΣ with the set of languages that can be recognized by NDFAs (WΣ in Definition 5.6).

Theorem 5.8 Let Σ be an alphabet. Then NΣ is closed under ∼ (complementation).


Proof (by contradiction): Assume the theorem is not true, which means that there exists a language K for which

K ∈ NΣ ∧ ∼K ∉ NΣ ⇒ (by definition of NΣ )
∼K ∈ DΣ ⇒ (by Theorem 5.1)
∼(∼K ) ∈ DΣ ⇒ (since ∼(∼K ) = K )
K ∈ DΣ ⇒ (by definition of NΣ )
K ∉ NΣ

which contradicts the assumption. Thus the theorem must be true.

Lemma 5.3 N{a,b} is not closed under ∩.


Proof. Let L₁ = {aᵖ | p is prime} and let L₂ = {bᵖ | p is prime}. Then L₁ ∈ N{a,b}, L₂ ∈ N{a,b}, but L₁ ∩ L₂ = ∅ ∉ N{a,b} (why?).

As another useful example of closure, we consider the transformation of one language to another via
a language homomorphism, which represents the process of consistently replacing each single letter aᵢ by a word wᵢ. Such transformations are commonplace in computer science; some applications expect
lines in a text file to be delimited with a carriage return/line feed pair, while other applications expect
only a carriage return. Stripping away the unwanted line feeds is tantamount to applying a homomor-
phism that replaces most ASCII characters by the same symbol, but replaces line feeds by λ. Converting
all lowercase letters in a file to uppercase is another common transformation that can be defined by a
language homomorphism.

Definition 5.8 Let Σ = {a₁, a₂, . . . , aₘ} be an alphabet and let Γ be a second alphabet. Given words w₁, w₂, . . . , wₘ over Γ, define a language homomorphism ψ: Σ → Γ* by ψ(aᵢ) = wᵢ for each i, which can be extended to ψ: Σ* → Γ* by:

ψ(λ) = λ
(∀a ∈ Σ)(∀x ∈ Σ*)(ψ(a · x) = ψ(a) · ψ(x))

ψ can be further extended to operate on a language L by defining

ψ(L) = {ψ(z) ∈ Γ* | z ∈ L}

In this context, ψ: ℘(Σ*) → ℘(Γ*).
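Computationally, a homomorphism is just a substitution table extended letter by letter. The Python fragment below sketches Definition 5.8; the table psi is the map used in Example 5.19 below, given here only as an illustration of the sketch.

def apply_hom(psi, word):
    return "".join(psi[a] for a in word)     # psi(lambda) = lambda falls out

def hom_image(psi, language):
    return {apply_hom(psi, z) for z in language}

psi = {"a": "cd", "b": "d"}
print(hom_image(psi, {"", "ab", "bb"}))      # {'', 'cdd', 'dd'}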

Figure 5.19: (a) The automaton discussed in Example 5.21; (b) The resulting automaton for Example 5.21

Example 5.19
Let Σ = {a, b} and Γ = {c, d}, and define ψ by ψ(a) = cd and ψ(b) = d. For K = {λ, ab, bb}, ψ(K) = {λ, cdd, dd}, while for L = {a, b}*, ψ(L) represents all words over {c, d} in which every c is immediately followed by d.

Example 5.20
As a second example, let Σ = {), (} and let Γ be the ASCII alphabet. If µ is defined by µ(() = begin and µ()) =
end, then the set M of all strings of matched parentheses maps to K , the set of all matched begin-end
pairs.
A general homomorphism ψ maps a language over Σ to a language over Γ. However, to consider the
closure of DΣ , we will restrict our attention for the moment to homomorphisms for which Γ = Σ. It is
more generally true, though, that even for language homomorphisms between two different alphabets,
if L is FAD, the homomorphic image of L is also FAD (see the exercises).

Theorem 5.9 Let Σ be an alphabet, and let ψ: Σ → Σ∗ be a language homomorphism. Then DΣ is closed
under ψ.
Proof. See the exercises and the following examples. A much more concise way to handle this transfor-
mation will be seen in Chapter 6 when substitutions are explored.

If the homomorphism is length preserving, that is, if it always maps letters to single letters, it is relatively easy to define a new automaton from the old one. Indeed, the state transition diagram hardly changes at all; all transitions marked b are simply relabeled with the new letter ψ(b). For more complex homomorphisms, extra states must be added to accommodate the processing of the surplus letters. The following two examples illustrate the appropriate transformation of the state transition function and suggest a convenient labeling for the new states.

Example 5.21
Consider the DFA B displayed in Figure 5.19a. For the homomorphism ξ defined by ξ(a) = a and ξ(b) = a, the automaton that will accept ξ(L(B)) is shown in Figure 5.19b. Note that even in simple examples like this one the resulting automaton can be nondeterministic.

Example 5.22
For the NDFA C displayed in Figure 5.20a and the homomorphism µ defined by µ(a) = cc and µ(b) = a, the automaton that will accept µ(L(C)) is shown in Figure 5.20b. Note that each state of C requires an

Figure 5.20: (a) The automaton discussed in Example 5.22; (b) The resulting automaton for Example 5.22

extra state to accommodate the cc transition.

Example 5.23
Consider the identity homomorphism µ: Σ → Σ* defined by (∀a ∈ Σ)(µ(a) = a). Since µ(L) = L, any collection of languages, including NΣ, is clearly closed under this homomorphism. Unlike DΣ, though, there are many homomorphisms under which NΣ is not closed.

Lemma 5.4 Let Σ = {a, b}, and let ξ: Σ → Σ* be defined by ξ(a) = a and ξ(b) = a. Then NΣ is not closed under ξ.
Proof. Consider the set L of all strings that have the same number of a s as b s. This language is in NΣ, but ξ(L) is the set of all even-length strings of a s, which is clearly not in NΣ.

A rather trivial example involves the homomorphism defined by ψ(a) = λ for every letter a ∈ Σ. Then
for all languages L, whether or not L ∈ NΣ , ψ(L) = {λ}, which is definitely not in NΣ .

Definition 5.9 Let ψ: Σ → Γ* be a language homomorphism and consider z ∈ Γ*. The inverse homomorphic image of z under ψ is then
ψ⁻¹(z) = {x ∈ Σ* | ψ(x) = z}
For a language L ⊆ Γ∗ , the inverse homomorphic image of L under ψ is defined by

ψ−1 (L) = {x ∈ Σ∗ | ψ(x) ∈ L}

Thus, x ∈ ψ−1 (L) ⇔ ψ(x) ∈ L. While the image of a string under a homomorphism is a single word,
note that the inverse image of a single string may be an entire set of words.

Example 5.24
Consider ξ from Lemma 5.4, in which ξ: Σ → Σ* was defined by ξ(a) = a and ξ(b) = a. Let z = aa. Since

ξ(bb) = ξ(ba) = ξ(ab) = ξ(aa) = aa,

ξ⁻¹(aa) = {bb, ba, ab, aa}. Note that ξ⁻¹(ba) = ∅.

For L = {x ∈ {a}* | |x| ≡ 0 mod 3}, ξ⁻¹(L) = {x ∈ {a, b}* | |x| ≡ 0 mod 3}. Note that this second set is definitely larger, since it also contains words with b s in them.
It can be shown that DΣ is closed under inverse homomorphism. The trick is to make the state
transition function of the new automaton simulate, for a given letter a, the action the old automaton would have taken for the entire string ψ(a). As the following proof will illustrate, the only change that need take place is in the δ function; the newly constructed machine is even deterministic!

Theorem 5.10 Let Σ be an alphabet, and let ψ: Σ → Σ* be a language homomorphism. Then DΣ is closed under ψ⁻¹.
Proof. Let L ∈ DΣ. Then there exists a DFA A = 〈Σ, S, s₀, δ, F〉 such that L(A) = L. Define a new DFA A^ψ = 〈Σ, S^ψ, s₀^ψ, δ^ψ, F^ψ〉 by

S^ψ = S
s₀^ψ = s₀
F^ψ = F

and δ^ψ is defined by

δ^ψ(s, a) = δ̄(s, ψ(a))   ∀s ∈ S, ∀a ∈ Σ

Induction can be used to show δ̄^ψ(s, x) = δ̄(s, ψ(x)) ∀s ∈ S, ∀x ∈ Σ*, and in particular δ̄^ψ(s₀, x) = δ̄(s₀, ψ(x)) ∀x ∈ Σ*. Hence L(A^ψ) = ψ⁻¹(L(A)).
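The proof translates into a few lines of Python; the tuple representation and the helper run (which computes the extended function δ̄) are assumptions of the sketch. Nothing in the code restricts it to the case Γ = Σ.

# Theorem 5.10 in Python: on letter a, the new machine runs the old one
# on the entire word psi[a].

def run(delta, s, word):
    for a in word:
        s = delta[(s, a)]
    return s

def inverse_hom_dfa(dfa, psi):
    states, alphabet, start, delta, finals = dfa
    new_delta = {(s, a): run(delta, s, psi[a])
                 for s in states for a in psi}
    return (states, set(psi), start, new_delta, finals)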

This theorem makes it possible to extend the range of the pumping lemma (Theorem 2.3) to many
otherwise unpleasant problems. The set M given in Example 5.20 can easily be shown to violate Theorem
2.3 and is therefore not FAD. The set K given in Example 5.20 is just as clearly not FAD, but this is quite
tedious to formally prove by the pumping lemma (the number of choices for u, v, and w is prohibitively
large to thoroughly cover). An argument might proceed as follows: Assume K were FAD. Then M , being
the inverse homomorphic image of a FAD language, must also be FAD. Since M is known (by an easy
pumping lemma proof) to be definitely not FAD, the assumption that K is FAD must be incorrect. Thus,
K ∈ NΣ .

Lemma 5.5 Let Σ = {a, b}, and let ξ: Σ → Σ* be defined by ξ(a) = a and ξ(b) = a. Then NΣ is not closed under ξ⁻¹.
Proof. Consider the set L of all strings that have the same number of a s as b s. This language is in NΣ, but ξ⁻¹(L) is {λ}, which is clearly not in NΣ.

We close this chapter by considering two operators for which it is definitely not convenient to modify
the structure of an existing automaton to construct a new automaton with which to demonstrate closure.

Theorem 5.11 Let Σ be an alphabet. Define the operator b by

L^b = {x | ∃y ∈ Σ* ∋ (xy ∈ L ∧ |x| = |y|)}

Then DΣ is closed under the operator b.


Proof. L^b represents the first halves of all the words in L. For example, if K = {ad, abaa, ccccc}, then K^b = {a, ab}. Assume that L is FAD. Then there exists a DFA A = 〈Σ, S, s₀, δ, F〉 that accepts L. The proof consists of identifying those states q that are "midway" between the start state and a final state; specifically, we need to identify the set of strings for which q is the midpoint. The previous closure results for union, intersection, homomorphism, and inverse homomorphism will be used to construct the language representing L^b. Define the length homomorphism ψ: Σ → {1}* by ψ(a) = 1 for all a ∈ Σ. Note that ψ effectively counts the number of letters in a word:

ψ(x) = 1^|x|

The following argument can be applied to each state q to determine the set of strings that use it as a "midway" state.
Consider the initial set for q, I(A, q) = {x | δ̄(s₀, x) = q}, and the terminal set for q, T(A, q) = {x | δ̄(q, x) ∈ F}. We are interested in finding those words in I(A, q) that are the same length as words in T(A, q). ψ(I(A, q)) represents strings of 1s whose lengths are the same as words in I(A, q). A similar interpretation can be given for ψ(T(A, q)). Therefore, ψ(I(A, q)) ∩ ψ(T(A, q)) will reflect those lengths that are common to both the initial set and the terminal set. The inverse image under ψ of this set will then reflect only those strings in Σ* that are of the correct length to reach q from s₀. This set is ψ⁻¹(ψ(I(A, q)) ∩ ψ(T(A, q))). Not all strings of a given length are likely to reach q, though, so this set must be intersected with I(A, q) to correctly describe those strings that are both of the proper length and that reach q from the start state. This set, I(A, q) ∩ ψ⁻¹(ψ(I(A, q)) ∩ ψ(T(A, q))), is thus the set of first halves of all words that have q as their midpoint.
This process can be repeated for each of the (finite) number of states in the automaton A, and the union of the resulting sets will form all the first halves of words that are accepted by A; that is, the union will equal L^b.
Note that the automaton A^q = 〈Σ, S, s₀, δ, {q}〉 shows that each of the initial sets I(A, q) is FAD. Similarly, by moving the start state of A and forming the automaton A_q = 〈Σ, S, q, δ, F〉, each terminal set T(A, q) must be FAD, also. Since L^b has now been shown to be formed from these basic FAD sets by applying homomorphisms, intersections, inverse homomorphisms, and unions, L^b must be FAD, since DΣ is closed under each of these types of operations.
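Although the theorem is stated as a closure result, the same midpoint idea yields a simple membership test for L^b. The Python sketch below (representation assumed as in the earlier sketches) runs A forward on x to find the candidate midpoint q, then steps the set of final states backward |x| times to see whether q can reach F in exactly |x| moves.

def in_half_language(dfa, x):
    states, alphabet, start, delta, finals = dfa
    q = start
    for a in x:                   # q is the candidate "midway" state
        q = delta[(q, a)]
    reach = set(finals)           # states reaching F in exactly k steps
    for _ in range(len(x)):       # step the target set backward |x| times
        reach = {s for s in states
                 if any(delta[(s, a)] in reach for a in alphabet)}
    return q in reach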

Example 5.25
Consider the automaton A displayed in Figure 5.21. For the state highlighted as q, the quantities discussed in Theorem 5.11 would be as follows:

I(A, q) = {abc, abcabc, abcabcabc, . . .}
T(A, q) = {aa, aaaa, aaaaaa, aaaaaaaa, . . .} = {a², a⁴, a⁶, a⁸, a¹⁰, a¹², . . .}
ψ(I(A, q)) = {1³, 1⁶, 1⁹, 1¹², 1¹⁵, . . .}
ψ(T(A, q)) = {1², 1⁴, 1⁶, 1⁸, 1¹⁰, 1¹², . . .}
ψ(I(A, q)) ∩ ψ(T(A, q)) = {1⁶, 1¹², 1¹⁸, . . .} = {x ∈ {1}⁺ | |x| ≡ 0 mod 6}
ψ⁻¹(ψ(I(A, q)) ∩ ψ(T(A, q))) = {x ∈ {a, b, c}⁺ | |x| ≡ 0 mod 6}
  = {aaaaaa, aaaaab, aaaaac, aaaaba, aaaabb, . . .}
I(A, q) ∩ ψ⁻¹(ψ(I(A, q)) ∩ ψ(T(A, q))) = {abcabc, abcabcabcabc, . . .}

Similar calculations would have to be done for each of the other states of A. Once again, NΣ does not
enjoy the same closure properties that DΣ does.

Lemma 5.6 Let Σ be an alphabet. Then NΣ is not closed under the operator b.
Proof. Let L = {aⁿbⁿ | n ≥ 0} ∈ NΣ. Then L^b = {aⁿ | n ≥ 0} ∉ NΣ.

Figure 5.21: The automaton discussed in Example 5.25

Other examples that show NΣ is not closed under the operator b abound. If K = {x ∈ {a, b}* | |x|a = |x|b}, then K^b = {a, b}*. The last operator we will cover in this chapter is useful for illustrating closures that may not be effective, that is, for which there may not exist an algorithm for constructing the desired entity.

Definition 5.10 Let L₁ and L₂ be languages. The quotient of L₁ with L₂, written L₁/L₂, is defined by
L₁/L₂ = {x | ∃y ∈ Σ* ∋ (y ∈ L₂ ∧ xy ∈ L₁)}

Roughly speaking, the quotient consists of the beginnings of those words in L 1 that terminate in a
word from L 2 .

Example 5.26
Let Σ = {a, b}. Then

{b², b⁴, b⁶, b⁸, b¹⁰, b¹², . . .}/{b} = {b¹, b³, b⁵, b⁷, b⁹, b¹¹, . . .}.

Note that {b², b⁴, b⁶, b⁸, b¹⁰, b¹², . . .}/{a} = { }.

Theorem 5.12 For any alphabet Σ, DΣ is closed under quotient.


Proof. Let L₁ and L₂ belong to DΣ. Then there is a deterministic finite automaton A₁ = 〈Σ, S₁, s₀₁, δ₁, F₁〉 such that L(A₁) = L₁. An automaton that accepts L₁/L₂ can be defined by A^/ = 〈Σ, S₁, s₀₁, δ₁, F^/〉, where F^/ is defined to be the set of all states t for which there is a word in L₂ that reaches a final state from t. That is, F^/ = {t | ∃y ∈ Σ* ∋ (y ∈ L₂ ∧ δ̄₁(t, y) ∈ F₁)}. It can be shown that A^/ does indeed accept L₁/L₂, and hence DΣ is closed under quotient (see the exercises).

Note that the above proof did not mention the automaton associated with the second language L₂. Indeed, the definition given for F^/ is sufficient to argue that the new automaton does recognize the quotient of the two languages. It was not actually necessary to deal with an automaton for L₂ in order to argue that there must exist a DFA that recognizes L₁/L₂. The proof of Theorem 5.12 is thus an existence proof, but does not indicate whether DΣ is effectively closed under quotient. Indeed, Theorem 5.12 actually proves that the quotient of a FAD language with any other language (including those in NΣ) will always be FAD. However, it may be hard to determine just which strings in the other language have the properties we need to define F^/; we may not really know which subset of states F^/ should actually be [after all, we could hardly check the property δ̄₁(q, y) ∈ F₁, one string at a time, for each of an infinite number of strings y in L₂ in a finite amount of time]. Fortunately, it is not necessary to know F^/ exactly, since there are only a finite number of ways to choose a set of final states in the automaton A^/,
163
and the proof of Theorem 5.12 assures us that one of those ways must be the correct one that admits the
conclusion L(A / ) = L 1 /L 2 .
It would, however, be quite convenient to know what F / actually is so that we could construct the
automaton that actually accepts the quotient; this seems much more satisfying that just knowing that
such a machine must exist! If L 2 is FAD, the existence of an automaton A 2 = 〈Σ, S 2 , s 02 , δ2 , F 2 〉 for which
L(A 2 ) = L 2 does make it possible to calculate F / exactly (see the exercises). Thus, DΣ is effectively closed
under quotient. In later chapters, languages that may make it impossible to determine F / will be studied.
We defer the details of such problems until then.

Lemma 5.7 Let Σ = {a, b}. NΣ is not closed under quotient.
Proof. Consider the set L of all strings that have a different number of a s than b s. This language is in NΣ, but L/L = Σ* (why?).

From the exercises it will become clear that NΣ is not closed over most of the usual (or unusual!)
operators. Note that DΣ is by contrast a very special set, in that it appears to be closed over every reason-
able unary and binary operation that we might consider. The question of closure will again arise as more
complex classes of machines and languages are presented in later chapters.

Exercises
5.1. Let Σ be an alphabet. Define F Σ to be the collection of all finite languages over Σ. Prove or give
counterexamples to the following:

(a) F Σ is closed under complementation.


(b) F Σ is closed under union.
(c) F Σ is closed under intersection.
(d) F Σ is closed under concatenation.
(e) F Σ is closed under Kleene closure.
(f) F Σ is closed under relative complement.

5.2. Let Σ be an alphabet. Define CΣ to be the collection of all cofinite languages over Σ (a language is cofinite if it is the complement of a finite language). Prove or give counterexamples to the following:

(a) C Σ is closed under complementation.


(b) C Σ is closed under union.
(c) C Σ is closed under intersection.
(d) C Σ is closed under concatenation.
(e) C Σ is closed under Kleene closure.
(f) C Σ is closed under relative complement.

5.3. Let Σ be an alphabet. Define B Σ = F Σ ∪C Σ (see Exercises 5.1 and 5.2). Prove or give counterexamples
to the following:

(a) B Σ is closed under complementation.

(b) B Σ is closed under union.
(c) BΣ is closed under intersection.
(d) B Σ is closed under concatenation.
(e) B Σ is closed under Kleene closure.
(f) BΣ is closed under relative complement.

5.4. Let Σ be an alphabet. Define I Σ to be the collection of all infinite languages over Σ. Note that
I Σ = ℘(Σ∗ ) − F Σ (see Exercise 5.1). Prove or give counterexamples to the following:

(a) I Σ is closed under complementation.


(b) I Σ is closed under union.
(c) I Σ is closed under intersection.
(d) I Σ is closed under concatenation.
(e) I Σ is closed under Kleene closure.
(f) I Σ is closed under relative complement.

5.5. Let Σ be an alphabet. Define J Σ to be the collection of all languages over Σ that have infinite comple-
ments. Note that J Σ = ℘(Σ∗ )−C Σ (see Exercise 5.2). Prove or give counterexamples to the following:

(a) J Σ is closed under complementation.


(b) J Σ is closed under union.
(c) J Σ is closed under intersection.
(d) J Σ is closed under concatenation.
(e) J Σ is closed under Kleene closure.
(f) J Σ is closed under relative complement.

5.6. Define E to be the collection of all languages over {a, b} that contain the word abba. Prove or give
counterexamples to the following:

(a) E is closed under complementation.


(b) E is closed under union.
(c) E is closed under intersection.
(d) E is closed under concatenation.
(e) E is closed under Kleene closure.
(f) E is closed under relative complement.

5.7. If a collection of languages is closed under intersection, does it have to be closed under union? Prove or give a counterexample.

5.8. If a collection of languages is closed under intersection and complement, does it have to be closed
under union? Prove or give a counterexample.

5.9. Show that if a collection of languages is closed under concatenation it is not necessarily closed
under Kleene closure.

5.10. Show that if a collection of languages is closed under Kleene closure it is not necessarily closed
under concatenation.

5.11. Show that if a collection of languages is closed under complementation it is not necessarily closed
under relative complement.
5.12. Give a finite set of numbers that is closed under √ (the square root operator).

5.13. Give an infinite set of numbers that is closed under √.

5.14. Given deterministic machines A 1 and A 2 , use the definition of A ∪ and Definition 4.5 to describe an
algorithm for building a deterministic automaton AU that will accept L(A 1 ) ∪ L(A 2 ).

5.15. Given deterministic machines A 1 and A 2 , and without relying on the construction used in Theorem
5.2:

(a) Build a deterministic automaton A u that will accept L(A 1 ) ∪ L(A 2 ).


(b) Prove that your construction behaves as advertised.
(c) If no minimization is performed in Exercise 5.14, how do the numbers of states in A∪, AU, and Au compare? (Assume A1 has n states and A2 has m states, and give expressions based on these variables.)

5.16. Let Σ be an alphabet. Define the (unary) operator P by

P(L) = {x | ∃y ∈ Σ∗ ∋ xy ∈ L} (for any collection of words L)

P(L) then represents all the prefixes of words in L. For example, if K = {a, bbc, dd}, then P(K) = {λ, a, b, bb, bbc, d, dd}. Prove that DΣ is closed under the operator P.

5.17. Let Σ be an alphabet. Define the (unary) operator S by

S(L) = {x | ∃y ∈ Σ∗ ∋ yx ∈ L} (for any collection of words L)

S(L) then represents all the suffixes of words in L. For example, if K = {a, bbc, dd}, then S(K) = {λ, a, c, bc, bbc, d, dd}.

(a) Given an automaton accepting L, describe how to modify it to produce an automaton accept-
ing S(L).
(b) Prove that your construction behaves as advertised.
(c) Argue that DΣ is closed under the operator S.

5.18. Let Σ be an alphabet. Define the (unary) operator C by

C(L) = {x | ∃y, z ∈ Σ∗ ∋ yxz ∈ L} (for any collection of words L)

C(L) then represents all the centers of words in L. For example, if K = {a, bbc, dd}, then C(K) = {λ, a, b, c, bb, bc, bbc, d, dd}.

(a) Given an automaton accepting L, describe how to modify it to produce an automaton accept-
ing C (L).

(b) Prove that your construction behaves as advertised.
(c) Argue that DΣ is closed under the operator C .

5.19. Let Σ be an alphabet. Define the (unary) operator F by

F(L) = {x | x ∈ L ∧ (if ∃y ∈ Σ∗ ∋ xy ∈ L, then y = λ)}

F(L) then represents all the words in L that are not the beginnings of other words in L. For example, if K = {ad, ab, abbad}, then F(K) = {ad, abbad}. Prove that DΣ is closed under the operator F.

5.20. Let Σ be an alphabet, and x = a1 a2 . . . an−1 an ∈ Σ∗; define x r = an an−1 . . . a2 a1. For a language L over Σ, define L r = {x r | x ∈ L}. Note that the (unary) reversal operator r is thus defined by L r = {an an−1 . . . a3 a2 a1 | a1 a2 a3 . . . an−1 an ∈ L}, and L r therefore represents all the words in L written backward. For example, if K = {λ, ad, bbc, bbad}, then K r = {λ, da, cbb, dabb}.

(a) Given an automaton accepting L, describe how to modify it to produce an automaton accept-
ing L r .
(b) Prove that your construction behaves as advertised.
(c) Argue that DΣ is closed under the operator r.

5.21. Let Σ = {a, b, c, d}. Define the (unary) operator G by

G(L) = {an an−1 . . . a3 a2 a1 a1 a2 a3 . . . an−1 an | a1 a2 a3 . . . an−1 an ∈ L}
     = {w r · w | w ∈ L}

(see the definition of w r in Exercise 5.20). As an example, if K = {λ, ad, bbc, bbad}, then G(K) = {λ, daad, cbbbbc, dabbbbad}.

(a) Prove that DΣ is not closed under the operator G.


(b) Prove that NΣ is closed under the operator G (perhaps make use of Theorem 5.11).
(c) Prove that NΣ is not closed under the operator b (b is defined in Theorem 5.11).

5.22. Prove that NΣ is closed under the operator r (see Exercise 5.20).

5.23. Prove that NΣ is not closed under the operator P (see Exercise 5.16).

5.24. Prove that NΣ is not closed under the operator S (see Exercise 5.17).

5.25. Prove that NΣ is not closed under the operator C (see Exercise 5.18).

5.26. Prove that NΣ is not closed under the operator F (see Exercise 5.19).

5.27. Consider the following alternate “proof” of Theorem 5.1: Let A be an NDFA and define A ∼ as sug-
gested in Theorem 5.1. Give an example to show that L(A ∼ ) might not be equal to ∼L(A).

5.28. Complete the proof of Lemma 5.7.

5.29. Give an example of a collection of languages that is closed under union, concatenation, and Kleene
closure, but is not closed under intersection.

5.30. If a collection of languages is closed under union, does it have to be closed under intersection?
Prove or give a counterexample.

5.31. Refer to the construction in Theorem 5.4 and prove that L(A • ) = L 1 ·L 2 . Warning! This involves a lot
of tedious details.

5.32. Refer to the construction in Theorem 5.5 and prove that L(A ∗ ) = L ∗ . Warning! This involves a lot of
tedious details.

5.33. Amplify the explanations for each of the equivalences in the proof of Theorem 5.2.

5.34. Given a DFA A = 〈Σ, S, s 0 , δ, F 〉, define an NDFA that will accept L(A)+ .

5.35. Given an NDFA A = 〈Σ, S, s 0 , δ, F 〉, define an NDFA that will accept L(A)+ .

5.36. If L is FAD, is it necessarily true that all subsets of L are FAD? Prove or give a counterexample.

5.37. If L ∈ DΣ , is it necessarily true that all supersets of L are in DΣ ? Prove or give a counterexample.

5.38. If L ∈ NΣ , is it necessarily true that all subsets of L are in NΣ ? Prove or give a counterexample.

5.39. If L ∈ NΣ , is it necessarily true that all supersets of L are in NΣ ? Prove or give a counterexample.

5.40. Explain the purpose of the new start state q 0 in the proof of Theorem 5.5.

5.41. Redesign the construction in the proof of Theorem 5.4, making use of λ-transitions where appro-
priate.

5.42. Redesign the construction in the proof of Theorem 5.5, making use of λ-transitions where appro-
priate. Do this in such a way as to make the “extra” start state q 0 unnecessary.

5.43. Redesign the construction in the proof of Theorem 5.4, assuming that A 1 and A 2 are NDFAs.

5.44. Redesign the construction in the proof of Theorem 5.5, assuming that A is a DFA.

5.45. How does the right congruence generated by a language L compare to the right congruence gener-
ated by the complement of L? Hint: It may be helpful to consider the construction of A ∼ given in
Theorem 5.1 when A is a minimal machine accepting L.

5.46. (a) Give examples of languages L 1 and L 2 for which R (L 1 ∩L 2 ) = R L 1 ∩ R L 2 (see Definition 2.2).
(b) Give examples of languages L 1 and L 2 for which R (L 1 ∩L 2 ) 6= R L 1 ∩R L 2 . Hint: It may be helpful to
consider the construction of A ∩ given in Lemma 5.1 to direct your thinking.

5.47. Consider the following assertion: DΣ is closed under relative complement; that is, if L 1 and L 2 are
FAD, then L 1 − L 2 is also FAD.

(a) Prove this by appealing to existing theorems.


(b) Define an appropriate “new” machine.
(c) Prove that the machine constructed in part (b) behaves as advertised.

5.48. Define LΣ to be the set of all languages recognized by NDFAs with λ-transitions. What sort of clo-
sure properties does LΣ have? How does LΣ compare to DΣ ?

5.49. (a) Give an example of a language L for which λ ∈ L + .
(b) Give three examples of languages L for which L + = L.

5.50. Recall that δ∪ : (S1 ∪ S2) × Σ → ℘(S1 ∪ S2) was defined by

δ∪(s, a) = δ1(s, a) if s ∈ S1, and δ∪(s, a) = δ2(s, a) if s ∈ S2, ∀s ∈ S1 ∪ S2, ∀a ∈ Σ

(a) Prove (by induction) that δ∪ conforms to a similar formula:

δ∪(s, x) = δ1(s, x) if s ∈ S1, and δ∪(s, x) = δ2(s, x) if s ∈ S2, ∀s ∈ S1 ∪ S2, ∀x ∈ Σ∗

(b) Was this fact used in the proof of Theorem 5.2?

5.51. Let Σ be an alphabet. Prove or give counterexamples to the following:

(a) NΣ is closed under relative complement.


(b) NΣ is closed under union.
(c) NΣ is closed under concatenation.
(d) NΣ is closed under Kleene closure.
(e) If L ∈ NΣ , then L + ∈ NΣ .

5.52. Why was it necessary to require that S1 ∩ S2 = ∅ in the proof of Theorem 5.4? Would any step of the
proof be invalid without this assumption? Explain.

5.53. Let Σ be an alphabet. Define E (L) = {z | (∃y ∈ Σ+ )(∃x ∈ L)(z = y x)}.

(a) Given an automaton accepting L, describe how to modify it to produce an automaton accept-
ing E (L).
(b) Prove that your construction behaves as advertised.
(c) Argue that DΣ is closed under the operator E .

5.54. Let Σ be an alphabet. Define B (L) = {z | (∃x ∈ L)(∃y ∈ Σ∗ )(z = x y)}.

(a) Given an automaton accepting L, describe how to modify it to produce an automaton accept-
ing B (L).
(b) Prove that your construction behaves as advertised.
(c) Argue that DΣ is closed under the operator B .

5.55. Let Σ be an alphabet. Define M (L) = {z | (∃x ∈ L)(∃y ∈ Σ+ )(z = x y)}.

(a) Given an automaton accepting L, describe how to modify it to produce an automaton accept-
ing M (L).
(b) Prove that your construction behaves as advertised.

(c) Argue that DΣ is closed under the operator M .

5.56. Refer to the definitions given in Lemma 5.1 and use induction to show that

(∀s ∈ S 1 )(∀t ∈ S 2 )(∀x ∈ Σ∗ )(δ∩ (〈s, t 〉, x) = 〈δ1 (s, x), δ2 (t , x)〉)

5.57. Refer to Lemma 5.1 and prove that L(A ∩ ) = L 1 ∩ L 2 . As long as the reference is explicitly stated, the
result in Exercise 5.56 can be used without proof.

5.58. Prove Theorem 5.6.

5.59. Prove Theorem 5.7.

5.60. (a) Cleverly define a machine modification that does not use any λ-moves that could be used to
prove Theorem 5.7 (your new machine is still likely to be nondeterministic, however).
(b) Prove that your modified machine behaves as advertised.

5.61. Let W (L) = {x | x is formed by deleting one or more letters from a word in L}.

(a) Given an automaton accepting L, describe how to modify it to produce an automaton accept-
ing W (L).
(b) Prove that your construction behaves as advertised.
(c) Argue that DΣ is closed under the operator W .

5.62. Let V (L) = {x | x is formed by deleting the odd-positioned letters from a word in L}. [Note: This refers
to the first, third, fifth, and so on, letters in a word. For example, if abcdefg ∈ L, then bdf ∈ V(L).]

(a) Given an automaton accepting L, describe how to modify it to produce an automaton accept-
ing V (L).
(b) Prove that your construction behaves as advertised.
(c) Argue that DΣ is closed under the operator V .

5.63. Let U (L) = {x | x is formed by deleting the even-positioned letters from a word in L}. [Note: This
refers to the second, fourth, sixth, and so on, letters in a word. For example, if abcdefg ∈ L, then
aceg ∈ U (L).]

(a) Given an automaton accepting L, describe how to modify it to produce an automaton accept-
ing U (L).
(b) Prove that your construction behaves as advertised.
(c) Argue that DΣ is closed under the operator U.

5.64. Let T (L) = {x | x is formed by deleting every third, sixth, ninth, and so on, letters from a word in
L}. [Note: This refers to those letters in a word whose index position is congruent to 0 mod 3. For
example, if abcdefg ∈ L, then abdeg ∈ T(L).]

(a) Given an automaton accepting L, describe how to modify it to produce an automaton accept-
ing T (L).
(b) Prove that your construction behaves as advertised.

(c) Argue that DΣ is closed under the operator T .

5.65. Let P = {x | |x| is prime} and let I (L) be defined by I (L) = L ∩ P .

(a) Show that DΣ is not closed under I .


(b) Show that F Σ is closed under I (see Exercise 5.1).
(c) Prove or disprove: C Σ is closed under I (see Exercise 5.2).
(d) Prove or disprove: B Σ is closed under I (see Exercise 5.3).
(e) Prove or disprove: I Σ is closed under I (see Exercise 5.4).
(f) Prove or disprove: J Σ is closed under I (see Exercise 5.5).
(g) Prove or disprove: E is closed under I (see Exercise 5.6).
(h) Prove or disprove: NΣ is closed under I .

5.66. Define C to be the collection of all languages over {a, b} that do not contain λ. Prove or give counterexamples to the following:

(a) C is closed under complementation.


(b) C is closed under union.
(c) C is closed under intersection.
(d) C is closed under concatenation.
(e) C is closed under Kleene closure.
(f) C is closed under relative complement.
(g) If L ∈ C, then L+ ∈ C.

5.67. (a) Consider the statement that DΣ is closed under finite union:
i. Prove by existing theorems and induction.
ii. Prove by construction.
(b) Prove or disprove that DΣ is closed under infinite union. Justify your assertions.

5.68. Let Σ = {a, b}.

(a) Give examples of three homomorphisms under which NΣ is not closed.


(b) Give examples of three homomorphisms under which NΣ is closed.

5.69. Let Σ = {a}. Can you find two different homomorphisms under which NΣ is not closed? Justify your conclusions.

5.70. Refer to the construction given in Theorem 5.10.

(a) Prove that δψ(s, x) = δ(s, ψ(x)), ∀s ∈ S, ∀x ∈ Σ∗.


(b) Complete the proof of Theorem 5.10.

5.71. Consider the homomorphism ξ given in Lemma 5.4 and the set L of all strings that have the same
number of a s as b s.

(a) DΣ is closed under inverse homomorphism, but ξ(L) is the set of all even-length strings of a's, and it appears that under ξ−1 the DFA language ξ(L) maps to the non-FAD language L. Explain the apparent contradiction. Hint: First compute ξ−1(ξ(L)).
(b) Give an example of a homomorphism for which ψ(ψ−1 (L)) 6= L.
(c) Give an example of a homomorphism for which ψ−1 (ψ(L)) 6= L.
(d) Prove ψ(ψ−1 (L)) ⊆ L.
(e) Prove L ⊆ ψ−1 (ψ(L)).

5.72. Let Σ be an alphabet. Define the (unary) operator e by

L e = {x | ∃y ∈ Σ∗ ∋ (yx ∈ L ∧ |x| = |y|)}

L e then represents the last halves of all the words in L. For example, if K = {ad, abaa, ccccc}, then K e = {d, aa}. Prove that DΣ is closed under the operator e.

5.73. Refer to the proof of Theorem 5.11 and show that there exists an automaton A for which it would
be incorrect to try to accept L b by redefining the set of final states to be the set of “midway” states.

5.74. Consider the sets M and K in Example 5.20. Assume that we have used the pumping lemma to show
that M is not FAD. What would be wrong with arguing that, since M was not FAD, its homomorphic
image cannot be FAD either, and hence K is not FAD?

5.75. Prove Theorem 5.9.

5.76. Let Σ be the ASCII alphabet. Define a homomorphism that will capitalize all lowercase letters (and
does not change punctuation, spelling, and the like).

5.77. Consider the proof of Theorem 5.12.

(a) Show that L(A/) = L1/L2 for the A and A/ defined by A = 〈Σ, S1, s01, δ1, F〉 and A/ = 〈Σ, S1, s01, δ1, F/〉, where F/ = {t | ∃y ∈ Σ∗ ∋ (y ∈ L2 ∧ δ1(t, y) ∈ F)}.

(b) Given deterministic finite automata A1 = 〈Σ, S1, s01, δ1, F〉 such that L(A1) = L1, and A2 = 〈Σ, S2, s02, δ2, F2〉 for which L(A2) = L2, give an algorithm for computing F/ = {t | ∃y ∈ Σ∗ ∋ (y ∈ L2 ∧ δ1(t, y) ∈ F)}.

5.78. Given two alphabets Σ1 and Σ2 and a DFA A = 〈Σ1, S, s0, δ, F〉:

(a) Define a new automaton A′ = 〈Σ1 ∪ Σ2, S′, s′0, δ′, F′〉 for which L(A′) = L(A).

(b) Prove that A′ behaves as advertised.

5.79. Let S be a collection of languages that is closed under union, concatenation, and Kleene closure.
Prove or disprove: If S contains an infinite number of languages, every language in S must be FAD.

5.80. Let S be a collection of languages that is closed under union, concatenation, and Kleene closure.
Prove or disprove: If S is a finite collection, every language in S must be FAD.

5.81. Let u be a unary language operator that, when composed with itself, yields the identity function.
Prove that if DΣ is closed under u, then NΣ must also be closed under u.

Chapter 6

Regular Expressions

In this chapter we will develop a standard notation for denoting FAD languages and thus explore yet
another characterization of these languages. The specification of a language by an automaton unfortu-
nately does not provide a convenient summary of those strings that are accepted; it is straightforward to
check whether any particular word belongs to the language, but it is often difficult to get an overall sense
of the set of accepted words. Were the language finite, the individual words could simply be explicitly
listed. The delineation of an infinite set in this manner is clearly impossible.
Up to this point, we have relied on English descriptions of the languages under consideration. Nat-
ural languages are unfortunately imprecise, and even small machines can have impossibly complex de-
scriptions. The concept of regular expressions provides a clear and concise vehicle for denoting many of
the languages we have studied in the previous chapters.

6.1 Algebra of Regular Expressions


The definition of set union and the concepts of language concatenation (Definition 5.4) and Kleene clo-
sure (Definition 5.5) afford a convenient and powerful method for building new languages from existing
ones. The expression ({a b } · {cc })∗ · {d
a ,b d } is an infinite set built from simple alphabets and the operators
presented in Chapter 5. We will see that this type of representation is quite suitable for our purposes and
is intimately related to the finite automaton definable languages.

Definition 6.1 Let Σ = {a1, a2, . . . , am} be an alphabet. A regular set over Σ is any set that can be formed by a sequence of applications of the following rules:

i. {a1}, {a2}, . . . , {am} are regular sets.

ii. { } (the empty set of words) is a regular set.

iii. {λ} (the set containing only the empty word) is a regular set.

iv. If L1 and L2 are regular sets, then so is L1 · L2.

v. If L1 and L2 are regular sets, then so is L1 ∪ L2.

vi. If L1 is a regular set, then so is L1∗.
Example 6.1
Let Σ = {a, b, c}. Each of the following languages is a regular set:

{λ}     {b} ∪ {c}     {a} · ({b} ∪ {c})     {b} · {λ}
{a}∗     ({a} ∪ {λ}) · ({b}∗)     { }     {c} · { }

The multitude of set brackets in these expressions is somewhat undesirable; we now present a common shorthand notation to represent such sets. Expressions like {a}∗ will simply be written as a∗, and {a} · {b} will be shortened to ab. The notation we wish to use can be formally defined in the following recursive manner.

Definition 6.2 Let Σ = {a1, a2, . . . , am} be an alphabet. A regular expression over Σ is a sequence of symbols formed by repeated application of the following rules:

i. a1, a2, . . . , am are all regular expressions, representing the regular sets {a1}, {a2}, . . . , {am}, respectively.

ii. ∅ is a regular expression representing { }.

iii. ε is a regular expression representing {λ}.

iv. If R1 and R2 are regular expressions corresponding to the sets L1 and L2, then (R1 · R2) is a regular expression representing the set L1 · L2.

v. If R1 and R2 are regular expressions corresponding to the sets L1 and L2, then (R1 ∪ R2) is a regular expression representing the set L1 ∪ L2.

vi. If R1 is a regular expression corresponding to the set L1, then (R1)∗ is a regular expression representing the set L1∗.
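The recursion in Definition 6.2 is easy to mirror in a program. The following Python sketch is our own illustration (the tuple encoding and the name lang are hypothetical conventions, not part of the text): it represents a regular expression as nested tuples following rules i through vi, and enumerates the words of the represented set up to a given length.

# A regular expression is encoded as: a one-letter string (rule i),
# 'empty' for the expression for the empty set (rule ii), 'eps' for the
# expression for {lambda} (rule iii), and ('cat', R1, R2),
# ('union', R1, R2), ('star', R1) for rules iv, v, and vi.
# lang(R, n) returns every word of the represented set of length <= n.
def lang(r, n):
    if r == 'empty':
        return set()
    if r == 'eps':
        return {''}
    if isinstance(r, str):                        # a single letter
        return {r} if len(r) <= n else set()
    op = r[0]
    if op == 'union':
        return lang(r[1], n) | lang(r[2], n)
    if op == 'cat':
        return {x + y for x in lang(r[1], n)
                      for y in lang(r[2], n) if len(x + y) <= n}
    if op == 'star':                              # iterate until nothing new fits
        words, base = {''}, lang(r[1], n)
        while True:
            new = {x + y for x in words for y in base if len(x + y) <= n}
            if new <= words:
                return words
            words |= new

# The regular set {a} · ({b} ∪ {c}) of Example 6.1:
# lang(('cat', 'a', ('union', 'b', 'c')), 3) == {'ab', 'ac'}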

Example 6.2
Let Σ = {a, b, c}. The regular sets in Example 6.1 can be represented by the following regular expressions:

ε     (b ∪ c)     (a · (b ∪ c))     (b · ε)
(a)∗     ((a ∪ ε) · (b)∗)     ∅     (c · ∅)

Note that each expression consists of the “basic building blocks” given by rules 6.2i through 6.2iii, connected by the operators ∪, ·, and ∗ according to rules 6.2iv through 6.2vi. Each expression is intended to denote a particular language over Σ. Such representations of languages are by no means unique. For example, (a · (b ∪ c)) and ((a · b) ∪ (a · c)) both represent the same set, {ab, ac}. Similarly, (b · ε) and b both represent {b}.
The intention of the parentheses is to prevent ambiguity; a · b ∪ c could mean (a · (b ∪ c)) or ((a · b) ∪ c), and the difference is important: the first expression represents {ab, ac}, while the second represents {ab, c}, which are obviously different languages. To ease the burden of all these parentheses, we will adopt the following simplifying conventions.
Notational Convention: The precedence of the operators, from highest to lowest, shall be ∗ , ·, ∪.
When writing a regular expression, parentheses that conform to this hierarchy may be omitted. In par-
ticular, the outermost set of parentheses can always be omitted. Juxtaposition may be used in place of
the concatenation symbol (·).

Example 6.3
Thus, a · b ∪ c will be taken to mean ((a · b) ∪ c), not (a · (b ∪ c)), since · has precedence over ∪. Redundant parentheses that are implied by the precedence rules can be eliminated, and thus (((a · b) ∪ c) · d) can be written as (ab ∪ c)d. Notice that b ∪ c∗ represents (b ∪ (c∗)), not (b ∪ c)∗. Kleene closure therefore behaves much like exponentiation does in ordinary algebraic expressions in that it is given precedence over the other operators. Concatenation and union behave much like the algebraic operators multiplication and addition, respectively. Indeed, some texts use + instead of ∪ for union; the symbol for concatenation already agrees with that for multiplication (·), and we will likewise allow the symbol to be omitted in favor of juxtaposition. The constants ∅ and ε behave much like the numbers 0 and 1 do in algebra. The common identities x + 0 = x, x · 1 = x, and x · 0 = 0 have parallels in language theory (see Lemma 6.1). Indeed, ∅ is the identity for union and ε is the identity for concatenation.
Thus far we have been very careful to distinguish between the name of an object and the object itself. In algebra, we are used to saying that the symbol 4 equals the string of symbols (that is, the word) 20 ÷ 5; we really mean that both names refer to the same object, the concept we generally call the number four. (You should be able to think of many more strings that are commonly used as a name for this number, for example, ||||, IV, and 100₂.) We will be equally inexact here, writing a · (b ∪ c) = (a · b) ∪ (a · c). This will be taken to mean that the sets represented by the two expressions are equal (as is the case here; both equal {ab, ac}) and will not be construed to mean that the two expressions themselves are identical (which is clearly not the case here; the right-hand side has more a's, more parentheses, and more concatenation symbols).

Definition 6.3 Let R be a regular expression. The language represented by R is formally denoted by L(R).
Two regular expressions R 1 and R 2 will be said to be equivalent if the sets represented by the two expressions
are equal, and we will write R 1 = R 2 .

Thus, R 1 and R 2 are equivalent if L(R 1 ) = L(R 2 ), but this is commonly abbreviated R 1 = R 2 . The word
“equivalent” has been seen in three different contexts so far: there are equivalent DFAs, equivalent ND-
FAs, and now equivalent regular expressions. In each case, the intent has been to equate constructs that
are associated with the same language. Now that the idea of equality (equivalence) has been established,
some general identities can be outlined. The properties given in Lemma 6.1 follow directly from the
definitions of the operators.

Lemma 6.1 Let Σ be an alphabet, and let R1, R2, and R3 be regular expressions. Then:

(a) R1 ∪ ∅ = R1

(b) R1 · ε = R1 = ε · R1

(c) R1 · ∅ = ∅ = ∅ · R1

(d) R1 ∪ R2 = R2 ∪ R1

(e) R1 ∪ R1 = R1

(f) R1 ∪ (R2 ∪ R3) = (R1 ∪ R2) ∪ R3

(g) R1 · (R2 · R3) = (R1 · R2) · R3

(h) R1 · (R2 ∪ R3) = (R1 · R2) ∪ (R1 · R3)

(i) ε∗ = ε

(j) ∅∗ = ε

(k) (R1 ∪ R2)∗ = (R1∗ ∪ R2∗)∗

(l) (R1 ∪ R2)∗ = (R1∗ · R2∗)∗

(m) (R1∗)∗ = R1∗

(n) (R1∗) · (R1∗) = R1∗

Furthermore, there are examples of sets for which:

(b′) R1 ∪ ε ≠ R1
(d′) R1 · R2 ≠ R2 · R1
(e′) R2 · R1 ≠ R1
(h′) R1 ∪ (R2 · R3) ≠ (R1 ∪ R2) · (R1 ∪ R3)
(k′) (R1 · R2)∗ ≠ (R1∗ · R2∗)∗
(l′) (R1 · R2)∗ ≠ (R1∗ ∪ R2∗)∗
Proof. Property (h) will be proved here. The remainder are left as exercises.

w ∈ R1 · (R2 ∪ R3) ⇔ (by definition of ·)
(∃x, y)(y ∈ (R2 ∪ R3) ∧ x ∈ R1 ∧ w = x · y) ⇔ (by definition of ∪)
(∃x, y)((y ∈ R2 ∨ y ∈ R3) ∧ (x ∈ R1 ∧ w = x · y)) ⇔ (by the distributive law)
(∃x, y)(((y ∈ R2) ∧ (x ∈ R1 ∧ w = x · y)) ∨ ((y ∈ R3) ∧ (x ∈ R1 ∧ w = x · y))) ⇔ (by definition of ·)
(∃x, y)((w = x · y ∈ R1 · R2) ∨ (w = x · y ∈ R1 · R3)) ⇔ (by definition of ∪)
w ∈ (R1 · R2) ∪ (R1 · R3)

Note that identity (c) in Lemma 6.1 implies that {a, b} · ∅ = ∅, which follows immediately from the definition of concatenation. If w ∈ {a, b} · ∅, then w would have to be of the form x · y, where x ∈ {a, b} and y ∈ ∅; there are clearly no valid choices for y, so {a, b} · ∅ is empty.
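Identities like those of Lemma 6.1 can be spot-checked mechanically with the lang() sketch given after Definition 6.2 (again, our own illustration): two expressions are equivalent only if their bounded enumerations agree for every bound, so a disagreement on short words refutes a proposed identity, although agreement alone proves nothing.

# Check identity (h) on all words of length <= 4 for sample expressions.
R1, R2, R3 = 'a', ('star', 'b'), ('cat', 'a', 'b')
lhs = ('cat', R1, ('union', R2, R3))
rhs = ('union', ('cat', R1, R2), ('cat', R1, R3))
assert lang(lhs, 4) == lang(rhs, 4)

# And the observation above: {a, b} concatenated with the empty set is empty.
assert lang(('cat', ('union', 'a', 'b'), 'empty'), 4) == set()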
6.2 Regular Sets As FAD Languages


Armed with the constructs and properties discussed in the first section, we will now consider what types
of languages can actually be defined by regular expressions. How general is this method of expressing
sets of words? Can the FAD languages be represented by regular expressions? (Yes). Can all program-
ming languages be represented by regular expressions? (No). Are regular sets always finite automaton
definable languages? (Yes). We begin by addressing this last question.

Definition 6.4 Let Σ be an alphabet. RΣ is defined to be the set of all regular sets over Σ.

The first question to be considered is, Can every regular set be recognized by a DFA? That is, is RΣ ⊆
DΣ ? It is clear that the “basic building blocks” are recognizable. Figure 6.1 shows three NDFAs that accept
{ }, {λ}, and {c}, respectively. Recalling the constructions outlined in Chapter 5, it is easy to see how to
combine these “basic machines” into machines that will accept expressions involving the operators ∪, ·,
and ∗ .

Figure 6.1: NDFAs which recognize regular expressions with zero operators

Figure 6.2: The NDFA discussed in Example 6.4

Example 6.4
An NDFA that accepts a ∪ b (as suggested by the proof of Theorem 5.2) is shown in Figure 6.2. Note that
it is composed of the basic building blocks for the letters a and b , as suggested by the constructions in
Figure 6.1.

Example 6.5
An NDFA that accepts (a ∪ b)∗ is shown in Figure 6.3. The automaton given in Figure 6.2 for (a ∪ b) is modified as suggested by the proof of Theorem 5.5 to produce the Kleene closure of (a ∪ b). Recall that the “extra” state q0 was added to ensure that λ is accepted by the new machine.

Example 6.6
An NDFA that accepts c · (a ∪ b)∗ (as suggested by the proof of Theorem 5.4) is shown in Figure 6.4.
Note that in this last example q 0 , t 0 , and s 0 are disconnected states, and r 1 , s 1 , and t 1 could be coa-
lesced into a single state. The resulting machines are not advertised to be efficient; the main point is that
they can be built. The techniques illustrated above are used to prove the following lemma.

Figure 6.3: The NDFA discussed in Example 6.5

Figure 6.4: The NDFA discussed in Example 6.6

Lemma 6.2 Let Σ be an alphabet and let R be a regular set over Σ. Then there is a DFA that accepts R.
Proof. The proof is by induction on the number of operators in the regular expression describing R (see the exercises). Note that Figure 6.1 effectively illustrates the basis step: Those regular expressions with zero operators (∅, ε, a1, a2, . . . , am) do indeed correspond to FAD languages. This covers sets generated by rules i, ii, and iii of Definition 6.2. For sets corresponding to regular expressions with a positive number of operators, the outermost operator can be identified, and it will be either ·, ∪, or ∗, corresponding to an application of rule iv, v, or vi. The induction assumption will guarantee that the subexpressions used by the outermost operator have corresponding DFAs. Theorems 5.2, 5.4, and 5.5 can then be invoked to argue that the entire expression has a corresponding DFA.
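The induction in this proof is constructive, and it can be phrased as a short recursive program. The sketch below is our own (it uses λ-transitions in the spirit of Exercises 5.41 and 5.42, together with the tuple encoding of regular expressions introduced after Definition 6.2); each call returns an NDFA as a triple of its start state, its set of final states, and its set of moves, with None marking a λ-move.

import itertools

_counter = itertools.count()

def _fresh():
    return next(_counter)

def build(r):
    # Returns (start, finals, moves) for an NDFA with lambda-moves
    # accepting the set represented by r.
    if r == 'empty':                   # unreachable final state: accepts nothing
        return _fresh(), {_fresh()}, set()
    if r == 'eps':                     # the start state itself is final
        s = _fresh()
        return s, {s}, set()
    if isinstance(r, str):             # one letter: s --r--> f
        s, f = _fresh(), _fresh()
        return s, {f}, {(s, r, f)}
    op = r[0]
    s1, f1, m1 = build(r[1])
    if op == 'star':                   # new start state q0, as in Theorem 5.5
        q0 = _fresh()
        return q0, {q0}, m1 | {(q0, None, s1)} | {(f, None, q0) for f in f1}
    s2, f2, m2 = build(r[2])
    if op == 'union':                  # new start with lambda-moves to both parts
        q0 = _fresh()
        return q0, f1 | f2, m1 | m2 | {(q0, None, s1), (q0, None, s2)}
    if op == 'cat':                    # finals of the first part feed the second
        return s1, f2, m1 | m2 | {(f, None, s2) for f in f1}

As in the proof, one recursive call is made for each operator, so the machine produced for an expression with k operators composes k of the constructions from Chapter 5.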

Corollary 6.1 Let Σ be an alphabet. Then RΣ ⊆ DΣ .


Proof. The proof follows immediately from Lemma 6.2.

Since we are assured that every regular set can be accepted by a finite automaton, the collection of
regular sets is clearly contained in the set of FAD languages. This also means that those languages that
cannot be represented by a DFA (that is, those contained in NΣ ) have no chance of being represented by
a regular expression.

6.3 Language Equations


The next question we will address is whether DΣ ⊆ RΣ , that is, whether every FAD language can be
represented by a regular expression. The reader is invited to take a sample DFA and try to express the
language it accepts by a regular expression. You will probably be able to do it, but only by guesswork
and trial and error. Our first question appears to have a much more methodical solution: Given a regular
expression, it was a relatively straightforward task to draw a NDFA (and then a DFA); in fact, we have a
set of algorithms for doing just that, and we could program a computer to do the task for us. This second
question does not seem to have an obvious algorithm connected with it, and we will have to attack the
problem using a new concept: language equations.
In algebra, we are used to algebraic equations such as 3x + 7 = 19. Recall that a solution to this
equation is a numerical value for x that will make the equation true, that is, make both sides equal.
In the above example, there is only one choice for x, the unique solution 4. Equations can have two
different solutions, like x 2 = 9, no [real] solutions, like x 2 = −9, or an infinite number of solutions, like
2(x + 3) = x + 6 + x. In a similar way, set equations can be solved, such as {a, b, c} = {a, b} ∪ X. Here X represents a set, and we are again looking for a value for X that will make the equation true; an obvious choice is X = {c}, but there are other choices, like X = {b, c} (since {a, b, c} = {a, b} ∪ {b, c}). Such equations may likewise have no solutions, like X ∪ {b} = {a, c}, or an infinite number of solutions, such as X ∪ {b} = X (what sorts of sets satisfy this last equation?). We wish to look at set equations where the sets are actually sets of strings, that is, language equations. The type of equation in which we are most interested has one and only one solution, as outlined in the next theorem. It is very similar in form and spirit to the theorem in algebra that says “For any numbers a and b, where a ≠ 0, the equation ax = b has a unique solution given by x = b ÷ a.”

Theorem 6.1 Let Σ be an alphabet. Let E and A be any subsets of Σ∗. Then the language equation X = E ∪ A · X admits the solution X = A∗ · E. Any other solution Y must contain A∗ · E. Furthermore, if λ ∉ A, X = A∗ · E is the unique solution.

Proof. First note that the set A ∗ E is indeed a solution to this equation, since A ∗ E = E ∪ A · (A ∗ E ) (see
the exercises). Now assume that some set Y is a solution to this equation, and let us investigate some of the
properties that Y must have: If Y is a solution, then

Y = E ∪ A ·Y ⇒ (by definition of ∪)
E ⊆ Y ∧ A ·Y ⊆ Y ⇒ (if E ⊆ Y , then A · E ⊆ A · Y )
A ·E ⊆ A ·Y ⊆ Y ⇒ (by substitution)
A · A ·E ⊆ A · A ·Y ⊆ A ·Y ⊆ Y ⇒ (by induction)
(∀n ∈ N)(A n · E ⊆ Y ) ⇒ (by definition of A ∗ )
A ∗· E ⊆ Y

Thus, every solution must contain all of A ∗ E , and A ∗ E is in this sense the smallest solution. This is
true regardless of whether or not λ belongs to A.
Now let us assume that λ ∉ A and that we have a solution W that is actually “bigger” than A ∗ E ; we
will show that this is a contradiction, and thus all solutions must look exactly like A ∗ E . If W is a solution,
W ≠ A∗E, then there must be some elements in the set W − A∗E; choose a string of minimal length from
among these elements and call it z. Thus z ∈ W and z ∉ A ∗ E , and since E ⊆ A ∗ E (why?), z ∉ E . Since W is
a solution, we have

W = E ∪ A · W ⇒ (since z ∈ W and it cannot be in the E part)
z ∈ A · W ⇒ (by definition of ·)
(∃x ∈ A, ∃y ∈ W) ∋ z = x · y ⇒ (by definition of | |)
|z| = |x| + |y| ⇒ (since λ ∉ A and x ∈ A, so x ≠ λ and |x| > 0)
|y| < |z|

Note that y cannot belong to A ∗ E (if y ∈ A ∗ E , then, since x ∈ A, z(= x · y) ∈ A · (A ∗ E ) ⊆ A ∗ E , which


means that z ∈ A ∗ E , and we started by assuming that z ∉ A ∗ E ); since y ∈ W , we have y ∈ W − A ∗ E , and
we have produced a string shorter than z, which belongs to W − A ∗ E . This is the contradiction we were
looking for, and we can conclude that it is impossible for a solution W to be larger than A ∗ E . Since we have
already shown that no solution can be smaller than A ∗ E , we now know that the only solution is exactly
A∗E .

Example 6.7
X = {b, c} ∪ {a} · X does indeed have a solution; X can equal {a}∗ · {b, c}. Note also that this is the only solution (verify, for example, that X = {a}∗ · {c} is not a solution). The equation Z = {b, c} ∪ {a, λ} · Z has several solutions; among them are Z = {a}∗ · {b, c} and Z = {a, b, c}∗.
It is instructive to explicitly list the first few elements of {a}∗ · {b, c} and begin to check the validity of the solution to the first equation. If Y is a solution, then the two sides of the equation Y = {b, c} ∪ {a} · Y must be equal. Since both b and c appear on the right-hand side, they must also be on the left-hand side, which clearly means that they have to be in Y. Once b is known to be in Y, it will give rise to a term on the right-hand side due to the presence of {a} · Y. Thus, a · b must also be found on the left-hand side and therefore is in Y, and so on. The resulting sequence of implications parallels the first part of the proof of Theorem 6.1.
To see intuitively why no string other than those found in {a}∗ · {b, c} may belong to a solution for X = {b, c} ∪ {a} · X, consider a string such as aa. If this were to belong to X, then it would appear on the left-hand side and therefore would have to appear on the right-hand side as well if the two sides were to indeed be equal. On the right-hand side are just the two components, {b, c} and {a} · X. aa is clearly not in {b, c}, so it must be in {a} · X, which does seem plausible; all that is necessary is for a to be in X, and then aa will belong to {a} · X. If a is in X, though, it must also appear on the left-hand side, and so a must be on the right-hand side as well. Again, a is not in {b, c}, so it must be in {a} · X. This can happen only if λ belongs to X so that a · λ will belong to {a} · X. This implies that λ must now show up on both sides, and this leads to a contradiction: λ cannot be on the right-hand side since λ clearly is not in {b, c}, and it cannot belong to {a} · X either, since all these words begin with an a. This contradiction shows why aa cannot be part of any solution X.
This example illustrates the basic nature of these types of equations: for words that are not in {a}∗ · {b, c}, the inclusion of that word in the solution leads to the inclusion of shorter and shorter strings, which eventually leads to a contradiction. This property was exploited in the second half of the proof of Theorem 6.1. Rather than finding shorter and shorter strings, though, it was assumed we already had the shortest, and we showed that there had to be a still shorter one; this led to the desired contradiction more directly.
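The bounded-enumeration sketch from Section 6.1 offers a concrete way to watch this behavior (a spot check on short words only, and our own illustration): the proposed solution {a}∗ · {b, c} makes the two sides of X = {b, c} ∪ {a} · X agree, while the rejected candidate {a}∗ · {c} does not.

A, E = 'a', ('union', 'b', 'c')
X = ('cat', ('star', A), E)                 # the solution A*·E of Theorem 6.1
assert lang(X, 5) == lang(E, 5) | lang(('cat', A, X), 5)

Y = ('cat', ('star', A), 'c')               # {a}*·{c} is not a solution:
assert lang(Y, 5) != lang(E, 5) | lang(('cat', A, Y), 5)   # b is missing from Y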
Our main goal will be to solve systems of language equations, since the relationships between the
terminal sets of an automaton can be described by such a system. Systems of language equations are
similar in form and spirit to systems of algebraic equations, such as

3x1 + x2 = 10
x1 − x2 = 2

which has the unique solution x 1 = 3, x 2 = 1. We will look at systems of language equations such as

X1 = ε ∪ a · X1 ∪ b · X2
X2 = ∅ ∪ b · X1 ∪ ∅ · X2

which has the (unique) solution X1 = (a ∪ bb)∗, X2 = b · (a ∪ bb)∗. Checking that this is a solution entails verifying that both equations are satisfied if these expressions are substituted for the variables X1 and X2.
bb
verifying that both equations are satisfied if these expressions are substituted for the variables X 1 and
X2.
The solution of such systems parallels the solution of algebraic equations. For example, the system

3x 1 + x 2 = 10
x1 − x2 = 2

can be solved by treating the second statement as an equation in just the variable x2 and solving as indicated by the algebraic theorem “For any numbers a and b, where a ≠ 0, the equation ax = b has a unique solution given by x = b ÷ a.” The second statement can be written as (−1)x2 = 2 − x1, which then
admits the solution x 2 = (2 − x 1 ) ÷ (−1) or x 2 = x 1 − 2. This solution can be inserted into the first equation
to eliminate x 2 and form an equation solely in x 1 . Terms can be regrouped and the algebraic theorem
can be applied to find x 1 . We would have
3x 1 + x 2 = 10

which becomes
3x 1 + x 1 − 2 = 10

or
4x 1 − 2 = 10

or
4x 1 = 12

or
x 1 = 12 ÷ 4
yielding
x1 = 3
This value of x 1 can be back-substituted to find the unique solution for x 2 : x 2 = x 1 − 2 = 3 − 2 = 1.
Essentially, the same technique can be applied to any two equations in two unknowns, and formulas
can be developed that predict the coefficients for the reduced set of equations. Consider the generalized
system of algebraic equations with unknowns x 1 and x 2 , constant terms E 1 and E 2 , and coefficients
A 11 , A 12 , A 21 , and A 22 :

A 11 x 1 + A 12 x 2 = E 1
A 21 x 1 + A 22 x 2 = E 2

Recall that this system can be reduced to a single equation of the form Â11 x1 = Ê1, where the new coefficients Â11 and Ê1 can be calculated as

Ê1 = E1 A22 − E2 A12
Â11 = A11 A22 − A12 A21

A similar technique can be used to eliminate variables when there is a larger number of equations in the
system. The following theorem makes similar predictions of the new coefficients for language equations.

Theorem 6.2 Let n ≥ 2 and consider the system of equations in the unknowns X1, X2, . . . , Xn given by

X1 = E1 ∪ A11 X1 ∪ A12 X2 ∪ · · · ∪ A1(n−1) Xn−1 ∪ A1n Xn
X2 = E2 ∪ A21 X1 ∪ A22 X2 ∪ · · · ∪ A2(n−1) Xn−1 ∪ A2n Xn
·
·
·
Xn−1 = En−1 ∪ A(n−1)1 X1 ∪ A(n−1)2 X2 ∪ · · · ∪ A(n−1)(n−1) Xn−1 ∪ A(n−1)n Xn
Xn = En ∪ An1 X1 ∪ An2 X2 ∪ · · · ∪ An(n−1) Xn−1 ∪ Ann Xn

in which (∀i, j ∈ {1, 2, · · · , n})(λ ∉ Aij).

a. This system has a unique solution.

b. Define Êi = Ei ∪ (Ain · Ann∗ · En) for all i = 1, 2, · · · , n − 1

and

Âij = Aij ∪ (Ain · Ann∗ · Anj) for all i, j = 1, 2, · · · , n − 1.

The solution to the original set of equations will agree with the solution to the following set of n − 1 equations in the unknowns X1, X2, · · · , Xn−1:

X1 = Ê1 ∪ Â11 X1 ∪ Â12 X2 ∪ · · · ∪ Â1(n−1) Xn−1
X2 = Ê2 ∪ Â21 X1 ∪ Â22 X2 ∪ · · · ∪ Â2(n−1) Xn−1
·
·
·
Xn−1 = Ên−1 ∪ Â(n−1)1 X1 ∪ Â(n−1)2 X2 ∪ · · · ∪ Â(n−1)(n−1) Xn−1

c. Once the solution to the above n − 1 equations in (b) is known, that solution can be used to find the remaining unknown:

Xn = Ann∗ · (En ∪ An1 X1 ∪ An2 X2 ∪ · · · ∪ An(n−1) Xn−1)

Proof. The proof hinges on the repeated application of Theorem 6.1. The last of the n equations, Xn = En ∪ An1 X1 ∪ An2 X2 ∪ · · · ∪ An(n−1) Xn−1 ∪ Ann Xn, can be thought of as an equation in the one unknown Xn with a coefficient of Ann for Xn, and the remainder of the expression a “constant” term not involving Xn. The following parenthetical grouping illustrates this viewpoint:

Xn = (En ∪ An1 X1 ∪ An2 X2 ∪ · · · ∪ An(n−1) Xn−1) ∪ Ann Xn

Note that for any subscript k, if Ank does not contain λ, neither will Ank Xk. Theorem 6.1 can therefore be applied to the one equation in the one unknown Xn, with coefficients

E = (En ∪ An1 X1 ∪ An2 X2 ∪ · · · ∪ An(n−1) Xn−1)

and A = Ann. The solution, A∗E, is exactly as given by part (c) above:

Xn = Ann∗ · (En ∪ An1 X1 ∪ An2 X2 ∪ · · · ∪ An(n−1) Xn−1)

or

Xn = Ann∗ · En ∪ Ann∗ · An1 X1 ∪ Ann∗ · An2 X2 ∪ · · · ∪ Ann∗ · An(n−1) Xn−1

If there was a unique solution for the terms X1 through Xn−1, then Theorem 6.1 would guarantee a unique solution for Xn, too.
The solution for Xn can be substituted for Xn in each of the other n − 1 equations. If the kth equation is represented by

Xk = Ek ∪ Ak1 X1 ∪ Ak2 X2 ∪ · · · ∪ Akn Xn

then the substitution will yield

Xk = Ek ∪ Ak1 X1 ∪ Ak2 X2 ∪ · · · ∪ (Akn · (Ann∗ · En ∪ Ann∗ · An1 X1 ∪ Ann∗ · An2 X2 ∪ · · · ∪ Ann∗ · An(n−1) Xn−1))

By using the distributive law, this becomes

Xk = Ek ∪ Ak1 X1 ∪ Ak2 X2 ∪ · · · ∪ (Akn · Ann∗ · En ∪ Akn · Ann∗ · An1 X1 ∪ Akn · Ann∗ · An2 X2 ∪ · · · ∪ Akn · Ann∗ · An(n−1) Xn−1)

Collecting like terms yields

Xk = (Ek ∪ Akn · Ann∗ · En) ∪ (Ak1 X1 ∪ Akn · Ann∗ · An1 X1) ∪ (Ak2 X2 ∪ Akn · Ann∗ · An2 X2) ∪ · · · ∪ (Ak(n−1) Xn−1 ∪ Akn · Ann∗ · An(n−1) Xn−1)

or

Xk = (Ek ∪ Akn · Ann∗ · En) ∪ (Ak1 ∪ Akn · Ann∗ · An1)X1 ∪ (Ak2 ∪ Akn · Ann∗ · An2)X2 ∪ · · · ∪ (Ak(n−1) ∪ Akn · Ann∗ · An(n−1))Xn−1

The constant term in this equation is (Ek ∪ Akn · Ann∗ · En), which is exactly the formula given for Êk in part (b). The coefficient for X1 is seen to be (Ak1 ∪ Akn · Ann∗ · An1), while the coefficient for X2 is (Ak2 ∪ Akn · Ann∗ · An2), and so on. The coefficient for Xj would then be Âkj = Akj ∪ (Akn · Ann∗ · Anj), which also agrees with the formula given in part (b). This is why the solution of the original set of equations agrees with the solution of the set of n − 1 equations given in part (b).
Part (a) is proved by induction on n: the method outlined above can be repeated on the new set of n − 1
equations to eliminate X n−1 and so on, until one equation in the one unknown X 1 is obtained. Theorem
6.1 will guarantee a unique solution for X 1 , and part (c) can then be used to find the unique solution for
X 2 , and so on.
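The formulas of part (b) translate directly into a program that manipulates regular expressions symbolically. In the sketch below (our own illustration; expressions are kept as plain strings, '+' is used for union as some texts do, and '@' and '%' are arbitrary stand-ins for ∅ and ε), the helpers apply simplifications from Lemma 6.1 so that ∅ and ε coefficients disappear from the output:

EMPTY, EPS = '@', '%'          # stand-ins for the constants ∅ and ε

def union(r, s):
    if r == EMPTY: return s                 # Lemma 6.1(a)
    if s == EMPTY: return r
    return '(' + r + '+' + s + ')'

def cat(r, s):
    if EMPTY in (r, s): return EMPTY        # Lemma 6.1(c)
    if r == EPS: return s                   # Lemma 6.1(b)
    if s == EPS: return r
    return r + s

def star(r):
    return EPS if r in (EMPTY, EPS) else '(' + r + ')*'   # Lemma 6.1(i), (j)

def eliminate(E, A):
    # One application of Theorem 6.2(b): remove the last unknown X_n from
    # the system X_i = E_i + A_i1 X_1 + ... + A_in X_n.
    n = len(E)
    loop = star(A[n-1][n-1])                # A_nn*
    E_hat = [union(E[i], cat(A[i][n-1], cat(loop, E[n-1])))
             for i in range(n-1)]
    A_hat = [[union(A[i][j], cat(A[i][n-1], cat(loop, A[n-1][j])))
              for j in range(n-1)] for i in range(n-1)]
    return E_hat, A_hat

# The system solved in Example 6.8 below: X1 = eps + a X1 + b X2, X2 = b X1.
E_hat, A_hat = eliminate([EPS, EMPTY], [['a', 'b'], ['b', EMPTY]])
print(cat(star(A_hat[0][0]), E_hat[0]))     # ((a+bb))*  i.e. (a ∪ bb)*·ε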

Example 6.8

Consider the system defined before, where

X1 = ε ∪ a · X1 ∪ b · X2
X2 = ∅ ∪ b · X1 ∪ ∅ · X2

The proof of Theorem 6.2 implies the solution for X1 will agree with the solution to the one-variable equation X1 = Ê1 ∪ Â11 X1, where

Ê1 = E1 ∪ (A12 · A22∗ · E2) = ε ∪ (b · ∅∗ · ∅) = ε ∪ (b · ε · ∅) = ε ∪ ∅ = ε

and

Â11 = A11 ∪ (A12 · A22∗ · A21) = a ∪ (b · ∅∗ · b) = a ∪ (b · ε · b) = a ∪ bb.

Thus we have X1 = ε ∪ (a ∪ bb) · X1, which by Theorem 6.1 has the (unique) solution X1 = Â11∗ Ê1 = (a ∪ bb)∗ · ε. Substituting this into the second equation yields X2 = ∅ ∪ b · (a ∪ bb)∗ ∪ ∅ · X2, which by Theorem 6.1 has the (unique) solution X2 = ∅∗ · (b · (a ∪ bb)∗) = b · (a ∪ bb)∗. Note that this expression for X2 could also be found by applying the back-substitution formula given in the proof of Theorem 6.2.
We will now see that the language accepted by a DFA can be equated with the solution of a set of
language equations, which will allow us to prove the following important theorem.

Theorem 6.3 Let Σ be an alphabet and let L be an FAD language over Σ. Then L is a regular set over Σ.
Proof. If L is FAD, then there exists an n > 0 and a deterministic finite automaton A = 〈Σ, {s1, s2, . . . , sn}, s1, δ, F〉 such that L(A) = L. For each i = 1, 2, . . . , n, define Xi = {z ∈ Σ∗ | δ(si, z) ∈ F}; that is, Xi is the set of all strings that, when starting at state si, reach a final state in A. Each Xi then represents the terminal set T(A, si) described in Chapter 3. Since s1 is the start state of this machine, it should be clear that X1 = L(A) = L. Define

Ei = ∅ if si ∉ F
Ei = ε if si ∈ F          for i = 1, 2, . . . , n

and

Aij = {a | a ∈ Σ ∧ δ(si, a) = sj}          for i, j = 1, 2, . . . , n

That is, Aij represents the set of all letters that cause a transition from state si to state sj. Notice that since λ ∉ Σ, none of the sets Aij contains the empty string, and therefore by Theorem 6.2 there is a unique solution to the system:
Figure 6.5: The DFA discussed in Example 6.9

X1 = E1 ∪ A11 X1 ∪ A12 X2 ∪ · · · ∪ A1n Xn
X2 = E2 ∪ A21 X1 ∪ A22 X2 ∪ · · · ∪ A2n Xn
·
·
·
Xn = En ∪ An1 X1 ∪ An2 X2 ∪ · · · ∪ Ann Xn

However, these equations exactly describe the relationships between the terminal sets denoted by X1, X2, · · · , Xn at the beginning of this proof (compare with Example 6.11), and hence the solution will represent exactly those quantities. In particular, the solution for X1 will be a regular expression for L(A), that is, for L.
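Assembling these coefficients from a transition table is purely mechanical, and the result feeds directly into the symbolic eliminate() sketched after Theorem 6.2 (again our own illustration, using the same hypothetical conventions; states are numbered 1 through n, with state 1 the start state):

def dfa_to_equations(n, Sigma, delta, final):
    # E_i is epsilon or the empty set according to whether s_i is final;
    # A_ij collects the letters that move the machine from s_i to s_j.
    E = [EPS if i in final else EMPTY for i in range(1, n + 1)]
    A = [[EMPTY] * n for _ in range(n)]
    for i in range(1, n + 1):
        for a in Sigma:
            j = delta[(i, a)]
            A[i-1][j-1] = union(A[i-1][j-1], a)
    return E, A

# The two-state DFA of Example 6.9 below, which accepts an odd number of b's:
E, A = dfa_to_equations(2, 'ab',
                        {(1, 'a'): 1, (1, 'b'): 2,
                         (2, 'a'): 2, (2, 'b'): 1}, {2})
E_hat, A_hat = eliminate(E, A)
print(cat(star(A_hat[0][0]), E_hat[0]))     # ((a+b(a)*b))*b(a)*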

Example 6.9
Consider the DFA B given by the diagram in Figure 6.5, which accepts all strings over {a, b} with an odd number of b's. This machine generates the following system of language equations:

X1 = ∅ ∪ a X1 ∪ b X2
X2 = ε ∪ b X1 ∪ a X2

which will have the same solution for X1 as the equation

X1 = Ê1 ∪ Â11 X1

where

Ê1 = E1 ∪ (A12 · A22∗ · E2) = ∅ ∪ (b · a∗ · ε) = b · a∗

and

Â11 = A11 ∪ (A12 · A22∗ · A21) = a ∪ (b · a∗ · b)

Theorem 6.1 predicts the solution for X1 to be (a ∪ (b · a∗ · b))∗ · b · a∗. It can be verified that this solution describes all those strings with an odd number of b's. X1 is indeed the terminal set for t1, that is, T(B, t1). Likewise, finding X2 yields all strings with an even number of b's, which is the terminal set for t2, T(B, t2).
Nondeterministic finite automata can likewise be represented by language equations, and without the intermediate step of applying Definition 4.5 to acquire a deterministic equivalent. The sets Ei and Aij retain essentially the same definitions as before: Ei is ε or ∅, depending on whether or not si is a final state, and Aij again represents exactly the set of all letters that cause a transition from state si to state sj. This definition requires a minor cosmetic change for NDFAs, since the state transition function is slightly different:

Aij = {a | a ∈ Σ ∧ sj ∈ δ(si, a)}          for i, j = 1, 2, . . . , n
Figure 6.6: (a) The NDFA B discussed in Example 6.10 (b) The NDFA C discussed in Example 6.10 (c) The
NDFA D discussed in Example 6.10

An n-state NDFA therefore gives rise to n equations in n unknowns, which can be solved as outlined by Theorems 6.2 and 6.1. While Definition 4.5 need not be used as a conversion step, an NDFA with λ-moves will have to be transformed into an equivalent NDFA without λ-moves. An appropriate definition for Aij could be given for the original NDFA, and while the resulting equations would describe the relation between the terminal sets, some Aij set might then contain λ as a member. There are systems of equations arising in this manner that do not have unique solutions (see the exercises). For an NDFA with λ-moves, Definition 4.9 could be applied to find an equivalent NDFA without λ-moves, since Theorems 6.2 and 6.1 specifically prohibit the empty string as a part of a coefficient. However, if the ambiguous equations generated from a machine with λ-moves were solved as suggested in Theorems 6.1 and 6.2, a “minimal” solution would be obtained that would correspond to the desired answer.
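In the symbolic sketch developed earlier, accommodating an NDFA without λ-moves takes only one change (a hypothetical adaptation reusing the same helpers): the transition function now yields a set of successor states, so one letter may contribute to several coefficients in the same row.

def ndfa_to_equations(n, Sigma, delta, final):
    # delta maps (state, symbol) to a set of successor states.
    E = [EPS if i in final else EMPTY for i in range(1, n + 1)]
    A = [[EMPTY] * n for _ in range(n)]
    for i in range(1, n + 1):
        for a in Sigma:
            for j in delta.get((i, a), set()):
                A[i-1][j-1] = union(A[i-1][j-1], a)
    return E, A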

Example 6.10
Consider again the system described in Example 6.8. This can be thought of as the set of language equations corresponding to the NDFA called B, illustrated in Figure 6.6a. Note that L(B) is indeed the given solution: L(B) = X1 = (a ∪ bb)∗. Notice the similarity between B and the machine C shown in Figure 6.6b, which has s2 as the start state. Note that L(C) is given by X2 = b · (a ∪ bb)∗, where X2 was the other part of the solution given in Example 6.8 (verify this). Finally, consider a similar machine D in Figure 6.6c with both s1 and s2 as start states. Can you quickly write a regular expression that describes the language accepted by D?

Example 6.11
Regular expressions for machines with more than two states can be found by repeated application of
the technique described in Theorem 6.2. For example, consider the three-state DFA given in Figure 6.7.

Figure 6.7: The DFA discussed in Example 6.11

The solution for this three-state machine will be explored shortly. We begin by illustrating the natural
relationships between the terminal sets described in Theorem 6.3. First let us note that the language
accepted by this machine includes:

1. All strings that end with b .

2. Strings that contain no b's, but for which |x|a − |x|c is a multiple of 3.

3. Strings that are concatenations of type (1) and type (2) strings.

According to Theorem 6.3, the equations for this machine are

X1 = ε ∪ b X1 ∪ a X2 ∪ c X3
X2 = ∅ ∪ (b ∪ c)X1 ∪ ∅X2 ∪ a X3
X3 = ∅ ∪ (a ∪ b)X1 ∪ c X2 ∪ ∅X3

which can be simplified to

X1 = ε ∪ b X1 ∪ a X2 ∪ c X3
X2 = (b ∪ c)X1 ∪ a X3
X3 = (a ∪ b)X1 ∪ c X2

and rewritten as

X1 = ε ∪ b X1 ∪ a X2 ∪ c X3
X2 = b X1 ∪ c X1 ∪ a X3
X3 = a X1 ∪ b X1 ∪ c X2

The equation for X1 admits the following interpretation; recalling that X1 represents all the strings that reach a final state when starting from s1, we see that these can be broken up into four distinct classes:

1. Strings of length 0: (ε).

2. Strings that start with a (and note that a moves the current state from s1 to s2) and then proceed (from s2) to a final state: (a X2).

3. Strings that start with b and then proceed to a final state: (b X1).

4. Strings that start with c and then proceed to a final state: (c X3).

The union of these four classes should equal X1, which is exactly what the first equation states.
X2 = b X1 ∪ c X1 ∪ a X3 can be interpreted similarly; ε does not appear in this equation because there is no way to reach a final state from s2 if no letters are processed. If at least one letter is processed, then that first letter is an a, b, or c. If it is a, then we move from state s2 to s3, and the remainder of the string must take us to a final state from s3 (that is, the remainder must belong to X3). Strings that begin with an a and are followed by a string from X3 can easily be described by a · X3. Similarly, strings that start with b or c must move from s2 to s1, and then be followed by a string from X1. These strings are described by b · X1 and c · X1. The three cases for reaching a final state from s2 that have just been described are exhaustive (and mutually exclusive), and so their union should equal all of X2. This is exactly the relation expressed by the second equation, X2 = b X1 ∪ c X1 ∪ a X3. The last equation admits a similar interpretation.
None of the above observations are necessary to actually solve the system! The preceding discussion is intended to illustrate that the natural relationships between the terminal sets described by Theorem 6.3 and the correspondences we have so laboriously developed here are succinctly predicted by the language equations. Once the equations are written down, we can simply apply Theorem 6.2 and reduce to a system with only two unknowns. We have

E1 = ε,  E2 = ∅,  E3 = ∅
A11 = b,  A12 = a,  A13 = c
A21 = b ∪ c,  A22 = ∅,  A23 = a
A31 = a ∪ b,  A32 = c,  A33 = ∅

from which we can compute

Ê1 = E1 ∪ A13 A33∗ E3 = ε ∪ c ∅∗ ∅ = ε,  Ê2 = ∅
Â11 = A11 ∪ A13 A33∗ A31 = b ∪ c ∅∗ (a ∪ b) = b ∪ c(a ∪ b),  Â12 = a ∪ c ∅∗ c = a ∪ cc
Â21 = (b ∪ c) ∪ a ∅∗ (a ∪ b) = b ∪ c ∪ a(a ∪ b),  Â22 = ∅ ∪ a ∅∗ c = ac

which gives the following system of equations:

X1 = ε ∪ (b ∪ c(a ∪ b))X1 ∪ (a ∪ cc)X2
X2 = ∅ ∪ (b ∪ c ∪ a(a ∪ b))X1 ∪ ac X2
These two equations can be reduced to a single equation by applying Theorem 6.2 again:

Ê̂1 = Ê1 ∪ Â12(Â22)∗Ê2 = ε ∪ (a ∪ cc) · (ac)∗ · ∅ = ε

Â̂11 = Â11 ∪ Â12(Â22)∗Â21 = b ∪ c(a ∪ b) ∪ (a ∪ cc)(ac)∗(b ∪ c ∪ a(a ∪ b))

which yields one equation in one unknown whose solution is

X1 = (Â̂11)∗Ê̂1 = (b ∪ c(a ∪ b) ∪ (a ∪ cc)(ac)∗(b ∪ c ∪ a(a ∪ b)))∗ · ε

Since s1 was the only start state, the regular expression given by X1 should describe the language accepted by the original three-state automaton.
Returning to our observations above, this expression can be reconciled with our intuitive notion of what the solution “should” look like. Â̂11 can be expanded to yield the following form:
Figure 6.8: The NDFA discussed in Example 6.12

Â̂11 = b ∪ ca ∪ cb ∪ a(ac)∗b ∪ a(ac)∗c ∪ a(ac)∗ab ∪ a(ac)∗aa
      ∪ cc(ac)∗b ∪ cc(ac)∗c ∪ cc(ac)∗ab ∪ cc(ac)∗aa
Observe that each of the 11 subexpressions consists of strings that (1) end with b, or (2) contain no b's, but for which |x|a − |x|c is a multiple of 3. Hence the Kleene closure of this expression, which represents the language accepted by this machine, does indeed agree with our notion of what X1 should describe.
Since s1 is also the only final state in this example, it is interesting to note that each of the subexpressions of Â̂11 describes strings that, when starting at s1 in the automaton, return you to s1 again for the first time (examine the diagram and verify this).

Example 6.12
Consider the automaton shown in Figure 6.8. It is similar to the one in Example 6.11, but it now gives rise to four equations in four unknowns. As these equations are solved, the final coefficient Â̂11 for X1 will again describe strings that, when starting at s1 in the automaton, return you to s1 again for the first time; it will agree with Â̂11 in Example 6.11. The final constant term associated with X1 (that is, Ê̂1) will represent all those strings that deposit you in a final state from s1 without ever returning to s1. In this automaton, this will be given by Ê̂1 = d e∗. (Â̂11)∗Ê̂1 therefore represents strings that go from s1 back to s1 any number of times, followed by a string that leaves s1 (for the last time) for a final state.
In general, the final coefficient and constant terms can always be interpreted in this manner. In Example 6.11, the only way to reach a final state from s1 and avoid having to return again to s1 was to not leave in the first place; this was reflected by the fact that Ê̂1 = ε.

Example 6.13
Consider the automaton illustrated in Figure 6.9, which is identical to the DFA in Example 6.11 except for the placement of the final state. Even though the initial system of three equations is now different, we can expect Â̂11 to compute to the same expression as before. Since Ê̂1 is supposed to represent all those strings that deposit you in a final state from s1 without ever returning to s1, one should be able to predict that the new final constant term will look like Ê̂1 = a(ac)∗ ∪ c(ca)∗c. An expression for the language recognized by this automaton would then be given by
Figure 6.9: The DFA discussed in Example 6.13

X 1 = (Â̂ 11 )∗ Ê̂ 1
    = (b ∪ c (a ∪ b ) ∪ (a ∪ cc )(ac )∗ ((b ∪ c ) ∪ a (a ∪ b )))∗ · (a (ac )∗ ∪ c (ca )∗ c )

It may often be convenient to eliminate a variable other than the one that is numerically last. This
can be accomplished by appropriately renumbering the unknowns and applying Theorem 6.2 to the new
set of equations. For convenience, we state an analog of Theorem 6.2 that allows the elimination of the
m th unknown from a set of n equations in n unknowns. The following lemma agrees with Theorem 6.2 if
m = n.

Lemma 6.3 Let n and m be positive integers and let m ≤ n. Consider the system of n ≥ 2 equations in the
unknowns X 1 , X 2 , . . . , X n , given by

X k = E k ∪ A k1 X 1 ∪ A k2 X 2 ∪ · · · ∪ A kn X n , for k = 1, 2, . . . , n

in which (∀i , j )(λ ∉ A i j ).


The unknown X m can be eliminated from this system to form the following n − 1 equations in the
unknowns X 1 , X 2 , . . . , X m−1 , X m+1 , . . . , X n .

X k = Ê k ∪ Â k1 X 1 ∪ Â k2 X 2 ∪ . . . ∪ Â k(m−1) X m−1 ∪ Â k(m+1) X m+1 ∪ . . . ∪ Â kn X n ,


for k = 1, 2, . . . , m − 1, m + 1, . . . , n

where
Ê i = E i ∪ (A i m · A ∗mm · E m ), for all i = 1, 2, . . . , m − 1, m + 1, . . . , n

and
 i j = A i j ∪ (A i m · A ∗mm · A m j ), for all i , j = 1, 2, . . . , m − 1, m + 1, . . . , n

Furthermore, once the solution to the above n − 1 equations is known, that solution can be used to find the
remaining unknown:

X m = A ∗mm · (E m ∪ A m1 X 1 ∪ A m2 X 2 ∪ . . . ∪ A m(m−1) X m−1


∪ A m(m+1) X m+1 ∪ . . . ∪ A mn X n )

Proof. The proof follows from a renumbering of the equations given in Theorem 6.2.
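The bookkeeping in Lemma 6.3 is mechanical enough to be worth sketching as code. The following is our illustration (not the book's algorithm): coefficients are regular expressions held as strings, with None standing for ∅ and 'ε' for ε, and only the simplifications ∅ ∪ R = R, ∅ · R = ∅, and ∅∗ = ε are applied, so the output mirrors a hand computation.

# A sketch (not from the text) of the elimination step in Lemma 6.3.
# A system X_k = E_k ∪ A_k1 X_1 ∪ ... ∪ A_kn X_n is held as two dicts:
# E maps k -> E_k and A maps (k, j) -> A_kj.

def union(r, s):
    if r is None: return s                  # ∅ ∪ R = R
    if s is None: return r
    return f"({r} ∪ {s})"

def concat(r, s):
    if r is None or s is None: return None  # ∅ · R = R · ∅ = ∅
    if r == 'ε': return s                   # ε · R = R
    if s == 'ε': return r
    return r + s

def star(r):
    return 'ε' if r is None else f"({r})*"  # ∅* = ε

def eliminate(E, A, m):
    """Remove unknown m, returning the reduced system (Ê, Â) where
    Ê_i = E_i ∪ A_im·A_mm*·E_m and Â_ij = A_ij ∪ A_im·A_mm*·A_mj."""
    loop = star(A[m, m])
    keep = [k for k in E if k != m]
    E_hat = {i: union(E[i], concat(concat(A[i, m], loop), E[m]))
             for i in keep}
    A_hat = {(i, j): union(A[i, j], concat(concat(A[i, m], loop), A[m, j]))
             for i in keep for j in keep}
    return E_hat, A_hat

Applying eliminate twice to the system of Example 6.11 (first with m = 3, then m = 2) reproduces the hatted and double-hatted coefficients computed there.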

Figure 6.10: The DFA discussed in Exercise 6.19

A significant reduction in the size of the expressions representing the solutions can often be achieved
by carefully choosing the order in which to eliminate the unknowns. This situation can easily arise when
solving language equations that correspond to finite automata. For example, consider the DFA illustrated
in Figure 6.10. The equations for this machine are given by

X 1 = ∅ ∪ ∅X 1 ∪ (0 ∪ 1 )X 2 ∪ ∅X 3
X 2 = ε ∪ 0 X 1 ∪ 1 X 2 ∪ ∅X 3
X 3 = ∅ ∪ ∅X 1 ∪ (0 ∪ 1 )X 2 ∪ ∅X 3

Using Theorem 6.2 to methodically solve for X 1 , X 2 , and X 3 involves eliminating X 3 and then eliminating
X 2 . Theorem 6.1 can then be used to solve for X 1 , and then the back-substitution rules can be employed
to find X 2 and X 3 . The regular expressions found in this manner are quite complex. A striking simplification can be made by eliminating X 3 and then eliminating X 1 (instead of X 2 ). The solution for X 2 is
quite concise, which leads to simple expressions for X 1 and X 3 during the back-substitution phase (see
Exercise 6.19).
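Using the eliminate sketch given after Lemma 6.3 (again ours, purely illustrative), the two elimination orders for this machine can be compared directly:

# Equations for the DFA of Figure 6.10 (∅ as None, ε as 'ε').
E = {1: None, 2: 'ε', 3: None}
A = {(1, 1): None, (1, 2): '(0 ∪ 1)', (1, 3): None,
     (2, 1): '0',  (2, 2): '1',       (2, 3): None,
     (3, 1): None, (3, 2): '(0 ∪ 1)', (3, 3): None}

# Order 1: eliminate X 3, then X 2, and solve for X 1 by Theorem 6.1.
E1, A1 = eliminate(E, A, 3)
E1, A1 = eliminate(E1, A1, 2)
x1 = concat(star(A1[1, 1]), E1[1])   # ((0 ∪ 1)(1)*0)*(0 ∪ 1)(1)*

# Order 2: eliminate X 3, then X 1, and solve for X 2 instead.
E2, A2 = eliminate(E, A, 3)
E2, A2 = eliminate(E2, A2, 1)
x2 = concat(star(A2[2, 2]), E2[2])   # ((1 ∪ 0(0 ∪ 1)))*

print(x1)
print(x2)   # solving for X 2 first is strikingly more concise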
Let A = 〈Σ, {s 1 , s 2 , . . . , s n }, s 1 , δ, F 〉 be a deterministic finite automaton. We have seen that the re-
lationships between the terminal sets T (A, s i ) described in Chapter 3 give rise to a system of equa-
tions. Similarly, the initial sets I (A, s i ) defined in Chapter 2 are also interrelated. Recall that, for a state s i , I (A, s i ) consists of the strings that, starting in the start state, lead to the state s i . That
is, I (A, s i ) = {x | δ(s 1 , x) = s i }. The equations we have discussed to this point have been right linear; that
is, the unknowns X i appear to the right of their coefficients. The initial sets for an automaton are also
related by a system of equations, but these equations are left linear; the unknowns Yi appear to the left
of their coefficients. The solution for sets of left-linear equations parallels that of right-linear systems.

Theorem 6.4 Let n and m be positive integers and let m ≤ n. Consider the system of n ≥ 2 equations in
the unknowns Y1 , Y2 , . . . , Yn given by

Yk = I k ∪ Y1 B k1 ∪ Y2 B k2 ∪ · · · ∪ Yn B kn , for k = 1, 2, . . . , n

in which (∀i , j )(λ ∉ B i j ).

a. The unknown Ym can be eliminated from this system to form the following n − 1 equations in the unknowns Y1 , Y2 , . . . , Ym−1 , Ym+1 , . . . , Yn .

Yk = Î k ∪ Y1 B̂ k1 ∪ Y2 B̂ k2 ∪ · · · ∪ Ym−1 B̂ k(m−1) ∪ Ym+1 B̂ k(m+1) ∪ · · · ∪ Yn B̂ kn ,

for k = 1, 2, . . . , m − 1, m + 1, . . . , n
where

Î i = I i ∪ (I m · B mm∗ · B i m ), for all i = 1, 2, . . . , m − 1, m + 1, . . . , n
and

B̂ i j = B i j ∪ (B m j · B mm∗ · B i m ), for all i , j = 1, 2, . . . , m − 1, m + 1, . . . , n

b. Once the solution to the above n −1 equations is known, that solution can be used to find the remaining
unknown:

Ym = (I m ∪ Y1 B m1 ∪ Y2 B m2 ∪ · · · ∪ Ym−1 B m(m−1)

∪ Ym+1 B m(m+1) ∪ · · · ∪ Yn B mn ) · B mm∗


c. A single equation Y1 = I 1 ∪ Y1 B 11 has the unique solution Y1 = I 1 · B 11∗ .

Proof. The proof is essentially a mirror image of the proofs given in Theorems 6.1 and 6.2.

Lemma 6.4 Let A = 〈Σ, {s 1 , s 2 , . . . , s n }, S 0 , δ, F 〉 be an NDFA. For each i = 1, 2, . . . , n, let the initial set I (A, s i ) = {x | s i ∈ δ(S 0 , x)} be denoted by Yi . The unknowns Y1 , Y2 , . . . , Yn satisfy a system of n left-linear equations of
the form
Yk = I k ∪ Y1 B k1 ∪ Y2 B k2 ∪ · · · ∪ Yn B kn , for k = 1, 2, . . . , n
where the coefficients are given by

I i = ∅ if s i ∉ S 0 , and I i = ε if s i ∈ S 0 ,   for i = 1, 2, . . . , n

and

B i j = {a ∈ Σ | s i ∈ δ(s j , a )},   for i , j = 1, 2, . . . , n

Proof. See the exercises.

In contrast to Theorem 6.3, where A i j represented the set of all letters that cause a transition from state s i to state s j , B i j represents the set of all letters that cause a transition from state s j to state s i .
That is, B i j = A j i . In the definition in Theorem 6.3, E i represented the set of all strings of length zero that
can reach final states from s i . Compare this with the definition of I i above, which represents the set of
all strings of length zero that can reach s i from a start state.
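As a small illustration of Lemma 6.4 (our sketch, using a made-up two-state NDFA), the coefficients can be read directly off the transition relation; note that B is the transpose of the right-linear coefficient matrix A:

# Sketch: build the left-linear coefficients of Lemma 6.4 from an NDFA
# given as delta[(state, letter)] -> set of successor states.
SIGMA = 'ab'
STATES = [1, 2]
START = {1}                       # hypothetical two-state NDFA
delta = {(1, 'a'): {1, 2}, (1, 'b'): set(),
         (2, 'a'): set(),  (2, 'b'): {1}}

# I_i = ε iff s_i is a start state, else ∅ (∅ as None, ε as 'ε').
I = {i: ('ε' if i in START else None) for i in STATES}

# B_ij = set of letters moving s_j to s_i (the transpose of A_ij).
B = {(i, j): {a for a in SIGMA if i in delta[(j, a)]}
     for i in STATES for j in STATES}

print(I)   # {1: 'ε', 2: None}
print(B)   # {(1, 1): {'a'}, (1, 2): {'b'}, (2, 1): {'a'}, (2, 2): set()}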

6.4 FAD Languages As Regular Sets; Closure Properties


The technique outlined by Theorems 6.1, 6.2, and 6.3 provides the second half of the correspondence be-
tween regular sets and FAD languages. As a consequence, regular expressions and automata characterize
exactly the same class of languages.

Corollary 6.2 Let Σ be an alphabet. Then DΣ ⊆ RΣ .


Proof. The proof follows immediately from Theorem 6.3.

Theorem 6.5 Kleene’s Theorem. Let Σ be an alphabet. Then RΣ = DΣ .
Proof. The proof follows immediately from Corollaries 6.1 and 6.2.

Thus the terms FAD language and regular set can be used interchangeably, since languages accepted
by finite automata can be described by regular expressions, and vice versa. Such languages are often
referred to as regular languages. The correspondence will allow, for example, the pumping lemma to be
invoked to justify that certain languages cannot be represented by any regular expression.
RΣ is therefore closed under every operator for which DΣ is closed. We have now seen two represen-
tations for FAD languages, and a third will be presented in Chapter 8. Since there are effective algorithms
for switching from one representation to another, we may use whichever vehicle is most convenient to
describe a language or prove properties about regular languages. For example, we may use whichever
concept best lends itself to the proof of closure properties. The justification that RΣ is closed under
union follows immediately from Definition 6.1; much more effort was required in Chapter 5 to prove
that the union of two languages represented by DFAs could be represented by another DFA. On the other
hand, attempting to justify closure under complementation by using regular expressions is an exercise
in frustration. We will now see that closure under substitution is conveniently proved via regular expres-
sions.
A substitution is similar to a language homomorphism (Definition 5.8), in which letters were replaced by single words. Substitutions will denote the methodical replacement of the individual letters within a regular expression with sets of words. The only restriction on these sets is that they must themselves be describable by regular expressions, though not necessarily over the same alphabet.

Definition 6.5 Let Σ = {a 1 , a 2 , . . . , a m } be an alphabet and let Γ be a second alphabet. Given regular expressions R 1 , R 2 , . . . , R m over Γ, define a regular set substitution s: Σ → ℘(Γ∗ ) by s(a i ) = R i for each i = 1, 2, . . . , m, which can be extended to s: Σ∗ → ℘(Γ∗ ) by

s(λ) = ε

and
(∀a ∈ Σ)(∀x ∈ Σ∗ )(s(a · x) = s(a ) · s(x))

s can be further extended to operate on a language L ⊆ Σ∗ by defining

s(L) = ⋃ {s(z) | z ∈ L}

In this last context, s: ℘(Σ∗ ) → ℘(Γ∗ ).

Example 6.14
Let Σ = {0 } and Γ = {a , b }. Define s(0 ) = (a ∪ b ) · (a ∪ b ). From the recursive definition, s(00 ) = (a ∪ b ) · (a ∪ b ) · (a ∪ b ) · (a ∪ b ). Furthermore, the language s(0 ∗ ) represents all even-length strings over {a , b }.
The definition of s(L) for a language L allows the domain of the substitution to be extended all the
way to s: ℘(Σ∗ ) → ℘(Γ∗ ). It can be proven that the image of RΣ under s is contained in RΓ (see the
exercises); however, the image of NΣ under s is not completely contained in NΓ .
In Example 6.14, the language 0 ∗ was regular and so was its image under s. It is possible to start with a
nonregular set and define a substitution that produces a regular set (see Lemma 6.5), but it is impossible
for the image of a regular set to avoid being regular, as shown by the next theorem.

Theorem 6.6 Let Σ be an alphabet, and let s : Σ → ℘(Σ∗ ) be a substitution. Then RΣ is closed under s.
Proof. Choose an arbitrary regular expression R over Σ. We must show that s(R) represents a regular set. R is an expression made up of the letters in Σ and the characters (, ), ∪, ·, and ∗ . Form the new expression R′ by replacing each letter a by s(a ). R′ is then clearly another regular expression over Σ. In fact, it can be shown that R′ represents exactly the words in s(R); this is formally accomplished by inducting on the number of operators in the expression R. To prove this, one must argue that the substitution correspondence is preserved by each of the six rules defining regular expressions. The basis step of the induction involves all regular expressions with zero operators, that is, those defined by the first three rules for generating a regular expression.

i. The substitution corresponding to any single letter a i is a regular expression corresponding to s(a i ), since, by definition of s, s(a i ) = R i .

ii. The substitution corresponding to ; is a regular expression corresponding to s(;), since, by definition
of s, s(;) = ;.

iii. The substitution corresponding to ² is a regular expression corresponding to s(λ), since, by definition
of s, s(λ) = ².

The inductive step requires an argument that the correspondence is preserved whenever another of the
three operators is introduced to form a more complex expression. These assertions involve the final three
rules for generating regular expressions.

iv. If R 1 and R 2 are regular expressions, then the substitution corresponding to (R 1 · R 2 ) is a regular expression representing the concatenation of the two corresponding substitutions. That is, s(R 1 · R 2 ) = s(R 1 ) · s(R 2 ).

v. If R 1 and R 2 are regular expressions, then the substitution corresponding to (R 1 ∪ R 2 ) is a regular ex-
pression representing s(R 1 ) ∪ s(R 2 ).

vi. If R 1 is a regular expression, then the substitution corresponding to (R 1 )∗ is a regular expression repre-
senting (s(R 1 ))∗ .

Each of these three assertions follows immediately from the definition of substitution and is left as an
exercise. The inductive step guarantees that the substitution correspondence is preserved in any regular
expression R, regardless of the number of operators in R. Consequently, R′ is indeed a regular expression denoting s(R), and RΣ is therefore closed under s.
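The structural recursion in this proof is short enough to state as code. The sketch below is ours: regular expressions are represented as nested tuples, and a substitution is applied by replacing each leaf letter, mirroring rules (i) through (vi):

# Sketch: apply a substitution s to a regular expression given as an AST.
# Leaves are single letters, 'EMPTY' (∅), or 'EPS' (ε); internal nodes are
# ('cat', r1, r2), ('union', r1, r2), or ('star', r1), matching the six
# rules that define regular expressions.

def substitute(r, s):
    if r == 'EMPTY' or r == 'EPS':          # rules (ii) and (iii)
        return r
    if isinstance(r, str):                  # rule (i): a single letter
        return s[r]
    op = r[0]
    if op == 'cat':                         # rule (iv)
        return ('cat', substitute(r[1], s), substitute(r[2], s))
    if op == 'union':                       # rule (v)
        return ('union', substitute(r[1], s), substitute(r[2], s))
    if op == 'star':                        # rule (vi)
        return ('star', substitute(r[1], s))
    raise ValueError(f"not a regular expression: {r!r}")

# Example 6.14: s(0) = (a ∪ b)·(a ∪ b), applied to 0*.
s = {'0': ('cat', ('union', 'a', 'b'), ('union', 'a', 'b'))}
print(substitute(('star', '0'), s))
# ('star', ('cat', ('union', 'a', 'b'), ('union', 'a', 'b')))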

The analogous result does not always hold for the nonregular sets.

Lemma 6.5 Let Σ be an alphabet.

a. There are examples of regular set substitutions s : Σ → ℘(Σ∗ ) for which NΣ is not closed under s.

b. There are examples of regular set substitutions t : Σ → ℘(Σ∗ ) for which NΣ is closed under t .

Proof. (a) NΣ is not closed under some substitutions. Let Σ = {a , b } and define s(a ) = (a ∪ b ) and s(b ) = (a ∪ b ). The image of the nonregular set

L = {x | |x|a = |x|b }

is the set of even-length words, which is regular. Thus L ∈ NΣ but s(L) ∉ NΣ .
(b) NΣ is closed under some substitutions. Some substitutions do preserve nonregularity (such as the
identity substitution i , since for any language L, i (L) = L). In this case, (∀L)(L ∈ NΣ ⇒ i (L) ∈ NΣ ) and
therefore NΣ is closed under i .

Note that a substitution in which each R i is a single string then conforms to Definition 5.8 and represents
a language homomorphism.

Corollary 6.3 Let Σ be an alphabet, and let ψ: Σ → Σ∗ be a language homomorphism. Then RΣ is closed under ψ.
Proof. The proof follows immediately from Theorem 6.6, since a language homomorphism is a special
type of substitution.

As in Chapter 5, this result can also be proved by suitably modifying an appropriate DFA, showing that DΣ (= RΣ ) is closed under language homomorphism. It is likewise possible to use machine
constructs to show that DΣ is closed under substitution, but this becomes much more complex than
the argument given for Theorem 6.6. A third characterization of regular languages will be presented in
Chapter 8, affording a choice of three distinct avenues for proving closure properties of RΣ .

Exercises
6.1. Let Σ = {a , b }. Give (if possible) a regular expression that describes the set of all even-length words in Σ∗ .

6.2. Let Σ = {a , b }. Give (if possible) a regular expression that describes the set of all words x in Σ∗ for which |x| ≥ 2.

6.3. Let Σ = {a , b }. Give (if possible) a regular expression that describes the set of all words x in Σ∗ for which |x|a = |x|b .

6.4. Let Σ = {a , b , c }. Give a regular expression that describes the set of all odd-length words in Σ∗ that do not end in b .

6.5. Let Σ = {a , b , c }. Give a regular expression that describes the set of all words in Σ∗ that do not contain two consecutive c s.

6.6. Let Σ = {a , b , c }. Give a regular expression that describes the set of all words in Σ∗ that do contain two consecutive c s.

6.7. Let Σ = {a , b , c }. Give a regular expression that describes the set of all words in Σ∗ that do not contain any c s.

6.8. Let Σ = {0 , 1 }. Give, if possible, regular expressions that will describe each of the following languages. Try to write these directly from the descriptions (that is, avoid relying on the nature of the corresponding automata).

(a) L 1 = {x | |x| mod 3 = 2}


(b) L 2 = Σ∗ − {w | ∃n ≥ 1 ∋ w = a 1 . . . a n ∧ a n = 1 }

(c) L 3 = {y | |y|0 > |y|1 }

6.9. Let Σ = {a , b , c }. Give, if possible, regular expressions that will describe each of the following languages. Try to write these directly from the descriptions (that is, avoid relying on the nature of the corresponding automata).

(a) L 1 = {x | (|x|a is odd) ∧ (|x|b is even)}


(b) L 2 = {y | (|y|c is even) ∨ (|y|b is odd)}
(c) L 3 = {z | (|z|a is even)}
(d) L 4 = {z | |z|c is a prime number}
(e) L 5 = {x | abc is a substring of x}
(f) L 6 = {x | acaba is a substring of x}
(g) L 7 = {x ∈ {a , b , c }∗ | |x|a ≡ 0 mod 3}

6.10. Let Σ = {a , b , d }. Give a regular expression that will describe

Ψ = {x ∈ Σ∗ | (x begins with d ) ∨ (x contains two consecutive b s)}.

6.11. Let Σ = {a , b , c }. Give a regular expression that will describe

Φ = {x ∈ Σ∗ | every b in x is immediately followed by c }.

6.12. Let Σ = {0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 }. Give a regular expression that will describe

r = {x ∈ Σ∗ | the number represented by x is evenly divisible by 3}

= {λ, 0 , 00 , 000 , . . . , 3 , 03 , 003 , . . . , 6 , 9 , 12 , 15 , . . .}.

6.13. Let Σ = {0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 }. Give a regular expression that will describe

K = {x ∈ Σ∗ | the number represented by x is evenly divisible by 5}.

6.14. Use the exact constructs given in the theorems of Chapter 5 to build a NDFA that accepts b ∪ a ∗c
(refer to Examples 6.4, 6.5, and 6.6). Do not simplify your answer.

6.15. Give examples of sets that demonstrate the following inequalities listed in Lemma 6.1:

(a) R 1 ∪ ε ≠ R 1
(b) R 1 · R 2 ≠ R 2 · R 1
(c) R 1 · R 1 ≠ R 1
(d) R 1 ∪ (R 2 · R 3 ) ≠ (R 1 ∪ R 2 ) · (R 1 ∪ R 3 )
(e) (R 1 · R 2 )∗ ≠ (R 1∗ · R 2∗ )∗
(f) (R 1 · R 2 )∗ ≠ (R 1∗ ∪ R 2∗ )∗

Find other examples of sets that show the following expressions may be equal under some condi-
tions:

(g) R 1 ∪ ε ≠ R 1
(h) R 1 · R 2 ≠ R 2 · R 1 (even if R 1 ≠ R 2 )
(i) R 1 · R 1 ≠ R 1
(j) R 1 ∪ (R 2 · R 3 ) ≠ (R 1 ∪ R 2 ) · (R 1 ∪ R 3 ) (even if R 1 ≠ R 2 ≠ R 3 ≠ R 1 )
(k) (R 1 · R 2 )∗ ≠ (R 1∗ · R 2∗ )∗ (even if R 1 ≠ R 2 )
(l) (R 1 · R 2 )∗ ≠ (R 1∗ ∪ R 2∗ )∗ (even if R 1 ≠ R 2 )

6.16. Prove the equalities listed in Lemma 6.1.

6.17. (a) Consider Theorem 6.1. Find examples of sets A and E that will show that A ∗ · E is not a unique solution if λ ∈ A.
(b) Find examples of sets A and E that will show that A ∗ · E can be the unique solution even if λ ∈ A.

6.18. Solve the following set of language equations for X 0 and X 1 over {0 , 1 }∗ :

X 0 = (0 ∪ 1 )X 1
X 1 = ε ∪ 1 X 0 ∪ 0 X 1

Do you see any relation between these equations and the DFA A in Example 3.4?

6.19. (a) Solve the following set of language equations for X 1 , X 2 , and X 3 by eliminating X 3 and then eliminating X 2 . Solve for X 1 and then back-substitute to find X 2 and X 3 . Note that these equations arise from the automaton in Figure 6.10.

X 1 = ∅ ∪ ∅X 1 ∪ (0 ∪ 1 )X 2 ∪ ∅X 3
X 2 = ε ∪ 0 X 1 ∪ 1 X 2 ∪ ∅X 3
X 3 = ∅ ∪ ∅X 1 ∪ (0 ∪ 1 )X 2 ∪ ∅X 3

(b) Rework part (a) by eliminating X 3 and then eliminating X 1 (instead of X 2 ).
(c) How does the solution in part (b) compare to the solution in part (a)? Is one more concise? Are they equivalent?

6.20. Prove Lemma 6.2. [Hint: Let P (m) be the statement that “Every regular expression R with m or fewer
operators represents a regular set that is FAD,” and induct on m]

6.21. Let Σ = {a , b , c }. Find all solutions to the language equation X = X ∪ {b }.

6.22. Prove that, for any languages A and E , A ∗ E = E ∪ A · (A ∗ E ).

6.23. Give a regular expression that will describe the intersection of the regular sets (ab ∪ b )∗ a and (ba ∪ a )∗ .

6.24. Develop an algorithm that, when applied to two regular expressions, will generate an expression
describing their intersection.

6.25. Verify by direct substitution that X 1 = (a ∪ bb )∗ and X 2 = b · (a ∪ bb )∗ is a solution to

X 1 = ε ∪ a · X 1 ∪ b · X 2
X 2 = ∅ ∪ b · X 1 ∪ ∅ · X 2

6.26. (a) Find L(D) for the machine D described in Example 6.10.
(b) Generalize your technique: For a machine A with start states s i 1 , s i 2 , . . . , s i m , L(A) is given by ?

6.27. Let Σ = {a , b }. Give a regular expression that will describe the complement of the regular set (ab ∪ b )∗ a .

6.28. Develop an algorithm that, when applied to a regular expression, will generate an expression de-
scribing the complement.

6.29. Let Σ = {a , b , c }. Define E (L) = {z | (∃y ∈ Σ+ )(∃x ∈ L) z = y x}. Use the regular expression concepts given in this chapter to argue that RΣ is closed under the operator E (that is, don't build a new automaton; build a new regular expression from the old expression).

6.30. Let Σ = {a , b , c }. Define B (L) = {z | (∃x ∈ L)(∃y ∈ Σ∗ ) z = x y}. Use the regular expression concepts given in this chapter to argue that RΣ is closed under the operator B (that is, don't build a new automaton; build a new regular expression from the old expression).

6.31. Let Σ = {a , b , c }. Define M (L) = {z | (∃x ∈ L)(∃y ∈ Σ+ ) z = x y}. Use the regular expression concepts given in this chapter to argue that RΣ is closed under the operator M (that is, don't build a new automaton; build a new regular expression from the old expression).

6.32. (a) Let Σ = {a , b , c }. Show that there does not exist a unique solution to the following set of language equations:

X 1 = b ∪ ε · X 1 ∪ a · X 2
X 2 = c ∪ ε · X 1 ∪ ε · X 2
(b) Does this contradict Theorem 6.2? Explain.

6.33. Solve the following set of language equations for X 0 and X 1 over {0 , 1 }∗ :

X 0 = 0 ∗ 1 ∪ (10 )∗ X 0 ∪ 0 (0 ∪ 1 )X 1
X 1 = ε ∪ 1 ∗ 0101 X 0 ∪ 0 X 1

6.34. Let Σ = {a , b , c }.

(a) Give a regular expression that describes the set of all words in Σ∗ that end with c and for which aa , bb , and cc never appear as substrings.
(b) Give a regular expression that describes the set of all words in Σ∗ that begin with c and for which aa , bb , and cc never appear as substrings.

6.35. Let Σ = {a , b , c }.

(a) Give a regular expression that describes the set of all words in Σ∗ that contain no more than
two c s.
(b) Give a regular expression that describes the set of all words in Σ∗ that do not have exactly one
c.

6.36. Recall that the reverse of a word x, written x r , is the word written backward. The reverse of a language is likewise given by L r = {x r | x ∈ L}. Let Σ = {a , b , c }.

Figure 6.11: The NDFA for Exercise 6.40

(a) Note that (R 1 ∪ R 2 ) r = (R 1 r ∪ R 2 r ) for any regular sets R 1 and R 2 . Give similar equivalences for each of the rules in Definition 6.1.
(b) If L were represented by a regular expression, explain how to generate a regular expression
representing L 0 (compare with the technique used in the proof of Theorem 6.6).
(c) Prove part (b) by inducting on the number of operators in the expression.
(d) Use parts (a), (b), and (c) to argue that RΣ is closed under the operator r .

6.37. Complete the details of the proof of Theorem 6.4.

6.38. Let Σ = {a , b , c }.

(a) Give a regular expression that describes the set of all words in Σ∗ for which no b is immediately
preceded by a .
(b) Give a regular expression that describes the set of all words in Σ∗ that contain exactly two c s
and for which no b is immediately preceded by a .

6.39. Let Σ = {a , b , c }.

(a) Give a regular expression that describes the set of all words in Σ∗ for which no b is immediately
preceded by c .
(b) Give a regular expression that describes the set of all words in Σ∗ that contain exactly one c
and for which no b is immediately preceded by c .

6.40. (a) Use Theorem 6.3 to write the two right-linear equations in two unknowns corresponding to
the NDFA given in Figure 6.11.
(b) Solve these equations for both unknowns.
(c) Give a regular expression that corresponds to the language accepted by this NDFA.
(d) Rework the problem with two left-linear equations.

6.41. (a) Use Theorem 6.3 to write the four right-linear equations in four unknowns corresponding to
the NDFA given in Figure 6.12.
(b) Solve these equations for all four unknowns.
(c) Give a regular expression that corresponds to the language accepted by this NDFA.
(d) Rework the problem with four left-linear equations.

6.42. (a) Use Theorem 6.3 to write the seven right-linear equations in seven unknowns corresponding
to the NDFA given in Figure 6.13.

Figure 6.12: The automaton for Exercise 6.41

Figure 6.13: The NDFA for Exercise 6.42

(b) Solve these equations for all seven unknowns. Hint: Make use of the simple nature of these
equations to eliminate variables without appealing to Theorem 6.2.
(c) Give a regular expression that corresponds to the language accepted by this NDFA.
(d) Rework the problem with seven left-linear equations.

6.43. Prove that for any languages A, E , and Y , if E ⊆ Y , then A · E ⊆ A · Y .

6.44. Let Σ and Γ be alphabets, and let s: Σ → ℘(Γ∗ ) be a substitution.

(a) Prove that the image of RΣ under s is contained in RΓ .

(b) Give an example to show that the image of NΣ under s need not be completely contained in NΓ .

6.45. Give a detailed proof of Lemma 6.3.

6.46. Let Σ = {a , b } and Ξ = {x ∈ Σ∗ | x contains (at least) two consecutive b s ∧ x does not contain two consecutive a s}. Give a regular expression that will describe Ξ.

6.47. Let Σ = {a , b , c }. Give regular expressions that will describe:

(a) {x ∈ {a , b , c }∗ | every b in x is eventually followed by c }; that is, x might look like baabacaa , or bcacc , and so on.
(b) {x ∈ {a , b , c }∗ | every b in x is immediately followed by c }.

6.48. Let Σ = {a , b }. Give, if possible, regular expressions that will describe each of the following languages. Try to write these directly from the descriptions (that is, avoid relying on the nature of the corresponding automata).

(a) The language consisting of all words that have neither consecutive a s nor consecutive b s.

(b) The language consisting of all words that begin and end with different letters.
(c) The language consisting of all words for which the last two letters match.
(d) The language consisting of all words for which the first two letters match.
(e) The language consisting of all words for which the first and last letters match.

6.49. The set of all valid regular expressions over {a , b } is a language over the alphabet {a , b , (, ), ∪, ∗ , ∅, ε}. Show that this language is not FAD.

6.50. Give regular expressions corresponding to the languages accepted by each of the NDFAs listed in
Figure 6.14.

6.51. Complete the details of the proof of Theorem 6.6.

6.52. Prove Lemma 6.4.

6.53. Corollary 6.3 followed immediately from Theorem 6.6. Show that Theorems 5.2, 5.4, and 5.5 are also
corollaries of Theorem 6.6.

6.54. Let F be the collection of languages that can be formed by repeated application of the following five
rules:

a } ∈ F and {b
i. {a b} ∈ F
ii. { } ∈ F
iii. {λ} ∈ F
iv. If F 1 ∈ F 2 and F 2 ∈ F , then F 1 · F 2 ∈ F
v. If F 1 ∈ F and F 2 ∈ F , then F 1 ∪ F 2 ∈ F

Describe the class of languages generated by these five rules.

Figure 6.14: The automata for Exercise 6.50

Chapter 7

Finite-State Transducers

We have seen that finite-state acceptors are by no means robust enough to accept standard computer
languages like Pascal. Furthermore, even if a DFA could reliably recognize valid Pascal programs, a ma-
chine that only indicates “Yes, this is a valid program” or “No, this is not a valid program” is certainly
not all we expect from a compiler. To emulate a compiler, it is necessary to have a mechanism that will
produce some output other than a simple yes or no: in this case, we would expect the corresponding ma-
chine language code (if the program compiled successfully) or some hint as to the location and nature of
the syntax errors (if the program was invalid).
A machine that accepts input strings and translates them into output strings is called a sequential
machine or transducer. Our conceptual picture of such a device is only slightly different from the model
of a DFA shown in Figure 7.1a. We still have a finite-state control and an input tape with a read head, but
the accept/reject indicator is replaced by an output tape and writing device, as shown in Figure 7.1b.
These machines do not have the power to model useful compilers, but they can be employed in many
other areas. Applications of sequential machine concepts are by no means limited to the computer world
or even to the normal connotations associated with “read” and “write.” A vending machine is essentially
a transducer that interprets inserted coins and button presses as valid inputs and returns candy bars and
change as output. Elevators, traffic lights, and many other common devices that monitor and react to
limited stimuli can be modeled by finite-state transducers.
The vending machine analogy illustrates that the types of input to a device (coins) may be very dif-
ferent from the types of output (candy bars). In terms of our conceptual model, the read head may be
capable of recognizing symbols that are different from those that the output head can print. Thus we will
have an output alphabet Γ that is not necessarily the same as our input alphabet Σ.
Also essential to our model is a rule that governs what characters are printed. For our first type of
transducer, this rule will depend on both the current internal state of the machine and the current symbol
being scanned by the read head and will be represented by the function ω. Finally, since we are dealing
with translation rather than acceptance/rejection, there is no need to single out accepting states: the
concept of final states can be dispensed with entirely.

7.1 Basic Definitions


Definition 7.1 A finite-state transducer (FST), or Mealy sequential machine, with a distinguished start state is a sextuple 〈Σ, Γ, S, s 0 , δ, ω〉, where:

i. Σ denotes the input alphabet.

Figure 7.1: The difference between an acceptor and a transducer

ii. Γ denotes the output alphabet.

iii. S denotes the set of states, a finite nonempty set.

iv. s 0 denotes the start state; s 0 ∈ S.

v. δ denotes the state transition function; δ: S × Σ → S.

vi. ω denotes the output function; ω: S × Σ → Γ.

The familiar state transition diagram needs to be slightly modified to represent these new types of
machines. Since there is one labeled arrow for each ordered pair in the domain of the state transition
function and there is also one output symbol for each ordered pair, we will place the appropriate output
symbol by its corresponding arrow, and separate it from the associated input symbol by a slash, /.

Example 7.1
Let V = 〈{n , d , q , b }, {ϕ , n ′, d ′, q ′, c 0 , c 1 , c 2 , c 3 , c 4 }, S, s 0 , δ, ω〉 be the FST illustrated in Figure 7.2. V describes the action of a candy machine that dispenses 30¢ Chocolate Explosions. n , d , q denote inputs of nickels, dimes, and quarters (respectively), and b denotes the act of pushing the button to select a candy bar. ϕ , n ′, d ′, q ′, c 0 , c 1 , c 2 , c 3 , c 4 represent the vending machine's responses to these inputs: it may do nothing, return the nickel that was just inserted, return the dime, return the quarter, or dispense a candy bar with 0, 1, 2, 3, or 4 nickels as change, respectively. Note that the transitions agree with the vending machine
model presented in Chapter 1; the new model now specifies the action corresponding to the given input.
It is relatively simple to modify the above machine to include a new input r that signifies that the coin
return has been activated and a new output a representing the release of all coins that have been inserted
(see the exercises).
Various modern appliances can be modeled by FSTs. Many microwave ovens accept input through
the door latch mechanism and an array of keypad sensors, and typical outputs include the control lines
to the microwave generator, the elements of a digital display, an interior light, and an audible buzzer. The
physical circuitry needed to implement these common machines will be discussed in a later section. We
now examine the ramifications of Definition 7.1 by concentrating on the details of a very simple finite-
state transducer.

Figure 7.2: A finite-state transducer model of the vending machine discussed in Example 7.1

Example 7.2
Let B = 〈Σ, Γ, S, s 0 , δ, ω〉 be given by

Σ = {a , b }
Γ = {0 , 1 }
S = {s 0 , s 1 }
s 0 = s 0

The state transition function is defined in Table 7.1a.

Table 7.1a
δ a b
s0 s0 s1
s1 s0 s1

It can be more succinctly specified by (∀s ∈ S)[δ(s, a ) = s 0 and δ(s, b ) = s 1 ]. Finally, Table 7.1b displays the output function, which can be summarized by (∀c ∈ Σ)[ω(s 0 , c) = 0 and ω(s 1 , c) = 1 ].
All the information about B is contained in the diagram displayed in Figure 7.3. Consider the input sequence z = abaabbaa . From s 0 , the first letter of z, that is, a , causes a 0 to be printed, since ω(s 0 , a ) = 0 , and since δ(s 0 , a ) = s 0 , the machine remains in state s 0 . The second letter b causes a second 0 to be printed since ω(s 0 , b ) = 0 , but the machine now switches to state s 1 [δ(s 0 , b ) = s 1 ]. The third input letter causes a 1 to be printed [ω(s 1 , a ) = 1 ], and so on. The entire output string will be 00100110 , and the
Table 7.1b
ω a b
s0 0 0
s1 1 1

Figure 7.3: The state transition diagram for the transducer discussed in Example 7.2

machine, after starting in state s 0 , will successively assume the states s 0 , s 1 , s 0 , s 0 , s 1 , s 1 , s 0 , s 0 as the input string is processed. We are not currently interested in the terminating state for a given string (s 0 in this case), but rather in the resulting output string, 00100110 .
It should be clear that the above discussion illustrates a very awkward way of describing translations.
While ω describes the way in which single letters are translated, the study of finite-state transducers will
involve descriptions of how entire strings are translated. This situation is reminiscent of the modification
of the state transition function δ, which likewise operated on single letters, to the extended state transi-
tion function δ (which was defined for strings). Indeed, what is called for is an extension of ω to ω, which
will encompass the translation of entire strings. The translation cited in the last example could then be
succinctly stated as ω(s 0 , abaabbaa ) = 00100110 . That is, the notation ω(t , y) is intended to represent
the output string produced by a transducer (beginning from state t ) in response to the input string y.
The formal recursive definition of ω will depend not only on ω but also on the state transition func-
tion δ (and its extension δ). δ retains the same conceptual meaning it had for finite-state acceptors:
δ(s, x) denotes the state reached when starting from s and processing, in sequence, the individual letters
of the string x. Furthermore, the conclusion stated in Theorem 1.1 still holds:

(∀x ∈ Σ∗ )(∀y ∈ Σ∗ )(∀s ∈ S)(δ(s, y x) = δ(δ(s, y), x))

A similar statement can be made about ω once it has been rigorously defined.

Definition 7.2 Given a FST A = 〈Σ, Γ, S, s 0 , δ, ω〉, the extended output function for A, denoted by ω, is a
function ω: S × Σ∗ → Γ∗ defined recursively as follows:

i. (∀t ∈ S) ω(t , λ) = λ

ii. (∀t ∈ S)(∀x ∈ Σ∗ )(∀a ∈ Σ)(ω(t , a x) = ω(t , a ) · ω(δ(t , a ), x))

Example 7.3
Let B = 〈Σ, Γ, S, s 0 , δ, ω〉 be the FST given in Example 7.2. Then

ω(s 1 , baa ) = ω(s 1 , b ) · ω(δ(s 1 , b ), aa ) = 1 · ω(s 1 , aa )
             = 1 · ω(s 1 , a ) · ω(δ(s 1 , a ), a ) = 11 · ω(s 0 , a ) = 110

Note that a three-letter input sequence gives rise to exactly three output symbols: ω is length preserving,
in the sense that (∀t ∈ S)(∀x ∈ Σ∗ )(|ω(t , x)| = |x|).
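The recursion in Definition 7.2 runs directly as code. The sketch below is ours (B is the machine of Example 7.2, with s 0 and s 1 encoded as 0 and 1); it computes ω(t , x) iteratively and reproduces the translations above:

# Sketch of the extended output function of Definition 7.2 for the Mealy
# machine B of Example 7.2.
delta = {(0, 'a'): 0, (0, 'b'): 1, (1, 'a'): 0, (1, 'b'): 1}
omega = {(0, 'a'): '0', (0, 'b'): '0', (1, 'a'): '1', (1, 'b'): '1'}

def omega_bar(t, x):
    """The string printed while processing x from state t:
    ω(t, ax) = ω(t, a) · ω(δ(t, a), x)."""
    out = []
    for a in x:
        out.append(omega[(t, a)])
        t = delta[(t, a)]
    return ''.join(out)

print(omega_bar(0, 'abaabbaa'))   # 00100110, as in Example 7.2
print(omega_bar(1, 'baa'))        # 110, as in Example 7.3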
The ω function extends the ω function from single letters to words. Whereas the ω function maps a
state and a letter to a single symbol from Γ, the ω function maps a state and a word to an entire string
from Γ∗ . It can be deduced from (i) and (ii) (see the exercises) that (iii) (∀t ∈ S)(∀a ∈ Σ)(ω(t , a ) = ω(t , a )), which is the observation that ω and ω treat single letters the same. The extended output function ω has
properties similar to those of δ, in that the single letter a found in the recursive definition of ω can be
replaced by an entire word y. The analog of Theorem 1.1 is given below.

Theorem 7.1 Let A = 〈Σ, Γ, S, s 0 , δ, ω〉 be a FST. Then:

(∀x ∈ Σ∗ )(∀y ∈ Σ∗ )(∀t ∈ S)(ω(t , y x) = ω(t , y) · ω(δ(t , y), x))

and
(∀x ∈ Σ∗ )(∀y ∈ Σ∗ )(∀s ∈ S)(δ(s, y x) = δ(δ(s, y), x))

Proof. The proof is by induction on |y| (see the exercises and compare with Theorem 1.1).

Example 7.4
Let B = 〈Σ, Γ, S, s 0 , δ, ω〉 be the FST given in Example 7.2. Consider the string z = abaabbaa = y x, where y = abaab and x = baa . To apply Theorem 7.1 with t = s 0 , we first calculate ω(s 0 , y) = ω(s 0 , abaab ) = 00100 , and δ(s 0 , y) = s 1 . From Example 7.3, ω(s 1 , baa ) = 110 , and hence, as required by Theorem 7.1,

00100110 = ω(s 0 , abaabbaa ) = ω(s 0 , y x) = ω(s 0 , y) · ω(δ(s 0 , y), x) = 00100 · 110

For a given FST A with a specified start state, the deterministic nature of finite-state transducers
requires that each input string be translated into a unique output string; that is, the relation f A that
associates input strings with their corresponding output strings is a function.

Definition 7.3 Given a FST M = 〈Σ, Γ, S, s 0 , δ, ω〉, the translation function for M , denoted by f M , is the
function f M : Σ∗ → Γ∗ defined by f M (x) = ω(s 0 , x).

Note that f M , like ω, is length preserving: (∀x ∈ Σ∗ )(| f M (x)| = |x|). Consequently, for any n ∈ N, if the
domain of f M were restricted to Σn , then the range of f M would likewise be contained in Γn .

Example 7.5
Let B = 〈Σ, Γ, S, s 0 , δ, ω〉 be the finite-state transducer given in Figure 7.3. Since ω(s 0 , abaab ) = 00100 , f B (abaab ) = 00100 . Similarly, f B (λ) = λ, f B (a ) = 0 , f B (b ) = 0 , f B (aa ) = 00 , f B (ab ) = 00 , f B (ba ) = 01 , f B (bb ) = 01 . Coupled with these seven base definitions, this particular f B could be recursively defined by

(∀x ∈ Σ∗ ) f B (xaa ) = f B (xa ) · 0
f B (xab ) = f B (xa ) · 0
f B (xba ) = f B (xb ) · 1

Figure 7.4: The state transition diagram for the Mealy machine C in Example 7.6

and

f B (xbb ) = f B (xb ) · 1

f B in essence replaces a s with 0 s and b s with 1 s, and “delays” the output by one letter. More specifically,
the translation function for B takes an entire string and substitutes 0 s and 1 s for a s and b s (respectively),
deletes the last letter of the string, and appends a 0 to the front of the resulting string. The purpose of
the two states s 0 and s 1 in the FST B is to remember whether the previous symbol was an a or a b (re-
spectively) and output the appropriate replacement letter. Note that 1 s are always printed on transitions
from s 1 , and 0 s are printed as we leave s 0 .

Example 7.6
Let C = 〈{a , b }, {0 , 1 }, {t 0 , t 1 , t 2 , t 3 }, t 0 , δC , ωC 〉 be the FST shown in Figure 7.4. C flags occurrences of the string aab by printing a 1 on the output tape only when the substring aab appears in the input stream.
Clearly, not all functions from Σ∗ to Γ∗ can be represented by finite-state transducers; we have al-
ready observed that functions that are not length preserving cannot possibly qualify. As the function
discussed later in Example 7.7 shows, not all length-preserving functions qualify, either.

Definition 7.4 Given a function f : Σ∗ → Γ∗ , f is finite transducer definable (FTD) iff there exists a trans-
ducer A such that f = f A .

Due to the deterministic nature of transducers, any two strings that “begin the same” must start being
“translated the same.” This observation is the basis for the following theorem.

Theorem 7.2 Assume f is FTD. Then

(∀n ∈ N)(∀x ∈ Σn )(∀y ∈ Σ∗ )(∀z ∈ Σ∗ )(the first n letters of f (x y) agree with the first n letters of f (xz)).
Proof. See the exercises.

Example 7.7
Consider the function g : {a , b , c }∗ → {0 , 1 }∗ , which replaces each input symbol by 0 unless the next letter is c , in which case 1 is used instead. Thus,

g (abcaaccb ) = 01001100 and g (abb ) = 000 .

With n = 2, choosing x = ab , y = caaccb , and z = b shows that g violates Theorem 7.2, so g cannot be FTD.
The necessary condition outlined in the previous theorem is by no means sufficient to guarantee that a function is FTD; other properties, such as a pumping lemma-style repetitiousness of the translation, must also be present (see the exercises).
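The violation is easy to confirm computationally; this small check is ours, not part of the text:

# Sketch: the lookahead translation g of Example 7.7 and the prefix test
# of Theorem 7.2, checked on x = ab, y = caaccb, z = b.
def g(w):
    return ''.join('1' if i + 1 < len(w) and w[i + 1] == 'c' else '0'
                   for i in range(len(w)))

assert g('abcaaccb') == '01001100' and g('abb') == '000'

x, y, z = 'ab', 'caaccb', 'b'
print(g(x + y)[:len(x)], g(x + z)[:len(x)])   # 01 vs 00: the first |x|
# letters already disagree, so no finite-state transducer can compute g.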

7.2 Minimization of Finite-State Transducers


Two transducers that perform exactly the same translation over the entire range of input strings from Σ∗ will be called equivalent transducers. This is in spirit similar to the way equivalence was defined for deterministic finite automata.

Definition 7.5 Given transducers

A = 〈Σ, Γ, S A , s 0 A , δ A , ω A 〉 and B = 〈Σ, Γ, S B , s 0B , δB , ωB 〉,

A is said to be equivalent to B iff f A = f B .

Just as with finite automata, a reasonable goal when constructing a transducer is to produce an effi-
cient machine, and, as before, this will be equated with the size of the finite-state control; given a trans-
lation function f , a minimal machine for f is a FST that has the minimum number of states necessary to
perform the required translation.

Definition 7.6 Given a finite-state transducer A = 〈Σ, Γ, S A , s 0 A , δ A , ω A 〉, A is the minimal Mealy machine for the translation f A iff for all finite-state transducers B = 〈Σ, Γ, S B , s 0B , δB , ωB 〉 for which f A = f B , ‖S A ‖ ≤ ‖S B ‖.

Thus, A is minimal if there is no equivalent machine with fewer states.

Example 7.8
The FST C = 〈{a , b }, {0 , 1 }, {t 0 , t 1 , t 2 , t 3 }, t 0 , δC , ωC 〉 given in Figure 7.4 is not minimal. The FST D = 〈{a , b }, {0 , 1 }, {q 0 , q 1 , q 2 }, q 0 , δD , ωD 〉 given in Figure 7.5 performs the same translation, but has only three states.
states.
The concept of two transducers being essentially the same except for a trivial renaming of the states
will again be formalized through the definition of isomorphism (and homomorphism. As before, it will
be important to match the respective start states and state transitions; but rather than matching up final
states (which do not exist in the FST model), we must instead ensure that the output function is preserved
by the relabeling process.

Definition 7.7 Given two FSTs

A = 〈Σ, Γ, S A , s 0 A , δ A , ω A 〉 and B = 〈Σ, Γ, S B , s 0B , δB , ωB 〉,

and a function µ: S A → S B , µ is called a Mealy machine homomorphism from A to B iff the following
three conditions hold:

i. µ(s 0 A ) = s 0B .

Figure 7.5: The state transition diagram for the Mealy machine D in Example 7.8

ii. (∀s ∈ S A )(∀a ∈ Σ)(µ(δ A (s, a )) = δB (µ(s), a )).

iii. (∀s ∈ S A )(∀a ∈ Σ)(ω A (s, a ) = ωB (µ(s), a )).

As in Chapter 3, a bijective homomorphism will be called an isomorphism and will signify that the
isomorphic machines are essentially the same (except perhaps for the names of the states). The isomor-
phism is essentially a recipe for renaming the states of one machine to produce identical transducers.

Definition 7.8 Given two FSTs

A = 〈Σ, Γ, S A , s 0 A , δ A , ω A 〉 and B = 〈Σ, Γ, S B , s 0B , δB , ωB 〉,

and a function µ: S A → S B , µ is called a Mealy machine isomorphism from A to B iff the following five
conditions hold:

i. µ(s 0 A ) = s 0B .

ii. (∀s ∈ S A )(∀a ∈ Σ)(µ(δ A (s, a )) = δB (µ(s), a )).

iii. (∀s ∈ S A )(∀a ∈ Σ)(ω A (s, a ) = ωB (µ(s), a )).

iv. µ is a one-to-one function from S A to S B .

v. µ is onto S B .

Definition 7.9 If µ: S A → S B is an isomorphism between two transducers A = 〈Σ, Γ, S A , s 0 A , δ A , ω A 〉 and B = 〈Σ, Γ, S B , s 0B , δB , ωB 〉, then A is said to be isomorphic to B , and we will write A ≅ B .

Example 7.9
Consider the two FSTs C = 〈{a , b }, {0 , 1 }, {t 0 , t 1 , t 2 , t 3 }, t 0 , δC , ωC 〉, given in Figure 7.4, and D = 〈{a , b }, {0 , 1 }, {q 0 , q 1 , q 2 }, q 0 , δD , ωD 〉, displayed in Figure 7.5. The function µ: {t 0 , t 1 , t 2 , t 3 } → {q 0 , q 1 , q 2 }, defined by µ(t 0 ) = q 0 , µ(t 1 ) = q 1 , µ(t 2 ) = q 2 , and µ(t 3 ) = q 0 , is a homomorphism between C and D. Conditions (i) and (ii) are exactly the same criteria used for finite automata homomorphisms and have exactly
tions (i) and (ii) are exactly the same criteria used for finite automata homomorphisms and have exactly

the same interpretation: the start states must correspond and the transitions must match. The third
condition is present to ensure that the properties of the ω function are respected; for example, since t 2
causes 1 to be printed when b is processed, so should the corresponding state in the D machine, which
is q 2 = µ(t 2 ) in this example. Indeed, ωC (t 2 , b ) = 1 = ωD (µ(t 2 ), b ). Such similarities extend to full strings also: note that ωC (t 0 , aab ) = 001 = ωD (µ(t 0 ), aab ) in this example. The results can be generalized as
presented in the next lemma.

Lemma 7.1 If µ: S A → S B is a homomorphism between two FSTs

A = 〈Σ, Γ, S A , s 0 A , δ A , ω A 〉 and B = 〈Σ, Γ, S B , s 0B , δB , ωB 〉,

then
(∀s ∈ S A )(∀x ∈ Σ∗ )(µ(δ A (s, x)) = δB (µ(s), x))

and
(∀s ∈ S A )(∀x ∈ Σ∗ )(ω A (s, x) = ωB (µ(s), x))

Proof. The proof is by induction on |x| (see the exercises).

Corollary 7.1 If µ: S A → S B is a homomorphism between two FSTs A = 〈Σ, Γ, S A , s 0 A , δ A , ω A 〉 and B = 〈Σ, Γ, S B , s 0B , δB , ωB 〉, then A is equivalent to B ; that is, f A = f B .
Proof. The proof follows immediately from Lemma 7.1 and the definition of f M .

In a manner very reminiscent of the approach taken to minimize deterministic finite automata, no-
tions of state equivalence relations, reduced machines, and connectedness can be defined. As was the
case in Chapter 3, a reduced and connected machine will be isomorphic to every other equivalent mini-
mal machine. The definition for connectedness is essentially unchanged.

Definition 7.10 A state s in a transducer M = 〈Σ, Γ, S, s 0 , δ, ω〉 is called accessible iff

(∃x s ∈ Σ∗ ) ∋ δ(s 0 , x s ) = s

The transducer M = 〈Σ, Γ, S, s 0 , δ, ω〉 is called connected iff

(∀s ∈ S)(∃x s ∈ Σ∗ ) ∋ δ(s 0 , x s ) = s

That is, every state s of S can be reached by some string (x s ) in Σ∗ ; once again, the choice of the state
s will have a bearing on which particular string is used as a representative. States that are not accessible
do not affect the translation performed by the transducer; such states can be safely deleted to form a
connected version of the machine.

Definition 7.11 Given a FST M = 〈Σ, Γ, S, s 0 , δ, ω〉, define the transducer M c = 〈Σ, Γ, S c , s 0c , δc , ωc 〉, called M connected, by

S c = {s ∈ S | ∃x ∈ Σ∗ ∋ δ(s 0 , x) = s}
s 0c = s 0

δc is essentially the restriction of δ to S c × Σ: (∀a ∈ Σ)(∀s ∈ S c )(δc (s, a ) = δ(s, a )), and ωc is the restriction of ω to S c × Σ: (∀a ∈ Σ)(∀s ∈ S c )(ωc (s, a ) = ω(s, a )).

M c is, as in Chapter 3, the machine M with the unreachable states “thrown away.” As with DFAs,
trimming a machine in this fashion has no effect on the operation of the transducer. To formally prove
this, the following lemma is needed.

Lemma 7.2 Given transducers

M = 〈Σ, Γ, S, s 0 , δ, ω〉 and M c = 〈Σ, Γ, S c , s 0c , δc , ωc 〉,

the restriction of ω to S c × Σ∗ is ωc .
Proof. We must show that (∀y ∈ Σ∗ )(∀t ∈ S c )(ωc (t , y) = ω(t , y)). This can be done with a straightfor-
ward induction on |y|. Let P (n) be defined by

(∀y ∈ Σn )(∀t ∈ S c )(ωc (t , y) = ω(t , y)).

The basis step is trivial, since ωc (t , λ) = λ = ω(t , λ). For the inductive step, assume (∀y ∈ Σm )(∀t ∈ S c )(ωc (t , y) = ω(t , y)), and let t ∈ S c and z ∈ Σm+1 be given. Then ∃x ∈ Σm , ∃a ∈ Σ for which z = a x, and therefore

ωc (t , z) =                               (by definition of z)
ωc (t , a x) =                             (by Definition 7.2ii)
ωc (t , a ) · ωc (δc (t , a ), x) =        (by definition of δc )
ωc (t , a ) · ωc (δ(t , a ), x) =          (by the induction assumption)
ωc (t , a ) · ω(δ(t , a ), x) =            (by definition of ωc )
ω(t , a ) · ω(δ(t , a ), x) =              (by Definition 7.2ii)
ω(t , a x) =                               (by definition of z)
ω(t , z)

Since z was an arbitrary element of Σm+1 , and t was an arbitrary state in S c ,

(∀z ∈ Σm+1 )(∀t ∈ S c )(ωc (t , z) = ω(t , z)),

which proves P (m + 1). Hence, P (m) ⇒ P (m + 1), and, since m was arbitrary, (∀m ∈ N)(P (m) ⇒ P (m + 1)).
By the principle of mathematical induction, P (n) is therefore true for all n, and the lemma is proved.

Since ωc = ω, it immediately follows that f M = f M c , and we are therefore assured that the operation
of any transducer is indistinguishable from the operation of its connected counterpart.

Theorem 7.3 Given transducers

M = 〈Σ, Γ, S, s 0 , δ, ω〉 and M c = 〈Σ, Γ, S c , s 0c , δc , ωc 〉,

M is equivalent to M c .
Proof. f M c (x) = ωc (s 0c , x) = ωc (s 0 , x) = ω(s 0 , x) = f M (x), and hence by the definition of equivalence of
transducers, M is equivalent to M c .

Corollary 7.2 Given a FTD function f , the minimal machine corresponding to f must be connected.
Proof (by contradiction). Assume the minimal machine M is not connected; then, by Theorem 7.3, f M c = f M = f , and clearly ‖S c ‖ < ‖S‖, and hence M could not be minimal.

While connectedness is a necessary condition for minimality, it is not sufficient, as evidenced by the
machine C in Figure 7.4: C was connected, but the FST D in Figure 7.5 was an equivalent but smaller
transducer.
As was the case with finite automata in Chapter 3, connectedness is just one of the two major re-
quirements for minimality. The other requirement is that no two states behave identically. For DFAs,
this translated into statements about acceptance and rejection. For FSTs, this will instead involve the
behavior of the output function. The analog to Definition 3.2 is given next.

Definition 7.12 Given a transducer M = 〈Σ, Γ, S, s 0 , δ, ω〉, the state equivalence relation on M , E M , is de-
fined by
(∀s ∈ S)(∀t ∈ S)(sE M t ⇔ (∀x ∈ Σ∗ )(ω(s, x) = ω(t , x)))

In other words, we will relate states s and t if and only if it is not possible to determine, by only
observing the output, whether we are starting from state s or state t (no matter what input string is
used). The more efficient machines will not have such duplication of states, and, as with DFAs, will be
said to be reduced.

Definition 7.13 A transducer M = 〈Σ, Γ, S, s 0 , δ, ω〉 is called reduced iff (∀s, t ∈ S)(sE M t ⇔ s = t ).

As before, if M is reduced, E M must be the identity relation on the set of states S, and each equiva-
lence class must contain only a single element. We defer for the moment the discussion of how E M can
be efficiently calculated. Once the state equivalence relation is known, in a manner that is also analo-
gous to the treatment of finite automata, states related by E M can be coalesced to form a machine that is
reduced.

Definition 7.14 Given a FST M = 〈Σ, Γ, S, s 0 , δ, ω〉, define M modulo its state equivalence relation, M/E M ,
by M /E M = 〈Σ, Γ, S E M , s 0E M , δE M , ωE M 〉, where

S E M = {[s]E M | s ∈ S}
s 0E M = [s 0 ]E M

δE M is defined by
(∀a ∈ Σ)(∀[s]E M ∈ S E M )(δE M ([s]E M , a ) = [δ(s, a )]E M ),
and ωE M is defined by
(∀a ∈ Σ)(∀[s]E M ∈ S E M )(ωE M ([s]E M , a ) = ω(s, a )).

The proof that δE M is well defined is similar to that found in Chapter 3. In an analogous fashion, ωE M
must be shown to be well defined (see the exercises).
All the properties that one would expect of M /E M are present, as outlined in the following theorem.

Theorem 7.4 Given a FST M = 〈Σ, Γ, S, s 0 , δ, ω〉,

M /E M = 〈Σ, Γ, S E M , s 0E M , δE M , ωE M 〉

is equivalent to M and is reduced. Furthermore, if M is connected, so is M /E M .


Proof. The proof that connectedness is preserved is identical to that given for Theorem 3.5; showing that
M /E M is reduced is very similar to the proof of Theorem 3.4. The proof of the fact that the two machines are
equivalent requires the inductive argument that (∀y ∈ Σ∗ )(∀t ∈ S)(ω(t , y) = ωE M ([t ]E M , y)) and is indeed
very similar to the proofs of Lemma 7.2 and Theorem 7.3.

An argument similar to that given for Corollary 7.2 shows that being reduced is also a requirement for minimality.

Corollary 7.3 Given a FTD function f , the minimal machine corresponding to f must be reduced.
Proof. The proof is by contradiction; see the exercises.

Being reduced, like connectedness, is a necessary condition for a machine to be minimal, but it is also
not sufficient (see the exercises). One would hope that the combination of being reduced and connected
would be sufficient to guarantee that the given machine is minimal. This is indeed the case, and one
more important result, proved next in Theorem 7.5, is needed to complete the argument: Two reduced
and connected FSTs are equivalent iff they are isomorphic. Armed with this result, we can also show
that a minimal transducer can be obtained from any FST M by reducing and connecting it. As in Chapter
3, connecting and reducing an arbitrary machine M will therefore be guaranteed to produce the most
efficient possible machine for that particular function.

Theorem 7.5 Two reduced and connected FSTs, M 1 = 〈Σ, Γ, S 1 , s 01 , δ1 , ω1 〉 and M 2 = 〈Σ, Γ, S 2 , s 02 , δ2 , ω2 〉, are equivalent iff M 1 ≅ M 2 .
Proof. By Corollary 7.1, if M 1 ≅ M 2 , then M 1 is equivalent to M 2 . The converse half of the proof is very reminiscent of that given for Theorem 3.1. We must assume M 1 and M 2 are equivalent and then prove that an isomorphism can be exhibited between M 1 and M 2 . A natural way to define such an isomorphism is as follows: Given a state s in M 1 , choose a string x s such that δ1 (s 01 , x s ) = s. Let µ(s) = δ2 (s 02 , x s ). At least one such string x s must exist for each state of M 1 , since M 1 was assumed to be connected. There may be several choices of x s for a given state s, but all will yield the same value for δ2 (s 02 , x s ), and so µ is well defined (see the exercises). The function µ satisfies the three properties of a homomorphism and turns out to be a bijection (see the exercises). Thus M 1 ≅ M 2 . As will be clear from the exercises, the hypothesis that M 1 and M 2 are reduced and connected is crucial to the proof of this part of the theorem.

Note that Theorem 7.5 implies that, as long as we are dealing with reduced and connected machines, f M1 = f M2 iff M 1 ≅ M 2 . The conclusions discussed earlier now follow immediately from Theorem 7.5.

Corollary 7.4 Given a FST M , a necessary and sufficient condition for M to be minimal is that M is both
reduced and connected.
Proof. See the exercises.

Corollary 7.5 Given a FST M , M c /E M c is minimal.


Proof. Let M be a FST and let A be a minimal machine that is equivalent to M . By Corollaries 7.2 and
7.3, A must be both reduced and connected. By Theorems 7.3 and 7.4, M c /E M c is also reduced, connected,
and equivalent to M (and hence to A). Theorem 7.5 would then guarantee that A and M c /E M c are iso-
morphic, and therefore they have the same number of states. Since A was assumed to have the minimum
possible number of states, M c /E M c also has that property and is thus minimal.

The minimal machine can therefore be found as long as M c /E M c can be computed. Finding S c (and
from that M c ) is accomplished in exactly the same manner as described in Chapter 3. The strategy for
generating E M is likewise quite similar, and again uses the i th state equivalence relation, as outlined
below.

Definition 7.15 Given a transducer M = 〈Σ, Γ, S, s 0 , δ, ω〉 and a nonnegative integer i , define a relation
between the states of M called E i M , the ith partial state equivalence relation on M , by

(∀s, t ∈ S)(sE i M t ⇔ (∀x ∈ Σ∗ ∋ |x| ≤ i )(ω(s, x) = ω(t , x)))

Thus E i M relates states that cannot be distinguished by strings of length i or less, whereas E M relates
states that cannot be distinguished by any string of any length. All the properties attributable to the
analogous relations for finite automata (E i A ) carry over, with essentially the same proofs, to the relations for finite-state transducers (E i M ).

Lemma 7.3 Given a transducer M = 〈Σ, Γ, S, s 0 , δ, ω〉:

a. E m+1M is a refinement of E mM ; that is, (∀s, t ∈ S)(sE m+1M t ⇒ sE mM t ).

b. E M is a refinement of E mM ; that is, (∀s, t ∈ S)(sE M t ⇒ sE mM t ); hence, E M ⊆ E mM .

c. (∃m ∈ N ∋ E mM = E m+1M ) ⇒ (∀k ∈ N)(E m+kM = E mM ).

d. (∃m ∈ N ∋ m ≤ ‖S‖ ∧ E mM = E m+1M ).

e. (∃m ∈ N ∋ E mM = E m+1M ) ⇒ E mM = E M .

Proof. The proof is similar to the proofs given in Chapter 3 for E i A (see the exercises).

Lemma 7.4 Given a FST M = 〈Σ, Γ, S, s 0 , δ, ω〉:

a. E 0M has just one equivalence class, which consists of all of S.

b. E 1M is defined by sE 1M t ⇔ (∀a ∈ Σ)(ω(s, a ) = ω(t , a )).

c. For i ≥ 1, E i +1M can be computed from E i M as follows:

(∀s ∈ S)(∀t ∈ S)(∀i ≥ 1)(sE i +1M t ⇔ sE i M t ∧ (∀a ∈ Σ)(δ(s, a ) E i M δ(t , a ))).

Proof. The proof is similar to the proofs given in Chapter 3 for E i A (see the exercises).

Corollary 7.6 Given a FST M = 〈Σ, Γ, S, s 0 , δ, ω〉, there is an algorithm for computing E M .
Proof. Use Lemma 7.4 to compute successive E i M relations from E 1M until E i M = E i +1M ; by Lemma 7.3, this E i M will equal E M , and this will all happen before i reaches ‖S‖, the number of states in S. Thus the procedure is guaranteed to halt.

Corollary 7.7 Given a FST M = 〈Σ, Γ, S, s 0 , δ, ω〉, there is an algorithm for computing the minimal machine
equivalent to M .
Proof. Using the algorithm for computing the set of connected states, M c can be found. The output
function is used to find E 1M c , and the state transition function is then used to calculate successive relations
until E M c is found. M c /E M c can then be defined and will be the minimal machine equivalent to M .
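To make the procedure concrete, here is a small Python sketch of the whole pipeline of Corollary 7.7. The dictionary encoding, the function name, and the state-renaming scheme are our own illustrative choices, not anything prescribed by the text; the refinement loop implements Lemmas 7.3 and 7.4.

def minimize_mealy(Sigma, s0, delta, omega):
    # delta[(s, a)] -> next state; omega[(s, a)] -> output symbol.
    # Step 1: restrict attention to the connected states S_c.
    reachable, frontier = {s0}, [s0]
    while frontier:
        s = frontier.pop()
        for a in Sigma:
            t = delta[(s, a)]
            if t not in reachable:
                reachable.add(t)
                frontier.append(t)
    # Step 2: E_1 groups states with identical single-letter outputs (Lemma 7.4b).
    block = {s: tuple(omega[(s, a)] for a in sorted(Sigma)) for s in reachable}
    # Step 3: refine E_i into E_{i+1} (Lemma 7.4c) until the partition stops
    # changing; Lemma 7.3 guarantees this happens within ||S|| iterations.
    while True:
        refined = {s: (block[s],) + tuple(block[delta[(s, a)]] for a in sorted(Sigma))
                   for s in reachable}
        if len(set(refined.values())) == len(set(block.values())):
            break
        block = refined
    # Step 4: build the quotient machine M_c / E_{M_c} on the equivalence classes.
    names = {}
    for s in sorted(reachable, key=str):
        names.setdefault(block[s], len(names))
    new_delta = {(names[block[s]], a): names[block[delta[(s, a)]]]
                 for s in reachable for a in Sigma}
    new_omega = {(names[block[s]], a): omega[(s, a)]
                 for s in reachable for a in Sigma}
    return names[block[s0]], new_delta, new_omega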

7.3 Moore Sequential Machines
Moore machines form another class of transducer that is equivalent in power to Mealy machines. They
use a less complex output function, but often require more states than an equivalent Mealy machine to
perform the same translation. An illustration of the convenience and utility of Moore machines can be
found in Example 7.16, which demonstrates that traffic signal controllers can most naturally be modeled
by the transducers discussed in this section.

Definition 7.16 A Moore sequential machine (MSM) with a distinguished start state is a sextuple
〈Σ, Γ, S, s 0 , δ, ω〉, where:

i. Σ denotes the input alphabet.

ii. Γ denotes the output alphabet.

iii. S denotes the set of states, a finite nonempty set.

iv. s 0 denotes the start state; s 0 ∈ S.

v. δ denotes the state transition function; δ: S × Σ → S.

vi. ω denotes the output function; ω: S → Γ.

Note that the only change from Definition 7.1 is the specification of the domain of ω. Conceptually,
we will envision the machine printing an output symbol as a new state is reached (rather than during
the transition, as was the case for Mealy machines). Note that the output symbol can no longer depend
(directly) on the current symbol being scanned; it is solely a function of the current state of the machine.
Consequently, the state transition diagrams will list the output function next to the state name, separated
by a slash, /. We will adopt the convention that no symbol will be printed until the first character is read
and a transition is made (an alternate view, not adopted here, is to decree that the machine print the
symbol associated with s 0 when the machine is first turned on; under that interpretation, an output
string would be one character longer than its corresponding input string).

Example 7.10
Let C = 〈Σ, Γ, S, r 0 , δ, ω〉 be given by

Σ = {a , b }
Γ = {0 , 1 }
s 0 = r 0

The state transition table is shown in Table 7.2. Finally, the output function is given by ω(r 0 ) = 0 , ω(r 1 ) =
1 , ω(r 2 ) = 0 , and ω(r 3 ) = 1 , or, more succinctly, ω(r i ) = i mod 2 for i = 0, 1, 2, 3. All the above informa-
tion about C is contained in Figure 7.6. This Moore machine performs the same translation as the Mealy
machine B in Example 7.2.
Results that were targeted toward a FST in the previous sections were specific to Mealy machines.
When the descriptor “transducer” appears in the theorems and definitions presented earlier, the concept
or result applies unchanged to both FSTs and MSMs. Most of these results are alluded to but not restated
in this section. For example, δ is defined and behaves like the extended state transition functions

Table 7.2
δ a b
r0 r0 r2
r1 r0 r2
r2 r1 r3
r3 r1 r3

Figure 7.6: The state transition diagram for the transducer discussed in Example 7.10

for DFAs and FSTs. On the other hand, because of the drastic change in the domain of ω, ω must be
modified as outlined below in order for ω(s, x) to represent the output string produced when starting at
s and processing x.

Definition 7.17 Given a MSM A = 〈Σ, Γ, S, s 0 , δ, ω〉, the extended output function for A, denoted again by
ω, is a function ω: S × Σ∗ → Γ∗ defined recursively by:

i. (∀t ∈ S) ω(t , λ) = λ

ii. (∀t ∈ S)(∀x ∈ Σ∗ )(∀a ∈ Σ)(ω(t , a x) = ω(δ(t , a )) · ω(δ(t , a ), x))

Note that the domain of the function ω has been extended further than usual: in all previous cases,
the domain was enlarged from S × Σ to S × Σ∗ ; in this instance, we are beginning with a domain of only S
and still extending it to S ×Σ∗ . The above definition allows the following analog of Theorem 7.1 to remain
essentially unchanged.
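The recursion of Definition 7.17 is easy to animate in code. The Python fragment below is a direct transcription (the dictionary encoding is our own convention); applying it to the machine C of Example 7.10, using the transitions of Table 7.2, shows the rule that nothing is printed until the first transition is made.

def ext_output(delta, omega, t, x):
    # (i)  omega(t, lambda) = lambda
    if x == "":
        return ""
    # (ii) omega(t, ax) = omega(delta(t, a)) . omega(delta(t, a), x)
    nxt = delta[(t, x[0])]
    return omega[nxt] + ext_output(delta, omega, nxt, x[1:])

# Machine C of Example 7.10 (Table 7.2):
delta = {("r0", "a"): "r0", ("r0", "b"): "r2", ("r1", "a"): "r0", ("r1", "b"): "r2",
         ("r2", "a"): "r1", ("r2", "b"): "r3", ("r3", "a"): "r1", ("r3", "b"): "r3"}
omega = {"r0": "0", "r1": "1", "r2": "0", "r3": "1"}
print(ext_output(delta, omega, "r0", "abb"))   # prints 001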

Theorem 7.6 Let Σ be an alphabet and A = 〈Σ, Γ, S, s 0 , δ, ω〉 be a Moore sequential machine. Then

(∀x ∈ Σ∗ )(∀y ∈ Σ∗ )(∀t ∈ S)(ω(t , y x) = ω(t , y) · ω(δ(t , y), x))

Proof. The proof is by induction on |y| (see the exercises and compare with Theorem 1.1).

As before, the essence of a Moore machine is captured in the translation function that the machine
describes.

Definition 7.18 Given a MSM M = 〈Σ, Γ, S, s 0 , δ, ω〉, the translation function for M , denoted by f M , is the
function f M : Σ∗ → Γ∗ defined by f M (x) = ω(s 0 , x).

Definition 7.5 applies to Moore machines; two MSMs are equivalent if they define the same transla-
tion. Indeed, it is possible for a Mealy machine to be equivalent to a Moore machine, as shown by the
transducers in Figures 7.2 and 7.6.
It is easy to turn a Moore machine A = 〈Σ, Γ, S, s 0 , δ, ω〉 into an equivalent Mealy machine M = 〈Σ, Γ, S, s 0 , δ, ω′〉.
The first five parts of the transducer are unchanged. Only the sixth component (the output function)
must be redefined, as outlined below.

Definition 7.19 Given a Moore machine A = 〈Σ, Γ, S, s 0 , δ, ω〉, the corresponding Mealy machine M is given
by M = 〈Σ, Γ, S, s 0 , δ, ω′〉, where ω′ is defined by

(∀a ∈ Σ)(∀s ∈ S)(ω′(s, a ) = ω(δ(s, a )))

Pictorially, all arrows that lead into a given state in the Moore machine should be labeled in the cor-
responding Mealy machine with the output symbol for that particular state. It follows easily from the
definition that the corresponding machines perform the same translation.
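In code, Definition 7.19 amounts to a one-line table transformation. The sketch below reuses the illustrative dictionary encoding from the earlier fragments; the function name is our own.

def moore_to_mealy(Sigma, states, delta, omega):
    # omega'(s, a) = omega(delta(s, a)): label each arrow with the output
    # symbol of the state it enters.
    omega_prime = {(s, a): omega[delta[(s, a)]] for s in states for a in Sigma}
    return omega_prime   # Sigma, Gamma, S, s0, and delta are unchanged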

Theorem 7.7 Given a Moore machine A = 〈Σ, Γ, S, s 0 , δ, ω〉, the corresponding Mealy machine
M = 〈Σ, Γ, S, s 0 , δ, ω′〉 is equivalent to A; that is, (∀x ∈ Σ∗ )( f M (x) = f A (x)).
Proof. The proof is by induction on |x| (see the exercises).

Example 7.11
Let A = 〈Σ, Γ, S, r 0 , δ, ω〉 be the Moore machine given in Figure 7.6. The corresponding Mealy machine
M = 〈Σ, Γ, S, s 0 , δ, ω′〉 is then given by

Σ = {a , b }, Γ = {0 , 1 }, S = {r 0 , r 1 , r 2 , r 3 }, s 0 = r 0

and the state transition table and the output function table are specified as in Tables 7.3a and 7.3b. The

Table 7.3a Table 7.3b


δ a b ω′ a b
r0 r0 r2 r0 0 0
r1 r0 r2 r1 0 0
r2 r1 r3 r2 1 1
r3 r1 r3 r3 1 1

new Mealy machine is shown in Figure 7.7. Note that the arrow labeled a leaving r 1 now has a 0 associated
with it, since the state at which the arrow pointed (r 0 /0 ) originally output a 0 .
In a similar fashion, an equivalent Moore machine can be defined that corresponds to a given Mealy
machine. However, due to the more restricted nature of the output function of the Moore constructs, the
new machine will generally need more states to perform the same translation.

Figure 7.7: The state transition diagram for the Mealy machine M in Example 7.11

The idea behind the construct is to break each state in the Mealy machine up into a group of several
similar states in the Moore machine, each of which prints a different output symbol. The new transition
function mimics the old one; if state r maps to state t in the Moore machine, then any state in the group
corresponding to r will map to one particular state in the group of states corresponding to t . The partic-
ular state within the group is chosen in a manner that will guarantee that the appropriate output symbol
will be printed. This construct is implemented in the following definition.

Definition 7.20 Given a Mealy machine M = 〈Σ, Γ, S, s 0 , δ, ω〉, the corresponding Moore machine A is given
by A = 〈Σ, Γ, S × Γ, 〈s 0 , α〉, δ′, ω′〉, where α is an (arbitrary) member of Γ,

δ′ is defined by (∀s ∈ S)(∀b ∈ Γ)(∀a ∈ Σ)(δ′(〈s, b 〉, a ) = 〈δ(s, a ), ω(s, a )〉)

and
ω′ is defined by (∀s ∈ S)(∀b ∈ Γ)(ω′(〈s, b 〉) = b ).

Theorem 7.8 Given a Mealy machine M = 〈Σ, Γ, S, s 0 , δ, ω〉, the corresponding Moore machine
A = 〈Σ, Γ, S × Γ, 〈s 0 , α〉, δ′, ω′〉 is equivalent to M ; that is, (∀x ∈ Σ∗ )( f A (x) = f M (x)).
Proof. The proof is by induction on |x| (see the exercises).
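Definition 7.20 is likewise mechanical to code. A hedged Python sketch, with the same illustrative dictionary encoding as before (the choice of α via sorting is ours; any member of Γ will do):

def mealy_to_moore(Sigma, Gamma, states, s0, delta, omega):
    alpha = sorted(Gamma)[0]          # an (arbitrary) member of Gamma
    new_states = {(s, b) for s in states for b in Gamma}
    # delta'(<s, b>, a) = <delta(s, a), omega(s, a)>: the second component
    # remembers which symbol the Mealy machine would have printed.
    delta_prime = {((s, b), a): (delta[(s, a)], omega[(s, a)])
                   for (s, b) in new_states for a in Sigma}
    # omega'(<s, b>) = b
    omega_prime = {(s, b): b for (s, b) in new_states}
    return new_states, (s0, alpha), delta_prime, omega_prime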

Since every Mealy machine has an equivalent Moore machine and every Moore machine has an
equivalent Mealy machine, either construct can be used as the basis for what is meant by a translation f
being finite transducer definable.

Corollary 7.8 A translation f is FTD iff f can be defined by a FST M iff f can be defined by a MSM A.
Proof. The proof is immediate from the definition of FTD and Theorems 7.7 and 7.8.

Example 7.12
Consider the Mealy machine B from Figure 7.3. The corresponding Moore machine A = 〈Σ, Γ, S, q 0 , δ, ω〉
is given by

Figure 7.8: The state transition diagram for the Moore machine A in Example 7.12

Σ = {a , b }
Γ = {0 , 1 }
S = {〈s 0 , 0 〉, 〈s 0 , 1 〉, 〈s 1 , 0 〉, 〈s 1 , 1 〉}
q 0 = 〈s 0 , 1 〉
ω(〈s 0 , 0 〉) = 0 , ω(〈s 0 , 1 〉) = 1 , ω(〈s 1 , 0 〉) = 0 , ω(〈s 1 , 1 〉) = 1

and the state transition table is specified as in Table 7.4.

Table 7.4
δ a b
〈s 0 , 0 〉 〈s 0 , 0 〉 〈s 1 , 0 〉
〈s 0 , 1 〉 〈s 0 , 0 〉 〈s 1 , 0 〉
〈s 1 , 0 〉 〈s 0 , 1 〉 〈s 1 , 1 〉
〈s 1 , 1 〉 〈s 0 , 1 〉 〈s 1 , 1 〉

Figure 7.8 displays this new Moore machine. Note that this transducer A, except for the placement
of the start state, looks very much like the Moore machine C given in Figure 7.6. Indeed, any ordered
pair that is labeled with the original start state would be an acceptable choice for the new start state in
the corresponding Moore machine. For example, the automaton A′, which is similar to A but utilizes
〈s 0 , 0 〉 as the new start state, is another Moore machine that is equivalent to the original Mealy machine
B . The transition diagram for A′ is shown in Figure 7.9. In fact, by appropriately recasting the defini-
tion of isomorphism so that it applies to Moore sequential machines, it can be shown that A′ and C are
isomorphic. The definition of isomorphic again guarantees that a renaming of the states can be found
that preserves start states, transition functions, and output functions. Indeed, the definition of isomor-
phism agrees with that of Mealy machines (and of DFAs, for that matter) except in the specification of
the correspondence between output functions. The formal definition is given below.

Figure 7.9: The state transition diagram for the Moore machine A′ in Example 7.12

Definition 7.21 Given two MSMs

A = 〈Σ, Γ, S A , s 0 A , δ A , ω A 〉 and B = 〈Σ, Γ, S B , s 0B , δB , ωB 〉,

and a function µ: S A → S B , µ is called a Moore machine isomorphism from A to B iff the following five
conditions hold:

i. µ(s 0 A ) = s 0B .

ii. (∀s ∈ S A )(∀a ∈ Σ)(µ(δ A (s, a )) = δB (µ(s), a )).

iii. (∀s ∈ S A )(ω A (s) = ωB (µ(s))).

iv. µ is a one-to-one function between S A and S B .

v. µ is onto S B .

Example 7.13
The two Moore machines A′ in Figure 7.9 and C in Figure 7.6 are indeed isomorphic. There is a func-
tion µ from the states of A′ to the states of C that satisfies all five properties of an isomorphism. This
correspondence is given by µ(〈s 0 , 0 〉) = r 0 , µ(〈s 0 , 1 〉) = r 1 , µ(〈s 1 , 0 〉) = r 2 , and µ(〈s 1 , 1 〉) = r 3 , succinctly
defined by µ(〈s i , j 〉) = r 2i + j for i , j ∈ {0 , 1 }. As before, a homomorphism is meant to represent a corre-
spondence between states that preserves the algebraic structure of the transducer without necessarily
being a bijection.

Definition 7.22 Given two MSMs

A = 〈Σ, Γ, S A , s 0 A , δ A , ω A 〉 and B = 〈Σ, Γ, S B , s 0B , δB , ωB 〉,

and a function µ: S A → S B , µ is called a Moore machine homomorphism from A to B iff the following
three conditions hold:

i. µ(s 0 A ) = s 0B .

ii. (∀s ∈ S A )(∀a ∈ Σ)(µ(δ A (s, a )) = δB (µ(s), a )).

iii. (∀s ∈ S A )(ω A (s) = ωB (µ(s))).

The isomorphism µ discussed in Example 7.13 is also a homomorphism. Preserving the algebraic
structure of the transducer guarantees that the translation is also preserved: if A and B are homomor-
phic, then they are equivalent. The homomorphism criterion that applies to single letters once again
extends to similar statements about strings, as outlined in Lemma 7.5.

Lemma 7.5 If µ: S A → S B is a homomorphism between two MSMs

A = 〈Σ, Γ, S A , s 0 A , δ A , ω A 〉 and B = 〈Σ, Γ, S B , s 0B , δB , ωB 〉,

then
(∀s ∈ S A )(∀x ∈ Σ∗ )(µ(δ A (s, x)) = δB (µ(s), x))
and
(∀s ∈ S A )(∀x ∈ Σ∗ )(ω A (s, x) = ωB (µ(s), x)).
Proof. The proof is by induction on |x| (see the exercises).

Corollary 7.9 If µ: S A → S B is a homomorphism between two MSMs A = 〈Σ, Γ, S A , s 0 A , δ A , ω A 〉 and
B = 〈Σ, Γ, S B , s 0B , δB , ωB 〉, then A is equivalent to B ; that is, f A = f B .
Proof. The proof follows immediately from Lemma 7.5 and the definition of f M .

It is interesting to note that the MSMs A in Figure 7.8 and A′ in Figure 7.9 are not isomorphic. In
fact, there does not even exist a homomorphism (in either direction) between A and A′ since the start
states print different symbols, and rules (i) and (iii) therefore conflict. The absence of an isomorphism in
this instance illustrates that an analog to Theorem 7.5 cannot be asserted under the definition of Moore
sequential machines presented here. Observe that A and A′ are equivalent and they are both minimal
(four states are necessary in a Moore machine to perform this translation), yet they are not isomorphic.
The reader should contrast this failure with the analogous statement about Mealy machines in Theorem
7.5.
Producing a result comparable to Theorem 7.5 is not possible without a fundamental adjustment of
at least one of the definitions. One possibility is to drop the distinguished start state from the defini-
tion of the Moore machine. This removes condition (i) from the isomorphism definition and thereby
resolves the conflict between (i) and (iii). We have already noted that many applications do not require a
distinguished start state (such as elevators and traffic signal controls), which makes this adjustment not
altogether unreasonable.
A more common alternative is to decree that a Moore sequential machine first print the character
specified by the start state upon being turned on (before any of the input tape is read) and then proceed
as before. This results in output strings that are always one symbol longer than the corresponding input
strings, and the length-preserving property of transducers is thereby lost. A more substantial drawback
results from the less natural correspondence between Mealy and Moore machines: no FST can be truly
equivalent to any MSM since translations would not even be of the same length.
The advantage of this decree is that machines like A and A′ (from Figures 7.8 and 7.9) would no
longer be equivalent, and hence they would not be expected to be isomorphic. Note that equivalence is

lost since, under the new decree for translations, they would produce different output when presented
with, say, λ as input: A would print 1 while A′ would produce 0 . Our definition of a MSM (Definition 7.16)
was chosen to remain compatible with the translations obtained from Mealy machines and to preserve
a distinguished state as the start state; these advantages were obtained at the expense of a convenient
analog to Theorem 7.5.
A third, and perhaps the best, alternative is to modify what we mean by a MSM isomorphism. Def-
inition 7.21 can be rephrased to relax the condition that the start states of the two machines must print
the same character.
As with Mealy machines, Moore machines can also be minimized, and a reduced and connected
MSM is guaranteed to be the smallest MSM which performs that translation. Note that Definitions
7.4 (FTD), 7.5 (equivalence), 7.9 (isomorphic), 7.10 (connected), 7.12 (state equivalence relation), 7.13
(reduced), and 7.15 (i th relation) have been phrased to encompass both forms of transducers. Minor
changes (generally involving the domain of the output function) are all that is necessary to make the re-
maining definitions and results conform to the Moore constructs. We begin with a formal definition of
minimality, which is in essence the same as the definitions presented for DFAs and FSTs (Definitions 2.7
and 7.6).

Definition 7.23 Given a MSM A = 〈Σ, Γ, S A , s 0 A , δ A , ω A 〉, A is the minimal Moore machine for the transla-
tion f A iff for all MSMs B = 〈Σ, Γ, S B , s 0B , δB , ωB 〉 for which f A = f B , ‖S A ‖ ≤ ‖S B ‖.

A connected Moore machine is essential to minimality. The previous definition of connectedness


(Definition 7.10) suffices for both FSTs and MSMs and was therefore phrased to apply to all transducers,
rather than to one specific type of transducer. For an arbitrary Moore machine, the algorithm for finding
the set of accessible states is unchanged; transitions are followed from the start state until no further
new states are found. The connected version of a MSM is again obtained by paring down the state set to
encompass only the connected states and restricting the δ and ω functions to the smaller domain.

Definition 7.24 Given a MSM M = 〈Σ, Γ, S, s 0 , δ, ω〉, define the transducer M c = 〈Σ, Γ, S c , s 0c , δc , ωc 〉, called
M connected, by

S c = {s ∈ S | ∃x ∈ Σ∗ ∋ δ(s 0 , x) = s}
s 0c = s 0
δc is essentially the restriction of δ to S c × Σ: (∀a ∈ Σ)(∀s ∈ S c )(δc (s, a ) = δ(s, a )), and ωc is the restriction
of ω to S c : (∀s ∈ S c )(ωc (s) = ω(s)).

The concept of a reduced Moore machine and the definition of the state equivalence relation are
identical in spirit and in form to those presented for Mealy machines (Definitions 7.12 and 7.13). The
definition that outlines how to reduce a Moore machine by coalescing states differs from that given for
FSTs (Definition 7.14) only in the specification of the output function. In both Definition 7.14 and the
following Moore machine analog, the value ω takes for an equivalence class is determined by the value
given for a representative of that equivalence class. As before, this natural definition for the output func-
tion can be shown to be well defined (see the exercises).

Definition 7.25 Given a MSM M = 〈Σ, Γ, S, s 0 , δ, ω〉, define M /E M , M modulo its state equivalence relation,
by M /E M = 〈Σ, Γ, S E M , s 0E M , δE M , ωE M 〉, where

S E M = {[s]E M | s ∈ S}
s 0E M = [s 0 ]E M

δE M is defined by
(∀a ∈ Σ)(∀[s]E M ∈ S E M )(δE M ([s]E M , a ) = [δ(s, a )]E M ),

and ωE M is defined by
(∀[s]E M ∈ S E M )(ωE M ([s]E M ) = ω(s))

The Moore machine M /E M has all the properties attributed to the Mealy version. Without changing
the nature of the translation, it is guaranteed to produce a MSM which is reduced.

Theorem 7.9 Given a MSM M = 〈Σ, Γ, S, s 0 , δ, ω〉:

a. M /E M = 〈Σ, Γ, S E M , s 0E M , δE M , ωE M 〉 is equivalent to M .

b. M /E M is reduced.

c. If M is connected, so is M /E M .

d. Given a FTD function f , the minimal Moore machine corresponding to f must be reduced.

Proof. The proof is similar to that of Theorem 7.4 (see the exercises).

As mentioned earlier, the definition of a MSM chosen here denies a convenient analog to Theorem
7.5. However, a reduced and connected Moore machine must be minimal.

Theorem 7.10 (a) Given a MSM M , a necessary and sufficient condition for M to be minimal is that M is
both reduced and connected.

(b) Given a MSM M , M c /E M c is minimal.


Proof. See the exercises.

The minimal Moore machine corresponding to a MSM M can thus be obtained if the connected state
set and the state equivalence relation can be computed. The algorithm for calculating the accessible
states is the same as before, and computing the state equivalence relation will again be accomplished
using the concept of the i th state equivalence relation (Definition 7.15). All the results proved previously
in Lemma 7.3 still hold, showing that successive calculations are guaranteed to halt and produce E M . All
that remains is to specify both a starting point and a way to find the next relation from the current E i M .
With Mealy machines, E 0M consisted of one single equivalence class, since λ could not distinguish
between states. All states were therefore related to each other under E 0M . With Moore machines, dif-
ferent states cause different letters to be printed. E 0M can therefore be thought of as grouping together
states that print the same symbol.

Lemma 7.6 Given a MSM M = 〈Σ, Γ, S, s 0 , δ, ω〉:

(a) E 0M is defined by sE 0M t ⇔ (ω(s) = ω(t )).

(b) For i ≥ 0, E i +1M can be computed from E i M as follows:

(∀s ∈ S)(∀t ∈ S)(∀i ≥ 0)(sE i +1M t ⇔ sE i M t ∧ (∀a ∈ Σ)(δ(s, a ) E i M δ(t , a ))).

Proof. The proof is essentially the same as in Chapter 3 (see Theorem 3.8).

Corollary 7.10 Given a MSM M = 〈Σ, Γ, S, s 0 , δ, ω〉, there is an algorithm for computing E M .
Proof. See the exercises.

E 0M will generally have one equivalence class for each symbol in Γ; rk(E 0M ) could be less than ‖Γ‖ if
some output symbols are not printed by any state (remember that equivalence classes are by definition
nonempty). The rule for computing E i +1M from E i M is identical to that given for Mealy machines (and
DFAs); only the starting point, E 0M , had to be redefined for Moore machines (compare with Lemma 7.4).
Lemmas 7.3 and 7.6 imply that there is an algorithm for finding E M for any Moore machine M ; this was
the final computation needed to produce M c /E M c , which will be the minimal Moore machine equivalent
to the MSM M .
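Only the initialization step changes in code. A brief sketch (same illustrative dictionary encoding as in the earlier fragments): start from the partition below, then refine exactly as in the Mealy minimization sketch given after Corollary 7.7.

def e0_partition(states, omega):
    # Lemma 7.6a: E_0M groups states that print the same symbol; classes
    # arise only for symbols some state actually prints, so rk(E_0M) <= ||Gamma||.
    classes = {}
    for s in states:
        classes.setdefault(omega[s], set()).add(s)
    return list(classes.values())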

Corollary 7.11 Given a MSM M = 〈Σ, Γ, S, s 0 , δ, ω〉, there is an algorithm for computing the minimal ma-
chine equivalent to M.
Proof. See the exercises.

7.4 Transducer Applications and Circuit Implementation


The vending machine example that began this chapter showed that the transducer was capable of mod-
eling many of the machines we deal with in everyday life. This section gives examples of several types
of applications and then shows how to form the circuitry that will implement such transducers. Trans-
ducers can be used not only to model physical machinery but also to form the basis for computational
algorithms. The following example can be best thought of not as a model of a machine that receives files,
but as a model of the behavior of the computer algorithm that specifies how such files are to be received.

Example 7.14
The transducer metaphor is often used to succinctly describe the structure of many algorithms com-
monly used in computer applications, most notably in network communications. Kermit is a popular
means of transferring files between mainframes and microcomputers. A transfer is accomplished by the
send portion of Kermit on the source host exchanging information with the receive portion of Kermit
on the destination host. The two processes communicate by exchanging packets of information; these
packets comprise the input alphabet of our model. When the Kermit protocol was examined in Chapter
1 (Example 1.16), it was noted that a full description of the algorithm must also describe the action taken
upon receipt of an incoming packet; these actions comprise the output alphabet of our model. During a
file transfer, the states of the receiving portion of Kermit on the destination host are R (awaiting a transfer
request), RF (awaiting the name of the file to be transferred), RD (awaiting more data to be placed in the
new file), and A (abort due to an unrecoverable error). The set of states will again be {A, R, RD, RF }.
Expected inputs are represented by S (an initialization packet, indicating that a transfer is requested),
H (a header packet, containing the name of one of the files to be created and opened), D (a data packet),
Z (an end of file (EOF) packet marker, signaling that no more data need be placed in the currently
opened file), and B (break, signaling the end of transmission). Unexpected input, representing a garbled
transmission, is denoted by X . The input alphabet is therefore Σ = {B , D , H , S , X , Z }.
When Kermit receives a recognizable packet [that is also appropriate at the current stage of the con-
versation], it sends an acknowledgment (ACK) back to the other host. This action will be represented in
the output alphabet by the symbol Y . When the receiver expects and gets a valid header packet, it opens
the appropriate file and also acknowledges the packet. This pair of actions is represented by the output

Figure 7.10: The state transition diagram for the receive portion of Kermit, as discussed in Example 7.14

symbol O . W will denote the writing of the packet contents to the opened file and acknowledgment of
the packet, and ϕ will denote that no action is taken. C will indicate that the currently opened file is
closed. N will represent the transmission of a NAK (negative acknowledgment, essentially a request to
resend the contents of the previous packet), which is used to alert the sender that a garbled packet was
detected. The output alphabet is therefore Γ = {N , O , W , Y , ϕ}. The complete algorithm is summed up in
the state transition diagram given in Figure 7.10.
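The flavor of the diagram can be conveyed by a table-driven sketch in Python. The entries below follow the narrative just given, but Figure 7.10 is the authoritative description; treat every transition not spelled out in the text (in particular, where Z and B lead, and the handling of unexpected packets) as our assumption.

# Partial Mealy encoding: {(state, packet): (action, next state)}.
KERMIT = {
    ("R",  "S"): ("Y", "RF"),   # transfer request: acknowledge, await header
    ("RF", "H"): ("O", "RD"),   # header: open the named file and acknowledge
    ("RD", "D"): ("W", "RD"),   # data packet: write contents, acknowledge
    ("RD", "Z"): ("C", "RF"),   # EOF: close the file, await another header (assumed)
    ("RF", "B"): ("Y", "R"),    # break: acknowledge the end of transmission (assumed)
    ("R",  "X"): ("N", "R"),    # garbled packet: send a NAK and stay put
    ("RF", "X"): ("N", "RF"),
    ("RD", "X"): ("N", "RD"),
}

def receive(packets, state="R"):
    actions = []
    for p in packets:
        # Any packet without an entry is treated as unrecoverable (assumed).
        action, state = KERMIT.get((state, p), ("ϕ", "A"))
        actions.append(action)
    return "".join(actions), state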
Hardware as well as software can be profitably modeled by finite-state transducers. The column-
by-column addition of two binary numbers is quite naturally modeled by a simple two-state FST, since
the carry bit is the only piece of previous history needed by the transducer to correctly sum the current
column. This discussion will focus on binary numbers in order to keep the alphabets small, but trivial
extensions will make the two-state machine apply to addition in any base system.

Example 7.15

A computation such as the one shown in Figure 7.11a would be divided up into columns and presented to
the FST as indicated in Figure 7.11b (shown in mid-computation). A digit from the first number and the
corresponding digit from the second number are presented to the transducer as a single input symbol.
With the column pairs represented by standard ordered pairs, the corresponding input might appear as
in Figure 7.12 (shown at the start of computation). As illustrated by the orientation of the tape, this FST
must be set up to process strings in reverse, that is, from right to left, since computations must start with
the low-order bits to ensure that the correct answer is always (deterministically) computed. With states
C (representing carry) and N (no carry), input alphabet Σ = {〈0, 0〉, 〈0, 1〉, 〈1, 0〉, 〈1, 1〉} and output alphabet
Γ = {0 , 1 }, this binary adder behaves as shown in the state transition diagram given for B in Figure 7.13.
For the problem displayed in Figure 7.11a, the output produced by B would be 01001 (9 in binary), which
is the appropriate translation of the addition problem given (6 + 3).
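A software simulation of B makes the right-to-left processing explicit. The function below is a minimal sketch (the string encoding of the tapes and the function name are our own); the two states C and N are represented by a carry bit.

def binary_add(x, y):
    # Simulate B on two equal-length binary strings, scanning the column
    # pairs <x_i, y_i> from right to left.  State N is carry = 0, C is carry = 1.
    carry, out = 0, []
    for p, q in zip(reversed(x), reversed(y)):
        total = int(p) + int(q) + carry
        out.append(str(total % 2))     # omega: the sum bit for this column
        carry = total // 2             # delta: move to C or back to N
    return "".join(reversed(out)), carry == 1   # True signals overflow

print(binary_add("00110", "00011"))    # ('01001', False): 6 + 3 = 9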
Unfortunately, addition is not truly length preserving; adding the three-digit numbers 110 and 011
produces a binary answer that is four digits long. The adder B defined in Example 7.15 cannot correctly
reflect a carry out of the most significant binary position. While the concept of final states is not present
in our formal definition of transducers, this FST B provides an example in which it is natural to both

Figure 7.11: (a) The addition problem discussed in Example 7.15 (b) Conceptual model of the binary
adder discussed in Example 7.15

Figure 7.12: The binary adder discussed in Example 7.15

Figure 7.13: The state transition diagram for a binary adder modeled as a Mealy machine, as discussed
in Example 7.15

produce continuous output and track the terminal state: if a computation ends in state C , then we know
that an overflow condition has occurred. B clearly operates correctly on all strings that have been padded
with 〈0, 0〉 as the last (leftmost) symbol; employing such padding is reminiscent of the use of the <EOS>
symbol when building circuits for DFAs. Indeed, it might be profitable to specifically include an <EOS>
symbol and have the transducer react to <EOS> by printing a y or n to indicate whether or not there was
overflow.
While the binary adder is only one small component of a computer, finite-state transducers can be
profitably used to model complete systems; one such application involves traffic lights. The controller
for a large intersection may handle eight banks of traffic signals for the various straight-ahead and left-
turn lanes, as well as four sets of walk lights (see the exercises). Input about the intersection conditions is
often fed to the controller from pedestrian walk buttons and metal detectors embedded in the roadway.
For simplicity, we will choose a simplified intersection to illustrate how to model a traffic controller by
a transducer. The simplified example nevertheless incorporates all the essential features of the more
intricate intersections. A full-blown model would only require larger alphabets and more states.

Example 7.16
Consider a small north-south street that terminates as it meets a large east-west avenue, as shown in
Figure 7.14. Due to the heavy traffic along the avenue, the westbound traffic attempting to turn left is
governed by a left-turn signal (signal 2 in Figure 7.14). Traffic continuing west is controlled by signal 1,
while signal 3 governs eastbound traffic. Vehicles entering the intersection from the south rely on signal
4. The red, yellow, and green lights of these four signals represent the output of the transducer. Protecting
westbound traffic while turning left is accomplished by an output configuration of 〈G,G, R, R〉, which is
meant to indicate that the first two signals are green while the eastbound and northbound lanes have red
lights. The output alphabet can thus be represented by ordered foursomes of R, Y , and G (red, yellow,
and green). We can succinctly define

Γ = {R, Y ,G} × {R, Y ,G} × {R, Y ,G} × {R, Y ,G},

though there will be some combinations (like 〈G,G,G,G〉) that are not expected to appear in the model.
The most prevalent output configuration is expected to be 〈G, R,G, R〉, which allows unrestricted
flow of the east-west traffic on the avenue. Due to the relatively small amount of traffic on the north-
south street, the designers of the intersection chose to embed the sensors α in the left-turn lane and β
in the northbound lane and only depart from the 〈G, R,G, R〉 configuration when a vehicle is sensed by
these detectors. There is therefore a pair of inputs to our transducer, indicating the status of sensor α

Figure 7.14: The intersection discussed in Example 7.16

and sensor β. The four combinations will be represented by 〈0, 0〉 (no traffic above either sensor), 〈1, 0〉
(sensor α active), 〈0, 1〉 (sensor β active), and 〈1, 1〉 (both detectors have currently sensed vehicles).
The controller is most naturally modeled by a Moore machine, since the state of the system is so
intimately tied to the status of the four lights. From the configuration 〈G, R,G, R〉, activation of the β
sensor signifies that all traffic should be stopped except that governed by signal 4. The output should
therefore move through the pattern 〈Y , R, Y , R〉 to 〈R, R, R,G〉 and remain in that state until the β sensor
is deactivated. This and the other transitions are illustrated in Figure 7.15.
In actuality, the duration of patterns incorporating the yellow caution light is shorter than that of others.
With the addition of extra states, a clock cycle length on the order of 5 seconds (commensurate with the
typical length of a yellow light) could be used to govern the length of the different output configurations.
For example, incorporating s 8 as shown in Figure 7.16 guarantees that the output 〈R, R, R,G〉 will per-
sist for at least two cycles (10 seconds). From an engineering standpoint, complicating the finite-state
control in this manner can be avoided by varying the clock cycle length.
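To suggest how such a controller might be coded, here is one plausible Python fragment of the Moore machine, restricted to the β-sensor cycle described above; the state names s0, s1, s2 and any transition not spelled out in the text are assumptions on our part, since the authoritative diagram is Figure 7.15.

# Output function: each state prints one signal configuration.
OUT = {"s0": ("G", "R", "G", "R"),   # unrestricted east-west flow
       "s1": ("Y", "R", "Y", "R"),   # caution before stopping the avenue
       "s2": ("R", "R", "R", "G")}   # northbound traffic released

def step(state, alpha, beta):
    # Transition function for the beta cycle only (assumed; alpha plays no
    # role in this fragment, though it drives the left-turn cycle in Figure 7.15).
    if state == "s0":
        return "s1" if beta else "s0"
    if state == "s1":
        return "s2"                   # the caution pattern lasts one clock cycle
    return "s2" if beta else "s0"     # hold until the beta sensor deactivates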
We now discuss some of the hardware that comprises the heart of traffic controllers and vending ma-
chines. As was done with deterministic finite automata in Chapter 1 and nondeterministic finite au-
tomata in Chapter 4, finite-state transducers can be implemented with digital logic circuits. We again
use a clock pulse, D flip-flops, and an encoding for the states. Besides needing an encoding for the input
alphabet, it is now necessary to have an encoding for the output alphabet, which will be represented by
the bits w 1 , w 2 , w 3 , . . .. We again suggest (solely for simplicity and standardization in the exercises) order-
ing the symbols in Γ alphabetically and assigning binary codes in ascending order, as was recommended
earlier for Σ. We must construct a circuit for generating each w j , in the same manner as we built circuits
implementing the accept function for finite automata.
Many practical applications of FSTs (such as traffic signals) operate continuously, rather than starting
and stopping for one small string. In such cases, an <EOS> symbol is not necessary; the circuit operates
until power is shut off. Similarly, an <SOS> symbol is not essential for a traffic signal complex; upon
resuming operation after a power failure, it is usually immaterial whether east-west traffic first gets a
green light or whether it gets a red light in deference to the north-south traffic. In contrast, it is important
for vending machines to initialize to the proper state or some interesting discounts could be obtained by
playing with the power cord.

Figure 7.15: The state transition diagram for a stoplight modeled as a Moore machine, as discussed in
Example 7.16

Figure 7.16: The modified controller discussed in Example 7.16

Figure 7.17: The state transition diagram for the Mealy machine in Example 7.17

Table 7.5a Table 7.5b

t 1 a 1 t 1 ′ t 1 a 1 w 1
1 1 1 1 1 0
0 1 0 0 1 1
1 0 0 1 0 1
0 0 1 0 0 1

Example 7.17
Consider the FST displayed in Figure 7.17. If <EOS> and <SOS> are unnecessary, then the input alphabet
can be represented by a single bit a 1 , with a 1 = 0 representing c and a 1 = 1 representing d . Similarly, the
output alphabet can be represented by a single bit w 1 , with w 1 = 0 representing a and w 1 = 1 representing
b . The states can likewise be represented by a single bit t 1 , with t 1 = 0 representing s 0 and t 1 = 1 repre-
senting s 1 . As before, we can construct a truth table to represent the state transition function, defining t 1 ′
in terms of t 1 and a 1 . The complete table is given in Table 7.5a.
The principal disjunctive normal form for the transition function is therefore seen to be t 1 ′ = (t 1 ∧
a 1 ) ∨ (¬t 1 ∧ ¬a 1 ). The output function can be found in a similar manner, as shown in Table 7.5b.
Thus, w 1 = (t 1 ↑ a 1 ). As in Example 1.12, the circuit for t 1 ′ will be fed back into the D flip-flop(s); the
circuit for w 1 will form the output for the machine (replacing the acceptance circuit used in DFAs). The
complete network is shown in Figure 7.18. Note that we would want the output device to print on the
rising edge of the clock cycle, before the new value of t 1 propagates through the circuitry.
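The two Boolean functions can be checked by simulating one clock cycle per input bit in software. A small Python sketch (bit conventions as in the example; the function name and the sample input are ours):

def clock_cycle(t1, a1):
    # Table 7.5a: t1' = (t1 AND a1) OR (NOT t1 AND NOT a1), the D flip-flop input.
    t1_next = (t1 and a1) or ((not t1) and (not a1))
    # Table 7.5b: w1 = t1 NAND a1, printed on the rising edge.
    w1 = not (t1 and a1)
    return int(t1_next), int(w1)

t1, printed = 0, []                  # start in s0 (t1 = 0)
for a1 in (1, 0, 1, 1):              # the input string dcdd (c = 0, d = 1)
    t1, w1 = clock_cycle(t1, a1)
    printed.append(w1)
print(printed)                       # [1, 1, 0, 0]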
A larger output alphabet would require an encoding of several bits; each w j would have its own
network of gates, and the complete circuit would then simultaneously generate several bits of output
information. As in Chapter 1, additional states or input symbols will add bits to the other encoding

Figure 7.18: Circuit diagram for Example 7.17

schemes and add to the number of rows in the truth tables for δ and ω. Each additional state bit will also
require its own D flip-flop and a new truth table for its feedback loop. Each additional state bit doubles
the number of states that can be represented, which means that, as was the case with deterministic finite
automata, the number of flip-flops grows as the logarithm of the number of states.

Exercises
7.1. Let A = 〈Σ, Γ, S, s 0 , δ, ω〉 be a Mealy machine. Prove the following statements from Theorem 7.1:

(a) (∀x ∈ Σ∗ )(∀y ∈ Σ∗ )(∀t ∈ S)(ω(t , y x) = ω(t , y) · ω(δ(t , y), x))


(b) (∀x ∈ Σ∗ )(∀y ∈ Σ∗ )(∀s ∈ S)(δ(s, y x) = δ(δ(s, y), x))

7.2. Refer to Lemma 7.1 and prove:

(a) (∀s ∈ S A )(∀x ∈ Σ∗ )(µ(δ A (s, x)) = δB (µ(s), x))


(b) (∀s ∈ S A )(∀x ∈ Σ∗ )(ω A (s, x) = ωB (µ(s), x))

7.3. Prove Corollary 7.3.

7.4. Prove Corollary 7.4 by showing that a necessary and sufficient condition for a Mealy machine M to
be minimal is that M is both reduced and connected.

7.5. Show that any FTD function f must satisfy a “pumping lemma.”

(a) Devise the statement of a theorem that shows that the way any sufficiently long string is trans-
lated determines how an entire sequence of longer strings are translated.
(b) Prove the statement made in part (a).

7.6. In each of the following parts, you may assume the results in the preceding parts; for example, you
may assume parts (a) and (b) when proving (c).

(a) Prove Lemma 7.3a.


(b) Prove Lemma 7.3b.
(c) Prove Lemma 7.3c.
(d) Prove Lemma 7.3d.
(e) Prove Lemma 7.3e.

7.7. Given a FST M = 〈Σ, Γ, S, s 0 , δ, ω〉, prove the following statements from Lemma 7.4:

(a) E 0M has just one equivalence class, which consists of all of S.

(b) E 1M is defined by sE 1M t ⇔ (∀a ∈ Σ)(ω(s, a ) = ω(t , a )).
(c) (∀s ∈ S)(∀t ∈ S)(∀i ≥ 1)(sE i +1M t ⇔ sE i M t ∧ (∀a ∈ Σ)(δ(s, a ) E i M δ(t , a ))).

7.8. Prove Theorem 7.6 by showing that if A = 〈Σ, Γ, S, s 0 , δ, ω〉 is a Moore machine then

(∀x ∈ Σ∗ )(∀y ∈ Σ∗ )(∀t ∈ S)(ω(t , y x) = ω(t , y) · ω(δ(t , y), x))

7.9. Prove Theorem 7.7.

7.10. Prove Theorem 7.8.

7.11. Use Lemma 7.6 to find E C in Example 7.10.

7.12. Show that there is a homomorphism from the machine M in Example 7.11 to the machine B in
Example 7.2.

7.13. Prove that, in a FST M = 〈Σ, Γ, S, s 0 , δ, ω〉, (∀t ∈ S)(∀a ∈ Σ)(ω(t , a ) = ω(t , a )), where the left-hand
occurrence of ω denotes the extended output function and the right-hand occurrence denotes the
original output function.

7.14. Modify the vending machine in Example 7.1 so that it can return all the coins that have been in-
serted. Let r denote a new input that represents activating the coin return, and let a represent a
new output corresponding to the vending machine releasing all the coins in its temporary holding
area.

7.15. Given a FST M = 〈Σ, Γ, S, s 0 , δ, ω〉 and M /E M = 〈Σ, Γ, S E M , s 0E M , δE M , ωE M 〉, show that δE M is well de-
fined.

7.16. Given a FST M = 〈Σ, Γ, S, s 0 , δ, ω〉 and M /E M = 〈Σ, Γ, S E M , s 0E M , δE M , ωE M 〉, show that ωE M is well de-
fined.

7.17. Give an example that shows that requiring a FST M to be reduced is not a sufficient condition to
ensure that M is minimal.

7.18. Show that the function µ defined in the proof of Theorem 7.5 is well defined.

7.19. Given the function µ defined in the proof of Theorem 7.5, prove that µ is really an isomorphism; that
is:

(a) µ(s 01 ) = s 02
(b) (∀s ∈ S 1 )(∀a ∈ Σ)(µ(δ1 (s, a )) = δ2 (µ(s), a ))
(c) (∀s ∈ S 1 )(∀a ∈ Σ)(ω1 (s, a ) = ω2 (µ(s), a ))
(d) µ is a one-to-one function between S 1 and S 2 .
(e) µ is onto S 2 .

7.20. Consider a transducer that implements a “one-unit delay” over the alphabets Σ = {a , b } and Γ =
{a , b , x }. The first letter of the output string should be x , and the nth letter of the output string
should be the (n − 1)st letter of the input string (for n > 1). Thus, ω(abbab ) = xabba , and so on.

(a) Define a sextuple for a Mealy machine that will perform this translation.
(b) Draw a Mealy machine that will perform this translation.
(c) Define a sextuple for a Moore machine that will perform this translation.
(d) Draw a Moore machine that will perform this translation.

7.21. Consider the circuit diagram that would correspond to the vending machine in Example 7.1.

(a) Does there appear to be any reason to use an <EOS> symbol in the input alphabet? Explain.
(b) Does there appear to be any reason to use an <SOS> symbol in the input alphabet? Explain.
(c) How many encoding bits are needed for the input alphabet? Define an appropriate encoding
scheme.

(d) How many encoding bits are needed for the output alphabet? Define an appropriate encoding
scheme.
(e) How many encoding bits are needed for the state names? Define an appropriate encoding
scheme.
(f) Write the truth table and corresponding (minimized) Boolean function for t 2 . Try to make the
best possible use of the don’t-care combinations.
(g) Write the truth table and corresponding (minimized) Boolean function for w 2 . Try to make the
best possible use of the don’t-care combinations.
(h) Define the other functions and draw the complete circuit for the vending machine.

7.22. Consider the vending machine described in Exercise 7.14.

(a) Does there appear to be any reason to use an <EOS> symbol in the input alphabet? Explain.
(b) How many encoding bits are needed for the input alphabet? Define an appropriate encoding
scheme.
(c) How many encoding bits are needed for the output alphabet? Define an appropriate encoding
scheme.
(d) How many encoding bits are needed for the state names? Define an appropriate encoding
scheme.
(e) Write the truth table and corresponding (minimized) Boolean function for t 3 . Try to make the
best possible use of the don’t-care combinations.
(f ) Write the truth table and corresponding (minimized) Boolean function for w 3 . Try to make the
best possible use of the don’t-care combinations.
(g) Define the other functions and draw the complete circuit for the vending machine.

7.23. Use the standard encoding conventions to draw the circuit corresponding to the FST defined in
Example 7.2.

7.24. Use the standard encoding conventions to draw the circuit corresponding to the FST defined in
Example 7.6.

7.25. Use the standard encoding conventions to draw the circuit corresponding to the FST D defined in
Example 7.8.

7.26. Give an example that shows that requiring a FST M to be connected is not a sufficient condition to
ensure that M is minimal.

7.27. Consider a transducer that implements a “two-unit delay” over the alphabets Σ = {a , b } and Γ =
{a , b , x }. The first two letters of the output string should be xx , and the nth letter of the output string
should be the (n − 2)nd letter of the input string (for n > 2). Thus, ω(abbaba ) = xxabba , and so on.

(a) Define a sextuple for a Mealy machine that will perform this translation.
(b) Draw a Mealy machine that will perform this translation.
(c) Define a sextuple for a Moore machine that will perform this translation.
(d) Draw a Moore machine that will perform this translation.

7.28. (a) Give an example that shows that the conclusion of Theorem 7.5 can be false if M 1 is not re-
duced.
(b) What essential property of the proposed isomorphism µ is now absent?

7.29. (a) Give an example that shows that the conclusion of Theorem 7.5 can be false if M 1 is not con-
nected.
(b) What essential property of the proposed isomorphism µ is now absent?

7.30. (a) Give an example that shows that the conclusion of Theorem 7.5 can be false if M 2 is not re-
duced.
(b) What essential property of the proposed isomorphism µ is now absent?

7.31. (a) Give an example that shows that the conclusion of Theorem 7.5 can be false if M 2 is not con-
nected.
(b) What essential property of the proposed isomorphism µ is now absent?

7.32. (a) Give an example of a FST A for which A is not reduced and A 0 is not reduced.
(b) Give an example of a FST A for which A is not reduced and A 0 is reduced.

7.33. Complete the proof of Theorem 7.4 by showing:

(a) (∀y ∈ Σ∗ )(∀t ∈ S)(ω(t , y) = ωE M ([t ]E M , y)).


(b) M /E M is equivalent to M .
(c) M /E M is reduced.
(d) If M is connected, then M /E M is connected.

7.34. Let Σ = {0 , 1 } and Γ = {y , n }.

(a) Define f 1 (a 1 a 2 . . . a m ) = y m if a 1 = 1 , and let f 1 (a 1 a 2 . . . a m ) = n m otherwise. Thus, f 1 (10 ) = y y
and f 1 (0101 ) = nnnn . Demonstrate that f 1 is FTD.
(b) Define f 2 (a 1 a 2 . . . a m ) = y m if a m = 1 , and let f 2 (a 1 a 2 . . . a m ) = n m otherwise. Thus, f 2 (10 ) = nn
and f 2 (0101 ) = y y y y . Prove that f 2 is not FTD.

7.35. Let Σ = {a , b } and Γ = {0 , 1 }. Define f 3 (a 1 a 2 . . . a m ) to be the first m letters of the infinite sequence
0 1 00 1 000 1 0000 1 0⁵1 0⁶1 0⁷1 0⁸1 . . . . Thus, f 3 (ababababab ) = 0100100010 and f 3 (abbaa ) = 01001 .
Argue that f 3 is not FTD.

7.36. Assume f is FTD. Prove that (∀x ∈ Σn )(∀y ∈ Σ∗ )(∀z ∈ Σ∗ ) (the first n letters of f (x y) must agree with
the first n letters of f (xz)).

7.37. Consider an elevator in a building with two floors. Floor one has an up button u on the wall, floor
two has a down button d , and there are buttons labeled 1 and 2 inside the elevator itself. The four
actions taken by the elevator are close the doors, open the doors, go to floor 1, and go to floor 2.
Assume that an inactive elevator will attempt to close the doors. For simplicity, assume that the
model is not to incorporate sensors to test for improperly closed doors, nor are there buttons to
hold the doors open, and the like. Also assume that when the elevator arrives on a given floor the call
button for that floor is automatically deactivated (rather than modeling the shutoff as a component
of the output).

Figure 7.19: The intersection discussed in Exercise 7.39

(a) Define the input alphabet for this transducer (compare with Example 7.16).
(b) Define the output alphabet for this transducer.
(c) Define the Mealy sextuple that will model this elevator.
(d) Draw a Mealy machine that will model this elevator.
(e) Define the Moore sextuple that will model this elevator.
(f ) Draw a Moore machine that will model this elevator.
(g) Without using <EOS> or <SOS>, draw a circuit that will implement the transducer defined in
part (d).

7.38. Build a Mealy machine that will serve as a traffic signal controller for the intersection described in
Example 7.16.

7.39. Consider the intersection described in Example 7.16 with walk signals added to the north-south
crosswalks (only). As shown in Figure 7.19, there is an additional input sensor γ corresponding to
the pedestrian walk button and an additional component of the output that will always be in one
of two states (W for walk and D for don’t walk). There are walk buttons at each of the corners, but
they all trip the same single input sensor; similarly, the output for the walk light is displayed on each
corner, but they all change at once and can be modeled as a single component. Assume that if the
walk button is activated all traffic but that on the side street is stopped, and the walk lights change
from D to W . Further assume that the walk lights revert to D before the side street light turns to yellow.

(a) Define the new input and output alphabets.


(b) Draw a Moore machine that implements this scenario.

Figure 7.20: The intersection discussed in Exercise 7.40

(c) Draw a Mealy machine that implements this scenario.

7.40. Consider an intersection similar to that described in Example 7.16, as shown in Figure 7.20. There
are now four left-turn signals in addition to the four straight-ahead signals and additional input sen-
sors γ and δ for the other left-turn lanes. Assume that a normal alternation of straight-ahead traffic
is carried out, with no left turns indicated unless the corresponding sensor is activated. Further
assume that left-turn traffic will be allowed to precede the opposing traffic.

(a) Define the new input and output alphabets.


(b) Draw a Moore machine that implements this scenario.
(c) Draw a Mealy machine that implements this scenario.

7.41. Consider an adder similar to the one in Example 7.15, but which instead models addition in base 3.

(a) Define the input and output alphabets.


(b) Draw a Mealy machine that performs this addition.
(c) Draw a Moore machine that performs this addition.
(d) Draw a circuit that will implement the transducer built in part (b); use both <EOS> and <SOS>.

7.42. Consider an adder similar to the one in Example 7.15, but which models addition in base 10.

(a) Define the input and output alphabets.


(b) Define the sextuple of a Mealy machine that performs this addition (by indicating the output
and transitions by concise formulas, rather than writing out the 200 entries in the tables).
(c) Define the sextuple of a Moore machine that performs this addition.
(d) Draw a circuit that will implement the transducer built in part (b); use both <EOS> and <SOS>.

7.43. Consider a function f 4 implementing addition in a manner similar to the function described by the
transducer in Example 7.15, but that scans the characters (that is, columns of digits) from left to
right (rather than right to left as in Example 7.15). Argue that f 4 is not FTD.

7.44. Given a MSM M , prove the following statements from Theorem 7.9:

(a) M /E M is equivalent to M .
(b) M /E M is reduced.
(c) If M is connected, so is M /E M .

7.45. Given a FTD function f , prove that the minimal Moore machine corresponding to f must be re-
duced.

7.46. Given a MSM M , prove the following statements from Theorem 7.10:

(a) A necessary and sufficient condition for M to be minimal is that M is both reduced and con-
nected.
(b) M c /E M c is minimal.

7.47. Given MSMs A = 〈Σ, Γ, S A , s 0 A , δ A , ω A 〉 and B = 〈Σ, Γ, S B , s 0B , δB , ωB 〉, and a homomorphism µ: S A →


S B , prove the following statements from Lemma 7.5 and Corollary 7.9:

(a) (∀s ∈ S A )(∀x ∈ Σ∗ )(µ(δ A (s, x)) = δB (µ(s), x)).


(b) (∀s ∈ S A )(∀x ∈ Σ∗ )(ω A (s, x) = ωB (µ(s), x))).
(c) A is equivalent to B ; that is, f A = f B .

7.48. Prove Corollary 7.10.

7.49. Prove Corollary 7.11.

7.50. Given a FST M = 〈Σ, Γ, S, s 0 , δ, ω〉 and M /E M = 〈Σ, Γ, S E M , s 0E M , δE M , ωE M 〉 defined by

S E M = {[s]E M | s ∈ S}
s 0E M = [s 0 ]E M

δE M is defined by
(∀a ∈ Σ)(∀[s]E M ∈ S E M )(δE M ([s]E M , a ) = [δ(s, a )]E M )

and ωE M is defined by
(∀a ∈ Σ)(∀[s]E M ∈ S E M )(ωE M ([s]E M , a ) = ω(s, a ))

(a) Show that δE M is well defined.
(b) Show that ωE M is well defined.

7.51. Given a MSM M = 〈Σ, Γ, S, s 0 , δ, ω〉 and M /E M = 〈Σ, Γ, S E M , s 0E M , δE M , ωE M 〉 defined by

S E M = {[s]E M | s ∈ S}
s 0E M = [s 0 ]E M

δE M is defined by
(∀a ∈ Σ)(∀[s]E M ∈ S E M )(δE M ([s]E M , a ) = [δ(s, a )]E M )

and ωE M is defined by
(∀[s]E M ∈ S E M )(ωE M ([s]E M ) = ω(s))

(a) Show that δE M is well defined.


(b) Show that ωE M is well defined.

7.52. Consider the following assertion: If there is an isomorphism from A to B and A is connected, then
B must also be connected.

(a) Prove that this is true for isomorphisms between Mealy machines.
(b) Prove that this is true for isomorphisms between Moore machines.

7.53. Consider the following assertion: If there is an isomorphism from A to B and B is connected, then
A must also be connected.

(a) Prove that this is true for isomorphisms between Mealy machines.
(b) Prove that this is true for isomorphisms between Moore machines.

7.54. Consider the following assertion: If there is a homomorphism from A to B and A is connected, then
B must also be connected.

(a) Give an example of two Mealy machines for which this assertion is false.
(b) Give an example of two Moore machines for which this assertion is false.

7.55. Consider the following assertion: If there is a homomorphism from A to B and B is connected, then
A must also be connected.

(a) Give an example of two Mealy machines for which this assertion is false.
(b) Give an example of two Moore machines for which this assertion is false.

7.56. Assume A and B are connected FSTs and that there exists an isomorphism ψ from A to B and an
isomorphism µ from B to A. Prove that ψ = µ⁻¹ .

7.57. Assume A and B are FSTs and there exists an isomorphism ψ from A to B and an isomorphism µ
from B to A. Give an example for which ψ ≠ µ⁻¹ .

7.58. Give an example of a three-state MSM for which E 0A has only one equivalence class. Is it possible
for E 0A to be different from E 1A in such a machine? Explain.

7.59. (a) Give an example of a Mealy machine for which M is not connected and M /E M is not connected.
(b) Give an example of a Mealy machine for which M is not connected but M /E M is connected.

7.60. (a) Give an example of a Moore machine for which M is not connected and M /E M is not connected.
(b) Give an example of a Moore machine for which M is not connected but M /E M is connected.

7.61. For a homomorphism µ: S A → S B between two Mealy machines

A = 〈Σ, Γ, S A , s 0 A , δ A , ω A 〉 and B = 〈Σ, Γ, S B , s 0B , δB , ωB 〉,

prove (∀s, t ∈ S A )(µ(s)E B µ(t ) ⇔ sE A t ).

7.62. For a homomorphism µ: S A → S B between two Moore machines

A = 〈Σ, Γ, S A , s 0 A , δ A , ω A 〉 and B = 〈Σ, Γ, S B , s 0B , δB , ωB 〉,

prove (∀s, t ∈ S A )(µ(s)E B µ(t ) ⇔ sE A t ).

7.63. (a) Give an example of a FST for which A is not reduced and A c is not reduced.
(b) Give an example of a FST for which A is not reduced and A c is reduced.

7.64. (a) Give an example of a MSM for which A is not reduced and A c is not reduced.
(b) Give an example of a MSM for which A is not reduced and A c is reduced.

7.65. Isomorphism (≅) is a relation in the set of all Mealy machines.

(a) Prove that ≅ is a symmetric relation; that is, formally justify that if there is an isomorphism
from A to B then there is an isomorphism from B to A.
(b) Prove that ≅ is a reflexive relation.
(c) Show that if f and g are isomorphisms, then f ◦ g is also an isomorphism (whenever f ◦ g is
defined).
(d) From the results of parts (a), (b), and (c) given above, prove that ≅ is an equivalence relation
over the set of all Mealy machines.
(e) Show that homomorphism is not an equivalence relation over the set of all Mealy machines.

7.66. (a) Prove that ≅ is an equivalence relation in the set of all Moore machines.
(b) Show that homomorphism is not an equivalence relation over the set of all Moore machines.

7.67. Given a Mealy machine M = 〈Σ, Γ, S, s 0 , δ, ω〉, prove that there exists a homomorphism µ from M to
M /E M .

7.68. Given a Moore machine M = 〈Σ, Γ, S, s 0 , δ, ω〉, prove that there exists a homomorphism µ from M to
M /E M .

7.69. Consider the intersection presented in Example 7.16 and note that the construction presented in
Figure 7.15 prevents the transducer from leaving s 2 or s 6 while the appropriate sensor is active. The
length of time spent in each output configuration can be limited by replacing s 2 with a sequence
of states that ensures that the output configuration will change within, say, three clock cycles (this
is similar to the spirit in which s 8 was added). A similar expansion can be made with regard to s 6 .
While this would not be a likely problem if the side street were not heavily traveled, higher traffic
situations would require a different solution than that shown in Figure 7.15.

(a) Modify Figure 7.15 so that the output configuration can, when necessary, remain at 〈R, R, R,G〉
for three clock cycles, but not for four clock cycles.
(b) Starting with the larger transducer found in part (a), make a similar expansion to s 6 .
(c) Starting with the larger transducer found in part (a), make an expansion to s 6 in such a way
that the left-turn signal is guaranteed to be green for a minimum of two clock cycles and a
maximum of four clock cycles.

7.70. Consider the intersection presented in Example 7.16 and note that the construction presented in
Figure 7.15 prevents the transducer from returning to s 0 while either of the sensors is active. Thus,
even if the length of time spent in each output configuration was limited (see Exercise 7.69), left-
turn and northbound traffic could perpetually alternate without ever allowing the east-west traffic
to resume. This would not be a likely problem if the side street were not heavily traveled, but higher
traffic situations would require a different solution than the one presented in Example 7.16.

(a) Without adding any states to Figure 7.15, modify the state transition diagram so that east-west
traffic will receive a green light occasionally.
(b) By adding new states to Figure 7.15 (to remember the last lanes that had the right of way),
implement a controller that will ensure that no lane will get a second green light if any other
lane that has an active sensor has yet to receive a green light. (It may be helpful to think of the
east-west traffic as having an implicit sensor that is always actively demanding service).

7.71. Prove that if two Moore machines are homomorphic then they are equivalent.

7.72. Show that, for any FTD function f : Σ∗ → Σ∗ , DΣ is closed under f .

Chapter 8

Regular Grammars

In the preceding chapters, we have seen several ways to characterize the set of FAD languages: via DFAs,
NDFAs, right congruences, and regular expressions. In this chapter we will look at still another way to
represent this class, using the concept of grammars. This construct is very powerful, and many restric-
tions must be placed on the general definition of a grammar in order to limit the scope to FAD languages.
The very restrictive regular grammars will be explored in full detail in this chapter. The more robust
classes of grammars introduced here will be discussed at length in later chapters.

8.1 Overview of the Grammar Hierarchy


Much like the rules given in Backus-Naur Form (BNF) in Chapters 0 and 1, the language-defining power
of a grammar stems from the generation of strings through the successive replacement of symbols in a
partially constructed string. These replacement rules form the foundation for the definition of program-
ming languages and are used in compiler construction not only to determine correct syntax, but also
to help determine the meaning of the statements and thereby guide the translation of a program into
machine language.
Example 8.1
A BNF that describes the set of all valid FORTRAN identifiers is given below. Recall that such identifiers
must begin with a letter and be followed by no more than five other letters and numerals. These criteria
can be specified by the following set of rules.
S  ::= aS₁ | bS₁ | … | zS₁ | a | b | … | z
S₁ ::= aS₂ | bS₂ | … | zS₂ | 0S₂ | 1S₂ | … | 9S₂ | a | b | … | z | 0 | 1 | … | 9
S₂ ::= aS₃ | bS₃ | … | zS₃ | 0S₃ | 1S₃ | … | 9S₃ | a | b | … | z | 0 | 1 | … | 9
S₃ ::= aS₄ | bS₄ | … | zS₄ | 0S₄ | 1S₄ | … | 9S₄ | a | b | … | z | 0 | 1 | … | 9
S₄ ::= aS₅ | bS₅ | … | zS₅ | 0S₅ | 1S₅ | … | 9S₅ | a | b | … | z | 0 | 1 | … | 9
S₅ ::= a | b | … | z | 0 | 1 | … | 9
The first rule specifies that S can be replaced by any of the 26 letters of the Roman alphabet or any such letter followed by the token S₁. These productions (rules) do indeed define the variable names found in FORTRAN programs. Starting with S, a derivation might proceed as S ⇒ sS₁ ⇒ suS₂ ⇒ sum, indicating that sum is a valid FORTRAN identifier. Invalid identifiers, such as 2a, cannot be derived from these productions by starting with S.
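Checking a candidate identifier against these rules is a one-line pattern test. The following sketch is ours rather than the text's; it uses Python's re module, and the helper name is invented. It accepts exactly the strings derivable from S above.

    import re

    # One lowercase letter followed by at most five more letters or digits:
    # exactly the strings derivable from S in the BNF above.
    FORTRAN_ID = re.compile(r"[a-z][a-z0-9]{0,5}")

    def is_fortran_identifier(s: str) -> bool:
        return FORTRAN_ID.fullmatch(s) is not None

    assert is_fortran_identifier("sum")          # S => sS1 => suS2 => sum
    assert not is_fortran_identifier("2a")       # may not start with a numeral
    assert not is_fortran_identifier("abcdefg")  # seven symbols is one too many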
Example 8.2
The strings used to represent regular sets (see Chapter 6) could have been succinctly specified using BNF. Recall that regular languages over, say, {a, b, c} are described by regular expressions. These regular expressions were strings over the alphabet {∅, ε, a, b, c, ∪, ·, *, ), (}, and the formal definition was quite complex. A regular expression over {a, b, c} was defined to be a sequence of symbols formed by repeated application of the following rules:

i. a, b, c are each regular expressions.

ii. ∅ is a regular expression.

iii. ε is a regular expression.

iv. If R₁ and R₂ are regular expressions, then so is (R₁ · R₂).

v. If R₁ and R₂ are regular expressions, then so is (R₁ ∪ R₂).

vi. If R₁ is a regular expression, then so is R₁*.

The conditions set forth above could have instead been succinctly specified by the BNF shown below.

R ::= a | b | c | ε | ∅ | (R · R) | (R ∪ R) | R*
The following is a typical derivation, culminating in the regular expression (a · (c ∪ ε))*:

R ⇒ R*
  ⇒ (R · R)*
  ⇒ (a · R)*
  ⇒ (a · (R ∪ R))*
  ⇒ (a · (c ∪ R))*
  ⇒ (a · (c ∪ ε))*
Note that in the intermediate steps of the derivation, we do not wish to consider strings such as (a · R)* to be valid regular expressions. (a · R)* is not a string over the alphabet {∅, ε, a, b, c, ∪, ·, *, ), (}, and it does not represent a regular language over {a, b, c}. To generate a valid regular expression, the derivation must proceed until all occurrences of R are removed. To differentiate between the symbols that may remain and those that must be replaced, grammars divide the tokens into terminal symbols and nonterminal symbols, respectively.
The following notational conventions will be used throughout the remainder of the text. Members of Σ will be represented by lowercase roman letters such as a, b, c, d and will be referred to as terminal symbols. A new alphabet Ω will be introduced, and its members will be represented by uppercase roman letters such as A, B, C, and S; these will be called nonterminal symbols. S will often denote a special nonterminal, called the start symbol. The specification of the production rules will be somewhat different from the BNF examples given above. The common grammatical notation for rules such as S ::= aS₁ and S ::= bS₁ is S → aS₁ and S → bS₁. As with BNF, a convenient shorthand notation for a group of productions involves the use of the | (or) symbol. The productions Z → aaB, Z → ac, Z → cbT, which all denote replacements for Z, could be succinctly represented by Z → aaB | ac | cbT.
A production can be thought of as a replacement rule; that is, A → cdba indicates that occurrences of the (nonterminal) A can be replaced by the string cdba. For example, the string abBAdBc can be transformed into the string abBcdbadBc by applying the production A → cdba; we will write

abBAdBc ⇒ abBcdbadBc,

and say that abBcdbadBc was derived (in one step) from abBAdBc. Productions may be applied in succession; for example, if both A → cdba and B → efB were available, then the following modifications of the string abBAdBc would be possible: abBAdBc ⇒ abBcdbadBc ⇒ abefBcdbadBc ⇒ abefefBcdbadBc, and we might write abBAdBc ⇒̄ abefefBcdbadBc to indicate that abBAdBc can produce abefefBcdbadBc in zero or more steps (three steps in this case). Note that the distinction between ⇒ and ⇒̄ is reminiscent of the difference between the state transition functions δ and δ̄. As with the distinction between the transducer output functions ω and ω̄, the overbar is meant to indicate the result of successive applications of the underlying operation. The symbol ⇒* is often used in place of ⇒̄.
As illustrated by Example 8.1, several nonterminals may be used in the grammar. The set of nonterminals in the grammar given for FORTRAN identifiers was {S, S₁, S₂, S₃, S₄, S₅}. The start symbol designates which of these nonterminals should always be used to begin derivations.
The previous examples discussed in this section have illustrated all the essential components of a grammar. A grammar must specify the terminal alphabet, the set of intermediary nonterminal symbols, and the designated start symbol, and it must also enumerate the set of rules for replacing phrases within a derivation with other phrases. In the above examples, the productions have all involved the replacement of single nonterminals with other strings. In an unrestricted grammar, a general replacement rule may allow an entire string α to be replaced by another string β. Thus, aBcD → bcA would be a legal production, and whenever the sequence aBcD is found within a derivation it can be replaced by the shorter string bcA.
Definition 8.1 An unrestricted or type 0 grammar over an alphabet Σ is a quadruple G = 〈Ω, Σ, S, P〉, where:

Ω is a (nonempty) set of nonterminals.
Σ is a (nonempty) set of terminal symbols (and Ω ∩ Σ = ∅).
S is the designated start symbol (and S ∈ Ω).
P is a set of productions of the form α → β, where α ∈ (Ω ∪ Σ)∗ and β ∈ (Ω ∪ Σ)∗.
Example 8.3
Consider the grammar

G″ = 〈{A, B, S, T}, {a, b, c}, S, {S → aSBc, S → T, T → λ, TB → bT, cB → Bc}〉

A typical derivation, starting from the start symbol S, would be:

S ⇒ aSBc      (by applying S → aSBc)
  ⇒ aaSBcBc   (by applying S → aSBc)
  ⇒ aaTBcBc   (by applying S → T)
  ⇒ aabTcBc   (by applying TB → bT)
  ⇒ aabTBcc   (by applying cB → Bc)
  ⇒ aabbTcc   (by applying TB → bT)
  ⇒ aabbcc    (by applying T → λ)
Depending on how many times the production S → aSBc is used, this grammar will generate strings such as λ, abc, aabbcc, and aaabbbccc. The set of strings that can be generated by this particular grammar is {aⁱbⁱcⁱ | i ≥ 0}. In this sense, each grammar defines a language. Specifically, we require that derivations start with the designated start symbol and proceed until only members of Σ remain in the resulting string.

Definition 8.2 Given a grammar G = 〈Ω, Σ, S, P〉, the language generated by G, denoted by L(G), is given by L(G) = {x | x ∈ Σ∗ ∧ S ⇒* x}.
A language that can be defined by a type 0 grammar is called a type 0 language. Thus, as shown by the grammar G″ given in Example 8.3, L(G″) = {aⁱbⁱcⁱ | i ≥ 0} is a type 0 language.
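Since each production is nothing more than a substring-replacement rule, a derivation like the one above can be replayed mechanically. A minimal sketch (the helper and its name are our own, not the text's): it applies each production at the leftmost occurrence of its left side and reproduces the derivation of aabbcc in G″.

    def apply_leftmost(string: str, lhs: str, rhs: str) -> str:
        """One derivation step: replace the leftmost occurrence of lhs by rhs."""
        i = string.find(lhs)
        if i < 0:
            raise ValueError(f"{lhs!r} does not occur in {string!r}")
        return string[:i] + rhs + string[i + len(lhs):]

    # The derivation of aabbcc in G'' from Example 8.3; '' stands for λ.
    steps = [("S", "aSBc"), ("S", "aSBc"), ("S", "T"),
             ("TB", "bT"), ("cB", "Bc"), ("TB", "bT"), ("T", "")]
    current = "S"
    for lhs, rhs in steps:
        current = apply_leftmost(current, lhs, rhs)
    print(current)  # -> aabbcc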
The way grammars define languages is fundamentally different from the way automata define lan-
guages. An automaton is a cognitive device, in that it is used to directly decide whether a given string
should be accepted into the language. In contrast, a grammar is a generative device: the productions
specify how to generate all the words in the language represented by the grammar, but do not provide an
obvious means of determining whether a given string can be generated by those rules. There are many
applications in which it is important to be able to determine whether a given string can be generated
by a particular grammar, and the task of obtaining cognitive answers from a generative construct will be
addressed at several points later in the text. The reverse transformation, that is, producing an automa-
ton that recognizes exactly those strings that are generated by a given grammar, is addressed in the next
section.
The distinction between generative and cognitive approaches to representing languages has been
explored previously, when regular expressions were considered in Chapter 6. Regular expressions are
also a generative construct, in the sense that a regular expression can be used to begin to enumerate
the words in the corresponding regular set. As is the case with grammars, it is inconvenient to use reg-
ular expressions in a cognitive fashion: it may be difficult to tell whether a given string is among those
represented by a particular regular expression. Chapter 6 therefore explored ways to transform a regu-
lar expression into a corresponding automaton. It is likewise feasible to define corresponding automata
for certain grammars (see Lemma 8.2). However, Example 8.3 illustrated that some grammars produce
non-FAD languages and therefore cannot possibly be represented by deterministic finite automata. The
translation from a mechanical representation of a language to a grammatical representation is always
successful, in that every automaton has a corresponding grammar (Lemma 8.1). This result is similar to
Theorem 6.3, which showed that every automaton has a corresponding regular expression.
Note that in Example 8.3 the only production that specified that a string be replaced by a shorter string was T → λ. Consequently, the length of the derived string either increased or remained constant except where this last production was applied. Rules such as aBcD → beA, in which four symbols are replaced by only three, will at least momentarily decrease the length of the string. Such productions are called contracting productions. Grammars that satisfy the added requirement that no production may decrease the length of the derivation are called context sensitive. Such grammars cannot generate as many languages as the unrestricted grammars, but they have the added advantage of allowing derivations to proceed in a more predictable manner. Programming languages are explicitly designed to ensure that they can be represented by grammars that are context sensitive.
Definition 8.3 A pure context-sensitive grammar over an alphabet Σ is a quadruple G = 〈Ω, Σ, S, P〉, where:

Ω is a (nonempty) set of nonterminals.
Σ is a (nonempty) set of terminal symbols (and Ω ∩ Σ = ∅).
S is the designated start symbol (and S ∈ Ω).
P is a set of productions of the form α → β, where α ∈ (Ω ∪ Σ)∗, β ∈ (Ω ∪ Σ)∗, and |α| ≤ |β|.
In a derivation in a context-sensitive grammar, if S ⇒ x₁ ⇒ x₂ ⇒ ⋯ ⇒ xₙ, then we are assured that 1 = |S| ≤ |x₁| ≤ |x₂| ≤ ⋯ ≤ |xₙ|. Unfortunately, this means that in a pure context-sensitive grammar it is impossible to begin with the start symbol (which has length 1) and derive the empty string (which is of length 0).
Example 8.4
Languages that contain λ, such as {aⁱbⁱcⁱ | i ≥ 0} generated in Example 8.3 by the unrestricted grammar G″, cannot possibly be represented by a pure context-sensitive grammar. However, the empty string is actually the only impediment to finding an alternative collection of productions that all satisfy the condition |α| ≤ |β|. The language {aⁱbⁱcⁱ | i ≥ 1} can be represented by a pure context-sensitive grammar, as illustrated by the following grammar. Let G be given by

G = 〈{A, B, S, T}, {a, b, c}, S, {S → aSBc, S → aTc, T → b, TB → bT, cB → Bc}〉

The derivation to produce aabbcc would now be

S ⇒ aSBc     (by applying S → aSBc)
  ⇒ aaTcBc   (by applying S → aTc)
  ⇒ aaTBcc   (by applying cB → Bc)
  ⇒ aabTcc   (by applying TB → bT)
  ⇒ aabbcc   (by applying T → b)

The shortest string derivable by G is given by S ⇒ aTc ⇒ abc. In Example 8.3, the shortest derivation was S ⇒ T ⇒ λ.
Any pure context-sensitive grammar can be modified to include λ by adding a new start symbol Z and two new productions Z → λ and Z → S, where S was the original start symbol. Such grammars and their resulting languages are generally referred to as type 1 or context sensitive.
Definition 8.4 A context-sensitive or type 1 grammar over an alphabet Σ is either a pure context-sensitive grammar or a quadruple

G′ = 〈Ω ∪ {Z}, Σ, Z, P ∪ {Z → λ, Z → S}〉,

where G = 〈Ω, Σ, S, P〉 is a pure context-sensitive grammar and Z ∉ Ω ∪ Σ.
The only production α → β that violates the condition |α| ≤ |β| is Z → λ, and this production cannot play a part in any derivation other than Z ⇒ λ. From the start symbol Z, applying Z → λ immediately ends the derivation (producing λ), while applying Z → S will provide no further opportunity to use Z → λ, since the requirement that Z ∉ Ω ∪ Σ means that the other productions will never allow Z to reappear in the derivation. Thus, G′ enhances the generating power of G only to the extent that G′ can produce λ. Every string in L(G) can be derived from the productions of G′, and G′ generates no new strings besides λ. This argument essentially proves that L(G′) = L(G) ∪ {λ} (see the exercises).
Example 8.5
The language generated by G″ in Example 8.3 was L(G″) = {aⁱbⁱcⁱ | i ≥ 0}. Since L(G″) is {aⁱbⁱcⁱ | i ≥ 1} ∪ {λ}, it can therefore be represented by a context-sensitive grammar obtained by modifying the pure context-sensitive grammar in Example 8.4. Let G′ be given by

G′ = 〈{A, B, S, T, Z}, {a, b, c}, Z,
     {S → aSBc, S → aTc, T → b, TB → bT, cB → Bc, Z → λ, Z → S}〉

The derivation to produce aabbcc would now be

Z ⇒ S        (by applying Z → S)
  ⇒ aSBc     (by applying S → aSBc)
  ⇒ aaTcBc   (by applying S → aTc)
  ⇒ aaTBcc   (by applying cB → Bc)
  ⇒ aabTcc   (by applying TB → bT)
  ⇒ aabbcc   (by applying T → b)

This grammar does produce λ, but every other derivation is length-nondecreasing at each step. Note that this was not the case in the grammar G″ of Example 8.3: the last step of the derivation shown there transformed a string of length 7 into a string of length 6. G″ does not satisfy the definition of a context-sensitive grammar; even though only T could produce λ, T could occur late in the derivation, and the presence of T at later steps destroys the desirable property that the length of the derived string never decreases. Definition 8.4 is constructed to ensure that the start symbol Z can never appear in a later derivation step.
The restriction of productions to nondecreasing length reduces the number of languages that can
be generated; as discussed in later chapters, there exist type 0 languages that cannot be generated by
any type 1 grammar. The restriction also allows arguments about the derivation process to proceed by
induction on the number of symbols in the resulting terminal string and is crucial to the development of
normal forms for context-sensitive grammars.
We have already seen examples of different grammars generating the same set of words, as in the grammars G″ and G′ from Examples 8.3 and 8.5. The term context sensitive comes from the fact that context-sensitive languages (that is, type 1 languages) can be represented by grammars in which the productions are all of the form αBγ → αβγ, where a single nonterminal B is replaced by the string β in the context of the strings α on the left and γ on the right. Specialized grammars such as these, in which there are restrictions on the form of the productions, are examples of normal forms and are discussed later in the text.
If the productions in a grammar each replace a single nonterminal without regard to the surrounding context, then the grammar is called context free. In essence, this means that all productions are of the form A → β, where the left side is just a single nonterminal and the right side is an arbitrary string. The resulting languages are also called type 2 or context free.
Definition 8.5 A pure context-free grammar over an alphabet Σ is a quadruple G = 〈Ω, Σ, S, P〉, where:

Ω is a (nonempty) set of nonterminals.
Σ is a (nonempty) set of terminal symbols (and Ω ∩ Σ = ∅).
S is the designated start symbol (and S ∈ Ω).
P is a set of productions of the form A → β, where A ∈ Ω and β ∈ (Ω ∪ Σ)⁺.
Note that since the length of the left side of a context-free production is 1 and the right side cannot be empty, pure context-free grammars have no contracting productions and are therefore pure context-sensitive grammars. As with pure context-sensitive grammars, pure context-free grammars cannot generate languages that contain the empty string.

Definition 8.6 A context-free or type 2 grammar over an alphabet Σ is either a pure context-free grammar or a quadruple

G′ = 〈Ω ∪ {Z}, Σ, Z, P ∪ {Z → λ, Z → S}〉,

where G = 〈Ω, Σ, S, P〉 is a pure context-free grammar and Z ∉ Ω ∪ Σ.
Productions of the form C → β are called C-rules. As was done with context-sensitive grammars, this definition uses a new start symbol Z to avoid all such length-decreasing productions except for a single one of the form Z → λ, which is used only for generating the empty string. Type 2 languages will therefore always be type 1 languages. Note that the definition ensures that the only production that can decrease the length of a derivation must be the Z-rule Z → λ.
The grammar corresponding to the BNF given in Example 8.2 would be a context-free grammar, and thus the collection of all regular expressions is a type 2 language. The grammar given in Example 8.4 is not context free due to the presence of the production cB → Bc, but this does not yield sufficient evidence to claim that the resulting language {aⁱbⁱcⁱ | i ≥ 1} is not a context-free language. To support this claim, it must be shown that no type 2 grammar can generate this language. A pumping lemma for context-free languages will be presented in Chapter 10 to provide a tool for measuring the complexity of such languages. Just as there are type 1 languages that are not type 2, there are type 0 languages that are not type 1.
Note that even these very restrictive type 2 grammars can produce languages that are not FAD. As shown in Example 8.2, the language consisting of the collection of all strings representing regular expressions is context free. However, this collection is not FAD, since the pumping lemma (Theorem 2.3) shows that a DFA could not hope to correctly match up unlimited pairs of parentheses.
Consequently, even more severe restrictions must be placed on grammars if they are to have generative powers similar to the cognitive powers of a deterministic finite automaton. The type 3 grammars explored in the next section are precisely what is required. It will follow from the definitions that all type 3 languages are type 2. It is likewise clear that all type 2 languages must be type 1, and every type 1 language is type 0. Thus, a hierarchy of languages is formed, from the most restrictive type 3 languages to the most robust type 0 languages. The four classes of languages are distinct; there are type 2 languages that are not type 3 (for example, Example 8.2), type 1 languages that are not type 2 (see Chapter 9), and type 0 languages that are not type 1 (see Chapter 12).
8.2 Right-Linear Grammars and Automata
The grammatical classes described in Section 8.1 are each capable of generating all the FAD languages;
indeed, they even generate languages that cannot be recognized by finite automata. This section will
explore a class of grammars that generate the class of regular languages: every FAD language can be gen-
erated by one of the right-linear grammars defined below, and yet no right-linear grammar can generate
a non-FAD language.
Definition 8.7 A right-linear grammar over an alphabet Σ is a quadruple G = 〈Ω, Σ, S, P〉, where:

Ω is a (nonempty) set of nonterminals.
Σ is a (nonempty) set of terminal symbols (and Ω ∩ Σ = ∅).
S is the designated start symbol (and S ∈ Ω).
P is a set of productions of the form A → xB, where A ∈ Ω, B ∈ (Ω ∪ {λ}), and x ∈ Σ∗.

Right-linear grammars belong to the class of type 3 grammars and generate all the type 3 languages. Grammars that are right linear are very restrictive; only one nonterminal can appear, and it must appear at the very end of the expression. Consequently, in the course of a derivation, new terminals appear only on the right end of the developing string, and the only time the string might shrink in size is when a (final) production of the form A → λ is applied. A right-linear grammar may have several contracting productions that produce λ and may not strictly conform with the definition of a context-free grammar. However, Corollary 8.3 will show that every type 3 language is a type 2 language.
Right-linear grammars generate words in the same fashion as the grammars defined in Section 8.1.
The following definition of derivation is tailored to right-linear grammars, but it can easily be generalized
to less restrictive grammars (see Chapter 9).
Definition 8.8 Let G = 〈Ω, Σ, S, P〉 be a right-linear grammar, y ∈ Σ∗, and A → xB be a production in P. We will say that yxB can be directly derived from yA by applying the production A → xB, and we write yA ⇒ yxB. Furthermore, if

(x₁A₁ ⇒ x₂A₂) ∧ (x₂A₂ ⇒ x₃A₃) ∧ ⋯ ∧ (xₙ₋₁Aₙ₋₁ ⇒ xₙAₙ),

where xᵢ ∈ Σ∗ for i = 1, 2, …, n, Aᵢ ∈ Ω for i = 1, 2, …, n − 1, and Aₙ ∈ (Ω ∪ {λ}), then we will say that x₁A₁ derives xₙAₙ, and we write x₁A₁ ⇒* xₙAₙ.

While the symbol ⇒̄ might be more consistent with our previous extension notations, ⇒* is the symbol most commonly used in the literature.
Example 8.6
Let G₁ = 〈{T, S}, {a, b}, S, {S → aS, S → bT, T → aa}〉. Then S ⇒* aabaa, since by Definition 8.8, with x₁ = λ, x₂ = a, x₃ = aa, x₄ = aab, x₅ = aabaa, A₁ = A₂ = A₃ = S, A₄ = T, and A₅ = λ:

S ⇒ aS      (by applying S → aS)
  ⇒ aaS     (by applying S → aS)
  ⇒ aabT    (by applying S → bT)
  ⇒ aabaa   (by applying T → aa)
Derivations similar to Example 8.1, which begin with only the start symbol S and end with a string of symbols entirely from Σ (that is, which do not contain any nonterminals), will be the main ones in which we are interested. As formally stated in Definition 8.2, the set of all strings (in Σ∗) that can be derived from the start symbol forms the language generated by the grammar G and will be represented by L(G). In symbols, L(G) = {x | x ∈ Σ∗ ∧ S ⇒* x}.
Example 8.7
As in Example 8.6, consider G₁ = 〈{T, S}, {a, b}, S, {S → aS, S → bT, T → aa}〉. Then L(G₁) = a*baa = {baa, abaa, aabaa, …}. Note that each of these words can certainly be produced by G₁; the number of a's at the front of the string is entirely determined by how many times the production S → aS is used in the derivation. Furthermore, no other words in Σ∗ can be derived from G₁: beginning from S, the production S → aS may be used several times, but if no other production is used, a string of the form aⁿS will be produced, and since S ∉ Σ, this is not a valid string of terminals. The only way to remove the S is to apply the production S → bT, which will leave a string of the form aⁿbT, which is also not in Σ∗. The only production that can be applied at this point is T → aa, deriving a string of the form aⁿbaa. A proof involving induction on n would be required to formally prove that L(G₁) = {aⁿbaa | n ∈ ℕ} = a*baa. If G contains many productions, such inductive proofs can be truly unpleasant.
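In lieu of the induction, the claim can at least be checked mechanically. A sentential form of a right-linear grammar is a terminal prefix followed by at most one nonterminal, so a breadth-first search over derivations enumerates L(G₁) in order of derivation length. A sketch with our own encoding of the productions (None marks a production with no trailing nonterminal):

    from collections import deque

    # G1: S -> aS | bT, T -> aa.  A pair (x, B) encodes the production A -> xB.
    productions = {"S": [("a", "S"), ("b", "T")], "T": [("aa", None)]}

    def enumerate_words(start, limit):
        """Breadth-first search of derivations; collect the first `limit` words."""
        words, queue = [], deque([("", start)])
        while queue and len(words) < limit:
            prefix, nonterminal = queue.popleft()
            for x, successor in productions[nonterminal]:
                if successor is None:
                    words.append(prefix + x)        # derivation terminates here
                else:
                    queue.append((prefix + x, successor))
        return words

    print(enumerate_words("S", 4))  # ['baa', 'abaa', 'aabaa', 'aaabaa']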
Example 8.8
Consider the grammar Q = 〈{I, F}, {0, 1, .}, I, {I → 0I | 1I | 0.F | 1.F, F → λ | 0F | 1F}〉. L(Q) is the set of all (terminating) binary numbers, including 101.11, 011., 10.0, 0.010, and so on.
In a manner similar to that used for automata and regular expressions, we will consider two gram-
mars to be similar in some fundamental sense if they generate the same language. The following defini-
tion formalizes this notion.
Definition 8.9 Two grammars G₁ = 〈Ω₁, Σ, S₁, P₁〉 and G₂ = 〈Ω₂, Σ, S₂, P₂〉 are called equivalent iff L(G₁) = L(G₂), and we will write G₂ ≃ G₁.
Example 8.9
Consider G₁ from Examples 8.6 and 8.7, and define the right-linear grammar G₈ = 〈{Z}, {a, b}, Z, {Z → aZ, Z → baa}〉. Then L(G₈) = a*baa = L(G₁), and therefore G₈ ≃ G₁. The concept of equivalence applies to all types of grammars, whether or not they are right linear, and hence the grammars G″ and G′ from Examples 8.3 and 8.5 are likewise equivalent.
Definition 8.9 marks the fourth distinct use of the operator L and the concept of equivalence. It has
previously been used to denote the language recognized by a DFA, the language recognized by an NDFA,
and the language represented by a regular expression [although the more precise notation L(R), which
is the regular set represented by the regular expression R, has generally been eschewed in favor of the
more common convention of denoting both the set and the expression by the same symbol R]. In the
larger sense, then, a representation X of a language, regardless of whether X is a grammar, DFA, NDFA,
or regular expression, is equivalent to another representation Y iff L(X ) = L(Y ).
Our first goal in this section is to demonstrate that a cognitive representation of a language (via a
DFA) can be replaced by a generative representation (via a right-linear grammar). In the broader sense
of equivalence of representations discussed above, Lemma 8.1 shows that any language defined by a DFA
has an equivalent representation as a right-linear grammar. We begin with a definition of the class of all
type 3 languages.
Definition 8.10 Given an alphabet Σ, GΣ is defined to be the collection of all languages generated by right-
linear grammars over Σ.
The language generated by G₁ in Example 8.7 turned out to be FAD. We will now prove that every language in GΣ is FAD and, conversely, that every FAD language L has (at least one) right-linear grammar that generates L. This will show that GΣ = DΣ. We begin by showing that a mechanical representation A of a language is equivalent to a grammatical representation (denoted by G_A in Lemma 8.1).
Lemma 8.1 Given any alphabet Σ and a DFA A = 〈Σ, Q, q₀, δ, F〉, there exists a right-linear grammar G_A for which L(A) = L(G_A).

Proof. Without loss of generality, assume Q = {q₀, q₁, q₂, …, qₘ}. Define G_A = 〈Q, Σ, q₀, P_A〉, where P_A = {q → a·δ(q, a) | q ∈ Q, a ∈ Σ} ∪ {q → λ | q ∈ F}. There is one production of the form s → bt for each transition in the DFA, and one production of the form s → λ for each final state s in F. (It may be helpful to look over Example 8.10 to get a firmer grasp of the nature of P_A before proceeding with this proof.) Note that the set of nonterminals Ω is made up of the names of the states in A, and the start symbol S is the name of the start state of A.
The heart of this proof is an inductive argument, which will show that for any string x = a₁a₂⋯aₙ ∈ Σ∗,

q₀ ⇒ a₁·(δ̄(q₀, a₁))
   ⇒ a₁·a₂·(δ̄(q₀, a₁a₂))
   ⇒* a₁·a₂⋯aₙ₋₁·(δ̄(q₀, a₁a₂…aₙ₋₁))
   ⇒ a₁·a₂⋯aₙ·(δ̄(q₀, a₁a₂…aₙ))

from which it follows that, if δ̄(q₀, a₁a₂…aₙ) ∈ F, then

q₀ ⇒* a₁a₂⋯aₙ·δ̄(q₀, a₁a₂⋯aₙ) ⇒ a₁a₂⋯aₙ

The actual inductive statement and proof are left as an exercise; given this fact, if x ∈ L(A), then δ̄(q₀, x) ∈ F and there is a corresponding derivation q₀ ⇒* x, and so x ∈ L(G_A). Thus L(A) ⊆ L(G_A). A similarly tedious inductive argument will show that if, for some sequence of integers i₁, i₂, …, iₙ,

qᵢ₀ ⇒ a₁qᵢ₁ ⇒ a₁a₂qᵢ₂ ⇒ ⋯ ⇒ a₁a₂…aₙqᵢₙ,

then the string a₁a₂⋯aₙ will cause the DFA (when starting in state qᵢ₀) to visit the states qᵢ₁, qᵢ₂, …, qᵢₙ. Furthermore, if qᵢₙ ∈ F, then, by applying the production qᵢₙ → λ, q₀ ⇒* a₁a₂⋯aₙqᵢₙ ⇒ a₁a₂⋯aₙ. This will show that valid derivations correspond to strings reaching final states in A, and so L(G_A) ⊆ L(A) (see the exercises). Thus L(G_A) = L(A).
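The production set P_A is simple enough to compute directly. The following sketch uses our own data layout (state names double as nonterminal names, and '' stands for λ); it is an illustration of the construction, not code from the text.

    def dfa_to_grammar(states, alphabet, delta, finals):
        """Lemma 8.1: one production q -> a·δ(q, a) per transition,
        plus q -> λ (here '') for each final state q."""
        productions = [(q, a + delta[(q, a)]) for q in states for a in alphabet]
        productions += [(q, "") for q in finals]
        return productions

    # The two-state automaton B of Example 8.10:
    delta = {("S", "a"): "T", ("S", "b"): "T", ("T", "a"): "S", ("T", "b"): "S"}
    for lhs, rhs in dfa_to_grammar({"S", "T"}, "ab", delta, {"T"}):
        print(f"{lhs} -> {rhs or 'λ'}")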
Example 8.10
Let

B = 〈{a, b}, {S, T}, S, δ, {T}〉

where

δ(S, a) = T,  δ(S, b) = T
δ(T, a) = S,  δ(T, b) = S

This automaton is shown in Figure 8.1.

Figure 8.1: The automaton discussed in Example 8.10

Applying the construction in Lemma 8.1, we have Ω = {S, T}, Σ = {a, b}, S = S, and

P_B = {S → aT, S → bT, T → aS, T → bS, T → λ}.

Note that the derivation S ⇒ bT ⇒ baS ⇒ babT ⇒ bab mirrors the action of the DFA as it processes the string bab, recording at each step of the derivation the string that has been processed so far, followed by the current state of B. Conversely, in trying to duplicate the action of B as it processes the string ab, we have S ⇒ aT ⇒ abS, which cannot be transformed into a string of only a's and b's without processing at least one more letter, and hence ab ∉ L(G_B). Since S is not a final state, it cannot be removed from the derivation, corresponding to the rejection of any string that brings us to a nonfinal state. Those strings that are accepted by B are exactly those that end in the state T, and for which we will have the opportunity to use the production T → λ in the corresponding derivation in G_B, which will leave us with a terminal string of only a's and b's.
Lemma 8.1 showed that a cognitive representation of a finite automaton definable language can be
expressed in an appropriate generative form (via a right-linear grammar). There are many practical ap-
plications in which it is necessary to test whether certain strings can be generated by a particular gram-
mar. For unrestricted grammars, the answers to such questions can be far from obvious. In contrast,
the specialized right-linear grammars discussed in this section can always be transformed into a simple
cognitive representation: every right-linear grammar has a corresponding equivalent NDFA.
Lemma 8.2 Let Σ be any alphabet and let G = 〈Ω, Σ, S, P〉 be a right-linear grammar; then there exists an NDFA A_G (with λ-transitions) for which L(G) = L(A_G).

Proof. Define A_G = 〈Σ, Q_G, q_{0G}, δ_G, F_G〉, where

Q_G = {〈z〉 | z = λ ∨ z ∈ Ω ∨ (∃y ∈ Σ∗)(∃B ∈ Ω) such that B → yz is a production in P}
q_{0G} = {〈S〉}
F_G = {〈λ〉},

and δ_G is comprised of (normal) transitions of the form

δ_G(〈w〉, a) = {〈x〉 | ∃y ∈ (Ω ∪ Σ)∗, ∃B ∈ Ω ∋ w = ax ∧ B → yw is a production in P}

δ_G also contains some λ-transitions of the form

δ_G(〈B〉, λ) = {〈v〉 | B → v is a production in P}

As in the proof of Lemma 8.1, there is a one-to-one correspondence between paths through the machine and derivations in the grammar. Inductive statements will be the basis from which it will follow that L(A_G) = L(G) (see the exercises).
The following example may be helpful in providing a firmer grasp of the nature of A_G.
Example 8.11
Let G₁ = 〈{T, S}, {a, b}, S, {S → aS, S → bT, T → aa}〉. Then

A_{G₁} = 〈{a, b}, {〈aS〉, 〈S〉, 〈bT〉, 〈T〉, 〈aa〉, 〈a〉, 〈λ〉}, {〈S〉}, δ_{G₁}, {〈λ〉}〉,

where δ_{G₁} is given by

δ_{G₁}(〈S〉, λ) = {〈aS〉, 〈bT〉}     δ_{G₁}(〈T〉, λ) = {〈aa〉}
δ_{G₁}(〈aS〉, a) = {〈S〉}           δ_{G₁}(〈bT〉, b) = {〈T〉}
δ_{G₁}(〈aa〉, a) = {〈a〉}           δ_{G₁}(〈a〉, a) = {〈λ〉}

and all other transitions are empty [for example, δ_{G₁}(〈S〉, a) = ∅]. This automaton is shown in Figure 8.2.

Figure 8.2: The automaton corresponding to the grammar G₁

Note that abaa is accepted by this machine by visiting the states 〈S〉, 〈aS〉, 〈S〉, 〈bT〉, 〈T〉, 〈aa〉, 〈a〉, 〈λ〉, and that the corresponding derivation in G₁ is S ⇒ aS ⇒ abT ⇒ abaa.
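The state set and transition functions displayed in this example can be generated directly from the definitions in Lemma 8.2. A sketch under our own simplifying assumptions: nonterminals are single uppercase characters, terminals are lowercase, states are represented as plain strings, and '' plays the role of 〈λ〉.

    def grammar_to_ndfa(productions):
        """Lemma 8.2: states are 〈λ〉, the nonterminals, and every suffix of a
        right side; each suffix beginning with a terminal consumes that symbol,
        and each nonterminal B has a λ-move to every right side of a B-rule."""
        states, normal, lam = {""}, {}, {}
        for A, x in productions:
            states.add(A)
            lam.setdefault(A, set()).add(x)          # δ(〈A〉, λ) contains 〈x〉
            for i in range(len(x)):
                suffix = x[i:]
                states.add(suffix)
                if suffix[0].islower():              # 〈aw〉 goes to 〈w〉 on a
                    normal.setdefault((suffix, suffix[0]), set()).add(suffix[1:])
        return states, normal, lam

    states, normal, lam = grammar_to_ndfa([("S", "aS"), ("S", "bT"), ("T", "aa")])
    print(sorted(states))  # ['', 'S', 'T', 'a', 'aS', 'aa', 'bT'], the seven states above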
Theorem 8.1 Given any alphabet Σ, GΣ = DΣ.

Proof. Lemma 8.1 guaranteed that every DFA has a corresponding grammar, and so DΣ ⊆ GΣ. By Lemma 8.2, every grammar has a corresponding NDFA, and so GΣ ⊆ NΣ = DΣ. Thus GΣ = DΣ.
8.3 Regular Grammars and Regular Expressions
The grammars we have considered so far are called right linear because productions are constrained to
have the resulting nonterminal appear to the right of the terminal symbols. We next consider the class of
grammars that arises by forcing the lone nonterminal to appear to the left of the terminal symbols.
Definition 8.11 A left-linear grammar over an alphabet Σ is a quadruple G = 〈Ω, Σ, S, P〉, where:

Ω is a (nonempty) set of nonterminals.
Σ is a (nonempty) set of terminal symbols (and Ω ∩ Σ = ∅).
S is the designated start symbol (and S ∈ Ω).
P is a set of productions of the form A → Bx, where A ∈ Ω, B ∈ (Ω ∪ {λ}), and x ∈ Σ∗.

Note that a typical production might now look like A → Bcd, where the nonterminal B occurs to the left of the terminal string cd.
Example 8.12
Let G₂ = 〈{A, S}, {a, b}, S, {S → Abaa, A → Aa, A → λ}〉. Then

L(G₂) = a*baa = {baa, abaa, aabaa, …} = L(G₁),

and so G₂ ≃ G₁ (compare with Example 8.7). Note that there does not seem to be an obvious way to transform the right-linear grammar G₁ discussed in Example 8.7 into an equivalent left-linear grammar such as G₂ (see the exercises).
As was done for right-linear grammars in the last section, we could show that these left-linear gram-
mars also generate the set of regular languages by constructing corresponding machines and grammars
(see the exercises). However, we will instead prove that left-linear grammars are equivalent in power to
right-linear grammars by applying known results from previous chapters. The key to this strategy is the
reverse operator r (compare with Example 4.10 and Exercises 5.20 and 6.36).
Definition 8.12 For an alphabet Σ and x = a₁a₂⋯aₙ₋₁aₙ ∈ Σ∗, define xʳ = aₙaₙ₋₁⋯a₂a₁. For a language L over Σ, define Lʳ = {xʳ | x ∈ L}. For a grammar G = 〈Ω, Σ, S, P〉, define Gʳ = 〈Ω, Σ, S, P′〉, where P′ is given by P′ = {A → xʳ | A → x was a production in P}.
Lemma 8.3 Let G be a right-linear grammar. Then Gʳ is a left-linear grammar, and L(Gʳ) = L(G)ʳ. Similarly, if G is a left-linear grammar, then Gʳ is a right-linear grammar, and again L(Gʳ) = L(G)ʳ.

Proof. A straightforward induction on the number of productions used to produce a given terminal string (see the exercises). It can be shown that S ⇒* xB by applying n productions from G iff S ⇒* Bxʳ by applying n corresponding productions from Gʳ.
Example 8.13
Consider

G₃ = 〈{T, S}, {a, b, c, d}, S, {S → abS, S → cdT, T → bT, T → b}〉.

Then

G₃ʳ = 〈{T, S}, {a, b, c, d}, S, {S → Sba, S → Tdc, T → Tb, T → b}〉,

L(G₃) = (ab)*cdbb*,  L(G₃ʳ) = b*bdc(ba)*,

and

L(G₃ʳ) = L(G₃)ʳ  [and L(G₃) = L(G₃ʳ)ʳ].
Theorem 8.2 Let Σ be an alphabet. Then the class of languages generated by the set of all left-linear grammars over Σ is the same as the class of languages generated by the set of all right-linear grammars over Σ.

Proof. Let G be a left-linear grammar. Gʳ is then a right-linear grammar, and L(Gʳ) is therefore FAD by Theorem 8.1. Since DΣ is closed under the reverse operator r (see Exercise 5.20), L(Gʳ)ʳ is also FAD. But, by Lemma 8.3, L(Gʳ)ʳ = L(G), and so L(G) is FAD. Hence every left-linear grammar generates a member of DΣ and therefore (by Lemma 8.3) has a corresponding right-linear grammar.
Conversely, if L is generated by a right-linear grammar, then L is a language in DΣ, and so is Lʳ (as shown by Exercise 5.20 or 6.36). Since DΣ = GΣ, there is a right-linear grammar G that generates Lʳ, and hence Gʳ is a left-linear grammar that generates L (why?). Thus every right-linear grammar has a corresponding left-linear grammar.
Definition 8.13 A regular or type 3 grammar is a grammar that is either right-linear or left-linear.

Thus, the languages generated by left-linear (and hence regular) grammars are referred to as type 3 languages. The class of type 3 languages is exactly GΣ.

Corollary 8.1 The class of languages generated by regular grammars is equal to GΣ.

Proof. The proof follows immediately from Theorem 8.2.
With the correspondences developed between the grammatical descriptors and the mechanical constructs, it is possible to transform a regular expression into an equivalent grammar by first transforming the representation of the language into an automaton (as described in Chapter 6) and then applying Lemma 8.1 to the resulting machine. Conversely, the grammar G₁ in Example 8.11 gives rise to the seven-state NDFA A_{G₁} (using Lemma 8.2), which could in turn be used to generate seven equations in seven unknowns. These could then be solved for a regular expression representing L(G₁) via Theorems 6.1 and 6.2. A much more efficient method, which generates equations directly from the productions themselves, is outlined in the following theorem.
Theorem 8.3 Let G = 〈{S₁, S₂, …, Sₙ}, Σ, S₁, P〉 be a right-linear grammar, and for each nonterminal Sᵢ define X_{Sᵢ} to be the set of all terminal strings that can be derived from Sᵢ by using the productions in P. X_{S₁} then represents L(G), and these sets satisfy the language equations

X_{Sₖ} = Eₖ ∪ Aₖ₁X_{S₁} ∪ Aₖ₂X_{S₂} ∪ ⋯ ∪ AₖₙX_{Sₙ},  for k = 1, 2, …, n

where Eᵢ is the union of all terminal strings x that appear in productions of the form Sᵢ → x, and Aᵢⱼ is the union of all terminal strings x that appear in productions of the form Sᵢ → xSⱼ.

Proof. Since S₁ is the start symbol, X_{S₁} is by definition the set of all words that can be derived from the start symbol, and hence X_{S₁} = L(G). The relationships between the variables X_{Sᵢ} essentially embody the relationships enforced by the productions in P.
Example 8.14
Consider G₁ from Example 8.11, in which

G₁ = 〈{T, S}, {a, b}, S, {S → aS, S → bT, T → aa}〉.

The corresponding equations are

X_S = ∅ ∪ aX_S ∪ bX_T
X_T = aa ∪ ∅X_S ∪ ∅X_T

Eliminating X_T via Theorem 6.2 yields X_S = baa ∪ aX_S. Theorem 6.1 can be applied to this equation to yield X_S = L(G₁) = a*baa. Solving these two equations is indeed preferable to appealing to the resulting NDFA from Example 8.11 and solving the corresponding seven equations.
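The two reductions used here can be mimicked symbolically if expressions are treated as plain strings: Theorem 6.1 (elsewhere known as Arden's rule) turns X = E ∪ AX into A*E, and the elimination of X_T is ordinary substitution. A sketch with our own names; no simplification of the result is attempted.

    def arden(E: str, A: str) -> str:
        """Minimal solution of X = E ∪ A·X, namely A*E (Theorem 6.1)."""
        return f"({A})*({E})"

    # X_T = aa (both of its coefficients are ∅), so substitute it into
    # X_S = aX_S ∪ bX_T, then solve X_S = b·aa ∪ aX_S for X_S:
    X_T = "aa"
    X_S = arden(E=f"b{X_T}", A="a")
    print(X_S)  # (a)*(baa), i.e., a*baa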
Example 8.15
Let Σ = {a, b, c}, and consider the set of all words that end in b and in which every c is immediately followed by a. This set can be succinctly described by the grammar G = 〈{S}, {a, b, c}, S, {S → aS, S → bS, S → caS, S → b}〉. The resulting one equation in the single unknown X_S is X_S = b ∪ (a ∪ b ∪ ca)X_S, and Theorem 6.1 can be applied to yield a regular expression for this language; that is, X_S = (a ∪ b ∪ ca)*b.
Unfortunately, another grammar that generates this same language is

G′ = 〈{S}, {a, b, c}, S, {S → λS, S → aS, S → bS, S → caS, S → b}〉.

In this case, however, the resulting one equation in the single unknown X_S is X_S = b ∪ (λ ∪ a ∪ b ∪ ca)X_S, and Theorem 6.1 explicitly prohibits λ from appearing as a coefficient of an unknown. The equation no longer has a unique solution; other solutions are now possible, such as X_S = Σ∗. Nevertheless, the reduction described by Theorem 6.1 still predicts a correct expression for this language; that is, X_S = (λ ∪ a ∪ b ∪ ca)*b. For equations arising from grammatical constructs, the desired solution will always be the minimal solution predicted by the technique used in Theorem 6.1. The condition prohibiting λ from appearing in the set A in the equation X = E ∪ AX was required only to guarantee a unique solution. Regardless of the nature of the set A, A*E is guaranteed to be a solution, and it will be contained in any other solution, as restated in Lemma 8.4.
Lemma 8.4 Let E and A be any two sets, and consider the language equation X = E ∪ AX. A*E is always a solution for X, and any other solution Y must satisfy the property A*E ⊆ Y.

Proof. Follows immediately from Theorem 6.1.
Consider again the grammar

G′ = 〈{S}, {a, b, c}, S, {S → λS, S → aS, S → bS, S → caS, S → b}〉,

which generates the language (λ ∪ a ∪ b ∪ ca)*b. The corresponding equation was X = E ∪ AX, where E = b and A = (λ ∪ a ∪ b ∪ ca). Note that E represents the set of terminal strings that can be generated from S using exactly one production, while A·E = (λ ∪ a ∪ b ∪ ca)b is the set of all strings that can be generated from S using exactly two productions. Similarly, A·A·E represents all terminal strings that can be generated from S using exactly three productions. By induction, it can be shown that Aⁿ⁻¹·E is the set of all strings that can be generated from S using exactly n productions. From this it follows that the minimal solution A*E is indeed the language generated by the grammar.
Clearly, a useless production of the form S → λS in a grammar can simply be removed from the production set without affecting the language that is generated. In the above example, it was the production S → λS that was responsible for λ appearing in the coefficient set A. It is only the nongenerative productions, which do not produce any terminal symbols, that can give rise to a nonunique solution. However, the removal of productions of the form V → λT will require the addition of other productions when T is a different nonterminal than V. Theorem 9.4, developed later, will show that these grammars can be transformed into equivalent grammars that do not contain productions of the form V → λT. Lemma 8.5 below shows that it is not necessary to perform such transformations before producing equations that will provide equivalent regular expressions: the techniques outlined in Theorem 6.2 can indeed be used to solve systems of equations, even if the coefficients contain the empty word. Indeed, the minimal solution found in this manner will be the regular expression sought. This robustness is similar to that found in Theorem 6.3, which was stated for deterministic finite automata. Regular expressions for nondeterministic finite automata can be generated by transforming the NDFA into a DFA and then applying Theorem 6.3, but it was seen that it is both possible and more efficient to apply the method directly to the NDFA without performing the transformation. The following lemma justifies that a transformation to a well-behaved grammar is an unnecessary step in the algorithm for finding a regular expression describing the language generated by a right-linear grammar.
Lemma 8.5 Consider the system of equations in the unknowns X₁, X₂, …, Xₙ given by

X₁ = E₁ ∪ A₁₁X₁ ∪ A₁₂X₂ ∪ ⋯ ∪ A₁₍ₙ₋₁₎Xₙ₋₁ ∪ A₁ₙXₙ
X₂ = E₂ ∪ A₂₁X₁ ∪ A₂₂X₂ ∪ ⋯ ∪ A₂₍ₙ₋₁₎Xₙ₋₁ ∪ A₂ₙXₙ
    ⋮
Xₙ₋₁ = Eₙ₋₁ ∪ A₍ₙ₋₁₎₁X₁ ∪ A₍ₙ₋₁₎₂X₂ ∪ ⋯ ∪ A₍ₙ₋₁₎₍ₙ₋₁₎Xₙ₋₁ ∪ A₍ₙ₋₁₎ₙXₙ
Xₙ = Eₙ ∪ Aₙ₁X₁ ∪ Aₙ₂X₂ ∪ ⋯ ∪ Aₙ₍ₙ₋₁₎Xₙ₋₁ ∪ AₙₙXₙ

a. Define Êᵢ = Eᵢ ∪ (Aᵢₙ · Aₙₙ* · Eₙ) for all i = 1, 2, …, n − 1 and

Âᵢⱼ = Aᵢⱼ ∪ (Aᵢₙ · Aₙₙ* · Aₙⱼ) for all i, j = 1, 2, …, n − 1.

Any solution of the original set of equations will agree with a solution of the following set of n − 1 equations in the unknowns X₁, X₂, …, Xₙ₋₁:

X₁ = Ê₁ ∪ Â₁₁X₁ ∪ Â₁₂X₂ ∪ ⋯ ∪ Â₁₍ₙ₋₁₎Xₙ₋₁
X₂ = Ê₂ ∪ Â₂₁X₁ ∪ Â₂₂X₂ ∪ ⋯ ∪ Â₂₍ₙ₋₁₎Xₙ₋₁
    ⋮
Xₙ₋₁ = Êₙ₋₁ ∪ Â₍ₙ₋₁₎₁X₁ ∪ Â₍ₙ₋₁₎₂X₂ ∪ ⋯ ∪ Â₍ₙ₋₁₎₍ₙ₋₁₎Xₙ₋₁

b. Given a solution to the n − 1 equations in (a), that solution can be used to find a compatible expression for the remaining unknown:

Xₙ = Aₙₙ* · (Eₙ ∪ Aₙ₁X₁ ∪ Aₙ₂X₂ ∪ ⋯ ∪ Aₙ₍ₙ₋₁₎Xₙ₋₁)

c. This system has a unique minimal solution in the following sense: Let W₁, W₂, …, Wₙ denote the solution found by eliminating variables and back-substituting as specified in (a) and (b). If Y₁, Y₂, …, Yₙ is any other solution to the original n equations in n unknowns, then W₁ ⊆ Y₁, W₂ ⊆ Y₂, …, and Wₙ ⊆ Yₙ.
Proof. This proof is by induction on the number of equations. Lemma 8.4 proved the basis step for n = 1. As in Theorem 6.2, the inductive step is proved by considering the last of the n equations,

Xₙ = (Eₙ ∪ Aₙ₁X₁ ∪ Aₙ₂X₂ ∪ ⋯ ∪ Aₙ₍ₙ₋₁₎Xₙ₋₁) ∪ AₙₙXₙ

This can be thought of as an equation in the one unknown Xₙ, with a coefficient of Aₙₙ for Xₙ and with the remainder of the expression a "constant" term not involving Xₙ. For a given solution for X₁ through Xₙ₋₁, Lemma 8.4 can therefore be applied to the above equation in the one unknown Xₙ, with coefficients

E = (Eₙ ∪ Aₙ₁X₁ ∪ Aₙ₂X₂ ∪ ⋯ ∪ Aₙ₍ₙ₋₁₎Xₙ₋₁)

and A = Aₙₙ, to find a minimal solution for Xₙ for the corresponding values of X₁ through Xₙ₋₁. This is exactly as given by part (b) above:

Xₙ = Aₙₙ* · (Eₙ ∪ Aₙ₁X₁ ∪ Aₙ₂X₂ ∪ ⋯ ∪ Aₙ₍ₙ₋₁₎Xₙ₋₁)

or

Xₙ = Aₙₙ*·Eₙ ∪ Aₙₙ*·Aₙ₁X₁ ∪ Aₙₙ*·Aₙ₂X₂ ∪ ⋯ ∪ Aₙₙ*·Aₙ₍ₙ₋₁₎Xₙ₋₁

Specifically, if X₁ through Xₙ₋₁ are represented by a minimal solution W₁ through Wₙ₋₁, then Lemma 8.4 implies that the inclusion of Wₙ, given by

Wₙ = Aₙₙ*·Eₙ ∪ Aₙₙ*·Aₙ₁W₁ ∪ Aₙₙ*·Aₙ₂W₂ ∪ ⋯ ∪ Aₙₙ*·Aₙ₍ₙ₋₁₎Wₙ₋₁,

will yield a minimal solution W₁ through Wₙ of the original n equations in n unknowns.
The minimal solution for the n − 1 equations in the unknowns X₁, X₂, …, Xₙ₋₁, denoted by W₁ through Wₙ₋₁, can be found by substituting this expression for Xₙ into each of the other n − 1 equations. If the kth equation is represented by

Xₖ = Eₖ ∪ Aₖ₁X₁ ∪ Aₖ₂X₂ ∪ ⋯ ∪ AₖₙXₙ

then the substitution will yield

Xₖ = Eₖ ∪ Aₖ₁X₁ ∪ Aₖ₂X₂ ∪ ⋯ ∪ (Aₖₙ·(Aₙₙ*·Eₙ ∪ Aₙₙ*·Aₙ₁X₁ ∪ Aₙₙ*·Aₙ₂X₂ ∪ ⋯ ∪ Aₙₙ*·Aₙ₍ₙ₋₁₎Xₙ₋₁))

Due to the nature of union and concatenation, no other solution for Xₙ can possibly allow a smaller solution for X₁, X₂, …, Xₙ₋₁ to be found. Specifically, if Yₙ is a solution satisfying the nth equation, then Lemma 8.4 guarantees that Wₙ ⊆ Yₙ, and consequently

Eₖ ∪ Aₖ₁X₁ ∪ Aₖ₂X₂ ∪ ⋯ ∪ AₖₙWₙ ⊆ Eₖ ∪ Aₖ₁X₁ ∪ Aₖ₂X₂ ∪ ⋯ ∪ AₖₙYₙ

Thus, the minimal value for each Xₖ is compatible with the substitution of Wₙ defined earlier. Hence, by using the distributive law, the revised equation becomes

Xₖ = Eₖ ∪ Aₖ₁X₁ ∪ Aₖ₂X₂ ∪ ⋯ ∪ (Aₖₙ·Aₙₙ*·Eₙ ∪ Aₖₙ·Aₙₙ*·Aₙ₁X₁ ∪ Aₖₙ·Aₙₙ*·Aₙ₂X₂ ∪ ⋯ ∪ Aₖₙ·Aₙₙ*·Aₙ₍ₙ₋₁₎Xₙ₋₁)

Collecting like terms yields

Xₖ = (Eₖ ∪ Aₖₙ·Aₙₙ*·Eₙ) ∪ (Aₖ₁X₁ ∪ Aₖₙ·Aₙₙ*·Aₙ₁X₁) ∪ (Aₖ₂X₂ ∪ Aₖₙ·Aₙₙ*·Aₙ₂X₂) ∪ ⋯ ∪ (Aₖ₍ₙ₋₁₎Xₙ₋₁ ∪ Aₖₙ·Aₙₙ*·Aₙ₍ₙ₋₁₎Xₙ₋₁),

or

Xₖ = (Eₖ ∪ Aₖₙ·Aₙₙ*·Eₙ) ∪ (Aₖ₁ ∪ Aₖₙ·Aₙₙ*·Aₙ₁)X₁ ∪ (Aₖ₂ ∪ Aₖₙ·Aₙₙ*·Aₙ₂)X₂ ∪ ⋯ ∪ (Aₖ₍ₙ₋₁₎ ∪ Aₖₙ·Aₙₙ*·Aₙ₍ₙ₋₁₎)Xₙ₋₁

The constant term in this equation is Êₖ = Eₖ ∪ Aₖₙ·Aₙₙ*·Eₙ, and the coefficient of Xⱼ is Âₖⱼ = Aₖⱼ ∪ (Aₖₙ·Aₙₙ*·Aₙⱼ), which agrees with the formulas given in (a). The substitution for Xₙ was shown to yield a minimal set of n − 1 equations in the unknowns X₁ through Xₙ₋₁, and the induction assumption guarantees that the elimination and back-substitution method yields a minimal solution W₁ through Wₙ₋₁. Lemma 8.4 then guarantees that the solution

Wₙ = Aₙₙ*·Eₙ ∪ Aₙₙ*·Aₙ₁W₁ ∪ Aₙₙ*·Aₙ₂W₂ ∪ ⋯ ∪ Aₙₙ*·Aₙ₍ₙ₋₁₎Wₙ₋₁

is minimal, which completes the minimal solution for the original system of n equations.
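The elimination and back-substitution of Lemma 8.5 are Gaussian elimination with ∪ and · in place of addition and multiplication, and they can be carried out purely symbolically. A sketch under our own conventions (expressions are strings, None stands for ∅, and no simplification is attempted); it illustrates the lemma and is not code from the text.

    def cat(x, y):    # concatenation; ∅ absorbs
        return None if x is None or y is None else f"({x})({y})"

    def union(x, y):  # union; ∅ is the identity
        return y if x is None else (x if y is None else f"{x}∪{y}")

    def solve(E, A):
        """Minimal solution of X_i = E_i ∪ ⋃_j A[i][j]·X_j (0-indexed).
        Mutates E (list of n strings) and A (n×n matrix) in place."""
        n = len(E)
        for k in range(n - 1, 0, -1):              # part (a): eliminate X_k
            star = f"({A[k][k]})*" if A[k][k] is not None else None
            for i in range(k):
                pump = A[i][k] if star is None else cat(A[i][k], star)
                E[i] = union(E[i], cat(pump, E[k]))                # Ê_i
                for j in range(k):
                    A[i][j] = union(A[i][j], cat(pump, A[k][j]))   # Â_ij
        W = []
        for k in range(n):                         # part (b): back-substitute
            rhs = E[k]
            for j in range(k):
                rhs = union(rhs, cat(A[k][j], W[j]))
            if A[k][k] is not None:
                rhs = cat(f"({A[k][k]})*", rhs)
            W.append(rhs)
        return W

    # Example 8.14: X_S = aX_S ∪ bX_T, X_T = aa.
    E = [None, "aa"]
    A = [["a", "b"], [None, None]]
    print(solve(E, A)[0])  # ((a)*)((b)(aa)), i.e., a*baa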
As with Lemma 8.4, the minimal expressions thus generated describe exactly those terminal strings
that can be produced by a right-linear grammar. In an analogous fashion, left-linear grammars give rise
to a set of left-linear equations, which can be solved as indicated in Theorem 6.4.
The above discussion describes the transformation of regular grammars into regular expressions.
Generating grammars from regular expressions hinges on the interpretation of the six building blocks of
regular expressions, as described in Definition 6.2. Since GΣ is the same as DΣ , all the closure proper-
ties known about DΣ must also apply to GΣ , but it can be instructive to reprove these theorems using
grammatical constructions. Such proofs will also provide guidelines for directly transforming a regular
expression into a grammar without first constructing a corresponding automaton.
Theorem 8.4 Let Σ be an alphabet. Then GΣ is effectively closed under union.

Proof. Let G₁ = 〈Ω₁, Σ, S₁, P₁〉 and G₂ = 〈Ω₂, Σ, S₂, P₂〉 be two right-linear grammars, and without loss of generality assume that Ω₁ ∩ Ω₂ = ∅. Choose a new nonterminal Z such that Z ∉ Ω₁ ∪ Ω₂, and consider the new grammar G∪ defined by G∪ = 〈Ω₁ ∪ Ω₂ ∪ {Z}, Σ, Z, P₁ ∪ P₂ ∪ {Z → S₁, Z → S₂}〉. It is straightforward to show that L(G∪) = L(G₁) ∪ L(G₂) (see the exercises). From the start symbol Z there are only two productions that can be applied; if Z → S₁ is chosen, then the derivation will have to continue with productions from P₁ and produce a word from L(G₁) (why can't productions from P₂ be applied?). Similarly, if Z → S₂ is chosen instead, the only result can be a word from L(G₂).
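Representing a grammar as a tuple 〈nonterminals, Σ, start symbol, productions〉 makes this construction a few lines of code. A sketch in our own layout (productions are pairs, and the two nonterminal sets are assumed to be disjoint; rename first if they are not):

    def grammar_union(g1, g2, new_start="Z"):
        """Theorem 8.4: join two right-linear grammars under a fresh start symbol."""
        (n1, sigma, s1, p1), (n2, _, s2, p2) = g1, g2
        assert new_start not in n1 | n2
        return (n1 | n2 | {new_start}, sigma, new_start,
                p1 + p2 + [(new_start, s1), (new_start, s2)])

    # Example 8.16: (a ∪ b) from one-production grammars for a and for b.
    Ga = ({"R"}, {"a", "b", "c"}, "R", [("R", "a")])
    Gb = ({"T"}, {"a", "b", "c"}, "T", [("T", "b")])
    print(grammar_union(Ga, Gb, new_start="A")[3])
    # [('R', 'a'), ('T', 'b'), ('A', 'R'), ('A', 'T')]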
In an analogous fashion, effective closure of GΣ can be demonstrated for the operators Kleene closure
and concatenation. The proof for Kleene closure is outlined below. The construction for concatenation
is left for the exercises; the technique is illustrated in Example 8.18.
Theorem 8.5 Let Σ be an alphabet. Then GΣ is effectively closed under Kleene closure.

Proof. Let G = 〈Ω, Σ, S, P〉 be a right-linear grammar. Choose a new nonterminal Z such that Z ∉ Ω, and consider the new grammar G* defined by

G* = 〈Ω ∪ {Z}, Σ, Z, P*〉,

where

P* = {Z → λ, Z → S}
   ∪ {A → xB | (x ∈ Σ∗) ∧ (A, B ∈ Ω) ∧ (A → xB ∈ P)}
   ∪ {A → xZ | (x ∈ Σ∗) ∧ (A ∈ Ω) ∧ (A → x ∈ P)}.

That is, all productions in P that end in a nonterminal are retained, while all other productions in P are appended with the new symbol Z, and the two new productions Z → λ and Z → S are added. A straightforward induction argument will show that the derivations that use n applications of productions of the form A → xZ generate exactly the words in L(G)ⁿ. Consequently, L(G*) = L(G)*.
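The Kleene-closure construction is equally mechanical: retain the productions that end in a nonterminal, append Z to the rest, and add the two Z-rules. A sketch in the same representation as the union sketch above (a right side ends in a nonterminal exactly when its last character is one; '' stands for λ):

    def grammar_star(g, new_start="Z"):
        """Theorem 8.5: A -> xB is kept; A -> x becomes A -> xZ; Z -> λ | S is added."""
        nonterms, sigma, start, prods = g
        assert new_start not in nonterms
        new_prods = [(new_start, ""), (new_start, start)]
        for lhs, rhs in prods:
            if rhs and rhs[-1] in nonterms:
                new_prods.append((lhs, rhs))              # ends in a nonterminal
            else:
                new_prods.append((lhs, rhs + new_start))  # loop back through Z
        return (nonterms | {new_start}, sigma, new_start, new_prods)

    # Example 8.17: starring the grammar just built for (a ∪ b).
    G = ({"T", "R", "A"}, {"a", "b", "c"}, "A",
         [("A", "R"), ("A", "T"), ("R", "a"), ("T", "b")])
    print(grammar_star(G)[3])
    # [('Z', ''), ('Z', 'A'), ('A', 'R'), ('A', 'T'), ('R', 'aZ'), ('T', 'bZ')]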
Theorem 8.6 Let Σ be an alphabet. Then GΣ is effectively closed under concatenation.

Proof. See the exercises.
Corollary 8.2 Every regular expression has a corresponding right-linear grammar.

Proof. While this follows immediately from the fact that GΣ = DΣ and Theorem 6.1, the previous theorems outline an effective procedure for transforming a regular expression into a right-linear grammar. This can be proved by induction on the number of operators in the regular expression. The basis step consists of the observation that expressions with zero operators, which must be of the form ∅, λ, or a, can be represented by the right-linear grammars 〈{S}, Σ, S, {S → S}〉, 〈{S}, Σ, S, {S → λ}〉, and 〈{S}, Σ, S, {S → a}〉, respectively.
To prove the inductive step, choose an arbitrary regular expression R with m + 1 operators, and identify the outermost operator. R must be of the form R₁ ∪ R₂ or R₁ · R₂ or R₁*, where R₁ (and R₂) have m or fewer operators. By the induction hypothesis, R₁ (and R₂) can be represented as right-linear grammars, and therefore by Theorem 8.4, 8.5, or 8.6, R can also be represented by a right-linear grammar. Any regular expression can thus be methodically transformed into an equivalent right-linear grammar.
Example 8.16
Let Σ = {a, b, c}, and consider the regular expression (a ∪ b). The grammars G₁ = 〈{R}, {a, b, c}, R, {R → a}〉 and G₂ = 〈{T}, {a, b, c}, T, {T → b}〉 can be combined as suggested in Theorem 8.4 (with A playing the role of Z) to form G = 〈{T, R, A}, {a, b, c}, A, {A → R, A → T, R → a, T → b}〉.
Example 8.17
Consider the regular expression (a ∪ b)*. The grammar

G = 〈{T, R, A}, {a, b, c}, A, {A → R, A → T, R → a, T → b}〉

from Example 8.16 can be modified as suggested in Theorem 8.5 to form

G* = 〈{T, R, A, Z}, {a, b, c}, Z, {Z → λ, Z → A, A → R, A → T, R → aZ, T → bZ}〉.

G* generates (a ∪ b)*.
Example 8.18
Consider the regular expression (a ∪ b)*c. The grammars

G* = 〈{T, R, A, Z}, {a, b, c}, Z, {Z → λ, Z → A, A → R, A → T, R → aZ, T → bZ}〉

and

G₃ = 〈{V}, {a, b, c}, V, {V → c}〉

can be combined with modified productions to form

G′ = 〈{T, R, A, Z, V, S}, {a, b, c}, S,
     {S → Z, Z → λV, Z → A, A → R, A → T, R → aZ, T → bZ, V → c}〉.

G′ generates (a ∪ b)*c.
The previous examples illustrate the manner in which regular expressions can be systematically translated into right-linear grammars. Constructions corresponding to those given in Theorems 8.4, 8.5, and 8.6 can similarly be found for left-linear grammars (see the exercises).
Normal forms for grammars are quite useful in many contexts. A standard representation can be especially useful in proving theorems about grammars. For example, the construction given in Lemma 8.2 would have been more concise and easier to investigate if complex productions such as S → bcaaT could be avoided. Indeed, if all productions in the grammar G had been of the form A → aB or A → λ, both the state set and the state transition function of A_G could have been defined more easily. Other constructions and proofs may also be able to make use of the simpler types of productions in grammars that conform to such normal forms. The following theorem guarantees that a given right-linear grammar has a corresponding equivalent grammar containing only productions that conform to the above standard.
Theorem 8.7 Every right-linear grammar G has an equivalent right-linear grammar G₁ in which all productions are of the form A → aB or A → λ.

Proof. Let G be a right-linear grammar. By Lemma 8.2, there exists an NDFA A_G that is equivalent to G. From Chapter 4, A_Gᵈ is an equivalent deterministic finite automaton, and Lemma 8.1 can be applied to A_Gᵈ to form an equivalent right-linear grammar. By the construction given in Lemma 8.1, all the productions in this grammar are indeed of the form A → aB or A → λ.
Note that the proof given is a constructive proof: rather than simply arguing the existence of such a grammar, a method for obtaining G₁ is outlined. The above theorem could have been proved without relying on automata constructs. Basically, "long" productions like T → abcR would be replaced by a series of productions involving newly introduced nonterminals; for example, T → aX, X → bY, Y → cR. Similarly, a production like T → aa might be replaced by the sequence T → aB, B → aC, C → λ. If the existence of such a normal form had been available for the proof of Lemma 8.2, the construction of A_G could have been simplified and the complexity of the proof drastically curtailed. Indeed, the resulting machine would have contained no λ-moves. Only one state per nonterminal would have been necessary, with final states corresponding to nonterminals that had productions of the form A → λ. Productions of the form A → aB would imply that B ∈ δ(A, a).
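This splitting requires nothing more than a supply of fresh nonterminals. A sketch of the rewriting just described (the function and the generated names are ours; symbols are assumed to be single characters, and '' stands for λ):

    def split_production(lhs, rhs, nonterms, fresh):
        """Replace one long right-linear production by unit-length steps."""
        ends_in_nonterminal = bool(rhs) and rhs[-1] in nonterms
        body, tail = (rhs[:-1], rhs[-1]) if ends_in_nonterminal else (rhs, None)
        if tail is not None and len(body) <= 1:
            return [(lhs, rhs)]                     # already of the form A -> aB
        chain, current = [], lhs
        for i, a in enumerate(body):
            if i == len(body) - 1 and tail is not None:
                chain.append((current, a + tail))   # final step keeps the old tail
            else:
                nxt = next(fresh)
                nonterms.add(nxt)
                chain.append((current, a + nxt))
                current = nxt
        if tail is None:
            chain.append((current, ""))             # terminal-only right sides end in λ
        return chain

    fresh = iter("XYZUVW")
    print(split_production("T", "abcR", {"T", "R"}, fresh))
    # [('T', 'aX'), ('X', 'bY'), ('Y', 'cR')]
    print(split_production("T", "aa", {"T"}, fresh))
    # [('T', 'aZ'), ('Z', 'aU'), ('U', '')]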
Example 8.19
G = 〈{S, T, B, C}, {a, b}, S, {S → aS, S → bT, T → aB, B → aC, C → λ}〉 can be represented by the NDFA shown in Figure 8.3.

Figure 8.3: An automaton corresponding to a grammar in normal form
In practice, given an arbitrary right-linear grammar G, the work associated with finding the complex
machine defined in Lemma 8.2 has simply been replaced by the effort needed to transform G into the ap-
propriate normal form. Nevertheless, the guarantee that regular languages have grammars that conform
to the above normal form is useful in many proofs, as illustrated above and in Theorem 8.8 below.
As with context-free and context-sensitive languages, the contracting productions can be limited to
Z → λ, where Z is the start symbol. This is only necessary if λ ∈ L; if λ ∉ L, there need be no contracting
productions at all. We wish to show how to produce a right-linear grammar with no more than one
contracting production. By relying on the existence of the normal form described in Theorem 8.7, this
can be done without dealing with right-linear grammars in their full generality.

Theorem 8.8 Every right-linear grammar G has an equivalent right-linear grammar G 0 in which the start
symbol Z never appears on the right in any production, and the only length-contracting production that
may appear is Z → λ. Furthermore, all other productions are of the form A → a B or A → a .
Proof. Let G = 〈Ω, Σ, S, P 〉 be a right-linear grammar. Without loss of generality, assume that G is of
the form specified by Theorem 8.7. (If G were not of the proper form, Theorem 8.7 guarantees that an
equivalent grammar that is in the proper form could be found and used in place of G.) Choose a new
nonterminal Z such that Z ∉ Ω, and consider the new grammar G 0 defined by G 0 = 〈Ω ∪ {Z }, Σ, Z , P 0 〉,
where P 0 contains Z → S, and all productions from P of the form A → xB , where x ∈ Σ∗ and A, B ∈ Ω. P 0
also contains the productions in the set {A → a | (∃B ∈ Ω)(A → a B ∈ P ∧B → λ ∈ P )}. Finally, if S → λ was a
production in P , then Z → λ is included in P 0 . Note that no other productions of the form B → λ are part
of P 0 . Other productions have been added to compensate for this loss. Derivations using the productions
in P 0 typically start with Z → S, then proceed with productions of the form A → xB , and terminate with
one production of the form A → a . The corresponding derivation in the original grammar G would be
very similar, but would start with the old start symbol S and therefore avoid the Z → S application used
in G 0 . The productions of the form A → xB are common to both grammars, and the final step in G 0 that
uses A → a would be handled by two productions in G: A → a B and B → λ. An induction argument on
the number of productions in a derivation will show that every derivation from G 0 has a corresponding
derivation in G that produces the same terminal string, and vice versa. Thus, L(G 0 ) = L(G), which justifies
that G 0 is equivalent to G. G 0 was constructed to conform to the conditions specified by the theorem, and
thus the proof is complete.
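The construction in this proof is also easy to mechanize. The Python sketch below is our rendering of it, not the book's; productions are encoded as (head, right-hand-side) pairs, with ("a", "B") standing for A → a B and the empty tuple standing for A → λ.

def theorem_8_8(P, S, Z="Z"):
    # P is assumed to be in the normal form of Theorem 8.7.
    nullable = {A for (A, rhs) in P if rhs == ()}     # A with A -> lambda in P
    Pp = {(Z, (S,))}                                  # the new rule Z -> S
    for A, rhs in P:
        if len(rhs) == 2:
            Pp.add((A, rhs))                          # keep A -> aB
            a, B = rhs
            if B in nullable:
                Pp.add((A, (a,)))                     # compensate with A -> a
    if S in nullable:
        Pp.add((Z, ()))                               # Z -> lambda exactly when S -> lambda was in P
    return Pp

Note that the B → λ rules themselves are deliberately dropped: the added A → a productions absorb the final step of any derivation that used them.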

Corollary 8.3 Every type 3 language is also a type 2 language.


Proof. Let L be a type 3 language. Then there exists a right-linear grammar G that generates L. By
Theorem 8.8, there is an equivalent right-linear grammar G 0 that satisfies the definition of a context-free
grammar. Thus, L is context free.

Section 8.1 explored several generalizations of the definition of a regular grammar, and, unlike the
generalization from DFAs to NDFAs, new and larger classes of languages result from these generaliza-
tions. These new types of grammars will be explored in the following chapters, and the corresponding
generalized machines will be developed.

Exercises
8.1. Can strings like abBAdBc (where B and A are nonterminals) ever be derived from the start symbol
S in a right-linear grammar? Explain.

8.2. Given A and G A as defined in Lemma 8.1, let P (n) be the statement that (∀x ∈ Σ^n )(∃ j ∈ N) [if
t 0 ⇒* xt j , then δ A (t 0 , x) = t j ]. Prove that P (n) is true for all n ∈ N.

8.3. Give regular expressions that describe the language generated by:

(a) G 4 = 〈{S, A, B,C ,V,W, X }, {a, b, c}, S, {S → abA | bbccV , A → bC | cX , B → abbB | ccab,
C → λ | cS, V → aV | cX , W → aa | aaW , X → bV | aaX }〉
(b) G 5 = 〈{S 0 , S 1 , S 2 }, {0, 1}, S 0 , {S 0 → λ | 0S 2 | 1S 1 , S 1 → 0S 1 | 1S 2 , S 2 → 0S 2 | 1S 0 }〉
(c) G 6 = 〈{T, Z }, {a, b}, Z , {Z → aZ , Z → bT, T → aZ }〉
(d) G 7 = 〈{S, B,C }, {a, b, c}, S, {S → aS | abB | cC , B → abB | λ, C → cC | ca}〉

8.4. Use the inductive fact proved in Exercise 8.2 to formally prove Lemma 8.1.

8.5. Draw the automata corresponding to the grammars given in Exercise 8.3.

8.6. Give, if possible, right-linear grammars that will generate:

(a) All words in {a, b, c}∗ that do not contain two consecutive b’s.
(b) All words in {a, b, c}∗ that do contain two consecutive b’s.
(c) All words in {a, b, c}∗ that have the same number of a’s as b’s.
(d) All words in {a, b, c}∗ that have an even number of a’s.
(e) All words in {a, b, c}∗ that do not end in the letter b.
(f) All words in {a, b, c}∗ that do not contain any c’s.

8.7. Give left-linear grammars that will generate the languages described in Exercise 8.6.

8.8. Complete the inductive portion of the proof of Theorem 8.8.

8.9. Complete the inductive portion of the proof of Theorem 8.5.

8.10. Use the more efficient algorithm indicated in Theorem 8.3 to find regular expressions to describe
L(G 5 ), L(G 6 ), and L(G 7 ) in Exercise 8.3.

8.11. (a) Restate Theorem 8.3 so that it generates valid language equations for left-linear grammars.
(b) Restate Lemmas 8.4 and 8.5 for these new types of equations.
(c) Use your new methods to find a regular expression for L(G 2 ) in Example 8.12.

8.12. Consider the grammar Q = 〈{I , F }, {0, 1, .}, I , {I → 0I | 1I | 0.F | 1.F, F → λ | 0F | 1F }〉. L(Q) is the
set of all (terminating) binary numbers, including 101.11, 011., 10.0, 0.010, and so on.

(a) Find the corresponding NDFA for this grammar.


(b) Write the right-linear equations corresponding to this grammar.
(c) Solve the equations found in part (b) for both unknowns.

8.13. Find right-linear grammars for:

(a) (a ∪ b)c∗ (d ∪ (ab)∗ )
(b) (a ∪ b)∗ a(a ∪ b)∗

8.14. Find left-linear grammars for:

(a) (a ∪ b)c∗ (d ∪ (ab)∗ )
(b) (a ∪ b)∗ a(a ∪ b)∗

8.15. (a) Describe an efficient algorithm that will convert a right-linear grammar into a left-linear gram-
mar.
(b) Apply your algorithm to
G 4 = 〈{S, A, B,C ,V,W, X }, {a, b, c}, S, {S → abA | bbccV , A → bC | cX , B → abbB | ccab,
C → λ | cS, V → aV | cX , W → aa | aaW , X → bV | aaX }〉

8.16. Describe an algorithm that will convert a given regular grammar G into another regular grammar
G 0 that generates the complement of L(G).

8.17. Without appealing to the tricks used in Chapter 12, outline an algorithm that will determine whether
the language generated by a given regular grammar G is empty.

8.18. Without appealing to the tricks used in Chapter 12, outline an algorithm that will determine whether
the language generated by a given regular grammar G is infinite.

8.19. Without appealing to the tricks used in Chapter 12, outline an algorithm that will determine whether
two right-linear grammars G 1 and G 2 generate the same language.

8.20. Consider the grammar

H = 〈{A, B, S}, {a, b, c}, S, {S → aSBc, S → λ, SB → bS, cB → Bc}〉

Determine L(H ).

8.21. What is wrong with proving that GΣ is closed under concatenation by using the following construc-
tion? Let G 1 = 〈Ω1 , Σ, S 1 , P 1 〉 and G 2 = 〈Ω2 , Σ, S 2 , P 2 〉 be two right-linear grammars, and, without loss
of generality, assume that Ω1 ∩ Ω2 = ∅. Choose a new nonterminal Z such that Z ∉ Ω1 ∪ Ω2 , and
define a new grammar G 0 = 〈Ω1 ∪ Ω2 ∪ {Z }, Σ, Z , P 1 ∪ P 2 ∪ {Z → S 1 · S 2 }〉. Note: It is straightforward
to show that L(G 0 ) = L(G 1 ) · L(G 2 ) (see Chapter 9).

8.22. Prove that GΣ is closed under concatenation by:

(a) Constructing a new grammar G 0 with the property that L(G 0 ) = L(G 1 ) · L(G 2 ).
(b) Proving that L(G 0 ) = L(G 1 ) · L(G 2 ).

8.23. Use the constructs presented in this chapter to solve the following problem from Chapter 4: Given
a nondeterministic finite automaton A without λ-transitions, show that it is possible to construct a
nondeterministic finite automaton with λ-transitions A 0 with the properties (1) A 0 has exactly one
start state and exactly one final state, and (2) L(A 0 ) = L(A).

8.24. Complete the proof of Lemma 8.2 by:

(a) Defining an appropriate inductive statement.


(b) Proving the statement defined in part (a).

8.25. Complete the proof of Lemma 8.3 by:

(a) Defining an appropriate inductive statement.


(b) Proving the statement defined in part (a).

8.26. Fill in the details in the second half of the proof of Theorem 8.2 by providing reasons for each of the
assertions that were made.

8.27. (a) Refer to Example 8.7 and use induction to formally prove that L(G 1 ) = {a^n baa | n ∈ N}.
(b) Refer to Example 8.9 and use induction to formally prove that L(G 8 ) = {a^n baa | n ∈ N}.

8.28. Notice that regular grammars are defined to have production sets that contain only right-linear-
type productions or only left-linear-type productions. Consider the following grammar C , which
contains both types of productions:

C = 〈{S, A, B }, {0, 1}, S, {S → 0A | 1B | 0 | 1 | λ, A → S0, B → S1}〉.

Note that S ⇒ 0A ⇒ 0S0 ⇒ 01B0 ⇒ 01S10 ⇒ 0110.

(a) Find L(C ).


(b) Is L(C ) FAD?
(c) Should the definition of regular grammars be expanded to include grammars like this one?
Explain.

8.29. (a) Why was it important to assume that Ω1 ∩ Ω2 = ∅ in the proof of Theorem 8.4? Give an exam-
ple.
(b) Why was it possible to assume that Ω1 ∩ Ω2 = ∅ in the proof of Theorem 8.4? Give a justifica-
tion.

8.30. Consider the NDFA AG defined in Lemma 8.2. If AG is disconnected, what does this say about the
grammar G?

8.31. Apply Lemma 8.1 to the automata in Figure 8.4.

8.32. (a) Restate Lemma 8.1 so that it directly applies to NDFAs.


(b) Prove this new lemma.
(c) Assume Σ = {a, b, c} and apply this new lemma to the automata in Figure 8.5.

8.33. Let Σ = {a, b}. Define context-free grammars for the following languages:

(a) L 1 = all words in Σ∗ for which the last letter matches the first letter.
(b) L 2 = all odd-length words in Σ∗ for which the first letter matches the center letter.
(c) L 3 = all words in Σ∗ for which the last letter matches none of the other letters.
(d) L 4 = all even-length words in Σ∗ for which the two center letters match.
(e) L 5 = all odd-length words in Σ∗ for which the center letter matches none of the other letters.
(f) Which of the above languages are regular?

8.34. Define context-free grammars for the following languages:

(a) L = {x ∈ {a, b}∗ | |x|a < |x|b }
(b) G = {x ∈ {a, b}∗ | |x|a ≥ |x|b }
(c) K = {w ∈ {0, 1}∗ | w = w^r }
(d) Φ = {x ∈ {a, b, c}∗ | ∃ j , k, m ∈ N ∋ x = a^j b^k c^m , where j ≥ 3 and k = m}

8.35. Define context-free grammars for the following languages:

(a) L 1 = {x ∈ {a, b}∗ | |x|a = 2|x|b }

Figure 8.4: Automata for Exercise 8.31

Figure 8.5: Automata for Exercise 8.32

(b) L 2 = {x ∈ {a, b}∗ | |x|a ≠ |x|b }
(c) The set of all postfix expressions over the alphabet {A, B, +, −}
(d) The set of all parenthesized infix expressions over the alphabet {A, B, +, −, (, )}

8.36. Define context-sensitive grammars for the following languages:

(a) Γ = {x ∈ {0, 1, 2}∗ | ∃w ∈ {0, 1}∗ ∋ x = w · 2 · w} = {2, 121, 020, 11211, 10210, . . .}
(b) Φ = {x ∈ {b}∗ | ∃ j ∈ N ∋ |x| = 2^j } = {b, bb, bbbb, b^8 , b^16 , b^32 , . . .}

8.37. Consider the grammar

G = 〈{A, B, S}, {a, b, c}, S, {S → aSBc, S → λ, SB → bS, cB → Bc}〉

Show that this context-sensitive grammar is not equivalent to G 00 given in Example 8.3, where

G 00 = 〈{A, B, S, T }, {a, b, c}, S, {S → aSBc, S → T, T → λ, T B → bT, cB → Bc}〉

8.38. Design context-free grammars that accept:

(a) L 1 = a∗ (b ∪ c)∗ ∩ {x ∈ {a, b, c}∗ | |x|a = |x|b + |x|c }
(b) L 2 = {x ∈ {a, b, c}∗ | ∃i , j , k ∈ N ∋ x = a^i b^j c^k , where i + j = k}
(c) L 3 = {x ∈ {a, b, c}∗ | |x|a + |x|b = |x|c }

8.39. Refer to Definition 8.4 and prove that L(G 0 ) = L(G) ∪ {λ}.

8.40. Refer to Definition 8.6 and prove that L(G 0 ) = L(G) ∪ {λ}.

8.41. (a) Show that if G is in the form specified in Theorem 8.8, so is the G ∗ described in Theorem 8.5.
(b) Give an example that shows that, even if G 1 and G 2 are in the form specified in Theorem 8.8,
the grammar G ∪ described in Theorem 8.4 may not be.
(c) Is your construction for G • in Example 8.18 normal form preserving?

8.42. Given two left-linear grammars G 1 and G 2 , give a set of rules to find a new left-linear grammar that
will generate:

(a) L(G 1 ) ∪ L(G 2 )


(b) L(G 1 ) · L(G 2 )
(c) L(G 1 )∗

Chapter 9

Context-Free Grammars

The preceding chapter explored the properties of the type 3 grammars. The next class of grammars in the
language hierarchy, the type 2 or context-free grammars, are central to the linguistic aspects of computer
science. Context-free grammars were originally used to help specify natural languages and are thus well-
suited for defining computer languages. These context-free grammars represent a much wider class
of languages than did the regular grammars. Due to the need for balancing parentheses and matched
begin-end pairs (among other things), the language Pascal cannot be specified by a regular grammar, but
it can be defined with a context-free grammar. Programming languages are specifically designed to be
representable by context-free grammars in order to take advantage of the desirable properties inherent
in type 2 grammars. These properties are explored in this chapter, while Chapter 10 investigates the
generalized automata corresponding to context-free languages.

9.1 Parse Trees


Derivations in a context-free grammar are similar to those of regular grammars, and the definition of
derivation given below is compatible with that given in Definition 8.8.

Definition 9.1 Let Σ be any alphabet, G = 〈Ω, Σ, S, P 〉 be a context-free grammar, αAγ ∈ (Σ ∪ Ω)∗ , and
A → β be a production in P . We will say that αβγ can be directly derived from αAγ by applying the
production A → β, and write αAγ ⇒ αβγ. Furthermore, if (α1 ⇒ α2 ) ∧ (α2 ⇒ α3 ) ∧ · · · ∧ (αn−1 ⇒ αn ),
then we will say that α1 derives αn and write α1 ⇒* αn .

As with Definition 8.8, α1 ⇒* α1 in zero steps. In generating a particular string, regular grammars
typically allowed only a single sequence of applicable productions. Context-free grammars are generally
more robust, as shown by Example 9.4, which illustrates several derivations for a single string.
The special nature of the productions in a context-free grammar, which replace a single nonterminal
with a string of symbols, allows derivations to be diagrammed in a treelike structure, much as sentences
are diagrammed in English. For example, the rules of English specify that a sentence is composed of a
subject followed by a predicate, which is reflected in the production

<sentence> → <subject><predicate>

Other rules include


<noun phrase> → <modifier><noun>

Figure 9.1: A parse tree for the English grammar

and
<predicate> → <verb><prepositional phrase>

A specific sequential application of these and other rules to form an English sentence might be dia-
grammed as shown in Figure 9.1. Such diagrams are called parse trees or derivation trees.

Definition 9.2 A parse tree or derivation tree for a regular or context-free grammar G = 〈Ω, Σ, S, P 〉 is a
labeled, ordered tree in which the root node is labeled S, and the n subtrees of a node labeled A are labeled
α1 through αn only if A → α1 · α2 · · · αn is a production in P , and each αi ∈ (Ω ∪ Σ). However, if B → λ is a
production in P , then a node labeled B may instead have a single subtree labeled λ. The parse tree is called
complete if no leaf is labeled with a nonterminal.

Recall that for context-free grammars only the start symbol Z can have a λ-production (Z → λ);
regular grammars are allowed to have several such rules.

Example 9.1
As illustrated in Figure 9.1, a parse tree shows a particular sequence of substitutions allowed by a given
grammar. A left-to-right rendering of the leaves of this complete parse tree yields the terminal string “the
check is in the mail.”

Figure 9.2: The parse tree discussed in Example 9.2

Example 9.2
Regular grammars form parse trees that are much more restrictive; at any given level in the tree, only one
node can be labeled with a nonterminal. Figure 9.2 shows the parse tree for the word aaabaa from the
grammar
G 1 = 〈{T, S}, {a, b}, S, {S → aS, S → bT, T → aa}〉.

In general, since productions in a right-linear grammar allow only the rightmost symbol to be a non-
terminal, parse trees for right-linear grammars will only allow the rightmost child of a node to have a
nontrivial subtree.

Example 9.3
Given a context-free grammar G, a common task required of compilers is to scan a proposed terminal
string x belonging to L(G) and build a parse tree corresponding to x. If G is the “regular expression”
grammar defined in Example 8.2, G = 〈{R}, {a, b, c, (, ), ε, ∅, ∪, ·, ∗}, R, {R → a | b | c | ε | ∅ | (R · R) |
(R ∪ R) | R∗ }〉, and x is ((a ∪ b) · c), the desired result would be a representation of the tree shown in
Figure 9.3.
In a perfect world of perfect programmers, it might be appropriate to assume that x can definitely
be generated by the productions in G. In our world, however, compilers must unfortunately perform
the added task of determining whether it is possible to generate the proposed terminal string x, that
is, whether the file presented represents a syntactically correct program. This is typically done as the
parse trees are being built, and discrepancies are reported to the user. For the “regular expression” gram-
mar used in Example 9.3, there is an algorithm for scanning the symbols of proposed strings such as
((a ∪ b)∗· c) to determine whether a parse tree can be constructed. In the case of a string like ((a ∪ b) b),
no such parse tree exists, and the string therefore cannot be generated by the grammar. If the produc-
tions of a grammar follow certain guidelines, the task of finding the correct scanning algorithm is greatly
simplified. The desired properties that should be inherent in a programming language grammar are
investigated later in the text.
In a separate phase, after the parse trees are found, the compiler then uses the trees and other con-
structs to infer the meaning of the program, that is, to generate appropriate machine code that reflects the
advertised meaning (that is, the semantics) of the program statements. For example, the parse tree for

Figure 9.3: The parse tree discussed in Example 9.3

((a ∪ b)∗ · c) in Figure 9.3 clearly shows both the order in which the operators ∪, ·, and ∗ should be applied
and the expressions to which they should be applied.
Given a particular complete parse tree for a string x, there may be some freedom in the order in
which the associated productions are applied.

Example 9.4
For the grammar

G = 〈{R}, {a, b, c, (, ), ε, ∅, ∪, ·, ∗}, R, {R → a | b | c | ε | ∅ | (R · R) | (R ∪ R) | R∗ }〉

each of the following is a valid derivation of the string x = ((a ∪ b)∗ · c).
Derivation 1:
R ⇒ (R · R)
⇒ (R ∗ · R)
⇒ ((R ∪ R)∗ · R)
⇒ ((a ∪ R)∗ · R)
⇒ ((a ∪ b)∗ · R)
⇒ ((a ∪ b)∗ · c)
Derivation 2:
R ⇒ (R · R)
⇒ (R ∗ · R)
⇒ ((R ∪ R)∗ · R)
⇒ ((R ∪ R)∗ · c)
⇒ ((R ∪ b)∗ · c)
⇒ ((a ∪ b)∗ · c)

Derivation 3:
R ⇒ (R · R)
⇒ (R · c)
⇒ (R∗ · c)
⇒ ((R ∪ R)∗ · c)
⇒ ((a ∪ R)∗ · c)
⇒ ((a ∪ b)∗ · c)

Derivation 4:
R ⇒ (R · R)
⇒ (R · c)
⇒ (R∗ · c)
⇒ ((R ∪ R)∗ · c)
⇒ ((R ∪ b)∗ · c)
⇒ ((a ∪ b)∗ · c)

Definition 9.3 A derivation sequence is called a leftmost derivation if at each step in the sequence the
leftmost nonterminal is next expanded to produce the following step. A derivation sequence is called a
rightmost derivation if at each step in the sequence the rightmost nonterminal is next expanded to produce
the following step.

The first of the derivations given in Example 9.4 is a leftmost derivation since at each step it is always
the leftmost nonterminal that is expanded to arrive at the next step. Similarly, the last of these, derivation
4, is a rightmost derivation. There are many other possible derivations, such as derivations 2 and 3, which
are neither leftmost nor rightmost.
The restrictions on regular grammars ensure that there is never more than one nonterminal present
at any point during a derivation. This linear nature of regular grammars ensures that all derivations of
a parse tree follow exactly the same sequence, since there is never a choice of nonterminals to expand.
Thus, the rightmost derivation of a parse tree in a regular grammar is always the same as its leftmost
derivation.
Parse trees in context-free grammars are generally more robust, allowing several different derivation
sequences to correspond to the same tree. For a given parse tree, though, there is only one leftmost
derivation. In Figure 9.4, the nodes in the parse tree for ((a a ∪ b )∗ · c ) are numbered to show the order in
which they would be visited by a preorder traversal. Note that the sequence in which the nonterminals
would be expanded in a leftmost derivation corresponds to the order in which they appear in the pre-
order traversal.
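This correspondence can be demonstrated with a short program. The sketch below is our illustration (the Node encoding is an assumption, not the text's notation): repeatedly expanding the leftmost unexpanded node of a parse tree yields exactly the leftmost derivation, in preorder.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    symbol: str
    children: List["Node"] = field(default_factory=list)   # empty for a leaf

def leftmost_derivation(root):
    # Terminals are leaves, so the leftmost node with children pending
    # is always the leftmost nonterminal of the current sentential form.
    form, steps = [root], [root.symbol]
    while any(n.children for n in form):
        i = next(k for k, n in enumerate(form) if n.children)
        form[i:i + 1] = form[i].children
        steps.append("".join(n.symbol for n in form))
    return steps

# The parse tree of abaa in G1 of Example 9.2 (S -> aS, S -> bT, T -> aa):
tree = Node("S", [Node("a"), Node("S", [Node("b"), Node("T", [Node("a"), Node("a")])])])
assert leftmost_derivation(tree) == ["S", "aS", "abT", "abaa"]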

9.2 Ambiguity
Whereas each tree corresponds to a unique leftmost derivation, it is possible for a terminal string to have
more than one leftmost derivation. This will happen whenever a string x corresponds to more than one
parse tree, that is, whenever there are truly distinct ways of applying the productions of the grammar to
form x. Grammars for which this can happen are called ambiguous.

Definition 9.4 A grammar G = 〈Ω, Σ, S, P 〉 is called ambiguous if there exists a string x ∈ Σ∗ that corre-
sponds to two distinct parse trees. A grammar that is not ambiguous is called unambiguous.

Figure 9.4: The preorder traversal of the parse tree

Example 9.5
Consider the grammar G 2 = 〈{S, A}, {a}, S, {S → AA, A → aSa, A → a }〉. Figure 9.5 shows the two distinct
parse trees associated with the word aaaaa. Note that the leftmost derivations corresponding to these
trees are indeed different:
S ⇒ AA
⇒ aSaA
⇒ aAAaA
⇒ aaAaA
⇒ aaaaA
⇒ aaaaa
is the sequence indicated by the parse tree in Figure 9.5a, while
S ⇒ AA
⇒ aA
⇒ aaSa
⇒ aaAAa
⇒ aaaAa
⇒ aaaaa
corresponds to Figure 9.5b.
Recall that context-free grammars are used to inspect statements within a computer program and
determine corresponding parse trees. Such ambiguity is undesirable in a grammar that describes a pro-
gramming language, since it would be unclear which of the trees should be used to infer the meaning
of the string. Indeed, this ambiguity would be intolerable if a statement could give rise to two trees that
implied different meanings, as illustrated in Example 9.6 below. It is therefore of practical importance to
avoid descriptions of languages that entail this sort of ambiguity.

Figure 9.5: (a) A parse tree for aaaaa in Example 9.5 (b) An alternate parse tree for aaaaa

The language defined by the grammar G 2 in Example 9.5 is actually quite simple. Even though G 2
is not a regular grammar, it can easily be shown that L(G 2 ) is the regular set {a^2 , a^5 , a^8 , a^11 , a^14 , . . .}.
The ambiguity is therefore not inherent in the language, but is rather a consequence of the needlessly
complex grammar used to describe the language. A much simpler context-free grammar is given by
G 3 = 〈{T }, {a}, T, {T → aaaT, T → aa}〉. This grammar happens to be right linear and is definitely not
ambiguous.

Example 9.6

The following sampling from a potential programming language grammar illustrates the semantic prob-
lems that can be caused by ambiguity. Consider the grammar G s = 〈{<expression>, <identifier>},
{a, b, c, d , −}, <expression>, P 〉, where P consists of the productions

<expression> → <identifier>
<expression> → <identifier> – <expression>
<expression> → <expression> – <identifier>
<identifier> → a
<identifier> → b
<identifier> → c
<identifier> → d

L(G s ) then contains the string a − b − d , which can be generated by two distinct parse trees, as shown in
Figure 9.6. Figure 9.6a corresponds to the following leftmost derivation.

<expression> ⇒ <expression> – <identifier>


⇒ <identifier> – <expression> – <identifier>
⇒ a− <expression> – <identifier>
⇒ a− <identifier> – <identifier>
⇒ a − b− <identifier>
⇒ a −b −d

Figure 9.6b corresponds to a different leftmost derivation, as shown below.

<expression> ⇒ <identifier> – <expression>


⇒ a− <expression>
⇒ a− <identifier> – <expression>
⇒ a − b− <expression>
⇒ a − b− <identifier>
⇒ a −b −d

If the productions of G s were part of a grammatical description of a programming language, there are
obvious semantics associated with the productions involving the − operator. The productions

<expression> → <identifier> − <expression>

and
<expression> → <expression> − <identifier>

indicate that two values should be combined using the subtraction operator to form a new value. The
compiler would be responsible for generating code that carried out the appropriate subtraction. Unfor-
tunately, the two parse trees give rise to functionally different code. For the parse tree in Figure 9.6a, the
subtraction will be performed left to right, while in the parse tree in Figure 9.6b the ordering of the opera-
tors is right to left. Subtraction is not a commutative operation, and the expression (a −b)−d will usually
produce a different value than a − (b − d ). Ambiguity can thus be a fatal flaw in a grammar describing a
programming language.
In the language L(G s ) discussed in Example 9.6, the ambiguity is again not inherent in the language
itself, but is rather a consequence of the specific productions in the grammar G s describing the language.
In most programming languages, the expression a − b − d is allowed and has a well-defined meaning.
Most languages decree that such expressions be evaluated from left to right, and hence a − b − d would
be interpreted as (a − b) − d . This interpretation can be enforced by simply removing the production

<expression> → <identifier> − <expression>

Figure 9.6: (a) A parse tree for a − b − d in Example 9.6 (b) An alternate parse tree for a − b − d

from G s to form the new grammar
G m = 〈{<expression>, <identifier>}, {a, b, c, d , −}, <expression>, P r 〉
where P r consists of the productions
<expression> → <identifier>
<expression> → <expression> – <identifier>
<identifier> → a
<identifier> → b
<identifier> → c
<identifier> → d
It should be clear that G s and G m are equivalent, and both generate the regular language
((a ∪ b ∪ c ∪ d ) · −)∗ · (a ∪ b ∪ c ∪ d ). G m gives rise to unique parse trees and is therefore unambiguous. It should be
noted that the language could have been defined with a single nonterminal; a simpler grammar equiva-
lent to G m is G t = 〈{T }, {a, b, c, d , −}, T, {T → a | b | c | d | T − T }〉. However, since G t is ambiguous, it is much
more difficult to work with than G m . The pair of nonterminals <expression> and <identifier> are used to
circumvent the ambiguity problem in this language. For the grammar G m , the production
<expression> → <expression> − <identifier>
contains the nonterminal <expression> to the left of the subtraction token and <identifier> to the right of
the − . Since <identifier> can only be replaced by a terminal representing a single variable, the resulting
parse tree will ensure that the entire expression to the left of the − will be evaluated before the operation
corresponding to this current subtraction token is performed. In this fashion, the distinction between
the two nonterminals forces a left-to-right evaluation sequence. In fact, a more robust language with
other operators like × and ÷ will require more nonterminals to enforce the default precedence among
these operators.
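The effect of this nonterminal split can be mirrored in a parser. The sketch below (ours; the token-list interface is an assumption) folds identifiers to the left exactly as the production <expression> → <expression> − <identifier> dictates, so a − b − d comes out grouped as (a − b) − d.

def parse_expression(tokens):
    # tokens: e.g. ["a", "-", "b", "-", "d"]; returns a nested tuple tree.
    tree = tokens[0]                        # <expression> -> <identifier>
    i = 1
    while i < len(tokens):                  # <expression> -> <expression> - <identifier>
        tree = (tree, "-", tokens[i + 1])   # each '-' extends the tree on the left
        i += 2
    return tree

assert parse_expression(["a", "-", "b", "-", "d"]) == (("a", "-", "b"), "-", "d")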
Most modern programming languages employ a solution to the ambiguity problem that is different
from the one just described. Programmers generally do not want to be constrained by operators that
can only be evaluated from left to right, and hence matched parentheses are used to indicate an order
of evaluation that may differ from the default. Thus, unambiguous grammars that correctly reflect the
meaning of expressions like d − (b − c) or even (a) − ((c − (d ))) are sought.

Example 9.7
The following grammar G P allows expressions with parentheses, minus signs, and single-letter identifiers
to be uniquely parsed.
G P = 〈{<expression>, <identifier>}, {a, b, c, d , −, (, )}, <expression>, P 00 〉
where P 00 consists of the productions
<expression> → (<expression>)
<expression> → <expression> – (<expression>)
<expression> → <identifier>
<expression> → <expression> – <identifier>
<identifier> → a
<identifier> → b
<identifier> → c
<identifier> → d

The first two productions in P 00 , which were not present in P r , are designed to handle the balancing
of parentheses. The first rule allows superfluous sets of parentheses to be correctly recognized. The sec-
ond rule ensures that an expression that is surrounded by parentheses is evaluated before the operator
outside those parentheses is evaluated. In the absence of parentheses, the left-to-right ordering of the
operators is maintained. Figure 9.7 illustrates the unique parse tree for the expression (a) − ((c − (d ))).
L(G P ) is a context-free language that is too complex to be regular; the pumping lemma for regular sets
(Theorem 2.3) can be used to show that it is impossible for a DFA to maintain an unlimited number of cor-
responding balanced parentheses. This language, and the others discussed so far, can all be expressed
by unambiguous grammars. It should be clear that every language generated by grammars has ambigu-
ous grammars that also generate it, since an unambiguous grammar can always be modified to become
ambiguous. What is not immediately clear is whether there are languages that can only be generated by
ambiguous grammars.

Definition 9.5 A context-free language L is called inherently ambiguous if every grammar that generates
L is ambiguous. A context-free language that is not inherently ambiguous is called unambiguous.

Definition 9.6 Let the class of context-free languages over the alphabet Σ be denoted by CΣ . Let the class
of unambiguous context-free languages be denoted by UΣ .

Theorem 9.1 There are context-free languages that are inherently ambiguous; that is, UΣ is properly con-
tained in CΣ .
Proof. The language L = {a^n b^n c^m d^m | n, m ∈ N} ∪ {a^i b^j c^j d^i | i , j ∈ N} is a context-free language (see the
exercises). L is also inherently ambiguous, since there must exist two parse trees for some of the strings
in the intersection of the two sets {a^n b^n c^m d^m | n, m ∈ N} and {a^i b^j c^j d^i | i , j ∈ N}. The proof of this last
statement is tedious to formalize; the interested reader is referred to [HOPC].

Theorem 9.1 states that there exist inherently ambiguous type 2 languages. No type 3 language is
inherently ambiguous. Even though there are regular grammars that are ambiguous, every regular gram-
mar has an equivalent grammar that is unambiguous. This assertion is supported by the following ex-
amples and results.

Example 9.8
Consider the following right-linear grammar G r :

G r = 〈{S, B,C }, {a, b, c}, S, {S → aB, S → abC , B → bc, C → c }〉

Only one terminal string can be derived from G r but this word has two distinct derivation trees, as shown
in Figure 9.8. Thus, there are regular grammars that are ambiguous.

Theorem 9.2 Given any right-linear grammar G = 〈Ω, Σ, S, P 〉, there exists an equivalent right-linear
grammar that is unambiguous.
Proof. Let G 0 be the right-linear grammar obtained from the determinized, λ-free version of AG . That
is, beginning with the right-linear grammar G, use the construction out-
lined in Lemma 8.2 to find the corresponding automaton AG . Use Definition 4.9 to remove the lambda-
transitions and Definition 4.5 to produce a deterministic machine, and then apply the construction out-
lined in Lemma 8.1 to form the new right-linear grammar G 0 . By Lemma 8.2, Theorem 4.2, Theorem 4.1,
and Lemma 8.1, the language defined by each of these constructs is unchanged, so G 0 is equivalent to G.

Figure 9.7: The parse tree discussed in Example 9.7

Figure 9.8: The parse trees discussed in Example 9.8

Due to the deterministic nature of the machine from which this new grammar was built, the resulting
parse tree for a given string must be unique, since only one production is applicable at any point in the
derivation. A formal inductive statement of this property is left as an exercise.

Corollary 9.1 The class GΣ of languages generated by regular grammars is properly contained in UΣ .
Proof. Containment follows immediately from Theorem 9.2. Proper containment is demonstrated by
the language and grammar discussed in Example 9.3.

Example 9.9
The right-linear grammar

G r = 〈{S, B,C }, {a, b, c}, S, {S → aB, S → abC , B → bc, C → c }〉

in Example 9.8 can be transformed, as outlined in Theorem 9.2, into an unambiguous grammar. The
automaton corresponding to G r , found by applying the technique given in Lemma 8.2, is shown in Figure
9.9a. The version of this automaton without lambda-moves (with the inaccessible states not shown)
is illustrated in Figure 9.9b. The deterministic version, with the disconnected states again removed, is
given in Figure 9.9c. For simplicity, the states are relabeled in Figure 9.9d. The corresponding grammar
specified by Lemma 8.1 is

G 0 = 〈{S 0 , S 1 , S 2 , S 3 , S 4 }, {a, b, c}, S 0 , {S 0 → aS 1 | bS 4 | cS 4 , S 1 → aS 4 | bS 2 | cS 4 ,
S 2 → aS 4 | bS 4 | cS 3 , S 3 → λ | aS 4 | bS 4 | cS 4 , S 4 → aS 4 | bS 4 | cS 4 }〉

The orderly nature of this resulting type of grammar easily admits the specification of an algorithm
that scans a proposed terminal string and builds the corresponding parse tree. The partial parse tree for
a string such as abb would be as pictured in Figure 9.10a. This would clearly be an invalid string since S 4
cannot be replaced by λ. By contrast, the tree for the word abc would produce a complete parse tree, and
it is instructive to step through the process by which it is built. The root of the tree must be labeled S 0 , and
scanning the first letter of the word abc is sufficient to determine that the first production to be applied is
S 0 → a S 1 (since no other S 0 -rule immediately produces an a ). Scanning the next letter provides enough
information to determine that the next S 1 rule that is used must be S 1 → b S 2 , and the third letter admits
the production S 2 → c S 3 and no other. Recognizing the end of the string causes a check for whether the
current non terminal can produce the empty string. Since S 3 → λ is in the grammar, the string abc is a
valid terminal string, and corresponds to the parse tree shown in Figure 9.10b.
Grammars that admit scanning algorithms like the one outlined above are called LL0 grammars since
the parse tree can be deduced using a left-to-right scan of the proposed string while looking ahead 0
symbols to produce a leftmost derivation. That is, the production that produces a given symbol can be
immediately determined without regard to the symbols that follow.
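For the grammar G 0 above, the LL0 scan amounts to running the deterministic automaton it came from. The following Python sketch is ours; the transition table is read directly off the productions of G 0 .

DELTA = {  # (nonterminal, symbol) -> next nonterminal
    ("S0", "a"): "S1", ("S0", "b"): "S4", ("S0", "c"): "S4",
    ("S1", "a"): "S4", ("S1", "b"): "S2", ("S1", "c"): "S4",
    ("S2", "a"): "S4", ("S2", "b"): "S4", ("S2", "c"): "S3",
    ("S3", "a"): "S4", ("S3", "b"): "S4", ("S3", "c"): "S4",
    ("S4", "a"): "S4", ("S4", "b"): "S4", ("S4", "c"): "S4",
}
NULLABLE = {"S3"}          # nonterminals with an A -> lambda rule

def ll0_valid(word):
    # The production producing each symbol is determined with no lookahead;
    # the word is valid iff the final nonterminal can produce lambda.
    state = "S0"
    for ch in word:
        state = DELTA[(state, ch)]
    return state in NULLABLE

assert ll0_valid("abc") and not ll0_valid("abb")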
Note that the grammar G 3 = 〈{T }, {a}, T, {T → aaaT, T → aa}〉 is LL2; that is, upon seeing a , the scan-
ner must look ahead two symbols to see if the end-of-string marker is imminent. In this grammar, a may
be produced by either of the two T -rules; the letters following this symbol in the proposed string are an
important factor in determining which production must be applied. The language described by G 3 is
simple enough to be defined by a grammar that is LL0, since every regular grammar can be transformed
as suggested by the proof of Theorem 9.2.
The deterministic orderliness of LL0 grammars may be generally unattainable, but it represents a de-
sirable goal that a compiler designer would strive to approximate when specifying a grammatical model
of a programming language. When a grammar is being defined to serve as a guide to construct a com-
piler, an LL0 grammar is clearly the grammar of choice. Indeed, if even a portion of a context-free gram-
mar conforms to the LL0 property, this is of considerable benefit. Whereas the technique outlined in
Theorem 9.2 could be applied to any regular language to find a hospitable LL0 grammar, programming
languages are generally more complex than regular languages, and these languages are unlikely to have
LL0 models. For context-free languages, it is much more likely that it will not be possible to immedi-
ately determine which production (or sequence of productions) will produce the symbol currently being
scanned. In such cases, it will be necessary to look ahead to successive symbols to make this determina-
tion.
A classic example of the need to look ahead in parsing programming languages is reflected in the
following FORTRAN statement:
DO 77 I = 1.5
Since FORTRAN allows blanks within identifiers, this is a valid statement and should cause the variable

Figure 9.9: (a) The automaton discussed in Example 9.9 (b) The simplified automaton discussed in Ex-
ample 9.9 (c) The deterministic automaton discussed in Example 9.9 (d) The final automaton discussed
in Example 9.9

Figure 9.10: (a) The partial parse tree for the string abb (b) The parse tree for the string abc

DO77I to be assigned the value 1.5. On the other hand, the statement

DO 77 I = 1, 5

specifies a “do” loop, and has an entirely different meaning. A lexical analyzer that sees the three charac-
ters ‘DO ’ cannot immediately determine whether this represents a token for a do loop, or is instead part
of a variable identifier. It may have to wait until well after the equal sign is scanned to correctly identify
the tokens.

9.3 Canonical Forms


The definition of a context-free grammar was quite broad, and it is desirable to establish canonical forms
that will restrict the type of productions that can be employed. Unrestricted context-free grammars do
not admit very precise relationships between the strings generated by the grammar and the production
sequences generating those strings. In particular, the length of a terminal string may bear very little
relation to the number of productions needed to generate that string.

Example 9.10

A string of length 18 can be generated with only three applications of productions from the grammar

〈{S}, {a, b, c}, S, {S → abcabcS, S → abcabc}〉

A string of length 1 can be generated by no less than five productions in the grammar

〈{S 1 , S 2 , S 3 , S 4 , S 5 }, {a, b, c}, S 1 , {S 1 → S 2 , S 2 → S 3 , S 3 → S 4 , S 4 → S 5 , S 5 → a }〉

It should be clear that even more extreme examples can be defined, in which the number of terminal
symbols markedly dominates the number of productions, and vice versa.
The pumping theorem for context-free grammars (Theorem 9.7) and other theorems hinge on a more
precise relationship between the number of terminal symbols produced and the number of productions
used to produce those symbols. Grammars whose production sets satisfy more rigorous constraints are
needed if such relationships are to be guaranteed. The constraints should not be so severe that some
context-free languages cannot be generated by a set of productions that conform to the restrictions. In
other words, some well-behaved normal forms are sought.
A practical step toward that goal is the abolition of productions that cannot participate in valid
derivations. The algorithm for identifying such productions constitutes an application of the algorithms
developed previously for finite automata. The following definition formally identifies productions that
cannot participate in valid derivations.

Definition 9.7 A production A → β in a context-free grammar G = 〈Ω, Σ, S, P 〉 is useful if it is part of a


derivation beginning with the start symbol and ending with a terminal string. That is, the A-rule A → β is
useful if there is a derivation S ⇒* αAω ⇒ αβω ⇒* x, where x ∈ Σ∗ .

A production that is not useful is called useless.


A nonterminal that does not appear in any useful production is called useless.
A nonterminal that is not useless is called useful.

Example 9.11
Consider the grammar with productions
S → gAe, S → aYB, S → CY
A → bBY, A → ooC
B → d, B → D
C → jVB, C → gi
D → n
U → kW
V → baXXX, V → oV
W → c
X → fV
Y → Yhm
This grammar illustrates the three basic ways a nonterminal can qualify as useless.

1. For the nonterminal W above, it is impossible to find a derivation from the start symbol S that
produces a sentential form containing W . U also lacks this quality.

2. No derivation containing the nonterminal Y can produce a terminal string. X and V are likewise
useless for the same reason.

3. B is only produced in conjunction with useless nonterminals, and it is therefore useless also. Once
B is judged useless, D is seen to be useless for similar reasons.

Theorem 9.3 Every nonempty context-free language L can be generated by a context-free grammar that
contains no useless productions and no useless nonterminals.
Proof. Note that if L were empty the conclusion would be impossible to attain: the start symbol would
be useless, and every grammar by definition must have a start symbol. Assume that L is a nonempty
context-free language. By Definition 8.6, there is a context-free grammar G = 〈Ω, Σ, S, P 〉 that generates
L. The desired grammar G u can be formed from G, with the useless productions removed from P and
the useless nonterminals removed from Ω. The new grammar G u will be equivalent to G, since the lost
items were by definition unable to participate in significant derivations. G u will then obviously contain
no useless productions and no useless nonterminals.
A grammar with the desired properties must therefore exist, but the outlined argument does not indi-
cate how to identify the items that must be removed. The following algorithm, based on the procedures
used to investigate finite automata, shows how to effectively transform a context-free grammar G into an
equivalent context-free grammar G u with no useless items.
Several nondeterministic finite automata over the (unrelated) alphabet {1} will be considered, each
identical except for the placement of the start state. The states of the NDFA correspond to nonterminals
of the grammar, and one extra state, denoted by ω, is added to serve as the only final state. A transition
from A to C will arise if a production in P allows A to be replaced by a string containing the nonterminal
C . States corresponding to nonterminals that directly produce terminal strings will also have transitions
to the sole final state ω. Formally, for the grammar G = 〈Ω, Σ, S, P 〉 and any nonterminal B ∈ Ω, define the
NDFA A B = 〈{1 1}, Ω ∪ {ω}, B, δ, {ω}〉, where δ is defined by δ(ω,1
1) = ;, and for each A ∈ Ω, let

δ(A, 1) = {C | C ∈ Ω ∧ (∃α, γ ∈ (Ω ∪ Σ)∗ )(A → αC γ ∈ P )} ∪ {ω}

if (∃α ∈ Σ∗ )(A → α ∈ P ), and

δ(A, 1) = {C | C ∈ Ω ∧ (∃α, γ ∈ (Ω ∪ Σ)∗ )(A → αC γ ∈ P )}

otherwise.
Note that, for any two nonterminals R and Q in Ω, A R and AQ are identical except for the specification
of the start state. The previously presented algorithms for determining the set of connected states in an
automaton can be applied to these new automata to identify the useless nonterminals. As noted before,
there are three basic ways a nonterminal can qualify as useless. The inaccessible states in the NDFA A S
correspond to nonterminals of the first type and can be eliminated from both the grammar and the au-
tomata. For each remaining nonterminal B , if the final state ω is not accessible in A B , then B is a useless
nonterminal of the second type and can be eliminated from further consideration in both the grammar
and the automata. Checking for disconnected states in the pared-down version of A S will identify useless
nonterminals of the third type. The process can be repeated until no further disconnected states are found.
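For comparison, the same three checks can be carried out directly on the production set, without building the automata A B . The Python sketch below is our adaptation, not the text's algorithm: it first finds the nonterminals that can generate a terminal string, and then keeps those reachable from the start symbol through productions whose right sides use only such nonterminals.

def useful_nonterminals(nonterminals, start, productions):
    # productions: (head, rhs) pairs, rhs a tuple over nonterminals and terminals.
    generating, changed = set(), True
    while changed:                          # A generates a terminal string if some
        changed = False                     # A-rule has only generating nonterminals
        for head, rhs in productions:
            if head not in generating and all(
                    s not in nonterminals or s in generating for s in rhs):
                generating.add(head)
                changed = True
    reachable, frontier = {start}, [start]
    while frontier:                         # reachability through productions whose
        A = frontier.pop()                  # right sides can still terminate
        for head, rhs in productions:
            if head == A and all(s not in nonterminals or s in generating for s in rhs):
                for s in rhs:
                    if s in nonterminals and s not in reachable:
                        reachable.add(s)
                        frontier.append(s)
    return generating & reachable

Applied to the grammar of Example 9.11, this returns {S, A, C}, in agreement with the outcome obtained via the automata in Example 9.12 below.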

Example 9.12
Consider again the grammar introduced in Example 9.11. The structure of each of the automata is similar
to that of A S , shown in Figure 9.11a. Note that the disconnected states are indeed W and U , which can
be eliminated from the state transition table. Checking the accessibility of ω in A S , A A , A B , AC , and
A D result in no changes, but V , X , and Y are eliminated when AV , A X , and A Y are examined, resulting
in the automaton displayed in Figure 9.11b. Eliminating transitions associated with the corresponding
useless productions yields the automaton shown in Figure 9.11c. Checking for disconnected states in
this machine reveals the remaining inaccessible states. Thus, the equivalent grammar G u with no useless
nonterminals contains only the productions S → gAe, A → ooC , and C → gi.

Note that the actual language described by the NDFA A S is of no consequence, nor may any finite
automaton be capable of producing the context-free language in question. However, the above method
illustrates that the tools developed for automata can be brought to bear in areas that do not directly
apply to FAD languages. A more efficient algorithm for identifying useless nonterminals can be found in
[HOPC]. If computerized, such a tailored algorithm would consume less CPU time than if the automata
modules described above were employed. In terms of the programming effort required, though, it is
often more advantageous to adhere to the “toolbox approach” and adapt existing tools to new situations.
Note that the algorithm developed in Theorem 9.3 relied on connectedness, and hence the speci-
fication of the final states was unimportant in this approach. With ω as the lone final state, some of
the decision algorithms developed in Chapter 12 could have been used in place of the connectivity and
accessibility checks.
Example 9.12 illustrates the simplification that can be attained by the elimination of useless produc-
tions. Further convenience is afforded by the elimination of nongenerative A-rules of the form A → B .
Recall that in the grammar 〈{S 1 , S 2 , S 3 , S 4 , S 5 }, {a, b, c}, S 1 , {S 1 → S 2 , S 2 → S 3 , S 3 → S 4 , S 4 → S 5 , S 5 → a }〉, all
the nonterminals were useful, but the production set was still needlessly complex.

Definition 9.8 A production of the form A → B , where A, B ∈ Ω, is called a unit production or a nongener-
ative production.

As with the elimination of useless nonterminals, unit productions can be removed with the help of
automata constructs. The interested reader is referred to [DENN] for the constructive proof. The proof
given below indicates the general algorithmic approach.

Theorem 9.4 Every pure context-free language L can be generated by a pure context-free grammar which
contains no useless non-terminals and no unit productions. Every context-free language L 0 can be gener-
ated by a context-free grammar which contains no useless non-terminals and no unit productions except
perhaps the Z -rule Z → S, where Z is the new start symbol.
Proof. If the first statement of the theorem is proved, the second will follow immediately from Defini-
tion 8.6. If L is a pure context-free language, then by Definition 8.5 there is a pure context-free grammar
G = 〈Ω, Σ, S, P 〉 that generates L. Divide the production set up into P u and P n , the set of unit productions
and the set of nonunit productions, respectively. For each nonterminal B found in P u , find B u = {C | B ⇒* C },
the unit closure of B . The derivations sought must all come from the (finite) set P u , and there is clearly an
algorithm that correctly calculates B u . In fact, B u is represented by the set of accessible states in a suitably
defined automaton (see the exercises). Define a new grammar G 0 = 〈Ω, Σ, S, P 0 〉, where P 0 = P n ∪ {B → α|B
is a nonterminal in P u ∧ C ∈ B u ∧ C → α ∈ P n }. A straightforward induction argument shows that G 0 is
equivalent to G, and G 0 contains no unit productions. Note that if G is pure, so is G 0 .
G 0 is likely to contain useless nonterminals, even if all the productions in G were useful (see Example
9.13). However, the algorithm from Theorem 9.3 can now be applied to G 0 to eliminate useless nonter-
minals. Since that algorithm creates no new productions, the resulting grammar will still be free of unit
productions.
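In code, the two phases of this proof look roughly as follows; the sketch is ours, with productions again encoded as (head, right-hand-side) pairs.

def remove_unit_productions(nonterminals, productions):
    units = {(A, rhs[0]) for (A, rhs) in productions
             if len(rhs) == 1 and rhs[0] in nonterminals}
    closure = {A: {A} for A in nonterminals}        # the unit closures A^u
    changed = True
    while changed:
        changed = False
        for A, B in units:
            new = closure[B] - closure[A]
            if new:
                closure[A] |= new
                changed = True
    non_unit = [(A, rhs) for (A, rhs) in productions
                if not (len(rhs) == 1 and rhs[0] in nonterminals)]
    # B -> alpha is added whenever C is in B's unit closure and C -> alpha is nonunit.
    return {(A, rhs) for A in nonterminals
            for C in closure[A] for (head, rhs) in non_unit if head == C}

On the chain grammar of Example 9.13 below, this produces S i → a for every i, after which the useless-nonterminal elimination of Theorem 9.3 prunes the result.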

Example 9.13
Consider again the pure context-free grammar

〈{S 1 , S 2 , S 3 , S 4 , S 5 }, {a, b, c}, S 1 , {S 1 → S 2 , S 2 → S 3 , S 3 → S 4 , S 4 → S 5 , S 5 → a }〉

Figure 9.11: (a) The automaton discussed in Example 9.12 (b) The simplified automaton discussed in
Example 9.12 (c) The final automaton discussed in Example 9.12

The production set is split into P n = {S 5 → a } and

P u = {S 1 → S 2 , S 2 → S 3 , S 3 → S 4 , S 4 → S 5 }.

The unit-closure sets are


S 1u = {S 1 , S 2 , S 3 , S 4 , S 5 }
S 2u = {S 2 , S 3 , S 4 , S 5 }
S 3u = {S 3 , S 4 , S 5 }
S 4u = {S 4 , S 5 }
S 5u = {S 5 }

Since S 5 → a and S 5 ∈ S 3u , the production S 3 → a is added to P 0 . The full set of productions is
P 0 = {S 1 → a , S 2 → a , S 3 → a , S 4 → a , S 5 → a }. The elimination of useless nonterminals and productions results in
the grammar 〈{S 1 }, {a, b, c}, S 1 , {S 1 → a }〉.

Example 9.14
Consider the context-free grammar with productions

Z → S, Z → λ
S → CBh, S → D
A → aaC
B → Sf , B → ggg
C → cA, C → d , C → C
D → E , D → SABC
E → be

The unit closures of each of the appropriate nonterminals and the new productions they imply are shown
below. Note that Z → S is not considered and that the productions suggested by C → C are already
present.
S ⇒* D : S → SABC
S ⇒* E : S → be
D ⇒* E : D → be
C ⇒* C : C → cA, C → d

The new set of productions is therefore

Z → S, Z → λ
S → SABC , S → be, S → CBh
A → aaC
B → Sf , B → ggg
C → cA, C → d
D → be, D → SABC

Note that D is now useless and can be eliminated.


The assurance that every context-free grammar corresponds to an equivalent grammar with no unit
productions is helpful in many situations. In particular, it is instrumental to the proof showing that the
following restrictive type of grammar is indeed a canonical form for context-free languages.

Definition 9.9 A pure context-free grammar G = 〈Ω, Σ, S, P 〉 is in pure Chomsky normal form (PCNF) if P
contains only productions of the form A → BC and A → d , where B and C are nonterminals and d ∈ Σ.
A context-free grammar G = 〈Ω, Σ, Z , P 〉 is in Chomsky normal form (CNF) if the Z -rules Z → S and
Z → λ are the only allowable productions involving the start symbol Z , and all other productions are of
the form A → BC and A → d , where B and C are nonterminals and d ∈ Σ.

Thus, in PCNF the grammatical rules are limited to producing exactly two nonterminals or one termi-
nal symbol. Few of the grammars discussed so far have met the restricted criteria required by Chomsky
normal form. However, every context-free grammar can be transformed into an equivalent CNF gram-
mar, as indicated in the following proof. The basic strategy will be to add new nonterminals and replace
undesired productions such as A → J K cb by a set of equivalent productions in the proper form, such as
A → J Y11 , Y11 → K Y12 , Y12 → X c X b , X c → c , X b → b , where Y11 , Y12 , X b , and X c are new nonterminals.

Theorem 9.5
Every pure context-free language L can be generated by a pure Chomsky normal form grammar.
Every context-free language L 0 can be generated by a Chomsky normal form grammar.
Proof. Again, if the first statement of the theorem is proved, the second will follow immediately from
Definition 8.6. If L is a pure context-free language, then by Definition 8.5 there is a pure context-free gram-
mar G = 〈Ω, Σ, S, P 〉 that generates L. Theorem 9.4 shows that without loss of generality we may assume
that P contains no unit productions. We construct a new grammar G 0 = 〈Ω, Σ, S, P 0 〉 in the following man-
ner. Number the productions in P , and consider each production in turn. If the right side of the kth
production consists of only a single symbol, then it must be a terminal symbol, since there are no unit pro-
ductions. No modifications are necessary in this case, and the production is retained for use in the new
set of productions P 0 . The same is true if the kth production consists of two symbols and they are both
nonterminals. If one or both of the symbols is a terminal, then the rule must be modified by replacing
any terminal symbol a with a new nonterminal X a . Whenever such a replacement is done, a produc-
tion of the form X a → a must also be included in the new set of productions P 0 . If the kth production
is A → α1 α2 α3 · · · αn , where the number of (terminal and nonterminal) symbols is n > 2, then new non-
terminals Yk1 , Yk2 , . . . , Ykn−2 must be introduced and the rule must be replaced by the set of productions
A → α1 Yk1 , Yk1 → α2 Yk2 , Yk2 → α3 Yk3 , . . . , Ykn−2 → αn−1 αn . Again, if any αi is a terminal symbol such as
a , it must be replaced as indicated earlier by the nonterminal X a .
Each new set of rules is clearly capable of producing the same effect as the rule that was replaced. Each
nonterminal Yki is used in only one such replacement set to ensure that the new rules do not combine in
unexpected new ways. Tedious but straightforward inductive proofs will justify that L(G) = L(G 0 ).
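The replacement scheme in this proof is easy to mechanize. The following Python sketch (ours) performs both substitutions: a terminal a becomes X a , and a long right side is chained through new nonterminals Yk1 , Yk2 , . . ., for a production list already free of unit productions.

def to_cnf(productions, nonterminals):
    # productions: list of (head, rhs) pairs with no unit productions.
    new_prods, used = set(), set()
    def lift(sym):                          # replace terminal a by X_a
        if sym in nonterminals:
            return sym
        used.add(sym)
        return "X_" + sym
    for k, (A, rhs) in enumerate(productions, start=1):
        if len(rhs) == 1:                   # must be A -> d, already acceptable
            new_prods.add((A, rhs))
        elif len(rhs) == 2:
            new_prods.add((A, (lift(rhs[0]), lift(rhs[1]))))
        else:                               # chain through Y_k1 ... Y_k(n-2)
            head = A
            for i in range(len(rhs) - 2):
                y = "Y_%d_%d" % (k, i + 1)
                new_prods.add((head, (lift(rhs[i]), y)))
                head = y
            new_prods.add((head, (lift(rhs[-2]), lift(rhs[-1]))))
    new_prods |= {("X_" + a, (a,)) for a in used}
    return new_prods

Run on the eight productions of Example 9.15 below, this reproduces the replacement sets listed there, up to the names chosen for the new nonterminals.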

Example 9.15
The grammar discussed in Example 9.14 can be transformed into CNF by the algorithm given in Theorem
9.5. After elimination of the unit productions and the consequent useless productions, the productions
(suitably numbered) that must be examined are
1. S → SABC        5. B → Sf
2. S → be          6. B → ggg
3. S → CBh         7. C → cA
4. A → aaC         8. C → d
In the corresponding lists given below, notice that only production 8 is retained; the others are replaced
by

S → SY11 , Y11 → AY12 , Y12 → BC
S → Xb Xe
S → C Y31 , Y31 → B X h
A → X a Y41 , Y41 → X a C
B → SXf
B → X g Y61 , Y61 → X g X g
C → Xc A
C →d
and the terminal productions X b → b , X e → e , X h → h , X a → a , X f → f , X g → g . Since d did not appear
as part of a two-symbol production, the rule X d → d can be eliminated. The above rules, with S as the
start symbol, form a pure Chomsky normal form grammar. The new start symbol Z and productions
Z → S and Z → λ would be added to this pure context-free grammar to obtain the required CNF.
Grammars in Chomsky normal form allow an exact correspondence to be made between the length
of a terminal string and the length of the derivation sequence that produces that string. If the empty
string can be derived, the production sequence will consist of exactly one rule application (Z → λ). A
simple inductive argument shows that, if a string of length n > 0 can be derived, the derivation sequence
must contain exactly 2n steps. In the grammar derived in Example 9.15, for example, the following ter-
minal string of length 5 is generated in exactly ten productions:
Z ⇒ S ⇒ C Y31 ⇒ d Y31 ⇒ d B X h ⇒ d S X f X h ⇒ d X b X e X f X h ⇒ d b X e X f X h
⇒ d be X f X h ⇒ d be f X h ⇒ d be f h
Other useful properties are also assured for grammars in Chomsky normal form. When a grammar is
in CNF, all parse trees can be represented by binary trees, and upper and lower bounds on the depth of a
parse tree for a string of length n can be found (see the exercises). The derivational relationship between
the number of production steps used and the number of terminals produced implies that CNF grammars
generate an average of one terminal every two productions. The following canonical form requires every
production to contain at least one terminal symbol, and grammars in this form must produce strings of
length n(> 0) in no more than n steps.

Definition 9.10 A pure context-free grammar G = 〈Ω, Σ, S, P 〉 is in pure Greibach normal form (PGNF) if
P contains only productions of the form A → d α, where α ∈ (Ω ∪ Σ)∗ and d ∈ Σ.
A context-free grammar G = 〈Ω, Σ, Z , P 〉 is in Greibach normal form (GNF) if the Z -rules Z → S and
Z → λ are the only allowable productions involving the start symbol Z , and all other productions are of
the form A → d α, where α ∈ (Ω ∪ Σ)∗ and d ∈ Σ.

In pure Greibach normal form, the grammatical rules are limited to producing at least one terminal
symbol as the first symbol. The original grammar in Example 9.9 is a PGNF grammar, but few of the
other grammars presented in this chapter meet the seemingly mild restrictions required for Greibach
normal form. The main obstacle to obtaining a GNF grammar is the possible presence of left recursion. A
nonterminal A is called left recursive if there is a sequence of one or more productions for which A ⇒* Aβ
for some string β. Greibach normal form disallows such occurrences since no production may produce
a string starting with a nonterminal. Replacing productions involved with left recursion is complex, but
every context-free grammar can be transformed into an equivalent GNF grammar, as shown by Theorem
9.6. Two techniques will be needed to transform the productions into the appropriate form, and the
following lemmas ensure that the grammatical transformations leave the language unchanged. The first
indicates how to remove an X -rule that begins with an undesired nonterminal; Lemma 9.1 specifies a
new set of productions that compensate for the loss.

Lemma 9.1 Let G = 〈Ω, Σ, S, P 〉 be a context-free grammar, and assume there is a string α and nontermi-
nals X and B for which X → B α ∈ P . Further assume that the set of all B -rules is given by {B → β1 , B →
β2 , . . . , B → βm } and let G 0 = 〈Ω, Σ, S, P 0 〉, where

P 0 = P ∪ {X → β1 α, X → β2 α, . . . , X → βm α} − {X → B α}.

Then L(G) = L(G 0 ).


Proof. Let each nonterminal A be associated with the set of sentential forms X A that A can produce.
That is, let X A = {x ∈ (Σ ∪ Ω)∗ | A ⇒* x}. The nonterminals then denote variables in a set of language
equations that reflect the productions in P . These equations will generally not be linear; several variables
may be concatenated together within a single term. Since the set of all B -rules is B → β1 , B → β2 , . . . , B → βm , X B satisfies the equation
X B = β1 ∪ β2 ∪ · · · ∪ βm
Similarly, if the X -rules other than X → B α are X → γ1 , X → γ2 , . . . , X → γn , then X X satisfies the equation

X X = γ1 ∪ γ2 ∪ · · · ∪ γn ∪ X B α

Substituting for X B in the X X equation yields

X X = γ1 ∪ γ2 ∪ · · · ∪ γn ∪ (β1 ∪ β2 ∪ · · · ∪ βm )α

which by the distributive law becomes

X X = γ1 ∪ γ2 ∪ · · · ∪ γn ∪ β1 α ∪ β2 α ∪ · · · ∪ βm α

This shows why the productions X → β1 α, X → β2 α, . . . , X → βm α can replace the rule X → B α.
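Because the lemma is applied repeatedly in what follows, a concrete rendering may be helpful. The Python sketch below is our own illustration, assuming productions are stored as (lhs, rhs) pairs with rhs a tuple of symbols:

# Lemma 9.1 as an operation on a production set: the rule X -> B.alpha is
# removed and replaced by X -> beta.alpha for every B-rule B -> beta.
def eliminate_leading_nonterminal(P, X, B, alpha):
    P = set(P)
    P.discard((X, (B,) + tuple(alpha)))        # remove X -> B alpha
    for lhs, beta in list(P):
        if lhs == B:                           # each B-rule contributes
            P.add((X, tuple(beta) + tuple(alpha)))
    return P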

The type of replacement justified by Lemma 9.1 will not eliminate left recursion. The following lemma
indicates a way to remove all the left-recursive X -rules by introducing a new right-recursive nonterminal.

Lemma 9.2 Let G = 〈Ω, Σ, S, P 〉 be a context-free grammar, and choose a nonterminal X ∈ Ω. Denote the
set of all left-recursive X -rules by X r = {X → X α1 , X → X α2 , . . . , X → X αm } and the set of all non-left-recursive X -rules by X n = {X → γ1 , X → γ2 , . . . , X → γn }. Choose a new nonterminal Y ∉ Ω and let G 00 = 〈Ω ∪
{Y }, Σ, S, P 00 〉, where P 00 = P ∪ {X → γ1 Y , X → γ2 Y , . . . , X → γn Y } ∪ {Y → α1 , Y → α2 , . . . , Y → αm } ∪ {Y →
α1 Y , Y → α2 Y , . . . , Y → αm Y } − X r . Then L(G) = L(G 00 ).
Proof. As in Lemma 9.1, let each nonterminal A be associated with the set of sentential forms X A that
A can produce, and consider the set of language equations generated by P . The X X equation is

X X = γ1 ∪ γ2 ∪ · · · ∪ γn ∪ X X α1 ∪ X X α2 ∪ · · · ∪ X X αm

Solving by the method indicated in Theorem 6.4c for an equivalent expression for X X shows that

X X = (γ1 ∪ γ2 ∪ · · · ∪ γn )(α1 ∪ α2 ∪ · · · ∪ αm )∗

In the new set of productions P 00 , the equations of interest are

X X = γ1 ∪ γ2 ∪ · · · ∪ γn ∪ γ1 X Y ∪ γ2 X Y ∪ · · · ∪ γn X Y
X Y = α1 ∪ α2 ∪ · · · ∪ αm ∪ α1 X Y ∪ α2 X Y ∪ · · · ∪ αm X Y

Factoring each equation produces

X X = γ1 ∪ γ2 ∪ · · · ∪ γn ∪ (γ1 ∪ γ2 ∪ · · · ∪ γn )X Y
X Y = α1 ∪ α2 ∪ · · · ∪ αm ∪ (α1 ∪ α2 ∪ · · · ∪ αm )X Y

and the second can also be solved for an equivalent expression for X Y , yielding

X Y = (α1 ∪ α2 ∪ · · · ∪ αm )∗ (α1 ∪ α2 ∪ · · · ∪ αm )

Substituting this expression for X Y in the X X equation produces

X X = γ1 ∪ γ2 ∪ · · · ∪ γn ∪ (γ1 ∪ γ2 ∪ · · · ∪ γn )(α1 ∪ α2 ∪ · · · ∪ αm )∗ (α1 ∪ α2 ∪ · · · ∪ αm )

which by the distributive law becomes

X X = (γ1 ∪ γ2 ∪ · · · ∪ γn )(λ ∪ (α1 ∪ α2 ∪ · · · ∪ αm )∗ (α1 ∪ α2 ∪ · · · ∪ αm ))

Using the fact that λ ∪ B ∗ B = B ∗ , this simplifies to

X X = (γ1 ∪ γ2 ∪ · · · ∪ γn )(α1 ∪ α2 ∪ · · · ∪ αm )∗

Therefore, when X Y is eliminated from the sentential forms, X X produces exactly the same strings as before.
This indicates why the productions in the sets

{X → γ1 Y , X → γ2 Y , . . . , X → γn Y } ∪ {Y → α1 , Y → α2 , . . . , Y → αm }∪
{Y → α1 Y , Y → α2 Y , . . . , Y → αm Y }

can replace the left-recursive X -rules X → X α1 , X → X α2 , . . . , X → X αm .
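This transformation can be sketched in the same style as the one given after Lemma 9.1; again the representation is our own assumption:

# Lemma 9.2: trade the left-recursive X-rules X -> X.alpha_i for rules on a
# new right-recursive nonterminal Y (supplied by the caller).  The original
# non-left-recursive rules X -> gamma_j are retained, as in the lemma.
def eliminate_left_recursion(P, X, Y):
    recursive = [rhs[1:] for lhs, rhs in P if lhs == X and rhs[:1] == (X,)]
    others    = [rhs for lhs, rhs in P if lhs == X and rhs[:1] != (X,)]
    P = {(lhs, rhs) for lhs, rhs in P
         if not (lhs == X and rhs[:1] == (X,))}     # delete the set X_r
    for gamma in others:
        P.add((X, tuple(gamma) + (Y,)))             # X -> gamma Y
    for alpha in recursive:
        P.add((Y, tuple(alpha)))                    # Y -> alpha
        P.add((Y, tuple(alpha) + (Y,)))             # Y -> alpha Y
    return P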

Note that the new production set eliminates all left-recursive X -rules and does not introduce any
new left-recursive productions. The techniques discussed in Lemmas 9.1 and 9.2, when applied in the
proper order, will transform any context-free grammar into one that is in Greibach normal form. The
appropriate sequence is given in the next theorem.

Theorem 9.6
Every pure context-free language L can be generated by a pure Greibach normal form grammar.
Every context-free language L 0 can be generated by a Greibach normal form grammar.
Proof. Because of Definition 8.6, the second statement will follow immediately from the first. If L is a
pure context-free language, then by Definition 8.5 there is a pure context-free grammar
G = 〈{S 1 , S 2 , . . . , S r }, Σ, S 1 , P 〉 that generates L. We construct a new grammar by applying the transforma-
tions discussed in the previous lemmas.
Phase 1: The replacements suggested by Lemmas 9.1 and 9.2 will be used to ensure that the increasing
condition is met: if S i → S j α belongs to the new grammar, then i > j . We transform the S k rules for
k = r, r − 1, . . . , 2, 1 (in that order), considering the productions for each nonterminal in turn. At the end
of the i th iteration, the top i nonterminals will conform to the increasing condition. After the final step,
all nonterminals (including any newly introduced ones) will conform, all left recursion will be eliminated,
and we can proceed to phase 2.
The procedure for the i th iteration is: If an S i -rule of the form S i → S j α is found where i < j , eliminate it as specified in Lemma 9.1. This may introduce other rules of the form S i → S j 0 α0 , in which i is still less than j 0 . Such new rules will likewise have to be eliminated via Lemma 9.1, but since the offending
subscript will decrease each time, this process will eventually terminate. S i -rules of the form S i → S j α
where i = j can then be eliminated according to Lemma 9.2. This will introduce some new nonterminals,

which can be given new, higher-numbered subscripts. Lemma 9.2 is designed so that the new rules will
automatically satisfy the increasing condition specified earlier. The remaining S i -rules must then conform
to the increasing condition. The process continues with lower-numbered rules until all the rules in the new
production set conform to the increasing condition.
Phase 2: At this point, S 1 conforms to the increasing condition, and since there are no nonterminals
with subscripts that are less than 1, all the S 1 -rules must begin with terminal symbols, as required by
GNF. The only S 2 rules that may not conform to GNF are those of the form S 2 → S 1 α, and Lemma 9.1 can
eliminate such rules by replacing them with the S 1 -rules. Since all the S 1 -rules now begin with terminal
symbols, all the new S 2 -rules will have the same property. This process is applied to S k -rules for increasing
k until the entire production set conforms to GNF.
The resulting context-free grammar is in GNF, and since all modifications were of the type allowed by
Lemmas 9.1 and 9.2, the new grammar is equivalent to the original.
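A rough driver for the two phases can be assembled from the sketches given after Lemmas 9.1 and 9.2. The version below is deliberately simplified: it assumes the nonterminals are supplied in subscript order as a list S, leaves the naming of newly introduced nonterminals to a caller-supplied fresh() function, and elides the (analogous) phase 2 processing of those new nonterminals:

# Illustrative outline of Theorem 9.6; S is the list [S1, S2, ..., Sr].
def to_gnf(P, S, fresh):
    # Phase 1: enforce the increasing condition for k = r, r-1, ..., 1.
    for i in range(len(S) - 1, -1, -1):
        changed = True
        while changed:                        # offending subscripts decrease
            changed = False
            for lhs, rhs in list(P):
                if lhs == S[i] and rhs and rhs[0] in S and S.index(rhs[0]) > i:
                    P = eliminate_leading_nonterminal(P, lhs, rhs[0], rhs[1:])
                    changed = True
        if any(lhs == S[i] and rhs[:1] == (S[i],) for lhs, rhs in P):
            P = eliminate_left_recursion(P, S[i], fresh())
    # Phase 2: back-substitute so every rule begins with a terminal.
    for i in range(1, len(S)):
        for lhs, rhs in list(P):
            if lhs == S[i] and rhs and rhs[0] in S and S.index(rhs[0]) < i:
                P = eliminate_leading_nonterminal(P, lhs, rhs[0], rhs[1:])
    return P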

Example 9.16
Consider the pure context-free grammar

〈{S 1 , S 2 , S 3 }, {a , b , c , d , e }, S 1 , {S 1 → S 1 S 2 c , S 1 → S 3 b S 3 , S 2 → S 1 S 1 , S 2 → d , S 3 → S 2 e }〉

If the given subscript ordering is not the most convenient, the nonterminals can be renumbered. The
current ordering will minimize the number of transformations needed to produce Greibach normal
form, since the only production that does not conform to the increasing condition is S 1 → S 3b S 3 . Thus,
the first and second steps of phase 1 are trivially completed; no substitutions are necessary. In the third
step, Lemma 9.1 allows the offending production

S 1 → S 3b S 3

to be replaced by
S 1 → S 2 ebS 3

The new production produces the smaller-subscripted nonterminal S 2 , but the new rule still does not
satisfy the increasing condition. Replacing S 1 → S 2 ebS 3 as indicated by Lemma 9.1 yields the two productions

S 1 → S 1 S 1 ebS 3 and S 1 → d ebS 3

At this point, the grammar contains the productions

S 1 → S 1 S 2 c , S 1 → S 1 S 1 ebS 3 , S 1 → d ebS 3 , S 2 → S 1 S 1 , S 2 → d , S 3 → S 2 e

The first nonterminal has a left-recursive rule that must be eliminated by introducing the new nontermi-
nal S 4 . In the notation of Lemma 9.2, n = 1, m = 2, γ1 = d ebS 3 , α1 = S 2 c , and α2 = S 1 ebS 3 . Eliminating S 1 → S 1 S 2 c and S 1 → S 1 S 1 ebS 3 introduces the new nonterminal Y = S 4 and the productions

S 1 → d ebS 3 S 4 , S 4 → S 2 c , S 4 → S 1 ebS 3 , S 4 → S 2 c S 4 , S 4 → S 1 ebS 3 S 4

Phase 1 is now complete. All left-recursion has been eliminated and the grammar now contains the
productions

S 1 → d ebS 3 S 4 , S 1 → d ebS 3
S 2 → S 1 S 1 , S 2 → d
S 3 → S 2 e
S 4 → S 2 c , S 4 → S 1 ebS 3 , S 4 → S 2 c S 4 , S 4 → S 1 ebS 3 S 4

all of which satisfy the increasing condition. The grammar is now set up for phase 2, in which substitutions specified by Lemma 9.1 will ensure that every rule begins with a terminal.
The S 1 -rules are in acceptable form, as is the S 2 -rule S 2 → d . The other S 2 -rule, S 2 → S 1 S 1 , is replaced via Lemma 9.1 with S 2 → d ebS 3 S 4 S 1 and S 2 → d ebS 3 S 1 . Replacement of the S 3 -rule then yields S 3 → d ebS 3 S 4 S 1 e , S 3 → d ebS 3 S 1 e , and S 3 → d e .
The S 4 rules are treated similarly. The final set of productions at the completion of phase 2 contains

S 1 → d ebS 3 S 4 , S 1 → d ebS 3
S 2 → d ebS 3 S 4 S 1 , S 2 → d ebS 3 S 1 , S 2 → d
S 3 → d ebS 3 S 4 S 1 e , S 3 → d ebS 3 S 1 e , S 3 → d e
S 4 → d c , S 4 → d ebS 3 S 4 S 1 c , S 4 → d ebS 3 S 1 c , S 4 → d ebS 3 S 4 ebS 3 , S 4 → d ebS 3 ebS 3 ,
S 4 → d ebS 3 S 4 S 1 c S 4 , S 4 → d c S 4 , S 4 → d ebS 3 S 1 c S 4 , S 4 → d ebS 3 S 4 ebS 3 S 4 , S 4 → d ebS 3 ebS 3 S 4

In this grammar, S 2 is now useless and can be eliminated.


Greibach normal form is sometimes considered to require all productions to be of the form A → d α,
where α ∈ Ω∗ and d ∈ Σ. Such rules must produce exactly one leading terminal symbol; the rest of
the string must be exclusively nonterminals. It should be clear that this extra restriction can always be
enforced by a technique similar to the one employed for Chomsky normal form. The above conversion
process would be extended to phase 3, in which unwanted terminals such as e are replaced by a new
nonterminal X e , and new productions such as X e → e are introduced. For the grammar in Example 9.16,
the first production might look like S 1 → d X e X b S 3 S 4 .

9.4 Pumping Theorem


As was the case with type 3 languages, some languages are too complex to be defined by a context-free
grammar. To prove a language L is context-free, one need only define a [context-free] grammar that
generates L. By contrast, to prove L is not context free, one must effectively argue that no context-free
grammar can possibly generate L. The pumping lemma for deterministic finite automata (Theorem 2.3)
showed that the repetition of patterns within strings accepted by a DFA was a consequence of the nature
of the finite description. The finiteness of grammatical descriptions likewise implies a pumping theorem
for languages represented by context-free grammars. The proof is greatly simplified by the properties
implied by the existence of canonical forms for context-free grammars.

Theorem 9.7 Let L be a context-free language over Σ∗ . Then


(∃n ∈ N)(∀z ∈ L ∋ |z| ≥ n)(∃u, v, w, x, y ∈ Σ∗ ) ∋ z = uv w x y, |v w x| ≤ n, |v x| ≥ 1, and (∀i ∈ N)(uv i w x i y ∈ L)
Proof. Given a context-free language L, there must exist a PCNF grammar G = 〈Ω, Σ, S, P 〉 generating
L − {λ}. Let k = ‖Ω‖. The parse tree generated by this PCNF grammar for any word z ∈ L is a binary tree with each (terminal) symbol in z corresponding to a distinct leaf in the tree. Let n = 2^{k+1} . Choose a
string z generated by G of length at least n (if there are no strings in L that are this long, then the theorem is
vacuously true, and we are done). The binary parse tree for any such string z must have depth at least k +1,
which implies the existence of a path involving at least k + 2 nodes, beginning at the root and terminating

Figure 9.12: The parse tree discussed in the proof of Theorem 9.7

with a leaf. The labels on the k + 1 interior nodes along the path must all be nonterminals, and since
‖Ω‖ = k, they cannot all be distinct. Indeed, the repetition must occur within the “bottom” k + 1 interior
nodes along the path. Call the repeated label R (see Figure 9.12), and note that there must exist a derivation
for the parse tree that looks like
S ⇒* uR y ⇒* uvRx y ⇒* uv w x y

where u, v, w, x, and y are all terminal strings and z = uv w x y. That is, there are productions in P that allow R ⇒* vRx and R ⇒* w. Since S ⇒* uR y and R ⇒* w, S ⇒* uw y is a valid derivation, and uw y is therefore a word in L. Similarly, S ⇒* uR y ⇒* uvRx y ⇒* uv vRxx y ⇒* uv v w xx y, and so uv 2 w x 2 y ∈ L. Induction shows that each of the strings uv i w x i y belongs to L for i = 0, 1, 2, . . . . If both v and x were empty, these
strings would not be distinct words in L. This case cannot arise, as shown next, and thus the existence of
the “long” string z ∈ L implies that there is an infinite sequence of strings that must belong to L.
The two occurrences of R were in distinct places in the parse tree, and hence at least one production was applied in deriving uvRx y from uR y. Since the PCNF grammar G contains neither contracting productions nor unit productions, the sentential form uvRx y must be of greater length than uR y, and hence |v| + |x| > 0. Furthermore, the subtree rooted at the higher occurrence of R was of height k + 1 or less, and hence accounts for no more than 2^{k+1} (= n) terminals. Thus, |v w x| ≤ n.
All the criteria described in the pumping theorem are therefore met. Since a context-free language must be generated by a CNF grammar with a finite number of nonterminals, there must exist a constant n (such as n = 2^{‖Ω‖+1} ) for which the existence of a string of length at least n implies the existence of an infinite sequence of distinct strings that must all belong to L, as stated in the theorem.

As with the pumping lemma, the pumping theorem is usually applied to justify that certain languages

are complex (by proving that the language does not satisfy the pumping theorem and is thus not context
free). Such proofs naturally employ the contrapositive of Theorem 9.7, which is stated next.

Theorem 9.8 Let L be a language over Σ∗ .


if (∀n ∈ N)(∃z ∈ L ∋ |z| ≥ n)(∀u, v, w, x, y ∈ Σ∗ ∋ z = uv w x y, |v w x| ≤ n, |v x| ≥ 1)(∃i ∈ N ∋ uv i w x i y ∉ L)
then L is not context free.
Proof. See the exercises.

Examples 8.5 and 9.17 show that there are context-sensitive languages which are not context free.

Example 9.17
The language L = {a k b k c k | k ∈ N} is not a context-free language. Let n be given, and choose z = a n b n c n .
Then z ∈ L and |z| = 3n ≥ n. If L were context free, there must be choices for u, v, w, x, and y satisfy-
ing the pumping theorem. Every possible choice of these strings leads to a contradiction, and hence L
cannot be context free. A sampling of the various cases is outlined below.
If the strings v and x contain only one type of letter (for example, c), then uv 2 w x 2 y will contain
more c s than a s or b s, and thus uv 2 w x 2 y ∉ L. If v were, say, all b s and x were all cs, then uv 2 w x 2 y
would contain too few a s and would again not be a member of L. If v were to contain two types of letters, such as v = aabb , then uv 2 w x 2 y = uv v w xx y = u aabbaabb w xx y and would represent a string that had some b s preceding some a s, and again uv 2 w x 2 y ∉ L. All other cases are similar to these, and they
collectively imply that L is not a context-free language. [The shortest choice for v w x that can successfully
be pumped is ab n c , but this violates the condition that |v w x| ≤ n.]
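For any one small value of n, the case analysis can even be checked exhaustively by machine. The following sketch (our illustration, not part of the text) enumerates every decomposition z = uv w x y of a 4 b 4 c 4 with |v w x| ≤ 4 and |v x| ≥ 1 and confirms that none of them pumps:

def in_L(s):                          # membership in {a^k b^k c^k | k in N}
    k = s.count('a')
    return s == 'a' * k + 'b' * k + 'c' * k

n = 4
z = 'a' * n + 'b' * n + 'c' * n
found = False
for i in range(len(z)):                            # u = z[:i]
    for j in range(i, min(i + n, len(z)) + 1):     # vwx = z[i:j], |vwx| <= n
        u, vwx, y = z[:i], z[i:j], z[j:]
        for p in range(len(vwx) + 1):
            for q in range(p, len(vwx) + 1):
                v, w, x = vwx[:p], vwx[p:q], vwx[q:]
                if len(v) + len(x) >= 1 and all(
                        in_L(u + v * m + w + x * m + y) for m in (0, 2)):
                    found = True
print(found)   # prints False: every admissible decomposition fails to pump

Testing only i = 0 and i = 2 suffices, since a decomposition satisfying the pumping theorem would in particular have to survive those two values of i.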
Example 9.17 illustrates one major inconvenience of the pumping theorem: the inability to specify
which portion of the string is to be “pumped.” With the pumping lemma in Chapter 2, variants were
explored that allowed the first n letters to be pumped or the last n letters to be pumped. Indeed, any n
consecutive letters in a word from an FAD language can be pumped. For context-free languages, such
precision is more elusive. The uncertainty as to where the v w x portion of the string was in Example 9.17
led to many subcases, since all combinations of u, v, w, x, and y had to be shown to lead to contra-
dictions. The following result, a variant of Ogden’s lemma, allows some choice in the placement of the
portion of the string to be pumped in a “long” word from a context-free language.

Theorem 9.9 Let L be a context-free language over Σ∗ . Then


(∃n ∈ N)
(∀z ∈ L ∋ |z| ≥ n and z has any n or more positions marked as distinguished)
(∃u, v, w, x, y ∈ Σ∗ ) ∋ z = uv w x y,
v w x contains no more than n distinguished positions,
v x contains at least one distinguished position,
w contains at least one distinguished position, and
(∀i ∈ N)(uv i w x i y ∈ L)
Proof. Given a context-free language L, there must exist a PCNF grammar G = 〈Ω, Σ, S, P 〉 generating L − {λ}. Let n = 2^{‖Ω‖+1} . The proof is similar to that given for the pumping theorem (Theorem 9.7); the
method for choosing the path now depends on the placement of the distinguished positions. A suitable
path is constructed by beginning at the root of the binary parse tree and, at each level, descending to the
right or left to lengthen the path. The decision to go right or left is determined by observing the number of
distinguished positions generated in the right subtree and the number of distinguished positions generated
in the left subtree. The path should descend into the subtree that has the larger number of distinguished

positions; ties can be broken arbitrarily. The resulting path will terminate at a leaf corresponding to a distinguished position, will be of sufficient length to guarantee a repeated label R within the bottom ‖Ω‖ + 1 interior nodes, and so on. The conclusions now follow in much the same manner as those given in the
pumping theorem.
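The path-selection rule itself is easy to state as a procedure. The sketch below is our illustration on a hypothetical binary-tree representation, assuming at least one distinguished leaf lies below the root:

class Node:                       # hypothetical parse-tree node
    def __init__(self, label, left=None, right=None, distinguished=False):
        self.label, self.left, self.right = label, left, right
        self.distinguished = distinguished

def marks(node):                  # distinguished leaves below this node
    if node.left is None and node.right is None:
        return 1 if node.distinguished else 0
    return sum(marks(child) for child in (node.left, node.right) if child)

def ogden_path(root):
    path, node = [root], root
    while node.left or node.right:
        children = [c for c in (node.left, node.right) if c]
        node = max(children, key=marks)   # descend toward the larger count;
        path.append(node)                 # max() breaks ties arbitrarily
    return path                           # ends at a distinguished leaf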

Example 9.18
For the language L = {a k b k c k | k ∈ N} investigated in Example 9.17, Ogden’s lemma could be applied with the first n letters of a n b n c n as the distinguished positions. Since w must have at least one distinguished
letter (that is, an a ), and u and v must precede w, the u and v portions of the string would then be
required to be all a s. This greatly reduces the number of cases that must be considered. Note that more
than n letters can be chosen as the distinguished positions, and they need not be consecutive.

9.5 Closure Properties


Recall that CΣ represented the class of context-free languages over Σ. The applications of the pumping
theorem show that not every language is context free. The ability to show that specific languages are
not context free makes it feasible to decide which language operators preserve context-free languages.
The context-free languages are closed under most of the operators considered in Chapter 5; the major
exceptions are complement and intersection. We begin with a definition of substitution for context-free
languages.

Definition 9.11 Let Σ = {a 1 , a 2 , . . . , a m } be an alphabet and let Γ be a second alphabet. Given context-free languages L 1 , L 2 , . . . , L m over Γ, define a substitution s: Σ → ℘(Γ∗ ) by s(a i ) = L i for each i = 1, 2, . . . , m, which can be extended to s: Σ∗ → ℘(Γ∗ ) by
s(λ) = {λ}
and
(∀a ∈ Σ)(∀x ∈ Σ∗ )(s(a · x) = s(a ) · s(x))
s can be further extended to operate on a language L ⊆ Σ∗ by defining s: ℘(Σ∗ ) → ℘(Γ∗ ), where
s(L) = ⋃z∈L s(z)

A substitution is similar to a language homomorphism (Definition 5.8), where letters were replaced
by single words, and to the regular set substitution given by Definition 6.5. For context-free languages,
substitution denotes the consistent replacement of the individual letters within each word of a context-
free language with sets of words. Each such set of words must also be a context-free language, although
not necessarily over the original alphabet.

Example 9.19
Let L = L(G t ), where
G t = 〈{T }, {a , b , c , d , − }, T, {T → a | b | c | d | T − T }〉
Let L 1 denote the set of all valid FORTRAN identifiers.
Let L 2 denote the set of all strings denoting integer constants.
Let L 3 denote the set of all strings denoting real constants.
Let L 4 denote the set of all strings denoting double-precision constants.

If the substitution s were defined by s(a ) = L 1 , s(b ) = L 2 , s(c ) = L 3 , s(d ) = L 4 , then s(L) would represent
d ) = L 4 , then s(L) would represent
the set of all unparenthesized FORTRAN expressions involving only the subtraction operator.
In this example, s(L) is a language over a significant portion of the ASCII alphabet, whereas the orig-
inal alphabet consisted of only five symbols. The result is still context free, and this can be proved for all
substitutions of context-free languages into context-free languages. In Example 9.19, the languages L 1 ,
through L 4 were not only context free, but were in fact regular. There are clearly context-free grammars
defining each of them, and it should be obvious how to modify G t to produce a grammar that generates
s(L). If G 1 = 〈Ω1 , Σ1 , S 1 , P 1 〉 is a grammar generating L 1 [perhaps using the productions shown in Exam-
ple 8.1], for example, then occurrences of a in the productions of G t should simply be replaced by the
start symbol S 1 of G 1 and the productions of P 1 added to the new grammar that will generate s(L). This
is essentially the technique used to justify Theorem 9.10. The closure theorem is stated for substitutions
that do not modify the terminal alphabet, but it is also true in general, as a trivial modification of the
following proof would show.

Theorem 9.10 Let Σ be an alphabet, and let s: Σ → ℘(Σ∗ ) be a substitution. Then CΣ is closed under s.
Proof. Let Σ = {a 1 , a 2 , . . . , a m }. If L is a context-free language, then there is a context-free grammar G = 〈Ω, Σ, S, P 〉 that generates L. If s: Σ → ℘(Σ∗ ) is a substitution satisfying Definition 9.11, then for each letter a k ∈ Σ there is a corresponding grammar G k = 〈Ωk , Σk , S k , P k 〉 for which L(G k ) = s(a k ). Since nonter-
minals can be freely renamed, we may assume that Ω, Ω1 , Ω2 , . . . , Ωm have no common symbols. s(L) will
be generated by the context-free grammar

G 0 = 〈Ω ∪ Ω1 ∪ Ω2 ∪ · · · ∪ Ωm , Σ, S, P 0 ∪ P 1 ∪ P 2 ∪ · · · ∪ P m 〉,

where P 0 consists of the rules of P , with each appearance of a k replaced by S k . From the start symbol S
the rules of P 0 can be used as they were in the original grammar G, producing strings with the start symbol
of the kth grammar where the kth terminal symbol would be. Since the nonterminal sets were assumed to
be pairwise disjoint, only the rules in P k can be used to expand S k , resulting in the desired terminal strings from s(a k ). It follows that L(G 0 ) = s(L), and thus s(L) is context free.
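The grammar G 0 of this proof can be assembled mechanically. In the following sketch (ours; grammars are assumed to be (Ω, Σ, S, P ) tuples whose nonterminal sets have already been renamed apart), subs maps each terminal a k of G to the grammar G k generating s(a k ):

def substitute(G, subs):
    omega, sigma, start, P = G
    new_omega, new_P = set(omega), set()
    for lhs, rhs in P:           # replace each terminal a_k by S_k in P
        new_P.add((lhs, tuple(subs[sym][2] if sym in subs else sym
                              for sym in rhs)))
    for om_k, sig_k, S_k, P_k in subs.values():
        new_omega |= om_k        # pool the (disjoint) nonterminal sets
        new_P |= set(P_k)        # and the production sets
    return (new_omega, sigma, start, new_P)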

Theorem 9.11 Let Σ be an alphabet, and let ψ: Σ → Σ∗ be a homomorphism. Then CΣ is closed under ψ.
Proof. Languages that consist of single words are obviously context free. Hence, Theorem 9.10 applies
when single words are substituted for letters. Since homomorphisms are therefore special types of substi-
tutions, CΣ is closed under homomorphism.

Many of the other closure properties of the collection of context-free languages follow immediately
from the result for substitution. Closure under union could be proved by essentially the same method
presented in Theorem 8.4. An alternate proof, based on Theorem 9.10, is given next.

Theorem 9.12 Let Σ be an alphabet, and let L 1 and L 2 be context-free languages over Σ. Then L 1 ∪ L 2 is
context free. Thus, CΣ is closed under union.
Proof. Assume L 1 and L 2 are context-free languages over Σ. The grammar U = 〈{S}, {a , b }, S, {S → a , S → b }〉 clearly generates the context-free language {a , b }. The substitution defined by s(a ) = L 1 and s(b ) = L 2 gives rise to the language s({a , b }), which obviously equals L 1 ∪ L 2 . By Theorem 9.10, this language must be context free.

A similar technique can be used for concatenation and Kleene closure. It is relatively easy to directly
construct appropriate new grammars that combine the generative powers of the original grammars. The
exercises explore constructions that prove these closure properties without relying on Theorem 9.10.

Theorem 9.13 Let Σ be an alphabet, and let L 1 and L 2 be context-free languages over Σ. Then L 1 · L 2 is
context free. Thus, CΣ is closed under concatenation.
Proof. Let L 1 and L 2 be context-free languages over Σ. The pure context-free grammar C = 〈{S}, {a , b }, S, {S → ab }〉 generates the language {ab }. The substitution defined by s(a ) = L 1 and s(b ) = L 2 gives rise to the language s({ab }) = L 1 · L 2 . By Theorem 9.10, L 1 · L 2 must therefore be context free.
ab}) = L 1 · L 2 . By Theorem 9.10, L 1 · L 2 must therefore be context free.

Closure under Kleene closure could be justified by Theorem 9.10 in a similar manner, since the context-free grammar

K = 〈{Z , S}, {a , b }, Z , {Z → λ, Z → S, S → a S, S → a }〉

generates the language a ∗ . The substitution defined by s(a ) = L 1 gives rise to the language s(a ∗ ), which is L 1 ∗ , and so L 1 ∗ is also context free. The proof of Theorem 9.14 instead illustrates how to modify an existing grammar.

Theorem 9.14 Let Σ be an alphabet, and let L 1 be a context-free language over Σ. Then L ∗1 is context free.
Thus, CΣ is closed under Kleene closure.
Proof. If L 1 is a context-free language, then there is a pure context-free grammar G 1 = 〈Ω1 , Σ, S 1 , P 1 〉
that generates L 1 − {λ}. Choose nonterminals Z 0 and S 0 such that Z 0 ∉ Ω1 and S 0 ∉ Ω1 , and define a new
grammar
G ∗ = 〈Ω1 ∪ {S 0 , Z 0 }, Σ, Z 0 , P 1 ∪ {Z 0 → λ, Z 0 → S 0 , S 0 → S 0 S 1 , S 0 → S 1 }〉
A straightforward induction shows that L(G ∗ ) = L(G 1 )∗ .
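Written out concretely, with the hypothetical fresh names Z0 and S0 assumed not to collide with Ω1 , the construction is:

def kleene_star(G1):
    omega1, sigma, S1, P1 = G1
    Z0, S0 = "Z0", "S0"                    # fresh nonterminals (assumed)
    P = set(P1) | {(Z0, ()),               # Z0 -> lambda
                   (Z0, (S0,)),            # Z0 -> S0
                   (S0, (S0, S1)),         # S0 -> S0 S1
                   (S0, (S1,))}            # S0 -> S1
    return (omega1 | {Z0, S0}, sigma, Z0, P)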

Thus, many of the closure properties of the familiar operators investigated in Chapter 5 for regular
languages carry over to the class of context-free languages. Closure under intersection does not extend,
as the next result shows.

Lemma 9.3 C {a ,b ,c } is not closed under intersection.

Proof. The languages L 1 = {a i b j c j | i , j ∈ N} and L 2 = {a n b n c m | n, m ∈ N} are context free (see the exercises), and yet L 1 ∩ L 2 = {a k b k c k | k ∈ N} was shown in Example 9.17 to be a language that is not context free. Hence C {a ,b ,c } is not closed under intersection.

The exercises show that CΣ is not closed under intersection for any alphabet Σ with two or more
letters. It was noted in Chapter 5 that De Morgan’s laws implied that any collection of languages that is
closed under union and complementation must also be closed under intersection. It therefore follows
immediately that C{aa ,bb ,cc } cannot be closed under complementation either.

Lemma 9.4 C {a ,b ,c } is not closed under complementation.

Proof. Assume that C {a ,b ,c } is closed under complementation. Then any two context-free languages L 1 and L 2 would have context-free complements ∼L 1 and ∼L 2 . By Theorem 9.12, ∼L 1 ∪ ∼L 2 is context free, and the assumption would imply that its complement is also context free. But ∼(∼L 1 ∪ ∼L 2 ) = L 1 ∩ L 2 , which would contradict Lemma 9.3 (for example, if L 1 were {a i b j c j | i , j ∈ N} and L 2 were {a n b n c m | n, m ∈ N}). Hence the assumption must be false, and C {a ,b ,c } cannot be closed under complementation.

Thus, the context-free languages do not enjoy all of the closure properties that the regular languages
do. However, the distinction between a regular language and a context-free language is lost if the under-
lying alphabet contains only one letter, as shown by the following theorem. The proof demonstrates that
there is a certain regularity in the lengths of any context-free language. It is the relationships between

the different letters in the words of a context-free language that give it the potential for being non-FAD.
If L is a context-free language over the singleton alphabet {a }, then no such complex relationships can
exist; the character of a word is determined solely by its length.

Theorem 9.15 C {a } = D {a } ; that is, every context-free language over a single-letter alphabet is regular.
Proof. Let L be a context-free language over the singleton alphabet {a }, and assume the CNF grammar G = 〈Ω, Σ, S, P 〉 generates L. Let n = 2^{‖Ω‖+1} . Consider the words in L that are of length n or greater, choose the smallest such word, and denote it by a j 1 . Since j 1 ≥ n, the pumping theorem can be applied to this word, and hence a j 1 can be written as uv w x y, where u = a p 1 , v = a q 1 , w = a r 1 , x = a s 1 , and y = a t 1 . Let i 1 = q 1 + s 1 . Note that |v w x| ≤ n implies that i 1 ≤ n. The pumping theorem then implies that all strings in the set L 1 = {a j 1 +ki 1 | k = 0, 1, 2, . . .} must belong to L. These account for many of the large words in L. If there are other large words in L, choose the next smallest word a j 2 that is of length greater than n that belongs to L but is not already in the set L 1 . By a similar argument, there is an integer i 2 ≤ n for which all strings in the set L 2 = {a j 2 +ki 2 | k = 0, 1, 2, . . .} must also belong to L. Note that if i 1 happens to equal i 2 , then j 2 − j 1 is not a multiple of i 1 , or else a j 2 would belong to L 1 . That is, j 1 and j 2 must in this case belong to different equivalence classes modulo i 1 . While large words remain unaccounted for, we continue choosing the next smallest word a j m+1 that is of length greater than n and belongs to L but is not already in the set L 1 ∪ L 2 ∪ · · · ∪ L m . Since each i k ≤ n, there are at most n choices for the i k values, and at most n different equivalence classes modulo i k in which the j k s may fall, totaling at most n 2 different combinations. Thus, all the long words in L will be accounted for by the time m reaches n 2 . The words in L of length less than n constitute a finite set F , which is regular. Each L k is represented by the regular expression (a i k )∗ · a j k , and there are fewer than n 2 of these expressions, so L is the finite union of regular sets and is therefore regular.

If a regular language is intersected with a context-free language, the result may not be regular, but it
will be context free. The proof that CΣ is closed under intersection with a regular set will use the tools
developed in Chapter 10. The constructs in Chapter 10 will also allow us to show that CΣ is closed under
inverse homomorphism. Such results are useful in showing closure under other operators and will also
be useful in identifying certain languages as non-context-free. These conclusions will be based on a more
powerful type of machine, called a pushdown automaton. The context-free languages will correspond to
the languages that can be represented by such recognizers.

Exercises
9.1. Characterize the nature of parse trees of left-linear grammars.

9.2. Give context-free grammars for the following languages:

(a) {a n b n c m d m | n, m ∈ N}
(b) {a i b j c j d i | i , j ∈ N}
(c) {a n b n c m d m | n, m ∈ N} ∪ {a i b j c j d i | i , j ∈ N}

9.3. (a) Find, if possible, unambiguous context-free grammars for each of the languages given in Exer-
cise 9.2.
(b) Prove or disprove: If L 1 and L 2 are unambiguous context-free languages, then L 1 ∪L 2 is also an
unambiguous context-free language.
(c) Is UΣ closed under union?

9.4. State and prove the inductive result needed in Theorem 9.2.

9.5. Consider the proof of Theorem 9.4. Let G = 〈Ω, Σ, S, P 〉 be a context-free grammar, with the produc-
tion set divided up into P u and P n (the set of unit productions and the set of nonunit productions,
respectively). Devise an automaton-based algorithm that correctly calculates B u = {C | B ⇒* C } for each nonterminal B found in P u .
each nonterminal B found in P .

9.6. (a) What is wrong with proving that CΣ is closed under concatenation by using the following con-
struction? Let G 1 = 〈Ω1 , Σ, S 1 , P 1 〉 and G 2 = 〈Ω2 , Σ, S 2 , P 2 〉 be two context-free grammars, and
without loss of generality assume that Ω1 ∩ Ω2 = ;. Choose a new nonterminal Z such that
Z ∉ Ω1 ∪ Ω2 , and define a new grammar G ∗ = 〈Ω1 ∪ Ω2 ∪ {Z }, Σ, Z , P 1 ∪ P 2 ∪ {Z → S 1 · S 2 }〉. Note:
It is straightforward to show that L(G ∗ ) = L(G 1 ) · L(G 2 ).
(b) Modify G ∗ so that it reflects an appropriate valid context-free grammar. (Hint: Pay careful
attention to the treatment of lambda productions.)
(c) Prove that CΣ is closed under concatenation by using the construction defined in part (b).

9.7. Let Σ = {a , b , c }. Show that {a i b j c k | i , j , k ∈ N and i + j = k} is context free.

9.8. (a) Show that the following right-linear grammar is ambiguous.

G = 〈{S, A, B }, {a }, S, {S → A, S → B , A → aa A, A → a , B → aaa B , B → a }〉

(b) Use the method outlined in Theorem 9.2 to remove the ambiguity in G.

9.9. The regular expression grammar discussed in Example 9.3 produces strings with needless outermost
parentheses, such as ((a ∪ b ) · c ).

(a) Define a grammar that generates all the words in this language and strings that are stripped of
(only) the outermost parentheses, as in (a ∪ b ) · c .
(b) Define a grammar that generates all the words in this language and also allows extraneous sets
of parentheses, such as ((((a ) ∪ b )) · c ).

9.10. For the regular expression grammar discussed in Example 9.3:

(a) Determine the leftmost derivation for ((a ∗ · b ) ∪ (c · d )∗ ).

(b) Determine the rightmost derivation for ((a ∗ · b ) ∪ (c · d )∗ ).

9.11. Consider the grammars G and G 0 in the proof of Theorem 9.5. Induct on the number of steps in a
derivation in G to show that L(G) = L(G 0 ).

9.12. For a grammar G in Chomsky normal form, prove by induction that for any string x ∈ L(G) other
than x = λ the number of productions applied to derive x is 2|x|.

9.13. (a) For a grammar G in Chomsky normal form and a string x ∈ L(G), state and prove a lower bound
on the depth of the parse tree for x.
(b) For a grammar G in Chomsky normal form and a string x ∈ L(G), state and prove an upper
bound on the depth of the parse tree for x.

9.14. Convert the following grammars to Chomsky normal form.

(a) 〈{S, B ,C }, {a , b , c }, S, {S → a B , S → abC , B → bc , C → c }〉
(b) 〈{S, A, B }, {a , b , c }, S, {S → c B A, S → B , A → c B , A → A bbS, B → aaa }〉
(c) 〈{R}, {a , b , c , ( , ) , ε , ∅ , ∪ , · , ∗ }, R, {R → a | b | c | ε | ∅ | (R · R) | (R ∪ R) | R ∗ }〉
(d) 〈{T }, {a , b , c , d , − , + }, T, {T → a | b | c | d | T − T | T + T }〉

9.15. Convert the following grammars to Greibach normal form.

(a) 〈{S 1 , S 2 }, {a , b , c , d , e }, S 1 , {S 1 → S 2 S 1 e , S 1 → S 2 b , S 2 → S 1 S 2 , S 2 → c }〉
(b) 〈{S 1 , S 2 , S 3 }, {a , b , c , d , e }, S 1 , {S 1 → S 3 S 1 , S 1 → S 2 a , S 2 → be , S 3 → S 2 c }〉
(c) 〈{S 1 , S 2 , S 3 }, {a , b , c , d , e }, S 1 , {S 1 → S 1 S 2 c , S 1 → d S 3 , S 2 → S 1 S 1 , S 2 → a , S 3 → S 3 e }〉

9.16. Let G be a context-free grammar, and obtain G 0 from G by adding rules of the form A → λ. Prove
that there is a context-free grammar G 00 that is equivalent to G 0 . That is, show that apart from the
special rule Z → λ all other lambda productions are unnecessary.

9.17. Prove the following generalization of Lemma 9.1: Let G = 〈Ω, Σ, S, P 〉 be a context-free grammar,
and assume there are strings α and γ and nonterminals X and B for which X → γB α ∈ P . Further
assume that the set of all B rules is given by {B → β1 , B → β2 , . . . , B → βm }, and let G 0 = 〈Ω, Σ, S, P 0 〉,
where
P 0 = P ∪ {X → γβ1 α, X → γβ2 α, . . . , X → γβm α} − {X → γB α}.
Then L(G) = L(G 0 ).

9.18. Let P = {y ∈ {d }∗ | ∃ prime p ∋ y = d p } = {dd , ddd , ddddd , d 7 , d 11 , d 13 , . . .}.

(a) Prove that P is not context free by directly applying the pumping theorem.
(b) Prove that P is not context free by using the fact that P is known to be a nonregular language.

9.19. Let Γ = {x ∈ {0 ,1 ,2 }∗ | ∃w ∈ {0 ,1 }∗ ∋ x = w · 2 · w} = {2 , 121 , 020 , 11211 , 10210 , . . .}. Prove that Γ is not context free.
9.20. Let Ψ = {x ∈ {0 1}∗ | ∃w ∈ {0


0,1 1}∗ 3 x = w · w} = {λ,00
0,1 00 11
00,11 0000
11,0000 1010
0000,1010 1111, . . .}. Prove that Ψ is not
1111
1010,1111
context free.

b }∗ | ∃ j ∈ N 3 |x| = 2 j } = {b
9.21. Let Ξ = {x ∈ {b b ,bb
bb bbbb
bb,bbbb b 8 ,b
bbbb,b b 16 ,b
b 32 , . . .}. Prove that Ξ is not context free.

a }∗ | ∃ j ∈ N 3 |x| = j 2 } = {λ,a
9.22. Let Φ = {x ∈ {a a ,aaaa
aaaa a 9 ,a
aaaa,a a 16 ,a
a 25 , . . .}, and let

Φ0 = {x ∈ {b d }∗ | |x|b ≥ 1 ∧ |x|c = (|x|d )2 }.


b ,cc ,d

(a) Prove that Φ is not context free.


(b) Use the conclusion of part (a) and the properties of homomorphism to prove that Φ0 is not
context free.
(c) Use Ogden’s lemma to directly prove that Φ0 is not context free.
(d) Is it possible to use the pumping theorem to directly prove that Φ0 is not context free?

9.23. Consider L = {y ∈ {0 ,1 }∗ | |y|0 = |y|1 }. Prove or disprove that L is context free.

9.24. Refer to the proof of Theorem 9.9.

(a) Give a formal recursive definition of the path by (1) stating boundary conditions, and (2) giving
a rule for choosing the next node on the path.
(b) Show that the conclusions of Theorem 9.9 follow from the properties of this path.

9.25. Show that CΣ is closed under ∪ by directly constructing a new context-free grammar with the ap-
propriate properties.

9.26. Let XΣ be the set of all languages that are not context free. Determine whether or not:

(a) XΣ is closed under union.


(b) XΣ is closed under complement.
(c) XΣ is closed under intersection.
(d) XΣ is closed under Kleene closure.
(e) XΣ is closed under concatenation.

9.27. Let Σ be an alphabet, and x = a 1 a 2 · · · a n−1 a n ∈ Σ∗ ; define x r = a n a n−1 · · · a 2 a 1 . For a language L over Σ, define L r = {x r | x ∈ L}. Note that the (unary) reversal operator r is thus defined by L r = {a n a n−1 · · · a 3 a 2 a 1 | a 1 a 2 a 3 · · · a n−1 a n ∈ L}, and L r therefore represents all the words in L written backward. Show that CΣ is closed under the operator r .

9.28. Let Σ = {a , b , c , d }. Define the (unary) operator T by

T (L) = {a n a n−1 · · · a 3 a 2 a 1 a 1 a 2 a 3 · · · a n−1 a n | a 1 a 2 a 3 · · · a n−1 a n ∈ L} = {w r · w | w ∈ L}

(see the definition of w r in Exercise 9.27). Prove or disprove that CΣ is closed under the operator T .

9.29. Prove or disprove that C {a ,b } is closed under relative complement; that is, if L 1 and L 2 are context
free, then L 1 − L 2 is also context free.

9.30. (a) Prove that C {a ,b } is not closed under intersection, nor is it closed under complement.
(b) By defining an appropriate homomorphism, argue that whenever Σ has more than one sym-
bol, CΣ is not closed under intersection, nor is it closed under complement.

9.31. Consider the iterative method discussed in the proof of Theorem 9.3. Outline an alternative method
based on an automaton with states labeled by the sets in ℘(Ω).

9.32. Consider grammars in Greibach normal form that also satisfy one of the restrictions of Chomsky
normal form; that is, no production has more than two symbols on the right side.

(a) Show that this is not a “normal form” for context-free languages by demonstrating that there is
a context-free language that cannot be generated by any grammar in this form.
(b) Characterize the languages generated by grammars that can be represented by this restrictive
form.

9.33. Let L be any collection of words over a singleton alphabet Σ. Prove that L ∗ must be regular.

9.34. If ‖Σ‖ = 1, prove or disprove that CΣ is closed under complementation.

9.35. Prove that {a n b n c m | n, m ∈ N} is context free.

9.36. Use Ogden’s lemma to prove that {a k b n c m | (k ≠ n) ∧ (n ≠ m)} is not context free.


Chapter 10

Pushdown Automata

In the earlier part of this text, the representation of languages via regular grammars was a generative
construct equivalent to the cognitive power of deterministic finite automata and nondeterministic finite
automata. Chapter 9 showed that context-free grammars had more generative potential than did regular
grammars, and thus defined a significantly larger class of languages. This chapter and the next explore
generalizations of the basic automata construct introduced in Chapter 1. In Chapter 4, we discovered
that adding nondeterminism did not enhance the language capabilities of an automaton. It seems that
more powerful automata will need the ability to store more than a finite amount of state information,
and machines with the ability to write and read from an indefinitely long tape will now be considered.
Automata that allow unrestricted access to all portions of the tape are the subject of Chapter 11. Such
machines are regarded to be as powerful as a general-purpose computer. This chapter will deal with au-
tomata with restricted access to the auxiliary tape. One such device is known as a pushdown automaton
and is strongly related to the context-free languages.

10.1 Definitions and Examples


A language such as {a n b n | n ≥ 1} can be shown to be non-FAD by the pumping lemma, which uses the
observation that a finite-state control cannot distinguish between an unlimited number of essentially
different situations. Deterministic finite automata could at best “count” modulo some finite number;
unlimited matching was one of the many things beyond the capabilities of a finite-state control. One
possible enhancement would be to augment the automaton with a single integer counter, which could be
envisioned as a sack in which stones could be placed (or removed) in response to input. The automaton
would begin with one stone in the sack and process input much as a nondeterministic finite automaton
would. With each transition, the machine would not only choose a new state, but also choose to add
another stone to the sack, remove an existing stone from the sack, or leave the contents unchanged.
The δ function is independent of the status of the sack; the sack is used only to determine whether the
automaton should continue to process input symbols. Perhaps some sort of weight sensor would be used
to detect when there were stones in the sack, and the device would continue to operate as long as stones
were present; the device would halt when the sack is empty. If all the symbols on the input tape happen
to have been consumed at the time the sack empties, the input string is accepted by the automaton.
Such devices are called counting automata and are general enough to recognize many non-FAD lan-
guages. A device to recognize {a n b n | n ≥ 1} would need three states. The start state will transfer control to
a second state when an a is read, leaving the sack contents unchanged. The start state will have no valid

moves for b , causing words that begin with b to be rejected since the input tape will not be completely
consumed. The automaton will remain in the second state in response to each a , adding a stone to the
sack each time an a is processed. The second state will transfer control to the third state upon receipt of
the symbol b and withdraw a stone from the sack. The third state has no moves for a and remains in that
state while removing a stone for each b that is processed. For this device, only words of the form a n b n
will consume all the input just when the sack becomes empty.
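The three-state device just described is easy to simulate directly; the sketch below is our own rendering of its behavior, not a formal definition from the text:

def counting_accepts(word):
    state, stones = 1, 1                  # start state; one stone in the sack
    for i, ch in enumerate(word):
        if state == 1 and ch == 'a':
            state = 2                     # sack contents unchanged
        elif state == 2 and ch == 'a':
            stones += 1                   # add a stone per additional a
        elif state == 2 and ch == 'b':
            state, stones = 3, stones - 1 # first b: withdraw a stone
        elif state == 3 and ch == 'b':
            stones -= 1                   # withdraw a stone per later b
        else:
            return False                  # no valid move for this symbol
        if stones == 0:                   # an empty sack halts the device
            return i == len(word) - 1     # accept iff input just consumed
    return False                          # input exhausted, sack nonempty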
Another type of counting automaton handles acceptance in the same manner as nondeterministic
finite automata. That is, if there is a sequence of transitions that consumes all the input and leaves the
device in a final state, the input word is accepted (irrespective of the sack contents). As with NDFAs, the
device may halt prematurely if there are no applicable transitions (or if the sack empties).
These counting automata are not quite general enough to recognize all context-free languages. More
than one type of “stone” is necessary in order for such an automaton to emulate the power of context-
free grammars, at which point the order of the items becomes important. Thus, the sack is replaced by a
stack, a last-in, first-out (LIFO) list. The most recently added item is positioned at the end called the top
of the stack. A newly added item is placed above the current top and becomes the new top item as it is
pushed onto the stack. The action of the finite-state control can be influenced by the type of item that is
on the top of the stack. Only the top (that is, the most recently placed) item can affect the state transition
function; the device has no ability to reexamine items that have previously been deleted (that is, have
been popped). The next item below the top of the stack cannot be examined until the top item is popped
(and that popped item thereby becomes unavailable for later reinspection). As with counting automata,
an empty stack will halt the operation of this type of automaton, called a pushdown automaton.

Definition 10.1 A (nondeterministic) pushdown automaton (NPDA or just PDA) is a septuple


P = 〈Σ, Γ, S, s 0 , δ, B, F 〉, where
Σ is the input alphabet.
Γ is the stack alphabet.
S is a finite nonempty set of states.
s 0 is the start state (s 0 ∈ S).
δ is the state transition function,
δ: S × (Σ ∪ {λ}) × Γ → the set of finite subsets of S × Γ∗ .
B is the bottom of the stack symbol (B ∈ Γ).
F is the set of final states (F ⊆ S).

By the definition of alphabet (Definition 1.1), both Σ and Γ must be nonempty. Figure 10.1 presents
a conceptualization of a pushdown automaton. As with an NDFA, there is a finite-state control and a
read head for the input tape, which only moves forward. The auxiliary tape also has a read/write head, which not only
moves forward, but can move backward when an item is popped. The state transition function is meant
to signify that, given a current state, an input symbol being currently scanned, and the current top stack
symbol, the automaton may choose both a new current state and a new string of symbols from Γ∗ to
replace the top stack symbol. This definition allows the machine to behave nondeterministically, since
a current state, input letter, and stack symbol are allowed to have any (finite) number of alternatives for
state transitions and strings from Γ∗ to record on the stack.
The auxiliary tape is similar to that of a finite-state transducer; the second component of the range
of the state transition function in a pushdown automaton specifies the string to be written on the stack
tape. Thus, the functions δ and ω of a FST are essentially combined in the δ function for pushdown
automata. The auxiliary tape differs from that of a FST in that the current symbol from Γ on the tape is

Figure 10.1: A model of a pushdown automaton

sensed by the stack read/write head and can affect the subsequent operation of the automaton. If no
symbols are written to tape during a transition, the tape head drops back one position and will then be
scanning the previous stack symbol. In essence, a state transition is initiated by the currently scanned
symbol on both the input tape and the stack tape and begins with the stack symbol being popped from
the stack; the state transition is accompanied by a push operation, which writes a new string of stack
symbols on the stack tape. If several symbols are written, the auxiliary read/write head will move ahead
an appropriate amount, and the head will be positioned over the last of the symbols written. Thus, if
exactly one symbol is written, the stack tape head does not move, and the effect is that the old top-of-
stack symbol is overwritten by the new symbol. When the empty string is to be written, the effect is a
pop followed by a push of no letters, and the stack tape head retreats one position. If the only remaining
stack symbol is removed from the stack in this fashion, the stack tape head moves off the end of the tape.
It would then no longer be scanning a valid stack symbol, so no further transitions are possible, and the
device halts.
It is possible to manipulate the stack and change states without consuming an input letter, which is
the intent of the λ-moves in the state transition function. Since at most one symbol can be removed from
the stack as a result of a transition, λ-moves allow the stack to be shortened by several symbols before
the next input symbol is processed.
Acceptance can be defined by requiring the stack to be empty after the entire input tape is consumed
(as was the case with counting automata) or by requiring that the automaton be in a final state after all
the input is consumed. The nondeterminism may allow the device to react to a given input string in
several distinct ways. As with NDFAs, the input word is considered accepted if at least one of the possible
reactions satisfies the criteria for acceptance. For a given PDA, the set of words accepted by the empty
stack criterion will likely differ from the set of words accepted by the final state condition.

Example 10.1
Consider the PDA defined by P 1 = 〈{a ,b }, {A, B }, {q, r }, q, δ, B, ∅〉, where δ is defined by

Figure 10.2: The PDA discussed in Example 10.1

δ(q, a , B ) = {〈q, A〉}
δ(q, a , A) = {〈q, A A〉}
δ(q, b , B ) = { }
δ(q, b , A) = {〈r, λ〉}
δ(r, a , B ) = { }
δ(r, a , A) = { }
δ(r, b , B ) = { }
δ(r, b , A) = {〈r, λ〉}

Note that since the set of final states is empty no strings are accepted by final state. We wish to consider
the set of strings accepted by empty stack. In general, when the set of final states is nonempty, the PDA
will designate a machine designed to accept by final state; F = ∅ will generally be taken as an indication
that acceptance is to be by empty stack.
The action of the state transition function can be displayed much like that of finite-state transducers.
Transition arrows are no longer labeled with just a symbol from the input alphabet, since both a stack
symbol and an input symbol now govern the action of the automaton. Thus, arrows are labeled by or-
dered pairs from Σ×Γ. As with FSTs, this is followed by the output caused by the transition. The diagram
corresponding to P 1 is shown in Figure 10.2.
The reaction of P 1 to the string aabb is the sequence of moves displayed in Figures 10.3a-e. Initially,
the heads of the two tapes are positioned as shown in Figure 10.3a, with the (current) initial state high-
lighted. Since the state is q, the input symbol is a , and the stack symbol is B , the first transition rule
δ(q, a , B ) = {〈q, A〉} applies; P 1 remains in state q, and the popped stack symbol B is replaced by a single
A. Figure 10.3b shows the new state of the automaton. The stack read/write head is in the same posi-
tion, since the length of the stack did not change. The input read head moves on to the next letter, since
the first input symbol was consumed. The second rule now applies, and the single A is replaced by the
pair A A as P 1 returns to q again, as shown in Figure 10.3c. Note that the stack tape head advanced as
the topmost symbol was written. The rule δ(q, b , A) = {〈r, λ〉} now applies, and the state of the machine
switches to r as the (topmost) A is popped and replaced with an empty string, leaving the stack shorter
than before. This is shown in Figure 10.3d. The last of the eight transition rules now applies, leaving the
automaton in the configuration shown by Figure 10.3e. Since the stack is now empty, no further moves
are possible. However, since the read head has reached the end of the input string, the word aabb is
accepted by P 1 . The word aab would be rejected by P 1 since the automaton would run out of input in a
configuration similar to that of Figure 10.3d, in which the stack is not yet empty. The word aabbb would
not be accepted because the stack would empty prematurely, leaving P 1 stuck in a configuration similar
to that of Figure 10.3e, but with the input string incompletely consumed. The word aaba would like-
wise be rejected because there would be no move from the state r with which to process the final input
symbol a .
As with deterministic finite automata, once an input symbol is consumed, it has no further effect on

Figure 10.3a: Walkthrough of the pushdown automaton discussed in Example 10.1

Figure 10.3b: Walkthrough of the pushdown automaton discussed in Example 10.1

Figure 10.3c: Walkthrough of the pushdown automaton discussed in Example 10.1

Figure 10.3d: Walkthrough of the pushdown automaton discussed in Example 10.1

Figure 10.3e: Walkthrough of the pushdown automaton discussed in Example 10.1

the operation of the pushdown automaton. The current state of the device, the remaining input symbols,
and the current stack contents form a triple that describes the current configuration of the PDA. The
triple 〈q, bb , A A〉 thus describes the configuration of the PDA in Figure 10.3c. When processing aabb , P 1 moved through the following sequence of configurations:

〈q, aabb , B 〉
〈q, abb , A〉
〈q, bb , A A〉
〈r, b , A〉
〈r, λ, λ〉

Successive configurations followed from their predecessors by applying a single rule from the state tran-
sition function. These transitions will be described by the operator `.

Definition 10.2 The current configuration of pushdown automaton P = 〈Σ, Γ, S, s 0 , δ, B, F 〉 is described by


a triple 〈s, x, α〉, where

s is the current state.


x is the unconsumed portion of the input string.
α is the current stack contents (with the topmost symbol written as the leftmost).

An ordered pair 〈t , γ〉 within the finite set of objects specified by δ(s, a , A) can cause a move in the pushdown automaton P from the configuration 〈s, a y, Aβ〉 to the configuration 〈t , y, γβ〉. This transition is denoted as 〈s, a y, Aβ〉 ` 〈t , y, γβ〉.
A sequence of successive moves in which

〈s 1 , x 1 , α1 〉 ` 〈s 2 , x 2 , α2 〉, 〈s 2 , x 2 , α2 〉 ` 〈s 3 , x 3 , α3 〉, . . . , 〈s m−1 , x m−1 , αm−1 〉 ` 〈s m , x m , αm 〉

is denoted by 〈s 1 , x 1 , α1 〉 `* 〈s m , x m , αm 〉.

The operator `* reflects the reflexive and transitive closure of `, and thus we also have 〈s 1 , x 1 , α1 〉 `* 〈s 1 , x 1 , α1 〉
and clearly 〈s 1 , x 1 , α1 〉 ` 〈s 2 , x 2 , α2 〉 implies 〈s 1 , x 1 , α1 〉 `* 〈s 2 , x 2 , α2 〉.

Example 10.2
For the pushdown automaton P 1 in Example 10.1, 〈q, aabb , B 〉 `* 〈r, λ, λ〉 because 〈q, aabb , B 〉 ` 〈q, abb , A〉 ` 〈q, bb , A A〉 ` 〈r, b , A〉 ` 〈r, λ, λ〉.

Definition 10.3 For a pushdown automaton P = 〈Σ, Γ, S, s 0 , δ, B, F 〉, the language accepted via final state
by P , L(P ), is
{x ∈ Σ∗ | ∃r ∈ F, ∃α ∈ Γ∗ ∋ 〈s 0 , x, B 〉 `* 〈r, λ, α〉}

The language accepted via empty stack by P , Λ(P ), is

{x ∈ Σ∗ | ∃r ∈ S ∋ 〈s 0 , x, B 〉 `* 〈r, λ, λ〉}

Example 10.3
Consider the pushdown automaton P 1 in Example 10.1. Since only strings of the form a i b i (for i ≥ 1) allow 〈q, a i b i , B 〉 `* 〈r, λ, λ〉, it follows that Λ(P 1 ) = {a n b n | n ≥ 1}. However, F = ∅, and thus L(P 1 ) is clearly ∅.
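Acceptance computations like these are easy to reproduce with a breadth-first search over configurations 〈s, x, α〉. The harness below is our illustration (the dictionary encoding of δ and the fuel bound are assumptions of the sketch, not the text's notation); the stack is a string whose leftmost character is the topmost symbol, as in Definition 10.2:

from collections import deque

def accepts_by_empty_stack(delta, start, bottom, word, fuel=100000):
    queue = deque([(start, word, bottom)])
    while queue and fuel:
        fuel -= 1
        state, x, stack = queue.popleft()
        if x == "" and stack == "":
            return True                        # input consumed, stack empty
        if stack == "":
            continue                           # device halts: nothing to pop
        top, rest = stack[0], stack[1:]
        for inp in ([x[0]] if x else []) + [""]:  # real moves and lambda-moves
            for t, gamma in delta.get((state, inp, top), []):
                queue.append((t, x[1:] if inp else x, gamma + rest))
    return False

delta1 = {('q', 'a', 'B'): [('q', 'A')],       # P1 from Example 10.1
          ('q', 'a', 'A'): [('q', 'AA')],
          ('q', 'b', 'A'): [('r', '')],
          ('r', 'b', 'A'): [('r', '')]}
print(accepts_by_empty_stack(delta1, 'q', 'B', 'aabb'))   # True
print(accepts_by_empty_stack(delta1, 'q', 'B', 'aab'))    # False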
The pushdown automaton P 1 in Example 10.1 is deterministic in the sense that there is never more than one choice that can be made from any configuration. The following example illustrates a pushdown automaton that is nondeterministic.

Example 10.4
Consider the pushdown automaton defined by P 2 = 〈{a ,b }, {S,C }, {t }, t , δ, S, ∅〉, where δ is defined by
δ(t , a , S) = {〈t , SC 〉, 〈t ,C 〉}
δ(t , a ,C ) = { }
δ(t , b , S) = { }
δ(t , b ,C ) = {〈t , λ〉}
δ(t , λ, S) = { }
δ(t , λ,C ) = { }
In this automaton, there are two distinct courses of action when the input symbol is a and the top stack symbol is S, which leads to several possible options when trying to process the word aabb . One option is to apply the first move whenever possible, which leads to the sequence of configurations

〈t , aabb , S〉 ` 〈t , abb , SC 〉 ` 〈t , bb , SCC 〉.

Since there are no λ-moves and δ(t , b , S) = { }, there are no further moves that can be made, and the input word cannot be completely consumed in this manner. Another option is to choose the second move option exclusively, leading to the abortive sequence 〈t , aabb , S〉 ` 〈t , abb ,C 〉; δ(t , a ,C ) = { }, and processing again cannot be completed. A mixture of the first and second moves results in the sequence 〈t , aabb , S〉 ` 〈t , abb , SC 〉 ` 〈t , bb ,CC 〉 ` 〈t , b ,C 〉 ` 〈t , λ, λ〉, and aabb is thus accepted by P 2 . Further experimentation shows that Λ(P 2 ) = {a n b n | n ≥ 1}. To successfully empty its stack, this automaton must correctly “guess” when the last a is being read and choose the second transition pair, placing only C on the stack.

Definition 10.4 Two pushdown automata M 1 = 〈Σ, Γ1, S 1, s 01, δ1, B 1, F 1〉 and M 2 = 〈Σ, Γ2, S 2, s 02, δ2, B 2, F 2〉 are called equivalent iff they accept the same language.

The pushdown automaton P 1 from Example 10.1 is therefore equivalent to P 2 in Example 10.4. The
concept of equivalence will apply even if one device accepts via final state and the other accepts via
empty stack. In keeping with the previous broad use of the concept of equivalence, if any two finite
descriptors define the same language, those descriptors will be called equivalent. Thus, if a PDA M
happens to accept the language described by a regular expression R, we will say that R is equivalent to
M.

Example 10.5
The following pushdown automaton illustrates the use of λ-moves and acceptance by final state for the
language {aⁿbᵐ | n ≥ 1 ∧ (n = m ∨ n = 2m)}. Let P 3 = 〈{a, b}, {A}, {s 0, s 1, s 2, s 3, s 4}, s 0, δ, A, {s 2, s 4}〉, where δ
is defined by

Figure 10.4: The PDA discussed in Example 10.5

δ(s 0, a, A) = {〈s 0, AA〉}
δ(s 0, b, A) = { }
δ(s 0, λ, A) = {〈s 1, λ〉, 〈s 3, λ〉}
δ(s 1, a, A) = { }
δ(s 1, b, A) = {〈s 1, λ〉}
δ(s 1, λ, A) = {〈s 2, λ〉}
δ(s 2, a, A) = { }
δ(s 2, b, A) = { }
δ(s 2, λ, A) = { }
δ(s 3, a, A) = { }
δ(s 3, b, A) = { }
δ(s 3, λ, A) = {〈s 4, λ〉}
δ(s 4, a, A) = { }
δ(s 4, b, A) = {〈s 3, λ〉}
δ(s 4, λ, A) = { }

The finite-state control for this automaton is diagrammed in Figure 10.4. Note that the λ-move from state
s 3 is not responsible for any nondeterminism in this machine. From s 3, only one move is permissible: the
λ-move to s 4 . On the other hand, the λ-move from state s 1 does allow a choice of moving to s 2 (without
moving the read head) or staying at s 1 while consuming another input symbol. The choice of moves
from state s 0 also contributes to the nondeterminism; the device must “guess” whether the number of
b's will equal the number of a's or whether there will be half as many, and at the appropriate time transfer
control to s 1 or s 3 , respectively. Notice that the moves defined by states s 3 and s 4 allow two stack symbols
to be removed for each b consumed. Furthermore, a string like aab can transfer control to s 3 as the final
b is processed, but the λ-move can then be applied to reach s 4 even though there are no more symbols
on the input tape.
Since A was the only stack symbol in P 3, the language could just as easily have been recognized by
the sack-and-stone counting device described at the beginning of the section. It should be clear that
counting automata are essentially pushdown automata with a singleton stack alphabet. Pushdown au-
tomata with only one stack symbol cannot accept all the languages that a PDA with two stack symbols can
[DENN]. However, it can be shown that using more than two stack symbols does not add to the
power of a PDA; for example, a PDA with Γ = {A, B, C, D} can be converted into an equiva-
lent machine with Γ′ = {0, 1} and the occurrences of the old stack symbols replaced by the encodings
A = 01, B = 001, C = 0001, and D = 00001.
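The re-encoding itself is just a letterwise substitution applied to everything the machine pushes; the following Python fragment (a sketch of only the encoding step; the full construction must also arrange, via extra λ-moves, for each code to be pushed and popped one bit at a time) illustrates the idea:

    # Letterwise binary encoding of stack strings, as suggested above.
    CODE = {'A': '01', 'B': '001', 'C': '0001', 'D': '00001'}

    def encode_stack(alpha):
        """Map a stack string over {A, B, C, D} to its {0, 1} encoding."""
        return ''.join(CODE[symbol] for symbol in alpha)

    assert encode_stack('CAB') == '000101001'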
Every NDFA can be simulated by a PDA that simply ignores its stack. In fact, every NDFA has an
equivalent counting automaton, as shown in the following theorem.

Theorem 10.1 Given any alphabet Σ, and an NDFA A:

1. There is an equivalent pushdown automaton (counting automaton) A′ for which L(A) = L(A′).

2. There is an equivalent pushdown automaton (counting automaton) A″ for which L(A) = Λ(A″).

Proof. The results for pushdown automata will actually follow from the results of the next section,
since pushdown automata can define all the context-free languages, and the regular language defined by
the NDFA A must be context free. The following constructions will use only the one stack symbol ¢, and
hence A′ and A″ are actually counting automata for which L(A) = L(A′) and L(A) = Λ(A″).
While the construction of a PDA from an NDFA is straightforward, the inductive proofs are simplified
if we appeal to Theorem 4.1, and assume that the given automaton is actually a DFA A = 〈Σ, S, s 0 , δ, F 〉.
Define the PDA A′ = 〈Σ, {¢}, S, s 0, δ′, ¢, F〉, where δ′ is defined by

(∀s ∈ S)(∀a ∈ Σ)(δ′(s, a, ¢) = {〈δ(s, a), ¢〉})

and (∀s ∈ S)(δ′(s, λ, ¢) = { }). That is, the PDA makes the same transitions that the DFA does and replaces the
¢ with the same symbol on the stack at each move. The proof that A and A′ are equivalent is by induction
on the length of the input string, where P (n) is the statement that

(∀x ∈ Σⁿ)(δ(s 0, x) = t ⇔ 〈s 0, x, ¢〉 `* 〈t, λ, ¢〉)

The PDA with a single stack symbol that accepts L via empty stack is quite similar; final states are simply
given the added option of removing the only symbol on the stack. That is, A″ = 〈Σ, {¢}, S, s 0, δ″, ¢, ∅〉, where
δ″ is defined by

(∀s ∈ S)(∀a ∈ Σ)(δ″(s, a, ¢) = {〈δ(s, a), ¢〉})

and

(∀s ∈ F)(δ″(s, λ, ¢) = {〈s, λ〉})

while

(∀s ∈ S − F)(δ″(s, λ, ¢) = { })

The same type of inductive statement proved for A′ holds for A″, and it therefore follows that exactly
those words that terminate in what used to be final states empty the stack; thus L(A) = Λ(A″).
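Both constructions are purely mechanical, as the following Python sketch suggests (the function name and the dictionary encoding are assumptions of ours; δ here is the DFA's transition function):

    def dfa_to_counting_pda(sigma, states, delta, finals):
        """Build the transition functions of A' and A'' of Theorem 10.1.

        delta maps (state, letter) to a state; the lone stack symbol is
        '¢', and '' stands for lambda.  Both results map
        (state, letter_or_'', '¢') to sets of (state, pushed) pairs.
        """
        d_prime, d_dprime = {}, {}
        for s in states:
            for a in sigma:
                move = {(delta[(s, a)], '¢')}    # mimic the DFA, keep the ¢
                d_prime[(s, a, '¢')] = move
                d_dprime[(s, a, '¢')] = set(move)
            d_prime[(s, '', '¢')] = set()        # A' has no lambda-moves
            # In A'', final states may erase the lone stack symbol:
            d_dprime[(s, '', '¢')] = {(s, '')} if s in finals else set()
        return d_prime, d_dprime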

10.2 Equivalence of PDAs and CFGs
In this section, it will be shown that if L is accepted by a PDA, then L can be generated by a CFG, and,
conversely, every context-free language can be recognized by a PDA. We will also show that the class of
pushdown automata that accept by empty stack defines exactly the same languages as the class of push-
down automata that accept by final state. In each case, the languages defined are exactly the context-free
languages.

Definition 10.5 For a given alphabet Σ, let

PΣ = {L ⊆ Σ∗ | ∃ PDA P ∋ L = Λ(P)}
FΣ = {L ⊆ Σ∗ | ∃ PDA P ∋ L = L(P)}

Recall that CΣ was defined to be the collection of context-free languages. We begin by showing that
CΣ ⊆ PΣ . To do this, we must show that, given a language L generated by a context-free grammar G, there
is a PDA PG that recognizes exactly those words that belong to L. The pushdown automaton given in the
next definition simulates leftmost derivations in G. That is, as the symbols on the input tape are scanned,
the automaton guesses at the production that produced that letter and remembers the remainder of the
sentential form by pushing it on the stack. PG is constructed in such a way that, when the stack contents
are checked against the symbols on the input tape, wrong guesses are discovered and the device halts.
Wrong guesses, corresponding to inappropriate or impossible derivations, are thereby prevented from
emptying the stack, and yet each word that can be generated by G will be guaranteed to have a sequence
of moves that results in acceptance by empty stack.

Definition 10.6 Given a context-free grammar G = 〈Ω, Σ, S, P 〉 in pure Greibach normal form, the single-
state pushdown automaton corresponding to G is the septuple

PG = 〈Σ, Ω ∪ Σ, {s}, s, δG, S, ∅〉,

where δG is defined, for all a ∈ Σ and all Ψ ∈ (Ω ∪ Σ), by

δG(s, a, Ψ) = {〈s, α〉 | Ψ → aα ∈ P},  if Ψ ∈ Ω
δG(s, a, Ψ) = {〈s, λ〉},  if Ψ ∈ Σ ∧ Ψ = a
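A minimal Python sketch of this definition (the encoding is ours: productions are pairs (Ψ, right-hand side), and each right-hand side begins with a terminal, as pure Greibach normal form requires):

    def pda_from_gnf(nonterminals, terminals, productions):
        """Build delta_G of Definition 10.6 for a one-state PDA.

        Returns a dict mapping (state, letter, stack_symbol) to a set of
        (state, pushed_string) pairs; the only state is 's'.
        """
        delta = {}
        for a in terminals:
            for psi in nonterminals:
                # a nonterminal on the stack is replaced by the remainder
                # of any production whose right side begins with a
                delta[('s', a, psi)] = {('s', rhs[1:])
                                        for lhs, rhs in productions
                                        if lhs == psi and rhs[0] == a}
            for b in terminals:
                # a terminal on the stack is popped by the matching letter
                delta[('s', a, b)] = {('s', '')} if a == b else set()
        return delta

    # The grammar of Example 10.6 below: S -> aSb | ab
    delta_g = pda_from_gnf({'S'}, {'a', 'b'}, {('S', 'aSb'), ('S', 'ab')})
    assert delta_g[('s', 'a', 'S')] == {('s', 'Sb'), ('s', 'b')}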

Example 10.6

Consider the pure Greibach normal form grammar

G = 〈{S}, {a, b}, S, {S → aSb, S → ab}〉,

which is perhaps the simplest grammar generating {aⁿbⁿ | n ≥ 1}. The automaton PG is then

PG = 〈{a, b}, {S, a, b}, {s}, s, δG, S, ∅〉,

where δG is defined by

δG(s, a, S) = {〈s, Sb〉, 〈s, b〉}
δG(s, a, a) = {〈s, λ〉}
δG(s, a, b) = { }
δG(s, b, S) = { }
δG(s, b, a) = { }
δG(s, b, b) = {〈s, λ〉}

This automaton contains no λ-moves and is essentially the same as P 2 in Example 10.4, with the state t
now relabeled as s, the stack symbol b now playing the role of C , and the unused stack symbol a added
to Γ. The derivation S ⇒ aSb ⇒ aabb corresponds to the successful move sequence

〈s, aabb, S〉 ` 〈s, abb, Sb〉 ` 〈s, bb, bb〉 ` 〈s, b, b〉 ` 〈s, λ, λ〉.

The exact correspondence between derivation steps and move sequences is illustrated in the next exam-
ple.

Example 10.7
For a slightly more complex example, consider the pure Greibach normal form grammar

G = 〈{R}, {a, b, c, (, ), ε, ∅, ∪, ·, ∗}, R, {R → a | b | c | ε | ∅ | (R · R) | (R ∪ R) | (R)∗}〉.
b |cc |²²|;

The automaton PG is then

PG = 〈{a, b, c, (, ), ε, ∅, ∪, ·, ∗}, {R, a, b, c, (, ), ε, ∅, ∪, ·, ∗}, {s}, s, δG, R, ∅〉,

where δG is comprised of the following nonempty transitions:

δG(s, (, R) = {〈s, R · R)〉, 〈s, R ∪ R)〉, 〈s, R)∗〉}
δG(s, a, R) = {〈s, λ〉}
δG(s, b, R) = {〈s, λ〉}
δG(s, c, R) = {〈s, λ〉}
δG(s, ε, R) = {〈s, λ〉}
δG(s, ∅, R) = {〈s, λ〉}
δG(s, a, a) = {〈s, λ〉}
δG(s, b, b) = {〈s, λ〉}
δG(s, c, c) = {〈s, λ〉}
δG(s, ∅, ∅) = {〈s, λ〉}
δG(s, ε, ε) = {〈s, λ〉}
δG(s, ∪, ∪) = {〈s, λ〉}
δG(s, ·, ·) = {〈s, λ〉}
δG(s, ∗, ∗) = {〈s, λ〉}
δG(s, ), )) = {〈s, λ〉}
δG(s, (, () = {〈s, λ〉}

In this grammar, it happens that the symbol ( is never pushed onto the stack, and so the last transition is
not utilized. Transitions not listed are empty; that is, they are of the form δG(s, d, A) = { }.
Consider the string (a ∪ (b · c)), which has the following (unique) derivation:

Figure 10.5a: Walkthrough of the pushdown automaton discussed in Example 10.7

R ⇒ (R ∪ R)
⇒ (a ∪ R)
⇒ (a ∪ (R · R))
⇒ (a ∪ (b · R))
⇒ (a ∪ (b · c))

PG simulates this derivation with the following steps:

〈s, (a ∪ (b · c)), R〉 ` 〈s, a ∪ (b · c)), R ∪ R)〉
` 〈s, ∪(b · c)), ∪R)〉
` 〈s, (b · c)), R)〉
` 〈s, b · c)), R · R))〉
` 〈s, ·c)), ·R))〉
` 〈s, c)), R))〉
` 〈s, )), ))〉
` 〈s, ), )〉
` 〈s, λ, λ〉

Figures 10.5a–f illustrate the state of the machine at several points during the move sequence. At
each point when an R is the top stack symbol and the input tape head is scanning a ( , there are three
choices of productions that might have generated the opening parenthesis, and consequently the au-
tomaton has three choices with which to replace the R on the stack. If the wrong choice is taken, PG will
halt at some future point. For example, if the initial move guessed that the first parenthesis was due to a
concatenation operation, the move sequence would be

〈s, (a ∪ (b · c)), R〉 ` 〈s, a ∪ (b · c)), R · R)〉 ` 〈s, ∪(b · c)), ·R)〉

Since there are no λ-moves and the entry for δG(s, ∪, ·) is empty, this attempt can go no further. A con-
struction such as the one given in Definition 10.6 can be shown to produce the desired automaton for
any context-free grammar in Greibach normal form.

Theorem 10.2 Given any alphabet Σ, CΣ ⊆ PΣ . In particular, for any context-free grammar G, there is a
pushdown automaton that accepts (via empty stack) the language generated by G.

Figure 10.5b: Walkthrough of the pushdown automaton discussed in Example 10.7

Figure 10.5c: Walkthrough of the pushdown automaton discussed in Example 10.7

Figure 10.5d: Walkthrough of the pushdown automaton discussed in Example 10.7

Figure 10.5e: Walkthrough of the pushdown automaton discussed in Example 10.7

Figure 10.5f: Walkthrough of the pushdown automaton discussed in Example 10.7

Proof. Let G′ be any context-free grammar. Theorem 9.6 guarantees that there is a pure Greibach nor-
mal form grammar G = 〈Ω, Σ, S, P〉 for which L(G) = L(G′) − {λ}. If λ ∉ L(G′), the PDA PG from Definition
10.6 can be used directly. If λ ∈ L(G′), then there is a Greibach normal form grammar

G″ = 〈Ω ∪ {Z}, Σ, Z, P ∪ {Z → S, Z → λ}〉,

which generates L(G′), and the state transition function for the corresponding PDA should then include the move δG(s, λ, Z) =
{〈s, S〉, 〈s, λ〉} to reflect the two Z-rules. The bottom-of-the-stack symbol would then be Z, the new start sym-
bol.
In either case, induction on the number of moves in a sequence will show that (∀x ∈ Σ∗)(∀β ∈ (Σ ∪
Ω)∗)(〈s, x, S〉 `* 〈s, λ, β〉 iff S ⇒* xβ as a leftmost derivation). Note that xβ is likely to be a sentential form
that still contains nonterminals. The words x that result in an empty stack (β = λ) will then be exactly
those words that produce an entire string of terminal symbols from the start symbol S (or Z in the case
where the grammar contains the two special Z-rules). In other words, L(G′) = Λ(PG).

Given a context-free grammar, the definition of an equivalent PDA is easy once an appropriate GNF
grammar is in hand. In Example 10.6, the grammar was already in Greibach normal form. To find a PDA
for the grammar in Chapters 8 and 9 that generates regular expressions, the grammar

〈{R}, {a, b, c, (, ), ε, ∅, ∪, ·, ∗}, R, {R → a | b | c | ε | ∅ | (R · R) | (R ∪ R) | R∗}〉

would have to be converted to Greibach normal form. The offending left-recursive production R → R ∗
would have to be replaced, resulting in an extra nonterminal and about three times as many productions.
The definition of the PDA for this grammar would be correspondingly more complex (see the exercises).
Since every context-free language can be represented by a pushdown automaton with only one state,
one might suspect that more complex PDAs with extra states may be able to define languages that are
more complex than those in CΣ . It turns out that extra states yield no more cognitive power; the infor-
mation stored within the finite-state control can effectively be stored on the stack tape. This will follow
from the fact that the converse of Theorem 10.2, that pushdown automata have equivalent context-free
grammars, is also true.
Defining a context-free grammar based on a pushdown automaton is not as elegant as the construc-
tion presented in Definition 10.6, but the idea is to have the leftmost derivations in the grammar corre-
spond to successful move sequences in the PDA.

Definition 10.7 Let P = 〈Σ, Γ, S, s 0, δ, B, ∅〉 be a pushdown automaton. Define the grammar G P = 〈Ω, Σ, Z, P P〉,
where
Ω = {Z } ∪ {A st | A ∈ Γ, s, t ∈ S}

and
P P = {Z → B^{s_0 t} | t ∈ S}
∪ {A^{sq} → a A_1^{r t_1} A_2^{t_1 t_2} A_3^{t_2 t_3} · · · A_{m−1}^{t_{m−2} t_{m−1}} A_m^{t_{m−1} q} | A ∈ Γ, a ∈ Σ ∪ {λ},
〈r, A_1 A_2 · · · A_m〉 ∈ δ(s, a, A), s, q, r, t_1, t_2, . . . , t_{m−1} ∈ S}
∪ {A^{sr} → a | s, r ∈ S, A ∈ Γ, a ∈ Σ ∪ {λ}, 〈r, λ〉 ∈ δ(s, a, A)}

Note that when m = 1, the transition 〈r, A_1〉 ∈ δ(s, a, A) gives rise to a rule of the form A^{sq} → a A_1^{rq} for
each state q ∈ S.
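Definition 10.7 is easier to digest when mechanized. The following Python sketch (encoding ours: a nonterminal A^{st} becomes the tuple (A, s, t), and '' stands for λ) enumerates P P and, applied to P 2 of Example 10.4, reproduces exactly the grammar computed in Example 10.8 below:

    from itertools import product

    def grammar_from_pda(states, delta, s0, B):
        """Enumerate the productions of Definition 10.7 as pairs
        (head, body), where body is a tuple of terminals and encoded
        nonterminals (A, s, t)."""
        prods = {('Z', ((B, s0, t),)) for t in states}
        for (s, a, A), choices in delta.items():
            for r, pushed in choices:
                if pushed == '':             # <r, lambda>: A^{sr} -> a
                    prods.add(((A, s, r), (a,) if a else ()))
                    continue
                m = len(pushed)
                for chain in product(states, repeat=m):  # guess the states
                    us = (r,) + chain        # states between stack symbols
                    body = tuple((pushed[i], us[i], us[i + 1])
                                 for i in range(m))
                    prods.add(((A, s, us[-1]), ((a,) if a else ()) + body))
        return prods

    delta_p2 = {('t', 'a', 'S'): {('t', 'SC'), ('t', 'C')},
                ('t', 'b', 'C'): {('t', '')}}
    for head, body in sorted(grammar_from_pda({'t'}, delta_p2, 't', 'S'),
                             key=str):
        print(head, '->', body)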

Example 10.8
Consider again the pushdown automaton from Example 10.4, defined by P 2 = 〈{a, b}, {S, C}, {t}, t, δ, S, ∅〉,
where δ is given by

δ(t, a, S) = {〈t, SC〉, 〈t, C〉}
δ(t, a, C) = { }
δ(t, b, S) = { }
δ(t, b, C) = {〈t, λ〉}

Since there is but one state and two stack symbols, the nonterminal set for the corresponding grammar G P 2
is Ω = {Z, S tt, C tt}. P P 2 can be calculated as follows: Z → S tt is the only rule arising from the first criterion
for productions. Since δ(t, a, S) contains 〈t, SC〉, a move that produces two stack symbols, m = 2 and the
resulting production is S tt → a S tt C tt. The only other rule due to the second criterion arises because
δ(t, a, S) contains 〈t, C〉, which with m = 1 yields S tt → a C tt. Finally, 〈t, λ〉 ∈ δ(t, b, C) causes C tt → b to
be added to the production set. The resulting grammar is therefore

G P 2 = 〈{Z, S tt, C tt}, {a, b}, Z, {Z → S tt, S tt → a S tt C tt, S tt → a C tt, C tt → b}〉

and G P 2 does indeed generate {aⁿbⁿ | n ≥ 1} and is therefore equivalent to P 2.

Example 10.9
Now consider the pushdown automaton from Example 10.1, defined by

P 1 = 〈{a, b}, {A, B}, {q, r}, q, δ, B, ∅〉,

where the nonempty transitions were

δ(q, a, B) = {〈q, A〉}
δ(q, a, A) = {〈q, AA〉}
δ(q, b, A) = {〈r, λ〉}
δ(r, b, A) = {〈r, λ〉}

Since there are two stack symbols and two choices for each of the state superscripts, the nonterminal set
for the grammar G P 1 is Ω = {Z, B qq, B qr, B rq, B rr, A qq, A qr, A rq, A rr}, although some of these will turn out
to be useless.
P P 1 contains the Z-rules Z → B qq and Z → B qr from the first criterion for productions. The transition
δ(q, a, B) = {〈q, A〉} accounts for the productions B qr → a A qr and B qq → a A qq. δ(q, a, A) = {〈q, AA〉}
gives rise to the A qq-rules A qq → a A qq A qq and A qq → a A qr A rq, as well as the A qr-rules A qr → a A qq A qr
and A qr → a A qr A rr. δ(q, b, A) = {〈r, λ〉} accounts for another A qr-rule, A qr → b. Finally, the transition
δ(r, b, A) = {〈r, λ〉} generates the only A rr-rule, A rr → b.
Note that some of the potential nonterminals (B rq, B rr) are never generated, and others (A qq, B qq, A rq)
cannot produce terminal strings. The resulting grammar, with useless items deleted, is given by

G P 1 = 〈{Z, B qr, A qr, A rr}, {a, b}, Z, {Z → B qr, B qr → a A qr, A qr → a A qr A rr, A qr → b, A rr → b}〉

and G P 1 generates the language P 1 recognizes: {aⁿbⁿ | n ≥ 1}.
Notice that the move sequence
Notice that the move sequence

〈q, aaabbb, B〉 ` 〈q, aabbb, A〉
` 〈q, abbb, AA〉
` 〈q, bbb, AAA〉
` 〈r, bb, AA〉
` 〈r, b, A〉
` 〈r, λ, λ〉

corresponds to the leftmost derivation

Z ⇒ B qr ⇒ a A qr
⇒ aa A qr A rr
⇒ aaa A qr A rr A rr
⇒ aaab A rr A rr
⇒ aaabb A rr
⇒ aaabbb

Note the relationship between the sequence of stack configurations and the nonterminals in the cor-
responding sentential form. For example, when aaa has been processed by P 1, AAA is on the stack, and
when the leftmost derivation has produced aaa, the remaining nonterminals are also three A-based
symbols (A qr A rr A rr). A qr denotes a nonterminal (which corresponds to the stack symbol A) that will
eventually produce a terminal string as the stack shrinks below the current size during a sequence of
transitions that lead from state q to state r. This finally happens in the last of the steps shown, where
aaa A qr A rr A rr ⇒* aaabbb. A rr, by contrast, denotes a nonterminal (again corresponding to the stack
symbol A) that will produce a terminal string as the stack shrinks in size during transitions from state r
back to state r. In this example, this occurs in the last two steps. The initial stack symbol po-
sition held by B is finally vacated during a sequence of transitions from q to r, and hence B qr appears in
the leftmost derivation. On the other hand, it was not possible to vacate B’s position during a sequence
of moves from q to q, so B qq consequently does not participate in significant derivations.
The strong correspondence between profitable move sequences in P and valid leftmost derivations
in G P forms the cornerstone of the following proof.

Theorem 10.3 Given any alphabet Σ, PΣ ⊆ CΣ . In particular, for any pushdown automaton P , there is a
context-free grammar G P for which L(G P ) = Λ(P ).
Proof. Let P = 〈Σ, Γ, S, s 0, δ, B, ∅〉 be a pushdown automaton, and let G P be the grammar given in
Definition 10.7. The key to the proof is to show that all words accepted by empty stack in the PDA P can
be generated by G P and that only such words can be generated by G P . That is, we wish to show that the
automaton halts in some state t with an empty stack after processing the terminal string x exactly when
there is a leftmost derivation of the form
Z ⇒ B^{s_0 t} ⇒* x

That is,

(∀x ∈ Σ∗)(Z ⇒ B^{s_0 t} ⇒* x iff 〈s 0, x, B〉 `* 〈t, λ, λ〉)
The desired conclusion, that L(G P ) = Λ(P ), will follow immediately from this equivalence. The equivalence
does not easily lend itself to proof by induction on the length of x; indeed, to progress from the mth to the
(m + 1)st step, a more general statement involving more of the nonterminals of G P is needed. The following
statement can be proved by induction on the number of moves and leads to the desired conclusion when
s = s 0 and A = B :
(∀x ∈ Σ∗)(∀A ∈ Γ)(∀s ∈ S)(∀t ∈ S)(A^{st} ⇒* x ⇔ 〈s, x, A〉 `* 〈t, λ, λ〉)

The resulting grammar will then generate Λ(P ), but G P may not be a strict context-free grammar; λ-moves
may result in some productions of the form A sr → λ, which will then have to be “removed,” as specified by
Exercise 9.16.

Thus, PΣ = CΣ . Furthermore, only one state in a PDA is truly necessary, as noted in the following
corollary. In essence, this means that for PDAs that accept by empty stack, any state information can be
effectively encoded with the information on the stack.

Corollary 10.1 For every PDA P that accepts via empty stack, there is an equivalent one-state PDA P′ that
also accepts via empty stack.
Proof. Let P be a PDA that accepts via empty stack. Let P′ = PG P . That is, from the original PDA P,
find the corresponding context-free grammar G P . By Theorem 10.3, this is equivalent to P . However, by
Theorem 10.2, the grammar G P has an equivalent one-state PDA, which must also be equivalent to P .

Unlike the pushdown automata discussed in this section, PDAs that accept via final state cannot al-
ways make do with a single state. As the exercises will make clear, at least one final and one nonfinal
state are typically necessary. Unlike DFAs, PDAs with only one state can accept some nontrivial lan-
guages, since selected words can be rejected because there is no appropriate move sequence. However,
a single final state and a single nonfinal state are sufficient for full generality, as shown in the following
section.

10.3 Equivalence of Acceptance by Final State and Empty Stack


In this section, we explore the ramifications of accepting words according to the criterion that a final state
can be reached after processing all the letters on the input tape, rather than the criterion that the stack
is emptied. Theorem 10.4 will show that any language that can be accepted via empty stack can also be
accepted via final state. In terms of Definition 10.5, this means that PΣ ⊆ FΣ . Since PΣ = CΣ , this means
that every context-free language can be accepted by a PDA via final state. Theorem 10.5 ensures that no
“new” languages can be produced by pushdown automata that accept via final state; FΣ ⊆ PΣ , and so
FΣ = P Σ = C Σ .
As in the last section, the key to the correspondence is the definition of an appropriate translation
from one finite representation to another. We first consider a scheme for modifying a PDA so that the
new device can transfer to a final state whenever the old device was capable of emptying its stack. To do
this, we need to place a “buffer”symbol at the bottom of the stack, which will appear when the original
automaton would have emptied its stack. The new machine operates in almost the same fashion as the
original automaton; the differences amount to an additional transition at the start of operation to install
the new buffer symbol and an extra move at the end of operation to transfer to the (new) final state.

Theorem 10.4 Every pushdown automaton P that accepts via empty stack has an equivalent two-state
pushdown automaton Pf that accepts via final state.
Proof. Corollary 10.1 guaranteed that every pushdown automaton that accepts via empty stack has an
equivalent one-state pushdown automaton that also accepts via empty stack. Without loss of generality,
we may therefore assume that P = 〈Σ, Γ, {s}, s, δ, B, ∅〉. Define P f by choosing a new state f and two new
stack symbols Y and Z such that Y, Z ∉ Γ, and let P f = 〈Σ, Γ ∪ {Y, Z}, {s, f}, s, δf, Z, {f}〉, where δf is defined
by:

1. δf(s, λ, Z) = {〈s, BY〉}

2. (∀a ∈ Σ)(∀A ∈ Γ)(δf(s, a, A) = δ(s, a, A))

3. (∀A ∈ Γ)(δf(s, λ, A) = δ(s, λ, A))

4. δf(s, λ, Y) = {〈f, Y〉}

5. (∀a ∈ Σ)(δf(s, a, Z) = { } ∧ δf(s, a, Y) = { })

6. (∀a ∈ Σ ∪ {λ})(∀A ∈ Γ ∪ {Y, Z})(δf(f, a, A) = { })

Notice that rules 2 and 3 imply that, while the original stack symbols appear on the stack, the machine
moves exactly as the original PDA. Rules 5 and 6 indicate that no letters can be consumed while there is a
Y or Z on the stack, and no moves are possible once the final state f is reached. Since the bottom of the
stack symbol is now the new letter Z , rule 1 is the only rule that initially applies. Its application results in
a configuration very much like that of the old PDA, with the symbol Y underneath the old bottom of the
stack symbol B . P f now simulates P until the Y is uncovered (that is, until a point is reached in which the
old PDA would have emptied its stack). In such cases (and only in such cases), rule 4 applies, and control
can be transferred to the final state f , and Pf must then halt.
By inducting on the number of moves in a sequence, it can be shown for any α, β ∈ Γ∗ that

(∀x, y ∈ Σ∗ )(〈s, x y, α〉 `* 〈s, y, β〉 in P ⇔ 〈s, x y, αY 〉 `* 〈s, y, βY 〉 in Pf )

From this, with y = β = λ and α = B , it follows that

(∀x ∈ Σ∗ )(〈s, x, B 〉 `* 〈s, λ, λ〉 in P ⇔ 〈s, x, B Y 〉 `* 〈s, λ, Y 〉 in Pf )


Consequently, since δf (s, λ, Z ) = {〈s, B Y 〉} and δf (s, λ, Y ) = {〈 f , Y 〉},

(∀x ∈ Σ∗ )(〈s, x, B 〉 `* 〈s, λ, λ〉 in P ⇔ 〈s, x, Z 〉 `* 〈 f , λ, Y 〉 in Pf )

which implies that (∀x ∈ Σ∗ )(x ∈ Λ(P ) ⇔ x ∈ L(Pf )), as was to be proved.
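The six rules are entirely mechanical. A sketch of the construction in Python (names ours), assuming the one-state PDA guaranteed by Corollary 10.1 and single-character stack symbols:

    def add_final_state(sigma, gamma, delta, s='s', B='B'):
        """Build delta_f of Theorem 10.4 from a one-state PDA accepting
        via empty stack; 'f', 'Y', and 'Z' are assumed to be fresh."""
        df = {(s, '', 'Z'): {(s, B + 'Y')}}          # rule 1: buffer the stack
        for a in sigma:
            for A in gamma:
                df[(s, a, A)] = set(delta.get((s, a, A), set()))   # rule 2
            df[(s, a, 'Z')] = set()                  # rule 5
            df[(s, a, 'Y')] = set()
        for A in gamma:
            df[(s, '', A)] = set(delta.get((s, '', A), set()))     # rule 3
        df[(s, '', 'Y')] = {('f', 'Y')}              # rule 4: accept
        for a in list(sigma) + ['']:
            for A in list(gamma) + ['Y', 'Z']:
                df[('f', a, A)] = set()              # rule 6: f is a dead end
        return df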

Thus, every language which is Λ(P ) for some PDA can be recognized by a PDA that accepts via final
state, and this PDA need only employ one final and one nonfinal state. Thus, PΣ ⊆ FΣ. One might
conjecture that FΣ might actually be larger than PΣ, since some added capability might arise if more
than two states are used in a pushdown automaton that accepts via final state. This is not the case, as
demonstrated by the following theorem. Once again, the information stored in the finite control can
effectively be transferred to the stack; only one final and one nonfinal state are needed to accept any
context-free language via final state, and context-free languages are the only type accepted via final state.

Theorem 10.5 Every pushdown automaton P that accepts via final state has an equivalent pushdown
automaton P λ that accepts via empty stack.
Proof. Assume that P = 〈Σ, Γ, S, s 0, δ, B, F〉. Define P λ by choosing new stack symbols Y and Z such
that Y, Z ∉ Γ and a new state e such that e ∉ S, and let P λ = 〈Σ, Γ ∪ {Y, Z}, S ∪ {e}, s 0, δλ, Z, ∅〉, where δλ is
defined by:

1. δλ (s 0 , λ, Z ) = {〈s 0 , B Y 〉}

2. (∀a ∈ Σ)(∀A ∈ Γ)(∀s ∈ S)(δλ(s, a, A) = δ(s, a, A))

3. (∀A ∈ Γ)(∀s ∈ S − F )(δλ (s, λ, A) = δ(s, λ, A))

4. (∀A ∈ Γ)(∀ f ∈ F )(δλ ( f , λ, A) = δ( f , λ, A) ∪ {〈e, λ〉})

5. (∀A ∈ Γ)(δλ(e, λ, A) = {〈e, λ〉})

6. δλ (e, λ, Y ) = {〈e, λ〉}

The first rule guards against P λ inappropriately accepting if P simply empties its stack (by padding the
stack with the new stack symbol Y ). The intent of rules 2 through 4 is to arrange for P λ to simulate the
moves of P and allow P λ to enter the state e when final states can be reached. The state e does not allow
any further symbols to be processed, but does allow the stack contents (including the new buffer symbol) to
be emptied via rules 5 and 6. Thus, P λ has a sequence of moves for input x that empties the stack exactly
when P has a sequence of moves that leads to a final state.
By inducting on the number of moves in a sequence, it can be shown for any α, β ∈ Γ∗ that

(∀x, y ∈ Σ∗)(∀s, t ∈ S)(〈s, xy, α〉 `* 〈t, y, β〉 in P ⇔ 〈s, xy, αY〉 `* 〈t, y, βY〉 in P λ)

From this, with y = λ, α = B , and t ∈ F , it follows that

(∀x ∈ Σ∗)(∀t ∈ F)(〈s 0, x, B〉 `* 〈t, λ, β〉 in P ⇔ 〈s 0, x, BY〉 `* 〈t, λ, βY〉 in P λ)

Consequently, since δλ(s 0, λ, Z) = {〈s 0, BY〉} and δλ(t, λ, A) contains 〈e, λ〉 for t ∈ F, repeated application of rules 5
and 6 implies

(∀x ∈ Σ∗)(∀t ∈ F)(〈s 0, x, B〉 `* 〈t, λ, β〉 in P ⇔ 〈s 0, x, Z〉 `* 〈e, λ, λ〉 in P λ)

This shows that (∀x ∈ Σ∗ )(x ∈ Λ(P λ ) ⇔ x ∈ L(P )).
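A companion sketch of δλ, under the same dictionary encoding as the earlier fragments (a convention of ours, not the text's):

    def to_empty_stack(sigma, gamma, states, s0, B, delta, finals):
        """Build delta_lambda of Theorem 10.5; 'e', 'Y', and 'Z' are
        assumed to be fresh, and '' stands for lambda."""
        dl = {(s0, '', 'Z'): {(s0, B + 'Y')}}                    # rule 1
        for s in states:
            for A in gamma:
                for a in sigma:                                  # rule 2
                    dl[(s, a, A)] = set(delta.get((s, a, A), set()))
                lam = set(delta.get((s, '', A), set()))          # rules 3, 4
                dl[(s, '', A)] = (lam | {('e', '')}) if s in finals else lam
        for A in list(gamma) + ['Y']:                            # rules 5, 6
            dl[('e', '', A)] = {('e', '')}
        return dl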

Thus, FΣ ⊆ PΣ and so FΣ = PΣ = CΣ . Acceptance by final state yields exactly the same class of
languages as acceptance by empty stack. This class of languages, described by these cognitive con-
structs, has been encountered before and can be defined by the generative constructs which comprise
the context-free grammars. Note that since the type 3 languages are contained in the type 2 languages,
the portion of Theorem 10.1 dealing with pushdown automata follows immediately from the results in
this and the previous section.

10.4 Closure Properties and Deterministic Pushdown Automata


Since the collection of languages recognized by pushdown automata is exactly the collection of context-
free languages, the results in Chapter 9 show that PΣ is closed under substitution, homomorphism,
union, concatenation, and Kleene closure. Results for context-free languages likewise imply that PΣ
is not closed under complement or intersection.
It is hard to imagine a method that would combine two context-free grammars to produce
a new context-free grammar generating the intersection of the original languages. The constructs
for regular expressions and regular grammars likewise did not lend themselves to such methods, and yet
it was possible to find appropriate constructs that did represent intersections. As presented in Chapter 5,
this was possible by turning to the cognitive representation for this class of languages, the deterministic
finite automata. It is instructive to recall the technique that allowed two DFAs A 1 and A 2 to be combined
to form a new DFA A ∩ that accepts the intersection of the languages accepted by the original devices and
to see why this same method cannot be adapted to pushdown automata.

Figure 10.6: A model of a “pushdown automaton” with two tapes

The automaton A ∩ used a cross product of the states of A 1 and A 2 to simultaneously keep track of
the progress of both DFAs through an appropriate revamping of the state transition function. A ∩ only
accepted strings that would have reached final states in both A 1 and A 2 . Two pushdown automata P 1 and
P 2 might be combined into a new PDA P ∩ using the cross-product approach, but the transition function
for this composite PDA cannot be reliably defined. A problem arises since the δ function depends on the
top stack symbol, and it is impossible to keep track of both the original stacks through any type of stack
encoding, since the stack size of P 1 might be increasing while the stack size of P 2 is decreasing. A device
like the one depicted in Figure 10.6 could be capable of recognizing the intersection of two context-free
languages, but such a machine is inherently more powerful than PDAs. The language {a a n b n c n | n ≥ 0} is
not context free, yet a two-tape automata could recognize this set of words by storing the initial a s on the
first stack tape, match them against the incoming b s while storing those b s on the second tape, and then
matching the c s against the b s on the second tape (see the exercises).
If one were to attempt to intersect a context-free language with a regular language, one would expect
the result to be context free, since the corresponding cross-product construct would need only one tape.
This is indeed the case, as shown by the following theorem.

Theorem 10.6 CΣ is closed under intersection with a regular set. That is, if L 1 is context free and R 2 is
regular, L 1 ∩ R 2 is always context free.
Proof. Let L 1 be a context-free language and let R 2 be a regular set. Since CΣ = FΣ, there must be a
PDA P 1 = 〈Σ, Γ1, S 1, s 01, δ1, B 1, F 1〉 for which L 1 = L(P 1). Let A 2 = 〈Σ, S 2, s 02, δ2, F 2〉 be a DFA for which
R 2 = L(A 2). Define
P ∩ = 〈Σ, Γ1 , S 1 × S 2 , 〈s 01 , s 02 〉, δ∩ , B 1 , F 1 × F 2 〉,

where δ∩ is defined by:

1. (∀s 1 ∈ S 1)(∀s 2 ∈ S 2)(∀a ∈ Σ)(∀A ∈ Γ1)
(δ∩(〈s 1, s 2〉, a, A) = {〈〈t 1, t 2〉, β〉 | 〈t 1, β〉 ∈ δ1(s 1, a, A) ∧ t 2 = δ2(s 2, a)})

2. (∀s 1 ∈ S 1 )(∀s 2 ∈ S 2 )(∀A ∈ Γ1 )
(δ∩(〈s 1, s 2〉, λ, A) = {〈〈t 1, t 2〉, β〉 | 〈t 1, β〉 ∈ δ1(s 1, λ, A) ∧ t 2 = s 2}).

As with the constructions in the previous sections, induction on the number of moves exhibits the
desired correspondence between the behaviors of the machines. In particular, it can be shown for any
α, β ∈ Γ∗ that

(∀x, y ∈ Σ∗)(∀s 1, t 1 ∈ S 1)(∀s 2, t 2 ∈ S 2)(〈〈s 1, s 2〉, xy, α〉 `* 〈〈t 1, t 2〉, y, β〉 in P ∩ ⇔
((〈s 1, xy, α〉 `* 〈t 1, y, β〉 in P 1) ∧ (t 2 = δ2(s 2, x))))

From this, with y = λ, α = B 1, s 1 = s 01, s 2 = s 02, and the observation that 〈t 1, t 2〉 ∈ F 1 × F 2 iff t 1 ∈ F 1 ∧ t 2 ∈ F 2, it
follows that (∀x ∈ Σ∗)(x ∈ L(P ∩) ⇔ (x ∈ L(P 1) ∧ x ∈ L(A 2))). Therefore,

L(P ∩ ) = L(P 1 ) ∩ L(A 2 ) = L 1 ∩ R 2 .

Since P ∩ is a PDA accepting L 1 ∩ R 2 , L 1 ∩ R 2 must be context free.
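The cross-product transition function δ∩ can likewise be generated mechanically; a Python sketch under our usual dictionary encoding:

    def intersect_pda_dfa(pda_delta, dfa_delta, states2):
        """Build delta of the composite machine of Theorem 10.6.
        pda_delta maps (s1, a_or_'', A) to a set of (t1, pushed) pairs;
        dfa_delta maps (s2, a) to a state.  Composite states are pairs
        (s1, s2)."""
        d = {}
        for (s1, a, A), choices in pda_delta.items():
            for s2 in states2:
                if a == '':
                    # rule 2: a lambda-move leaves the DFA state unchanged
                    d.setdefault(((s1, s2), '', A), set()).update(
                        ((t1, s2), beta) for t1, beta in choices)
                else:
                    # rule 1: both machines advance on the letter a
                    d.setdefault(((s1, s2), a, A), set()).update(
                        ((t1, dfa_delta[(s2, a)]), beta)
                        for t1, beta in choices)
        return d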

Closure properties such as this are quite useful in showing that certain languages are not context free.
Consider the set L = {x ∈ {a, b, c}∗ | |x|a = |x|b = |x|c}. Since the letters in a word can occur in any order,
a pumping theorem proof is less straightforward than for the set {aⁿbⁿcⁿ | n ≥ 0}. However, if L were
context free, then L ∩ a∗b∗c∗ would also be context free (why?). But L ∩ a∗b∗c∗ = {aⁿbⁿcⁿ | n ≥ 0}, and
context free, then L ∩ a ∗b ∗c ∗ would also be context free (why?). But L ∩ a ∗b ∗c ∗ = {a a n b n c n | n ≥ 0}, and
thus L cannot be context free. The exercises suggest other occasions for which closure properties are
useful in showing certain languages are not context free.
For the machines discussed in the first portion of this text, it was seen that nondeterminism did not
add to the computing power of DFAs. This is not the case for pushdown automata. There are
languages that can be accepted by nondeterministic pushdown automata that cannot be accepted by
any deterministic pushdown automaton. The following is the broadest definition of what can constitute
a deterministic pushdown automaton.

Definition 10.8 A deterministic pushdown automaton (DPDA) is a pushdown automaton
P = 〈Σ, Γ, S, s 0, δ, B, F〉 with the following restrictions on the state transition function δ:

1. (∀a ∈ Σ)(∀A ∈ Γ)(∀s ∈ S)(δ(s, a, A) is empty or contains just one element).

2. (∀A ∈ Γ)(∀s ∈ S)(δ(s, λ, A) is empty or contains just one element).

3. (∀A ∈ Γ)(∀s ∈ S)(δ(s, λ, A) ≠ ∅ ⇒ (∀a ∈ Σ)(δ(s, a, A) = ∅)).

Rule 1 states that, for a given input letter, deterministic pushdown automata cannot have two differ-
ent choices of destination states or two different choices of strings to place on the stack. Rule 2 ensures
that there is no choice of λ-moves either. Furthermore, rule 3 guarantees that there will never be a choice
between a λ-move and a transition that consumes a letter; if a state has a λ-move for a given stack symbol,
that is the only move allowed from that state for that symbol. Thus, for any string, there is
never any more than one path through the machine. Unlike deterministic finite automata, determinis-
tic pushdown automata may not always completely process the strings in Σ∗ ; a given string may reach
a state that has no further valid moves, or a string may prematurely empty the stack. In each case, the
DPDA would halt without processing any further input (and reject the string).
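The three restrictions are easy to check mechanically, as in the following Python sketch (the helper name is ours):

    def is_deterministic(delta, states, sigma, gamma):
        """Check the three conditions of Definition 10.8; delta maps
        (s, a_or_'', A) to a set of (state, pushed) pairs, '' = lambda."""
        for s in states:
            for A in gamma:
                lam = delta.get((s, '', A), set())
                if len(lam) > 1:                     # condition 2
                    return False
                for a in sigma:
                    move = delta.get((s, a, A), set())
                    if len(move) > 1:                # condition 1
                        return False
                    if lam and move:                 # condition 3
                        return False
        return True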

Example 10.10

The automaton P 1 in Example 10.1 was deterministic. The PDAs in Examples 10.4 and 10.5 were not de-
terministic. The automaton PG derived in Example 10.7 was not deterministic because there were three
possible choices of moves listed for δG(s, (, R): {〈s, R · R)〉, 〈s, R ∪ R)〉, 〈s, R)∗〉}. These choices corresponded
to the three different operators that might have generated the open parenthesis.

Pushdown automata provide an appropriate mechanism for parsing sentences in programming lan-
guages. The regular expression grammar in Example 10.7 is quite similar to the arithmetic expression
grammar that describes expressions in many programming languages. Indeed, the transitions taken
within the corresponding PDA give an indication of which productions in the underlying grammar were
used; such information is of obvious use in compiler construction. A nondeterministic pushdown au-
tomaton is at best a very inefficient tool for parsing; a DPDA is much better suited to the task.

As mentioned in the proof of Theorem 10.2, each leftmost derivation in G has a corresponding se-
quence of moves in PG . If G is ambiguous, then there is at least one word with two distinct leftmost
derivations, and hence if that word appeared on the input tape of PG , there would be two distinct move
sequences leading to acceptance. In this case, PG cannot possibly be deterministic. On the other hand,
if PG is nondeterministic, this does not mean that G is ambiguous, as demonstrated by Example 10.7.
In parsing a string in that automaton, it may not be immediately obvious which production to use (and
hence which transition to take), but for any string, there is at most only one correct choice; each word has
a unique parse tree and a unique leftmost derivation. The grammar in Example 10.7 is not ambiguous,
even though the corresponding PDA was nondeterministic.

Example 10.11

The following Greibach normal form grammar is similar to the one used to construct the PDA in Example
10.7, but with the different operators paired with unique delimiters. Let

G = 〈{R}, {a, b, c, (, ), {, }, [, ], ε, ∅, ∪, ·, ∗}, R, {R → a | b | c | ε | ∅ | (R · R) | [R ∪ R] | {R}∗}〉.

The automaton PG is then

PG = 〈{a, b, c, (, ), {, }, [, ], ε, ∅, ∪, ·, ∗}, {R, a, b, c, (, ), {, }, [, ], ε, ∅, ∪, ·, ∗}, {s}, s, δG, R, ∅〉

where δG is comprised of the following nonempty transitions:

δG(s, (, R) = {〈s, R · R)〉}
δG(s, [, R) = {〈s, R ∪ R]〉}
δG(s, {, R) = {〈s, R}∗〉}
δG(s, a, R) = {〈s, λ〉}
δG(s, b, R) = {〈s, λ〉}
δG(s, c, R) = {〈s, λ〉}
δG(s, ε, R) = {〈s, λ〉}
δG(s, ∅, R) = {〈s, λ〉}
δG(s, a, a) = {〈s, λ〉}
δG(s, b, b) = {〈s, λ〉}
δG(s, c, c) = {〈s, λ〉}
δG(s, ∅, ∅) = {〈s, λ〉}
δG(s, ε, ε) = {〈s, λ〉}
δG(s, ∪, ∪) = {〈s, λ〉}
δG(s, ·, ·) = {〈s, λ〉}
δG(s, ∗, ∗) = {〈s, λ〉}
δG(s, ), )) = {〈s, λ〉}
δG(s, ], ]) = {〈s, λ〉}
δG(s, }, }) = {〈s, λ〉}
δG(s, (, () = {〈s, λ〉}
δG(s, [, [) = {〈s, λ〉}
δG(s, {, {) = {〈s, λ〉}

All other transitions are empty; that is, they are of the form δG(s, d, A) = { }. The resulting PDA is
clearly deterministic, since there are no λ-moves and the other transitions are all singleton sets or are
empty. It is instructive to step through the transitions in PG for a string such as [{(a · b)}∗ ∪ c]. Upon en-
countering a delimiter while scanning a prospective string, the parser would immediately know which
operation gave rise to that delimiter, and need not “guess” at which of the three productions might have
been applied. Note that G was an LL0 grammar (as defined in Section 9.2), and the properties of G re-
sulted in PG being a deterministic device. An efficient parser for this language follows immediately from
the specification of the grammar, whereas the grammar in Example 10.7 gave rise to a nondeterministic
device.
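Since PG has a single state and no λ-moves, stepping through it requires nothing more than a loop over the input. The compact Python sketch below is our own encoding (the Kleene star is written as the ASCII *, and a pushed value of '' means pop); it traces the deterministic parse of [{(a·b)}∗ ∪ c]:

    def trace_dpda(delta, word, start_symbol):
        """Run a one-state DPDA with no lambda-moves, printing each
        configuration; delta maps (letter, top_of_stack) to the string
        pushed in its place, and an absent entry means halt and reject."""
        stack = start_symbol
        print(word, '|', stack)
        for i, a in enumerate(word):
            if not stack or (a, stack[0]) not in delta:
                return False
            stack = delta[(a, stack[0])] + stack[1:]
            print(word[i + 1:], '|', stack)
        return stack == ''               # accept iff everything matched

    # The useful transitions of Example 10.11:
    d = {('(', 'R'): 'R·R)', ('[', 'R'): 'R∪R]', ('{', 'R'): 'R}*'}
    for x in 'abc∅ε':                    # R -> a | b | c | epsilon | emptyset
        d[(x, 'R')] = ''
    for x in '·∪*)]}':                   # matching symbols are popped
        d[(x, x)] = ''
    print(trace_dpda(d, '[{(a·b)}*∪c]', 'R'))    # prints True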
Programmers would not be inclined to tolerate remembering which delimiters should be used in
conjunction with the various operators, and hence programming language designers take a slightly dif-
ferent approach to the problem. The nondeterminism in Example 10.7 may only be an effect of the
particular grammar chosen and not inherent in the language itself. Note that the language {aⁿbⁿ | n ≥ 1}
had a grammar that produced a nondeterministic PDA (Example 10.4), but it also had a grammar that
corresponded to a DPDA (Example 10.1). In compiler construction, designers lean toward syntax that is
compatible with determinism, and they seek grammars for the language that reflect that determinism.

Example 10.12

Consider again the language discussed in Example 10.7, which can also be expressed by the following
grammar
H = 〈{S, T}, {a, b, c, (, ), ε, ∅, ∪, ·, ∗}, S, {S → (ST | a | b | c | ε | ∅, T → ·S) | ∪S) | )∗}〉

The automaton P H is then

P H = 〈{a, b, c, (, ), ε, ∅, ∪, ·, ∗}, {S, T, a, b, c, (, ), ε, ∅, ∪, ·, ∗}, {t}, t, δH, S, ∅〉

where each production of H gives rise to the following transitions in δH :

δH(t, (, S) = {〈t, ST〉}
δH(t, a, S) = {〈t, λ〉}
δH(t, b, S) = {〈t, λ〉}
δH(t, c, S) = {〈t, λ〉}
δH(t, ε, S) = {〈t, λ〉}
δH(t, ∅, S) = {〈t, λ〉}
δH(t, ·, T) = {〈t, S)〉}
δH(t, ∪, T) = {〈t, S)〉}
δH(t, ), T) = {〈t, ∗〉}

While the formal definition of δH specifies several transitions of the form δH(t, d, d) = {〈t, λ〉}, by ob-
serving what can be put on the stack by the above productions, it is clear that the only remaining useful
transitions in δH are

δH(t, ∗, ∗) = {〈t, λ〉}

and

δH(t, ), )) = {〈t, λ〉}

Thus, even though the PDA PG in Example 10.7 turned out to be nondeterministic, this was not a flaw
in the language itself, since P H is an equivalent DPDA. Notice that the grammar G certainly appears to
be more straightforward than H . G had fewer nonterminals and fewer productions, and it is a bit harder
to understand the relationships between the nonterminals of H . Nevertheless, the LL0 grammar H led
to an efficient parser and G did not.
To take advantage of the resulting reduction in complexity, all major programming languages are
designed to be recognized by DPDAs. These constructs naturally lead to a mechanical framework for
syntactic analysis. In Example 10.12, the application of the production T → ∪S) [that is, the use of the
transition δH(t, ∪, T) = {〈t, S)〉}] signifies that the previous expression and the expression to which S will
expand are to be combined with the union operator. It should be easy to see that a similar grammar and
expand are to be combined with the union operator. It should be easy to see that a similar grammar and
DPDA for arithmetic expressions (using +, −,∗ , and / rather than ∪, · and ∗ ) would provide a guide for
converting such expressions into their equivalent machine code.
Deterministic pushdown automata have some surprising properties. Recall that CΣ was not closed
under complementation, and since PΣ = CΣ , there must be some PDAs that define languages whose
complement cannot be recognized by any PDA. However, it can be shown that any language accepted
by a DPDA must have a complement that can also be recognized by a DPDA. The construction used to
prove this statement, in which final and nonfinal states are interchanged in a DPDA that accepts via fi-
nal state, is similar to the approach used in Theorem 5.1 for deterministic finite automata. It is useful
to recall why it was crucial in the proof of Theorem 5.1 to begin with a DFA when interchanging states,
rather than using an NDFA. Strings that have multiple paths in an NDFA that lead to both final and non-
final states would be accepted in the original automaton and also in the machine with the states inter-
changed. Furthermore, some strings may have no complete paths through the NDFA and be rejected in
both the original and new automata. The problem of multiple paths does not arise with DPDAs, since by
definition no choice of moves is allowed. However, strings that do not get completely consumed would

be rejected in both the original DPDA and the DPDA with final and nonfinal states interchanged. Thus,
the proof of closure under complement for DPDAs is not as straightforward as for DFAs. There are three
ways an input string might not be completely consumed: the stack might empty prematurely, there may
be no transition available at some point, or there might only be a cycle of λ-moves available that con-
sumes no further input. The exercises indicate that it is possible to avoid these problems by padding
the stack with a new bottom-of-the-stack symbol, and adding a “garbage state” to which strings that are
hopelessly stuck would transfer.

Theorem 10.7 If L is a language recognized by a deterministic pushdown automaton, then ∼L can also be
recognized by a DPDA.
Proof. See the exercises.

Definition 10.9 Given any alphabet Σ let AΣ represent the collection of all languages recognized by de-
terministic pushdown automata. If L ∈ AΣ , then L is said to be a deterministic context-free language
(DCFL).

Theorem 10.7 shows that unlike PΣ , AΣ is closed under complementation. This divergent behavior
has some immediate consequences, as stated below.

Theorem 10.8 Let Σ be an alphabet.

If |Σ| = 1, then DΣ = AΣ = PΣ .
If |Σ| > 1, then DΣ is properly contained in AΣ , which is properly contained in PΣ .

Proof. For every alphabet Σ, examining the proof of Theorem 10.1 shows that every finite automaton
has an equivalent deterministic pushdown automaton, and thus it is always true that DΣ ⊆ AΣ. Definition
10.8 implies that AΣ ⊆ PΣ. If |Σ| = 1, then Theorem 9.15 showed that DΣ = CΣ (= PΣ), from which it follows
that DΣ = AΣ = PΣ. If |Σ| > 1, an example such as {aⁿbⁿ | n ≥ 1} shows that DΣ is properly contained in AΣ
(see the exercises). Since P{a,b} and A{a,b} have different closure properties, they cannot represent the same
collection, and AΣ ⊆ PΣ implies that the containment must be proper.

In the proof of Theorem 10.6, it is easy to see that if P 1 is deterministic then P ∩ will be a DPDA, also.
Hence AΣ , like PΣ , is closed under intersection with a regular set. Also, the exercises show that both
AΣ and PΣ are closed under difference with a regular set. However, the closure properties of AΣ and
PΣ disagree in just about every other case. The languages

L 1 = {aⁿbᵐ | (n ≥ 1) ∧ (n = m)}   and   L 2 = {aⁿbᵐ | (n ≥ 1) ∧ (n = 2m)}

are both DCFLs, and yet L 1 ∪ L 2 = {aⁿbᵐ | (n ≥ 1) ∧ (n = m ∨ n = 2m)} is not a DCFL (see the exercises).
Thus, unlike PΣ, AΣ is not closed under union if Σ is comprised of at least two symbols (recall that since
D{a} = A{a} = P{a}, A{a} would be closed under union). If AΣ were closed under intersection, then AΣ
would (by De Morgan’s law) be closed under union, since it is closed under complement. Hence, AΣ
cannot be closed under intersection.
The language {cⁿbᵐ | (n ≥ 1) ∧ (n = m)} ∪ {aⁿbᵐ | (n ≥ 1) ∧ (n = 2m)} is definitely a DCFL, and yet a
simple homomorphism can transform it into

{aⁿbᵐ | (n ≥ 1) ∧ ((n = m) ∨ (n = 2m))}

(see the exercises). Thus, AΣ is not closed under homomorphism. Since homomorphisms are special
cases of substitutions, AΣ is not closed under substitution either. AΣ is also the only collection of lan-
guages discussed in this text that is not closed under reversal; {caⁿbᵐ | (n ≥ 1) ∧ (n = m)} ∪ {aⁿbᵐ | (n ≥
1) ∧ (n = 2m)} is a DCFL, but {bᵐaⁿc | (n ≥ 1) ∧ (n = m)} ∪ {bᵐaⁿ | (n ≥ 1) ∧ (n = 2m)} is not. These
properties are summed up in the following statements.
properties are summed up in the following statements.

Theorem 10.9 Given any alphabet Σ, AΣ is closed under complement. AΣ is also closed under union,
intersection, and difference with a regular set. That is, if L 1 is a DCFL and R 2 is a FAD language, then the
following are deterministic, context-free languages:

∼L 1
L 1 ∩ R2
L 1 ∪ R2
L 1 − R2
R2 − L 1

Proof. The proof follows from the above discussion and theorems and the exercises.

Lemma 10.1 Let Σ be an alphabet comprised of at least two symbols. Then AΣ is not closed under union,
intersection, concatenation, Kleene closure, homomorphism, substitution, or reversal. That is, there are
examples of deterministic context-free languages L 1 and L 2 , a homomorphism h, and a substitution s for
which the following are not DCFLs:

L1 ∪ L2
L1 ∩ L2
L1 · L2
L1∗
h(L 1 )
s(L 1 )
L1ʳ

Proof. The proof follows from the above discussion and theorems and the exercises.

Example 10.13
These closure properties can often be used to justify that certain languages are not DCFLs. For example,
the language
L = {x ∈ {a, b, c}∗ | |x|a = |x|b} ∪ {x ∈ {a, b, c}∗ | |x|b = |x|c}

can be recognized by a PDA but not by a DPDA. If L were a DCFL, then ∼L = {x ∈ {a, b, c}∗ | |x|a ≠ |x|b} ∩
{x ∈ {a, b, c}∗ | |x|b ≠ |x|c} would also be a DCFL. However, ∼L ∩ a∗b∗c∗ = {aᵏbⁿcᵐ | (k ≠ n) ∧ (n ≠ m)},
which should also be a DCFL. Ogden’s lemma shows that this is not even a CFL (see the exercises), and
hence the original hypothesis that L was a DCFL must be false. The interested reader is referred to similar
discussions in [HOPC] and [DENN].
The restriction that the head scanning the stack tape could only access the symbol at the top of the
stack imposed limitations on the cognitive power of this class of automata. While the current contents
of the top of the stack could be stored in the finite-state control and be remembered after the stack
was popped, only a finite number of such pops can be recorded within the states of the PDA. At some
point, seeking information further down on the stack will cause an irretrievable loss of information. One

might suspect that if popped items were not erased (so that they could be revisited and reviewed at some
later point) a wider class of languages might be recognizable. Generalized automata that allow such
nondestructive “backtracking” are called Turing machines and form a significantly more powerful class
of automata. These devices and their derivatives are the subject of the next chapter.

Exercises
10.1. Refer to Theorem 10.1 and use induction to show

(∀x ∈ Σ∗ )(δ(s 0 , x) = t ⇔ 〈s 0 , x, ¢〉 `* 〈t , λ, ¢〉)

10.2. Define a deterministic pushdown automaton P 1′ with only one state for which Λ(P 1′) = {aⁿbⁿ | n ≥ 1}.

10.3. Consider the pushdown automaton defined by P 2′ = 〈{a, b}, {S, C}, {t}, t, δ, S, {t}〉, where δ is defined
by

δ(t, a, S) = {〈t, SC〉, 〈t, C〉}
δ(t, a, C) = { }
δ(t, b, S) = { }
δ(t, b, C) = {〈t, λ〉}

(a) Give an inductive proof that

(∀i ∈ N)(〈t, aⁱ, S〉 `* 〈t, λ, α〉 ⇒ (α = SCⁱ ∨ α = Cⁱ))

(b) Give an inductive proof that

(〈t, x, C〉 `* 〈t, λ, β〉 ⇒ (x = bⁱ for some i ∈ N))

(c) Find L(P 2′); use parts (a) and (b) to rigorously justify your statements.

10.4. Let L = {aⁱbʲcᵏ | i, j, k ∈ N and i + j = k}.

(a) Find a pushdown automaton (which accepts via final state) that recognizes L.
(b) Find a pushdown automaton (which accepts via empty stack) that recognizes L.
(c) Is there a counting automaton that accepts L?
(d) Is there a DPDA that accepts L?
(e) Use Definition 10.7 to find a grammar equivalent to the PDA in part (a).

10.5. Let L = {x ∈ {a, b, c}∗ | |x|a + |x|b = |x|c}.

(a) Find a pushdown automaton (which accepts via final state) that recognizes L.
(b) Find a pushdown automaton (which accepts via empty stack) that recognizes L.
(c) Is there a counting automaton that accepts L?
(d) Is there a DPDA that accepts L?

(e) Use Definition 10.7 to find a grammar equivalent to the PDA in part (a).

10.6. Prove or disprove that:

(a) PΣ is closed under inverse homomorphism.


(b) AΣ is closed under inverse homomorphism.

10.7. Give an example of a finite language that cannot be recognized by any one-state PDA that accepts
via final state.

10.8. Let L = {aⁿbⁿcᵐdᵐ | n, m ∈ N}.

(a) Find a pushdown automaton (which accepts via final state) that recognizes L.
(b) Find a pushdown automaton (which accepts via empty stack) that recognizes L.
(c) Is there a DPDA that accepts L?
(d) Is there a counting automaton that accepts L?
(e) Use Definition 10.7 to find a grammar equivalent to the PDA in part (b).

10.9. Refer to Theorem 10.2 and use induction on the number of moves in a sequence to show that

(∀x ∈ Σ∗)(∀β ∈ (Σ ∪ Ω)∗)(〈s, x, S〉 `* 〈s, λ, β〉 iff S ⇒* xβ as a leftmost derivation)

10.10. Consider the grammar

〈{R}, {a, b, c, (, ), ε, ∅, ∪, ·, ∗}, R, {R → a | b | c | ε | ∅ | (R · R) | (R ∪ R) | R∗}〉

(a) Convert this grammar to Greibach normal form, adding the new nonterminal Y .
(b) Use Definition 10.6 on part (a) to find the corresponding PDA.
(c) Use the construct suggested by Theorem 10.4 in part (b) to find the corresponding PDA that
accepts via final state.

10.11. Let L = {aⁱbʲcʲdⁱ | i, j ∈ N}.

(a) Find a pushdown automaton (which accepts via final state) that recognizes L.
(b) Find a pushdown automaton (which accepts via empty stack) that recognizes L.
(c) Is there a DPDA that accepts L?
(d) Is there a counting automaton that accepts L?
(e) Use Definition 10.7 to find a grammar equivalent to the PDA in part (b).

10.12. Consider the PDA P 3 in Example 10.5. Use Definition 10.7 to find G P 3 .

10.13. Refer to Theorem 10.3 and use induction to show

(∀x ∈ Σ∗)(∀A ∈ Γ)(∀s ∈ S)(∀t ∈ S)(A^{st} ⇒* x iff 〈s, x, A〉 `* 〈t, λ, λ〉)

10.14. Let L = {aⁿbⁿcᵐdᵐ | n, m ∈ N} ∪ {aⁱbʲcʲdⁱ | i, j ∈ N}.

(a) Find a pushdown automaton (which accepts via final state) that recognizes L.

(b) Find a pushdown automaton (which accepts via empty stack) that recognizes L.
(c) Is there a DPDA that accepts L?
(d) Is there a counting automaton that accepts L?
(e) Use Definition 10.7 to find a grammar equivalent to the PDA in part (b).

10.15. Consider the PDA PG in Example 10.6. Use Definition 10.7 to find G PG .

10.16. Refer to Theorem 10.4 and use induction to show

(∀α, β ∈ Γ∗ )(∀x, y ∈ Σ∗ )(〈s, x y, α〉 `* 〈s, y, β〉 in P iff 〈s, x y, αY 〉 `* 〈s, y, βY 〉 in Pf )

10.17. Refer to Theorem 10.5 and use induction to show

(∀α, β ∈ Γ∗)(∀x, y ∈ Σ∗)(∀s, t ∈ S)(〈s, xy, α〉 `* 〈t, y, β〉 in P iff 〈s, xy, αY〉 `* 〈t, y, βY〉 in P λ)

10.18. Prove that {x ∈ {a, b, c}∗ | |x|a = |x|b ∧ |x|b ≥ |x|c} is not context free. (Hint: Use closure properties.)

10.19. (a) Give an appropriate definition for the state transition function of the two-tape automaton
pictured in Figure 10.6, stating the new domain and range.
(b) Define a two-tape automaton that accepts {aⁿbⁿcⁿ | n ≥ 1} via final state.

10.20. (a) Prove that {aⁿbⁿcⁿ | n ≥ 1} is not context free.

(b) Prove that {x ∈ {a, b, c}∗ | |x|a = |x|b = |x|c} is not context free. [Hint: Use closure properties and
apply part (a).]

10.21. (a) Find a DPDA that accepts

{cⁿbᵐ | (n ≥ 1) ∧ (n = m)} ∪ {aⁿbᵐ | (n ≥ 1) ∧ (n = 2m)}

(b) Define a homomorphism that transforms part (a) into a language that is not a DCFL.

10.22. Use Ogden’s lemma to show that {aᵏbⁿcᵐ | (k ≠ n) ∧ (n ≠ m)} is not a context-free language.

10.23. Refer to Theorem 10.6 and use induction to show

(∀α, β ∈ Γ∗)(∀x, y ∈ Σ∗)(∀s 1, t 1 ∈ S 1)(∀s 2, t 2 ∈ S 2)
(〈〈s 1, s 2〉, xy, α〉 `* 〈〈t 1, t 2〉, y, β〉 in P ∩ ⇔ ((〈s 1, xy, α〉 `* 〈t 1, y, β〉 in P 1) ∧ (t 2 = δ2(s 2, x))))

10.24. Assume that P is a DPDA. Prove that there is an equivalent DPDA P′ (which accepts via final state)
for which:

(a) P′ always has a move for all combinations of states, input symbols, and stack symbols.
(b) P′ never empties its stack.
(c) For each input string presented to P′, P′ always scans the entire input string.

10.25. Assume the results of Exercise 10.24, and show that AΣ is closed under complementation. (Hint:
Exercise 10.24 almost allows the trick of switching final and nonfinal states to work; the main re-
maining problem involves handling the case where a series of λ-moves may cycle through both
final and nonfinal states.)

10.26. Give an example that shows that AΣ is not closed under concatenation.

10.27. Give an example that shows that AΣ is not closed under Kleene closure.

10.28. Show that {caⁿbᵐ | (n ≥ 1) ∧ (n = m)} ∪ {aⁿbᵐ | (n ≥ 1) ∧ (n = 2m)} is a DCFL.

10.29. (a) Modify the proof of Theorem 10.6 to show that if L 1 is context free and R 2 is regular, L 1 − R 2 is
always context free.
(b) Prove the result in part (a) by instead appealing to closure properties for complement and
intersection.

10.30. (a) Modify the proof of Theorem 10.6 to show that if L 1 is context free and R 2 is regular, L 1 ∪ R 2 is
always context free.
(b) Prove the result in part (a) by instead appealing to closure properties for complement and
intersection.

10.31. Argue that if L 1 is a DCFL and R 2 is regular, R 2 − L 1 is always a DCFL.

10.32. (a) Prove that {w2wʳ | w ∈ {0, 1}∗} is a DCFL.
(b) Prove that {wwʳ | w ∈ {0, 1}∗} is not a DCFL.

10.33. Give examples to show that even if L 1 and L 2 are DCFLs:

(a) L 1 · L 2 need not be a DCFL.


(b) L 1 − L 2 need not be a DCFL.
(c) L 1∗ need not be a DCFL.
(d) L 1ʳ need not be a DCFL.

10.34. Consider the quotient operator / given by Definition 5.10. Prove or disprove that:

(a) PΣ is closed under quotient.


(b) AΣ is closed under quotient.

10.35. Consider the operator b defined in Theorem 5.11. Prove or disprove that:

(a) PΣ is closed under the operator b.


(b) AΣ is closed under the operator b.

10.36. Consider the operator Y defined in Theorem 5.7. Prove or disprove that:

(a) PΣ is closed under the operator Y .


(b) AΣ is closed under the operator Y .

10.37. Consider the operator P given in Exercise 5.16. Prove or disprove that:

(a) PΣ is closed under the operator P .


(b) AΣ is closed under the operator P .

10.38. Consider the operator F given in Exercise 5.19. Prove or disprove that:

(a) PΣ is closed under the operator F .


(b) AΣ is closed under the operator F .

Chapter 11

Turing Machines

In the preceding chapters, we have seen that DFAs and NDFAs represented the type 3 languages and
pushdown automata represented the type 2 languages. In this chapter we will explore the machine ana-
log to the type 1 and type 0 grammars. These devices, called Turing machines, are the most powerful
automata known and can recognize every language considered so far in this text. We will also encounter
languages that are too complex to be recognized by any Turing machine. Indeed, we will see that any
other such (finite) scheme for the representation of languages is likewise forced to be unable to represent
all possible languages over a given alphabet. Turing machines provide a gateway to undecidability, dis-
cussed in the next chapter, and to the general theory of computational complexity, which is rich enough
to warrant much broader treatment than would be possible here.

11.1 Definitions and Examples


Pushdown automata turned out to be the appropriate cognitive devices for the type 2 languages, but
further enhancements in the capabilities of the automaton model are necessary to achieve the generality
inherent in type 0 and type 1 languages. A (seemingly) minor modification will be all that is required.
Turing machines are comprised of the familiar components that have already been used in previous
classes of automata. As with the earlier constructions, the heart of the device is a finite-state control,
which reacts to information scanned by the tape head(s). As with finite-state transducers and pushdown
automata, information can be written to the tape as transitions between states are made. Unlike FSTs and
PDAs, Turing machines have only one tape with which to work, which serves both the input and the
output needs of the device. Note that with finite-state transducers the presence of a second tape was
purely for convenience; a single tape, with input symbols overwritten by the appropriate output symbol
as the read head progressed, would have sufficed. Whereas a pushdown automaton could write an entire
string of symbols to the stack, a Turing machine is constrained to print a single letter at a time. These
new devices would therefore be of less value than PDAs were they not given some other capability. In all
previous classes of automata, the read head was forced to move one space to the right on each transition
(or, in the case of λ-moves, remain stationary). On each transition, the Turing machine tape head has
the option of staying put, moving right, or moving left. The ability to move back to the left and review
previously written information accounts for the added power of Turing machines.
It is possible to view a Turing machine as a powerful transducer of computable functions, with an
associated function defined much like those for FSTs. That is, as with finite-state transducers, each word
that could be placed on an otherwise blank tape is associated with the word formed by allowing the

Turing machine to operate on that word. With FSTs, this function was well defined; the machine would
process each letter of the word in a unique way, the read head would eventually find the end of the word
(that is, it would scan a blank), and the device would then halt. With Turing machines, there is no built-in
guarantee that it will always halt; since the tape head can move both right and left, it is possible to define
a Turing machine that would reverberate back and forth between two adjacent spaces indefinitely. A
Turing machine is also not constrained to halt when it scans a blank symbol; it may overwrite the blank
and/or continue moving right indefinitely.
Rather than viewing a Turing machine as a transducer, we will primarily be concerned with employ-
ing it as an acceptor of words placed on the tape. Some variants of Turing machines are defined with
a set of final states, and the criteria for acceptance would then be that the device both halt and be in a
final state. For our purposes, we will employ the writing capabilities of the Turing machine and simply
require that acceptance be indicated by printing a Y just prior to halting. If such a Y is never printed
or the machine does not halt, the word will be considered rejected. It may be that there are words that
might be placed on the input tape that would prevent the machine from halting, which is at best a seri-
ous inconvenience; if the device has been operating for an extraordinary amount of time, we may not be
able to tell if it will never halt (and thus reject the word), or whether we simply need to be patient and
wait for it to eventually print the Y . This uncertainty can in some cases be avoided by finding a superior
design for the Turing machine, which would always halt, printing N when a word is rejected and Y when
a word is accepted. This is not always a matter of being clever in defining the machine; we will see that
there are some languages that are inherently so complex that this goal is impossible to achieve.

Figure 11.1: A model of a Turing machine.

A conceptual model of a Turing machine is shown in Figure 11.1. Note that the tape head is capable
of both reading and overwriting the currently scanned symbol. As before, the tape is composed of a
series of cells, with one symbol per cell. The tape head will also be allowed to move one cell to either
the left or right during a transition. Note that unlike all previous automata, the tape does not have a “left
end”; it extends indefinitely in both directions. This tape will be used for input, output, and as a “scratch
pad” for any intermediate calculations. At the start of operation of the device, all but a finite number of
contiguous cells are blank. Also, unlike our earlier devices, the following definition implies that Turing
machines may continue to operate after scanning a blank.

Definition 11.1 A Turing machine that recognizes words over an alphabet Σ is a quintuple M = 〈Σ, Γ, S, s₀, δ〉, where

Σ is the input alphabet.
Γ is the auxiliary alphabet, and Σ, Γ, and {L, R} are pairwise disjoint sets of symbols.
S is a finite nonempty set of states (and S ∩ (Σ ∪ Γ) = ∅).
s₀ is the start state (s₀ ∈ S).
δ is the state transition function δ: S × (Σ ∪ Γ) → (S ∪ {h}) × (Σ ∪ Γ ∪ {L, R}).

The auxiliary alphabet always includes the blank symbol (denoted by #), and neither Σ nor Γ includes the
special symbols L and R, which denote moving the tape head left and right, respectively. The state h is a
special halt state, from which no further transitions are possible; h ∉ S.

The alphabet Σ is intended to denote the nonblank symbols that can be expected to be initially
present on the input tape. By convention, it is assumed that the tape head is positioned over the left-
most nonblank (in the case of the empty string, though, the head will be scanning a blank). In Definition
11.1, the state transition function is deterministic; for every state in S and every tape symbol scanned,
exactly one destination state is specified, and one action is taken by the tape head. The tape head may
either:

1. Overprint the cell with a symbol from Σ or Γ (and thus a blank may be printed).

2. Move one cell left (without printing).

3. Move one cell right (also without printing).

In the case where a cell is overprinted, the tape head remains positioned on that cell.
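To make Definition 11.1 concrete, the following sketch simulates such a machine directly. It is not from the text: the encoding of δ as a Python dictionary, the strings 'L' and 'R' for the head-motion symbols, '#' for the blank, and 'h' for the halt state are all conventions of this illustration, and it assumes δ is total, as the definition requires.

def run_tm(delta, s0, word, max_steps=100000):
    # The tape is a dict from integer cell positions to symbols; any
    # position absent from the dict holds the blank '#', so the tape is
    # unbounded in both directions, as Definition 11.1 requires.
    tape = dict(enumerate(word))
    pos, state = 0, s0          # head starts on the leftmost nonblank
    for _ in range(max_steps):
        if state == 'h':        # the halt state admits no further moves
            return state, tape
        state, action = delta[(state, tape.get(pos, '#'))]
        if action == 'R':       # move one cell right without printing
            pos += 1
        elif action == 'L':     # move one cell left without printing
            pos -= 1
        else:                   # overprint; the head stays on the cell
            tape[pos] = action
    return None                 # gave up: the machine may never halt

The max_steps cutoff is a concession to the discussion above: since a Turing machine need not halt, a finite simulation can only report "still running," never "will run forever."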
The above definition of a Turing machine is compatible with the construct implemented by Jon Bar-
wise and John Etchemendy in their Turing’s World© software package for the Apple® Macintosh. The
Turing’s World program allows the user to interactively draw a state transition diagram of a Turing ma-
chine and watch it operate on any given input string. As indicated by the next example, the same software
can be used to produce and test state transition diagrams for deterministic finite automata.

Example 11.1
The following simple Turing machine recognizes the set of even-length words over {a, b}. The state transition diagram for this device is shown in Figure 11.2 and conforms to the conventions introduced in Chapter 7. Transitions between states are represented by arrows labeled by the symbol that caused the transition. The symbol after the slash denotes the character to be printed or, in the case of L and R, the direction to move the tape head. The quintuple is 〈{a, b}, {#, Y, N}, {s₀, s₁}, s₀, δ_T〉, where δ_T is given by

δ_T(s₀, a) = 〈s₁, R〉
δ_T(s₀, b) = 〈s₁, R〉
δ_T(s₀, #) = 〈h, Y〉
δ_T(s₁, a) = 〈s₀, R〉
δ_T(s₁, b) = 〈s₀, R〉
δ_T(s₁, #) = 〈h, N〉
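In the dictionary encoding used by the simulator sketched after Definition 11.1 (our convention, not the text's), this table would read:

delta_T = {('s0', 'a'): ('s1', 'R'), ('s0', 'b'): ('s1', 'R'),
           ('s1', 'a'): ('s0', 'R'), ('s1', 'b'): ('s0', 'R'),
           ('s0', '#'): ('h', 'Y'),  # even length: print Y and halt
           ('s1', '#'): ('h', 'N')}  # odd length: print N and halt
# run_tm(delta_T, 's0', 'abab') halts with Y printed after the word,
# while run_tm(delta_T, 's0', 'aba') halts with N there instead.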

This particular Turing machine operates in much the same way as a DFA would, always moving right
as it scans each symbol of the word on the input tape.
When it reaches the end of the word (that is, when it first scans a blank), it prints Y or N , depending
on which state it is in, and halts. It differs from a DFA in that the accept/reject indication is printed on
the tape at the right end of the word. Figure 11.3 shows an alternative way of displaying this machine, in
which the halt state is not explicitly shown. Much like the straight start state arrow that denotes where the

Figure 11.2: The state transition diagram of the Turing machine discussed in Example 11.1

Figure 11.3: An alternate depiction of the Turing machine discussed in Example 11.1

automaton is entered, the new straight arrows show how the machine is left. This notation is especially
appropriate for submachines. As with complex programs, a complex Turing machine may be comprised
of several submodules. Control may be passed to a submachine, which manipulates the input tape until
it halts. Control may then be passed to a second submachine, which then further modifies the tape
contents. When this submachine would halt, control may be passed on to a third submachine, or back to
the first submachine, and so on. The straight arrows leaving the state transition diagram can be thought
of as exit arrows for a submachine, and they function much like a return statement in many programming
languages. Example 11.4 illustrates a Turing machine that employs submachines.
We will see that any DFA can be emulated by a Turing machine in the manner suggested by Example
11.1. The following example shows that Turing machines can recognize languages that are definitely not
FAD. In fact, the language accepted in Example 11.2 is not even context free.

Example 11.2

The Turing machine M illustrated in Figure 11.4 operates on words over {a, b, c}. When started at the leftmost end of the word, it is guaranteed to halt at the rightmost end and print Y or N. It happens to overwrite the symbols comprising the input word as it operates, but this is immaterial. In fact, it is possible to design a slightly more complex machine that restores the word before halting (see Example 11.11). The quintuple is 〈{a, b, c}, {#, X, Y, N}, {s₀, s₁, s₂, s₃, s₄, s₅, s₆}, s₀, δ〉, where δ is as indicated in the diagram in Figure 11.4. It is intended to recognize the language {x ∈ {a, b, c}∗ | |x|a = |x|b = |x|c}. One possible procedure for processing a string to check if it has the same number of a's, b's, and c's is given by the pseudocode below.

while an a remains do
begin
    replace a by X
    return to leftmost symbol
    find b; if none, halt and print N
    replace b by X
    return to leftmost symbol
    find c; if none, halt and print N
    replace c by X
    return to leftmost symbol
end
halt and print Y if no more b's or c's remain

States s₀ and s₁ in Figure 11.4 check the while condition, and states s₂ through s₆ perform the body of the loop. On each iteration, beginning at the leftmost symbol, state s₀ moves the tape head right, checking for symbols that have not been replaced by X. If it reaches the end of the word (that is, if it scans a blank), the a's, b's, and c's all matched, and it halts, printing Y. If b or c is found, state s₁ searches for an a; if the end of the string is reached without finding a corresponding a, the machine halts with N, since there were an insufficient number of a's. From either s₀ or s₁, control passes to s₂ when an a is scanned, and that a is replaced by X. State s₂, like s₄ and s₆, returns the tape head to the leftmost character. This is done by scanning left until a blank is found and then moving right as control is passed on to the next state. State s₃ searches for b, halting with N if none is found. The first b encountered is otherwise replaced by X, and the Turing machine enters s₄, which then passes control on to s₅ after returning to the leftmost symbol. State s₅ operates much like s₃, searching for c this time, and s₆ returns the tape head to the extreme left if the previous a and b have been matched with a c. The process then repeats from s₀.
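Figure 11.4 is not reproduced here, but the description above pins down the transition table. The following encoding for the earlier simulator sketch is one consistent reconstruction (the dictionary form and state names are ours):

delta_M = {
    # s0: test the while condition, sweeping right over X
    ('s0', 'X'): ('s0', 'R'), ('s0', 'b'): ('s1', 'R'),
    ('s0', 'c'): ('s1', 'R'), ('s0', 'a'): ('s2', 'X'),
    ('s0', '#'): ('h', 'Y'),
    # s1: search right for an a to match the b or c just seen
    ('s1', 'X'): ('s1', 'R'), ('s1', 'b'): ('s1', 'R'),
    ('s1', 'c'): ('s1', 'R'), ('s1', 'a'): ('s2', 'X'),
    ('s1', '#'): ('h', 'N'),
    # s3 searches right for a b; s5 searches right for a c
    ('s3', 'X'): ('s3', 'R'), ('s3', 'a'): ('s3', 'R'),
    ('s3', 'c'): ('s3', 'R'), ('s3', 'b'): ('s4', 'X'),
    ('s3', '#'): ('h', 'N'),
    ('s5', 'X'): ('s5', 'R'), ('s5', 'a'): ('s5', 'R'),
    ('s5', 'b'): ('s5', 'R'), ('s5', 'c'): ('s6', 'X'),
    ('s5', '#'): ('h', 'N'),
}
# s2, s4, and s6 sweep left to the blank before the word and then hand
# control to s3, s5, and s0, respectively.
for ret, nxt in (('s2', 's3'), ('s4', 's5'), ('s6', 's0')):
    for sym in 'Xabc':
        delta_M[(ret, sym)] = (ret, 'L')
    delta_M[(ret, '#')] = (nxt, 'R')
# run_tm(delta_M, 's0', 'babcca') reaches h with Y at the right end.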
To see exactly how the machine operates, it is useful to step through the computation for an input string such as babcca. To do this, conventions to designate the status of the device are quite helpful. Like the stack in a PDA, the tape contents may change as transitions occur, and the notation for the configuration of a Turing machine must reflect those changes. Steps in a computation will be represented according to the following conventions.

Definition 11.2 Let M = 〈Σ, Γ, S, s₀, δ〉 be a Turing machine that is operating on a tape containing
. . . ###αbβ### . . ., currently in state t with the tape head scanning the b, where α, β ∈ (Σ ∪ Γ)∗, α contains
no leading blanks and β has no trailing blanks. This configuration will be represented by αtbβ.
γ ⊢ ψ will be taken to mean that the configuration denoted by ψ is reached in one transition from γ.
The symbol ⊢* will denote the reflexive and transitive closure of ⊢.

That is, the symbol representing the state will be embedded within the string, just to the left of the symbol being scanned. If δ(t, b) = 〈s, R〉, then αtbβ ⊢ αbsβ. The new placement of the state label within the string indicates that the tape head has indeed moved right one symbol. The condition S ∩ (Σ ∪ Γ) = ∅ ensures that there is no confusion as to which symbol in the configuration representation denotes the state. As with PDAs, γ ⊢* ψ means that γ produces ψ in zero or more transitions. Note that the leading and trailing blanks are not represented, but α and β may contain embedded blanks. Indeed, b may be a blank. The representation ac###t# indicates that the tape head has moved past the word ac and is scanning the fourth blank to the right of the word (α = ac###, b = #, β = λ). At the other extreme, t##ac shows the tape head two cells to the left of the word (α = λ, b = #, β = #ac). A totally blank tape is represented by t#.

Figure 11.4: The Turing machine M discussed in Example 11.2

Definition 11.3 For a Turing machine M = 〈Σ, Γ, S, s₀, δ〉, the language accepted by M, denoted by L(M),
is L(M) = {x ∈ Σ∗ | s₀x ⊢* xhY}. A language accepted by a Turing machine is called a Turing-acceptable
language.

It is generally convenient to assume that the special symbol Y is not part of the input alphabet. Note
that words can be rejected if the machine does not print a Y or if the machine never halts.
Several reasonable definitions of acceptance can be applied to Turing machines. One of the most
common specifies that the language accepted by M is the set of all words for which M simply halts, ir-
respective of what the final tape contents are. It might be expected that this more robust definition of
acceptance would lead to more (or at least different) languages being recognized. However, this defini-
tion turns out to yield a device with the same cognitive power as specified by Definition 11.3, as indicated
below. More precisely, let us define

L₁(A) = {x ∈ Σ∗ | ∃α, β ∈ (Σ ∪ Γ)∗ ∋ s₀x ⊢* αhβ}

L₁(A) is thus the set of all words that cause A to halt. Let L be a language for which L = L₁(B) for some Turing machine B. It can be shown that there exists another Turing machine C that accepts L according to Definition 11.3; that is, L₁(B) = L(C) for some C. The converse is also true: any language of the form L(M) is L₁(A) for some Turing machine A. Other possible definitions of acceptance include

L₂(M) = {x ∈ Σ∗ | ∃α, β ∈ (Σ ∪ Γ)∗ ∋ s₀x ⊢* αhYβ}

and

L₃(M) = {x ∈ Σ∗ | ∃α ∈ (Σ ∪ Γ)∗ ∋ s₀x ⊢* αhY}
These distinguish all words that halt with Y somewhere on the tape and all words that halt with Y at the
end of the tape, respectively.

It should be clear that a Turing machine A₁ accepting L = L₁(A₁) has an equivalent Turing machine A₂ for which L = L₂(A₂). A₂ can be obtained from A₁ by simply adding a new state and changing the transitions to the halt state so that they now all go to the new state. The new state prints Y wherever the tape head is and then, upon scanning that Y, halts. Similarly, a Turing machine A₃ can be obtained from A₂ by instead requiring the new state to scan right until it finds a blank. It would then print Y and halt, and L₂(A₂) = L₃(A₃). The technique for modifying such an A₃ to obtain A₄ for which L₃(A₃) = L(A₄) is discussed in the next section and illustrated in Example 11.11.
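As a sketch of the second construction, in the dictionary encoding of the earlier simulator (the helper and the fresh state name 'sweep' are ours): every transition into the halt state of a machine whose halting already signals acceptance, such as A₂, is redirected into a new state that runs right to the first blank, prints Y there, and halts.

def halt_to_Y_at_end(delta, nonblanks):
    # nonblanks should list every symbol of the machine's alphabets
    # other than the blank (including Y, so the sweep passes over it).
    new_delta = {(s, a): (('sweep', act) if t == 'h' else (t, act))
                 for (s, a), (t, act) in delta.items()}
    for a in nonblanks:                       # pass over nonblank cells
        new_delta[('sweep', a)] = ('sweep', 'R')
    new_delta[('sweep', '#')] = ('h', 'Y')    # print Y at the end, halt
    return new_delta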

Example 11.3

Consider again the machine M in Example 11.2 and the input string babcca. By the strict definition of acceptance given in Definition 11.3, L(M) = {λ}, since λ is the only word that does not get destroyed by M. Using the looser criteria for acceptance yields a more interesting language. The following steps show that s₀babcca ⊢* XXXXXXhY.

s₀babcca ⊢ bs₁abcca ⊢ bs₂Xbcca ⊢ s₂bXbcca ⊢
s₂#bXbcca ⊢ s₃bXbcca ⊢ s₄XXbcca ⊢ s₄#XXbcca ⊢
s₅XXbcca ⊢ Xs₅Xbcca ⊢ XXs₅bcca ⊢ XXbs₅cca ⊢
XXbs₆Xca ⊢ XXs₆bXca ⊢ Xs₆XbXca ⊢ s₆XXbXca ⊢
s₆#XXbXca ⊢ s₀XXbXca ⊢ Xs₀XbXca ⊢ XXs₀bXca ⊢
XXbs₁Xca ⊢* s₂#XXbXcX ⊢*
XXs₃bXcX ⊢* s₄#XXXXcX ⊢*
XXXXs₅cX ⊢* s₆#XXXXXX ⊢*
XXXXXXs₀# ⊢ XXXXXXhY

The string babcca is therefore accepted. ac is rejected since s₀ac ⊢* XchN. Further analysis shows that L₃(M) is exactly {x ∈ {a, b, c}∗ | |x|a = |x|b = |x|c}. Since the only place Y is printed is at the end of the word on the tape, L₃(M) = L₂(M). Every word eventually causes M to halt with either Y or N on the tape, and so L₁(M) = Σ∗.

Example 11.4

The composite Turing machine shown in Figure 11.5a employs two submachines (Figure 11.5b and Fig-
ure 11.5c) and is based on the parenthesis checker included as a sample in the Turing’s World software.
The machine will search for correctly matched parentheses, restoring the original string and printing Y
if the string is syntactically correct, and leaving a $ to mark the offending position if the string has mis-
matched parentheses. Asterisks are recorded to the left of the string as left parentheses are found, and
these are erased as they are matched with right parentheses.
Figure 11.5a shows the main architecture of the Turing machine. The square nodes represent the submachines illustrated in Figures 11.5b and 11.5c. When s₀ encounters a left parenthesis, it marks the occurrence with $ and transfers control to the submachine S₁. S₁ moves the read head to the left end of the string and deposits one ∗ there. The cells to the left of the original string serve as a scratch area; the asterisks record the number of unmatched left parentheses encountered thus far. Submachine S₁ then scans right until the $ is found; it then restores the original left parenthesis. At this point, no further internal moves can be made in S₁, and the arrow leaving s₁₂ indicates that control should be returned to the parent automaton.

Figure 11.5a: The Turing machine discussed in Example 11.4

The transition leaving the square S₁ node in Figure 11.5a now applies: the tape head moves to the right of the left parenthesis that was just processed by S₁, and control is returned to s₀. s₀ continues to move right past the symbols a and b, uses S₁ to process subsequent left parentheses, and transfers control to the submachine S₂ whenever a right parenthesis is encountered.
Submachine S₂ attempts to match a right parenthesis with a previous left parenthesis. As control was passed to S₂, the right parenthesis was replaced by $ so that this spot on the tape can be identified later. The transitions in state s₂₀ move the tape head left until a blank cell is scanned. If the cell to the right of this blank does not contain an asterisk, s₂₁ has no moves and control is passed back to the parent Turing machine, which will enter s₄ and move right past all the symbols in the word, printing N as it halts. The absence of the asterisk implies that no previous matching left parenthesis had been found, so halting with N is the appropriate action.
If an asterisk had been found, s₂₁ would have replaced it with a blank, and then would have no further moves, and the return arrow would be followed. The blank that is now under the tape head will cause the parent automaton to pass control to s₃, which will move right to the $, and the $ is then restored to ). Control returns to s₀ as the tape head moves past this parenthesis.
The start state continues checking the remainder of the word in this fashion. When the end of the word is reached, s₆ is used to examine the left end of the string; remaining asterisks indicate unmatched left parentheses and will yield N as the machine halts from s₈. If s₆ does not encounter ∗, the Turing machine halts with Y and accepts the string from s₇.
As more complex examples are considered, one may begin to suspect that any programming assign-
ment could be carried out on a Turing machine. While it would be truly unwise to try to make a living

346
Figure 11.5b: Submachine S 1 in Example 11.4

Figure 11.5c: Submachine S 2 in Example 11.4

selling computers with this architecture, these devices are generally regarded to be as powerful as any
general-purpose computer. That is, if an algorithm for solving a class of problems can be carried out on
a computer, then there should be a Turing machine that can produce identical output for each instance
of a problem in that class.
The language {x ∈ {a, b, c}∗ | |x|a = |x|b = |x|c} is not context free, so it cannot be recognized by a
PDA. Turing machines can therefore accept some languages that PDAs cannot, and we will see that they
can recognize every context-free language. We began with DFAs, which were then extended to the more
powerful PDAs, which have now been eclipsed by the Turing machine construct. Each of these classes of
automata has been substantially more capable than the previous class. If this text were longer, one might
wonder when the next class of superior machines would be introduced. Barring the application of magic
or divine intuition, there does not seem to be a “next class.” That is, any machine that is constrained to
operate algorithmically by a well-defined set of rules appears to have no more computing power than do
Turing machines.
This constraint, “to behave in an algorithmic fashion,” is an intuitive notion without an obvious exact
formal expression. Indeed, “behaving like a Turing machine” is generally regarded as the best way to
express this notion! A discussion of how Turing machines came to be viewed in this manner is perhaps
in order. An excellent in-depth treatment of their history can be found in [BARW].
At the beginning of the twentieth century, mathematicians were searching for a universal algorithm
that could be applied to mechanically prove any well-stated mathematical formula. This naturally fo-
cused attention on the manipulation of symbols. In 1931, Gödel showed that algorithms of this sort
cannot exist. Since this implied that there were classes of problems that could not have an algorithmic
solution, this then led to attempts to characterize those problems that could be effectively “computed.”
In 1936, Turing introduced his formal device for symbol manipulation and suggested that the definition
of an algorithm be based on the Turing machine. He also outlined the halting problem (discussed later),
which demonstrated a problem to which no Turing machine could possibly provide the correct answer
in all instances. The search for a better, perhaps more powerful characterization of what constitutes an
algorithm continued.

Figure 11.6: The DFA T discussed in Example 11.5

While it cannot be proved that it is impossible to find a better formalization that is truly more power-
ful, on the basis of the accumulating evidence, no one believes that a better formulation exists. For one
thing, other attempts at formalization, including grammars, λ-calculus, µ-recursive functions, and Post
systems, have all turned out to yield exactly the same computing power as Turing machines. Second, all
attempts at “improving” the capabilities of Turing machines have not expanded the class of languages
that can be recognized. Some of these possible improvements will be examined in the next section. We
close this section by formalizing what Example 11.1 probably made clear: every DFA can be simulated
by a Turing machine.

Theorem 11.1 Every FAD language is Turing acceptable.

Proof. We show that given any DFA A = 〈Σ, S, s₀, δ, F〉, there is a Turing machine M_A that is equivalent to A. Define M_A = 〈Σ, {#, Y, N}, S, s₀, δ_A〉, where δ_A is defined by

(∀s ∈ S)(∀a ∈ Σ)(δ_A(s, a) = 〈δ(s, a), R〉)
(∀s ∈ F)(δ_A(s, #) = 〈h, Y〉)
(∀s ∈ S − F)(δ_A(s, #) = 〈h, N〉)

A simple inductive argument on |x| shows that

(∀x ∈ Σ∗)(∀α, β ∈ (Σ ∪ Γ)∗)(αtxβ ⊢* αxqβ iff δ(t, x) = q)

From this it follows that

(∀x ∈ Σ∗)(s₀x ⊢* xq# iff δ(s₀, x) = q)

Therefore,

(∀x ∈ Σ∗)(s₀x ⊢* xhY iff δ(s₀, x) ∈ F)

which means that L(M_A) = L(A).
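In the same dictionary encoding as the earlier sketches, the construction in this proof takes only a few lines (assuming the DFA's transition function is likewise given as a dict on (state, symbol) pairs; the function name is ours):

def dfa_to_tm(dfa_delta, states, alphabet, final_states):
    delta_A = {}
    for s in states:
        for a in alphabet:
            # mimic the DFA's move, always shifting the head right
            delta_A[(s, a)] = (dfa_delta[(s, a)], 'R')
        # on the first blank, print Y or N according to membership in F
        delta_A[(s, '#')] = ('h', 'Y' if s in final_states else 'N')
    return delta_A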

This result actually follows trivially from the much stronger results presented later. Not only is every
type 3 language Turing acceptable, but every type 0 language is Turing acceptable (as will be shown
by Theorem 11.2). The above proof presents the far more straightforward conversion available to type 3
languages and illustrates the flavor of the inductive arguments needed in other proofs concerning Turing
machines. By using this conversion, the Turing’s World© software can be employed to interactively build
and test deterministic finite automata on the Apple® Macintosh.

Example 11.5
Consider the DFA T shown in Figure 11.6, which recognizes all words of even length over {a, b}. The corresponding Turing machine is illustrated in Example 11.1 (see Figure 11.2).

11.2 Variants of Turing Machines

There are several ways in which the basic definition of the Turing machine can be modified. For example,
Definition 11.1 disallows the tape head from both moving and printing during a single transition. It
should be clear that if such an effect were desired at some point it could be effectively accomplished
under the more restrictive Definition 11.1 by adding a state to the finite-state control. The desired symbol
could be printed as control is transferred to the new state. The transition out of the new state would
then move the tape head in the appropriate fashion, thus accomplishing in two steps what a “fancier”
automaton might do in one step. While this modification might be convenient, the ability of Definition
11.1-style machines to simulate this behavior makes it clear that such modified automata are no more
powerful than those given by Definition 11.1. That is, every such modified automaton has an equivalent
Turing machine.
It is also possible to examine machines that are more restrictive than Definition 11.1. If the machine
were constrained to write on only a fixed, finite amount of the tape, this would seriously limit the types of
languages that could be recognized. In fact, only the type 3 languages can be accepted by such machines.
Linear bounded automata, which are Turing machines constrained to write only on the portion of the
tape containing the original input word, are also less powerful than unrestricted Turing machines and are
discussed in a later section. Having an unbounded area in which to write is therefore an important factor
in the cognitive power of Turing machines, but it can be shown that the tape need not be unbounded
in both directions. That is, Turing machines that cannot move left of the cell the tape head originally
scanned can perform any calculation that can be carried out by the less-restrictive machines given by
Definition 11.1 (see the exercises).
In deciding whether a Turing machine can simulate the modified machines suggested below, it is
important to remember that the auxiliary alphabet Γ can be expanded as necessary, as long as it remains
finite. In particular, it is possible to expand the information content of each cell by adding a second
“track” to the tape. For example, we may wish to add check marks to certain designated cells, as shown
in Figure 11.7. The lower track would contain the original symbols, and the upper track may or may not have a check mark. This can be accomplished by doubling the combined size of the alphabets Σ and Γ to include all symbols without check marks and the same symbols with check marks. The new symbols can be thought of as ordered pairs, and erasing a check mark then amounts to rewriting a pair such as 〈a, ✓〉 with 〈a, #〉. A scheme such as this could be used to modify the automaton in Example 11.2. Rather than replacing designated symbols with X, a check could instead be placed over the original symbol. Just prior to acceptance, each check mark could be erased, leaving the original string to the left of the Y (see Example 11.11).
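The ordered-pair view of a two-track cell is easy to make literal. A small sketch (Python tuples; all names are ours):

CHECK, BLANK = '\u2713', '#'     # the check mark and the blank

def checked(symbol):
    return (CHECK, symbol)       # e.g. checked('a') plays the role of X

def unchecked(symbol):
    return (BLANK, symbol)

def erase_check(cell):
    # rewriting <a, check> as <a, #> recovers the original symbol view
    _, symbol = cell
    return (BLANK, symbol)

# The doubled alphabet: every symbol with and without a check mark.
def two_track_alphabet(symbols):
    return [(mark, s) for mark in (CHECK, BLANK) for s in symbols]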
The foregoing discussion justifies that a Turing machine with a tape head capable of reading two
tracks can be simulated by a Definition 11.1 style Turing machine; indeed, it is a Turing machine with
a slightly more complex alphabet. When convenient, then, we may assume that we have a Turing ma-
chine with two tracks. A similar argument shows that, for any finite number k, a k-track machine has
an equivalent one-track Turing machine with an expanded alphabet. The symbols on the other tracks
can be more varied than just ✓ and #; any finite number of symbols may appear on any of the tracks.
Indeed, a Turing machine may initially make a copy of the input string on another track to use in a later
calculation and/or to restore the tape to its original form. The ability to preserve the input word in this
manner illustrates why each language L = L 3 (A) for some Turing machine A must be Turing acceptable;
that is, L = L 3 (A) implies that there is a multitrack Turing machine M for which L = L(M ).

Figure 11.7: A Turing machine with a two-track tape

Figure 11.8: Emulating a two-headed Turing machine with a three-track tape

Example 11.6
Conceptualizing the tape as being divided into tracks simplifies many of the arguments concerning mod-
ification of the basic Turing machine design. For example, a modified Turing machine might have two
heads that move independently up and down a single tape, both scanning symbols to determine what
transition should be made and both capable of moving in either direction (or remaining stationary and
overwriting the current cell) as each transition is carried out. Such machines would be handy for recog-
nizing certain languages. The set {aⁿbⁿ | n ≥ 1} can be easily recognized by such a machine. If both heads
started at the left of the word, one head might first scan right to the first b encountered. The two heads
could then begin moving in unison to the right, comparing symbols as they progressed, until the leading
head encounters a blank and/or the trailing head scans its first b . If these two events occurred on the
same move, the word would be accepted. A single head Turing machine would have to travel back and
forth across the word several times to ascertain if it contained the same number of a's as b's. The ease with
which the two-headed mutation accomplished the same task might make one wonder whether such a
modified machine can recognize any languages which the standard Turing machine cannot.
To justify that a two-headed Turing machine is no more powerful than the type described by Defini-
tion 11.1, we must show that any two-headed machine can be simulated by a corresponding standard
Turing machine. As suggested by Figure 11.8, a three-track Turing machine will suffice. The original in-
formation would remain on the first track, and check marks will be placed on tracks 2 and 3 to signify the
simulated locations of the two heads. Several moves of the single head will be necessary to simulate just
one move of the two-headed variant, and the finite-state control must be replicated and augmented to

keep track of the stages of the computation. Each simulated move will begin with the single tape head
positioned over the leftmost check mark. The tape contents are scanned, and the symbol found is re-
membered by the finite state control. The tape head then moves right until the second check mark is
found. At this point, the device will have available the input symbols that would have been scanned by
both heads in the two-headed variant, and hence it can determine what action each of the heads would
have taken. The rightmost checkmark would then be moved left or right or the current symbol on track
1 overwritten, whichever is appropriate. The single tape head would then scan left until the other check
mark is found, which would then be similarly updated. This would complete the simulation of one move,
and the process would then repeat.
Various special cases must be dealt with carefully, such as when both heads would be scanning the
same symbol and when the heads “cross” to leave a different head as the leftmost. These cases are tedious
but straightforward to sort out, and thus any language that can be recognized by a two-headed machine
can be recognized by a standard Turing machine. Similarly, a k-headed Turing machine can be simulated
by a machine conforming to Definition 11.1. The number of tracks required would then be k +1, and the
set of states must expand so that the device can count the number of check marks scanned on the left
and right sweeps of the tape.
Multihead Turing machines are therefore fundamentally no more powerful than the single-head va-
riety. This means that whenever we need to justify that some task can be accomplished by a Turing
machine we may employ a variant with several heads whenever this is convenient. We have seen that
this variant simplified the justification that {aⁿbⁿ | n ≥ 1} was Turing acceptable. It can also be useful in
showing that other variants are no more powerful than the type of machines given by Definition 11.1, as
illustrated in the next example.

Example 11.7
Consider now a device employing several independent tapes with one head for each tape, as depicted in
Figure 11.9. If we think of the tapes as stationary and the heads mobile, it is easy to see that we could
simply glue the tapes together into one thick tape with several tracks, as indicated in Figure 11.10. The
multiple heads would now scan an entire column of cells, but a head would ignore the information on all
but the track for which it was responsible. In this fashion, a multitape Turing machine can be simulated
by a multihead Turing machine, which can in turn be simulated by a standard Turing machine. Thus,
multitape machines are no more powerful than the machines considered earlier.
One of the wilder enhancements involves the use of a two-dimensional tape, which would actually
be a surface on which the tape head can move not only left and right, but also up and down to adjacent
squares. With some frantic movement of the tape head on a one-dimensional tape, two-dimensional
Turing machines can be successfully simulated. Indeed, k-dimensional machines (for finite k) are no
more powerful than a standard Turing machine. The interested reader is referred to [HOPC].

Example 11.8
A potentially more interesting question involves the effects that nondeterminism might have on the com-
putational power of a Turing machine. With finite automata, it was seen that NDFAs recognized exactly
the same class of languages as DFAs. However, deterministic pushdown automata accepted a distinctly
smaller class of languages than their nondeterministic cousins. It is consequently hard to develop even
an intuition for what “should” happen when nondeterminism is introduced to the Turing machine con-
struct.

Figure 11.9: A three-tape Turing machine

Figure 11.10: Emulating a three-tape Turing machine with a single three-track tape

Before we can address this question, we must first define what we mean by a nondeterministic Turing
machine. As with finite automata and pushdown automata, we may wish to allow a choice of moves
from a given configuration, leading to several disparate sequences of moves for a given input string.
Like NDFAs and NPDAs, we will consider a word accepted if there is at least one sequence of moves that
would have resulted in a Y being printed. Simulating such machines with deterministic Turing machines
is more involved than it may at first seem. If each possible computation was guaranteed to halt, it would
be reasonable to try each sequence of moves, one after the other, halting only when a Y was found. If one
sequence led to an N being printed, we would then move on to the next candidate. Since there may be
a countable number of sequences to try, this process may never end. This is not really a problem, since
if a sequence resulting in a Y exists, it will eventually be found and tried, and the machine will halt and
accept the word. If no such sequence resulting in a Y exists, and there are an infinite number of negative
attempts to be checked, the machine will never halt. By our original definition of acceptance, this will
result in the word being rejected, which is the desired result.
The trouble arises in trying to simulate machines that are not guaranteed to halt under all possible
circumstances. This is not an inconsequential concern; in Chapter 12, we will identify some languages
that are so complex that their corresponding Turing machines cannot halt for all input strings. A problem
then arises in trying to switch from one sequence to the next. If, say, the first sequence we tried did not
halt and instead simply continued operation without ever producing Y or N , we would never get the
chance to try other possible move sequences. Since the machine will not halt, the word will therefore be
rejected, even if some later sequence would have produced Y . Simulating the nondeterministic machine
in this manner will not be guaranteed to recognize the same language, and an alternative method must
be used.
This problem is avoided by simulating the various computations in the following (very inefficient)
manner. We begin by simulating the first move of the first sequence. We then start over with the first
move of the second sequence, and then begin again and simulate two moves in the first sequence. On
the next pass, we simulate the first move of the third sequence, then two moves of the second sequence,
and then three moves of the first sequence. On each pass, we start computing a new sequence and move
a little further along on the sequences that have already been started. If any of these sequences results in
Y , we will eventually simulate enough of that sequence to discover that fact and accept the word. In this
way, we avoid getting trapped in a dead end with no opportunity to pursue the alternatives.
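This round-robin schedule is the classic dovetailing construction. Below is a sketch of just the bookkeeping, under the assumption (ours) that each candidate move sequence is packaged as a function step(k) that restarts from the saved copy of the input, runs that sequence's first k moves, and reports 'Y', 'N', or None if still running:

def dovetail(sequences):
    # sequences is a (possibly infinite) iterator of step functions
    started = []                   # pairs [step, moves simulated so far]
    for step in sequences:         # each pass starts one new sequence...
        started.append([step, 0])
        for entry in started:      # ...and extends every started one
            entry[1] += 1          # by one further move
            if entry[0](entry[1]) == 'Y':
                return True        # some sequence accepts: accept
    # If no sequence ever accepts, this loop simply never returns,
    # which under the original definition counts as rejection.

On pass p, the newest sequence is simulated for one move and the first sequence for p moves, exactly the pattern described above; no single nonhalting sequence can starve the others.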
Implementing the above scheme will produce a deterministic Turing machine that is equivalent to
the original nondeterministic machine. It remains to be shown that the Turing machine can indeed start
over as necessary, and that the possible move sequences can be enumerated in a reasonable fashion so
that they can be pursued according to the pattern outlined above. A three-tape (deterministic) Turing
machine will suffice. The first tape will keep an inviolate copy of the input string, which will be copied
onto the second tape each time a computation begins anew. A specific sequence of steps will be carried
out on this second scratch tape, after which the presence/absence of Y will be determined. The third
tape is responsible for keeping track of the iterations and generating the appropriate sequences to be
employed. Enumerating the sequences is much like the problem of generating words over some alpha-
bet in lexicographic order (see the exercises). Methods for generating the “directing sequences” can be
found in both [LEWI] and [HOPC]. These references also propose a more efficient approach to the whole
simulation, which is based on keeping track of the sets of possible configurations, much as was done in
Definition 4.5 for nondeterministic finite automata.
Thus, neither nondeterminism nor any of the enhancements considered above improved the com-
putational power of these devices. As mentioned previously, no one has yet been able to find any me-

chanical enhancement that does yield a device that can recognize a language that is not Turing accept-
able. Attempts at producing completely different formal systems have fared no better, and there is little
cause to believe that such systems exist. We now turn to characterizing what appears to be the largest
class of algorithmically definable languages. In the next section, we will see that the Turing-acceptable
languages are exactly the type 0 languages introduced in Chapter 8.

Definition 11.4 For a given alphabet Σ, let TΣ be the collection of all Turing-acceptable languages, and
let ZΣ be the collection of all type 0 languages.

The freedom to use several tapes and nondeterminism makes it easier to explore the capabilities of
Turing machines and relate TΣ to the previous classes of languages encountered. It is now trivial to justify
that every PDA can be simulated by a nondeterministic Turing machine with two tapes. The first tape
will hold the input, which will be scanned by the first tape head, which will only have to move right or, at
worst, remain stationary and reprint the same character it was scanning. The second tape will function
as the stack, with strings pushed or symbols popped in correspondence with what takes place in the
PDA. Since a Turing machine can only print one symbol at a time, some new states may be needed in the
finite-state control to simulate pushing an entire string, but the translation process is quite direct.

Lemma 11.1 Let Σ be an alphabet. Then PΣ ⊆ TΣ. That is, every context-free language is Turing acceptable, and the containment is proper.
Proof. Containment follows from the formalization of the above discussion (see the exercises). Example 11.3 presented a language over {a, b, c} that is Turing acceptable but not context free. While the distinction between DΣ and PΣ disappeared for singleton alphabets, proper containment remains between P{a} and T{a}, as shown by languages such as {aⁿ | n is a perfect square}.

In the next section, an even stronger result is discussed, which shows that the class of Turing-acceptable
languages includes much more than just the context-free languages. Lemma 11.1 is actually an imme-
diate corollary of Theorem 11.2. The next section also explores the formal relationship between Turing
machines and context-sensitive languages.

11.3 Turing Machines, LBAs, and Grammars


The previous sections have shown that the class of Turing-acceptable languages properly contains the
type 2 languages. We now explore how the type 0 and type 1 languages relate to Turing machines. Since
the preceding discussions mentioned that no formal systems have been found that surpass Turing ma-
chines, one would expect that every language generated by a grammar can be recognized by a Turing
machine. This is indeed the case, as indicated by the following theorem.

Theorem 11.2 Let Σ be an alphabet. Then ZΣ ⊆ TΣ . That is, every type 0 language is Turing acceptable.
Proof. We justify that, given any type 0 grammar G = 〈Σ, Γ, S, P〉, there must be a Turing machine T_G
that is equivalent to G. As with the suggested conversion of a PDA to a Turing machine, T_G will employ two
tapes and nondeterminism. The first tape again holds the input, which will be compared to the sentential
form generated on the second tape. The second tape begins with only the start symbol on an otherwise
blank tape. The finite-state control is responsible for nondeterministically guessing the proper sequence of
productions to apply, and with each guess, the second tape is modified to reflect the new sentential form.
If at some point the sentential form agrees with the contents of the first tape, the machine prints Y and

halts. A guess will consist of choosing both an arbitrary position within the current sentential form and
a particular production to attempt to substitute for the substring beginning at that position. Only words
that can be generated by the grammar will have a sequence of moves that produces Y , and no word that
cannot be generated will be accepted. Thus, the new Turing machine is equivalent to G.

Example 11.9
Consider the context-sensitive grammar G = 〈{a, b, c}, {Z, S, A, B, C}, Z, P〉, where P contains the productions

1. Z → λ
2. Z → S
3. S → SABC
4. S → ABC
5. AB → BA
6. BA → AB
7. CB → BC
8. BC → CB
9. CA → AC
10. AC → CA
11. A → a
12. B → b
13. C → c

It is quite easy to show that L(G) = {x ∈ {a, b, c}∗ | |x|a = |x|b = |x|c} by observing that no production changes the relative numbers of (lowercase and capital) a's, b's, and c's, and the six context-sensitive rules allow them to be arbitrarily reordered. One of the attempted “guesses” made by the Turing machine T_G concerning how the productions might be applied is:

Use (2) beginning at position 1.
Use (4) beginning at position 1.
Use (6) beginning at position 2. . . .

This would lead to a failed attempt, since it corresponds to Z ⇒ S ⇒ ABC, and the substring BC beginning at position 2 does not match BA, the left side of rule 6. On the other hand, there is a pattern of guesses that would cause the following sequence of symbols to appear on the second tape:

Z ⇒ S ⇒ ABC ⇒ BAC ⇒ BCA ⇒ BcA ⇒ Bca ⇒ bca

This would lead to a favorable comparison if bca was the word on the input tape. Note that the Tur-
ing machine may have to handle shifting over existing symbols on the scratch tape to accommodate

increases in the size of the sentential form. Since type 0 grammars allow length-reducing productions,
the machine may also be required to shrink the sentential form when a string of symbols is replaced by
a smaller string.
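A "guess" by T_G is thus a pair: a position in the sentential form and a production to try there. Applying it is ordinary string rewriting, which grows or shrinks the string exactly as the scratch tape must. A sketch (names ours; positions here are 0-based, unlike the 1-based positions in the guesses above):

def try_apply(sentential, pos, lhs, rhs):
    if sentential[pos:pos + len(lhs)] != lhs:
        return None                 # a failed guess, as with rule 6 above
    return sentential[:pos] + rhs + sentential[pos + len(lhs):]

# try_apply('ABC', 1, 'BA', 'AB') is None (the failed attempt), while
# try_apply('ABC', 0, 'AB', 'BA') == 'BAC' (the first step toward bca).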
A rather nice feature of type 1 languages is that the length of the sentential form could never decrease
(except perhaps for the application of the initial production Z → λ), and hence sentential forms that
become longer than the desired word are known to be hopeless. All context-sensitive (that is, type 1)
languages can therefore be recognized by a Turing machine that uses an amount of tape proportional to
the length of the input string, as outlined below.

Definition 11.5 A linear bounded automaton (LBA) is a nondeterministic Turing machine that recognizes words over an alphabet Σ given by the quintuple M = 〈Σ, Γ, S, s₀, δ〉, where

Σ is the input alphabet.
Γ is the auxiliary alphabet containing the special markers < and >, and Σ, Γ, and {L, R} are pairwise disjoint sets (and thus <, > ∉ Σ).
S is a finite nonempty set of states (and S ∩ (Σ ∪ Γ) = ∅).
s₀ is the start state (s₀ ∈ S).
δ is the state transition function δ: S × (Σ ∪ Γ) → (S ∪ {h}) × (Σ ∪ Γ ∪ {L, R}), where
(∀s ∈ S)(δ(s, <) = 〈q, R〉 for some q ∈ S ∪ {h}), and
(∀s ∈ S)(δ(s, >) = 〈q, L〉 for some q ∈ S ∪ {h}, or δ(s, >) = 〈h, Y〉 or δ(s, >) = 〈h, N〉)

That is, the automaton cannot move left of the symbol < nor overwrite it. The LBA likewise cannot move
right of the symbol >, and it can only overwrite it with Y or N just prior to halting. The symbols #, L, R, Y,
and N retain their former meaning, although # can be dropped from Γ since it will never be scanned. As
implied by the following definition, the special markers < and > are intended to delimit the input string,
and Definition 11.5 ensures that the automaton cannot move past these limits. As has been seen, the
use of several tracks can easily multiply the amount of information that can be stored in a fixed amount
of space, and thus the restriction is essentially that the amount of available tape is a linear function
of the length of the input string. In practice, any Turing machine variant for which each tape head is
constrained to operate within an area that is a multiple of the length of the input string is called a linear
bounded automaton.
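Under the dictionary encoding of the earlier sketches, the endmarker restrictions of Definition 11.5 are mechanical to check; the helper below (ours) simply audits a transition table:

def check_lba_delta(delta):
    for (state, symbol), (target, action) in delta.items():
        if symbol == '<':
            # < may never be overwritten, nor may the head move past it
            assert action == 'R', "on <, the only legal action is R"
        elif symbol == '>':
            # > may be overwritten only by Y or N, and only when halting
            assert action == 'L' or (target == 'h' and action in 'YN'), \
                "on >, move left or halt printing Y or N"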

Definition 11.6 For a linear bounded automaton M = 〈Σ, Γ, S, s₀, δ〉, the language accepted by M, denoted by L(M), is L(M) = {x ∈ Σ∗ | <s₀x> ⊢* <xhY}. A language accepted by a linear bounded automaton is called a linear bounded language (LBL).

Note that while the endmarkers must enclose the string x, it is the word x (rather than <x>) that is considered to belong to L(M). As before, other criteria for acceptance are equivalent to Definition 11.6.
The set of all words for which an LBA merely halts can be shown to be an LBL according to the above
definition. The following example illustrates a linear bounded automaton that is intended to recognize
all words that cause the machine to print Y at the end of the (obliterated) word. Example 11.13 illustrates
a general technique for restoring the input word, producing an LBA that accepts according to Definition
11.6.

Example 11.10
Consider the machine L shown in Figure 11.11 and the input string babcca. The following steps show that <s₀babcca> ⊢* <XXXXXXhY.

Figure 11.11: The Turing machine discussed in Example 11.10

<s₀babcca> ⊢ <bs₁abcca> ⊢ <bs₂Xbcca> ⊢ <s₂bXbcca> ⊢
s₂<bXbcca> ⊢ <s₃bXbcca> ⊢ <s₄XXbcca> ⊢ s₄<XXbcca> ⊢
<s₅XXbcca> ⊢ <Xs₅Xbcca> ⊢ <XXs₅bcca> ⊢ <XXbs₅cca> ⊢
<XXbs₆Xca> ⊢ <XXs₆bXca> ⊢ <Xs₆XbXca> ⊢ <s₆XXbXca> ⊢
s₆<XXbXca> ⊢ <s₀XXbXca> ⊢ <Xs₀XbXca> ⊢ <XXs₀bXca> ⊢
<XXbs₁Xca> ⊢* s₂<XXbXcX> ⊢*
<XXs₃bXcX> ⊢* s₄<XXXXcX> ⊢*
<XXXXs₅cX> ⊢* s₆<XXXXXX> ⊢*
<XXXXXXs₀> ⊢ <XXXXXXhY

Definition 11.7 For a given alphabet Σ, let LΣ be the collection of all linear bounded languages, and let
OΣ be the collection of all context-sensitive (type 1) languages.

The proof of Theorem 11.2 can be modified to show that all context-sensitive languages can be rec-
ognized by linear bounded automata. Since context-sensitive languages do not contain contracting pro-
ductions, no sentential forms that are longer than the desired word need be considered. Consequently,
the two-tape Turing machine in Theorem 11.2 can operate as a linear bounded automaton. The first
tape with the input word never changes and thus satisfies the boundary restriction, while the finite-state
control can simply abort any computation on the second tape that violates the length restriction. Just as
Theorem 11.2 showed that ZΣ ⊆ TΣ we now have a relationship between another pair of cognitive and
generative classes.

Theorem 11.3 Let Σ be an alphabet. Then OΣ ⊆ LΣ . That is, every type 1 language is a LBL.
Proof. The proof follows from the formalization of the above discussion (see the exercises).

Figure 11.12: The Turing machine discussed in Example 11.11

We have argued that every type 0 grammar must have an equivalent Turing machine, and it can con-
versely be shown that every Turing-acceptable language can be generated by a type 0 grammar. To do
this, it is most convenient to use the very restrictive criteria for a Turing-acceptable language given in
Definition 11.3, in which the original input string is not destroyed. For Turing machines which behave in
this fashion, the descriptions of the device configurations bear a remarkable resemblance to the deriva-
tions in a grammar.

Example 11.11

Consider again the language {x ∈ {a, b, c}∗ | |x|a = |x|b = |x|c}. As discussed in Example 11.3, the Turing machine in Figure 11.4 destroys the word originally on the input tape. Figure 11.12 depicts a slightly more complex Turing machine that restores the original word just prior to acceptance. It will (fortunately) not generally be necessary for our purposes to restore rejected words, since there are intricate languages for which this is not always possible. The modified quintuple is T = 〈{a, b, c}, {#, A, B, C, Y, N}, {s₀, s₁, s₂, s₃, s₄, s₅, s₆, s₇, s₈}, s₀, δ〉, where δ is as indicated in the diagram in Figure 11.13. “Saving” the original input string is accomplished by replacing occurrences of the different letters by distinct symbols and restoring them later. The implementation reflects one of the first uses suggested for multiple-track machines: using the second track to check off input symbols. For legibility, an a with a check mark above it is denoted by A, while an a with no check mark remains an a. Similarly, checked b's are represented by B and checked c's by C. Thus, if the string BAbCca were on a two-track tape employing check marks, it would look like

✓✓ ✓
babcca

The additional states s₇ and s₈ essentially erase the check marks just before halting by replacing A with a, B with b, and C with c.
Consider again the input string babcca processed by the Turing machine in Example 11.3. It is also accepted by this Turing machine because the following steps show that s₀babcca ⊢* babccahY. Note how closely the steps correspond with those in Example 11.3. The sequence below also illustrates how s₇ converts the string back to lowercase, after which s₈ returns the tape head to the right for acceptance.

Figure 11.13: The state transition diagram discussed in Example 11.11

s₀babcca ⊢ bs₁abcca ⊢ bs₂Abcca ⊢ s₂bAbcca ⊢
s₂#bAbcca ⊢ s₃bAbcca ⊢ s₄BAbcca ⊢ s₄#BAbcca ⊢
s₅BAbcca ⊢ Bs₅Abcca ⊢ BAs₅bcca ⊢ BAbs₅cca ⊢
BAbs₆Cca ⊢ BAs₆bCca ⊢ Bs₆AbCca ⊢ s₆BAbCca ⊢
s₆#BAbCca ⊢ s₀BAbCca ⊢ Bs₀AbCca ⊢ BAs₀bCca ⊢
BAbs₁Cca ⊢* s₂#BAbCcA ⊢*
BAs₃bCcA ⊢* s₄#BABCcA ⊢*
BABCs₅cA ⊢* s₆#BABCCA ⊢*
BABCCAs₀# ⊢ BABCCs₇A ⊢ BABCCs₇a ⊢ BABCs₇Ca ⊢
BABCs₇ca ⊢* s₇babcca ⊢ s₇#babcca ⊢
s₈babcca ⊢ bs₈abcca ⊢* babccas₈# ⊢
babccahY

If occurrences of the machine transition symbol ⊢ are replaced by the derivation symbol ⇒, the above sequence would look remarkably like a derivation in a type 0 grammar. Indeed, we would like to construct a grammar in which sentential forms like bs₁abcca could be derived from s₀babcca in one step. Since the machine changed configurations because of the transition rule δ(s₀, b) = 〈s₁, R〉, this transition should have a corresponding production of the form s₀b → bs₁. Each transition in the Turing machine will be responsible for similar productions.
Unfortunately, the correspondence between transition rules and productions is complicated by the fact that the tape head may occasionally scan blank cells, which must then be added to the sentential form. The special characters [ and ] will bracket the sentential form throughout this stage of the derivation and will indicate the current left and right limits of the tape head travel, respectively. Attempting to

move left past the conceptual position of [ (or right past the position of ] ) will result in the addition of a
blank symbol to the sentential form.
To generate the words accepted by a Turing machine, our grammar will randomly generate a word
over Σ, delimit it by brackets, and insert the symbol for the start state at the left edge. The rules derived
from the transitions should then be able to transform a string such as [s₀babcca#] into [#babccahY].
Since only the letters in Σ will be considered terminal symbols, the symbols [, ], #, and Y are nontermi-
Since only the letters in Σ will be considered terminal symbols, the symbols [, ], # , and Y are nontermi-
nals, and the derivation will not yet be complete. To derive terminal strings for just the accepted words,
the presence of Y will allow further productions to delete those remaining nonterminals.

Definition 11.8 Given a Turing machine M = 〈Σ, Γ, S, s 0 , δ〉, the grammar corresponding to M , G M , is
given by G M = 〈Σ, Γ ∪ S ∪ {Z ,W,U ,V, [, ]}, Z , P M 〉, where P M contains the following classes of productions:
1. Z → [W # ] ∈ P M
(∀a ∈ Σ)([W → [W a ∈ P M )
W → s0 ∈ P M

2. Each printing transition gives rise to a production rule as follows:

(∀s ∈ S)(∀t ∈ S ∪ {h})(∀a, b ∈ Σ ∪ Γ)(if δ(s, a) = 〈t, b〉, then sa → tb ∈ PM)

Each move right gives rise to a production rule as follows:

(∀s, t ∈ S)(∀a ∈ Σ ∪ Γ)(if δ(s, a) = 〈t, R〉, then sa → at ∈ PM)

If a = #, an additional production is needed:

(∀s, t ∈ S)(if δ(s, #) = 〈t, R〉, then s] → #t] ∈ PM)

Each move left gives rise to a production rule as follows:

(∀s, t ∈ S)(∀a ∈ Σ ∪ Γ)
(if δ(s, a) = 〈t, L〉, then [sa → [t#a ∈ PM ∧ (∀d ∈ Σ ∪ Γ)(dsa → tda ∈ PM))

3. hY → U ∈ PM
U# → U ∈ PM
U] → V ∈ PM
(∀a ∈ Σ)(aV → V a ∈ PM)
#V → V ∈ PM
[V → λ ∈ PM
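Although the text never presents it this way, Definition 11.8 is mechanical enough to phrase as a short program. The sketch below is our illustration, not part of the original development; it assumes the machine is encoded as a Python dictionary delta mapping (state, symbol) pairs to (state, action) pairs, with states written as strings such as "s0", the halt state as "h", and an action being either a symbol to print or one of the markers "L" and "R".

```python
# A sketch of emitting P_M from delta (our encoding assumptions, not the
# book's).  Sentential-form strings are built by plain concatenation, so
# "s0" + "b" plays the role of the form s0b.  sigma and gamma are sets.
BLANK, LEFT, RIGHT = "#", "L", "R"

def grammar_productions(sigma, gamma, delta):
    """Return the productions of P_M as (left side, right side) pairs."""
    P = [("Z", "[W" + BLANK + "]"), ("W", "s0")]          # class 1
    P += [("[W", "[W" + a) for a in sigma]
    for (s, a), (t, act) in delta.items():                # class 2
        if act == RIGHT:                                  # sa -> at
            P.append((s + a, a + t))
            if a == BLANK:                                # s] -> #t]
                P.append((s + "]", BLANK + t + "]"))
        elif act == LEFT:                                 # [sa -> [t#a, dsa -> tda
            P.append(("[" + s + a, "[" + t + BLANK + a))
            P += [(d + s + a, t + d + a) for d in sigma | gamma]
        else:                                             # print b: sa -> tb
            P.append((s + a, t + act))
    P += [("hY", "U"), ("U" + BLANK, "U"), ("U]", "V"),   # class 3 clean-up
          (BLANK + "V", "V"), ("[V", "")]                 # [V -> lambda
    P += [(a + "V", "V" + a) for a in sigma]
    return P
```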

The rules in class 1 are intended to generate all words of the form [s0x#], where x is an arbitrary
member of Σ∗. The remaining rules are defined in such a way that only those strings x that are recognized

by M can successfully produce a terminal string. Note that once W is replaced by s 0 neither Z nor W can
appear in a later sentential form. After s 0 is generated, the rules in class 2 may apply. It can be inductively
argued that the derivations arising from the application of these rules directly reflect the changes in the
configuration of the Turing machine (see Theorem 11.4).
None of the class 3 productions can be used until the point at which the halt state would be reached in
the corresponding computation. Since h ∉ S, none of the class 2 productions can then be used. Only if Y
was written to the tape as the Turing machine halted will the production hY → U be applicable. U will then
delete the trailing blanks and ] from the sentential form, and then V will percolate to the left, removing
the leading blanks and the final nonterminal [, leaving only the terminal string x in the (completed)
sentential form. The following example illustrates a derivation stemming from a typical Turing machine.

Example 11.12
Consider the Turing machine T in Figure 11.13 and the corresponding grammar G T . Among the many
possible derivations involving the class 1 productions is

Z ⇒ [W #] ⇒ [W a#] ⇒ [W ca#] ⇒ [W cca#] ⇒ [W bcca#] ⇒ [W abcca#]
⇒ [W babcca#] ⇒ [s0babcca#]

Only class 2 productions apply at this point, and there is exactly one applicable production at each step
in the following sequence.

[s0babcca#] ⇒ [b s1abcca#] ⇒ [b s2Abcca#] ⇒ [s2bAbcca#] ⇒
[s2#bAbcca#] ⇒ [# s3bAbcca#] ⇒ [# s4BAbcca#] ⇒ [s4#BAbcca#] ⇒
[# s5BAbcca#] ⇒ [#B s5Abcca#] ⇒ [#BA s5bcca#] ⇒ [#BAb s5cca#] ⇒
[#BAb s6Cca#] ⇒ [#BA s6bCca#] ⇒ [#B s6AbCca#] ⇒ [# s6BAbCca#] ⇒
[s6#BAbCca#] ⇒ [# s0BAbCca#] ⇒ [#B s0AbCca#] ⇒ [#BA s0bCca#] ⇒
[#BAb s1Cca#] ⇒* [s2#BAbCcA#] ⇒*
[#BA s3bCcA#] ⇒* [s4#BABCcA#] ⇒*
[#BABC s5cA#] ⇒* [s6#BABCCA#] ⇒*
[#BABCCA s0#] ⇒ [#BABCC s7A#] ⇒ [#BABCC s7a#] ⇒ [#BABC s7Ca#] ⇒
[#BABC s7ca#] ⇒* [# s7babcca#] ⇒ [s7#babcca#] ⇒
[# s8babcca#] ⇒ [#b s8abcca#] ⇒* [#babcca s8#] ⇒
[#babccahY ]

In Turing machines where the tape head travels further afield, there may be many more blanks enclosed
within the brackets. At this point, the class 3 productions take over to tidy up the string:

[#babccahY ] ⇒ [#babccaU ] ⇒ [#babccaV ⇒ [#babccV a ⇒ [#babcV ca ⇒
[#babV cca ⇒ [#baV bcca ⇒ [#bV abcca ⇒ [#V babcca ⇒ [V babcca ⇒ babcca

As expected, babcca ∈ L(G T ).


It is interesting to observe that the only stage at which a choice of productions is available is during
the replacement of the nonterminal W. Once a candidate string is so chosen, the determinism of the
Turing machine forces the remainder of the derivation to be unique. This is true even for strings that
are not accepted by the Turing machine: if class 2 productions are applied to [s0baa#], there is ex-
actly one derivation sequence for this sentential form, and it leads to [#BAa s5#] and then [#BAahN ]. No
productions apply to this sentential form, and thus no terminal string will be generated. The relation-
ship between strings accepted by the Turing machine and the strings generated by the corresponding
grammar is at the heart of the following theorem.

Theorem 11.4 Let Σ be an alphabet. Then TΣ ⊆ ZΣ . That is, every Turing-acceptable language can be
generated by a type 0 grammar.
Proof. Let M be a Turing machine M = 〈Σ, Γ, S, s 0 , δ〉, and let

L(M ) = {x ∈ Σ∗ | s0x ⊢* xhY },

as specified in the most restrictive sense of a Turing-acceptable language (Definition 11.3). Consider the
grammar G M corresponding to M , as given in Definition 11.8. The previous discussion of G M provided a

general sense of the way in which the productions could be used and justified that they could not be com-
bined in unexpected ways. A rigorous proof requires an explicit formal statement of the general properties
that have been discussed. A trivial induction on the length of x shows that, by using just the productions in
class 1,

(∀x ∈ Σ∗)(Z ⇒* [s0x#])

Another induction argument establishes the correspondence between sequences of applications of the
class 2 productions and sequences of moves in the Turing machine. Specifically, by inducting on the num-
ber of transitions, it can be shown that

(∀s, t ∈ S ∪ {h})(∀α, β, γ, ω ∈ (Σ ∪ Γ)∗)
(αsβ ⊢* γtω iff (∃i, j, m, n ∈ N)([#^i αsβ#^j ] ⇒* [#^m γtω#^n ]))

The actual number of padded blanks is related to the extent of the tape head movement, but this is not im-
portant for our purposes. The essential observation is that a move sequence in M is related to a derivation
sequence in G M , with perhaps some change in the number of blanks at either end. The above statement
was stated in full generality to facilitate the induction proof (see the exercises). We will apply it in a very
limited sense, as stated below.

(∀x ∈ Σ∗)(s0x ⊢* xhY iff (∃m, n ∈ N)([s0x#] ⇒* [#^m xhY #^n ]))

Observe that the productions in class 3 cannot be used unless hY appears on the tape after a finite number
of steps. As discussed earlier, the presence of hY triggers the class 3 productions, which remove all the
remaining nonterminals. Thus,

(∀x ∈ Σ∗)(s0x ⊢* xhY iff Z ⇒* [s0x#] ⇒* [#^m xhY #^n ] ⇒* x)

which implies that L(M ) = L(G M ).

Since every Turing machine has an equivalent type 0 grammar and every type 0 grammar generates
a Turing-acceptable language, we have two ways of representing the same class of languages.

Corollary 11.1 The class of languages generated by type 0 grammars is exactly the Turing-acceptable lan-
guages. That is, ZΣ = TΣ .
Proof. The proof follows immediately from Theorems 11.2 and 11.4.

As will be seen in Chapter 12, the linear bounded languages are a distinctly smaller class than the
Turing-acceptable languages. Theorem 11.3 showed that OΣ ⊆ LΣ , and a technique similar to that used
in Theorem 11.4 will show that LΣ ⊆ OΣ . That is, we can show that every linear bounded automaton
has an equivalent context-sensitive grammar. Note that the class 1 and 2 productions in Definition 11.8
contained no contracting productions; it was only when the class 3 productions were applied that the
sentential form might shrink. When dealing with linear bounded automata, the tape head is restricted
to the portion of the tape containing the input string, so there will be no extraneous blanks to delete.
The input word on the tape of a linear bounded automaton is bracketed by distinct symbols < and >,
which might be used in the corresponding grammar in a fashion similar to [ and ]. These would be
immovable in the sense that no new blanks would be inserted between them and the rest of the bracketed
word. Unfortunately, in Definition 11.8 the delimiters [ and ] must eventually disappear, shortening the
sentential form. No such shrinking can occur if we hope to produce a context-sensitive grammar.

Figure 11.14a: A three-track Turing machine employing delimiters

Figure 11.14b: An accepting configuration

To overcome this difficulty, it is useful to imagine a three-track tape with the input word on the middle
track and the delimiter - on the upper track of the tape above the first symbol of the word. Another - will
occur on the lower track below the last character of the input string. These markers will serve as guides to
prevent the tape head from moving past the limits of the input word. For example, if the linear bounded
automaton contained the word <babcca> on its input tape, the tape for the corresponding three-track
automaton would be as pictured in Figure 11.14a. If the word were accepted, the tape would eventually
reach the configuration shown in Figure 11.14b as it halted, printing Y on the lower track. It is a relatively
simple task to convert a linear bounded automaton into a three-track automaton, where the tape head
never moves left of the tape cell with the - in the upper track, and never moves right of the cell with
the - in the lower track (see the exercises). We will refer to such an automaton as a strict linear bounded
automaton. The definitions used will depend on the upper and lower track markers occurring in different
cells, which makes the representation of words of length less than two awkward. Since this construct is
motivated by a need to find a context-sensitive grammar, we will simply modify the resulting grammar
to explicitly generate any such short words and not rely on the above formalism.

Example 11.13
Consider the linear bounded automaton discussed in Example 11.10, which accepted {x ∈ {a, b, c}∗ | |x|a =
|x|b = |x|c }. As suggested by the exercises, this can be modified to form the three-track strict linear
bounded automaton shown in Figure 11.15, which accepts {x ∈ {a, b, c}∗ | |x| ≥ 2 ∧ |x|a = |x|b = |x|c }. To
avoid explicitly mentioning the three tracks, a cell containing b on the middle track and − on the upper
track is denoted by the single symbol b̄, a cell containing A on the middle track and − on the lower track
is shown as A̲, and so on. Thus, the six original symbols in {a, b, c, A, B, C } give rise to six other symbols
employing the overbar, six more using the underscore, and some symbols indicating acceptance (or
possibly rejection), such as aY (or CN ). For clarity, only those combinations that can actually occur in
a transition sequence are shown in Figure 11.15. The sequence of moves that would transform the tape
from the configuration shown in Figure 11.14a to that of Figure 11.14b is shown below.

s0b̄abcca̲ ⊢ b̄ s1abcca̲ ⊢ b̄ s2Abcca̲ ⊢ s2b̄Abcca̲ ⊢
s3b̄Abcca̲ ⊢ s4B̄Abcca̲ ⊢
s5B̄Abcca̲ ⊢ B̄ s5Abcca̲ ⊢ B̄A s5bcca̲ ⊢ B̄Ab s5cca̲ ⊢
B̄Ab s6Cca̲ ⊢ B̄A s6bCca̲ ⊢ B̄ s6AbCca̲ ⊢ s6B̄AbCca̲ ⊢
s0B̄AbCca̲ ⊢ B̄ s0AbCca̲ ⊢ B̄A s0bCca̲ ⊢
B̄Ab s1Cca̲ ⊢* s2B̄AbCcA̲ ⊢*
B̄A s3bCcA̲ ⊢* s4B̄ABCcA̲ ⊢*
B̄ABC s5cA̲ ⊢* s6B̄ABCCA̲ ⊢*
B̄ABCC s0A̲ ⊢ B̄ABCC s7a̲ ⊢ B̄ABC s7Ca̲ ⊢
B̄ABC s7ca̲ ⊢* s7B̄abcca̲ ⊢
s8b̄abcca̲ ⊢ b̄ s8abcca̲ ⊢* b̄abcc s8a̲ ⊢
b̄abcc haY

Consider implementing a grammar similar to that given in Definition 11.8, but applied to a strict lin-
ear bounded automaton incorporating the two delimiting markers on separate tracks. The new symbols
will eliminate the need for [ and ] and avoid the contracting productions that were required to delete [
and ] from the sentential form. The class 3 productions would simply replace a symbol such as aY with
a and b̄ with b.
Unfortunately, it will not be possible to explicitly use distinct symbols to keep track of the state and
the placement of the tape head, as was done with s0, s1, . . . , sn, and h in the previous production sets.
Such an extraneous symbol would have to disappear to form a terminal string, and this must be done in
a way that does not use contracting productions. As with the underscore and overbar, the state name
will be encoded as a subscript attached to one symbol in the sentential form. Thus, each original symbol
d, which has already given rise to the additional nonterminals d̄ and d̲, will also require nonterminals such
as d0, d1, . . . , dn to be added to Γ. The inclusion of di within a sentential form will reflect that the tape
head is currently scanning this d while the finite-state control is in state si. Further symbols will also be
needed; d̄i indicates that the tape head is scanning the leftmost symbol, which happens to be d, while
the finite-state control is in state si, and d̲i indicates a similar situation involving the rightmost symbol.
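One concrete way to carry out this bookkeeping (our illustration; the names below are assumptions, not the book's notation) is to regard each composite nonterminal as a small record of its middle-track letter, its optional track marker, and its optional state subscript:

```python
# A sketch of the composite nonterminals of Definition 11.9, assuming states
# are named by their index and "Y" marks acceptance on the lower track.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Sym:
    letter: str                    # middle-track content, e.g. "b" or "A"
    marker: Optional[str] = None   # "upper" for an overbar, "lower" for an underscore
    state: Optional[str] = None    # "0".."n" if the head is here, or "Y"

b_bar_0 = Sym("b", marker="upper", state="0")   # the symbol written b̄0 below
a_under = Sym("a", marker="lower")              # the symbol written a̲
```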
This plethora of nonterminals can be used to define a context-sensitive grammar that generates the
language recognized by a strict linear bounded automaton. For the automaton given in Example 11.13,
generating the terminal string babcca will begin with the random generation of the six-symbol senten-
tial form b̄0abcca̲ by the class 1 productions, which will be transformed into b̄abcc aY by the class 2
productions, and finally into babcca via the class 3 productions. In the following definition, note that,
by the conditions placed on a strict linear bounded automaton, Γ already contains symbols of the form
Ā and A̲, and hence so will ΓB. For simplicity, the state set is required to be of the form {s0, s1, . . . , sn }, but
clearly the state names of any automaton could be renumbered sequentially to fit the given definition.

Figure 11.15: The Turing machine discussed in Example 11.13

Definition 11.9 Given a strict linear bounded automaton

B = 〈Σ, Γ, {s0, s1, . . . , sn }, s0, δ〉,

the context-sensitive grammar corresponding to B, GB, is given by

GB = 〈Σ, ΓB, Z, PB〉,

where ΓB is given by

ΓB = Γ ∪ {di | d ∈ Σ ∪ Γ, i = 0, 1, . . . , n, or i = Y } ∪ {Z, S, W }

PB contains the following classes of productions:
1. If λ ∈ L(B), then Z → λ ∈ PB
Z → S ∈ PB
(∀d ∈ Σ)(if d ∈ L(B), then S → d ∈ PB)
(∀d ∈ Σ)(S → W d̲ ∈ PB)
(∀d ∈ Σ)(W → W d ∈ PB)
(∀d ∈ Σ)(W → d̄0 ∈ PB)

2. Each printing transition gives rise to a production rule as follows:

(∀si, sj ∈ S)(∀a, b ∈ Σ ∪ Γ)(if δ(si, a) = 〈sj, b〉, then ai → bj ∈ PB)

Each move right gives rise to a production rule as follows:

(∀si, sj ∈ S)(∀a ∈ Σ ∪ Γ)(if δ(si, a) = 〈sj, R〉, then (∀d ∈ Σ ∪ Γ)(ai d → a dj ∈ PB))

Each move left gives rise to a production rule as follows:

(∀si, sj ∈ S)(∀a ∈ Σ ∪ Γ)(if δ(si, a) = 〈sj, L〉, then (∀d ∈ Σ ∪ Γ)(d ai → dj a ∈ PB))

Each halt with acceptance gives rise to a production rule as follows:

(∀si ∈ S)(∀b ∈ Σ ∪ Γ)(∀a ∈ Σ)(if δ(si, b) = 〈h, aY〉, then bi → aY ∈ PB)

3. (∀a, b ∈ Σ)(b aY → bY a ∈ PB)
(∀a, b ∈ Σ)(b̄ aY → ba ∈ PB)

Example 11.14
Consider again the strict linear bounded automaton B given in Figure 11.15 and the corresponding
context-sensitive grammar GB. The following derivation sequences show that babcca ∈ L(GB):

Z ⇒ S ⇒ W a̲ ⇒ W ca̲ ⇒ W cca̲ ⇒ W bcca̲ ⇒ W abcca̲ ⇒ b̄0abcca̲

At this point, only the class 2 productions can be employed, yielding:

b̄0abcca̲ ⇒ b̄ a1bcca̲ ⇒ b̄ A2bcca̲ ⇒ b̄2Abcca̲ ⇒
b̄3Abcca̲ ⇒ B̄4Abcca̲ ⇒
B̄5Abcca̲ ⇒ B̄ A5bcca̲ ⇒ B̄A b5cca̲ ⇒ B̄Ab c5ca̲ ⇒
B̄Ab C6ca̲ ⇒ B̄A b6Cca̲ ⇒ B̄ A6bCca̲ ⇒ B̄6AbCca̲ ⇒
B̄0AbCca̲ ⇒ B̄ A0bCca̲ ⇒ B̄A b0Cca̲ ⇒
B̄Ab C1ca̲ ⇒* B̄2AbCcA̲ ⇒*
B̄A b3CcA̲ ⇒* B̄4ABCcA̲ ⇒*
B̄ABC c5A̲ ⇒* B̄6ABCCA̲ ⇒*
B̄ABCC A̲0 ⇒ B̄ABCC a̲7 ⇒ B̄ABC C7a̲ ⇒
B̄ABC c7a̲ ⇒* B̄7abcca̲ ⇒
b̄8abcca̲ ⇒ b̄ a8bcca̲ ⇒* b̄abcc a̲8

Finally, since δ(s8, a̲) = 〈h, aY〉, the class 3 productions can then be applied:

b̄abcc a̲8 ⇒ b̄abcc aY ⇒ b̄abc cY a ⇒ b̄ab cY ca ⇒ b̄a bY cca ⇒ b̄ aY bcca ⇒ babcca

Once again, the grammars springing from Definition 11.9 can generate sentential forms correspond-
ing to any string in Σ∗ , as long as the length of the string is at least two. As with the grammars arising from
Definition 11.8, only strings that would have been accepted by the original machine will lead to a termi-
nal string. If the productions of this example were applied to the sentential form b̄0aa̲, at each step there
will be exactly one choice of applicable production, until eventually the form B̄A a̲5 is obtained. At this
step, no production will apply, and therefore a terminal string cannot be generated from b̄0aa̲. This cor-
respondence between words accepted by the machine B and words generated by the context-sensitive
grammar GB given in Definition 11.9 is the foundation of the following theorem.

Theorem 11.5 Let Σ be an alphabet. Then LΣ ⊆ OΣ . That is, every linear bounded language can be gen-
erated by a type 1 grammar.

Proof. Any linear bounded language can be recognized by a strict linear bounded automaton (see
the exercises). Hence, if L is a linear bounded language, there exists a strict linear bounded automaton
B = 〈Σ, Γ, {s 0 , s 1 , . . . , s n }, s 0 , δ〉 which accepts exactly the words in L by printing Y on the lowest of the three
tracks after restoring the original word to the middle track. We will employ the grammar G B corresponding
to B , as given in Definition 11.9. Example 11.14 illustrated that these productions can be used in a manner
similar to those of Definition 11.8, and it is easy to justify that they cannot be combined in unexpected ways.
Induction on the length of x will show that by using just the productions in class 1,

(∀x ∈ Σ∗)(∀a, b ∈ Σ)(Z ⇒* ā0xb̲)

The correspondence between sequences of applications of the class 2 productions and sequences of
moves in B follows as in Theorem 11.4. Due to the myriad positions that the integer subscript can oc-
cupy, and the special cases caused by the presence of the overbars and underscores, the general induction
statement is quite tedious to state and is left as an exercise. The statement will again be applied to the
special case in which we are interested, as stated below.

(∀x ∈ Σ∗)(∀a, b ∈ Σ)(s0 āxb̲ ⊢* āx hbY iff ā0xb̲ ⇒* āxbY)

A final induction argument will show that āxbY ⇒* axb. Thus,

(∀x ∈ Σ∗)(∀a, b ∈ Σ)(s0 āxb̲ ⊢* āx hbY iff Z ⇒* ā0xb̲ ⇒* āxbY ⇒* axb)

This establishes the correspondence between words of length at least two accepted by B and those generated
by G B . Definition 11.9 included specific productions of the form Z → λ and S → d to ensure that words of
length 0 and 1 also corresponded. This implies that L(B ) = L(G B ), as was to be shown.

The proof of Theorem 11.5 argues that there exists a context-sensitive grammar G B for each strict
linear bounded automaton B , and it certainly appears that given an automaton B we can immediately
write down all the productions in P B , as specified by Definition 11.9. However, some of the class 1 pro-
ductions may cause some trouble. For example, determining whether the production Z → λ is included
in P B depends on whether the automaton halts with Y when presented with a blank tape. In the next
chapter, we will see that even this simple question cannot be effectively answered for arbitrary Turing
machines! That is, it is impossible to find an algorithm that, when presented with the state diagram of a
Turing machine, can reliably determine whether or not the machine accepts the empty string. It will be
shown that any such proposed algorithm is guaranteed to give the wrong answer for some Turing ma-
chines. Similarly, it now seems that there might be some uncertainty about which members of Σ give rise
to productions of the form S → d .
The productions specified by Definition 11.9 were otherwise quite explicit; only the productions re-
lating to the immediate generation of a single character or the empty string were in any way question-
able. There are only |Σ| + 1 such productions, and some combination of them has to be the correct set
of productions to include in P B . Thus, as stated in the theorem, we are assured that a context-sensitive
grammar does exist, even if we are unclear as to exactly which of these special productions it should
contain.
As will be seen in Chapter 12, it is possible to determine which words are accepted (and which are
rejected) by linear bounded automata. Unlike an unrestricted Turing machine, an LBA has only a finite span
of tape upon which symbols can be placed. Furthermore, there are only a finite number of characters
that can appear in those cells, a finite number of positions the tape head can be in, and a finite number

of states to consider. The limited number of configurations makes it possible to determine exactly which
words of a given size are recognized by the LBA.
We have seen that every linear bounded automaton is equivalent to a strict linear bounded automa-
ton, and these have equivalent type 1 grammars. Conversely, every type 1 grammar generates a linear
bounded language, which implies there is another correspondence between a generative construct and
a cognitive construct.

Corollary 11.2 The class of languages generated by context-sensitive grammars is exactly the linear bounded
languages. That is, LΣ = OΣ .
Proof. The proof follows immediately from Theorems 11.3 and 11.5.

11.4 Closure Properties and The Hierarchy Theorem


Finally, we consider some of the closure properties of the classes of languages explored in this chapter.
Since TΣ = ZΣ we may use either cognitive or generative constructs for this class, whichever is most
convenient. The fact that LΣ = OΣ , will allow similar freedom for the type 1 languages. The next theorem
illustrates a case in which the grammatical construct is the easier to use.

Theorem 11.6 Let Σ be an alphabet. Then TΣ is closed under union.


Proof. If L1 and L2 are two Turing-acceptable languages, then by Theorem 11.4 there are type 0 gram-
mars G1 = 〈Ω1, Σ, S1, P1〉 and G2 = 〈Ω2, Σ, S2, P2〉 that generate L1 and L2. Without loss of generality,
assume that Ω1 ∩ Ω2 = ∅. Choose a new nonterminal Z such that Z ∉ Ω1 ∪ Ω2, and consider the new
type 0 grammar G∪ defined by G∪ = 〈Ω1 ∪ Ω2 ∪ {Z }, Σ, Z, P1 ∪ P2 ∪ {Z → S1, Z → S2}〉. Clearly, L(G∪) =
L(G 1 ) ∪ L(G 2 ). By Theorem 11.2, there is a Turing machine equivalent to G ∪ , and hence L 1 ∪ L 2 is Turing
acceptable.
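The construction is simple enough to state as code; a minimal sketch, assuming each grammar is encoded as a tuple of its nonterminal set, terminal alphabet, start symbol, and set of productions (pairs of strings):

```python
# A sketch of the union grammar of Theorem 11.6.  Renaming nonterminals to
# force the two sets to be disjoint is assumed to have been done already.
def union_grammar(g1, g2, new_start="Z0"):
    omega1, sigma, start1, p1 = g1
    omega2, _, start2, p2 = g2
    assert not (omega1 & omega2), "nonterminal sets must be disjoint"
    assert new_start not in omega1 | omega2
    productions = p1 | p2 | {(new_start, start1), (new_start, start2)}
    return (omega1 | omega2 | {new_start}, sigma, new_start, productions)
```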

Theorem 11.6 could be proved directly by constructing a new Turing machine from Turing machines
T1 and T2 accepting L1 and L2. It is a bit harder to give a concrete proof, and care must be taken to avoid
inappropriate constructions. For example, it would be incorrect to build the new machine in such a way
that it first simulates T1, halting with Y if T1 does, and then simulates T2 if T1 would have halted with
N. It must be remembered that there is no guarantee that a Turing machine will ever halt for a given
word. The above construction would incorrectly reject words that could be recognized by T2 but which
were rejected by T1 because T1 never halted; the new machine would never get a chance to simulate T2.
One valid construction involves a two-tape Turing machine, which immediately copies the input word
onto the second tape. By using a cross product of the states of T1 and T2 and appropriate transitions, the
action of both machines could be simultaneously simulated, and the new machine would accept as soon
as either simulation indicated that the word should be accepted. A slight modification of this construct
would show that TΣ is also closed under intersection, but the next theorem outlines a superior method.

Theorem 11.7 Let Σ be an alphabet. Then TΣ is closed under intersection.


Proof. Let L1 and L2 be two Turing-acceptable languages recognized by the Turing machines T1 and T2,
respectively. We build a new Turing machine T ∩ with T1 and T2 as submachines. T ∩ transfers control to
the submachine T1 . If T1 never halts, the input will be rejected, which is the desired result. If T1 halts, T ∩
erases the Y and moves the tape head back to the leftmost character and transfers control to the subma-
chine T2 . T ∩ will halt if T2 does, and if T2 also accepts, Y will be left in the proper place on the tape. T ∩
therefore accepts if and only if both T1 and T2 accept, and hence L 1 ∩ L 2 is Turing acceptable.
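The control flow of T∩ fits in a few lines. In the sketch below, run_machine is a hypothetical helper, assumed to simulate a Turing machine on an input word and return the final tape contents if the machine halts; like the machines themselves, it may loop forever, which is precisely the behavior the construction relies on:

```python
# A sketch of the chaining in Theorem 11.7 (run_machine is hypothetical).
def intersection_accepts(t1, t2, word):
    tape = run_machine(t1, word)        # simulate T1; this call may never return
    if not tape.endswith("Y"):
        return False                    # T1 halted with N
    # T1 restored the input word and appended Y; erase the Y and hand the
    # word to T2, just as T-intersection transfers control to its submachine.
    return run_machine(t2, word).endswith("Y")
```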

Note that it was important that, except for the presence of Y after the input word, T1 left the tape in
the same condition it found it, with the input string intact for T2 . As with type 3 and type 2 grammars,
there is no pleasant way to combine type 0 grammars to produce a grammar that generates the inter-
section of type 0 languages, although Theorem 11.7 guarantees (along with Corollary 11.1) that such a
grammar must surely exist.

Theorem 11.8 Let Σ be an alphabet. Then TΣ is closed under reversal, homomorphism, inverse homo-
morphism, substitution, concatenation, and Kleene closure.
Proof. The proof for reversal is almost trivial; it is almost as simple as replacing every transition that
moves the tape head to the right with a transition to the left, and likewise making left moves into right
moves. This will yield a mirror-image machine which, when started at the rightmost character, will print
Y just past the leftmost character. We therefore have to modify this machine by adding preliminary states
that will move the tape head from its traditional leftmost starting position to the opposite end of the word.
Similarly, just before the Y would be printed, we must again move the tape head to the right.
The description of the modifications necessary to convert a type 0 grammar into one that generates the
reverse of the original is even more succinct: each rule in the original grammar is modified by writing the
characters to the left of the production symbol → backward, and similarly reversing the string on the right
of →. That is, a production like Dc → ABe would become cD → eBA. A relatively trivial induction on the
number of steps in a derivation proves that the new grammar generates the reverse of the original language.
The proofs of closure under the remaining operators are left for the exercises.

As shown in Chapter 12, there are some operators under which TΣ is not closed. Complementation
is perhaps the most glaring exception. The closure properties of LΣ are very similar to those of TΣ . In
most cases, slight modifications of the above proofs carry over to the type 1 languages.

Theorem 11.9 Let Σ be an alphabet. Then OΣ is closed under reversal, homomorphism, inverse homo-
morphism, substitution, concatenation, union, and intersection.
Proof. Both proofs given for reversal carry over without modification. In the cognitive approach, the
states added to the mirror image Turing machine keep the tape head within the confines of the input
word, and hence if the original machine was an LBA, the new version will also be an LBA. In the generative
approach, reversing the characters in type 1 productions still results in a type 1 grammar. That is, if the
original grammar had no contracting productions, neither will the new grammar.
Proving that the union of two type 1 languages is type 1 is similar to the proof given in Theorem 11.6,
although care must be taken to avoid extraneous productions of the form Z1 → λ. Building an intersec-
tion machine from two linear bounded automata can be done exactly as described in Theorem 11.7. The
remaining closure properties are left for the exercises.

It is clear from our definitions that OΣ ⊆ ZΣ , but we have yet to prove that OΣ ⊂ ZΣ . That the inclu-
sion is proper and ZΣ is truly a larger class than OΣ will be shown to be a consequence of the material
considered in Chapter 12. Apart from this one missing piece, we have over the course of several chapters
encountered the major components of the following hierarchy theorem.

Theorem 11.10 Let Σ be an alphabet for which kΣk ≥ 2. Then

DΣ = WΣ = RΣ = GΣ ⊂ UΣ ⊂ CΣ = PΣ ⊂ LΣ = OΣ ⊂ ZΣ = TΣ

Proof. The cognitive power of deterministic and nondeterministic finite automata was shown to be
equivalent in Chapter 4, and their relation to regular expressions was investigated in Chapter 6. These

were all shown to describe the type 3 languages in Chapter 8. In Chapter 9, Theorem 9.1 and Corollary 9.1
showed that the context-free languages (over alphabets with at least two symbols) properly contained the
unambiguous context-free languages, which in turn properly contained the regular languages. In Chapter
10, the (nondeterministic) pushdown automata were shown to recognize exactly the type 2 languages. The
context-sensitive language {x ∈ {a, b, c}∗ | |x|a = |x|b = |x|c } is not context free, so the type 1 languages
properly contain the type 2 languages. In this chapter, the linear bounded automata were shown to
recognize exactly the type 1 languages and Turing machines were shown to accept the type 0 languages.
Corollary 12.4 will show that the type 1 languages are properly included in the type 0 languages.

Exercises
11.1. By making the appropriate analogies for states and input, answer the musical question “How is a
Turing machine like an elevator?” What essential (missing) component prevents an elevator from
modeling a general computing device?

11.2. Let Σ = {a, b, c} and let L = {w | w = w^r }.

(a) Explicitly define a deterministic, one-tape, one-head Turing machine that will recognize L.
(b) Justify that there exists a linear bounded automaton that accepts L.
(c) Describe how nondeterminism or additional tapes and heads might be employed to recog-
nize L.

11.3. Let Σ = {a}. Explicitly define a deterministic, one-tape, one-head Turing machine that will recognize {a^n | n is a perfect square}.

11.4. Let Σ = {a, b, c}.

(a) Explicitly define a deterministic, one-tape, one-head Turing machine that will recognize
{a^k b^n c^m | (k ≠ n) ∧ (n ≠ m)}.
(b) Explicitly define a deterministic, one-tape, one-head Turing machine that will recognize
{x ∈ {a, b, c}∗ | |x|a ≠ |x|b ∧ |x|b ≠ |x|c }.

11.5. (a) Recall that there are several common definitions of acceptance that can be applied to Turing
machines. Design a machine M for which

L(M ) = L1(M ) = L2(M ) = L3(M ) = {x ∈ {a, b, c}∗ | |x|a = |x|b = |x|c }.

(b) For any Turing-acceptable language L, is it always possible to find a corresponding machine
for which L(M ) = L 1 (M ) = L 2 (M ) = L 3 (M ) = L? Justify your answer.

11.6. Let L = {ww | w ∈ {a, b, c}∗ }.

(a) Explicitly define a deterministic, one-tape, one-head Turing machine that will recognize L.
(b) Justify that there exists a linear bounded automaton that accepts L.
(c) Describe how nondeterminism or additional tapes and heads might be employed to recog-
nize L.

11.7. Given an alphabet Σ = {a1, a2, a3, . . . , an }, associate each word with the base n number derived
from the subscripts. Thus, a 3a 2a 4 is associated with 324, a 1 with 1, and λ with 0. These associated
numbers then imply a lexicographic ordering of Σ∗ , with

λ < a1 < a2 < a3 < · · · < a1a1 < a1a2 < a1a3 < · · · < a2a1 < a2a2 < · · · < a1a1a1 < · · ·

(a) Given an alphabet Σ, build a Turing machine that, given an input word x, will replace that
word with the string that follows x in lexicographic order.
(b) Using the machine in part (a) as a submachine, build a Turing machine that will start with
a blank tape and sequentially generate the words in Σ∗ in lexicographic order, erasing the
previous word as the following word is generated.
(c) Using the machine in part (a) as a submachine, build a Turing machine that will start with a
blank tape and sequentially enumerate the words in Σ∗ in lexicographic order, placing each
successive word to the right of the previous word on the tape, separated by a blank.
(d) Explain how these techniques can be used in building a deterministic version of a nondeter-
ministic Turing machine.

11.8. Define a semi-infinite tape as one that has a distinct left boundary but extends indefinitely to the
right, such as those employed by DFAs.

(a) Given a Turing machine satisfying Definition 11.1, define an equivalent two-track Turing ma-
chine with a semi-infinite tape.
(b) Prove that your construction is equivalent to the original.

11.9. Let Σ = {a}. Explicitly define a deterministic, one-tape, one-head Turing machine that will recognize {a^n | n is a power of 2} = {a, aa, aaaa, . . .}.

11.10. Define a three-head Turing machine that accepts {x ∈ {a, b, c}∗ | |x|a = |x|b = |x|c }. Assume that all
three heads start on the leftmost character. Is there any need for any of the heads to ever move left?

11.11. Let Σ be an alphabet. Prove that every context-free language is Turing-acceptable by providing the
details for the construction discussed in Lemma 11.1.

11.12. Let Σ be an alphabet. Prove that every type 1 language is a LBL by providing the details for the
construction discussed in Theorem 11.3.

11.13. Let M = 〈Σ, Γ, S, s 0 , δ〉 be a linear bounded automaton. Show how to convert M into a three-track
automaton that never scans any cells but those containing the original word by:

(a) Explicitly defining the new alphabets.


(b) Explicitly defining the new transitions from the old. (Hint: From any state, an old transition
“leaving” the word to scan one of the delimiters must return to the word in a unique manner.)
(c) Prove that for words of length at least 2 your new strict linear bounded automaton accepts
exactly when M does.

11.14. By adding appropriate new symbols (of the form b̲̄, carrying both the upper- and lower-track markers) and suitable transitions:

(a) Modify the strict linear bounded automaton defined in Exercise 11.13 so that it correctly han-
dles strings of length 1.

(b) Assume that a strict LBA that initially scans a blank is actually scanning an empty tape. If we
expect to handle the empty string, we cannot insist that a strict linear bounded automaton
never scan a cell that is not part of the input string, since the tape head must initially look at
something. If we instead require that the tape head of a strict LBA may never actively move to
a cell that is not part of the input string, then the dilemma is solved. Show that such a strict
LBA can be found for any type 1 language.

11.15. Refer to Theorem 11.4 and show, by inducting on the number of transitions, that

(∀s, t ∈ S ∪ {h})(∀α, β, γ, ω ∈ (Σ ∪ Γ)∗)
(αsβ ⊢* γtω iff (∃i, j, m, n ∈ N)([#^i αsβ#^j ] ⇒* [#^m γtω#^n ]))

11.16. State and prove the general induction statement needed to rigorously prove Theorem 11.5.

11.17. If G = 〈Σ, Γ, Z , P 〉 is a grammar for a type 0 language:

(a) Explain why the following construction may not generate L(G)∗: Choose a new start symbol W,
and form G∗ = 〈Σ, Γ ∪ {W }, W, P ∪ {W → λ, W → W W, W → Z }〉.
(b) Give an example of a grammar that illustrates this flaw.
(c) Given a type 0 grammar G = 〈Σ, Γ, Z, P〉, define an appropriate grammar G∗ that should gen-
erate the Kleene closure of L(G).
(d) Prove that the construction defined in part (c) has the property that L(G∗ ) = L(G)∗ .

11.18. Let Σ be an alphabet. Prove that TΣ is closed under:

(a) Homomorphism
(b) Inverse homomorphism
(c) Concatenation
(d) Substitution

11.19. (a) Show that any Turing machine A 1 accepting L = L 1 (A 1 ) has an equivalent Turing machine
A 2 for which L = L 2 (A 2 ) by explicitly modifying the quintuple for A 1 and proving that your
construction behaves as desired.
(b) Show that any Turing machine A 2 accepting L = L 2 (A 2 ) has an equivalent Turing machine
A 3 for which L = L 3 (A 3 ) by explicitly modifying the quintuple for A 2 and proving that your
construction behaves as desired.

11.20. Let Σ be an alphabet. Prove that OΣ is closed under:

(a) Homomorphism
(b) Inverse homomorphism
(c) Concatenation
(d) Substitution

Chapter 12

Decidability

In this chapter, the nature and limitations of algorithms are explored. We will first look at the general
properties that can be ascertained about finite automata and FAD languages. For example, we might like
to be able to enter the state transition table of a DFA into a suitably sized array and then run a program
that determines whether the DFA was connected. An algorithm for checking this property was outlined
in Chapter 3. Similarly, we have seen that it is possible to write a program to check whether an arbitrary
DFA is minimal. We know this property can be reliably checked because we proved that the algorithms in
Chapter 3 could be applied to ascertain the correct answer for virtually every conceivable DFA. There are
an infinite number of DFAs about which the question can be posed, and yet our algorithm decides the
question correctly in all cases. In the following section we consider questions that can be asked about
more complex languages and machines.
In the latter part of this chapter, we will see that unlike the questions in Sections 12.1 and 12.2, there
are some questions that are in a fundamental sense unanswerable in the general case. That is, there can-
not exist an algorithm that correctly answers such a question in all cases. These questions will be called
undecidable. An undecidable question about Pascal programs is considered in detail in Section 12.3
and is independent of advanced machine theory. The concept of undecidability is addressed formally in
Section 12.4, and other undecidable problems are also presented.

12.1 Decidable Questions About Regular Languages


Recall that a procedure is a finite set of instructions that unambiguously specifies deterministic, discrete
steps for performing some task. In this chapter, the task will generally involve providing the correct
answer to some yes-no question. Most questions that involve a numerical answer can be rephrased as
a yes-no question of similar complexity. For example, the question “What is the minimum number of
states necessary for a DFA to accept the language represented by the regular expression R?” has the yes-
no analog “Does there exist a DFA with fewer than, say, five states that accepts the language represented
by the regular expression R?” Clearly, if we can answer the first question, the second question is easy to
answer. Conversely, if questions like the second one can be answered for any number we wish (rather
than just five), then the answer to the first question can be deduced.
Recall also that an algorithm is a procedure that is guaranteed to halt in all instances. Note that
“guaranteed to halt” does not mean that there is a fixed time limit on how long it may take to finish the
procedure for all inputs; some instances may take far longer than others. For example, the question
“Does there exist a DFA with fewer than ten states that accepts the language represented by ab(b ∪ c)∗?”

will probably take less time to answer than “Does there exist a DFA with fewer than ten states that accepts
the language represented by a∗b((b∗d ∪ c∗b)d ∪ e)∗?”
d ∪ee )∗ ?”
It is important to keep in mind that algorithms are intended to provide a general solution to a vast ar-
ray of similar problems and are (usually) not limited to a single specific instance. As an example, consider
the task of sorting a file containing the three names:

Williams
Jones
Smith

A variety of sorting algorithms, when applied to this file, will produce the correct output. It is also possi-
ble to write a program that ignores its input and always prints the lines

Jones
Smith
Williams

This program does yield the correct answer for the particular problem we wished to solve, and indeed it
solves the sorting problem for all files that contain exactly these three particular names in some arbitrary
order (there are six such files). Thus, this trivial program is an algorithm that solves the sorting problem
for these six specific instances. A slightly more complex program might be capable of printing two or
three distinct answers, depending on the input, and thus solve the sorting problem for an even larger
(but still finite) class of instances.
It should be clear that producing an algorithm that solves a finite set of instances is no great accom-
plishment, since these algorithms are guaranteed to exist. Such an algorithm could be programmed as
one big case statement, which identifies the particular input instance and produces the corresponding
output for that instance. Algorithms that apply to an infinite set of instances are of much more theoreti-
cal and practical interest.

Definition 12.1 Given a set of instances and a yes-no question that can be applied to those instances, we
will say that the question is decidable if there is an algorithm for determining in each instance the (correct)
answer to the question.

A more precise definition of decidability is presented in Section 12.4, based on the perceived rela-
tionship between Turing machines and algorithms. As mentioned earlier, if the set of instances is finite,
an algorithm is guaranteed to exist, no matter how complex the question appears to be.

Example 12.1
A typical set of instances might be the set of all deterministic finite automata over a given alphabet Σ; a
typical question might be whether a given automaton accepts at least one string in Σ∗ .
It is possible to devise an algorithm to correctly answer the question posed in Example 12.1 for every
finite automaton A = 〈Σ, S, s 0 , δ, F 〉. The first idea that might come to mind is to simply look at strings
from Σ∗ in an orderly manner and use δ to determine whether that string is accepted by A; if we find a
string that does reach a final state, it is clear that the answer to the question should be “YES: L(A) ≠ ∅,”
while if we never find a string that is accepted, the answer should be “NO: L(A) = ∅.” This procedure is
guaranteed to halt and give the correct answer if the language is indeed nonempty. However, the proce-
dure will never halt and answer NO (in a finite amount of time) because there are an infinite number of

strings in Σ∗ that must be checked. A modification of this basic idea is necessary to produce a procedure
that will halt under all circumstances (that is, to produce an algorithm).

Theorem 12.1 Given any alphabet Σ and a DFA A = 〈Σ, S, s0, δ, F 〉, it is decidable whether L(A) = ∅.
Proof. Let n = kSk. Since both Σ and S are finite sets,

B = {λ} ∪ Σ ∪ Σ^2 ∪ · · · ∪ Σ^(n−1)

is a finite set, and we can examine each string of this set and still have a procedure that halts. There is
clearly an algorithm for determining the set C of all states that are reached by these few strings. Specifically,

C = {δ(s0, x) | x ∈ Σ∗ ∧ |x| < n} = {δ(s0, x) | x ∈ B }.

Note that Theorem 2.7 implies that if a string (of any length) is accepted by A then there is another string
of length less than n that is also accepted by A. Consequently, it is sufficient to examine only the “short”
strings contained in B rather than examine all of Σ∗. If any of the strings in B lead to a final state (that is,
if C ∩ F ≠ ∅), then the answer to the question is clearly “NO: L(A) is not empty,” while if C ∩ F = ∅, then
Theorem 2.7 guarantees that “YES: L(A) is empty” is the correct answer. We have therefore constructed an
algorithm (which computes C and then examines C ∩ F , both of which can be done in a finite amount of
time) for determining whether the language accepted by a given machine is empty.

The definition of C does not suggest the most efficient algorithm for calculating the set C ; better
strategies are available. The technique is similar to that employed to find the state equivalence relation
E A . C is actually the set of connected states S c , which can be calculated recursively as indicated in Defi-
nition 3.10. Note that Theorem 12.1 answers the question posed in Example 12.1. The set of instances to
which this question applies can easily be expanded. It can be shown that it is decidable whether L(A) = ∅
for any NDFA A by first employing Definition 4.5 to find the equivalent DFA A d and then applying the
method outlined in Theorem 12.1 to that machine. It is possible to find a much more efficient algorithm
for answering this question that does not rely on the conversion to a DFA (see the exercises).
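The connected-state computation amounts to a search of the transition diagram. A minimal sketch, assuming the DFA's transition function is given as a dictionary from (state, symbol) pairs to states:

```python
# A sketch of the emptiness test of Theorem 12.1: compute the set of states
# reachable from s0 and intersect it with the final states F.
def language_is_empty(sigma, s0, delta, final):
    reached, frontier = {s0}, [s0]
    while frontier:
        s = frontier.pop()
        for a in sigma:
            t = delta[(s, a)]
            if t not in reached:
                reached.add(t)
                frontier.append(t)
    return reached.isdisjoint(final)    # empty iff no final state is reachable
```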
Just as the algorithm for converting an NDFA into a DFA allows the emptiness question to be an-
swered for NDFAs, the techniques in Chapter 6 justify that the similar question for regular expressions is
decidable. That is, since every regular expression has an equivalent DFA, the question of whether a reg-
ular expression describes any strings is clearly decidable. Similar extensions can be applied to most of
the results in this section. Just as we can decide whether a DFA A accepts any strings, we can also decide
if A accepts an infinity of strings, as shown by Theorem 12.2. This can be proved by a related appeal to
Theorem 12.1, but an efficient algorithm for answering this question depends on the following lemma.

Lemma 12.1 Let Σ be an alphabet, A = 〈Σ, S, s0, δ, F 〉 be a finite automaton, n = kSk, and M = {x | x ∈
L(A) ∧ |x| ≥ n}. Then, if M ≠ ∅, M must contain a string of minimal length (call it xm), and furthermore
|xm| < 2n.
Proof. The proof is obtained by repeated application of the pumping lemma with i = 0 (see the exercises
and Theorem 2.7).

A question similar to the one posed in Theorem 12.1 is “Does a given DFA accept a finite or an infinite
number of strings?” This is also a decidable question, as demonstrated by the following theorem. The
proof is based on the observation that a DFA A that accepts no strings of length greater than some fixed
constant must by definition recognize a finite set, while the pumping lemma implies that if L(A) contains
a sufficiently long string, then L(A) must contain an infinite number of related strings.

Theorem 12.2 Given any alphabet Σ and a DFA A = 〈Σ, S, s 0 , δ, F 〉, it is decidable whether L(A) is an infi-
nite set.
Proof. Let n = kSk. Clearly, if A accepts no strings of length n or greater, then L(A) is finite. From the
pumping lemma, we know that if A accepts even one string of length equal to or greater than n, then A
must accept an infinite number of strings. We still cannot check all the strings of length greater than n and
have a procedure that halts, so Lemma 12.1 will be invoked to argue that if a long string is accepted by A,
then a string whose length is in the range n ≤ |x| < 2n must be accepted, and it is therefore sufficient to
check the strings in this limited range. Thus, our algorithm will consist of computing the intersection of
{δ(s 0 , y) | y ∈ Σ∗ ∧ n ≤ |y| < 2n} and F . L(A) is infinite iff this intersection is nonempty.

If we were to write a program that consulted the matrix containing the state transition table for A
to actually determine {δ(s 0 , y) | y ∈ Σ∗ ∧ n ≤ |y| < 2n}, it would be very inefficient to implement this
computation as implied by the definition. Repeatedly looking up entries in the state transition table to
determine δ for each word in this large class of specified strings would involve an enormous duplication
of effort. It is far better to recursively calculate R i = {δ(s 0 , x) | x ∈ Σi }, which represents the set of all states
that can be reached by strings of length exactly i . This can be easily computed by defining R 0 = {s 0 } and
using the recursive formula
a ) | a ∈ Σ, s ∈ R i }
R i +1 = {δ(s,a
Successive sets can thereby be calculated from R 0 . When R i is reached, it is checked against F , and
the algorithm halts and returns Yes if they have a common state. Otherwise, R n+1 through R 2n−1 are
checked, and No is returned if no final state appears in this group. This method is easily adaptable to
nondeterministic finite automata by setting R 0 to be the set of all start states and adjusting the definition
of R i +1 to conform to NDFA notation.
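A sketch of this recursive method, under the same dictionary encoding used earlier:

```python
# A sketch of the finiteness test of Theorem 12.2 via the sets R_i: L(A) is
# infinite iff a final state is reached by some string of length in [n, 2n).
def language_is_infinite(sigma, states, s0, delta, final):
    n = len(states)
    r = {s0}                                   # R_0
    for i in range(1, 2 * n):                  # compute R_1 .. R_{2n-1}
        r = {delta[(s, a)] for s in r for a in sigma}
        if i >= n and not r.isdisjoint(final):
            return True                        # accepted x with n <= |x| < 2n
    return False
```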
The involved arguments presented in Lemma 12.1 and the proof of Theorem 12.2 are necessary to
justify that the above efficient recursive algorithm correctly answers the question of whether a finite
automaton accepts an infinite number of strings. However, if we were simply interested in justifying that
it is decidable whether L(A) is infinite, without worrying about efficiency, it would have been much more
convenient to simply adapt the result of Theorem 12.1. In particular, we could have easily built a DFA
that accepts all strings of length at least n, form the “intersection” machine, and apply Theorem 12.1 to
the new machine.
Specifically, if A is an n-state deterministic finite automaton, consider the DFA A n = 〈Σ, {r 0 , r 1 , r 2 , . . . ,
r n }, r 0 , δn , {r n }〉, where δn is defined by

(∀i = 0, 1, . . . , n)(∀a ∈ Σ)(δn(ri, a) = r min{i+1,n})

It is easy to show that L(A n ) = {x ∈ Σ∗ | |x| ≥ n}, and building A ∩ as specified in Lemma 5.1 produces a
DFA for which L(A ∩ ) = {x ∈ L(A) | |x| ≥ n}. The question of whether L(A) is infinite now becomes the
question of whether L(A ∩ ) is nonempty, which was shown to be decidable by Theorem 12.1.
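Under the same encoding, constructing An takes only a line or two; a sketch:

```python
# A sketch of A_n: states 0..n, every symbol advances the counter until it
# sticks at n, the sole final state, so L(A_n) = {x | |x| >= n}.
def make_a_n(sigma, n):
    delta = {(i, a): min(i + 1, n) for i in range(n + 1) for a in sigma}
    return (0, delta, {n})                     # (start, transitions, finals)
```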
An indication of the nature of the automaton A ∩ is given in Figure 12.1. The above argument pro-
vides a much shorter and clearer proof of Theorem 12.2, but it should not be construed to be the basis
of an efficient algorithm. Forming the intersection of A and An involves well over n² states, and thus
applying the technique described in Theorem 12.1 to A∩ may involve more than n² iterations. For our
purposes, we will henceforth be content to discover whether various tasks are merely possible and not
be concerned with efficiency.
The following theorem answers a major question about DFAs: “Are two given deterministic finite
automata equivalent?” At first glance, this appears to be a hard question; an initial strategy might be

Figure 12.1: The automaton A n

to check longer and longer strings, and answer “No, they are not equivalent” if a string is found that is
accepted by one machine but is not accepted by the other. As in the proof of Theorems 12.1 and 12.2, we
would again be faced with the task of determining when we could confidently stop checking strings and
answer “Yes, they are equivalent.”
Such a strategy can be made to work, but an easier method is again available. We are essentially
checking whether the start state of the first machine treats strings differently than does the start state
of the second machine. This problem was addressed in Chapter 3, and an algorithm that accomplished
this sort of checking has already been presented. This observation provides the basis for the proof of the
following theorem.

Theorem 12.3 Given any alphabet Σ and two DFAs A 1 = 〈Σ, S 1 , s 01 , δ1 , F 1 〉 and A 2 = 〈Σ, S 2 , s 02 , δ2 , F 2 〉, it is
decidable whether L(A 1 ) = L(A 2 ).
Proof. Without loss of generality, assume that S 1 ∩ S 2 = ;, and construct a new DFA defined by A =
〈Σ, S 1 ∪ S 2 , s 01 , δ, F 1 ∪ F 2 〉, where

δ1 (s,a
½
a ), iff s ∈ S 1
a ∈ Σ)δ(s,a
(∀s ∈ S 1 ∪ S 2 )(∀a a) =
δ2 (s,a
a ), iff s ∈ S 2

Corollary 3.5 outlines the algorithm for constructing E A for this machine, and it should be clear from the
definition of A that s 01 E A s 02 ⇔ L(A 1 ) = L(A 2 ).
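Rather than tabulating E A in full, the same decision can be made by exploring the pairs of states the two machines can occupy simultaneously; s01 and s02 are related by E A exactly when no reachable pair disagrees about acceptance. The sketch below is our reformulation of that test, not the algorithm of Corollary 3.5:

```python
# A sketch of the equivalence test of Theorem 12.3 on the pair automaton.
def dfas_equivalent(sigma, a1, a2):
    (s01, delta1, f1), (s02, delta2, f2) = a1, a2
    seen, frontier = {(s01, s02)}, [(s01, s02)]
    while frontier:
        p, q = frontier.pop()
        if (p in f1) != (q in f2):
            return False                 # some string separates the machines
        for a in sigma:
            pair = (delta1[(p, a)], delta2[(q, a)])
            if pair not in seen:
                seen.add(pair)
                frontier.append(pair)
    return True
```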

Example 12.2
Consider the two machines A 1 and A 2 displayed in Figure 12.2. The machine A constructed according to
Theorem 12.3 would look like the diagram inside the dotted box shown in Figure 12.3. This new machine
is very definitely disconnected, and in this example s 01 is not related to s 02 by E A since these two states
treat ab differently (ab is accepted by A1 and rejected by A2). The reader is encouraged to generate an-
other example using two equivalent machines, and verify that the two original start states would indeed
be related by E A .
The following theorem explores the relationship between the complexity of a given regular expres-
sion and the size of the corresponding minimal DFA.

Theorem 12.4 Given any alphabet Σ and a regular expression R over Σ, it is decidable whether there exists
a DFA with fewer than five final states that accepts the language described by R.
Proof. Given R, Lemma 6.2 indicates the algorithm (generated by the constructions presented in Theo-
rems 5.2, 5.4, and 5.5) for building some NDFA that accepts the regular set corresponding to R. Definition
4.5 outlines the algorithm for converting this NDFA into a DFA. Theorem 3.7 and Corollary 3.5 indicate
the algorithms for minimizing this DFA. Counting the number of final states in this minimal machine will
allow the question to be answered correctly.
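The proof is simply a pipeline of algorithms established in earlier chapters. The sketch below makes that explicit; the helper names are hypothetical stand-ins for the cited constructions, not functions defined in this book:

```python
# A sketch of the decision procedure of Theorem 12.4 (all helpers hypothetical).
def fewer_than_five_final_states(regex):
    nfa = regex_to_nfa(regex)       # Lemma 6.2 (via Theorems 5.2, 5.4, 5.5)
    dfa = nfa_to_dfa(nfa)           # subset construction, Definition 4.5
    minimal = minimize(dfa)         # Theorem 3.7 and Corollary 3.5
    return len(minimal.final_states) < 5
```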

The careful reader may have noticed that the minimal machine described in Chapters 2 and 3 was
only advertised to have the minimum total number of states and has not yet been guaranteed to have

Figure 12.2: The DFAs discussed in Example 12.2

Figure 12.3: The composite DFA discussed in Example 12.2

the smallest number of final states (perhaps there is an equivalent machine with many more nonfinal
states but fewer final states). An investigation of the relationship between the final states of the minimal
machine and the equivalence classes comprising the right congruence generated by this language will
show that no equivalent machine can have fewer final states than the minimal machine has (see the
exercises).
The proofs of Theorems 12.3 and 12.4 are good examples of using existing algorithms to build new
algorithms. This technique should be applied whenever possible in the following exercises. It is certainly
useful in resolving the following question about grammars.
Given two right linear grammars G 1 = 〈Ω1 , Σ, S 1 , P 1 〉 and G 2 = 〈Ω2 , Σ, S 2 , P 2 〉, it is clearly decidable
whether G 2 is equivalent to G 1 . An algorithm can be formed that simply:

1. Uses the construction presented in Lemma 8.2 to find AG 1 and AG 2 .

2. Converts these NDFAs to two DFAs called A 1 and A 2 .

3. Appeals to the algorithm presented in Theorem 12.3 to correctly answer the question.

A trivial extension of this idea proves the following theorem.

Theorem 12.5 It is decidable whether two given regular grammars G1 = 〈Ω1, Σ, S1, P1〉 and G2 = 〈Ω2, Σ, S2, P2〉
are equivalent.
Proof. See the exercises.

Most of the decidability questions we have asked about languages recognized by finite automata or
described by regular expressions can also be answered for languages generated by grammars through a
similar transformation of existing algorithms. Such algorithms are generally not the most efficient ones
available, and it can often be instructive to develop a new method from scratch. This is especially true of
the following question, which has no analog in the realms of finite automata or regular expressions.

Theorem 12.6 It is decidable whether a given right-linear grammar G = 〈Ω, Σ, S, P 〉 contains any useless
nonterminals.
Proof. Recall that a nonterminal is useless if it can never appear in the derivation of any valid terminal
string. Essentially, only two things can prevent a nonterminal X from being effectively used somewhere in
a valid derivation: either X can never appear as part of a partial derivation that begins with only the start
symbol (no matter how many productions we apply), or, once X is generated, it can never lead to a valid
terminal string.
Finding the members of Ω that can be produced from S is a simple recursive procedure: Begin with
Z0 = {S} and form Z1 by adding to Z0 all the nonterminals that appear on the right side of productions
that are used to replace S. Then form Z2 by adding to Z1 all the nonterminals that appear on the right side
of productions that are used to replace members of Z1 and so on. More formally:

Z0 = {S}

and for i ≥ 1,
Zi+1 = Zi ∪ {Y ∈ Ω | (∃x ∈ Σ∗)(∃T ∈ Zi) ∋ T → xY is a production in P }

Clearly, Z0 ⊆ Z1 ⊆ · · · ⊆ Zn ⊆ · · · ⊆ Ω, and as was shown for similar collections of nested entities (such as
E 0A , E 1A , . . . in Chapter 3), after a finite number of steps we will reach the point where Zm = Zm+1 and Zm
will then represent the set of all nonterminals that can be reached from the start symbol S.
In a similar fashion, we can generate another nested sequence of sets W0 ,W1 , . . . , where Wi represents
the set of all nonterminals that can produce a terminal string in i or fewer steps. We are again guaranteed
to reach a point where Wn = Wn+1 and Wn will indeed be the set of all nonterminals that can ever produce
a valid terminal string.
Zm ∩ Wn is thus the set of all useful members of Ω, and Ω − (Zm ∩ Wn ) is therefore the set of all useless
nonterminals.
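Both fixed-point calculations take only a few lines. A sketch, assuming each production of the right-linear grammar is encoded as a pair (T, (x, Y)) for T → xY, with Y = None when the right side is purely terminal; applied to the grammar G4 of the next example, it returns {V, W, X}:

```python
# A sketch of the two iterations in the proof of Theorem 12.6.
def useless_nonterminals(omega, start, productions):
    z = {start}                          # Z_i: reachable from the start symbol
    while True:
        z_next = z | {y for (t, (_, y)) in productions
                      if t in z and y is not None}
        if z_next == z:
            break
        z = z_next
    w = set()                            # W_i: can derive a terminal string
    while True:
        w_next = w | {t for (t, (_, y)) in productions
                      if y is None or y in w}
        if w_next == w:
            break
        w = w_next
    return omega - (z & w)               # useless = unreachable or unproductive
```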

Example 12.3
G4 = 〈{S, A, B, C, V, W, X }, {a, b, c}, S, {S → abA | bbB | ccV, A → bC | cX, B → ab, C → λ | cS,
V → aV | cX, W → aa | aW, X → bV | aaX }〉 contains three useless nonterminals, V, W, and X. Recursively calculating the
sets described in the above proof yields:

Z0 = {S}                          W0 = { }
Z1 = {S, A, B, V }                W1 = {C, B, W }
Z2 = {S, A, B, V, C, X }          W2 = {C, B, W, A, S}
Z3 = {S, A, B, V, C, X }          W3 = {C, B, W, A, S}

Thus W cannot be generated from the start symbol, and V and X cannot produce terminal strings. The
useful symbols are Z2 ∩ W2 = {S, A, B, C }.
The techniques employed here should look somewhat familiar. They involve iteration methods simi-
lar to those developed in Chapter 3. In fact, it is possible to apply the connectivity algorithms for nonde-
terministic finite automata to this problem by transforming the right-linear grammar G into the NDFA
AG , as defined in the proof of Lemma 8.1. The automaton corresponding to the grammar in Example
12.3 is shown in Figure 12.4. Note that the state labeled <W> is inaccessible, which means that it cannot
be reached from <S>. This indicates that there is no sequence of productions starting with the start
symbol S that will produce a string containing W .
Checking whether a nonterminal such as V can produce a terminal string is tantamount to checking
whether the language accepted by AG^V is nonempty, where AG^V is AG with the start state moved to the
state labeled <V>. Since both L(AG^V) and L(AG^X) are empty, V and X are useless.
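
These calculations are easily mechanized. The following Pascal program is a sketch in the style of the
figures later in this chapter, not a figure from the text; the program name FindUseless and the decision
to record only the nonterminal occurrences are illustrative choices. It carries out the Zi and Wi iterations
for G 4 : since the terminal parts of the right sides play no role, each production is stored as its left-hand
symbol together with the nonterminal, if any, on its right side.

program FindUseless;
{ the fixed-point iterations from the proof of Theorem 12.6,
  applied to the grammar G4 of Example 12.3 }
const
  NumProds = 16;
type
  NT = (S, A, B, C, V, W, X);
  NTSet = set of NT;
  Production = record
    lhs: NT;        { nonterminal being replaced }
    hasNT: boolean; { true if a nonterminal appears on the right side }
    rhs: NT         { that nonterminal; meaningful only when hasNT }
  end;
var
  P: array [1..NumProds] of Production;
  Z, WSet: NTSet;   { the limits of Z0, Z1, ... and of W0, W1, ... }
  changed: boolean;
  i: integer;
  n: NT;

procedure Prod(i: integer; l: NT; h: boolean; r: NT);
begin
  P[i].lhs := l; P[i].hasNT := h; P[i].rhs := r
end;

function Name(t: NT): char;
begin
  case t of
    S: Name := 'S'; A: Name := 'A'; B: Name := 'B'; C: Name := 'C';
    V: Name := 'V'; W: Name := 'W'; X: Name := 'X'
  end
end;

begin
  { the productions of G4; S serves as a dummy rhs when hasNT is false }
  Prod( 1, S, true, A);  Prod( 2, S, true, B);  Prod( 3, S, true, V);
  Prod( 4, A, true, C);  Prod( 5, A, true, X);
  Prod( 6, B, true, B);  Prod( 7, B, false, S); { B -> ccab }
  Prod( 8, C, false, S);                        { C -> lambda }
  Prod( 9, C, true, S);
  Prod(10, V, true, V);  Prod(11, V, true, X);
  Prod(12, W, true, V);  Prod(13, W, false, S); { W -> aa }
  Prod(14, W, true, W);
  Prod(15, X, true, V);  Prod(16, X, true, X);

  Z := [S];      { Z0 = {S}; add Y whenever T -> xY for some T in Z }
  repeat
    changed := false;
    for i := 1 to NumProds do
      if (P[i].lhs in Z) and P[i].hasNT and not (P[i].rhs in Z) then
      begin
        Z := Z + [P[i].rhs]; changed := true
      end
  until not changed;

  WSet := [];    { W0 = {}; add T whenever its right side can terminate }
  repeat
    changed := false;
    for i := 1 to NumProds do
      if not (P[i].lhs in WSet) and
         ((not P[i].hasNT) or (P[i].rhs in WSet)) then
      begin
        WSet := WSet + [P[i].lhs]; changed := true
      end
  until not changed;

  write('Useless nonterminals:');
  for n := S to X do
    if not (n in (Z * WSet)) then write(' ', Name(n));
  writeln
end.

Run on G 4 , the program reports V , W , and X as useless, in agreement with the example.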

12.2 Other Decidable Questions


It is fairly easy to find succinct algorithms that answer most of the reasonable questions one might ask
about representations of regular languages. For each of the more complex classes of languages, there
are many reasonable questions that are not decidable. Several of these will be presented in the following
sections. In this section, we consider some of the answerable questions that can be asked about the more
robust machines and grammars.

Theorem 12.7 Given any context-free grammar G, it is decidable whether L(G) = ∅.


Proof. In Theorem 9.3, a scheme was presented that specified how to build several automata that would
be used to identify the useless nonterminals in G = 〈Σ, Γ, S, P 〉. Since L(G) = ∅ iff the start symbol S is
useless, there is an algorithm for testing whether L(G) = ∅.

It is also possible to tell whether a context-free grammar generates a finite or an infinite number of
distinct words. The proof is based on the same principle that was employed in the pumping theorem
proof: the presence of long strings implies that some useful nonterminal A must derive a nontrivial
sentential form containing A. The start symbol S must produce a useful sentential form containing A, and
A can then be used to generate an infinite series of distinct strings.

Theorem 12.8 Given any context-free grammar G, it is decidable whether L(G) is infinite.
Proof. Let G = 〈Σ, Γ, S, P 〉 be a context-free grammar. By Theorem 9.5, there exists a Chomsky normal
form grammar G ′ = 〈Σ, Γ′ , S, P ′ 〉 that is equivalent to G. Let n = 2^{|Γ′|} . By Theorem 9.7, any string in L(G)
of length n or greater can be pumped, and its existence will imply that L(G) is infinite. An argument similar
to that of Lemma 12.1 will show that it is sufficient to check strings in the set {y | y ∈ Σ∗ ∧ n ≤ |y| < 2n} for
membership in L(G). There are only a finite number of derivation sequences that can produce words in
this range.
The algorithm for determining whether L(G) is infinite will check whether

{y | y ∈ L(G) ∧ n ≤ |y| < 2n}

is empty; if it is, L(G) is finite; otherwise, L(G) is infinite.
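
In code, the test reduces to a bounded search. The following Pascal program is a minimal sketch (in a
dialect with a string type and concatenation, like the figures later in this chapter); the membership test
InL, which might be the CYK algorithm run on the Chomsky normal form grammar, is stubbed, and the
two-symbol alphabet {a, b}, the constant n, and the names ExistsWord and Infinite are illustrative
assumptions rather than constructs from the text.

program InfiniteTest;
{ the bounded search of Theorem 12.8; n stands for 2^{|Γ'|} }
const
  n = 4;  { illustrative value }

function InL(y: string): boolean;
begin
  { assumed membership test for L(G), e.g., the CYK algorithm
    applied to the Chomsky normal form grammar G'; stubbed here }
  InL := false
end;

function ExistsWord(len: integer; y: string): boolean;
{ true iff some extension of the prefix y to length len lies in L(G);
  the alphabet is taken to be {a, b} for illustration }
begin
  if Length(y) = len then
    ExistsWord := InL(y)
  else
    ExistsWord := ExistsWord(len, y + 'a') or ExistsWord(len, y + 'b')
end;

function Infinite(bound: integer): boolean;
{ L(G) is infinite iff some y with bound <= |y| < 2*bound is in L(G) }
var len: integer;
begin
  Infinite := false;
  for len := bound to 2*bound - 1 do
    if ExistsWord(len, '') then Infinite := true
end;

begin
  if Infinite(n) then
    writeln('L(G) is infinite')
  else
    writeln('L(G) is finite')
end.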

The exercises explore more efficient methods for searching for a string that can be pumped. The inti-
mate correspondence between context-free grammars and pushdown automata guarantees that similar
questions about PDAs are decidable.

Figure 12.4: The automaton discussed in Example 12.3

Corollary 12.1 Given any pushdown automaton P , it is decidable whether:

a. L(P ) is empty.

b. L(P ) is finite.

c. L(P ) is infinite.

Proof. By Theorem 10.3, every PDA P has a corresponding context-free grammar G P . The algorithms
described in Theorems 12.7 and 12.8 can be applied to G P to determine the nature of L(G P ), and since
L(P ) = L(G P ), the same questions about P are likewise decidable.

Given a particular word x and a context-free grammar G, it is decidable whether x can be generated
by G. In fact, this question can be decided for context-sensitive grammars, too. The proof heavily relies
on the fact that no sentential form longer than x can possibly generate x in the absence of contracting
productions in the grammar.

Theorem 12.9 Given any context-sensitive grammar G and any word x, it is decidable whether x ∈ L(G).
Proof. Let G = 〈Σ, Γ, S, P 〉 be a context-sensitive grammar and let x ∈ Σ∗ . It is possible to construct
a (finite) graph and apply existing algorithms from graph theory to answer the question of whether G
generates x. The nodes of the graph will correspond to the strings from (Σ ∪ Γ)∗ of length n or less. The
(directed) edges from a node representing a given sentential form w lead to the strings (of length n or less)
that can be generated from w by applying a single production from P . Both the sentential form x and S
will appear in this graph, and the question of whether x ∈ L(G) is equivalent to the question of whether
there is a path from S to x. There are many standard algorithms for determining paths and components
in a graph, and thus the question of whether x ∈ L(G) is decidable.

The generation of all the edges in the graph generally involves more effort than is needed to answer
the question. A more efficient method is similar to the recursive calculations used to find the set of
connected states in a DFA. Beginning with {S}, the production set P can be consulted to determine the
labels of nodes that can be derived from S in one step. These new labels can be added to the set of
accessible sentential forms, and the added nodes can be checked until no new labels are found. The set
of sentential forms will then consist of

{w ∈ (Σ ∪ Γ)∗ | S ⇒∗ w ∧ |w| ≤ n}

and contain all words in L(G) of length ≤ n. If we are only interested in the specific word x, then the
algorithm can return Yes as soon as x appears in the set of accessible sentential forms, and it can return
No if x has not appeared by the time the set stops growing.
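
A Pascal rendering of this search is sketched below. The grammar S → aSb | ab (which happens to be
context free, and hence noncontracting), the bound MaxForms, and the helper names are illustrative
assumptions; for a genuine context-sensitive grammar, the procedure Expand would apply each of its
productions at each position of w.

program CSGMember;
{ breadth-first search over sentential forms of length at most |x|,
  following the recursive method described above }
const
  MaxForms = 1000;
var
  Forms: array [1..MaxForms] of string;
  Count, Next: integer;
  x: string;

function Known(w: string): boolean;
var i: integer;
begin
  Known := false;
  for i := 1 to Count do
    if Forms[i] = w then Known := true
end;

procedure Add(w: string);
{ record a newly derived sentential form, discarding forms longer
  than x, duplicates, and any overflow past the illustrative bound }
begin
  if (Length(w) <= Length(x)) and not Known(w) and (Count < MaxForms) then
  begin
    Count := Count + 1;
    Forms[Count] := w
  end
end;

procedure Expand(w: string);
{ apply every production at every position; here only S -> aSb | ab }
var i: integer;
begin
  for i := 1 to Length(w) do
    if w[i] = 'S' then
    begin
      Add(Copy(w, 1, i - 1) + 'aSb' + Copy(w, i + 1, Length(w)));
      Add(Copy(w, 1, i - 1) + 'ab' + Copy(w, i + 1, Length(w)))
    end
end;

begin
  x := 'aabb';
  Count := 1;
  Forms[1] := 'S';
  Next := 1;
  while Next <= Count do
  begin
    Expand(Forms[Next]);  { derive all one-step successors }
    Next := Next + 1
  end;
  if Known(x) then
    writeln('Yes: x is in L(G)')
  else
    writeln('No: x is not in L(G)')
end.

For x = aabb the program prints Yes; since only finitely many distinct forms of length at most |x| exist
and duplicates are discarded, the loop is guaranteed to halt.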
The above algorithm will suffice for any grammar that does not contain contracting productions, but
can clearly give the wrong answers when applied to type 0 grammars. Since the length of sentential forms
can both grow and shrink in unrestricted grammars, the word x may actually be generated by a sequence
of productions that at some point generates a sentential form longer than x. Such a sequence would not
be considered by the method outlined in Theorem 12.9, and the algorithm might answer No when the
correct answer is Yes. We could define a procedure that looked at larger and larger graphs (consisting
of more and longer sentential forms), which would halt and answer Yes if a derivation sequence for x
was discovered. If x actually can be generated by G, this method will eventually uncover the appropriate
sequence. We therefore have a procedure that will reliably tell us if a word can be generated by an unre-
stricted grammar. Unless we include a specification of when to stop and answer No, this procedure is not
an algorithm. In later sections, we will see that it is impossible to determine, for an arbitrary type 0 gram-
mar G, if an arbitrary word x is not generated by G. The question of whether x ∈ L(G) is not decidable for
arbitrary grammars.
It turns out that there are many reasonable questions such as this one that cannot be determined
algorithmically. We begin our overview of undecidable problems with an analysis of a very reasonable
question concerning Pascal programs. Subsequent sections consider undecidable questions concerning
the grammars and machines covered in this text.

12.3 An Undecidable Problem


Having developed a false sense of security about our ability to produce algorithms for determining
many properties of machines and languages, we now step back and see whether there is anything
we cannot do algorithmically. A simple counting argument will show that there are too many things to
calculate and not enough algorithms with which to calculate them all. It may be helpful to review the
section on cardinality in Chapter 0 and recall that there are different orders of infinity. A diagonalization
argument showed that the natural numbers could not be put in one-to-one correspondence with the real
numbers; there are simply too many real numbers to allow such a matching to occur. A similar mismatch
occurs when comparing the (countable) number of algorithms to the (uncountable) number of possible
yes-no functions.
By definition, an algorithm is a finite list of instructions, written over some finite character set. As
such, there are only a countable number of different algorithms that can be written. It may be helpful
to consider the set of all Pascal programs and view each file that contains the ASCII code for a program,
which is essentially a sequence of zeros and ones, as one very long binary integer. Clearly, an infinite
number of Pascal programs can be written, but no more programs than there are binary integers, so the
number of such files is indeed countable.
Now consider the possible lists of answers that could be given to questions involving a countable
number of instances. We will argue that there are an uncountable number of yes-no patterns that might
describe the answers to such questions. Notice that the descriptions for automata, grammars, and the
like are also finite, and thus there are a countable number of DFAs, a countable number of grammars, and
so on, that can be described. The questions we asked in the previous sections were therefore applied to
a countable number of instances, and these instances could be ordered in some well-defined way, much
as the natural numbers are ordered. If we think of a yes response corresponding to the digit 1 and a no
response corresponding to 0, then the corresponding series of answers to a particular question can be
thought of as an unending sequence of 0s and 1s. By placing a binary point at the beginning of the
sequence, each such pattern can be thought of as a binary fraction, representing a real number between
.00000 . . . = 0 and .111111 . . . = 1. Conversely, each such real number in this range represents a sequence
of yes-no answers to some question. Since there are an uncountable number of real numbers between
0 and 1, there are an uncountable number of answers that might be of interest to us. Some of these
answers cannot be obtained by algorithms, since there are not enough algorithms to go around. Thus,
there must be many questions that are not decidable.
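
In symbols: fix an enumeration w1 , w2 , w3 , . . . of the instances of a question Q, and let ai = 1 if the
correct answer for the instance wi is yes and ai = 0 otherwise. The correspondence described above is then

Q ↦ 0.a1 a2 a3 . . . ∈ [0, 1]

and, just as in the diagonalization for the reals, the collection of all such binary fractions is uncountable,
while the collection of algorithms that might compute such answer sequences is only countable.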
It is not immediately apparent that the existence of undecidable questions is much of a drawback;
perhaps all the “interesting” questions are decidable. After all, there are an uncountable number of
real numbers, yet all computers and many humans seem to make do with just the countable number of
rational numbers. Unfortunately, there are many simple and meaningful questions that are undecidable.
We discuss one such question now; others are considered in the next section.

Just about every programmer has had the experience of running a program that never produces any
output and never shows any sign of halting. For programs that are fairly short, this is usually not a prob-
lem. For major projects that are expected to take a very long time, there comes an agonizing moment
when we have to give up hope that it is on the verge of producing a useful answer and stop the program
on the assumption that it has entered an infinite loop. While it would be very nice to have a utility that
would look over a program and predict how long it would run, most of us would settle for a device that
would simply predict whether or not it will ever halt.
It’s a good bet that you have never used such a device, which may at first seem strange since a solution
to the halting problem would certainly provide information that would often be useful. If you have never
thought about this before, you might surmise that the scarcity of such programs is a consequence of
any one of several limiting factors. Perhaps they are inordinately expensive to run, or no one has taken
the time to implement an existing scheme, or perhaps no one has yet figured out how to develop the
appropriate algorithms. In actuality, no one is even looking for a “halting” algorithm, since no such
algorithm can possibly exist.
Let us consider the implications that would arise if such an algorithm could be programmed in, say,
Pascal. We can consider such an algorithm to be implemented as a Boolean function called HALT, which
looks at whatever program happens to be in the file named data.p and returns the value TRUE if that
program will halt, and returns FALSE if the program in data.p would never halt. Perhaps this function is
general enough to look at source code for many different languages, but we will see that it is impossible
for it to simply respond correctly even when looking solely at Pascal programs.
The programmer of the function HALT would likely have envisioned it to be used in a program such
as CHECK, shown in Figure 12.5. We will use it in a slightly different way and show that a contradiction
arises if HALT really did solve the halting problem. Our specific assumptions are that:

1. HALT is written in Pascal.

2. HALT always gives a correct answer to the halting problem, which means:

(a) It always returns an answer after a finite amount of time.


(b) The answer returned is FALSE if the Pascal program in data.p would never halt.
(c) The answer returned is TRUE if the Pascal program in data.p would halt (or if the program in
data.p will not compile).

Consider the program TEST in Figure 12.6, which is structured so that it will run forever if the function
HALT indicates that the program in the file data.p would halt, and simply quits if HALT indicates that the
program in data.p would not halt. Some interesting things happen if we run this program after putting a
copy of the source code for TEST in data.p.
If HALT does not produce an answer, then HALT certainly does not behave as advertised, and we have
an immediate contradiction. HALT is supposed to be an algorithm, so it must eventually return with an
answer. Since HALT is a Boolean function, we have only two other cases to consider.
Case 1: HALT returns a value of TRUE to the calling program TEST. This has two consequences, the
first of which is implied by the asserted behavior of HALT.

i. If HALT does what it is supposed to do, this means that the program in data.p halts. We ran this pro-
gram with the source code for TEST in data.p, so TEST must actually halt.

The second consequence comes from examining the code for TEST, and noting what happens when
HALT returns TRUE.

program CHECK;
{ envisioned usage of HALT }
function HALT:boolean;
begin
{ marvelous code goes here }
end; { HALT }

begin { CHECK }
if HALT then
writeln( ’The program in file data.p will halt’ )
else
writeln( ’The program in file data.p will not halt’)
end { CHECK }.

Figure 12.5: A possible usage of HALT

ii. The if statement in the program TEST then causes the infinite loop to be entered, and TEST runs
forever, doing nothing particularly useful.

Our two consequences are that TEST halts and TEST does not halt. This is a clear contradiction, and so
case 1 never occurs.
Case 2: HALT returns a value of FALSE to the calling program TEST. This likewise has two conse-
quences. Considering the advertised behavior of HALT, this must mean that the program in data.p, TEST,
must not halt. However, the code for TEST shows that if HALT returns FALSE we execute the else state-
ment, write one line, and then stop. TEST therefore halts. TEST must again both halt and not halt.
Whichever way we turn, we reach a contradiction. The only possible conclusion is that the function
HALT does not behave as advertised. It must either return no answer, or give an incorrect answer.
It should be clear that the problem cannot be fixed by having the programmer who proposed the
function HALT fiddle with the code; the above contradiction will be reached regardless of what code
appears between the begin and end statements in the function HALT. We have shown that any such
proposed function is guaranteed to behave inappropriately when fed a program such as TEST. In actu-
ality, there are an infinite number of programs that cause HALT to misbehave, but it was sufficient to
demonstrate just one failure to justify that no such function can solve the general problem.
The above argument demonstrates that the halting problem for Pascal programs is undecidable or
unsolvable. That is, there does not exist a Pascal program that can always decide correctly, when fed an
arbitrary Pascal program, whether that program halts.
If we were to define an algorithm as “something that can be programmed in Pascal,” we would have
shown that there is no algorithm for deciding whether an arbitrary Pascal program halts. One might
suspect that this is therefore not a very satisfying definition of what an algorithm is, since we have a
concise, well-stated problem that cannot be solved using Pascal. It is generally agreed that the problem
does not lie with some overlooked feature that was inadvertently not incorporated into Pascal. Clearly, all
programming languages suffer from similar inadequacies. For example, an argument similar to the one
presented for Pascal would show that no C program can be devised that can tell which C programs can
halt. Thus, no other programming language can provide a more robust definition of what an algorithm
is.

program TEST;
{ to be placed in the file data.p }
var FOREVER: boolean;
function HALT: boolean;
begin
{ marvelous code goes here }
end; { HALT }

begin { TEST }
FOREVER := false;
if HALT then
repeat
FOREVER := false;
until FOREVER
else
writeln( ’This program halts’)
end { TEST }.

Figure 12.6: Another program incorporating HALT

There are variations on this theme that likewise lead to contradictions. Might there be a Pascal pro-
gram that can check which C programs can halt? If you believe that every Pascal program can be rewritten
as an equivalent C program, the answer is definitely no; a Pascal program that checks C programs could
then be rewritten as a C program (which checks C programs), and we again reach a contradiction.
It is generally agreed that the limitations do not arise from some correctable inadequacy in our cur-
rent methods of implementing algorithms. That is, the limitations of algorithmic solutions seem to be
inherent in the nature of algorithms. Programming languages, Turing machines, grammars, and all other
proposed systems for implementing algorithms have been shown to be subject to the same limitations
in computational power. The use of Turing machines to implement algorithms has several implications
that apply to the theory of languages. These are explored in the following sections.

12.4 Turing Decidability


In the previous section, we saw that no Pascal program could always correctly predict when another Pas-
cal program would halt. A similar statement was true for C programs, and Turing machines, considered
as computing devices, are no different; no Turing machine solves the halting problem.
Each of us is probably familiar with the way in which a Pascal program reads a file, and hence it is not
hard to imagine a Pascal program that reacts to the code for another Pascal program. As long as the input
alphabet contains at least two symbols, encodings can be defined for the structure of a Turing machine,
which allows the blueprint for its finite state control to be placed on the input tape of another Turing
machine. A binary encoding might be given for the number of states, followed by codes that enumerate
the moves from each of the states. Just as we are not presently concerned about the exact ASCII codes
that define the individual characters in a file containing a Pascal program, we need not be concerned
with the specific representation used to encode a Turing machine on an input tape.
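
For concreteness, one standard encoding scheme (a common textbook convention, not necessarily the one
intended here) numbers the states q1 , . . . , qs , the tape symbols X1 , . . . , Xt , and the directions D1 = L
and D2 = R; each move then becomes a block of 0s and 1s, and the machine is the list of its move blocks:

δ(qi , Xj ) = (qk , Xl , Dm ) ↦ 0^i 1 0^j 1 0^k 1 0^l 1 0^m

⟨M⟩ = c1 11 c2 11 · · · 11 cr

where c1 , . . . , cr are the blocks for the individual moves. Any string that fails to parse in this form
simply does not encode a machine.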
Consider input tapes that contain the encoding of a Turing machine, followed by some delimiter,
followed by an input word. Assume there exists a Turing machine H that, given such an encoding of an
arbitrary machine and an input word, always correctly predicts whether the Turing machine represented
by that encoding halts for that particular word. This assumption leads to a contradiction exactly as shown
in the last section for Pascal programs. We would be able to use the machine H as a submachine in
another Turing machine that halts exactly when it is not supposed to halt, and thereby show that H
cannot possibly behave properly.

Theorem 12.10 Given a Turing machine M and a word w, it is undecidable whether M halts when the
string w is placed on the input tape.
Proof. The proof is essentially the same argument that was presented in the last section.

We will see that the unsolvability of the halting problem will imply that it is not decidable whether a
given string will cause a Turing machine to halt and print Y . If a word is accepted, this fact can eventually
be discovered, but we cannot reliably tell which words are rejected by an arbitrary Turing machine. If we
could, we would have an algorithm for computing the complement of any Turing-acceptable language.
In the next section, we will show that there are Turing-acceptable languages that have complements that
are not Turing-acceptable, which means that a general algorithm for computing complements cannot
exist.
A problem equivalent to the halting problem involves the question of whether an arbitrary type 0
grammar accepts a given word. This can be seen to be almost the same question as was asked of Turing
machines.

Theorem 12.11 Given a type 0 grammar G and a word w, it is undecidable whether G generates w.
Proof. If this question were decidable, it would provide an algorithm for solving the halting problem,
which is known to be undecidable. That is, if there existed an algorithm for deciding whether w ∈ L(G),
there would also be an algorithm for deciding whether w is accepted by a Turing machine. The Turing
machine algorithm would operate as follows:
Given an arbitrary Turing machine M , modify M to produce M ′ , an equivalent machine that halts
only when it accepts. Use Definition 11.8 to find the corresponding type 0 grammar G M ′ , which is also
equivalent to M . The algorithm that predicts whether w ∈ L(G M ′ ) can now be used to decide whether M
halts on input w.
This scheme would therefore solve the halting problem for an arbitrary Turing machine, and hence the
algorithm that predicts whether w ∈ L(G) cannot exist. Thus, w ∈ L(G) is undecidable for arbitrary type 0
grammars.

Given the intimate correspondence between Turing machines and type 0 grammars, it is perhaps
not surprising that it is just as hard to solve the membership question for type 0 grammars as it was to
solve the halting problem for Turing machines. We now consider a question that may initially appear
to be more tractable than the halting problem. However, it will be shown to be unsolvable by the same
reasoning used in the last theorem: if this question could be decided, then the halting problem would be
decidable.

Theorem 12.12 Given an arbitrary Turing machine T , it is undecidable whether T accepts λ.


Proof. Assume that there exists an algorithm for deciding whether T accepts λ. That is, assume that
there exists a Turing machine X that, when fed an encoding of any Turing machine T , halts with Y when
T would accept λ and halts with N whenever T rejects λ. X can then be used to determine whether an
arbitrary Turing machine M would accept an arbitrary word x. Given a machine M and a string x, it is
easy to modify M to produce a new Turing machine T M x , which accepts λ exactly when M accepts x. T M x
is formed by adding a new start state that checks whether the read head is initially scanning a blank (that
is, if λ is on the input tape). If not, control remains in this state, and T M x never halts. However, if the
initially scanned symbol is a blank, new states are used to write x on the input tape and return the read
head to the leftmost symbol of x. Control then passes to the original start state of M . In this manner, T M x
accepts λ exactly when M accepts x.
This correspondence makes it possible to use the Turing machine X as a submachine in another Turing
machine X H that solves the halting problem. That is, given an input tape with an encoding of a machine
M followed by the symbols for a word x, X H can be easily programmed to modify the encoding of M to
produce the encoding of T M x and leave this new encoding on the input tape before passing control to the
submachine X . X H then accepts exactly when T M x accepts λ, which happens exactly when the original
machine M halts on input x. X H would therefore represent an algorithm for solving the halting problem,
which we know cannot exist. The portion of the machine that modifies the encoding of M is quite elemen-
tary, so it must be the submachine X that cannot exist. Thus, there is no algorithm that can accomplish
the task for which X was designed, that is, determining whether an arbitrary Turing machine T accepts
the empty string.
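
In the Pascal idiom of the previous section, the wrapper T M x amounts to a program that loops forever
unless its input is empty, and otherwise supplies x to M . The following is a minimal sketch, with all names
(TMX, InputIsEmpty, RunMOn) and the stub bodies as illustrative assumptions:

program TMX;
{ accepts the empty input exactly when the original machine M
  accepts the fixed word x }

function InputIsEmpty: boolean;
begin
  InputIsEmpty := eof(input)  { the tape is blank iff there is no input }
end;

procedure RunMOn(w: string);
begin
  { the original machine M, started on the word w, would go here }
end;

begin
  if not InputIsEmpty then
    while true do
      ;                       { reject nonempty input by never halting }
  RunMOn('x')                 { write x on the blank tape; control passes to M }
end.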

The conclusion that X was the portion of X H that behaves improperly is akin to the observation in
the previous section that the main part of the Pascal program TEST was valid, and hence it must be the
function HALT that behaves incorrectly.

12.5 Turing-Decidable Languages


We now consider languages whose criterion for membership is related to the halting problem. Define
the language D to be those words that either are not encodings of Turing machines or are encodings
of machines that would halt with Y when presented with their own encoding on their input tape. The
language D is Turing acceptable, since a multitape machine could copy the input word to a second tape,
check whether the encoding truly represented a valid Turing machine, and then use the “directions” on
the second tape to simulate the action of the encoded machine on the original input. The multitape
machine would halt with Y if the encoding was invalid or if the simulated machine ever accepts.
On the other hand, the complement of D is not Turing acceptable. Let U be the set of all valid encod-
ings of Turing machines that do not halt with Y when fed their own encodings. Then U = ∼D, and there does
not exist a machine T for which L(T ) = U . If such a machine existed, it would have an encoding, and this
leads to the same problem encountered with the HALT function in Pascal. This encoding of T is either
a word in U or is not a word in U ; both cases lead to contradictions. If the encoding of T belongs to U ,
then by definition of U it does not halt when fed its own encoding. But the assumption that L(T ) = U
requires that T halt with Y for all encodings belonging to U , which means T must halt when fed its own
encoding. A similar contradiction is also reached if the encoding of T does not belong to U . Therefore,
no such Turing machine T can exist, and U is an example of a language that is not Turing-acceptable.
We have finally found a language that is not type 0. A counting argument would have shown that,
since there are only a countable number of type 0 grammars and an uncountable number of subsets of
Σ∗ , there had to be many languages over Σ that are not in ZΣ ( = TΣ ). We have now seen that some of
these unrepresentable languages are meaningful sets that it would be quite desirable to be able to
recognize or generate.

Theorem 12.13 If kΣk ≥ 2, then TΣ is not closed under complementation.
Proof. Encodings of arbitrary Turing machines can be effectively accomplished with only two distinct
symbols in the alphabet. The Turing-acceptable language D described above has a complement U that is
not Turing-acceptable.

Our original criteria for belonging to the language L accepted by a Turing machine M implied that
M would eventually halt when presented with any word in L, but we had no guarantees about how M
will behave when presented with a word that is not in L. M may halt with N on the tape, or M may run
forever. Indeed, we have just seen a Turing-acceptable language for which this will be the best we can
hope for. Turing machines therefore embody procedures, which are essentially a deterministic set of step-
by-step instructions. We now consider the languages accepted by the subclass of Turing machines that
correspond to algorithms, procedures that are guaranteed to eventually halt under all circumstances.

Definition 12.2 Let Σ be an alphabet. Define HΣ to be the collection of all languages that can be recognized
by Turing machines that halt on all input.

Languages in HΣ are called Turing-decidable languages. A trivial modification shows that L ∈ HΣ iff
there exists a Turing machine that not only halts after placing a Y after the input word on an otherwise
blank tape for accepted words, but similarly preserves the input word and prints N for each rejected
string. Such devices will be referred to as halting Turing machines.

Theorem 12.14 HΣ is closed under complementation.


Proof. Let L be a Turing-decidable language. Then there must exist a Turing machine H for which
L(H ) = L and that halts with Y or N for all strings in Σ∗ . The finite-state control of H can be easily modified
to produce a Turing machine H ′ for which L(H ′ ) = ∼L. All that is required is to replace every transition in
H that prints N with a similar transition that prints Y , and likewise to make sure that N will be printed by
H ′ whenever H prints Y .

This result has some immediate consequences.

Corollary 12.2 There is a language that is Turing acceptable but not Turing decidable. That is, HΣ ⊂ TΣ
Proof. Definition 12.2 implies that HΣ ⊆ TΣ . By Theorems 12.13 and 12.14, these two classes have
different closure properties, and thus they cannot be equal. Therefore, HΣ ⊂ TΣ .

Actually, we have already seen a language that is Turing acceptable but not Turing decidable. D was
shown to be Turing acceptable, but if D were Turing decidable, then its complement would be Turing
decidable by Theorem 12.14. However, ∼D = U , and U is definitely not Turing decidable since it is not
even Turing acceptable.
OΣ , the context-sensitive languages, is another subclass of TΣ . It is possible to determine how HΣ
relates to OΣ and thereby insert HΣ into the language hierarchy.

Corollary 12.3 Every context-sensitive language is Turing decidable. That is, OΣ ⊆ HΣ


Proof. This is actually a corollary of Theorem 12.9. Given a type 1 language L, there is a context-
sensitive grammar G that generates L. The proof of Theorem 12.9 presented an algorithm for determining
whether an arbitrary word can be generated by G. This algorithm can be implemented as a Turing machine
TG that can determine whether a given word can be generated by G and always halts with the correct
answer. Thus, L is Turing decidable.

These implications provide the missing element in the proof of Theorem 11.10, as stated in the next
corollary.

Corollary 12.4 The class of context-sensitive languages is properly contained in the class of Turing accept-
able languages. That is, OΣ ⊂ TΣ .
Proof. By the previous corollaries, OΣ ⊆ HΣ and HΣ ⊂ TΣ .

Actually, the context-sensitive languages are properly contained in HΣ . This will be shown by ex-
hibiting a language that is recognized by a Turing machine that halts on all inputs, but that cannot be
generated by any context-sensitive grammar. The following proof, based on diagonalization, should by
now look familiar.

Theorem 12.15 Let Σ be an alphabet for which kΣk ≥ 2. There is a language that is Turing decidable but
not context sensitive. That is, OΣ ⊂ HΣ .
Proof. By Corollary 12.3, OΣ ⊆ HΣ , and it remains to be shown that there is a member of HΣ that is
not a member of OΣ . By the technique described in Theorem 12.9, every context-sensitive grammar can be
represented by a halting Turing machine, and each such Turing machine has an encoding of its finite-state
control. Define L to be the set of all encodings of Turing machines that:

1. Represent context-sensitive grammars.

2. Reject their own encoding.

Providing a scheme for encoding the quadruple for a context-sensitive grammar is left for the exercises. Any
reasonable encoding scheme will make it a simple task to determine whether a candidate string represents
nonsense or a valid context-sensitive grammar.
L can therefore be recognized by a halting Turing machine that:

1. Checks if the string on the input tape represents the encoding of a valid context-sensitive grammar.

2. Calculates the encoding of the corresponding Turing machine.

3. Simulates that Turing machine being fed its own encoding.

This process is guaranteed to halt, since the Turing machine being simulated is known to be a halting
Turing machine. Thus, L ∈ HΣ . However, if L ∈ OΣ , we find ourselves in a familiar dilemma. If there is
a context-sensitive grammar G L that generates L, then this grammar would have a corresponding Turing
machine TL , which would have an encoding x L . If x L did not belong to L, then by definition of L it would
be an encoding of a machine (TL ) that did not reject its own encoding (x L ). Thus, TL recognizes x L , and
therefore the corresponding grammar G L must generate x L . But then x L ∈ L(G L ) = L, contradicting the
assumption that x L did not belong to L. If, on the other hand, x L belongs to L, then, by definition of L, TL
must reject its own encoding (x L ), and thus x L ∉ L(TL ) = L(G L ) = L, which is another contradiction. Thus,
no such context-sensitive grammar G L can exist, and L is not a context-sensitive language.

The above diagonalization technique can be generalized; given any enumerable class M of languages
whose members are all represented by halting Turing machines, there must exist a halting Turing ma-
chine that recognizes a language not in M (see the exercises). The following theorem summarizes how
the other classes of languages discussed in the text fit in the language hierarchy.

Theorem 12.16 Let Σ be an alphabet for which kΣk ≥ 2. Then

DΣ = WΣ = RΣ = GΣ ⊂ AΣ ⊂ CΣ = PΣ ⊂ LΣ = OΣ ⊂ HΣ ⊂ ZΣ = TΣ ⊂ ℘(Σ∗ )

Proof. The relationship between the type 0, type 1, type 2, and type 3 languages was outlined in Theo-
rem 11.10. Theorem 10.8 showed that AΣ properly lies between the type 3 and type 2 languages. Corollary
12.2 and Theorem 12.15 show that HΣ properly lies between the type 1 and type 0 languages and also show
that the type 1 languages are a proper subset of the type 0 languages. The existence of languages that are
not Turing acceptable shows that TΣ is properly contained in ℘(Σ∗ ). A counting argument shows that
proper containment of TΣ in ℘(Σ∗ ) also holds even if Σ is a singleton set.

The relationships between six distinct and nontrivial classes of languages are characterized by The-
orem 12.16. Each of these classes is defined by a particular type of automaton. The trivial class of all
languages, ℘(Σ∗ ), was shown to have no mechanical counterpart. We have seen that type 3 languages
appear in many useful applications. Program design, lexical analysis, and various engineering problems
are aided by the use of finite automata concepts. Programming languages are always defined in such
a way that they belong to the class AΣ , since compilers should operate deterministically. The theory
of compiler construction builds on the material presented here; syntactic analysis, the translation from
source code to machine code, is guided by the generation of parse trees for the sentences in the pro-
gram, which in turn give meaning to the code. The type 0 languages represent the fundamental limits
of mechanical computation. The concepts presented in this text provide a foundation for the study of
computational complexity and other elements of computation theory.

Exercises
12.1. Verify the assertions made in the proof of Theorem 12.1 concerning Theorem 2.7.

12.2. Prove Lemma 12.1.

12.3. Given an FAD language L, the minimal DFA accepting L, and another machine B for which L(B ) =
L, prove that the number of nonfinal states in the minimal machine must be equal to or less than
the number of nonfinal states in B .

12.4. Given two DFAs A 1 = 〈Σ, S 1 , s 01 , δ1 , F 1 〉 and A 2 = 〈Σ, S 2 , s 02 , δ2 , F 2 〉, show that it is decidable whether
L(A 1 ) ⊆ L(A 2 ).

12.5. Given any alphabet Σ and a DFA A = 〈Σ, S, s 0 , δ, F 〉, show that it is decidable whether L(A) is cofinite.
(Note: A set L is cofinite iff its complement is finite, that is, iff Σ∗ − L is finite.)

12.6. Given any alphabet Σ and a DFA A = 〈Σ, S, s 0 , δ, F 〉, show that it is decidable whether L(A) contains
any string of length greater than 1228.

12.7. Given any alphabet Σ and a DFA A = 〈Σ, S, s 0 , δ, F 〉, show that it is decidable whether A accepts any
even-length strings.

12.8. Given any alphabet Σ and regular expressions R 1 and R 2 over Σ, show that it is decidable whether
R 1 and R 2 represent languages that are complements of each other.

12.9. Given any alphabet Σ and regular expressions R 1 and R 2 over Σ, show that it is decidable whether
R 1 and R 2 describe any common strings.

12.10. Given any alphabet Σ and a regular expression R 1 over Σ, show that it is decidable whether there is
a DFA with less than 31 states that accepts the language described by R 1 .

12.11. Given any alphabet Σ and a regular expression R 1 over Σ, show that it is decidable whether there
is a DFA with more than 31 states that accepts the language described by R 1 . (You should be able
to argue that there is a one-step algorithm that always supplies the correct yes-no answer to this
question.)

12.12. Given any alphabet Σ and a regular expression R over Σ, show that it is decidable whether there
exists a NDFA (with λ-moves) with at most one final state that accepts the language described by R.

12.13. Given any alphabet Σ and a DFA A = 〈Σ, S, s 0 , δ, F 〉, show that it is decidable whether there exists a
NDFA (without λ-moves) with at most one final state that accepts the same language A does.

12.14. Given any alphabet Σ and regular expressions R 1 and R 2 over Σ, show that it is decidable whether
R1 = R2 .

12.15. Given any alphabet Σ and regular expressions R 1 and R 2 over Σ (which represent languages L 1 and
L 2 , respectively), show that it is decidable whether they generate the same right congruences (that
is, whether R L 1 = R L 2 ).

12.16. Prove Theorem 12.5.

12.17. Outline an efficient algorithm for computing {δ(s 0 , y) | y ∈ Σ∗ ∧ n ≤ |y| < 2n} in the proof of Theorem
12.2, and justify why your procedure always halts.

12.18. Consider intersecting the set {δ(s 0 , y) | y ∈ Σ∗ ∧ 5n ≤ |y| < 6n} with F to answer the question posed
in Theorem 12.2. Would this strategy always produce the correct answer? Justify your claims.

12.19. Show that it is decidable whether two Mealy machines are equivalent.

12.20. Show that it is decidable whether two Moore machines are equivalent.

12.21. Given any alphabet Σ and a regular expression R, show that it is decidable whether R represents
any strings of length greater than 28. Give an argument that does not depend on finite automata
or grammars.

12.22. Given any alphabet Σ and a right-linear grammar G, show that it is decidable whether L(G) con-
tains any string of length greater than 28. Give an argument that does not depend on finite au-
tomata or regular expressions.

12.23. Refer to the proof of Theorem 12.6 and prove that Z0 ⊆ Z1 ⊆ · · · ⊆ Zn ⊆ · · · ⊆ Ω.

12.24. Refer to the proof of Theorem 12.6 and prove that if (∃m ∈ N)(Zm = Zm+1 ) then Zm represents the
set of all nonterminals that can be reached from the start symbol S.

12.25. Refer to the proof of Theorem 12.6 and prove that (∃m ∈ N)(Zm = Zm+1 ).

12.26. (a) Refer to the proof of Theorem 12.6 and give a formal definition of Wi .

(b) Prove that W0 ⊆ W1 ⊆ · · · ⊆ Wn ⊆ · · · ⊆ Ω.

12.27. Refer to the proof of Theorem 12.6 and prove that if (∃m ∈ N)(Wm = Wm+1 ) then Wm will represent
the set of all nonterminals that can produce valid terminal strings.

12.28. Refer to the proof of Theorem 12.6 and prove that (∃m ∈ N)(Wm = Wm+1 ).

12.29. Let A be an arbitrary NDFA (with λ-moves). A string processed by A may successfully find several
paths through the machine; it is also possible that a string will be rejected because there are no
complete paths available.

(a) Show that it is decidable whether there exists a string with no complete path in A.
(b) Show that it is decidable whether there exists a string that has at least one path through A that
leads to a nonfinal state.
(c) Show that it is decidable whether there exists a string accepted by A for which all complete
paths lead to final states.
(d) Show that it is decidable whether all strings accepted by A have the property that all their
complete paths lead to final states.
(e) Show that it is decidable whether all strings have unique paths through A.

12.30. Given two DFAs A 1 = 〈Σ, S 1 , s 01 , δ1 , F 1 〉 and A 2 = 〈Σ, S 2 , s 02 , δ2 , F 2 〉:

(a) Show that it is decidable whether there exists a homomorphism between A 1 and A 2 .
(b) Show that it is decidable whether there exists an isomorphism between A 1 and A 2 .
(c) Show that it is decidable whether there exist more than three isomorphisms between A 1 and
A 2 . (Note: There are examples of [disconnected] DFAs for which more than three isomor-
phisms do exist!)

12.31. Given any alphabet Σ and a regular expression R 1 over Σ, show that it is decidable whether R 1
describes an infinite number of strings. Do this by developing an algorithm that does not depend
on the construction of a DFA, that is, does not depend on Theorem 12.2.

12.32. Given a Mealy machine M and a Moore machine A, show that it is decidable whether M is equiva-
lent to A.

12.33. Given any alphabet Σ and regular expressions R 1 and R 2 over Σ, show that it is decidable whether
the language represented by R 2 properly contains that of R 1 .

12.34. It can be shown that it is decidable whether L(A) = ; for any NDFA A by first finding the equivalent
DFA A d and applying Theorem 12.1 to that machine.

(a) Give an efficient method for answering this question that does not rely on the conversion to
a DFA.
(b) Give an efficient method for testing whether L(A) is infinite for any NDFA A. Your method
should likewise not rely on the conversion to a DFA.

12.35. Given a DPDA M , show that it is decidable whether L(M ) is a regular set.

12.36. (a) Refer to Theorem 12.9 and outline an appropriate algorithm for determining paths in the
graphs discussed.
(b) Give the details for a more efficient recursive algorithm.

12.37. Prove that HΣ is closed under:

(a) Union
(b) Intersection
(c) Concatenation
(d) Reversal

12.38. Let L = L 2 (T ) for some Turing machine T that halts on all inputs. That is, let L consist of all strings
that cause T to halt with Y somewhere on the tape. Prove that there exists a halting Turing machine
T ′ for which L = L 2 (T ′ ) = L(T ′ ). T ′ must:

1. Halt on all input.


2. Place a Y after the input word on an otherwise blank tape for accepted words.
3. Place an N after the input word on an otherwise blank tape for rejected words.

12.39. (a) Assume there is a Turing machine M M that determines whether an encoding of a Turing ma-
chine T belongs to some set X . Let the class of languages recognized by Turing machines
with encodings in X be denoted by M. Prove that if every encoding in X represents a halting
Turing machine then there must exist a halting Turing machine that recognizes a language
not in M.
(b) Apply part (a) to prove Theorem 12.15.

12.40. (a) Outline a scheme for encoding the quadruple of context-sensitive grammars suitable for use
by a Turing machine. You may assume that there are exactly two terminal symbols, but note
that your scheme must be able to handle an unrestricted number of nonterminals.
(b) Outline the algorithm that a Turing machine might use to decide whether an input string
represented the encoding of a valid context-sensitive grammar.

12.41. Show that it is undecidable whether L(X ) = ; for:

(a) Arbitrary Turing machines X


(b) Arbitrary halting Turing machines X
(c) Arbitrary context-sensitive grammars X
(d) Arbitrary linear bounded automata X

12.42. Show that it is undecidable whether L(X ) = Σ∗ for:

(a) Arbitrary Turing machines X


(b) Arbitrary halting Turing machines X
(c) Arbitrary context-sensitive grammars X
(d) Arbitrary linear bounded automata X

(e) Arbitrary context-free grammars X
(f) Arbitrary pushdown automata X

12.43. Consider the set E of all encodings of Turing machines that halt on input λ. Prove or disprove:

(a) E ∈ TΣ
(b) E ∈ HΣ
(c) E ∈ OΣ

12.44. Consider the set N of all encodings of Turing machines that do not halt on input λ. Prove or dis-
prove:

(a) N ∈ TΣ
(b) N ∈ HΣ
(c) N ∈ OΣ

Index

A c (A connected) branch instructions, 55


for DFA, 95
for FST, 211 Γ, see output alphabet
for MSM, 223 CΣ (C-sigma), 281
A/E A (A modulo E A ) C-rules, 249
for DFA, 97 cardinality, 15
AΣ (A-sigma ), 333 Cartesian product, see cross product
absorption law, 7 Chomsky normal form, see CNF
accept by empty stack, 310 clock pulse, 46
accept by final state, 31, 35, 310 closed, see closure
accept circuitry, 49 closure, 143
accepting state (DFA), 30 under complementation, 144, 158, 301
accessible state, 86, 211 under concatenation, 151, 301
addition automaton (FST), 226 under homomorphism, 160
algorithm, 85, 373 under intersection, 147, 158, 301
alphabet, 26 under inverse homomorphism, 161
auxiliary, 340, 356 under Kleene closure, 154, 301
input, 30, 115, 203, 308, 340, 356 under language homomorphism, 159, 195, 300
output, 203 under quotient, 163
stack, 308 under reversal, 167, 305
ambiguous context free grammar, 275 under substitution, 194, 300
ambiguous context free language, inherently, 281 under union, 145, 300
ASCII alphabet, 26, 40, 47, 63 CNF, 291
associative law, 7, 27, 175 cofinite, 22
automaton, see DFA, NDFA, FST, MSM, PDA, DPDA, collection C Σ , 164
LBA, Turing machine decidability, 391
auxiliary tape of a PDA, 308 set, 60
cognitive device, 246
Backus-Naur Form, see BNF commutative law, 7, 27, 175
balanced parentheses language, 345 composition, 13
basic NDFAs, 176 concatenation
basis step, 17 of languages, 148
biconditional, 6 of strings, 27
bijection, 13, 92 congruence modulo n, 10
binary alphabet, 26, 226 connected DFA, 86
black box model, 29 context free, 248
blank symbol, 40, 341 context-sensitive grammar, 246, 247
BNF, 18, 43, 84, 123 context-sensitive grammar (pure), 246

converse, 14 PDAs, 314, 316, 325, 326
corresponding deterministic finite automaton, 120 regular expressions, 175, 193
countable, 16 transducers, 209, 218, 219
counting automaton, 307 Turing machines, 345, 348–350, 353, 354, 357
cross product, 8, 147, 328, 368 extended output function, 217
extended output function ω
DΣ (D-sigma ), 144 for FST, 206
δ function, see state transition function for MSM, 217
δ, see extended state transition function δ extended state transition function δ
D flip-flop, 46 for DFA, 33
DCFL, 333 for FST, 206
De Morgan’s law, 7 for MSM, 216
dead state, 114 for NDFA, 116
decidable, 374 for NDFA A λ , 131
denumerable, 16
derivation, 250 FΣ (F-sigma ), 317
derivation tree, 272 final state
deterministic context-free language, 333 for DFA, 30
deterministic finite acceptor, see DFA for NDFA, 115
deterministic finite automaton, see DFA for PDA, 308
deterministic pushdown automaton, 329 for Turing machine, 340
DFA, see finite automaton, 30 finite automaton, 25
directly derive, 250 C implementation, 41
disconnected DFA, 86 circuit implementation, 46
DO loop lookahead, 283 deterministic, 31
DPDA, 329 homomorphism, 91
isomorphism, 92
² (regular expression), 174 minimal deterministic, 86, 95, 99
²-move, see λ-transition nondeterministic, 115
empty string, 28 Pascal implementation, 40
empty word, 28 finite automaton definable, 38
end markers for LBA, 356 finite transducer definable, see FTD
EOF packet, 56, 225 finite-state transducer, see FST
equal, 8 FST, 203
equality FTD, 208
of rationals, 9 function, 11
of sets, 8
GΣ (G-sigma ), 251
of strings, 27
garbage state, 114
equipotent, 15
generative device, 246
equivalent
GNF, 292
FST states, 213 grammars, 243
DFA states, 86, 87 Greibach normal form, 292
DFAs, 73, 96, 98
logical expressions, 6, 17 HΣ (H-sigma ), 389
MSM states, 223 halt state, 341
NDFAs, 133 halting problem

for Turing machines, 347 type k, 246, 248, 249, 254, 255
in C, 385 unambiguous, 281
in Pascal, 385 language accepted by M , 344
head recursive, 58 language accepted by M, 356
height of a parse tree, 297 language equations, 179
hierarchy language homomorphism, 158
of grammars, 249, 263 latch, 46
of language classes, 369, 391 LBA, 356
homomorphism LBL, 356
between DFAs, 91 left linear, 191
between FSTs, 209 left recursive, 292
between MSMs, 221 left-linear grammar, 254
leftmost derivation, 275
ith partial state equivalence relation (DFA), 99
length, 161
ith partial state equivalence relation (FST), 215
length preserving, 159, 207
ith partial state set (DFA), 103
linear bounded automaton, 349, 356
identical DFAs, 92
linear bounded language, see LBL
inaccessible, 86
LL0, 283
inaccessible state, 86
logic gates, 46
inherently ambiguous, 281
initial sets, 67 M connected (FST M c ), 211
initial state, see start state M connected (MSM M c ), 223
injective, 13 M/E M , 213, 223
instance of a question, 374 Mealy machine homomorphism, 209
inverse homomorphic image, 160 Mealy machine isomorphism, 210
isomorphic, 93, 210, 221 Mealy sequential machine, see FST
isomorphism Mechanical Turk, 2
between DFAs, 85, 92 minimal, 73
between FSTs, 209
minimal machine, 51
between MSMs, 221
minimal Mealy machine, 209
Kermit protocol, 56, 225 minimal Moore machine, 223
Kleene closure, 154, 173, 177, 260, 327, 334 Moore machine homomorphism, 221
Kleene’s Theorem, 193 Moore machine isomorphism, 221
Moore sequential machine, see MSM
λ-closure, 130 MSM, 216
λ-transition, 130 multi-head Turing machine, 352
LΣ (L-sigma ), 357 multi-tape Turing machine, 388
lambda-closure, see λ-closure multi-track Turing machine, 349
lambda-move, see λ-transition
language, 37 NΣ (N-sigma ), 158
accepted by empty stack PDA, 313 N, see natural numbers
accepted by final state PDA, 313 natural numbers, 8, 123
inherently ambiguous, 281 NDFA, 115, 316
linear bounded, 356 nondeterministic finite automaton, see NDFA
Turing-acceptable, 344 nondeterministic pushdown automaton, see NPDA
Turing-decidable, 389 nondeterministic Turing machine, 351

nongenerative production, 288 regular language, 193
nonterminal, 244 regular set over Σ, 173
normal forms, 248 regular set substitution, 193
NPDA, 308 relabeling, 89
relation induced by L, 66
OΣ (O-sigma ), 357 reset circuitry, 50
one to one, 13 right congruence, 65
onto, 12 right linear, 191
opcodes, 55 right-linear grammar, 249
output alphabet, 203 rightmost derivation, 275
PΣ (P-sigma ), 317
sack and stone machine, see counting automaton
parenthesis checker, 345
semantics, 273
parse tree, 272
semi-infinite tape, 371
PCNF, 291
sequential machine, see transducer
PDA, 308
stack, 308
PDNF, 6
start state, 31, 115, 130, 204, 216, 308, 341, 356
PGNF, 292
state equivalence, 87
power set, 15
state equivalence relation
preorder traversal, 275
for DFA, 87
principal disjunctive normal form, 6
for transducers, 213
procedure, 85
state transition function
productions, 243
for DFA, 30
protocol, 56
for FST, 203
pumping lemma, 73
for LBA, 356
pure Chomsky normal form, see PCNF
for MSM, 216
pure context-free grammar, 248
for NDFA, 115
pure context-sensitive grammar, 246
for NDFA A λ , 131
pure Greibach normal form, 292
for Turing machine, 340
pushdown automaton, 76, 308
submachines, 342
Q (rationals), 8 subset, 8
quotient, 162 substitution, 299
context free, 300
RΣ (R-sigma ), 176 regular set, 193
R (reals), 8 subtraction grammar, 277
range, 12 surjective, 12
rank, 67 symmetric, 8
read/write head, 308, 339 syntax diagrams, 18
recognizer, 25, 35
reduced TΣ (T-sigma ), 354
DFA, 89 Tail recursion, 34
FST,MSM, 213 tape
refinement, 10 auxiliary, 308
reflexive, 8 blank, 343
regular expression, 174 input, 29
regular expression grammar, 273 multi-track, 350

output, 208 useful, 286, 379
stack, 308 useless, 286, 379
terminal, 244
terminal sets, 88 WΣ (W-sigma ), 157
transducer, 203, 216
XΣ (X-sigma ), 305
transitive, 8
translation function (FST), 207 ZΣ (Z-sigma ), 354
translation function (MSM), 218 Z (integers), 8
Turing machine, 340
bounded, see LBA
bounded on one end, 349
configuration, 343
corresponding grammar, 360
deterministic, 340
encoding, 386
halt state, 341
halting problem, 347
linear bounded, see LBA
moves, 341
multi-head, 352
multi-tape, 388
multi-track, 349
nondeterministic, 351
parenthesis checker, 345
two-dimensional, 351
undecidable problems, 386
Turing’s World, 25, 341, 348
Turing-acceptable, 344
Turing-decidable languages, 389
type 0 grammar, 245
type 1 grammar, 247
type 2 grammar, 249
type 3 grammar, 255

UΣ (U-sigma ), 281
unambiguous context free grammar, 281
unambiguous grammar, 275
undecidable, 385
undecidable problems, 383
C halting problem, 385
Pascal halting problem, 384–386
Turing machine halting problems, 387–388
unit production, 288
unreachable state, 86
unrestricted, 245
unsolvable, 385
