154 CHAPTER 4. PROPERTIES OF REGULAR LANGUAGES
Exercise 4.3.4: Give an algorithm to tell whether two regular languages $L_1$
and $L_2$ have at least one string in common.
Exercise 4.3.5: Give an algorithm to tell, for two regular languages $L_1$ and
$L_2$ over the same alphabet $\Sigma$, whether there is any string in $\Sigma^*$ that is in neither
$L_1$ nor $L_2$.
4.4 Equivalence and Minimization of Automata
In contrast to the previous questions — emptiness and membership — whose
algorithms were rather simple, the question of whether two descriptions of two
regular languages actually define the same language involves considerable intel-
lectual mechanics. In this section we discuss how to test whether two descriptors
for regular languages are equivalent, in the sense that they define the same lan-
guage. An important consequence of this test is that there is a way to minimize
a DFA. That is, we can take any DFA and find an equivalent DFA that has
the minimum number of states. In fact, this DFA is essentially unique: given
any two minimum-state DFA’s that are equivalent, we can always find a way
to rename the states so that the two DFA’s become the same.
4.4.1 Testing Equivalence of States
We shall begin by asking a question about the states of a single DFA. Our goal
is to understand when two distinct states p and q can be replaced by a single
state that behaves like both p and q. We say that states p and q are equivalent
if:
• For all input strings w, $\hat{\delta}(p, w)$ is an accepting state if and only if $\hat{\delta}(q, w)$
is an accepting state.
Less formally, it is impossible to tell the difference between equivalent states
p and q merely by starting in one of the states and asking whether or not a
given input string leads to acceptance when the automaton is started in this
(unknown) state. Note that we do not require that $\hat{\delta}(p, w)$ and $\hat{\delta}(q, w)$ be the same
state, only that either both are accepting or both are nonaccepting.
If two states are not equivalent, then we say they are distinguishable. That
is, state p is distinguishable from state q if there is at least one string w such
that one of $\hat{\delta}(p, w)$ and $\hat{\delta}(q, w)$ is accepting, and the other is not accepting.
Example 4.18: Consider the DFA of Fig. 4.8, whose transition function we
shall refer to as $\delta$ in this example. Certain pairs of states are obviously not
equivalent. For example, C and G are not equivalent because one is accepting
and the other is not. That is, the empty string distinguishes these two states,
because $\hat{\delta}(C, \epsilon)$ is accepting and $\hat{\delta}(G, \epsilon)$ is not.
Figure 4.8: An automaton with equivalent states
Consider states A and G. String $\epsilon$ doesn't distinguish them, because they are
both nonaccepting states. String 0 doesn't distinguish them because they go to
states B and G, respectively, on input 0, and both these states are nonaccepting.
Likewise, string 1 doesn’t distinguish A from G, because they go to F and E,
respectively, and both are nonaccepting. However, 01 distinguishes A from G,
because $\hat{\delta}(A, 01) = C$, $\hat{\delta}(G, 01) = E$, C is accepting, and E is not. Any input
string that takes A and G to states only one of which is accepting is sufficient
to prove that A and G are not equivalent.
In contrast, consider states A and E. Neither is accepting, so $\epsilon$ does not
distinguish them. On input 1, they both go to state F. Thus, no input string
that begins with 1 can distinguish A from E, since for any string x, $\hat{\delta}(A, 1x) = \hat{\delta}(E, 1x)$.
Now consider the behavior of states A and E on inputs that begin with 0.
They go to states B and H, respectively. Since neither is accepting, string 0
by itself does not distinguish A from E. However, B and H are no help. On
input 1 they both go to C, and on input 0 they both go to G. Thus, all inputs
that begin with 0 will fail to distinguish A from E. We conclude that no input
string whatsoever will distinguish A from E; i.e., they are equivalent states. □
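The search for a distinguishing string in Example 4.18 can be mechanized. A minimal sketch, assuming a DFA stored as a Python dict from (state, symbol) to state: explore pairs $(\hat{\delta}(p, w), \hat{\delta}(q, w))$ breadth-first, so in order of increasing length of w, and stop at the first pair in which exactly one state is accepting. The function name is hypothetical. For Fig. 4.8, the rows of A, B, E, G and H and the 0-transitions of D and F follow from Examples 4.18 and 4.19; the row of C and the 1-transitions of D and F are our assumption, and they do not change the three answers printed below.

```python
from collections import deque

# Fig. 4.8 as rows: ROWS[s] = (successor on 0, successor on 1).
# From the examples: rows A, B, E, G, H; F goes to C on 0, and D has the
# same transitions as F. ASSUMED: row C and the 1-transitions of D and F.
ROWS = {"A": "BF", "B": "GC", "C": "AC", "D": "CG",
        "E": "HF", "F": "CG", "G": "GE", "H": "GC"}
DELTA = {(s, a): row[i] for s, row in ROWS.items() for i, a in enumerate("01")}
ACCEPTING = {"C"}


def shortest_distinguishing_string(delta, accepting, alphabet, p, q):
    """Return a shortest w such that exactly one of delta-hat(p, w) and
    delta-hat(q, w) is accepting, or None if p and q are equivalent."""
    parent = {(p, q): None}            # pair -> (previous pair, symbol read)
    queue = deque([(p, q)])
    while queue:
        r, s = queue.popleft()
        if (r in accepting) != (s in accepting):
            symbols, pair = [], (r, s)
            while parent[pair] is not None:   # walk back to (p, q)
                pair, a = parent[pair]
                symbols.append(a)
            return "".join(reversed(symbols))
        for a in alphabet:
            nxt = (delta[r, a], delta[s, a])
            if nxt not in parent:
                parent[nxt] = ((r, s), a)
                queue.append(nxt)
    return None                        # no reachable pair differs: equivalent


print(shortest_distinguishing_string(DELTA, ACCEPTING, "01", "A", "G"))  # 01
print(shortest_distinguishing_string(DELTA, ACCEPTING, "01", "A", "E"))  # None
print(repr(shortest_distinguishing_string(DELTA, ACCEPTING, "01", "C", "G")))  # ''
```

Because the search is breadth-first, the string returned for A and G is as short as possible, and for A and E every reachable pair agrees on acceptance, which is exactly the case analysis of the example.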
To find states that are equivalent, we make our best efforts to find pairs
of states that are distinguishable. It is perhaps surprising, but true, that if
we try our best, according to the algorithm to be described below, then any
pair of states that we do not find distinguishable are equivalent. The algo-
rithm, which we refer to as the table-filling algorithm, is a recursive discovery
of distinguishable pairs in a DFA $A = (Q, \Sigma, \delta, q_0, F)$.
BASIS: If p is an accepting state and q is nonaccepting, then the pair {p,q} is
distinguishable.
INDUCTION: Let p and q be states such that for some input symbol a, $r = \delta(p, a)$
and $s = \delta(q, a)$ are a pair of states known to be distinguishable. Then
{p,q} is a pair of distinguishable states. The reason this rule makes sense is
that there must be some string w that distinguishes r from s; that is, exactly
one of $\hat{\delta}(r, w)$ and $\hat{\delta}(s, w)$ is accepting. Then string aw must distinguish p from
q, since $\hat{\delta}(p, aw)$ and $\hat{\delta}(q, aw)$ are the same pair of states as $\hat{\delta}(r, w)$ and $\hat{\delta}(s, w)$.
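The basis and induction translate almost line for line into code. The sketch below (the name `table_filling` is ours, not the book's) repeats rounds over all pairs until a round marks nothing new. Run on Fig. 4.8 it should leave exactly {A,E}, {B,H} and {D,F} unmarked, as Example 4.19 reports. The row of C and the 1-transitions of D and F are assumed, since the text does not give them; they cannot affect the table, because every pair containing C is marked by the basis and every pair containing D or F, other than {D,F} itself, is marked through input 0.

```python
from itertools import combinations


def table_filling(states, alphabet, delta, accepting):
    """Return the distinguishable pairs of a DFA as a set of frozensets."""
    # BASIS: an accepting and a nonaccepting state are distinguishable.
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in accepting) != (q in accepting)}
    changed = True
    while changed:                     # keep going until a round adds nothing
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            # INDUCTION: some symbol a sends {p, q} to a marked pair {r, s}.
            if pair not in marked and any(
                    frozenset((delta[p, a], delta[q, a])) in marked
                    for a in alphabet):
                marked.add(pair)
                changed = True
    return marked


# Fig. 4.8; ASSUMED: row C and the 1-transitions of D and F.
ROWS = {"A": "BF", "B": "GC", "C": "AC", "D": "CG",
        "E": "HF", "F": "CG", "G": "GE", "H": "GC"}
DELTA = {(s, a): row[i] for s, row in ROWS.items() for i, a in enumerate("01")}

marked = table_filling(list(ROWS), "01", DELTA, {"C"})
equivalent = [sorted(pair) for pair in map(frozenset, combinations(ROWS, 2))
              if pair not in marked]
print(equivalent)   # [['A', 'E'], ['B', 'H'], ['D', 'F']]
```

A pair such as {r, r} becomes a one-element frozenset, which is never in `marked`, so identical successors never mark a pair; this matches the observation that B and H, or D and F, can never be told apart.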
Example 4.19: Let us execute the table-filling algorithm on the DFA of
Fig. 4.8. The final table is shown in Fig. 4.9, where an x indicates pairs of
distinguishable states, and the blank squares indicate those pairs that have
been found equivalent. Initially, there are no x's in the table.
B  x
C  x  x
D  x  x  x
E     x  x  x
F  x  x  x     x
G  x  x  x  x  x  x
H  x     x  x  x  x  x
   A  B  C  D  E  F  G
Figure 4.9: Table of state inequivalences
For the basis, since C is the only accepting state, we put x's in each pair
that involves C. Now that we know some distinguishable pairs, we can discover
others. For instance, since {C,H} is distinguishable, and states E and F go to
H and C, respectively, on input 0, we know that {E, F} is also a distinguishable
pair. In fact, all the x's in Fig. 4.9 with the exception of the pairs {A,G} and
{E,G} can be discovered simply by looking at the transitions from the pair of
states on either 0 or on 1, and observing that (for one of those inputs) one state
goes to C and the other does not. We can show {A,G} and {E,G} are
distinguishable on the next round, since on input 1 A and G go to F and E,
respectively, as do E and G, and we already established that the pair {E,F} is
distinguishable.
However, then we can discover no more distinguishable pairs. The three
remaining pairs, which are therefore equivalent pairs, are {A, E}, {B,H}, and
{D,F}. For example, consider why we can not infer that {A, E} is a distin-
guishable pair. On input 0, A and E go to B and H, respectively, and {B, H}
has not yet been shown distinguishable. On input 1, A and E both go to F, so
there is no hope of distinguishing them that way. The other two pairs, {B, H}
and {D, F} will never be distinguished because they each have identical tran-
sitions on 0 and identical transitions on 1. Thus, the table-filling algorithm
stops with the table as shown in Fig. 4.9, which is the correct determination of
equivalent and distinguishable states. □
Theorem 4.20: If two states are not distinguished by the table-filling algo-
rithm, then the states are equivalent.
PROOF: Let us again assume we are talking of the DFA $A = (Q, \Sigma, \delta, q_0, F)$.
Suppose the theorem is false; that is, there is at least one pair of states {p,q}
such that
1. States p and q are distinguishable, in the sense that there is some string
w such that exactly one of $\hat{\delta}(p, w)$ and $\hat{\delta}(q, w)$ is accepting, and yet
2. The table-filling algorithm does not find p and q to be distinguished.
Call such a pair of states a bad pair.
If there are bad pairs, then there must be some that are distinguished by the
shortest strings among all those strings that distinguish bad pairs. Let {p,q}
be one such bad pair, and let $w = a_1 a_2 \cdots a_n$ be a string as short as any that
distinguishes p from q. Then exactly one of $\hat{\delta}(p, w)$ and $\hat{\delta}(q, w)$ is accepting.
Observe first that w cannot be $\epsilon$, since if $\epsilon$ distinguishes a pair of states,
then that pair is marked by the basis part of the table-filling algorithm. Thus,
$n \geq 1$.
Consider the states $r = \delta(p, a_1)$ and $s = \delta(q, a_1)$. States r and s are distinguished
by the string $a_2 a_3 \cdots a_n$, since this string takes r and s to the states
$\hat{\delta}(p, w)$ and $\hat{\delta}(q, w)$. However, the string distinguishing r from s is shorter than
any string that distinguishes a bad pair. Thus, {r,s} cannot be a bad pair.
Rather, the table-filling algorithm must have discovered that they are distin-
guishable.
But the inductive part of the table-filling algorithm will not stop until it has
also inferred that p and q are distinguishable, since it finds that $\delta(p, a_1) = r$ is
distinguishable from $\delta(q, a_1) = s$. We have contradicted our assumption that
bad pairs exist. If there are no bad pairs, then every pair of distinguishable
states is distinguished by the table-filling algorithm, and the theorem is true.
□
4.4.2 Testing Equivalence of Regular Languages
The table-filling algorithm gives us an easy way to test if two regular languages
are the same. Suppose languages L and M are each represented in some way,
e.g., one by a regular expression and one by an NFA. Convert each represent-
ation to a DFA. Now, imagine one DFA whose states are the union of the
states of the DFA’s for L and M. Technically, this DFA has two start states,
but actually the start state is irrelevant as far as testing state equivalence is
concerned, so make any state the lone start state.
Now, test if the start states of the two original DFA’s are equivalent, using
the table-filling algorithm. If they are equivalent, then $L = M$, and if not, then
$L \neq M$.
Figure 4.10: Two equivalent DFA’s
Example 4.21: Consider the two DFA’s in Fig. 4.10. Each DFA accepts
the empty string and all strings that end in 0; that is, the language of the regular
expression $\epsilon + (0 + 1)^*0$. We can imagine that Fig. 4.10 represents a single
DFA, with five states A through E. If we apply the table-filling algorithm to
that automaton, the result is as shown in Fig. 4.11.
B  x
C     x
D     x
E  x     x  x
   A  B  C  D
Figure 4.11: The table of distinguishabilities for Fig. 4.10
To see how the table is filled out, we start by placing x's in all pairs of
states where exactly one of the states is accepting. It turns out that there is
no more to do. The four remaining pairs, {A,C}, {A,D}, {C, D}, and {B, E}
are all equivalent pairs. You should check that no more distinguishable pairs
are discovered in the inductive part of the table-filling algorithm. For instance,
with the table as in Fig. 4.11, we cannot distinguish the pair {A, D} because
on 0 they go to themselves, and on 1 they go to the pair {B, E}, which has
not yet been distinguished. Since A and C are found equivalent by this test,
and those states were the start states of the two original automata, we conclude
that these DFA's do accept the same language. □
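The test of Section 4.4.2 can be sketched by tagging the states of the two DFA's apart, taking the union of their transition rules, filling the table, and checking whether the two start states were marked. The helper `equivalent_dfas` and the three sample automata are hypothetical: D1 and D2 are our own hand-built DFA's for $\epsilon + (0+1)^*0$ (not a transcription of Fig. 4.10), and D3 accepts $(0+1)^*0$. Each DFA is assumed to be complete.

```python
from itertools import combinations


def equivalent_dfas(dfa1, dfa2, alphabet):
    """Decide whether two complete DFA's accept the same language.

    Each DFA is a triple (delta, start, accepting) with delta[(state, symbol)]."""
    # Tag states with 1 or 2 so the combined automaton has no name clashes.
    delta, accepting, states = {}, set(), set()
    for tag, (dlt, _, acc) in ((1, dfa1), (2, dfa2)):
        for (s, a), t in dlt.items():
            delta[(tag, s), a] = (tag, t)
            states.add((tag, s))
        accepting |= {(tag, s) for s in acc}
    # Table-filling algorithm on the combined automaton.
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in accepting) != (q in accepting)}
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair not in marked and any(
                    frozenset((delta[p, a], delta[q, a])) in marked
                    for a in alphabet):
                marked.add(pair)
                changed = True
    # L(dfa1) = L(dfa2) exactly when the two start states are equivalent.
    return frozenset(((1, dfa1[1]), (2, dfa2[1]))) not in marked


# Hypothetical DFA's: D1 and D2 for epsilon + (0+1)*0, D3 for (0+1)*0.
D1 = ({("a", "0"): "a", ("a", "1"): "b",
       ("b", "0"): "a", ("b", "1"): "b"}, "a", {"a"})
D2 = ({("c", "0"): "d", ("c", "1"): "e", ("d", "0"): "d",
       ("d", "1"): "e", ("e", "0"): "d", ("e", "1"): "e"}, "c", {"c", "d"})
D3 = ({("x", "0"): "y", ("x", "1"): "x",
       ("y", "0"): "y", ("y", "1"): "x"}, "x", {"y"})

print(equivalent_dfas(D1, D2, "01"))  # True
print(equivalent_dfas(D1, D3, "01"))  # False: epsilon separates the start states
```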
The time to fill out the table, and thus to decide whether two states are
equivalent, is polynomial in the number of states. If there are n states, then
there are $\binom{n}{2}$, or $n(n-1)/2$, pairs of states. In one round, we consider all pairs
of states, to see if one of their successor pairs has been found distinguishable,
so a round surely takes no more than $O(n^2)$ time. Moreover, if on some round,
no additional x's are placed in the table, then the algorithm ends. Thus, since
every round but the last marks at least one of the $n(n-1)/2$ pairs, there
can be no more than $O(n^2)$ rounds, and $O(n^4)$ is surely an upper bound on the
running time of the table-filling algorithm.
However, a more careful algorithm can fill the table in $O(n^2)$ time. The
idea is to initialize, for each pair of states {r,s}, a list of those pairs {p,q} that
“depend on” {r,s}. That is, if {r,s} is found distinguishable, then {p,q} is
distinguishable. We create the lists initially by examining each pair of states
{p,q}, and for each of the fixed number of input symbols a, we put {p,q} on
the list for the pair of states $\{\delta(p, a), \delta(q, a)\}$, which are the successor states for
p and q on input a.
If we ever find {r,s} to be distinguishable, then we go down the list for
{r,s}. For each pair on that list that is not already distinguishable, we make
that pair distinguishable, and we put the pair on a queue of pairs whose lists
we must check similarly.
The total work of this algorithm is proportional to the sum of the lengths
of the lists, since we are at all times either adding something to the lists (ini-
tialization) or examining a member of the list for the first and last time (when
we go down the list for a pair that has been found distinguishable). Since the
size of the input alphabet is considered a constant, each pair of states is put on
$O(1)$ lists. As there are $O(n^2)$ pairs, the total work is $O(n^2)$.
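The $O(n^2)$ refinement can be sketched with a dictionary of dependency lists and a queue. The function name is ours, and the five-state automaton in the usage lines is a hypothetical example in the spirit of Example 4.21 (states a, c and d accept, b and e do not), not the automaton of Fig. 4.10.

```python
from collections import defaultdict, deque
from itertools import combinations


def table_filling_worklist(states, alphabet, delta, accepting):
    """Distinguishable pairs, found by propagating along dependency lists."""
    # depends[{r, s}] holds every pair {p, q} whose successors on some
    # symbol a are {r, s}: if {r, s} is distinguishable, so is {p, q}.
    depends = defaultdict(list)
    for p, q in combinations(states, 2):
        for a in alphabet:
            r, s = delta[p, a], delta[q, a]
            if r != s:                  # a pair {r, r} is never distinguishable
                depends[frozenset((r, s))].append(frozenset((p, q)))
    # Basis pairs are marked first and queued for processing.
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in accepting) != (q in accepting)}
    queue = deque(marked)
    while queue:
        pair = queue.popleft()          # each pair is queued at most once
        for dependent in depends[pair]:
            if dependent not in marked:
                marked.add(dependent)
                queue.append(dependent)
    return marked


# Hypothetical automaton: ROWS[s] = (successor on 0, successor on 1).
ROWS = {"a": "ab", "b": "ab", "c": "de", "d": "de", "e": "de"}
DELTA = {(s, x): row[i] for s, row in ROWS.items() for i, x in enumerate("01")}
marked = table_filling_worklist(list(ROWS), "01", DELTA, {"a", "c", "d"})
print(sorted(sorted(pair) for pair in map(frozenset, combinations(ROWS, 2))
             if pair not in marked))
# [['a', 'c'], ['a', 'd'], ['b', 'e'], ['c', 'd']]
```

Because a pair enters the queue only at the moment it is first marked, each dependency list is scanned at most once, which is the source of the bound on the total work.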
4.4.3 Minimization of DFA’s
Another important consequence of the test for equivalence of states is that we
can “minimize” DFA’s. That is, for each DFA we can find an equivalent DFA
that has as few states as any DFA accepting the same language. Moreover,
except for our ability to call the states by whatever names we choose, this
minimum-state DFA is unique for the language. The algorithm is as follows:
1. First, eliminate any state that cannot be reached from the start state.
2. Then, partition the remaining states into blocks, so that all states in the
same block are equivalent, and no pair of states from different blocks are
equivalent. Theorem 4.24, below, shows that we can always make such a
partition.
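Step 1 is an ordinary graph search from the start state. A minimal sketch, with a hypothetical helper name and a made-up three-state DFA in which state r cannot be reached from the start state p:

```python
from collections import deque


def reachable_states(start, alphabet, delta):
    """Step 1 of minimization: the states reachable from the start state."""
    seen = {start}
    queue = deque([start])
    while queue:                        # breadth-first search over transitions
        s = queue.popleft()
        for a in alphabet:
            t = delta[s, a]
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


# Hypothetical DFA: r has transitions but no incoming path from p.
DELTA = {("p", "0"): "q", ("p", "1"): "p",
         ("q", "0"): "p", ("q", "1"): "q",
         ("r", "0"): "p", ("r", "1"): "r"}
print(sorted(reachable_states("p", "01", DELTA)))  # ['p', 'q']
```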
Example 4.22: Consider the table of Fig. 4.9, where we determined the state
equivalences and distinguishabilities for the states of Fig. 4.8. The partition
of the states into equivalent blocks is ({A,E}, {B,H}, {C}, {D,F}, {G}).
Notice that the three pairs of states that are equivalent are each placed in a
block together, while the states that are distinguishable from all the other states
are each in a block alone.
For the automaton of Fig. 4.10, the partition is ({A,C,D}, {B, E}). This
example shows that we can have more than two states in a block. It may
appear fortuitous that A, C, and D can all live together in a block, because
every pair of them is equivalent, and none of them is equivalent to any other
state. However, as we shall see in the next theorem to be proved, this situation
is guaranteed by our definition of “equivalence” for states. □
Theorem 4.23: The equivalence of states is transitive. That is, if in some
DFA $A = (Q, \Sigma, \delta, q_0, F)$ we find that states p and q are equivalent, and we also
find that q and r are equivalent, then it must be that p and r are equivalent.
PROOF: Note that transitivity is a property we expect of any relationship called
“equivalence.” However, simply calling something “equivalence” doesn’t make
it transitive; we must prove that the name is justified.
Suppose that the pairs {p,q} and {q,r} are equivalent, but pair {p,r} is
distinguishable. Then there is some input string w such that exactly one of
$\hat{\delta}(p, w)$ and $\hat{\delta}(r, w)$ is an accepting state. By symmetry, we may suppose that $\hat{\delta}(p, w)$
is the accepting state.
Now consider whether $\hat{\delta}(q, w)$ is accepting or not. If it is accepting, then
{q,r} is distinguishable, since $\hat{\delta}(q, w)$ is accepting, and $\hat{\delta}(r, w)$ is not. If $\hat{\delta}(q, w)$
is nonaccepting, then {p,q} is distinguishable for a similar reason. We conclude
by contradiction that {p,r} was not distinguishable, and therefore this pair is
equivalent. □
We can use Theorem 4.23 to justify the obvious algorithm for partitioning
states. For each state q, construct a block that consists of q and all the states
that are equivalent to q. We must show that the resulting blocks are a partition;
that is, no state is in two distinct blocks.
First, observe that all states in any block are mutually equivalent. That is,
if p and r are two states in the block of states equivalent to q, then p and r are
equivalent to each other, by Theorem 4.23.
Suppose that there are two overlapping, but not identical blocks. That
is, there is a block B that includes states p and q, and another block C that
includes p but not q. Since p and q are in a block together, they are equivalent.
Consider how the block C was formed. If it was the block generated by p, then
q would be in C, because those states are equivalent. Thus, it must be that
there is some third state s that generated block C; i.e., C is the set of states
equivalent to s.
We know that p is equivalent to s, because p is in block C. We also know that
p is equivalent to q because they are together in block B. By the transitivity of
Theorem 4.23, q is equivalent to s. But then q belongs in block C, a contradiction.
We conclude that equivalence of states partitions the states; that is, two
states either have the same set of equivalent states (including themselves), or
their equivalent states are disjoint. To conclude the above analysis:
Theorem 4.24: If we create for each state q of a DFA a block consisting of
q and all the states equivalent to q, then the different blocks of states form a
partition of the set of states.⁵ That is, each state is in exactly one block. All
members of a block are equivalent, and no pair of states chosen from different
blocks are equivalent. □
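The block construction of Theorem 4.24 is short enough to write out directly. In this sketch (the helper name is hypothetical), the input is the set of equivalent pairs produced by the table-filling algorithm; the usage lines feed it the three pairs found for Fig. 4.8 in Example 4.19 and recover the partition of Example 4.22. The final size check is our addition: since every state lies in its own block, the blocks are disjoint exactly when their sizes add up to the number of states.

```python
def equivalence_blocks(states, equivalent_pairs):
    """Theorem 4.24: one block per state q, namely q plus every state
    equivalent to q; a block formed from several states is kept once."""
    blocks = []
    for q in states:
        block = frozenset([q] + [p for p in states
                                 if frozenset((p, q)) in equivalent_pairs])
        if block not in blocks:         # the same block arises from each member
            blocks.append(block)
    # Every state is in its own block, so overlap shows up as excess size.
    if sum(len(b) for b in blocks) != len(states):
        raise ValueError("the given pairs do not form an equivalence relation")
    return blocks


# Equivalent pairs found for Fig. 4.8 in Example 4.19.
pairs = {frozenset("AE"), frozenset("BH"), frozenset("DF")}
print(["".join(sorted(b)) for b in equivalence_blocks("ABCDEFGH", pairs)])
# ['AE', 'BH', 'C', 'DF', 'G']
```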
We are now able to state succinctly the algorithm for minimizing a DFA
$A = (Q, \Sigma, \delta, q_0, F)$.
1. Use the table-filling algorithm to find all the pairs of equivalent states.
2. Partition the set of states Q into blocks of mutually equivalent states by
the method described above.
3. Construct the minimum-state equivalent DFA B by using the blocks as
its states. Let $\gamma$ be the transition function of B. Suppose S is a set of
equivalent states of A, and a is an input symbol. Then there must exist one
block T of states such that for all states q in S, $\delta(q, a)$ is a member of block
T. For if not, then input symbol a takes two states p and q of S to states
in different blocks, and those states are distinguishable by Theorem 4.24.
That fact, together with the inductive rule of the table-filling algorithm, lets us
conclude that p and q are not equivalent, and they did
not both belong in S. As a consequence, we can let $\gamma(S, a) = T$. In
addition:
(a) The start state of B is the block containing the start state of A.
(b) The set of accepting states of B is the set of blocks containing ac-
cepting states of A. Note that if one state of a block is accepting,
then all the states of that block must be accepting. The reason is
that any accepting state is distinguishable from any nonaccepting
state, so you can’t have both accepting and nonaccepting states in
one block of equivalent states.
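Step 3 can be sketched as follows, assuming the partition from step 2 is given as a list of frozensets. Rather than relying on the argument above, the sketch checks it: for each block and symbol it collects the blocks reached by all members and raises an error if there is more than one. The usage lines apply it to Fig. 4.8 with the blocks of Example 4.22. The row of C and the 1-transitions of D and F are assumed, because the examples do not state them, so only the transitions out of {A,E} can be compared directly with the text.

```python
def quotient_dfa(blocks, alphabet, delta, start, accepting):
    """Step 3: build DFA B whose states are the blocks of equivalent states of A."""
    block_of = {q: b for b in blocks for q in b}
    gamma = {}
    for b in blocks:
        for a in alphabet:
            # Every member of block b must move to one and the same block T on a.
            targets = {block_of[delta[q, a]] for q in b}
            if len(targets) != 1:
                raise ValueError(f"block {sorted(b)} mixes inequivalent states")
            gamma[b, a] = targets.pop()
    new_start = block_of[start]                           # rule (a)
    new_accepting = {b for b in blocks if b & accepting}  # rule (b)
    return gamma, new_start, new_accepting


# Fig. 4.8 with the blocks of Example 4.22.
# ASSUMED: row C and the 1-transitions of D and F.
ROWS = {"A": "BF", "B": "GC", "C": "AC", "D": "CG",
        "E": "HF", "F": "CG", "G": "GE", "H": "GC"}
DELTA = {(s, a): row[i] for s, row in ROWS.items() for i, a in enumerate("01")}
BLOCKS = [frozenset(names) for names in ("AE", "BH", "C", "DF", "G")]

gamma, start, accepting = quotient_dfa(BLOCKS, "01", DELTA, "A", {"C"})
for b in BLOCKS:
    print("".join(sorted(b)), "->",
          "".join(sorted(gamma[b, "0"])), "".join(sorted(gamma[b, "1"])))
```

The first line printed, `AE -> BH DF`, is the pair of transitions out of the start state discussed in Example 4.25.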
Example 4.25: Let us minimize the DFA from Fig. 4.8. We established the
blocks of the state partition in Example 4.22. Figure 4.12 shows the minimum-
state automaton. Its five states correspond to the five blocks of equivalent states
for the automaton of Fig. 4.8.
The start state is {A, E}, since A was the start state of Fig. 4.8. The only
accepting state is {C}, since C is the only accepting state of Fig. 4.8. Notice
that the transitions of Fig. 4.12 properly reflect the transitions of Fig. 4.8. For
instance, Fig. 4.12 has a transition on input 0 from {A,E} to {B,H}. That
makes sense, because in Fig. 4.8, A goes to B on input 0, and E goes to H.
Likewise, on input 1, {A,E} goes to {D,F}. If we examine Fig. 4.8, we find
that both A and E go to F on input 1, so the selection of the successor of
{A,E} on input 1 is also correct. Note that the fact that neither A nor E goes to
D on input 1 is not important. You may check that all of the other transitions
are also proper. □
Figure 4.12: Minimum-state DFA equivalent to Fig. 4.8
⁵ You should remember that the same block may be formed several times, starting from
different states. However, the partition consists of the different blocks, so this block appears
only once in the partition.
4.4.4 Why the Minimized DFA Can’t Be Beaten
Suppose we have a DFA A, and we minimize it to construct a DFA M, using the
partitioning method of Theorem 4.24. That theorem shows that we can’t group
the states of A into fewer groups and still have an equivalent DFA. However,
could there be another DFA N, unrelated to A, that accepts the same language
as A and M, yet has fewer states than M? We can prove by contradiction that
N does not exist.
First, run the state-distinguishability process of Section 4.4.1 on the states of
M and N together, as if they were one DFA. We may assume that the states of
M and N have no names in common, so the transition function of the combined
automaton is the union of the transition rules of M and N, with no interaction.
States are accepting in the combined DFA if and only if they are accepting in
the DFA from which they come.
The start states of M and N are indistinguishable because L(M) = L(N).
Further, if {p,q} are indistinguishable, then their successors on any one input
symbol are also indistinguishable. The reason is that if we could distinguish
the successors, then we could distinguish p from q.
Minimizing the States of an NFA
You might imagine that the same state-partition technique that minimizes
the states of a DFA could also be used to find a minimum-state NFA
equivalent to a given NFA or DFA. While we can, by a process of exhaustive
enumeration, find an NFA with as few states as possible accepting a given
regular language, we cannot simply group the states of some given NFA
for the language.
An example is in Fig. 4.13. No two of the three states are equivalent.
Surely accepting state B is distinguishable from nonaccepting states A and
C. However, A and C are distinguishable by input 0. The successors of C
are A alone, which does not include an accepting state, while the successors
of A are {A,B}, which does include an accepting state. Thus, grouping
equivalent states does not reduce the number of states of Fig. 4.13.
However, we can find a smaller NFA for the same language if we
simply remove state C. Note that A and B alone accept all strings ending
in 0, while adding state C does not allow us to accept any other strings.
Neither M nor N could have an inaccessible state, or else we could eliminate
that state and have an even smaller DFA for the same language. Thus, every
state of M is indistinguishable from at least one state of N. To see why, suppose
p is a state of M. Then there is some string $a_1 a_2 \cdots a_k$ that takes the start
state of M to state p. This string also takes the start state of N to some state
q. Since we know the start states are indistinguishable, we also know that their
successors under input symbol $a_1$ are indistinguishable. Then, the successors
of those states on input $a_2$ are indistinguishable, and so on, until we conclude
that p and q are indistinguishable.
Figure 4.13: An NFA that cannot be minimized by state equivalence
Since N has fewer states than M, there are two states of M that are indistinguishable
from the same state of N, and therefore, by Theorem 4.23, indistinguishable from
each other. But M was designed so that all its states are distinguishable from
each other. We have a contradiction, so the assumption that N exists is wrong,
and M in fact has as few states as any equivalent DFA for A. Formally, we
have proved:
Theorem 4.26: If A is a DFA, and M the DFA constructed from A by the
algorithm described in the statement of Theorem 4.24, then M has as few states
as any DFA equivalent to A. □
In fact we can say something even stronger than Theorem 4.26. There must
be a one-to-one correspondence between the states of any other minimum-state
N and the DFA M. The reason is that we argued above how each state of M
must be equivalent to one state of N, and no state of M can be equivalent to
two states of N. We can similarly argue that no state of N can be equivalent
to two states of M, although each state of N must be equivalent to one of M's
states. Thus, the minimum-state DFA equivalent to A is unique except for a
possible renaming of the states.
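The one-to-one correspondence can also be found mechanically, because each input symbol forces which state of one DFA must match which state of the other. A sketch, assuming both DFA's are complete and have no inaccessible states; `find_renaming` and the sample automata are hypothetical. M1 and M2 are the same two-state DFA for $\epsilon + (0+1)^*0$ under different state names, while M3 accepts $(0+1)^*0$.

```python
from collections import deque


def find_renaming(dfa1, dfa2, alphabet):
    """Return a dict renaming dfa1's states to dfa2's, or None.

    Each DFA is (delta, start, accepting), assumed complete and with every
    state reachable from its start state."""
    (d1, s1, f1), (d2, s2, f2) = dfa1, dfa2
    rename = {s1: s2}                   # start state must map to start state
    queue = deque([s1])
    while queue:
        p = queue.popleft()
        q = rename[p]
        if (p in f1) != (q in f2):      # acceptance must be preserved
            return None
        for a in alphabet:              # successors are forced to correspond
            p2, q2 = d1[p, a], d2[q, a]
            if p2 not in rename:
                rename[p2] = q2
                queue.append(p2)
            elif rename[p2] != q2:
                return None
    states1 = {s for s, _ in d1}
    states2 = {s for s, _ in d2}
    values = set(rename.values())
    if set(rename) != states1 or len(values) != len(rename) or values != states2:
        return None                     # not a one-to-one correspondence
    return rename


M1 = ({("a", "0"): "a", ("a", "1"): "b",
       ("b", "0"): "a", ("b", "1"): "b"}, "a", {"a"})
M2 = ({("x", "0"): "x", ("x", "1"): "y",
       ("y", "0"): "x", ("y", "1"): "y"}, "x", {"x"})
M3 = ({("x", "0"): "y", ("x", "1"): "x",
       ("y", "0"): "y", ("y", "1"): "x"}, "x", {"y"})

print(find_renaming(M1, M2, "01"))  # {'a': 'x', 'b': 'y'}
print(find_renaming(M1, M3, "01"))  # None
```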
Figure 4.14: A DFA to be minimized
4.4.5 Exercises for Section 4.4
* Exercise 4.4.1: In Fig. 4.14 is the transition table of a DFA.
a) Draw the table of distinguishabilities for this automaton.
b) Construct the minimum-state equivalent DFA.
Exercise 4.4.2: Repeat Exercise 4.4.1 for the DFA of Fig. 4.15.
!! Exercise 4.4.3: Suppose that p and q are distinguishable states of a given DFA
A with n states. As a function of n, what is the tightest upper bound on how
long the shortest string that distinguishes p from q can be?
Figure 4.15: Another DFA to minimize
4.5 Summary of Chapter 4
• The Pumping Lemma: If a language is regular, then every sufficiently long
string in the language has a nonempty substring that can be “pumped,”
that is, repeated any number of times while the resulting strings are also
in the language. This fact can be used to prove that many different
languages are not regular.
• Operations That Preserve the Property of Being a Regular Language:
There are many operations that, when applied to regular languages, yield
a regular language as a result. Among these are union, concatenation, clo-
sure, intersection, complementation, difference, reversal, homomorphism
(replacement of each symbol by an associated string), and inverse homo-
morphism.
• Testing Emptiness of Regular Languages: There is an algorithm that,
given a representation of a regular language, such as an automaton or
regular expression, tells whether or not the represented language is the
empty set.
• Testing Membership in a Regular Language: There is an algorithm that,
given a string and a representation of a regular language, tells whether or
not the string is in the language.
• Testing Distinguishability of States: Two states of a DFA are distinguishable
if there is an input string that takes exactly one of the two states to
an accepting state. By starting with only the fact that pairs consisting of
one accepting and one nonaccepting state are distinguishable, and trying
to discover additional pairs of distinguishable states by finding pairs whose
successors on one input symbol are distinguishable, we can discover all
pairs of distinguishable states.