Minimal DFA For Symmetric Difference NFA
Brink van der Merwe¹, Hellis Tamm², and Lynette van Zijl¹
¹ Department of Computer Science,
Stellenbosch University, Private Bag X1, 7602 Matieland, South Africa
[email protected],[email protected]
² Institute of Cybernetics, Tallinn University of Technology,
Akadeemia tee 21, 12618 Tallinn, Estonia
[email protected]
1 Introduction
⊕-NFAs are typically defined by using the symmetric difference set operation, in
contrast to the union set operation as in the case of NFAs. We give this definition,
in order to be consistent with previous literature, but also equivalently consider
⊕-NFAs as weighted automata over the semiring Z2 (the Galois field with two
elements).
We begin this section by recalling the definitions for a semiring and for
weighted automata.
Definition 1. (from [6], [7]) A tuple (S, ⊕, ⊗, 0̄, 1̄) is a semiring if (S, ⊕, 0̄) is a
commutative monoid with identity element 0̄, (S, ⊗, 1̄) is a monoid with identity
element 1̄, ⊗ distributes over ⊕, and 0̄ is an annihilator for ⊗: for all a ∈ S,
a ⊗ 0̄ = 0̄ ⊗ a = 0̄.
Example 1.
a) The Boolean semiring is the two element semiring over true (true being 1̄)
and false (false being 0̄) using and as ⊗ and or as ⊕.
b) The symmetric difference semiring (Z2) is obtained by replacing or with
exclusive or in the definition of the Boolean semiring.
c) The tropical semiring is the semiring (N∪{+∞}, min, +, +∞, 0), also known
as the min-plus semiring, with min and + extended to N ∪ {+∞} in the
natural way (N denotes the natural numbers, including 0).
Note that δ(a) is a Q × Q-matrix whose (p, q)-th entry δ(a)p,q ∈ S indicates
the weight of the transition from p to q on the symbol a.
Let S be a semiring and A a weighted automaton over S. A path in A is
an alternating sequence P = q_0 a_1 q_1 … q_{n−1} a_n q_n ∈ Q(ΣQ)*. Its run weight is
the product rw(P) = I(q_0) ⊗ δ(a_1)_{q_0,q_1} ⊗ δ(a_2)_{q_1,q_2} ⊗ … ⊗ δ(a_n)_{q_{n−1},q_n} ⊗ F(q_n).
The label of a path P = q_0 a_1 q_1 … q_{n−1} a_n q_n, denoted by label(P),
is the word a_1 … a_n ∈ Σ*. The behaviour of a weighted automaton A is the
function ‖A‖ : Σ* → S defined by ‖A‖(w) = ⊕_{label(P)=w} rw(P). One can check
that ‖A‖(w_1 … w_n) = I δ(w_1) … δ(w_n) F^T, with the usual matrix multiplication,
considering I and F as row vectors and denoting by F^T the column vector
obtained by transposing F.
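The matrix form of the behaviour can be checked mechanically. The following is
a minimal sketch in Python, not taken from the paper: the names Semiring, lift,
vec_mat, behaviour and run, as well as the two-state example automaton, are
assumptions made for illustration. It evaluates I δ(w_1) … δ(w_n) F^T over a
chosen semiring.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    add: Callable[[Any, Any], Any]   # the operation ⊕
    mul: Callable[[Any, Any], Any]   # the operation ⊗
    zero: Any                        # identity of ⊕, annihilator of ⊗
    one: Any                         # identity of ⊗


BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
Z2 = Semiring(lambda a, b: a ^ b, lambda a, b: a & b, 0, 1)
TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0)


def lift(s, rows):
    """Map a 0/1 matrix to the zero/one elements of the semiring s."""
    return [[s.one if x else s.zero for x in row] for row in rows]


def vec_mat(s, v, m):
    """Row vector times matrix, computed with the operations of s."""
    out = []
    for j in range(len(m[0])):
        acc = s.zero
        for i in range(len(v)):
            acc = s.add(acc, s.mul(v[i], m[i][j]))
        out.append(acc)
    return out


def behaviour(s, init, delta, final, word):
    """Compute I δ(w_1) ... δ(w_n) F^T over the semiring s."""
    v = list(init)
    for a in word:
        v = vec_mat(s, v, delta[a])
    acc = s.zero
    for x, f in zip(v, final):
        acc = s.add(acc, s.mul(x, f))
    return acc


# Hypothetical automaton: both states are initial, both move to state 0 on
# the letter a, and only state 0 is final.
INIT = [1, 1]
DELTA_A = [[1, 0], [1, 0]]
FINAL = [1, 0]


def run(s, word):
    """Evaluate the example automaton on word over the semiring s."""
    init = lift(s, [INIT])[0]
    final = lift(s, [FINAL])[0]
    return behaviour(s, init, {"a": lift(s, DELTA_A)}, final, word)
```

In this example the word a has two accepting paths, so its weight is true over
the Boolean semiring but 0 over Z2, which is exactly the difference between
union and symmetric difference.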
Note that the Boolean semiring and Z2 are the only semirings with two
elements. In the case of two-element semirings, we can interpret weights as
acceptance and rejection (words with weight 1̄ are accepted). Also, weighted
automata over the Boolean semiring and over Z2 accept the same class of
languages, namely, the regular languages.
⊕-NFAs are in fact precisely weighted automata over Z2 , but in order to stay
consistent with previous work on the topic, we next give the standard definition
for ⊕-NFAs.
[Figure: transition diagrams on the states q1, q2, q3, with transitions labelled a.]
- Q^D = {δ(I, w) | w ∈ Σ*};
- for J ∈ Q^D ⊆ 2^Q and a ∈ Σ, δ^D(J, a) = ⊕_{q∈J} δ(q, a), with δ^D(∅, a) = ∅ for
all a ∈ Σ, if ∅ ∈ Q^D;
- the start state q_0 of N^D is the set I;
- the set of final states F^D of N^D is {K ∈ Q^D | |K ∩ F| mod 2 ≠ 0}.
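The construction listed above can be sketched in Python as follows. This is an
illustrative sketch rather than the authors' implementation: the function name
xor_determinize, the dictionary encoding of δ, and the three-state unary example
are assumptions.

```python
from collections import deque


def xor_determinize(alphabet, delta, initial, final):
    """Subset construction with symmetric difference instead of union.

    delta maps (state, symbol) to a frozenset of states; missing entries
    are treated as the empty set.
    """
    start = frozenset(initial)
    states = {start}
    trans = {}
    queue = deque([start])
    while queue:
        J = queue.popleft()
        for a in alphabet:
            K = frozenset()
            for q in J:
                # frozenset ^ frozenset is the symmetric difference
                K = K ^ delta.get((q, a), frozenset())
            trans[(J, a)] = K
            if K not in states:
                states.add(K)
                queue.append(K)
    # A subset is final when it meets F in an odd number of states.
    accepting = {K for K in states if len(K & final) % 2 == 1}
    return states, trans, start, accepting


# Hypothetical unary example: q1 -> {q2}, q2 -> {q3}, q3 -> {q1, q3}.
DELTA = {
    ("q1", "a"): frozenset({"q2"}),
    ("q2", "a"): frozenset({"q3"}),
    ("q3", "a"): frozenset({"q1", "q3"}),
}
STATES, TRANS, START, ACCEPTING = xor_determinize(
    ["a"], DELTA, {"q1"}, frozenset({"q3"})
)
```

For this example the construction visits all seven non-empty subsets of
{q1, q2, q3} in a single cycle, and the empty set is never reached.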
As is the case for weighted automata in general, one can encode the transition
table of a unary ⊕-NFA N as a binary matrix m(δ(a)):
\[ m(\delta(a))_{ij} = \begin{cases} 1 & \text{if } q_j \in \delta(q_i, a), \\ 0 & \text{otherwise,} \end{cases} \]
and successive matrix multiplications in the Galois field Z2 reflect the subset
construction on N.
The matrix m(δ(a)) is called the characteristic matrix of N, and
c(x) = det(m(δ(a)) − xI), where I is the identity matrix of the appropriate
size, is known as its characteristic polynomial.
Similarly, we can encode any set of states B ⊆ Q as an n-entry row vector
v(B) by defining
\[ v(B)_i = \begin{cases} 1 & \text{if } q_i \in B, \\ 0 & \text{otherwise.} \end{cases} \]
We place an arbitrary but fixed order on the elements of Q. We refer to v(B)
as the vector encoding of B, and to B as the set encoding of v(B). Note that
v(B_1) + v(B_2) = v(B_1 ⊕ B_2).
The matrix product v(I)m(δ(a)) encodes the states reachable from the initial
states after reading one letter, v(I)m(δ(a))^2 encodes the states reachable after
two letters, and in general v(I)m(δ(a))^k encodes the states reachable after k
letters. Standard linear algebra shows the following: the word a^k is accepted
by N if and only if
v(I) m(δ(a))^k v(F)^T = 1,
where v(F)^T denotes the transpose of the row vector v(F). In the general case,
where we also consider non-unary symmetric difference automata, one can
associate a matrix m(δ(a)) to each symbol a ∈ Σ. Then a word w = w_1 … w_k, with
w_i ∈ Σ, is accepted if and only if:
v(I) m(δ(w_1)) … m(δ(w_k)) v(F)^T = 1.
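The acceptance condition can be evaluated directly with arithmetic modulo 2.
The sketch below is illustrative only: the helper names vec_mat_gf2 and accepts
and the three-state unary automaton are assumptions, not taken from the paper.

```python
def vec_mat_gf2(v, m):
    """Row vector times matrix over Z2 (entries are 0 or 1)."""
    return [
        sum(v[i] & m[i][j] for i in range(len(v))) % 2
        for j in range(len(m[0]))
    ]


def accepts(init, mats, final, word):
    """Check whether v(I) m(δ(w_1)) ... m(δ(w_k)) v(F)^T equals 1 over Z2."""
    v = list(init)
    for a in word:
        v = vec_mat_gf2(v, mats[a])
    return sum(x & f for x, f in zip(v, final)) % 2 == 1


# Hypothetical unary automaton: q1 -> {q2}, q2 -> {q3}, q3 -> {q1, q3},
# with initial state q1 and final state q3.
MATS = {"a": [[0, 1, 0], [0, 0, 1], [1, 0, 1]]}
V_I = [1, 0, 0]
V_F = [0, 0, 1]

PATTERN = [accepts(V_I, MATS, V_F, "a" * k) for k in range(8)]
```

Here PATTERN records, for k = 0, …, 7, whether a^k is accepted; each entry is
the q3-component of v(I)m(δ(a))^k.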
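Note that if N is an n-state ⊕-NFA with initial vector v(I), final vector
v(F) and transition matrices m(δ(a)), and A is an n × n non-singular matrix with
inverse A^{-1}, then the ⊕-NFA N_A with initial vector v(I)A, final vector
A^{-1}v(F)^T (written as a column vector) and transition matrices
A^{-1}m(δ(a))A accepts the same language as N, since:
\[ v(I)A \cdot A^{-1}m(\delta(w_1))A \cdots A^{-1}m(\delta(w_k))A \cdot A^{-1}v(F)^T = v(I)\,m(\delta(w_1)) \cdots m(\delta(w_k))\,v(F)^T. \]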
It is shown in [17] that if N and N′ are minimal ⊕-NFAs for the same language
L, then we can find a non-singular matrix A such that N′ = N_A. We will refer
to the process of changing from N to N_A as making a change of basis by using
A.
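A change of basis can be tried out on a small instance. The following sketch is
an illustration under stated assumptions: the helpers mat_mul, mat_inv,
change_basis and accepts, the example automaton and the matrix A are all
hypothetical choices, and the inverse is computed by Gauss-Jordan elimination
over Z2.

```python
def mat_mul(a, b):
    """Matrix product over Z2."""
    return [
        [sum(a[i][k] & b[k][j] for k in range(len(b))) % 2
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mat_inv(a):
    """Inverse over Z2 by Gauss-Jordan elimination; raises if singular."""
    n = len(a)
    aug = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over Z2")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col]:
                aug[r] = [x ^ y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def change_basis(init, mats, final, a):
    """Return v(I)A, the matrices A^-1 m(δ(s)) A, and A^-1 v(F)^T."""
    a_inv = mat_inv(a)
    new_init = mat_mul([init], a)[0]
    new_final = [row[0] for row in mat_mul(a_inv, [[f] for f in final])]
    new_mats = {s: mat_mul(mat_mul(a_inv, m), a) for s, m in mats.items()}
    return new_init, new_mats, new_final


def accepts(init, mats, final, word):
    """Acceptance test v(I) m(δ(w_1)) ... m(δ(w_k)) v(F)^T = 1 over Z2."""
    v = [list(init)]
    for s in word:
        v = mat_mul(v, mats[s])
    return sum(x & f for x, f in zip(v[0], final)) % 2 == 1


# Hypothetical automaton N and non-singular matrix A.
MATS = {"a": [[0, 1, 0], [0, 0, 1], [1, 0, 1]]}
V_I, V_F = [1, 0, 0], [0, 0, 1]
A = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
NEW_I, NEW_MATS, NEW_F = change_basis(V_I, MATS, V_F, A)
```

Comparing accepts on N and on N_A for a range of words gives a quick sanity
check that the two automata agree.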
N (see [16]). It can be shown that N is minimal, and that N^D is a minimal DFA
([16]), which is always the case when determinizing a minimal ⊕-NFA, as we
will show later in Theorem 4. If we encode the start state as a row vector v(I),
with only the first component of v(I) equal to one, and compute v(I)m(δ(a))^k,
we end up with the k-th entry in the on-the-fly subset construction on N. For
example, with the start state q1 encoded as v(I) = [1 0 0], we see that
\[ m(\delta(a))^4 = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}, \]
Note that we always have ∅ ∈ K(N ).
The range of a ⊕-NFA N = (Q, Σ, δ, I, F) is defined as the linear subspace
of 2^Q generated by subsets of the form δ(I, w).
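A basis of the range can be computed without enumerating all words, because the
range is the smallest subspace that contains v(I) and is closed under the
transition maps. The sketch below is an assumption-laden illustration: subsets
are encoded as integer bitmasks (bit i stands for the i-th state), and the
names reduce_vec and range_basis and both example automata are hypothetical.

```python
def reduce_vec(x, basis):
    """Reduce bitmask x against basis, a dict from leading bit to vector."""
    while x:
        top = x.bit_length() - 1
        if top not in basis:
            return x
        x ^= basis[top]
    return 0


def range_basis(n_states, delta, initial):
    """Basis of the Z2-span of all subsets δ(I, w), as bitmasks.

    delta maps each symbol to a list rows, where rows[i] is the bitmask of
    the set δ(q_i, symbol).
    """
    basis = {}
    work = []
    x = reduce_vec(initial, basis)
    if x:
        basis[x.bit_length() - 1] = x
        work.append(x)
    while work:
        x = work.pop()
        for rows in delta.values():
            # Image of x: XOR of the rows selected by the bits of x.
            y = 0
            for i in range(n_states):
                if (x >> i) & 1:
                    y ^= rows[i]
            y = reduce_vec(y, basis)
            if y:
                basis[y.bit_length() - 1] = y
                work.append(y)
    return list(basis.values())


# Hypothetical example 1: states 0 and 1 swap, state 2 is never reached.
SMALL = range_basis(3, {"a": [0b010, 0b001, 0b101]}, 0b001)
# Hypothetical example 2: 0 -> {1}, 1 -> {2}, 2 -> {0, 2}.
FULL = range_basis(3, {"a": [0b010, 0b100, 0b101]}, 0b001)
```

The number of basis vectors is the dimension k of the range, so the range
contains 2^k subsets; in the second example the range is all of 2^Q.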
The notion of being trim for ⊕-NFAs is defined analogously to the case of
(union) NFAs. However, instead of also removing states from which final states
cannot be reached, as in the case of (union) NFAs, we consider these as states
that should be trimmed from N^R.
Proof. Assume that N = (Q, Σ, δ, I, F). The proposition follows from the
following standard fact from linear algebra. Let V be a k-dimensional linear
subspace of the n-dimensional vector space Z_2^n. Then there exists an n × n
non-singular matrix A over Z_2 such that {vA | v ∈ V} is equal to
{(a_1, …, a_k, 0, …, 0) ∈ (Z_2)^n | a_i ∈ Z_2}. Thus we can find a non-singular
matrix A such that the vectors v(I)m(δ(w_1)) … m(δ(w_l))A, for all
w = w_1 … w_l ∈ Σ*, each of which is also equal to
v(I)A A^{-1}m(δ(w_1)) … A A^{-1}m(δ(w_l))A, generate (by using ⊕) precisely all 2^k
vectors with all components from the (k + 1)-th component onwards being zero.
Thus by removing states q_{k+1}, …, q_n from N_A, we obtain a trim ⊕-NFA. □
Proof. Assume that K(N) ≠ {∅} and let ∅ ≠ J ∈ K(N). Then from the definition
of K(N) we have that L_{J,F}(N) = ∅, and thus L_{F,J}(N^R) = ∅. But note
that L_{F,J}(N^R) = ∅ implies that R(N^R) ≠ 2^Q. The converse can be proved in a
similar way. □
Remark 1. It can be shown that if K(N) = {∅} and K(N^R) = {∅} (or if K(N) =
{∅} and R(N) = 2^Q), then N is minimal, but this result is not required in the
remainder of this paper.
[Figure, panels (a), (b) and (c): transition diagrams over the alphabet {a, b} on the states q1, …, q5.]
[Figure, panels (a) and (b): transition diagrams over the alphabet {a, b} on states among q1, …, q4 and ∅.]
determinization it can be verified that the right languages of the states of N are
as follows: A1 ∪ A3 for q1 (note that A1 ∪ A3 = L1 ), A2 for q2 , A3 for q3 , and
A4 for q4 . Thus N is atomic, but not ⊕-atomic, since none of A2 , A3 or A4 can
be expressed as a symmetric difference of any combination of quotients.
Proof. From [17] we have that if N and N′ are minimal ⊕-NFAs with L(N) =
L(N′), then we can obtain N′ from N by making a change of basis. Also, by
Theorem 2, there exists a minimal ⊕-NFA N for any language L that is also
a ⊕-RFSA. The result now follows from the observation that if N_A is obtained
from N by a change of basis by using the non-singular matrix A, then the right
language of a state of N_A is the symmetric difference of the right languages of
some of the states of N. To see this, note that from the equation
\[ A^{-1}m(\delta(w_1))A \cdots A^{-1}m(\delta(w_k))A \cdot A^{-1}v(F)^T = A^{-1}\,m(\delta(w_1)) \cdots m(\delta(w_k))\,v(F)^T \]
it follows that if we take the symmetric difference of the right languages of states
of N corresponding to the positions of the i-th row of A^{-1} having 1's, then we
obtain the right language of the i-th state of N_A. □
Proposition 6. Let D = (Q, Σ, δ, q_0, F) be the complete minimal DFA accepting
L. Then there is a one-to-one correspondence between the sets Q and B, mapping
a state q ∈ Q to some atom B_i so that L_{q_0,q}(D) = B_i^R holds.
Corollary 1. Let D = (Q, Σ, δ, q_0, F) be any DFA accepting L. Then for every
state q ∈ Q there is some i ∈ {1, …, r} such that L_{q_0,q}(D) ⊆ B_i^R holds.
The following theorem is similar to the result obtained in [2] for (union)
NFAs. Also, the proof we present here for ⊕-NFAs is essentially the same as it
was for NFAs in [2].
Theorem 3. For any ⊕-NFA N, N^D is minimal if and only if N^R is atomic.
Proof. Let N = (Q, Σ, δ, I, F) be a ⊕-NFA and assume that N^D is minimal,
but suppose that N^R is not atomic. Then there is a state q of N^R whose right
language is not a union of atoms. That is, there is a word u ∈ L_{q,I}(N^R) such
that u ∈ B_i for some i ∈ {1, …, r}, but for some other word v ∈ B_i,
v ∉ L_{q,I}(N^R). It follows that u^R ∈ L_{I,q}(N) and v^R ∉ L_{I,q}(N). Since we
assumed that N^D is a minimal DFA, by Proposition 6 there is a state s of N^D
such that L_{I,s}(N^D) = B_i^R. It follows that u^R, v^R ∈ L_{I,s}(N^D). Since we
had u^R ∈ L_{I,q}(N), we get that q ∈ s. On the other hand, because
v^R ∉ L_{I,q}(N) holds, we get q ∉ s, a contradiction.
Conversely, assume that N^R is atomic. Then for every state q of N^R, there is
a set H_q ⊆ {1, …, r} such that L_{q,I}(N^R) = ∪_{i∈H_q} B_i. This implies that
L_{I,q}(N) = ∪_{i∈H_q} B_i^R for every state q of N. Consider any state s of the
DFA N^D and any word u such that u ∈ L_{I,s}(N^D). Then, clearly, u ∈ L_{I,q}(N)
for every q ∈ s and u ∉ L_{I,q}(N) for every q ∉ s, so L_{I,s}(N^D) is a boolean
combination of the sets B_i^R. By Corollary 1, L_{I,s}(N^D) ⊆ B_k^R for some atom
B_k. Since atoms are disjoint, any boolean combination of sets B_i^R cannot be a
proper subset of any B_k^R. Thus, L_{I,s}(N^D) = B_k^R. If we suppose that N^D is
not minimal, then there are some states s′ and s″ of N^D, and a state t of the
corresponding minimal DFA, such that s′, s″ and t have the same right language.
Then it is easy to see by Proposition 6 that there is some B_i such that
L_{I,s′}(N^D) ⊂ B_i^R and L_{I,s″}(N^D) ⊂ B_i^R, a contradiction. □
Theorem 4. If N is a minimal ⊕-NFA, then N^D is a minimal DFA.
Proof. The result follows from Propositions 1, 4, 5 and Theorem 3. □
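Theorem 4 can be observed on a small instance by determinizing a ⊕-NFA and
counting the classes of states with pairwise distinct right languages. The
sketch below is illustrative and not part of the paper: the names
xor_determinize and count_classes and the three-state unary example are
assumptions. The example is minimal as a ⊕-NFA because a two-state ⊕-NFA
yields a DFA with at most four states.

```python
def xor_determinize(alphabet, delta, initial, final):
    """Reachable part of the symmetric difference subset construction."""
    start = frozenset(initial)
    states, trans, todo = {start}, {}, [start]
    while todo:
        J = todo.pop()
        for a in alphabet:
            K = frozenset()
            for q in J:
                K = K ^ delta.get((q, a), frozenset())
            trans[(J, a)] = K
            if K not in states:
                states.add(K)
                todo.append(K)
    accepting = {K for K in states if len(K & final) % 2 == 1}
    return states, trans, accepting


def count_classes(alphabet, states, trans, accepting):
    """Moore partition refinement; returns the number of classes of states
    with pairwise distinct right languages."""
    block = {s: int(s in accepting) for s in states}
    while True:
        ids = {}
        new_block = {}
        for s in states:
            # Signature: own block followed by the blocks of all successors.
            sig = (block[s],) + tuple(block[trans[(s, a)]] for a in alphabet)
            new_block[s] = ids.setdefault(sig, len(ids))
        if len(ids) == len(set(block.values())):
            return len(ids)
        block = new_block


# Hypothetical minimal unary example: q1 -> {q2}, q2 -> {q3}, q3 -> {q1, q3}.
DELTA = {
    ("q1", "a"): frozenset({"q2"}),
    ("q2", "a"): frozenset({"q3"}),
    ("q3", "a"): frozenset({"q1", "q3"}),
}
STATES, TRANS, ACCEPTING = xor_determinize(
    ["a"], DELTA, {"q1"}, frozenset({"q3"})
)
CLASSES = count_classes(["a"], STATES, TRANS, ACCEPTING)
```

Since every state of the constructed DFA is reachable, the DFA is minimal
exactly when CLASSES equals the number of states.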
References
1. Brzozowski, J.: Canonical regular expressions and minimal state graphs for def-
inite events. In: Proceedings of the Symposium on the Mathematical Theory of
Automata. MRI Symposia Series, Polytechnic Press of Polytechnic Institute of
Brooklyn (1963) 529–561
2. Brzozowski, J., Tamm, H.: Theory of átomata. In Mauri, G., Leporati, A., eds.:
Proceedings of the 15th International Conference on Developments in Language
Theory (DLT). Volume 6795 of Lecture Notes in Computer Science, Springer (2011)
105–117
3. Brzozowski, J., Tamm, H.: Quotient complexities of atoms of regular languages.
http://arxiv.org/abs/1201.0295 (2012)
4. Denis, F., Lemay, A., Terlutte, A.: Residual finite state automata. In Ferreira, A.,
Reichel, H., eds.: STACS 2001, 18th Annual Symposium on Theoretical Aspects of
Computer Science, Dresden, Germany, February 15-17, 2001, Proceedings. Volume
2010 of Lecture Notes in Computer Science, Springer (2001) 144–157
5. Dornhoff, L., Hohn, F.: Applied Modern Algebra. Macmillan Publishing Company
(1978)
6. Droste, M., Kuich, W., Vogler, H.: Handbook of Weighted Automata. 1st edn.
Springer Publishing Company, Incorporated (2009)
7. Droste, M., Rahonis, G.: Weighted automata and weighted logics on infinite words.
In Ibarra, O.H., Dang, Z., eds.: Developments in Language Theory. Volume 4036
of Lecture Notes in Computer Science, Springer (2006) 49–58
8. Hopcroft, J., Ullman, J.: Introduction to Automata Theory, Languages and Com-
putation. Addison Wesley (1979)
9. Ilie, L., Navarro, G., Yu, S.: On NFA reductions. Lecture Notes in Computer
Science 3113 112–124
10. Jiang, T., Ravikumar, B.: Minimal NFA problems are hard. SIAM Journal on
Computing 22(6) (December 1993) 1117–1141
11. Kirsten, D., Mäurer, I.: On the determinization of weighted automata. Journal of
Automata, Languages and Combinatorics 10(2/3) (2005) 287–312
12. Mohri, M.: Finite-state transducers in language and speech processing. Computa-
tional Linguistics 23(2) (June 1997) 269–311
13. Stone, H.: Discrete Mathematical Structures. Science Research Associates (1973)
14. Van der Merwe, A., Van Zijl, L., Geldenhuys, J.: Ambiguity of unary symmetric
difference NFAs. In: Proceedings of the International Colloquium on Theoreti-
cal Aspects of Computing. Volume 6916 of Lecture Notes in Computer Science,
Springer (September 2011) 256–266
15. Van Zijl, L.: Generalized Nondeterminism and the Succinct Representation of
Regular Languages. PhD thesis, Stellenbosch University (November 1997)
16. Van Zijl, L.: On binary symmetric difference NFAs and succinct representations of
regular languages. Theoretical Computer Science 328(1) (November 2004) 161–170
17. Vuillemin, J., Gama, N.: Compact normal form for regular languages as xor au-
tomata. In: Proceedings of the 14th International Conference on Implementa-
tion and Application of Automata. CIAA ’09, Berlin, Heidelberg, Springer-Verlag
(2009) 24–33