Finite Automata, Palindromes, Powers, and Patterns: Abstract. Given A Language L and A Nondeterministic Finite Automaton
1 Introduction
Let L ⊆ Σ* be a fixed language, and let M be a deterministic finite automaton
(DFA) or nondeterministic finite automaton (NFA) with input alphabet Σ. In
this paper we are interested in three questions:
1. Whether we can efficiently decide (in terms of the size of M) if L(M) contains
at least one element of L, that is, if L(M) ∩ L ≠ ∅;
2. Whether we can efficiently decide if L(M) contains infinitely many elements
of L, that is, if L(M) ∩ L is infinite;
3. Given that L(M) contains at least one element of L, what is a good upper
bound on the length of the shortest element of L(M) ∩ L?
As an example, consider the case where Σ = {a}, L is the set of primes written
in unary, that is, {a^i : i is prime}, and M is an NFA with n states.
To answer questions (1) and (2), we first rewrite M in Chrobak normal form
[5]. Chrobak normal form consists of an NFA M with a “tail” of O(n^2) states,
followed by a single nondeterministic choice to a set of disjoint cycles containing
at most n states. Computing this normal form can be achieved in O(n^5) steps
by a result of Martinez [17].
Author’s current address: Department of Computer and Information Sciences, Indiana
University South Bend, 1700 Mishawaka Ave., P.O. Box 7111, South Bend, IN
46634, USA.
C. Martín-Vide, F. Otto, and H. Fernau (Eds.): LATA 2008, LNCS 5196, pp. 52–63, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Now we examine each of the cycles produced by this transformation. Each
cycle accepts a finite union of sets of the form (a^t)* a^c, where t is the size of
the cycle and c ≤ n^2 + n; both t and c are given explicitly from M. Now,
by Dirichlet’s theorem on primes in arithmetic progressions, gcd(t, c) = 1 for
at least one pair (t, c) induced by M if and only if M accepts infinitely many
elements of L. This can be checked in O(n^2) steps, and so we get a solution to
question (2) in polynomial time.
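The gcd test for question (2) can be sketched as follows. This is only an illustration: it assumes the pairs (t, c) have already been read off a Chrobak normal form of M, and the function name and the list-of-pairs input are our own hypothetical choices, not part of the construction in [5] or [17].

```python
from math import gcd

def accepts_infinitely_many_primes(pairs):
    """Decide question (2) for the unary primes example.

    `pairs` is a hypothetical input: the (t, c) pairs with t >= 1 such that
    the cycles of the normal form accept the sets (a^t)* a^c.
    By Dirichlet's theorem, the progression c, c + t, c + 2t, ... contains
    infinitely many primes exactly when gcd(t, c) == 1.
    """
    return any(gcd(t, c) == 1 for t, c in pairs)

# Lengths 2, 6, 10, ... and 3, 9, 15, ...: only finitely many primes.
print(accepts_infinitely_many_primes([(4, 2), (6, 3)]))
# Lengths 5, 11, 17, ...: gcd(6, 5) = 1, so infinitely many primes.
print(accepts_infinitely_many_primes([(4, 2), (6, 5)]))
```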
Question (1) requires a little more work. From our answer to question (2),
we may assume that gcd(t, c) > 1 for all pairs (t, c), for otherwise M accepts
infinitely many elements of L and hence at least one element. Each element in
such a set is of length kt + c for some k ≥ 0. Let d = gcd(t, c) ≥ 2. Then
kt + c = (kt/d + c/d)d. If k > 1, this quantity is at least 2d and hence composite.
Thus it suffices to check the primality of c and t + c, both of which are at
most n^2 + 2n. We can precompute the primes < n^2 + 2n in linear time using
a modification of the sieve of Eratosthenes [18], and check if any of them are
accepted. This gives a solution to question (1) in polynomial time.
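A minimal sketch of this test for question (1), under the same assumption that the pairs (t, c) are already available. For simplicity it uses the plain sieve of Eratosthenes rather than the linear-time variant of [18], and it ignores the finitely many words accepted by the tail of the normal form, which would have to be checked separately. All names are hypothetical.

```python
from math import gcd

def primes_up_to(limit):
    """Plain sieve of Eratosthenes (not the linear sieve of [18])."""
    if limit < 2:
        return set()
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return {i for i, flag in enumerate(is_prime) if flag}

def cycles_accept_some_prime(pairs):
    """Decide whether some set (a^t)* a^c, t >= 1, contains a word of prime length."""
    # A pair with gcd(t, c) = 1 yields infinitely many primes, hence at least one.
    if any(gcd(t, c) == 1 for t, c in pairs):
        return True
    # Otherwise only the lengths c and t + c can be prime.
    limit = max((t + c for t, c in pairs), default=0)
    primes = primes_up_to(limit)
    return any(c in primes or (t + c) in primes for t, c in pairs)

print(cycles_accept_some_prime([(4, 2)]))  # length 2 is prime
print(cycles_accept_some_prime([(4, 6)]))  # lengths 6, 10, 14, ... are all even
```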
On the other hand, answering question (3) essentially amounts to estimating
the size of the least prime in an arithmetic progression, an extremely difficult
question that is still not fully resolved [9], although it is known that there is a
polynomial upper bound.
Thus we see that asking these questions, even for relatively simple languages
L, can quickly take us to the limits of what is known in formal languages and
number theory.
In this paper we examine questions (1)-(3) in the case where M is an NFA
and L is either the set of palindromes, the set of k-powers, the set of powers, the
set of words matching a general pattern, or their complements.
In some of these cases, there is previous work. For example, Ito et al. [12]
studied several circumstances in which primitive words (non-powers) may appear
in regular languages. As a typical result in [12], we mention: “A DFA over an
alphabet of 2 or more letters accepts a primitive word iff it accepts one of length
≤ 3n − 3, where n is the number of states of the DFA”. Horváth, Karhumäki and
Kleijn [11] addressed the decidability problem of whether a language accepted
by an NFA is palindromic (i.e., every element is a palindrome). They showed
that the language accepted by an NFA with n states is palindromic if and only
if all its words of length shorter than 3n are palindromes.
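To illustrate how the bound of [11] turns palindromicity into a finite check, here is a brute-force sketch. It enumerates every accepted word of length less than 3n, so it is exponential in n and is not the efficient procedure developed in this paper. The NFA encoding (a dictionary from (state, symbol) pairs to sets of states) and all names are assumptions made for this example.

```python
def accepted_words(delta, starts, finals, alphabet, max_len):
    """Yield each accepted word of length <= max_len once (brute force)."""
    frontier = [("", frozenset(starts))]
    for _ in range(max_len + 1):
        next_frontier = []
        for word, states in frontier:
            if states & finals:
                yield word
            for symbol in alphabet:
                succ = frozenset(
                    q for s in states for q in delta.get((s, symbol), ())
                )
                if succ:  # drop words that already lead nowhere
                    next_frontier.append((word + symbol, succ))
        frontier = next_frontier

def is_palindromic(delta, starts, finals, alphabet, n):
    """Check all accepted words of length < 3n, as the bound of [11] permits."""
    words = accepted_words(delta, starts, frozenset(finals), alphabet, 3 * n - 1)
    return all(w == w[::-1] for w in words)

# One-state NFA for a*: every word is a palindrome.
print(is_palindromic({(0, "a"): {0}}, {0}, {0}, "a", 1))
# Three-state NFA for the single word ab, which is not a palindrome.
print(is_palindromic({(0, "a"): {1}, (1, "b"): {2}}, {0}, {2}, "ab", 3))
```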
A preliminary draft of the full version of this paper is available online [2].
A language L is called slender if there is a constant C such that, for all n ≥ 0, the
number of words of length n in L is less than C. The following characterization
of slender regular languages has been independently rediscovered several times
in the past [14,24,19].
For further background on finite automata and regular languages we refer the
reader to Yu [26].
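The definition of slenderness given above can be explored numerically. The sketch below counts the distinct accepted words of each length by tracking, level by level, how many words lead to each reachable set of states. Because every word determines exactly one such set, no word is counted twice. The number of state sets can be exponential in n, and both the NFA encoding and the function name are our own assumptions.

```python
from collections import Counter

def words_per_length(delta, starts, finals, alphabet, max_len):
    """Return a list whose i-th entry is the number of accepted words of length i."""
    finals = frozenset(finals)
    # level[S] = number of distinct words of the current length that lead to state set S
    level = Counter({frozenset(starts): 1})
    counts = []
    for _ in range(max_len + 1):
        counts.append(sum(c for states, c in level.items() if states & finals))
        next_level = Counter()
        for states, c in level.items():
            for symbol in alphabet:
                succ = frozenset(
                    q for s in states for q in delta.get((s, symbol), ())
                )
                if succ:
                    next_level[succ] += c
        level = next_level
    return counts

# a* has exactly one word of each length, so it is slender.
print(words_per_length({(0, "a"): {0}}, {0}, {0}, "a", 3))
# a*b* has i + 1 words of length i, so no constant bound exists.
a_star_b_star = {(0, "a"): {0}, (0, "b"): {1}, (1, "b"): {1}}
print(words_per_length(a_star_b_star, {0}, {0, 1}, "ab", 3))
```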
is, in fact, regular, as often shown in a beginning course in formal languages [10,
p. 72, Exercise 3.4 (h)]. We can take advantage of this as follows:
Lemma 1. Let M be an NFA with n states and t transitions. Then there exists
an NFA M′ with n^2 + 1 states and ≤ 2t^2 transitions such that L(M′) = L′.
Since the pattern p is given as part of the input, this problem is actually
somewhat more general than the sort of problem formulated as Question 1 of
the introduction, where the language L was fixed.
The following result was proved by Restivo and Salemi [20] (a more detailed
proof appears in [4]).
Observe that Theorem 5 implies the decidability of the NFA PATTERN ACCEPTANCE
problem. It is possible to give a boolean matrix based proof of
Theorem 5 (see Zhang [27] for a study of this boolean matrix approach to automata
theory) that provides an explicit description of an NFA accepting P_Δ,
but due to space constraints we omit this proof. However, the reader may perhaps
deduce the argument from the proof of the following algorithmic result,
which uses similar ideas.
DFA INTERSECTION
INSTANCE: An integer k ≥ 1 and k DFAs A_1, A_2, ..., A_k, each over the
alphabet Σ.
QUESTION: Does there exist x ∈ Σ* such that x is accepted by each
A_i, 1 ≤ i ≤ k?
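For concreteness, an instance of DFA INTERSECTION can be decided by a breadth-first search over tuples of states, as in the sketch below. The search space is the product of the k state sets, so the running time is exponential in k. The DFA encoding (a triple of a partial transition dictionary, a start state, and a set of final states) and the function name are assumptions made for this example.

```python
from collections import deque

def dfas_share_a_word(dfas, alphabet):
    """Return True iff some word is accepted by every DFA in `dfas`.

    Each DFA is a triple (delta, start, finals), where delta maps
    (state, symbol) to a state; a missing entry means the DFA rejects.
    """
    start = tuple(dfa[1] for dfa in dfas)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if all(q in finals for q, (_, _, finals) in zip(current, dfas)):
            return True
        for symbol in alphabet:
            successor = []
            for q, (delta, _, _) in zip(current, dfas):
                target = delta.get((q, symbol))
                if target is None:
                    break  # one DFA has no move, so abandon this symbol
                successor.append(target)
            else:
                successor = tuple(successor)
                if successor not in seen:
                    seen.add(successor)
                    queue.append(successor)
    return False

odd_as = ({(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {1})
ends_in_b = ({(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1})
only_as = ({(0, "a"): 0}, 0, {0})
print(dfas_share_a_word([odd_as, ends_in_b], "ab"))   # e.g. the word ab
print(dfas_share_a_word([ends_in_b, only_as], "ab"))  # no common word
```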
We include here the following combinatorial result, which, when applied to words
in a regular language, gives a sort of “pumping lemma” for powers in a regular
language.
Ito et al. [12] proved a similar result for primitive words: namely, that if L
is accepted by an n-state DFA over an alphabet of two or more letters and
contains a primitive word, then it contains a primitive word of length ≤ 3n − 3.
In other words, every word in L is a power if and only if every word in the set
{x ∈ L : |x| ≤ 3n − 3} is a power.
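Checking whether an individual word is a power is itself straightforward. The sketch below uses the well-known fact that a nonempty word w is a power (that is, w = z^j for some j ≥ 2) if and only if w occurs inside ww at a position other than the two trivial ones. The function names are ours, and treating the empty word as a non-power is a convention chosen for this example only.

```python
def is_power(word):
    """True iff word = z^j for some nonempty z and some j >= 2."""
    # Removing the first and last letters of ww excludes the trivial
    # occurrences of w at positions 0 and len(w).
    return len(word) > 0 and word in (word + word)[1:-1]

def is_primitive(word):
    """True iff the nonempty word is not a power."""
    return len(word) > 0 and not is_power(word)

print(is_power("abab"))      # (ab)^2
print(is_power("aba"))       # primitive
print(is_primitive("abaab"))
```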
The proof of Theorem 9 is similar to that of [12, Proposition 7], albeit with
some additional complications. We shall give a complete proof in the full version
of this paper.
The characterization due to Ito et al. [12, Proposition 10] (see also Dömösi,
Horváth, and Ito [6, Theorem 3]) of the regular languages consisting only of
powers, along with Theorem 2, implies that any such language is slender. A
simple application of the Myhill–Nerode Theorem gives the following weaker
result.
Proof (sketch). We create an NFA, M_r, for r = 3n, such that no word in L(M_r)
is a k-power, and M_r accepts all non-k-powers of length ≤ r (and perhaps some
other non-k-powers).
Note that we may assume that k ≤ r. If k > r, then no word of length ≤ r is
a k-power. In this case, to obtain the desired answer it suffices to test if the set
{x ∈ L(M) : |x| ≤ r} is empty. However, this set is empty if and only if L(M)
is empty, and this is easily verified in linear time.
We now form a new NFA A as the cross product of M_r with M. From Theorem 9,
it follows that L(A) = ∅ iff every word in L(M) is a k-power. Again, we
can determine if L(A) = ∅ in linear time.
We omit the details of the construction of M_r, noting only that M_r can be
constructed to have at most O(r^2) states and O(r^2) transitions. After constructing
the cross-product, this gives an O(n^3 + tn^2) bound on the time required to
determine if every word in L(M) is a k-power.
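The linear-time emptiness test used twice in the proof sketch above is ordinary graph reachability: the language is empty exactly when no final state can be reached from a start state. A minimal version is sketched below. It does not construct M_r or the cross product, and the NFA encoding and function name are assumptions made for this example.

```python
from collections import deque

def language_is_empty(delta, starts, finals):
    """Return True iff no final state is reachable from any start state.

    `delta` maps (state, symbol) to a set of states. Symbols are irrelevant
    for emptiness, so we first collapse delta into a successor relation.
    """
    successors = {}
    for (source, _symbol), targets in delta.items():
        successors.setdefault(source, set()).update(targets)
    seen = set(starts)
    queue = deque(seen)
    while queue:
        state = queue.popleft()
        if state in finals:
            return False
        for target in successors.get(state, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return True

print(language_is_empty({(0, "a"): {0}}, {0}, {1}))  # final state 1 is unreachable
print(language_is_empty({(0, "a"): {1}}, {0}, {1}))  # accepts the word a
```

Each state and each transition is handled a bounded number of times, which is where the linear bound comes from.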
are squares, but xyxyx is not a power. Hence, the obvious (8n + 8)-state NFA
that accepts x(yx)* has the property that the shortest non-k-power accepted is
of length 20n + 18. We generalize this lower bound by defining x and y as follows:
let u = (ab)^n a, x = u^k, and y = x^{-1}(xbau^{-1}x)^k x^{-1}. We leave it to the reader
to deduce the following result.
We now move from the problem of testing if an automaton accepts only k-powers
to that of testing if it accepts only powers (of any kind). Just as Theorem 9 was
the starting point for our algorithmic results in Section 6, the following theorem
of Ito et al. [12] (stated here in a slightly stronger form than in the original) is
the starting point for our algorithmic results in this section.
number of such words in any length exceeds 7n. If all these words are powers,
then every word is a power. Otherwise, if we find a non-power, or if the number
of words in any length exceeds 7n, then not every word is a power. By the work
of Mäkinen [16] or Ackerman & Shallit [1], we can enumerate these words in
O(n^5) time.
Using part (2) of Theorem 13 along with Proposition 7, one obtains the following
in a similar manner.
Theorem 16. Given an NFA M with n states, we can decide if all but finitely
many words in L(M) are non-powers in O(n^5) time.
8 Final Remarks
References
1. Ackerman, M., Shallit, J.: Efficient enumeration of regular languages. In: Holub, J.,
Žďárek, J. (eds.) CIAA 2007. LNCS, vol. 4783, pp. 226–242. Springer, Heidelberg
(2007)
2. Anderson, T., Rampersad, N., Santean, N., Shallit, J.: Finite automata, palindromes,
patterns, and borders, http://www.arxiv.org/abs/0711.3183
3. Birget, J.-C.: Intersection and union of regular languages and state complexity.
Inform. Process. Lett. 43, 185–190 (1992)
4. Castiglione, G., Restivo, A., Salemi, S.: Patterns in words and languages. Disc.
Appl. Math. 144, 237–246 (2004)
5. Chrobak, M.: Finite automata and unary languages. Theoret. Comput. Sci. 47,
149–158 (1986); Errata 302, 497–498 (2003)
6. Dömösi, P., Horváth, G., Ito, M.: A small hierarchy of languages consisting of
non-primitive words. Publ. Math (Debrecen) 64, 261–267 (2004)
7. Garey, M., Johnson, D.: Computers and Intractability. Freeman, New York (1979)
8. Glaister, I., Shallit, J.: A lower bound technique for the size of nondeterministic
finite automata. Inform. Process. Lett. 59, 75–77 (1996)
9. Heath-Brown, D.R.: Zero-free regions for Dirichlet L-functions, and the least prime
in an arithmetic progression. Proc. Lond. Math. Soc. 64, 265–338 (1992)
10. Hopcroft, J.E., Ullman, J.D.: Introduction to Automata Theory, Languages, and
Computation. Addison-Wesley, Reading (1979)
11. Horváth, S., Karhumäki, J., Kleijn, J.: Results concerning palindromicity. J. Inf.
Process. Cybern. EIK 23, 441–451 (1987)
12. Ito, M., Katsura, M., Shyr, H.J., Yu, S.S.: Automata accepting primitive words.
Semigroup Forum 37, 45–52 (1988)
13. Knuth, D., Morris Jr., J., Pratt, V.: Fast pattern matching in strings. SIAM J.
Computing 6, 323–350 (1977)
14. Kunze, M., Shyr, H.J., Thierrin, G.: h-bounded and semi-discrete languages.
Information and Control 51, 147–187 (1981)
15. Lyndon, R.C., Schützenberger, M.-P.: The equation a^m = b^n c^p in a free group.
Michigan Math. J. 9, 289–298 (1962)
16. Mäkinen, E.: On lexicographic enumeration of regular and context-free languages.
Acta Cybernetica 13, 55–61 (1997)
17. Martinez, A.: Efficient computation of regular expressions from unary NFAs. In:
DCFS 2002, pp. 174–187 (2002)
18. Pritchard, P.: Linear prime-number sieves: a family tree. Sci. Comput.
Programming 9, 17–35 (1987)
19. Păun, G., Salomaa, A.: Thin and slender languages. Disc. Appl. Math. 61, 257–270
(1995)
20. Restivo, A., Salemi, S.: Words and patterns. In: Kuich, W., Rozenberg, G.,
Salomaa, A. (eds.) DLT 2001. LNCS, vol. 2295, pp. 215–218. Springer, Heidelberg
(2002)
21. Rosaz, L.: Puzzle corner, #50. Bull. European Assoc. Theor. Comput. Sci. 76, 234
(February 2002); Solution 77, 261 (June 2002)
22. Rozenberg, G., Salomaa, A.: Handbook of Formal Languages. Springer, Berlin
(1997)
23. Savitch, W.: Relationships between nondeterministic and deterministic tape
complexities. J. Comput. System Sci. 4, 177–192 (1970)
24. Shallit, J.: Numeration systems, linear recurrences, and regular sets. Inform.
Comput. 113, 331–347 (1994)
25. Shallit, J., Breitbart, Y.: Automaticity I: Properties of a measure of descriptional
complexity. J. Comput. System Sci. 53, 10–25 (1996)
26. Yu, S.: Regular languages. In: Handbook of Formal Languages, Ch. 2, pp. 41–110
(1997)
27. Zhang, G.-Q.: Automata, Boolean matrices, and ultimate periodicity. Inform.
Comput. 152, 138–154 (1999)