On Fair Words
© Otto-von-Guericke-Universität Magdeburg
ON FAIR WORDS
Anton Černý
Department of Information Science, Kuwait University
P.O. Box 5969, Safat 13060, Kuwait
ABSTRACT
A word is called fair if it contains, for each pair of distinct symbols a, b, the same
number of occurrences of the scattered subword ab as of ba. We provide formulas for
the number of fair words of length up to 10 on a k-letter alphabet and some methods
for constructing fair words. We use a generalization of Parikh vector called p-matrix
to count the number of occurrences of subwords of shape ab in a word.
Keywords: Parikh mapping, Parikh matrix, fair word, palindrome
1. Introduction
ba in binary words. The words with difference 0 are the fair words (called subword
balanced in [11]). Theorem 29 restates the result from there.
We first introduce here p-matrices, p-morphisms and the induced p-equivalence of
words. Then we switch to investigation of fair words; in particular, we are interested
in counting fair words of a given length. We provide an inclusion-exclusion principle,
explicit formulas for the number of fair words of length up to 10 and a conjecture
regarding the general formula.
2. Precedence Matrices
– see [13].
Thus the p-matrix of a word $w$ shows, for each pair of distinct symbols $a_i, a_j$, how
many times $a_i$ precedes $a_j$ in $w$. The main diagonal is the Parikh vector $\psi(w)$, hence
the precedence mapping is a generalization of the Parikh mapping. Using the fact
that, for $i \neq j$, $|w|_{a_i a_j} + |w|_{a_j a_i} = |w|_{a_i} |w|_{a_j}$, the following corollaries are easily
obtained.
Corollary 6 $\Phi(w)^{-}$ is obtained from $\Phi(w^R) = \Phi(w)^T$ by changing the signs of the
elements on the main diagonal. □
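The counting behind the p-matrix can be illustrated by a short Python sketch. The function names `p_matrix` and `transpose` and the single-pass counting scheme are assumptions made for this illustration, not notation from the paper. The sketch builds the p-matrix for a chosen ordering of the alphabet and checks, on one sample word, the identity $|w|_{a_i a_j} + |w|_{a_j a_i} = |w|_{a_i} |w|_{a_j}$ and the relation $\Phi(w^R) = \Phi(w)^T$.

```python
def p_matrix(word, alphabet):
    """Return the p-matrix of `word` as a list of rows.

    Entry [i][j] with i != j counts the occurrences of the scattered
    subword a_i a_j; the main diagonal holds the Parikh vector.
    """
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    k = len(alphabet)
    m = [[0] * k for _ in range(k)]
    for ch in word:
        j = index[ch]
        for i in range(k):
            if i != j:
                # every earlier a_i forms one new pair with this a_j
                m[i][j] += m[i][i]
        m[j][j] += 1
    return m


def transpose(m):
    return [list(row) for row in zip(*m)]


w = "abcacb"  # arbitrary sample word
phi = p_matrix(w, "abc")

# |w|_{ab} + |w|_{ba} = |w|_a * |w|_b for all distinct symbols
identity_holds = all(
    phi[i][j] + phi[j][i] == phi[i][i] * phi[j][j]
    for i in range(3) for j in range(3) if i != j
)
# the p-matrix of the reversed word is the transpose
reversal_is_transpose = p_matrix(w[::-1], "abc") == transpose(phi)
print(phi, identity_holds, reversal_is_transpose)
```

Each symbol is processed once and touches one column, so the sketch runs in time proportional to $|w| \cdot |\Sigma|$.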
Remark 8 One can easily see that the precedence matrix for a different ordering of
the set Σ can be obtained by permuting the rows and columns accordingly. Thus the
precedence matrix of a word does not strongly depend on the particular ordering of
the alphabet Σ.
Remark 9 If |Σ| = 2 then the upper triangular part of the p-matrix of a word w is
identical to the triangular part of the Parikh matrix of w above (and not including)
the main diagonal. Corollary 3 then implies that there is a one-to-one correspondence
between Parikh matrices and p-matrices in this case. Therefore many results related
to Parikh matrices on a 2-letter alphabet (e.g., those from [5, 2]) can be directly
translated to results on p-matrices.
Let $G_k$ be the set of all matrices $A$ from $F_k$ satisfying, for each pair of distinct
indices $1 \le i, j \le k$, the condition $A_{i,j} + A_{j,i} = A_{i,i} A_{j,j}$.
Proof. We will prove the first inequality by induction on $|w|$. The second inequality
can be proved in a similar way. For $w = \varepsilon$ the inequality is trivially true. Consider the
word $w = xa$, $a \in \Sigma$, and assume that the assertion is true for $x$. If $a \notin \{b_1, b_2, \ldots, b_r\}$
then it is true for $w$ as well, since all the values involved are the same for $w$ and $x$. Let
now $a \in \{b_1, b_2, \ldots, b_r\}$. We may assume $a = b_1$. Then, for $1 \le i \le r - 1$,
$|w|_{b_i b_{i+1}} = |x|_{b_i b_{i+1}}$, $|w|_{b_{i+1}} = |x|_{b_{i+1}}$, $|w|_{b_1} = |x|_{b_1} + 1$ and
$|w|_{b_r b_1} = |x|_{b_r b_1} + |x|_{b_r} = |x|_{b_r b_1} + |w|_{b_r}$. Therefore
\[ \sum_{i=1}^{r} |w|_{b_i b_{(i+1) \bmod r}} = \sum_{i=1}^{r} |x|_{b_i b_{(i+1) \bmod r}} + |w|_{b_r} \ge m_1(x) m_2(x) + |w|_{b_r}. \]
There are three cases possible.
1. $m_1(w) = m_1(x)$ and $m_2(w) = m_2(x)$.
Then $m_1(x) m_2(x) + |w|_{b_r} \ge m_1(w) m_2(w)$.
2. $m_1(w) = m_1(x)$ and $m_2(w) = m_2(x) + 1$.
Then $|w|_{b_r} = |x|_{b_r} \ge m_1(x)$ and $m_1(x) m_2(x) + |w|_{b_r} \ge m_1(x)(m_2(x) + 1) = m_1(w) m_2(w)$.
3. $m_1(w) = m_1(x) + 1$ and $m_2(w) = m_2(x)$.
Then $m_1(w) = |w|_{b_1}$ and $|w|_{b_r} \ge m_2(w) = m_2(x)$. Therefore $m_1(x) m_2(x) + |w|_{b_r} \ge (m_1(x) + 1) m_2(x) = m_1(w) m_2(w)$. □
2 Possibly $m_1(w) = m_2(w)$ and/or $M_1(w) = M_2(w)$.
Example 13 The matrix
\[ M = \begin{pmatrix} 2 & 0 & 5 \\ 4 & 2 & 2 \\ 1 & 4 & 3 \end{pmatrix} \]
satisfies for each pair of distinct indices $1 \le i, j \le 3$ the condition
$M_{i,j} + M_{j,i} = M_{i,i} M_{j,j}$, but it is not a p-matrix, since
$M_{1,2} + M_{2,3} + M_{3,1} = 3 < 2 \cdot 2$, where $M_{1,1} = M_{2,2} = 2$ are the two smallest of the
values $M_{1,1}, M_{2,2}, M_{3,3}$.
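Example 13 can also be checked by exhaustive search, as in the following Python sketch. It assumes the symbols are named a, b, c in the order of the rows of $M$; the helper `p_matrix` is a hypothetical name introduced for this illustration. Any word with p-matrix $M$ would need the Parikh vector $(2, 2, 3)$, so it suffices to enumerate the distinct arrangements of the letters of aabbccc.

```python
from itertools import permutations


def p_matrix(word, alphabet):
    """p-matrix of `word`: off-diagonal [i][j] counts scattered a_i a_j."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    k = len(alphabet)
    m = [[0] * k for _ in range(k)]
    for ch in word:
        j = index[ch]
        for i in range(k):
            if i != j:
                m[i][j] += m[i][i]
        m[j][j] += 1
    return m


M = [[2, 0, 5], [4, 2, 2], [1, 4, 3]]

# pairwise condition M[i][j] + M[j][i] == M[i][i] * M[j][j]
pairwise_ok = all(
    M[i][j] + M[j][i] == M[i][i] * M[j][j]
    for i in range(3) for j in range(3) if i != j
)

# all distinct words with Parikh vector (2, 2, 3)
candidates = {"".join(p) for p in permutations("aabbccc")}
realized = any(p_matrix(w, "abc") == M for w in candidates)

# smallest cyclic sum |w|_ab + |w|_bc + |w|_ca over the candidates
min_cyclic = min(
    m[0][1] + m[1][2] + m[2][0]
    for m in (p_matrix(w, "abc") for w in candidates)
)
print(pairwise_ok, realized, min_cyclic)
```

The search covers only 210 distinct words, so brute force is adequate here; it is a check of this one example, not a general test for membership in the set of p-matrices.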
3. Fair Words
On the other hand, the fact that $xy \equiv x'y'$ and $|x| = |x'|$ does not imply $x \equiv x'$, as
illustrated by the following example.
Example 15 $abba \equiv baab$ but $ab \not\equiv ba$. (Actually, no two proper prefixes of $abba$ and
$baab$ of the same length are equivalent.)
Example 16 One can easily check that the fair words of length up to 6 are exactly
the palindromes (the alphabet size does not matter). Starting from length 7,
non-palindromic fair words exist: $ab^3a^2b$, $ab^3a^2bab$, $ab^2a^2baba^2b$ and $abc^2babcbacb^2ca$ are a few
examples.
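The claim of Example 16 can be confirmed by brute force for small alphabets. The Python sketch below is only an illustration: `is_fair` and `words` are hypothetical helper names, and the check is restricted to a 3-letter alphabet for lengths up to 6 and to a 2-letter alphabet for length 7.

```python
from itertools import product


def is_fair(word):
    """True if every scattered subword xy occurs as often as yx."""
    counts, prec = {}, {}  # prec[(x, y)] = occurrences of xy so far
    for ch in word:
        for other, c in counts.items():
            if other != ch:
                prec[(other, ch)] = prec.get((other, ch), 0) + c
        counts[ch] = counts.get(ch, 0) + 1
    return all(
        prec.get((x, y), 0) == prec.get((y, x), 0)
        for x in counts for y in counts
    )


def words(alphabet, n):
    return ("".join(t) for t in product(alphabet, repeat=n))


# lengths 0..6 over {a, b, c}: words where "fair" and "palindrome" disagree
mismatches = [
    w for n in range(7) for w in words("abc", n)
    if is_fair(w) != (w == w[::-1])
]

# length 7 over {a, b}: fair words that are not palindromes
non_palindromic = [
    w for w in words("ab", 7) if is_fair(w) and w != w[::-1]
]
print(len(mismatches), "abbbaab" in non_palindromic)
```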
Proposition 17 If $x$ is a fair word, then at most one element on the main diagonal
of the matrix $\Phi(x)$ (i.e., in the Parikh vector $\psi(x)$) is odd. □
Corollary 20 Any two fair words having the same Parikh vector are equivalent. □
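Lemma 21 Let $x$ be a fair word and $y$ a word. Then the word $yxy^R$ is fair.
Proof. $\Phi(yxy^R)^T = (\Phi(y) \circ \Phi(x) \circ \Phi(y^R))^T = \Phi(y^R)^T \circ \Phi(x)^T \circ \Phi(y)^T = \Phi(y) \circ
\Phi(x) \circ \Phi(y^R) = \Phi(yxy^R)$, where the third equality uses $\Phi(x)^T = \Phi(x)$ (fairness of $x$) and $\Phi(w^R) = \Phi(w)^T$. □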
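Lemma 21 lends itself to a quick exhaustive check on short words. The following Python sketch is an illustration under the assumption that a binary alphabet and small length bounds are representative enough; `is_fair` and `words_up_to` are hypothetical helper names.

```python
from itertools import product


def is_fair(word):
    """True if every scattered subword xy occurs as often as yx."""
    counts, prec = {}, {}
    for ch in word:
        for other, c in counts.items():
            if other != ch:
                prec[(other, ch)] = prec.get((other, ch), 0) + c
        counts[ch] = counts.get(ch, 0) + 1
    return all(
        prec.get((x, y), 0) == prec.get((y, x), 0)
        for x in counts for y in counts
    )


def words_up_to(alphabet, max_len):
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)


# all fair words x with |x| <= 7, wrapped as y x y^R for all |y| <= 3
fair_cores = [x for x in words_up_to("ab", 7) if is_fair(x)]
wrapped_all_fair = all(
    is_fair(y + x + y[::-1])
    for x in fair_cores
    for y in words_up_to("ab", 3)
)
print(len(fair_cores), wrapped_all_fair)
```

Wrapping a non-palindromic fair core such as abbbaab produces further non-palindromic fair words, for example ab + abbbaab + ba.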
Lemma 22 If $uvz$ is a fair word and $v' \equiv v$ then $uv'z$ is a fair word and $uv'z \equiv uvz$.
Proof. Since both $v' \equiv v$ and $(v')^R \equiv v^R$, Proposition 14 implies $(uv'z)^R = z^R (v')^R u^R \equiv
z^R v^R u^R = (uvz)^R \equiv uvz \equiv uv'z$. □
Problem 23 Can every fair word be obtained from a word constructed as in
Lemma 21 for $|x| \le 1$ by finitely many substitutions of some proper factor $v$ by $v'$ as in
Lemma 22?
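\[
\begin{aligned}
\Phi_\Delta(h(x))_{r,s} &= |h(x)|_{r,s} \\
&= \sum_{p} |x|_p \, |h(b_p)|_{r,s} + \sum_{p} \binom{|x|_p}{2} |h(b_p)|_r \, |h(b_p)|_s + \sum_{p \neq q} |x|_{p,q} \, |h(b_p)|_r \, |h(b_q)|_s \\
&= \sum_{p} |y|_p \, |h(b_p)|_{r,s} + \sum_{p} \binom{|y|_p}{2} |h(b_p)|_r \, |h(b_p)|_s + \sum_{p \neq q} |y|_{p,q} \, |h(b_p)|_r \, |h(b_q)|_s \\
&= |h(y)|_{r,s} \\
&= \Phi_\Delta(h(y))_{r,s}. \qquad \Box
\end{aligned}
\]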
The morphism h will be called fair if all the words h(a), a ∈ Γ, are fair. It will be
called unfair if none of the words h(a) is fair.
Corollary 27 A word obtained from a fair word by erasing all occurrences of some
symbol is fair. □
Example 28 Consider the morphism $h : \{0, 1\}^* \to \{a, b\}^*$ defined as $h(0) = ab$,
$h(1) = ba$. Then $h(01) = abba \equiv_{\{a,b\}} baab = h(10)$ but $01 \not\equiv_{\{0,1\}} 10$. Therefore
equivalence of morphic images does not necessarily imply the equivalence of the original
words.
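Example 28 can be replayed in a few lines of Python. The sketch assumes that two words are equivalent exactly when their p-matrices over the given alphabet coincide; the names `p_matrix`, `apply_morphism` and `equivalent` are hypothetical and serve only this illustration.

```python
def p_matrix(word, alphabet):
    """p-matrix of `word`: off-diagonal [i][j] counts scattered a_i a_j."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    k = len(alphabet)
    m = [[0] * k for _ in range(k)]
    for ch in word:
        j = index[ch]
        for i in range(k):
            if i != j:
                m[i][j] += m[i][i]
        m[j][j] += 1
    return m


def apply_morphism(h, word):
    """Image of `word` under the morphism given by the dict `h`."""
    return "".join(h[c] for c in word)


def equivalent(u, v, alphabet):
    # assumption: equivalence means equal p-matrices
    return p_matrix(u, alphabet) == p_matrix(v, alphabet)


h = {"0": "ab", "1": "ba"}
images_equivalent = equivalent(
    apply_morphism(h, "01"), apply_morphism(h, "10"), "ab"
)
originals_equivalent = equivalent("01", "10", "01")
print(images_equivalent, originals_equivalent)
```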
The language $L_{sym}$ consisting of all fair words was considered in [11] for the case
$|\Sigma| = 2$, and its position within the Chomsky hierarchy was determined there. The
proof from [11] is valid for larger alphabets as well, hence the result can be extended
as follows.
Theorem 29 For $|\Sigma| \ge 2$, the language $L_{sym}$ is context-sensitive, but not
context-free.
A natural question is what portion of all words are fair words. We denote by $F(k, n)$
the number of fair words of length $n$ on a $k$-symbol alphabet. Table 1 shows these
numbers, as well as the percentage of the total number of words of the given length,
for a few small values of $k$ and $n$.
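For small parameters, $F(k, n)$ and the corresponding percentage can be obtained by direct enumeration. The Python sketch below is a naive illustration, not the method used to produce Table 1; `is_fair` and `count_fair` are hypothetical names, and the running time grows like $k^n$.

```python
from itertools import product
from string import ascii_lowercase


def is_fair(word):
    """True if every scattered subword xy occurs as often as yx."""
    counts, prec = {}, {}
    for ch in word:
        for other, c in counts.items():
            if other != ch:
                prec[(other, ch)] = prec.get((other, ch), 0) + c
        counts[ch] = counts.get(ch, 0) + 1
    return all(
        prec.get((x, y), 0) == prec.get((y, x), 0)
        for x in counts for y in counts
    )


def count_fair(k, n):
    """F(k, n): number of fair words of length n over k symbols."""
    alphabet = ascii_lowercase[:k]
    return sum(is_fair(w) for w in product(alphabet, repeat=n))


for k in (2, 3):
    for n in range(1, 9):
        f = count_fair(k, n)
        print(f"k={k} n={n} F={f} share={100 * f / k ** n:.2f}%")
```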
The table indicates that the values of $F(k, n)$ tend to increase with increasing
$n$, while the frequency of fair words tends to decrease. We do not know
whether the limit of the frequency exists; we assume it does and is equal to 0 (see
Conjecture 34). Of interest are the slightly falling values of $F(k, n)$ between odd and
even values of $n$. Again, we do not know whether this is a common pattern for larger
values of $n$, or how to explain this phenomenon. Conjecture 34 indicates why, for
even $n$, there is a smaller absolute difference between $F(k, n - 1)$ and $F(k, n)$ than
between $F(k, n)$ and $F(k, n + 1)$. For small values of $n$ we are able to establish a
recurrence relation for $F(k, n)$. To obtain it, we first denote, for $0 \le i \le p \le k$,
$a_{p,k,i} = (-1)^{p-i} \binom{k-i}{p-i}$.
Lemma 30
\[
a_{p,k,i} =
\begin{cases}
1 & \text{if } i = p, \\
-\sum_{j=i}^{p-1} \binom{k-j}{p-j} a_{j,k,i} & \text{if } i < p.
\end{cases}
\]
= 0. □
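The identity of Lemma 30 is easy to test numerically. The Python sketch below compares the closed form $a_{p,k,i} = (-1)^{p-i}\binom{k-i}{p-i}$ with the recursive description for all $0 \le i \le p \le k \le 7$; the function names `a_closed` and `a_recursive` are assumptions made for this illustration.

```python
from math import comb


def a_closed(p, k, i):
    """Closed form a_{p,k,i} = (-1)^(p-i) * C(k-i, p-i)."""
    return (-1) ** (p - i) * comb(k - i, p - i)


def a_recursive(p, k, i):
    """Value given by the recursion of Lemma 30."""
    if i == p:
        return 1
    return -sum(
        comb(k - j, p - j) * a_recursive(j, k, i) for j in range(i, p)
    )


agree = all(
    a_closed(p, k, i) == a_recursive(p, k, i)
    for k in range(8)
    for p in range(k + 1)
    for i in range(p + 1)
)
print(agree)
```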
Theorem 31 For $0 \le p \le k$,
\[ F_p(k, n) = \sum_{i=0}^{p} (-1)^{p-i} \binom{k-i}{p-i} \binom{k}{i} F(i, n). \]
Proof. Induction on $p$. The assertion is trivial for $p = 0$. Let us assume it is true for
all values smaller than some $p \ge 1$. To evaluate $F_p(k, n)$, for each of the $p$-tuples of
symbols we count the $F(p, n)$ fair words consisting of the symbols of the $p$-tuple only.
The words not containing all the $p$ distinct symbols are counted here as well, even
several times, as the $p$-tuples intersect. Each word containing exactly $j < p$ distinct
symbols is counted $\binom{k-j}{p-j}$ times, since this is the number of ways the $j$-tuple can be
Table 1: Frequency of fair words. For each alphabet size, the second column provides the
percentage of the words of the particular length being fair.
Now we can formulate the recurrence relation for F (k, n) for values of n that are
small compared to k.
Theorem 32 If $0 \le n \le 2k - 2$ then
\[ F(k, n) = \sum_{i=0}^{\lceil n/2 \rceil} \sum_{p=i}^{\lceil n/2 \rceil} (-1)^{p-i} \binom{k-i}{p-i} \binom{k}{i} F(i, n). \]
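The recurrence of Theorem 32 can be compared with direct enumeration for small $k$. In the Python sketch below, the names `is_fair`, `count_fair` and `recurrence` are hypothetical; the values $F(i, n)$ on the right-hand side are themselves obtained by brute force, with $F(0, n)$ taken to be 1 for $n = 0$ and 0 otherwise, which is an assumption about the convention for the empty alphabet.

```python
from functools import lru_cache
from itertools import product
from math import comb
from string import ascii_lowercase


def is_fair(word):
    """True if every scattered subword xy occurs as often as yx."""
    counts, prec = {}, {}
    for ch in word:
        for other, c in counts.items():
            if other != ch:
                prec[(other, ch)] = prec.get((other, ch), 0) + c
        counts[ch] = counts.get(ch, 0) + 1
    return all(
        prec.get((x, y), 0) == prec.get((y, x), 0)
        for x in counts for y in counts
    )


@lru_cache(maxsize=None)
def count_fair(k, n):
    """F(k, n) by enumeration; an empty alphabet admits only the empty word."""
    alphabet = ascii_lowercase[:k]
    return sum(is_fair(w) for w in product(alphabet, repeat=n))


def recurrence(k, n):
    """Right-hand side of Theorem 32 (intended for 0 <= n <= 2k - 2)."""
    half = (n + 1) // 2  # ceil(n / 2)
    return sum(
        (-1) ** (p - i) * comb(k - i, p - i) * comb(k, i) * count_fair(i, n)
        for i in range(half + 1)
        for p in range(i, half + 1)
    )


# maps (k, n) to the pair (enumerated value, value from the recurrence)
checks = {
    (k, n): (count_fair(k, n), recurrence(k, n))
    for k in range(2, 5)
    for n in range(2 * k - 1)  # n = 0, ..., 2k - 2
}
print(all(brute == rec for brute, rec in checks.values()))
```

Because $\lceil n/2 \rceil \le k - 1$ in the stated range, the right-hand side only involves alphabets smaller than $k$, which is what makes the formula usable as a recurrence.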
We finish by extending the notion of fairness to (one-way) infinite words (see [3],
Chapter 2, for related definitions). An infinite word $w = b_1 b_2 \ldots$, $b_i \in \Sigma$, is fair if
infinitely many prefixes of $w$ are fair. The following lemma provides an easy method
of constructing infinite fair words.
Proof. Corollary 26 implies that all the words $h^n(a)$ are fair. □
Example 36 The fair morphism $a \mapsto aba$, $b \mapsto b$ yields the fair periodic infinite word
abababababa . . .
The fair morphism $a \mapsto abba$, $b \mapsto bab$ yields the fair infinite word
abbababbababbabababbabab . . .
The morphism $a \mapsto ab$, $b \mapsto bb$ yields the infinite word abbbbbbbbbbbb . . ., which is
not fair.
The infinite word $abab^2ab^3ab^4\ldots$ is an example of a non-periodic infinite word
having just 3 fair prefixes. Indeed, it can be easily seen that the prefix $abab^2 \ldots ab^r ab^i$,
$0 \le r$, $0 \le i \le r + 1$, contains $r(r + 1)(r + 2)/6$ subwords $ba$. Hence it is fair iff this
number equals $(r + 1)[r(r + 1)/2 + i]/2$, implying $i = (r - r^2)/6$. Thus $i = 0$ and
$r = 0$ or $r = 1$; the only fair prefixes are $\varepsilon$, $a$, and $aba$.
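The count of fair prefixes of $abab^2ab^3\ldots$ can be confirmed on a long finite prefix. The Python sketch below tracks the pair counts incrementally, so each prefix is tested in constant time for a fixed alphabet; `staircase_word` and `fair_prefix_lengths` are hypothetical names chosen for this illustration.

```python
def staircase_word(blocks):
    """Finite prefix a b a b^2 ... a b^blocks of the infinite word."""
    return "".join("a" + "b" * j for j in range(1, blocks + 1))


def fair_prefix_lengths(word):
    """Lengths of all fair prefixes of `word`, including the empty one."""
    counts, prec = {}, {}  # prec[(x, y)] = occurrences of xy so far
    lengths = [0]          # the empty prefix is fair
    for n, ch in enumerate(word, start=1):
        for other, c in counts.items():
            if other != ch:
                prec[(other, ch)] = prec.get((other, ch), 0) + c
        counts[ch] = counts.get(ch, 0) + 1
        if all(
            prec.get((x, y), 0) == prec.get((y, x), 0)
            for x in counts for y in counts
        ):
            lengths.append(n)
    return lengths


prefix = staircase_word(40)
print(len(prefix), fair_prefix_lengths(prefix))
```

The fair prefix lengths 0, 1 and 3 correspond to $\varepsilon$, $a$ and $aba$.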
The unfair morphism $a \mapsto ab$, $b \mapsto ba$ yields the fair infinite Thue-Morse word
abbabaabbaababba . . . (generated as well by iterating the square of the same morphism,
$a \mapsto abba$, $b \mapsto baab$, which is itself fair).
The morphism $a \mapsto ab$, $b \mapsto a$, which is neither fair nor unfair, yields the infinite
Fibonacci word, which is fair [11, Theorem 9].
Every infinite word having infinitely many palindromes as prefixes is fair. The fair
morphism $a \mapsto ab^3a^2b$, $b \mapsto ab^3a^2b$ yields a fair periodic infinite word with just three
palindromic prefixes $\varepsilon$, $a$, and $ab^3a$, since none of its prefixes ends with $a^2b^3a$.
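Several of the morphisms in Example 36 can be explored by iterating them a few times, as in the Python sketch below. It only inspects finite iterates $h^n(a)$, so it illustrates the example rather than proving anything about the infinite words; `is_fair`, `iterate` and the dictionary names are hypothetical.

```python
def is_fair(word):
    """True if every scattered subword xy occurs as often as yx."""
    counts, prec = {}, {}
    for ch in word:
        for other, c in counts.items():
            if other != ch:
                prec[(other, ch)] = prec.get((other, ch), 0) + c
        counts[ch] = counts.get(ch, 0) + 1
    return all(
        prec.get((x, y), 0) == prec.get((y, x), 0)
        for x in counts for y in counts
    )


def iterate(h, start, n):
    """Apply the morphism `h` (a dict symbol -> word) n times to `start`."""
    word = start
    for _ in range(n):
        word = "".join(h[c] for c in word)
    return word


periodic = {"a": "aba", "b": "b"}
second = {"a": "abba", "b": "bab"}
thue_morse = {"a": "ab", "b": "ba"}
not_fair = {"a": "ab", "b": "bb"}

# iterates of the two fair morphisms are fair
fair_iterates = all(
    is_fair(iterate(h, "a", n)) for h in (periodic, second) for n in range(7)
)
# even iterates of the Thue-Morse morphism are fair prefixes
thue_morse_even = all(is_fair(iterate(thue_morse, "a", 2 * n)) for n in range(5))
# a -> ab, b -> bb: list the lengths of the fair prefixes of the fifth iterate
w = iterate(not_fair, "a", 5)
fair_lengths = [n for n in range(len(w) + 1) if is_fair(w[:n])]
print(fair_iterates, thue_morse_even, fair_lengths)
```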
4. Conclusion
We presented just some basic properties of p-matrices and fair words. More questions
remain open than answered. Is the necessary condition from Corollary 12 sufficient
as well? Knowing explicit formulas for F (k, n) for larger values of n could lead to
proving or disproving Conjecture 34. Answering Problem 23 could be one step towards
a general formula for F (k, n). Fair infinite words may be studied in the framework of
more general investigations suggested in [11].
References
[1] Ö. Egecioglu, O. H. Ibarra, A Matrix q-Analogue of the Parikh Map. In:
J.-J. Lévy, E. W. Mayr, J. C. Mitchell (eds.), IFIP TCS. Kluwer, 2004,
125–138.