Department of Mathematics, University of Turku, Turku, Finland
Juhani KARHUMÄKI
Let h be a morphism satisfying h(a) = ax for a letter a and a nonempty word x. Then h defines
an infinite word (an ω-word) when applied iteratively starting from a. Such ω-words are
considered in the binary case. It is shown that only biprefixes can generate cube-free ω-words, i.e.
words which do not contain a word u^3, with u ≠ λ, as a subword. The same does not hold true
for fourth power-free ω-words, the counterexample being the ω-word defined by the
Fibonacci-morphism: h(a) = ba, h(b) = a.
As the main result it is proved that it is decidable whether a given morphism of the above form
generates a cube-free ω-word. Moreover, it is shown that no more than 10 iteration steps are
needed to solve the problem.
1. Introduction
Repetitions in words, i.e. the existence of occurrences of u^i, with u ≠ λ and i ≥ 2,
as subwords, were first studied by Thue in [10] and [11]. He proved, among other
things, that there exist infinite words over a binary alphabet such that these words
do not contain cubes at all. In other words he proved the existence of an infinite
cube-free word over a two-letter alphabet. In the case of a three-letter alphabet he
proved the existence of an infinite word without any repetitions, i.e. the existence of
a square-free infinite word. On the other hand, any word over a binary alphabet
of length at least four contains a square.
Later on, Thue's results have been rediscovered several times in different
connections. For an overview we refer to [9]. In recent years the research in this field
initiated by Thue has been active. Many of his results have been generalized and a
better understanding of repetitions in words has been achieved, see e.g. [1, 3-6].
However, many problems are still unanswered.
Thue's method of constructing an infinite cube-free word over a binary alphabet
was that of iterating a morphism. He considered the morphism defined by h(a) = ab
and h(b) = ba. When this morphism is applied iteratively, the following sequence is obtained:
a, ab, abba, abbabaab, abbabaabbaababba, ... Continuing 'ad infinitum' a cube-free
infinite word is defined.
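The iteration described above can be reproduced mechanically. The following is a minimal sketch, assuming a morphism is represented as a Python dict from letters to their images; the helper names `apply_morphism` and `iterate` are ours and do not come from the paper.

```python
def apply_morphism(h, word):
    """Apply the morphism h (a dict: letter -> image) letter by letter."""
    return "".join(h[c] for c in word)


def iterate(h, start, steps):
    """Return the list [start, h(start), h^2(start), ..., h^steps(start)]."""
    words = [start]
    for _ in range(steps):
        words.append(apply_morphism(h, words[-1]))
    return words


# Thue's morphism: h(a) = ab, h(b) = ba.
thue = {"a": "ab", "b": "ba"}
words = iterate(thue, "a", 4)
for w in words:
    print(w)
```

Each word in the list is a prefix of the next one, which is why the iteration determines a single infinite word.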
In this paper we also consider infinite words, or ω-words in our terminology,
2. Preliminaries
For the purpose of this paper let Σ be a binary alphabet, say Σ = {a, b}. The free
monoid generated by Σ is denoted by Σ* and its identity, the so-called empty word,
by λ; Σ+ = Σ* - {λ}. Elements of Σ* are called words. For the length of a word x we
use the notation |x|; specifically, |λ| = 0. If a word u is a prefix of a word v we write
u < v. We say that a word v is a subword in u, or alternatively a segment in u, if
u = xvy for some x and y in Σ*. The notation x^{-1}y (resp. yx^{-1}) denotes the left (resp.
right) quotient of y by x. Finally, by an ω-word we mean an infinite sequence of
elements of Σ (from left to right).
A word or an ω-word is called cube-free (resp. square-free, fourth power-free) if
it does not contain a segment of the form uuu (resp. uu, uuuu), with u ≠ λ. By an
almost cube we mean a word of the form ucucud or ducucu, with u ∈ Σ* and c, d ∈ Σ,
c ≠ d. So the word abaabaabb is cube-free, but contains several almost cubes, namely
baa, aab, abb and abaabaabb. On the other hand, babaabaabab is not cube-free
since it has a subword (aba)^3.
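The two examples above can be checked directly from the definitions. The sketch below is a naive implementation written for this illustration (the names `has_power`, `is_almost_cube` and `almost_cubes` are ours); it tests every factor and is meant for short words only.

```python
def has_power(word, k):
    """True if word contains a factor u^k with u nonempty."""
    n = len(word)
    for p in range(1, n // k + 1):            # p = |u|
        for i in range(n - k * p + 1):
            if word[i:i + k * p] == word[i:i + p] * k:
                return True
    return False


def is_almost_cube(f):
    """True if f has the form ucucud or ducucu with letters c != d."""
    if not f or len(f) % 3:
        return False
    m = len(f) // 3                            # m = |uc|

    def form_ucucud(g):
        # g = (uc)(uc)(ud): the first two blocks agree, the third block
        # agrees on u and differs in its last letter.
        return (g[:m] == g[m:2 * m]
                and g[2 * m:3 * m - 1] == g[:m - 1]
                and g[-1] != g[m - 1])

    # ducucu is the mirror image of a word of the form ucucud.
    return form_ucucud(f) or form_ucucud(f[::-1])


def almost_cubes(word):
    """Set of all factors of word that are almost cubes."""
    n = len(word)
    return {word[i:j]
            for i in range(n)
            for j in range(i + 3, n + 1)
            if is_almost_cube(word[i:j])}


print(has_power("abaabaabb", 3))     # cube-free, so False
print(has_power("babaabaabab", 3))   # contains (aba)^3, so True
print(sorted(almost_cubes("abaabaabb")))
```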
Our central notion is that of a morphism of Σ*. We consider only λ-free
morphisms, i.e. morphisms for which λ ∉ h(Σ). Let h(a) = α and h(b) = β. We say
On cube-free ω-words
The explanation for the relation v =_{y,z} h(u) is as follows: it tells that both the edges
of v are cut points with respect to {h(a), h(b)} in the word yvz. Moreover, it defines
how the segment v is obtained as the image under h. For simplicity we will normally
write v = h(u) instead of v =_{y,z} h(u) if there is no danger of confusion.
Let h be a morphism of Σ* such that
h(a) = ax, with x ∈ Σ+, (1)
i.e. a is a proper prefix of h(a). Then h(a) is also a proper prefix of h^2(a) since
h^2(a) = h(ax) = h(a)h(x) and, in general,
h^i(a) = h^{i-1}(ax) = h^{i-1}(a)h^{i-1}(x) for i ≥ 1,
i.e. h^{i-1}(a) is a proper prefix of h^i(a). Hence continuing 'ad infinitum' we get an
ω-word. A morphism satisfying (1) or the analogous condition for b is called a
prefix preserving morphism, or a pp-morphism in short.
Prefix-preserving morphisms provide a very convenient way of defining ω-words.
Indeed, an ω-word is obtained simply by iterating a morphism. We refer to ω-words
obtained in this way as ω-words generated by morphisms. In conclusion, we want to
remark that what we really have above is a D0L system (Σ, h, a), where h has the
prefix condition. Moreover, the way D0L systems generate words is exactly
that of iterating a morphism. Hence these systems provide a very nice framework to
study ω-words.
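Since each iterate is a proper prefix of the next one, any finite prefix of the generated ω-word can be computed by iterating until the word is long enough. The sketch below illustrates this under our own conventions: a morphism is a Python dict, and the names `is_pp` and `omega_prefix` are hypothetical helpers, not notation from the paper.

```python
def is_pp(h, letter):
    """True if h(letter) = letter + x for some nonempty word x."""
    return len(h[letter]) >= 2 and h[letter][0] == letter


def omega_prefix(h, letter, length):
    """Prefix of the given length of the omega-word generated by h from letter."""
    if any(not image for image in h.values()):
        raise ValueError("only lambda-free morphisms are considered")
    if not is_pp(h, letter):
        raise ValueError("h does not satisfy the prefix condition for this letter")
    word = letter
    while len(word) < length:
        nxt = "".join(h[c] for c in word)
        # The prefix condition guarantees that the old word is kept as a prefix.
        assert nxt.startswith(word) and len(nxt) > len(word)
        word = nxt
    return word[:length]


thue = {"a": "ab", "b": "ba"}
print(omega_prefix(thue, "a", 10))

# The square of the morphism h(a) = ba, h(b) = a is a pp-morphism,
# although h itself is not.
fib = {"a": "ba", "b": "a"}
fib_squared = {c: "".join(fib[d] for d in fib[c]) for c in fib}
print(fib_squared, omega_prefix(fib_squared, "a", 8))
```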
In this section we are looking for properties satisfied by cube-free words and
ω-words over a binary alphabet. As shown by Thue [11] there exist cube-free
ω-words in the binary case. However, such words have some special properties.
Lemma 1. Any cube-free word over a binary alphabet and of the length at least 17
contains aa and bb as subwords.
Proof. The step by step generation of all cube-free words which do not contain the
word aa as a subword yields the following two (one starting from a and the other
from b) terminating trees shown in Fig. 1.
Fig. 1. The two terminating trees of cube-free words avoiding aa, one starting from a and the other from b (tree diagram not reproduced here).
In Fig. 1 the symbol | marks a branch which cannot be continued in such a way that the
required properties would remain. Because the longest sequences are of length 16,
the lemma follows for aa; the case of bb is symmetric.
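The exhaustive generation used in the proof is small enough to be repeated by machine. The following sketch is our own depth-first search, not the author's procedure; the function names are hypothetical. It extends words one letter at a time to the right, which suffices because every prefix of a cube-free word avoiding aa has the same two properties.

```python
def ends_with_cube(word):
    """True if some suffix of word is a cube u^3 with u nonempty."""
    n = len(word)
    for p in range(1, n // 3 + 1):
        if word[n - 3 * p:n - 2 * p] == word[n - 2 * p:n - p] == word[n - p:]:
            return True
    return False


def longest_cube_free_avoiding(forbidden, alphabet="ab"):
    """Longest cube-free word over alphabet with no occurrence of forbidden.

    The search terminates only if the set of such words is finite,
    which is what Lemma 1 asserts for forbidden = 'aa' and 'bb'.
    """
    best = ""
    stack = [""]
    while stack:
        word = stack.pop()
        if len(word) > len(best):
            best = word
        for c in alphabet:
            ext = word + c
            # Only the new suffix has to be checked: word itself is valid.
            if not ext.endswith(forbidden) and not ends_with_cube(ext):
                stack.append(ext)
    return best


witness = longest_cube_free_avoiding("aa")
print(len(witness), witness)
```

One word of maximal length is abbabbababbababb, so a cube-free word of length 17 must contain aa.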
Corollary 1. Any cube-free ω-word over a binary alphabet contains both aa and bb
as subwords.
generates a cube-free ω-word K. Let h(a) = αβ and h(b) = α for some α and β in Σ+,
i.e. h is not a prefix. By Corollary 1, K contains bb as a subword and hence also bba.
But h(bba) = α^3 β, which shows that K is not cube-free, a contradiction. By symmetry
in Corollary 1, the cases that h(a) is a prefix of h(b) or that h is not a suffix lead
similarly to a contradiction.
Analogously to Lemma 1 one can show that any cube-free ω-word, or in fact any
cube-free word of length at least 24, contains both aba and bab as subwords.
Consequently, morphisms of the form h(a) = αβα and h(b) = β never generate cube-free
ω-words.
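The argument above can be illustrated on a concrete morphism. In the sketch below the test `is_biprefix` is our own formulation for the binary case (neither image is a prefix or a suffix of the other), and the morphism with α = ab, β = b is a hypothetical example chosen for this illustration.

```python
def is_biprefix(h):
    """True if neither of h(a), h(b) is a prefix or a suffix of the other."""
    alpha, beta = h["a"], h["b"]
    is_prefix = not (alpha.startswith(beta) or beta.startswith(alpha))
    is_suffix = not (alpha.endswith(beta) or beta.endswith(alpha))
    return is_prefix and is_suffix


def has_cube(word):
    """True if word contains a factor uuu with u nonempty."""
    n = len(word)
    for p in range(1, n // 3 + 1):
        for i in range(n - 3 * p + 1):
            if word[i:i + 3 * p] == word[i:i + p] * 3:
                return True
    return False


# h(a) = alpha beta and h(b) = alpha with alpha = ab, beta = b,
# so h(b) is a prefix of h(a) and h is not a prefix.
h = {"a": "abb", "b": "ab"}
image = "".join(h[c] for c in "bba")    # h(bba) = alpha^3 beta
print(is_biprefix(h), image, has_cube(image))

thue = {"a": "ab", "b": "ba"}
print(is_biprefix(thue))
```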
4. The Fibonacci-morphism
In the previous section we proved that cube-free ω-words over a binary alphabet
can be generated, if at all, only by biprefixes. Here we show that the same does not
hold true for fourth power-free ω-words. The result is obtained by studying the
famous Fibonacci-morphism.
It is well-known that the lengths of the words obtained by iterating h starting at b give the
Fibonacci numbers. Although h itself is not a pp-morphism, h^2 is. So the
iteration of h gives two ω-words. Observe also that h has a so-called suffix
condition, i.e. a is a proper suffix of h(a), and hence h generates a unique ω-word
from right to left. In conclusion, h can be used to define altogether three ω-words.
The iteration begins as follows:
... → ababaababaabaababaaba → ... ,
where (aba)^3 appears as a subword in the last written word. Hence, the ω-words
obtained are not cube-free, as must be the case by Theorem 1.
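The first steps of the iteration are easy to recompute. The following is a sketch only: it takes the Fibonacci-morphism as h(a) = ba, h(b) = a (as in the abstract), starts from the letter a, and uses helper names of our own. It reports the first iterate containing a cube and checks that no fourth power occurs among the first few iterates.

```python
def has_power(word, k):
    """True if word contains a factor u^k with u nonempty."""
    n = len(word)
    for p in range(1, n // k + 1):
        for i in range(n - k * p + 1):
            if word[i:i + k * p] == word[i:i + p] * k:
                return True
    return False


def first_step_with_power(h, start, k, max_steps):
    """Smallest n <= max_steps such that h^n(start) contains a k-th power, else None."""
    word = start
    for n in range(max_steps + 1):
        if has_power(word, k):
            return n
        word = "".join(h[c] for c in word)
    return None


fib = {"a": "ba", "b": "a"}

word = "a"
for _ in range(6):
    word = "".join(fib[c] for c in word)
print(word)                                       # h^6(a)
print(first_step_with_power(fib, "a", 3, 10))     # first cube
print(first_step_with_power(fib, "a", 4, 12))     # no fourth power found
```

The finite check of fourth powers is of course no substitute for the proof that follows; it only confirms the claim on the iterates up to h^12(a).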
(h^n(a))_{n≥0}. (1)
The basic idea behind the proof is that we show that if in (1) there are long enough
fourth powers there must be shorter ones, too. Repeating the argument we conclude
that in (1) there must be short fourth powers, which, finally, can be shown to be
impossible.
Assume that u^n, where n ≥ 3, is a subword in (1), say yu^n z is in (1). We have the
following four possibilities
(Indeed, there must always be a cut after a). Consequently, there exists a word u'
such that
Now a(u'b)^{n-1}u' is a segment in (1). If u' ≠ λ it must both start and end with a.
Writing u' = au'_1, we obtain
a(u'b)^{n-1}u' = a(au'_1 b)^{n-1}u' = a[a]([u'_1][ba])^{n-1}[u'_1].
Therefore
(u'b)^{n-1}u' = h(b(u''a)^{n-1}u'')
for some word u''. Again b(u''a)^{n-1}u'' is a subword in (1) and hence, if u'' ≠ λ, then it
must start with a and end with b. This last observation follows since the sequence (1)
does not contain the word aaa as a subword. Now writing u'' = au''_1 b we have
Remember now that the above word is a subword in h(Σ*). Hence its right
neighbour is a. Consequently,
b(u''a)^{n-1}u''a = h(a(u'''b)^{n-1}u''') (3)
for some word u”‘. Comparing the right hand sides of (2) and (3) we see that we have
got a cycle, however, with a shorter u-word.
What has been proved above can be summarized in the following diagram (see
Fig. 2), which shows how a shorter word (nth power or almost nth power) is
obtained. The reduction can be carried out as long as the primed u-words are
nonempty or a nonprimed u is of length at least 2.
u^n → a(u'b)^{n-1}u' → b(u''a)^{n-1}u'' with |u''| < |u'| → a(u'''b)^{n-1}u''' with |u'''| < |u''| → ...
Fig. 2.
Now we are in the position to obtain our result. Assume that the sequence (1)
would contain a fourth power. Then, by the above diagram, it would contain either
a^4, b^4, ba^3 or ab^3 as a subword, too. But this is impossible. Indeed, the sequence
(h^n(a))_{n≥0} satisfies the recursion formula
h^n(a) = h^{n-2}(a)h^{n-1}(a) for n ≥ 2, with h^0(a) = a and h^1(a) = ba,
which certainly does not contain any of the words a^4, b^4, ba^3 or ab^3 as a subword.
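The final step can be checked numerically on the first iterates. The sketch below assumes the Fibonacci-morphism h(a) = ba, h(b) = a; the recursion it tests, h^n(a) = h^{n-2}(a)h^{n-1}(a), is our own derivation from h^n(a) = h^{n-1}(ba) and h^{n-1}(b) = h^{n-2}(a), and the variable names are ours.

```python
fib = {"a": "ba", "b": "a"}

# words[n] = h^n(a) for n = 0, ..., 12
words = ["a"]
for _ in range(12):
    words.append("".join(fib[c] for c in words[-1]))

# Recursion: h^n(a) = h^(n-2)(a) h^(n-1)(a) for n >= 2.
recursion_ok = all(words[n] == words[n - 2] + words[n - 1]
                   for n in range(2, len(words)))

# None of the four short words may occur in any iterate.
forbidden = ["aaaa", "bbbb", "baaa", "abbb"]
short_powers_absent = all(f not in w for w in words for f in forbidden)

print(recursion_ok, short_powers_absent)
print([len(w) for w in words])
```

In fact neither aaa nor bb occurs in these iterates, which is slightly more than what the argument needs.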
Now we are ready to state our result which should be compared to Theorem 1.
The proof is carried out by a sequence of lemmas. To fix the notation let h over
{a, b}* be a pp-morphism, with h(a) = ax for some x ≠ λ, defined by
with
v = x_1y_1 = y_2x_2y_1 = y_2x_3 and y_1y_2 ∈ h(Σ).
Then there exists a word v' such that either (i) v'v'v' is a subword in L and |v'| < |v|,
[Illustration: the word uvvvw and its decomposition into x_1, γ, x_2, γ, x_3.]
where γ = y_1y_2 is either α or β, say γ = α. Then for both the occurrences of γ we have
γ = h(a), and so there exists a word v_1 such that x_2 = h(v_1). Recalling now that h is a
biprefix we deduce that the prefix of x_3 with the length |x_2| is also the image of v_1
under h and that the same holds true for the suffix of x_1 with the length |x_2|.
In conclusion, we get that
h(v_1 a v_1 a v_1) = y_2^{-1} vvv y_1^{-1}. (1)
If either y_1 = λ or y_2 = λ, then we are ready since in that case the other y would be γ
and hence the argument on the left side of (1) could be continued to a cube. So let
y_1 ≠ λ. By (1) and by the fact that uv^3w ∈ L, either α < y_1w or β < y_1w. In the first
case the result follows when we choose v' = v_1a. In the second case an almost cube is
obtained by the choice v' = v_1.
uvcvcvdw = [ux_1][y_1y_2][x_2][y_1y_2][x_3w]
with
vc = x_1y_1 = y_2x_2y_1, vd = y_2x_3 and y_1y_2 ∈ h(Σ).
Then there exists a word v' such that v'c'v'c'v'd', with c', d' ∈ {a, b}, c' ≠ d', is a
subword in L and |v'c'| < |v|.
[Illustration: the word uvcvcvdw and its decomposition into x_1, γ, x_2, γ, x_3.]
If y_1 ≠ λ, then we conclude as in the proof of Lemma 2 that for some word v_1 and c'
in {a, b} we obtain
Now the facts that uvcvcvdw ∈ L and that h is a prefix guarantee that
h(d') < y_1c^{-1}dw, with d' ≠ c', implying the existence of a shorter almost cube. The
case y_1 = λ can be handled similarly.
Intuitively, the message of the above lemmas is as follows. If a cube (or an almost
cube) exists in L and if the border lines of that cube (or that almost cube), i.e. the
positions illustrated by | in the formula ··· v | v | v ···, are covered by the same
h-image and without any shift in the αβ-decomposition of the whole word, then a
shorter cube (or a shorter almost cube) can be found.
In what follows we will show that if L contains cubes (or almost cubes), then the
assumptions of Lemmas 2 and 3 are satisfied. Or more precisely, we will show that if
L contains cubes at all, then either the above assumptions are satisfied for cubes and
almost cubes long enough or otherwise L contains very short cubes, too.
Lemma 4. Assume that the shortest cube in L is of length at least 6|αβ|, and let
v^3 be such a cube, say uv^3w ∈ L. Moreover, let z_1, z_2 and z_3 be words such that
uvvvw = [z_1][γ][z_2][γ][z_3],
where
γ ∈ h(Σ), z_1 < uv < z_1γ and z_1γz_2 < uvv < z_1γz_2γ.
Then
|uv| - |z_1| = |uvv| - |z_1γz_2| (mod |γ|). (2)
Proof. By symmetry, we may assume that γ = α. Assume further that (2) does not
hold true.
Now our situation is as follows:
[Illustration (3): the two occurrences of α covering the border lines of v^3, in shifted positions.] (3)
Let α = α_1α_2 = α_3α_4 where uv = z_1α_1 and uvv = z_1αz_2α_3. Clearly, at most one of the
α_i's may be empty, otherwise there is nothing to be proved. Further, by symmetry,
we may assume that |α_1| > |α_3|. Again if |α_1| = |α_3|, then we already have (2).
We have two cases to be considered.
Case I: |α_1| - |α_3| ≤ ½|α|. We consider the two different ways in which the middle
part of vv may be written. We have:
Let ε be the prefix of α of length |α_1| - |α_3| as indicated in the above figure.
Then α has a prefix ε^2 and thus v^2 has a cube ε^3 as a subword, a contradiction.
Case II: |α_1| - |α_3| > ½|α|. Now we have three subcases.
(i) The first occurrence of α in (3) is followed by α. In this case the illustration is
as follows:
or
The first possibility leads to a contradiction exactly as the main Case I, only the
word αβ must be considered instead of α. The second possibility means that the α's on
the lower line must precede β, since three consecutive α's is impossible. This, in
turn, implies that α on the upper line must precede α, otherwise we would have the
mirror image situation of the first possibility of this subcase, and it is not possible.
Finally, remembering again that three consecutive α's is impossible we conclude that
we must have:
[Illustration: the upper and the lower line both read β α α β, in shifted positions over vv.]
Consequently, the argument of the main Case I becomes applicable for the word
βααβ, which completes the subcase (ii).
(iii) The first occurrence of α in (3) is followed by β and |β| < |α_1| - |α_3|. Now
our illustration is:
[Illustration: the upper α followed by β, above the shifted lower α.]
The α on the lower line cannot be preceded by α, because of the argument in Case
II(i). Hence, it must be preceded by β.
We now note that if |α_1| - |α_3| - |β| ≤ ½|α|, then, by the argument in the main
Case I, the β on the upper line cannot be followed by α. So if α follows that β we
must have the situation:
[Illustration (4): the upper β extending beyond the middle of the lower α.] (4)
where the dotted line denotes the middle of the lower α. So the upper β goes beyond it
to the right (but not beyond the lower α). We consider now the two occurrences of β
inside the word αβ. Observing that the lower α must be followed by β this really can
be done and we get the illustration:
The argument of the main Case I applied to βα shows that β cannot be shifted at
all, i.e. both the upper and the lower β above must start at the same place. Hence,
(4) shows that α = α'βα', for some word α', and thus (α'β)^3 is a subword in v^2, a
contradiction.
So it remains the case that α on the upper line is followed by two β's, and, by
symmetry, α on the lower line is preceded by two β's. Since three consecutive β's is
impossible, we have the situation
[Illustration: the upper and the lower line both read α β β α, in shifted positions over vv.]
Again the argument of the main Case I is applicable, now it must be applied to the
word αββα. So we have finished the main Case II, too.
Looking through the above considerations one obtains a lower bound for the
length of v. Certainly, 2|αβ| is such a bound, which completes the proof of Lemma 4.
Using exactly the same arguments as in the proof of Lemma 4 we can show
Lemma 5. Assume that the shortest cube in L is of length at least 6|αβ|, and let
vcvcvd, with c, d ∈ {a, b}, c ≠ d, be a subword in L, say uvcvcvdw ∈ L, such that
|vc| ≥ 2|αβ|. Moreover, let z_1, z_2 and z_3 be words such that
Lemmas 2 and 4 (resp. 3 and 5) show that for cubes or almost cubes of length
at least 6|αβ| in L there exist shorter ones, too, if only the two border lines of v^3, i.e.
the positions indicated by | in the illustration ··· v | v | v ···, are covered by the same
h-images.
Consequently, it remains to consider the case where the border lines are
covered by different h-images.
Lemma 6. For a cube of length at least 6|αβ| + 3 max{|α|, |β|} in L there exists
a shorter cube or a shorter almost cube.
Proof. We use the proofs of Lemmas 2 and 4 quite heavily. Let v^3 be a cube in L
with |v| ≥ 2|αβ| + max{|α|, |β|}. By the discussion before it is enough to consider
the case where the border lines of v^3 are covered by different h-images. In other
words, we have the situation:
where γ ≠ δ. Let γ = α, i.e. the first border line is covered by α. By symmetry, we
may assume that |α| > |β|.
Now we recall the proof of Lemma 4. There it was not essential that we used those
occurrences of α which cover the border lines of v^3. We only needed that the two
occurrences of α were in the shifted position to each other and that they were far
enough from the edges. With this in mind we have to consider the following two
possibilities.
Case I. The word α covering the second border line of v^3 (but which is not
forming the αβ-decomposition) contains a cut meeting α, i.e. we have, for example,
the illustration:
where only the lower α's satisfy the property [α] ∈ h(Σ*). In that case we do not have
any problems: all the conclusions in the proof of Lemma 4 are applicable using these
two α's, since the length of v is assumed to be at least 2|αβ| + max{|α|, |β|}.
Case II. The word α covering the second border line of v^3 does not contain cuts
meeting α, i.e. we have, for example, the situation:
Indeed, we must have a cut inside the upper α, since α was assumed to be the longer
h-image. The first occurrence of α must be surrounded by β at least on one side.
Hence, we have two occurrences of β in the shifted position and with the distance
not larger than max{|α|, |β|} from the border lines. Consequently, the proof of
Lemma 4 becomes applicable now, too.
Lemma 7. For an almost cube of length at least 6|αβ| + 3 max{|α|, |β|} in L
there exists in L a shorter cube or a shorter almost cube.
We still need one lemma. For the notions of a regular language, a D0L language
and an E0L language we refer to [8].
Our proof for Lemma 8 was short and it used unnecessarily complicated language
families. For our purposes a better, but longer, proof of the result is presented in the
next section.
Now we are in the position to establish our main result.
Here A is used to denote the middle third of a longer cube or almost cube we are
searching for. If the above leads to a cube, then we are done: L contains a cube. If,
on the other hand, even an almost cube is not found, then this particular y is not
obtained from a cube according to Lemmas 2 and 3. Finally, if an almost cube is
found, the process can be repeated. Now the basic observation is that the search
for a longer cube or almost cube is independent of v'; it depends only on the letters c' and
d' and on the left neighbour of y. This means that, if the process does not terminate,
then it leads to a cycle in one or two steps.
This completes the proof of Theorem 3.
In this section we are looking for an upper bound for the number of iterations
needed to guarantee the existence of a cube in the sequence (h^n(a))_{n≥0} if the sequence
contains cubes at all.
Proof. Let h(a) = ax, with x ≠ λ. So |h(a)| ≥ 2. If h(a) ∈ a*, then h generates a cube in
two, and hence also in seven steps. So assume that h(a) ∉ a*, i.e. both a and b occur
in h(a). If h(b) ∈ b*, then necessarily h(a) = uh(b), for some word u, and thus h
generates a cube in three steps. There remains the case h(b) ∈ Σ*aΣ*. In that case
The above upper bound is not far from the optimal one. Indeed, for the
Fibonacci-morphism a cube is obtained only after six iteration steps.
We need also the following two simple lemmas.
Proof. Let h(a) = ax, with x ≠ λ. So h(b) = b, and thus h(a) = axa for some word x. If
x ∈ a* ∪ aΣ* ∪ Σ*a, then a cube is obtained in two steps. If x ∈ b+, say x = b^i, then
h^2(a) = h(ab^i a) = ab^i ab^i ab^i a which contains a cube. If x = bbx'bb for some word x',
then necessarily x' = ax''a, for some x'', or otherwise a cube is obtained in one step.
However,
and so a cube is obtained in two steps. The remaining case is x = abax'''ba (or
symmetrically x = abx'''aba) for some word x'''. Also now a cube is obtained in two steps:
Lemma 11. Let h, with h(a) = ax, x ≠ λ, be a pp-biprefix which does not generate a
cube in three steps. Then it generates all the subwords of length three which it
will ever generate in at most three steps.
We leave the simple but quite lengthy proof to the reader. We only want to point
out that an easier result, namely the one where the required number of
iterations is five, is trivial. Indeed, any subword of length three which is
generated by a pp-morphism is generated in one step from a word of length at
most two, and all the subwords of length two are generated in not more than
four steps, if at all. This last observation is based on the fact that there are no more
than four words of length two.
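The statement of Lemma 11 can be observed on Thue's morphism h(a) = ab, h(b) = ba, which is a pp-biprefix generating a cube-free ω-word. The sketch below is an illustration only, with helper names of our own: it lists the subwords of length three of each iterate and finds the first step at which the set stops growing.

```python
def factors(word, k):
    """Set of all subwords of length k of word."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}


def factor_sets(h, start, k, steps):
    """Length-k subwords of h^n(start) for n = 0, ..., steps."""
    sets, word = [], start
    for _ in range(steps + 1):
        sets.append(factors(word, k))
        word = "".join(h[c] for c in word)
    return sets


thue = {"a": "ab", "b": "ba"}
sets = factor_sets(thue, "a", 3, 6)

# First iteration step whose subwords of length three already equal
# those of the last computed iterate h^6(a).
first_complete = min(n for n, s in enumerate(sets) if s == sets[-1])
print(first_complete, sorted(sets[-1]))
```

Here the six subwords of length three all appear in h^3(a) = abbabaab, in agreement with the bound of three steps.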
Now we are ready for
which shows that y_1 contains at most 12 a's. To estimate the number of b's in y_1 we
note that the number of b's preceding the third occurrence of a is at most 5. Hence
an upper bound for the whole number of b's in y_1 is 22. Consequently,
|y_1| ≤ 34.
Now we consider the ancestors of y_1, i.e. the minimal subwords y_2, y_3, ... of L
such that y_1 is a subword in h(y_2), h^2(y_3), ... . See Fig. 3.
Fig. 3. The ancestors of y_1 (diagram not reproduced here).
Remember now that h(a) and h(b) are of length at least two and that at least one
of them is of length at least three. Hence, we can estimate the lengths of words in the
sequence y_1, y_2, y_3, y_4, y_5 as follows:
Now we are ready to finish the proof. By Lemma 11, a word of length three is
obtained as a subword in no more than three steps, if at all. So our chase of
ancestors of y_1 is complete. We needed altogether 1 + 4 + 3 (or fewer) iteration steps.
To guarantee the existence of a cube, and not only an almost cube, two more steps
are enough, by the proof of our main theorem. Hence the limit 10.
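Taken at face value, the limit 10 turns the decidability result into a very simple test. The following is a minimal sketch of such a test, not the author's algorithm: it assumes the bound of ten iteration steps exactly as stated above, represents a morphism as a Python dict, and uses function names of our own. Since the iterates may grow exponentially, it is practical only for morphisms with short images.

```python
def has_cube(word):
    """True if word contains a factor uuu with u nonempty."""
    n = len(word)
    for p in range(1, n // 3 + 1):        # p = |u|
        run = 0                            # current run of positions with word[i] == word[i+p]
        for i in range(n - p):
            if word[i] == word[i + p]:
                run += 1
                if run >= 2 * p:           # 2p consecutive matches give a cube of period p
                    return True
            else:
                run = 0
    return False


def generates_cube_free(h, steps=10):
    """Check h(a), ..., h^steps(a) for cubes; steps=10 is the bound quoted in the text."""
    if len(h["a"]) < 2 or h["a"][0] != "a":
        raise ValueError("expected a pp-morphism with h(a) = ax, x nonempty")
    word = "a"
    for _ in range(steps):
        word = "".join(h[c] for c in word)
        if has_cube(word):
            return False
    return True


thue = {"a": "ab", "b": "ba"}
fib_squared = {"a": "aba", "b": "ba"}      # square of h(a) = ba, h(b) = a
print(generates_cube_free(thue), generates_cube_free(fib_squared))
```

The early exit makes the test cheap for morphisms that do generate cubes; the full ten steps are needed only in the cube-free case.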
We want to finish this section by sharpening the above limit in some special cases.
Let h be a pp-morphism, with h(a) = ax, x ≠ λ and |h(a)| ≥ i, |h(b)| ≥ i for some i ≥ 2.
Let further σ(i) be the smallest number satisfying: for any such h, it generates a
cube-free ω-word if and only if {h^n(a) | n ≤ σ(i)} is cube-free.
With this notation we have
Theorem 5. For different values of i the following holds true: σ(2) ≤ 10, σ(3) ≤ 9,
σ(4) ≤ 8, σ(5) ≤ 8 and σ(i) ≤ 7 for i ≥ 6.
Proof. The case i = 2 was proved in Theorem 4. For other values of i the proof can
be carried out by applying the ideas from there. Without going into details we only
mention how the value of σ(i) is formed as the sum of the number of different
stages:
i = 3: 1 + 3 + 3 = 7,
i = 4: 1 + 2 + 3 = 6,
i = 5: 1 + 2 + 3 = 6,
i ≥ 6: 1 + 1 + 3 = 5.
In all the cases two extra steps are needed to guarantee the existence of a cube and
not only of an almost cube.
The above values of σ(i) are not claimed to be the best ones. On the other hand,
they are not very large either.
7. Discussion
Acknowledgements
The author is grateful to the Academy of Finland for the excellent working
conditions under which this research was done. The author also wants to thank the
referees for their useful comments.
References
[1] D.R. Bean, A. Ehrenfeucht and G. McNulty, Avoidable patterns in strings of symbols, Pacific J.
Math. 85 (1979) 261-294.
[2] J. Berstel, Sur les mots sans carré définis par un morphisme, Springer Lecture Notes in Computer
Science 71 (1979) 16-25.
[3] J. Berstel, Mots sans carré et morphismes itérés, Discrete Math. 29 (1979) 235-244.
[4] F.-J. Brandenburg, Uniformly growing k-free homomorphisms, Theoret. Comput. Sci. 23 (1983),
to appear.
[5] F. Dejean, Sur un théorème de Thue, J. Combin. Theory 13 (1972) 90-99.
[6] A. Ehrenfeucht and G. Rozenberg, On the subword complexity of square-free D0L languages,
Theoret. Comput. Sci. 16 (1981) 25-32.
[7] J. Karhumäki, On strongly cube-free ω-words generated by binary morphisms, Springer Lecture
Notes in Computer Science 117 (1981) 182-189.
[8] G. Rozenberg and A. Salomaa, The Mathematical Theory of L Systems (Academic Press, London,
1980).
[9] A. Salomaa, Jewels of Formal Language Theory (Computer Science Press, 1981).
[10] A. Thue, Über unendliche Zeichenreihen, Videnskapsselskapets Skrifter, I. Mat.-naturv. Klasse,
Kristiania (1906) 1-22.
[11] A. Thue, Über die gegenseitige Lage gleicher Teile gewisser Zeichenreihen, Videnskapsselskapets
Skrifter, I. Mat.-naturv. Klasse, Kristiania (1912) 1-67.