Combinatorics on Words
Sergey Kitaev

SMSTC Module

Lecture 1. Words, sequences, morphisms

Contents

1 Basic definitions

2 Morphisms

3 Fractal-like sequences

4 Problems

References

1 Basic definitions

An alphabet is a non-empty set of letters, or symbols, like A = {a, b}, or B = {2, ⊕, 5, q}.
We will consider only finite alphabets. A word over an alphabet A is a sequence of letters
from A, like aba, or bbababbaaa. A word can be either finite, or infinite in one direction
(usually to the right), or infinite in two directions (bi-infinite). Infinite words are often
referred to as sequences. The empty word ε is the word having no letters. $A^n$ denotes the
set of all words of length $n$ over $A$. Note that there are $k^n$ words of length $n$ over a $k$-letter
alphabet.
$A^*$, $A^+$, and $A^\infty$ denote the sets of all finite, finite non-empty, and infinite (to the
right) words over $A$, respectively. In the case when $A = \{a\}$, we often write $a^*$, $a^+$, and $a^\infty$
instead of $A^*$, $A^+$, and $A^\infty$, respectively. Concatenation (also called catenation) of words
$u = u_1 u_2 \cdots u_n$ and $v = v_1 v_2 \cdots v_m$ is $u \cdot v = uv = u_1 u_2 \cdots u_n v_1 v_2 \cdots v_m$. A language $L$
over $A$ is any subset of $A^*$. Any expression of the form $X_1 X_2 \cdots X_m$, where each $X_i$ is a language,
defines the product of languages. For example, $a^+ b^* cc$ is the set of all words in which the
letter a is followed by a number (maybe none) of other a's, which in turn is followed by a
number (maybe none) of b's, and which end with two c's. The length of a word $w$, denoted
by $|w|$, is the number of letters in $w$. For example, $|\text{fish}| = 4$. The number of copies of a
letter $a$ in a word $w$ is denoted by $|w|_a$. For example, $|\text{crocodile}|_o = 2$. The alphabet of a
word $w$ is $\mathrm{Alph}(w) = \{a : |w|_a \ge 1\}$. For example, $\mathrm{Alph}(\text{extreme}) = \{e, x, t, r, m\}$.
A word u is a factor of w (resp., left factor or a prefix; a right factor or a suffix) if
there exist words x and y such that w = xuy (resp., w = uy; w = xu). A factor of w is
proper if it is different from w. For example, in the word 332312313, 33231 is a (proper)
prefix, 13 is a (proper) suffix, and 31 is a (proper) factor (occurring twice in the word).
A word of the form $xx \cdots x$, with $k$ copies of $x$, is abbreviated by $x^k$. The reverse of a
word $w = w_1 w_2 \cdots w_n$ is the word $w^R = w_n w_{n-1} \cdots w_1$. A factorization of a word $w$ is
any sequence $(u_1, u_2, \ldots, u_n)$ of words such that $w = u_1 u_2 \cdots u_n$. For example, two of
the factorizations of the word 2213421 are as follows: (2, 213, 421) and (22, 1342, 1). A
factorization $u_1 u_2 \cdots u_n$ is an $L$-factorization if all $u_i$'s are from $L$. It is natural to write

$$L^* = \{u_1 u_2 \cdots u_n \mid n \ge 0 \text{ and } u_i \in L\},$$

$$L^+ = \{u_1 u_2 \cdots u_n \mid n \ge 1 \text{ and } u_i \in L\}.$$


Note that each w in L∗ has at least one L-factorization, and if this is always unique, then
L is called a code.
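
As an illustration of L-factorizations, the following sketch counts how many ways a given finite word can be written as a product of words from a finite language. The function name `count_factorizations` and the sample languages are hypothetical and not part of the lecture; the empty word is ignored if it belongs to the language, since it would give infinitely many factorizations. Finding a word with two factorizations shows that a language is not a code, but checking finitely many words never proves that it is one.

```python
def count_factorizations(w, language):
    """Count the ways to write w = u1 u2 ... un with every ui in `language`."""
    # ways[i] = number of L-factorizations of the prefix w[:i]
    ways = [0] * (len(w) + 1)
    ways[0] = 1  # the empty word has exactly one factorization (n = 0)
    for i in range(1, len(w) + 1):
        for u in language:
            # Skip the empty word; it would make the count infinite.
            if u and w[:i].endswith(u):
                ways[i] += ways[i - len(u)]
    return ways[len(w)]


# "aba" = a . ba = ab . a, so {a, ab, ba} is not a code.
print(count_factorizations("aba", {"a", "ab", "ba"}))
# Over {a, b} every word factors uniquely into letters.
print(count_factorizations("aba", {"a", "b"}))
```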

Factor complexity. The factor complexity function $f_w(n)$ is the number of distinct
factors of length $n$ in a finite or infinite word $w$. For example, for all $n \ge 1$, $f_w(n) = 1$ if $w$
is the sequence $111\cdots$, and for the word $u = 11211$, $f_u(1) = f_u(4) = 2$, $f_u(2) = f_u(3) = 3$,
$f_u(5) = 1$, and $f_u(n) = 0$ for $n \ge 6$.
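
For a finite word, the factor complexity can be computed directly by collecting all windows of a given length. The sketch below (the function name `factor_complexity` is an assumption made here for illustration) reproduces the values stated for u = 11211.

```python
def factor_complexity(w, n):
    """Number of distinct factors of length n in the finite word w."""
    if n > len(w):
        return 0
    # Slide a window of length n over w and keep the distinct contents.
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


u = "11211"
print([factor_complexity(u, n) for n in range(1, 7)])  # [2, 3, 3, 2, 1, 0]
```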
Normally, one is interested in the factor complexity of infinite words. Since for every
factor $u$ occurring in an infinite word $w$ there exists at least one letter $a$ such that $ua$
occurs in $w$, we conclude that

$$f_w(n) \le f_w(n+1) \text{ for all } n, \qquad (1)$$

so that $f_w(n)$ is a nondecreasing function. We also have

$$f_w(m+n) \le f_w(m) f_w(n) \text{ for all } m, n,$$

because every factor of length $m + n$ is the concatenation of a factor of length $m$ and a
factor of length $n$, although not every such concatenation is a factor occurring in $w$.

2 Morphisms

Let $A$ and $B$ be two alphabets (possibly $A = B$). A map $\varphi : A^* \to B^*$ is called a
morphism if $\varphi(uv) = \varphi(u)\varphi(v)$ for any $u, v \in A^*$. A morphism $\varphi$ can be defined by
defining $\varphi(a)$ for each $a \in A$. A particular property of a morphism $\varphi$ is that $\varphi(\varepsilon) = \varepsilon$.
A morphism $\varphi$ is nonerasing if $\varphi(a) \ne \varepsilon$ for each $a \in A$. A morphism $\varphi$ is a code if it is
injective, that is, if $\varphi(u) \ne \varphi(v)$ for all distinct words $u, v \in A^*$.
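
A morphism on a finite alphabet can be represented by a dictionary from letters to images. The sketch below is only an illustration: the names `apply_morphism`, `is_nonerasing` and `is_injective_up_to`, as well as the two sample morphisms, are assumptions introduced here. The injectivity test is a brute-force check on words up to a given length, so a positive answer is evidence rather than a proof that the morphism is a code.

```python
from itertools import product


def apply_morphism(phi, word):
    """Extend the letter-to-image map phi to a whole word."""
    return "".join(phi[letter] for letter in word)


def is_nonerasing(phi):
    """True if no letter is mapped to the empty word."""
    return all(image != "" for image in phi.values())


def is_injective_up_to(phi, max_len):
    """Check that distinct words of length <= max_len have distinct images."""
    seen = set()
    for length in range(max_len + 1):
        for letters in product(sorted(phi), repeat=length):
            image = apply_morphism(phi, "".join(letters))
            if image in seen:
                return False
            seen.add(image)
    return True


phi = {"a": "ab", "b": "a"}   # hypothetical example morphism
psi = {"a": "x", "b": "xx"}   # not a code: psi(aa) = psi(b) = xx

print(apply_morphism(phi, "aba"))   # abaab
print(is_injective_up_to(psi, 2))   # False
```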

The Thue-Morse sequence. The Thue-Morse sequence $t$ is probably the most famous
example of a sequence defined by iterations of a (nonerasing, uniform) morphism $\theta$:

$\theta(0) = 01$
$\theta(1) = 10.$

The first few iterations of $\theta$, starting from 0, are as follows:

0, 01, 0110, 01101001, 0110100110010110, . . . .

A remarkable property of this sequence is that it does not contain a factor of the form
$XXx$, where $X$ is a non-empty word and $x$ is the first letter of $X$. We then say that the
Thue-Morse sequence is $2^+$-free (in particular, it is cube-free). We can also say that the
Thue-Morse sequence is overlap-free, as it does not contain a factor of the form $axaxa$,
where $a$ is a letter and $x$ is a word (to be proved in Lecture 3).
Another way to define this sequence is recursive: Let $\theta^0(0) = 0$ and, for $n \ge 1$,
$\theta^n(0) = \theta^{n-1}(0)\,c(\theta^{n-1}(0))$, where the function $c : \{0,1\}^* \to \{0,1\}^*$ is the complement
that swaps 0s and 1s. Then $\theta^1(0) = 01$, $\theta^2(0) = 0110$, $\theta^3(0) = 01101001$, etc.
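
The two descriptions of the Thue-Morse prefixes can be compared experimentally. The following sketch (all function names are hypothetical) builds the prefixes both by iterating the morphism and by appending complements, and searches a finite prefix for an overlap axaxa. Such a search only confirms overlap-freeness on the prefix examined; it is not a substitute for the proof.

```python
THETA = {"0": "01", "1": "10"}


def iterate_theta(n):
    """Apply the Thue-Morse morphism n times to the word 0."""
    word = "0"
    for _ in range(n):
        word = "".join(THETA[letter] for letter in word)
    return word


def complement(word):
    """Swap 0s and 1s."""
    return word.translate(str.maketrans("01", "10"))


def thue_morse_recursive(n):
    """Build the same prefix by repeatedly appending the complement."""
    word = "0"
    for _ in range(n):
        word += complement(word)
    return word


def has_overlap(word):
    """True if word has a factor axaxa (a a letter, x a possibly empty word)."""
    n = len(word)
    for p in range(1, n):              # p = |ax|, the period of the overlap
        for i in range(n - 2 * p):     # candidate factor word[i : i + 2p + 1]
            if all(word[j] == word[j + p] for j in range(i, i + p + 1)):
                return True
    return False


print(iterate_theta(4))                               # 0110100110010110
print(iterate_theta(6) == thue_morse_recursive(6))    # True
print(has_overlap(iterate_theta(6)))                  # False
```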

Square-free words. A word is square-free if it does not contain two equal non-empty factors
that are adjacent to each other. For example, 123132 is square-free while 1232311 is not,
as it contains the squares 2323 and 11. It is easy to see that a longest square-free word
on a 2-letter alphabet, say {a, b}, has length 3 (for example, aba), while the longest square-free word on a 1-letter
alphabet is of length 1. Axel Thue [6] proved in 1906 that there are infinitely many
square-free words on a 3-letter alphabet (this result was rediscovered many times). An
infinite sequence on a 3-letter alphabet avoiding squares is given by iterating the following
morphism $\mu$:
a → abc
b → ac
c → b.
The sequence begins abcacbabcbacabcacbacabcb · · · . Other morphisms
defining square-free sequences are as follows:

Axel Thue, 1912     Axel Thue, 1912     Jonathan Leech, 1957

a → abcab           a → abacb           a → abcbacbcabcba
b → acabcb          b → abcbac          b → bcacbacabcacb
c → acbcacb         c → abcacbc         c → cabacbabcabac
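
Square-freeness of a finite word can be tested by brute force, which makes it easy to experiment with the morphism a → abc, b → ac, c → b. In the sketch below, the helper names `iterate` and `has_square` are assumptions made for illustration; the check covers only a finite prefix of the infinite sequence.

```python
from itertools import product

MU = {"a": "abc", "b": "ac", "c": "b"}


def iterate(morphism, start, n):
    """Apply the morphism n times to the word `start`."""
    word = start
    for _ in range(n):
        word = "".join(morphism[letter] for letter in word)
    return word


def has_square(word):
    """True if word contains a factor xx with x non-empty."""
    n = len(word)
    for p in range(1, n // 2 + 1):          # p = |x|
        for i in range(n - 2 * p + 1):
            if word[i:i + p] == word[i + p:i + 2 * p]:
                return True
    return False


print(iterate(MU, "a", 4))                 # abcacbabcbacabcacbacabcb
print(has_square(iterate(MU, "a", 8)))     # False
print(has_square("1232311"))               # True

# Every binary word of length 4 contains a square.
print(all(has_square("".join(w)) for w in product("ab", repeat=4)))  # True
```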

Arshon sequence. In 1937, Arshon [1] defined the following infinite word over the
alphabet $A = \{1, 2, \ldots, n\}$ using two morphisms applied depending on the parity of the
position of a letter: Let $w_1 = 1$. For $k \ge 1$, $w_{k+1}$ is obtained by replacing the letters of
$w_k$:

in odd positions                  in even positions

1 → 123 · · · (n − 1)n            1 → n(n − 1) · · · 321
2 → 234 · · · (n − 1)n1           2 → 1n(n − 1) · · · 432
...                               ...
n → n12 · · · (n − 2)(n − 1)      n → (n − 1)(n − 2) · · · 21n

Figure 1: Dragon curve

Then $w_2 = 123 \cdots (n-1)n$ and each $w_i$ is an initial subword (prefix) of $w_{i+1}$, so $w = \lim_{i\to\infty} w_i$
is well defined. When $n = 3$, $w$ is called the Arshon sequence. Arshon proved that the
sequence defined above is square-free, thus providing a non-trivial example of a square-free
sequence on an $n$-letter alphabet.
While for even $n$ the sequence is essentially defined by iteration of a (single) morphism,
it was proved by Currie [2] in 2002 that for odd $n$ no morphism defining the
sequence exists. Note that the particular case of $n = 3$ was proved in [4] prior to Currie's
work, where the proof by contradiction is 3 pages long and involves the fact that the Arshon
sequence is square-free.
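
The position-dependent rewriting above can be sketched as follows. Words are represented as lists of integers, positions are counted from 1, and the function names `arshon_step`, `arshon_word` and `has_square` are hypothetical. The sketch relies on the observation, read off from the two columns of rules, that the image of a letter in an even position is the reverse of its image in an odd position.

```python
def arshon_step(word, n):
    """Replace every letter of `word` using the odd/even position rules."""
    out = []
    for position, letter in enumerate(word, start=1):
        # Odd-position image: letter, letter + 1, ..., n, 1, ..., letter - 1.
        image = [(letter - 1 + j) % n + 1 for j in range(n)]
        if position % 2 == 0:
            image.reverse()  # even positions use the reversed image
        out.extend(image)
    return out


def arshon_word(n, k):
    """Return w_k over the alphabet {1, ..., n}, starting from w_1 = 1."""
    word = [1]
    for _ in range(k - 1):
        word = arshon_step(word, n)
    return word


def has_square(seq):
    """True if seq contains two equal non-empty adjacent blocks."""
    m = len(seq)
    return any(seq[i:i + p] == seq[i + p:i + 2 * p]
               for p in range(1, m // 2 + 1)
               for i in range(m - 2 * p + 1))


print("".join(map(str, arshon_word(3, 3))))   # 123132312
print(has_square(arshon_word(3, 5)))          # False
```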

3 Fractal-like sequences

Dragon curve sequence. The Dragon curve sequence was discovered by NASA physicist
John E. Heighway and was described by Martin Gardner in his Scientific American column
Mathematical Games in 1967. This sequence is also known as the paperfolding sequence: Start
with a rectangular piece of paper which we shall view from the edge. Fold the right half
over the left half, with a sharp crease down the middle. Take the folded paper and fold
it again in the same way. Continue this folding process for a few more generations. After a number
of folds, unfold the paper, and spread each fold to an angle of exactly 90 degrees. The
resulting edge curve is our dragon, which looks like the shape in Figure 1.
This curve is a classic example of a recursively generated fractal shape. While traveling
along the Dragon curve, one can create a binary word indicating whether
a turn to the right or a turn to the left was made. For the curve in Figure 1, this word would
start like ℓℓrℓℓrrℓℓℓrrℓr · · · . It turns out that the word corresponding to the Dragon curve
sequence is equivalent to the σ-sequence $w_\sigma$ defined next.

σ-sequence. Any natural number $n$ can be presented unambiguously as

$$n = 2^t(4s + \sigma),$$

where $\sigma < 4$, and $t$ is the greatest natural number such that $2^t$ divides $n$. If $n$ runs through
the natural numbers then $\sigma$ runs through the sequence that we will call the sequence of
σ, or the σ-sequence. We let $w_\sigma$ denote that sequence. Obviously, $w_\sigma$ consists of 1s and
3s. The initial letters of $w_\sigma$ are 11311331113313 . . ..
An equivalent definition of the σ-sequence:

$$C_1 = 1, \quad D_1 = 3,$$

$$C_{k+1} = C_k 1 D_k, \quad D_{k+1} = C_k 3 D_k, \quad k = 1, 2, \ldots,$$

and $w_\sigma = \lim_{k\to\infty} C_k$. However, the σ-sequence cannot be defined by iteration of a morphism
[4]. We consider a proof of this result as an introduction to possible approaches to justify
non-existence results about morphisms.

Theorem 3.1 ([4]). There does not exist a morphism whose iteration defines the sequence
$w_\sigma$.

Proof. Suppose there exists a morphism $f$ such that $f(1) = X$, $f(3) = Y$ and $w_\sigma = \lim_{k\to\infty} f^k(1)$.
Obviously, $X$ consists of the first $|X|$ letters of $w_\sigma$, where $|X|$ is the length of
$X$.
Lemma 3.2. The subsequence of $w_\sigma$ in odd positions is $1313131\cdots$.

Proof. The odd positions of $w_\sigma$ correspond to the odd numbers $n = 2^0(4s + \sigma) = 4s + \sigma$,
so clearly $\sigma$ alternates between 1 and 3.

Lemma 3.3. $|X| \equiv 0 \pmod 4$.

Proof. It is easy to see that $f(1) = 1X^{(1)}$, where $|X^{(1)}| \ge 1$, since otherwise $|f^k(1)| = 1$
for $k = 1, 2, 3, \ldots$, so $w_\sigma$ cannot be obtained by iterating $f$.
Suppose $|X^{(1)}| = 1$, that is, $f(1) = 11$. But then $w_\sigma$ consists of 1s only, which is
impossible, hence $f(1) = 11X^{(2)}$, where $|X^{(2)}| \ge 1$.
Suppose $|X^{(2)}| = 1$, that is, $f(1) = 113$. Since $w_\sigma$ has the factor 111, $w_\sigma$ also has
the factor $f(111) = 113113113$. If $f(111)$ begins with a letter in an odd position, then
the first, third, fifth, seventh and ninth letters of 113113113 (namely 1, 3, 1, 1, 3), read from
left to right, give five consecutive letters of $w_\sigma$ in odd positions. This contradicts
Lemma 3.2. If $f(111)$ begins with a letter in an even position, then considering the letters in
odd positions (the second, fourth, sixth and eighth letters of 113113113, namely 1, 1, 1, 3)
leads to the same contradiction with Lemma 3.2, hence $f(1) = 113X^{(3)}$, where $|X^{(3)}| \ge 1$.
Suppose $|X^{(3)}| = 1$, that is, $f(1) = 1131$. Then $f^2(1) = 11311131\,Y\,1131$ and its
sixth letter (a 1) does not coincide with the letter of $w_\sigma$ standing in the same place (a 3), hence
$f(1) = 1131X^{(4)}$, where $|X^{(4)}| \ge 1$.
If $|X|$ is odd, then the second copy of $X$ in $f^2(1) = 1131X^{(4)}1131X^{(4)}\cdots$ starts in an even
position, so the second and fourth letters of the second 1131 (both equal to 1) are two consecutive
letters in odd places. This contradicts Lemma 3.2. Hence $|X|$ is even.
We have $f^2(1) = XX\cdots = X1131X^{(4)}\cdots$, and thus the next-to-last letter of $X$ is a 3
in an odd position, since otherwise two consecutive 1s in odd positions in $w_\sigma$ are found,
which would contradict Lemma 3.2. Then, the natural number corresponding to the next-to-last
letter of $X$ can be written as $2^0(4s + 3)$ and so $|X| = 2^0(4s + 3) + 1 = 4(s + 1) \equiv 0 \pmod 4$.

The following lemma is straightforward to prove.

Lemma 3.4. If $n_1 = 2^{t_1}(4s_1 + 1)$, $n_2 = 2^{t_2}(4s_2 + 1)$, $n_3 = 2^{t_3}(4s_3 + 3)$ and $n_4 = 2^{t_4}(4s_4 + 3)$,
then $n_1 n_2$ and $n_3 n_4$ can be written as $2^t(4s + 1)$, and $n_1 n_3$ as $2^t(4s + 3)$.

It follows from Lemma 3.3 that $|X| = 4t$ for some $t$. Suppose now that $X$ ends with
1 (the case when $X$ ends with 3 is similar), that is, in the $(4t)$th position in $X$ we have
1. Since multiplication by 2 does not change $\sigma$, we have 1 in the $(2t)$th position in $X$.
Consider $f^2(1) = XX\cdots$. The letters in the second $X$ occupy the positions from the $(4t + 1)$th
to the $(8t)$th in $f^2(1)$, and in the $(6t)$th position we must have 1, the same letter as in position
$(2t)$. But $6t = 3(2t)$ and $3 = 2^0(4 \cdot 0 + 3)$, whence, by Lemma 3.4, in positions $(2t)$ and $(6t)$ we must have
different letters. Contradiction. The theorem is proved.
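
The arithmetic behind Lemma 3.4 and the last step of the proof can be sanity-checked numerically. The sketch below is a finite check on small numbers, not a proof; the names `sigma` and `lemma_3_4_holds` are hypothetical. It tests that σ of a product is 1 when the two factors have equal σ and 3 otherwise, and that positions 2t and 6t always carry different letters while positions 2t and 4t carry the same letter.

```python
from itertools import product


def sigma(n):
    """Return sigma for n = 2^t (4s + sigma)."""
    while n % 2 == 0:
        n //= 2
    return n % 4


def lemma_3_4_holds(limit):
    """Check Lemma 3.4 for all pairs of natural numbers below `limit`."""
    for a, b in product(range(1, limit), repeat=2):
        expected = 1 if sigma(a) == sigma(b) else 3
        if sigma(a * b) != expected:
            return False
    return True


print(lemma_3_4_holds(200))                                         # True
print(all(sigma(6 * t) != sigma(2 * t) for t in range(1, 1000)))    # True
print(all(sigma(4 * t) == sigma(2 * t) for t in range(1, 1000)))    # True
```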

Peano curve. The Peano curve, also known as the Hilbert space-filling curve, discovered
in 1890, is an example of a fractal space-filling curve. A Peano word $P_n$ is obtained by
traveling along the Peano curve after the $n$-th iteration. $P_n$ is over the alphabet {u, ū, r, r̄}
where u stands for up, ū stands for down, r stands for right, and r̄ stands for left. For
example, $P_1$ = urū and $P_2$ = rur̄uurūrurūūr̄ūr. The Peano (infinite) word is $P = \lim_{n\to\infty} P_{2n+1}$.
The Peano word cannot be generated by iteration of a morphism [5]. Interestingly, there is
a claim in [3] that there is a way to generate the Peano word first by iterating a morphism
over a 12-letter alphabet, and then renaming the 12 letters to get the alphabet {u, ū, r, r̄},
thus showing the relevance of morphisms to the Hilbert space-filling curve. However, no
justification of this claim is given in [3]; instead, finding the construction is left to “the
reader as an exercise”.

Figure 2: Peano words P1 , P2 , P3

4 Problems

1. Let the set L(s) consist of all binary expansions of nonnegative integers divisible by
an integer s ≥ 1. For example, L(5) = {0, 101, 1010, 1111, 10100, . . .}. Is L(s) a code
for some s? Why?

2. What is the 2020th step in the Dragon curve sequence, left or right?

3. Describe the factor complexity function for

(a) 01010101010101 · · · ;
(b) 1001010101010 · · · .

4. Can you construct a sequence $w$ over a 3-letter alphabet having the maximal possible
factor complexity function $f_w(n)$ for all $n \ge 1$? What is $f_w(n)$ for this sequence?

5. Can one generate the following sequences by iteration of a morphism? If yes, con-
struct such a morphism (or even two, or three such morphisms); if not, provide an
argument why such a morphism does not exist.

(a) 01010101010101 · · · ;
(b) $01001000100001000001\cdots = 0^1 1 0^2 1 0^3 1 0^4 1 0^5 1 \cdots$.

6. What is the factor complexity of 1234567891011· · · , the concatenation of the decimal
expansions of the positive integers?

7. (a) Do we have a factor of the form xxx, where x ∈ Σ = {u, ū, r, r̄}, in the Peano
word Pn for some n? If your answer is yes, give an example of x and n. If your
answer is no, explain why!
(b) The same question for xxxx instead of xxx.

References

[1] S. E. Arshon. The Proof of the Existence of n-letter Non-Repeated Asymmetric Sequences.
Math. collected papers, New series, Publ. H. AS USSR, Vol. 2, Issue 3,
Moscow (1937), 769–779. (Russian)

[2] J. Currie. No iterated morphism generates any Arshon sequence of odd order. Discrete
Mathematics 259 (2002), 277–283.

[3] J. Karhumäki. Combinatorics on Words: A New Challenging Topic. TUCS Technical
Report No 645, December 2004.

[4] S. Kitaev. There are no iterated morphisms that define the Arshon sequence and
the sigma-sequence. Journal of Automata, Languages and Combinatorics 8 (2003) 1,
43–50.

[5] S. Kitaev, T. Mansour, P. Séébold. Generating the Peano curve and counting occurrences
of some patterns. Journal of Automata, Languages and Combinatorics 9
(2004) 4, 439–455.

[6] A. Thue. Über unendliche Zeichenreihen. Norske Vid. Selsk. Skr. I Math-Nat. Kl. 7
(1906), 1–22.
