Lecture 1 - Words, Sequences, Morphisms
SMSTC Module
Contents
1 Basic definitions
2 Morphisms
4 Problems
Bibliography
1 Basic definitions
An alphabet is a non-empty set of letters, or symbols, like A = {a, b}, or B = {2, ⊕, 5, q}.
We will consider only finite alphabets. A word over an alphabet A is a sequence of letters
from A, like aba, or bbababbaaa. A word can be either finite, or infinite in one direction
(usually to the right), or infinite in two directions (bi-infinite). Infinite words are often
referred to as sequences. The empty word ε is the word having no letters. $A^n$ denotes the
set of all words of length n over A. Note that there are $k^n$ words of length n over a k-letter
alphabet.
$A^*$, $A^+$, and $A^\infty$ denote the sets of all finite, finite non-empty, and infinite (to the
right) words over A, respectively. In the case when A = {a}, we often write $a^*$, $a^+$, and $a^\infty$
instead of $A^*$, $A^+$, and $A^\infty$, respectively. Concatenation (also called catenation) of words
$u = u_1u_2\cdots u_n$ and $v = v_1v_2\cdots v_m$ is $u \cdot v = uv = u_1u_2\cdots u_nv_1v_2\cdots v_m$. A language L
over A is any subset of $A^*$. Any expression of the form $X_1X_2\cdots X_m$, where each $X_i$ is a language,
defines the product of languages. For example, $a^+b^*cc$ is the set of all words that consist of
one or more a's, followed by zero or more b's, followed by exactly two c's. The length of a word w, denoted
by |w|, is the number of letters in w. For example, |fish| = 4. The number of copies of a
letter a in a word w is denoted by $|w|_a$. For example, $|crocodile|_o = 2$. The alphabet of a
word w is $\mathrm{Alph}(w) = \{a \mid |w|_a \geq 1\}$. For example, Alph(extreme) = {e, x, t, r, m}.
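The definitions of |w|, $|w|_a$ and Alph(w) translate directly into code. The following Python sketch assumes that words are strings of one-character letters, which fits all examples here. The function names are illustrative only.

```python
from collections import Counter


def length(w: str) -> int:
    """|w|: the number of letters in w."""
    return len(w)


def letter_count(w: str, a: str) -> int:
    """|w|_a: the number of copies of the letter a in w (0 if a does not occur)."""
    return Counter(w)[a]


def alph(w: str) -> set:
    """Alph(w) = {a : |w|_a >= 1}, the set of letters occurring in w."""
    counts = Counter(w)
    return {a for a in counts if counts[a] >= 1}


print(length("fish"))                   # 4
print(letter_count("crocodile", "o"))   # 2
print(sorted(alph("extreme")))          # ['e', 'm', 'r', 't', 'x']
```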
A word u is a factor of w (resp., left factor or a prefix; a right factor or a suffix) if
there exist words x and y such that w = xuy (resp., w = uy; w = xu). A factor of w is
proper if it is different from w. For example, in the word 332312313, 33231 is a (proper)
prefix, 13 is a (proper) suffix, and 31 is a (proper) factor (occurring twice in the word).
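The example above can be checked mechanically. This is a minimal sketch, assuming words are Python strings. `occurrences` is a hypothetical helper that lists the 0-based starting positions of all occurrences of a factor, including overlapping ones.

```python
def is_prefix(u: str, w: str) -> bool:
    # w = uy for some word y
    return w.startswith(u)


def is_suffix(u: str, w: str) -> bool:
    # w = xu for some word x
    return w.endswith(u)


def occurrences(u: str, w: str) -> list:
    """0-based starting positions of all (possibly overlapping) occurrences of u in w."""
    return [i for i in range(len(w) - len(u) + 1) if w[i:i + len(u)] == u]


w = "332312313"
print(is_prefix("33231", w))   # True
print(is_suffix("13", w))      # True
print(occurrences("31", w))    # [3, 6]
```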
A word of the form xx · · · x, with k copies of x, is abbreviated by $x^k$. The reverse of a
word $w = w_1w_2\cdots w_n$ is the word $w^R = w_nw_{n-1}\cdots w_1$. A factorization of a word w is
any sequence $(u_1, u_2, \ldots, u_n)$ of words such that $w = u_1u_2\cdots u_n$. For example, two of
the factorizations of the word 2213421 are (2, 213, 421) and (22, 1342, 1). A
factorization $(u_1, u_2, \ldots, u_n)$ is an L-factorization if all $u_i$'s are from L. It is natural to write
Factor complexity. The factor complexity function $f_w(n)$ is the number of distinct
factors of length n in a finite or infinite word w. For example, for all n ≥ 1, $f_w(n) = 1$ if w
is the sequence 111 · · · , and for the word u = 11211, $f_u(1) = f_u(4) = 2$, $f_u(2) = f_u(3) = 3$,
$f_u(5) = 1$, and $f_u(n) = 0$ for n ≥ 6.
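For a finite word, $f_w(n)$ can be computed by collecting all length-n windows into a set. The sketch below uses illustrative function names and reproduces the values for u = 11211.

```python
def factors(w: str, n: int) -> set:
    """All distinct factors of length n of the finite word w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def factor_complexity(w: str, n: int) -> int:
    """f_w(n) for a finite word w."""
    return len(factors(w, n))


u = "11211"
print([factor_complexity(u, n) for n in range(1, 7)])   # [2, 3, 3, 2, 1, 0]
```

For an infinite word, only a finite prefix can be examined this way. The counts obtained from a prefix are lower bounds for $f_w(n)$, because every factor of the prefix is also a factor of w.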
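Normally, one is interested in the factor complexity of infinite words. Since for every
factor u occurring in an infinite word w there exists at least one letter a such that ua
occurs in w, and distinct factors u give distinct factors ua, we conclude that
$f_w(n) \leq f_w(n+1)$ for all n ≥ 1; that is, the factor complexity of an infinite word is non-decreasing.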
2 Morphisms
The Thue-Morse sequence. The Thue-Morse sequence t is probably the most famous
example of a sequence defined by iterating a (nonerasing, uniform) morphism, namely θ:
θ(0) = 01
θ(1) = 10.
The first few iterations of θ applied to 0 are as follows:
0 → 01 → 0110 → 01101001 → 0110100110010110 → · · ·
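Iterating a morphism letter by letter is easy to sketch. In the Python code below, a morphism is represented as a dictionary from letters to words; this representation is chosen for illustration only. The code prints the first iterations of θ and checks that each iterate is a prefix of the next, which is why the limit t is well defined.

```python
def apply_morphism(h: dict, w: str) -> str:
    """h(w1 w2 ... wn) = h(w1) h(w2) ... h(wn)."""
    return "".join(h[a] for a in w)


def iterate(h: dict, start: str, k: int) -> str:
    """Return h^k(start)."""
    w = start
    for _ in range(k):
        w = apply_morphism(h, w)
    return w


theta = {"0": "01", "1": "10"}
for k in range(1, 5):
    print(k, iterate(theta, "0", k))
# 1 01
# 2 0110
# 3 01101001
# 4 0110100110010110
```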
A remarkable property of this sequence is that it does not contain a factor of the form
XXx, where X is a non-empty factor and x is the first letter of X. We then say that the
Thue-Morse sequence is $2^+$-free (in particular, it is cube-free). Equivalently, the
Thue-Morse sequence is overlap-free: it does not contain a factor of the form axaxa,
where a is a letter and x is a word (to be proved in Lecture 3).
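Overlap-freeness of a finite prefix can be checked by brute force. A factor axaxa has length 2p + 1 and period p, where p = |ax|. The sketch below uses hypothetical helper names and tests every such window. Checking a finite prefix gives only evidence, not the proof promised for Lecture 3.

```python
def theta_prefix(k: int) -> str:
    """θ^k(0), the prefix of length 2^k of the Thue-Morse sequence."""
    w = "0"
    for _ in range(k):
        w = "".join("01" if a == "0" else "10" for a in w)
    return w


def has_overlap(w: str) -> bool:
    """True if w contains a factor axaxa (a a letter, x a possibly empty word)."""
    n = len(w)
    for p in range(1, n):                  # p = |ax|
        for i in range(n - 2 * p):         # candidate factor w[i : i + 2p + 1]
            if all(w[i + j] == w[i + j + p] for j in range(p + 1)):
                return True
    return False


print(has_overlap(theta_prefix(8)))   # False
print(has_overlap("01010"))           # True: a = 0, x = 1
print(has_overlap("0110"))            # False: 11 is a square but not an overlap
```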
Another way to define this sequence is recursive: let $h_0(0) = 0$ and, for n ≥ 1,
$h_n(0) = h_{n-1}(0)\,c(h_{n-1}(0))$, where the function $c\colon \{0,1\}^* \to \{0,1\}^*$ is the complement
that swaps 0s and 1s. Then $h_1(0) = 01$, $h_2(0) = 0110$, $h_3(0) = 01101001$, etc.
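The recursive definition can be compared numerically with iteration of θ. Here is a small sketch, in which `h_rec` and `theta_iter` are illustrative names.

```python
def complement(w: str) -> str:
    """c: swap 0s and 1s."""
    return w.translate(str.maketrans("01", "10"))


def h_rec(n: int) -> str:
    """h_n(0) from h_0(0) = 0 and h_n(0) = h_{n-1}(0) c(h_{n-1}(0))."""
    w = "0"
    for _ in range(n):
        w = w + complement(w)
    return w


def theta_iter(n: int) -> str:
    """θ^n(0) for θ(0) = 01, θ(1) = 10."""
    w = "0"
    for _ in range(n):
        w = "".join("01" if a == "0" else "10" for a in w)
    return w


print([h_rec(n) for n in range(1, 4)])                     # ['01', '0110', '01101001']
print(all(h_rec(n) == theta_iter(n) for n in range(12)))   # True
```

The agreement is not a coincidence: $\theta^n(0) = \theta^{n-1}(0)\,\theta^{n-1}(1)$, and $\theta^{n-1}(1)$ is the complement of $\theta^{n-1}(0)$.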
Square-free words. A word is square-free if it does not contain two identical adjacent
factors, that is, a factor of the form xx with x non-empty. For example, 123132 is square-free
while 1232311 is not, as it contains the squares 2323 and 11. It is easy to see that the longest
square-free words on a 2-letter alphabet, say {a, b}, are aba and bab, while the longest square-free
words on a 1-letter alphabet have length 1. Axel Thue [6] proved in 1906 that there are infinitely many
square-free words on a 3-letter alphabet (this result has been rediscovered many times). An
infinite sequence on a 3-letter alphabet avoiding squares is given by iterating the following
morphism µ:
a → abc
b → ac
c → b.
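Square-freeness of a finite word can be checked by brute force over all possible lengths of x. The Python sketch below uses illustrative function names. It confirms the examples above and shows that aba and bab are the longest binary square-free words. It also checks that a prefix of the sequence generated by µ is square-free. Testing a finite prefix is evidence, not a proof of Thue's result.

```python
from itertools import product


def is_square_free(w: str) -> bool:
    """True if w has no factor xx with x non-empty."""
    n = len(w)
    for p in range(1, n // 2 + 1):            # p = |x|
        for i in range(n - 2 * p + 1):        # factor w[i : i + 2p]
            if w[i:i + p] == w[i + p:i + 2 * p]:
                return False
    return True


def longest_square_free(alphabet: str, max_len: int = 10) -> list:
    """Brute force: square-free words of maximal length, searching only up to max_len."""
    best = []
    for n in range(1, max_len + 1):
        found = ["".join(t) for t in product(alphabet, repeat=n)
                 if is_square_free("".join(t))]
        if not found:          # no square-free word of length n, hence none longer
            break
        best = found
    return best


MU = {"a": "abc", "b": "ac", "c": "b"}


def mu_iterate(k: int) -> str:
    """µ^k(a)."""
    w = "a"
    for _ in range(k):
        w = "".join(MU[x] for x in w)
    return w


print(is_square_free("123132"), is_square_free("1232311"))  # True False
print(longest_square_free("ab"))                              # ['aba', 'bab']
print(mu_iterate(4))                  # abcacbabcbacabcacbacabcb
print(is_square_free(mu_iterate(8)))  # True
```

The search in `longest_square_free` is only meaningful when it terminates before `max_len`, as it does for {a, b}. Over three letters it would simply stop at the cap.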
The first letters of the sequence are abcacbabcbacabcacbacabcb · · · . Other morphisms
defining square-free sequences are as follows:
Arshon sequence. In 1937, Arshon [1] defined the following infinite word over the
alphabet A = {1, 2, . . . , n} using two morphisms, applied depending on the parity of the
position of a letter: let $w_1 = 1$. For k ≥ 1, $w_{k+1}$ is obtained by replacing the letters of
$w_k$:
Figure 1: Dragon curve
Then $w_2 = 123\cdots(n-1)n$ and each $w_i$ is a prefix of $w_{i+1}$, so $w = \lim_{i\to\infty} w_i$
is well defined. When n = 3, w is called the Arshon sequence. Arshon proved that the
sequence defined above is square-free, thus providing a non-trivial example of a square-free
sequence on an n-letter alphabet.
While for even n the sequence is essentially defined by iteration of a (single) morphism,
Currie [2] proved in 2002 that for odd n no morphism whose iteration defines the
sequence exists. Note that the particular case n = 3 was proved in [4] prior to Currie's
work; there, the proof is by contradiction, takes three pages, and uses the fact that the
Arshon sequence is square-free.
Dragon curve sequence. The Dragon curve sequence was discovered by NASA physicist
John E. Heighway and was described by Martin Gardner in his Scientific American column
Mathematical Games in 1967. This sequence is also known as the paperfolding sequence. Start
with a rectangular piece of paper, which we view from the edge. Fold the right half
over the left half, with a sharp crease down the middle. Take the folded paper and fold it
again in the same way. Continue this folding process for a few more generations. After a number
of folds, unfold the paper and open each crease to an angle of exactly 90 degrees. The
resulting edge curve is our dragon, which looks like the shape in Figure 1.
This curve is a classic example of a recursively generated fractal shape. While traveling
along the Dragon curve, one can create a binary word recording, at each turn, whether
a turn to the right (r) or a turn to the left (ℓ) was made. For the curve in Figure 1, this word would
start like ℓℓrℓℓrrℓℓℓrrℓr · · · . It turns out that the word corresponding to the Dragon curve
sequence is, up to renaming the letters, the σ-sequence $w_\sigma$ defined next.
Every positive integer n can be written uniquely as
$$n = 2^t(4s + \sigma),$$
where s ≥ 0, σ < 4, and t is the greatest natural number such that $2^t$ divides n. If n runs through
the positive integers, then σ runs through a sequence that we will call the sequence of
σ, or the σ-sequence. We let $w_\sigma$ denote that sequence. Since $n/2^t$ is odd, σ ∈ {1, 3}, so $w_\sigma$
consists of 1s and 3s. The initial letters of $w_\sigma$ are 11311331113313 . . . .
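The definition of σ translates into a short computation: divide out the largest power of 2 and reduce the odd part modulo 4. The sketch below regenerates the initial letters quoted above. Renaming 1 as ℓ and 3 as r reproduces the left/right word of the Dragon curve given earlier; this letter correspondence is an assumption read off from the two prefixes.

```python
def sigma(n: int) -> int:
    """σ, where n = 2^t (4s + σ) and 2^t is the largest power of 2 dividing n."""
    if n < 1:
        raise ValueError("sigma is defined for positive integers only")
    while n % 2 == 0:   # remove the factor 2^t
        n //= 2
    return n % 4        # n is now odd, so this is 1 or 3


def w_sigma(length: int) -> str:
    """The first `length` letters of the σ-sequence."""
    return "".join(str(sigma(n)) for n in range(1, length + 1))


prefix = w_sigma(14)
print(prefix)                                       # 11311331113313
# Assumed renaming: 1 -> ℓ (left turn), 3 -> r (right turn).
print(prefix.translate(str.maketrans("13", "ℓr")))  # ℓℓrℓℓrrℓℓℓrrℓr
```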
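An equivalent definition of the σ-sequence: let $C_1 = 1$, $D_1 = 3$, and
$$C_{k+1} = C_k\,1\,D_k, \qquad D_{k+1} = C_k\,3\,D_k, \qquad k = 1, 2, \ldots;$$
then $w_\sigma = \lim_{k\to\infty} C_k$.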
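The recursion for $C_k$ and $D_k$ is not fully legible in these notes. The sketch below assumes the form $C_{k+1} = C_k\,1\,D_k$, $D_{k+1} = C_k\,3\,D_k$ with $C_1 = 1$, $D_1 = 3$. It then checks numerically that $C_k$ equals the prefix of $w_\sigma$ of length $2^k - 1$. A check for finitely many k is evidence, not a proof.

```python
def sigma(n: int) -> int:
    """σ, where n = 2^t (4s + σ) and 2^t is the largest power of 2 dividing n."""
    if n < 1:
        raise ValueError("sigma is defined for positive integers only")
    while n % 2 == 0:
        n //= 2
    return n % 4


def w_sigma(length: int) -> str:
    return "".join(str(sigma(n)) for n in range(1, length + 1))


def c_d(k: int) -> tuple:
    """(C_k, D_k) under the assumed recursion C_{k+1} = C_k 1 D_k, D_{k+1} = C_k 3 D_k."""
    c, d = "1", "3"            # C_1 = 1, D_1 = 3
    for _ in range(k - 1):
        c, d = c + "1" + d, c + "3" + d
    return c, d


print(c_d(3))   # ('1131133', '1133133')
# C_k has length 2^k - 1 and should be the prefix of w_σ of that length.
print(all(c_d(k)[0] == w_sigma(2 ** k - 1) for k in range(1, 13)))   # True
```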
Theorem 3.1 ([4]). There does not exist a morphism whose iteration defines the sequence
$w_\sigma$.
Proof. Suppose there exists a morphism f such that f(1) = X, f(3) = Y, and
$w_\sigma = \lim_{k\to\infty} f^k(1)$. Obviously, X consists of the first |X| letters of $w_\sigma$, where |X| is the length of X.
Lemma 3.2. The subsequence of $w_\sigma$ in odd positions is 1313131 · · · .
Proof. It is easy to see that $f(1) = 1X^{(1)}$, where $|X^{(1)}| \geq 1$, since otherwise $|f^k(1)| = 1$
for k = 1, 2, 3, . . ., so $w_\sigma$ cannot be obtained by iterating f.
Suppose $|X^{(1)}| = 1$, that is, f(1) = 11. But then $w_\sigma$ consists of 1s only, which is
impossible; hence $f(1) = 11X^{(2)}$, where $|X^{(2)}| \geq 1$.
Suppose $|X^{(2)}| = 1$, that is, f(1) = 113. Since $w_\sigma$ has the factor 111 and $w_\sigma = f(w_\sigma)$,
$w_\sigma$ also has the factor f(111) = 113113113. If f(111) begins in an odd position of $w_\sigma$, then
the letters in positions 1, 3, 5, 7, 9 of 113113113, namely 1, 3, 1, 1, 3, are five consecutive
letters of $w_\sigma$ in odd positions; since two of them are consecutive 1s, this contradicts Lemma 3.2.
If f(111) begins in an even position, then the letters in positions 2, 4, 6, 8, namely 1, 1, 3, 1,
lie in consecutive odd positions of $w_\sigma$, which leads to the same contradiction with
Lemma 3.2. Hence $f(1) = 113X^{(3)}$, where $|X^{(3)}| \geq 1$.
Suppose $|X^{(3)}| = 1$, that is, f(1) = 1131. Then $f^2(1) = f(1)f(1)f(3)f(1) = 11311131Y1131$, and its
sixth letter, 1, does not coincide with the sixth letter of $w_\sigma$, which is 3. Hence
$f(1) = 1131X^{(4)}$, where $|X^{(4)}| \geq 1$.
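If |X| is odd, then in $f^2(1) = XX\cdots = 1131X^{(4)}1131X^{(4)}\cdots$ the second copy of X begins in an
even position, so its second and fourth letters, both equal to 1, are two consecutive letters of $w_\sigma$
in odd positions. This contradicts Lemma 3.2. Hence |X| is even.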
Since |X| is even, we have $f^2(1) = XX\cdots = X1131X^{(4)}\cdots$, and the first letter of the second
copy of X, a 1, stands in the odd position |X| + 1. Thus the next-to-last letter of X, which stands in
the odd position |X| − 1, is a 3, since otherwise two consecutive 1s in odd positions of $w_\sigma$ would be
found, contradicting Lemma 3.2. Therefore the natural number corresponding to the next-to-last
letter of X can be written as $2^0(4s + 3)$, and so $|X| = 2^0(4s + 3) + 1 = 4(s + 1) \equiv 0 \pmod 4$.
It follows from Lemma 3.3 that |X| = 4t for some t. Suppose now that X ends with
1 (the case when X ends with 3 is similar), that is, in the (4t)th position of X we have
1. Since multiplication by 2 does not change σ, we have 1 in the (2t)th position of X.
Consider $f^2(1)$, which begins with XX. The letters of the second X occupy positions (4t + 1)
to (8t) of $f^2(1)$, so in position 6t = 4t + 2t we must have 1, the same letter as in position
2t. But 6t = 3(2t), whence, by Lemma 3.4, positions 2t and 6t must carry
different letters. Contradiction. The theorem is proved.
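The arithmetic facts used in the proof can be spot-checked numerically. These are Lemma 3.2, invariance of σ under doubling, and the instance of Lemma 3.4 that positions m and 3m carry different letters. The sketch below checks them for all positions below a hypothetical bound N. This supports, but does not replace, the proofs.

```python
def sigma(n: int) -> int:
    """σ, where n = 2^t (4s + σ) and 2^t is the largest power of 2 dividing n."""
    if n < 1:
        raise ValueError("sigma is defined for positive integers only")
    while n % 2 == 0:
        n //= 2
    return n % 4


N = 10_000   # hypothetical search bound

# Lemma 3.2: the letters in odd positions 1, 3, 5, ... are 1, 3, 1, 3, ...
odd = [sigma(n) for n in range(1, N, 2)]
odd_positions_alternate = odd[0] == 1 and all(odd[i] != odd[i + 1] for i in range(len(odd) - 1))
# Multiplication by 2 does not change σ.
doubling_preserves_sigma = all(sigma(2 * m) == sigma(m) for m in range(1, N))
# Positions m and 3m carry different letters (used in the proof with m = 2t).
tripling_changes_sigma = all(sigma(3 * m) != sigma(m) for m in range(1, N))

print(odd_positions_alternate, doubling_preserves_sigma, tripling_changes_sigma)
# True True True
```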
Peano curve. The Peano curve, also known as the Hilbert space-filling curve, discovered
in 1890, is an example of a fractal space-filling curve. A Peano word $P_n$ is obtained by
traveling along the Peano curve after the nth iteration. $P_n$ is over the alphabet {u, ū, r, r̄},
where u stands for up, ū for down, r for right, and r̄ for left. For example,
$P_1$ = urū and $P_2$ = rur̄uurūrurūūr̄ūr. The (infinite) Peano word is $P = \lim_{n\to\infty} P_{2n+1}$.
The Peano word cannot be generated by iteration of a morphism [5]. Interestingly, there is
a claim in [3] that the Peano word can be generated by first iterating a morphism
over a 12-letter alphabet and then renaming the 12 letters to get the alphabet {u, ū, r, r̄},
thus showing the relevance of morphisms to the Hilbert space-filling curve. However, no
justification of this claim is given in [3]; instead, finding the construction is left to “the
reader as an exercise”.
Figure 2: Peano words $P_1$, $P_2$, $P_3$
4 Problems
1. Let the set L(s) consist of all binary expansions of nonnegative integers divisible by
an integer s ≥ 1. For example, L(5) = {0, 101, 1010, 1111, 10100, . . .}. Is L(s) a code
for some s? Why?
2. What is the 2020th step in the Dragon curve sequence, left or right?
(a) 01010101010101 · · · ;
(b) 1001010101010 · · · .
4. Can you construct a sequence w over a 3-letter alphabet having the maximal possible
factor complexity function $f_w(n)$ for all n ≥ 1? What is $f_w(n)$ for this sequence?
5. Can one generate the following sequences by iteration of a morphism? If yes, con-
struct such a morphism (or even two, or three such morphisms); if not, provide an
argument why such a morphism does not exist.
(a) 01010101010101 · · · ;
(b) 01001000100001000001 · · · = $0\,1\,0^2\,1\,0^3\,1\,0^4\,1\,0^5\,1\cdots$.
7. (a) Is there a factor of the form xxx, where x ∈ Σ = {u, ū, r, r̄}, in the Peano
word $P_n$ for some n? If your answer is yes, give an example of x and n. If your
answer is no, explain why.
(b) The same question for xxxx instead of xxx.
References
[1] S. E. Arshon. The Proof of the Existence of n-letter Non-Repeated Asymmetric Sequences. Math. collected papers, New series, Publ. H. AS USSR, Vol. 2, Issue 3, Moscow (1937), 769–779. (Russian)
[2] J. Currie. No iterated morphism generates any Arshon sequence of odd order. Discrete Mathematics 259 (2002) 277–283.
[4] S. Kitaev. There are no iterated morphisms that define the Arshon sequence and the sigma-sequence. Journal of Automata, Languages and Combinatorics 8 (2003) 1, 43–50.
[5] S. Kitaev, T. Mansour, P. Séébold. Generating the Peano curve and counting occurrences of some patterns. Journal of Automata, Languages and Combinatorics 9 (2004) 4, 439–455.
[6] A. Thue. Über unendliche Zeichenreihen. Norske Vid. Selsk. Skr. I Math-Nat. Kl. 7 (1906) 1–22.