Solutions
in
Entropy in Dynamical Systems
T. Downarowicz
June, 2011
A few words about the Exercises
This file contains solutions of almost all exercises included in the book. There are three
exceptions: 3.10, 8.6 and 12.5. The first one instructs the reader to write a computer
program. I think, even if I presented such a program, nobody would really read the code
or want to see it work. The only way to enjoy this exercise is to actually do it. Exercise
8.6 is completely trivial. As to the last skipped exercise, although I am sure what I
claim there is true, I have never done it before. I see a number of obstacles where the
standard approach could break down and some ingenuity might be needed. I decided to
leave this challenge open; it might turn out to be worth a separate article. I should have
formulated it as a question rather than an exercise.
I confess, there are several exercises that I never bothered to actually do before I put
them in the book. I just had a rough idea how to proceed. When working on this file,
it happened more than once that I encountered unexpected difficulties. Some solutions
turned into pieces of work comparable to writing a small article (7.3, 8.9, 8.14, 9.5,
12.2). In some cases I needed to slightly alter the formulation of the exercise or add
an assumption. In such cases the alternation is clearly indicated at the beginning of
the solution (search for “Attention!”). In exercise 6.3 I extrapolated a property typical
for smooth interval map to all systems, which is an evident symptom of tiredness. I
apologize for all these errors. Some of them result from the fact that many exercises
have been added after completing and proofreading the main body of the book.
Moreover, this work lead me to discovering a few more imprecisions in the book. All
resulting corrections are included in the file “Errata”.
I encourage the readers of the book (and of the solutions, if anyone bothers) to send me
information about any further errors they discover, or any comments, via e-mail. They will
be welcome. The book cannot be corrected, but the errata can always be updated.
Tomasz Downarowicz
Part 1
1 Exercises in Chapter 1
Exercise 1.1.
We have

x = (y/(x+y))·0 + (x/(x+y))·(x+y),

hence, by concavity,

f(x) ≥ (y/(x+y))·f(0) + (x/(x+y))·f(x+y).

Analogously,

f(y) ≥ (x/(x+y))·f(0) + (y/(x+y))·f(x+y).

Summing both sides, we get f(x) + f(y) ≥ f(0) + f(x+y) ≥ f(x+y).
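This computation is easy to spot-check numerically; here is a minimal sketch (my own, not part of the book), using for concreteness the concave function η(t) = −t log₂ t with η(0) = 0:

```python
import math
import random

# Numerical spot-check of Exercise 1.1: a concave function f on [0,1] with
# f(0) >= 0 is subadditive, f(x) + f(y) >= f(x + y). We test it on
# eta(t) = -t*log2(t), a standard concave function in entropy computations.
def eta(t):
    return 0.0 if t == 0 else -t * math.log2(t)

random.seed(0)
for _ in range(10000):
    x, y = random.uniform(0, 0.5), random.uniform(0, 0.5)
    # small tolerance guards against floating-point rounding
    assert eta(x) + eta(y) >= eta(x + y) - 1e-12
```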
Exercise 1.2.
Convergence in ℓ1 obviously implies coordinatewise convergence for any vectors in
ℓ1. The converse holds only with a constraint, for example for probability vectors. Let
p = (p_1, p_2, ...) be a probability vector and let i_0 be so large that Σ_{i>i_0} p_i < ε/4.
Let p′ = (p′_1, p′_2, ...) be another probability vector such that |p_i − p′_i| < ε/(4i_0) for all
i ≤ i_0. Then

Σ_{i>i_0} p′_i = 1 − Σ_{i≤i_0} p′_i = 1 − Σ_{i≤i_0} p_i + Σ_{i≤i_0} (p_i − p′_i) ≤ Σ_{i>i_0} p_i + Σ_{i≤i_0} |p_i − p′_i| < ε/2

and thus

Σ_{i≥1} |p_i − p′_i| ≤ Σ_{i≤i_0} |p_i − p′_i| + Σ_{i>i_0} p_i + Σ_{i>i_0} p′_i < ε.
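A small numerical illustration (the geometric example vector and the particular perturbation are my own choices):

```python
# Illustration of Exercise 1.2: if p' is a probability vector agreeing with p
# up to eps/(4*i0) on coordinates i <= i0, where the tail of p beyond i0 has
# mass < eps/4, then p and p' are eps-close in l^1.
def l1_distance(p, q):
    n = max(len(p), len(q))
    p, q = p + [0.0] * (n - len(p)), q + [0.0] * (n - len(q))
    return sum(abs(a - b) for a, b in zip(p, q))

eps = 0.1
p = [2.0 ** -i for i in range(1, 41)]
p[-1] += 2.0 ** -40                     # now p sums to exactly 1
i0 = 6                                  # tail mass beyond i0 is 2^-6 < eps/4
delta = eps / (8 * i0)                  # within the eps/(4*i0) allowance
pp = [p[0] - delta] + [x + delta / (i0 - 1) for x in p[1:i0]]
pp += [0.0] * 50 + [1.0 - sum(pp)]      # dump the remaining mass far away
assert abs(sum(pp) - 1.0) < 1e-12
assert l1_distance(p, pp) < eps
```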
Exercise 1.3.
H(a ∨ b|c) = H(a ∨ b ∨ c) − H(c) = H(a ∨ b ∨ c) − H(b ∨ c) + H(b ∨ c) − H(c) = H(a|b ∨ c) + H(b|c),
and
H(a ∨ b|c) = H(a ∨ b ∨ c) − H(c) ≥ H(a ∨ c) − H(c) = H(a|c).
With the further assumptions,
H(a ∨ b|c) = H(a|b ∨ c) + H(b|c) ≤ H(a|c) + H(b|c);
H(a ∨ b) = H(a ∨ b ∨ e) = H(a ∨ b|e) + H(e) ≤ H(a|e) + H(b|e) + H(e)
= H(a) + H(b) − H(e) ≤ H(a) + H(b);
H(a ∨ a′ |b ∨ b′ ) ≤ H(a|b ∨ b′ ) + H(a′ |b ∨ b′ ) ≤ H(a|b) + H(a′ |b′ );
H(a|c) ≤ H(a ∨ b|c) ≤ H(a|b ∨ c) + H(b|c) ≤ H(a|b) + H(b|c). (*)
3
Suppose H(a|c) ≥ H(b|c). Then |H(a|c) − H(b|c)| = H(a|c) − H(b|c) and (*)
implies |H(a|c) − H(b|c)| ≤ H(a|b) ≤ max{H(a|b), H(b|a)}. The other case is
symmetric.
Similarly, suppose H(a|c) ≥ H(a|b). Then |H(a|b) − H(a|c)| = H(a|c) − H(a|b)
and (*) implies |H(a|b) − H(a|c)| ≤ H(b|c) ≤ max{H(b|c), H(c|b)}. The other case
is symmetric.
Finally, |H(a) − H(b)| = |H(a|e) + H(e) − H(b|e) − H(e)| = |H(a|e) − H(b|e)| ≤
max{H(a|b), H(b|a)}.
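All of these identities and inequalities can be sanity-checked numerically; the following sketch (my own, with the three partitions modelled as the three coordinates of a random point in {0,1,2}³) does so for a few of them:

```python
import itertools
import math
import random

# Numerical check of the identities in Exercise 1.3 on a random joint
# distribution of three finite partitions a, b, c (coordinates 0, 1, 2 of a
# random point in {0,1,2}^3). Entropies are in bits.
random.seed(1)
weights = [random.random() for _ in range(27)]
total = sum(weights)
joint = {cell: w / total
         for cell, w in zip(itertools.product(range(3), repeat=3), weights)}

def H(coords):
    # entropy of the join of the partitions listed in coords (0=a, 1=b, 2=c)
    marg = {}
    for cell, w in joint.items():
        key = tuple(cell[i] for i in coords)
        marg[key] = marg.get(key, 0.0) + w
    return -sum(w * math.log2(w) for w in marg.values() if w > 0)

def Hcond(coords, given):
    return H(tuple(coords) + tuple(given)) - H(given)

# H(a v b | c) = H(a | b v c) + H(b | c)
assert abs(Hcond((0, 1), (2,)) - Hcond((0,), (1, 2)) - Hcond((1,), (2,))) < 1e-9
# H(a v b | c) >= H(a | c)
assert Hcond((0, 1), (2,)) >= Hcond((0,), (2,)) - 1e-12
```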
Exercise 1.4.
This is a very crude estimate. Suppose p1 is the minimal term in p, let q = 1 − p1, and
define two new probability vectors r = (p1, q) and q = (p2/q, p3/q, ..., pl/q). Then

H(p) = −Σ_{i=1}^{l} p_i log p_i = −p1 log p1 − q Σ_{i=2}^{l} (p_i/q)(log(p_i/q) + log q) =
−p1 log p1 − q log q + qH(q) = H(r) + qH(q) ≤ 1 + (1 − p1) log l.
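A quick numerical spot-check of the final bound (my own sketch, with base-2 logarithms so that H(r) ≤ 1 for the two-cell vector r):

```python
import math
import random

# Spot-check of Exercise 1.4: for a probability vector p of length l with
# minimal term p1, H(p) <= 1 + (1 - p1) * log2(l). Tested on random vectors.
def H(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

random.seed(2)
for _ in range(1000):
    l = random.randint(2, 20)
    w = [random.random() for _ in range(l)]
    p = [x / sum(w) for x in w]
    assert H(p) <= 1 + (1 - min(p)) * math.log2(l) + 1e-9
```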
Exercise 1.5.
Pick m1 so large that p′1 is close enough to 1 to satisfy η(p′1) < ε/2 and η(1 − p′1) <
ε/4. No matter how we pick m2 we will have p′2 ≤ 1 − p′1 and since η increases near
zero, we will have η(p′2) < ε/4. We pick m2 large enough to make p′1 + p′2 so close to
1 that η(1 − p′1 − p′2) < ε/8. No matter how we pick m3, we will have η(p′3) < ε/8.
And so on. Eventually, we get

H(p′) = Σ_{i≥1} η(p′_i) < Σ_{i≥1} ε/2^i = ε.
Exercise 1.6.
Take the partitions P, Q and R as in the proof of Fact 1.9.1 (they produce the vector
(1, 1, 1, 2, 2, 2, 2)). Then I(P ∨ Q; R) = H(P ∨ Q) + H(R) − H(P ∨ Q ∨ R) =
2 + 1 − 2 = 1, while both I(P; R) and I(Q; R) are zeros, because the partitions are
pairwise independent (see Fact 1.8.2).
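If the partitions of Fact 1.9.1 are not at hand, the same arithmetic 2 + 1 − 2 = 1 appears in a standard standalone example (my own substitute, not the book's construction): two independent fair bits P, Q and R = P xor Q are pairwise independent, while R is determined by the join P ∨ Q:

```python
import itertools
import math

# Illustration of Exercise 1.6 on a substitute example: P, Q independent fair
# bits, R = P xor Q. The three are pairwise independent, so I(P;R) = I(Q;R)
# = 0, while I(P v Q; R) = H(P v Q) + H(R) - H(P v Q v R) = 2 + 1 - 2 = 1,
# exhibiting superadditivity of mutual information.
joint = {(p, q, p ^ q): 0.25 for p, q in itertools.product((0, 1), repeat=2)}

def H(coords):
    marg = {}
    for cell, w in joint.items():
        key = tuple(cell[i] for i in coords)
        marg[key] = marg.get(key, 0.0) + w
    return -sum(w * math.log2(w) for w in marg.values() if w > 0)

I_PQ_R = H((0, 1)) + H((2,)) - H((0, 1, 2))
I_P_R = H((0,)) + H((2,)) - H((0, 2))
I_Q_R = H((1,)) + H((2,)) - H((1, 2))
assert (I_PQ_R, I_P_R, I_Q_R) == (1.0, 0.0, 0.0)
```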
Exercise 1.7.
If a = 0 or b = 0 or c = a + b, the problem is trivial. Otherwise, let p ∈ (0, 1) be
such that H(p, 1 − p) < min{a, b, a + b − c}. Divide the space in two parts A and
B of measures p and 1 − p, respectively. Then let R be a partition of A with (relative)
entropy (1/p)(a + b − c − H(p, 1 − p)). Let P′ and Q′ be two independent partitions of
B with (relative) entropies (1/(1−p))(c − b) and (1/(1−p))(c − a), respectively. The partition P
is defined as R on A and P′ on B, and analogously, Q is defined as R on A and Q′ on B. Then

H(P) = H(p, 1 − p) + p·H_A(R) + (1 − p)·H_B(P′) = a,
H(Q) = H(p, 1 − p) + p·H_A(R) + (1 − p)·H_B(Q′) = b,
H(P ∨ Q) = H(p, 1 − p) + p·H_A(R) + (1 − p)·H_B(P′ ∨ Q′) = c,

where, in the last case, we have used relative independence of Q′ and P′ on B and
Fact 1.6.16.
2 Exercises in Chapter 2
Exercise 2.1.
It is clear that π is onto (both for the unilateral and bilateral shift space) and each (x_n)
has the same image under π as (x′_n), where x′_n = x_n + 1 (addition is modulo 2). So
the mapping π is exactly 2-to-1. The preimage by π of a block B = (b_0, ..., b_{n−1})
equals C ∪ C′, where C and C′ are the two blocks of length n + 1 projecting onto B
(they differ by the constant 1 added at every position).
Each of these blocks (as a cylinder) has measure 2^{−n−1} and since they are disjoint, their
union has measure 2^{−n}, the same as B. We have proved that π sends the Bernoulli
measure to itself. In other words, the factor process is the same Bernoulli shift and so
the identity map (not π) provides an isomorphism between the process and its factor.
Exercise 2.2.
Note that for n ∈ N, the past of the power process (X, P^n, µ, T^n, S) equals the past
P^− of the original process. The assertion then follows from the power rule.
Exercise 2.3.
In the Bernoulli shift on two symbols with equal measures 1/2, 1/2 let R denote the
zero-coordinate partition. Consider P = R^{1,3} and Q = R^{0,2}. These partitions are
independent, so H(P|Q) = H(P) = 2. Next, P² = R^{1,2,3,4} and Q² = R^{0,1,2,3} and
only one coordinate in the definition of P² does not occur in Q², so H(P²|Q²) = 1. It is
seen that H(P^n|Q^n) = 1 for all n ≥ 2. The sequence 2, 1, 1, 1, ... is not increasing,
the increments are (−1, 0, 0, 0, ...) and do not decrease.
To have increments increasing for a longer time take e.g. P = R^{1,3,4,7,8,9} and
Q = R^{0,2,5,6,10}. Then the sequence (H(P^n|Q^n))_n equals (6, 3, 1, 0, 0, 0, ...) and
the increments are (−3, −2, −1, 0, 0, 0, ...).
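Since the coordinates of this Bernoulli shift are independent fair bits, H(P^n|Q^n) is simply the number of coordinates used by P^n but not by Q^n, which makes both sequences easy to verify mechanically (a sketch; the coordinate sets are those from the solution):

```python
# For independent fair bits, H(P^n|Q^n) equals the number of coordinates
# appearing in P^n = R^{S_P + [0,n)} but not in Q^n = R^{S_Q + [0,n)}.
def cond_entropy_sequence(coords_p, coords_q, n_max):
    out = []
    for n in range(1, n_max + 1):
        pn = {c + i for c in coords_p for i in range(n)}
        qn = {c + i for c in coords_q for i in range(n)}
        out.append(len(pn - qn))
    return out

assert cond_entropy_sequence({1, 3}, {0, 2}, 5) == [2, 1, 1, 1, 1]
assert cond_entropy_sequence({1, 3, 4, 7, 8, 9}, {0, 2, 5, 6, 10}, 6) == [6, 3, 1, 0, 0, 0]
```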
Exercise 2.4.
Although the sequence a_n = H(P^n|Q^n ∨ B) need not have decreasing increments, it
has decreasing nths, and any sequence (a_n) with convergent nths satisfies

lim_n (1/n)a_n = lim_n (1/n) Σ_{i=1}^{n} (a_i − a_{i−1}) ≤ lim sup_n (a_{n+1} − a_n).
In our case we have
We have T^{−1}(Q^n) ≼ Q^{n+1} and, by subinvariance, T^{−1}(B) ≼ B, hence the right-hand
side does not exceed
The expressions on the right decrease to H(P|P^+ ∨ Q^{N0} ∨ B), so we have proved that
Exercise 2.5.
Let (X, P, µ, T, S) be any process with positive entropy h. Take Q to be the trivial
partition and set B = P^+. Then h(P|Q, B) = lim_n (1/n)H(P^n|P^+) = lim_n (1/n)H(P|P^+) =
lim_n h/n = 0, while H(P|P^+ ∨ Q^{N0} ∨ B) = H(P|P^+ ∨ P^+) = H(P|P^+) = h > 0.
Exercise 2.6.
Since h(Q) ≤ h(Q ∨ P) = h(Q|P) + h(P), it suffices to show that h(Q|P) = 0. By
(2.3.11) (for trivial B), and since Q ≼ P^Z, we do have h(Q|P) = H(Q|Q^+ ∨ P^Z) = 0.
Exercise 2.7 ([Downarowicz–Serafin, 2002, Example 1]).
Let X ⊂ {0, 1, 2}^S consist of sequences in which 0 appears at every other position
(and not in between, e.g. 0101020102... or 1010201020...). Let P denote the zero-
coordinate partition and let µ be the (shift-invariant) measure determined by saying that
cylinders of even length 2n have equal masses 2^{−n−1}. The partition Q = {0, 1 ∪ 2}
is shift-invariant, so it determines a two-point factor (Y, Q, ν, S, S) of (X, P, µ, T, S).
For ergodicity of (X, P, µ, T, S) notice that T² is ergodic (in fact Bernoulli) on both 0
and 1 ∪ 2, so every T-invariant function (being T²-invariant) is constant on either set.
Now T exchanges these sets, so the two constants must match. Obviously h(µ|ν) =
h(µ) = 1/2. On the other hand, the fiber entropy is not constant on Y: we have
h(P|0) = lim_n H(µ_0, P|P^{[1,n]}) = 0 (because P is trivial on the fiber of 0). Now, since
h(µ|ν) = (1/2)(h(P|0) + h(P|1 ∪ 2)), it must be that h(P|1 ∪ 2) = 1.
Exercise 2.8.
By the Ergodic Theorem, the probability vectors p_{n,B_m(x)} assigning values to the elements
of P^n converge almost surely to the vector p(µ, P^n). For a finite partition P the assertion
now follows from the definitions H_n(B_m(x)) = (1/n)H(p_{n,B_m(x)}), H(µ, P^n) =
H(p(µ, P^n)), combined with continuity of static entropy on finite-dimensional probability
vectors.
There is a slight difficulty with countable partitions. By lower semicontinuity of the static
entropy (Fact 1.1.8) we only have lim_m H_n(B_m(x)) ≥ (1/n)H(µ, P^n). Note however,
that p_{n,B_m(x)} is the same as the vector of masses assigned to the cells of P^n by the
average measure M^{m−n}δ_x. Integrating these probability vectors with respect to the
invariant measure µ we get

∫ p_{n,B_m(x)} dµ(x) = p(µ, P^n).

This, together with the µ-almost sure converse inequality, implies equality µ-almost
everywhere.
3 Exercises in Chapter 3
Exercise 3.1.
Given ε > 0, we have, for large n: (1/n)H(P^n) ≤ h(P) + ε, i.e.,
which is exactly the desired nε-entropy independence. It is not trivial whenever ε <
h(µ, T, P) (then nε < h(µ, T^n, P^n)). Since n is selected after fixing ε, the “error
term” nε need not be small, so this does not translate to genuine δ-independence.
Exercise 3.2.
In an independent process (X, P, µ, T, S) let Q = T^{−1}(P). Then H(P^n) = nH(P)
and H(P^n|Q^n) = H(P), so (1/n)(H(P^n) − H(P^n|Q^n)) = (1 − 1/n)H(P), which increases
to its limit H(P).
Note that H(P^n) − H(P^n|Q^n) = I(P^n; Q^n). Exercise 1.6 shows lack of subadditivity:
I(P_1 ∨ P_2; Q_1 ∨ Q_2) may exceed I(P_1; Q_1) + I(P_2; Q_2) (Exercise 1.6 shows failure
for Q_1 = Q_2). So, we cannot expect the sequence examined in this exercise to be
subadditive. Now we have shown it need not even have descending nths.
Exercise 3.3.
This follows directly from the Kolmogorov 0-1 law.
Exercise 3.4.
We have families Λ_k consisting of r_k blocks of lengths n_k. We have the recursive relations
n_{k+1} = r_k n_k and r_{k+1} = r_k!. The key observation is that there is in fact a unique
shift-invariant measure µ on our system X. It is so, because each block B ∈ Λ_k
appears exactly once in every block from Λ_{k+1}. Since every x ∈ X is an infinite
concatenation of the blocks from Λ_{k+1}, the ergodic theorem implies µ(B) = 1/n_{k+1} for
any ergodic measure. This determines the measure to be unique.
In order to compute the entropy, we will anticipate a bit and use the variational principle,
which allows, in uniquely ergodic systems, to replace h(µ) by the easier topological
entropy. Thus, we need to prove that the sequence (1/n_k) log #B_{n_k} has a positive
limit, where B_{n_k} is the family of all blocks of length n_k appearing in X. Clearly,
#B_{n_k} ≥ #Λ_k = r_k, so it suffices to examine the sequence (1/n_k) log r_k. It starts with
A = (1/2) log #Λ, which we may assume much larger than 1. Then, by Stirling’s formula,

(1/n_{k+1}) log r_{k+1} ≈ (r_k log r_k − r_k)/(r_k n_k) = (1/n_k) log r_k − 1/n_k.

So, we have (roughly) lim_k (1/n_k) log r_k ≈ A − Σ_{k=1}^{∞} 1/n_k. The sequence 1/n_k starts with
1/2 and decreases much faster than exponentially, so its sum is smaller than 1. This
shows that (1/n_k) log r_k has indeed a positive limit.
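The rough estimate can be traced numerically (an illustrative sketch; the alphabet size and the cut-off are my own choices, and the Stirling step is the same approximation as above):

```python
import math

# Rough numerical sketch of Exercise 3.4 with illustrative parameters:
# n_1 = 2, r_1 = #Lambda, n_{k+1} = r_k * n_k, r_{k+1} = r_k!. Stirling gives
# (1/n_{k+1}) log r_{k+1} ~ (1/n_k) log r_k - 1/n_k, so the limit is about
# A - sum_k 1/n_k with A = (1/2) log #Lambda. The n_k blow up so fast that
# the sum is dominated by its first term 1/2.
def limit_estimate(alphabet_size, steps=10):
    A = 0.5 * math.log2(alphabet_size)
    n = 2.0
    log2_r = math.log2(alphabet_size)      # log2(r_k), kept in log form
    s = 0.0                                # partial sum of 1/n_k
    for _ in range(steps):
        s += 1.0 / n
        log2_n_next = math.log2(n) + log2_r
        if log2_n_next > 500:              # further terms are negligible
            break
        n = 2.0 ** log2_n_next
        # Stirling: log2(r!) ~ r * (log2 r - log2 e)
        log2_r = (2.0 ** log2_r) * (log2_r - math.log2(math.e))
    return A, s

A, s = limit_estimate(2 ** 10)
assert s < 1.0 < A          # the limit estimate A - s stays positive
```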
Exercise 3.5.
In any process (X, P, µ, T, S) with positive entropy take B_k = P^{[k,∞)}. Then B =
∩_k B_k is the Pinsker sigma-algebra and h(P|B) = h(P) > 0. On the other hand, for
each k, h(P|B_k) = lim_n (1/n)H(P^n|P^{[k,∞)}) ≤ lim_n (1/n)H(P^k) = 0.
Exercise 3.6.
Let (Λ^S, µ, σ, S) be a Bernoulli shift, i.e., µ = p^S, where p is a probability distribution
on Λ. On the probability space (X, µ) consider the sequence of random variables
X_n(x) = −log µ(A_{σ^n x}), where A_{σ^n x} is the cell of P_Λ containing σ^n x. They
are independent and identically distributed with expected value H(p) (which equals
h(µ)). The Law of Large Numbers asserts that the averages (1/n) Σ_{i=0}^{n−1} X_i converge
almost everywhere to this expected value. It suffices to note that Σ_{i=0}^{n−1} X_i(x) equals
−log µ(A^n_x), i.e., the information function I_{P^n_Λ}(x).
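A simulation sketch (the three-symbol distribution is an arbitrary illustration):

```python
import math
import random

# Simulation of Exercise 3.6 (Shannon-McMillan via the Law of Large Numbers)
# for a Bernoulli shift with an illustrative distribution p: the normalized
# information function -(1/n) log2 mu(A^n_x) approaches H(p).
random.seed(3)
p = {'a': 0.5, 'b': 0.25, 'c': 0.25}
H = -sum(w * math.log2(w) for w in p.values())          # = 1.5 bits

n = 100000
x = random.choices(list(p), weights=list(p.values()), k=n)
info = -sum(math.log2(p[s]) for s in x) / n             # -(1/n) log2 mu(A^n_x)
assert abs(info - H) < 0.02
```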
Exercise 3.7.
Notice that the map x ↦ T^{R_n(x)}(x) preserves the measure (it sends each cylinder of
length n to itself and on every such cylinder it is just the induced map, which preserves
the conditional measure). Let us abbreviate this map by S. Thus, for each
i ≥ 1 the variables x ↦ R_n(S^i x) have the same distributions as R_n; in particular the
Ornstein–Weiss assertion is fulfilled: lim_n (1/n) log R_n(S^i x) = h(P) µ-almost everywhere.
Given k, the variables

m_n(x) = max{(1/n) log R_n(S^i x) : 0 ≤ i ≤ k−1} = (1/n) log max{R_n(S^i x) : 0 ≤ i ≤ k−1}

also converge to h(P) µ-a.e. Now

R_n^{(k)}(x) = R_n(x) + R_n(Sx) + R_n(S²x) + ··· + R_n(S^{k−1}x)

lies between max{R_n(S^i x) : 0 ≤ i ≤ k−1} and k·max{R_n(S^i x) : 0 ≤ i ≤ k−1}. Thus
(1/n) log R_n^{(k)}(x) lies between m_n(x) and (1/n) log k + m_n(x), which implies the desired
convergence.
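The Ornstein–Weiss convergence used above can be watched in a simulation (all parameters are illustrative choices of mine):

```python
import math
import random

# Simulation sketch of the return times R_n from Exercise 3.7 for a fair-coin
# Bernoulli shift: by the Ornstein-Weiss theorem, (1/n) log2 R_n(x) should be
# close to the entropy h = 1 bit for typical points x.
random.seed(4)

def first_return(x, n):
    # R_n(x): the least k >= 1 with x[k:k+n] == x[0:n]
    head = x[:n]
    for k in range(1, len(x) - n + 1):
        if x[k:k + n] == head:
            return k
    return None

n = 8
rates = []
for _ in range(30):
    x = [random.getrandbits(1) for _ in range(50000)]
    r = first_return(x, n)
    if r is not None:
        rates.append(math.log2(r) / n)
avg = sum(rates) / len(rates)
assert 0.6 < avg < 1.2        # loose band around h = 1
```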
8
Exercise 3.8.
Attention! In the formulation the word “eventually” is missing: ... the cardinality of
blocks of length n eventually exceeds 2^{n(log l−ε)}.
The numbers p_k = 1/((l−1)l^{k−2}) decrease to zero as k → ∞, so the following convex
combinations of l − 1 and l

l_k = p_k(l − 1) + (1 − p_k)l
Exercise 3.9.
Then, for every ε > 0, the joint measure of all blocks B of length m and compression
rate smaller than (h − ε)/log #Λ tends to zero with m.
Fix an ε > 0. Consider the blocks B of length m and compression rate smaller than
(h − ε)/ log #P. Their compressed images are binary blocks of lengths smaller than
m(h − ε), so there are at most 2^{m(h−ε)} such blocks. In particular, the conditional entropy
of Pm on any subset of their union, denoted Am , is smaller than m(h − ε). Next, fix n
so large that n1 H(Pn ) < h + δ, where δ is some small positive number. In order for a
block B to satisfy H_n(B) ≤ h + δ we need H_n(B) to be closer to (1/n)H(P^n) than the
(positive) difference h + δ − (1/n)H(P^n). By continuity of entropy on finite-dimensional
vectors, it suffices that the vector pn,B (of frequencies in B of blocks of length n) is
very close to the vector p(µ, Pn ). By the Ergodic Theorem, for m sufficiently large,
this is satisfied for blocks B of length m covering a set Xm of measure at least 1 − δ.
By Lemma 2.8.2, if m is large enough, the cardinality of blocks in X_m does not exceed
2^{m(h+δ)}. In particular, the conditional entropy of P^m on any subset of X_m is smaller
than m(h + δ). We divide X into three parts: X_m ∩ A_m (treated as a subset of A_m),
X_m \ A_m (a subset of X_m) and the rest, X \ X_m, whose measures are estimated from
above by µ(A_m), 1 − µ(A_m) and δ, respectively. We have
Exercise 3.10.
4 Exercises in Chapter 4
Exercise 4.1.
In fact, it suffices to assume that the sequence P_k generates under the action, that is,
the double sequence of partitions P_k^n (in case S = N0) or P_k^{[−n,n]} (in case S = Z),
where (k, n) ranges over N0 × N0, generates A.
Given any finite partition P of cardinality m, let k and n be so large that there exists a
partition P′ ≼ P_k^n (in case S = N0; P′ ≼ P_k^{[−n,n]} for S = Z), also of cardinality m, with
d_R(P, P′) < ε (recall that in P_m the metric d_R is uniformly equivalent to d_1). Then,
by (2.4.10), regardless of the conditioning sigma-algebra, |h(P|B) − h(P′|B)| < ε,
which implies h(P|B) < h(P_k^{2n+1}|B) + ε = h(P_k|B) + ε (we have also used Fact
2.4.1). Taking the (increasing) limit over k on the right and then the supremum on the left,
we get h(A|B) ≤ lim_k h(P_k|B) + ε. Now we can remove ε and combine the result
with the obvious converse inequality.
Exercise 4.2.
This is now an immediate consequence of the preceding exercise and Theorem 2.5.1.
Exercise 4.3.
We will prove the formula h(µ, T^n|B) = |n|h(µ, T|B), assuming B to be invariant
only for negative n; otherwise it works for subinvariant B (regardless of S). We
copy the proof of Fact 4.1.14 almost verbatim. By Fact 2.4.19 (the conditional version;
this is where invariance of B may be needed), we have, for every partition P:
h(µ, T^n, P|B) ≤ h(µ, T^n, P^{|n|}|B) = |n|h(µ, T, P|B). Now we apply sup_P and get
Exercise 4.4.
We will actually prove Theorem 4.2.9 without using Remark 4.2.7 and then prove that
remark using the theorem. Given a finite partition Q measurable with respect to Π_µ,
we can approximate it up to ε in d_1 (equivalently in d_R) by a partition Q′ of the same
cardinality as Q, measurable with respect to a finite join ∨_{i=1}^{k} Π_{P_i} for some partitions
P_i. We do not need Remark 4.2.7 for that; the partitions P_i need not be members of
any a priori fixed sequence of partitions. The difficulty lies in understanding the join of
possibly uncountably many sigma-algebras. In fact, Π_µ is, by definition, the smallest
sigma-algebra containing ∪_P Π_P (union over all finite partitions P). But every set in
Π_µ is obtained via countably many set operations involving at most countably many
sets in that union, so it is contained in ∨_{i=1}^{∞} Π_{P_i} for some sequence of finite partitions
P_i. Now we can use the usual approximation within this countable join. The rest
of the proof of Theorem 4.2.9 is (almost) unchanged. We remark that, without assuming
the partitions P_i to be linearly ordered by ≼, it is no longer true that the join ∨_{i=1}^{k} Π_{P_i}
equals Π_P, where P = ∨_{i=1}^{k} P_i; it is only refined (which is very easy to see). There are
examples where the converse refining fails (perhaps this should be better emphasized
in the book). Anyway, the valid direction is sufficient in the proof of Theorem 4.2.9.
We shall now prove Remark 4.2.7 in a stronger version, which includes the case of
one generating partition. Namely, we will only assume that the refining sequence P_k
generates under the action (as we did in the solution of Exercise 4.1).
Clearly, ∨_{k=1}^{∞} Π_{P_k} ≼ Π_µ. We need to prove the converse. Let Q be a partition measurable
with respect to Π_µ. Then, by Theorem 4.2.9, h(µ, Q, T) = 0 and hence, by
the power rule (Fact 2.4.19), h(µ, T^n, Q) ≤ h(µ, T^n, Q^n) = 0 for every n ≥ 1. Now
we approximate Q by Q′ up to ε in d_R, where Q′ ≼ P_k for some k (here it is important
that P_k is a refining sequence). By (2.4.10) (for T^n and trivial B), we have
|h(µ, T^n, Q′) − h(µ, T^n, Q)| < ε, i.e., h(µ, T^n, Q′) < ε for all n ≥ 1. Further, notice
that h(µ, T^n, Q′) = H(Q′|Q′^{{n,2n,3n,...}}) ≥ H(Q′|Q′^{[n,∞)}). As a consequence,
H(Q′|P_k^{[n,∞)}) < ε for every n. Now we invoke (1.7.14) and get H(Q′|Π_{P_k}) < ε, all
the more H(Q′|∨_k Π_{P_k}) < ε. We use (1.6.36) and get H(Q|∨_k Π_{P_k}) < 2ε. Since
this is true for every ε, H(Q|∨_k Π_{P_k}) = 0 and (1.6.28) implies Q ≼ ∨_k Π_{P_k}.
Exercise 4.5.
We have h(A|B) = sup_P h(P|B) > 0, where P ranges over all finite A-measurable
partitions of X. Thus, there exists a finite partition P of X such that h(P|B) > 0.
We restrict our attention to the system generated jointly by P and (Y, B, ν, S, S). We
will prove that almost every y has an infinite preimage already in this system. In other
words, we can assume that A = P^S ∨ B.
Suppose that the set of points y with finite preimages π −1 (y) has positive measure ν.
For y in this set, the points in the preimage of y are distinguished by their P-names, so
there is a minimal n_y such that all these points are in different cells of P^{[−n_y,n_y]} (or
just P^{n_y} for S = N0). Some value of n_y, say n_0, must occur with positive measure ν,
say on a set A_0. By ergodicity, the time of the first visit in A_0 is finite
almost everywhere, so it is bounded by some n_1 on a set A_1 of measure ν larger than
1 − δ, where δ < h(A|B)/log #P. That means that, relatively on A_1, P^S ∨ B = R ∨ B, where R is a finite
partition (R can be P^{[−n,n]} joined with the partition determined by the entry times to
A_0 trimmed at n_1). Now, for every n, we can write
H(P^n|B) ≤ H(P^n|B ∨ κ) + H(κ) ≤ (1 − δ)H_{A_1}(R|B) + δ log #P^n + H(δ, 1 − δ),
where κ is the partition into A1 and its complement. After dividing by n and passing
with n to infinity, the left hand side converges to h(A|B). The first and last terms on
the right hand side (divided by n) decrease to zero, while the middle term becomes, by
the choice of δ, strictly smaller than h(A|B), a contradiction.
Exercise 4.6.
By (2.3.5) and Exercise 2.4 (with trivial B), h(P|Q) ≤ H(P|P^+ ∨ Q^S) in any case of S.
Further,

h(P|Q) ≤ H(P|P^+ ∨ Q^S) ≤ H(P|P^+ ∨ Q^S ∨ Q) = Σ_{B∈Q} µ(B) H_B(P|P^+ ∨ Q^+).

The full future P^+ ∨ Q^+ of the joint process, restricted to B, obviously contains the
future of the process generated by P with respect to the induced map on B, denoted
by P_{B,+} (all we need to determine the forward P_{B,+}-name of an x ∈ B is its forward
P-name and the return times to B, which are Q^+-measurable). So, we get

h(P|Q) ≤ Σ_{B∈Q} µ(B) H_B(P|P_{B,+}) = Σ_{B∈Q} µ(B) h(µ_B, T_B, P).
To see an example with sharp inequality, suppose h(P_0) = h > 0, while Q generates a
periodic factor with some period p. Set P = P_0^p. Then h(P|Q) = h(P_0) = h. Notice
that for each B ∈ Q, T_B = T^p. Since µ = Σ_{B∈Q} µ(B)µ_B and all these measures are
T^p-invariant, by affinity of the dynamical entropy and the power rule, we have

Σ_{B∈Q} µ(B) h(µ_B, T^p, P) = h(µ, T^p, P) = h(µ, T^p, P_0^p) = p·h(µ, T, P_0) = ph > h.
Exercise 4.7.
Just take any endomorphism of finite entropy that does not admit a unilateral generator.
For example, T can be invertible with positive entropy, yet we consider only the action
of N0 . If you want a genuine (not invertible) endomorphism, consider the direct product
of a bilateral Bernoulli shift with a unilateral Bernoulli shift. If there existed a unilateral
generator P, the bilateral Bernoulli factor would be, by invariance, measurable with
respect to P[n,∞) for every positive n, and hence with respect to the Pinsker sigma-
algebra, which is impossible due to positive entropy.
Exercise 4.8.
For any pair of partitions P and Q measurable with respect to A and B, correspondingly,
we have

Dividing by n and passing to the limit we get h(P ∨ Q|C′) = h(P|C′) + h(Q|C′). It
suffices to take the supremum over all such pairs of partitions, and notice that the joins
P ∨ Q generate the join A ∨ B, to get the desired equality
h(A ∨ B|C′) = h(A|C′) + h(B|C′).
Exercise 4.9.
(rε)/2 > H(Q_r) ≥ H(Q_r|κ) ≥ Σ_{n=0}^{r−2} µ(T^n(A)) H_{T^n(A)}(Q_r) ≥ (1/r) Σ_{n=0}^{r−2} H_{T^n(A)}(Q_r).

Hence

(1/(r−1)) Σ_{n=0}^{r−2} H_{T^n(A)}(Q_r) < εr²/(2(r−1)),

which implies that for at least one index n_0 ∈ {0, ..., r−2}, H_{T^{n_0}(A)}(Q_r) < εr²/(2(r−1)).
Let Q′ be defined as Q_r intersected with A′ = T^{n_0}(A), with the complement of A′ in one
piece. Notice that the partition Q′ generates the same process as Q: meaningful symbols
in the Q′-name of a point x occur at coordinates n for which T^n x ∈ A′ and
they encode the blocks of length r starting at n in the Q-name of x, while the next such
symbol is not further than r positions forward. Denoting by R the partition into A′ and
its complement, the static entropy of Q′ can be estimated as follows
Exercise 4.10.
Part 2
6 Exercises in Chapter 6
Exercise 6.1.
Consider the (unilateral or bilateral) subshift of finite type on three symbols {0, 1, 2},
where we prohibit the repetitions 00, 11, and 22. Let the cover U depend on the zero
coordinate and consist of unions of two symbols each: U = {0 ∪ 1, 1 ∪ 2, 0 ∪ 2}.
There are 6 admitted words of length 2: 01, 02, 12, 10, 20 and 21, and they are covered
by two sets, for example (0 ∪ 1) × (1 ∪ 2) and (1 ∪ 2) × (1 ∪ 0) (in the order they are
written). So, (1/2)H(U²) = (1/2) log 2 = log √2. To keep (1/3)H(U³) not increased, we need
N(U³) not larger than 2√2 (strictly smaller than 3), which means that we would have
to cover all admitted blocks of length 3 by only two elements of U³. This is impossible,
because there are 12 admitted words of length 3, while each element of U³ contains
at most 5 of them: in the definition of U ∈ U³ we specify 3 pairs of different symbols,
any two of which intersect, so each of the two junctions produces at least 2 forbidden
repetitions among the a priori 8 words of U, hence at least 3 forbidden words in total;
consequently, two elements of U³ cover at most 10 of the 12 admitted words.
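The counting claims are small enough to verify by brute force (a sketch of mine; it computes the minimal subcover sizes N(U²) and N(U³) directly):

```python
import itertools

# Brute-force verification of the covering claims in Exercise 6.1: the
# subshift on {0,1,2} forbids 00, 11, 22; the cover U consists of the three
# two-symbol unions. We compute the minimal subcover sizes N(U^2), N(U^3).
symbols = (0, 1, 2)
cover = [{0, 1}, {1, 2}, {0, 2}]

def admitted(n):
    # admitted words of length n: no two equal neighboring symbols
    return {w for w in itertools.product(symbols, repeat=n)
            if all(w[i] != w[i + 1] for i in range(n - 1))}

def words_in(element, words):
    # an element of U^n chooses one cover set per coordinate
    return {w for w in words if all(w[i] in element[i] for i in range(len(w)))}

def N(n):
    words = admitted(n)
    elements = [words_in(e, words) for e in itertools.product(cover, repeat=n)]
    for k in range(1, len(elements) + 1):
        if any(set().union(*c) == words
               for c in itertools.combinations(elements, k)):
            return k

assert len(admitted(2)) == 6 and len(admitted(3)) == 12
assert N(2) == 2 and N(3) == 3     # so (1/3) log N(U^3) > (1/2) log N(U^2)
```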
Exercise 6.2.
This is a direct consequence of (U^n)^m = U^{n+m} and the convergence (m + n)/m → 1.
Exercise 6.3.
Attention! The statement is in general false. Any system (X, T, S) is topologically
conjugate to the subsystem of the unilateral shift on X^{N0} consisting of the forward
orbits. In the product metric

d_p((x_n), (y_n)) = Σ_{n=0}^{∞} (1/2^n) d(x_n, y_n)

the shift map is Lipschitz with the constant c = 2, while it can have arbitrarily large
topological entropy (the same as (X, T, S)).
The statement does hold if c ≤ 1 (an important application being that all isometries
have entropy zero). In that case all the metrics d_n are equal to d, hence the number of
(n, ε)-separated points does not grow with n.
For arbitrary Lipschitz constants c the statement is valid for C¹ interval maps in the
standard metric, which follows e.g. from the Margulis–Ruelle Inequality (9.4.1) (and
the Variational Principle).
Exercise 6.4.
Let P = {I1 , I2 , . . . , IN } be the partition of [0, 1] into the branches of monotonicity
(intervals on which T is monotone) I1 = [0, a1 ), I2 = [a1 , a2 ), . . . , IN = [aN −1 , 1].
Notice that for each n ≥ 1 the cells J ∈ Pn are in fact intervals on which all the iterates
T, T 2 , . . . , T n−1 are monotone. We will estimate the number of (n, ε)-separated points
contained in J. Consider the intervals between neighboring (n, ε)-separated points in
J. Each of them must be stretched to at least the length ε by one of the functions
T, T 2 , . . . , T n−1 . By monotonicity, each of these functions can stretch at most 1/ε
of these intervals, because their images are disjoint. Together, at most n/ε intervals
can be stretched, which limits the cardinality of the points to n/ε + 1. Now, the total
number of (n, ε)-separated points is at most N^n(n/ε + 1) (where N^n bounds the number
of cells J). The desired estimate of h(T ) is obtained by taking the logarithm, dividing
by n, passing with n to infinity and then letting ε tend to zero.
Exercise 6.5.
Let T ∨ S denote the joining of T and S within the common extension. Fact (6.4.13)
applied twice (first to T ∨ S and S, then to T ∨ S and T ) and the triangle inequality
yield
|h∗ (S) − h∗ (T )| ≤ h(T ∨ S|S) + h(T ∨ S|T ).
By (6.5.8), h(T ∨ S|S) = h(T |S) and h(T ∨ S|T ) = h(S|T ).
Exercise 6.6.
First take n ≥ 0. Notice that (U^n)^m, where the exponent m refers to the action of T^n,
equals U^{nm} (in the action of T). So,

(since the limit defining h(T, U|V) exists, it is achieved along the subsequence nm).
Further,

On the other hand, since U^n ≽ U, we also have

h(T^n|V^n) = sup_U h(T^n, U|V^n) ≤ sup_U h(T^n, U^n|V^n) = nh(T|V).

We have proved the equality h(T^n|V^n) = nh(T|V). We proceed similarly with the
conditioning covers:

h*(T^n) = inf_W h(T^n|W) ≤ inf_{W=V^n} h(T^n|V^n) = inf_V nh(T|V) = nh*(T),

and, since V^n ≽ V,
Exercise 6.7.
For n ≥ 0 we use again that (U^n)^m, where the exponent m refers to the action of T^n,
equals U^{nm}. We have

This time we have no subadditivity to deduce the existence of the limit. Instead, to
prove the converse inequality, we will use monotonicity. Every positive integer can
be written as i = m_i n − r_i with m_i ≥ 1 and 0 ≤ r_i ≤ n − 1. Then H(U^i|y) ≤
H(U^{m_i n}|y), so

because m_i n/i → 1. Further, the definition of h(T|y) involves the supremum over U,
which is handled identically as in the preceding exercise (we skip rewriting). If we
replace the conditioning y by a measure ν, the only essential difference in the definitions
is the presence of inf rather than lim sup. So now we first derive h(T^n, U^n|ν) ≥
nh(T, U|ν), and we use monotonicity for the converse inequality (we represent i as
m_i n + r_i, so that i ≥ m_i n).
We remark that for an invariant measure ν we can alternatively use Corollary 6.7.4 (c)
and (d), and power rules pass to h(T, U|ν) and h(T |ν) via integration.
Attention! For negative n the equalities h(T^n, U^{|n|}|y) = |n|h(T, U|y) and h(T^n|y) =
|n|h(T|y) may actually fail. An easy example is the subshift of finite type on three
symbols Λ = {0, 1, 2} where we prohibit the blocks 10 and 20, the cover U equal to the
zero-coordinate partition P_Λ, and the factor map that glues together the symbols 1 and 2
(to a symbol denoted in the factor as 1). The fiber of y = ...000111... (the coordinate 0
sits at the last symbol 0) intersects 2^{n−1} elements of U^n, while only one element of
U^{[−n+1,0]}. Hence h(T, U|y) = log 2 ≠ 0 = h(T^{−1}, U|y). It is not hard to see that in fact
h(T^{−1}, U|y) = 0 for any other finite cover U, so h(T|y) ≥ log 2 ≠ 0 = h(T^{−1}|y).
Nevertheless, the equalities h(T^n, U|ν) = |n|h(T, U|ν) and h(T^n|ν) = |n|h(T|ν)
do hold whenever ν is invariant. It is so because H(U^n|y) = H(U^{[−n+1,0]}|T^{−n+1}y).
The change of the variable vanishes after integrating with respect to an invariant measure,
so H(U^n|ν) = H(U^{[−n+1,0]}|ν), which easily implies the above two equalities
for n = −1 (and hence for all n < 0).
Exercise 6.8.
See Exercise 6.2.
Exercise 6.9.
This is nontrivial only when h(T ) < ∞. Then all involved measure-theoretic entropies
are finite. By the Inner Variational Principle (Theorem 6.8.4), then Fact 4.1.6, and
finally the Variational Principle, we have
Exercise 6.10.
Let π = ψφ. We have, using consecutively the Inner Variational Principle, the Variational
Principle and the Conditional Variational Principle,

h(T|ξ) = sup_{µ∈π^{−1}(ξ)} h(µ|ξ) = sup_{µ∈π^{−1}(ξ)} (h(µ|φ(µ)) + h(φ(µ)|ξ)) ≤
Exercise 6.11.
Let both spaces be {0, 1, 2}, let both covers be U = V = {{0, 1}, {0, 2}, {1, 2}}. We
have N (U) = N (V) = 2, while N (U ⊗ V) = 3 < 4 (the product space is covered for
example by {0, 1} × {0, 1}, {1, 2} × {1, 2} and {0, 2} × {0, 2}).
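This can be confirmed by brute force (a short sketch of mine computing minimal subcover cardinalities):

```python
import itertools

# Brute-force check of Exercise 6.11: with U = V = {{0,1},{0,2},{1,2}} on
# the three-point space, N(U) = N(V) = 2 while N(U x V) = 3 < 4.
cover = [frozenset(s) for s in ({0, 1}, {0, 2}, {1, 2})]
space1 = set(range(3))
space2 = {(a, b) for a in space1 for b in space1}
product_cover = [{(a, b) for a in A for b in B}
                 for A in cover for B in cover]

def N(sets, space):
    covers = [set(s) & space for s in sets]
    for k in range(1, len(covers) + 1):
        if any(set().union(*c) == space
               for c in itertools.combinations(covers, k)):
            return k

assert N(cover, space1) == 2
assert N(product_cover, space2) == 3
```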
Exercise 6.12.
The inequality 2. is easy and can be derived using only the Outer Variational Principle
and (6.5.8), as follows:
h(T |ξ) ≤ h(T |R) = h(S ∨ R|R) ≤ h(S).
(An alternative way is via the Inner Variational Principle and the second inequality in
Fact 4.4.3. Yet another alternative is to first prove 1. and then use Corollary 6.7.4 (d).)
The inequality 1. is a bit harder. By Definition 6.5.2, Fact 6.5.9 (and its proof), we
can think of (X, T, S) as a subsystem of (Y, S, S) × (Z, R, S) and we can restrict our
attention to product covers U ⊗ V. Note that (U ⊗ V)^n (the exponent refers to T = S × R)
equals U^n ⊗ V^n (exponents refer to S and R, respectively). At any point z ∈ Z we
have H(U^n ⊗ V^n|z) ≤ H(U^n). We divide both sides by n, pass to lim sup_n, then
apply the supremum over all pairs of covers U and V.
7 Exercises in Chapter 7
Exercise 7.1.
Denote our subshift by (X, T, S). The inequality lim sup_k (1/p_k) log #B_k ≤ h(T) is
obvious; B_k contains only blocks of length p_k appearing in our subshift (usually not all
of them).
To derive the converse inequality consider the set Ak of points having a pk-periodic marker at the coordinate zero. Clearly, Ak is compact, T^{pk}-invariant, and with the action of T^{pk} it is conjugate to a subshift over the alphabet Bk. Thus the topological entropy of (Ak, T^{pk}, S) does not exceed log #Bk. The space X contains the disjoint union X′ of the sets Ak, T(Ak), . . . , T^{pk−1}(Ak), and the systems (T^i(Ak), T^{pk}, S) are factors of (Ak, T^{pk}, S) via T^i, so their topological entropies are not larger than log #Bk. In addition, in the unilateral case, there may be points not belonging to X′, but all such points fall into X′ after less than pk iterates. This proves that h(T^{pk}) ≤ log #Bk on the entire space X and, by the power rule for topological entropy (Fact 6.2.3), h(T) ≤ (1/pk) log #Bk for every k. This implies the existence of the limit and the desired equality.
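The quantity (1/n) log #(blocks of length n) converging to topological entropy can be watched numerically on a concrete subshift; a sketch for the golden mean shift (my own illustration; the book's Bk are the blocks between markers, here we simply count all admissible blocks):

```python
import math
from itertools import product

def blocks(n):
    """Count 0-1 blocks of length n with no two consecutive 1s
    (the admissible blocks of the golden mean shift)."""
    return sum(1 for w in product("01", repeat=n) if "11" not in "".join(w))

# The counts satisfy the Fibonacci recursion, e.g. blocks(10) = F(12) = 144.
assert blocks(10) == 144

# (1/n) log2 #(blocks of length n) decreases towards the topological
# entropy log2((1 + sqrt(5))/2) ~ 0.6942 of this subshift.
h = math.log2((1 + math.sqrt(5)) / 2)
rate = math.log2(blocks(18)) / 18
assert h < rate < h + 0.05
print(f"rate at n=18: {rate:.4f}, entropy: {h:.4f}")
```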
Exercise 7.2.
Exercise 7.3.
Again, this is an exercise in topological dynamics and has little to do with entropy.
We begin with the remark that the Marker Lemma 7.5.4 applies in fact to continuous maps, not necessarily homeomorphisms. If we replace the starting clopen cover U by U′ = T^{−nm}(U), then each U ∈ U′ has clopen forward images through nm iterates and all the sets Fj constructed in the proof (including the marker set F) are clopen together with their n backward and forward images. We skip further details here (see [Downarowicz, 2008]).
In any zero-dimensional system without periodic points we can mimic the odometer factor. The only difference is that the analogs of the pk-periodic markers will not appear periodically, but with gaps ranging between pk and some p′k > pk. Here is how we do it. We fix a sequence (pk) and the associated quotients qk ≥ 2 just as in Definition A.3.1. We find a p1-marker set F1. Since there are no periodic points in X, all orbits visit F1 and the gaps between the visits range between p1 and 2p1 − 1. In the induced system (F1, T_{F1}, S) we find a q1-marker F2. Since the induced system has no periodic points, every T_{F1}-orbit visits F2 with gaps ranging between q1 and 2q1 − 1, which implies that every T-orbit in X visits F2 with gaps ranging between p1q1 = p2 and p′2 = (2p1 − 1)(2q1 − 1) > p2. Proceeding inductively we construct a decreasing
sequence of marker sets Fk . Abusing slightly our convention, we will call them k-
markers (they are in fact pk-markers). If we visualize the k-markers in the array-name representation of our system (in the form of vertical bars in the kth row), then in every array x the (k + 1)-markers occur only at coordinates where k-markers occur, there are at least qk k-markers between two consecutive (k+1)-markers, and the distances between two consecutive (k + 1)-markers are bounded. The blocks appearing in row
k between two neighboring k-markers will be called k-blocks. The (finite) collection
of the k-blocks appearing in the system will be denoted by Bk (this time the blocks
in Bk have various but bounded lengths). In injective systems the arrays are bilateral,
so the kth row of every x is a concatenation of the k-blocks (there is no problem with
truncated k-blocks at the left end).
We are ready to encode our system using a countable alphabet. The alphabet is going to be Λ = ⋃_{k=1}^∞ Bk ∪ {∗}, where ∗ is added as the topological accumulation point, so that Λ is homeomorphic to the one-point compactification of N0.
We now define the map φ from X into the shift over Λ by describing the image y =
φ(x) ∈ ΛZ of every x ∈ X. We will encode x “row after row”. We encode the
first row by placing in y, at the positions of all 1-markers in x, the symbols from B1
representing the 1-blocks that follow these markers in x. Since p1 ≥ 2, every sector in
y between the positions of two consecutive 1-markers has at least one unfilled position.
We now encode the second row of x by placing in y, in the first empty slot between two
2-markers of x, the symbol from B2 representing the block sitting there in the second
row of x. Since q1 ≥ 2, after this step every sector in y between two 2-markers has
at least one empty slot. We continue in this manner through all rows. All eventually
unfilled positions in y we fill with the stars. (The situation resembles that on Figure
7.2, except that the cuts are not exactly at equal distances and that in every step the
information is stored in only one symbol per “period”, so in the end there will be
much more unfilled space in y.) It is clear that the map x ↦ y so defined is continuous: every symbol (except the star) in y is determined by a bounded rectangle in x (i.e., its preimage is clopen). The star alone is not an open set, while any open neighborhood of the star is a complement of finitely many other symbols, so its preimage is also a clopen set. It is evident that the map φ so defined commutes with the shift transformation.
To see that it is injective, note that we can easily reconstruct from y the consecutive
rows of x. For k = 1, we locate in y the symbols belonging to B1 . Their positions
determine the 1-markers and the symbols themselves provide information about the
contents of the corresponding 1-blocks in x. We continue inductively: Suppose the
kth row of x has been reconstructed (together with the k-markers). We locate in y all
symbols belonging to Bk+1 , and then we “unload” their contents each time starting at
the nearest k-marker to the left, where we also place a (k+1)-marker. So, the map φ is
a topological conjugacy of X with its image.
To see that periodic points are an obstacle, take the identity map on the Cantor set. Every point is a fixpoint, so in any subshift it must be represented by a sequence filled with one symbol. Thus, uncountably many symbols are needed to encode all points.
To see how the above fails in non-injective systems, consider an odometer plus a Cantor set which is sent by T to one point (say x) in the odometer. No matter how we encode the system as a unilateral shift, the sequence representing x must admit uncountably many shift-preimages, that is, one-coordinate prolongations to the left. So, an uncountable alphabet is needed.
Exercise 7.4.
Let h = h(T). Let Bn denote the family of all blocks of length n occurring in X. Let Λ be an alphabet of cardinality ⌊2^h⌋ + 1. It is important that log #Λ > h, so we can invoke our Exercise 3.8 (and the remark following the solution): there exist ε > 0 and a "better than prefix-free" family C of blocks over Λ such that, denoting by Cn the family of blocks of length n contained in C, we have #Cn ≥ 2^{n(h+ε)} for n sufficiently large. On the other hand, we know that #Bn < 2^{n(h+ε)} for large n. So, we can find an n0 such that for every n ≥ n0, #Bn ≤ #Cn. Then there exists an injective length-preserving map Φ : ⋃_{n≥n0} Bn → ⋃_{n≥n0} Cn. Now we apply the Marker Lemma and find an n0-marker. The code φ from X into Λ^Z is constructed as follows: we cut every x at the markers into blocks of lengths at least n0 (and bounded). Then we replace every such block B by Φ(B). Because Φ is length-preserving, this is a shift-invariant procedure, and by boundedness of the blocks, it is continuous. Since
the image of Φ is contained in a "better than prefix-free" family, the cutting places (i.e., the markers) can be reconstructed in every φ(x), and because Φ is injective, we can then reconstruct x completely. Thus, the code is a topological conjugacy of X with its image.
To see how the above fails in non-injective systems, consider a unilateral subshift X over a finite alphabet Λ and with entropy smaller than log #Λ. We enhance the subshift by adding points of the form ax, where x ∈ X and a is a single symbol belonging to a strictly larger alphabet Λ′ ⊃ Λ. The enhanced system is a subshift over the alphabet Λ′. Since all its points fall into X after one iterate, the topological entropy of the enhanced subshift is the same as that of X. Nonetheless, in any unilateral subshift representation, each point from X has #Λ′ shift-preimages, so an alphabet of at least such cardinality is needed.
Exercise 7.5.
In every bilateral subshift any element of a periodic orbit of period n has the form . . . BBB. . . , where B has length n. Moreover, at most n different blocks B produce elements of the same orbit. Because there are l^n blocks of length n, using an alphabet of cardinality l we can produce at most l^n and at least l^n/n different periodic orbits with period n. Let X be the union of l^n/n disjoint orbits of period n modeled as a subshift over l symbols. The entropy of such a primitive system is zero, so, if Exercise 7.4 worked, X should admit a representation over two symbols. But with two symbols we can model at most 2^n periodic orbits with period n. For l large enough, l^n/n > 2^n, a contradiction.
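The final inequality is elementary arithmetic: l^n/n > 2^n reduces to (l/2)^n > n. A quick check (my own, not from the book) shows that, as a side observation, already l = 3 works for every n:

```python
# The counting behind the contradiction: an alphabet of size l yields at
# least l**n / n distinct periodic orbits of period n, while two symbols
# yield at most 2**n.  The inequality l**n / n > 2**n reduces to
# (l/2)**n > n, which already holds for l = 3 and every n >= 1.
def enough_orbits(l, n):
    return l**n / n > 2**n

assert all(enough_orbits(3, n) for n in range(1, 60))
# A concrete instance as in the exercise (fixed n, l large enough):
assert enough_orbits(3, 10)      # 59049/10 > 1024
print("l**n/n > 2**n verified for l = 3, n = 1..59")
```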
8 Exercises in Chapter 8
Exercise 8.1.
We have EH ≡ ∞ and α0 = ℵ0. The best way to see this is by examining the transfinite sequence. Notice that the kth tail θk equals 1 on the dense set {x_{k+1}, x_{k+2}, . . . }, so θ̃k ≡ 1. This implies u1 ≡ 1. Now adding u1 to the tails only shifts the picture up by a unit, hence u2 ≡ 2, and, inductively, uα ≡ α for natural α. This clearly implies u_{ℵ0} ≡ ∞, and this is where the transfinite procedure stops for the first time.
Exercise 8.2.
r, at which the inductive hypothesis holds. For each k the defect of u + θk is dominated by the sum of the defects:
(u + θk)~(x) − (u + θk)(x) ≤ [ũ(x) − u(x)] + [θ̃k(x) − θk(x)].
The first term equals r times the defect of u1, which is zero, because u1 is upper semicontinuous. (All functions uα in the transfinite sequence are upper semicontinuous – this is obvious from the definition, but perhaps not sufficiently emphasized in the book.) Now we let k tend to infinity; then θ̃k(x) decreases to u1(x) and θk(x) decreases to zero, so the entire expression tends to zero, as required.
Exercise 8.3.
Each entry in the matrix, say M_{n,r}, representing un(x) at points of order r can be verbalized as
M_{n,r} = a0 + a1 + · · · + a_{r−1},  for r ≤ n,
M_{n,r} = "the maximal sum of n different terms indexed up to r − 1",  for r ≥ n,
where by "terms" we mean the numbers ai. Notice that the maximal sum of n different terms from a set of nonnegative numbers dominates all shorter sums from this set, so the above second-case description can be written as "the maximal sum of up to n different terms indexed up to r − 1". This phrasing includes the first case, because for r ≤ n the maximal such sum is clearly the sum of all terms indexed up to r − 1. Thus, this is the general form, including also M_{0,r} for all r (any sum of 0 terms is 0). We will verify that un(x) = M_{n,ord(x)} by induction on n.
For n = 0 the formula holds. Assume it holds for some n ≥ 0. We need to evaluate u_{n+1}. Take a point x and denote r = ord(x). For k sufficiently large θk(x) = 0 and then (un + θk)(x) = un(x) = M_{n,r}. Every neighborhood of x contains (apart from x itself) only points x′ of orders r′ ≤ r − 1; moreover, for every such r′ it contains infinitely many points of order r′. Thus, no matter how large k, the function un + θk assumes within this neighborhood the value M_{n,r′} + a_{r′}. Altogether, u_{n+1}(x) equals the maximum over r′ ≤ r − 1 of the above maximal sums and M_{n,r}. It is hence clear that u_{n+1}(x) does not exceed "the maximal sum of up to n + 1 different terms indexed up to r − 1", i.e., M_{n+1,r}.
But every "sum of up to n + 1 different terms indexed up to r − 1" has its maximal index, some r′ ≤ r − 1, and then this sum is a "sum of up to n + 1 different terms indexed up to r′ including a_{r′}", and is taken into account in the maximum defining u_{n+1}(x). This implies that u_{n+1}(x) = M_{n+1,r}.
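The general form of M_{n,r} is easy to compute: for nonnegative terms, "the maximal sum of up to n different terms indexed up to r − 1" is just the sum of the n largest among a0, . . . , a_{r−1}. A sketch (my own illustration; the concrete sequence a is hypothetical):

```python
def M(n, r, a):
    """Maximal sum of up to n different terms among a[0], ..., a[r-1].
    For nonnegative terms this is the sum of the n largest of them."""
    return sum(sorted(a[:r], reverse=True)[:n])

a = [5, 1, 4, 2, 3]   # a hypothetical nonnegative sequence

# For r <= n the value is the full sum a[0] + ... + a[r-1].
assert M(4, 3, a) == 5 + 1 + 4

# For r >= n it picks the n largest terms indexed below r.
assert M(2, 5, a) == 5 + 4

# Any sum of 0 terms is 0, covering the row M(0, r).
assert M(0, 5, a) == 0
```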
Exercise 8.4.
Exercise 8.5.
Attention! The formulation of the exercise contains a misprint. It should speak not about superenvelopes but only about repair functions.
The Tarski-Knaster Theorem asserts that any order-preserving operator P : L → L
defined on a complete lattice L has its smallest fixpoint. Recall that a complete lattice
is a partially ordered set in which every subset has its infimum (greatest lower bound) and supremum (least upper bound). In our case, the lattice will be the collection of all nonnegative upper semicontinuous functions on the domain X, where we include the constant infinity function. The infimum of a subset A of L is simply the pointwise infimum inf{h : h ∈ A}, while the supremum equals the upper semicontinuous envelope of the pointwise supremum sup{h : h ∈ A} (or the infinity function when that supremum is unbounded). Given an increasing sequence H = (hk) of nonnegative functions on X, tending to a finite limit h (hence we also have the tails θk = h − hk), we define the operator P : L → L by
P(u) = lim_k (u + θk)~
(the functions (u + θk)~ decrease with k, so the limit exists and is upper semicontinuous).
It is immediate to see that the operator is well defined (the image functions belong to L)
and preserves the order. So far, we have just recalled what was given in the formulation
of the exercise. We need to verify that fixpoints of P are exactly the repair functions of
the tails of H. Then the smallest fixpoint will coincide with the smallest repair function
uH . To this end we write
P(u) = u ⟺ lim_k (u + θk)~ = u ⟺ lim_k ((u + θk)~ − u − θk) = 0
⟺ the defects of the functions u + θk tend to zero ⟺ u is a repair function.
Exercise 8.6.
Exercise 8.7.
The derivation of the first statement is based on the implication f ≥ g ⟹ f̃ ≥ g̃. The converse need not hold, for example if X = [0, 1] with θk ≡ 1/k and θ′k = (1/k)·1_Q, where Q denotes the set of rational numbers in [0, 1].
Exercise 8.8.
On a compact domain, say K, our upper semicontinuous function f attains its maximum y0 at some point x0. Since K is convex, the Choquet Theorem asserts that there exists a probability distribution ξ supported by exK with bar(ξ) = x0. But f is also convex, so by Fact A.2.10 it is subharmonic, and thus
y0 = f(x0) = f(bar(ξ)) ≤ ∫ f dξ.
Since f(x) ≤ y0 at all points, this inequality is only possible when it is an equality and f = y0 ξ-almost everywhere. In particular, f(x) = y0 at at least one point of exK.
Exercise 8.9.
Attention! We do not show that u^H_α restricted to exK equals u^{H|exK}_α, only that both determine u^H_α via the same operations. We do not invoke Lemma 8.2.13 directly, only the same proving methods.
This is a hard exercise. We claim the following:
u^H_α = ((u^H_α|_{exK})^{harM[K]})^{[K]} = ((u^{H|exK}_α)^{harM[K]})^{[K]},
where the superscript harM[K] denotes the harmonic extension to measures and [K] the push-down to K via the barycenter map. The statement obviously holds for α = 0. Suppose it holds for all β < α. Recall that v^H_α stands for sup_{β<α} u^H_β. We now write a sequence of (in)equalities and then we will explain why each of them is true.
u^H_α ≥(1) ((u^H_α|_{exK})^{harM[K]})^{[K]} ≥(2) ((u^{H|exK}_α)^{harM[K]})^{[K]}
=(3) ((lim_κ (v^{H|exK}_α + θκ|_{exK})~)^{harM[K]})^{[K]}
=(4) ((lim_κ ((v^{H|exK}_α + θκ|_{exK})~)^{harM[K]}))^{[K]}
=(5) lim_κ (((v^{H|exK}_α + θκ|_{exK})~)^{harM[K]})^{[K]}
≥(6) lim_κ (((v^{H|exK}_α + θκ|_{exK})^{harM[K]})^{[K]})~
≥(7) lim_κ (((v^{H|exK}_α)^{harM[K]})^{[K]} + θκ)~
≥(8) lim_κ (v^H_α + θκ)~ =(9) u^H_α.
At first notice that since each hκ is harmonic, it is affine, so h is affine (although not necessarily harmonic); hence each θκ is affine. This makes θ̃κ concave (Fact A.2.5), and by an easy induction, all functions u^H_α are concave. On the other hand, for any distribution ξ we have ∫h dξ ≥ ∫hκ dξ = hκ(bar(ξ)) for every κ, hence ∫h dξ ≥ h(bar(ξ)) and h is shown to be subharmonic. Also each θκ = h − hκ is subharmonic.
(1) is derived as follows: the function ((u^H_α|_{exK})^{harM[K]})^{[K]} is upper semicontinuous, so its value at a point x is attained:
((u^H_α|_{exK})^{harM[K]})^{[K]}(x) = (u^H_α|_{exK})^{harM[K]}(ξ) = ∫ u^H_α|_{exK} dξ = ∫ u^H_α dξ ≤ u^H_α(x),
where ξ is some distribution on exK with barycenter at x, and the last inequality is a consequence of u^H_α being concave and upper semicontinuous, hence supharmonic.
(2) results from the fact that throughout the transfinite induction leading to u^{H|exK}_α (at a point in exK) the "tildes" are taken in the context of exK, so they produce not larger functions than the "tildes" taken in the wider context of K, leading to u^H_α at this point.
(3) is just the transfinite definition applied to the restriction H|exK .
In (4) we pull the decreasing limit outside the harmonic extension. Since the harmonic extension relies on integrals, we need a kind of Lebesgue Theorem. The functions are bounded from some index on (otherwise the case is trivial, as we have infinity on the right), so if the net were actually a sequence, we could use the Lebesgue Dominated Convergence Theorem. For nets, however, we must invoke a stronger result: for a decreasing net of upper semicontinuous functions, the integral commutes with the limit (see e.g. (A7) in the Appendix of [Downarowicz-Serafin, 2002]).
In (5) we exchange the limit with the push-down. Since the fibers are compact and the functions are upper semicontinuous and decrease, this can be done by virtue of Fact A.1.24 on exchanging suprema and infima.
In (6), for each κ we delay the application of the "tilde" till after the harmonic extension and the push-down. (The "tilde" commutes with the harmonic extension because we are on a Bauer simplex, so the order in which we apply them is immaterial.) For the push-down the inequality is obvious, because the function on the left is upper semicontinuous (see Fact A.1.26) and dominates the function without the "tilde" on the right.
When evaluating the push-down on the left hand side of (7) at some x ∈ K we must integrate the sum of two functions with respect to all measures supported by exK with barycenter at x. Since θκ is subharmonic, the integral of θκ with respect to such a measure is always at least θκ(x). So the integral of the sum is always at least the integral of the first function plus θκ(x). Now we can take the supremum over all such measures.
For (8) note that ((v^{H|exK}_α)^{harM[K]})^{[K]} ≥ ((u^{H|exK}_β)^{harM[K]})^{[K]} for every β < α. By the inductive assumption, we may replace the latter by u^H_β, and then we apply the supremum over all β < α.
(9) is just the transfinite definition.
The claim about the order of accumulation is now obvious.
Exercise 8.10.
This is a direct consequence of two inequalities: H(µ, U ∨ V) ≤ H(µ, U) + H(µ, V) and H(µ, T^{−n}(U)) ≤ H(µ, U). The first one holds since whenever P < U and Q < V then P ∨ Q < U ∨ V and H(µ, P ∨ Q) ≤ H(µ, P) + H(µ, Q). The second is true by invariance of µ and since whenever P < U then T^{−n}(P) < T^{−n}(U).
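For partitions, the first inequality is the classical subadditivity of static entropy, H(µ, P ∨ Q) ≤ H(µ, P) + H(µ, Q); a numerical sanity check on a concrete joint distribution (my own illustration):

```python
import math

def H(dist):
    """Shannon entropy (base 2) of a probability vector."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# A joint distribution over the four cells of P v Q (two cells each).
joint = [[0.4, 0.1],
         [0.1, 0.4]]

H_join = H([p for row in joint for p in row])     # H(mu, P v Q)
H_P = H([sum(row) for row in joint])              # marginal over P
H_Q = H([sum(col) for col in zip(*joint)])        # marginal over Q

# Subadditivity: H(P v Q) <= H(P) + H(Q), strict unless P, Q independent.
assert H_join <= H_P + H_Q
print(f"H(PvQ) = {H_join:.3f} <= H(P) + H(Q) = {H_P + H_Q:.3f}")
```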
Exercise 8.11.
For the first inequality note that any P inscribed in U has diameter at most diam(U),
for the second – that any P of diameter smaller than Leb(U) is inscribed in U.
Exercise 8.12.
Just note that the cover U^n is inscribed in the cover constituted by the (n, diam(U))-balls and that the cover by the (n, Leb(U))-balls is inscribed in U^n. Then use the monotonicity (6.3.5).
Exercise 8.13.
Given a cover U and a set A, the smallest cardinality of a subfamily of U^n covering A is at least equal to the maximal cardinality of an (n, diam(U))-separated set in A. This easily implies that h(T, U|F, V) ≥ h(T, diam(U)|F, V). On the other hand, any maximal (n, Leb(U))-separated set E in A is also (n, Leb(U))-spanning in A. Each element of E is contained, together with its (n, Leb(U))-ball, in an element of U^n. In this manner we select a subfamily of U^n which covers A and has at most the cardinality of E. This implies h(T, U|F, V) ≤ h(T, Leb(U)|F, V). Now we can apply the above to a refining sequence of covers Uk (then both diam(Uk) and Leb(Uk) tend to zero).
Exercise 8.14.
Attention! Implicitly, V is assumed finite. Otherwise I don’t know how to proceed.
This is an extremely unpleasant exercise. The reason I put it here is to illustrate how convenient it is to have the entropy structure defined as a uniform equivalence class. We can afford not to care much about measurability of functions in one particular entropy structure, because in the same class there are other sequences of functions known to be measurable (even upper semicontinuous). In [Downarowicz, 2005a], I simply used the upper integral to extend the function h(T|µ, V) to nonergodic measures.
The strategy is to assume that X is zero-dimensional, and (1) approximate V by a
sequence of clopen covers Vk , i.e., having clopen (not necessarily disjoint) cells and
then (2) prove the assertion for such clopen covers. In step (3) we will apply principal
extensions to get rid of the zero-dimensionality assumption.
(1) We fix a finite open cover V (of our zero-dimensional space X) and temporarily we also fix an ergodic measure µ. We will exploit the following variant of Lemma 8.3.20: if W is another cover then
h(T|µ, W) ≤ h(T|µ, V) + lim_{σ→1} inf{h(T, V|F, W) : µ(F) > σ}.
The proof is exactly the same as that of Lemma 8.3.20 except that it uses the full version of (6.3.10) (i.e., we keep F in the last term).
For k ∈ N and V ∈ V there exists a clopen set Vk contained in V and containing {x : d(x, V^c) ≥ 1/k} (because the latter set is compact). We can easily arrange the sets Vk to grow with k. It is easy to see that if 1/k < Leb(V)/2 then the collection Vk = {Vk : V ∈ V} is a (clopen) cover of X. Since Vk < Vk′ whenever k′ ≥ k, the sequence of functions h(T|µ, Vk) increases. Moreover, since for each k the sets Vk grow to V, the measure of the set Ak = ⋃_{V∈V}(V \ Vk) is smaller than ε for large enough k (this is where we need V to be finite). By the Ergodic Theorem, given σ > 0 there exist n_{k,σ} ∈ N and a set F_{k,σ} of measure larger than σ such that for all n ≥ n_{k,σ} all n-orbits starting in F_{k,σ} visit Ak at most nε times. By the last displayed formula, we have
h(T|µ, V) ≤ h(T|µ, Vk) + lim_{σ→1} h(T, Vk|F_{k,σ}, V).
We need to estimate the last term. Let x ∈ F_{k,σ}. For n ≥ n_{k,σ} choose a cell V^n_x of V^n containing x, say V^n_x = ⋂_{i=0}^{n−1} T^{−i}V^{(i)} (each V^{(i)} ∈ V). Then T^i(x) ∈ V^{(i)}_k except for at most nε indices i. This implies that x belongs to one of at most
Ln = (n choose nε) · (#V)^{nε}
modifications of ⋂_{i=0}^{n−1} T^{−i}V^{(i)}_k in which at most nε terms are altered (i.e., V^{(i)}_k is replaced by another cell of Vk). We have covered F_{k,σ} ∩ V^n_x by at most Ln elements of V^n_k. Thus h(T, Vk|F_{k,σ}, V) ≤ lim sup_n (1/n) log Ln, which is small for small ε, regardless of σ. We have proved that the function h(T|µ, V) on ergodic measures equals the increasing limit of the functions h(T|µ, Vk), where Vk are clopen covers.
(2) We will check measurability of the function h(T|µ, V) for a finite clopen cover V = {V1, V2, . . . , Vl} of a zero-dimensional space X. In this setup consider the joining (X′, T′, S) of our system with the subshift over the alphabet Λ = {1, 2, . . . , l}, such that every point x ∈ X is joined with all sequences (an) ∈ Λ^S such that T^n(x) ∈ V_{an} (the joining associates to each x all its possible V-names). Every ergodic measure µ can
be lifted to a measure µ′ on the joining by the rule, that all possible Λ-words assigned
to a finite piece of an orbit have equal probabilities (given x, at each coordinate we
choose the available symbols with equal probabilities and independently of the choices
made on other coordinates). It is not hard to see that the measure µ′ is ergodic and
that the assignment µ 7→ µ′ is continuous on ergodic measures (we skip the standard
arguments via estimating the frequencies of blocks). Using ergodic decomposition, we
extend this assignment to a continuous map from MT (X) into MT ′ (X ′ ). Let P denote
a clopen partition (hence a cover) of X, let V′ and P′ denote the lifts of V and P to X ′ ,
respectively. Additionally, on X ′ we have the zero-coordinate clopen partition (and
cover) Q corresponding to the symbols in Λ. Notice that the cells of V′ are precisely the fiber saturations of the cells of Q. Choose a closed set G ⊂ X′ and denote by F its projection to X, and let G′ be the lift of F (so that G′ is the fiber-saturation of G; recall that both F and G′ are closed). Because the cells of P′^n are fiber-saturated, it does not matter whether we cover the sets G ∩ B (where B ∈ Q^n) or their fiber saturations. Thus
h(T′, P′|G, Q) = h(T′, P′|G′, V′) = h(T, P|F, V).
In the expression on the left, we are dealing with two partitions and we count the cells of P′^n needed to cover a cell in Q^n (intersected with F), so the count will be exactly the same as if we counted the cells of (P′ ∨ Q)^n instead of P′^n. This leads to
h(T′, P′|G, Q) = h(T′, P′ ∨ Q|G, Q).
Now we temporarily fix an ergodic measure µ (and µ′) and repeat the argument used in the proof of Lemma 8.3.21 (with the correction in the definition of the set G = Gε; see Errata): We fix a sequence of clopen partitions Pk refining in X, and we choose a set G of measure larger than σ consisting of points satisfying, up to ε, the Shannon-McMillan-Breiman Theorem applied to the ergodic measure µ′ and each of the partitions P′k ∨ Q and Q (for each partition with perhaps a different threshold length). Now we let k → ∞; then the right-hand side of the last displayed equality simply converges to h(T|F, V), while the left side remains within the range h(µ′, P′k|Q) ± ε (here h(µ′, P′k|Q) = h(µ′, P′k ∨ Q|Q) is the usual measure-theoretic conditional entropy involving two partitions; this we do exactly as in the proof of Lemma 8.3.21). Since every subset of X of measure larger than σ contains sets F (images of G as described above) for arbitrarily small ε, the application of the infimum over such sets and then the supremum over σ leads to
h(T|µ, V) = lim_k h(µ′, P′k|Q).
This equality extends to all measures µ via integrating over the ergodic decomposition (the function on the right is harmonic, the one on the left is harmonic by definition). The function µ ↦ h(µ′, P′k|Q) is now a composition of the continuous map µ ↦ µ′ with the conditional entropy function for two clopen covers, which, as we know very well, is upper semicontinuous. So, h(T|µ, V) is of Young class LU.
(3) It remains to extend the result to general systems. We change the meaning of the notation: from now on (X, T, S) will denote a general topological dynamical system, while (X′, T′, S) will be its principal zero-dimensional extension (see Theorem 7.6.1). We have a continuous surjection π : K′ → K between the Choquet simplices K′ = M_{T′}(X′) and K = M_T(X). We fix a finite open cover V of X and we let V′ denote its lift. As we have shown in the preceding step, the function f(µ′) = h(T′|µ′, V′) is an increasing limit of some upper semicontinuous and affine (hence harmonic) functions, say fk(µ′). The pushed-down functions fk^{[K]} maintain these two properties (see Fact A.2.22). It is an elementary observation that the operation of push-down preserves increasing limits (it is a matter of exchanging two suprema). Thus f^{[K]} = lim↑ fk^{[K]}. This monotone limit is obviously of Young class LU and, by the Lebesgue Monotone Convergence Theorem, it is a harmonic function. Lemma 8.3.18 implies that f^{[K]}(µ) coincides with h(T|µ, V) on ergodic measures, and, since both functions are harmonic, they coincide everywhere.
9 Exercises in Chapter 9
Exercise 9.1.
Example 9.3.6 is the one. Here EH = h̃, hence h_sex(T) = h(T), thus h_res(T) = 0. On the other hand, the measure µ0 = Σ_k 2^{−k} µ(B_{k+1}) (playing the role of the point b in Example 8.2.17) obviously has entropy zero, while EH at this measure is 1. So, h_res(µ0) = 1 > h_res(T).
Exercise 9.2.
This is well known. There is only one invariant measure, the normalized Lebesgue
measure. Any partition P of the circle in two arcs I1 , I2 (no matter how we attach the
endpoints) has the small boundary property and generates (via the dynamics). So, this
one partition suffices to build the zero-dimensional principal extension, which becomes
a subshift (the closure of the P-names). With small boundary property, the standard
zero-dimensional extension is not only principal but even isomorphic.
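The symbolic model behind this solution can be watched numerically: coding the orbits of an irrational rotation by a two-arc partition produces sequences of linear word complexity, hence entropy zero, as it must for a principal extension of a zero-entropy system. A sketch (my own illustration; the rotation angle is an arbitrary irrational choice):

```python
import math

# Code the orbit of 0 under the rotation x -> x + s (mod 1) by the
# two-arc partition I1 = [0, 1/2) (symbol 0), I2 = [1/2, 1) (symbol 1).
s = (math.sqrt(5) - 1) / 2        # an illustrative irrational angle
N = 3000
name = "".join("0" if (i * s) % 1.0 < 0.5 else "1" for i in range(N))

def complexity(w, n):
    """Number of distinct subwords of length n occurring in w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})

# After n steps the refined partition of the circle has at most 2n arcs,
# so the word complexity is at most 2n: linear growth, entropy zero.
for n in range(1, 8):
    assert complexity(name, n) <= 2 * n
print([complexity(name, n) for n in range(1, 8)])
```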
Exercise 9.3.
Although the system looks even more trivial than the preceding one, this exercise is a bit more intricate. Since this system does not have the small boundary property, we must first lift it to a product with something minimal of entropy zero. Let X denote the product of [0, 1] with the unit circle (also viewed as [0, 1], but with the endpoints glued together). On this space we apply the product dynamics of the identity times some fixed irrational rotation (by some s): T(t, x) = (t, x + s). This system is a principal extension of ([0, 1], id, S) (for both cases of S). Take the partition P into two sets separated by a skew line crossing all vertical sections (for instance y(t) = 1/4 + t/2) and the horizontal line y = 0.
Label the bottom set by 0 and the top set by 1. The ergodic measures on X are δ_t × λ, and it is obvious that the boundary of P (the dividing lines) has measure zero for all such measures. So, P has small boundary. Moreover, this partition generates (via the
dynamics) in X. So, this one partition suffices to build the zero-dimensional principal
extension, which becomes a subshift Y (the closure of the P-names). This extension
is isomorphic to the product system (for every invariant measure), but obviously not to
the base system ([0, 1], Id, S).
As a matter of fact, this principal symbolic extension is a disjoint union of Sturmian subshifts (over the same rotation, but different arc partitions), and the factor map π : Y → [0, 1] associates to every y the value t such that 1/4 + t/2 is the density of zeros in y. We skip proving this.
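The density claim can be checked numerically: by equidistribution of the rotation orbit, the frequency of the symbol 0 in the P-name over the fiber of t approaches the length 1/4 + t/2 of the bottom arc. A sketch (my own check; the rotation angle is an illustrative choice):

```python
import math

s = math.sqrt(2) % 1.0            # an illustrative irrational rotation angle

def zero_density(t, N=200_000):
    """Empirical frequency of symbol 0 in the P-name over the fiber of t:
    the point (t, x) gets symbol 0 when x lies below the dividing line
    y(t) = 1/4 + t/2."""
    cut = 0.25 + t / 2
    return sum(1 for i in range(N) if (i * s) % 1.0 < cut) / N

# By equidistribution, the density of zeros recovers t via 1/4 + t/2.
for t in (0.0, 0.3, 0.8):
    assert abs(zero_density(t) - (0.25 + t / 2)) < 0.01
print([round(zero_density(t), 3) for t in (0.0, 0.3, 0.8)])
```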
Exercise 9.4.
We must copy the construction of Example 9.3.5, except that now there must be two ergodic measures supported by the first row and, with k growing to infinity, the measures supported by the kth row must approach the average of the two measures in the first row. So, we take two bilateral uniquely ergodic subshifts X0 and X1 (say, over disjoint alphabets), each of entropy 1, and we denote their measures by µ0 and µ1, respectively. We choose blocks Bk appearing in X0 with lengths increasing with k, so that µ(Bk) → µ0, and we choose Ck analogously in X1 (the length of Ck should be the same as that of Bk). Now we let X consist of all symbolic arrays obeying the following rules:
1. The first row x¹ of x either belongs to X0 or to X1, or it has the form
x^{(k)} = . . . BkCkBkCkBkCk . . . .
2. If the first row is x^{(k)}, then the kth row of x is an element of X0.
3. All other rows are filled with zeros.
Now there are two ergodic measures of entropy 1 (plus some periodic measures) supported by the first row, and for each k there are finitely many measures µ_{k,i} supported by arrays with a nontrivial kth row. All these measures are isomorphic to µ0 joined with a periodic orbit, so all of them have entropy 1. These measures accumulate at the measure (1/2)(µ0 + µ1), because for large k short blocks in the first row occur half of the time with the frequencies as in Bk and half of the time with the frequencies as in Ck, while the other rows are filled with zeros, except one very distant row, which we can ignore for the weak-star distance. The structure of Example 8.2.18 is now copied.
Exercise 9.5.
Of course, we could build a system whose simplex of invariant measures is a Bauer
simplex spanned by the unit interval and the entropy structure restricted to ergodic
measures copies the sequence (hk ) in Exercise 8.1 (the pick-up stick game on a dense
sequence). Instead, we will describe the example proposed by Mike Boyle in the early
90’s, before entropy structures were introduced, and the lack of symbolic extensions
was proved using topological methods. This example triggered the development of the
theory of symbolic extensions. Below it is adapted to the language of symbolic arrays.
Let X consist of 0-1 symbolic arrays obeying the following rules:
1. The first row x¹ of x is arbitrary. If x¹ is not periodic then all other rows are filled with zeros.
2. If x¹ is periodic with minimal period k1 then we allow x^{1+k1} to be arbitrary. If x^{1+k1} is not periodic then all other rows are filled with zeros.
3. If x^{1+k1} is periodic with minimal period k2 then we allow x^{1+k1+k2} to be arbitrary. If x^{1+k1+k2} is not periodic then all other rows are filled with zeros.
4. and so on...
Every ergodic measure is supported by arrays with only one aperiodic row, hence its
entropy is at most log 2 and so is the topological entropy of the system.
Suppose Y is a symbolic extension via a factor map π : Y → X. Then, for each k
the composition πk of π with the projection onto the subshift Xk visible in the kth
row, as a factor map between two subshifts, is a sliding block code of some finite
horizon rk . Choose integers p1 and pi+1 = pi qi (qi ∈ N), and define k0 = 1 and
ki = 1 + p1 + · · · + pi . By letting the numbers pk grow fast enough we can easily
arrange that rki /pi+1 ≤ 1/3.
Let us focus only on the rectangular blocks Rj of length p_{j+1} extending over kj rows and having, for each i ≤ j, in row ki, periodic repetitions of some block Bi of length p_{i+1} (and zeros in all other rows). Note that all such rectangles are admitted in our system
X. Every Rj has the following structure: in rows 1 through k_{j−1} it has qj repetitions of one and the same rectangle R_{j−1} (of length pj) and in row kj it has a completely arbitrary block Bj. We will write Rj = [R_{j−1}^{qj}, Bj]. For each rectangle Rj we let C(Rj) be the family of blocks of length p_{j+1} appearing in Y "above" Rj (i.e., in the preimage by π of the cylinder associated to Rj, at the same horizontal coordinates as Rj). We let Lj denote the minimal cardinality of C(Rj). Also, we let
    C(R_{j−1}^{q_j}) = ⋃_{B_j} C([R_{j−1}^{q_j}, B_j])
(the family of blocks admitted “above” R_{j−1}^{q_j}). Notice that if two rectangles R_j differ in the “central parts” (denoted B_j′) of the blocks B_j, of length p_{j+1} − 2r_{k_j} ≥ p_{j+1}/3, then their families C(R_j) are disjoint (because the blocks B_j′ are completely determined via the block code by the considered blocks in Y). Since there are at least 2^{p_{j+1}/3} different blocks B_j′, we obtain that, for any R_{j−1},

    #C(R_{j−1}^{q_j}) ≥ 2^{p_{j+1}/3} · L_j.
On the other hand, given R_{j−1}, each block in the family C(R_{j−1}^{q_j}) must be concatenated exclusively from blocks belonging to one family C(R_{j−1}). So,

    #C(R_{j−1}^{q_j}) ≤ (#C(R_{j−1}))^{q_j},
which, combined with the preceding inequality (and the equality p_{j+1}/q_j = p_j), yields

    #C(R_{j−1}) ≥ 2^{p_j/3} · L_j^{1/q_j}.

Because this holds for any R_{j−1}, we have obtained the inductive dependence

    L_{j−1} ≥ 2^{p_j/3} · L_j^{1/q_j}
(recall that the index 0 refers to the family of one-row rectangles R_0 in row k_0 = 1 of length p_1, and hence L_0 is at most the cardinality of all blocks of length p_1 in Y). Because the above holds for every j, we have proved that the cardinality of all blocks of length p_1 in Y is unbounded. A contradiction.
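Explicitly, iterating the inductive inequality and using L_j ≥ 1 together with p_{i+1} = p_i q_i (so that p_{i+1}/(q_1 · · · q_i) = p_1 for every i) gives

```latex
L_0 \;\ge\; 2^{p_1/3} L_1^{1/q_1}
    \;\ge\; 2^{p_1/3}\, 2^{p_2/(3q_1)}\, L_2^{1/(q_1 q_2)}
    \;\ge\; \cdots \;\ge\; 2^{j p_1/3}\, L_j^{1/(q_1 \cdots q_j)}
    \;\ge\; 2^{j p_1/3},
```

which grows to infinity with j, while L_0 does not depend on j.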
Remark. In this example the simplex of ergodic measures and the entropy structure
do not resemble those of Exercise 8.1. The picture is more like the one in Exercise 8.3,
but with infinite order of accumulation (however, one has to take a quotient space to see
such a structure). The measures of positive entropy are supported by arrays with finitely
many (say j−1) periodic rows and one nonperiodic row. These measures (after suitable identification) correspond to the points with topological order of accumulation j. When the index of the last aperiodic row grows, the corresponding measures accumulate at measures with one row less (this resembles the situation in Example 9.3.5). But we also have “backward” accumulation points (when the order of accumulation grows). This corresponds to letting the number of nonzero rows grow. These accumulation
points are measures supported by some odometers.
Part 3
11 Exercises in Chapter 11
Exercise 11.1.
Recall that Hµ(F|G) is defined in (11.2.1) as Hµ(F ⊔ G) − Hµ(G). Thus (11.2.2) is implied by Hµ(F ⊔ G) ≥ Hµ(F) (in the presence of the trivial family O, these two are in fact equivalent). This exercise complements Exercise 1.3 by reversing one of its implications (recall, there we assumed H(a ∨ b) ≥ H(a) and H(a|b) ≥ H(a|b ∨ c), i.e., (11.2.2) and (11.2.3), and we were to derive H(a ∨ b|c) ≤ H(a|b) + H(b|c), i.e., (11.2.10)).
We proceed as follows:
    Hµ(F|G ⊔ H) = Hµ(F ⊔ G ⊔ H) − Hµ(G ⊔ H)
                = Hµ(F ⊔ G ⊔ H) − Hµ(G ⊔ H) − Hµ(G) + Hµ(G)
                = Hµ(F ⊔ H|G) − Hµ(H|G)
                ≤ Hµ(F|G) + Hµ(H|G) − Hµ(H|G) = Hµ(F|G).
Exercise 11.2.
We have

    T^l(F^n) = (T^l F)^n = ⊔_{i=l}^{l+n−1} T^i F,

thus

    F^{l+n} = F^l ⊔ (T^l F)^n.

Using (11.2.2) with H = O, and (11.2.11), we obtain

    Hµ((T^l F)^n) ≤ Hµ(F^{l+n}) ≤ Hµ(F^l) + Hµ((T^l F)^n).

We now divide both sides by l + n and take lim sup as n tends to infinity. The middle expression becomes hµ(T, F) while both the left and right hand sides become hµ(T, T^l F).
Exercise 11.3.
Notice that (F^n)^m, where the exponent m refers to the action of T^n, equals F^{nm} (in the notation referring to T). Any natural k equals nm − i, where 0 ≤ i ≤ n − 1, and then, as in the preceding exercise,

    F^{nm} = F^i ⊔ (T^i F)^k

and

    Hµ((T^i F)^k) ≤ Hµ(F^{nm}) ≤ Hµ(F^i) + Hµ((T^i F)^k).

Now we divide both sides by k and apply lim sup as k tends to infinity. The extreme terms become hµ(T, T^i F), which, by the preceding exercise, equals hµ(T, F). For
the middle term we note that in the limit 1/k can be replaced by 1/nm (where m tends to infinity), and we obtain (1/n) · hµ(T^n, F^n). This proves the first equality in Fact 11.2.6. The second equality follows in a standard way from two facts: the supremum over all F applied to hµ(T^n, F^n) gives not more than hµ(T^n), because it takes into account only families of the form F^n. On the other hand it gives not less, because F ⊂ F^n and thus hµ(T^n, F) ≤ hµ(T^n, F^n).
Exercise 11.4.
This and the next exercise are general facts concerning doubly stochastic operators and have nothing to do with entropy.
Suppose T is an invertible doubly stochastic operator whose inverse is also doubly stochastic. As we know (see (11.2.21)),

    T(f ∨ g) ≥ T f ∨ T g.

Applying T^{−1} (which is monotone) and then the same inequality for T^{−1}, we get

    f ∨ g = T^{−1}(T(f ∨ g)) ≥ T^{−1}(T f ∨ T g) ≥ T^{−1}T f ∨ T^{−1}T g = f ∨ g.

Hence all the inequalities above are in fact equalities; in particular T^{−1}(T f ∨ T g) = f ∨ g, and applying T yields

    T(f ∨ g) = T f ∨ T g.
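A toy finite-dimensional illustration (ours, not from the book) of why invertibility matters: for a non-invertible doubly stochastic matrix the inequality T(f ∨ g) ≥ Tf ∨ Tg can be strict.

```python
def apply(P, f):
    """Apply a stochastic matrix P (a list of rows) to a function f on a
    finite space: (T f)(x) = sum over y of P[x][y] * f(y)."""
    return [sum(p * v for p, v in zip(row, f)) for row in P]

def join(f, g):
    """Pointwise maximum f v g."""
    return [max(a, b) for a, b in zip(f, g)]

# A non-invertible doubly stochastic matrix (all entries 1/2):
P = [[0.5, 0.5], [0.5, 0.5]]
f, g = [1.0, 0.0], [0.0, 1.0]

lhs = apply(P, join(f, g))            # T(f v g)
rhs = join(apply(P, f), apply(P, g))  # T f v T g
# Here T(f v g) = [1.0, 1.0] strictly dominates T f v T g = [0.5, 0.5].
```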
Exercise 11.5.
Perhaps this is a good opportunity to clarify an issue concerning doubly stochastic operators in general, something that wasn't clearly stated in the book. We have mentioned that a transition probability always determines a stochastic operator, and that not all stochastic operators are of this form. What we have not said is this:
Fact: every doubly stochastic operator is in fact determined by a transition probability.
The construction is standard: first define a measure ν on the product of two copies of X by ν(A × B) = ∫ 1I_B · T(1I_A) dµ (double stochasticity of T guarantees that both marginals of ν equal µ), and then extend ν, by the Markov property, to a measure µ on the product of countably many copies of X. In order to produce the transition probability P(x, ·) it now suffices to take the disintegration measure µ_x of µ with respect to the sigma-algebra on the coordinate 0 at x and apply it to the sets depending on the coordinate 1. We skip the tedious but straightforward verification that µ is indeed a shift-invariant probability measure on the product sigma-algebra, and that the stochastic operator associated with the so defined transition probability preserves µ and coincides on L^1(µ) with T.
The above fact opens yet another way to prove that doubly stochastic operators sending
characteristic functions to characteristic functions are pointwise generated. We need to
show that the transition probabilities P(x, ·) are almost surely point masses δ_y; then the associated map will be x 7→ y.
By assumption, for every measurable set A, the function (T 1I_A)(x) = ∫ 1I_A(y) P(x, dy) takes on almost surely only the values 0 and 1. At almost every point this holds simultaneously for a countable family of sets A that generates the sigma-algebra. This already implies that P(x, ·) is concentrated at one point (otherwise at least one set from the generating family would have an intermediate measure value).
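The finite-state analogue of this argument fits in a few lines (the function name is ours). For a doubly stochastic matrix, if the image of every singleton indicator is 0-1 valued, then each column is 0-1 valued with column sum 1, so the matrix is a permutation matrix and the operator is pointwise generated:

```python
def maps_indicators_to_indicators(P, tol=1e-12):
    """For a stochastic matrix P (rows are the transition probabilities
    P(x, .)), check that the image of every singleton indicator is 0-1
    valued. Since (T 1_{j})(x) = P[x][j], this inspects the columns; for
    a doubly stochastic P, 0-1 columns with column sums 1 force P to be
    a permutation matrix, i.e. every P(x, .) is a point mass."""
    n = len(P)
    for j in range(n):
        if any(tol < P[x][j] < 1 - tol for x in range(n)):
            return False
    return True

permutation = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]        # pointwise generated
blur = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]  # not
```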
Exercise 11.6.
We have already proved (11.2.26):

    1I_{t+1/m ≤ f ≤ s−1/m} ≤ Θ_{m,t,s}(f) ≤ 1I_{t<f<s},

and the extreme functions disagree only when f ∈ (t, t + 1/m) ∪ (s − 1/m, s). Thus the L^1(µ) distance between Θ_{m,t,s}(f) and 1I_{t<f<s} does not exceed the measure µ of the set of points for which f falls into these intervals. Including the internal endpoints of these intervals leads to a not smaller value, hence the inequality (11.2.27) holds.
Exercise 11.7.
It is easy to see that

    T^n(f)(x) = (1/2^n) · f(σ^n x) + ((2^n − 1)/2^n) · ∫ f dµ

for every f ∈ L^1(µ). Thus T^n(f) converges to the constant ∫ f dµ, implying that the entropy hµ(T) is zero. On the other hand, if f_0(x) = x_0 then

    T^n(f_0)(x) = (1/2^n) · x_n + ((2^n − 1)/2^n) · (1/2)

and (T^n(f_0))^{−1}(κ) equals the nth coordinate partition. Thus, the partition generated jointly by (T^i(f_0))^{−1}(κ) for i = 0, 1, . . . , n − 1 equals the partition into the blocks of length n, which has static entropy n log 2.
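As a sanity check (a sketch of ours), the coefficients in the formula for T^n(f) can be recovered by iterating T f = (1/2) · f ◦ σ + (1/2) · ∫ f dµ, using that the integral is shift-invariant:

```python
from fractions import Fraction

def coefficients(n):
    """Return (a, b) such that T^n f = a * (f o sigma^n) + b * integral(f),
    obtained by iterating T f = (1/2) f o sigma + (1/2) integral(f).
    Since integral(f o sigma^k) = integral(f), one application of T sends
    (a, b) to (a/2, b + a/2)."""
    a, b = Fraction(1), Fraction(0)
    for _ in range(n):
        a, b = a / 2, b + a / 2
    return a, b
```

The exact rationals confirm a = 1/2^n and b = (2^n − 1)/2^n.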
12 Exercises in Chapter 12
Exercise 12.1.
(i) is clear using Definition 12.1.2, since a family F of continuous functions on the
factor lifts to a family F′ of continuous functions on X, and for each cover V the
preimage F^{−1}(V) lifts to F′^{−1}(V). Recall that the cardinality of a minimal subcover is preserved under taking preimages by a continuous surjection.
(ii) is now obvious, as conjugate systems are factors of each other.
(iii) is best seen using Definition 12.1.3. Each family F of continuous functions on
Y prolongs to a family F′ on X, and then every (d_F, ε)-separated set in Y remains (d_{F′}, ε)-separated in X. Also note that, by invariance of Y, for each n we can use T^n(F′) as a prolongation of T^n(F).
(iv) The proof is analogous to that in Exercise 11.3 (without subadditivity we must
cope with lim sup, hence the simple argument of Fact 6.2.3 cannot be applied). We only
outline the steps. Let us use Definition 12.1.1 for a change. We begin by proving an
analog of Exercise 11.2: h_1(T, T^l(F), ε) = h_1(T, F, ε). This is done the same way
as that exercise with ⊔ replaced by the ordinary union of families and Hµ (F) replaced
by H1 (F, ε). Monotonicity (the analogue of (11.2.2)) and subadditivity (the analog of
(11.2.11)) are now obvious properties of joining the covers U_ε^F.
Next we follow Exercise 11.3, with the same substitutions.
Exercise 12.2.
We have

    T^n f = (1/2^n) Σ_{i=0}^{n} (n choose i) · f ◦ σ^{n+i},

which is a convex combination of the functions f ◦ σ^j with coefficients as in the binomial (1/2, 1/2)-distribution on [n, 2n]. The difference T^n f − T^{n+1} f is thus a combination of the same functions with coefficients as in the difference of the binomial distributions on [n, 2n] and on [n + 1, 2n + 2]. Skipping the precise calculations, we agree that this difference is a signed distribution on [n, 2n + 2] whose absolute value has small total mass if n is large (see the figure: the
red “tops” represent the positive atoms of the difference distribution, the blue “tops” are negative). Thus, since f is bounded, the differences T^n f − T^{n+1} f converge to zero uniformly as n → ∞. To complete this exercise, we will prove a more general fact.
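The smallness of this total mass can be checked numerically (a quick sketch of ours; tv_mass is a hypothetical name):

```python
from math import comb

def tv_mass(n):
    """Total mass of the signed difference of the two shifted binomial
    (1/2, 1/2)-distributions: Bin(n, 1/2) supported on [n, 2n] minus
    Bin(n+1, 1/2) supported on [n+1, 2n+2]."""
    a = {n + i: comb(n, i) / 2 ** n for i in range(n + 1)}
    b = {n + 1 + i: comb(n + 1, i) / 2 ** (n + 1) for i in range(n + 2)}
    return sum(abs(a.get(j, 0.0) - b.get(j, 0.0)) for j in set(a) | set(b))
```

The mass decays on the order of 1/√n, consistent with the two distributions being nearly identical shifts of each other.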
Fact: If F consists of functions f such that the differences T^n f − T^{n+1} f converge to zero uniformly as n → ∞, then h_2(T, F, V) = 0 for any cover V of the interval.
Proof. Fix some δ > 0. For each V ∈ V define V_δ = {t : d(t, V^c) > δ} ⊂ V and let V_δ = {V_δ : V ∈ V}. Notice that if δ < Leb(V)/2 then V_δ still covers the entire interval and it is inscribed in V. Thus, for every f : X → [0, 1], f^{−1}(V_δ) < f^{−1}(V). Moreover, if ‖g − f‖ < δ then g^{−1}(V_δ) < f^{−1}(V). So, if the assumption on F is satisfied, then for each k, we have
    (T^n(F))^{−1}(V_δ) < ⋁_{i=0}^{k−1} (T^{n+i}(F))^{−1}(V),

if n is larger than some n_k. Now, in the expression ⋁_{i=0}^{m−1} (T^i(F))^{−1}(V) defining (F^m)^{−1}(V), we can group the terms as follows (assuming m = n_k + rk + s, where s ≤ k):

    (F^m)^{−1}(V) = ⋁_{i=0}^{n_k−1} (T^i(F))^{−1}(V) ∨ ⋁_{j=0}^{r−1} ⋁_{i=0}^{k−1} (T^{n_k+jk+i}(F))^{−1}(V) ∨ ⋁_{i=0}^{s−1} (T^{n_k+rk+i}(F))^{−1}(V),

and, by the preceding display, the cover

    ⋁_{i=0}^{n_k−1} (T^i(F))^{−1}(V) ∨ ⋁_{j=0}^{r} (T^{n_k+jk}(F))^{−1}(V_δ)

is inscribed in (F^m)^{−1}(V).
Dividing by m, letting m grow to infinity and remembering that r < m/k while n_k does not grow with m, we obtain

    h_2(T, F, V) ≤ (1/k) · #F · log #V.

Since k was arbitrary, h_2(T, F, V) = 0, which proves the Fact.
Now, in our exercise, the above holds for every F, hence the topological entropy h_2(T) is zero. The last question is answered using “half” of the variational principle (Theorem 12.3.1): for every invariant measure of T, the measure-theoretic entropy of the corresponding doubly stochastic operator is zero as well.
Exercise 12.3.
For p ∈ [0, 1] let p_p denote the probability measure on {0, 1} assigning p to {0} and 1 − p to {1}. For y = (y_n) ∈ [0, 1]^{N0} let µ(y) = p_{y_1} × p_{y_2} × · · · . It is easy to see that the map y 7→ µ(y) is a homeomorphism between [0, 1]^{N0} and its image, which is a subset of M({0, 1}^{N0}). It is also immediately seen how the operator T^∗ dual to the operator T induced on C({0, 1}^{N0}) by the shift map acts on this image: T^∗ µ(y) = µ(σy). We have shown that the “hypersystem” (M({0, 1}^{N0}), T^∗) of the full unilateral shift on two symbols contains a subsystem conjugate to the full shift on [0, 1]^{N0}. Since the latter obviously has infinite topological entropy, so does the hypersystem, although the full shift on two symbols has finite entropy.
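The identity T^∗µ(y) = µ(σy) can be verified on cylinder sets by a toy computation (ours; we index coordinates from 0 in the code):

```python
from math import prod

def cylinder_measure(y, block):
    """Measure of the cylinder fixing the initial coordinates to `block`,
    under the product measure mu(y) = p_{y_0} x p_{y_1} x ..., where p_t
    assigns t to the symbol 0 and 1 - t to the symbol 1."""
    return prod(t if b == 0 else 1 - t for t, b in zip(y, block))

y = [0.3, 0.7, 0.2, 0.9, 0.5]
block = (0, 1, 0)
# (T* mu(y))([block]) = mu(y)(sigma^{-1}[block]); coordinate 0 is free:
lhs = sum(cylinder_measure(y, (s,) + block) for s in (0, 1))
rhs = cylinder_measure(y[1:], block)  # mu(sigma y)([block])
```

Summing over the free coordinate contributes the factor y_0 + (1 − y_0) = 1, which is exactly why the dual operator acts as the shift on the parameters y.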
Exercise 12.4.
The proof should roughly follow the standard lines; however, there might be some technical issues. I decided to leave this exercise open.