PREDICTING DYNAMICAL SYSTEMS WITH TOO FEW TIME-DELAY MEASUREMENTS: ERROR ESTIMATES

KRZYSZTOF BARAŃSKI∗, YONATAN GUTMAN† , AND ADAM ŚPIEWAK†

arXiv:2401.15712v1 [math.DS] 28 Jan 2024

Abstract. We study the problem of reconstructing and predicting the future of a dynamical system
by the use of time-delay measurements of typical observables. Considering the case of too few
measurements, we prove that for Lipschitz systems on compact sets in Euclidean spaces, equipped with
an invariant Borel probability measure µ of Hausdorff dimension d, one needs at least d
measurements of a typical (prevalent) Lipschitz observable for µ-almost sure reconstruction and prediction.
Consequently, the Hausdorff dimension of µ is the precise threshold for the minimal delay
(embedding) dimension for such systems in a probabilistic setting. Furthermore, we establish a lower bound
postulated in the Schroer–Sauer–Ott–Yorke prediction error conjecture from 1998, after necessary
modifications (whereas the upper estimates were obtained in our previous work). To this aim, we
prove a general theorem on the dimensions of conditional measures of µ with respect to time-delay
coordinate maps.

1. Introduction
1.1. Time-delayed measurements. The paper considers the problem of reconstructing or predicting
the (unknown) future of a dynamical system by time-delayed measurements of observables.
This is one of the central themes in non-linear data analysis, leading to non-trivial mathematical
questions related to practical tools and algorithms used in applications. Consider a phase space X
(the set of all possible states of the system) and a deterministic dynamics on X, generated by a
transformation T : X → X, which defines a one-step evolution rule. Let h : X → R be an observable,
which can be seen as a function measuring a certain parameter of the system. We assume that the
observer has no direct access to the original system (X, T) and its (finite) orbits

$$x, Tx, T^2x, \ldots, T^mx, \qquad x \in X,\ m \in \mathbb{N},$$

whereas the knowledge of the system is derived from observations (measurements) of the values of
h along the orbits, that is

$$h(x), h(Tx), h(T^2x), \ldots, h(T^mx). \tag{1}$$

One of the main tasks arising in this context is reconstructing the unknown dynamics (X, T )
based on the observational data (1). In particular, one would like to predict the future values
$h(T^{m+1}x), h(T^{m+2}x), \ldots$ from the time series given by (1). One of the effective approaches to these
problems is to construct a time-delayed model of the system in a high-dimensional Euclidean space
by the use of the time series (1), hoping that this will unfold the original dynamics. More
precisely, fixing a positive integer k (called delay length or delay dimension) for the dimension of the
reconstruction space $\mathbb{R}^k$, one transforms (1) into a sequence of points $y_j \in \mathbb{R}^k$ as
$$y_j = (h(T^jx), h(T^{j+1}x), \ldots, h(T^{j+k-1}x)) \quad \text{for } j = 0, \ldots, m-k+1. \tag{2}$$
One expects that if k is large enough, then the original dynamics (X, T) can be reliably modelled
or approximated by the observed dynamics $y_j \mapsto y_{j+1}$ in $\mathbb{R}^k$. This idea has proved fruitful in
applications, see e.g. [SM90, HP97, HGLS05, MRCA14, BR23], and a mathematical theory has been
built to understand the underlying mechanisms and conditions that determine its performance, see
e.g. [PCFS80, FS87, KY90, SYC91, KBA92, SSOY98, Vos03, Rob11, HBS15] and the references
therein.

∗ Institute of Mathematics, University of Warsaw, ul. Banacha 2, 02-097 Warszawa, Poland
† Institute of Mathematics, Polish Academy of Sciences, ul. Śniadeckich 8, 00-656 Warszawa, Poland
2020 Mathematics Subject Classification. 37C45, 37C40, 58D10.
YG and AŚ were partially supported by the National Science Centre (Poland) grant 2020/39/B/ST1/02329.
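The passage from the time series (1) to the reconstruction points (2) is mechanical, as the following Python sketch illustrates. The helper names `orbit_series` and `delay_vectors`, the circle rotation with angle `ALPHA` and the observable cos(2πx) are hypothetical choices made only for this illustration; they do not come from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple


def orbit_series(T: Callable[[float], float], h: Callable[[float], float],
                 x: float, m: int) -> List[float]:
    """Time series (1): h(x), h(Tx), ..., h(T^m x)."""
    series = []
    for _ in range(m + 1):
        series.append(h(x))
        x = T(x)
    return series


def delay_vectors(series: Sequence[float], k: int) -> List[Tuple[float, ...]]:
    """Points y_j = (h(T^j x), ..., h(T^{j+k-1} x)) of (2), j = 0, ..., m - k + 1."""
    m = len(series) - 1
    return [tuple(series[j:j + k]) for j in range(m - k + 2)]


# Hypothetical example: rotation of the circle [0, 1) by an irrational angle,
# observed through h(x) = cos(2*pi*x).
ALPHA = math.sqrt(2) - 1


def rotation(x: float) -> float:
    return (x + ALPHA) % 1.0


def cos_observable(x: float) -> float:
    return math.cos(2 * math.pi * x)


series = orbit_series(rotation, cos_observable, 0.1, m=10)  # 11 values
ys = delay_vectors(series, k=3)                              # m - k + 2 = 9 points
```

Consecutive points overlap in k − 1 coordinates, which is why the observed dynamics $y_j \mapsto y_{j+1}$ is determined by a single new value at each step.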
The main object of interest in this theory is the k-delay coordinate map corresponding to the
observable h, defined as
$$\phi_{h,k} \colon X \to \mathbb{R}^k, \qquad \phi_{h,k}(x) = (h(x), h(Tx), \ldots, h(T^{k-1}x)). \tag{3}$$
We will frequently suppress the dependence of the delay coordinate map on k, writing φh instead of
φh,k . Note that in this notation, the reconstruction space time series (2) is given as $y_j = \phi_h(T^j x)$
and the problem of predicting future values of the time series (1) for an observable h translates to
the question whether φh (x) determines φh (T x) uniquely. Similarly, the problem of reconstructing
the dynamics (X, T ) by time-delay measurements of h translates to the injectivity properties of φh .
In terms of the dynamical systems theory, these two conditions on φh can be rephrased as being,
respectively, a factor map (semi-conjugation) and an isomorphism (conjugation) between the original
system and its image under φh , in a category (regularity) depending on the context. In both cases,
the diagram
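$$\begin{array}{ccc}
X & \xrightarrow{\;T\;} & X \\
{\scriptstyle \phi_h}\big\downarrow & & \big\downarrow{\scriptstyle \phi_h} \\
\phi_h(X) & \xrightarrow{\;S_h\;} & \phi_h(X)
\end{array} \tag{4}$$
commutes, with the well-defined map $S_h$ given as
$$S_h(h(x), h(Tx), \ldots, h(T^{k-1}x)) = (h(Tx), h(T^2x), \ldots, h(T^kx)).$$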
On the reconstruction space time series (2) the map Sh acts as Sh (yj ) = yj+1 , realizing a one-step
prediction of the next term in (1). In view of this, we formulate the following definition.
Definition 1.1. An observable h : X → R is (deterministically) k-predictable, if for the k-delay
coordinate map φh : X → Rk given by (3) there exists a map Sh : φh (X) → φh (X), called the
prediction map, such that the diagram (4) commutes.
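The formula for $S_h$ shows that the first k − 1 coordinates of $S_h(\phi_h(x))$ are just the last k − 1 coordinates of $\phi_h(x)$, so a prediction map only has to supply the single new value $h(T^k x)$. A minimal sketch checking this shift structure, assuming a hypothetical example (the doubling map x ↦ 2x mod 1 with h(x) = sin 2πx; the helper `delay_map` is our own name):

```python
import math
from typing import Callable, Tuple


def delay_map(h: Callable[[float], float], T: Callable[[float], float],
              x: float, k: int) -> Tuple[float, ...]:
    """k-delay coordinate map (3): (h(x), h(Tx), ..., h(T^{k-1}x))."""
    coords = []
    for _ in range(k):
        coords.append(h(x))
        x = T(x)
    return tuple(coords)


def doubling(x: float) -> float:
    """Hypothetical example map: x -> 2x mod 1."""
    return (2.0 * x) % 1.0


def sine_observable(x: float) -> float:
    return math.sin(2.0 * math.pi * x)


k = 3
x0 = 0.123
y = delay_map(sine_observable, doubling, x0, k)                 # phi_h(x0)
y_next = delay_map(sine_observable, doubling, doubling(x0), k)  # phi_h(T x0)

# The first k-1 coordinates of phi_h(T x0) repeat the last k-1 coordinates of
# phi_h(x0); a prediction map S_h only needs to supply the last entry h(T^k x0).
shift_matches = y_next[:-1] == y[1:]
unknown_entry = y_next[-1]
```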
Note that if φh is injective, then the diagram (4) exists, with Sh defined as $S_h = \phi_h \circ T \circ \phi_h^{-1}$.
In this case, the conjugated system (φh (X), Sh ) can be treated as a faithful reconstruction model of
the original dynamical system (X, T ) in the reconstruction space Rk , based only on the time-delay
values of the observable h. Furthermore, if the k-delay coordinate map φh is continuous and injective
on a compact set X, then it is a topological embedding of X onto φh (X) ⊂ Rk . In this case we say
that φh is a k-delay embedding.
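A simple illustration of these notions (our own example, not taken from the paper): for the rotation Tx = x + α mod 1 of the circle and h(x) = cos 2πx, the 1-delay map cannot separate x from 1 − x, whereas for k = 2 the identity cos 2π(x + α) = cos 2πx cos 2πα − sin 2πx sin 2πα recovers sin 2πx whenever sin 2πα ≠ 0. Hence φh is injective and, the circle being compact, a 2-delay embedding. The sketch below implements the explicit inverse; the value α = √2 − 1 and the function names are arbitrary choices.

```python
import math

ALPHA = math.sqrt(2) - 1  # arbitrary rotation angle with sin(2*pi*ALPHA) != 0
TWO_PI = 2 * math.pi


def phi1(x: float) -> tuple:
    """1-delay map: just h(x) = cos(2*pi*x)."""
    return (math.cos(TWO_PI * x),)


def phi2(x: float) -> tuple:
    """2-delay map (h(x), h(Tx)) for the rotation T x = x + ALPHA mod 1."""
    return (math.cos(TWO_PI * x), math.cos(TWO_PI * (x + ALPHA)))


def phi2_inverse(y1: float, y2: float) -> float:
    """Recover x in [0, 1) from (y1, y2) = phi2(x), using
    y2 = cos(2 pi x) cos(2 pi ALPHA) - sin(2 pi x) sin(2 pi ALPHA)."""
    c = y1
    s = (y1 * math.cos(TWO_PI * ALPHA) - y2) / math.sin(TWO_PI * ALPHA)
    return (math.atan2(s, c) / TWO_PI) % 1.0


x, x_mirror = 0.2, 0.8
same_for_k1 = math.isclose(phi1(x)[0], phi1(x_mirror)[0])  # k = 1 cannot separate them
gap_for_k2 = math.dist(phi2(x), phi2(x_mirror))            # k = 2 can
recovered = phi2_inverse(*phi2(x_mirror))                  # close to 0.8
```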
There is a long history of mathematical results on the embedding properties of φh for typical
observables h. These are usually referred to as Takens-type embedding theorems, as they are related
to a classical result of Floris Takens [Tak81] from 1981. In an extended version by Huke
[Huk06, SBDH97] (see also [Noa91]), it states that if T : X → X is a $C^1$-diffeomorphism of a compact
$C^1$-manifold X such that all periodic orbits of T of period smaller than 2 dim X are isolated
and hyperbolic and each has distinct eigenvalues, then for a generic $C^1$-observable h : X → R, the
k-delay coordinate map φh is a $C^1$-embedding for k > 2 dim X. This seminal result was extended
in a number of papers, including [SYC91, SBDH97, Sta99, Cab00, OY03, SBDH03, Rob05, Gut16,
GQS18, BGŚ20, NV20], to various settings and categories of systems. Apart from that, the
predictability problem was considered, among others, in [Tak02, SSOY98, BGŚ22a, BGŚ22b, KK23].
The results obtained in the references mentioned above are considered to be a theoretical basis for
procedures of time-delay reconstruction and prediction of the dynamics, used in applications, see
e.g. [JL94, HGLS05, WCL09, SKY+18, DLSR23]. A common feature in these papers is that the
bound on the delay dimension k is related to the dimension of the phase space X. Let us state
two results which are relevant to the present work. Following [SYC91], they are presented in the
category of Lipschitz transformations and observables on compact sets in Euclidean spaces, where
genericity is understood as the prevalence with a polynomial probe set (see Subsection 2.3). Below,
the symbol $\overline{\dim}_B$ denotes the upper box dimension; see Subsection 2.4 for the definitions of all
notions of dimensions that appear in this section and the relations between them.

Theorem 1.2 (Time-delay prediction and embedding theorem). Let X ⊂ RN , N ∈ N, be a
compact set and let T : X → X be a Lipschitz map. Then a prevalent Lipschitz observable h : X → R
is k-predictable, with a continuous prediction map, for every $k > 2\,\overline{\dim}_B X$. If, additionally, T is
injective and $2\,\overline{\dim}_B(\{x \in X : T^p x = x\}) < p$ for p = 1, . . . , k − 1, then the k-delay coordinate map
φh is a k-delay embedding for a prevalent Lipschitz observable h : X → R.

The first part of the theorem is [BGŚ22b, Theorem 1.16], while the second one comes from [SYC91]
(see [Rob11, Theorem 4.5] for the above formulation). In fact, the results hold in greater generality
than above, see [SYC91, Rob11, BGŚ22b] for details. Theorem 1.2 and other mentioned results show
that a threshold for the delay dimension that is sufficient for a reliable prediction and reconstruction
of the system is roughly equal to twice the dimension of the phase space, which agrees with the
well-known Menger–Nöbeling and Whitney theorems on embedding, respectively, topological spaces
and smooth manifolds into Euclidean spaces.

1.2. Time-delay measurements from a probabilistic point of view. It turns out that the
minimal number of measurements required for a reliable prediction or reconstruction of the system
can be reduced (at least) by half in a probabilistic setting, when an observer is interested only in
the ‘almost sure’ behaviour of the system. Mathematically speaking, this means that one assumes
X to be endowed with a probability measure, studying trajectories of almost all points x ∈ X. The
relevance of this approach was conjectured by Schroer, Sauer, Ott and Yorke in [SSOY98]. In a
series of our previous papers [BGŚ20, BGŚ22a, BGŚ22b] we developed a theory of almost sure
time-delay prediction and embedding for (locally) Lipschitz systems with a Borel probability measure on
compact or Borel sets in Euclidean spaces. Within this approach, the analogues of the notions of
the time-delay predictability and time-delay embedding are the following.

Definition 1.3. Let X ⊂ RN , N ∈ N, be a Borel set, let µ be a Borel probability measure on X
and let T : X → X be a Borel transformation. We say that a Borel observable h : X → R is almost
surely k-predictable, if for the k-delay coordinate map φh , there exists a Borel set Xh ⊂ X of full
µ-measure and a map Sh : φh (Xh ) → φh (X) (prediction map) such that the diagram
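$$\begin{array}{ccc}
X_h & \xrightarrow{\;T\;} & X \\
{\scriptstyle \phi_h}\big\downarrow & & \big\downarrow{\scriptstyle \phi_h} \\
\phi_h(X_h) & \xrightarrow{\;S_h\;} & \phi_h(X)
\end{array} \tag{5}$$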
commutes.
Remark 1.4. If, additionally, T is continuous and h is an almost surely k-predictable continuous
observable, then the set Xh can be chosen such that the set φh (Xh ) and the prediction map Sh are
Borel (see [BGŚ22b, Proposition 1.13]).
Definition 1.5. We say that the k-delay coordinate map φh is almost surely injective, if it is
injective on a full µ-measure Borel set Xh ⊂ X. Then the commuting diagram (5) exists, with
$S_h = \phi_h \circ T \circ (\phi_h|_{X_h})^{-1}$.
If h is almost surely k-predictable on a full-measure Borel set Xh ⊂ X, which is T -invariant,
i.e. T (Xh ) ⊂ Xh , then the diagram (5) has the form
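$$\begin{array}{ccc}
X_h & \xrightarrow{\;T\;} & X_h \\
{\scriptstyle \phi_h}\big\downarrow & & \big\downarrow{\scriptstyle \phi_h} \\
\phi_h(X_h) & \xrightarrow{\;S_h\;} & \phi_h(X_h)
\end{array}, \tag{6}$$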
which provides a semi-conjugation between the system (X, T ) restricted to a full-measure subset
of the phase space and its model in Rk . Similarly, if φh is almost surely injective on a T -invariant
full-measure Borel set Xh ⊂ X, then the diagram (6) provides a measurable isomorphism between
the system (Xh , µ, T |Xh ) and its measurable model (φh (Xh ), φh µ, Sh ) in Rk , where φh µ denotes
the push-forward of µ under the map φh (see Subsection 2.2). In the case when µ is additionally
T-invariant, the map φh becomes an isomorphism in the category of measure-preserving
transformations, hence we call it a k-delay measure-preserving-transformations isomorphism (in short, k-delay
mpt isomorphism).
In [BGŚ20, Theorem 1.2 and Remark 1.3] and [BGŚ22b, Theorem 1.18] we proved the following.
Theorem 1.6 (Probabilistic time-delay prediction and embedding theorem). Let X ⊂ RN ,
N ∈ N, be a compact set, let µ be a Borel probability measure on X and let T : X → X be a Lipschitz
map. Then the following hold.
(a) A prevalent Lipschitz observable h : X → R is almost surely k-predictable for every k >
dimH µ. Furthermore, if k > dimH (supp µ), then the set Xh can be chosen such that the
prediction map Sh is continuous. If, additionally, µ is T -invariant, then Xh can be chosen
to be T -invariant.
(b) If T is injective and $\dim_H(\mu|_{\{x \in X : T^p x = x\}}) < p$ for p = 1, . . . , k − 1, where k > dimH µ, then
the k-delay coordinate map φh is almost surely injective for a prevalent Lipschitz observable
h : X → R. If, additionally, µ is T -invariant, then Xh can be chosen to be T -invariant
and, consequently, φh is a k-delay mpt isomorphism. Furthermore, if µ is T -invariant and
ergodic, then the assumption $\dim_H(\mu|_{\{x \in X : T^p x = x\}}) < p$ for p = 1, . . . , k − 1 may be omitted.
Here and in the sequel the symbol supp denotes the topological support of the measure (see
Subsection 2.2). Theorem 1.6 shows that the threshold for the delay dimension that is sufficient
for an almost sure prediction and reconstruction of the system is roughly equal to the dimension
of the phase space or a given measure, which reduces (at least) by half the number of required
measurements, compared to the deterministic setup.
1.3. New results. The main objective of this paper is to study the case of (too) few measurements
("undersampling"), mainly in the probabilistic setting, which roughly means k < dimH µ. For such
values of k one should not expect a faithful k-delay prediction or reconstruction of the system. Our
results confirm this intuition in a rigorous way, yielding a natural theoretical limitation for these
time-delay measurement procedures. To our knowledge, these are the first known results showing
lower bounds for the minimal delay dimension needed for a faithful prediction or reconstruction of the
system. We should emphasize that the injectivity of a single delay coordinate map or the predictability
of a single observable may hold even in the case of few measurements. Our results exclude such a
possibility for a typical (prevalent) observable.
The first result concerns the setting of Theorem 1.6 and shows that for Lipschitz systems on
compact spaces with an invariant probability measure µ, the Hausdorff dimension of µ is the precise
threshold for the minimal delay dimension needed for system reconstruction and prediction
procedures for typical Lipschitz observables in the probabilistic setting. To state the result, let us denote
the sets of periodic and pre-periodic points as
$$\mathrm{Per}_p(T) = \{x \in X : T^p x = x\},$$
$$\mathrm{PrePer}_p(T) = \{x \in X : T^p x \in \{x, Tx, \ldots, T^{p-1}x\}\}$$
for p ∈ N.
Theorem 1.7. Let X ⊂ RN , N ∈ N, be a compact set, let µ be a Borel probability measure on X
and let T : X → X be a Lipschitz map. Then the following hold.
(a) If $k < \dim_H T^{k-1}\mu$, then for a prevalent Lipschitz observable h : X → R, the k-delay
coordinate map φh is not almost surely injective. If, additionally, µ is T-invariant or T
is bi-Lipschitz onto its image, then the assumption $k < \dim_H T^{k-1}\mu$ may be replaced by
$k < \dim_H \mu$.
(b) If $k < \dim_H T^k(\mu|_{X \setminus \mathrm{PrePer}_k(T)})$, then a prevalent Lipschitz observable h : X → R is not almost
surely k-predictable. If, additionally, T is bi-Lipschitz onto its image or µ is T-invariant, then
the assumption $k < \dim_H T^k(\mu|_{X \setminus \mathrm{PrePer}_k(T)})$ may be replaced by $k < \dim_H \mu|_{X \setminus \bigcup_{p=1}^{k} \mathrm{Per}_p(T)}$.
Furthermore, if µ is T -invariant and ergodic, then it suffices to assume k < dimH µ.
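To make the sets $\mathrm{Per}_p(T)$ and $\mathrm{PrePer}_p(T)$ appearing in Theorem 1.7 concrete, the following sketch tests membership directly from their definitions for a map on a finite set. The example map x ↦ x² mod 10 on {0, …, 9} and the helper names are hypothetical. Note that $\mathrm{Per}_p(T) \subset \mathrm{PrePer}_p(T)$, and that $\mathrm{PrePer}_p(T)$ also contains points whose orbits only eventually become periodic (for instance 2 ↦ 4 ↦ 6 ↦ 6).

```python
from typing import Callable


def iterate(T: Callable[[int], int], x: int, n: int) -> int:
    """Return T^n x."""
    for _ in range(n):
        x = T(x)
    return x


def in_per(T: Callable[[int], int], x: int, p: int) -> bool:
    """x in Per_p(T)  iff  T^p x = x."""
    return iterate(T, x, p) == x


def in_preper(T: Callable[[int], int], x: int, p: int) -> bool:
    """x in PrePer_p(T)  iff  T^p x lies in {x, Tx, ..., T^{p-1} x}."""
    orbit = [iterate(T, x, j) for j in range(p)]
    return iterate(T, x, p) in orbit


def square_mod_10(x: int) -> int:
    """Hypothetical example map on the finite set {0, ..., 9}."""
    return (x * x) % 10


X = range(10)
per1 = {x for x in X if in_per(square_mod_10, x, 1)}        # fixed points
preper2 = {x for x in X if in_preper(square_mod_10, x, 2)}
preper3 = {x for x in X if in_preper(square_mod_10, x, 3)}
```

In this example every orbit enters a fixed point within three steps, so $\mathrm{PrePer}_3(T)$ is the whole set.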
Remark 1.8. Note that in the general case of a non-invariant measure µ, in Theorem 1.7 it is not
enough to assume k < dimH µ in order to exclude the almost sure injectivity of φh and almost sure
k-predictability of h, see examples in Section 7.2.
Remark 1.9. It is easy to see that some restrictions on the size of the set of preperiodic points
PrePerk (T ) in Theorem 1.7 are necessary. A trivial example is when T is the identity – in this
case every observable is predictable. More generally, if there exist p, k ∈ N, p ≤ k − 1, such that
$T^p x = T^k x$ for every (resp. µ-almost every) x ∈ X, then every observable is k-predictable (resp. almost
surely k-predictable). This holds, e.g., for rational rotations on a d-dimensional torus or for X = [0, 1],
T x = |x − 1/2|.
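The last example can be checked by hand: on [0, 1/2] one has T²x = x, while on [1/2, 1] one has T²x = 1 − x and T³x = x − 1/2 = Tx. So T³ = T on all of [0, 1], i.e. the condition above holds with p = 1, k = 3. Consequently, for any observable h the unknown coordinate h(T³x) of φh(Tx) equals the known coordinate h(Tx) of φh(x), and $S_h(y_1, y_2, y_3) = (y_2, y_3, y_2)$. The sketch below verifies this with exact rational arithmetic; the grid, the observable `h` and the helper names are arbitrary illustrative choices.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)


def T(x: Fraction) -> Fraction:
    """The map x -> |x - 1/2| on [0, 1], in exact rational arithmetic."""
    return abs(x - HALF)


def iterate(x: Fraction, n: int) -> Fraction:
    for _ in range(n):
        x = T(x)
    return x


def phi(h, x: Fraction, k: int = 3) -> tuple:
    """3-delay coordinate map (h(x), h(Tx), h(T^2 x))."""
    return tuple(h(iterate(x, j)) for j in range(k))


def S(y: tuple) -> tuple:
    """Prediction map: since T^3 = T, the new entry h(T^3 x) equals y[1] = h(Tx)."""
    return (y[1], y[2], y[1])


def h(x: Fraction) -> float:
    """An arbitrary (illustrative) observable."""
    return math.sin(3 * float(x)) + float(x) ** 2


grid = [Fraction(i, 40) for i in range(41)]
t3_equals_t = all(iterate(x, 3) == T(x) for x in grid)
diagram_commutes = all(phi(h, T(x)) == S(phi(h, x)) for x in grid)
```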
Combining Theorems 1.6 and 1.7 we see that dimH µ is the precise threshold for the minimal delay
(embedding) dimension required for both almost sure reconstruction and prediction of a system. In
practical tasks the dimension of the system is often unknown and one is faced with a challenge
of estimating the embedding dimension from a time series. Several algorithms aiming at this and
similar tasks were introduced, e.g. the false nearest neighbour algorithm and its modifications
[KBA92, ČP88, LPS91, DLSR23] (see also [Aba96] and [BGŚ22b, Section 1.5] for a more detailed
discussion). It is an interesting problem to study rigorously the behaviour of such algorithms and
their relation to the presented results.
Theorem 1.7 implies the following result concerning predictability and embedding limitation in
the deterministic setting of Theorem 1.2. Note that unlike in the probabilistic case, we do not
obtain the same threshold for the minimal delay dimension as the one which appears in Theorem 1.2
(i.e. $2\,\overline{\dim}_B X$).
Theorem 1.10. Let X ⊂ RN , N ∈ N, be a compact set and let T : X → X be a Lipschitz map.
Then the following hold.
(a) If $k < \dim_H T^{k-1}(X)$, then for a prevalent Lipschitz observable h : X → R, the k-delay
coordinate map φh is not injective.
(b) If $k < \dim_H\big(T^k(X) \setminus \bigcup_{p=1}^{k} \mathrm{Per}_p(T)\big)$, then a prevalent Lipschitz observable h : X → R is not
k-predictable.
The proofs of Theorems 1.7 and 1.10 are presented in Section 3.
The subsequent results of the paper are related to a conjecture of Schroer, Sauer, Ott and Yorke
from [SSOY98] concerning error bounds in time-delay prediction procedures. To formulate the
conjecture, let X ⊂ RN , N ∈ N be a Borel set, let µ be a Borel probability measure on X and let
T : X → X be a Borel transformation. For a Borel observable h : X → R, y ∈ supp φh µ and ε > 0,
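define
$$\chi_{h,\varepsilon}(y) = \frac{1}{\mu(\phi_h^{-1}(B(y,\varepsilon)))} \int_{\phi_h^{-1}(B(y,\varepsilon))} \phi_h \circ T \, d\mu,$$
$$\sigma_{h,\varepsilon}(y) = \left( \frac{1}{\mu(\phi_h^{-1}(B(y,\varepsilon)))} \int_{\phi_h^{-1}(B(y,\varepsilon))} \|\phi_h \circ T - \chi_{h,\varepsilon}(y)\|^2 \, d\mu \right)^{1/2}$$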

(provided the integrals exist).
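The quantities $\chi_{h,\varepsilon}(y)$ and $\sigma_{h,\varepsilon}(y)$ are the conditional mean and the conditional root-mean-square spread of the next delay vector φh(Tx), given that φh(x) lies within distance ε of y. If µ is replaced by the empirical measure of finitely many sample points, both integrals become finite averages. The sketch below does this for a hypothetical example (not from the paper): the rotation by α = √2 − 1 with h(x) = cos 2πx, with Lebesgue measure approximated by a uniform grid. For k = 1 the points x ≈ 0.2 and x ≈ 0.8 share a delay coordinate but have different futures, so σ stays large; for k = 2 it becomes small. The function `chi_sigma` is our own name.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def delay_map(h: Callable[[float], float], T: Callable[[float], float],
              x: float, k: int) -> Vector:
    """k-delay coordinate map (h(x), h(Tx), ..., h(T^{k-1}x))."""
    coords = []
    for _ in range(k):
        coords.append(h(x))
        x = T(x)
    return tuple(coords)


def chi_sigma(samples: Sequence[float], h, T, k: int, y: Vector,
              eps: float) -> Optional[Tuple[Vector, float]]:
    """Empirical chi_{h,eps}(y) and sigma_{h,eps}(y): mu is replaced by the
    uniform measure on `samples`, B(y, eps) is the open Euclidean ball."""
    futures = [delay_map(h, T, T(x), k) for x in samples
               if math.dist(delay_map(h, T, x, k), y) < eps]
    if not futures:
        return None  # phi_h^{-1}(B(y, eps)) has empirical measure 0
    n = len(futures)
    chi = tuple(sum(v[i] for v in futures) / n for i in range(k))
    sigma = math.sqrt(sum(math.dist(v, chi) ** 2 for v in futures) / n)
    return chi, sigma


ALPHA = math.sqrt(2) - 1


def rotation(x: float) -> float:
    return (x + ALPHA) % 1.0


def h(x: float) -> float:
    return math.cos(2 * math.pi * x)


N = 20000
grid = [(i + 0.5) / N for i in range(N)]  # stands in for Lebesgue measure
x_ref = 0.2
_, sigma_k1 = chi_sigma(grid, h, rotation, 1, delay_map(h, rotation, x_ref, 1), eps=0.01)
_, sigma_k2 = chi_sigma(grid, h, rotation, 2, delay_map(h, rotation, x_ref, 2), eps=0.01)
```

Shrinking `eps` (with a correspondingly finer grid) leaves `sigma_k1` roughly unchanged, in line with the failure of 1-predictability for this system.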

Remark 1.11. In [SSOY98], the probabilistic notion of predictability was introduced in another
way, by defining k-predictable points as points y ∈ Rk for which the prediction error
$$\sigma_h(y) = \lim_{\varepsilon \to 0} \sigma_{h,\varepsilon}(y)$$

exists and is equal to 0. In [BGŚ22b, Theorem 1.14] we showed that for continuous systems on
compact spaces, a continuous observable h is almost surely k-predictable with respect to µ if and
only if φh µ-almost every point y ∈ Rk is k-predictable (see also Lemma 1.20 below). Hence, in the
setup considered in this paper, the two notions of almost sure predictability coincide.
Remark 1.12. The almost sure predictability is closely related to the behaviour of the Farmer–
Sidorowich prediction algorithm [FS87] (see [BGŚ22b, Proposition 1.3]). More precisely, whenever
almost sure k-predictability holds with respect to an ergodic measure µ, it implies an almost sure
convergence of the suitable Farmer–Sidorowich algorithm. Refer to [BGŚ22b, Corollary 1.15] for
details.
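To illustrate the kind of procedure Remark 1.12 refers to, here is a simplified nearest-neighbour predictor: $S_h(y)$ is estimated by averaging the successors $y_{j+1}$ of the n delay vectors $y_j$ closest to y. This zeroth-order (local averaging) variant is our own simplification, and the Farmer–Sidorowich algorithm as formulated in [FS87] and analysed in [BGŚ22b] differs in its details. The example system (rotation by α = √2 − 1 observed through h(x) = cos 2πx), the function `nn_predict` and the parameter `n_neighbors` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def delay_vectors(series: Sequence[float], k: int) -> List[Vector]:
    """Delay vectors y_j = (s_j, ..., s_{j+k-1}) built from a scalar time series."""
    return [tuple(series[j:j + k]) for j in range(len(series) - k + 1)]


def nn_predict(series: Sequence[float], k: int, y: Vector,
               n_neighbors: int = 20) -> Vector:
    """Average of y_{j+1} over the n_neighbors training vectors y_j nearest to y."""
    ys = delay_vectors(series, k)
    pairs = list(zip(ys[:-1], ys[1:]))                  # (y_j, y_{j+1})
    pairs.sort(key=lambda pair: math.dist(pair[0], y))  # nearest first
    successors = [succ for _, succ in pairs[:n_neighbors]]
    return tuple(sum(v[i] for v in successors) / len(successors) for i in range(k))


ALPHA = math.sqrt(2) - 1


def h(x: float) -> float:
    return math.cos(2 * math.pi * x)


# Training data: h along the orbit of x = 0 under x -> x + ALPHA mod 1.
series = [h((j * ALPHA) % 1.0) for j in range(5000)]

x_query = 0.2
errors = {}
for k in (1, 2):
    y = tuple(h(x_query + i * ALPHA) for i in range(k))            # phi_h(x_query)
    truth = tuple(h(x_query + (i + 1) * ALPHA) for i in range(k))  # phi_h(T x_query)
    errors[k] = math.dist(nn_predict(series, k, y), truth)
```

For k = 1 the neighbours come from two branches (x ≈ 0.2 and x ≈ 0.8) with different successors, so the averaged prediction is biased; for k = 2 the delay map is injective and the error is small.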
In [SSOY98], Schroer, Sauer, Ott and Yorke conjectured [SSOY98, Conjecture 2] a decay rate of
the prediction errors σh,ε (y) for typical observables. The conjecture was stated for a special class of
invariant measures on attractors for smooth diffeomorphisms on Riemannian manifolds – so called
natural measures, see Definition 2.2.
Schroer–Sauer–Ott–Yorke prediction error conjecture. Let T : M → M be a smooth diffeomorphism
of a compact Riemannian manifold M with a compact T-invariant attractor X ⊂ M and a
natural measure µ on X of information dimension ID(µ) = D. Fix a generic observable h : M → R,
k ∈ N and a small δ > 0. Then for the k-delay coordinate map φh corresponding to h and sufficiently
small ε > 0, the following hold.
(i) If k < D, then
µ({x ∈ X : σh,ε (φh (x)) > δ}) ≥ C for some C > 0.
(ii) If D < k < 2D and φh is not injective, then
$C_1 \varepsilon^{k-D} \le \mu(\{x \in X : \sigma_{h,\varepsilon}(\phi_h(x)) > \delta\}) \le C_2 \varepsilon^{k-D}$ for some $C_1, C_2 > 0$.
(iii) If D < k < 2D and φh is injective, then
µ({x ∈ X : σh,ε (φh (x)) > δ}) = 0.
(iv) If k > 2D, then
µ({x ∈ X : σh,ε (φh (x)) > δ}) = 0.
Note that the first conjecture by Schroer, Sauer, Ott and Yorke ([SSOY98, Conjecture 1]), concerning
the possibility of generic almost sure prediction in the case k > D, was settled in our
previous papers [BGŚ22a, Corollaries 1.9–1.10, Theorem 1.11] and [BGŚ22b, Theorem 1.4] (see also
Theorem 1.6).
Remark 1.13. In the original formulation of the prediction error conjecture in [SSOY98], as well
as in the statement presented above, not all the details are precisely specified, which introduces
the need for some interpretations. This includes the issue of the regularity class of the considered
observables as well as the suitable notion of genericity. Following the approach used in [SYC91], we
consider observables within the class of Lipschitz maps and prevalence as the notion of genericity, but
it should be noted that the obtained results apply also to the classes $C^r$, r = 1, . . . , ∞ of observables
(see Definition 2.3 and the discussion afterwards). Another issue is the interdependence between h
and δ in the formulation of the conjecture, which has a non-trivial impact on the existence of the
postulated lower bounds, see Remark 1.18 and Theorem 1.19.
In our previous paper [BGŚ22b], we studied the upper bounds for the prediction error probability
in the Schroer–Sauer–Ott–Yorke prediction error conjecture. We proved that the upper bound in
assertion (ii) holds true (in a slightly weaker form) if we replace the information dimension with the
upper box counting dimension of supp µ, i.e. for $D = \overline{\dim}_B(\mathrm{supp}\,\mu)$. Furthermore, assertion (iv)
holds for $D = \underline{\dim}_B(\mathrm{supp}\,\mu)$ (hence also for $D = \overline{\dim}_B(\mathrm{supp}\,\mu)$), while (iii) is true whenever φh
is injective (regardless of the dimension of the phase space). In fact, the results extend to a much
broader class of Lipschitz systems on compact spaces X ⊂ RN and arbitrary Borel probability
measures µ, see [BGŚ22b, Theorem 1.6] for details. We also provided examples [BGŚ22b,
Proposition 8.3] showing that the box counting dimensions cannot be replaced by ID(µ) or dimH µ in the
general class of Lipschitz systems and Borel measures (in the case of natural measures on attractors
for smooth diffeomorphisms the question remains open).
The second objective of this paper is to establish lower bounds in the Schroer–Sauer–Ott–Yorke
prediction error conjecture, with a focus on assertion (i), which corresponds to the case of too few
measurements. Our main result, which strengthens Theorem 1.7(b), shows that the assertion holds
true for small enough δ > 0, if the information dimension of the measure is replaced by its Hausdorff
dimension, i.e. after setting D = dimH µ.
Theorem 1.14. Let X ⊂ RN , N ∈ N, be a compact set, let µ be a Borel probability measure on X
and let T : X → X be a Lipschitz map. Fix k ∈ N such that $k < \dim_H T^k(\mu|_{X \setminus \mathrm{PrePer}_k(T)})$. Then for
a prevalent Lipschitz observable h : X → R one can find C, δ0 > 0 such that for every 0 < δ < δ0
there exists ε0 > 0 with
µ({x ∈ X : σh,ε (φh (x)) > δ}) ≥ C
for every 0 < ε < ε0 , where φh is the k-delay coordinate map corresponding to h. If, additionally, T
is bi-Lipschitz onto its image or µ is T-invariant, then the assumption $k < \dim_H T^k(\mu|_{X \setminus \mathrm{PrePer}_k(T)})$
may be replaced by $k < \dim_H \mu|_{X \setminus \bigcup_{p=1}^{k} \mathrm{Per}_p(T)}$. Furthermore, if µ is T-invariant and ergodic, then it
suffices to assume k < dimH µ.

Remark 1.15. If h is almost surely k-predictable, then σh,ε (φh (x)) → 0 as ε → 0 for µ-almost every
x ∈ X (see Remark 1.11), so µ({x ∈ X : σh,ε (φh (x)) > δ}) → 0 as ε → 0 for every δ > 0. Therefore
(as the intersection of two prevalent sets is prevalent, in particular non-empty), Theorem 1.6 shows
that the assertion of Theorem 1.14 cannot hold for k > dimH µ.

The proof of Theorem 1.14 is presented in Section 3. In Section 7 we present examples showing
that the assumptions on the dynamics T and the measure µ are necessary. We also provide an
example showing that the result does not hold for the information dimension (see Section 7.1).
However, we emphasize that this example lies within the general class of Lipschitz systems, and the
considered measure is not a natural measure on an attractor for a smooth diffeomorphism. Hence, it
does not provide a counterexample to assertion (i) of the Schroer–Sauer–Ott–Yorke prediction error
conjecture.
Combining Theorem 1.14 with [BGŚ22b, Theorem 1.6], we obtain the following version of the
Schroer–Sauer–Ott–Yorke prediction error conjecture, which holds true (after necessary
modifications) for arbitrary Lipschitz systems on compact sets in Euclidean spaces, equipped with a Borel
probability measure. Below we present the result in a slightly simplified version for invariant
measures.

Theorem 1.16 (Prediction error estimates). Let X ⊂ RN , N ∈ N, be a compact set, let T : X → X
be a Lipschitz map and let µ be a T-invariant Borel probability measure on X. Assume
$\mu(\bigcup_{p=1}^{k}\mathrm{Per}_p(T)) = 0$ and set $D_H = \dim_H \mu$, $\underline{D}_B = \underline{\dim}_B(\mathrm{supp}\,\mu)$, $\overline{D}_B = \overline{\dim}_B(\mathrm{supp}\,\mu)$. Then for
a prevalent Lipschitz observable h : X → R and k ∈ N, θ > 0, one can find C, δ0 > 0 such that
for every 0 < δ < δ0 there exists ε0 > 0 such that for the k-delay coordinate map φh and every
0 < ε < ε0 , the following hold.
(i) If $k < D_H$, then
µ({x ∈ X : σh,ε (φh (x)) > δ}) ≥ C.
(ii) If $k > \overline{D}_B$ and φh is not injective, then
µ({x ∈ X : σh,ε (φh (x)) > δ}) ≤ $C\varepsilon^{k-\overline{D}_B-\theta}$.
(iii) If $k > 2\underline{D}_B$ or φh is injective, then
{x ∈ X : σh,ε (φh (x)) > δ} = ∅.
If µ is additionally ergodic, then the assumption $\mu(\bigcup_{p=1}^{k}\mathrm{Per}_p(T)) = 0$ may be omitted.

Remark 1.17. Note that Theorem 1.16 does not hold if we replace the dimensions $D_H, \underline{D}_B, \overline{D}_B$ in
assertions (i)–(iii) by one notion of dimension, for any choice among ID(µ), $D_H$, $\underline{D}_B$, $\overline{D}_B$. Indeed,
[BGŚ22b, Proposition 8.3] shows that Theorem 1.16.(ii)–(iii) do not hold for either ID(µ) or $D_H$,
while Remark 1.15 indicates that Theorem 1.16.(i) is not true for either $\underline{D}_B$ or $\overline{D}_B$ (to see this, it
is sufficient to consider any example of the measure µ with $D_H < k < \underline{D}_B$ for some k ∈ N). On the
other hand, if µ is exact dimensional (see Subsection 2.4), then ID(µ) = $D_H$ (see [You82, FLR02]).

We complete our results on the Schroer–Sauer–Ott–Yorke prediction error conjecture with a dis-
cussion on the relation between h and δ, presenting a non-trivial example for which the lower bounds
in the conjecture do not hold in its alternative interpretation, even in the class of natural measures
for smooth axiom A diffeomorphisms of compact Riemannian manifolds. This issue is clarified in
the following remark and the discussion afterwards.

Remark 1.18. The assertions of Theorems 1.14 and 1.16 are valid for every δ > 0 small enough
depending on h, i.e. for δ < δ0 with δ0 = δ0 (h). An alternative interpretation of the Schroer–Sauer–Ott–Yorke
prediction error conjecture leads to the question whether the suitable estimates hold for
typical h with a fixed δ. This is certainly true in the case of the upper bounds in assertions (ii)–(iv)
of the conjecture, as for a given observable h, if an upper bound holds for δ0 , then it also holds
for all δ > δ0 . The situation is different for the lower bound, for a trivial reason: if for a given
observable h we take δ much larger than diam(h(X)), then σh,ε (y) ≤ δ for all ε, and hence the set
{x ∈ X : σh,ε (φh (x)) > δ} is empty. This shows that the range (0, δ0 ) for which a lower bound holds
must depend on h.
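The trivial reason can be quantified. Every coordinate of φh ∘ T takes values in h(X), which lies in an interval of length L = diam(h(X)), and a probability distribution supported on such an interval has variance at most L²/4 (Popoviciu's inequality). Summing over the k coordinates gives $\sigma_{h,\varepsilon}(y) \le \sqrt{k}\,L/2$, so the set {x ∈ X : σh,ε(φh(x)) > δ} is empty as soon as δ ≥ √k·L/2. The sketch below checks this bound on arbitrary random vectors and shows that it is attained by a two-point distribution; the function name `spread` and the sample data are illustrative assumptions.

```python
import math
import random
from typing import Sequence, Tuple


def spread(vectors: Sequence[Tuple[float, ...]]) -> float:
    """Root-mean-square distance to the mean: the empirical analogue of sigma_{h,eps}."""
    n, k = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(k)]
    return math.sqrt(sum(math.dist(v, mean) ** 2 for v in vectors) / n)


L, k = 2.0, 3                    # L plays the role of diam(h(X))
bound = math.sqrt(k) * L / 2     # Popoviciu bound, summed over k coordinates

rng = random.Random(0)
samples = [tuple(rng.uniform(0.0, L) for _ in range(k)) for _ in range(1000)]
extremal = [(0.0,) * k, (L,) * k]  # equal mass at two opposite corners of [0, L]^k
```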

In view of the above remark, it is natural to consider non-constant observables and ask whether
the lower bounds in assertions (i)–(ii) of the prediction error conjecture hold for all δ < δ0 with
δ0 locally almost independent of h, i.e. whether for a given non-constant observable h0 there exists
δ0 > 0 such that the lower bounds are valid for every δ < δ0 and typical h in some neighbourhood
of h0 (so that we can in particular consider h with diam(h(X)) uniformly bounded from below).
The following theorem shows that this is not the case. In fact, it remains untrue even after setting
D = dimH µ or D = dimB X, so one cannot obtain the corresponding lower bounds in Theorem 1.16
with δ0 locally almost independent of non-constant h.

Theorem 1.19 (Counterexample). There exists a compact Riemannian manifold M , a $C^\infty$-axiom A
diffeomorphism T : M → M with a compact T-invariant attractor X ⊂ M and a natural
measure µ on X, such that ID(µ) = dimH µ = dimH X = dimB X = D = 3/2 and a non-constant
Lipschitz observable h0 : X → R, such that the following holds. For k = 1, 2 and every δ > 0,
there exists ε0 > 0 and an open set U ⊂ Lip(X, R) containing h0 such that for every h ∈ U , the
corresponding k-delay coordinate map φh is not injective and for every 0 < ε < ε0 there holds

{x ∈ X : σh,ε (φh (x)) > δ} = ∅.

Consequently, for k = 1 we have k < D and there is no C = C(δ, h) > 0, such that

µ({x ∈ X : σh,ε (φh (x)) > δ}) ≥ C,

so assertion (i) of the conjecture does not hold when the range of allowable δ is locally almost
independent of h. Similarly, specifying to k = 2 we have D < k < 2D and so the lower bound in
assertion (ii) of the conjecture fails when the range of allowable δ is locally almost independent of h.

The proof of Theorem 1.19 is presented in Section 6. It remains an open problem whether
assertion (ii) of the prediction error conjecture is true if the range of δ is allowed to depend on h.
1.4. Dimension of conditional measures. To show Theorems 1.7, 1.10 and 1.14, we prove more
general results on the dimension of conditional measures for the measure µ with respect to the
delay coordinate map, which may be interesting on their own. Following [Sim12] (see also [BGŚ22a,
Subsection 2.4]), for a Borel map φ : X → Rk on a compact set X ⊂ RN and a (complete) Borel
probability measure µ on X, we define a system of measures µφ,y , y ∈ Rk , where µφ,y is a (possibly
zero) Borel measure on φ−1 (y) defined as the weak-∗ limit
$$\mu_{\phi,y} = \lim_{\delta \to 0} \frac{1}{\mu(\phi^{-1}(B(y,\delta)))}\, \mu|_{\phi^{-1}(B(y,\delta))}, \tag{7}$$

whenever the limit exists, and zero otherwise. By the topological Rohlin disintegration theorem
[Sim12], the limit in (7) exists for φµ-almost every y ∈ Rk and satisfies
$$\mu(E) = \int_{\mathbb{R}^k} \mu_{\phi,y}(E)\, d(\phi\mu)(y) \quad \text{for every } \mu\text{-measurable } E \subset X \tag{8}$$

(in particular, the function $\mathbb{R}^k \ni y \mapsto \mu_{\phi,y}(E)$ in (8) is φµ-measurable) and

(9) µφ,y (φ−1 (y)) = 1 for φµ-almost every y ∈ Rk .

The system {µφ,y }y∈Rk is called the system of conditional measures for µ with respect to φ. Moreover,
the conditions (8) and (9) characterize the system {µφ,y }y∈Rk uniquely (φµ-almost surely). See
[Sim12] for details.
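For a finitely supported measure µ the limit (7) is elementary: once δ is smaller than the minimal distance between distinct points of φ(supp µ), the set φ⁻¹(B(y, δ)) meets supp µ exactly in the fibre φ⁻¹(y), so µφ,y is the normalized restriction of µ to that fibre for y ∈ φ(supp µ), and zero otherwise. The sketch below computes these conditional measures with exact fractions and checks (8) and (9). The six-point uniform measure, the map φ(x) = x mod 3 and the helper names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Callable, Dict, Hashable, Tuple

Measure = Dict[Hashable, Fraction]  # finitely supported measure: point -> mass


def conditional_measures(mu: Measure,
                         phi: Callable) -> Tuple[Measure, Dict[Hashable, Measure]]:
    """Return the push-forward phi mu and the conditional measures mu_{phi,y}."""
    push: Measure = defaultdict(Fraction)
    for x, w in mu.items():
        push[phi(x)] += w
    cond = {y: {x: w / push[y] for x, w in mu.items() if phi(x) == y}
            for y in push}
    return dict(push), cond


def measure_of(nu: Measure, E: set) -> Fraction:
    return sum((w for x, w in nu.items() if x in E), Fraction(0))


def phi(x: int) -> int:
    """Hypothetical map whose fibres are {0, 3}, {1, 4}, {2, 5}."""
    return x % 3


mu = {x: Fraction(1, 6) for x in range(6)}
push, cond = conditional_measures(mu, phi)

E = {0, 1, 4}
lhs = measure_of(mu, E)                                    # left side of (8)
rhs = sum(measure_of(cond[y], E) * push[y] for y in push)  # right side of (8)
```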
For a Borel transformation T : X → X on a Borel set X ⊂ RN with a Borel probability measure
µ on X and a k-delay coordinate map φh : X → Rk corresponding to a Borel observable h : X → R,
we consider a system of conditional measures {µh,y }y∈Rk for the (completion of) µ with respect
to φh , where for simplicity, we write µh,y instead of µφh ,y . A direct relation between the almost
sure injectivity of the delay coordinate maps, predictability of observables and the properties of the
system of conditional measures is described in the following lemma.

Lemma 1.20. Let X ⊂ RN , N ∈ N, be a compact set, let µ be a Borel probability measure on X
and let T : X → X be a continuous map. Consider a continuous observable h : X → R and fix k ∈ N.
Then the following hold.
(a) The k-delay coordinate map φh is almost surely injective if and only if µh,y is a Dirac’s
measure for φh µ-almost every y ∈ Rk .
(b) The observable h is almost surely k-predictable if and only if φh ◦T (µh,y ) is a Dirac’s measure
for φh µ-almost every y ∈ Rk .
(c) For φh µ-almost every y ∈ Rk , the prediction error σh (y) = limε→0 σh,ε (y) exists and is equal
to the standard deviation of a random variable with probability distribution φh ◦ T (µh,y ).
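To make assertion (c) concrete, here is a numerical sketch on a hypothetical toy system that does not appear in the paper. The system is the circle rotation $T(x) = x + \theta \bmod 1$ with Lebesgue measure $\mu$, the observable $h(x) = \cos(2\pi x)$, and $k = 1$.

For $y = h(x_0)$ with $x_0 \notin \{0, 1/2\}$, the fibre $\phi_h^{-1}(y) = \{x_0, 1-x_0\}$ carries equal conditional weights, because the symmetry $x \mapsto 1-x$ preserves $\mu$ and $h$. So $\phi_h \circ T(\mu_{h,y})$ is an equally weighted two-point distribution with standard deviation $|\sin(2\pi\theta)\sin(2\pi x_0)|$.

The sketch estimates this quantity from samples whose delay coordinate lies within $\delta$ of $y$, in the spirit of (7). This sample estimator is our own assumption and is not the paper's definition of $\sigma_{h,\varepsilon}$.

```python
import numpy as np

THETA = 0.3   # rotation number of the hypothetical toy system


def h(t):
    """Toy observable h(x) = cos(2*pi*x) on the circle R/Z."""
    return np.cos(2.0 * np.pi * t)


def T(t):
    """Toy dynamics: the rotation x -> x + THETA (mod 1)."""
    return (t + THETA) % 1.0


def prediction_spread(x0, delta=1e-3, n=1_000_000, seed=1):
    """Standard deviation of h(T x) over samples x ~ Lebesgue with
    |h(x) - h(x0)| < delta, i.e. a sample version of the spread of
    phi_h o T(mu_{h,y}) for k = 1 and y = h(x0)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)          # samples from mu
    near = np.abs(h(x) - h(x0)) < delta         # the set phi_h^{-1}(B(y, delta))
    return float(np.std(h(T(x[near]))))         # population standard deviation


x0 = 0.1
estimate = prediction_spread(x0)
exact = abs(np.sin(2.0 * np.pi * THETA) * np.sin(2.0 * np.pi * x0))
print(estimate, exact)   # close but not equal: finite delta and sample size
```

A positive spread means $\phi_h \circ T(\mu_{h,y})$ is not a Dirac measure, so by (b) this $h$ is not almost surely $1$-predictable.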

The proof of Lemma 1.20 is presented in Section 3.

Our first result on the system of conditional measures is related to almost sure injectivity of delay coordinate maps and is a generalization of assertion (a) of Theorem 1.7.

Theorem 1.21. Let $X \subset \mathbb{R}^N$, $N \in \mathbb{N}$, be a compact set, let $T : X \to X$ be a Lipschitz map and let $\mu$ be a Borel probability measure on $X$. Fix $k \in \mathbb{N}$. Then, for a prevalent Lipschitz observable $h : X \to \mathbb{R}$ and the $k$-delay coordinate map $\phi_h$, the following hold.
(a) $\dim_H T^{k-1}(\mu_{h,\phi_h(x)}) \ge \dim_H T^{k-1}\mu - k$ for $\mu$-almost every $x \in X$.
(b) For every $\varepsilon > 0$, there holds $\dim_H T^{k-1}(\mu_{h,\phi_h(x)}) \ge \dim_H T^{k-1}\mu - k - \varepsilon$ for $x$ from a set of positive $\mu$-measure.
The second result is related to almost sure predictability and generalizes assertion (b) of Theorem 1.7.
Theorem 1.22. Let $X \subset \mathbb{R}^N$, $N \in \mathbb{N}$, be a compact set, let $T : X \to X$ be a Lipschitz map and let $\mu$ be a Borel probability measure on $X$. Fix $k \in \mathbb{N}$ and assume $\mu(\mathrm{PrePer}_k(T)) = 0$. Then, for a prevalent Lipschitz observable $h : X \to \mathbb{R}$ and the $k$-delay coordinate map $\phi_h$, the following hold.
(a) $\dim_H \phi_h \circ T(\mu_{h,\phi_h(x)}) \ge \min\{1, \dim_H T^k\mu - k\}$ for $\mu$-almost every $x \in X$.
(b) For every $\varepsilon > 0$, there holds $\dim_H \phi_h \circ T(\mu_{h,\phi_h(x)}) \ge \min\{1, \dim_H T^k\mu - k - \varepsilon\}$ for $x$ from a set of positive $\mu$-measure.
In Section 3 we show how Theorems 1.21–1.22 imply Theorems 1.7, 1.10 and 1.14, while the proofs
of Theorems 1.21–1.23 are presented in Section 5. As an important step in the proofs, in Section 5
we also show the following result, which might be of independent interest.
Theorem 1.23. Let $X \subset \mathbb{R}^N$, $N \in \mathbb{N}$, be a compact set, let $\mu$ be a Borel probability measure on $X$ and let $T : X \to X$ be a Lipschitz map. Fix $k \in \mathbb{N}$ such that $k < \dim_H T^{k-1}\mu$ and (in the case $k > 1$) assume $\mu(\mathrm{PrePer}_{k-1}(T)) = 0$. Then for a prevalent Lipschitz observable $h : X \to \mathbb{R}$ and the $k$-delay coordinate map $\phi_h$, the measure $\phi_h\mu$ is absolutely continuous with respect to the $k$-dimensional Lebesgue measure in $\mathbb{R}^k$. If, additionally, $\mu$ is $T$-invariant or $T$ is bi-Lipschitz onto its image, then the condition $k < \dim_H T^{k-1}\mu$ may be replaced by $k < \dim_H\mu$ and the condition $\mu(\mathrm{PrePer}_{k-1}(T)) = 0$ may be replaced by $\mu\big(\bigcup_{p=1}^{k-1}\mathrm{Per}_p(T)\big) = 0$.
In the appendix to this paper, we prove an additional result (Theorem A.1) on the local dimen-
sions of the measure φh µ, which extends [SY97, Theorem 3.5] (see also [SY97, Remark 4.4] for a
version concerning delay coordinate maps) to a general setup of Lipschitz systems and non-invariant
measures.
1.5. Note on the proofs. The proofs of Theorems 1.21–1.23 are adaptations of the proofs of Marstrand–Mattila-type projection theorems from the classical theory of orthogonal projections (see [Mat95] for a comprehensive study) to the dynamical setting of delay coordinate maps. More specifically, the proofs of Theorems 1.21 and 1.22 are inspired by the proofs of the slicing theorem for measures [JM98, Theorem 3.3] (see also [Mat95, Theorem 10.7]), while Theorem 1.23 is modelled after the Marstrand–Mattila projection theorem for measures [HT94, Theorem 6.1] (see also [Mat95, Theorem 9.7]). In the dynamical setting of delay coordinate maps, the two key elements of the proofs are establishing a correct version of the transversality property with respect to the parameter and carefully treating the obstructions arising from the existence of (pre)periodic points; see Section 4 for details. We refer to [Sol23] for an exposition of the concept of transversality in the context of orthogonal projections and iterated function systems; in particular, see [Sol23, Section 4.2] for an explanation of the terminology. A similar task of transferring the Marstrand–Mattila projection theorem on dimension preservation [Mar54, Mat75] to the setup of delay coordinate maps was performed by Sauer and Yorke in [SY97], and we extend their result in Appendix A.
1.6. Structure of the paper. In Section 2 we provide necessary deﬁnitions and results concerning,
among others, several notions of dimensions of sets and measures used in this paper, as well as a dis-
cussion on the notion of prevalence. In Section 3 we describe relations between conditional measures
with respect to the delay coordinate map and injectivity/predictability properties of observables,
proving Lemma 1.20 and showing how Theorems 1.21–1.22 imply Theorems 1.7, 1.10 and 1.14. Sec-
tion 4 introduces the key technical tools used in the proofs of the main results of the paper. More
precisely, in Subsection 4.1 we provide estimates of some ‘energy type’ integrals, Subsection 4.2
contains basic facts on observation matrices that are used for checking prevalence of suitable observ-
ables, in Subsection 4.3 we describe a convenient decomposition of the phase space X, Subsection 4.4
considers ‘restricted’ conditional measures, while Subsection 4.5 introduces ‘geometric slices’ of the
measure and discusses their relation to the system of conditional measures. In Section 5 we prove the
results on dimensions of conditional measures for the measure µ with respect to the delay coordinate
map (Theorems 1.21–1.22) together with Theorem 1.23. In Section 6 we prove Theorem 1.19 by providing suitable examples. Section 7 contains a discussion of the assumptions of the main results of the paper, presenting several examples showing their necessity. Finally, in Appendix A, we prove results on the local dimensions of the push-forward of the measure µ by the delay coordinate maps (Theorem A.1, Corollary A.2 and Theorem A.3).

2. Preliminaries
By $\mathbb{N}$ we denote the set of positive integers. The symbols $\|\cdot\|$, $\mathrm{dist}(\cdot,\cdot)$ and $|\cdot|$ denote, respectively, the Euclidean norm, distance and diameter in $\mathbb{R}^N$, $N \in \mathbb{N}$. We set $\|(x_1,\ldots,x_N)\|_\infty = \max\{|x_1|,\ldots,|x_N|\}$ for $(x_1,\ldots,x_N) \in \mathbb{R}^N$. The open $r$-ball around a point $x \in \mathbb{R}^N$ is denoted by $B_N(x,r)$ and the closed ball by $\overline{B}_N(x,r)$ (sometimes we omit the dimension $N$ in notation). By $\mathrm{Leb}$ we denote the Lebesgue measure (or the outer Lebesgue measure, in the case of non-measurable sets).

2.1. Singular values. Let $\psi : \mathbb{R}^m \to \mathbb{R}^k$ be a linear map and let $A$ be the matrix of $\psi$. For $p \in \{1,\ldots,k\}$ let $\sigma_p(A)$ (we also use the symbol $\sigma_p(\psi)$) be the $p$th largest singular value of $A$, i.e. the $p$th largest square root of an eigenvalue of the matrix $A^*A$ (counted with multiplicities). It is well known (see e.g. [Rob11, Lemma 14.2]) that the rank of $A$ equals the number of non-zero singular values of $A$.
For $k \le m$, the matrix $A$ admits the singular value decomposition in the form $A = U\Sigma V^T$, where $U$ is a $k \times k$ orthogonal matrix, $V$ is an $m \times m$ orthogonal matrix, and $\Sigma$ is the $k \times m$ rectangular diagonal matrix of the form
$$\Sigma = \left(\begin{array}{ccc|c} \sigma_1(A) & & 0 & \\ & \ddots & & 0 \\ 0 & & \sigma_k(A) & \end{array}\right).$$
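Both facts above are easy to check numerically. The following sketch uses NumPy's `numpy.linalg.svd` and `numpy.linalg.matrix_rank`, on an arbitrary $3 \times 4$ example matrix of our own with rank 2. It counts the non-zero singular values up to a floating-point tolerance. It then rebuilds $A = U\Sigma V^T$ with $\Sigma$ the rectangular diagonal matrix described above.

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0, 1.0],
              [2.0, 4.0, 0.0, 2.0],    # twice the first row
              [0.0, 1.0, 1.0, 0.0]])   # k = 3, m = 4, rank 2
U, s, Vt = np.linalg.svd(A)            # s = (sigma_1(A), sigma_2(A), sigma_3(A))

# Rank = number of non-zero singular values (up to floating-point tolerance,
# using the same default tolerance as numpy.linalg.matrix_rank).
tol = s.max() * max(A.shape) * np.finfo(float).eps
rank_from_sv = int(np.count_nonzero(s > tol))

# Rebuild the k x m rectangular diagonal matrix Sigma and check A = U Sigma V^T.
Sigma = np.zeros(A.shape)
Sigma[:, :len(s)] = np.diag(s)
assert np.allclose(U @ Sigma @ Vt, A)
print(s, rank_from_sv, np.linalg.matrix_rank(A))
```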
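We will use the following lemma, proved as [SYC91, Lemma 4.2] (see also [Rob11, Lemma 14.3]).

Lemma 2.1. Let $\psi : \mathbb{R}^m \to \mathbb{R}^k$ be a linear transformation. Assume $\sigma_p(\psi) > 0$ for some $p \in \{1,\ldots,k\}$. Then for every $z \in \mathbb{R}^k$ and $\rho, \varepsilon > 0$,
$$\frac{\mathrm{Leb}(\{\alpha \in B_m(0,\rho) : \|\psi(\alpha) + z\| \le \varepsilon\})}{\mathrm{Leb}(B_m(0,\rho))} \le C\left(\frac{\varepsilon}{\sigma_p(\psi)\,\rho}\right)^p,$$
where $C > 0$ depends only on $m, k$.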
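A quick Monte Carlo sanity check of the scaling in Lemma 2.1 is sketched below. It is illustrative only: the matrix, the sample size and the function name are our own choices, and the constant $C$ is not computed.

We take $A = \mathrm{diag}(2, 0.5)$ padded with a zero column, so $m = 3$ and $k = p = 2$, and we set $z = 0$. The normalized volume of $\{\alpha \in B_3(0,1) : \|A\alpha\| \le \varepsilon\}$ should then scale like $\varepsilon^2$. Accordingly, the printed ratio to $(\varepsilon/\sigma_2(A))^2$ should stay roughly constant.

```python
import numpy as np


def small_image_fraction(A, z, eps, rho=1.0, n=1_000_000, seed=0):
    """Monte Carlo estimate of
    Leb({alpha in B_m(0, rho) : ||A alpha + z|| <= eps}) / Leb(B_m(0, rho))."""
    rng = np.random.default_rng(seed)
    m = A.shape[1]
    cube = rng.uniform(-rho, rho, size=(n, m))
    ball = cube[np.linalg.norm(cube, axis=1) < rho]   # uniform samples in B_m(0, rho)
    return float(np.mean(np.linalg.norm(ball @ A.T + z, axis=1) <= eps))


A = np.array([[2.0, 0.0, 0.0],
              [0.0, 0.5, 0.0]])                   # m = 3, k = 2
sigma_p = np.linalg.svd(A, compute_uv=False)[1]   # p = 2: sigma_2(A) = 0.5
z = np.zeros(2)
for eps in (0.2, 0.1, 0.05):
    frac = small_image_fraction(A, z, eps)
    print(eps, frac, frac / (eps / sigma_p) ** 2)   # roughly constant in eps
```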

2.2. Measures. Let µ be a Borel measure on a Borel set X ⊂ RN . A Borel set Y ⊂ X is called
a full-measure subset of X, if µ(X \ Y ) = 0. By supp µ we denote the (topological) support of µ,
which is the smallest closed subset of full µ-measure. Dirac’s measure at a point x, denoted by δx ,
is the Borel probability measure supported on {x}. For Borel measures µ, ν in RN we write ν ≪ µ,
if ν is absolutely continuous with respect to µ, i.e. if ν(E) = 0 whenever µ(E) = 0 for Borel sets
E ⊂ RN . For a Borel map φ : X → Rk , k ∈ N, by φµ we denote the push-forward of µ under φ,
deﬁned by φµ(E) = µ(φ−1 (E)) for Borel sets E ⊂ Rk .
Let T : X → X be a Borel map. The measure µ is T -invariant, if µ(T −1 (E)) = µ(E) for every
Borel set E ⊂ X. The measure µ is ergodic, if for every Borel set E ⊂ X such that T −1 (E) = E,
there holds µ(E) = 0 or µ(X \ E) = 0.

Definition 2.2 (Natural measure). Let $M$ be a compact Riemannian manifold and $T : M \to M$ a $C^1$-diffeomorphism. A compact $T$-invariant set $X \subset M$ is called an attractor if there exists an open set $B \subset M$ containing $X$ such that $\lim_{n\to\infty}\mathrm{dist}(T^nx, X) = 0$ for every $x \in B$. The largest such set $B(X)$ is called the (maximal) basin of attraction of $X$. A $T$-invariant Borel probability measure $\mu$ on $X$ is called a natural measure if
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\delta_{T^ix} = \mu$$
for almost every $x \in B(X)$ with respect to the volume measure on $M$, where the limit is taken in the weak-$*$ topology.
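Weak-$*$ convergence of the empirical measures $\frac1n\sum_{i<n}\delta_{T^ix}$ means that orbit averages of every continuous function converge to its $\mu$-integral. The sketch below checks this for a single test function on the simplest example we know of, which is our own choice and not taken from the paper: an irrational rotation of the circle $\mathbb{R}/\mathbb{Z}$.

This rotation is a $C^1$-diffeomorphism of a compact manifold. The whole circle is trivially an attractor. The rotation is uniquely ergodic, so Lebesgue measure is its natural measure. The helper names are hypothetical.

```python
import numpy as np

# Hypothetical toy example: T(x) = x + THETA (mod 1) is a C^1-diffeomorphism
# of the compact manifold M = R/Z; X = M is trivially an attractor with
# basin B(X) = M, and Lebesgue measure is its natural measure.
THETA = (np.sqrt(5.0) - 1.0) / 2.0   # irrational rotation number


def birkhoff_average(g, x, n):
    """Integral of g against the empirical measure (1/n) sum_{i<n} delta_{T^i x}."""
    orbit = (x + THETA * np.arange(n)) % 1.0   # T^i x, computed in closed form
    return float(np.mean(g(orbit)))


def g(t):
    """Continuous test function whose Lebesgue integral over [0, 1) is 1/2."""
    return np.cos(2.0 * np.pi * t) ** 2


for n in (10, 1_000, 100_000):
    print(n, birkhoff_average(g, x=0.123, n=n))
```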

2.3. Prevalence. As noted in the introduction, we understand the genericity in the space of Lipschitz observables on compact sets in the sense of prevalence, a notion introduced by Hunt, Sauer and Yorke in [HSY92], which may be considered as an analogue of the ‘Lebesgue almost sure’ condition in infinite dimensional normed linear spaces.

Definition 2.3. By a (complete) linear metric on a linear space we mean a (complete) metric which makes addition and scalar multiplication continuous. Let $V$ be a complete linear metric space (i.e. a linear space with a complete linear metric). A Borel set $S \subset V$ is called prevalent if there exists a Borel measure $\nu$ in $V$, which is positive and finite on some compact set in $V$, such that for every $v \in V$, there holds $v + e \in S$ for $\nu$-almost every $e \in V$. A non-Borel subset of $V$ is prevalent if it contains a prevalent Borel subset. For more information on prevalence we refer to [HSY92] and [Rob11, Chapter 5].

For a compact set $X \subset \mathbb{R}^N$, $N \in \mathbb{N}$, we consider the space $\mathrm{Lip}(X)$ of Lipschitz functions on $X$, equipped with the Lipschitz norm. With this norm, the space $\mathrm{Lip}(X)$ is a Banach space (in particular, a complete linear metric space). In this paper we use the notion of prevalence in the sense of Definition 2.3 applied to $V = \mathrm{Lip}(X)$. Similarly as in [SYC91, Rob11, BGŚ22a, BGŚ22b], as the measure $\nu$ in Definition 2.3 we employ $\nu = \xi\,\mathrm{Leb}$ for $\xi : \mathbb{R}^m \to \mathrm{Lip}(X)$, $\xi(\alpha_1,\ldots,\alpha_m) = \sum_{j=1}^m \alpha_j h_j$, where $\{h_1,\ldots,h_m\}$ (called the probe set) is the set of all real monomials of $N$ variables of degree at most $d$ for $d = 2k+1$ (in fact, most of the results proved in this paper hold also for $d = 2k-1$), and $\mathrm{Leb}$ is the $m$-dimensional Lebesgue measure in $\mathbb{R}^m$. Then a set $S \subset \mathrm{Lip}(X)$ is prevalent if for every $h \in \mathrm{Lip}(X)$, the function $h + \sum_{j=1}^m \alpha_j h_j$ is in $S$ for Lebesgue-almost every $(\alpha_1,\ldots,\alpha_m) \in \mathbb{R}^m$. Note that whenever prevalence in $\mathrm{Lip}(X)$ is established via a probe set consisting of polynomials as above, it also provides prevalence in the spaces $C^r(X)$, $r = 1, 2, \ldots, \infty$.

2.4. Dimensions. For convenience, we present the definitions of all notions of dimensions that appear in this paper.
Definition 2.4. For $s > 0$, the $s$-dimensional (outer) Hausdorff measure of a set $X \subset \mathbb{R}^N$ is defined as
$$\mathcal{H}^s(X) = \lim_{\delta\to0}\inf\Big\{\sum_{i=1}^\infty |U_i|^s : X \subset \bigcup_{i=1}^\infty U_i,\ |U_i| \le \delta\Big\}.$$
The Hausdorff dimension of $X$ is given as
$$\dim_H X = \inf\{s > 0 : \mathcal{H}^s(X) = 0\} = \sup\{s > 0 : \mathcal{H}^s(X) = \infty\},$$
with the convention $\sup\emptyset = 0$.

Definition 2.5. For a bounded set $X \subset \mathbb{R}^N$ and $\delta > 0$, let $N(X,\delta)$ denote the minimal number of balls of diameter at most $\delta$ required to cover $X$. The lower and upper box-counting (Minkowski) dimensions of $X$ are defined, respectively, as
$$\underline{\dim}_B X = \liminf_{\delta\to0}\frac{\log N(X,\delta)}{-\log\delta}, \qquad \overline{\dim}_B X = \limsup_{\delta\to0}\frac{\log N(X,\delta)}{-\log\delta}.$$
If $\underline{\dim}_B X = \overline{\dim}_B X$, then their common value is denoted by $\dim_B X$ and called the box dimension of $X$.
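Box dimensions are usually estimated numerically by counting occupied grid cells. This is a standard equivalent variant of Definition 2.5: replacing balls of diameter $\delta$ by $\delta$-mesh intervals does not change the box dimensions.

The sketch below is our own illustration. It applies grid counting to a finite generation of the middle-third Cantor set, whose box dimension is $\log 2/\log 3$. Integer arithmetic keeps the counts exact: at every scale $3^{-j}$, the finite sample occupies exactly $2^j$ cells.

```python
from itertools import product

import numpy as np


def cantor_points(level):
    """Left endpoints of the 2**level generation-`level` intervals of the
    middle-third Cantor set, stored exactly as integers in units of 3**-level."""
    return [sum(digit * 3 ** (level - 1 - i) for i, digit in enumerate(digits))
            for digits in product((0, 2), repeat=level)]


def box_counts(points, level, scales):
    """For each j in `scales`, the number of grid intervals [n 3^-j, (n+1) 3^-j)
    containing at least one point (exact integer arithmetic)."""
    return [len({p // 3 ** (level - j) for p in points}) for j in scales]


level = 12
points = cantor_points(level)
scales = list(range(1, level + 1))
counts = box_counts(points, level, scales)
# Slope of log N(delta) against -log(delta), with delta = 3**-j.
slope = np.polyfit([j * np.log(3.0) for j in scales], np.log(counts), 1)[0]
print(slope, np.log(2.0) / np.log(3.0))
```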

Definition 2.6. Let $\mu$ be a finite Borel measure on $\mathbb{R}^N$. The lower and upper local dimensions of $\mu$ at a point $x \in \operatorname{supp}\mu$ are defined, respectively, as
$$\underline{d}(\mu,x) = \liminf_{\delta\to0}\frac{\log\mu(B(x,\delta))}{\log\delta}, \qquad \overline{d}(\mu,x) = \limsup_{\delta\to0}\frac{\log\mu(B(x,\delta))}{\log\delta}.$$
If $\underline{d}(\mu,x) = \overline{d}(\mu,x)$, then their common value is denoted by $d(\mu,x)$ and called the local dimension of $\mu$ at $x$. If $d(\mu,x)$ exists and equals some $d$ for $\mu$-almost every $x$, then we say that $\mu$ is exact dimensional with dimension $d$.
The upper and lower Hausdorff dimensions of $\mu$ are defined, respectively, as
$$\overline{\dim}_H\mu = \inf\{\dim_H E : \mu(\mathbb{R}^N \setminus E) = 0\} = \operatorname*{ess\,sup}_{x\sim\mu}\underline{d}(\mu,x),$$
$$\underline{\dim}_H\mu = \inf\{\dim_H E : \mu(E) > 0\} = \operatorname*{ess\,inf}_{x\sim\mu}\underline{d}(\mu,x).$$

See [Fal97, Propositions 10.2–10.3] for the proof of the equivalence of both variants of the definitions.

The definitions immediately imply that for finite Borel measures $\mu, \nu$ on $\mathbb{R}^N$ the following holds: if $\nu \ll \mu$, then $\overline{\dim}_H\nu \le \overline{\dim}_H\mu$ and $\underline{\dim}_H\nu \ge \underline{\dim}_H\mu$.

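Definition 2.7. For a Borel probability measure $\mu$ in $\mathbb{R}^N$ with compact support, its lower and upper information dimensions are
$$\underline{\mathrm{ID}}(\mu) = \liminf_{\varepsilon\to0}\int_{\operatorname{supp}\mu}\frac{\log\mu(B(x,\varepsilon))}{\log\varepsilon}\,d\mu(x), \qquad \overline{\mathrm{ID}}(\mu) = \limsup_{\varepsilon\to0}\int_{\operatorname{supp}\mu}\frac{\log\mu(B(x,\varepsilon))}{\log\varepsilon}\,d\mu(x).$$
If $\underline{\mathrm{ID}}(\mu) = \overline{\mathrm{ID}}(\mu)$, then we denote their common value as $\mathrm{ID}(\mu)$ and call it the information dimension of $\mu$.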

Definition 2.8. For a finite Borel measure $\mu$ in $\mathbb{R}^N$ and $s > 0$, the $s$-potential of $\mu$ at a point $x \in \mathbb{R}^N$ is defined as
$$\mathcal{E}_s(\mu,x) = \int_{\mathbb{R}^N}\frac{d\mu(y)}{\|x-y\|^s},$$
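while the $s$-energy of $\mu$ is
$$\mathcal{E}_s(\mu) = \int_{\mathbb{R}^N}\mathcal{E}_s(\mu,x)\,d\mu(x) = \int_{\mathbb{R}^N}\int_{\mathbb{R}^N}\frac{d\mu(x)\,d\mu(y)}{\|x-y\|^s}.$$
The correlation dimension of $\mu$ is defined as
$$\dim_c\mu = \sup\{s > 0 : \mathcal{E}_s(\mu) < \infty\},$$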
with the convention sup ∅ = 0. For an explanation of this terminology see [SY97, §2] and [Pes08,
Section 17]. Note that if µ has an atom, then dimc µ = 0. It turns out that the lower local dimension
can be characterized in terms of the local potentials as follows (see [SY97, Section 3.2] and [HK97,
Section 4]).
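In practice, the correlation dimension is often estimated from the correlation sum, the empirical fraction of pairs of sample points at distance less than $r$. This is a Grassberger–Procaccia-type heuristic, not a method used in the paper. The link to $\mathcal{E}_s$ is the layer-cake identity
$$\mathcal{E}_s(\mu) = s\int_0^\infty (\mu\times\mu)(\{(x,y) : \|x-y\| < r\})\,r^{-s-1}\,dr,$$
which shows that the energy is finite roughly when this quantity decays faster than $r^s$.

The sketch below is our own toy example. It samples the Cantor measure, for which one expects a slope close to $\log 2/\log 3$. The sample size, the radii and the helper names are assumptions.

```python
import numpy as np


def cantor_measure_samples(n, depth=30, seed=0):
    """i.i.d. samples from the Cantor measure: ternary digits drawn from {0, 2}."""
    rng = np.random.default_rng(seed)
    digits = 2 * rng.integers(0, 2, size=(n, depth))
    return digits @ (3.0 ** -np.arange(1, depth + 1))


def correlation_sum(dists, r):
    """Fraction of pairs at distance < r: an empirical version of
    (mu x mu)({(x, y) : ||x - y|| < r})."""
    return float(np.mean(dists < r))


x = cantor_measure_samples(1500)
i, j = np.triu_indices(len(x), k=1)
dists = np.abs(x[i] - x[j])                 # all pairwise distances, i < j
radii = 3.0 ** -np.arange(2, 7)
C = np.array([correlation_sum(dists, r) for r in radii])
slope = np.polyfit(np.log(radii), np.log(C), 1)[0]
print(slope, np.log(2) / np.log(3))
```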
Lemma 2.9. Let $\mu$ be a finite Borel measure in $\mathbb{R}^N$. Then for every $x \in \operatorname{supp}\mu$,
$$\underline{d}(\mu,x) = \sup\{s > 0 : \mathcal{E}_s(\mu,x) < \infty\}.$$
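Remark 2.10. If $\mu$ is a finite Borel measure on a bounded Borel set $X$ in $\mathbb{R}^N$, then
$$\underline{\dim}_H\mu \le \overline{\dim}_H\mu \le \dim_H X \le \underline{\dim}_B X \le \overline{\dim}_B X$$
and
$$\underline{\mathrm{ID}}(\mu) \le \underline{\dim}_B X, \qquad \overline{\mathrm{ID}}(\mu) \le \overline{\dim}_B X.$$
In general, there are no relations between Hausdorff and information dimensions. However, it is known that if $\mu$ is a $T$-invariant ergodic measure for a Lipschitz map $T : X \to X$, then $\overline{\dim}_H\mu \le \underline{\mathrm{ID}}(\mu) \le \overline{\mathrm{ID}}(\mu)$ (see [BGŚ22a, Proposition 2.1]).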
The relation between the Hausdorff and correlation dimension is not so direct, but one has the
following result (see e.g. [Mat95, Theorem 8.7 and Remark 8.6]).
Lemma 2.11. Let $E \subset \mathbb{R}^N$ be a Borel set. If $\mu$ is a finite Borel measure on $E$ satisfying $\mathcal{E}_s(\mu) < \infty$ for some $s > 0$, then $\dim_H E \ge s$. Consequently, $\underline{\dim}_H\mu \ge \dim_c\mu$.
For more information on dimension theory in Euclidean spaces see [Fal14, Mat95, Rob11].

3. Relations between conditional measures and injectivity/predictability properties
In this section we prove Lemma 1.20 and explain how Theorems 1.7, 1.10 and 1.14 follow from
Theorems 1.21–1.22.
Proof of Lemma 1.20. To show assertion (a), suppose first that $\mu_{h,y}$ is a Dirac measure for $\phi_h\mu$-almost every $y \in \mathbb{R}^k$. This means that for $\phi_h\mu$-almost every $y \in \mathbb{R}^k$ there exists a (unique) $f(y) \in X$ such that $\mu_{h,y} = \delta_{f(y)}$. Then by (8), for every $\mu$-measurable set $E \subset X$, the set $\{y \in \mathbb{R}^k : f(y) \in E\}$ is $\phi_h\mu$-measurable, so $f$ is measurable. Therefore the set
$$\tilde{X}_h = \{x \in X : f(\phi_h(x)) = x\}$$
is $\mu$-measurable. Note that by (9), we have $f(y) \in \phi_h^{-1}(y)$ for $\phi_h\mu$-a.e. $y$, hence also $\phi_h(f(\phi_h(x))) = \phi_h(x)$ for $\mu$-almost every $x \in X$. Applying $f$ to both sides of the last equality yields $f(\phi_h(x)) \in \tilde{X}_h$ for $\mu$-almost every $x \in X$. Consequently, (8) gives
$$\begin{aligned}
\mu(\tilde{X}_h) &= \int_{\mathbb{R}^k}\mu_{h,y}(\tilde{X}_h)\,d(\phi_h\mu)(y) = \int_{\mathbb{R}^k}\delta_{f(y)}(\tilde{X}_h)\,d(\phi_h\mu)(y) \\
&= \int_X \delta_{f(\phi_h(x))}(\tilde{X}_h)\,d\mu(x) = \int_X 1\,d\mu(x) = 1,
\end{aligned}$$
so $\tilde{X}_h$ is a full $\mu$-measure set. If $x, y \in \tilde{X}_h$ and $\phi_h(x) = \phi_h(y)$, then $x = f(\phi_h(x)) = f(\phi_h(y)) = y$, hence $\phi_h$ is injective on $\tilde{X}_h$. By the regularity of $\mu$, the set $\tilde{X}_h$ contains a full $\mu$-measure Borel subset $X_h$. This shows that $\phi_h$ is almost surely injective.
Conversely, if $\phi_h$ is injective on a Borel set $X_h \subset X$ of full $\mu$-measure, then the system of measures $\{\tilde{\mu}_{h,y}\}_{y\in\mathbb{R}^k}$ given by
$$\tilde{\mu}_{h,y} = \begin{cases} \delta_{(\phi_h|_{X_h})^{-1}(y)} & \text{for } y \in \phi_h(X_h), \\ 0 & \text{otherwise} \end{cases}$$
satisfies the conditions (8)–(9), so it is a system of conditional measures of $\mu$ with respect to $\phi_h$. Hence, by the uniqueness of the system of conditional measures, $\mu_{h,y} = \tilde{\mu}_{h,y}$ is a Dirac measure for $\phi_h\mu$-almost every $y \in \mathbb{R}^k$. The details of this argument together with a precise discussion on measurability issues are presented in [BGŚ22a, Proof of Theorem 3.1].
Assertion (c) is proved as [BGŚ22a, Lemma 3.2]. An immediate consequence of (c) is that for
φh µ-almost every y ∈ Rk , the point y is k-predictable (i.e. σh (y) = 0) if and only if the measure
φh ◦ T (µh,y ) is a Dirac’s measure. As pointed out in Remark 1.11, this condition is also equivalent
to the almost sure k-predictability of h, which proves assertion (b).
Now, supposing Theorems 1.21–1.22 are true, we show how they imply Theorems 1.7, 1.10
and 1.14. We start by proving Theorem 1.14.
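Proof of Theorem 1.14. By assumption, $\dim_H T^k(\mu|_{X\setminus\mathrm{PrePer}_k(T)}) > k > 0$, which implies $\mu(X \setminus \mathrm{PrePer}_k(T)) > 0$. Let
$$\tilde{\mu} = \frac{1}{\mu(X \setminus \mathrm{PrePer}_k(T))}\,\mu|_{X\setminus\mathrm{PrePer}_k(T)}.$$
Then $\tilde{\mu}$ is a Borel probability measure on $X$ such that $\tilde{\mu}(\mathrm{PrePer}_k(T)) = 0$ and $\dim_H T^k\tilde{\mu} > k$.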
Hence, by Theorem 1.22(b) applied for µ̃, for a prevalent Lipschitz observable h : X → R we have
dimH φh ◦ T (µh,φh (x) ) > 0 for every x from a µ̃-positive measure set, hence also for every x from a
µ-positive measure set. Consequently, φh ◦ T (µh,φh (x) ) is not a Dirac’s measure (and consequently,
a random variable with probability distribution φh ◦ T (µh,φh (x) ) has positive standard deviation) for
every x from a µ-positive measure set. By Lemma 1.20(c), this implies that for a prevalent h there
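exists $\delta_0 > 0$ such that
$$\lim_{\varepsilon\to0}\sigma_{h,\varepsilon}(\phi_h(x)) = \sigma_h(\phi_h(x)) > \delta_0$$
for every $x \in Y_h$, where $Y_h \subset X$ is a set of positive $\mu$-measure. Therefore, for every $0 < \delta < \delta_0$,
$$\mu(\{x \in Y_h : \sigma_{h,\varepsilon}(\phi_h(x)) \le \delta\}) \to 0 \quad \text{as } \varepsilon \to 0,$$
so
$$\mu(\{x \in Y_h : \sigma_{h,\varepsilon}(\phi_h(x)) > \delta\}) \to \mu(Y_h) > 0 \quad \text{as } \varepsilon \to 0.$$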
This shows the main assertion of Theorem 1.14.
To prove the additional assertions, note first that if $T$ is bi-Lipschitz onto its image, then by the definition of the Hausdorff dimension of a measure, $\dim_H T^k(\mu|_{X\setminus\mathrm{PrePer}_k(T)}) = \dim_H \mu|_{X\setminus\mathrm{PrePer}_k(T)}$. Moreover, $T$ is injective in this case, so every pre-periodic point is actually periodic and $\mathrm{PrePer}_k(T) = \bigcup_{p=1}^k \mathrm{Per}_p(T)$, so $\dim_H \mu|_{X\setminus\mathrm{PrePer}_k(T)} = \dim_H \mu|_{X\setminus\bigcup_{p=1}^k\mathrm{Per}_p(T)}$. Consequently, it suffices to assume
$$k < \dim_H \mu|_{X\setminus\bigcup_{p=1}^k\mathrm{Per}_p(T)} \quad \text{instead of} \quad k < \dim_H T^k(\mu|_{X\setminus\mathrm{PrePer}_k(T)}).$$
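Suppose now $\mu$ is $T$-invariant. Since $\bigcup_{p=1}^k \mathrm{Per}_p(T) \subset T^{-k}\big(\bigcup_{p=1}^k \mathrm{Per}_p(T)\big)$ and, by invariance, both sets have the same $\mu$-measure, we have
$$\mu\Big(T^{-k}\Big(\bigcup_{p=1}^k \mathrm{Per}_p(T)\Big) \setminus \bigcup_{p=1}^k \mathrm{Per}_p(T)\Big) = 0.$$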
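Hence, using the fact that $\bigcup_{p=1}^k \mathrm{Per}_p(T) \subset \mathrm{PrePer}_k(T) \subset T^{-k}\big(\bigcup_{p=1}^k \mathrm{Per}_p(T)\big)$, we obtain
$$\mu\Big(T^{-k}\Big(\bigcup_{p=1}^k \mathrm{Per}_p(T)\Big) \setminus \mathrm{PrePer}_k(T)\Big) = 0$$
and, consequently,
$$\begin{aligned}
T^k(\mu|_{X\setminus\mathrm{PrePer}_k(T)})(E) &= \mu(T^{-k}(E) \setminus \mathrm{PrePer}_k(T)) \\
&= \mu\Big(T^{-k}(E) \setminus T^{-k}\Big(\bigcup_{p=1}^k \mathrm{Per}_p(T)\Big)\Big) + \mu\Big(T^{-k}(E) \cap T^{-k}\Big(\bigcup_{p=1}^k \mathrm{Per}_p(T)\Big) \setminus \mathrm{PrePer}_k(T)\Big) \\
&= \mu\Big(T^{-k}(E) \setminus T^{-k}\Big(\bigcup_{p=1}^k \mathrm{Per}_p(T)\Big)\Big) = \mu\Big(E \setminus \bigcup_{p=1}^k \mathrm{Per}_p(T)\Big) = \mu|_{X\setminus\bigcup_{p=1}^k\mathrm{Per}_p(T)}(E)
\end{aligned}$$
for Borel sets $E$. This implies that if $\mu$ is $T$-invariant, then $T^k(\mu|_{X\setminus\mathrm{PrePer}_k(T)}) = \mu|_{X\setminus\bigcup_{p=1}^k\mathrm{Per}_p(T)}$, which gives the required assertion.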
Suppose now $\mu$ is $T$-invariant and ergodic. If $\mu\big(\bigcup_{p=1}^k \mathrm{Per}_p(T)\big) = 0$, then $T^k(\mu|_{X\setminus\mathrm{PrePer}_k(T)}) = \mu|_{X\setminus\bigcup_{p=1}^k\mathrm{Per}_p(T)} = \mu$, which ends the proof. Otherwise, $\mu$ is supported on a single periodic orbit (see e.g. [BGŚ20, Remark 4.4]; the proof therein is for injective $T$, but it can be modified in a straightforward manner to the general case). Then $\dim_H\mu = 0 = \dim_H T^k(\mu|_{X\setminus\mathrm{PrePer}_k(T)})$, hence the proof is finished.

Proof of Theorem 1.7. Suppose $k < \dim_H T^{k-1}\mu$. Then by Theorem 1.21(b), we have
$$\dim_H T^{k-1}(\mu_{h,\phi_h(x)}) > 0$$
for $x$ from a positive $\mu$-measure set, so $T^{k-1}(\mu_{h,\phi_h(x)})$ (and hence $\mu_{h,\phi_h(x)}$) is not a Dirac measure for $x$ from a positive $\mu$-measure set. Consequently, $\mu_{h,y}$ is not a Dirac measure for $y$ from a positive $\phi_h\mu$-measure set. By Lemma 1.20(a), this implies the first part of assertion (a). If, additionally, $\mu$ is $T$-invariant or $T$ is bi-Lipschitz onto its image, then $\dim_H T^{k-1}\mu = \dim_H\mu$, which shows the second part of assertion (a).
To show assertion (b), note that if $h$ is almost surely $k$-predictable, then $\sigma_{h,\varepsilon}(\phi_h(x)) \to 0$ as $\varepsilon \to 0$ for $\mu$-almost every $x \in X$ (see Remark 1.11), so $\mu(\{x \in X : \sigma_{h,\varepsilon}(\phi_h(x)) > \delta\}) \to 0$ as $\varepsilon \to 0$ for every $\delta > 0$. Therefore, assertion (b) follows directly from Theorem 1.14.

Proof of Theorem 1.10. In view of Theorem 1.7(a), to show assertion (a) it is sufficient to construct a Borel probability measure $\mu$ on $X$ with $\dim_H T^{k-1}\mu > k$. To this end, note that since $\dim_H T^{k-1}(X) > k$, Frostman's lemma (see e.g. [Mat95, Theorem 8.8]) implies that there exists a Borel probability measure $\nu$ on $T^{k-1}(X)$ with $\dim_H\nu > k$. As $T$ is continuous, the Kuratowski–Ryll–Nardzewski selection theorem [Kec95, Theorem 12.13] ensures that there exists a Borel partial inverse to $T^{k-1}$, i.e. a Borel map $F : T^{k-1}(X) \to X$ satisfying $T^{k-1}(F(y)) = y$ for $y \in T^{k-1}(X)$ (to apply the theorem, we use the fact that a continuous image of an open subset of $X$ is Borel, as a countable union of compact sets). Set $\mu = F\nu$. Then $\mu$ is a Borel probability measure on $X$ such that $T^{k-1}\mu = \nu$. Hence, $\dim_H T^{k-1}\mu = \dim_H\nu > k$. This shows assertion (a).
For assertion (b), using Theorem 1.7(b), it is sufficient to construct a Borel probability measure $\mu$ on $X \setminus \mathrm{PrePer}_k(T)$ with $\dim_H T^k\mu > k$. Similarly as previously, Frostman's lemma implies the existence of a Borel probability measure $\nu$ on $T^k(X) \setminus \bigcup_{p=1}^k \mathrm{Per}_p(T)$ with $\dim_H\nu > k$. Taking $F : T^k(X) \setminus \bigcup_{p=1}^k \mathrm{Per}_p(T) \to X$ to be a Borel partial inverse to $T^k$ and setting $\mu = F\nu$, we obtain $\dim_H T^k\mu > k$. To end the proof, it is enough to notice that $\bigcup_{p=1}^k \mathrm{Per}_p(T) = T^k(\mathrm{PrePer}_k(T))$, so
$$F\Big(T^k(X) \setminus \bigcup_{p=1}^k \mathrm{Per}_p(T)\Big) \subset X \setminus \mathrm{PrePer}_k(T),$$
and hence $\mu$ is supported on $X \setminus \mathrm{PrePer}_k(T)$.

4. Technical tools
4.1. Energy integral estimates. The following lemma can be proved by the same arguments as
[SY97, Lemma 2.6]. For the reader’s convenience, we include a complete proof.

Lemma 4.1. Let $A$ be the matrix of a linear transformation $\psi : \mathbb{R}^m \to \mathbb{R}^k$, $m, k \in \mathbb{N}$, $m \ge k$, and let $p \in \{1,\ldots,k\}$ be such that the $p$th largest singular value $\sigma_p(A)$ is positive. Then for every $b \in \mathbb{R}^k$ and $0 < s < p$,
$$\int_{B_m(0,1)}\frac{d\alpha}{\|A\alpha + b\|^s} \le \frac{C}{(\sigma_p(A))^s}, \tag{10}$$
where $C > 0$ depends only on $m$, $k$, $p$ and $s$.

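Proof. First, we prove (10) for $b = 0$. Let $A = U\Sigma V^T$ be the singular value decomposition of $A$ (see Subsection 2.1). As $U$ and $V$ are orthogonal, we have, for $\alpha = (\alpha_1,\ldots,\alpha_m)$,
$$\begin{aligned}
\int_{B_m(0,1)}\frac{d\alpha}{\|A\alpha\|^s} &= \int_{B_m(0,1)}\frac{d\alpha}{\|\Sigma\alpha\|^s} = \int_{B_m(0,1)}\Big(\sum_{j=1}^k(\sigma_j(A))^2\alpha_j^2\Big)^{-s/2}d\alpha_1\cdots d\alpha_m \\
&\le \int_{B_m(0,1)}\Big((\sigma_p(A))^2\sum_{j=1}^p\alpha_j^2\Big)^{-s/2}d\alpha_1\cdots d\alpha_m = \frac{c}{(\sigma_p(A))^s}
\end{aligned} \tag{11}$$
for
$$c = \int_{B_m(0,1)}\Big(\sum_{j=1}^p\alpha_j^2\Big)^{-s/2}d\alpha_1\cdots d\alpha_m,$$
which is finite since $s < p$.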
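This yields (10) with $b = 0$. To prove (10) for an arbitrary $b \in \mathbb{R}^k$, consider first the case when $\|b\| \ge 2\|A\|$, where $\|A\|$ denotes the matrix norm. Then $\|A\alpha + b\| \ge \|A\alpha\|$ for $\alpha \in B_m(0,1)$, so by (11),
$$\int_{B_m(0,1)}\frac{d\alpha}{\|A\alpha+b\|^s} \le \int_{B_m(0,1)}\frac{d\alpha}{\|A\alpha\|^s} \le \frac{c}{(\sigma_p(A))^s},$$
providing (10) in this case. For the remaining case, let $\|b\| < 2\|A\|$ and write $b = b_1 + b_2$, where $b_1 \in \operatorname{Im}A$ and $b_2 \in (\operatorname{Im}A)^\perp$. Then $\|b_1\| \le \|b\| < 2\|A\|$, so there exists $\alpha_0 \in B_m(0,2)$ such that $b_1 = A\alpha_0$. Therefore,
$$\begin{aligned}
\int_{B_m(0,1)}\frac{d\alpha}{\|A\alpha+b\|^s} &= \int_{B_m(0,1)}\frac{d\alpha}{(\|A\alpha+b_1\|^2 + \|b_2\|^2)^{s/2}} \le \int_{B_m(0,1)}\frac{d\alpha}{\|A\alpha+b_1\|^s} \\
&= \int_{B_m(0,1)}\frac{d\alpha}{\|A(\alpha+\alpha_0)\|^s} \le \int_{B_m(0,3)}\frac{d\alpha}{\|A\alpha\|^s} \\
&= 3^{m-s}\int_{B_m(0,1)}\frac{d\alpha}{\|A\alpha\|^s} \le \frac{c\,3^{m-s}}{(\sigma_p(A))^s},
\end{aligned}$$
where the two last steps follow, respectively, by the change of variables $\alpha \mapsto \alpha/3$ and using (11). This finishes the proof.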
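The simplest instance of (10) and (11) already shows why the restriction $s < p$ is needed. As a toy case of our own, take $m = k = p = 1$, $b = 0$ and $A = (a)$ with $a \ne 0$, so that $\sigma_1(A) = |a|$:

```latex
\int_{B_1(0,1)} \frac{d\alpha}{|a\alpha|^s}
  = \frac{2}{|a|^s}\int_0^1 \alpha^{-s}\,d\alpha
  = \frac{2}{(1-s)\,|a|^s}, \qquad 0 < s < 1.
```

Thus (10) holds with $C = 2/(1-s)$, a constant that blows up as $s \to 1 = p$. For $s \ge 1$ the integral diverges.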
4.2. Observation matrices. Let $X \subset \mathbb{R}^N$, $N \in \mathbb{N}$, be a compact set and let $T : X \to X$ be a Lipschitz transformation. Fix $k \in \mathbb{N}$, $d \ge 2k-1$, and let $\{h_1,\ldots,h_m\}$ be the set of all real monomials of $N$ variables of degree at most $d$. Note that $m \ge k$. For a Lipschitz observable $h : X \to \mathbb{R}$ and $\alpha = (\alpha_1,\ldots,\alpha_m) \in \mathbb{R}^m$ let $h_\alpha : X \to \mathbb{R}$ be given by
$$h_\alpha = h + \sum_{j=1}^m \alpha_j h_j.$$
For simplicity, we write $\phi_\alpha$ instead of $\phi_{h_\alpha}$ for the $k$-delay coordinate map corresponding to $h_\alpha$, i.e.
$$\phi_\alpha : X \to \mathbb{R}^k, \qquad \phi_\alpha(x) = (h_\alpha(x), h_\alpha(Tx), \ldots, h_\alpha(T^{k-1}x)).$$
Note that for $x, y \in X$ we have
$$\phi_\alpha(x) - \phi_\alpha(y) = D_{x,y}\alpha + w_{x,y} \tag{12}$$
for a $k \times m$ matrix $D_{x,y}$ defined by
$$D_{x,y} = \begin{pmatrix} h_1(x) - h_1(y) & \cdots & h_m(x) - h_m(y) \\ h_1(Tx) - h_1(Ty) & \cdots & h_m(Tx) - h_m(Ty) \\ \vdots & \ddots & \vdots \\ h_1(T^{k-1}x) - h_1(T^{k-1}y) & \cdots & h_m(T^{k-1}x) - h_m(T^{k-1}y) \end{pmatrix} \tag{13}$$
and
$$w_{x,y} = \begin{pmatrix} h(x) - h(y) \\ h(Tx) - h(Ty) \\ \vdots \\ h(T^{k-1}x) - h(T^{k-1}y) \end{pmatrix}.$$
The above notation will be used throughout the paper.
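The identity (12) can be checked numerically. The following sketch uses hypothetical toy data of our own and takes $N = 1$ for simplicity. $T$ is a logistic map on $X = [0,1]$, $h(t) = |t - 0.3|$ is a non-smooth Lipschitz observable, and the probe set consists of the monomials $1, t, \ldots, t^d$ with $d = 2k+1$. The code builds $D_{x,y}$ and $w_{x,y}$ as in (13), verifies (12) for a random $\alpha$, and prints the singular values of $D_{x,y}$.

```python
import numpy as np

K_DELAY = 2                     # k
DEGREE = 2 * K_DELAY + 1        # d = 2k + 1


def T(t):
    """Hypothetical Lipschitz map on X = [0, 1] (a logistic map)."""
    return 3.7 * t * (1.0 - t)


def h(t):
    """Hypothetical Lipschitz (non-smooth) observable."""
    return abs(t - 0.3)


# Probe set for N = 1: the monomials t^0, t^1, ..., t^d.
probes = [lambda t, j=j: t ** j for j in range(DEGREE + 1)]


def delay_map(obs, x, k=K_DELAY):
    """k-delay coordinate map (obs(x), obs(Tx), ..., obs(T^{k-1} x))."""
    values = []
    for _ in range(k):
        values.append(obs(x))
        x = T(x)
    return np.array(values)


def observation_matrix(x, y, k=K_DELAY):
    """The k x m matrix D_{x,y} of (13): entry (i, j) = h_j(T^i x) - h_j(T^i y)."""
    rows = []
    for _ in range(k):
        rows.append([hj(x) - hj(y) for hj in probes])
        x, y = T(x), T(y)
    return np.array(rows)


rng = np.random.default_rng(0)
alpha = rng.normal(size=len(probes))


def h_alpha(t):
    """Perturbed observable h + sum_j alpha_j h_j."""
    return h(t) + sum(a * hj(t) for a, hj in zip(alpha, probes))


x, y = 0.2, 0.65
D = observation_matrix(x, y)
w = delay_map(h, x) - delay_map(h, y)                   # w_{x,y}
lhs = delay_map(h_alpha, x) - delay_map(h_alpha, y)     # phi_alpha(x) - phi_alpha(y)
assert np.allclose(lhs, D @ alpha + w)                  # identity (12)
print(np.linalg.svd(D, compute_uv=False))               # sigma_1(D), ..., sigma_k(D)
```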
Remark 4.2. As explained in Subsection 2.3, a sufficient condition for a set S ⊂ Lip(X) to be
prevalent (with the probe set {h1 , . . . , hm } defined as the family of all real monomials of N variables
of degree at most d) is that for every Lipschitz observable h : X → R, we have hα ∈ S for Lebesgue-
almost every α ∈ Rm . Within the subsequent part of the paper, we check prevalence using this
condition. For all the results, it is enough to take $d = 2k+1$ in the definition of $\{h_1,\ldots,h_m\}$, while most of them hold also for $d = 2k-1$, which is indicated in the formulations of the particular results.¹
The following fact was proved in [SY97].
Lemma 4.3 ([SY97, Lemma 4.1]). Fix $d \ge 2k-1$ and let $\{h_1,\ldots,h_m\}$ be the set of all monomials of $N$ variables of degree at most $d$. Assume $y_1,\ldots,y_{2k} \in \mathbb{R}^N$ satisfy
$$\begin{aligned} &\|y_{i+k} - y_i\| \ge \sigma && \text{for } i = 1,\ldots,k, \\ &\|y_i - y_j\| \ge \varepsilon && \text{for } i,j = 1,\ldots,2k \text{ such that } i \ne j,\ |i-j| \ne k \end{aligned}$$
for some $\sigma, \varepsilon > 0$. Then for every $z = (z_1,\ldots,z_{2k}) \in \mathbb{R}^{2k}$ there exists $\alpha = (\alpha_1,\ldots,\alpha_m) \in \mathbb{R}^m$ such that
$$\sum_{j=1}^m \alpha_j h_j(y_i) = z_i \quad \text{for } i = 1,\ldots,2k$$

¹It is easy to see that if a set $S$ is prevalent with the probe set defined as the family of all monomials of degree at most $d$, then $S$ is prevalent with the probe set defined as the family of all monomials of degree at most $\tilde{d}$, for any $\tilde{d} \ge d$.
and
$$\|\alpha\|_\infty \le \frac{2k\,(\max\{1, \|y_1\|,\ldots,\|y_{2k}\|\})^{2m-1}\,\|z\|_\infty}{\varepsilon^{2k-2}\,\sigma}.$$
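The existence part of Lemma 4.3 is a polynomial interpolation statement. The sketch below, on random toy points of our own, illustrates it for $N = 2$, $k = 2$ and the smallest allowed degree $d = 2k-1 = 3$. It builds the $2k \times m$ matrix $(h_j(y_i))$ and solves for $\alpha$ with `numpy.linalg.lstsq`, which returns the minimum Euclidean-norm solution. That solver is our choice and does not reproduce the explicit bound on $\|\alpha\|_\infty$ from the lemma. The helper names are hypothetical.

```python
from itertools import combinations_with_replacement

import numpy as np


def monomial_exponents(N, d):
    """Exponent vectors of all monomials of N variables of degree at most d."""
    return [tuple(combo.count(i) for i in range(N))
            for deg in range(d + 1)
            for combo in combinations_with_replacement(range(N), deg)]


def probe_matrix(points, exps):
    """Matrix with entries h_j(y_i), for points y_i (rows) and monomials h_j."""
    P = np.asarray(points, dtype=float)[:, None, :]   # shape (n, 1, N)
    E = np.asarray(exps)[None, :, :]                  # shape (1, m, N)
    return np.prod(P ** E, axis=2)                    # shape (n, m)


k, N = 2, 2
d = 2 * k - 1                                      # smallest degree allowed in Lemma 4.3
exps = monomial_exponents(N, d)
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(2 * k, N))   # y_1, ..., y_{2k}
z = rng.normal(size=2 * k)                         # prescribed values z_i
M = probe_matrix(points, exps)
alpha, *_ = np.linalg.lstsq(M, z, rcond=None)      # minimum-norm solution
assert np.allclose(M @ alpha, z)                   # sum_j alpha_j h_j(y_i) = z_i
print(len(exps), np.abs(alpha).max())
```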
Using this lemma, we prove the following two estimates of the singular values of Dx,y .
Proposition 4.4. Fix $d \ge 2k-1$ and let $\{h_1,\ldots,h_m\}$ be the set of all monomials of $N$ variables of degree at most $d$. For $x, y \in X$ assume
$$\|T^i\xi_1 - T^j\xi_2\| \ge \varepsilon \quad \text{for } i,j = 0,\ldots,k-1 \text{ such that } i \ne j,\ \xi_1,\xi_2 \in \{x,y\}$$
for some $\varepsilon \ge 0$. Then
$$\sigma_k(D_{x,y}) \ge C\varepsilon^{2k-2}\|T^{k-1}x - T^{k-1}y\|,$$
where $C > 0$ depends only on $k$, $m$, $X$ and $\mathrm{Lip}(T)$.
Proof. Obviously, we can assume $\varepsilon > 0$ and $\|T^{k-1}x - T^{k-1}y\| > 0$. Then
$$\sigma = \min\{\|T^ix - T^iy\| : 0 \le i \le k-1\} > 0.$$
Applying Lemma 4.3 for $y_i = T^{i-1}x$, $y_{k+i} = T^{i-1}y$, $i = 1, 2, \ldots, k$ and $z_1,\ldots,z_k \in \mathbb{R}$, $z_{k+1} = \cdots = z_{2k} = 0$, we find $\alpha = (\alpha_1,\ldots,\alpha_m) \in \mathbb{R}^m$ such that
$$\sum_{j=1}^m \alpha_j h_j(y_i) = z_i \quad \text{for } i = 1,\ldots,2k, \qquad \|\alpha\|_\infty \le \frac{2k(\max\{1, \operatorname{diam}X + \operatorname{dist}(0,X)\})^{2m-1}\|\tilde z\|_\infty}{\varepsilon^{2k-2}\sigma},$$
where
$$\tilde z = (z_1,\ldots,z_k).$$
Hence, by (13),
$$D_{x,y}\alpha = \Big(\sum_{j=1}^m \alpha_j\big(h_j(T^{i-1}x) - h_j(T^{i-1}y)\big)\Big)_{i=1}^k = (z_1 - z_{k+1},\ldots,z_k - z_{2k}) = \tilde z.$$
Moreover,
$$\|\alpha\| \le \frac{\|\tilde z\|}{c\,\varepsilon^{2k-2}\sigma} \tag{14}$$
for some $c > 0$ depending only on $k$, $m$ and $\operatorname{diam}X + \operatorname{dist}(0,X)$. Concluding, for every $\tilde z \in \mathbb{R}^k$ we have found $\alpha \in \mathbb{R}^m$ such that $D_{x,y}\alpha = \tilde z$ and (14) holds. This implies (it is enough to use the singular value decomposition of $D_{x,y}$, see Subsection 2.1) that
$$\sigma_k(D_{x,y}) \ge c\,\varepsilon^{2k-2}\sigma.$$
As $T$ is Lipschitz, for $0 \le i \le k-1$ we have $\|T^{k-1}x - T^{k-1}y\| \le (\mathrm{Lip}(T))^{k-1-i}\|T^ix - T^iy\| \le L^{k-1}\|T^ix - T^iy\|$, where $L = \max\{\mathrm{Lip}(T), 1\}$. Hence, $\sigma \ge L^{1-k}\|T^{k-1}x - T^{k-1}y\|$, which implies
$$\sigma_k(D_{x,y}) \ge c\,L^{1-k}\varepsilon^{2k-2}\|T^{k-1}x - T^{k-1}y\|.$$

Proposition 4.5. Fix $d \ge 2k+1$ and let $\{h_1,\ldots,h_m\}$ be the set of all monomials of $N$ variables of degree at most $d$. For $x, y \in X$ assume
$$\|T^i\xi_1 - T^j\xi_2\| \ge \varepsilon \quad \text{for } i,j = 0,\ldots,k \text{ such that } i \ne j,\ \xi_1,\xi_2 \in \{x,y\}$$
for some $\varepsilon \ge 0$. Then
$$\sigma_1(D_{Tx,Ty}|_{\operatorname{Ker}D_{x,y}}) \ge C\varepsilon^{2k}\|T^kx - T^ky\|,$$
where $C > 0$ depends only on $k$, $m$, $X$ and $\mathrm{Lip}(T)$, while $D_{Tx,Ty}|_{\operatorname{Ker}D_{x,y}}$ denotes the restriction of the linear operator with the matrix $D_{Tx,Ty}$ to the kernel of the linear operator with the matrix $D_{x,y}$.
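Proof. The proof is analogous to the one of Proposition 4.4. We can assume $\varepsilon > 0$ and $\|T^kx - T^ky\| > 0$, which gives
$$\sigma = \min\{\|T^ix - T^iy\| : 0 \le i \le k\} > 0.$$
By Lemma 4.3 applied for $y_i = T^{i-1}x$, $y_{k+1+i} = T^{i-1}y$, $i = 1,\ldots,k+1$ and
$$z_1 = z_2 = \cdots = z_k = 0, \quad z_{k+1} = w, \quad z_{k+2} = \cdots = z_{2k+2} = 0,$$
where $w \in \mathbb{R}$, we find $\alpha = (\alpha_1,\ldots,\alpha_m) \in \mathbb{R}^m$ such that
$$\sum_{j=1}^m \alpha_j h_j(y_i) = z_i \quad \text{for } i = 1,\ldots,2k+2, \qquad \|\alpha\| \le \frac{\|\tilde z\|}{c\,\varepsilon^{2k}\sigma},$$
where
$$\tilde z = (0,\ldots,0,w) \in \mathbb{R}^k$$
and $c > 0$ depends only on $k$, $m$ and $\operatorname{diam}X + \operatorname{dist}(0,X)$. By (13),
$$D_{x,y}\alpha = (z_1 - z_{k+2},\ldots,z_k - z_{2k+1}) = 0, \qquad D_{Tx,Ty}\alpha = (z_2 - z_{k+3},\ldots,z_{k+1} - z_{2k+2}) = \tilde z.$$
Choosing $w = c\,\varepsilon^{2k}\sigma$, we have $\alpha \in \operatorname{Ker}D_{x,y} \cap \overline{B}_m(0,1)$ with $\|D_{Tx,Ty}\alpha\| = c\,\varepsilon^{2k}\sigma$, which implies
$$\sigma_1(D_{Tx,Ty}|_{\operatorname{Ker}D_{x,y}}) \ge c\,\varepsilon^{2k}\sigma.$$
As in the proof of Proposition 4.4, the Lipschitz condition for $T$ gives $\sigma \ge L^{-k}\|T^kx - T^ky\|$ for $L = \max\{\mathrm{Lip}(T), 1\}$, so
$$\sigma_1(D_{Tx,Ty}|_{\operatorname{Ker}D_{x,y}}) \ge c\,L^{-k}\varepsilon^{2k}\|T^kx - T^ky\|.$$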
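Numerically, the quantity $\sigma_1(D_{Tx,Ty}|_{\operatorname{Ker}D_{x,y}})$ is the operator norm of $D_{Tx,Ty}$ on $\operatorname{Ker}D_{x,y}$. It equals the largest singular value of $D_{Tx,Ty}K$, where the columns of $K$ form an orthonormal basis of the kernel.

The sketch below computes it from NumPy's SVD. Random $k \times m$ matrices stand in for the observation matrices; they are placeholders of ours, not actual $D_{x,y}$, $D_{Tx,Ty}$. The helper names are hypothetical.

```python
import numpy as np


def kernel_basis(D, rtol=1e-12):
    """Columns form an orthonormal basis of Ker D (computed from the full SVD)."""
    _, s, Vt = np.linalg.svd(D)                 # Vt has shape (m, m)
    rank = int(np.count_nonzero(s > rtol * max(1.0, float(s.max()))))
    return Vt[rank:].T


def sigma1_on_kernel(D_next, D):
    """sigma_1 of D_next restricted to Ker D: the largest singular value of
    D_next @ K, where K has orthonormal columns spanning Ker D."""
    K = kernel_basis(D)
    if K.shape[1] == 0:
        return 0.0
    return float(np.linalg.svd(D_next @ K, compute_uv=False)[0])


rng = np.random.default_rng(0)
k, m = 2, 6
D = rng.normal(size=(k, m))        # stand-in for D_{x,y}
D_next = rng.normal(size=(k, m))   # stand-in for D_{Tx,Ty}
val = sigma1_on_kernel(D_next, D)

# Sanity check: no unit vector of Ker D is stretched by D_next more than val.
K = kernel_basis(D)
v = K @ rng.normal(size=K.shape[1])
v /= np.linalg.norm(v)
assert np.allclose(D @ v, 0.0)
assert np.linalg.norm(D_next @ v) <= val + 1e-12
print(val)
```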

4.3. Phase space decomposition. The proofs of Theorems 1.21–1.23 require working with energy integrals, which leads to estimates of the correlation dimensions of the considered measures. In order to obtain the results for the Hausdorff dimension, we have to restrict the measure $\mu$ to suitable sets, on which its Hausdorff and correlation dimensions are arbitrarily close. Moreover, for technical reasons it is necessary to consider subsets of $X$ on which the number $\varepsilon$ from Propositions 4.4–4.5 is uniformly bounded away from zero. Both objectives are achieved by the following decomposition of $X$.

Proposition 4.6. Let $X \subset \mathbb{R}^N$, $N \in \mathbb{N}$, be a compact set, let $\mu$ be a Borel probability measure on $X$ and let $T : X \to X$ be a Lipschitz map. Fix $\ell \in \mathbb{N} \cup \{0\}$, $\eta > 0$ and (in the case $\ell > 0$) assume $\mu(\mathrm{PrePer}_\ell(T)) = 0$. Then there exists a countable collection $\mathcal{F}$ of compact subsets of $X$ such that:
(i) $\mu\big(\bigcup_{F\in\mathcal{F}} F\big) = 1$ and $\mu(F) > 0$ for every $F \in \mathcal{F}$,
(ii) $\dim_c T^\ell(\mu|_F) \ge \underline{\dim}_H T^\ell\mu - \eta$,
(iii) if $\ell > 0$, then for every $F \in \mathcal{F}$ there exists $\varepsilon = \varepsilon(F) > 0$ such that
$$\|T^i\xi_1 - T^j\xi_2\| \ge \varepsilon \quad \text{for } i,j = 0,\ldots,\ell \text{ such that } i \ne j,\ \xi_1,\xi_2 \in \{x,y\}$$
for every $x, y \in F$.
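Proof. Let $t = \underline{\dim}_H T^\ell\mu$. For $n \in \mathbb{N}$ set
$$X_n = \{x \in X : T^\ell\mu(B(x,r)) \le r^{t-\eta} \text{ for every } 0 < r < 1/n\}.$$
By the definition of $\underline{\dim}_H$, we have $\mu\big(\bigcup_{n=1}^\infty T^{-\ell}(X_n)\big) = T^\ell\mu\big(\bigcup_{n=1}^\infty X_n\big) = 1$. Furthermore,
$$\dim_c T^\ell(\mu|_{T^{-\ell}(X_n)}) \ge t - \eta$$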
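for every $n \in \mathbb{N}$. Indeed, for $0 < s < t - \eta$, a suitable change of coordinates (see e.g. the calculation in [Mat95, Chapter 8]) and the equality $T^\ell(\mu|_{T^{-\ell}(X_n)}) = (T^\ell\mu)|_{X_n}$ provide
$$\begin{aligned}
\mathcal{E}_s(T^\ell(\mu|_{T^{-\ell}(X_n)})) &= \int_{X_n}\int_{X_n}\frac{d(T^\ell\mu)(x)\,d(T^\ell\mu)(y)}{\|x-y\|^s} = s\int_{X_n}\int_0^\infty\frac{T^\ell\mu(X_n \cap B(x,r))}{r^{s+1}}\,dr\,d(T^\ell\mu)(x) \\
&\le s\int_0^{1/n} r^{t-\eta-s-1}\,dr + s\int_{1/n}^\infty\frac{dr}{r^{s+1}} < \infty.
\end{aligned}$$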

The remainder of the construction proceeds as in the proof of [SY97, Theorem 4.2]. We present the arguments for the reader's convenience. In the case $\ell > 0$ we have $\mu(\mathrm{PrePer}_\ell(T)) = 0$, so
$$\varepsilon(x) = \min\{\|T^ix - T^jx\| : 0 \le i \ne j \le \ell\} > 0 \quad \text{for } \mu\text{-almost every } x \in X$$
(in the case $\ell = 0$ we set $\varepsilon(x) = 1$ for $x \in X$). Then for
$$Y_q = \{x \in X : \varepsilon(x) \ge 1/q\}, \quad q \in \mathbb{N},$$
we obtain $\mu\big(\bigcup_{q=1}^\infty Y_q\big) = 1$. Note that if $y \in B_N\big(x, \frac{1}{4q}(\mathrm{Lip}(T))^{-\ell}\big)$ for some $x \in Y_q$, then
$$\|T^i\xi_1 - T^j\xi_2\| \ge \frac{1}{2q} \quad \text{for } i,j = 0,\ldots,\ell \text{ such that } i \ne j,\ \xi_1,\xi_2 \in \{x,y\}.$$
Let $\mathcal{B}_q$ be a countable cover of $Y_q$ by balls centred in $Y_q$, of radii at most $\frac{1}{4q}(\mathrm{Lip}(T))^{-\ell}$. Then
$$\tilde{\mathcal{F}} = \{\tilde F = T^{-\ell}(X_n) \cap Y_q \cap B : n, q \in \mathbb{N},\ B \in \mathcal{B}_q,\ \mu(\tilde F) > 0\}$$
is a countable family satisfying the conditions (i)–(iii). By the regularity of $\mu$, each set $\tilde F \in \tilde{\mathcal{F}}$ has a full $\mu$-measure subset which is a countable union of compact sets. The union over $\tilde F \in \tilde{\mathcal{F}}$ of the families of all these compact sets defines the suitable family $\mathcal{F}$.
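The energy estimate in the proof of Proposition 4.6 rests on the layer-cake identity $\int \|x-y\|^{-s}\,d\nu(y) = s\int_0^\infty \nu(B(x,r))\,r^{-s-1}\,dr$. As a quick check on a toy case of our own choosing, take $\nu$ to be Lebesgue measure on $[0,1]$, $x = 0$ and $0 < s < 1$, so that $\nu(B(0,r)) = \min\{r, 1\}$:

```latex
\int_0^1 \frac{dy}{y^s} = \frac{1}{1-s},
\qquad
s\int_0^\infty \frac{\min\{r,1\}}{r^{s+1}}\,dr
  = s\int_0^1 r^{-s}\,dr + s\int_1^\infty r^{-s-1}\,dr
  = \frac{s}{1-s} + 1 = \frac{1}{1-s}.
```

Both sides agree, as they should. The split of the $r$-integral at $r = 1$ mirrors the split at $r = 1/n$ in the proof.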

4.4. Restricted conditional measures. In the subsequent proofs, we will use the following fact.

Lemma 4.7. Let $X \subset \mathbb{R}^N$, $N \in \mathbb{N}$, be a compact set, let $\mu$ be a Borel probability measure on $X$ and let $\phi : X \to \mathbb{R}^k$, $k \in \mathbb{N}$, be a Borel map. Suppose $F \subset X$ is a set of positive $\mu$-measure and let $\nu = \frac{1}{\mu(F)}\mu|_F$. Then for $\nu$-almost every $x \in F$, there exists $f(x) > 0$ such that
$$\nu_{\phi,\phi(x)} = f(x)\,\mu_{\phi,\phi(x)}|_F,$$
where $\{\mu_{\phi,y}\}_{y\in\mathbb{R}^k}$ and $\{\nu_{\phi,y}\}_{y\in\mathbb{R}^k}$ are, respectively, the systems of conditional measures of $\mu$ and $\nu$ with respect to $\phi$.

Proof. Note that $\varphi(\mu|_F) \ll \varphi\mu$, so by the differentiation theorem for measures (see e.g. [Mat95, Theorem 2.12]), the Radon–Nikodym derivative
$$\frac{d\varphi(\mu|_F)}{d\varphi\mu}(y) = \lim_{\delta\to0}\frac{\varphi(\mu|_F)(B(y,\delta))}{\varphi\mu(B(y,\delta))}$$
exists and is positive and finite for $\varphi(\mu|_F)$-almost every $y \in \mathbb R^k$. This together with (7) implies the assertion of the lemma for $f(x) = \left(\frac{d\varphi(\mu|_F)}{d\varphi\mu}(\varphi(x))\right)^{-1}$.
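In the purely atomic case the conditional measures are simply the normalised restrictions of the measure to the fibres of $\varphi$, so Lemma 4.7 can be checked by hand. The toy Python sketch below (all data hypothetical) does this with exact rational arithmetic; in this discrete setting one can take $f(x) = 1/g(\varphi(x))$, where $g(y) = \varphi(\mu|_F)(\{y\})/\varphi\mu(\{y\})$ is the discrete analogue of the Radon–Nikodym derivative appearing in the proof.

```python
from fractions import Fraction

# Hypothetical atomic measure mu on six points, a map phi with three fibres, and a set F.
mu = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(1, 10),
      3: Fraction(3, 10), 4: Fraction(2, 10), 5: Fraction(1, 10)}
phi = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "c"}
F = {0, 2, 3, 4}


def conditional(measure, y):
    """Discrete conditional measure: normalised restriction of `measure` to phi^{-1}(y)."""
    fibre = {x: w for x, w in measure.items() if phi[x] == y}
    total = sum(fibre.values())
    return {x: w / total for x, w in fibre.items()}


mu_F = sum(mu[x] for x in F)
nu = {x: mu[x] / mu_F for x in F}  # nu = mu|_F / mu(F)


def lemma_holds_at(x):
    y = phi[x]
    # g = d phi(mu|_F) / d phi(mu) at y: ratio of the masses of the fibre over y
    g = sum(mu[z] for z in F if phi[z] == y) / sum(w for z, w in mu.items() if phi[z] == y)
    # f(x) * mu_{phi,y}|_F with f = 1/g
    scaled = {z: w / g for z, w in conditional(mu, y).items() if z in F}
    return scaled == conditional(nu, y)


print(all(lemma_holds_at(x) for x in F))  # True
```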
4.5. Geometric slices. In the subsequent section, apart from the conditional measures defined in (7), we use 'geometric slices' of the measure $\mu$. More precisely, for a Borel map $\varphi\colon X \to \mathbb R^k$ on a compact set $X \subset \mathbb R^N$ and a Borel probability measure $\mu$ on $X$, we define the system of geometric slices $\mu^G_{\varphi,y}$, $y \in \mathbb R^k$, of $\mu$ as weak-$*$ limits (recall that $\mathcal H^s$, $s > 0$, denotes the $s$-dimensional Hausdorff measure)
$$\mu^G_{\varphi,y} = \lim_{\delta\to0}\frac{1}{\mathcal H^k(B(y,\delta))}\,\mu|_{\varphi^{-1}(B(y,\delta))} = \frac{1}{\mathcal H^k(B(0,1))}\lim_{\delta\to0}\frac{1}{\delta^k}\,\mu|_{\varphi^{-1}(B(y,\delta))},$$
whenever the limit exists, and zero otherwise. The relation between the conditional measures and geometric slices (under an absolute continuity condition) is explained by the following lemma.

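Lemma 4.8. Let $\varphi\colon X \to \mathbb R^k$, $k \in \mathbb N$, be a Borel map on a compact set $X \subset \mathbb R^N$, $N \in \mathbb N$, and let $\mu$ be a Borel probability measure on $X$. Assume $\varphi\mu \ll \mathcal H^k$. Then for $\varphi\mu$-almost every $y \in \mathbb R^k$ there exists $0 < f(y) < \infty$ such that
$$\mu^G_{\varphi,y} = f(y)\,\mu_{\varphi,y}.$$
Moreover,
$$\mu(E) = \int_{\mathbb R^k}\mu^G_{\varphi,y}(E)\,d\mathcal H^k(y)$$
for every Borel set $E \subset X$.

Proof. Set $f$ to be the Radon–Nikodym derivative $\frac{d(\varphi\mu)}{d\mathcal H^k}$. Then $0 < f(y) < \infty$ for $\varphi\mu$-almost every $y \in \mathbb R^k$ and $f(y) = \lim_{\delta\to0}\frac{\varphi\mu(B(y,\delta))}{\mathcal H^k(B(y,\delta))}$ for $\mathcal H^k$-almost every $y \in \mathbb R^k$, which provides the first assertion of the lemma. Furthermore, for every Borel set $E \subset X$,
$$\mu(E) = \int_{\mathbb R^k}\mu_{\varphi,y}(E)\,d(\varphi\mu)(y) = \int_{\mathbb R^k}\mu_{\varphi,y}(E)f(y)\,d\mathcal H^k(y) = \int_{\mathbb R^k}\mu^G_{\varphi,y}(E)\,d\mathcal H^k(y).$$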
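A concrete instance of Lemma 4.8, under our own hypothetical choices: let $\mu$ have density $2x_1$ on $[0,1]^2$ and $\varphi(x_1,x_2) = x_1$, so $k = 1$ and $\varphi\mu$ has density $f(y) = 2y$ with respect to $\mathcal H^1$. Each conditional measure $\mu_{\varphi,y}$ is the uniform probability measure on the segment $\{y\}\times[0,1]$, and the lemma predicts that the geometric slice $\mu^G_{\varphi,y}$ is the same measure with total mass $f(y)$. The Python sketch estimates $\mu(\varphi^{-1}(B(y,\delta)))/\mathcal H^1(B(y,\delta))$ by Monte Carlo for a small $\delta$.

```python
import random


def sample_mu(rng):
    """Sample from mu with density 2*x1 on [0,1]^2 (x1 by inverse CDF, x2 uniform)."""
    return (rng.random() ** 0.5, rng.random())


def slice_mass(y, delta, n_samples=200_000, seed=1):
    """Monte Carlo estimate of mu(phi^{-1}(B(y, delta))) / H^1(B(y, delta)) for phi(x1, x2) = x1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x1, _x2 = sample_mu(rng)
        if abs(x1 - y) < delta:
            hits += 1
    return hits / n_samples / (2 * delta)  # H^1(B(y, delta)) = 2 * delta


for y in (0.25, 0.5, 0.75):
    print(y, slice_mass(y, 0.01), 2 * y)  # estimated slice mass versus f(y) = 2y
```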

We will also make use of the following simple observation. If $g\colon X \to [0,\infty]$ is lower semi-continuous, then for $\varphi\mu$-almost every $y \in \mathbb R^k$,
$$\int g\,d\mu^G_{\varphi,y} \le \frac{1}{\mathcal H^k(B(0,1))}\liminf_{\delta\to0}\frac{1}{\delta^k}\int_{\varphi^{-1}(B(y,\delta))}g\,d\mu. \tag{15}$$
This follows from the definition of $\mu^G_{\varphi,y}$ as a weak-$*$ limit and the fact that a lower semi-continuous function $g\colon X \to [0,\infty]$ is a non-decreasing limit of a sequence of non-negative continuous functions.

Remark 4.9. The advantage of switching from conditional measures to geometric slices is that the latter are better suited for the transversality argument used in the proofs. In Theorem 5.1 we will show that, under the assumptions of our main theorems, the condition $\varphi\mu \ll \mathcal H^k$ is satisfied when $\varphi$ is a (typical) $k$-delay coordinate map. Then by Lemma 4.8, the dimensions of $\mu_{\varphi,y}$ and $\mu^G_{\varphi,y}$ coincide for $\varphi\mu$-almost every $y \in \mathbb R^k$.

5. Proofs of Theorems 1.21–1.23

In this section we prove suitable versions of Theorems 1.21–1.23, using the prevalence condition
described in Remark 4.2, under the notation introduced in Subsection 4.2.
5.1. Proof of Theorem 1.23. The following result is the suitable version of Theorem 1.23.
Theorem 5.1. Let $X \subset \mathbb R^N$, $N \in \mathbb N$, be a compact set, let $\mu$ be a Borel probability measure on $X$ and let $T\colon X \to X$ be a Lipschitz map. Fix $k \in \mathbb N$ such that $k < \dim_H T^{k-1}\mu$ and (in the case $k > 1$) assume $\mu(\mathrm{PrePer}_{k-1}(T)) = 0$. Fix $d \ge 2k-1$ and let $\{h_1, \ldots, h_m\}$ be the set of all monomials of $N$ variables of degree at most $d$. Let $h\colon X \to \mathbb R$ be a Lipschitz observable. Then for Lebesgue-almost every $\alpha \in \mathbb R^m$ we have $\varphi_\alpha\mu \ll \mathcal H^k$, where $\varphi_\alpha$ is the $k$-delay coordinate map corresponding to $h_\alpha$.
If, additionally, $\mu$ is $T$-invariant or $T$ is bi-Lipschitz onto its image, then the condition $k < \dim_H T^{k-1}\mu$ may be replaced by $k < \dim_H \mu$ and the condition $\mu(\mathrm{PrePer}_{k-1}(T)) = 0$ may be replaced by $\mu\big(\bigcup_{p=1}^{k-1}\mathrm{Per}_p(T)\big) = 0$.
Proof. It is easy to check that it is sufficient to prove the assertion for Lebesgue-almost every $\alpha \in B_m(0,1)$ (then the general case follows by multiplying $h$ by a positive constant). Choose $\eta > 0$ such that $\dim_H T^{k-1}\mu - \eta > k$. Consider the collection $\mathcal F$ from Proposition 4.6 corresponding to $\ell = k-1$ and $\eta$. Fix $F \in \mathcal F$. We will prove that $\varphi_\alpha(\mu|_F) \ll \mathcal H^k$ for Lebesgue-almost every $\alpha \in B_m(0,1)$. As $\mu\big(\bigcup_{F\in\mathcal F}F\big) = 1$, this will finish the proof.

To prove $\varphi_\alpha(\mu|_F) \ll \mathcal H^k$, it suffices to show (see [Mat95, Theorem 2.12]) that for $\varphi_\alpha(\mu|_F)$-almost every $z \in \mathbb R^k$ we have
$$\liminf_{\delta\to0}\frac{\varphi_\alpha(\mu|_F)(B(z,\delta))}{\delta^k} < \infty.$$
Therefore, it is enough to show
$$I = \int_{B_m(0,1)}\int_{\mathbb R^k}\liminf_{\delta\to0}\frac{\varphi_\alpha(\mu|_F)(B(z,\delta))}{\delta^k}\,d\varphi_\alpha(\mu|_F)(z)\,d\alpha < \infty.$$
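For that, we proceed as follows. First, by Fatou's lemma,
$$\begin{aligned} I &\le \liminf_{\delta\to0}\frac{1}{\delta^k}\int_{B_m(0,1)}\int_F\mu\big(F\cap\varphi_\alpha^{-1}(B(\varphi_\alpha(x),\delta))\big)\,d\mu(x)\,d\alpha\\ &= \liminf_{\delta\to0}\frac{1}{\delta^k}\int_{B_m(0,1)}\int_F\int_F\mathbf 1_{\varphi_\alpha^{-1}(B(\varphi_\alpha(x),\delta))}(y)\,d\mu(y)\,d\mu(x)\,d\alpha.\end{aligned}$$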

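Consequently, by Tonelli's theorem, (12) and Lemma 2.1,
$$\begin{aligned} I &\le \liminf_{\delta\to0}\frac{1}{\delta^k}\int_F\int_F\int_{B_m(0,1)}\mathbf 1_{\{\alpha\in B_m(0,1)\,:\,\|\varphi_\alpha(x)-\varphi_\alpha(y)\|\le\delta\}}(\alpha)\,d\alpha\,d\mu(x)\,d\mu(y)\\ &= \liminf_{\delta\to0}\frac{1}{\delta^k}\int_F\int_F\mathrm{Leb}\big(\{\alpha\in B_m(0,1) : \|\varphi_\alpha(x)-\varphi_\alpha(y)\|\le\delta\}\big)\,d\mu(x)\,d\mu(y)\\ &= \liminf_{\delta\to0}\frac{1}{\delta^k}\int_F\int_F\mathrm{Leb}\big(\{\alpha\in B_m(0,1) : \|D_{x,y}\alpha+w_{x,y}\|\le\delta\}\big)\,d\mu(x)\,d\mu(y)\\ &\le C_1\int_F\int_F\frac{d\mu(x)\,d\mu(y)}{(\sigma_k(D_{x,y}))^k}\end{aligned}$$
for some $C_1 > 0$. By Proposition 4.6(iii), we can apply Proposition 4.4 to obtain
$$I \le C_2\int_F\int_F\frac{d\mu(x)\,d\mu(y)}{\|T^{k-1}x - T^{k-1}y\|^k} = C_2\,E_k(T^{k-1}(\mu|_F))$$
for some $C_2 = C_2(F) > 0$. Since $\dim_c T^{k-1}(\mu|_F) \ge \dim_H T^{k-1}\mu - \eta > k$ by Proposition 4.6(ii), we obtain $E_k(T^{k-1}(\mu|_F)) < \infty$, which gives $I < \infty$ and ends the proof of the absolute continuity of $\varphi_\alpha\mu$.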
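The quantitative input behind the last chain of inequalities is that the set of parameters $\alpha \in B_m(0,1)$ with $\|D_{x,y}\alpha + w_{x,y}\| \le \delta$ has volume of order $\delta^k/(\sigma_k(D_{x,y}))^k$; we do not reproduce the precise statement of Lemma 2.1 here. The Python sketch below illustrates this for $m = 3$, $k = 2$ and a hypothetical matrix $D$: splitting $\alpha$ along $\operatorname{Ker}D$ and its orthogonal complement gives the elementary bound $2\pi\delta^2/\sigma_2(D)^2$ (a chord of length at most $2$ times an ellipse of area at most $\pi\delta^2/\sigma_2(D)^2$), and the Monte Carlo estimate respects it.

```python
import math
import random


def smallest_singular_value_2x3(D):
    """sigma_2(D) for a 2x3 matrix D, computed from the eigenvalues of the 2x2 matrix D D^T."""
    a = sum(v * v for v in D[0])
    b = sum(u * v for u, v in zip(D[0], D[1]))
    c = sum(v * v for v in D[1])
    trace, det = a + c, a * c - b * b
    lam_min = (trace - math.sqrt(max(trace * trace - 4 * det, 0.0))) / 2
    return math.sqrt(max(lam_min, 0.0))


def sublevel_volume(D, w, delta, n_samples=100_000, seed=0):
    """Monte Carlo estimate of Leb({alpha in B_3(0, 1) : |D alpha + w| <= delta})."""
    rng = random.Random(seed)
    accepted = hits = 0
    while accepted < n_samples:
        alpha = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        if sum(t * t for t in alpha) > 1.0:
            continue  # rejection sampling: keep only points of the unit ball
        accepted += 1
        v0 = sum(d * t for d, t in zip(D[0], alpha)) + w[0]
        v1 = sum(d * t for d, t in zip(D[1], alpha)) + w[1]
        if math.hypot(v0, v1) <= delta:
            hits += 1
    return (4.0 * math.pi / 3.0) * hits / n_samples  # volume of B_3(0, 1) times hit fraction


D = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]  # hypothetical D with sigma_1 = 2, sigma_2 = 1
w = [0.0, 0.0]
delta = 0.1
sigma_2 = smallest_singular_value_2x3(D)
estimate = sublevel_volume(D, w, delta)
bound = 2.0 * math.pi * delta ** 2 / sigma_2 ** 2  # chord length 2 times ellipse area bound
print(sigma_2, estimate, bound)
```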
To show the additional assertions, note that if $\mu$ is $T$-invariant, then $T^{k-1}\mu = \mu$, while a push-forward by a bi-Lipschitz map does not change the dimension of the measure; moreover, injectivity of $T$ implies that every pre-periodic point $x \in \mathrm{PrePer}_{k-1}(T)$ is actually periodic of period at most $k-1$. Furthermore, if $\mu$ is $T$-invariant and $\mu\big(\bigcup_{p=1}^{k-1}\mathrm{Per}_p(T)\big) = 0$, then $\mu(\mathrm{PrePer}_{k-1}(T)) = 0$, since $\mathrm{PrePer}_{k-1}(T) \subset T^{-(k-1)}\big(\bigcup_{p=1}^{k-1}\mathrm{Per}_p(T)\big)$.

5.2. Proof of Theorem 1.21. To prove Theorem 1.21, we first show the following result, in which we assume $\mu(\mathrm{PrePer}_{k-1}(T)) = 0$. Later we will explain how to remove this assumption. For simplicity, here and in the sequel we write $\mu_{\alpha,y}$ (resp. $\mu^G_{\alpha,y}$), $y \in \mathbb R^k$, for the conditional measures (resp. geometric slices) of a Borel probability measure $\mu$ on $X$ with respect to a $k$-delay coordinate map $\varphi_\alpha$.

Theorem 5.2. Let $X \subset \mathbb R^N$, $N \in \mathbb N$, be a compact set, let $\mu$ be a Borel probability measure on $X$ and let $T\colon X \to X$ be a Lipschitz map. Fix $k \in \mathbb N$ and (in the case $k > 1$) assume $\mu(\mathrm{PrePer}_{k-1}(T)) = 0$. Fix $d \ge 2k-1$ and let $\{h_1, \ldots, h_m\}$ be the set of all monomials of $N$ variables of degree at most $d$. Let $h\colon X \to \mathbb R$ be a Lipschitz observable. Then for Lebesgue-almost every $\alpha \in \mathbb R^m$,
$$\dim_H T^{k-1}(\mu_{\alpha,\varphi_\alpha(x)}) \ge \dim_H T^{k-1}\mu - k \quad\text{for } \mu\text{-almost every } x \in X,$$
where $\varphi_\alpha$ is the $k$-delay coordinate map corresponding to $h_\alpha$.

Proof. As previously, it is sufficient to prove the assertion for Lebesgue-almost every $\alpha \in B_m(0,1)$. Obviously, we can assume $\dim_H T^{k-1}\mu > k$. Choose $\eta, s > 0$ such that
$$\eta < \dim_H T^{k-1}\mu - k, \qquad k < s < \dim_H T^{k-1}\mu - \eta.$$
Consider the collection $\mathcal F$ from Proposition 4.6 corresponding to $\ell = k-1$ and $\eta$. Fix $F \in \mathcal F$. We will prove
$$\dim_H T^{k-1}(\mu_{\alpha,\varphi_\alpha(x)}) \ge s-k \quad\text{for } \mu\text{-almost every } x \in F \text{ and almost every } \alpha \in B_m(0,1). \tag{16}$$
As $\eta, s$ can be chosen such that $s$ is arbitrarily close to $\dim_H T^{k-1}\mu$ and $\mu\big(\bigcup_{F\in\mathcal F}F\big) = 1$, proving (16) will conclude the proof of the theorem.

Let
$$\nu = \frac{1}{\mu(F)}\mu|_F.$$
Since, by Lemma 4.7, for $\nu$-almost every $x \in F$ the measures $\nu_{\alpha,\varphi_\alpha(x)}$ and $\mu_{\alpha,\varphi_\alpha(x)}|_F$ are equal up to multiplication by a positive constant, to prove (16) it is sufficient to show
$$\dim_H T^{k-1}(\nu_{\alpha,\varphi_\alpha(x)}) \ge s-k \quad\text{for } \nu\text{-almost every } x \in F \text{ and Leb-almost every } \alpha \in B_m(0,1). \tag{17}$$
The remainder of the proof is devoted to verifying (17).

As $\dim_H T^{k-1}\nu \ge \dim_H T^{k-1}\mu > k$ and (in the case $k > 1$)
$$\nu(\mathrm{PrePer}_{k-1}(T)) \le \frac{1}{\mu(F)}\mu(\mathrm{PrePer}_{k-1}(T)) = 0,$$
Theorem 5.1 implies $\varphi_\alpha\nu \ll \mathcal H^k$ for Lebesgue-almost every $\alpha \in B_m(0,1)$. Hence, by Lemma 4.8, the assertion (17) will follow from
$$\dim_H T^{k-1}(\nu^G_{\alpha,z}) \ge s-k \quad\text{for } \mathcal H^k\text{-almost every } z \in \mathbb R^k \text{ and Leb-almost every } \alpha \in B_m(0,1) \tag{18}$$
(with the convention that $\dim_H$ of a zero measure is zero). By Lemma 2.11, to show (18) it suffices to prove
$$I = \int_{B_m(0,1)}\int_{\mathbb R^k}E_{s-k}(T^{k-1}(\nu^G_{\alpha,z}))\,d\mathcal H^k(z)\,d\alpha < \infty. \tag{19}$$
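Now we check (19). As the function $x \mapsto \|T^{k-1}x - T^{k-1}y\|^{k-s} \in [0,+\infty]$ is lower semi-continuous, applying (15) for the measure $\nu$ together with Fatou's lemma implies
$$\begin{aligned} I &= \int_{B_m(0,1)}\int_{\mathbb R^k}\int_F\int_F\frac{d\nu^G_{\alpha,z}(x)\,d\nu^G_{\alpha,z}(y)}{\|T^{k-1}x - T^{k-1}y\|^{s-k}}\,d\mathcal H^k(z)\,d\alpha\\ &\le \liminf_{\delta\to0}\frac{C_1}{\delta^k}\int_{B_m(0,1)}\int_{\mathbb R^k}\int_F\int_{\varphi_\alpha^{-1}(B(z,\delta))}\frac{d\nu(x)\,d\nu^G_{\alpha,z}(y)}{\|T^{k-1}x - T^{k-1}y\|^{s-k}}\,d\mathcal H^k(z)\,d\alpha\end{aligned}$$
for some $C_1 > 0$. By Lemma 4.8,
$$\int_{\mathbb R^k}\int_F\frac{d\nu^G_{\alpha,z}(y)\,d\mathcal H^k(z)}{\|T^{k-1}x - T^{k-1}y\|^{s-k}} = \int_F\frac{d\nu(y)}{\|T^{k-1}x - T^{k-1}y\|^{s-k}}.$$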
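Therefore, by Tonelli's theorem,
$$\begin{aligned} I &\le \liminf_{\delta\to0}\frac{C_1}{\delta^k}\int_{B_m(0,1)}\int_F\int_{\varphi_\alpha^{-1}(B(\varphi_\alpha(y),\delta))}\frac{d\nu(x)\,d\nu(y)\,d\alpha}{\|T^{k-1}x - T^{k-1}y\|^{s-k}}\\ &= \liminf_{\delta\to0}\frac{C_1}{\delta^k}\int_{B_m(0,1)}\int_F\int_F\frac{\mathbf 1_{\varphi_\alpha^{-1}(B(\varphi_\alpha(y),\delta))}(x)}{\|T^{k-1}x - T^{k-1}y\|^{s-k}}\,d\nu(x)\,d\nu(y)\,d\alpha\\ &= \liminf_{\delta\to0}\frac{C_1}{\delta^k}\int_F\int_F\int_{B_m(0,1)}\frac{\mathbf 1_{\{\alpha\in B_m(0,1)\,:\,\|\varphi_\alpha(x)-\varphi_\alpha(y)\|\le\delta\}}(\alpha)}{\|T^{k-1}x - T^{k-1}y\|^{s-k}}\,d\alpha\,d\nu(x)\,d\nu(y)\\ &= \liminf_{\delta\to0}\frac{C_1}{\delta^k}\int_F\int_F\frac{\mathrm{Leb}\big(\{\alpha\in B_m(0,1) : \|\varphi_\alpha(x)-\varphi_\alpha(y)\|\le\delta\}\big)}{\|T^{k-1}x - T^{k-1}y\|^{s-k}}\,d\nu(x)\,d\nu(y).\end{aligned}$$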
Consequently, by Lemma 2.1,
$$I \le C_1\int_F\int_F\frac{d\nu(x)\,d\nu(y)}{(\sigma_k(D_{x,y}))^k\,\|T^{k-1}x - T^{k-1}y\|^{s-k}}.$$
By Proposition 4.6(iii), we can apply Proposition 4.4 to obtain
$$I \le C_2\int_F\int_F\frac{d\nu(x)\,d\nu(y)}{\|T^{k-1}x - T^{k-1}y\|^s} = C_2\,E_s(T^{k-1}\nu)$$
for some $C_2 = C_2(F) > 0$. As $\dim_c T^{k-1}\nu \ge \dim_H T^{k-1}\mu - \eta > s$ by Proposition 4.6(ii), we have $E_s(T^{k-1}\nu) < \infty$, which establishes (19) and finishes the proof of the theorem.
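All estimates in this section end with the finiteness of an energy $E_s(\cdot) = \iint\|x-y\|^{-s}$. As a simple sanity check of this notion (an example of ours, not from the paper), for Lebesgue measure on $[0,1]$ one has $E_s = 2/((1-s)(2-s))$ for $s < 1$ and $E_s = \infty$ for $s \ge 1$, so the energy detects the dimension $1$. The Python sketch compares the closed form with a Monte Carlo estimate of the double integral.

```python
import random


def exact_energy_interval(s):
    """E_s of Lebesgue measure on [0, 1]: the double integral of |x - y|^(-s), finite iff s < 1."""
    if s >= 1:
        return float("inf")
    return 2.0 / ((1.0 - s) * (2.0 - s))


def monte_carlo_energy(s, n_samples=200_000, seed=0):
    """Average of |x - y|^(-s) over independent uniform pairs (x, y)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        d = abs(rng.random() - rng.random())
        if d > 0.0:  # the diagonal has measure zero
            total += d ** (-s)
    return total / n_samples


print(exact_energy_interval(0.25), monte_carlo_energy(0.25))
```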

Proof of Theorem 1.21. Assertion (a) of Theorem 1.21 is a direct consequence of Theorem 5.2 if we additionally assume $\mu(\mathrm{PrePer}_{k-1}(T)) = 0$ in the case $k > 1$. Therefore, it remains to prove that this assumption can be omitted.
For $p \in \mathbb N\cup\{0\}$ and $q \in \mathbb N$ define
$$X_{p,q} = \{x \in X : x, Tx, \ldots, T^{p+q-1}x \text{ are pairwise distinct and } T^{p+q}x = T^px\}.$$
In other words, $X_{p,q}$ is the set of pre-periodic points of $T$ whose orbit has a pre-periodic part of length $p$ and a periodic part of length $q$. Let
$$R = \{(p,q) : p \in \mathbb N\cup\{0\},\ q \in \mathbb N,\ p+q \le k-1\}, \qquad Y = X\setminus\bigcup_{(p,q)\in R}X_{p,q}$$

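and set
$$\mu^{p,q} = \begin{cases}\frac{1}{\mu(X_{p,q})}\mu|_{X_{p,q}} & \text{if } \mu(X_{p,q}) > 0,\\ 0 & \text{if } \mu(X_{p,q}) = 0,\end{cases} \qquad \nu = \begin{cases}\frac{1}{\mu(Y)}\mu|_Y & \text{if } \mu(Y) > 0,\\ 0 & \text{if } \mu(Y) = 0.\end{cases}$$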
As
$$\mu = \mu|_Y + \sum_{(p,q)\in R}\mu|_{X_{p,q}},$$
we have that for $\mu$-almost every $x \in X$, the conditional measure $\mu_{\varphi_{h,k},\varphi_{h,k}(x)}$ with respect to a $k$-delay coordinate map $\varphi_{h,k}$ corresponding to an observable $h$ (here and in the sequel it is convenient to include the number of measurements in the notation for the delay coordinate map) is a convex combination of the conditional measures $\mu^{p,q}_{\varphi_{h,k},\varphi_{h,k}(x)}$, $(p,q) \in R$, and $\nu_{\varphi_{h,k},\varphi_{h,k}(x)}$. Therefore, it suffices to show, for a prevalent Lipschitz observable $h\colon X \to \mathbb R$,
$$\begin{aligned}\dim_H T^{k-1}(\mu^{p,q}_{\varphi_{h,k},\varphi_{h,k}(x)}) &\ge \dim_H T^{k-1}\mu - k &&\text{for } \mu^{p,q}\text{-almost every } x \in X,\\ \dim_H T^{k-1}(\nu_{\varphi_{h,k},\varphi_{h,k}(x)}) &\ge \dim_H T^{k-1}\mu - k &&\text{for } \nu\text{-almost every } x \in X\end{aligned}$$
for $(p,q) \in R$. As
$$\dim_H T^{k-1}\mu^{p,q} \ge \dim_H T^{k-1}\mu, \qquad \dim_H T^{k-1}\nu \ge \dim_H T^{k-1}\mu$$
whenever these measures are non-zero, it suffices to prove, for a prevalent $h$ and $(p,q) \in R$,
$$\dim_H T^{k-1}(\mu^{p,q}_{\varphi_{h,k},\varphi_{h,k}(x)}) \ge \dim_H T^{k-1}\mu^{p,q} - k \quad\text{for } \mu^{p,q}\text{-almost every } x \in X, \tag{20}$$
$$\dim_H T^{k-1}(\nu_{\varphi_{h,k},\varphi_{h,k}(x)}) \ge \dim_H T^{k-1}\nu - k \quad\text{for } \nu\text{-almost every } x \in X. \tag{21}$$
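As $\nu(\mathrm{PrePer}_{k-1}(T)) = 0$, Theorem 5.2 gives (21). Fix $(p,q) \in R$ and let $\ell = p+q \le k-1$. Then $\mu^{p,q}(\mathrm{PrePer}_{\ell-1}(T)) = 0$, so we can apply Theorem 5.2 to the measure $\mu^{p,q}$, with $\ell$ instead of $k$. This implies
$$\dim_H T^{\ell-1}(\mu^{p,q}_{\varphi_{h,\ell},\varphi_{h,\ell}(x)}) \ge \dim_H T^{\ell-1}\mu^{p,q} - \ell \quad\text{for } \mu^{p,q}\text{-almost every } x \in X \tag{22}$$
for a prevalent $h$. Since for $x, y \in X_{p,q}$ we have $\|\varphi_{h,\ell}(x) - \varphi_{h,\ell}(y)\|_\infty = \|\varphi_{h,k}(x) - \varphi_{h,k}(y)\|_\infty$, we see from (7) that
$$\mu^{p,q}_{\varphi_{h,\ell},\varphi_{h,\ell}(x)} = \mu^{p,q}_{\varphi_{h,k},\varphi_{h,k}(x)} \quad\text{for } \mu^{p,q}\text{-almost every } x \in X. \tag{23}$$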

As $T$ is Lipschitz and $\ell < k$, we have
$$\dim_H T^{\ell-1}\mu^{p,q} - \ell \ge \dim_H T^{k-1}\mu^{p,q} - k. \tag{24}$$
Note that there exists $0 \le i \le \ell-1$ such that $T^{k-1}x = T^ix$ for every $x \in X_{p,q}$. Indeed, let $a \in \mathbb N\cup\{0\}$ be the maximal number such that $k-1-aq \ge p+q$. Then $i = k-1-(a+1)q$ satisfies $0 \le p \le i \le p+q-1 = \ell-1$ and $T^{k-1}x = T^{k-1-(a+1)q}x = T^ix$ for every $x \in X_{p,q}$. Therefore,
$$\dim_H T^{k-1}(\mu^{p,q}_{\varphi_{h,k},\varphi_{h,k}(x)}) = \dim_H T^i(\mu^{p,q}_{\varphi_{h,k},\varphi_{h,k}(x)}) \ge \dim_H T^{\ell-1}(\mu^{p,q}_{\varphi_{h,k},\varphi_{h,k}(x)}).$$
Combining this with (22), (23) and (24) yields (20) and finishes the proof of assertion (a) of Theorem 1.21.
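The choice of the index $i$ is elementary but easy to get wrong, so here is a mechanical check. The Python sketch below uses a hypothetical map on the finite set $\{0, \ldots, 4\}$; it computes the pre-periodic length $p$ and the period $q$ of a point, then the index $i = k-1-(a+1)q$ from the proof, and verifies $T^{k-1}x = T^ix$ with $p \le i \le p+q-1$.

```python
def T(n):
    """Hypothetical map on {0, ..., 4}: 0 -> 1 -> 2 -> 3 -> 4 -> 2, so 0 has p = 2, q = 3."""
    return n + 1 if n < 4 else 2


def iterate(x, n):
    """Return T^n x."""
    for _ in range(n):
        x = T(x)
    return x


def preperiod_and_period(x, max_steps=10_000):
    """Return (p, q) with x, Tx, ..., T^{p+q-1}x pairwise distinct and T^{p+q}x = T^p x."""
    first_visit = {}
    point, n = x, 0
    while point not in first_visit:
        if n > max_steps:
            raise ValueError("no repetition found; x may not be pre-periodic")
        first_visit[point] = n
        point, n = T(point), n + 1
    p = first_visit[point]
    return p, n - p


def reduced_index(k, p, q):
    """Index i = k - 1 - (a + 1) q, with a maximal such that k - 1 - a q >= p + q."""
    if p + q > k - 1:
        raise ValueError("requires p + q <= k - 1")
    a = 0
    while k - 1 - (a + 1) * q >= p + q:
        a += 1
    return k - 1 - (a + 1) * q


p, q = preperiod_and_period(0)
i = reduced_index(10, p, q)
print(p, q, i, iterate(0, 9) == iterate(0, i))  # 2 3 3 True
```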
To prove assertion (b), note that for every $\varepsilon > 0$ there exists a Borel set $E \subset X$ of positive $\mu$-measure such that $\underline{\dim}_H T^{k-1}(\mu|_E) \ge \overline{\dim}_H T^{k-1}\mu - \varepsilon$ (this can be easily seen from the definitions of the upper and lower Hausdorff dimensions of measures via local dimensions, see Subsection 2.4). The assertion is then proved by applying the already established point (a) of Theorem 1.21 to the measure $\tilde\mu = \frac{1}{\mu(E)}\mu|_E$ and noting that $T^{k-1}(\tilde\mu_{h,\varphi_h(x)}) \ll T^{k-1}(\mu_{h,\varphi_h(x)})$, so $\underline{\dim}_H T^{k-1}(\tilde\mu_{h,\varphi_h(x)}) \le \overline{\dim}_H T^{k-1}(\tilde\mu_{h,\varphi_h(x)}) \le \overline{\dim}_H T^{k-1}(\mu_{h,\varphi_h(x)})$ for $\mu$-almost every $x \in E$.
5.3. Proof of Theorem 1.22. First, we prove the following.

Theorem 5.3. Let $X \subset \mathbb R^N$, $N \in \mathbb N$, be a compact set, let $\mu$ be a Borel probability measure on $X$ and let $T\colon X \to X$ be a Lipschitz map. Fix $k \in \mathbb N$ and assume $\mu(\mathrm{PrePer}_k(T)) = 0$. Fix $d \ge 2k+1$ and let $\{h_1, \ldots, h_m\}$ be the set of all monomials of $N$ variables of degree at most $d$. Let $h\colon X \to \mathbb R$ be a Lipschitz observable. Then for Lebesgue-almost every $\alpha \in \mathbb R^m$,
$$\dim_H \varphi_\alpha\circ T(\mu_{\alpha,\varphi_\alpha(x)}) \ge \min\{1, \dim_H T^k\mu - k\} \quad\text{for } \mu\text{-almost every } x \in X,$$
where $\varphi_\alpha$ is the $k$-delay coordinate map corresponding to $h_\alpha$.

Proof. Again, it is sufficient to prove the assertion for Lebesgue-almost every $\alpha \in B_m(0,1)$. The first part of the proof is similar to that of Theorem 5.2. We can assume $\dim_H T^k\mu > k$. Choose $\eta, s > 0$ such that
$$\eta < \dim_H T^k\mu - k, \qquad k < s < k + \min\{1, \dim_H T^k\mu - \eta - k\}.$$
Consider the collection $\mathcal F$ from Proposition 4.6 corresponding to $\ell = k$ and $\eta$, and fix $F \in \mathcal F$. We will show
$$\dim_H \varphi_\alpha\circ T(\mu_{\alpha,\varphi_\alpha(x)}) \ge s-k \quad\text{for } \mu\text{-almost every } x \in F \text{ and almost every } \alpha \in B_m(0,1). \tag{25}$$
As $\eta, s$ can be chosen such that $s-k$ is arbitrarily close to $\min\{1, \dim_H T^k\mu - k\}$ and $\mu\big(\bigcup_{F\in\mathcal F}F\big) = 1$, showing (25) will prove the theorem.

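Let
$$\nu = \frac{1}{\mu(F)}\mu|_F.$$
Since, by Lemma 4.7, for $\nu$-almost every $x \in F$ the measures $\nu_{\alpha,\varphi_\alpha(x)}$ and $\mu_{\alpha,\varphi_\alpha(x)}|_F$ are equal up to multiplication by a positive constant, to show (25) it is sufficient to prove
$$\dim_H \varphi_\alpha\circ T(\nu_{\alpha,\varphi_\alpha(x)}) \ge s-k \quad\text{for } \nu\text{-almost every } x \in F \text{ and almost every } \alpha \in B_m(0,1). \tag{26}$$
In the remainder of the proof we show (26). As $\dim_H T^{k-1}\nu \ge \dim_H T^k\nu \ge \dim_H T^k\mu > k$ and
$$\nu(\mathrm{PrePer}_k(T)) \le \frac{1}{\mu(F)}\mu(\mathrm{PrePer}_k(T)) = 0,$$
Theorem 5.1 implies $\varphi_\alpha\nu \ll \mathcal H^k$ for Lebesgue-almost every $\alpha \in B_m(0,1)$. Consequently, by Lemma 4.8, the assertion (26) will follow from
$$\dim_H \varphi_\alpha\circ T(\nu^G_{\alpha,z}) \ge s-k \quad\text{for } \mathcal H^k\text{-almost every } z \in \mathbb R^k \text{ and almost every } \alpha \in B_m(0,1). \tag{27}$$
By Lemma 2.11, to show (27) it is sufficient to prove
$$I = \int_{B_m(0,1)}\int_{\mathbb R^k}E_{s-k}(\varphi_\alpha\circ T(\nu^G_{\alpha,z}))\,d\mathcal H^k(z)\,d\alpha < \infty. \tag{28}$$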

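Now we verify (28). As the function $x \mapsto \|\varphi_\alpha(Tx) - \varphi_\alpha(Ty)\|^{k-s} \in [0,+\infty]$ is lower semi-continuous, (15) and Fatou's lemma imply
$$\begin{aligned} I &= \int_{B_m(0,1)}\int_{\mathbb R^k}\int_F\int_F\frac{d\nu^G_{\alpha,z}(x)\,d\nu^G_{\alpha,z}(y)}{\|\varphi_\alpha(Tx) - \varphi_\alpha(Ty)\|^{s-k}}\,d\mathcal H^k(z)\,d\alpha\\ &\le \liminf_{\delta\to0}\frac{C_1}{\delta^k}\int_{B_m(0,1)}\int_{\mathbb R^k}\int_F\int_{\varphi_\alpha^{-1}(B(z,\delta))}\frac{d\nu(x)\,d\nu^G_{\alpha,z}(y)}{\|\varphi_\alpha(Tx) - \varphi_\alpha(Ty)\|^{s-k}}\,d\mathcal H^k(z)\,d\alpha\end{aligned}$$
for some $C_1 > 0$. By Lemma 4.8,
$$\int_{\mathbb R^k}\int_F\frac{d\nu^G_{\alpha,z}(y)\,d\mathcal H^k(z)}{\|\varphi_\alpha(Tx) - \varphi_\alpha(Ty)\|^{s-k}} = \int_F\frac{d\nu(y)}{\|\varphi_\alpha(Tx) - \varphi_\alpha(Ty)\|^{s-k}},$$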
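so by Tonelli's theorem,
$$\begin{aligned} I &\le \liminf_{\delta\to0}\frac{C_1}{\delta^k}\int_{B_m(0,1)}\int_F\int_{\varphi_\alpha^{-1}(B(\varphi_\alpha(y),\delta))}\frac{d\nu(x)\,d\nu(y)\,d\alpha}{\|\varphi_\alpha(Tx) - \varphi_\alpha(Ty)\|^{s-k}}\\ &= \liminf_{\delta\to0}\frac{C_1}{\delta^k}\int_{B_m(0,1)}\int_F\int_F\frac{\mathbf 1_{\varphi_\alpha^{-1}(B(\varphi_\alpha(y),\delta))}(x)}{\|\varphi_\alpha(Tx) - \varphi_\alpha(Ty)\|^{s-k}}\,d\nu(x)\,d\nu(y)\,d\alpha\\ &= \liminf_{\delta\to0}\frac{C_1}{\delta^k}\int_F\int_F J_{x,y}\,d\nu(x)\,d\nu(y),\end{aligned}$$
where
$$J_{x,y} = \int_{\{\alpha\in B_m(0,1)\,:\,\|\varphi_\alpha(x)-\varphi_\alpha(y)\|\le\delta\}}\frac{d\alpha}{\|\varphi_\alpha(Tx) - \varphi_\alpha(Ty)\|^{s-k}}, \qquad x, y \in F.$$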
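To estimate $J_{x,y}$ we proceed as follows. First we note that as $\dim_c T^k\nu > 0$, the measure $T^k\nu$ has no atoms. Therefore, within the integral $I$ it suffices to consider only points $x, y \in F$ with $T^kx \ne T^ky$. Consequently, when estimating $J_{x,y}$ we can assume
$$x, y \in F, \qquad T^kx \ne T^ky.$$
By Proposition 4.6(iii), we can use Proposition 4.4 to obtain
$$\sigma_k(D_{x,y}) \ge \frac{C_2(\varepsilon(F))^{2k-2}}{\mathrm{Lip}(T)}\,\|T^kx - T^ky\| > 0 \tag{29}$$
for some $C_2 > 0$. In particular, this implies that $\operatorname{rank}D_{x,y} = k$ and $\operatorname{Ker}D_{x,y}$ is an $(m-k)$-dimensional subspace of $\mathbb R^m$. By (12), we can write
$$\begin{aligned}\{\alpha\in B_m(0,1) : \|\varphi_\alpha(x)-\varphi_\alpha(y)\|\le\delta\} &= \{\alpha\in B_m(0,1) : \|D_{x,y}\alpha + w_{x,y}\|\le\delta\}\\ &\subset \{\beta+\gamma : \beta\in B_m(0,1)\cap\operatorname{Ker}D_{x,y},\ \gamma\in G_{x,y}(\delta)\},\end{aligned}$$
where
$$G_{x,y}(\delta) = B_m(0,1)\cap(\operatorname{Ker}D_{x,y})^\perp\cap D_{x,y}^{-1}(B_k(-w_{x,y},\delta)) \subset \mathbb R^m.$$
Identifying $(\operatorname{Ker}D_{x,y})^\perp$ with $\mathbb R^k$, we can use Lemma 2.1 to obtain
$$\mathcal H^k(G_{x,y}(\delta)) \le C_3\left(\frac{\delta}{\sigma_k(D_{x,y})}\right)^k \tag{30}$$
for some $C_3 > 0$.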
Now we estimate $J_{x,y}$. By (12) and Fubini's theorem,
$$J_{x,y} \le C_4\int_{G_{x,y}(\delta)}\int_{\operatorname{Ker}D_{x,y}\cap B_m(0,1)}\frac{d\mathcal H^{m-k}(\beta)\,d\mathcal H^k(\gamma)}{\|D_{Tx,Ty}\beta + D_{Tx,Ty}\gamma + w_{Tx,Ty}\|^{s-k}}, \tag{31}$$
where we use the fact that the $m$-dimensional Lebesgue measure satisfies $\mathrm{Leb} = C_4\,\mathcal H^{m-k}\otimes\mathcal H^k$ for some $C_4 > 0$. Fix $\gamma \in G_{x,y}(\delta)$. By Proposition 4.6(iii), we can apply Proposition 4.5 to obtain
$$\sigma_1(D_{Tx,Ty}|_{\operatorname{Ker}D_{x,y}}) \ge C_5\|T^kx - T^ky\| > 0$$
for some $C_5 > 0$. Consequently, Lemma 4.1 applied for $p = 1$ and $b = D_{Tx,Ty}\gamma + w_{Tx,Ty}$ gives
$$\int_{\operatorname{Ker}D_{x,y}\cap B_m(0,1)}\frac{d\mathcal H^{m-k}(\beta)}{\|D_{Tx,Ty}\beta + D_{Tx,Ty}\gamma + w_{Tx,Ty}\|^{s-k}} \le \frac{C_6}{\|T^kx - T^ky\|^{s-k}}$$
for some $C_6 > 0$ (we use here $0 < s-k < 1$). Combining this with (29), (30) and (31) yields
$$J_{x,y} \le C_7\frac{\delta^k}{(\sigma_k(D_{x,y}))^k\,\|T^kx - T^ky\|^{s-k}} \le C_8\frac{\delta^k}{\|T^kx - T^ky\|^s}$$
for some $C_7, C_8 > 0$. Finally, this gives
$$I \le C_1C_8\int_F\int_F\frac{d\nu(x)\,d\nu(y)}{\|T^kx - T^ky\|^s} = C_1C_8\,E_s(T^k\nu) < \infty,$$
as $\dim_c T^k\nu \ge \dim_H T^k\mu - \eta > s$ by Proposition 4.6(ii). This establishes (28) and finishes the proof of the theorem.
Proof of Theorem 1.22. Assertion (a) is a direct consequence of Theorem 5.3, while assertion (b) can be obtained in the same way as Theorem 1.21(b) was obtained from Theorem 1.21(a) within the proof of Theorem 1.21.

6. A counterexample – proof of Theorem 1.19

In this section we prove Theorem 1.19. First, recall the definition of the Smale–Williams solenoid attractor (see e.g. [Rob99]). Let $Y$ be a solid torus (a smooth manifold with boundary) defined by
$$Y = S^1\times D,$$
where $S^1 = \mathbb R/(2\pi\mathbb Z)$ and $D$ is the unit disc in the complex plane $\mathbb C$. Let $\tilde T\colon Y\to Y$ be given by
$$\tilde T(t,z) = \left(2t \bmod 2\pi,\ \tfrac14z + \tfrac12e^{it}\right).$$
The map $\tilde T$ provides a $C^\infty$-embedding of $Y$ into itself. Let
$$\Lambda = \bigcap_{n=0}^\infty\tilde T^n(Y).$$
The set $\Lambda$ is called the Smale–Williams solenoid. It is an invariant compact hyperbolic attractor admitting a natural (SRB) measure $\nu$, in the sense of Definition 2.2. It is known (see [Sim97, RS03]) that
$$\mathrm{ID}(\nu) = \dim_H\nu = \dim_H\Lambda = \dim_B\Lambda = \frac32.$$
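The map $\tilde T$ expands the circle coordinate by the factor $2$ and maps each meridian disc $\{t\}\times D$ onto the disc of radius $1/4$ centred at $\frac12e^{it}$ in $\{2t\}\times D$; in particular $\tilde T(Y) \subset S^1\times\{|z|\le 3/4\}$. The Python sketch below (an illustration of ours, using complex numbers for the disc coordinate) checks both facts numerically.

```python
import cmath
import math
import random


def solenoid(t, z):
    """Smale-Williams map (t, z) -> (2t mod 2*pi, z/4 + e^{it}/2) on S^1 x D."""
    return (2.0 * t) % (2.0 * math.pi), z / 4.0 + cmath.exp(1j * t) / 2.0


rng = random.Random(0)
max_modulus = 0.0
for _ in range(10_000):
    t = rng.uniform(0.0, 2.0 * math.pi)
    z = math.sqrt(rng.random()) * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))  # point of D
    t1, z1 = solenoid(t, z)
    max_modulus = max(max_modulus, abs(z1))
print(max_modulus)  # at most 3/4: the image lies in a thinner solid torus

# Two points of the same meridian disc {t} x D are pulled together by the factor 1/4.
t, z_a, z_b = 1.0, 0.9 + 0.0j, -0.3 + 0.5j
(_, w_a), (_, w_b) = solenoid(t, z_a), solenoid(t, z_b)
ratio = abs(w_a - w_b) / abs(z_a - z_b)
print(ratio)  # approximately 0.25
```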
Theorem 6.1. The dynamical system $(Y,\tilde T^2)$ can be $C^\infty$-dynamically embedded into $(M,T)$, where $M$ is a 3-dimensional manifold and $T\colon M\to M$ is a $C^\infty$-axiom A diffeomorphism.
Proof. Let $M = L(3,7)$, the $(3,7)$-lens space (this smooth 3-manifold is defined in e.g. [Hat02, Example 2.43]). According to [JNW04, Claim on p. 4377], $M$ admits a $C^\infty$-diffeomorphism $T\colon M\to M$ such that $\Omega(T) = A\cup R$, where $A$ is a 4-adic solenoid attractor ($C^\infty$-conjugated to $(\Lambda,\tilde T^2)$) and $R$ is a 4-adic solenoid repeller.² By [JNW04, p. 4375], the map $T$ is an axiom A diffeomorphism (see e.g. [Bow08, Chapter 3] for the definition).
The proof of Theorem 1.19 is split into three propositions. Consider the manifold $M$ and the diffeomorphism $T\colon M\to M$ from Theorem 6.1. Denote by $\psi$ a $C^\infty$-embedding of $(Y,\tilde T^2)$ into $(M,T)$. Let
$$X = \psi(\Lambda), \qquad \mu = \psi\nu.$$
Then $X$ is a $T$-invariant compact hyperbolic attractor with natural measure $\mu$ and
$$\mathrm{ID}(\mu) = \dim_H\mu = \dim_H X = \dim_B X = \frac32.$$
For the system $(X,T)$ we consider 2-delay coordinate maps $\varphi_h\colon X\to\mathbb R^2$ corresponding to Lipschitz observables $h\colon X\to\mathbb R$. For simplicity, in the following two propositions we use the notation which identifies $(X,T)$ with $(\Lambda,\tilde T^2)$.

² Note that the definition of an attractor presented in [JNW04, p. 4373] is compatible with Definition 2.2.
Proposition 6.2. The observable $h_0\colon X\to\mathbb R$ given by $h_0(t,z) = \cos t$ is $k$-predictable for every $k\in\mathbb N$.
Proof. It suffices to prove that $h_0$ is 1-predictable, i.e. $h_0(t,z)$ determines $h_0(T(t,z))$ uniquely. This follows from the trigonometric identity $\cos 2t = 2\cos^2t - 1$, which shows that $h_0(t,z) = \cos t$ uniquely determines $\cos 2t$ and hence also $\cos 4t = h_0(T(t,z))$ (or more explicitly: $h_0(T(t,z)) = 2(2h_0(t,z)^2 - 1)^2 - 1$).
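Since $T = \tilde T^2$ acts on the circle coordinate by $t\mapsto 4t \pmod{2\pi}$, the explicit prediction formula can be checked numerically. The Python sketch below (function names are ours) evaluates $2(2c^2-1)^2-1$ at $c = \cos t$ and compares it with $\cos 4t$; note that the formula uses only the measured value $c$, not $t$ itself.

```python
import math


def h0(t):
    """The observable h0(t, z) = cos t (it ignores the disc coordinate z)."""
    return math.cos(t)


def predict_from_value(c):
    """h0(T(t, z)) = cos(4t) expressed through c = cos t alone: 2(2c^2 - 1)^2 - 1."""
    return 2.0 * (2.0 * c * c - 1.0) ** 2 - 1.0


# T = T~^2 acts on the circle coordinate by t -> 4t (mod 2*pi).
max_error = max(
    abs(predict_from_value(h0(t)) - h0(4.0 * t))
    for t in (2.0 * math.pi * j / 1000 for j in range(1000))
)
print(max_error)  # of the order of floating-point rounding
```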
Proposition 6.3. For every observable $h\in\mathrm{Lip}(X,\mathbb R)$, the corresponding $k$-delay coordinate map $\varphi_h$ is not injective on $X$ for $k = 1, 2$.
Proof. This follows directly from the fact that the solenoid $\Lambda$ does not embed topologically into $\mathbb R^2$, see [Bin60, HO16].
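Proposition 6.4. For every $k\in\mathbb N$ and $\delta > 0$ there exist $\varepsilon_0 > 0$ and an open set $U\subset\mathrm{Lip}(X,\mathbb R)$ containing $h_0$ such that for every $\varepsilon < \varepsilon_0$ and every $h\in U$,
$$\{(t,z)\in X : \sigma_{h,\varepsilon}(\varphi_h(t,z)) > \delta\} = \emptyset,$$
where $\varphi_h$ is the 2-delay coordinate map corresponding to $h$.
Proof. This is an adaptation of the proof of [BGŚ22b, Theorem 7.2]. By Proposition 6.2 and [BGŚ22b, Proposition 1.9], there exists a continuous prediction map $S_{h_0}\colon\varphi_{h_0}(X)\to\varphi_{h_0}(X)$ satisfying $\varphi_{h_0}(T(t,z)) = S_{h_0}(\varphi_{h_0}(t,z))$ for every $(t,z)\in X$. Hence, by the uniform continuity of $S_{h_0}$ on the compact set $\varphi_{h_0}(X)$, for $\delta > 0$ we can find $0 < \varepsilon_0 < \delta/6$ such that
$$\text{if } \|\varphi_{h_0}(t,z) - \varphi_{h_0}(s,u)\| < 3\varepsilon_0, \text{ then } \|\varphi_{h_0}(T(t,z)) - \varphi_{h_0}(T(s,u))\| < \frac{\delta}{6}$$
for $(t,z), (s,u)\in X$. Let $\varepsilon_1 > 0$ be such that $\|\varphi_h - \varphi_{h_0}\| < \varepsilon_0$ if $\|h - h_0\|_{\mathrm{Lip}(X)} < \varepsilon_1$, and let
$$U = \{h\in\mathrm{Lip}(X,\mathbb R) : \|h - h_0\|_{\mathrm{Lip}(X)} < \varepsilon_1\}.$$
With this choice, if $h\in U$, then for every $\varepsilon < \varepsilon_0$ and $(s,u)\in\varphi_h^{-1}(B(\varphi_h(t,z),\varepsilon))$, we have $\|\varphi_{h_0}(t,z) - \varphi_{h_0}(s,u)\| \le \varepsilon + 2\varepsilon_0 < 3\varepsilon_0$, so $\|\varphi_{h_0}(T(t,z)) - \varphi_{h_0}(T(s,u))\| \le \delta/6$ and hence
$$\|\varphi_h(T(t,z)) - \varphi_h(T(s,u))\| < \frac{\delta}{6} + 2\varepsilon_0 < \frac{\delta}{2}.$$
This together with the definitions of $\chi_{h,\varepsilon}$ and $\sigma_{h,\varepsilon}$ gives $\|\chi_{h,\varepsilon}(\varphi_h(t,z)) - \varphi_h(T(t,z))\| \le \delta/2$ and, consequently, $\sigma_{h,\varepsilon}(\varphi_h(t,z)) \le \delta$ for every $(t,z)\in X$.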

Propositions 6.3–6.4 complete the proof of Theorem 1.19.

7. Examples and discussion on assumptions

In this section we provide examples showing the necessity of several assumptions in our results.
7.1. Theorem 1.14 does not hold for the information dimension. In the following proposition we present an example showing that in Theorem 1.14 one cannot replace the assumption $k < \dim_H T^k(\mu|_{X\setminus\mathrm{PrePer}_k(T)})$ by $k < \mathrm{ID}(T^k(\mu|_{X\setminus\mathrm{PrePer}_k(T)}))$.
Proposition 7.1. There exist a compact set $X\subset\mathbb R^2$, a Borel probability measure $\mu$ on $X$ and a Lipschitz map $T\colon X\to X$ such that the following hold.
(a) $\mu\big(\bigcup_{p=1}^\infty\mathrm{PrePer}_p(T)\big) = 0$,
(b) $\dim_H\mu < 1$,
(c) $\mathrm{ID}(\mu) = \mathrm{ID}(T\mu) > 1$,
(d) a prevalent Lipschitz observable $h\colon X\to\mathbb R$ is almost surely 1-predictable; in particular, $\lim_{\varepsilon\to0}\mu(\{x\in X : \sigma_{h,\varepsilon}(\varphi_h(x)) > \delta\}) = 0$ for every $\delta > 0$, where $\varphi_h$ is the 1-delay coordinate map corresponding to $h$.

Proof. It will be convenient for us to consider certain self-similar sets and measures constructed via iterated function systems. For an introduction to this theory see [BSS23]. Consider an iterated function system $\{f_i\}_{i=0}^3$, where $f_i\colon\mathbb R^2\to\mathbb R^2$ are given by
$$f_i(x) = \lambda x + t_i, \qquad t_0 = (0,0),\ t_1 = (0,1-\lambda),\ t_2 = (1-\lambda,0),\ t_3 = (1-\lambda,1-\lambda)$$
for some $1/4 < \lambda < 1/2$. Let $X\subset\mathbb R^2$, called a four-corner Cantor set, be the attractor of the system $\{f_i\}_{i=0}^3$, i.e. the unique non-empty compact set satisfying $X = \bigcup_{i=0}^3 f_i(X)$. We have $X = \pi(\Omega)$, where $\pi\colon\Omega\to\mathbb R^2$ is the natural projection map from the symbolic space $\Omega = \{0,1,2,3\}^{\mathbb N}$, given by $\pi(\omega_1,\omega_2,\ldots) = \lim_{n\to\infty}f_{\omega_1}\circ\cdots\circ f_{\omega_n}(0)$ (see e.g. [BP17, Chapter 2]). As $\lambda < 1/2$, the sets $f_i([0,1]^2)\subset[0,1]^2$ are pairwise disjoint, hence $\pi$ is a bijection. Moreover, it is straightforward to check that if we endow $\Omega$ with the metric $d(\omega,\tau) = \lambda^{|\omega\wedge\tau|}$ (where $\omega\wedge\tau$ is the longest common prefix of infinite words $\omega\ne\tau$ and $|\omega\wedge\tau|$ is its length), then $\pi$ is a bi-Lipschitz map and
$$\dim_H X = \dim_B X = \dim_H\Omega = \dim_B\Omega = \frac{\log 4}{-\log\lambda}$$

(see e.g. [BP17, Theorem 2.2.2]). Hence, $1 < \dim_H\Omega = \dim_B\Omega < 2$, and following the construction in [FLR02, Section 3], one can construct a non-atomic Borel probability measure $\nu$ on $\Omega$ with $\operatorname{supp}\nu = \Omega$ and $\dim_H\nu < 1 < \mathrm{ID}(\nu) = \mathrm{ID}(\sigma\nu)$, where $\sigma\colon\Omega\to\Omega$ is the left-side shift (in this construction, $\nu$ is a convex combination of infinite product measures).³ Setting $T\colon X\to X$ and $\mu$ on $X$ as
$$T = \pi\circ\sigma\circ\pi^{-1}, \qquad \mu = \pi\nu,$$
and using the fact that $\pi$ is bi-Lipschitz and the set $\bigcup_{p=1}^\infty\mathrm{PrePer}_p(\sigma)$ is countable, one obtains
$$\mu\Big(\bigcup_{p=1}^\infty\mathrm{PrePer}_p(T)\Big) = 0, \qquad \dim_H\mu < 1, \qquad \mathrm{ID}(\mu) = \mathrm{ID}(T\mu) > 1,$$

so $\mu$ satisfies assertions (a)–(c). Note that for $k = 1$, the measure $\mu$ satisfies the assumptions of Theorem 1.14 with the condition $\mathrm{ID}(T\mu) > 1$ instead of $\dim_H(T\mu) > 1$. On the other hand, as $\dim_H\mu < 1$, it follows from [BGŚ22a, Theorem 1.18] that a prevalent Lipschitz observable $h\colon X\to\mathbb R$ is almost surely 1-predictable, which implies $\lim_{\varepsilon\to0}\mu(\{x\in X : \sigma_{h,\varepsilon}(\varphi_h(x)) > \delta\}) = 0$ for every $\delta > 0$ (see Remark 1.15). This shows (d).
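A minimal Python sketch of this construction (the value $\lambda = 0.3$ and all function names are our illustrative choices): points of $X$ are approximated by applying $\pi$ to finite words, and $T = \pi\circ\sigma\circ\pi^{-1}$ is realised concretely by reading off the first symbol $\omega_1$ from the corner square containing $x$ and applying $f_{\omega_1}^{-1}$. The disjointness of the squares $f_i([0,1]^2)$ for $\lambda < 1/2$ is exactly what makes this well defined.

```python
import math
import random

LAM = 0.3  # hypothetical choice; any 1/4 < lambda < 1/2 works
SHIFTS = [(0.0, 0.0), (0.0, 1.0 - LAM), (1.0 - LAM, 0.0), (1.0 - LAM, 1.0 - LAM)]


def f(i, x):
    """The similarity f_i(x) = lambda * x + t_i."""
    return (LAM * x[0] + SHIFTS[i][0], LAM * x[1] + SHIFTS[i][1])


def project(word):
    """f_{w_1} o ... o f_{w_n}(0): an approximation of pi(omega) by a finite word."""
    x = (0.0, 0.0)
    for i in reversed(word):
        x = f(i, x)
    return x


def T(x):
    """T = pi o sigma o pi^{-1}: read off the first symbol from the corner square, then undo f_i."""
    i = 2 * (x[0] >= 0.5) + (x[1] >= 0.5)  # the squares f_i([0,1]^2) are disjoint corners
    return ((x[0] - SHIFTS[i][0]) / LAM, (x[1] - SHIFTS[i][1]) / LAM)


def similarity_dimension(lam):
    """dim_H X = log 4 / (-log lambda) for the four-corner Cantor set."""
    return math.log(4) / -math.log(lam)


rng = random.Random(0)
word = [rng.randrange(4) for _ in range(30)]
print(math.dist(T(project(word)), project(word[1:])))  # tiny: T acts as the shift
print(similarity_dimension(LAM))                        # between 1 and 2
```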

Remark 7.2. Note that the measure $\mu$ constructed in Proposition 7.1 is not a natural measure for a smooth diffeomorphism (it is not even $T$-invariant), hence it does not provide a counterexample to assertion (i) of the SSOY prediction error conjecture. Finding such a counterexample (or proving that it does not exist) remains an open question.

³ The construction in [FLR02] corresponds to $\Omega = \{0,1\}^{\mathbb N}$ with $\lambda = 1/2$, but extends directly to our case.
7.2. In Theorems 1.14 and 1.21–1.23, $T^k\mu$ (or $T^{k-1}\mu$) cannot be replaced by $\mu$. We present an example showing that in Theorem 1.14, the assumption $k < \dim_H T^k(\mu|_{X\setminus\mathrm{PrePer}_k(T)})$ cannot be replaced by $k < \dim_H(\mu|_{X\setminus\mathrm{PrePer}_k(T)})$, and in Theorem 1.23, the assumption $k < \dim_H T^{k-1}\mu$ cannot be replaced by $k < \dim_H\mu$. Additionally, the example shows that in Theorems 1.21 and 1.22, the lower bounds in assertions (a) and (b) cannot be replaced by $\dim_H\mu - k$ and $\dim_H\mu - k - \varepsilon$, respectively. Note that the example also shows that in each of the above cases, one cannot replace $T^k\mu$ (or $T^{k-1}\mu$) by $\mu$ even under the additional assumption of the injectivity of $T$.

Proposition 7.3. There exist a compact set $X\subset\mathbb R^3$, a Borel probability measure $\mu$ on $X$ and a Lipschitz injective map $T\colon X\to X$, with $\dim_H\mu > 2$ and $\dim_H T^k\mu < 1$ for every $k\ge1$, such that for a prevalent Lipschitz observable $h\colon X\to\mathbb R$ and the 2-delay coordinate map $\varphi_h$ corresponding to $h$, the following hold.
(a) $h$ is almost surely 2-predictable; in particular, $\lim_{\varepsilon\to0}\mu(\{x\in X : \sigma_{h,\varepsilon}(\varphi_h(x)) > \delta\}) = 0$ for every $\delta > 0$.
(b) $\varphi_h\circ T(\mu_{h,\varphi_h(x)}) = \delta_{\varphi_h(Tx)}$, and hence $\dim_H\varphi_h\circ T(\mu_{h,\varphi_h(x)}) = 0$ for $\mu$-almost every $x\in X$.
(c) The measure $\varphi_h\mu$ is supported on a set of Hausdorff dimension smaller than 2, so it is not absolutely continuous with respect to the 2-dimensional Lebesgue measure on $\mathbb R^2$.

Proof. We define $X\subset\mathbb R^3$ to be a compact set which is a disjoint union $X = \bigcup_{j=1}^\infty X_j\cup\{0\}$, where
(i) $X_1$ is a compact self-similar set with $\dim_H X_1 = \dim_B X_1 > 2$,
(ii) $X_2$ is a compact self-similar set with $\dim_H X_2 = \dim_B X_2 < 1$,
(iii) $X_1$ and $X_2$ are homeomorphic by a Lipschitz bijection $T\colon X_1\to X_2$,
(iv) $X_j = f_j(X_2)$ for $j\ge3$, where $f_j\colon\mathbb R^3\to\mathbb R^3$ is a similarity with scaling ratio $2^{-j}$ (in particular, $X_j$ and $X_2$ are bi-Lipschitz homeomorphic),
(v) $\lim_{j\to\infty}\operatorname{dist}(X_j,\{0\}) = 0$.
The sets $X_1$ and $X_2$ can be constructed similarly as in Subsection 7.1, both being homeomorphic to the same symbolic space $\Omega$, but with different contraction rates. Using (i)–(v), we can define a Lipschitz injective map $T\colon X\to X$ such that $T(X_j) = X_{j+1}$ for $j\ge1$ and $T(0) = 0$. Let $\mu$ be a probability measure on $X_1$ satisfying $\underline{\dim}_H\mu = \overline{\dim}_H\mu = \dim_H X_1 > 2$ (one can take $\mu$ to be the dimension maximizing self-similar measure on $X_1$, see e.g. [BP17, proof of Theorem 2.2.2] or [Edg98, Theorem 5.2.5]). Note that $\dim_H T^j\mu \le \dim_H X_2 < 1$ for every $j\ge1$. As $\dim_H T\mu < 1$, the map $T$ is injective and $0$ is the only periodic point of $T$, we can apply [BGŚ20, Theorem 1.2] to conclude that for a prevalent Lipschitz observable $h\colon X\to\mathbb R$, there exists a set $X_h\subset X_2$ of full $T\mu$-measure on which $h$ is injective. As $T$ is injective, this implies that the 2-delay coordinate map $\varphi_h\colon X\to\mathbb R^2$ is injective on the set $T^{-1}(X_h)$ of full $\mu$-measure. Indeed, if $\varphi_h(x) = \varphi_h(y)$ for $x, y\in T^{-1}(X_h)$, then $h(Tx) = h(Ty)$, hence $Tx = Ty$ and, consequently, $x = y$. Therefore, $\mu_{h,\varphi_h(x)} = \delta_x$ for $\mu$-almost every $x\in X$. This immediately implies assertions (a)–(b). To show (c), note that $\varphi_h\mu$ is supported on $\mathbb R\times h(T(X_1)) = \mathbb R\times h(X_2)$ and $\dim_H(\mathbb R\times h(X_2)) \le 1 + \dim_B X_2 < 2$.

7.3. Assumptions on pre-periodic points are necessary. As noted in Remark 1.9, to see that some assumptions on the size of the set of pre-periodic points of $T$ are necessary, it is enough to consider the case when the map $T$ is the identity. In this case every observable is predictable, regardless of the dimension of the phase space and the measure $\mu$. Moreover, the measure $\varphi_h\mu$ is supported on the 1-dimensional diagonal in $\mathbb R^k$, hence it cannot be absolutely continuous with respect to the $k$-dimensional Lebesgue measure for $k > 1$. One can easily construct similar examples where all the points of the phase space are (pre-)periodic with the same period $p$, $p > 1$. Therefore, one cannot remove the assumption on pre-periodic points in Theorems 1.14, 1.22–1.23 and Theorem A.1 below.
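Both degenerate situations are easy to see concretely, assuming the usual convention $\varphi_h(x) = (h(x), h(Tx), \ldots, h(T^{k-1}x))$ for the $k$-delay coordinate map. In the Python sketch below (observable and maps hypothetical), the identity map sends every point to the diagonal, and for the period-2 map $x\mapsto -x$ with $k = 3$ the image lies in the plane $\{y_1 = y_3\}$, so the push-forward of any measure cannot be absolutely continuous with respect to Lebesgue measure on $\mathbb R^3$.

```python
import math


def delay_map(h, T, x, k):
    """k-delay coordinate map phi_h(x) = (h(x), h(Tx), ..., h(T^{k-1}x))."""
    coords = []
    for _ in range(k):
        coords.append(h(x))
        x = T(x)
    return tuple(coords)


def h(x):
    """An arbitrary (hypothetical) Lipschitz observable on [-1, 1]."""
    return x ** 3 + math.sin(5.0 * x)


def identity(x):
    return x  # every point is fixed


def flip(x):
    return -x  # every point except 0 has period 2


print(delay_map(h, identity, 0.3, 4))  # four equal coordinates: a point of the diagonal
print(delay_map(h, flip, 0.3, 3))      # first and third coordinates agree
```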

Appendix A. Local dimension projection theorem for delay coordinate maps

We prove the following result on the local dimensions of the push-forward of Borel probability measures on compact sets in $\mathbb R^N$ by delay coordinate maps. This extends [SY97, Theorem 3.5, Remark 4.4] to a more general case.

Theorem A.1. Let $X\subset\mathbb R^N$, $N\in\mathbb N$, be a compact set, let $\mu$ be a Borel probability measure on $X$ and let $T\colon X\to X$ be a Lipschitz map. Fix $k\in\mathbb N$ and (in the case $k > 1$) assume $\mu(\mathrm{PrePer}_{k-1}(T)) = 0$. Then for a prevalent Lipschitz observable $h\colon X\to\mathbb R$ and the $k$-delay coordinate map $\varphi_h$ corresponding to $h$,
$$d(\varphi_h\mu,\varphi_h(x)) \ge \min\{k, d(T^{k-1}\mu, T^{k-1}x)\} \quad\text{for } \mu\text{-almost every } x\in X.$$
If, additionally, $\mu$ is $T$-invariant or $T$ is bi-Lipschitz onto its image, then for a prevalent Lipschitz observable $h\colon X\to\mathbb R$,
$$d(\varphi_h\mu,\varphi_h(x)) = \min\{k, d(\mu,x)\} \quad\text{for } \mu\text{-almost every } x\in X.$$
Theorem A.1 provides the following corollary.

Corollary A.2. Let $X\subset\mathbb R^N$, $N\in\mathbb N$, be a compact set, let $\mu$ be a Borel probability measure on $X$ and let $T\colon X\to X$ be a Lipschitz map such that $\mu$ is $T$-invariant or $T$ is bi-Lipschitz onto its image. Fix $k\in\mathbb N$ and (in the case $k > 1$) assume $\mu\big(\bigcup_{p=1}^{k-1}\mathrm{Per}_p(T)\big) = 0$. Suppose that the local dimension $d(\mu,x)$ exists for $\mu$-almost every $x\in X$. Then for a prevalent Lipschitz observable $h\colon X\to\mathbb R$, the local dimension $d(\varphi_h\mu,\varphi_h(x))$ exists at $\mu$-almost every $x\in X$ and satisfies
$$d(\varphi_h\mu,\varphi_h(x)) = \min\{k, d(\mu,x)\},$$
where $\varphi_h$ is the $k$-delay coordinate map corresponding to $h$.
To show Theorem A.1, we first prove the following more specific result.

Theorem A.3. Let $X\subset\mathbb R^N$, $N\in\mathbb N$, be a compact set, let $\mu$ be a Borel probability measure on $X$ and let $T\colon X\to X$ be a Lipschitz map. Fix $k\in\mathbb N$ and (in the case $k > 1$) assume $\mu(\mathrm{PrePer}_{k-1}(T)) = 0$. Fix $d\ge 2k-1$ and let $\{h_1,\ldots,h_m\}$ be the set of monomials of $N$ variables of degree at most $d$. Let $h\colon X\to\mathbb R$ be a Lipschitz observable. Then for Lebesgue-almost every $\alpha\in\mathbb R^m$ and the $k$-delay coordinate map $\varphi_\alpha$ corresponding to $h_\alpha$,
$$d(\varphi_\alpha\mu,\varphi_\alpha(x)) \ge \min\{k, d(T^{k-1}\mu, T^{k-1}x)\} \quad\text{for } \mu\text{-almost every } x\in X.$$
Proof. The proof follows the ideas used in [SY97, Theorem 3.5] and [HK97, Theorem 4.1]. As previously, it is sufficient to prove the assertion for Lebesgue-almost every $\alpha\in B_m(0,1)$. Fix $\eta > 0$, consider the decomposition $\mathcal F$ from Proposition 4.6 for $\ell = k-1$ (in fact, in this proof we only make use of properties (i) and (iii) of the decomposition, hence $\eta$ will not be used in the proof) and fix $F\in\mathcal F$. Fix $0 < s < k$. For $M > 0$ define
$$F_M = \{x\in F : E_s(T^{k-1}\mu, T^{k-1}x) \le M\}.$$
We will show that
$$I = \int_{B_m(0,1)}\int_{F_M}E_s(\varphi_\alpha(\mu|_F),\varphi_\alpha(x))\,d\mu(x)\,d\alpha < \infty.$$
To do so, note that by (12), Tonelli's theorem and Lemma 4.1,
$$I = \int_{B_m(0,1)}\int_{F_M}\int_F\frac{d\mu(y)\,d\mu(x)\,d\alpha}{\|\varphi_\alpha(x) - \varphi_\alpha(y)\|^s} = \int_{B_m(0,1)}\int_{F_M}\int_F\frac{d\mu(y)\,d\mu(x)\,d\alpha}{\|D_{x,y}\alpha + w_{x,y}\|^s} \le C_1\int_{F_M}\int_F\frac{d\mu(y)\,d\mu(x)}{(\sigma_k(D_{x,y}))^s}$$
for some $C_1 > 0$. By Proposition 4.6(iii), we can use Proposition 4.4 to obtain
$$I \le C_2\int_{F_M}\int_F\frac{d\mu(y)\,d\mu(x)}{\|T^{k-1}x - T^{k-1}y\|^s} \le C_2\int_{F_M}\int_X\frac{d\mu(y)\,d\mu(x)}{\|T^{k-1}x - T^{k-1}y\|^s} = C_2\int_{F_M}E_s(T^{k-1}\mu, T^{k-1}x)\,d\mu(x) \le C_2M < \infty.$$

Letting M → ∞ and applying Lemma 2.9, we obtain, for Lebesgue-almost every α ∈ B_m(0, 1),
d(φα(µ|_F), φα(x)) ≥ s for µ-almost every x ∈ F such that E_s(T^{k−1}µ, T^{k−1}x) < ∞.
Applying Lemma 2.9 once more, we arrive at
d(φα(µ|_F), φα(x)) ≥ s for µ-almost every x ∈ F such that d(T^{k−1}µ, T^{k−1}x) > s.
Taking the intersection over all rational s ∈ (0, k), we obtain, for Lebesgue-almost every α ∈ B_m(0, 1),
d(φα(µ|_F), φα(x)) ≥ min{k, d(T^{k−1}µ, T^{k−1}x)} for µ-almost every x ∈ F.
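Lemma 2.9 is not restated here. The quantity it controls, the lower local dimension liminf_{δ→0} log ν(B(y, δ))/log δ, can still be probed at finitely many scales. The sketch below is a heuristic and not part of the proof. It fits the slope of log ν_n(B(y, r)) against log r for an empirical measure ν_n, and its helper names are purely illustrative. For points on a uniform grid in the unit square the slope comes out close to 2, and for points on a segment close to 1.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def ball_mass(sample: Sequence[Point], y: Point, r: float) -> float:
    """Empirical measure of the closed ball B(y, r)."""
    return sum(1 for z in sample if math.dist(y, z) <= r) / len(sample)


def local_dimension_slope(
    sample: Sequence[Point], y: Point, radii: Sequence[float]
) -> float:
    """Least-squares slope of log nu_n(B(y, r)) against log r over the given radii.

    A finite-scale proxy for the local dimension at y: it says nothing about the
    limit delta -> 0 and depends on the chosen radii.
    """
    logs_r: List[float] = []
    logs_m: List[float] = []
    for r in radii:
        m = ball_mass(sample, y, r)
        if m > 0:  # skip empty balls, whose logarithm is undefined
            logs_r.append(math.log(r))
            logs_m.append(math.log(m))
    if len(logs_r) < 2:
        raise ValueError("need at least two radii with non-empty balls")
    mean_r = sum(logs_r) / len(logs_r)
    mean_m = sum(logs_m) / len(logs_m)
    cov = sum((a - mean_r) * (b - mean_m) for a, b in zip(logs_r, logs_m))
    var = sum((a - mean_r) ** 2 for a in logs_r)
    return cov / var


if __name__ == "__main__":
    n = 200
    grid = [((i + 0.5) / n, (j + 0.5) / n) for i in range(n) for j in range(n)]
    print(local_dimension_slope(grid, (0.5, 0.5), [0.05, 0.1, 0.2]))  # close to 2
```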
As µ(⋃_{F∈𝓕} F) = 1, to end the proof it is sufficient to show

(32) d(φα(µ|_F), φα(x)) = d(φαµ, φα(x)) for µ-almost every x ∈ F.

To prove (32), note that φα(µ|_F) ≪ φαµ, so by the differentiation theorem for measures (see
e.g. [Mat95, Theorem 2.12]), the Radon-Nikodym derivative
$$\frac{d\varphi_\alpha(\mu|_F)}{d\varphi_\alpha\mu}(y) = \lim_{\delta\to 0}\frac{\varphi_\alpha(\mu|_F)(B(y,\delta))}{\varphi_\alpha\mu(B(y,\delta))}$$
exists and is positive and finite for φα(µ|_F)-almost every y ∈ R^k. Therefore, for φα(µ|_F)-almost
every y ∈ R^k,
$$d(\varphi_\alpha\mu, y) = \liminf_{\delta\to 0}\frac{\log\big(\varphi_\alpha(\mu|_F)(B(y,\delta))\big) + \log\frac{\varphi_\alpha\mu(B(y,\delta))}{\varphi_\alpha(\mu|_F)(B(y,\delta))}}{\log\delta} = d(\varphi_\alpha(\mu|_F), y).$$
In particular, this holds with y = φα(x) for µ-almost every x ∈ F, which proves (32).
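The last equality uses one step worth spelling out. Write g(y) for the Radon-Nikodym derivative above, so that 0 < g(y) < ∞ at the points under consideration. Then the second logarithm converges to a finite number and, after division by log δ → −∞, vanishes:

$$\log\frac{\varphi_\alpha\mu(B(y,\delta))}{\varphi_\alpha(\mu|_F)(B(y,\delta))} \xrightarrow[\delta\to 0]{} -\log g(y) \in \mathbb{R}, \qquad\text{hence}\qquad \frac{1}{\log\delta}\,\log\frac{\varphi_\alpha\mu(B(y,\delta))}{\varphi_\alpha(\mu|_F)(B(y,\delta))} \xrightarrow[\delta\to 0]{} 0.$$

Adding a term that tends to 0 does not change a lower limit. So the lim inf in the display equals liminf_{δ→0} log φα(µ|_F)(B(y, δ))/log δ = d(φα(µ|_F), y).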

Proof of Theorem A.1. The main assertion follows directly from Theorem A.3. The additional ones
use the observation that if µ is T-invariant or T is bi-Lipschitz onto its image, then

(33) d(T^{k−1}µ, T^{k−1}x) = d(µ, x) for µ-almost every x ∈ X.

Indeed, if µ is T-invariant, then T^{k−1}µ = µ. As T is Lipschitz, we have B(x, r/Lip(T)) ⊂
T^{−1}(B(Tx, r)), so µ(B(x, r/Lip(T))) ≤ µ(B(Tx, r)) by the invariance of µ. Therefore, d(µ, Tx) ≤
d(µ, x). On the other hand, by the invariance of µ, there holds ∫ d(µ, Tx) dµ(x) = ∫ d(µ, x) dµ(x),
so d(µ, Tx) = d(µ, x) for µ-almost every x ∈ X. Applying the same argument to the µ-invariant
Lipschitz map T^{k−1} in place of T proves (33). If T is bi-Lipschitz, then
B(x, r/Lip(T)) ⊂ T^{−1}(B(Tx, r)) ⊂ B(x, Lip(T^{−1})r), so d(Tµ, Tx) = d(µ, x) at every x ∈ X, and
(33) follows. Moreover, φα is Lipschitz for every Lipschitz observable h. Using the same argument as
above for φα instead of T^{k−1} gives d(φαµ, φα(x)) ≤ d(µ, x) at every x ∈ X. Since φαµ is a Borel
probability measure on R^k, its local dimension is also at most k almost everywhere. Hence
d(φαµ, φα(x)) ≤ min{k, d(µ, x)} for µ-almost every x ∈ X.
Combining this with (33) and Theorem A.3 ends the proof of the additional assertions.
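For the reader's convenience, the step from the ball inclusion to d(µ, Tx) ≤ d(µ, x) can be written out. Put L = Lip(T), and take x in the support of µ and 0 < r < min{1, L}, so that log r and log(r/L) are both negative. Since µ(B(Tx, r)) ≥ µ(B(x, r/L)), dividing the logarithms by log r reverses the inequality:

$$\frac{\log\mu(B(Tx,r))}{\log r} \le \frac{\log\mu(B(x,r/L))}{\log r} = \frac{\log\mu(B(x,r/L))}{\log(r/L)}\cdot\frac{\log(r/L)}{\log r}.$$

As r → 0, the last factor tends to 1, so taking lim inf on both sides gives d(µ, Tx) ≤ d(µ, x). The same computation with φα and Lip(φα) in place of T and L yields d(φαµ, φα(x)) ≤ d(µ, x).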
Proof of Corollary A.2. In the same way as in the proof of Theorem A.1, for a prevalent Lipschitz
observable h : X → R we obtain the upper bound
$$\overline{d}(\varphi_h\mu, \varphi_h(x)) \le \min\{k, \overline{d}(\mu, x)\} = \min\{k, d(\mu, x)\} \quad\text{for } \mu\text{-almost every } x \in X.$$
By Theorem A.1, we also have the lower bound
$$\underline{d}(\varphi_h\mu, \varphi_h(x)) \ge \min\{k, \underline{d}(\mu, x)\} = \min\{k, d(\mu, x)\} \quad\text{for } \mu\text{-almost every } x \in X.$$
Combining these facts finishes the proof.

References
[Aba96] Henry D. I. Abarbanel. Analysis of observed chaotic data. Institute for Nonlinear Science. Springer-Verlag, New York, 1996.
[BGŚ20] Krzysztof Barański, Yonatan Gutman, and Adam Śpiewak. A probabilistic Takens theorem. Nonlinearity, 33(9):4940–4966, 2020.
[BGŚ22a] Krzysztof Barański, Yonatan Gutman, and Adam Śpiewak. On the Shroer-Sauer-Ott-Yorke predictability conjecture for time-delay embeddings. Comm. Math. Phys., 391(2):609–641, 2022.
[BGŚ22b] Krzysztof Barański, Yonatan Gutman, and Adam Śpiewak. Prediction of dynamical systems from time-delayed measurements with self-intersections. preprint arXiv:2212.13509, 2022.
[Bin60] R. H. Bing. A simple closed curve is the only homogeneous bounded plane continuum that contains an arc. Canadian J. Math., 12:209–230, 1960.
[Bow08] Rufus Bowen. Equilibrium states and the ergodic theory of Anosov diffeomorphisms, volume 470 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, revised edition, 2008. With a preface by David Ruelle, edited by Jean-René Chazottes.
[BP17] Christopher J. Bishop and Yuval Peres. Fractals in probability and analysis, volume 162 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2017.
[BR23] Niraj Bagh and M. Ramasubba Reddy. Investigation of the dynamical behavior of brain activities during rest and motor imagery movements. Biomedical Signal Processing and Control, 79:104153, 2023.
[BSS23] Balázs Bárány, Károly Simon, and Boris Solomyak. Self-similar and self-affine sets and measures, volume 276 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 2023.
[Cab00] Victoria Caballero. On an embedding theorem. Acta Math. Hungar., 88(4):269–278, 2000.
[ČP88] A. Čenys and K. Pyragas. Estimation of the number of degrees of freedom from chaotic time series. Physics Letters A, 129(4):227–230, 1988.
[DLSR23] Paweł Dłotko, Michał Lipiński, and Justyna Signerska-Rynkowska. Testing topological conjugacy of time series. preprint arXiv:2301.06753, 2023.
[Edg98] Gerald A. Edgar. Integral, probability, and fractal measures. Springer-Verlag, New York, 1998.
[Fal97] Kenneth Falconer. Techniques in fractal geometry. John Wiley & Sons, Ltd., Chichester, 1997.
[Fal14] Kenneth Falconer. Fractal geometry. John Wiley & Sons, Ltd., Chichester, third edition, 2014. Mathematical foundations and applications.
[FLR02] Ai-Hua Fan, Ka-Sing Lau, and Hui Rao. Relationships between different dimensions of a measure. Monatsh. Math., 135(3):191–201, 2002.
[FS87] J. Doyne Farmer and John J. Sidorowich. Predicting chaotic time series. Phys. Rev. Lett., 59:845–848, 1987.
[GQS18] Yonatan Gutman, Yixiao Qiao, and Gábor Szabó. The embedding problem in topological dynamics and Takens' theorem. Nonlinearity, 31(2):597–620, 2018.
[Gut16] Yonatan Gutman. Takens' embedding theorem with a continuous observable. In Ergodic theory, pages 134–141. De Gruyter, Berlin, 2016.
[Hat02] Allen Hatcher. Algebraic topology. Cambridge University Press, Cambridge, 2002.
[HBS15] Franz Hamilton, Tyrus Berry, and Timothy Sauer. Predicting chaotic time series with a partial model. Phys. Rev. E, 92:010902, 2015.
[HGLS05] Chih-Hao Hsieh, Sarah M. Glaser, Andrew J. Lucas, and George Sugihara. Distinguishing random environmental fluctuations from ecological catastrophes for the North Pacific Ocean. Nature, 435(7040):336–340, 2005.
[HK97] Brian R. Hunt and Vadim Yu. Kaloshin. How projections affect the dimension spectrum of fractal measures. Nonlinearity, 10(5):1031–1046, 1997.
[HO16] Logan C. Hoehn and Lex G. Oversteegen. A complete classification of homogeneous plane continua. Acta Math., 216(2):177–216, 2016.
[HP97] Simon Haykin and Sadasivan Puthusserypady. Chaotic dynamics of sea clutter. Chaos: An Interdisciplinary Journal of Nonlinear Science, 7(4):777–802, 1997.
[HSY92] Brian R. Hunt, Tim Sauer, and James A. Yorke. Prevalence: a translation-invariant “almost every” on infinite-dimensional spaces. Bull. Amer. Math. Soc. (N.S.), 27(2):217–238, 1992.
[HT94] Xiaoyu Hu and S. James Taylor. Fractal properties of products and projections of measures in R^d. Math. Proc. Cambridge Philos. Soc., 115(3):527–544, 1994.
[Huk06] Jeremy P. Huke. Embedding nonlinear dynamical systems: A guide to Takens' theorem. Manchester Institute for Mathematical Sciences EPrint 2006.26, 2006.
[JL94] Amithirigala W. Jayawardena and Feizhou Lai. Analysis and prediction of chaos in rainfall and stream flow time series. Journal of Hydrology, 153(1):23–52, 1994.
[JM98] Maarit Järvenpää and Pertti Mattila. Hausdorff and packing dimensions and sections of measures. Mathematika, 45(1):55–77, 1998.
[JNW04] Boju Jiang, Yi Ni, and Shicheng Wang. 3-manifolds that admit knotted solenoids as attractors. Transactions of the American Mathematical Society, 356(11):4371–4382, 2004.
[KBA92] Matthew B. Kennel, Reggie Brown, and Henry D. I. Abarbanel. Determining embedding dimension for phase-space reconstruction using a geometrical construction. Phys. Rev. A, 45:3403–3411, 1992.
[Kec95] Alexander S. Kechris. Classical descriptive set theory, volume 156 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1995.
[KK23] Péter Koltai and Philipp Kunde. A Koopman–Takens theorem: Linear least squares prediction of nonlinear time series. preprint arXiv:2308.02175, 2023.
[KY90] Eric J. Kostelich and James A. Yorke. Noise reduction: finding the simplest dynamical system consistent with the data. Phys. D, 41(2):183–196, 1990.
[LPS91] W. Liebert, K. Pawelzik, and H. G. Schuster. Optimal embeddings of chaotic attractors from topological considerations. Europhysics Letters, 14(6):521, 1991.
[Mar54] John M. Marstrand. Some fundamental geometrical properties of plane sets of fractional dimensions. Proc. London Math. Soc. (3), 4:257–302, 1954.
[Mat75] Pertti Mattila. Hausdorff dimension, orthogonal projections and intersections with planes. Ann. Acad. Sci. Fenn. Ser. A I Math., 1(2):227–244, 1975.
[Mat95] Pertti Mattila. Geometry of sets and measures in Euclidean spaces, volume 44 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1995.
[MRCA14] Ramsés Mena, Francisco Rodríguez, María Castilla, and Manuel R. Arahal. A prediction model based on neural networks for the energy consumption of a bioclimatic building. Energy and Buildings, 82:142–155, 2014.
[Noa91] Lyle Noakes. The Takens embedding theorem. Internat. J. Bifur. Chaos Appl. Sci. Engrg., 1(4):867–872, 1991.
[NV20] Raymundo Navarrete and Divakar Viswanath. Prevalence of delay embeddings with a fixed observation function. Phys. D, 414:132697, 15, 2020.
[OY03] William Ott and James A. Yorke. Learning about reality from observation. SIAM J. Appl. Dyn. Syst., 2(3):297–322, 2003.
[PCFS80] Norman H. Packard, James P. Crutchfield, J. Doyne Farmer, and Robert S. Shaw. Geometry from a time series. Phys. Rev. Lett., 45:712–716, 1980.
[Pes08] Yakov B. Pesin. Dimension theory in dynamical systems: contemporary views and applications. University of Chicago Press, 2008.
[Rob99] Clark Robinson. Dynamical systems. Studies in Advanced Mathematics. CRC Press, Boca Raton, FL, second edition, 1999. Stability, symbolic dynamics, and chaos.
[Rob05] James C. Robinson. A topological delay embedding theorem for infinite-dimensional dynamical systems. Nonlinearity, 18(5):2135–2143, 2005.
[Rob11] James C. Robinson. Dimensions, embeddings, and attractors, volume 186 of Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge, 2011.

[RS03] Michał Rams and Károly Simon. Hausdorff and packing measure for solenoids. Ergodic Theory Dynam. Systems, 23(1):273–291, 2003.
[SBDH97] Jaroslav Stark, David S. Broomhead, Michael Evan Davies, and Jeremy P. Huke. Takens embedding theorems for forced and stochastic systems. In Proceedings of the Second World Congress of Nonlinear Analysts, Part 8 (Athens, 1996), volume 30, pages 5303–5314, 1997.
[SBDH03] Jaroslav Stark, David S. Broomhead, Michael Evan Davies, and Jeremy P. Huke. Delay embeddings for forced systems. II. Stochastic forcing. J. Nonlinear Sci., 13(6):519–577, 2003.
[Sim97] Károly Simon. The Hausdorff dimension of the Smale-Williams solenoid with different contraction coefficients. Proceedings of the American Mathematical Society, 125(4):1221–1228, 1997.
[Sim12] David Simmons. Conditional measures and conditional expectation; Rohlin's disintegration theorem. Discrete Contin. Dyn. Syst., 32(7):2565–2582, 2012.
[SKY+18] Volkan Sarp, Ali Kilcik, Vasyl Yurchyshyn, Jean-Pierre Rozelot, and Atila Ozguc. Prediction of solar cycle 25: a non-linear approach. Monthly Notices of the Royal Astronomical Society, 481(3):2981–2985, 2018.
[SM90] George Sugihara and Robert May. Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series. Nature, 344(6268):734–741, 1990.
[Sol23] Boris Solomyak. Notes on the transversality method for iterated function systems – a survey. Mathematical and Computational Applications, 28(3), 2023.
[SSOY98] Christian G. Schroer, Tim Sauer, Edward Ott, and James A. Yorke. Predicting chaos most of the time from embeddings with self-intersections. Phys. Rev. Lett., 80:1410–1413, 1998.
[Sta99] Jaroslav Stark. Delay embeddings for forced systems. I. Deterministic forcing. J. Nonlinear Sci., 9(3):255–332, 1999.
[SY97] Timothy D. Sauer and James A. Yorke. Are the dimensions of a set and its image equal under typical smooth functions? Ergodic Theory Dynam. Systems, 17(4):941–956, 1997.
[SYC91] Timothy D. Sauer, James A. Yorke, and Martin Casdagli. Embedology. J. Statist. Phys., 65(3-4):579–616, 1991.
[Tak81] Floris Takens. Detecting strange attractors in turbulence. In Dynamical systems and turbulence, Warwick 1980, volume 898 of Lecture Notes in Math., pages 366–381. Springer, Berlin-New York, 1981.
[Tak02] Floris Takens. The reconstruction theorem for endomorphisms. Bull. Braz. Math. Soc. (N.S.), 33(2):231–262, 2002.
[Vos03] Henning U. Voss. Synchronization of reconstructed dynamical systems. Chaos, 13(1):327–334, 2003.
[WCL09] Cong-Lin Wu, Kwok-Wing Chau, and Yok-Sheung Li. Predicting monthly streamflow using data-driven models coupled with data-preprocessing techniques. Water Resources Research, 45(8), 2009.
[You82] Lai Sang Young. Dimension, entropy and Lyapunov exponents. Ergodic Theory Dynamical Systems, 2(1):109–124, 1982.
