Embeddings Extensions
Embeddings Extensions
Embeddings Extensions
Lecture Notes
Department of Mathematics
Princeton University
Spring 2015
Contents
1. Introduction 2
2. Preparatory material 5
3. The Hölder extension problem 8
4. Kirszbraun’s extension theorem 10
5. Bourgain’s embedding theorem 13
6. The nonlinear Dvoretzky theorem 19
7. Assouad’s embedding theorem 23
8. The Johnson-Lindenstrauss extension theorem 26
9. Embedding unions of metric spaces into Euclidean space 35
10. Extensions of Banach space-valued Lipschitz functions 39
11. Ball’s extension theorem 44
12. Uniform convexity and uniform smoothness 53
13. Calculating Markov type and cotype 58
1
Notes taken by Alexandros Eskenazis
1. Introduction
In this first section, we present the setting for the two basic problems that we will face throughout
the course.
1.1. Embeddings and extensions. The first problem is the bi-Lipschitz embedding problem. This
consinsts of deciding whether a given metric space (X, dX ) admits a ”reasonable” embedding into some
other metric space (Y, dY ), in the sense that there is a mapping f : X → Y , such that if we compute
the distance dY (f (x), f (y)), for two points x, y ∈ X, then we can almost compute the distance dX (x, y).
This can be written formally in the following way:
Definition 1.1. Let (X, dX ) and (Y, dY ) be metric spaces. A mapping f : X → Y has distortion at
most D > 1 if there exists a (scaling constant) s > 0, such that
(1) sdX (x, y) 6 dY (f (x), f (y)) 6 sDdX (x, y),
for every x, y ∈ X. The optimal such D is denoted by dist(f ). We denote by c(Y,dY ) (X, dX ) (or simply
by cY (X)) the infimum of all constants D > 1 such that there is a mapping f : X → Y with distortion
at most D. When Y = Lp = Lp (0, 1), we simply write cp (X, dX ) or cp (X), instead of c(Lp ,k·kp ) (X, dX ).
The number c2 (X) is usually called the Euclidean distortion of X.
Question: For two given metric spaces (X, dX ) and (Y, dY ), can we estimate the quantity cY (X)?
If we can effectively bound cY (X) from above, then we definitely know that there exists a low-distortion
embedding of X into Y . If, on the other hand, we can find a large lower bound for cY (X), then we can
deduce that there is an invariant on Y which does not allow X to be nicely embedded inside Y .
The second problem which is of our interest is the Lipschitz extension problem. This can be formulated
as follows: Suppose we are given a pair of metric spaces (X, dX ) and (Z, dZ ), a subset A ⊆ X and a
Lipschitz map f : A → Z.
Question 1: Is there always a Lipschitz mapping f˜ : X → Z which extends f , that is f˜|A = f ?
If the answer is positive for every such function, then it is reasonable to ask the following uniformity
question:
Question 2: Is there a constant K > 1, depending on X, A and Z, such that every function f : A → Z
admits a Lipschitz extension f˜ : X → Z with kf˜kLip 6 Kkf kLip ?
If the answer to this question is also positive, then it is of quantitative interest to compute - or estimate
- the least such K, which we will denote by e(X, A, Z). Afterwards it is reasonable to ask the even
stronger question:
Question 3: Is the quantity
(2) e(X, Z) = sup {e(X, A, Z) : A ⊆ X}
finite?
In case the answer is negative, we would be interested in quantitative formulations of this divergence,
such as:
Question 4: If e(X, Z) = ∞, estimate the quantities
(3) en (X, Z) = sup {e(X, A, Z) : A ⊆ X with |A| 6 n}
and, for ε ∈ (0, 1),
(4) eε (X, Z) = sup e(X, A, Z) : A bounded and ∀x 6= y ∈ Ad(x, y) > εdiam(A) .
We mention now the two best known bounds regarding these quantities when the target space Z is a
Banach space.
Theorem 1.2. If Z is a Banach space, then, for every metric space (X, dX ) it holds
log n
(5) en (X, Z) . .
log log n
Conversely, we do not know whether this is optimal: the best known lower bound is
p
(6) en (X, Z) & log n,
for a particular Banach space Z and a metric space (X, dX ).
2
Theorem 1.3. If Z is a Banach space, then, for every metric space (X, dX ) and every ε ∈ (0, 1), it
holds
1
(7) eε (X, Z) 6 .
ε
Unlike the previous result, we know that this is asymptotically sharp: there is a Banach space Z and a
metric space (X, dX ) such that eε (X, Z) 1ε .
Proof of the upper bound (7). Let A ⊆ X bounded such that, if D = diam(A), then for every x 6= y ∈ A
it holds dX (x, y) > εD. This means that the closed balls
εD
Bx = B x, = {y ∈ X : dX (x, y) 6 εD/4},
4
where x ∈ A, have pairwise disjoint interiors. Define the extension:
( εD −d (x,z)
X
˜
4
εD f (x), if z ∈ Bx
f (z) = 4
S
0, if z ∈
/ x∈A Bx .
kf kLip
It can easily be checked that f˜ is a well-defined ε -Lipschitz mapping. 2
1.2. Extension and approximation. We digress from our main topic in order to make a slight remark:
General principle: An extension theorem implies an approximation theorem.
We give a simple example of this principle. Let’s start with a definition:
Definition 1.4. A metric space (X, dX ) is called metrically convex if for any pair of points x, y ∈ X
and every t ∈ (0, 1), there is a point z ∈ X such that d(x, z) = td(x, y) and d(y, z) = (1 − t)d(x, y).
Many interesting metric spaces, such as normed spaces and complete Riemannian manifolds, are metri-
cally convex.
Theorem 1.5. Let (X, dX ) a metrically convex metric space and (Z, dZ ) a metric space such that
e(X, Z) = K < ∞. Then, any uniformly continuous function f : X → Z, can be uniformly approximated
by Lipschitz functions.
Proof. Let f : X → Z be a uniformly continuous function and an ε > 0. For δ > 0, denote by
(8) ω(δ) = sup {dZ (f (x), f (y)) : dX (x, y) 6 δ}
the modulus of continuity of f . Since f is uniformly continuous, we have limδ→0+ ω(δ) = 0. The fact
that X is metrically convex also implies that ω is subadditive, that is, for s, t > 0
ω(s + t) 6 ω(s) + ω(t).
Indeed, if x, y ∈ X such that dX (x, y) 6 s + t, then there is a point z ∈ X such that dX (x, z) 6 s and
dX (z, y) 6 t. Thus:
dZ (f (x), f (y)) 6 dZ (f (x), f (z)) + dZ (f (z), f (y)) 6 ω(s) + ω(t),
and the claim follows.
Let Nε a maximal ε−separated set inside X - that is a set such that for every x, y ∈ Nε to hold
dX (x, y) > ε which is maximal under inclusion. This is necessairy an ε−net of X, i.e. for every x ∈ X,
there is an x0 ∈ Nε such that dX (x, x0 ) 6 ε.
Consider the function f |Nε . This is a Lipschitz function: for x 6= hy ∈ Nε , iwe have dX (x, y) > ε and
dZ (f (x), f (y)) 6 ω (dX (x, y)) . Dividing the interval [0, dX (x, y)] into dX (x,y)
ε + 1 subintervals of length
at most ε we get, using the subadditivity of ω, that
dX (x, y) dX (x, y)
ω (dX (x, y)) 6 + 1 ω(ε) 6 2 ω(ε).
ε ε
Thus, combining the above, we get that
ω(ε)
dZ (f (x), f (y)) 6 2 dX (x, y),
ε
for every x, y ∈ Nε or equivalently kf |Nε kLip 6 2 ω(ε)
ε .
3
Using the Lipschitz extension assumption, we can now extend f |Nε to a function f˜ : X → Z with
kf˜kLip 6 2K ω(ε) ˜
ε . We now have to prove that f and f are uniformly close. Observe that, for x ∈ X, one
can find y ∈ Nε with dX (x, y) 6 ε and thus
dZ f (x), f˜(x) 6 dZ (f (x), f (y)) + dZ f˜(y), f˜(x) 6 ω(ε) + 2Kω(ε),
which indeed tends to 0 as ε → 0+ .
Related textbooks:
4
2. Preparatory material
In this section we present some first results and techniques that we will use extensively in the rest of
the course.
2.1. Basic embedding and extension theorems. Let’s start with some elementary positive results
about embeddings and extensions.
Theorem 2.1. Every metric space (X, dX ) is isometric to a subspace of `∞ (X), where for an arbitrary
set Γ, we define the space
(9) `∞ (Γ) = (xγ )γ∈Γ ∈ RΓ : sup |xγ | < ∞
γ∈Γ
2.2. The discrete cube Fn2 . Now we develop the basic notation of Fourier Analysis on the discrete
cube and prove our first (asymptotically) negative embeddability result. Let F2 be the field with two
elements, which we will denote by 0, 1, and Fn2 the hypercube {0, 1}n ⊆ Rn . We endow Fn2 with the
Hamming metric
n
X
(12) kx − yk1 = |xi − yi | = |{i : xi 6= yi }| ;
i=1
that is the inner product of the space L2 (µ), where µ is the counting measure in Fn2 . Since the space of
all functions f : Fn2 → R has dimension 2n , we deduce that {wA }A is an orthonormal basis of L2 (µ) and
thus every such function has an expansion of the form
X
(15) f (x) = fb(A)wA (x), x ∈ Fn2 ,
A⊆{1,2,...,n}
where
Z
(16) fb(A) = f (x)wA (x)dµ(x) ∈ R.
Fn
2
Remark. The same is true for functions f : Fn2 → X, where X is any Banach space – in this case
fb(A) ∈ X. The previous dimension-counting argument does not work here, but we can easily deduce
this from the above by composing f with linear functionals x∗ ∈ X ∗ .
The non-embeddability result that we mentioned is the following:
√
Theorem 2.5. If Fn2 is the Hamming cube, then c2 (Fn2 ) = n.
In order to prove this, we will need a quantitative lemma, which will work as an invariant of embeddings
of the Hamming cube into `2 .
Lemma 2.6 (Enflo, 1969). Every function f : Fn2 → `2 satisfies the inequality
Z n Z
X
(17) kf (x + e) − f (x)k22 dµ(x) 6 kf (x + ej ) − f (x)k22 dµ(x),
Fn
2 j=1 Fn
2
where {ej }nj=1 is the standard basis of Fn2 and e = (1, 1, ..., 1).
Proof of Theorem 2.5. Consider the identity mapping i : Fn2 → (Rn , k · k2 ). Then, for every x, y ∈ Fn2 =
{0, 1}n ,
√
ki(x) − i(y)k2 6 kx − yk1 6 nki(x) − i(y)k2 .
√
This means that c2 (Fn2 ) 6 n.
6
n
To prove
√ the reverse inequality we must consider an embedding f : F 2 → `2 and prove that D =
n
dist(f ) > n. There is a scaling factor s > 0 such that for every x, y ∈ F2
skx − yk1 6 kf (x) − f (y)k2 6 sDkx − yk1 .
We will now use Enflo’s Lemma: we know that (17) holds for the embedding f . But:
kf (x + ej ) − f (x)k22 6 s2 D2 kej k21 = s2 D2
and
kf (x + e) − f (x)k22 > s2 kek21 = s2 n2 .
Combining those with (17), we get that
s2 n2 6 ns2 D2 ,
√
which gives the desired bound D > n. 2
Proof of Enflo’s Lemma. Since the inequality involves only expressions of the form kxk22 ,
where x ∈ `2 ,
it is enough to prove it coordinatewise, that is for functions f : Fn2 → R. In this case, we can expand f
as X
f (x) = xA wA (x),
A⊆{1,2,...,n}
for some xA ∈ R. To do this, we write the Walsh expansion of both sides and then use Parseval’s identity.
Firstly,
X
f (x) − f (x + e) = xA (wA (x) − wA (x + e))
A⊆{1,2,...,n}
X
= xA (1 − (−1)|A| )wA (x)
A⊆{1,2,...,n}
X
=2 xA wA (x),
A: |A| odd
thus X
LHS = 4 x2A .
A: |A| odd
On the other hand,
X
f (x) − f (x + ej ) = xA (wA (x) − wA (x + ej ))
A⊆{1,2,...,n}
X
=2 xA wA (x),
A: j∈A
and thus
n
X X X
RHS = 4 x2A = 4 |A|x2A .
j=1 A: j∈A A⊆{1,2,...,n}
The desired inequality is now obvious. 2
It is worth noting that this is essentially the only non-trivial result, for which we are able to actually
compute the embedding constant cY (X).
The same techniques that we used in the proof of Enflo’s lemma, can also give the following variants
of it which are left as exercises.
Lemma 2.7. Every function f : Fn2 → `2 satisfies the inequality
Z Z Xn Z
2
(18) kf (x) − f (y)k2 dµ(x)dµ(y) . kf (x + ej ) − f (x)k22 dµ(x).
Fn
2 Fn
2 j=1 Fn
2
Lemma 2.8. Every function f : Fn2 → `2 such that fb(A) = 0 for every A with 0 < |A| < k satisfies the
inequality
Z Z n Z
1X
(19) kf (x) − f (y)k22 dµ(x)dµ(y) . kf (x + ej ) − f (x)k22 dµ(x).
Fn
2 Fn
2
k j=1 F n
2
7
3. The Hölder extension problem
Like the extension problem for Lipschitz functions, it is easy to formulate the extension problem for
Hölder functions:
Question: Let (X, dX ), (Y, dY ) be metric spaces, a subset A ⊆ X and an α−Hölder function f : A → Y
for some α ∈ (0, 1]; that is a function for which
Does there exist an α−Hölder extension f˜ : X → Y of f and, if yes, in which senses is this extension
uniform?
One can easily notice that this is a special case of the general Lipschitz extension question that we posed
in Section 1 since, for 0 < α 6 1, the function dα X is also a metric. In this setting, when Y is a Hilbert
1 1
space, there exists a dichotomy principle depending on whether α ∈ 0, 2 or α ∈ 2 , 1 . We will start
with the case α ∈ 12 , 1 , in which case there is no uniform extension constant. This was proven with the
following construction:
Counterexample 3.1 (Johnson, Lindenstrauss and Schechtman, 1986). Consider the Hamming cube
X = Fn2 and a disjoint copy of it, say X̃ = {x̃ : x ∈ X}. We will define a weighted graph structure on the
set X ∪ X̃ and our metric space will be this graph with the induced shortest path metric. Fix α ∈ 21 , 1
and a constant r > 0. The edges of the graph will be of the following three kinds:
• Pairs of the form {x, y}, where x, y ∈ X. The corresponding weight will be
1/2α
w{x,y} = kx − yk1 .
• Pairs of the form {x̃, ỹ}, where x, y ∈ X. The corresponding weight will be
kx − yk1
w{x̃,ỹ} = .
r2α−1
• Pairs of the form {x, x̃}, where x ∈ X. The corresponding weight will be
w{x,x̃} = r.
Claim: If d is the shortest path metric on X ∪ X̃ with respect to this weighted graph, then for every
1/2α
x, y ∈ X, d(x, y) = kx − yk1 .
thus it is α−Hölder with constant 1 and let f˜ : X ∪ X̃ → `n2 be an α−Hölder extension with constant
L > 1. We will prove that L has to be large.
Using Enflo’s inequality again, we get
Z n Z
X
(∗∗) kf˜(x̃ − f˜(x̃ + ẽ)k22 dµ(x) . kf˜(x̃) − f˜(x̃ + e˜j )k22 dµ(x).
Fn
2 j=1 Fn
2
8
By the Hölder condition, we have
n Z
X
RHS 6 L2 d(x̃, x̃ + e˜j )2α dµ(x)
j=1 Fn
2
n
X L2
6
j=1
r2α(2α−1)
L2 n
= .
r2α(2α−1)
On the other hand,
kf˜(x̃) − f˜(x̃ + ẽ)k2 > kf (x) − f (x + e)k2 − kf (x) − f˜(x̃)k2 − kf (x + e) − f˜(x̃ + ẽ)k2
& kx − (x + e)k2 − LdX (x, x̃)α − LdX (x + e, x̃ + ẽ)α
√
= n − 2Lrα .
Adding all these and using (∗∗), we get
L2 n √ 2
> n − 2Lrα ,
r2α(2α−1)
which can be rewritten as √
√
n
L 2rα + > n.
rα(2α−1)
2
Optimizing with respect to r, we get that for r = c(α)n1/4α it holds
1 1
L &α n 2 − 4α .
2
Remark: Notice that for the case α = 1 of this counterexample, we get
1/4
L &α n1/4 log |X ∪ X̃| ,
which is far from being the best known bound (6).
The related quantitative conjecture is this:
1
Conjecture 3.2. For every metric space (X, dX ) and every 2 < α 6 1, it holds
α− 21
(21) en ((X, dα
X ), `2 ) . (log n) .
The case α = 1 of this conjecture is known as the Johnson – Lindenstrauss extension theorem. Another
open problem related to the particular example we just constructed is the following.
Open problem 3.3. Does the metric space X ∪ X̃ with α = 1 embed into `1 , in the sense that c1 (X ∪ X̃)
is bounded as n → ∞?
This question is particularly interesting, since one can prove that both c1 (X) and c1 (X̃), with the
restricted metric, remain bounded as n → ∞ but nevertheless it is not known whether there exists a
low-distortion embedding of the whole space.
The dichotomy we promised will be completed by Minty’s theorem. We just saw that for α > 21 , the
Hölder extension problem cannot be uniformly solved in general. However, Minty proved that this is
not the case for α 6 12 : in this case the Hölder extension problem admits an isometric solution. We will
prove this assertion in the next section, as a corollary of an important extension theorem for functions
between Hilbert spaces.
9
4. Kirszbraun’s extension theorem
In the previous section, we constructed a sequence of metric spaces {Xn }, subsets An ⊆ Xn and Lips-
chitz functions fn : An → `2 which do not admit Lipschitz extensions to Xn with uniformly comparable
Lipschitz norm. We will see now that such an example could not occur if the Xn were Hilbert spaces:
Theorem 4.1 (Kirszbraun, 1934). Let H1 , H2 be Hilbert spaces, a subset A ⊆ H1 and a Lipschitz
function f : A → H2 . Then, there exists an extension f˜ : H1 → H2 of f , such that kf˜kLip = kf kLip .
Before we proceed to the proof of the theorem, we give an equivalent geometric formulation.
Theorem 4.2 (Geometric version of Kirszbraun’s theorem). Let H1 , H2 be Hilbert spaces, {xi }i∈I ⊆ H1 ,
{yi }i∈I ⊆ H2 and {ri }i∈I ⊆ [0, ∞), where I is some index set. If for every i, j ∈ I, it holds
(22) kyi − yj kH2 6 kxi − xj kH1 ,
then
\ \
(23) B(xi , ri ) 6= ∅ =⇒ B(yi , ri ) 6= ∅.
i∈I i∈I
The validity of the conjecture in its generality is still open. However, Gromov proved in 1987 that
the claim holds if n 6 k + 1. More recently, in the early 00’s, the conjecture was also confirmed (by
completely different arguments) for the cases n = k + 2 and k = 2 (for arbitrary n).
Let us return to our main topic. We start by proving the equivalence of the two statements above.
Geometric version ⇔ extension version – proof. (⇒) Suppose that kf kLip = 1. By a typical Zorn’s
lemma argument, it is enough to construct an extension to one more point x ∈ X r A. Consider the
points {a}a∈A ⊆ H1 and {f (a)}a∈A ⊆ H2 . For them, it holds kf (a) − f (b)kH2 6 ka − bkH1 , for every
a, b ∈ A and we also have
\
x∈ B(a, kx − ak) 6= ∅.
a∈A
Thus, we deduce that there is a point
\
y∈ B(f (a), kx − ak) 6= ∅,
a∈A
i.e. ky − f (a)kH2 6 kx − ak for every a ∈ A. So, if we define f˜(x) = y, we have a Lipschitz extension
with norm 1.
(⇐) Define A = {xi : i ∈ I} ⊆ H1 and f : A → H2 by f (xi ) = T yi , i ∈ I. Our assumption is equivalent
to the fact that f is 1−Lipschitz. Consider now a point x ∈ i∈I B(xi , ri ), a 1−Lipschitz extension
f˜ : H1 → H2 of f and set y = f˜(x). Then, for every i ∈ I,
ky − yi kH2 = kf˜(x) − yi kH2 6 kx − xi kH1 6 ri ,
2
T
thus y ∈ i∈I B(yi , ri ) 6= ∅.
Proof of the geometric version of Kirszbraun’s theorem. First observe that all the balls B(xi , ri ), B(yi , ri )
are weakly compact (since Hilbert spaces are reflexive) and thus have the finite intersection property.
Therefore, we can suppose that I = {1, 2, Tn..., n} for some n. Now, replace H1 Tby H10 = span{x1 , ..., xn }
0 n
and H2 by H2 = span{y1 , ..., yn }. Since i=1 BH1 (xi , ri ) 6= ∅, we deduce that i=1 BH10 (xi , ri ) 6= ∅, just
n
by considering an orthogonal projection from H1 to H10 . Of course, if it holds i=1 BH20 (yi , ri ) 6= ∅, then
T
Tn
also i=1 BH2 (yi , ri ) 6= ∅. So, we can assume that dimH1 , dimH2 < ∞.
10
Tn
TnTake now any point x ∈ i=1 B(xi , ri ). Observe that if x = xi0 , for some i0 , then y = yi0 ∈
i=1 B(yi , ri ) and thus our claim is trivial. So, we can suppose that x 6= xi for every i = 1, 2, ..., n.
Define the function h : H2 → [0, ∞) by
ky − xi k2
(25) h(y) = max , y ∈ H2 .
16i6n kx − xi k2
Since h is continuous and limy→∞ h(y) = ∞, h attains its global minimum
m = min h(y).
y∈H2
Thus
2 2
E kf (X) − Ef (X)k = E kf (X) − yk
X
= λj kyj − yk2
j∈J
X
= m2 λj kxj − xk2
j∈J
2 2
= m E kX − xk .
But, we notice that
2
EkX − xk2 > E kX − EXk
and, on the other hand,
2 1
E kf (X) − Ef (X)k = Ekf (X) − f (X 0 )k2
2
1
6 EkX − X 0 k2
2
2
= E kX − EXk .
(All the above are trivially checked using the Hilbert space structure and the properties of the expected
value.) Putting everything together, we get that
2 2
m2 E kX − EXk 6 E kX − EXk ,
which gives (since X is not constant) that m 6 1, as we wanted.
We are now in position to prove the dichotomy that we mentioned in the previous section:
11
Corollary 4.5 (Minty, 1970). Let (X, dX ) be any metric space A ⊆ X, α ∈ 0, 21 and f : A → `2 an
α−Hölder function with constant L. Then, there exists an extension f˜ : X → `2 , i.e. f˜|A = f , which is
also α−Hölder with constant L.
Proof. It suffices to prove the theorem in the case L = 1 and α = 12 (because we can then aply it to the
metric d2α
X ). As usual, we just have to extend f to one more point x ∈ X r A. Consider the Hilbert
space
!1/2
X
(28) `2 (X) = f : X → R : kf k = kf (x)k22 <∞
x∈X
and let {ex }x∈X be its standard basis. Observe now, that
\ p p
0∈ B dX (x, a)ea , dX (x, a) 6= ∅.
a∈A
Since, for every a 6= b ∈ A,
kf (a) − f (b)k2 6 dX (a, b)1/2
p 1/2
6 dX (x, a) + dX (x, b)
p p
= dX (x, a)ea − dX (x, b)eb
`2 (X)
we deduce from Kirszbraun’s theorem that there exists a
\ p
y∈ B f (a), dX (x, a) 6= ∅.
a∈A
function f : A → `n2 , for some n, with constant L. Then f admits an extension f˜ : X → `n2 , i.e. f˜|A = f ,
1
which is α−Hölder with constant Lnα− 2 .
12
5. Bourgain’s embedding theorem
In this section we will prove the fundamental result of Bourgain on embeddings of finite metric spaces
into Hilbert space. Afterwards, we will construct an example which proves that Bourgain’s result is
asymptotically sharp.
5.1. Statement and proof of the theorem. The (very general) theorem that we will prove in this
section is the following:
Theorem 5.1 (Bourgain, 1986). If (X, d) is any n−point metric space, then c2 (X) . log n.
Proof. Let k = blog2 nc + 1. For every subset A ⊆ X, denote
n−|A|
1 1
(29) πj (A) = 1− j .
2j|A| 2
Interpretation: Imagine that we choose a random subset B of X by adjoining independently any element
of X in B with probability 1/2j . Then πj (A) is exactly the probability P(B = A).
X
Define a function f : X → R2 , by the formula f (x) = (f (x)A )A⊆X , where
1/2
k
1 X
(30) f (x)A = πj (A) d(x, A).
k j=1
X
We will prove that f has bi-Lipschitz distortion at most a constant multiple of log n, where R2 is
endowed with the Euclidean norm. The upper bound follows easily; for x, y ∈ X:
X 1X k
2
kf (x) − f (y)k22 = πj (A) (d(x, A) − d(y, A))
k j=1
A⊆X
(31) k
X X 1
6 πj (A)d(x, y)2
j=1
k
A⊆X
= d(x, y)2 ,
where we used that x 7→ d(x, A) is 1−Lipschitz and that {πj (A)}A⊆X sum to 1 for every j, by the above
interpretation.
The lower bound requires more effort. Fix again two points x, y ∈ X and a 1 6 j 6 k. Define rj (x, y)
to be the smallest r > 0 such that
1
r̃j (x, y) = min rj (x, y), d(x, y) .
3
The key lemma that we will need to finish the proof is this:
Lemma 5.2. Let Aj be a random subset of X obtained with respect to πj . Then, for every x 6= y ∈ X,
it holds
2 2
(32) Eπj (d(x, Aj ) − d(y, Aj )) & (r̃j (x, y) − r̃j−1 (x, y)) .
13
Let us suppose for the moment that the lemma is valid. Then, we calculate:
X 1X k
2
kf (x) − f (y)k22 = πj (A) (d(x, A) − d(y, A))
k j=1
A⊆X
k
1X 2
= Eπ (d(x, Aj ) − d(y, Aj ))
k j=1 j
k
1X 2
& (r̃j (x, y) − r̃j−1 (x, y))
(33) k j=1
2
k
1 X
> 2 r̃j (x, y) − r̃j−1 (x, y)
k j=1
1
= r̃k (x, y)
k2
1
2 d(x, y)2 ,
k
d(x,y)
where apart from the lemma, we have applied Jensen’s inequality and the fact that rk (x, y) > 3 .
The previous inequality can be rewritten as
1
(34) kf (x) − f (y)k2 & d(x, y),
log n
which, along with (31), proves our claim.
log n
(35) cp (X) . .
p
1
(Hint: Replace the quantity 2j in Bourgain’s proof by q j , for q ∈ (0, 1) and then optimize with respect
to q.)
5.2. Sharpness of Bourgain’s theorem. We now present an explicit construction which will prove
that the proof above gave the asymptotically optimal upper bound for Euclidean embeddings of finite
metric spaces. In particular, we prove:
Theorem 5.4 (Linial-London-Rabinovich, 1995). For arbitrarily large n there exists an n-point metric
space (X, dX ) with c2 (X) & log n.
In particular, Linial, London and Rabinovich proven that for a sequence of regular expander graphs
{Gn } Bourgain’s upper bound is achieved. The proof that follows is different and was given by Khot
and Naor in 2006. In fact, the same construction can prove that the statement in Exercise 5.3 is also
asymptotically sharp. Observe also, that for the Hamming cube we have proved
√ q
c2 (Fn2 ) = n = log |Fn2 |,
where for a subset C ⊆ X we define Cε = {x ∈ X : d(x, C) 6 ε}. One can easily prove that H is a
metric on 2X but d is not. The example proving the sharpness of Theorem 5.1 will be a suitable quotient
of the Hamming cube.
Definition 5.5. Let (X, d) be a metric space and U = {U1 , U2 , ..., Uk } a partition of X. Consider the
weighted graph on the set of vertices {U1 , U2 , ..., Uk }, where the weights are defined by
def
We denote by X/U = {U1 , U2 , ..., Uk } the resulting metric space with the induced shortest-path metric
dX/U .
Lemma 5.6. Suppose that a group G acts on X by isometries and let U be the orbit partition {Gx }x∈X
of X with respect to this action. Then
Lemma 5.7. Let G be a group that act on Fd2 by isometries and |G| 6 2εd , where 0 < ε < 1. Then
1 X 1−ε
(40)
2 2d
dFd2 /G (Gx , Gy ) & 1 · d.
d
1 + log 1−ε
x,y∈F2
15
Proof. Let µ be the uniform probability measure of Fd2 and fix some δ > 0. Now observe that
n o
µ × µ (x, y) ∈ Fd2 × Fd2 : dFd2 /G (Gx , Gy ) > δd
n o
= 1 − µ × µ (x, y) ∈ Fd2 × Fd2 : ∃ g ∈ G such that kx − gyk1 < δd
1 X X
=1− d µ x ∈ Fd2 : kx − gyk1 < δd
2
y∈Fd2
g∈G
|G| X d
=1− d
2 k
k6δd
1 X d
> 1 − (1−ε)d .
2 k
k6δd
It can easily be proven that V ⊥ is also a subspace of Fd2 and that (V ⊥ )⊥ = V . We will now prove that
there exists a linear subspace V of Fd2 such that
(42) c2 (Fd2 /V ⊥ ) & d log |Fd2 /V ⊥ |,
where V ⊥ acts on Fd2 by addition. We will need the following Fourier-analytic lemma:
Lemma 5.8. For every subspace V of Fd2 , define
(43) w(V ) = min kxk1 .
x∈V r{0}
g(x) = f (x + V ⊥ ), x ∈ Fd2 .
Observe that g is V ⊥ -invariant, thus gb(A) = 0 for 0 < |A| < w(V ). Thus, by Lemma 2.8 (a variant of
Enflo’s inequality),
Z Z d Z
1 X
(45) kg(x) − g(y)k22 dµ(x)dµ(y) . kg(x + ej ) − g(x)k22 dµ(x).
Fd
2 Fd
2
w(V ) j=1 Fd
2
d−dim(V )
by Lemma 5.7, where |V ⊥ | = 2εd , i.e. ε = d . On the other hand:
d
D2 X
Z
RHS 6 dFd2 /V ⊥ (x + V ⊥ , x + ej + V ⊥ )dµ(x)
w(V ) j=1 Fd
2
D2 d
6 .
w(V )
or equivalently
dim(V ) 2
D2 & dw(V ) d
.
d + d log dim(V )
To finish the proof of Theorem 5.4, we must argue that there exists a subspace V of Fd2 such that both
dim(V ) and w(V ) are of the order of d. Then, the previous lemma will give:
18
6. The nonlinear Dvoretzky theorem
In the previous section we proved that for an arbitrary n-point metric space, we can construct an
embedding into `2 with distortion bounded by a constant multiple of log n and furthermore, this result
is sharp. It is natural to ask whether there exist Ramsey-type results in the above setting:
Question: Is it true that any finite metric space contains large subsets which embed into Euclidean
space with low distortion?
In what follows, we will give a (sharp) quantitative result answering this question. Before moving
on, we have to remark that the motivation for this question comes from a classical theorem about
finite-dimensional Banach spaces:
Theorem 6.1 (Dvoretzky, 1961). Let (X, k · kX ) be an n-dimensional Banach space. Then, for every
ε > 0, there exists a subspace Y of X which satisfies the following:
(i) Y embeds linearly into `2 with distortion at most 1 + ε and
(ii) dim Y > c(ε) log n, where c(ε) is a constant depending only on ε.
The result of this section will be a nonlinear analogue of Dvoretzky’s theorem which we now state.
Theorem 6.2 (Nonlinear Dvoretzky Theorem, Mendel-Naor, 2006). Let (X, d) be an n-point metric
space. Then, for every ε ∈ (0, 1), there exists a subset S ⊆ X which satisfies the following:
(i) c2 (S) . 1ε and
(ii) |S| > n1−ε .
Even though we will not prove it, we remark that the above result is sharp:
Theorem 6.3. For every n ∈ N and ε ∈ (0, 1), there exists an n-point metric space (X, d) such that if
S ⊆ X with |S| > n1−ε , then c2 (S) & 1ε .
The proof of Theorem 6.2 is probabilistic. Let’s start with some terminology. For a partition P of a
metric space X and x ∈ X, we denote by P(x) the unique element of P to which x belongs. For ∆ > 0,
we say that P is ∆-bounded if
(46) diamP(x) 6 ∆, for every x ∈ X.
Definition 6.4. A sequence of partitions {Pk }∞ k=0 of a metric space X is called a partition tree if the
following hold:
(i) P0 = {X};
(ii) For every k > 0, Pk is 8−k diam(X)-bounded and
(iii) For every k > 0, Pk+1 is a refinement of Pk , i.e. for every x ∈ X it holds Pk+1 (x) ⊆ Pk (x).
The crucial definition is the following:
Definition 6.5. Let β, γ > 0. A probability distribution over random partition trees of a metric space
X is called completely β-padded with exponent γ if for every x ∈ X
−k
1
(47) P{Pk }∞ B(x, 8 βdiam(X)) ⊆ Pk (x), ∀k > 0 > γ.
k=0
n
Observe that in the previous definition, the probability is with respect to the random choice of a
partition tree of X. From now on, we normalize so that diam(X) = 1. The relation of this definition
with our problem is explained in the following lemma:
Lemma 6.6. Suppose that X admits a completely β-padded with exponent γ random partition tree. Then
there exists a subset S ⊆ X such that:
(i) c2 (S) 6 β8 and
(ii) |S| > n1−γ .
Proof. Define a random subset
S = x ∈ X : B(x, 8−k β) ⊆ Pk (x), ∀k > 0 .
(48)
First, we calculate
X
P B(x, 8−k β) ⊆ Pk (x), ∀k > 0
E|S| =
x∈X
Lemma 6.7. Every finite ultrametric space (S, ρ) is isometric to a subset of a Hilbert space.
Proof. Denote by m = |S|. We will prove by induction on m that there is an embedding f : S → H
satisfying:
(i) kf (x) − f (y)kH = ρ(x, y), ∀x, y ∈ S and
(ii) kf (x)kH = diam(S)
√
2
, ∀x ∈ S.
For the inductive step, define the relation ∼ on S by x ∼ y if if ρ(x, y) < diam(S) and observe that this
is an equivalence relation since ρ is an ultrametric. Let A1 , A2 , ..., Ak be the equivalence classes of ∼ and
notice that |Ai | < m for every i = 1, 2..., k, since otherwise m = 1. Thus, for every i = 1, 2, ..., k there
exists an embedding fi : Ai → Hi such that
(i) kfi (x) − fi (y)kHi = ρ(x, y), ∀x, y ∈ Ai and
(ii) kf (x)kHi = diam(A )
√ i , ∀x ∈ Ai .
2
L
k k
Define the map f : S → i=1 Hi ⊕ `2 , by
r
def diam(S)2 − diam(Ai )2
(53) x ∈ Ai ⇒ f (x) = fi (x) + · ei ,
2
where `k2 = span{e1 , ..., ek }. Now, we just check that f is an isometry:
• If x, y ∈ Ai :
kf (x) − f (y)kH = kfi (x) − fi (y)kHi = ρ(x, y).
• If x ∈ Ai , y ∈ Aj and j 6= i then ρ(x, y) = diam(S):
diam(S)2 − diam(Ai )2 diam(S)2 − diam(Aj )2
kf (x) − f (y)k2H = + + kfi (x)k2Hi + kfj (y)k2Hj
2 2
= diam(S)2
= ρ(x, y)2 ,
where in the second equality we used the hypothesis (ii).
diam(S)2
Finally, indeed it is kf (x)k2H = 2 , as we wanted.
Having understood the relation of Definition 6.5 with our problem, we now present the key geometric
lemma that almost finishes the proof:
20
Lemma 6.8. Let (X, d) be a finite metric space and consider a number ∆ > 0. Then there exists a
∆
probability distribution over ∆-bounded partitions of X such that for every t ∈ 0, 8 and every x ∈ X:
|B(x, ∆/8)| 8t/∆
(54) PP B(x, t) ⊆ P(x) > .
|B(x, ∆)|
Proof. Write X = {x1 , x2 , ..., xn } and:
(i) Let π ∈ Sn be a uniformly random permutation of {1, 2,h..., n};i
(ii) Let R be a uniformly distributed random variable over ∆ ∆
4, 2 .
Consider now the random (with respect to π and R) partition P = {C1 , C2 , ..., Cn } of X (with possible
repetitions) given by
j
[
(55) C1 = B(xπ(1) , R) and Cj+1 = B(xπ(j+1) , R) r Ci .
i=1
h i
∆ ∆
Claim. For every fixed r ∈ 4, 2 and x ∈ X:
|B(x, r − t)|
(56) Pπ,R B(x, t) ⊆ P(x) R = r > .
|B(x, r + t)|
For the proof of the claim first observe that, since P(x) is contained in a ball of radius r, every point
outside the ball B(x, r + t) is irrelevant: it belongs to another equivalence class. The crucial idea is the
following: consider the first point x0 (in the random ordering induced by π) which belongs in B(x, r + t)
and suppose that this point happens to belong to B(x, r − t). By constuction, P(x) = P(x0 ) is the ball
of radius r centered at x0 . Thus, we deduce that if d(y, x) 6 t, then
d(y, x0 ) 6 d(y, x) + d(x, x0 ) 6 t + (r − t) = r,
i.e. B(x, t) ⊆ P(x). From the above analysis and the fact that π (and thus the random point of
B(x, r + t)) was chosen uniformly at random we conclude that the claim is valid. 2
Now, define h(s) = log |B(x, s)| and calculate:
Z ∆/2
1
Pπ,R B(x, t) ⊆ P(x) = Pπ,R B(x, t) ⊆ P(x) R = r dr
∆/4 ∆/4
4 ∆/2 |B(x, r − t)|
Z
> dr
∆ ∆/4 |B(x, r + t)|
4 ∆/2 h(r−t)−h(r+t)
Z
= e dr
∆ ∆/4
(†) 4 Z ∆/2
> exp h(r − t) − h(r + t)dr
∆ ∆/4
4 Z ∆/4+t 4 ∆/2+t
Z
= exp h(r)dr − h(r)dr
∆ ∆/4−t ∆ ∆/2−t
8t 8t
> exp h(∆/4 − t) − h(∆/2 + t)
∆ ∆
8t 8t
> exp h(∆/8) − h(∆)
∆ ∆
8t/∆
|B(x, ∆/8)|
= ,
|B(x, ∆)|
where in (†) we used Jensen’s inequality.
We are now in position to finish the proof:
Proof of Theorem 6.2. For every k > 0, let Pk be a random partition as in the previous lemma for
∆ = 8−k such that P0 , P1 , ... are chosen independently of each other. Now, let us define the (random)
partition Qk to be the common refinement of P0 , P1 , ..., Pk or equivalently, Q0 = P0 and Qk+1 is the
common refinement of Qk and Pk+1 . In other words, for every x ∈ X
Qk+1 (x) = Qk (x) ∩ Pk+1 (x).
21
Thus, {Qk }∞
k=0 is a random partition tree of X. Let α > 8 and x ∈ X. We want to estimate from below
the quantity
1 −k
(57) P{Pk }∞ B x, 8 ⊆ Q k (x), ∀k > 0 .
k=0
α
1 −k
Qk were defined, we notice that if B x, α 8
From the way
1 −k
⊆ Pk (x) for every k > 0 if and only if
B x, α 8 ⊆ Qk (x) for every k > 0. So, we deduce that
1 1
P B x, 8−k ⊆ Qk (x), ∀k > 0 = P B x, 8−k ⊆ Pk (x), ∀k > 0
α α
∞
Y 1
PPk B x, 8−k ⊆ Pk (x)
=
α
k=1
∞
(54) Y |B(x, 8−k−1 )| 8/α
>
|B(x, 8−k )|
k=1
1
= .
n8/α
Thus, {Qk }∞
k=0 is α1 -padded with exponent α8 . For ε = α8 ∈ (0, 1) we get the result using Lemma 6.6. 2
Remark. The subset S given by the proof above embeds with distortion at most 1/ε into an ultrametric
space and thus to a Hilbert space. We note here that we can not hope for a Bourgain-type theorem in
the ultrametric setting. This can be seen by the fact that the distortion of the metric space {1, 2, ..., n}
with its usual metric into any ultrametric space is at least n − 1 (exercise).
22
7. Assouad’s embedding theorem
A major research problem in embeddings is the following:
Open problem 7.1 (Bi-Lipschitz embedding problem in Rn ). Characterize those metric spaces (X, d)
that admit a bi-Lipschitz embedding into Rn , for some n.
Analogues of this problem have been answered in other branches of Mathematics. For example,
topological dimension is an invariant in the category of topological spaces and the Nash embedding
theorems settle the case of Riemannian manifolds.
A necessary condition for a metric space to admit an embedding in some Euclidean space Rn is the
following:
Definition 7.2. A metric space (X, d) is K-doubling, where K > 1, if every ball in X can be covered
by K balls of half the radius. A space is doubling if it is K-doubling for some K > 1.
First of all, the necessity is a special case of the following:
n n
Lemma 7.3. Let k·k be a norm
on R . For every r > 0 and ε ∈ (0, 1), every ball of radius r is (R , k·k)
3 n
can be covered by at most ε balls of radius εr.
Proof. Let N be an εr-net in B(x, r), i.e. a maximal εr-separated subset of B(x, r). Then
[
(58) B(x, r) ⊆ B(y, εr).
y∈N
which gives
εr n εr n
|N | vol(B) 6 r + vol(B).
2 2
So, we conclude that
n n
2 3
|N | 6 1+ 6 .
ε ε
One might conjecture that the doubling condition is sufficient for a metric space to be embeddable
in some Rn . Even though it turns out that there are doubling metric spaces which are not bi-Lipschitz
embeddable in any Rn , Assouad proved that the equivalence is almost correct:
Theorem 7.4 (Assouad, 1983). For every K > 1 and ε ∈ (0, 1), there exist D = D(K, ε) > 1 and
N = N (K, ε) ∈ N such that for any K-doubling metric space (X, d), its (1 − ε)-snowflake (X, d1−ε ) can
be embedded with bi-Lipschitz distortion at most D into RN .
Remark. Let us note here the almost trivial fact that the doubling condition is also necessary in As-
souad’s statement: if (X, d1−ε
X ) can be embedded into some doubling metric space (Y, dY ) with distortion
D for some ε ∈ (0, 1), then (X, dX ) is also doubling.
An impressive (trivial) corollary of Assouad’s theorem is the following:
√
Corollary 7.5. A metric space (X, d) is such that (X, d) is bi-Lipschitz to a subset of some RN if and
only if (X, d) is doubling.
23
Now we will proceed with the proof:
Proof of Assouad’s embedding theorem. Let (X, d) a K-doubling metric space and ε ∈ (0, 1). Fix
(temporarily) some c > 0 and let N be any c-net in X. Define a graph structure on N by setting:
(59) x ∼ y ⇔ d(x, y) 6 12c, x, y ∈ N .
Observation I. There exists an M = M (K) ∈ N such that the degree of this graph is bounded by M − 1,
i.e.
(60) ∀x ∈ N : |{y ∈ N : d(x, y) 6 12c}| 6 M − 1.
In particular, this graph is M -colorable: there exists a coloring χ : N → {1, 2, ..., M } such that for x ∼ y,
χ(x) 6= χ(y).
Proof. Consider an x ∈ N and the ball B(x, 12c). By the doubling condition, there exists a power of
K, say M − 1 = M (K) − 1, such that B(x, 12c) can be covered by M − 1 balls of radius c/2. However,
each of this balls can contain at most one element of N , thus (60) holds true. In particular (a proof by
induction), the graph is M -colorable, which proves the observation. 2
Now, define the embedding fc = f : X → RM by
X
(61) f (x) = gz (x)eχ(z) ,
z∈N
where
n 2c − d(x, z) o
(62) gz (x) = max ,0
2c
M 1
and e1 , ..., eM is the standard basis of R . Observe that each gz is 2c -Lipschitz, supported in B(z, 2c)
and also the above sum is finite since B(z1 , 2c) ∩ B(z2 , 2c) 6= ∅ only for finitely many pairs z1 , z2 ∈ N .
Finally, the number of such pairs depends only on K, call it C0 = C0 (K). Thus kf k∞ 6 C0 which
implies:
kf (x) − f (y)k2 6 2C0 , ∀x, y ∈ X.
Furthermore:
X C0
kf (x) − f (y)k2 6 kgz (x) − gz (y)k2 6 d(x, y).
2c
z∈N
Hence, we can summarize the above as follows:
n d(x, y) o
(63) kf (x) − f (y)k2 6 B min 1, , ∀x, y ∈ X,
c
for some B = B(K) > 0.
Observation II. If x, y ∈ X and 4c 6 d(x, y) 6 8c, then f (x) and f (y) are orthogonal.
Proof. In the expansion of f , gz (w) 6= 0 if and only if z ∈ B(w, 2c). The balls of radius 2c around x, y are
disjoint but any two points of N from these balls are neighbours in the graph (their distance is 6 12c).
This implies that f (x) and f (y) are disjointly supported, thus orthogonal. 2
Thus, for x, y ∈ X with 4c 6 d(x, y) 6 8c we have:
1
q
(64) kf (x) − f (y)k22 = kf (x)k22 + kf (y)k22 > ,
2
since there exist points of N which are c-close to x, y.
Now, we are ready to get to the main part of the proof. Applying the above construction with
c = 2−j−3 , for j ∈ Z, we get functions fj : X → RM such that:
(65) kfj (x) − fj (y)k2 6 B min{1, 2j d(x, y)}, ∀x, y ∈ X,
where B = B(K) > 0 and if 2−j−1 6 d(x, y) 6 2−j :
(66) kfj (x) − fj (y)k2 > A,
for some absolute constant A > 0. We want to glue together the functions {fj }j∈Z . For this purpose, fix
an integer m ∈ Z, to be determined later. Also denote by u1 , ..., u2m the standard basis of R2m with the
24
convention uj+2m = uj , for j ∈ Z. The ε-Assouad embedding of X is the map f : X → RM ⊗R2m ≡ R2mM
defined by
X 1
(67) f (x) = j(1−ε)
fj (x) ⊗ uj , x ∈ X.
j∈Z
2
We will prove that f is actually a bi-Lipschitz embedding. Let x, y ∈ X and ` ∈ Z such that
1 1
6 d(x, y) < ` .
2`+1 2
First, for the upper bound:
X 1
kf (x) − f (y)k2 6 j(1−ε)
kfj (x) − fj (y)k2
j∈Z
2
X 1 X 1
6 kfj (x) − fj (y)k2 + kfj (x) − fj (y)k2
j6`
2j(1−ε) j>`
2j(1−ε)
X 1 X 1
6B 2j d(x, y) + B
j6`
2j(1−ε) j>`
2 j(1−ε)
X B
.B 2jε d(x, y) +
j6`
2`(1−ε)
Bd(x, y)1−ε ,
by the way we picked `. To prove the lower bound, observe first that
X 1 X 1
kf (x) − f (y)k2 > j(1−ε)
· (fj (x) − fj (y)) ⊗ uj − kfj (x) − fj (y)k2 .
2 2 2j(1−ε)
|j−`|<m |j−`|>m
B2(`−m)ε d(x, y)
B
mε d(x, y)1−ε .
2
Finally, in the first term, we are tensorizing with respect to distinct uj ’s; in particular:
X 1 1
j(1−ε)
· (fj (x) − fj (y)) ⊗ uj > `(1−ε) kf` (x) − f` (y)k2
2 2 2
|j−`|<m
A
>
2`(1−ε)
d(x, y)1−ε .
The above series of inequalities proves that, for large enough m,
kf (x) − f (y)k2 & d(x, y)1−ε ,
which finishes the proof. 2
25
8. The Johnson-Lindenstrauss extension theorem
In this section we will present an important extension theorem for Lipschitz functions with values in a
Hilbert space. Afterwards, we will give an example proving that this result is close to being asymptotically
sharp.
8.1. Statement and proof of the theorem. The main result of this section is the following:
Theorem 8.1 (Johnson-Lindenstrauss extension theorem, 1984). Let (X, d) be a metric space and A ⊆ X
a subset with |A| = n. Then, for every f : A → `2 there exists an extension f˜ : X → `2 , i.e. f˜|A = f ,
√
satisfying kf˜kLip . log nkf kLip .
Using the terminology of Section 1, the result states that for every metric space (X, d),
p
(68) en (X, `2 ) . log n.
The ingredients needed for the proof of the extension theorem are the nonlinear Hahn-Banach theorem,
Kirszbraun’s extension theorem and the following dimension reduction result which is fundamental in
its own right:
Theorem 8.2 (Johnson-Lindenstrauss lemma). For every ε > 0 there exists c(ε) > 0 such that: for
every n > 1 and every x1 , ..., xn ∈ `2 there exist y1 , ..., yn ∈ Rk satisfying the following:
(i) For every i, j = 1, 2, ..., n:
(69) kxi − xj k2 6 kyi − yj k2 6 (1 + ε)kxi − xj k2 ;
(ii) k 6 c(ε) log n.
Remark. The proof that we will present gives the dependence c(ε) . ε12 . In the early 00’s, Alon proved
1
that in order for the lemma to hold, one must have c(ε) & ε2 log(1/ε) . However, the exact dependence on
ε is not yet known.
Let’s assume for the moment that the lemma is valid.
Proof of Theorem 8.1. Let g : f (A) → `k2 be the embedding given by the Johnson-Lindenstrauss lemma,
i.e. the map xi 7→ yi , when ε = 1. The lemma guarantees that k log n, kgkLip 6 2 √ and kg −1 kLip 6 1.
k k −1
Consider also the identity map I : `2 → `∞ ; it holds kIkLip = 1 and kI kLip = k. For the map
◦ g ◦ f : X → `k∞ satisfying
I ◦ g ◦ f : A → `k∞ , the nonlinear Hahn-Banach theorem gives an extension I ^
kI ^
◦ g ◦ f kLip = kI ◦ g ◦ f kLip 6 2kf kLip .
Also, by Kirszbraun’s theorem, the map g −1 : g ◦ f (A) → `2 admits an extension gg
−1 : `k → ` satisfying
2 2
−1 k −1
kgg Lip = kg kLip = 1.
Define now f˜ : X → `2 to be
(70) f˜ = gg
−1 ◦ I −1 ◦ I ^
◦g◦f
and observe that f˜ indeed extends f and
√
kf˜kLip . kkf kLip log nkf kLip ,
p
as we wanted. 2
n
Proof of the Johnson-Lindenstrauss lemma. Without loss of generality, we assume that x1 , ..., xn ∈ R .
Fix some u ∈ S n−1 and let g1 , ..., gn be independent standard gaussian random variables, i.e. each gi
has density
1 2
φ(t) = √ · e−t /2 , t ∈ R.
2π
Denote by
X n
G = G(u) = gi ui
i=1
and observe that G is also a random variable with the same density, by the rotation invariance of the
Gauss measure. Consider now G1 , ..., Gk i.i.d. copies of G and the (random) embedding:
1 1 1
u 7−→ kuk2 · √ G1 (u), √ G2 (u), ..., √ Gk (u) ,
k k k
26
for some k to be determined later. We will prove that there exists an embedding from this random family
such that gives the desired inequalities. Fix some number λ ∈ (0, 12 ) and u ∈ S n−1 ; then:
1 Xk Pk
2
P G2i > 1 + ε = P eλ i=1 Gi > eλ(1+ε)k
k i=1
(•) h Pk 2
i
6 e−λ(1+ε)k E eλ i=1 Gi
k
Y 2
= e−λ(1+ε)k E eλGi
i=1
k Z ∞
−λ(1+ε)k
Y 1 2 2
=e √ eλt e−t /2
dt
i=1
2π −∞
1
= e−λ(1+ε)k
(1 − 2λ)k/2
k
= exp − λ(1 + ε)k − log(1 − 2λ) ,
2
ε
where in (•) we used Markov’s inequality. The last quantity is minimized when λ = 2(1+ε) ; thus we get
k
1 X
ε 1 1
(71) P G2i > 1 + ε 6 e−k 2+2 log 1+ε .
k i=1
One can easily calculate and see that
ε 1 1
+ log & ε2 ;
2 2 1+ε
thus there exists a c > 0 such that
k
1 X 2
(72) P G2i > 1 + ε 6 e−ckε .
k i=1
The exact same proof also gives the inequality
1 Xk 2
(73) P G2i 6 1 − ε 6 e−ckε
k i=1
and combining these we get
1X k 2
(74) P Gi (u)2 − 1 > ε 6 2e−ckε ,
k i=1
8.2. Almost sharpness of the theorem. The best known lower for the Johnson-Lindenstrauss exten-
sion question (also proven in the same paper) is the following:
Theorem 8.3. For arbitrarily large n, there exists a sequence of metric spaces {Xn }, subsets An ⊆ Xn
with |An | = n and 1-Lipschitz functions fn : An → `2 such that every extension f˜n : Xn → `2 , i.e.
f˜n |An = fn , satisfies
s
˜ log n
(77) kfn kLip & .
log log n
Strategy. We will find finite-dimensional normed spaces X, Y and Z, where X is a subspace of Y and
Z is a Hilbert space, and a linear operator T : X → Z such that:
(i) dim X = dim Z=k,
(ii) T is an isomorphism satisfying kT k = 1 and kT −1 k 6 4 and √
(iii) for any linear operator S : Y → Z that extends T , i.e. S|X = T we have kSk > 2k .
Our finite set A will be an ε-net in the unit sphere of X, SX = {x ∈ X : kxk = 1} and f = T |A .
Using a linearization argument we will see that every extension of f must have large Lipschitz norm (the
constants above are unimportant).
We digress to note that condition (iii) above is, in a sense, sharp. This follows from this classical
theorem of Kadec and Snobar:
Theorem 8.4 (Kadec-Snobar). Let Y be a Banach space √ and X a k-dimensional subspace of Y . Then
there exists a projection P : Y → X satisfying kP k 6 k.
We remind the following lemma, which we partially saw in the proof of Assouad’s theorem:
Lemma 8.5. If X is a k-dimensional Banach space and ε > 0, then there exists an ε-net N ⊆ SX such
that
k
2
(78) |N | 6 1 + .
ε
Also, every such net satisfies
k−1
2
(79) |N | > .
ε
Proof. Similar to the proof of Lemma 7.3 – left as an exercise.
Fix now the Banach spaces X, Y, Z, where Z = `k2 , and T : X → Z as described in the strategy above
(whose existence we will prove later). Consider N ⊆ SX an ε-net, as in the previous lemma, and define
A = N ∪ {0}, f = T |A : A → Z. We will prove that for a Lipschitz extension f˜ : SY ∪ {0} → Z of f , i.e.
f˜|A = f , with kf˜kLip = L, L has to be large. The proof will be completed via the following lemmas:
Lemma 8.6. Consider F : Y → Z, the positively homogeneous extension of f˜, that is
(
kykf˜ y/kyk , y 6= 0
(80) F (y) = .
0, y=0
and thus
∞
X
k(S|X T −1 )−1 k 6 kI − S|X T −1 kj 6 2.
j=0
0 0 −1 −1
Define now, S : Y → Z by S = (S|X T ) S and observe that S 0 extends T and
kS 0 k 6 2kSk 6 24L.
So, the non-extendability property in the construction of T implies that:
√ √
k k
(•) kS 0 k > =⇒ L > .
2 48
k √
Finally, remember that n = |A| = |N | + 1 . 3ε and that, without loss of generality, L . log n (from
the positive Johnson-Lindenstrauss theorem). We need to pick ε > 0 so that (∗) holds. Observe that
p
160kLε . kε log n . k 3/2 ε log(1/ε),
29
log k
from the inequalities above. Choosing ε k3/2
, (∗) holds and also
log n
n . eCk log k =⇒ k & ,
log log n
for a universal constant C > 0. Finally, from (•) we deduce that
s
√ log n
L& k& ,
log log n
as we wanted. 2
Remark. Observe that the proof above cannot give an asymptotically better result than the one
exhibited here. In particular, getting rid of the log log n term on the denominator, would be equivalent
(in the above argument) to choosing ε to be a constant, i.e. independent of k, which contradicts (∗).
We now proceed with the various ingridients of the proof. First we will explain in detail the con-
struction of the spaces X, Y, Z and the operator T : X → Z. As usual, it will be based on some Fourier
Analysis on the discrete cube.
Construction. Consider the Hamming cube Fk2 and let Y = L1 (Fk2 , µ), where µ is the uniform probability
measure on Fk2 . Also, consider the subspace
n n
X o
(81) X = f ∈ L1 (Fk2 ) : ∃ ai ∈ R s.t. f (x) = ai (−1)xi , ∀x ∈ Fk2 ,
i=1
6 kP kkf k1 ;
that is, kQk 6 kP k.
Claim. Q can be written as
k
X
(85) Qf (x) = fb({i})εi (x), x ∈ Fk2 .
i=1
∗
Pm if in particular f ∈ Y then PY f = f . Now, fix some ε > 0 and for f smooth on Y and
Thus,
i=1 ai ei = 1, write:
m m Z
X X ∂f
(PY f ) ai ei = ai (y)ψ(y) dy
i=1 i=1 Y ∂yi
Z m
1 X
= ψ(y) f y + εai ei − f (y) + εθ(e, y) dy
ε Y i=1
Pm
Z f y + i=1 ai ei − f (y)
6 ψ(y) dy + sup |θ(ε, y)|
Y ε y∈supp(ψ)
6 kf kLip + o(1),
33
as ε → 0+ , since θ(ε, y) → 0 uniformly in y in supp(ψ). So far we have proved that for every a1 , . . . , am ∈
R:
Xm Xm
PY f ai ei 6 kf kLip a i ei
i=1 i=1
for every smooth function
R f . For a general f ∈ Y # consider a sequence {χn } of C ∞ compactly supported
functions in Y with Y χn (y) dy = 1 whose supports shrink to zero. Then, the functions fn = f ∗ χn are
smooth averages of f , i.e. kf ∗ χn kLip 6 kf kLip . Now, since χn ∗ f −→ f uniformly on compact sets, we
deduce that:
kPY f kLip = lim kPY fn kLip 6 lim sup kfn kLip 6 kf kLip ,
n→∞ n→∞
which implies that kPY k 6 1. 2
Finally, we will use this to prove Theorem 8.13:
Proof of the strong Lindenstrauss theorem. Let k = dim X, m = dim Y and e1 , . . . , ek a basis of X,
completed into a basis e1 , . . . , em of Y . In what follows identify X ≡ Rk and Y ≡ Rm ≡ Rk × Rm−k .
∞
Finally, fix
R two compactly supported C functions ψ1 : Rk → [0, ∞) and ψ2 : Rm−k → [0, ∞) satisfying
# ∗
R
ψ = Rm−k ψ2 = 1. Define now Pn : Y → Y by
Rk 1
m m Z
X
m−k
X ∂
(93) (Pn f ) ai ei = −n ai f (y) ψ1 (y1 , . . . , yk )ψ2 (nyk+1 , . . . , nym ) dy.
i=1 i=1 Rm ∂y i
34
9. Embedding unions of metric spaces into Euclidean space
Here is the main question we want to address in this section:
Question: Suppose (X, d) is a metric space and X = A ∪ B, where c2 (A) < ∞ and c2 (B) < ∞. Does
this imply that c2 (X) < ∞?
The answer is given, quantitatively, by the following recent result due to K. Makarychev and Y.
Makarychev:
Theorem 9.1. Let (X, d) be a metric space and A, B ⊆ X such that X = A ∪ B. Then, if c2 (A) < DA
and c2 (B) < DB , we also have c2 (X) . DA DB .
It is not currently known if the dependence on DA , DB above is sharp. The best previously known
similar result is the following:
Theorem 9.2. Let (X, d) be a metric space, A, B ⊆ X such that X = A ∪ B and (S, ρ) an ultrametric
space. If cS (A) < DA and cS (B) < DB , then cS (X) . DA DB . Furthermore, the above dependence on
DA , DB is sharp.
Let us now proceed with the proof of Theorem 9.1. As usual, we can assume without loss of generality
that X is finite and also that A ∩ B = ∅.
Lemma 9.3. Let X = A ∪ B as above and α > 0. Then there exists some A0 ⊆ A with the following
properties:
(i) For every a ∈ A, there exists a0 ∈ A0 such that
d(a0 , B) 6 d(a, B) and d(a, a0 ) 6 αd(a, B).
(ii) For every a01 , a02 ∈ A0
d(a01 , a02 ) > α min{d(a01 , B), d(a02 , B)}.
Proof. We will construct the subset A0 . First pick a01 ∈ A such that
d(a01 , B) = d(A, B).
Sj
Now, if we have chosen a01 , a02 , ..., a0j ∈ A, pick a0j+1 ∈ A to be any point in the set Ar i=1 B(a0i , αd(a0i , B))
that is closest to B. Since X is finite, this process will terminate after s 6 |A| steps; then define
A0 = {a01 , ..., a0s }. Now, the required conditions can be easily checked:
(i) For some a ∈ A r A0 , let j be the minimum index such that a ∈ B(a0j , αd(a0j , B)). Since a0j was
chosen in this step, we get
d(a0j , B) 6 d(a, B)
and by the way we picked j
d(a, a0j ) 6 αd(a0j , B) 6 αd(a, B).
(ii) For any i < j, since a0i was picked over a0j , we have d(a0i , B) 6 d(a0j , B) and since a0j ∈
/ B(a0i , αd(a0i , B)):
d(a0j , a0i ) > αd(a0i , B).
For the above set A0 , define a map f : A0 → B by setting f (a0 ) to be any closest point to a0 in B, i.e.
(94) d(a0 , f (a0 )) = d(a0 , B).
Lemma 9.4. For the above A0 and f : A0 → B, we have
1
kf kLip 6 2 1 + .
α
Proof. Let a01 , a02 ∈ A0 . Then
d(f (a01 ), f (a02 )) 6 d(f (a01 ), a01 ) + d(a01 , a02 ) + d(a02 , f (a02 ))
= d(a01 , B) + d(a01 , a02 ) + d(a02 , B).
Observe now that
max{d(a01 , B), d(a02 , B)} 6 d(a01 , a02 ) + min{d(a01 , B), d(a02 , B)}
35
and using condition (ii) we get:
d(f (a01 ), f (a02 )) 6 2d(a01 , a02 ) + 2 min{d(a01 , B), d(a02 , B)}
1 0 0
62 1+ d(a1 , a2 ),
α
which is what we wanted.
The main part of the proof is the following construction of an approximate embedding:
Lemma 9.5. For X = A ∪ B as above, there exists a map ψ : X → `2 such that:
(i) For every a1 , a2 ∈ A:
1
(95) kψ(a1 ) − ψ(a2 )k2 6 2 1 + DA DB d(a1 , a2 ).
α
(ii) For every b1 , b2 ∈ B:
(96) d(b1 , b2 ) 6 kψ(b1 ) − ψ(b2 )k2 6 DB d(b1 , b2 ).
(iii) For every a ∈ A, b ∈ B:
(97) kψ(a) − ψ(b)k2 6 2(1 + α)DA DB + (2 + α)DB d(a, b)
and
(98) kψ(a) − ψ(b)k2 > d(a, b) − (1 + α)(2DA DB + 1)d(a, B).
Proof. By our assumptions on A and B, there exist embeddings φA : A → `2 and φB : B → `2 such that
d(a1 , a2 ) 6 kφA (a1 ) − φA (a2 )k2 6 DA d(a1 , a2 )
and
d(b1 , b2 ) 6 kφB (b1 ) − φB (b2 )k2 6 DB d(b1 , b2 ),
for every a1 , a2 ∈ A and b1 , b2 ∈ B. Consider the mapping
φB ◦ f ◦ φ−1 0
A : φA (A ) −→ φB (B) ⊆ `2
and we’ll check that ψ satisfies what we want. First of all, (ii) is trivial and for (i):
kψ(a1 ) − ψ(a2 )k2 = kh ◦ φA (a1 ) − h ◦ φA (a2 )k2
1
62 1+ DB kφA (a1 ) − φA (a2 )k2
α
1
62 1+ DA DB d(a1 , a2 ).
α
Now, to check (iii), take a ∈ A and b ∈ B. By our choice of A0 , there exists a0 ∈ A0 such that
d(a0 , B 0 ) 6 d(a, B) and d(a, a0 ) 6 αd(a, B).
Denote b0 = f (a0 ) and observe that, since h is an extension:
ψ(a0 ) = φB ◦ f ◦ φ−1 0
A ◦ φA (a )
= φB ◦ f (a0 )
= φB (b0 )
36
and thus
kψ(a) − ψ(b)k2 6 kψ(a) − ψ(a0 )k2 + kφB (b0 ) − φB (b)k2
1
62 1+ DA DB d(a, a0 ) + DB d(b, b0 )
α
1
DA DB αd(a, B) + DB d(b, a) + d(a, a0 ) + d(a0 , b0 )
62 1+
α
6 2(1 + α)DA DB d(a, b) + DB d(a, b) + αd(a, B) + d(a0 , B)
d2 ,
for the optimal value of θ, where d = d(a, b), ua = βd(a, B) and ub = βd(b, A).
Now, to bound kF kLip , if a1 , a2 ∈ A:
kF (a1 ) − F (a2 )k22 = kψA (a1 ) − ψA (a2 )k22 + kψB (a1 ) − ψB (a2 )k22 + γ 2 |d(a1 , B) − d(a2 , B)|2
1 2 2 2 2
6 4 1+ DA DB + DA + γ 2 d(a1 , a2 )2
α
2 2
= Oα (DA DB ) · d(a1 , a2 )2 ,
37
using the previous lemma and that γ = Oα (DA DB ). Similarly we can give bounds for b1 , b2 ∈ B. Finally,
for a ∈ A, b ∈ B:
2
kF (a) − F (b)k22 6 2 2(1 + α)DA DB + (2 + α)DB d(a, b)2 + |ψ3 (a) − ψ3 (b)|2
2 2
) · d(a, b)2 + γ 2 d(a, B)2 + d(b, A)2
= Oα (DA DB
2 2
= Oα (DA DB ) · d(a, b)2 .
Thus, we get c2 (X) . DA DB . 2
Before moving to another topic, two related open problems are the following:
Open problem 9.6. Are there analogues of Theorem 9.1 for p 6= 2?
Open problem 9.7. Suppose that the metric space (X, d) can be written as X = A1 ∪ A2 ∪ ... ∪ Ak ,
where c2 (Ai ) = 1 for every i. How big can c2 (X) be?
38
10. Extensions of Banach space-valued Lipschitz functions
2
Our goal here is to give a self-contained proof of the following theorem, which was originally proved
in [LN05]. The proof below is based on the same ideas as in [LN05], but some steps and constructions
are different, leading to simplifications. The previously best-known bound on this problem was due to
[JLS86].
Theorem 10.1. Suppose that (X, dX ) is a metric space and (Z, k · kZ ) is a Banach space. Fix an integer
n > 3 and A ⊆ X with |A| = n. Then for every Lipschitz function f : A → Z there exists a function
F : X → Z that extends f and
log n
(102) kF kLip . kf kLip .
log log n
By normalization, we may assume from now on that kf kLip = 1. Write A = {a1 , . . . , an }. For
r ∈ [0, ∞) let Ar denote the r-neighborhood of A in X, i.e.,
n
def
[
Ar = BX (aj , r),
j=1
def
where for x ∈ X and r > 0 we denote BX (x, r) = {y ∈ X : dX (x, y) 6 r}. Given a permutation
π ∈ Sn and r ∈ [0, ∞), for every x ∈ Ar let jrπ (x) ∈ {1, . . . , n} be the smallest j ∈ {1, . . . , n} for which
dX (aπ(j) , x) 6 r. Such a j must exist since x ∈ Ar . Define aπr : X → A by
x if x ∈ A,
def
(103) ∀ x ∈ X, aπr (x) = ajrπ (x) if x ∈ Ar r A,
a1 if x ∈ X r Ar .
We record the following lemma for future use; compare to inequality (3) in [MN07].
Lemma 10.2. Suppose that r > 0 and that x, y ∈ Ar satisfy dX (x, y) 6 r. Then
|{π ∈ Sn : aπr (x) 6= aπr (y)}| |A ∩ BX (x, r − dX (x, y))|
61− .
n! |A ∩ BX (x, r + dX (x, y))|
Proof. Suppose that π ∈ Sn is such that the minimal j ∈ {1, . . . , n} for which aπ(j) ∈ BX (x, r +dX (x, y))
actually satisfies aπ(j) ∈ BX (x, r − dX (x, y)). Hence jrπ (x) = j and therefore aπ(j) = aπr (x). Also,
dX (aπ(j) , y) 6 dX (aπ(j) , x) + dX (x, y) 6 r, so jrπ (y) 6 j. But dX (x, ajrπ (y) ) 6 dX (y, ajrπ (y) ) + dX (x, y) 6
r + dX (x, y), so by the definition of j we must have jrπ (y) > j. Thus jrπ (y) = j, so that aπr (y) =
aπ(j) = aπr (x). We have shown that if in the random order that π induces on A the first element
that falls in the ball BX (x, r + dX (x, y)) actually falls in the smaller ball BX (x, r − dX (x, y)), then
aπr (y) = aπr (x). If π is chosen uniformly at random from Sn then the probability of this event equals
|A ∩ BX (x, r − dX (x, y))|/|A ∩ BX (x, r + dX (x, y))|. Hence,
|{π ∈ Sn : aπr (x) = aπr (y)}| |A ∩ BX (x, r − dX (x, y))|
> .
n! |A ∩ BX (x, r + dX (x, y))|
Corollary 10.3. Suppose that 0 6 u 6 v and x, y ∈ Au satisfy dX (x, y) 6 min{u − dX (x, A), v − u/2}.
Then Z v Z 2t
|{π ∈ Sn : aπr (x) 6= aπr (y)}|
1
dr dt . dX (x, y) log n.
u t t n!
def
Proof. Denote d = dX (x, y) and for r > 0
|{π ∈ Sn : aπr (x) 6= aπr (y)}|
def def
(104) g(r) = and h(r) = log (|A ∩ BX (x, r)|) .
n!
Since x, y ∈ Au and for r > u we have Au ⊆ Ar and dX (x, y) 6 u 6 r, Lemma 10.2 implies that
(105) ∀ r > u, g(r) 6 1 − eh(r−d)−h(r+d) 6 h(r + d) − h(r − d),
where in the last step of (105) we used the elementary inequality 1 − e−α 6 α, which holds for every
α ∈ R.
2This section was typed by Assaf Naor.
39
Note that by Fubini we have
Z v Z 2t Z 2v !
Z min{v,r} Z 2v
1 g(r) min{v, r}
(106) g(r)dr dt = dt dr = g(r) log dr.
u t t u max{u,r/2} t u max{u, r/2}
where the last step of (107) is valid because 2v − d > u + d, due to our assumption that d 6 v − u/2.
Since h is nondecreasing, for every s ∈ [2v − d, 2v + d] we have h(s) 6 h(2v + d), and for every
s ∈ [u − d, u + d] we have h(s) > h(u − d). It therefore follows from (107) that
Z v Z 2t
1 (104) |A ∩ BX (x, 2v + d)|
(108) g(r)dr dt . d h(2v + d) − h(u − d) = d log . d log n,
u t t |A ∩ BX (x, u − d)|
where in the last step of (108) we used the fact that |A ∩ BX (x, 2v + d)| 6 |A| = n, and, due to
our assumption d 6 u − dX (x, A) ( ⇐⇒ dX (x, A) 6 u − d), that A ∩ BX (x, u − d) 6= ∅, so that
|A ∩ BX (x, u − d)| > 1.
Returning to the proof of Theorem 10.1, fix $\varepsilon \in (0, 1/2)$. Fix also any $(2/\varepsilon)$-Lipschitz function $\varphi_\varepsilon : \mathbb{R} \to [0,1]$ that vanishes outside $[\varepsilon/2, 1 + \varepsilon/2]$ and satisfies $\varphi_\varepsilon(s) = 1$ for every $s \in [\varepsilon, 1]$. Note that
$$\log(1/\varepsilon) = \int_\varepsilon^1 \frac{ds}{s} \leq \int_0^\infty \frac{\varphi_\varepsilon(s)}{s}\, ds \leq \int_{\varepsilon/2}^{1+\varepsilon/2} \frac{ds}{s} \leq \log(3/\varepsilon).$$
Hence, if we define $c(\varepsilon) \in (0, \infty)$ by
$$\frac{1}{c(\varepsilon)} \stackrel{\mathrm{def}}{=} \int_0^\infty \frac{\varphi_\varepsilon(s)}{s}\, ds, \tag{109}$$
then
$$\frac{1}{\log(3/\varepsilon)} \leq c(\varepsilon) \leq \frac{1}{\log(1/\varepsilon)}. \tag{110}$$
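One admissible choice of $\varphi_\varepsilon$ (an assumption; any function with the stated properties works) is the piecewise linear bump that ramps up on $[\varepsilon/2, \varepsilon]$ and down on $[1, 1+\varepsilon/2]$. The following sketch checks the bounds (110) numerically for this choice.

```python
import numpy as np
from scipy.integrate import quad

def phi(s, eps):
    """Piecewise linear (2/eps)-Lipschitz bump: 0 off [eps/2, 1 + eps/2],
    identically 1 on [eps, 1]."""
    if s <= eps / 2 or s >= 1 + eps / 2:
        return 0.0
    if s < eps:
        return (s - eps / 2) / (eps / 2)        # up-ramp, slope 2/eps
    if s <= 1.0:
        return 1.0
    return (1 + eps / 2 - s) / (eps / 2)        # down-ramp, slope -2/eps

eps = 0.1
integral, _ = quad(lambda s: phi(s, eps) / s, eps / 2, 1 + eps / 2)
c = 1.0 / integral                               # c(eps) as in (109)
print(1 / np.log(3 / eps) <= c <= 1 / np.log(1 / eps))   # True, as in (110)
```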
Define $F : X \to Z$ by setting $F(x) = f(x)$ for $x \in A$ and
$$\forall\, x \in X \smallsetminus A,\qquad F(x) \stackrel{\mathrm{def}}{=} \frac{c(\varepsilon)}{n!} \sum_{\pi \in S_n} \int_0^\infty \frac{1}{t^2}\, \varphi_\varepsilon\!\left(\frac{2}{t}\, d_X(x,A)\right) \left(\int_t^{2t} f(a_r^\pi(x))\, dr\right) dt. \tag{111}$$
By definition, $F$ extends $f$. Next, suppose that $x \in X$ and $y \in X \smallsetminus A$. Fix any $z \in A$ that satisfies $d_X(x,z) = d_X(x,A)$ (thus if $x \in A$ then $z = x$). We have the following identity:
$$F(y) - F(x) = \frac{c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}}{t^2}\, \big(f(a) - f(z)\big)\, dr\, dt. \tag{112}$$
Indeed, for every $w \in X$ with $d_X(w,A) > 0$ we have
$$\frac{c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\varphi_\varepsilon\big(\frac{2d_X(w,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(w)=a\}}}{t^2}\, f(z)\, dr\, dt = \frac{c(\varepsilon)}{n!} \sum_{\pi \in S_n} \int_0^\infty \int_t^{2t} \frac{\varphi_\varepsilon\big(\frac{2d_X(w,A)}{t}\big)}{t^2}\, f(z)\, dr\, dt$$
$$= c(\varepsilon) \int_0^\infty \frac{1}{t}\, \varphi_\varepsilon\!\left(\frac{2}{t}\, d_X(w,A)\right) dt\; f(z) \stackrel{(*)}{=} c(\varepsilon) \int_0^\infty \frac{\varphi_\varepsilon(s)}{s}\, ds\; f(z) \stackrel{(109)}{=} f(z), \tag{113}$$
where in $(*)$ we made the change of variable $s = 2 d_X(w,A)/t$, which is allowed since $d_X(w,A) > 0$. Due to (113), if $x, y \in X \smallsetminus A$ then (112) is a consequence of the definition (111). If $x \in A$ (recall that $y \in X \smallsetminus A$), then $z = x$ and $\varphi_\varepsilon(2 d_X(x,A)/t) = 0$ for all $t > 0$, so in this case (112) follows once more from (113) and (111).
By (112) we have
$$\|F(x) - F(y)\|_Z \leq \frac{c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\Big|\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}\Big|}{t^2}\, \|f(a) - f(z)\|_Z\, dr\, dt$$
$$\stackrel{(114)}{\leq} \frac{c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\Big|\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}\Big|}{t^2}\, d_X(a,z)\, dr\, dt$$
$$\stackrel{(115)}{\leq} \frac{2c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\Big|\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}\Big|}{t^2}\, d_X(x,a)\, dr\, dt,$$
where in (114) we used the fact that $\|f\|_{\mathrm{Lip}} = 1$, and in (115) we used the fact that for every $a \in A$ we have $d_X(a,z) \leq d_X(a,x) + d_X(x,z) \leq 2 d_X(a,x)$, due to the choice of $z$ as a point of $A$ that is closest to $x$.
To estimate (115), fix $t > 0$ and $r \in [t, 2t]$. If $\varphi_\varepsilon(2d_X(y,A)/t)\mathbf{1}_{\{a_r^\pi(y)=a\}} \neq \varphi_\varepsilon(2d_X(x,A)/t)\mathbf{1}_{\{a_r^\pi(x)=a\}}$, then either $a = a_r^\pi(x)$ and $2d_X(x,A)/t \in \mathrm{supp}(\varphi_\varepsilon)$, or $a = a_r^\pi(y)$ and $2d_X(y,A)/t \in \mathrm{supp}(\varphi_\varepsilon)$. Recalling that $\mathrm{supp}(\varphi_\varepsilon) \subseteq [\varepsilon/2, 1+\varepsilon/2]$, it follows that either $a = a_r^\pi(x)$ and $d_X(x,A) < t$, or $a = a_r^\pi(y)$ and $d_X(y,A) < t$. If $a = a_r^\pi(x)$ and $d_X(x,A) < t$, then since $t \leq r$ it follows that $x \in A_r$, and so the definition of $a_r^\pi(x)$ implies that $d_X(x,a) = d_X(a_r^\pi(x), x) \leq r$. On the other hand, if $a = a_r^\pi(y)$ and $d_X(y,A) < t$, then as before we have $d_X(y,a) = d_X(a_r^\pi(y), y) \leq r$, and therefore $d_X(x,a) \leq d_X(x,y) + d_X(y,a) \leq d_X(x,y) + r$. We have thus checked that $d_X(x,a) \leq d_X(x,y) + r \leq d_X(x,y) + 2t$ whenever the integrand in (115) is nonzero. Consequently,
$$\|F(x) - F(y)\|_Z \stackrel{(115)}{\leq} \frac{2c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} + \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}}{t^2}\, d_X(x,y)\, dr\, dt$$
$$\qquad + \frac{4c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\Big|\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}\Big|}{t}\, dr\, dt$$
$$\stackrel{(113)}{=} 4\, d_X(x,y) + \frac{4c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\Big|\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}\Big|}{t}\, dr\, dt.$$
Therefore, in order to establish the validity of (102), it suffices to show that we can choose $\varepsilon \in (0, 1/2)$ so that
$$\frac{c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\Big|\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}\Big|}{t}\, dr\, dt \lesssim \frac{\log n}{\log \log n}\, d_X(x,y). \tag{116}$$
Corollary 10.4 implies that (116) holds true when $\min\{d_X(x,A), d_X(y,A)\} \leq 5 d_X(x,y)/3$. We shall therefore assume from now on that the assumption of Corollary 10.4 fails, i.e., that
$$d_X(x,y) < \frac{3}{5}\, \min\{d_X(x,A), d_X(y,A)\}. \tag{118}$$
Define
$$U_\varepsilon(x,y) \stackrel{\mathrm{def}}{=} \left\{ t \in (0,\infty) : \left|\varphi_\varepsilon\Big(\frac{2d_X(y,A)}{t}\Big) - \varphi_\varepsilon\Big(\frac{2d_X(x,A)}{t}\Big)\right| > 0 \right\} \tag{119}$$
and
$$V_\varepsilon(x,y) \stackrel{\mathrm{def}}{=} \left\{ t \in (0,\infty) : \varphi_\varepsilon\Big(\frac{2d_X(y,A)}{t}\Big) + \varphi_\varepsilon\Big(\frac{2d_X(x,A)}{t}\Big) > 0 \right\}. \tag{120}$$
Then, for every $\pi \in S_n$, $t > 0$ and $r \in [t, 2t]$ we have
$$\sum_{a \in A} \left|\varphi_\varepsilon\Big(\frac{2d_X(y,A)}{t}\Big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\Big(\frac{2d_X(x,A)}{t}\Big) \mathbf{1}_{\{a_r^\pi(x)=a\}}\right|$$
$$= \left|\varphi_\varepsilon\Big(\frac{2d_X(y,A)}{t}\Big) - \varphi_\varepsilon\Big(\frac{2d_X(x,A)}{t}\Big)\right| \mathbf{1}_{\{a_r^\pi(x) = a_r^\pi(y)\}} + \left(\varphi_\varepsilon\Big(\frac{2d_X(y,A)}{t}\Big) + \varphi_\varepsilon\Big(\frac{2d_X(x,A)}{t}\Big)\right) \mathbf{1}_{\{a_r^\pi(x) \neq a_r^\pi(y)\}}$$
$$\lesssim \frac{d_X(x,y)}{\varepsilon t}\, \mathbf{1}_{U_\varepsilon(x,y)}(t) + \mathbf{1}_{V_\varepsilon(x,y)}(t) \cdot \mathbf{1}_{\{a_r^\pi(x) \neq a_r^\pi(y)\}}, \tag{121}$$
where in (121) we used the fact that $\varphi_\varepsilon$ is $(2/\varepsilon)$-Lipschitz and that $|d_X(x,A) - d_X(y,A)| \leq d_X(x,y)$.
Consequently, in combination with the upper bound on $c(\varepsilon)$ in (110), it follows from (121) that
$$\frac{c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\Big|\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}\Big|}{t}\, dr\, dt$$
$$\lesssim \frac{d_X(x,y)}{\varepsilon \log(1/\varepsilon)} \int_{U_\varepsilon(x,y)} \frac{dt}{t} + \frac{1}{\log(1/\varepsilon)} \int_{V_\varepsilon(x,y)} \frac{1}{t} \left(\int_t^{2t} \frac{|\{\pi \in S_n : a_r^\pi(x) \neq a_r^\pi(y)\}|}{n!}\, dr\right) dt. \tag{122}$$
To bound the first term in (122), denote
$$m(x,y) \stackrel{\mathrm{def}}{=} \min\{d_X(x,A), d_X(y,A)\} \qquad\text{and}\qquad M(x,y) \stackrel{\mathrm{def}}{=} \max\{d_X(x,A), d_X(y,A)\}.$$
If $t \in [0,\infty)$ satisfies $t < 2m(x,y)/(1+\varepsilon/2)$, then $\min\{2d_X(x,A)/t,\, 2d_X(y,A)/t\} > 1 + \varepsilon/2$, and therefore by the definition of $\varphi_\varepsilon$ we have $\varphi_\varepsilon(2d_X(x,A)/t) = \varphi_\varepsilon(2d_X(y,A)/t) = 0$. Similarly, if $t \in [0,\infty)$ satisfies $t > 4M(x,y)/\varepsilon$, then $\max\{2d_X(x,A)/t,\, 2d_X(y,A)/t\} < \varepsilon/2$, and therefore $\varphi_\varepsilon(2d_X(x,A)/t) = \varphi_\varepsilon(2d_X(y,A)/t) = 0$ as well. Finally, if $2M(x,y) \leq t \leq 2m(x,y)/\varepsilon$, then $2d_X(x,A)/t,\, 2d_X(y,A)/t \in [\varepsilon, 1]$, so by the definition of $\varphi_\varepsilon$ we have $\varphi_\varepsilon(2d_X(x,A)/t) = \varphi_\varepsilon(2d_X(y,A)/t) = 1$. By the definition of $U_\varepsilon(x,y)$ in (119), we have thus shown that
$$U_\varepsilon(x,y) \subseteq \left[\frac{2m(x,y)}{1+\varepsilon/2},\, 2M(x,y)\right] \cup \left[\frac{2m(x,y)}{\varepsilon},\, \frac{4M(x,y)}{\varepsilon}\right].$$
Consequently,
$$\int_{U_\varepsilon(x,y)} \frac{dt}{t} \leq \int_{\frac{2m(x,y)}{1+\varepsilon/2}}^{2M(x,y)} \frac{dt}{t} + \int_{\frac{2m(x,y)}{\varepsilon}}^{\frac{4M(x,y)}{\varepsilon}} \frac{dt}{t} \lesssim \log\frac{2M(x,y)}{m(x,y)} \lesssim 1, \tag{123}$$
where the last step of (123) holds true because, due to the triangle inequality and (118), we have
$$M(x,y) \leq d_X(x,y) + m(x,y) < \frac{3}{5}\, m(x,y) + m(x,y) \lesssim m(x,y).$$
To bound the second term in (122), note that by the definition of $V_\varepsilon(x,y)$ in (120) and the choice of $\varphi_\varepsilon$,
$$t \in V_\varepsilon(x,y) \implies \left\{\frac{2d_X(x,A)}{t},\, \frac{2d_X(y,A)}{t}\right\} \cap \left[\frac{\varepsilon}{2},\, 1 + \frac{\varepsilon}{2}\right] \neq \emptyset. \tag{124}$$
Hence,
$$V_\varepsilon(x,y) \subseteq \left[\frac{2d_X(x,A)}{1+\varepsilon/2},\, \frac{4d_X(x,A)}{\varepsilon}\right] \cup \left[\frac{2d_X(y,A)}{1+\varepsilon/2},\, \frac{4d_X(y,A)}{\varepsilon}\right], \tag{125}$$
and therefore, using the notation for $g : [0,\infty) \to [0,1]$ that was introduced in (104),
$$\int_{V_\varepsilon(x,y)} \frac{1}{t}\left(\int_t^{2t} g(r)\,dr\right) dt \leq \int_{\frac{2d_X(x,A)}{1+\varepsilon/2}}^{\frac{4d_X(x,A)}{\varepsilon}} \frac{1}{t}\left(\int_t^{2t} g(r)\,dr\right) dt + \int_{\frac{2d_X(y,A)}{1+\varepsilon/2}}^{\frac{4d_X(y,A)}{\varepsilon}} \frac{1}{t}\left(\int_t^{2t} g(r)\,dr\right) dt. \tag{126}$$
We wish to use Corollary 10.3 to estimate the two integrals that appear in the right-hand side of (126). To this end we need to first check that the assumptions of Corollary 10.3 are satisfied. Denote $u_x = 2d_X(x,A)/(1+\varepsilon/2)$ and $u_y = 2d_X(y,A)/(1+\varepsilon/2)$. Since $u_x \geq d_X(x,A)$ we have $x \in A_{u_x}$, and analogously $y \in A_{u_y}$. Also,
$$d_X(y,A) \leq d_X(x,y) + d_X(x,A) \stackrel{(118)}{<} \frac{3}{5}\, d_X(x,A) + d_X(x,A) = \frac{4+2\varepsilon}{5}\, u_x \leq u_x, \tag{127}$$
where the last step of (127) is valid because $\varepsilon \leq 1/2$. From (127) we see that $y \in A_{u_x}$, and the symmetric argument shows that $x \in A_{u_y}$. It also follows (using (118) and $\varepsilon \leq 1/2$) that $d_X(x,y) \leq u_x - d_X(x,A)$, and by symmetry also $d_X(x,y) \leq u_y - d_X(y,A)$. Next, denote $v_x = 4d_X(x,A)/\varepsilon$ and $v_y = 4d_X(y,A)/\varepsilon$. In order to verify the assumptions of Corollary 10.3, it remains to check that $d_X(x,y) \leq \min\{v_x - u_x/2,\, v_y - u_y/2\}$. Indeed,
$$\frac{d_X(x,y)}{v_x - u_x/2} \stackrel{(118)}{<} \frac{3 d_X(x,A)/5}{v_x - u_x/2} = \frac{\frac{3}{5}}{\frac{4}{\varepsilon} - \frac{1}{1+\varepsilon/2}} = \frac{3\varepsilon(1+\varepsilon/2)}{5(4+\varepsilon)} < 1,$$
and the symmetric argument shows that also $d_X(x,y) < v_y - u_y/2$. Having checked that the assumptions of Corollary 10.3 hold true, it follows from (126) and Corollary 10.3 that
$$\int_{V_\varepsilon(x,y)} \frac{1}{t}\left(\int_t^{2t} \frac{|\{\pi \in S_n : a_r^\pi(x) \neq a_r^\pi(y)\}|}{n!}\, dr\right) dt \lesssim d_X(x,y)\, \log n. \tag{128}$$
The desired estimate (116) now follows by substituting (123) and (128) into (122): the right-hand side of (122) becomes $\lesssim \big(\frac{1}{\varepsilon \log(1/\varepsilon)} + \frac{\log n}{\log(1/\varepsilon)}\big)\, d_X(x,y)$, so choosing $\varepsilon = 1/\log n$ (for $n$ exceeding a universal constant; bounded $n$ being trivial) makes both terms $\lesssim \frac{\log n}{\log\log n}\, d_X(x,y)$. $\Box$
References
[JLS86] W.B. Johnson, J. Lindenstrauss and G. Schechtman. Extensions of Lipschitz maps into Banach
spaces. Israel J. Math., 54(2):129–138, 1986.
[LN05] J. R. Lee and A. Naor. Extending Lipschitz functions via random metric partitions. Invent.
Math., 160(1):59–95, 2005.
[MN07] M. Mendel and A. Naor. Ramsey partitions and proximity data structures. J. Eur. Math. Soc.
(JEMS), 9(2):253–275, 2007.
11. Ball’s extension theorem
Our goal in this section is to present a fully nonlinear version of K. Ball's extension theorem. The result we present is the most general Lipschitz extension theorem currently known. The motivation for the theorem is a classical result of Maurey concerning the extension and factorization of linear maps between certain classes of Banach spaces.
11.1. Markov type and cotype. From now on, we will denote by $\Delta^{n-1}$ the $n$-simplex, i.e.,
$$\Delta^{n-1} = \Big\{(x_1, \dots, x_n) \in [0,1]^n : \sum_{i=1}^n x_i = 1\Big\}, \tag{129}$$
which we think of as the space of all probability measures on the set $\{1, 2, \dots, n\}$. A (row) stochastic matrix is a square matrix $A \in M_n(\mathbb{R})$ with non-negative entries, all the rows of which add up to $1$. Given a measure $\pi \in \Delta^{n-1}$, a stochastic matrix $A = (a_{ij}) \in M_n(\mathbb{R})$ will be called reversible relative to $\pi$ if for every $1 \leq i, j \leq n$:
$$\pi_i a_{ij} = \pi_j a_{ji}. \tag{130}$$
Definition 11.1 (Ball, 1992). A metric space $(X,d)$ has Markov type $p \in (0,\infty)$ with constant $M \in (0,\infty)$ if for every $t, n \in \mathbb{N}$, every $\pi \in \Delta^{n-1}$, every stochastic matrix $A \in M_n(\mathbb{R})$ reversible relative to $\pi$ and every $x_1, \dots, x_n \in X$ the following inequality holds:
$$\sum_{i,j=1}^n \pi_i (A^t)_{ij}\, d(x_i, x_j)^p \leq M^p\, t \sum_{i,j=1}^n \pi_i a_{ij}\, d(x_i, x_j)^p. \tag{131}$$
The infimum over $M$ such that this holds is the Markov type $p$ constant of $X$, denoted by $M_p(X)$.
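A convenient way to produce examples is the random walk on a weighted graph: taking $a_{ij} = w_{ij}/\sum_k w_{ik}$ for a symmetric weight matrix $(w_{ij})$ and $\pi_i$ proportional to $\sum_k w_{ik}$ yields a chain reversible relative to $\pi$. The following minimal sketch (the construction and tolerances are assumptions made for illustration) builds such a chain and checks (131) numerically for points on the real line with $p = 2$ and $M = 1$, as Lemma 11.2 below guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n)); W = (W + W.T) / 2        # symmetric edge weights
A = W / W.sum(axis=1, keepdims=True)             # row-stochastic matrix
pi = W.sum(axis=1) / W.sum()                     # stationary measure
assert np.allclose(pi[:, None] * A, (pi[:, None] * A).T)  # reversibility (130)

x = rng.normal(size=n)                           # points x_1, ..., x_n in R
D2 = (x[:, None] - x[None, :]) ** 2              # d(x_i, x_j)^2
for t in range(1, 25):
    At = np.linalg.matrix_power(A, t)
    lhs = np.sum(pi[:, None] * At * D2)
    rhs = t * np.sum(pi[:, None] * A * D2)
    assert lhs <= rhs + 1e-12                    # inequality (131) with M = 1
print("Markov type 2 inequality holds for this chain")
```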
Interpretation: The above notion has an important probabilistic interpretation, which we now mention without getting into many details. A process $\{Z_t\}_{t=0}^\infty$ with values in $\{1, 2, \dots, n\}$ is called a stationary reversible Markov chain with respect to $\pi$ and $A$ if:
(i) for every $t \in \mathbb{N}$ and every $1 \leq i \leq n$ it holds that $\mathbb{P}(Z_t = i) = \pi_i$, and
(ii) for every $t \in \mathbb{N}$ and every $1 \leq i, j \leq n$ it holds that $\mathbb{P}(Z_{t+1} = j \mid Z_t = i) = a_{ij}$.
The Markov type $p$ inequality can be equivalently written as follows: for every stationary reversible Markov chain on $\{1, 2, \dots, n\}$, every $f : \{1, 2, \dots, n\} \to X$ and every $t \in \mathbb{N}$,
$$\mathbb{E}\big[d(f(Z_t), f(Z_0))^p\big] \leq M^p\, t\; \mathbb{E}\big[d(f(Z_1), f(Z_0))^p\big]. \tag{132}$$
From the above interpretation, it follows easily (by the triangle inequality and stationarity) that every metric space has Markov type $1$ with constant $1$. A slightly more interesting computation is the following:
Lemma 11.2. Every Hilbert space H has Markov type 2 with M2 (H) = 1.
Proof. Consider some $\pi \in \Delta^{n-1}$ and a stochastic matrix $A \in M_n(\mathbb{R})$ reversible relative to $\pi$. We want to show that for every $x_1, \dots, x_n \in H$ and every $t \in \mathbb{N}$,
$$\sum_{i,j=1}^n \pi_i (A^t)_{ij}\, \|x_i - x_j\|^2 \leq t \sum_{i,j=1}^n \pi_i a_{ij}\, \|x_i - x_j\|^2.$$
Since the inequality above concerns only squares of norms of vectors in a Hilbert space, it suffices to prove it when $x_1, \dots, x_n \in \mathbb{R}$, i.e., coordinatewise. Consider the inner product space $L_2(\pi)$, that is, $\mathbb{R}^n$ with the inner product
$$\langle z, w \rangle = \sum_{i=1}^n \pi_i z_i w_i.$$
If we denote by $x$ the vector $(x_1, x_2, \dots, x_n)^T$, we see that
$$\mathrm{LHS} = \sum_{i,j=1}^n \pi_i (A^t)_{ij} (x_i - x_j)^2 = \sum_{i,j=1}^n \pi_i (A^t)_{ij} (x_i^2 - 2x_i x_j + x_j^2) = \sum_{i=1}^n \pi_i x_i^2 - 2\langle A^t x, x\rangle + \sum_{i,j=1}^n \pi_j (A^t)_{ji}\, x_j^2$$
$$= 2\sum_{i=1}^n \pi_i x_i^2 - 2\langle A^t x, x\rangle = 2\big\langle (I - A^t)x,\, x\big\rangle.$$
Observe that we used that $A^t$ is stochastic, given that $A$ is, and that it is also reversible relative to $\pi$. On the other hand, setting $t = 1$, the same calculation implies that
$$\mathrm{RHS} = 2\big\langle (I - A)x,\, x\big\rangle.$$
Hence, we must prove the inequality
$$\big\langle (I - A^t)x,\, x\big\rangle \leq t\, \big\langle (I - A)x,\, x\big\rangle. \tag{$*$}$$
A simple calculation shows that the reversibility of $A$ is equivalent to the fact that $A$ is self-adjoint as an operator on $L_2(\pi)$, and thus diagonalizable with real eigenvalues. So it is enough to prove $(*)$ for eigenvectors $x \neq 0$: if $Ax = \lambda x$, we must show that
$$1 - \lambda^t \leq t(1 - \lambda).$$
To prove this we observe that, since $A$ is stochastic and reversible, for every $y \in \mathbb{R}^n$:
$$\|Ay\|_{L_1(\pi)} = \sum_{i=1}^n \pi_i |(Ay)_i| = \sum_{i=1}^n \pi_i \Big|\sum_{j=1}^n a_{ij} y_j\Big| \leq \sum_{i,j=1}^n \pi_i a_{ij} |y_j| = \sum_{i,j=1}^n \pi_j a_{ji} |y_j| = \sum_{j=1}^n \pi_j |y_j| = \|y\|_{L_1(\pi)}.$$
In particular, if $Ax = \lambda x$ with $x \neq 0$, then $|\lambda| \|x\|_{L_1(\pi)} \leq \|x\|_{L_1(\pi)}$, so $|\lambda| \leq 1$. Therefore
$$1 - \lambda^t = (1-\lambda)(1 + \lambda + \dots + \lambda^{t-1}) \leq t(1-\lambda),$$
which proves $(*)$ and completes the proof. $\Box$

A standard application of Lemma 11.2 is a lower bound on the Euclidean distortion of the Hamming cube $\mathbb{F}_2^n$: applying the Markov type $2$ inequality (132) to the standard random walk on the cube at time $t \asymp n$ yields $c_2(\mathbb{F}_2^n) \gtrsim \sqrt{n}$. Of course, this argument has the drawback that it does not compute the exact value of $c_2(\mathbb{F}_2^n)$ (as Enflo's proof did). However, it is much more robust, since it does not rely on the structure of the cube as a group. For example, one can use a similar argument to see that similar distortion estimates hold for arbitrary large subsets of the cube.
Now we present the dual notion of Markov type:
Definition 11.4. A metric space $(X,d)$ has metric Markov cotype $p \in (0,\infty)$ with constant $N \in (0,\infty)$ if for every $t, n \in \mathbb{N}$, every $\pi \in \Delta^{n-1}$, every stochastic matrix $A \in M_n(\mathbb{R})$ reversible relative to $\pi$ and every $x_1, \dots, x_n \in X$ there exist $y_1, \dots, y_n \in X$ such that
$$\sum_{i=1}^n \pi_i\, d(x_i, y_i)^p + t \sum_{i,j=1}^n \pi_i a_{ij}\, d(y_i, y_j)^p \leq N^p \sum_{i,j=1}^n \pi_i \Big(\frac{1}{t}\sum_{s=1}^t A^s\Big)_{ij}\, d(x_i, x_j)^p. \tag{133}$$
The infimum over $N$ such that this holds is the metric Markov cotype $p$ constant of $X$, denoted by $N_p(X)$.
Explanation. In an effort to dualize (i.e., reverse the inequalities in) the definition of Markov type, we would like to define Markov cotype by the inequality
$$\sum_{i,j=1}^n \pi_i (A^t)_{ij}\, d(x_i, x_j)^p \geq N^{-p}\, t \sum_{i,j=1}^n \pi_i a_{ij}\, d(x_i, x_j)^p.$$
One easily sees, though, that this cannot hold in any metric space $X$ which is not a singleton: consider the chain that alternates deterministically between two points, i.e., $n = 2$ and $a_{12} = a_{21} = 1$, for which $(A^t)_{ij} = \delta_{ij}$ when $t$ is even, so the left-hand side vanishes while the right-hand side grows linearly in $t$. Thus the existential quantifier added to the definition is needed for it to make sense. The fact that the power $A^t$ is replaced by the Cesàro average on the right-hand side is just for technical reasons.
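A short numerical illustration of this failure (the chain and distances are exactly the two-point example just described):

```python
import numpy as np

# Deterministic two-point chain: pi = (1/2, 1/2), a_12 = a_21 = 1.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
pi = np.array([0.5, 0.5])
D = np.array([[0.0, 1.0], [1.0, 0.0]])     # d(x_1, x_2) = 1
t = 10                                      # any even time
lhs = np.sum(pi[:, None] * np.linalg.matrix_power(A, t) * D)
rhs = t * np.sum(pi[:, None] * A * D)
print(lhs, rhs)                             # 0.0 versus 10.0
```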
We close this section by introducing a notion that will be useful in what follows:
Definition 11.5. Let $(X,d)$ be a metric space and $\mathcal{P}_X$ the space of all finitely supported probability measures on $X$, i.e., measures $\mu$ of the form
$$\mu = \sum_{i=1}^n \lambda_i \delta_{x_i},\qquad x_i \in X,\ \lambda_i \in [0,1] \ \text{ and } \ \sum_{i=1}^n \lambda_i = 1.$$
$(X,d)$ is called $W_p$-barycentric with constant $\Gamma \in (0,\infty)$ if there exists a map $B : \mathcal{P}_X \to X$ such that
(i) for every $x \in X$, $B(\delta_x) = x$, and
(ii) the following inequality holds true:
$$d\Big(B\Big(\sum_{i=1}^n \lambda_i \delta_{x_i}\Big),\, B\Big(\sum_{i=1}^n \lambda_i \delta_{y_i}\Big)\Big)^p \leq \Gamma^p \sum_{i=1}^n \lambda_i\, d(x_i, y_i)^p, \tag{134}$$
for every $x_1, \dots, x_n, y_1, \dots, y_n \in X$ and $\lambda_1, \dots, \lambda_n \in [0,1]$ with $\sum_{i=1}^n \lambda_i = 1$.
Remarks. (i) If a metric space (X, d) is Wp -barycentric, then it is also Wq -barycentric for any q > p
with the same constant.
(ii) Every Banach space is W1 -barycentric with constant 1.
11.2. Statement and proof of the theorem. The promised Lipschitz extension theorem is the following:
Theorem 11.6 (generalized Ball extension theorem, Mendel-Naor, 2013). Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces and $p \in (0,\infty)$ such that:
(i) $(X, d_X)$ has Markov type $p$;
(ii) $(Y, d_Y)$ is $W_p$-barycentric with constant $\Gamma$ and has metric Markov cotype $p$.
Consider a subset $Z \subseteq X$ and a Lipschitz function $f : Z \to Y$. Then, for every finite subset $S \subseteq X$ there exists a function $F \equiv F^S : S \to Y$ such that:
(a) $F|_{S \cap Z} = f|_{S \cap Z}$ and
(b) $\|F\|_{\mathrm{Lip}} \lesssim \Gamma M_p(X) N_p(Y) \|f\|_{\mathrm{Lip}}$.
In most reasonable cases, this finite extension result yields a global extension result. We illustrate this fact by the following useful example:
Corollary 11.7. Let $X$ and $Y$ be Banach spaces satisfying (i) and (ii) of the previous theorem, such that $Y$ is also reflexive. Then $e(X,Y) \lesssim M_p(X) N_p(Y)$.
Proof. Consider a subset $Z \subseteq X$ and a Lipschitz function $f : Z \to Y$; after a translation we may assume that $0 \in Z$ and $f(0) = 0$. By the previous theorem, for every finite set $S \subseteq X$ with $0 \in S$ there exists a function $F^S : S \to Y$ which agrees with $f$ on $S \cap Z$ and satisfies
$$\|F^S\|_{\mathrm{Lip}} \lesssim M_p(X) N_p(Y) \|f\|_{\mathrm{Lip}} \stackrel{\mathrm{def}}{=} K.$$
For every such $S$, define the vector $b^S = (b^S_x)_{x \in X} \in \prod_{x \in X} \big(K\|x\|\, B_Y\big) \stackrel{\mathrm{def}}{=} \mathcal{X}$ by
$$b^S_x = \begin{cases} F^S(x), & x \in S, \\ 0, & x \notin S; \end{cases}$$
indeed, $\|F^S(x)\| = \|F^S(x) - F^S(0)\| \lesssim K\|x\|$. Since $Y$ is reflexive, $\mathcal{X}$ is compact in the product of the weak topologies, and thus the net $(b^S)_{S \subseteq X \text{ finite}}$ has a limit point $b \in \mathcal{X}$. Define now $F : X \to Y$ by $F(x) = b_x$ for $x \in X$. Obviously $F$ extends $f$, and for $x, y \in X$:
$$\|F(x) - F(y)\| = \|b_x - b_y\| \leq \limsup_S \|F^S(x) - F^S(y)\| \lesssim K\|x - y\|,$$
where we used the weak lower semicontinuity of the norm along a subnet converging weakly to $b$. That is, $\|F\|_{\mathrm{Lip}} \lesssim K$. $\Box$
We will now proceed with the proof of the theorem. In the sections that follow, we will prove that many Banach spaces and metric spaces satisfy conditions (i) and (ii) above, and thus we will obtain concrete extension results.
The key lemma for the proof of Theorem 11.6 is the following:
Lemma 11.8 (Dual extension criterion). Let $(X, d_X)$ and $(Y, d_Y)$ be two metric spaces such that $Y$ is $W_p$-barycentric with constant $\Gamma$, let $Z \subseteq X$, $f : Z \to Y$ and $\varepsilon \in (0,1)$. Suppose that there exists a constant $K > 0$ such that for every $n \in \mathbb{N}$, every $x_1, \dots, x_n \in X$ and every symmetric matrix $H = (h_{ij}) \in M_n(\mathbb{R})$ with nonnegative entries there exists a function $\Phi_H : \{x_1, \dots, x_n\} \to Y$ such that the following hold:
(i) $\Phi_H|_{\{x_1,\dots,x_n\} \cap Z} = f|_{\{x_1,\dots,x_n\} \cap Z}$;
(ii)
$$\sum_{i,j=1}^n h_{ij}\, d_Y\big(\Phi_H(x_i), \Phi_H(x_j)\big)^p \leq K^p \|f\|_{\mathrm{Lip}}^p \sum_{i,j=1}^n h_{ij}\, d_X(x_i, x_j)^p. \tag{135}$$
Then, for every $n \in \mathbb{N}$ and every $x_1, \dots, x_n \in X$, there exists a function $F : \{x_1, \dots, x_n\} \to Y$ such that $F|_{\{x_1,\dots,x_n\} \cap Z} = f|_{\{x_1,\dots,x_n\} \cap Z}$ (property (a)) and, for all $i, j$,
$$d_Y\big(F(x_i), F(x_j)\big) \leq (1+\varepsilon)\, \Gamma K \|f\|_{\mathrm{Lip}}\, d_X(x_i, x_j) \qquad \text{(property (b))}.$$
Proof. Fix $x_1, \dots, x_n \in X$ and consider the sets of matrices
$$C \stackrel{\mathrm{def}}{=} \Big\{ \big(d_Y(\Phi(x_i), \Phi(x_j))^p\big)_{i,j=1}^n :\ \Phi : \{x_1,\dots,x_n\} \to Y \ \text{ with } \ \Phi|_{\{x_1,\dots,x_n\} \cap Z} = f|_{\{x_1,\dots,x_n\} \cap Z} \Big\}$$
and
$$D \stackrel{\mathrm{def}}{=} \big\{ M = (m_{ij}) \in M_n(\mathbb{R}) :\ M \text{ symmetric and } m_{ij} \geq 0 \big\},$$
and denote by $E$ the closed convex hull of $C + D$. Observe that $E$ is a closed convex set of symmetric matrices. Consider the matrix $T = (t_{ij}) \in M_n(\mathbb{R})$ with $t_{ij} = K^p \|f\|_{\mathrm{Lip}}^p\, d_X(x_i, x_j)^p$.
Claim. It suffices to prove that $T \in E$.
Indeed, if $T \in E$, there exist $\lambda_1, \dots, \lambda_m \in [0,1]$ adding up to $1$ and functions $\Phi_1, \dots, \Phi_m : \{x_1, \dots, x_n\} \to Y$, each agreeing with $f$ on $\{x_1, \dots, x_n\} \cap Z$, such that
$$(1+\varepsilon)^p\, t_{ij} = (1+\varepsilon)^p K^p \|f\|_{\mathrm{Lip}}^p\, d_X(x_i, x_j)^p \geq \sum_{k=1}^m \lambda_k\, d_Y\big(\Phi_k(x_i), \Phi_k(x_j)\big)^p \qquad \text{for every } i, j.$$
Now for $1 \leq i \leq n$ define the measure $\mu_i = \sum_{k=1}^m \lambda_k \delta_{\Phi_k(x_i)}$ and the function $F : \{x_1, \dots, x_n\} \to Y$ by $F(x_i) = B(\mu_i)$, where $B$ is the barycenter map. Notice that $F$ indeed extends $f$, and also, from the $W_p$-barycentricity of $Y$,
$$d_Y\big(F(x_i), F(x_j)\big)^p \leq \Gamma^p \sum_{k=1}^m \lambda_k\, d_Y\big(\Phi_k(x_i), \Phi_k(x_j)\big)^p \leq (1+\varepsilon)^p \Gamma^p K^p \|f\|_{\mathrm{Lip}}^p\, d_X(x_i, x_j)^p,$$
which is the inequality (b). $\Box$
Proof of $T \in E$. Suppose, for contradiction, that $T \notin E$. Then, by the separation theorem, there exists a symmetric matrix $H = (h_{ij}) \in M_n(\mathbb{R})$ so that
$$\inf_{M = (m_{ij}) \in E}\ \sum_{i,j=1}^n h_{ij} m_{ij} > \sum_{i,j=1}^n h_{ij} t_{ij}.$$
Since $D$ is a cone, adding to any element of $E$ a symmetric matrix with arbitrarily large nonnegative entries stays in $E$; the finiteness of the infimum therefore forces $h_{ij} \geq 0$ for all $i, j$. Since $C \subseteq E$, for every admissible $\Phi$ and $(c_{ij}) = \big(d_Y(\Phi(x_i), \Phi(x_j))^p\big) \in C$:
$$\sum_{i,j} h_{ij}\, d_Y\big(\Phi(x_i), \Phi(x_j)\big)^p > \sum_{i,j} h_{ij} t_{ij} = K^p \|f\|_{\mathrm{Lip}}^p \sum_{i,j} h_{ij}\, d_X(x_i, x_j)^p,$$
which contradicts assumption (135) applied to the matrix $H$. $\Box$
To finish the proof of Ball's theorem, we will also need the following technical lemma, in which $D_\pi \in M_n(\mathbb{R})$ denotes the diagonal matrix whose diagonal entries are $\pi_1, \dots, \pi_n$:
Lemma 11.9 (Approximate convexity lemma). Fix $m, n \in \mathbb{N}$ and $p \in [1,\infty)$, and let $B = (b_{ir}) \in M_{n \times m}(\mathbb{R})$ and $C \in M_n(\mathbb{R})$ be (row) stochastic matrices such that $C$ is reversible relative to some $\pi \in \Delta^{n-1}$. Then for every metric space $(X, d_X)$ and every $z_1, \dots, z_m \in X$ there exist $w_1, \dots, w_n \in X$ such that
$$\max\Big\{ \sum_{i=1}^n \sum_{r=1}^m \pi_i b_{ir}\, d_X(w_i, z_r)^p,\ \sum_{i,j=1}^n \pi_i c_{ij}\, d_X(w_i, w_j)^p \Big\} \leq 3^p \sum_{r,s=1}^m (B^* D_\pi C B)_{rs}\, d_X(z_r, z_s)^p. \tag{136}$$
Since the terms $v_{ii}$ play no role in the above inequality, we can assume that $v_{ii} = 0$ for every $1 \leq i \leq n$. Then, for large enough $t > 0$, there exist $\theta > 0$ and $\pi \in \Delta^{n-1}$ such that
$$(\bullet)\quad w_{ir} = \theta \pi_i b_{ir} \ \text{ for every } i, r \qquad\text{and}\qquad (\bullet\bullet)\quad v_{ij} = \theta t \pi_i a_{ij} \ \text{ for every } i \neq j,$$
where $B = (b_{ir}) \in M_{n \times m}(\mathbb{R})$ and $A = (a_{ij}) \in M_n(\mathbb{R})$ are stochastic matrices with $A$ reversible relative to $\pi$. Indeed, $(\bullet)$ and $(\bullet\bullet)$ hold for
$$\theta \stackrel{\mathrm{def}}{=} \sum_{i=1}^n \sum_{r=1}^m w_{ir},\qquad \pi_i \stackrel{\mathrm{def}}{=} \frac{\sum_{r=1}^m w_{ir}}{\theta},\qquad b_{ir} \stackrel{\mathrm{def}}{=} \frac{w_{ir}}{\sum_{s=1}^m w_{is}}$$
and
$$a_{ii} \stackrel{\mathrm{def}}{=} 1 - \frac{1}{t} \cdot \frac{\sum_{j=1}^n v_{ij}}{\sum_{r=1}^m w_{ir}},\qquad a_{ij} \stackrel{\mathrm{def}}{=} \frac{1}{t} \cdot \frac{v_{ij}}{\sum_{r=1}^m w_{ir}} \ \text{ if } i \neq j.$$
Thus, after this change of parameters,
$$\mathrm{LHS} = \theta\Big( 2\sum_{i=1}^n \sum_{r=1}^m \pi_i b_{ir}\, d_Y(y_i, f(z_r))^p + t \sum_{i,j=1}^n \pi_i a_{ij}\, d_Y(y_i, y_j)^p \Big).$$
Denote $\tau = \lceil t/2^p \rceil$ and $C_\tau(A) = \frac{1}{\tau} \sum_{s=1}^\tau A^s$. Since $C_\tau(A)$ is a stochastic matrix reversible relative to $\pi$, the approximate convexity lemma guarantees the existence of $w_1, \dots, w_n \in Y$ such that
$$\max\Big\{ \sum_{i=1}^n \sum_{r=1}^m \pi_i b_{ir}\, d_Y(w_i, f(z_r))^p,\ \sum_{i,j=1}^n \pi_i (C_\tau(A))_{ij}\, d_Y(w_i, w_j)^p \Big\} \leq 3^p \sum_{r,s=1}^m (B^* D_\pi C_\tau(A) B)_{rs}\, d_Y(f(z_r), f(z_s))^p. \tag{$\dagger$}$$
Now, from the definition of metric Markov cotype, there exist $y_1, \dots, y_n \in Y$ such that
$$\sum_{i=1}^n \pi_i\, d_Y(w_i, y_i)^p + \tau \sum_{i,j=1}^n \pi_i a_{ij}\, d_Y(y_i, y_j)^p \leq N^p \sum_{i,j=1}^n \pi_i (C_\tau(A))_{ij}\, d_Y(w_i, w_j)^p. \tag{$\ddagger$}$$
We will prove that $y_1, \dots, y_n$ work for large enough $t$. For the first term, since
$$d_Y(y_i, f(z_r))^p \leq 2^{p-1}\big( d_Y(y_i, w_i)^p + d_Y(w_i, f(z_r))^p \big),$$
we get (using $\sum_{r=1}^m b_{ir} = 1$)
$$2\sum_{i=1}^n \sum_{r=1}^m \pi_i b_{ir}\, d_Y(y_i, f(z_r))^p \leq 2^p \sum_{i=1}^n \pi_i\, d_Y(y_i, w_i)^p + 2^p \sum_{i=1}^n \sum_{r=1}^m \pi_i b_{ir}\, d_Y(w_i, f(z_r))^p.$$
Hence, using $t \leq 2^p \tau$,
$$\frac{\mathrm{LHS}}{\theta} \leq 2^p \sum_{i=1}^n \pi_i\, d_Y(y_i, w_i)^p + t \sum_{i,j=1}^n \pi_i a_{ij}\, d_Y(y_i, y_j)^p + 2^p \sum_{i=1}^n \sum_{r=1}^m \pi_i b_{ir}\, d_Y(w_i, f(z_r))^p$$
$$\stackrel{(\ddagger)}{\leq} (2N)^p \sum_{i,j=1}^n \pi_i (C_\tau(A))_{ij}\, d_Y(w_i, w_j)^p + 2^p \sum_{i=1}^n \sum_{r=1}^m \pi_i b_{ir}\, d_Y(w_i, f(z_r))^p$$
$$\stackrel{(\dagger)}{\leq} 6^p (N^p + 1) \sum_{r,s=1}^m (B^* D_\pi C_\tau(A) B)_{rs}\, d_Y(f(z_r), f(z_s))^p \leq 6^p (N^p + 1) \|f\|_{\mathrm{Lip}}^p \sum_{r,s=1}^m (B^* D_\pi C_\tau(A) B)_{rs}\, d_X(z_r, z_s)^p.$$
Observe now that the last quantity depends only on the metric space $(X, d_X)$ and the data points, no longer on $Y$. Using the identity
$$(B^* D_\pi C_\tau(A) B)_{rs} = \sum_{i,j=1}^n \pi_i b_{ir} b_{js}\, (C_\tau(A))_{ij},$$
together with $\sum_{r=1}^m b_{ir} = \sum_{s=1}^m b_{js} = 1$ and the Markov type $p$ inequality (131), we get
$$S_2 \stackrel{\mathrm{def}}{=} \sum_{i,j=1}^n \sum_{r,s=1}^m \pi_i b_{ir} b_{js}\, (C_\tau(A))_{ij}\, d_X(x_i, x_j)^p = \sum_{i,j=1}^n \pi_i (C_\tau(A))_{ij}\, d_X(x_i, x_j)^p = \frac{1}{\tau} \sum_{\sigma=1}^\tau \sum_{i,j=1}^n \pi_i (A^\sigma)_{ij}\, d_X(x_i, x_j)^p$$
$$\leq \frac{1}{\tau} \sum_{\sigma=1}^\tau \sigma M^p \sum_{i,j=1}^n \pi_i a_{ij}\, d_X(x_i, x_j)^p = \frac{\tau(\tau+1)}{2\tau}\, M^p \sum_{i,j=1}^n \pi_i a_{ij}\, d_X(x_i, x_j)^p = \frac{\tau+1}{2}\, M^p \sum_{i,j=1}^n \frac{v_{ij}}{\theta t}\, d_X(x_i, x_j)^p$$
$$\leq \frac{1}{\theta}\, M^p \sum_{i,j=1}^n v_{ij}\, d_X(x_i, x_j)^p.$$
Putting everything together, $(*)$ has been proven and the theorem follows. $\Box$
Proof of the approximate convexity lemma. Let $f : \{z_1, \dots, z_m\} \to \ell_\infty$ be any isometric embedding and define
$$y_i = \sum_{r=1}^m b_{ir} f(z_r) \in \ell_\infty,\qquad i = 1, 2, \dots, n.$$
For each $i$, choose $w_i \in \{z_1, \dots, z_m\}$ so that $\|y_i - f(w_i)\|_\infty$ is minimal. By the triangle inequality and the convexity of $u \mapsto u^p$,
$$d_X(w_i, w_j)^p = \|f(w_i) - f(w_j)\|_\infty^p \leq 3^{p-1}\big( \|f(w_i) - y_i\|_\infty^p + \|y_i - y_j\|_\infty^p + \|y_j - f(w_j)\|_\infty^p \big),$$
so, multiplying by $\pi_i c_{ij}$, summing and using the reversibility of $C$,
$$\sum_{i,j=1}^n \pi_i c_{ij}\, d_X(w_i, w_j)^p \leq 3^{p-1} \sum_{i,j=1}^n \pi_i c_{ij}\, \|y_i - y_j\|_\infty^p + 2 \cdot 3^{p-1} \sum_{i=1}^n \pi_i\, \|y_i - f(w_i)\|_\infty^p.$$
Now, we compute:
$$\sum_{i,j=1}^n \pi_i c_{ij}\, \|y_i - y_j\|_\infty^p = \sum_{i,j=1}^n \pi_i c_{ij}\, \Big\| \sum_{r,s=1}^m b_{ir} b_{js} \big( f(z_r) - f(z_s) \big) \Big\|_\infty^p \leq \sum_{i,j=1}^n \sum_{r,s=1}^m \pi_i c_{ij} b_{ir} b_{js}\, \|f(z_r) - f(z_s)\|_\infty^p = \sum_{r,s=1}^m (B^* D_\pi C B)_{rs}\, d_X(z_r, z_s)^p,$$
and since $\|y_i - f(w_i)\|_\infty^p \leq \|y_i - f(z_r)\|_\infty^p$ for every $i, r$, and $\sum_{r=1}^m (CB)_{ir} = 1$ for every $i$, we have
$$\sum_{i=1}^n \pi_i\, \|y_i - f(w_i)\|_\infty^p = \sum_{i=1}^n \sum_{r=1}^m \pi_i (CB)_{ir}\, \|y_i - f(w_i)\|_\infty^p \leq \sum_{i=1}^n \sum_{r=1}^m \pi_i (CB)_{ir}\, \|y_i - f(z_r)\|_\infty^p$$
$$= \sum_{i=1}^n \sum_{r=1}^m \pi_i (CB)_{ir}\, \Big\| \sum_{s=1}^m b_{is} \big( f(z_s) - f(z_r) \big) \Big\|_\infty^p \leq \sum_{i=1}^n \sum_{r,s=1}^m \pi_i (CB)_{ir} b_{is}\, \|f(z_s) - f(z_r)\|_\infty^p = \sum_{r,s=1}^m (B^* D_\pi C B)_{rs}\, d_X(z_r, z_s)^p.$$
Since $3^{p-1} + 2 \cdot 3^{p-1} = 3^p$, this bounds the second term in the maximum in (136); the first term is bounded analogously, and the lemma is proven. $\Box$
12. Uniform convexity and uniform smoothness
In this section we present the classes of uniformly convex and uniformly smooth Banach spaces, which
we will later relate to Markov type and cotype. Furthermore, we examine these properties in Lp spaces.
12.1. Definitions and basic properties. First, we will survey some general results about uniform
convexity and uniform smoothness (omitting a few proofs). The relation between these notions and the
calculation of Markov type and cotype will appear in the next section.
Definition 12.1. Let $(X, \|\cdot\|)$ be a Banach space with $\dim X \geq 2$. The modulus of uniform convexity of $X$ is the function
$$\delta_X(\varepsilon) = \inf\Big\{ 1 - \Big\|\frac{x+y}{2}\Big\| :\ \|x\| = \|y\| = 1 \ \text{ and } \ \|x - y\| = \varepsilon \Big\},\qquad \varepsilon \in (0, 2]. \tag{137}$$
We call $X$ uniformly convex if $\delta_X(\varepsilon) > 0$ for every $\varepsilon \in (0,2]$.
Examples 12.2. (i) Any Hilbert space $H$ is uniformly convex. Indeed, by the parallelogram identity, if $\|x\| = \|y\| = 1$ and $\|x - y\| = \varepsilon$, then
$$\|x + y\|^2 = 4 - \|x - y\|^2 = 4 - \varepsilon^2,$$
thus $\delta_H(\varepsilon) = 1 - \sqrt{1 - \frac{\varepsilon^2}{4}}$.
(ii) Neither $\ell_1$ nor $\ell_\infty$ is uniformly convex, since their unit spheres contain line segments.
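The moduli of the spaces $\ell_p$ interpolate between these extremes. A crude numerical sketch (brute-force sampling in the plane; the sample size and tolerance are ad hoc choices) estimates $\delta_{\ell_p}(\varepsilon)$ for $\ell_p^2$ and compares it with the Hilbertian modulus:

```python
import numpy as np

def lp_norm(v, p):
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

def delta_lp(eps, p, samples=200_000, tol=1e-3):
    """Crude estimate of (137) for l_p^2 by random sampling of unit vectors."""
    rng = np.random.default_rng(0)
    best = 1.0
    for _ in range(samples):
        x, y = rng.normal(size=2), rng.normal(size=2)
        x, y = x / lp_norm(x, p), y / lp_norm(y, p)    # ||x|| = ||y|| = 1
        if abs(lp_norm(x - y, p) - eps) < tol:         # ||x - y|| close to eps
            best = min(best, 1.0 - lp_norm((x + y) / 2.0, p))
    return best

eps = 1.0
for p in (1.5, 2.0, 4.0):
    print(p, delta_lp(eps, p), 1 - np.sqrt(1 - eps ** 2 / 4))
```

The estimates are consistent with Proposition 12.4 below: no modulus exceeds the Hilbertian one (up to constants).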
An elementary, yet tedious to prove, related result is the following:
Proposition 12.3 (Figiel). For any Banach space $(X, \|\cdot\|)$, the mapping $\varepsilon \mapsto \delta_X(\varepsilon)/\varepsilon$ is increasing.
Using Dvoretzky's theorem, one can also deduce that:
Proposition 12.4. For any Banach space $(X, \|\cdot\|)$ the following inequality holds:
$$\delta_X(\varepsilon) \lesssim 1 - \sqrt{1 - \frac{\varepsilon^2}{4}} = \delta_{\ell_2}(\varepsilon). \tag{138}$$
Definition 12.5. Let $(X, \|\cdot\|)$ be a Banach space. The modulus of uniform smoothness of $X$ is the function
$$\rho_X(\tau) = \sup\Big\{ \frac{\|x + \tau y\| + \|x - \tau y\|}{2} - 1 :\ \|x\| = \|y\| = 1 \Big\},\qquad \tau > 0. \tag{139}$$
We call $X$ uniformly smooth if $\rho_X(\tau) = o(\tau)$ as $\tau \to 0^+$.
Example 12.6. Again, $\ell_1$ is not uniformly smooth, since for $x = e_1$ and $y = e_2$
$$\frac{\|x + \tau y\|_1 + \|x - \tau y\|_1}{2} - 1 = \tau.$$
The notions of uniform convexity and uniform smoothness are, in some sense, in duality, as shown by the following theorem.
Theorem 12.7 (Lindenstrauss duality formulas). For a Banach space $X$, the following identities hold true:
$$\rho_{X^*}(\tau) = \sup\Big\{ \frac{\tau\varepsilon}{2} - \delta_X(\varepsilon) :\ \varepsilon > 0 \Big\} \tag{140}$$
and
$$\rho_X(\tau) = \sup\Big\{ \frac{\tau\varepsilon}{2} - \delta_{X^*}(\varepsilon) :\ \varepsilon > 0 \Big\}. \tag{141}$$
Proof. Left as an exercise.
We also mention a general result on uniformly convex and uniformly smooth Banach spaces, whose
proof we omit:
Theorem 12.8 (Milman-Pettis). If a Banach space X is either uniformly convex or uniformly smooth,
then X is reflexive.
Finally, uniform convexity and uniform smoothness admit some quantitative analogues:
Definition 12.9. Let $(X, \|\cdot\|)$ be a Banach space, $p \in (1,2]$ and $q \in [2,\infty)$. We say that $X$ is $p$-smooth if there exists a constant $C > 0$ with $\rho_X(\tau) \leq C\tau^p$. Similarly, we say that $X$ is $q$-convex if there exists a constant $C > 0$ with $\delta_X(\varepsilon) \geq C\varepsilon^q$.
Even though dealing with the class of $p$-smooth (resp. $q$-convex) Banach spaces seems restrictive at first, the following deep theorem of Pisier shows that it is actually not.
Theorem 12.10 (Pisier, 1975). If $X$ is a uniformly convex (resp. smooth) Banach space, then it admits an equivalent norm with respect to which it is $q$-convex (resp. $p$-smooth) for some $q < \infty$ (resp. $p > 1$).
Definition 12.11. Let X be a Banach space. The q-convexity constant of X, denoted by Kq (X), is the
infimum of those K > 1 such that
2
(142) 2kxkq + q kykq 6 kx + ykq + kx − ykq , for every x, y ∈ X.
K
Similarly, the p-smoothness constant of X, denoted by Sp (X), is the infimum of those S > 1 such that
(143) kx + ykp + kx − ykp 6 2kxkp + 2S p kykp , for every x, y ∈ X.
It can easily be checked that X is q-convex (resp. p-smooth) if and only if Kq (X) < ∞ (resp.
Sp (X) < ∞).
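For instance, in a Hilbert space the parallelogram identity gives equality in (143) with $p = 2$ and $S = 1$ (and likewise in (142) with $q = 2$, $K = 1$); a one-line numerical sanity check:

```python
import numpy as np

rng = np.random.default_rng(2)
x, y = rng.normal(size=5), rng.normal(size=5)
lhs = np.linalg.norm(x + y) ** 2 + np.linalg.norm(x - y) ** 2
rhs = 2 * np.linalg.norm(x) ** 2 + 2 * np.linalg.norm(y) ** 2
print(np.isclose(lhs, rhs))   # True: the parallelogram identity
```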
Remark. Setting $x = \frac{a+b}{2}$ and $y = \frac{a-b}{2}$, we see that the inequalities (142) and (143) are equivalent to
$$\Big\|\frac{a+b}{2}\Big\|^q + \frac{1}{K_q(X)^q}\Big\|\frac{a-b}{2}\Big\|^q \leq \frac{\|a\|^q + \|b\|^q}{2},\qquad \text{for every } a, b \in X \tag{144}$$
and
$$\frac{\|a\|^p + \|b\|^p}{2} \leq \Big\|\frac{a+b}{2}\Big\|^p + S_p(X)^p\Big\|\frac{a-b}{2}\Big\|^p \tag{145}$$
respectively.
An important fact (already hinted at by Lindenstrauss' formulas above) is that there is a perfect duality between uniform convexity and smoothness. This is formulated as follows:
Theorem 12.12. Let $X$ be a Banach space. Then $S_p(X) = K_q(X^*)$, where $p$ and $q$ are conjugate indices, i.e., $\frac{1}{p} + \frac{1}{q} = 1$. Since any Banach space that is either $q$-convex or $p$-smooth is reflexive, this also gives $K_q(X) = S_p(X^*)$.
Proof. Firstly, we prove that $S_p(X) \leq K_q(X^*)$. Let $K > K_q(X^*)$ and $x, y \in X$. We will prove that
$$\|x+y\|^p + \|x-y\|^p \leq 2\|x\|^p + 2K^p\|y\|^p.$$
Consider $f, g \in S_{X^*}$ such that $f(x+y) = \|x+y\|$ and $g(x-y) = \|x-y\|$, and define $F, G \in X^*$ by
$$F = \frac{2^{1/q}\|x+y\|^{p-1}}{\big(\|x+y\|^p + \|x-y\|^p\big)^{1/q}} \cdot f \qquad\text{and}\qquad G = \frac{2^{1/q}\|x-y\|^{p-1}}{\big(\|x+y\|^p + \|x-y\|^p\big)^{1/q}} \cdot g.$$
Now, observe that
$$\frac{F(x+y) + G(x-y)}{2} = \frac{2^{1/q}\big(\|x+y\|^p + \|x-y\|^p\big)}{2\big(\|x+y\|^p + \|x-y\|^p\big)^{1/q}} = \Big(\frac{\|x+y\|^p + \|x-y\|^p}{2}\Big)^{1/p}.$$
Thus:
$$\Big(\frac{\|x+y\|^p + \|x-y\|^p}{2}\Big)^{1/p} = \frac{F+G}{2}(x) + \frac{F-G}{2}(y) \leq \Big\|\frac{F+G}{2}\Big\|\,\|x\| + \Big\|\frac{F-G}{2}\Big\|\,\|y\|$$
$$\leq \Big( \Big\|\frac{F+G}{2}\Big\|^q + \frac{1}{K^q}\Big\|\frac{F-G}{2}\Big\|^q \Big)^{1/q} \big( \|x\|^p + K^p\|y\|^p \big)^{1/p} \stackrel{(144)}{\leq} \Big(\frac{\|F\|^q + \|G\|^q}{2}\Big)^{1/q} \big( \|x\|^p + K^p\|y\|^p \big)^{1/p} = \big( \|x\|^p + K^p\|y\|^p \big)^{1/p},$$
where the second inequality is Hölder's inequality and the last equality holds by the construction of $F$ and $G$, since $\frac{\|F\|^q + \|G\|^q}{2} = \frac{\|x+y\|^p + \|x-y\|^p}{\|x+y\|^p + \|x-y\|^p} = 1$.
Now, instead of proving that $K_q(X^*) \leq S_p(X)$, we will prove that $K_q(X) \leq S_p(X^*)$, which is equivalent by the reflexivity of $X$. Let $S > S_p(X^*)$ and $x, y \in X$. We will prove that
$$2\|x\|^q + \frac{2}{S^q}\|y\|^q \leq \|x+y\|^q + \|x-y\|^q.$$
Again, take $f, g \in S_{X^*}$ so that $f(x) = \|x\|$ and $g(y) = \|y\|$, and define $F, G \in X^*$ by
$$F = \frac{\|x\|^{q-1}}{2\big(\|x\|^q + \frac{1}{S^q}\|y\|^q\big)^{1/p}} \cdot f \qquad\text{and}\qquad G = \frac{\|y\|^{q-1}}{2S^q\big(\|x\|^q + \frac{1}{S^q}\|y\|^q\big)^{1/p}} \cdot g.$$
For these functionals,
$$\Big(\|x\|^q + \frac{1}{S^q}\|y\|^q\Big)^{1/q} = (F+G)(x+y) + (F-G)(x-y) \leq \|F+G\|\,\|x+y\| + \|F-G\|\,\|x-y\|$$
$$\leq \big(\|F+G\|^p + \|F-G\|^p\big)^{1/p} \big(\|x+y\|^q + \|x-y\|^q\big)^{1/q} \stackrel{(143)}{\leq} \big(2\|F\|^p + 2S^p\|G\|^p\big)^{1/p} \big(\|x+y\|^q + \|x-y\|^q\big)^{1/q}$$
$$= \Big(\frac{\|x+y\|^q + \|x-y\|^q}{2}\Big)^{1/q},$$
where the last equality holds by the construction of $F$ and $G$ (a direct computation gives $2\|F\|^p + 2S^p\|G\|^p = 2^{1-p}$). This is the desired inequality. $\Box$
12.2. Smoothness and convexity in $L_p$ spaces. Now we will compute the smoothness and convexity parameters of $L_p$ spaces. The duality given by the last theorem of the previous section allows us to compute only the convexity constants (or only the smoothness constants). Observe that $L_1$ and $L_\infty$ are neither uniformly smooth nor uniformly convex, since they are not reflexive. The easier part of the computation is the following:
Theorem 12.13 (Clarkson's inequality). For $1 < p \leq 2$ it holds that $S_p(L_p) = 1$. Equivalently, for $2 \leq q < \infty$ it holds that $K_q(L_q) = 1$.
Proof. We will prove the second version of the statement; let $q \geq 2$. Since the desired inequality (144) contains only expressions of the form $\|\cdot\|_q^q$, it suffices to prove it for real numbers (and then integrate), i.e., it suffices to prove
$$\Big|\frac{a+b}{2}\Big|^q + \Big|\frac{a-b}{2}\Big|^q \leq \frac{|a|^q + |b|^q}{2},\qquad \text{for every } a, b \in \mathbb{R}.$$
To prove this, we first use the inequality $\|\cdot\|_{\ell_q} \leq \|\cdot\|_{\ell_2}$ to get
$$\Big( \Big|\frac{a+b}{2}\Big|^q + \Big|\frac{a-b}{2}\Big|^q \Big)^{1/q} \leq \Big( \Big|\frac{a+b}{2}\Big|^2 + \Big|\frac{a-b}{2}\Big|^2 \Big)^{1/2} = \Big(\frac{a^2 + b^2}{2}\Big)^{1/2}.$$
Now the convexity of $t \mapsto |t|^{q/2}$ gives
$$\Big|\frac{a+b}{2}\Big|^q + \Big|\frac{a-b}{2}\Big|^q \leq \Big(\frac{a^2 + b^2}{2}\Big)^{q/2} \leq \frac{|a|^q + |b|^q}{2},$$
which implies that $K_q(L_q) = 1$. $\Box$
The above result says that for $1 < p \leq 2$, $L_p$ is $p$-smooth, and for $2 \leq q < \infty$, $L_q$ is $q$-convex. The converse situation is described by the following theorem:
Theorem 12.14. For $1 < p \leq 2$ it holds that $K_2(L_p) \leq \frac{1}{\sqrt{p-1}}$. Equivalently, for $2 \leq q < \infty$ it holds that $S_2(L_q) \leq \sqrt{q-1}$.
Proof. Using the formulation (144) of (142), we have to prove that for every $f, g \in L_p$, where $1 < p \leq 2$,
$$\|f\|_p^2 + (p-1)\|g\|_p^2 \leq \frac{\|f+g\|_p^2 + \|f-g\|_p^2}{2}.$$
In order to prove this, we will need two very useful inequalities:
Proposition 12.15 (Bonami-Beckner two-point inequality). Let $a, b \in \mathbb{R}$ and $1 \leq p \leq 2$. Then
$$\big( a^2 + (p-1) b^2 \big)^{1/2} \leq \Big( \frac{|a+b|^p + |a-b|^p}{2} \Big)^{1/p}. \tag{146}$$
Proof. It is enough to prove the inequality when $|a| \geq |b|$, because otherwise
$$a^2 + (p-1)b^2 \leq b^2 + (p-1)a^2,$$
and the right-hand side of (146) is symmetric in $a$ and $b$. Let $x = \frac{b}{a} \in [-1, 1]$. We have to prove that
$$\big( 1 + (p-1)x^2 \big)^{p/2} \leq \frac{(1+x)^p + (1-x)^p}{2}.$$
Using the Taylor expansion, we see that
$$\frac{(1+x)^p + (1-x)^p}{2} = \sum_{k=0}^\infty \binom{p}{2k}\, x^{2k},$$
where $\binom{p}{s} = \frac{p(p-1)\cdots(p-s+1)}{s!}$. Observe that, since only even terms appear, all the coefficients $\binom{p}{2k}$ are non-negative. Thus
$$\frac{(1+x)^p + (1-x)^p}{2} \geq 1 + \frac{p(p-1)}{2}\, x^2 \geq \big( 1 + (p-1)x^2 \big)^{p/2},$$
where we used the general inequality $1 + \alpha y \geq (1+y)^\alpha$, which holds for $0 \leq \alpha \leq 1$ and $y \geq 0$. $\Box$
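A quick numerical sanity check of (146) on random inputs (the sample size and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(100_000):
    p = rng.uniform(1.0, 2.0)
    a, b = rng.normal(size=2) * 10
    lhs = np.sqrt(a * a + (p - 1) * b * b)
    rhs = ((abs(a + b) ** p + abs(a - b) ** p) / 2) ** (1 / p)
    assert lhs <= rhs + 1e-9
print("the two-point inequality (146) holds on all sampled instances")
```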
Proposition 12.16 (Hanner's inequality). For every $f, g \in L_p$ and $1 \leq p \leq 2$ the following inequality holds true:
$$\big| \|f\|_p - \|g\|_p \big|^p + \big( \|f\|_p + \|g\|_p \big)^p \leq \|f+g\|_p^p + \|f-g\|_p^p. \tag{147}$$
For the proof of Hanner's inequality we will need the following lemma.
Lemma. For $r \in (0,1]$ define
$$\alpha(r) = (1+r)^{p-1} + (1-r)^{p-1} \qquad\text{and}\qquad \beta(r) = \frac{(1+r)^{p-1} - (1-r)^{p-1}}{r^{p-1}}.$$
Then, for every $A, B \in \mathbb{R}$ and $r \in (0,1]$,
$$\alpha(r)|A|^p + \beta(r)|B|^p \leq |A+B|^p + |A-B|^p. \tag{148}$$
Proof. We first claim that $\alpha(r) \geq \beta(r)$ for every $r \in (0,1]$. Consider $h = \alpha - \beta$ and observe that $h(1) = 0$, so it suffices to prove that $h' \leq 0$. Indeed, a direct computation gives
$$h'(r) = -(p-1)\Big( 1 + \frac{1}{r^p} \Big)\Big( \frac{1}{(1-r)^{2-p}} - \frac{1}{(1+r)^{2-p}} \Big) \leq 0.$$
Again, it suffices to prove the lemma when $0 < B \leq A$: by the claim, swapping $A$ and $B$ only decreases the left-hand side when $B > A$. Denote $R = B/A \in (0,1]$ and observe that we must prove
$$\alpha(r) + \beta(r) R^p \leq (1+R)^p + (1-R)^p,\qquad r \in (0,1].$$
So it suffices to prove that $F(r) = \alpha(r) + \beta(r)R^p$ attains its global maximum at $r = R$ (where it equals the right-hand side), which is easily checked by differentiation. $\Box$
Proof of Hanner's inequality. Without loss of generality, $0 \neq \|g\|_p \leq \|f\|_p$ (the degenerate cases being trivial). Applying the previous lemma pointwise with $A = |f(\omega)|$, $B = |g(\omega)|$ and $r = \|g\|_p/\|f\|_p$, and integrating, we get
$$\alpha(r)\|f\|_p^p + \beta(r)\|g\|_p^p \leq \|f+g\|_p^p + \|f-g\|_p^p.$$
Since $r = \|g\|_p/\|f\|_p$, the left-hand side equals $\big(\|f\|_p + \|g\|_p\big)^p + \big(\|f\|_p - \|g\|_p\big)^p$, which is (147). $\Box$
13. Calculating Markov type and cotype
In this section, we will calculate the Markov type and metric Markov cotype of a large variety of metric spaces, thus getting concrete applications of Ball's extension theorem. In particular, we will see that these notions serve as nonlinear analogues of uniform smoothness and uniform convexity.
13.1. Uniform smoothness and Markov type. Our first goal is to prove that any $p$-smooth Banach space also has Markov type $p$. To do this, we will use an inequality of Pisier for martingales with values in a uniformly smooth space. We will also need the following:
Lemma 13.1. Let $X$ be a $p$-smooth Banach space for some $1 < p \leq 2$. If $Z$ is an $X$-valued random variable, then
$$\mathbb{E}\|Z\|^p \leq \|\mathbb{E}Z\|^p + 2 S_p(X)^p\, \mathbb{E}\|Z - \mathbb{E}Z\|^p. \tag{149}$$
Proof. Write $S = S_p(X)$. Then, for every $x, y \in X$,
$$\|x+y\|^p + \|x-y\|^p \leq 2\|x\|^p + 2S^p\|y\|^p.$$
Thus, for $x = \mathbb{E}Z$ and $y = Z - \mathbb{E}Z$, we get the (pointwise) estimate
$$\|Z\|^p + \|2\mathbb{E}Z - Z\|^p \leq 2\|\mathbb{E}Z\|^p + 2S^p\|Z - \mathbb{E}Z\|^p.$$
Taking expectations on both sides we get
$$\mathbb{E}\|Z\|^p + \mathbb{E}\|2\mathbb{E}Z - Z\|^p \leq 2\|\mathbb{E}Z\|^p + 2S^p\, \mathbb{E}\|Z - \mathbb{E}Z\|^p.$$
But since Jensen's inequality gives
$$\mathbb{E}\|2\mathbb{E}Z - Z\|^p \geq \|\mathbb{E}(2\mathbb{E}Z - Z)\|^p = \|\mathbb{E}Z\|^p,$$
the proof is complete. $\Box$
We give the following definitions in the setting of barycentric metric spaces, since this is the setting we will need in the next subsection; however, we will not need this generality at the moment. Let $(X,d)$ be a metric space, $\mathcal{P}_X$ the space of all finitely supported probability measures on $X$, and $B : \mathcal{P}_X \to X$ a barycenter map, that is, a function satisfying $B(\delta_x) = x$ for every $x \in X$. Suppose also that $(\Omega, \mu)$ is a finite probability space and $\mathcal{F} \subseteq 2^\Omega$ is a $\sigma$-algebra. Observe that, since $\Omega$ is finite, $\mathcal{F}$ is the $\sigma$-algebra generated by a partition of $\Omega$. Thus, for a point $\omega \in \Omega$ we can define
$$\mathcal{F}(\omega) = \text{the atom of the partition of } \Omega \text{ to which } \omega \text{ belongs.} \tag{150}$$
Definition 13.2. In the above setting, let $Z : \Omega \to X$ be an $X$-valued random variable. The conditional barycenter of $Z$ with respect to $\mathcal{F}$ is the $X$-valued random variable
$$B(Z \mid \mathcal{F})(\omega) = B\Big( \frac{1}{\mu(\mathcal{F}(\omega))} \sum_{a \in \mathcal{F}(\omega)} \mu(a)\, \delta_{Z(a)} \Big). \tag{151}$$
that is:
$$\mathbb{E}[f(Z_s) \mid Z_0, \dots, Z_{s-1}] = (Lf)(Z_{s-1}) + f(Z_{s-1}). \tag{156}$$
Finally, reversibility implies that
$$\mathbb{E}[f(Z_s) \mid Z_{s+1}, \dots, Z_t] = (Lf)(Z_{s+1}) + f(Z_{s+1}). \tag{157}$$
We can now define
$$M_s^{(t)} = f(Z_s) - \sum_{r=0}^{s-1} (Lf)(Z_r) \qquad\text{and}\qquad N_s^{(t)} = f(Z_{t-s}) - \sum_{r=t-s+1}^{t} (Lf)(Z_r),$$
so that, by (156) and (157),
$$\mathbb{E}\big[M_{s+1}^{(t)} \mid Z_0, \dots, Z_s\big] = (Lf)(Z_s) + f(Z_s) - \sum_{r=0}^{s} (Lf)(Z_r) = M_s^{(t)}$$
and
$$\mathbb{E}\big[N_s^{(t)} \mid Z_{t-s+1}, \dots, Z_t\big] = (Lf)(Z_{t-s+1}) + f(Z_{t-s+1}) - \sum_{r=t-s+1}^{t} (Lf)(Z_r) = N_{s-1}^{(t)};$$
i.e., $\{M_s^{(t)}\}_{s=0}^t$ and $\{N_s^{(t)}\}_{s=0}^t$ are martingales. Now, (153) follows directly from the identities
$$M_{s+1}^{(t)} - M_s^{(t)} = f(Z_{s+1}) - f(Z_s) - (Lf)(Z_s)$$
and
$$N_{s+1}^{(t)} - N_s^{(t)} = f(Z_{t-s-1}) - f(Z_{t-s}) - (Lf)(Z_{t-s}).$$
Finally, to prove (154), first observe that, by the convexity of $\|\cdot\|^p$,
$$\mathbb{E}\|(Lf)(Z_s)\|^p = \sum_{i=1}^n \pi_i\, \|(Lf)(i)\|^p = \sum_{i=1}^n \pi_i\, \Big\| \sum_{j=1}^n a_{ij} \big( f(j) - f(i) \big) \Big\|^p \leq \sum_{i,j=1}^n \pi_i a_{ij}\, \|f(j) - f(i)\|^p = \mathbb{E}\|f(Z_1) - f(Z_0)\|^p.$$
This implies the estimate (154), since
$$\mathbb{E}\big\|M_{s+1}^{(t)} - M_s^{(t)}\big\|^p \leq 2^{p-1}\, \mathbb{E}\|f(Z_{s+1}) - f(Z_s)\|^p + 2^{p-1}\, \mathbb{E}\|(Lf)(Z_s)\|^p \leq 2^p\, \mathbb{E}\|f(Z_1) - f(Z_0)\|^p. \qquad \Box$$
We close this section with a digression of independent interest:
Theorem 13.7. For $1 < p \leq 2$ it holds that $M_p(L_p) = 1$.
Proof. Fix $1 \leq q < p < \infty$ and a measure space $(\Omega, \mu)$. For a function $f : \Omega \to \mathbb{R}$ define $T(f) : \Omega \times \mathbb{R} \to \mathbb{C}$ by
$$T(f)(\omega, t) = \frac{1 - e^{itf(\omega)}}{|t|^{(q+1)/p}},\qquad \omega \in \Omega,\ t \in \mathbb{R}, \tag{158}$$
and notice that for $f, g \in L_q(\mu)$ (writing $m$ for the Lebesgue measure on $\mathbb{R}$):
$$\|T(f) - T(g)\|_{L_p(\mu \times m)}^p = \int_\Omega \int_{\mathbb{R}} \frac{\big| e^{itg(\omega)} - e^{itf(\omega)} \big|^p}{|t|^{q+1}}\, dt\, d\mu(\omega) = \int_\Omega \int_{\mathbb{R}} \frac{\big| 1 - e^{it(f(\omega) - g(\omega))} \big|^p}{|t|^{q+1}}\, dt\, d\mu(\omega)$$
$$= \underbrace{\int_{\mathbb{R}} \frac{|1 - e^{is}|^p}{|s|^{q+1}}\, ds}_{C(p,q)\, \in\, (0,\infty)} \cdot \int_\Omega |f(\omega) - g(\omega)|^q\, d\mu(\omega),$$
where we made the change of variable $s = t\big(f(\omega) - g(\omega)\big)$ on the set where $f(\omega) \neq g(\omega)$. Thus, after rescaling, there exists a mapping $T \equiv T_{q,p} : L_q \to L_p$ such that for every $f, g \in L_q$:
$$\|T(f) - T(g)\|_p^p = \|f - g\|_q^q. \tag{159}$$
In particular, if $p \leq 2$ there exists a mapping $T : L_p \to L_2$ satisfying
$$\|T(f) - T(g)\|_2^2 = \|f - g\|_p^p.$$
Thus $M_p(L_p) = 1$ follows readily from $M_2(L_2) = 1$, which we have proven: for a stationary reversible chain and $f$ taking values in $L_p$, applying the Markov type $2$ inequality (132) to $T \circ f$ and using (159) yields precisely the Markov type $p$ inequality with constant $1$. $\Box$
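The scaling identity behind (158) and (159) can be checked numerically for scalars; in the sketch below the truncation of the integrals and the choice $p = 2$, $q = 1.5$ are arbitrary assumptions (any $q < p$ works).

```python
import numpy as np
from scipy.integrate import quad

p, q = 2.0, 1.5        # convergence needs q < p

# C(p, q) = int_R |1 - e^{is}|^p / |s|^{q+1} ds  (the integrand is even)
C, _ = quad(lambda s: abs(1 - np.exp(1j * s)) ** p / s ** (q + 1),
            1e-8, 200.0, limit=1000)
C *= 2

for a, b in [(0.3, 1.1), (-2.0, 0.5)]:
    I, _ = quad(lambda t: abs(np.exp(1j * t * a) - np.exp(1j * t * b)) ** p
                / t ** (q + 1), 1e-8, 200.0, limit=1000)
    I *= 2
    print(I / abs(a - b) ** q, C)   # both columns approximately equal C(p, q)
```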
Open problem 13.8. Is it true that $M_2(L_p) = \sqrt{p-1}$ for $p > 2$?
13.2. Barycentric metric spaces and metric Markov cotype. We will now identify classes of metric spaces with metric Markov cotype $p$. Perhaps surprisingly, we will get a very wide class of spaces, including (but not limited to) $p$-convex Banach spaces. We start with the analogue of Lemma 13.1 for $p$-convex spaces:
Lemma 13.9. Let $X$ be a $p$-convex Banach space for some $2 \leq p < \infty$. If $Z$ is an $X$-valued random variable, then
$$\|\mathbb{E}Z\|^p + \frac{1}{(2^{p-1} - 1) K_p(X)^p}\, \mathbb{E}\|Z - \mathbb{E}Z\|^p \leq \mathbb{E}\|Z\|^p. \tag{160}$$
Proof. Define $\theta \geq 0$ by
$$\theta \stackrel{\mathrm{def}}{=} \inf\Big\{ \frac{\mathbb{E}\|Z\|^p - \|\mathbb{E}Z\|^p}{\mathbb{E}\|Z - \mathbb{E}Z\|^p} :\ Z \text{ satisfies } \mathbb{E}\|Z - \mathbb{E}Z\|^p > 0 \Big\}.$$
We want to prove that $\theta \geq \frac{1}{(2^{p-1}-1) K_p(X)^p}$. Given $\varepsilon > 0$, consider a random variable $Z_0$ with $\mathbb{E}\|Z_0 - \mathbb{E}Z_0\|^p > 0$ such that
$$\mathbb{E}\|Z_0\|^p - \|\mathbb{E}Z_0\|^p \leq (\theta + \varepsilon)\, \mathbb{E}\|Z_0 - \mathbb{E}Z_0\|^p.$$
We will now focus on the class of $p$-barycentric metric spaces, which, given the previous lemma, will serve as a nonlinear analogue of $p$-convex Banach spaces. A small reminder first: for a metric space $(X,d)$ we denote by $\mathcal{P}_X$ the space of all finitely supported probability measures on $X$. A barycenter map on $X$ is a function $B : \mathcal{P}_X \to X$ satisfying $B(\delta_x) = x$. $(X,d)$ is called $W_p$-barycentric with constant $\Gamma$, for some $p \geq 1$, if there exists a barycenter map $B$ such that for every $\mu = \sum_i \lambda_i \delta_{x_i}$ and $\mu' = \sum_i \lambda_i \delta_{x_i'}$ it holds that
$$d\big( B(\mu), B(\mu') \big)^p \leq \Gamma^p \sum_i \lambda_i\, d(x_i, x_i')^p. \tag{161}$$
Definition 13.10. Fix some $p \geq 1$. A metric space $(X,d)$ will be called $p$-barycentric with constant $K > 0$ if there exists a barycenter map $B : \mathcal{P}_X \to X$ such that for every $x \in X$ and $\mu \in \mathcal{P}_X$:
$$d\big( x, B(\mu) \big)^p + \frac{1}{K^p} \int_X d\big( y, B(\mu) \big)^p\, d\mu(y) \leq \int_X d(x, y)^p\, d\mu(y). \tag{162}$$
In the next few pages, we will prove that a wide class of metric spaces satisfies the above property for $p = 2$. However, let us first state our goal, which will be proven later on:
Theorem 13.11 (Mendel-Naor, 2013). Let $(X,d)$ be a metric space that is $p$-barycentric with constant $K$ and also $W_p$-barycentric with constant $\Gamma$ under the same barycenter map. Then $(X,d)$ has metric Markov cotype $p$ and $N_p(X) \lesssim K\Gamma$.
Examples 13.12. (i) From Lemma 13.9, it follows readily that a $p$-convex Banach space is also $p$-barycentric with constant $K \leq (2^{p-1} - 1) K_p(X)$.
(ii) A simply connected, complete Riemannian manifold with non-positive sectional curvature is $2$-barycentric with constant $1$ and also $W_1$-barycentric (thus also $W_2$-barycentric) with constant $1$. We will later prove that a much wider class of metric spaces satisfies these properties.
Moving towards the second example, we will be working with a special class of metric spaces. A metric space $(X,d)$ is called geodesic if for every $y, z \in X$ there exists a path $\gamma : [0,1] \to X$ from $y$ to $z$ such that for every $t \in [0,1]$
$$d(y, \gamma(t)) = t\, d(y,z) \qquad\text{and}\qquad d(z, \gamma(t)) = (1-t)\, d(y,z). \tag{163}$$
Such paths are called geodesics.
Definition 13.13. A geodesic metric space $(X,d)$ is a CAT(0) space if for every $y, z \in X$ there exists a geodesic $\gamma : [0,1] \to X$ from $y$ to $z$ such that for every $x \in X$ and $t \in [0,1]$:
$$d(x, \gamma(t))^2 \leq (1-t)\, d(x,y)^2 + t\, d(x,z)^2 - t(1-t)\, d(y,z)^2. \tag{164}$$
A complete CAT(0) space is called a Hadamard space.
Examples 13.14. (i) The hyperbolic space is a Hadamard space. More generally, any complete simply connected Riemannian manifold with nonpositive sectional curvature is a Hadamard space.
(ii) The metric spaces induced by the shortest-path distance on trees are CAT(0) spaces.
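In Euclidean space the geodesic from $y$ to $z$ is $\gamma(t) = (1-t)y + tz$ and (164) holds with equality; a quick numerical check of this identity (the points are random, the grid of $t$ values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
x, y, z = rng.normal(size=(3, 2))
for t in np.linspace(0, 1, 11):
    g = (1 - t) * y + t * z                    # the Euclidean geodesic
    lhs = np.sum((x - g) ** 2)
    rhs = ((1 - t) * np.sum((x - y) ** 2) + t * np.sum((x - z) ** 2)
           - t * (1 - t) * np.sum((y - z) ** 2))
    assert np.isclose(lhs, rhs)
print("(164) holds with equality in the Euclidean plane")
```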
Now, let $(X,d)$ be a Hadamard space. For a probability measure $\mu \in \mathcal{P}_X$, define $B(\mu) \in X$ to be the point that minimizes the function
$$X \ni z \longmapsto F_\mu(z) \stackrel{\mathrm{def}}{=} \int_X d(z,y)^2\, d\mu(y). \tag{165}$$
(The minimizer exists and is unique, since $F_\mu$ is uniformly convex along geodesics; thus $B$ is a well-defined barycenter map.) Denote $m = F_\mu(B(\mu)) = \min F_\mu$. Given $x \in X$, integrating the CAT(0) inequality (164) along the geodesic $\gamma$ from $x$ to $B(\mu)$ against $d\mu(y)$ gives
$$m \leq F_\mu(\gamma(t)) \leq t\, m + (1-t)\, F_\mu(x) - t(1-t)\, d(x, B(\mu))^2,$$
or equivalently
$$m \leq F_\mu(x) - t\, d(x, B(\mu))^2.$$
Letting $t \to 1$, this is (162) with $p = 2$ and $K = 1$.
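Since trees are CAT(0) (Example 13.14(ii)), the barycenter of a finitely supported measure on a metric tree can be found by minimizing (165). A brute-force sketch on a three-edge star (the tree, the weights and the discretization are ad hoc choices):

```python
import numpy as np

# Points of the star are (branch, distance from the center); the
# shortest-path metric goes through the center when branches differ.
def d(u, v):
    (bu, su), (bv, sv) = u, v
    return abs(su - sv) if bu == bv else su + sv

mu = [((0, 1.0), 0.6), ((1, 1.0), 0.2), ((2, 1.0), 0.2)]   # (point, weight)
F = lambda z: sum(w * d(z, y) ** 2 for y, w in mu)          # F_mu as in (165)
grid = [(b, s) for b in range(3) for s in np.linspace(0, 1, 101)]
print(min(grid, key=F))     # approximately (0, 0.2): pulled onto branch 0
```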
Proposition 13.17. Any Hadamard space $(X, d_X)$ is $W_1$-barycentric with constant $1$.
For the proof of this last proposition, we will need a classical inequality:
Lemma 13.18 (Reshetnyak's inequality). Let $(X,d)$ be a Hadamard space. If $y, y', z, z' \in X$, then
$$d(y, z')^2 + d(y', z)^2 \leq d(y,z)^2 + d(y', z')^2 + 2\, d(y, y')\, d(z, z'). \tag{166}$$
Proof. Let $\sigma : [0,1] \to X$ be the geodesic from $y$ to $z'$. Then for $t \in (0,1)$
$$d(y', \sigma(t))^2 \leq (1-t)\, d(y', y)^2 + t\, d(y', z')^2 - t(1-t)\, d(y, z')^2$$
and
$$d(z, \sigma(t))^2 \leq (1-t)\, d(z, y)^2 + t\, d(z, z')^2 - t(1-t)\, d(y, z')^2.$$
The triangle inequality together with Cauchy-Schwarz implies that
$$d(y', z)^2 \leq \big( d(y', \sigma(t)) + d(z, \sigma(t)) \big)^2 \leq \frac{d(y', \sigma(t))^2}{t} + \frac{d(z, \sigma(t))^2}{1-t}$$
$$\leq \frac{1-t}{t}\, d(y', y)^2 + d(y', z')^2 + d(y, z)^2 + \frac{t}{1-t}\, d(z, z')^2 - d(y, z')^2.$$
For $t = \frac{d(y,y')}{d(y,y') + d(z,z')}$ we get Reshetnyak's inequality. $\Box$
Proof of Proposition 13.17. Let $\mu = \sum_{i=1}^n \lambda_i \delta_{y_i},\ \mu' = \sum_{i=1}^n \lambda_i \delta_{y_i'} \in \mathcal{P}_X$. Applying Reshetnyak's inequality
Proof. Fix any $\omega \in \Omega$ and $t = 0, 1, \dots, m-1$. Applying the $p$-barycentric inequality (162) to the probability measure
$$\nu = \frac{1}{\mu(\mathcal{F}_t(\omega))} \sum_{a \in \mathcal{F}_t(\omega)} \mu(a)\, \delta_{Z_{t+1}(a)}$$
implies, since $B(\nu) = Z_t(\omega)$ by the martingale property, that
$$d(z, Z_t(\omega))^p + \frac{1}{K^p} \sum_{a \in \mathcal{F}_t(\omega)} \frac{\mu(a)}{\mu(\mathcal{F}_t(\omega))}\, d(Z_{t+1}(a), Z_t(\omega))^p \leq \sum_{a \in \mathcal{F}_t(\omega)} \frac{\mu(a)}{\mu(\mathcal{F}_t(\omega))}\, d(Z_{t+1}(a), z)^p.$$
Thus, if the atoms of $\mathcal{F}_t$ are $\{A_1, \dots, A_k\}$ and $\omega \in A_i$, for every $i$ we have:
$$\frac{1}{K^p} \sum_{a \in A_i} \mu(a)\, d(Z_{t+1}(a), Z_t(\omega))^p \leq \sum_{a \in A_i} \mu(a)\, d(z, Z_{t+1}(a))^p - \mu(A_i)\, d(z, Z_t(\omega))^p.$$
References
[MN13] M. Mendel and A. Naor. Spectral calculus and Lipschitz extension for barycentric metric spaces. Anal. Geom. Metr. Spaces, 1:163–199, 2013.