Embeddings Extensions
Embeddings Extensions
Embeddings Extensions
Lecture Notes
Department of Mathematics
Princeton University
Spring 2015
Contents
1. Introduction 2
2. Preparatory material 5
3. The Hölder extension problem 8
4. Kirszbraun’s extension theorem 10
5. Bourgain’s embedding theorem 13
6. The nonlinear Dvoretzky theorem 19
7. Assouad’s embedding theorem 23
8. The Johnson-Lindenstrauss extension theorem 26
9. Embedding unions of metric spaces into Euclidean space 35
10. Extensions of Banach space-valued Lipschitz functions 39
11. Ball’s extension theorem 44
12. Uniform convexity and uniform smoothness 53
13. Calculating Markov type and cotype 58
1
Notes taken by Alexandros Eskenazis
1. Introduction
In this first section, we present the setting for the two basic problems that we will face throughout
the course.
1.1. Embeddings and extensions. The first problem is the bi-Lipschitz embedding problem. This
consinsts of deciding whether a given metric space (X, dX ) admits a ”reasonable” embedding into some
other metric space (Y, dY ), in the sense that there is a mapping f : X → Y , such that if we compute
the distance dY (f (x), f (y)), for two points x, y ∈ X, then we can almost compute the distance dX (x, y).
This can be written formally in the following way:
Definition 1.1. Let (X, dX ) and (Y, dY ) be metric spaces. A mapping f : X → Y has distortion at
most D > 1 if there exists a (scaling constant) s > 0, such that
(1) sdX (x, y) 6 dY (f (x), f (y)) 6 sDdX (x, y),
for every x, y ∈ X. The optimal such D is denoted by dist(f ). We denote by c(Y,dY ) (X, dX ) (or simply
by cY (X)) the infimum of all constants D > 1 such that there is a mapping f : X → Y with distortion
at most D. When Y = Lp = Lp (0, 1), we simply write cp (X, dX ) or cp (X), instead of c(Lp ,k·kp ) (X, dX ).
The number c2 (X) is usually called the Euclidean distortion of X.
Question: For two given metric spaces (X, dX ) and (Y, dY ), can we estimate the quantity cY (X)?
If we can effectively bound cY (X) from above, then we definitely know that there exists a low-distortion
embedding of X into Y . If, on the other hand, we can find a large lower bound for cY (X), then we can
deduce that there is an invariant on Y which does not allow X to be nicely embedded inside Y .
The second problem which is of our interest is the Lipschitz extension problem. This can be formulated
as follows: Suppose we are given a pair of metric spaces (X, dX ) and (Z, dZ ), a subset A ⊆ X and a
Lipschitz map f : A → Z.
Question 1: Is there always a Lipschitz mapping f˜ : X → Z which extends f , that is f˜|A = f ?
If the answer is positive for every such function, then it is reasonable to ask the following uniformity
question:
Question 2: Is there a constant K > 1, depending on X, A and Z, such that every function f : A → Z
admits a Lipschitz extension f˜ : X → Z with kf˜kLip 6 Kkf kLip ?
If the answer to this question is also positive, then it is of quantitative interest to compute - or estimate
- the least such K, which we will denote by e(X, A, Z). Afterwards it is reasonable to ask the even
stronger question:
Question 3: Is the quantity
(2) e(X, Z) = sup {e(X, A, Z) : A ⊆ X}
finite?
In case the answer is negative, we would be interested in quantitative formulations of this divergence,
such as:
Question 4: If e(X, Z) = ∞, estimate the quantities
(3) en (X, Z) = sup {e(X, A, Z) : A ⊆ X with |A| 6 n}
and, for ε ∈ (0, 1),
(4) eε (X, Z) = sup e(X, A, Z) : A bounded and ∀x 6= y ∈ Ad(x, y) > εdiam(A) .
We mention now the two best known bounds regarding these quantities when the target space Z is a
Banach space.
Theorem 1.2. If Z is a Banach space, then, for every metric space (X, dX ) it holds
log n
(5) en (X, Z) . .
log log n
Conversely, we do not know whether this is optimal: the best known lower bound is
p
(6) en (X, Z) & log n,
for a particular Banach space Z and a metric space (X, dX ).
2
Theorem 1.3. If Z is a Banach space, then, for every metric space (X, dX ) and every ε ∈ (0, 1), it
holds
1
(7) eε (X, Z) 6 .
ε
Unlike the previous result, we know that this is asymptotically sharp: there is a Banach space Z and a
metric space (X, dX ) such that eε (X, Z) 1ε .
Proof of the upper bound (7). Let A ⊆ X bounded such that, if D = diam(A), then for every x 6= y ∈ A
it holds dX (x, y) > εD. This means that the closed balls
εD
Bx = B x, = {y ∈ X : dX (x, y) 6 εD/4},
4
where x ∈ A, have pairwise disjoint interiors. Define the extension:
( εD −d (x,z)
X
˜
4
εD f (x), if z ∈ Bx
f (z) = 4
S
0, if z ∈
/ x∈A Bx .
kf kLip
It can easily be checked that f˜ is a well-defined ε -Lipschitz mapping. 2
1.2. Extension and approximation. We digress from our main topic in order to make a slight remark:
General principle: An extension theorem implies an approximation theorem.
We give a simple example of this principle. Let’s start with a definition:
Definition 1.4. A metric space (X, dX ) is called metrically convex if for any pair of points x, y ∈ X
and every t ∈ (0, 1), there is a point z ∈ X such that d(x, z) = td(x, y) and d(y, z) = (1 − t)d(x, y).
Many interesting metric spaces, such as normed spaces and complete Riemannian manifolds, are metri-
cally convex.
Theorem 1.5. Let (X, dX ) a metrically convex metric space and (Z, dZ ) a metric space such that
e(X, Z) = K < ∞. Then, any uniformly continuous function f : X → Z, can be uniformly approximated
by Lipschitz functions.
Proof. Let f : X → Z be a uniformly continuous function and an ε > 0. For δ > 0, denote by
(8) ω(δ) = sup {dZ (f (x), f (y)) : dX (x, y) 6 δ}
the modulus of continuity of f . Since f is uniformly continuous, we have limδ→0+ ω(δ) = 0. The fact
that X is metrically convex also implies that ω is subadditive, that is, for s, t > 0
ω(s + t) 6 ω(s) + ω(t).
Indeed, if x, y ∈ X such that dX (x, y) 6 s + t, then there is a point z ∈ X such that dX (x, z) 6 s and
dX (z, y) 6 t. Thus:
dZ (f (x), f (y)) 6 dZ (f (x), f (z)) + dZ (f (z), f (y)) 6 ω(s) + ω(t),
and the claim follows.
Let Nε a maximal ε−separated set inside X - that is a set such that for every x, y ∈ Nε to hold
dX (x, y) > ε which is maximal under inclusion. This is necessairy an ε−net of X, i.e. for every x ∈ X,
there is an x0 ∈ Nε such that dX (x, x0 ) 6 ε.
Consider the function f |Nε . This is a Lipschitz function: for x 6= hy ∈ Nε , iwe have dX (x, y) > ε and
dZ (f (x), f (y)) 6 ω (dX (x, y)) . Dividing the interval [0, dX (x, y)] into dX (x,y)
ε + 1 subintervals of length
at most ε we get, using the subadditivity of ω, that
dX (x, y) dX (x, y)
ω (dX (x, y)) 6 + 1 ω(ε) 6 2 ω(ε).
ε ε
Thus, combining the above, we get that
ω(ε)
dZ (f (x), f (y)) 6 2 dX (x, y),
ε
for every x, y ∈ Nε or equivalently kf |Nε kLip 6 2 ω(ε)
ε .
3
Using the Lipschitz extension assumption, we can now extend f |Nε to a function f˜ : X → Z with
kf˜kLip 6 2K ω(ε) ˜
ε . We now have to prove that f and f are uniformly close. Observe that, for x ∈ X, one
can find y ∈ Nε with dX (x, y) 6 ε and thus
dZ f (x), f˜(x) 6 dZ (f (x), f (y)) + dZ f˜(y), f˜(x) 6 ω(ε) + 2Kω(ε),
which indeed tends to 0 as ε → 0+ .
Related textbooks:
4
2. Preparatory material
In this section we present some first results and techniques that we will use extensively in the rest of
the course.
2.1. Basic embedding and extension theorems. Let’s start with some elementary positive results
about embeddings and extensions.
Theorem 2.1. Every metric space (X, dX ) is isometric to a subspace of `∞ (X), where for an arbitrary
set Γ, we define the space
(9) `∞ (Γ) = (xγ )γ∈Γ ∈ RΓ : sup |xγ | < ∞
γ∈Γ
2.2. The discrete cube Fn2 . Now we develop the basic notation of Fourier Analysis on the discrete
cube and prove our first (asymptotically) negative embeddability result. Let F2 be the field with two
elements, which we will denote by 0, 1, and Fn2 the hypercube {0, 1}n ⊆ Rn . We endow Fn2 with the
Hamming metric
n
X
(12) kx − yk1 = |xi − yi | = |{i : xi 6= yi }| ;
i=1
that is the inner product of the space L2 (µ), where µ is the counting measure in Fn2 . Since the space of
all functions f : Fn2 → R has dimension 2n , we deduce that {wA }A is an orthonormal basis of L2 (µ) and
thus every such function has an expansion of the form
X
(15) f (x) = fb(A)wA (x), x ∈ Fn2 ,
A⊆{1,2,...,n}
where
Z
(16) fb(A) = f (x)wA (x)dµ(x) ∈ R.
Fn
2
Remark. The same is true for functions f : Fn2 → X, where X is any Banach space – in this case
fb(A) ∈ X. The previous dimension-counting argument does not work here, but we can easily deduce
this from the above by composing f with linear functionals x∗ ∈ X ∗ .
The non-embeddability result that we mentioned is the following:
√
Theorem 2.5. If Fn2 is the Hamming cube, then c2 (Fn2 ) = n.
In order to prove this, we will need a quantitative lemma, which will work as an invariant of embeddings
of the Hamming cube into `2 .
Lemma 2.6 (Enflo, 1969). Every function f : Fn2 → `2 satisfies the inequality
Z n Z
X
(17) kf (x + e) − f (x)k22 dµ(x) 6 kf (x + ej ) − f (x)k22 dµ(x),
Fn
2 j=1 Fn
2
where {ej }nj=1 is the standard basis of Fn2 and e = (1, 1, ..., 1).
Proof of Theorem 2.5. Consider the identity mapping i : Fn2 → (Rn , k · k2 ). Then, for every x, y ∈ Fn2 =
{0, 1}n ,
√
ki(x) − i(y)k2 6 kx − yk1 6 nki(x) − i(y)k2 .
√
This means that c2 (Fn2 ) 6 n.
6
n
To prove
√ the reverse inequality we must consider an embedding f : F 2 → `2 and prove that D =
n
dist(f ) > n. There is a scaling factor s > 0 such that for every x, y ∈ F2
skx − yk1 6 kf (x) − f (y)k2 6 sDkx − yk1 .
We will now use Enflo’s Lemma: we know that (17) holds for the embedding f . But:
kf (x + ej ) − f (x)k22 6 s2 D2 kej k21 = s2 D2
and
kf (x + e) − f (x)k22 > s2 kek21 = s2 n2 .
Combining those with (17), we get that
s2 n2 6 ns2 D2 ,
√
which gives the desired bound D > n. 2
Proof of Enflo’s Lemma. Since the inequality involves only expressions of the form kxk22 ,
where x ∈ `2 ,
it is enough to prove it coordinatewise, that is for functions f : Fn2 → R. In this case, we can expand f
as X
f (x) = xA wA (x),
A⊆{1,2,...,n}
for some xA ∈ R. To do this, we write the Walsh expansion of both sides and then use Parseval’s identity.
Firstly,
X
f (x) − f (x + e) = xA (wA (x) − wA (x + e))
A⊆{1,2,...,n}
X
= xA (1 − (−1)|A| )wA (x)
A⊆{1,2,...,n}
X
=2 xA wA (x),
A: |A| odd
thus X
LHS = 4 x2A .
A: |A| odd
On the other hand,
X
f (x) − f (x + ej ) = xA (wA (x) − wA (x + ej ))
A⊆{1,2,...,n}
X
=2 xA wA (x),
A: j∈A
and thus
n
X X X
RHS = 4 x2A = 4 |A|x2A .
j=1 A: j∈A A⊆{1,2,...,n}
The desired inequality is now obvious. 2
It is worth noting that this is essentially the only non-trivial result, for which we are able to actually
compute the embedding constant cY (X).
The same techniques that we used in the proof of Enflo’s lemma, can also give the following variants
of it which are left as exercises.
Lemma 2.7. Every function f : Fn2 → `2 satisfies the inequality
Z Z Xn Z
2
(18) kf (x) − f (y)k2 dµ(x)dµ(y) . kf (x + ej ) − f (x)k22 dµ(x).
Fn
2 Fn
2 j=1 Fn
2
Lemma 2.8. Every function f : Fn2 → `2 such that fb(A) = 0 for every A with 0 < |A| < k satisfies the
inequality
Z Z n Z
1X
(19) kf (x) − f (y)k22 dµ(x)dµ(y) . kf (x + ej ) − f (x)k22 dµ(x).
Fn
2 Fn
2
k j=1 F n
2
7
3. The Hölder extension problem
Like the extension problem for Lipschitz functions, it is easy to formulate the extension problem for
Hölder functions:
Question: Let (X, dX ), (Y, dY ) be metric spaces, a subset A ⊆ X and an α−Hölder function f : A → Y
for some α ∈ (0, 1]; that is a function for which
Does there exist an α−Hölder extension f˜ : X → Y of f and, if yes, in which senses is this extension
uniform?
One can easily notice that this is a special case of the general Lipschitz extension question that we posed
in Section 1 since, for 0 < α 6 1, the function dα X is also a metric. In this setting, when Y is a Hilbert
1 1
space, there exists a dichotomy principle depending on whether α ∈ 0, 2 or α ∈ 2 , 1 . We will start
with the case α ∈ 12 , 1 , in which case there is no uniform extension constant. This was proven with the
following construction:
Counterexample 3.1 (Johnson, Lindenstrauss and Schechtman, 1986). Consider the Hamming cube
X = Fn2 and a disjoint copy of it, say X̃ = {x̃ : x ∈ X}. We will define a weighted graph structure on the
set X ∪ X̃ and our metric space will be this graph with the induced shortest path metric. Fix α ∈ 21 , 1
and a constant r > 0. The edges of the graph will be of the following three kinds:
• Pairs of the form {x, y}, where x, y ∈ X. The corresponding weight will be
1/2α
w{x,y} = kx − yk1 .
• Pairs of the form {x̃, ỹ}, where x, y ∈ X. The corresponding weight will be
kx − yk1
w{x̃,ỹ} = .
r2α−1
• Pairs of the form {x, x̃}, where x ∈ X. The corresponding weight will be
w{x,x̃} = r.
Claim: If d is the shortest path metric on X ∪ X̃ with respect to this weighted graph, then for every
1/2α
x, y ∈ X, d(x, y) = kx − yk1 .
thus it is α−Hölder with constant 1 and let f˜ : X ∪ X̃ → `n2 be an α−Hölder extension with constant
L > 1. We will prove that L has to be large.
Using Enflo’s inequality again, we get
Z n Z
X
(∗∗) kf˜(x̃ − f˜(x̃ + ẽ)k22 dµ(x) . kf˜(x̃) − f˜(x̃ + e˜j )k22 dµ(x).
Fn
2 j=1 Fn
2
8
By the Hölder condition, we have
n Z
X
RHS 6 L2 d(x̃, x̃ + e˜j )2α dµ(x)
j=1 Fn
2
n
X L2
6
j=1
r2α(2α−1)
L2 n
= .
r2α(2α−1)
On the other hand,
kf˜(x̃) − f˜(x̃ + ẽ)k2 > kf (x) − f (x + e)k2 − kf (x) − f˜(x̃)k2 − kf (x + e) − f˜(x̃ + ẽ)k2
& kx − (x + e)k2 − LdX (x, x̃)α − LdX (x + e, x̃ + ẽ)α
√
= n − 2Lrα .
Adding all these and using (∗∗), we get
L2 n √ 2
> n − 2Lrα ,
r2α(2α−1)
which can be rewritten as √
√
n
L 2rα + > n.
rα(2α−1)
2
Optimizing with respect to r, we get that for r = c(α)n1/4α it holds
1 1
L &α n 2 − 4α .
2
Remark: Notice that for the case α = 1 of this counterexample, we get
1/4
L &α n1/4 log |X ∪ X̃| ,
which is far from being the best known bound (6).
The related quantitative conjecture is this:
1
Conjecture 3.2. For every metric space (X, dX ) and every 2 < α 6 1, it holds
α− 21
(21) en ((X, dα
X ), `2 ) . (log n) .
The case α = 1 of this conjecture is known as the Johnson – Lindenstrauss extension theorem. Another
open problem related to the particular example we just constructed is the following.
Open problem 3.3. Does the metric space X ∪ X̃ with α = 1 embed into `1 , in the sense that c1 (X ∪ X̃)
is bounded as n → ∞?
This question is particularly interesting, since one can prove that both c1 (X) and c1 (X̃), with the
restricted metric, remain bounded as n → ∞ but nevertheless it is not known whether there exists a
low-distortion embedding of the whole space.
The dichotomy we promised will be completed by Minty’s theorem. We just saw that for α > 21 , the
Hölder extension problem cannot be uniformly solved in general. However, Minty proved that this is
not the case for α 6 12 : in this case the Hölder extension problem admits an isometric solution. We will
prove this assertion in the next section, as a corollary of an important extension theorem for functions
between Hilbert spaces.
9
4. Kirszbraun’s extension theorem
In the previous section, we constructed a sequence of metric spaces {Xn }, subsets An ⊆ Xn and Lips-
chitz functions fn : An → `2 which do not admit Lipschitz extensions to Xn with uniformly comparable
Lipschitz norm. We will see now that such an example could not occur if the Xn were Hilbert spaces:
Theorem 4.1 (Kirszbraun, 1934). Let H1 , H2 be Hilbert spaces, a subset A ⊆ H1 and a Lipschitz
function f : A → H2 . Then, there exists an extension f˜ : H1 → H2 of f , such that kf˜kLip = kf kLip .
Before we proceed to the proof of the theorem, we give an equivalent geometric formulation.
Theorem 4.2 (Geometric version of Kirszbraun’s theorem). Let H1 , H2 be Hilbert spaces, {xi }i∈I ⊆ H1 ,
{yi }i∈I ⊆ H2 and {ri }i∈I ⊆ [0, ∞), where I is some index set. If for every i, j ∈ I, it holds
(22) kyi − yj kH2 6 kxi − xj kH1 ,
then
\ \
(23) B(xi , ri ) 6= ∅ =⇒ B(yi , ri ) 6= ∅.
i∈I i∈I
The validity of the conjecture in its generality is still open. However, Gromov proved in 1987 that
the claim holds if n 6 k + 1. More recently, in the early 00’s, the conjecture was also confirmed (by
completely different arguments) for the cases n = k + 2 and k = 2 (for arbitrary n).
Let us return to our main topic. We start by proving the equivalence of the two statements above.
Geometric version ⇔ extension version – proof. (⇒) Suppose that kf kLip = 1. By a typical Zorn’s
lemma argument, it is enough to construct an extension to one more point x ∈ X r A. Consider the
points {a}a∈A ⊆ H1 and {f (a)}a∈A ⊆ H2 . For them, it holds kf (a) − f (b)kH2 6 ka − bkH1 , for every
a, b ∈ A and we also have
\
x∈ B(a, kx − ak) 6= ∅.
a∈A
Thus, we deduce that there is a point
\
y∈ B(f (a), kx − ak) 6= ∅,
a∈A
i.e. ky − f (a)kH2 6 kx − ak for every a ∈ A. So, if we define f˜(x) = y, we have a Lipschitz extension
with norm 1.
(⇐) Define A = {xi : i ∈ I} ⊆ H1 and f : A → H2 by f (xi ) = T yi , i ∈ I. Our assumption is equivalent
to the fact that f is 1−Lipschitz. Consider now a point x ∈ i∈I B(xi , ri ), a 1−Lipschitz extension
f˜ : H1 → H2 of f and set y = f˜(x). Then, for every i ∈ I,
ky − yi kH2 = kf˜(x) − yi kH2 6 kx − xi kH1 6 ri ,
2
T
thus y ∈ i∈I B(yi , ri ) 6= ∅.
Proof of the geometric version of Kirszbraun’s theorem. First observe that all the balls B(xi , ri ), B(yi , ri )
are weakly compact (since Hilbert spaces are reflexive) and thus have the finite intersection property.
Therefore, we can suppose that I = {1, 2, Tn..., n} for some n. Now, replace H1 Tby H10 = span{x1 , ..., xn }
0 n
and H2 by H2 = span{y1 , ..., yn }. Since i=1 BH1 (xi , ri ) 6= ∅, we deduce that i=1 BH10 (xi , ri ) 6= ∅, just
n
by considering an orthogonal projection from H1 to H10 . Of course, if it holds i=1 BH20 (yi , ri ) 6= ∅, then
T
Tn
also i=1 BH2 (yi , ri ) 6= ∅. So, we can assume that dimH1 , dimH2 < ∞.
10
Tn
TnTake now any point x ∈ i=1 B(xi , ri ). Observe that if x = xi0 , for some i0 , then y = yi0 ∈
i=1 B(yi , ri ) and thus our claim is trivial. So, we can suppose that x 6= xi for every i = 1, 2, ..., n.
Define the function h : H2 → [0, ∞) by
ky − xi k2
(25) h(y) = max , y ∈ H2 .
16i6n kx − xi k2
Since h is continuous and limy→∞ h(y) = ∞, h attains its global minimum
m = min h(y).
y∈H2
Thus
2 2
E kf (X) − Ef (X)k = E kf (X) − yk
X
= λj kyj − yk2
j∈J
X
= m2 λj kxj − xk2
j∈J
2 2
= m E kX − xk .
But, we notice that
2
EkX − xk2 > E kX − EXk
and, on the other hand,
2 1
E kf (X) − Ef (X)k = Ekf (X) − f (X 0 )k2
2
1
6 EkX − X 0 k2
2
2
= E kX − EXk .
(All the above are trivially checked using the Hilbert space structure and the properties of the expected
value.) Putting everything together, we get that
2 2
m2 E kX − EXk 6 E kX − EXk ,
which gives (since X is not constant) that m 6 1, as we wanted.
We are now in position to prove the dichotomy that we mentioned in the previous section:
11
Corollary 4.5 (Minty, 1970). Let (X, dX ) be any metric space A ⊆ X, α ∈ 0, 21 and f : A → `2 an
α−Hölder function with constant L. Then, there exists an extension f˜ : X → `2 , i.e. f˜|A = f , which is
also α−Hölder with constant L.
Proof. It suffices to prove the theorem in the case L = 1 and α = 12 (because we can then aply it to the
metric d2α
X ). As usual, we just have to extend f to one more point x ∈ X r A. Consider the Hilbert
space
!1/2
X
(28) `2 (X) = f : X → R : kf k = kf (x)k22 <∞
x∈X
and let {ex }x∈X be its standard basis. Observe now, that
\ p p
0∈ B dX (x, a)ea , dX (x, a) 6= ∅.
a∈A
Since, for every a 6= b ∈ A,
kf (a) − f (b)k2 6 dX (a, b)1/2
p 1/2
6 dX (x, a) + dX (x, b)
p p
= dX (x, a)ea − dX (x, b)eb
`2 (X)
we deduce from Kirszbraun’s theorem that there exists a
\ p
y∈ B f (a), dX (x, a) 6= ∅.
a∈A
function f : A → `n2 , for some n, with constant L. Then f admits an extension f˜ : X → `n2 , i.e. f˜|A = f ,
1
which is α−Hölder with constant Lnα− 2 .
12
5. Bourgain’s embedding theorem
In this section we will prove the fundamental result of Bourgain on embeddings of finite metric spaces
into Hilbert space. Afterwards, we will construct an example which proves that Bourgain’s result is
asymptotically sharp.
5.1. Statement and proof of the theorem. The (very general) theorem that we will prove in this
section is the following:
Theorem 5.1 (Bourgain, 1986). If (X, d) is any n−point metric space, then c2 (X) . log n.
Proof. Let k = blog2 nc + 1. For every subset A ⊆ X, denote
n−|A|
1 1
(29) πj (A) = 1− j .
2j|A| 2
Interpretation: Imagine that we choose a random subset B of X by adjoining independently any element
of X in B with probability 1/2j . Then πj (A) is exactly the probability P(B = A).
X
Define a function f : X → R2 , by the formula f (x) = (f (x)A )A⊆X , where
1/2
k
1 X
(30) f (x)A = πj (A) d(x, A).
k j=1
X
We will prove that f has bi-Lipschitz distortion at most a constant multiple of log n, where R2 is
endowed with the Euclidean norm. The upper bound follows easily; for x, y ∈ X:
X 1X k
2
kf (x) − f (y)k22 = πj (A) (d(x, A) − d(y, A))
k j=1
A⊆X
(31) k
X X 1
6 πj (A)d(x, y)2
j=1
k
A⊆X
= d(x, y)2 ,
where we used that x 7→ d(x, A) is 1−Lipschitz and that {πj (A)}A⊆X sum to 1 for every j, by the above
interpretation.
The lower bound requires more effort. Fix again two points x, y ∈ X and a 1 6 j 6 k. Define rj (x, y)
to be the smallest r > 0 such that
1
r̃j (x, y) = min rj (x, y), d(x, y) .
3
The key lemma that we will need to finish the proof is this:
Lemma 5.2. Let Aj be a random subset of X obtained with respect to πj . Then, for every x 6= y ∈ X,
it holds
2 2
(32) Eπj (d(x, Aj ) − d(y, Aj )) & (r̃j (x, y) − r̃j−1 (x, y)) .
13
Let us suppose for the moment that the lemma is valid. Then, we calculate:
X 1X k
2
kf (x) − f (y)k22 = πj (A) (d(x, A) − d(y, A))
k j=1
A⊆X
k
1X 2
= Eπ (d(x, Aj ) − d(y, Aj ))
k j=1 j
k
1X 2
& (r̃j (x, y) − r̃j−1 (x, y))
(33) k j=1
2
k
1 X
> 2 r̃j (x, y) − r̃j−1 (x, y)
k j=1
1
= r̃k (x, y)
k2
1
2 d(x, y)2 ,
k
d(x,y)
where apart from the lemma, we have applied Jensen’s inequality and the fact that rk (x, y) > 3 .
The previous inequality can be rewritten as
1
(34) kf (x) − f (y)k2 & d(x, y),
log n
which, along with (31), proves our claim.
log n
(35) cp (X) . .
p
1
(Hint: Replace the quantity 2j in Bourgain’s proof by q j , for q ∈ (0, 1) and then optimize with respect
to q.)
5.2. Sharpness of Bourgain’s theorem. We now present an explicit construction which will prove
that the proof above gave the asymptotically optimal upper bound for Euclidean embeddings of finite
metric spaces. In particular, we prove:
Theorem 5.4 (Linial-London-Rabinovich, 1995). For arbitrarily large n there exists an n-point metric
space (X, dX ) with c2 (X) & log n.
In particular, Linial, London and Rabinovich proven that for a sequence of regular expander graphs
{Gn } Bourgain’s upper bound is achieved. The proof that follows is different and was given by Khot
and Naor in 2006. In fact, the same construction can prove that the statement in Exercise 5.3 is also
asymptotically sharp. Observe also, that for the Hamming cube we have proved
√ q
c2 (Fn2 ) = n = log |Fn2 |,
where for a subset C ⊆ X we define Cε = {x ∈ X : d(x, C) 6 ε}. One can easily prove that H is a
metric on 2X but d is not. The example proving the sharpness of Theorem 5.1 will be a suitable quotient
of the Hamming cube.
Definition 5.5. Let (X, d) be a metric space and U = {U1 , U2 , ..., Uk } a partition of X. Consider the
weighted graph on the set of vertices {U1 , U2 , ..., Uk }, where the weights are defined by
def
We denote by X/U = {U1 , U2 , ..., Uk } the resulting metric space with the induced shortest-path metric
dX/U .
Lemma 5.6. Suppose that a group G acts on X by isometries and let U be the orbit partition {Gx }x∈X
of X with respect to this action. Then
Lemma 5.7. Let G be a group that act on Fd2 by isometries and |G| 6 2εd , where 0 < ε < 1. Then
1 X 1−ε
(40)
2 2d
dFd2 /G (Gx , Gy ) & 1 · d.
d
1 + log 1−ε
x,y∈F2
15
Proof. Let µ be the uniform probability measure of Fd2 and fix some δ > 0. Now observe that
n o
µ × µ (x, y) ∈ Fd2 × Fd2 : dFd2 /G (Gx , Gy ) > δd
n o
= 1 − µ × µ (x, y) ∈ Fd2 × Fd2 : ∃ g ∈ G such that kx − gyk1 < δd
1 X X
=1− d µ x ∈ Fd2 : kx − gyk1 < δd
2
y∈Fd2
g∈G
|G| X d
=1− d
2 k
k6δd
1 X d
> 1 − (1−ε)d .
2 k
k6δd
It can easily be proven that V ⊥ is also a subspace of Fd2 and that (V ⊥ )⊥ = V . We will now prove that
there exists a linear subspace V of Fd2 such that
(42) c2 (Fd2 /V ⊥ ) & d log |Fd2 /V ⊥ |,
where V ⊥ acts on Fd2 by addition. We will need the following Fourier-analytic lemma:
Lemma 5.8. For every subspace V of Fd2 , define
(43) w(V ) = min kxk1 .
x∈V r{0}
g(x) = f (x + V ⊥ ), x ∈ Fd2 .
Observe that g is V ⊥ -invariant, thus gb(A) = 0 for 0 < |A| < w(V ). Thus, by Lemma 2.8 (a variant of
Enflo’s inequality),
Z Z d Z
1 X
(45) kg(x) − g(y)k22 dµ(x)dµ(y) . kg(x + ej ) − g(x)k22 dµ(x).
Fd
2 Fd
2
w(V ) j=1 Fd
2
d−dim(V )
by Lemma 5.7, where |V ⊥ | = 2εd , i.e. ε = d . On the other hand:
d
D2 X
Z
RHS 6 dFd2 /V ⊥ (x + V ⊥ , x + ej + V ⊥ )dµ(x)
w(V ) j=1 Fd
2
D2 d
6 .
w(V )
or equivalently
dim(V ) 2
D2 & dw(V ) d
.
d + d log dim(V )
To finish the proof of Theorem 5.4, we must argue that there exists a subspace V of Fd2 such that both
dim(V ) and w(V ) are of the order of d. Then, the previous lemma will give:
18
6. The nonlinear Dvoretzky theorem
In the previous section we proved that for an arbitrary n-point metric space, we can construct an
embedding into `2 with distortion bounded by a constant multiple of log n and furthermore, this result
is sharp. It is natural to ask whether there exist Ramsey-type results in the above setting:
Question: Is it true that any finite metric space contains large subsets which embed into Euclidean
space with low distortion?
In what follows, we will give a (sharp) quantitative result answering this question. Before moving
on, we have to remark that the motivation for this question comes from a classical theorem about
finite-dimensional Banach spaces:
Theorem 6.1 (Dvoretzky, 1961). Let (X, k · kX ) be an n-dimensional Banach space. Then, for every
ε > 0, there exists a subspace Y of X which satisfies the following:
(i) Y embeds linearly into `2 with distortion at most 1 + ε and
(ii) dim Y > c(ε) log n, where c(ε) is a constant depending only on ε.
The result of this section will be a nonlinear analogue of Dvoretzky’s theorem which we now state.
Theorem 6.2 (Nonlinear Dvoretzky Theorem, Mendel-Naor, 2006). Let (X, d) be an n-point metric
space. Then, for every ε ∈ (0, 1), there exists a subset S ⊆ X which satisfies the following:
(i) c2 (S) . 1ε and
(ii) |S| > n1−ε .
Even though we will not prove it, we remark that the above result is sharp:
Theorem 6.3. For every n ∈ N and ε ∈ (0, 1), there exists an n-point metric space (X, d) such that if
S ⊆ X with |S| > n1−ε , then c2 (S) & 1ε .
The proof of Theorem 6.2 is probabilistic. Let’s start with some terminology. For a partition P of a
metric space X and x ∈ X, we denote by P(x) the unique element of P to which x belongs. For ∆ > 0,
we say that P is ∆-bounded if
(46) diamP(x) 6 ∆, for every x ∈ X.
Definition 6.4. A sequence of partitions {Pk }∞ k=0 of a metric space X is called a partition tree if the
following hold:
(i) P0 = {X};
(ii) For every k > 0, Pk is 8−k diam(X)-bounded and
(iii) For every k > 0, Pk+1 is a refinement of Pk , i.e. for every x ∈ X it holds Pk+1 (x) ⊆ Pk (x).
The crucial definition is the following:
Definition 6.5. Let β, γ > 0. A probability distribution over random partition trees of a metric space
X is called completely β-padded with exponent γ if for every x ∈ X
−k
1
(47) P{Pk }∞ B(x, 8 βdiam(X)) ⊆ Pk (x), ∀k > 0 > γ.
k=0
n
Observe that in the previous definition, the probability is with respect to the random choice of a
partition tree of X. From now on, we normalize so that diam(X) = 1. The relation of this definition
with our problem is explained in the following lemma:
Lemma 6.6. Suppose that X admits a completely β-padded with exponent γ random partition tree. Then
there exists a subset S ⊆ X such that:
(i) c2 (S) 6 β8 and
(ii) |S| > n1−γ .
Proof. Define a random subset
S = x ∈ X : B(x, 8−k β) ⊆ Pk (x), ∀k > 0 .
(48)
First, we calculate
X
P B(x, 8−k β) ⊆ Pk (x), ∀k > 0
E|S| =
x∈X
Lemma 6.7. Every finite ultrametric space (S, ρ) is isometric to a subset of a Hilbert space.
Proof. Denote by m = |S|. We will prove by induction on m that there is an embedding f : S → H
satisfying:
(i) kf (x) − f (y)kH = ρ(x, y), ∀x, y ∈ S and
(ii) kf (x)kH = diam(S)
√
2
, ∀x ∈ S.
For the inductive step, define the relation ∼ on S by x ∼ y if if ρ(x, y) < diam(S) and observe that this
is an equivalence relation since ρ is an ultrametric. Let A1 , A2 , ..., Ak be the equivalence classes of ∼ and
notice that |Ai | < m for every i = 1, 2..., k, since otherwise m = 1. Thus, for every i = 1, 2, ..., k there
exists an embedding fi : Ai → Hi such that
(i) kfi (x) − fi (y)kHi = ρ(x, y), ∀x, y ∈ Ai and
(ii) kf (x)kHi = diam(A )
√ i , ∀x ∈ Ai .
2
L
k k
Define the map f : S → i=1 Hi ⊕ `2 , by
r
def diam(S)2 − diam(Ai )2
(53) x ∈ Ai ⇒ f (x) = fi (x) + · ei ,
2
where `k2 = span{e1 , ..., ek }. Now, we just check that f is an isometry:
• If x, y ∈ Ai :
kf (x) − f (y)kH = kfi (x) − fi (y)kHi = ρ(x, y).
• If x ∈ Ai , y ∈ Aj and j 6= i then ρ(x, y) = diam(S):
diam(S)2 − diam(Ai )2 diam(S)2 − diam(Aj )2
kf (x) − f (y)k2H = + + kfi (x)k2Hi + kfj (y)k2Hj
2 2
= diam(S)2
= ρ(x, y)2 ,
where in the second equality we used the hypothesis (ii).
diam(S)2
Finally, indeed it is kf (x)k2H = 2 , as we wanted.
Having understood the relation of Definition 6.5 with our problem, we now present the key geometric
lemma that almost finishes the proof:
20
Lemma 6.8. Let (X, d) be a finite metric space and consider a number ∆ > 0. Then there exists a
∆
probability distribution over ∆-bounded partitions of X such that for every t ∈ 0, 8 and every x ∈ X:
|B(x, ∆/8)| 8t/∆
(54) PP B(x, t) ⊆ P(x) > .
|B(x, ∆)|
Proof. Write X = {x1 , x2 , ..., xn } and:
(i) Let π ∈ Sn be a uniformly random permutation of {1, 2,h..., n};i
(ii) Let R be a uniformly distributed random variable over ∆ ∆
4, 2 .
Consider now the random (with respect to π and R) partition P = {C1 , C2 , ..., Cn } of X (with possible
repetitions) given by
j
[
(55) C1 = B(xπ(1) , R) and Cj+1 = B(xπ(j+1) , R) r Ci .
i=1
h i
∆ ∆
Claim. For every fixed r ∈ 4, 2 and x ∈ X:
|B(x, r − t)|
(56) Pπ,R B(x, t) ⊆ P(x) R = r > .
|B(x, r + t)|
For the proof of the claim first observe that, since P(x) is contained in a ball of radius r, every point
outside the ball B(x, r + t) is irrelevant: it belongs to another equivalence class. The crucial idea is the
following: consider the first point x0 (in the random ordering induced by π) which belongs in B(x, r + t)
and suppose that this point happens to belong to B(x, r − t). By constuction, P(x) = P(x0 ) is the ball
of radius r centered at x0 . Thus, we deduce that if d(y, x) 6 t, then
d(y, x0 ) 6 d(y, x) + d(x, x0 ) 6 t + (r − t) = r,
i.e. B(x, t) ⊆ P(x). From the above analysis and the fact that π (and thus the random point of
B(x, r + t)) was chosen uniformly at random we conclude that the claim is valid. 2
Now, define h(s) = log |B(x, s)| and calculate:
Z ∆/2
1
Pπ,R B(x, t) ⊆ P(x) = Pπ,R B(x, t) ⊆ P(x) R = r dr
∆/4 ∆/4
4 ∆/2 |B(x, r − t)|
Z
> dr
∆ ∆/4 |B(x, r + t)|
4 ∆/2 h(r−t)−h(r+t)
Z
= e dr
∆ ∆/4
(†) 4 Z ∆/2
> exp h(r − t) − h(r + t)dr
∆ ∆/4
4 Z ∆/4+t 4 ∆/2+t
Z
= exp h(r)dr − h(r)dr
∆ ∆/4−t ∆ ∆/2−t
8t 8t
> exp h(∆/4 − t) − h(∆/2 + t)
∆ ∆
8t 8t
> exp h(∆/8) − h(∆)
∆ ∆
8t/∆
|B(x, ∆/8)|
= ,
|B(x, ∆)|
where in (†) we used Jensen’s inequality.
We are now in position to finish the proof:
Proof of Theorem 6.2. For every k > 0, let Pk be a random partition as in the previous lemma for
∆ = 8−k such that P0 , P1 , ... are chosen independently of each other. Now, let us define the (random)
partition Qk to be the common refinement of P0 , P1 , ..., Pk or equivalently, Q0 = P0 and Qk+1 is the
common refinement of Qk and Pk+1 . In other words, for every x ∈ X
Qk+1 (x) = Qk (x) ∩ Pk+1 (x).
21
Thus, {Qk }∞
k=0 is a random partition tree of X. Let α > 8 and x ∈ X. We want to estimate from below
the quantity
1 −k
(57) P{Pk }∞ B x, 8 ⊆ Q k (x), ∀k > 0 .
k=0
α
1 −k
Qk were defined, we notice that if B x, α 8
From the way
1 −k
⊆ Pk (x) for every k > 0 if and only if
B x, α 8 ⊆ Qk (x) for every k > 0. So, we deduce that
1 1
P B x, 8−k ⊆ Qk (x), ∀k > 0 = P B x, 8−k ⊆ Pk (x), ∀k > 0
α α
∞
Y 1
PPk B x, 8−k ⊆ Pk (x)
=
α
k=1
∞
(54) Y |B(x, 8−k−1 )| 8/α
>
|B(x, 8−k )|
k=1
1
= .
n8/α
Thus, {Qk }∞
k=0 is α1 -padded with exponent α8 . For ε = α8 ∈ (0, 1) we get the result using Lemma 6.6. 2
Remark. The subset S given by the proof above embeds with distortion at most 1/ε into an ultrametric
space and thus to a Hilbert space. We note here that we can not hope for a Bourgain-type theorem in
the ultrametric setting. This can be seen by the fact that the distortion of the metric space {1, 2, ..., n}
with its usual metric into any ultrametric space is at least n − 1 (exercise).
22
7. Assouad’s embedding theorem
A major research problem in embeddings is the following:
Open problem 7.1 (Bi-Lipschitz embedding problem in Rn ). Characterize those metric spaces (X, d)
that admit a bi-Lipschitz embedding into Rn , for some n.
Analogues of this problem have been answered in other branches of Mathematics. For example,
topological dimension is an invariant in the category of topological spaces and the Nash embedding
theorems settle the case of Riemannian manifolds.
A necessary condition for a metric space to admit an embedding in some Euclidean space Rn is the
following:
Definition 7.2. A metric space (X, d) is K-doubling, where K > 1, if every ball in X can be covered
by K balls of half the radius. A space is doubling if it is K-doubling for some K > 1.
First of all, the necessity is a special case of the following:
n n
Lemma 7.3. Let k·k be a norm
on R . For every r > 0 and ε ∈ (0, 1), every ball of radius r is (R , k·k)
3 n
can be covered by at most ε balls of radius εr.
Proof. Let N be an εr-net in B(x, r), i.e. a maximal εr-separated subset of B(x, r). Then
[
(58) B(x, r) ⊆ B(y, εr).
y∈N
which gives
εr n εr n
|N | vol(B) 6 r + vol(B).
2 2
So, we conclude that
n n
2 3
|N | 6 1+ 6 .
ε ε
One might conjecture that the doubling condition is sufficient for a metric space to be embeddable
in some Rn . Even though it turns out that there are doubling metric spaces which are not bi-Lipschitz
embeddable in any Rn , Assouad proved that the equivalence is almost correct:
Theorem 7.4 (Assouad, 1983). For every K > 1 and ε ∈ (0, 1), there exist D = D(K, ε) > 1 and
N = N (K, ε) ∈ N such that for any K-doubling metric space (X, d), its (1 − ε)-snowflake (X, d1−ε ) can
be embedded with bi-Lipschitz distortion at most D into RN .
Remark. Let us note here the almost trivial fact that the doubling condition is also necessary in As-
souad’s statement: if (X, d1−ε
X ) can be embedded into some doubling metric space (Y, dY ) with distortion
D for some ε ∈ (0, 1), then (X, dX ) is also doubling.
An impressive (trivial) corollary of Assouad’s theorem is the following:
√
Corollary 7.5. A metric space (X, d) is such that (X, d) is bi-Lipschitz to a subset of some RN if and
only if (X, d) is doubling.
23
Now we will proceed with the proof:
Proof of Assouad’s embedding theorem. Let (X, d) a K-doubling metric space and ε ∈ (0, 1). Fix
(temporarily) some c > 0 and let N be any c-net in X. Define a graph structure on N by setting:
(59) x ∼ y ⇔ d(x, y) 6 12c, x, y ∈ N .
Observation I. There exists an M = M (K) ∈ N such that the degree of this graph is bounded by M − 1,
i.e.
(60) ∀x ∈ N : |{y ∈ N : d(x, y) 6 12c}| 6 M − 1.
In particular, this graph is M -colorable: there exists a coloring χ : N → {1, 2, ..., M } such that for x ∼ y,
χ(x) 6= χ(y).
Proof. Consider an x ∈ N and the ball B(x, 12c). By the doubling condition, there exists a power of
K, say M − 1 = M (K) − 1, such that B(x, 12c) can be covered by M − 1 balls of radius c/2. However,
each of this balls can contain at most one element of N , thus (60) holds true. In particular (a proof by
induction), the graph is M -colorable, which proves the observation. 2
Now, define the embedding fc = f : X → RM by
X
(61) f (x) = gz (x)eχ(z) ,
z∈N
where
n 2c − d(x, z) o
(62) gz (x) = max ,0
2c
M 1
and e1 , ..., eM is the standard basis of R . Observe that each gz is 2c -Lipschitz, supported in B(z, 2c)
and also the above sum is finite since B(z1 , 2c) ∩ B(z2 , 2c) 6= ∅ only for finitely many pairs z1 , z2 ∈ N .
Finally, the number of such pairs depends only on K, call it C0 = C0 (K). Thus kf k∞ 6 C0 which
implies:
kf (x) − f (y)k2 6 2C0 , ∀x, y ∈ X.
Furthermore:
X C0
kf (x) − f (y)k2 6 kgz (x) − gz (y)k2 6 d(x, y).
2c
z∈N
Hence, we can summarize the above as follows:
n d(x, y) o
(63) kf (x) − f (y)k2 6 B min 1, , ∀x, y ∈ X,
c
for some B = B(K) > 0.
Observation II. If x, y ∈ X and 4c 6 d(x, y) 6 8c, then f (x) and f (y) are orthogonal.
Proof. In the expansion of f , gz (w) 6= 0 if and only if z ∈ B(w, 2c). The balls of radius 2c around x, y are
disjoint but any two points of N from these balls are neighbours in the graph (their distance is 6 12c).
This implies that f (x) and f (y) are disjointly supported, thus orthogonal. 2
Thus, for x, y ∈ X with 4c 6 d(x, y) 6 8c we have:
1
q
(64) kf (x) − f (y)k22 = kf (x)k22 + kf (y)k22 > ,
2
since there exist points of N which are c-close to x, y.
Now, we are ready to get to the main part of the proof. Applying the above construction with
c = 2−j−3 , for j ∈ Z, we get functions fj : X → RM such that:
(65) kfj (x) − fj (y)k2 6 B min{1, 2j d(x, y)}, ∀x, y ∈ X,
where B = B(K) > 0 and if 2−j−1 6 d(x, y) 6 2−j :
(66) kfj (x) − fj (y)k2 > A,
for some absolute constant A > 0. We want to glue together the functions {fj }j∈Z . For this purpose, fix
an integer m ∈ Z, to be determined later. Also denote by u1 , ..., u2m the standard basis of R2m with the
24
convention uj+2m = uj , for j ∈ Z. The ε-Assouad embedding of X is the map f : X → RM ⊗R2m ≡ R2mM
defined by
X 1
(67) f (x) = j(1−ε)
fj (x) ⊗ uj , x ∈ X.
j∈Z
2
We will prove that f is actually a bi-Lipschitz embedding. Let x, y ∈ X and ` ∈ Z such that
1 1
6 d(x, y) < ` .
2`+1 2
First, for the upper bound:
X 1
kf (x) − f (y)k2 6 j(1−ε)
kfj (x) − fj (y)k2
j∈Z
2
X 1 X 1
6 kfj (x) − fj (y)k2 + kfj (x) − fj (y)k2
j6`
2j(1−ε) j>`
2j(1−ε)
X 1 X 1
6B 2j d(x, y) + B
j6`
2j(1−ε) j>`
2 j(1−ε)
X B
.B 2jε d(x, y) +
j6`
2`(1−ε)
Bd(x, y)1−ε ,
by the way we picked `. To prove the lower bound, observe first that
X 1 X 1
kf (x) − f (y)k2 > j(1−ε)
· (fj (x) − fj (y)) ⊗ uj − kfj (x) − fj (y)k2 .
2 2 2j(1−ε)
|j−`|<m |j−`|>m
B2(`−m)ε d(x, y)
B
mε d(x, y)1−ε .
2
Finally, in the first term, we are tensorizing with respect to distinct uj ’s; in particular:
X 1 1
j(1−ε)
· (fj (x) − fj (y)) ⊗ uj > `(1−ε) kf` (x) − f` (y)k2
2 2 2
|j−`|<m
A
>
2`(1−ε)
d(x, y)1−ε .
The above series of inequalities proves that, for large enough m,
kf (x) − f (y)k2 & d(x, y)1−ε ,
which finishes the proof. 2
25
8. The Johnson-Lindenstrauss extension theorem
In this section we will present an important extension theorem for Lipschitz functions with values in a
Hilbert space. Afterwards, we will give an example proving that this result is close to being asymptotically
sharp.
8.1. Statement and proof of the theorem. The main result of this section is the following:
Theorem 8.1 (Johnson-Lindenstrauss extension theorem, 1984). Let (X, d) be a metric space and A ⊆ X
a subset with |A| = n. Then, for every f : A → `2 there exists an extension f˜ : X → `2 , i.e. f˜|A = f ,
√
satisfying kf˜kLip . log nkf kLip .
Using the terminology of Section 1, the result states that for every metric space (X, d),
p
(68) en (X, `2 ) . log n.
The ingredients needed for the proof of the extension theorem are the nonlinear Hahn-Banach theorem,
Kirszbraun’s extension theorem and the following dimension reduction result which is fundamental in
its own right:
Theorem 8.2 (Johnson-Lindenstrauss lemma). For every ε > 0 there exists c(ε) > 0 such that: for
every n > 1 and every x1 , ..., xn ∈ `2 there exist y1 , ..., yn ∈ Rk satisfying the following:
(i) For every i, j = 1, 2, ..., n:
(69) kxi − xj k2 6 kyi − yj k2 6 (1 + ε)kxi − xj k2 ;
(ii) k 6 c(ε) log n.
Remark. The proof that we will present gives the dependence c(ε) . ε12 . In the early 00’s, Alon proved
1
that in order for the lemma to hold, one must have c(ε) & ε2 log(1/ε) . However, the exact dependence on
ε is not yet known.
Let’s assume for the moment that the lemma is valid.
Proof of Theorem 8.1. Let g : f (A) → `k2 be the embedding given by the Johnson-Lindenstrauss lemma,
i.e. the map xi 7→ yi , when ε = 1. The lemma guarantees that k log n, kgkLip 6 2 √ and kg −1 kLip 6 1.
k k −1
Consider also the identity map I : `2 → `∞ ; it holds kIkLip = 1 and kI kLip = k. For the map
◦ g ◦ f : X → `k∞ satisfying
I ◦ g ◦ f : A → `k∞ , the nonlinear Hahn-Banach theorem gives an extension I ^
kI ^
◦ g ◦ f kLip = kI ◦ g ◦ f kLip 6 2kf kLip .
Also, by Kirszbraun’s theorem, the map g −1 : g ◦ f (A) → `2 admits an extension gg
−1 : `k → ` satisfying
2 2
−1 k −1
kgg Lip = kg kLip = 1.
Define now f˜ : X → `2 to be
(70) f˜ = gg
−1 ◦ I −1 ◦ I ^
◦g◦f
and observe that f˜ indeed extends f and
√
kf˜kLip . kkf kLip log nkf kLip ,
p
as we wanted. 2
n
Proof of the Johnson-Lindenstrauss lemma. Without loss of generality, we assume that x1 , ..., xn ∈ R .
Fix some u ∈ S n−1 and let g1 , ..., gn be independent standard gaussian random variables, i.e. each gi
has density
1 2
φ(t) = √ · e−t /2 , t ∈ R.
2π
Denote by
X n
G = G(u) = gi ui
i=1
and observe that G is also a random variable with the same density, by the rotation invariance of the
Gauss measure. Consider now G1 , ..., Gk i.i.d. copies of G and the (random) embedding:
1 1 1
u 7−→ kuk2 · √ G1 (u), √ G2 (u), ..., √ Gk (u) ,
k k k
26
for some k to be determined later. We will prove that there exists an embedding from this random family
such that gives the desired inequalities. Fix some number λ ∈ (0, 12 ) and u ∈ S n−1 ; then:
1 Xk Pk
2
P G2i > 1 + ε = P eλ i=1 Gi > eλ(1+ε)k
k i=1
(•) h Pk 2
i
6 e−λ(1+ε)k E eλ i=1 Gi
k
Y 2
= e−λ(1+ε)k E eλGi
i=1
k Z ∞
−λ(1+ε)k
Y 1 2 2
=e √ eλt e−t /2
dt
i=1
2π −∞
1
= e−λ(1+ε)k
(1 − 2λ)k/2
k
= exp − λ(1 + ε)k − log(1 − 2λ) ,
2
ε
where in (•) we used Markov’s inequality. The last quantity is minimized when λ = 2(1+ε) ; thus we get
k
1 X
ε 1 1
(71) P G2i > 1 + ε 6 e−k 2+2 log 1+ε .
k i=1
One can easily calculate and see that
ε 1 1
+ log & ε2 ;
2 2 1+ε
thus there exists a c > 0 such that
k
1 X 2
(72) P G2i > 1 + ε 6 e−ckε .
k i=1
The exact same proof also gives the inequality
1 Xk 2
(73) P G2i 6 1 − ε 6 e−ckε
k i=1
and combining these we get
1X k 2
(74) P Gi (u)2 − 1 > ε 6 2e−ckε ,
k i=1
8.2. Almost sharpness of the theorem. The best known lower for the Johnson-Lindenstrauss exten-
sion question (also proven in the same paper) is the following:
Theorem 8.3. For arbitrarily large n, there exists a sequence of metric spaces {Xn }, subsets An ⊆ Xn
with |An | = n and 1-Lipschitz functions fn : An → `2 such that every extension f˜n : Xn → `2 , i.e.
f˜n |An = fn , satisfies
s
˜ log n
(77) kfn kLip & .
log log n
Strategy. We will find finite-dimensional normed spaces X, Y and Z, where X is a subspace of Y and
Z is a Hilbert space, and a linear operator T : X → Z such that:
(i) dim X = dim Z=k,
(ii) T is an isomorphism satisfying kT k = 1 and kT −1 k 6 4 and √
(iii) for any linear operator S : Y → Z that extends T , i.e. S|X = T we have kSk > 2k .
Our finite set A will be an ε-net in the unit sphere of X, SX = {x ∈ X : kxk = 1} and f = T |A .
Using a linearization argument we will see that every extension of f must have large Lipschitz norm (the
constants above are unimportant).
We digress to note that condition (iii) above is, in a sense, sharp. This follows from this classical
theorem of Kadec and Snobar:
Theorem 8.4 (Kadec-Snobar). Let Y be a Banach space √ and X a k-dimensional subspace of Y . Then
there exists a projection P : Y → X satisfying kP k 6 k.
We remind the following lemma, which we partially saw in the proof of Assouad’s theorem:
Lemma 8.5. If X is a k-dimensional Banach space and ε > 0, then there exists an ε-net N ⊆ SX such
that
k
2
(78) |N | 6 1 + .
ε
Also, every such net satisfies
k−1
2
(79) |N | > .
ε
Proof. Similar to the proof of Lemma 7.3 – left as an exercise.
Fix now the Banach spaces X, Y, Z, where Z = `k2 , and T : X → Z as described in the strategy above
(whose existence we will prove later). Consider N ⊆ SX an ε-net, as in the previous lemma, and define
A = N ∪ {0}, f = T |A : A → Z. We will prove that for a Lipschitz extension f˜ : SY ∪ {0} → Z of f , i.e.
f˜|A = f , with kf˜kLip = L, L has to be large. The proof will be completed via the following lemmas:
Lemma 8.6. Consider F : Y → Z, the positively homogeneous extension of f˜, that is
(
kykf˜ y/kyk , y 6= 0
(80) F (y) = .
0, y=0
and thus
∞
X
k(S|X T −1 )−1 k 6 kI − S|X T −1 kj 6 2.
j=0
0 0 −1 −1
Define now, S : Y → Z by S = (S|X T ) S and observe that S 0 extends T and
kS 0 k 6 2kSk 6 24L.
So, the non-extendability property in the construction of T implies that:
√ √
k k
(•) kS 0 k > =⇒ L > .
2 48
k √
Finally, remember that n = |A| = |N | + 1 . 3ε and that, without loss of generality, L . log n (from
the positive Johnson-Lindenstrauss theorem). We need to pick ε > 0 so that (∗) holds. Observe that
p
160kLε . kε log n . k 3/2 ε log(1/ε),
29
log k
from the inequalities above. Choosing ε k3/2
, (∗) holds and also
log n
n . eCk log k =⇒ k & ,
log log n
for a universal constant C > 0. Finally, from (•) we deduce that
s
√ log n
L& k& ,
log log n
as we wanted. 2
Remark. Observe that the proof above cannot give an asymptotically better result than the one
exhibited here. In particular, getting rid of the log log n term on the denominator, would be equivalent
(in the above argument) to choosing ε to be a constant, i.e. independent of k, which contradicts (∗).
We now proceed with the various ingridients of the proof. First we will explain in detail the con-
struction of the spaces X, Y, Z and the operator T : X → Z. As usual, it will be based on some Fourier
Analysis on the discrete cube.
Construction. Consider the Hamming cube Fk2 and let Y = L1 (Fk2 , µ), where µ is the uniform probability
measure on Fk2 . Also, consider the subspace
n n
X o
(81) X = f ∈ L1 (Fk2 ) : ∃ ai ∈ R s.t. f (x) = ai (−1)xi , ∀x ∈ Fk2 ,
i=1
6 kP kkf k1 ;
that is, kQk 6 kP k.
Claim. Q can be written as
k
X
(85) Qf (x) = fb({i})εi (x), x ∈ Fk2 .
i=1
∗
Pm if in particular f ∈ Y then PY f = f . Now, fix some ε > 0 and for f smooth on Y and
Thus,
i=1 ai ei = 1, write:
m m Z
X X ∂f
(PY f ) ai ei = ai (y)ψ(y) dy
i=1 i=1 Y ∂yi
Z m
1 X
= ψ(y) f y + εai ei − f (y) + εθ(e, y) dy
ε Y i=1
Pm
Z f y + i=1 ai ei − f (y)
6 ψ(y) dy + sup |θ(ε, y)|
Y ε y∈supp(ψ)
6 kf kLip + o(1),
33
as ε → 0+ , since θ(ε, y) → 0 uniformly in y in supp(ψ). So far we have proved that for every a1 , . . . , am ∈
R:
Xm Xm
PY f ai ei 6 kf kLip a i ei
i=1 i=1
for every smooth function
R f . For a general f ∈ Y # consider a sequence {χn } of C ∞ compactly supported
functions in Y with Y χn (y) dy = 1 whose supports shrink to zero. Then, the functions fn = f ∗ χn are
smooth averages of f , i.e. kf ∗ χn kLip 6 kf kLip . Now, since χn ∗ f −→ f uniformly on compact sets, we
deduce that:
kPY f kLip = lim kPY fn kLip 6 lim sup kfn kLip 6 kf kLip ,
n→∞ n→∞
which implies that kPY k 6 1. 2
Finally, we will use this to prove Theorem 8.13:
Proof of the strong Lindenstrauss theorem. Let k = dim X, m = dim Y and e1 , . . . , ek a basis of X,
completed into a basis e1 , . . . , em of Y . In what follows identify X ≡ Rk and Y ≡ Rm ≡ Rk × Rm−k .
∞
Finally, fix
R two compactly supported C functions ψ1 : Rk → [0, ∞) and ψ2 : Rm−k → [0, ∞) satisfying
# ∗
R
ψ = Rm−k ψ2 = 1. Define now Pn : Y → Y by
Rk 1
m m Z
X
m−k
X ∂
(93) (Pn f ) ai ei = −n ai f (y) ψ1 (y1 , . . . , yk )ψ2 (nyk+1 , . . . , nym ) dy.
i=1 i=1 Rm ∂y i
34
9. Embedding unions of metric spaces into Euclidean space
Here is the main question we want to address in this section:
Question: Suppose (X, d) is a metric space and X = A ∪ B, where c2 (A) < ∞ and c2 (B) < ∞. Does
this imply that c2 (X) < ∞?
The answer is given, quantitatively, by the following recent result due to K. Makarychev and Y.
Makarychev:
Theorem 9.1. Let (X, d) be a metric space and A, B ⊆ X such that X = A ∪ B. Then, if c2 (A) < DA
and c2 (B) < DB , we also have c2 (X) . DA DB .
It is not currently known if the dependence on DA , DB above is sharp. The best previously known
similar result is the following:
Theorem 9.2. Let (X, d) be a metric space, A, B ⊆ X such that X = A ∪ B and (S, ρ) an ultrametric
space. If cS (A) < DA and cS (B) < DB , then cS (X) . DA DB . Furthermore, the above dependence on
DA , DB is sharp.
Let us now proceed with the proof of Theorem 9.1. As usual, we can assume without loss of generality
that X is finite and also that A ∩ B = ∅.
Lemma 9.3. Let X = A ∪ B as above and α > 0. Then there exists some A0 ⊆ A with the following
properties:
(i) For every a ∈ A, there exists a0 ∈ A0 such that
d(a0 , B) 6 d(a, B) and d(a, a0 ) 6 αd(a, B).
(ii) For every a01 , a02 ∈ A0
d(a01 , a02 ) > α min{d(a01 , B), d(a02 , B)}.
Proof. We will construct the subset A0 . First pick a01 ∈ A such that
d(a01 , B) = d(A, B).
Sj
Now, if we have chosen a01 , a02 , ..., a0j ∈ A, pick a0j+1 ∈ A to be any point in the set Ar i=1 B(a0i , αd(a0i , B))
that is closest to B. Since X is finite, this process will terminate after s 6 |A| steps; then define
A0 = {a01 , ..., a0s }. Now, the required conditions can be easily checked:
(i) For some a ∈ A r A0 , let j be the minimum index such that a ∈ B(a0j , αd(a0j , B)). Since a0j was
chosen in this step, we get
d(a0j , B) 6 d(a, B)
and by the way we picked j
d(a, a0j ) 6 αd(a0j , B) 6 αd(a, B).
(ii) For any i < j, since a0i was picked over a0j , we have d(a0i , B) 6 d(a0j , B) and since a0j ∈
/ B(a0i , αd(a0i , B)):
d(a0j , a0i ) > αd(a0i , B).
For the above set A0 , define a map f : A0 → B by setting f (a0 ) to be any closest point to a0 in B, i.e.
(94) d(a0 , f (a0 )) = d(a0 , B).
Lemma 9.4. For the above A0 and f : A0 → B, we have
1
kf kLip 6 2 1 + .
α
Proof. Let a01 , a02 ∈ A0 . Then
d(f (a01 ), f (a02 )) 6 d(f (a01 ), a01 ) + d(a01 , a02 ) + d(a02 , f (a02 ))
= d(a01 , B) + d(a01 , a02 ) + d(a02 , B).
Observe now that
max{d(a01 , B), d(a02 , B)} 6 d(a01 , a02 ) + min{d(a01 , B), d(a02 , B)}
35
and using condition (ii) we get:
d(f (a01 ), f (a02 )) 6 2d(a01 , a02 ) + 2 min{d(a01 , B), d(a02 , B)}
1 0 0
62 1+ d(a1 , a2 ),
α
which is what we wanted.
The main part of the proof is the following construction of an approximate embedding:
Lemma 9.5. For X = A ∪ B as above, there exists a map ψ : X → `2 such that:
(i) For every a1 , a2 ∈ A:
1
(95) kψ(a1 ) − ψ(a2 )k2 6 2 1 + DA DB d(a1 , a2 ).
α
(ii) For every b1 , b2 ∈ B:
(96) d(b1 , b2 ) 6 kψ(b1 ) − ψ(b2 )k2 6 DB d(b1 , b2 ).
(iii) For every a ∈ A, b ∈ B:
(97) kψ(a) − ψ(b)k2 6 2(1 + α)DA DB + (2 + α)DB d(a, b)
and
(98) kψ(a) − ψ(b)k2 > d(a, b) − (1 + α)(2DA DB + 1)d(a, B).
Proof. By our assumptions on A and B, there exist embeddings φA : A → `2 and φB : B → `2 such that
d(a1 , a2 ) 6 kφA (a1 ) − φA (a2 )k2 6 DA d(a1 , a2 )
and
d(b1 , b2 ) 6 kφB (b1 ) − φB (b2 )k2 6 DB d(b1 , b2 ),
for every a1 , a2 ∈ A and b1 , b2 ∈ B. Consider the mapping
φB ◦ f ◦ φ−1 0
A : φA (A ) −→ φB (B) ⊆ `2
and we’ll check that ψ satisfies what we want. First of all, (ii) is trivial and for (i):
kψ(a1 ) − ψ(a2 )k2 = kh ◦ φA (a1 ) − h ◦ φA (a2 )k2
1
62 1+ DB kφA (a1 ) − φA (a2 )k2
α
1
62 1+ DA DB d(a1 , a2 ).
α
Now, to check (iii), take a ∈ A and b ∈ B. By our choice of A0 , there exists a0 ∈ A0 such that
d(a0 , B 0 ) 6 d(a, B) and d(a, a0 ) 6 αd(a, B).
Denote b0 = f (a0 ) and observe that, since h is an extension:
ψ(a0 ) = φB ◦ f ◦ φ−1 0
A ◦ φA (a )
= φB ◦ f (a0 )
= φB (b0 )
36
and thus
kψ(a) − ψ(b)k2 6 kψ(a) − ψ(a0 )k2 + kφB (b0 ) − φB (b)k2
1
62 1+ DA DB d(a, a0 ) + DB d(b, b0 )
α
1
DA DB αd(a, B) + DB d(b, a) + d(a, a0 ) + d(a0 , b0 )
62 1+
α
6 2(1 + α)DA DB d(a, b) + DB d(a, b) + αd(a, B) + d(a0 , B)
d2 ,
for the optimal value of θ, where d = d(a, b), ua = βd(a, B) and ub = βd(b, A).
Now, to bound kF kLip , if a1 , a2 ∈ A:
kF (a1 ) − F (a2 )k22 = kψA (a1 ) − ψA (a2 )k22 + kψB (a1 ) − ψB (a2 )k22 + γ 2 |d(a1 , B) − d(a2 , B)|2
1 2 2 2 2
6 4 1+ DA DB + DA + γ 2 d(a1 , a2 )2
α
2 2
= Oα (DA DB ) · d(a1 , a2 )2 ,
37
using the previous lemma and that γ = Oα (DA DB ). Similarly we can give bounds for b1 , b2 ∈ B. Finally,
for a ∈ A, b ∈ B:
2
kF (a) − F (b)k22 6 2 2(1 + α)DA DB + (2 + α)DB d(a, b)2 + |ψ3 (a) − ψ3 (b)|2
2 2
) · d(a, b)2 + γ 2 d(a, B)2 + d(b, A)2
= Oα (DA DB
2 2
= Oα (DA DB ) · d(a, b)2 .
Thus, we get c2 (X) . DA DB . 2
Before moving to another topic, two related open problems are the following:
Open problem 9.6. Are there analogues of Theorem 9.1 for p 6= 2?
Open problem 9.7. Suppose that the metric space (X, d) can be written as X = A1 ∪ A2 ∪ ... ∪ Ak ,
where c2 (Ai ) = 1 for every i. How big can c2 (X) be?
38
10. Extensions of Banach space-valued Lipschitz functions
2
Our goal here is to give a self-contained proof of the following theorem, which was originally proved
in [LN05]. The proof below is based on the same ideas as in [LN05], but some steps and constructions
are different, leading to simplifications. The previously best-known bound on this problem was due to
[JLS86].
Theorem 10.1. Suppose that (X, dX ) is a metric space and (Z, k · kZ ) is a Banach space. Fix an integer
n > 3 and A ⊆ X with |A| = n. Then for every Lipschitz function f : A → Z there exists a function
F : X → Z that extends f and
log n
(102) kF kLip . kf kLip .
log log n
By normalization, we may assume from now on that kf kLip = 1. Write A = {a1 , . . . , an }. For
r ∈ [0, ∞) let Ar denote the r-neighborhood of A in X, i.e.,
n
def
[
Ar = BX (aj , r),
j=1
def
where for x ∈ X and r > 0 we denote BX (x, r) = {y ∈ X : dX (x, y) 6 r}. Given a permutation
π ∈ Sn and r ∈ [0, ∞), for every x ∈ Ar let jrπ (x) ∈ {1, . . . , n} be the smallest j ∈ {1, . . . , n} for which
dX (aπ(j) , x) 6 r. Such a j must exist since x ∈ Ar . Define aπr : X → A by
x if x ∈ A,
def
(103) ∀ x ∈ X, aπr (x) = ajrπ (x) if x ∈ Ar r A,
a1 if x ∈ X r Ar .
We record the following lemma for future use; compare to inequality (3) in [MN07].
Lemma 10.2. Suppose that r > 0 and that x, y ∈ Ar satisfy dX (x, y) 6 r. Then
|{π ∈ Sn : aπr (x) 6= aπr (y)}| |A ∩ BX (x, r − dX (x, y))|
61− .
n! |A ∩ BX (x, r + dX (x, y))|
Proof. Suppose that π ∈ Sn is such that the minimal j ∈ {1, . . . , n} for which aπ(j) ∈ BX (x, r +dX (x, y))
actually satisfies aπ(j) ∈ BX (x, r − dX (x, y)). Hence jrπ (x) = j and therefore aπ(j) = aπr (x). Also,
dX (aπ(j) , y) 6 dX (aπ(j) , x) + dX (x, y) 6 r, so jrπ (y) 6 j. But dX (x, ajrπ (y) ) 6 dX (y, ajrπ (y) ) + dX (x, y) 6
r + dX (x, y), so by the definition of j we must have jrπ (y) > j. Thus jrπ (y) = j, so that aπr (y) =
aπ(j) = aπr (x). We have shown that if in the random order that π induces on A the first element
that falls in the ball BX (x, r + dX (x, y)) actually falls in the smaller ball BX (x, r − dX (x, y)), then
aπr (y) = aπr (x). If π is chosen uniformly at random from Sn then the probability of this event equals
|A ∩ BX (x, r − dX (x, y))|/|A ∩ BX (x, r + dX (x, y))|. Hence,
|{π ∈ Sn : aπr (x) = aπr (y)}| |A ∩ BX (x, r − dX (x, y))|
> .
n! |A ∩ BX (x, r + dX (x, y))|
Corollary 10.3. Suppose that 0 6 u 6 v and x, y ∈ Au satisfy dX (x, y) 6 min{u − dX (x, A), v − u/2}.
Then Z v Z 2t
|{π ∈ Sn : aπr (x) 6= aπr (y)}|
1
dr dt . dX (x, y) log n.
u t t n!
def
Proof. Denote d = dX (x, y) and for r > 0
|{π ∈ Sn : aπr (x) 6= aπr (y)}|
def def
(104) g(r) = and h(r) = log (|A ∩ BX (x, r)|) .
n!
Since x, y ∈ Au and for r > u we have Au ⊆ Ar and dX (x, y) 6 u 6 r, Lemma 10.2 implies that
(105) ∀ r > u, g(r) 6 1 − eh(r−d)−h(r+d) 6 h(r + d) − h(r − d),
where in the last step of (105) we used the elementary inequality 1 − e−α 6 α, which holds for every
α ∈ R.
2This section was typed by Assaf Naor.
39
Note that by Fubini we have
Z v Z 2t Z 2v !
Z min{v,r} Z 2v
1 g(r) min{v, r}
(106) g(r)dr dt = dt dr = g(r) log dr.
u t t u max{u,r/2} t u max{u, r/2}
where the last step of (107) is valid because 2v − d > u + d, due to our assumption that d 6 v − u/2.
Since h is nondecreasing, for every s ∈ [2v − d, 2v + d] we have h(s) 6 h(2v + d), and for every
s ∈ [u − d, u + d] we have h(s) > h(u − d). It therefore follows from (107) that
Z v Z 2t
1 (104) |A ∩ BX (x, 2v + d)|
(108) g(r)dr dt . d h(2v + d) − h(u − d) = d log . d log n,
u t t |A ∩ BX (x, u − d)|
where in the last step of (108) we used the fact that |A ∩ BX (x, 2v + d)| 6 |A| = n, and, due to
our assumption d 6 u − dX (x, A) ( ⇐⇒ dX (x, A) 6 u − d), that A ∩ BX (x, u − d) 6= ∅, so that
|A ∩ BX (x, u − d)| > 1.
Returning to the proof of Theorem 10.1, fix $\varepsilon \in (0, 1/2)$. Fix also any $(2/\varepsilon)$-Lipschitz function $\varphi_\varepsilon : \mathbb{R} \to [0,1]$ that vanishes outside $[\varepsilon/2, 1 + \varepsilon/2]$ and satisfies $\varphi_\varepsilon(s) = 1$ for every $s \in [\varepsilon, 1]$. Note that
$$\log(1/\varepsilon) = \int_\varepsilon^1 \frac{ds}{s} \leq \int_0^\infty \frac{\varphi_\varepsilon(s)}{s}\, ds \leq \int_{\varepsilon/2}^{1+\varepsilon/2} \frac{ds}{s} \leq \log(3/\varepsilon).$$
Hence, if we define $c(\varepsilon) \in (0, \infty)$ by
$$\frac{1}{c(\varepsilon)} \stackrel{\mathrm{def}}{=} \int_0^\infty \frac{\varphi_\varepsilon(s)}{s}\, ds, \tag{109}$$
then
$$\frac{1}{\log(3/\varepsilon)} \leq c(\varepsilon) \leq \frac{1}{\log(1/\varepsilon)}. \tag{110}$$
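One admissible choice of $\varphi_\varepsilon$ (an assumption; any function with the stated properties works) is the piecewise linear bump that ramps up on $[\varepsilon/2, \varepsilon]$ and down on $[1, 1+\varepsilon/2]$. The following sketch checks the bounds (110) numerically for this choice.

```python
import numpy as np
from scipy.integrate import quad

def phi(s, eps):
    """Piecewise linear (2/eps)-Lipschitz bump: 0 off [eps/2, 1 + eps/2],
    identically 1 on [eps, 1]."""
    if s <= eps / 2 or s >= 1 + eps / 2:
        return 0.0
    if s < eps:
        return (s - eps / 2) / (eps / 2)        # up-ramp, slope 2/eps
    if s <= 1.0:
        return 1.0
    return (1 + eps / 2 - s) / (eps / 2)        # down-ramp, slope -2/eps

eps = 0.1
integral, _ = quad(lambda s: phi(s, eps) / s, eps / 2, 1 + eps / 2)
c = 1.0 / integral                               # c(eps) as in (109)
print(1 / np.log(3 / eps) <= c <= 1 / np.log(1 / eps))   # True, as in (110)
```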
Define $F : X \to Z$ by setting $F(x) = f(x)$ for $x \in A$ and
$$\forall\, x \in X \smallsetminus A,\qquad F(x) \stackrel{\mathrm{def}}{=} \frac{c(\varepsilon)}{n!} \sum_{\pi \in S_n} \int_0^\infty \frac{1}{t^2}\, \varphi_\varepsilon\!\left(\frac{2}{t}\, d_X(x,A)\right) \left(\int_t^{2t} f(a_r^\pi(x))\, dr\right) dt. \tag{111}$$
By definition, $F$ extends $f$. Next, suppose that $x \in X$ and $y \in X \smallsetminus A$. Fix any $z \in A$ that satisfies $d_X(x,z) = d_X(x,A)$ (thus if $x \in A$ then $z = x$). We have the following identity:
$$F(y) - F(x) = \frac{c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}}{t^2}\, \big(f(a) - f(z)\big)\, dr\, dt. \tag{112}$$
Indeed, for every $w \in X$ with $d_X(w,A) > 0$ we have
$$\frac{c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\varphi_\varepsilon\big(\frac{2d_X(w,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(w)=a\}}}{t^2}\, f(z)\, dr\, dt = \frac{c(\varepsilon)}{n!} \sum_{\pi \in S_n} \int_0^\infty \int_t^{2t} \frac{\varphi_\varepsilon\big(\frac{2d_X(w,A)}{t}\big)}{t^2}\, f(z)\, dr\, dt$$
$$= c(\varepsilon) \int_0^\infty \frac{1}{t}\, \varphi_\varepsilon\!\left(\frac{2}{t}\, d_X(w,A)\right) dt\; f(z) \stackrel{(*)}{=} c(\varepsilon) \int_0^\infty \frac{\varphi_\varepsilon(s)}{s}\, ds\; f(z) \stackrel{(109)}{=} f(z), \tag{113}$$
where in $(*)$ we made the change of variable $s = 2 d_X(w,A)/t$, which is allowed since $d_X(w,A) > 0$. Due to (113), if $x, y \in X \smallsetminus A$ then (112) is a consequence of the definition (111). If $x \in A$ (recall that $y \in X \smallsetminus A$), then $z = x$ and $\varphi_\varepsilon(2 d_X(x,A)/t) = 0$ for all $t > 0$, so in this case (112) follows once more from (113) and (111).
By (112) we have
$$\|F(x) - F(y)\|_Z \leq \frac{c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\Big|\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}\Big|}{t^2}\, \|f(a) - f(z)\|_Z\, dr\, dt$$
$$\stackrel{(114)}{\leq} \frac{c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\Big|\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}\Big|}{t^2}\, d_X(a,z)\, dr\, dt$$
$$\stackrel{(115)}{\leq} \frac{2c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\Big|\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}\Big|}{t^2}\, d_X(x,a)\, dr\, dt,$$
where in (114) we used the fact that $\|f\|_{\mathrm{Lip}} = 1$, and in (115) we used the fact that for every $a \in A$ we have $d_X(a,z) \leq d_X(a,x) + d_X(x,z) \leq 2 d_X(a,x)$, due to the choice of $z$ as a point of $A$ that is closest to $x$.
To estimate (115), fix $t > 0$ and $r \in [t, 2t]$. If $\varphi_\varepsilon(2d_X(y,A)/t)\mathbf{1}_{\{a_r^\pi(y)=a\}} \neq \varphi_\varepsilon(2d_X(x,A)/t)\mathbf{1}_{\{a_r^\pi(x)=a\}}$, then either $a = a_r^\pi(x)$ and $2d_X(x,A)/t \in \mathrm{supp}(\varphi_\varepsilon)$, or $a = a_r^\pi(y)$ and $2d_X(y,A)/t \in \mathrm{supp}(\varphi_\varepsilon)$. Recalling that $\mathrm{supp}(\varphi_\varepsilon) \subseteq [\varepsilon/2, 1+\varepsilon/2]$, it follows that either $a = a_r^\pi(x)$ and $d_X(x,A) < t$, or $a = a_r^\pi(y)$ and $d_X(y,A) < t$. If $a = a_r^\pi(x)$ and $d_X(x,A) < t$, then since $t \leq r$ it follows that $x \in A_r$, and so the definition of $a_r^\pi(x)$ implies that $d_X(x,a) = d_X(a_r^\pi(x), x) \leq r$. On the other hand, if $a = a_r^\pi(y)$ and $d_X(y,A) < t$, then as before we have $d_X(y,a) = d_X(a_r^\pi(y), y) \leq r$, and therefore $d_X(x,a) \leq d_X(x,y) + d_X(y,a) \leq d_X(x,y) + r$. We have thus checked that $d_X(x,a) \leq d_X(x,y) + r \leq d_X(x,y) + 2t$ whenever the integrand in (115) is nonzero. Consequently,
$$\|F(x) - F(y)\|_Z \stackrel{(115)}{\leq} \frac{2c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} + \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}}{t^2}\, d_X(x,y)\, dr\, dt$$
$$\qquad + \frac{4c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\Big|\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}\Big|}{t}\, dr\, dt$$
$$\stackrel{(113)}{=} 4\, d_X(x,y) + \frac{4c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\Big|\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}\Big|}{t}\, dr\, dt.$$
Therefore, in order to establish the validity of (102), it suffices to show that we can choose $\varepsilon \in (0, 1/2)$ so that
$$\frac{c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\Big|\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}\Big|}{t}\, dr\, dt \lesssim \frac{\log n}{\log \log n}\, d_X(x,y). \tag{116}$$
Corollary 10.4 implies that (116) holds true when $\min\{d_X(x,A), d_X(y,A)\} \leq 5 d_X(x,y)/3$. We shall therefore assume from now on that the assumption of Corollary 10.4 fails, i.e., that
$$d_X(x,y) < \frac{3}{5}\, \min\{d_X(x,A), d_X(y,A)\}. \tag{118}$$
Define
$$U_\varepsilon(x,y) \stackrel{\mathrm{def}}{=} \left\{ t \in (0,\infty) : \left|\varphi_\varepsilon\Big(\frac{2d_X(y,A)}{t}\Big) - \varphi_\varepsilon\Big(\frac{2d_X(x,A)}{t}\Big)\right| > 0 \right\} \tag{119}$$
and
$$V_\varepsilon(x,y) \stackrel{\mathrm{def}}{=} \left\{ t \in (0,\infty) : \varphi_\varepsilon\Big(\frac{2d_X(y,A)}{t}\Big) + \varphi_\varepsilon\Big(\frac{2d_X(x,A)}{t}\Big) > 0 \right\}. \tag{120}$$
Then, for every $\pi \in S_n$, $t > 0$ and $r \in [t, 2t]$ we have
$$\sum_{a \in A} \left|\varphi_\varepsilon\Big(\frac{2d_X(y,A)}{t}\Big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\Big(\frac{2d_X(x,A)}{t}\Big) \mathbf{1}_{\{a_r^\pi(x)=a\}}\right|$$
$$= \left|\varphi_\varepsilon\Big(\frac{2d_X(y,A)}{t}\Big) - \varphi_\varepsilon\Big(\frac{2d_X(x,A)}{t}\Big)\right| \mathbf{1}_{\{a_r^\pi(x) = a_r^\pi(y)\}} + \left(\varphi_\varepsilon\Big(\frac{2d_X(y,A)}{t}\Big) + \varphi_\varepsilon\Big(\frac{2d_X(x,A)}{t}\Big)\right) \mathbf{1}_{\{a_r^\pi(x) \neq a_r^\pi(y)\}}$$
$$\lesssim \frac{d_X(x,y)}{\varepsilon t}\, \mathbf{1}_{U_\varepsilon(x,y)}(t) + \mathbf{1}_{V_\varepsilon(x,y)}(t) \cdot \mathbf{1}_{\{a_r^\pi(x) \neq a_r^\pi(y)\}}, \tag{121}$$
where in (121) we used the fact that $\varphi_\varepsilon$ is $(2/\varepsilon)$-Lipschitz and that $|d_X(x,A) - d_X(y,A)| \leq d_X(x,y)$.
Consequently, in combination with the upper bound on $c(\varepsilon)$ in (110), it follows from (121) that
$$\frac{c(\varepsilon)}{n!} \sum_{\pi \in S_n} \sum_{a \in A} \int_0^\infty \int_t^{2t} \frac{\Big|\varphi_\varepsilon\big(\frac{2d_X(y,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(y)=a\}} - \varphi_\varepsilon\big(\frac{2d_X(x,A)}{t}\big) \mathbf{1}_{\{a_r^\pi(x)=a\}}\Big|}{t}\, dr\, dt$$
$$\lesssim \frac{d_X(x,y)}{\varepsilon \log(1/\varepsilon)} \int_{U_\varepsilon(x,y)} \frac{dt}{t} + \frac{1}{\log(1/\varepsilon)} \int_{V_\varepsilon(x,y)} \frac{1}{t} \left(\int_t^{2t} \frac{|\{\pi \in S_n : a_r^\pi(x) \neq a_r^\pi(y)\}|}{n!}\, dr\right) dt. \tag{122}$$
To bound the first term in (122), denote
$$m(x,y) \stackrel{\mathrm{def}}{=} \min\{d_X(x,A), d_X(y,A)\} \qquad\text{and}\qquad M(x,y) \stackrel{\mathrm{def}}{=} \max\{d_X(x,A), d_X(y,A)\}.$$
If $t \in [0,\infty)$ satisfies $t < 2m(x,y)/(1+\varepsilon/2)$, then $\min\{2d_X(x,A)/t,\, 2d_X(y,A)/t\} > 1 + \varepsilon/2$, and therefore by the definition of $\varphi_\varepsilon$ we have $\varphi_\varepsilon(2d_X(x,A)/t) = \varphi_\varepsilon(2d_X(y,A)/t) = 0$. Similarly, if $t \in [0,\infty)$ satisfies $t > 4M(x,y)/\varepsilon$, then $\max\{2d_X(x,A)/t,\, 2d_X(y,A)/t\} < \varepsilon/2$, and therefore $\varphi_\varepsilon(2d_X(x,A)/t) = \varphi_\varepsilon(2d_X(y,A)/t) = 0$ as well. Finally, if $2M(x,y) \leq t \leq 2m(x,y)/\varepsilon$, then $2d_X(x,A)/t,\, 2d_X(y,A)/t \in [\varepsilon, 1]$, so by the definition of $\varphi_\varepsilon$ we have $\varphi_\varepsilon(2d_X(x,A)/t) = \varphi_\varepsilon(2d_X(y,A)/t) = 1$. By the definition of $U_\varepsilon(x,y)$ in (119), we have thus shown that
$$U_\varepsilon(x,y) \subseteq \left[\frac{2m(x,y)}{1+\varepsilon/2},\, 2M(x,y)\right] \cup \left[\frac{2m(x,y)}{\varepsilon},\, \frac{4M(x,y)}{\varepsilon}\right].$$
Consequently,
$$\int_{U_\varepsilon(x,y)} \frac{dt}{t} \leq \int_{\frac{2m(x,y)}{1+\varepsilon/2}}^{2M(x,y)} \frac{dt}{t} + \int_{\frac{2m(x,y)}{\varepsilon}}^{\frac{4M(x,y)}{\varepsilon}} \frac{dt}{t} \lesssim \log\frac{2M(x,y)}{m(x,y)} \lesssim 1, \tag{123}$$
where the last step of (123) holds true because, due to the triangle inequality and (118), we have
$$M(x,y) \leq d_X(x,y) + m(x,y) < \frac{3}{5}\, m(x,y) + m(x,y) \lesssim m(x,y).$$
To bound the second term in (122), note that by the definition of $V_\varepsilon(x,y)$ in (120) and the choice of $\varphi_\varepsilon$,
$$t \in V_\varepsilon(x,y) \implies \left\{\frac{2d_X(x,A)}{t},\, \frac{2d_X(y,A)}{t}\right\} \cap \left[\frac{\varepsilon}{2},\, 1 + \frac{\varepsilon}{2}\right] \neq \emptyset. \tag{124}$$
Hence,
$$V_\varepsilon(x,y) \subseteq \left[\frac{2d_X(x,A)}{1+\varepsilon/2},\, \frac{4d_X(x,A)}{\varepsilon}\right] \cup \left[\frac{2d_X(y,A)}{1+\varepsilon/2},\, \frac{4d_X(y,A)}{\varepsilon}\right], \tag{125}$$
and therefore, using the notation for $g : [0,\infty) \to [0,1]$ that was introduced in (104),
$$\int_{V_\varepsilon(x,y)} \frac{1}{t}\left(\int_t^{2t} g(r)\,dr\right) dt \leq \int_{\frac{2d_X(x,A)}{1+\varepsilon/2}}^{\frac{4d_X(x,A)}{\varepsilon}} \frac{1}{t}\left(\int_t^{2t} g(r)\,dr\right) dt + \int_{\frac{2d_X(y,A)}{1+\varepsilon/2}}^{\frac{4d_X(y,A)}{\varepsilon}} \frac{1}{t}\left(\int_t^{2t} g(r)\,dr\right) dt. \tag{126}$$
We wish to use Corollary 10.3 to estimate the two integrals that appear in the right-hand side of (126). To this end we need to first check that the assumptions of Corollary 10.3 are satisfied. Denote $u_x = 2d_X(x,A)/(1+\varepsilon/2)$ and $u_y = 2d_X(y,A)/(1+\varepsilon/2)$. Since $u_x \geq d_X(x,A)$ we have $x \in A_{u_x}$, and analogously $y \in A_{u_y}$. Also,
$$d_X(y,A) \leq d_X(x,y) + d_X(x,A) \stackrel{(118)}{<} \frac{3}{5}\, d_X(x,A) + d_X(x,A) = \frac{4+2\varepsilon}{5}\, u_x \leq u_x, \tag{127}$$
where the last step of (127) is valid because $\varepsilon \leq 1/2$. From (127) we see that $y \in A_{u_x}$, and the symmetric argument shows that $x \in A_{u_y}$. It also follows (using (118) and $\varepsilon \leq 1/2$) that $d_X(x,y) \leq u_x - d_X(x,A)$, and by symmetry also $d_X(x,y) \leq u_y - d_X(y,A)$. Next, denote $v_x = 4d_X(x,A)/\varepsilon$ and $v_y = 4d_X(y,A)/\varepsilon$. In order to verify the assumptions of Corollary 10.3, it remains to check that $d_X(x,y) \leq \min\{v_x - u_x/2,\, v_y - u_y/2\}$. Indeed,
$$\frac{d_X(x,y)}{v_x - u_x/2} \stackrel{(118)}{<} \frac{3 d_X(x,A)/5}{v_x - u_x/2} = \frac{\frac{3}{5}}{\frac{4}{\varepsilon} - \frac{1}{1+\varepsilon/2}} = \frac{3\varepsilon(1+\varepsilon/2)}{5(4+\varepsilon)} < 1,$$
and the symmetric argument shows that also $d_X(x,y) < v_y - u_y/2$. Having checked that the assumptions of Corollary 10.3 hold true, it follows from (126) and Corollary 10.3 that
$$\int_{V_\varepsilon(x,y)} \frac{1}{t}\left(\int_t^{2t} \frac{|\{\pi \in S_n : a_r^\pi(x) \neq a_r^\pi(y)\}|}{n!}\, dr\right) dt \lesssim d_X(x,y)\, \log n. \tag{128}$$
The desired estimate (116) now follows by substituting (123) and (128) into (122): the right-hand side of (122) becomes $\lesssim \big(\frac{1}{\varepsilon \log(1/\varepsilon)} + \frac{\log n}{\log(1/\varepsilon)}\big)\, d_X(x,y)$, so choosing $\varepsilon = 1/\log n$ (for $n$ exceeding a universal constant; bounded $n$ being trivial) makes both terms $\lesssim \frac{\log n}{\log\log n}\, d_X(x,y)$. $\Box$
References
[JLS86] W.B. Johnson, J. Lindenstrauss and G. Schechtman. Extensions of Lipschitz maps into Banach
spaces. Israel J. Math., 54(2):129–138, 1986.
[LN05] J. R. Lee and A. Naor. Extending Lipschitz functions via random metric partitions. Invent.
Math., 160(1):59–95, 2005.
[MN07] M. Mendel and A. Naor. Ramsey partitions and proximity data structures. J. Eur. Math. Soc.
(JEMS), 9(2):253–275, 2007.
11. Ball’s extension theorem
Our goal in this section is to present a fully nonlinear version of K. Ball's extension theorem. The result we present is the most general Lipschitz extension theorem currently known. The motivation for the theorem is a classical result of Maurey concerning the extension and factorization of linear maps between certain classes of Banach spaces.
11.1. Markov type and cotype. From now on, we will denote by $\Delta^{n-1}$ the $n$-simplex, i.e.,
$$\Delta^{n-1} = \Big\{(x_1, \dots, x_n) \in [0,1]^n : \sum_{i=1}^n x_i = 1\Big\}, \tag{129}$$
which we think of as the space of all probability measures on the set $\{1, 2, \dots, n\}$. A (row) stochastic matrix is a square matrix $A \in M_n(\mathbb{R})$ with non-negative entries, all the rows of which add up to $1$. Given a measure $\pi \in \Delta^{n-1}$, a stochastic matrix $A = (a_{ij}) \in M_n(\mathbb{R})$ will be called reversible relative to $\pi$ if for every $1 \leq i, j \leq n$:
$$\pi_i a_{ij} = \pi_j a_{ji}. \tag{130}$$
Definition 11.1 (Ball, 1992). A metric space $(X,d)$ has Markov type $p \in (0,\infty)$ with constant $M \in (0,\infty)$ if for every $t, n \in \mathbb{N}$, every $\pi \in \Delta^{n-1}$, every stochastic matrix $A \in M_n(\mathbb{R})$ reversible relative to $\pi$ and every $x_1, \dots, x_n \in X$ the following inequality holds:
$$\sum_{i,j=1}^n \pi_i (A^t)_{ij}\, d(x_i, x_j)^p \leq M^p\, t \sum_{i,j=1}^n \pi_i a_{ij}\, d(x_i, x_j)^p. \tag{131}$$
The infimum over $M$ such that this holds is the Markov type $p$ constant of $X$, denoted by $M_p(X)$.
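A convenient way to produce examples is the random walk on a weighted graph: taking $a_{ij} = w_{ij}/\sum_k w_{ik}$ for a symmetric weight matrix $(w_{ij})$ and $\pi_i$ proportional to $\sum_k w_{ik}$ yields a chain reversible relative to $\pi$. The following minimal sketch (the construction and tolerances are assumptions made for illustration) builds such a chain and checks (131) numerically for points on the real line with $p = 2$ and $M = 1$, as Lemma 11.2 below guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n)); W = (W + W.T) / 2        # symmetric edge weights
A = W / W.sum(axis=1, keepdims=True)             # row-stochastic matrix
pi = W.sum(axis=1) / W.sum()                     # stationary measure
assert np.allclose(pi[:, None] * A, (pi[:, None] * A).T)  # reversibility (130)

x = rng.normal(size=n)                           # points x_1, ..., x_n in R
D2 = (x[:, None] - x[None, :]) ** 2              # d(x_i, x_j)^2
for t in range(1, 25):
    At = np.linalg.matrix_power(A, t)
    lhs = np.sum(pi[:, None] * At * D2)
    rhs = t * np.sum(pi[:, None] * A * D2)
    assert lhs <= rhs + 1e-12                    # inequality (131) with M = 1
print("Markov type 2 inequality holds for this chain")
```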
Interpretation: The above notion has an important probabilistic interpretation, which we now mention without getting into many details. A process $\{Z_t\}_{t=0}^\infty$ with values in $\{1, 2, \dots, n\}$ is called a stationary reversible Markov chain with respect to $\pi$ and $A$ if:
(i) for every $t \in \mathbb{N}$ and every $1 \leq i \leq n$ it holds that $\mathbb{P}(Z_t = i) = \pi_i$, and
(ii) for every $t \in \mathbb{N}$ and every $1 \leq i, j \leq n$ it holds that $\mathbb{P}(Z_{t+1} = j \mid Z_t = i) = a_{ij}$.
The Markov type $p$ inequality can be equivalently written as follows: for every stationary reversible Markov chain on $\{1, 2, \dots, n\}$, every $f : \{1, 2, \dots, n\} \to X$ and every $t \in \mathbb{N}$,
$$\mathbb{E}\big[d(f(Z_t), f(Z_0))^p\big] \leq M^p\, t\; \mathbb{E}\big[d(f(Z_1), f(Z_0))^p\big]. \tag{132}$$
From the above interpretation, it follows easily (by the triangle inequality and stationarity) that every metric space has Markov type $1$ with constant $1$. A slightly more interesting computation is the following:
Lemma 11.2. Every Hilbert space H has Markov type 2 with M2 (H) = 1.
Proof. Consider some $\pi \in \Delta^{n-1}$ and a stochastic matrix $A \in M_n(\mathbb{R})$ reversible relative to $\pi$. We want to show that for every $x_1, \dots, x_n \in H$ and every $t \in \mathbb{N}$,
$$\sum_{i,j=1}^n \pi_i (A^t)_{ij}\, \|x_i - x_j\|^2 \leq t \sum_{i,j=1}^n \pi_i a_{ij}\, \|x_i - x_j\|^2.$$
Since the inequality above concerns only squares of norms of vectors in a Hilbert space, it suffices to prove it when $x_1, \dots, x_n \in \mathbb{R}$, i.e., coordinatewise. Consider the inner product space $L_2(\pi)$, that is, $\mathbb{R}^n$ with the inner product
$$\langle z, w \rangle = \sum_{i=1}^n \pi_i z_i w_i.$$
If we denote by $x$ the vector $(x_1, x_2, \dots, x_n)^T$, we see that
$$\mathrm{LHS} = \sum_{i,j=1}^n \pi_i (A^t)_{ij} (x_i - x_j)^2 = \sum_{i,j=1}^n \pi_i (A^t)_{ij} (x_i^2 - 2x_i x_j + x_j^2) = \sum_{i=1}^n \pi_i x_i^2 - 2\langle A^t x, x\rangle + \sum_{i,j=1}^n \pi_j (A^t)_{ji}\, x_j^2$$
$$= 2\sum_{i=1}^n \pi_i x_i^2 - 2\langle A^t x, x\rangle = 2\big\langle (I - A^t)x,\, x\big\rangle.$$
Observe that we used that $A^t$ is stochastic, given that $A$ is, and that it is also reversible relative to $\pi$. On the other hand, setting $t = 1$, the same calculation implies that
$$\mathrm{RHS} = 2\big\langle (I - A)x,\, x\big\rangle.$$
Hence, we must prove the inequality
$$\big\langle (I - A^t)x,\, x\big\rangle \leq t\, \big\langle (I - A)x,\, x\big\rangle. \tag{$*$}$$
A simple calculation shows that the reversibility of $A$ is equivalent to the fact that $A$ is self-adjoint as an operator on $L_2(\pi)$, and thus diagonalizable with real eigenvalues. So it is enough to prove $(*)$ for eigenvectors $x \neq 0$: if $Ax = \lambda x$, we must show that
$$1 - \lambda^t \leq t(1 - \lambda).$$
To prove this we observe that, since $A$ is stochastic and reversible, for every $y \in \mathbb{R}^n$:
$$\|Ay\|_{L_1(\pi)} = \sum_{i=1}^n \pi_i |(Ay)_i| = \sum_{i=1}^n \pi_i \Big|\sum_{j=1}^n a_{ij} y_j\Big| \leq \sum_{i,j=1}^n \pi_i a_{ij} |y_j| = \sum_{i,j=1}^n \pi_j a_{ji} |y_j| = \sum_{j=1}^n \pi_j |y_j| = \|y\|_{L_1(\pi)}.$$
In particular, if $Ax = \lambda x$ with $x \neq 0$, then $|\lambda| \|x\|_{L_1(\pi)} \leq \|x\|_{L_1(\pi)}$, so $|\lambda| \leq 1$. Therefore
$$1 - \lambda^t = (1-\lambda)(1 + \lambda + \dots + \lambda^{t-1}) \leq t(1-\lambda),$$
which proves $(*)$ and completes the proof. $\Box$

A standard application of Lemma 11.2 is a lower bound on the Euclidean distortion of the Hamming cube $\mathbb{F}_2^n$: applying the Markov type $2$ inequality (132) to the standard random walk on the cube at time $t \asymp n$ yields $c_2(\mathbb{F}_2^n) \gtrsim \sqrt{n}$. Of course, this argument has the drawback that it does not compute the exact value of $c_2(\mathbb{F}_2^n)$ (as Enflo's proof did). However, it is much more robust, since it does not rely on the structure of the cube as a group. For example, one can use a similar argument to see that similar distortion estimates hold for arbitrary large subsets of the cube.
Now we present the dual notion of Markov type:
Definition 11.4. A metric space $(X,d)$ has metric Markov cotype $p \in (0,\infty)$ with constant $N \in (0,\infty)$ if for every $t, n \in \mathbb{N}$, every $\pi \in \Delta^{n-1}$, every stochastic matrix $A \in M_n(\mathbb{R})$ reversible relative to $\pi$ and every $x_1, \dots, x_n \in X$ there exist $y_1, \dots, y_n \in X$ such that
$$\sum_{i=1}^n \pi_i\, d(x_i, y_i)^p + t \sum_{i,j=1}^n \pi_i a_{ij}\, d(y_i, y_j)^p \leq N^p \sum_{i,j=1}^n \pi_i \Big(\frac{1}{t}\sum_{s=1}^t A^s\Big)_{ij}\, d(x_i, x_j)^p. \tag{133}$$
The infimum over $N$ such that this holds is the metric Markov cotype $p$ constant of $X$, denoted by $N_p(X)$.
Explanation. In an effort to dualize (i.e., reverse the inequalities in) the definition of Markov type, we would like to define Markov cotype by the inequality
$$\sum_{i,j=1}^n \pi_i (A^t)_{ij}\, d(x_i, x_j)^p \geq N^{-p}\, t \sum_{i,j=1}^n \pi_i a_{ij}\, d(x_i, x_j)^p.$$
One easily sees, though, that this cannot hold in any metric space $X$ which is not a singleton: consider the chain that alternates deterministically between two points, i.e., $n = 2$ and $a_{12} = a_{21} = 1$, for which $(A^t)_{ij} = \delta_{ij}$ when $t$ is even, so the left-hand side vanishes while the right-hand side grows linearly in $t$. Thus the existential quantifier added to the definition is needed for it to make sense. The fact that the power $A^t$ is replaced by the Cesàro average on the right-hand side is just for technical reasons.
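A short numerical illustration of this failure (the chain and distances are exactly the two-point example just described):

```python
import numpy as np

# Deterministic two-point chain: pi = (1/2, 1/2), a_12 = a_21 = 1.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
pi = np.array([0.5, 0.5])
D = np.array([[0.0, 1.0], [1.0, 0.0]])     # d(x_1, x_2) = 1
t = 10                                      # any even time
lhs = np.sum(pi[:, None] * np.linalg.matrix_power(A, t) * D)
rhs = t * np.sum(pi[:, None] * A * D)
print(lhs, rhs)                             # 0.0 versus 10.0
```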
We close this section by introducing a notion that will be useful in what follows:
Definition 11.5. Let $(X,d)$ be a metric space and $\mathcal{P}_X$ the space of all finitely supported probability measures on $X$, i.e., measures $\mu$ of the form
$$\mu = \sum_{i=1}^n \lambda_i \delta_{x_i},\qquad x_i \in X,\ \lambda_i \in [0,1] \ \text{ and } \ \sum_{i=1}^n \lambda_i = 1.$$
$(X,d)$ is called $W_p$-barycentric with constant $\Gamma \in (0,\infty)$ if there exists a map $B : \mathcal{P}_X \to X$ such that
(i) for every $x \in X$, $B(\delta_x) = x$, and
(ii) the following inequality holds true:
$$d\Big(B\Big(\sum_{i=1}^n \lambda_i \delta_{x_i}\Big),\, B\Big(\sum_{i=1}^n \lambda_i \delta_{y_i}\Big)\Big)^p \leq \Gamma^p \sum_{i=1}^n \lambda_i\, d(x_i, y_i)^p, \tag{134}$$
for every $x_1, \dots, x_n, y_1, \dots, y_n \in X$ and $\lambda_1, \dots, \lambda_n \in [0,1]$ with $\sum_{i=1}^n \lambda_i = 1$.
Remarks. (i) If a metric space (X, d) is Wp -barycentric, then it is also Wq -barycentric for any q > p
with the same constant.
(ii) Every Banach space is W1 -barycentric with constant 1.
11.2. Statement and proof of the theorem. The promised Lipschitz extension theorem is the following:
Theorem 11.6 (generalized Ball extension theorem, Mendel-Naor, 2013). Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces and $p \in (0,\infty)$ such that:
(i) $(X, d_X)$ has Markov type $p$;
(ii) $(Y, d_Y)$ is $W_p$-barycentric with constant $\Gamma$ and has metric Markov cotype $p$.
Consider a subset $Z \subseteq X$ and a Lipschitz function $f : Z \to Y$. Then, for every finite subset $S \subseteq X$ there exists a function $F \equiv F^S : S \to Y$ such that:
(a) $F|_{S \cap Z} = f|_{S \cap Z}$ and
(b) $\|F\|_{\mathrm{Lip}} \lesssim \Gamma M_p(X) N_p(Y) \|f\|_{\mathrm{Lip}}$.
In most reasonable cases, this finite extension result yields a global extension result. We illustrate this fact by the following useful example:
Corollary 11.7. Let $X$ and $Y$ be Banach spaces satisfying (i) and (ii) of the previous theorem, such that $Y$ is also reflexive. Then $e(X,Y) \lesssim M_p(X) N_p(Y)$.
Proof. Consider a subset $Z \subseteq X$ and a Lipschitz function $f : Z \to Y$; after a translation we may assume that $0 \in Z$ and $f(0) = 0$. By the previous theorem, for every finite set $S \subseteq X$ with $0 \in S$ there exists a function $F^S : S \to Y$ which agrees with $f$ on $S \cap Z$ and satisfies
$$\|F^S\|_{\mathrm{Lip}} \lesssim M_p(X) N_p(Y) \|f\|_{\mathrm{Lip}} \stackrel{\mathrm{def}}{=} K.$$
For every such $S$, define the vector $b^S = (b^S_x)_{x \in X} \in \prod_{x \in X} \big(K\|x\|\, B_Y\big) \stackrel{\mathrm{def}}{=} \mathcal{X}$ by
$$b^S_x = \begin{cases} F^S(x), & x \in S, \\ 0, & x \notin S; \end{cases}$$
indeed, $\|F^S(x)\| = \|F^S(x) - F^S(0)\| \lesssim K\|x\|$. Since $Y$ is reflexive, $\mathcal{X}$ is compact in the product of the weak topologies, and thus the net $(b^S)_{S \subseteq X \text{ finite}}$ has a limit point $b \in \mathcal{X}$. Define now $F : X \to Y$ by $F(x) = b_x$ for $x \in X$. Obviously $F$ extends $f$, and for $x, y \in X$:
$$\|F(x) - F(y)\| = \|b_x - b_y\| \leq \limsup_S \|F^S(x) - F^S(y)\| \lesssim K\|x - y\|,$$
where we used the weak lower semicontinuity of the norm along a subnet converging weakly to $b$. That is, $\|F\|_{\mathrm{Lip}} \lesssim K$. $\Box$
We will now proceed with the proof of the theorem. In the sections that follow, we will prove that many Banach spaces and metric spaces satisfy conditions (i) and (ii) above, and thus we will obtain concrete extension results.
The key lemma for the proof of Theorem 11.6 is the following:
Lemma 11.8 (Dual extension criterion). Let $(X, d_X)$ and $(Y, d_Y)$ be two metric spaces such that $Y$ is $W_p$-barycentric with constant $\Gamma$, let $Z \subseteq X$, $f : Z \to Y$ and $\varepsilon \in (0,1)$. Suppose that there exists a constant $K > 0$ such that for every $n \in \mathbb{N}$, every $x_1, \dots, x_n \in X$ and every symmetric matrix $H = (h_{ij}) \in M_n(\mathbb{R})$ with nonnegative entries there exists a function $\Phi_H : \{x_1, \dots, x_n\} \to Y$ such that the following hold:
(i) $\Phi_H|_{\{x_1,\dots,x_n\} \cap Z} = f|_{\{x_1,\dots,x_n\} \cap Z}$;
(ii)
$$\sum_{i,j=1}^n h_{ij}\, d_Y\big(\Phi_H(x_i), \Phi_H(x_j)\big)^p \leq K^p \|f\|_{\mathrm{Lip}}^p \sum_{i,j=1}^n h_{ij}\, d_X(x_i, x_j)^p. \tag{135}$$
Then, for every $n \in \mathbb{N}$ and every $x_1, \dots, x_n \in X$, there exists a function $F : \{x_1, \dots, x_n\} \to Y$ such that $F|_{\{x_1,\dots,x_n\} \cap Z} = f|_{\{x_1,\dots,x_n\} \cap Z}$ (property (a)) and, for all $i, j$,
$$d_Y\big(F(x_i), F(x_j)\big) \leq (1+\varepsilon)\, \Gamma K \|f\|_{\mathrm{Lip}}\, d_X(x_i, x_j) \qquad \text{(property (b))}.$$
Proof. Fix $x_1, \dots, x_n \in X$ and consider the sets of matrices
$$C \stackrel{\mathrm{def}}{=} \Big\{ \big(d_Y(\Phi(x_i), \Phi(x_j))^p\big)_{i,j=1}^n :\ \Phi : \{x_1,\dots,x_n\} \to Y \ \text{ with } \ \Phi|_{\{x_1,\dots,x_n\} \cap Z} = f|_{\{x_1,\dots,x_n\} \cap Z} \Big\}$$
and
$$D \stackrel{\mathrm{def}}{=} \big\{ M = (m_{ij}) \in M_n(\mathbb{R}) :\ M \text{ symmetric and } m_{ij} \geq 0 \big\},$$
and denote by $E$ the closed convex hull of $C + D$. Observe that $E$ is a closed convex set of symmetric matrices. Consider the matrix $T = (t_{ij}) \in M_n(\mathbb{R})$ with $t_{ij} = K^p \|f\|_{\mathrm{Lip}}^p\, d_X(x_i, x_j)^p$.
Claim. It suffices to prove that $T \in E$.
Indeed, if $T \in E$, there exist $\lambda_1, \dots, \lambda_m \in [0,1]$ adding up to $1$ and functions $\Phi_1, \dots, \Phi_m : \{x_1, \dots, x_n\} \to Y$, each agreeing with $f$ on $\{x_1, \dots, x_n\} \cap Z$, such that
$$(1+\varepsilon)^p\, t_{ij} = (1+\varepsilon)^p K^p \|f\|_{\mathrm{Lip}}^p\, d_X(x_i, x_j)^p \geq \sum_{k=1}^m \lambda_k\, d_Y\big(\Phi_k(x_i), \Phi_k(x_j)\big)^p \qquad \text{for every } i, j.$$
Now for $1 \leq i \leq n$ define the measure $\mu_i = \sum_{k=1}^m \lambda_k \delta_{\Phi_k(x_i)}$ and the function $F : \{x_1, \dots, x_n\} \to Y$ by $F(x_i) = B(\mu_i)$, where $B$ is the barycenter map. Notice that $F$ indeed extends $f$, and also, from the $W_p$-barycentricity of $Y$,
$$d_Y\big(F(x_i), F(x_j)\big)^p \leq \Gamma^p \sum_{k=1}^m \lambda_k\, d_Y\big(\Phi_k(x_i), \Phi_k(x_j)\big)^p \leq (1+\varepsilon)^p \Gamma^p K^p \|f\|_{\mathrm{Lip}}^p\, d_X(x_i, x_j)^p,$$
which is the inequality (b). $\Box$
Proof of $T \in E$. Suppose, for contradiction, that $T \notin E$. Then, by the separation theorem, there exists a symmetric matrix $H = (h_{ij}) \in M_n(\mathbb{R})$ so that
$$\inf_{M = (m_{ij}) \in E}\ \sum_{i,j=1}^n h_{ij} m_{ij} > \sum_{i,j=1}^n h_{ij} t_{ij}.$$
Since $D$ is a cone, adding to any element of $E$ a symmetric matrix with arbitrarily large nonnegative entries stays in $E$; the finiteness of the infimum therefore forces $h_{ij} \geq 0$ for all $i, j$. Since $C \subseteq E$, for every admissible $\Phi$ and $(c_{ij}) = \big(d_Y(\Phi(x_i), \Phi(x_j))^p\big) \in C$:
$$\sum_{i,j} h_{ij}\, d_Y\big(\Phi(x_i), \Phi(x_j)\big)^p > \sum_{i,j} h_{ij} t_{ij} = K^p \|f\|_{\mathrm{Lip}}^p \sum_{i,j} h_{ij}\, d_X(x_i, x_j)^p,$$
which contradicts assumption (135) applied to the matrix $H$. $\Box$
To finish the proof of Ball's theorem, we will also need the following technical lemma, in which $D_\pi \in M_n(\mathbb{R})$ denotes the diagonal matrix whose diagonal entries are $\pi_1, \dots, \pi_n$:
Lemma 11.9 (Approximate convexity lemma). Fix $m, n \in \mathbb{N}$ and $p \in [1,\infty)$, and let $B = (b_{ir}) \in M_{n \times m}(\mathbb{R})$ and $C \in M_n(\mathbb{R})$ be (row) stochastic matrices such that $C$ is reversible relative to some $\pi \in \Delta^{n-1}$. Then for every metric space $(X, d_X)$ and every $z_1, \dots, z_m \in X$ there exist $w_1, \dots, w_n \in X$ such that
$$\max\Big\{ \sum_{i=1}^n \sum_{r=1}^m \pi_i b_{ir}\, d_X(w_i, z_r)^p,\ \sum_{i,j=1}^n \pi_i c_{ij}\, d_X(w_i, w_j)^p \Big\} \leq 3^p \sum_{r,s=1}^m (B^* D_\pi C B)_{rs}\, d_X(z_r, z_s)^p. \tag{136}$$
Since the terms $v_{ii}$ play no role in the above inequality, we can assume that $v_{ii} = 0$ for every $1 \leq i \leq n$. Then, for large enough $t > 0$, there exist $\theta > 0$ and $\pi \in \Delta^{n-1}$ such that
$$(\bullet)\quad w_{ir} = \theta \pi_i b_{ir} \ \text{ for every } i, r \qquad\text{and}\qquad (\bullet\bullet)\quad v_{ij} = \theta t \pi_i a_{ij} \ \text{ for every } i \neq j,$$
where $B = (b_{ir}) \in M_{n \times m}(\mathbb{R})$ and $A = (a_{ij}) \in M_n(\mathbb{R})$ are stochastic matrices with $A$ reversible relative to $\pi$. Indeed, $(\bullet)$ and $(\bullet\bullet)$ hold for
$$\theta \stackrel{\mathrm{def}}{=} \sum_{i=1}^n \sum_{r=1}^m w_{ir},\qquad \pi_i \stackrel{\mathrm{def}}{=} \frac{\sum_{r=1}^m w_{ir}}{\theta},\qquad b_{ir} \stackrel{\mathrm{def}}{=} \frac{w_{ir}}{\sum_{s=1}^m w_{is}}$$
and
$$a_{ii} \stackrel{\mathrm{def}}{=} 1 - \frac{1}{t} \cdot \frac{\sum_{j=1}^n v_{ij}}{\sum_{r=1}^m w_{ir}},\qquad a_{ij} \stackrel{\mathrm{def}}{=} \frac{1}{t} \cdot \frac{v_{ij}}{\sum_{r=1}^m w_{ir}} \ \text{ if } i \neq j.$$
Thus, after this change of parameters,
$$\mathrm{LHS} = \theta\Big( 2\sum_{i=1}^n \sum_{r=1}^m \pi_i b_{ir}\, d_Y(y_i, f(z_r))^p + t \sum_{i,j=1}^n \pi_i a_{ij}\, d_Y(y_i, y_j)^p \Big).$$
Denote $\tau = \lceil t/2^p \rceil$ and $C_\tau(A) = \frac{1}{\tau} \sum_{s=1}^\tau A^s$. Since $C_\tau(A)$ is a stochastic matrix reversible relative to $\pi$, the approximate convexity lemma guarantees the existence of $w_1, \dots, w_n \in Y$ such that
$$\max\Big\{ \sum_{i=1}^n \sum_{r=1}^m \pi_i b_{ir}\, d_Y(w_i, f(z_r))^p,\ \sum_{i,j=1}^n \pi_i (C_\tau(A))_{ij}\, d_Y(w_i, w_j)^p \Big\} \leq 3^p \sum_{r,s=1}^m (B^* D_\pi C_\tau(A) B)_{rs}\, d_Y(f(z_r), f(z_s))^p. \tag{$\dagger$}$$
Now, from the definition of metric Markov cotype, there exist $y_1, \dots, y_n \in Y$ such that
$$\sum_{i=1}^n \pi_i\, d_Y(w_i, y_i)^p + \tau \sum_{i,j=1}^n \pi_i a_{ij}\, d_Y(y_i, y_j)^p \leq N^p \sum_{i,j=1}^n \pi_i (C_\tau(A))_{ij}\, d_Y(w_i, w_j)^p. \tag{$\ddagger$}$$
We will prove that $y_1, \dots, y_n$ work for large enough $t$. For the first term, since
$$d_Y(y_i, f(z_r))^p \leq 2^{p-1}\big( d_Y(y_i, w_i)^p + d_Y(w_i, f(z_r))^p \big),$$
we get (using $\sum_{r=1}^m b_{ir} = 1$)
$$2\sum_{i=1}^n \sum_{r=1}^m \pi_i b_{ir}\, d_Y(y_i, f(z_r))^p \leq 2^p \sum_{i=1}^n \pi_i\, d_Y(y_i, w_i)^p + 2^p \sum_{i=1}^n \sum_{r=1}^m \pi_i b_{ir}\, d_Y(w_i, f(z_r))^p.$$
Hence, using $t \leq 2^p \tau$,
$$\frac{\mathrm{LHS}}{\theta} \leq 2^p \sum_{i=1}^n \pi_i\, d_Y(y_i, w_i)^p + t \sum_{i,j=1}^n \pi_i a_{ij}\, d_Y(y_i, y_j)^p + 2^p \sum_{i=1}^n \sum_{r=1}^m \pi_i b_{ir}\, d_Y(w_i, f(z_r))^p$$
$$\stackrel{(\ddagger)}{\leq} (2N)^p \sum_{i,j=1}^n \pi_i (C_\tau(A))_{ij}\, d_Y(w_i, w_j)^p + 2^p \sum_{i=1}^n \sum_{r=1}^m \pi_i b_{ir}\, d_Y(w_i, f(z_r))^p$$
$$\stackrel{(\dagger)}{\leq} 6^p (N^p + 1) \sum_{r,s=1}^m (B^* D_\pi C_\tau(A) B)_{rs}\, d_Y(f(z_r), f(z_s))^p \leq 6^p (N^p + 1) \|f\|_{\mathrm{Lip}}^p \sum_{r,s=1}^m (B^* D_\pi C_\tau(A) B)_{rs}\, d_X(z_r, z_s)^p.$$
Observe now that the last quantity depends only on the metric space $(X, d_X)$ and the data points, no longer on $Y$. Using the identity
$$(B^* D_\pi C_\tau(A) B)_{rs} = \sum_{i,j=1}^n \pi_i b_{ir} b_{js}\, (C_\tau(A))_{ij},$$
together with $\sum_{r=1}^m b_{ir} = \sum_{s=1}^m b_{js} = 1$ and the Markov type $p$ inequality (131), we get
$$S_2 \stackrel{\mathrm{def}}{=} \sum_{i,j=1}^n \sum_{r,s=1}^m \pi_i b_{ir} b_{js}\, (C_\tau(A))_{ij}\, d_X(x_i, x_j)^p = \sum_{i,j=1}^n \pi_i (C_\tau(A))_{ij}\, d_X(x_i, x_j)^p = \frac{1}{\tau} \sum_{\sigma=1}^\tau \sum_{i,j=1}^n \pi_i (A^\sigma)_{ij}\, d_X(x_i, x_j)^p$$
$$\leq \frac{1}{\tau} \sum_{\sigma=1}^\tau \sigma M^p \sum_{i,j=1}^n \pi_i a_{ij}\, d_X(x_i, x_j)^p = \frac{\tau(\tau+1)}{2\tau}\, M^p \sum_{i,j=1}^n \pi_i a_{ij}\, d_X(x_i, x_j)^p = \frac{\tau+1}{2}\, M^p \sum_{i,j=1}^n \frac{v_{ij}}{\theta t}\, d_X(x_i, x_j)^p$$
$$\leq \frac{1}{\theta}\, M^p \sum_{i,j=1}^n v_{ij}\, d_X(x_i, x_j)^p.$$
Putting everything together, $(*)$ has been proven and the theorem follows. $\Box$
Proof of the approximate convexity lemma. Let $f : \{z_1, \dots, z_m\} \to \ell_\infty$ be any isometric embedding and define
$$y_i = \sum_{r=1}^m b_{ir} f(z_r) \in \ell_\infty,\qquad i = 1, 2, \dots, n.$$
For each $i$, choose $w_i \in \{z_1, \dots, z_m\}$ so that $\|y_i - f(w_i)\|_\infty$ is minimal. By the triangle inequality and the convexity of $u \mapsto u^p$,
$$d_X(w_i, w_j)^p = \|f(w_i) - f(w_j)\|_\infty^p \leq 3^{p-1}\big( \|f(w_i) - y_i\|_\infty^p + \|y_i - y_j\|_\infty^p + \|y_j - f(w_j)\|_\infty^p \big),$$
so, multiplying by $\pi_i c_{ij}$, summing and using the reversibility of $C$,
$$\sum_{i,j=1}^n \pi_i c_{ij}\, d_X(w_i, w_j)^p \leq 3^{p-1} \sum_{i,j=1}^n \pi_i c_{ij}\, \|y_i - y_j\|_\infty^p + 2 \cdot 3^{p-1} \sum_{i=1}^n \pi_i\, \|y_i - f(w_i)\|_\infty^p.$$
Now, we compute:
$$\sum_{i,j=1}^n \pi_i c_{ij}\, \|y_i - y_j\|_\infty^p = \sum_{i,j=1}^n \pi_i c_{ij}\, \Big\| \sum_{r,s=1}^m b_{ir} b_{js} \big( f(z_r) - f(z_s) \big) \Big\|_\infty^p \leq \sum_{i,j=1}^n \sum_{r,s=1}^m \pi_i c_{ij} b_{ir} b_{js}\, \|f(z_r) - f(z_s)\|_\infty^p = \sum_{r,s=1}^m (B^* D_\pi C B)_{rs}\, d_X(z_r, z_s)^p,$$
and since $\|y_i - f(w_i)\|_\infty^p \leq \|y_i - f(z_r)\|_\infty^p$ for every $i, r$, and $\sum_{r=1}^m (CB)_{ir} = 1$ for every $i$, we have
$$\sum_{i=1}^n \pi_i\, \|y_i - f(w_i)\|_\infty^p = \sum_{i=1}^n \sum_{r=1}^m \pi_i (CB)_{ir}\, \|y_i - f(w_i)\|_\infty^p \leq \sum_{i=1}^n \sum_{r=1}^m \pi_i (CB)_{ir}\, \|y_i - f(z_r)\|_\infty^p$$
$$= \sum_{i=1}^n \sum_{r=1}^m \pi_i (CB)_{ir}\, \Big\| \sum_{s=1}^m b_{is} \big( f(z_s) - f(z_r) \big) \Big\|_\infty^p \leq \sum_{i=1}^n \sum_{r,s=1}^m \pi_i (CB)_{ir} b_{is}\, \|f(z_s) - f(z_r)\|_\infty^p = \sum_{r,s=1}^m (B^* D_\pi C B)_{rs}\, d_X(z_r, z_s)^p.$$
Since $3^{p-1} + 2 \cdot 3^{p-1} = 3^p$, this bounds the second term in the maximum in (136); the first term is bounded analogously, and the lemma is proven. $\Box$
12. Uniform convexity and uniform smoothness
In this section we present the classes of uniformly convex and uniformly smooth Banach spaces, which
we will later relate to Markov type and cotype. Furthermore, we examine these properties in Lp spaces.
12.1. Definitions and basic properties. First, we will survey some general results about uniform
convexity and uniform smoothness (omitting a few proofs). The relation between these notions and the
calculation of Markov type and cotype will appear in the next section.
Definition 12.1. Let $(X, \|\cdot\|)$ be a Banach space with $\dim X \geq 2$. The modulus of uniform convexity of $X$ is the function
$$\delta_X(\varepsilon) = \inf\Big\{ 1 - \Big\|\frac{x+y}{2}\Big\| :\ \|x\| = \|y\| = 1 \ \text{ and } \ \|x - y\| = \varepsilon \Big\},\qquad \varepsilon \in (0, 2]. \tag{137}$$
We call $X$ uniformly convex if $\delta_X(\varepsilon) > 0$ for every $\varepsilon \in (0,2]$.
Examples 12.2. (i) Any Hilbert space $H$ is uniformly convex. Indeed, by the parallelogram identity, if $\|x\| = \|y\| = 1$ and $\|x - y\| = \varepsilon$, then
$$\|x + y\|^2 = 4 - \|x - y\|^2 = 4 - \varepsilon^2,$$
thus $\delta_H(\varepsilon) = 1 - \sqrt{1 - \frac{\varepsilon^2}{4}}$.
(ii) Neither $\ell_1$ nor $\ell_\infty$ is uniformly convex, since their unit spheres contain line segments.
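The moduli of the spaces $\ell_p$ interpolate between these extremes. A crude numerical sketch (brute-force sampling in the plane; the sample size and tolerance are ad hoc choices) estimates $\delta_{\ell_p}(\varepsilon)$ for $\ell_p^2$ and compares it with the Hilbertian modulus:

```python
import numpy as np

def lp_norm(v, p):
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

def delta_lp(eps, p, samples=200_000, tol=1e-3):
    """Crude estimate of (137) for l_p^2 by random sampling of unit vectors."""
    rng = np.random.default_rng(0)
    best = 1.0
    for _ in range(samples):
        x, y = rng.normal(size=2), rng.normal(size=2)
        x, y = x / lp_norm(x, p), y / lp_norm(y, p)    # ||x|| = ||y|| = 1
        if abs(lp_norm(x - y, p) - eps) < tol:         # ||x - y|| close to eps
            best = min(best, 1.0 - lp_norm((x + y) / 2.0, p))
    return best

eps = 1.0
for p in (1.5, 2.0, 4.0):
    print(p, delta_lp(eps, p), 1 - np.sqrt(1 - eps ** 2 / 4))
```

The estimates are consistent with Proposition 12.4 below: no modulus exceeds the Hilbertian one (up to constants).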
An elementary, yet tedious to prove, related result is the following:
Proposition 12.3 (Figiel). For any Banach space $(X, \|\cdot\|)$, the mapping $\varepsilon \mapsto \delta_X(\varepsilon)/\varepsilon$ is increasing.
Using Dvoretzky's theorem, one can also deduce that:
Proposition 12.4. For any Banach space $(X, \|\cdot\|)$ the following inequality holds:
$$\delta_X(\varepsilon) \lesssim 1 - \sqrt{1 - \frac{\varepsilon^2}{4}} = \delta_{\ell_2}(\varepsilon). \tag{138}$$
Definition 12.5. Let $(X, \|\cdot\|)$ be a Banach space. The modulus of uniform smoothness of $X$ is the function
$$\rho_X(\tau) = \sup\Big\{ \frac{\|x + \tau y\| + \|x - \tau y\|}{2} - 1 :\ \|x\| = \|y\| = 1 \Big\},\qquad \tau > 0. \tag{139}$$
We call $X$ uniformly smooth if $\rho_X(\tau) = o(\tau)$ as $\tau \to 0^+$.
Example 12.6. Again, $\ell_1$ is not uniformly smooth, since for $x = e_1$ and $y = e_2$
$$\frac{\|x + \tau y\|_1 + \|x - \tau y\|_1}{2} - 1 = \tau.$$
The notions of uniform convexity and uniform smoothness are, in some sense, in duality, as shown by the following theorem.
Theorem 12.7 (Lindenstrauss duality formulas). For a Banach space $X$, the following identities hold true:
$$\rho_{X^*}(\tau) = \sup\Big\{ \frac{\tau\varepsilon}{2} - \delta_X(\varepsilon) :\ \varepsilon > 0 \Big\} \tag{140}$$
and
$$\rho_X(\tau) = \sup\Big\{ \frac{\tau\varepsilon}{2} - \delta_{X^*}(\varepsilon) :\ \varepsilon > 0 \Big\}. \tag{141}$$
Proof. Left as an exercise.
We also mention a general result on uniformly convex and uniformly smooth Banach spaces, whose
proof we omit:
Theorem 12.8 (Milman-Pettis). If a Banach space X is either uniformly convex or uniformly smooth,
then X is reflexive.
Finally, uniform convexity and uniform smoothness admit some quantitative analogues:
Definition 12.9. Let $(X, \|\cdot\|)$ be a Banach space, $p \in (1,2]$ and $q \in [2,\infty)$. We say that $X$ is $p$-smooth if there exists a constant $C > 0$ with $\rho_X(\tau) \leq C\tau^p$. Similarly, we say that $X$ is $q$-convex if there exists a constant $C > 0$ with $\delta_X(\varepsilon) \geq C\varepsilon^q$.
Even though dealing with the class of $p$-smooth (resp. $q$-convex) Banach spaces seems restrictive at first, the following deep theorem of Pisier shows that it is actually not.
Theorem 12.10 (Pisier, 1975). If $X$ is a uniformly convex (resp. smooth) Banach space, then it admits an equivalent norm with respect to which it is $q$-convex (resp. $p$-smooth) for some $q < \infty$ (resp. $p > 1$).
Definition 12.11. Let X be a Banach space. The q-convexity constant of X, denoted by Kq (X), is the
infimum of those K > 1 such that
2
(142) 2kxkq + q kykq 6 kx + ykq + kx − ykq , for every x, y ∈ X.
K
Similarly, the p-smoothness constant of X, denoted by Sp (X), is the infimum of those S > 1 such that
(143) kx + ykp + kx − ykp 6 2kxkp + 2S p kykp , for every x, y ∈ X.
It can easily be checked that X is q-convex (resp. p-smooth) if and only if Kq (X) < ∞ (resp.
Sp (X) < ∞).
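For instance, in a Hilbert space the parallelogram identity gives equality in (143) with $p = 2$ and $S = 1$ (and likewise in (142) with $q = 2$, $K = 1$); a one-line numerical sanity check:

```python
import numpy as np

rng = np.random.default_rng(2)
x, y = rng.normal(size=5), rng.normal(size=5)
lhs = np.linalg.norm(x + y) ** 2 + np.linalg.norm(x - y) ** 2
rhs = 2 * np.linalg.norm(x) ** 2 + 2 * np.linalg.norm(y) ** 2
print(np.isclose(lhs, rhs))   # True: the parallelogram identity
```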
Remark. Setting $x = \frac{a+b}{2}$ and $y = \frac{a-b}{2}$, we see that the inequalities (142) and (143) are equivalent to
$$\Big\|\frac{a+b}{2}\Big\|^q + \frac{1}{K_q(X)^q}\Big\|\frac{a-b}{2}\Big\|^q \leq \frac{\|a\|^q + \|b\|^q}{2},\qquad \text{for every } a, b \in X \tag{144}$$
and
$$\frac{\|a\|^p + \|b\|^p}{2} \leq \Big\|\frac{a+b}{2}\Big\|^p + S_p(X)^p\Big\|\frac{a-b}{2}\Big\|^p \tag{145}$$
respectively.
An important fact (already hinted at by Lindenstrauss' formulas above) is that there is a perfect duality between uniform convexity and smoothness. This is formulated as follows:
Theorem 12.12. Let $X$ be a Banach space. Then $S_p(X) = K_q(X^*)$, where $p$ and $q$ are conjugate indices, i.e., $\frac{1}{p} + \frac{1}{q} = 1$. Since any Banach space that is either $q$-convex or $p$-smooth is reflexive, this also gives $K_q(X) = S_p(X^*)$.
Proof. Firstly, we prove that $S_p(X) \leq K_q(X^*)$. Let $K > K_q(X^*)$ and $x, y \in X$. We will prove that
$$\|x+y\|^p + \|x-y\|^p \leq 2\|x\|^p + 2K^p\|y\|^p.$$
Consider $f, g \in S_{X^*}$ such that $f(x+y) = \|x+y\|$ and $g(x-y) = \|x-y\|$, and define $F, G \in X^*$ by
$$F = \frac{2^{1/q}\|x+y\|^{p-1}}{\big(\|x+y\|^p + \|x-y\|^p\big)^{1/q}} \cdot f \qquad\text{and}\qquad G = \frac{2^{1/q}\|x-y\|^{p-1}}{\big(\|x+y\|^p + \|x-y\|^p\big)^{1/q}} \cdot g.$$
Now, observe that
$$\frac{F(x+y) + G(x-y)}{2} = \frac{2^{1/q}\big(\|x+y\|^p + \|x-y\|^p\big)}{2\big(\|x+y\|^p + \|x-y\|^p\big)^{1/q}} = \Big(\frac{\|x+y\|^p + \|x-y\|^p}{2}\Big)^{1/p}.$$
Thus:
$$\Big(\frac{\|x+y\|^p + \|x-y\|^p}{2}\Big)^{1/p} = \frac{F+G}{2}(x) + \frac{F-G}{2}(y) \leq \Big\|\frac{F+G}{2}\Big\|\,\|x\| + \Big\|\frac{F-G}{2}\Big\|\,\|y\|$$
$$\leq \Big( \Big\|\frac{F+G}{2}\Big\|^q + \frac{1}{K^q}\Big\|\frac{F-G}{2}\Big\|^q \Big)^{1/q} \big( \|x\|^p + K^p\|y\|^p \big)^{1/p} \stackrel{(144)}{\leq} \Big(\frac{\|F\|^q + \|G\|^q}{2}\Big)^{1/q} \big( \|x\|^p + K^p\|y\|^p \big)^{1/p} = \big( \|x\|^p + K^p\|y\|^p \big)^{1/p},$$
where the second inequality is Hölder's inequality and the last equality holds by the construction of $F$ and $G$, since $\frac{\|F\|^q + \|G\|^q}{2} = \frac{\|x+y\|^p + \|x-y\|^p}{\|x+y\|^p + \|x-y\|^p} = 1$.
Now, instead of proving that $K_q(X^*) \leq S_p(X)$, we will prove that $K_q(X) \leq S_p(X^*)$, which is equivalent by the reflexivity of $X$. Let $S > S_p(X^*)$ and $x, y \in X$. We will prove that
$$2\|x\|^q + \frac{2}{S^q}\|y\|^q \leq \|x+y\|^q + \|x-y\|^q.$$
Again, take $f, g \in S_{X^*}$ so that $f(x) = \|x\|$ and $g(y) = \|y\|$, and define $F, G \in X^*$ by
$$F = \frac{\|x\|^{q-1}}{2\big(\|x\|^q + \frac{1}{S^q}\|y\|^q\big)^{1/p}} \cdot f \qquad\text{and}\qquad G = \frac{\|y\|^{q-1}}{2S^q\big(\|x\|^q + \frac{1}{S^q}\|y\|^q\big)^{1/p}} \cdot g.$$
For these functionals,
$$\Big(\|x\|^q + \frac{1}{S^q}\|y\|^q\Big)^{1/q} = (F+G)(x+y) + (F-G)(x-y) \leq \|F+G\|\,\|x+y\| + \|F-G\|\,\|x-y\|$$
$$\leq \big(\|F+G\|^p + \|F-G\|^p\big)^{1/p} \big(\|x+y\|^q + \|x-y\|^q\big)^{1/q} \stackrel{(143)}{\leq} \big(2\|F\|^p + 2S^p\|G\|^p\big)^{1/p} \big(\|x+y\|^q + \|x-y\|^q\big)^{1/q}$$
$$= \Big(\frac{\|x+y\|^q + \|x-y\|^q}{2}\Big)^{1/q},$$
where the last equality holds by the construction of $F$ and $G$ (a direct computation gives $2\|F\|^p + 2S^p\|G\|^p = 2^{1-p}$). This is the desired inequality. $\Box$
12.2. Smoothness and convexity in $L_p$ spaces. Now we will compute the smoothness and convexity parameters of $L_p$ spaces. The duality given by the last theorem of the previous section allows us to compute only the convexity constants (or only the smoothness constants). Observe that $L_1$ and $L_\infty$ are neither uniformly smooth nor uniformly convex, since they are not reflexive. The easier part of the computation is the following:
Theorem 12.13 (Clarkson's inequality). For $1 < p \leq 2$ it holds that $S_p(L_p) = 1$. Equivalently, for $2 \leq q < \infty$ it holds that $K_q(L_q) = 1$.
Proof. We will prove the second version of the statement; let $q \geq 2$. Since the desired inequality (144) contains only expressions of the form $\|\cdot\|_q^q$, it suffices to prove it for real numbers (and then integrate), i.e., it suffices to prove
$$\Big|\frac{a+b}{2}\Big|^q + \Big|\frac{a-b}{2}\Big|^q \leq \frac{|a|^q + |b|^q}{2},\qquad \text{for every } a, b \in \mathbb{R}.$$
To prove this, we first use the inequality $\|\cdot\|_{\ell_q} \leq \|\cdot\|_{\ell_2}$ to get
$$\Big( \Big|\frac{a+b}{2}\Big|^q + \Big|\frac{a-b}{2}\Big|^q \Big)^{1/q} \leq \Big( \Big|\frac{a+b}{2}\Big|^2 + \Big|\frac{a-b}{2}\Big|^2 \Big)^{1/2} = \Big(\frac{a^2 + b^2}{2}\Big)^{1/2}.$$
Now the convexity of $t \mapsto |t|^{q/2}$ gives
$$\Big|\frac{a+b}{2}\Big|^q + \Big|\frac{a-b}{2}\Big|^q \leq \Big(\frac{a^2 + b^2}{2}\Big)^{q/2} \leq \frac{|a|^q + |b|^q}{2},$$
which implies that $K_q(L_q) = 1$. $\Box$
The above result says that for $1 < p \leq 2$, $L_p$ is $p$-smooth, and for $2 \leq q < \infty$, $L_q$ is $q$-convex. The converse situation is described by the following theorem:
Theorem 12.14. For $1 < p \leq 2$ it holds that $K_2(L_p) \leq \frac{1}{\sqrt{p-1}}$. Equivalently, for $2 \leq q < \infty$ it holds that $S_2(L_q) \leq \sqrt{q-1}$.
Proof. Using the formulation (144) of (142), we have to prove that for every $f, g \in L_p$, where $1 < p \leq 2$,
$$\|f\|_p^2 + (p-1)\|g\|_p^2 \leq \frac{\|f+g\|_p^2 + \|f-g\|_p^2}{2}.$$
In order to prove this, we will need two very useful inequalities:
Proposition 12.15 (Bonami-Beckner two-point inequality). Let $a, b \in \mathbb{R}$ and $1 \leq p \leq 2$. Then
$$\big( a^2 + (p-1) b^2 \big)^{1/2} \leq \Big( \frac{|a+b|^p + |a-b|^p}{2} \Big)^{1/p}. \tag{146}$$
Proof. It is enough to prove the inequality when $|a| \geq |b|$, because otherwise
$$a^2 + (p-1)b^2 \leq b^2 + (p-1)a^2,$$
and the right-hand side of (146) is symmetric in $a$ and $b$. Let $x = \frac{b}{a} \in [-1, 1]$. We have to prove that
$$\big( 1 + (p-1)x^2 \big)^{p/2} \leq \frac{(1+x)^p + (1-x)^p}{2}.$$
Using the Taylor expansion, we see that
$$\frac{(1+x)^p + (1-x)^p}{2} = \sum_{k=0}^\infty \binom{p}{2k}\, x^{2k},$$
where $\binom{p}{s} = \frac{p(p-1)\cdots(p-s+1)}{s!}$. Observe that, since only even terms appear, all the coefficients $\binom{p}{2k}$ are non-negative. Thus
$$\frac{(1+x)^p + (1-x)^p}{2} \geq 1 + \frac{p(p-1)}{2}\, x^2 \geq \big( 1 + (p-1)x^2 \big)^{p/2},$$
where we used the general inequality $1 + \alpha y \geq (1+y)^\alpha$, which holds for $0 \leq \alpha \leq 1$ and $y \geq 0$. $\Box$
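A quick numerical sanity check of (146) on random inputs (the sample size and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(100_000):
    p = rng.uniform(1.0, 2.0)
    a, b = rng.normal(size=2) * 10
    lhs = np.sqrt(a * a + (p - 1) * b * b)
    rhs = ((abs(a + b) ** p + abs(a - b) ** p) / 2) ** (1 / p)
    assert lhs <= rhs + 1e-9
print("the two-point inequality (146) holds on all sampled instances")
```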
Proposition 12.16 (Hanner's inequality). For every $f, g \in L_p$ and $1 \leq p \leq 2$ the following inequality holds true:
$$\big| \|f\|_p - \|g\|_p \big|^p + \big( \|f\|_p + \|g\|_p \big)^p \leq \|f+g\|_p^p + \|f-g\|_p^p. \tag{147}$$
For the proof of Hanner's inequality we will need the following lemma.
Lemma. For $r \in (0,1]$ define
$$\alpha(r) = (1+r)^{p-1} + (1-r)^{p-1} \qquad\text{and}\qquad \beta(r) = \frac{(1+r)^{p-1} - (1-r)^{p-1}}{r^{p-1}}.$$
Then, for every $A, B \in \mathbb{R}$ and $r \in (0,1]$,
$$\alpha(r)|A|^p + \beta(r)|B|^p \leq |A+B|^p + |A-B|^p. \tag{148}$$
Proof. We first claim that $\alpha(r) \geq \beta(r)$ for every $r \in (0,1]$. Consider $h = \alpha - \beta$ and observe that $h(1) = 0$, so it suffices to prove that $h' \leq 0$. Indeed, a direct computation gives
$$h'(r) = -(p-1)\Big( 1 + \frac{1}{r^p} \Big)\Big( \frac{1}{(1-r)^{2-p}} - \frac{1}{(1+r)^{2-p}} \Big) \leq 0.$$
Again, it suffices to prove the lemma when $0 < B \leq A$: by the claim, swapping $A$ and $B$ only decreases the left-hand side when $B > A$. Denote $R = B/A \in (0,1]$ and observe that we must prove
$$\alpha(r) + \beta(r) R^p \leq (1+R)^p + (1-R)^p,\qquad r \in (0,1].$$
So it suffices to prove that $F(r) = \alpha(r) + \beta(r)R^p$ attains its global maximum at $r = R$ (where it equals the right-hand side), which is easily checked by differentiation. $\Box$
Proof of Hanner's inequality. Without loss of generality, $0 \neq \|g\|_p \leq \|f\|_p$ (the degenerate cases being trivial). Applying the previous lemma pointwise with $A = |f(\omega)|$, $B = |g(\omega)|$ and $r = \|g\|_p/\|f\|_p$, and integrating, we get
$$\alpha(r)\|f\|_p^p + \beta(r)\|g\|_p^p \leq \|f+g\|_p^p + \|f-g\|_p^p.$$
Since $r = \|g\|_p/\|f\|_p$, the left-hand side equals $\big(\|f\|_p + \|g\|_p\big)^p + \big(\|f\|_p - \|g\|_p\big)^p$, which is (147). $\Box$
13. Calculating Markov type and cotype
In this section, we will calculate the Markov type and metric Markov cotype of a large variety of metric spaces, thus getting concrete applications of Ball's extension theorem. In particular, we will see that these notions serve as nonlinear analogues of uniform smoothness and uniform convexity.
13.1. Uniform smoothness and Markov type. Our first goal is to prove that any $p$-smooth Banach space also has Markov type $p$. To do this, we will use an inequality of Pisier for martingales with values in a uniformly smooth space. We will also need the following:
Lemma 13.1. Let $X$ be a $p$-smooth Banach space for some $1 < p \leq 2$. If $Z$ is an $X$-valued random variable, then
$$\mathbb{E}\|Z\|^p \leq \|\mathbb{E}Z\|^p + 2 S_p(X)^p\, \mathbb{E}\|Z - \mathbb{E}Z\|^p. \tag{149}$$
Proof. Write $S = S_p(X)$. Then, for every $x, y \in X$,
$$\|x+y\|^p + \|x-y\|^p \leq 2\|x\|^p + 2S^p\|y\|^p.$$
Thus, for $x = \mathbb{E}Z$ and $y = Z - \mathbb{E}Z$, we get the (pointwise) estimate
$$\|Z\|^p + \|2\mathbb{E}Z - Z\|^p \leq 2\|\mathbb{E}Z\|^p + 2S^p\|Z - \mathbb{E}Z\|^p.$$
Taking expectations on both sides we get
$$\mathbb{E}\|Z\|^p + \mathbb{E}\|2\mathbb{E}Z - Z\|^p \leq 2\|\mathbb{E}Z\|^p + 2S^p\, \mathbb{E}\|Z - \mathbb{E}Z\|^p.$$
But since Jensen's inequality gives
$$\mathbb{E}\|2\mathbb{E}Z - Z\|^p \geq \|\mathbb{E}(2\mathbb{E}Z - Z)\|^p = \|\mathbb{E}Z\|^p,$$
the proof is complete. $\Box$
We give the following definitions in the setting of barycentric metric spaces, since this is the setting we will need in the next subsection; however, we will not need this generality at the moment. Let $(X,d)$ be a metric space, $\mathcal{P}_X$ the space of all finitely supported probability measures on $X$, and $B : \mathcal{P}_X \to X$ a barycenter map, that is, a function satisfying $B(\delta_x) = x$ for every $x \in X$. Suppose also that $(\Omega, \mu)$ is a finite probability space and $\mathcal{F} \subseteq 2^\Omega$ is a $\sigma$-algebra. Observe that, since $\Omega$ is finite, $\mathcal{F}$ is the $\sigma$-algebra generated by a partition of $\Omega$. Thus, for a point $\omega \in \Omega$ we can define
$$\mathcal{F}(\omega) = \text{the atom of the partition of } \Omega \text{ to which } \omega \text{ belongs.} \tag{150}$$
Definition 13.2. In the above setting, let $Z : \Omega \to X$ be an $X$-valued random variable. The conditional barycenter of $Z$ with respect to $\mathcal{F}$ is the $X$-valued random variable
$$B(Z \mid \mathcal{F})(\omega) = B\Big( \frac{1}{\mu(\mathcal{F}(\omega))} \sum_{a \in \mathcal{F}(\omega)} \mu(a)\, \delta_{Z(a)} \Big). \tag{151}$$
that is:
$$\mathbb{E}[f(Z_s) \mid Z_0, \dots, Z_{s-1}] = (Lf)(Z_{s-1}) + f(Z_{s-1}). \tag{156}$$
Finally, reversibility implies that
$$\mathbb{E}[f(Z_s) \mid Z_{s+1}, \dots, Z_t] = (Lf)(Z_{s+1}) + f(Z_{s+1}). \tag{157}$$
We can now define
$$M_s^{(t)} = f(Z_s) - \sum_{r=0}^{s-1} (Lf)(Z_r) \qquad\text{and}\qquad N_s^{(t)} = f(Z_{t-s}) - \sum_{r=t-s+1}^{t} (Lf)(Z_r),$$
so that, by (156) and (157),
$$\mathbb{E}\big[M_{s+1}^{(t)} \mid Z_0, \dots, Z_s\big] = (Lf)(Z_s) + f(Z_s) - \sum_{r=0}^{s} (Lf)(Z_r) = M_s^{(t)}$$
and
$$\mathbb{E}\big[N_s^{(t)} \mid Z_{t-s+1}, \dots, Z_t\big] = (Lf)(Z_{t-s+1}) + f(Z_{t-s+1}) - \sum_{r=t-s+1}^{t} (Lf)(Z_r) = N_{s-1}^{(t)};$$
i.e., $\{M_s^{(t)}\}_{s=0}^t$ and $\{N_s^{(t)}\}_{s=0}^t$ are martingales. Now, (153) follows directly from the identities
$$M_{s+1}^{(t)} - M_s^{(t)} = f(Z_{s+1}) - f(Z_s) - (Lf)(Z_s)$$
and
$$N_{s+1}^{(t)} - N_s^{(t)} = f(Z_{t-s-1}) - f(Z_{t-s}) - (Lf)(Z_{t-s}).$$
Finally, to prove (154), first observe that, by the convexity of $\|\cdot\|^p$,
$$\mathbb{E}\|(Lf)(Z_s)\|^p = \sum_{i=1}^n \pi_i\, \|(Lf)(i)\|^p = \sum_{i=1}^n \pi_i\, \Big\| \sum_{j=1}^n a_{ij} \big( f(j) - f(i) \big) \Big\|^p \leq \sum_{i,j=1}^n \pi_i a_{ij}\, \|f(j) - f(i)\|^p = \mathbb{E}\|f(Z_1) - f(Z_0)\|^p.$$
This implies the estimate (154), since
$$\mathbb{E}\big\|M_{s+1}^{(t)} - M_s^{(t)}\big\|^p \leq 2^{p-1}\, \mathbb{E}\|f(Z_{s+1}) - f(Z_s)\|^p + 2^{p-1}\, \mathbb{E}\|(Lf)(Z_s)\|^p \leq 2^p\, \mathbb{E}\|f(Z_1) - f(Z_0)\|^p. \qquad \Box$$
We close this section with a digression of independent interest:
Theorem 13.7. For $1 < p \leq 2$ it holds that $M_p(L_p) = 1$.
Proof. Fix $1 \leq q < p < \infty$ and a measure space $(\Omega, \mu)$. For a function $f : \Omega \to \mathbb{R}$ define $T(f) : \Omega \times \mathbb{R} \to \mathbb{C}$ by
$$T(f)(\omega, t) = \frac{1 - e^{itf(\omega)}}{|t|^{(q+1)/p}},\qquad \omega \in \Omega,\ t \in \mathbb{R}, \tag{158}$$
and notice that for $f, g \in L_q(\mu)$ (writing $m$ for the Lebesgue measure on $\mathbb{R}$):
$$\|T(f) - T(g)\|_{L_p(\mu \times m)}^p = \int_\Omega \int_{\mathbb{R}} \frac{\big| e^{itg(\omega)} - e^{itf(\omega)} \big|^p}{|t|^{q+1}}\, dt\, d\mu(\omega) = \int_\Omega \int_{\mathbb{R}} \frac{\big| 1 - e^{it(f(\omega) - g(\omega))} \big|^p}{|t|^{q+1}}\, dt\, d\mu(\omega)$$
$$= \underbrace{\int_{\mathbb{R}} \frac{|1 - e^{is}|^p}{|s|^{q+1}}\, ds}_{C(p,q)\, \in\, (0,\infty)} \cdot \int_\Omega |f(\omega) - g(\omega)|^q\, d\mu(\omega),$$
where we made the change of variable $s = t\big(f(\omega) - g(\omega)\big)$ on the set where $f(\omega) \neq g(\omega)$. Thus, after rescaling, there exists a mapping $T \equiv T_{q,p} : L_q \to L_p$ such that for every $f, g \in L_q$:
$$\|T(f) - T(g)\|_p^p = \|f - g\|_q^q. \tag{159}$$
In particular, if $p \leq 2$ there exists a mapping $T : L_p \to L_2$ satisfying
$$\|T(f) - T(g)\|_2^2 = \|f - g\|_p^p.$$
Thus $M_p(L_p) = 1$ follows readily from $M_2(L_2) = 1$, which we have proven: for a stationary reversible chain and $f$ taking values in $L_p$, applying the Markov type $2$ inequality (132) to $T \circ f$ and using (159) yields precisely the Markov type $p$ inequality with constant $1$. $\Box$
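The scaling identity behind (158) and (159) can be checked numerically for scalars; in the sketch below the truncation of the integrals and the choice $p = 2$, $q = 1.5$ are arbitrary assumptions (any $q < p$ works).

```python
import numpy as np
from scipy.integrate import quad

p, q = 2.0, 1.5        # convergence needs q < p

# C(p, q) = int_R |1 - e^{is}|^p / |s|^{q+1} ds  (the integrand is even)
C, _ = quad(lambda s: abs(1 - np.exp(1j * s)) ** p / s ** (q + 1),
            1e-8, 200.0, limit=1000)
C *= 2

for a, b in [(0.3, 1.1), (-2.0, 0.5)]:
    I, _ = quad(lambda t: abs(np.exp(1j * t * a) - np.exp(1j * t * b)) ** p
                / t ** (q + 1), 1e-8, 200.0, limit=1000)
    I *= 2
    print(I / abs(a - b) ** q, C)   # both columns approximately equal C(p, q)
```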
Open problem 13.8. Is it true that $M_2(L_p) = \sqrt{p-1}$ for $p > 2$?
13.2. Barycentric metric spaces and metric Markov cotype. We will now identify classes of metric spaces with metric Markov cotype $p$. Perhaps surprisingly, we will get a very wide class of spaces, including (but not limited to) $p$-convex Banach spaces. We start with the analogue of Lemma 13.1 for $p$-convex spaces:
Lemma 13.9. Let $X$ be a $p$-convex Banach space for some $2 \leq p < \infty$. If $Z$ is an $X$-valued random variable, then
$$\|\mathbb{E}Z\|^p + \frac{1}{(2^{p-1} - 1) K_p(X)^p}\, \mathbb{E}\|Z - \mathbb{E}Z\|^p \leq \mathbb{E}\|Z\|^p. \tag{160}$$
Proof. Define $\theta \geq 0$ by
$$\theta \stackrel{\mathrm{def}}{=} \inf\Big\{ \frac{\mathbb{E}\|Z\|^p - \|\mathbb{E}Z\|^p}{\mathbb{E}\|Z - \mathbb{E}Z\|^p} :\ Z \text{ satisfies } \mathbb{E}\|Z - \mathbb{E}Z\|^p > 0 \Big\}.$$
We want to prove that $\theta \geq \frac{1}{(2^{p-1}-1) K_p(X)^p}$. Given $\varepsilon > 0$, consider a random variable $Z_0$ with $\mathbb{E}\|Z_0 - \mathbb{E}Z_0\|^p > 0$ such that
$$\mathbb{E}\|Z_0\|^p - \|\mathbb{E}Z_0\|^p \leq (\theta + \varepsilon)\, \mathbb{E}\|Z_0 - \mathbb{E}Z_0\|^p.$$
We will now focus on the class of $p$-barycentric metric spaces, which, given the previous lemma, will serve as a nonlinear analogue of $p$-convex Banach spaces. A small reminder first: for a metric space $(X,d)$ we denote by $\mathcal{P}_X$ the space of all finitely supported probability measures on $X$. A barycenter map on $X$ is a function $B : \mathcal{P}_X \to X$ satisfying $B(\delta_x) = x$. $(X,d)$ is called $W_p$-barycentric with constant $\Gamma$, for some $p \geq 1$, if there exists a barycenter map $B$ such that for every $\mu = \sum_i \lambda_i \delta_{x_i}$ and $\mu' = \sum_i \lambda_i \delta_{x_i'}$ it holds that
$$d\big( B(\mu), B(\mu') \big)^p \leq \Gamma^p \sum_i \lambda_i\, d(x_i, x_i')^p. \tag{161}$$
Definition 13.10. Fix some $p \geq 1$. A metric space $(X,d)$ will be called $p$-barycentric with constant $K > 0$ if there exists a barycenter map $B : \mathcal{P}_X \to X$ such that for every $x \in X$ and $\mu \in \mathcal{P}_X$:
$$d\big( x, B(\mu) \big)^p + \frac{1}{K^p} \int_X d\big( y, B(\mu) \big)^p\, d\mu(y) \leq \int_X d(x, y)^p\, d\mu(y). \tag{162}$$
In the next few pages, we will prove that a wide class of metric spaces satisfies the above property for $p = 2$. However, let us first state our goal, which will be proven later on:
Theorem 13.11 (Mendel-Naor, 2013). Let $(X,d)$ be a metric space that is $p$-barycentric with constant $K$ and also $W_p$-barycentric with constant $\Gamma$ under the same barycenter map. Then $(X,d)$ has metric Markov cotype $p$ and $N_p(X) \lesssim K\Gamma$.
Examples 13.12. (i) From Lemma 13.9, it follows readily that a $p$-convex Banach space is also $p$-barycentric with constant $K \leq (2^{p-1} - 1) K_p(X)$.
(ii) A simply connected, complete Riemannian manifold with non-positive sectional curvature is $2$-barycentric with constant $1$ and also $W_1$-barycentric (thus also $W_2$-barycentric) with constant $1$. We will later prove that a much wider class of metric spaces satisfies these properties.
Moving towards the second example, we will be working with a special class of metric spaces. A metric space $(X,d)$ is called geodesic if for every $y, z \in X$ there exists a path $\gamma : [0,1] \to X$ from $y$ to $z$ such that for every $t \in [0,1]$
$$d(y, \gamma(t)) = t\, d(y,z) \qquad\text{and}\qquad d(z, \gamma(t)) = (1-t)\, d(y,z). \tag{163}$$
Such paths are called geodesics.
Definition 13.13. A geodesic metric space $(X,d)$ is a CAT(0) space if for every $y, z \in X$ there exists a geodesic $\gamma : [0,1] \to X$ from $y$ to $z$ such that for every $x \in X$ and $t \in [0,1]$:
$$d(x, \gamma(t))^2 \leq (1-t)\, d(x,y)^2 + t\, d(x,z)^2 - t(1-t)\, d(y,z)^2. \tag{164}$$
A complete CAT(0) space is called a Hadamard space.
Examples 13.14. (i) The hyperbolic space is a Hadamard space. More generally, any complete simply connected Riemannian manifold with nonpositive sectional curvature is a Hadamard space.
(ii) The metric spaces induced by the shortest-path distance on trees are CAT(0) spaces.
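In Euclidean space the geodesic from $y$ to $z$ is $\gamma(t) = (1-t)y + tz$ and (164) holds with equality; a quick numerical check of this identity (the points are random, the grid of $t$ values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
x, y, z = rng.normal(size=(3, 2))
for t in np.linspace(0, 1, 11):
    g = (1 - t) * y + t * z                    # the Euclidean geodesic
    lhs = np.sum((x - g) ** 2)
    rhs = ((1 - t) * np.sum((x - y) ** 2) + t * np.sum((x - z) ** 2)
           - t * (1 - t) * np.sum((y - z) ** 2))
    assert np.isclose(lhs, rhs)
print("(164) holds with equality in the Euclidean plane")
```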
Now, let $(X,d)$ be a Hadamard space. For a probability measure $\mu \in \mathcal{P}_X$, define $B(\mu) \in X$ to be the point that minimizes the function
$$X \ni z \longmapsto F_\mu(z) \stackrel{\mathrm{def}}{=} \int_X d(z,y)^2\, d\mu(y). \tag{165}$$
(The minimizer exists and is unique, since $F_\mu$ is uniformly convex along geodesics; thus $B$ is a well-defined barycenter map.) Denote $m = F_\mu(B(\mu)) = \min F_\mu$. Given $x \in X$, integrating the CAT(0) inequality (164) along the geodesic $\gamma$ from $x$ to $B(\mu)$ against $d\mu(y)$ gives
$$m \leq F_\mu(\gamma(t)) \leq t\, m + (1-t)\, F_\mu(x) - t(1-t)\, d(x, B(\mu))^2,$$
or equivalently
$$m \leq F_\mu(x) - t\, d(x, B(\mu))^2.$$
Letting $t \to 1$, this is (162) with $p = 2$ and $K = 1$.
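Since trees are CAT(0) (Example 13.14(ii)), the barycenter of a finitely supported measure on a metric tree can be found by minimizing (165). A brute-force sketch on a three-edge star (the tree, the weights and the discretization are ad hoc choices):

```python
import numpy as np

# Points of the star are (branch, distance from the center); the
# shortest-path metric goes through the center when branches differ.
def d(u, v):
    (bu, su), (bv, sv) = u, v
    return abs(su - sv) if bu == bv else su + sv

mu = [((0, 1.0), 0.6), ((1, 1.0), 0.2), ((2, 1.0), 0.2)]   # (point, weight)
F = lambda z: sum(w * d(z, y) ** 2 for y, w in mu)          # F_mu as in (165)
grid = [(b, s) for b in range(3) for s in np.linspace(0, 1, 101)]
print(min(grid, key=F))     # approximately (0, 0.2): pulled onto branch 0
```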
Proposition 13.17. Any Hadamard space $(X, d_X)$ is $W_1$-barycentric with constant $1$.
For the proof of this last proposition, we will need a classical inequality:
Lemma 13.18 (Reshetnyak's inequality). Let $(X,d)$ be a Hadamard space. If $y, y', z, z' \in X$, then
$$d(y, z')^2 + d(y', z)^2 \leq d(y,z)^2 + d(y', z')^2 + 2\, d(y, y')\, d(z, z'). \tag{166}$$
Proof. Let $\sigma : [0,1] \to X$ be the geodesic from $y$ to $z'$. Then for $t \in (0,1)$
$$d(y', \sigma(t))^2 \leq (1-t)\, d(y', y)^2 + t\, d(y', z')^2 - t(1-t)\, d(y, z')^2$$
and
$$d(z, \sigma(t))^2 \leq (1-t)\, d(z, y)^2 + t\, d(z, z')^2 - t(1-t)\, d(y, z')^2.$$
The triangle inequality together with Cauchy-Schwarz implies that
$$d(y', z)^2 \leq \big( d(y', \sigma(t)) + d(z, \sigma(t)) \big)^2 \leq \frac{d(y', \sigma(t))^2}{t} + \frac{d(z, \sigma(t))^2}{1-t}$$
$$\leq \frac{1-t}{t}\, d(y', y)^2 + d(y', z')^2 + d(y, z)^2 + \frac{t}{1-t}\, d(z, z')^2 - d(y, z')^2.$$
For $t = \frac{d(y,y')}{d(y,y') + d(z,z')}$ we get Reshetnyak's inequality. $\Box$
Proof of Proposition 13.17. Let $\mu = \sum_{i=1}^n \lambda_i \delta_{y_i},\ \mu' = \sum_{i=1}^n \lambda_i \delta_{y_i'} \in \mathcal{P}_X$. Applying Reshetnyak's inequality
Proof. Fix any $\omega \in \Omega$ and $t = 0, 1, \dots, m-1$. Applying the $p$-barycentric inequality (162) to the probability measure
$$\nu = \frac{1}{\mu(\mathcal{F}_t(\omega))} \sum_{a \in \mathcal{F}_t(\omega)} \mu(a)\, \delta_{Z_{t+1}(a)}$$
implies, since $B(\nu) = Z_t(\omega)$ by the martingale property, that
$$d(z, Z_t(\omega))^p + \frac{1}{K^p} \sum_{a \in \mathcal{F}_t(\omega)} \frac{\mu(a)}{\mu(\mathcal{F}_t(\omega))}\, d(Z_{t+1}(a), Z_t(\omega))^p \leq \sum_{a \in \mathcal{F}_t(\omega)} \frac{\mu(a)}{\mu(\mathcal{F}_t(\omega))}\, d(Z_{t+1}(a), z)^p.$$
Thus, if the atoms of $\mathcal{F}_t$ are $\{A_1, \dots, A_k\}$ and $\omega \in A_i$, for every $i$ we have:
$$\frac{1}{K^p} \sum_{a \in A_i} \mu(a)\, d(Z_{t+1}(a), Z_t(\omega))^p \leq \sum_{a \in A_i} \mu(a)\, d(z, Z_{t+1}(a))^p - \mu(A_i)\, d(z, Z_t(\omega))^p.$$
References
[MN13] M. Mendel and A. Naor. Spectral calculus and Lipschitz extension for barycentric metric spaces. Anal. Geom. Metr. Spaces, 1:163–199, 2013.