1 Distances and Metric Spaces: 1.1 Finite Metrics and Graphs
        a   b   c   d   e
    a   0   3   8   6   1
    b       0   9   7   2
    c           0   2   7
    d               0   5
    e                   0

Figure 1.1: A five-point metric, given by its distance matrix (the lower triangle is determined by symmetry).
It is often difficult to visualize metrics when specified thus, and hence we will use a natural
correspondence between graphs and metrics. Given a graph G on n vertices endowed with
lengths on the edges, one can get a natural metric dG by setting, for every i, j ∈ V (G), the
distance dG(i, j) to be the length of the shortest path between vertices i and j in G.
Conversely, given a metric (X, d), we can obtain a weighted graph G(d) representing or
generating the metric thus: we set X to be the vertices of the graph, add edges between all
pairs of vertices, and set the length of the edge {i, j} to be d(i, j). It is trivial to see that the
shortest path metric dG(d) is identical to the original metric d.
In fact, we may be able to get a “simpler” graph representing d by dropping any edge e
such that the resulting metric dG(d)−e still equals d. We can continue removing edges until no
more edges can be removed without changing the distances; the resulting minimal graph will
be called the critical graph for the metric d. It can be checked that the critical graph for the
metric above is the tree given in Figure 1.2.
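To make this concrete, here is a small Python sketch (the helper names shortest_path_metric and critical_graph are ours): starting from the complete graph G(d) for the metric of Figure 1.1, greedily deleting redundant edges recovers the tree of Figure 1.2.

    from itertools import combinations

    def shortest_path_metric(vertices, lengths):
        """All-pairs shortest paths (Floyd-Warshall); edge lengths are
        given as a dict {(u, v): length}."""
        d = {(u, v): (0.0 if u == v else float("inf"))
             for u in vertices for v in vertices}
        for (u, v), l in lengths.items():
            d[(u, v)] = d[(v, u)] = min(d[(u, v)], l)
        for k in vertices:
            for i in vertices:
                for j in vertices:
                    d[(i, j)] = min(d[(i, j)], d[(i, k)] + d[(k, j)])
        return d

    def critical_graph(vertices, d):
        """Start from the complete graph with lengths d(i, j) and greedily
        drop any edge whose removal keeps the shortest-path metric equal
        to d (distances here are exact, so == is safe)."""
        edges = {(u, v): d[(u, v)] for u, v in combinations(vertices, 2)}
        for e in list(edges):
            trial = {f: l for f, l in edges.items() if f != e}
            if shortest_path_metric(vertices, trial) == d:
                del edges[e]
        return edges

    V = "abcde"  # the metric of Figure 1.1
    D = {("a", "b"): 3, ("a", "c"): 8, ("a", "d"): 6, ("a", "e"): 1,
         ("b", "c"): 9, ("b", "d"): 7, ("b", "e"): 2,
         ("c", "d"): 2, ("c", "e"): 7, ("d", "e"): 5}
    d = shortest_path_metric(V, D)        # the metric generated by G(d)
    print(sorted(critical_graph(V, d)))   # the four tree edges of Figure 1.2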
We will usually blur the distinction between finite metrics and graphs, unless there are
good reasons not to do so.
I-1
[Figure 1.2 shows a tree: vertex e is joined to a (length 1), to b (length 2), and to d (length 5); vertex d is joined to c (length 2).]

Figure 1.2: The graph corresponding to the metric of Figure 1.1
the special case of the ℓ∞-norm specified by ‖x‖∞ = max_{i=1..k} |xi|. (All this can be extended
to non-finite dimensions, but we will not encounter such situations in this class.) Given two
points x, y ∈ R^k, the ℓp-distance between them is naturally given by ‖x − y‖p.
Some of these spaces should be familiar to us: ℓ2 is just Euclidean space, while ℓ1 is real
space endowed with the so-called Manhattan metric. It is often instructive to view the unit
balls in these ℓp metrics; here are the balls for p = 1, 2, ∞:

[Figure: the unit balls in the plane for p = 1 (a diamond), p = 2 (a disk), and p = ∞ (a square).]
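As a quick illustration (the helper lp_dist is ours), here are the three distances between two points of the plane:

    def lp_dist(x, y, p):
        """The l_p distance between points given as tuples of reals;
        p = float("inf") gives the max-coordinate (l_infinity) distance."""
        diffs = [abs(a - b) for a, b in zip(x, y)]
        if p == float("inf"):
            return max(diffs)
        return sum(t ** p for t in diffs) ** (1.0 / p)

    x, y = (0.0, 0.0), (3.0, 4.0)
    print(lp_dist(x, y, 1))             # 7.0, the Manhattan distance
    print(lp_dist(x, y, 2))             # 5.0, the Euclidean distance
    print(lp_dist(x, y, float("inf")))  # 4.0, the largest coordinate gap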
1.3 Embeddings
This class deals with the properties of metric spaces; to this end, we will have to analyze
basic metric spaces and the similarities and differences between them. Formally, we compare
metric spaces by using an embedding.
Definition 1.1 Given metric spaces (X, d) and (X′, d′), a map f : X → X′ is called an
embedding. An embedding is called distance-preserving or isometric if for all x, y ∈ X,
d(x, y) = d′(f(x), f(y)).
Note that “embedding” is a generic term for any map from one metric space into another;
transforming the metric (X, d) into its graphical representation G(d) also gave us an isometric
embedding.
We call a finite metric (X, d) an ℓp-metric if there exists an embedding f from X into R^k
(for some k) such that ‖f(x) − f(y)‖p = d(x, y) for all x, y ∈ X; by a cavalier abuse of notation,
we will often denote this by d ∈ ℓp. To denote the fact that the ℓp space has k dimensions, we
will call the space ℓp^k as well.
1.4 Distortion
It is very rare that an isometric embedding exists between two spaces of interest, and hence
we often have to allow the mappings to alter distances in some (hopefully restricted) fashion.
There are many notions of “closeness”; most of the course will focus on the following ones.
Given two metrics (X, d) and (X′, d′) and a map f : X → X′, the contraction of f is the
maximum factor by which distances are shrunk, i.e.,

    max_{x,y ∈ X} d(x, y) / d′(f(x), f(y)),

the expansion or stretch of f is the maximum factor by which distances are stretched:

    max_{x,y ∈ X} d′(f(x), f(y)) / d(x, y),

and the distortion of f, denoted by ‖f‖dist, is the product of the contraction and the expansion.
An isometric embedding has distortion 1, and this is the lowest it can get. (Right?)
Another equivalent definition is the following: the distortion of f, denoted by ‖f‖dist, is
the smallest value α ≥ 1 for which there exists an r > 0 such that for all x, y ∈ X,

    r · d(x, y) ≤ d′(f(x), f(y)) ≤ α · r · d(x, y).

(Check.) A very useful property of distortion is that it is invariant under scaling; i.e., replacing
d and d′ by, say, 10·d and 13·d′ does not change ‖f‖dist. Hence, in many arguments, we will feel
free to rescale embeddings to make arguments easier.
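Here is a small sketch of these definitions (the helper distortion and the example maps are ours); the second call illustrates the scale invariance just mentioned:

    from itertools import combinations

    def distortion(points, d, d2, f):
        """Contraction times expansion of f over all pairs of points."""
        ratios = [d2(f(x), f(y)) / d(x, y) for x, y in combinations(points, 2)]
        expansion = max(ratios)          # max factor by which distances grow
        contraction = 1.0 / min(ratios)  # max factor by which distances shrink
        return contraction * expansion

    pts = [0.0, 1.0, 3.0]
    d = lambda x, y: abs(x - y)
    print(distortion(pts, d, d, lambda x: x))       # 1.0, an isometry
    print(distortion(pts, d, d, lambda x: 10 * x))  # 1.0, scaling is free
    print(distortion(pts, d, d, lambda x: x * x))   # 4.0, squaring distorts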
2 Why embeddings?
While we may want to investigate the properties of metric spaces for their rich mathematical
content, the study is rewarding from an algorithmic viewpoint as well. The algorithms based
on metric methods can be roughly classified as follows.
1. Metric Data
The input data for the problem at hand is a metric or can be pre-processed to form a
metric. The idea is to embed the metric into (a combination of) simpler metrics, on which
the problem can be solved more easily. If the embedding has a distortion larger than 1
this technique results in approximaton algorithms for the given problem. Otherwise it
gives an exact solution.
2. Metric Relaxations
The given problem is formulated as a mathematical program which can be relaxed such
that an optimal solution can be viewed as a metric. Rounding techniques based on
embeddings can give rise to approximate solutions.
3. Problems on Metrics
These are problems in which metrics themselves are the objects of study. For example, given
an arbitrary metric, the goal is to find a tree metric that is closest (in some sense) to it.
This has applications in building evolutionary trees in computational molecular biology.
In the following we give examples for the first two themes and show how solutions can be
obtained via metric methods.
This immediately suggests the following mapping: we create a coordinate fy : X → R
for each vector y ∈ {−1, 1}^k, with fy(z) = ⟨z, y⟩. The final map f is simply obtained by
concatenating these coordinates together, i.e., f = ⊕y fy. That this is an isometric embedding
follows from ‖f(u) − f(v)‖∞ = ‖f(u − v)‖∞ = max_y ⟨u − v, y⟩ = ‖u − v‖1, where we used
the linearity of the map and Equation 2.1.
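A sketch of this map for an illustrative k = 3 (the helper embed_linf is ours); the two printed numbers agree, as the derivation above promises:

    from itertools import product

    def embed_linf(z):
        """One coordinate <z, y> for each sign vector y in {-1, 1}^k."""
        return [sum(zi * yi for zi, yi in zip(z, y))
                for y in product((-1, 1), repeat=len(z))]

    u, v = (1.0, -2.0, 3.0), (0.0, 4.0, 1.0)
    fu, fv = embed_linf(u), embed_linf(v)
    print(max(abs(a - b) for a, b in zip(fu, fv)))  # 9.0 = ||f(u) - f(v)||_inf
    print(sum(abs(a - b) for a, b in zip(u, v)))    # 9.0 = ||u - v||_1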
Solving the transformed problem: We claim that the furthest pair of points in ℓ∞^k′ must
be the furthest pair of points when projected down onto one of the k′ dimensions. Indeed, the
distance between the furthest pair in the set S is
    max_{u,v ∈ S} ‖u − v‖∞ = max_{u,v ∈ S} max_{i=1..k′} |ui − vi|
                           = max_{i=1..k′} max_{u,v ∈ S} |ui − vi|
                           = max_{i=1..k′} (furthest pair along the i-th coordinate).
However, the problem of finding the furthest pair along any coordinate can be solved by finding
the largest and the smallest value in that coordinate, and hence takes 2n time. Doing this for
all the k′ coordinates takes O(k′n) = O(n · 2^k) time.
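A sketch of this coordinate-wise computation (the helper linf_diameter and the point set are illustrative):

    def linf_diameter(S):
        """Furthest-pair l_infinity distance: one max and one min per
        coordinate, O(k'n) time in total."""
        return max(max(p[i] for p in S) - min(p[i] for p in S)
                   for i in range(len(S[0])))

    S = [(0, 5), (3, 1), (7, 2)]
    print(linf_diameter(S))  # 7, attained along the first coordinate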
Theorem 2.2 (Fréchet) Any n-point metric space (X, d) can be embedded isometrically into ℓ∞.
Proof. For each point x ∈ X, let us define a coordinate fx : X → R+ thus: fx(u) = d(x, u).
We claim this is an isometric embedding of d into ℓ∞. Indeed,

    ‖f(u) − f(v)‖∞ = max_{x ∈ X} |d(x, u) − d(x, v)| ≤ d(u, v),        (2.2)

the inequality being the triangle inequality. However, the value of Expression 2.2 is also at
least d(u, v) (obtained by setting x = u), and hence the embedding is isometric.
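A sketch of the Fréchet embedding on an illustrative four-point line metric (the helper names are ours); the final check confirms that the ℓ∞ distances reproduce d exactly:

    def frechet_embed(points, d):
        """One coordinate per point x of X, namely f_x(u) = d(x, u)."""
        return {u: tuple(d(x, u) for x in points) for u in points}

    pts = [0, 1, 3, 7]           # a four-point metric on the real line
    d = lambda x, y: abs(x - y)
    f = frechet_embed(pts, d)
    linf = lambda a, b: max(abs(s - t) for s, t in zip(a, b))
    print(all(linf(f[u], f[v]) == d(u, v) for u in pts for v in pts))  # True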
Exercise 2.3 Show that the number of coordinates can be reduced to n − 1. Show that, for
any tree metric, the number can be reduced to O(log n).
In contrast, other metric spaces, say ℓ2, are not universal in this sense. E.g., consider the
metric generated by the 3-star K1,3, with a center at distance 1 from each of three leaves. A
simple argument shows that this cannot be embedded into ℓ2 isometrically, regardless of the
number of dimensions used: each pair of leaves is at distance 2 = 1 + 1, so the center would
have to be the Euclidean midpoint of every pair of leaves, forcing the leaves to coincide.
2.2 Metric Relaxations
We illustrate how the method of metric relaxation can be used to solve problems involving cuts.
Definition 2.5 A metric d on V is a cut metric if there exists y : 2^V → R+ such that for all
i, j ∈ V,

    dij = Σ_{S ⊆ V : {i,j} ∈ ∂S} y(S).

I.e., the metric d is a cut metric exactly when it is a non-negative linear combination
Σ_S y(S) δS of elementary cut metrics.
Since it is unclear how to write the above expression as a polynomial-sized linear program,
we relax the condition and only require d to be a metric such that dst ≥ 1. I.e.,

    min  Σ_{e ∈ E} ce de                                  (2.3)
    subject to  dij ≤ dik + dkj   ∀ i, j, k
                dst ≥ 1
                dij ≥ 0           ∀ i, j.
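Here is a sketch of how (2.3) can be set up with an off-the-shelf LP solver, assuming SciPy is available; the instance (a 4-cycle with unit capacities, s = 0, t = 2) and the helper names are illustrative:

    from itertools import combinations, permutations
    from scipy.optimize import linprog

    def solve_metric_lp(n, cap, s, t):
        pairs = list(combinations(range(n), 2))      # one variable per pair
        idx = {p: i for i, p in enumerate(pairs)}
        key = lambda i, j: idx[(min(i, j), max(i, j))]
        c = [cap.get(p, 0.0) for p in pairs]         # objective: sum c_e d_e
        A, b = [], []
        for i, j, k in permutations(range(n), 3):    # d_ij <= d_ik + d_kj
            row = [0.0] * len(pairs)
            row[key(i, j)] += 1.0
            row[key(i, k)] -= 1.0
            row[key(k, j)] -= 1.0
            A.append(row); b.append(0.0)
        row = [0.0] * len(pairs)
        row[key(s, t)] = -1.0                        # d_st >= 1
        A.append(row); b.append(-1.0)
        res = linprog(c, A_ub=A, b_ub=b, bounds=(0, None), method="highs")
        return {p: x for p, x in zip(pairs, res.x)}

    cap = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0}
    dstar = solve_metric_lp(4, cap, 0, 2)            # optimal value is 2,
    print(dstar)                                     # the min s-t cut capacity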
Suppose the metric d∗ is an optimal solution to (2.3). Since d∗ may not be a cut metric in
general, we need to round the solution in order to obtain a cut. To this end, let us order
the vertices in V into level sets based on their distances from s in d∗ (Figure 2.4). Let
x0 < x1 < . . . < xl be the distinct distances in the set {d∗ (s, v) | v ∈ V } in increasing order.
Clearly, the smallest distance x0 = 0 (corresponding to d∗ (s, s)), and xl ≥ d∗ (s, t).
For each 1 ≤ j ≤ l, define Vj = {v ∈ V | d∗(s, v) < xj} to be the vertices in the first j levels,
and Ej = ∂(Vj) to be the edges that leave Vj. Also, define yj = xj − xj−1, and Cj = Σ_{e ∈ Ej} ce.
[Figure 2.4: the vertices of V arranged by their distance from s, from x0 = 0 (at s) up to xl;
t lies at distance d∗(s, t) ≥ 1. The j-th band, of width yj = xj − xj−1, lies between distances
xj−1 and xj.]
Lemma 2.6 For every edge e ∈ E, it holds that d∗e ≥ Σ_{j : e ∈ Ej} yj.

Proof. For the edge e = (u, v) ∈ E, the length d∗e ≥ |d∗(s, u) − d∗(s, v)| by the triangle
inequality. Furthermore, for each vertex u ∈ V, it holds that d∗(s, u) = Σ_{j : {s,u} ∈ Ej} yj.
Combining the two, we get that d∗e ≥ Σ_{j : e ∈ Ej} yj.
Theorem 2.7 Let C∗ = min_{1≤j≤l} Cj be attained at j = j0. Then the cut Ej0 is a minimum
s-t cut.
Proof. Let Z∗ be the capacity of a minimum cut. Since Z∗ is the value of an optimal integral
solution to (2.3), it can be no smaller than the value of the optimal fractional solution, and
hence Z∗ ≥ Σ_e ce d∗e. Now using Lemma 2.6, we get

    Σ_{e ∈ E} ce d∗e ≥ Σ_{e ∈ E} ce Σ_{j : e ∈ Ej} yj = Σ_{j=1..l} Σ_{e ∈ Ej} ce yj = Σ_{j=1..l} Cj yj,

simply by changing the order of summation and using the definition of Cj. However, since
C∗ = min_j Cj, we get that

    Σ_{j=1..l} Cj yj ≥ C∗ Σ_{j=1..l} yj = C∗ (xl − x0) ≥ C∗ d∗(s, t) ≥ C∗,

the final inequality using that d∗(s, t) ≥ 1. This shows that Z∗ ≥ C∗, which proves the result.
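The proof yields the following rounding procedure; here is a sketch, assuming an optimal d∗ is given as a dict on unordered pairs (the 4-cycle instance and the particular optimal d∗ below are illustrative):

    def round_to_cut(n, cap, s, dstar):
        dist = lambda v: 0.0 if v == s else dstar[(min(s, v), max(s, v))]
        xs = sorted({dist(v) for v in range(n)})           # x_0 < ... < x_l
        cuts = []
        for j in range(1, len(xs)):
            Vj = {v for v in range(n) if dist(v) < xs[j]}  # first j levels
            Cj = sum(c for (u, v), c in cap.items() if (u in Vj) != (v in Vj))
            cuts.append((Cj, Vj))
        return min(cuts, key=lambda t: t[0])               # (C*, V_{j0})

    cap = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0}
    dstar = {(0, 1): 0.5, (1, 2): 0.5, (2, 3): 0.5, (0, 3): 0.5,
             (0, 2): 1.0, (1, 3): 1.0}     # an optimal solution to (2.3)
    print(round_to_cut(4, cap, 0, dstar))  # a cut of capacity 2 = Z*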
2.2.3 An ℓ1-embeddings view
As an aside, the proof can also be viewed as embedding the distance d∗ into the cut metric
d′ = Σ_j yj δ_{Vj}, which is the same as d′(x, y) = |d∗(s, x) − d∗(s, y)|. (Check that these are two
equivalent definitions!)
Since d′(x, y) ≤ d∗(x, y) for all x, y, we get that Σ_e ce d′e ≤ Σ_e ce d∗e. Now picking a random
value j ∈ {1, . . . , l} with probability yj / (Σ_j yj) and taking the cut Ej = ∂(Vj) ensures that the
expected value of the cut is

    E[ Σ_e ce δ_{Vj}(e) ] = (1 / Σ_j yj) Σ_e ce d′e ≤ Σ_e ce d∗e ≤ Z∗,

using that Σ_j yj ≥ d∗(s, t) ≥ 1; hence some cut Ej in the support has capacity at most Z∗.
More generally, consider maps f = ⊕_S fS with one coordinate fS(x) = βS · d(x, S) for each
subset S ⊆ X, for βS ∈ R. Indeed, to get Fréchet's mapping of metrics into ℓ∞ from Section 3.1,
we can set βS = 1 ⇐⇒ |S| = 1. The main result in this area was proved by Bourgain (1985),
who showed the following theorem:

Theorem 3.8 (Bourgain (1985)) Given a metric (X, d), the map f obtained by setting

    βS = 1 / ( (n choose |S|) · |S| )

for each nonempty S ⊆ X has distortion O(log n).
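For tiny n one can evaluate this (exponential-dimension) map directly; here is a sketch, with all helper names ours, that computes the distortion of the map in ℓ1:

    from itertools import combinations
    from math import comb

    def bourgain_embed(points, d):
        n = len(points)
        subsets = [S for size in range(1, n + 1)       # all nonempty S
                   for S in combinations(points, size)]
        beta = {S: 1.0 / (comb(n, len(S)) * len(S)) for S in subsets}
        return {x: tuple(beta[S] * min(d(x, u) for u in S) for S in subsets)
                for x in points}

    pts = [0, 1, 3, 7]
    d = lambda x, y: abs(x - y)
    f = bourgain_embed(pts, d)
    l1 = lambda a, b: sum(abs(s - t) for s, t in zip(a, b))
    ratios = [l1(f[x], f[y]) / d(x, y) for x, y in combinations(pts, 2)]
    print(max(ratios) / min(ratios))   # the distortion, O(log n) by Theorem 3.8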
This result is tight for `1 , as was shown by Linial, London and Rabinovich (1995). They
extended a theorem of Leighton and Rao (1989) to show that constant-degree expanders require
a distortion of Ω(log n) to embed into ℓ1.
In 1995, Matoušek sharpened Theorem 3.8 to show that a suitable choice of βS allows us
to embed any n-point metric into ℓp with distortion O((log n)/p). Furthermore, he showed a
matching lower bound of Ω((log n)/p) on the distortion into ℓp, using the same constant-degree
expanders.
Theorem 3.9 Given an n-point metric (X, d) and p ∈ [1, ∞), one can embed d into ℓp^k with
O((log n)/p) distortion.
The embeddings can be modified by random sampling ideas to use a small number of
dimensions:
Theorem 3.10 Given an n-point metric (X, d) and p ∈ [1, ∞), one can embed d into ℓp^k with
k = O(log² n) dimensions and O(log n) distortion.
• Given a pair of vertices x, y ∈ X, the expected distance is not too much larger than
d(x, y), i.e.,
E[ dT (x, y) ] ≤ α d(x, y).
The added power of randomization can be seen by taking the n-cycle with unit-length
edges: the distribution we want is obtained by picking one of the edges uniformly at random
and deleting it. It can be verified that this gives us a 2(1 − 1/n)-probabilistic embedding of the
cycle into its subtrees.
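A sketch verifying this claim (the helper expected_stretch is ours): for the worst pair, two adjacent vertices, the average stretch over the n subtrees is exactly 2(1 − 1/n):

    def expected_stretch(n, x, y):
        """Average tree distance between x and y over the n subtrees of
        the unit-length n-cycle, divided by the cycle distance."""
        arc = (y - x) % n                 # length of the clockwise x -> y path
        d_cycle = min(arc, n - arc)
        total = 0
        for e in range(n):                # delete edge {e, e+1 mod n}
            on_cw_path = (e - x) % n < arc
            total += (n - arc) if on_cw_path else arc
        return (total / n) / d_cycle

    print(expected_stretch(8, 0, 1))      # 1.75 = 2 * (1 - 1/8)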
This concept has been widely studied in many papers, and culminated in the following
result due to Fakcharoenphol, Rao and Talwar (2003):
Theorem 3.12 Any n-point metric O(log n)-probabilistically embeds into a distribution of
trees; furthermore, samples from this distribution can be generated in polynomial time.
Embeddings into random trees (as we shall refer to them) have enjoyed very wide applicability,
and we will see examples later in the course. Essentially, any intractable problem on
a metric space (with a cost function linear in the distances) can now be solved on a tree
instead, and Theorem 3.12 guarantees that we lose only O(log n) in the cost (in expectation).
References
[Bou85] Jean Bourgain. On Lipschitz embeddings of finite metric spaces in Hilbert space.
Israel Journal of Mathematics, 52(1-2):46–52, 1985.
[Das99] Sanjoy Dasgupta. Learning mixtures of gaussians. In Proceedings of the 40th IEEE
Symposium on Foundations of Computer Science (FOCS), pages 634–644, 1999.
[FRT03] Jittat Fakcharoenphol, Satish B. Rao, and Kunal Talwar. A tight bound on approximating
arbitrary metrics by tree metrics. In Proceedings of the 35th ACM Symposium
on Theory of Computing (STOC), pages 448–455, 2003.
[LLR95] Nathan Linial, Eran London, and Yuri Rabinovich. The geometry of graphs and some
of its algorithmic applications. Combinatorica, 15(2):215–245, 1995. Also in Proc.
35th FOCS, 1994, pp. 577–591.
[LR99] Frank Thomson Leighton and Satish B. Rao. Multicommodity max-flow min-cut
theorems and their use in designing approximation algorithms. Journal of the ACM,
46(6):787–832, 1999.
[Mat97] Jiří Matoušek. On embedding expander graphs into ℓp spaces. Israel Journal of
Mathematics, 102:189–197, 1997.
[Mat02] Jiří Matoušek. Lectures on discrete geometry, volume 212 of Graduate Texts in
Mathematics. Springer, New York, 2002.