Notions of Optimal Transport Theory and How To

Notions of optimal transport theory and how to
implement them on a computer

arXiv:1710.02634v1 [math.AP] 7 Oct 2017
Bruno Lévy and Erica Schwindt

October 10, 2017
Abstract
This article gives an introduction to optimal transport, a mathematical
theory that makes it possible to measure distances between functions (or
distances between more general objects), to interpolate between objects
or to enforce mass/volume conservation in certain computational physics
simulations. Optimal transport is a rich scientific domain, with active
research communities, both on its theoretical aspects and on more ap-
plicative considerations, such as geometry processing and machine learn-
ing. This article aims at explaining the main principles behind the theory
of optimal transport, introduce the different involved notions, and more
importantly, how they relate, to let the reader grasp an intuition of the
elegant theory that structures them. Then we will consider a specific set-
ting, called semi-discrete, where a continuous function is transported to
a discrete sum of Dirac masses. Studying this specific setting naturally
leads to an efficient computational algorithm, that uses classical notions
of computational geometry, such as a generalization of Voronoi diagrams
called Laguerre diagrams.
1 Introduction
This article presents an introduction to optimal transport. It summarizes and
complements a series of conferences given by B. Lévy between 2014 and 2017.
The presentations stays at an elementary level, that corresponds to a computer
scientist’s vision of the problem. In the article, we stick to using standard no-
tions of analysis (functions, integrals) and linear algebra (vectors, matrices),
and give an intuition of the notion of measure. The main objective of the pre-
sentation is to understand the overall structure of the reasoning 1 , and to follow
a continuous path from the theory to an efficient algorithm that can be imple-
mented in a computer.
Optimal transport, initially studied by Monge, [Mon84], is a very general

mathematical framework that can be used to model a wide class of application
1 Teach principles, not equations. [R. Feynman]
1
Figure 1: Comparing functions: one would like to say that f1 is nearer to
f2 than f3 , but the classical L2 norm “does not see” that the graph of f2
corresponds to the graph of f1 slightly shifted along the x axis.
Figure 2: Interpolating between two functions: linear interpolation makes

a hump disappear while the other hump appears; displacement interpolation,
stemming from optimal transport, will displace the hump as expected.
domains. In particular, it is a natural formulation for several fundamental

questions in computer graphics [Mém11, Mér11, BvdPPH11], because it makes
it possible to define new ways of comparing functions, of measuring distances
between functions and interpolating between two (or more) functions :
Comparing functions Consider the functions f1 , f2 and f3 in Figure 1. Here

we have chosen a function f1 with a wildly oscillating graph, and a function
f2 obtained by translating the graph of f1 along the x axis. The function
f3 corresponds to the mean value of f1 (or f2 ). If one measures the relative
distances between these functions using the classical L2 norm, that is dL2 (f, g) =
(f (x) − g(x))2 dx, one will find that f1 is nearer to f3 than f2 . Optimal
R
transport makes it possible to define a distance that will take into account that
the graph of f2 can be obtained from f1 through a translation (like here), or
through a deformation of the graph of f1 . From the point of view of this new
distance, the function f1 will be nearer to f2 than to f3 .
Interpolating between functions: Now consider the functions u and v in

Figure 2. Here we suppose that u corresponds to a certain physical quantity
measured at an initial time t = 0 and that v corresponds to the same phe-
nomenon measured at a final time t = 1. The problem that we consider now
2
Figure 3: Given two terrains defined by their height functions u and v, symbol-
ized here as gray levels, Monge’s problem consists in transforming one terrain
into the other one by moving matter through an application T . This application
needs to satisfy a mass conservation constraint.
consists in reconstructing what happened between t = 0 and t = 1. If we use

linear interpolation (Figure 2, top-right), we will see the left hump progres-
sively disappearing while the right hump progressively appears, which is not
very realistic if the functions represent for instance a propagating wave. Op-
timal transport makes it possible to define another type of interpolation (Mc.
Cann’s displacement interpolation, Figure 2, bottom-right), that will progres-
sively displace and morph the graph of u into the graph of v.
Optimal transport makes it possible to define a geometry of a space of func-

tions2 , and thus gives a definition of distance in this space, as well as means of
interpolating between different functions, and in general, defining the barycen-
ter of a weighted family of functions, in a very general context. Thus, optimal
transport appears as a fundamental tool in many applied domains. In com-
puter graphics, applications were proposed, to compare and interpolate objects
of diverse natures [BvdPPH11], to generate lenses that can concentrate light to
form caustics in a prescribed manner [MMT17, STTP14]. Moreover, optimal
transport defines new tools that can be used to discretize Partial Differential
Equations, and define new numerical solution mechanisms [BCMO14]. This
type of numerical solution mechanism can be used to simulate for instance
fluids [GM17], with spectacular applications and results in computer graphics
[dGWH+ 15].
The two sections that follow are partly inspired by [Vil09], [San14], [Caf03]
and [AG13], but stay at an elementary level. Here the main goal is to give an
intuition of the different concepts, and more importantly an idea of the way
the relate together. Finally we will see how they can be directly used to design
a computational algorithm with very good performance, that can be used in
practice in several application domains.
2 or more general objects, called probability measures, more on this later.
3
2 Monge’s problem
The initial optimal transport problem was first introduced and studied by
Monge, right before the French revolution [Mon84]. We first give an intuitive
idea of the problem, then quickly introduce the notion of measure, that is nec-
essary to formally state the problem in its most general form and to analyze
it.
2.1 Intuition
Monge’s initial motivation to study this problem was very practical: supposing
you have an army of workmen, how can you transform a terrain with an initial
landscape into a given desired target landscape, while minimizing the total
amount of work ?
Monge’s initial problem statement was as follows:
R
inf c(x, T (x))u(x)dx
T :X→X X
subject to:
R R
∀B ⊂ X, u(x)dx = v(x)dx
T −1 (B) B
2
where X isR a subset of R
R , u and v are two positive functions defined on X and
such that X u(x)dx = Y v(x)dx, and c(·, ·) is a convex distance (the Euclidean
distance in Monge’s initial problem statement).
The functions u and v represent the height of the current landscape and the
height of the target landscape respectively (symbolized as gray levels in Figure
3). The problem consists in finding (if it exists) a function T from X to X that
transforms the current landscape u into the desired one v, while minimizing the
product of the amount of transported earth u(x) with the distance c(x, T (x))
to which it was transported. Clearly, the amount of earth is conserved during
transport, thus the total quantity of earth should be the same in the source
and target landscapes (the integrals of u and v over X should coincide). This
global matter conservation constraint needs to be completed with a local one.
The local matter conservation constraint enforces that in the target landscape,
the quantity of earth received in any subset B of X corresponds to what was
transported here, that is the quantity of earth initially present in the pre-image
T −1 (B) of B under T . Without this constraint, one could locally create matter
in some places and annihilate matter in other places in a counterbalancing way.
A map T that satisfies the local mass conservation constraint is called a transport
map.
2.2 Monge’s problem with measures

We now suppose that instead of a “target landscape”, we wish to transport
earth (or a resource) towards a set of points (that will be denoted by Y for
4
Figure 4: Transport from a function (gray levels) to a discrete point-set (blue
disks).
now on), that represent for instance a set of factories that exploit a resource,
see Figure 4. Each factory wishes to receive a certain quantity of resource (de-
pending for instance of the number of potential customers around the factory).
Thus, the function v that represents the “target landscape” is replaced with a
function on a finite set of points. However, if a function v is zero everywhere
except on a finite set of points, then its integral over X is also zero. This is a
problem, because for instance one cannot properly express the mass conserva-
tion constraint. For this reason, the notion of function is not rich enough for
representing this configuration. One can use instead measures (more on this
below), and associate with each factory a Dirac mass weighted by the quantity
of resource to be transported to the factory.
From now on, we will use measures µ and ν to represent the “current land-
scape” and the “target landscape”. These measures are supported by sets X
and Y , that may be different sets (in the present example, X is a subset of R2
and Y is a discrete set of points). Using measures instead of function not only
makes it possible to study our “transport to discrete set of factories” problem,
but also it can be used to formalize computer objects (meshes) and directly
leads to a computational algorithm. This algorithm is very elegant because it is
a verbatim computer translation of the mathematical theory (see §7.6). In this
particular setting, translating from the mathematical language to the algorith-
mic setting does not require to make any approximation. This is made possible
by the generality of the notion of measure.
The reader who wishes to learn more on measure theory may refer to the
textbook [Tao11]. To keep the length of this article reasonable, we will not
give here the formal definition of a measure. In our context, one can think of
a measure as a “function” that can be only queried using integrals and that
can be “concentrated” on very small sets (points). The following table can be
used to intuitively translate from the “language of functions” to the “language
5
Figure 5: A classical example of the existence problem: there is no optimal
transport between a segment L1 and two parallel segments L2 and L3 (it is
always possible to find a better transport by replacing h with h/2).
of measures” :
Function u Measure µ
R R
B
u(x)dx µ(B) or B dµ
R R
B
f (x)u(x)dx B
f (x)dµ
u(x) N/A
(Note: in contrast with functions, measures cannot be evaluated at a point,
they can be only integrated over domains).
In its version with measures, Monge’s problem can be stated as follows:

R
inf c(x, T (x))dµ subject to ν = T ]µ (M)
T :X→Y X
where X and Y are Borel sets (that is, sets that can be measured), µ and ν
are two measures on X and Y respectively such that µ(X) = ν(Y ) and c(·, ·)
is a convex distance. The constraint ν = T ]µ, that reads “T pushes µ onto ν”
corresponds to the local mass conservation constraint. Given a measure µ on X
and a map T from X to Y , the measure T ]µ on Y , called “the pushforward of
µ by T ”, is such that T ]µ(B) = µ(T −1 (B)) for all Borel set B ⊂ Y . Thus, the
local mass conservation constraint means that µ(T −1 (B)) = ν(B) for all Borel
set B ⊂ Y .
The local mass conservation constraint makes the problem very difficult:
imagine now that you want to implement a computer program that enforces it:
the constraint concerns all the subsets B of Y . Could you imagine an algorithm
that just tests whether a given map satisfies it ? What about enforcing it ? We
will see below a series of transformations of the initial problem into equivalent
problems, where the constraint becomes linear. We will finally end up with a
simple convex optimization problem, that can be solved numerically using clas-
sical methods.
Before then, let us get back to examine the original problem. The local
mass conservation constraint is not the only difficulty: the functional optimized
6
Figure 6: Four example of transport plans in 1D. A: a segment is translated.
B: a segment is split into two segments. C: a Dirac mass is split into two Dirac
masses; D: a Dirac mass is spread along two segments. The first two examples
(A and B) have the form (Id × T )]µ where T is a transport map. The third
and fourth ones (C and D) have no corresponding transport map, because each
of them splits a Dirac mass.
by Monge’s problem is non-symmetric, and this causes additional difficulties

when studying the existence of solutions for problem (M). The problem is
not symmetric because T needs to be a map, therefore the source and target
landscape do not play the same role. Thus, it is possible to merge earth (if
T (x1 ) = T (x2 ) for two different points x1 and x2 ), but it is not possible to
split earth (for that, we would need a “map” T that could send the same point
x to two different points y1 and y2 ). The problem is illustrated in Figure 5:
suppose that you want to compute the optimal transport between a segment L1
(that symbolizes a “wall of earth”) and two parallel segments L2 and L3 (that
symbolize two “trenches” with a depth that correspond to half the height of the
wall of earth). Now we want to transport the wall of earth to the trenches, to
make the landscape flat. To do so, it is possible to decompose L1 into segments
of length h, sent alternatively towards L2 and L3 (Figure 5 on the left). For
any length h, it is always possible to find a better map T , that is a lower value
of the functional in (M), by subdividing L1 into smaller segments (Figure 5 on
the right). The best way to proceed consists in sending from each point of L1
half the earth to L2 and half the earth to L3 , which cannot be represented
by a map. Thus, the best solution of problem (M) is not a map. In a more
general setting, this problem appears each time the source measure µ has mass
concentrated on a manifold of dimension d − 1 [McC95] (like the segment L1 in
the present example).
3 Kantorovich’s relaxed problem

To overcome this difficulty, Kantorovich stated a problem with a larger space
of solutions, that is, a relaxation of problem (M), where mass can be both split
and merged. The idea consists in solving for the “graph of T ” instead of T . One
may think of the graph of T as a function g defined on X × Y that indicates
for each couple of points x ∈ X, y ∈ Y how much matter goes from x to y.
7
Figure 7: A discrete version of Kantorovich’s problem.
However, once again, we cannot use standard functions to represent the graph
of T : if you think about the graph of a univariate function x 7→ f (x), it is
defined on R2 but concentrated on a curve. For this reason, as in our previous
example with factories, one needs to use measures. Thus, we are now looking
for a measure γ supported by the product space X × Y . The relaxed problem
is stated as follows:
( )
R
inf c(x, y)dγ | γ ≥ 0 and γ ∈ Π(µ, ν)
γ
X×Y
(K)
where:
Π(µ, ν) = {γ ∈ P(X × Y ) | (PX )]γ = µ ; (PY )]γ = ν}

where (PX ) and (PY ) denote the two projections (x, y) ∈ X × Y 7→ x and
(x, y) ∈ X × Y 7→ y respectively.
The two measures (PX )]γ and (PY )]γ obtained by pushing forward γ by the
two projections are called the marginals of γ. The measures γ in the admissible
set Π(µ, ν), that is, the measures that have µ and ν as marginals, are called
optimal transport plans. Let us now have a closer look at the two constraints on
the marginals (PX )]γ = µ and (PX )]γ that define the set of optimal transport
plans Π(µ, ν). Recalling the definition of the pushforward (previous subsection),
these two constraints can also be written as:
R R
(PX )]γ = µ ⇐⇒ ∀B ⊂ X, B dµ = B×Y dγ
(1)
(PY )]γ = ν ⇐⇒ ∀B 0 ⊂ Y, B 0 dν = X×B 0 dγ.
R R
Intuitively, the first constraint (PX )]γ = µ means that everything that comes
from a subset B of X should correspond to the amount of matter (initially) con-
tained by B in the source landscape, and the second one (PY )]γ = ν means that
everything that goes into a subset B 0 of Y should correspond to the (prescribed)
amount of matter contained by B 0 in the target landscape ν.
We now examine the relation between the relaxed problem (K) and the initial
problem (M). One can easily check that among the optimal transport plans,
those with the form (Id × T )]µ correspond to a transport map T :
Observation 1. If (Id × T )]µ is a transport plan, then T pushes µ onto ν.
8
Proof. (Id×T )]µ is in Π(µ, ν), thus (PY )](Id×T )]µ = ν, or ((PY ) ◦ (Id × T )) ]µ =
ν, and finally T ]µ = ν.
We can now observe that if a transport plan γ has the form γ = (Id × T )]µ,
then problem (K) becomes:
   
 Z  Z
min c(x, y)d ((Id × T )]µ) = min c(x, T (x))dµ
  
X×Y X
(one retrieves the initial Monge problem).
To further help grasping an intuition of this notion of transport plan, we

show four 1D examples in Figure 6 (the transport plan is then 1D × 1D = 2D).
Intuitively, the transport plan γ may be thought of as a “table” indexed by
x and y that indicates the quantity of matter transported from x to y. More
exactly3 , the measure γ is non-zero on subsets of X × Y that contain points
(x, y) such that some matter is transported from x to y. Whenever γ derives
from a transport map T , that is if γ has the form (Id × T )]µ, then we can
consider γ as the “graph of T ” like in the first two examples of Figure 6 (A)
and (B)4 in Figure 6.
The transport plans of the two other examples (C) and (D) have no as-
sociated transport map, because they split Dirac masses. The transport plan
associated with Figure 5 has the same nature (but this time in 2D × 2D = 4D).
It cannot be written with the form (Id × T )]µ because it splits the mass con-
centrated in L1 into L2 and L3 .
Now the theoretical questions are:
• when does an optimal transport plan exist ?

• when does it admit an associated optimal transport map ?
A standard approach to tackle this type of existence problem is to find
a certain regularity both in the functional and in the space of the admissible
transport plans, that is, proving that the functional is sufficiently “smooth” and
finding a compact set of admissible transport plans. Since the set of admissible
transport plans contains at least the product measure µ ⊗ ν, it is non-empty,
and existence can be proven thanks to a topological argument that exploits the
regularity of the functional and the compactness of the set. Once the existence
of a transport plan is established, the other question concerns the existence of an
associated transport map. Unfortunately, problem (K) does not directly show
the structure required by this reasoning path. However, one can observe that
3 We recall that one cannot evaluate a measure γ at a point (x, y), we can just compute
integrals with γ.
4 Note that the measure µ is supposed to be absolutely continuous with respect to the
Lebesgue measure. This is required, because for instance in example (B) of Figure 6, the
transport map T is undefined at the center of the segment. The absolute continuity require-
ment allows one to remove from X any subset with zero measure.
9
(K) is a linear optimization problem subject to linear constraints. This suggests
using certain tools, such as the dual formulation, that was also developed by
Kantorovich. With this dual formulation, it is possible to exhibit an interesting
structure of the problem, that can be used to answer both questions (existence
of a transport plan, and existence of an associated transport map).
4 Kantorovich’s dual problem

Kantorovich’s duality applies to problem (K), in its most general form (with
measures). To facilitate understanding, we will consider instead a discrete ver-
sion of problem (K), where the involved entities are vectors and matrices (instead
of measures and operators). This makes it easy to better discern the structure
of (K), that is the linear nature of both the functional and the constraints. This
also makes it easier for the reader to understand how to construct the dual by
manipulating simpler objects (matrices and vectors), however the structure of
the reasoning is the same in the general case.
4.1 The discrete Kantorovich problem

In Figure 7, we show a discrete version of the 1D transport between two segments
of Figure 6. The measures µ and ν are replaced with vectors U = (ui )i=1...m
and V = (vj )i=1...n . The transport plan γ becomes a set of coefficients γij .
Each coefficient γij indicates the quantity of matter that will be transported
from ui to vj . The discrete Kantorovich problem can be written as follows:

 Px γ = U
min < C, γ > subject to Py γ = V (2)
γ
γi,j ≥ 0 ∀i, j

where γ is the vector of Rm×n with all coefficients γij (that is, the matrix γij
“unrolled” into a vector), and C the vector of Rm×n with the coefficients cij
indicating the transport cost between point i and point j (for instance, the
Euclidean cost). The objective function is simply the dot product, denoted
by < C, γ >, of the cost vector C and the vector γ. The objective function
is linear in γ. The constraints on the marginals (1) impose in this discrete
version that the sums of the γij coefficients over the columns correspond to
the ui coefficients (Figure 7-B) and the sums over the rows correspond to the
vj coefficients (Figure 7-C). Intuitively, everything that leaves point i should
correspond to ui , that is the quantity of matter initially present in i in the
source landscape, and everything that arrives at a point j should correspond
to vj , that is the quantity of matter desired at j in the target landscape. As
one can easily notice, in this form, both constraints are linear in γ. They can
be written with two matrices Px and Py , of dimensions m × mn and n × mn
respectively.
10
4.2 Constructing the Kantorovich dual in the discrete set-
ting
We introduce, arbitrarily for now, the following function L defined by:
L(ϕ, ψ) =< C, γ > − < ϕ, Px γ − U > − < ψ, Py γ − V >
that takes as arguments two vectors, ϕ in Rm and ψ in Rn . The function L

is constructed from the objective function < C, γ > from which we subtracted
the dot products of ϕ and ψ with the vectors that correspond to the degree of
violation of the constraints. One can observe that:
sup[L(ϕ, ψ)] = < C, γ > if Px γ = U and Py γ = V
ϕ,ψ
= +∞ otherwise.
Indeed, if for instance a component i of Px γ is non-zero, one can make L

arbitrarily large by suitably choosing the associated coefficient ϕi .
Now we consider:
" #
inf sup[L(ϕ, ψ)] = inf [< C, γ >] .
γ≥0 ϕ,ψ γ ≥ 0
Px γ = U
Py γ = V
There is equality, because to minimize sup[L(ϕ, ψ)], γ has no other choice than
satisfying the constraints (see the previous observation). Thus, we obtain a
new expression (left-hand side) of the discrete Kantorovich problem (right-hand
side). We now further examine it, and replace L by its expression:
" #
< C, γ > − < ϕ, Px γ − U >
inf sup (3)
γ≥0 ϕ,ψ − < ψ, Py γ − V >

< C, γ > − < ϕ, Px γ − U >
= sup inf (4)
ϕ,ψ γ≥0 − < ψ, Py γ − V >
" !#
< γ, C − Px t ϕ − Py t ψ > +
= sup inf (5)
ϕ,ψ γ≥0 < ϕ, U > + < ψ, V >
= sup [< ϕ, U > + < ψ, V >] . (6)

ϕ, ψ
Px t ϕ + Py t ψ ≤ C
The first step (4) consists in exchanging the “inf” and “sup”. Then we
rearrange the terms (5). By reinterpreting this equation as a constrained op-
timization problem (similarly to what we did in the previous paragraph), we
finally obtain the constrained optimization problem in (6). In the constraint
Px t ϕ + Py t ψ ≤ C, the inequality is to be considered componentwise. Finally,
the problem (6) can be rewritten as:
11
sup [< ϕ, U > + < ψ, V >]
ϕ,ψ (7)
subject to ϕi + ψj ≤ cij , ∀i, j.
As compared to the primal problem (2) that depends on m × n variables
(all the coefficients γij of the optimal transport plan for all couples of points
(i, j)), this dual problem depends on m + n variables (the components ϕi and
ψj attached to the source points and target points). We will see later how
to further reduce the number of variables, but before then, we go back to the
general continuous setting (that is, with functions, measures and operators).
4.3 The Kantorovich dual in the continuous setting

The same reasoning path can be applied to the continuous Kantorovich problem
(K), leading to the following problem (DK):

R R
(DK) sup ϕdµ + ψdν
ϕ,ψ X Y
(8)
subject to:
ϕ(x) + ψ(y) ≤ c(x, y) ∀(x, y) ∈ X × Y,
where ϕ and ψ are now functions defined on X and Y 5 .
The classical image that gives an intuitive meaning to this dual problem is
to consider that instead of transporting earth by ourselves, we are now hiring a
company that will do the work on our behalf. The company has a special way
of determining the price: the function ϕ(x) corresponds to what they charge
for loading earth at x, and ψ(y) corresponds to what they charge for unloading
earth at y. The company aims at maximizing its profit (this is why the dual
problem is a “sup” rather than an “inf)”, but it cannot charge more than what
it would cost us if we were doing the work by ourselves (hence the constraint).
The existence of solutions for (DK) remains difficult to study, because the
set of functions ϕ, ψ that satisfy the constraint is not compact. However, it is
possible to reveal more structure of the problem, by introducing the notion of
c-transform, that makes it possible to exhibit a set of admissible functions with
sufficient regularity:
Definition 1. • For f any function on Y with values in R ∪ {−∞} and not
identically −∞, we define its c-transform by
f c (x) = inf [c(x, y) − f (y)] , x ∈ X.

y∈Y
5 The functions ϕ and ψ need to be taken in L1 (µ) and L1 (ν). The proof of the equivalence
with problem (K) requires more precautions than in the discrete case, in particular step (4)
(exchanging sup and inf), that uses a result of convex analysis (due to Rockafellar), see [Vil09]
chapter 5.
12
• If a function ϕ is such that there exists a function f such that ϕ = f c ,
then ϕ is said to be c-concave;
• Ψc (X) denotes the set of c-concave functions on X.
We now show two properties of (DK) that will allow us to restrict the problem
to the class of c-concave functions to search for ϕ and ψ:
Observation 2. If the pair (ϕ, ψ) is admissible for (DK), then the pair (ψ c , ψ)
is admissible as well.
Proof. (
∀(x, y) ∈ X × Y, ϕ(x) + ψ(y) ≤ c(x, y)
ψ c (x) = inf [c(x, y) − ψ(y)]
y∈Y
ψ c (x) + ψ(y) = inf [c(x, y 0 ) − ψ(y 0 )] + ψ(y)
y 0 ∈Y
≤ (c(x, y) − ψ(y)) + ψ(y)
≤ c(x, y).
Observation 3. If the pair (ϕ, ψ) is admissible for (DK), then one obtains a
better pair by replacing ϕ with ψ c :
Proof.
)
ψ c (x) = inf [c(x, y) − ψ(y)]
y∈Y ⇒ ψ c (x) ≤ ϕ(x).
∀y ∈ Y, ϕ(x) ≤ c(x, y) − ψ(y)
In terms of the previous intuitive image, this means that by replacing ϕ

with ψ c , the company can charge more while the price remains acceptable for
the client (that is, the constraint is satisfied). Thus, we have:
ϕ dµ + ϕc dν
R R
inf(K) = sup
ϕ∈Ψc (X) X Y
c
R R
= sup ψ dµ + ψ dν
ψ∈Ψc (Y ) X Y
We do not give here the detailed proof for existence. The reader is referred
to [Vil09], Chapter 4. The idea is that we are now in a much better situation,
since the set of admissible functions Ψc (X) is compact6 .
The optimal value gives an interesting information, that is the minimum

cost of transforming µ into ν. This can be used to define a distance between
distributions, and also gives a way to compare different distributions, which is
of practical interest for some applications.
6 Provided that the value of ψ is fixed at a point of Y in order to suppress invariance with
respect to adding a constant to ψ.
13
5 From the Kantorovich dual to the optimal trans-
port map
5.1 The c-superdifferential
Suppose now that in addition to the optimal cost you want to know the associ-
ated way to transform µ into ν, in other words, when it exists, the map T from
X to Y which associated transport plan (Id × T )]µ minimizes the functional of
the Monge problem. A result characterizes the support of γ, that is the subset
∂ c ϕ ⊂ X × Y of the pairs of points (x, y) connected by the transport plan:
Theorem 1. Let ϕ a c-concave function. For all (x, y) ∈ ∂ c ϕ, we have
∇ϕ(x) − ∇x c(x, y) = 0,
where ∂ c ϕ = {(x, y) | ϕ(z) ≤ ϕ(x) + (c(z, y) − c(x, y)), ∀z ∈ X}7 denotes the
so-called c-superdifferential of ϕ.
Proof. See [Vil09] chapters 9 and 10.
In order to give an idea of the relation between the c-superdifferential and the
associated transport map T , we present below a heuristic argument: consider a
point (x, y) in the c-superdifferential ∂ c ϕ, then for all z ∈ X we have
c(x, y) − ϕ(x) ≤ c(z, y) − ϕ(z). (9)
Now, by using (9), we can compute the derivative at x with respect to an
arbitrary direction w
ϕ(x + tw) − ϕ(x) c(x + tw, y) − c(x, y)
lim ≤ lim+
t→0+ t t→0 t
and we obtain ∇ϕ(x) · w ≤ ∇x c(x, y) · w. We can do the same derivation along
direction −w instead of w, and then we get ∇ϕ(x) · w = ∇x c(x, y) · w, ∀w ∈ X.
In the particular case of the L2 cost, that is with c(x, y) = 1/2kx − yk2 ,
this relation becomes ∀(x, y) ∈ ∂ c ϕ, ∇ϕ(x) + y − x = 0, thus, when the optimal
transport map T exists, it is given by
T (x) = x − ∇ϕ(x) = ∇(kxk2 /2 − ϕ(x)).
Not only this gives an expression of T in function of ϕ, which is of high interest
to us if we want to compute the transport explicitly. In addition, this makes it
possible to characterize T as the gradient of a convex function (see also Brenier’s
polar factorization theorem [Bre91]). This convexity property is interesting,
because it means that two “transported particles” x1 7→ T (x1 ) et x2 7→ T (x2 )
will never collide. We now see how to prove these two assertions (T gradient of
a convex function and absence of collision) in the case of the L2 transport (with
(c(x, y) = 1/2kx − yk2 ).
7 By definition of the c-transform, if (x, y) ∈ ∂ c ϕ, then ϕc (y) = c(x, y) − ϕ(x). Then,
the c-superdifferential can be characterized by the set of all points (x, y) ∈ X × Y such that
ϕ(x) + ϕc (y) = c(x, y).
14
Figure 8: The upper envelope of a family of affine functions is a convex function.
Figure 9: Different types of transport, with continuous and discrete measures

µ and ν.
Observation 4. If c(x, y) = 1/2kx − yk2 and ϕ ∈ Ψc (X), then ϕ̄ : x 7→ ϕ̄(x) =

kxk2 /2 − ϕ(x) is a convex function (it is an equivalence if X = Y = Rd , see
[San15]).
Proof.
ϕ(x) =ψ c (x)
kx − yk2

= inf − ψ(y)
y 2
2
kyk2

kxk
= inf −x·y+ − ψ(y) .
y 2 2
Then,
kxk2 kyk2

−ϕ̄(x) =ϕ(x) − = inf −x · y + − ψ(y) .
2 y 2
Or equivalently,
kyk2

ϕ̄(x) = sup x · y − − ψ(y) .
y 2
2
The function x 7→ x · y − kyk2 − ψ(y) is affine in x, therefore the graph of ϕ̄ is
the upper envelope of a family of hyperplanes, therefore ϕ̄ is a convex function
(see Figure 8).
15
Observation 5. We now consider the trajectories of two particles parameterized
by t ∈ [0, 1], t 7→ (1 − t)x1 + tT (x1 ) and t 7→ (1 − t)x2 + tT (x2 ). If x1 6= x2 and
0 < t < 1 then there is no collision between the two particles.
Proof. By contradiction, suppose there is a collision, that is there exists t ∈ (0, 1)
and x1 6= x2 such that
(1 − t)x1 + tT (x1 ) = (1 − t)x2 + tT (x2 ).
Since T = ∇ϕ̄, we can rewrite the last equality as
(1 − t)(x1 − x2 ) + t(∇ϕ̄(x1 ) − ∇ϕ̄(x2 )) = 0.
Therefore,
(1 − t)kx1 − x2 k2 + t(∇ϕ̄(x1 ) − ∇ϕ̄(x2 )) · (x1 − x2 ) = 0.
The last step leads to a contradiction, between the left-hand side is the sum
of two strictly positive numbers (recalling the definition of the convexity of ϕ̄:
∀x1 6= x2 , (x1 − x2 ) · (∇ϕ̄(x1 ) − ∇ϕ̄(x2 )) > 0 )8
6 Continuous, discrete and semi-discrete trans-

port
The properties that we have presented in the previous sections are true for any
couple of source and target measures µ and ν, that can derive from continuous
functions or that can be discrete empirical measures (sum of Dirac masses).
Figure 9 presents three configurations that are interesting to study. These con-
figurations have specific properties, that lead to different algorithms for comput-
ing the transport. We give here some indications and references concerning the
continuous → continuous and discrete → discrete cases. Then we will develop
the continuous → discrete case with more details in the next section.
6.1 The continuous → continuous case and Monge-Ampère

equation
We recall that when the optimal transport map exists, in the case of the L2
cost (that is, c(x, y) = 1/2kx − yk2 ), it can be deduced from the function ϕ by
using the relation T (x) = ∇ϕ̄ = x − ∇ϕ. The change of variable formula for
integration over a subset B of X can be written as:
Z Z
∀B ⊂ X, 1dµ = µ(B) = ν(T (B)) = |det JT (x)| dµ (10)
B B
8 Note that even if there is no collision, the trajectories can cross, that is (1−t)x +tT (x ) =
1 1
(1 − t0 )x2 + t0 T (x2 ) for some t 6= t0 (see example in [Vil09]). If the cost is the Euclidean
distance (instead of squared Euclidean distance), the non-intersection property is stronger
and trajectories cannot cross. This comes at the expense of losing the uniqueness of the
optimal transport plan[Vil09].
16
where JT denotes the Jacobian matrix of T and det denotes the determinant.
R
If µ and νR have densities u and v respectively, that is ∀B, µ(B) = B u(x)dx
and ν(B) = B v(x)dx, then one can (formally) consider (10) pointwise in X:
∀x ∈ X, u(x) = |det JT (x)| v(T (x)). (11)
By injecting T = ∇ϕ̄ and JT = H ϕ̄ into (11), one obtains:
∀x ∈ X, u(x) = |det H ϕ̄(x)| v(∇ϕ̄(x)), (12)
where H ϕ̄ denotes the Hessian matrix of ϕ̄. Equation (12) is known ad the
Monge-Ampère equation. It is a highly non-linear equation, and its solutions
when they exist often present singularities9 . Note that the derivation above is
purely formal, and that studying the solutions of the Monge-Ampère equation
require using more sophisticated tools. In particular, it is possible to define sev-
eral types of weak solutions (viscosity solutions, solution in the sense of Brenier,
solutions in the sense of Alexandrov . . . ). Several algorithms to compute numer-
ical solutions of the Monge-Ampère equations were proposed. As such, see for
instance the Benamou-Brenier algorithm [BB00], that uses a dynamic formu-
lation inspired by fluid dynamics (incompressible Euler equation with specific
boundary conditions). See also [PPO14].
6.2 The discrete → discrete case

If µ is the sum of m Dirac masses and ν the sum of n Dirac masses, then the
problem boils down to finding the m × n coefficients γij that give for each pair
of points i of the source space and j of the target space the quantity of matter
transported from i to j. This corresponds to the transport plan in the discrete
Kantorovich problem that we have seen previously §4. This type of problem
(referred to as an assignment problem) can be solved by different methods of
linear programming [BDM09]. These method can be dramatically accelerated
by adding a regularization term, that can be interpreted as the entropy of the
transport plan [Leo13]. This regularized version of optimal transport can be
solved by highly efficient numerical algorithms [Cut13].
6.3 The continuous → discrete case

This configuration, also called semi-discrete, corresponds to a continuous func-
tion transported to a sum of Dirac masses (see the examples of c-concave func-
tions in [GM96]). This correspond to our example with factories that consume a
resource, in §2.2. Semi-discrete transport has interesting connections with some
notions of computational geometry and some specific sets of convex polyhedra
that were studied by Alexandrov [Ale05] and later by Aurenhammer, Hoffman
and Aranov [AHA92]. The next section is devoted to this configuration.
9 This is similar to the eikonal equation, which solution corresponds to the distance field,
that has a singularity on the medial axis.
17
Figure 10: Semi-discrete transport: gray levels symbolize the quantity of a
resource that will be transported to 4 factories. Each factory will be allocated a
part of the terrain in function of the quantity of resource that it should collect.
7 Semi-discrete transport
We now suppose that the source measure µ is continuous, and that the target
measure ν is a sum of Dirac masses. A practical example of this type of config-
uration corresponds to a resource which available quantity is represented by a
function u. The resource is collected by a set of n factories, as shown in Figure
10. Each factory is supposed to collect a certain prescribed quantity of resource
νj . Clearly, the sum
Pnof all prescriptions
R corresponds to the total quantity of
available resource ( j=1 νj = X u(x)dx).
7.1 Semi-discrete Monge problem

In this specific configuration, the Monge problem becomes:
R R
inf c(x, T (x))u(x)dx, subject to u(x)dx = νj , ∀j.
T :X→Y X
T −1 (yj )
A transport map T associates with each point x of X one of the points yj .

Thus, it is possible to partition X, by associating to each yj the region T −1 (yj )
that contains all the points transported towards yj by T . The constraint imposes
that the quantity of collected resource over each region T −1 (yj ) corresponds to
the prescribed quantity νj .
Let us now examine the form of the dual Kantorovich problem. In terms of
measure, the source measure
Pn µ has a density u, and the target measure ν is a
sum of Dirac masses ν = j=1 νj δyj , supported by the set of points Y = {yj }.
We recall that in its general form, the dual Kantorovich problem is written as
follows: Z Z
c
sup ψ (x)dµ + ψ(y)dν . (13)
ψ∈Ψc (Y ) X Y
18
Figure 11: The objective function of the dual Kantorovich problem is concave,
because its graph is the lower envelope of a family of affine functions.
In our semi-discrete case, the functional becomes a function of n variables,

with the following form:
F (ψ) = F (ψ1 , ψ2 , . . . ψn ) (14)

Z n
X
= ψ c (x)u(x)dx + ψj νj (15)
X j=1
Z n
X
= inf [c(x, yj ) − ψj ] u(x)dx + ψj νj (16)
yj ∈Y
X j=1
Xn Z n
X
= (c(x, yj ) − ψj ) u(x)dx + ψj νj . (17)
j=1 c (y ) j=1
Lagψ j
The first step (15) takes into account the nature of the measures µ and ν.
In particular, one can notice that the measure ν is completely defined by the
scalars νj associated with the points yj , and the function ψ is definedR by the
scalars ψj that correspondPto its value at each point yj . The integral Y ψ(y)dν
becomes the dot product j ψj νj . Thus, the functional that corresponds to the
dual Kantorovich problem becomes a function F that depends on n variables
(the ψj ). Let us now replace the c-conjugate ψ c with its expression, which gives
(16). The integral in the left term can be reorganized, by grouping the points of
X for which the same point yj minimizes c(x, yj ) − ψj , which gives (17), where
the Laguerre cell Lagψc (yj ) is defined by:
Lagψc (yj ) = {x ∈ X | c(x, yj ) − ψj ≤ c(x, yk ) − ψk , ∀k 6= j} .
The Laguerre diagram, formed by the union of the Laguerre cells, is a clas-
sical structure in computational geometry. In the case of the L2 cost c(x, y) =
1/2kx − yk2 , it corresponds to the power diagram, that was studied by Au-
renhammer at the end of the 80’s [Aur87]. One of its particularities is that
the boundaries of the cells are rectilinear, making it reasonably easy to design
computational algorithms to construct them.
19
7.2 Concavity of F
The objective function F is a particular case of the Kantorovich dual, and
naturally inherits its properties, such as its concavity (that we did not discuss
yet). This property is interesting both from a theoretical point of view, to study
the existence and uniqueness of solutions, and from a practical point of view, to
design efficient numerical solution mechanisms. In the semi-discrete case, the
concavity of F is easier to prove than in the general case. We summarize here
the proof by Aurenhammer et. al [AHA92], that leads to an efficient algorithm
[Mér11], [Lév15], [KMT16].
Theorem 2. The objective function F of the semi-discrete Kantorovich dual
problem (13) is concave.
Proof. Consider the function G defined by:
Z

G(A, [ψ1 , . . . ψn ]) = c(x, yA(x) ) − ψA(x) u(x)dx, (18)
X
and parameterized by an assignment A : X → [1 . . . n], that is, a function that

associates with each point x of X the index j of one of the points yj . If we
denote A−1 (j) = {x|A(x) = j}, then G can be also written as:
P R
G(A, ψ) = (c(x, yj ) − ψj ) u(x)dx
j A−1 (j)
P R P R (19)
= c(x, yj )u(x)dx − ψj u(x)dx.
j A−1 (j) j A−1 (j)
The first term does not depend on the ψj , and the second one is a linear combi-
nation of the ψj coefficients, thus, for a given fixed assignment A, ψ 7→ G(A, ψ)
is an affine function of ψ. Figure 11 depicts the appearance of the graph of G
for different assignment. The horizontal axis symbolizes the components of the
vector ψ (of dimension n) and the vertical axis the value of G(A, ψ). For a given
assignment A, the graph of G is an hyperplane (symbolized here by a straight
line).
Among all the possible assignments A, we distinguish Aψ that associates
with a point x the index j of the Laguerre cell x belongs to 10 , that is:
Aψ (x) = arg min [c(x, yj ) − ψj ] .

j
For a fixed vector ψ = ψ 0 , among all the possible assignments A, the assign-
0
ment Aψ minimizes the value G(A, ψ 0 ), because it minimizes the integrand
pointwise (see Figure 11 on the left. Thus, since G is affine with respect to
ψ, the graph of the function ψ → G(Aψ , ψ) is the lower envelope of a family
of hyperplanes (symbolized as straight lines in Figure 11 on the right), hence
ψ → G(Aψ , ψ) is a concave function. Finally, the objective function F of the
10 A is undefined on the set of Laguerre cell boundaries, this does not cause any difficulty
because this set has zero measure.
20
dual Kantorovich problem can be written as F (ψ) = G(Aψ , ψ) + j νj ψj , that
P
is, the sum of a concave function and a linear function, hence it is also a concave
function.
7.3 The semi-discrete optimal transport map

Let us now examine the c-superdifferential ∂ c ψ, that is, the set of points (x ∈
X, y ∈ Y ) connected by the optimal transport map. We recall that the c-
superdifferential ∂ c ψ can be defined alternatively as ∂ c ψ = {(x, yj ) | ψ c (x) + ψj = c(x, yj )}.
Consider a point x of X that belongs to the Laguerre cell Lagψc (yj ). The c-
superdifferential [∂ c ψ](x) at x is defined by:
[∂ c ψ](x) = {yk | ψ c (x) + ψk = c(x, yk )} (20)

= yk | inf [c(x, yl ) − ψl ] + ψk = c(x, yk ) (21)
yl
= {yk | c(x, yj ) − ψj + ψk = c(x, yk )} (22)

= {yk | c(x, yj ) − ψj = c(x, yk ) − ψk } (23)
= {yj } . (24)
In the first step (21), we replace ψ c by its definition, then we use the fact
that x belongs to the Laguerre cell of yj (22), and finally, the only point of Y
that satisfies (23) is yj because we have supposed x inside the Laguerre cell of yj .
To summarize, the optimal transport map T moves each point x to the

point yj associated with the Laguerre cell Lagψc (yj ) that contains x. The vector
ψ1 , . . . ψn is the unique vector that maximizes the discrete dual Kantorovich
function F such that ψ is c-concave. It is possible to show that ψ is c-concave
if and only if no Laguerre cell is empty of matter, that is the integral of u is
non-zero on each Laguerre cell. Indeed, we have the following results:
Theorem 3. Let Y = {y1 , . . . , yn } be a set of n points. Let ψ be any function
defined on Y such that Lagψc (yi ) are not empty sets. Then ψ is a c-concave
function.
Proof. By definition, for i = 1, 2, . . . , k
(ψ c )c (yi ) = inf [c(x, yi ) − ψ c (x)]
x∈X

= inf c(x, yi ) − inf [c(x, yj ) − ψ(yj )]
x∈X yj ∈Y

 ψ(yi ),
 if x ∈ Lagψc (yi )
<(c(x,yi )−ψ(yi ))
= inf
x∈X  z }| {
 c(x, y ) − ( c(x, y ) − ψ(y ) ), if x ∈ Lag c (y ) (j 6= i)
i j j ψ j
= ψ(yi ).
This allows to conclude that ψ is a c-convex function (see [San15, Proposition
1.34]).
21
Moreover, the converse of the theorem is also true:
Theorem 4. Let Y = {y1 , . . . , yn } be a set of n points. Let ψ be a c-concave
function defined on Y . Then the sets Lagψc (yi ) are not empty for all i =
1, . . . , n,.
Proof. Reasoning by contradiction, we suppose that ψ is a c-concave function
and there exist i0 ∈ {1, . . . , n} such that Lagψc (yi0 ) = ∅. Then, from definition
∀x ∈ X, ∃j ∈ {1, . . . , n} with j 6= i0 , and j > 0 such that

c(x, yi0 ) − ψ(yi0 ) ≥ c(x, yj ) − ψ(yj ) + j .
We will write x = xj . Thus,

c(xj , yi0 ) − inf [c(xj , yj ) − ψ(yj )]
yj ∈Y
(25)
≥ c(xj , yi0 ) − (c(xj , yi0 ) − ψ(yi0 ) − j )
= ψ(yi0 ) + j .
Therefore,

c c
(ψ ) (yi0 ) = inf c(x, yi0 ) − inf [c(x, yj ) − ψ(yj )]
x∈X yj ∈Y

= inf c(xj , yi0 ) − inf [c(xj , yj ) − ψ(yj )]
j yj ∈Y
= inf [ψ(yi0 ) + j ]
j
> ψ(yi0 ).
This contradicts the fact that ψ is a c-concave function, because [San15, Propo-
sition 1.34].
We now proceed to compute the first and second order derivatives of the
objective function F . These derivatives are useful in practice to design compu-
tational algorithms.
7.4 First-order derivatives of the objective function

Since it is concave, F admits a unique maximum ψ ∗ , characterized by ∇F (ψ ∗ ) =
0 where ∇F denotes the gradient.Let us now examine the form of the gradient
Pn
∇F = ∇ G(Aψ, ψ) + j=1 ψj νj . By replacing G(Aψ , ψ) with its expression
(19), one obtains:
∂G G(ψ + tej ) − G(ψ)
(ψ) = lim
∂ψj t→0 t
Z
1
= lim inf [c(x, y1 ) − ψ1 , . . . , c(x, yj ) − ψj − t, . . . , c(x, yn ) − ψn ]
t→0 t X
o
− inf [c(x, yi ) − ψi ] u(x)dx .
i
22
Let x ∈ Lagψc (ym ), then for t small enough we have
inf [c(x, y1 ) − ψ1 , . . . , c(x, yj ) − ψj − t, . . . , c(x, yk ) − ψn ]

= c(x, ym ) − ψm .
Indeed, since c(x, ym ) − ψm < c(x, yi ) − ψi for all i 6= m, in particular, if m 6= j:

c(x, ym ) − ψm < c(x, yj ) − ψj . Thus, for t small enough
c(x, ym ) − ψm ≤ c(x, yj ) − ψj − t.
In the case that m = j,
inf [c(x, y1 ) − ψ1 , . . . , c(x, yj ) − ψj − t, . . . , c(x, yk ) − ψk ]

= c(x, yj ) − ψj − t.
Consequently, we obtain that

Z
∂G
(ψ) = − u(x)dx.
∂ψj c (y )
Lagψ j
P
Recalling that the objective function F is given by F (ψ) = G(Aψ , ψ) + j νj ψj ,
we finally obtain: Z
∂F
= νj − u(x)dx. (26)
∂ψj
c (y )
Lagψ j
Since the objective function F is concave, it admits a unique maximum. At

this maximum, all the components of the gradient vanish, this implies that the
quantity of matter obtained at yj , that is the integral of u on the Laguerre cell
of yj , corresponds to the prescribed quantity of matter, that is νj .
7.5 Second order derivatives of the objective function

The coefficients of the Hessian matrix ∂ 2 F/∂ψi ∂ψj are slightly more difficult to
compute, since here, we cannot invoke the envelope theorem. We cannot avoid
invoking Reynold’s formula. We do not detail the computations for length
considerations, but give the final result.
In the particular case of the L2 cost, that is c(x, y) = 1/2kx − yk2 , the
second-order derivatives are given by:
∂2G
Z
u(x)
(ψ) = dS(x) (i 6= j),
∂ψi ∂ψj kyi − yj k
c c
Lagψ (yi )∩Lagψ (yj )
(27)
2 X ∂2G
∂ G
(ψ) = − (ψ).
∂ψj2 ∂ψi ∂ψj
i6=j
23
7.6 A computational algorithm for L2 semi-discrete opti-
mal transport
With the definition of F (ψ), the expression of its first order derivatives (gradient

∇F ) and second order derivatives (Hessian matrix ∇2 F = ∂ 2 F/∂ψi ∂ψj ij ),
we are now equipped to design a numerical solution mechanism that computes
semi-discrete optimal transport by maximizing F , based on a particular version
[KMT16] of Newton’s optimization method [NW06]:
Input: a mesh that supports the source density u
the points (yj )nj=1
the prescribed quantities (νj )nj=1
Output: the (unique)
R Laguerre diagram Lagψc such that:
u(x)dx = νj ∀j
c (y )
Lagψ j
(1) ψ ← [0 . . . 0]
(2) While convergence is not reached
(3) Compute ∇F and ∇2 F
(4) Find p ∈ Rn such that ∇2 F (ψ)p = −∇F (ψ)
(5) Find the descent parameter α
(6) ψ ← ψ + αp
(7) End while
The source measure is given by its density, that is a positive piecewise linear
function u, supported by a triangulated mesh (2D) or tetrahedral mesh (3D)
of a domain X. The target measure is discrete, and supported by the pointset
Y = (yj )nj=1 . Each target point will receive the prescribed quantity of matter
νj . Clearly,
R the prescriptions
P should be balanced with the available resource,
that is X u(x)dx = j νj . The algorithm computes for each point of the target
measure the subset of X that is affected to it through the optimal transport,
T −1 (yj ) = Lagψc (yj ), that corresponds to the Laguerre cell of yj . The Laguerre
diagram is completely determined by the vector ψ that maximizes F .
Line (2) needs a criterion for convergence. The classical convergence cri-
terion for a Newton algorithm uses the norm of the gradient of F . In our
case, the components of the gradient of F have a geometric meaning, since
∂F/∂ψj corresponds to the difference between the prescribed quantity νj asso-
ciated
R with j and the quantity of matter present in the Laguerre cell of yj given
by Lagc (yj )u(x)dx. Thus, we can decide to stop the algorithm as soon as the
ψ
largest absolute value of a component becomes smaller than a certain percent-
age of the smallest prescription. Thus we consider that convergence is reached if
max |∇Fj | < minj νj , for a user-defined (typically 1% in the examples below).
Line (3) computes the coefficients of the gradient and the Hessian matrix of
F , using (26) and (27). These computations involve integrals over the Laguerre
24
cells and over their boundaries. For the L2 cost c(x, y) = 1/2kx − yk2 , the
boundaries of the Laguerre cells are rectilinear, which dramatically simplifies
the computations of the Hessian coefficients (27). In addition, it makes it possi-
ble to use efficient algorithms to compute the Laguerre diagram [Bow81, Wat81].
Their implementation is available in several programming libraries, such as GE-
OGRAM11 et CGAL12 . Then one needs to compute the intersection between
each Laguerre cell and the mesh that supports the density u. This can be done
with specialized algorithms [Lév15], also available in GEOGRAM.
Line (4) finds the Newton step p by solving a linear system. We use the
Conjugate Gradient algorithm [HS52] with the Jacobi preconditioner. In our
empirical experiments below, we stopped the conjugate iterations as soon as
k∇2 F p + ∇F k/k∇F k < 10−3 .
Line (5) determines the descent parameter α. A result due to Mérigot and
Kitagawa [KMT16] ensures the convergence of the Newton algorithm if the mea-
sure of the smallest Laguerre cell remains larger than a certain threshold (that
is, half the smallest prescription νj ). There is also a condition on the norm of
the gradient k∇F k that we do not repeat here (the reader is referred to Mérigot
and Kitagawa’s original article for more details). In our implementation, start-
ing with α = 1, we iteratively divide α by two until both conditions are satisfied.
Let us now make one step backwards and think about the original definition
of Monge’s problem (M). We wish to stress that the initial constraint (local
mass conservation) that characterizes transport maps was terribly difficult. It
is remarkable that after several rewrites (Kantorovich relaxation, duality, c-
convexity), the final problem becomes as simple as optimizing a regular (C 2 )
concave function, for which computing the gradient and Hessian is easy in the
semi-discrete case and boils down to evaluating volumes and areas in a Laguerre
diagram. We wish also to stress that the computational algorithm did not re-
quire to make any approximation or discretization. The discrete, computer
version is a particular setting of the general theory, that fully describes not only
transport between smooth objects (functions), but also transport between less
regular objects, such as pointsets and triangulated meshes. This is made possi-
ble by the rich mathematical vocabulary (measures) on which optimal transport
theory acts. Thus, the computational algorithm is an elegant, direct verbatim
translation of the theory into a computer program.
We now show some computational results and list possible applications of

this algorithm.
11 http://alice.loria.fr/software/geogram/doc/html/index.html
12 http://www.cgal.org
25
Figure 12: A: transport between a uniform density and a random pointset; B:
transport between a varying density and the same pointset; C: intersections be-
tween meshes used to compute the coefficients; D: transport between a measure
supported by a surface and a 3D pointset.
8 Results, examples and applications of L2 semi-

discrete transport
Figure 12 shows some examples of transport in 2D, between a uniform density
and a pointset (A). Each Laguerre cell has the same area. Then we consider the
same pointset, but this time with a density u(x, y) = 10(1 + sin(2πx) sin(2πy))
(image B). We obtain a completely different Laguerre diagram. Each cell of this
diagram has the same value for the integrated density u. (C): the coefficients
of the gradient and the Hessian of F computed in the previous section involve
integrals of the density u over the Laguerre cells and over their boundaries. The
density u is supported by a triangulated mesh, and linearly interpolated on the
triangles. The integrals of u over the Laguerre cells and their boundaries are
evaluated in closed form, by computing the intersections between the Laguerre
cells and the triangles of the mesh that supports u. (D): the same algorithm can
compute the optimal transport between a measure supported by a 3D surface
and a pointset in 3D.
Figure 13 shows two examples of volume deformation by optimal transport.
The same algorithm is used, but this time with 3D Laguerre diagrams, and by
computing intersections between the Laguerre cells and a tetrahedral mesh that
supports the source density u. The intermediary steps of the animation are
generated by linear interpolation of the positions (Mc. Cann’s interpolation).
Figure 14 demonstrates a more challenging configuration, computing transport
between a sphere (surface) and a cube (volume). The sphere is approximated
with 10 million Dirac masses. As can be seen, transport has a singularity that
26
Figure 13: Interpolation of 3D volumes using optimal transport.
Figure 14: Computing the transport between objects of different dimension,

from a sphere to a cube. The sphere is sampled with 10 million points. The
cross-section reveals the formation of a singularity that has some similarities
with the medial axis of the cube.
Figure 15: An application of optimal transport to fluid simulation: numerical

simulation of the Taylor-Rayleigh instability using the Gallouet-Mérigot scheme.
27
Figure 16: Numerical simulation of an incompressible bi-phasic flow in a bottle.
resembles the medial axis of the cube (rightmost figure).

Figure 15 demonstrates an application in computational fluid dynamics. A
heavy incompressible fluid (in red) is placed on top of a lighter incompressible
fluid (in blue). Both fluids tend to exchange their positions, due to gravity, but
incompressibility is an obstacle to their movement. As a result, vortices are
created, and they become faster and faster (Taylor-Rayleigh instability). The
numerical simulation method that we used here [GM17] directly computes the
trajectories of particles (Lagrangian coordinates) while taking into account the
incompressibility constraint, which is in general unnatural in Lagrangian coor-
dinates. By providing a means of controlling the volumes of the Laguerre cells,
optimal transport appears here as an easy way of enforcing incompressibility.
Using some efficient geometric algorithms for the Laguerre diagram and its in-
tersections [Lév15], the Gallouet-Mérigot numerical scheme for incompressible
Euler scheme can be also applied in 3D to simulate the behavior of non-mixing
incompressible fluids (Figure 16). The 3D version of the Taylor-Rayleigh insta-
bility is shown in Figure 17, with 10 million Laguerre cells. More complicated
vortices are generated (shown here in cross-section).
The applications in computer animation and computational physics seem to

be promising, because the semi-discrete algorithm behaves very well in practice.
It is now reasonable to design simulation algorithms that solve a transport prob-
lem in each timestep (as our early computational fluid dynamics of the previous
paragraph do). With our optimized implementation that uses a multicore pro-
cessor for the geometric part of the algorithm and a GPU for solving the linear
systems, the semi-discrete algorithm takes no more than a few tens of seconds
to solve a problem with 10 millions unknown. This opens the door to numerical
solution mechanisms for difficult problems. For instance, in astrophysics, the
Early Universe Reconstruction problem [BFH+ 03] consists in going “backward
in time” from observation data on the repartition of galaxy clusters, to “play
the Big-Bang movie” backward. Our numerical experiments tend to confirm
Brenier’s point of view that he expressed in the 2000’s, that semi-discrete opti-
mal transport can be an efficient way of solving this problem.
To ensure that our results are reproducible, the source-code associated with
28
Figure 17: Top: Numerical simulation of the Taylor-Rayleigh instability using a
3D version of the Gallouet-Mérigot scheme, with a cross-section that reveals the
29
internal structure of the vortices. Bottom: a closeup that shows the interface
between the two fluids, represented by the Laguerre facets that bound two
Laguerre cells of different fluid elements.
the numerical solution mechanism used in all these experiments is available in
the EXPLORAGRAM component of the GEOGRAM programming library13 .
Acknowledgments
This research is supported by EXPLORAGRAM (Inria Exploratory Research
Grant). The authors wish to thank Quentin Mérigot, Yann Brenier, Jean-David
Benamou, Nicolas Bonneel and Lénaı̈c Chizat for many discussions.
References
[AG13] Luigi Ambrosio and Nicolas Gigli. A users guide to optimal trans-
port. Modelling and Optimisation of Flows on Networks, Lecture
Notes in Mathematics, pages 1–155, 2013.
[AHA92] Franz Aurenhammer, Friedrich Hoffmann, and Boris Aronov.
Minkowski-type theorems and least-squares partitioning. In Sym-
posium on Computational Geometry, pages 350–357, 1992.
[Ale05] A. D. Alexandrov. Intrinsic geometry of convex surfaces (transla-
tion of the 1948 Russian original). CRC Press, 2005.
[Aur87] Franz Aurenhammer. Power diagrams: Properties, algorithms and

applications. SIAM J. Comput., 16(1):78–96, 1987.
[BB00] Jean-David Benamou and Yann Brenier. A computational fluid
mechanics solution to the monge-kantorovich mass transfer prob-
lem. Numerische Mathematik, 84(3):375–393, 2000.
[BCMO14] Jean-David Benamou, Guillaume Carlier, Quentin Mérigot, and

Edouard Oudet. Discretization of functionals involving the
monge-ampère operator. arXiv, August 2014. [math.NA]
http://arxiv.org/abs/1408.4336.
[BDM09] Rainer Burkard, Mauro Dell’Amico, and Silvano Martello. As-
signment Problems. SIAM, 2009.
[BFH+ 03] Y. Brenier, U. Frisch, M. Henon, G. Loeper, S. Matarrese, R. Mo-
hayaee, and A. Sobolevskii. Reconstruction of the early uni-
verse as a convex optimization problem. arXiv, September 2003.
arXiv:astro-ph/0304214v3.
[Bow81] Adrian Bowyer. Computing dirichlet tessellations. Comput. J.,

24(2):162–166, 1981.
13 http://alice.loria.fr/software/geogram/doc/html/index.html
30
[Bre91] Yann Brenier. Polar factorization and monotone rearrangement
of vector-valued functions. Communications on Pure and Applied
Mathematics, 44:375–417, 1991.
[BvdPPH11] Nicolas Bonneel, Michiel van de Panne, Sylvain Paris, and Wolf-
gang Heidrich. Displacement interpolation using lagrangian mass
transport. ACM Trans. Graph., 30(6):158, 2011.
[Caf03] Luis Caffarelli. The monge-ampère equation and optimal trans-
portation, an elementary review. Optimal transportation and ap-
plications (Martina Franca, 2001), Lecture Notes in Mathematics,
pages 1–10, 2003.
[Cut13] Marco Cuturi. Sinkhorn distances: Lightspeed computation of

optimal transport. In Advances in Neural Information Process-
ing Systems 26: 27th Annual Conference on Neural Information
Processing Systems 2013. Proceedings of a meeting held December
5-8, 2013, Lake Tahoe, Nevada, United States., pages 2292–2300,
2013.
[dGWH+ 15] Fernando de Goes, Corentin Wallez, Jin Huang, Dmitry Pavlov,
and Mathieu Desbrun. Power particles: an incompressible fluid
solver based on power diagrams. ACM Trans. Graph., 34(4):50:1–
50:11, 2015.
[GM96] Wilfrid Gangbo and Robert J. McCann. The geometry of optimal

transportation. Acta Math., 177(2):113–161, 1996.
[GM17] Thomas O. Gallouët and Quentin Mérigot. A lagrangian scheme
à la Brenier for the incompressible euler equations. Foundations
of Computational Mathematics, May 2017.
[HS52] Magnus R. Hestenes and Eduard Stiefel. Methods of Conjugate

Gradients for Solving Linear Systems. Journal of Research of the
National Bureau of Standards, 49(6):409–436, December 1952.
[KMT16] Jun Kitagawa, Quentin Mérigot, and Boris Thibert. A new-
ton algorithm for semi-discrete optimal transport. CoRR,
abs/1603.05579, 2016.
[Leo13] Christian Leonard. A survey of the schrödinger problem and some
of its connections with optimal transport. arXiv, August 2013.
[math.PR] http://arxiv.org/abs/1308.0215.
[Lév15] Bruno Lévy. A numerical algorithm for L2 semi-discrete opti-

mal transport in 3d. ESAIM M2AN (Mathematical Modeling and
Analysis), 2015.
31
[McC95] Robert J. McCann. Existence and uniqueness of monotone
measure-preserving maps. Duke Mathematical Journal, 80(2):309–
323, 1995.
[Mém11] Facundo Mémoli. Gromov-wasserstein distances and the met-
ric approach to object matching. Foundations of Computational
Mathematics, 11(4):417–487, 2011.
[Mér11] Quentin Mérigot. A multiscale approach to optimal transport.
Comput. Graph. Forum, 30(5):1583–1592, 2011.
[MMT17] Quentin Mérigot, Jocelyn Meyron, and Boris Thibert. Light in
power: A general and parameter-free algorithm for caustic design.
CoRR, abs/1708.04820, 2017.
[Mon84] Gaspard Monge. Mémoire sur la théorie des déblais et des rem-
blais. Histoire de l’Acadmie Royale des Sciences (1781), pages
666–704, 1784.
[NW06] J. Nocedal and S. J. Wright. Numerical Optimization. Springer,

New York, 2nd edition, 2006.
[PPO14] Nicolas Papadakis, Gabriel Peyré, and Edouard Oudet. Optimal
Transport with Proximal Splitting. SIAM Journal on Imaging
Sciences, 7(1):212–238, January 2014.
[San14] Filippo Santambrogio. Introduction to Optimal Transport The-

ory. In Optimal Transport, Theory and Applications, August 2014.
[math.PR] http://arxiv.org/abs/1009.3856.
[San15] Filippo Santambrogio. Optimal transport for applied mathemati-
cians, volume 87 of Progress in Nonlinear Differential Equations
and their Applications. Birkhäuser/Springer, Cham, 2015. Calcu-
lus of variations, PDEs, and modeling.
[STTP14] Yuliy Schwartzburg, Romain Testuz, Andrea Tagliasacchi, and
Mark Pauly. High-contrast computational caustic design. ACM
Trans. Graph., 33(4):74:1–74:11, 2014.
[Tao11] Terence Tao. An Introduction to Measure Theory. American Math-

ematical Society, 2011.
[Vil09] Cdric Villani. Optimal transport : old and new. Grundlehren der
mathematischen Wissenschaften. Springer, Berlin, 2009.
[Wat81] David Watson. Computing the n-dimensional delaunay tessella-

tion with application to voronoi polytopes. Comput. J., 24(2):167–
172, 1981.
32

Notions of Optimal Transport Theory and How To

Uploaded by

Copyright:

Available Formats

Notions of Optimal Transport Theory and How To

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Notions of Optimal Transport Theory and How To

Uploaded by

Copyright:

Available Formats

Notions of optimal transport theory and how to

implement them on a computer

Bruno Lévy and Erica Schwindt

Optimal transport, initially studied by Monge, [Mon84], is a very general

Figure 2: Interpolating between two functions: linear interpolation makes

domains. In particular, it is a natural formulation for several fundamental

Comparing functions Consider the functions f1 , f2 and f3 in Figure 1. Here

Interpolating between functions: Now consider the functions u and v in

consists in reconstructing what happened between t = 0 and t = 1. If we use

Optimal transport makes it possible to define a geometry of a space of func-

2.2 Monge’s problem with measures

In its version with measures, Monge’s problem can be stated as follows:

by Monge’s problem is non-symmetric, and this causes additional difficulties

3 Kantorovich’s relaxed problem

Π(µ, ν) = {γ ∈ P(X × Y ) | (PX )]γ = µ ; (PY )]γ = ν}

(one retrieves the initial Monge problem).

To further help grasping an intuition of this notion of transport plan, we

• when does an optimal transport plan exist ?

4 Kantorovich’s dual problem

4.1 The discrete Kantorovich problem

L(ϕ, ψ) =< C, γ > − < ϕ, Px γ − U > − < ψ, Py γ − V >

that takes as arguments two vectors, ϕ in Rm and ψ in Rn . The function L

Indeed, if for instance a component i of Px γ is non-zero, one can make L

= sup [< ϕ, U > + < ψ, V >] . (6)

4.3 The Kantorovich dual in the continuous setting

f c (x) = inf [c(x, y) − f (y)] , x ∈ X.

In terms of the previous intuitive image, this means that by replacing ϕ

The optimal value gives an interesting information, that is the minimum

respect to adding a constant to ψ.

Figure 9: Different types of transport, with continuous and discrete measures

Observation 4. If c(x, y) = 1/2kx − yk2 and ϕ ∈ Ψc (X), then ϕ̄ : x 7→ ϕ̄(x) =

(1 − t)x1 + tT (x1 ) = (1 − t)x2 + tT (x2 ).

Since T = ∇ϕ̄, we can rewrite the last equality as

(1 − t)(x1 − x2 ) + t(∇ϕ̄(x1 ) − ∇ϕ̄(x2 )) = 0.

(1 − t)kx1 − x2 k2 + t(∇ϕ̄(x1 ) − ∇ϕ̄(x2 )) · (x1 − x2 ) = 0.

6 Continuous, discrete and semi-discrete trans-

6.1 The continuous → continuous case and Monge-Ampère

∀x ∈ X, u(x) = |det JT (x)| v(T (x)). (11)

By injecting T = ∇ϕ̄ and JT = H ϕ̄ into (11), one obtains:

∀x ∈ X, u(x) = |det H ϕ̄(x)| v(∇ϕ̄(x)), (12)

6.2 The discrete → discrete case

6.3 The continuous → discrete case

that has a singularity on the medial axis.

7.1 Semi-discrete Monge problem

A transport map T associates with each point x of X one of the points yj .

In our semi-discrete case, the functional becomes a function of n variables,

F (ψ) = F (ψ1 , ψ2 , . . . ψn ) (14)

Lagψc (yj ) = {x ∈ X | c(x, yj ) − ψj ≤ c(x, yk ) − ψk , ∀k 6= j} .

and parameterized by an assignment A : X → [1 . . . n], that is, a function that

Aψ (x) = arg min [c(x, yj ) − ψj ] .

because this set has zero measure.

7.3 The semi-discrete optimal transport map

= {yk | c(x, yj ) − ψj + ψk = c(x, yk )} (22)

To summarize, the optimal transport map T moves each point x to the

∀x ∈ X, ∃j ∈ {1, . . . , n} with j 6= i0 , and j > 0 such that

7.4 First-order derivatives of the objective function

inf [c(x, y1 ) − ψ1 , . . . , c(x, yj ) − ψj − t, . . . , c(x, yk ) − ψn ]

Indeed, since c(x, ym ) − ψm < c(x, yi ) − ψi for all i 6= m, in particular, if m 6= j:

In the case that m = j,

inf [c(x, y1 ) − ψ1 , . . . , c(x, yj ) − ψj − t, . . . , c(x, yk ) − ψk ]

∀x ∈ X, ∃j ∈ {1, . . . , n} with j 6= i0 , and j > 0 such that