Euclidean, Metric, and Wasserstein Gradient Flows: an overview
Filippo Santambrogio∗

Abstract
This is an expository paper on the theory of gradient flows, and in particular of those PDEs which
can be interpreted as gradient flows for the Wasserstein metric on the space of probability measures
(a distance induced by optimal transport). The starting point is the Euclidean theory, and then its
generalization to metric spaces, according to the work of Ambrosio, Gigli and Savaré. Then comes
an independent exposition of the Wasserstein theory, with a short introduction to the optimal transport
tools that are needed and to the notion of geodesic convexity, followed by a precise description of the
Jordan-Kinderlehrer-Otto scheme, with proof of convergence in the easiest case: the linear Fokker-
Planck equation. A discussion of other gradient-flow PDEs and of numerical methods based on
these ideas is also provided. The paper ends with a new, theoretical, development, due to Ambrosio,
Gigli, Savaré, Kuwada and Ohta: the study of the heat flow in metric measure spaces.

AMS Subject Classification (2010): 00-02, 34G25, 35K05, 49J45, 49Q20, 49M29, 54E35

Keywords: Cauchy problem, Subdifferential, Analysis in metric spaces, Optimal transport, Wasserstein
distances, Heat flow, Fokker-Planck equation, Numerical methods, Contractivity, Metric measure spaces

Contents
1 Introduction
2 From Euclidean to Metric
2.1 Gradient flows in the Euclidean space
2.2 An introduction to the metric setting
3 The general theory in metric spaces
3.1 Preliminaries
3.2 Existence of a gradient flow
3.3 Uniqueness and contractivity
4 Gradient flows in the Wasserstein space
4.1 Preliminaries on Optimal Transport
4.2 The Wasserstein distances
4.3 Minimizing movement schemes in the Wasserstein space and evolution PDEs
4.4 Geodesic convexity in W_2
4.5 Analysis of the Fokker-Planck equation as a gradient flow in W_2
4.6 Other gradient-flow PDEs
4.7 Dirichlet boundary conditions
4.8 Numerical methods from the JKO scheme
5 The heat flow in metric measure spaces
5.1 Dirichlet and Cheeger energies in metric measure spaces
5.2 A well-posed gradient flow for the entropy
5.3 Gradient flows comparison
Bibliography

∗ Laboratoire de Mathématiques d'Orsay, Univ. Paris-Sud, CNRS, Université Paris-Saclay, 91405 Orsay Cedex, France,
[email protected], http://www.math.u-psud.fr/~santambr


1 Introduction
Gradient flows, or steepest descent curves, are a very classical topic in evolution equations: take a
functional $F$ defined on a vector space $X$ and, instead of looking at points $x$ minimizing $F$ (which is
related to the static equation $\nabla F(x) = 0$), look, given an initial point $x_0$, for a curve starting at
$x_0$ and trying to minimize $F$ as fast as possible (in this case, we solve equations of the form
$x'(t) = -\nabla F(x(t))$). As we speak of gradients (which are elements of $X$, and not of $X'$ as the
differential of $F$ should be), it is natural to impose that $X$ is a Hilbert space (so as to identify it
with its dual and produce a gradient vector). In the finite-dimensional case, the above equation is very
easy to deal with, but the infinite-dimensional case is not so exotic either. Indeed, just think of the
evolution equation $\partial_t u = \Delta u$, which is the evolution variant of the static Laplace equation
$-\Delta u = 0$. In this way, the heat equation is the gradient flow, in the $L^2$ Hilbert space, of the
Dirichlet energy $F(u) = \frac{1}{2}\int |\nabla u|^2$, of which $-\Delta u$ is the gradient in the
appropriate sense (more generally, one could consider equations of the form $\partial_t u = -\delta F/\delta u$,
where this notation stands for the first variation of $F$).
But this is somehow classical... The renewed interest in the notion of gradient flow arose between
the end of the 20th century and the beginning of the 21st, with the work of Jordan, Kinderlehrer
and Otto ([55]) and then of Otto [76], who saw a gradient flow structure in some equations of the form
$\partial_t \varrho - \nabla\cdot(\varrho v) = 0$, where the vector field $v$ is given by
$v = \nabla[\delta F/\delta\varrho]$. This requires working in the space of probabilities $\varrho$ on a
given domain, and endowing it with a non-linear metric structure, derived from the theory of optimal
transport. This theory, initiated by Monge in the 18th century ([75]), then developed by Kantorovich
in the '40s ([56]), is now well-established (many texts present it, such as [89, 90, 84]) and is
intimately connected with PDEs of the form of the continuity equation
$\partial_t \varrho - \nabla\cdot(\varrho v) = 0$.
The turning point for the theory of gradient flows and for the interest that researchers in PDEs
developed for it was certainly the publication of [6]. This celebrated book established a whole theory of
gradient flows in metric spaces, which requires careful definitions because in the equation
$x'(t) = -\nabla F(x(t))$, neither the term $x'$ nor $\nabla F$ makes any sense in this framework. For
existence and, mainly, uniqueness results, the notion of geodesic convexity (convexity of a functional
$F$ defined on a metric space $X$, when restricted to the geodesic curves of $X$) plays an important role.
Then the theory is specialized in the second half of [6] to the case of the metric space of probability
measures endowed with the so-called Wasserstein distance coming from optimal transport, whose
differential structure is widely studied in the book. In this framework, the geodesic convexity results
that McCann obtained in [69] are crucial to make a bridge from the general to the particular theory.
It is interesting to observe that, in the end, the heat equation turns out to be a gradient flow in two
different senses: it is the gradient flow of the Dirichlet energy in the $L^2$ space, but also of the
entropy $\int \varrho\log(\varrho)$ in the Wasserstein space. Both frameworks can be adapted from the
particular case of probabilities on a domain $\Omega \subset \mathbb{R}^d$ to the more general case of
metric measure spaces, and the question whether the two flows coincide, or under which assumptions they
do, is natural. It has been recently studied by Ambrosio, Gigli, Savaré and new collaborators (Kuwada
and Ohta) in a series of papers ([49, 51, 9]), and has been the starting point of recent research on the
differential structure of metric measure spaces.
The present survey, which is an extended, updated, and English version of a Bourbaki seminar given
by the author in 2013 ([83]; the reader will also notice that most of the extensions are essentially taken
from [84]), aims at giving an overview of the whole theory. In particular, its goals are both to
introduce the tools for studying metric gradient flows and to show how to deal with Wasserstein
gradient flows without such a theory. This could be of interest for applied mathematicians, who may be
more concerned with the specific PDEs that have this gradient flow form, without a desire for full
generality; for the same audience, a section has been added about numerical methods inspired by the
so-called JKO (Jordan-Kinderlehrer-Otto) scheme, and one on a list of equations which fall into this
framework. De facto, more than half of the survey is devoted to the theory in the Wasserstein spaces
and full proofs are given in the easiest case.
The paper is organized as follows: after this introduction, Section 2 presents the theory in the
Euclidean case, and identifies the definitions that can be translated into a metric setting;
Section 3 is devoted to the general metric setting, as in the first half of [6], and is quite expository
(only the key ingredients of the proofs are sketched); Section 4 is the longest one and develops the
Wasserstein setting: after an introduction to optimal transport and to the Wasserstein distances, there
is an informal presentation of the equations that can be obtained as gradient flows, a discussion of the
functionals which have geodesic convexity properties, a quite precise proof of convergence in the linear
case of the Fokker-Planck equation, a discussion of the other equations and functionals which fit the
framework and of boundary conditions, and finally a short section about numerics. Last but not least,
Section 5 gives a very short presentation of the fascinating topic of heat flows in arbitrary metric
measure spaces, with reference to the interesting implications that this has for the differential
structure of these spaces.
This survey is meant to be suitable for readers with different backgrounds and interests. In particular,
the reader who is mainly interested in gradient flows in the Wasserstein space and in PDE applications
can decide to skip Sections 3, 4.4 and 5, which deal on the contrary with key objects for the (currently
very lively) subject of analysis on metric measure spaces.

2 From Euclidean to Metric
2.1 Gradient flows in the Euclidean space
Before dealing with gradient flows in general metric spaces, the best way to clarify the situation is to
start from the easiest case, i.e. what happens in the Euclidean space $\mathbb{R}^n$. Most of what we will
say stays true in an arbitrary Hilbert space, but we will stick to the finite-dimensional case for simplicity.
Here, given a function $F : \mathbb{R}^n \to \mathbb{R}$, smooth enough, and a point $x_0 \in \mathbb{R}^n$,
a gradient flow is just defined as a curve $x(t)$, with starting point at $t = 0$ given by $x_0$, which moves
by choosing at each instant of time the direction which makes the function $F$ decrease as much as
possible. More precisely, we consider the solution of the Cauchy problem

$$\begin{cases} x'(t) = -\nabla F(x(t)) & \text{for } t > 0,\\ x(0) = x_0. \end{cases} \tag{2.1}$$

This is a standard Cauchy problem which has a unique solution if $\nabla F$ is Lipschitz continuous, i.e. if
$F \in C^{1,1}$. We will see that existence and uniqueness can also hold without this strong assumption,
thanks to the variational structure of the equation.
A first interesting property is the following, concerning uniqueness and estimates. We will present
it in the case where $F$ is convex, which means that it could be non-differentiable, but we can replace
the gradient with the subdifferential. More precisely, we can consider, instead of (2.1), the following
differential inclusion: we look for an absolutely continuous curve $x : [0, T] \to \mathbb{R}^n$ such that

$$\begin{cases} x'(t) \in -\partial F(x(t)) & \text{for a.e. } t > 0,\\ x(0) = x_0, \end{cases} \tag{2.2}$$

where $\partial F(x) = \{p \in \mathbb{R}^n : F(y) \ge F(x) + p\cdot(y - x) \text{ for all } y \in \mathbb{R}^n\}$.
We refer to [80] for all the definitions and notions from convex analysis that could be needed in the
following, and we recall that, if $F$ is differentiable at $x$, we have $\partial F(x) = \{\nabla F(x)\}$ and
that $F$ is differentiable at $x$ if and only if $\partial F(x)$ is a singleton. Also note that $\partial F(x)$
is always a convex set, and is not empty whenever $F$ is real-valued (or $x$ is in the interior of
$\{x : F(x) < +\infty\}$), and we denote by $\partial^\circ F(x)$ its element of minimal norm.

Proposition 2.1. Suppose that $F$ is convex and let $x_1$ and $x_2$ be two solutions of (2.2). Then we have
$|x_1(t) - x_2(t)| \le |x_1(0) - x_2(0)|$ for every $t$. In particular this gives uniqueness of the solution
of the Cauchy problem.

Proof. Let us consider $g(t) = \frac{1}{2}|x_1(t) - x_2(t)|^2$ and differentiate it. We have

$$g'(t) = (x_1(t) - x_2(t)) \cdot (x_1'(t) - x_2'(t)).$$

Here we use a basic property of the subdifferential of convex functions (its monotonicity), i.e. that
for every $x_1, x_2, p_1, p_2$ with $p_i \in \partial F(x_i)$, we have
$$(x_1 - x_2) \cdot (p_1 - p_2) \ge 0.$$
Applying this with $p_i = -x_i'(t) \in \partial F(x_i(t))$, we obtain $g'(t) \le 0$ and $g(t) \le g(0)$. This
gives the first part of the claim.
Then, if we take two different solutions of the same Cauchy problem, we have $x_1(0) = x_2(0)$, and
this implies $x_1(t) = x_2(t)$ for any $t > 0$. □

We can also study the case where $F$ is semi-convex. We recall that $F$ is said to be $\lambda$-convex, for
some $\lambda \in \mathbb{R}$, if $x \mapsto F(x) - \frac{\lambda}{2}|x|^2$ is convex. For $\lambda > 0$ this
is stronger than convexity, and for $\lambda < 0$ it is weaker. Roughly speaking, $\lambda$-convexity
corresponds to $D^2 F \ge \lambda I$. Functions which are $\lambda$-convex for some $\lambda$ are called
semi-convex. The interest in semi-convex functions lies in the fact that, on the one hand, as the reader
will see throughout the exposition, the general theory of gradient flows applies very well to this class
of functions and that, on the other hand, they are general enough to cover many interesting cases. In
particular, on a bounded set, all smooth ($C^2$ is enough) functions are $\lambda$-convex for a suitable
$\lambda < 0$.
For $\lambda$-convex functions, we can define their subdifferential as follows:
$$\partial F(x) = \left\{ p \in \mathbb{R}^n : F(y) \ge F(x) + p\cdot(y - x) + \frac{\lambda}{2}|y - x|^2 \text{ for all } y \in \mathbb{R}^n \right\}.$$
This definition is consistent with the above one whenever $\lambda \ge 0$ (and guarantees
$\partial F(x) \ne \emptyset$ for $\lambda < 0$). Also, one can check that, setting
$\tilde F(x) = F(x) - \frac{\lambda}{2}|x|^2$, this definition coincides with
$\{p \in \mathbb{R}^n : p - \lambda x \in \partial \tilde F(x)\}$. Again, we denote by $\partial^\circ F(x)$
the element of minimal norm of $\partial F(x)$.

Remark 2.1. From the same proof of Proposition 2.1, one can also deduce uniqueness and stability
estimates in the case where $F$ is $\lambda$-convex. Indeed, in this case we obtain
$|x_1(t) - x_2(t)| \le |x_1(0) - x_2(0)|e^{-\lambda t}$, which also proves, if $\lambda > 0$, exponential
convergence to the unique minimizer of $F$. The key point is that, if $F$ is $\lambda$-convex, it is easy to
prove that any $x_1, x_2, p_1, p_2$ with $p_i \in \partial F(x_i)$ satisfy

$$(x_1 - x_2) \cdot (p_1 - p_2) \ge \lambda |x_1 - x_2|^2.$$

This implies $g'(t) \le -2\lambda g(t)$ and allows to conclude, by Gronwall's lemma,
$g(t) \le g(0)e^{-2\lambda t}$. For the exponential convergence, if $\lambda > 0$ then $F$ is coercive and
admits a minimizer, which is unique by strict convexity. Let us call it $\bar x$. Take a solution $x(t)$
and compare it to the constant curve $\bar x$, which is a solution since $0 \in \partial F(\bar x)$. Then
we get $|x(t) - \bar x| \le e^{-\lambda t}|x(0) - \bar x|$.
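
The contraction estimate of Proposition 2.1 and Remark 2.1 can be checked on a one-dimensional example in which the flow is explicit. The sketch below assumes $F(x) = \frac{\lambda}{2}x^2 + |x|$ with $\lambda > 0$ (an illustrative choice, not taken from the text), whose solution of (2.2) is $x(t) = \mathrm{sign}(x_0)\max\{(|x_0| + 1/\lambda)e^{-\lambda t} - 1/\lambda,\, 0\}$; the function names `flow` and `contraction_holds` are hypothetical.

```python
import math

def flow(x0, t, lam):
    """Exact solution at time t of x' in -dF(x) for F(x) = lam/2 * x**2 + |x|, lam > 0.

    The modulus r = |x| obeys r' = -lam * r - 1 until it reaches 0, where the curve stops.
    """
    r = (abs(x0) + 1.0 / lam) * math.exp(-lam * t) - 1.0 / lam
    return math.copysign(max(r, 0.0), x0)

def contraction_holds(a, b, lam, times, tol=1e-12):
    """Check |x1(t) - x2(t)| <= exp(-lam * t) * |x1(0) - x2(0)| on a grid of times."""
    return all(
        abs(flow(a, t, lam) - flow(b, t, lam))
        <= math.exp(-lam * t) * abs(a - b) + tol
        for t in times
    )

times = [0.05 * k for k in range(101)]          # sample times in [0, 5]
same_side = contraction_holds(3.0, 0.5, 2.0, times)
opposite_sides = contraction_holds(2.0, -1.5, 1.0, times)
print(same_side, opposite_sides)
```

The small tolerance only absorbs rounding: as long as both curves stay on the same side of the kink and away from $0$, the estimate holds with equality.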

Another well-known fact about the $\lambda$-convex case is that the differential inclusion
$x'(t) \in -\partial F(x(t))$ actually becomes, a.e., an equality: $x'(t) = -\partial^\circ F(x(t))$. More
precisely, we have the following.

Proposition 2.2. Suppose that $F$ is $\lambda$-convex and let $x$ be a solution of (2.2). Then, for all the
times $t_0$ such that both $t \mapsto x(t)$ and $t \mapsto F(x(t))$ are differentiable at $t = t_0$, the
subdifferential $\partial F(x(t_0))$ is contained in a hyperplane orthogonal to $x'(t_0)$. In particular,
we have $x'(t) = -\partial^\circ F(x(t))$ for a.e. $t$.

Proof. Let $t_0$ be as in the statement, and $p \in \partial F(x(t_0))$. From the definition of
subdifferential, for every $t$ we have
$$F(x(t)) \ge F(x(t_0)) + p \cdot (x(t) - x(t_0)) + \frac{\lambda}{2}|x(t) - x(t_0)|^2,$$
but this inequality becomes an equality for $t = t_0$. Hence, the quantity
$$F(x(t)) - F(x(t_0)) - p \cdot (x(t) - x(t_0)) - \frac{\lambda}{2}|x(t) - x(t_0)|^2$$
is minimal for $t = t_0$ and, differentiating in $t$ (which is possible by assumption), we get
$$\frac{d}{dt}F(x(t))\Big|_{t=t_0} = p \cdot x'(t_0).$$
Since this is true for every $p \in \partial F(x(t_0))$, this shows that $\partial F(x(t_0))$ is contained
in a hyperplane of the form $\{p : p \cdot x'(t_0) = \text{const}\}$.
Whenever $-x'(t_0)$ belongs to $\partial F(x(t_0))$ (which is true for a.e. $t_0$), this shows that
$-x'(t_0)$ is the orthogonal projection of $0$ onto the hyperplane which contains $\partial F(x(t_0))$,
and hence the element of minimal norm of $\partial F(x(t_0))$. This provides
$x'(t_0) = -\partial^\circ F(x(t_0))$ for a.e. $t_0$, as the differentiability of $x$ and of $F \circ x$ are
also true a.e., since $x$ is supposed to be absolutely continuous and $F$ is locally Lipschitz. □
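
As a small illustration of the selection of the element of minimal norm, the sketch below takes $F(x) = |x|$ on $\mathbb{R}$ (an assumption made only for this example), for which $\partial F(0) = [-1, 1]$ and $\partial^\circ F(0) = 0$. The solution of (2.2) moves toward $0$ at unit speed and then stops, so its velocity should agree with $-\partial^\circ F(x(t))$ away from the instant $t = |x_0|$ where the curve reaches the kink. The helper names are hypothetical.

```python
def flow_abs(x0, t):
    """Exact solution of x' in -dF(x) for F(x) = |x|: unit speed toward 0, then rest."""
    r = max(abs(x0) - t, 0.0)
    return r if x0 >= 0 else -r

def min_norm_subgradient_abs(x):
    """Element of minimal norm of dF(x) for F(x) = |x|, where dF(0) = [-1, 1]."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0

def velocity(x0, t, h=1e-6):
    """Central finite-difference approximation of x'(t)."""
    return (flow_abs(x0, t + h) - flow_abs(x0, t - h)) / (2 * h)

x0 = 1.0
sample_times = (0.25, 0.5, 1.5, 2.0)   # chosen away from t = |x0| = 1
mismatch = max(
    abs(velocity(x0, t) + min_norm_subgradient_abs(flow_abs(x0, t)))
    for t in sample_times
)
print(mismatch)
```

After time $|x_0|$ the curve sits at $0$, where any $p \in [-1, 1]$ is an admissible subgradient, yet the observed velocity is $0$, the element of minimal norm.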

Another interesting feature of those particular Cauchy problems which are gradient flows is their
discretization in time. Actually, one can fix a small time step parameter $\tau > 0$ and look for a
sequence of points $(x^\tau_k)_k$ defined through the iterated scheme, called Minimizing Movement Scheme,

$$x^\tau_{k+1} \in \operatorname{argmin}_x \; F(x) + \frac{|x - x^\tau_k|^2}{2\tau}. \tag{2.3}$$

We can now forget the convexity assumptions on $F$, which are not necessary for this part of the analysis.
Indeed, very mild assumptions on $F$ (l.s.c. and some lower bounds, for instance
$F(x) \ge C_1 - C_2|x|^2$) are sufficient to guarantee that these problems admit a solution for small
$\tau$. The case where $F$ is $\lambda$-convex is covered by these assumptions, and also provides uniqueness
of the minimizers. This is evident if $\lambda > 0$ since we have strict convexity for every $\tau$, and if
$\lambda$ is negative the sum will be strictly convex for small $\tau$.
We can interpret this sequence of points as the values of the curve $x(t)$ at times
$t = 0, \tau, 2\tau, \dots, k\tau, \dots$. It happens that the optimality conditions of the recursive
minimization exactly give a connection between these minimization problems and the equation, since we have

$$x^\tau_{k+1} \in \operatorname{argmin}_x \; F(x) + \frac{|x - x^\tau_k|^2}{2\tau} \quad\Rightarrow\quad \nabla F(x^\tau_{k+1}) + \frac{x^\tau_{k+1} - x^\tau_k}{\tau} = 0,$$

i.e.

$$\frac{x^\tau_{k+1} - x^\tau_k}{\tau} = -\nabla F(x^\tau_{k+1}).$$

This expression is exactly the discrete-time implicit Euler scheme for $x' = -\nabla F(x)$! (Note that in
the convex non-smooth case this becomes $\frac{x^\tau_{k+1} - x^\tau_k}{\tau} \in -\partial F(x^\tau_{k+1})$.)
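
The link between one step of the Minimizing Movement Scheme (2.3) and the implicit Euler equation can be verified numerically. The following is a minimal sketch, assuming the convex function $F(x) = x^4/4$ on $\mathbb{R}$ and a bisection solver for the optimality condition; both the choice of $F$ and the helper names are illustrative assumptions.

```python
def F(x):
    return x ** 4 / 4

def dF(x):
    return x ** 3

def penalized(x, xk, tau):
    """Functional minimized in one step of the scheme: F(x) + |x - xk|^2 / (2 tau)."""
    return F(x) + (x - xk) ** 2 / (2 * tau)

def mm_step(xk, tau, iters=200):
    """Minimizer of `penalized`, found by bisection on its increasing derivative.

    The derivative dF(x) + (x - xk) / tau changes sign between 0 and xk.
    """
    lo, hi = min(0.0, xk), max(0.0, xk)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dF(mid) + (mid - xk) / tau > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

xk, tau = 1.0, 0.1
x_next = mm_step(xk, tau)
# Implicit Euler residual: (x_next - xk) / tau + dF(x_next) should vanish.
residual = (x_next - xk) / tau + dF(x_next)
print(x_next, residual)
```

Note that the gradient is evaluated at the new point `x_next`, which is what distinguishes the implicit scheme from an explicit gradient step.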
We recall that, given an ODE $x'(t) = v(x(t))$ (that we take autonomous for simplicity), with given
initial datum $x(0) = x_0$, Euler schemes are time-discretizations where derivatives are replaced by
finite differences. We fix a time step $\tau > 0$ and define a sequence $x^\tau_k$. The explicit scheme is
given by

$$x^\tau_{k+1} = x^\tau_k + \tau v(x^\tau_k), \qquad x^\tau_0 = x_0,$$

while the implicit scheme is given by

$$x^\tau_{k+1} = x^\tau_k + \tau v(x^\tau_{k+1}), \qquad x^\tau_0 = x_0.$$

This means that $x^\tau_{k+1}$ is selected as a solution of an equation involving $x^\tau_k$, instead of
being explicitly computable from $x^\tau_k$. The explicit scheme is obviously easier to implement, but
enjoys fewer stability and qualitative properties than the implicit one. Suppose for instance
$v = -\nabla F$: then the quantity $F(x(t))$ decreases in $t$ along the continuous solution, which is also
the case for the implicit scheme, but not for the explicit one (which represents the iteration of the
gradient method for the minimization of $F$). Note that the same can be done for evolution PDEs, and that
solving the heat equation $\partial_t \varrho = \Delta\varrho$ by an explicit scheme is very dangerous: at
every step, $\varrho^\tau_{k+1}$ would have two degrees of regularity less than $\varrho^\tau_k$, since it
would be obtained through $\varrho^\tau_{k+1} = \varrho^\tau_k + \tau\Delta\varrho^\tau_k$.
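
The different behavior of the two Euler schemes with respect to the energy can be seen on a quadratic example. The sketch below assumes $F(x) = \frac{L}{2}x^2$ on $\mathbb{R}$ with $L = 10$ and a deliberately large time step $\tau = 0.5$ (so that $\tau L > 2$); these values and the function names are illustrative choices only.

```python
def energy(x, L):
    """F(x) = L/2 * x^2."""
    return 0.5 * L * x * x

def explicit_step(x, tau, L):
    """Explicit Euler for x' = -F'(x) = -L x: gradient evaluated at the old point."""
    return x - tau * L * x

def implicit_step(x, tau, L):
    """Implicit Euler: solve y = x - tau * L * y for the new point y."""
    return x / (1.0 + tau * L)

def energies(step, x0, tau, L, n):
    """Values of F along n steps of the given scheme, starting from x0."""
    x, out = x0, [energy(x0, L)]
    for _ in range(n):
        x = step(x, tau, L)
        out.append(energy(x, L))
    return out

L, tau, n = 10.0, 0.5, 5
explicit_energies = energies(explicit_step, 1.0, tau, L, n)
implicit_energies = energies(implicit_step, 1.0, tau, L, n)
print(explicit_energies)
print(implicit_energies)
```

With these values the explicit iterates are multiplied by $1 - \tau L = -4$ at each step, so the energy grows, while the implicit iterates are multiplied by $1/(1 + \tau L) = 1/6$ and the energy decreases for any $\tau > 0$.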
It is possible to prove that, for $\tau \to 0$, the sequence we found, suitably interpolated, converges to
the solution of Problem (2.2). We give below the details of this argument, as it will be the basis of the
argument that we will use in Section 4.
First, we define two different interpolations of the points $x^\tau_k$. Let us define two curves
$x^\tau, \tilde x^\tau : [0, T] \to \mathbb{R}^n$ as follows: first we define
$$v^\tau_{k+1} := \frac{x^\tau_{k+1} - x^\tau_k}{\tau},$$
then we set
$$x^\tau(t) = x^\tau_{k+1}, \qquad \tilde x^\tau(t) = x^\tau_k + (t - k\tau)\, v^\tau_{k+1} \qquad \text{for } t \in\, ]k\tau, (k+1)\tau].$$
Also set
$$v^\tau(t) = v^\tau_{k+1} \qquad \text{for } t \in\, ]k\tau, (k+1)\tau].$$
It is easy to see that $\tilde x^\tau$ is a continuous curve, piecewise affine (hence absolutely
continuous), satisfying $(\tilde x^\tau)' = v^\tau$. On the contrary, $x^\tau$ is not continuous, but
satisfies by construction $v^\tau(t) \in -\partial F(x^\tau(t))$.
The iterated minimization scheme defining $x^\tau_{k+1}$ provides the estimate
$$F(x^\tau_{k+1}) + \frac{|x^\tau_{k+1} - x^\tau_k|^2}{2\tau} \le F(x^\tau_k), \tag{2.4}$$
obtained by comparing the optimal point $x^\tau_{k+1}$ to the previous one. If $F(x_0) < +\infty$ and
$\inf F > -\infty$, summing over $k$ we get
$$\sum_{k=0}^{\ell} \frac{|x^\tau_{k+1} - x^\tau_k|^2}{2\tau} \le F(x^\tau_0) - F(x^\tau_{\ell+1}) \le C. \tag{2.5}$$
This is valid for every $\ell$, and we can go up to $\ell = \lfloor T/\tau \rfloor$. Now, note that
$$\frac{|x^\tau_{k+1} - x^\tau_k|^2}{2\tau} = \frac{\tau}{2}\left(\frac{|x^\tau_{k+1} - x^\tau_k|}{\tau}\right)^2 = \frac{\tau}{2}|v^\tau_{k+1}|^2 = \int_{k\tau}^{(k+1)\tau} \frac{1}{2}|(\tilde x^\tau)'(t)|^2\, dt.$$
This means that we have
$$\int_0^T \frac{1}{2}|(\tilde x^\tau)'(t)|^2\, dt \le C \tag{2.6}$$
and hence $\tilde x^\tau$ is bounded in $H^1$ and $v^\tau$ in $L^2$. The injection $H^1 \subset C^{0,1/2}$
provides an equicontinuity bound on $\tilde x^\tau$ of the form
$$|\tilde x^\tau(t) - \tilde x^\tau(s)| \le C|t - s|^{1/2}. \tag{2.7}$$
This also implies
$$|\tilde x^\tau(t) - x^\tau(t)| \le C\tau^{1/2}, \tag{2.8}$$
since $x^\tau(t) = \tilde x^\tau(s)$ for a certain $s = k\tau$ with $|s - t| \le \tau$.
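
The a priori estimates (2.4), (2.5) and the Hölder bound behind (2.7) can be observed on a concrete sequence. The sketch below assumes the double-well function $F(x) = x^4/4 - x^2/2$ on $\mathbb{R}$ (which is $\lambda$-convex with $\lambda = -1$ and has $\inf F = -1/4$) and a time step $\tau = 0.01 < 1$, so that each step has a unique minimizer; the example and its helper names are illustrative assumptions. For grid points the Hölder bound reads $|x^\tau_j - x^\tau_i| \le \sqrt{2C\tau(j - i)}$ with $C = F(x_0) - \inf F$.

```python
import math

def F(x):
    return x ** 4 / 4 - x ** 2 / 2          # double well, inf F = -1/4

def dF(x):
    return x ** 3 - x

def mm_step(xk, tau, iters=200):
    """Unique minimizer of F(x) + (x - xk)^2 / (2 tau) for tau < 1, by bisection."""
    bound = max(abs(xk), 2.0)               # derivative is < 0 at -bound, > 0 at +bound
    lo, hi = -bound, bound
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dF(mid) + (mid - xk) / tau > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def mm_sequence(x0, tau, n):
    xs = [x0]
    for _ in range(n):
        xs.append(mm_step(xs[-1], tau))
    return xs

tau, n = 0.01, 200
xs = mm_sequence(2.0, tau, n)
C = F(xs[0]) + 0.25                          # F(x0) - inf F

# (2.4): one-step energy inequality (tolerance only for rounding).
step_ok = all(F(b) + (b - a) ** 2 / (2 * tau) <= F(a) + 1e-12
              for a, b in zip(xs, xs[1:]))
# (2.5): total dissipation bounded by C.
dissipation = sum((b - a) ** 2 / (2 * tau) for a, b in zip(xs, xs[1:]))
# Discrete version of (2.7) at grid times i*tau and j*tau.
holder_ok = all(abs(xs[j] - xs[i]) <= math.sqrt(2 * C * tau * (j - i)) + 1e-12
                for i in range(n + 1) for j in range(i + 1, n + 1))
print(step_ok, dissipation <= C, holder_ok)
```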
The estimates (2.6)-(2.8) provide the necessary compactness to prove the following.

Proposition 2.3. Let $\tilde x^\tau$, $x^\tau$ and $v^\tau$ be constructed as above using the minimizing
movement scheme. Suppose $F(x_0) < +\infty$ and $\inf F > -\infty$. Then, up to a subsequence
$\tau_j \to 0$ (still denoted by $\tau$), both $\tilde x^\tau$ and $x^\tau$ converge uniformly to the same
curve $x \in H^1$, and $v^\tau$ weakly converges in $L^2$ to a vector function $v$, such that $x' = v$ and

1. if $F$ is $\lambda$-convex, we have $v(t) \in -\partial F(x(t))$ for a.e. $t$, i.e. $x$ is a solution of (2.2);

2. if $F$ is $C^1$, we have $v(t) = -\nabla F(x(t))$ for all $t$, i.e. $x$ is a solution of (2.1).

Proof. Thanks to the estimates (2.6) and (2.7) and the fact that the initial point $\tilde x^\tau(0)$ is
fixed, we can apply the Ascoli-Arzelà theorem to $\tilde x^\tau$ and get a uniformly converging
subsequence. The estimate (2.8) implies that, on the same subsequence, $x^\tau$ also converges uniformly
to the same limit, that we will call $x : [0, T] \to \mathbb{R}^n$. Then, $v^\tau = (\tilde x^\tau)'$ and
(2.6) guarantee, up to an extra subsequence extraction, the weak convergence $v^\tau \rightharpoonup v$ in
$L^2$. The condition $x' = v$ is automatic as a consequence of distributional convergence.
To prove 1), we fix a point $y \in \mathbb{R}^n$ and use $-v^\tau(t) \in \partial F(x^\tau(t))$ to write
$$F(y) \ge F(x^\tau(t)) - v^\tau(t) \cdot (y - x^\tau(t)) + \frac{\lambda}{2}|y - x^\tau(t)|^2.$$
We then multiply by a positive measurable function $a : [0, T] \to \mathbb{R}_+$ and integrate:
$$\int_0^T a(t)\left( F(y) - F(x^\tau(t)) + v^\tau(t) \cdot (y - x^\tau(t)) - \frac{\lambda}{2}|y - x^\tau(t)|^2 \right) dt \ge 0.$$
We can pass to the limit as $\tau \to 0$, using the uniform (hence $L^2$ strong) convergence
$x^\tau \to x$ and the weak convergence $v^\tau \rightharpoonup v$. In terms of $F$, we just need its lower
semi-continuity. This provides
$$\int_0^T a(t)\left( F(y) - F(x(t)) + v(t) \cdot (y - x(t)) - \frac{\lambda}{2}|y - x(t)|^2 \right) dt \ge 0.$$
From the arbitrariness of $a$, the inequality
$$F(y) \ge F(x(t)) - v(t) \cdot (y - x(t)) + \frac{\lambda}{2}|y - x(t)|^2$$
is true for a.e. $t$ (for fixed $y$). Using $y$ in a dense countable set in the interior of
$\{F < +\infty\}$ (where $F$ is continuous), we get $-v(t) \in \partial F(x(t))$, i.e.
$v(t) \in -\partial F(x(t))$.
To prove 2), the situation is easier. Indeed we have

$$-\nabla F(x^\tau(t)) = v^\tau(t) = (\tilde x^\tau)'(t).$$

The first term in the equality uniformly converges, as a function of $t$ (since $\nabla F$ is continuous
and $x^\tau$ lives in a compact set), to $-\nabla F(x)$, the second weakly converges to $v$ and the third
to $x'$. This proves the claim, and the equality is now true for every $t$ as the function
$t \mapsto -\nabla F(x(t))$ is uniformly continuous. □

In the above result, we only proved convergence of the curves $x^\tau$ to the limit curve $x$, solution of
$x' = -\nabla F(x)$ (or $-x' \in \partial F(x)$), but we gave no quantitative order of convergence, and we
will not study this issue in the rest of the survey either. On the contrary, the book [6], which will be
the basis for the metric case, also provides explicit estimates; these estimates are usually of order
$\tau$. An interesting observation, in the Euclidean case, is that if the sequence $x^\tau_k$ is defined by

$$x^\tau_{k+1} \in \operatorname{argmin}_x \; 2F\left(\frac{x + x^\tau_k}{2}\right) + \frac{|x - x^\tau_k|^2}{2\tau},$$
then we have
$$\frac{x^\tau_{k+1} - x^\tau_k}{\tau} = -\nabla F\left(\frac{x^\tau_{k+1} + x^\tau_k}{2}\right),$$
and the convergence is of order $\tau^2$. This has been used in the Wasserstein case¹ (see Section 4 and
in particular Section 4.8) in [61].
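
The two orders of convergence can be compared on the simplest possible example. The sketch below assumes $F(x) = x^2/2$ on $\mathbb{R}$, for which the exact flow is $x(t) = x_0 e^{-t}$, the implicit Euler step is $x_{k+1} = x_k/(1 + \tau)$, and the midpoint variant above gives $x_{k+1} = x_k(1 - \tau/2)/(1 + \tau/2)$; the function names are hypothetical.

```python
import math

def implicit_euler(x0, T, n):
    """n steps of x_{k+1} = x_k - tau * x_{k+1} with tau = T / n."""
    tau, x = T / n, x0
    for _ in range(n):
        x = x / (1.0 + tau)
    return x

def midpoint_variant(x0, T, n):
    """n steps of x_{k+1} = x_k - tau * (x_{k+1} + x_k) / 2 with tau = T / n."""
    tau, x = T / n, x0
    for _ in range(n):
        x = x * (1.0 - tau / 2) / (1.0 + tau / 2)
    return x

T, x0 = 1.0, 1.0
exact = x0 * math.exp(-T)
err_implicit = [abs(implicit_euler(x0, T, n) - exact) for n in (10, 20, 40)]
err_midpoint = [abs(midpoint_variant(x0, T, n) - exact) for n in (10, 20, 40)]

# Halving tau should divide the error by about 2 (order tau) or about 4 (order tau^2).
ratio_implicit = err_implicit[0] / err_implicit[1]
ratio_midpoint = err_midpoint[0] / err_midpoint[1]
print(ratio_implicit, ratio_midpoint)
```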

2.2 An introduction to the metric setting


The iterated minimization scheme that we introduced above has another interesting feature: it even
suggests how to define solutions for functions $F$ which are only l.s.c., with no gradient at all!
Even more, a huge advantage of this discretized formulation is that it can easily be adapted to
metric spaces. Actually, if one has a metric space $(X, d)$ and a l.s.c. function
$F : X \to \mathbb{R} \cup \{+\infty\}$ (under suitable compactness assumptions to guarantee existence of
the minimum), one can define

$$x^\tau_{k+1} \in \operatorname{argmin}_x \; F(x) + \frac{d(x, x^\tau_k)^2}{2\tau} \tag{2.9}$$

and study the limit as $\tau \to 0$. Then, we use the piecewise constant interpolation

$$x^\tau(t) := x^\tau_k \quad \text{for every } t \in\, ](k-1)\tau, k\tau] \tag{2.10}$$

and study the limit of $x^\tau$ as $\tau \to 0$.

De Giorgi, in [38], defined what he called Generalized Minimizing Movements²:

Definition 2.1. A curve $x : [0, T] \to X$ is called a Generalized Minimizing Movement (GMM) if there
exists a sequence of time steps $\tau_j \to 0$ such that the sequence of curves $x^{\tau_j}$ defined in
(2.10) using the iterated solutions of (2.9) uniformly converges to $x$ in $[0, T]$.

¹ The attentive reader can observe that, setting $y := (x + x^\tau_k)/2$, this minimization problem
becomes $\min_y 2F(y) + 2|y - x^\tau_k|^2/\tau$. Yet, when acting on a metric space, or simply on a
manifold or a bounded domain, there is an extra constraint on $y$: the point $y$ must be the middle point
of a geodesic between $x^\tau_k$ and a point $x$ (on a sphere, for instance, this means that if
$x^\tau_k$ is the North Pole, then $y$ must lie in the northern hemisphere).
² We prefer not to make any distinction here between Generalized Minimizing Movements and Minimizing
Movements.

The compactness results in the space of curves guaranteeing the existence of GMM are also a
consequence of a Hölder estimate that we already saw in the Euclidean case. Yet, in the general case,
some adjustments are needed, as we cannot use the piecewise affine interpolation. We will see later
that, in case the segments may be replaced by geodesics, a similar estimate can be obtained. Yet, we can
also obtain a Hölder-like estimate from the piecewise constant interpolation.
We start from
$$F(x^\tau_{k+1}) + \frac{d(x^\tau_{k+1}, x^\tau_k)^2}{2\tau} \le F(x^\tau_k), \tag{2.11}$$
and
$$\sum_{k=0}^{l} d(x^\tau_{k+1}, x^\tau_k)^2 \le 2\tau\left( F(x^\tau_0) - F(x^\tau_{l+1}) \right) \le C\tau.$$

The Cauchy-Schwarz inequality gives, for $t < s$, $t \in [k\tau, (k+1)\tau[$ and $s \in [l\tau, (l+1)\tau[$
(which implies $|l - k| \le \frac{|t - s|}{\tau} + 1$),

$$d(x^\tau(t), x^\tau(s)) \le \sum_{j=k}^{l-1} d(x^\tau_{j+1}, x^\tau_j) \le \left( \sum_{j=k}^{l-1} d(x^\tau_{j+1}, x^\tau_j)^2 \right)^{1/2} \left( \frac{|t - s|}{\tau} + 1 \right)^{1/2} \le C\left( |t - s|^{1/2} + \sqrt{\tau} \right).$$

This means that the curves $x^\tau$ (if we forget that they are discontinuous) are morally equi-Hölder
continuous with exponent $1/2$ (up to a negligible error of order $\sqrt{\tau}$), which allows to extract
a converging subsequence.
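
Since the scheme (2.9) only uses the distance, it can be run on any space where the minimization can be carried out. The following is a minimal sketch on a finite metric space, assuming a grid of $360$ points on the unit circle with the arc-length distance and the function $F(\theta) = 1 - \cos\theta$; the grid, the function, the time step and the helper names are all illustrative assumptions, and the minimum in (2.9) is computed by brute force.

```python
import math

N = 360                       # number of grid points on the circle
H = 2 * math.pi / N           # arc length between neighbouring grid points

def dist(i, j):
    """Arc-length (geodesic) distance between grid points i and j on the unit circle."""
    k = abs(i - j) % N
    return H * min(k, N - k)

def F(i):
    """F(theta) = 1 - cos(theta) at the grid point theta = H * i (minimum at i = 0)."""
    return 1.0 - math.cos(H * i)

def mm_step(i, tau):
    """One step of (2.9): brute-force argmin over the finite space."""
    return min(range(N), key=lambda j: F(j) + dist(j, i) ** 2 / (2 * tau))

def mm_sequence(i0, tau, n):
    seq = [i0]
    for _ in range(n):
        seq.append(mm_step(seq[-1], tau))
    return seq

tau, n = 0.1, 200
seq = mm_sequence(120, tau, n)            # start at theta = 2*pi/3

# (2.11): holds because the previous point is always an admissible competitor.
step_ok = all(F(b) + dist(a, b) ** 2 / (2 * tau) <= F(a)
              for a, b in zip(seq, seq[1:]))
# Summed version: sum of squared distances <= 2 tau (F(x_0) - F(x_n)).
sum_sq = sum(dist(a, b) ** 2 for a, b in zip(seq, seq[1:]))
sum_ok = sum_sq <= 2 * tau * (F(seq[0]) - F(seq[-1])) + 1e-12
print(seq[:5], seq[-1], step_ok, sum_ok)
```

On such a coarse grid the sequence may stop a few grid points away from the minimizer once the energy gain of a one-cell move no longer pays for its cost $H^2/(2\tau)$; this is an artifact of the discretization of the space, not of the scheme.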
Anyway, if we add some structure to the metric space $(X, d)$, an analysis closer to the Euclidean
case can be performed. This is what happens when we suppose that $(X, d)$ is a geodesic space. This
requires a short discussion about curves and geodesics in metric spaces.
Curves and geodesics in metric spaces. We recall that a curve $\omega$ is a continuous function defined
on an interval, say $[0, 1]$, and valued in a metric space $(X, d)$. As it is a map between metric spaces,
it is meaningful to say whether it is Lipschitz or not, but its velocity $\omega'(t)$ has no meaning,
unless $X$ is a vector space. Surprisingly, it is possible to give a meaning to the modulus of the
velocity, $|\omega'|(t)$.

Definition 2.2. If $\omega : [0, 1] \to X$ is a curve valued in the metric space $(X, d)$ we define the
metric derivative of $\omega$ at time $t$, denoted by $|\omega'|(t)$, through
$$|\omega'|(t) := \lim_{h \to 0} \frac{d(\omega(t + h), \omega(t))}{|h|},$$
provided this limit exists.

In the spirit of Rademacher's Theorem, it is possible to prove (see [12]) that, if
$\omega : [0, 1] \to X$ is Lipschitz continuous, then the metric derivative $|\omega'|(t)$ exists for
a.e. $t$. Moreover we have, for $t_0 < t_1$,
$$d(\omega(t_0), \omega(t_1)) \le \int_{t_0}^{t_1} |\omega'|(s)\, ds.$$

The same is also true for more general curves, not only Lipschitz continuous.

Definition 2.3. A curve $\omega : [0, 1] \to X$ is said to be absolutely continuous whenever there exists
$g \in L^1([0, 1])$ such that $d(\omega(t_0), \omega(t_1)) \le \int_{t_0}^{t_1} g(s)\, ds$ for every
$t_0 < t_1$. The set of absolutely continuous curves defined on $[0, 1]$ and valued in $X$ is denoted by
$AC(X)$.

It is well-known that every absolutely continuous curve can be reparametrized in time (through a
monotone-increasing reparametrization) so as to become Lipschitz continuous, and the existence of the
metric derivative for a.e. $t$ is also true for $\omega \in AC(X)$, via this reparametrization.
Given a continuous curve, we can also define its length, and the notion of geodesic curves.

Definition 2.4. For a curve $\omega : [0, 1] \to X$, let us define
$$\mathrm{Length}(\omega) := \sup\left\{ \sum_{k=0}^{n-1} d(\omega(t_k), \omega(t_{k+1})) \;:\; n \ge 1,\; 0 = t_0 < t_1 < \dots < t_n = 1 \right\}.$$
It is easy to see that all curves $\omega \in AC(X)$ satisfy
$\mathrm{Length}(\omega) \le \int_0^1 g(t)\, dt < +\infty$. Also, we can prove that, for any curve
$\omega \in AC(X)$, we have
$$\mathrm{Length}(\omega) = \int_0^1 |\omega'|(t)\, dt.$$
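
The definitions of metric derivative and length only involve distances between points of the curve, so they can be approximated directly. The sketch below assumes the half circle $\omega(t) = (\cos \pi t, \sin \pi t)$ in $\mathbb{R}^2$ with the Euclidean distance (an illustrative example with hypothetical helper names), for which $|\omega'|(t) = \pi$ and $\mathrm{Length}(\omega) = \pi$. Uniform partitions give lower bounds for the supremum which increase under refinement.

```python
import math

def omega(t):
    """Half circle from (1, 0) to (-1, 0), parametrized on [0, 1]."""
    return (math.cos(math.pi * t), math.sin(math.pi * t))

def dist(p, q):
    """Euclidean distance in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def partition_sum(n):
    """Sum of d(omega(t_k), omega(t_{k+1})) over the uniform partition t_k = k / n."""
    pts = [omega(k / n) for k in range(n + 1)]
    return sum(dist(a, b) for a, b in zip(pts, pts[1:]))

def metric_derivative(t, h=1e-6):
    """Difference quotient d(omega(t + h), omega(t)) / |h| approximating |omega'|(t)."""
    return dist(omega(t + h), omega(t)) / abs(h)

for n in (2, 4, 1000):
    print(n, partition_sum(n))
print(metric_derivative(0.3))
```

Each partition sum is the length of a polygonal line inscribed in the half circle, hence it stays below $\pi$; integrating the constant metric derivative $\pi$ over $[0, 1]$ gives the same value as the supremum.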

We now collect some more definitions.

Definition 2.5. A curve $\omega : [0, 1] \to X$ is said to be a geodesic between $x_0$ and $x_1 \in X$ if
$\omega(0) = x_0$, $\omega(1) = x_1$ and
$\mathrm{Length}(\omega) = \min\{\mathrm{Length}(\tilde\omega) : \tilde\omega(0) = x_0,\ \tilde\omega(1) = x_1\}$.
A space $(X, d)$ is said to be a length space if for every $x$ and $y$ we have

$$d(x, y) = \inf\{\mathrm{Length}(\omega) : \omega \in AC(X),\ \omega(0) = x,\ \omega(1) = y\}.$$

A space $(X, d)$ is said to be a geodesic space if for every $x$ and $y$ we have

$$d(x, y) = \min\{\mathrm{Length}(\omega) : \omega \in AC(X),\ \omega(0) = x,\ \omega(1) = y\},$$

i.e. if it is a length space and there exist geodesics between arbitrary points.

In a length space, a curve $\omega : [t_0, t_1] \to X$ is said to be a constant-speed geodesic between
$\omega(t_0)$ and $\omega(t_1) \in X$ if it satisfies
$$d(\omega(t), \omega(s)) = \frac{|t - s|}{t_1 - t_0}\, d(\omega(t_0), \omega(t_1)) \quad \text{for all } t, s \in [t_0, t_1].$$
It is easy to check that a curve with this property is automatically a geodesic, and that the following
three facts are equivalent (for arbitrary $p > 1$):

1. $\omega$ is a constant-speed geodesic defined on $[t_0, t_1]$ and joining $x_0$ and $x_1$,

2. $\omega \in AC(X)$ and $|\omega'|(t) = \frac{d(\omega(t_0), \omega(t_1))}{t_1 - t_0}$ a.e.,

3. $\omega$ solves $\min\left\{ \int_{t_0}^{t_1} |\omega'|(t)^p\, dt : \omega(t_0) = x_0,\ \omega(t_1) = x_1 \right\}$.

We can now come back to the interpolation of the points obtained through the Minimizing Movement
scheme (2.9) and note that, if $(X, d)$ is a geodesic space, then the piecewise affine interpolation that
we used in the Euclidean space may be replaced by a piecewise geodesic interpolation. This means
defining a curve $x^\tau : [0, T] \to X$ such that $x^\tau(k\tau) = x^\tau_k$ and such that $x^\tau$
restricted to any interval $[k\tau, (k+1)\tau]$ is a constant-speed geodesic with speed equal to
$d(x^\tau_k, x^\tau_{k+1})/\tau$. Then, the same computations as in the Euclidean case allow to prove an
$H^1$ bound on the curves $x^\tau$ (i.e. an $L^2$ bound on the metric derivatives $|(x^\tau)'|$) and to
prove equicontinuity.

The next question is how to characterize the limit curve obtained when τ → 0, and in particular how
to express the fact that it is a gradient flow of the function F. Of course, one cannot try to prove the
equality x′ = −∇F(x), just because neither the left-hand side nor the right-hand side has a meaning in a
metric space!
If the space X, the distance d, and the functional F are explicitly known, in some cases it is possible
to pass to the limit the optimality conditions of each optimization problem in the discretized scheme, and
characterize the limit curves (or the limit curve) x(t). It will be possible to do so in the framework of
probability measures, as it will be discussed in Section 4, but not in general. Indeed, without a little bit of
(differential) structure on the space X, it is essentially impossible to do so. Hence, if we want to develop
a general theory for gradient flows in metric spaces, finer tools are needed. In particular, we need to
characterize the solutions of x′ = −∇F(x) (or x′ ∈ −∂F(x)) by only using metric quantities (in particular,
avoiding derivatives, gradients, and more generally vectors). The book by Ambrosio-Gigli-Savaré [6],
and in particular its first part (the second being devoted to the space of probability measures) exactly
aims at doing so.
Hence, what we do here is to present alternative characterizations of gradient flows in the smooth
Euclidean case, which can be used as a definition of gradient flow in the metric case, since all the
quantities which are involved have their metric counterpart.
The first observation is the following: thanks to the Cauchy-Schwarz inequality, for every curve x and
every s < t we have
\[
F(x(s)) - F(x(t)) = \int_s^t -\nabla F(x(r)) \cdot x'(r)\, dr \le \int_s^t |\nabla F(x(r))|\, |x'(r)|\, dr
\le \int_s^t \left( \frac12 |x'(r)|^2 + \frac12 |\nabla F(x(r))|^2 \right) dr.
\]

Here, the first inequality is an equality if and only if x′(r) and ∇F(x(r)) are vectors with opposite
directions for a.e. r, and the second is an equality if and only if their norms are equal. Hence, the condition,
called EDE (Energy Dissipation Equality),
\[
F(x(s)) - F(x(t)) = \int_s^t \left( \frac12 |x'(r)|^2 + \frac12 |\nabla F(x(r))|^2 \right) dr \quad \text{for all } s < t
\]
(or even the simple inequality F(x(s)) − F(x(t)) ≥ ∫_s^t (½|x′(r)|² + ½|∇F(x(r))|²) dr) is equivalent to
x′ = −∇F(x) a.e., and could be taken as a definition of gradient flow.
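The EDE condition can be checked numerically on a simple case. The sketch below is an illustration under stated assumptions, not part of the original text: it takes F(x) = x²/2 on the real line, whose gradient flow from x0 = 2 is x(t) = 2e^{−t}, and compares it with the curve 2e^{−2t}, which is not a gradient flow of F. The helper `dissipation_gap` and the midpoint quadrature are choices made for this example only.

```python
import math

def dissipation_gap(x, dx, grad_F, F, s, t, n=20000):
    """Integral over [s, t] of (|x'|^2 + |grad F(x)|^2)/2, minus F(x(s)) - F(x(t))."""
    h = (t - s) / n
    integral = 0.0
    for k in range(n):
        r = s + (k + 0.5) * h  # midpoint rule
        integral += 0.5 * (dx(r) ** 2 + grad_F(x(r)) ** 2) * h
    return integral - (F(x(s)) - F(x(t)))

F = lambda x: 0.5 * x * x
grad_F = lambda x: x

# Gradient flow of F starting from x0 = 2: x(t) = 2 exp(-t), so the gap should vanish.
gap_flow = dissipation_gap(lambda t: 2 * math.exp(-t),
                           lambda t: -2 * math.exp(-t), grad_F, F, 0.0, 1.0)
# A different curve, x(t) = 2 exp(-2t), which is not a gradient flow of F: strict inequality.
gap_other = dissipation_gap(lambda t: 2 * math.exp(-2 * t),
                            lambda t: -4 * math.exp(-2 * t), grad_F, F, 0.0, 1.0)

print(gap_flow, gap_other)
```

The gap is nonnegative for every curve (this is the chain of inequalities above) and vanishes, up to quadrature error, only along the gradient flow.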
In the general theory of Gradient Flows ([6]) in metric spaces, another characterization, different
from the EDE, is proposed in order to cope with uniqueness and stability results. It is based on the
following observation: if F : ℝ^d → ℝ is convex, then the inequality
\[
F(y) \ge F(x) + p \cdot (y - x) \quad \text{for all } y \in \mathbb{R}^d
\]
characterizes (by definition) the vectors p ∈ ∂F(x) and, if F ∈ C^1, it is only satisfied for p = ∇F(x).
Analogously, if F is λ-convex, the inequality that characterizes the gradient is
\[
F(y) \ge F(x) + \frac{\lambda}{2} |x - y|^2 + p \cdot (y - x) \quad \text{for all } y \in \mathbb{R}^d.
\]

We can pick a curve x(t) and a point y and compute
\[
\frac{d}{dt}\,\frac12 |x(t) - y|^2 = (y - x(t)) \cdot (-x'(t)).
\]
Consequently, imposing
\[
\frac{d}{dt}\,\frac12 |x(t) - y|^2 \le F(y) - F(x(t)) - \frac{\lambda}{2} |x(t) - y|^2
\]
for all y, will be equivalent to −x′(t) ∈ ∂F(x(t)), i.e. x′(t) ∈ −∂F(x(t)). This will provide a second characterization (called
EVI, Evolution Variational Inequality) of gradient flows in a metric environment. Indeed, all the terms
appearing in the above inequality have a metric counterpart (only squared distances and derivatives w.r.t.
time appear). Even if we often forget the dependence on λ, it should be noted that the condition EVI
should actually be written as EVI_λ, since it involves a parameter λ, which is a priori arbitrary. Actually,
λ-convexity of F is not necessary to define the EVI_λ property, but it will be necessary in order to guarantee
the existence of curves which satisfy such a condition. The notion of λ-convexity will hence be crucial
also in metric spaces, where it will rather be “λ-geodesic-convexity”.
The role of the EVI condition in the uniqueness and stability of gradient flows is quite easy to guess.
Take two curves satisfying EVI, that we call x(t) and y(s), and write
\[
\frac{d}{dt}\,\frac12 d(x(t), y(s))^2 \le F(y(s)) - F(x(t)) - \frac{\lambda}{2} d(x(t), y(s))^2, \tag{2.12}
\]
\[
\frac{d}{ds}\,\frac12 d(x(t), y(s))^2 \le F(x(t)) - F(y(s)) - \frac{\lambda}{2} d(x(t), y(s))^2. \tag{2.13}
\]
If one wants to estimate E(t) = ½ d(x(t), y(t))², summing up the two above inequalities, after a chain-rule
argument for the composition of the function of two variables (t, s) ↦ ½ d(x(t), y(s))² and of the curve
t ↦ (t, t), gives
\[
\frac{d}{dt} E(t) \le -2\lambda E(t).
\]
By Gronwall's lemma, this provides uniqueness (when x(0) = y(0)) and stability.

3 The general theory in metric spaces


3.1 Preliminaries
In order to sketch the general theory in metric spaces, first we need to give (or recall) general definitions
for the three main objects that we need in the EDE and EVI properties, characterizing gradient flows: the
notion of speed of a curve, that of slope of a function (somehow the modulus of its gradient) and that of
(geodesic) convexity.
Metric derivative. We already introduced in the previous section the notion of metric derivative:
given a curve x : [0, T] → X valued in a metric space, we can define, instead of the velocity x′(t) as a
vector (i.e., with its direction, as we would do in a vector space), the speed (i.e. the modulus, or norm, of
x′(t)) as follows:
\[
|x'|(t) := \lim_{h \to 0} \frac{d(x(t), x(t+h))}{|h|},
\]
provided the limit exists. This is the notion of speed that we will use in metric spaces.
Slope and modulus of the gradient. Many definitions of the modulus of the gradient of a function
F defined over a metric space are possible. First, we call upper gradient every function g : X → ℝ such
that, for every Lipschitz curve x, we have
\[
|F(x(0)) - F(x(1))| \le \int_0^1 g(x(t))\, |x'|(t)\, dt.
\]
If F is Lipschitz continuous, a possible choice is the local Lipschitz constant
\[
|\nabla F|(x) := \limsup_{y \to x} \frac{|F(x) - F(y)|}{d(x, y)}; \tag{3.1}
\]
another is the descending slope (we will often say just slope), which is a notion more adapted to the
minimization of a function than to its maximization, and hence reasonable for lower semi-continuous
functions:
\[
|\nabla^- F|(x) := \limsup_{y \to x} \frac{[F(x) - F(y)]_+}{d(x, y)}
\]
(note that the slope vanishes at every local minimum point). In general, it is not true that the slope is an
upper gradient, but we will give conditions to guarantee that it is. Later on (Section 5) we will see how
to define a Sobolev space H^1 on a (measure) metric space, by using suitable relaxations of the modulus
of the gradient of F.
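The difference between the local Lipschitz constant (3.1) and the descending slope can be seen on F(x) = |x| on the real line. The following sketch is an illustration only: the helper `slope_estimates`, the sampling radius and the number of samples are assumptions, and the lim sup is replaced by a maximum over sample points close to x.

```python
def slope_estimates(F, x, radius=1e-3, samples=2001):
    """Approximate both lim sup quotients over sample points y with 0 < |x - y| <= radius."""
    lipschitz, descending = 0.0, 0.0
    for k in range(samples):
        y = x - radius + 2 * radius * k / (samples - 1)
        if y == x:
            continue  # the quotient is not defined at y = x
        diff = F(x) - F(y)
        lipschitz = max(lipschitz, abs(diff) / abs(x - y))
        descending = max(descending, max(diff, 0.0) / abs(x - y))
    return lipschitz, descending

# At the minimum point x = 0 the descending slope vanishes, the Lipschitz constant does not.
lip_at_min, desc_at_min = slope_estimates(abs, 0.0)
# Away from the minimum (x = 1) both quantities agree with |F'(x)| = 1.
lip_away, desc_away = slope_estimates(abs, 1.0)

print(lip_at_min, desc_at_min, lip_away, desc_away)
```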
Geodesic convexity. The third notion to be dealt with is that of convexity. This can only be done in
a geodesic metric space. On such a space, we can say that a function is geodesically convex whenever
it is convex along geodesics. More precisely, we require that for every pair (x(0), x(1)) there exists³ a
geodesic x with constant speed connecting these two points and such that
\[
F(x(t)) \le (1 - t) F(x(0)) + t F(x(1)).
\]
We can also define λ-convex functions as those which satisfy a modified version of the above inequality:
\[
F(x(t)) \le (1 - t) F(x(0)) + t F(x(1)) - \lambda \frac{t(1 - t)}{2}\, d^2(x(0), x(1)). \tag{3.2}
\]

3.2 Existence of a gradient flow


Once these basic ingredients are fixed, we can now move on to the notion of gradient flow. A starting
approach is, again, the sequential minimization along a discrete scheme, for a fixed time step τ > 0,
and then pass to the limit. First, we would like to see in which framework this procedure is well-posed.
Let us suppose that the space X and the function F are such that every sub-level set {F ≤ c} is compact
in X, either for the topology induced by the distance d, or for a weaker topology, such that d is lower
semi-continuous w.r.t. it; F is required to be l.s.c. in the same topology. This is the minimal framework
to guarantee existence of the minimizers at each step, and to get estimates as in (2.11) providing the
existence of a limit curve. It is by the way a quite general situation, as we can see in the case where X is
a reflexive Banach space and the distance d is the one induced by the norm: in this case there is no need
to restrict to the (very severe) assumption that F is strongly l.s.c., but the weak topology allows us to deal
with a much wider situation.

³ Warning: this definition is not equivalent to true convexity along the geodesic, since we only compare intermediate instants
t to 0 and 1, and not to other intermediate instants; however, in case of uniqueness of geodesics, or if we required the same
condition to be true for all geodesics, then we would recover the same condition. Also, let us note that we will only need the
existence of geodesics connecting pairs of points where F < +∞.
We can easily understand that, even if the estimate (2.11) is enough to provide compactness, and thus
the existence of a GMM, it will never be enough to characterize the limit curve (indeed, it is satisfied by
any discrete evolution where x^τ_{k+1} gives a better value than x^τ_k, without any need for optimality). Hence,
from this estimate alone we will never obtain either of the two formulations - EDE or EVI - of metric gradient flows.
In order to improve the result, we should exploit how much x^τ_{k+1} is better than x^τ_k. An idea due to
De Giorgi allows us to obtain the desired result, via a “variational interpolation” between the points x^τ_k and
x^τ_{k+1}. In order to do so, once we fix x^τ_k, for every θ ∈ ]0, 1], we consider the problem
\[
\min_x \; F(x) + \frac{d^2(x, x^\tau_k)}{2\theta\tau}
\]
and call x(θ) any minimizer for this problem, and ϕ(θ) the minimal value. It is clear that, for θ → 0⁺,
we have x(θ) → x^τ_k and ϕ(θ) → F(x^τ_k), and that, for θ = 1, we get back to the original problem with
minimizer x^τ_{k+1}. Moreover, the function ϕ is non-increasing and hence a.e. differentiable (actually, we
can even prove that it is locally semiconcave). Its derivative ϕ′(θ) is given by the derivative of the
function θ ↦ F(x) + d²(x, x^τ_k)/(2θτ), computed at the optimal point x = x(θ) (the existence of ϕ′(θ) implies that
this derivative is the same at every minimal point x(θ)). Hence we have
\[
\varphi'(\theta) = -\frac{d^2(x(\theta), x^\tau_k)}{2\theta^2\tau},
\]
which means, by the way, that d(x(θ), x^τ_k)² does not depend on the minimizer x(θ) for all θ such that ϕ′(θ)
exists. Moreover, the optimality condition for the minimization problem with θ > 0 easily shows that
\[
|\nabla^- F|(x(\theta)) \le \frac{d(x(\theta), x^\tau_k)}{\theta\tau}.
\]
This can be seen if we consider the minimization of an arbitrary function x ↦ F(x) + c d²(x, x̄), for fixed
c > 0 and x̄, and we consider a competitor y. If x is optimal we have F(y) + c d²(y, x̄) ≥ F(x) + c d²(x, x̄),
which implies
\[
F(x) - F(y) \le c \left( d^2(y, \bar x) - d^2(x, \bar x) \right) = c\, (d(y, \bar x) + d(x, \bar x)) (d(y, \bar x) - d(x, \bar x)) \le c\, (d(y, \bar x) + d(x, \bar x))\, d(y, x).
\]
We divide by d(y, x), take the positive part and then the lim sup as y → x, and we get |∇⁻F|(x) ≤ 2c d(x, x̄).
With c = 1/(2θτ) and x̄ = x^τ_k, this is exactly the bound stated above.
We now come back to the function ϕ and use
\[
\varphi(0) - \varphi(1) \ge -\int_0^1 \varphi'(\theta)\, d\theta
\]
(the inequality is due to the possible singular part of the derivative for monotone functions; actually, we
can prove that it is an equality by using the local semiconcave behavior, but this is not needed in the
following), together with the inequality
\[
-\varphi'(\theta) = \frac{d(x(\theta), x^\tau_k)^2}{2\theta^2\tau} \ge \frac{\tau}{2}\, |\nabla^- F(x(\theta))|^2
\]
that we just proved. Hence, we get an improved version of (2.11):
\[
F(x^\tau_{k+1}) + \frac{d(x^\tau_{k+1}, x^\tau_k)^2}{2\tau} \le F(x^\tau_k) - \frac{\tau}{2} \int_0^1 |\nabla^- F(x(\theta))|^2\, d\theta.
\]
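The variational interpolation can be made fully explicit in a one-dimensional example. The sketch below is an illustration under assumptions that are not in the original text: X is the real line, F(x) = x²/2, and the values of `xk`, `tau` and `n` are arbitrary. In this case the minimizer is x(θ) = x_k/(1 + θτ), the slope at x(θ) is |x(θ)|, and the improved estimate holds with equality, so the two sides computed below should agree up to quadrature error.

```python
def x_theta(xk, tau, theta):
    """Minimizer of x -> x^2/2 + (x - xk)^2 / (2 theta tau), in closed form."""
    return xk / (1.0 + theta * tau)

def phi(xk, tau, theta):
    """Minimal value of the interpolated problem (phi(0) is F(xk) by convention)."""
    if theta == 0.0:
        return 0.5 * xk * xk
    x = x_theta(xk, tau, theta)
    return 0.5 * x * x + (x - xk) ** 2 / (2.0 * theta * tau)

xk, tau, n = 1.5, 0.1, 1000
# Midpoint rule for (tau/2) * integral over theta in (0, 1) of |F'(x(theta))|^2.
dissipation = 0.5 * tau * sum(x_theta(xk, tau, (j + 0.5) / n) ** 2 for j in range(n)) / n
lhs = phi(xk, tau, 1.0)                 # F(x_{k+1}) + d(x_{k+1}, x_k)^2 / (2 tau)
rhs = phi(xk, tau, 0.0) - dissipation   # F(x_k) - (tau/2) * integral of the squared slope

print(lhs, rhs)
```

For this quadratic F the slope bound |∇⁻F|(x(θ)) ≤ d(x(θ), x_k)/(θτ) is also an equality, which can be checked directly from the closed form of `x_theta`.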

If we sum up for k = 0, 1, 2, . . . and then take the limit τ → 0, we can prove, for every GMM x, the
inequality
\[
F(x(t)) + \frac12 \int_0^t |x'|(r)^2\, dr + \frac12 \int_0^t |\nabla^- F(x(r))|^2\, dr \le F(x(0)), \tag{3.3}
\]
under some suitable assumptions that we must select. In particular, we need lower-semicontinuity of F
in order to handle the term F(x^τ_{k+1}) (which will become F(x(t)) at the limit), but we also need
lower-semicontinuity of the slope |∇⁻F| in order to handle the corresponding term.
This inequality does not exactly correspond to EDE: on the one hand we have an inequality, and
on the other we just compare instants t and 0 instead of t and s. If we want equality for every pair
(t, s), we need to require the slope to be an upper gradient. Indeed, in this case, we have the inequality
F(x(0)) − F(x(t)) ≤ ∫_0^t |∇⁻F(x(r))| |x′|(r) dr and, starting from the usual inequalities, we find that (3.3) is
actually an equality. This allows us to subtract the equalities for s and t, and get, for s < t:
\[
F(x(t)) + \frac12 \int_s^t |x'|(r)^2\, dr + \frac12 \int_s^t |\nabla^- F(x(r))|^2\, dr = F(x(s)).
\]
Magically, it happens that the assumption that F is λ-geodesically convex simplifies everything.
Indeed, we have two good points: the slope is automatically l.s.c., and it is automatically an upper
gradient. These results are proven in [6, 5]. We just give here the main idea to prove both. This idea
is based on a pointwise representation of the slope as a sup instead of a lim sup: if F is λ-geodesically
convex, then we have
\[
|\nabla^- F|(x) = \sup_{y \ne x} \left[ \frac{F(x) - F(y)}{d(x, y)} + \frac{\lambda}{2}\, d(x, y) \right]_+ . \tag{3.4}
\]
In order to check this, we just need to add a term (λ/2) d(x, y) inside the positive part of the definition of
|∇⁻F|(x), which does not affect the limit as y → x and shows that |∇⁻F|(x) is smaller than this sup.
The opposite inequality is proven by fixing a point y, connecting it to x through a geodesic x(t), and
computing the limit along this curve.
This representation as a sup allows us to prove semicontinuity of the slope⁴. It is also possible (see [5],
for instance) to prove that the slope is an upper gradient.
Let us insist anyway on the fact that the λ-convexity assumption is neither natural nor crucial to prove the
existence of a gradient flow. On the one hand, functions smooth enough could satisfy the assumptions
on the semi-continuity of F and of |∇− F| and the fact that |∇− F| is an upper gradient independently of
convexity; on the other hand the discrete scheme already provides a method, well-posed under much
weaker assumptions, to find a limit curve. If the space and the functional allow for it (as it will be the
case in the next section), we can hope to characterize this limit curve as the solution of an equation (it
will be a PDE in Section 4), without passing through the general theory and the EDE condition.
⁴ Warning: we get here semi-continuity w.r.t. the topology induced by the distance d, which only allows us to handle the case
where the sets {F ≤ c} are d-compact.

3.3 Uniqueness and contractivity
On the contrary, if we think of the uniqueness proof that we gave in the Euclidean case, it seems that
some sort of convexity should be the right assumption in order to prove uniqueness. Here we will only
give the main lines of the uniqueness theory in the metric framework: the key point is to use the EVI
condition instead of the EDE.
The situation concerning these two different notions of gradient flows (EVI and EDE) in abstract
metric spaces has been clarified by Savaré (in an unpublished note, but the proof can also be found in
[5]), who showed that

• All curves which are gradient flows in the EVI sense also satisfy the EDE condition.

• The EDE condition is not in general enough to guarantee uniqueness of the gradient flow. A simple
example: take X = ℝ² with the ℓ^∞ distance

d((x1, x2), (y1, y2)) = |x1 − y1| ∨ |x2 − y2|,

and take F(x1, x2) = x1; we can check that any curve (x1(t), x2(t)) with x1′(t) = −1 and |x2′(t)| ≤ 1
satisfies EDE.

• On the other hand, existence of a gradient flow in the EDE sense is quite easy to get, and provable
under very mild assumptions, as we sketched in Section 3.2.

• The EVI condition is in general too strong in order to get existence (in the example above of the
ℓ^∞ norm, no EVI gradient flow would exist), but always guarantees uniqueness and stability (w.r.t.
initial data).
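The non-uniqueness example in the list above (X = ℝ² with the ℓ^∞ distance and F(x1, x2) = x1) can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the straight curves x(r) = (−r, a r) with |a| ≤ 1, takes the slope of F for the ℓ^∞ distance to be 1, and approximates the metric speed by difference quotients. The helper names are hypothetical.

```python
def linf_dist(p, q):
    """The l-infinity distance on R^2 used in the example."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def ede_defect(a, s=0.0, t=1.0, n=1000):
    """EDE defect for x(r) = (-r, a r) and F(x1, x2) = x1, with |a| <= 1.

    The slope of F for the l-infinity distance is taken equal to 1: the quotient
    [F(x) - F(y)]_+ / d(x, y) is at most 1, with equality when y moves along -x1 only.
    """
    curve = lambda r: (-r, a * r)
    h = (t - s) / n
    integral = 0.0
    for k in range(n):
        r0, r1 = s + k * h, s + (k + 1) * h
        speed = linf_dist(curve(r0), curve(r1)) / h  # metric speed on [r0, r1]
        integral += (0.5 * speed ** 2 + 0.5 * 1.0 ** 2) * h
    energy_drop = curve(s)[0] - curve(t)[0]  # F(x(s)) - F(x(t))
    return integral - energy_drop

# Several different curves from the same starting point, all with vanishing defect.
defects = [ede_defect(a) for a in (-1.0, -0.3, 0.0, 0.5, 1.0)]
print(defects)
```

All these curves start from the origin and satisfy the EDE up to rounding, which is the failure of uniqueness described in the text.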

Also, the existence of EVI gradient flows is itself very restrictive on the function F: indeed, it is
proven in [37] that, if F is such that from every starting point x0 there exists an EVI_λ gradient flow, then
F is necessarily λ-geodesically-convex.
We provide here an idea of the proof of the contractivity (and hence of the uniqueness) of the EVI
gradient flows.

Proposition 3.1. If two curves x, y : [0, T] → X satisfy the EVI condition, then we have
\[
\frac{d}{dt}\, d(x(t), y(t))^2 \le -2\lambda\, d(x(t), y(t))^2
\]
and d(x(t), y(t)) ≤ e^{−λt} d(x(0), y(0)).

The second part of the statement is an easy consequence of the first one, by Gronwall's lemma. The
first is (formally) obtained by differentiating t ↦ d(x(t), y(t0))² at t = t0, then s ↦ d(x(t0), y(s))² at
s = t0. The EVI condition allows us to write
\[
\frac{d}{dt}\, d(x(t), y(t_0))^2 \Big|_{t = t_0} \le -\lambda\, d(x(t_0), y(t_0))^2 + 2F(y(t_0)) - 2F(x(t_0)),
\]
\[
\frac{d}{ds}\, d(x(t_0), y(s))^2 \Big|_{s = t_0} \le -\lambda\, d(x(t_0), y(t_0))^2 + 2F(x(t_0)) - 2F(y(t_0))
\]

and hence, summing up, and playing with the chain rule for derivatives, we get
\[
\frac{d}{dt}\, d(x(t), y(t))^2 \le -2\lambda\, d(x(t), y(t))^2.
\]
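The contraction estimate can be observed on the discrete scheme itself in a simple Euclidean case. The sketch below is an illustration under assumptions not made in the original text: X = ℝ² with the Euclidean distance, F(x) = (x1² + 3 x2²)/2 (so λ = 1), and closed-form minimizing-movement steps. In this quadratic example each step contracts distances by at least 1/(1 + λτ), and (1 + λτ)^{−t/τ} tends to e^{−λt} as τ → 0.

```python
import math

A = (1.0, 3.0)   # F(x) = (A[0] x1^2 + A[1] x2^2) / 2, lambda-convex with lambda = 1
LAM = min(A)

def mm_step(x, tau):
    """One minimizing-movement step: argmin of F(z) + |z - x|^2 / (2 tau), in closed form."""
    return tuple(xi / (1.0 + tau * ai) for xi, ai in zip(x, A))

def run(x, tau, steps):
    """Iterate the scheme `steps` times starting from x."""
    for _ in range(steps):
        x = mm_step(x, tau)
    return x

tau, steps = 0.01, 200            # final time t = tau * steps = 2
x0, y0 = (1.0, -2.0), (-0.5, 1.0)
x_end, y_end = run(x0, tau, steps), run(y0, tau, steps)
d0, d_end = math.dist(x0, y0), math.dist(x_end, y_end)
discrete_rate = (1.0 + LAM * tau) ** (-steps)   # close to exp(-LAM * t) for small tau

print(d_end, discrete_rate * d0, math.exp(-LAM * tau * steps) * d0)
```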
If we want a satisfactory theory for gradient flows which includes uniqueness, we just need to prove the
existence of curves which satisfy the EVI condition, accepting that this will probably require additional
assumptions. This can still be done via the discrete scheme, adding a compatibility hypothesis between
the function F and the distance d, a condition which involves some sort of convexity. We do not enter the
details of the proof, for which we refer to [6], where the convergence to an EVI gradient flow is proven,
with explicit error estimates. These a priori estimates allow us to prove that we have a Cauchy sequence,
and then to get rid of the compactness part of the proof (by the way, we could even avoid using
compactness to prove existence of a minimizer at every time step, using almost-minimizers and
Ekeland's variational principle [41]). Here, we will just present this extra convexity assumption,
needed for the existence of EVI gradient flows developed in [6].
This assumption, that we will call C²G² (Compatible Convexity along Generalized Geodesics) is the
following: suppose that, for every pair (x0, x1) and every y ∈ X, there is a curve x(t) connecting x(0) = x0
to x(1) = x1, such that
\[
F(x(t)) \le (1 - t) F(x_0) + t F(x_1) - \lambda \frac{t(1 - t)}{2}\, d^2(x_0, x_1),
\]
\[
d^2(x(t), y) \le (1 - t)\, d^2(x_0, y) + t\, d^2(x_1, y) - t(1 - t)\, d^2(x_0, x_1).
\]

In other words, we require λ-convexity of the function F, but also the 2-convexity of the function x ↦
d²(x, y), along a same curve which is not necessarily the geodesic. This second condition is automatically
satisfied, by using the geodesic itself, in the Euclidean space (and in every Hilbert space), since the
function x ↦ |x − y|² is quadratic, and its Hessian matrix is 2I at every point. We can also see that it is
satisfied in a normed space if and only if the norm is induced by a scalar product. It has been recently
pointed out by Gigli that the sharp condition on the space X in order to guarantee existence of EVI
gradient flows is that X should be infinitesimally Hilbertian (this will be made precise in Section 5).
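The 2-convexity of the squared distance is easy to test numerically along straight segments. The sketch below is only an illustration with hypothetical points `x0`, `x1`, `y`: in the Euclidean plane the inequality holds with equality along the segment, while for the ℓ^∞ norm it fails along the same segment (the C²G² condition allows other curves, so this only shows that the segment itself does not work there).

```python
import math

def convexity_defect(dist, x0, x1, y, t):
    """Right-hand side minus left-hand side of the 2-convexity inequality along the segment."""
    xt = tuple((1 - t) * a + t * b for a, b in zip(x0, x1))
    rhs = (1 - t) * dist(x0, y) ** 2 + t * dist(x1, y) ** 2 - t * (1 - t) * dist(x0, x1) ** 2
    return rhs - dist(xt, y) ** 2

euclid = math.dist
linf = lambda p, q: max(abs(a - b) for a, b in zip(p, q))

x0, x1, y = (1.0, 1.0), (1.0, -1.0), (0.0, 0.0)
# Euclidean case: the defect vanishes for every t (the inequality is an identity).
euclid_defects = [convexity_defect(euclid, x0, x1, y, k / 10) for k in range(11)]
# l-infinity case: a negative defect means the inequality is violated at t = 1/2.
linf_defect = convexity_defect(linf, x0, x1, y, 0.5)

print(max(abs(d) for d in euclid_defects), linf_defect)
```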
Here, we just observe that C²G² implies (λ + 1/τ)-convexity, along those curves, sometimes called
generalized geodesics (consider that these curves also depend on a third point, sort of a base point,
typically different from the two points that should be connected), of the functional that we minimize at
each time step in the minimizing movement scheme. This provides uniqueness of the minimizer as soon
as τ is small enough, and allows to perform the desired estimates.
Also, the choice of this C²G² condition, which is a technical condition whose role is only to prove
existence of an EVI gradient flow, has been made in view of the applications to the case of the Wasserstein
spaces, that will be the object of the next section. Indeed, in these spaces the squared distance is not in
general 2-convex along geodesics, but we can find some adapted curves on which it is 2-convex, and
many functionals F stay convex on these curves.
We finish this section by mentioning a recent extension to some non-λ-convex functionals. The
starting point is the fact that the very use of Gronwall's lemma to prove uniqueness can be modified
by allowing for a weaker condition. Indeed, it is well-known that, whenever a function ω satisfies an
Osgood condition ∫_0^1 1/ω(s) ds = +∞, then E′ ≤ ω(E) together with E(0) = 0 implies E(t) = 0 for t > 0.
This suggests that one could define a variant of the EVI definition for functions which are not λ-convex,
but almost, and this is the theory developed in [36]. Such a paper studies the case where F satisfies some
sort of ω-convexity for a “modulus of convexity” ω. More precisely, this means
\[
F(x_t) \le (1 - t) F(x_0) + t F(x_1) - \frac{|\lambda|}{2} \left[ (1 - t)\, \omega(t^2 d(x_0, x_1)^2) + t\, \omega((1 - t)^2 d(x_0, x_1)^2) \right],
\]
on generalized geodesics x_t (note that in the case ω(s) = s we come back to (3.2)). The function ω is
required to satisfy an Osgood condition (and some other technical conditions). Then, the EVI condition
is replaced by
\[
\frac{d}{dt}\,\frac12 d(x(t), y)^2 \le F(y) - F(x(t)) + \frac{|\lambda|}{2}\, \omega(d(x(t), y)^2),
\]
and this allows us to produce a theory with existence and uniqueness results (via a variant of Proposition
3.1). In the Wasserstein spaces (see next section), a typical case of functionals which can fit this theory
are functionals involving singular interaction kernels (or solutions to elliptic PDEs, as in the Keller-Segel
case) under L^∞ constraints on the density (using the fact that the gradient ∇u of the solution of −∆u = ϱ
is not Lipschitz when ϱ ∈ L^∞, but is at least log-Lipschitz).

4 Gradient flows in the Wasserstein space


One of the most exciting applications (and maybe the only one⁵, in what concerns applied mathematics)
of the theory of gradient flows in metric spaces is for sure that of evolution PDEs in the space of measures.
This topic is inspired from the work by Jordan, Kinderlehrer and Otto ([55]), who had the intuition that
the Heat and the Fokker-Planck equations have a common variational structure in terms of a particular
distance on the probability measures, the so-called Wasserstein distance. Yet, the theory has only become
formal and general with the work by Ambrosio, Gigli and Savaré (which does not mean that proofs in
[55] were not rigorous, but the intuition on the general structure still needed to be better understood).
The main idea is to endow the space P(Ω) of probability measures on a domain Ω ⊂ ℝ^d with a
distance, and then deal with gradient flows of suitable functionals on such a metric space. Such a distance
arises from optimal transport theory. More details about optimal transport can be found in the books by
C. Villani ([89, 90]) and in the book on gradient flows by Ambrosio, Gigli and Savaré [6]⁶; a recent book
by the author of this survey is also available [84].

⁵ This is for sure exaggerated, as we could think for instance of the theory of geometrical evolutions of shapes and sets, even
if it seems that this metric approach has not yet been generalized in this framework.
⁶ Lighter versions exist, such as [11], or the recent User's Guide to Optimal Transport ([5]), which is a good reference for
many topics in this survey, as it deals for one half with optimal transport (even if the title suggests that this is the only topic
of the guide), then for one sixth with the general theory of gradient flows (as in our Section 3), and finally for one third with
metric spaces with curvature bounds (that we will briefly sketch in Section 5).

4.1 Preliminaries on Optimal Transport

The motivation for the whole subject is the following problem proposed by Monge in 1781 ([75]): given
two densities of mass f, g ≥ 0 on ℝ^d, with ∫ f = ∫ g = 1, find a map T : ℝ^d → ℝ^d pushing the first one
onto the other, i.e. such that
\[
\int_A g(x)\, dx = \int_{T^{-1}(A)} f(y)\, dy \quad \text{for any Borel subset } A \subset \mathbb{R}^d \tag{4.1}
\]
and minimizing the quantity
\[
\int_{\mathbb{R}^d} |T(x) - x|\, f(x)\, dx
\]
among all the maps satisfying this condition. This means that we have a collection of particles, distributed
with density f on Rd , that have to be moved, so that they arrange according to a new distribution,
whose density is prescribed and is g. The movement has to be chosen so as to minimize the average
displacement. The map T describes the movement, and T (x) represents the destination of the particle
originally located at x. The constraint on T precisely accounts for the fact that we need to reconstruct
the density g. In the sequel, we will always define, similarly to (4.1), the image measure of a measure µ
on X (measures will indeed replace the densities f and g in the most general formulation of the problem)
through a measurable map T : X → Y: it is the measure denoted by T_#µ on Y and characterized by
\[
(T_\# \mu)(A) = \mu(T^{-1}(A)) \quad \text{for every measurable set } A,
\]
\[
\text{or} \quad \int_Y \phi\, d(T_\# \mu) = \int_X \phi \circ T\, d\mu \quad \text{for every measurable function } \phi.
\]

The problem of Monge remained without solution (does a minimizer exist? how to characterize it? . . . )
until the progress made in the 1940s with the work by Kantorovich ([56]). In Kantorovich's framework,
the problem has been widely generalized, with very general cost functions c(x, y) instead of the Euclidean
distance |x − y| and more general measures and spaces.
Let us start from the general picture. Consider a metric space X, that we suppose compact for
simplicity⁷, and a cost function c : X × X → [0, +∞]. For simplicity of the exposition, we will suppose
that c is continuous and symmetric: c(x, y) = c(y, x) (in particular, the target and source space will be the
same space X).
The formulation proposed by Kantorovich of the problem raised by Monge is the following: given
two probability measures µ, ν ∈ P(X), consider the problem
\[
(\mathrm{KP}) \quad \min\left\{ \int_{X \times X} c\, d\gamma \,:\, \gamma \in \Pi(\mu, \nu) \right\}, \tag{4.2}
\]
where Π(µ, ν) is the set of the so-called transport plans, i.e.
\[
\Pi(\mu, \nu) = \{ \gamma \in \mathcal{P}(X \times X) \,:\, (\pi_0)_\# \gamma = \mu,\ (\pi_1)_\# \gamma = \nu \},
\]

where π0 and π1 are the two projections of X × X onto its factors. These probability measures over
X × X are an alternative way to describe the displacement of the particles of µ: instead of saying, for
each x, which is the destination T (x) of the particle originally located at x, we say for each pair (x, y)
how many particles go from x to y. It is clear that this description allows for more general movements,
since from a single point x particles can a priori move to different destinations y. If multiple destinations
really occur, then this movement cannot be described through a map T. It can be easily checked that
if (id, T)_#µ belongs to Π(µ, ν) then T pushes µ onto ν (i.e. T_#µ = ν) and the functional takes the form
∫ c(x, T(x)) dµ(x), thus generalizing Monge's problem.
⁷ Most of the results that we present stay true without this assumption, anyway, and we refer in particular to [6] or [89] for
details, since a large part of the analysis of [84] is performed under this simplifying assumption.

The minimizers for this problem are called optimal transport plans between µ and ν. Should γ be of
the form (id, T)_#µ for a measurable map T : X → X (i.e. when no splitting of the mass occurs), the map
T would be called an optimal transport map from µ to ν.
This generalized problem by Kantorovich is much easier to handle than the original one proposed
by Monge: for instance in the Monge case we would need existence of at least a map T satisfying the
constraints. This is not verified when µ = δ0 , if ν is not a single Dirac mass. On the contrary, there always
exists at least a transport plan in Π(µ, ν) (for instance we always have µ ⊗ ν ∈ Π(µ, ν)). Moreover, one
can state that (KP) is the relaxation of the original problem by Monge: if one considers the problem in
the same setting, where the competitors are transport plans, but sets the functional at +∞ on all the plans
that are not of the form (id, T)_#µ, then one has a functional on Π(µ, ν) whose relaxation (in the sense of
the largest lower-semicontinuous functional smaller than the given one) is the functional in (KP) (see for
instance Section 1.5 in [84]).
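The difference between maps and plans can be made concrete on discrete measures. The sketch below is an illustration, not taken from the original text: measures are stored as dictionaries of atoms, the helper names are hypothetical, and the cost c(x, y) = |x − y| is an arbitrary choice. It takes µ = δ_0 and ν = ½δ_{−1} + ½δ_1, for which the product plan µ ⊗ ν is admissible while no map T can push µ onto ν.

```python
from fractions import Fraction

def marginals(plan):
    """First and second marginals of a discrete plan given as {(x, y): mass}."""
    first, second = {}, {}
    for (x, y), mass in plan.items():
        first[x] = first.get(x, 0) + mass
        second[y] = second.get(y, 0) + mass
    return first, second

def product_plan(mu, nu):
    """The product plan of mu and nu, which always belongs to Pi(mu, nu)."""
    return {(x, y): mx * my for x, mx in mu.items() for y, my in nu.items()}

def push_forward(T, mu):
    """Image measure of a discrete measure mu under a map T."""
    image = {}
    for x, mass in mu.items():
        image[T(x)] = image.get(T(x), 0) + mass
    return image

half = Fraction(1, 2)
mu = {0: Fraction(1)}             # Dirac mass at 0
nu = {-1: half, 1: half}          # two atoms of mass 1/2
gamma = product_plan(mu, nu)
cost = sum(abs(x - y) * mass for (x, y), mass in gamma.items())   # c(x, y) = |x - y|
# Any map T sends the single atom of mu to one point, so its image is again a Dirac mass.
candidate_images = [push_forward(lambda x, target=target: target, mu) for target in (-1, 1)]

print(marginals(gamma), cost, candidate_images)
```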
Anyway, it is important to note that an easy use of the Direct Method of Calculus of Variations
(i.e. taking a minimizing sequence, saying that it is compact in some topology - here it is the weak
convergence of probability measures - finding a limit, and proving semicontinuity, or continuity, of the
functional we minimize, so that the limit is a minimizer) proves that a minimum does exist. As a
consequence, if one is interested in the problem of Monge, the question may become “does this minimizer
come from a transport map T?” (note, on the contrary, that directly attacking by compactness and
semicontinuity Monge's formulation is out of reach, because of the non-linearity of the constraint T_#µ = ν,
which is not closed under weak convergence).
Since the problem (KP) is a linear optimization under linear constraints, an important tool will be
duality theory, which is typically used for convex problems. We will find a dual problem (DP) for (KP)
and exploit the relations between dual and primal.
The first thing we will do is finding a formal dual problem, by means of an inf-sup exchange.
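First express the constraint γ ∈ Π(µ, ν) in the following way: notice that, if γ is a non-negative
measure on X × X, then we have
\[
\sup_{\phi, \psi} \int \phi\, d\mu + \int \psi\, d\nu - \int (\phi(x) + \psi(y))\, d\gamma =
\begin{cases} 0 & \text{if } \gamma \in \Pi(\mu, \nu), \\ +\infty & \text{otherwise.} \end{cases}
\]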

Hence, one can remove the constraints on γ by adding the previous sup, since if they are satisfied
nothing has been added and if they are not one gets +∞ and this will be avoided by the minimization.
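We may look at the problem we get and interchange the inf in γ and the sup in φ, ψ:
\[
\min_{\gamma \ge 0} \int c\, d\gamma + \sup_{\phi, \psi} \left( \int \phi\, d\mu + \int \psi\, d\nu - \int (\phi(x) + \psi(y))\, d\gamma \right)
\]
becomes
\[
\sup_{\phi, \psi} \int \phi\, d\mu + \int \psi\, d\nu + \inf_{\gamma \ge 0} \int (c(x, y) - (\phi(x) + \psi(y)))\, d\gamma.
\]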

Obviously it is not always possible to exchange inf and sup, and the main tools to do this come from
convex analysis. We refer to [84], Section 1.6.3 for a simple proof of this fact, or to [89], where the proof
is based on Fenchel-Rockafellar duality (see, for instance, [43]). Anyway, we insist that in this case it is
true that inf sup = sup inf.
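Afterwards, one can re-write the inf in γ as a constraint on φ and ψ, since one has
\[
\inf_{\gamma \ge 0} \int (c(x, y) - (\phi(x) + \psi(y)))\, d\gamma =
\begin{cases} 0 & \text{if } \phi(x) + \psi(y) \le c(x, y) \text{ for all } (x, y) \in X \times X, \\ -\infty & \text{otherwise.} \end{cases}
\]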

This leads to the following dual optimization problem: given the two probabilities µ and ν and the
cost function c : X × X → [0, +∞] we consider the problem
\[
(\mathrm{DP}) \quad \max\left\{ \int_X \phi\, d\mu + \int_X \psi\, d\nu \,:\, \phi, \psi \in C(X),\ \phi(x) + \psi(y) \le c(x, y) \text{ for all } (x, y) \in X \times X \right\}. \tag{4.3}
\]

This problem does not admit a straightforward existence result, since the class of admissible
functions lacks compactness. Yet, we can better understand this problem and find existence once we have
introduced the notion of c-transform (a kind of generalization of the well-known Legendre transform).

Definition 4.1. Given a function χ : X → ℝ we define its c-transform (or c-conjugate function) by
\[
\chi^c(y) = \inf_{x \in X}\; c(x, y) - \chi(x).
\]
Moreover, we say that a function ψ is c-concave if there exists χ such that ψ = χ^c and we denote by
Ψ_c(X) the set of c-concave functions.

It is quite easy to realize that, given a pair (φ, ψ) in the maximization problem (DP), one can
always replace it with (φ, φ^c), and then with (φ^{cc}, φ^c), and the constraints are preserved and the integrals
increased. Actually one could go on but it is possible to prove that φ^{ccc} = φ^c for any function φ. This is
the same as saying that ψ^{cc} = ψ for any c-concave function ψ, and this perfectly recalls what happens
for the Legendre transform of convex functions (which corresponds to the particular case c(x, y) = −x · y).
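On a finite set, the c-transform is a minimum over finitely many points and the identities above can be checked exactly. The sketch below is an illustration only: the set `points`, the integer-valued symmetric cost and the starting function `phi` are hypothetical data chosen so that all computations are exact.

```python
def c_transform(chi, cost, points):
    """chi^c(y) = min over x of cost(x, y) - chi(x), on a finite set of points."""
    return {y: min(cost(x, y) - chi[x] for x in points) for y in points}

points = [0, 1, 2, 3, 4]
cost = lambda x, y: (x - y) ** 2          # symmetric integer-valued cost (illustrative choice)
phi = {0: 3, 1: -1, 2: 4, 3: 0, 4: 2}     # arbitrary starting function (hypothetical data)

phi_c = c_transform(phi, cost, points)
phi_cc = c_transform(phi_c, cost, points)
phi_ccc = c_transform(phi_cc, cost, points)

# The pair (phi^cc, phi^c) still satisfies the constraint of (DP) ...
admissible = all(phi_cc[x] + phi_c[y] <= cost(x, y) for x in points for y in points)
# ... and phi^cc is pointwise no smaller than phi, so the dual value does not decrease.
improved = all(phi_cc[x] >= phi[x] for x in points)

print(phi_c, phi_cc, phi_ccc == phi_c, admissible, improved)
```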
A consequence of these considerations is the following well-known result:

Proposition 4.1. We have
\[
\min\,(\mathrm{KP}) = \max_{\phi \in \Psi_c(X)} \int_X \phi\, d\mu + \int_X \phi^c\, d\nu, \tag{4.4}
\]
where the max on the right hand side is realized. In particular the minimum value of (KP) is a convex
function of (µ, ν), as it is a supremum of linear functionals.

Definition 4.2. The functions φ realizing the maximum in (4.4) are called Kantorovich potentials for the
transport from µ to ν (and will be often denoted by the symbol ϕ instead of φ).

Notice that any c-concave function shares the same modulus of continuity as the cost c. Hence, if c is uniformly continuous (which is always the case whenever c is continuous and X is compact), one can get a uniform modulus of continuity for the functions in Ψ_c(X). This is the reason why one can prove existence for (DP) (which is the same as the right-hand side problem in Proposition 4.1), by applying the Ascoli-Arzelà Theorem.
We look at two interesting cases. When c(x, y) is equal to the distance d(x, y) on the metric space X, then we can easily see that

\[
\phi \in \Psi_c(X) \iff \phi \text{ is a 1-Lipschitz function} \tag{4.5}
\]

and

\[
\phi \in \Psi_c(X) \implies \phi^c = -\phi. \tag{4.6}
\]

Another interesting case is the case where X = Ω ⊂ R^d and c(x, y) = ½|x − y|². In this case we have

\[
\phi \in \Psi_c(X) \implies x \mapsto \frac{|x|^2}{2} - \phi(x) \text{ is a convex function.}
\]

Moreover, if X = R^d this is an equivalence.
A consequence of (4.5) and (4.6) is that, in the case where c = d, Formula (4.4) may be re-written as

\[
\min \text{(KP)} = \max \text{(DP)} = \max_{\phi \in \mathrm{Lip}_1} \int_X \phi\, d(\mu - \nu). \tag{4.7}
\]

We now concentrate on the quadratic case when X is a domain Ω ⊂ Rd , and look at the existence of
optimal transport maps T . From now on, we will use the word domain to denote the closure of a bounded
and connected open set.
The main tool is the duality result. If we have equality between the minimum of (KP) and the
maximum of (DP) and both extremal values are realized, one can consider an optimal transport plan γ
and a Kantorovich potential ϕ and write

\[
\varphi(x) + \varphi^c(y) \le c(x,y) \ \text{on } X \times X \quad \text{and} \quad \varphi(x) + \varphi^c(y) = c(x,y) \ \text{on } \operatorname{spt}\gamma.
\]

The equality on spt γ is a consequence of the inequality, which is valid everywhere, and of

\[
\int c\, d\gamma = \int \varphi\, d\mu + \int \varphi^c\, d\nu = \int \big( \varphi(x) + \varphi^c(y) \big)\, d\gamma,
\]

which implies equality γ-a.e. These functions being continuous, the equality passes to the support of the measure. Once we have that, let us fix a point (x_0, y_0) ∈ spt γ. One may deduce from the previous computations that

\[
x \mapsto \varphi(x) - \frac{1}{2}|x - y_0|^2 \quad \text{is maximal at } x = x_0
\]

and, if ϕ is differentiable at x_0, one gets ∇ϕ(x_0) = x_0 − y_0, i.e. y_0 = x_0 − ∇ϕ(x_0). This shows that only one point y_0 can be such that (x_0, y_0) ∈ spt γ, which means that γ is concentrated on a graph. The map T providing this graph is of the form x ↦ x − ∇ϕ(x) = ∇u(x) (where u(x) := |x|²/2 − ϕ(x) is a convex function). This shows the following well-known theorem, due to Brenier ([23, 24]). Note that this also gives uniqueness of the optimal transport plan and of the gradient of the Kantorovich potential. The only technical point to let this strategy work is the µ-a.e. differentiability of the potential ϕ. Since ϕ has the same regularity as a convex function, and convex functions are locally Lipschitz, it is differentiable Lebesgue-a.e., which allows one to prove the following:
Theorem 4.2. Given µ and ν probability measures on a domain Ω ⊂ R^d there exists an optimal transport plan γ for the quadratic cost ½|x − y|². It is unique and of the form (id, T)#µ, provided µ is absolutely continuous. Moreover there exists also at least a Kantorovich potential ϕ, and the gradient ∇ϕ is uniquely determined µ-a.e. (in particular ϕ is unique up to additive constants if the density of µ is positive a.e. on Ω). The optimal transport map T and the potential ϕ are linked by T(x) = x − ∇ϕ(x). Moreover, the optimal map T is equal a.e. to the gradient of a convex function u, given by u(x) := |x|²/2 − ϕ(x).

Actually, the existence of an optimal transport map is true under weaker assumptions: we can replace the condition of being absolutely continuous with the condition “µ(A) = 0 for any A ⊂ R^d such that H^{d−1}(A) < +∞” or with any condition which ensures that the non-differentiability set of ϕ is negligible (and convex functions are more regular than locally Lipschitz functions).
In Theorem 4.2 only the part concerning the optimal map T is not symmetric in µ and ν: hence the uniqueness of the Kantorovich potential is true even if ν (and not µ) has positive density a.e. (since one can retrieve ϕ from ϕ^c and vice versa).
We stress that Theorem 4.2 admits a converse implication and that any gradient of a convex function is indeed optimal between µ and its image measure. Moreover, Theorem 4.2 can be translated in very easy terms in the one-dimensional case d = 1: given a non-atomic measure µ and another measure ν, both in P(R), there exists a unique monotone increasing transport map T such that T#µ = ν, and it is optimal for the quadratic cost.
Finally, the same kind of arguments could be adapted to prove existence and uniqueness of an optimal map for other costs, in particular to costs of the form c(x, y) = h(x − y) for a strictly convex function h : R^d → R, which includes all the costs of the form c(x, y) = |x − y|^p with p > 1. In the one-dimensional case it even happens that the same monotone increasing map is optimal for every p ≥ 1 (and it is the unique optimal map for p > 1)!
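
The one-dimensional statement can be illustrated on small samples. The sketch below is an illustration under assumptions, not the setting of the theorem: it replaces the non-atomic measure µ by equal-weight atoms, so the monotone map becomes the pairing of the i-th smallest point of one sample with the i-th smallest point of the other. The function names and the sample values are hypothetical. The monotone pairing is compared with a brute-force search over all one-to-one pairings for several exponents p.

```python
from itertools import permutations

def transport_cost(xs, ys, p):
    """Average cost of sending xs[i] to ys[i] for the cost |x - y|^p."""
    return sum(abs(x - y) ** p for x, y in zip(xs, ys)) / len(xs)

def monotone_cost(xs, ys, p):
    """Cost of the monotone increasing pairing (i-th smallest to i-th smallest)."""
    return transport_cost(sorted(xs), sorted(ys), p)

def brute_force_cost(xs, ys, p):
    """Minimal cost over all one-to-one pairings (only feasible for tiny samples)."""
    return min(transport_cost(xs, perm, p) for perm in permutations(ys))

# Illustrative samples, chosen arbitrarily.
xs = [0.9, 0.1, 0.4, 0.7, 0.25]
ys = [1.3, 2.0, 0.6, 1.1, 1.7]

# Difference between the monotone pairing and the best pairing, for several p >= 1.
gaps = {p: monotone_cost(xs, ys, p) - brute_force_cost(xs, ys, p)
        for p in (1, 1.5, 2, 3)}
```

The same sorted pairing attains the minimum for every exponent tested, in line with the remark that one monotone map is optimal for all p ≥ 1.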

4.2 The Wasserstein distances


Starting from the values of the problem (KP) in (4.2) we can define a set of distances over P(X).
We mainly consider costs of the form c(x, y) = |x − y|^p in X = Ω ⊂ R^d, but the analysis can be adapted to a power of the distance in a more general metric space X. The exponent p will always be taken in [1, +∞[ (we will not discuss the case p = ∞) in order to take advantage of the properties of the L^p norms. When Ω is unbounded we need to restrict our analysis to the following set of probabilities

\[
P_p(\Omega) := \left\{ \mu \in P(\Omega) \;:\; \int_\Omega |x|^p\, d\mu(x) < +\infty \right\}.
\]

In a metric space, we fix an arbitrary point x_0 ∈ X, and set

\[
P_p(X) := \left\{ \mu \in P(X) \;:\; \int_X d(x, x_0)^p\, d\mu(x) < +\infty \right\}
\]

(the finiteness of this integral does not depend on the choice of x_0).

The distances that we want to consider are defined in the following way: for any p ∈ [1, +∞[ set

\[
W_p(\mu, \nu) = \Big( \min \text{(KP)} \ \text{with } c(x,y) = |x - y|^p \Big)^{1/p}.
\]

The quantities that we obtain in this way are called Wasserstein distances⁸. They are very important in many fields of applications and they seem a natural way to describe distances between equal amounts of mass distributed on the same space.

⁸ They are named after L. Vaserstein (whose name is sometimes spelled Wasserstein), but this choice is highly debated, in particular in Russia, as the role he played in these distances is very marginal. However, this is now the standard name used in Western countries, probably due to the terminology used in [55, 76], even if other names have been suggested, such as Monge-Kantorovich distances or Kantorovich-Rubinstein distances, and we will stick to this terminology.

Figure 1: “Vertical” vs “horizontal” distances (the transport T is computed in the picture on the right using 1D considerations, imposing equality between the blue and red areas under the graphs of f and g).

It is interesting to compare these distances to L^p distances between densities (a comparison which is meaningful when we consider absolutely continuous measures on R^d, for instance). A first observation is the very different behavior of these two classes of distances. We could say that, if L^p distances can be considered “vertical”, Wasserstein distances are instead “horizontal”. This consideration is very informal, but is quite natural if one associates with every absolutely continuous measure the graph of its density. To compute ||f − g||_{L^p} we need to look, for every point x, at the distance between f(x) and g(x), which corresponds to a vertical displacement between the two graphs, and then integrate this quantity. On the contrary, to compute W_p(f, g) we need to consider the distance between a point x and a point T(x) (i.e. a horizontal displacement on the graph) and then to integrate this, for a particular pairing between x and T(x) which makes a coupling between f and g.
A first example where we can see the very different behavior of these two ways of computing distances is the following: take two densities f and g supported on [0, 1], and define g_h as g_h(x) = g(x − h). As soon as |h| > 1, the L^p distance between f and g_h equals (||f||^p_{L^p} + ||g||^p_{L^p})^{1/p}, and does not depend on the “spatial” information consisting in |h|. On the contrary, the W_p distance between f and g_h is of the order of |h| (for h → ∞) and depends much more on the displacement than on the shapes of f and g.
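
This translation example can be checked numerically. The sketch below makes several illustrative assumptions: f = g is the uniform density on [0, 1], p = 2, the L^p distance is approximated with a midpoint rule, and W_p is computed through the one-dimensional quantile formula W_p^p = ∫_0^1 |F^{-1}(s) − G^{-1}(s)|^p ds, which encodes the optimality of the monotone map on the real line. The helper names are hypothetical.

```python
def lp_distance(f, g, lo, hi, p, n=20000):
    """Midpoint-rule approximation of the L^p distance between two densities on [lo, hi]."""
    dx = (hi - lo) / n
    total = sum(abs(f(lo + (i + 0.5) * dx) - g(lo + (i + 0.5) * dx)) ** p
                for i in range(n))
    return (total * dx) ** (1.0 / p)

def wp_distance_1d(quantile_f, quantile_g, p, n=20000):
    """W_p on the real line from the two quantile functions (monotone coupling)."""
    total = sum(abs(quantile_f((i + 0.5) / n) - quantile_g((i + 0.5) / n)) ** p
                for i in range(n))
    return (total / n) ** (1.0 / p)

# Illustrative choice: f = g = uniform density on [0, 1], g_h(x) = g(x - h).
f = lambda x: 1.0 if 0.0 <= x <= 1.0 else 0.0
p = 2
results = {}
for h in (2.0, 5.0, 10.0):
    g_h = lambda x, h=h: f(x - h)
    lp = lp_distance(f, g_h, -1.0, h + 2.0, p)
    # The quantile function of the uniform density is s -> s; the shift adds h.
    wp = wp_distance_1d(lambda s: s, lambda s, h=h: s + h, p)
    results[h] = (lp, wp)
```

For every shift the first entry stays close to 2^{1/p}, while the second one equals the shift itself.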
We now analyze some properties of these distances. Most proofs can be found in [84], chapter 5, or
in [89] or [6].
First we underline that, as a consequence of Hölder (or Jensen) inequalities, the Wasserstein distances are always ordered, i.e. W_{p_1} ≤ W_{p_2} if p_1 ≤ p_2. Reversed inequalities are possible only if Ω is bounded, and in this case we have, if we set D = diam(Ω), for p_1 ≤ p_2,

\[
W_{p_1} \le W_{p_2} \le D^{1 - p_1/p_2}\, W_{p_1}^{p_1/p_2}.
\]

This automatically guarantees that, if the quantities W_p are distances, then they induce the same topology, at least when Ω is bounded. But first we should check that they are distances...
Proposition 4.3. The quantity W_p defined above is a distance over P_p(Ω).

Proof. First, let us note that W_p ≥ 0. Then, we also remark that W_p(µ, ν) = 0 implies that there exists γ ∈ Π(µ, ν) such that ∫ |x − y|^p dγ = 0. Such a γ ∈ Π(µ, ν) is concentrated on {x = y}. This implies µ = ν since, for any test function φ, we have

\[
\int \phi\, d\mu = \int \phi(x)\, d\gamma = \int \phi(y)\, d\gamma = \int \phi\, d\nu.
\]

We now need to prove the triangle inequality. We only give a proof in the case p > 1, with absolutely continuous measures.
Take three measures µ, ϱ and ν, and suppose that µ and ϱ are absolutely continuous. Let T be the optimal transport from µ to ϱ and S the optimal one from ϱ to ν. Then S ∘ T is an admissible transport from µ to ν, since (S ∘ T)#µ = S#(T#µ) = S#ϱ = ν. We have

\[
W_p(\mu, \nu) \le \left( \int |S(T(x)) - x|^p\, d\mu(x) \right)^{1/p} = \|S \circ T - \mathrm{id}\|_{L^p(\mu)} \le \|S \circ T - T\|_{L^p(\mu)} + \|T - \mathrm{id}\|_{L^p(\mu)}.
\]

Moreover,

\[
\|S \circ T - T\|_{L^p(\mu)} = \left( \int |S(T(x)) - T(x)|^p\, d\mu(x) \right)^{1/p} = \left( \int |S(y) - y|^p\, d\varrho(y) \right)^{1/p}
\]

and this last quantity equals W_p(ϱ, ν). Moreover, ||T − id||_{L^p(µ)} = W_p(µ, ϱ), whence

\[
W_p(\mu, \nu) \le W_p(\mu, \varrho) + W_p(\varrho, \nu).
\]

This gives the proof when µ, ϱ ≪ L^d and p > 1. For the general case, an approximation is needed (but other arguments can also apply, see, for instance, [84], Section 5.1). □

We now give, without proofs, two results on the topology induced by W p on a general metric space
X.
Theorem 4.4. If X is compact, for any p ≥ 1 the function W p is a distance over P(X) and the convergence
with respect to this distance is equivalent to the weak convergence of probability measures.
To prove that the convergence according to W_p is equivalent to weak convergence one first establishes this result for p = 1, through the use of the duality formula in the form (4.7). Then it is possible to use the inequalities between the distances W_p (see above) to extend the result to a general p.
The case of a noncompact space X is a little more difficult. As we said, the distance is only defined on a subset of the whole space of probability measures, to avoid infinite values. Set, for a fixed reference point x_0 which can be chosen to be 0 in the Euclidean space,

\[
m_p(\mu) := \int_X d(x, x_0)^p\, d\mu(x).
\]

In this case, the distance W_p is only defined on P_p(X) := {µ ∈ P(X) : m_p(µ) < +∞}. We have
Theorem 4.5. For any p ≥ 1 the function W_p is a distance over P_p(X) and, given a measure µ and a sequence (µ_n)_n in 𝕎_p(X), the following are equivalent:
• µ_n → µ according to W_p;

• µ_n ⇀ µ and m_p(µ_n) → m_p(µ);

• ∫_X φ dµ_n → ∫_X φ dµ for any φ ∈ C^0(X) whose growth is at most of order p (i.e. there exist constants A and B depending on φ such that |φ(x)| ≤ A + B d(x, x_0)^p for any x).

After this short introduction to the metric space 𝕎_p := (P_p(X), W_p) and its topology, we will focus on the Euclidean case, i.e. where the metric space X is a domain Ω ⊂ R^d, and study the curves valued in 𝕎_p(Ω) in connection with PDEs.
The main point is to identify the absolutely continuous curves in the space 𝕎_p(Ω) with solutions of the continuity equation ∂_t µ_t + ∇ · (v_t µ_t) = 0 with L^p vector fields v_t. Moreover, we want to connect the L^p norm of v_t with the metric derivative |µ'|(t).
We recall that standard considerations from fluid mechanics tell us that the continuity equation above may be interpreted as the equation ruling the evolution of the density µ_t of a family of particles initially distributed according to µ_0 and each of which follows the flow

\[
\begin{cases}
y_x'(t) = v_t(y_x(t)), \\
y_x(0) = x.
\end{cases}
\]

The main theorem in this setting (originally proven in [6]) relates absolutely continuous curves in 𝕎_p with solutions of the continuity equation:
Theorem 4.6. Let (µ_t)_{t∈[0,1]} be an absolutely continuous curve in 𝕎_p(Ω) (for p > 1 and Ω ⊂ R^d an open domain). Then for a.e. t ∈ [0, 1] there exists a vector field v_t ∈ L^p(µ_t; R^d) such that
• the continuity equation ∂_t µ_t + ∇ · (v_t µ_t) = 0 is satisfied in the sense of distributions,

• for a.e. t we have ||v_t||_{L^p(µ_t)} ≤ |µ'|(t) (where |µ'|(t) denotes the metric derivative at time t of the curve t ↦ µ_t, w.r.t. the distance W_p).
Conversely, if (µ_t)_{t∈[0,1]} is a family of measures in P_p(Ω) and for each t we have a vector field v_t ∈ L^p(µ_t; R^d) with ∫_0^1 ||v_t||_{L^p(µ_t)} dt < +∞ solving ∂_t µ_t + ∇ · (v_t µ_t) = 0, then (µ_t)_t is absolutely continuous in 𝕎_p(Ω) and for a.e. t we have |µ'|(t) ≤ ||v_t||_{L^p(µ_t)}.
Note that, as a consequence of the second part of the statement, the vector field v_t introduced in the first part must a posteriori satisfy ||v_t||_{L^p(µ_t)} = |µ'|(t).
We will not give the proof of this theorem, which is quite involved. The main reference is [6] (but the reader can also find alternative proofs in Chapter 5 of [84], in the case where Ω is compact). Yet, if the reader wants an idea of the reason for this theorem to be true, it is possible to start from the case of two time steps: there are two measures µ_t and µ_{t+h} and there are several ways for moving the particles so as to reconstruct the latter from the former. It is exactly as when we look for a transport. One of these transports is optimal in the sense that it minimizes ∫ |T(x) − x|^p dµ_t(x) and the value of this integral equals W_p^p(µ_t, µ_{t+h}). If we call v_t(x) the “discrete velocity” of the particle located at x at time t, i.e. v_t(x) = (T(x) − x)/h, one has ||v_t||_{L^p(µ_t)} = (1/h) W_p(µ_t, µ_{t+h}). We can easily guess that, at least formally, the result of the previous theorem can be obtained as a limit as h → 0.
Once we know about curves in their generality, it is interesting to think about geodesics. The follow-
ing result is a characterization of geodesics in W p (Ω) when Ω is a convex domain in Rd . This procedure
is also known as McCann’s displacement interpolation.
Theorem 4.7. If Ω ⊂ R^d is convex, then all the spaces 𝕎_p(Ω) are length spaces and if µ and ν belong to 𝕎_p(Ω), and γ is an optimal transport plan from µ to ν for the cost c_p(x, y) = |x − y|^p, then the curve

\[
\mu^\gamma(t) = (\pi_t)_\# \gamma,
\]

where π_t : Ω × Ω → Ω is given by π_t(x, y) = (1 − t)x + ty, is a constant-speed geodesic from µ to ν. In the case p > 1 all the constant-speed geodesics are of this form, and, if µ is absolutely continuous, then there is only one geodesic and it has the form

\[
\mu_t = [T_t]_\# \mu, \quad \text{where } T_t := (1 - t)\,\mathrm{id} + tT,
\]

where T is the optimal transport map from µ to ν. In this case, the velocity field v_t of the geodesic µ_t is given by v_t = (T − id) ∘ (T_t)^{−1}. In particular, for t = 0 we have v_0 = −∇ϕ and for t = 1 we have v_1 = ∇ψ, where ϕ is the Kantorovich potential in the transport from µ to ν and ψ = ϕ^c.
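
The constant-speed property of the displacement interpolation can be observed on the real line. The following sketch relies on assumptions that are not in the theorem: the measures are equal-weight empirical measures (hence not absolutely continuous), the optimal map is replaced by the monotone pairing of sorted atoms, and p = 2. The function names and sample values are hypothetical. It verifies that W_2(µ_s, µ_t) = |t − s| W_2(µ, ν) along the interpolated curve.

```python
def w2_sorted(xs, ys):
    """W_2 between two equal-weight empirical measures on R (monotone pairing)."""
    xs, ys = sorted(xs), sorted(ys)
    return (sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

def displacement_interpolation(xs, ys, t):
    """Atoms of mu_t = ((1 - t) id + t T)_# mu, with T the monotone pairing xs -> ys."""
    xs, ys = sorted(xs), sorted(ys)
    return [(1 - t) * x + t * y for x, y in zip(xs, ys)]

# Illustrative atoms for mu and nu.
mu = [0.0, 0.2, 0.5, 0.9]
nu = [1.0, 1.1, 2.0, 3.5]
total = w2_sorted(mu, nu)

times = [0.0, 0.25, 0.5, 0.75, 1.0]
curve = [displacement_interpolation(mu, nu, t) for t in times]

# Constant speed: W_2(mu_s, mu_t) should equal |t - s| * W_2(mu, nu) for all s, t.
defects = [abs(w2_sorted(curve[i], curve[j]) - abs(times[j] - times[i]) * total)
           for i in range(len(times)) for j in range(len(times))]
```

The interpolated atoms stay ordered, which is why the sorted pairing remains the relevant coupling at every intermediate time.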
The above theorem may be adapted to the case where the Euclidean domain Ω is replaced by a Rie-
mannian manifold, and in this case the map T t must be defined by using geodesics instead of segments:
the point T t (x) will be the value at time t of a constant-speed geodesic, parametrized on the interval [0, 1],
connecting x to T (x). For the theory of optimal transport on manifolds, we refer to [71].
Using the characterization of constant-speed geodesics as minimizers of a strictly convex kinetic
energy, we can also infer the following interesting information.

• Looking for an optimal transport for the cost c(x, y) = |x − y|^p is equivalent to looking for a constant-speed geodesic in 𝕎_p, because from optimal plans we can reconstruct geodesics and from geodesics (via their velocity field) it is possible to reconstruct the optimal transport;

• constant-speed geodesics may be found by minimizing ∫_0^1 |µ'|(t)^p dt;

• in the case of the Wasserstein spaces, we have |µ'|(t)^p = ∫_Ω |v_t|^p dµ_t, where v is a velocity field solving the continuity equation together with µ (this field is not unique, but the metric derivative |µ'|(t) equals the minimal value of the L^p norm of all possible fields).

As a consequence of these last considerations, for p > 1, solving the kinetic energy minimization problem

\[
\min \left\{ \int_0^1 \int_\Omega |v_t|^p\, d\varrho_t\, dt \;:\; \partial_t \varrho_t + \nabla \cdot (v_t \varrho_t) = 0,\ \varrho_0 = \mu,\ \varrho_1 = \nu \right\}
\]

selects constant-speed geodesics connecting µ to ν, and hence allows one to find the optimal transport between these two measures. This is what is usually called the Benamou-Brenier formula ([14]).
On the other hand, this minimization problem in the variables (ϱ_t, v_t) has non-linear constraints (due to the product v_t ϱ_t) and the functional is non-convex (since (t, x) ↦ t|x|^p is not convex). Yet, it is possible to transform it into a convex problem. For this, it is sufficient to switch variables, from (ϱ_t, v_t) into (ϱ_t, E_t) where E_t = v_t ϱ_t, thus obtaining the following minimization problem

\[
\min \left\{ \int_0^1 \int_\Omega \frac{|E_t|^p}{\varrho_t^{p-1}}\, dx\, dt \;:\; \partial_t \varrho_t + \nabla \cdot E_t = 0,\ \varrho_0 = \mu,\ \varrho_1 = \nu \right\}. \tag{4.8}
\]

We need to use the properties of the function f_p : R × R^d → R ∪ {+∞}, defined through

\[
f_p(t, x) := \sup_{(a,b) \in K_q} (at + b \cdot x) =
\begin{cases}
\dfrac{1}{p} \dfrac{|x|^p}{t^{p-1}} & \text{if } t > 0, \\
0 & \text{if } t = 0,\ x = 0, \\
+\infty & \text{if } t = 0,\ x \ne 0, \text{ or } t < 0,
\end{cases}
\]

where K_q := {(a, b) ∈ R × R^d : a + (1/q)|b|^q ≤ 0} and q = p/(p − 1) is the conjugate exponent of p. In particular, f_p is convex, which makes the above minimization problem convex, and also allows one to write what we formally wrote as ∫_0^1 ∫_Ω |E_t|^p / ϱ_t^{p−1} dx dt (an expression which implicitly assumes ϱ_t, E_t ≪ L^d) in the form

\[
B_p(\varrho, E) := \sup \left\{ \int a\, d\varrho + \int b \cdot dE \;:\; (a, b) \in C(\Omega \times [0,1]; K_q) \right\}.
\]

Both the convexity and this last expression will be useful for numerical methods (as was first done in [14]).
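
The equality between the supremum over K_q and the closed form of f_p can be tested numerically for t > 0. The sketch below is restricted to dimension d = 1 and uses a plain grid search over b, both of which are simplifying assumptions; the function names, the grid bounds and the test points are illustrative. For t > 0 the best a for a given b lies on the boundary of K_q, i.e. a = −|b|^q/q, so only b has to be scanned.

```python
def f_p(t, x, p):
    """Closed form of f_p in dimension d = 1, following the three cases of the definition."""
    if t > 0:
        return abs(x) ** p / (p * t ** (p - 1))
    if t == 0 and x == 0:
        return 0.0
    return float("inf")

def f_p_dual(t, x, p, b_max=20.0, n=20001):
    """Grid approximation of sup over K_q of (a t + b x), valid for t > 0.

    For each b the optimal a is -|b|^q / q, the boundary of K_q.
    """
    q = p / (p - 1)
    best = float("-inf")
    for i in range(n):
        b = -b_max + 2 * b_max * i / (n - 1)
        best = max(best, -abs(b) ** q / q * t + b * x)
    return best

# Illustrative test points; the maximizing b always lies inside [-b_max, b_max] here.
checks = [(t, x, p) for p in (1.5, 2.0, 3.0)
          for t in (0.5, 1.0, 2.0) for x in (-1.5, 0.3, 2.0)]
errors = [abs(f_p(t, x, p) - f_p_dual(t, x, p)) for (t, x, p) in checks]
```

The grid value can only underestimate the supremum, so the errors measure how far the discretized dual falls below the closed form.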

4.3 Minimizing movement schemes in the Wasserstein space and evolution PDEs
Thanks to all the theory which has been described so far, it is natural to study gradient flows in the space
W2 (Ω) (the reason for choosing the exponent p = 2 will be clear in a while) and to connect them to
PDEs of the form of a continuity equation. The most convenient way to study this is to start from the
time-discretized problem, i.e. to consider a sequence of iterated minimization problems:

\[
\varrho^\tau_{k+1} \in \operatorname{argmin}_{\varrho}\; F(\varrho) + \frac{W_2^2(\varrho, \varrho^\tau_k)}{2\tau}. \tag{4.9}
\]

Note that we now denote the measures on Ω by the letter ϱ instead of µ or ν because we expect them to be absolutely continuous measures with nice (smooth) densities, and we want to study the PDE they solve.
The reason to focus on the case p = 2 can also be easily understood. Indeed, from the very beginning, i.e. from Section 2, we saw that the equation x' = −∇F(x) corresponds to a sequence of minimization problems with the squared distance |x − x^τ_k|² (if we change the exponent here we can consider

\[
\min_x\; F(x) + \frac{1}{p} \cdot \frac{|x - x^\tau_k|^p}{\tau^{p-1}},
\]

but this gives rise to the equation x' = −|∇F(x)|^{q−2} ∇F(x), where q = p/(p − 1) is the conjugate exponent of p), and in the Wasserstein space 𝕎_p the distance is defined as the power 1/p of a transport cost; only in the case p = 2 the exponent goes away and we are led to a minimization problem involving F(ϱ) and a transport cost of the form

\[
\mathcal{T}_c(\varrho, \nu) := \min \left\{ \int c(x,y)\, d\gamma \;:\; \gamma \in \Pi(\varrho, \nu) \right\},
\]

for ν = ϱ^τ_k.
In the particular case of the space W2 (Ω), which has some additional structure, if compared to ar-
bitrary metric spaces, we would like to give a PDE description of the curves that we obtain as gradient
flows, and this will pass through the optimality conditions of the minimization problem (4.9). In order to
study these optimality conditions, we introduce the notion of first variation of a functional. This will be
done in a very sketchy and formal way (we refer to Chapter 7 in [84] for more details).
Given a functional G : P(Ω) → R we call δG/δϱ (ϱ), if it exists, the unique (up to additive constants) function such that

\[
\frac{d}{d\varepsilon} G(\varrho + \varepsilon \chi)\Big|_{\varepsilon = 0} = \int \frac{\delta G}{\delta \varrho}(\varrho)\, d\chi
\]

for every perturbation χ such that, at least for ε ∈ [0, ε_0], the measure ϱ + εχ belongs to P(Ω). The function δG/δϱ (ϱ) is called first variation of the functional G at ϱ. In order to understand this notion, the easiest possibility is to analyze some examples.
The three main classes of examples are the following functionals⁹

\[
\mathcal{F}(\varrho) = \int f(\varrho(x))\, dx, \qquad \mathcal{V}(\varrho) = \int V(x)\, d\varrho, \qquad \mathcal{W}(\varrho) = \frac{1}{2} \iint W(x - y)\, d\varrho(x)\, d\varrho(y),
\]

where f : R → R is a convex superlinear function (and the functional 𝓕 is set to +∞ if ϱ is not absolutely continuous w.r.t. the Lebesgue measure) and V : Ω → R and W : R^d → R are regular enough (and W is taken symmetric, i.e. W(z) = W(−z), for simplicity). In this case it is quite easy to realize that we have

\[
\frac{\delta \mathcal{F}}{\delta \varrho}(\varrho) = f'(\varrho), \qquad \frac{\delta \mathcal{V}}{\delta \varrho}(\varrho) = V, \qquad \frac{\delta \mathcal{W}}{\delta \varrho}(\varrho) = W * \varrho.
\]
It is clear that the first variation of a functional is a crucial tool to write optimality conditions for variational problems involving such a functional. In order to study the problem (4.9), we need to complete the picture by understanding the first variation of functionals of the form ϱ ↦ 𝒯_c(ϱ, ν). The result is the following:
Proposition 4.8. Let c : Ω × Ω → R be a continuous cost function. Then the functional ϱ ↦ 𝒯_c(ϱ, ν) is convex, and its subdifferential at ϱ_0 coincides with the set of Kantorovich potentials {ϕ ∈ C^0(Ω) : ∫ ϕ dϱ_0 + ∫ ϕ^c dν = 𝒯_c(ϱ_0, ν)}. Moreover, if there is a unique c-concave Kantorovich potential ϕ from ϱ_0 to ν up to additive constants, then we also have

\[
\frac{\delta \mathcal{T}_c(\cdot, \nu)}{\delta \varrho}(\varrho_0) = \varphi.
\]
Even if a complete proof of the above proposition is not totally trivial (and Chapter 7 in [84] only provides it in the case where Ω is compact), one can guess why this is true from the following considerations. Start from Proposition 4.1, which provides

\[
\mathcal{T}_c(\varrho, \nu) = \max_{\phi \in \Psi_c(X)} \int_\Omega \phi\, d\varrho + \int_\Omega \phi^c\, d\nu.
\]

This expresses 𝒯_c as a supremum of linear functionals in ϱ and shows convexity. Standard considerations from convex analysis allow one to identify the subdifferential as the set of functions ϕ attaining the maximum. An alternative point of view is to consider the functional ϱ ↦ ∫ φ dϱ + ∫ φ^c dν for fixed φ, in which case the first variation is of course φ; then it is easy to see that the first variation of the supremum may be obtained (in case of uniqueness) just by selecting the optimal φ.
Once we know how to compute first variations, we come back to the optimality conditions for the minimization problem (4.9). What are these optimality conditions? Roughly speaking, we should have

\[
\frac{\delta F}{\delta \varrho}(\varrho^\tau_{k+1}) + \frac{\varphi}{\tau} = \text{const}
\]

(where the reason for having a constant instead of 0 is the fact that, in the space of probability measures, only zero-mean densities are considered as admissible perturbations, and the first variations are always defined up to additive constants). Note that here ϕ is the Kantorovich potential associated with the transport from ϱ^τ_{k+1} to ϱ^τ_k (and not vice versa).

⁹ Note that in some cases the functionals that we use are actually valued in R ∪ {+∞}, and we restrict to a suitable class of perturbations χ which make the corresponding functional finite.

More precise statements and proofs of these optimality conditions will be presented in the next section. Here we look at the consequences we can get. Actually, if we combine the fact that the above sum is constant, and that we have T(x) = x − ∇ϕ(x) for the optimal T, we get

\[
-v(x) := \frac{T(x) - x}{\tau} = -\frac{\nabla \varphi(x)}{\tau} = \nabla \left( \frac{\delta F}{\delta \varrho}(\varrho) \right)(x). \tag{4.10}
\]

We will denote by −v the ratio (T(x) − x)/τ. Why? Because, as a ratio between a displacement and a time step, it has the meaning of a velocity, but since it is the displacement associated to the transport from ϱ^τ_{k+1} to ϱ^τ_k, it is better to view it rather as a backward velocity (which justifies the minus sign).
Since here we have v = −∇(δF/δϱ (ϱ)), this suggests that at the limit τ → 0 we will find a solution of

\[
\partial_t \varrho - \nabla \cdot \left( \varrho\, \nabla \left( \frac{\delta F}{\delta \varrho}(\varrho) \right) \right) = 0. \tag{4.11}
\]

This is a PDE where the velocity field in the continuity equation depends on the density ϱ itself. An interesting example is the case where we use F(ϱ) = ∫ f(ϱ(x)) dx, with f(t) = t log t. In such a case we have f'(t) = log t + 1 and ∇(f'(ϱ)) = ∇ϱ/ϱ: this means that the gradient flow equation associated to the functional F would be the Heat Equation ∂_t ϱ − ∆ϱ = 0. Using F(ϱ) = ∫ f(ϱ(x)) dx + ∫ V(x) dϱ(x), we would have the Fokker-Planck Equation ∂_t ϱ − ∆ϱ − ∇ · (ϱ∇V) = 0. We will see later which other interesting PDEs can be obtained in this way.
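
A case where the iterated minimization can be carried out by hand is the potential energy alone, F(ϱ) = ∫ V dϱ, with the penalization W_2²/(2τ). Writing the transport cost through a plan γ with first marginal ϱ^τ_k, the quantity to minimize is ∫ (V(y) + |x − y|²/(2τ)) dγ(x, y), which can be minimized point by point in x: each particle takes one implicit Euler step for x' = −∇V(x). The sketch below assumes V(x) = x²/2 in dimension one and an equal-weight empirical measure; these choices and all names are illustrative. In the limit τ → 0 the particles should follow x(t) = x(0) e^{−t}, the characteristics of ∂_t ϱ − ∇ · (ϱ∇V) = 0.

```python
import math

def prox_step(x, tau):
    """One implicit step for V(x) = x^2 / 2: argmin_y V(y) + |y - x|^2 / (2 tau)."""
    return x / (1.0 + tau)

def jko_potential_energy(particles, tau, steps):
    """Iterate the scheme on an equal-weight empirical measure, atom by atom."""
    for _ in range(steps):
        particles = [prox_step(x, tau) for x in particles]
    return particles

# Illustrative initial atoms and final time.
initial = [-2.0, -0.5, 0.1, 1.0, 3.0]
final_time = 1.0

errors = {}
for steps in (10, 100, 1000):
    tau = final_time / steps
    discrete = jko_potential_energy(initial, tau, steps)
    # Exact flow of x' = -V'(x) = -x up to the final time.
    exact = [x * math.exp(-final_time) for x in initial]
    errors[steps] = max(abs(a - b) for a, b in zip(discrete, exact))
```

The error decreases roughly linearly in τ, as expected from an implicit Euler discretization. Functionals involving the density, such as the entropy, do not decouple in this way and require a genuine optimization over measures.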
Many possible proofs can be built for the convergence of the above iterated minimization scheme.
In particular, one could follow the general theory developed in [6], i.e. checking all the assumptions to
prove existence and uniqueness of an EVI gradient flow for the functional F in the space W2 (Ω), and
then characterizing the velocity field that Theorem 4.6 associates with the curve obtained as a gradient
flow. In [6], it is proven, under suitable conditions, that such a vector field vt must belong to what is
defined as the Wasserstein sub-differential of the functional F, provided in particular that F is λ-convex.
Then, the Wasserstein sub-differential is proven to be of the desired form (i.e. composed only of the
gradient of the first variation of F, when F admits a first variation).
This approach has the advantage of using a general theory and adapting it to the scope of this particular setting. On the other hand, some considerations seem necessary:

• the important point when studying these PDEs is that the curves (ϱ_t)_t obtained as a limit are true weak solutions of the continuity equations; from this point of view, the notions of EDE and EVI solutions and the formalism developed in the first part of the book [6] (devoted to the general metric case) are not necessary; even if the second part of [6] is exactly concerned with Wasserstein spaces and with the characterization of the limit as τ → 0 as the solution of a PDE, we can say that the whole formalism is sometimes too heavy.

• after using optimal transport theory to select a suitable distance in the discrete scheme above and a suitable interpolation, the passage to the limit can be done by classical compactness techniques in the space of measures and in functional analysis; on the other hand, there are often some difficulties in handling some non-linear terms, which are not always seen when using the theory of [6] (which is an advantage of this general theory).

• the λ-convexity assumption is in general not crucial in what concerns existence (but the general theory in [6] has been built under this assumption, as we saw in Section 3).

• as far as uniqueness of the solution is concerned, the natural point of view in the applications
would be to prove uniqueness of the weak solution of the equation (or, possibly, to define a more
restrictive notion of solution of the PDE for which one can prove uniqueness), and this is a priori
very different from the EDE or EVI notions. To this aim, the use of the Wasserstein distance can
be very useful, as one can often prove uniqueness by differentiating in time the distance between
two solutions, and maybe apply a Gronwall lemma (and this can be done independently of the
EVI notion; see for instance the end of section 4.4). On the other hand, uniqueness results are
almost never possible without some kind of λ-convexity (or weaker versions of it, as in [36]) of
the functional.

For the reader who wants an idea of how to prove convergence of the scheme to a solution of the PDE independently of the EDE/EVI theory, here are some sketchy ideas. Everything will be developed in detail in Section 4.5 in a particular case.
The main point lies in the interpolation of the sequence ϱ^τ_k (and of the corresponding velocities v^τ_k). Indeed, two possible interpolations turn out to be useful: on the one hand, we can define an interpolation (ϱ^τ, v^τ) which is piecewise constant, as in (2.10) (to define v^τ we use ∇ϕ/τ); on the other hand, we can connect each measure ϱ^τ_k to ϱ^τ_{k+1} by using a piecewise geodesic curve ϱ̃^τ, where geodesics are chosen according to the Wasserstein metric, using Theorem 4.7 to get an explicit expression. This second interpolation is a Lipschitz curve in 𝕎_2(Ω), and has an explicit velocity field, that we know thanks to Theorem 4.7: we call it ṽ^τ and it is related to v^τ. The advantage of the second interpolation is that (ϱ̃^τ, ṽ^τ) satisfies the continuity equation. On the contrary, the first interpolation is not continuous, the continuity equation is not satisfied, but the optimality conditions at each time step provide a connection between v^τ and ϱ^τ (namely v^τ = −∇(δF/δϱ (ϱ^τ))), which is not satisfied with ϱ̃^τ and ṽ^τ. It is possible to prove that the two interpolations converge to the same limit as τ → 0, and that the limit will satisfy a continuity equation with a velocity vector field v = −∇(δF/δϱ (ϱ)), which allows one to conclude.

4.4 Geodesic convexity in W2


Even if we insisted on the fact that the most natural approach to gradient flows in 𝕎_2 relies on the notion of weak solutions to some PDEs and not on the EDE/EVI formulations, for sure it could be interesting to check whether the general theory of Section 3 could be applied to some model functionals on the space of probabilities, such as 𝓕, 𝒱 or 𝒲. This requires discussing their λ-convexity, which is also useful because it can provide uniqueness results. As we now know the geodesics in the Wasserstein space, the question of which functionals are geodesically convex is easy to tackle. The notion of geodesic convexity in the 𝕎_2 space, also called displacement convexity, was first introduced by McCann in [69].

Displacement convexity of 𝓕, 𝒱 and 𝒲. It is not difficult to check that the convexity of V is enough to guarantee geodesic convexity of 𝒱, since

\[
\mathcal{V}(\mu_t) = \int V\, d\big( ((1 - t)\,\mathrm{id} + tT)_\# \mu \big) = \int V\big( (1 - t)x + tT(x) \big)\, d\mu,
\]

as well as the convexity of W guarantees that of 𝒲:

\[
\begin{aligned}
\mathcal{W}(\mu_t) &= \frac{1}{2} \int W(x - y)\, d\big( ((1 - t)\,\mathrm{id} + tT)_\# \mu \otimes ((1 - t)\,\mathrm{id} + tT)_\# \mu \big)(x, y) \\
&= \frac{1}{2} \int W\big( ((1 - t)x + tT(x)) - ((1 - t)y + tT(y)) \big)\, d(\mu \otimes \mu).
\end{aligned}
\]

Similarly, if V or W are λ-convex we get λ-geodesic convexity. Note that in the case of 𝒱 it is easy to see that the λ-convexity of V is also necessary for the λ-geodesic convexity of 𝒱, while the same is not true for W and 𝒲.
The most interesting displacement convexity result is the one for functionals depending on the density. To consider these functionals, we need some technical facts.
The starting point is the computation of the density of an image measure, via standard change-of-variable techniques: if T : Ω → Ω is a map smooth enough¹⁰ and injective, and det(DT(x)) ≠ 0 for a.e. x ∈ Ω, if we set ϱ_T := T#ϱ, then ϱ_T is absolutely continuous with density given by

\[
\varrho_T = \frac{\varrho}{\det(DT)} \circ T^{-1}.
\]

Then, we underline an interesting computation, which can be proven as an exercise.
Lemma 4.9. Let A be a d × d matrix such that its eigenvalues λ_i are all real and larger than −1 (for instance this is the case when A is symmetric and I + A ≥ 0). Then [0, 1] ∋ t ↦ g(t) := det(I + tA)^{1/d} is concave.
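
The lemma can be checked numerically on a small example. The sketch below is only a sanity check under illustrative assumptions: d = 2, one particular symmetric matrix with eigenvalues approximately 0.82 and −0.62 (both larger than −1), and discrete concavity tested through second differences on a uniform grid. The function name and the matrix entries are hypothetical.

```python
def g(t, a11, a12, a22):
    """g(t) = det(I + tA)^(1/d) for a symmetric 2 x 2 matrix A (so d = 2)."""
    det = (1.0 + t * a11) * (1.0 + t * a22) - (t * a12) ** 2
    return det ** 0.5

# Illustrative symmetric matrix; its eigenvalues are about 0.82 and -0.62.
a11, a12, a22 = 0.5, 0.6, -0.3

ts = [i / 50 for i in range(51)]
values = [g(t, a11, a12, a22) for t in ts]

# Discrete concavity: second differences on a uniform grid must be non-positive.
second_differences = [values[i - 1] - 2 * values[i] + values[i + 1]
                      for i in range(1, 50)]
```

Here det(I + tA) = (1 + tλ_1)(1 + tλ_2), so g is the geometric mean of two positive affine functions of t, which is the mechanism behind the concavity.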
We can now state the main theorem.
Theorem 4.10. Suppose that f(0) = 0 and that s ↦ s^d f(s^{−d}) is convex and decreasing. Suppose that Ω is convex and take 1 < p < ∞. Then 𝓕 is geodesically convex in 𝕎_2.
Proof. Let us consider two measures µ_0, µ_1 with 𝓕(µ_0), 𝓕(µ_1) < +∞. They are absolutely continuous and hence there is a unique constant-speed geodesic µ_t between them, which has the form µ_t = (T_t)#µ_0, where T_t = id + t(T − id). Note that we have T_t(x) = x − t∇ϕ(x), where ϕ is such that |x|²/2 − ϕ is convex. This implies that D²ϕ ≤ I and T_t is, for t < 1, the gradient of a strictly convex function, hence it is injective. Moreover ∇ϕ is countably Lipschitz, and so is T_t. From the formula for the density of the image measure, we know that µ_t is absolutely continuous and we can write its density ϱ_t as ϱ_t(y) = ϱ(T_t^{−1}(y)) / det(I + tA(T_t^{−1}(y))), where A = −D²ϕ and ϱ is the density of µ_0, and hence

\[
\mathcal{F}(\mu_t) = \int f\left( \frac{\varrho(T_t^{-1}(y))}{\det(I + tA(T_t^{-1}(y)))} \right) dy = \int f\left( \frac{\varrho(x)}{\det(I + tA(x))} \right) \det(I + tA(x))\, dx,
\]

where we used the change of variables y = T_t(x) and dy = det DT_t(x) dx = det(I + tA(x)) dx.
From Lemma 4.9 we know that det(I + tA(x)) = g(t, x)^d for a function g : [0, 1] × Ω → R which is concave in t. It is a general fact that the composition of a convex and decreasing function with a concave one gives a convex function. This implies that

\[
t \mapsto f\left( \frac{\varrho(x)}{g(t,x)^d} \right) g(t,x)^d
\]

is convex (if ϱ(x) ≠ 0 this uses the assumption on f and the fact that t ↦ g(t, x)/ϱ(x)^{1/d} is concave; if ϱ(x) = 0 then this function is simply zero). Finally, we proved convexity of t ↦ 𝓕(µ_t). □

¹⁰ We need at least T to be countably Lipschitz, i.e. Ω may be written as a countable union of measurable sets (Ω_i)_{i≥0} with Ω_0 negligible and the restriction of T to Ω_i Lipschitz continuous for every i ≥ 1.

Remark 4.1. Note that the assumption that $s \mapsto s^d f(s^{-d})$ is convex and decreasing implies that $f$ itself is convex (the reader can check it as an exercise), a property which can be useful to establish, for instance, lower semicontinuity of $\mathcal{F}$.

Let us see some easy examples of convex functions satisfying the assumptions of Theorem 4.10:

• for any $q > 1$, the function $f(t) = t^q$ satisfies these assumptions, since $s^d f(s^{-d}) = s^{-d(q-1)}$ is convex and decreasing;

• the entropy function $f(t) = t \log t$ also satisfies the assumptions since $s^d f(s^{-d}) = -d \log s$ is convex and decreasing;

• if $1 - \frac{1}{d} \leq m < 1$ the function $f(t) = -t^m$ is convex, and if we compute $s^d f(s^{-d}) = -s^{d(1-m)}$ we get a convex and decreasing function since $0 < d(1-m) \leq 1$. Note that in this case $f$ is not superlinear, which requires some attention for the semicontinuity of $\mathcal{F}$.
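The three examples above can be checked numerically. The following is a minimal sketch, not a proof: it samples $s \mapsto s^d f(s^{-d})$ on a grid and tests monotonicity and the sign of second differences. The dimension `d = 3`, the exponents and the grid are illustrative assumptions, and the helper names are hypothetical.

```python
import math

def transformed(f, d):
    """Return s -> s**d * f(s**(-d)), the function appearing in Theorem 4.10."""
    return lambda s: s**d * f(s ** (-d))

def is_convex_and_decreasing(h, lo=0.2, hi=5.0, n=400, tol=1e-9):
    """Grid check: nonnegative second differences and nonincreasing values."""
    step = (hi - lo) / n
    v = [h(lo + k * step) for k in range(n + 1)]
    convex = all(v[k - 1] - 2 * v[k] + v[k + 1] >= -tol for k in range(1, n))
    decreasing = all(v[k + 1] <= v[k] + tol for k in range(n))
    return convex and decreasing

d = 3                                   # illustrative dimension
power = lambda t: t**2                  # f(t) = t^q with q = 2
entropy = lambda t: t * math.log(t)     # f(t) = t log t
fast = lambda t: -(t**0.75)             # f(t) = -t^m with 1 - 1/d <= m < 1
results = [is_convex_and_decreasing(transformed(f, d)) for f in (power, entropy, fast)]

# An exponent below the threshold 1 - 1/d = 2/3 fails the convexity test.
too_small = is_convex_and_decreasing(transformed(lambda t: -(t**0.5), d))
```

With these choices all three entries of `results` are `True`, while `too_small` is `False`, in line with the threshold $m \geq 1 - 1/d$.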

Convexity on generalized geodesics. It is quite disappointing to note that the functional $\mu \mapsto W_2^2(\mu, \nu)$ is not, in general, displacement convex. This seems contrary to intuition, because squared distances are usually nice convex functions (see footnote 11). However, we can see that this fails from the following easy example. Take $\nu = \frac12 \delta_{(1,0)} + \frac12 \delta_{(-1,0)}$ and $\mu_t = \frac12 \delta_{(t,a)} + \frac12 \delta_{(-t,-a)}$. If $a > 1$, the curve $\mu_t$ is the geodesic between $\mu_{-1}$ and $\mu_1$ (because the optimal transport between these measures sends $(-1, a)$ to $(1, a)$ and $(1, -a)$ to $(-1, -a)$). Yet, if we compute $W_2^2(\mu_t, \nu)$ we have
\[
W_2^2(\mu_t, \nu) = a^2 + \min\{(1-t)^2, (1+t)^2\} .
\]
But this function is not convex! (see Figure 2)
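The failure of convexity in this example can be reproduced with a few lines of code. The sketch below assumes only that both measures consist of two atoms of mass 1/2, in which case an optimal plan can be taken among the two possible pairings; the function name and the value `a = 2.0` are illustrative choices.

```python
def w2_squared_two_atoms(mu, nu):
    """Squared W2 distance between two measures made of two atoms of mass 1/2 each.

    With equal masses an optimal plan can be taken among the two possible
    pairings of atoms, so it suffices to compare their costs.
    """
    def sq(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    straight = 0.5 * (sq(mu[0], nu[0]) + sq(mu[1], nu[1]))
    crossed = 0.5 * (sq(mu[0], nu[1]) + sq(mu[1], nu[0]))
    return min(straight, crossed)

a = 2.0  # any value a > 1 works in the example
nu = [(1.0, 0.0), (-1.0, 0.0)]

def value(t):
    """W2^2(mu_t, nu) for mu_t = (delta_(t,a) + delta_(-t,-a)) / 2."""
    return w2_squared_two_atoms([(t, a), (-t, -a)], nu)

endpoints_average = 0.5 * (value(-1.0) + value(1.0))
midpoint = value(0.0)
```

Here `midpoint` equals $a^2 + 1$ while `endpoints_average` equals $a^2$, so the value at $t = 0$ lies strictly above the chord between $t = -1$ and $t = 1$.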

[Figure 2: The value of $W_2^2(\mu_t, \nu) = a^2 + \min\{(t-1)^2, (t+1)^2\}$ as a function of $t \in [-1, 1]$, together with the positions of the atoms of $\mu_{-1}$, $\mu_t$, $\mu_1$ and $\nu$.]

Footnote 11: Actually, this is true in normed spaces, but not even true in Riemannian manifolds, as it depends on curvature properties.
The lack of geodesic convexity of this easy functional (see footnote 12) is a problem for many issues, and in particular for the $C^2G^2$ condition, and an alternate notion has been proposed, namely that of convexity along generalized geodesics.

Definition 4.3. If we fix an absolutely continuous probability $\varrho \in \mathcal{P}(\Omega)$, for every pair $\mu_0, \mu_1 \in \mathcal{P}(\Omega)$ we call generalized geodesic between $\mu_0$ and $\mu_1$ with base $\varrho$ in $W_2(\Omega)$ the curve $\mu_t = ((1-t)T_0 + tT_1)_\# \varrho$, where $T_0$ is the optimal transport map (for the cost $|x-y|^2$) from $\varrho$ to $\mu_0$, and $T_1$ from $\varrho$ to $\mu_1$.

It is clear that $t \mapsto W_2^2(\mu_t, \varrho)$ satisfies
\[
W_2^2(\mu_t, \varrho) \leq \int \big|((1-t)T_0(x) + tT_1(x)) - x\big|^2 \, d\varrho(x)
\leq (1-t)\int |T_0(x) - x|^2 \, d\varrho(x) + t \int |T_1(x) - x|^2 \, d\varrho(x)
= (1-t) W_2^2(\mu_0, \varrho) + t W_2^2(\mu_1, \varrho)
\]

and hence we have the desired convexity along this curve. Moreover, similar considerations to those we developed in this section show that all the functionals that we proved to be geodesically convex are also convex along generalized geodesics. For the case of functionals $\mathcal{V}$ and $\mathcal{W}$ this is easy, while for the case of the functional $\mathcal{F}$, Lemma 4.9 has to be changed into "$t \mapsto \det((1-t)A + tB)^{1/d}$ is concave, whenever $A$ and $B$ are symmetric and positive-definite" (the proof is similar). We do not develop these proofs here, and we refer to [6] or Chapter 7 in [84] for more details.
Of course, we could wonder whether the assumption $C^2G^2$ is satisfied in the Wasserstein space $W_2(\Omega)$ for these functionals $\mathcal{F}$, $\mathcal{V}$ and $\mathcal{W}$: actually, if one looks closer at this question, it is possible to see that the very definition of $C^2G^2$ has been given on purpose in order to cover the case of Wasserstein spaces. Indeed, if we fix $\nu \in \mathcal{P}(\Omega)$ and take $\mu_0, \mu_1$ two other probabilities, with $T_0, T_1$ the optimal transports from $\nu$ to $\mu_0$ and $\mu_1$, respectively, then the curve
\[
\mu_t := ((1-t)T_0 + tT_1)_\# \nu \tag{4.12}
\]
connects $\mu_0$ to $\mu_1$ and can be used in $C^2G^2$.

Footnote 12: By the way, this functional can even be proven to be somehow geodesically concave, as shown in [6], Theorem 7.3.2.

Displacement convexity and curvature conditions. In the proof of the geodesic convexity of the functional $\mathcal{F}$ we strongly used the Euclidean structure of the geodesics in the Wasserstein space. The key point was the form of the intermediate map $T_t$: a linear interpolation between the identity map id and the optimal map $T$, together with the convexity properties of $t \mapsto \det(I + tA)^{1/d}$. On a Riemannian manifold, this would completely change, as the geodesic curves between $x$ and $T(x)$, which are no longer segments, could concentrate more or less than what happens in the Euclidean case, depending on the curvature of the manifold (think of the geodesics on a sphere connecting points close to the North Pole to points close to the South Pole: they are much farther from each other at their middle points than at their endpoints). It has been found (see [79]) that the condition of λ-geodesic convexity of the entropy functional $\varrho \mapsto \int \varrho \log(\varrho)\, d\mathrm{Vol}$ (where $\varrho$ is absolutely continuous w.r.t. the volume measure Vol and densities are computed accordingly) on a manifold characterizes a lower bound on its Ricci curvature:
Proposition 4.11. Let $M$ be a compact manifold of dimension $d$ and Vol its volume measure. Let $\mathcal{E}$ be the entropy functional defined via $\mathcal{E}(\varrho) = \int \varrho \log \varrho \, d\mathrm{Vol}$ for all measures $\varrho \ll \mathrm{Vol}$ (set to $+\infty$ on non-absolutely continuous measures). Then $\mathcal{E}$ is λ-geodesically convex in the Wasserstein space $W_2(M)$ if and only if the Ricci curvature $\mathrm{Ric}_M$ satisfies $\mathrm{Ric}_M \geq \lambda$. In the case $\lambda = 0$, the same equivalence is true if one replaces the entropy function $f(t) = t \log t$ with the function $f_N(t) = -t^{1-1/N}$ with $N \geq d$.
This fact will be the basis (we will see it in Section 5) of a definition based on optimal transport of
the notion of Ricci curvature bounds in more abstract spaces, a definition independently proposed in two
celebrated papers by Sturm, [87] and Lott and Villani, [63].
Remark 4.2. The reader who followed the first proofs of this section has for sure observed that it is easy,
in the space W2 (Ω), to produce λ-geodesically convex functionals which are not geodesically convex
(with λ < 0, of course), of the form V (just take a λ-convex function V which is not convex), but
that Theorem 4.10 only provides geodesic convexity (never provides λ-convexity without convexity) for
functionals of the form F : this is indeed specific to the Euclidean case, where the optimal transport
has the form T (x) = x − ∇ϕ(x); in Riemannian manifolds or other metric measure spaces, this can be
different!

Geodesic convexity and uniqueness of gradient flows. The fact that λ-convexity is a crucial tool to
establish uniqueness and stability results in gradient flows is not only true in the abstract theory of Section
3, where we saw that the EVI condition (intimately linked to λ-convexity) provides stability. Indeed, it
can also be observed in the concrete case of gradient flows in W2 (Ω), which take the form of the PDE
(4.11). We will see this fact via some formal considerations, starting from an interesting lemma:
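Lemma 4.12. Suppose that $F : W_2(\Omega) \to \mathbb{R} \cup \{+\infty\}$ is λ-geodesically convex. Then, for every $\varrho_0, \varrho_1 \in \mathcal{P}(\Omega)$ for which the integrals below are well-defined, we have
\[
\int \nabla\varphi \cdot \nabla\!\left(\frac{\delta F}{\delta\varrho}(\varrho_0)\right) d\varrho_0 + \int \nabla\psi \cdot \nabla\!\left(\frac{\delta F}{\delta\varrho}(\varrho_1)\right) d\varrho_1 \geq \lambda W_2^2(\varrho_0, \varrho_1),
\]
where $\varphi$ is the Kantorovich potential in the transport from $\varrho_0$ to $\varrho_1$ and $\psi = \varphi^c$ is the Kantorovich potential from $\varrho_1$ to $\varrho_0$.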
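Proof. Let $\varrho_t$ be the geodesic curve in $W_2(\Omega)$ connecting $\varrho_0$ to $\varrho_1$ and $g(t) := F(\varrho_t)$. The assumption of λ-convexity means (see the definition in (3.2))
\[
g(t) \leq (1-t)g(0) + tg(1) - \frac{\lambda}{2} t(1-t) W_2^2(\varrho_0, \varrho_1).
\]
Since the above inequality is an equality for $t = 0, 1$, we can differentiate it at these two points thus obtaining
\[
g'(0) \leq g(1) - g(0) - \frac{\lambda}{2} W_2^2(\varrho_0, \varrho_1), \qquad g'(1) \geq g(1) - g(0) + \frac{\lambda}{2} W_2^2(\varrho_0, \varrho_1),
\]
which implies
\[
g'(0) - g'(1) \leq -\lambda W_2^2(\varrho_0, \varrho_1).
\]
Then, we compute the derivative of $g$, formally obtaining
\[
g'(t) = \int \frac{\delta F}{\delta\varrho}(\varrho_t)\, \partial_t \varrho_t = -\int \frac{\delta F}{\delta\varrho}(\varrho_t)\, \nabla \cdot (v_t \varrho_t) = \int \nabla\!\left(\frac{\delta F}{\delta\varrho}(\varrho_t)\right) \cdot v_t \, d\varrho_t ,
\]
and, using $v_0 = -\nabla\varphi$ and $v_1 = \nabla\psi$, we obtain the claim. □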

With the above lemma in mind, we consider two curves $\varrho^0_t$ and $\varrho^1_t$, with
\[
\partial_t \varrho^i_t + \nabla \cdot (\varrho^i_t v^i_t) = 0 \quad \text{for } i = 0, 1,
\]
where the vector fields $v^i_t$ are their respective velocity fields provided by Theorem 4.6. Setting $d(t) := \frac12 W_2^2(\varrho^0_t, \varrho^1_t)$, it is natural to guess that we have
\[
d'(t) = \int \nabla\varphi \cdot v^0_t \, d\varrho^0_t + \int \nabla\psi \cdot v^1_t \, d\varrho^1_t ,
\]
where $\varphi$ is the Kantorovich potential in the transport from $\varrho^0_t$ to $\varrho^1_t$ and $\psi = \varphi^c$. Indeed, a rigorous proof is provided in [6] or in Section 5.3.5 of [84], but one can guess it from the duality formula
\[
d(t) = \max\left\{ \int \phi \, d\varrho^0_t + \int \psi \, d\varrho^1_t \; : \; \phi(x) + \psi(y) \leq \frac12 |x-y|^2 \right\}.
\]
As usual, differentiating an expression written as a max involves the optimal functions $\phi, \psi$ in such a max, and the terms $\partial_t \varrho^i_t$ have been replaced by $-\nabla \cdot (\varrho^i_t v^i_t)$ as in the proof of Lemma 4.12.
When the two curves $\varrho^0_t$ and $\varrho^1_t$ are solutions of (4.11) we have $v^i_t = -\nabla\!\left(\frac{\delta F}{\delta\varrho}(\varrho^i_t)\right)$, and Lemma 4.12 allows to obtain the following:
Proposition 4.13. Suppose that $F : W_2(\Omega) \to \mathbb{R} \cup \{+\infty\}$ is λ-geodesically convex and that the two curves $\varrho^0_t$ and $\varrho^1_t$ are solutions of (4.11). Then, setting $d(t) := \frac12 W_2^2(\varrho^0_t, \varrho^1_t)$, we have
\[
d'(t) \leq -\lambda W_2^2(\varrho^0_t, \varrho^1_t) = -2\lambda\, d(t).
\]
This implies uniqueness of the solution of (4.11) for fixed initial datum, stability, and exponential convergence as $t \to +\infty$ if $\lambda > 0$.
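The passage from the differential inequality to these three conclusions is a standard Gronwall argument. As a sketch, write the inequality with a generic constant $c$ (combining Lemma 4.12 with $d = \frac12 W_2^2$ gives $\lambda W_2^2 = 2\lambda d$, i.e. $c = 2\lambda$):

```latex
d'(t) \le -c\, d(t)
\;\Longrightarrow\;
\frac{d}{dt}\Big( e^{ct} d(t) \Big) \le 0
\;\Longrightarrow\;
d(t) \le e^{-ct}\, d(0),
\qquad\text{i.e.}\qquad
W_2(\varrho^0_t, \varrho^1_t) \le e^{-ct/2}\, W_2(\varrho^0_0, \varrho^1_0).
```

If the two initial data coincide then $d(0) = 0$ and hence $d \equiv 0$, which is uniqueness; for general data the estimate is a stability bound, and it is a contraction at exponential rate when $c > 0$.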

4.5 Analysis of the Fokker-Planck equation as a gradient flow in W2


This section will be a technical parenthesis providing proof details in a particular case study, that of the Fokker-Planck equation, which is the gradient flow of the functional
\[
J(\varrho) = \int_\Omega \varrho \log \varrho + \int_\Omega V \, d\varrho ,
\]
where $V$ is a $C^1$ function (see footnote 13) on a domain Ω, that we will choose compact for simplicity. The initial measure $\varrho_0 \in \mathcal{P}(\Omega)$ is taken such that $J(\varrho_0) < +\infty$.

Footnote 13: The equation would also be meaningful for $V$ only Lipschitz continuous, but we prefer to stick to the $C^1$ case for simplicity.

We stress that the first term of the functional is defined as
\[
E(\varrho) := \begin{cases} \int_\Omega \varrho(x) \log \varrho(x)\, dx & \text{if } \varrho \ll \mathcal{L}^d, \\ +\infty & \text{otherwise,} \end{cases}
\]
where we identify the measure $\varrho$ with its density, when it is absolutely continuous. This functional is l.s.c. for the weak topology of measures (for general references on the semicontinuity of convex functionals on the space of measures we refer to Chapter 7 in [84] or to [27]), which is equivalent, on the compact domain Ω, to the $W_2$ convergence. Semi-continuity allows to establish the following:
Proposition 4.14. The functional $J$ has a unique minimum over $\mathcal{P}(\Omega)$. In particular $J$ is bounded from below. Moreover, for each $\tau > 0$ the following sequence of optimization problems recursively defined is well-posed
\[
\varrho^\tau_{(k+1)} \in \operatorname{argmin}_\varrho \; J(\varrho) + \frac{W_2^2(\varrho, \varrho^\tau_{(k)})}{2\tau}, \tag{4.13}
\]
which means that there is a minimizer at every step, and this minimizer is unique.
Proof. Just apply the direct method, noting that $\mathcal{P}(\Omega)$ is compact for the weak convergence, which is the same as the convergence for the $W_2$ distance (again, because Ω is compact), and for this convergence the entropy term $E$ is l.s.c. and the other terms are continuous. This gives at the same time the existence of a minimizer for $J$ and of a solution to each of the above minimization problems (4.13). Uniqueness comes from the fact that all the functionals are convex (in the usual sense) and $E$ is strictly convex. □

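Optimality conditions at each time step. A first preliminary result we need is the following:
Lemma 4.15. Any minimizer $\hat\varrho$ in (4.13) must satisfy $\hat\varrho > 0$ a.e.
Proof. Consider the measure $\tilde\varrho$ with constant positive density $c$ in Ω (i.e. $c = |\Omega|^{-1}$). Let us define $\varrho_\varepsilon$ as $(1-\varepsilon)\hat\varrho + \varepsilon\tilde\varrho$ and compare $\hat\varrho$ to $\varrho_\varepsilon$.
By optimality of $\hat\varrho$, we may write
\[
E(\hat\varrho) - E(\varrho_\varepsilon) \leq \int_\Omega V \, d\varrho_\varepsilon - \int_\Omega V \, d\hat\varrho + \frac{W_2^2(\varrho_\varepsilon, \varrho^\tau_{(k)})}{2\tau} - \frac{W_2^2(\hat\varrho, \varrho^\tau_{(k)})}{2\tau}. \tag{4.14}
\]
The two differences in the right hand side may be easily estimated (by convexity, for instance) so that we get (for a constant $C$ depending on τ, of course):
\[
\int f(\hat\varrho) - f(\varrho_\varepsilon) \leq C\varepsilon
\]
where $f(t) = t \log t$ (set to 0 in $t = 0$). Write
\[
A = \{x \in \Omega : \hat\varrho(x) > 0\}, \qquad B = \{x \in \Omega : \hat\varrho(x) = 0\}.
\]
Since $f$ is convex we write, for $x \in A$, $f(\hat\varrho(x)) - f(\varrho_\varepsilon(x)) \geq (\hat\varrho(x) - \varrho_\varepsilon(x)) f'(\varrho_\varepsilon(x)) = \varepsilon(\hat\varrho(x) - \tilde\varrho(x))(1 + \log \varrho_\varepsilon(x))$. For $x \in B$ we simply write $f(\hat\varrho(x)) - f(\varrho_\varepsilon(x)) = -\varepsilon c \log(\varepsilon c)$. This allows to write
\[
-\varepsilon c \log(\varepsilon c)|B| + \varepsilon \int_A (\hat\varrho(x) - c)(1 + \log \varrho_\varepsilon(x)) \, dx \leq C\varepsilon
\]
and, dividing by ε,
\[
-c \log(\varepsilon c)|B| + \int_A (\hat\varrho(x) - c)(1 + \log \varrho_\varepsilon(x)) \, dx \leq C. \tag{4.15}
\]
Note that we always have
\[
(\hat\varrho(x) - c)(1 + \log \varrho_\varepsilon(x)) \geq (\hat\varrho(x) - c)(1 + \log c)
\]
(just distinguish between the case $\hat\varrho(x) \geq c$ and $\hat\varrho(x) \leq c$). Thus, we may write
\[
-c \log(\varepsilon c)|B| + \int_A (\hat\varrho(x) - c)(1 + \log c) \, dx \leq C.
\]
Letting $\varepsilon \to 0$ provides a contradiction, unless $|B| = 0$. □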
Remark 4.3. If, after proving $|B| = 0$, we go on with the computations (as it is actually done in Chapter 8 in [84]), we can also obtain $\log \hat\varrho \in L^1$, which is a stronger condition than just $\hat\varrho > 0$ a.e.

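We can now compute the first variation and give optimality conditions on the optimal $\varrho^\tau_{(k+1)}$.

Proposition 4.16. The optimal measure $\varrho^\tau_{(k+1)}$ in (4.13) satisfies
\[
\log(\varrho^\tau_{(k+1)}) + V + \frac{\varphi}{\tau} = \text{const.} \quad \text{a.e.} \tag{4.16}
\]
where $\varphi$ is the (unique) Kantorovich potential from $\varrho^\tau_{(k+1)}$ to $\varrho^\tau_{(k)}$. In particular, $\log \varrho^\tau_{(k+1)}$ is Lipschitz continuous. If $T^\tau_k$ is the optimal transport from $\varrho^\tau_{(k+1)}$ to $\varrho^\tau_{(k)}$, then it satisfies
\[
v^\tau_{(k)} := \frac{\mathrm{id} - T^\tau_k}{\tau} = -\nabla\big(\log(\varrho^\tau_{(k+1)}) + V\big) \quad \text{a.e.} \tag{4.17}
\]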
Proof. Take the optimal measure $\hat\varrho := \varrho^\tau_{(k+1)}$. We can check that all the functionals involved in the minimization admit a first variation at $\hat\varrho$. For the linear term it is straightforward, and for the Wasserstein term we can apply Proposition 4.8. The uniqueness of the Kantorovich potential is guaranteed by the fact that $\hat\varrho$ is strictly positive a.e. on the domain Ω (Lemma 4.15). For the entropy term, the integrability of $\log \hat\varrho$ provided in the previous remark allows (computations are left to the reader) to differentiate under the integral sign for smooth perturbations.
The first variation of $J$ is hence
\[
\frac{\delta J}{\delta\varrho} = f'(\varrho) + V = \log(\varrho) + 1 + V.
\]
Using carefully (details are in Chapter 8 of [84]) the optimality conditions, we obtain Equation (4.16), which is valid a.e. since $\hat\varrho > 0$. In particular, this implies that $\varrho^\tau_{(k+1)}$ is Lipschitz continuous, since we have
\[
\varrho^\tau_{(k+1)}(x) = \exp\!\left(C - V(x) - \frac{\varphi(x)}{\tau}\right).
\]
Then, one differentiates and, since $\nabla\varphi = \mathrm{id} - T^\tau_k$, gets the equality
\[
\frac{\nabla\varphi}{\tau} = \frac{\mathrm{id} - T^\tau_k}{\tau} = -\nabla\big(\log(\varrho^\tau_{(k+1)}) + V\big) \quad \text{a.e.}
\]
and this allows to conclude. □

Interpolation between time steps and uniform estimates. Let us collect some other tools.

Proposition 4.17. For any $\tau > 0$, the sequence of minimizers satisfies
\[
\sum_k \frac{W_2^2(\varrho^\tau_{(k+1)}, \varrho^\tau_{(k)})}{\tau} \leq C := 2\big(J(\varrho_0) - \inf J\big).
\]

Proof. This is just the standard estimate in the minimizing movement scheme, corresponding to what we
presented in (2.5) in the Euclidean case. 
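For completeness, the estimate can be spelled out as follows; this is a sketch of the usual telescoping argument, obtained by using $\varrho^\tau_{(k)}$ itself as a competitor in the problem (4.13) that defines $\varrho^\tau_{(k+1)}$, with the $2\tau$ normalization that also appears in (4.14):

```latex
J(\varrho^\tau_{(k+1)}) + \frac{W_2^2(\varrho^\tau_{(k+1)}, \varrho^\tau_{(k)})}{2\tau}
\le J(\varrho^\tau_{(k)})
\quad\Longrightarrow\quad
\sum_{k=0}^{K-1} \frac{W_2^2(\varrho^\tau_{(k+1)}, \varrho^\tau_{(k)})}{2\tau}
\le J(\varrho^\tau_{(0)}) - J(\varrho^\tau_{(K)})
\le J(\varrho_0) - \inf J .
```

Multiplying by 2 and letting $K \to \infty$ gives the bound with $C = 2(J(\varrho_0) - \inf J)$, which is finite by Proposition 4.14.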

Let us define two interpolations between the measures $\varrho^\tau_{(k)}$.
With this time-discretized method, we have obtained, for each $\tau > 0$, a sequence $(\varrho^\tau_{(k)})_k$. We can use it to build at least two interesting curves in the space of measures:

• first we can define some piecewise constant curves, i.e. $\varrho^\tau_t := \varrho^\tau_{(k+1)}$ for $t \in ]k\tau, (k+1)\tau]$; associated to this curve we also define the velocities $v^\tau_t = v^\tau_{(k+1)}$ for $t \in ]k\tau, (k+1)\tau]$, where $v^\tau_{(k)}$ is defined as in (4.17): $v^\tau_{(k)} = (\mathrm{id} - T^\tau_k)/\tau$, taking as $T^\tau_k$ the optimal transport from $\varrho^\tau_{(k+1)}$ to $\varrho^\tau_{(k)}$; we also define the momentum variable $E^\tau = \varrho^\tau v^\tau$;

• then, we can also consider the densities $\tilde\varrho^\tau_t$ that interpolate the discrete values $(\varrho^\tau_{(k)})_k$ along geodesics:
\[
\tilde\varrho^\tau_t = \big((k\tau - t)\, v^\tau_{(k)} + \mathrm{id}\big)_\# \varrho^\tau_{(k)}, \quad \text{for } t \in ](k-1)\tau, k\tau[; \tag{4.18}
\]
the velocities $\tilde v^\tau_t$ are defined so that $(\tilde\varrho^\tau, \tilde v^\tau)$ satisfy the continuity equation and $\|\tilde v^\tau_t\|_{L^2(\tilde\varrho^\tau_t)} = |(\tilde\varrho^\tau)'|(t)$. To do so, we take
\[
\tilde v^\tau_t = v^\tau_t \circ \big((k\tau - t)\, v^\tau_{(k)} + \mathrm{id}\big)^{-1};
\]
as before, we define a momentum variable: $\tilde E^\tau = \tilde\varrho^\tau \tilde v^\tau$.

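After these definitions we consider some a priori bounds on the curves and the velocities that we defined. We start from some estimates which are standard in the framework of Minimizing Movements. Note that the velocity (i.e. metric derivative) of $\tilde\varrho^\tau$ is constant on each interval $]k\tau, (k+1)\tau[$ and equal to
\[
\frac{W_2(\varrho^\tau_{(k+1)}, \varrho^\tau_{(k)})}{\tau} = \frac{1}{\tau}\left(\int |\mathrm{id} - T^\tau_k|^2 \, d\varrho^\tau_{(k+1)}\right)^{1/2} = \|v^\tau_{k+1}\|_{L^2(\varrho^\tau_{(k+1)})},
\]
which gives
\[
\|\tilde v^\tau_t\|_{L^2(\tilde\varrho^\tau_t)} = |(\tilde\varrho^\tau)'|(t) = \frac{W_2(\varrho^\tau_{(k+1)}, \varrho^\tau_{(k)})}{\tau} = \|v^\tau_t\|_{L^2(\varrho^\tau_t)},
\]
where we used the fact that the velocity field $\tilde v^\tau$ has been chosen so that its $L^2$ norm equals the metric derivative of the curve $\tilde\varrho^\tau$.
In particular we can obtain, using the Cauchy-Schwarz inequality in time,
\[
|E^\tau|([0,T] \times \Omega) = \int_0^T dt \int_\Omega |v^\tau_t| \, d\varrho^\tau_t = \int_0^T \|v^\tau_t\|_{L^1(\varrho^\tau_t)} \, dt \leq \int_0^T \|v^\tau_t\|_{L^2(\varrho^\tau_t)} \, dt
\leq T^{1/2} \left(\int_0^T \|v^\tau_t\|^2_{L^2(\varrho^\tau_t)} \, dt\right)^{1/2} = T^{1/2} \left(\sum_k \tau \left(\frac{W_2(\varrho^\tau_{(k+1)}, \varrho^\tau_{(k)})}{\tau}\right)^2\right)^{1/2} \leq C.
\]
The estimate on $\tilde E^\tau$ is completely analogous:
\[
|\tilde E^\tau|([0,T] \times \Omega) = \int_0^T dt \int_\Omega |\tilde v^\tau_t| \, d\tilde\varrho^\tau_t \leq T^{1/2} \left(\int_0^T \|\tilde v^\tau_t\|^2_{L^2(\tilde\varrho^\tau_t)} \, dt\right)^{1/2} = T^{1/2} \left(\sum_k \tau \left(\frac{W_2(\varrho^\tau_{(k+1)}, \varrho^\tau_{(k)})}{\tau}\right)^2\right)^{1/2} \leq C.
\]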
This gives compactness of $E^\tau$ and $\tilde E^\tau$ in the space of vector measures on space-time, for the weak convergence. As far as $\tilde\varrho^\tau$ is concerned, we can obtain more than that. Consider the following estimate, for $s < t$:
\[
W_2(\tilde\varrho^\tau_t, \tilde\varrho^\tau_s) \leq \int_s^t |(\tilde\varrho^\tau)'|(r) \, dr \leq (t-s)^{1/2} \left(\int_s^t |(\tilde\varrho^\tau)'|(r)^2 \, dr\right)^{1/2}.
\]
From the previous computations, we have again
\[
\int_0^T |(\tilde\varrho^\tau)'|(r)^2 \, dr = \sum_k \tau \left(\frac{W_2(\varrho^\tau_{(k+1)}, \varrho^\tau_{(k)})}{\tau}\right)^2 \leq C,
\]
and this implies
\[
W_2(\tilde\varrho^\tau_t, \tilde\varrho^\tau_s) \leq C (t-s)^{1/2}, \tag{4.19}
\]
which means that the curves $\tilde\varrho^\tau$ are uniformly Hölder continuous. Since they are defined on $[0,T]$ and valued in $W_2(\Omega)$ which is compact, we can apply the Ascoli-Arzelà Theorem. This implies that, up to subsequences, we have
\[
E^\tau \rightharpoonup E \ \text{in } \mathcal{M}^d([0,T] \times \Omega), \qquad \tilde E^\tau \rightharpoonup \tilde E \ \text{in } \mathcal{M}^d([0,T] \times \Omega); \qquad \tilde\varrho^\tau \to \varrho \ \text{uniformly for the } W_2 \text{ distance.}
\]
The limit curve $\varrho$, from the uniform bounds on $\tilde\varrho^\tau$, is both $\frac12$-Hölder continuous and absolutely continuous in $W_2$. As far as the curves $\varrho^\tau$ are concerned, they also converge uniformly to the same curve $\varrho$, since $W_2(\varrho^\tau_t, \tilde\varrho^\tau_t) \leq C\sqrt{\tau}$ (a consequence of (4.19), of the fact that $\tilde\varrho^\tau = \varrho^\tau$ on the points of the form $k\tau$ and of the fact that $\varrho^\tau$ is constant on each interval $]k\tau, (k+1)\tau]$).
Let us now prove that $\tilde E = E$.

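Lemma 4.18. Suppose that we have two families of vector measures $E^\tau$ and $\tilde E^\tau$ such that
• $\tilde E^\tau = \tilde\varrho^\tau \tilde v^\tau$; $E^\tau = \varrho^\tau v^\tau$;
• $\tilde v^\tau_t = v^\tau_t \circ \big((k\tau - t) v^\tau_{(k)} + \mathrm{id}\big)^{-1}$; $\tilde\varrho^\tau = \big((k\tau - t) v^\tau_{(k)} + \mathrm{id}\big)_\# \varrho^\tau$;
• $\iint |v^\tau|^2 \, d\varrho^\tau \leq C$ (with $C$ independent of τ);
• $E^\tau \rightharpoonup E$ and $\tilde E^\tau \rightharpoonup \tilde E$ as $\tau \to 0$.
Then $\tilde E = E$.
Proof. It is sufficient to fix a Lipschitz function $f : [0,T] \times \Omega \to \mathbb{R}^d$ and to prove $\int f \cdot dE = \int f \cdot d\tilde E$. To do that, we write
\[
\int f \cdot d\tilde E^\tau = \int_0^T dt \int_\Omega f \cdot \tilde v^\tau_t \, d\tilde\varrho^\tau_t = \int_0^T dt \int_\Omega f \circ \big((k\tau - t) v^\tau + \mathrm{id}\big) \cdot v^\tau_t \, d\varrho^\tau,
\]
which implies
\[
\left| \int f \cdot d\tilde E^\tau - \int f \cdot dE^\tau \right| \leq \int_0^T dt \int_\Omega \big| f \circ \big((k\tau - t) v^\tau + \mathrm{id}\big) - f \big| \, |v^\tau_t| \, d\varrho^\tau \leq \mathrm{Lip}(f)\, \tau \int_0^T \int_\Omega |v^\tau_t|^2 \, d\varrho^\tau \leq C\tau.
\]
This estimate proves that the limit of $\int f \cdot d\tilde E^\tau$ and $\int f \cdot dE^\tau$ is the same, i.e. $\tilde E = E$. □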
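Relation between $\varrho$ and $E$. We can obtain the following:
Proposition 4.19. The pair $(\varrho, E)$ satisfies, in distributional sense
\[
\partial_t \varrho + \nabla \cdot E = 0, \qquad E = -\nabla\varrho - \varrho\nabla V,
\]
with no-flux boundary conditions on $\partial\Omega$. In particular we have found a solution to
\[
\begin{cases}
\partial_t \varrho - \Delta\varrho - \nabla \cdot (\varrho\nabla V) = 0, \\
(\nabla\varrho + \varrho\nabla V) \cdot n = 0, \\
\varrho(0) = \varrho_0,
\end{cases}
\]
where the initial datum is to be intended in the following sense: the curve $t \mapsto \varrho_t$ is (absolutely) continuous in $W_2$, and its initial value is $\varrho_0$.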
Proof. First, consider the weak convergence $(\tilde\varrho^\tau, \tilde E^\tau) \rightharpoonup (\varrho, E)$ (which is a consequence of $\tilde E = E$). The continuity equation $\partial_t \tilde\varrho^\tau + \nabla \cdot \tilde E^\tau = 0$ is satisfied by construction in the sense of distributions, and this passes to the limit. Hence, $\partial_t \varrho + \nabla \cdot E = 0$. The continuity in $W_2$ and the initial datum pass to the limit because of the uniform $C^{0,1/2}$ bound in (4.19).
Then, use the convergence $(\varrho^\tau, E^\tau) \rightharpoonup (\varrho, E)$. Actually, using the optimality conditions of Proposition 4.16 and the definition of $E^\tau = v^\tau \varrho^\tau$, we have, for each $\tau > 0$, $E^\tau = -\nabla\varrho^\tau - \varrho^\tau \nabla V$ (note that $\varrho^\tau$ is Lipschitz continuous for fixed $\tau > 0$, which allows to write this equality, exploiting in particular $\nabla\varrho^\tau = \varrho^\tau \nabla(\log(\varrho^\tau))$, which would have no meaning for less regular measures; this regularity could be, of course, lost in the limit $\tau \to 0$). It is not difficult to pass this condition to the limit either (but now $\nabla\varrho$ will have to be intended in the sense of distributions). Take $f \in C^1_c(]0,T[ \times \Omega; \mathbb{R}^d)$ and test:
\[
\int f \cdot dE^\tau = -\int f \cdot \nabla\varrho^\tau - \int f \cdot \nabla V \varrho^\tau = \int \nabla \cdot f \, d\varrho^\tau - \int f \cdot \nabla V \varrho^\tau.
\]
These terms pass to the limit as $\varrho^\tau \rightharpoonup \varrho$, using the assumption $V \in C^1$, since all the test functions above are continuous. This gives $\int f \cdot dE = \int (\nabla \cdot f) \, d\varrho - \int f \cdot \nabla V \, d\varrho$, which implies $E = -\nabla\varrho - \varrho\nabla V$. □
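To visualize the limit equation of Proposition 4.19, one can integrate it numerically in one space dimension. The sketch below is not the minimizing movement scheme analyzed in this section: it is a plain explicit finite-volume discretization of $\partial_t \varrho = \partial_x(\partial_x \varrho + \varrho V')$ with zero boundary flux, using the identity $\varrho' + \varrho V' = e^{-V}(\varrho e^{V})'$. The potential $V(x) = x^2/2$, the initial density, the grid and the time step are all illustrative assumptions.

```python
import math

def fokker_planck_1d(rho, V, h, dt, steps):
    """Explicit finite-volume steps for d/dt rho = d/dx (d/dx rho + rho V') on a
    1D grid with zero flux through both ends.

    The interface flux uses the identity  rho' + rho V' = exp(-V) (rho exp(V))'.
    """
    n = len(rho)
    rho = list(rho)
    for _ in range(steps):
        flux = [0.0] * (n + 1)            # flux[0] = flux[n] = 0: no-flux boundary
        for i in range(n - 1):
            v_mid = 0.5 * (V[i] + V[i + 1])
            flux[i + 1] = -math.exp(-v_mid) * (
                rho[i + 1] * math.exp(V[i + 1]) - rho[i] * math.exp(V[i])
            ) / h
        rho = [rho[i] - dt * (flux[i + 1] - flux[i]) / h for i in range(n)]
    return rho

n = 20
h = 1.0 / n
x = [(i + 0.5) * h for i in range(n)]        # cell centres of [0, 1]
V = [0.5 * xi * xi for xi in x]              # illustrative potential V(x) = x^2 / 2
rho0 = [2.0 * xi for xi in x]                # illustrative initial density, mass 1
rho = fokker_planck_1d(rho0, V, h, dt=1e-3, steps=1000)

z = h * sum(math.exp(-vi) for vi in V)
gibbs = [math.exp(-vi) / z for vi in V]      # discrete stationary state
mass = h * sum(rho)
gap = max(abs(r - g) for r, g in zip(rho, gibbs))
```

The no-flux condition shows up as exact conservation of `mass`, and for this choice of time step the density stays positive and approaches the stationary state proportional to $e^{-V}$, the minimizer of $J$.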


4.6 Other gradient-flow PDEs


We saw in the previous section a detailed analysis of the Fokker-Planck equation. The main reason to choose this example is its simplicity: since the equation is linear, it is easy to pass to the limit all the terms in the relation between $E$ and $\varrho$. Yet, many other important equations
can be obtained as gradient flows in W2 , choosing other functionals. We will see some of them here,
without entering into details of the proof. We stress that handling non-linear terms is often difficult and
requires ad-hoc estimates. We will discuss some of the ideas that one should apply. Note on the other
hand that, if one uses the abstract theory of [6], there is no need to produce these ad-hoc estimates: after
developing a general (and hard) theory for general metric spaces, the second part of [6] explains which
are the curves that one finds as gradient flows in W2 , with the relation between the velocity field v and
the derivatives (with an ad-hoc notion of subdifferential in the Wasserstein space) of the functional F.
This automatically gives the desired result as a part of a larger theory.
We will discuss five classes of PDEs in this section: the porous media equation, the Keller-Segel equation, more general diffusion, advection and aggregation equations, a model for crowd motion with density constraints, and the flow of the squared sliced Wasserstein distance $SW_2^2$.
Porous Media Equation. This equation models the diffusion of a substance into a material whose properties are different than the void, and which slows down the diffusion. If one considers the case of particles which are advected by a potential and subject to this kind of diffusion, the PDE reads
\[
\partial_t \varrho - \Delta(\varrho^m) - \nabla \cdot (\varrho\nabla V) = 0,
\]
for an exponent $m > 1$. One can formally check that this is the equation of the gradient flow of the energy
\[
F(\varrho) = \frac{1}{m-1} \int \varrho^m(x) \, dx + \int V(x)\varrho(x) \, dx
\]
(set to $+\infty$ for $\varrho \notin L^m$). Indeed, the first variation of the first part of the functional is $\frac{m}{m-1}\varrho^{m-1}$, and $\varrho \nabla\!\left(\frac{m}{m-1}\varrho^{m-1}\right) = m\varrho \cdot \varrho^{m-2}\nabla\varrho = \nabla(\varrho^m)$.
Note that, in the discrete step $\min_\varrho F(\varrho) + \frac{W_2^2(\varrho, \varrho_0)}{2\tau}$, the solution $\varrho$ satisfies
\[
\begin{cases}
\frac{m}{m-1}\varrho^{m-1} + V + \frac{\varphi}{\tau} = C & \varrho\text{-a.e.} \\
\frac{m}{m-1}\varrho^{m-1} + V + \frac{\varphi}{\tau} \geq C & \text{on } \{\varrho = 0\}.
\end{cases}
\]
This allows to express $\varrho^{m-1} = \frac{m-1}{m}(C - V - \varphi/\tau)_+$ and implies that $\varrho$ is compactly supported if $\varrho_0$ is compactly supported, as soon as $V$ has some growth conditions. This is in contrast with the usual infinite propagation speed that one finds in linear diffusion models (Heat and Fokker-Planck equation).
The above analysis works in the case $m > 1$: the fact that the usual Fokker-Planck equation can be obtained for $m \to 1$ can be seen in the following way: nothing changes if we define $F$ via $F(\varrho) = \frac{1}{m-1}\int (\varrho^m(x) - \varrho(x)) \, dx + \int V(x)\varrho(x) \, dx$, since the mass $\int \varrho(x) \, dx = 1$ is a given constant. Yet, then it is easy to guess the limit, since
\[
\lim_{m \to 1} \frac{\varrho^m - \varrho}{m - 1} = \varrho \log \varrho,
\]
which provides the entropy that we already used for the Heat equation.
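The pointwise limit can be observed numerically. The snippet below is a small sketch: the density value `rho = 2.0` and the sequence of exponents are arbitrary illustrative choices, and the function name is hypothetical.

```python
import math

def power_energy_density(rho, m):
    """Integrand (rho**m - rho) / (m - 1) of the rescaled porous-medium energy."""
    return (rho**m - rho) / (m - 1.0)

rho = 2.0                                   # illustrative density value
entropy_density = rho * math.log(rho)       # limit predicted by the text
errors = [abs(power_energy_density(rho, 1.0 + e) - entropy_density)
          for e in (1e-1, 1e-2, 1e-3)]
```

The entries of `errors` shrink roughly linearly with $m - 1$, as expected from a first-order Taylor expansion of $m \mapsto \varrho^m$ at $m = 1$.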
It is also interesting to consider the case m < 1: the function %m − % is no longer convex, but it
is concave and the negative coefficient 1/(m − 1) makes it a convex function. Unfortunately, it is not
superlinear at infinity, which makes it more difficult to handle. But for m ≥ 1 − 1/d the functional F is
still displacement convex. The PDE that we get as a gradient flow is called Fast diffusion equation, and
it has different (and opposite) properties in terms of diffusion rate than the porous media one.
From a technical point of view, proving compactness of the minimizing movement scheme for these equations is not very easy, since one needs to pass to the limit the non-linear term $\Delta(\varrho^m)$, which means proving strong convergence on $\varrho$ instead of weak convergence. The main ingredient is a sort of $H^1$ bound in space, which comes from the fact that we have
\[
\int_0^T \!\! \int_\Omega |\nabla(\varrho^{m-1/2})|^2 \, dx \, dt \approx \int_0^T \!\! \int_\Omega \varrho \, \frac{|\nabla\varphi|^2}{\tau^2} \, dx \, dt = \int_0^T |\varrho'|(t)^2 \, dt \leq C \tag{4.20}
\]
(but one has to deal with the fact that this is not a full $H^1$ bound, and the behavior in time has to be controlled, via some variants of the classical Aubin-Lions Lemma, [13]).
Diffusion, advection and aggregation. Consider a more general case where the movement is advected by a potential determined by the superposition of many potentials, each created by one particle. For instance, given a function $W : \mathbb{R}^d \to \mathbb{R}$, the particle located at $x$ produces a potential $W(\cdot - x)$ and, globally, the potential is given by $V(y) = \int W(y - x) \, d\varrho(x)$, i.e. $V = W * \varrho$. The equation, if every particle follows $-\nabla V$, is
\[
\partial_t \varrho - \nabla \cdot \big(\varrho \, ((\nabla W) * \varrho)\big) = 0,
\]
where we used $\nabla(W * \varrho) = (\nabla W) * \varrho$. If $W$ is even (i.e. the interaction between $x$ and $y$ is the same as between $y$ and $x$), then this is the gradient flow of the functional
\[
F(\varrho) = \frac12 \iint W(x - y) \, d\varrho(x) \, d\varrho(y).
\]
When $W$ is convex, for instance in the quadratic case $\iint |x-y|^2 \, d\varrho(x) \, d\varrho(y)$, this gives rise to a general aggregation behavior of the particles, and as $t \to \infty$ one expects $\varrho_t \rightharpoonup \delta_{x_0}$ (the point $x_0$ depending on the initial datum $\varrho_0$: in the quadratic example above it is the barycenter of $\varrho_0$). If $W$ is not smooth enough, the aggregation into a unique point can also occur in finite time, see [30].
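The aggregation towards the barycenter in the quadratic case can be illustrated with a particle approximation. This is a minimal sketch under explicit assumptions: $\varrho$ is replaced by an empirical measure of $N$ equal-mass particles in one dimension, $W(z) = |z|^2$, and the resulting ODE system is integrated by explicit Euler steps; the function name, step size and initial positions are illustrative.

```python
def aggregate(points, dt=0.05, steps=200):
    """Explicit Euler steps for particles x_i' = -(1/N) sum_j grad W(x_i - x_j)
    with the quadratic kernel W(z) = |z|^2 (so grad W(z) = 2 z), in 1D."""
    xs = list(points)
    n = len(xs)
    for _ in range(steps):
        drift = [sum(2.0 * (xi - xj) for xj in xs) / n for xi in xs]
        xs = [xi - dt * di for xi, di in zip(xs, drift)]
    return xs

start = [-1.0, 0.0, 0.5, 2.5]               # illustrative initial positions
end = aggregate(start)
barycenter = sum(start) / len(start)
spread = max(end) - min(end)
```

Since the drift of each particle is $-2(x_i - \bar{x})$, the barycenter $\bar{x}$ is preserved and all particles collapse onto it, which is the discrete counterpart of $\varrho_t \rightharpoonup \delta_{x_0}$.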
Note that these equations are both non-linear (the term in the divergence is quadratic in %) and non-
local. It is rare to see these non-local aggregation terms alone in the equation, as they are often coupled
with diffusion or other terms. This is why we do not provide specific references except [30]. We also note
that from the technical point of view this kind of nonlinearity is much more compact than the previous
ones, since the convolution operator transforms weak convergence into strong one, provided W is regular
enough.
Most often, the above aggregation energy is studied together with an internal energy and a confining potential energy, using the functional
\[
F(\varrho) = \int f(\varrho(x)) \, dx + \int V(x) \, d\varrho(x) + \frac12 \iint W(x - y) \, d\varrho(x) \, d\varrho(y).
\]
This gives the equation
\[
\partial_t \varrho - \nabla \cdot \Big(\varrho \big[\nabla(f'(\varrho)) + \nabla V + (\nabla W) * \varrho\big]\Big) = 0.
\]
From the mathematical point of view, the interest of this family of equations is that they are the ones for which the most results (in terms of stability and convergence to equilibrium) can be proven, since conditions guaranteeing that $F$ is displacement convex are well-known (Section 4.4). See in particular [31, 32] for physical considerations and convergence results on this equation.

Keller-Segel An interesting model in mathematical biology (see [57, 58] for the original modeling)
is the following: a population % of bacteria evolves in time, following diffusion and advection by a
potential. The potential is given by the concentration u of a chemo-attractant nutrient substance, produced
by the bacteria themselves. This kind of phenomenon is also known under the name of chemotaxis.
More precisely, bacteria move (with diffusion) in the direction where they find more nutrient, i.e. in the
direction of ∇u, where the distribution of u depends on their density %. The easiest model uses linear
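diffusion and supposes that the distribution of $u$ is related to $\varrho$ by the condition $-\Delta u = \varrho$, with Dirichlet boundary conditions $u = 0$ on $\partial\Omega$. This gives the system
\[
\begin{cases}
\partial_t \varrho + \alpha \nabla \cdot (\varrho\nabla u) - \Delta\varrho = 0, \\
-\Delta u = \varrho, \\
u = 0 \text{ on } \partial\Omega, \quad \varrho(0, \cdot) = \varrho_0, \quad \partial_n \varrho - \varrho\,\partial_n u = 0 \text{ on } \partial\Omega.
\end{cases}
\]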

The parameter α stands for the attraction intensity of bacteria towards the chemo-attractant. By scaling,
instead of using probability measures % ∈ P(Ω) one can set α = 1 and play on the mass of % (indeed, the
non-linearity is only in the term %∇u, which is quadratic in %).
Alternative equations can be considered for u, such as −∆u + u = % with Neumann boundary condi-
tions. On the contrary, the boundary conditions on % must be of no-flux type, to guarantee conservation
of the mass. This system can also be set in the whole space, with suitable decay conditions at infinity.
Note also that often the PDE condition defining $u$ as the solution of a Poisson equation is replaced, when $\Omega = \mathbb{R}^2$, by the explicit formula
\[
u(x) = -\frac{1}{2\pi} \int_{\mathbb{R}^2} \log(|x - y|)\, \varrho(y) \, dy. \tag{4.21}
\]
There is some confusion in higher dimension, as the very same formula does not hold for the Poisson equation (the logarithmic kernel should indeed be replaced by the corresponding Green function), and there are two alternatives: either keep the fact that $u$ solves $-\Delta u = \varrho$, or the fact that it derives from $\varrho$ through (4.21).
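One can see that this equation is the gradient flow of the functional
\[
F(\varrho) = \int_\Omega \varrho \log \varrho - \frac12 \int_\Omega |\nabla u_\varrho|^2, \quad \text{where } u_\varrho \in H^1_0(\Omega) \text{ solves } -\Delta u_\varrho = \varrho.
\]
Indeed, the only non-standard computation is that of the first variation of the Dirichlet term $-\frac12 \int |\nabla u_\varrho|^2$. Suppose $\varrho_\varepsilon = \varrho + \varepsilon\chi$ and set $u_{\varrho + \varepsilon\chi} = u_\varrho + \varepsilon u_\chi$. Then
\[
\frac{d}{d\varepsilon}\left(-\frac12 \int |\nabla u_{\varrho + \varepsilon\chi}|^2\right)_{|\varepsilon = 0} = -\int \nabla u_\varrho \cdot \nabla u_\chi = \int u_\varrho \Delta u_\chi = -\int u_\varrho \chi.
\]
It is interesting to note that this Dirichlet term is indeed (up to the coefficient $-1/2$) the square of the $H^{-1}$ norm of $\varrho$, since $\|u\|_{H^1_0} = \|\nabla u\|_{L^2} = \|\varrho\|_{H^{-1}}$. We will call it the $H^{-1}$ term.
It is also possible to replace linear diffusion with non-linear diffusion of porous media type, replacing the entropy $\int \varrho \log \varrho$ with a power-like energy $\int \varrho^m$.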
Note that the variational problem $\min F(\varrho) + \frac{W_2^2(\varrho, \varrho_0)}{2\tau}$ requires some assumption to admit existence of minimizers, as unfortunately the Dirichlet term has the wrong sign. In particular, it would be possible that the infimum is $-\infty$, or that the energy is not l.s.c. because of the negative sign.
When we use non-linear diffusion with $m > 2$ the existence of a solution is quite easy. Sophisticated functional inequalities allow to handle smaller exponents, and even the linear diffusion case in dimension 2, provided $\alpha \leq 8\pi$. We refer to [18] and to the references therein for details on the analysis of this equation. Some of the technical difficulties are similar to those of the porous media equation, when passing to the limit non-linear terms. In [18], the $H^{-1}$ term is treated in terms of its logarithmic kernel, and ad-hoc variables symmetrization tricks are used. Note however that the nonlinear diffusion case is easier, as $L^m$ bounds on $\varrho$ translate into $W^{2,m}$ bounds on $u$, and hence strong compactness for $\nabla u$.
We also remark that the above model, coupling a parabolic equation on % and an elliptic one on
u, implicitly assumes that the configuration of the chemo-attractant instantaneously follows that of %.
More sophisticated models can be expressed in terms of the so-called parabolic-parabolic Keller-Segel
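equation, in the form
$$\begin{cases}\partial_t\varrho+\alpha\nabla\cdot(\varrho\nabla u)-\Delta\varrho=0,\\ \partial_tu-\Delta u=\varrho,\\ u=0\text{ on }\partial\Omega,\quad\varrho(0,\cdot)=\varrho_0,\quad\partial_n\varrho-\varrho\,\partial_nu=0\text{ on }\partial\Omega,\end{cases}$$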

or other variants with different boundary conditions. This equation can also be studied as a gradient flow
in two variables, using the distance W2 on % and the L2 distance on u; see [19].

Crowd motion The theory of Wasserstein gradient flows has interestingly been applied to the study of
a continuous model of crowd movement under density constraints.
Let us explain the modeling, starting from the discrete case. Suppose that we have a population
of particles such that each of them, if alone, would follow its own velocity u (which could a priori
depend on time, position, on the particle itself. . . ). Yet, these particles are modeled by rigid disks that
cannot overlap, hence, the actual velocity cannot always be u, in particular if u tends to concentrate the
masses. We will call v the actual velocity of each particle, and the main assumption of the model is that $v = P_{\mathrm{adm}(q)}(u)$, where q is the particle configuration, adm(q) is the set of velocities that do not induce (for an infinitesimal time) overlapping starting from the configuration q, and $P_{\mathrm{adm}(q)}$ is the projection onto this set.
The simplest example is the one where every particle is a disk with the same radius R and center located at $q_i$. In this case we define the admissible set of configurations K through
$$K := \{q = (q_i)_i\in\Omega^N : |q_i-q_j|\ge 2R\text{ for all } i\neq j\}.$$
In this way the set of admissible velocities is easily seen to be
$$\mathrm{adm}(q) = \{v = (v_i)_i : (v_i-v_j)\cdot(q_i-q_j)\ge 0\text{ for all }(i,j)\text{ with }|q_i-q_j| = 2R\}.$$
The evolution equation which has to be solved to follow the motion of q is then
$$q'(t) = P_{\mathrm{adm}(q(t))}\,u(t)\tag{4.22}$$

(with q(0) given). Equation (4.22), not easy from a mathematical point of view, was studied by Maury
and Venel in [66, 67].
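As an illustration of the projection $P_{\mathrm{adm}(q)}$, here is a minimal sketch for the simplest possible situation: two disks only, with the Euclidean norm on the pair of velocities (an assumption of this sketch). In that case adm(q) is either the whole space (no contact) or a single half-space, and the projection has a closed form. The function name `project_two_disks` is hypothetical and is not taken from [66, 67].

```python
import numpy as np

def project_two_disks(q1, q2, u1, u2, radius):
    """Project desired velocities (u1, u2) onto adm(q) for two equal disks.

    adm(q) = {(v1, v2) : (v1 - v2).(q1 - q2) >= 0} when |q1 - q2| = 2R,
    and every velocity is admissible when the disks are not in contact.
    """
    q1, q2, u1, u2 = (np.asarray(z, dtype=float) for z in (q1, q2, u1, u2))
    dist = np.linalg.norm(q1 - q2)
    if dist > 2.0 * radius + 1e-12:
        return u1, u2                      # no contact: nothing to project
    e = (q1 - q2) / dist                   # unit vector joining the centers
    s = np.dot(u1 - u2, e)                 # rate of change of the distance
    if s >= 0.0:
        return u1, u2                      # disks separate or slide: admissible
    # Remove the approaching component, split evenly between the two disks
    return u1 - 0.5 * s * e, u2 + 0.5 * s * e

# Two disks in contact, pushed against each other along the x axis
v1, v2 = project_two_disks([0, 0], [2, 0], [1, 1], [-1, 0], radius=1.0)
```

In this example the normal components cancel, while the tangential component of the first disk is preserved. With many disks the admissible set is an intersection of half-spaces, and the projection becomes a genuine quadratic program.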
We are now interested in the simplest continuous counterpart of this microscopic model (without claiming that it is any kind of homogenized limit of the discrete case, but only an easy reformulation in a density setting). In this case the particle population will be described by a probability density $\varrho\in\mathcal P(\Omega)$, the constraint becomes a density constraint $\varrho\le 1$ (we define the set $K = \{\varrho\in\mathcal P(\Omega) : \varrho\le 1\}$), and the set of admissible velocities will be described by the sign of the divergence on the saturated region $\{\varrho = 1\}$: $\mathrm{adm}(\varrho) = \{v:\Omega\to\mathbb R^d : \nabla\cdot v\ge 0\text{ on }\{\varrho = 1\}\}$. We will consider a projection P, which will be either the projection in $L^2(\mathcal L^d)$ or in $L^2(\varrho)$ (this will turn out to be the same, since the only relevant zone is $\{\varrho = 1\}$). Finally, we solve the equation
$$\partial_t\varrho_t+\nabla\cdot\big(\varrho_t\,P_{\mathrm{adm}(\varrho_t)}u_t\big) = 0.\tag{4.23}$$

The main difficulty is the fact that the vector field $v = P_{\mathrm{adm}(\varrho_t)}u_t$ is neither regular (since it is obtained as an $L^2$ projection, and may only be expected to be $L^2$ a priori), nor does it depend regularly on ϱ (it is very sensitive to small changes in the values of ϱ: passing from a density 1 to a density 1 − ε completely modifies the saturated zone, and hence the admissible set of velocities and the projection onto it).
In [64] these difficulties have been overcome in the case u = −∇D (where D : Ω → R is a given Lipschitz function) and the existence of a solution (with numerical simulations) is proven via a gradient flow method. Indeed, (4.23) turns out to be the gradient flow in $W_2$ of the energy
$$F(\varrho) = \begin{cases}\int D\,d\varrho & \text{if }\varrho\in K;\\ +\infty & \text{if }\varrho\notin K.\end{cases}$$

We do not enter into the details of the study of this equation, but we just make the definitions above a little more precise. Actually, instead of considering the divergence of vector fields which are only supposed to be $L^2$, it is more convenient to give a better description of adm(ϱ) by duality:
$$\mathrm{adm}(\varrho) = \left\{v\in L^2(\varrho) : \int v\cdot\nabla p\le 0\quad\forall p\in H^1(\Omega) : p\ge 0,\ p(1-\varrho) = 0\right\}.$$
In this way we characterize $v = P_{\mathrm{adm}(\varrho)}(u)$ through
$$u = v+\nabla p,\quad v\in\mathrm{adm}(\varrho),\quad\int v\cdot\nabla p = 0,\quad p\in\mathrm{press}(\varrho) := \{p\in H^1(\Omega),\ p\ge 0,\ p(1-\varrho) = 0\},$$

where press(ϱ) is the space of functions p used as test functions in the dual definition of adm(ϱ), which play the role of a pressure affecting the movement. The two cones ∇press(ϱ) (defined as the set of gradients of elements of press(ϱ)) and adm(ϱ) are in duality for the $L^2$ scalar product (i.e. one is defined as the set of vectors which make a non-positive scalar product with all the elements of the other). This allows for an orthogonal decomposition $u_t = v_t+\nabla p_t$, and gives the alternative expression of Equation (4.23), i.e.
$$\begin{cases}\partial_t\varrho_t+\nabla\cdot\big(\varrho_t(u_t-\nabla p_t)\big) = 0,\\ 0\le\varrho\le 1,\quad p\ge 0,\quad p(1-\varrho) = 0.\end{cases}\tag{4.24}$$

More details can be found in [64, 81, 65]. In particular, in [64] it is explained how to handle the nonlinearities when passing to the limit. Two sources of nonlinearity are observed: the term $\varrho\nabla p$ is easy to consider, since it is actually equal to $\nabla p$ (as we have $p = 0$ on $\{\varrho\neq 1\}$); on the other hand, we need to deal with the equality $p(1-\varrho) = 0$ and pass it to the limit. This is done by obtaining strong compactness on p, from a bound on $\int_0^T\int_\Omega|\nabla p|^2$, similarly to (4.20). It is important to observe that transforming $\varrho\nabla p$ into $\nabla p$ is only possible in the case of a single phase ϱ, while the multi-phase case presents extra difficulties (see, for instance, [29]). These difficulties can be overcome in the presence of diffusion, which provides extra $H^1$ bounds and hence strong convergence for ϱ (for crowd motion with diffusion, see [74]). Finally, the uniqueness question is a tricky one: if the potential D is λ-convex then this is easy, but considerations from the DiPerna-Lions theory ([40, 4]) suggest that uniqueness should be true under weaker assumptions; nevertheless this has not really been proven. On the other hand, in [39] uniqueness is proven under very mild assumptions on the drift (or, equivalently, on the potential), provided some diffusion is added.

Sliced Wasserstein distance Inspired by a construction proposed in [77] for image inpainting, M. Bernot proposed around 2008 an interesting scheme to build a transport map between two given measures which, if not optimal, at least had some monotonicity properties obtained in an isotropic way.
Consider two measures $\mu,\nu\in\mathcal P(\mathbb R^d)$ and project them onto any one-dimensional direction. For every $e\in\mathbb S^{d-1}$ (the unit sphere of $\mathbb R^d$), we take the map $\pi_e:\mathbb R^d\to\mathbb R$ given by $\pi_e(x) = x\cdot e$ and look at the image measures $(\pi_e)_\#\mu$ and $(\pi_e)_\#\nu$. They are measures on the real line, and we call $T_e:\mathbb R\to\mathbb R$ the monotone (optimal) transport between them. The idea is that, as far as the direction e is concerned, every point x of $\mathbb R^d$ should be displaced by a vector $v_e(x) := (T_e(\pi_e(x))-\pi_e(x))\,e$. To do a global displacement, consider $v(x) = \fint_{\mathbb S^{d-1}}v_e(x)\,d\mathcal H^{d-1}(e)$, where $\mathcal H^{d-1}$ is the uniform Hausdorff measure on the sphere and $\fint$ denotes the averaged integral.
There is no reason to think that id + v is a transport map from µ to ν, and indeed in general it is not.
But if one fixes a small time step τ > 0 and uses a displacement τv getting a measure µ1 = (id + τv)# µ,
then it is possible to iterate the construction. One expects to build in this way a sequence of measures µn
that converges to ν, but this has not yet been rigorously proven. From the empirical point of view, the
transport maps that are obtained in this way are quite satisfactory, and have been tested in particular in the
discrete case (a finite number of Dirac masses with equal mass, i.e. the so-called assignment problem).
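The iterative construction can be sketched in the discrete case as follows. This is only an illustration under stated assumptions: both measures are uniform on the same number of points in the plane, the average over the sphere is replaced by a finite set of equally spaced directions, and the names `sliced_velocity` and `unit_directions_2d` are hypothetical. In one dimension, the monotone transport between two such point clouds simply matches the k-th smallest point with the k-th smallest point.

```python
import numpy as np

def sliced_velocity(x, y, directions):
    """Discrete analogue of v(x): average over e of (T_e(x.e) - x.e) e."""
    v = np.zeros_like(x)
    for e in directions:
        px, py = x @ e, y @ e              # one-dimensional projections
        order = np.argsort(px)
        target = np.empty_like(px)
        target[order] = np.sort(py)        # monotone map: k-th -> k-th smallest
        v += np.outer(target - px, e)
    return v / len(directions)

def unit_directions_2d(n):
    """n equally spaced directions on the half circle (e and -e give the same v_e)."""
    angles = np.pi * np.arange(n) / n
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))                            # samples of mu
y = rng.normal(size=(200, 2)) + np.array([3.0, -1.0])    # samples of nu
dirs = unit_directions_2d(32)
tau = 0.5

gap_start = np.linalg.norm(x.mean(axis=0) - y.mean(axis=0))
for _ in range(200):
    x = x + tau * sliced_velocity(x, y, dirs)            # mu_{n+1} = (id + tau v)_# mu_n
gap_end = np.linalg.norm(x.mean(axis=0) - y.mean(axis=0))
```

With these equally spaced directions in the plane, the barycenter of the moving cloud approaches that of the target geometrically (by a factor 1 − τ/2 per step). This says nothing about the convergence of the whole measure, which is the open question mentioned above.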
For the above construction, discrete in time and used for applications to images, there are not many
references (essentially, the only one is [78], see also [20] for a wider discussion). A natural continuous
counterpart exists: simply consider, for every absolutely continuous measure % ∈ P(Rd ), the vector field
v = v(%) that we defined above as a velocity field depending on % (absolute continuity is just required to
avoid atoms in the projections). Then, we solve the equation

$$\partial_t\varrho_t+\nabla\cdot\big(\varrho_t\,v_{(\varrho_t)}\big) = 0.\tag{4.25}$$

It happens that this equation has a gradient flow structure: it is indeed the gradient flow in W2 of a
functional related to a new distance on probability measures, induced by this construction.
This distance can be defined as follows (see [78]): given two measures $\mu,\nu\in\mathcal P_2(\mathbb R^d)$, we define
$$SW_2(\mu,\nu) := \left(\fint_{\mathbb S^{d-1}}W_2^2\big((\pi_e)_\#\mu,(\pi_e)_\#\nu\big)\,d\mathcal H^{d-1}(e)\right)^{1/2}.$$

This quantity could have been called “projected Wasserstein distance” (as it is based on the behavior
through projections), but since in [78] it is rather called “sliced Wasserstein distance”, we prefer to keep
the same terminology.
The fact that $SW_2$ is a distance comes from $W_2$ being a distance. The triangle inequality may be proven using the triangle inequality for $W_2$ (see Section 5.1) and for the $L^2$ norm. Positivity and symmetry are evident. The equality $SW_2(\mu,\nu) = 0$ implies $W_2^2((\pi_e)_\#\mu,(\pi_e)_\#\nu) = 0$ for all $e\in\mathbb S^{d-1}$. This means $(\pi_e)_\#\mu = (\pi_e)_\#\nu$ for all e, and this suffices to prove µ = ν (the one-dimensional projections of a measure determine its Fourier transform).
It is evident from its definition, and from the fact that the maps $\pi_e$ are 1-Lipschitz (which implies $W_2^2((\pi_e)_\#\mu,(\pi_e)_\#\nu)\le W_2^2(\mu,\nu)$), that we have $SW_2(\mu,\nu)\le W_2(\mu,\nu)$. Moreover, the two distances also induce the same topology, at least on compact sets. Indeed, the identity map from $(\mathcal P(\Omega),W_2)$ to $(\mathcal P(\Omega),SW_2)$ is continuous (because of $SW_2\le W_2$) and bijective. Since the space where it is defined is compact, it is also a homeomorphism. One can also prove more, i.e. an inequality of the form $W_2\le C\,SW_2^\beta$ for a suitable exponent $\beta\in\,]0,1[$. Chapter 5 in [20] proves this inequality with $\beta = (2(d+1))^{-1}$.
The interest in the use of this distance is the fact that one has a distance on P(Ω) with qualitative properties very similar to those of $W_2$, but much easier to compute, since it only depends on one-dimensional computations (obviously, the integral over $e\in\mathbb S^{d-1}$ is discretized in practice, and becomes an average over a large number of directions). We remark anyway an important difference between $W_2$ and $SW_2$: the latter is not a geodesic distance. On the contrary, the geodesic distance associated to $SW_2$ (i.e. the minimal length to connect two measures) is exactly $W_2$.
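To make the "average over a large number of directions" concrete, here is a minimal sketch of a discretized $SW_2$ between two uniform point clouds of equal size in the plane. The function name `sliced_w2` and the choice of equally spaced directions are assumptions of this sketch. The example uses a translation by a vector a, for which $W_2 = |a|$ and, in dimension d, $SW_2 = |a|/\sqrt d$, so the inequality $SW_2\le W_2$ is visible explicitly.

```python
import numpy as np

def sliced_w2(x, y, directions):
    """Discretized SW_2 between two uniform point clouds with the same size."""
    total = 0.0
    for e in directions:
        px, py = np.sort(x @ e), np.sort(y @ e)
        total += np.mean((px - py) ** 2)   # squared W_2 between 1D projections
    return np.sqrt(total / len(directions))

# Equally spaced directions on the half circle (e and -e give the same term)
angles = np.pi * np.arange(64) / 64
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)

rng = np.random.default_rng(1)
x = rng.uniform(size=(500, 2))
shift = np.array([0.3, 0.4])               # |shift| = 0.5 = W_2(mu, nu)
sw = sliced_w2(x, x + shift, dirs)         # close to 0.5 / sqrt(2)
```

Each direction only requires sorting, which is why the cost stays low even for large clouds.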
If we come back to gradient flows, it is not difficult to see that the equation (4.25) is the gradient flow of $F(\varrho) := \frac12SW_2^2(\varrho,\nu)$ and can be studied as such. Existence and estimates on the solution of this equation are proven in [20], and the nonlinearity of $v_{(\varrho)}$ is quite easy to deal with. On the other hand, many natural and useful questions are still open: is it true that $\varrho_t\rightharpoonup\nu$ as t → ∞? Can we define (at least under regularity assumptions on the initial data) the flow of the vector field $v_{(\varrho_t)}$, and what is the limit of this flow as t → ∞? The idea is that it should be a transport map between $\varrho_0$ and ν and, if not the optimal transport map, at least a “good” one, but most of the related questions are open.

4.7 Dirichlet boundary conditions


For sure, the attentive reader has already noted that all the equations that have been identified as gradient flows for the distance $W_2$ on a bounded domain Ω are always accompanied by Neumann boundary conditions. This should not be surprising. Wasserstein distances express the movement of masses when passing from one configuration to another, and the equation represents the conservation of mass. It means that we are describing the movement of a collection ϱ of particles, bound to stay inside a given domain Ω, and selecting their individual velocity v in a way which is linked to the global value of a certain functional F(ϱ). It is natural in this case to have boundary conditions which express the fact that particles do not exit the domain, and the pointwise value of the density ϱ on ∂Ω is not particularly relevant in this analysis. Note that “do not exit” does not mean “those on the boundary stay on the boundary”, which is what happens when solutions are smooth and the velocity field v satisfies v · n = 0. Yet, the correct Neumann condition here is rather ϱv · n = 0 a.e., which means that particles could enter from ∂Ω into the interior, but immediately after this happens there will be (locally) no mass on the boundary, and hence the condition is not violated. On the contrary, should some mass go from the interior to outside Ω, then we would have a violation of the Neumann condition, since there would be (intuitively) some mass ϱ > 0 on the boundary with velocity directed outwards.
Anyway, we see that Dirichlet conditions do not find their translation into $W_2$ gradient flows!
To cope with Dirichlet boundary conditions, Figalli and Gigli defined in [44] a sort of modified Wasserstein distance, with a special role played by the boundary ∂Ω, in order to study the Heat equation $\partial_t\varrho = \Delta\varrho$ with Dirichlet b.c. $\varrho = 1$ on ∂Ω. Their definition is as follows: given two finite positive measures $\mu,\nu\in\mathcal M_+(\mathring\Omega)$ (not necessarily probabilities, not necessarily with the same mass), we define
$$\Pi b(\mu,\nu) = \{\gamma\in\mathcal M_+(\overline\Omega\times\overline\Omega) : (\pi_x)_\#\gamma\llcorner\mathring\Omega = \mu,\ (\pi_y)_\#\gamma\llcorner\mathring\Omega = \nu\}.$$
Then, we set
$$Wb_2(\mu,\nu) := \sqrt{\inf\left\{\int_{\overline\Omega\times\overline\Omega}|x-y|^2\,d\gamma,\ \gamma\in\Pi b(\mu,\nu)\right\}}.$$

The index b stands for the special role played by the boundary. Informally, this means that the transport from µ to ν may be done usually (with a part of γ concentrated on $\mathring\Omega\times\mathring\Omega$), or by moving some mass from µ to ∂Ω (using $\gamma\llcorner(\mathring\Omega\times\partial\Omega)$), then moving from one point of the boundary to another point of the boundary (this should be done by using $\gamma\llcorner(\partial\Omega\times\partial\Omega)$, but since this part of γ does not appear in the constraints, we can forget about it, and the transport is finally free on ∂Ω), and finally from ∂Ω to ν (using $\gamma\llcorner(\partial\Omega\times\mathring\Omega)$).

In [44] the authors prove that $Wb_2$ is a distance, that the space $\mathcal M_+(\mathring\Omega)$ is always a geodesic space, independently of convexity or connectedness properties of Ω (differently from what happens with Ω, since here the transport is allowed to “teleport” from one part of the boundary to another, either to pass from one connected component to another or to follow a shorter path going out of Ω), and they study the gradient flow, for this distance, of the functional $F(\varrho) = \int(\varrho\log\varrho-\varrho)\,dx$. Note that in the usual study of the entropy on $\mathcal P(\Omega)$ one can decide to forget the term $-\int\varrho$, which is anyway a constant because the total mass is fixed. Here this term becomes important (if the function $f(t) = t\log t-t$ is usually preferred to $t\log t$, it is because its derivative is simpler, $f'(t) = \log t$, without changing its main properties).
With this choice of the functional and of the distance, the gradient flow that Figalli and Gigli obtain is the Heat equation with the particular boundary condition ϱ = 1 on ∂Ω. One could wonder where the constant 1 comes from, and a reasonable explanation is the following: if the transport on the boundary is free of charge, then the solution automatically selects the value which is the most favorable for the functional, i.e. the constant t which minimizes $f(t) = t\log t-t$. In this way, changing the linear part and using $F(\varrho) = \int(\varrho\log\varrho-c\varrho)\,dx$ could change the constant on the boundary, but the constant 0 is forbidden for the moment. It would be interesting to see how far one could go with this approach and which Dirichlet conditions and which equations could be studied in this way, but this does not seem to have been done at the moment.
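The heuristic above can be made explicit by a one-line computation. This is only a sketch, assuming (as in the informal explanation, not as a proved statement) that the boundary value is the minimizer over t > 0 of the integrand $f_c(t) = t\log t-ct$; the notation $f_c$ is introduced here for illustration.

```latex
f_c'(t) = \log t + 1 - c = 0
\quad\Longleftrightarrow\quad
t = e^{\,c-1},
\qquad
f_c''(t) = \frac1t > 0 .
```

For c = 1 this gives back the value t = 1. Varying c would give any strictly positive constant $e^{c-1}$, but never the value 0, which is consistent with the remark that the constant 0 is out of reach with this approach.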
Moreover, the authors explain that, due to the lack of geodesic convexity of the entropy w.r.t. Wb2 , the
standard abstract theory of gradient flows is not able to provide uniqueness results (the lack of convexity
is due in some sense to the possible concentration of mass on the boundary, in a way similar to what
happened in [64] when dealing with the door on ∂Ω). On the other hand, standard Hilbertian results on
the Heat equation can provide uniqueness for this equation, as the authors smartly remark in [44].
We observe that distances of this kind, with free transport on the boundary, were already present in [22, 21], but in the case of the Wasserstein distance $W_1$, and the analysis in those papers was not made for applications to gradient flows, which are less natural to study with p = 1. We can also point out a nice duality formula:
$$Wb_1(\mu,\nu) := \min\left\{\int|x-y|\,d\gamma : \gamma\in\Pi b(\mu,\nu)\right\} = \sup\left\{\int u\,d(\mu-\nu) : u\in\mathrm{Lip}_1(\Omega),\ u = 0\text{ on }\partial\Omega\right\}.$$
In the special case p = 1 and $\mu(\mathring\Omega) = \nu(\mathring\Omega)$, we also obtain
$$Wb_1(\mu,\nu) = T_c(\mu,\nu),\quad\text{for } c(x,y) = \min\{|x-y|,\ d(x,\partial\Omega)+d(y,\partial\Omega)\}.$$
The cost c is a pseudo-distance on Ω where moving on the boundary is free. This kind of distance has also been used in [28] (inspired by [21]) to model free transport costs on other lower dimensional sets, and not only the boundary (with the goal to model, for instance, transportation networks, and optimize their shape). It is interesting to see the same kind of ideas appear for such different goals.

4.8 Numerical methods from the JKO scheme


We present in this section two different numerical methods which have been recently proposed to tackle evolution PDEs which have the form of a gradient flow in $W_2(\Omega)$ via their variational JKO scheme. We will only be concerned with discretization methods allowing the numerical treatment of one step of the JKO scheme, i.e. solving problems of the form
$$\min\left\{F(\varrho)+\frac12W_2^2(\varrho,\nu) : \varrho\in\mathcal P(\Omega)\right\},$$
for suitable ν (to be taken equal to $\varrho^\tau_k$) and suitable F (including the τ factor). We will not consider the convergence as τ → 0 of the iterations to solutions of the evolution equation.
We will present two methods. One, essentially taken from [16], is based on the Benamou-Brenier formula first introduced in [14] as a numerical tool for optimal transport. This method is well-suited for the case where the energy F(ϱ) used in the gradient flow is a convex function of ϱ. For instance, it works for functionals of the form $F(\varrho) = \int f(\varrho(x))\,dx+\int V\,d\varrho$ and can be used for Fokker-Planck and porous medium equations. The second method is based on methods from semi-discrete optimal transport, essentially developed by Q. Mérigot using computational geometry (see [72, 60] and [62] for 3D implementation) and translates the problem into an optimization problem in the class of convex functions; it is well suited for the case where F is geodesically convex, which means that the term $\int f(\varrho(x))\,dx$ is only admissible if f satisfies McCann's condition, the term $\int V\,d\varrho$ needs V to be convex, but interaction terms such as $\iint W(x-y)\,d\varrho(x)\,d\varrho(y)$ are also allowed, if W is convex.

Augmented Lagrangian methods Let us recall the basis of the Benamou-Brenier method. This amounts to solving the variational problem (4.8) which reads, in the quadratic case, as
$$\min\left\{\sup_{(a,b)\in K_2}\iint a\,d\varrho+\iint b\cdot dE\ :\ \partial_t\varrho_t+\nabla\cdot E_t = 0,\ \varrho_0 = \mu,\ \varrho_1 = \nu\right\}$$
where $K_2 = \{(a,b)\in\mathbb R\times\mathbb R^d : a+\frac12|b|^2\le 0\}$. We then use the fact that the continuity equation constraint can also be written as a sup penalization, by adding to the functional
$$\sup_{\phi\in C^1([0,1]\times\Omega)}\ -\iint\partial_t\phi\,d\varrho-\iint\nabla\phi\cdot dE+\int\phi_1\,d\nu-\int\phi_0\,d\mu,$$
which is 0 in case the constraint is satisfied, and +∞ if not.


It is more convenient to express everything in the space-time formalism, i.e. by writing $\nabla_{t,x}\phi$ for $(\partial_t\phi,\nabla\phi)$ and using the variable m for (ϱ, E) and A for (a, b). We also set $G(\phi) := \int\phi_1\,d\nu-\int\phi_0\,d\mu$. Then the problem becomes
$$\min_m\ \sup_{A,\phi}\ m\cdot(A-\nabla_{t,x}\phi)-I_{K_2}(A)+G(\phi),$$
where the scalar product is here an L2 scalar product, but becomes a standard Euclidean scalar product
as soon as one discretizes (in time-space). The function IK2 denotes the indicator function in the sense of
convex analysis, i.e. +∞ if the condition A ∈ K2 is not satisfied, and 0 otherwise.
The problem can now be seen as the search for a saddle-point of the Lagrangian
$$L(m,(A,\phi)) := m\cdot(A-\nabla_{t,x}\phi)-I_{K_2}(A)+G(\phi),$$
which means that we look for a pair (m, (A, φ)) (actually, a triple, but A and φ play together the role of the second variable) where m minimizes for fixed (A, φ) and (A, φ) maximizes for fixed m. This fits the following framework, where the variables are X and Y and the Lagrangian has the form $L(X,Y) := X\cdot\Lambda Y-H(Y)$. In this case one can use a very smart trick, based on the fact that the saddle points of this Lagrangian are the same as those of the augmented Lagrangian $\tilde L$ defined as $\tilde L(X,Y) := X\cdot\Lambda Y-H(Y)-\frac r2|\Lambda Y|^2$, whatever the value of the parameter r > 0 is. Indeed, the saddle points of L are characterized by (we assume all the functions we minimize are convex and all the functions we maximize are concave)
$$\begin{cases}\Lambda Y = 0 & \text{(optimality of } X),\\ \Lambda^tX-\nabla H(Y) = 0 & \text{(optimality of } Y),\end{cases}$$
while those of $\tilde L$ are characterized by
$$\begin{cases}\Lambda Y = 0 & \text{(optimality of } X),\\ \Lambda^tX-\nabla H(Y)-r\Lambda^t\Lambda Y = 0 & \text{(optimality of } Y),\end{cases}$$
which is the same since the first equation implies that the extra term in the second vanishes.
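The abstract framework can be tested on a small finite-dimensional example. The sketch below is a toy illustration, not the scheme of [16]: it assumes the quadratic choice $H(Y) = \frac12|Y-c|^2$ and a hypothetical matrix `Lam` standing for Λ. It alternates an exact maximization of $\tilde L$ in Y with a gradient descent step of size r in X, and compares the result with the saddle point of the non-augmented Lagrangian, which for this H is the orthogonal projection of c onto the kernel of Λ.

```python
import numpy as np

# Hypothetical data: Lam plays the role of Lambda, and H(Y) = 0.5*|Y - c|^2
Lam = np.array([[1.0, 0.0, 1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 1.0, 1.0]])
c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
r = 1.0                                    # augmentation parameter, any r > 0

X = np.zeros(2)
identity = np.eye(5)
for _ in range(100):
    # Maximize the augmented Lagrangian in Y for fixed X:
    #   Lam^T X - (Y - c) - r Lam^T Lam Y = 0
    Y = np.linalg.solve(identity + r * Lam.T @ Lam, c + Lam.T @ X)
    # Gradient descent in X, with step equal to the augmentation parameter
    X = X - r * (Lam @ Y)

# Saddle point of the original Lagrangian: Lam Y = 0 and Lam^T X = Y - c,
# i.e. Y is the orthogonal projection of c onto the kernel of Lam
Y_exact = c - Lam.T @ np.linalg.solve(Lam @ Lam.T, Lam @ c)
```

In this quadratic example the error on X contracts at each iteration by a factor bounded by $1/(1+r\lambda_{\min})$, where $\lambda_{\min}$ is the smallest eigenvalue of $\Lambda\Lambda^t$, so larger values of r accelerate the iteration at the price of a stiffer linear system in Y.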
In this case, we obtain a saddle point problem of the form
$$\min_m\ \max_{A,\phi}\ m\cdot(A-\nabla_{t,x}\phi)-I_{K_2}(A)+G(\phi)-\frac r2\|A-\nabla_{t,x}\phi\|^2$$
(where the squared norm in the last term is an $L^2$ norm in time and space), which is then solved by iteratively repeating three steps: for fixed A and m, finding the optimal φ (which amounts to minimizing a quadratic functional in calculus of variations, i.e. solving a Poisson equation in the space-time [0, 1] × Ω, with Neumann boundary conditions, homogeneous on ∂Ω and non-homogeneous on t = 0 and t = 1, due to the term G); then, for fixed φ and m, finding the optimal A (which amounts to a pointwise minimization problem, in this case a projection onto the convex set $K_2$); finally, updating m by going in the direction of the gradient descent, i.e. replacing m with $m-r(A-\nabla_{t,x}\phi)$ (it is convenient to choose the parameter of the gradient descent to be equal to that of the augmented Lagrangian).
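The pointwise projection onto $K_2$ is not detailed in the text. A minimal sketch, assuming the Euclidean projection is computed from its optimality conditions (a derivation made here for illustration, not taken from [16]): if $(a_0,b_0)\notin K_2$, the projection has the form $(a_0-\lambda,\ b_0/(1+\lambda))$, where λ > 0 is the unique root of a decreasing scalar function, found below by bisection. The function name `project_K2` is hypothetical.

```python
import numpy as np

def project_K2(a0, b0, iters=100):
    """Euclidean projection of (a0, b0) onto K2 = {(a, b) : a + |b|^2 / 2 <= 0}."""
    b0 = np.asarray(b0, dtype=float)
    nb2 = float(b0 @ b0)
    if a0 + 0.5 * nb2 <= 0.0:
        return a0, b0                      # already in K2

    # Optimality conditions: a = a0 - lam, b = b0 / (1 + lam), a + |b|^2/2 = 0
    def g(lam):
        return a0 - lam + 0.5 * nb2 / (1.0 + lam) ** 2

    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:                     # g is decreasing with g(0) > 0
        hi *= 2.0
    for _ in range(iters):                 # bisection on the multiplier
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return a0 - lam, b0 / (1.0 + lam)

a, b = project_K2(1.0, [2.0, 0.0])         # a point outside K2
```

In the scheme, this routine would be called at every point of the space-time grid, with an input built from the current values of $\nabla_{t,x}\phi$ and m; a Newton iteration on the same scalar equation would be a natural faster alternative.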
This is what is done in the case where the initial and final measures are fixed. At every JKO step, one is fixed (say, µ), but the other is not, and a penalization on the final $\varrho_1$ is added, of the form $\tau F(\varrho_1)$. Inspired by the considerations above, the saddle point below allows one to treat the problem
$$\min_{\varrho_1}\ \frac12W_2^2(\varrho_1,\mu)+\int f(\varrho_1(x))\,dx+\int V\,d\varrho_1$$
by formulating it as
$$\begin{aligned}\min_{m,\varrho_1}\ \max_{A,\phi,\lambda}\ &\iint m\cdot(A-\nabla_{t,x}\phi)+\int\varrho_1\cdot(\phi_1+\lambda+V)-\iint I_{K_2}(A)-\int\phi_0\,d\mu-\int f^*(\lambda(x))\,dx\\ &-\frac r2\iint|A-\nabla_{t,x}\phi|^2-\frac r2\int|\phi_1+\lambda+V|^2,\end{aligned}$$
where we re-inserted the integration signs to underline the difference between integrals in space-time (with m, A and φ) and in space only (with $\phi_0$, $\phi_1$, $\varrho_1$, V and λ). The role of the variable λ is to be dual to $\varrho_1$, which allows one to express $f(\varrho_1)$ as $\sup_\lambda\,\varrho_1\lambda-f^*(\lambda)$.
To find a solution to this saddle-point problem, an iterative procedure is also used, as above. The last two steps are the update via a gradient descent of m and $\varrho_1$, and do not require further explanation. The first three steps consist in the optimization of φ (which requires the solution of a Poisson problem) and in two pointwise minimization problems in order to find A (which requires a projection onto $K_2$) and λ (the minimization of $f^*(\lambda)+\frac r2|\phi_1(x)+\lambda+V(x)|^2-\varrho_1(x)\lambda$, for fixed x).
For the applications to gradient flows, a small time-step τ > 0 has to be fixed, and this scheme has to be carried out for each k, using $\mu = \varrho^\tau_k$ and setting $\varrho^\tau_{k+1}$ equal to the optimizer $\varrho_1$; the functions f and V must include the scale factor τ. The time-space [0, 1] × Ω has to be discretized, but the evolution in time is infinitesimal (due to the small time scale τ), which allows one to choose a very rough time discretization. In practice, the interval [0, 1] is only discretized using fewer than 10 time steps for each k.
The interested reader can consult [16] for more details, examples and simulations.

Optimization among convex functions It is clear that the optimization problem
$$\min_\varrho\ \frac12W_2^2(\varrho,\mu)+F(\varrho)$$
can be formulated in terms of transport maps as
$$\min_{T:\Omega\to\Omega}\ \frac12\int_\Omega|T(x)-x|^2\,d\mu(x)+F(T_\#\mu).$$
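The formulation in terms of transport maps becomes very explicit in the simplest case. The following sketch assumes a purely potential energy $F(\varrho) = \tau\int V\,d\varrho$ (with the scale factor τ written explicitly) and a discrete measure µ with equal masses on the real line; these choices, and the quadratic potential $V(y) = \frac12y^2$, are assumptions made for illustration. The problem then decouples point by point: T(x) minimizes $\frac12|y-x|^2+\tau V(y)$, which is an implicit Euler step for $x' = -\nabla V(x)$.

```python
import numpy as np

tau, steps = 0.01, 100                     # final time t = tau * steps = 1
x0 = np.linspace(-2.0, 2.0, 9)             # atoms of the initial measure
x = x0.copy()

for _ in range(steps):
    # One JKO step for V(y) = y^2 / 2:
    #   argmin_y 0.5*(y - x)^2 + tau*0.5*y^2  =>  y = x / (1 + tau)
    x = x / (1.0 + tau)

# Exact characteristics of the limit equation: x(t) = x0 * exp(-t)
x_exact = x0 * np.exp(-1.0)
```

The discrepancy between `x` and `x_exact` is of order τ, as expected from a first-order implicit scheme. Energies involving the density, such as the entropy, do not decouple in this way, which is where the difficulties discussed in this section begin.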
Also, it is possible to take advantage of Brenier's theorem, which characterizes optimal transport maps as gradients of convex functions, and recast it as
$$\min_{u\text{ convex}\,:\,\nabla u\in\Omega}\ \frac12\int_\Omega|\nabla u(x)-x|^2\,d\mu(x)+F((\nabla u)_\#\mu).$$
It is useful to note that in the last formulation the convexity of u need not be imposed, as it would anyway come up as an optimality condition. On the other hand, very often the functional F involves explicitly the density of the image measure $(\nabla u)_\#\mu$ (as is the case for the typical example $\mathcal F$), and in this case convexity of u helps in computing this image measure. Indeed, whenever u is convex we can say that the density of $\varrho := (\nabla u)_\#\mu$ (if µ itself is absolutely continuous, with a density that we will denote by $\varrho_0$) is given by¹⁴
$$\varrho = \frac{\varrho_0}{\det(D^2u)}\circ(\nabla u)^{-1}.$$
¹⁴ This same formula takes into account the possibility that ∇u could be non-injective, i.e. u non-strictly convex, in which case the value of the density could be +∞ due to the determinant in the denominator, which would vanish.

Hence, we are facing a calculus of variations problem in the class of convex functions. A great difficulty in attacking this class of problems is how to discretize the space of convex functions. The first natural approach would be to approximate them by piecewise linear functions over a fixed mesh. In this case, imposing convexity becomes a local feature, and the number of linear constraints for convexity is proportional to the size of the mesh. Yet, Choné and Le Meur showed in [35] that we cannot approximate in this way all convex functions, but only those satisfying some extra constraints on their Hessian (for instance, those which also have a positive mixed second derivative $\partial^2u/\partial x_i\partial x_j$). Because of this difficulty, a different approach is needed. For instance, Ekeland and Moreno-Bromberg used the representation of a convex function as a maximum of affine functions [42], but this needed many more linear constraints; Oudet and Mérigot [73] decided to test convexity on a grid different from (and less refined than) that where the functions are defined. These methods give somewhat satisfactory answers for functionals involving u and ∇u, but are not able to handle terms involving the Monge-Ampère operator $\det(D^2u)$.
The method proposed in [17], that we will roughly present here, does not really use a prescribed mesh. The idea is the following: suppose that µ is a discrete measure of atomic type, i.e. of the form $\mu = \sum_ja_j\delta_{x_j}$. A convex function defined on its support $S := \{x_j\}_j$ will be a function u : S → R such that at each point x ∈ S the subdifferential
$$\partial u(x) := \{p\in\mathbb R^d : u(x)+p\cdot(y-x)\le u(y)\text{ for all } y\in S\}$$
is non-empty. Also, the Monge-Ampère operator will be defined by using the subdifferential, and more precisely the equality
$$\int_B\det(D^2u(x))\,dx = |\partial u(B)|,$$
which is valid for smooth convex functions u and arbitrary open sets B. An important point is the fact that whenever f is superlinear, functionals of the form $\int f(\varrho(x))\,dx$ impose, for their finiteness, the positivity of $\det(D^2u)$, which will in turn impose that the subdifferential has positive volume, i.e. it is non-empty, and hence convexity.
More precisely, we will minimize over the set of pairs $(u,P) : S\to\mathbb R\times\mathbb R^d$ where P(x) ∈ ∂u(x) for every x ∈ S. For every such pair (u, P) we need to define G(u, P), which is meant to be $(\nabla u)_\#\mu$, and define F(G(u, P)) whenever F has the form $F = \mathcal F+\mathcal V+\mathcal W$. We will simply define
$$\mathcal V(G(u,P)) := \sum_ja_jV(P(x_j))\quad\text{and}\quad\mathcal W(G(u,P)) := \sum_{j,j'}a_ja_{j'}W(P(x_j)-P(x_{j'})),$$
which means that we just use $P_\#\mu$ instead of $(\nabla u)_\#\mu$. Unfortunately, this choice is not adapted for the functional $\mathcal F$, which requires absolutely continuous measures, and $P_\#\mu$ is atomic. In this case, instead of concentrating all the mass $a_j$ contained in the point $x_j\in S$ on the unique point $P(x_j)$, we need to spread it, and we will spread it uniformly on the whole subdifferential $\partial u(x_j)$. This means that we also define a new surrogate of the image measure $(\nabla u)_\#\mu$, called $G^{ac}(u,P)$ (where the superscript ac stands for absolutely continuous), given by
$$G^{ac}(u,P) := \sum_j\frac{a_j}{|A_j|}\,\mathcal L^d\llcorner A_j,$$
where $A_j := \partial u(x_j)\cap\Omega$ (the intersection with Ω is done in order to take care of the constraint ∇u ∈ Ω). Computing $\mathcal F(G^{ac}(u,P))$ hence gives
$$\mathcal F(G^{ac}(u,P)) = \sum_j|A_j|\,f\!\left(\frac{a_j}{|A_j|}\right).$$

It is clear that the discretizations of $\mathcal V$ and $\mathcal W$ in terms of G(u, P) are convex functions of (u, P) (actually, of P, and the constraint relating P and u is convex) whenever V and W are convex; concerning $\mathcal F$, it is possible to prove, thanks to the concavity properties of the determinant or, equivalently, to the Brunn-Minkowski inequality (see for instance [86]) that $\mathcal F(G^{ac}(u,P))$ is convex in u as soon as f satisfies McCann's condition. Globally, it is not surprising to see that we face a convex variational problem in terms of u (or of ∇u) as soon as F is displacement convex in ϱ (actually, convexity on generalized geodesics based at µ should be the correct notion).
Then we are led to study the variational problem
$$\min_{u,P}\ \frac12\sum_ja_j|P(x_j)-x_j|^2+\mathcal V(G(u,P))+\mathcal W(G(u,P))+\mathcal F(G^{ac}(u,P))\tag{4.26}$$
under the constraints $P(x_j)\in A_j := \partial u(x_j)\cap\Omega$. Note that we should incorporate the scale factor τ in the functional F which means that, for practical purposes, convexity in P is guaranteed as soon as V and W have second derivatives which are bounded from below (they are semi-convex) and τ is small (the quadratic term coming from $W_2^2$ will always overwhelm possible concavity of the other terms). The delicate point is how to compute the subdifferentials $\partial u(x_j)$, and optimize them (i.e. compute derivatives of the relevant quantity w.r.t. u).
This is now possible, and in a very efficient way, thanks to tools from computational geometry. Indeed, in this context, subdifferentials are exactly a particular case of what are called Laguerre cells, which in turn are very similar to Voronoi cells. We recall that, given some points $(x_j)_j$, their Voronoi cells $V_j$ are defined by
$$V_j := \left\{x\in\Omega : \frac12|x-x_j|^2\le\frac12|x-x_{j'}|^2\text{ for all } j'\right\}$$
(of course the squares of the distances could be replaced by the distances themselves, but in this way it is evident that the cells $V_j$ are given by a finite number of linear inequalities, and are thus convex polyhedra; the factors $\frac12$ are also present only for cosmetic reasons). Hence, Voronoi cells are the cells of points which are closer to one given point $x_j\in S$ than to the others.
In optimal transport a variant of these cells is more useful: given a set of values $\psi_j$, we look for the cells (called Laguerre cells)
$$W_j := \left\{x\in\Omega : \frac12|x-x_j|^2+\psi_j\le\frac12|x-x_{j'}|^2+\psi_{j'}\text{ for all } j'\right\}.$$
This means that we look at points which are closer to $x_j$ than to the other points $x_{j'}$, up to a correction¹⁵ given by the values $\psi_j$. It is not difficult to see that also in this case cells are convex polyhedra. And it is also easy to see that, if ϱ is an absolutely continuous measure on Ω and $\mu = \sum_ja_j\delta_{x_j}$, then finding an optimal transport map from ϱ to µ is equivalent to finding values $\psi_j$ such that $\varrho(W_j) = a_j$ for every j (indeed, in this case, the map sending every point of $W_j$ to $x_j$ is optimal, and $-\psi_j$ is the value of the corresponding Kantorovich potential at the point $x_j$). Finally, it can be easily seen that the Laguerre cells corresponding to $\psi_j := u(x_j)-\frac12|x_j|^2$ are nothing but the subdifferentials of u (possibly intersected with Ω).

¹⁵ If the points $x_j$ are the locations of some ice-cream sellers, we can think that $\psi_j$ is the price of an ice-cream at $x_j$, and the cells $W_j$ will represent the regions where customers will decide to go to the seller j, taking into account both the price and the distance.
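The definitions of Voronoi and Laguerre cells, and the link with subdifferentials, can be checked by brute force in one dimension. This is only a sketch on a fine grid (real implementations rely on computational geometry libraries, as discussed in the text); the function name `laguerre_labels`, the three sites and the choice $u(x) = x^2$ are assumptions made for illustration.

```python
import numpy as np

def laguerre_labels(grid, sites, psi):
    """For each grid point, index j of the Laguerre cell W_j containing it."""
    cost = 0.5 * (grid[:, None] - sites[None, :]) ** 2 + psi[None, :]
    return np.argmin(cost, axis=1)

sites = np.array([-1.0, 0.0, 1.0])
grid = np.linspace(-3.0, 3.0, 6001)        # Omega = [-3, 3], step 0.001

# psi = 0 gives Voronoi cells: the middle cell is [-0.5, 0.5]
voronoi = laguerre_labels(grid, sites, np.zeros(3))
voronoi_mid = grid[voronoi == 1]

# psi_j = u(x_j) - |x_j|^2 / 2 with u(x) = x^2 sampled on the sites:
# the middle Laguerre cell should be the subdifferential of u at 0, i.e. [-1, 1]
u = sites ** 2
laguerre = laguerre_labels(grid, sites, u - 0.5 * sites ** 2)
laguerre_mid = grid[laguerre == 1]
```

Raising the "price" $\psi_j$ of the two outer sites enlarges the middle cell from [−0.5, 0.5] to [−1, 1], in agreement with the direct computation $\partial u(0) = \{p : \pm p\le 1\}$ for this discrete u.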
Handling Laguerre cells computationally was difficult for a long time, but it is now
state-of-the-art in computational geometry, and it is possible to compute their volumes very easily
(incidentally, one can also find some points P belonging to them, which is useful in order to satisfy the
constraints of Problem (4.26)), as well as the derivatives of their volumes (which depend on the measures
of each face) w.r.t. the values $\psi_j$. For the applications to semi-discrete (see footnote 16) optimal
transport problems, the results are now very fast (with discretizations with up to $10^6$ points in some
minutes, in 3D; the reader can have a look at [72, 60, 62] but also at Section 6.4.2 in [84]), and the same
tools have been used for the applications to the JKO scheme that we just described.
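
To make the condition $\varrho(W_j) = a_j$ concrete, here is a small one-dimensional sketch. It is not the Newton-type algorithm of the cited references: it uses a plain fixed-step update of the values $\psi_j$, driven by the mass defect $\varrho(W_j) - a_j$. The step size, the grid approximation of $\varrho$ and the function names are assumptions made for the illustration.

```python
import numpy as np

def cell_masses(points, weights, sites, psi):
    """Mass of each 1D Laguerre cell for a measure given by weighted sample points."""
    cost = 0.5 * (points[:, None] - sites[None, :]) ** 2 + psi[None, :]
    owner = np.argmin(cost, axis=1)
    return np.bincount(owner, weights=weights, minlength=len(sites))

def solve_psi(points, weights, sites, target, step=0.1, iters=300):
    """Fixed-step iteration on psi until the cell masses match the targets a_j."""
    psi = np.zeros(len(sites))
    for _ in range(iters):
        masses = cell_masses(points, weights, sites, psi)
        # A cell carrying too much mass gets a larger psi_j, which shrinks it.
        psi += step * (masses - target)
    return psi

# Assumption: rho is the uniform density on [0, 1], sampled on a midpoint grid.
n = 2000
points = (np.arange(n) + 0.5) / n
weights = np.full(n, 1.0 / n)

sites = np.array([0.25, 0.75])   # atoms x_j of mu
target = np.array([0.3, 0.7])    # masses a_j of mu

psi = solve_psi(points, weights, sites, target)
masses = cell_masses(points, weights, sites, psi)
```

In this configuration the interface between the two cells sits at $\frac{x_0 + x_1}{2} + \frac{\psi_1 - \psi_0}{x_1 - x_0}$, so the prescribed masses are reached when $\psi_1 - \psi_0 = -0.1$. Only differences of the $\psi_j$ matter, since adding a constant to all of them leaves the cells unchanged.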
In order to perform an iterated minimization, it is enough to discretize $\varrho_0$ with a finite number of Dirac
masses located at points $x_j$, to fix $\tau > 0$ small, then to solve (4.26) with $\mu = \varrho^\tau_k$ and set
$\varrho^\tau_{k+1} := G(u, P)$ for the optimal $(u, P)$. Results, proofs of convergence and simulations are in [17].

5 The heat flow in metric measure spaces


In this last section we will give a very sketchy overview of an interesting research topic developed by
Ambrosio, Gigli, Savaré and their collaborators, which is in some sense a bridge between

• the theory of gradient flows in W2 , seen from an abstract metric space point of view (which is not
the point of view that we underlined the most in the previous section),

• and the current research topic of analysis and differential calculus in metric measure spaces.

This part of their work is very ambitious, and really aims at studying analytical and geometrical properties
of metric measure spaces; what we will see here is only a starting point.
The topic that we will briefly develop here is concerned with the heat flow, and the main observation
is the following: in the Euclidean space $\mathbb{R}^d$ (or in a domain $\Omega \subset \mathbb{R}^d$), the heat flow
$\partial_t \varrho = \Delta \varrho$ may be seen as a gradient flow in two different ways:

• first, it is the gradient flow in the Hilbert space $L^2(\Omega)$, endowed with the standard $L^2$ norm, of
the functional consisting in the Dirichlet energy $D(\varrho) = \int |\nabla \varrho|^2\,dx$ (a functional which is set to
$+\infty$ if $\varrho \notin H^1(\Omega)$); in this setting, the initial datum $\varrho_0$ could be any function in $L^2(\Omega)$, but
well-known properties of the heat equation guarantee $\varrho_0 \ge 0 \Rightarrow \varrho_t \ge 0$ and, if $\Omega$ is the whole space, or
boundary conditions are Neumann, then $\int \varrho_0\,dx = 1 \Rightarrow \int \varrho_t\,dx = 1$; it is thus possible to restrict to
probability densities (i.e. positive densities with mass one);

• then, if we use the functional $E$ of the previous section (the entropy defined with $f(t) = t \log t$), the
heat flow is also a gradient flow in the space $W_2(\Omega)$.
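
The two descriptions above can be checked on a toy discretization. The sketch below is an illustration under assumptions that are not in the text: a one-dimensional periodic grid (so that no boundary term appears and mass is conserved), an explicit finite-difference scheme with a time step below the stability threshold $dx^2/2$, and hypothetical helper names. It records the mass, the minimum, the Dirichlet energy and the entropy along the discrete heat flow.

```python
import numpy as np

def heat_step(rho, dx, dt):
    """One explicit Euler step of the heat equation on a periodic grid."""
    lap = (np.roll(rho, 1) - 2.0 * rho + np.roll(rho, -1)) / dx ** 2
    return rho + dt * lap

def dirichlet_energy(rho, dx):
    """Discrete version of D(rho), the integral of |grad rho|^2."""
    grad = (np.roll(rho, -1) - rho) / dx
    return np.sum(grad ** 2) * dx

def entropy(rho, dx):
    """Discrete version of E(rho), the integral of rho log rho (rho > 0)."""
    return np.sum(rho * np.log(rho)) * dx

n = 200
dx = 1.0 / n
dt = 0.4 * dx ** 2                         # below the stability bound dx^2 / 2
x = (np.arange(n) + 0.5) * dx
rho = 1.0 + 0.8 * np.cos(2.0 * np.pi * x)  # positive density with unit mass

history = []
for _ in range(2000):
    history.append((rho.sum() * dx, rho.min(),
                    dirichlet_energy(rho, dx), entropy(rho, dx)))
    rho = heat_step(rho, dx, dt)
history = np.array(history)  # columns: mass, minimum, Dirichlet energy, entropy
```

Along the iterations the mass stays equal to one and the density stays positive, while both the Dirichlet energy and the entropy decrease. This is consistent with the same evolution being a descent for both functionals, each with respect to its own metric structure.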

A natural question arises: is the fact that these two flows coincide a general fact? How to analyze
this question in a general metric space? In the Euclidean space this is easy: we just write the PDEs
corresponding to each of these flows and, as they are the same PDE, for which we know uniqueness
results, then the two flows are the same.
First, we realize that the question is not well-posed if the only structure that we consider on the
underlying space is that of a metric space. Indeed, we also need a reference measure (a role played by
the Lebesgue measure in the Euclidean space). Such a measure is needed in order to define the integral
$\int |\nabla \varrho|^2\,dx$, and also the entropy $\int \varrho \log \varrho\,dx$. Roughly speaking, we need to define “dx”.
16
One measure being absolutely continuous, the other atomic.

Hence, we need to consider metric measure spaces, $(X, d, m)$, where $m \ge 0$ is a reference measure
(usually finite) on the Borel σ-algebra of $X$. The inexperienced reader should not be surprised: metric measure
spaces are currently the new frontier of some branches of geometric analysis, as a natural generalization
of Riemannian manifolds. In order not to overburden the reference list, we just refer to the following
papers, already present in the bibliography of this survey for other reasons: [5, 9, 10, 7, 34, 49, 51, 52,
53, 54, 63, 85, 87].

5.1 Dirichlet and Cheeger energies in metric measure spaces


In order to attack our question about the comparison of the two flows, we first need to define and study
the flow of the Dirichlet energy, and in particular to give a suitable definition of such an energy. This
more or less means defining the space $H^1(X)$ whenever $X$ is a metric measure space (MMS). This is not
new, and many authors studied it: we cite in particular [52, 53, 34, 85]. Moreover, the recent works by
Ambrosio, Gigli and Savaré ([9, 7]) presented some results in this direction, useful for the analysis of
the most general case (consider that most of the previous results require a doubling assumption and the
existence of a Poincaré inequality, see also [54], and these assumptions on $(X, d, m)$ are not required in
their papers). One of the first definitions of Sobolev spaces on a MMS was given by Hajłasz, who used
the following definition:

$$f \in H^1(X, d, m) \quad \text{if there is } g \in L^2(X, m) \text{ such that } |f(x) - f(y)| \le d(x, y)\,(g(x) + g(y)).$$

This property characterizes Sobolev spaces in $\mathbb{R}^d$ by choosing

$$g = c \cdot M\big[|\nabla f|\big],$$

where $M[u]$ denotes the maximal function of $u$: $M[u](x) := \sup_{r>0} \frac{1}{|B(x,r)|} \int_{B(x,r)} u\,dy$ (the important
point here is the classical result in harmonic analysis guaranteeing $\|M[u]\|_{L^2} \le C(d)\,\|u\|_{L^2}$) and $c$ is a
suitable constant only depending on the dimension $d$. As this definition is not local, almost all the recent
investigations on these topics are rather based on some other ideas, due to Cheeger ([34]), using the
relaxation starting from Lipschitz functions, or to Shanmugalingam ([85]), based on the inequality

$$|f(x(0)) - f(x(1))| \le \int_0^1 |\nabla f(x(t))|\,|x'(t)|\,dt$$

required to hold on almost all curves, in a suitable sense. The recent paper [7] presents a classification
of the various weak notions of modulus of the gradient in a MMS and analyzes their equivalence. On
the contrary, here we will only choose one unique definition for $\int |\nabla f|^2\,dm$, the one which seems the
simplest.
For every Lipschitz function $f$ on $X$, let us take its local Lipschitz constant $|\nabla f|$, defined in (3.1), and
set $D(f) := \int |\nabla f|^2(x)\,dm$. Then, by relaxation, we define the Cheeger Energy (see footnote 17) $C(f)$:

$$C(f) := \inf \Big\{ \liminf_n D(f_n) \;:\; f_n \to f \text{ in } L^2(X, m),\ f_n \in \mathrm{Lip}(X) \Big\}.$$
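
As an illustration of the quantity $D(f)$ for Lipschitz $f$, the sketch below evaluates a discrete stand-in on a grid of $[0, 1]$. It assumes that the local Lipschitz constant of (3.1) is the usual $\limsup_{y \to x} |f(y) - f(x)| / d(x, y)$, replaces the limit by the largest difference quotient towards the neighbouring grid points, and takes for $m$ the normalized counting measure on the grid. All of these choices, and the function names, are assumptions made for the example. It does not compute the relaxed energy $C(f)$.

```python
import numpy as np

def local_lipschitz(f_vals, x):
    """Largest difference quotient towards the neighbouring grid points."""
    slopes = np.abs(np.diff(f_vals)) / np.diff(x)
    lip = np.empty_like(f_vals)
    lip[0] = slopes[0]
    lip[-1] = slopes[-1]
    lip[1:-1] = np.maximum(slopes[:-1], slopes[1:])
    return lip

def dirichlet_like_energy(f_vals, x, m):
    """Discrete version of D(f), the integral of |grad f|^2 against m."""
    return np.sum(local_lipschitz(f_vals, x) ** 2 * m)

n = 2000
x = np.linspace(0.0, 1.0, n + 1)
m = np.full(n + 1, 1.0 / (n + 1))   # normalized counting measure on the grid

# f(x) = x^2 has |grad f| = 2x, so D(f) should be close to 4/3.
D_square = dirichlet_like_energy(x ** 2, x, m)

# f(x) = |x - 1/2| is not differentiable at 1/2, but its local Lipschitz
# constant equals 1 everywhere, so D(f) should be close to 1.
D_abs = dirichlet_like_energy(np.abs(x - 0.5), x, m)
```

The second example shows why the local Lipschitz constant is a convenient object: it is defined at every point using only the distance, with no need for a derivative.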
17
The name has been chosen because Cheeger also gave a definition by relaxation; moreover, the authors did not wish to call
it Dirichlet energy, as generally this name is used for quadratic forms.

We then define the Sobolev space $H^1(X, d, m)$ as the space of functions such that $C(f) < +\infty$. This
space will be a Banach space, endowed with the norm $f \mapsto \sqrt{C(f)}$, and the function $f \mapsto C(f)$ will
be convex. We can also define $-\Delta f$ as the element of minimal norm of the subdifferential $\partial C(f)$ (an
element belonging to the dual of $H^1(X, d, m)$). Beware that, in general, the map $f \mapsto -\Delta f$ will not be
linear (which corresponds to the fact that the norm $\sqrt{C(f)}$ is in general not Hilbertian, i.e. it does not
come from a scalar product).
Defining the flow of $C$ in the Hilbert space $L^2(X, m)$ is now easy, and fits well the classical case
of convex functionals on Hilbert spaces or, more generally, of maximal monotone operators (see [25]).
This brings very general existence and uniqueness results.

5.2 A well-posed gradient flow for the entropy


A second step (first developed in [49] and then generalized in [9]) consists in providing existence and
uniqueness conditions for the gradient flow of the entropy, w.r.t. the Wasserstein distance $W_2$. To do so,
we consider the functional $E$, defined on the set of densities $f$ such that $\varrho := f \cdot m$ is a probability measure
via $E(f) := \int f \log f\,dm$, and we look at its gradient flow in $W_2$ in the EDE sense. In order to apply the
general theory of Section 3, as we cannot use the notion of weak solutions of the continuity equation, it
will be natural to suppose that this functional $E$ is λ-geodesically convex for some $\lambda \in \mathbb{R}$. This means, in
the sense of Sturm and Lott-Villani, that the space $(X, d, m)$ is a MMS with Ricci curvature bounded from
below. We recall here the corresponding definition, based on the characteristic property already evoked
in Section 4.4, which was indeed a theorem (Proposition 4.11) in the smooth case.

Definition 5.1. A metric measure space $(X, d, m)$ is said to have a Ricci curvature bounded from below
by a constant $K \in \mathbb{R}$ in the sense of Sturm and Lott-Villani if the entropy functional
$E : P(X) \to \mathbb{R} \cup \{+\infty\}$ defined through

$$E(\varrho) = \begin{cases} \int f \log f\,dm & \text{if } \varrho = f \cdot m, \\ +\infty & \text{if } \varrho \text{ is not absolutely continuous w.r.t. } m \end{cases}$$

is $K$-geodesically convex in the space $W_2(X)$. In this case we say that $(X, d, m)$ satisfies the condition
(see footnote 18) $CD(K, \infty)$.
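
A simple sanity check of this definition is possible in the model case of the real line with the Lebesgue measure and $K = 0$. The sketch relies on two standard facts that are not stated in the text: the $W_2$ geodesic between two one-dimensional Gaussians consists of Gaussians whose mean and standard deviation are interpolated linearly, and the entropy of a Gaussian with standard deviation $\sigma$ is $-\frac12 \log(2\pi e \sigma^2)$. The function names are hypothetical.

```python
import numpy as np

def gaussian_entropy(sigma):
    """E(rho), the integral of f log f, for a 1D Gaussian density with std sigma."""
    return -0.5 * np.log(2.0 * np.pi * np.e * sigma ** 2)

def geodesic_std(sigma0, sigma1, t):
    """Standard deviation along the W2 geodesic between two 1D Gaussians."""
    return (1.0 - t) * sigma0 + t * sigma1

sigma0, sigma1 = 0.5, 3.0
t = np.linspace(0.0, 1.0, 101)

# Entropy along the geodesic versus the chord joining the endpoint values.
# The means do not appear, since the entropy is translation invariant.
along = gaussian_entropy(geodesic_std(sigma0, sigma1, t))
chord = (1.0 - t) * gaussian_entropy(sigma0) + t * gaussian_entropy(sigma1)

# gap >= 0 for all t means E is convex along this geodesic (K = 0).
gap = chord - along
```

Convexity here comes from the convexity of $t \mapsto -\log \sigma_t$ when $\sigma_t$ is affine in $t$. The check covers only one family of geodesics and is not a proof of the $CD(0, \infty)$ condition.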

Note that the EVI formulation is not available in this case, as we do not have the geodesic convexity
of the squared Wasserstein distance. Moreover, on a general metric space, the use of generalized geodesics
is not always possible. This is the reason why we will define the gradient flow of $E$ by the EDE condition
and not the EVI, but this requires proving the uniqueness of such a gradient flow via other methods. To do
so, Gigli introduced in [49] an interesting strategy to prove uniqueness for EDE flows, which is based on
the following proposition.

Proposition 5.1. If $F : P(X) \to \mathbb{R} \cup \{+\infty\}$ is a strictly convex functional (w.r.t. usual convex combinations
$\mu_s := (1 - s)\mu_0 + s\mu_1$, which is meaningful in the set of probability measures), such that $|\nabla^- F|$ is an upper
gradient for $F$ and such that $|\nabla^- F|^2$ is convex, then for every initial measure $\bar\mu$ there exists at most one
gradient flow $\mu(t)$ in the EDE sense for the functional $F$ satisfying $\mu(0) = \bar\mu$.
18
The general notation CD(K, N) is used to say that a space has curvature bounded from below by K and dimension bounded
from above by N (CD stands for “curvature-dimension”).

In particular, this applies to the functional $E$: the strict convexity is straightforward, and the squared
slope can be proven to be convex with the help of the formula (3.4) (it is interesting to observe that, in
the Euclidean case, an explicit formula for the slope is known:

$$|\nabla^- E|^2(\varrho) = \int \frac{|\nabla f|^2}{f}\,dx, \tag{5.1}$$

whenever $\varrho = f \cdot \mathcal{L}^d$).

5.3 Gradient flows comparison


The last point to study (and it is not trivial at all) is the fact that every gradient flow of $C$ (w.r.t. the $L^2$
distance) is also an EDE gradient flow of $E$ for the $W_2$ distance. This one (i.e. $(C, L^2) \Rightarrow (E, W_2)$) is
the simplest direction to consider in this framework, as computations are easier. This is a consequence
of the fact that the structure of gradient flows of convex functionals in Hilbert spaces is much better
understood. In order to do so, it is useful to compute and estimate

$$\frac{d}{dt} E(f_t), \quad \text{where } f_t \text{ is a gradient flow of } C \text{ in } L^2(X, m).$$

This computation is based on a strategy essentially developed in [51] and on a lemma by Kuwada. The
initial proof, contained in [51], is valid for Alexandrov spaces (see footnote 19). The generalization of the
same result to arbitrary MMS satisfying $CD(K, \infty)$ is done in [9].
Proposition 5.2. If $f_t$ is a gradient flow of $C$ in $L^2(X, \mathcal{H}^d)$, then we have the following equality with the
Fisher information:

$$-\frac{d}{dt} E(f_t) = C\big(2\sqrt{f_t}\big).$$

Moreover, for every $\varrho = f \cdot \mathcal{H}^d \in P(X)$ we have

$$C\big(2\sqrt{f}\big) \ge |\nabla^- E|^2(\varrho)$$

(where the slope of $E$ is computed for the $W_2$ distance; see footnote 20). Also, if we consider the curve
$\varrho_t = f_t \cdot \mathcal{H}^d$, it happens that $\varrho_t$ is an AC curve in the space $W_2(X)$ and

$$|\varrho'|(t)^2 \le C\big(2\sqrt{f_t}\big).$$

These three estimates imply that $\varrho_t$ is a gradient flow of $E$ w.r.t. $W_2$.
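
The first identity can be verified by hand in the Euclidean model case, which is used here only as an illustration. It assumes $X = \mathbb{R}^d$ with the Lebesgue measure in place of $\mathcal{H}^d$, the heat flow $\partial_t f = \Delta f$ of the beginning of this section, and $C(g) = \int |\nabla g|^2\,dx$ for smooth $g$; the Gaussian initial datum is a choice made for the example.

```latex
% Gaussian solution of \partial_t f = \Delta f on R^d:
f_t(x) = (2\pi\sigma_t^2)^{-d/2} \exp\!\Big(-\frac{|x|^2}{2\sigma_t^2}\Big),
\qquad \sigma_t^2 = \sigma_0^2 + 2t .

% Entropy and its dissipation:
E(f_t) = \int f_t \log f_t \, dx = -\frac{d}{2}\log\big(2\pi e\,\sigma_t^2\big),
\qquad
-\frac{d}{dt}E(f_t) = \frac{d}{2}\cdot\frac{2}{\sigma_t^2} = \frac{d}{\sigma_t^2}.

% Fisher information, using \nabla f_t = -\frac{x}{\sigma_t^2} f_t :
C\big(2\sqrt{f_t}\big) = 4\int \big|\nabla\sqrt{f_t}\big|^2 dx
= \int \frac{|\nabla f_t|^2}{f_t}\,dx
= \frac{1}{\sigma_t^4}\int |x|^2 f_t\,dx
= \frac{d}{\sigma_t^2}.
```

Both sides equal $d / \sigma_t^2$, and by the Euclidean formula for the slope this value is also $|\nabla^- E|^2(\varrho_t)$, so the inequality of the proposition holds with equality in this example.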
Once this equivalence is established, we can wonder about the properties of this gradient flow. The
$L^2$ distance being Hilbertian, it is easy to see that the $C^2G^2$ property is satisfied, and hence this flow also
satisfies EVI. On the contrary, it is not evident that the same is true when we consider the same flow as
the gradient flow of $E$ for the distance $W_2$. Indeed, we can check that the following three conditions are
equivalent (all true or false depending on the space $(X, d, m)$, which is supposed to satisfy $CD(K, \infty)$; see
[10] for the proofs):
19
These spaces, see [26], are metric spaces where triangles are at least as fat as the triangles of a model comparison manifold
with constant curvature equal to $K$, the comparison being done in terms of the distances from a vertex of a triangle to the points
of a geodesic connecting the two other vertices. These spaces can be proven to always have an integer dimension $d \in \mathbb{N} \cup \{\infty\}$,
and can be considered as MMS whenever $d < \infty$, by endowing them with their Hausdorff measure $\mathcal{H}^d$. Note that the
comparison manifold with constant curvature can, anyway, be taken of dimension 2, as only triangles appear in the definition.
20
As in (5.1).

• the unique EDE gradient flow of $E$ for $W_2$ also satisfies EVI;

• the heat flow (which is at the same time the gradient flow of $E$ for $W_2$ and of $C$ for $L^2$) depends
linearly on the initial datum;

• (if we suppose that $(X, d, m)$ is a Finsler manifold endowed with its natural distance and its volume
measure), $X$ is a Riemannian manifold.

As a consequence, Ambrosio, Gigli and Savaré proposed in [10] a definition of MMS having a
Riemannian Ricci curvature bounded from below by requiring both the $CD(K, \infty)$ condition
and the linearity of the heat flow (this double condition is usually written $RCD(K, \infty)$). This is the notion
of infinitesimally Hilbertian space that we mentioned at the end of Section 3.
It is important to observe (but we will not develop this here) that these notions of Ricci bounds (either
Riemannian or not) are stable via measured Gromov-Hausdorff convergence (a notion of convergence
similar to the Gromov-Hausdorff convergence of metric spaces, but considering the minimal Wasserstein
distance between the images of two spaces via isometric embeddings into a same space). This can be
surprising at first sight (curvature bounds are second-order objects, and we are claiming that they are
stable via a convergence which essentially sounds like a uniform convergence of the spaces, with a weak
convergence of the measures), but not after a simple observation: the class of convex, or λ-convex,
functions is also stable under uniform (or even weak) convergence! But, of course, proving this stability
is not a trivial fact.

References
[1] M. Agueh and M. Bowles, One-dimensional numerical algorithms for gradient flows in the p-
Wasserstein spaces, Acta applicandae mathematicae 125 (2013), no. 1, 121–134.

[2] L. Ambrosio, Movimenti minimizzanti, Rend. Accad. Naz. Sci. XL Mem. Mat. Sci. Fis. Natur. 113,
191–246, 1995.

[3] L. Ambrosio, Lecture Notes on Optimal Transport Problems, Mathematical Aspects of Evolving
Interfaces, Springer Verlag, Lecture Notes in Mathematics (1812), 1–52, 2003.

[4] L. Ambrosio, Transport equation and Cauchy problem for BV vector fields, Inventiones Mathe-
maticae, 158 (2004), 227–260.

[5] L. Ambrosio and N. Gigli A user’s guide to optimal transport, in Modelling and Optimisation of
Flows on Networks Lecture Notes in Mathematics, 1–155, 2013.

[6] L. Ambrosio, N. Gigli and G. Savaré, Gradient flows in metric spaces and in the spaces of proba-
bility measures. Lectures in Mathematics, ETH Zurich, Birkhäuser, 2005.

[7] L. Ambrosio, N. Gigli and G. Savaré, Density of Lipschitz functions and equivalence of weak
gradients in metric measure spaces, Rev. Mat. Iberoamericana 29 (2013) 969-986.

[8] L. Ambrosio, N. Gigli and G. Savaré, Heat flow and calculus on metric measure spaces with
Ricci curvature bounded below - the compact case, Analysis and Numerics of Partial Differential
Equations, Brezzi F., Colli Franzone P., Gianazza U.P., Gilardi G. (ed.) Vol. 4, “INDAM”, Springer
(2013) 63-116

[9] L. Ambrosio, N. Gigli and G. Savaré, Calculus and heat flow in metric measure spaces and appli-
cations to spaces with Ricci bounds from below, Inv. Math. 195 (2), 289–391, 2014.

[10] L. Ambrosio, N. Gigli and G. Savaré, Metric measure spaces with Riemannian Ricci curvature
bounded from below, Duke Math. J. 163 (2014) 1405–1490

[11] L. Ambrosio and G. Savaré, Gradient flows of probability measures, Handbook of differential
equations, Evolutionary equations 3, ed. by C.M. Dafermos and E. Feireisl (Elsevier, 2007).

[12] L. Ambrosio and P. Tilli, Topics on analysis in metric spaces. Oxford Lecture Series in Mathe-
matics and its Applications (25). Oxford University Press, Oxford, 2004.

[13] J.-P. Aubin, Un théorème de compacité. (French). C. R. Acad. Sci. Paris 256. pp. 5042–5044

[14] J.-D. Benamou and Y. Brenier, A computational fluid mechanics solution to the Monge-
Kantorovich mass transfer problem, Numer. Math., 84 (2000), 375–393.

[15] J.-D. Benamou and G. Carlier, Augmented Lagrangian Methods for Transport Optimization,
Mean Field Games and Degenerate Elliptic Equations, J. Opt. Theor. Appl., (2015), to appear.

[16] J.-D. Benamou, G. Carlier and M. Laborde An augmented Lagrangian approach to Wasserstein
gradient flows and applications, preprint.

[17] J.-D. Benamou, G. Carlier, Q. Mérigot and É. Oudet Discretization of functionals involving the
Monge-Ampère operator, Numerische Mathematik, to appear.

[18] A. Blanchet, V. Calvez and J.A. Carrillo, Convergence of the mass-transport steepest descent
scheme for the subcritical Patlak-Keller-Segel model, SIAM J. Numer. Anal. 46 (2), 691–721,
2008.

[19] A. Blanchet, J.-A. Carrillo, D. Kinderlehrer, M. Kowalczyk, P. Laurençot and S. Lisini, A
Hybrid Variational Principle for the Keller-Segel System in $\mathbb{R}^2$, ESAIM M2AN, Vol. 49, no 6,
1553–1576, 2015.

[20] N. Bonnotte Unidimensional and Evolution Methods for Optimal Transportation, PhD Thesis,
Université Paris-Sud, 2013

[21] G. Bouchitté and G. Buttazzo, Characterization of optimal shapes and masses through Monge-
Kantorovich equation J. Eur. Math. Soc. 3 (2), 139–168, 2001.

[22] G. Bouchitté, G. Buttazzo and P. Seppecher, Shape optimization solutions via Monge-
Kantorovich equation. C. R. Acad. Sci. Paris Sér. I Math. 324 (10), 1185–1191, 1997.

[23] Y. Brenier, Décomposition polaire et réarrangement monotone des champs de vecteurs. (French)
C. R. Acad. Sci. Paris Sér. I Math. 305 (19), 805–808, 1987.

[24] Y. Brenier, Polar factorization and monotone rearrangement of vector-valued functions, Commu-
nications on Pure and Applied Mathematics 44, 375–417, 1991.

[25] H. Brezis Opérateurs maximaux monotones et semi-groupes de contractions dans les espaces de
Hilbert, North-Holland mathematics studies, 1973.

[26] Yu. Burago, M. Gromov and G. Perelman A. D. Alexandrov spaces with curvatures bounded
below (Russian), Uspekhi Mat. Nauk 47 (1992), 3–51; English translation: Russian Math. Surveys
47 (1992), 1–58.

[27] G. Buttazzo, Semicontinuity, relaxation, and integral representation in the calculus of variations
Longman Scientific and Technical, 1989.

[28] G. Buttazzo, É. Oudet and E. Stepanov. Optimal transportation problems with free Dirichlet re-
gions. In Variational methods for discontinuous structures, 41–65, vol 51 of PNLDE, Birkhäuser,
Basel, 2002.

[29] C. Cancès, T. Gallouët and L. Monsaingeon Incompressible immiscible multiphase flows in
porous media: a variational approach, preprint, 2016.

[30] J.-A. Carrillo, M. Di Francesco, A. Figalli, T. Laurent and D. Slepčev, Global-in-time weak
measure solutions and finite-time aggregation for nonlocal interaction equations Duke Math. J.
156 (2), 229–271, 2011.

[31] J.-A. Carrillo, R.J. McCann and C. Villani, Kinetic equilibration rates for granular media and
related equations: entropy dissipation and mass transportation estimates, Revista Matemática
Iberoamericana 19, 1–48, 2003.

[32] J.-A. Carrillo, R.J. McCann and C. Villani, Contractions in the 2-Wasserstein length space and
thermalization of granular media, Arch. Rat. Mech. An. 179, 217–263, 2006.

[33] J.-A. Carrillo and D. Slepčev Example of a displacement convex functional of first order Calcu-
lus of Variations and Partial Differential Equations 36 (4), 547–564, 2009.

[34] J. Cheeger Differentiability of Lipschitz functions on metric measure spaces, Geom. Funct. Anal.
9 (1999), 428–517.

[35] P. Choné and H. Le Meur, Non-convergence result for conformal approximation of variational
problems subject to a convexity constraint, Numer. Funct. Anal. Optim. 5-6 (2001), no. 22, 529–
547.

[36] K. Craig Nonconvex gradient flow in the Wasserstein metric and applications to constrained non-
local interactions, preprint available at http://arxiv.org/abs/1512.07255

[37] S. Daneri and G. Savaré, Eulerian Calculus for the Displacement Convexity in the Wasserstein
Distance. SIAM J. Math. An. 40, 1104-1122, 2008.

[38] E. De Giorgi, New problems on minimizing movements, Boundary Value Problems for PDE and
Applications, C. Baiocchi and J. L. Lions eds. (Masson, 1993) 81–98.

[39] S. Di Marino and A. R. Mészáros Uniqueness issues for evolution equations with density con-
straints, Math. Models Methods Appl. Sci., to appear (2016)

[40] R. J. DiPerna and P. L. Lions, Ordinary differential equations, transport theory and Sobolev spaces,
Inventiones mathematicae 98.3 (1989): 511–548.

[41] I. Ekeland, On the variational principle, J. Math. Anal. Appl., vol. 47, no 2,1974, p. 324–353.

[42] I. Ekeland and S. Moreno-Bromberg, An algorithm for computing solutions of variational prob-
lems with global convexity constraints, Numerische Mathematik 115 (2010), no. 1, 45–69.

[43] I. Ekeland and R. Temam, Convex Analysis and Variational Problems, Classics in Mathematics,
Society for Industrial and Applied Mathematics (1999).

[44] A. Figalli and N. Gigli, A new transportation distance between non-negative measures, with
applications to gradients flows with Dirichlet boundary conditions, J. Math. Pures et Appl., 94
(2), 107–130, 2010.

[45] M. Fortin and R. Glowinski, Augmented Lagrangian methods, Applications to the Numerical
Solution of Boundary-Value Problems, North-Holland (1983).

[46] W. Gangbo, An elementary proof of the polar factorization of vector-valued functions, Arch. Ra-
tional Mech. Anal., 128, 381–399, 1994.

[47] W. Gangbo, The Monge mass transfer problem and its applications, Contemp. Math., 226, 79–104,
1999.

[48] W. Gangbo and R. McCann, The geometry of optimal transportation, Acta Math., 177, 113–161,
1996.

[49] N. Gigli On the Heat flow on metric measure spaces: existence, uniqueness and stability, Calc.
Var. Part. Diff. Eq. 39 (2010), 101–120.

[50] N. Gigli Propriétés géométriques et analytiques de certaines structures non lisses, Mémoire HDR,
Univ. Nice-Sophia-Antipolis, 2011.

[51] N. Gigli, K. Kuwada and S. I. Ohta, Heat flow on Alexandrov spaces, Comm. Pure Appl. Math.
Vol. LXVI, 307–331 (2013).

[52] P. Hajłasz, Sobolev spaces on an arbitrary metric space, Potential Analysis 5 (1996), 403–415.

[53] P. Hajłasz, Sobolev spaces on metric-measure spaces, Contemp. Math. 338 (2003), 173–218.

[54] P. Hajłasz and P. Koskela Sobolev met Poincaré, Mem. Amer. Math. Soc. 688 (2000), 1–101.

[55] R. Jordan, D. Kinderlehrer and F. Otto, The variational formulation of the Fokker-Planck equa-
tion, SIAM J. Math. Anal. 29 (1), 1–17, 1998.

[56] L. Kantorovich, On the transfer of masses. Dokl. Acad. Nauk. USSR, 37, 7–8, 1942.

[57] E. F. Keller and L. A. Segel, Initiation of slime mold aggregation viewed as an instability, J.
Theor. Biol., 26, 399–415, 1970.

[58] E. F. Keller and L. A. Segel, Model for chemotaxis, J. Theor. Biol., 30, 225–234, 1971.

[59] D. Kinderlehrer and N. J Walkington, Approximation of parabolic equations using the Wasser-
stein metric, ESAIM: Mathematical Modelling and Numerical Analysis 33 (1999), no. 04, 837–
852.

[60] J. Kitagawa, Q. Mérigot and B. Thibert, Convergence of a Newton algorithm for semi-discrete
optimal transport, preprint, 2016

[61] G. Legendre and G. Turinici, Second order in time schemes for gradient flows in Wasserstein and
geodesic metric spaces, preprint, 2016.

[62] B. Lévy A numerical algorithm for L2 semi-discrete optimal transport in 3D ESAIM: Mathemati-
cal Modelling and Numerical Analysis 49 (6), 1693–1715, 2015

[63] J. Lott and C. Villani Ricci curvature for metric-measure spaces via optimal transport Ann. of
Math. 169, 903–991, 2009.

[64] B. Maury, A. Roudneff-Chupin and F. Santambrogio A macroscopic crowd motion model of gra-
dient flow type, Math. Models and Methods in Appl. Sciences 20 (10), 1787–1821, 2010.

[65] B. Maury, A. Roudneff-Chupin, F. Santambrogio and J. Venel, Handling congestion in crowd
motion modeling, Net. Het. Media 6 (3), 485–519, 2011.

[66] B. Maury and J. Venel, A mathematical framework for a crowd motion model, C. R. Acad. Sci.
Paris, Ser. I 346 (2008), 1245–1250.

[67] B. Maury and J. Venel, A discrete contact model for crowd motion ESAIM: M2AN 45 (1), 145–
168, 2011.

[68] R. J. McCann, Existence and uniqueness of monotone measure preserving maps, Duke Math. J.,
80, 309–323, 1995.

[69] R. J. McCann A convexity principle for interacting gases. Adv. Math. 128 (1), 153–159, 1997.

[70] R. J. McCann Exact solutions to the transportation problem on the line. Proc. Royal Soc. London
Ser. A 455, 1341–1380, 1999.

[71] R. J. McCann Polar factorization of maps on Riemannian manifolds. Geom. Funct. Anal., 11 (3),
589–608, 2001.

[72] Q. Mérigot A multiscale approach to optimal transport. Computer Graphics Forum 30 (5) 1583–
1592, 2011.

[73] Q. Mérigot and É. Oudet Handling convexity-like constraints in variational problems SIAM Jour-
nal on Numerical Analysis, 52 (5), 2466–2487, 2014

[74] A. R. Mészáros and F. Santambrogio Advection-diffusion equations with density constraints,
Analysis & PDE 9-3 (2016), 615–644.

[75] G. Monge, Mémoire sur la théorie des déblais et des remblais, Histoire de l’Académie Royale des
Sciences de Paris, 666–704, 1781.

[76] F. Otto, The geometry of dissipative evolution equations: The porous medium equation, Comm.
Partial Differential Equations, 26, 101–174, 2001.

[77] F. Pitié, A. C. Kokaram and R. Dahyot. Automated colour grading using colour distribution trans-
fer. Comp. Vis. Image Understanding, 107 (1–2), 123–137, 2007.

[78] J. Rabin, G. Peyré, J. Delon and M. Bernot. Wasserstein Barycenter and Its Application to Texture
Mixing. In Scale Space and Variational Methods in Computer Vision, edited by A. M. Bruckstein,
B. M. Haar Romeny, A.M. Bronstein, and M. M. Bronstein, Lecture Notes in Computer Science,
vol. 6667, 435–446, Springer Berlin Heidelberg, 2012.

[79] M.-K. von Renesse and K.-T. Sturm, Entropic measure and Wasserstein diffusion, Ann. Probab.
37 (2009), 1114–1191.

[80] R. T. Rockafellar, Convex Analysis, Princeton University Press, 1970.

[81] A. Roudneff-Chupin, Modélisation macroscopique de mouvements
de foule, PhD Thesis, Université Paris-Sud, 2011, available at
www.math.u-psud.fr/∼roudneff/Images/these roudneff.pdf

[82] F. Santambrogio Gradient flows in Wasserstein spaces and applications to crowd movement,
Séminaire Laurent Schwartz no 27, École Polytechnique, 2010.

[83] F. Santambrogio, Flots de gradient dans les espaces métriques et leurs applications (d’après
Ambrosio-Gigli-Savaré), proceedings of the Bourbaki Seminar, 2013 (in French).

[84] F. Santambrogio Optimal Transport for Applied Mathematicians, Progress in Nonlinear Differen-
tial Equations and Their Applications no 87, Birkhäuser Basel (2015).

[85] N. Shanmugalingam Newtonian spaces: an extension of Sobolev spaces to metric measure spaces,
Rev. Mat. Iberoamericana 16 (2000), 243–279.

[86] R. Schneider, Convex bodies: the Brunn-Minkowski theory, vol. 44, Cambridge Univ. Press, 1993.

[87] K.-T. Sturm On the geometry of metric measure spaces. I. Acta Math. 196, 65–131, 2006.

[88] K.-T. Sturm On the geometry of metric measure spaces. II. Acta Math. 196, 133–177, 2006.

[89] C. Villani Topics in Optimal Transportation. Graduate Studies in Mathematics, AMS, 2003.

[90] C. Villani, Optimal transport: Old and New, Springer Verlag, 2008.
