
REGULARIZED DISCRETE OPTIMAL TRANSPORT

SIRA FERRADANS† , NICOLAS PAPADAKIS‡ ,


GABRIEL PEYRɆ , AND JEAN-FRANÇOIS AUJOL‡ §

Abstract. This article introduces a generalization of discrete optimal transport, with applications to color image manipulations. This new formulation includes a relaxation of the mass conservation constraint and a regularization term. These two features are crucial for image processing tasks, which require taking into account families of multimodal histograms with large mass variation across modes. The corresponding relaxed and regularized transport plan is the solution of a convex optimization problem. Depending on the regularization used, this minimization can be solved using standard linear programming methods or first order proximal splitting
schemes. The resulting transportation plan can be used as a color transfer map, which is robust to
mass variation across image color palettes. Furthermore, the regularization of the transport plan
helps to remove colorization artifacts due to noise amplification. We also extend this framework to
the computation of barycenters of distributions. The barycenter is the solution of an optimization
problem, which is separately convex with respect to the barycenter and the transportation plans, but
not jointly convex. A block coordinate descent scheme converges to a stationary point of the energy.
We show that the resulting algorithm can be used for color normalization across several images. The
relaxed and regularized barycenter defines a common color palette for those images. Applying color
transfer toward this average palette performs a color normalization of the input images.

1. Introduction. A large class of image processing problems involves proba-


bility densities estimated from local or global image features. In contrast to most
distances from information theory (e.g. the Kullback-Leibler divergence), optimal
transport (OT) takes into account the spatial location of the density modes [41]. Fur-
thermore, it also provides as a by-product a warping (the so-called transport plan)
between the densities. This plan can be used to perform image modifications such
as color transfer. However, an important flaw of this OT plan is that it is in general
highly irregular, thus introducing unwanted artifacts in the modified images. In this
article, we propose a variational formalism to relax and regularize the transport. This
novel regularized OT visually improves the results of color image modifications.
1.1. Color Normalization and Color Transfer. The problem of imposing
some histogram on an image has been tackled since the beginning of image process-
ing. Classic problems are histogram equalization or histogram specification (see for
example [18]). Given two images, the goal of color transfer is to impose on one of
the images the histogram of the other one. An approach to color transfer based on
matching statistical properties (mean and covariance) is proposed by Reinhard et
al. [34] for the ℓαβ color space, and generalized by Xiao and Ma [43] to any color
space. Wang and Huang [42] use similar ideas to generate a sequence of the same
image with a changing histogram. Morovic and Sun [27] and Delon [11] show that
histogram transfer is directly related to the OT problem.
A special case of color transfer is color normalization where the goal is to impose
the same histogram, normally some “average” histogram, on a set of different images.
An application for the color balancing of videos is proposed by Delon [12] to correct
flickering in old movies. In the context of canceling illumination, this problem is also

∗ This work has been supported by the European Research Council (ERC project SIGMA-Vision)

and the French National Research Agency (project NatImages).


† CEREMADE, Université Paris-Dauphine, Place du Marechal De Lattre De Tassigny, 75775

PARIS CEDEX 16, FRANCE.


‡ IMB, UMR 5251, Université Bordeaux 1, 351 cours de la Libération F-33405 TALENCE, France
§ Member of Institut Universitaire de France.

known as color constancy and it has been thoroughly studied by Land and McCann
who propose the Retinex theory (see [22] and [3] for a modern formulation). Can-
celing the illumination of a scene is an important component in the computer vision
pipeline, and it is regularly used as a preprocessing step to register or compare several images taken with different cameras or under different illumination conditions, see [9] for instance.
1.2. Optimal Transport and Imaging.
Discrete optimal transport. The discrete OT is the solution of a convex linear
program originally introduced by Kantorovitch [21]. It corresponds to the convex
relaxation of a combinatorial problem when the densities are sums of the same number
of Dirac masses. This relaxation is tight (i.e. the solution of the linear program is an
assignment) and it extends the notion of OT to an arbitrary sum of weighted Diracs,
see for instance [41]. Although there exist dedicated linear solvers (transportation
simplex [10]) and combinatorial algorithms (such as the Hungarian [19] and auction
algorithms [6]), computing OT is still a challenging task for densities composed of
thousands of Dirac masses.
Optimal transport distance. The OT distance (also known as the Wasserstein
distance or the Earth Mover distance) has been shown to produce state of the art
results for the comparison of statistical descriptors, see for instance [35]. Image re-
trieval performance as well as computational time are both greatly improved by using
non-convex cost functions, see [30].
Optimal transport map. Another line of applications of OT makes use of the
transport plan to warp an input density onto another. OT is strongly connected to
fluid dynamic partial differential equations [5]. These connections have been used
to perform image registration [20]. The estimation of the transport plan is also an
interesting way of tackling the challenging problem of color transfer between images,
see for instance [34, 27, 26]. For grayscale images, the usual histogram equalization
algorithm corresponds to the application of the 1-D OT plan to an image, see for
instance [11]. It thus makes sense to consider the 3-D OT as a mathematically-sound
way to perform color palette transfer, see for instance [31] for an approximate trans-
port method. When doing so, it is important to cope with variations in the modes of
the color palette across images, which makes the mass conservation constraint of OT
problematic. A workaround is to consider parametric densities such as Gaussian mixtures and to define an ad hoc matching between the components of the mixture, see [38]. In our work, we tackle this issue by defining a novel notion of OT well adapted to color manipulation.
Optimal transport barycenter. It is natural to extend the classical barycenter of
points to barycenter of densities by minimizing a weighted sum of OT distances to-
ward a family of input distributions. In the special case of two input distributions,
this corresponds to the celebrated displacement interpolation defined by McCann [25].
Existence and uniqueness of such a barycenter are proved by Agueh and Carlier [1], who also show the equivalence with the multi-marginal transportation problem introduced by Gangbo and Świȩch [16]. Displacement interpolation (i.e. barycenter
between a pair of distributions) is used by Bonneel et al. [7] for computer graphics
applications. Rabin et al. [33] apply this OT barycenter for texture synthesis and
mixing. The image mixing is achieved by computing OT barycenters of empirical
distributions of wavelet coefficients. A similar approach is proposed by Ferradans et
al. [15] for static and dynamic texture mixing using Gaussian distributions.

1.3. Regularized and relaxed transport.
Removing transport artifacts. The OT map between complicated densities is usu-
ally irregular. Using directly this transport plan to perform color transfer creates arti-
facts and amplifies the noise in flat areas of the image. Since the transfer is computed
over the 3-D color space, it does not take into account the pixel-domain regularity
of the image. The visual quality of the transfer is thus improved by denoising the
resulting transport using a pixel-domain regularization either as a post-processing [29]
or by solving a variational problem [29, 32].
Transport regularization. A more theoretically grounded way to tackle the problem of colorization artifacts is to directly use a regularized OT. This corresponds to adding a regularization penalty to the OT energy. This however leads to difficult non-convex variational problems, which have not yet been solved in a satisfying
manner either theoretically or numerically. The only theoretical contribution we are
aware of is the recent work of Louet and Santambrogio [24]. They show that in 1-
D the (un-regularized) OT is also the solution of the Sobolev regularized transport
problem.
Graph regularization and matching. For imaging applications, we use regulariza-
tions built on top of a graph structure connecting neighboring points in the input
density. This follows ideas introduced in manifold learning [39], that have been ap-
plied to various image processing problems, see for instance [13]. Using graphs enables
us to design regularizations that are adapted to the geometry of the input density,
that often has a manifold-like structure.
This idea of graph-based regularization of OT can be interpreted as a soft ver-
sion of the graph matching problem, which is at the heart of many computer vision
tasks, see [4, 46]. Graph matching is a quadratic assignment problem, known to be
NP-hard to solve. Similarly to our regularized OT formulation, several convex ap-
proximations have been proposed, including for instance linear programming [2] and
SDP programming [36].
Transport relaxation. The result of Louet and Santambrogio [24] is disappointing from the applications point of view, since it shows that, in 1-D, no regularization is possible if one maintains a 1:1 assignment between the two densities. This is our first
motivation for introducing a relaxed transport which is not a bijection between the
densities. Another (more practical) motivation is that relaxation is crucial to solve
imaging problems such as color transfer. Indeed, the color distributions of natural
images are multi-modal. An ideal color transfer should match the modes together. This cannot be achieved by classical OT because these modes often do not have the same mass. A typical example is that of two images with strong foreground and background dominant colors (thus having bi-modal densities) but where the proportions of pixels in the foreground and background are not the same. Such simple examples can-
not be handled properly with OT. Allowing a controlled variation of the matched
densities thus requires an appropriate relaxation of the mass conservation constraint.
Mass conservation relaxation is related to the relaxation of the bijectivity constraint
in graph matching, for which a convex formulation is proposed in [45].
1.4. Contributions. In this paper, we generalize the discrete formulation of OT
to tackle the two major flaws that we just mentioned: i) the lack of regularity of the
transport and ii) the need for a relaxed matching between densities. Our main contri-
bution is the integration of these two properties in a unified variational formulation to
compute a regular transport map between two empirical densities. The corresponding
optimization problem is convex and can be solved using standard convex optimiza-
tion procedures. We propose two optimization algorithms adapted to the different classes of regularizations. We apply this framework to the color transfer problem and
obtain results that are comparable to the state of the art. Our second contribution
takes advantage of the proposed regularized OT energy to compute the barycenter
of several empirical densities. We develop a block-coordinate descent method that
converges to a stationary point of the non-convex barycenter energy. We show an
application to color normalization between a set of photographs. Numerical results
show the relevance of these approaches to imaging problems. The matlab code to
reproduce the figures of this article is available online∗ .
Part of this work was presented at the conference SSVM 2013 [14].
2. Discrete Optimal Transport. Monge’s original formulation of the OT
problem corresponds to minimizing the cost for transporting a distribution µX onto
another distribution µY using a map T
\[ \min_{T} \int_{X} c(x, T(x)) \, d\mu_X(x), \quad \text{where} \quad T\#\mu_X = \mu_Y. \]   (2.1)

Here, µX , µY are measures in Rd , T : Rd → Rd is a µX -measurable function, c :


Rd ×Rd → R+ is a µX ⊗µY -measurable function, and # is the push forward operator.
We focus here on the case where the measures are discrete, have the same number
of points, and all points have the same mass, thus
\[ \mu_X = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_i} \quad \text{and} \quad \mu_Y = \frac{1}{N} \sum_{j=1}^{N} \delta_{Y_j}, \]

where δx is the Dirac measure at location x ∈ Rd , and where the positions of the supporting points are X = (Xi )N i=1 and Y = (Yj )N j=1 , with Xi , Yj ∈ Rd . In this
context, the transport between X and Y is a one-to-one assignment, i.e. T (Xi ) = Yσ(i)
where σ is a permutation of {1, . . . , N }, which can be encoded using a permutation
matrix Σ such that

\[ \Sigma_{i,j} = \begin{cases} 1 & \text{if } j = \sigma(i), \\ 0 & \text{otherwise}. \end{cases} \]

A more compact way to denote the transport is T (Xi ) = (ΣY )i , ∀ i ∈ {1, . . . , N }.


Introducing the cost matrix

CX,Y ∈ RN ×N where ∀ (i, j) ∈ {1, . . . , N }2 , (CX,Y )i,j = c(Xi , Yj ),

this permutation matrix Σ is thus the solution to the following optimization problem

\[ \min_{\Sigma \in \mathcal{P}} \; \langle C_{X,Y}, \Sigma \rangle = \sum_{i,j=1}^{N} c(X_i, Y_j)\, \Sigma_{i,j}, \]   (2.2)

where P is the set of permutation matrices

\[ \mathcal{P} = \big\{ \Sigma \in \mathbb{R}^{N \times N} \;\backslash\; \Sigma^* \mathbb{I} = \mathbb{I},\ \Sigma \mathbb{I} = \mathbb{I},\ \Sigma_{i,j} \in \{0, 1\} \big\}, \]


∗ https://www.ceremade.dauphine.fr/~sira/regularizeddiscreteOT
see [41] for more details. We have denoted I = (1, . . . , 1)∗ ∈ RN , and A∗ the adjoint of the matrix A, which for real matrices amounts to the transpose operation.
In the special case where

(CX,Y )i,j = c(Xi , Yj ) = ||Xi − Yj ||α

where || · || is the Euclidean norm in Rd and α ≥ 1, the value of the optimization


problem (2.2) is called the Lα -Wasserstein distance (to the power α), and is denoted
Wα (µX , µY )α . It can be shown that Wα defines a distance on the set of distributions
that have moments of order α.
Kantorovich OT formulation. The set of permutation matrices P is not convex.
Its convex hull is the set of bi-stochastic matrices

\[ \mathcal{S}_1 = \big\{ \Sigma \in \mathbb{R}^{N \times N} \;\backslash\; \Sigma \mathbb{I} = \mathbb{I},\ \Sigma^* \mathbb{I} = \mathbb{I},\ \Sigma_{i,j} \in [0, 1] \big\}. \]


One can show that the relaxation

\[ \min_{\Sigma \in \mathcal{S}_1} \; \langle C_{X,Y}, \Sigma \rangle \]   (2.3)

of (2.2) is tight, meaning that there exists a solution of (2.3) which is a binary matrix,
hence being also a solution of the original non-convex problem (2.2), see [41].
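As an illustration of this tightness, the following minimal sketch (a Python illustration, not the authors' Matlab code) solves (2.2)/(2.3) for two clouds of N points with uniform weights, assuming the quadratic cost c(x, y) = ||x − y||2 ; the Hungarian-type solver directly returns a permutation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def discrete_ot_assignment(X, Y):
    """Solve the assignment form of (2.2)/(2.3) for two clouds of N points
    with uniform weights 1/N, using the quadratic cost c(x, y) = ||x - y||^2."""
    # Cost matrix (C_{X,Y})_{i,j} = ||X_i - Y_j||^2.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    rows, sigma = linear_sum_assignment(C)      # Hungarian-type exact solver
    cost = C[rows, sigma].sum() / len(X)        # squared W_2 distance (alpha = 2)
    return sigma, cost

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((50, 3))                     # e.g. 50 RGB colors in [0, 1]^3
    Y = rng.random((50, 3))
    sigma, w2 = discrete_ot_assignment(X, Y)
    print(sigma[:5], w2)
```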
3. Relaxed and Regularized Transport. In the previous section, we intro-
duced the Monge-Kantorovich formulation for the computation of the OT between two
distributions, as the minimization of the energy (2.3). In this section, we modify this
energy in order to obtain a regular OT mapping, which is important for applications
such as color transfer.
3.1. Relaxed Transport. Section 4 tackles the color transfer problem, where,
as in many applications in imaging, strict mass conservation should be avoided. As a
consequence, it is not desirable to impose a one-to-one mapping between the points
in X and Y .
The relaxation we propose allows each point of X to be transported to multiple
points of Y and vice versa. This corresponds to imposing the constraints

\[ k_X \mathbb{I} \leq \Sigma \mathbb{I} \leq K_X \mathbb{I} \quad \text{and} \quad k_Y \mathbb{I} \leq \Sigma^* \mathbb{I} \leq K_Y \mathbb{I} \]

on the matrix Σ, where κ = (kX , KX , kY , KY ) ∈ (R+ )4 are the parameters of the


method. To impose the total amount of mass M transported between the densities,
we further impose the constraint I∗ ΣI = M , where M > 0 is a parameter. The initial
OT problem (2.3) now becomes:

\[ \min_{\Sigma \in \mathcal{S}_\kappa} \; \langle C_{X,Y}, \Sigma \rangle \]   (3.1)

where

\[ \mathcal{S}_\kappa = \Big\{ \Sigma \in [0, 1]^{N \times N} \;\backslash\; k_X \mathbb{I} \leq \Sigma \mathbb{I} \leq K_X \mathbb{I},\ k_Y \mathbb{I} \leq \Sigma^* \mathbb{I} \leq K_Y \mathbb{I},\ \mathbb{I}^* \Sigma \mathbb{I} = M \Big\}. \]

To ensure that Sκ is non empty, we impose that

\[ \max(k_X, k_Y) \;\leq\; \frac{M}{N} \;\leq\; \min(K_X, K_Y). \]
For the application to the color manipulations considered in this paper, we set once
and for all this parameter to M = N .
Note that if min(KX , KY ) > N , there is no restriction on the number of connections of each element of X or Y . The optimal solution then increases (always under the constraint I∗ ΣI = M ) the weight given to the connections between the closest points of X and Y ; that is to say, the minima of (CX,Y )i,j are assigned the maximum possible weight, see Fig. 3.1 for an example.
Problem (3.1) is a convex linear program, which can be solved using standard
linear programming algorithms.
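The following sketch illustrates one possible way to set up (3.1) with SciPy; it is a dense toy implementation (the N² variables and the constraint matrices are stored explicitly), not the implementation used in the experiments, and it assumes the feasibility condition max(kX , kY ) ≤ M/N ≤ min(KX , KY ) holds.

```python
import numpy as np
from scipy.optimize import linprog

def relaxed_ot(C, kX, KX, kY, KY, M):
    """Solve the relaxed transport (3.1): min <C, Sigma> over S_kappa.

    C is the N x N cost matrix; the feasibility condition
    max(kX, kY) <= M / N <= min(KX, KY) is assumed to hold."""
    N = C.shape[0]
    # Row-sum and column-sum operators acting on the row-major flattening of Sigma.
    R = np.kron(np.eye(N), np.ones((1, N)))     # (R sigma)_i = sum_j Sigma_{i,j}
    S = np.kron(np.ones((1, N)), np.eye(N))     # (S sigma)_j = sum_i Sigma_{i,j}
    A_ub = np.vstack([R, -R, S, -S])            # kX <= Sigma I <= KX, kY <= Sigma* I <= KY
    b_ub = np.concatenate([KX * np.ones(N), -kX * np.ones(N),
                           KY * np.ones(N), -kY * np.ones(N)])
    A_eq = np.ones((1, N * N))                  # total transported mass: I* Sigma I = M
    b_eq = np.array([float(M)])
    res = linprog(C.ravel(), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0.0, 1.0), method="highs")
    return res.x.reshape(N, N)
```

For κ = (1, 1, 1, 1) and M = N this reduces to the Kantorovich problem (2.3); for large N a sparse formulation or a dedicated solver is preferable.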
Relaxed OT map. Optimal matrices Σ minimizing (3.1) are in general non binary
and furthermore their non zero entries do not define one-to-one maps between the
points of X and Y . It is however possible to define a map T from X to Y by mapping
each point Xi to a weighted barycenter of its neighbors in Y as defined by Σ. This
corresponds to defining
\[ T(X_i) = \frac{\sum_{j=1}^{N} \Sigma_{i,j}\, Y_j}{\sum_{j=1}^{N} \Sigma_{i,j}}, \]

which in vectorial form can be expressed as T (Xi ) = Zi , where Z = (diag(ΣI))−1 ΣY ,


and where the operator diag(v) creates a diagonal matrix in RN ×N with the vector
v ∈ RN on the diagonal. To ensure that the map is well defined, we impose that kX > 0. Note that it is possible to define a map from Y to X by replacing Σ by Σ∗ in the previous formula and exchanging the roles of X and Y .
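A direct transcription of this barycentric map, assuming Σ has already been computed and kX > 0, could read:

```python
import numpy as np

def relaxed_ot_map(Sigma, Y):
    """Barycentric map T(X_i) = (sum_j Sigma_{i,j} Y_j) / (sum_j Sigma_{i,j}),
    i.e. Z = diag(Sigma I)^{-1} Sigma Y in matrix form (requires k_X > 0)."""
    row_mass = Sigma.sum(axis=1, keepdims=True)
    return (Sigma @ Y) / row_mass
```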
The following proposition shows that an optimal Σ is binary when the parameters
κ are integers. Such a binary Σ can be interpreted as a set of pairwise assignments
between the points in X and Y . Note that this is not true in general when the
parameters κ are not integers.
Proposition 1. For (kX , KX , kY , KY , M ) ∈ (N∗ )5 , there exists a solution Σ̃ of (3.1) which is binary, i.e. Σ̃ ∈ {0, 1}N ×N .
Proof. One can write Sκ = {Σ ∈ RN ×N \ A(Σ) ≤ bκ } where A is the linear mapping A(Σ) = (−Σ, ΣI, −ΣI, Σ∗ I, −Σ∗ I, I∗ ΣI, −I∗ ΣI), where Σ∗ I, ΣI ∈ RN and I∗ ΣI ∈ R, and bκ = (0N,N , KX I, −kX I, KY I, −kY I, M, −M ). A standard result shows that
A is a totally unimodular matrix [37]. For any (kX , KX , kY , KY , M ) ∈ (N∗ )5 , the
vector bκ has integer coefficients, and thus the polytope Sκ has integer vertices. Since
there is always a solution of the linear program (3.1) which is a vertex of Sκ , it has
coefficients in {0, 1}.
Numerical Illustrations. In Fig. 3.1, we show a simple example to illustrate the
properties of the method proposed so far. Given a set of points X (in blue) and Y
(in red), we compute the optimal Σ solving (3.1) for different values of κ. For each value of κ, we draw a line between Xi and Yj if the value of the associated optimal Σi,j > 0.1: solid if Σi,j = 1, and dashed otherwise.
As follows from Proposition 1, for non-integer values of KX , KY the entries Σi,j may lie in [0, 1], while for integer values Σi,j ∈ {0, 1}. Note that as we increase the values of KX , KY (Fig. 3.1, right), the points in X tend to be mapped to the closest points in Y .
3.2. Discrete Regularized Transport. So far, we have introduced a trans-
port problem where the mass conservation constraint is relaxed. The second step
is to define its regularization. A classic way of imposing regularity on a mapping
V : Rd → Rd is by measuring the amplitude of its derivatives. Two examples for
κ = (1, 1, 1, 1) κ = (1, 1, 0, 2) κ = (1, 1, 0.1, 10)

κ = (1, 1, 0.1, 1.5) κ = (0, 2, 1, 1) κ = (0.1, 10, 0.1, 10)

Figure 3.1. Relaxed transport computed between X (blue dots) and Y (red dots) for different
values of κ. Note that κ = (1, 1, 1, 1) corresponds to classical OT. A dashed line between Xi and Yj
indicates that Σi,j is not an integer.

continuous functions are the quadratic Tikhonov regularizations such as the Sobolev
semi-norm k∇V k2 , and the anisotropic total variation semi-norm k∇V k1 regulariza-
tion. Nevertheless, the differential operator ∇ cannot be applied directly to our point
clouds due to the lack of neighborhood definition. To extend the definition of the
gradient operator, we need to impose graph structures on the point clouds.
In our setting, we want to regularize the discrete map T defined in (3.1), which
is only defined at the location of the points as Xi 7→ Ṽi = Xi − diag(ΣI)−1 (ΣY )i . To
avoid the normalization diag(ΣI) (which typically leads to non-convex optimization
problems), and further regularize the variation of the weights ΣI ∈ RN , we impose a
regularity on the map Xi 7→ Vi = diag(ΣI)Xi − (ΣY )i .
Gradient on Graphs. A natural way to define a gradient on a point cloud X is by
using the gradient on a weighted graph GX = (X, EX , WX ) where EX ⊂ {1, . . . , N }2
is the set of edges and WX is the set of weights, WX = (wi,j )N i,j=1 : {1, . . . , N }2 → R+ , satisfying wi,j = 0 if (i, j) ∉ EX . The edges of this graph are defined depending on the
application. A typical example is the n-nearest neighbor graph, where every vertex
Xi is connected to Xj if Xj is one of the n-closest points to Xi in X, creating the
edge (i, j) ∈ EX , with a weight wi,j . Because the edges are directed, the adjacency
matrix is not symmetric.
The gradient operator on GX is defined as GX : RN ×d → RP ×d , where P = |EX | is the number of edges and where, for each V = (Vi )N i=1 ∈ RN ×d with Vi ∈ Rd ,

\[ G_X V = \big( w_{i,j} (V_i - V_j) \big)_{(i,j) \in E_X} \in \mathbb{R}^{P \times d}. \]

A classic choice for the weights to ensure consistency with the directional derivative
is wi,j = ||Xi − Xj ||−1 , see for instance [17].
Regularity Term. The regularity of a transport map V ∈ RN ×d is then measured
according to some norm of GX V , that we choose here for simplicity to be the following
\[ J_{p,q}(G_X V) = \sum_{(i,j) \in E_X} \big\| w_{i,j} (V_i - V_j) \big\|_q^{\,p}, \]

where ‖ · ‖q is the ℓq norm in Rd .


The case (p, q) = (1, 1) is the graph anisotropic total variation, (p, q) = (2, 2) is
the graph Sobolev semi-norm, and (p, q) = (1, 2) is the graph isotropic total variation,
see for instance [13] for applications of these functionals to imaging problems such as
image segmentation and regularization.
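As a concrete illustration (a sketch only, assuming the n-nearest-neighbor construction and the weights wi,j = ||Xi − Xj ||−1 described above), the graph gradient GX can be assembled as a sparse P × N matrix and Jp,q evaluated row-wise:

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.spatial import cKDTree

def nn_graph_gradient(X, n=4):
    """Sparse gradient operator G_X (of size P x N, P = nN) of the n-nearest
    neighbor graph of X, with weights w_{i,j} = ||X_i - X_j||^{-1}.
    Assumes the points of X are pairwise distinct."""
    N = X.shape[0]
    _, idx = cKDTree(X).query(X, k=n + 1)       # first neighbor of X_i is X_i itself
    rows, cols, vals = [], [], []
    e = 0
    for i in range(N):
        for j in idx[i, 1:]:
            w = 1.0 / np.linalg.norm(X[i] - X[j])
            rows += [e, e]; cols += [i, int(j)]; vals += [w, -w]
            e += 1
    return coo_matrix((vals, (rows, cols)), shape=(n * N, N)).tocsr()

def J(GV, p, q):
    """Regularity term J_{p,q}(G_X V) = sum over edges of ||w_{i,j}(V_i - V_j)||_q^p."""
    return float((np.linalg.norm(GV, ord=q, axis=1) ** p).sum())

# Usage sketch: G = nn_graph_gradient(X, n=4); value = J(G @ V, p=2, q=2)
```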
3.3. Symmetric Regular OT Formulation. Given two point clouds X and
Y, our goal is to compute a relaxed OT mapping between them which is regular
with respect to both point clouds. To simplify notation, we conveniently re-write the
displacement fields we aim to regularize as:
∆X,Y (Σ) = diag(ΣI)X − ΣY and ∆Y,X (Σ∗ ) = diag(Σ∗ I)Y − Σ∗ X.
Our goal is to obtain a partial matching that is regular according to X and Y ,
so we create two graphs GX and GY as described in Section 3.2 and we denote the
corresponding gradient operators GX ∈ RPX ×N and GY ∈ RPY ×N where PX and PY
are the number of edges in the respective graphs. The symmetric regularized discrete
OT energy is defined as:
\[ \min_{\Sigma \in \mathcal{S}_\kappa} \; \langle \Sigma, C_{X,Y} \rangle + \lambda_X J_{p,q}\big(G_X \Delta_{X,Y}(\Sigma)\big) + \lambda_Y J_{p,q}\big(G_Y \Delta_{Y,X}(\Sigma^*)\big), \]   (3.2)

where (λX , λY ) ∈ (R+ )2 controls the desired amount of regularity. The case κ = (1, 1, 1, 1) and (λX , λY ) = (0, 0) corresponds to the usual OT defined in (2.3), while setting only (λX , λY ) = (0, 0) corresponds to the un-regularized relaxed formulation (3.1).
3.4. Algorithms. Specific values of the parameters p and q lead to different
regularization terms, which in turn necessitate different optimization methods. In the
following, for the sake of concreteness, we concentrate on the specific cases (p, q) =
(2, 2) and (p, q) = (1, 1).
Sobolev regularization. Defining q = p = 2 fixes the regularization term as a
graph-based Sobolev regularization. In this specific case, the minimization (3.2) be-
comes a quadratic programming problem
\[ \min_{\Sigma \in \mathcal{S}_\kappa} \; f(\Sigma) = \langle C_{X,Y}, \Sigma \rangle + \frac{\lambda_X}{2} \| \Gamma_{X,Y}(\Sigma) \|^2 + \frac{\lambda_Y}{2} \| \Gamma_{Y,X}(\Sigma) \|^2, \]   (3.3)
where ΓX,Y (Σ) = GX ∆X,Y (Σ) and ΓY,X (Σ) = GY ∆Y,X (Σ∗ ). The Frank-Wolfe al-
gorithm is well tailored to solve such problems, as noticed for instance in [44], given
that f is convex and differentiable, and Sκ is a convex set. The Frank-Wolfe method
(also known as conditional gradient) iterates the following steps until convergence
\[ \tilde\Sigma^{(\ell+1)} \in \operatorname*{argmin}_{\tilde\Sigma \in \mathcal{S}_\kappa} \langle \nabla f(\Sigma^{(\ell)}), \tilde\Sigma \rangle, \qquad \Sigma^{(\ell+1)} = \Sigma^{(\ell)} + \tau_\ell \big( \tilde\Sigma^{(\ell+1)} - \Sigma^{(\ell)} \big), \]   (3.4)
where τ` is obtained by line-search. The first equation of (3.4) is a linear program
which is efficiently solved using interior point methods [28]. In our case, one has
∇f (Σ) = CX,Y + λX ∆∗X,Y (G∗X ΓX,Y (Σ)) + λY ∆∗Y,X (G∗Y ΓY,X (Σ)),
where

∆∗X,Y (U ) = diag∗ (U X ∗ )I∗ − U Y ∗ and ∆∗Y,X (U ) = (diag∗ (U Y ∗ )I∗ )∗ − XU ∗ ,

where diag∗ : RN ×N 7→ RN is the adjoint of the diag operator, and given A ∈ RN ×N ,


diag∗ (A) is a vector composed of the elements on the diagonal of A.
The line search optimal step can be explicitly computed as

\[ \tau_\ell = \frac{ -\langle E^{(\ell)}, C_{X,Y} \rangle - \langle \Gamma_{X,Y}(E^{(\ell)}), \Gamma_{X,Y}(\Sigma^{(\ell)}) \rangle - \langle \Gamma_{Y,X}(E^{(\ell)}), \Gamma_{Y,X}(\Sigma^{(\ell)}) \rangle }{ \lambda_X \| \Gamma_{X,Y}(E^{(\ell)}) \|^2 + \lambda_Y \| \Gamma_{Y,X}(E^{(\ell)}) \|^2 }, \]

where E (`) = Σ(`+1) − Σ̃(`+1) .
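A generic sketch of iteration (3.4) is given below; the three callables grad_f, lp_oracle and step are placeholders to be supplied by the caller (the oracle can, for instance, reuse the linear-programming sketch given after (3.1) with the cost matrix replaced by ∇f (Σ(`) )).

```python
import numpy as np

def frank_wolfe(grad_f, lp_oracle, step, Sigma0, iters=200):
    """Conditional-gradient sketch of iteration (3.4).

    grad_f(Sigma)  : gradient of the smooth energy (3.3) at Sigma.
    lp_oracle(G)   : a minimizer of <G, S> over S in S_kappa (a linear program).
    step(Sigma, S) : line-search step, e.g. the closed form above, clipped to [0, 1].
    All three are problem-specific callables supplied by the caller."""
    Sigma = Sigma0
    for _ in range(iters):
        S = lp_oracle(grad_f(Sigma))                 # linear minimization step
        tau = float(np.clip(step(Sigma, S), 0.0, 1.0))
        Sigma = Sigma + tau * (S - Sigma)            # convex combination stays in S_kappa
    return Sigma
```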


Anisotropic TV Regularization. We define an anisotropic total variation (TV)
norm by setting the parameters q = p = 1. Problem (3.2) can be re-written as a
linear program by introducing the auxiliary variables UX ∈ RPX ×d and UY ∈ RPY ×d :

\[ \min_{\Sigma, U_X, U_Y} \; \langle C_{X,Y}, \Sigma \rangle + \lambda_X \langle U_X, \mathbb{I} \rangle + \lambda_Y \langle U_Y, \mathbb{I} \rangle \]   (3.5)

\[ \text{subject to} \quad \begin{cases} -U_X \leq G_X\big(\Sigma Y - \operatorname{diag}(\Sigma\mathbb{I}) X\big) \leq U_X, \\ -U_Y \leq G_Y\big(\Sigma^* X - \operatorname{diag}(\Sigma^*\mathbb{I}) Y\big) \leq U_Y, \\ \Sigma \in \mathcal{S}_\kappa. \end{cases} \]

Numerical Illustrations. In Fig. 3.2, we can observe, on a synthetic example, the


influence of the parameters κ and (λX , λY ), from equation (3.2).
For λX = λY = 0 one obtains the relaxed symmetric OT solution, where the
transport maps the points in X to the closest point on Y , and vice versa. As we
increase the values of λX and λY to 0.001, we can see how the regularization affects
the mapping. Let us analyze Jp,q (GX ∆X,Y (Σ)) = kGX diag(ΣI)X − GX ΣY k2 , for
instance. The term GX diag(ΣI)X is measuring the regularity of the weights diag(ΣI)
on X and the consequence is that for λX = λY = 0.001 there are plenty of connections
with low weight (there are few solid lines), while for λX = λY = 0 there are several
mappings with Σi,j = 1 (solid lines). So, the regularization promotes a spreading of
the matchings.
The minimum of Jp,q (GX ∆X,Y (Σ)) is reached when GX diag(ΣI)X = GX ΣY , that is, when the graph structure of X has the same shape as the graph structure of ΣY , as can be observed in the last column and row. For high values of λX = λY the matchings tend to link the clusters by their shape, that is, the big cluster of X with the big cluster of Y , and similarly for the small clusters (note that the links with higher value are between the small clusters).
4. Application to Color Transfer. This section shows how the relaxed and
regularized OT formulation can be applied to imaging problems, more specifically
to color transfer, and how the regularization and the relaxation improve the results
obtained by previous methods. The color transfer problem consists in modifying an
input image X 0 so that its colors match the colors of another input image Y 0 .
4.1. Color Images and Histograms. In the following, an image is stored as a vector X 0 ∈ RN0 ×d where d is the number of channels (here d = 3 since we handle color images, with R, G and B color channels) and where N0 = N1 N2 is the number of pixels (N1 and N2 being the horizontal and vertical dimensions). The color histogram of such
an image X 0 can be estimated using the empirical distribution µX 0 . The goal of color
λX = λY = 0 λX = λY = 0.001

λX = λY = 10 Graphs

Figure 3.2. Given two sets of points X (in blue) and Y (in red), we show the points Z =
diag(ΣI)−1 ΣY (in green), and the mappings Σi,j as line segments connecting Xi and Yj , which
are dashed if Σi,j ∈]0.1, 1[ and solid if Σi,j = 1. The results were obtained with the relaxed and
regularized OT formulation, setting the parameters to κ = (0.1, 8, 0.1, 8). Note the influence of a
change in λX and λY on the final result: with no regularization (λX = λY = 0) only few points
in the data set are matched. The introduction of regularization (λX = λY = 0.001) spreads the
connections among the clusters, while maintaining the cluster-to-cluster matching. For a high value
of λX = λY = 10, the regularization tends to match the clusters with similar shape with each other,
where the shape is defined by the graph structure. The graphs GX and GY are represented with the
nodes on blue and red respectively, and the edges as solid lines.

 
transfer algorithms is to compute a transformation T 0 such that (X̃ 0 )i = T 0 (Xi0 ),
where the new empirical distribution µX̃ 0 is close (or equal) to µY 0 . Figure 4.1 shows
an example where X 0 , Y 0 are the original input images, the second row displays the
2-D projection of the 3-D distributions of pixels µX 0 and µY 0 , and in the third column we show µX̃ 0 , which is the result of applying T 0 to X 0 , where T 0 is computed
using the method described below. The associated image X̃ 0 has the geometry of X 0
and the color palette (3-D histogram) of Y 0 .
4.2. Regularized OT Color Transfer. As exposed in Section 1.2, OT is now
routinely used to perform color palette modification, and in particular color transfer.
As we illustrate below in the numerical examples, relaxing the mass conservation
constraint is crucial in order to better match the modes (i.e. the dominant colors) of
each distribution. Regularizing the transport is also important to reduce colorization
artifacts.
To make the optimization problem (3.2) tractable for histograms obtained from
large scale images, we apply the method on a sub-sampled point cloud. That is to say,
[Figure 4.1 panels: images X 0 , Y 0 , X̃ 0 (top row) and their color distributions µX 0 , µY 0 , µX̃ 0 shown as 2-D projections on the R-G plane (bottom row).]

Figure 4.1. Example of the colorization problem. Given images X 0 and Y 0 with their corre-
sponding 3-D color distributions µX 0 and µY 0 (represented here using their 2-D projection on the
RG plane), the goal of colorization methods is to define an image X̃ 0 that has the geometry of X 0
and a histogram µX̃ 0 that is similar to µY 0 .

before computing the relaxed and regularized transport, we define two smaller point
clouds X and Y from X 0 and Y 0 . These clouds are created such that their respective
distributions µX and µY are close to the two original distributions µX 0 and µY 0 .
The mapping T between these small clouds is then extended by interpolation to the
original clouds. The complete algorithm for regularized OT color transfer between a
pair of images (X 0 , Y 0 ) is exposed in Algorithm 1. We now detail each step of the
method.

Algorithm 1: Regularized OT Color Transfer


Input: Images X 0 , Y 0 ∈ RN0 ×d , λX , λY ∈ R+ , and kX , KX , kY , KY ∈ R+ ,
where kX ≤ KX and kY ≤ KY .
Output: Image X̃ 0 ∈ RN0 ×d .
1. Histogram down-sample. Compute X, Y from X 0 , Y 0 respectively
using K-means clustering.
2. Compute Mapping. Compute the optimal Σ such that
T (X) = diag(ΣI)−1 ΣY by solving eq. (3.2), either with algorithm (3.4) or
by solving the linear program (3.5) with an interior point algorithm.
3. Obtain high resolution result. Compute X̃ 0 with eq. (4.1).

Pixels down-sampling. We construct a smaller data set X ∈ RN ×d by clustering


the set X 0 into N clusters with the K-means algorithm (see [23]). Each cluster
corresponds to a point Xi in our smaller data set X. The same procedure is done for
Y 0 to obtain Y ∈ RN ×d .
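A possible down-sampling step (a sketch only; the value N = 400 used in the experiments reported below is indicative) is:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def downsample_palette(X0, N=400):
    """Step 1 of Algorithm 1 (sketch): cluster the N0 x 3 pixel cloud X0 into
    N color centroids with K-means; the labels can be reused for up-sampling."""
    X, labels = kmeans2(X0.astype(float), N, minit="++")
    return X, labels
```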
Graph and (GX , GY ) operator. As exposed in Section 3.2, the regularization is
defined using gradient operators (GX , GY ) on graphs (GX , GY ) connecting the points
in X and Y . Inspired by several recent works on manifold learning (see Section 1.3),
we use here a n-nearest neighbor graph, where n is the number of edges adjacent to
each vertex, i.e. | {j \ (i, j) ∈ EX } | = n where EX is the set of edges of X. The
weights of the graphs are defined as wi,j = ||Xi − Xj ||−1 (same applies for Y ), which
is consistent with the computation of the directional derivatives. An example of this
graph can be observed in Figure 4.2. Note that this graph does not need to be fully
connected.
Transport map computation. The regularized transport map T between the sub-
sampled data (X, Y ) is computed as

\[ T(X_i) = \big( \operatorname{diag}(\Sigma \mathbb{I})^{-1} \Sigma Y \big)_i \quad \text{for } i = 1, \dots, N, \]
where Σ is a solution of (3.2).


Transport map up-sampling. The transport map T is extended to the whole space
using a nearest neighbor interpolation

\[ \forall\, x \in \mathbb{R}^d, \quad T^0(x) = T(X_{i(x)}) + x - X_{i(x)}, \quad \text{where} \quad i(x) = \operatorname*{argmin}_{1 \leq i \leq N} \| x - X_i \|. \]   (4.1)

Note that this interpolation scheme contains an additive term x − Xi(x) . This corre-
sponds to adding back the quantization error (due to the K-means sub-sampling) to
the nearest neighbors interpolation, which helps to restore small scale textural details,
and improves the visual quality of the result. This transport can now be applied to
the input image X 0 to obtain the new pixel values (X̃ 0 )i = T 0 (Xi0 ).
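A sketch of this up-sampling step, assuming the centroids X and their images T (Xi ) are available as arrays, is:

```python
import numpy as np
from scipy.spatial import cKDTree

def upsample_map(X0, X, TX):
    """Nearest-neighbor up-sampling (4.1) of the transport map.

    X0 : original N0 x d pixel cloud, X : N x d K-means centroids,
    TX : N x d images T(X_i) of the centroids under the transport map."""
    i = cKDTree(X).query(X0)[1]        # i(x) = argmin_i ||x - X_i||
    return TX[i] + (X0 - X[i])         # T'(x) = T(X_{i(x)}) + x - X_{i(x)}
```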

(a) (b)

Figure 4.2. (a) Flower image; (b) its empirical distribution projected on the Red-Blue plane. The line segments represent the edges EX of the n-nearest neighbor graph computed with n = 4.

4.3. Results. Figure 4.3 shows an example of color transfer between two syn-
thetic images X 0 and Y 0 shown in Figure 4.3 (a). We apply Algorithm 1 to obtain
the image X̃ 0 with a color palette close to Y 0 , but with the geometry of the original
X 0 . We now study the influence of the parameters λX and λY . Figure 3.2 shows a
2-D projection in the Red-Green plane of X and Y , displayed using respectively red
X0
Y0

(a) (b) (c) (d)

Figure 4.3. Effect of changing the parameters λX and λY of the relaxed and regularized OT
formulation presented in section 3.2, using parameters κ = (0.1, 8, 0.1, 8). (a) original input images,
(b) relaxed OT, λX = λY = 0; (c) λX = λY = 0.001; (d) λX = λY = 10. Each of these mappings
can be observed in Figure 3.2.

and blue, and X̃ in green. As already pointed out in Section 3.4, a low value of λX
and λY (zero for the first column) tends to match the points in X to the closest point
in Y. This behavior can be observed in the map of the column (b). Many points in
the big cluster of X are mapped to very few points in the small cluster of Y , which
corresponds in the images to mapping many red values of X 0 to very few brown values
in Y 0 . The consequence is that the color resolution of X̃ is reduced: the brown area of Figure 4.3 (b) is flat, unlike the original brown values in Y . As we increase the value of λX and λY , the mapping spreads within the small cluster of Y in Figure 3.2 (b) and we gain color resolution, as can be observed in Figure 4.3 (c). On the other hand, if we increase the values of λX and λY too much, many points in X get matched to the big cluster in Y in Figure 3.2 (c), which leads to a single dominant color in the final image X̃ 0 , in Figure 4.3 (d).
Comparison with the state of the art. Figure 4.4 shows some results on natural images and compares them with the methods of Pitié et al. [31] and Papadakis et al. [29]. The goal of the experiment is to transfer the color palette of the images in the second row to the image on the first row. Note that the methods from the state of the art introduce color artifacts (in the first column there is violet outside the flower, and in the second column the wheat is bluish), which can be avoided with the proposed method by an appropriate choice of λX , λY and κ. These results were
obtained setting N = 400 and constructing the graph as a 4-nearest neighbor graph.
By column, the values of λX = λY are 9 × 10−4 , 5 × 10−4 , and 10−3 , and κ was set
to (0.1,1.1,0.1,1.1), (0.1,1.3,0.1,1.3), and (0.1,1,0.1,1), respectively.

5. Regularized OT Barycenters. As presented in the introduction, Section 1.2,


for certain applications in imaging such as texture mixing or color normalization, it
may be useful to compute the barycenter distribution of a set of input distributions.
Until now we have focused on the computation of the mapping between two given distributions; now we are interested in finding a new distribution in-between two or more distributions.
Asymmetric regularized OT metric. To simplify the optimization process, we consider the asymmetric version of the regularized OT energy (3.2). We keep one data set, say X, as a reference by taking into account all its points (kX = KX = 1) and only perform regularization with respect to its own graph, i.e. λY = 0. Thus, we
Original X 0
Original Y 0
Pitié et al. [31]
Papadakis et al. [29]
Proposed method

(a) (b) (c)

Figure 4.4. Comparison between the results obtained with our method and with the methods
of [31] and [29] for image colorization. Note how the proposed method is able to generate results
without color artifacts for example, in (a) the violet color of the flower is not spread outside the
flower, in (b) the wheat does not become bluish and in (c) the result does not enhance or colorize
differently the flat areas of the background.

simplify our expression into the following asymmetric distance:

\[ D(\mu_X, \mu_Y) = \min_{\Sigma \in \mathcal{D}_k} \; E(\Sigma) = \langle C_{X,Y}, \Sigma \rangle + \lambda J_{p,q}\big(G_X \Delta_{X,Y}(\Sigma)\big) \]   (5.1)

where

\[ \mathcal{D}_k = \big\{ \Sigma \in [0, 1]^{N \times N} \;\backslash\; \Sigma \mathbb{I} = \mathbb{I},\ \Sigma^* \mathbb{I} \leq k \mathbb{I} \big\}. \]

Note that Sκ = Dk for κ = (1, 1, 0, k). In general, D is not a distance, since it is not symmetric and one can have D(µX , µY ) = 0 while µX ≠ µY (which is crucial to allow the relaxation of the mass conservation condition).
Barycenter. Given a set of input clouds (X [r] )r∈R indexed by R and weights
(ρr )r∈R ∈ (R+ )R , we define a barycenter cloud X as a local minimizer of
\[ \min_{X \in \mathbb{R}^{N \times d}} \; E_\rho(X) = \sum_{r \in R} \rho_r \, D(\mu_{X^{[r]}}, \mu_X). \]   (5.2)

In the case λ = 0 and k = 1, one recovers barycenters over the Wasserstein space, see
the introduction for more details.
5.1. Block-coordinate Descent. The minimization of (5.2) can be performed
by doing a joint minimization on both the barycenter cloud X and a set of matrices
Σ[r] ∈ Dk
\[ \min_{X, (\Sigma^{[r]})_{r \in R}} \; \sum_{r \in R} \rho_r \Big( \langle C_{X^{[r]},X}, \Sigma^{[r]} \rangle + \lambda J_{p,q}\big(G_{X^{[r]}}(X^{[r]} - \Sigma^{[r]} X)\big) \Big). \]   (5.3)

This is a non-convex optimization problem. Fortunately, it is separately convex with


respect to each of its variables X and (Σ[r] )r∈R , so one can use the block coordinate
descent scheme. The block coordinate descent method consists in optimizing a given
energy by iteratively minimizing with respect to each of its variables, in our case X
and (Σ[r] )r∈R .
Update Σ[r] . This corresponds to performing in parallel |R| independent relaxed
regularized OT. Fixing X, one solves independently for each Σ[r] the convex problem

\[ \min_{\Sigma^{[r]} \in \mathcal{D}_k} \; \langle C_{X^{[r]},X}, \Sigma^{[r]} \rangle + \lambda J_{p,q}\big(G_{X^{[r]}}(X^{[r]} - \Sigma^{[r]} X)\big). \]   (5.4)

For (p, q) = (2, 2) or (p, q) = (1, 1) this minimization can be solved using the algo-
rithms detailed in Section 3.4.
Update X. Then, one solves for X the following convex optimization problem
\[ \min_{X \in \mathbb{R}^{N \times d}} \; H(X) = \sum_{r \in R} \rho_r \Big( \langle C_{X^{[r]},X}, \Sigma^{[r]} \rangle + \lambda J_{p,q}\big(G_{X^{[r]}}(X^{[r]} - \Sigma^{[r]} X)\big) \Big). \]   (5.5)

Update X: Sobolev regularization. The minimization of (5.5) when p = q = 2 is


an unconstrained quadratic problem, whose solution is obtained by solving the following
symmetric linear system
\[ \sum_{r \in R} \rho_r \Big( \Sigma^{[r]} - \lambda \Sigma^{[r]*} G_{X^{[r]}}^* G_{X^{[r]}} \Big) X = \sum_{r \in R} \rho_r \Big( \Sigma^{[r]*} X^{[r]} - \lambda \Sigma^{[r]*} G_{X^{[r]}}^* G_{X^{[r]}} X^{[r]} \Big), \]   (5.6)
which corresponds to solving ∇H(X) = 0. The solution to this symmetric linear
system can be computed using for instance the conjugate gradient algorithm.
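The following sketch shows only this generic step: it assumes the caller assembles the left-hand-side operator of (5.6) (as a routine apply_A) and the right-hand side B from (Σ[r] , GX [r] , ρr , λ), and runs one conjugate gradient solve per coordinate; it is not tied to a particular assembly of the system.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def solve_barycenter_update(apply_A, B):
    """X-update for (p, q) = (2, 2): solve the symmetric system A X = B of (5.6),
    one color channel at a time, with conjugate gradient.

    apply_A(v) applies the (symmetric, assumed positive definite) left-hand-side
    operator of (5.6) to a vector v in R^N; B is the N x d right-hand side."""
    N, d = B.shape
    A = LinearOperator((N, N), matvec=apply_A)
    X = np.zeros_like(B, dtype=float)
    for c in range(d):                            # one CG solve per coordinate
        X[:, c], info = cg(A, B[:, c], atol=1e-8)
        if info != 0:
            raise RuntimeError("conjugate gradient did not converge")
    return X
```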
Update X: Anisotropic TV regularization. When (p, q) = (1, 1), (5.5) is a linear
program which can be solved using for instance interior point solvers [28]. An alter-
native option, which we detail here, is to use first order proximal splitting schemes, which are well suited to such highly structured problems. We propose here to use
the primal-dual splitting scheme developed in [8].
The problem (5.5) can be re-cast as a minimization of the form

\[ \min_{X \in \mathbb{R}^{N \times d}} \; F(K(X)) + H(X) \]   (5.7)

where

\[ K(X) = \{ B_r X \}_{r \in R}, \qquad K^*\big(\{U_r\}_{r \in R}\big) = \sum_{r \in R} B_r^* U_r, \]
\[ F\big(\{U_r\}_{r \in R}\big) = \lambda \sum_{r \in R} \rho_r \big\| G_{X^{[r]}} X^{[r]} - U_r \big\|_1, \qquad H(X) = \sum_{r \in R} \rho_r \langle C_{X^{[r]},X}, \Sigma^{[r]} \rangle,
\]

where Br = GX [r] Σ[r] . Let us now recall that the proximal operator of a function F
is defined as
\[ \operatorname{Prox}_{\gamma F}(X) = \operatorname*{argmin}_{\tilde X} \; \frac{1}{2} \| X - \tilde X \|^2 + \gamma F(\tilde X), \]

and being able to compute the proximal mapping of F is equivalent to being able
to compute the proximal mapping of the Legendre-Fenchel dual F ∗ of F , thanks to
Moreau’s identity

X = ProxγF ∗ (X) + γ ProxF/γ (X/γ).

Then, the primal-dual algorithm of [8] to minimize F ◦ K + H reads

\[ \begin{aligned} \Lambda^{k+1} &= \operatorname{Prox}_{\mu F^*}\big(\Lambda^k + \mu K(\tilde X^k)\big), \\ X^{k+1} &= \operatorname{Prox}_{\tau H}\big(X^k - \tau K^*(\Lambda^{k+1})\big), \\ \tilde X^{k+1} &= X^{k+1} + \theta\,(X^{k+1} - X^k), \end{aligned} \]   (5.8)

with θ ∈ (0, 1] and where


\[ \operatorname{Prox}_{\tau F}(U) = \Big( G_{X^{[r]}} \Sigma^{[r]} X^{[r]} + S_\tau\big( U^{[r]} - G_{X^{[r]}} \Sigma^{[r]} X^{[r]} \big) \Big)_{r \in R}, \]
\[ \operatorname{Prox}_{\tau H}(Y) = \Big( \mathrm{Id} + \tau \sum_{r \in R} \rho_r \Sigma^{[r]} \Big)^{-1} \Big( Y + \tau \sum_{r \in R} \rho_r \Sigma^{[r]*} X^{[r]} \Big), \]

where Sτ is the soft thresholding function, defined as


 
\[ \forall\, i = 1, \dots, N, \quad S_\tau(U)_i = \max\Big( 0,\; 1 - \frac{\tau}{\| U_i \|} \Big)\, U_i. \]
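A direct transcription of Sτ (row-wise, with a small guard against division by zero added for illustration) is:

```python
import numpy as np

def soft_threshold_rows(U, tau):
    """Row-wise soft thresholding S_tau used in the primal-dual scheme (5.8):
    S_tau(U)_i = max(0, 1 - tau / ||U_i||) U_i (set to 0 when U_i = 0)."""
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * U
```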

5.2. Algorithm. The algorithm starts from some initial point set X (0) , which is typically chosen equal to the X [r] whose weight ρr is maximal. It then constructs iterates (Σ[r],(`) )r and (X (`) )` by solving respectively (5.4) and (5.5). This is detailed in Algorithm 2.
Algorithm 2: Regularized and relaxed OT barycenter
Input: Point sets (X [r] )r∈R , weights (ρr )r∈R , initialization X (0) .
Output: Barycenter point set X (`) , computed for ` large enough.
1. Initialization. Set ` = 0.
2. Update of Σ[r] . For each r ∈ R, compute Σ[r],(`+1) by solving (5.4)
where X = X (`) is fixed, using the algorithms detailed in Section 3.4.
3. Update of X. Compute X (`+1) by solving (5.5) where Σ[r] = Σ[r],(`+1)
are fixed. If (p, q) = (2, 2) solve (5.6), if (p, q) = (1, 1), use the
algorithm (5.8).
4. Convergence. While not converged, set ` ← ` + 1 and go back to 2.
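A skeleton of this block coordinate descent, with the two inner solvers left as placeholders (they correspond to (5.4) and (5.5) and are not specified here), could look as follows; a fixed number of outer iterations replaces the convergence test of step 4 for simplicity.

```python
import numpy as np

def ot_barycenter(X_list, rho, solve_sigma, solve_X, X0, n_outer=20):
    """Block coordinate descent skeleton for the barycenter energy (5.3).

    solve_sigma(Xr, X)          : solves (5.4) for one input cloud X^[r], X fixed.
    solve_X(X_list, rho, Sigmas): solves (5.5) with the Sigma^[r] fixed, e.g. via
                                  the linear system (5.6) or the scheme (5.8).
    Both solvers are placeholders to be supplied by the caller."""
    X = np.array(X0, dtype=float, copy=True)
    for _ in range(n_outer):
        Sigmas = [solve_sigma(Xr, X) for Xr in X_list]   # step 2: |R| independent OT problems
        X = solve_X(X_list, rho, Sigmas)                 # step 3: barycenter update
    return X
```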

5.3. Convergence. The block coordinate descent methods are known to con-
verge for smooth and differentiable energies [40]. The following theorem ensures the
convergence of the proposed algorithm in the case of the Sobolev regularization. For
the anisotropic regularization, one cannot ensure the convergence to stationary points,
although in practice, we always observe it in our numerical tests.
Theorem 1. When (p, q) = (2, 2), the iterates X (`) of the algorithm are bounded
and hence admit converging sub-sequences. The energies Eρ (X (`) ) (with Eρ defined in (5.2)) are decreasing and converge to a value Ẽ. All converging sub-sequences converge to stationary points of Eρ having the same energy Ẽ.
Proof. By construction, the energy Eρ (X (`) ) is decreasing and positive, hence converging. The algorithm minimizes (5.3), which reads
 
\[ \min_{\Sigma^{[r]} \in \mathcal{D}_k,\, X} \; \bar E\big((\Sigma^{[r]})_r, X\big) = \sum_{r \in R} \rho_r \Big( \sum_{i,j} \| X_i^{[r]} - X_j \|^2 \, \Sigma^{[r]}_{i,j} + \lambda\, J_{2,2}\big( G_{X^{[r]}}(X^{[r]} - \Sigma^{[r]} X) \big) \Big). \]

Since Dk is a bounded set, the iterates (Σ[r],(`) )r produced by the algorithm are bounded and hence they admit converging sub-sequences.
For any iteration index `, one has
\[ \sum_{i,j,r} \rho_r \| X_i^{[r]} - X_j^{(\ell)} \|^2 \, \Sigma^{[r],(\ell)}_{i,j} \;\leq\; \bar E\big((\Sigma^{[r],(\ell)})_r, X^{(\ell)}\big) \;\leq\; E_\rho(X^{(0)}) \]   (5.9)

where (Σ[r],(`) )r are the matrices obtained at the previous iteration of the method. We let r be any index such that ρr > 0. For any j we denote $\gamma_i = \Sigma^{[r],(\ell)}_{i,j}$ (we drop the dependency on (j, r, `) for ease of notation), which satisfy $\sum_i \gamma_i = 1$, and define the barycenter

\[ \bar X_j = \sum_i \gamma_i X_i^{[r]}, \]

which is a point in the convex hull of the points of X [r] , and is hence bounded independently of j and `.
Equation (5.9) implies

\[ \sum_i \gamma_i \| X_i^{[r]} - X_j^{(\ell)} \|^2 \;\leq\; \frac{E_\rho(X^{(0)})}{\rho_r}. \]
By convexity of the function $x \in \mathbb{R}^d \mapsto \| X_j^{(\ell)} - x \|^2$, one has

\[ \| X_j^{(\ell)} - \bar X_j \|^2 \;\leq\; \sum_i \gamma_i \| X_j^{(\ell)} - X_i^{[r]} \|^2 \;\leq\; \frac{E_\rho(X^{(0)})}{\rho_r}. \]

This shows that the iterates X (`) of the algorithm are bounded, and hence admit
converging sub-sequences.
Given that the energy Ē is convex with respect to each of the variables (Σ[r] )r∈R and X (although not jointly convex), and that the non-convex term J2,2 (GX [r] (X [r] − Σ[r] X)) that mixes the variables is C 1 with Lipschitz gradient, one can apply Theorem 4.1 of [40], which shows that any converging sub-sequence converges to a stationary point of the energy.
6. Application to color normalization. Color normalization is the process
of imposing the same color palette on a group of images. This color palette is always
somehow related to the color palettes of the original images. For instance, if the
goal is to cancel the illumination of a scene (avoid color cast), then the imposed
histogram should be the histogram of the same scene illuminated with white light.
Of course, on many occasions this information is not available. Following Papadakis
et al. [29], we define an in-between histogram, which is chosen here as the regularized
OT barycenter.
6.1. Algorithm. Given a set of input images (X 0[r] )r∈R , the goal is to impose
on all the images the same histogram µX associated to the barycenter X. As for the
colorization problem tackled in Section 4, the first step is to subsample the original
cloud of points X 0[r] to make the problem tractable. Thus, for every X 0[r] we com-
pute a smaller associated point set X [r] using K-means clustering. Then, we obtain
the barycenter X of all the point clouds (X [r] )r∈R with the algorithms presented in
Section 5.1. Figure 6.1, first row, shows an example on two synthetic clouds of points,
X [1] in blue and X [2] in red. The cloud of points in green corresponds to the barycen-
ter X, which can change its position depending on the parameter ρ = (ρ1 , ρ2 ) in (5.2)
from X [1] for ρ = (1, 0) to X [2] when ρ = (0, 1). This data set X represents the 3-D
histogram we want to impose on all the input images.
Once we have X, we compute the regularized and relaxed OT transport maps
T [r] between each X [r] and the barycenter X, by solving (3.2). The line segments
in Figure 6.1 represent the transport between point clouds, i.e. if Σ[1]i,j > 0, Xi[1] is linked to Xj , and similarly for Σ[2] .
We apply T [r] to X [r] , obtaining X̃ [r] , for all r ∈ R, that is to say, we obtain a set
of point clouds X̃ [r] with a color distribution close to X. Finally, to recover a set of
high resolution images, we compute each X̃ 0[r] from X 0[r] by up-sampling. A detailed
description of the method is given in Algorithm 3.
6.2. Results. We now show some examples of color normalization using Algorithm 3.
Synthetic example. Figure 6.1 shows a comparison of normalization of two syn-
thetic images using classical OT and our proposed relaxed/regularized OT. The results
are obtained using Algorithm 3 (setting p = q = 2) on the set of two images (|R| = 2) already used in Figure 4.3 (a), denoting here X 0[1] = X 0 and X 0[2] = Y 0 . Each column shows the same experiment but with different values of ρ, which allows us to visualize the interpolation between the color palettes (the colors in the images evolve
from the colors in X [1] towards the colors of X [2] ).
Algorithm 3: Regularized OT Color Normalization
Input: Images (X 0[r] )r∈R ∈ RN0 ×d , λ ∈ R+ , ρ ∈ [0, 1]|R| and k ∈ R+ .
Output: Images (X̃ 0[r] )r∈R ∈ RN0 ×d .
1. Histogram down-sample. Compute X [r] from X 0[r] using K-means
clustering.
2. Compute barycenter. Compute with either (5.6) or (5.7) a barycenter
µX where X is a local minimum of (5.2) using the block coordinate descent
described in Section 5.1, see Algorithm 2.
3. Compute transport mappings. For all r ∈ R compute T [r] between
X [r] and the barycenter X by solving (3.2), such that T [r] (Xi[r] ) = Zi[r] ,
where Z [r] = diag(Σ[r] I)−1 Σ[r] X.
4. Transport up-sample. For every T [r] compute T̃ 0[r] following (4.1).
5. Obtain high resolution results. Compute ∀ r, X̃ 0[r] = T̃ 0[r] (X 0[r] ).

With classical OT, the structure of the original data sets is not preserved as we change ρ, and the consequence on the final images (second and third rows) is that the geometry of the original images changes in the barycenters. In contrast to classical OT, for all values of ρ the relaxed/regularized barycenters X have the same number of clusters as the original sets. Note that the consequence of having a transport that maintains the clusters of the original images is that the geometry is preserved, while the histograms change.
Example on natural images. Fig. 6.2 shows the results of the same experiment
as in Fig. 6.1, but on the natural images labeled as X 0[1] and X 0[2] in rows #1 and
#6. In this case, we only show the transport from X to X 0[1] , that is to say, we
maintain the geometry of X 0[1] (row #1) and match its histogram to the barycenter
distribution. As in the previous experiment, note how the colors change smoothly from ρ = (1, 0) to ρ = (0, 1) without generating artifacts and match the color and contrast of image X 0[2] for ρ = (0, 1). The change in contrast is especially visible for the (b) wheat image.
Color Normalization. Computing the barycenter distribution of the histograms
of a set of images is useful for color normalization. We show in Figures 6.3, and 6.4
the results obtained with Algorithm 3, and compare them with the standard OT and
the method proposed by Papadakis et al. [29]. The improvement of the relaxation
and regularization is especially noticeable in Figure 6.3, where OT creates artifacts such as coloring the leaves violet in Figure 6.3 (a), or introducing new colors on
the background in Figure 6.3 (c). In Figure 6.4, OT and Papadakis et al.’s method
introduce artifacts mostly on the sky of Figure 6.4 (a) and Figure 6.4 (b), while the
relaxed and regularized version displays a smoother result for Figure 6.4 (a) and (c)
and a more meaningful color transformation (all the clouds have the same color in
the fourth row) for Figure 6.4 (b).
As a final example, we would like to show in Figure 6.5 how this method can be
applied as a preprocessing before comparing/registering images of the same object
obtained under different illumination conditions.

Conclusion. In this paper, we have proposed a generalization of the discrete optimal transport that makes it possible to relax the mass conservation constraint and to regularize the transport map. We showed how this novel class of transports can be
ρ = (1, 0) ρ = (0.7, 0.3) ρ = (0.4, 0.6) ρ = (0, 1)

Figure 6.1. Comparison of classical OT (first three rows) and relaxed/regularized OT (last three rows). The original input images X 0,[1] and X 0,[2] are shown in Figure 4.3 (a). Rows #1 and #4 show the 2-D projections of X [1] (blue) and X [2] (red), and in green the barycenter distribution for different values of ρ. We display a line between Xi[r] and Xj if Σ[r]i,j > 0.1. Rows #2 and #5 (resp. #3 and #6) show the resulting normalized images X̃ 0[1] (resp. X̃ 0[2] ), for each value of ρ. First three rows: classical OT, corresponding to setting k = 1 and λ = 0. Last three rows: regularized and relaxed OT, with parameters k = 20 and λ = 0.0005. See main text for comments.

Original X 0[1]
ρ = (1, 0)
ρ = (0.7, 0.3)
ρ = (0.4, 0.6)
ρ = (0, 1)
Original X 0[2]

(a) (b) (c)

Figure 6.2. Results for the barycenter algorithm on different images computed with the method
proposed in Section 5.1. The parameters were set to (a) k = 1.1, λ = 0.0009, (b) k = 1.3, λ = 0.01,
and (c) k = 1, λ = 0.001. Note how as ρ approaches (0, 1), the histogram of the barycenter image
becomes similar to the histogram of X 0[2] .

(a) (b) (c)

Figure 6.3. In the first row, we show the original images. In the following rows, we show the
result of computing the barycenter histogram and imposing it on each of the original images, with
different algorithms. In the second row, we use OT. In the third row, the results were obtained with
the method proposed by Papadakis et al. [29]. On the last row, we show the results obtained with
the relaxed and regularized OT barycenter with k = 2, λ = 0.005. Note how the proposed algorithm
is the only one that does not produce artifacts on the final images such as (a) color artifacts on the
leaves and (c) different colors on the background.

applied to color transfer and that regularization is crucial to reduce noise amplification artifacts, while relaxation makes it possible to cope with mass variation of the modes of the color palettes. We have extended these ideas to compute the relaxed and regularized
barycenter of a set of input distributions. We illustrate the usefulness of this novel
barycenter to perform color palette normalization of a group of input images.
Acknowledgements. The authors would like to thank Julien Rabin for his advice on color transfer and for stimulating discussions.
(a) (b) (c)

Figure 6.4. Same experiment as in Figure 6.3, but setting for the final row k = 1.3, λ = 0.0005.
Note how the proposed method does not create artifacts on the sky and the clock for images (a) and (c), unlike OT and the method proposed by Papadakis et al. [29].

REFERENCES

[1] M. Agueh and G. Carlier, Barycenters in the wasserstein space, SIAM Journal on Mathe-
matical Analysis, 43 (2011), pp. 904–924.
[2] H. A. Almohamad and S. O. Duffuaa, A linear programming approach for the weighted graph
matching problem, IEEE Transactions on Pattern Analysis and Machine Intelligence, 15
(1993), pp. 522–525.
[3] R. Palma Amestoy, E. Provenzi, M. Bertalmío, and V. Caselles, A perceptually inspired
variational framework for color enhancement, IEEE Transactions on Pattern Analysis and
Machine Intelligence, 31 (2009), pp. 458–474.
[4] S. Belongie, J. Malik, and J. Puzicha, Shape matching and object recognition using shape
contexts, IEEE Transactions on Pattern Analysis and Machine Intelligence, 24 (2002),
pp. 509–522.
Figure 6.5. The proposed method can be applied as a preprocessing step in a pipeline for object detection or image registration, where canceling illumination is important. On the first row,
we show a set of pictures of the same object taken at different hours of the day or night, and on the
second row, the result of our algorithm setting (p, q) = (2, 2), k = 1 and λ = 0.0005. Note how the
algorithm is able to normalize the illumination conditions of all the images.

[5] J.-D. Benamou and Y. Brenier, A computational fluid mechanics solution of the Monge-
Kantorovich mass transfer problem, Numerische Mathematik, 84 (2000), pp. 375–393.
[6] D.P. Bertsekas, The auction algorithm: A distributed relaxation method for the assignment
problem, Annals of Operations Research, 14 (1988), pp. 105–123.
[7] N. Bonneel, M. van de Panne, S. Paris, and W. Heidrich, Displacement interpolation using
lagrangian mass transport, ACM Transactions on Graphics (Proceedings of SIGGRAPH
Asia 2011), 30 (2011).
[8] A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with
applications to imaging, Journal of Mathematical Imaging and Vision, 40 (2011), pp. 120–
145.
[9] L. Csink, D. Paulus, U. Ahlrichs, and B. Heigl, Color normalization and object localization,
in In Rehrmann, 1998, pp. 49–55.
[10] G. B. Dantzig, Linear Programming and Extensions, Princeton University Press, Princeton,
NJ, 1963.
[11] J. Delon, Midway image equalization, Journal of Mathematical Imaging and Vision, 21 (2004),
pp. 119–134.
[12] J. Delon, Movie and video scale-time equalization application to flicker reduction, IEEE Trans-
actions on Image Processing, 15 (2006), pp. 241–248.
[13] A. Elmoataz, O. Lezoray, and S. Bougleux, Nonlocal discrete regularization on weighted
graphs: A framework for image and manifold processing, IEEE Transactions on Image
Processing, 17 (2008), pp. 1047–1060.
[14] S. Ferradans, N. Papadakis, G. Peyré, and J-F. Aujol, Regularized discrete optimal trans-
port, in Scale Space and Variational Methods in Computer Vision, SSVM’13, 2013, pp. 428–
439.
[15] S. Ferradans, G-S. Xia, G. Peyré, and J-F. Aujol, Static and dynamic texture mixing using
optimal transport, in Scale Space and Variational Methods in Computer Vision, SSVM’13,
2013, pp. 137–148.
[16] W. Gangbo and A. Świȩch, Optimal maps for the multidimensional Monge-Kantorovich
problem, Communications on Pure and Applied Mathematics, 51 (1998), pp. 23–45.
[17] G. Gilboa and S. Osher, Nonlocal linear image regularization and supervised segmentation,
SIAM Multiscale Modeling and Simulation, 6 (2007), pp. 595–630.
[18] R. C. Gonzalez and R. E. Woods, Digital Image Processing, Addison-Wesley Longman Pub-
lishing Co., Inc., Boston, MA, USA, 2nd ed., 2001.
[19] H. W. Kuhn, The Hungarian method for the assignment problem, Naval Research Logistic
Quarter, 2 (1955), pp. 83–97.
[20] S. Haker, L. Zhu, A. Tannenbaum, and S. Angenent, Optimal mass transport for registra-
tion and warping, International Journal of Computer Vision, 60 (2004), pp. 225–240.
[21] L. Kantorovitch, On the translocation of masses, Management Science, 5 (1958), pp. 1–4.
[22] E. H. Land and J. J. McCann, Lightness and retinex theory, Journal of the Optical Society
of America, 61 (1971), pp. 1–11.
[23] S.P. Lloyd, Least squares quantization in PCM, IEEE Transactions on Information Theory,
IT-20, 28 (1982), pp. 129–137.
[24] J. Louet and F. Santambrogio, A sharp inequality for transport maps in W 1,p (R) via approximation, Applied Mathematics Letters, 25 (2012), pp. 648–653.
[25] R.J. McCann, A convexity principle for interacting gases, Advances in Mathematics, 128
(1997), pp. 153–179.
[26] A. J. McCollum and W. F. Clocksin, Multidimensional histogram equalization and modi-
fication, in International Conference on Image Analysis and Processing, ICIAP’07, 2007,
pp. 659–664.
[27] J. Morovic and P-L. Sun, Accurate 3d image colour histogram transformation, Pattern Recog-
nition Letters, 24 (2003), pp. 1725–1735.
[28] Y. E. Nesterov and A. S. Nemirovsky, Interior Point Polynomial Methods in Convex Pro-
gramming : Theory and Algorithms, SIAM Publishing, 1993.
[29] N. Papadakis, E. Provenzi, and V. Caselles, A variational model for histogram transfer of
color images, IEEE Transactions on Image Processing, 20 (2011), pp. 1682–1695.
[30] O. Pele and M. Werman, Fast and robust earth mover’s distances, in IEEE International
Conference on Computer Vision, ICCV’09, 2009, pp. 460–467.
[31] F. Pitié, A. C. Kokaram, and R. Dahyot, Automated colour grading using colour distribution
transfer, Computer Vision and Image Understanding, 107 (2007), pp. 123–137.
[32] J. Rabin and G. Peyré, Wasserstein regularization of imaging problem, in IEEE International
Conderence on Image Processing, ICIP’11, 2011, pp. 1541–1544.
[33] J. Rabin, G. Peyré, J. Delon, and M. Bernot, Wasserstein barycenter and its application
to texture mixing, in Scale Space and Variational Methods in Computer Vision, vol. 6667
of SSVM’11, 2011, pp. 435–446.
[34] E. Reinhard, M. Adhikhmin, B. Gooch, and P. Shirley, Color transfer between images,
IEEE transactions on Computer Graphics and Applications, 21 (2001), pp. 34 –41.
[35] Y. Rubner, C. Tomasi, and L.J. Guibas, A metric for distributions with applications to image
databases, in International Conference on Computer Vision, ICCV’98, 1998, pp. 59–66.
[36] C. Schellewald, S. Roth, and C. Schnörr, Evaluation of a convex relaxation to a quadratic
assignment matching approach for relational object views, Image and Vision Computing,
25 (2007), pp. 1301–1314.
[37] A. Schrijver, Theory of linear and integer programming, John Wiley & Sons, Inc., New York,
NY, USA, 1986.
[38] Y-W. Tai, J. Jia, and C-K. Tang, Local color transfer via probabilistic segmentation by
expectation-maximization, in IEEE Conference on Computer Vision and Pattern Recogni-
tion, vol. 1 of CVPR’05, 2005, pp. 747–754.
[39] J. B. Tenenbaum, V. de Silva, and J. C. Langford, A Global Geometric Framework for
Nonlinear Dimensionality Reduction, Science, 290 (2000), pp. 2319–2323.
[40] P. Tseng, Convergence of a block coordinate descent method for nondifferentiable minimiza-
tion, Journal of Optimization Theory and Applications, 109 (2001), pp. 475–494.
[41] C. Villani, Topics in Optimal Transportation, Graduate Studies in Mathematics Series, Amer-
ican Mathematical Society, 2003.
[42] C-M. Wang and Y-H. Huang, A novel color transfer algorithm for image sequences, Journal
of Information Science and Engineering, 20 (2004), pp. 1039–1056.
[43] X. Xiao and L. Ma, Color transfer in correlated color space, in ACM International Conference
on Virtual Reality Continuum and Its Applications, VRCIA’06, 2006, pp. 305–309.
[44] M. Zaslavskiy, F. Bach, and J.-P. Vert, A path following algorithm for the graph matching
problem, IEEE Transactions on Pattern Analysis and Machine Intelligence, 31 (2009),
pp. 2227–2242.
[45] M. Zaslavskiy, F. Bach, and J-P. Vert, Many-to-many graph matching: a continuous relax-
ation approach, in European Conference on Machine Learning and Practice of Knowledge
Discovery in Databases, ECML PKDD’10, 2010, pp. 515–530.
[46] Y. Zheng and D. Doermann, Robust point matching for nonrigid shapes by preserving local
neighborhood structures, IEEE Transactions on Pattern Analysis and Machine Intelligence,
28 (2006), pp. 643–649.
