On The Convergence of Projected Alternating Maximization For Equitable and Optimal Transport
Abstract
This paper studies the equitable and optimal transport (EOT) problem, which has many
applications such as fair division problems and optimal transport with multiple agents.
In the case of discrete distributions, the EOT problem can be formulated as a linear program
(LP). Since this LP is prohibitively large for general LP solvers, (Scetbon et al., 2021)
suggested perturbing the problem by adding an entropy regularization. They proposed
a projected alternating maximization algorithm (PAM) to solve the dual of the entropy-regularized
EOT. In this paper, we provide the first convergence analysis of PAM. A novel
rounding procedure is proposed to help construct the primal solution for the original EOT
problem. We also propose a variant of PAM that incorporates an extrapolation technique,
which numerically improves the performance of PAM. Results in this paper may shed
light on block coordinate (gradient) descent methods for general optimization problems.
Keywords: Equitable and Optimal Transport, Fairness, Saddle Point Problem, Projected
Alternating Maximization, Block Coordinate Descent, Acceleration, Rounding.
1. Introduction
Optimal transport (OT) is a classical problem that has recently found many emerging applications
in machine learning and artificial intelligence, including generative models (Arjovsky
et al., 2017), representation learning (Ozair et al., 2019), reinforcement learning (Bellemare
et al., 2017) and word embeddings (Alvarez-Melis et al., 2019). More recently, (Scetbon
et al., 2021) proposed the equitable and optimal transport (EOT) problem, which aims to
fairly distribute the workload of OT when there are multiple agents. In this formulation,
multiple agents work together to move mass from a measure µ to a measure ν, and
each agent has its own cost function. A key consideration here is fairness, which asks
for transportation plans under which the workloads of all agents are equal to each other.
This can be achieved by minimizing the largest transportation cost among all agents,
which leads to a convex-concave saddle point problem. The EOT problem has wide
applications in economics and machine learning, such as fair division or the cake-cutting
problem (Moulin, 2003; Brandt et al., 2016), multi-type resource allocation (Mackin and
Xia, 2015), minimal transportation time over the internet, and sequential optimal transport
(Scetbon et al., 2021).
We now describe the EOT problem formally. Given two discrete probability measures
$\mu_n = \sum_{i=1}^n a_i \delta_{x_i}$ and $\nu_n = \sum_{i=1}^n b_i \delta_{y_i}$, EOT studies the problem of transporting mass
from $\mu_n$ to $\nu_n$ by $N$ agents. Here, $\{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^d$ and $\{y_1, y_2, \ldots, y_n\} \subset \mathbb{R}^d$ are the
support points of the two measures, and $a = [a_1, a_2, \ldots, a_n]^\top \in \Delta^n_+$, $b = [b_1, b_2, \ldots, b_n]^\top \in \Delta^n_+$ are the
corresponding weights, where $\Delta^n_+$ denotes the probability simplex in $\mathbb{R}^n$.
Moreover, throughout this paper, we assume the vector $b > 0$. For each agent $k$, we denote
its unique cost function as $c_k(x, y)$, $k \in [N] = \{1, \ldots, N\}$, and its cost matrix as $C^k$, where
$C^k_{i,j} = c_k(x_i, y_j)$. Moreover, we define the following coupling decomposition set
$$\Pi^N_{a,b} := \left\{ \pi = (\pi^k)_{k \in [N]} \;\middle|\; r\Big(\sum_k \pi^k\Big) = a,\; c\Big(\sum_k \pi^k\Big) = b,\; \pi^k_{ij} \geq 0,\; \forall i, j \in [n] \right\},$$
where $r(\pi) = \pi \mathbf{1}$ and $c(\pi) = \pi^\top \mathbf{1}$ are the row sum and column sum of the matrix $\pi$, respectively.
Mathematically, the EOT problem can be formulated as
$$\min_{\pi \in \Pi^N_{a,b}} \max_{k \in [N]} \langle \pi^k, C^k \rangle. \qquad (1)$$
When $N = 1$, (1) reduces to the standard OT problem. Note that (1) minimizes the pointwise
maximum of a finite collection of functions. It is easy to see that (1) is equivalent to
the following constrained problem:
$$\min_{\pi \in \Pi^N_{a,b}} \max_{\lambda \in \Delta^N_+} \ell(\pi, \lambda) := \sum_{k=1}^N \lambda_k \langle \pi^k, C^k \rangle. \qquad (2)$$
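Since (1) is an LP once an epigraph variable $t$ is introduced for the pointwise maximum, a small instance can be solved directly with an off-the-shelf LP solver. The following sketch, assuming SciPy is available, is one way to set this up (for illustration only; it is not the algorithm studied in this paper):

```python
import numpy as np
from scipy.optimize import linprog

def solve_eot_lp(C, a, b):
    """Solve (1) as an LP: min t s.t. <pi^k, C^k> <= t for all k,
    r(sum_k pi^k) = a, c(sum_k pi^k) = b, pi >= 0.
    Variable vector: [vec(pi^1), ..., vec(pi^N), t]."""
    N, n, _ = C.shape
    nv = N * n * n + 1
    cost = np.zeros(nv); cost[-1] = 1.0            # minimize t
    # inequality constraints: <C^k, pi^k> - t <= 0
    A_ub = np.zeros((N, nv))
    for k in range(N):
        A_ub[k, k*n*n:(k+1)*n*n] = C[k].ravel()
        A_ub[k, -1] = -1.0
    b_ub = np.zeros(N)
    # equality constraints: aggregate row sums = a, column sums = b
    A_eq = np.zeros((2*n, nv))
    for k in range(N):
        for i in range(n):
            A_eq[i, k*n*n + i*n : k*n*n + (i+1)*n] = 1.0   # row i of pi^k
        for j in range(n):
            A_eq[n + j, k*n*n + j : (k+1)*n*n : n] = 1.0   # column j of pi^k
    b_eq = np.concatenate([a, b])
    bounds = [(0, None)] * (N * n * n) + [(None, None)]    # t is free
    res = linprog(cost, A_ub, b_ub, A_eq, b_eq, bounds=bounds)
    return res.x[:-1].reshape(N, n, n), res.x[-1]
```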
The following proposition shows an important property of EOT: at the optimum of the
minimax EOT formulation (2), the transportation costs of the agents are equal to each
other.
Proposition 1 (Scetbon et al., 2021, Proposition 1) Assume that all entries of all cost
matrices $C^k$, $k \in [N]$, have the same sign. Let $\pi^* \in \Pi^N_{a,b}$ be the optimal solution of (2). It
holds that
$$\langle (\pi^*)^i, C^i \rangle = \langle (\pi^*)^j, C^j \rangle, \quad \forall i, j \in [N]. \qquad (3)$$
Note that Proposition 1 requires all entries of all cost matrices to have the same sign. When
the cost matrices are all non-negative, (2) solves the transportation problem with multiple
agents. When the cost matrices are all non-positive, the cost matrices are interpreted as
the utility functions and (2) solves the fair division problem (Moulin, 2003).
The discrete OT is a linear programming (LP) problem (in fact, an assignment problem)
with a complexity of $O(n^3 \log n)$ (Tarjan, 1997). Due to this cubic dependence on the
• We provide the first convergence analysis of the PAM algorithm, and analyze its
iteration complexity for finding an $\epsilon$-optimal solution to the EOT problem (2). In
particular, we show that it takes at most $O(N n^2 \epsilon^{-2})$ arithmetic operations to find
an $\epsilon$-optimal solution to (2). This matches the rate of Sinkhorn's algorithm for
computing the Wasserstein distance (Dvurechensky et al., 2018).
where $\eta > 0$ is a regularization parameter, $p^k_\eta(\pi^k, \lambda) := \lambda_k \langle \pi^k, C^k \rangle - \eta H(\pi^k)$, and the
entropy function $H$ is defined as $H(\pi) = -\sum_{i,j} \pi_{i,j}(\log \pi_{i,j} - 1)$. The entropy regularization
was first introduced into the OT problem by (Cuturi, 2013) and is now widely used in the
OT community. By adding an entropy regularizer, the primal problem becomes strongly
convex, and the dual problem is unconstrained and suitable for alternating maximization.
This leads to Sinkhorn's algorithm, which has low per-iteration complexity and is thus
scalable. The PAM algorithm proposed by (Scetbon et al., 2021) uses the same idea
for the EOT problem. Note that (4) is a strongly-convex-concave minimax problem whose
constraint sets are convex and bounded, and thus Sion's minimax theorem (Sion, 1958)
guarantees that
$$\min_{\pi \in \Pi^N_{a,b}} \max_{\lambda \in \Delta^N_+} \ell_\eta(\pi, \lambda) = \max_{\lambda \in \Delta^N_+} \min_{\pi \in \Pi^N_{a,b}} \ell_\eta(\pi, \lambda). \qquad (5)$$
Now we consider the dual problem of $\min_{\pi \in \Pi^N_{a,b}} \ell_\eta(\pi, \lambda)$. First, we add a redundant
constraint $\sum_{k,i,j} \pi^k_{i,j} = 1$ and consider the dual of
$$\min_{\pi \in \Pi^N_{a,b},\ \sum_{k,i,j} \pi^k_{i,j} = 1} \ell_\eta(\pi, \lambda). \qquad (6)$$
The reason for adding this redundant constraint is to guarantee that the dual objective
function is Lipschitz smooth. It is easy to verify that the dual problem of (6) is given by
$$\max_{f, g}\ \min_{\substack{\pi \in (\mathbb{R}^{n \times n}_+)^N \\ \sum_{k,i,j} \pi^k_{i,j} = 1}} \ \sum_{k=1}^N \lambda_k \langle \pi^k, C^k \rangle - \eta H(\pi) + \Big\langle f,\, a - r\Big(\sum_k \pi^k\Big) \Big\rangle + \Big\langle g,\, b - c\Big(\sum_k \pi^k\Big) \Big\rangle, \qquad (7)$$
where $f$ and $g$ are the dual variables and $H(\pi) = \sum_k H(\pi^k)$. It is noted that problem (7)
admits the following solution:
$$\pi^k(f, g, \lambda) = \frac{\zeta^k(f, g, \lambda)}{\sum_k \|\zeta^k(f, g, \lambda)\|_1}, \qquad \forall k \in [N], \qquad (8)$$
where
$$\zeta^k(f, g, \lambda) = \exp\left( \frac{f \mathbf{1}_n^\top + \mathbf{1}_n g^\top - \lambda_k C^k}{\eta} \right), \qquad \forall k \in [N]. \qquad (9)$$
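In code, (8)-(9) amount to a couple of lines. The following sketch (NumPy assumed; the array-shape conventions are ours) evaluates them:

```python
import numpy as np

def zeta(f, g, lam, C, eta):
    """zeta^k(f, g, lambda) from (9); C has shape (N, n, n),
    f and g have shape (n,), lam has shape (N,)."""
    # f 1^T + 1 g^T broadcasts to an (n, n) matrix for every agent k
    return np.exp((f[None, :, None] + g[None, None, :]
                   - lam[:, None, None] * C) / eta)

def primal_plan(f, g, lam, C, eta):
    """pi^k(f, g, lambda) from (8): normalize so the total mass is 1."""
    Z = zeta(f, g, lam, C, eta)
    return Z / Z.sum()  # Z.sum() = sum_k ||zeta^k||_1 since entries are positive
```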
By plugging (8) into (7), we obtain the following dual problem of (6):
$$\max_{f \in \mathbb{R}^n,\ g \in \mathbb{R}^n} \ \langle f, a \rangle + \langle g, b \rangle - \eta \log\left( \sum_{k=1}^N \|\zeta^k(f, g, \lambda)\|_1 \right) - \eta. \qquad (10)$$
Plugging (10) into (5), we know that the entropy regularized EOT problem (4) is equivalent
to a pure maximization problem:
$$\max_{f \in \mathbb{R}^n,\ g \in \mathbb{R}^n,\ \lambda \in \Delta^N_+} \ F(f, g, \lambda) := \langle f, a \rangle + \langle g, b \rangle - \eta \log\left( \sum_{k=1}^N \|\zeta^k(f, g, \lambda)\|_1 \right) - \eta. \qquad (11)$$
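For reference, the dual objective (11) can be evaluated directly; a minimal sketch reusing `zeta` from above:

```python
def dual_objective(f, g, lam, C, a, b, eta):
    """F(f, g, lambda) in (11); assumes zeta() from the earlier sketch."""
    Z = zeta(f, g, lam, C, eta)
    return f @ a + g @ b - eta * np.log(Z.sum()) - eta
```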
Function $F(f, g, \lambda)$ is a smooth concave function with three block variables $(f, g, \lambda)$. We
use $(f^*, g^*, \lambda^*)$ to denote an optimal solution of (11), and we denote $F^* = F(f^*, g^*, \lambda^*)$.
The PAM algorithm proposed in (Scetbon et al., 2021) is essentially a block coordinate
descent (BCD) algorithm for solving (11). More specifically, PAM updates the three
block variables by the following scheme:
$$f^{t+1} = \operatorname*{argmax}_{f \in \mathbb{R}^n} F(f, g^t, \lambda^t), \qquad (12a)$$
$$g^{t+1} = \operatorname*{argmax}_{g \in \mathbb{R}^n} F(f^{t+1}, g, \lambda^t), \qquad (12b)$$
$$\lambda^{t+1} = \mathrm{Proj}_{\Delta^N_+}\big( \lambda^t + \tau \nabla_\lambda F(f^{t+1}, g^{t+1}, \lambda^t) \big), \qquad (12c)$$
where $\mathrm{Proj}_{\Delta^N_+}$ denotes the Euclidean projection onto $\Delta^N_+$ and $\tau > 0$ is the step size.
Each iteration of PAM consists of two exact maximization steps followed by one projected
gradient step. Importantly, the two exact maximization problems (12a)-(12b) have numer-
ous optimal solutions, and we choose to use the following ones:
$$f^{t+1} = f^t + \eta \log \frac{a}{r\big( \sum_{k=1}^N \zeta^k(f^t, g^t, \lambda^t) \big)}, \qquad (13)$$
$$g^{t+1} = g^t + \eta \log \frac{b}{c\big( \sum_{k=1}^N \zeta^k(f^{t+1}, g^t, \lambda^t) \big)}, \qquad (14)$$
where the logarithm and the division are applied entrywise.
However, we need to point out that PAM (12) only returns the dual variables $(f^t, g^t, \lambda^t)$.
One can compute the primal variable $\pi$ using (8), but it is not necessarily a feasible solution;
that is, $\pi$ computed from (8) need not satisfy $\pi \in \Pi^N_{a,b}$. How to obtain an optimal primal
solution from the dual variables was not discussed in (Scetbon et al., 2021). For the OT
problem, i.e., $N = 1$, a rounding procedure for returning a feasible primal solution was
proposed in (Altschuler et al., 2017). However, this rounding procedure cannot be applied
to the EOT problem directly. In the next section, we propose a new rounding procedure
for returning a primal solution based on the dual solution $(f^t, g^t, \lambda^t)$. This new rounding
procedure involves a dedicated way to compute the margins.
The details of this procedure are given below. First, we set $a^k = r(\pi^k)$, which immediately
implies $\sum_{k=1}^N a^k = a$. We then construct $b^k$ such that the following properties hold (these
properties will be used in the convergence analysis):
(i) $b^k \geq 0$;
(ii) $\sum_{k=1}^N b^k = b$;
(iii) $\sum_{i=1}^n a^k_i = \sum_{j=1}^n b^k_j$, $\forall k \in [N]$;
(iv) For any fixed $j \in [n]$, the quantities $b^k_j - [c(\pi^k)]_j$ have the same sign for all $k \in [N]$.
That is, for any $k$ and $k'$, we have
$$\big(b^k_j - [c(\pi^k)]_j\big) \cdot \big(b^{k'}_j - [c(\pi^{k'})]_j\big) \geq 0, \qquad (17)$$
which provides the following identity that is useful in our convergence analysis later:
$$\sum_{k=1}^N \|b^k - c(\pi^k)\|_1 = \sum_{k=1}^N \sum_{j=1}^n \big|b^k_j - [c(\pi^k)]_j\big| = \sum_{j=1}^n \left| \sum_{k=1}^N \big(b^k_j - [c(\pi^k)]_j\big) \right| = \left\| b - c\Big(\sum_{k=1}^N \pi^k\Big) \right\|_1, \qquad (18)$$
where the second equality uses the same-sign property (iv).
The procedure for constructing $(b^k)_{k \in [N]}$ satisfying these four properties is provided in the
appendix. After $(a^k, b^k)_{k \in [N]}$ are constructed from (16) with $\pi = \pi(f^T, g^{T-1}, \lambda^{T-1})$, we adopt the
rounding procedure proposed in (Altschuler et al., 2017) to output a primal feasible solution
$(\hat\pi^k)_{k \in [N]}$. The rounding procedure is described in Algorithm 2.
With this new procedure for rounding and computing the margins ak , bk , we now for-
mally describe our PAM algorithm in Algorithm 1. Note that the algorithm is terminated
when the following criteria are met:
Algorithm 2 Round(π, a, b)
1: Input: $\pi \in \mathbb{R}^{n \times n}$, $a \in \mathbb{R}^n_+$, $b \in \mathbb{R}^n_+$.
2: $X = \mathrm{Diag}(x)$ with $x_i = \frac{a_i}{r(\pi)_i} \wedge 1$
3: $\pi' = X \pi$
4: $Y = \mathrm{Diag}(y)$ with $y_j = \frac{b_j}{c(\pi')_j} \wedge 1$
5: $\pi'' = \pi' Y$
6: $\mathrm{err}_a = a - r(\pi'')$, $\mathrm{err}_b = b - c(\pi'')$
7: Output: $\pi'' + \mathrm{err}_a \mathrm{err}_b^\top / \|\mathrm{err}_a\|_1$.
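A direct NumPy transcription of Algorithm 2 might look as follows (a sketch; it assumes $\pi$ has positive entries, which holds for plans of the form (8), so the divisions below are safe):

```python
import numpy as np

def round_to_coupling(pi, a, b):
    """Algorithm 2: rescale rows, then columns, then fix the residual
    mass with a rank-one correction so that r(out) = a and c(out) = b."""
    x = np.minimum(a / pi.sum(axis=1), 1.0)    # x_i = a_i / r(pi)_i  ∧ 1
    pi1 = x[:, None] * pi                      # pi'  = Diag(x) pi
    y = np.minimum(b / pi1.sum(axis=0), 1.0)   # y_j = b_j / c(pi')_j ∧ 1
    pi2 = pi1 * y[None, :]                     # pi'' = pi' Diag(y)
    err_a = a - pi2.sum(axis=1)                # nonnegative: rows only shrank
    err_b = b - pi2.sum(axis=0)
    if err_a.sum() == 0:                       # already feasible
        return pi2
    return pi2 + np.outer(err_a, err_b) / err_a.sum()
```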
where $\mathcal{X}_i \subset \mathbb{R}^{d_i}$ and $J$ is convex and differentiable. The BCD method for solving (21)
iterates as follows:
$$x^{t+1}_i = \operatorname*{argmin}_{x_i \in \mathcal{X}_i} J(x^{t+1}_1, x^{t+1}_2, \ldots, x^{t+1}_{i-1}, x_i, x^t_{i+1}, \ldots, x^t_m), \qquad (22)$$
and it assumes that these subproblems are easy to solve. The BCGD method for solving
(21) iterates as follows:
$$x^{t+1}_i = \operatorname*{argmin}_{x_i \in \mathcal{X}_i} \big\langle \nabla_{x_i} J(x^{t+1}_1, \ldots, x^{t+1}_{i-1}, x^t_i, x^t_{i+1}, \ldots, x^t_m),\ x_i - x^t_i \big\rangle + \frac{1}{2\tau} \|x_i - x^t_i\|_2^2, \qquad (23)$$
where $\tau > 0$ is the step size.
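To make the contrast concrete, the toy sketch below (our construction, not from the paper) applies one Gauss-Seidel sweep of BCD (22) and of BCGD (23) to the unconstrained two-block quadratic $J(x_1, x_2) = \frac{1}{2}\|A_1 x_1 + A_2 x_2 - y\|_2^2$:

```python
import numpy as np

def bcd_sweep(x1, x2, A1, A2, y):
    """Exact block minimization (22): solve the least-squares
    subproblem for each block while holding the other fixed."""
    x1 = np.linalg.lstsq(A1, y - A2 @ x2, rcond=None)[0]
    x2 = np.linalg.lstsq(A2, y - A1 @ x1, rcond=None)[0]
    return x1, x2

def bcgd_sweep(x1, x2, A1, A2, y, tau):
    """Block gradient step (23): with X_i = R^{d_i} the argmin in (23)
    reduces to x_i - tau * grad_i J, using the freshest other block."""
    x1 = x1 - tau * A1.T @ (A1 @ x1 + A2 @ x2 - y)
    x2 = x2 - tau * A2.T @ (A1 @ x1 + A2 @ x2 - y)
    return x1, x2
```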
PAM (12) is a hybrid of BCD (22) and BCGD (23), in
the sense that some block variables are updated by exactly solving a maximization problem
(the $f$ and $g$ steps), while the remaining block is updated by taking a gradient step
(the $\lambda$ step). Though this hybrid idea has been studied in the literature (Hong et al., 2017;
Xu and Yin, 2013), the existing convergence analyses require the objective to be strongly
convex with respect to the blocks updated by exact minimization. However, in our problem (11), the negative of the
objective function is merely convex. Hence we need to develop new proofs to
analyze the convergence of PAM (Algorithm 1). How to extend our convergence results for
PAM (Algorithm 1) to more general settings is an interesting topic for future study.
Note that the left hand side of the inequality is the duality gap of (2).
Since (13) and (14) renormalize the row sum and column sum of $\sum_k \zeta^k(f, g, \lambda)$ to be $a$ and
$b$, we immediately have
$$\sum_{k=1}^N \|\zeta^k(f^{t+1}, g^t, \lambda^t)\|_1 = 1, \qquad \sum_{k=1}^N \|\zeta^k(f^{t+1}, g^{t+1}, \lambda^t)\|_1 = 1, \qquad \forall t. \qquad (25)$$
The following corollary follows from (Scetbon et al., 2021, Proposition 12) and shows that
$\nabla_\lambda F$ is Lipschitz continuous.
Corollary 4 For any $f, g \in \mathbb{R}^n$ and $\lambda_1, \lambda_2 \in \Delta^N_+$, the following inequality holds:
Lemma 6 Let $\{f^t, g^t, \lambda^t\}$ be generated by PAM (Algorithm 1). The following equality holds:
$$\sum_{k=1}^N \|\pi^k(f^{t+1}, g^{t+1}, \lambda^t) - \pi^k(f^{t+1}, g^t, \lambda^t)\|_1 = \|c^t - b\|_1, \qquad \forall t.$$
Proof. First, (30a) is a direct consequence of (12a). Next, we prove (30b). We have
$$\begin{aligned}
& F(f^{t+1}, g^{t+1}, \lambda^t) - F(f^{t+1}, g^t, \lambda^t) \\
&= \langle g^{t+1} - g^t, b \rangle - \eta \log\Big( \sum_{k=1}^N \|\zeta^k(f^{t+1}, g^{t+1}, \lambda^t)\|_1 \Big) + \eta \log\Big( \sum_{k=1}^N \|\zeta^k(f^{t+1}, g^t, \lambda^t)\|_1 \Big) \\
&= \langle g^{t+1} - g^t, b \rangle = \eta \sum_{j=1}^n b_j \log(b_j / c^t_j) = \eta K(b \| c^t) \geq \frac{\eta}{2} \|c^t - b\|_1^2,
\end{aligned}$$
where $K(x \| y)$ denotes the KL divergence between $x$ and $y$, the second equality is due to (25), the
third equality is due to (14), and the last inequality follows from Pinsker's inequality.
Finally, we prove (30c). From the optimality condition of (12c), we know that there
exists
$$h(\lambda^{t+1}) \in \partial I_{\Delta^N_+}(\lambda^{t+1}) \qquad (31)$$
such that
$$\nabla_\lambda F(f^{t+1}, g^{t+1}, \lambda^t) - \frac{1}{\tau}(\lambda^{t+1} - \lambda^t) - h(\lambda^{t+1}) = 0. \qquad (32)$$
From (28) we have
$$\begin{aligned}
& F(f^{t+1}, g^{t+1}, \lambda^{t+1}) - F(f^{t+1}, g^{t+1}, \lambda^t) \\
&\geq \big\langle \nabla_\lambda F(f^{t+1}, g^{t+1}, \lambda^t),\ \lambda^{t+1} - \lambda^t \big\rangle - \frac{c_\infty^2}{2\eta} \|\lambda^{t+1} - \lambda^t\|^2 \\
&= \Big\langle \frac{1}{\tau}(\lambda^{t+1} - \lambda^t) + h(\lambda^{t+1}),\ \lambda^{t+1} - \lambda^t \Big\rangle - \frac{c_\infty^2}{2\eta} \|\lambda^{t+1} - \lambda^t\|^2 \\
&\geq \Big\langle \frac{1}{\tau}(\lambda^{t+1} - \lambda^t),\ \lambda^{t+1} - \lambda^t \Big\rangle - \frac{c_\infty^2}{2\eta} \|\lambda^{t+1} - \lambda^t\|^2 \\
&= c_\infty^2 \|\lambda^{t+1} - \lambda^t\|^2 / (2\eta),
\end{aligned}$$
where the first equality is due to (32), the second inequality is due to (31), and the last
equality is due to the definition of $\tau$ in (20). Moreover, the optimality condition of (12c)
also implies
$$\Big\langle \lambda - \lambda^{t+1},\ \frac{1}{\tau}(\lambda^{t+1} - \lambda^t) - \nabla_\lambda F(f^{t+1}, g^{t+1}, \lambda^t) \Big\rangle \geq 0, \qquad \forall \lambda \in \Delta^N_+, \qquad (34)$$
which implies that
where the equality is due to (24c), and the last inequality is due to Lemma 6. Finally, we
have
$$\begin{aligned}
& \big\langle \lambda^t - \lambda,\ -\nabla_\lambda F(f^{t+1}, g^t, \lambda^t) \big\rangle \\
&= \big\langle \lambda^t - \lambda^{t+1},\ -\nabla_\lambda F(f^{t+1}, g^{t+1}, \lambda^t) \big\rangle + \big\langle \lambda^{t+1} - \lambda,\ -\nabla_\lambda F(f^{t+1}, g^{t+1}, \lambda^t) \big\rangle \\
&\quad + \big\langle \lambda^t - \lambda,\ \nabla_\lambda F(f^{t+1}, g^{t+1}, \lambda^t) - \nabla_\lambda F(f^{t+1}, g^t, \lambda^t) \big\rangle \qquad (37) \\
&\leq \|\lambda^t - \lambda^{t+1}\|_2 \cdot \|\nabla_\lambda F(f^{t+1}, g^{t+1}, \lambda^t)\|_2 + 2c_\infty^2 \|\lambda^{t+1} - \lambda^t\|_2 / \eta + c_\infty \|c^t - b\|_1,
\end{aligned}$$
where the first inequality is due to (35) and (36). From (24c) we have $\|\nabla_\lambda F(f^{t+1}, g^{t+1}, \lambda^t)\|_2 \leq c_\infty$,
which, combined with (37) and the fact that $\eta \leq c_\infty$, yields the desired result.
Proof. Denote $u^t = (\max_j g^t_j + \min_j g^t_j)/2$ and $u^* = (\max_j g^*_j + \min_j g^*_j)/2$. From (26) we get
$$\langle \mathbf{1}, c^t - b \rangle = \sum_{i=1}^n a_i - \sum_{j=1}^n b_j = 0,$$
and therefore
$$\begin{aligned}
\langle g^t - g^*, c^t - b \rangle &= \big\langle (g^t - u^t \mathbf{1}) - (g^* - u^* \mathbf{1}),\ c^t - b \big\rangle \qquad (38) \\
&\leq \big( \|g^t - u^t \mathbf{1}\|_\infty + \|g^* - u^* \mathbf{1}\|_\infty \big) \|c^t - b\|_1 \leq (c_\infty - \eta\iota) \|c^t - b\|_1,
\end{aligned}$$
where the last inequality is due to Corollary 5. Now we set $\lambda = \lambda^*$ in (33), and we obtain
$$\begin{aligned}
\tilde F(f^{t+1}, g^t, \lambda^t) &= F(f^*, g^*, \lambda^*) - F(f^{t+1}, g^t, \lambda^t) \\
&\leq \Big\langle f^{t+1} - f^*,\ r\Big( \sum_{k=1}^N \pi^k(f^{t+1}, g^t, \lambda^t) \Big) - a \Big\rangle + \big\langle g^t - g^*,\ c^t - b \big\rangle + \big\langle \lambda^t - \lambda^*,\ -\nabla_\lambda F(f^{t+1}, g^t, \lambda^t) \big\rangle \\
&\leq (2c_\infty - \eta\iota) \|c^t - b\|_1 + 3c_\infty^2 \|\lambda^{t+1} - \lambda^t\|_2 / \eta,
\end{aligned}$$
where the last inequality follows from (15), (26), (38) and (39).
The next lemma shows that the suboptimality gap F̃ (f, g, λ) can be bounded by O(1/t).
Therefore, we have
$$\begin{aligned}
& \tilde F(f^{t+1}, g^t, \lambda^t) - \tilde F(f^{t+1}, g^{t+1}, \lambda^{t+1}) \\
&\leq -\frac{\eta}{2} \|c^t - b\|_1^2 - c_\infty^2 \|\lambda^{t+1} - \lambda^t\|_2^2 / (2\eta) \\
&\leq -\frac{\eta}{2} \gamma_0 \Big( \big( (2c_\infty - \eta\iota) \|c^t - b\|_1 \big)^2 + \big( 3c_\infty^2 \|\lambda^{t+1} - \lambda^t\|_2 / \eta \big)^2 \Big) \qquad (41) \\
&\leq -\frac{\eta}{4} \gamma_0 \Big( (2c_\infty - \eta\iota) \|c^t - b\|_1 + 3c_\infty^2 \|\lambda^{t+1} - \lambda^t\|_2 / \eta \Big)^2 \\
&\leq -\frac{\eta}{4} \gamma_0\, \tilde F(f^{t+1}, g^t, \lambda^t)^2,
\end{aligned}$$
where the last inequality is from Lemma 9. Dividing both sides of (41) by $\tilde F(f^{t+1}, g^{t+1}, \lambda^{t+1}) \cdot \tilde F(f^{t+1}, g^t, \lambda^t)$, we have
$$\frac{1}{\tilde F(f^{t+1}, g^{t+1}, \lambda^{t+1})} \geq \frac{1}{\tilde F(f^{t+1}, g^t, \lambda^t)} + \frac{\eta}{4} \gamma_0 \cdot \frac{\tilde F(f^{t+1}, g^t, \lambda^t)}{\tilde F(f^{t+1}, g^{t+1}, \lambda^{t+1})} \geq \frac{1}{\tilde F(f^{t+1}, g^t, \lambda^t)} + \frac{\eta}{4} \gamma_0 \geq \frac{1}{\tilde F(f^t, g^t, \lambda^t)} + \frac{\eta}{4} \gamma_0, \qquad (42)$$
where the second inequality is due to (41) and the last inequality is from (30a). Summing
(42) from 0 to $t$ leads to
$$\frac{1}{\tilde F(f^{t+1}, g^{t+1}, \lambda^{t+1})} \geq \frac{1}{\tilde F(f^0, g^0, \lambda^0)} + \frac{\eta(t+1)}{4} \gamma_0,$$
which implies the desired result.
The next lemma gives sufficient conditions for the PAM algorithm to return an $\epsilon$-optimal
solution to the original EOT problem (2).
$$\hat\pi^k = \mathrm{Round}\big(\pi^k(f^T, g^{T-1}, \lambda^{T-1}),\ a^k,\ b^k\big), \quad \forall k \in [N], \qquad \hat\lambda = \lambda^{T-1},$$
$$\bar\lambda(\pi) := \operatorname*{argmax}_{\lambda \in \Delta^N_+} \Big\{ \ell(\pi, \lambda) = \sum_{k=1}^N \lambda_k \langle \pi^k, C^k \rangle \Big\}. \qquad (45)$$
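Since $\ell(\pi, \cdot)$ is linear in $\lambda$, a maximizer in (45) can always be taken at a vertex of the simplex, i.e., with all mass on an agent with the largest cost; a one-function sketch (NumPy assumed):

```python
def lambda_bar(pi, C):
    """A maximizer of (45): the objective is linear in lambda,
    so put all mass on an agent with the largest transport cost."""
    costs = np.einsum('kij,kij->k', pi, C)   # <pi^k, C^k> for each k
    lam = np.zeros(len(costs))
    lam[np.argmax(costs)] = 1.0
    return lam
```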
Note that the term on the left hand side of (44a) can be rewritten as
$$\begin{aligned}
& \ell\big(\hat\pi, \bar\lambda(\hat\pi)\big) - \ell\big(\hat\pi, \hat\lambda\big) \\
&= \underbrace{\Big( \ell\big(\hat\pi, \bar\lambda(\hat\pi)\big) - \ell\big(\tilde\pi, \bar\lambda(\tilde\pi)\big) \Big)}_{(I)} + \underbrace{\Big( \big[\ell\big(\tilde\pi, \bar\lambda(\tilde\pi)\big) - \eta H(\tilde\pi)\big] - \big[\ell(\pi^*, \lambda^*) - \eta H(\pi^*)\big] \Big)}_{(II)} \qquad (46) \\
&\quad + \underbrace{\Big( \big[\ell(\pi^*, \lambda^*) - \eta H(\pi^*)\big] - \big[\ell(\tilde\pi, \hat\lambda) - \eta H(\tilde\pi)\big] \Big)}_{(III)} + \underbrace{\Big( \ell(\tilde\pi, \hat\lambda) - \ell(\hat\pi, \hat\lambda) \Big)}_{(IV)}.
\end{aligned}$$
Since (1) and (2) are equivalent, we have the following for the term (I):
$$\begin{aligned}
(I) &= \sum_k [\bar\lambda(\hat\pi)]_k \langle \hat\pi^k, C^k \rangle - \sum_k [\bar\lambda(\tilde\pi)]_k \langle \tilde\pi^k, C^k \rangle = \langle \hat\pi^{\hat k^*}, C^{\hat k^*} \rangle - \langle \tilde\pi^{\tilde k^*}, C^{\tilde k^*} \rangle \\
&\leq \langle \hat\pi^{\hat k^*}, C^{\hat k^*} \rangle - \langle \tilde\pi^{\hat k^*}, C^{\hat k^*} \rangle \leq \|\hat\pi^{\hat k^*} - \tilde\pi^{\hat k^*}\|_1 \|C^{\hat k^*}\|_\infty \leq c_\infty \sum_k \|\hat\pi^k - \tilde\pi^k\|_1 \qquad (48) \\
&\leq 2c_\infty \sum_k \big( \|r(\tilde\pi^k) - a^k\|_1 + \|c(\tilde\pi^k) - b^k\|_1 \big) = 2c_\infty \|c^{T-1} - b\|_1,
\end{aligned}$$
where the first inequality follows from the definition of $\tilde k^*$ in (47), the fourth inequality is
from Lemma 3, and the last equality follows from (15) and (17).
For the term (II), recall that
$$H(\pi) = -\sum_{k,i,j} \pi^k_{i,j} \big( \log \pi^k_{i,j} - 1 \big) \qquad \text{and} \qquad \tilde\pi^k = \exp\left( \frac{f^T \mathbf{1}^\top + \mathbf{1} (g^{T-1})^\top - \lambda^{T-1}_k C^k}{\eta} \right)
$$\begin{aligned}
(II) &= \big\langle \bar\lambda(\tilde\pi) - \hat\lambda,\ \nabla_\lambda F(f^T, g^{T-1}, \lambda^{T-1}) \big\rangle + \langle g^{T-1}, c^{T-1} - b \rangle + F(f^T, g^{T-1}, \lambda^{T-1}) - F^* \\
&\leq \big\langle \bar\lambda(\tilde\pi) - \hat\lambda,\ \nabla_\lambda F(f^T, g^{T-1}, \lambda^{T-1}) \big\rangle + \langle g^{T-1}, c^{T-1} - b \rangle \\
&\leq c_\infty \|c^{T-1} - b\|_1 + 3c_\infty^2 \|\lambda^T - \lambda^{T-1}\|_2 / \eta + \langle g^{T-1} - u^{T-1} \mathbf{1},\ c^{T-1} - b \rangle \\
&\leq c_\infty \|c^{T-1} - b\|_1 + 3c_\infty^2 \|\lambda^T - \lambda^{T-1}\|_2 / \eta + \|g^{T-1} - u^{T-1} \mathbf{1}\|_\infty \|c^{T-1} - b\|_1 \\
&\leq (3c_\infty/2 - \eta\iota/2) \|c^{T-1} - b\|_1 + 3c_\infty^2 \|\lambda^T - \lambda^{T-1}\|_2 / \eta,
\end{aligned} \qquad (49)$$
where the third equality uses (26), (24c) and (15), the second inequality follows from Lemma
8 by setting $\lambda = \bar\lambda(\tilde\pi)$ and $t = T - 1$, and the last inequality uses Corollary 5.
For the term (III), we have
where the first inequality uses $|\hat\lambda_k| \leq 1$, and the second inequality uses Lemma 3 and (17).
Plugging (48)-(51) into (46), and using (43), we obtain (44a).
Now we prove (44b). For ease of presentation, we denote
$$\bar\pi(\lambda) := \operatorname*{argmin}_{\pi \in \Pi^N_{a,b}} \ell(\pi, \lambda). \qquad (52)$$
$$\sum_k \Big\| c\big( (\bar\pi(\hat\lambda))^k \big) - \tilde b^k \Big\|_1 = \Big\| \sum_k c\big( (\bar\pi(\hat\lambda))^k \big) - \sum_k \tilde b^k \Big\|_1 = \|b - \tilde b\|_1 = \|b - c^{T-1}\|_1, \qquad (53)$$
Now, note that the left hand side of (44b) can be arranged into three parts:
We now upper bound these three terms. First note that the term (V) is the same as the
term (IV) and thus has the same upper bound as in (51). Since $0 \leq H(\pi) \leq \log(n^2 N) + 1$,
from (54) we have that
$$(VI) = \sum_k \hat\lambda_k \langle \tilde\pi^k, C^k \rangle - \sum_k \hat\lambda_k \langle \pi'^k, C^k \rangle \leq \eta \big( H(\tilde\pi) - H(\pi') \big) \leq \frac{\epsilon}{3}, \qquad (56)$$
where the second inequality follows from Lemma 3, and the second equality uses (53) and the
fact that $r\big((\bar\pi(\hat\lambda))^k\big) = \tilde a^k$ due to the property of the Margins procedure in (16).
Finally, plugging (51) (noting that (V) = (IV)), (56) and (57) into (55), and using (43a) and
noting $\iota < 0$, we obtain (44b). This completes the proof.
$$T = 5 + \frac{36}{\eta\sqrt{\gamma_0}\,\epsilon_0} + \frac{648 c_\infty^2}{\eta\epsilon} + \frac{28}{\eta\gamma_0\epsilon} = O\big( c_\infty^2 \epsilon^{-2} \big), \qquad (58)$$
where $\gamma_0 = \min\Big\{ \frac{1}{(2c_\infty - \eta\iota)^2},\ \frac{1}{9c_\infty^2} \Big\}$ is a constant and we know $\gamma_0 = O(c_\infty^{-2})$. The output pair
of Algorithm 1 is an $\epsilon$-optimal solution of the EOT problem (2).
Proof. According to Lemma 11, we only need to show that (43) holds after $T$ iterations
as defined in (58). To guarantee (43a) and (43b), we follow the ideas of (Dvurechensky
et al., 2018) and construct a switching process. We first reduce $\tilde F$ from
$\tilde F(f^0, g^0, \lambda^0)$ to a constant $s$ by running $t_1$ steps. In this process, Lemma 10 indicates
$$t_1 \leq 1 + \frac{4}{\eta\gamma_0 s} - \frac{4}{\eta\gamma_0 \tilde F(f^0, g^0, \lambda^0)}. \qquad (59)$$
Secondly, starting from $s$, we continue running the algorithm, and assume that there are $t_2$
iterations in which (43a) fails. By (30b) we have
$$t_2 \leq 1 + \frac{72s}{\eta\epsilon_0^2}.$$
Therefore, the total number of iterations in which (43a) fails is upper bounded by
$$T_1 = t_1 + t_2 \leq 2 + \frac{72s}{\eta\epsilon_0^2} + \frac{4}{\eta\gamma_0 s} - \frac{4}{\eta\gamma_0 \tilde F(f^0, g^0, \lambda^0)}.$$
By choosing $s = \epsilon_0 / (6\sqrt{\gamma_0})$, we know that
$$T_1 \leq \begin{cases}
2 + \frac{12}{\eta\sqrt{\gamma_0}\,\epsilon_0} + \frac{24}{\eta\sqrt{\gamma_0}\,\epsilon_0} - \frac{4}{\eta\gamma_0 \tilde F(f^0, g^0, \lambda^0)} \leq 2 + \frac{36}{\eta\sqrt{\gamma_0}\,\epsilon_0}, & \text{if } \tilde F(f^0, g^0, \lambda^0) \geq \frac{\epsilon_0}{6\sqrt{\gamma_0}}, \\[4pt]
2 + \frac{12}{\eta\sqrt{\gamma_0}\,\epsilon_0} + \frac{24}{\eta\sqrt{\gamma_0}\,\epsilon_0} - \frac{4}{\eta\gamma_0 \tilde F(f^0, g^0, \lambda^0)} \leq 2 + \frac{12}{\eta\sqrt{\gamma_0}\,\epsilon_0}, & \text{otherwise}.
\end{cases}$$
Similarly, assume that there are $t_3$ iterations in which (43b) fails. Then
$$t_3 \leq 1 + \frac{648 s c_\infty^2}{\eta\epsilon^2},$$
where we apply (30b). By choosing $s = \epsilon$, we know that the total number of iterations in which
(43b) fails is upper bounded by
$$T_2 = t_1 + t_3 \leq 2 + \frac{648 c_\infty^2}{\eta\epsilon} + \frac{4}{\eta\gamma_0\epsilon} - \frac{4}{\eta\gamma_0 \tilde F(f^0, g^0, \lambda^0)}.$$
$$\tilde F(f^{T_3 - 1}, g^{T_3 - 1}, \lambda^{T_3 - 1}) \leq \epsilon / 6$$
after
$$T_3 = 1 + \frac{24}{\eta\gamma_0\epsilon}$$
iterations. From (30a) we know that after $T_3$ iterations, we have
i.e., (43c) holds. Combining the above discussions, we know that after $T = T_1 + T_2 + T_3 + 1$
iterations, there must exist at least one iteration at which (43) holds, and thus the output
of PAM is an $\epsilon$-optimal solution to the original EOT problem (2).
Remark 13 Though our complexity result matches the rate of Sinkhorn's algorithm
in terms of the dependence on $\epsilon$, we argue that EOT is a more difficult problem than the
entropic regularized OT, and thus our results are promising. First, EOT is a saddle-point
problem while entropic regularized OT is a minimization problem. Second, the extra variable
$\lambda$ in EOT requires a gradient projection step in the PAM algorithm, which introduces
significant difficulty into the convergence analysis; Sinkhorn's algorithm, by contrast, is much
easier to analyze because its dual problem is unconstrained. Third, since there are
multiple agents in EOT, it is more difficult to design the rounding procedure that recovers the
primal solution. We also note that the dependence on $c_\infty$ in our result and in the result for
Sinkhorn's algorithm (Dvurechensky et al., 2018) is $c_\infty^2$ in both cases.
where $L$ is the Lipschitz constant of $\nabla F$. Note that APGA treats problem (11) as a
generic convex and smooth problem, and does not take advantage of the special structure
of (11). In particular, $f$ and $g$ are updated using gradient ascent steps. This is in contrast
1. In Lemma 4 we proved that ∇λ F is Lipschitz continuous. The Lipschitz continuity of ∇f F and ∇g F
can be proved similarly.
Here $\theta \in (0, 1)$ is a given parameter for the extrapolation step. We see that steps (62a)-(62b)
are the same as (13)-(14), and they are solutions to the exact maximizations (12a)-(12b).
Steps (62c)-(62d) give an extrapolated gradient step for $\lambda$, similar to Nesterov's
accelerated gradient method; a sketch of this step is given below. Note that PAME (62) solves the dual entropy-regularized
EOT problem (11). We use the same rounding procedure as in Section 2.1 to generate a primal
solution to the original EOT problem (1). The complete PAME algorithm is described in
Algorithm 3.
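For illustration, the sketch below implements the extrapolated $\lambda$ update. The precise form of (62c) is our assumption: the common choice $y^{t+1} = \lambda^t + (1 - \theta)(\lambda^t - \lambda^{t-1})$, which is consistent with the optimality condition (67) used in the analysis. The helpers `primal_plan` and `project_simplex` are from the earlier sketches.

```python
def pame_lambda_step(f, g, lam, lam_prev, C, eta, tau, theta):
    """Extrapolated lambda update: (62c) forms the extrapolation point y
    (assumed form), (62d) takes a projected gradient ascent step from y."""
    y = lam + (1.0 - theta) * (lam - lam_prev)   # (62c), assumed form
    pi = primal_plan(f, g, y, C, eta)
    grad = np.einsum('kij,kij->k', pi, C)        # dF/dlambda_k at y
    lam_new = project_simplex(y + tau * grad)    # (62d)
    return lam_new, lam                          # new (lam, lam_prev)
```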
Note that the algorithm is terminated when the following criteria are met:
$$E(f, g, \lambda^1, \lambda^2) = F(f, g, \lambda^1) - \frac{1}{2\tau} \|\lambda^1 - \lambda^2\|_2^2. \qquad (64)$$
$$E(f^{t+1}, g^{t+1}, \lambda^{t+1}, \lambda^t) - E(f^{t+1}, g^{t+1}, \lambda^t, \lambda^{t-1}) \geq \frac{2\theta - \theta^2}{2\tau} \|\lambda^t - \lambda^{t-1}\|_2^2 + \frac{1}{4\tau} \|\lambda^{t+1} - y^{t+1}\|_2^2. \qquad (66)$$
Note that since θ ∈ (0, 1), the right hand side of (66) is always nonnegative.
Proof. From the optimality condition of (62d) we know that there exists $h(\lambda^{t+1}) \in \partial I_{\Delta^N_+}(\lambda^{t+1})$
such that
$$\nabla_\lambda F(f^{t+1}, g^{t+1}, y^{t+1}) - \frac{1}{\tau}(\lambda^{t+1} - y^{t+1}) - h(\lambda^{t+1}) = 0. \qquad (67)$$
$$\begin{aligned}
& F(f^{t+1}, g^{t+1}, \lambda^t) - F(f^{t+1}, g^{t+1}, \lambda^{t+1}) \\
&\leq F(f^{t+1}, g^{t+1}, y^{t+1}) + \big\langle \nabla_\lambda F(f^{t+1}, g^{t+1}, y^{t+1}),\ \lambda^t - y^{t+1} \big\rangle \\
&\quad - \Big( F(f^{t+1}, g^{t+1}, y^{t+1}) + \big\langle \nabla_\lambda F(f^{t+1}, g^{t+1}, y^{t+1}),\ \lambda^{t+1} - y^{t+1} \big\rangle - c_\infty^2 \|\lambda^{t+1} - y^{t+1}\|_2^2 / (2\eta) \Big) \\
&= \big\langle \nabla_\lambda F(f^{t+1}, g^{t+1}, y^{t+1}),\ \lambda^t - \lambda^{t+1} \big\rangle + c_\infty^2 \|\lambda^{t+1} - y^{t+1}\|_2^2 / (2\eta) \\
&\leq \big\langle \nabla_\lambda F(f^{t+1}, g^{t+1}, y^{t+1}) - h(\lambda^{t+1}),\ \lambda^t - \lambda^{t+1} \big\rangle + c_\infty^2 \|\lambda^{t+1} - y^{t+1}\|_2^2 / (2\eta) \\
&= \frac{1}{\tau} \big\langle \lambda^{t+1} - y^{t+1},\ \lambda^t - \lambda^{t+1} \big\rangle + \frac{1}{4\tau} \|\lambda^{t+1} - y^{t+1}\|^2 \\
&= \frac{1}{\tau} \big\langle \lambda^{t+1} - y^{t+1},\ \lambda^t - y^{t+1} + y^{t+1} - \lambda^{t+1} \big\rangle + \frac{1}{4\tau} \|\lambda^{t+1} - y^{t+1}\|^2 \\
&= -\frac{1}{\tau} \big\langle \lambda^{t+1} - y^{t+1},\ y^{t+1} - \lambda^t \big\rangle - \frac{3}{4\tau} \|\lambda^{t+1} - y^{t+1}\|^2,
\end{aligned} \qquad (70)$$
where the first inequality is from the concavity of $F$ with respect to $\lambda$ and (28), the second
inequality is due to (68), and the second equality is due to (67). Combining (69) and (70) leads
to
$$\begin{aligned}
E(f^{t+1}, g^{t+1}, \lambda^{t+1}, \lambda^t) &= F(f^{t+1}, g^{t+1}, \lambda^{t+1}) - \frac{1}{2\tau} \|\lambda^{t+1} - \lambda^t\|_2^2 \\
&\geq F(f^{t+1}, g^{t+1}, \lambda^t) + \frac{1}{\tau} \big\langle \lambda^{t+1} - y^{t+1},\ y^{t+1} - \lambda^t \big\rangle + \frac{3}{4\tau} \|\lambda^{t+1} - y^{t+1}\|^2 \\
&\quad - \frac{(1 - \theta)^2}{2\tau} \|\lambda^t - \lambda^{t-1}\|_2^2 - \frac{1}{\tau} \big\langle \lambda^{t+1} - y^{t+1},\ y^{t+1} - \lambda^t \big\rangle - \frac{1}{2\tau} \|\lambda^{t+1} - y^{t+1}\|_2^2 \\
&= F(f^{t+1}, g^{t+1}, \lambda^t) - \frac{1}{2\tau} \|\lambda^t - \lambda^{t-1}\|_2^2 + \frac{2\theta - \theta^2}{2\tau} \|\lambda^t - \lambda^{t-1}\|_2^2 + \frac{1}{4\tau} \|\lambda^{t+1} - y^{t+1}\|^2 \\
&= E(f^{t+1}, g^{t+1}, \lambda^t, \lambda^{t-1}) + \frac{2\theta - \theta^2}{2\tau} \|\lambda^t - \lambda^{t-1}\|_2^2 + \frac{1}{4\tau} \|\lambda^{t+1} - y^{t+1}\|^2,
\end{aligned}$$
which completes the proof.
Now we define the following function $\tilde E$; later we will prove that $\tilde E(f^t, g^t, \lambda^t, \lambda^{t-1})$
can be upper bounded by $O(1/t)$:
$$\tilde E(f, g, \lambda^1, \lambda^2) = F(f^*, g^*, \lambda^*) - E(f, g, \lambda^1, \lambda^2).$$
The next lemma is useful for obtaining the upper bound on $\tilde E(f^t, g^t, \lambda^t, \lambda^{t-1})$. Moreover,
it is noted that $\tilde E(f, g, \lambda^1, \lambda^2) \geq 0$ for all $f, g, \lambda^1, \lambda^2$, and $\tilde E(f, g, \lambda, \lambda) = \tilde F(f, g, \lambda)$ for all $f, g, \lambda$.
Proof. From the optimality condition of (62d), we have the following inequality:
$$\Big\langle \lambda - \lambda^{t+1},\ \frac{1}{\tau}(\lambda^{t+1} - y^{t+1}) - \nabla_\lambda F(f^{t+1}, g^{t+1}, y^{t+1}) \Big\rangle \geq 0, \qquad \forall \lambda \in \Delta^N_+. \qquad (72)$$
The left hand side of (71) can be rearranged into three terms:
$$\begin{aligned}
& \big\langle \lambda - \lambda^t,\ \nabla_\lambda F(f^{t+1}, g^t, \lambda^t) \big\rangle \\
&= \underbrace{\big\langle \lambda^t - \lambda,\ -\nabla_\lambda F(f^{t+1}, g^{t+1}, y^{t+1}) \big\rangle}_{(I)} + \underbrace{\big\langle \lambda^t - \lambda,\ \nabla_\lambda F(f^{t+1}, g^{t+1}, y^{t+1}) - \nabla_\lambda F(f^{t+1}, g^{t+1}, \lambda^t) \big\rangle}_{(II)} \\
&\quad + \underbrace{\big\langle \lambda^t - \lambda,\ \nabla_\lambda F(f^{t+1}, g^{t+1}, \lambda^t) - \nabla_\lambda F(f^{t+1}, g^t, \lambda^t) \big\rangle}_{(III)}.
\end{aligned} \qquad (73)$$
We now bound these three terms one by one. To bound the term (I), we first note that
from (24c) and (15), we have
$$\|\nabla_\lambda F(f^{t+1}, g^{t+1}, \lambda^t)\|_2 \leq c_\infty \leq c_\infty^2 / \eta, \qquad (74)$$
where the second inequality is due to the definition of $\eta$ in (61). Now we can bound the term
(I) as follows:
$$\begin{aligned}
(I) &= \big\langle \lambda^t - \lambda^{t+1},\ -\nabla_\lambda F(f^{t+1}, g^{t+1}, \lambda^t) \big\rangle + \big\langle \lambda^t - \lambda^{t+1},\ \nabla_\lambda F(f^{t+1}, g^{t+1}, \lambda^t) - \nabla_\lambda F(f^{t+1}, g^{t+1}, y^{t+1}) \big\rangle \\
&\quad + \big\langle \lambda^{t+1} - \lambda,\ -\nabla_\lambda F(f^{t+1}, g^{t+1}, y^{t+1}) \big\rangle \\
&\leq \|\lambda^t - \lambda^{t+1}\|_2 \cdot \|\nabla_\lambda F(f^{t+1}, g^{t+1}, \lambda^t)\|_2 + \frac{c_\infty^2}{\eta} \|\lambda^t - \lambda^{t+1}\|_2 \cdot \|\lambda^t - y^{t+1}\|_2 \\
&\quad + \frac{1}{\tau} \|\lambda^{t+1} - \lambda\|_2 \cdot \|\lambda^{t+1} - y^{t+1}\|_2 \\
&\leq 3c_\infty^2 \|\lambda^t - \lambda^{t+1}\|_2 / \eta + 4c_\infty^2 \|\lambda^{t+1} - y^{t+1}\|_2 / \eta,
\end{aligned} \qquad (75)$$
where the first inequality uses Lemma 4 and (72), and the second inequality uses (74) and the
facts that $\|\lambda^t - y^{t+1}\|_2 \leq 2$ and $\|\lambda^{t+1} - \lambda\|_2 \leq 2$.
For the term (II), Lemma 4 yields
$$(II) \leq 2 \big\| \nabla_\lambda F(f^{t+1}, g^{t+1}, y^{t+1}) - \nabla_\lambda F(f^{t+1}, g^{t+1}, \lambda^t) \big\|_2 \leq 2c_\infty^2 \|y^{t+1} - \lambda^t\|_2 / \eta. \qquad (76)$$
For the term (III), it can be bounded as
$$(III) = \sum_{k=1}^N (\lambda^t_k - \lambda_k) \cdot \big\langle \pi^k(f^{t+1}, g^{t+1}, \lambda^t) - \pi^k(f^{t+1}, g^t, \lambda^t),\ C^k \big\rangle \leq \sum_{k=1}^N \big\| \pi^k(f^{t+1}, g^{t+1}, \lambda^t) - \pi^k(f^{t+1}, g^t, \lambda^t) \big\|_1 \|C^k\|_\infty \leq c_\infty \|c^t - b\|_1, \qquad (77)$$
where the last inequality is due to Lemma 6. Plugging (75)-(77) into (73) and applying
the triangle inequality, we obtain
$$\tilde E(f^{t+1}, g^t, \lambda^t, \lambda^{t-1}) \leq (2c_\infty - \eta\iota) \|c^t - b\|_1 + 7c_\infty^2 \|\lambda^{t+1} - y^{t+1}\|_2 / \eta + (7 - 5\theta) c_\infty^2 \|\lambda^t - \lambda^{t-1}\|_2 / \eta,$$
where in the first inequality we have used (26), and the second inequality follows from (38)
and setting $\lambda = \lambda^*$ in (71). From (78) we immediately get
$$\begin{aligned}
\tilde E(f^{t+1}, g^t, \lambda^t, \lambda^{t-1}) &= \tilde F(f^{t+1}, g^t, \lambda^t) + \frac{1}{2\tau} \|\lambda^t - \lambda^{t-1}\|_2^2 \\
&\leq c_\infty^2 \|\lambda^t - \lambda^{t-1}\|_2^2 / \eta + (2c_\infty - \eta\iota) \|c^t - b\|_1 + 7c_\infty^2 \|\lambda^{t+1} - y^{t+1}\|_2 / \eta + 5(1 - \theta) c_\infty^2 \|\lambda^t - \lambda^{t-1}\|_2 / \eta \\
&\leq 2c_\infty^2 \|\lambda^t - \lambda^{t-1}\|_2 / \eta + (2c_\infty - \eta\iota) \|c^t - b\|_1 + 7c_\infty^2 \|\lambda^{t+1} - y^{t+1}\|_2 / \eta + 5(1 - \theta) c_\infty^2 \|\lambda^t - \lambda^{t-1}\|_2 / \eta,
\end{aligned}$$
where the second inequality is due to $\|\lambda^t - \lambda^{t-1}\|_2 \leq 2$. This completes the proof.
$$\gamma_1 = \min\left\{ \frac{1}{(2c_\infty - \eta\iota)^2},\ \frac{2(2\theta - \theta^2)}{(7 - 5\theta)^2 c_\infty^2},\ \frac{1}{49 c_\infty^2} \right\} \qquad (79)$$
is a constant.
$$\geq \frac{\eta}{2} \|c^t - b\|_1^2 + \frac{2\theta - \theta^2}{2\tau} \|\lambda^t - \lambda^{t-1}\|_2^2 + \frac{1}{4\tau} \|\lambda^{t+1} - y^{t+1}\|_2^2,$$
which implies that
$$\begin{aligned}
& \tilde E(f^{t+1}, g^{t+1}, \lambda^{t+1}, \lambda^t) - \tilde E(f^{t+1}, g^t, \lambda^t, \lambda^{t-1}) \\
&\leq -\frac{\eta}{2} \|c^t - b\|_1^2 - (2\theta - \theta^2) \frac{c_\infty^2}{\eta} \|\lambda^t - \lambda^{t-1}\|_2^2 - \frac{c_\infty^2}{2\eta} \|\lambda^{t+1} - y^{t+1}\|_2^2 \\
&\leq -\frac{\eta}{2} \gamma_1 \Big[ \big( (2c_\infty - \eta\iota) \|c^t - b\|_1 \big)^2 + \big( (7 - 5\theta) c_\infty^2 \|\lambda^t - \lambda^{t-1}\|_2 / \eta \big)^2 + \big( 7c_\infty^2 \|\lambda^{t+1} - y^{t+1}\|_2 / \eta \big)^2 \Big] \\
&\leq -\frac{\eta}{6} \gamma_1 \Big( (2c_\infty - \eta\iota) \|c^t - b\|_1 + (7 - 5\theta) c_\infty^2 \|\lambda^t - \lambda^{t-1}\|_2 / \eta + 7c_\infty^2 \|\lambda^{t+1} - y^{t+1}\|_2 / \eta \Big)^2 \\
&\leq -\frac{\eta}{6} \gamma_1\, \tilde E(f^{t+1}, g^t, \lambda^t, \lambda^{t-1})^2,
\end{aligned} \qquad (80)$$
where the last inequality applies Lemma 16. We then divide both sides of (80) by
$\tilde E(f^{t+1}, g^{t+1}, \lambda^{t+1}, \lambda^t) \cdot \tilde E(f^{t+1}, g^t, \lambda^t, \lambda^{t-1})$, and we obtain
$$\frac{1}{\tilde E(f^{t+1}, g^{t+1}, \lambda^{t+1}, \lambda^t)} \geq \frac{1}{\tilde E(f^{t+1}, g^t, \lambda^t, \lambda^{t-1})} + \frac{\eta}{6} \gamma_1 \cdot \frac{\tilde E(f^{t+1}, g^t, \lambda^t, \lambda^{t-1})}{\tilde E(f^{t+1}, g^{t+1}, \lambda^{t+1}, \lambda^t)} \geq \frac{1}{\tilde E(f^{t+1}, g^t, \lambda^t, \lambda^{t-1})} + \frac{\eta}{6} \gamma_1 \geq \frac{1}{\tilde E(f^t, g^t, \lambda^t, \lambda^{t-1})} + \frac{\eta}{6} \gamma_1, \qquad (81)$$
Similar to Lemma 11, the following lemma provides sufficient conditions for the
PAME algorithm to return an $\epsilon$-optimal solution to the original EOT problem (2).
$$\hat\pi^k = \mathrm{Round}\big(\pi^k(f^T, g^{T-1}, \lambda^{T-1}),\ a^k,\ b^k\big), \quad \forall k \in [N], \qquad \hat\lambda = \lambda^{T-1},$$
Proof. The proof is essentially the same as that of Lemma 11. More specifically, we again
need to show that the output of PAME $(\hat\pi, \hat\lambda)$ satisfies (44). The proof of (44b) is exactly
the same as in the proof of Lemma 11. The proof of (44a) only requires developing a new
bound for
$$\big\langle \bar\lambda(\tilde\pi) - \hat\lambda,\ \nabla_\lambda F(f^T, g^{T-1}, \lambda^{T-1}) \big\rangle \qquad (83)$$
that is used in (49). All other parts are again exactly the same as in Lemma 11. The
new bound on (83) can be obtained by applying Lemma 15 with $\lambda = \bar\lambda(\tilde\pi)$ and $t = T - 1$,
which yields
$$\big\langle \bar\lambda(\tilde\pi) - \hat\lambda,\ \nabla_\lambda F(f^T, g^{T-1}, \lambda^{T-1}) \big\rangle \leq c_\infty \|c^{T-1} - b\|_1 + 5(1 - \theta) c_\infty^2 \|\lambda^{T-1} - \lambda^{T-2}\|_2 / \eta + 7c_\infty^2 \|\lambda^T - y^T\|_2 / \eta. \qquad (84)$$
By combining (84) with (48)-(51), we can bound the left hand side of (44a) by
$$\begin{aligned}
\ell\big(\hat\pi, \bar\lambda(\hat\pi)\big) - \ell\big(\hat\pi, \hat\lambda\big) &\leq (6c_\infty - \eta\iota) \|c^{T-1} - b\|_1 + 5(1 - \theta) c_\infty^2 \|\lambda^{T-1} - \lambda^{T-2}\|_2 / \eta + 7c_\infty^2 \|\lambda^T - y^T\|_2 / \eta \\
&\quad + F(f^T, g^{T-1}, \lambda^{T-1}) - F^* \qquad (85) \\
&\leq \frac{\epsilon}{6} + \frac{\epsilon}{12} + \frac{\epsilon}{12} + \frac{\epsilon}{6} = \frac{\epsilon}{2},
\end{aligned}$$
where in the last inequality we have used all the sufficient conditions (82a)-(82d).
Proof. According to Lemma 18, we only need to show that (82) holds after $T$ iterations as
defined in (86). We follow the same idea as in the proof of Theorem 12. First we reduce
$\tilde E(f^{t+1}, g^{t+1}, \lambda^{t+1}, \lambda^t)$ from $\tilde E(f^0, g^0, \lambda^0, \lambda^{-1}) = \tilde F(f^0, g^0, \lambda^0)$ to a constant $s$ by
running $t_1$ steps. By Lemma 17, we have
$$t_1 \leq 1 + \frac{6}{\eta\gamma_1 s} - \frac{6}{\eta\gamma_1 \tilde F(f^0, g^0, \lambda^0)}. \qquad (87)$$
Secondly, starting from $s$, we continue running the algorithm, and assume that there are $t_2$
iterations in which (82a) fails. By (30b) we have
$$t_2 \leq 1 + \frac{72s}{\eta\epsilon_0^2}.$$
Therefore, the total number of iterations in which (82a) fails is upper bounded by
$$T_1 = t_1 + t_2 \leq 2 + \frac{72s}{\eta\epsilon_0^2} + \frac{6}{\eta\gamma_1 s} - \frac{6}{\eta\gamma_1 \tilde F(f^0, g^0, \lambda^0)}.$$
By choosing $s = \epsilon_0 / (6\sqrt{\gamma_1})$, we know that
By choosing $s = \epsilon$, the total numbers of iterations in which (82b) and (82c) fail can be
respectively bounded by
and
$$T_3 = t_1 + t_4 \leq 2 + \frac{3528 c_\infty^2}{\eta\epsilon} + \frac{6}{\eta\gamma_1\epsilon} - \frac{6}{\eta\gamma_1 \tilde F(f^0, g^0, \lambda^0)} \leq 2 + \frac{3528 c_\infty^2}{\eta\epsilon} + \frac{6}{\eta\gamma_1\epsilon}.$$
after
$$T_4 = 1 + \frac{36}{\eta\gamma_1\epsilon}$$
iterations. From (88) we know that
$$\tilde F(f^{T_4 - 1}, g^{T_4 - 1}, \lambda^{T_4 - 1}) \leq \epsilon / 6,$$
Figure 1: Computational time comparison between PAM, PAME and APGA algorithms
on Gaussian distributions. Upper Left: N = 10, n = 100, η = 0.1; Upper Right:
N = 10, n = 500, η = 0.1; Bottom Left: N = 10, n = 100, η = 0.5; Bottom Right:
N = 5, n = 100, η = 0.1.
Remark 20 We are not yet able to prove analytically that PAME has an improved complexity
bound. The APGA proposed in (Scetbon et al., 2021) in fact has a better complexity bound
than PAM and PAME. However, as demonstrated in (Scetbon et al., 2021) and
in our numerical experiments (Section 6), APGA performs worse than PAM. We believe
the reason is that APGA takes gradient steps for the variables $f$ and $g$, while PAM exactly
solves the subproblems corresponding to these two variables. It is the exact maximization
step that leads to the improvement. Developing a provably better algorithm is an
important and interesting future direction.
6. Numerical Experiments
We compare the performance of PAME with PAM and APGA (60) (Scetbon et al., 2021)
on a synthetic dataset of Gaussian distributions. We also conduct a numerical comparison
on another synthetic dataset, the fragmented hypercube. The results are included
in the following sections.
and
$$\mathcal{N}\left( \begin{pmatrix} 2 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 & -0.2 \\ -0.2 & 1 \end{pmatrix} \right), \qquad (90)$$
respectively. The base cost matrix $C^{\mathrm{base}}$ is computed by $C^{\mathrm{base}}_{i,j} = \|x_i - y_j\|_2^2$. Assume we
have $N$ agents. The cost matrix of each agent is obtained by adding Gaussian noise
sampled from $\mathcal{N}(0, 10)$ to each element of the base cost and taking absolute values; that is, for the $k$-th agent
with cost matrix $C^k$, we have $C^k_{i,j} = |C^{\mathrm{base}}_{i,j} + \mathcal{N}(0, 10)|$.
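A minimal sketch of this data-generating step (NumPy assumed; the seed, and reading $\mathcal{N}(0, 10)$ as variance 10, are our choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_cost_matrices(X, Y, N, noise_std=np.sqrt(10.0)):
    """Base cost C_{ij} = ||x_i - y_j||^2, then one noisy copy per agent:
    C^k = |C_base + noise| entrywise (noise variance 10 assumed)."""
    C_base = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.abs(C_base[None, :, :]
                  + noise_std * rng.standard_normal((N,) + C_base.shape))
```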
We then set $a = b = [1/n, \ldots, 1/n]^\top$ for all experiments. For all algorithms, we set $\tau = 5\eta/c_\infty^2$,
and we set $\theta = 0.1$ for the PAME algorithm. We consider the EOT error as a measure of
optimality. The EOT error at iteration $t$ is defined by
where $\ell^*$ is the approximately optimal value of EOT (2) obtained by running the PAM
algorithm for 20000 iterations. Figure 1 plots the EOT error against the execution time for
Gaussian distributions. We run each algorithm for 2000 iterations under different parameter
settings. In all cases, PAME and PAM perform significantly better than APGA, and
PAME also shows a significant improvement over PAM.
Figure 2 shows the optimal couplings obtained from the standard OT and EOT of two
Gaussian distributions under three different metrics: the Euclidean cost ($\|\cdot\|_2$), the squared
Euclidean cost ($\|\cdot\|_2^2$) and the $\|\cdot\|_1^{1.5}$ cost, respectively. We set $n = 4$, $\eta = 0.05$ and
generate samples independently according to (89). For the EOT problem, we consider three
agents with cost matrices computed by the three metrics mentioned above. Note that the
entropy regularized models lead to a dense transportation plan, and Figure 2 only plots the
couplings with a probability larger than $10^{-3}$. We see that all the agents have the same
total cost in the EOT model, and as expected, this cost is smaller than each of the three OT
costs obtained by using the same metrics. The subfigures in the first row imply that if we
split the workload evenly into three parts, then the three agents would have costs 5.935/3,
2.158/3 and 5.030/3, which is not fair because the agents incur different costs. The EOT
model, in contrast, guarantees fairness.
We further compare the computational time for Gaussian distributions. We generate
the data as before and set the parameters as $\eta = 0.5$, $\tau = 5\eta/c_\infty^2$. We stop all the
algorithms when the EOT error (91) is less than $10^{-4}$. Tables 1 and 2 show the CPU time
(in seconds) for different $(n, N)$ pairs. The reported computational time is averaged over 5
runs. In Table 1, the APGA algorithm fails to reach an error of $10^{-4}$ within 500000 iterations
when $n = 100$ and $n = 500$. We conclude that the APGA algorithm converges much more slowly
than the PAM and PAME algorithms. The PAME algorithm performs the best among all three
algorithms.
Figure 2: Optimal couplings of standard OT (first row) and EOT (second row). OT squared
Euclidean cost: 5.935; OT Euclidean cost: 2.158; OT $\|\cdot\|_1^{1.5}$ cost: 5.030; EOT cost: 0.906.
Table 1: CPU time (in seconds) comparison for Gaussian Distributions. Fixed N = 3.
In this section, we compare the performance of PAME with PAM and APGA (60) (Scetbon
et al., 2021) on the fragmented hypercube dataset.
Table 2: CPU time (in seconds) comparison for Gaussian Distributions. Fixed n = 50.

Algorithms    N = 2        N = 3         N = 5         N = 10        N = 20
PAM           0.180343     0.177209      1.021598      0.719909      0.903429
PAME          0.105775     0.091593      0.560785      0.385495      0.618381
APGA          70.959300    106.768840    169.213121    178.637889    212.697866
Figure 3: CPU time comparison between PAM, PAME and APGA algorithms on the Fragmented
Hypercube dataset. Upper Left: N = 5, n = 100, η = 0.2; Upper Right:
N = 5, n = 500, η = 0.2; Bottom Left: N = 5, n = 100, η = 0.1; Bottom Right:
N = 10, n = 100, η = 0.2.
Figure 3 plots the EOT error versus the CPU time for the fragmented hypercube dataset.
We run PAM for 20000 iterations to get an approximately optimal value $\ell^*$, and we run all algorithms for
2000 iterations under different parameter settings. In all cases, PAME and PAM perform
significantly better than APGA, and PAME also shows a significant improvement over PAM.
We then compare the computational time for the fragmented hypercube dataset. We set
the parameters as $\eta = 0.2$, $\tau = 5\eta/c_\infty^2$. We stop all the algorithms when the EOT error (91)
is less than $10^{-4}$. Tables 3 and 4 show the CPU time (averaged over 5 runs) for different
$(n, N)$ pairs. We see that the PAME algorithm still performs the best among all three
algorithms. Note that in Table 3 the APGA algorithm fails to reach an error of $10^{-4}$ within
500000 iterations when $n = 50$, $n = 100$ and $n = 500$.
Table 3: CPU time (in seconds) comparison for Fragmented Hypercube. Fixed N = 3.
Table 4: CPU time (in seconds) comparison for Fragmented Hypercube. Fixed n = 20.

Algorithms    N = 2       N = 3        N = 5        N = 10      N = 20
PAM           0.003180    0.101363     0.172696     0.253926    0.231646
PAME          0.040302    0.068529     0.110080     0.150513    0.129156
APGA          1.007166    22.017804    13.801154    6.304324    3.749495
7. Conclusion
In this paper, we have provided the first convergence analysis of the PAM algorithm for solving
the EOT problem. Specifically, we have shown that it takes at most $O(\epsilon^{-2})$ iterations
for the PAM algorithm to find an $\epsilon$-saddle point. We have also proposed the PAME algorithm,
which incorporates an extrapolation technique into PAM. PAME shows significant numerical
improvement over PAM. Results in this paper may shed light on designing new
BCD-type algorithms.
Acknowledgments
Research of Shiqian Ma was supported in part by Office of Naval Research (ONR) grant
N00014-24-1-2705, National Science Foundation (NSF) grants DMS-2243650, CCF-2308597,
CCF-2311275 and ECCS-2326591, UC Davis CeDAR (Center for Data Science and Artificial
Intelligence Research) Innovative Data Science Seed Funding Program, and a startup fund
from Rice University. Research of Lifeng Lai was supported by National Science Foundation
under grants CCF-2112504 and CCF-2232907.
References
Jason Altschuler, Jonathan Niles-Weed, and Philippe Rigollet. Near-linear time approxi-
mation algorithms for optimal transport via Sinkhorn iteration. In Advances in neural
information processing systems, pages 1964–1974, 2017.
David Alvarez-Melis, Stefanie Jegelka, and Tommi S Jaakkola. Towards optimal transport
with global invariances. In The 22nd International Conference on Artificial Intelligence
and Statistics, pages 1870–1879. PMLR, 2019.
Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial
networks. In International conference on machine learning, pages 214–223. PMLR, 2017.
Amir Beck and Luba Tetruashvili. On the convergence of block coordinate descent type
methods. SIAM journal on Optimization, 23(4):2037–2060, 2013.
Marc G Bellemare, Will Dabney, and Rémi Munos. A distributional perspective on rein-
forcement learning. In International Conference on Machine Learning, pages 449–458,
2017.
Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel Peyré.
Iterative Bregman projections for regularized transportation problems. SIAM Journal on
Scientific Computing, 37(2):A1111–A1138, 2015.
Felix Brandt, Vincent Conitzer, Ulle Endriss, Jérôme Lang, and Ariel D Procaccia. Hand-
book of computational social choice. Cambridge University Press, 2016.
Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Ad-
vances in neural information processing systems, pages 2292–2300, 2013.
Jelena Diakonikolas and Lorenzo Orecchia. Alternating randomized block coordinate de-
scent. In International Conference on Machine Learning, pages 1224–1232. PMLR, 2018.
Pavel Dvurechensky, Alexander Gasnikov, and Alexey Kroshnin. Computational optimal
transport: Complexity by accelerated gradient descent is better than by Sinkhorn’s al-
gorithm. In International Conference on Machine Learning, pages 1367–1376. PMLR,
2018.
Aude Genevay, Lénaic Chizat, Francis Bach, Marco Cuturi, and Gabriel Peyré. Sample
complexity of sinkhorn divergences. In The 22nd International Conference on Artificial
Intelligence and Statistics, pages 1574–1583, 2019.
M. Hong, X. Wang, M. Razaviyayn, and Z.-Q. Luo. Iteration complexity analysis of block
coordinate descent method. Mathematical Programming Series A, 163(1):85–114, 2017.
Minhui Huang, Shiqian Ma, and Lifeng Lai. A Riemannian block coordinate descent method
for computing the projection robust Wasserstein distance. In Proceedings of the 38th
International Conference on Machine Learning, volume 139, pages 4446–4455. PMLR,
2021a.
Minhui Huang, Shiqian Ma, and Lifeng Lai. Projection robust Wasserstein barycenters.
In Proceedings of the 38th International Conference on Machine Learning, volume 139,
pages 4456–4465. PMLR, 2021b.
Chi Jin, Praneeth Netrapalli, and Michael I Jordan. Accelerated gradient descent escapes
saddle points faster than gradient descent. In Conference On Learning Theory, pages
1042–1085. PMLR, 2018.
Tianyi Lin, Chenyou Fan, Nhat Ho, Marco Cuturi, and Michael Jordan. Projection robust
Wasserstein distance and Riemannian optimization. In NeurIPS, volume 33, 2020.
Erika Mackin and Lirong Xia. Allocating indivisible items in categorized domains. arXiv
preprint arXiv:1504.05932, 2015.
Hervé Moulin. Fair division and collective welfare. MIT press, 2003.
Sherjil Ozair, Corey Lynch, Yoshua Bengio, Aaron Van den Oord, Sergey Levine, and Pierre
Sermanet. Wasserstein dependency measure for representation learning. In Advances in
Neural Information Processing Systems, pages 15604–15614, 2019.
François-Pierre Paty and Marco Cuturi. Subspace robust Wasserstein distances. In Inter-
national Conference on Machine Learning, pages 5072–5081, 2019.
Meyer Scetbon, Laurent Meunier, Jamal Atif, and Marco Cuturi. Equitable and optimal
transport with multiple agents. In International Conference on Artificial Intelligence and
Statistics, pages 2035–2043. PMLR, 2021.
R. Sinkhorn and P. Knopp. Concerning nonnegative matrices and doubly stochastic matri-
ces. Pacific J. Math., 21:343–348, 1967.
Richard Sinkhorn. Diagonal equivalence to matrices with prescribed row and column sums.
The American Mathematical Monthly, 74(4):402–405, 1967.
Maurice Sion. On general minimax theorems. Pacific Journal of Mathematics, 8(1):171–176,
1958.
Ruoyu Sun and Mingyi Hong. Improved iteration complexity bounds of cyclic block coordi-
nate descent for convex problems. Advances in Neural Information Processing Systems,
28, 2015.
Robert E Tarjan. Dynamic trees as search trees via Euler tours, applied to the network
simplex algorithm. Mathematical Programming, 78(2):169–177, 1997.
Y. Xu and W. Yin. A block coordinate descent method for regularized multi-convex op-
timization with applications to nonnegative tensor factorization and completion. SIAM
Journal on Imaging Sciences, 6(3):1758–1789, 2013.