Rectified Flow: A Marginal Preserving Approach to Optimal Transport
Qiang Liu
University of Texas at Austin
[email protected]
Abstract
We present a flow-based approach to the optimal transport (OT) problem between two continuous
distributions π0, π1 on Rd, of minimizing a transport cost E[c(X1 − X0)] in the set of couplings (X0, X1)
whose marginal distributions on X0, X1 equal π0, π1, respectively, where c is a cost function. Our
method iteratively constructs a sequence of neural ordinary differential equations (ODEs), each learned
by solving a simple unconstrained regression problem, which monotonically reduces the transport cost
while automatically preserving the marginal constraints. This yields a monotonic interior approach that
traverses inside the set of valid couplings to decrease the transport cost, which distinguishes itself from
most existing approaches that enforce the coupling constraints from the outside. The main idea of the
method draws from rectified flow [LGL22], a recent approach that simultaneously decreases the whole
family of transport costs induced by convex functions c (and is hence multi-objective in nature), but is
not tailored to minimize a specific transport cost. Our method is a single-objective variant of rectified flow
that is guaranteed to solve the OT problem for a fixed, user-specified convex cost function c.
1 Introduction
The Monge–Kantorovich (MK) optimal transport (OT) problem concerns finding an optimal coupling be-
tween two distributions π0, π1:

$$\inf_{(X_0, X_1)}\ \mathbb{E}\big[c(X_1 - X_0)\big] \quad \text{s.t.}\quad \mathrm{Law}(X_0) = \pi_0,\ \ \mathrm{Law}(X_1) = \pi_1, \qquad (1)$$
where we seek to find (the law of) an optimal coupling (X0 , X1 ) of π0 and π1 , for which marginal laws
of X0 , X1 equal π0 , π1 , respectively, to minimize E[c(X1 − X0 )], called the c-transport cost, for a cost
function c. Theories, algorithms, and applications of optimal transport have attracted a vast literature; see,
for example, the monographs of [Vil21, Vil09, ABS21, San15, PC+ 19] for overviews. Notably, OT has
been growing into a popular and powerful technique in machine learning, for key tasks such as learning
generative models, transfer learning, and approximate inference [e.g., PC+ 19, ACB17, SRGB14, EMM12,
CFT14, MMPS16].
The OT problem should be treated differently depending on whether π0 , π1 are discrete or continuous mea-
sures. In this work, we focus on the continuous case when π0 , π1 are high dimensional absolutely con-
tinuous measures on Rd that are observed through empirical observations, a setting called data-driven OT
in [TT16]. A well known result in OT [e.g., Vil09] shows that, if π0 is continuous, the optimization in
(1) can be restricted to the set of deterministic couplings satisfying X1 = T (X0 ) for some continuous
transport mapping T : Rd → Rd , which is often approximated in practice with deep neural networks [e.g.,
MTOL20, KLG+ 21, KSB22, HCTC20].
However, continuous OT remains highly challenging computationally. One major difficulty is to handle the
coupling constraints of Law(X0 ) = π0 and Law(X1 ) = π1 , which are infinite dimensional when π0 and π1
are continuous. As a result, (1) can not be solved as a “clean” unconstrained optimization problem. There
are essentially two types of approaches to solving (1) in the literature. One uses Lagrange duality to turn
(1) into a certain minimax game, and the other one approximates the constraint with an integral (often
entropic-like) penalty function. However, the minimax approaches suffer from convergence and instability
issues and are difficult to solve in practice, while the regularization approach can not effectively enforce the
infinite-dimensional coupling constraints.
This work We present a different approach to continuous OT that re-frames (1) into a sequence of simple
unconstrained nonlinear least squares optimization problems, which monotonically reduce the transport cost
of a coupling while automatically preserving the marginal constraints. Different from the minimax and reg-
ularization approaches that enforce the constraints from outside, our method is an interior approach which
starts from a valid coupling (typically the naive independent coupling), and traverses inside the constraint
set to decrease the transport cost. Such an interior approach is non-trivial and has not been realized before,
because there exists no obvious unconstrained parameterization of the set of couplings of π0 and π1 .
Our method is made possible by leveraging rectified flow [LGL22], a recent approach to constructing (non-
optimal) transport maps for generative modeling and domain transfer. What makes rectified flow special is
that it provides a simple procedure that turns a given coupling into a new one that obeys the same marginal
laws, while yielding no worse transport cost w.r.t. all convex functions c simultaneously. Despite this
attractive property, as pointed out in [LGL22], rectified flow can not be used to optimize any fixed cost c, as
it is essentially a special multi-objective optimization procedure that targets no specific cost. Our method is
a variant of rectified flow that targets a user-specified cost function c and hence yields a new approach to the
OT problem (1).
Rectified flow We provide a high-level overview of the rectified flow of [LGL22] and the main results
of this work. For a given coupling (X0 , X1 ) of π0 and π1 , the rectified flow induced by (X0 , X1 ) is the
time-differentiable process Z = {Zt : t ∈ [0, 1]} over an artificial notion of time t ∈ [0, 1], that solves the
following ordinary differential equation (ODE):
$$dZ_t = v_t^X(Z_t)\,dt, \quad t \in [0,1], \quad \text{starting from } Z_0 = X_0, \qquad (2)$$

where the drift $v^X$ is learned by solving the least squares regression problem

$$\inf_{v}\ \int_0^1 \mathbb{E}\Big[\big\|(X_1 - X_0) - v_t(X_t)\big\|^2\Big]\, dt, \qquad (3)$$

and $X_t = tX_1 + (1-t)X_0$ is the linear interpolation between $X_0$ and $X_1$. Eq (3) is a least squares regression problem of predicting the line direction $(X_1 - X_0)$ from every space-time point $(X_t, t)$ on the linear interpolation path, yielding the solution

$$v_t^X(z) = \mathbb{E}[X_1 - X_0 \mid X_t = z],$$

which is the average of the directions $(X_1 - X_0)$ of all lines that pass through the point $X_t = z$ at time $t$. The (conditional)
expectations E[·] above are w.r.t. the randomness of (X0 , X1 ). We assume that the solution of (2) exists
and is unique, and hence vtX (z) is assumed to exist at least on the trajectories of the ODE. The start-end
pair (Z0 , Z1 ) induced by Z is called the rectified coupling of (X0 , X1 ), and we denote it by (Z0 , Z1 ) =
Rectify((X0 , X1 )).
In practice, the expectation E[·] is approximated by empirical observations of (X0 , X1 ), and v is approxi-
mated by a parametric family, such as deep neural networks. In this case, the optimization in Eq (3) can
be solved conveniently with off-the-shelf stochastic optimizers such as stochastic gradient descent (SGD),
without resorting to minimax algorithms or expensive inner loops. This makes rectified flow attractive for
deep learning applications such as those considered in [LGL22].
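For concreteness, the following is a minimal sketch of this regression in PyTorch; the network architecture, optimizer, and hyperparameters are illustrative assumptions and not the setup of [LGL22].

```python
# A minimal sketch (illustrative assumptions, not the implementation of [LGL22]) of
# fitting the rectified flow velocity field v_t(x) by the least squares problem (3).
import torch
import torch.nn as nn

class Velocity(nn.Module):
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, t):
        # v_t(x) is parameterized as a network taking [x, t] as input.
        return self.net(torch.cat([x, t], dim=-1))

def train_rectified_flow(x0, x1, dim, steps=5000, batch=256, lr=1e-3):
    """Fit v by minimizing E || (X1 - X0) - v_t(X_t) ||^2 with X_t = t X1 + (1-t) X0."""
    v = Velocity(dim)
    opt = torch.optim.Adam(v.parameters(), lr=lr)
    n = x0.shape[0]
    for _ in range(steps):
        idx = torch.randint(0, n, (batch,))
        a, b = x0[idx], x1[idx]          # a draw from the current coupling (X0, X1)
        t = torch.rand(batch, 1)         # uniform time in [0, 1]
        xt = t * b + (1 - t) * a         # linear interpolation X_t
        loss = ((b - a) - v(xt, t)).pow(2).sum(-1).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return v
```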
The importance of (Z0 , Z1 ) = Rectify((X0 , X1 )) is justified by two key properties:
1) (Z0 , Z1 ) shares the same marginal laws as (X0 , X1 ) and is hence a valid coupling of π0 and π1 ;
2) (Z0 , Z1 ) yields no larger convex transport costs than (X0 , X1 ), that is, E[c(Z1 − Z0 )] ≤ E[c(X1 − X0 )],
for every convex function c : Rd → R.
Hence, it is natural to recursively apply the Rectify mapping, that is, $(Z_0^{k+1}, Z_1^{k+1}) = \texttt{Rectify}((Z_0^k, Z_1^k))$
starting from $(Z_0^0, Z_1^0) = (X_0, X_1)$, yielding a sequence of couplings that is monotonically non-increasing
in terms of all convex transport costs. The initialization can be taken to be the independent coupling
$(Z_0^0, Z_1^0) \sim \pi_0 \times \pi_1$, or any other coupling that can be constructed from marginal (unpaired) observa-
tions of $\pi_0$ and $\pi_1$. In practice, each step of Rectify is empirically approximated by first drawing samples
of $(Z_0^k, Z_1^k)$ from the ODE with drift $v^k$, and then constructing the next flow $v^{k+1}$ from the optimization
in (3). Although this process accumulates errors, it was shown that one or two iterations are sufficient for
practical applications [LGL22].
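A minimal sketch of this recursive procedure, reusing the hypothetical `train_rectified_flow` and `Velocity` helpers from the previous snippet, is given below; the Euler discretization and the number of iterations are illustrative choices.

```python
# A minimal sketch of recursive rectification ("reflow"): simulate the learned ODE to
# obtain a new coupling, then re-fit the velocity field on that coupling.
import torch

@torch.no_grad()
def sample_ode(v, z0, n_steps=100):
    """Simulate dZ_t = v_t(Z_t) dt from t=0 to t=1 with the Euler method."""
    z, dt = z0.clone(), 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((z.shape[0], 1), i * dt)
        z = z + v(z, t) * dt
    return z

def reflow(x0, x1, dim, n_iters=2):
    """Apply (Z0^{k+1}, Z1^{k+1}) = Rectify((Z0^k, Z1^k)) for a few iterations."""
    z0, z1 = x0, x1                     # e.g., an independent coupling of pi_0, pi_1
    for _ in range(n_iters):
        v = train_rectified_flow(z0, z1, dim)   # assumed helper from the previous sketch
        z1 = sample_ode(v, z0)                  # new coupling (Z0, Z1) with Z1 = ODE(Z0)
    return z0, z1, v
```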
Note that the Rectify procedure is “cost-agnostic” in that it does not depend on any specific cost c.
Although the recursive Rectify update is monotonically non-increasing in the transport cost for all convex
c, it does not necessarily converge to the optimal coupling for any pre-specified c, as the update would
stop whenever two cost functions conflict with each other. In [LGL22], a coupling (X0, X1) is
called straight if it is a fixed point of Rectify, that is, (X0, X1) = Rectify((X0, X1)). It was shown
that rectifiable couplings that are optimal w.r.t. a convex c must be straight, but the opposite is not true in
general. One exception is the one-dimensional case (d = 1), for which all convex functions c (for which a
c-optimal coupling exists) share a common optimal coupling that is also straight. But this does not hold when d ≥ 2.
c-Rectified flow In this work, we modify the Rectify procedure so that it can be used to solve (1) given a
user-specified cost function c. We show that this can be done easily by properly restricting the optimization
domain of v and modifying the loss function in (3). The case of quadratic cost $c(x) = \frac12\|x\|^2$ is particularly
simple, for which we simply need to restrict $v$ to be a gradient field $v_t = \nabla f_t$ in the optimization of (3).
For more general convex c, we need to restrict v to have the form $v_t(x) = \nabla c^*(\nabla f_t(x))$, with f minimizing
the following loss function:

$$\inf_f\ \int_0^1 \mathbb{E}\Big[c^*(\nabla f_t(X_t)) - (X_1 - X_0)^\top \nabla f_t(X_t) + c(X_1 - X_0)\Big]\, dt, \qquad (4)$$

where $c^*$ denotes the convex conjugate of c. Obviously, when $c(x) = \frac12\|x\|^2$, (4) reduces to (3) with
$v = \nabla f$. The loss function in (4) is closely related to the Bregman divergence [e.g., BMD+05] and the so-
called matching loss [e.g., AHW95]. We call $Z = \{Z_t : t \in [0,1]\}$ that follows $dZ_t = \nabla c^*(\nabla f_t(Z_t))\,dt$
with Z0 = X0 and f solving (4) the c-rectified flow of (X0 , X1 ), and the corresponding (Z0 , Z1 ) the
c-rectified coupling of (X0 , X1 ), denoted as (Z0 , Z1 ) = c-Rectify((X0 , X1 )).
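The following is a minimal sketch (under assumed network and training choices, not the paper's code) of the objective (4): a scalar potential $f_t$ is parameterized and its gradient is obtained by automatic differentiation, so that the induced velocity is $\nabla c^*(\nabla f_t)$.

```python
# A minimal sketch of the c-rectified flow loss (4) with a learned potential f_t(x).
import torch
import torch.nn as nn

class Potential(nn.Module):
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def grad(self, x, t):
        # Return grad_x f_t(x) via autograd; keep the graph so the loss can backprop.
        x = x.detach().requires_grad_(True)
        f = self.net(torch.cat([x, t], dim=-1)).sum()
        return torch.autograd.grad(f, x, create_graph=True)[0]

def c_rectified_loss(f, x0, x1, c, c_conj):
    """One Monte Carlo estimate of Eq (4) for a convex cost c with conjugate c_conj."""
    t = torch.rand(x0.shape[0], 1)
    xt = t * x1 + (1 - t) * x0            # linear interpolation X_t
    g = f.grad(xt, t)                     # grad f_t(X_t)
    d = x1 - x0                           # line direction X1 - X0
    return (c_conj(g) - (d * g).sum(-1) + c(d)).mean()

# For the quadratic cost, c = c* = 0.5||x||^2 and the learned velocity is grad f_t:
quad = lambda x: 0.5 * x.pow(2).sum(-1)
# loss = c_rectified_loss(Potential(dim), x0_batch, x1_batch, quad, quad)
```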
Similar to the original rectified coupling, the c-rectified coupling (Z0, Z1) also shares the same marginal laws
as (X0 , X1 ) and hence is a coupling of π0 and π1 . In addition, (Z0 , Z1 ) yields no larger transport cost than
(X0 , X1 ) w.r.t. c, that is, E[c(Z1 − Z0 )] ≤ E[c(X1 − X0 )]. But this only holds for the specific c that is used
to define the flow, rather than all convex functions like Rectify.
More importantly, recursively performing c-Rectify allows us to find c-optimal couplings that solve the
OT problem (1). Under mild conditions, we have that $(X_0, X_1)$ is a c-optimal coupling if and only if $\ell^*_{X,c} = 0$,
where $\ell^*_{X,c}$ denotes the minimum value of the loss function in (4), which provides a criterion of c-optimality
of a given coupling without solving the OT problem. Moreover, when following the recursive update
$(Z_0^{k+1}, Z_1^{k+1}) = \texttt{c-Rectify}((Z_0^k, Z_1^k))$, the value $\ell^*_{Z^k,c}$ is guaranteed to decay to zero with
$\min_{k\le K} \ell^*_{Z^k,c} = O(1/K)$.
Notation Let $C^1(\mathbb{R}^d)$ be the set of continuously differentiable functions $f : \mathbb{R}^d \to \mathbb{R}$, and $C_c^1(\mathbb{R}^d)$ the
functions in $C^1(\mathbb{R}^d)$ whose support is compact. For a time-dependent velocity field $v : \mathbb{R}^d \times [0,1] \to \mathbb{R}^d$, we
write $v_t(x) = v(x, t)$ and use $\dot v_t(x) := \partial_t v(x, t)$ and $\nabla v_t(x) := \partial_x v(x, t)$ to denote the partial derivatives w.r.t.
time $t$ and variable $x$, respectively. We denote by $C^{2,1}(\mathbb{R}^d \times [0,1])$ the set of functions $f : \mathbb{R}^d \times [0,1] \to \mathbb{R}$
that are second-order continuously differentiable w.r.t. $x$ and first-order continuously differentiable w.r.t. $t$.
In this work, an ordinary differential equation (ODE) $dz_t = v_t(z_t)\,dt$ should be interpreted as the integral
equation $z_t = z_0 + \int_0^t v_s(z_s)\,ds$. For $x \in \mathbb{R}^d$, $\|x\|$ denotes the Euclidean norm. We always write $c^*$ for the
convex conjugate of $c : \mathbb{R}^d \to \mathbb{R}$, that is, $c^*(x) = \sup_{y\in\mathbb{R}^d}\{x^\top y - c(y)\}$.
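As a simple worked example of the conjugate (a standard computation, not part of the original text): for $c(x) = \frac{1}{p}\|x\|^p$ with $p > 1$,

$$c^*(y) = \frac{1}{q}\|y\|^q, \qquad \nabla c^*(y) = \|y\|^{q-2}\, y, \qquad \frac{1}{p} + \frac{1}{q} = 1,$$

which recovers $c^* = c$ and $\nabla c^* = \mathrm{id}$ in the quadratic case $p = q = 2$.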
Random variables are capitalized (e.g., X, Y, Z) to distinguish them from deterministic values (e.g., x, y, z).
Recall that an Rd-valued random variable X = X(ω) is a measurable function X : Ω → Rd, where Ω is an
underlying sample space equipped with a σ-algebra F and a probability measure P. The triplet (Ω, F, P)
forms the underlying probability space, which is omitted in most places. We use Law(X) to
denote the probability law of X, which is the probability measure L that satisfies L(B) = P({ω : X(ω) ∈
B}) for all measurable sets on Rd . For a functional F (X) of a random variable X, the optimization problem
minX F (X) technically means to find a measurable function X(ω) to minimize F , even though we omit the
underlying sample space Ω. When F (X) depends on X only through Law(X), the optimization problem is
equivalent to finding the optimal Law(X).
Outline The rest of the work is organized as follows. Section 2 introduces the background of optimal
transport. Section 3 reviews rectified flow of [LGL22] from an optimization-based view. Section 4 charac-
terizes the if and only if condition for two differentiable stochastic processes to have equal marginal laws.
Section 5 introduces the main c-rectified flow method and establishes its theoretical properties.
flow approach. The readers can find systematic introductions to OT in a collection of excellent textbooks
[Vil21, FG21, ABS21, PC+ 19, OPV14, San15, Vil09].
Static formulations The optimal transport problem was first formulated by Gaspard Monge in 1781 when
he studied the problem of how to redistribute mass, e.g., a pile of soil, with minimal effort. Monge’s problem
can be formulated as

$$\inf_{T}\ \mathbb{E}\big[c\big(T(X_0) - X_0\big)\big] \quad \text{s.t.}\quad \mathrm{Law}(X_0) = \pi_0,\ \ \mathrm{Law}(T(X_0)) = \pi_1, \qquad (5)$$
where we minimize the c-transport cost in the set of deterministic couplings (X0 , X1 ) that satisfy X1 =
T (X0 ) for a transport mapping T : Rd → Rd . The Monge–Kantorovich (MK) problem in (1) is the relax-
ation of (5) to the set of all (deterministic and stochastic) couplings of π0 and π1 . The two problems are
equivalent when the optimum of (1) is achieved by a deterministic coupling, which is guaranteed if π0 is an
absolutely continuous measure on Rd .
A key feature of the MK problem is that it is a linear program w.r.t. the law of the coupling (X0, X1),
and yields a dual problem of the form

$$\sup_{\mu,\nu}\ \pi_0(\mu) + \pi_1(\nu) \quad \text{s.t.}\quad \mu(x_0) + \nu(x_1) \le c(x_1 - x_0),\ \ \forall x_0, x_1 \in \mathbb{R}^d, \qquad (6)$$

where we write $\pi_i(\mu) := \int \mu(x)\, d\pi_i(x)$, and $\mu, \nu$ are optimized over all functions from $\mathbb{R}^d$ to $\mathbb{R}$. For any
coupling (X0, X1) of π0 and π1, and (µ, ν) satisfying the constraint in (6), it is easy to see that

$$\mathbb{E}\big[c(X_1 - X_0)\big] \ \ge\ \mathbb{E}\big[\mu(X_0) + \nu(X_1)\big] = \pi_0(\mu) + \pi_1(\nu). \qquad (7)$$
As the left side of (7) only depends on (X0 , X1 ) and the right side only on (µ, ν), one can show that
(X0 , X1 ) is c-optimal and (µ, ν) solves (6) iff µ(X0 ) + ν(X1 ) = c(X1 − X0 ) holds with probability one,
which provides a basic optimality criterion. Many existing OT algorithms are developed by exploiting the
primal dual relation of (1) and (6) (see e.g., [KSB22]), but have the drawback of yielding minimax problems
that are challenging to solve in practice.
If c is strictly convex, the optimal transport map of (5) is unique (almost surely) and has the form

$$T(x) = x + \nabla c^*\big(\nabla \nu(x)\big),$$

where $c^*$ is the convex conjugate function of c, and ν is an optimal solution of (6), which is c-convex in that
$\nu(x) = \sup_y\{-c(y - x) + \mu(y)\}$ with µ the associated solution. In the canonical case of quadratic cost
$c(x) = \frac12\|x\|^2$, we can write $T(x) = \nabla\phi(x)$, where $\phi(x) := \frac12\|x\|^2 + \nu(x)$ is a convex function.
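As a further illustration of the quadratic case (a classical closed-form result, stated here for reference): if $\pi_0 = N(m_0, \Sigma_0)$ and $\pi_1 = N(m_1, \Sigma_1)$ with $\Sigma_0 \succ 0$, the optimal map for $c(x) = \frac12\|x\|^2$ is the affine map

$$T(x) = m_1 + A\,(x - m_0), \qquad A = \Sigma_0^{-1/2}\big(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2}\big)^{1/2}\Sigma_0^{-1/2},$$

which is indeed of the form $T = \nabla\phi$ with the convex potential $\phi(x) = m_1^\top x + \frac12(x - m_0)^\top A\,(x - m_0)$.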
Dynamic formulations Both the MK and Monge problems can be equivalently framed in dynamic ways
as finding continuous-time processes that transfer π0 to π1 . Let {xt : t ∈ [0, 1]} be a smooth path connecting
x0 and x1 , whose time derivative is denoted as ẋt . For convex c, by Jensen’s inequality, we can represent
the cost c(x1 − x0 ) in an integral form:
$$c(x_1 - x_0) = c\Big(\int_0^1 \dot x_t\, dt\Big) = \inf_{x}\int_0^1 c(\dot x_t)\, dt,$$
where the infimum is attained when xt is the linear interpolation (geodesic) path: xt = tx1 + (1 − t)x0 .
Hence, the MK optimal transport problem (1) is equivalent to
$$\inf_{X}\ \int_0^1 \mathbb{E}\big[c(\dot X_t)\big]\, dt \quad \text{s.t.}\quad \mathrm{Law}(X_0) = \pi_0,\ \ \mathrm{Law}(X_1) = \pi_1, \qquad (8)$$
where we optimize in the set of time-differentiable stochastic processes X = {Xt : t ∈ [0, 1]}. The op-
timum of (8) is attained by Xt = tX1 + (1 − t)X0 when (X0 , X1 ) is a c-optimal coupling of (1), which
is known as the displacement interpolation [McC97]. We call the objective function in (8) the path-wise
c-transport cost.
The Monge problem can also be framed in a dynamic way. Assume the transport map T can be induced by
an ODE model dXt = vt (Xt )dt such that X1 = T (X0 ). Then the Monge problem is equivalent to
$$\inf_{v, X}\ \int_0^1 \mathbb{E}\big[c(v_t(X_t))\big]\, dt \quad \text{s.t.}\quad dX_t = v_t(X_t)\,dt,\ \ \mathrm{Law}(X_0) = \pi_0,\ \ \mathrm{Law}(X_1) = \pi_1, \qquad (9)$$
which is equivalent to restricting X in (8) to the set of processes that can be induced by ODEs.
Assume that Xt following dXt = vt (Xt )dt yields a density function ̺t . Then it is well known that ̺t
satisfies the continuity equation:
̺˙ t + ∇ · (vt ̺t ) = 0.
Hence, we can rewrite (9) into an optimization problem on (v, ̺), yielding the celebrated Benamou-Brenier
formula [BB00]:
$$\inf_{v, \varrho}\ \int_0^1\!\!\int c(v_t(x))\,\varrho_t(x)\,dx\,dt \quad \text{s.t.}\quad \dot\varrho_t + \nabla\cdot(v_t\varrho_t) = 0,\ \ \varrho_0 = d\pi_0/dx,\ \ \varrho_1 = d\pi_1/dx, \qquad (10)$$
where dπi/dx denotes the density function of πi. The key idea of (9) and (10) is to restrict the optimization
of (8) to the set of deterministic processes induced by ODEs, which significantly reduces the search space.
Intuitively, Jensen’s inequality E[c(Z)] ≥ c(E[Z]) shows that we should be able to reduce the expected cost
of a stochastic process by “marginalizing” out the randomness. In fact, we will show that, for a differentiable
stochastic process X, its (c-)rectified flow yields no larger path-wise c-transport cost in (8) than X (see
Lemma 3.3 and Theorem 5.3).
However, all the dynamic formulations above are still highly challenging to solve in practice. We will show
that c-rectified flow can be viewed as a special coordinate descent like approach to solving (8) (Section 5.4).
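As a small illustration (our own sketch, not a method from the paper), the path-wise cost $\int_0^1 \mathbb{E}[c(v_t(X_t))]\,dt$ appearing in (9)-(10) can be estimated by Monte Carlo along an Euler discretization of the ODE:

```python
# Illustrative Monte Carlo / Euler estimate of the path-wise transport cost in Eq (9).
import torch

def pathwise_cost(v, x0, c, n_steps=100):
    """Estimate int_0^1 E[c(v_t(X_t))] dt along dX_t = v_t(X_t) dt started from x0."""
    x, dt, total = x0.clone(), 1.0 / n_steps, 0.0
    for i in range(n_steps):
        t = torch.full((x.shape[0], 1), i * dt)
        vel = v(x, t)
        total += c(vel).mean().item() * dt   # accumulate E[c(v_t(X_t))] * dt
        x = x + vel * dt                     # Euler step of the ODE
    return total

# For c(x) = 0.5||x||^2 this approximates the kinetic energy in the
# Benamou-Brenier formula (10).
```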
where Ẋt denotes the time derivative of Xt . Obviously, v X is the solution of
$$\inf_v\ L_X(v) := \int_0^1 \mathbb{E}\Big[\big\|\dot X_t - v_t(X_t)\big\|^2\Big]\, dt, \qquad (12)$$
where the optimization is over the set of all measurable velocity fields $v : \mathbb{R}^d \times [0,1] \to \mathbb{R}^d$. The importance of $v^X$
lies in the fact that it characterizes the time-evolution of the marginal laws $\rho_t := \mathrm{Law}(X_t)$ of X, through
the continuity equation in the distributional sense:

$$\dot\rho_t + \nabla\cdot\big(v_t^X \rho_t\big) = 0. \qquad (13)$$
Precisely, Equation (13) should be interpreted in its weak, integral form:

$$\rho_t(h) - \rho_0(h) - \int_0^t \rho_s\big(\nabla h^\top v_s^X\big)\, ds = 0, \qquad \rho_0 = \mathrm{Law}(X_0), \quad \forall h \in C_c^1(\mathbb{R}^d),\ t \in [0,1], \qquad (14)$$

where $\rho_t(h) := \int h(x)\, d\rho_t(x)$ and $C_c^1(\mathbb{R}^d)$ denotes the set of continuously differentiable functions on $\mathbb{R}^d$
with compact support. Hence, if the solution of Eq (13)-(14) is unique, then the marginal laws $\{\mathrm{Law}(X_t)\}_t$
of X are uniquely determined by $v^X$ and the initial law $\mathrm{Law}(X_0)$.
We define the rectified flow of X, denoted by $Z = \texttt{Rectflow}(X)$, as the ODE driven by $v^X$:

$$dZ_t = v_t^X(Z_t)\, dt, \quad t \in [0,1], \quad \text{starting from } Z_0 = X_0. \qquad (15)$$
Moreover, the rectified flow of a coupling (X0 , X1 ) is defined as the rectified flow of X when X is the
linear interpolation of (X0 , X1 ).
Definition 3.1. A stochastic process X is called rectifiable if $v^X$ exists and is locally bounded, and Equa-
tion (15) has a unique solution.
A coupling (X0 , X1 ) is called rectifiable if its linear interpolation process X, following Xt = tX1 + (1 −
t)X0 , is rectifiable. In this case, we call Z = Rectflow(X) the rectified flow of (X0 , X1 ), and write it
(with an abuse of notation) as Z = Rectflow((X0 , X1 )). The corresponding (Z0 , Z1 ) is called the rectified
coupling of (X0 , X1 ), denoted as (Z0 , Z1 ) = Rectify((X0 , X1 )).
By the definition in (15), we have v Z = v X , and hence the marginal laws Law(Zt ) of Z are governed by
the same continuity equation (13)-(14), which is a well known fact. As shown in [Kur11], Equation (15)
has a unique solution iff Equation (14) has a unique solution, which implies that Z and X share the same
marginal laws. We also assume that the solution of (12) is unique; if not, the results in the paper hold for all
solutions of (12).
Theorem 3.2 (Theorem 3.3 of [LGL22]). Assume that X is rectifiable. We have

$$\mathrm{Law}(Z_t) = \mathrm{Law}(X_t), \quad \forall t \in [0, 1].$$
Hence, rectified flow turns a rectifiable stochastic process into a flow while preserving the marginal laws.
An optimization view of rectified flow We show that the rectified flow Z of X achieves the minimum
of the path-wise c-transport cost in the set of time-differentiable stochastic processes whose expected ve-
locity field equals $v^X$. This explains the property of non-increasing convex transport costs of rectified
flow/coupling.
Lemma 3.3. The rectified flow $Z = \texttt{Rectflow}(X)$ in (15) attains the minimum of

$$\inf_Y\ F_c(Y) := \int_0^1 \mathbb{E}\big[c(\dot Y_t)\big]\, dt, \quad \text{s.t.}\quad v^Y = v^X, \qquad (16)$$
Proof. For any stochastic process Y with $v_t^X(z) = v_t^Y(z) = \mathbb{E}[\dot Y_t \mid Y_t = z]$, we have

$$\begin{aligned}
F_c(Y) &= \int_0^1 \mathbb{E}\big[c(\dot Y_t)\big]\, dt \\
&\ge \int_0^1 \mathbb{E}\big[c\big(\mathbb{E}[\dot Y_t \mid Y_t]\big)\big]\, dt && \text{(Jensen's inequality)} \\
&= \int_0^1 \mathbb{E}\big[c\big(v_t^Y(Y_t)\big)\big]\, dt \\
&= \int_0^1 \mathbb{E}\big[c\big(v_t^X(X_t)\big)\big]\, dt && (v^X = v^Y \text{, and hence } \mathrm{Law}(X_t) = \mathrm{Law}(Y_t)) \\
&= \int_0^1 \mathbb{E}\big[c\big(v_t^X(Z_t)\big)\big]\, dt && (\mathrm{Law}(X_t) = \mathrm{Law}(Z_t)) \\
&= \int_0^1 \mathbb{E}\big[c(\dot Z_t)\big]\, dt = F_c(Z).
\end{aligned}$$
Lemma 3.3 suggests that the rectified flow decreases the path-wise c-transport cost: $F_c(Z) \le F_c(X)$, for
all convex c. Note that $\mathbb{E}[c(Z_1 - Z_0)] \le F_c(Z)$ by Jensen's inequality, and $\mathbb{E}[c(X_1 - X_0)] = F_c(X)$ if
X is the linear interpolation of $(X_0, X_1)$. Hence, in this case, we have

$$\mathbb{E}[c(Z_1 - Z_0)] \ \le\ F_c(Z) \ \le\ F_c(X) \ =\ \mathbb{E}[c(X_1 - X_0)],$$

which yields a proof of Theorem 3.2 of [LGL22] that the rectified coupling $(Z_0, Z_1)$ yields no larger convex
transport costs than $(X_0, X_1)$.
A primal-dual relation Let us generalize the least squares loss $L_X(v)$ in (12) to a Bregman divergence
based loss:

$$\tilde L_{X,c}(v) := \int_0^1 \mathbb{E}\Big[b_c\big(\dot X_t;\ v_t(X_t)\big)\Big]\, dt, \qquad b_c(x; y) = c(x) - c(y) - (x - y)^\top\nabla c(y),$$

where $b_c(\cdot\,;\cdot)$ is the Bregman divergence w.r.t. c. The least squares loss $L_X$ is recovered with $c(x) = \frac12\|x\|^2$.
Rectified flow can be alternatively implemented by minimizing $\tilde L_{X,c}$ with a differentiable strictly convex c,
as in this case the minimum of $\tilde L_{X,c}$ is also attained by $v_t^X(z) = \mathbb{E}[\dot X_t \mid X_t = z]$. The c-rectified flow is
obtained if we minimize $\tilde L_{X,c}$ with v restricted to the form $v_t = \nabla c^* \circ \nabla f_t$. See more in Section 5.
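As a quick sanity check (a standalone snippet with illustrative choices), $b_c$ can be computed with automatic differentiation, and for the quadratic cost it reduces to the squared error, recovering $L_X$:

```python
# Numeric check that b_c(x; y) = c(x) - c(y) - (x - y)^T grad c(y), and that it equals
# the least squares error 0.5||x - y||^2 for the quadratic cost c(x) = 0.5||x||^2.
import torch

def bregman(c, x, y):
    y = y.detach().requires_grad_(True)
    grad_c = torch.autograd.grad(c(y).sum(), y)[0]
    return c(x) - c(y) - ((x - y) * grad_c).sum(-1)

quad = lambda z: 0.5 * z.pow(2).sum(-1)
x, y = torch.randn(4, 3), torch.randn(4, 3)
assert torch.allclose(bregman(quad, x, y), 0.5 * (x - y).pow(2).sum(-1), atol=1e-6)
```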
In the following, we show that the optimization in (16) can be viewed as the dual problem (11).
Theorem 3.4. For any differentiable convex function c and rectifiable process X, we have

$$\inf_v\ \tilde L_{X,c}(v) = \int_0^1 \mathbb{E}\big[\mathrm{var}_c(\dot X_t \mid X_t)\big]\, dt = F_c(X) - \inf_Y\big\{F_c(Y) : v^Y = v^X\big\}.$$
Proof. Let $\mathrm{var}_c(\dot X_t \mid X_t) := \mathbb{E}\big[c(\dot X_t) - c(\mathbb{E}[\dot X_t \mid X_t]) \,\big|\, X_t\big]$. For any v, we have

$$\begin{aligned}
\tilde L_{X,c}(v) &= \int_0^1 \mathbb{E}\Big[c(\dot X_t) - c(v_t(X_t)) - \big(\dot X_t - v_t(X_t)\big)^\top\nabla c(v_t(X_t))\Big]\, dt \\
&= \int_0^1 \mathbb{E}\Big[c(\dot X_t) - c(v_t(X_t)) - \big(v_t^X(X_t) - v_t(X_t)\big)^\top\nabla c(v_t(X_t))\Big]\, dt && (v_t^X(X_t) = \mathbb{E}[\dot X_t \mid X_t]) \\
&\ge \int_0^1 \mathbb{E}\Big[c(\dot X_t) - c\big(v_t^X(X_t)\big)\Big]\, dt && \big(c(v^X) \ge c(v) + (v^X - v)^\top\nabla c(v)\big) \\
&= \int_0^1 \mathbb{E}\big[\mathrm{var}_c(\dot X_t \mid X_t)\big]\, dt.
\end{aligned}$$
Similarly, if X is the linear interpolation of the coupling (X0 , X1 ), then ℓ̃∗X,c = 0 with strictly convex c if
and only if (X0 , X1 ) is a fixed point of the Rectify mapping, that is, (X0 , X1 ) = Rectify((X0 , X1 )),
following [LGL22]. Such couplings are called straight, or fully rectified in [LGL22]. Obtaining straight
couplings is useful for learning fast ODE models because the trajectories of the associated rectified flow
Z are straight lines and hence can be calculated in closed form without iterative numerical solvers. See
[LGL22] for more discussion.
Moreover, [LGL22] showed that rectifiable c-optimal couplings must be straight. In the one-dimensional
case (d = 1), the straight coupling, if it exists, is unique and attains the minimum of $\mathbb{E}[c(X_1 - X_0)]$ for
all convex functions c for which a c-optimal coupling exists. For higher dimensions (d ≥ 2), however, straight
couplings are not unique, and the specific straight coupling obtained at the convergence of the recursive
Rectify update (i.e. (Z0k+1 , Z1k+1 ) = Rectify((Z0k , Z1k ))) is implicitly determined by the initial coupling
(Z00 , Z10 ), and is not expected to be optimal w.r.t. any pre-fixed c.
The following counterexample shows a somewhat stronger negative result: there exist straight couplings
that are not optimal w.r.t. any second-order differentiable convex function with an everywhere-invertible Hessian matrix.
Example 3.5. Take $\pi_0 = \pi_1 = N(0, I)$. Hence, for $c(x) = \|x\|^p$ with $p > 0$, the c-optimal coupling is the
trivial identity coupling $(X_0, X_0)$ with $X_0 \sim \pi_0$.
However, consider the coupling $(X_0, AX_0)$, where A is a non-identity and non-reflecting rotation matrix
(namely $A^\top A = I$, $\det(A) = 1$, $A \ne I$, and A does not have $\lambda = -1$ as an eigenvalue). Then $(X_0, AX_0)$
is a straight coupling of $\pi_0$ and $\pi_1$, but it is not c-optimal for any second-order differentiable convex function
c whose Hessian matrix is invertible everywhere. See the Appendix for the proof.
It is the rotation transform that makes (X0 , AX0 ) sub-optimal, which is removed in the proposed c-rectified
flow in Section 5 via a Helmholtz like decomposition.
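A short numeric illustration of Example 3.5 (our own check, with an arbitrary rotation angle): the rotated coupling preserves the Gaussian marginals yet pays a strictly positive quadratic transport cost, whereas the optimal identity coupling pays zero.

```python
# Numeric illustration of Example 3.5: (X0, A X0) is marginal-preserving but suboptimal.
import math
import torch

torch.manual_seed(0)
d, n = 2, 100_000
theta = math.pi / 3
A = torch.tensor([[math.cos(theta), -math.sin(theta)],
                  [math.sin(theta),  math.cos(theta)]])
x0 = torch.randn(n, d)
x1 = x0 @ A.T                                  # X1 = A X0, still distributed as N(0, I)
cost = 0.5 * (x1 - x0).pow(2).sum(-1).mean()   # E[c(X1 - X0)] with c = 0.5||.||^2
# Analytically this equals 0.5 * trace((A - I)^T (A - I)) = d * (1 - cos(theta)) = 1.0,
# whereas the identity coupling (X0, X0) has zero cost.
print(float(cost))
```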
which yields a dynamic OT problem with a continuum of marginal constraints. In Section 5, we show that
the solution of (17) yields our c-rectified flow, which solves the OT problem at its fixed point. Solving (17)
allows us to remove the rotational components of $v^X$, which is what renders rectified flow non-optimal.
In this section, we first characterize the necessary and sufficient condition for having equivalent marginal
laws.
Definition 4.1. A time-dependent vector field r : Rd × [0, 1] → Rd is said to be X-marginal-preserving if
$$\int_0^t \mathbb{E}\big[\nabla h(X_s)^\top r_s(X_s)\big]\, ds = 0, \quad \forall t \in [0, 1],\ h \in C_c^1(\mathbb{R}^d). \qquad (18)$$
Equation (18) is equivalent to saying that $\mathbb{E}[\nabla h(X_t)^\top r_t(X_t)] = 0$ holds almost surely assuming that t is a
random variable following Uniform([0, 1]) (i.e., t-almost surely). Suppose that $\rho_t = \mathrm{Law}(X_t)$ has a density
function $\varrho_t$. Using integration by parts, we have

$$0 = \mathbb{E}\big[\nabla h(X_t)^\top r_t(X_t)\big] = \int \nabla h(x)^\top r_t(x)\,\varrho_t(x)\, dx = -\int h(x)\,\nabla\cdot\big(r_t(x)\varrho_t(x)\big)\, dx, \quad \forall h \in C_c^1(\mathbb{R}^d),$$

which gives $\nabla\cdot(r_t\varrho_t) = 0$. This says that $r_t\varrho_t$ is a rotation-only (or divergence-free) vector field in the
classical sense.
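As a concrete example of such a marginal-preserving field (our own illustration): if $X_t \sim N(0, I)$ so that $\varrho_t(x) \propto \exp(-\|x\|^2/2)$, and $r_t(x) = Jx$ with $J^\top = -J$ antisymmetric, then

$$\nabla\cdot\big(r_t\,\varrho_t\big) = \varrho_t\,\mathrm{tr}(J) - \varrho_t\, x^\top J x = 0,$$

since $\mathrm{tr}(J) = 0$ and $x^\top J x = 0$. Such purely rotational components are exactly what the c-rectified flow of Section 5 removes via the (Bregman) Helmholtz decomposition.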
Lemma 4.2. Let X and Y be two stochastic processes with the same initial distributions Law(X0 ) =
Law(Y0 ). Assume that X is rectifiable, and vtY (z) := E[Ẏt |Yt = z] exists and is locally bounded.
Then X and Y share the same marginal laws at all times, that is, Law(Xt) = Law(Yt), ∀t ∈ [0, 1], if and
only if v X − v Y is Y -marginal-preserving.
which suggests that ρ̃t := Law(Yt ) solves the same continuity equation (20), starting from the same ini-
tialization as Law(X0 ) = Law(Y0 ). Hence, we have ρt = ρ̃t if the solution of (20) is unique, which is
equivalent to the uniqueness of the solution of dZt = vtX (Zt ) in (15) following Corollary 1.3 of [Kur11].
On the other hand, if $\rho_t = \mathrm{Law}(X_t) = \mathrm{Law}(Y_t) = \tilde\rho_t$, then following (19) and (21), we have for any
$h \in C_c^1(\mathbb{R}^d)$,

$$0 = \int_0^t \tilde\rho_s\big(\nabla h^\top v_s^X\big) - \tilde\rho_s\big(\nabla h^\top v_s^Y\big)\, ds = \int_0^t \mathbb{E}\Big[\nabla h(Y_s)^\top \big(v_s^X(Y_s) - v_s^Y(Y_s)\big)\Big]\, ds,$$

which shows that $v^X - v^Y$ is Y-marginal-preserving following Definition 4.1.
5 c-Rectified Flow
We introduce c-rectified flow, a c-dependent variant of rectified flow that is guaranteed to minimize the c-
transport cost when applied recursively. This section is organized as follows: Section 5.1 defines and dis-
cusses the c-rectified flow of a differentiable stochastic process X, which we show yields the solution of the
infinite-marginal OT problem (17). Section 5.2 considers the c-rectified flow of a coupling (X0, X1), which
we show does not increase the c-transport cost. Section 5.3 proves that the fixed points of c-Rectify
are c-optimal. Section 5.4 interprets c-rectified flow as an alternating direction descent method for the
dynamic OT problem (8), and as a majorize-minimization (MM) algorithm for the static OT problem (1). Sec-
tion 5.5 discusses a key lemma relating c-optimal couplings and their associated displacement interpolations
to the Hamilton-Jacobi equation.
where $c^*(x) := \sup_y\{x^\top y - c(y)\}$ is the convex conjugate of c, and $f^{X,c} : \mathbb{R}^d \times [0,1] \to \mathbb{R}$ is the optimal
solution of

$$\inf_f\ L_{X,c}(f) := \int_0^1 \mathbb{E}\Big[m_c\big(\dot X_t;\ \nabla f_t(X_t)\big)\Big]\, dt, \qquad (23)$$

where $m_c(x; y) := c^*(y) - x^\top y + c(x)$ is the matching loss associated with c.
Bregman divergence, Helmholtz decomposition, marginal preserving We can equivalently write (23)
using the Bregman divergence associated with c, that is, $b_c(x; y) = c(x) - c(y) - (x - y)^\top\nabla c(y)$ from Section 3.
Then it is easy to see that $m_c(x; y) = b_c(x; \nabla c^*(y))$, by using the facts that $\nabla c(\nabla c^*(y)) = y$ and $c^*(y) =
y^\top\nabla c^*(y) - c(\nabla c^*(y))$. Hence, $m_c$ and $b_c$ are equivalent up to the monotonic transform $\nabla c^*$ on y. The
minimum $b_c(x; y) = 0$ is achieved when $y = x$, while $m_c(x; y) = 0$ is achieved when $\nabla c^*(y) = x$.
Therefore, (23) is equivalent to

$$\inf_f\ \int_0^1 \mathbb{E}\Big[b_c\big(\dot X_t;\ g_t(X_t)\big)\Big]\, dt, \quad \text{with } g_t = \nabla c^* \circ \nabla f_t. \qquad (24)$$
Moreover, the generalized Pythagorean theorem for Bregman divergences (e.g., [BMD+05]) gives

$$\mathbb{E}\Big[b_c\big(\dot X_t;\ g_t(X_t)\big) \,\Big|\, X_t\Big] = b_c\Big(\mathbb{E}[\dot X_t \mid X_t];\ g_t(X_t)\Big) + \mathbb{E}\Big[b_c\big(\dot X_t;\ \mathbb{E}[\dot X_t \mid X_t]\big) \,\Big|\, X_t\Big]. \qquad (25)$$

Because $v_t^X(X_t) = \mathbb{E}[\dot X_t \mid X_t]$ and the last term of (25) is independent of $g_t$, we can further reframe (23)
into

$$\min_f\ \int_0^1 \mathbb{E}\Big[b_c\big(v_t^X(X_t);\ g_t(X_t)\big)\Big]\, dt, \quad \text{with } g_t = \nabla c^* \circ \nabla f_t, \qquad (26)$$
which can be viewed as projecting the expected velocity $v_t^X$ onto the set of velocity fields of the form
$g_t = \nabla c^* \circ \nabla f_t$, w.r.t. the Bregman divergence. This yields an orthogonal decomposition of $v_t^X$:

$$v_t^X = \nabla c^*\big(\nabla f_t^{X,c}\big) + r_t^{X,c}, \qquad (27)$$

where $r_t^{X,c} := v_t^X - \nabla c^* \circ \nabla f_t^{X,c}$ is the residual term. The key result below shows that $r^{X,c}$ is X-
marginal-preserving, which ensures that the c-rectified flow preserves the marginals of X.
Definition 5.1. We say that X is c-rectifiable if v X exists, the minimum of (23) exists and is attained by a
locally bounded function f X,c , and the solution of Equation (22) exists and is unique.
Theorem 5.2. Assume that X is c-rectifiable, and that $c^*(x) := \sup_y\{x^\top y - c(y)\}$ satisfies $c^* \in C^1(\mathbb{R}^d)$. We have
i) $v^X - g^{X,c}$ is X-marginal-preserving, where $g_t^{X,c} = \nabla c^* \circ \nabla f_t^{X,c}$.
ii) Z = c-Rectify(X) preserves the marginal laws of X, that is, Law(Zt ) = Law(Xt ), ∀t ∈ [0, 1].
Proof. i) By $v_t^X(z) = \mathbb{E}[\dot X_t \mid X_t = z]$, the loss function in (23) is equivalent to

$$\begin{aligned}
L_{X,c}(f) &= \int_0^1 \mathbb{E}\Big[c^*(\nabla f_t(X_t)) - \mathbb{E}[\dot X_t \mid X_t]^\top \nabla f_t(X_t) + c(\dot X_t)\Big]\, dt \\
&= \int_0^1 \mathbb{E}\Big[c^*(\nabla f_t(X_t)) - v_t^X(X_t)^\top \nabla f_t(X_t) + c(\dot X_t)\Big]\, dt.
\end{aligned}$$
By the Euler-Lagrange equation, we have

$$\int_0^1 \mathbb{E}\Big[\big(\nabla c^*(\nabla f_s^{X,c}(X_s)) - v_s^X(X_s)\big)^\top \nabla g_s(X_s)\Big]\, ds = 0, \quad \forall g : g_s \in C_c^1(\mathbb{R}^d).$$

Taking $g_s = h$ for $s < t$ and $g_s = 0$ for $s > t$ yields that $r_s^{X,c} = v_s^X - \nabla c^* \circ \nabla f_s^{X,c}$ is
X-marginal-preserving following (18).
ii) Note that Z is rectifiable if X is c-rectifiable. Applying Lemma 4.2 yields the result.
For the quadratic cost $c(x) = c^*(x) = \frac12\|x\|^2$, $\nabla c^*$ is the identity mapping, and (27) reduces to the
Helmholtz decomposition, which represents a velocity field as the sum of a gradient field and a divergence-
free field. Hence, (27) yields a generalization of the Helmholtz decomposition, in which a monotonic transform
$\nabla c^*$ is applied to the gradient field component. We call (27) a Bregman Helmholtz decomposition.
Remark: score matching In some special cases, v X may already be a gradient field, and hence the
rectified flow and c-rectified flow coincide for $c(x) = \frac12\|x\|^2$. One example of this is when $X_t = \alpha_t X_1 + \beta_t \xi$
for some time-differentiable functions αt and βt , and ξ ∼ N (0, I), satisfying α1 = 1, β1 = 0, and X0 =
$\alpha_0 X_1 + \beta_0 \xi$. In this case, one can show that

$$v_t^X(z) = \mathbb{E}\big[\dot\alpha_t X_1 + \dot\beta_t \xi \,\big|\, X_t = z\big] = \nabla f_t(z), \quad \text{with}\quad f_t(z) = \eta_t \log \varrho_t(z) + \frac{\zeta_t}{2}\|z\|^2,$$

where $\varrho_t$ is the density function of $X_t$ with $\varrho_t(z) \propto \int \phi\big(\frac{z - \alpha_t x_1}{\beta_t}\big)\, d\pi_1(x_1)$ and $\phi(z) = \exp(-\|z\|^2/2)$,
and $\eta_t = \beta_t^2(\dot\alpha_t/\alpha_t - \dot\beta_t/\beta_t)$ and $\zeta_t = \dot\alpha_t/\alpha_t$. This case covers the probability flow ODEs [SSDK+20] and
denoising diffusion implicit models (DDIM) [SME20] with different choices of $\alpha_t$ and $\beta_t$, as suggested in
[LGL22]. When $\zeta_t = 0$, as in the case of [SE19], $v_t^X$ is proportional to $\nabla\log\varrho_t$, the score function of $\varrho_t$, and
the least squares loss $L_X(v)$ in (12) reduces to a time-integrated score matching loss [HD05, Vin11].
However, $v_t^X$ is generally not a score function or a gradient field, especially in complicated cases where
the coupling $(X_0, X_1)$ is induced from a previous rectified flow, as we iteratively apply the rectification
procedure. In these cases, it is necessary to impose the gradient form, as we do in c-rectified flow.
c-Rectified flow solves Problem (17) We are ready to show that the c-rectified flow solves the optimization
problem in (17). Further, (23) forms a dual problem of (17).
Theorem 5.3. Under the conditions in Theorem 5.2, we have
i) Z = c-Rectify(X) attains the minimum of (17).
ii) Problems (17) and (23) satisfy a strong duality:

$$\inf_f\ L_{X,c}(f) = F_c(X) - \inf_Y\Big\{F_c(Y)\ :\ \mathrm{Law}(Y_t) = \mathrm{Law}(X_t),\ \forall t \in [0,1]\Big\}.$$

As the optima above are achieved by $f^{X,c}$ and Z, we have $L_{X,c}(f^{X,c}) = F_c(X) - F_c(Z)$.
Proof. Write $R_{X,c}(Y) = F_c(X) - F_c(Y)$. First, we show that $L_{X,c}(f) \ge R_{X,c}(Y)$ for any f and Y that
satisfy $\mathrm{Law}(Y_t) = \mathrm{Law}(X_t)$, $\forall t$:

$$\begin{aligned}
R_{X,c}(Y) &= \int_0^1 \mathbb{E}\big[c(\dot X_t) - c(\dot Y_t)\big]\, dt \\
&\overset{(1)}{\le} \int_0^1 \mathbb{E}\big[c(\dot X_t) + c^*(\nabla f_t(Y_t)) - \dot Y_t^\top \nabla f_t(Y_t)\big]\, dt && \text{(Fenchel-Young inequality: } c(y) \ge x^\top y - c^*(x)) \\
&= \int_0^1 \mathbb{E}\big[c(\dot X_t) + c^*(\nabla f_t(Y_t)) - v_t^Y(Y_t)^\top \nabla f_t(Y_t)\big]\, dt && (v_t^Y(Y_t) = \mathbb{E}[\dot Y_t \mid Y_t]) \\
&= \int_0^1 \mathbb{E}\big[c(\dot X_t) + c^*(\nabla f_t(X_t)) - v_t^Y(X_t)^\top \nabla f_t(X_t)\big]\, dt && (\mathrm{Law}(X_t) = \mathrm{Law}(Y_t)) \\
&= \int_0^1 \mathbb{E}\big[c(\dot X_t) + c^*(\nabla f_t(X_t)) - v_t^X(X_t)^\top \nabla f_t(X_t)\big]\, dt && (v^X - v^Y \text{ is X-marginal-preserving}) \\
&= L_{X,c}(f).
\end{aligned}$$

Moreover, if we take $Y = Z$ and $f = f^{X,c}$, then the inequality in $\overset{(1)}{\le}$ is tight because $\dot Z_t = \nabla c^*(\nabla f_t^{X,c}(Z_t))$
holds t-almost surely. Therefore, $R_{X,c}(Z) = L_{X,c}(f^{X,c}) \ge R_{X,c}(Y)$, which shows that Z attains the
maximum of $R_{X,c}$ (under the marginal constraints) and that the strong duality holds.
which establishes that (Z0 , Z1 ) yields no larger transport cost than (X0 , X1 ).
Theorem 5.5. Assume that c is convex with conjugate $c^* \in C^1(\mathbb{R}^d)$, and that the conditions in Definition 5.4
hold. Then Equation (28) holds and $\mathbb{E}[c(Z_1 - Z_0)] \le \mathbb{E}[c(X_1 - X_0)]$.
Compared with the regular Rectify mapping, the key difference here is that the monotonicity of c-Rectify
only holds for the specific c that it employs, rather than for all convex cost functions. More importantly, as
we show below, recursively applying c-Rectify yields optimal couplings w.r.t. c, a key property that the
regular rectified flow misses.
Theorem 5.6. Assume that c is convex with conjugate c∗ , and c, c∗ ∈ C 1 (Rd ) and X is the linear interpo-
lation process of (X0 , X1 ). Assume that (X0 , X1 ) is a c-rectifiable coupling, and f X,c ∈ C 2,1 (Rd × [0, 1]).
Then the following statements are equivalent:
i) (X0 , X1 ) is a fixed point of c-Rectify, that is, (X0 , X1 ) = c-Rectify(X0 , X1 ).
ii) ℓ∗X,c := inf f LX,c (f ) = LX,c (f X,c ) = 0, for LX,c in (23).
iii) (X0 , X1 ) is a c-optimal coupling.
Proof. i) → ii). If (Z0 , Z1 ) = (X0 , X1 ), we have Sc (Z) = 0 and LX,c (f X,c ) = 0 following (28).
iii) → ii). If (X0 , X1 ) is c-optimal, we have E[c(X1 − X0 )] ≤ E[c(Z1 − Z0 )], which again implies that
LX,c (f X,c ) = 0 following (28).
ii) → i) Note that

$$L_{X,c}(f^{X,c}) = \int_0^1 \mathbb{E}\Big[b_c\big(\dot X_t;\ g_t^{X,c}(X_t)\big)\Big]\, dt \ \ge\ 0.$$
Therefore, LX,c (f X,c ) = 0 implies that Ẋt = gtX,c (Xt ) t-almost surely. Because Zt satisfies the same
equation, whose solution is assumed to be unique, we have Z = X and hence (Z0 , Z1 ) = (X0 , X1 ).
ii) → iii) Because X is the linear interpolation, we have Xt = tX1 + (1 − t)X0 , and it simultaneously
satisfies the ODE dXt = gtX,c (Xt )dt. Using Lemma 5.9 shows that (X0 , X1 ) is c-optimal.
Knowing that $L_{X,c}(f^{X,c})$ is an indicator of c-optimality, we show below that it is guaranteed to converge
to zero under recursive c-Rectify updates.
Corollary 5.7. Let $Z^k$ be the k-th c-rectified flow of $(X_0, X_1)$, satisfying $Z^{k+1} = \texttt{c-Rectflow}((Z_0^k, Z_1^k))$
and $(Z_0^0, Z_1^0) = (X_0, X_1)$. Assume each $(Z_0^k, Z_1^k)$ is c-rectifiable for $k = 0, \ldots, K$. Then

$$\sum_{k=0}^{K}\Big[L_{Z^k,c}\big(f^{Z^k,c}\big) + S_c\big(Z^{k+1}\big)\Big] \ \le\ \mathbb{E}[c(X_1 - X_0)].$$

Therefore, if $\mathbb{E}[c(X_1 - X_0)] < +\infty$, we have $\min_{k\le K}\big\{L_{Z^k,c}(f^{Z^k,c}) + S_c(Z^{k+1})\big\} = O(1/K)$.
Proof. Applying (28) to each coupling $(Z_0^k, Z_1^k)$ and summing over $k = 0, \ldots, K$, we obtain

$$\begin{aligned}
\sum_{k=0}^{K}\Big[L_{Z^k,c}\big(f^{Z^k,c}\big) + S_c\big(Z^{k+1}\big)\Big]
&= \sum_{k=0}^{K}\Big(\mathbb{E}\big[c(Z_1^k - Z_0^k)\big] - \mathbb{E}\big[c(Z_1^{k+1} - Z_0^{k+1})\big]\Big) \\
&= \mathbb{E}\big[c(Z_1^0 - Z_0^0)\big] - \mathbb{E}\big[c(Z_1^{K+1} - Z_0^{K+1})\big] \\
&\le \mathbb{E}[c(X_1 - X_0)].
\end{aligned}$$
c-Rectified flow as alternating direction descent on (8) The mapping $Z^{k+1} = \texttt{c-Rectflow}(Z^k)$ can be
interpreted as an alternating direction descent procedure for the dynamic OT problem (8):

$$X^k = \arg\min_Y\ \Big\{F_c(Y)\ \ \text{s.t.}\ \ (Y_0, Y_1) = (Z_0^k, Z_1^k)\Big\}, \qquad (29)$$

$$Z^{k+1} = \arg\min_Y\ \Big\{F_c(Y)\ \ \text{s.t.}\ \ \mathrm{Law}(Y_t) = \mathrm{Law}(X_t^k),\ \forall t \in [0,1]\Big\}. \qquad (30)$$
Here in (29), we minimize $F_c(Y)$ in the set of processes whose start-end pair $(Y_0, Y_1)$ equals the coupling
$(Z_0^k, Z_1^k)$ from $Z^k$, which simply yields the linear interpolation $X_t^k = tZ_1^k + (1-t)Z_0^k$ by Jensen's inequality.
In (30), we minimize Fc (Y ) given the path-wise marginal constraint of Law(Yt ) = Law(Xtk ) for all time
t ∈ [0, 1], which yields the c-rectified flow following Theorem 5.3. Note that the updates in both (29) and
(30) keep the start-end marginal laws Law(Y0 ) and Law(Y1 ) unchanged, and hence the algorithm stays
inside the feasible set {Y : Law(Y0 ) = π0 , Law(Y1 ) = π1 } in (8) once it is initialized to be so.
The updates in (29)-(30) highlight a key difference between our method and the Benamou-Brenier ap-
proach (9)-(10): the key idea of Benamou-Brenier is to restrict the optimization domain to the set of
deterministic, ODE-induced processes (a.k.a. flows), but our updates alternate between the deterministic
c-rectified flow $Z^k$ and the linear interpolation process $X^k$, which is not deterministic or ODE-inducible
unless the fixed point is achieved.
c-Rectified flow as an MM algorithm The majorize-minimization (MM) algorithm [HL04] is a general
optimization recipe that works by finding a surrogate function that majorizes the objective function. Let
$F(X)$ be the objective function to be minimized. An MM algorithm consists of iterative updates of the
form $X^{k+1} \in \arg\min_Y F^+(Y \mid X^k)$, where $F^+$ is a majorization function of F that satisfies

$$F(X^{k+1}) \le F^+(X^{k+1} \mid X^k) \le F^+(X^k \mid X^k) = F(X^k).$$

One can also view MM as conducting coordinate descent on $(X, Y)$ for solving $\min_{X,Y} F^+(Y \mid X)$.
In the following, we show that $(Z_0^{k+1}, Z_1^{k+1}) = \texttt{c-Rectify}((Z_0^k, Z_1^k))$ can be interpreted as an MM algo-
rithm for the static OT problem (1) of minimizing $\mathbb{E}[c(X_1 - X_0)]$ in the set of couplings of $\pi_0$ and $\pi_1$. The
majorization function corresponding to c-Rectify can be shown to be

$$F_c^+\big((Y_0, Y_1) \mid (X_0, X_1)\big) = \inf_{\tilde Y}\ \Big\{F_c(\tilde Y)\ \ \text{s.t.}\ \ (\tilde Y_0, \tilde Y_1) = (Y_0, Y_1),\ \ \tilde Y \in \mathcal{M}_X\Big\},$$
$$\text{with}\quad \mathcal{M}_X = \big\{Y : \mathrm{Law}(Y_t) = \mathrm{Law}(tX_1 + (1-t)X_0),\ \forall t \in [0,1]\big\},$$

where $F_c^+((Y_0, Y_1) \mid (X_0, X_1))$ denotes the minimum value of $F_c(\tilde Y)$ over processes $\tilde Y$ whose start-end points equal
$(Y_0, Y_1)$ and whose marginal laws equal those of the linear interpolation process of $(X_0, X_1)$.
Proposition 5.8. i) Fc+ yields a majorization function of the c-transport cost E[c(Y1 − Y0 )] in the sense that
and the minimum is attained by (X0 , X1 ) = (Y0 , Y1 ), where Π0,1 denotes the set of couplings of π0 and π1 .
ii) c-Rectify yields the MM update related to $F^+$ in that
where the inequality holds because we remove the constraint $\tilde Y \in \mathcal{M}_X$. In addition, it is obvious that the
inequality above becomes an equality when $(X_0, X_1) = (Y_0, Y_1)$.
ii) Note that

$$\inf_{(Y_0,Y_1)}\ F_c^+\big((Y_0, Y_1) \mid (X_0, X_1)\big) = \inf_Y\ \big\{F_c(Y)\ \ \text{s.t.}\ \ Y \in \mathcal{M}_X\big\},$$

where the minimum of the right side is attained by $Y = \texttt{c-Rectflow}((X_0, X_1))$ following Theorem 5.3.
Hence, the minimum of the left side is attained by $(Y_0, Y_1) = \texttt{c-Rectify}((X_0, X_1))$.
5.5 Hamilton-Jacobi Equation and Optimal Transport
The proof of Theorem 5.6 relies on a key lemma showing that if the trajectories of an ODE of the form $dX_t =
\nabla c^*(\nabla f_t(X_t))\,dt$ are geodesic in that $X_t = tX_1 + (1-t)X_0$, then the induced coupling $(X_0, X_1)$ is a c-
optimal coupling of its marginals. The proof of this lemma relies on the Hamilton-Jacobi (HJ) equation, which
provides a characterization of f for an ODE $dX_t = \nabla c^*(\nabla f_t(X_t))\,dt$ whose trajectories are geodesic. The
connection between the HJ equation and optimal transport is a classic result and can be found in, for
example, [Vil21, Vil09].
Lemma 5.9. Let $v_t(x) = \nabla c^*(\nabla f_t(x))$, where $c^* \in C^1(\mathbb{R}^d)$ is the convex conjugate of a convex function c,
$f \in C^{2,1}(\mathbb{R}^d \times [0,1])$, and $\nabla c^*$ is an injective mapping. Assume all trajectories of $dx_t = v_t(x_t)\,dt$ are
geodesic paths in that $x_t = tx_1 + (1-t)x_0$. Then we have:

i) There exists $\tilde f_t$ with $\nabla\tilde f_t = \nabla f_t$ (and hence we can replace f with $\tilde f$ in the assumption), such that the
following Hamilton-Jacobi (HJ) equation holds:

$$\partial_t \tilde f_t(x) + c^*\big(\nabla \tilde f_t(x)\big) = 0.$$

ii) f satisfies

$$f_t(y_t) = \inf_{y_0\in\mathbb{R}^d}\Big\{t\, c\Big(\frac{y_t - y_0}{t}\Big) + f_0(y_0)\Big\}, \quad \forall t \in [0,1],\ y_t \in \mathbb{R}^d, \qquad \text{(Hopf-Lax formula)}$$

where the minimum is attained if $\{y_t\}$ follows the ODE $dy_t = v_t(y_t)\,dt$.
iii) Assume a coupling (X0 , X1 ) of π0 , π1 satisfies dXt = vt (Xt )dt. Then (X0 , X1 ) is a c-optimal coupling.
Proof. i) Starting from any point $x_t = x \in \mathbb{R}^d$ at time t, because the trajectories of $dx_t = v_t(x_t)\,dt$ are
geodesic, we have $\dot x_t = v_t(x_t) = \mathrm{const}$ along the trajectory. Because $v_t(x) = \nabla c^*(\nabla f_t(x))$ and $\nabla c^*$
is injective, we have $\nabla f_t(x_t) = \mathrm{const}$ as well. Hence, we have

$$0 = \frac{d}{dt}\nabla f_t(x_t) = \partial_t\nabla f_t(x_t) + \nabla^2 f_t(x_t)\,\dot x_t = \partial_t\nabla f_t(x_t) + \nabla^2 f_t(x_t)\,\nabla c^*(\nabla f_t(x_t)).$$

On the other hand, define $h_t(x) = \partial_t f_t(x) + c^*(\nabla f_t(x))$. Then we have

$$\nabla h_t(x_t) = \partial_t\nabla f_t(x_t) + \nabla^2 f_t(x_t)\,\nabla c^*(\nabla f_t(x_t)) = 0.$$

This suggests that $\nabla_x h_t(x) = 0$ everywhere and hence $h_t(x)$ does not depend on x. Define $\tilde f_t(x) =
f_t(x) - \int_0^t h_s(x_0)\,ds$, where $x_0$ is any fixed point in $\mathbb{R}^d$. Then $\nabla\tilde f_t = \nabla f_t$ and
$\partial_t\tilde f_t(x) + c^*(\nabla\tilde f_t(x)) = h_t(x) - h_t(x_0) = 0$, so $\tilde f$ satisfies the HJ equation.
ii) Take any $y_0, y_1$ in $\mathbb{R}^d$ and let $y_t = ty_1 + (1-t)y_0$ be their linear interpolation. We have

$$\begin{aligned}
f_1(y_1) - f_0(y_0) &= \int_0^1\big(\partial_t f_t(y_t) + \nabla f_t(y_t)^\top(y_1 - y_0)\big)\, dt \\
&= \int_0^1\big(\nabla f_t(y_t)^\top(y_1 - y_0) - c^*(\nabla f_t(y_t))\big)\, dt && (h_t = \partial_t f_t + c^*(\nabla f_t) = 0) \\
&\overset{(1)}{\le} \int_0^1 c(y_1 - y_0)\, dt && (c(x) + c^*(y) \ge x^\top y) \\
&= c(y_1 - y_0).
\end{aligned}$$

The inequality in $\overset{(1)}{\le}$ is attained as an equality if $y_t$ follows the geodesic ODE $dy_t = v_t(y_t)\,dt$, as we have
$y_1 - y_0 = \nabla c^*(\nabla f_t(y_t))$ for all t in this case. A similar derivation holds for $f_t$ with general t.
iii) Note that the derivation in ii) gives $c(y_1 - y_0) \ge f_1(y_1) - f_0(y_0)$ for all $y_0, y_1$. For any coupling $(Y_0, Y_1)$ of $\pi_0, \pi_1$, we have

$$\mathbb{E}[c(Y_1 - Y_0)] \ge \mathbb{E}[f_1(Y_1) - f_0(Y_0)] = \mathbb{E}[f_1(X_1) - f_0(X_0)] = \mathbb{E}[c(X_1 - X_0)].$$

Hence, $(X_0, X_1)$ is a c-optimal coupling.
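For concreteness (a standard specialization, not spelled out in the text), take the quadratic cost $c(x) = c^*(x) = \frac12\|x\|^2$, so that $v_t = \nabla f_t$. The HJ equation in i) becomes

$$\partial_t \tilde f_t(x) + \tfrac12\big\|\nabla \tilde f_t(x)\big\|^2 = 0,$$

and the Hopf-Lax formula in ii) becomes the inf-convolution

$$f_t(y) = \inf_{y_0\in\mathbb{R}^d}\Big\{\frac{\|y - y_0\|^2}{2t} + f_0(y_0)\Big\},$$

which is the classical viscosity-solution formula for this HJ equation.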
Connection to Benamou-Brenier Formula The results in Lemma 5.9 can also be formally derived from the
Benamou-Brenier problem (10), as shown in the seminal work of [BB00]. By introducing a Lagrange
multiplier $\lambda : \mathbb{R}^d \times [0,1] \to \mathbb{R}$ for the constraint $\dot\varrho_t + \nabla\cdot(v_t\varrho_t) = 0$, the problem in (10) can be framed
as a minimax problem:

$$\inf_{v,\varrho}\ \sup_\lambda\ \mathcal{L}(v, \varrho, \lambda) := \int_0^1\!\!\int \Big(c(v_t)\,\varrho_t + \lambda_t\big(\dot\varrho_t + \nabla\cdot(v_t\varrho_t)\big)\Big)\, dx\, dt \quad \text{s.t.}\quad \varrho \in \Gamma_{0,1},$$

where $\mathcal{L}(v, \varrho, \lambda)$ is the Lagrangian function, and $\Gamma_{0,1}$ denotes the set of density functions $\{\varrho_t\}_t$ satisfying
$\varrho_0 = d\pi_0/dx$, $\varrho_1 = d\pi_1/dx$. Note the following integration-by-parts formulas:

$$\int \big(\lambda_t\,\nabla\cdot(v_t\varrho_t) + \nabla\lambda_t^\top v_t\,\varrho_t\big)\, dx = 0, \qquad \int_0^1 \big(\lambda_t\,\dot\varrho_t + \dot\lambda_t\,\varrho_t\big)\, dt = \lambda_1\varrho_1 - \lambda_0\varrho_0.$$
6 Discussion and Open Questions
1. Corollary 5.7 only bounds the surrogate measure $\ell^*_{Z^k,c}$. Can we directly bound the optimality gap in
the c-transport cost, $e_k^* = \mathbb{E}[c(Z_1^k - Z_0^k)] - \inf_{(Z_0,Z_1)} \mathbb{E}[c(Z_1 - Z_0)]$? Can we find a certain type of
strong-convexity-like condition under which $e_k^*$ decays exponentially with k?
2. For machine learning (ML) tasks such as generative models and domain transfer, the transport cost
is not necessarily the direct object of interest. In these cases, as suggested in [LGL22], rectified flow
might be preferred because it is simpler and does not require specifying a particular cost c. Question:
for such ML tasks, when would it be preferred to use OT with a specific c, and how to choose c
optimally?
3. In practice, recursively applying the (c-)rectification accumulates errors because the training optimization
for the drift field and the simulation of the ODE cannot be conducted perfectly. How can we avoid
the error accumulation at each step? Assume $\{x_{1,i}\}_i \sim \pi_1$, and $\{(z_{0,i}^k, z_{1,i}^k)\}_i$ is obtained by solving the
ODE of the k-th c-rectified flow starting from $z_{0,i}^k \sim \pi_0$. As we increase k, $\{z_{1,i}^k\}_i$ may yield an increasingly
bad approximation of $\pi_1$ due to the error accumulation. One way to fix this is to adjust $\{z_{1,i}^k\}_i$
to make it closer to $\{x_{1,i}\}_i$ at each step. This can be done by reweighting/transporting $\{z_{1,i}^k\}_i$ towards
$\{x_{1,i}\}_i$ by minimizing a certain discrepancy measure, or by replacing each $z_{1,i}^k$ with $x_{1,\sigma(i)}$, where $\sigma$ is a
permutation that yields a one-to-one matching between $\{z_{1,i}^k\}_i$ and $\{x_{1,i}\}_i$ (see the short sketch after this
list). The key and challenging part is to do the adjustment in a good and fast way, ideally with a (near)
linear time complexity.
4. With or without the adjustment step, develop a complete theoretical analysis of the statistical error of
the method.
5. In what precise sense is rectified flow solving a multi-objective variant of optimal transport?
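Below is a minimal sketch of the matching-based adjustment mentioned in item 3 above; it uses a generic assignment solver with an assumed squared-distance cost, and, being $O(n^3)$, it does not address the near-linear time requirement raised there.

```python
# Illustrative matching adjustment: snap each simulated endpoint z1_i to a real sample
# x1_{sigma(i)} via a one-to-one assignment minimizing the total squared distance.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_endpoints(z1, x1):
    cost = ((z1[:, None, :] - x1[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    rows, cols = linear_sum_assignment(cost)                  # optimal permutation sigma
    return x1[cols[np.argsort(rows)]]                         # x1_{sigma(i)} aligned with z1_i
```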
References
[ABS21] Luigi Ambrosio, Elia Brué, and Daniele Semola. Lectures on optimal transport. Springer,
2021.
[ACB17] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial net-
works. In International conference on machine learning, pages 214–223. PMLR, 2017.
[AHW95] Peter Auer, Mark Herbster, and Manfred KK Warmuth. Exponentially many local minima for
single neurons. Advances in neural information processing systems, 8, 1995.
[BB00] Jean-David Benamou and Yann Brenier. A computational fluid mechanics solution to the
monge-kantorovich mass transfer problem. Numerische Mathematik, 84(3):375–393, 2000.
[BMD+ 05] Arindam Banerjee, Srujana Merugu, Inderjit S Dhillon, Joydeep Ghosh, and John Lafferty.
Clustering with bregman divergences. Journal of machine learning research, 6(10), 2005.
[CFT14] Nicolas Courty, Rémi Flamary, and Devis Tuia. Domain adaptation with regularized optimal
transport. In Joint European Conference on Machine Learning and Knowledge Discovery in
Databases, pages 274–289. Springer, 2014.
[EMM12] Tarek A El Moselhy and Youssef M Marzouk. Bayesian inference with optimal maps. Journal
of Computational Physics, 231(23):7815–7850, 2012.
[FG21] Alessio Figalli and Federico Glaudo. An Invitation to Optimal Transport, Wasserstein Dis-
tances, and Gradient Flows. 2021.
[HCTC20] Chin-Wei Huang, Ricky TQ Chen, Christos Tsirigotis, and Aaron Courville. Convex poten-
tial flows: Universal probability distributions with optimal transport and convex optimization.
arXiv preprint arXiv:2012.05942, 2020.
[HD05] Aapo Hyvärinen and Peter Dayan. Estimation of non-normalized statistical models by score
matching. Journal of Machine Learning Research, 6(4), 2005.
[HL04] David R Hunter and Kenneth Lange. A tutorial on mm algorithms. The American Statistician,
58(1):30–37, 2004.
[KLG+ 21] Alexander Korotin, Lingxiao Li, Aude Genevay, Justin M Solomon, Alexander Filippov, and
Evgeny Burnaev. Do neural optimal transport solvers work? a continuous wasserstein-2 bench-
mark. Advances in Neural Information Processing Systems, 34:14593–14605, 2021.
[KSB22] Alexander Korotin, Daniil Selikhanovych, and Evgeny Burnaev. Neural optimal transport.
arXiv preprint arXiv:2201.12220, 2022.
[Kur11] Thomas G Kurtz. Equivalence of stochastic equations and martingale problems. In Stochastic
analysis 2010, pages 113–130. Springer, 2011.
[LGL22] Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate
and transfer data with rectified flow. preprint, 2022.
[McC97] Robert J McCann. A convexity principle for interacting gases. Advances in mathematics,
128(1):153–179, 1997.
[MMPS16] Youssef Marzouk, Tarek Moselhy, Matthew Parno, and Alessio Spantini. An introduction to
sampling via measure transport. arXiv e-prints, pages arXiv–1602, 2016.
[MTOL20] Ashok Makkuva, Amirhossein Taghvaei, Sewoong Oh, and Jason Lee. Optimal transport
mapping via input convex neural networks. In International Conference on Machine Learning,
pages 6672–6681. PMLR, 2020.
[OPV14] Yann Ollivier, Hervé Pajot, and Cedric Villani. Optimal Transportation: Theory and Applica-
tions. Number 413. Cambridge University Press, 2014.
[PC+ 19] Gabriel Peyré, Marco Cuturi, et al. Computational optimal transport: With applications to data
science. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.
[San15] Filippo Santambrogio. Optimal transport for applied mathematicians. Birkäuser, NY, 55(58-
63):94, 2015.
[SE19] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data dis-
tribution. Advances in Neural Information Processing Systems, 32, 2019.
[SME20] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In
International Conference on Learning Representations, 2020.
[SRGB14] Justin Solomon, Raif Rustamov, Leonidas Guibas, and Adrian Butscher. Wasserstein propa-
gation for semi-supervised learning. In International Conference on Machine Learning, pages
306–314. PMLR, 2014.
[SSDK+ 20] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and
Ben Poole. Score-based generative modeling through stochastic differential equations. In
International Conference on Learning Representations, 2020.
[TT16] Giulio Trigila and Esteban G Tabak. Data-driven optimal transport. Communications on Pure
and Applied Mathematics, 69(4):613–648, 2016.
[Vil09] Cédric Villani. Optimal transport: old and new, volume 338. Springer, 2009.
[Vil21] Cédric Villani. Topics in optimal transportation, volume 58. American Mathematical Soc.,
2021.
[Vin11] Pascal Vincent. A connection between score matching and denoising autoencoders. Neural
computation, 23(7):1661–1674, 2011.
A Proofs
Proof of Example 3.5. i) The fact that $A^\top A = I$ and $\pi_0 = \pi_1 = N(0, I)$ ensures that $AX_0 \sim \pi_1$, and
hence $(X_0, AX_0)$ is a coupling of $\pi_0$ and $\pi_1$. Let $X_t = tAX_0 + (1-t)X_0$ be the linear interpolation of the
coupling, so that $\dot X_t = AX_0 - X_0$. Canceling $X_0$ yields that

$$\dot X_t = (A - I)\big(tA + (1-t)I\big)^{-1} X_t, \qquad (32)$$

where we use the fact that $tA + (1-t)I$ is invertible for $t \in [0,1]$, which we prove as follows: if
$tA + (1-t)I$ is not invertible, then A must have $\lambda = -\frac{1-t}{t}$ as one of its eigenvalues; but as a rotation
matrix, all eigenvalues of A must have norm 1, which means that we must have $t = 0.5$ and $\lambda = -1$.
This, however, is excluded by the assumption that A is non-reflecting (and hence $\lambda \ne -1$).

Equation (32) shows that $\dot X_t$ is uniquely determined by $X_t$. Hence, we have $\int_0^1 \mathbb{E}\big[\mathrm{var}(\dot X_t \mid X_t)\big]\, dt = 0$,
which implies that $(X_0, AX_0)$ is a straight coupling by Theorem 3.6 of [LGL22].
ii) Let c be a second-order differentiable convex function whose Hessian matrix $\nabla^2 c(x)$ is invertible every-
where. Let $c^*$ be the convex conjugate of c; then $c^*$ is also second-order differentiable, with $\nabla c(\nabla c^*(x)) = x$
and $\nabla^2 c^*(x) = \nabla^2 c(x)^{-1}$.

If $(X_0, AX_0)$ is a c-optimal coupling, there must exist a function $\phi : \mathbb{R}^d \to \mathbb{R}$ such that

$$AX_0 - X_0 = \nabla c^*\big(\nabla\phi(X_0)\big), \qquad (33)$$

where $c^*$ is the convex conjugate of c. Equation (33) is equivalent to $\nabla c(Ax - x) = \nabla\phi(x)$, which means
that $\nabla\phi$ is continuously differentiable. Taking the gradient on both sides of (33) gives

$$A - I = \nabla^2 c^*\big(\nabla\phi(x)\big)\,\nabla^2\phi(x) =: H_x B_x, \qquad (34)$$

where $H_x, B_x$ are both symmetric and $H_x$ is positive definite. Hence $H_x B_x$ is diagonalizable (all its
eigenvalues are real) by Lemma A.1. However, $A - I$ is not diagonalizable because A must have
complex eigenvalues as a non-reflecting, non-identity rotation matrix. Hence, (34) cannot hold.
Lemma A.1. Assume that A, B are two real symmetric matrices and A is positive definite. Then AB is
diagonalizable (on the real domain), that is, there exists an invertible matrix P such that $P^{-1}ABP$ is a
diagonal matrix.

Proof. This is a standard result in linear algebra. Because A is positive definite, there exists an invertible
symmetric matrix C such that $CC = A$. Then $AB = CCB$ is similar to $C^{-1}(AB)C = CBC$, which is
symmetric and hence diagonalizable.