Wasserstein Propagation for Semi-Supervised Learning

Abstract

Probability distributions and histograms are natural representations for product ratings, traffic measurements, and other data considered in many machine learning applications. Thus, this paper introduces a technique for graph-based semi-supervised learning of histograms, derived from the theory of optimal transportation. Our method has several properties making it suitable for this application; in particular, its behavior can be characterized by the moments and shapes of the histograms at the labeled nodes. In addition, it can be used for histograms on non-standard domains like circles, revealing a strategy for manifold-valued semi-supervised learning. We also extend this technique to related problems such as smoothing distributions on graph nodes.

1. Introduction

Graph-based semi-supervised learning is an effective approach for learning problems involving a limited amount of labeled data (Singh et al., 2008). Methods in this class typically propagate labels from a subset of nodes of a graph to the rest of the nodes. Usually each node is associated with a real number, but in many applications labels are more naturally expressed as histograms or probability distributions. For instance, the traffic density at a given location can be seen as a histogram over the 24-hour cycle; these densities may be known only where a service has cameras installed but need to be propagated to the entire map. Product ratings, climatic measurements, and other data sources exhibit similar structure.

While methods for numerical labels, such as Belkin & Niyogi (2001); Zhu et al. (2003); Belkin et al. (2006); Zhou & Belkin (2011); Ji et al. (2012) (also see the survey by Zhu (2008) and references therein), can be applied bin-by-bin to propagate normalized frequency counts, this strategy does not model interactions between histogram bins. As a result, a fundamental aspect of this type of data is ignored, leading to artifacts even when propagating Gaussian distributions.

Among the first works directly addressing semi-supervised learning of probability distributions is Subramanya & Bilmes (2011), which propagates distributions representing class memberships. Their loss function, however, is based on Kullback-Leibler divergence, which cannot capture interactions between histogram bins. Talukdar & Crammer (2009) allow interactions between bins by essentially modifying the underlying graph to its tensor product with a prescribed bin interaction graph; this approach loses probabilistic structure and tends to oversmooth. Similar issues have been encountered in the mathematical literature (McCann, 1997; Agueh & Carlier, 2011) and in vision/graphics applications (Bonneel et al., 2011; Rabin et al., 2012) involving interpolation of probability distributions. Their solutions attempt to find weighted barycenters of distributions, which is insufficient for propagating distributions along graphs.

The goal of our work is to provide an efficient and theoretically sound approach to graph-based semi-supervised learning of probability distributions. Our strategy uses the machinery of optimal transportation (Villani, 2003). Inspired by Solomon et al. (2013), we employ the two-Wasserstein distance between distributions to construct a regularizer measuring the “smoothness” of an assignment of a probability distribution to each graph node. The final assignment is produced by optimizing this energy while fitting the histogram predictions at labeled nodes.

Our technique has many notable properties. As certainty in the known distributions increases, it reduces to the method of label propagation via harmonic functions (Zhu et al., 2003). Also, the moments and other characteristics of the propagated distributions can be characterized in terms of the histograms at the labeled nodes (see Proposition 3).
we replace the square distance between scalar function values appearing in the classical Dirichlet energy (namely the quantity |f_v − f_w|²) with an appropriate distance between the distributions ρ_v and ρ_w. Rather than using the bin-by-bin KL divergence, however, we use the Wasserstein distance with quadratic cost between probability distributions with finite second moment on R. This distance is defined as

\[ W_2(\rho_v, \rho_w) := \Big( \inf_{\pi \in \Pi(\rho_v, \rho_w)} \iint_{\mathbb{R}^2} |x - y|^2 \, d\pi(x, y) \Big)^{1/2}, \]

where Π(ρ_v, ρ_w) is the set of joint probability measures on R² with marginals ρ_v and ρ_w. In terms of the inverse CDFs F_0^{-1} and F_1^{-1} of two distributions ρ_0, ρ_1 ∈ Prob(R), this distance admits the closed form (Proposition 1)

\[ W_2^2(\rho_0, \rho_1) = \int_0^1 \big( F_1^{-1}(s) - F_0^{-1}(s) \big)^2 \, ds. \qquad (2) \]

By applying (2) to the minimization problem (1), we obtain a linear strategy for our propagation problem.

Proposition 2. Wasserstein propagation can be characterized in the following way. For each v ∈ V_0 let F_v be the CDF of the distribution ρ_v. Now suppose that for each s ∈ [0, 1] we determine g_s : V → R as the solution of the classical Dirichlet problem

\[ \Delta g_s(v) = 0 \;\; \forall v \in V \setminus V_0, \qquad g_s(v) = F_v^{-1}(s) \;\; \forall v \in V_0. \qquad (3) \]

Then the map ρ minimizing (1) subject to the fixed boundary constraints satisfies F_v^{-1}(s) = g_s(v) for all v ∈ V.
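As a concrete illustration of the closed form (2), the following sketch (not from the paper; a minimal NumPy example with hypothetical helper names such as `inverse_cdf_samples`) estimates W_2^2 between two histograms on a common set of bins by sampling their inverse CDFs on an evenly-spaced grid in [0, 1], the same discretization used later in §5.

```python
import numpy as np

def inverse_cdf_samples(hist, bin_centers, s_grid):
    """Sample the inverse CDF of a 1-D histogram at the values in s_grid.

    The histogram is treated as a sum of point masses at bin_centers,
    so its CDF is a step function and the inverse CDF is piecewise constant.
    """
    p = np.asarray(hist, dtype=float)
    p = p / p.sum()                      # normalize to a probability distribution
    centers = np.asarray(bin_centers, dtype=float)
    cdf = np.cumsum(p)
    # For each s, return the first bin center whose CDF value reaches s.
    idx = np.searchsorted(cdf, np.clip(s_grid, 0.0, cdf[-1]))
    return centers[np.minimum(idx, len(centers) - 1)]

def wasserstein2_squared(hist_a, hist_b, bin_centers, n_samples=200):
    """Approximate W_2^2 between two histograms on the same bins via equation (2)."""
    s = (np.arange(n_samples) + 0.5) / n_samples   # midpoint quadrature on [0, 1]
    Fa_inv = inverse_cdf_samples(hist_a, bin_centers, s)
    Fb_inv = inverse_cdf_samples(hist_b, bin_centers, s)
    return np.mean((Fa_inv - Fb_inv) ** 2)

# Example: two unit point masses at 0.2 and 0.7 have W_2^2 = (0.7 - 0.2)^2 = 0.25.
centers = np.linspace(0.0, 1.0, 11)
a = np.zeros(11)
a[2] = 1.0    # point mass at centers[2] = 0.2
b = np.zeros(11)
b[7] = 1.0    # point mass at centers[7] = 0.7
print(wasserstein2_squared(a, b, centers))   # ~0.25
```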
Distribution-valued maps ρ : V → Prob(R) propagated by optimizing (1) satisfy many analogs of functions extended using the classical Dirichlet problem. Two results of this kind concern the mean m(v) and the standard deviation σ(v) of the distributions ρ_v as functions of v ∈ V. These are defined as

\[ m(v) := \int_{-\infty}^{\infty} x\, \rho_v(x) \, dx, \qquad \sigma^2(v) := \int_{-\infty}^{\infty} (x - m(v))^2 \rho_v(x) \, dx. \]

Proposition 3. Suppose the distribution-valued map ρ : V → Prob(R) is obtained using Wasserstein propagation. Then for all v ∈ V the following estimates hold.

• inf_{v_0 ∈ V_0} m(v_0) ≤ m(v) ≤ sup_{v_0 ∈ V_0} m(v_0).
• 0 ≤ σ(v) ≤ sup_{v_0 ∈ V_0} σ(v_0).

Proof. Both estimates can be derived from the following formula. Let ρ ∈ Prob(R) and let φ : R → R be any integrable function. If we apply the change of variables s = F(x), where F is the CDF of ρ, in the integral defining the expectation of φ with respect to ρ, we get

\[ \int_{-\infty}^{\infty} \phi(x)\, \rho(x) \, dx = \int_0^1 \phi\big(F^{-1}(s)\big) \, ds. \]

Thus m(v) = ∫_0^1 F_v^{-1}(s) ds and σ²(v) = ∫_0^1 (F_v^{-1}(s) − m(v))² ds, where F_v is the CDF of ρ_v for each v ∈ V.

Assume ρ minimizes (1) with fixed boundary constraints on V_0. By Proposition 2, we then have ∆F_v^{-1} = 0 for all v ∈ V. Therefore ∆m(v) = ∫_0^1 ∆F_v^{-1}(s) ds = 0, so m is a harmonic function on V. The estimates for m follow by the maximum principle for harmonic functions. Also,

\[ \Delta[\sigma^2(v)] = \int_0^1 \Delta\big(F_v^{-1}(s) - m(v)\big)^2 \, ds = \sum_{(v, v') \in E} \int_0^1 \big(a(v, s) - a(v', s)\big)^2 \, ds \ge 0, \]

where a(v, s) := F_v^{-1}(s) − m(v), since ∆F_v^{-1}(s) = ∆m(v) = 0. Thus σ² is a subharmonic function on V, and the upper bound for σ² follows by the maximum principle for subharmonic functions.
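To make the change-of-variables formula in this proof concrete, the short sketch below (illustrative only; it assumes evenly-spaced quantile samples such as those produced by the hypothetical `inverse_cdf_samples` helper above) estimates m(v) and σ²(v) directly from samples of F_v^{-1}.

```python
import numpy as np

def moments_from_inverse_cdf(F_inv_samples):
    """Estimate the mean and variance of a distribution from samples
    F_inv_samples[i] ~ F^{-1}(s_i) taken at evenly-spaced s_i in (0, 1).

    These are midpoint-rule approximations of
        m = integral_0^1 F^{-1}(s) ds   and   sigma^2 = integral_0^1 (F^{-1}(s) - m)^2 ds.
    """
    F_inv = np.asarray(F_inv_samples, dtype=float)
    mean = F_inv.mean()
    variance = np.mean((F_inv - mean) ** 2)
    return mean, variance

# Example: for the uniform distribution on [0, 1], F^{-1}(s) = s, so the
# estimates approach m = 1/2 and sigma^2 = 1/12 as the number of samples grows.
s = (np.arange(1000) + 0.5) / 1000
print(moments_from_inverse_cdf(s))   # ~(0.5, 0.0833)
```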
Finally, we check that if we encode a classical interpolation problem using Dirac delta distributions, we recover the classical solution. The essence of this result is that if the boundary data for Wasserstein propagation has zero variance, then the solution must also have zero variance.

Proposition 4. Suppose that there exists u : V_0 → R such that ρ_v(x) = δ(x − u(v)) for all v ∈ V_0. Then the solutions of the classical Dirichlet problem and the Wasserstein propagation problem coincide in the following way. Suppose that f : V → R satisfies the classical Dirichlet problem with boundary data u. Then ρ_v(x) := δ(x − f(v)) minimizes (1) subject to the fixed boundary constraints.

Proof. The boundary data for ρ given here yields the boundary data g_s(v) = u(v) for all v ∈ V_0 and s ∈ [0, 1) in the Dirichlet problem (3). The solution of this Dirichlet problem is thus constant in s, say g_s(v) = f(v) for all s ∈ [0, 1) and v ∈ V. The only distributions whose inverse CDFs are of this form are δ-distributions; hence ρ_v(x) = δ(x − f(v)), as desired.

3.2. Application to Smoothing

Using the connection to the classical Dirichlet problem in Proposition 2, we can extend our treatment to other differential equations. There is a large space of differential equations that have been adapted to graphs via the discrete Laplacian ∆; here we focus on the heat equation, considered e.g. in Chung et al. (2007).

The heat equation for scalar functions is applied to smoothing problems; for example, in R^n solving the heat equation is equivalent to Gaussian convolution. Just as the Dirichlet equation on F^{-1} is equivalent to Wasserstein propagation, heat diffusion on F^{-1} is equivalent to gradient flow of the energy E_D in (1), providing a straightforward way to understand and implement such a diffusive process.

Proposition 5. Let ρ : V → Prob(R) be a distribution-valued map, and let F_v^{-1} : [0, 1] → R be the inverse CDF of ρ_v for each v ∈ V. Then these two procedures are equivalent:

• Mass-preserving flow of ρ in the direction of steepest descent of the Dirichlet energy.
• Heat flow of the inverse CDFs.

Proof. A mass-preserving flow of ρ is a family of distribution-valued maps ρ_ε : V → Prob(R) with ε ∈ (−ε_0, ε_0) that satisfies the equations

\[ \frac{\partial \rho_{v,\varepsilon}(t)}{\partial \varepsilon} + \frac{\partial}{\partial t}\big[ Y_v(\varepsilon, t)\, \rho_{v,\varepsilon}(t) \big] = 0 \;\; \forall v \in V, \qquad \rho_{v,0}(t) = \rho_v(t), \]

where Y_v : (−ε_0, ε_0) × R → R is an arbitrary function that governs the flow. By applying the change of variables t = F_{v,ε}^{-1}(s) using the inverse CDFs of the ρ_{v,ε}, we find that this flow is equivalent to the equations

\[ \frac{\partial F_{v,\varepsilon}^{-1}(s)}{\partial \varepsilon} = Y_v\big(\varepsilon, F_{v,\varepsilon}^{-1}(s)\big) \;\; \forall v \in V, \qquad F_{v,0}^{-1}(s) = F_v^{-1}(s). \]
A short calculation starting from (1) now leads to the derivative of the Dirichlet energy under such a flow, namely

\[ \frac{d E_D(\rho_\varepsilon)}{d\varepsilon} = -2 \sum_{v \in V} \int_0^1 \Delta\big(F_{v,\varepsilon}^{-1}(s)\big) \cdot Y_v\big(\varepsilon, F_{v,\varepsilon}^{-1}(s)\big) \, ds. \]

Thus, steepest descent for the Dirichlet energy is achieved by choosing Y_v(ε, F_{v,ε}^{-1}(s)) := ∆(F_{v,ε}^{-1}(s)) for each v, ε, s. As a result, the equation for the evolution of F_{v,ε}^{-1} becomes

\[ \frac{\partial F_{v,\varepsilon}^{-1}(s)}{\partial \varepsilon} = \Delta\big(F_{v,\varepsilon}^{-1}(s)\big) \;\; \forall v \in V, \qquad F_{v,0}^{-1}(s) = F_v^{-1}(s), \]

which is exactly heat flow of F_{v,ε}^{-1}.
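In the discrete setting this flow is straightforward to simulate: sample each inverse CDF on an evenly-spaced grid and evolve the samples with the graph Laplacian. The sketch below is a minimal illustration rather than the authors' implementation; it assumes the sign convention ∆ = W − D (so that heat flow smooths), and the `implicit` branch mirrors the (I − t∆)^{-1} stepping described in §5.

```python
import numpy as np

def graph_delta(W):
    """Graph operator with the sign convention Delta = W - D, so that
    harmonic functions average their neighbors and heat flow is smoothing."""
    return W - np.diag(W.sum(axis=1))

def heat_flow_inverse_cdfs(F_inv, W, step=0.1, n_steps=50, implicit=True):
    """Smooth a field of inverse CDFs by heat flow on the graph.

    F_inv : (n_vertices, m) array; row v holds samples of F_v^{-1} on a grid of s values.
    W     : (n_vertices, n_vertices) symmetric nonnegative weight matrix.
    """
    delta = graph_delta(W)
    F = np.array(F_inv, dtype=float)
    if implicit:
        # Implicit Euler: solve (I - step * Delta) F_new = F at every iteration.
        A = np.eye(len(W)) - step * delta
        for _ in range(n_steps):
            F = np.linalg.solve(A, F)
    else:
        # Explicit Euler gradient step on the Dirichlet energy.
        for _ in range(n_steps):
            F = F + step * (delta @ F)
    # With implicit stepping (or a small enough explicit step), each row stays
    # non-decreasing in s, so the evolved samples remain valid inverse CDFs.
    return F
```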
4. Generalization

Our preceding discussion involves distribution-valued maps into Prob(R), but in a more general setting we might wish to replace Prob(R) with Prob(Γ) for an alternative domain Γ carrying a distance metric d. Our original formulation of Wasserstein propagation easily handles such an extension by replacing |x − y|² with d(x, y)² in the definition of W_2. Furthermore, although proofs in this case are considerably more involved, some key properties proved above for Prob(R) extend naturally.

In this case, we can no longer rely on the computational benefits of Propositions 2 and 5, but we can solve the propagation problem directly. If Γ is discrete, then Wasserstein distances between the ρ_v's can be computed using a linear program. Suppose we represent two histograms as {a_1, . . . , a_m} and {b_1, . . . , b_m} with a_i, b_i ≥ 0 for all i and Σ_i a_i = Σ_i b_i = 1. Then, the definition of W_2 yields the optimization

\[ W_2^2(\{a_i\}, \{b_j\}) = \min_{x} \sum_{ij} d_{ij}^2 x_{ij} \quad \text{s.t.} \quad \sum_j x_{ij} = a_i \;\forall i, \quad \sum_i x_{ij} = b_j \;\forall j, \quad x_{ij} \ge 0 \;\forall i, j. \qquad (4) \]

Here d_{ij} is the distance from bin i to bin j, which need not be proportional to |i − j|.

From this viewpoint, the energy E_D from (1) remains convex in ρ and can be optimized using a linear program simply by summing terms of the form (4) above:

\[
\begin{aligned}
\min_{\rho, x} \; & \sum_{e \in E} \sum_{ij} d_{ij}^2 x_{ij}^{(e)} \\
\text{s.t.} \; & \sum_j x_{ij}^{(e)} = \rho_{vi} \quad \forall e = (v, w) \in E,\; i \in S \\
& \sum_i x_{ij}^{(e)} = \rho_{wj} \quad \forall e = (v, w) \in E,\; j \in S \\
& \sum_i \rho_{vi} = 1 \;\; \forall v \in V, \qquad \rho_{vi} \text{ fixed } \forall v \in V_0 \\
& \rho_{vi} \ge 0 \;\; \forall v \in V,\; i \in S, \qquad x_{ij}^{(e)} \ge 0 \;\; \forall i, j \in S,
\end{aligned}
\]

where S = {1, . . . , m}.
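As a sanity check of formulation (4), here is a direct translation into a generic LP solver (an illustrative sketch, not the authors' implementation; it uses SciPy's `linprog`, and the equality constraints are exactly the two marginal conditions above):

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein2_squared_lp(a, b, D):
    """Solve problem (4): W_2^2 between histograms a and b on a discrete domain.

    a, b : length-m nonnegative vectors summing to 1.
    D    : (m, m) matrix of pairwise ground distances d_ij.
    """
    m = len(a)
    cost = (D ** 2).reshape(-1)          # objective sum_ij d_ij^2 x_ij, x flattened row-major

    # Row-marginal constraints: sum_j x_ij = a_i for each i.
    A_rows = np.zeros((m, m * m))
    for i in range(m):
        A_rows[i, i * m:(i + 1) * m] = 1.0
    # Column-marginal constraints: sum_i x_ij = b_j for each j.
    A_cols = np.zeros((m, m * m))
    for j in range(m):
        A_cols[j, j::m] = 1.0

    A_eq = np.vstack([A_rows, A_cols])
    b_eq = np.concatenate([a, b])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

# Example on the unit circle: d_ij is the arc length between m evenly-spaced bins,
# which is not proportional to |i - j|.
m = 8
theta = 2 * np.pi * np.arange(m) / m
diff = np.abs(theta[:, None] - theta[None, :])
D = np.minimum(diff, 2 * np.pi - diff)   # circular ground distance
a = np.full(m, 1.0 / m)                  # uniform histogram
b = np.zeros(m)
b[0] = 1.0                               # point mass at angle 0
print(wasserstein2_squared_lp(a, b, D))
```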
5. Algorithm Details

We handle the general case from §4 by optimizing the linear programming formulation directly. Given the size of these linear programs, we use large-scale barrier method solvers.

The characterizations in Propositions 2 and 5, however, suggest a straightforward discretization and accompanying set of optimization algorithms in the linear case. In fact, we can recover propagated distributions by inverting the graph Laplacian ∆ via a sparse linear solve, leading to near-real-time results for moderately-sized graphs G.

For a given graph G = (V, E) and subset V_0 ⊆ V, we discretize the domain [0, 1] of F_v^{-1} for each v using a set of evenly-spaced samples s_0 = 0, s_1, . . . , s_m = 1. This representation supports any ρ_v provided it is possible to sample the inverse CDF from Proposition 1 at each s_i. In particular, when the underlying distributions are histograms, we model ρ_v using δ functions at evenly-spaced bin centers, which have piecewise constant CDFs; we model continuous ρ_v using piecewise linear interpolation. Regardless, in the end we obtain a non-decreasing set of samples (F^{-1})_v^1, . . . , (F^{-1})_v^m with (F^{-1})_v^1 = 0 and (F^{-1})_v^m = 1.

Now that we have sampled F_v^{-1} for each v ∈ V_0, we can propagate to the remainder V \ V_0. For each i ∈ {1, . . . , m}, we solve the system from (3):

\[ \Delta g = 0 \;\; \forall v \in V \setminus V_0, \qquad g(v) = (F^{-1})_v^i \;\; \forall v \in V_0. \qquad (5) \]

In the diffusion case, we replace this system with implicit time stepping for the heat equation, iteratively applying (I − t∆)^{-1} to g for diffusion time step t. In either case, the linear solve is sparse, symmetric, and positive definite; we apply Cholesky factorization to solve the systems directly.

This process propagates F^{-1} to the entire graph, yielding samples (F^{-1})_v^i for all v ∈ V. We invert once again to yield samples ρ_v^i for all v ∈ V. Of course, each inversion incurs some potential for sampling and discretization error, but in practice we are able to oversample sufficiently to overcome most potential issues. When the inputs ρ_v are discrete histograms, we return to this discrete representation by integrating the resulting ρ_v ∈ Prob([0, 1]) over the width of the bin about the center defined above.

This algorithm is efficient even on large graphs and is easily parallelizable. For instance, the initial sampling steps for obtaining F^{-1} from ρ are parallelizable over v ∈ V_0, and the linear solve (5) can be parallelized over samples i. Direct solvers can be replaced with iterative solvers for particularly large graphs.
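Putting the steps of this section together, a minimal dense-matrix version of the linear-case pipeline could look like the following (a sketch under assumptions rather than the authors' code: helper names are hypothetical, the dense solve stands in for the sparse Cholesky factorization mentioned above, and the oversampling safeguards are omitted).

```python
import numpy as np

def propagate_inverse_cdfs(W, labeled, F_inv_labeled):
    """Wasserstein propagation in the linear (Prob(R)) case.

    W              : (n, n) symmetric weight matrix of the graph.
    labeled        : indices of the vertices in V_0.
    F_inv_labeled  : (len(labeled), m) inverse-CDF samples at the labeled vertices.
    Returns an (n, m) array of propagated inverse-CDF samples.
    """
    n = len(W)
    m = F_inv_labeled.shape[1]
    L = np.diag(W.sum(axis=1)) - W              # combinatorial Laplacian (= -Delta here)
    labeled = np.asarray(labeled)
    free = np.setdiff1d(np.arange(n), labeled)  # vertices in V \ V_0

    F_inv = np.zeros((n, m))
    F_inv[labeled] = F_inv_labeled

    # System (5): harmonic extension of each sample index i. All m right-hand
    # sides are solved at once; L restricted to the free vertices is symmetric
    # positive definite for a connected graph with nonempty V_0.
    rhs = -L[np.ix_(free, labeled)] @ F_inv_labeled
    F_inv[free] = np.linalg.solve(L[np.ix_(free, free)], rhs)
    return F_inv

def histogram_from_inverse_cdf(F_inv_row, bin_edges):
    """Invert a row of inverse-CDF samples back to a histogram by counting how
    many quantile samples fall in each bin (a discrete stand-in for integrating
    the recovered distribution over the bin widths)."""
    counts, _ = np.histogram(F_inv_row, bins=bin_edges)
    return counts / counts.sum()
```

For large graphs the dense solve would be replaced by the sparse Cholesky factorization or iterative solvers mentioned above.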
Boundary Value Problems   Figure 3 illustrates our algorithm on a less trivial graph G. To mimic a typical test case for classical Dirichlet problems, our graph is a mesh of the

Alternative Target Domain   Figure 5 shows an example in which the target is Prob(S¹), where S¹ is the unit circle, rather than Prob([0, 1]). We optimize E_D using the
Figure 6. We propagate histograms of temperatures collected over time to a map of the United States: (a) Average error at propagated sites
as a function of the number of nodes with labeled distributions; (b) means of the histograms at the propagated sites from a typical trial in
(a); (c) standard deviations at the propagated sites. Vertices with prescribed distributions are shown in blue and comprise ∼ 2% of V .
Figure 7. (a) Interpolating histograms of wind directions using the PDF and Wasserstein propagation methods, illustrated using the same
scheme as Figure 5; (b) entropy values from the same distributions.