
SWIFT: Scalable Wasserstein Factorization for Sparse Nonnegative Tensors

Ardavan Afshar¹  Kejing Yin²  Sherry Yan³  Cheng Qian⁴  Joyce C. Ho⁵  Haesun Park¹  Jimeng Sun⁶

¹ Georgia Institute of Technology   ² Hong Kong Baptist University   ³ Sutter Health   ⁴ IQVIA
⁵ Emory University   ⁶ University of Illinois at Urbana-Champaign
Background: CP Tensor Factorization

CP factorization¹ approximates a tensor \mathcal{X} as the sum of R rank-one tensors:

\mathcal{X} \approx \hat{\mathcal{X}} = [\![ A^{(1)}, A^{(2)}, \dots, A^{(N)} ]\!] = \sum_{r=1}^{R} a_r^{(1)} \circ a_r^{(2)} \circ \dots \circ a_r^{(N)},

• A^{(n)}: the factor matrix for the n-th mode.
• a_r^{(n)}: the r-th column of A^{(n)}.

[Figure: an example of CP factorization applied to a patient-by-diagnosis-by-medication tensor.]

• It is widely used in various applications, e.g., healthcare data analytics.


• It is highly interpretable: each rank-one tensor can be treated as a latent factor.

1 Tamara G Kolda and Brett W Bader. “Tensor decompositions and applications”. In: SIAM Review (2009).
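
To make the notation concrete, here is a minimal NumPy sketch (our own illustration, not the authors' code) of assembling a rank-R CP reconstruction from a list of factor matrices:

```python
# Minimal NumPy sketch: assembling a rank-R CP reconstruction from factor matrices.
# Shapes and variable names are illustrative, not taken from the SWIFT code base.
import numpy as np

def cp_reconstruct(factors):
    """Return sum_r a_r^(1) o a_r^(2) o ... o a_r^(N) for factors A^(n) of shape (I_n, R)."""
    R = factors[0].shape[1]
    shape = tuple(A.shape[0] for A in factors)
    X_hat = np.zeros(shape)
    for r in range(R):
        # The outer product of the r-th columns gives one rank-one component.
        component = factors[0][:, r]
        for A in factors[1:]:
            component = np.multiply.outer(component, A[:, r])
        X_hat += component
    return X_hat

# Example: a third-order patient-by-diagnosis-by-medication style tensor with R = 4.
rng = np.random.default_rng(0)
A1, A2, A3 = (rng.random((I, 4)) for I in (30, 10, 12))
print(cp_reconstruct([A1, A2, A3]).shape)  # (30, 10, 12)
```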

Motivation

Existing tensor factorization models assume certain distributions of input, for example:

• Gaussian distribution: \min_{\hat{\mathcal{X}}} \|\mathcal{X} - \hat{\mathcal{X}}\|_F^2 ← MSE loss²
• Poisson distribution: \min_{\hat{\mathcal{X}}} \hat{\mathcal{X}} - \mathcal{X} * \log(\hat{\mathcal{X}}) ← KL divergence³
• Bernoulli distribution: \min_{\hat{\mathcal{X}}} \log(1 + e^{\hat{\mathcal{X}}}) - \mathcal{X} * \hat{\mathcal{X}} ← logit loss⁴

Do we always know the distribution of a given input tensor?


• Real-world data often have very complex distributions.
• We usually do not know the underlying distribution of the input tensor.

2 J Carroll and J Chang. “Analysis of individual differences in multidimensional scaling via an N-way generalization of “Eckart-Young” decomposition”. In: Psychometrika (1970).
3 E Chi and T Kolda. “On tensors, sparsity, and nonnegative factorizations”. In: SIAM Journal on Matrix Analysis and Applications (2012).
4 D Hong, T Kolda, and J Duersch. “Generalized canonical polyadic tensor decomposition”. In: SIAM Review (2020).

Motivation

Instead of assuming a specific distribution, the Wasserstein distance can be an alternative.

• a.k.a. Earth Mover's Distance (EMD);
• is a potentially better measure of the difference between two distributions;
• does not assume any particular distribution of the input data; and
• can leverage the correlation structure within each mode through the choice of cost matrix.

Preliminaries: Wasserstein Distance and Optimal Transport

Definition (Wasserstein distance between vectors)

The Wasserstein distance between probability vectors a and b is defined as

W(a, b) = \langle C, T \rangle,    (1)

• C is the cost matrix, where c_{ij} is the cost of moving a_i to b_j.
• T \in U(a, b) is an Optimal Transport (OT) solution between a and b.
• U(a, b) = \{ T \in \mathbb{R}_+^{n \times m} \mid T 1_m = a,\ T^\top 1_n = b \} is the feasible set of the OT problem.

Solving this OT problem is very expensive⁵: it has a complexity of O(n³).

5 Gabriel Peyré, Marco Cuturi, et al. “Computational optimal transport”. In: Foundations and Trends® in Machine Learning (2019).

Preliminaries: Wasserstein Distance and Optimal Transport

An efficient alternative:

Definition (Entropy-regularized OT problem⁶)

The entropy-regularized OT problem is defined as:

W_V(a, b) = \min_{T \in U(a,b)} \langle C, T \rangle - \frac{1}{\rho} E(T),    (2)

where E(T) = -\sum_{i,j=1}^{M,N} t_{ij} \log(t_{ij}) is the entropy of T.

• It is strictly convex with a unique solution.
• It can be tackled with scaling vectors u and v such that \operatorname{diag}(u) \exp(-\rho C) \operatorname{diag}(v) \in U(a, b).
• The optimal u and v can be computed via Sinkhorn's algorithm⁷.
6 Marco Cuturi. “Sinkhorn distances: Lightspeed computation of optimal transport”. In: Advances in Neural Information Processing Systems. 2013.
7 Richard Sinkhorn and Paul Knopp. “Concerning nonnegative matrices and doubly stochastic matrices”. In: Pacific Journal of Mathematics (1967)
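
For concreteness, here is a minimal Sinkhorn sketch for problem (2) under our own variable names: with the regularization written as ⟨C, T⟩ − (1/ρ)E(T), the Gibbs kernel is exp(−ρC), and the optimal plan has the form diag(u) exp(−ρC) diag(v). The value of ρ, the fixed iteration count, and the toy data are illustrative assumptions.

```python
# Sketch of Sinkhorn iterations for the entropy-regularized OT problem (2).
# With the regularization written as <C, T> - (1/rho) E(T), the Gibbs kernel is
# K = exp(-rho * C) and the optimal plan has the form diag(u) K diag(v).
# rho, the fixed iteration count, and the toy data below are illustrative assumptions.
import numpy as np

def sinkhorn(a, b, C, rho=50.0, n_iters=500):
    K = np.exp(-rho * C)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                  # enforce the row marginal    T 1 = a
        v = b / (K.T @ u)                # enforce the column marginal T^T 1 = b
    T = u[:, None] * K * v[None, :]      # transport plan diag(u) K diag(v)
    return np.sum(T * C), T              # <C, T> and the plan itself

# Toy example: two probability vectors on a 1-D grid with cost |x_i - y_j|.
grid = np.linspace(0.0, 1.0, 20)
C = np.abs(grid[:, None] - grid[None, :])
a = np.ones(20) / 20
b = np.exp(-((grid - 0.7) ** 2) / 0.01)
b /= b.sum()
cost, _ = sinkhorn(a, b, C)
print(round(cost, 4))
```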

Challenges

However, applying the Wasserstein distance to tensor factorization is challenging:

1. The Wasserstein distance is not well-defined for tensors: it is well-defined for vectors, yet vectorizing a tensor yields an extremely large vector, making the resulting OT problem infeasible to solve.

2. The Wasserstein distance is difficult to scale: it requires solving many OT problems in each iteration, which is extremely time-consuming.

3. Real-world inputs are often large, sparse, and nonnegative: efficient algorithms are possible only when the sparsity structure is fully exploited.

Our Contributions

Contribution 1: Defining the Wasserstein Tensor Distance

• SWIFT is the first work to define a Wasserstein distance for tensors.
• It does not assume any particular distribution.
• Therefore, it can handle nonnegative inputs, including binary, count, and real-valued data.

Contribution 2: Formulating Wasserstein Tensor Factorization

• We propose the SWIFT model, which minimizes the Wasserstein distance between the input tensor and its CP reconstruction.

Contribution 3: Efficiently Solving Wasserstein Tensor Factorization

• SWIFT exploits the sparsity structure of the input and reduces the number of OT problems that must be solved.
• It further reduces the computational time by efficiently rearranging its sub-problems.
• As a result, it achieves a 921× speedup over a naive implementation.
Defining Wasserstein Tensor Distance

We first define the Wasserstein distance for matrices by summing the vector Wasserstein distance over their columns:

Definition (Wasserstein Matrix Distance)

Given a cost matrix C \in \mathbb{R}_+^{M \times M}, the Wasserstein distance between two matrices A = [a_1, \dots, a_P] \in \mathbb{R}_+^{M \times P} and B = [b_1, \dots, b_P] \in \mathbb{R}_+^{M \times P} is denoted by W_M(A, B) and given by:

W_M(A, B) = \sum_{p=1}^{P} W_V(a_p, b_p) = \min_{T \in U(A, B)} \langle C, T \rangle - \frac{1}{\rho} E(T),    (3)

where C = [C, \dots, C] (P copies), T = [T_1, \dots, T_p, \dots, T_P], and the feasible set U(A, B) is given by:

U(A, B) = \big\{ T \in \mathbb{R}_+^{M \times MP} \mid T_p 1_M = a_p,\ T_p^\top 1_M = b_p\ \forall p \big\} = \big\{ T \in \mathbb{R}_+^{M \times MP} \mid \Delta(T) = A,\ \Psi(T) = B \big\},    (4)

where \Delta(T) = [T_1 1_M, \dots, T_P 1_M] = T (I_P \otimes 1_M), \Psi(T) = [T_1^\top 1_M, \dots, T_P^\top 1_M], and 1_M is the all-ones vector of size M.
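
As a quick sanity check on the notation in Eq. (4), here is a small NumPy sketch (helper names are ours) of the marginal operators ∆ and Ψ acting on a stacked plan T = [T_1, …, T_P]:

```python
# Small NumPy sketch of the marginal operators in Eq. (4); helper names are ours.
# T_bar stacks P transport plans of size M x M side by side, giving an M x MP matrix.
import numpy as np

def delta(T_bar, M, P):
    """Row marginals: Delta(T_bar) = [T_1 1_M, ..., T_P 1_M] = T_bar (I_P kron 1_M)."""
    return T_bar @ np.kron(np.eye(P), np.ones((M, 1)))

def psi(T_bar, M, P):
    """Column marginals: Psi(T_bar) = [T_1^T 1_M, ..., T_P^T 1_M]."""
    return np.hstack([T_bar[:, p * M:(p + 1) * M].T @ np.ones((M, 1)) for p in range(P)])

M, P = 4, 3
T_bar = np.random.default_rng(1).random((M, M * P))
# Both marginal maps return M x P matrices; for a feasible T_bar they equal A and B.
print(delta(T_bar, M, P).shape, psi(T_bar, M, P).shape)  # (4, 3) (4, 3)
```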

Defining Wasserstein Tensor Distance

Then we can define the Wasserstein distance for tensors by summing the matrix Wasserstein distance over the matricizations along each mode of the tensor:

Definition (Wasserstein Tensor Distance)

The Wasserstein distance between an N-th order tensor \mathcal{X} \in \mathbb{R}_+^{I_1 \times \dots \times I_N} and its reconstruction \hat{\mathcal{X}} \in \mathbb{R}_+^{I_1 \times \dots \times I_N} is denoted by W_T(\hat{\mathcal{X}}, \mathcal{X}):

W_T(\hat{\mathcal{X}}, \mathcal{X}) = \sum_{n=1}^{N} W_M\big( \hat{X}_{(n)}, X_{(n)} \big) \equiv \sum_{n=1}^{N} \min_{T_n \in U(\hat{X}_{(n)}, X_{(n)})} \Big\{ \langle C_n, T_n \rangle - \frac{1}{\rho} E(T_n) \Big\},    (5)

where X_{(n)} \in \mathbb{R}_+^{I_n \times I_{(-n)}} is the mode-n matricization of \mathcal{X}, C_n = [C_n, C_n, \dots, C_n] \in \mathbb{R}_+^{I_n \times I_n I_{(-n)}}, and T_n = [T_{n1}, \dots, T_{nj}, \dots, T_{n I_{(-n)}}] \in \mathbb{R}_+^{I_n \times I_n I_{(-n)}}. Here T_{nj} \in \mathbb{R}_+^{I_n \times I_n} is the transport plan between the columns \hat{X}_{(n)}(:, j) \in \mathbb{R}_+^{I_n} and X_{(n)}(:, j) \in \mathbb{R}_+^{I_n}.

The Wasserstein distance W_T(\mathcal{X}, \mathcal{Y}) defined above is a valid distance and satisfies the metric axioms of positivity, symmetry, and the triangle inequality.
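
Eq. (5) only requires the mode-n matricizations. Below is a small NumPy sketch of one common unfolding convention (the helper name is ours; any fixed column ordering works, provided \hat{\mathcal{X}} and \mathcal{X} are unfolded consistently):

```python
# Sketch of the mode-n matricization used in Eq. (5); the helper name is ours.
# One common convention: bring mode n to the front, then flatten the remaining modes.
import numpy as np

def unfold(X, mode):
    """Return X_(mode), of shape (I_mode, product of the remaining dimensions)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

X = np.arange(2 * 3 * 4).reshape(2, 3, 4)
for n in range(3):
    print(n, unfold(X, n).shape)   # (2, 12), (3, 8), (4, 6)
# W_T in Eq. (5) then sums W_M(unfold(X_hat, n), unfold(X, n)) over the N modes.
```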
Defining Wasserstein Tensor Distance

[Figure: illustration of the Wasserstein distances. Left: Wasserstein matrix distance; right: Wasserstein tensor distance.]

Wasserstein Tensor Factorization

SWIFT minimizes the Wasserstein tensor distance between the input and its CP reconstruction:

Optimization problem

\underset{\{A_n \ge 0,\ T_n\}_{n=1}^{N}}{\text{minimize}} \quad \sum_{n=1}^{N} \Big\{ \langle C_n, T_n \rangle - \frac{1}{\rho} E(T_n) \Big\}

subject to \quad \hat{\mathcal{X}} = [\![ A_1, \dots, A_N ]\!], \qquad T_n \in U(\hat{X}_{(n)}, X_{(n)}),\ n = 1, \dots, N

Constraint relaxation using the generalized KL-divergence

\underset{\{A_n \ge 0,\ T_n\}_{n=1}^{N}}{\text{minimize}} \quad \sum_{n=1}^{N} \Big\{ \langle C_n, T_n \rangle - \frac{1}{\rho} E(T_n) + \lambda \big[ \mathrm{KL}\big( \Delta(T_n) \,\|\, A_n (A^{(-n)})^\top \big) + \mathrm{KL}\big( \Psi(T_n) \,\|\, X_{(n)} \big) \big] \Big\}    (6)

The two KL penalty terms are referred to as parts P2 and P3, respectively.

We alternate between A_n and T_n to solve Eq. (6).


Efficient Algorithms: 1. Solving for OT Problems (Tn )

Note that T_n = [T_{n1}, \dots, T_{nj}, \dots, T_{n I_{(-n)}}] \in \mathbb{R}_+^{I_n \times I_n I_{(-n)}}.
The number of optimal transport problems to solve is I_{(-n)} = I_1 \times \dots \times I_{n-1} \times I_{n+1} \times \dots \times I_N.

Instead, we use the property of the OT solution, T_{nj}^{*} 1 = \operatorname{diag}(u_j) K_n v_j = u_j * (K_n v_j); therefore:

Proposition 2

\Delta(T_n) = [T_{n1} 1, \dots, T_{nj} 1, \dots, T_{n I_{(-n)}} 1] = U_n * (K_n V_n)    (7)

minimizes (6), where K_n = e^{-\rho C_n - 1} \in \mathbb{R}_+^{I_n \times I_n}, U_n = \big( \hat{X}_{(n)} \oslash (K_n V_n) \big)^{\Phi}, V_n = \big( X_{(n)} \oslash (K_n^\top U_n) \big)^{\Phi}, \Phi = \frac{\lambda \rho}{\lambda \rho + 1}, and \oslash denotes element-wise division (the power \Phi is also applied element-wise).
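
A vectorized sketch of our reading of these scaling updates (variable names, the iteration count, and the small epsilon guard are our assumptions): alternate U_n and V_n, then return ∆(T_n) = U_n ∗ (K_n V_n) without ever materializing the individual plans T_{nj}.

```python
# Vectorized sketch of the scaling updates behind Proposition 2 (our reading):
# alternate U_n and V_n, then form Delta(T_n) = U_n * (K_n V_n) without materializing
# the I_(-n) individual transport plans. Names, iteration count, and eps are ours.
import numpy as np

def delta_Tn(Xhat_n, X_n, C_n, rho=50.0, lam=10.0, n_iters=50, eps=1e-300):
    """Xhat_n, X_n: I_n x I_(-n) matricizations; C_n: I_n x I_n cost matrix."""
    phi = lam * rho / (lam * rho + 1.0)
    K = np.exp(-rho * C_n - 1.0)                # K_n = e^(-rho C_n - 1)
    U = np.ones_like(Xhat_n)
    V = np.ones_like(X_n)
    for _ in range(n_iters):
        U = (Xhat_n / (K @ V + eps)) ** phi     # U_n = (Xhat_(n) ./ (K_n V_n))^Phi
        V = (X_n / (K.T @ U + eps)) ** phi      # V_n = (X_(n)  ./ (K_n^T U_n))^Phi
    return U * (K @ V)                          # Delta(T_n) as in Eq. (7)

rng = np.random.default_rng(2)
I_n, I_rest = 6, 20
Xhat_n, X_n = rng.random((I_n, I_rest)), rng.random((I_n, I_rest))
C_n = rng.random((I_n, I_n)); C_n = 0.5 * (C_n + C_n.T); np.fill_diagonal(C_n, 0.0)
print(delta_Tn(Xhat_n, X_n, C_n).shape)         # (6, 20)
```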

Efficient Algorithms: 1. Solving for OT Problems (Tn )

Exploiting the sparsity structure to efficiently compute \Delta(T_n):

• Many columns of X_{(n)} are all zeros; they can be ignored when computing V_n.
• Besides, each column of V_n can be computed in parallel.

[Figure: the column-wise scaling updates involving \hat{X}_{(n)}, K_n, X_{(n)}, K_n^\top, and U_n (with element-wise powers \Phi) are computed in parallel.]
SWIFT exploits the sparsity structure of the input X_{(n)} and drops its all-zero columns.
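
A tiny sketch of that masking step (variable names are ours): only the nonzero columns of X_{(n)} enter the scaling updates, and results are scattered back to the original column positions afterwards.

```python
# Sketch of the sparsity shortcut: all-zero columns of X_(n) contribute nothing to V_n,
# so only the nonzero columns are kept in the scaling updates. Variable names are ours.
import numpy as np

rng = np.random.default_rng(3)
X_n = rng.random((6, 1000))
X_n[:, rng.random(1000) < 0.9] = 0.0          # a sparse unfolding: roughly 90% zero columns

nz = np.flatnonzero(X_n.any(axis=0))          # indices of columns with any nonzero entry
X_nz = X_n[:, nz]                             # dense block actually used in the updates
print(X_n.shape[1], "columns ->", X_nz.shape[1], "nonzero columns")
# After computing V_n (and the Delta/Psi terms) on X_nz, results are scattered back
# to the original column positions via the index array nz.
```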

Efficient Algorithms: 2. Updating CP factors (An )

Sub-problem for the CP factor matrices A_n

\underset{A_n \ge 0}{\text{minimize}} \quad \sum_{i=1}^{N} \mathrm{KL}\big( \Delta(T_i) \,\|\, A_i (A^{(-i)})^\top \big)    (8)

Challenge: A_n is also involved in the Khatri-Rao product A^{(-i)}.

To tackle this, we define a rearranging operator \Pi such that:

Efficient rearranging operation

\Pi\big( A_i (A^{(-i)})^\top, n \big) = A_n (A^{(-n)})^\top \in \mathbb{R}_+^{I_n \times I_{(-n)}} \quad \forall\ i \ne n.    (9)

In this way, the Khatri-Rao product term no longer contains the factor matrix A_n.
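
One way to realize an operator with the property in Eq. (9) is to refold the mode-i layout into the full tensor and unfold it along mode n, since both sides of (9) are unfoldings of the same reconstructed tensor. A self-consistent sketch (helper names are ours; the paper's implementation may differ):

```python
# One way to realize an operator with property (9): A_i (A^(-i))^T and A_n (A^(-n))^T are
# unfoldings of the same reconstructed tensor along different modes, so Pi can refold
# along mode i and unfold along mode n. A self-consistent sketch; the paper's actual
# implementation (and unfolding convention) may differ.
import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def fold(M, mode, shape):
    full = [shape[mode]] + [s for k, s in enumerate(shape) if k != mode]
    return np.moveaxis(M.reshape(full), 0, mode)

def rearrange(M_i, i, n, shape):
    """Pi(M_i, n): re-express a mode-i unfolding as the corresponding mode-n unfolding."""
    return unfold(fold(M_i, i, shape), n)

shape = (4, 5, 6)
X = np.arange(np.prod(shape)).reshape(shape)
# Sanity check: rearranging the mode-0 unfolding recovers the mode-2 unfolding.
print(np.array_equal(rearrange(unfold(X, 0), 0, 2, shape), unfold(X, 2)))  # True
```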

Efficient Algorithms: 2. Updating CP factors (An )

With this operator, the sub-problem is equivalent to:

Rearranged sub-problem for A_n

\underset{A_n \ge 0}{\text{minimize}} \quad \mathrm{KL}\left( \begin{bmatrix} \Pi(\Delta(T_1), n) \\ \vdots \\ \Pi(\Delta(T_i), n) \\ \vdots \\ \Pi(\Delta(T_N), n) \end{bmatrix} \,\middle\|\, \begin{bmatrix} A_n (A^{(-n)})^\top \\ \vdots \\ A_n (A^{(-n)})^\top \\ \vdots \\ A_n (A^{(-n)})^\top \end{bmatrix} \right)    (10)

With the rearranged objective function, the factor matrix A_n can be efficiently updated via multiplicative update rules⁸.

8 Daniel D Lee and H Sebastian Seung. “Algorithms for non-negative matrix factorization”. In: Advances in Neural Information Processing Systems. 2001.
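
For concreteness, here is a sketch of one Lee and Seung style multiplicative step for objective (10) in our notation, where each target matrix stands in for Π(∆(T_i), n) and B stands for the Khatri-Rao product A^{(−n)}; this is an illustration, not the paper's exact update.

```python
# Sketch of one Lee-and-Seung style multiplicative step for the KL objective (10),
# in our notation: each M_i stands in for Pi(Delta(T_i), n) and B for the Khatri-Rao
# product A^(-n). An illustrative step, not the paper's exact implementation.
import numpy as np

def mu_update_An(A_n, B, targets, eps=1e-12):
    """One multiplicative step for sum_i KL(M_i || A_n B^T) with respect to A_n >= 0."""
    WH = A_n @ B.T + eps                                       # current reconstruction
    numer = sum((M / WH) @ B for M in targets)                 # sum_i (M_i ./ (A_n B^T)) B
    denom = len(targets) * np.ones_like(A_n) * B.sum(axis=0)   # sum_i 1 1^T B
    return A_n * numer / (denom + eps)                         # stays nonnegative

rng = np.random.default_rng(4)
I_n, I_rest, R = 8, 30, 5
A_n, B = rng.random((I_n, R)), rng.random((I_rest, R))
targets = [rng.random((I_n, I_rest)) for _ in range(3)]        # stand-ins for Pi(Delta(T_i), n)
A_n = mu_update_An(A_n, B, targets)
print(A_n.shape, bool((A_n >= 0).all()))                       # (8, 5) True
```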

Experiments: Datasets and Evaluation Metrics

BBC News⁹

• a third-order count tensor of size 400 articles × 100 words × 100 words
• downstream task: article category classification, evaluated by accuracy.

Sutter

• a dataset collected from a large real-world health provider network
• a third-order binary tensor of size 1,000 patients × 100 diagnoses × 100 medications
• downstream task: heart failure onset prediction, evaluated by PR-AUC.

We use the pairwise cosine distance to compute the cost matrices for each mode of the two datasets.
9 Derek Greene and Pádraig Cunningham. “Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering”. In: International Conference on Machine
learning. 2006.
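
A small sketch of how such a pairwise cosine-distance cost matrix could be assembled for one mode (the random feature vectors are stand-ins, an assumption; in practice the features would be derived from that mode's data):

```python
# Sketch of building a cost matrix from pairwise cosine distances between the entities
# of one mode. The random feature matrix below is a stand-in (an assumption); in practice
# the features would come from the data associated with that mode.
import numpy as np

def cosine_cost(F, eps=1e-12):
    """F: (I_n, d) feature matrix, one row per entity of mode n."""
    Fn = F / (np.linalg.norm(F, axis=1, keepdims=True) + eps)
    C = 1.0 - Fn @ Fn.T                    # cosine distance = 1 - cosine similarity
    np.fill_diagonal(C, 0.0)               # zero cost of keeping mass in place
    return np.maximum(C, 0.0)              # clip tiny negative values from rounding

F = np.random.default_rng(5).random((100, 50))
C = cosine_cost(F)
print(C.shape, bool(C.min() >= 0))         # (100, 100) True
```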

Experiments: Baselines

We compare against the following tensor factorization models with different loss functions:

Model                  Loss Type    Underlying Distribution Assumption   Reference
CP-ALS                 MSE loss     Gaussian                             (Bader & Kolda 2007)
CP-NMU                 MSE loss     Gaussian                             (Bader & Kolda 2007)
Supervised CP          MSE loss     Gaussian                             (Kim et al. 2017)
Similarity-based CP    MSE loss     Gaussian                             (Kim et al. 2017)
CP-Continuous          Gamma loss   Gamma                                (Hong et al. 2020)
CP-Binary              Log loss     Bernoulli                            (Hong et al. 2020)
CP-APR                 KL loss      Poisson                              (Chi & Kolda 2012)

Experimental Results: Classification Performance

SWIFT outperforms all models consistently by a large margin.

Experimental Results: Classification Performance

Comparison against widely adopted classifiers:

                              Accuracy on BBC    PR-AUC on Sutter
Lasso Logistic Regression     .728 ± .013        .308 ± .033
Random Forest                 .628 ± .049        .318 ± .083
Multi-Layer Perceptron        .690 ± .052        .305 ± .054
K-Nearest Neighbor            .596 ± .067        .259 ± .067
SWIFT (R=5)                   .759 ± .013        .364 ± .063
SWIFT (R=40)                  .818 ± .020        .374 ± .044

SWIFT with a rank of 5 already outperforms all of the compared classifiers.

Experimental Results: Classification Performance on Noisy Data

We inject random noise into the BBC News data and run all models on the noisy data:

[Figure: classification accuracy on BBC News versus noise level (0.00–0.30) for CP-ALS, CP-NMU, Supervised CP, Similarity-based CP, CP-Gamma, CP-Binary, CP-APR, and SWIFT.]

SWIFT outperforms all baselines, especially for medium and high noise levels.

Experimental Results: Scalability of SWIFT

We set R = 40, switch off all parallelization in SWIFT for a fair comparison, and measure the running time of all models.

SWIFT is as scalable as other CP factorization models.

Experimental Results: Interpretability of SWIFT

We interpret the factor matrices learned on the Sutter dataset. Three examples follow:

• Each group (phenotype) contains clinically relevant diagnoses and medications.
• The weight indicates the lasso logistic regression coefficient for heart failure (HF) prediction.
• The first two groups are clinically relevant to HF, but the third is not.
• The clinical meaningfulness is endorsed by a medical expert.

SWIFT yields interpretable factor matrices.

Conclusion

• We define the Wasserstein distance between two tensors and propose SWIFT, a Wasserstein
tensor factorization model.

• We derive an efficient learning algorithm by exploiting the sparsity structure and introducing an efficient rearrangement operator.

• Empirical evaluations demonstrate that SWIFT consistently outperforms baselines in downstream prediction tasks, even in the presence of heavy noise.

• SWIFT is also shown to be scalable and interpretable.

Thank you!
All questions and comments are greatly appreciated!

