
CS 514 Advanced Topics in Network Science

Lecture 3. Matrix and Tensor


Hanghang Tong, Computer Science, Univ. Illinois at Urbana-Champaign, 2024
Network Science: An Overview
We are here
• network level (e.g., patterns, laws, connectivity, etc.)
• subgraph level (e.g., clusters, communities, dense subgraphs, etc.)
• node/link level (e.g., ranking, link prediction, embedding, etc.)

• Level 1: diameter, connectivity, graph-level classification, graph-level embedding, graph kernel, graph structure learning, graph generator,…
• Level 2: frequent subgraphs, clustering, community detection, motif, teams, dense subgraphs, subgraph matching, NetFair, …
• Level 3: node proximity, node classification, link prediction, anomaly detection, node embedding, network alignment, NetFair, …
• Beyond: network of X, …

2
Matrix & Tensor Tools
• Matrix Tools
– Proximity (covered in Lecture 2)
– Low-rank approximation
– Co-clustering
• Tensor Tools

3
Motivation
• Q: How to find patterns?
– e.g., communities, anomalies, etc.
• A (Common Approach): Low-Rank
Approximation (LRA) for Adjacency Matrix.
A ≈ L × M × R

4
Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos: Colibri: fast
mining of large static and dynamic graphs. KDD 2008: 686-694
LRA for Graph Mining
Author–conference adjacency matrix A (rows: authors; columns: conferences):

                ICDM  KDD  ISMB  RECOMB
John              1    1    0     0
Tom               1    1    0     0
Bob               1    1    0     0
Carl              0    1    1     1
Van               0    0    1     1
Roy               0    0    1     1
5
LRA for Graph Mining: Communities
Adj. matrix A ≈ L × M × R
• L: author–group matrix
• M: group–group interaction matrix
• R: conference–group matrix

6
LRA for Graph Mining: Anomalies
Adj. matrix A ≈ L × M × R (authors × conferences, as before)
Recon. error is high → ‘Carl’ is abnormal
7
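To make the anomaly-detection use of LRA concrete, here is a minimal sketch (assuming NumPy) on the toy author–conference matrix above, using a truncated SVD as one particular choice of the low-rank factors; the CUR/Colibri factorizations discussed later can be plugged in the same way.

```python
import numpy as np

# Toy author x conference adjacency matrix
# (rows: John, Tom, Bob, Carl, Van, Roy; columns: ICDM, KDD, ISMB, RECOMB).
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)

# Rank-2 approximation via truncated SVD (one possible choice of L, M, R).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Per-row reconstruction error: rows that fit neither community are anomalous.
row_err = np.linalg.norm(A - A_k, axis=1)
print(row_err)
print(int(np.argmax(row_err)))   # -> 3, i.e. Carl, who bridges both communities
```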
Challenges – Problem 1
• Prob. 1: Given a static graph A,
  + (C1) How to get (L, M, R) efficiently?
    - in both time and space
  + (C2) What is the interpretation of (L, M, R)?

8
Challenges – Problem 2
• Prob. 2: Given a dynamic graph A_t (t = 1, 2, …),
  + (C3) How to get (L_t, M_t, R_t) incrementally?
    - track patterns over time

9
Roadmap - LRA
• Motivation
• Survey: Existing Methods
– SVD
– CUR/CX
– CMD
• Proposed Methods: Colibri
• Experimental Results
• Conclusion
10
Overview

A ≈ L × M × R
Step 1: find L (this is where the methods differ)
Step 2: project A onto the column space of L to get the projection of A (this step is the same for the different methods)
11
Matrix & Vector
• Matrix B (rows: authors; columns: conferences):

                 SIGMOD  ICML
Philip Yu           3      1
William Cohen       1      1
John Smith          0      0

• Column vectors: SIGMOD = [3, 1, 0]’, ICML = [1, 1, 0]’

12
Column Space
• Matrix B as before (rows: Philip Yu, William Cohen, John Smith; columns: SIGMOD, ICML)
• Column space of B: all linear combinations of its columns, e.g.,
  VLDB = SIGMOD – ICML = [2, 0, 0]’

13
Projection & Projection Matrix
ṽ = B (B^T B)^+ B^T v

• v: an arbitrary vector (e.g., KDD)
• ṽ: the projection of v onto the column space of B (e.g., spanned by ICML and SIGMOD)
• B (B^T B)^+ B^T: the projection matrix of B
• (B^T B)^+: the core matrix
14


Projection of a Matrix

Ã = B (B^T B)^+ B^T A = L × M × R

• L = B
• M = (B^T B)^+ : the core matrix
• R = B^T A
• Ã: the projection of A onto the column space of B

15
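A small NumPy sketch of this projection, using a couple of sampled columns of A in the role of B (the choice of columns is arbitrary here); the final assert checks that the projection matrix B (B^T B)^+ B^T is idempotent.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))
B = A[:, [0, 2]]                   # two columns of A play the role of B

L = B
M = np.linalg.pinv(B.T @ B)        # core matrix (B^T B)^+
R = B.T @ A

A_tilde = L @ M @ R                # projection of A onto the column space of B

# Projecting twice changes nothing: the projection matrix is idempotent.
P = B @ M @ B.T
assert np.allclose(P @ P, P)
```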
Roadmap
• Motivation
• Survey: Existing Methods
– SVD
– CUR/CX
– CMD
• Proposed Methods: Colibri
• Experimental Results
• Conclusion
16
Singular-Value-Decomposition (SVD)
A ≈ U Σ V^T

• A: n × m
• U = [u_1, …, u_k]: left singular vectors
• Σ = diag(σ_1, …, σ_k)
• V = [v_1, …, v_k]: right singular vectors
17


SVD: definitions
• #1: Find the left matrix U, where
    u_i = A v_i / σ_i = (a_1 v_{i,1} + a_2 v_{i,2} + … + a_m v_{i,m}) / σ_i
• #2: Project A into the column space of U:
    A = U (U^T U)^+ U^T A = … = U Σ V^T

18
SVD: advantages
• Optimal Low-Rank Approximation
  – in both L2 and LF (the Frobenius norm)
  – for any rank-k matrix A_k:
      || A – Ã ||_{2,F} ≤ || A – A_k ||_{2,F}
    where Ã is the rank-k SVD approximation

19
SVD: drawbacks
• (C1) Efficiency A U  V
2 2
– Time O (min( n m, nm ))
[footnote: or O( E • Iter ) ] =
– Space (U, V) are dense

• (C2) Interpretation

20
SVD: drawbacks
• (C3) Dynamic: not easy
At Ut t Vt At+1 Ut+1 t+1 Vt+1

21
Roadmap
• Motivation
• Survey: Existing Methods
– SVD
– CUR/CX
– CMD
• Proposed Methods: Colibri
• Experimental Results
• Conclusion
22
CUR (CX) decomposition
[Drineas+ 2005]

A ≈ C (C^T C)^+ C^T A

• Sample columns from A
• Project A onto the column space of C
  – Left matrix: C
  – Middle matrix: (C^T C)^+
  – Right matrix: C^T A
(A: n × m; C: the sampled columns)
23
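A minimal NumPy sketch of this sample-then-project scheme. Sampling columns with probability proportional to their squared norms is one common choice; the exact probabilities and rescaling used in [Drineas+ 2005] differ in details, so treat this as an illustration rather than that algorithm verbatim.

```python
import numpy as np

def cx_decomposition(A, c, rng=np.random.default_rng(0)):
    """Sample c columns of A (with replacement, probability ~ squared norm),
    then project A onto their span: A ~ C (C^T C)^+ C^T A."""
    p = (A ** 2).sum(axis=0)
    p = p / p.sum()
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    C = A[:, idx]                    # left matrix (may contain duplicates)
    M = np.linalg.pinv(C.T @ C)      # middle matrix
    R = C.T @ A                      # right matrix
    return C, M, R

A = np.random.default_rng(1).random((100, 50))
C, M, R = cx_decomposition(A, c=10)
print(np.linalg.norm(A - C @ M @ R) / np.linalg.norm(A))
```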
CUR (CX): advantages
• (C0) Quality: Near-Optimal
• (C1) Efficiency (better than SVD)
– Time: O(c²n) or O(c³ + cm)
  • (c is the # of sampled columns)
– Space: (C, R) are sparse

• (C2) Interpretation

24
CUR (CX): drawbacks
• (C1) Redundancy in C

• 3 copies of green,
• 2 copies of red,
• 2 copies of purple
• purple=0.5*green + red…

25
Redundant Columns Do Not Help
[Figure: projecting KDD onto span{ICML, SIGMOD} vs. span{ICML, SIGMOD, VLDB} gives the same result, since VLDB is a linear combination of SIGMOD and ICML.]
Observations:
#1: The redundant column does not help (the projection of KDD is unchanged)
#2: It wastes time & space

26
CUR (CX): drawbacks
• (C3) Dynamic: not easy

[Figure: the sampled columns C at time t vs. time t+1; how should the decomposition be updated?]

27
Roadmap
• Motivation
• Survey: Existing Methods
– SVD
– CUR/CX
– CMD
• Proposed Methods: Colibri
• Experimental Results
• Conclusion
28
CMD [Sun+ 2007]
CUR (CX) vs. CMD on the same original matrix:
• CUR (CX): left matrix C, middle matrix (C^T C)^+, right matrix C^T A; C may contain duplicate sampled columns (3 copies of green, 2 copies of red, 2 copies of purple; purple = 0.5·green + red)
• CMD: the duplicate columns are deleted from C!


29
Challenges
• Can we do even better than CMD by removing the other types of redundancy (e.g., linearly dependent columns)?
• Can we efficiently track the LRA for time-evolving graphs?

30
Roadmap
• Motivation
• Survey: Existing Methods
• Proposed Methods: Colibri
– Colibri-S for static graphs (Problem 1)
– Colibri-D for dynamic graphs (Problem 2)
• Experimental Results
• Conclusion

31
Colibri-S: Basic Idea
CUR (CX) vs. Colibri-S on the same original matrix:
• CUR (CX): the sampled columns are redundant (3 copies of green, 2 copies of red, 2 copies of purple; purple = 0.5·green + red)
• Colibri-S: left matrix L, middle matrix (L^T L)^{-1}, right matrix L^T A
We want the columns in L to be linearly independent!
32
Q: How to find L & M from C efficiently?

33
A: Find L & M incrementally!
Start from the initially sampled matrix C and the current (L, M). For each column v in C:
• project v onto the column space of L
• if v is redundant (its projection reproduces it): discard v
• otherwise: expand L & M
34
Step 1: How to test if KDD is redundant ?

Project the candidate column KDD onto the span of the current columns L = [ICML, SIGMOD]:

  projection of KDD = L × M_old × (L^T × KDD)
  residual = KDD − projection of KDD

If the residual is (numerically) zero, KDD is redundant.
35
Step 2: How to update core matrix ?

M_old = ( [ICML SIGMOD]^T [ICML SIGMOD] )^{-1}

After adding the new column KDD:

M_new = ( [ICML SIGMOD KDD]^T [ICML SIGMOD KDD] )^{-1} = ?
36
Q: How to update core matrix?
A: Incrementally.
Theorem 1 [Tong et al., KDD 2008]
Let y = M_old × L^T × KDD (the coefficients of the projection of KDD onto the current columns L) and let δ be the squared norm of the residual, δ = KDD^T KDD − (L^T KDD)^T y. Then

  M_new = [ M_old + y y^T / δ     −y / δ ]
          [ −y^T / δ               1 / δ ]

We only need to know the projection of KDD (equivalently y) and δ!
37
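A compact NumPy sketch of the Colibri-S selection loop described on the last few slides: it walks over the initially sampled columns C, keeps only those that are not redundant, and maintains the core matrix M = (L^T L)^{-1} incrementally with the block-inverse update in the spirit of Theorem 1. Variable names and the tolerance are mine; the sampling that produces C is assumed to have happened already.

```python
import numpy as np

def colibri_s(C, tol=1e-10):
    """Keep only linearly independent columns of C; maintain M = (L^T L)^{-1}."""
    n = C.shape[0]
    L = np.empty((n, 0))
    M = np.empty((0, 0))
    for v in C.T:                              # each sampled column in turn
        if L.shape[1] == 0:
            d = v @ v
            if d > tol:
                L, M = v[:, None].copy(), np.array([[1.0 / d]])
            continue
        y = M @ (L.T @ v)                      # projection coefficients on span(L)
        delta = v @ v - (L.T @ v) @ y          # squared norm of the residual
        if delta <= tol:                       # redundant column: discard
            continue
        # Expand L and update the core matrix via the block-inverse formula.
        M = np.block([[M + np.outer(y, y) / delta, -y[:, None] / delta],
                      [-y[None, :] / delta,        np.array([[1.0 / delta]])]])
        L = np.hstack([L, v[:, None]])
    return L, M

A = np.random.default_rng(0).random((20, 8))
C = A[:, [0, 1, 0, 2, 1]]                      # sampled columns, with duplicates
L, M = colibri_s(C)
print(L.shape[1])                              # 3 independent columns kept
```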
Colibri-S vs. CUR (CMD)
• (C0) Quality: Colibri-S = CUR (CMD)
• (C1) Time: O(c̃³ + c̃m̃) vs. O(c³ + cm̃), where c̃ ≤ c and m̃ ≤ m
  – Example: if c̃ = 200 and c = 1000, Colibri-S is 125x faster!
  – Colibri-S is better than or equal to CUR (CMD)
• (C1) Space: Colibri-S is better than or equal to CUR (CMD)
• (C2) Interpretation: Colibri-S = CUR (CMD)
38
A Pictorial Comparison
[Scatter plot: each dot is a conference, with x-coordinate = its count for Philip Yu and y-coordinate = its count for William Cohen (the rows of the earlier matrix B).]


39
A Pictorial Comparison: SVD
[Scatter plot as before (x: Philip Yu, y: William Cohen; each dot is a conference); SVD picks the 1st and 2nd singular vectors as the new axes.]


40
A Pictorial Comparison: CUR
[Drineas+ 2005]

[Scatter plot as before (x: Philip Yu, y: William Cohen; each dot is a conference); CUR samples actual conference points, with duplicates (the 1x–4x labels show how many times each point was drawn).]


41
A Pictorial Comparison: CMD
[Sun+ 2007]

[Scatter plot as before (x: Philip Yu, y: William Cohen; each dot is a conference); CMD keeps the sampled conference points with the duplicates removed.]


42
A Pictorial Comparison: Colibri-S
[Tong+ 2008]

[Scatter plot as before (x: Philip Yu, y: William Cohen; each dot is a conference); Colibri-S keeps only a linearly independent subset of the sampled conference points.]


43
Roadmap
• Motivation
• Survey: Existing Methods
• Proposed Methods: Colibri
– Colibri-S for static graphs (Problem 1)
– Colibri-D for dynamic graphs (Problem 2)
• Experimental Results optional

• Conclusion

44
Problem Definition
• Given (e.g., Author-Conference Graphs)

A_1, A_2, A_3, …

• Find Incrementally

(L_1, M_1, R_1), (L_2, M_2, R_2), (L_3, M_3, R_3), …
45
Colibri-D for dynamic graphs

At time t: A_t ≈ L_t × M_t × R_t (from the initially sampled matrix)
At time t+1: A_{t+1} ≈ L_{t+1} × M_{t+1} × R_{t+1} = ?
Q: How to update L and M efficiently?
46


Colibri-D: How-To
At time t: the sampled columns split into selected ones (forming L_t) and redundant ones; (L_t, M_t, R_t) come from the initially sampled matrix.
At time t+1: the selected/redundant split may change. How do we obtain (L_{t+1}, M_{t+1}, R_{t+1})?

47
Colibri-D: How-To
Among the columns selected at time t, identify those that are unchanged at t+1. The subspace spanned by these unchanged columns gives L̃ and M̃ directly from (L_t, M_t); only the changed columns need to be re-processed to obtain L_{t+1} and M_{t+1}.
48
How to Get Core Matrix
for Un-changed Col.s ?
M^t = [ (L^t)^T L^t ]^{-1}     (known from time t)

M̃^t = [ (L̃^t)^T L̃^t ]^{-1} = ?     (L̃^t: the unchanged columns of L^t)
49
How to Get Core Matrix
for Un-changed Col.s ?
Let s be the # of changed columns in L^t, and partition M^t so that block 1 corresponds to the unchanged columns and block 2 to the changed columns.

Theorem 2 [Tong et al., KDD 2008]:

  M̃^t = M^t_{1,1} − M^t_{1,2} (M^t_{2,2})^{-1} M^t_{2,1}

We only need an s × s matrix inverse!


50
How to Get Core Matrix
for Un-changed Col.s
Let t be the # of unchanged columns in L^t and s the # of changed columns in L^t.

We only need a matrix inverse of size
- s × s, instead of t × t
- if s << t (a.k.a. “smooth” changes), we are faster
- example:
  + if s = 10 and t = 100, we are 1000x faster!

51
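A small NumPy sketch of this core-matrix update for the unchanged columns, using the block form of Theorem 2 (only an s × s inverse is needed); the final assert compares it against direct recomputation.

```python
import numpy as np

def core_for_unchanged(M, changed):
    """Given M = (L^T L)^{-1} and the indices of the changed columns of L,
    return (L_unchanged^T L_unchanged)^{-1} via an s x s inverse only."""
    keep = np.setdiff1d(np.arange(M.shape[0]), changed)
    M11 = M[np.ix_(keep, keep)]
    M12 = M[np.ix_(keep, changed)]
    M21 = M[np.ix_(changed, keep)]
    M22 = M[np.ix_(changed, changed)]
    return M11 - M12 @ np.linalg.inv(M22) @ M21

# Sanity check against recomputing the inverse from scratch.
rng = np.random.default_rng(0)
L = rng.random((30, 6))
M = np.linalg.inv(L.T @ L)
changed = [1, 4]
keep = [0, 2, 3, 5]
direct = np.linalg.inv(L[:, keep].T @ L[:, keep])
assert np.allclose(core_for_unchanged(M, changed), direct)
```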
Comparison of SVD [Golub+ 1989], CUR/CMD [Drineas+ 2005, Sun+ 2007], and Colibri [Tong+ 2008] along the wish list:
• (C0) Quality
• (C1) Efficiency
• (C2) Interpretation
• (C3) Dynamics

52
Roadmap
• Motivation
• Survey: Existing Methods
• Proposed Methods: Colibri
• Experimental Results
• Conclusion

53
Experimental Setup
• Data set
• Network traffic
• 21,837 sources/destinations
• 1,222 consecutive hours (~ 2 months)
• 22,800 edges per hour
• Accuracy
• Space cost

54
Performance of Colibri-S
[Bar charts: running time and space of SVD, CUR, CMD, and ours]
• Accuracy: same, 91%+
• Time: 12x faster than CMD, 28x faster than CUR
• Space: ~1/3 of CMD, ~10% of CUR
55
Performance of Colibri-D
[Plot: running time vs. # of changed columns, for CMD (the prior best method), Colibri-S, and Colibri-D]
• Network traffic: 21,837 nodes, 1,220 hours, 22,800 edges/hr
• Accuracy: same, 93%+
Colibri-D achieves up to 112x speedups
56
Conclusion: Colibri
• Colibri-S (for static graphs)
– Idea: remove redundancy
– Up to 52x speedup; 2/3 space saving
– No quality loss (w.r.t., CUR/CMD)
• Colibri-D (for dynamic graphs)
– Idea: leverage “smoothness”
– Up to 112x speedup over CMD

57
optional

• More on Matrix Low Rank Approximations

58
Graph Mining by Low-Rank Approximation

Q: How to get the low-rank matrix approximations?


59
optional

More on LRA
• Q0: SVD + example-based LRA
• Q1: Nonnegative Matrix Factorization
• Q2: Non-negative Residual Matrix Factorization
• Q3: Nuclear norm related technologies

60
Low Rank Approximation
• Nonnegative Matrix Factorization (NMF)

Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature 401, 788-791 (21 October 1999)
61
Nonnegative Matrix Factorization (NMF)

• Factorizing a nonnegative matrix into the product of two low-rank nonnegative matrices, X ≈ F G^T: F (the entire matrix) holds the r basis vectors, and each data point is represented by one row of G (its r coefficients).
62
NMF Solutions: Multiplicative Updates
• Multiplicative update method

Daniel D. Lee and H. Sebastian Seung (2001). Algorithms for Non-negative Matrix Factorization. NIPS 2001.
H. Zhou, K. Lange, and M. Suchard (2010). Graphical processing units and high-dimensional optimization. Statistical Science, 25:311-324.
63
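A minimal NumPy sketch of the Lee–Seung multiplicative updates for X ≈ F G^T under the Frobenius loss; the random initialization, iteration count, and small epsilon are arbitrary choices, not part of the published algorithm.

```python
import numpy as np

def nmf_mu(X, r, n_iter=200, eps=1e-9, rng=np.random.default_rng(0)):
    """Multiplicative updates: both factors stay nonnegative by construction."""
    m, n = X.shape
    F = rng.random((m, r))
    G = rng.random((n, r))
    for _ in range(n_iter):
        G *= (X.T @ F) / (G @ (F.T @ F) + eps)
        F *= (X @ G) / (F @ (G.T @ G) + eps)
    return F, G

X = np.random.default_rng(1).random((50, 30))       # nonnegative data
F, G = nmf_mu(X, r=5)
print(np.linalg.norm(X - F @ G.T) / np.linalg.norm(X))
```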
NMF Solutions: Alternating Nonnegative
Least Squares
• Initialize F and G with nonnegative values
• Iterate the following procedure:
  – Fixing F, solve for G: min_{G ≥ 0} ||X − F G^T||_F
  – Fixing G, solve for F: min_{F ≥ 0} ||X − F G^T||_F

(1) Projected Gradient: http://www.csie.ntu.edu.tw/~cjlin/nmf/


(2) Newton-type methods:
http://www.cs.utexas.edu/users/dmkim/Source/software/nnma/index.html
(3) Block Principal Pivoting: https://sites.google.com/site/jingukim/nmf_bpas.zip?attredirects=0

P. Paatero and U. Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error
estimates of data values. Environmetrics, 5(1):111–126, 1994
C.-J. Lin. Projected gradient methods for non-negative matrix factorization. Neural Computation,19(2007), 2756-2779.
D. Kim, S. Sra, I. S. Dhillon. Fast Newton-type Methods for the Least Squares Nonnegative Matrix Approximation Problem. SDM 2007.
J. Kim and H. Park. Toward Faster Nonnegative Matrix Factorization: A New Algorithm and Comparisons. ICDM 2008.
64
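A simple, unoptimized sketch of the alternating scheme above, solving each half-step exactly with SciPy's nnls solver one column at a time; the packages listed above implement far faster variants (projected gradient, Newton-type, block principal pivoting).

```python
import numpy as np
from scipy.optimize import nnls

def nmf_anls(X, r, n_iter=20, rng=np.random.default_rng(0)):
    """Alternating nonnegative least squares for X ~ F G^T."""
    m, n = X.shape
    F = rng.random((m, r))
    G = rng.random((n, r))
    for _ in range(n_iter):
        # Fix F, solve min_{G >= 0} ||X - F G^T||_F, one column of X at a time.
        G = np.array([nnls(F, X[:, j])[0] for j in range(n)])
        # Fix G, solve min_{F >= 0} ||X^T - G F^T||_F, one row of X at a time.
        F = np.array([nnls(G, X[i, :])[0] for i in range(m)])
    return F, G

X = np.random.default_rng(1).random((40, 25))
F, G = nmf_anls(X, r=4)
print(np.linalg.norm(X - F @ G.T) / np.linalg.norm(X))
```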
Application of NMF: Privacy-Aware On-line User
Role Tracking [AAAI11]
• Problem Definitions
– Given: the user-activity log that changes over time
– Monitor: (1) the user role/cluster; and (2) the role/cluster description.
• Design Objective
– (1) Privacy-aware; and (2) Efficiency (in both time and space).

65
Key Ideas
• Minimize an upper bound of the original/exact objective function:

  || X + ΔX − (F + ΔF)(G + ΔG)^T ||_F
    = || X + ΔX − F G^T − ΔF G^T − F ΔG^T − ΔF ΔG^T ||_F
    ≤ || X − F G^T ||_F                                    (depends on X, but fixed)
      + || ΔX − ΔF G^T − F ΔG^T − ΔF ΔG^T ||_F             (independent of X)

  so it suffices to solve

  min || ΔX − ΔF G^T − F ΔG^T − ΔF ΔG^T ||_F
  subject to: ΔF + F ≥ 0, ΔG + G ≥ 0

  which is independent of X and can be solved by the projected gradient descent method.


Fei Wang, Hanghang Tong, Ching-Yung Lin: Towards Evolutionary Nonnegative Matrix Factorization. AAAI 2011
66
Experimental Results

[Plots: running time vs. time stamp on several data sets. Red: our method; blue: the off-line method.]

Fei Wang, Hanghang Tong, Ching-Yung Lin: Towards Evolutionary Nonnegative Matrix Factorization. AAAI 2011
67
NMF: Extensions
• General loss
– Bregman Divergence
• Different constraints
– Semi-NMF, Convex NMF, Symmetric NMF
• Incorporating supervisions
– Pairwise constraints, label
• Multiple factorized matrices
– Tri-factorization
I. S. Dhillon and S. Sra. Generalized Nonnegative Matrix Approximations with Bregman Divergences. NIPS 2005.
Chris H. Q. Ding, Tao Li, Michael I. Jordan: Convex and Semi-Nonnegative Matrix Factorizations. IEEE Trans. Pattern Anal.
Mach. Intell. 32(1): 45-55 (2010)
Chris H. Q. Ding, Tao Li, Wei Peng, Haesun Park: Orthogonal nonnegative matrix t-factorizations for clustering. KDD 2006.
Fei Wang, Tao Li, Changshui Zhang: Semi-Supervised Clustering via Matrix Factorization. SDM 2008: 1-12
Yuheng Hu, Fei Wang, Subbarao Kambhampati. Listen to the Crowd: Automated Analysis of Live Events via Aggregated Twitter Sentiment. IJCAI 2013.
68
Graph Mining by Low-Rank Approximation

Q: How to get the low-rank matrix approximations?


69
A2: Non-negative Residual MF
• Observations: anomalies → actual activities
• Examples: popularity contest, port scanner, etc
• NrMF formulation

  – a weighted Frobenius-norm objective (the weighted Frobenius form and the weight are common to any MF)
  – the non-negative residual constraint is unique to NrMF

H. Tong, C. Lin: Non-Negative Residual Matrix Factorization with Application to Graph Anomaly Detection. SDM 2011
70
Visual Comparisons
[Figures: side-by-side comparison of the original graph, NrMF, and SVD on two examples.]

71
Low Rank Approximation
• Nonnegative Matrix Factorization
• Non-negative Residual Matrix Factorization
• Nuclear norm related technologies

72


Rank Minimization and Nuclear Norm
• Matrix completion with rank minimization: NP-hard

• Convex relaxation: replace the rank with the nuclear norm (the sum of the singular values)

M. Fazel, H. Hindi, S. Boyd. A Rank Minimization Heuristic with Application to Minimum Order System Approximation. Proceedings of the American Control Conference, 6:4734-4739, June 2001.
73
Nuclear Norm Minimization
• Singular Value Thresholding
– http://svt.stanford.edu/
• Accelerated gradient
– http://www.public.asu.edu/~jye02/Software/SLEP
/index.htm
• Interior point methods
  – http://abel.ee.ucla.edu/cvxopt/applications/nucnrm/
J-F. Cai, E.J. Candès and Z. Shen. A Singular Value Thresholding Algorithm for Matrix Completion. SIAM Journal on
Optimization. Volume 20 Issue 4, January 2010 Pages 1956-1982.
Shuiwang Ji and Jieping Ye. An Accelerated Gradient Method for Trace Norm Minimization. The Twenty-Sixth International
Conference on Machine Learning (ICML 2009)
Z. Liu, Lieven Vandenberghe. Interior-point method for nuclear norm approximation with application to system identification. SIAM Journal on Matrix Analysis and Applications (2009)
74
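A minimal sketch of the singular value thresholding operator (the proximal operator of the nuclear norm) inside a plain proximal-gradient completion loop; this is a simplified stand-in for the SVT / accelerated-gradient / interior-point solvers listed above, and tau and the iteration count are arbitrary.

```python
import numpy as np

def svt(X, tau):
    """Shrink the singular values of X by tau (prox of tau * nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def complete(M, mask, tau=2.0, n_iter=300):
    """Proximal gradient for 0.5*||mask*(X - M)||_F^2 + tau*||X||_* (step 1)."""
    X = np.zeros_like(M)
    for _ in range(n_iter):
        X = svt(X - mask * (X - M), tau)
    return X

# Tiny demo: recover a rank-2 matrix from ~60% of its entries.
rng = np.random.default_rng(0)
M = rng.random((30, 2)) @ rng.random((2, 30))
mask = rng.random(M.shape) < 0.6
X_hat = complete(M, mask)
print(np.linalg.norm((X_hat - M)[~mask]) / np.linalg.norm(M[~mask]))
```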
◼ From LRA to Co-clustering
Co-clustering
• Let X and Y be discrete random variables
– X and Y take values in {1, 2, …, m} and {1, 2, …, n}
– p(X, Y) denotes the joint probability distribution—if
not known, it is often estimated based on co-occurrence
data
– Application areas: text mining, market-basket analysis,
analysis of browsing behavior, etc.
• Key Obstacles in Clustering Contingency Tables
– High Dimensionality, Sparsity, Noise
– Need for robust and scalable algorithms

Reference:
1. Dhillon et al. Information-Theoretic Co-clustering, KDD’03
76
Example (e.g., terms × documents), from Dhillon et al.:

p(X, Y) =
  .05 .05 .05  0   0   0
  .05 .05 .05  0   0   0
   0   0   0  .05 .05 .05
   0   0   0  .05 .05 .05
  .04 .04  0  .04 .04 .04
  .04 .04 .04  0  .04 .04

The co-clustered approximation is p̂(X, Y) = p(X | X̂) p(X̂, Ŷ) p(Y | Ŷ):

  .5  0  0
  .5  0  0
   0 .5  0        .3  0        .36 .36 .28  0   0   0
   0 .5  0    ×    0 .3    ×    0   0   0  .28 .36 .36
   0  0 .5        .2 .2
   0  0 .5

=
  .054 .054 .042  0    0    0
  .054 .054 .042  0    0    0
   0    0    0   .042 .054 .054
   0    0    0   .042 .054 .054
  .036 .036 .028 .028 .036 .036
  .036 .036 .028 .028 .036 .036
77
The same example, annotated:
• rows 1–2: medical terms; rows 3–4: CS terms; rows 5–6: common terms
• columns 1–3: medical documents; columns 4–6: CS documents
• the three factors are the term × term-group matrix, the term-group × doc-group matrix, and the doc × doc-group matrix, and their product gives the approximation on the right.
78
Co-clustering
Observations
• uses KL divergence, instead of L2 or LF
• the middle matrix is not diagonal
– we’ll see that again in the Tucker tensor
decomposition

79
Matrix & Tensor Tools
• Matrix Tools
• Tensor Tools
– Tensor Basics
– Tucker
• Tucker 1
• Tucker 2
• Tucker 3
– PARAFAC

80
Tensor Basics
Reminder: SVD

A ≈ U Σ V^T     (A: m × n)

– Best rank-k approximation in L2 or LF

82
Reminder: SVD

A ≈ σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + …

– Best rank-k approximation in L2
(See also PARAFAC)
83


Goal: extension to >=3 modes

X (I × J × K) ≈ A (I × R), B (J × R), C (K × R) combined through an R × R × R core = a sum of rank-1 terms

84
Main points:
• 2 major types of tensor decompositions:
PARAFAC and Tucker
• both can be solved with ``alternating least
squares’’ (ALS)
• Details follow – we start with terminology:

85
[T. Kolda,’07]
A tensor is a multidimensional array
An I × J × K (3rd-order) tensor has entries x_{ijk}; mode 1 has dimension I, mode 2 has dimension J, mode 3 has dimension K.
• Fibers: column (mode-1), row (mode-2), tube (mode-3)
• Slices: horizontal, lateral, frontal
Note: the focus is on 3rd-order tensors, but everything can be extended to higher orders.

86
details [T. Kolda,’07]
Matricization: Converting a Tensor to a Matrix
X_(n): the mode-n fibers are rearranged to be the columns of a matrix (matricize/unfold maps (i, j, k) → (i′, j′); the reverse operation folds the matrix back into a tensor).
Example: a 2 × 2 × 2 tensor with frontal slices
  1 3        5 7
  2 4        6 8

87
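A one-function NumPy version of mode-n matricization in the Kolda/Bader ordering, applied to the 2 × 2 × 2 example above.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n matricization: mode-n fibers become columns (Fortran-order reshape)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1, order='F')

# The 2 x 2 x 2 example above, entered frontal slice by frontal slice.
X = np.zeros((2, 2, 2))
X[:, :, 0] = [[1, 3], [2, 4]]
X[:, :, 1] = [[5, 7], [6, 8]]
print(unfold(X, 0))   # [[1. 3. 5. 7.]
                      #  [2. 4. 6. 8.]]
```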
details

Tensor Mode-n Multiplication
• Tensor times matrix: multiply each fiber (e.g., each mode-1 / column fiber) by B
• Tensor times vector: compute the dot product of a with each fiber (e.g., each mode-2 / row fiber)
[T. Kolda,’07]
88
details

Mode-n product Example
• Tensor times a matrix
[Figure: a tensor with location and time modes, multiplied along the location mode by a clusters × location matrix, replaces the location mode by clusters.]
[T. Kolda,’07]
89
details

Mode-n product Example
• Tensor times a vector
[Figure: the same tensor multiplied along the location mode by a vector removes that mode, leaving a result indexed by time.]
[T. Kolda,’07]
90
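A NumPy sketch of the mode-n product (tensor times matrix), plus the tensor-times-vector contraction, on arrays with made-up dimensions.

```python
import numpy as np

def mode_n_product(X, M, mode):
    """Multiply every mode-n fiber of X by the matrix M, so X.shape[mode]
    becomes M.shape[0]."""
    return np.moveaxis(np.tensordot(M, X, axes=(1, mode)), 0, mode)

X = np.random.default_rng(0).random((4, 5, 6))
M = np.random.default_rng(1).random((3, 5))
print(mode_n_product(X, M, mode=1).shape)        # (4, 3, 6)

# Tensor times a vector: the contraction removes the mode entirely.
a = np.random.default_rng(2).random(5)
print(np.tensordot(X, a, axes=(1, 0)).shape)     # (4, 6)
```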
details
Outer, Kronecker, & Khatri-Rao Products
• 3-way outer product: three vectors form a rank-1 tensor
• Matrix Kronecker product: an M × N matrix and a P × Q matrix give an MP × NQ matrix
• Matrix Khatri-Rao product: column-wise Kronecker product; an M × R matrix and an N × R matrix give an MN × R matrix
[T. Kolda,’07]
91
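A small NumPy sketch of the Khatri-Rao (column-wise Kronecker) product, checked against per-column Kronecker products.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: (M x R) and (N x R) -> (MN x R)."""
    assert A.shape[1] == B.shape[1]
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

A = np.arange(6.0).reshape(3, 2)
B = np.arange(8.0).reshape(4, 2)
KR = khatri_rao(A, B)
print(KR.shape)                                            # (12, 2)
ref = np.column_stack([np.kron(A[:, r], B[:, r]) for r in range(2)])
assert np.allclose(KR, ref)
```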
Specially Structured Tensors
• Tucker tensor: an I × J × K tensor written as an R × S × T “core” multiplied in each mode by a factor matrix (U: I × R, V: J × S, …)
• Kruskal tensor: an I × J × K tensor written as a weighted sum of R rank-1 terms w_r u_r ∘ v_r ∘ …, i.e., a Tucker tensor with a superdiagonal R × R × R core
[T. Kolda,’07]
93
details

Specially Structured Tensors
• Tucker tensor, in matrix form: X_(1) = A G_(1) (C ⊗ B)^T
• Kruskal tensor, in matrix form: X_(1) = A diag(λ) (C ⊙ B)^T
[T. Kolda,’07]
94
Outline: Part 2
• Matrix Tools
• Tensor Tools
– Tensor Basics
– Tucker
• Tucker 1
• Tucker 2
• Tucker 3
– PARAFAC

95
Tensor Decompositions
Tucker Decomposition - intuition

X (I × J × K) ≈ G (R × S × T) ×_1 A (I × R) ×_2 B (J × S) ×_3 C (K × T)

• author x keyword x conference


• A: author x author-group
• B: keyword x keyword-group
• C: conf. x conf-group
• G: how groups relate to each other
97
Reminder
[The co-clustering example from the earlier slides: the terms × documents distribution factors into (term × term-group) × (term-group × doc-group) × (doc × doc-group)^T.]
98
Tucker Decomposition

X (I × J × K) ≈ G (R × S × T) ×_1 A (I × R) ×_2 B (J × S) ×_3 C (K × T)
Given A, B, C, the optimal core is G = X ×_1 A^T ×_2 B^T ×_3 C^T (recall the equations for converting a tensor to a matrix).
• Proposed by Tucker (1966)
• AKA: three-mode factor analysis, three-mode PCA, orthogonal array decomposition
• A, B, and C are generally assumed to be orthonormal (or at least to have full column rank)
• The core G is not diagonal
• Not unique

99
details

Tucker Variations
See Kroonenberg & De Leeuw, Psychometrika, 1980 for discussion.
• Tucker2: the mode-3 factor is the identity matrix:
    X (I × J × K) ≈ G (R × S × K) ×_1 A (I × R) ×_2 B (J × S)
• Tucker1: only one mode is compressed:
    X (I × J × K) ≈ G (R × J × K) ×_1 A (I × R)
  Finding principal components in only mode 1 can be solved via a rank-R matrix SVD.

100
details
Solving for Tucker
• Given A, B, C orthonormal, the optimal core is G = X ×_1 A^T ×_2 B^T ×_3 C^T. (The tensor norm is the square root of the sum of all the elements squared.)
• Eliminating the core: minimizing ||X − G ×_1 A ×_2 B ×_3 C|| subject to A, B, C orthonormal is equivalent to maximizing ||X ×_1 A^T ×_2 B^T ×_3 C^T|| (the fixed part of the objective drops out).
• If B and C are fixed, we can solve for A: the optimal A is the R leading left singular vectors of X_(1)(C ⊗ B).


101
details

Higher Order SVD (HO-SVD)
Compute each factor directly from the data: the mode-n factor is given by the leading left singular vectors of the mode-n unfolding X_(n) (observe the connection to Tucker1). Not optimal, but often used to initialize the Tucker-ALS algorithm.
De Lathauwer, De Moor, & Vandewalle, SIMAX, 2000


102
Tucker-Alternating Least Squares (ALS)
Successively solve for each component (A, B, C).
• Initialize
  – Choose R, S, T
  – Calculate A, B, C via HO-SVD
• Until converged do…
  – A = R leading left singular vectors of X_(1)(C ⊗ B)
  – B = S leading left singular vectors of X_(2)(C ⊗ A)
  – C = T leading left singular vectors of X_(3)(B ⊗ A)
• Solve for the core: G = X ×_1 A^T ×_2 B^T ×_3 C^T

Kroonenberg & De Leeuw, Psychometrika, 1980


103
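A compact NumPy sketch of Tucker-ALS as outlined above: HO-SVD initialization, then each factor is refit as the leading left singular vectors of the unfolded tensor times the Kronecker product of the other two factors; a fixed iteration count stands in for a proper convergence test.

```python
import numpy as np

def unfold(X, mode):
    # Mode-n matricization in the Kolda/Bader ordering.
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1, order='F')

def tucker_als(X, ranks, n_iter=25):
    # HO-SVD initialization: leading left singular vectors of each unfolding.
    factors = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks)]
    for _ in range(n_iter):
        for n in range(3):
            others = [factors[m] for m in range(3) if m != n]
            # e.g. A = R leading left singular vectors of X_(1) (C kron B).
            W = unfold(X, n) @ np.kron(others[1], others[0])
            factors[n] = np.linalg.svd(W, full_matrices=False)[0][:, :ranks[n]]
    # Core: G = X x1 A^T x2 B^T x3 C^T.
    core = X
    for n, U in enumerate(factors):
        core = np.moveaxis(np.tensordot(U.T, core, axes=(1, n)), 0, n)
    return core, factors

X = np.random.default_rng(0).random((8, 9, 10))
core, (A, B, C) = tucker_als(X, ranks=(3, 4, 5))
print(core.shape, A.shape, B.shape, C.shape)   # (3, 4, 5) (8, 3) (9, 4) (10, 5)
```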
details
Tucker is Not Unique
X (I × J × K) ≈ G (R × S × T) ×_1 A (I × R) ×_2 B (J × S) ×_3 C (K × T)
The Tucker decomposition is not unique. Let Y be an R × R orthonormal matrix; then replacing A by A·Y and G by G ×_1 Y^T gives exactly the same fit.
[T. Kolda,’07]
104
Outline: Part 2
• Matrix Tools
• Tensor Tools
– Tensor Basics
– Tucker
• Tucker 1
• Tucker 2
• Tucker 3
– PARAFAC

105
CANDECOMP/PARAFAC
Decomposition

X (I × J × K) ≈ Σ_r λ_r a_r ∘ b_r ∘ c_r, with factor matrices A (I × R), B (J × R), C (K × R) and a diagonal R × R × R core

• CANDECOMP = Canonical Decomposition (Carroll & Chang, 1970)
• PARAFAC = Parallel Factors (Harshman, 1970)
• Core is diagonal (specified by the vector λ)
• Columns of A, B, and C are not orthonormal
• If R is minimal, then R is called the rank of the tensor (Kruskal 1977)
• Can have rank(X) > min{I, J, K}
106
details

PARAFAC – Alternating Least Squares (ALS)
Successively solve for each component (A, B, C), i.e., find all the vectors in one mode at a time:
  X (I × J × K) ≈ Σ_r λ_r a_r ∘ b_r ∘ c_r
If C, B, and λ are fixed, the optimal A is given by
  A = X_(1) (C ⊙ B) (C^T C ∗ B^T B)^+
where ⊙ is the Khatri-Rao product (column-wise Kronecker product) and ∗ is the Hadamard (element-wise) product. Repeat for B, C, etc.
[T. Kolda,’07]
107
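A compact NumPy sketch of CP/PARAFAC-ALS using the factor update above; the column norms are pulled out into λ only once at the end, and a fixed iteration count again stands in for a convergence test.

```python
import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1, order='F')

def khatri_rao(A, B):
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def cp_als(X, R, n_iter=100, rng=np.random.default_rng(0)):
    factors = [rng.random((dim, R)) for dim in X.shape]
    for _ in range(n_iter):
        for n in range(3):
            others = [factors[m] for m in range(3) if m != n]
            kr = khatri_rao(others[1], others[0])        # e.g. C khatri-rao B for mode 1
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            factors[n] = unfold(X, n) @ kr @ np.linalg.pinv(gram)
    # Normalize the factor columns and collect the scales in lambda.
    lam = np.ones(R)
    for n in range(3):
        norms = np.linalg.norm(factors[n], axis=0)
        factors[n] = factors[n] / norms
        lam *= norms
    return lam, factors

X = np.random.default_rng(1).random((6, 7, 8))
lam, (A, B, C) = cp_als(X, R=3)
print(lam.shape, A.shape, B.shape, C.shape)    # (3,) (6, 3) (7, 3) (8, 3)
```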
details

PARAFAC is often unique
Assume the PARAFAC decomposition X = Σ_r a_r ∘ b_r ∘ c_r is exact.
Sufficient condition for uniqueness (Kruskal, 1977):
  k_A + k_B + k_C ≥ 2R + 2
where k_A = k-rank of A = the maximum number k such that every set of k columns of A is linearly independent.
108
Tucker vs. PARAFAC Decompositions
• Tucker
  – Variable transformation in each mode
  – Core G may be dense
  – A, B, C generally orthonormal
  – Not unique
• PARAFAC
  – Sum of rank-1 components
  – No core, i.e., superdiagonal core
  – A, B, C may have linearly dependent columns
  – Generally unique

[Figures: Tucker X ≈ G ×_1 A ×_2 B ×_3 C vs. PARAFAC X ≈ Σ_r a_r ∘ b_r ∘ c_r]

109
Tensor tools - summary
• Two main tools
– PARAFAC
– Tucker
• Both find row-, column-, tube-groups
– but in PARAFAC the three groups are identical
• To solve: Alternating Least Squares

110
Tensor tools - resources
• Toolbox from Tamara Kolda: csmr.ca.sandia.gov/~tgkolda/TensorToolbox/
• T. G. Kolda and B. W. Bader. Tensor Decompositions and Applications. SIAM Review 2008
• csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/TensorReview-preprint.pdf

111
Key Papers
Core Papers
• Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos: Colibri: fast mining
of large static and dynamic graphs. KDD 2008: 686-694
• Dhillon et al. Information-Theoretic Co-clustering, KDD’03
• T. G. Kolda and B. W. Bader. Tensor Decompositions and Applications. SIAM Review 2008

Further Reading
• Chih-Jen Lin: Projected Gradient Methods for Non-negative Matrix Factorization.
https://www.csie.ntu.edu.tw/~cjlin/papers/pgradnmf.pdf
• Candès, Emmanuel J., and Benjamin Recht. "Exact matrix completion via convex optimization."
Foundations of Computational mathematics 9, no. 6 (2009): 717.
• Rendle, S. (2010, December). Factorization machines. In 2010 IEEE International Conference on Data
Mining (pp. 995-1000). IEEE.
• Tamara G. Kolda, Brett W. Bader, Joseph P. Kenny: Higher-Order Web Link Analysis Using Multilinear
Algebra. ICDM 2005: 242-249
• U Kang, Evangelos E. Papalexakis, Abhay Harpale, Christos Faloutsos: GigaTensor: scaling tensor analysis
up by 100 times - algorithms and discoveries. KDD 2012: 316-324
• Deepayan Chakrabarti, Spiros Papadimitriou, Dharmendra S. Modha, Christos Faloutsos: Fully automatic
cross-associations. KDD 2004: 79-88
• Trigeorgis, G., Bousmalis, K., Zafeiriou, S., & Schuller, B. (2014, January). A deep semi-nmf model for
learning hidden representations. In International Conference on Machine Learning (pp. 1692-1700).
• Risi Kondor, Nedelina Teneva, and Vikas Garg. 2014. Multiresolution matrix factorization. In International
Conference on Machine Learning. 1620–1628

112
