
Optimization Methods for Graph Learning

Sandeep Kumar and Prof. Daniel P. Palomar

The Hong Kong University of Science and Technology

ELEC5470/IEDA6100A - Convex Optimization


Fall 2019-20
Outline

1 Graphical modeling

2 Types of graphical models

3 Physically inspired model: graphs from smooth signals

4 Probabilistic graphical model: GMRF

5 Structured graph learning via spectral constraints

6 Experiments

1/78
Outline

1 Graphical modeling

2 Types of graphical models

3 Physically inspired model: graphs from smooth signals

4 Probabilistic graphical model: GMRF

5 Structured graph learning via spectral constraints

6 Experiments

2/78
Graphical models

Representing knowledge through graphical models


[Figure: an example graph with nodes x1, ..., x9]

- Nodes correspond to the entities (variables).
- Edges encode the relationships between entities (dependencies between the variables).
3/78
Examples

Learning relational dependencies among entities benefits numerous application domains.

Figure 1: Financial graph.
- Objective: to infer the inter-dependencies of financial companies.
- Input xi is the economic indices (stock price, volume, etc.) of each entity.

Figure 2: Social graph.
- Objective: to model behavioral similarity/influence between people.
- Input xi is the individual online activities (tagging, liking, purchasing).
4/78
Graphical model importance

- Graphs are an intuitive way of representing and visualising the relationships between entities.
- Graphs allow us to abstract out the conditional independence relationships between the variables from the details of their parametric forms. Thus we can answer questions like "Is x1 dependent on x6 given that we know the value of x8?" just by looking at the graph.
- Graphs are widely used in a variety of machine learning applications: graph CNNs, graph signal processing, etc.
- Graphs offer a language through which different disciplines can seamlessly interact with each other.
- Graph-based approaches combined with big data and machine learning are driving the current research frontiers.

Graphical Models = Statistics × Graph Theory × Optimization × Engineering


5/78
How to learn a graph?

Graphical models are about having a graph representation that can encode relationships between entities.

In many cases, the relationships between entities are straightforward:
- Are two people friends in a social network?
- Are two researchers co-authors of a published paper?

In many other cases, relationships are not known and must be learned:
- Does one gene regulate the expression of others?
- Which drug alters the pharmacologic effect of another drug?

The choice of graph representation affects the subsequent analysis and, eventually, the performance of any graph-based algorithm.

The goal is to learn a graph representation of the data with specific properties (e.g., structures).

6/78
Schematic of graph learning
- Given a data matrix X ∈ R^{n×p} = [x1, x2, . . . , xp], each column xi ∈ R^n is assumed to reside on one of the p nodes, and each of the n rows of X is a signal (or feature) on the same graph.
- The goal is to obtain a graph representation of the data.

[Figure: a weighted graph on nodes x1, ..., x9 with edge weights wij]

A graph is a simple mathematical structure of the form G = (V, E, W), where
- V = {1, 2, 3, . . . , p} is the set of nodes,
- E = {(1, 2), (1, 3), . . . , (i, j), . . . , (p, p − 1)} is the set of edges between pairs of nodes (i, j), and
- the weight matrix W encodes the strength of the relationships.
7/78
Outline

1 Graphical modeling

2 Types of graphical models

3 Physically inspired model: graphs from smooth signals

4 Probabilistic graphical model: GMRF

5 Structured graph learning via spectral constraints

6 Experiments

8/78
Graph and its matrix representation
Connectivity matrix C, adjacency matrix W, and Laplacian matrix L:

      [C]ij = 1 if (i, j) ∈ E,     0 if (i, j) ∉ E,     0 if i = j
      [W]ij = wij if (i, j) ∈ E,   0 if (i, j) ∉ E,     0 if i = j
      [L]ij = −wij if (i, j) ∈ E,  0 if (i, j) ∉ E,     Σ_{j=1}^p wij if i = j

Example: V = {1, 2, 3, 4}, E = {(1, 2), (1, 3), (2, 3), (2, 4)}, edge weights {2, 2, 3, 1}.

          ⎡0 1 1 0⎤          ⎡0 2 2 0⎤          ⎡ 4 −2 −2  0⎤
      C = ⎢1 0 1 1⎥,     W = ⎢2 0 3 1⎥,     L = ⎢−2  6 −3 −1⎥
          ⎢1 1 0 0⎥          ⎢2 3 0 0⎥          ⎢−2 −3  5  0⎥
          ⎣0 1 0 0⎦          ⎣0 1 0 0⎦          ⎣ 0 −1  0  1⎦
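For concreteness, a minimal sketch (Python/NumPy, illustrative only and not part of the original slides) that builds C, W, and L for this example and checks L = D − W:

```python
import numpy as np

p = 4
edges = [(0, 1, 2.0), (0, 2, 2.0), (1, 2, 3.0), (1, 3, 1.0)]  # 0-indexed (i, j, weight)

C = np.zeros((p, p))          # connectivity matrix
W = np.zeros((p, p))          # weighted adjacency matrix
for i, j, w in edges:
    C[i, j] = C[j, i] = 1.0
    W[i, j] = W[j, i] = w

D = np.diag(W.sum(axis=1))    # degree matrix
L = D - W                     # graph Laplacian

print(L)
# [[ 4. -2. -2.  0.]
#  [-2.  6. -3. -1.]
#  [-2. -3.  5.  0.]
#  [ 0. -1.  0.  1.]]
```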
9/78
Graph matrices
- The adjacency matrix W = [wij] and the Laplacian matrix L both represent the same weighted graph and are related by

      L = D − W,

  where D = Diag(W·1) is the diagonal degree matrix whose (i, i)-element is the sum of the i-th row of W (Lij = −wij ∀ i ≠ j, Lii = Σ_{j≠i} wij).
- L and W have different mathematical properties, e.g., L is PSD, while W is not.

- A valid set for the adjacency matrix W:

      W = { W ∈ R^{p×p} | Wij = Wji ≥ 0, i ≠ j, Wii = 0 }

- A valid set for the Laplacian matrix L:

      L = { L ∈ R^{p×p} | Lij = Lji ≤ 0, i ≠ j, Lii = −Σ_{j≠i} Lij }

10/78
Types of graphical models

- Models encoding direct dependencies: simple and intuitive.
  - Sample-correlation-based graph: two entities i and j are connected if their pairwise correlation exceeds a certain threshold (ρij > α), and disconnected otherwise.
  - Similarity-function-based graph (e.g., Gaussian RBF): [W]ij = exp(−‖xi − xj‖² / 2σ²) if i ≠ j, and Wii = 0. The scaling parameter σ² controls how rapidly the affinity [W]ij falls off with the distance between xi and xj (see the sketch after this list).

- Models based on some assumption on the data: X ∼ F(G)
  - Statistical models: F is a distribution parametrized by G (e.g., Markov model and Bayesian model).
  - Physically inspired models: F is a generative model on G (e.g., diffusion processes on graphs, smooth signals defined over graphs).
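As an illustration of the similarity-function construction above, a minimal sketch (Python/NumPy, not from the original slides) that builds a Gaussian RBF affinity matrix from the data matrix X, with columns living on the nodes:

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    """Gaussian RBF affinity: W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), W[i, i] = 0.

    X is n x p with column x_i living on node i, so distances are taken between columns.
    """
    # pairwise squared Euclidean distances between columns of X
    sq_norms = np.sum(X**2, axis=0)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X.T @ X)
    d2 = np.maximum(d2, 0.0)              # guard against small negative values
    W = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)              # no self-loops
    return W

# toy usage: 20 signals on 5 nodes
X = np.random.default_rng(0).standard_normal((20, 5))
W = rbf_affinity(X, sigma=1.0)
```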

11/78
Outline

1 Graphical modeling

2 Types of graphical models

3 Physically inspired model: graphs from smooth signals

4 Probabilistic graphical model: GMRF

5 Structured graph learning via spectral constraints

6 Experiments

12/78
Graph learning from smooth signals
Quantifying smoothness:

      tr(XLX^⊤) = (1/2) Σ_{i,j} wij ‖xi − xj‖².

A smaller tr(XLX^⊤) indicates a smoother signal X over the graph G, where L is its Laplacian matrix (a numerical check of this identity is sketched below).

When a graph G is not already available, we can learn it directly from the data X by finding the graph weights that minimize the smoothness term combined with some regularization term:

      L̂ := arg min_L  tr(XLX^⊤) + λ h(L),

- A smaller distance ‖xi − xj‖² between data points xi and xj forces the learned graph to have a larger affinity value wij, and vice versa.
- A higher weight wij implies that the features xi and xj are similar and hence strongly connected.
- h(L) is a regularization function (e.g., ‖L‖₁, ‖L‖²_F, log det(L)).
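A small sketch (Python/NumPy, illustrative) that evaluates the smoothness quantity both through the Laplacian and through the pairwise-distance form, which should agree:

```python
import numpy as np

def smoothness(X, W):
    """Graph smoothness of the signals in X w.r.t. the weighted adjacency W.

    Returns tr(X L X^T) and (1/2) sum_{i,j} W[i, j] * ||x_i - x_j||^2 (columns x_i of X),
    which are equal.
    """
    L = np.diag(W.sum(axis=1)) - W
    via_laplacian = np.trace(X @ L @ X.T)

    sq_norms = np.sum(X**2, axis=0)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X.T @ X)
    via_pairwise = 0.5 * np.sum(W * d2)

    return via_laplacian, via_pairwise
```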
13/78
Ex 1. Learning graphs under constraints

Variables: W = [w1, w2, . . . , wp], where wi ∈ R^p = [wi1, wi2, . . . , wip] is the vector containing the weights of the edges connected to node i.

It is often desired to learn a graph with additional properties:
- bounded weights: 0 ≤ wij ≤ 1,
- edge weights per node summing to one: Σ_{j=1}^p wij = wi^⊤ 1 = 1,
- bounded energy: Σ_{j=1}^p wij².

Problem formulation:

      minimize_{ {wij} }   Σ_{i,j} wij ‖xi − xj‖² + (λ/2) Σ_{i,j} wij²
      subject to           Σ_{j=1}^p wij = 1,  wij ≥ 0,  wii = 0  ∀ i, j.

This is a convex optimization problem for which we will obtain a closed-form update rule.

14/78
Contd...

- Define eij = ‖xi − xj‖², and let ei be the vector whose j-th element is eij.
- For each wi, we have the following problem:

      minimize_{wi}   (1/2) ‖wi + ei/λ‖²
      subject to      wi^⊤ 1 = 1,  wij ≥ 0,  wii = 0.

- The Lagrangian of this problem is

      L(wi, ηi, βi) = (1/2) ‖wi + ei/λ‖² − ηi (wi^⊤ 1 − 1) − βi^⊤ wi,

  where ηi ∈ R and βi ∈ R^p are the Lagrange multipliers, with βij ≥ 0 ∀ j ≠ i.

15/78
KKT optimality
The optimal solution ŵi must satisfy that the derivative of the Lagrangian w.r.t. wi equals zero:

      ŵi + ei/λ − ηi 1 − βi = 0.

Then, for the j-th element of ŵi,

      ŵij + eij/λ − ηi − βij = 0.

Noting that ŵij βij = 0 (complementary slackness), the KKT conditions give

      ŵij = (−eij/λ + ηi)⁺,   where (a)⁺ = max(a, 0),

and ηi is chosen to satisfy the constraint ŵi^⊤ 1 = 1 for each i = 1, . . . , p.

Additional goal: sparsity in the graph. How do we enforce sparsity?
i) Use a sparsity-enforcing regularizer, e.g., the ℓ1-norm.
ii) Choose λ such that each node has exactly m ≪ p neighbors, i.e., ‖wi‖₀ = m.

We will explore the second path for enforcing sparsity.
16/78
Sparsity by choosing λi: number of neighbors ‖ŵi‖₀ = m

Without loss of generality, suppose ei1, ei2, . . . , eip are ordered increasingly. By design, wii = 0.

The constraint ‖ŵi‖₀ = m implies ŵim > 0 and ŵi,m+1 = 0. Therefore (using a per-node parameter λi, rescaled so that ŵij = (−eij/(2λi) + ηi)⁺),

      −eim/(2λi) + ηi > 0   and   −ei,m+1/(2λi) + ηi ≤ 0.

Combining the solution ŵi with the constraint ŵi^⊤ 1 = 1, we have

      Σ_{j=1}^m ( −eij/(2λi) + ηi ) = 1   ⟹   ηi = 1/m + (1/(2mλi)) Σ_{j=1}^m eij.

This leads to the following inequality for λi:

      (m/2) eim − (1/2) Σ_{j=1}^m eij  <  λi  ≤  (m/2) ei,m+1 − (1/2) Σ_{j=1}^m eij.

17/78
Contd...

Therefore, to obtain an optimal solution ŵi with exactly m nonzero values (‖ŵi‖₀ = m), the maximal λi is

      λi = (m/2) ei,m+1 − (1/2) Σ_{j=1}^m eij.

Combining the previous results, the optimal {ŵij}_{j≠i} is obtained as

      ŵij = (ei,m+1 − eij) / ( m ei,m+1 − Σ_{h=1}^m eih )   for j ≤ m,
      ŵij = 0                                               for j > m.
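A small sketch (Python/NumPy, illustrative only) of this closed-form rule: given the squared distances e_i from node i to all other nodes, keep the m closest neighbors and assign the weights above.

```python
import numpy as np

def m_neighbor_weights(e_i, i, m):
    """Closed-form weights for node i with exactly m neighbors.

    e_i : squared distances e_ij = ||x_i - x_j||^2 to all p nodes (e_i[i] is ignored).
    Returns w_i with w_i[j] = (e_{(m+1)} - e_ij) / (m e_{(m+1)} - sum of m smallest e_ij)
    for the m closest j != i, and 0 elsewhere.
    """
    e = e_i.astype(float).copy()
    e[i] = np.inf                       # exclude the self-distance
    order = np.argsort(e)               # neighbors sorted by increasing distance
    top_m, next_one = order[:m], order[m]
    denom = m * e[next_one] - e[top_m].sum()
    w = np.zeros_like(e)
    w[top_m] = (e[next_one] - e[top_m]) / denom
    return w

# toy check: weights are nonnegative, sum to 1, and have exactly m nonzeros
e_i = np.array([0.0, 0.3, 1.2, 0.7, 2.5])
w = m_neighbor_weights(e_i, i=0, m=2)
print(w, w.sum())
```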

18/78
Ex 2. Graph based clustering
Goal of graph-based clustering: given an initial connected graph W, infer the k-component graph S.

[Figure: heatmaps of the initial graph W and the desired k-component graph S, together with the corresponding initial and desired graphs]


19/78
Eigenvalue property of Laplacian matrix L

      L = U Diag(λ1, λ2, · · · , λp) U^⊤,

where U is the matrix of eigenvectors and λ1 ≤ λ2 ≤ · · · ≤ λp are the eigenvalues in increasing order, with the following property:

      { {λj = 0}_{j=1}^k,  c1 ≤ λk+1 ≤ · · · ≤ λp ≤ c2 },

where the number k of zero eigenvalues equals the number of connected components.

[Figure: a 3-component graph on 60 nodes and the eigenvalues of its Laplacian matrix; the first 3 eigenvalues are zero]
20/78
Constrained Laplacian Rank (CLR) for clustering

- Compute W from the data as [W]ij = exp(−‖xi − xj‖² / 2σ²). W may not exhibit the true component structure.
- Goal: infer S from W such that S captures the true component structure, i.e., a k-component structure.
- Let Ls = Diag(S1) − S denote the Laplacian of S.

CLR problem formulation:

      minimize_{S = [s1, .., sp]}   ‖S − W‖²_F
      subject to                    si^⊤ 1 = 1,  si ≥ 0,  sii = 0,
                                    rank(Ls) = p − k.

- Let λ(Ls) = {0 ≤ λ1 ≤ λ2 ≤ · · · ≤ λp} denote the eigenvalues of Ls in increasing order.
- rank(Ls) = p − k  ⟹  λ(Ls) = { {λi = 0}_{i=1}^k, {λi > 0}_{i=k+1}^p }.


21/78
CLR problem reformulation

      minimize_{S = [s1, .., sp]}   ‖S − W‖²_F + β Σ_{i=1}^k λi(Ls)
      subject to                    si^⊤ 1 = 1,  si ≥ 0,  sii = 0.

For sufficiently large β, the optimal solution S of this problem will make the second term Σ_{i=1}^k λi(Ls) equal to zero.

Ky Fan's theorem [Fan, 1949]:

      Σ_{i=1}^k λi(Ls) = minimize_{F ∈ R^{p×k}}  tr(F^⊤ Ls F),   subject to F^⊤ F = I.

Using Ky Fan's theorem, we can force the first k eigenvalues to zero via the following formulation:

      minimize_{S, F ∈ R^{p×k}}   ‖S − W‖²_F + β tr(F^⊤ Ls F)
      subject to                  si^⊤ 1 = 1,  si ≥ 0,  sii = 0,
                                  F^⊤ F = I.
22/78
Solving for F and S alternately
- Sub-problem for F:

      minimize_{F ∈ R^{p×k}}  tr(F^⊤ Ls F),   subject to F^⊤ F = I.

  The optimal F is formed by the k eigenvectors of Ls corresponding to the k smallest eigenvalues.

- Sub-problem for S:

      minimize_{ {sij} }   Σ_{i,j} (sij − wij)² + β Σ_{i,j} ‖fi − fj‖² sij
      subject to           si^⊤ 1 = 1,  si ≥ 0,  sii = 0.

- The problem is separable across i, so we can solve the following problem independently for each i:

      minimize_{si^⊤ 1 = 1, si ≥ 0}   Σ_j (sij − wij)² + β Σ_j ‖fi − fj‖² sij

      ⟺   minimize_{si^⊤ 1 = 1, si ≥ 0}   ‖ si − ( wi − (β/2) vi ) ‖²,   with vij = ‖fi − fj‖².

- We have seen this problem before; it has a closed-form solution (a compact sketch of the full alternation follows below).
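Putting the two updates together, a compact sketch (Python/NumPy, illustrative only): the F-step takes the k bottom eigenvectors of L_s, and the S-step solves the row-wise constrained least-squares above exactly via a Euclidean projection onto the probability simplex (used here in place of the slides' m-neighbor closed form).

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {s : s >= 0, 1^T s = 1} (standard sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def clr(W, k, beta=1.0, n_iter=50):
    """Alternate between F (k bottom eigenvectors of L_s) and S (row-wise simplex projection)."""
    p = W.shape[0]
    S = W.astype(float).copy()
    for _ in range(n_iter):
        Ls = np.diag(S.sum(axis=1)) - S
        _, V = np.linalg.eigh(Ls)
        F = V[:, :k]                                                # k smallest eigenvalues
        D = np.square(F[:, None, :] - F[None, :, :]).sum(axis=2)    # v_ij = ||f_i - f_j||^2
        for i in range(p):
            target = W[i] - 0.5 * beta * D[i]
            target[i] = -1e12                                       # effectively forces s_ii = 0
            S[i] = project_simplex(target)
    return S, F
```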
23/78
Outline

1 Graphical modeling

2 Types of graphical models

3 Physically inspired model: graphs from smooth signals

4 Probabilistic graphical model: GMRF

5 Structured graph learning via spectral constraints

6 Experiments

24/78
Gaussian Markov random field (GMRF)
A random vector x = (x1, x2, . . . , xp)^⊤ is called a GMRF with parameters (0, Θ) if its density follows

      p(x) = (2π)^{−p/2} (det(Θ))^{1/2} exp( −(1/2) x^⊤ Θ x ).

The nonzero pattern of Θ determines a conditional graph G = (V, E):

      Θij ≠ 0  ⟺  {i, j} ∈ E   ∀ i ≠ j,
      xi ⊥ xj | x∖(xi, xj)  ⟺  Θij = 0.

- For Gaussian-distributed data x ∼ N(0, Θ†), graph learning is simply an inverse covariance (precision) matrix estimation problem [Lauritzen, 1996].
- If rank(Θ) < p, then x is called an improper GMRF (IGMRF) [Rue and Held, 2005].
- If Θij ≤ 0 ∀ i ≠ j, then x is called an attractive improper GMRF [Slawski and Hein, 2015].
25/78
Historical timeline of Markov graphical models
Data X = {x^(i) ∼ N(0, Σ = Θ†)}_{i=1}^n,   S = (1/n) Σ_{i=1}^n x^(i) (x^(i))^⊤.

- Covariance selection [Dempster, 1972]: graph from the elements of the inverse sample covariance matrix S^{−1}.
- Neighborhood regression [Meinshausen and Bühlmann, 2006]:

      arg min_{β1}  ‖x^(1) − X_{∖x^(1)} β1‖² + α‖β1‖₁.

- ℓ1-regularized MLE [Friedman et al., 2008, Banerjee et al., 2008]:

      maximize_{Θ ≻ 0}  log det(Θ) − tr(ΘS) − α‖Θ‖₁.

- Ising model: ℓ1-regularized logistic regression [Ravikumar et al., 2010].
- Attractive IGMRF [Slawski and Hein, 2015].
- Laplacian structure in Θ [Lake and Tenenbaum, 2010].
- ℓ1-regularized MLE with Laplacian structure [Egilmez et al., 2017, Zhao et al., 2019].
26/78
Setting our goal

- Given data X = {x^(i) ∼ N(0, Σ = Θ†)}_{i=1}^n,
- we have the sample covariance matrix S = (1/n) Σ_{i=1}^n x^(i) (x^(i))^⊤.
- Learn the graph matrix Θ.

27/78
Penalized MLE Gaussian Graphical Modeling (GGM)

- ℓ1-regularized MLE [Friedman et al., 2008, Banerjee et al., 2008]:

      maximize_{Θ ≻ 0}  log det(Θ) − tr(ΘS) − α‖Θ‖₁.

- When the data come from a p-variate Gaussian distribution, this is the (penalized) MLE of the inverse covariance matrix.
- For arbitrarily distributed data, it is a log-determinant Bregman-divergence regularized optimization problem.
- A computationally efficient method to solve this problem is the famous graphical lasso (GLasso) algorithm [Friedman et al., 2008] (see the usage sketch after this list).
- It is an elegant and statistically consistent way of associating a graph with given data and has become an integral part of numerous applications.
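As a practical illustration (not from the slides), scikit-learn's GraphicalLasso solves this ℓ1-penalized MLE; a minimal usage sketch, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# toy data: n = 500 samples of a p = 10 dimensional Gaussian
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
Sigma = A @ A.T + 10 * np.eye(10)          # a well-conditioned covariance
X = rng.multivariate_normal(np.zeros(10), Sigma, size=500)

model = GraphicalLasso(alpha=0.1)          # alpha is the l1 penalty weight
model.fit(X)

Theta = model.precision_                   # estimated inverse covariance (graph matrix)
edges = np.abs(Theta) > 1e-6               # nonzero pattern = estimated conditional graph
np.fill_diagonal(edges, False)
print(edges.sum() // 2, "edges")
```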

28/78
Additional structure


      maximize_{Θ ≻ 0}  log det(Θ) − tr(ΘS) − α‖Θ‖₁.

- Attractive IGMRF model [Slawski and Hein, 2015]:

      S_Θ = { Θ ≻ 0,  Θij = Θji ≤ 0,  i, j = 1, . . . , p,  i ≠ j }.

- Laplacian structure:

      S_L = { Θ | Θij = Θji ≤ 0 for i ≠ j,  Θii = −Σ_{j≠i} Θij }.

  But the Laplacian is only a PSD matrix, and log det is defined only for a PD matrix.
- Modified Laplacian structure [Lake and Tenenbaum, 2010]:

      maximize_{Θ̃ ≻ 0}  log det(Θ̃) − tr(Θ̃S) − α‖Θ̃‖₁,   subject to Θ̃ = Θ + (1/σ²) I.

- ℓ1-regularized MLE with Laplacian structure after adding J = (1/p) 11^⊤ [Egilmez et al., 2017, Zhao et al., 2019]:

      maximize_{Θ ∈ S_L}  log det(Θ + J) − tr(ΘS) − α‖Θ‖₁.
29/78
GLasso algorithm [Friedman et al., 2008]

      maximize_{Θ ≻ 0}  log det(Θ) − tr(ΘS) − α‖Θ‖₁.

The optimality condition for the above problem is

      Θ^{−1} − S − αΓ = 0,

where Γ is a matrix of component-wise signs of Θ:

      [Γ]jk = sign(Θjk)    if Θjk ≠ 0,
      [Γ]jk ∈ [−1, 1]      if Θjk = 0.

This optimality condition is also known as the normal equation. Further, since the diagonal entries Θjj are positive (so Γjj = 1), the diagonal of the normal equation implies

      Σ̂ii = Sii + α,   i = 1, . . . , p,

where Σ̂ = Θ^{−1}.
30/78
GLasso uses a block-coordinate method for solving the problem. Consider a partitioning of Θ and Σ̂:

      Θ = [ Θ11    θ12 ]          Σ̂ = [ Σ̂11    σ̂12 ]
          [ θ12^⊤  Θ22 ],             [ σ̂12^⊤  Σ̂22 ],

where Θ11 ∈ R^{(p−1)×(p−1)}, θ12 ∈ R^{p−1} and Θ22 is a scalar, and similarly for the other partitions. Next, from ΘΣ̂ = I, the inverse Σ̂ = Θ^{−1} can be expressed blockwise as

      Σ̂ = [ Θ11^{−1} + (Θ11^{−1} θ12 θ12^⊤ Θ11^{−1}) / (Θ22 − θ12^⊤ Θ11^{−1} θ12)     −(Θ11^{−1} θ12) / (Θ22 − θ12^⊤ Θ11^{−1} θ12) ]
          [                         ·                                                   1 / (Θ22 − θ12^⊤ Θ11^{−1} θ12)             ].

GLasso solves for one row/column of Θ at a time, holding the rest fixed. Considering the p-th column of the normal equation, we get

      −σ̂12 + s12 + αγ12 = 0.
31/78
GLasso [Mazumder and Hastie, 2012]
Consider reading off σ̂12 from the partitioned expression:

      Θ11^{−1} θ12 / (Θ22 − θ12^⊤ Θ11^{−1} θ12) + s12 + αγ12 = 0,

which simplifies to

      Θ11^{−1} θ12 Σ̂22 + s12 + αγ12 = 0.

With ν = θ12 Σ̂22 (and Σ̂22 fixed), Θ11 ≻ 0, this is equivalent to the stationarity condition of

      minimize_{ν ∈ R^{p−1}}   (1/2) ν^⊤ Θ11^{−1} ν + ν^⊤ s12 + α ‖ν‖₁.

Let ν* be the minimizer; then

      θ12* = ν* / Σ̂22,
      Θ22* = 1/Σ̂22 + (θ12*)^⊤ Θ11^{−1} θ12*.
32/78
GLasso algorithm summary

1: Initialize: Σ̂ = Diag(S) + αI and Θ = Σ̂^{−1}.
2: Repeat until the convergence criterion is met:
   (a) Rearrange the rows and columns so that the target column is last (implicitly).
   (b) Compute Θ11^{−1} = Σ̂11 − σ̂12 σ̂12^⊤ / Σ̂22.
   (c) Obtain ν* and update θ12* and Θ22*.
   (d) Update Θ and Σ̂ using the partitioned inverse expression, ensuring ΘΣ̂ = I.
3: Output the precision matrix Θ.

33/78
Learning a graph with Laplacian constraint

34/78
Solving for the Laplacian constraint [Zhao et al., 2019]


      maximize_{Θ ∈ S_L}  log det(Θ + J) − tr(ΘS) − α‖Θ‖₁,

where S_L = { Θ | Θij = Θji ≤ 0 for i ≠ j,  Θii = −Σ_{j≠i} Θij }.

Since Θ satisfies the Laplacian constraints, the off-diagonal elements of Θ are non-positive and the diagonal elements are non-negative, so

      ‖Θ‖₁ = tr(ΘH),

where H = 2I − 11^⊤. Thus, the objective function becomes

      log det(Θ + J) − tr(ΘS) − α tr(ΘH) = log det(Θ + J) − tr(ΘK),

where K = S + αH.
35/78
Solving with known connectivity information: Approach 1

Goal: estimate the graph Laplacian weights Θ given known connectivity information C.

      maximize_{Θ ∈ S_L(C)}  log det(Θ + J) − tr(ΘK).

The constraint set S_L(C) can be written as

      Θ ⪰ 0,  Θ1 = 0,
      Θij = Θji ≤ 0  if Cij = 1, for i ≠ j,
      Θij = Θji = 0  if Cij = 0, for i ≠ j.
36/78
We further suppose the graph has no self-loops, so the diagonal elements of C are all zero. Then the constraint set S_L(C) can be compactly rewritten in the following way:

      Θ = Ω,
      Θ ∈ S_Θ = { Θ | Θ ⪰ 0, Θ1 = 0 },
      Ω ∈ S_Ω = { Ω | I ◦ Ω ≥ 0,  B ◦ Ω = 0,  C ◦ Ω ≤ 0 },

where B = 11^⊤ − I − C and ◦ denotes the elementwise (Hadamard) product. The constraint I ◦ Ω ≥ 0 is implied by Θ ⪰ 0.

Next, we present an equivalent form of the constraints Θ1 = 0 and Θ ⪰ 0:

      Θ = P Ξ P^⊤,

where P ∈ R^{p×(p−1)} is an orthogonal complement of 1, i.e., P^⊤P = I and P^⊤1 = 0. Note that the choice of P is non-unique. Why?

Now, with this alternative form of Θ, we can rewrite the objective function as follows:

      tr(ΘK) = tr(Ξ K̃),   where K̃ = P^⊤ K P,
37/78
      log det(Θ + J) = log det( P Ξ P^⊤ + (1/p) 11^⊤ ) = log det(Ξ).

Thus, the problem formulation changes to

      minimize_{Ξ, Ω}   tr(Ξ K̃) − log det(Ξ)
      subject to        Ξ ≻ 0,
                        P Ξ P^⊤ = Ω,
                        Ω ∈ C = { Ω | I ◦ Ω ≥ 0,  B ◦ Ω = 0,  C ◦ Ω ≤ 0 }.

Note: a convex problem with linearly constrained variables Ω, Ξ.

An alternating direction method of multipliers (ADMM) based method is suitable for such a problem.
38/78
Sketch of the ADMM update

      minimize_{x, z}   f(x) + g(z)
      subject to        Ax + Bz = c,

where x ∈ R^n, z ∈ R^m, A ∈ R^{p×n}, B ∈ R^{p×m} and c ∈ R^p. The augmented Lagrangian is

      Lρ(x, z, y) = f(x) + g(z) + y^⊤(Ax + Bz − c) + (ρ/2) ‖Ax + Bz − c‖²,

where ρ > 0 is the penalty parameter. The ADMM subroutine cycles through the following updates until convergence:

1: Initialize: z^(0), y^(0), ρ
2: t ← 0
3: while stopping criterion is not met do
4:   z^(t+1) = arg min_{z ∈ Z} Lρ(x^(t), z, y^(t))
5:   x^(t+1) = arg min_{x ∈ X} Lρ(x, z^(t+1), y^(t))
6:   y^(t+1) = y^(t) + ρ ( Ax^(t+1) + Bz^(t+1) − c )
7:   t ← t + 1
8: end while
39/78
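To make the updates concrete, a toy sketch (Python/NumPy, not from the slides) applying this ADMM template to the problem minimize 0.5‖x − a‖² + γ‖z‖₁ subject to x − z = 0, where both sub-updates have closed forms:

```python
import numpy as np

def admm_toy(a, gamma=1.0, rho=1.0, n_iter=100):
    """ADMM for: minimize 0.5*||x - a||^2 + gamma*||z||_1  subject to x - z = 0.

    Here f(x) = 0.5||x - a||^2, g(z) = gamma*||z||_1, A = I, B = -I, c = 0.
    """
    x = np.zeros_like(a)
    z = np.zeros_like(a)
    y = np.zeros_like(a)
    for _ in range(n_iter):
        # z-update: soft-thresholding (prox of the l1 norm) at the current x
        v = x + y / rho
        z = np.sign(v) * np.maximum(np.abs(v) - gamma / rho, 0.0)
        # x-update: unconstrained quadratic minimization given the new z
        x = (a - y + rho * z) / (1.0 + rho)
        # dual update
        y = y + rho * (x - z)
    return z

print(admm_toy(np.array([3.0, 0.2, -1.5]), gamma=1.0))   # approx [2., 0., -0.5]
```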
Edge based formulation: Approach 2

Suppose there are M edges present in the graph, and the m-th edge connects vertices im and jm. Then we can always decompose the graph Laplacian matrix Θ as

      Θ = Σ_{m=1}^M W_{im jm} ( e_{im} e_{im}^⊤ + e_{jm} e_{jm}^⊤ − e_{im} e_{jm}^⊤ − e_{jm} e_{im}^⊤ )
        = Σ_{m=1}^M W_{im jm} ( e_{im} − e_{jm} )( e_{im} − e_{jm} )^⊤
        = E Diag(w) E^⊤,

where w = {W_{im jm}} collects the weights on the edges. The matrix E, known as the incidence matrix, can be inferred from the connectivity matrix C.

This decomposition is motivated by highly sparse graph structures (e.g., tree graphs).
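A quick sketch (Python/NumPy, illustrative) of this decomposition on the 4-node example from earlier, checking that E Diag(w) E^⊤ reproduces the Laplacian:

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (1, 3)]    # (i_m, j_m), 0-indexed
w = np.array([2.0, 2.0, 3.0, 1.0])          # edge weights
p, M = 4, len(edges)

E = np.zeros((p, M))                         # incidence matrix: column m is e_{i_m} - e_{j_m}
for m, (i, j) in enumerate(edges):
    E[i, m], E[j, m] = 1.0, -1.0

Theta = E @ np.diag(w) @ E.T
print(Theta)
# matches L = D - W from the earlier example:
# [[ 4. -2. -2.  0.]
#  [-2.  6. -3. -1.]
#  [-2. -3.  5.  0.]
#  [ 0. -1.  0.  1.]]
```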

40/78
Problem reformulation

      minimize_{w ≥ 0}   −log det( E Diag(w) E^⊤ + J ) + tr( E Diag(w) E^⊤ K ).

Furthermore, with J = (1/p) 11^⊤, we have

      E Diag(w) E^⊤ + (1/p) 11^⊤ = [E, 1] Diag([w^⊤, 1/p]^⊤) [E, 1]^⊤ = G Diag([w^⊤, 1/p]^⊤) G^⊤.

Now the problem can be expressed as

      minimize_{w ≥ 0}   −log det( G Diag([w^⊤, 1/p]^⊤) G^⊤ ) + tr( E Diag(w) E^⊤ K ).

The problem is convex; we will obtain a simple closed-form update rule via the majorization-minimization (MM) approach [Sun et al., 2016].

We start with the following basic inequality:

      log det(X) ≤ log det(X0) + tr( X0^{−1} (X − X0) ),

which is due to the concavity of the log-determinant function.
41/78
Majorization function

      −log det(G X G^⊤) = log det( (G X G^⊤)^{−1} )
                        ≤ log det( (G X0 G^⊤)^{−1} ) + tr( G X0 G^⊤ [ (G X G^⊤)^{−1} − (G X0 G^⊤)^{−1} ] )
                        = tr( F0 (G X G^⊤)^{−1} ) + const.,

where F0 = G X0 G^⊤. Substituting Diag([w^⊤, 1/p]^⊤) for X, the minimization problem becomes

      minimize_{w ≥ 0}   tr( E Diag(w) E^⊤ K ) + tr( F0 ( G Diag([w^⊤, 1/p]^⊤) G^⊤ )^{−1} ),

with F0 = G X0 G^⊤ = G Diag([w0^⊤, 1/p]^⊤) G^⊤.

Yet, this minimization problem does not yield a simple closed-form solution. For the sake of algorithmic simplicity, we need to further majorize the objective.
42/78
Double majorization and optimal solution

For any Y X Y^⊤ ≻ 0, the following matrix inequality holds:

      (Y X Y^⊤)^{−1} ⪯ Z0^{−1} Y X0 X^{−1} X0 Y^⊤ Z0^{−1},        (1)

where Z0 = Y X0 Y^⊤. Equality is achieved at X = X0.

Now, defining w̃ = [w^⊤, 1/p]^⊤ and w̃0 = [w0^⊤, 1/p]^⊤, we can write

      tr( F0 ( G Diag(w̃) G^⊤ )^{−1} )
      = tr( F0^{1/2} ( G Diag(w̃) G^⊤ )^{−1} F0^{1/2} )
      ≤ tr( F0^{1/2} F0^{−1} G Diag(w̃0) Diag(w̃)^{−1} Diag(w̃0) G^⊤ F0^{−1} F0^{1/2} ),

where Y = G, X = Diag(w̃), X0 = Diag(w̃0), Z0 = F0. The majorized problem can be expressed as

      minimize_{w ≥ 0}   diag(R)^⊤ w + diag(QM)^⊤ w^{−1},

where R = E^⊤ K E, QM = Q_{1:M,1:M}, and Q = Diag(w̃0) G^⊤ ( G Diag(w̃0) G^⊤ )^{−1} G Diag(w̃0). The optimal solution to this problem is

      w* = sqrt( diag(QM) / diag(R) )   (element-wise division and square root).
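A sketch (Python/NumPy, illustrative only) of one MM iteration of this edge-weight update, assuming E, K and the current weights w0 are given and the graph defined by w0 is connected (so that F0 is invertible):

```python
import numpy as np

def mm_weight_update(E, K, w0):
    """One MM iteration: w <- sqrt(diag(Q_M) / diag(R)) element-wise.

    E  : p x M incidence matrix, K = S + alpha*H,
    w0 : current edge weights (length M, strictly positive).
    """
    p, M = E.shape
    G = np.hstack([E, np.ones((p, 1))])                  # G = [E, 1]
    w_tilde0 = np.append(w0, 1.0 / p)                    # [w0, 1/p]
    R = E.T @ K @ E
    F0 = G @ np.diag(w_tilde0) @ G.T                     # assumed invertible (connected graph)
    Q = np.diag(w_tilde0) @ G.T @ np.linalg.inv(F0) @ G @ np.diag(w_tilde0)
    QM = Q[:M, :M]
    return np.sqrt(np.diag(QM) / np.diag(R))
```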
43/78
Outline

1 Graphical modeling

2 Types of graphical models

3 Physically inspired model: graphs from smooth signals

4 Probabilistic graphical model: GMRF

5 Structured graph learning via spectral constraints

6 Experiments

44/78
Structured graphs

[Figure: examples of structured graphs — (vii) multi-component graph, (viii) regular graph, (ix) modular graph, (x) bipartite graph, (xi) grid graph, (xii) tree graph]


45/78
Structured graphs: importance and challenges
Useful structures:
- Multi-component: graphs for clustering and classification.
- Bipartite: graphs for matching and constructing two-channel filter banks.
- Multi-component bipartite: graphs for co-clustering.
- Tree: graphs for sampling algorithms.
- Modular: graphs for social network analysis.
- Connected sparse: graphs for graph signal processing applications.

Structured graph learning from data
- involves both the estimation of structure (graph connectivity) and of parameters (graph weights),
- parameter estimation is well explored (e.g., maximum likelihood),
- but structure is a combinatorial property, which makes structure estimation very challenging.

Structure learning is NP-hard for a general class of graphical models [Bogdanov et al., 2008].
46/78
Structured graphs: direction

State-of-the-art direction
- The effort has been on characterizing families of structures for which learning can be made feasible, e.g., maximum-weight spanning trees for tree structures [Chow and Liu, 1968], and local separation and walk summability for Erdos-Renyi graphs, power-law graphs, and small-world graphs [Anandkumar et al., 2012].
- Existing methods are restricted to particular structures, and it is difficult to extend them to learn other useful structures, e.g., multi-component, bipartite, etc.

Proposed direction: graph (structure) ⟺ graph matrix (spectrum)
- The spectral properties of a graph matrix are one such characterization [Chung, 1997], and they are the one considered in the present work.
- Under this framework, structure learning for a large class of graph structures can be expressed as an eigenvalue problem on the graph matrices.

47/78
Problem statement

To learn structured graphs via spectral constraints

48/78
Motivating example 1: structure via Laplacian eigenvalues
      Θ = U Diag(λ) U^⊤

For a multi-component graph with k components, the first k eigenvalues of its Laplacian matrix are zero:

      Sλ = { {λj = 0}_{j=1}^k,  c1 ≤ λk+1 ≤ · · · ≤ λp ≤ c2 }.

[Figure: a 3-component graph on 60 nodes and the eigenvalues of its Laplacian matrix; the first 3 eigenvalues are zero]

49/78
Motivating example 2: structure via adjacency eigenvalues
Adjacency matrix ΘA: ΘA = Diag(diag(Θ)) − Θ,

      ΘA = V Diag(ψ) V^⊤.

For a bipartite graph the adjacency eigenvalues are symmetric about the origin:

      Sψ = { ψi = −ψp−i+1,  ∀ i = 1, . . . , p }.

[Figure: a bipartite graph on 60 nodes and the eigenvalues of its adjacency matrix, which are symmetric about 0]
50/78
Proposed unified framework for SGL


      maximize_Θ   log gdet(Θ) − tr(ΘS) − α h(Θ)
      subject to   Θ ∈ S_Θ,  λ(T(Θ)) ∈ S_T

- gdet is the generalized determinant, defined as the product of the non-zero eigenvalues,
- S_Θ encodes the typical constraints of a Laplacian matrix,
- λ(T(Θ)) is the vector containing the eigenvalues of the matrix T(Θ),
- T(·) is the transformation that selects which graph matrix's eigenvalues are considered, and
- S_T allows spectral constraints to be included on the eigenvalues.
- Precisely, S_T will facilitate the process of incorporating the spectral properties required for enforcing structure.

The proposed formulation has converted the combinatorial structural constraints into analytical spectral constraints.
51/78
Optimization for Laplacian spectral constraints


      maximize_{Θ, λ, U}   log gdet(Θ) − tr(ΘS) − α‖Θ‖₁
      subject to           Θ ∈ S_Θ,  Θ = U Diag(λ) U^⊤,  λ ∈ Sλ,  U^⊤U = I,

where λ = [λ1, λ2, . . . , λp] is the vector of eigenvalues and U is the matrix of eigenvectors.

The resulting formulation is still complicated and intractable:
- Laplacian structural constraints,
- non-convex constraints coupling Θ, U, λ, and
- non-convex constraints on U.

In order to derive a feasible formulation:
- we first introduce a linear operator L that transforms the Laplacian structural constraints into simple algebraic constraints, and
- we then relax the eigen-decomposition expression into the objective function.
52/78
Linear operator for Θ ∈ SΘ

      S_Θ = { Θ | Θij = Θji ≤ 0 for i ≠ j,  Θii = −Σ_{j≠i} Θij }.

Θij = Θji ≤ 0 and Θ1 = 0 imply that the target matrix is symmetric with p(p − 1)/2 degrees of freedom.

We define a linear operator L : w ∈ R₊^{p(p−1)/2} → Lw ∈ R^{p×p}, which maps a weight vector w to the Laplacian matrix:

      [Lw]ij = [Lw]ji ≤ 0,  i ≠ j,
      [Lw]ii = −Σ_{j≠i} [Lw]ij.

Example of Lw for w = [w1, w2, w3, w4, w5, w6]^⊤ (p = 4):

           ⎡ w1+w2+w3    −w1         −w2         −w3     ⎤
      Lw = ⎢   −w1     w1+w4+w5      −w4         −w5     ⎥
           ⎢   −w2        −w4      w2+w4+w6      −w6     ⎥
           ⎣   −w3        −w5        −w6       w3+w5+w6  ⎦
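A small sketch (Python/NumPy, illustrative) of this operator: expand a weight vector of length p(p−1)/2 (upper-triangular, row-major order as in the example) into the corresponding p×p Laplacian.

```python
import numpy as np

def L_op(w, p):
    """Map a weight vector w (length p(p-1)/2, upper-triangular order) to the Laplacian Lw."""
    Lw = np.zeros((p, p))
    iu = np.triu_indices(p, k=1)          # (i, j) pairs with i < j, row-major order
    Lw[iu] = -w
    Lw = Lw + Lw.T                        # symmetric off-diagonal part
    np.fill_diagonal(Lw, -Lw.sum(axis=1)) # rows sum to zero
    return Lw

w = np.array([1., 2., 3., 4., 5., 6.])
print(L_op(w, 4))                          # matches the 4x4 example above
```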
53/78
Problem reformulation


      maximize_{Θ, λ, U}   log gdet(Θ) − tr(ΘS) − α‖Θ‖₁
      subject to           Θ ∈ S_Θ,  Θ = U Diag(λ) U^⊤,  λ ∈ Sλ,  U^⊤U = I.

Using i) Θ = Lw and ii) tr(ΘS) + α h(Θ) = tr(ΘK), with K = S + H and H = α(2I − 11^⊤), the proposed problem formulation becomes:

      minimize_{w, λ, U}   −log gdet(Diag(λ)) + tr(K Lw) + (β/2) ‖Lw − U Diag(λ) U^⊤‖²_F
      subject to           w ≥ 0,  λ ∈ Sλ,  U^⊤U = I.
54/78
SGL algorithm for k-component graph learning

- Variables: X = (w, λ, U)
- Spectral constraint: Sλ = { {λj = 0}_{j=1}^k,  c1 ≤ λk+1 ≤ · · · ≤ λp ≤ c2 }.
- Positivity constraint: w ≥ 0
- Orthogonality constraint: U^⊤U = I_{p−k}

We develop a block majorization-minimization (block-MM) type method which updates each block sequentially while keeping the other blocks fixed [Sun et al., 2016, Razaviyayn et al., 2013].

55/78
Update for w
Sub-problem for w:

      minimize_{w ≥ 0}   tr(K Lw) + (β/2) ‖Lw − U Diag(λ) U^⊤‖²_F,

which (dividing by β and dropping constants) can be written as

      minimize_{w ≥ 0}   f(w) = (1/2) ‖Lw‖²_F − c^⊤ w,   with c = L*( U Diag(λ) U^⊤ − K/β ),

where L* denotes the adjoint of the operator L. This is a convex quadratic program, but it does not have a closed-form solution because of the non-negativity constraint w ≥ 0.

The function f(w) is majorized at w^t by

      g(w | w^t) = f(w^t) + (w − w^t)^⊤ ∇f(w^t) + (L/2) ‖w − w^t‖²,

where w^t is the update from the previous iteration and L is a constant chosen so that the quadratic term majorizes f [Sun et al., 2016]. The MM technique then yields the closed-form update

      w^{t+1} = ( w^t − (1/(2p)) ∇f(w^t) )⁺,

where (a)⁺ = max(a, 0).
56/78
Update for U

Sub-problem for U:

      maximize_U   tr( U^⊤ Lw U Diag(λ) )
      subject to   U^⊤U = I_{p−k}.

This sub-problem is an optimization on the orthogonal Stiefel manifold [Absil et al., 2009, Benidis et al., 2016]. From the KKT optimality conditions, the solution is given by

      U^{t+1} = eigenvectors(Lw^{t+1})[k + 1 : p],

that is, the p − k principal eigenvectors of the matrix Lw^{t+1} in increasing order of eigenvalue magnitude.

57/78
Update for λ

Sub-problem for λ:

      minimize_{λ ∈ Sλ}   −log gdet(Diag(λ)) + (β/2) ‖U^⊤ (Lw) U − Diag(λ)‖²_F,

which reduces to

      minimize_{c1 ≤ λk+1 ≤ ··· ≤ λp ≤ c2}   −Σ_{i=1}^{p−k} log λk+i + (β/2) ‖λ − d‖²,

where d = diag(U^⊤ (Lw) U). This sub-problem is popularly known as a regularized isotonic regression problem. It is a convex optimization problem, and the solution can be obtained from the KKT optimality conditions. We develop an efficient algorithm with fast convergence to the global optimum in at most p − k iterations [Kumar et al., 2019].

Sandeep Kumar, Jiaxi Ying, José Vinícius de M. Cardoso, and Daniel P. Palomar,“
A Unified Framework for Structured Graph Learning via Spectral Constraints.” arXiv
preprint arXiv:1904.09792 (2019).
58/78
SGL algorithm summary

      minimize_{w, λ, U}   −log gdet(Diag(λ)) + tr(K Lw) + (β/2) ‖Lw − U Diag(λ) U^⊤‖²_F
      subject to           w ≥ 0,  λ ∈ Sλ,  U^⊤U = I_{p−k}.

Algorithm:
1: Input: SCM S, k, c1, c2, β
2: Output: Lw
3: t ← 0
4: while stopping criterion is not met do
5:   w^{t+1} = ( w^t − (1/(2p)) ∇f(w^t) )⁺
6:   U^{t+1} ← eigenvectors(Lw^{t+1}), suitably ordered.
7:   Update λ^{t+1} (via the isotonic regression method, with at most p − k iterations).
8:   t ← t + 1
9: end while
10: return w^{t+1}
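A condensed sketch (Python/NumPy, illustrative only) of this block-MM loop, assuming the L_op helper from the earlier sketch; the λ step here uses the unconstrained minimizer clipped to [c1, c2] in place of the full isotonic-regression routine, and the blocks are cycled starting from the eigen-decomposition of the current Lw.

```python
import numpy as np

def Lstar(Y):
    """Adjoint of the operator L: [L*Y]_k = Y_ii + Y_jj - 2 Y_ij for the k-th pair (i, j), i < j."""
    p = Y.shape[0]
    i, j = np.triu_indices(p, k=1)
    return Y[i, i] + Y[j, j] - 2.0 * Y[i, j]

def sgl(S, k, alpha=0.0, beta=10.0, c1=1e-3, c2=1e3, n_iter=500):
    """Block-MM loop cycling through the U, lambda and w updates of the SGL algorithm."""
    p = S.shape[0]
    K = S + alpha * (2.0 * np.eye(p) - np.ones((p, p)))     # K = S + alpha*(2I - 11^T)
    w = np.full(p * (p - 1) // 2, 1.0 / p)                  # initial edge weights
    for _ in range(n_iter):
        Lw = L_op(w, p)                                     # L_op from the earlier sketch
        eigval, eigvec = np.linalg.eigh(Lw)
        U = eigvec[:, k:]                                   # p - k eigenvectors (k smallest dropped)
        d = eigval[k:]                                      # d = diag(U^T Lw U)
        # lambda-step: unconstrained minimizer of -log(l) + (beta/2)(l - d)^2, clipped to [c1, c2]
        lam = np.clip((d + np.sqrt(d**2 + 4.0 / beta)) / 2.0, c1, c2)
        M = U @ np.diag(lam) @ U.T
        # w-step: one projected-gradient (MM) step on f(w) = 0.5*||Lw||_F^2 - c^T w
        grad = Lstar(Lw - M) + Lstar(K) / beta
        w = np.maximum(w - grad / (2.0 * p), 0.0)
    return L_op(w, p)
```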
59/78
Convergence and the computational complexity

The worst-case computational complexity of the proposed algorithm is O(p³).

Theorem: the sequence (w^t, U^t, λ^t) generated by this algorithm converges to the set of KKT points of the optimization problem.

Sandeep Kumar, Jiaxi Ying, José Vinícius de M. Cardoso, and Daniel P. Palomar,
“Structured graph learning via Laplacian spectral constraints,” in Advances in Neural
Information Processing Systems (NeurIPS), 2019.

60/78
Outline

1 Graphical modeling

2 Types of graphical models

3 Physically inspired model: graphs from smooth signals

4 Probabilistic graphical model: GMRF

5 Structured graph learning via spectral constraints

6 Experiments

61/78
Synthetic experiment setup

- Generate a graph with the desired structure.
- Sample weights for the graph edges.
- Obtain the true Laplacian Θtrue.
- Sample data X = {x^(i) ∈ R^p ∼ N(0, Σ = Θtrue†)}_{i=1}^n.
- Compute S = (1/n) Σ_{i=1}^n x^(i) (x^(i))^⊤.
- Use S and some prior spectral information, if available.
- Performance metrics (see the sketch after this list):

      Relative Error = ‖Θ̂* − Θtrue‖_F / ‖Θtrue‖_F,        F-Score = 2tp / (2tp + fp + fn),

  where Θ̂* is the final estimate returned by the algorithm, Θtrue is the true reference graph Laplacian matrix, and tp, fp, fn are the numbers of true positives, false positives, and false negatives, respectively.
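A sketch (Python/NumPy, illustrative) of this setup: sample improper-GMRF data from a given true Laplacian via its pseudo-inverse, and compute the two metrics.

```python
import numpy as np

def sample_from_laplacian(Theta_true, n, rng=None):
    """Draw n samples x ~ N(0, pinv(Theta_true)) for an (improper) GMRF."""
    rng = rng or np.random.default_rng()
    Sigma = np.linalg.pinv(Theta_true)                 # singular PSD covariance
    p = Theta_true.shape[0]
    return rng.multivariate_normal(np.zeros(p), Sigma, size=n, check_valid="ignore")

def relative_error(Theta_hat, Theta_true):
    return np.linalg.norm(Theta_hat - Theta_true, "fro") / np.linalg.norm(Theta_true, "fro")

def f_score(Theta_hat, Theta_true, tol=1e-6):
    """F-score of the recovered edge pattern (off-diagonal support)."""
    p = Theta_true.shape[0]
    iu = np.triu_indices(p, k=1)
    est = np.abs(Theta_hat[iu]) > tol
    ref = np.abs(Theta_true[iu]) > tol
    tp = np.sum(est & ref)
    fp = np.sum(est & ~ref)
    fn = np.sum(~est & ref)
    return 2 * tp / (2 * tp + fp + fn)
```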

62/78
Grid graph

[Figure: grid-graph recovery — (i) true graph, (ii) [Egilmez et al., 2017], (iii) SGL with ℓ1-norm, (iv) SGL with reweighted ℓ1-norm]
63/78
Noisy multi-component graph
[Figure: noisy multi-component graph — heatmaps and graphs of (v) the true, (vi) the noisy, and (vii) the learned Laplacians, and (viii) the true, (ix) the noisy, and (x) the learned graphs]


64/78
Model mismatch
[Figure: model mismatch — (xiv) true graph with k = 7 components, (xv) noisy, (xvi) learned with k = 2]
65/78
Popular multi-component structures

66/78
Real data: cancer dataset [Weinstein et al., 2013]

[Figure: clustering of the cancer dataset — (xxiii) CLR (Nie et al., 2016), (xxiv) SGL with k = 5]

Clustering accuracy (ACC): CLR = 0.9862 and SGL = 0.99875.

67/78
Animal dataset [Osherson et al., 1991]

[Figure: graphs learned on the animal dataset, with nodes labeled by animal names — (xxv) GGL [Egilmez et al., 2017], (xxvi) GLasso [Friedman et al., 2008]]

68/78
Animal dataset contd...

[Figure: graphs learned on the animal dataset — (xxvii) SGL, proposed (k = 1), (xxviii) SGL, proposed (k = 4)]

69/78
Bipartite structure via adjacency spectral constraints

[Figure: bipartite structure recovery — heatmaps of the (xxix) true, (xxx) noisy, and (xxxi) learned graphs]

70/78
Multi-component bipartite structure via joint spectral
constraints

[Figure: multi-component bipartite structure recovery via joint spectral constraints — heatmaps of the (xxxii) true, (xxxiii) noisy, and (xxxiv) learned graphs]

71/78
Resources
An R package “spectralGraphTopology” containing code for all the experimental results is available at
https://cran.r-project.org/package=spectralGraphTopology

NeurIPS paper: Sandeep Kumar, Jiaxi Ying, José Vinícius de M. Cardoso, and Daniel P. Palomar, “Structured graph learning via Laplacian spectral constraints,” in Advances in Neural Information Processing Systems (NeurIPS), 2019.
https://arxiv.org/pdf/1909.11594.pdf

Extended version: Sandeep Kumar, Jiaxi Ying, José Vinícius de M. Cardoso, and Daniel P. Palomar, “A Unified Framework for Structured Graph Learning via Spectral Constraints,” 2019.
https://arxiv.org/pdf/1904.09792.pdf

72/78
Thanks

For more information visit:

https://www.danielppalomar.com

73/78
References
Absil, P.-A., Mahony, R., and Sepulchre, R. (2009).
Optimization algorithms on matrix manifolds.
Princeton University Press.

Anandkumar, A., Tan, V. Y., Huang, F., and Willsky, A. S. (2012).


High-dimensional Gaussian graphical model selection: walk summability and local
separation criterion.
Journal of Machine Learning Research, 13(Aug):2293–2337.

Banerjee, O., Ghaoui, L. E., and d’Aspremont, A. (2008).


Model selection through sparse maximum likelihood estimation for multivariate
Gaussian or binary data.
Journal of Machine Learning Research, 9(Mar):485–516.

Benidis, K., Sun, Y., Babu, P., and Palomar, D. P. (2016).


Orthogonal sparse PCA and covariance estimation via Procrustes reformulation.
IEEE Transactions on Signal Processing, 64(23):6211–6226.

Bogdanov, A., Mossel, E., and Vadhan, S. (2008).


The complexity of distinguishing Markov random fields.
In Approximation, Randomization and Combinatorial Optimization. Algorithms and
Techniques, pages 331–342. Springer.
74/78
References

Chow, C. and Liu, C. (1968).


Approximating discrete probability distributions with dependence trees.
IEEE Transactions on Information Theory, 14(3):462–467.

Chung, F. R. (1997).
Spectral graph theory.
Number 92. American Mathematical Soc.
Dempster, A. P. (1972).
Covariance selection.
Biometrics, pages 157–175.

Egilmez, H. E., Pavez, E., and Ortega, A. (2017).


Graph learning from data under Laplacian and structural constraints.
IEEE Journal of Selected Topics in Signal Processing, 11(6):825–841.

Fan, K. (1949).
On a theorem of Weyl concerning eigenvalues of linear transformations I.
Proceedings of the National Academy of Sciences of the United States of America,
35(11):652.

75/78
References

Friedman, J., Hastie, T., and Tibshirani, R. (2008).


Sparse inverse covariance estimation with the graphical lasso.
Biostatistics, 9(3):432–441.

Kumar, S., Ying, J., Cardoso, J. V. d. M., and Palomar, D. (2019).


A unified framework for structured graph learning via spectral constraints.
arXiv preprint arXiv:1904.09792.

Lake, B. and Tenenbaum, J. (2010).


Discovering structure by learning sparse graphs.
In Proceedings of the 33rd Annual Cognitive Science Conference.

Lauritzen, S. L. (1996).
Graphical models, volume 17.
Clarendon Press.
Mazumder, R. and Hastie, T. (2012).
The graphical lasso: New insights and alternatives.
Electronic journal of statistics, 6:2125.

76/78
References

Meinshausen, N. and Bühlmann, P. (2006).


High-dimensional graphs and variable selection with the lasso.
The Annals of Statistics, 34(3):1436–1462.

Osherson, D. N., Stern, J., Wilkie, O., Stob, M., and Smith, E. E. (1991).
Default probability.
Cognitive Science, 15(2):251–269.

Ravikumar, P., Wainwright, M. J., Lafferty, J. D., et al. (2010).


High-dimensional Ising model selection using ℓ1-regularized logistic regression.
The Annals of Statistics, 38(3):1287–1319.

Razaviyayn, M., Hong, M., and Luo, Z.-Q. (2013).


A unified convergence analysis of block successive minimization methods for
nonsmooth optimization.
SIAM Journal on Optimization, 23(2):1126–1153.

Rue, H. and Held, L. (2005).


Gaussian Markov random fields: theory and applications.
CRC press.

77/78
References
Slawski, M. and Hein, M. (2015).
Estimation of positive definite M-matrices and structure learning for attractive
Gaussian Markov random fields.
Linear Algebra and its Applications, 473:145–179.

Sun, Y., Babu, P., and Palomar, D. P. (2016).


Majorization-minimization algorithms in signal processing, communications, and
machine learning.
IEEE Transactions on Signal Processing, 65(3):794–816.

Weinstein, J. N., Collisson, E. A., Mills, G. B., Shaw, K. R. M., Ozenberger, B. A.,
Ellrott, K., Shmulevich, I., Sander, C., Stuart, J. M., et al. (2013).
The cancer genome atlas pan-cancer analysis project.
Nature Genetics, 45(10):1113.

Zhao, L., Wang, Y., Kumar, S., and Palomar, D. P. (2019).


Optimization algorithms for graph Laplacian estimation via ADMM and MM.
IEEE Transactions on Signal Processing, 67(16):4231–4244.

78/78
