
Multi-view Contrastive Graph Clustering

Erlin Pan, Zhao Kang∗


School of Computer Science and Engineering,
University of Electronic Science and Technology of China, Chengdu, China
[email protected] [email protected]

Abstract
With the explosive growth of information technology, multi-view graph data have
become increasingly prevalent and valuable. Most existing multi-view clustering
techniques focus on either the scenario of multiple graphs or that of multi-view attributes.
In this paper, we propose a generic framework to cluster multi-view attributed
graph data. Specifically, inspired by the success of contrastive learning, we propose
a multi-view contrastive graph clustering (MCGC) method to learn a consensus graph,
since the original graph could be noisy or incomplete and is not directly applicable.
Our method consists of two key steps: we first filter out undesirable high-
frequency noise while preserving the graph geometric features via graph filtering,
obtaining a smooth representation of nodes; we then learn a consensus graph
regularized by a graph contrastive loss. Results on several benchmark datasets
show the superiority of our method with respect to state-of-the-art approaches. In
particular, our simple approach outperforms existing deep learning-based methods.

1 Introduction
An attributed graph consists of node features and edges characterizing the pairwise relations between
nodes. It is a natural and efficient representation for many real-world data [Liu et al., 2021]. For
example, social network users have their own profiles, and the topological graph reflects their social
relationships. Different from most classical clustering methods such as K-means and hierarchical
clustering, which only handle Euclidean data, graph clustering divides the unlabeled nodes of a graph
into clusters. Typical graph clustering methods first learn a good representation of the graph and then
apply a classical clustering method to the embeddings. For example, large-scale information
network embedding (LINE) [Tang et al., 2015] is a popular graph representation learning method,
which preserves both local and global information and scales easily to large networks.
To incorporate node features and graph structure information, the graph autoencoder (GAE) [Kipf
and Welling, 2016] employs a graph convolution network (GCN) encoder and achieves significant
performance improvements. Real-life data are often collected from various sources or obtained
from different extractors, and thus are naturally represented by different features or views [Kang et al.,
2021, 2020c]. Each view could be noisy and incomplete, but important factors, such as geometry
and semantics, tend to be shared among all views. Therefore, features and graphs of different views
are complementary, which implies that it is paramount to integrate all features and graphs of diverse
views to improve clustering performance.
Numerous graph-based multi-view clustering methods have been developed in the literature to capture
the consensus information shared by different views. Graph-based multi-view clustering constructs a
graph for each view and fuses them based on a weighting mechanism [Wang et al., 2019]. The multi-view
spectral clustering network [Huang et al., 2019] learns a discriminative representation by using a
deep metric learning network. These methods are developed for feature matrices and cannot handle
graph data. To directly process graph data, some representative methods have also been proposed.

∗Corresponding author.

35th Conference on Neural Information Processing Systems (NeurIPS 2021).


Scalable multiplex network embedding (MNE) [Zhang et al., 2018] is a scalable multi-view network
embedding model, which learns multiple relations within a unified network embedding framework.
Principled multilayer network embedding (PMNE) [Liu et al., 2017] proposes three strategies
("network aggregation", "results aggregation", and "layer co-analysis") to project a multilayer
network into a continuous vector space. Nevertheless, these methods fail to explore the feature information
[Lin and Kang, 2021].
Recently, based on GCN, the One2Multi graph autoencoder clustering (O2MA) framework [Fan et al.,
2020] and multi-view attribute GCNs for clustering (MAGCN) [Cheng et al., 2020] achieve superior
performance on graph clustering. O2MA introduces a graph autoencoder to learn node embeddings
based on one informative graph and reconstruct multiple graphs. However, the shared feature
representation of multiple graphs could be incomplete because O2MA only takes into account the
informative view selected by modularity. MAGCN exploits the abundant information of all views
and adopts cross-view consensus learning by enforcing the representations of different views to be
as similar as possible. Nevertheless, O2MA targets multiple graphs while MAGCN mainly addresses
graph-structured data with multi-view attributes. Neither is directly applicable to data with multiple
graphs and multi-view attributes. Therefore, research on multi-view graph clustering is still at an initial
stage and more dedicated efforts are pressingly needed.
In this paper, we propose a generic framework for clustering attributed graph data with multi-
view features and multiple topological graphs, denoted Multi-view Contrastive Graph Clustering
(MCGC). To be exact, MCGC learns a new consensus graph by exploring the holistic information
among various attributes and graphs rather than utilizing the initial graph. The reason for introducing
graph learning is that the initial graph is often noisy or incomplete, which leads to suboptimal
solutions [Chen et al., 2020b, Kang et al., 2020b]. A contrastive loss is adopted as a regularizer to
make the consensus graph clustering-friendly. Moreover, we operate on the smooth representation
rather than the raw data. The contributions of this work can be summarized as follows:

• To boost the quality of the learned graph, we propose a novel contrastive loss at the graph level. It
is capable of drawing similar nodes close and pushing dissimilar ones apart.
• We propose a generic clustering framework to handle multilayer graphs with multi-view
attributes, which contains graph filtering, graph learning, and graph contrastive components.
The graph filtering is simple and efficient and yields a smoothed representation; the graph
learning generates the consensus graph with an adaptive weighting mechanism
for different views.
• Our method achieves state-of-the-art performance compared with both shallow and deep
methods on five benchmark datasets.

2 Related Work
2.1 Multi-view Clustering

A large number of multi-view clustering methods have been proposed in the last decades. Multi-view
low-rank sparse subspace clustering [Brbić and Kopriva, 2018] obtains a joint subspace representation
across all views by learning an affinity matrix constrained by sparsity and low-rank constraints.
Cross-view matching clustering (COMIC) [Peng et al., 2019] enforces the graphs, rather than the
representations in the latent space, to be as similar as possible. Robust multi-view spectral clustering
(RMSC) [Xia et al., 2014] uses a shared low-rank transition probability matrix derived from each
single view as input to the standard Markov chain method for clustering. These methods are designed
for feature matrices and try to learn a graph from data. To directly cluster multiple graphs, the self-weighted
multi-view clustering (SwMC) [Nie et al., 2017] method learns a shared graph from multiple graphs
by using a novel weighting strategy. The above methods are suitable for either graph or feature data only, and
cannot simultaneously explore attributes and graph structure. As previously discussed, O2MA [Fan
et al., 2020] and MAGCN [Cheng et al., 2020] can handle attributed graphs, but they are not directly
applicable to generic multi-view graph data.

2.2 Contrastive Clustering

Due to its impressive performance in many tasks, contrastive learning has become one of the hottest
topics in unsupervised learning. Its motivation is to maximize the similarity of positive pairs and the
distance of negative pairs [Hadsell et al., 2006]. Generally, a positive pair is composed of data
augmentations of the same instance, while augmentations of different instances are regarded as negatives.
Several loss functions have been proposed, such as the triplet loss [Chopra et al., 2005], the noise con-
trastive estimation (NCE) loss [Gutmann and Hyvärinen, 2010], and the normalized temperature-scaled
cross entropy loss (NT-Xent) [Chen et al., 2020a]. Deep robust clustering turns maximizing mutual
information into minimizing contrastive loss and achieves significant improvement after applying
contrastive learning to decrease intra-class variance [Zhong et al., 2020]. Contrastive clustering
develops a dual contrastive learning framework, which conducts contrastive learning at the instance level
as well as the cluster level [Li et al., 2021]. As a result, it produces a representation that facilitates the
downstream clustering task. Unfortunately, these methods can only handle single-view data.
Recently, by combining reconstruction, cross-view contrastive learning, and cross-view dual predic-
tion, incomplete multi-view clustering via contrastive prediction (COMPLETER) [Lin et al., 2021a]
performs data recovery and consistency learning of incomplete multi-view data simultaneously. It
also obtains promising performance on complete multi-view data. However, it cannot deal with graph
data. On the other hand, the contrastive multi-view representation learning on graphs (MVGRL) method
[Hassani and Khasahmadi, 2020] performs representation learning by contrasting
two diffusion matrices transformed from the adjacency matrix. It reports better performance than
variational GAE (VGAE) [Kipf and Welling, 2016], marginalized GAE (MGAE) [Wang et al., 2017],
adversarially regularized GAE (ARGA) and VGAE (ARVGA) [Pan et al., 2018], and GALA [Park
et al., 2019]. Different from MVGRL, our contrastive regularizer is directly applied to the learned graph.

3 Methodology
3.1 Notation

Define the multi-view graph data as $\mathcal{G} = \{\mathcal{V}, E^1, \dots, E^V, X^1, \dots, X^V\}$, where $\mathcal{V}$ represents the
set of $N$ nodes, $e_{ij} \in E^v$ denotes the relationship between node $i$ and node $j$ in the $v$-th view, and
$X^v = \{x_1^v, \dots, x_N^v\}^\top$ is the feature matrix. Adjacency matrices $\{\widetilde{A}^v\}_{v=1}^{V}$ characterize the initial
graph structure, and $\{D^v\}_{v=1}^{V}$ represent the degree matrices of the various views. The normalized adjacency
matrix is $A^v = (D^v)^{-\frac{1}{2}}(\widetilde{A}^v + I)(D^v)^{-\frac{1}{2}}$ and the corresponding graph Laplacian is $L^v = I - A^v$.
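As a small illustration (not part of the paper), the normalization above can be computed per view in a few lines of NumPy; here the degree is taken over the self-looped adjacency $\widetilde{A}^v + I$, which is one common convention:

```python
import numpy as np

def normalize_graph(A_tilde):
    """Return the normalized adjacency A and graph Laplacian L = I - A for one view."""
    N = A_tilde.shape[0]
    A_loop = A_tilde + np.eye(N)              # add self-loops: A~ + I
    d = A_loop.sum(axis=1)                    # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^{-1/2}
    A = D_inv_sqrt @ A_loop @ D_inv_sqrt      # symmetric normalization
    L = np.eye(N) - A                         # graph Laplacian
    return A, L
```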

3.2 Graph Filtering

A feature matrix $X \in \mathbb{R}^{N \times d}$ of $N$ nodes can be treated as $d$ $N$-dimensional graph signals. A natural
signal should be smooth on nearby nodes in terms of the underlying graph. The smoothed signals $H$
can be obtained by solving the following optimization problem [Zhu et al., 2021, Lin et al., 2021b]:
$$\min_{H} \ \|H - X\|_F^2 + s\,\mathrm{Tr}\left(H^\top L H\right), \qquad (1)$$
where $s > 0$ is a balance parameter and $L$ is the Laplacian matrix associated with $X$. $H$ can be
obtained by taking the derivative of Eq. (1) w.r.t. $H$ and setting it to zero, which yields
$$H = (I + sL)^{-1} X. \qquad (2)$$
To avoid matrix inversion, we approximate $H$ by its first-order Taylor series expansion, i.e.,
$H = (I - sL)X$. Generally, $m$-th order graph filtering can be written as
$$H = (I - sL)^m X, \qquad (3)$$
where $m$ is a non-negative integer. Graph filtering removes undesirable high-frequency noise
while preserving the graph geometric features.
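Eq. (3) amounts to applying the low-pass filter $(I - sL)$ to the signals $m$ times; a minimal NumPy sketch (not from the paper's code), assuming `L` comes from the normalization above:

```python
import numpy as np

def graph_filtering(X, L, s=0.5, m=2):
    """m-th order graph filtering of Eq. (3): H = (I - sL)^m X."""
    N = X.shape[0]
    F = np.eye(N) - s * L        # low-pass filter (I - sL)
    H = X.copy()
    for _ in range(m):           # apply the filter m times
        H = F @ H
    return H
```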

3.3 Graph Learning

The real-world graph is often noisy or incomplete, which degrades the downstream task
performance if it is used directly. Thus we learn an optimized graph $S$ from the smoothed
representation $H$. This can be realized based on the self-expression property of data, i.e., each data
point can be represented by a linear combination of other data samples [Lv et al., 2021, Ma et al.,
2020], and the combination coefficients represent the relationships among data points. The objective
function on single-view data can be mathematically formulated as
$$\min_{S} \ \left\|H^\top - H^\top S\right\|_F^2 + \alpha \|S\|_F^2, \qquad (4)$$
where $S \in \mathbb{R}^{N \times N}$ is the graph matrix and $\alpha > 0$ is the trade-off parameter. The first term is the
reconstruction loss and the second term serves as a regularizer to avoid a trivial solution. Many other
regularizers could also be applied, such as the nuclear norm or the sparse $\ell_1$ norm [Kang et al., 2020a]. To
tackle multi-view data, we can compute a smooth representation $H^v$ for each view and extend Eq.
(4) by introducing a weighting factor to distinguish the contributions of different views:
$$\min_{S,\lambda^v} \ \sum_{v=1}^{V} \lambda^v \left\|H^{v\top} - H^{v\top} S\right\|_F^2 + \alpha \|S\|_F^2 + \sum_{v=1}^{V} \left(\lambda^v\right)^{\gamma}, \qquad (5)$$
where $\lambda^v$ is the weight of the $v$-th view and $\gamma$ is a smooth parameter. Eq. (5) learns a consensus graph $S$
shared by all views. To learn a more discriminative $S$, we introduce a novel regularizer in this work.
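For fixed view weights, Eq. (5) is a regularized least-squares problem in $S$ with a closed-form solution; the optimization section later uses this solution to initialize $S$. A minimal NumPy sketch, assuming the $\alpha\|S\|_F^2$ term sits outside the view sum as written above:

```python
import numpy as np

def solve_eq5(H_list, lam, alpha):
    """Closed-form minimizer of Eq. (5) for fixed weights lam (one scalar per view)."""
    N = H_list[0].shape[0]
    G = sum(l * H @ H.T for l, H in zip(lam, H_list))   # sum_v lambda^v H^v H^v^T  (N x N)
    # setting the gradient to zero gives (G + alpha * I) S = G
    return np.linalg.solve(G + alpha * np.eye(N), G)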

3.4 Graph Contrastive Regularizer

Generally, contrastive learning is performed at the instance level and positive/negative pairs are con-
structed by data augmentation. Most graph contrastive learning methods conduct random corruption
on nodes and edges to learn a good node representation. Different from them, each node and its
k-nearest neighbors (kNN) are regarded as positive pairs in this paper. Then, we perform contrastive
learning at the graph level by applying a contrastive regularizer on the graph matrix $S$ instead of node
features. It can be expressed as
$$\mathcal{J} = \sum_{i=1}^{N} \sum_{j \in \mathcal{N}^v_i} -\log \frac{\exp\left(S_{ij}\right)}{\sum_{p \neq i}^{N} \exp\left(S_{ip}\right)}, \qquad (6)$$
where $\mathcal{N}^v_i$ represents the k-nearest neighbors of node $i$ in the $v$-th view. Eq. (6) is introduced to
draw neighbors close and push non-neighbors apart, so as to boost the quality of the graph.
Eventually, our proposed multi-view contrastive graph clustering (MCGC) model can be formulated
as
$$\min_{S,\lambda^v} \ \sum_{v=1}^{V} \lambda^v \left( \left\|H^{v\top} - H^{v\top} S\right\|_F^2 + \alpha \sum_{i=1}^{N} \sum_{j \in \mathcal{N}^v_i} -\log \frac{\exp\left(S_{ij}\right)}{\sum_{p \neq i}^{N} \exp\left(S_{ip}\right)} \right) + \sum_{v=1}^{V} \left(\lambda^v\right)^{\gamma}. \qquad (7)$$
Different from existing multi-view clustering methods, MCGC explores the holistic information from
both multi-view attributes and multiple structural graphs. Furthermore, it constructs a consensus
graph from the smooth signal rather than the raw data.
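As a concrete reading of Eq. (6), the following minimal NumPy sketch computes the regularizer for one view, assuming the kNN positives are given as a boolean mask (neighbor selection itself is described in Section 4.2):

```python
import numpy as np

def graph_contrastive_loss(S, knn_mask):
    """Eq. (6) for one view: each node and its kNN form the positive pairs.

    S        : (N, N) learned graph matrix.
    knn_mask : (N, N) boolean array; knn_mask[i, j] is True iff j is one of the
               k nearest neighbors of node i in this view (diagonal is False).
    """
    expS = np.exp(S)
    # denominator: sum_{p != i} exp(S_ip), one value per row i
    denom = expS.sum(axis=1) - np.diag(expS)
    i_idx, j_idx = np.nonzero(knn_mask)
    # -log( exp(S_ij) / sum_{p != i} exp(S_ip) ), summed over all positive pairs
    return float(-np.sum(np.log(expS[i_idx, j_idx] / denom[i_idx])))
```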

3.5 Optimization

There are two groups of variables in Eq. (7) and it is difficult to solve for them directly. We therefore
adopt an alternating optimization strategy, in which we update one variable at a time while fixing all
others.
Fix λ^v, update S.
Because $\lambda^v$ is fixed, our objective function can be expressed as
$$\min_{S} \ \sum_{v=1}^{V} \lambda^v \left( \left\|H^{v\top} - H^{v\top} S\right\|_F^2 + \alpha \sum_{i=1}^{N} \sum_{j \in \mathcal{N}^v_i} -\log \frac{\exp\left(S_{ij}\right)}{\sum_{p \neq i}^{N} \exp\left(S_{ip}\right)} \right). \qquad (8)$$

$S$ can be solved element-wise by gradient descent, and its derivative at epoch $t$ can be written as
$$r_1^{(t)} + \alpha r_2^{(t)}. \qquad (9)$$
The first term is
$$r_1^{(t)} = 2 \sum_{v=1}^{V} \lambda^v \left( -\left[H^v H^{v\top}\right]_{ij} + \left[H^v H^{v\top} S^{(t-1)}\right]_{ij} \right). \qquad (10)$$
Define $K^{(t-1)} = \sum_{p \neq i}^{N} \exp\big(S_{ip}^{(t-1)}\big)$ and let $n$ be the total number of neighbors (i.e., the neighbors
from each graph are all incorporated). Consequently, the second term becomes
$$r_2^{(t)} =
\begin{cases}
\displaystyle \sum_{v=1}^{V} \lambda^v \left( -1 + \frac{n \exp\big(S_{ij}^{(t-1)}\big)}{K^{(t-1)}} \right), & \text{if } j \in \mathcal{N}^v_i, \\[2ex]
\displaystyle \sum_{v=1}^{V} \lambda^v \, \frac{n \exp\big(S_{ij}^{(t-1)}\big)}{K^{(t-1)}}, & \text{otherwise}.
\end{cases} \qquad (11)$$
We then adopt the Adam optimization strategy [Kingma and Ba, 2015] to update $S$. To speed up
convergence, we initialize $S$ with $S^*$, the solution of Eq. (5).
Fix S, update λ^v.
For each view $v$, we define $M^v = \left\|H^{v\top} - H^{v\top} S\right\|_F^2 + \alpha \mathcal{J}$. Then, the loss function simplifies to
$$\min_{\lambda^v} \ \sum_{v=1}^{V} \lambda^v M^v + \sum_{v=1}^{V} \left(\lambda^v\right)^{\gamma}. \qquad (12)$$
By setting its derivative to zero, we get
$$\lambda^v = \left( \frac{-M^v}{\gamma} \right)^{\frac{1}{\gamma - 1}}. \qquad (13)$$
We alternately update $S$ and $\lambda^v$ until convergence. The complete procedure is outlined in
Algorithm 1.

Algorithm 1 MCGC
Require: adjacency matrices $\widetilde{A}^1, \dots, \widetilde{A}^V$, features $X^1, \dots, X^V$, the order of graph filtering $m$, parameters
$\alpha$, $s$ and $\gamma$, the number of clusters $c$.
Ensure: $c$ clusters.
1: Initialize $\lambda^v = 1$;
2: $A^v = (D^v)^{-\frac{1}{2}}(\widetilde{A}^v + I)(D^v)^{-\frac{1}{2}}$;
3: $L^v = I - A^v$;
4: Perform graph filtering by Eq. (3) for each view;
5: while the convergence condition is not met do
6:     Update $S$ by Eq. (9) via Adam;
7:     for each view do
8:         Update $\lambda^v$ by Eq. (13);
9:     end for
10: end while
11: $C = \left(|S| + |S|^\top\right)/2$;
12: Perform clustering on $C$.
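To make Algorithm 1 concrete, below is a rough PyTorch sketch of the alternating optimization. It is not the released implementation: it uses autograd for the S-update instead of the hand-derived gradient of Eqs. (9)-(11), takes the closed form of Eq. (5) as the initialization `S0`, and the function names and default hyperparameters are illustrative assumptions.

```python
import torch

def mcgc(H_list, knn_masks, S0, alpha=1.0, gamma=-4.0, epochs=200, lr=1e-3):
    """Alternating optimization of Eq. (7): Adam on S, closed-form update of lambda."""
    V = len(H_list)
    lam = torch.ones(V)                                    # view weights, all initialized to 1
    S = torch.tensor(S0, dtype=torch.float32, requires_grad=True)
    opt = torch.optim.Adam([S], lr=lr)
    H = [torch.as_tensor(h, dtype=torch.float32) for h in H_list]
    masks = [torch.as_tensor(m, dtype=torch.bool) for m in knn_masks]

    def per_view_loss(v):
        rec = ((H[v].T - H[v].T @ S) ** 2).sum()           # ||H^v> - H^v> S||_F^2
        expS = torch.exp(S)
        denom = expS.sum(dim=1) - torch.diag(expS)         # sum_{p != i} exp(S_ip)
        i, j = torch.nonzero(masks[v], as_tuple=True)
        con = -torch.log(expS[i, j] / denom[i]).sum()      # contrastive term of Eq. (6)
        return rec + alpha * con                           # this is M^v

    for _ in range(epochs):
        opt.zero_grad()                                    # fix lambda, update S (Eq. (8)) via Adam
        loss = sum(lam[v] * per_view_loss(v) for v in range(V))
        loss.backward()
        opt.step()
        with torch.no_grad():                              # fix S, update lambda (Eq. (13))
            Mv = torch.stack([per_view_loss(v) for v in range(V)])
            lam = (-Mv / gamma) ** (1.0 / (gamma - 1.0))

    with torch.no_grad():
        C = (S.abs() + S.abs().T) / 2                      # step 11: symmetric affinity matrix
    return C, lam
```

The final step 12 ("clustering on C") could then be obtained by, e.g., running spectral clustering on the symmetric affinity matrix `C`.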

4 Experiments
4.1 Datasets and Metrics

We evaluate MCGC on five benchmark datasets: ACM, DBLP, IMDB [Fan et al., 2020], Amazon
photos, and Amazon computers [Shchur et al., 2018]. Their statistics are shown in Table 1.
ACM: A paper network from the ACM dataset. Node attribute features are the elements of a
bag-of-words representation of each paper's keywords. The two graphs are constructed from two types of
relationships: "Co-Author" means that two papers are written by the same author and "Co-Subject"
means that they focus on the same field.
DBLP: An author network from the DBLP dataset. Node attribute features are the elements
of a bag-of-words representation of each author's keywords. Three graphs are derived from the
relationships "Co-Author", "Co-Conference", and "Co-Term", which indicate that two authors have
worked together on papers, published papers at the same conference, and published papers with the
same terms, respectively.
IMDB: A movie network from the IMDB dataset. Node attribute features correspond to elements
of a bag-of-words representation of each movie. The relationships of being acted by the same actor
(Co-Actor) and directed by the same director (Co-Director) are used to construct two graphs.
Amazon photos and Amazon computers: Segments of the Amazon co-purchase network,
in which nodes represent goods, features of each good are a bag-of-words of product
reviews, and an edge means that two goods are purchased together. To obtain multi-view attributes, the
second feature matrix is constructed via a cartesian product, following [Cheng et al., 2020].
We adopt four popular clustering metrics: Accuracy (ACC), Normalized Mutual Information (NMI),
Adjusted Rand Index (ARI), and F1 score.
Table 1: The statistical information of datasets.

Dataset             Nodes    Features       Graph and Edges               Clusters
ACM                 3,025    1,830          Co-Subject (29,281)           3
                                            Co-Author (2,210,761)
DBLP                4,057    334            Co-Author (11,113)            4
                                            Co-Conference (5,000,495)
                                            Co-Term (6,776,335)
IMDB                4,780    1,232          Co-Actor (98,010)             3
                                            Co-Director (21,018)
Amazon photos       7,487    745 / 7,487    Co-Purchase (119,043)         8
Amazon computers    13,381   767 / 13,381   Co-Purchase (245,778)         10

4.2 Experiment Setup

We compare MCGC with both multi-view and single-view methods. LINE [Tang et al.,
2015] and GAE [Kipf and Welling, 2016] are chosen as representatives of single-view methods,
and we report the best result among all views. The compared multi-view clustering methods
include PMNE [Liu et al., 2017], RMSC [Xia et al., 2014], and SwMC [Nie et al., 2017]. PMNE and
SwMC only use structural information while RMSC only exploits attribute features. PMNE uses
three strategies to project a multilayer network into a continuous vector space, so we report its best
result. MCGC is also compared with methods that explore both attribute features and
structural information, i.e., O2MA and O2MAC [Fan et al., 2020] and MAGCN [Cheng et al., 2020].
In addition, MCGC is compared with COMPLETER [Lin et al., 2021a] and MVGRL [Hassani and
Khasahmadi, 2020], which conduct contrastive learning to learn a common representation shared across
features of different views and across multiple graphs, respectively. For an unbiased comparison, we copy
part of the results from [Fan et al., 2020]. Since the neighbors of each node may differ across views,
we also examine another strategy: only use the shared neighbors in the contrastive loss
term, $\mathcal{N}_i = \bigcap_{v=1}^{V} \mathcal{N}^v_i$ (a sketch of this neighbor construction is given below). Our method with this
strategy is marked as MCGC*. In the experiments, $k = 10$ is used to select neighbors and $\gamma$ is fixed to $-4$,
since we find that it has little influence on the result. According to the parameter analysis, we set $m = 2$,
$s = 0.5$, and tune $\alpha$. All experiments are conducted on the same machine with an Intel(R) Core(TM)
i7-8700 3.20GHz CPU, two GeForce GTX 1080 Ti GPUs, and 64GB RAM. The implementation of MCGC is
publicly available.1
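A hypothetical sketch of how the per-view kNN positive sets (k = 10) and the shared-neighbor variant used by MCGC* might be prepared, assuming neighbors are selected from the smoothed features with scikit-learn (the released code may construct them differently):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_masks(H_list, k=10):
    """Return one (N, N) boolean mask per view: mask[i, j] = True iff j is a kNN of i."""
    masks = []
    for H in H_list:                               # H: (N, d) smoothed features of one view
        N = H.shape[0]
        nbrs = NearestNeighbors(n_neighbors=k + 1).fit(H)
        _, idx = nbrs.kneighbors(H)                # idx[:, 0] is typically the node itself
        mask = np.zeros((N, N), dtype=bool)
        rows = np.repeat(np.arange(N), k)
        mask[rows, idx[:, 1:].ravel()] = True
        masks.append(mask)
    return masks

def shared_mask(masks):
    """MCGC*: keep only neighbors shared across all views (intersection of the kNN sets)."""
    out = masks[0].copy()
    for m in masks[1:]:
        out &= m
    return out
```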

4.3 Results

All results are shown in Table 2 and Table 3. Compared with the single-view method GAE, MCGC
improves ACC by more than 9%, 4%, and 19% on ACM, DBLP, and IMDB, respectively. Although it uses
deep neural networks, GAE cannot exploit the complementarity of views. Compared with PMNE,
the ACC, NMI, ARI, and F1 are boosted by 16%, 20%, 20%, and 12% on average. With respect to LINE,
RMSC, and SwMC, the improvement is even more significant. This can be attributed to the exploration of
both feature and structure information in MCGC. Although O2MA, O2MAC, and MAGCN capture
attribute and structure information, MCGC still outperforms them considerably. Specifically, MCGC
improves upon O2MAC on average by almost 6%, 9%, and 11% on ACC, NMI, and F1, respectively. With respect
to MAGCN, the improvement is more than 20% for all metrics. Compared with contrastive learning-
based approaches, our improvement is also impressive. In particular, compared with COMPLETER,
the improvement is more than 30% on the Amazon datasets, which illustrates that MCGC benefits from
graph structure information. MCGC also outperforms MVGRL by 20%. By
comparing the results of MCGC and MCGC*, we can see that the strategy for choosing neighbors
does have an impact on performance.

1 https://github.com/Panern/MCGC
Table 2: Results on ACM, DBLP, IMDB.

Dataset  Metric   LINE     GAE      PMNE     RMSC     SwMC     O2MA     O2MAC    MCGC     MCGC*
ACM      ACC      0.6479   0.8216   0.6936   0.6315   0.3831   0.888    0.9042   0.9147   0.9055
         NMI      0.3941   0.4914   0.4648   0.3973   0.4709   0.6515   0.6923   0.7126   0.6823
         ARI      0.3433   0.5444   0.4302   0.3312   0.0838   0.6987   0.7394   0.7627   0.7385
         F1       0.6594   0.8225   0.6955   0.5746   0.018    0.8894   0.9053   0.9155   0.9062
DBLP     ACC      0.8689   0.8859   0.7925   0.8994   0.3253   0.904    0.9074   0.9298   0.9162
         NMI      0.6676   0.6925   0.5914   0.7111   0.019    0.7257   0.7287   0.8302   0.7490
         ARI      0.6988   0.741    0.5265   0.7647   0.0159   0.7705   0.778    0.7746   0.7995
         F1       0.8546   0.8743   0.7966   0.8248   0.2808   0.8976   0.9013   0.9252   0.9112
IMDB     ACC      0.4268   0.4298   0.4958   0.2702   0.2453   0.4697   0.4502   0.6182   0.6113
         NMI      0.0031   0.0402   0.0359   0.3775   0.0023   0.0524   0.0421   0.1149   0.1225
         ARI      -0.009   0.0473   0.0366   0.0054   0.0017   0.0753   0.0564   0.1833   0.1811
         F1       0.287    0.4062   0.3906   0.0018   0.3164   0.4229   0.1459   0.4401   0.4512

Table 3: Results on Amazon photos and Amazon computers. The '-' means that the method raises an
out-of-memory problem.

             Amazon photos                        Amazon computers
Method       ACC      NMI      ARI      F1       ACC      NMI      ARI      F1
COMPLETER    0.3678   0.2606   0.0759   0.3067   0.2417   0.1562   0.0536   0.1601
MVGRL        0.5054   0.4331   0.2379   0.4599   0.2450   0.1012   0.0553   0.1706
MAGCN        0.5167   0.3897   0.2401   0.4736   -        -        -        -
MCGC         0.7164   0.6154   0.4323   0.6864   0.5967   0.5317   0.3902   0.5204

5 Ablation Study

5.1 The Effect of Contrastive Loss

By employing the contrastive regularizer, our method pulls neighbors into the same cluster, which
decreases intra-cluster variance. To see its effect, we replace J with a Frobenius-norm term, i.e., Eq.
(5). As can be seen from Table 4, the performance falls precipitously without the contrastive loss on
all datasets. MCGC achieves ACC improvements of 16%, 8%, 5%, and 12% on DBLP, ACM, IMDB,
and the Amazon datasets, respectively. For the other metrics, the contrastive regularizer also enhances the
performance significantly. These facts validate that MCGC benefits from the graph contrastive loss.

Table 4: Results of MCGC without contrastive loss.

Metric  Method        ACM      DBLP     IMDB     Amazon photos   Amazon computers
ACC     MCGC          0.9147   0.9298   0.6182   0.7164          0.5967
        MCGC w/o J    0.8334   0.7658   0.5636   0.5882          0.4662
NMI     MCGC          0.7126   0.8302   0.1149   0.6154          0.5317
        MCGC w/o J    0.5264   0.4621   0.0707   0.5372          0.3988
ARI     MCGC          0.7627   0.7746   0.1833   0.4323          0.3902
        MCGC w/o J    0.5779   0.4949   0.1451   0.2640          0.1745
F1      MCGC          0.9155   0.9252   0.4401   0.6864          0.5204
        MCGC w/o J    0.8313   0.7601   0.4444   0.5437          0.3678
Table 5: Results in various views of MCGC on ACM and Amazon photos. G1 and G2 denote the
graphs in different views.

          ACM                                Amazon photos
Metric    G1, X     G2, X     G1, G2, X      X1, G     X2, G     X1, X2, G
ACC       0.9088    0.8152    0.9147         0.4433    0.6935    0.7164
NMI       0.6929    0.4656    0.7126         0.3519    0.5976    0.6154
ARI       0.7470    0.5229    0.7627         0.1572    0.4291    0.4323
F1        0.9097    0.8184    0.9155         0.3675    0.6734    0.6864

5.2 The Effect of Multi-View Learning

To demonstrate the effect of multi-view learning, we evaluate the performance of the following
single-view model:
$$\min_{S} \ \left\|H^\top - H^\top S\right\|_F^2 + \alpha \sum_{i=1}^{N} \sum_{j \in \mathcal{N}_i} -\log \frac{\exp\left(S_{ij}\right)}{\sum_{p \neq i}^{N} \exp\left(S_{ip}\right)}. \qquad (14)$$

Taking ACM and Amazon photos as examples, we report the clustering performance in various
scenarios in Table 5. We observe that the best performance is always achieved when all views
are incorporated. In addition, we see that the performance varies considerably across views. This
justifies the necessity of λ^v in Eq. (7). Therefore, it is beneficial to explore the complementarity of
multi-view information.

5.3 The Effect of Graph Filtering

To understand the contribution of graph filtering, we conduct another group of experiments. Without
graph filtering, our objective function becomes
$$\min_{S,\lambda^v} \ \sum_{v=1}^{V} \lambda^v \left( \left\|X^{v\top} - X^{v\top} S\right\|_F^2 + \alpha \sum_{i=1}^{N} \sum_{j \in \mathcal{N}^v_i} -\log \frac{\exp\left(S_{ij}\right)}{\sum_{p \neq i}^{N} \exp\left(S_{ip}\right)} \right) + \sum_{v=1}^{V} \left(\lambda^v\right)^{\gamma}. \qquad (15)$$

We denote this model as MCGC-. The results of MCGC- are shown in Table 6. Compared with
MCGC, ACC on ACM, DBLP, and IMDB drops by 0.8%, 1.3%, and 0.8%, respectively. This indicates that
graph filtering has a positive impact on our model. For the other metrics, MCGC also outperforms
MCGC- in most cases.

Table 6: The results of MCGC- (without graph filtering).

          ACM                  DBLP                 IMDB
Metric    MCGC     MCGC-       MCGC     MCGC-       MCGC     MCGC-
ACC       0.9147   0.9061      0.9298   0.9162      0.6182   0.6109
NMI       0.7126   0.6974      0.8302   0.7490      0.1149   0.1219
ARI       0.7627   0.7439      0.7746   0.7995      0.1833   0.1804
F1        0.9155   0.9057      0.9252   0.9112      0.4401   0.4509

6 Parameter Analysis
Firstly, two parameters m and s are used in graph filtering. Taking ACM as an example, we
show their influence on performance by setting m = [1, 2, 3, 4, 5] and s = [0.01, 0.1, 0.3, 0.5, 1, 3, 5, 10]
in Fig. 1. It can be seen that MCGC achieves reasonable performance for small m and s;
therefore, we set m = 2 and s = 0.5 in all experiments.

Figure 1: Sensitivity analysis of parameters m and s on ACM.

Afterwards, we tune the trade-off parameter α = [10^-3, 0.1, 1, 10, 10^2, 10^3]. As shown in Fig. 2,
our method is not sensitive to α, which enhances its practicality in real-world applications. In addition,
we plot the objective evolution of Eq. (7) in Fig. 3. As observed from this figure, our method converges quickly.

Figure 2: Sensitivity analysis of parameter α on ACM (a-d), DBLP (e-h), IMDB (i-l).

Figure 3: The evolution of the objective function on (a) ACM, (b) DBLP, and (c) IMDB.

7 Conclusion
Multi-view graph clustering is still at a nascent stage, with many challenges remaining unsolved. In
this paper, we propose a novel method (MCGC) to learn a consensus graph by exploiting not only
attribute content but also graph structure information. In particular, graph filtering is introduced to
filter out noisy components, and a contrastive regularizer is employed to further enhance the quality of the
learned graph. Experimental results on multi-view attributed graph datasets show the superior
performance of our method. This study demonstrates that a shallow model can beat
deep learning methods, which questions the systematic use of complex deep neural networks. Graph learning is
crucial to more and more tasks and applications. Like other methods that learn from data, ours carries
the risk of learning biases and perpetuating them in the form of decisions. Thus our method should
be deployed with careful consideration of any potential underlying biases in the data. One potential
limitation of our approach is that it could require a lot of memory if the data contain too many nodes,
because the size of the learned graph is N × N. Research on large-scale networks is left for future
work.

Acknowledgments and Disclosure of Funding


This paper was in part supported by the Natural Science Foundation of China (Nos.61806045,
U19A2059).

References
Maria Brbić and Ivica Kopriva. Multi-view low-rank sparse subspace clustering. Pattern Recognition,
73:247–258, 2018.

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for
contrastive learning of visual representations. In International conference on machine learning,
pages 1597–1607. PMLR, 2020a.

Yu Chen, Lingfei Wu, and Mohammed Zaki. Iterative deep graph learning for graph neural networks:
Better and robust node embeddings. Advances in Neural Information Processing Systems, 33,
2020b.

Jiafeng Cheng, Qianqian Wang, Zhiqiang Tao, Deyan Xie, and Quanxue Gao. Multi-view attribute
graph convolution networks for clustering. IJCAI, 2020.

Sumit Chopra, Raia Hadsell, and Yann LeCun. Learning a similarity metric discriminatively, with
application to face verification. In 2005 IEEE Computer Society Conference on Computer Vision
and Pattern Recognition (CVPR’05), volume 1, pages 539–546. IEEE, 2005.

Shaohua Fan, Xiao Wang, Chuan Shi, Emiao Lu, Ken Lin, and Bai Wang. One2multi graph
autoencoder for multi-view graph clustering. In Proceedings of The Web Conference 2020, pages
3070–3076, 2020.

Michael Gutmann and Aapo Hyvärinen. Noise-contrastive estimation: A new estimation principle
for unnormalized statistical models. In Proceedings of the Thirteenth International Conference on
Artificial Intelligence and Statistics, pages 297–304. JMLR Workshop and Conference Proceedings,
2010.

Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensionality reduction by learning an invariant
mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
(CVPR’06), volume 2, pages 1735–1742. IEEE, 2006.

Kaveh Hassani and Amir Hosein Khasahmadi. Contrastive multi-view representation learning on
graphs. In International Conference on Machine Learning, pages 4116–4126. PMLR, 2020.

Zhenyu Huang, Joey Tianyi Zhou, Xi Peng, Changqing Zhang, Hongyuan Zhu, and Jiancheng Lv.
Multi-view spectral clustering network. In IJCAI, pages 2563–2569, 2019.

Zhao Kang, Xiao Lu, Yiwei Lu, Chong Peng, Wenyu Chen, and Zenglin Xu. Structure learning with
similarity preserving. Neural Networks, 129:138–148, 2020a.

Zhao Kang, Haiqi Pan, Steven C.H. Hoi, and Zenglin Xu. Robust graph learning from noisy data.
IEEE Transactions on Cybernetics, 50(5):1833–1843, 2020b.
Zhao Kang, Xinjia Zhao, Shi, chong Peng, Hongyuan Zhu, Joey Tianyi Zhou, Xi Peng, Wenyu Chen,
and Zenglin Xu. Partition level multiview subspace clustering. Neural Networks, 122:279–288,
2020c.
Zhao Kang, Zhiping Lin, Xiaofeng Zhu, and Wenbo Xu. Structured graph learning for scalable
subspace clustering: From single-view to multi-view. IEEE Transactions on Cybernetics, 2021.
doi: 10.1109/TCYB.2021.3061660.
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Yoshua Bengio
and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015,
San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
Thomas N Kipf and Max Welling. Variational graph auto-encoders. In NIPS Bayesian Deep Learning
Workshop, 2016.
Yunfan Li, Peng Hu, Zitao Liu, Dezhong Peng, Joey Tianyi Zhou, and Xi Peng. Contrastive clustering.
In AAAI 2021, volume 35, Feb. 2021.
Yijie Lin, Yuanbiao Gou, Zitao Liu, Boyun Li, Jiancheng Lv, and Xi Peng. Completer: Incomplete
multi-view clustering via contrastive prediction. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), June 2021a.
Zhiping Lin and Zhao Kang. Graph filter-based multi-view attributed graph clustering. In Proceedings
of the 30th International Joint Conference on Artificial Intelligence, IJCAI, pages 19–26, 2021.
Zhiping Lin, Zhao Kang, Lizong Zhang, and Ling Tian. Multi-view attributed graph clustering. IEEE
Transactions on Knowledge and Data Engineering, 2021b. doi: 10.1109/TKDE.2021.3101227.
Changshu Liu, Liangjian Wen, Zhao Kang, Guangchun Luo, and Ling Tian. Self-supervised consensus
representation learning for attributed graph. In Proceedings of the 29th ACM International
Conference on Multimedia, 2021.
Weiyi Liu, Pin-Yu Chen, Sailung Yeung, Toyotaro Suzumura, and Lingli Chen. Principled multilayer
network embedding. In 2017 IEEE International Conference on Data Mining Workshops (ICDMW),
pages 134–141. IEEE, 2017.
Juncheng Lv, Zhao Kang, Xiao Lu, and Zenglin Xu. Pseudo-supervised deep subspace clustering.
IEEE Transactions on Image Processing, 30:5252–5263, 2021.
Zhengrui Ma, Zhao Kang, Guangchun Luo, Ling Tian, and Wenyu Chen. Towards clustering-
friendly representations: Subspace clustering via graph fltering. In Proceedings of the 28th ACM
International Conference on Multimedia, pages 3081–3089, 2020.
Feiping Nie, Jing Li, Xuelong Li, et al. Self-weighted multiview clustering with multiple graphs. In
IJCAI, pages 2564–2570, 2017.
Shirui Pan, Ruiqi Hu, Guodong Long, Jing Jiang, Lina Yao, and Chengqi Zhang. Adversarially
regularized graph autoencoder for graph embedding. In Proceedings of the 27th International Joint
Conference on Artificial Intelligence, pages 2609–2615, 2018.
Jiwoong Park, Minsik Lee, Hyung Jin Chang, Kyuewang Lee, and Jin Young Choi. Symmetric graph
convolutional autoencoder for unsupervised graph representation learning. In Proceedings of the
IEEE/CVF International Conference on Computer Vision, pages 6519–6528, 2019.
Xi Peng, Zhenyu Huang, Jiancheng Lv, Hongyuan Zhu, and Joey Tianyi Zhou. Comic: Multi-view
clustering without parameter selection. In International Conference on Machine Learning, pages
5092–5101. PMLR, 2019.
Oleksandr Shchur, Maximilian Mumme, Aleksandar Bojchevski, and Stephan Günnemann. Pitfalls
of graph neural network evaluation. arXiv preprint arXiv:1811.05868, 2018.
Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. Line: Large-scale
information network embedding. In Proceedings of the 24th international conference on world
wide web, pages 1067–1077, 2015.
Chun Wang, Shirui Pan, Guodong Long, Xingquan Zhu, and Jing Jiang. Mgae: Marginalized graph
autoencoder for graph clustering. In Proceedings of the 2017 ACM on Conference on Information
and Knowledge Management, pages 889–898, 2017.
Hao Wang, Yan Yang, and Bing Liu. Gmc: Graph-based multi-view clustering. IEEE Transactions
on Knowledge and Data Engineering, 32(6):1116–1129, 2019.
Rongkai Xia, Yan Pan, Lei Du, and Jian Yin. Robust multi-view spectral clustering via low-rank and
sparse decomposition. In Proceedings of the AAAI conference on artificial intelligence, volume 28,
2014.
Hongming Zhang, Liwei Qiu, Lingling Yi, and Yangqiu Song. Scalable multiplex network embedding.
In IJCAI, volume 18, pages 3082–3088, 2018.
Huasong Zhong, Chong Chen, Zhongming Jin, and Xian-Sheng Hua. Deep robust clustering by
contrastive learning. arXiv preprint arXiv:2008.03030, 2020.
Meiqi Zhu, Xiao Wang, Chuan Shi, Houye Ji, and Peng Cui. Interpreting and unifying graph neural
networks with an optimization framework. In Proceedings of the Web Conference 2021, pages
1215–1226, 2021.
