unsupervised loss with any task-specific supervised loss function to allow truly end-to-end graph classification. Finally, we evaluate the performance of HoscPool on a plethora of graph datasets, and the reliability of its clustering algorithm on a variety of graphs endowed with ground-truth community structure. During this experimental phase, we conduct a deep analysis aimed at understanding why existing pooling methods fail to truly outperform random baselines, and attempt to provide explanations. This is another important contribution, which we hope will help future work.

2 RELATED WORK
Graph pooling. Leaving aside global pooling [2, 39, 47], we distinguish between two main types of hierarchical approaches. Node drop methods [3, 13, 19, 31, 48, 50, 52] use a learnable scoring function based on message passing representations to assess all nodes and drop the ones with the lowest score. The drawback is that we lose information during pooling by dropping certain nodes entirely. On the other hand, clustering approaches cast the pooling problem as a clustering one [10, 23–25, 33, 46, 51]. For instance, StructPool [51] utilizes conditional random fields to learn the cluster assignment matrix; HaarPool [46] uses the compressive Haar transform; EdgePool [10] gradually merges nodes by contracting high-scoring edges. Of particular interest here are two very popular end-to-end clustering methods, namely DiffPool [49] and MinCutPool [5], because of their original and efficient underlying idea. While DiffPool utilises a link prediction objective along with an entropy regularization to learn the cluster assignment matrix, MinCutPool leverages a min-cut score objective along with an orthogonality term. Although there are more pooling operators, we wish to improve this line of methods, which we consider promising and perfectible. In addition to solving existing limitations, we want to introduce the notion of higher-order to pooling for graph classification, which is unexplored yet.

Higher-order connectivity patterns (i.e. motifs – small network subgraphs like triangles) are known to be the fundamental building blocks of complex networks [6, 28]. They are essential for modelling and understanding the organization of various types of networks. For instance, they play an essential role in the characterisation of social, biological or molecular networks [30]. [11] showed that vertices participating in the same higher-order structure often share the same label, spreading its adoption to node classification tasks [20, 22]. Going further, several recent research papers have clearly demonstrated the benefits of leveraging higher-order structure for link prediction [1, 37], explanation generation [32, 36], ranking [34], and clustering [15, 18]. Regarding the latter, [4, 42] argue that domain-specific motifs are a better signature of the community structure than simple edges. Their intuition is that motifs allow us to focus on particular network substructures that are important for networks of a given domain. As a result, they generalized the notion of conductance to triangle conductance (Section 3), which was found highly beneficial by [6, 40].

3 PRELIMINARY KNOWLEDGE
G = (V, E) is a graph with vertex set V and edge set E, characterised by its adjacency matrix A ∈ R^{N×N} and node feature matrix X ∈ R^{N×F}. D = diag(A 1_N) is the degree matrix and L = D − A the Laplacian matrix of G. Ã = D^{−1/2} A D^{−1/2} ∈ R^{N×N} is the symmetrically normalised adjacency matrix, with corresponding D̃, L̃.

3.1 Graph Cut and Normalised Cut
Clustering involves partitioning the vertices of a graph into K disjoint subsets with more intra-connections than inter-connections [43]. One of the most common and effective ways to do it [35] is to solve the Normalised Cut problem [38]:

    \min_{S_1,\dots,S_K} \sum_{k=1}^{K} \frac{\mathrm{cut}(S_k, \bar{S}_k)}{\mathrm{vol}(S_k)},    (1)

where S̄_k = V \ S_k, cut(S_k, S̄_k) = Σ_{i∈S_k, j∈S̄_k} A_{ij}, and vol(S_k) = Σ_{i∈S_k, j∈V} A_{ij}. Unlike the simple min-cut objective, (1) scales each term by the cluster volume, thus enforcing clusters to be "reasonably large" and avoiding degenerate solutions where most nodes are assigned to a single cluster. Although minimising (1) is NP-hard [44], there are approximation algorithms with theoretical guarantees [8] for finding clusters with small conductance, such as Spectral Clustering (SC), which proposes clusters determined based on the eigen-decomposition of the Laplacian matrix. A refresher on SC is provided in [43].
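To make the SC baseline referenced above concrete, the following is a minimal sketch of the classical spectral clustering pipeline ([38, 43]): relax the normalised cut via the eigenvectors of the symmetrically normalised Laplacian and run k-means on the resulting embedding. It illustrates the standard algorithm only, not the method proposed in this paper.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(A: np.ndarray, K: int, seed: int = 0) -> np.ndarray:
    """Classical normalised spectral clustering on a dense adjacency matrix A."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}
    L_sym = np.eye(A.shape[0]) - A_norm                      # symmetrically normalised Laplacian
    _, vecs = eigh(L_sym, subset_by_index=[0, K - 1])        # K smallest eigenpairs
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    emb = vecs / np.clip(norms, 1e-12, None)                 # row-normalised spectral embedding
    return KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(emb)
```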
3.2 Motif conductance
While the Normalised Cut builds on first-order connectivity patterns (i.e. edges), [4, 42] propose to cluster a network based on specific higher-order substructures. Formally, for a graph G, a motif M made of |M| nodes, and ℳ = {v ∈ V^{|M|} | v = M} the set of all instances of M in G, they propose to search for the partition S_1, ..., S_K minimising motif conductance:

    \min_{S_1,\dots,S_K} \sum_{k=1}^{K} \frac{\mathrm{cut}_M^{(G)}(S_k, \bar{S}_k)}{\mathrm{vol}_M^{(G)}(S_k)},    (2)

where cut_M^{(G)}(S_k, S̄_k) = Σ_{v∈ℳ} 1(∃ i, j ∈ v | i ∈ S_k, j ∈ S̄_k), i.e. the number of instances v of M with at least one node in S_k and at least one node in S̄_k; and vol_M^{(G)}(S_k) = Σ_{v∈ℳ} Σ_{i∈v} 1(i ∈ S_k), i.e. the number of motif instance endpoints in S_k.

4 PROPOSED METHOD
The objective of this paper is to design a differentiable cluster assignment matrix S that learns to find relevant clusters based on higher-order connectivity patterns, in an end-to-end manner within any GNN architecture. To achieve this, we formulate a continuous relaxation of motif spectral clustering and embed the derived formulation into the model objective function to enforce its learning.

4.1 Probabilistic motif spectral clustering
Before exploring how we can rewrite the motif conductance optimisation problem (2) in a solvable way, we introduce the motif adjacency matrix A_M, where each entry (A_M)_{ij} represents the number of motifs in which both node i and node j participate. Its diagonal has zero values. Formally, (A_M)_{ij} = Σ_{v∈ℳ} 1(i, j ∈ v, i ≠ j). G_M is the graph induced by A_M. (D_M)_{ii} = Σ_{j=1}^{N} (A_M)_{ij} and L_M are the motif degree and motif Laplacian matrices.
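As an illustration of this definition for the triangle motif used below (where, as noted in the complexity discussion of Section 5.1, A_M = A² ⊙ A), here is a minimal dense-tensor sketch; the function name and setting are ours, not the authors' released code.

```python
import torch

def triangle_motif_adjacency(A: torch.Tensor) -> torch.Tensor:
    """Motif adjacency matrix for the triangle motif K3.

    (A @ A)[i, j] counts length-2 paths between i and j; masking with A keeps
    only pairs that are also directly connected, i.e. the number of triangles
    containing both i and j. The diagonal is zeroed out.
    """
    A = (A > 0).float()            # assume an unweighted, symmetric adjacency
    A_M = (A @ A) * A              # Hadamard mask with the original edges
    A_M.fill_diagonal_(0.0)
    return A_M

# Example: in a 4-clique, every pair of distinct nodes shares exactly 2 triangles.
A = torch.ones(4, 4) - torch.eye(4)
print(triangle_motif_adjacency(A))  # off-diagonal entries equal 2
```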
For now, we focus on triangle motifs (M = K_3), and extend to more complex motifs in Section 4.2. From [4], we have:

    \mathrm{cut}_M^{(G)}(S_k, \bar{S}_k) = \frac{1}{2} \sum_{i \in S_k} \sum_{j \in \bar{S}_k} (A_M)_{ij},
    \qquad
    \mathrm{vol}_M^{(G)}(S_k) = \frac{1}{2} \sum_{i \in S_k} \sum_{j \in V} (A_M)_{ij},

which enables us to rewrite (2) as:

    \min_{S_1,\dots,S_K} \sum_{k=1}^{K} \frac{\sum_{i \in S_k, j \in \bar{S}_k} (A_M)_{ij}}{\sum_{i \in S_k, j \in V} (A_M)_{ij}}
    \;\equiv\;
    \max_{S_1,\dots,S_K} \sum_{k=1}^{K} \frac{\sum_{i, j \in S_k} (A_M)_{ij}}{\sum_{i \in S_k, j \in V} (A_M)_{ij}},    (3)

where the last equivalence follows from

    \sum_{i, j \in S_k} (A_M)_{ij} + \sum_{i \in S_k, j \in \bar{S}_k} (A_M)_{ij} = \sum_{i \in S_k, j \in V} (A_M)_{ij}.

Instead of using partition sets, we define a discrete cluster assignment matrix S ∈ {0,1}^{N×K} where S_{ij} = 1 if v_i ∈ S_j and 0 otherwise. We denote by S_j = [S_{1j}, ..., S_{Nj}]^⊤ the j-th column of S, which indicates the nodes belonging to cluster S_j. Using this, we transform (3) into:

    \max_{S \in \{0,1\}^{N \times K}} \sum_{k=1}^{K} \frac{\sum_{i,j \in V} (A_M)_{ij} S_{ik} S_{jk}}{\sum_{i,j \in V} S_{ik} (A_M)_{ij}}
    \;\equiv\;
    \max_{S \in \{0,1\}^{N \times K}} \sum_{k=1}^{K} \frac{S_k^{\top} A_M S_k}{S_k^{\top} D_M S_k}
    \;\equiv\;
    \min_{S \in \{0,1\}^{N \times K}} -\mathrm{Tr}\!\left( \frac{S^{\top} A_M S}{S^{\top} D_M S} \right),    (4)

where the division sign in the last line is an element-wise division on the diagonal of both matrices. By definition, S is subject to the constraint S 1_K = 1_N, i.e. each node belongs to exactly one cluster. This optimisation problem is NP-hard since S takes discrete values. We thus relax it to a probabilistic framework, where S takes continuous values in the range [0, 1], representing cluster membership probabilities, i.e. each entry S_{ik} denotes the probability that node i belongs to cluster k. Referring to [43] and [4], solving this continuous relaxation of motif spectral clustering approximates a closed-form solution with theoretical guarantees, provided by the Cheeger inequality [8]. Compared to the original hard assignment problem, this soft cluster assignment formulation is less likely to be trapped in local minima [16]. It also generalises easily to multi-class assignment, expresses uncertainty in clustering, and can be optimised within any GNN.

4.2 End-to-end clustering framework
In this section, we leverage this probabilistic approximation of motif conductance to learn our cluster assignment matrix S in a trainable manner. Our method addresses the limitations of (motif) spectral clustering: we cluster nodes based both on graph topology and node features; leverage higher-order connectivity patterns; avoid the expensive eigen-decomposition of the motif Laplacian; and allow clustering of out-of-sample graphs.

We compute the soft cluster assignment matrix S using one (or more) fully connected layer(s), mapping each node's representation X_{i*} to its probabilistic cluster assignment vector S_{i*}. We apply a softmax activation function to enforce the constraint inherited from (4): S_{ij} ∈ [0, 1] and S 1_K = 1_N:

    S = FC(X; Θ).    (5)

Θ are trainable parameters, optimised by minimising the unsupervised loss function L_mc, which approximates the relaxed formulation of the motif conductance problem (4):

    \mathcal{L}_{mc} = -\frac{1}{K} \cdot \mathrm{Tr}\!\left( \frac{S^{\top} A_M S}{S^{\top} D_M S} \right).    (6)

Referring to the spectral clustering formulation¹, L_mc ∈ [−1, 0]. It reaches −1 when G_M has ≥ K connected components (no motif endpoints are separated by clustering), and 0 when, for each pair of nodes participating in the same motif (i.e. (A_M)_{ij} > 0), the cluster assignments are orthogonal: ⟨S_{i*}, S_{j*}⟩ = 0. L_mc is a non-convex function and its minimisation can lead to local minima, although our probabilistic membership formulation makes this less likely to happen than with hard membership [16].

In fact, we allow the combination of several motifs inside our objective function (6) via L_mc = Σ_j α_j L_mc_j, where L_mc_j denotes the objective function with respect to a particular motif (e.g., edge, triangle, 4-node cycle) and α_j is an importance factor. This also increases the power of our method, allowing us to find communities of nodes w.r.t. a hierarchy of higher-order substructures. As a result, the graph coarsening step will pool together more relevant groups of nodes, potentially capturing more relevant patterns in subsequent layers, ultimately producing richer graph representations. We implement it for edge and triangle motifs:

    \mathcal{L}_{mc} = -\frac{\alpha_1}{K} \cdot \mathrm{Tr}\!\left( \frac{S^{\top} A S}{S^{\top} D S} \right) - \frac{\alpha_2}{K} \cdot \mathrm{Tr}\!\left( \frac{S^{\top} A_M S}{S^{\top} D_M S} \right).    (7)

We let α_1, α_2 be dynamic functions of the epoch, subject to α_1 + α_2 = 1, allowing us to first optimise higher-order motifs before moving on to smaller ones. It helps refine the level of granularity progressively and was found desirable empirically. This is the higher-order clustering formulation that we consider in the paper.

In case we would like to enforce more rigorously the hard cluster assignment, characteristic of the original motif conductance formulation, we design an auxiliary loss function:

    \mathcal{L}_{o} = \frac{1}{\sqrt{K} - 1} \left( \sqrt{K} - \frac{1}{\sqrt{N}} \sum_{j=1}^{K} \lVert S_{*j} \rVert_F \right),    (8)

where ∥·∥_F indicates the Frobenius norm. This orthogonality loss encourages more balanced and discrete clusters (i.e. a node is assigned to one cluster with high probability and to the other clusters with low probability), further discouraging degenerate solutions. Although its effect overlaps with L_mc, it often smooths out the optimisation process and even slightly improves performance in complex tasks or networks, such as graph classification. In (8), we rescale L_o to [0, 1], making it commensurable with L_mc. As a result, the two terms can be safely summed and optimised together when specified. A parameter μ controls the strength of this regularisation.

¹The largest eigenvalue of A_M S = λ D_M S is 1 and the smallest is 0; we are summing only the K largest eigenvalues.
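To make Eqs. (5)–(8) concrete, here is a minimal PyTorch sketch of the assignment head and the two unsupervised losses for a single dense graph; the helper names and the fixed (α₁, α₂) values are illustrative choices of ours (the paper schedules them over epochs), not the authors' reference implementation.

```python
import torch

def soft_assignments(X, fc):                     # Eq. (5): S = softmax(FC(X)); rows sum to 1
    return torch.softmax(fc(X), dim=-1)

def motif_cut_loss(S, A):                        # one term of Eq. (6)/(7) for a given adjacency
    D = torch.diag(A.sum(dim=-1))
    num = torch.einsum('nk,nm,mk->k', S, A, S)   # diag(S^T A S)
    den = torch.einsum('nk,nm,mk->k', S, D, S)   # diag(S^T D S)
    return -(num / den.clamp_min(1e-9)).sum() / S.shape[-1]

def hosc_losses(X, A, A_M, fc, alpha1, alpha2, mu):
    S = soft_assignments(X, fc)
    L_mc = alpha1 * motif_cut_loss(S, A) + alpha2 * motif_cut_loss(S, A_M)      # Eq. (7)
    K, N = S.shape[-1], S.shape[-2]
    col_norms = S.norm(dim=-2)                   # ||S_{*j}|| for each cluster j
    L_o = (K ** 0.5 - col_norms.sum() / N ** 0.5) / (K ** 0.5 - 1)              # Eq. (8)
    return L_mc + mu * L_o, S
```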
[…] computing cluster assignments. The latter is realistic due to the homophily property of many real-world networks [26] as well as the smoothing effect of message passing layers [7], which render connected nodes more similar.

We conclude this section with a note for future work. An interesting research direction would be to extend this framework to 4-node motifs. Despite having managed to derive a theoretical formulation for the 4-node motif conductance problem in Appendix C, it becomes complex and would probably necessitate its own dedicated research, as it could be a promising extension.

4.3 Higher-order graph coarsening
The methodology detailed in the previous sections is a general clustering technique that can be used for any clustering task on any graph dataset. In this paper, we utilise it to form a pooling operator, called HoscPool, which exploits the cluster assignment matrix S to generate a coarsened version of the graph (with fewer nodes and edges) that preserves critical information and embeds higher-order connectivity patterns. More precisely, it coarsens the existing graph by creating super-nodes from the derived clusters, with a new edge set and feature vectors depending on the previous nodes belonging to each cluster. Mathematically,

    HoscPool: G = (X, A) → G^{pool} = (X^{pool}, A^{pool}),
    A^{pool} = S^{⊤} A S  and  X^{pool} = S^{⊤} X.

Each entry X^{pool}_{ij} denotes feature j's value for cluster i, calculated as a sum of feature j's values over the nodes belonging to cluster i, weighted by the corresponding cluster assignment scores. A^{pool} ∈ R^{K×K} is a symmetric matrix where A^{pool}_{ij} can be viewed as the connection strength between cluster i and cluster j. Given our optimisation function, it will be a diagonally dominant matrix, which would hamper the propagation across adjacent nodes. For this reason, we remove self-loops. We also symmetrically normalise the new adjacency matrix. Lastly, note that we use the original A and X for this graph coarsening step; their motif counterparts A_M and X_M are simply leveraged to compute the loss function. Our work thus differs clearly from diffusion methods and traditional GNNs leveraging higher-order structure.
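The coarsening step above, including the two post-processing choices (self-loop removal and symmetric normalisation), can be sketched as follows for dense tensors; this is our own illustrative rewrite of the two formulas, not the authors' code.

```python
import torch

def hosc_coarsen(X, A, S):
    """Coarsen (X, A) into (X_pool, A_pool) using soft assignments S of shape (N, K)."""
    X_pool = S.t() @ X                     # cluster features: weighted sums of member features
    A_pool = S.t() @ A @ S                 # cluster-to-cluster connection strengths
    A_pool.fill_diagonal_(0.0)             # remove self-loops (A_pool is diagonally dominant)
    deg = A_pool.sum(dim=-1)
    d_inv_sqrt = deg.clamp_min(1e-9).pow(-0.5)
    A_pool = d_inv_sqrt[:, None] * A_pool * d_inv_sqrt[None, :]   # symmetric normalisation
    return X_pool, A_pool
```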
Because our GNN-based implementation of motif spectral clustering is fully differentiable, we can stack several HoscPool layers, intertwined with message passing layers, to hierarchically coarsen the graph representation. In the end, a global pooling step and some dense layers produce a graph prediction. The parameters of each HoscPool layer can be learned end-to-end by jointly optimising:

    \mathcal{L} = \mathcal{L}_{mc} + \mu \mathcal{L}_{o} + \mathcal{L}_{s},    (9)

where L_s denotes any supervised loss for a particular downstream task (here the cross-entropy loss). This way, we should be able to hierarchically capture relevant higher-order graph structure while learning GNN parameters so as to ultimately better classify the graphs within our dataset.
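As an illustration of how such a stack could be wired for graph classification (message passing, HoscPool, message passing, HoscPool, global pooling, dense head, joint loss of Eq. (9)), here is a hedged sketch reusing the helper functions from the earlier sketches; the layer sizes and the plain dense GCN-style layer are assumptions of ours, not the authors' architecture.

```python
import torch
import torch.nn as nn

class DenseGCNLayer(nn.Module):
    """A simple dense GCN-style layer, ReLU(A X W); stands in for any MP layer."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)
    def forward(self, X, A):
        return torch.relu(A @ self.lin(X))

class HoscPoolBlock(nn.Module):
    """MP layer + assignment MLP; returns the coarsened graph and the unsupervised loss."""
    def __init__(self, d, K, alpha1=0.5, alpha2=0.5, mu=0.1):
        super().__init__()
        self.mp = DenseGCNLayer(d, d)
        self.assign = nn.Linear(d, K)
        self.alpha1, self.alpha2, self.mu = alpha1, alpha2, mu
    def forward(self, X, A):
        X = self.mp(X, A)
        A_M = triangle_motif_adjacency(A)            # from the earlier sketch
        loss, S = hosc_losses(X, A, A_M, self.assign,
                              self.alpha1, self.alpha2, self.mu)
        X_pool, A_pool = hosc_coarsen(X, A, S)       # from the earlier sketch
        return X_pool, A_pool, loss

class HoscPoolClassifier(nn.Module):
    def __init__(self, d_in, d_hid, n_classes, K1=32, K2=8):
        super().__init__()
        self.embed = DenseGCNLayer(d_in, d_hid)
        self.block1 = HoscPoolBlock(d_hid, K1)
        self.block2 = HoscPoolBlock(d_hid, K2)
        self.head = nn.Linear(d_hid, n_classes)
    def forward(self, X, A, y=None):
        X = self.embed(X, A)
        X, A, l1 = self.block1(X, A)
        X, A, l2 = self.block2(X, A)
        logits = self.head(X.mean(dim=-2))           # global mean pooling over clusters
        unsup = l1 + l2
        if y is None:
            return logits, unsup
        # Eq. (9): unsupervised terms plus the supervised cross-entropy loss.
        return logits, unsup + nn.functional.cross_entropy(logits.unsqueeze(0), y)
```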
[Figure 1: loss value w.r.t. epochs (0–60), with "objective" and "regularizer" curves; panels include MinCutPool and MinCutPool – degenerate solution.]
Figure 1: Loss function value w.r.t. epochs. MinCutPool optimises the orthogonality loss, which decreases smoothly, while its min-cut objective remains constant (acting like a regularizer); whereas HoscPool optimises the main objective directly. Sometimes, MinCutPool does not manage to optimise the regularizer loss, yielding a degenerate clustering.

4.4 Comparison with relevant baselines
Before moving to the experiments, we take a moment to emphasise the key differences with respect to core end-to-end clustering-based pooling baselines. We focus on MinCutPool in the following since it is our closest baseline; DiffPool and others differ more significantly, in addition to being less theoretically grounded and efficient. Firstly, MinCutPool focuses on first-order connectivity patterns, while we work on higher-order ones, which implies a more elaborate background theory with the construction and combination of several motif adjacency matrices (each specific to a particular motif). This should lead to capturing more advanced types of communities, ultimately producing a better coarsening of the graph. Secondly, we approximate a probabilistic version of the motif conductance problem (an extension of the normalised min-cut to motifs), whereas MinCutPool approximates the relaxed unnormalised min-cut problem.
Table 1: (Right) NMI obtained by clustering the nodes of various networks over 10 different runs. Best results are in bold, second
best underlined. The number of clusters 𝐾 is equal to the number of node classes. (Left) Dataset properties.
Dataset Nodes Edges Feat. 𝐾 SC MSC DiffPool MinCutPool HP-1 HP-2 HoscPool
Cora 2,708 5,429 1,433 7 0.150 ± 0.002 0.056 ± 0.014 0.308 ± 0.023 0.391 ± 0.028 0.435 ± 0.032 0.464 ± 0.036 0.502 ± 0.029
PubMed 19,717 88,651 500 3 0.183 ± 0.002 0.002 ± 0.000 0.098 ± 0.006 0.214 ± 0.066 0.230 ± 0.071 0.215 ± 0.073 0.260 ± 0.054
Photo 7,650 287,326 745 8 0.592 ± 0.008 0.451 ± 0.011 0.171 ± 0.004 0.086 ± 0.014 0.495 ± 0.068 0.513 ± 0.083 0.598 ± 0.101
PC 13,752 245,861 767 10 0.464 ± 0.002 0.166 ± 0.009 0.043 ± 0.008 0.026 ± 0.006 0.497 ± 0.040 0.499 ± 0.036 0.528 ± 0.041
CS 18,333 81,894 6,805 15 0.273 ± 0.006 0.011 ± 0.009 0.383 ± 0.048 0.431 ± 0.060 0.479 ± 0.022 0.701 ± 0.029 0.731 ± 0.018
Karate 34 156 10 2 0.792 ± 0.035 0.870 ± 0.031 0.715 ± 0.018 0.751 ± 0.090 0.792 ± 0.038 0.862 ± 0.046 0.894 ± 0.039
DBLP 17,716 105,734 1,639 4 0.027 ± 0.003 0.005 ± 0.006 0.186 ± 0.014 0.334 ± 0.026 0.326 ± 0.027 0.284 ± 0.026 0.312 ± 0.027
Polblogs 1,491 33,433 10 2 0.017 ± 0.000 0.014 ± 0.001 0.317 ± 0.010 0.440 ± 0.390 0.992 ± 0.003 0.994 ± 0.001 0.994 ± 0.005
Email-eu 1,005 32,770 10 42 0.485 ± 0.030 0.382 ± 0.019 0.096 ± 0.034 0.253 ± 0.028 0.317 ± 0.026 0.488 ± 0.025 0.476 ± 0.021
Syn1 1,000 6,243 10 3 0.000 ± 0.000 1.000 ± 0.000 0.035 ± 0.000 0.043 ± 0.008 0.041 ± 0.006 1.000 ± 0.000 1.000 ± 0.000
Syn2 1,000 5,496 10 2 0.003 ± 0.000 0.050 ± 0.003 0.081 ± 0.008 0.902 ± 0.028 0.942 ± 0.028 1.000 ± 0.000 1.000 ± 0.000
Syn3 500 48,205 10 5 1.000 ± 0.000 1.000 ± 0.000 0.067 ± 0.001 0.052 ± 0.002 0.115 ± 0.006 0.826 ± 0.005 1.000 ± 0.000
Despite claiming to formulate a relaxation of the normalised min-cut (a trace ratio), it actually minimises a ratio of traces in its objective function: −Tr(S^⊤ Ã S) / Tr(S^⊤ D̃ S). Since Tr(S^⊤ D̃ S) = Σ_{i∈V} D̃_{ii} is a constant, this yields the unnormalised min-cut −Tr(S^⊤ Ã S), which often produces degenerate solutions. To cope with this limitation, MinCutPool optimises in parallel a penalty term L_o encouraging balanced and discrete cluster assignments. But despite this regularizer, it often gets stuck in local minima [45] (see Fig. 1), as we will see empirically in Section 5. We spot and correct this weakness in HoscPool. Thirdly, we introduce a new and more powerful orthogonality term together with a regularisation control parameter; unlike in MinCutPool, it is not strictly necessary but often smooths out training and improves performance. Lastly, we showcase a different architecture involving a more general way of computing S.

5 EVALUATION
We now evaluate the benefits of the proposed method, with the goal of answering the following questions:
(1) Does our differentiable higher-order clustering algorithm compute meaningful clusters? Is considering higher-order structures beneficial?
(2) How does HoscPool compare with state-of-the-art pooling approaches for graph classification tasks?
(3) Why do existing pooling operators fail to significantly outperform random pooling?

5.1 Clustering
Experimental setup. For this experiment, we first run a Message Passing (MP) layer; in this case a GCN model with a skip connection for initial features [30]: X̄ = ReLU(A X Θ₁ + X Θ₂), where Θ₁ and Θ₂ are trainable weight matrices. It has 32 hidden units and a ReLU activation function. We then run a Multi-Layer Perceptron (MLP) with 32 hidden units to produce the cluster assignment matrix of dimension num_nodes × num_clusters, trained end-to-end by optimising the unsupervised loss function L_mc + μ L_o. This architecture is trained using a learning rate of 0.001 with an Adam optimizer, 500 epochs, a gradient clip of 2.0, an early-stopping patience of 200, a learning-rate decay patience of 25, and μ ∈ {0, 0.1, 1}.
Metrics. We evaluate the quality of S by comparing the distribution of true node labels with that of predicted labels, via Normalised Mutual Information NMI(ỹ, y) = (H(ỹ) − H(ỹ|y)) / √(H(ỹ) H(y)), where H(·) is the entropy and node cluster membership is determined by the argmax of its assignment probabilities. We also calculate completeness, modularity, normalised cut, and motif conductance (App. Table 7).
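A compact sketch of this evaluation pipeline (MP layer with skip connection, assignment MLP, unsupervised training, NMI against ground-truth labels) might look as follows; training details beyond those stated above are kept minimal, and the loss helper comes from the earlier sketch.

```python
import torch
import torch.nn as nn
from sklearn.metrics import normalized_mutual_info_score

class ClusterNet(nn.Module):
    """MP layer with a skip connection on the initial features, plus an assignment MLP."""
    def __init__(self, d_in, d_hid, K):
        super().__init__()
        self.theta1 = nn.Linear(d_in, d_hid, bias=False)
        self.theta2 = nn.Linear(d_in, d_hid, bias=False)
        self.mlp = nn.Sequential(nn.Linear(d_hid, d_hid), nn.ReLU(), nn.Linear(d_hid, K))
    def forward(self, X, A):
        return torch.relu(A @ self.theta1(X) + self.theta2(X))   # X̄ = ReLU(A X Θ1 + X Θ2)

def train_clustering(X, A, A_M, y_true, K, epochs=500, mu=0.1):
    model = ClusterNet(X.shape[-1], 32, K)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        X_bar = model(X, A)
        loss, S = hosc_losses(X_bar, A, A_M, model.mlp, 0.5, 0.5, mu)  # from the earlier sketch
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), 2.0)
        opt.step()
    y_pred = S.argmax(dim=-1).cpu().numpy()
    return normalized_mutual_info_score(y_true, y_pred)
```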
Datasets. We use a collection of node classification datasets with ground-truth community labels: citation networks Cora and PubMed; collaboration networks DBLP and Coauthor CS; co-purchase networks Amazon Photo and Amazon PC; the KarateClub community network; and communication networks Polblogs and Email-eu. They are all taken from PyTorch Geometric. We also construct three synthetic datasets, Syn1, Syn2, Syn3 (based on several random graphs), where node labels are determined based on higher-order community structure and node features are simple graph statistics (Appendix A). They are designed to show the additional efficiency of HoscPool when datasets have clear higher-order structure, which is not always the case for the standard baseline datasets chosen.
Baselines. We compare HoscPool with the original spectral clustering (SC), motif spectral clustering (MSC)², as well as the key pooling baselines DiffPool and MinCutPool. We refer to all methods by their pooling name for simplicity, although this experiment focuses on the clustering part and does not involve the coarsening step. We repeat all experiments 10 times and average results across runs. For the ablation study, let HP-1 and HP-2 denote HoscPool where L_mc in Eq. (7) has α₂ = 0 (first-order connectivity only) and α₁ = 0 (higher-order only), respectively.

Results are reported in Table 1. HoscPool achieves better performance than all baselines across most datasets. This trend is emphasised on the synthetic datasets, where we know higher-order structure is critical, proving the benefits of our clustering method. DiffPool often fails to converge to a good solution. MinCutPool, as evoked earlier and in [41], sometimes gets stuck in degenerate solutions (e.g., Amazon PC and Photo – all nodes are assigned to less than 10% of the clusters), failing completely to converge even when tuning model architecture and hyper-parameters (see Fig. 1). HP-1 shows superior performance and alleviates this issue, meaning that it can be considered as an improved version of MinCutPool.

²SC based on motif conductance [4] instead of edge conductance, i.e. SC applied to A_M.
Spectral Clustering (SC) performs really well on some datasets, poorly on others. MSC often performs badly, revealing its excessive dependence on the presence of motifs. On the contrary, our results highlight the robustness of HoscPool to the limited presence of motifs, thanks to its consideration of node features. Besides, HoscPool's consideration of finer granularity levels allows it to group nodes primarily based on motifs while still considering edges when necessary, which may be the reason for its superior performance with respect to HP-2, itself more desirable than HP-1 (edge-only). This ablation study supports the relevance of our underlying claims: incorporating higher-order information leads to better communities, and combining several motifs helps further. See Table 7 for more results.
Complexity. The main complexity of HoscPool lies in the derivation of A_M, which remains relatively fast for triangles: A_M = A² ⊙ A. In Table 2, we remark that HoscPool (and HP-2) has a running time comparable to MinCutPool on small or average-size datasets. It is slower to compute than MinCutPool on large datasets, while staying relatively affordable. This extra time lies in the computation and processing of the motif adjacency matrix as well as the combination of several connectivity orders, which grows with the graph size. Note however that we could avoid the computation of the regularisation loss, which both MinCutPool and DiffPool cannot afford. HP-1 is not reported as it shares similar running times with MinCutPool while reaching better performance.

Table 2: Running time (s) of the entire clustering experiment.

Dataset DiffPool MinCutPool HP-2 HoscPool
Cora 13 16 17 24
PubMed 80 95 264 501
Photo 23 48 91 182
PC 89 101 304 510
CS 157 251 683 1406
Karate 9 9 9 9
DBLP 126 210 635 1330
Polblogs 8 9 10 10
Email-eu 9 9 10 12

5.2 Graph classification
[…] of 2.0, 100 early stop patience, a learning decay patience of 50, and a regularisation parameter μ ∈ {0, 0.1}.
Baselines. We compare our method to representative state-of-the-art graph classification baselines, involving the pooling operators DiffPool [49], MinCutPool [5], EigPool [25], SAGPool [19], ASAP [33], and GMT [3], by replacing the pooling layer in the above pipeline. We implement a random pooling operator (Random) to assess the benefits of pooling similar nodes together, and a model with a single global pooling operator (NoPool) to assess how useful leveraging hierarchical information is.
Datasets. We use several common benchmark datasets for graph classification, taken from TUDataset [29], including three bioinformatics protein datasets Proteins, Enzymes, D&D; one mutagen dataset Mutagenicity; one anticancer activity dataset NCI1; two chemical compound datasets Cox2-MD and ER-MD; and one social network Reddit-Binary. Bench-hard is taken from a source where X and A are completely uninformative if considered alone. We split them into a training set (80%), validation set (10%), and test set (10%). We adopt the accuracy metric to measure performance and average the results over 10 runs, each with a different split. We select the best model using validation set accuracy, and report the corresponding test set accuracy. For featureless graphs, we use constant features. Model hyperparameters are tuned for each dataset, but are kept fixed across all baselines. Lastly, despite being used by all baselines, note that these datasets are known to be small and noisy, leading to large errors.

Results are reported in Table 3, from which we draw the following conclusions. Performing pooling proves useful (NoPool) in most cases. HoscPool compares favourably with pooling baselines on all datasets. Higher-order connectivity patterns are more desirable than first-order ones, and combining both is even better. This confirms the findings from Section 5.1 and shows that better clustering (i.e. graph coarsening) is correlated with better classification performance. However, while the clustering performance of HoscPool is significantly better than the baselines, the performance gap has slightly closed on this task. Even more surprising, the benefits of existing advanced node-grouping or node-dropping methods are not considerable with respect to the Random pooling baseline. Faithfully to what we announced in Section 1, we attempt to provide explanations.
Table 3: Graph classification accuracy. Top results are in bold, second best underlined.
Dataset NoPool Random GMT MinCutPool DiffPool EigPool SAGPool ASAP HP-1 HP-2 HoscPool
Proteins 71.6±4.1 75.7±3.2 75.0±4.2 75.9±2.4 73.8±3.7 74.2±3.1 70.6±3.5 74.4±2.6 76.7±2.5 77.0±3.1 77.5±2.3
NCI1 77.1±1.9 77.0±1.7 74.9±4.3 76.8±1.6 76.7±2.1 75.0±2.2 74.1±3.9 74.3±1.6 77.3±1.6 80.3±2.0 79.9±1.7
Mutagen. 78.1±1.3 79.2±1.3 79.4±2.2 78.6±1.8 77.9±2.3 75.2±2.7 74.4±2.7 76.8±2.4 79.8±1.6 81.7±2.1 82.3±1.3
DD 71.2±2.2 77.1±1.5 78.1±3.2 78.4±2.8 76.3±2.1 75.1±1.8 71.5±4.1 73.2±2.5 78.8±2.0 78.2±2.1 79.4±1.8
Reddit-B 80.1±2.6 89.3±2.6 86.7±2.6 89.0±1.4 87.3±2.4 82.8±2.1 74.7±4.5 84.1±1.1 91.2±1.0 92.8±1.5 93.6±0.9
Cox2-MD 58.7±3.2 62.9±3.6 58.9±3.6 58.9±5.1 57.1±4.8 59.8±3.4 56.9±9.7 60.5±5.5 61.6±3.5 66.4±4.6 64.6±3.9
ER-MD 72.2±2.9 73.0±4.5 74.3±4.5 75.5±4.0 76.8±4.8 73.1±3.8 71.7±8.2 74.5±5.9 76.2±4.2 77.9±4.3 78.2±3.8
b-hard 66.5±0.5 69.1±2.1 70.1±3.4 72.6±1.5 70.7±2.0 69.1±3.1 39.6±9.6 70.5±1.7 72.4±0.8 73.5±0.8 74.0±0.4
Table 4: (Left) Simple graph statistics. (Middle) The clustering coefficient (cc), proportion of triangles attached per node (triangle), transitivity (transi), homophily (homo), and proportion of node labels in a graph w.r.t. all graphs (diff-labels) are computed on each graph individually and averaged over the whole dataset. (Right) msc, sc and sc-mod denote the motif conductance, normalised cut, and modularity obtained by clustering each graph using traditional deterministic spectral clustering, where the number of clusters is equal to the number of labels in a graph. The last column refers to the NMI obtained through HoscPool clustering only. All metrics provide information on graph community structure. Reddit-Binary has no node labels and is treated differently.
Datasets # graphs # edges av. # nodes labels cc triangle transi homo diff-labels msc sc sc-mod NMI
Proteins 1,113 162,088 39 3 .575 1.03 .517 .476 .833 .034 .005 .460 .46
NCI1 4,110 132,753 29 37 .125 .125 .214 .667 .054 .111 0.0 .388 .71
DD 1,178 843,046 284 89 .496 2.0 .462 .058 .219 .021 .013 .402 .38
Mutagenicity 4,337 133,447 30 14 .002 .003 .002 .376 .244 .056 0.0 .378 .85
Reddit-Binary 2,000 995,508 429 no .051 .069 .009 - - .008 .011 .071 -
COX2-MD 303 203,084 26.2 7 1.00 103 1.00 .707 .482 .302 .333 .01 .45
ER-MD 446 209482 21.1 10 1.00 77.4 1.00 .701 .232 .331 .323 .01 .56
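For reference, the per-graph statistics in the middle block of Table 4 can be computed with standard NetworkX routines along the following lines; the exact homophily definition used in the paper is not spelled out, so the edge-homophily variant below is an assumption on our part.

```python
import networkx as nx

def table4_stats(G: nx.Graph, labels: dict) -> dict:
    """Rough per-graph statistics in the spirit of Table 4 (middle block)."""
    n = G.number_of_nodes()
    tri = nx.triangles(G)                              # triangles attached to each node
    same = sum(labels[u] == labels[v] for u, v in G.edges())
    return {
        "cc": nx.average_clustering(G),                # clustering coefficient
        "triangle": sum(tri.values()) / max(n, 1),     # triangles attached per node
        "transi": nx.transitivity(G),                  # transitivity
        "homo": same / max(G.number_of_edges(), 1),    # edge homophily (assumed definition)
    }

# Example on Zachary's karate club, using its two factions as labels.
G = nx.karate_club_graph()
labels = {v: G.nodes[v]["club"] for v in G}
print(table4_stats(G, labels))
```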
Table 5: Ablation study of HoscPool, denoted as Base. GIN, SAGE, GAT change the core GNN model; No-diag does not zero out the diagonal of the coarsened adjacency matrix S⊤AS in the pooling step; 1-pooling uses an architecture with only one HoscPool block; skip-co adds a skip connection between every GNN layer and the dense layer; c-ratio involves a higher clustering ratio; and no-adapt refers to the discussed dynamic adaptive loss. For dense-feat, we simply added some graph statistics to boost node identifiability.
[…] study in Table 5). However, despite clear progress – we learn to decently optimise S, to assign nodes to more clusters, and to better balance the number of nodes per cluster – there still seems to be room for improvement. We thus look for other potential causes which could prevent proper learning, targeting especially the graph classification model architecture and the nature of the selected datasets.

Concerning model architecture, we show in Appendix B that using more complex clustering frameworks (2-layer clustering: GNN – Pooling – GNN – Pooling) totally prevents the learning of meaningful clusters for MinCutPool (and DiffPool), which illustrates
a feature oversmoothing issue. HoscPool, on the other hand, fixes this issue and still manages to learn meaningful clusters. Nevertheless, the learning process becomes longer and more difficult, leading to a drop in performance. In addition to showing the robustness of HoscPool with respect to existing pooling baselines, this experiment reveals that the clustering performed in graph classification tasks may not lead to meaningful clusters because of the more complex framework. Although it is likely to contribute, it is probably one factor among others, since simpler GC models like GNN – Pooling – GNN – Global Pooling – Dense (1-pooling in Table 5) do not improve things.

We therefore also look for answers from a dataset perspective. In Table 4, the computed graph properties and clustering results on individual graphs suggest that graphs are relatively small, with few node types co-existing in a same graph, weak homophily, and a relatively poor community structure for clustering algorithms to exploit. Besides, because most datasets do not have dense node features (only labels), the node identifiability assumption is shaken and does not enable our MLP (5) to fully distinguish between same-label nodes, thus making it impossible to place them in distinct clusters. On top of that, we now need to learn a clustering pattern that extends to all graphs, which is a much more complex task (compared to a single graph in Section 5.1).

As a result, taking into consideration the multiple pooling layers, the joint optimisation with a supervised loss, the poor individual graph community structure, and the complexity of learning to cluster all graphs with few features, learning meaningful clusters becomes extremely challenging. This would explain the optimisation difficulties encountered by existing pooling operators so far. Although HoscPool makes a step towards better pooling, we advise future research to explore more appropriate datasets than TUDataset [29], such as Open Graph Benchmark (OGB) datasets, even though TUDataset is used as a benchmark by all pooling baselines. We also recommend designing simpler node-grouping approaches, using higher-order information so as to capture more relevant communities even with complex model architectures, and exploiting graph structure information more directly (as the targeted graphs do not have dense node features). Finally, the heterophilous nature of these datasets (Table 4) calls into question the true benefit of grouping together nodes with similar embeddings (homophily assumption) when coarsening the graph.

6 CONCLUSION
[…] and efficient pooling operators, ensuring significant improvement over the random baseline for graph classification tasks.
Acknowledgements. Supported in part by ANR (French National Research Agency) under the JCJC project GraphIA (ANR-20-CE23-0009-01).

A SYNTHETIC DATASETS
(1) Syn1 is made of k communities, each densely intra-connected by triangles. We then widely link these communities without creating new triangles through these new links. We create random Gaussian features (including one correlated to node labels) since our method is dependent on node features.
(2) Syn2 is an Erdős–Rényi random graph with 1,000 nodes and p = 0.012. Each node receives label 0 if it does not belong to a triangle and label 1 otherwise. Node features include several graph statistics (see the sketch after this list).
(3) Syn3 is designed using a Gaussian random partition graph with k partitions whose sizes are drawn from a normal distribution. Nodes within the same partition are connected with probability p = 0.8, while nodes across partitions are connected with probability 0.2. Here, only random features are used.
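A minimal sketch of how a dataset in the spirit of Syn2 could be generated (Erdős–Rényi graph, binary label marking membership in at least one triangle, simple graph statistics as features); the specific statistics used as features below are our choice, since the paper only says "several graph statistics".

```python
import networkx as nx
import numpy as np

def make_syn2(n=1000, p=0.012, seed=0):
    G = nx.erdos_renyi_graph(n, p, seed=seed)
    tri = nx.triangles(G)                                   # triangles per node
    y = np.array([1 if tri[v] > 0 else 0 for v in G])       # label 1 iff the node is in a triangle
    feats = np.column_stack([
        [G.degree(v) for v in G],                           # degree
        [nx.clustering(G, v) for v in G],                   # local clustering coefficient
        np.random.default_rng(seed).normal(size=n),         # noise dimension
    ])
    return G, feats.astype(np.float32), y

G, X, y = make_syn2()
print(X.shape, y.mean())   # fraction of nodes touching a triangle
```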
B 2-LAYER CLUSTERING: PRECISIONS
In this experiment, we complexify the clustering framework (MP – MLP), making it more similar to its use as a pooling operator inside supervised graph classification tasks. More precisely, we follow the architecture MP – Pooling – MP – Pooling. As before, the pooling step combines an MLP computing the first cluster assignment matrix S₁ with a graph coarsening step. In the end, we provide a unique cluster assignment matrix S of dimension N × K, composed of the two matrices derived above (S₁ and S₂), such that the probability that node i belongs to cluster k is written S_{ik} = Σ_j (S₁)_{ij} (S₂)_{jk}.

The results, given in Table 6, are obtained using 1,000 epochs with early_stop_patience = 500, i.e. many more epochs than for the standard 1-layer clustering. This is because the convergence to a desirable solution is weaker. Furthermore, the obtained solution yields a less desirable clustering. Overall, this observation is very important as it suggests that the clustering obtained in supervised graph classification tasks might not be as accurate as what our original evaluation on real-world datasets with ground-truth community structure suggested.
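The composition of the two assignment matrices described above reduces to a single matrix product; a small sketch (the shapes are only an illustrative assumption):

```python
import torch

N, K1, K2 = 100, 20, 5
S1 = torch.softmax(torch.randn(N, K1), dim=-1)    # first-level assignments (N x K1)
S2 = torch.softmax(torch.randn(K1, K2), dim=-1)   # second-level assignments (K1 x K2)

S = S1 @ S2                                       # S_ik = sum_j (S1)_ij (S2)_jk
assert torch.allclose(S.sum(dim=-1), torch.ones(N))   # rows still sum to 1
```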
C EXTENSION TO 4-NODE MOTIFS
Here, we consider motifs composed of 4 nodes (|M| = 4), such as the 4-cycle or K₄, with an instance written as v = {l, q, r, k}. In Section 4.1, we formulated a relation between triangle normalised cut and graph normalised cut, in order to compute the triangle normalised cut easily. Here, we do the same, but for 4-node-motif conductance. Again, we derive this relation by looking at a single cluster S with corresponding cluster assignment vector y, where y_i = 1 if i ∈ S and 0 otherwise.

    3\,\mathrm{cut}_M^{(G)}(S, \bar{S}) = 3 \sum_{v \in \mathcal{M}} 1\{\exists\, i, j \in v \mid i \in S, j \in \bar{S}\}
    = \sum_{v \in \mathcal{M}} \Big[ 3(y_l + y_q + y_r + y_k) - 2(y_l y_q + y_l y_r + y_l y_k + y_q y_r + y_q y_k + y_r y_k) - 1\{\text{exactly 2 of } l, q, r, k \text{ are in } S\} \Big].

The bracketed expression without the last indicator equals, for each instance v: 0 if all of y_l, y_q, y_r, y_k are the same; 3 if exactly three are the same; and 4 if exactly two are the same. Thus,

    3\,\mathrm{cut}_M^{(G)}(S, \bar{S}) + \sum_{v \in \mathcal{M}} 1\{\text{exactly 2 of } l, q, r, k \text{ are in } S\}
    = \sum_{v \in \mathcal{M}} \Big[ 3(y_l + y_q + y_r + y_k) - 2(y_l y_q + y_l y_r + y_l y_k + y_q y_r + y_q y_k + y_r y_k) \Big]
    = y^{\top} D_M y - y^{\top} A_M y
    = y^{\top} L_M y
    = \mathrm{cut}^{(G_M)}(S, \bar{S}),

where the second equality holds because

    y^{\top} D_M y = \sum_{i \in S} \sum_{j \in V} (A_M)_{ij} = \mathrm{vol}^{(G_M)}(S) = (|M| - 1)\,\mathrm{vol}_M^{(G)}(S) = 3\,\mathrm{vol}_M^{(G)}(S) = 3 \sum_{v \in \mathcal{M}} (y_l + y_q + y_r + y_k),

    y^{\top} A_M y = \sum_{i \in V} \sum_{j \in V} y_i y_j (A_M)_{ij} = \sum_{i, j \in S} (A_M)_{ij} = \sum_{i, j \in S} \sum_{v \in \mathcal{M}} 1\{i, j \in v\} = \sum_{v \in \mathcal{M}} 2(y_l y_q + y_l y_r + y_l y_k + y_q y_r + y_q y_k + y_r y_k).

Overall, we obtain the following equality:

    \mathrm{cut}_M^{(G)}(S, \bar{S}) = \frac{1}{3}\,\mathrm{cut}^{(G_M)}(S, \bar{S}) - \frac{1}{3} \sum_{v \in \mathcal{M}} 1\{\text{exactly 2 of } l, q, r, k \in S\}.

The optimisation problem can then be written as:

    \min_{S} \sum_{k} \frac{\mathrm{cut}_M^{(G)}(S_k, \bar{S}_k)}{\mathrm{vol}_M^{(G)}(S_k)}
    \;\equiv\; \min_{S} \sum_{k} \frac{\frac{1}{3}\,\mathrm{cut}^{(G_M)}(S_k, \bar{S}_k) - \frac{1}{3} \sum_{v \in \mathcal{M}} 1\{\text{exactly 2 nodes of } v \in S_k\}}{\frac{1}{3}\,\mathrm{vol}^{(G_M)}(S_k)}
    \;\equiv\; \min_{S \in [0,1]^{N \times K}} -\mathrm{Tr}\!\left( \frac{S^{\top} A_M S}{S^{\top} D_M S} \right) - \sum_{k} \frac{\sum_{v \in \mathcal{M}} 1\{\text{exactly 2 nodes of } v \in S_k\}}{\mathrm{vol}^{(G_M)}(S_k)}.

In practice however, unlike the triangle normalised cut, this expression is not easy to compute. First of all, computing the related motif adjacency matrix is difficult; it cannot be written as a simple matrix product. Secondly, there is the additional term on the right-hand side to take into consideration. And although we might be able to compute both directly via a complex algorithm, it is not guaranteed that solving this problem is quicker than the original optimisation problem (using the definitions of vol_M^{(G)} and cut_M^{(G)} directly).
Table 7: Modularity (Mod), Conductance (Cond), Motif Conductance (M.Cond), and Homogeneity (Homog) obtained by clustering the nodes of various networks over 10 different runs, reported for MinCutPool, HP-2, and HoscPool (from left to right). The number of clusters K is equal to the number of node classes. HP-2 optimises the motif conductance metric better than MinCutPool. HoscPool achieves a similar motif conductance but a better conductance than HP-2, which it also often outperforms in terms of modularity. Finally, MinCutPool does reach degenerate solutions for several datasets (e.g., PC, Photo, CS, Email-eu).

Dataset MinCutPool: Mod Cond M.Cond Homog HP-2: Mod Cond M.Cond Homog HoscPool: Mod Cond M.Cond Homog
Cora 0.700 0.156 0.094 0.464 0.621 0.125 0.025 0.338 0.654 0.091 0.026 0.314
PubMed 0.532 0.120 0.047 0.225 0.478 0.069 0.029 0.101 0.454 0.082 0.038 0.096
CS −0.005 0.001 0.000 0.000 0.684 0.141 0.087 0.637 0.695 0.131 0.084 0.638
Photo 0.000 0.008 0.002 0.002 0.566 0.084 0.033 0.470 0.684 0.093 0.043 0.580
PC −0.001 0.000 0.000 0.000 0.546 0.285 0.263 0.457 0.591 0.149 0.082 0.556
DBLP 0.533 0.182 0.157 0.363 0.588 0.131 0.065 0.277 0.608 0.114 0.066 0.318
Karate 0.370 0.269 0.281 0.543 0.389 0.192 0.088 0.715 0.417 0.217 0.133 0.861
Email-eu 0.002 0.011 0.003 0.025 0.189 0.455 0.382 0.166 0.185 0.488 0.396 0.208
Polblogs 0.409 0.090 0.048 0.991 0.409 0.087 0.035 0.993 0.429 0.073 0.035 0.991
REFERENCES
[1] Ghadeer AbuOda, Gianmarco De Francisci Morales, and Ashraf Aboulnaga. 2019. Link prediction via higher-order motif features. arXiv preprint arXiv:1902.06679 (2019).
[2] James Atwood and Don Towsley. 2016. Diffusion-convolutional neural networks. In Advances in Neural Information Processing Systems. 1993–2001.
[3] Jinheon Baek, Minki Kang, and Sung Ju Hwang. 2021. Accurate Learning of Graph Representations with Graph Multiset Pooling. arXiv preprint arXiv:2102.11533 (2021).
[4] Austin R Benson, David F Gleich, and Jure Leskovec. 2016. Higher-order organization of complex networks. Science 353, 6295 (2016), 163–166.
[5] Filippo Maria Bianchi, Daniele Grattarola, and Cesare Alippi. 2020. Spectral clustering with graph neural networks for graph pooling. In International Conference on Machine Learning. PMLR, 874–883.
[6] Aldo G Carranza, Ryan A Rossi, Anup Rao, and Eunyee Koh. 2020. Higher-order clustering in complex heterogeneous networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 25–35.
[7] Deli Chen, Yankai Lin, Wei Li, Peng Li, Jie Zhou, and Xu Sun. 2020. Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 3438–3445.
[8] Fan Chung. 2007. Four proofs for the Cheeger inequality and graph partition algorithms. In Proceedings of ICCM, Vol. 2. Citeseer, 378.
[9] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. Advances in Neural Information Processing Systems 29 (2016), 3844–3852.
[10] Frederik Diehl. 2019. Edge contraction pooling for graph neural networks. arXiv preprint arXiv:1905.10990 (2019).
[11] Dhivya Eswaran, Srijan Kumar, and Christos Faloutsos. 2020. Higher-order label homogeneity and spreading in graphs. In Proceedings of The Web Conference 2020. 2493–2499.
[12] Matthias Fey, Jan Eric Lenssen, Frank Weichert, and Heinrich Müller. 2018. SplineCNN: Fast geometric deep learning with continuous B-spline kernels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 869–877.
[13] Hongyang Gao and Shuiwang Ji. 2019. Graph U-Nets. In International Conference on Machine Learning. PMLR, 2083–2092.
[14] William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems. 1025–1035.
[15] Lun Hu, Jun Zhang, Xiangyu Pan, Hong Yan, and Zhu-Hong You. 2021. HiSCF: leveraging higher-order structures for clustering analysis in biological networks. Bioinformatics 37, 4 (2021), 542–550.
[16] Rong Jin, Feng Kang, and Chris Ding. 2005. A probabilistic approach for optimizing spectral clustering. Advances in Neural Information Processing Systems 18 (2005), 571–578.
[17] Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
[18] Christine Klymko, David Gleich, and Tamara G Kolda. 2014. Using triangles to improve community detection in directed networks. arXiv preprint arXiv:1404.5874 (2014).
[19] Junhyun Lee, Inyeop Lee, and Jaewoo Kang. 2019. Self-attention graph pooling. In International Conference on Machine Learning. PMLR, 3734–3743.
[20] John Boaz Lee, Ryan A Rossi, Xiangnan Kong, Sungchul Kim, Eunyee Koh, and Anup Rao. 2018. Higher-order graph convolutional networks. arXiv preprint arXiv:1809.07697 (2018).
[21] Jure Leskovec, Kevin J Lang, Anirban Dasgupta, and Michael W Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6, 1 (2009), 29–123.
[22] Jianxin Li, Hao Peng, Yuwei Cao, Yingtong Dou, Hekai Zhang, Philip Yu, and Lifang He. 2021. Higher-order attribute-enhancing heterogeneous graph neural networks. IEEE Transactions on Knowledge and Data Engineering (2021).
[23] Ning Liu, Songlei Jian, Dongsheng Li, Yiming Zhang, Zhiquan Lai, and Hongzuo Xu. 2021. Hierarchical Adaptive Pooling by Capturing High-order Dependency for Graph Representation Learning. IEEE Transactions on Knowledge and Data Engineering (2021).
[24] Enxhell Luzhnica, Ben Day, and Pietro Lio. 2019. Clique pooling for graph classification. arXiv preprint arXiv:1904.00374 (2019).
[25] Yao Ma, Suhang Wang, Charu C Aggarwal, and Jiliang Tang. 2019. Graph convolutional networks with EigenPooling. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 723–731.
[26] Miller McPherson, Lynn Smith-Lovin, and James M Cook. 2001. Birds of a feather: Homophily in social networks. Annual Review of Sociology 27, 1 (2001), 415–444.
[27] Diego Mesquita, Amauri H Souza, and Samuel Kaski. 2020. Rethinking pooling in graph neural networks. arXiv preprint arXiv:2010.11418 (2020).
[28] Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. 2002. Network motifs: simple building blocks of complex networks. Science 298, 5594 (2002), 824–827.
[29] Christopher Morris, Nils M Kriege, Franka Bause, Kristian Kersting, Petra Mutzel, and Marion Neumann. 2020. TUDataset: A collection of benchmark datasets for learning with graphs. arXiv preprint arXiv:2007.08663 (2020).
[30] Christopher Morris, Martin Ritzert, Matthias Fey, William L Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. 2019. Weisfeiler and Leman go neural: Higher-order graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 4602–4609.
[31] Yunsheng Pang, Yunxiang Zhao, and Dongsheng Li. 2021. Graph pooling via coarsened graph infomax. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2177–2181.
[32] Alan Perotti, Paolo Bajardi, Francesco Bonchi, and André Panisson. 2022. GRAPHSHAP: Motif-based Explanations for Black-box Graph Classifiers. arXiv preprint arXiv:2202.08815 (2022).
[33] Ekagra Ranjan, Soumya Sanyal, and Partha Talukdar. 2020. ASAP: Adaptive structure aware pooling for learning hierarchical graph representations. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 5470–5477.
[34] Ryan A Rossi, Anup Rao, Sungchul Kim, Eunyee Koh, Nesreen K Ahmed, and Gang Wu. 2019. Higher-order ranking and link prediction: From closing triangles to closing higher-order motifs. arXiv preprint arXiv:1906.05059 (2019).
[35] Satu Elisa Schaeffer. 2007. Graph clustering. Computer Science Review 1, 1 (2007), 27–64.
[36] Thomas Schnake, Oliver Eberle, Jonas Lederer, Shinichi Nakajima, Kristof T Schütt, Klaus-Robert Müller, and Grégoire Montavon. 2020. Higher-order explanations of graph neural networks via relevant walks. arXiv preprint arXiv:2006.03589 (2020).
[37] Govind Sharma, Aditya Challa, Paarth Gupta, and M Narasimha Murty. 2021. Higher-Order Relations Skew Link Prediction in Graphs. arXiv preprint arXiv:2111.00271 (2021).
[38] Jianbo Shi and Jitendra Malik. 2000. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 8 (2000), 888–905.
[39] Martin Simonovsky and Nikos Komodakis. 2017. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3693–3702.
[40] Konstantinos Sotiropoulos and Charalampos E Tsourakakis. 2021. Triangle-aware Spectral Sparsifiers and Community Detection. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 1501–1509.
[41] Anton Tsitsulin, John Palowitch, Bryan Perozzi, and Emmanuel Müller. 2020. Graph clustering with graph neural networks. arXiv preprint arXiv:2006.16904 (2020).
[42] Charalampos E Tsourakakis, Jakub Pachocki, and Michael Mitzenmacher. 2017. Scalable motif-aware graph clustering. In Proceedings of the 26th International Conference on World Wide Web. 1451–1460.
[43] Ulrike Von Luxburg. 2007. A tutorial on spectral clustering. Statistics and Computing 17, 4 (2007), 395–416.
[44] Dorothea Wagner and Frank Wagner. 1993. Between min cut and graph bisection. In International Symposium on Mathematical Foundations of Computer Science. Springer, 744–750.
[45] Huan Wang, Shuicheng Yan, Dong Xu, Xiaoou Tang, and Thomas Huang. 2007. Trace ratio vs. ratio trace for dimensionality reduction. In 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1–8.
[46] Yu Guang Wang, Ming Li, Zheng Ma, Guido Montufar, Xiaosheng Zhuang, and Yanan Fan. 2020. Haar graph pooling. In International Conference on Machine Learning. PMLR, 9952–9962.
[47] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2018. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826 (2018).
[48] Yuhua Xu, Junli Wang, Mingjian Guang, Chungang Yan, and Changjun Jiang. 2022. Multistructure Graph Classification Method With Attention-Based Pooling. IEEE Transactions on Computational Social Systems (2022).
[49] Rex Ying, Jiaxuan You, Christopher Morris, Xiang Ren, William L Hamilton, and Jure Leskovec. 2018. Hierarchical graph representation learning with differentiable pooling. arXiv preprint arXiv:1806.08804 (2018).
[50] Hualei Yu, Jinliang Yuan, Hao Cheng, Meng Cao, and Chongjun Wang. 2021. GSAPool: Gated Structure Aware Pooling for Graph Representation Learning. In 2021 International Joint Conference on Neural Networks (IJCNN). 1–8. https://doi.org/10.1109/IJCNN52387.2021.9534320
[51] Hao Yuan and Shuiwang Ji. 2020. StructPool: Structured graph pooling via conditional random fields. In Proceedings of the 8th International Conference on Learning Representations.
[52] Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen. 2018. An end-to-end deep learning architecture for graph classification. In Thirty-Second AAAI Conference on Artificial Intelligence.
[53] Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. 2020. Graph neural networks: A review of methods and applications. AI Open 1 (2020), 57–81.