
Graph Pooling

Graph pooling methods aim to downsample graph data to obtain a representation of the whole graph. There are two main approaches: clustering-based and sorting-based. Clustering methods like DiffPool and MinCutPool cluster nodes into subgraphs, while sorting methods like TopK Pool and SAG Pool rank nodes and keep the top-ranked ones. DiffPool uses a learnable matrix to assign nodes to clusters and auxiliary losses to train the pooling layer. MinCutPool formulates clustering as a normalized min cut problem. TopK Pool selects top nodes based on projection scores, while SAG Pool uses self-attention to rank nodes based on topology. Performance evaluations on graph classification tasks show clustering methods generally outperform sorting methods, at the cost of higher complexity.


Cao Yu
Sep. 3, 2020
What is graph pooling?
Graph neural networks (GNNs) are widely used to propagate messages between the nodes of a graph, producing topology-aware node representations.

In some settings (e.g. graph classification), however, we need a representation of the whole graph (or of a higher-level structure) rather than of each raw node and edge. The graph therefore has to be downsampled step by step into a smaller-scale representation. This is called graph pooling, by analogy with pooling in images.
Two typical kinds of approaches
1) Clustering-based methods: cluster the current graph into several subgraphs at each pooling step.

Representative methods: DiffPool (Stanford University, NIPS 2018), MinCutPool (Politecnico di Milano, ICML 2020)

2) Sorting-based methods: rank the nodes and keep only a subset of them at each pooling step.

Representative methods: Top-k Pool (Texas A&M University, ICML 2019), SAGPool (Korea University, ICML 2019)
Hierarchical Graph Representation Learning with
Differentiable Pooling (DiffPool)

Rex Ying, Jiaxuan You, Christopher Morris, et al.

NIPS 2018
Main idea
A learnable assignment matrix is used in each layer to map the representation of all current nodes and edges to a smaller one. In other words, it coarsens the graph into a new graph with fewer nodes than the current one.

GNN layers, which aggregate information from nodes and edges, are stacked together with the pooling layers so that each node can obtain information about the whole topology and the other nodes.
DiffPool Pooling Layer
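A graph can be represented by $G=(A,X)$, where $A \in \mathbb{R}^{n \times n}$ is the adjacency matrix and $X \in \mathbb{R}^{n \times d}$ is the node feature matrix.

A GNN takes $A$ and $X$ as input; a Graph Convolutional Network (GCN) layer can be written as $Z = \mathrm{GNN}(A, X) = \mathrm{ReLU}(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X W)$, where $\tilde{A} = A + I$, $\tilde{D}$ is the degree matrix of $\tilde{A}$, and $W$ is a trainable weight matrix.

At layer $l$, the assignment matrix is $S^{(l)} \in \mathbb{R}^{n_l \times n_{l+1}}$, in which $n_l$ and $n_{l+1}$ are the numbers of nodes (clusters) in layer $l$ and layer $l+1$. The feature matrix $Z^{(l)}$ produced by the GNN and the adjacency matrix $A^{(l)}$ are then coarsened by

$X^{(l+1)} = S^{(l)\top} Z^{(l)}, \qquad A^{(l+1)} = S^{(l)\top} A^{(l)} S^{(l)}$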
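The coarsening step can be sketched in NumPy as follows, assuming the standard DiffPool update in which the new features are S^T Z and the new adjacency matrix is S^T A S. The function names, the random graph, and the random logits used in place of the pooling GNN are illustrative assumptions, not part of the slides.

```python
import numpy as np

def softmax_rows(x):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def diffpool_coarsen(A, Z, S):
    # S has shape (n_l, n_next): row i holds the soft cluster memberships of node i.
    X_next = S.T @ Z      # (n_next, d) cluster features
    A_next = S.T @ A @ S  # (n_next, n_next) cluster connectivity
    return A_next, X_next

rng = np.random.default_rng(0)
n_l, n_next, d = 6, 2, 4

# Random undirected graph and random node embeddings, for illustration only.
upper = np.triu(rng.integers(0, 2, size=(n_l, n_l)), k=1)
A = (upper + upper.T).astype(float)
Z = rng.normal(size=(n_l, d))

# In DiffPool, S comes from a second GNN; random logits stand in for it here.
S = softmax_rows(rng.normal(size=(n_l, n_next)))

A_next, X_next = diffpool_coarsen(A, Z, S)
print(A_next.shape, X_next.shape)
```

Because every row of S sums to 1, the total edge weight is preserved: the entries of the coarsened adjacency matrix sum to the same value as the entries of A.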
To generate the assignment matrix, a second GNN is applied to the feature matrix and adjacency matrix, followed by a row-wise softmax: $S^{(l)} = \mathrm{softmax}(\mathrm{GNN}_{\mathrm{pool}}(A^{(l)}, X^{(l)}))$.

To better train the pooling layer, an auxiliary link prediction objective is added, encoding the intuition that nearby nodes should be pooled together. It is measured with the Frobenius norm: $L_{LP} = \| A^{(l)} - S^{(l)} S^{(l)\top} \|_F$.

Another auxiliary loss is the entropy of the cluster assignments, $L_E = \frac{1}{n} \sum_{i=1}^{n} H(S_i)$, where $S_i$ is the $i$-th row of $S^{(l)}$ and $H$ is the entropy. Lower entropy means each cluster is more clearly defined.

These two losses are added to the final training loss.
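A minimal NumPy sketch of the two auxiliary losses, assuming the link prediction loss is the Frobenius norm of A - S S^T and the entropy loss is the mean row-wise entropy of S. The toy graph (two disconnected triangles) and the helper names are illustrative assumptions.

```python
import numpy as np

def link_prediction_loss(A, S):
    # Frobenius norm of A - S S^T: nodes that are linked should share clusters.
    return float(np.linalg.norm(A - S @ S.T, ord="fro"))

def entropy_loss(S, eps=1e-12):
    # Mean entropy of the rows of S; close to 0 when every row is nearly one-hot.
    return float(np.mean(-np.sum(S * np.log(S + eps), axis=1)))

# Two disconnected triangles: nodes 0-2 and nodes 3-5.
block = np.ones((3, 3)) - np.eye(3)
A = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])

S_hard = np.eye(2)[[0, 0, 0, 1, 1, 1]]  # each triangle is its own cluster
S_uniform = np.full((6, 2), 0.5)        # every node is split evenly

for name, S in [("hard", S_hard), ("uniform", S_uniform)]:
    print(name, link_prediction_loss(A, S), entropy_loss(S))
```

On this toy graph the assignment that matches the two triangles has both a lower link prediction loss and a lower entropy than the uniform assignment, which is the behaviour the two losses are meant to reward.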


Spectral Clustering with Graph Neural Networks
for Graph Pooling (MinCutPool)

Filippo Maria Bianchi, Daniele Grattarola, Cesare Alippi

ICML 2020
Main idea
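Also a clustering-based approach, it treats clustering as a K-way normalized MinCut problem: split the graph into K disjoint subgraphs by removing the minimum volume of edges. This is equivalent to maximizing

$\frac{1}{K} \sum_{k=1}^{K} \frac{\mathrm{links}(V_k)}{\mathrm{degree}(V_k)}$

where $\mathrm{links}(V_k)$ is the edge volume within cluster $V_k$ and $\mathrm{degree}(V_k)$ is the sum of the degrees of its nodes.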
MinCut Problem
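A graph can be represented by $G=(A, X)$, in which $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix and $X \in \mathbb{R}^{N \times F}$ is the node feature matrix. Given a cluster assignment matrix $C \in \{0,1\}^{N \times K}$, the MinCut problem is expressed as

$\max_{C} \frac{1}{K} \sum_{k=1}^{K} \frac{C_k^\top A C_k}{C_k^\top D C_k} \quad \text{s.t.} \quad C \in \{0,1\}^{N \times K},\ C \mathbf{1}_K = \mathbf{1}_N$

where $C_k$ is the $k$-th column of $C$ and $D$ is the degree matrix of $A$.

A near-optimal solution can be obtained by relaxing $C$ to a real-valued matrix $Q$:

$\max_{Q \in \mathbb{R}^{N \times K}} \frac{1}{K} \sum_{k=1}^{K} Q_k^\top \tilde{A} Q_k \quad \text{s.t.} \quad Q^\top Q = I_K, \qquad \tilde{A} = D^{-1/2} A D^{-1/2}$

This problem is still non-convex, but it can be approximated by gradient descent.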
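To make the objective concrete, the sketch below evaluates the normalized MinCut objective for two hard assignments on a toy graph and compares them with the optimum of the relaxed problem. It assumes the objective is the mean over clusters of (C_k^T A C_k) / (C_k^T D C_k), and that the relaxed optimum equals the mean of the K largest eigenvalues of D^{-1/2} A D^{-1/2}. The graph and function names are illustrative assumptions.

```python
import numpy as np

def mincut_objective(A, C):
    # (1/K) * sum_k (C_k^T A C_k) / (C_k^T D C_k) for a one-hot assignment C.
    D = np.diag(A.sum(axis=1))
    within = np.diag(C.T @ A @ C)  # edge volume inside each cluster
    degree = np.diag(C.T @ D @ C)  # total degree of each cluster
    return float(np.mean(within / degree))

def relaxed_optimum(A, K):
    # Optimum of the relaxation: mean of the K largest eigenvalues
    # of the symmetrically normalized adjacency matrix.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(A_norm)  # returned in ascending order
    return float(eigvals[-K:].mean())

# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

good = np.eye(2)[[0, 0, 0, 1, 1, 1]]  # one cluster per triangle
bad = np.eye(2)[[0, 1, 0, 1, 0, 1]]   # clusters mix both triangles

good_score = mincut_objective(A, good)
bad_score = mincut_objective(A, bad)
upper_bound = relaxed_optimum(A, K=2)
print(good_score, bad_score, upper_bound)
```

The partition that cuts only the bridging edge scores 6/7, the mixed partition scores 2/7, and the relaxed optimum is an upper bound on both.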


MinCutPool Layer
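Similar to DiffPool, a GNN is applied before pooling: $\bar{X} = \mathrm{GNN}(X, \tilde{A})$.

An assignment matrix $S \in \mathbb{R}^{N \times K}$ is generated via an MLP, $S = \mathrm{softmax}(\mathrm{MLP}(\bar{X}))$; the softmax guarantees that $s_{ij} \in [0, 1]$ and that each row sums to 1.

The graph is coarsened using $S$: $A^{pool} = S^\top \tilde{A} S$, $X^{pool} = S^\top \bar{X}$.

The new adjacency matrix is made zero-diagonal: $\hat{A} = A^{pool} - I_K \, \mathrm{diag}(A^{pool})$.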
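The layer can be sketched as below, assuming the coarsened adjacency matrix is S^T A S with its diagonal removed and the coarsened features are S^T applied to the GNN output. A single random linear map stands in for the MLP and random features stand in for the GNN output; both are hypothetical placeholders, as are the function names.

```python
import numpy as np

def softmax_rows(x):
    # Row-wise softmax: entries in [0, 1], each row sums to 1.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mincut_coarsen(A_norm, X_bar, S):
    A_pool = S.T @ A_norm @ S
    X_pool = S.T @ X_bar
    A_hat = A_pool - np.diag(np.diag(A_pool))  # zero the diagonal (no self-loops)
    return A_hat, X_pool

rng = np.random.default_rng(0)
N, F, K = 6, 4, 2

# Toy graph: two triangles joined by one edge, symmetrically normalized.
A = np.zeros((N, N))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

X_bar = rng.normal(size=(N, F))  # placeholder for the GNN output
W_mlp = rng.normal(size=(F, K))  # hypothetical one-layer "MLP"
S = softmax_rows(X_bar @ W_mlp)

A_hat, X_pool = mincut_coarsen(A_norm, X_bar, S)
print(A_hat)
print(X_pool.shape)
```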


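The unsupervised loss function is composed of two terms, $L_u = L_c + L_o$:

$L_c = -\frac{\mathrm{Tr}(S^\top \tilde{A} S)}{\mathrm{Tr}(S^\top \tilde{D} S)}, \qquad L_o = \left\| \frac{S^\top S}{\| S^\top S \|_F} - \frac{I_K}{\sqrt{K}} \right\|_F$

$L_c$ is the cut loss, which encourages strongly connected nodes to be clustered together; it reaches its maximum value 0 when the cluster assignments of connected nodes are orthogonal. $L_o$ is the orthogonality loss, which encourages the cluster assignments to be orthogonal and the clusters to be of similar size.

The unsupervised loss from each layer is added to the training loss of the specific task.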
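The two terms can be sketched as follows, assuming the cut loss is -Tr(S^T A S) / Tr(S^T D S) and the orthogonality loss is the Frobenius distance between the normalized S^T S and I_K / sqrt(K). The toy graph and function name are illustrative assumptions.

```python
import numpy as np

def mincut_losses(A, S):
    # A plays the role of the (normalized) adjacency matrix, D its degree matrix.
    D = np.diag(A.sum(axis=1))
    K = S.shape[1]
    cut = -np.trace(S.T @ A @ S) / np.trace(S.T @ D @ S)
    StS = S.T @ S
    ortho = np.linalg.norm(StS / np.linalg.norm(StS) - np.eye(K) / np.sqrt(K))
    return float(cut), float(ortho)

# Two disconnected triangles: nodes 0-2 and nodes 3-5.
block = np.ones((3, 3)) - np.eye(3)
A = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])

S_hard = np.eye(2)[[0, 0, 0, 1, 1, 1]]  # one cluster per triangle
S_uniform = np.full((6, 2), 0.5)        # degenerate: all nodes split evenly

cut_hard, ortho_hard = mincut_losses(A, S_hard)
cut_uniform, ortho_uniform = mincut_losses(A, S_uniform)
print(cut_hard, ortho_hard)
print(cut_uniform, ortho_uniform)
```

On this graph both assignments reach the minimum cut loss of -1, so the cut term alone cannot tell the meaningful clustering from the degenerate uniform one. Only the orthogonality term separates them: it is 0 for the balanced hard assignment and clearly positive for the uniform one.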
Graph U-Nets (TopK Pool)

Hongyang Gao, Shuiwang Ji

ICML 2019
Main idea
As a sorting-based method, it uses a projection vector to map each node to a score at each pooling step. Only the nodes with the top-k scores, along with the edges between them, are kept as the input to the next layer.

Note that each score depends only on the representation of that node, independently of the other nodes.
TopK Pooling Layer
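Similarly, given a graph $G=(A, X)$, a trainable projection vector $p$ is used to get a score for each node: $y = Xp / \|p\|$.

The nodes are then ranked by $y$ and the indices of the top-k nodes are selected: $\mathrm{idx} = \mathrm{rank}(y, k)$.

The corresponding node features and edges are kept: $\tilde{X} = X(\mathrm{idx}, :)$, $A^{l+1} = A(\mathrm{idx}, \mathrm{idx})$.

The scores after activation, $\tilde{y} = \mathrm{sigmoid}(y(\mathrm{idx}))$, are used as a gate on $\tilde{X}$ to obtain the node features of the next layer: $X^{l+1} = \tilde{X} \odot (\tilde{y} \mathbf{1}_C^\top)$, where $\mathbf{1}_C$ is an all-ones vector whose length is the feature dimension.

This gate is essential because it makes the procedure differentiable with respect to $p$; without it, the top-k selection would be a purely discrete operation.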
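The steps above can be sketched in NumPy, assuming scores y = Xp / ||p||, a sigmoid gate, and a fixed projection vector in place of a trained one. The function name and the toy path graph are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def topk_pool(A, X, p, k):
    y = X @ p / np.linalg.norm(p)     # one scalar score per node
    idx = np.argsort(-y)[:k]          # indices of the k highest scores
    gate = sigmoid(y[idx])            # gating weights in (0, 1)
    X_next = X[idx] * gate[:, None]   # selected features, scaled by the gate
    A_next = A[np.ix_(idx, idx)]      # edges among the selected nodes only
    return A_next, X_next, idx

# Path graph 0-1-2-3 with 2-dimensional node features.
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
X = np.array([[1.0, 0.0], [0.0, 3.0], [2.0, 2.0], [-1.0, 0.0]])
p = np.array([1.0, 1.0])  # fixed here; trainable in the actual layer

A_next, X_next, idx = topk_pool(A, X, p, k=2)
print(idx)
print(A_next)
print(X_next)
```

Nodes 2 and 1 have the highest projections onto p, so they are kept together with the edge between them.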
Unlike clustering-based methods, sorting-based methods have no auxiliary loss; the whole model is trained end-to-end with the loss of the specific task.

The reason is that there is no obvious unsupervised loss for designing the ranking function.
Unpooling and encoder-decoder architecture
This paper also introduces a graph encoder-decoder framework, in which pooling and unpooling are the two key components.

For unpooling (upsampling), a distribute function is used, $X^{l+1} = \mathrm{distribute}(0_{N \times C}, X^{l}, \mathrm{idx})$: the graph structure from before the pooling is restored, the representations of the nodes selected by TopK are placed back at their original positions, and the features of all other nodes are set to zero.
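A minimal sketch of the unpooling step, assuming the indices chosen by the matching pooling layer were saved. The function name `unpool` and the example values are hypothetical.

```python
import numpy as np

def unpool(X_pooled, idx, n_nodes):
    # Start from an all-zero feature matrix with the pre-pooling number of nodes,
    # then write each pooled row back to the position it was selected from.
    X_up = np.zeros((n_nodes, X_pooled.shape[1]), dtype=X_pooled.dtype)
    X_up[idx] = X_pooled
    return X_up

idx = np.array([2, 1])                         # nodes kept by the pooling layer
X_pooled = np.array([[5.0, 6.0], [7.0, 8.0]])  # features of node 2, then node 1

X_up = unpool(X_pooled, idx, n_nodes=4)
print(X_up)
```

The adjacency matrix needs no computation here: the decoder reuses the one stored before the corresponding pooling layer.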
Self-Attention Graph Pooling
(SAG Pool)

Junhyun Lee, Inyeop Lee, Jaewoo Kang

ICML 2019
Main idea

As a sorting-based method, it obtains the ranking scores from a GNN, which takes the topology of the graph into account rather than only the independent node features used by TopK Pool.
SAG Pooling Layer
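Given a graph $G=(A, X)$, the self-attention score is calculated by a GCN: $Z = \sigma(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta_{att})$, where $\sigma$ is an activation function (e.g. tanh) and $\Theta_{att} \in \mathbb{R}^{F \times 1}$ is the trainable parameter.

Similarly, the indices of the highest-scoring nodes are selected according to a pooling ratio $k \in (0, 1]$: $\mathrm{idx} = \text{top-rank}(Z, \lceil kN \rceil)$.

The scores of the selected nodes also act as gating weights on the node features after filtering, $X_{out} = X_{\mathrm{idx},:} \odot Z_{\mathrm{idx}}$ and $A_{out} = A_{\mathrm{idx},\mathrm{idx}}$; this procedure is the same as in TopK pooling.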
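A NumPy sketch of the layer, assuming a single GCN layer with a tanh activation produces the scores and the top ceil(ratio * N) nodes are kept. The fixed parameter vector and the toy graph are illustrative assumptions.

```python
import math
import numpy as np

def normalized_adj(A):
    # GCN propagation matrix: D^{-1/2} (A + I) D^{-1/2}.
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def sag_pool(A, X, theta, ratio):
    Z = np.tanh(normalized_adj(A) @ X @ theta).ravel()  # one score per node
    k = math.ceil(ratio * A.shape[0])
    idx = np.argsort(-Z)[:k]                            # highest scores first
    X_out = X[idx] * Z[idx][:, None]                    # gate the kept features
    A_out = A[np.ix_(idx, idx)]
    return A_out, X_out, idx, Z

# Node 0 is linked to nodes 1 and 2; node 3 is isolated.
A = np.zeros((4, 4))
for i, j in [(0, 1), (0, 2)]:
    A[i, j] = A[j, i] = 1.0
X = np.ones((4, 1))      # identical features on every node
theta = np.array([[1.0]])  # fixed here; trainable in the actual layer

A_out, X_out, idx, Z = sag_pool(A, X, theta, ratio=0.5)
print(Z)
print(idx)
```

All four nodes carry the same feature, so a per-node projection as in TopK Pool could not rank them. The GCN scores still differ because they depend on each node's neighbourhood.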
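There are also variants for calculating the ranking scores.

1) Considering two-hop neighbors by adding the square of the adjacency matrix: $Z = \sigma(\mathrm{GNN}(X, A + A^2))$

2) Stacking GNN layers for indirect aggregation of multi-hop nodes: $Z = \sigma(\mathrm{GNN}_2(\sigma(\mathrm{GNN}_1(X, A)), A))$

3) Averaging the attention scores of M GNNs (like an ensemble): $Z = \frac{1}{M} \sum_{m=1}^{M} \sigma(\mathrm{GNN}_m(X, A))$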
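The three variants can be sketched with one GCN scoring helper, under the assumption that each GNN is a single GCN layer with a tanh activation. All function names and the random parameters are hypothetical.

```python
import numpy as np

def normalized_adj(A):
    # D^{-1/2} (A + I) D^{-1/2}
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def gcn_score(A, X, theta):
    return np.tanh(normalized_adj(A) @ X @ theta)

def score_augmented(A, X, theta):
    # Variant 1: add A^2 so that two-hop neighbours contribute directly.
    return gcn_score(A + A @ A, X, theta)

def score_serial(A, X, W1, theta):
    # Variant 2: a hidden GCN layer first, then the scoring layer.
    H = np.tanh(normalized_adj(A) @ X @ W1)
    return gcn_score(A, H, theta)

def score_parallel(A, X, thetas):
    # Variant 3: average the scores of M independent scoring layers.
    return np.mean([gcn_score(A, X, t) for t in thetas], axis=0)

rng = np.random.default_rng(0)
N, F, H_dim = 5, 3, 4
A = np.zeros((N, N))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0
X = rng.normal(size=(N, F))
theta = rng.normal(size=(F, 1))

z_aug = score_augmented(A, X, theta)
z_ser = score_serial(A, X, rng.normal(size=(F, H_dim)), rng.normal(size=(H_dim, 1)))
z_par = score_parallel(A, X, [theta, rng.normal(size=(F, 1))])
print(z_aug.ravel(), z_ser.ravel(), z_par.ravel())
```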


Model Architecture
The paper also proposes two architectures. One is global pooling, in which a single pooling layer follows several stacked GCN layers. The other is hierarchical pooling, in which a pooling layer follows each GCN layer.
Performance comparison of methods
The most common approach is to evaluate on graph classification: each graph is transformed into a fixed-length feature vector, which an MLP then classifies.
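As a minimal illustration of the fixed-length requirement, the sketch below uses a mean readout followed by a single linear layer with softmax. The linear layer is a hypothetical stand-in for the MLP, and the random features stand in for GNN outputs.

```python
import numpy as np

def mean_readout(X):
    # Average over the node axis: the output length no longer depends on graph size.
    return X.mean(axis=0)

def classify(h, W, b):
    # One linear layer + softmax, standing in for the MLP classifier.
    logits = h @ W + b
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, n_classes = 8, 2
X_small = rng.normal(size=(5, d))   # node features of a graph with 5 nodes
X_large = rng.normal(size=(40, d))  # node features of a graph with 40 nodes
W, b = rng.normal(size=(d, n_classes)), np.zeros(n_classes)

h_small, h_large = mean_readout(X_small), mean_readout(X_large)
probs = classify(h_small, W, b)
print(h_small.shape, h_large.shape, probs)
```

Both graphs end up as vectors of length d, so the same classifier can be applied to either.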

Statistics of common datasets are shown below.

Dataset        Samples  Classes  Avg. nodes  Avg. edges  Node labels
DD             1178     2        284.32      715.66      yes
PROTEINS       1113     2        39.06       72.82       yes
NCI1           4110     2        29.87       32.30       yes
NCI109         4127     2        29.68       32.13       yes
Mutagenicity   4337     2        30.32       30.77       yes
COLLAB         5000     3        74.49       2457.78     no
Reddit-binary  2000     2        429.63      497.75      no
Classification accuracy of all the above models and of the baseline AvgPool (average pooling after the same number of GCN layers):

Generally speaking, clustering-based methods are superior to sorting-based ones, at the cost of higher complexity. AvgPool is comparable to or even better than the sorting-based methods when the graphs are small, but it falls behind on DD and COLLAB, where the graphs are larger or denser.

Methods     DD      PROTEINS  NCI1    NCI109  Mutagenicity  COLLAB  Reddit-binary
AvgPool     73.05%  71.55%    70.89%  69.62%  79.63%        70.62%  82.41%
DiffPool    79.30%  72.70%    -       -       77.60%        81.80%  80.80%
MinCutPool  79.56%  75.88%    76.77%  74.97%  79.24%        82.89%  83.35%
TopK Pool   75.01%  71.10%    67.02%  66.12%  73.67%        77.56%  74.70%
SAG Pool    76.45%  71.86%    67.45%  67.86%  74.52%        79.20%  73.90%
Complexity comparison

The numbers of nodes in the original graph and in the new graph after pooling are $N$ and $K$ respectively; $d$ is the node feature dimension of the current layer.

DiffPool: space $O(Kd)$ (GNN), time $O(NK(N+K+d) + N^2(2N+d))$ (GNN to obtain $S$ and to cluster the edges).

MinCutPool: space $O(NK)$ (matrix $S$), time $O(NK(N+K))$ (loss term $L_c$).

TopK Pool: space $O(d)$ (projection vector $p$), time $O(Nd + N \log N + Kd)$.

SAG Pool: space $O(Kd)$ (GNN), time $O(N^3)$ (GNN to obtain the ranking scores).
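To get a feel for these expressions, the sketch below plugs one arbitrary setting (N = 1000, K = 100, d = 64, chosen only for illustration) into the time terms above. The results are dominant-term counts with all constants ignored, not measured runtimes.

```python
import math

def time_terms(N, K, d):
    # Dominant terms of the time complexities listed above, constants dropped.
    return {
        "DiffPool": N * K * (N + K + d) + N**2 * (2 * N + d),
        "MinCutPool": N * K * (N + K),
        "TopK Pool": N * d + N * math.log2(N) + K * d,
        "SAG Pool": N**3,
    }

terms = time_terms(N=1000, K=100, d=64)
for name, value in sorted(terms.items(), key=lambda item: item[1]):
    print(f"{name:<11} {value:.3g}")
```

For this setting the terms order as TopK Pool, MinCutPool, SAG Pool, DiffPool from cheapest to most expensive; other choices of N, K and d can change the gaps.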
Thanks and QA
