Graph Pooling
Cao Yu
Sep. 3, 2020
What is graph pooling?
Graph neural networks (GNNs) have been widely used to propagate messages
between nodes in graph data, producing topology-aware node representations.
But under some circumstances (e.g. graph classification), we need a
representation of the whole graph (or of a higher level) instead of each raw node
and edge. Thus the graph needs to be downsampled gradually, finally yielding a
representation of smaller scale. This process is called graph pooling, analogous
to pooling in images.
Two typical kinds of approaches
1) Clustering-based methods: cluster the original graph into several subgraphs at
each pooling step.
2) Sorting-based methods: rank the nodes and keep only a subset of them at each
pooling step.
DiffPool (NeurIPS 2018)
Main idea
Use a learnable assignment matrix in each layer to transform the
representations of all current nodes and edges into a smaller size; in other words,
coarsen the graph into a representation whose node count is smaller than the
current one.
GNN layers, which aggregate information from neighboring nodes and edges, are
stacked together with pooling layers so that each node can capture information
about the whole topology and the other nodes.
DiffPool Pooling Layer
A graph can be represented by G = (A, X), where A in {0, 1}^(N x N) is the
adjacency matrix and X in R^(N x d) is the node feature matrix.
A GNN takes A and X as input; for example, a Graph Convolutional Network (GCN)
layer can be written as Z = GNN(A, X) = ReLU(D_hat^(-1/2) A_hat D_hat^(-1/2) X W),
where A_hat = A + I and D_hat is its degree matrix.
The DiffPool layer learns a soft cluster assignment matrix
S = softmax(GNN_pool(A, X)) in R^(N x K) with a separate GNN, and coarsens the
graph via X' = S^T Z and A' = S^T A S.
To better train the pooling layer, an auxiliary link-prediction objective is added,
encoding the intuition that nearby nodes should be pooled together:
L_LP = ||A - S S^T||_F, where ||.||_F denotes the Frobenius norm.
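A minimal NumPy sketch of a DiffPool layer and its auxiliary link-prediction loss. This is an illustration, not the paper's implementation: `simple_gnn` is a single GCN-style propagation step, and `W_embed` / `W_pool` stand in for the learned weights of the embedding and pooling GNNs.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def simple_gnn(A, X, W):
    # One GCN-style propagation step: ReLU(D_hat^-1/2 A_hat D_hat^-1/2 X W),
    # where A_hat is the adjacency with self-loops added.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

def diffpool(A, X, W_embed, W_pool):
    Z = simple_gnn(A, X, W_embed)              # embedded node features (N x d')
    S = softmax(simple_gnn(A, X, W_pool), axis=1)  # soft assignments (N x K)
    X_next = S.T @ Z                           # pooled features (K x d')
    A_next = S.T @ A @ S                       # coarsened adjacency (K x K)
    link_loss = np.linalg.norm(A - S @ S.T, ord="fro")  # auxiliary objective
    return A_next, X_next, link_loss
```

Since `S.T @ A @ S` yields a K x K adjacency, stacking this layer repeatedly shrinks the graph toward a single coarse representation.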
MinCutPool (ICML 2020)
Main idea
Also a clustering-based approach, it solves the clustering by casting it as a K-way
normalized MinCut problem: split the graph into K disjoint subgraphs by removing
the minimum volume of edges. Minimizing the normalized cut is equivalent to
maximizing the normalized within-cluster association.
MinCut Problem
A graph can be represented by G = (A, X), in which A in R^(N x N) is the adjacency
matrix and X in R^(N x d) is the node feature matrix. Given a cluster assignment
matrix S in {0, 1}^(N x K), the MinCut problem is expressed as

maximize (1/K) * sum_{k=1}^{K} (S_k^T A S_k) / (S_k^T D S_k),

where D is the degree matrix and S_k is the k-th column of S. MinCutPool relaxes
this with a continuous S produced by an MLP with softmax output, and optimizes the
cut loss L_c = -tr(S^T A S) / tr(S^T D S) together with an orthogonality loss
L_o = || S^T S / ||S^T S||_F - I_K / sqrt(K) ||_F.
The unsupervised losses from each pooling layer are added to the original training
loss of the specific task.
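MinCutPool's unsupervised objective combines a normalized-cut term with an orthogonality term. A NumPy sketch of both losses follows; it assumes S is a soft cluster assignment matrix (N x K, rows summing to 1), which in the actual model would come from an MLP with a softmax output.

```python
import numpy as np

def mincut_losses(A, S):
    # Cut loss: negative normalized within-cluster association,
    # -tr(S^T A S) / tr(S^T D S); minimized when clusters are well separated.
    D = np.diag(A.sum(axis=1))
    cut = -np.trace(S.T @ A @ S) / np.trace(S.T @ D @ S)
    # Orthogonality loss: pushes assignments toward orthogonal,
    # equally sized clusters.
    K = S.shape[1]
    StS = S.T @ S
    ortho = np.linalg.norm(StS / np.linalg.norm(StS, "fro")
                           - np.eye(K) / np.sqrt(K), "fro")
    return cut, ortho
```

For a graph that splits perfectly into two disconnected clusters with a hard assignment, the cut loss reaches its minimum of -1 and the orthogonality loss is 0.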
Graph U-Nets (TopK Pool)
ICML2019
Main idea
As a sorting-based method, it uses a projection vector to map each node to a score
at every pooling step. Then only the nodes with the top-k scores, along with their
incident edges, are retained as the input of the next layer.
It should be noted that the score is based only on the representation of each node,
computed independently.
TopK Pooling Layer
Similarly, given a graph G = (A, X), a trainable projection vector p is used to
compute a score for each node:

y = X p / ||p||.

The scores y, after a tanh activation, are used as a gate to weight the selected
node features for the next layer:

idx = rank(y, k),  X' = X(idx, :) * tanh(y(idx)),  A' = A(idx, idx).
Such a gate is essential because it makes the whole procedure differentiable;
otherwise the top-k selection would be a purely discrete operation.
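The pooling step above can be sketched in NumPy as follows. In this minimal version `p` is a fixed vector, whereas in Graph U-Nets it is trained by backpropagating through the tanh gate.

```python
import numpy as np

def topk_pool(A, X, p, k):
    # Score each node by projecting its features onto p.
    y = X @ p / np.linalg.norm(p)
    # Keep the k highest-scoring nodes.
    idx = np.argsort(y)[::-1][:k]
    # Gate the kept features with tanh(score) so gradients flow into p.
    gate = np.tanh(y[idx])[:, None]
    X_new = X[idx] * gate
    # Induced subgraph on the kept nodes.
    A_new = A[np.ix_(idx, idx)]
    return A_new, X_new, idx
```

Note that edges between a kept node and a dropped node simply disappear, which is one motivation for the graph-power augmentation discussed in the Graph U-Nets paper.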
TopK Pooling Layer
Different from clustering-based methods, there is no auxiliary loss for sorting-based
methods; the whole model is trained end to end with the same loss as the
specific task.
The reason is that there is no obvious unsupervised loss for guiding the design of
the ranking function.
Unpooling and encoder-decoder architecture
This paper also introduces a graph encoder-decoder framework (the Graph U-Net),
in which pooling and unpooling are two important components. Unpooling restores
the graph to its pre-pooling size by placing node features back at the indices
recorded during pooling and filling the remaining positions with zeros.
SAGPool (ICML 2019)
Main idea
Similarly, the indices of nodes with high scores are selected based on a pooling
ratio k, and the scores of the selected nodes also act as gating weights on the
node features after filtering; this procedure is the same as in TopK pooling.
SAG Pooling Layer
The difference lies in how the ranking scores are computed: SAGPool obtains them
from a graph convolution (self-attention),
y = sigma(D_hat^(-1/2) A_hat D_hat^(-1/2) X Theta_att), so the scores take the
graph topology into account. There are also variants for calculating the ranking
scores.
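A minimal NumPy sketch of SAGPool's score computation. It assumes, as in the SAGPool paper, that the ranking scores come from a graph convolution y = sigma(GNN(A, X)); here tanh plays the role of the activation, and `theta_att` is a fixed stand-in for the trained attention weight vector.

```python
import numpy as np

def sag_pool(A, X, theta_att, k):
    # Scores from one graph-convolution step, so they depend on the topology,
    # unlike the node-independent scores of TopK pooling.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    y = np.tanh(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ theta_att).ravel()
    # From here on the procedure matches TopK pooling: select, gate, subgraph.
    idx = np.argsort(y)[::-1][:k]
    X_new = X[idx] * y[idx][:, None]
    A_new = A[np.ix_(idx, idx)]
    return A_new, X_new, idx
```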
Let N and K be the node counts of the original graph and of the pooled graph
respectively, and let d be the node feature dimension of the current layer.
MinCutPool: space O(NK) (assignment matrix S); time O(NK(N + K)) (cut loss
term L_c).
SAGPool: space O(Kd) (GNN); time O(N^3) (GNN used to obtain the ranking
scores).
Thanks and Q&A