Graph Convolutional Networks
Yunsheng Bai
Overview
1. Improve GCN itself
a. Convolution
i. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)
ii. Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)
iii. Dynamic Filters in Graph Convolutional Networks (2017)
b. Pooling (Unpooling)
i. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)
Graph Laplacian
http://www.norbertwiener.umd.edu/Research/lectures/2014/MBegue_Prelim.pdf
L = D - A (degree matrix minus adjacency matrix)
[Figure: a labeled graph with its degree matrix, adjacency matrix, and Laplacian matrix]
https://en.wikipedia.org/wiki/Laplacian_matrix
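As a minimal sketch of these definitions (assuming a toy 6-node cycle graph, not necessarily the one in the figure), the three matrices can be built in a few lines of NumPy:

import numpy as np

# Toy undirected graph (assumed for illustration): a 6-node cycle.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
n = 6

A = np.zeros((n, n))        # adjacency matrix
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

D = np.diag(A.sum(axis=1))  # degree matrix (degrees on the diagonal)
L = D - A                   # graph Laplacian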
Graph Fourier Transform
D: degree matrix
W: adjacency matrix
L = D - W = U Λ Uᵀ
Λ: eigenvalues of L; U = [e1 e2 ... en]: eigenvectors of L (the Fourier basis)
x̂ = Uᵀ x: Fourier transform of x
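Continuing the NumPy sketch above (x is an assumed toy signal), the graph Fourier transform is just an eigendecomposition plus a change of basis:

lam, U = np.linalg.eigh(L)                # L = U diag(lam) U^T; columns of U = e1 ... en
x = np.array([0., 1., 2., 3., 4., 5.])    # one signal value per node (assumed)
x_hat = U.T @ x                           # Fourier transform of x
assert np.allclose(U @ x_hat, x)          # inverse transform recovers x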
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)
1-D Convolution
Ex. x = [0 1 2 3 4 5 6 7 8]
f = [4 -1 0]
= [0 4 7 10 ...]
Filter a signal:
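The slide's numbers can be reproduced with NumPy's full discrete convolution:

import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
f = np.array([4, -1, 0])
y = np.convolve(x, f)     # y[n] = sum_k f[k] * x[n - k]
print(y[:4])              # [ 0  4  7 10]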
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)
Spectral Filtering
Fourier basis: U = [e1 e2 e3], the eigenvectors of L
Convolution: x *G f = U ((Uᵀ f) ⊙ (Uᵀ x))
Filter a signal:
U diag(θ1, θ2, θ3) Uᵀ [x(1) x(2) x(3)]ᵀ = [y(1) y(2) y(3)]ᵀ
(θi: the filter's response at eigenvalue λi)
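A minimal sketch of this filtering step, reusing U and x from the Fourier-transform example above (the slide's figure uses n = 3; the toy graph here has n = 6, and the filter values θ are assumed):

theta = np.array([1.0, 0.8, 0.5, 0.3, 0.2, 0.1])  # assumed: one response g(lam_i) per eigenvalue
y = U @ np.diag(theta) @ U.T @ x                  # transform, filter in spectral domain, transform back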
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)
Better Filters
Filter a signal: y = U diag(θ1, θ2, θ3) Uᵀ x
A non-parametric filter like this has one free parameter per eigenvalue and is not localized in the vertex domain.
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)
Better Filters: Localized
Filter: gθ(L) = Σ_{k=0..K-1} θk Lᵏ (a polynomial in L, e.g. L², so the filter is K-localized)
Filter a signal: y = gθ(L) x = U gθ(Λ) Uᵀ x
Computing the eigenvectors e1, e2, e3 costs O(n³) :(
Multiplying by the dense Fourier basis U costs O(n²) :(
Speaker note: I am actually confused. They used Chebyshev polynomials to approximate the filter, but at the end of the day, the filtered signal is the same as on the previous slide. In fact, the authors of Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017) set K=1, so no Chebyshev at all.
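For reference, the Chebyshev recursion from the NIPS 2016 paper avoids the eigendecomposition entirely, using only sparse multiplications by L. A minimal sketch, reusing L, n, x, and lam from above (the coefficients θ are assumed; in practice λmax is estimated cheaply rather than taken from a full eigendecomposition):

K = 3
theta_cheb = np.array([0.5, 0.3, 0.2])       # assumed Chebyshev coefficients
L_tilde = 2.0 * L / lam.max() - np.eye(n)    # rescale eigenvalues into [-1, 1]

T = [x, L_tilde @ x]                         # T_0(L~) x and T_1(L~) x
for k in range(2, K):
    T.append(2.0 * L_tilde @ T[k - 1] - T[k - 2])   # T_k = 2 L~ T_{k-1} - T_{k-2}

y = sum(c * t for c, t in zip(theta_cheb, T))       # y = sum_k theta_k T_k(L~) x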
Approximations
If K=1, filtering becomes: y = θ (I + D^(-1/2) A D^(-1/2)) x
If input is a matrix: Z = D^(-1/2) * A * D^(-1/2) * X * Θ
Filter parameters: Θ (one weight per input channel × output feature pair)
Filtering complexity: O(|E| F C)
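A minimal NumPy sketch of this K=1 layer, reusing A and n from the toy graph above and adding self-loops as in the ICLR 2017 paper's renormalization trick (X and Θ are assumed random placeholders):

A_tilde = A + np.eye(n)                      # add self-loops (renormalization trick)
d_inv_sqrt = np.diag(A_tilde.sum(axis=1) ** -0.5)
X = np.random.randn(n, 4)                    # node features: C = 4 channels (assumed)
Theta = np.random.randn(4, 2)                # filter parameters: F = 2 filters (assumed)
Z = d_inv_sqrt @ A_tilde @ d_inv_sqrt @ X @ Theta   # output features, shape (n, F)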
[Figure: L * X * Filter = Z. Row i of X is the word embedding of node i (Node 1 ... Node 6); Filter 1 and Filter 2 produce Feature 1 and Feature 2 of each node in Z.]
[Figure: the 6×6 filtering matrix for the example graph; every nonzero entry is the same parameter θ11, i.e. all nodes share the same parameter.]
Poor Filter: Parameters All Different (No Sharing)
[Figure: the same graph with a filtering matrix whose nonzero entries (θ1112, θ1116, θ1121, θ1123, θ1131, θ1136, θ1145, ...; one per filter, channel, node, and neighbor) are all distinct: no sharing.]
What if they share?
Share the same θ′
[Figure: filtering matrix where all rows reuse the same two shared parameters θ′1121 and θ′1122, one per neighbor slot.]
Proposed Filter: Generalizable to Regular CNN
Moving Filter, Stride == 1
[Figure: with shared weights, the filtering matrix reduces to a moving filter with stride 1, as in a regular CNN.]
Proposed Filter: More Sharing of Weights
[Figure: filtering matrix with additional weight sharing across nodes.]
Proposed Filter: Soft Assignment (Dynamic Filters in Graph Convolutional Networks (2017))
Each weight is a weighted sum of the shared ones: essentially soft assignment.
[Figure: filtering matrix with entries θ′1221, θ′1222, ..., θ′1141, θ′1142, θ′1143, θ′1144; each is a soft mixture of the shared parameters.]
Nodes are not created equal: they have different # of neighbors.
- Respect diversity; treat differently: Share weights if same # of neighbors. Nodes with a small # of neighbors have weights independent from the large.
- Respect diversity; treat differently: Share weights if same # of neighbors. Nodes with a small # of neighbors randomly copy weights from the large.
- Respect diversity; treat the same: All nodes kind of share the same weights but actually have different weights. Nodes with a small # of neighbors have weights softly assigned from the large (sketched below).
- Treat the same: Sequences from random walks and neighbors are fed to an LSTM. (Graph LSTM: a variant of LSTM on graphs.)
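One way to read the "softly assigned" option above (my sketch, not the exact formulation of the Dynamic Filters paper): each node mixes a small shared bank of weight vectors with softmax coefficients, so all nodes share parameters yet end up with different effective weights. Reusing np and n from the sketches above; all names and sizes here are assumptions:

num_basis, C = 3, 4
basis = np.random.randn(num_basis, C)        # shared weight bank (assumed)
scores = np.random.randn(n, num_basis)       # per-node assignment scores (assumed)
assign = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)   # row-wise softmax
node_weights = assign @ basis                # (n, C): shared bank, node-specific mixtures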
Roadmap
1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with
Deconvolutional Layers
a. Improvement 1: Dynamic Filters -> Generalizable
i. Basics
ii. Ideas
iii. Ordering
iv. Example
b. Improvement 2: Deconvolution
[Figure: an unseen graph with unknown ("?") nodes: will a learned filter generalize to it?]
1. No ordering. No generalizability :(
(Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016) and Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017))
2. Implicit ordering.
[Figure: the 6-node graph with a one-hot indicator per node, e.g. [0 0 1 0 0 0 0], [0 0 0 0 0 1 0], [0 0 0 1 0 0 0], [0 0 0 0 1 0 0], [0 0 0 0 0 0 1]]
3. Explicit ordering.
[0 0 0 … 0 1 0 … 0 0 0]
O(N*M)/O(N) additional parameters to learn.
5! = 120 possible orderings of a node's 5 neighbors.
[Figure: the unseen "?" graph again]
4. Hard assign every neighbor to an ordering. Assignments are fixed, e.g. by rank.
[Figure: the 6-node graph with each node's neighbors explicitly ordered]
matrix1 = tf.constant([
    # Filter/feature 1
    [[[0, 1, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]],
     [[0, 2, 0, 0, 0, 2], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]],
    # Another filter/feature
    [[[0, 3, 0, 0, 0, 3], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]],
     [[0, 4, 0, 0, 0, 4], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]]])
Shape: (2, 2, 6, 6)
[Figure: the two filters visualized next to the 6-node graph (node 1's neighbors are nodes 2 and 6). In each 6×6 matrix only node 1's row is nonzero: Filter 1 has rows [0 1 0 0 0 1] (channel 1) and [0 2 0 0 0 2] (channel 2); Filter 2 has [0 3 0 0 0 3] and [0 4 0 0 0 4].]
Input Data
matrix2 = tf.constant([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]])

Node     Channel 1   Channel 2
Node 1   1           7
Node 2   2           8
Node 3   3           9
Node 4   4           10
Node 5   5           11
Node 6   6           12
Convolution: Step 1
# Reshape matrix2 to (2, 1, 6) so it broadcasts against matrix1's (2, 2, 6, 6)
product = tf.multiply(matrix1, tf.reshape(matrix2, [2, 1, 6]))
result = sess.run(product)
[Figure: element-wise multiply with broadcasting: the filters of shape (2, 2, 6, 6) times the input of shape (2, 1, 6) give a product of shape (2, 2, 6, 6). E.g. row [0 1 0 0 0 1] × channel 1 [1 ... 6] = [0 2 0 0 0 6]; row [0 2 0 0 0 2] × channel 2 [7 ... 12] = [0 16 0 0 0 24].]
Convolution: Step 2
reduced = tf.transpose(tf.reduce_sum(product, [1, 3]))
result = sess.run(reduced)
[Figure: summing the product over the channel axis (1) and the column axis (3) collapses each filter's block to a length-6 vector, giving shape (2, 6); transposing gives one row per node. Node 1: Feature 1 = 48, Feature 2 = 104; every other node gets 0.]
Shape: (2, 2, 6, 6) -> (6, 2)
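To sanity-check the two steps end to end, here is the same computation as a plain NumPy sketch (array values copied from the slides; NumPy stands in for the TF session):

import numpy as np

filters = np.zeros((2, 2, 6, 6))             # (filter, channel, node, neighbor)
filters[0, 0, 0, [1, 5]] = 1                 # Filter 1, channel 1: node 1 reads nodes 2 and 6
filters[0, 1, 0, [1, 5]] = 2                 # Filter 1, channel 2
filters[1, 0, 0, [1, 5]] = 3                 # Filter 2, channel 1
filters[1, 1, 0, [1, 5]] = 4                 # Filter 2, channel 2

x = np.array([[1, 2, 3, 4, 5, 6],
              [7, 8, 9, 10, 11, 12]], dtype=float)   # (channel, node)

product = filters * x.reshape(2, 1, 6)       # Step 1: broadcasted element-wise multiply
Z = product.sum(axis=(1, 3)).T               # Step 2: sum over channel and neighbor, transpose
print(Z[0])                                  # [ 48. 104.]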
Roadmap
1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with
Deconvolutional Layers
a. Improvement 1: Dynamic Filters -> Generalizable
b. Improvement 2: Deconvolution
Improvement 2
Deconvolution
Why Deconvolution?
To visualize/understand/probe an existing GCN.
Pooling <--> Unpooling
https://www.quora.com/How-does-a-deconvolutional-neural-network-work
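As a generic sketch of the pooling/unpooling pairing (an assumed toy example, not any specific GCN's implementation): max-pooling remembers where each max came from, so unpooling can put the values back at those locations.

import numpy as np

def max_pool(x, size=2):
    x = x.reshape(-1, size)
    idx = x.argmax(axis=1)                   # remember the argmax "switch" locations
    return x.max(axis=1), idx

def unpool(pooled, idx, size=2):
    out = np.zeros((pooled.shape[0], size))
    out[np.arange(pooled.shape[0]), idx] = pooled   # restore values at remembered slots
    return out.reshape(-1)

x = np.array([3., 1., 0., 5., 2., 2.])
pooled, idx = max_pool(x)                    # pooled = [3. 5. 2.]
print(unpool(pooled, idx))                   # [3. 0. 0. 5. 2. 0.]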
In progress.