
Graph Convolutional Networks
Yunsheng Bai
Overview
1. Improve GCN itself
a. Convolution
i. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)
ii. Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)
iii. Dynamic Filters in Graph Convolutional Networks (2017)
b. Pooling (Unpooling)
i. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)

2. Apply to new/larger datasets/graphs


a. Use GCN as an auxiliary module
i. Structured Sequence Modeling with Graph Convolutional Recurrent Networks (2017)
b. Use GCN only
i. Node/Link classification/prediction: http://tkipf.github.io/misc/GCNSlides.pdf
1. Directed graph
a. Modeling Relational Data with Graph Convolutional Networks (2017)
ii. Graph classification, e.g. MNIST (with or without pooling)
Roadmap
1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with
Deconvolutional Layers
a. Improvement 1: Dynamic Filters -> Generalizable
b. Improvement 2: Deconvolution
Define Convolution for Graph
Laplace

http://www.norbertwiener.umd.edu/Research/lectures/2014/MBegue_Prelim.pdf
Graph Laplacian

http://www.norbertwiener.umd.edu/Research/lectures/2014/MBegue_Prelim.pdf
Graph Laplacian

[Figure: a labeled graph with its degree matrix, adjacency matrix, and Laplacian matrix (L = D − A), from the Wikipedia example.]

https://en.wikipedia.org/wiki/Laplacian_matrix
Graph Fourier Transform

L: (Normalized) Graph Laplacian, L = I − D^(−1/2) W D^(−1/2) = U Λ Uᵀ

D: Degree Matrix

W: Adjacency Matrix

U: Eigenvectors of L (orthonormal because L is symmetric PSD)

Λ: Eigenvalues of L

x̂ = Uᵀx: Fourier Transform of x

Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)
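A minimal NumPy sketch of these definitions (the 3-node graph and the signal are my toy example, not from the slides):

import numpy as np

# Adjacency matrix W of a small undirected toy graph.
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
deg = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L = np.eye(3) - D_inv_sqrt @ W @ D_inv_sqrt   # normalized graph Laplacian

# L = U diag(lam) U^T; U is orthonormal because L is symmetric PSD.
lam, U = np.linalg.eigh(L)

x = np.array([1.0, 2.0, 3.0])      # a signal on the 3 nodes
x_hat = U.T @ x                    # graph Fourier transform of x
assert np.allclose(U @ x_hat, x)   # inverse transform recovers x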
1-D Convolution

Ex. x = [0 1 2 3 4 5 6 7 8]

f = [4 -1 0]

y = [0*4 1*4+0*(-1) 2*4+1*(-1)+0*0 3*4+2*(-1)+1*0 ...]

= [0 4 7 10 ...]

I made this based on my EECS 351 lecture notes.
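The same numbers can be checked with NumPy (my addition for reference):

import numpy as np

x = np.arange(9)           # [0 1 2 3 4 5 6 7 8]
f = np.array([4, -1, 0])
y = np.convolve(x, f)      # 1-D linear convolution
print(y[:4])               # -> [ 0  4  7 10]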


Convolution <--> Multiplication in Fourier Domain

View x and f as vectors: x ∗ f = IDFT( DFT(x) ⊙ DFT(f) ), i.e. convolution in the signal domain is element-wise multiplication in the Fourier domain (after zero-padding both to a common length).

I made this based on my EECS 351 lecture notes.
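The same fact, verified with the FFT (zero-padding to a common length is my choice of convention):

import numpy as np

x = np.arange(9.0)
f = np.array([4.0, -1.0, 0.0])
n = len(x) + len(f) - 1         # pad so circular convolution equals linear

X = np.fft.fft(x, n)            # Fourier transform of x
F = np.fft.fft(f, n)            # Fourier transform of f
y = np.fft.ifft(X * F).real     # multiply in Fourier domain, then invert

assert np.allclose(y, np.convolve(x, f))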


Spectral Filtering

Convolution: analogous to the previous slide.

Filter a signal:

    y = U gθ(Λ) Uᵀ x
      = [e1 e2 e3] · diag(θ1, θ2, θ3) · [e1ᵀ; e2ᵀ; e3ᵀ] · [x(1); x(2); x(3)]
      = [y(1); y(2); y(3)]

(U = [e1 e2 e3]: Inverse Fourier Transform; diag(θ1, θ2, θ3): Non-parametric Filter; Uᵀx: Fourier Transform of x)

"As we cannot express a meaningful translation operator in the vertex domain, the convolution operator on graph G is defined in the Fourier domain."
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)
Spectral Filtering

    y = U gθ(Λ) Uᵀ x
      = θ1·⟨e1, x⟩·e1 + θ2·⟨e2, x⟩·e2 + θ3·⟨e3, x⟩·e3

e1, e2, e3 (the eigenvectors of L) form the Fourier Basis.

The result of the convolution is the original signal:

(1) first Fourier Transformed (Uᵀx),
(2) then multiplied by a filter (gθ(Λ)),
(3) finally inverse Fourier Transformed (left-multiplied by U).
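A small NumPy sketch of this non-parametric spectral filtering (the path graph and the θ values are my toy choices):

import numpy as np

# Unnormalized Laplacian of a 3-node path graph.
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
lam, U = np.linalg.eigh(L)          # L = U diag(lam) U^T

x = np.array([1.0, 2.0, 3.0])       # signal on the nodes
theta = np.array([0.5, 1.0, 0.2])   # one free parameter per eigenvalue

# (1) Fourier transform, (2) multiply by the filter, (3) inverse transform.
y = U @ (theta * (U.T @ x))

# The same thing written as a sum over the Fourier basis:
y_sum = sum(theta[i] * (U[:, i] @ x) * U[:, i] for i in range(3))
assert np.allclose(y, y_sum)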
Spectral Filtering

Convolution:

    y = U gθ(Λ) Uᵀ x

Filter a signal:

    [e1 e2 e3] · diag(θ1, θ2, θ3) · [e1ᵀ; e2ᵀ; e3ᵀ] · [x(1); x(2); x(3)] = [y(1); y(2); y(3)]

(U: Inverse Fourier Transform; diag(θ1, θ2, θ3): Non-parametric Filter; Uᵀx: Fourier Transform of x)
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)
Better Filters

Convolution:

    y = U gθ(Λ) Uᵀ x

Filter a signal: same structure as before, but the non-parametric filter is replaced by a Localized & Polynomial Filter:

    gθ(Λ) = Σ_{k=0}^{K−1} θk·Λᵏ
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)
Better Filters: Localized

[Figure: the labeled graph with its degree, adjacency, and Laplacian matrices (the Wikipedia example), together with L².]

(L²)ij is nonzero only when nodes i and j are within 2 hops of each other; in general, a Kth-order polynomial of L is exactly K-localized.

Wavelets on Graphs via Spectral Graph Theory (2011)
Better Filters: Localized

Filter:

    gθ(Λ) = Σ_{k=0}^{K−1} θk·Λᵏ,  so  U gθ(Λ) Uᵀ x = Σ_{k=0}^{K−1} θk·Lᵏ·x

Filter a signal (K = 3):

    y = Θ0·x + Θ1·(L·x) + Θ2·(L²·x)

L·x mixes each node's value with its 1-step neighbors; L²·x reaches its 2-step neighbors.

Fixed Θ for every neighbor :( (Dynamic Filters in Graph Convolutional Networks (2017))
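A NumPy sketch of the K=3 polynomial filter above (the cycle graph and coefficients are my toy choices); note it needs only matrix-vector products, no eigendecomposition:

import numpy as np

# Unnormalized Laplacian of a 6-node cycle.
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1
L = np.diag(W.sum(axis=1)) - W

x = np.random.randn(n)
theta = [0.3, 0.5, 0.1]    # theta_0, theta_1, theta_2

# y = theta_0*x + theta_1*(L x) + theta_2*(L^2 x): K sparse
# matrix-vector products, each reaching one more hop.
y = theta[0] * x + theta[1] * (L @ x) + theta[2] * (L @ (L @ x))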
Better Filters, but O(n²)

Convolution:

    y = U gθ(Λ) Uᵀ x

Filter:

    gθ(Λ) = Σ_{k=0}^{K−1} θk·Λᵏ

Computing the eigenvectors U: O(n³) :(

Multiplying a signal by the dense U: O(n²) :(

I am actually confused. They used Chebyshev polynomials to approximate the filter, but at the end of the day, the filtered signal is the same as on the previous slide. (The point, I believe, is computational: the Chebyshev recursion evaluates gθ(L)·x with K sparse matrix-vector products, so U is never computed at all.) In fact, the authors of Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017) set K=1, so no Chebyshev at all.
Approximations

If K=1, filtering becomes:

    gθ ⋆ x ≈ θ0·x + θ1·(L − I)·x = θ0·x − θ1·D^(−1/2) A D^(−1/2)·x

If we set θ = θ0 = −θ1 (further approximate):

    gθ ⋆ x ≈ θ·(I + D^(−1/2) A D^(−1/2))·x

(with the renormalization trick I + D^(−1/2) A D^(−1/2) → D̃^(−1/2) Ã D̃^(−1/2), where Ã = A + I and D̃ii = Σj Ãij)

If the input is a matrix X ∈ R^(N×C) (C input channels), filtering becomes:

    Z = D̃^(−1/2) Ã D̃^(−1/2) X Θ

Filter parameters: Θ ∈ R^(C×F)

Convolved signal matrix: Z ∈ R^(N×F)

Filtering complexity: O(|E|·F·C)
Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)
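A NumPy sketch of this propagation rule, Z = D̃^(−1/2) Ã D̃^(−1/2) X Θ (sizes and random values are mine):

import numpy as np

N, C, F = 6, 4, 2                       # nodes, input channels, filters
A = (np.random.rand(N, N) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T          # random undirected adjacency

A_tilde = A + np.eye(N)                 # renormalization trick: add self-loops
d = A_tilde.sum(axis=1)
A_hat = A_tilde / np.sqrt(np.outer(d, d))   # D~^(-1/2) A~ D~^(-1/2)

X = np.random.randn(N, C)               # node features
Theta = np.random.randn(C, F)           # filter parameters

Z = A_hat @ X @ Theta                   # convolved signal matrix, shape (N, F)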


Illustration of D^(−1/2) * A * D^(−1/2) * X * Θ

[Figure: L̂ * X * Θ = Z. Each row of X is one node's word embedding (Node 1 … Node 6); each column of Θ is one filter (Filter 1, Filter 2); each entry of Z is an output feature, e.g. Feature 1 and Feature 2 of Node 6.]

Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)


Roadmap
1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with
Deconvolutional Layers
a. Improvement 1: Dynamic Filters -> Generalizable
b. Improvement 2: Deconvolution
Architecture of
Graph Convolutional Networks
Schematic Depiction

[Figure: schematic depiction of a multi-layer GCN, from the paper.]

Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)
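The two-layer model in that paper computes Z = softmax(Â · ReLU(Â X W⁽⁰⁾) · W⁽¹⁾); a minimal NumPy version (shapes and initialization are my placeholders):

import numpy as np

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(A_hat, X, W0, W1):
    """Two-layer GCN: Z = softmax(A_hat @ relu(A_hat @ X @ W0) @ W1)."""
    H = relu(A_hat @ X @ W0)        # hidden node representations
    return softmax(A_hat @ H @ W1)  # per-node class probabilities

# Toy shapes: 6 nodes, 4 input features, 8 hidden units, 3 classes.
N, C, H_dim, K = 6, 4, 8, 3
A_hat = np.eye(N)                   # stand-in for the renormalized adjacency
X = np.random.randn(N, C)
W0 = np.random.randn(C, H_dim) * 0.1
W1 = np.random.randn(H_dim, K) * 0.1
Z = gcn_forward(A_hat, X, W0, W1)   # shape (N, K)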


Roadmap
1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with
Deconvolutional Layers
a. Improvement 1: Dynamic Filters -> Generalizable
b. Improvement 2: Deconvolution
Improvements:
Generalizable Graph Convolutional
Networks with Deconvolutional Layers
Roadmap
1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with
Deconvolutional Layers
a. Improvement 1: Dynamic Filters -> Generalizable
b. Improvement 2: Deconvolution
Improvement 1
Dynamic Filters -> Generalizable
Roadmap
1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with
Deconvolutional Layers
a. Improvement 1: Dynamic Filters -> Generalizable
i. Basics
ii. Ideas
iii. Ordering
iv. Example
b. Improvement 2: Deconvolution
Baseline Filter: Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)

[Figure: a 6-node example graph and its 6×6 filter matrices (one per filter/channel slice: θ11, θ12, …). The nonzero entries follow the adjacency pattern, and within each slice every nonzero entry is the same parameter: all nodes share the same parameter.]
Poor Filter: Parameters All Different (No Sharing)

[Figure: the same 6-node graph, but every nonzero entry of the filter matrix is a distinct parameter (θ1112, θ1116, θ1121, θ1123, …): no sharing at all. Arrows between rows of nodes with the same number of neighbors ask: what if they share?]
Proposed Filter: Parameters Shared across Nodes with Same # of Neighbors

[Figure: all rows of the filter matrix belonging to nodes with 2 neighbors share the same parameters θ'1121, θ'1122, and all rows belonging to nodes with 4 neighbors share the same θ'1141 … θ'1144.]
Proposed Filter: Total Size O(N²*F*C)

[Figure: viewed without adjacency info, the filter stores one weight row per possible neighbor count, from 1 up to N. With N=6, F=1, C=2 this is a lower-triangular 6×6 pattern per (filter, channel) slice: θ'1111; θ'1121, θ'1122; …; θ'1161 … θ'1166.]
Proposed Filter: Total Size O(nmax²*F*C) <= O(N²*F*C)

[Figure: only neighbor counts that actually occur in the graph need a weight row. With N=6, F=1, C=2 and nmax=4, only the 2-neighbor row (θ'1121, θ'1122) and the 4-neighbor row (θ'1141 … θ'1144) are stored; all other rows are zero.]
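A sketch of how such a per-neighbor-count filter could be assembled (my illustration of the idea, with an assumed convention that weight k of row d goes to a node's k-th neighbor):

import numpy as np

def build_filter(A, weight_rows):
    """Assemble an (N, N) filter matrix from per-degree weight rows.

    A: (N, N) adjacency matrix.
    weight_rows[d]: length-d weight vector shared by every node
    that has exactly d neighbors.
    """
    N = A.shape[0]
    filt = np.zeros((N, N))
    for i in range(N):
        neighbors = np.flatnonzero(A[i])                  # node i's neighbors
        filt[i, neighbors] = weight_rows[len(neighbors)]  # k-th neighbor gets weight k
    return filt

# Toy graph (K_{2,4}): nodes 0-1 have 4 neighbors, nodes 2-5 have 2,
# so only two weight rows are stored: O(nmax^2) parameters, nmax = 4.
A = np.zeros((6, 6))
for i in (0, 1):
    for j in (2, 3, 4, 5):
        A[i, j] = A[j, i] = 1
weight_rows = {2: np.array([0.5, 0.5]), 4: np.array([0.4, 0.3, 0.2, 0.1])}
F = build_filter(A, weight_rows)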
Proposed Filter: Generalizable to Regular CNN

[Figure: on a regular 2-D image (pixels 1 … 6 surrounded by zero padding), every pixel has the same number of neighbors, so the proposed filter collapses to a single shared weight row: a regular moving CNN filter with stride 1.]
Proposed Filter: More Sharing of Weights

[Figure: the same filter as before, with arrows indicating that weights from rows for smaller neighbor counts are related to the rows for larger neighbor counts.]

But related how: copy, or some other relation? If copy, randomly copy?
Proposed Filter: Soft Assignment (Dynamic Filters in Graph Convolutional Networks (2017))

[Figure: each weight row for a smaller neighbor count is a weighted sum of the rows for larger neighbor counts, e.g. the 2-neighbor row is built from the 4-neighbor weights θ'1141 … θ'1144. Essentially soft assignment.]
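A toy sketch of the soft-assignment idea (the construction and names are my illustration, not necessarily the paper's exact formulation): rows for small neighbor counts are learned convex combinations of the largest row's weights.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

n_max = 4
base_weights = np.random.randn(n_max)    # weight row for nodes with n_max neighbors

# Learnable logits: each of the 2 slots of a 2-neighbor row is softly
# assigned to all n_max base weights.
assign_logits = np.random.randn(2, n_max)
assign = softmax(assign_logits)          # each row sums to 1 (soft assignment)

row_2_neighbors = assign @ base_weights  # 2 weights, each a weighted sum of the 4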


Roadmap
1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with
Deconvolutional Layers
a. Improvement 1: Dynamic Filters -> Generalizable
i. Basics
ii. Ideas
iii. Ordering
iv. Example
b. Improvement 2: Deconvolution
Proposed Filter: Summary and More Ideas

1. Nodes are not created equal: they have different # of neighbors.
   a. Make them look equal, so treat them the same.
      i. Add 2-step neighbors as less important 1-step neighbors.
      ii. Duplicate 1-step neighbors as less important dummy neighbors.
      iii. Convert all the 1-step neighbors into one neighbor.
      iv. ...
   b. Respect diversity: treat differently.
      i. Share weights if same # of neighbors; nodes with a small # of neighbors have weights independent from the large.
      ii. Share weights if same # of neighbors; nodes with a small # of neighbors randomly copy weights from the large.
      iii. All nodes kind of share the same weights but actually have different weights; nodes with a small # of neighbors have weights softly assigned from the large.
      iv. ...
   c. Respect diversity: treat the same.
      i. Sequences from random walks and neighbors are fed to an LSTM.
      ii. Graph LSTM: a variant of LSTM on graphs.
      iii. ...
Roadmap
1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with
Deconvolutional Layers
a. Improvement 1: Dynamic Filters -> Generalizable
i. Basics
ii. Ideas
iii. Ordering
iv. Example
b. Improvement 2: Deconvolution
Proposed Filter: Ordering of Neighbors

[Figure: node 1 with its unordered neighbors, each marked "?".]

1. No ordering.

No generalizability :(

But works well in

Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)

and

Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)


Proposed Filter: Ordering of Neighbors

[Figure: node 1 with its unordered neighbors, each marked "?".]

2. Soft assign every neighbor to all weights. Assignments are learnable.

Implicit ordering.

O(N*M) additional parameters to learn.

Dynamic Filters in Graph Convolutional Networks (2017)


Proposed Filter: Ordering of Neighbors

3. Hard assign every neighbor to an ordering. Assignments are learnable.

[Figure: node 1's five neighbors (nodes 2–6) each carry a learnable one-hot assignment vector, e.g. [0 0 1 0 0 0 0], [0 0 0 0 0 1 0], [0 0 0 1 0 0 0], [0 0 0 0 1 0 0], [0 0 0 0 0 0 1]. With 5 neighbors there are 5! = 120 possible orderings.]

Explicit ordering.

O(N*M) / O(N) additional parameters to learn.
Proposed Filter: Ordering of Neighbors

4. Hard assign every neighbor to an ordering. Assignments are fixed, e.g. rank.

[Figure: node 1's neighbors (nodes 2–6) in a fixed order given by their rank.]

Explicit ordering.

No additional parameters to learn.
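One concrete (assumed) choice of fixed rank, sketched in Python: order each node's neighbors by degree, breaking ties by node id.

import numpy as np

def ordered_neighbors(A, node):
    """Fixed, parameter-free ordering: sort neighbors by (degree, id).

    A: (N, N) adjacency matrix of an undirected graph.
    """
    degrees = A.sum(axis=1)
    neighbors = np.flatnonzero(A[node])
    return sorted(neighbors, key=lambda j: (degrees[j], j))

# Toy graph: node 0 connected to 1, 2, 3; node 1 also connected to 2.
A = np.zeros((4, 4))
for i, j in [(0, 1), (0, 2), (0, 3), (1, 2)]:
    A[i, j] = A[j, i] = 1
print(ordered_neighbors(A, 0))   # -> [3, 1, 2] (degree-1 node first, ties by id)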


Roadmap
1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with
Deconvolutional Layers
a. Improvement 1: Dynamic Filters -> Generalizable
i. Basics
ii. Ideas
iii. Ordering
iv. Example
b. Improvement 2: Deconvolution
Example: Share Weights If Same # of Neighbors

import tensorflow as tf

# Shape (2, 2, 6, 6): 2 filters x 2 input channels x 6 nodes x 6 nodes.
# Only node 1's row is nonzero; its two neighbors (nodes 2 and 6) share
# one weight within each (filter, channel) slice.
matrix1 = tf.constant(
    [[[[0, 1, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]],
      [[0, 2, 0, 0, 0, 2], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]],
     # Another filter/feature
     [[[0, 3, 0, 0, 0, 3], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]],
      [[0, 4, 0, 0, 0, 4], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]]])

Shape: (2, 2, 6, 6)

[Figure: the four 6×6 weight matrices (Filter 1 and Filter 2, one per channel, with weights 1/2/3/4) next to the 6-node example graph.]
Input Data

matrix2 = tf.constant([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]])

matrix2 = tf.reshape(matrix2, (2, 1, 6))  # (channel, 1, node)

         Channel 1   Channel 2
Node 1       1           7
Node 2       2           8
Node 3       3           9
Node 4       4          10
Node 5       5          11
Node 6       6          12
Convolution: Step 1

product = tf.multiply(matrix1, matrix2)

with tf.Session() as sess:
    result = sess.run(product)

Shapes: (2, 2, 6, 6) * (2, 1, 6) -> broadcast -> (2, 2, 6, 6)

Each weight row is multiplied element-wise by its channel's signal:

Filter 1: [0 1 0 0 0 1] · ch1 -> [0 2 0 0 0 6];   [0 2 0 0 0 2] · ch2 -> [0 16 0 0 0 24]
Filter 2: [0 3 0 0 0 3] · ch1 -> [0 6 0 0 0 18];  [0 4 0 0 0 4] · ch2 -> [0 32 0 0 0 48]
Convolution: Step 2

reduced = tf.transpose(tf.reduce_sum(product, [1, 3]))

with tf.Session() as sess:
    result = sess.run(reduced)

Summing over the channel axis (1) and the column axis (3), then transposing, gives one feature per filter for each node:

            Feature 1   Feature 2
Node 1         48          104
Nodes 2-6       0            0

(Node 1: Feature 1 = 2 + 6 + 16 + 24 = 48; Feature 2 = 6 + 18 + 32 + 48 = 104.)

Shapes: (2, 2, 6, 6) -> reduce_sum -> (2, 6) -> transpose -> (6, 2)
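The same two-step computation as a self-contained NumPy check (my re-derivation, not part of the deck):

import numpy as np

# Weights: (filter, channel, node, node); only node 1's row is nonzero.
weights = np.zeros((2, 2, 6, 6), dtype=int)
weights[0, 0, 0, [1, 5]] = 1
weights[0, 1, 0, [1, 5]] = 2
weights[1, 0, 0, [1, 5]] = 3
weights[1, 1, 0, [1, 5]] = 4

# Signal: (channel, 1, node); channel 1 = 1..6, channel 2 = 7..12.
signal = np.arange(1, 13).reshape(2, 1, 6)

out = (weights * signal).sum(axis=(1, 3)).T   # (node, feature)
print(out[0])                                 # -> [ 48 104]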
Roadmap
1. Define Convolution for Graph
2. Architecture of Graph Convolutional Networks
3. Improvements: Generalizable Graph Convolutional Networks with
Deconvolutional Layers
a. Improvement 1: Dynamic Filters -> Generalizable
b. Improvement 2: Deconvolution
Improvement 2
Deconvolution
Why Deconvolution?
To visualize/understand/probe an existing GCN.
Pooling <--> Unpooling
https://www.quora.com/How-does-a-deconvolutional-neural-network-work
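As a generic illustration of the pooling <--> unpooling pairing (a standard CNN visualization technique, not this deck's specific method): max-pool while recording argmax "switches", then unpool by scattering values back to the recorded positions.

import numpy as np

def max_pool_1d(x, size=2):
    """Max-pool with recorded switch locations (len(x) divisible by size)."""
    blocks = x.reshape(-1, size)
    switches = blocks.argmax(axis=1)
    return blocks.max(axis=1), switches

def max_unpool_1d(pooled, switches, size=2):
    """Scatter pooled values back to their original (argmax) positions."""
    out = np.zeros((len(pooled), size))
    out[np.arange(len(pooled)), switches] = pooled
    return out.ravel()

x = np.array([1.0, 3.0, 2.0, 0.0])
pooled, sw = max_pool_1d(x)        # -> [3. 2.], switches [1 0]
print(max_unpool_1d(pooled, sw))   # -> [0. 3. 2. 0.]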

In progress.

You might also like