
GraphRNN: A Deep Generative Model for Graphs

GraphRNN is a deep generative model for graphs that can model and efficiently sample complex distributions over graphs. It decomposes graph generation into two recurrent neural networks: a graph-level RNN that generates a sequence of nodes, and an edge-level RNN that generates the sequence of edges for each new node. GraphRNN represents a graph as a sequence by ordering its nodes, and models the joint distribution over graphs as a product of conditional distributions. It infers graphs by recursively predicting the adjacency vectors of nodes given previously generated structure.


GraphRNN: A Deep Generative Model for Graphs (24 Feb 2018)

Jiaxuan You, Rex Ying, Xiang Ren, William L. Hamilton, Jure Leskovec
Presented by: Jesse Bettencourt and Harris Chan
March 9, 2018
University of Toronto, Vector Institute

1
Introduction: Generative Model for Graphs

Modeling graphs is fundamental for studying networks


e.g. medical, chemical, and social networks

Goal:
Model and efficiently sample complex distributions over graphs
Learn generative model from observed set of graphs

2
Challenges in Graph Generation

Large and variable output spaces


A graph with n nodes requires up to n² values to fully specify its structure
Number of nodes and edges varies between different graphs

Non-unique representations
Distributions over graphs are defined without assuming a fixed set of nodes
An n-node graph can be represented by up to n! equivalent adjacency matrices
π ∈ Π denotes an arbitrary node ordering

Complex, non-local dependencies


New edges depend on previously generated edges

3
Overview of GraphRNN

Decompose graph generation into two RNNs:

• Graph-level: generates sequence of nodes


• Edge-level: generates sequence of edges for each new node

4
Modeling Graphs as Sequences

Graph G ∼ p(G ) with n nodes under node ordering π


Define mapping f_S from G to a sequence:
S^π = f_S(G, π) = (S^π_1, ..., S^π_n)   (1)
Each sequence element is an adjacency vector
S^π_i ∈ {0, 1}^{i−1},   i ∈ {1, ..., n},
encoding the edges between node π(v_i) and the previous nodes π(v_j), j ∈ {1, ..., i − 1}

5
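A minimal Python sketch of the mapping f_S, assuming the graph is given as a NumPy adjacency matrix and the ordering π as a list of node indices; the function and variable names are illustrative, not taken from the authors' code.

import numpy as np

def graph_to_sequence(adj, ordering):
    """Map a graph (adjacency matrix) plus a node ordering pi to the
    sequence S^pi = (S^pi_2, ..., S^pi_n), where each element lists the
    connections of node pi(v_i) to the previously placed nodes."""
    adj = adj[np.ix_(ordering, ordering)]          # relabel nodes according to pi
    n = adj.shape[0]
    return [adj[i, :i].astype(int) for i in range(1, n)]

# Example: a triangle plus one pendant node, under the identity ordering
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
S = graph_to_sequence(A, ordering=[0, 1, 2, 3])
# S == [array([1]), array([1, 1]), array([0, 0, 1])]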
Distribution on Graphs → Distribution on Sequences

Instead of learning p(G) directly, sample π ∼ Π to get observations of S^π


Then learn p(S^π), modeled autoregressively:

p(G) = Σ_{S^π} p(S^π) · 1[f_G(S^π) = G]   (3)

Exploiting the sequential structure of S^π, decompose p(S^π):

p(S^π) = ∏_{i=1}^{n+1} p(S^π_i | S^π_1, ..., S^π_{i−1})   (4)
       = ∏_{i=1}^{n+1} p(S^π_i | S^π_{<i})

6
Motivating GraphRNN

Model p(G)
Distribution over graphs

Model p(S^π)
Distribution over the sequence of edge connections

Model p(S^π_i | S^π_{<i})
Distribution over edge connections for the i-th node,
conditioned on previous nodes' edge connections;
parameterized with an expressive neural network

7
GraphRNN Framework

Idea: Use an RNN that consists of a state-transition function and


an output function:

h_i = f_trans(h_{i−1}, S^π_{i−1})   (5)
θ_i = f_out(h_i)   (6)

• h_i ∈ R^d encodes the state of the graph generated so far
• S^π_{i−1} encodes the adjacency of the most recently generated node i − 1
• θ_i specifies the distribution of the next node's adjacency vector:
  S^π_i ∼ P_{θ_i}

• f_trans and f_out can be arbitrary neural networks
• P_{θ_i} can be an arbitrary distribution over binary vectors
8
GraphRNN Framework Corrected

Idea: Use an RNN that consists of a state-transition function and


an output function:

h_i = f_trans(h_{i−1}, S^π_i)   (5)
θ_{i+1} = f_out(h_i)   (6)

• h_i ∈ R^d encodes the state of the graph generated so far
• S^π_i encodes the adjacency of the most recently generated node i
• θ_{i+1} specifies the distribution of the next node's adjacency vector:
  S^π_{i+1} ∼ P_{θ_{i+1}}

• f_trans and f_out can be arbitrary neural networks
• P_{θ_{i+1}} can be an arbitrary distribution over binary vectors.
9
GraphRNN Framework Corrected

Idea: Use an RNN that consists of a state-transition function and


an output function:

h_i = f_trans(h_{i−1}, S^π_i)   (5)
θ_{i+1} = f_out(h_i)   (6)
S^π_{i+1} ∼ P_{θ_{i+1}}

10
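A minimal PyTorch sketch of the corrected update, with f_trans as a GRU cell and f_out as an MLP with sigmoid output (the GraphRNN-S style output described later); the class name, hidden sizes, and the all-zero SOS convention are assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class GraphLevelStep(nn.Module):
    """One step of the graph-level RNN with the corrected indexing:
    h_i = f_trans(h_{i-1}, S^pi_i) and theta_{i+1} = f_out(h_i).
    Adjacency vectors are zero-padded to a fixed length M (the BFS
    window discussed later in the deck)."""

    def __init__(self, M=8, hidden=64):
        super().__init__()
        self.f_trans = nn.GRUCell(input_size=M, hidden_size=hidden)
        self.f_out = nn.Sequential(nn.Linear(hidden, M), nn.Sigmoid())

    def forward(self, S_i, h_prev):
        h_i = self.f_trans(S_i, h_prev)    # update the graph state
        theta_next = self.f_out(h_i)       # Bernoulli parameters for S^pi_{i+1}
        return h_i, theta_next

# One generation step starting from the empty-graph state
step = GraphLevelStep(M=8, hidden=64)
h = torch.zeros(1, 64)                     # empty graph state h'
S = torch.zeros(1, 8)                      # SOS token (assumed to be an all-zero vector)
h, theta = step(S, h)
S_next = torch.bernoulli(theta)            # sample the next adjacency vector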
GraphRNN Inference Algorithm

Algorithm 1 GraphRNN inference algorithm

Input: RNN-based transition module f_trans, output module f_out,
probability distribution P_{θ_i} parameterized by θ_i, start token SOS,
end token EOS, empty graph state h′
Output: Graph sequence S^π

S^π_0 = SOS, h_0 = h′, i = 0
repeat
  i = i + 1
  h_i = f_trans(h_{i−1}, S^π_{i−1})   {update graph state}
  θ_i = f_out(h_i)
  S^π_i ∼ P_{θ_i}   {sample node i's edge connections}
until S^π_i is EOS
Return S^π = (S^π_1, ..., S^π_i)
11
GraphRNN Inference Algorithm Corrected

Algorithm 1 GraphRNN inference algorithm

Input: RNN-based transition module f_trans, output module f_out,
probability distribution P_{θ_i} parameterized by θ_i, start token SOS,
end token EOS, empty graph state h′
Output: Graph sequence S^π

S^π_1 = SOS, h_0 = h′, i = 0
repeat
  i = i + 1
  h_i = f_trans(h_{i−1}, S^π_i)   {update graph state}
  θ_{i+1} = f_out(h_i)
  S^π_{i+1} ∼ P_{θ_{i+1}}   {sample node i + 1's edge connections}
until S^π_{i+1} is EOS
Return S^π = (S^π_1, ..., S^π_i)

12
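A sketch of the corrected inference loop, reusing the hypothetical GraphLevelStep module from the previous sketch; treating an all-zero sample as the EOS token and capping the number of nodes are simplifying assumptions for illustration.

import torch

def generate(step, M=8, hidden=64, max_nodes=50):
    """Roll the graph-level step forward, sampling one adjacency vector
    per new node, until an EOS vector (here: all zeros) is produced."""
    h = torch.zeros(1, hidden)             # empty graph state h'
    S_prev = torch.zeros(1, M)             # SOS token (assumed all-zero vector)
    sequence = []
    for _ in range(max_nodes):
        h, theta = step(S_prev, h)         # h_i, then theta_{i+1}
        S_new = torch.bernoulli(theta)     # S^pi_{i+1} ~ P_{theta_{i+1}}
        if S_new.sum() == 0:               # assumed EOS test: no edges sampled
            break
        sequence.append(S_new.squeeze(0))
        S_prev = S_new
    return sequence

graph_seq = generate(step)                 # `step` is the GraphLevelStep sketched earlier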
GraphRNN Variants

Objective: maximize ∏ p_model(S^π) over all observed graph sequences
Implement f_trans as a Gated Recurrent Unit (GRU)
But different assumptions about p(S^π_i | S^π_{<i}) for each variant:

1. Multivariate Bernoulli (GraphRNN-S):

   f_out is an MLP with sigmoid activation that outputs θ_{i+1} ∈ R^i
   θ_{i+1} parameterizes the multivariate Bernoulli
   S^π_{i+1} ∼ P_{θ_{i+1}}, with each entry sampled independently

13
GraphRNN Variants

Objective: maximize ∏ p_model(S^π) over all observed graph sequences
Implement f_trans as a Gated Recurrent Unit (GRU)
But different assumptions about p(S^π_i | S^π_{<i}) for each variant:

2. Dependent Bernoulli sequence (GraphRNN):

   p(S^π_i | S^π_{<i}) = ∏_{j=1}^{i−1} p(S^π_{i,j} | S^π_{i,<j}, S^π_{<i})   (7)

• S^π_{i,j} ∈ {0, 1} indicates whether node π(v_i) is connected to node π(v_j)
• f_out is an edge-level RNN that generates the edges of a given node

14
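A minimal sketch of the edge-level RNN behind equation (7): the graph-level state initializes an edge-level GRU that emits one Bernoulli probability per candidate edge and feeds each sampled edge back in, so later edges within S^π_{i+1} depend on earlier ones. Layer sizes and the zero SOS input are assumptions.

import torch
import torch.nn as nn

class EdgeLevelRNN(nn.Module):
    """f_out as an edge-level RNN: edges of the new node are sampled one at a
    time, each conditioned on the edges already sampled (dependent Bernoullis),
    instead of independently as in GraphRNN-S."""

    def __init__(self, hidden_graph=64, hidden_edge=16):
        super().__init__()
        self.init = nn.Linear(hidden_graph, hidden_edge)   # h_i -> initial edge-RNN state
        self.cell = nn.GRUCell(input_size=1, hidden_size=hidden_edge)
        self.out = nn.Sequential(nn.Linear(hidden_edge, 1), nn.Sigmoid())

    def forward(self, h_graph, num_edges):
        h = self.init(h_graph)
        prev_edge = torch.zeros(h_graph.size(0), 1)        # SOS for the edge sequence
        edges = []
        for _ in range(num_edges):
            h = self.cell(prev_edge, h)
            p = self.out(h)                 # p(S^pi_{i+1,j} | earlier edges, h_i)
            prev_edge = torch.bernoulli(p)
            edges.append(prev_edge)
        return torch.cat(edges, dim=1)      # sampled adjacency vector for the new node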
Tractability via Breadth First Search (BFS)

Idea: Apply a BFS ordering with node permutation π to the graph G
before generating the sequence S^π
Benefits:

• Reduce the overall number of sequences to consider:
  only need to train on all possible BFS orderings, rather than
  all possible node permutations
• Reduce the number of edge predictions:
  the edge-level RNN only predicts M edges, where M is the maximum size of
  the BFS queue

15
BFS Order Leads to a Fixed-Size S^π_i

S^π_i ∈ R^M represents a "sliding window" over the nodes in the BFS queue
Zero-pad all S^π_i to be length-M vectors:

S^π_i = (A^π_{max(1, i−M), i}, ..., A^π_{i−1, i})^T,   i ∈ {2, ..., n}   (9)

16
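A sketch of the BFS reordering and the fixed-width window of equation (9), using networkx for the traversal; it assumes a connected graph, an arbitrary start node, and left zero-padding, all of which are illustrative choices.

import numpy as np
import networkx as nx

def bfs_sequence(G, M, start=0):
    """Relabel the nodes of a connected graph in BFS order, then build the
    zero-padded, length-M adjacency vectors of equation (9)."""
    order = [start] + [v for _, v in nx.bfs_edges(G, start)]   # BFS node ordering pi
    A = nx.to_numpy_array(G, nodelist=order)
    seq = []
    for i in range(1, len(order)):
        window = A[max(0, i - M):i, i]      # edges to at most the last M nodes
        padded = np.zeros(M)
        padded[M - len(window):] = window   # zero-pad so every S^pi_i has length M
        seq.append(padded)
    return seq

grid = nx.convert_node_labels_to_integers(nx.grid_2d_graph(4, 4))   # small 2D grid
S = bfs_sequence(grid, M=5)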
Experiments
Datasets

3 synthetic and 2 real graph datasets:

Dataset     Type       # Graphs   Graph Size         Description
Community   Synthetic  500        60 ≤ |V| ≤ 160     2-community, Erdős–Rényi (E-R) model
Grid        Synthetic  100        100 ≤ |V| ≤ 400    Standard 2D grid
B-A         Synthetic  500        100 ≤ |V| ≤ 200    Barabási–Albert model, each new node connects to 4 existing nodes
Protein     Real       918        100 ≤ |V| ≤ 500    Amino acid nodes, edge if ≤ 6 Ångströms apart
Ego         Real       757        50 ≤ |V| ≤ 399     Document nodes, edges are citation relationships, from Citeseer
17
Baseline Methods & Settings

• Compared GraphRNN to traditional models and deep learning baselines:

Method Type    Algorithm
Traditional    Erdős–Rényi model (E-R) (Erdős & Rényi, 1959)
Traditional    Barabási–Albert model (B-A) (Albert & Barabási, 2002)
Traditional    Kronecker graph models (Leskovec et al., 2010)
Traditional    Mixed-membership stochastic block models (MMSB) (Airoldi et al., 2008)
Deep learning  GraphVAE (Simonovsky & Komodakis, 2018)
Deep learning  DeepGMG (Li et al., 2018)

• 80%-20% train-test split


• All models trained with early stopping
• Traditional methods learn from a single graph, so a separate model is
  trained for each training graph in order to compare with these methods
• Deep learning baselines are evaluated on smaller datasets:
  Community-small: 12 ≤ |V| ≤ 20
  Ego-small: 4 ≤ |V| ≤ 18

18
Evaluating Generated Graphs via the MMD Metric

Existing:

• Visual Inspection
• Simple comparisons of average statistics between the two sets

Proposed:
A metric based on Maximum Mean Discrepancy (MMD) that compares all moments of
the empirical distributions of graph statistics for the generated and test sets,
using an exponential kernel with the Wasserstein distance.

19
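A rough sketch of the proposed evaluation, following the slide's description: MMD between two sets of graph statistics (e.g. degree sequences), with an exponential kernel over the first Wasserstein distance. The kernel bandwidth and the biased (V-statistic) estimator are assumptions.

import numpy as np
from scipy.stats import wasserstein_distance

def mmd_squared(stats_a, stats_b, sigma=1.0):
    """Biased MMD^2 estimate between two sets of 1-D empirical distributions
    (e.g. degree sequences of generated vs. test graphs)."""
    def kernel(x, y):
        return np.exp(-wasserstein_distance(x, y) / (2 * sigma ** 2))
    k_aa = np.mean([kernel(x, y) for x in stats_a for y in stats_a])
    k_bb = np.mean([kernel(x, y) for x in stats_b for y in stats_b])
    k_ab = np.mean([kernel(x, y) for x in stats_a for y in stats_b])
    return k_aa + k_bb - 2 * k_ab

gen  = [np.array([1, 2, 2, 3]), np.array([2, 2, 3, 3])]    # toy degree sequences
test = [np.array([1, 1, 2, 4]), np.array([2, 3, 3, 4])]
print(mmd_squared(gen, test))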
Graph Visualization

Figure 2: Visualization of graphs from grid dataset (Left group),


community dataset (Middle group) and Ego dataset (Right group).
Within each group, graphs from the training set (First row), graphs
generated by GraphRNN (Second row), and graphs generated by the
Kronecker, MMSB and B-A baselines respectively (Third row) are shown.
Different visualization layouts are used for different datasets.

20
Comparison with traditional models

Table 1: Comparison of GraphRNN to traditional graph generative
models using MMD. (max(|V|), max(|E|)) of each dataset is shown.
Community (160,1945) Ego (399,1071) Grid (361,684) Protein (500,1575)

Deg. Clus. Orbit Deg. Clus. Orbit Deg. Clus. Orbit Deg. Clus. Orbit

E-R 0.021 1.243 0.049 0.508 1.288 0.232 1.011 0.018 0.900 0.145 1.779 1.135
B-A 0.268 0.322 0.047 0.275 0.973 0.095 1.860 0 0.720 1.401 1.706 0.920
Kronecker 0.259 1.685 0.069 0.108 0.975 0.052 1.074 0.008 0.080 0.084 0.441 0.288
MMSB 0.166 1.59 0.054 0.304 0.245 0.048 1.881 0.131 1.239 0.236 0.495 0.775

GraphRNN-S 0.055 0.016 0.041 0.090 0.006 0.043 0.029 10⁻⁵ 0.011 0.057 0.102 0.037
GraphRNN 0.014 0.002 0.039 0.077 0.316 0.030 10⁻⁵ 0 10⁻⁴ 0.034 0.935 0.217

• GraphRNN achieved an 80% decrease in MMD on average
  compared with traditional baselines
• GraphRNN-S performed well on Protein: the dataset may not involve
  highly complex edge dependencies

21
Comparison with Deep Learning Models & Generalization

Table 2: GraphRNN compared to state-of-the-art deep graph generative
models on small graph datasets using MMD and negative log-likelihood
(NLL). (max(|V|), max(|E|)) of each dataset is shown. (DeepGMG and
GraphVAE cannot scale to the graphs in Table 1.)

Community-small (20,83) Ego-small (18,69)

Degree Clustering Orbit Train NLL Test NLL Degree Clustering Orbit Train NLL Test NLL

GraphVAE 0.35 0.98 0.54 13.55 25.48 0.13 0.17 0.05 12.45 14.28
DeepGMG 0.22 0.95 0.40 106.09 112.19 0.04 0.10 0.02 21.17 22.40
GraphRNN-S 0.02 0.15 0.01 31.24 35.94 0.002 0.05 0.0009 8.51 9.88
GraphRNN 0.03 0.03 0.01 28.95 35.10 0.0003 0.05 0.0009 9.05 10.61

• GraphRNN achieved a 90% decrease in MMD on average
  compared with deep learning baselines
• 22% smaller average NLL gap compared to other deep models

22
Experiments: Evaluation with Graph Statistics

Figure 3: Average degree (Left) and clustering coefficient (Right)


distributions of graphs from the test set and graphs generated by GraphRNN
and baseline models.

• The average statistics of GraphRNN-generated graphs closely match
  the overall test set distribution.
23
Experiments: Robustness

Interpolate between (B-A) and (E-R) graphs


Randomly perturb [0%, 20%, ..., 100%] of the edges of B-A graphs
0% (B-A) ←→ 100% (E-R)
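A sketch of one plausible perturbation scheme for this interpolation: each edge of a B-A graph is independently rewired to a random non-edge with probability p, so p = 0 keeps the B-A structure and p = 1 yields an essentially random (E-R-like) graph. The exact scheme used in the paper may differ.

import random
import networkx as nx

def perturb_edges(G, p):
    """Independently rewire a fraction p of G's edges to uniformly random non-edges."""
    H = G.copy()
    nodes = list(H.nodes())
    for u, v in list(H.edges()):
        if random.random() < p:
            H.remove_edge(u, v)
            while True:
                a, b = random.sample(nodes, 2)
                if not H.has_edge(a, b):
                    H.add_edge(a, b)
                    break
    return H

ba = nx.barabasi_albert_graph(100, 4)      # B-A graph, 4 edges per new node
noisy = perturb_edges(ba, p=0.4)           # 40% of edges rewired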

Figure 4: MMD performance of different approaches on degree (Left)


and clustering coefficient (Right) under different noise levels.

GraphRNN maintains strong performance as we interpolate between these
structures, indicating high robustness and versatility.

24
