
Adversarially Regularized Graph Autoencoder for Graph Embedding

Shirui Pan1∗, Ruiqi Hu1∗, Guodong Long1, Jing Jiang1, Lina Yao2, Chengqi Zhang1
1 Centre for Artificial Intelligence, FEIT, University of Technology Sydney, Australia
2 School of Computer Science and Engineering, University of New South Wales, Australia
∗ These authors contributed equally to this work.

arXiv:1802.04407v2 [cs.LG] 8 Jan 2019

Abstract

Graph embedding is an effective method to represent graph data in a low dimensional space for graph analytics. Most existing embedding algorithms typically focus on preserving the topological structure or minimizing the reconstruction errors of graph data, but they have mostly ignored the data distribution of the latent codes from the graphs, which often results in inferior embedding in real-world graph data. In this paper, we propose a novel adversarial graph embedding framework for graph data. The framework encodes the topological structure and node content in a graph to a compact representation, on which a decoder is trained to reconstruct the graph structure. Furthermore, the latent representation is enforced to match a prior distribution via an adversarial training scheme. To learn a robust embedding, two variants of adversarial approaches, adversarially regularized graph autoencoder (ARGA) and adversarially regularized variational graph autoencoder (ARVGA), are developed. Experimental studies on real-world graphs validate our design and demonstrate that our algorithms outperform baselines by a wide margin in link prediction, graph clustering, and graph visualization tasks.

1 Introduction

Graphs are essential tools to capture and model complicated relationships among data. In a variety of graph applications, including protein-protein interaction networks, social media, and citation networks, analyzing graph data plays an important role in various data mining tasks including node or graph classification [Kipf and Welling, 2016a; Pan et al., 2016a], link prediction [Wang et al., 2017c], and node clustering [Wang et al., 2017a]. However, the high computational complexity, low parallelizability, and inapplicability of machine learning methods to graph data have made these graph analytic tasks profoundly challenging [Cui et al., 2017]. Recently, graph embedding has emerged as a general approach to these problems.

Graph embedding converts graph data into a low dimensional, compact, and continuous feature space. The key idea is to preserve the topological structure, vertex content, and other side information [Zhang et al., 2017a]. This new learning paradigm has shifted the tasks of seeking complex models for classification, clustering, and link prediction to learning a robust representation of the graph data, so that any graph analytic task can be easily performed by employing simple traditional models (e.g., a linear SVM for the classification task). This merit has motivated a number of studies in this area [Cai et al., 2017; Goyal and Ferrara, 2017].

Graph embedding algorithms can be classified into three categories: probabilistic models, matrix factorization-based algorithms, and deep learning-based algorithms. Probabilistic models like DeepWalk [Perozzi et al., 2014], node2vec [Grover and Leskovec, 2016], and LINE [Tang et al., 2015] attempt to learn graph embedding by extracting different patterns from the graph. The captured patterns or walks include global structural equivalence, local neighborhood connectivities, and other various order proximities. Compared with classical methods such as Spectral Clustering [Tang and Liu, 2011], these graph embedding algorithms perform more effectively and are scalable to large graphs.

Matrix factorization-based algorithms, such as GraRep [Cao et al., 2015], HOPE [Ou et al., 2016], and M-NMF [Wang et al., 2017b], pre-process the graph structure into an adjacency matrix and obtain the embedding by decomposing the adjacency matrix. Recently, it has been shown that many probabilistic algorithms are equivalent to matrix factorization approaches [Qiu et al., 2017]. Deep learning approaches, especially autoencoder-based methods, are also widely studied for graph embedding. SDNE [Wang et al., 2016] and DNGR [Cao et al., 2016] employ deep autoencoders to preserve the graph proximities and model positive pointwise mutual information (PPMI). The MGAE algorithm utilizes a marginalized single-layer autoencoder to learn a representation for clustering [Wang et al., 2017a].

The approaches above are typically unregularized approaches which mainly focus on preserving the structure relationship (probabilistic approaches) or minimizing the reconstruction error (matrix factorization or deep learning methods). They have mostly ignored the data distribution of the latent codes. In practice, unregularized embedding approaches often learn a degenerate identity mapping where the latent code space is free of any structure [Makhzani et al., 2015], and can easily result in poor representation when dealing with real-world sparse and noisy graph data. One common way to handle this problem is to introduce some regularization to the latent codes and enforce them to follow some prior data distribution [Makhzani et al., 2015]. Recently, generative adversarial frameworks [Donahue et al., 2016; Radford et al., 2015] have also been developed for learning robust latent representations. However, none of these frameworks is specifically designed for graph data, where both topological structure and content information need to be embedded into a latent space.

In this paper, we propose a novel adversarial framework with two variants, namely adversarially regularized graph autoencoder (ARGA) and adversarially regularized variational graph autoencoder (ARVGA), for graph embedding. The theme of our framework is to not only minimize the reconstruction errors of the graph structure but also to enforce the latent codes to match a prior distribution. By exploiting both graph structure and node content with a graph convolutional network, our algorithms encode the graph data in the latent space. With a decoder aiming at reconstructing the topological graph information, we further incorporate an adversarial training scheme to regularize the latent codes and learn a robust graph representation. The adversarial training module aims to discriminate whether the latent codes are from a real prior distribution or from the graph encoder. The graph encoder learning and adversarial regularization are jointly optimized in a unified framework so that each can be beneficial to the other and finally lead to a better graph embedding. The experimental results on benchmark datasets demonstrate the superb performance of our algorithms on three unsupervised graph analytic tasks, namely link prediction, node clustering, and graph visualization. Our contributions can be summarized below:

• We propose a novel adversarially regularized framework for graph embedding, which represents topological structure and node content in a continuous vector space. Our framework learns the embedding to minimize the reconstruction error while enforcing the latent codes to match a prior distribution.

• We develop two variants of adversarial approaches, adversarially regularized graph autoencoder (ARGA) and adversarially regularized variational graph autoencoder (ARVGA), to learn the graph embedding.

• Experiments on benchmark graph datasets demonstrate that our graph embedding approaches outperform the others on three unsupervised tasks.

2 Related Work

Graph Embedding Models. From the perspective of information exploration, graph embedding algorithms can also be separated into two groups: topological embedding approaches and content enhanced embedding methods.

Topological embedding approaches assume that only topological structure information is available, and the learning objective is to preserve the topological information maximally. Perozzi et al. propose the DeepWalk model to learn node embeddings from a collection of random walks [Perozzi et al., 2014]. Since then, a number of probabilistic models such as node2vec [Grover and Leskovec, 2016] and LINE [Tang et al., 2015] have been developed. As a graph can be mathematically represented as an adjacency matrix, many matrix factorization approaches such as GraRep [Cao et al., 2015], HOPE [Ou et al., 2016], and M-NMF [Wang et al., 2017b] have been proposed to learn the latent representation of a graph. Recently, deep learning models have been widely exploited to learn the graph embedding. These algorithms preserve the first and second order proximities [Wang et al., 2016], or reconstruct the positive pointwise mutual information (PPMI) [Cao et al., 2016], via different variants of autoencoders.

Content enhanced embedding methods assume node content information is available and exploit both topological information and content features simultaneously. TADW [Yang et al., 2015] presents a matrix factorization approach to explore node features. TriDNR [Pan et al., 2016b] captures structure, node content, and label information via a tri-party neural network architecture. UPP-SNE employs an approximated kernel mapping scheme to exploit user profile features to enhance the embedding learning of users in social networks [Zhang et al., 2017b].

Unfortunately, the above algorithms largely ignore the latent distribution of the embedding, which may result in poor representation in practice. In this paper, we explore adversarial training methods to address this issue.

Adversarial Models. Our method is motivated by the generative adversarial network (GAN) [Goodfellow et al., 2014]. GAN plays an adversarial game with two linked models: the generator G and the discriminator D. The discriminator can be a multi-layer perceptron which discriminates whether an input sample comes from the data distribution or from the generator. Simultaneously, the generator is trained to generate samples that convince the discriminator that they come from the prior data distribution. Due to its effectiveness in many unsupervised tasks, a number of adversarial training algorithms have recently been proposed [Donahue et al., 2016; Radford et al., 2015].

Recently, Makhzani et al. proposed the adversarial autoencoder (AAE) to learn the latent embedding by merging the adversarial mechanism into the autoencoder [Makhzani et al., 2015]. However, it is designed for general data rather than graph data. Dai et al. applied the adversarial mechanism to graphs, but their approach can only exploit the topological information [Dai et al., 2017]. In contrast, our algorithm is more flexible and can handle both topological and content information for graph data.

3 Problem Definition and Framework

A graph is represented as G = {V, E, X}, where V = {v_i}, i = 1, ..., n, is the set of nodes in the graph and e_{i,j} = <v_i, v_j> ∈ E represents a linkage encoding the citation edge between two nodes. The topological structure of graph G can be represented by an adjacency matrix A, where A_{i,j} = 1 if e_{i,j} ∈ E, and A_{i,j} = 0 otherwise. x_i ∈ X denotes the content features associated with each node v_i.

Given a graph G, our purpose is to map the nodes v_i ∈ V to low-dimensional vectors z_i ∈ R^d, formally f : (A, X) → Z, where z_i^T is the i-th row of the matrix Z ∈ R^{n×d}. Here n is the number of nodes and d is the dimension of the embedding. We take Z as the embedding matrix, and the embeddings should well preserve the topological structure A as well as the content information X.
Figure 1: The architecture of the adversarially regularized graph autoencoder (ARGA). The upper tier is a graph convolutional autoencoder that reconstructs a graph A from an embedding Z, which is generated by an encoder that exploits the graph structure A and the node content matrix X. The lower tier is an adversarial network trained to discriminate whether a sample is generated from the embedding or from a prior distribution. The adversarially regularized variational graph autoencoder (ARVGA) is similar to ARGA except that it employs a variational graph autoencoder in the upper tier (see Algorithm 1 for details).

3.1 Overall Framework

Our objective is to learn a robust embedding given a graph G = {V, E, X}. To this end, we leverage an adversarial architecture with a graph autoencoder to directly process the entire graph and learn a robust embedding. Figure 1 demonstrates the workflow of ARGA, which consists of two modules: the graph autoencoder and the adversarial network.

• Graph Convolutional Autoencoder. The autoencoder takes the graph structure A and the node content X as inputs to learn a latent representation Z, and then reconstructs the graph structure A from Z.

• Adversarial Regularization. The adversarial network forces the latent codes to match a prior distribution through an adversarial training module, which discriminates whether the current latent code z_i ∈ Z comes from the encoder or from the prior distribution.

4 Proposed Algorithm

4.1 Graph Convolutional Autoencoder

The graph convolutional autoencoder aims to embed a graph G = {V, E, X} in a low-dimensional space. Two key questions arise: (1) how to integrate both graph structure A and node content X in an encoder, and (2) what sort of information should be reconstructed via the decoder?

Graph Convolutional Encoder Model G(X, A). To represent both graph structure A and node content X in a unified framework, we develop a variant of the graph convolutional network (GCN) [Kipf and Welling, 2016a] as a graph encoder. Our GCN extends the operation of convolution to graph data in the spectral domain and learns a layer-wise transformation by a spectral convolution function f(Z^{(l)}, A | W^{(l)}):

Z^{(l+1)} = f(Z^{(l)}, A | W^{(l)})    (1)

Here, Z^{(l)} is the input to the convolution and Z^{(l+1)} is the output after convolution. We have Z^{(0)} = X ∈ R^{n×m} (n nodes and m features) for our problem. W^{(l)} is a matrix of filter parameters to be learned in the neural network. If f(Z^{(l)}, A | W^{(l)}) is well defined, we can build arbitrarily deep convolutional neural networks efficiently.

Each layer of our graph convolutional network can be expressed with the function f(Z^{(l)}, A | W^{(l)}) as follows:

f(Z^{(l)}, A | W^{(l)}) = φ(D̃^{-1/2} Ã D̃^{-1/2} Z^{(l)} W^{(l)}),    (2)

where Ã = A + I and D̃_{ii} = Σ_j Ã_{ij}. I is the identity matrix and φ is an activation function such as Relu(t) = max(0, t) or sigmoid(t) = 1/(1 + e^{-t}). Overall, the graph encoder G(X, A) is constructed with a two-layer GCN. In our paper, we develop two variants of the encoder, namely the Graph Encoder and the Variational Graph Encoder.

The Graph Encoder is constructed as follows:

Z^{(1)} = f_Relu(X, A | W^{(0)});    (3)
Z^{(2)} = f_linear(Z^{(1)}, A | W^{(1)}).    (4)

Relu(·) and linear activation functions are used for the first and second layers, respectively. Our graph convolutional encoder G(X, A) = q(Z | X, A) encodes both graph structure and node content into a representation Z = q(Z | X, A) = Z^{(2)}.
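To make Eqs. (1)-(4) concrete, the following is a minimal NumPy sketch of the two-layer encoder. Dense matrices and randomly initialized weights are used purely for illustration, and the helper names (normalize_adj, gcn_encoder) are ours rather than the authors':

```python
# Minimal sketch of the two-layer GCN encoder of Eqs. (1)-(4).
import numpy as np

def normalize_adj(adj):
    """Return D~^{-1/2} (A + I) D~^{-1/2} as used in Eq. (2)."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_encoder(adj, feats, w0, w1):
    """Two-layer graph convolutional encoder G(X, A) -> Z (Eqs. 3-4)."""
    a_hat = normalize_adj(adj)
    z1 = np.maximum(a_hat @ feats @ w0, 0.0)   # Relu layer, Eq. (3)
    z2 = a_hat @ z1 @ w1                       # linear layer, Eq. (4)
    return z2

# Toy usage: 4 nodes, 5 features, 2-dimensional embedding.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 5))
Z = gcn_encoder(A, X, rng.normal(size=(5, 16)), rng.normal(size=(16, 2)))
print(Z.shape)  # (4, 2): one d-dimensional embedding per node
```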
A Variational Graph Encoder is defined by an inference model:

q(Z | X, A) = ∏_{i=1}^{n} q(z_i | X, A),    (5)
q(z_i | X, A) = N(z_i | μ_i, diag(σ^2)).    (6)

Here, μ = Z^{(2)} is the matrix of mean vectors z_i; similarly, log σ = f_linear(Z^{(1)}, A | W'^{(1)}), which shares the weights W^{(0)} with μ in the first layer in Eq. (3).
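For the ARVGA variant, a sample Z is typically drawn from q(z_i | X, A) with the standard reparameterization trick used by VGAE-style models. The paper does not spell this step out, so the snippet below is an assumption based on Eqs. (5)-(6):

```python
# Hedged sketch: sampling Z from the variational encoder of Eqs. (5)-(6)
# via the reparameterization trick, z = mu + sigma * eps with eps ~ N(0, I).
# `mu` and `log_sigma` are assumed to be the two GCN outputs described above.
import numpy as np

def sample_latent(mu, log_sigma, rng=np.random.default_rng()):
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps  # one draw per node embedding
```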
Decoder Model. Our decoder model is used to reconstruct the graph data. We can reconstruct either the graph structure A, the content information X, or both. In our paper, we propose to reconstruct the graph structure A, which provides more flexibility in the sense that our algorithm will still function properly even if there is no content information X available (e.g., X = I). Our decoder p(Â | Z) predicts whether there is a link between two nodes. More specifically, we train a link prediction layer based on the graph embedding:

p(Â | Z) = ∏_{i=1}^{n} ∏_{j=1}^{n} p(Â_{ij} | z_i, z_j);    (7)
p(Â_{ij} = 1 | z_i, z_j) = sigmoid(z_i^T z_j).    (8)

Graph Autoencoder Model. The embedding Z and the reconstructed graph Â can be presented as follows:

Â = sigmoid(Z Z^T), where Z = q(Z | X, A).    (9)
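Eqs. (7)-(9) amount to a parameter-free inner-product decoder; a minimal sketch (names are illustrative only) could be:

```python
# Sketch of the inner-product decoder in Eqs. (7)-(9): the reconstructed
# adjacency is sigmoid(Z Z^T).
import numpy as np

def decode(Z):
    logits = Z @ Z.T                      # pairwise inner products z_i^T z_j
    return 1.0 / (1.0 + np.exp(-logits))  # p(A_hat_ij = 1 | z_i, z_j), Eq. (8)
```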
Optimization. For the graph encoder, we minimize the reconstruction error of the graph data by:

L_0 = E_{q(Z|X,A)}[log p(Â | Z)].    (10)

For the variational graph encoder, we optimize the variational lower bound as follows:

L_1 = E_{q(Z|X,A)}[log p(Â | Z)] − KL[q(Z | X, A) || p(Z)],    (11)

where KL[q(·) || p(·)] is the Kullback-Leibler divergence between q(·) and p(·). We also take a Gaussian prior p(Z) = ∏_i p(z_i) = ∏_i N(z_i | 0, I).

4.2 Adversarial Model D(Z)

The key idea of our model is to enforce the latent representation Z to match a prior distribution, which is achieved by an adversarial training model. The adversarial model is built on a standard multi-layer perceptron (MLP) whose output layer has only one dimension with a sigmoid function. The adversarial model acts as a discriminator to distinguish whether a latent code is from the prior p_z (positive) or from the graph encoder G(X, A) (negative). By minimizing the cross-entropy cost for training the binary classifier, the embedding will finally be regularized and improved during the training process. The cost can be computed as follows:

− (1/2) E_{z∼p_z}[log D(Z)] − (1/2) E_X[log(1 − D(G(X, A)))].    (12)

In our paper, we use a simple Gaussian distribution as p_z.

Adversarial Graph Autoencoder Model. The objective for training the encoder model with the discriminator D(Z) can be written as follows:

min_G max_D  E_{z∼p_z}[log D(Z)] + E_{x∼p(x)}[log(1 − D(G(X, A)))],    (13)

where G(X, A) and D(Z) indicate the generator and discriminator explained above.
Algorithm 1 Adversarially Regularized Graph Embedding
Require:
  G = {V, E, X}: a graph with links and features;
  T: the number of iterations;
  K: the number of steps for iterating the discriminator;
  d: the dimension of the latent variable.
Ensure: Z ∈ R^{n×d}
1: for iteration = 1, 2, 3, ..., T do
2:   Generate the latent variables matrix Z through Eq. (4);
3:   for k = 1, 2, ..., K do
4:     Sample m entities {z^(1), ..., z^(m)} from the latent matrix Z;
5:     Sample m entities {a^(1), ..., a^(m)} from the prior distribution p_z;
6:     Update the discriminator with its stochastic gradient:
         ∇ (1/m) Σ_{i=1}^{m} [log D(a^(i)) + log(1 − D(z^(i)))]
     end for
7:   Update the graph autoencoder with its stochastic gradient by Eq. (10) for ARGA or Eq. (11) for ARVGA;
   end for
8: return Z ∈ R^{n×d}

4.3 Algorithm Explanation

Algorithm 1 is our proposed framework. Given a graph G, step 2 obtains the latent variables matrix Z from the graph convolutional encoder. Then we take the same number of samples from the generated Z and from the real data distribution p_z in steps 4 and 5, respectively, to update the discriminator with the cross-entropy cost computed in step 6. After K runs of training the discriminator, the graph encoder tries to confuse the trained discriminator and updates itself with the generated gradient in step 7. We can use Eq. (10) to train the adversarially regularized graph autoencoder (ARGA), or Eq. (11) to train the adversarially regularized variational graph autoencoder (ARVGA), respectively. Finally, we return the graph embedding Z ∈ R^{n×d} in step 8.
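The sketch below shows one plausible wiring of the alternating updates in Algorithm 1, assuming an encoder module implementing Eqs. (3)-(4), the discriminator sketched above, and the inner-product decoder of Eq. (9). Optimizer choices and all names are ours and not a transcription of the authors' code:

```python
# Hedged sketch of the alternating training scheme in Algorithm 1.
# `encoder(X, A_norm)` is assumed to return the n x d latent matrix Z;
# `adj_label` is a float {0,1} target adjacency used for reconstruction.
import torch
import torch.nn.functional as F

def train_arga(encoder, discriminator, X, A_norm, adj_label, T=200, K=1, lr=0.001):
    opt_ae = torch.optim.Adam(encoder.parameters(), lr=lr)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr)
    for _ in range(T):
        # --- discriminator updates (steps 3-6) ---
        for _ in range(K):
            fake = encoder(X, A_norm).detach()   # latent codes from the encoder
            real = torch.randn_like(fake)        # samples from the Gaussian prior p_z
            d_loss = -(torch.log(discriminator(real)).mean()
                       + torch.log(1.0 - discriminator(fake)).mean())
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # --- encoder/decoder update (step 7) ---
        Z = encoder(X, A_norm)
        recon = torch.sigmoid(Z @ Z.t())         # Eq. (9)
        g_loss = (F.binary_cross_entropy(recon, adj_label)
                  - torch.log(discriminator(Z)).mean())  # also try to fool D
        opt_ae.zero_grad(); g_loss.backward(); opt_ae.step()
    return encoder(X, A_norm).detach()           # step 8: the final embedding Z
```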
5 Experiments

We report our results on three unsupervised graph analytic tasks: link prediction, node clustering, and graph visualization. The benchmark graph datasets used in the paper are summarized in Table 1. Each data set consists of scientific publications as nodes and citation relationships as edges. The features are unique words in each document.

5.1 Link Prediction

Baselines. We compared our algorithms against state-of-the-art algorithms for the link prediction task:
Data Set # Nodes # Links # Content Words # Features
Cora 2,708 5,429 3,880,564 1,433
Citeseer 3,327 4,732 12,274,336 3,703
PubMed 19,717 44,338 9,858,500 500

Table 1: Real-world Graph Datasets Used in the Paper

• DeepWalk [Perozzi et al., 2014]: a network representation approach which encodes social relations into a continuous vector space.

• Spectral Clustering [Tang and Liu, 2011]: an effective approach for learning social embedding.

• GAE [Kipf and Welling, 2016b]: the most recent autoencoder-based unsupervised framework for graph data, which naturally leverages both topological and content information.

• VGAE [Kipf and Welling, 2016b]: a variational graph autoencoder approach for graph embedding with both topological and content information.

• ARGA: our proposed adversarially regularized autoencoder algorithm, which uses a graph autoencoder to learn the embedding.

• ARVGA: our proposed algorithm, which uses a variational graph autoencoder to learn the embedding.

Metrics. We report the results in terms of the AUC score (the area under the receiver operating characteristic curve) and the average precision (AP) score [Kipf and Welling, 2016b]. We conduct each experiment 10 times and report the mean values with the standard errors as the final scores. Each dataset is separated into a training set, a testing set, and a validation set. The validation set contains 5% of the citation edges for hyperparameter optimization, the test set holds 10% of the citation edges to verify the performance, and the rest are used for training.
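For reference, the two scores can be computed from the decoder's edge probabilities with scikit-learn roughly as follows (a hedged sketch; variable and function names are ours):

```python
# Hedged sketch: AUC and AP for link prediction, scoring held-out positive
# edges against sampled negative (non-)edges.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def score_links(Z, pos_edges, neg_edges):
    """Z: n x d embedding; *_edges: arrays of (i, j) node index pairs."""
    def probs(edges):
        logits = np.sum(Z[edges[:, 0]] * Z[edges[:, 1]], axis=1)  # z_i^T z_j
        return 1.0 / (1.0 + np.exp(-logits))                      # Eq. (8)
    y_score = np.concatenate([probs(pos_edges), probs(neg_edges)])
    y_true = np.concatenate([np.ones(len(pos_edges)), np.zeros(len(neg_edges))])
    return roc_auc_score(y_true, y_score), average_precision_score(y_true, y_score)
```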
Parameter Settings. For the Cora and Citeseer data sets, we train all autoencoder-related models for 200 iterations and optimize them with the Adam algorithm. Both the learning rate and the discriminator learning rate are set to 0.001. As the PubMed data set is relatively large (around 20,000 nodes), we iterate 2,000 times for adequate training, with a 0.008 discriminator learning rate and a 0.001 learning rate. We construct encoders with a 32-neuron hidden layer and a 16-neuron embedding layer for all the experiments, and all the discriminators are built with two hidden layers (16 neurons and 64 neurons, respectively). For the rest of the baselines, we retain the settings described in the corresponding papers.

Experimental Results. The details of the experimental results on link prediction are shown in Table 2. The results show that by incorporating an effective adversarial training module into our graph convolutional autoencoder, ARGA and ARVGA achieve outstanding performance: all AP and AUC scores are as high as 92% on all three data sets. Compared with all the baselines, ARGE improves the AP score by around 2.5% compared with VGAE incorporating node features, by 11% compared with VGAE without node features, and by 15.5% and 10.6% compared with DeepWalk and Spectral Clustering, respectively, on the large PubMed data set.

Figure 2: Average performance on different dimensions of the embedding. (A) Average Precision score; (B) AUC score.

Parameter Study. We vary the dimension of the embedding from 8 neurons to 1024 and report the results in Fig. 2. The results in both Fig. 2 (A) and (B) reveal similar trends: when increasing the dimension of the embedding from 8 neurons to 16 neurons, the performance of the embedding on link prediction steadily rises; when we further increase the number of neurons at the embedding layer to 32, the performance fluctuates, although the results for both the AP score and the AUC score remain good. It is worth mentioning that if we continue to add more neurons, for example 64, 128, or 1024, the performance rises markedly.

5.2 Node Clustering

For the node clustering task, we first learn the graph embedding, and then perform the K-means clustering algorithm on the embedding.

Baselines. We compare both embedding-based approaches and approaches designed directly for graph clustering. In addition to the baselines used for link prediction, we also include baselines which are designed for clustering:

1. K-means is a classical method and also the foundation of many clustering algorithms.

2. Graph Encoder [Tian et al., 2014] learns graph embedding for spectral graph clustering.

3. DNGR [Cao et al., 2016] trains a stacked denoising autoencoder for graph embedding.

4. RTM [Chang and Blei, 2009] learns the topic distributions of each document from both text and citations.

5. RMSC [Xia et al., 2014] employs a multi-view learning approach for graph clustering.

6. TADW [Yang et al., 2015] applies matrix factorization for network representation learning.

Here the first three algorithms only exploit the graph structure, while the last three use both graph structure and node content for the graph clustering task.

Metrics. Following [Xia et al., 2014], we employ five metrics to validate the clustering results: accuracy (Acc), normalized mutual information (NMI), precision, F-score (F1), and average rand index (ARI).
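A hedged scikit-learn sketch of this clustering pipeline is given below. NMI and the (adjusted) Rand index are available directly; accuracy, F1, and precision additionally require matching cluster ids to ground-truth labels (e.g., via Hungarian matching), which is omitted here:

```python
# Hedged sketch: K-means on the learned embedding Z, scored with NMI and
# the adjusted Rand index (used here as a stand-in for the ARI metric above).
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def cluster_and_score(Z, y_true, n_clusters):
    y_pred = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(Z)
    return (normalized_mutual_info_score(y_true, y_pred),
            adjusted_rand_score(y_true, y_pred))
```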
Approaches   Cora AUC      Cora AP       Citeseer AUC   Citeseer AP    PubMed AUC    PubMed AP
SC           84.6 ± 0.01   88.5 ± 0.00   80.5 ± 0.01    85.0 ± 0.01    84.2 ± 0.02   87.8 ± 0.01
DW           83.1 ± 0.01   85.0 ± 0.00   80.5 ± 0.02    83.6 ± 0.01    84.4 ± 0.00   84.1 ± 0.00
GAE∗         84.3 ± 0.02   88.1 ± 0.01   78.7 ± 0.02    84.1 ± 0.02    82.2 ± 0.01   87.4 ± 0.00
VGAE∗        84.0 ± 0.02   87.7 ± 0.01   78.9 ± 0.03    84.1 ± 0.02    82.7 ± 0.01   87.5 ± 0.01
GAE          91.0 ± 0.02   92.0 ± 0.03   89.5 ± 0.04    89.9 ± 0.05    96.4 ± 0.00   96.5 ± 0.00
VGAE         91.4 ± 0.01   92.6 ± 0.01   90.8 ± 0.02    92.0 ± 0.02    94.4 ± 0.02   94.7 ± 0.02
ARGE         92.4 ± 0.003  93.2 ± 0.003  91.9 ± 0.003   93.0 ± 0.003   96.8 ± 0.001  97.1 ± 0.001
ARVGE        92.4 ± 0.004  92.6 ± 0.004  92.4 ± 0.003   93.0 ± 0.003   96.5 ± 0.001  96.8 ± 0.001

Table 2: Results for Link Prediction. GAE∗ and VGAE∗ are variants of GAE which only explore the topological structure, i.e., X = I.

Figure 3: The Cora data visualization comparison. From left to right: embeddings from our ARGA, VGAE, GAE, DeepWalk, and Spectral
Clustering. The different colors represent different groups.

Cora          Acc    NMI    F1     Precision  ARI
K-means       0.492  0.321  0.368  0.369      0.230
Spectral      0.367  0.127  0.318  0.193      0.031
GraphEncoder  0.325  0.109  0.298  0.182      0.006
DeepWalk      0.484  0.327  0.392  0.361      0.243
DNGR          0.419  0.318  0.340  0.266      0.142
RTM           0.440  0.230  0.307  0.332      0.169
RMSC          0.407  0.255  0.331  0.227      0.090
TADW          0.560  0.441  0.481  0.396      0.332
GAE           0.596  0.429  0.595  0.596      0.347
VGAE          0.609  0.436  0.609  0.609      0.346
ARGE          0.640  0.449  0.619  0.646      0.352
ARVGE         0.638  0.450  0.627  0.624      0.374

Table 3: Clustering Results on Cora

Citeseer      Acc    NMI    F1     Precision  ARI
K-means       0.540  0.305  0.409  0.405      0.279
Spectral      0.239  0.056  0.299  0.179      0.010
GraphEncoder  0.225  0.033  0.301  0.179      0.010
DeepWalk      0.337  0.088  0.270  0.248      0.092
DNGR          0.326  0.180  0.300  0.200      0.044
RTM           0.451  0.239  0.342  0.349      0.203
RMSC          0.295  0.139  0.320  0.204      0.049
TADW          0.455  0.291  0.414  0.312      0.228
GAE           0.408  0.176  0.372  0.418      0.124
VGAE          0.344  0.156  0.308  0.349      0.093
ARGE          0.573  0.350  0.546  0.573      0.341
ARVGE         0.544  0.261  0.529  0.549      0.245

Table 4: Clustering Results on Citeseer

Experimental Results. The clustering results on the Cora and Citeseer data sets are given in Table 3 and Table 4. The results show that ARGA and ARVGA achieve a dramatic improvement on all five metrics compared with all the other baselines. For instance, on Citeseer, ARGA increases the accuracy by between 6.1% (over K-means) and 154.7% (over GraphEncoder); increases the F1 score by between 31.9% (over TADW) and 102.2% (over DeepWalk); and increases the NMI by between 14.8% (over K-means) and 124.4% (over VGAE). The wide margin between ARGE and GAE (and the others) further demonstrates the superiority of our adversarially regularized graph autoencoder.

5.3 Graph Visualization

We visualize the Cora data in a two-dimensional space by applying the t-SNE algorithm [Maaten, 2014] to the learned embedding. The results in Fig. 3 validate that, by applying adversarial training to the graph data, we can obtain a more meaningful layout of the graph data.
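A minimal sketch of this visualization step (assuming the learned embedding Z and node labels are available as arrays; all names are illustrative):

```python
# Hedged sketch: project the learned embedding Z to 2-D with t-SNE and plot it,
# coloring points by their group labels as in Figure 3.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_embedding(Z, labels):
    xy = TSNE(n_components=2, random_state=0).fit_transform(Z)
    plt.scatter(xy[:, 0], xy[:, 1], c=labels, s=5, cmap="tab10")
    plt.savefig("cora_tsne.png", dpi=200)
```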
6 Conclusion

In this paper, we proposed a novel adversarial graph embedding framework for graph data. We argue that most existing graph embedding algorithms are unregularized methods that ignore the data distribution of the latent representation and suffer from inferior embedding in real-world graph data. We proposed an adversarial training scheme to regularize the latent codes and enforce them to match a prior distribution. The adversarial module is jointly learned with a graph convolutional autoencoder to produce a robust representation. Experimental results demonstrate that our algorithms ARGA and ARVGA outperform baselines in link prediction, node clustering, and graph visualization tasks.
Acknowledgements

This research was funded by the Australian Government through the Australian Research Council (ARC) under grants 1) LP160100630, in partnership with the Australian Government Department of Health, and 2) LP150100671, in partnership with the Australian Research Alliance for Children and Youth (ARACY) and Global Business College Australia (GBCA). We acknowledge the support of NVIDIA Corporation and MakeMagic Australia with the donation of the GPU used for this research.

References

[Cai et al., 2017] H. Cai, V. W. Zheng, and K. C.-C. Chang. A comprehensive survey of graph embedding: Problems, techniques and applications. arXiv preprint arXiv:1709.07604, 2017.
[Cao et al., 2015] S. Cao, W. Lu, and Q. Xu. GraRep: Learning graph representations with global structural information. In CIKM, pages 891–900. ACM, 2015.
[Cao et al., 2016] S. Cao, W. Lu, and Q. Xu. Deep neural networks for learning graph representations. In AAAI, pages 1145–1152, 2016.
[Chang and Blei, 2009] J. Chang and D. Blei. Relational topic models for document networks. In Artificial Intelligence and Statistics, pages 81–88, 2009.
[Cui et al., 2017] P. Cui, X. Wang, J. Pei, et al. A survey on network embedding. arXiv preprint arXiv:1711.08752, 2017.
[Dai et al., 2017] Q. Dai, Q. Li, J. Tang, et al. Adversarial network embedding. arXiv preprint arXiv:1711.07838, 2017.
[Donahue et al., 2016] J. Donahue, P. Krähenbühl, and T. Darrell. Adversarial feature learning. arXiv preprint arXiv:1605.09782, 2016.
[Goodfellow et al., 2014] I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al. Generative adversarial nets. In NIPS, pages 2672–2680, 2014.
[Goyal and Ferrara, 2017] P. Goyal and E. Ferrara. Graph embedding techniques, applications, and performance: A survey. arXiv preprint arXiv:1705.02801, 2017.
[Grover and Leskovec, 2016] A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. In SIGKDD, pages 855–864. ACM, 2016.
[Kipf and Welling, 2016a] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
[Kipf and Welling, 2016b] T. N. Kipf and M. Welling. Variational graph auto-encoders. NIPS, 2016.
[Maaten, 2014] L. v. d. Maaten. Accelerating t-SNE using tree-based algorithms. JMLR, 15(1):3221–3245, 2014.
[Makhzani et al., 2015] A. Makhzani, J. Shlens, N. Jaitly, et al. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.
[Ou et al., 2016] M. Ou, P. Cui, J. Pei, et al. Asymmetric transitivity preserving graph embedding. In KDD, pages 1105–1114, 2016.
[Pan et al., 2016a] S. Pan, J. Wu, X. Zhu, et al. Joint structure feature exploration and regularization for multi-task graph classification. TKDE, 28(3):715–728, 2016.
[Pan et al., 2016b] S. Pan, J. Wu, X. Zhu, et al. Tri-party deep network representation. In IJCAI, pages 1895–1901, 2016.
[Perozzi et al., 2014] B. Perozzi, R. Al-Rfou, and S. Skiena. DeepWalk: Online learning of social representations. In SIGKDD, pages 701–710. ACM, 2014.
[Qiu et al., 2017] J. Qiu, Y. Dong, H. Ma, et al. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. arXiv preprint arXiv:1710.02971, 2017.
[Radford et al., 2015] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
[Tang and Liu, 2011] L. Tang and H. Liu. Leveraging social media networks for classification. DMKD, 23(3):447–478, 2011.
[Tang et al., 2015] J. Tang, M. Qu, M. Wang, et al. LINE: Large-scale information network embedding. In WWW, pages 1067–1077, 2015.
[Tian et al., 2014] F. Tian, B. Gao, Q. Cui, et al. Learning deep representations for graph clustering. In AAAI, pages 1293–1299, 2014.
[Wang et al., 2016] D. Wang, P. Cui, and W. Zhu. Structural deep network embedding. In SIGKDD, pages 1225–1234. ACM, 2016.
[Wang et al., 2017a] C. Wang, S. Pan, G. Long, et al. MGAE: Marginalized graph autoencoder for graph clustering. In CIKM, pages 889–898. ACM, 2017.
[Wang et al., 2017b] X. Wang, P. Cui, J. Wang, et al. Community preserving network embedding. In AAAI, pages 203–209, 2017.
[Wang et al., 2017c] Z. Wang, C. Chen, and W. Li. Predictive network representation learning for link prediction. In SIGIR, pages 969–972. ACM, 2017.
[Xia et al., 2014] R. Xia, Y. Pan, L. Du, et al. Robust multi-view spectral clustering via low-rank and sparse decomposition. In AAAI, pages 2149–2155, 2014.
[Yang et al., 2015] C. Yang, Z. Liu, D. Zhao, et al. Network representation learning with rich text information. In IJCAI, pages 2111–2117, 2015.
[Zhang et al., 2017a] D. Zhang, J. Yin, X. Zhu, et al. Network representation learning: A survey. arXiv preprint arXiv:1801.05852, 2017.
[Zhang et al., 2017b] D. Zhang, J. Yin, X. Zhu, et al. User profile preserving social network embedding. In IJCAI, pages 3378–3384, 2017.
