Composition-Based Multi-Relational Graph Convolutional Networks
ABSTRACT
Graph Convolutional Networks (GCNs) have recently been shown to be quite suc-
cessful in modeling graph-structured data. However, the primary focus has been
on handling simple undirected graphs. Multi-relational graphs are a more general
and prevalent form of graphs where each edge has a label and direction associ-
ated with it. Most of the existing approaches to handle such graphs suffer from
over-parameterization and are restricted to learning representations of nodes only.
In this paper, we propose CompGCN, a novel Graph Convolutional framework
which jointly embeds both nodes and relations in a relational graph. CompGCN
leverages a variety of entity-relation composition operations from Knowledge
Graph Embedding techniques and scales with the number of relations. It also
generalizes several of the existing multi-relational GCN methods. We evaluate our
proposed method on multiple tasks such as node classification, link prediction,
and graph classification, and achieve demonstrably superior results. We make the
source code of CompGCN available to foster reproducible research.
1 INTRODUCTION
Graphs are one of the most expressive data structures and have been used to model a variety of
problems. Traditional neural network architectures like Convolutional Neural Networks (Krizhevsky
et al., 2012) and Recurrent Neural Networks (Hochreiter & Schmidhuber, 1997) are constrained to
handle only Euclidean data. Recently, Graph Convolutional Networks (GCNs) (Bruna et al., 2013;
Defferrard et al., 2016) have been proposed to address this shortcoming, and have been success-
fully applied to several domains such as social networks (Hamilton et al., 2017), knowledge graphs
(Schlichtkrull et al., 2017), natural language processing (Marcheggiani & Titov, 2017), drug discov-
ery (Ramsundar et al., 2019), crystal property prediction (Sanyal et al., 2018), and natural sciences
(Fout et al., 2017).
However, most of the existing research on GCNs (Kipf & Welling, 2016; Hamilton et al., 2017;
Veličković et al., 2018) has focused on learning representations of nodes in simple undirected
graphs. A more general and pervasive class of graphs is multi-relational graphs (in this paper,
multi-relational graphs refer to graphs with edges that have labels and directions). A notable ex-
ample of such graphs is knowledge graphs. Most of the existing GCN based approaches for han-
dling relational graphs (Marcheggiani & Titov, 2017; Schlichtkrull et al., 2017) suffer from over-
parameterization and are limited to learning only node representations. Hence, such methods are
not directly applicable for tasks such as link prediction which require relation embedding vectors.
Initial attempts at learning representations for relations in graphs (Monti et al., 2018; Beck et al.,
2018) have shown some performance gains on tasks like node classification and neural machine
translation.
There has been extensive research on embedding Knowledge Graphs (KG) (Nickel et al., 2016;
Wang et al., 2017), where representations of both nodes and relations are jointly learned. These meth-
ods are restricted to learning embeddings using only the link prediction objective. Even though GCNs can
Figure 1: Overview of CompGCN. Given node and relation embeddings, CompGCN performs a
composition operation $\phi(\cdot)$ over each edge in the neighborhood of a central node (e.g., Christopher
Nolan above). The composed embeddings are then convolved with specific filters $W_O$ and $W_I$ for
original and inverse relations respectively. We omit self-loops in the diagram for clarity. The messages
from all the neighbors are then aggregated to get an updated embedding of the central node. Also,
the relation embeddings are transformed using a separate weight matrix. Please refer to Section 4
for details.
learn from task-specific objectives such as classification, their application has been largely restricted
to non-relational graph setting. Thus, there is a need for a framework which can utilize KG embed-
ding techniques for learning task-specific node and relation embeddings. In this paper, we propose
CompGCN, a novel GCN framework for multi-relational graphs which systematically leverages
entity-relation composition operations from knowledge graph embedding techniques. CompGCN
addresses the shortcomings of previously proposed GCN models by jointly learning vector repre-
sentations for both nodes and relations in the graph. An overview of CompGCN is presented in
Figure 1. The contributions of our work can be summarized as follows:
1. We propose CompGCN, a novel framework for incorporating multi-relational information in
Graph Convolutional Networks, which leverages a variety of composition operations from knowl-
edge graph embedding techniques to jointly embed both nodes and relations in a graph.
2. We demonstrate that the CompGCN framework generalizes several existing multi-relational GCN
methods (Proposition 4.1) and also scales with the increase in the number of relations in the graph
(Section 6.3).
3. Through extensive experiments on tasks such as node classification, link prediction, and graph
classification, we demonstrate the effectiveness of our proposed method.
The source code of CompGCN and the datasets used in the paper have been made available at
http://github.com/malllabiisc/CompGCN.
2 RELATED WORK
Graph Convolutional Networks: GCNs generalize Convolutional Neural Networks (CNNs) to
non-Euclidean data. GCNs were first introduced by Bruna et al. (2013) and later made scalable
through efficient localized filters in the spectral domain (Defferrard et al., 2016). A first-order ap-
proximation of GCNs using Chebyshev polynomials has been proposed by Kipf & Welling (2016).
Recently, several of its extensions have also been formulated (Hamilton et al., 2017; Veličković
et al., 2018; Xu et al., 2019). Most of the existing GCN methods follow the Message Passing Neural
Network (MPNN) framework (Gilmer et al., 2017) for node aggregation. Our proposed method can
be seen as an instantiation of the MPNN framework; however, it is specialized for relational graphs.
GCNs for Multi-Relational Graphs: An extension of GCNs for relational graphs was proposed by
Marcheggiani & Titov (2017). However, they only consider direction-specific filters and ignore rela-
tions due to over-parameterization. Schlichtkrull et al. (2017) address this shortcoming by proposing
basis and block-diagonal decompositions of relation-specific filters. Weighted Graph Convolutional
Network (Shang et al., 2019) utilizes learnable relation-specific scalar weights during GCN ag-
gregation. While these methods show performance gains on node classification and link prediction,
they are limited to embedding only the nodes of the graph. Contemporary to our work, Ye et al.
(2019) have also proposed an extension of GCNs for embedding both nodes and relations in multi-
relational graphs. However, our proposed method is a more generic framework which can leverage
any KG composition operator. We compare against their method in Section 6.1.
Knowledge Graph Embedding: Knowledge graph (KG) embedding is a widely studied field
(Nickel et al., 2016; Wang et al., 2017) with applications in tasks like link prediction and ques-
tion answering (Bordes et al., 2014). Most KG embedding approaches define a score function
and train node and relation embeddings such that valid triples are assigned a higher score than
invalid ones. Based on the type of score function, KG embedding methods are classified as transla-
tional (Bordes et al., 2013; Wang et al., 2014b), semantic matching based (Yang et al., 2014; Nickel
et al., 2016), and neural network based (Socher et al., 2013; Dettmers et al., 2018). In our work, we
evaluate the performance of CompGCN on link prediction with methods of all three types.
3 BACKGROUND
In this section, we give a brief overview of Graph Convolutional Networks (GCNs) for undirected
graphs and their extension to directed relational graphs.
GCN on Undirected Graphs: Consider a graph $G = (\mathcal{V}, \mathcal{E}, \mathcal{X})$, where $\mathcal{V}$ denotes the set of vertices, $\mathcal{E}$ is the set of edges, and $\mathcal{X} \in \mathbb{R}^{|\mathcal{V}| \times d_0}$ represents the $d_0$-dimensional input features of each node. The node representation obtained from a single GCN layer is defined as $H = f(\hat{A}XW)$. Here, $\hat{A} = \tilde{D}^{-\frac{1}{2}}(A + I)\tilde{D}^{-\frac{1}{2}}$ is the normalized adjacency matrix with added self-connections, and $\tilde{D}$ is defined as $\tilde{D}_{ii} = \sum_j (A + I)_{ij}$. The model parameter is denoted by $W \in \mathbb{R}^{d_0 \times d_1}$ and $f$ is some activation function. The GCN representation $H$ encodes the immediate neighborhood of each node in the graph. To capture multi-hop dependencies in the graph, several GCN layers can be stacked, one on top of another, as follows: $H^{k+1} = f(\hat{A}H^k W^k)$, where $k$ denotes the layer number, $W^k \in \mathbb{R}^{d_k \times d_{k+1}}$ is a layer-specific parameter, and $H^0 = \mathcal{X}$.
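To make the layer concrete, here is a minimal PyTorch sketch of the single-layer update $H = f(\hat{A}XW)$, using a dense adjacency matrix for clarity; the function name and the ReLU choice for $f$ are illustrative assumptions, and real implementations use sparse message passing.

```python
import torch
import torch.nn.functional as F

def gcn_layer(A: torch.Tensor, X: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """A: (N, N) adjacency, X: (N, d0) node features, W: (d0, d1) weights."""
    A_tilde = A + torch.eye(A.size(0))               # add self-connections
    D_inv_sqrt = torch.diag(A_tilde.sum(dim=1).pow(-0.5))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt        # symmetric normalization
    return F.relu(A_hat @ X @ W)                     # H = f(A_hat X W)
```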
GCN on Multi-Relational Graphs: Now consider a multi-relational graph $G = (\mathcal{V}, \mathcal{R}, \mathcal{E}, \mathcal{X})$, where $\mathcal{R}$ denotes the set of relations, and each edge $(u, v, r)$ represents that the relation $r \in \mathcal{R}$ exists from node $u$ to $v$. The GCN formulation devised by Marcheggiani & Titov (2017) is based on the assumption that information in a directed edge flows along both directions. Hence, for each edge $(u, v, r) \in \mathcal{E}$, an inverse edge $(v, u, r^{-1})$ is included in $G$. The representation obtained after $k$ layers of directed GCN is given by

$$H^{k+1} = f(\hat{A} H^k W_r^k). \quad (1)$$

Here, $W_r^k$ denotes the relation-specific parameters of the model. However, the above formulation leads to over-parameterization with an increase in the number of relations; hence, Marcheggiani & Titov (2017) use direction-specific weight matrices. Schlichtkrull et al. (2017) address over-parameterization by proposing basis and block-diagonal decompositions of $W_r^k$.
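The inverse-edge augmentation can be illustrated with a short sketch; the helper name and the convention of offsetting inverse-relation ids by the number of original relations are assumptions for illustration.

```python
# For every labeled edge (u, v, r), add an inverse edge (v, u, r_inv), with
# inverse-relation ids offset by the number of original relations.
def add_inverse_edges(edges, num_relations):
    """edges: list of (u, v, r) triples with 0 <= r < num_relations."""
    inverse = [(v, u, r + num_relations) for (u, v, r) in edges]
    return edges + inverse

# Example: one edge and one relation type before augmentation.
triples = [(0, 1, 0)]
print(add_inverse_edges(triples, num_relations=1))  # [(0, 1, 0), (1, 0, 1)]
```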
Table 1: Comparison of our proposed method, CompGCN, with other Graph Convolutional meth-
ods. Here, $K$ denotes the number of layers in the model, $d$ is the embedding dimension, $B$ rep-
resents the number of bases, and $|\mathcal{R}|$ indicates the total number of relations in the graph. Overall,
CompGCN is the most comprehensive and is more parameter efficient than methods which encode
relation and direction information.
4 COMPGCN DETAILS

Unlike most of the existing methods, which embed only the nodes in the graph, CompGCN learns a $d$-dimensional representation $h_r \in \mathbb{R}^d, \forall r \in \mathcal{R}$ along with node embeddings $h_v \in \mathbb{R}^d, \forall v \in \mathcal{V}$. Representing relations as vectors alleviates the problem of over-parameterization when applying GCNs to relational graphs. Further, it allows CompGCN to exploit any available relation features ($\mathcal{Z}$) as initial representations. To incorporate relation embeddings into the GCN formulation, we leverage the entity-relation composition operations used in Knowledge Graph embedding approaches (Bordes et al., 2013; Nickel et al., 2016), which are of the form

$$e_o = \phi(e_s, e_r).$$

Here, $\phi : \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}^d$ is a composition operator; $s$, $r$, and $o$ denote the subject, relation, and object in the knowledge graph; and $e_{(\cdot)} \in \mathbb{R}^d$ denotes their corresponding embeddings. In this paper, we restrict ourselves to non-parameterized operations like subtraction (Bordes et al., 2013), multiplication (Yang et al., 2014), and circular correlation (Nickel et al., 2016). However, CompGCN can be extended to parameterized operations like Neural Tensor Networks (NTN) (Socher et al., 2013) and ConvE (Dettmers et al., 2018). We defer their analysis to future work.
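The three non-parameterized operators can be sketched as follows; the function names are illustrative, and circular correlation is computed via FFT as in HolE (Nickel et al., 2016).

```python
import torch

def comp_sub(e_s: torch.Tensor, e_r: torch.Tensor) -> torch.Tensor:
    # Subtraction, TransE-style: phi(e_s, e_r) = e_s - e_r
    return e_s - e_r

def comp_mult(e_s: torch.Tensor, e_r: torch.Tensor) -> torch.Tensor:
    # Element-wise multiplication, DistMult-style
    return e_s * e_r

def comp_corr(e_s: torch.Tensor, e_r: torch.Tensor) -> torch.Tensor:
    # Circular correlation, HolE-style: irfft(conj(rfft(a)) * rfft(b))
    return torch.fft.irfft(
        torch.conj(torch.fft.rfft(e_s)) * torch.fft.rfft(e_r), n=e_s.shape[-1]
    )
```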
As we show in Section 6, the choice of composition operation is important in determining the quality of
the learned embeddings. Hence, superior composition operations for Knowledge Graphs developed
in the future can be adopted to further improve CompGCN's performance.
A node-wise formulation of Equation 1 is $h_v = f\big(\sum_{(u,r) \in \mathcal{N}(v)} W_r x_u\big)$, where $\mathcal{N}(v)$ is the set of immediate neighbors of $v$ for its outgoing edges. Since this formulation suffers from over-parameterization, in CompGCN we instead perform composition ($\phi$) of a neighboring node $u$ with respect to its relation $r$, as defined above. This allows our model to be relation aware while remaining linear ($O(|\mathcal{R}|d)$) in the number of feature dimensions. Moreover, to treat original, inverse, and self edges differently, we define separate filters for each of them. The update equation of CompGCN is given as:

$$h_v = f\left(\sum_{(u,r) \in \mathcal{N}(v)} W_{\lambda(r)}\, \phi(x_u, z_r)\right), \quad (2)$$

where $x_u$ and $z_r$ denote the initial features of node $u$ and relation $r$ respectively, $h_v$ denotes the updated representation of node $v$, and $W_{\lambda(r)} \in \mathbb{R}^{d_1 \times d_0}$ is a relation-type specific parameter. In CompGCN, we use direction-specific weights, i.e., $\lambda(r) = \mathrm{dir}(r)$, given as:

$$W_{\mathrm{dir}(r)} = \begin{cases} W_O, & r \in \mathcal{R} \\ W_I, & r \in \mathcal{R}_{inv} \\ W_S, & r = \top \ \text{(self-loop)} \end{cases} \quad (3)$$
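A schematic (non-vectorized) sketch of the node update in Equations 2 and 3 is shown below; the edge-list format, the simplified self-loop term, and the tanh activation are illustrative assumptions (the released code uses vectorized scatter operations).

```python
import torch

def compgcn_update(x, z, edges, W_O, W_I, W_S, comp):
    """x: (N, d0) node features, z: (R, d0) relation features,
    edges: iterable of (u, v, r, is_inverse) tuples."""
    # Self-loop term, simplified here to W_S x_v; the full model composes x_v
    # with a learned self-loop relation embedding before applying W_S.
    msg = x @ W_S.T
    for u, v, r, is_inverse in edges:
        W = W_I if is_inverse else W_O          # direction-specific filter, Eq. (3)
        msg[v] = msg[v] + comp(x[u], z[r]) @ W.T  # compose, then convolve, Eq. (2)
    return torch.tanh(msg)                       # f(.): activation choice illustrative
```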
Methods | $W^k_{\lambda(r)}$ | $\phi(h^k_u, h^k_r)$
Kipf-GCN (Kipf & Welling, 2016) | $W^k$ | $h^k_u$
Relational-GCN (Schlichtkrull et al., 2017) | $W^k_r$ | $h^k_u$
Directed-GCN (Marcheggiani & Titov, 2017) | $W^k_{\mathrm{dir}(r)}$ | $h^k_u$
Weighted-GCN (Shang et al., 2019) | $W^k$ | $\alpha^k_r h^k_u$

Table 2: Reduction of CompGCN to several existing Graph Convolutional methods. Here, $\alpha^k_r$ is a
relation-specific scalar, $W^k_r$ denotes a separate weight for each relation, and $W^k_{\mathrm{dir}(r)}$ is as defined in
Equation 3. Please refer to Proposition 4.1 for more details.
Further, in CompGCN, after the node embedding update defined in Eq. 2, the relation embeddings are also transformed as follows:

$$h_r = W_{\mathrm{rel}}\, z_r, \quad (4)$$

where $W_{\mathrm{rel}} \in \mathbb{R}^{d_1 \times d_0}$ is a learnable transformation matrix which projects all the relations to the same embedding space as the nodes and allows them to be utilized in the next CompGCN layer. In Table 1, we present a contrast between CompGCN and other existing methods in terms of their features and parameter complexity.
Scaling with Increasing Number of Relations: To ensure that CompGCN scales with an increasing number of relations, we use a variant of the basis formulation proposed by Schlichtkrull et al. (2017). Instead of independently defining an embedding for each relation, they are expressed as a linear combination of a set of basis vectors. Formally, let $\{v_1, v_2, \dots, v_B\}$ be a set of learnable basis vectors. Then, the initial relation representation is given as:

$$z_r = \sum_{b=1}^{B} \alpha_{br}\, v_b.$$

Here, $\alpha_{br}$ is a relation- and basis-specific learnable scalar weight. On stacking multiple layers, the node update of Equation 2 generalizes to

$$h_v^{k+1} = f\left(\sum_{(u,r) \in \mathcal{N}(v)} W^k_{\lambda(r)}\, \phi(h_u^k, h_r^k)\right), \quad (5)$$

and the relation embeddings are transformed across layers as

$$h_r^{k+1} = W^k_{\mathrm{rel}}\, h_r^k.$$

Here, $h_v^0$ and $h_r^0$ are the initial node ($x_v$) and relation ($z_r$) features respectively.
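A short sketch of the basis-decomposed relation embeddings and the per-layer relation transform; the dimensions and random initialization are illustrative assumptions.

```python
import torch

B, R, d0, d1 = 5, 237, 100, 200                 # illustrative sizes
alpha = torch.nn.Parameter(torch.randn(R, B))   # relation-basis coefficients
v = torch.nn.Parameter(torch.randn(B, d0))      # shared learnable basis vectors
W_rel = torch.nn.Parameter(torch.randn(d1, d0))

z = alpha @ v            # (R, d0): all z_r = sum_b alpha_br v_b at once
h_rel_next = z @ W_rel.T  # (R, d1): per-layer relation transform, Eq. (4)
```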
Proposition 4.1. CompGCN generalizes the following Graph Convolutional based methods: Kipf-
GCN (Kipf & Welling, 2016), Relational GCN (Schlichtkrull et al., 2017), Directed GCN (Marcheg-
giani & Titov, 2017), and Weighted GCN (Shang et al., 2019).
Proof. For Kipf-GCN, this can be obtained trivially by making the weights ($W_{\lambda(r)}$) and the composition
function ($\phi$) relation agnostic in Equation 5, i.e., $W_{\lambda(r)} = W$ and $\phi(h_u, h_r) = h_u$. Similar
reductions can be obtained for the other methods, as shown in Table 2.
Each row reports FB15k-237 (MRR / MR / H@10 / H@3 / H@1), then WN18RR (MRR / MR / H@10 / H@3 / H@1):

TransE (Bordes et al., 2013): .294 / 357 / .465 / - / - | .226 / 3384 / .501 / - / -
DistMult (Yang et al., 2014): .241 / 254 / .419 / .263 / .155 | .43 / 5110 / .49 / .44 / .39
ComplEx (Trouillon et al., 2016): .247 / 339 / .428 / .275 / .158 | .44 / 5261 / .51 / .46 / .41
R-GCN (Schlichtkrull et al., 2017): .248 / - / .417 / - / .151 | - / - / - / - / -
KBGAN (Cai & Wang, 2018): .278 / - / .458 / - / - | .214 / - / .472 / - / -
ConvE (Dettmers et al., 2018): .325 / 244 / .501 / .356 / .237 | .43 / 4187 / .52 / .44 / .40
ConvKB (Nguyen et al., 2018): .243 / 311 / .421 / .371 / .155 | .249 / 3324 / .524 / .417 / .057
SACN (Shang et al., 2019): .35 / - / .54 / .39 / .26 | .47 / - / .54 / .48 / .43
HypER (Balažević et al., 2019): .341 / 250 / .520 / .376 / .252 | .465 / 5798 / .522 / .477 / .436
RotatE (Sun et al., 2019): .338 / 177 / .533 / .375 / .241 | .476 / 3340 / .571 / .492 / .428
ConvR (Jiang et al., 2019): .350 / - / .528 / .385 / .261 | .475 / - / .537 / .489 / .443
VR-GCN (Ye et al., 2019): .248 / - / .432 / .272 / .159 | - / - / - / - / -
CompGCN (Proposed Method): .355 / 197 / .535 / .390 / .264 | .479 / 3533 / .546 / .494 / .443

Table 3: Link prediction performance of CompGCN and several recent models on the FB15k-237 and WN18RR
datasets. The results of all the baseline methods are taken directly from the previous papers ('-' indicates
missing values). We find that CompGCN outperforms all the existing methods on 4 out of 5 metrics on
FB15k-237 and 3 out of 5 metrics on WN18RR. Please refer to Section 6.1 for more details.
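For reference, the ranking metrics in Table 3 can be computed from per-triple ranks as in the following sketch, which assumes `ranks` holds the filtered rank of the correct entity for each test triple (1 = best); the helper name is illustrative.

```python
import numpy as np

def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """ranks: iterable of filtered ranks of the true entity, one per test triple."""
    ranks = np.asarray(ranks, dtype=float)
    metrics = {"MRR": float((1.0 / ranks).mean()), "MR": float(ranks.mean())}
    for k in ks:
        metrics[f"H@{k}"] = float((ranks <= k).mean())  # fraction ranked in top k
    return metrics

print(mrr_and_hits([1, 2, 10, 50]))
# {'MRR': 0.405, 'MR': 15.75, 'H@1': 0.25, 'H@3': 0.5, 'H@10': 0.75}
```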
5 EXPERIMENTAL SETUP
5.2 BASELINES
Across all tasks, we compare against the following GCN methods for relational graphs: (1)
Relational-GCN (R-GCN) (Schlichtkrull et al., 2017), which uses relation-specific weight matrices
defined as linear combinations of a set of basis matrices. (2) Directed-GCN (D-GCN)
(Marcheggiani & Titov, 2017), which has separate weight matrices for incoming edges, outgoing edges,
and self-loops, as well as relation-specific biases. (3) Weighted-GCN (W-GCN) (Shang et al.,
2019), which assigns a learnable scalar weight to each relation and multiplies each incoming "message" by
this weight. Apart from these, we also compare with several task-specific baselines mentioned below.
Link prediction: For evaluating CompGCN, we compare against several non-neural and neural
baselines: TransE (Bordes et al., 2013), DistMult (Yang et al., 2014), ComplEx (Trouillon et al.,
2016), R-GCN (Schlichtkrull et al., 2017), KBGAN (Cai & Wang, 2018), ConvE (Dettmers et al.,
2018), ConvKB (Nguyen et al., 2018), SACN (Shang et al., 2019), HypER (Balažević et al., 2019),
RotatE (Sun et al., 2019), ConvR (Jiang et al., 2019), and VR-GCN (Ye et al., 2019).
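For intuition, here are hedged sketches of two of the scoring-function families represented above, applied to encoder outputs: translational (TransE) and semantic matching (DistMult). The function names are illustrative.

```python
import torch

def transe_score(h_s, h_r, h_o, p=1):
    # Higher is better, so negate the translation distance ||s + r - o||_p.
    return -torch.norm(h_s + h_r - h_o, p=p, dim=-1)

def distmult_score(h_s, h_r, h_o):
    # Tri-linear dot product <s, r, o>.
    return (h_s * h_r * h_o).sum(dim=-1)
```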
Node and Graph Classification: For node classification, following Schlichtkrull et al. (2017), we
compare with Feat (Paulheim & Fürnkranz, 2012), WL (Shervashidze et al., 2011), and RDF2Vec
(Ristoski & Paulheim, 2016). Finally, for graph classification, we evaluate against PATCHY-SAN
(Niepert et al., 2016), Deep Graph CNN (DGCNN) (Zhang et al., 2018), and Graph Isomorphism
Network (GIN) (Xu et al., 2019).
Table 4: Performance on the link prediction task evaluated on the FB15k-237 dataset. X + M (Y) denotes that
method M is used for obtaining entity (and relation) embeddings, with X as the scoring function. In the case
of CompGCN, Y denotes the composition operator used. B indicates the number of relational basis vectors
used. Overall, we find that CompGCN outperforms all the existing methods across different scoring functions.
ConvE + CompGCN (Corr) gives the best performance across all settings. Please refer
to Section 6.1 for more details.
Figure 2: Knowledge Graph link prediction with CompGCN and other methods. CompGCN generates both entity and relation embeddings, as opposed to just entity embeddings for other models. For more details, please refer to Section 6.2.

Figure 3: Performance of CompGCN with different numbers of relation basis vectors on the link prediction task. We report the relative change in MRR on the FB15k-237 dataset. (In the plot, B = 100, 50, 25, and 5 give relative MRRs of 99.4, 98.6, 98.0, and 97.2, versus 100.0 when all relations have individual embeddings.) Overall, CompGCN gives comparable performance even with limited parameters. Refer to Section 6.3 for details.
6 RESULTS
Figure 4: Comparison of CompGCN (B = 5) with R-GCN for pruned versions of the FB15k-237 dataset containing different numbers of relations. CompGCN with 5 relation basis vectors outperforms R-GCN across all setups. For more details, please refer to Section 6.2.

Figure 5: Performance of CompGCN with different numbers of relations on the link prediction task. We report the relative change in MRR on pruned versions of the FB15k-237 dataset. Overall, CompGCN gives comparable performance even with limited parameters. Refer to Section 6.2 for details.
Table 5: Performance comparison on node classification (left) and graph classification (right) tasks. ∗ and †
indicate that results are taken directly from Schlichtkrull et al. (2017) and Xu et al. (2019) respectively. Overall,
we find that CompGCN either outperforms or performs comparably to the existing methods. Please
refer to Section 6.4 for more details.
Effect of Number of Basis Vectors: Here, we analyze the effect of the number of relation basis
vectors (B) on link prediction performance. We note that with B = 100, the performance of the model becomes comparable to the case
where all relations have their individual embeddings. In Table 4, we report the results for the best
performing model across all score functions with B set to 50. We note that the parameter-efficient
variant also gives comparable performance and outperforms the baselines in all settings.
Effect of Number of Relations: Next, we report the relative performance of CompGCN using 5
relation basis vectors (B = 5) against CompGCN utilizing a separate vector for each relation
in the dataset. The results are presented in Figure 5. Overall, we find that across all different numbers
of relations, CompGCN with a limited basis gives comparable performance to the full model. The
results show that the parameter-efficient variant of CompGCN scales with an increasing number of
relations.
Comparison with R-GCN: Here, we compare a parameter-efficient variant of
CompGCN (B = 5) against R-GCN on different numbers of relations. The results are depicted
in Figure 4. We observe that CompGCN with limited parameters consistently outperforms R-GCN
across all settings. Thus, CompGCN is parameter-efficient and more effective at encoding multi-
relational graphs than R-GCN.
Node and Graph Classification: In this section, we evaluate CompGCN on node and graph classification tasks on the datasets de-
scribed in Section 5.1. The experimental results are presented in Table 5. For the node classification
task, we report accuracy on the test split provided by Ristoski et al. (2016), whereas for graph classi-
fication, following Yanardag & Vishwanathan (2015) and Xu et al. (2019), we report the average
and standard deviation of validation accuracies across 10-fold cross-validation. Overall, we find
that CompGCN outperforms all the baseline methods on node classification and gives compara-
ble performance on the graph classification task. This demonstrates the effectiveness of incorporating
relations using CompGCN over the existing GCN-based models. On node classification, compared
to the best performing baseline, we obtain an average improvement of 3% across both datasets, while
on graph classification, we obtain an improvement of 3% on the PTC dataset.
7 CONCLUSION
In this paper, we proposed CompGCN, a novel Graph Convolutional based framework for multi-
relational graphs which leverages a variety of composition operators from Knowledge Graph em-
bedding techniques to jointly embed nodes and relations in a graph. Our method generalizes
several existing multi-relational GCN methods. Moreover, our method alleviates the problem of
over-parameterization by sharing relation embeddings across layers and using basis decomposition.
Through extensive experiments on knowledge graph link prediction, node classification, and graph
classification tasks, we showed the effectiveness of CompGCN over existing GCN-based methods
and demonstrated its scalability with an increasing number of relations.
ACKNOWLEDGMENTS
We thank the anonymous reviewers for their constructive comments. This work is supported in
part by the Ministry of Human Resource Development (Government of India) and Google PhD
Fellowship.
REFERENCES
Ivana Balažević, Carl Allen, and Timothy M Hospedales. Hypernetwork knowledge graph embed-
dings. In International Conference on Artificial Neural Networks, 2019.
Daniel Beck, Gholamreza Haffari, and Trevor Cohn. Graph-to-sequence learning using gated graph
neural networks. In Iryna Gurevych and Yusuke Miyao (eds.), ACL 2018 - The 56th Annual Meet-
ing of the Association for Computational Linguistics, pp. 273–283. Association for Computational
Linguistics (ACL), 2018. ISBN 9781948087322.
Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Ok-
sana Yakhnenko. Translating embeddings for modeling multi-relational data. In
C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger
(eds.), Advances in Neural Information Processing Systems 26, pp. 2787–2795.
Curran Associates, Inc., 2013. URL http://papers.nips.cc/paper/
5071-translating-embeddings-for-modeling-multi-relational-data.
pdf.
Antoine Bordes, Sumit Chopra, and Jason Weston. Question answering with subgraph embeddings.
In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing
(EMNLP), pp. 615–620, Doha, Qatar, October 2014. Association for Computational Linguistics.
doi: 10.3115/v1/D14-1067. URL https://www.aclweb.org/anthology/D14-1067.
Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally
connected networks on graphs. CoRR, abs/1312.6203, 2013. URL http://arxiv.org/
abs/1312.6203.
Liwei Cai and William Yang Wang. KBGAN: Adversarial learning for knowledge graph embed-
dings. In Proceedings of the 2018 Conference of the North American Chapter of the Association
for Computational Linguistics: Human Language Technologies, pp. 1470–1480, 2018. URL
https://www.aclweb.org/anthology/N18-1133.
Victor de Boer, Jan Wielemaker, Judith van Gent, Michiel Hildebrand, Antoine Isaac, Jacco van
Ossenbruggen, and Guus Schreiber. Supporting linked data production for cultural heritage in-
stitutes: The amsterdam museum case study. In Proceedings of the 9th International Conference
on The Semantic Web: Research and Applications, ESWC’12, pp. 733–747, Berlin, Heidelberg,
2012. Springer-Verlag. ISBN 978-3-642-30283-1. doi: 10.1007/978-3-642-30284-8_56. URL
http://dx.doi.org/10.1007/978-3-642-30284-8_56.
Asim Kumar Debnath, Rosa L. Lopez de Compadre, Gargi Debnath, Alan J. Shusterman, and Cor-
win Hansch. Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro com-
pounds. correlation with molecular orbital energies and hydrophobicity. Journal of Medicinal
Chemistry, 34(2):786–797, 1991. doi: 10.1021/jm00106a046. URL https://doi.org/10.
1021/jm00106a046.
Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks
on graphs with fast localized spectral filtering. CoRR, abs/1606.09375, 2016. URL http:
//arxiv.org/abs/1606.09375.
Tim Dettmers, Minervini Pasquale, Stenetorp Pontus, and Sebastian Riedel. Convolutional 2d
knowledge graph embeddings. In Proceedings of the 32th AAAI Conference on Artificial Intelli-
gence, pp. 1811–1818, February 2018. URL https://arxiv.org/abs/1707.01476.
Matthias Fey and Jan E. Lenssen. Fast graph representation learning with PyTorch Geometric. In
ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
Alex Fout, Jonathon Byrd, Basir Shariat, and Asa Ben-Hur. Protein interface prediction using
graph convolutional networks. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus,
S. Vishwanathan, and R. Garnett (eds.), Advances in Neural Information Processing Systems 30,
pp. 6530–6539. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/
7231-protein-interface-prediction-using-graph-convolutional-networks.
pdf.
Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural
message passing for quantum chemistry. In Proceedings of the 34th International Conference
on Machine Learning - Volume 70, ICML’17, pp. 1263–1272. JMLR.org, 2017. URL http:
//dl.acm.org/citation.cfm?id=3305381.3305512.
Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural
networks. In Yee Whye Teh and Mike Titterington (eds.), Proceedings of the Thirteenth Interna-
tional Conference on Artificial Intelligence and Statistics, volume 9 of Proceedings of Machine
Learning Research, pp. 249–256, Chia Laguna Resort, Sardinia, Italy, 13–15 May 2010. PMLR.
URL http://proceedings.mlr.press/v9/glorot10a.html.
William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large
graphs. In NIPS, 2017.
Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Comput., 9(8):1735–
1780, November 1997. ISSN 0899-7667. doi: 10.1162/neco.1997.9.8.1735. URL http://dx.
doi.org/10.1162/neco.1997.9.8.1735.
Xiaotian Jiang, Quan Wang, and Bin Wang. Adaptive convolution for multi-relational learn-
ing. In Proceedings of the 2019 Conference of the North American Chapter of the Associ-
ation for Computational Linguistics: Human Language Technologies, 2019. URL https://www.aclweb.org/anthology/N19-1103.
Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional net-
works. CoRR, abs/1609.02907, 2016. URL http://arxiv.org/abs/1609.02907.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep
convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q.
Weinberger (eds.), Advances in Neural Information Processing Systems 25, pp. 1097–
1105. Curran Associates, Inc., 2012. URL http://papers.nips.cc/paper/
4824-imagenet-classification-with-deep-convolutional-neural-networks.
pdf.
Diego Marcheggiani and Ivan Titov. Encoding sentences with graph convolutional networks for
semantic role labeling. In Proceedings of the 2017 Conference on Empirical Methods in Natural
Language Processing, pp. 1506–1515. Association for Computational Linguistics, 2017. URL
http://aclweb.org/anthology/D17-1159.
George A. Miller. Wordnet: A lexical database for english. Commun. ACM, 38(11):39–41, Novem-
ber 1995. ISSN 0001-0782. doi: 10.1145/219717.219748. URL http://doi.acm.org/
10.1145/219717.219748.
Federico Monti, Oleksandr Shchur, Aleksandar Bojchevski, Or Litany, Stephan Günnemann, and
Michael M. Bronstein. Dual-primal graph convolutional networks. CoRR, abs/1806.00770, 2018.
URL http://arxiv.org/abs/1806.00770.
Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, and Dinh Phung. A novel embedding
model for knowledge base completion based on convolutional neural network. In Proceedings
of the 2018 Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, Volume 2 (Short Papers), pp. 327–333. Association
for Computational Linguistics, 2018. doi: 10.18653/v1/N18-2053. URL http://aclweb.
org/anthology/N18-2053.
M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich. A review of relational machine learning for
knowledge graphs. Proceedings of the IEEE, 104(1):11–33, Jan 2016. ISSN 0018-9219. doi:
10.1109/JPROC.2015.2483592.
Maximilian Nickel, Lorenzo Rosasco, and Tomaso Poggio. Holographic embeddings of knowledge
graphs. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16,
pp. 1955–1961. AAAI Press, 2016. URL http://dl.acm.org/citation.cfm?id=
3016100.3016172.
Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning convolutional neural net-
works for graphs. In Proceedings of the 33rd International Conference on International Con-
ference on Machine Learning - Volume 48, ICML’16, pp. 2014–2023. JMLR.org, 2016. URL
http://dl.acm.org/citation.cfm?id=3045390.3045603.
Heiko Paulheim and Johannes Fürnkranz. Unsupervised generation of data mining features from
linked open data. In Proceedings of the 2Nd International Conference on Web Intelligence, Mining
and Semantics, WIMS ’12, pp. 31:1–31:12, New York, NY, USA, 2012. ACM. ISBN 978-1-
4503-0915-8. doi: 10.1145/2254129.2254168. URL http://doi.acm.org/10.1145/
2254129.2254168.
Bharath Ramsundar, Peter Eastman, Patrick Walters, Vijay Pande, Karl Leswing, and Zhenqin Wu.
Deep Learning for the Life Sciences. O’Reilly Media, 2019. https://www.amazon.com/
Deep-Learning-Life-Sciences-Microscopy/dp/1492039837.
Petar Ristoski and Heiko Paulheim. Rdf2vec: Rdf graph embeddings for data mining. In Interna-
tional Semantic Web Conference, pp. 498–514. Springer, 2016.
Petar Ristoski, Gerben Klaas Dirk de Vries, and Heiko Paulheim. A collection of benchmark datasets
for systematic evaluations of machine learning on the semantic web. In Paul Groth, Elena Sim-
perl, Alasdair Gray, Marta Sabou, Markus Krötzsch, Freddy Lecue, Fabian Flöck, and Yolanda
Gil (eds.), The Semantic Web – ISWC 2016, pp. 186–194, Cham, 2016. Springer International
Publishing. ISBN 978-3-319-46547-0.
Soumya Sanyal, Janakiraman Balachandran, Naganand Yadati, Abhishek Kumar, Padmini Ra-
jagopalan, Suchismita Sanyal, and Partha Talukdar. Mt-cgcnn: Integrating crystal graph con-
volutional neural network with multitask learning for material property prediction. arXiv preprint
arXiv:1811.05660, 2018.
Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and
Max Welling. Modeling relational data with graph convolutional networks. arXiv preprint
arXiv:1703.06103, 2017.
Chao Shang, Yun Tang, Jing Huang, Jinbo Bi, Xiaodong He, and Bowen Zhou. End-to-end structure-
aware convolutional networks for knowledge base completion, 2019.
Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M.
Borgwardt. Weisfeiler-lehman graph kernels. J. Mach. Learn. Res., 12:2539–2561, Novem-
ber 2011. ISSN 1532-4435. URL http://dl.acm.org/citation.cfm?id=1953048.
2078187.
Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. Reasoning with neural
tensor networks for knowledge base completion. In C. J. C. Burges, L. Bottou, M. Welling,
Z. Ghahramani, and K. Q. Weinberger (eds.), Advances in Neural Information Processing Systems
26, pp. 926–934. Curran Associates, Inc., 2013. URL http://papers.nips.cc/paper/
5028-reasoning-with-neural-tensor-networks-for-knowledge-base-completion.
pdf.
A. Srinivasan, R. D. King, S. H. Muggleton, and M. J. E. Sternberg. The predictive toxicology
evaluation challenge. In Proceedings of the 15th International Joint Conference on Artifical In-
telligence - Volume 1, IJCAI’97, pp. 4–9, San Francisco, CA, USA, 1997. Morgan Kaufmann
Publishers Inc. ISBN 1-555860-480-4. URL http://dl.acm.org/citation.cfm?id=
1624162.1624163.
Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. Rotate: Knowledge graph embedding by
relational rotation in complex space. In International Conference on Learning Representations,
2019. URL https://openreview.net/forum?id=HkgEQnRqYQ.
Kristina Toutanova and Danqi Chen. Observed versus latent features for knowledge base and text
inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their
Compositionality, pp. 57–66, 2015.
Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. Com-
plex embeddings for simple link prediction. In Proceedings of the 33rd International Con-
ference on International Conference on Machine Learning - Volume 48, ICML’16, pp. 2071–
2080. JMLR.org, 2016. URL http://dl.acm.org/citation.cfm?id=3045390.
3045609.
Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua
Bengio. Graph Attention Networks. International Conference on Learning Representations,
2018. URL https://openreview.net/forum?id=rJXMpikCZ. accepted as poster.
Q. Wang, Z. Mao, B. Wang, and L. Guo. Knowledge graph embedding: A survey of approaches and
applications. IEEE Transactions on Knowledge and Data Engineering, 29(12):2724–2743, Dec
2017. ISSN 1041-4347. doi: 10.1109/TKDE.2017.2754499.
Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding by trans-
lating on hyperplanes, 2014a. URL https://www.aaai.org/ocs/index.php/AAAI/
AAAI14/paper/view/8531.
Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding by
translating on hyperplanes. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial
Intelligence, AAAI’14, pp. 1112–1119. AAAI Press, 2014b. URL http://dl.acm.org/
citation.cfm?id=2893873.2894046.
Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural
networks? In International Conference on Learning Representations, 2019. URL https:
//openreview.net/forum?id=ryGs6iA5Km.
Pinar Yanardag and S.V.N. Vishwanathan. Deep graph kernels. In Proceedings of the 21th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, pp.
1365–1374, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3664-2. doi: 10.1145/2783258.
2783417. URL http://doi.acm.org/10.1145/2783258.2783417.
Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding entities and
relations for learning and inference in knowledge bases. CoRR, abs/1412.6575, 2014. URL
http://arxiv.org/abs/1412.6575.
Rui Ye, Xin Li, Yujie Fang, Hongyu Zang, and Mingzhong Wang. A vectorized relational graph con-
volutional network for multi-relational network alignment. In Proceedings of the Twenty-Eighth
International Joint Conference on Artificial Intelligence, IJCAI-19, pp. 4135–4141. International
Joint Conferences on Artificial Intelligence Organization, 7 2019. doi: 10.24963/ijcai.2019/574.
URL https://doi.org/10.24963/ijcai.2019/574.
Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen. An end-to-end deep learning
architecture for graph classification. In AAAI, pp. 4438–4445, 2018.
A APPENDIX
A.1 EVALUATION BY RELATION CATEGORY

In this section, we investigate the performance of CompGCN on link prediction for different relation
categories on the FB15k-237 dataset. Following Wang et al. (2014a) and Sun et al. (2019), based on the
average number of tails per head and heads per tail, we divide the relations into four categories:
one-to-one, one-to-many, many-to-one, and many-to-many. The results are summarized in Table 6.
We observe that using GCN-based encoders for obtaining entity and relation embeddings helps to
improve performance on all types of relations. In the case of one-to-one relations, CompGCN gives
an average improvement of around 10% in MRR compared to the best performing baseline (ConvE
+ W-GCN). For one-to-many, many-to-one, and many-to-many relations, the corresponding improvements are
10.5%, 7.5%, and 4%. These results show that CompGCN is effective at handling both simple and
complex relations.
Tail Prediction — ConvE (MRR / MR / H@10) | ConvE + W-GCN | ConvE + CompGCN:

1-1: 0.177 / 402 / 0.391 | 0.406 / 319 / 0.531 | 0.453 / 193 / 0.589
1-N: 0.068 / 922 / 0.116 | 0.093 / 612 / 0.187 | 0.112 / 604 / 0.190
N-1: 0.438 / 123 / 0.638 | 0.454 / 101 / 0.647 | 0.471 / 99 / 0.656
N-N: 0.246 / 189 / 0.436 | 0.261 / 169 / 0.459 | 0.275 / 179 / 0.474

Table 6: Results on link prediction by relation category on the FB15k-237 dataset. Following Wang et al.
(2014a), the relations are divided into four categories: one-to-one (1-1), one-to-many (1-N), many-
to-one (N-1), and many-to-many (N-N). We find that CompGCN helps to improve performance on
all types of relations compared to existing methods. Please refer to Section A.1 for more details.
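The categorization itself can be reproduced with a sketch like the following; the 1.5 threshold follows the convention of Wang et al. (2014a), and the function name is illustrative.

```python
from collections import defaultdict

def categorize_relations(triples, threshold=1.5):
    """triples: iterable of (head, relation, tail); returns {relation: category}."""
    heads, tails, counts = defaultdict(set), defaultdict(set), defaultdict(int)
    for h, r, t in triples:
        heads[r].add(h)
        tails[r].add(t)
        counts[r] += 1
    category = {}
    for r in heads:
        tph = counts[r] / len(heads[r])   # average tails per head
        hpt = counts[r] / len(tails[r])   # average heads per tail
        if tph < threshold and hpt < threshold:
            category[r] = "1-1"
        elif tph >= threshold and hpt < threshold:
            category[r] = "1-N"
        elif tph < threshold:
            category[r] = "N-1"
        else:
            category[r] = "N-N"
    return category
```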
A.2 DATASET DETAILS

In this section, we provide details of the different datasets used in the experiments. For link
prediction, we use the following two datasets:
• FB15k-237 (Toutanova & Chen, 2015) is a pruned version of the FB15k (Bordes et al., 2013) dataset,
with inverse relations removed to prevent direct inference.
• WN18RR (Dettmers et al., 2018), similar to FB15k-237, is a subset of the WN18 (Bordes et al.,
2013) dataset, which is derived from WordNet (Miller, 1995).
For node classification, similar to Schlichtkrull et al. (2017), we evaluate on the following two
datasets:
• MUTAG (Node) is a dataset from the DL-Learner toolkit (http://www.dl-learner.org). It contains relationships between complex
molecules, and the task is to identify whether a molecule is carcinogenic or not.
• AM contains relationships between different artifacts in the Amsterdam Museum (de Boer
et al., 2012). The goal is to predict the category of a given artifact based on its links and other
attributes.
Finally, for graph classification, similar to Xu et al. (2019), we evaluate on the following datasets:
• MUTAG (Graph) (Debnath et al., 1991) is a bioinformatics dataset of 188 mutagenic aromatic
and nitro compounds. The graphs need to be categorized into two classes based on their mutagenic
effect on a bacterium.
Table 7: The details of the datasets used for node classification, link prediction, and graph classifi-
cation tasks. Please refer to Section 5.1 for more details.
• PTC (Srinivasan et al., 1997) is a dataset of 344 chemical compounds labeled according to their
carcinogenicity in male and female rats. The task is to label the graphs based on their carcino-
genicity on rodents.
Summary statistics of all the datasets used are presented in Table 7.
A.3 HYPERPARAMETERS
Here, we present the implementation details for each task used for evaluation in the paper. For all
the tasks, we use CompGCN built on the PyTorch Geometric framework (Fey & Lenssen, 2019).
Link Prediction: For evaluation, 200-dimensional embeddings are used for both nodes and relations.
For selecting the best model, we perform a hyperparameter search using the validation data
over the values listed in Table 8. For training link prediction models, we use the standard binary
cross entropy loss with label smoothing (Dettmers et al., 2018).
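A sketch of one common formulation of this smoothed objective is given below; the `eps` value and function name are illustrative assumptions, not the exact released implementation.

```python
import torch
import torch.nn.functional as F

def smoothed_bce(scores, targets, eps=0.1):
    """scores: (batch, num_entities) raw logits; targets: multi-hot 0/1 floats."""
    num_entities = targets.size(1)
    # Soften the multi-hot targets over all entities before applying BCE.
    soft = (1.0 - eps) * targets + eps / num_entities
    return F.binary_cross_entropy_with_logits(scores, soft)
```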
Node Classification: Following Schlichtkrull et al. (2017), we use 10% of the training data as a validation
set for selecting the best model for both datasets. We restrict the number of hidden units to 32. We
use cross-entropy loss for training our model.
Graph Classification: Similar to Yanardag & Vishwanathan (2015) and Xu et al. (2019), we report the
mean and standard deviation of validation accuracies across 10-fold cross-validation. Cross-
entropy loss is used for training the entire model. For obtaining the graph-level representation, we
use simple averaging of the embeddings of all nodes as the readout function, i.e.,

$$h_G = \frac{1}{|\mathcal{V}|} \sum_{v \in \mathcal{V}} h_v,$$

where $h_v$ is the learned representation of node $v$.
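A batched sketch of this mean readout, assuming a PyTorch Geometric style `batch` vector that maps each node to its graph id; the helper name is illustrative.

```python
import torch

def mean_readout(h, batch, num_graphs):
    """h: (N, d) node embeddings, batch: (N,) graph index per node."""
    out = torch.zeros(num_graphs, h.size(1))
    out.index_add_(0, batch, h)                            # sum node embeddings per graph
    counts = torch.bincount(batch, minlength=num_graphs).clamp(min=1)
    return out / counts.unsqueeze(1).to(h.dtype)           # average per graph
```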
Hyperparameter | Values
Number of GCN Layers (K) | {1, 2, 3}
Learning rate | {0.001, 0.0001}
Batch size | {128, 256}
Dropout | {0.0, 0.1, 0.2, 0.3}
Table 8: Details of hyperparameters used for link prediction task. Please refer to Section A.3 for
more details.