LINK PREDICTION IN HYPERGRAPHS USING GRAPH CONVOLUTIONAL NETWORKS
Anonymous authors
Paper under double-blind review
ABSTRACT

1 INTRODUCTION
The problem of link prediction in graphs has numerous applications in the fields of social network analysis (Liben-Nowell & Kleinberg, 2003), knowledge bases (Nickel et al., 2016), and bioinformatics (Lü & Zhou, 2011), to name a few. However, in many real-world problems, relationships go beyond pairwise associations. For example, in chemical reaction data, the relationship representing a group of chemical compounds that can react is inherently higher-order; similarly, the co-authorship relationship in a citation network is higher-order. Hypergraphs provide a natural way to model such higher-order complex relations. Hyperlink prediction is the problem of predicting such missing higher-order relationships in a hypergraph.
Besides higher-order relationships, modeling the direction information between these relationships is also useful in many practical applications. For example, in chemical reaction data, in addition to predicting groups of chemical compounds that form reactants and/or products, it is also important to predict the direction between reactants and products, i.e., that a group of reactants reacts to give a group of products. Directed hypergraphs (Gallo et al., 1993) provide a way to model this direction information in hypergraphs. As in the undirected case, predicting missing hyperlinks in a directed hypergraph is also useful in practical settings. Figure 1 illustrates the difference between modeling chemical reaction data using undirected and directed hypergraphs. Most of the previous work on hyperlink prediction (Zhou et al., 2006; Zhang et al., 2018) focuses only on undirected hypergraphs. In this work, we focus on both undirected and directed hypergraphs.
Recently, Graph Convolutional Networks (GCNs) (Kipf & Welling, 2017) have emerged as a powerful
tool for representation learning on graphs. GCNs have also been successfully applied for link
prediction on normal graphs (Zhang & Chen, 2018; van den Berg et al., 2018; Schlichtkrull et al.,
2018; Kipf & Welling, 2016). Inspired by the success of GCNs for link prediction in graphs, and by deep learning in general (Wang et al., 2017), we propose a GCN-based framework for hyperlink prediction that works for both undirected and directed hypergraphs. We make the following contributions:
Figure 1: Illustrating the difference between modeling chemical reaction data using undirected and directed hypergraphs. On the left is the undirected hypergraph, in which both the reactants and the products of a reaction are contained in the same hyperlink. In the directed hypergraph (on the right), for a given reaction, the reactants are connected by one hyperlink, the products by another, and the two hyperlinks are connected by a directed link.
2 RELATED WORK
In this section, we briefly review related work in deep learning on graphs and link prediction on
hypergraphs.
Learning representations on graphs: The key advancements in learning low-dimensional node
representations in graphs include matrix factorisation-based methods, random-walk based algorithms,
and deep learning on graphs (Hamilton et al., 2017). Our work is based on deep learning on graphs.
Geometric deep learning (Bronstein et al., 2017) is an umbrella phrase for emerging techniques
attempting to generalise (structured) deep neural network models to non-Euclidean domains such as
graphs and manifolds. The earliest attempts to generalise neural networks to graphs embed each node in a Euclidean space with a recurrent neural network (RNN) and use those embeddings as features for classification or regression of nodes or graphs (Gori et al., 2005; Scarselli et al., 2009).
A CNN-like deep neural network on graphs was later formulated in the spectral domain in a pioneering work (Bruna et al., 2014) through a mathematically sound definition of convolution on graphs, employing the analogy between the classical Fourier transform and projections onto the eigenbasis of the graph Laplacian operator (Hammond et al., 2011). Initial works proposed to learn smooth spectral multipliers of the graph Laplacian, although at high computational cost (Bruna et al., 2014; Henaff et al., 2015). To resolve this computational bottleneck and avoid the expensive computation of eigenvectors, the ChebNet framework (Defferrard et al., 2016) learns Chebyshev polynomials of the graph Laplacian (hence the name ChebNet). The graph convolutional network (GCN) (Kipf & Welling, 2017) is a simplified ChebNet framework that uses simple filters operating on 1-hop local neighborhoods of the graph.
A second formulation of convolution on graphs is in the spatial domain (or equivalently in the vertex domain), where the localisation property is provided by construction. One of the first formulations of a spatial CNN-like neural network on graphs generalised standard molecular feature extraction methods based on circular fingerprints (Duvenaud et al., 2015). Subsequently, all of the above types
(RNN, spectral CNN, spatial CNN on graphs) were unified into a single message passing neural network (MPNN) framework (Gilmer et al., 2017), and a variant of MPNN has been shown to achieve state-of-the-art results on an important molecular property prediction benchmark.
The reader is referred to a comprehensive literature review (Bronstein et al., 2017) and a survey (Hamilton et al., 2017) on deep learning on graphs and representation learning on graphs, respectively. Below, we give an overview of related research on link prediction in hypergraphs, where relationships go beyond pairwise.
Link prediction on hypergraphs: Machine learning on hypergraphs was introduced in a seminal work (Zhou et al., 2006) that generalised the powerful methodology of spectral clustering to hypergraphs and further inspired algorithms for hypergraph embedding and semi-supervised classification of hypernodes.
Link prediction on hypergraphs (hyperlink prediction) has been especially popular for social networks, to predict higher-order links such as a user releasing a tweet containing a hashtag (Li et al., 2013), and to predict metadata information such as tags, groups, labels, and users for entities (images from Flickr) (Arya & Worring, 2018). Techniques for hyperlink prediction on social networks include ranking based on link proximity information (Li et al., 2013) and matrix completion on the (incomplete) incidence matrix of the hypergraph (Arya & Worring, 2018; Monti et al., 2017). Hyperlink prediction has also been used to predict multi-actor collaborations (Sharma et al., 2014).
In other works, a dual hypergraph has been constructed from the initial (primal) hypergraph to cast hyperlink prediction as an instance of the vertex classification problem on the dual hypergraph (Lugo-Martinez & Radivojac, 2017). Coordinated matrix maximisation (CMM) predicts hyperlinks in the adjacency space, with non-negative matrix factorisation and least-squares matching performed alternately in the vertex adjacency space (Zhang et al., 2018). CMM uses the expectation-maximisation algorithm for optimisation on hyperlink prediction tasks such as predicting missing reactions in organisms' metabolic networks.
3 PRELIMINARIES
An undirected hypergraph is an ordered pair H = (V, E), where V = {v_1, ..., v_n} is a set of n hypernodes and E = {e_1, ..., e_m} ⊆ 2^V is a set of m hyperlinks.
The problem of link prediction in an incomplete undirected hypergraph H involves predicting missing hyperlinks from Ē = 2^V − E based on the current set of observed hyperlinks E. The number of hypernodes in any given hyperlink e ∈ E can be any integer between 1 and n. This variable cardinality of a hyperlink makes traditional graph-based link prediction methods infeasible, because they are based on exactly two input features (those of the two nodes potentially forming a link). The variable cardinality also results in an exponentially large inference space, because the total number of potential hyperlinks is O(2^n). However, in practical cases there is no need to consider all the hyperlinks in Ē, as most of them can be easily filtered out (Zhang et al., 2018). For example, for the task of finding missing metabolic reactions, we can restrict hyperlink prediction to all feasible reactions, because infeasible reactions seldom have biological meaning. In other cases, such as predicting multi-author collaborations on academic/technical papers, hyperlinks have cardinality less than a small number, as papers seldom have more than 6 authors. The number of restricted hyperlinks in such practical cases is not exponential, and hence hyperlink prediction on the restricted set of hyperlinks becomes a feasible problem.
Formally, a hyperlink prediction problem (Zhang et al., 2018) is a tuple (H, 𝔼), where H = (V, E) is a given incomplete hypergraph and 𝔼 is a set of (restricted) candidate hyperlinks with E ⊆ 𝔼. The problem is to find the most likely hyperlinks missing in H from the set of hyperlinks 𝔼 − E.
A directed hypergraph (Gallo et al., 1993) is an ordered pair H = (V, E), where V = {v_1, ..., v_n} is a set of n hypernodes and E = {(t_1, h_1), ..., (t_m, h_m)} ⊆ 2^V × 2^V is a set of m directed hyperlinks. Each e ∈ E is denoted by (t, h), where t ⊆ V is the tail and h ⊆ V is the head, with t ≠ ∅ and h ≠ ∅. As shown in Figure 1, chemical reactions can be modeled by directed hyperlinks, with chemical substances forming the set V. Observe that this model is general enough to subsume previous graph models (a sketch follows the list below):
• an undirected hyperlink is the special case when t = h
• a directed simple link (edge) is the special case when |t| = |h| = 1
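To make the definition concrete, the following is a minimal sketch of how directed hyperlinks and the two special cases above can be represented. The representation and names are our illustrative choices, not prescribed by the paper.

```python
from typing import FrozenSet, Tuple

# A directed hyperlink is a (tail, head) pair of non-empty hypernode sets.
DirectedHyperlink = Tuple[FrozenSet[int], FrozenSet[int]]

def is_undirected(e: DirectedHyperlink) -> bool:
    """Special case: an undirected hyperlink has t == h."""
    t, h = e
    return t == h

def is_simple_edge(e: DirectedHyperlink) -> bool:
    """Special case: a directed simple link (edge) has |t| == |h| == 1."""
    t, h = e
    return len(t) == len(h) == 1

# Example: a reaction with reactants {0, 1} and products {2, 3}.
reaction: DirectedHyperlink = (frozenset({0, 1}), frozenset({2, 3}))
assert not is_undirected(reaction) and not is_simple_edge(reaction)
```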
Similar to the undirected case, the directed hyperlink prediction problem is a tuple (H, 𝔼), where H = (V, E) is a given incomplete directed hypergraph and 𝔼 is a set of candidate hyperlinks with E ⊆ 𝔼. The problem is to find the most likely hyperlinks missing in H from the set of hyperlinks 𝔼 − E.
Given an undirected hypergraph H = (V, E), as a first step NHP constructs the dual hypergraph H* = (V*, E*) of H, defined as follows.
Hypergraph Duality (Scheinerman & Ullman, 2011): The dual hypergraph of a hypergraph H = (V, E) with V = {v_1, ..., v_n} and E = {e_1, ..., e_m}, denoted by H* = (V*, E*), is obtained by taking V* = E as the set of hypernodes and E* = {e*_1, ..., e*_n} such that e*_i = {e ∈ E : v_i ∈ e}, with e*_i corresponding to v_i for i = 1, ..., n. The vertex–hyperedge incidence matrices of H and H* are transposes of each other.
The problem of link prediction in H can be posed as a binary node classification problem in H* (Lugo-Martinez & Radivojac, 2017): a label of +1 on a node in H* indicates the presence of the corresponding hyperlink in H, and a label of −1 indicates its absence. For the problem of semi-supervised node classification on the dual hypergraph H*, we use a Graph Convolutional Network (GCN) on the graph obtained from the clique expansion of H*. Clique expansion (Zhou et al., 2006; Feng et al., 2019) is a standard and simple way of approximating a hypergraph by a graph: every hyperlink of size s is replaced with an s-clique (Feng et al., 2018).
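As an illustration of the two constructions above, both the dual hypergraph and the clique expansion can be computed directly from the vertex–hyperedge incidence matrix. This is a minimal sketch under our own conventions (rows index hypernodes, columns index hyperlinks), not the authors' implementation.

```python
import numpy as np

def dual_incidence(incidence: np.ndarray) -> np.ndarray:
    """The incidence matrices of H and its dual H* are transposes of each
    other (rows: hypernodes, columns: hyperlinks)."""
    return incidence.T

def clique_expansion(incidence: np.ndarray) -> np.ndarray:
    """Replace every hyperlink of size s with an s-clique: two nodes are
    adjacent iff they co-occur in at least one hyperlink."""
    co_occurrence = incidence @ incidence.T
    adjacency = (co_occurrence > 0).astype(float)
    np.fill_diagonal(adjacency, 0.0)  # no self-loops
    return adjacency

# Toy hypergraph: 4 hypernodes, 2 hyperlinks {v0, v1, v2} and {v2, v3}.
H = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]], dtype=float)
A_dual = clique_expansion(dual_incidence(H))  # graph the GCN is run on
```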
Graph Convolutional Network (Kipf & Welling, 2017): Let G = (V, E), with N = |V|, be a simple undirected graph with adjacency matrix A ∈ R^(N×N), and let X ∈ R^(N×p) be the data matrix, which holds a p-dimensional real-valued vector representation for each node in the graph. The forward model for a simple two-layer GCN takes the form:

Z = f_GCN(X, A) = softmax( Ā ReLU( Ā X Θ^(0) ) Θ^(1) ),    (1)

where Ā = D̃^(−1/2) Ã D̃^(−1/2), Ã = A + I, and D̃_ii = Σ_{j=1}^{N} Ã_ij. Θ^(0) ∈ R^(p×h) is an input-to-hidden weight matrix for a hidden layer with h hidden units, and Θ^(1) ∈ R^(h×r) is a hidden-to-output weight matrix. The softmax activation function, softmax(x_i) = exp(x_i) / Σ_j exp(x_j), is applied row-wise.
For semi-supervised multi-class classification with q classes, we minimise the cross-entropy error over the set of labeled examples V_L:

L = − Σ_{i ∈ V_L} Σ_{j=1}^{q} Y_ij ln Z_ij.    (2)
The weights of the graph convolutional network, viz. Θ^(0) and Θ^(1), are trained using gradient descent.
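For concreteness, a compact numpy sketch of the forward model in equation 1 and the masked loss in equation 2 follows. Shapes, initialisation, and function names are illustrative assumptions, and training by gradient descent is omitted.

```python
import numpy as np

def normalise(A: np.ndarray) -> np.ndarray:
    """A_bar = D~^(-1/2) (A + I) D~^(-1/2), as in equation 1."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(X, A_bar, theta0, theta1):
    """Z = softmax(A_bar ReLU(A_bar X Theta0) Theta1), softmax row-wise."""
    hidden = np.maximum(A_bar @ X @ theta0, 0.0)  # ReLU hidden layer
    logits = A_bar @ hidden @ theta1
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(Z, Y, labelled_idx):
    """Equation 2: cross-entropy summed over the labelled nodes V_L only
    (Y holds one-hot labels)."""
    return -np.sum(Y[labelled_idx] * np.log(Z[labelled_idx] + 1e-12))
```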
We note that Lugo-Martinez & Radivojac (2017) also follow a similar approach of constructing a dual graph followed by node classification for hyperlink prediction. However, ours is a deep learning based approach, and Lugo-Martinez & Radivojac (2017) do not perform link prediction in directed hypergraphs.
Figure 2: (best seen in colour) The proposed NHP framework. We convert the hypergraph with the observed hyperlinks and the candidate hyperlinks into its dual, in which hyperlinks become hypernodes. We then use a technique from positive-unlabelled learning to obtain plausible negatively labelled hypernodes. The clique expansion is used to approximate the hypergraph, and a GCN is then run on the resulting graph to classify the unlabelled hypernodes. A label of +1 on e1 indicates the presence of e1 in the primal hypergraph. For more details, refer to section 4.
Learning on hypergraphs in the positive-unlabelled setting: The cross-entropy objective of the GCN in equation 2 inherently assumes the presence of labeled data from at least two different classes and hence cannot be directly used in the positive-unlabelled setting. A challenging variant of semi-supervised learning is positive-unlabelled learning (Elkan & Noto, 2008), which arises in many real-world applications such as text classification and recommender systems. In this setting, only a limited number of positive examples is available and the rest are unlabelled.
In the positive-unlabelled learning framework, we construct a set of plausible negative examples based on data similarity (Han & Shen, 2016). We calculate the average similarity of each unlabelled point u ∈ 𝔼 − E to the positive labelled examples in E:

s_u = (1 / |E|) Σ_{e ∈ E} S( f(u), f(e) ),    (3)
where S is a similarity function and f : 𝔼 → R^d maps an example to a d-dimensional embedding space. We then rank the unlabelled training examples in 𝔼 − E in ascending order of their average similarity scores and select the top-ranked ones (i.e. those with the lowest similarity values) to construct the set of plausible negative examples F ⊆ 𝔼 − E. The intuition is that the set of plausible negative examples contains the examples most dissimilar to the positive examples. The GCN on the hypergraph can subsequently be run by minimising objective 2 over the positive examples in E and the plausible negative examples in F, i.e. V_L = E ∪ F.
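The selection of F amounts to scoring every unlabelled candidate by equation 3 and keeping the least similar ones. Below is a sketch with cosine similarity (the similarity function used in our experiments); `embeddings`, which maps candidate indices to their d-dimensional vectors, is an assumed input.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def plausible_negatives(embeddings, positive_idx, unlabelled_idx, k):
    """Score each unlabelled example by its average similarity to the
    positives (equation 3) and return the k least similar as the set F."""
    scores = {
        u: np.mean([cosine(embeddings[u], embeddings[p])
                    for p in positive_idx])
        for u in unlabelled_idx
    }
    ranked = sorted(unlabelled_idx, key=lambda u: scores[u])  # ascending
    return ranked[:k]  # most dissimilar to the positive examples
```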
For directed hyperlink prediction, NHP-D minimises a joint objective that combines the (undirected) hyperlink prediction loss L_u with the direction prediction loss L_d:

L_joint = L_u + L_d = − Σ_{i ∈ V_L} Σ_{j=1}^{2} Y_ij ln Z_ij − Σ_{(i,j) ∈ W_L} Σ_{k=0}^{1} d_ijk ln D_ijk.    (4)
We denote by W_L^+ = {(t, h) ∈ E} the set of positively labelled directed pairs in loss L_d. In other words, d_ij1 = 1 and d_ij0 = 0 for all (i, j) ∈ W_L^+. The tail hyperlink and the corresponding head hyperlink are separate hypernodes in the dual and form directed hyperlinks in the primal (with the direction from t to h). The set W_L^+ consists of directed hyperlinks that currently exist in the given directed hypergraph. Note that, for loss L_u, the set of positively labelled hypernodes is

V_L^+ = ∪_{(t,h) ∈ W_L^+} (t ∪ h).    (5)
We sample |W_L^+| = |E| hypernode pairs (in the dual) from the unlabelled data using the positive-unlabelled approach of equation 3 to get the set of W_L^− pairs. We label these pairs negative, i.e. d_ij1 = 0 and d_ij0 = 1 for all (i, j) ∈ W_L^−. We construct V_L^− = ∪_{(t,h) ∈ W_L^−} (t ∪ h) analogously to equation 5. The sets V_L = V_L^+ ∪ V_L^− and W_L = W_L^+ ∪ W_L^− are used to minimise objective 4.
To explain how D is computed, we rewrite equation 1 as

Z = f_GCN(X, A) = softmax( Ā ReLU( Ā X Θ^(0) ) Θ^(1) ) = softmax(𝒳).    (6)

We use D = g(x_1, x_2), where g is a function that takes the dual hypernode representations x_1 ∈ 𝒳 and x_2 ∈ 𝒳 and is parameterised by, for example, a simple neural network. In the experiments, we used a simple 2-layer multilayer perceptron on the concatenated embeddings x_1 || x_2, i.e. g(x_1, x_2) = MLP_Θ(x_1 || x_2).
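A sketch of the direction scorer g as a 2-layer MLP on the concatenated embeddings follows. The weight shapes and the two-way softmax output (direction absent vs. present) are our illustrative reading of D_ijk with k ∈ {0, 1}.

```python
import numpy as np

def direction_scores(x1, x2, W1, b1, W2, b2):
    """g(x1, x2) = MLP(x1 || x2): scores the direction from the tail
    hypernode (embedding x1) to the head hypernode (embedding x2)."""
    z = np.concatenate([x1, x2])           # x1 || x2
    hidden = np.maximum(W1 @ z + b1, 0.0)  # ReLU hidden layer
    logits = W2 @ hidden + b2              # 2 logits: k = 0 (no), k = 1 (yes)
    e = np.exp(logits - logits.max())
    return e / e.sum()                     # (D_ij0, D_ij1)
```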
Table 3: mean AUC (higher is better) over 10 trials. NHP achieves consistently superior performance over its baselines for all the datasets. Refer to section 5 for more details.

Table 4: mean (± std) number of hyperlinks recovered over 10 trials (higher is better) among the top ranked |∆E| hyperlinks. NHP achieves consistently superior performance over its baselines for all the datasets. Refer to section 5 for more details.

dataset                                     iAF692     iHN637     iAF1260b    iJO1366     CORA        DBLP
SHC (Zhou et al., 2006)                     248 ± 6    289 ± 4    1025 ± 4    1104 ± 19   1056 ± 14   845 ± 18
node2vec (Grover & Leskovec, 2016)          299 ± 10   303 ± 4    1100 ± 13   1221 ± 21   1369 ± 15   813 ± 9
CMM (Zhang et al., 2018)                    170 ± 6    225 ± 10   827 ± 1     963 ± 15    1452 ± 13   651 ± 20
GCN on star expansion (Ying et al., 2018)   174 ± 5    219 ± 12   649 ± 10    568 ± 18    1003 ± 14   646 ± 15
NHP-U (ours)                                313 ± 6    360 ± 5    1258 ± 9    1381 ± 9    1476 ± 20   866 ± 15
# missing hyperlinks, |∆E|                  621        706        2149        2324        2437        1431
Recall@|∆E| for NHP-U                       0.50       0.51       0.58        0.59        0.60        0.61
For each dataset, we randomly sampled 10% of the hypernodes (of the dual hypergraph) to get E and then sampled an equal number of negatively-labelled hypernodes from 𝔼 − E in the positive-unlabelled setting of equation 3. To get the feature matrix X, we used random 32-dimensional Gaussian features (p = 32 in equation 1) for the metabolic networks and the bag-of-words features shown in table 2 for the co-authorship datasets. For the fake papers we generated random Gaussian bag-of-words features. We used node2vec (Grover & Leskovec, 2016) to learn a low-dimensional embedding map f : 𝔼 → R^d with d = 128, using the clique expansion of the dual hypergraph as the input graph to node2vec and cosine similarity to compute the similarity between two embeddings.
We compared NHP against the following state-of-the-art baselines, using the same 𝔼 as constructed above.
• Spectral Hypergraph Clustering (SHC) (Zhou et al., 2006): SHC outputs classification scores by f = (I − ξΘ)^(−1) y. We used SHC on the dual hypergraph.
• node2vec (Grover & Leskovec, 2016): One of the most popular node embedding approaches. We note that Grover & Leskovec (2016) have shown node2vec to be superior to DeepWalk (Perozzi et al., 2014) and LINE (Tang et al., 2015), and hence we compared against only node2vec. We used node2vec to embed the nodes of the clique expansion (of the dual) and then used an MLP on the embeddings with the semi-supervised objective of equation 2 in the positive-unlabelled setting of equation 3.
• Co-ordinated Matrix Maximisation (CMM) (Zhang et al., 2018): The matrix
factorisation-based CMM technique uses the EM algorithm to determine the presence
or absence of candidate hyperlinks.
• GCN on star expansion (Ying et al., 2018): PinSage (Ying et al., 2018) is a GCN-based method designed to work on the (web-scale) bipartite graph of Pinterest. The Pinterest graph can be seen as the star expansion of a hypergraph (Agarwal et al., 2006), with pins (hypernodes) on one side of the partition and boards (hyperlinks) on the other. This baseline approximates the hypergraph with its star expansion (instead of the clique expansion of NHP) and runs a GCN over it.
Similar to (Zhang et al., 2018), we report the mean AUC over 10 trials in table 3 and the mean number of hyperlinks recovered among the top-ranked |∆E| candidates over 10 trials in table 4. Note that ∆E ⊂ 𝔼 is the set of missing hyperlinks; since 10% of the hyperlinks are observed as E, |∆E| corresponds to the remaining 90%. As we can observe, we consistently outperform the baselines on both metrics. We believe this is because of the powerful non-linear feature extraction capability of GCNs. We also report Recall@|∆E| for NHP-U on all datasets (to make the numbers across datasets somewhat comparable); it is obtained by dividing the mean number of hyperlinks recovered by |∆E|.
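Concretely, the metric can be computed as below (a sketch under the stated protocol; `scores` over the candidate set and the boolean `missing_mask` are assumed inputs). The iAF692 numbers in the final comment come from tables 3 and 4.

```python
import numpy as np

def recall_at_delta_e(scores: np.ndarray, missing_mask: np.ndarray) -> float:
    """Rank all candidates by score, take the top |Delta E| of them, and
    report the fraction of truly missing hyperlinks recovered."""
    k = int(missing_mask.sum())      # |Delta E|
    top_k = np.argsort(-scores)[:k]  # highest-scored candidates
    return float(missing_mask[top_k].sum()) / k

# e.g. for iAF692: |Delta E| = 621 and NHP-U recovers 313 hyperlinks on
# average, giving Recall@|Delta E| of about 313 / 621 ~= 0.50.
```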
Table 6: mean AUC over 10 trials for all the datasets. Both the proposed models achieve similar
results. Refer to section 6 for more details.
Table 7: mean (± std) number of hyperlinks recovered over 10 trials (higher is better) among the top
ranked |∆E| hyperlinks. Both the proposed models achieve similar results. Refer to section 6 for
more details.
We used the same four metabolic networks to construct directed hyperlinks. The metabolic reactions are encoded by stoichiometric matrices: negative entries in a stoichiometric matrix indicate reactants and positive entries indicate products. We extracted only those reactions with at least two substances on each of the reactant and product sides; the statistics are shown in table 5. We labelled a randomly sampled 10% of the hyperlinks in the data and used the remaining 90% as unlabelled data for testing. Tables 6 and 7 show the results on the datasets. NHP-D (joint) is the model proposed in section 4.2. NHP-D (sequential), on the other hand, treats undirected hyperlink prediction and direction prediction separately: it first runs a GCN on the clique expansion of the undirected hypergraph to get the node embeddings (without softmax) and then runs a multi-layer perceptron on the concatenated embeddings to predict the directions.
Table 8: mean AUC (higher is better) over 10 trials. Mixed consistently achieves the best AUC values. It provides benefits of both positive unlabeled learning and random negative sampling. Refer to section 7 for more details.

Table 9: mean (± std) number of hyperlinks recovered over 10 trials among the top ranked |∆E| hyperlinks. Positive unlabeled learning of equation 3 achieves consistently lower standard deviations than the other two. The standard deviations of random negative sampling are on the higher side. Refer to section 7 for more details.

dataset                       iAF692     iHN637     iAF1260b    iJO1366      CORA        DBLP
random negative sampling      236 ± 32   415 ± 47   967 ± 125   1074 ± 168   978 ± 181   543 ± 125
positive-unlabeled learning   313 ± 6    360 ± 5    1258 ± 9    1381 ± 9     1476 ± 20   866 ± 15
mixed                         317 ± 11   481 ± 13   1202 ± 29   1336 ± 34    1496 ± 62   813 ± 36
# missing hyperlinks, |∆E|    621        706        2149        2324         2437        1431
• node2vec (Grover & Leskovec, 2016) + MLP: We used node2vec for the undirected hyperlink prediction part (as explained in section 5) and a 2-layer perceptron to predict the direction between hyperlinks, with the joint objective of equation 4.
• Co-ordinated Matrix Maximisation (CMM) (Zhang et al., 2018) + MLP: The matrix factorisation-based CMM technique uses the EM algorithm to determine the presence or absence of candidate hyperlinks for the following optimisation problem:

min_{Λ,W} ‖A + U Λ U^T − W W^T‖_F^2
As we see in the tables, NHP-D (joint) and NHP-D (sequential) perform similarly. This can be attributed to the fact that the training data for predicting directions between hyperlinks is sparse, and hence the learned hypernode representations of the two models are similar. Please note that existing approaches for link prediction on directed simple graphs cannot be trivially adapted to this problem because of the sparsity of the training data.
Comparison to baselines: NHP-D outperforms the baselines on 3 out of 4 datasets. The dataset iHN637 seems to be very challenging: on it, every model recovers fewer than half of the missing hyperlinks.
We have called sampling negative examples uniformly at random from 𝔼 − E random negative sampling. We have called negative samples chosen through the positive-unlabelled learning of equation 3, i.e. NHP-U, positive-unlabeled learning. Note that the numbers corresponding to this row are the same as those in tables 3 and 4.

In addition to the above two, we used the positive-unlabelled learning technique of equation 3 to sort the (primal) hyperedges in non-decreasing order of their similarities and then selected uniformly at random from only the first half of the sorted order (i.e. the most dissimilar hyperedges). We have called this technique mixed as, intuitively, it provides benefits of both positive-unlabelled learning and uniform random negative sampling. More principled approaches than mixed are left for future work.
DISCUSSION OF RESULTS
As we can see in table 9, the standard deviations of random negative sampling are on the higher side. This is expected, as the particular choice of negative samples decides the decision boundary of the binary classifier. The superior AUC values of mixed in table 8 support our intuition that it provides benefits of both positive-unlabelled learning and uniform random negative sampling. The standard deviations of mixed are much lower, but still higher than those of positive-unlabeled learning. In general, summarising the results across all datasets, we believe that positive-unlabeled learning is superior to random negative sampling because of its higher-confidence (lower standard deviation) predictions.
REFERENCES
Sameer Agarwal, Kristin Branson, and Serge J. Belongie. Higher order learning with graphs. In ICML, 2006.
Devanshu Arya and Marcel Worring. Exploiting relational information in social networks using geometric deep learning on hypergraphs. In ICMR, 2018.
Haoli Bai, Zhuangbin Chen, Michael R. Lyu, Irwin King, and Zenglin Xu. Neural relational topic models for scientific article analysis. In CIKM, 2018.
A. L. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert, and T. Vicsek. Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications, 2002.
Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: Going beyond euclidean data. IEEE Signal Process. Mag., 2017.
Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. In ICLR, 2014.
N. Chen, J. Zhu, F. Xia, and B. Zhang. Discriminative relational topic models. PAMI, 2015.
Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, 2016.
David K. Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alan Aspuru-Guzik, and Ryan P. Adams. Convolutional networks on graphs for learning molecular fingerprints. In NIPS, 2015.
Charles Elkan and Keith Noto. Learning classifiers from only positive and unlabeled data. In KDD, 2008.
Fuli Feng, Xiangnan He, Yiqun Liu, Liqiang Nie, and Tat-Seng Chua. Learning on partial-order hypergraphs. In WWW, 2018.
Yifan Feng, Haoxuan You, Zizhao Zhang, Rongrong Ji, and Yue Gao. Hypergraph neural networks. In AAAI, 2019.
Giorgio Gallo, Giustino Longo, Stefano Pallottino, and Sang Nguyen. Directed hypergraphs and applications. Discrete Appl. Math., 1993.
Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In ICML, 2017.
Marco Gori, Gabriele Monfardini, and Franco Scarselli. A new model for learning in graph domains. In IJCNN, 2005.
Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In KDD, 2016.
William L. Hamilton, Rex Ying, and Jure Leskovec. Representation learning on graphs: Methods and applications. IEEE Data Eng. Bull., 2017.
David K. Hammond, Pierre Vandergheynst, and Rémi Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 2011.
Yufei Han and Yun Shen. Partially supervised graph embedding for positive unlabelled feature selection. In IJCAI, 2016.
Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graph-structured data. CoRR, arXiv:1506.05163, 2015.
Thomas N. Kipf and Max Welling. Variational graph auto-encoders. NIPS Workshop on Bayesian Deep Learning, 2016.
Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
Dong Li, Zhiming Xu, Sheng Li, and Xin Sun. Link prediction in social networks based on hypergraph. In WWW, 2013.
Qimai Li, Zhichao Han, and Xiao-Ming Wu. Deeper insights into graph convolutional networks for semi-supervised learning. In AAAI, 2018.
David Liben-Nowell and Jon Kleinberg. The link prediction problem for social networks. In CIKM, 2003.
Jose Lugo-Martinez and Predrag Radivojac. Classification in biological networks with hypergraphlet kernels. CoRR, arXiv:1703.04823, 2017.
Linyuan Lü and Tao Zhou. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 2011.
Federico Monti, Michael Bronstein, and Xavier Bresson. Geometric matrix completion with recurrent multi-graph neural networks. In NIPS, 2017.
Federico Monti, Oleksandr Shchur, Aleksandar Bojchevski, Or Litany, Stephan Günnemann, and Michael M. Bronstein. Dual-primal graph convolutional networks. CoRR, arXiv:1806.00770, 2018.
Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 2016.
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In KDD, 2014.
Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 2009.
Edward R. Scheinerman and Daniel H. Ullman. Fractional graph theory: A rational approach to the theory of graphs. Courier Corporation, 2011.
Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In ESWC, 2018.
Ankit Sharma, Jaideep Srivastava, and Abhishek Chandra. Predicting multi-actor collaborations using hypergraphs. CoRR, arXiv:1401.6404, 2014.
Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale information network embedding. In WWW, 2015.
Rianne van den Berg, Thomas N. Kipf, and Max Welling. Graph convolutional matrix completion. KDD Deep Learning Day, 2018.
Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In ICLR, 2018.
Hao Wang, Xingjian Shi, and Dit-Yan Yeung. Relational deep learning: A deep latent variable model for link prediction. In AAAI, 2017.
Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. Graph convolutional neural networks for web-scale recommender systems. In KDD, 2018.
Muhan Zhang and Yixin Chen. Link prediction based on graph neural networks. In NIPS, 2018.
Muhan Zhang, Zhicheng Cui, Shali Jiang, and Yixin Chen. Beyond link prediction: Predicting hyperlinks in adjacency space. In AAAI, 2018.
Dengyong Zhou, Jiayuan Huang, and Bernhard Schölkopf. Learning with hypergraphs: Clustering, classification, and embedding. In NIPS, 2006.
APPENDIX
We used the same hyperparameters as Kipf & Welling (2017) for the 2-layer GCN model for all the datasets in all the experiments.
Additionally, for the 2-layer multi-layer perceptron used in the directed hyperlink prediction experiments, we used 16 hidden units with a dropout rate of 0.25.
hyperparameter             value
number of hidden units     16
number of hidden layers    2
dropout rate               0.5
L2 regularisation          5 × 10^−4
learning rate              0.01
non-linearity              ReLU

Table 10: Hyperparameters of the GCN used for all the datasets.
• DBLP: We used the DBLP database v4 (https://aminer.org/lab-datasets/citation/DBLP-citation-Jan8.tar.bz2). We filtered out papers without abstracts and processed each abstract by tokenising it and removing stop-words. Further, we filtered out papers with only one author. This left 540532 papers.
In order to ensure that the hypergraph formed would be sufficiently dense, we counted the number of papers authored by each author and took the top 1000 authors as 'selected authors'. We then filtered out the papers that were not authored by at least three of the selected authors. Finally, we were left with 1590 papers by 685 of the original 1000 selected authors.
To extract word features from each of these abstracts, we took all words appearing in these abstracts with a frequency greater than 50. Each abstract was thus represented by a 602-dimensional bag-of-words representation (a sketch of this construction follows the list).
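A minimal sketch of the bag-of-words construction described above, assuming each abstract is already tokenised; the frequency threshold follows the text (corpus frequency greater than 50), while the function and variable names are ours.

```python
from collections import Counter
from typing import List, Tuple

def bow_features(token_lists: List[List[str]],
                 min_freq: int = 50) -> Tuple[List[List[int]], List[str]]:
    """Keep words whose total corpus frequency exceeds min_freq and
    represent each abstract by its counts over that vocabulary."""
    freq = Counter(tok for toks in token_lists for tok in toks)
    vocab = sorted(w for w, c in freq.items() if c > min_freq)
    index = {w: i for i, w in enumerate(vocab)}
    features = []
    for toks in token_lists:
        row = [0] * len(vocab)
        for tok in toks:
            if tok in index:
                row[index[tok]] += 1
        features.append(row)
    return features, vocab  # for DBLP this gives a 602-word vocabulary
```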
For both datasets, we randomly sampled |E| fake papers according to the author distribution of the existing non-fake papers (2708 for CORA and 1590 for DBLP). We randomly generated p-dimensional Gaussian features for these fake papers (p = 1433 for CORA and p = 602 for DBLP).