Heterogeneous Hypergraph Variational Autoencoder For Link Prediction
Abstract—Link prediction aims at inferring missing links or predicting future ones based on the currently observed network. This topic
is important for many applications such as social media, bioinformatics and recommendation systems. Most existing methods focus on
homogeneous settings and consider only low-order pairwise relations while ignoring either the heterogeneity or high-order complex
relations among different types of nodes, which tends to lead to a sub-optimal embedding result. This paper presents a method named
Heterogeneous Hypergraph Variational Autoencoder (HeteHG-VAE) for link prediction in heterogeneous information networks (HINs).
It first maps a conventional HIN to a heterogeneous hypergraph with a certain kind of semantics to capture both the high-order
semantics and complex relations among nodes, while preserving the low-order pairwise topology information of the original HIN. Then,
deep latent representations of nodes and hyperedges are learned by a Bayesian deep generative framework from the heterogeneous
hypergraph in an unsupervised manner. Moreover, a hyperedge attention module is designed to learn the importance of different types
of nodes in each hyperedge. The major merit of HeteHG-VAE lies in its ability to model multi-level relations in heterogeneous
settings. Extensive experiments on real-world datasets demonstrate the effectiveness and efficiency of the proposed method.
Index Terms—Heterogeneous information network, hypergraph, hyperedge attention, link prediction, variational inference
the proposed method in detail. Lastly, we explain and analyze the results from our experiments in Section 5, before concluding the paper in Section 6.

2 RELATED WORK

In this section, we first review the related studies about link prediction and then discuss hypergraph learning methods that are closely related to this work.

2.1 Link Prediction

Link prediction has been studied extensively in the machine learning communities [1], [9] with various applications [2], [3], [4], [5], [6], [8]. Existing traditional link prediction algorithms can be classified into two categories [29]: unsupervised and supervised methods. Unsupervised methods assign prediction scores to potential links based on the topology of the given networks, e.g., Adamic/Adar [30] captures the intuition that links are more likely to exist between pairs of nodes if they have more common neighbors. Moreover, path-based methods sum over the paths between two nodes to measure the likelihood of a link [9], [31]. For supervised methods, link prediction is often treated as a classification problem where the target class label indicates the presence or absence of a link between a pair of nodes, e.g., supervised random walks [32] learns a function that assigns strengths to edges in the network such that a random walker is more likely to visit the nodes to which new links will be created in the future, and DEDS [33] constructs ensemble classifiers for better prediction accuracy.

Recently, a number of network embedding techniques have been proposed for link prediction tasks, such as DeepWalk [10], LINE [11], and node2vec [12], which implicitly factorize some matrices for representation learning on the network. HSRL [34] considers both the local and global topologies of a network by recursively compressing an input network into a series of smaller networks, and then applies DeepWalk, node2vec, or LINE on each compressed network. Some emerging graph neural network (GNN) models that use different aggregation schemes for node embedding have also been successfully applied for representation learning on graphs. Graph convolutional network (GCN) [35] based methods, including GAE/VGAE [36], S-VGAE [37], Deep Generative Network Embedding (DGE) [38], relational graph convolutional networks (R-GCNs) [39], and Linear-AE/Linear-VAE [40], have been proposed for the link prediction task. SEAL [13] trains a classifier by mapping the subgraph patterns around each target link to link existence. GraphSAGE [41] concatenates the node's feature with mean/max/LSTM pooled neighborhood information. Graph attention networks (GAT) [42] aggregate neighborhood information according to trainable attention weights. Position-aware graph neural networks (P-GNNs) [43] aim to compute position-aware node embeddings. There are also some works on graph learning in non-Euclidean spaces [44], [45], e.g., hyperbolic space [46], to capture the scale-free and hierarchical structures of real-world graph data: hyperbolic graph neural networks [44] extend GNNs to learn graph embeddings in hyperbolic space, while hyperbolic graph convolutional networks [45] employ a trainable curvature learning layer to map Euclidean input features to embeddings in hyperbolic space.

Moreover, some recent works have focused on heterogeneous information networks [18], [19], [47], which have multiple types of links and complex dependency structures. Some methods, such as HIN2Vec [18] and metapath2vec [19], utilize meta-path [48] based structural information to capture the heterogeneity of the graph for node embedding. HEER [20] studies the problem of comprehensive transcription for a heterogeneous graph by leveraging edge representations and heterogeneous metrics for the final graph embedding. HGT [49] employs node- and edge-type dependent parameters to characterize the heterogeneous attention over each edge. There are also some knowledge graph based models designed under heterogeneous settings, including TransE [50] and ConvE [51]. TransE treats each edge as a triplet composed of two nodes and an edge type, and then translates embeddings for modeling multi-relational data. ConvE is similar to TransE but goes beyond simple distance or similarity functions by using deep neural networks to score the triplet. However, the heterogeneity of networks and the high-order relations and semantics that commonly exist among nodes have not been systematically explored in previous works.

2.2 Hypergraph Learning

A hypergraph is a generalization of a network (graph) in which a hyperedge can link more than two nodes [52]. Hypergraph learning [53] has been studied to model high-order correlations among data and has recently achieved successful applications in different domains [54], such as image ranking [55], 3D visual classification [56], [57], social relationship analysis [58], [59], and active learning [60]. More recently, hypergraph neural networks (HGNN) [61] and hypergraph convolutional networks (HyperGCN) [62] have been proposed as extensions of graph convolutional network frameworks. However, those methods are mainly designed for homogeneous settings where the nodes and hyperedges in a hypergraph are of the same type.

Different from the above-mentioned works, several works have been conducted in heterogeneous settings for network clustering [63], [64] and embedding [23], [24], [25]. In [63], an inhomogeneous hypergraph partitioning method is proposed that assumes different hyperedge cuts have different weights and assigns different costs to different cuts during hypergraph learning. In [64], p-Laplacians are proposed for submodular hypergraphs to solve the scalability problem of clique-expansion based methods [63]. In [23], object embeddings are learned based on events in heterogeneous information networks, where a hyperedge encompasses the objects participating in a specific event. In [24], a Deep Hyper-Network Embedding (DHNE) model is proposed to embed a hypergraph with indecomposable hyperedges, in which any subset of nodes in a hyperedge cannot form another complete hyperedge. In [25], HHNE is proposed based on the neural tensor network (NTN) [3] to learn a tuple-wise similarity score function for
Fig. 3. Hyperedge attention module.

In the node encoder, the sub-incidence matrix $H^{V_k}$ of the $k$th type of node is first projected to a common latent space $\tilde{Z}^{V_k}$:

$\tilde{Z}^{V_k} = f^{V_k}(H^{V_k} X^E W^{V_k} + b^{V_k})$,   (5)

where $X^E \in \mathbb{R}^{N \times F}$ can be the initial $F$-dimension hyperedge features if available; otherwise, $X^E = I$ is simply an identity matrix. $f^{V_k}(\cdot)$ is a non-linear activation function such as $\mathrm{ReLU}(x) = \max(0, x)$ or $\mathrm{Tanh}(x) = \frac{2}{1 + e^{-2x}} - 1$; here, we empirically select Tanh in the experiments. $W^{V_k} \in \mathbb{R}^{D_{in}^{V_k} \times D_{out}^{V_k}}$ and $b^{V_k} \in \mathbb{R}^{D_{out}^{V_k}}$ are the weight and bias learned by the encoder, where $D_{in}^{V_k}$ and $D_{out}^{V_k}$ are the dimensionalities of $X^E$ and $\tilde{Z}^{V_k}$, respectively.

Given the projected node embedding $\tilde{Z}^{V_k}$, two individual fully connected layers are then employed to estimate the means $\mu^{V_k}$ and variances $\sigma^{V_k}$ of $q(Z^{V_k} \mid H^{V_k}; \theta^{V_k})$:

$\mu^{V_k} = \tilde{Z}^{V_k} W_{\mu}^{V_k} + b_{\mu}^{V_k}$,   (6)

$\sigma^{V_k} = \tilde{Z}^{V_k} W_{\sigma}^{V_k} + b_{\sigma}^{V_k}$,   (7)

where $W_{\mu}^{V_k}, W_{\sigma}^{V_k} \in \mathbb{R}^{D_{out}^{V_k} \times D}$ are the two learnable weight matrices and $b_{\mu}^{V_k}, b_{\sigma}^{V_k} \in \mathbb{R}^{D}$ are biases, respectively. $D$ is the dimensionality of the final node embedding $Z^{V_k}$, which is sampled by the following process:

$Z^{V_k} = \mu^{V_k} + \sigma^{V_k} \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$.   (8)

In the hyperedge attention module (Fig. 3), each type-specific hyperedge embedding $\tilde{Z}^{E^{V_k}}$, obtained by projecting $\bar{H}^{V_k}$ via Eq. (9), is first transformed as

$\tilde{Z}_{trans}^{E^{V_k}} = \mathrm{Tanh}(\tilde{Z}^{E^{V_k}} W_{trans}^{E^{V_k}} + b_{trans}^{E^{V_k}})$,   (10)

where $W_{trans}^{E^{V_k}} \in \mathbb{R}^{D_{out}^{V_k} \times D_{trans}^{E}}$ and $b_{trans}^{E^{V_k}} \in \mathbb{R}^{D_{trans}^{E}}$ are the learnable weight and bias of the transformation, respectively, and $D_{trans}^{E}$ is the dimensionality of the transformed embedding.

Next, the importance weights $a^{E^{V_k}} \in \mathbb{R}^{N \times 1}$ of the $k$th type of node for all $N$ hyperedges are calculated based on the similarity between the transformed embedding $\tilde{Z}_{trans}^{E^{V_k}}$ and the type preference vector $P \in \mathbb{R}^{D_{trans}^{E} \times 1}$, which is randomly initialized and jointly updated during training:

$\tilde{a}^{E^{V_k}} = \tilde{Z}_{trans}^{E^{V_k}} P$,   (11)

and then the importance weights are normalized through a softmax function:

$a^{E^{V_k}} = \dfrac{\exp(\tilde{a}^{E^{V_k}})}{\sum_{k=1}^{K} \exp(\tilde{a}^{E^{V_k}})}$.   (12)

Then, the fused hyperedge embedding can be obtained as the weighted sum of the projected type-specific embeddings $\tilde{Z}^{E^{V_k}}$ according to the learned importance weights $a^{E^{V_k}}$:

$\tilde{Z}^{E} = \sum_{k=1}^{K} a^{E^{V_k}} \tilde{Z}^{E^{V_k}}$.   (13)

Given the fused hyperedge embedding $\tilde{Z}^{E}$, two individual fully connected layers are then employed to estimate the means $\mu^{E}$ and variances $\sigma^{E}$ of $q(Z^{E} \mid H^{V_1}, \ldots, H^{V_K}; \theta^{E})$:

$\mu^{E} = \tilde{Z}^{E} W_{\mu}^{E} + b_{\mu}^{E}$,   (14)

$\sigma^{E} = \tilde{Z}^{E} W_{\sigma}^{E} + b_{\sigma}^{E}$,   (15)

and the hyperedge embedding $Z^{E}$ is sampled analogously to Eq. (8):

$Z^{E} = \mu^{E} + \sigma^{E} \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$.   (16)
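To make the data flow of Eqs. (10)-(16) concrete, the following is a minimal NumPy sketch of the hyperedge attention module and the Gaussian sampling head. The shapes, the random initialization, and the choice of reading the linear output of Eq. (15) as a log-standard-deviation (to keep $\sigma$ positive) are our illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 6, 3                  # hyperedges and node types (hypothetical sizes)
D_out, D_trans, D = 16, 8, 4

# Stand-ins for the type-specific hyperedge embeddings of Eq. (9).
Z_e = [rng.standard_normal((N, D_out)) for _ in range(K)]

# Eq. (10): per-type non-linear transformation.
W_tr = [0.1 * rng.standard_normal((D_out, D_trans)) for _ in range(K)]
Z_tr = [np.tanh(Z_e[k] @ W_tr[k]) for k in range(K)]  # biases omitted

# Eq. (11): similarity to the learnable type preference vector P.
P = 0.1 * rng.standard_normal((D_trans, 1))
scores = np.concatenate([Z_tr[k] @ P for k in range(K)], axis=1)  # (N, K)

# Eq. (12): softmax over the K node types for each hyperedge.
a = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Eq. (13): fuse the projected type-specific embeddings with the weights.
Z_fused = sum(a[:, k:k + 1] * Z_e[k] for k in range(K))  # (N, D_out)

# Eqs. (14)-(16): Gaussian head and reparameterized sampling.
W_mu = 0.1 * rng.standard_normal((D_out, D))
W_sig = 0.1 * rng.standard_normal((D_out, D))
mu = Z_fused @ W_mu                              # Eq. (14)
sigma = np.exp(Z_fused @ W_sig)                  # Eq. (15), read as log-sigma
Z_E = mu + sigma * rng.standard_normal((N, D))   # Eq. (16)
print(Z_E.shape)                                 # (6, 4)
```

In a trained model the weights and the preference vector $P$ would of course be learned jointly by backpropagation rather than drawn at random.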
Given the sampled node embedding $Z^{V_k}$ and hyperedge embedding $Z^{E}$, the decoder reconstructs the incidence matrix as

$H_{ij}^{V_k} = \mathrm{Sigmoid}(Z_i^{V_k} (Z_j^{E})^{\top})$,   (17)

$p(H_{i,j}^{V_k} \mid Z_i^{V_k}, Z_j^{E}; \phi^{V_k}) = \mathrm{Ber}(H_{ij}^{V_k})$,   (18)

where $\mathrm{Sigmoid}(\cdot)$ is the sigmoid activation function. Finally, the overall process of HeteHG-VAE is described in Algorithm 1.

4.5 Link Prediction

Based on the inferred embeddings $Z^{V_k}$ and $Z^{E}$, the likelihood of a link between slave node $i$ and identifier node $j$ is measured by the similarity between the two embeddings, as follows:

$f_{sco}(Z_i^{V}, Z_j^{E}) = \mathrm{Sim}(Z_i^{V}, Z_j^{E})$,   (19)

where $\mathrm{Sim}(\cdot)$ is a similarity measure function, which can be either a simple similarity measure, such as Euclidean distance or cosine similarity, or a more complex edge classifier, such as logistic regression.
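The decoder of Eqs. (17)-(18) and the scoring function of Eq. (19) can be sketched as follows, with cosine similarity as one concrete choice of $\mathrm{Sim}(\cdot)$; the shapes and variable names are our assumptions rather than the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
M, N, D = 5, 6, 4                  # slave nodes, hyperedges, embedding size
Z_v = rng.standard_normal((M, D))  # node embeddings Z^{V_k}
Z_e = rng.standard_normal((N, D))  # hyperedge embeddings Z^E

# Eq. (17): Bernoulli parameters for reconstructing the incidence matrix;
# Eq. (18) then treats H_hat[i, j] as the mean of Ber(.).
H_hat = sigmoid(Z_v @ Z_e.T)       # (M, N)

# Eq. (19) with cosine similarity as Sim(.): score a candidate link between
# slave node i and identifier node j.
def f_sco(i, j):
    zi, zj = Z_v[i], Z_e[j]
    return float(zi @ zj / (np.linalg.norm(zi) * np.linalg.norm(zj)))

print(H_hat[0, 0], f_sco(0, 0))
```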
4.6 Connection With Previous Work

We choose the variational graph autoencoder (VGAE) [36], a recent VAE [28] based generative model for graph representation learning, as the base model to discuss the connection between the proposed model and previous work.

VGAE extends the variational autoencoder to graph-structured data by inferring the latent variables $z_i$ for each node $v_i$ as the node representations in the latent space:

$q(Z) = \prod_{i=1}^{M+N} q(z_i \mid A)$,   (20)

where $q(z_i \mid A) = \mathcal{N}(z_i \mid \mu_i, \mathrm{diag}(\sigma_i^2))$, $A \in \mathbb{R}^{(M+N) \times (M+N)}$ is the adjacency matrix of the graph, and $\mu_i$ and $\mathrm{diag}(\sigma_i^2)$ are the Gaussian parameters learned by two GCN branches as follows:

$\mu = \mathrm{GCN}_{\mu}(A, X), \quad \sigma = \mathrm{GCN}_{\sigma}(A, X)$.   (21)

Here, $\mathrm{GCN}(\cdot) = \tilde{A}\,\mathrm{ReLU}(\tilde{A} X W_0) W_1$ is a two-layer GCN, where $\tilde{A} = D^{-\frac{1}{2}}(A + I) D^{-\frac{1}{2}}$ is the symmetrically normalized adjacency matrix, with $I$ being the identity matrix and $D$ being the diagonal degree matrix of $A + I$. $X$ is the node feature matrix, $W_0$ and $W_1$ are the trainable weights, and $\mathrm{ReLU}(\cdot)$ is the activation function $\mathrm{ReLU}(x) = \max(0, x)$. Different from VGAE, in HeteHG-VAE, the Gaussian parameter $\mu$ of the node distribution is inferred by $\mu = \mathrm{Tanh}(H X W_0 + b_0) W_1 + b_1$.

The similarity between HeteHG-VAE and VGAE is that a message passing mechanism, including feature transformation and aggregation from neighbors, is employed in both models. For example, the graph convolution operation $\tilde{A} X W_0$ in VGAE aggregates the features of neighbors into the target node following a linear transform, and then another GCN layer is used to estimate the parameters of the Gaussian distribution. Similarly, the operation $H X W^V$ in HeteHG-VAE also generates the node embedding by aggregating the features of its incident hyperedges; then a following dense layer is used for distribution estimation, which is slightly different from VGAE. The main differences between HeteHG-VAE and VGAE lie in the model inputs and the manner of network embedding. VGAE takes the normalized adjacency matrix $\tilde{A} \in \mathbb{R}^{(M+N) \times (M+N)}$ as the input to learn the embeddings of both identifier nodes and slave nodes directly by aggregating features from neighbors in a homogeneous setting. In HeteHG-VAE, by contrast, the incidence matrix $H \in \mathbb{R}^{M \times N}$ and its duality $\bar{H} \in \mathbb{R}^{N \times M}$, transformed from the adjacency matrix $A$, are taken as inputs to infer the embeddings of slave nodes and identifier nodes, respectively. In the node encoder, the operation $H X W^V$ generates the embeddings of the nodes by aggregating the features of their incident hyperedges, while in the hyperedge encoder, the operation $\bar{H} X W^E$ generates the embeddings of the hyperedges by aggregating the features of their incident nodes. Moreover, considering the heterogeneity of the network, we first extract different sub-hypergraphs for different types of nodes in the form of the sub-incidence matrices $H^{V_k}$, and conduct type-aware node embedding in a heterogeneous setting. We also introduce the attention mechanism in the hyperedge for better embedding quality by learning the importance of different types of nodes in each hyperedge.

Algorithm 1. HeteHG-VAE
Require: $H = \{H^{V_k}\}_{k=1}^{K}$: Incidence matrices of hypergraph $G$ with $K$ types of nodes.
Ensure: $Z^V = \{Z^{V_k}\}_{k=1}^{K}$: Node embeddings; $Z^E$: Hyperedge embedding.
1: for epoch = 1 to $T$ do
2:  Project the observed space $H^{V_k}$ of the $k$th type of node to a common latent space $\tilde{Z}^{V_k}$ via Eq. (5);
3:  Sample the node embedding $Z^{V_k}$ from the estimated posterior distribution $q(Z^{V_k} \mid H^{V_k}; \theta^{V_k})$ via Eq. (8);
4:  Project the observed space $\bar{H}^{V_k}$ of high-order relations of the $k$th type of node to a common latent space $\tilde{Z}^{E^{V_k}}$ via Eq. (9);
5:  Learn the importance of each kind of node in each hyperedge via Eq. (12);
6:  Fuse the type-specific hyperedge embeddings $\tilde{Z}^{E^{V_k}}$ via Eq. (13);
7:  Sample the hyperedge embedding $Z^E$ from the estimated posterior distribution $q(Z^E \mid H^{V_1}, \ldots, H^{V_K}; \theta^E)$ via Eq. (16);
8:  Reconstruct $H^{V_k}$ from the estimated distribution $p(H_{i,j}^{V_k} \mid Z_i^{V_k}, Z_j^{E}; \phi^{V_k})$ via Eq. (18);
9:  Update HeteHG-VAE with its stochastic gradient by Eq. (4);
10: end for
11: return $Z^V$, $Z^E$;

5 EXPERIMENTS

In this section, we first describe the experimental setup, including datasets, baseline methods, parameter settings, and evaluation metrics. Then, we present the experimental results in comparison with the state-of-the-art methods on the link prediction task.

5.1 Experimental Setup

5.1.1 Datasets
TABLE 2
Statistics of Datasets

The density of a graph is $\frac{2|E|}{|V|(|V|-1)}$. The identifier node type is marked in red and slave node types in blue.
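For reference, the density measure quoted in the footnote can be computed as below (a minimal sketch; the function name is ours):

```python
def graph_density(num_nodes: int, num_edges: int) -> float:
    """Density 2|E| / (|V| (|V| - 1)): the fraction of possible pairs linked."""
    return 2.0 * num_edges / (num_nodes * (num_nodes - 1))

print(graph_density(4, 3))  # toy graph with 4 nodes and 3 edges -> 0.5
```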
Four real-world plain networks [34], namely DBLP, Douban, IMDB, and Yelp, are used in this paper to evaluate the proposed method. During the experiments, we use the identity matrix as the feature matrix of the nodes and hyperedges, as in previous works [36], [40], [42]. The statistics of the datasets are shown in Table 2. Details of the datasets are as follows:

DBLP is a bibliographic network in computer science collected from four research areas: Machine Learning, Database, Data Mining, and Information Retrieval. Four types of objects are included in the network: paper, author, venue, and term.

Douban is a user-movie interest network from the user review website Douban in China,1 which contains four types of nodes: movies, directors, actors, and users.

IMDB is also a user-movie interest network, collected from the Internet Movie Database.2 Four types of nodes are contained in the network: movies, directors, actors, and users.

Yelp is a user-business network collected from the website Yelp in America.3 It contains four types of nodes: users, businesses, locations, and business categories.

5.1.2 Baseline Methods

We compare our proposed HeteHG-VAE with four kinds of graph representation learning models: homogeneous graph embedding models, including DeepWalk [10], LINE [11], node2vec [12], and HSRL [34]; heterogeneous graph embedding models, including HIN2Vec [18], metapath2vec [19], HEER [20], HGT [49], TransE [50], and ConvE [51]; graph neural network based models, including SEAL [13] and HGCN [45]; and graph autoencoders, including GAE [36], VGAE [36], S-VGAE [37], GAT-AE [42], Linear-AE [40], and Linear-VAE [40]. Details of these methods are as follows:

DeepWalk [10] is a random walk based graph embedding method which conducts random walks on each node to sample node sequences, and uses the Skip-Gram model to learn node embeddings by treating node sequences as sentences and nodes as words.

LINE [11] defines the first-order and second-order proximities to measure the similarity between nodes, and learns node embeddings by preserving the proximities of nodes in the embedding space.

node2vec [12] is a generalized version of DeepWalk that provides a trade-off between depth-first and breadth-first search for random walks on nodes.

HSRL [34] recursively compresses an input network into a series of smaller networks using a community-awareness compressing strategy, and then learns node embeddings based on the compressed networks by using DeepWalk, LINE, or node2vec.

HIN2Vec [18] exploits different types of relationships in the form of meta-paths in a heterogeneous graph by carrying out multiple prediction training tasks jointly, based on a target set of relationships, to learn latent vectors of nodes and meta-paths.

metapath2vec [19] employs meta-path based random walks with the skip-gram model for heterogeneous graph embedding.

HEER [20] studies the problem of comprehensive transcription for a heterogeneous graph by leveraging edge representations and heterogeneous metrics for the final graph embedding.

TransE [50] is a knowledge graph embedding model designed under heterogeneous settings. It treats each edge as a triplet composed of two nodes and an edge type, and then translates embeddings for modeling multi-relational data.

ConvE [51] is similar to TransE but goes beyond simple distance or similarity functions by using deep neural networks to score the triplet.

GAE/VGAE [36]: GAE is an autoencoder-based unsupervised model that learns both the topology and node contents of graph data, while VGAE is a variant of GAE in which a variational autoencoder is utilized for network embedding.

S-VGAE [37] replaces the Gaussian distribution of VGAE with the von Mises-Fisher distribution to learn the embedding in a hyperspherical latent space.

GAT-AE [42]: GAT explores attention mechanisms among node neighbors for node classification. We replace the classifier of GAT with an inner product decoder to construct an autoencoder based model, termed GAT-AE, for the link prediction task.

Linear-AE/Linear-VAE [40] are the linear versions of GAE/VGAE, obtained by replacing the GCN encoder with a simple linear model w.r.t. the adjacency matrix of the graph.

1. https://movie.douban.com/
2. https://www.imdb.com/
3. https://www.yelp.com
TABLE 3
Link Prediction Performance on DBLP and Douban
All values are percentages. The best results are marked in bold.
SEAL [13] is a link prediction framework which extracts a local enclosing subgraph around each target link, and uses a graph classifier based on DGCNN [68] for the final link prediction.

HGCN [45] is a hyperbolic graph neural network that employs GCNs with curvature learning layers to learn node embeddings in hyperbolic space.

HGT [49] is a transformer based self-attention model for heterogeneous network embedding which designs node- and edge-type dependent parameters to characterize the heterogeneous attention over each edge.

5.1.3 Experimental Design and Metrics

In the experiments, we implement HeteHG-VAE based on TensorFlow4 and train it with 200 iterations for the DBLP and Yelp datasets, and 300 iterations for the Douban and IMDB datasets, respectively. The Adam algorithm [69] is utilized for optimization with a learning rate of 0.001. The embedding size of all algorithms is set to 50 by default. Following the standard unsupervised network embedding settings, in the experiments, we randomly hide 20-80 percent of the existing edges as a test edge set and train all models with the remaining edges; a held-out validation set (5 percent of edges) is used to tune the hyper-parameters across all of the models. For all the models except SEAL and HGCN, we follow the strategy recommended by previous literature [12], [70] to train a logistic regression classifier for link prediction. We compute the feature representation of each edge by using the Hadamard operator to compute the element-wise product of the embeddings of the linked target nodes, and use a classifier to predict link existence based on the computed edge representation. For SEAL, the learned subgraph classifier is directly applied on the same sampled test edges for evaluation. For HGCN, we use the Fermi-Dirac predictor [71], [72], as used in the original paper [45], to predict the edge probability. We repeat the process five times, and the average performance and the mean variance are reported as the final results. Similar to previous studies [13], [18], [34], in this paper we use the widely used AUC score (the Area Under the receiver operating characteristic Curve) and AP score (Average Precision) to measure the link prediction accuracy for different models.

4. https://github.com/tensorflow/tensorflow
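The edge-classification protocol described above can be sketched with scikit-learn as follows; the random embeddings and edge samples here are placeholders for the learned embeddings and the actual train/test splits.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
emb = rng.standard_normal((100, 50))          # placeholder node embeddings

# Hypothetical positive (existing) and negative (non-existing) edges.
pos = rng.integers(0, 100, size=(200, 2))
neg = rng.integers(0, 100, size=(200, 2))

# Hadamard operator: element-wise product of the two endpoint embeddings.
X = np.vstack([emb[pos[:, 0]] * emb[pos[:, 1]],
               emb[neg[:, 0]] * emb[neg[:, 1]]])
y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])

# Train a logistic regression link classifier on one half and report
# AUC/AP on the held-out half.
idx = rng.permutation(len(y))
tr, te = idx[: len(y) // 2], idx[len(y) // 2:]
clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
prob = clf.predict_proba(X[te])[:, 1]
print(roc_auc_score(y[te], prob), average_precision_score(y[te], prob))
```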
TABLE 4
Link Prediction Performance on IMDB and Yelp
All values are percentages. The best results are marked in bold.
All the experiments are conducted on an Ubuntu 16.04 server with 189 GB memory and an Intel(R) Xeon(R) CPU E5-2640 (2.4 GHz).

5.2 Experimental Results

5.2.1 Performance Analysis

In this section, we demonstrate the effectiveness of the proposed method by presenting the results of our model on the link prediction task, which is important in the real world to predict missing links or links that are likely to occur in the future in an information network, and provide a comparison with the state-of-the-art methods. The results of all compared methods on the DBLP, Douban, IMDB, and Yelp datasets are provided in Tables 3 and 4.

The experimental results show that the proposed HeteHG-VAE significantly outperforms all baselines across all the datasets. For example, with a 20 percent training ratio, we observe that HeteHG-VAE improves the AUC score and AP score over the best baseline by 6.89 percent (HGT) and 8.96 percent (HEER) on the DBLP dataset, and 8.25 percent (HEER) and 9.22 percent (HGT) on the IMDB dataset, respectively. With an 80 percent training ratio, HeteHG-VAE improves the AUC score and AP score over the best baseline by 3.11 percent (SEAL) and 4.37 percent (ConvE) on the DBLP dataset, and 7.74 percent (HGT) and 7.59 percent (S-VGAE) on the IMDB dataset, respectively, which demonstrates that high-order relations and semantics are useful for pairwise link prediction in HINs. Among those baselines, DeepWalk, LINE, node2vec, and HSRL learn embeddings by capturing various order proximity patterns of the network, while neglecting the heterogeneity of HINs.

TABLE 5
Time Efficiency Performance (CPU Time of Model Learning in Seconds for One Training Epoch)

Dataset          DBLP         Douban       IMDB          Yelp
GAE [36]         17.7 ± 0.2   1.01 ± 0.1   24.8 ± 0.3    10.1 ± 0.2
VGAE [36]        16.2 ± 0.3   0.93 ± 0.1   27.1 ± 1.3    9.3 ± 0.2
GAT-AE [42]      45.7 ± 1.4   2.02 ± 0.2   151.4 ± 10.1  18.3 ± 1.3
Linear-AE [40]   16.1 ± 0.9   0.98 ± 0.1   21.1 ± 0.2    9.57 ± 0.3
Linear-VAE [40]  15.3 ± 0.3   0.91 ± 0.1   23.08 ± 0.7   8.31 ± 0.3
HeteHG-VAE       3.7 ± 0.08   0.21 ± 0.01  0.69 ± 0.02   2.4 ± 0.05
Fig. 4. Link prediction performance with/without hyperedge attention module.

Fig. 5. Box plots of attention weights learned by HeteHG-VAE for different types of node.

Heterogeneous methods such as HEER, ConvE, and HGT instead make use of different types of relationships between nodes for more reasonable node embeddings. Moreover, we find that SEAL, HGCN, and HGT achieve much better link prediction performance than the other baselines: SEAL trains a supervised link classifier based on the extracted local subgraph pattern around each target link, HGCN benefits from hyperbolic geometry, and HGT models heterogeneous edges according to the meta-relation schema for better representation learning. Compared with Euclidean GNN models such as GAE/VGAE, HGCN achieves better performance in most cases on all of the datasets except Douban. Since the performance gain of HGCN is correlated with graph hyperbolicity [45], this indicates that the underlying graph structure of Douban is more Euclidean than hyperbolic.

Different from the baselines, which neglect the high-order semantics and complex relations among different types of nodes, HeteHG-VAE considers both the different levels of relations among nodes and the heterogeneity of the network, and uses attention mechanisms to obtain more representative node embeddings. This explains its better performance on the link prediction task. Compared with five autoencoder based baselines, namely GAE, VGAE, GAT-AE, Linear-AE, and Linear-VAE, HeteHG-VAE achieves not only higher prediction accuracy but also better model training efficiency. As shown in Table 5, compared with GAE and VGAE, HeteHG-VAE achieves a significant reduction of CPU time during training on all four datasets. This is because of the simplicity of the linear layer and the small size of the input data used in HeteHG-VAE compared with those conventional graph autoencoders.

5.2.2 Effectiveness of Hyperedge Attention

As shown in Fig. 4, we conduct an analysis of the effect of the hyperedge attention module. The link prediction performance of HeteHG-VAE with/without the hyperedge attention module on different datasets is reported, and the results demonstrate the effectiveness of the hyperedge attention mechanism in considering the importance of different types of nodes. Moreover, we report the learned attention weights of different types of nodes (slave nodes) during the generation of the final hyperedge (identifier node) embedding in Fig. 5. We find that slave nodes with high degrees, such as venue (V), director (D), and location (L), tend to be associated with low attention weights, because they are likely shared by a large number of identifier nodes and thus play a less representative role in representing an identifier node (hyperedge). Nodes with types such as term (T) and user (U) are less shared among hyperedges and are more discriminative in representing a specific identifier node; therefore, they are assigned higher attention weights to represent a specific hyperedge.

5.2.3 Visualization

Different from conventional heterogeneous graph embedding methods, the proposed HeteHG-VAE aims at learning a high-order relation aware node embedding for the link prediction task. To gain a better insight into the difference between HeteHG-VAE and conventional graph based methods, we visualize the learned node embeddings via the t-SNE tool [73], which embeds the inferred node embeddings into a two-dimensional space.

Specifically, we take the low-dimensional embeddings of nodes learned by three different methods, namely DeepWalk, VGAE, and HeteHG-VAE, as the inputs to the t-SNE tool. Here, we randomly choose four target nodes, namely b1137, b5024, b5069, and b8602, and two sets of their neighbor nodes and non-neighbor nodes, respectively, to generate the node embeddings for visualization. As shown in Fig. 6, the red triangle indicates the target node, the orange filled circles are the neighbors linked to it, and the blue filled circles are randomly selected non-neighbors without links to it. As we can see, compared with DeepWalk and VGAE, the neighbor nodes with links to the target node are clustered and separated better from the non-neighbor nodes by HeteHG-VAE, which demonstrates that high-order relations facilitate better node embeddings that capture the structural and semantic proximity among nodes. The visualized results also explain why our approach is capable of achieving better performance on link prediction.

5.2.4 Embedding Dimension Sensitivity

We investigate the sensitivity of the embedding dimension D for link prediction. The experimental results are shown in Fig. 7. We can see that the performance is proportional to the size of the embedding dimension until 64 or 128, because higher dimensional embeddings are capable of encoding more information. However, when the number of dimensions keeps increasing, the performance tends to drop, because the model might suffer from overfitting on the observed edges and thus perform poorly on new edges. Overall, the performance of HeteHG-VAE is relatively stable within a large range of embedding dimensions.
Fig. 6. 2D t-SNE visualization of the latent embeddings on the Yelp dataset. (Orange points and blue points indicate the neighbor nodes and the randomly sampled non-neighbor nodes of the red triangle shaped target node, respectively.)
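A visualization in the spirit of Fig. 6 can be produced along the following lines; the embeddings and the neighbor/non-neighbor indices are placeholders for the learned ones.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
emb = rng.standard_normal((41, 50))  # target + 20 neighbors + 20 non-neighbors

# Embed the 50-dimensional vectors into 2D with t-SNE.
xy = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(emb)

plt.scatter(xy[1:21, 0], xy[1:21, 1], c="orange", label="neighbors")
plt.scatter(xy[21:, 0], xy[21:, 1], c="blue", label="non-neighbors")
plt.scatter(xy[0, 0], xy[0, 1], c="red", marker="^", s=120, label="target")
plt.legend()
plt.show()
```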
Fig. 7. Impact of different embedding dimensions.

REFERENCES
[1] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks," J. Amer. Soc. Inf. Sci. Technol., vol. 58, no. 7, pp. 1019–1031, 2007.
[2] L. Liao, X. He, H. Zhang, and T.-S. Chua, "Attributed social network embedding," IEEE Trans. Knowl. Data Eng., vol. 30, no. 12, pp. 2257–2270, Dec. 2018.
[3] R. Socher, D. Chen, C. D. Manning, and A. Ng, "Reasoning with neural tensor networks for knowledge base completion," in Proc. Int. Conf. Neural Inf. Process. Syst., 2013, pp. 926–934.
[4] Q. Wang, Z. Mao, B. Wang, and L. Guo, "Knowledge graph embedding: A survey of approaches and applications," IEEE Trans. Knowl. Data Eng., vol. 29, no. 12, pp. 2724–2743, Dec. 2017.
[5] A. Clauset, C. Moore, and M. E. Newman, "Hierarchical structure and the prediction of missing links in networks," Nature, vol. 453, no. 7191, pp. 98–101, 2008.
[6] M. Zitnik, M. Agrawal, and J. Leskovec, "Modeling polypharmacy side effects with graph convolutional networks," Bioinformatics, vol. 34, no. 13, pp. 457–466, 2018.
[7] W. Feng and J. Wang, "Incorporating heterogeneous information for personalized tag recommendation in social tagging systems," in Proc. 18th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2012, pp. 1276–1284.
[8] C. Shi, B. Hu, W. X. Zhao, and P. S. Yu, "Heterogeneous information network embedding for recommendation," IEEE Trans. Knowl. Data Eng., vol. 31, no. 2, pp. 357–370, Feb. 2019.
[9] L. Lü and T. Zhou, "Link prediction in complex networks: A survey," Physica A: Statist. Mech. Appl., vol. 390, no. 6, pp. 1150–1170, 2011.
[10] B. Perozzi, R. Al-Rfou, and S. Skiena, "DeepWalk: Online learning of social representations," in Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2014, pp. 701–710.
[11] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, "LINE: Large-scale information network embedding," in Proc. 24th Int. Conf. World Wide Web, 2015, pp. 1067–1077.
[12] A. Grover and J. Leskovec, "node2vec: Scalable feature learning for networks," in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2016, pp. 855–864.
[13] M. Zhang and Y. Chen, "Link prediction based on graph neural networks," in Proc. Int. Conf. Neural Inf. Process. Syst., 2018, pp. 5165–5175.
[14] S. Scellato, A. Noulas, and C. Mascolo, "Exploiting place features in link prediction on location-based social networks," in Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2011, pp. 1046–1054.
[15] M. Nickel, X. Jiang, and V. Tresp, "Reducing the rank in relational factorization models by including observable patterns," in Proc. Int. Conf. Neural Inf. Process. Syst., 2014, pp. 1179–1187.
[16] H. Zhao, L. Du, and W. Buntine, "Leveraging node attributes for incomplete relational data," in Proc. 34th Int. Conf. Mach. Learn., 2017, pp. 4072–4081.
[17] H. Wang, X. Shi, and D.-Y. Yeung, "Relational deep learning: A deep latent variable model for link prediction," in Proc. AAAI Conf. Artif. Intell., 2017, pp. 2688–2694.
[18] T.-Y. Fu, W.-C. Lee, and Z. Lei, "HIN2Vec: Explore meta-paths in heterogeneous information networks for representation learning," in Proc. ACM Conf. Inf. Knowl. Manage., 2017, pp. 1797–1806.
[19] Y. Dong, N. V. Chawla, and A. Swami, "metapath2vec: Scalable representation learning for heterogeneous networks," in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2017, pp. 135–144.
[20] Y. Shi, Q. Zhu, F. Guo, C. Zhang, and J. Han, "Easing embedding learning by comprehensive transcription of heterogeneous information networks," in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2018, pp. 2190–2199.
[21] C. Zhang, D. Song, C. Huang, A. Swami, and N. V. Chawla, "Heterogeneous graph neural network," in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2019, pp. 793–803.
[22] X. Wang et al., "Heterogeneous graph attention network," in Proc. World Wide Web Conf., 2019, pp. 2022–2032.
[23] H. Gui et al., "Embedding learning with events in heterogeneous information networks," IEEE Trans. Knowl. Data Eng., vol. 29, no. 11, pp. 2428–2441, Nov. 2017.
[24] K. Tu, P. Cui, X. Wang, F. Wang, and W. Zhu, "Structural deep embedding for hyper-networks," in Proc. AAAI Conf. Artif. Intell., 2018, pp. 426–433.
[25] I. M. Baytas, C. Xiao, F. Wang, A. K. Jain, and J. Zhou, "Heterogeneous hyper-network embedding," in Proc. IEEE Int. Conf. Data Mining, 2018, pp. 875–880.
[26] T. Chen and Y. Sun, "Task-guided and path-augmented heterogeneous network embedding for author identification," in Proc. 10th ACM Int. Conf. Web Search Data Mining, 2017, pp. 295–304.
[27] C. Zhang, C. Huang, L. Yu, X. Zhang, and N. V. Chawla, "Camel: Content-aware and meta-path augmented metric learning for author identification," in Proc. World Wide Web Conf., 2018, pp. 709–718.
[28] D. Kingma and M. Welling, "Auto-encoding variational Bayes," in Proc. Int. Conf. Learn. Representations, 2014.
[29] L. Duan, S. Ma, C. Aggarwal, T. Ma, and J. Huai, "An ensemble approach to link prediction," IEEE Trans. Knowl. Data Eng., vol. 29, no. 11, pp. 2402–2416, Nov. 2017.
[30] L. A. Adamic and E. Adar, "Friends and neighbors on the web," Soc. Netw., vol. 25, no. 3, pp. 211–230, 2003.
[31] L. Katz, "A new status index derived from sociometric analysis," Psychometrika, vol. 18, no. 1, pp. 39–43, 1953.
[32] L. Backstrom and J. Leskovec, "Supervised random walks: Predicting and recommending links in social networks," in Proc. 4th ACM Int. Conf. Web Search Data Mining, 2011, pp. 635–644.
[33] Y.-L. Chen, M.-S. Chen, and S. Y. Philip, "Ensemble of diverse sparsifications for link prediction in large-scale networks," in Proc. IEEE Int. Conf. Data Mining, 2015, pp. 51–60.
[34] G. Fu, C. Hou, and X. Yao, "Learning topological representation for networks via hierarchical sampling," in Proc. Int. Joint Conf. Neural Netw., 2019, pp. 1–8.
[35] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proc. Int. Conf. Learn. Representations, 2017.
[36] T. N. Kipf and M. Welling, "Variational graph auto-encoders," in Proc. NIPS Workshop Bayesian Deep Learn., 2016.
[37] T. R. Davidson, L. Falorsi, N. De Cao, T. Kipf, and J. M. Tomczak, "Hyperspherical variational auto-encoders," in Proc. 34th Conf. Uncertainty Artif. Intell., 2018, pp. 856–865.
[38] S. Zhou et al., "DGE: Deep generative network embedding based on commonality and individuality," in Proc. AAAI Conf. Artif. Intell., 2020, pp. 6949–6956.
[39] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van Den Berg, I. Titov, and M. Welling, "Modeling relational data with graph convolutional networks," in Proc. Eur. Semantic Web Conf., 2018, pp. 593–607.
[40] G. Salha, R. Hennequin, and M. Vazirgiannis, "Keep it simple: Graph autoencoders without graph convolutional networks," in Proc. Workshop Graph Representation Learn., 33rd Conf. Neural Inf. Process. Syst., 2019.
[41] W. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," in Proc. Int. Conf. Neural Inf. Process. Syst., 2017, pp. 1024–1034.
[42] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "Graph attention networks," in Proc. Int. Conf. Learn. Representations, 2018.
[43] J. You, R. Ying, and J. Leskovec, "Position-aware graph neural networks," in Proc. 36th Int. Conf. Mach. Learn., 2019, pp. 7134–7143.
[44] Q. Liu, M. Nickel, and D. Kiela, "Hyperbolic graph neural networks," in Proc. Int. Conf. Neural Inf. Process. Syst., 2019, pp. 8230–8241.
[45] I. Chami, Z. Ying, C. Ré, and J. Leskovec, "Hyperbolic graph convolutional neural networks," in Proc. Int. Conf. Neural Inf. Process. Syst., 2019, pp. 4868–4879.
[46] O. Ganea, G. Bécigneul, and T. Hofmann, "Hyperbolic neural networks," in Proc. Int. Conf. Neural Inf. Process. Syst., 2018, pp. 5345–5355.
[47] J. Tang, M. Qu, and Q. Mei, "PTE: Predictive text embedding through large-scale heterogeneous text networks," in Proc. 21st ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2015, pp. 1165–1174.
[48] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, "PathSim: Meta path-based top-K similarity search in heterogeneous information networks," Proc. VLDB Endowment, vol. 4, no. 11, pp. 992–1003, 2011.
[49] Z. Hu, Y. Dong, K. Wang, and Y. Sun, "Heterogeneous graph transformer," in Proc. Web Conf., 2020, pp. 2704–2710.
[50] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, "Translating embeddings for modeling multi-relational data," in Proc. Int. Conf. Neural Inf. Process. Syst., 2013, pp. 2787–2795.
[51] T. Dettmers, P. Minervini, P. Stenetorp, and S. Riedel, "Convolutional 2D knowledge graph embeddings," in Proc. AAAI Conf. Artif. Intell., 2018, pp. 1811–1818.
[52] C. Berge, Graphs and Hypergraphs. Oxford, U.K.: Elsevier Science Ltd., 1985.
[53] D. Zhou, J. Huang, and B. Schölkopf, "Learning with hypergraphs: Clustering, classification, and embedding," in Proc. Int. Conf. Neural Inf. Process. Syst., 2007, pp. 1601–1608.
[54] Y. Gao, Z. Zhang, H. Lin, X. Zhao, S. Du, and C. Zou, "Hypergraph learning: Methods and practices," IEEE Trans. Pattern Anal. Mach. Intell., early access, Nov. 19, 2020, doi: 10.1109/TPAMI.2020.3039374.
[55] Y. Huang, Q. Liu, S. Zhang, and D. N. Metaxas, "Image retrieval via probabilistic hypergraph ranking," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2010, pp. 3376–3383.
[56] H. Shi et al., "Hypergraph-induced convolutional networks for visual classification," IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 10, pp. 2963–2972, Oct. 2019.
[57] Z. Zhang, H. Lin, X. Zhao, R. Ji, and Y. Gao, "Inductive multi-hypergraph learning and its application on view-based 3D object classification," IEEE Trans. Image Process., vol. 27, no. 12, pp. 5957–5968, Dec. 2018.
[58] D. Yang, B. Qu, J. Yang, and P. Cudré-Mauroux, "Revisiting user mobility and social relationships in LBSNs: A hypergraph embedding approach," in Proc. World Wide Web Conf., 2019, pp. 2147–2157.
[59] P. Li, G. J. Puleo, and O. Milenkovic, "Motif and hypergraph correlation clustering," IEEE Trans. Inf. Theory, vol. 66, no. 5, pp. 3065–3078, May 2020.
[60] I. E. Chien, H. Zhou, and P. Li, "HS²: Active learning over hypergraphs with pointwise and pairwise queries," in Proc. 22nd Int. Conf. Artif. Intell. Statist., 2019, pp. 2466–2475.
[61] Y. Feng, H. You, Z. Zhang, R. Ji, and Y. Gao, "Hypergraph neural networks," in Proc. AAAI Conf. Artif. Intell., 2019, pp. 3558–3565.
[62] N. Yadati, M. Nimishakavi, P. Yadav, V. Nitin, A. Louis, and P. Talukdar, "HyperGCN: A new method for training graph convolutional networks on hypergraphs," in Proc. 33rd Conf. Neural Inf. Process. Syst., 2019, pp. 1509–1520.
[63] P. Li and O. Milenkovic, "Inhomogeneous hypergraph clustering with applications," in Proc. Int. Conf. Neural Inf. Process. Syst., 2017, pp. 2308–2318.
[64] P. Li and O. Milenkovic, "Submodular hypergraphs: p-Laplacians, Cheeger inequalities and spectral clustering," in Proc. Int. Conf. Mach. Learn., 2018, pp. 3014–3023.
[65] C. Shi, Y. Li, J. Zhang, Y. Sun, and P. S. Yu, "A survey of heterogeneous information network analysis," IEEE Trans. Knowl. Data Eng., vol. 29, no. 1, pp. 17–37, Jan. 2017.
[66] E. R. Scheinerman and D. H. Ullman, Fractional Graph Theory: A Rational Approach to the Theory of Graphs. Chelmsford, MA, USA: Courier Corporation, 2011.
[67] M. Probst and F. Rothlauf, "Harmless overfitting: Using denoising autoencoders in estimation of distribution algorithms," J. Mach. Learn. Res., vol. 21, no. 78, pp. 1–31, 2020.
[68] M. Zhang, Z. Cui, M. Neumann, and Y. Chen, "An end-to-end deep learning architecture for graph classification," in Proc. AAAI Conf. Artif. Intell., 2018, pp. 4438–4445.
[69] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. Int. Conf. Learn. Representations, 2015.
[70] A. Sankar, Y. Wu, L. Gou, W. Zhang, and H. Yang, "Dynamic graph representation learning via self-attention networks," in Proc. Workshop Representation Learn. Graphs Manifolds, 2019.
[71] D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguñá, "Hyperbolic geometry of complex networks," Physical Rev. E, vol. 82, no. 3, 2010, Art. no. 036106.
[72] M. Nickel and D. Kiela, "Poincaré embeddings for learning hierarchical representations," in Proc. Int. Conf. Neural Inf. Process. Syst., 2017, pp. 6338–6347.
[73] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res., vol. 9, no. 86, pp. 2579–2605, 2008.

Haoyi Fan is currently working toward the PhD degree at the School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China. His current research interests include graph data mining, time series analysis, and anomaly detection.

Fengbin Zhang received the PhD degree in computer application from Harbin Engineering University, Harbin, China, in 2005. He is currently a supervisor and professor at the Harbin University of Science and Technology. His current research interests include network and information security, firewall technology, and intrusion detection technology.

Yuxuan Wei received the BS degree in electrical engineering from Tsinghua University, Beijing, China, in 2019. He is currently working toward the MS degree in software engineering at Tsinghua University, Beijing, China. His research interests include graph neural networks and large-scale graph processing.

Zuoyong Li received the BS and MS degrees in computer science and technology from Fuzhou University, Fuzhou, China, in 2002 and 2006, respectively, and the PhD degree from the School of Computer Science and Technology, Nanjing University of Science and Technology (NUST), Nanjing, China, in 2010. He is currently a professor at the College of Computer and Control Engineering, Minjiang University, Fuzhou, China.

Changqing Zou received the BE degree from the Harbin Institute of Technology, Harbin, China, the ME degree from the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing, China, and the PhD degree from the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Beijing, China. He is currently a principal research scientist at Huawei, as well as a guest researcher at Sun Yat-sen University.

Yue Gao (Senior Member, IEEE) received the BS degree from the Harbin Institute of Technology, Harbin, China, and the ME and PhD degrees from Tsinghua University, Beijing, China. He is currently an associate professor at the School of Software, Tsinghua University.

Qionghai Dai (Senior Member, IEEE) received the MS and PhD degrees in computer science and automation from Northeastern University, Shenyang, China, in 1994 and 1996, respectively. He is currently a professor at the Department of Automation and the director of the Broadband Networks and Digital Media Laboratory, Tsinghua University, Beijing. He is also an academician of the Chinese Academy of Engineering. He has authored or coauthored more than 200 conference and journal papers and two books. His research interests include computational photography and microscopy, computer vision and graphics, and intelligent signal processing. He is an associate editor of the Journal of Visual Communication and Image Representation, the IEEE Transactions on Neural Networks and Learning Systems, and the IEEE Transactions on Image Processing.