Dynamic Hypergraph Neural Networks

Jianwen Jiang, Yuxuan Wei, Yifan Feng, Jingxuan Cao, Yue Gao
Figure 2: The DHGNN framework. The first frame shows the hypergraph construction process on a centroid vertex (the star) and its neighbors. For instance, two hyperedges are generated from two clusters (dashed ellipses). In the second frame, the features of the vertices contained in a hyperedge are aggregated into a hyperedge feature through vertex convolution, and the features of adjacent hyperedges are aggregated into the centroid vertex feature through hyperedge convolution. After performing these operations for all vertices on the current layer's feature embedding, we obtain the new feature embedding on which a new hypergraph structure is constructed, as shown in the third frame.
hypergraph convolution (HGC) module. In the DHG module, we use the k-NN method and k-means clustering to update the hypergraph structure based on local and global features, respectively, during a single inference process. Furthermore, we propose a hypergraph convolution method in the HGC module as a stack of vertex convolution and hyperedge convolution. For vertex convolution, we use a transform matrix to permute and weight the vertices in a hyperedge; for hyperedge convolution, we use an attention mechanism to aggregate adjacent hyperedge features to the centroid vertex. Compared with the hypergraph-based deep learning method HGNN, our convolution module better fuses the local and global information provided by the DHG module.

We have applied our model to data with and without inherent graph structure. For data with inherent graph structure, we conducted an experiment on a citation network benchmark, the Cora dataset [Sen et al., 2008], for the node classification task. In this experiment, we used DHGNN to jointly learn embeddings from the given graph structure and a hypergraph structure built from the feature space. For data without inherent graph structure, an experiment was conducted on a social media dataset, the Microblog dataset [Ji et al., 2019], for the sentiment prediction task. In this experiment, a multi-hypergraph was constructed to model the complex relations among multimodal data.

Our contributions are summarized as follows:

1. We propose a dynamic hypergraph construction method, which adopts the k-NN method to generate the basic hyperedge and extends the adjacent hyperedge set by a clustering algorithm, i.e., k-means clustering. With this dynamic hypergraph construction method, both local and global relations are extracted.

2. We conducted experiments on network-based classification and social media sentiment prediction. On the network-based task, our method outperforms state-of-the-art methods and shows higher robustness to different data distributions. On social media sentiment prediction, we observe performance improvements against state-of-the-art methods.

The rest of the paper is organized as follows. Section 2 introduces related work in graph-based deep learning and hypergraph learning. Section 3 explains the proposed dynamic hypergraph neural networks method. Applications and experimental results are presented in Section 4. Finally, we draw conclusions in Section 5.

2 Related Work

In this section, we give a brief review of graph-based deep learning and hypergraph learning.

2.1 Graph-based Deep Learning

Semi-supervised learning on graphs has long been an active research field in deep learning. DeepWalk [Perozzi et al., 2014] and Planetoid [Yang et al., 2016] view sampled paths in graphs as random sequences and learn vector embeddings from these sequences.

After the great success of convolutional neural networks [Krizhevsky et al., 2012] in image processing, researchers have devoted much effort to designing convolutional methods for graph-based data. Existing graph neural network methods can be divided into two main categories: spectral methods and spatial methods.

Based on spectral graph theory, spectral graph convolutional methods use graph Laplacian eigenvectors as the graph Fourier basis. After transforming features to the spectral domain, a spectral convolution operation is conducted on the spectral features. To overcome the expensive computational cost of Laplacian factorization, ChebyshevNet introduces Chebyshev polynomials to approximate the Laplacian eigenvectors [Defferrard et al., 2016]. GCN further simplifies the process and uses a first-order polynomial on each layer [Kipf and Welling, 2017].
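Concretely, the first-order layer rule that GCN arrives at (the standard form from [Kipf and Welling, 2017], reproduced here for reference) is

    H^{(l+1)} = \sigma\big(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\big), \qquad \tilde{A} = A + I,

where A is the graph adjacency matrix, \tilde{D} is the degree matrix of \tilde{A}, and W^{(l)} is the layer's learnable weight matrix.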
Different from spectral methods, spatial graph convolution methods leverage a spatial sampler and aggregator to generate neighborhood feature embeddings. MoNet defines a generic spatial convolution framework for deep learning on non-Euclidean domains [Monti et al., 2017]. GraphSAGE defines the sampler and aggregator in a graph neural network and tries LSTM as the neighborhood aggregator [Hamilton et al., 2017]. GAT introduces a self-attention mechanism and computes attention coefficients between node pairs [Veličković et al., 2018]. In the field of computer vision, DGCNN, a 3D point cloud learning method, also leverages the concept of spatial graph convolution in its model [Wang et al., 2018].

2.2 Hypergraph Learning

Hypergraph learning was first introduced by [Zhou et al., 2007] as a label propagation method for semi-supervised learning. This method aims to minimize the differences in labels of vertices that share the same hyperedge. [Huang et al., 2009] discusses construction methods for hypergraphs, including the k-NN method and the search radius method. More recent works concentrate on learning hyperedge weights, intending to assign larger weights to hyperedges or sub-hypergraphs with higher importance [Gao et al., 2012]. Besides learning label propagation on hypergraphs, dynamic hypergraph structure learning proposes to learn the hypergraph structure by a dual optimization process [Zhang et al., 2018]. Like graph neural networks, the hypergraph neural network (HGNN) has been proposed as the first deep learning method on hypergraph structure, employing the hypergraph Laplacian to represent the hypergraph from a spectral perspective [Feng et al., 2018].
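For reference, the HGNN layer performs spectral convolution on the hypergraph as

    X^{(l+1)} = \sigma\big(D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2} X^{(l)} \Theta^{(l)}\big),

where H is the incidence matrix, W the diagonal hyperedge weight matrix, D_v and D_e the vertex and hyperedge degree matrices, and \Theta^{(l)} the learnable filter (the form given in [Feng et al., 2018]).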
Hypergraphs have many applications. In computer vision, hypergraphs are used to describe relations among visual features for tasks like visual classification [Wang et al., 2015], image retrieval [Huang et al., 2010] and video object segmentation [Huang et al., 2009]. There are also works using hypergraph structure for label propagation in 3D model classification [Zhang et al., 2018]. In social media, MHG [Chen et al., 2015] and Bi-MHG [Ji et al., 2019] are proposed to deal with multimodal data.

3 Dynamic Hypergraph Neural Networks

In this section, we introduce the proposed dynamic hypergraph neural networks (DHGNN) in detail. As illustrated in Figure 2, a DHGNN layer consists of two major parts: dynamic hypergraph construction (DHG) and hypergraph convolution (HGC). We first introduce these two parts in the following subsections and then discuss the implementation of dynamic hypergraph neural networks in the last subsection.

3.1 Dynamic Hypergraph Construction

Given a feature embedding X = [x_1; x_2; ...; x_n], where x_i (i = 1, 2, ..., n) denotes the feature of the i-th sample, we construct a hypergraph G. In a hypergraph, a vertex u denotes a sample and a hyperedge e denotes a sample collection containing a flexible number of samples. Therefore, a hypergraph can be formulated as G = {V, E}, where V denotes the vertex set and E denotes the hyperedge set.

We use the symbol Con(e) to denote the vertex set that a hyperedge e contains, and the symbol Adj(u) to denote the hyperedge set composed of all hyperedges containing the vertex u, formulated as:

    Con(e) = {u_1, u_2, ..., u_{k_e}}    (1)
    Adj(u) = {e_1, e_2, ..., e_{k_u}}    (2)

where k_e and k_u are the number of vertices in hyperedge e and the number of hyperedges containing vertex u, respectively. Vertex u is defined as the centroid vertex of the hyperedge set Adj(u).

We combine the k-NN method and k-means clustering for dynamic hypergraph construction to exploit both local and global structure. On one hand, we compute the k − 1 nearest neighbors of each vertex u; these neighborhood vertices, together with the vertex u itself, form a hyperedge in Adj(u). On the other hand, we conduct the k-means algorithm on the whole feature map of each layer according to Euclidean distance; for each vertex, the nearest S − 1 clusters are assigned as the adjacent hyperedges of this vertex. The detailed procedure is described in Algorithm 1.

Algorithm 1 Hypergraph Construction
Input: input embedding X; hyperedge size k; adjacent hyperedge set size S
Output: hyperedge set G
Function: k-means clustering kMeans; k-nearest neighbor selection knn; distance function dis; index selection of the S − 1 smallest distances topK
1: C = kMeans(X)
2: for u in range(len(X)) do
3:   e_b = knn(X[u], X, k)
4:   G[u].insert(e_b)
5:   D = dis(C.centers, X[u])
6:   ind = topK(D, S − 1)
7:   for i in ind do
8:     G[u].insert(C[i])
9:   end for
10: end for

We perform this procedure on the feature embedding of each layer. In particular, we initialize the hypergraph structure with the input feature embedding. The hyperedge set is therefore dynamically adjusted as the feature embedding evolves with the network going deeper. In this way, we are able to obtain a better hypergraph structure for high-order data relation modeling with deep neural networks.
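To make Algorithm 1 concrete, the following is a minimal NumPy/scikit-learn sketch of the construction step (our illustration, not the authors' released code; the function and variable names are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

def construct_hypergraph(X, k, S, n_clusters):
    """Sketch of Algorithm 1: for every vertex u, build Adj(u) from one
    local k-NN hyperedge plus the S-1 nearest k-means cluster hyperedges."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    # Con(c) for each cluster hyperedge c: the vertices assigned to it.
    clusters = [np.where(km.labels_ == c)[0] for c in range(n_clusters)]

    adj = []
    for u in range(X.shape[0]):
        # Local hyperedge: u and its k-1 nearest neighbors (u has distance 0).
        e_knn = np.argsort(np.linalg.norm(X - X[u], axis=1))[:k]
        hyperedges = [e_knn]
        # Global hyperedges: the S-1 clusters whose centers are nearest to u.
        d_cent = np.linalg.norm(km.cluster_centers_ - X[u], axis=1)
        hyperedges += [clusters[c] for c in np.argsort(d_cent)[:S - 1]]
        adj.append(hyperedges)
    return adj
```

Each Adj(u) thus contains S hyperedges: one from the local k-NN neighborhood and S − 1 from the global clustering, matching the combination described above.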
3.2 Hypergraph Convolution

Hypergraph convolution is composed of two submodules: a vertex convolution submodule and a hyperedge convolution submodule. Vertex convolution aggregates vertex features into a hyperedge feature, and hyperedge convolution then aggregates the adjacent hyperedge features into the centroid vertex feature.

Vertex Convolution
Vertex convolution aggregates vertex features to the hyperedge containing these vertices. A simple solution is pooling.
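As a rough illustration of the two submodules (a learned transform matrix that permutes and weights the k vertices of a hyperedge, followed by attention over the adjacent hyperedge features), here is a PyTorch-style sketch; the MLP that produces the transform matrix and the attention scoring function are our assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VertexConv(nn.Module):
    """Aggregate the k vertex features of one hyperedge into a single
    hyperedge feature via a learned k-by-k transform matrix (assumed here
    to be produced by a small MLP from the flattened vertex features)."""
    def __init__(self, k, dim):
        super().__init__()
        self.trans = nn.Linear(k * dim, k * k)  # generates the transform matrix
        self.fc = nn.Linear(k * dim, dim)       # compresses transformed features

    def forward(self, Xe):                      # Xe: (k, dim) vertices of a hyperedge
        k = Xe.shape[0]
        T = self.trans(Xe.flatten()).view(k, k) # transform matrix: permute + weight
        Xt = T @ Xe                             # (k, dim) re-weighted vertex features
        return self.fc(Xt.flatten())            # (dim,) hyperedge feature

class HyperedgeConv(nn.Module):
    """Aggregate adjacent hyperedge features into the centroid vertex feature
    with softmax attention weights (the scoring function is our assumption)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, E):                       # E: (S, dim) adjacent hyperedge features
        w = F.softmax(self.score(E), dim=0)     # (S, 1) attention over hyperedges
        return (w * E).sum(dim=0)               # (dim,) new centroid vertex feature
```

Applying VertexConv to every hyperedge in Adj(u) and then HyperedgeConv to the resulting S hyperedge features yields the updated embedding of the centroid vertex u.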
With the dynamic hypergraph, we are able to deploy information from both the citation relations and the feature embedding relations in a uniform manner. To evaluate the performance of our method, a series of experiments was conducted on the public benchmark citation network dataset, the Cora dataset.

Cora dataset. The Cora dataset is a benchmark citation network dataset. It contains 2,708 vertices denoting academic papers and 5,429 edges denoting citation relations between pairs of papers. Each vertex has a bag-of-words feature vector and a category label indicating the subject the paper belongs to. There are 7 categories in total.

Experimental setup. We conducted experiments on different splits of the Cora dataset, including the standard split described in [Yang et al., 2016]. Because the standard split uses fixed training samples covering 5.2% of the dataset, a method may be influenced by this fixed data distribution. Therefore, for further comparison, we randomly sampled different proportions of the data as training sets to demonstrate the effectiveness of our method. The training proportion is selected as 2%, 5.2%, 10%, 20%, 30% and 44%, respectively. We compared our method with recent representative methods such as GCN [Kipf and Welling, 2017], HGNN [Feng et al., 2018] and GAT [Veličković et al., 2018]. The 10-run average accuracy is reported in Table 1 for comparison.

lr     #train   GCN     HGNN    GAT     DHGNN
std    140      81.5%   81.6%   83.0%   82.5%
2%     54       69.6%   75.4%   74.8%   76.9%
5.2%   140      77.8%   79.7%   79.4%   80.2%
10%    270      79.9%   80.0%   81.5%   81.6%
20%    540      81.4%   80.1%   83.5%   83.6%
30%    812      81.9%   82.0%   84.5%   85.0%
44%    1200     82.0%   81.9%   85.2%   85.6%

Table 1: Performance comparisons on Cora with different splits. "lr" stands for label rate, "#train" for the number of training samples and "std" for the standard split. The standard split experiment and the 5.2% split experiment share the same number of training samples; different from the standard split setting, samples in the 5.2% split are randomly selected. 44% is the largest possible training set size with the standard validation and test sets.

We used a 2-layer dynamic hypergraph neural network with a GCN-style input layer for feature dimension reduction. We used 400 cluster centers in the k-means clustering method and chose 64 as the receptive field size. We added two dropout layers with a dropout rate of 0.5 before the two convolutional layers.
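For convenience, the hyperparameters above can be collected in one place (a hypothetical config of ours, not the authors' code):

```python
# Hypothetical summary of the Cora setup described above.
cora_config = {
    "num_dhgnn_layers": 2,   # after a GCN-style input layer for dim. reduction
    "kmeans_clusters": 400,  # cluster centers per layer
    "receptive_field": 64,   # vertices sampled per hyperedge
    "dropout": 0.5,          # before each of the two convolutional layers
}
```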
Semi-supervised node classification. We compared DHGNN with the most recent graph- and hypergraph-based neural network methods on different dataset splits. Experimental results are listed in Table 1, showing that our method outperformed the state of the art by 1.5%, 0.5%, 0.1%, 0.1%, 0.5% and 0.4% when 2%, 5.2%, 10%, 20%, 30% and 44% of the data, randomly sampled, was used as the training set, respectively. Moreover, we observed that the hypergraph structure was relatively more competitive when the training set was smaller. The reason is that graph convolution only uses first-order adjacency relations, while hypergraph convolution utilizes high-order relations, which helps the label propagation process on a sparsely labelled graph.

Ablation experiments. To evaluate the effectiveness of the proposed dynamic hypergraph construction (DHG) module and hypergraph convolution (HGC) module, we conducted two ablation experiments on the Cora dataset, where each module was removed from the complete model in turn. We compared the ablated models against the complete model and investigated the influence of the hyperparameter k, which denotes the number of sampled vertices in a hyperedge. Results are shown in Figure 5. From the results, we observe that the DHG module and the HGC module always improve the performance of the baseline for different k. As k increases, the gain from both modules increases, indicating the effectiveness of our method. Notably, even when k = 4 (much smaller than the maximum degree in the Cora dataset, 169), our method still obtains performance similar to the other hypergraph-based learning method, i.e., HGNN, and when k = 128, our method outperforms HGNN by 0.9%. This implies that our method is able to aggregate neighborhood features better.

Figure 5: Ablation experiment on the dynamic hypergraph (DHG) module and the hypergraph convolution (HGC) module. For the model without dynamic hypergraph, we used the inherent graph structure on Cora for the convolution operation. For the model without hypergraph convolution, we substituted average pooling for vertex convolution and hyperedge convolution.

4.2 Microblog Sentiment Prediction

Apart from the experiments on the citation network, we also evaluated our model on a more complicated task, social media sentiment prediction. Multi-modality is an important feature of social media. We used a hypergraph to model the high-order relations among the different modalities. Specifically, each vertex denotes a tweet. Hyperedge sets were constructed according to each modality's features, and the hyperedges from the multiple modalities jointly represent the correlations between vertices. In our experiment, we used the Microblog dataset to evaluate our hypergraph model.

Microblog dataset. The Microblog dataset contains 5,550 tweets crawled from the Sina Microblog platform (https://www.weibo.com) from Feb. 2014 to Apr. 2014. Each tweet has three modalities: text, image and emoticon. We generated 2547-dimension bag-of-words textual features using the Chinese auto-segmentation system ICTCLAS [Zhang et al., 2003]. To generate visual features, we used SentiBank [Borth et al., 2013], a kind
of ANP detector library pre-trained on Twitter images, to transform the Microblog images into 1553-dimension feature vectors. For emoticons, we built an emoticon dictionary with 49 frequently-used emoticons and computed bag-of-emoticons features. Each tweet has a label indicating its emotional polarity (positive or negative). The task is to predict a tweet's emotional polarity from the multimodal features. There are 4,196 positive tweets and 1,354 negative tweets in the dataset.

Experimental setup. We followed the experimental setup in [Ji et al., 2019], where 4,650, 400 and 500 tweets were randomly selected as the training, validation and test sets, respectively. The 10-run average accuracy is reported for method evaluation. We used a 2-layer dynamic hypergraph neural network with a multi-input fully-connected layer for feature dimension reduction. The dimension of each modality's features was reduced to 32 before hypergraph convolution. We constructed three hyperedge sets for the three modalities respectively and merged these sets into one multimodal hyperedge set. For each modality, we use 400 cluster centers in the k-means clustering method, and the number of vertices contained in each cluster is 8. We select the 2 nearest clusters from the k-means clusters plus one k-NN cluster as the adjacent hyperedge set of each vertex. We use the same activation and dropout settings as in Section 4.1. We compare our model with recent approaches for multimodal sentiment prediction, such as Multi-kernel SVM [Zhang et al., 2011], the Cross-media Bag-of-words Model [Wang et al., 2014] and Bi-layer Multimodal Hypergraph Learning [Ji et al., 2019]. Experiments were conducted on an Nvidia GeForce GTX 1080 Ti GPU with 11 GB of memory and 10.6 TFLOPS of computing capacity.
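A minimal sketch of the multimodal construction just described, reusing the hypothetical construct_hypergraph helper from Section 3.1 (again our illustration; the hyperedge size k = 8 and S = 3 follow our reading of the setup above):

```python
def build_multimodal_hyperedges(features_by_modality, k=8, S=3, n_clusters=400):
    """features_by_modality: e.g. {"text": Xt, "image": Xi, "emoticon": Xe},
    each an (n_tweets, dim) array. S = 3 gives one k-NN hyperedge plus the
    2 nearest clusters per modality, as in the setup above."""
    per_modality = [
        construct_hypergraph(X, k=k, S=S, n_clusters=n_clusters)
        for X in features_by_modality.values()
    ]
    n = len(per_modality[0])
    # Merge the per-modality hyperedge sets into one multimodal set per vertex.
    return [[e for adj in per_modality for e in adj[u]] for u in range(n)]
```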
Microblog sentiment prediction. In this experiment, we ran DHGNN to fuse the features from multiple modalities for sentiment label prediction. Experimental results are shown in Table 2 and can be summarized as follows:

1. In terms of prediction accuracy, DHGNN achieved higher performance in the multimodal sentiment prediction task, with a 1.8% accuracy gain over the current state-of-the-art method.

2. In terms of time expense, DHGNN remarkably shortened training time compared with current state-of-the-art methods: 2,300 times as fast as Bi-MHG (58.5 h ≈ 210,600 s versus 1 m 32 s = 92 s) and 1.4 times as fast as HGNN.

Method                          Acc     Train Time
CBM-NB [Wang et al., 2014]      71.6%   -
CBM-LR [Wang et al., 2014]      79.9%   -
CBM-SVM [Wang et al., 2014]     81.6%   -
HGNN [Feng et al., 2018]        86.8%   2m11s
MHG noW [Chen et al., 2015]     87.3%   -
MHG [Chen et al., 2015]         88.6%   -
MMHG [Chen et al., 2015]        88.7%   -
Bi-MHG [Ji et al., 2019]        90.0%   58.5h
DHGNN (our method)              91.8%   1m32s

Table 2: Performance comparisons on the Microblog dataset.

These results indicate that our method outperformed the state-of-the-art method in both prediction accuracy and training speed. The experiments on the Microblog dataset demonstrate the effectiveness of our method in modeling the high-order relations among multimodal data.

4.3 Discussion

Discussion on accuracy. Compared with a statically initialized hypergraph structure, the dynamic hypergraph structure can better represent the data distribution in deeper layers. Compared with pooling and multi-layer perceptrons, hypergraph convolution uses a fixed-size, weight-shared learnable convolution kernel for feature extraction, and is thus better suited for information aggregation. The ablation experiments demonstrate the effectiveness of the dynamic hypergraph construction and the hypergraph convolution, respectively. That said, we also note that on the standard split of the Cora dataset, GAT performs better than DHGNN. The main reason is that in the standard split, the training set contains fixed samples, thus suffering from larger randomness and bias. In the other settings, we randomly sampled the training set 10 times and reported the average accuracy for comparison to suppress such randomness and bias.

Discussion on time complexity. Traditional hypergraph learning models like Bi-MHG involve iterative optimization and matrix inversion, thus suffering from a larger time cost than neural network models. Comparing HGNN and DHGNN, we find that the parameter number of both models is 0.133M, indicating that it takes roughly the same time to train one HGNN/DHGNN epoch. However, it takes 30 epochs on average for DHGNN to converge on the Microblog sentiment dataset, while it takes 200 epochs on average for HGNN to converge. Therefore, DHGNN trains faster on the Microblog sentiment dataset.

5 Conclusions

In this paper, we propose a dynamic hypergraph neural networks framework that updates the hypergraph structure on each layer. The method consists of two important modules: the dynamic hypergraph construction method and hypergraph convolution, where hypergraph convolution includes vertex convolution and hyperedge convolution for hypergraph neighborhood feature aggregation. We apply our model to citation network data and multimodal social media data for evaluation. The results demonstrate that our model achieves similar or better performance compared with state-of-the-art methods and is more robust to different data distributions. We also investigate the effectiveness of the dynamic hypergraph construction module and the hypergraph convolution module independently by ablation experiments. In our model, the k-NN method and k-means clustering are used in dynamic hypergraph construction. Future work can concentrate on better and more interpretable hypergraph construction methods.

Acknowledgments

This work was supported by the National Natural Science Funds of China (U1701262, U1801263, 61671267).