
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23)

Totally Dynamic Hypergraph Neural Network


Peng Zhou¹, Zongqian Wu¹, Xiangxiang Zeng³, Guoqiu Wen¹*, Junbo Ma¹, Xiaofeng Zhu¹,²*

¹ Guangxi Key Lab of Multi-Source Information Mining and Security, Guangxi Normal University, Guilin 541004, China
² University of Electronic Science and Technology of China
³ Hunan University

Abstract

Recent dynamic hypergraph neural networks (DHGNNs) are designed to adaptively optimize the hypergraph structure to avoid the dependence on the initial hypergraph structure, thus capturing more hidden information for representation learning. However, most existing DHGNNs cannot adjust the hyperedge number and thus fail to fully explore the underlying hypergraph structure. This paper proposes a new method, namely, Totally Dynamic Hypergraph Neural Network (TDHNN), to adjust the hyperedge number for optimizing the hypergraph structure. Specifically, the proposed method first captures the hyperedge feature distribution to obtain dynamical hyperedge features rather than fixed ones, by sampling from the learned distribution. The hypergraph is then constructed based on the attention coefficients of both sampled hyperedges and nodes. The node features are dynamically updated by a simple hypergraph convolution algorithm. Experimental results on real datasets demonstrate the effectiveness of the proposed method compared to SOTA methods. The source code can be accessed via https://github.com/HHW-zhou/TDHNN.

1 Introduction

Graphs are widely used in real applications, including social networking [Berahmand et al., 2021], web search [Wang et al., 2021], and recommendation systems [Wu et al., 2022], since they can efficiently capture relationships among data. However, the construction of a regular graph is based on pairwise relations, which makes it difficult to describe complex relationships. Hypergraphs handle this problem naturally: by connecting every hyperedge with an arbitrary number of nodes, a hypergraph can capture complex relationships among nodes and thus has more expressive ability than regular graphs.

Previous hypergraph methods can be divided into two categories, i.e., static hypergraph neural networks (SHGNNs) and dynamic hypergraph neural networks (DHGNNs). SHGNNs conduct representation learning by using the initialized hypergraph structure to capture the relationships among nodes. To do this, HGNN [Feng et al., 2019] employed the convolution method and HGNN+ [Gao et al., 2022] introduced a spatial-based hypergraph convolution. However, SHGNNs are highly dependent on the initialized hypergraph structure, which usually contains redundant information and cannot discover hidden relationships among nodes. To address these issues, DHGNNs were proposed to learn latent connections from features and can mine more useful information. For instance, [Jiang et al., 2019] proposed to reconstruct the hypergraph using both kNN and k-means. [Bai et al., 2021] proposed using continuous values to construct the incidence matrix and an attention mechanism to learn the connection weights. DeepHGSL [Zhang et al., 2022] uses the hidden representations in multiple hypergraph convolutional layers to construct the hypergraph. HSL [Cai et al., 2022] samples hyperedges from the initial hypergraph structure (i.e., removing redundant hyperedges) and utilizes attention mechanisms to capture more relationships among nodes and hyperedges for the hypergraph construction.

Previous DHGNNs ignore the adjustment of the number of hyperedges (i.e., the hyperedge number), which makes it impossible to fully explore the hypergraph structure: no matter how the nodes are allocated, the hyperedge number is always the same. To address this issue, t-DHL [Gao et al., 2020] attempts to adaptively adjust the hyperedge number by projecting the hyperedge space onto a binary tensor. However, it is a traditional machine learning method and cannot be used in an end-to-end way, which makes it difficult to explore the relationships among nodes. Besides, we observe that it is essential to consider the hyperedge features when adjusting the hyperedge number. A hyperedge connects a set of nodes, so the common information among these nodes (the hyperedge features for short) can be used to characterize this set of nodes. Conversely, a hyperedge is not necessary if it cannot represent the common characteristics of a set of nodes under a specific measurement, e.g., the attention coefficients between the hyperedge and the nodes in this paper. In this way, the hyperedge number is updated together with the hyperedge features. However, the hyperedge features are unavailable for most datasets, and few studies have focused on considering the hyperedge features for hypergraph neural networks.

* Corresponding author ([email protected]). This work was supported in part by the National Key Research and Development Program of China under Grant No. 2022YFA1004100.

Figure 1: The architecture of the proposed TDHNN, involving four steps: (1) Hyperedge feature sampling randomly samples m hyperedges from a trainable hyperedge feature distribution; (2) Hyperedge feature update renews hyperedge features by the attention coefficients of the sampled hyperedge features and the node features; (3) Hypergraph construction builds the hypergraph by assigning nodes to the hyperedges; (4) Hypergraph convolution updates node features by a simple hypergraph convolutional layer.

To address the above issues, in this paper, we propose a new method, namely, Totally Dynamic Hypergraph Neural Network (TDHNN), shown in Figure 1, to learn dynamical hyperedge features for updating the hyperedge number. It includes four steps, i.e., hyperedge feature sampling, hyperedge feature update, hypergraph construction, and hypergraph convolution. The first two steps generate the hyperedge features, and the third step adjusts the hyperedge number; hence, the aforementioned issues in previous methods are addressed. In particular, both the hyperedge features and the hyperedge number are adaptively adjusted with the updated node features. As a result, our method avoids the influence of a low-quality initial hypergraph.

Different from previous methods, the main contributions of our proposed method are summarized as follows:

• We propose a new end-to-end dynamic hypergraph framework, which can dynamically adjust the hypergraph structure and the number of hyperedges.
• We propose a simple hypergraph convolution algorithm based on the learned hyperedge features.
• We propose a supervised constraint loss and an unsupervised constraint loss to improve the learned hypergraph.

2 Methodology

A hypergraph is defined as G = (V, E, H). V = {v_1, v_2, ..., v_n} is the set of all nodes, accompanied by the node feature matrix X_v ∈ R^{n×d_n}, where d_n is the feature dimension; E = {e_1, e_2, ..., e_m} is the set of all hyperedges, and each hyperedge e ∈ E constitutes a subset of V [Arya et al., 2020]. Similar to V, E should also have a feature matrix X_e ∈ R^{m×d_e}, although this feature matrix is unavailable in most datasets. H ∈ R^{n×m} is the incidence matrix, which implies the topology of the hypergraph. In the incidence matrix H, each row h_i represents the relationship between the i-th node and all hyperedges: h_{i,j} = 1 means there is a connection between the i-th node and the j-th hyperedge, and h_{i,j} = 0 means the opposite. h_{i,j} can also be a continuous value, indicating the connection strength between the i-th node and the j-th hyperedge. Since H implies the topology of a hypergraph, we also use H to denote the hypergraph for simplicity. The degree of a hyperedge e_i is the number of nodes contained in this hyperedge, denoted by δ(e_i). In this paper, given a set of nodes V and its feature matrix X_v, our goal is to learn a hyperedge feature distribution P(X_e | X_v), use this distribution to sample a suitable number of hyperedges to construct a hypergraph H, and then use hypergraph convolution to update the node features.
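To make the notation concrete, the following small example (our own illustration in PyTorch, not code from the TDHNN repository) builds the incidence matrix H for a toy hypergraph with n = 4 nodes and m = 2 hyperedges, where one node lies on both hyperedges.

```python
import torch

# Toy hypergraph: n = 4 nodes, m = 2 hyperedges.
# e1 = {v1, v2, v3}, e2 = {v3, v4}; node v3 lies on both hyperedges.
n, m = 4, 2
H = torch.zeros(n, m)            # incidence matrix H in R^{n x m}
H[[0, 1, 2], 0] = 1.0            # column 0: members of hyperedge e1
H[[2, 3], 1] = 1.0               # column 1: members of hyperedge e2

hyperedge_degree = H.sum(dim=0)  # delta(e): nodes per hyperedge -> tensor([3., 2.])
node_membership = H.sum(dim=1)   # hyperedges per node -> tensor([1., 1., 2., 1.])
print(H)
```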
2.1 Hyperedge Feature Sampling

Common methods for constructing hypergraphs, such as kNN [Huang et al., 2009], l1-hypergraph [Wang et al., 2015], or constructing the hypergraph from an existing graph structure [Fang et al., 2014], all build hyperedges centered at nodes, which means the number of hyperedges equals the number of nodes. As the sample size increases, the number of hyperedges also becomes very large. Not only does this cause many nodes to appear on multiple hyperedges, but it also reduces computational efficiency. If we had the features of hyperedges in advance and constructed hypergraphs centered on hyperedges, we could overcome these limitations. However, hyperedge features are unavailable for most datasets, which poses a challenge. To solve this problem, we propose to sample hyperedges from a trainable distribution P(X_e | X_v), and to use the relationship between the sampled hyperedges and the training samples to update the hyperedge features and this distribution. Specifically, we assume that each dimension of a hyperedge's features is independent and obeys a trainable Gaussian distribution (X_e)_{i,j} ∼ N(µ_j, diag(σ)_j), where µ ∈ R^{1×d_e} and σ ∈ R^{1×d_e}. At the very beginning, we randomly initialize this distribution and sample m times to get the initial hyperedge feature matrix X_e ∈ R^{m×d_e}. Since the sampling process is discrete and cannot produce gradients, we use the reparameterization trick [Kingma and Welling, 2013] to make µ and σ trainable.
That is, we first sample Q ∈ R^{m×d_e} from N(0, 1) and then use the following formula to obtain X_e:

(X_e)_i = µ + σ ⊙ Q_i,    (1)

where ⊙ denotes the Hadamard (element-wise) product.
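As a concrete reading of Eq. (1), the sketch below keeps µ and σ trainable and resamples m hyperedge features at every forward pass; it is an illustrative PyTorch module rather than the authors' released implementation, and parameterizing σ through its logarithm to keep it positive is our own choice.

```python
import torch
import torch.nn as nn

class HyperedgeSampler(nn.Module):
    """Samples m hyperedge features from a trainable Gaussian, Eq. (1)."""
    def __init__(self, d_e: int):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(1, d_e))         # mu in R^{1 x d_e}
        self.log_sigma = nn.Parameter(torch.zeros(1, d_e))  # sigma kept positive via exp

    def forward(self, m: int) -> torch.Tensor:
        # Q ~ N(0, 1), Q in R^{m x d_e}; gradients flow only through mu and sigma.
        q = torch.randn(m, self.mu.size(1), device=self.mu.device)
        return self.mu + self.log_sigma.exp() * q           # (X_e)_i = mu + sigma ⊙ Q_i

# Example: X_e = HyperedgeSampler(d_e=128)(m=100), resampled at every iteration.
```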
2.2 Hyperedge Feature Update

Subsequently, we propose to use attention coefficients to represent the correlation between each hyperedge and all nodes. Nevertheless, attention can only be computed if the hyperedge features and node features belong to the same feature space [Bai et al., 2021], so we first use a mapping function f(·) to map the input features X_v onto the feature space of the hyperedge features:

X̂_v = f(X_v) ∈ R^{n×d_e}.    (2)

Then we adopt the same attention calculation as the Transformer [Vaswani et al., 2017], with three trainable mapping functions q(·), k(·) and v(·):

α_{e_i, v_j} = exp( q((X_e)_i) · k((X̂_v)_j)^⊤ ) / Σ_l exp( q((X_e)_i) · k((X̂_v)_l)^⊤ ),    (3)

where A_e ∈ R^{m×n} is the attention matrix of hyperedges and α_{e_i, v_j} is the attention coefficient of hyperedge e_i and node v_j. After that, we aggregate the features of the top k_n nodes with the highest attention coefficients into the hyperedge:

X̂_e = MLP(concate(X_e, top_{k_n}(A_e) · v(X̂_v))),    (4)

where MLP(·) is a multi-layer perceptron and concate(·) denotes concatenation. Here, top_{k_n}(·) means that for each row of A_e, we keep the largest k_n values and set the rest to zero. In this way, we connect µ and σ with the input features X_v, so we can use backpropagation to update them. Note that we resample hyperedges at each iteration.
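A minimal sketch of Eqs. (2)–(4) could look as follows. The linear maps f, q, k, v, the two-layer MLP, and the masking used to realize top_{k_n}(·) are written directly from the equations above, so the layer sizes and other details are assumptions rather than the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperedgeUpdate(nn.Module):
    """Updates hyperedge features from node features, Eqs. (2)-(4)."""
    def __init__(self, d_n: int, d_e: int, k_n: int = 10):
        super().__init__()
        self.f = nn.Linear(d_n, d_e)   # maps X_v into the hyperedge feature space, Eq. (2)
        self.q = nn.Linear(d_e, d_e)   # query on hyperedge features
        self.k = nn.Linear(d_e, d_e)   # key on mapped node features
        self.v = nn.Linear(d_e, d_e)   # value on mapped node features
        self.mlp = nn.Sequential(nn.Linear(2 * d_e, d_e), nn.ReLU(), nn.Linear(d_e, d_e))
        self.k_n = k_n

    def forward(self, x_e: torch.Tensor, x_v: torch.Tensor) -> torch.Tensor:
        x_v_hat = self.f(x_v)                            # Eq. (2)
        logits = self.q(x_e) @ self.k(x_v_hat).t()       # m x n attention logits
        a_e = F.softmax(logits, dim=1)                   # Eq. (3)
        # keep only the k_n largest coefficients per hyperedge, zero out the rest
        topk = torch.topk(a_e, self.k_n, dim=1)
        mask = torch.zeros_like(a_e).scatter_(1, topk.indices, 1.0)
        agg = (a_e * mask) @ self.v(x_v_hat)             # top-k_n weighted aggregation
        return self.mlp(torch.cat([x_e, agg], dim=1))    # Eq. (4)
```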
2.3 Hypergraph Construction

Now we have the features of m hyperedges. As mentioned before, building a hypergraph is actually a clustering process, and each hyperedge is equivalent to a cluster centroid, so we have to put each node into appropriate clusters. From the perspective of clustering, we have the features of the cluster centroids and the features of the samples; the easiest way is to calculate the distance between each sample and each cluster centroid and then assign each sample to the nearest cluster. Nevertheless, from the perspective of hypergraph structure and applications, a node can be connected to an arbitrary number of hyperedges. For example, in a co-author network, each author can be associated with multiple works simultaneously. Here we still use the Transformer-like attention method to calculate the attention coefficients between nodes and hyperedges instead of their distances. However, unlike the previous step, which uses the hyperedge queries q(X_e) to match the node keys k(X̂_v), in this step we use the node queries q(X̂_v) to match the hyperedge keys k(X̂_e):

α_{v_i, e_j} = exp( q((X̂_v)_i) · k((X̂_e)_j)^⊤ ) / Σ_l exp( q((X̂_v)_i) · k((X̂_e)_l)^⊤ ),    (5)

where A_v ∈ R^{n×m} is the node-to-hyperedge attention matrix and α_{v_i, e_j} is the attention coefficient of node v_i and hyperedge e_j. This reversal is because we need to ensure there are no isolated nodes, since isolated nodes cannot exchange information through the constructed hypergraph, which may affect the performance of downstream tasks. Then we put each node into the k_e hyperedges with the highest attention coefficients:

H = top_{k_e}(A_v),    (6)

where top_{k_e}(·) means that for each row of A_v, we keep the largest k_e values and set the rest to zero.
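Eqs. (5)–(6) can be sketched in the same style; again, this is an illustrative reading of the equations (attention from node queries to hyperedge keys, followed by a row-wise top-k_e), not the official code.

```python
import torch
import torch.nn.functional as F

def construct_hypergraph(q_v: torch.Tensor, k_e: torch.Tensor, k_e_per_node: int = 10) -> torch.Tensor:
    """Builds a continuous incidence matrix H from node queries q(X_hat_v)
    and hyperedge keys k(X_hat_e), Eqs. (5)-(6)."""
    a_v = F.softmax(q_v @ k_e.t(), dim=1)        # A_v in R^{n x m}, Eq. (5)
    topk = torch.topk(a_v, k_e_per_node, dim=1)  # each node keeps its k_e strongest hyperedges
    h = torch.zeros_like(a_v)
    h.scatter_(1, topk.indices, topk.values)     # H = top_{k_e}(A_v), Eq. (6)
    return h
```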
Supervised Constraint

To improve the learned hypergraph structure, we design a supervised constraint function and an unsupervised constraint function. As mentioned above, each hyperedge can be regarded as the centroid of a cluster, and constructing a hypergraph amounts to dividing nodes with the same label into the same cluster. In other words, for any pair of nodes v_i and v_j with the same label, we hope they are connected to the same hyperedges. Since the i-th row of the incidence matrix H encodes the connections between the i-th node and the hyperedges, and the H learned in Eq. (6) is continuous, what we have to do is minimize the distance between h_i and h_j, i.e., min d(h_i, h_j). From this, we obtain the loss function:

L_s = Σ_{c∈C} Σ_{v_i∈c, v_j∈c} d(h_i, h_j),    (7)

where C represents the set of node categories in a dataset, c ∈ C represents a specific category, and d(·) is the distance function. Eq. (7) can also be regarded as a kind of contrastive learning using only positive samples.
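One direct way to compute Eq. (7), assuming Euclidean distance for d(·) (the paper does not fix the distance function here), is:

```python
import torch

def supervised_constraint(h: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """L_s of Eq. (7): pull the incidence rows of same-label nodes together."""
    loss = h.new_zeros(())
    for c in labels.unique():
        rows = h[labels == c]                        # incidence rows h_i of nodes in class c
        loss = loss + torch.cdist(rows, rows).sum()  # sum of pairwise distances d(h_i, h_j)
    return loss
```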
Adjusting the Number of Hyperedges

Since we learn the distribution of hyperedge features instead of fixed hyperedge features, we can change the number of hyperedges by adjusting the number of samples m. The challenge is that the number of samples is non-differentiable, so we cannot adjust it through backpropagation. From the graph-theory perspective, we could solve this challenge if we knew what kind of hyperedge-node relationship constitutes an optimal hypergraph. However, this is also a very challenging topic, and there is currently no relevant theoretical research. To allow the model to find a better sampling number adaptively, we design a simple but effective evaluation criterion to judge whether the number of learned hyperedges is appropriate. We define the saturation score of a hypergraph:

S_H = 1 − |E_empty| / |E|,    (8)

where E_empty = {e | e ∈ E, δ(e) = 0} is the set of empty hyperedges in the hypergraph, and S_H is the saturation score of the hypergraph H, i.e., the proportion of non-empty hyperedges among all hyperedges. The idea is intuitive: if there are too many empty hyperedges in a hypergraph, these hyperedges are redundant, and we reduce the number of samples; at the same time, since sampling may occasionally produce outliers, we should also allow a small number of empty hyperedges.

To achieve this, we set two hyperparameters β ∈ [0, 1] and γ ∈ [0, 1], which represent the lower and upper limits of the saturation score, respectively. After each iteration, we adjust the number of samples according to the saturation score:

m = m − 1 if S_H < β,  and  m = m + 1 if S_H > γ.    (9)
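Putting Eqs. (8) and (9) together, the sampling number m can be adjusted after each iteration with a few lines. The function below is a sketch of that rule under the stated thresholds, not the authors' exact implementation.

```python
import torch

def adjust_num_hyperedges(h: torch.Tensor, m: int, beta: float = 0.9, gamma: float = 0.95) -> int:
    """Updates the sampling number m from the saturation score, Eqs. (8)-(9)."""
    degrees = (h > 0).sum(dim=0)               # delta(e) for every hyperedge (column of H)
    s_h = 1.0 - (degrees == 0).float().mean()  # S_H = 1 - |E_empty| / |E|, Eq. (8)
    if s_h < beta:                             # too many empty hyperedges: sample fewer
        m -= 1
    elif s_h > gamma:                          # almost no empty hyperedges: sample more
        m += 1
    return m
```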
2.4 Hypergraph Convolution

Simple Hypergraph Convolutional Layer

We then use the learned hyperedge features and the constructed hypergraph to update the node features. [Feng et al., 2019] proposed the first hypergraph convolution formula:

X_v^{(t+1)} = σ( D_v^{-1/2} H W D_e^{-1} H^⊤ D_v^{-1/2} X_v^{(t)} Θ^{(t)} ).    (10)

Although Eq. (10) is derived in the spectral domain based on the Fourier transform, we can still interpret it from the spatial-domain perspective, where D_v^{-1/2}, W and D_e^{-1} are all diagonal matrices that are equivalent to regularization terms. If we remove these regularization terms, then we have:

X_v^{(t+1)} = σ( H H^⊤ X_v^{(t)} Θ^{(t)} ).    (11)

In Eq. (11), H^⊤ X_v^{(t)} Θ^{(t)} multiplies the node feature matrix by a weight matrix and then converts it into the feature matrix of hyperedges according to the hypergraph structure. It is then multiplied by H, which assigns the hyperedge features back to each corresponding node. Since the hyperedge features are sampled from a trainable distribution, we do not need to convert node features to hyperedge features as in Eq. (11). Thus the convolution formula can be simplified as:

X_v^{(t+1)} = σ( H^{(t)} X_e^{(t)} Θ^{(t)} ).    (12)

We first multiply the learned hyperedge features by a weight matrix Θ ∈ R^{d_e×d} and then assign the hyperedge features to each relevant node according to the learned hypergraph structure. Note that at each layer, we reconstruct H according to the current layer's hyperedge and node features.

A previous work [Huang and Yang, 2021] pointed out that self-loops are very important for hypergraph convolution. In a hypergraph, self-loops are hyperedges that contain only one node. If self-loops are not introduced, the representation of a node is only affected by the features of its neighbor nodes in the previous layer and loses its own features. Therefore, Eq. (12) can be modified as:

X_v^{(t+1)} = σ( H̃^{(t)} X̃_e^{(t)} Θ^{(t)} ),    (13)

where H̃^{(t)} = concat(H^{(t)}, Ĥ^{(t)}) ∈ R^{n×(m+n)} and Ĥ ∈ R^{n×n} is a diagonal incidence matrix in which each node v_i belongs to only one hyperedge e_i with degree δ(e_i) = 1. Since there is only one node in the hyperedge of a self-loop, we can directly regard the features of the node as the hyperedge features, i.e., X̃_e^{(t)} = stack(X_e^{(t)}, X̂_v^{(t)}) ∈ R^{(m+n)×d_e}. However, instead of taking the form of Eq. (13), our final convolution formula is:

X_v^{(t+1)} = σ( w X̂_v^{(t)} + H^{(t)} X_e^{(t)} Θ^{(t)} ),    (14)

where w is a trainable parameter. That is, we add the features of the nodes themselves on the basis of Eq. (12). In fact, Eq. (14) is equally valid and more computationally efficient than Eq. (13).
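Read literally, Eqs. (12) and (14) amount to one matrix product plus a weighted skip connection. The minimal layer below is our sketch of that reading, with σ taken as ReLU and the output width kept equal to d_e so that the two terms can be added directly; both choices are assumptions rather than details taken from the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleHypergraphConv(nn.Module):
    """Sketch of Eq. (14): X_v <- sigma(w * X_hat_v + H X_e Theta)."""
    def __init__(self, d_e: int):
        super().__init__()
        self.theta = nn.Linear(d_e, d_e, bias=False)  # Theta; width d_e kept so the skip term matches
        self.w = nn.Parameter(torch.ones(1))          # trainable weight on the node's own features

    def forward(self, h: torch.Tensor, x_e: torch.Tensor, x_v_hat: torch.Tensor) -> torch.Tensor:
        out = h @ self.theta(x_e)             # H (X_e Theta): scatter hyperedge features to nodes, Eq. (12)
        return F.relu(self.w * x_v_hat + out) # add the node's own features, Eq. (14)
```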
Unsupervised Constraint

We use the labeled nodes to obtain the supervised constraint and hope that nodes with the same label are divided into the same hyperedge; similarly, for the unsupervised constraint, we hope that nodes divided into the same hyperedge are more similar. In the supervised case, the constraint loss is applied directly to H, while in the unsupervised case, we add a constraint on the node features after the convolutional layer:

L_u = Σ_{e∈E} Σ_{v_i∈e, v_j∈e} d((X_v)_i, (X_v)_j).    (15)

Finally, we evaluate our model on the node classification task, so the final loss function is:

L = L_e + λ_1 L_s + λ_2 L_u,    (16)

where L_e is the empirical loss of node classification, and λ_1 and λ_2 are trade-off hyperparameters.
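Eq. (15) mirrors Eq. (7) with hyperedges playing the role of classes, and Eq. (16) simply combines the three terms. A sketch, again assuming Euclidean distance for d(·) and a cross-entropy classification loss for L_e:

```python
import torch
import torch.nn.functional as F

def unsupervised_constraint(h: torch.Tensor, x_v: torch.Tensor) -> torch.Tensor:
    """L_u of Eq. (15): nodes assigned to the same hyperedge should stay close."""
    loss = x_v.new_zeros(())
    for j in range(h.size(1)):                 # iterate over hyperedges (columns of H)
        members = x_v[h[:, j] > 0]             # features of nodes contained in hyperedge e_j
        if members.size(0) > 1:
            loss = loss + torch.cdist(members, members).sum()
    return loss

def total_loss(logits, labels, l_s, l_u, lambda1: float, lambda2: float):
    l_e = F.cross_entropy(logits, labels)      # empirical node-classification loss L_e
    return l_e + lambda1 * l_s + lambda2 * l_u # Eq. (16)
```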
2.5 The Connection and Difference with k-means

The way we construct a hypergraph is similar to k-means. The primary process of k-means is (1) randomly selecting cluster centroids, (2) clustering, and (3) updating the cluster centroids, then repeating (2) and (3) until convergence. The pipeline of the proposed TDHNN is (1) sampling hyperedges, (2) assigning nodes to each hyperedge, and (3) updating the feature distribution of hyperedges, then repeating (2) and (3) until convergence. The main differences between our method and k-means are as follows:

• k-means learns the features of cluster centroids, while TDHNN learns the feature distribution of hyperedges;
• k-means selects a fixed number k of cluster centroids, while TDHNN dynamically adjusts the number of samples according to the saturation score of the learned hypergraph;
• The clusters in k-means are disjoint, while the hyperedges in TDHNN may intersect.

3 Experiments

3.1 Experimental Settings

Datasets
Following the work of the first hypergraph convolutional neural network [Feng et al., 2019], we use two visual object classification datasets, i.e., Princeton ModelNet40 [Wu et al., 2015] and the National Taiwan University 3D model dataset (NTU for short). ModelNet40 contains 12,311 objects of 40 types, and NTU contains 2,012 objects with a total of 67 types of 3D shapes. Like [Feng et al., 2019], we use the Group-View Convolutional Neural Network (GVCNN) [Feng et al., 2018] and the Multi-view Convolutional Neural Network (MVCNN) [Su et al., 2015] for feature extraction and consider both using one set of features separately and using the two sets of features at the same time. We adopt the same split standard for ModelNet40 and NTU, i.e., 80% as the training set and 20% as the testing set. Since the standard split of a dataset uses fixed training samples, a method may be affected by the fixed data distribution. For better comparison, we use the same method as [Jiang et al., 2019] to randomly sample different proportions of the data on Cora [Veličković et al., 2017] as the training set. Specifically, in addition to the standard split, we respectively select 2%, 5.2%, 10%, 20%, 30%, and 44% of the data for training.

ModelNet40 NTU
GVCNN MVCNN GV+MV GVCNN MVCNN GV+MV
GCN 91.80±0.46 91.50±1.80 94.85±1.75 78.80±0.92 78.72±1.97 80.43±1.09
GAT 91.65±0.25 90.07±0.41 95.75±0.14 79.60±0.03 78.50±1.17 80.16±1.08
HGNN 91.80±1.73 91.00±0.66 96.96±1.43 82.50±1.62 79.10±0.90 83.64±0.37
HGNN+ 92.50±0.08 90.60±1.68 96.92±1.81 82.80±1.11 76.40±1.17 84.18±0.82
HNHN 92.10±1.76 91.10±1.84 93.80±1.84 83.10±1.89 79.60±0.79 80.60±0.95
DHGNN 92.13±1.55 85.53±0.83 96.99±1.46 82.30±0.98 77.60±1.55 85.13±0.26
HyperGCN 92.20±0.80 90.20±0.28 96.10±0.63 79.90±1.78 78.10±0.83 79.90±0.91
DeepHGSL 89.32±0.71 88.62±0.93 90.33±0.66 76.28±1.45 72.30±1.57 78.67±0.77
HSL 93.17±0.25 91.44±0.42 96.92±0.41 81.82±1.30 75.68±1.41 82.26±1.20
TDHNN 93.81±1.04 92.33±1.67 97.52±0.80 83.69±0.45 79.62±0.53 86.05±1.10

Table 1: Visual object classification accuracy on ModelNet40 and NTU. GVCNN and MVCNN indicate that the features extracted by
GVCNN and MVCNN are used as input, respectively; GV+MV indicates that the two sets of features are concatenated as input. We report
the mean and standard deviation over 20 runs.

Comparison Methods
Our comparison methods include two classic graph models, i.e., GCN [Kipf and Welling, 2016] and GAT [Veličković et al., 2017]; three static hypergraph convolutional neural networks, i.e., HGNN, HGNN+, and HNHN [Dong et al., 2020]; and four dynamic hypergraph networks, i.e., DHGNN, HyperGCN [Yadati et al., 2019], DeepHGSL [Zhang et al., 2022] and HSL [Cai et al., 2022]. We implement GCN and GAT with the open tool PyTorch Geometric [Fey and Lenssen, 2019], and we implement HGNN, HGNN+, HNHN, HyperGCN and DeepHGSL with the open tool DHG (DeepHypergraph) [Gao et al., 2022]. For DHGNN and HSL, we use their source code for the experiments.

Setting-up
We uniformly set the feature dimension of the hyperedges d_e to 128 and the initial sampling number of hyperedges m to 100. The number of nodes used to update the hyperedge features and the number of hyperedges each node belongs to are both set to 10. For the hypergraph saturation score, the lower limit β is set to 0.9 and the upper limit γ to 0.95. We use dropout [Srivastava et al., 2014] with a drop rate of 0.2 to prevent overfitting. The optimizer is Adam [Kingma and Ba, 2014], and the learning rate is 0.001.
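For reference, the settings listed above can be collected into a single configuration; the dictionary below merely restates the reported values, and the key names are our own.

```python
# Reported TDHNN settings (key names are illustrative, values from the paper).
TDHNN_CONFIG = {
    "hyperedge_dim": 128,         # d_e
    "init_num_hyperedges": 100,   # initial sampling number m
    "top_k_nodes": 10,            # k_n, nodes used to update each hyperedge
    "top_k_hyperedges": 10,       # k_e, hyperedges each node belongs to
    "saturation_lower": 0.9,      # beta
    "saturation_upper": 0.95,     # gamma
    "dropout": 0.2,
    "optimizer": "Adam",
    "learning_rate": 1e-3,
}
```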
3.2 Result and Discussion

Visual Object Classification
The experimental results of visual object classification are shown in Table 1. The proposed TDHNN achieves the best results among all comparison methods on both ModelNet40 and NTU, regardless of the set of features used. Compared with the graph algorithms GCN and GAT, TDHNN has an average increase of 2.82% and 2.88%, respectively. Compared with the static hypergraph algorithms HGNN, HGNN+, and HNHN, TDHNN has an average improvement of 1.33%, 1.6%, and 2.12%, respectively. Compared with the dynamic hypergraph algorithms DHGNN, HyperGCN, DeepHGSL and HSL, our classification accuracy increases by an average of 2.22%, 2.77%, 6.25% and 1.95%, respectively. In general, hypergraph algorithms outperform graph algorithms on ModelNet40 and NTU. However, some algorithms show instability in the experimental results. For example, when HNHN uses the GVCNN and MVCNN features on ModelNet40 separately, it shows high accuracy (92.10% and 91.10%, respectively), but the improvement is not obvious when fusing the two sets of features (93.80%); compared with other methods, it is even lower, and a similar situation occurs on NTU. When DHGNN uses the MVCNN features alone, its accuracy is relatively low on both ModelNet40 and NTU, but it achieves good results when fusing the two sets of features. In sharp contrast to these two methods, TDHNN achieves better results and shows better stability and compatibility, whether using a single set of features or fusing the two sets of features.

Overall, the proposed TDHNN has three advantages: first, by applying the hypergraph structure, we can better mine the multi-way relationships in the data; second, we dynamically learn the hypergraph structure, which can fully mine the potential relationships in the data; third, we can dynamically adjust the number of hyperedges to help learn a more reasonable hypergraph structure.

Citation Network Classification
The experimental results on Cora using different proportions of training samples are shown in Table 2. As shown, TDHNN achieves the best results except for the 2% split, where it obtains 76.11%, second to DHGNN (76.90%) with a gap of about 0.8%. Compared with the graph algorithms GCN and GAT, TDHNN has an average increase of 4.11% and 1.57%, respectively. Compared with the static hypergraph algorithms HGNN, HGNN+, and HNHN, TDHNN has an average improvement of 3.14%, 4.37%, and 6.18%, respectively. Compared with the dynamic hypergraph algorithms DHGNN, HyperGCN, DeepHGSL, and HSL, our classification accuracy increases by an average of 1.07%, 17.38%, 2.86% and 3.09%, respectively.


lr GCN GAT HGNN HGNN+ HNHN DHGNN HyperGCN DeepHGSL HSL TDHNN
std 81.50±1.59 83.00±0.37 81.80±0.79 79.60±1.59 76.80±1.02 82.50±0.69 62.22±1.92 82.16±0.76 79.45±1.92 83.60±0.62
2% 69.60±0.19 74.80±0.81 75.40±1.17 71.40±1.41 66.50±2.06 76.90±1.71 46.21±1.53 74.52±1.74 74.86±4.19 76.11±1.14
5.2% 77.80±0.43 79.40±0.47 79.70±0.75 76.20±1.32 73.10±1.52 80.20±0.20 54.25±0.84 78.66±2.12 77.91±0.80 80.41±1.50
10% 79.90±0.66 81.50±1.90 80.00±0.28 78.20±0.87 76.50±1.94 81.60±0.12 64.93±0.42 79.29±1.32 79.18±1.55 84.53±1.69
20% 81.40±0.57 83.50±1.67 80.10±1.08 81.40±1.43 80.90±1.83 83.60±1.79 72.51±0.02 80.32±1.18 81.69±1.39 85.04±1.87
30% 81.90±1.82 84.50±0.15 82.00±1.67 82.50±0.17 82.50±1.04 85.00±0.41 78.82±1.83 83.22±1.16 83.15±1.70 85.86±1.81
44% 82.00±0.71 85.20±0.23 81.90±0.09 83.00±0.35 83.30±0.96 85.60±0.69 82.44±0.62 83.65±0.89 84.98±1.69 87.34±1.24

Table 2: Node classification results on Cora. In addition to the standard division, we randomly select 2%, 5.2%, 10%, 20%, 30%, and 44%
of the data, respectively as the training set.

Specifically, under the splits of 2% and 5.2%, HGNN, HGNN+, and our TDHNN achieve relatively similar performance. However, as the training ratio increases, the advantage of TDHNN is revealed, which might be attributed to the supervised constraint loss L_s.

3.3 Ablation Study
The proposed TDHNN has three main components: a hyperedge updater (HU for short), the supervised constraint (C1 for short), and the unsupervised constraint (C2 for short). To demonstrate the effectiveness of each part, we tested different combinations of these components. As shown in Table 3, first, TDHNN has an average increase of 1.22%, 3.75%, and 3.13% on ModelNet40, NTU, and Cora, respectively, compared to using only a single component. In the case of only using HU, the performance of the model on each dataset is acceptable, especially on ModelNet40 (97.28%), which is close to using all components. In the case of using only C1, the overall performance drops a lot on ModelNet40 (94.24%) and NTU (77.74%), but it is slightly better on Cora than using only HU (increased by 0.3%). When only using C2, ModelNet40 also performs very well (97.36%), and it is also better than using only HU or C1 on Cora. When any two components are used in combination, the overall performance is not significantly improved compared with using a single component. Specifically, when combining C1 and C2, the model cannot even converge on ModelNet40 and NTU (the classification accuracy is 7.86% and 6.43%, respectively). However, when combining all these components, the model's performance is the best, and non-convergence no longer occurs.

HU   C1   C2      ModelNet40    NTU           Cora
✓                 97.28±0.25    84.71±0.80    79.9±1.25
     ✓            94.24±0.16    77.74±0.06    80.2±1.86
          ✓       97.36±1.47    84.45±0.91    81.3±1.36
✓    ✓            96.88±0.36    83.64±1.21    80.9±0.75
✓         ✓       97.36±1.20    84.45±1.30    79.2±0.32
     ✓    ✓       07.86±0.15    06.43±0.64    79.6±0.53
✓    ✓    ✓       97.52±0.80    86.05±1.10    83.6±0.62

Table 3: Classification accuracy (mean and standard deviation) of our method with different components on all datasets.

3.4 Visualizations
To better show the ability of TDHNN, we use the t-SNE algorithm [Van der Maaten and Hinton, 2008] to reduce the dimensionality of the embeddings learned by the model and visualize them. We also calculate the Silhouette score [Rousseeuw, 1987] of the embeddings for evaluation. We perform the same experiment on HGNN+ and HNHN for comparison. The experimental results are shown in Figure 2. As shown, in the results of TDHNN, the boundaries between categories are clearer. The Silhouette scores also prove this point: the Silhouette scores of HGNN+ and HNHN are 0.3531 and 0.3344, respectively, while that of TDHNN is 0.5081, which is 15.50% and 17.34% higher than the previous two methods.

Figure 2: t-SNE embeddings of (a) HGNN+, (b) HNHN, and (c) the proposed TDHNN on Cora. The Silhouette scores of the embeddings learned by the three methods are 0.3531, 0.3344, and 0.5081, respectively.
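The qualitative comparison in Figure 2 can be reproduced with standard tooling. The sketch below uses scikit-learn and matplotlib (our own choice of libraries, with default t-SNE settings); whether the Silhouette score should be computed on the original embedding or on the 2-D projection is not specified in the paper, and here it is computed on the original embedding.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt

def visualize_embedding(embedding: np.ndarray, labels: np.ndarray, title: str) -> float:
    """Projects node embeddings to 2-D with t-SNE and reports the Silhouette score."""
    coords = TSNE(n_components=2).fit_transform(embedding)
    score = silhouette_score(embedding, labels)     # score on the original embedding
    plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=5)
    plt.title(f"{title} (Silhouette = {score:.4f})")
    plt.show()
    return score
```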

3.5 Running Time
To demonstrate the computational efficiency of TDHNN, we compare the time required for each iteration with the latest dynamic hypergraph methods. As shown in Table 4, TDHNN improves by an average of 9.44 seconds, 3.01 seconds, and 2.31 seconds per epoch compared to DHGNN, HyperGCN, and DeepHGSL over the three datasets. Compared to HSL, TDHNN is on average 0.04 seconds slower; that is because HSL samples and obtains new structures from the existing hypergraph structure rather than using features for reconstruction. In addition, we did not compare TDHNN with static methods, as they do not reconstruct hypergraphs.

              Time per epoch (seconds)
              ModelNet40    NTU     Cora
DHGNN         25.04         3.58    0.34
HyperGCN      6.71          1.16    1.81
DeepHGSL      5.33          1.27    0.97
HSL           0.42          0.06    0.02
TDHNN         0.42          0.11    0.11

Table 4: Running time for each epoch.

3.6 Sensitivity Analysis
Effect of Hypergraph Saturation Threshold
We tested the effect of the saturation threshold on ModelNet40 and NTU. We set the lower threshold β from 0.1 to 0.9 with a step size of 0.1 and the corresponding upper threshold γ = β + 0.05. According to the experimental results, on ModelNet40 the model's accuracy barely changes as β and γ vary. On NTU, the accuracy fluctuates slightly with the threshold; the model reaches its best at β = 0.9, but overall the differences are small. This result shows that the learned hypergraph indeed has a large number of redundant hyperedges. Although these redundant hyperedges do not affect the model's accuracy, they add many unnecessary calculations and reduce computational efficiency. Thanks to our strategy of dynamically adjusting the number of hyperedges, our model can control the number of redundant hyperedges by setting a threshold, thus reducing the computational overhead. Specifically, considering the convolutional operation H^{(t)} X_e^{(t)} Θ^{(t)} in Eq. (12), the time complexity is O(n × m × d_e) + O(n × d_e × d) = O(nm). When using a traditional method such as kNN to generate a hypergraph, the number of hyperedges m equals the number of nodes n, so the time complexity is O(n²). In contrast, the proposed TDHNN sets m as a constant, and the model optimizes this constant at each iteration, so the time complexity of the convolution is reduced to O(n).

Trade-off between Two Constraints
To make the learned hypergraph better, we set a supervised constraint and an unsupervised constraint. How to weight these two constraints is a question worth discussing. We conducted experiments on ModelNet40 and NTU, respectively, and set λ_1 and λ_2 from 10 to 100 with a step size of 10. It can be concluded that: (1) TDHNN is not very sensitive to λ_1 and λ_2, since as λ_1 and λ_2 change, the variance of TDHNN's accuracy within each tested data distribution is relatively low; (2) the influence of λ_1 and λ_2 differs across data distributions, i.e., the trade-off between λ_1 and λ_2 depends on the dataset.

4 Conclusion
In this paper, we propose a novel dynamic hypergraph learning framework that can dynamically adjust both the structure of the hypergraph and the number of hyperedges. As far as we know, this may be the first work that constructs hypergraphs centered on the features of hyperedges, and it opens up a new way for the research and learning of hypergraph neural networks. The proposed TDHNN learns the feature distribution of hyperedges and adjusts the number of samples according to the saturation score of the learned hypergraph, thereby dynamically adjusting the hypergraph structure. To make the constructed hypergraph more reasonable, we propose two constraints, one on the graph and one on the features after convolution. Experiments demonstrate the effectiveness of our method. However, this method also has aspects that can be improved. For example, the measurement of hypergraph saturation and the strategy of adjusting the number of hyperedges may be too simple, and the relationship between the number of hyperedges and the number of samples has yet to be further explored and theoretically proved. These are directions for our future research.

Contribution Statement
Zongqian Wu made equal contributions to this work. He was involved in the overall design of the method, conducted comparative experiments, contributed to chart design, and participated in some of the writing.

References

[Arya et al., 2020] Devanshu Arya, Deepak K Gupta, Stevan Rudinac, and Marcel Worring. Hypersage: Generalizing inductive representation learning on hypergraphs. arXiv preprint arXiv:2010.04558, 2020.
[Bai et al., 2021] Song Bai, Feihu Zhang, and Philip HS Torr. Hypergraph convolution and hypergraph attention. Pattern Recognition, 110:107637, 2021.
[Berahmand et al., 2021] Kamal Berahmand, Elahe Nasiri, Mehrdad Rostami, and Saman Forouzandeh. A modified deepwalk method for link prediction in attributed social network. Computing, 103(10):2227–2249, 2021.
[Cai et al., 2022] Derun Cai, Moxian Song, Chenxi Sun, Baofeng Zhang, Shenda Hong, and Hongyan Li. Hypergraph structure learning for hypergraph neural networks. In IJCAI, pages 1923–1929, 2022.
[Dong et al., 2020] Yihe Dong, Will Sawin, and Yoshua Bengio. Hnhn: hypergraph networks with hyperedge neurons. arXiv preprint arXiv:2006.12278, 2020.
[Fang et al., 2014] Quan Fang, Jitao Sang, Changsheng Xu, and Yong Rui. Topic-sensitive influencer mining in interest-based social media networks via hypergraph learning. IEEE Transactions on Multimedia, 16(3):796–812, 2014.
[Feng et al., 2018] Yifan Feng, Zizhao Zhang, Xibin Zhao, Rongrong Ji, and Yue Gao. Gvcnn: Group-view convolutional neural networks for 3d shape recognition. In CVPR, pages 264–272, 2018.
[Feng et al., 2019] Yifan Feng, Haoxuan You, Zizhao Zhang, Rongrong Ji, and Yue Gao. Hypergraph neural networks. In AAAI, pages 3558–3565, 2019.
[Fey and Lenssen, 2019] Matthias Fey and Jan E. Lenssen. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
[Gao et al., 2020] Yue Gao, Zizhao Zhang, Haojie Lin, Xibin Zhao, Shaoyi Du, and Changqing Zou. Hypergraph learning: Methods and practices. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
[Gao et al., 2022] Yue Gao, Yifan Feng, Shuyi Ji, and Rongrong Ji. HGNN+: General hypergraph neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
[Huang and Yang, 2021] Jing Huang and Jie Yang. Unignn: a unified framework for graph and hypergraph neural networks. arXiv preprint arXiv:2105.00956, 2021.
[Huang et al., 2009] Yuchi Huang, Qingshan Liu, and Dimitris Metaxas. Video object segmentation by hypergraph cut. In CVPR, pages 1738–1745, 2009.
[Jiang et al., 2019] Jianwen Jiang, Yuxuan Wei, Yifan Feng, Jingxuan Cao, and Yue Gao. Dynamic hypergraph neural networks. In IJCAI, pages 2635–2641, 2019.
[Kingma and Ba, 2014] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[Kingma and Welling, 2013] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
[Kipf and Welling, 2016] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
[Rousseeuw, 1987] Peter J Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53–65, 1987.
[Srivastava et al., 2014] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
[Su et al., 2015] Hang Su, Subhransu Maji, Evangelos Kalogerakis, and Erik Learned-Miller. Multi-view convolutional neural networks for 3d shape recognition. In ICCV, pages 945–953, 2015.
[Van der Maaten and Hinton, 2008] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of Machine Learning Research, 9(11), 2008.
[Vaswani et al., 2017] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, volume 30, 2017.
[Veličković et al., 2017] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
[Wang et al., 2015] Meng Wang, Xueliang Liu, and Xindong Wu. Visual classification by l1-hypergraph modeling. IEEE Transactions on Knowledge and Data Engineering, 27(9):2564–2574, 2015.
[Wang et al., 2021] Meihong Wang, Linling Qiu, and Xiaoli Wang. A survey on knowledge graph embeddings for link prediction. Symmetry, 13(3):485, 2021.
[Wu et al., 2015] Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3d shapenets: A deep representation for volumetric shapes. In CVPR, pages 1912–1920, 2015.
[Wu et al., 2022] Shiwen Wu, Fei Sun, Wentao Zhang, Xu Xie, and Bin Cui. Graph neural networks in recommender systems: a survey. ACM Computing Surveys, 55(5):1–37, 2022.
[Yadati et al., 2019] Naganand Yadati, Madhav Nimishakavi, Prateek Yadav, Vikram Nitin, Anand Louis, and Partha Talukdar. Hypergcn: A new method for training graph convolutional networks on hypergraphs. In NeurIPS, volume 32, 2019.
[Zhang et al., 2022] Zizhao Zhang, Yifan Feng, Shihui Ying, and Yue Gao. Deep hypergraph structure learning. arXiv preprint arXiv:2208.12547, 2022.