Interaction-Aware Hypergraph Neural Networks For User Profiling
2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA) | 978-1-6654-7330-9/22/$31.00 ©2022 IEEE | DOI: 10.1109/DSAA54385.2022.10032374
attracted increasing attention due to its various applications
Most existing studies regard user profiling as a node classification
task and utilize graph-based methods to exploit users’ relations
[Fig. 1: interaction-aware information network in e-commerce. Edge types shown: user clicks advertisement; item owns category; item owns word.]
Through the analysis of user information, user profiling manages to depict comprehensive user interests or user demographic attributes, so as to make the recommendation effect of the recommender systems more reliable. Previous psychological work has also highlighted the value of user profiling in improving recommender systems [1] [2].
Regarding user profiling as a classification task, traditional user profiling methods usually leverage machine learning algorithms to predict users’ age, gender, personality or interests based on user-generated content like texts, behaviors, or relations [3][4][5]. Compared with machine learning methods,
∗ Tao Zhao is corresponding author.
lations between users, failing to model interactions in a higher order. Actually, there exist many high-order complicated interaction relations among users. For example, more than two users are connected together by some high-order interactions called hyperedges if they co-purchase or co-click the same commodities in Fig. 1. Essentially, in contrast to pairwise relationships in a graph, acquiring complex high-order interaction information among entities will greatly enhance inference performance.
• Only one view of users’ explicit interaction information has been applied in most existing methods, leading to information loss and impeding expressiveness. In reality, however, potential interactions among users can
Authorized licensed use limited to: BEIJING UNIVERSITY OF POST AND TELECOM. Downloaded on January 11,2024 at 16:07:23 UTC from IEEE Xplore. Restrictions apply.
be captured from multiple perspectives. For example, as shown in Fig. 1, users in e-commerce connect with each other through different kinds of interactions, such as explicit interactions analyzed from many same behaviors and implicit interactions like their similar topic interests, etc. Therefore, mining multi-view interactions is of great significance to improve classification performance in user profiling.
Towards the above key issues, in this paper, we propose a novel framework for user profiling, Interaction-aware Hypergraph Neural Networks (IHNN), which is capable of integrating complex high-order interaction and multiple types of interactions including explicit and implicit relationships among diversified users’ data. Through constructing multiple hypergraphs composed of degree-free hyperedges, our model can overcome the limitations of traditional pairwise connections as well as inadequate interaction expressiveness in previous graph-based user profiling. Specifically, as shown in Fig. 2, in order to aggregate various user affiliated information, IHNN first learns initial user representations by leveraging a hierarchical heterogeneous graph using meta-path based attention operations. Then, in contrast to traditional graph-based user-profiling methods containing single pairwise interactions, IHNN introduces multiple high-order hypergraphs from users and their affiliated data, which takes advantage of rich structural information and avoids deficiency of semantic relations among users simultaneously. Concretely, feeding node embeddings of users, we adopt a hypergraph convolutional operation to aggregate high-order neighbor correlations across different relationships and acquire comprehensive user structure information by concatenating diverse hypergraphs constructed from several sources of data. Extensive experiments for user profiling tasks have been conducted on two real-world large-scale e-commerce datasets to verify the effectiveness of our proposed approach. To sum up, the main contributions of this paper are three-fold:
• We propose to address the user profiling task by introducing multiple types of complex high-order interactions among users obtained from users and their affiliated data. To our knowledge, this is the first work in this community.
• We propose a novel framework for user profiling called Interaction-aware Hypergraph Neural Networks (IHNN), which integrates hypergraphs with meta-path based heterogeneous graph. IHNN is able to comprehensively explore implicit and explicit high-order interactions from different perspectives to improve the performance of user profiling.
• We carry out extensive experiments on two real e-commerce datasets, demonstrating that the proposed model greatly outperforms state-of-the-art user profiling methods.
II. RELATED WORK
A. User Profiling
Defined as a process of identifying users’ information in the field of interest, user profiling has attracted much attention and been studied in both academia and industry over the past decades. In the early age, data mining and machine learning approaches have been widely used in inferring a user’s characteristics based on digital footprints left on social media platforms like textual contents or behavior records. The main idea lies in how to obtain users’ features naturally. For instance, [11] [12] employ matrix factorization (MF) to solve user profiling for recommender systems based on users’ item-level responses like ratings or clicks.
Recently, motivated by the popularity of deep neural networks in diverse fields [13]–[16], scholars have attached more importance to how to automatically learn representations of users by neural networks based on visual content. For example, [17] analyzes users’ personality using face++ and EmoVu, two APIs based on deep learning with facial features. What’s more, due to the prominent representation learning ability of dealing with non-Euclidean relations among users, GNN has also attracted much attention [18] [19] [20]. For instance, [9] proposes a multi-view geolocation model based on Graph Convolutional Networks (GCN), which jointly learns from text and network information to classify a user timeline into a location.
To get rid of the shortcoming of single data type and relations in the homogeneous information network in most previous work, recent heterogeneous-graph based models have made use of the diversity of node and edge types to generate graphs with complex structure and rich semantics, thus achieving better performance. To deal with the linkage paths in the network, the meta-path based model metapath2vec [21] was first researched in the heterogeneous graph to utilize semantic-aware random walks and leverage a heterogeneous skip-gram model to perform vertex embedding. Other content-aware methods, e.g., ASNE [22], combine topology structure and arbitrary type attributes to learn embedding instead. However, these models are limited in formulating rich neighbor information. Considering the powerful capability of capturing rich neighbor information and heterogeneity of nodes of GNN, a few works jointly combine graph convolutional networks with heterogeneous graph. HetGNN [23] adopts graph attention operation combined with BiLSTM to achieve fusion of heterogeneity of content and neighborhood information of nodes. HGAT [8] introduces co-relation like co-click in e-commerce and hierarchically conveys information included in subgraph using graph attention operation to predict users’ age or gender. However, these user profiling prediction methods mainly focus on the paired interaction relationship between users; the complex higher-order relationships that can be mined from users’ behavior data are not considered, which might lead to inferior performance of user profiling.
B. Hypergraph Learning
A hypergraph is a generalized graph whose edges can connect any number of vertices, more than in simple graphs. Owing to its superiority in formulating complex and high-order data correlation, the hypergraph has been applied to multiple tasks such as node classification [24], link
prediction [25] as well as multiple fields such as computer example, as shown in Fig. 1, the meta-path (attributes-item-
vision, community detection [26]. Take computer vision as user-user) means the information can be passed from attributes
an example, extensive works employ hypergraph learning to of items to items and eventually user itself. Based on learning
achieve 3D object classification [27], video segmentation, etc. neighbor representation information of each node, meta-path
In [28], the author proposes a unified hypergraph learning based graph representation learning for fusing information from
framework to exploit social profile relations between users to diverse sources of data is built.
map common users across these social networks.
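To make the attribute-item-user meta-path of Section III-A concrete, the following minimal sketch enumerates meta-path instances over a toy heterogeneous graph stored as typed edge lists. The node names, the dictionary layout and the helpers `build_adjacency` and `follow_meta_path` are illustrative assumptions, not part of the original model.

```python
from collections import defaultdict

# Toy typed edges (hypothetical data): attribute -> item and item -> user.
attr_item = [("word:shoes", "item:1"), ("word:red", "item:1"), ("word:shoes", "item:2")]
item_user = [("item:1", "user:a"), ("item:2", "user:a"), ("item:2", "user:b")]

def build_adjacency(edges):
    """Map each source node to the list of its target nodes."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    return adj

def follow_meta_path(start, relations):
    """Enumerate every path from `start` that follows the relation sequence."""
    paths = [[start]]
    for adj in relations:
        # Extend each partial path by one hop along the current relation.
        paths = [p + [nxt] for p in paths for nxt in adj.get(p[-1], [])]
    return paths

# Meta-path: attribute -> item -> user.
relations = [build_adjacency(attr_item), build_adjacency(item_user)]
paths = follow_meta_path("word:shoes", relations)
```

Each returned path is one instance of the meta-path, i.e. one route along which attribute information could be propagated to a user.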
In addition to a variety of applications, hypergraph learning B. Hypergraph Representation Learning for User Profiling
has been developing rapidly for obtaining more detailed and Hypergraph structure is a general concept where the
complex relationships. With the great success of GCN, hy- edge can connect two or more vertices so as to model
pergraph was first referred and dealt with clique expansion in high-order relationships among vertices. Given the general
[29] with the aim of minimizing label discrepancy of nodes on graph concerning users with affiliated information, we
same hyperedge for hypergraph embedding in the transductive construct some certain hypergraphs based on multiple
classification. Feng et al propose HGNN, a Hypergraph Neural interactions. Defined the hypergraph for user profiling
Networks to learn hidden layer representation [30]. Motivated as GH = (VH , EH , WH ), VH represents the set of users
by GNN, HyperGCN [31], approximating each hyperedge by a U, edge eH ∈ EH denotes the subsets of U, eH =
set of pairwise edges connecting the vertices of the hyperedge, {(ui , uj , . . . um ) |ui , uj , . . . um ∈ U, 1 <= i, j, . . . m <= |U |}
suggests a more efficient and faster model for the same tasks is called hyperedge extracting complex relationships among
to HGNN. Dynamic Attention has been employed to learn users and WH means the weight of hyperedges representing
weights of different hyperedges in recent developments [32] the importance of hyperedges. Generally, EH ≠ ∅,
[33]. Regarding the multi-modal data, [27] constructs multiple |eH | >= 2. For example, users who purchase the same
hypergraphs for a set of 3-D objects based on their 2-D commodity could constitute a hyperedge which generally
views. However, only multiple views of single source of data involves more than two users. Besides, the method of
are considered in most schemes which ignore heterogeneous constructing hypergraph is usually to model the first-order
information and do not incorporate affiliated information existing neighbors of nodes or K nearest neighbor nodes based on
in diverse data like titles of commodities in e-commerce KNN algorithm as hyperedges. Indicated by intuition that
systems. vertices on the same hyperedge share similar characteristic,
propagation process on hypergraph is applied to shorten the
III. PRELIMINARIES
vector distance of users with similar labels and represent complicated structured data.
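As an illustration of the hyperedge definition in Section III-B, the sketch below groups users who purchased the same commodity into one hyperedge and keeps only groups with at least two users, matching the condition |eH| >= 2. The purchase log and the function name `co_purchase_hyperedges` are hypothetical.

```python
from collections import defaultdict

# Hypothetical interaction log: (user, commodity) purchase pairs.
purchases = [
    ("u1", "c1"), ("u2", "c1"), ("u3", "c1"),
    ("u2", "c2"), ("u4", "c2"),
    ("u5", "c3"),
]

def co_purchase_hyperedges(pairs):
    """Group users by commodity; keep groups with at least two users."""
    by_item = defaultdict(set)
    for user, item in pairs:
        by_item[item].add(user)
    # A hyperedge must connect two or more users (|e_H| >= 2).
    return {item: users for item, users in by_item.items() if len(users) >= 2}

hyperedges = co_purchase_hyperedges(purchases)
```

Commodity `c3` has a single buyer, so it yields no hyperedge, while `c1` connects three users at once, which a pairwise graph could only express through several separate edges.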
A. Meta-path based Graph Representation Learning
The intrinsic goal in user profiling lies in learning latent
A typical example of user profiling is to identify and predict representation of users Z ∈ R|U |×d which maintains various
users’ traits such as users’ gender and age through various relations and attribute information of nodes. Therefore, node
types of text behavior data generated by users in e-commerce embedding can be used in multiple downstream tasks such
platforms. Fig. 1 shows the architecture of interaction-aware as classification or link prediction. Owing to deficiency of
information network in e-commerce. Given a simple heteroge- labelled data, we devise and train a hybrid model Fθ with
neous interactive graph GS = (VS , ES , WS ), VS denotes a set parameter θ utilizing both label information and correlations
of nodes of various types, including users U (i.e., consumers), in a semi-supervised way referring to [8].
items I (i.e., commodities and advertisements), attributes A
(e.g. words in the titles of commodities), ES represents edges
IV. THE PROPOSED METHOD
with pairwise interactions, WS means the weight of edges This section introduces the details of the proposed model
indicating the importance of edges in the graph structure. IHNN for user profiling. IHNN takes the interactions among
According to the interactions among distinctive types of nodes users, items such as advertisements and commodities, at-
in user profiling, our graph is divided into two subgraphs con- tributes and the interactions among users as input. By com-
sisting of attribute-item subgraph Eai and item-user subgraph bining the advantages of hypergraphs and meta-path based
Eiu , each edge eai ∈ Eai represents item owns an attribute, graphs, the model solves the problems of message passing
Each edge eiu ∈ Eiu represents user interacts an item, such and higher-order relationship mining of users in heterogeneous
as user purchases, clicks a commodity or adds a commodity graphs. More specifically, at the top level of the model, we
to the shopping cart. In e-commerce platform, in order to first employ low-dimensional pretrained embedding of user
utilize users’ affiliated information such as brands of products associated attributes as input and utilize graph attention oper-
purchased by users, etc., we restrict meta-graph [34] which is ator into the message passing mechanism based on meta-path
a directed acyclic graph to user profiling. Given source node to obtain the low-dimensional user feature vector. Multiple
ns and target nt , a meta-path from ns to nt with length l hypergraphs have been constructed and integrated based on
is defined as $t_1 \xrightarrow{r_1} t_2 \xrightarrow{r_2} \cdots \xrightarrow{r_l} t_{l+1}$, in which ti denotes hypergraph theory and multiple explicit or implicit interactions
different type of nodes in the hierarchy, ri represents various among users. Through hypergraph convolutional operation,
relations, i.e., edges between two heterogeneous nodes. For users’ features are updated when aggregating information from
[Fig. 2: Overall architecture of the proposed IHNN model. Recoverable panel labels: Attribute-Item and Item-User layers with context vectors; Fusion; Multi-view Hypergraph Construction; Hypergraph Convolutional Operation (Node->Edge->Node, layers L=1 to L=k); Softmax; Training.]
respectively. $c \in \mathbb{R}^{D_i}$ denotes the learned vector in attribute-to-item propagation. The result $e_{ij}$ is attention coefficient of neighbor $j$ to center node $i$, measuring the importance of different neighbors to $i$.
$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in N_i} \exp(e_{ik})} \quad (2)$$
neighbors. Finally, the operation of user profiling prediction in IHNN is employed to get the classification results of users. The overall architecture of our proposed model is shown in Fig. 2.
A. Meta-path based Heterogeneous Interaction Layer
In IHNN, in order to get rich semantic information of users,
we take a meta-path based method hierarchically aggregating
unsupervised interaction information from diverse nodes with Then, applying softmax function so as to normalize the
different semantic roles, such as commodities and attributes. coefficient and get the weight of neighbors for each item.
As demonstrated in section III-A we build an information Therefore, we can acquire the new embedding of item node
dissemination framework from attribute to item to user, which hi ∈ RDi by adding the product of weighting coefficient and
is mainly divided into two layers: attribute-item layer and feature together as follows, where Ni is the neighbor set of
item-user layer based on attribute-item subgraph and item- item i.
user subgraph, respectively. In our framework, we get all the pretrained interactive attributes features and graphs representing interactions among different types of nodes as inputs
$$h_i = \sum_{j \in N_i} \alpha_{ij} W^{ai} h_{ij} \quad (3)$$
Similarly, through same propagation operation, we can
in the feature matrix for each user. To learn the importance aggregate the item information and pay more attention to
between nodes and neighbors and extract users’ features, graph important item node for each user. Eventually extracting
attention mechanism is taken into consideration to aggregate ample semantic meaning of users, all users are collectively
attributes information associated with items as well as items represented as a feature matrix X ∈ R|U |×Du .
information correlated with users.
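The attention-based aggregation of Eqs. (1)-(3) can be sketched for a single center item as follows. This is a minimal plain-Python illustration under assumed toy dimensions; the helper names `matvec` and `attention_aggregate` and all numeric values are hypothetical, and a real implementation would use batched tensor operations.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def attention_aggregate(neighbors, W, b, c):
    """Eqs. (1)-(3) for one center node.

    neighbors: list of D_a-dim vectors h_ij
    W: D_i x D_a weight matrix; b, c: D_i-dim vectors
    """
    projected = [matvec(W, h) for h in neighbors]            # W h_ij
    scores = [
        sum(ck * math.tanh(pk + bk) for ck, pk, bk in zip(c, proj, b))
        for proj in projected
    ]                                                        # Eq. (1)
    top = max(scores)                                        # numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]                        # Eq. (2)
    h_center = [
        sum(a * proj[k] for a, proj in zip(alpha, projected))
        for k in range(len(W))
    ]                                                        # Eq. (3)
    return h_center, alpha

# Toy example with D_a = 2 and D_i = 3 (values chosen arbitrarily).
W = [[0.5, -0.2], [0.1, 0.3], [0.0, 1.0]]
b = [0.0, 0.1, -0.1]
c = [1.0, 0.5, -0.5]
neighbors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_center, alpha = attention_aggregate(neighbors, W, b, c)
```

The weights `alpha` sum to one, so the new item embedding is a convex combination of the projected attribute embeddings rather than a plain average.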
Here, we take attribute-item layer as an example. Generally, B. Multi-view Hypergraph Convolution Layer
for center node like item i in attribute-item layer, they all As demonstrated in the introduction, real-world correlations
interact with several attributes such as descriptions of items. among users can be more complicated than pairwise inter-
However, a simple averaging or concatenating operation on actions. Besides, the explicit interactions reflected on meta-
neighbors can lead to users’ semantic information loss, as path based heterogeneous graph tend to result in some loss
users ignore the diverse significance of nodes which contribute on underlying interactions like similar topic interests among
differently to their representations. Then our model employs users due to multi-level messaging process, we need other
attention mechanism and define the information propagation structure to supplement the structure information and not bring
process from neighbors to center in attribute-item layer as new noise. So we firstly construct multi-view hypergraphs
follows: combining explicit interaction-based hypergraphs with implicit
$$e_{ij} = c^{T} \tanh\left(W^{ai} h_{ij} + b^{ai}\right) \quad (1)$$
where, given $D_a$, $D_i$, $D_u$ as the embedding dimension of attributes, items and users, respectively, $h_{ij} \in \mathbb{R}^{D_a}$ represent the hidden embedding feature of neighbor $j$ (i.e., the attribute neighbor of item $i$). $W^{ai} \in \mathbb{R}^{D_i \times D_a}$ and $b^{ai} \in \mathbb{R}^{D_i}$ indicates the weight matrix and bias vector in attribute-item layer,
interaction-based hypergraphs to solve the problems. Following, based on the constructed hypergraphs, we introduce the hypergraph neural networks in detail.
1) Multi-view Hypergraph Construction: After the initial embedding vectors of all users are obtained, in order to mine the high-order relations among users and acquire semantically rich representation of users, we construct hypergraph and
aggregate high-order users’ information on the hyperedge to obtain the formalized hypergraph convolution since the
then update users’ embedding. Because a hypergraph only hypergraph convolution involves matrix decomposition and
contains single semantic information, multiple hypergraphs multiple matrix multiplication operations. In order to enhance
{HM1 , HM2 , · · · , HMT } are concatenated. In the model, each the stability and expression ability of the model, we construct
hypergraph is represented by a binary incidence matrix HMt ∈ multiple convolutional layer. The (l + 1)th layer hypergraph
R|U | ×|EMt | , where |U |, |EMt | denote the number of users and convolutional operation is defined as:
hyperedges, respectively, with rows representing nodes namely
$$X^{(l+1)} = \sigma\left(D_v^{-1/2} H_M W D_e^{-1} H_M^{T} D_v^{-1/2} X^{(l)} \Theta^{(l)}\right) \quad (6)$$
where $H_M \in \mathbb{R}^{|U| \times |E_M|}$ is the incidence matrix of hypergraph obtained by concatenation, $|E_M| = \sum_{i=1}^{T} |E_{M_i}|$, $D_v \in \mathbb{R}^{|U| \times |U|}$ and $D_e \in \mathbb{R}^{|E_M| \times |E_M|}$ denote the diagonal matrices in which diagonal entries represent degrees of user
users and columns representing hyperedges. The element in HMt is defined as follows:
$$h(v, e) = \begin{cases} 1, & v \in e \\ 0, & v \notin e \end{cases} \quad (4)$$
which indicates the entrance is non-zero when the node is set U, hyperedge set EM , respectively. For each user vi ∈ U ,
on the hyperedge. In e-commerce system, semantic informa- the degree of it representing the number of related hyperedges
tion of hyperedge significantly influences the performance of is defined as:
hypergraph convolutional operation leading to poor quality in information aggregation.
$$d(v_i) = \sum_{e_i \in E_M \mid v_i \in U} w(e_i) = \sum_{e_i \in E_M} h(v_i, e_i)\, w(e_i) \quad (7)$$
Considering the data characteristics of users with multiple
behaviors in e-commerce, we propose to jointly integrate The degree of hyperedge ei ∈ EM indicates the number of
explicit interaction-based hypergraphs as well as implicit users on the hyperedge, which is defined as:
interaction-based hypergraphs like topic-based hypergraphs.
On one hand, for explicit interaction-based hypergraphs, they connect all users who shared same historical behavior directly. For instance, users who click or purchase the same
$$\delta(e_i) = |e_i| = \sum_{v_i \in U} h(v_i, e_i) \quad (8)$$
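The incidence matrix of Eq. (4) and the two degree definitions of Eqs. (7) and (8) can be checked on a small example. The sketch below assumes unit hyperedge weights and hypothetical user names; the function names are illustrative only.

```python
def incidence_matrix(users, hyperedges):
    """Eq. (4): H[v][e] = 1 if user v lies on hyperedge e, else 0."""
    return [[1 if u in e else 0 for e in hyperedges] for u in users]

def vertex_degrees(H, weights):
    """Eq. (7): d(v) = sum over hyperedges of h(v, e) * w(e)."""
    return [sum(h * w for h, w in zip(row, weights)) for row in H]

def edge_degrees(H):
    """Eq. (8): delta(e) = sum over users of h(v, e)."""
    return [sum(col) for col in zip(*H)]

users = ["u1", "u2", "u3", "u4"]
hyperedges = [{"u1", "u2", "u3"}, {"u2", "u4"}]  # hypothetical co-purchase groups
weights = [1.0, 1.0]                             # unit weights (assumption)

H = incidence_matrix(users, hyperedges)
d_v = vertex_degrees(H, weights)
delta_e = edge_degrees(H)
```

Rows of `H` index users and columns index hyperedges, so `u2`, who lies on both hyperedges, has degree 2 while each hyperedge degree is simply its number of users.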
In (6), X (l) means the updated feature matrix of sampled
commodity to form a hyperedge in HM1 ∈ R|U | ×|EM1 | ,
users in the lth layer, namely hypergraph signal of layer l.
users who co-click an advertisement to form a hyperedge
$\Theta^{(l)} \in \mathbb{R}^{C^{(l)} \times C^{(l+1)}}$ represents the learnable parameter of
in HM2 ∈ R|U |×|EM2 | which is semantically different from lth layer where C (l) denotes the hidden size of lth layer.
HM1 . Functioning as the convolution kernel, Θ(l) is constantly
Besides, on the other hand, although heterogeneous atten- updated in the training process, so as to achieve a better
tion mechanism has the capacity to get rich attribute informa- performance of extracting node features in the hypergraph
tion and structural information from multiple nodes associating convolutional operation. σ represents the nonlinear activation
with users, some attribute information loss associated with function.
users may be produced due to the process of propagation. We Compared with traditional graph neural networks, hyper-
believe that similar users will own similar interactive attributes graph convolutional neural networks not only makes use of
indicating users’ latent interests. Hence, we introduce topic- graph topological structure and node content characteristics,
based model such as Latent Dirichlet Allocation(LDA) model but also realizes node-hyperedge-node information conversion.
to model attribute information with direct relevance to users Specifically, after multiplying H T and Θ(l) , the operation
for generating topic embedding vectors of each user. Even- extracts users’ features and aggregates information of users
tually, abundant semantic information of users explored from on each hyperedge to formulate embedding of the hyper-
different data sources could be generated. Making use of topic edge. Similarly, features of users are updated by aggregating
embedding vectors, one hyperedge is constructed to connect information of correlated hyperedges. Through multi-layer
the centroid and its K nearest neighbors in the corresponding hypergraph convolutional operation, we obtain the learned
feature space in hypergraph HM3 ∈ R|U | ×|EM3 | . Generally, users’ representation embedding matrix Z uf ∈ R|U |×Du .
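A single hypergraph convolution layer in the form of Eq. (6) can be sketched in plain Python as below. This is only an illustration: it assumes ReLU as the activation, that every user lies on at least one hyperedge and that no hyperedge is empty (otherwise the inverse degrees are undefined), and it uses dense lists where a practical implementation would use sparse tensors. All helper names and values are hypothetical.

```python
def matmul(A, B):
    """Plain-Python product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def hypergraph_conv(X, H, w, Theta):
    """One layer of Eq. (6) with ReLU as the activation."""
    d_v = [sum(h * wk for h, wk in zip(row, w)) for row in H]   # vertex degrees, Eq. (7)
    d_e = [sum(col) for col in zip(*H)]                         # hyperedge degrees, Eq. (8)
    Dv_inv_sqrt = diag([d ** -0.5 for d in d_v])
    De_inv = diag([1.0 / d for d in d_e])
    out = Dv_inv_sqrt
    # Multiply left to right: Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta.
    for M in (H, diag(w), De_inv, transpose(H), Dv_inv_sqrt, X, Theta):
        out = matmul(out, M)
    return [[max(0.0, v) for v in row] for row in out]

# Three users, two hyperedges, two input features, identity parameters.
H = [[1, 0], [1, 1], [0, 1]]
w = [1.0, 1.0]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Theta = [[1.0, 0.0], [0.0, 1.0]]
out = hypergraph_conv(X, H, w, Theta)
```

Reading the product from the right, user features are first gathered onto hyperedges through the transposed incidence matrix and then scattered back to users, which is the node-hyperedge-node conversion described above.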
we employ fusion strategy to concatenate all the T hypergraphs column-wise to generate the whole hypergraph incidence matrix HM used in convolutional operation as follows:
$$H_M = \mathrm{Concat}\left(H_{M_1}, H_{M_2}, \ldots, H_{M_T}\right) \in \mathbb{R}^{|U| \times \sum_{i=1}^{T} |E_{M_i}|} \quad (5)$$
C. User Profiling Prediction
Owing to the deficiency of data, user profiling as age and gender prediction is viewed as a multi-classification task with semi-supervised learning. Finally, with support of partial real
2) Hypergraph Convolutional Operation: The embedding labels of trained users, we try to minimize the loss to learn
feature matrix of all users X = [x1 , x2 , . . . x|U | ] can be the parameters in the course of model training. Here, the cross
obtained through (3). After applying graph Laplacian of the entropy loss function is applied to calculate the loss:
traditional graph neural network to hypergraph, we approximate the convolution kernel by using the first-order Chebyshev polynomial and simplify the hypergraph convolution
$$L = -\sum_{u \in U} \sum_{f=1}^{F_y} Y_{uf} \log\left(\mathrm{softmax}(Z_{uf})\right) \quad (9)$$
where Zuf denotes the output of hypergraph convolutional layer, Fy represents the number of labels. Row-wise softmax function is used to scale the data to get the prediction result of each user’s label category. U means the set of users labelled.
D. Hypergraph Sampling
In the real world, there may be a huge number of users in e-commerce and social media, leading to expensive graph storage overhead when training our model. Hence, compared with the traditional method of introducing all node data into the model, we design the following two sampling methods for two kinds of hypergraphs.
TABLE I
STATISTICS OF LABEL IN ALIBABA-DATASET.
Gender: Male | Gender: Female | Age: 1 | Age: 2 | Age: 3 | Age: 4 | Age: 5
99,366 | 232,499 | 19,983 | 59,713 | 99,157 | 85,311 | 67,701
TABLE II
STATISTICS OF LABEL IN JD-DATASET.
Gender: Male | Gender: Female | Age: <26 | Age: 26-35 | Age: 36-55 | Age: >55
20,427 | 30,453 | 9,329 | 25,713 | 14,853 | 4,985
k-hop sampling method for explicit interaction-based
hypergraphs. For each user u in the set U, k-hop sampling
method is involved to generate a mini-hypergraph that only
contains part of users closely associated with it in the full graph. The parameters of k-hop are denoted as {m1 , m2 , ... , mk }. We regard all users on the same hyperedge of the explicit
V. EXPERIMENTS
A. Datasets
interaction-based hypergraphs with u as neighbors of u, and To verify the effectiveness of our proposed model IHNN for
generate a mini-hypergraph of u iteratively based on these user profiling, we employed two real-world datasets Alibaba-
neighbors. Specifically, m1 neighbors highly related to u are dataset1 , JD-dataset2 from Alibaba and JD.COM, both popular
randomly selected from the first hop, and the neighbor set is e-commerce platforms in China. These two datasets contain
Nu1 . For ∀m ∈ Nu1 , We firstly select m2 neighbors from historical behavior records such as clicks generated when
the neighbor set, and the neighbor set generated by all new users visit the portal site, as well as information of users,
nodes is denoted as Nu2 . Obtaining all the k-hop neighbors etc. Alibaba-dataset is about the history of advertising display
Nu1 , Nu2 , . . . Nuk of user u at each hop in an iterative way, we and behavior logs of users who log on to Taobao. The
integrate all the neighbors and user u together as a user-user interactions among nodes include ‘users click advertisements’
mini-hypergraph centered on user u, where the user set and the and interactions between users and commodities including
number are denoted as GU and KU , respectively. Furthermore, ‘purchase’, ‘click’, ‘add to the shopping cart’ and ‘favorite’,
for ∀u ∈ GU in the user-user mini-hypergraph, we sample s1 etc. JD-dataset contains users’ historical text of clicks and
items from its associated items to generate user-item mini- purchases, the explicit interaction includes ‘users click or
graph. Similarly, s2 attributes for each item in sampled s1 purchase commodities’, etc. The task of user profiling is
items are sampled from associated attribute neighbors to form predicting age and gender of all users. Labels of users’ age
attribute-item mini-graph. in Alibaba-dataset have been divided in the original dataset.
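The user-user part of the k-hop sampling procedure can be sketched as follows. The sketch assumes uniform random selection among unvisited neighbors and treats two users as neighbors whenever they share a hyperedge; the function names, the `fanouts` list (playing the role of {m1, ..., mk}) and the toy hyperedges are hypothetical, and the subsequent item and attribute sampling (s1, s2) is omitted.

```python
import random

def neighbors_of(user, hyperedges):
    """All users sharing at least one hyperedge with `user`, excluding itself."""
    found = set()
    for edge in hyperedges:
        if user in edge:
            found |= edge
    found.discard(user)
    return found

def k_hop_sample(user, hyperedges, fanouts, rng):
    """Sample at most fanouts[i] new neighbors per frontier node at hop i + 1."""
    sampled = {user}
    frontier = [user]
    for m in fanouts:
        next_frontier = []
        for node in frontier:
            # Sort for reproducibility before drawing a random subset.
            candidates = sorted(neighbors_of(node, hyperedges) - sampled)
            picked = rng.sample(candidates, min(m, len(candidates)))
            sampled.update(picked)
            next_frontier.extend(picked)
        frontier = next_frontier
    return sampled

hyperedges = [{"a", "b", "c"}, {"c", "d"}, {"d", "e"}]  # toy explicit hyperedges
mini_users = k_hop_sample("a", hyperedges, [2, 1], random.Random(0))
```

With two hops the sample can reach `d` through `c` but never `e`, which is three hops away from `a`; the returned set is the user set of the mini-hypergraph centered on `a`.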
Global sampling method for implicit interaction-based In the former dataset, attributes refer to ‘category’, ‘brand’,
hypergraphs. Sampling a mini-hypergraph for each user, we ‘campaign’ of the commodities and the latter refer to words
then construct multi-view hypergraphs based on the corre- in the titles of commodities. A detailed description of labels in
sponding interactions on vertices of mini-hypergraph. How- the two datasets is listed in Table I and Table II. Besides, our
ever, the uncertainty of random sampling may lead to the JD-dataset is filtered more users interacting with no attributes
problem of large similarity deviation of users in some mini- which is slightly different from dataset in [8].
hypergraphs. Besides, for implicit interaction-based hyper-
B. Baseline Methods
graph construction method relying on distance among vertices,
direct use of the commonly used K-nearest neighbor method We compared our proposed model with both classical and
may introduce noise when the overall distance between users state-of-the-art graph-based methods for user profiling task.
in the mini-hypergraph is obvious. Therefore, we propose to • Logistic Regression (LR) [35]: A widely used machine
sample the first KS users close to each user in the global learning model employed for supervised learning charac-
users according to the Euclidean distance in the user’s mini- terized by using nonlinear transformation to predict labels
hypergraph. Considering the sampled users, the KS users are of nodes.
intersected with the users in the mini-hypergraph. Therefore, • Support Vector Machine (SVM) [36]: A classical classi-
the intersecting users obtained can be recorded as the im- fication method which aims to divide the training data set
plicit interaction-based hyperedge of the user in the mini- correctly along with maximum geometric interval.
hypergraph. In this way, we reduce the risk of introducing • Graph Convolution Neural Network (GCN) [37]: A pop-
noise due to the large difference of users labels in the mini- ular model with the capability of extracting features of
hypergraph, which makes the constructed implicit interaction-
hypergraphs more stable and enhances the reliability of the
1 https://tianchi.aliyun.com/dataset/dataDetail?dataId=56
TABLE III
OVERALL PERFORMANCE COMPARISON OF ALL MODELS ON TWO DATASETS.
TABLE IV
DETAILED PERFORMANCE COMPARISON ON INDIVIDUAL LABEL (F1-SCORE).
Euclidean spatial data, which gathers neighbor information to obtain node features for its usage of downstream tasks and is widely used in homogeneous graph.
• Graph Attention Network (GAT) [38]: To handle the problem of ignoring the importance of different neighbors for the center node, Velickovic et al. introduce attention mechanism to graph convolution operation, which achieves better performance in homogeneous graph.
• HGCN [8]: An instantiated model of heterogeneous graph attention network which implements the graph convolutional operation to obtain features of users in heterogeneous graph.
• HGAT [8]: Another instantiated model of heterogeneous graph attention network but employs graph attention operation compared with HGCN.
C. Implementation Details
For our model, we implement the architecture with PyTorch which contains a two-layer hypergraph convolutional operation. Because JD-dataset contains detailed information of attributes in the form of text while Alibaba-dataset only contains the ID of users after desensitization, we design and construct multi-view hypergraphs differently for the two datasets according to the characteristics of the data. Specifically, we concatenate the topic-based hypergraph and co-purchase-or-click-commodity hypergraph in JD-dataset while co-purchase-or-click-commodity and co-click-advertisement hypergraph are combined in Alibaba-dataset. Following the prior work [7], we divide both datasets into training set, validation set and test set at the proportions of 75%, 12.5% and 12.5% respectively. Accuracy and Macro-F1, commonly used to evaluate multi-classification performance, are used to measure the performance of each experiment.
In order to realize semi-supervised learning, we only use batchsize users with ground-truth labels during training while the labels of other users in the batchsize mini-hypergraphs are unknown. We utilize Adam optimizer and early stopping method to train the model and get optimized results within 1000 epochs. With the trained model, we could get the results of gender and age prediction in test set. Besides, the pretrained embeddings of attributes stem from FastText [39]. In the k-hop sampling method for explicit interaction-based hypergraphs, k is set to 2, parameters of k-hop are {10,4} which means KU is set to 50 and the sampling parameters s1 and s2 are both set to 10. In the global sampling method for implicit interaction-based hypergraphs, we set KS to 1000. The topic number used in implicit interaction-based hypergraph is set to 10. To facilitate calculation, the embedding dimension of attribute, item and user are all fixed at 200. In multi-view hypergraph convolution layer, we use ReLU as the activation function, setting dropout and hidden units to 0.6 and 16, respectively. For the baseline methods, in order to ensure fairness, we use the
same dimensions of attribute, item and user embeddings as IHNN. For all models, the learning rate and mini-batch size are set to 0.005 and 64 for gender prediction and to 0.01 and 32 for age prediction, respectively.
D. Comparison with Baselines
The overall experimental results are shown in Table III. Compared with the baseline models, IHNN achieves the best performance on gender and age prediction, which verifies the effectiveness of our model. The traditional machine learning methods LR and SVM perform worse than the other methods on both Alibaba-dataset and JD-dataset. This is because traditional machine learning methods take only the input attributes of nodes into consideration; the network structure among nodes and the complex relationships among users are ignored when obtaining users’ representations. Methods based on graph neural networks, such as GCN and GAT, perform better than the classical methods, and GAT performs better than GCN because its attention mechanism accounts for the different importance of nodes. On heterogeneous graphs, HGAT and HGCN achieve better results on the gender task than the methods designed for homogeneous graphs, which also indicates the necessity of heterogeneous graphs: different semantic representations of nodes can be obtained from different data sources.
It is noteworthy that IHNN achieves the best overall performance on all tasks on the two datasets. Compared with the best-performing baselines, the accuracy and Macro-F1 of IHNN improve by 0.43% and 0.94% for age prediction and by 0.93% and 0.76% for gender prediction on Alibaba-dataset, and by 11.18% and 10.49% for age prediction on JD-dataset. The improvement is more prominent on JD-dataset than on Alibaba-dataset. One reason may be that in Alibaba-dataset the original high-order interactions among users, such as ‘users co-purchase users’ or ‘users co-click users’, are not as strong as they are in JD-dataset, which weakens the ability of the hypergraph to capture the complex relationships among users. Another reason may be that JD-dataset constructs hypergraphs based on both explicit and implicit interactions, while Alibaba-dataset utilizes merely explicit interaction-based hypergraphs, which further implies the effectiveness of integrating multi-view interaction hypergraphs.
In addition, as shown in Table IV, we compare the detailed performance of IHNN and the other graph-based baselines on each label of both tasks, based on the F1-score of each individual category. IHNN outperforms the other baselines on most categories of the two tasks, especially on categories with fewer users, such as the label ‘>55’ in age prediction on JD-dataset. This may be because the categories with a large number of users contain abundant original information explicitly, while the categories with fewer users include fewer explicit interactions. Therefore, the multi-view hypergraphs in IHNN, which explore high-order relationships, are more conducive to achieving significant performance improvements on the categories with a small number of users.
On the whole, IHNN achieves the best results mostly for the following reasons. Firstly, IHNN takes advantage of heterogeneous information stemming from diverse sources of data, such as the related attributes of items, and encodes more attributes as well as their associations by utilizing meta-path-based attention in the heterogeneous graph. Besides, IHNN integrates multi-view hypergraphs derived from explicit and implicit interactions so as to learn complex interactions among users from different views, contributing to semantically rich representations of target users, high user profiling performance and strong model generalization ability.
E. Ablation Study
In order to verify the effectiveness of the multiple perspectives in IHNN, we design variants of the complete IHNN model by removing hypergraph views and compare them with the proposed model. Experimental results on the two datasets are shown in Fig. 3. The variants include:
• IHNNwo: keeps only the heterogeneous structure, without the hypergraph structure or any other graph structure (i.e., no Multi-view Hypergraph Convolution Layer). The output layer employs an MLP to obtain classification results.
• IHNNep: utilizes only explicit interaction-based hypergraphs, such as the hypergraph generated by the ‘co-click’ relationship on commodities in JD-dataset, without implicit interaction-based hypergraphs. In Alibaba-dataset, hypergraphs derived from user ‘co-click’ advertisement interactions and user ‘co-purchase’ commodity interactions are tested, denoted as IHNNep1 and IHNNep2, respectively.
• IHNNip: utilizes only implicit interaction-based hypergraphs, such as the topic-based hypergraph, and ignores users’ interactions indicated by direct behaviors such as the ‘co-purchase’ relationship on commodities.
From the results, we can observe that IHNN outperforms the other model structures on both datasets. Compared with the model without hypergraph structure, IHNNwo, the models based on either explicit interactions (IHNNep) or implicit interactions (IHNNip) achieve better performance: mining high-order adjacency relationships between users yields user representations with more complete semantic information, which improves the accuracy of the model. The results of IHNN, IHNNep and IHNNip also indicate that, compared with a single hypergraph, integrating the higher-order relationships of users from multiple perspectives, including both explicit and implicit interactions, takes advantage of the complementarity of views to model the high-order relationships among users more accurately. For example, considering the similarity of users according to the topics they are involved in, or exploiting explicit user relationships from users’ communal behavior logs on commodities or other social information, is capable of making users with similar interests closer.
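To make the difference between the ablation variants concrete, the following is a minimal NumPy sketch, not the authors' PyTorch implementation. It assumes the standard hypergraph convolution of HGNN [30] with unit hyperedge weights, since this section does not spell out the exact layer used in IHNN. The function names `hypergraph_conv` and `select_views`, the toy incidence matrices, and the single linear map standing in for the MLP of IHNNwo are all hypothetical.

```python
import numpy as np


def hypergraph_conv(X, H, Theta):
    """One HGNN-style layer with unit hyperedge weights (assumed form, after [30]):
    ReLU(Dv^-1/2 H De^-1 H^T Dv^-1/2 X Theta)."""
    H = np.asarray(H, dtype=float)
    dv = H.sum(axis=1)  # vertex degree: number of hyperedges each user belongs to
    de = H.sum(axis=0)  # hyperedge degree: number of users in each hyperedge
    # Guard against isolated users and empty hyperedges (degree 0).
    dv_inv_sqrt = np.divide(1.0, np.sqrt(dv), out=np.zeros_like(dv), where=dv > 0)
    de_inv = np.divide(1.0, de, out=np.zeros_like(de), where=de > 0)
    A = (dv_inv_sqrt[:, None] * H) * de_inv[None, :]  # Dv^-1/2 H De^-1
    A = A @ (H.T * dv_inv_sqrt[None, :])              # ... H^T Dv^-1/2
    return np.maximum(A @ X @ Theta, 0.0)             # ReLU activation


def select_views(explicit_views, implicit_views, variant):
    """Return the incidence matrix of a (hypothetical) ablation variant, or None."""
    chosen = {
        "full": explicit_views + implicit_views,  # all views, concatenated
        "ep": explicit_views,                     # explicit interactions only
        "ip": implicit_views,                     # implicit interactions only
        "wo": [],                                 # no hypergraph layer at all
    }[variant]
    # Views share the same users, so they are concatenated along the hyperedge axis.
    return np.concatenate(chosen, axis=1) if chosen else None


# Toy data: 4 users, incidence matrices of shape (num_users, num_hyperedges).
H_explicit = np.array([[1, 0], [1, 0], [1, 1], [0, 1]], dtype=float)  # e.g. co-click
H_implicit = np.array([[1, 0], [0, 1], [0, 1], [1, 1]], dtype=float)  # e.g. topic-based

rng = np.random.default_rng(0)
X = rng.random((4, 8))                # toy user features (the paper uses 200 dimensions)
Theta = rng.standard_normal((8, 16))  # 16 hidden units, as in the reported setting

outputs = {}
for variant in ("wo", "ep", "ip", "full"):
    H = select_views([H_explicit], [H_implicit], variant)
    if H is None:
        # Stand-in for the MLP used by the variant without hypergraphs.
        outputs[variant] = np.maximum(X @ Theta, 0.0)
    else:
        outputs[variant] = hypergraph_conv(X, H, Theta)
    print(variant, None if H is None else H.shape, outputs[variant].shape)
```

Concatenating incidence matrices column-wise keeps one row per user, so adding a view only adds hyperedges; under this assumption each ablation variant changes nothing but the set of columns passed to the layer.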
(a) Alibaba-dataset (b) JD-dataset
[3] G. Farnadi, G. Sitaraman, S. Sushmita, F. Celli, M. Kosinski, D. Stillwell, S. Davalos, M. F. Moens, and M. Cock, “Computational personality recognition in social media,” User Modeling and User-Adapted Interaction, vol. 26, no. 2-3, pp. 109–142, 2016.
[4] F. R. Pardo, F. Celli, P. Rosso, M. Potthast, B. Stein, and W. Daelemans, “Overview of the 3rd author profiling task at PAN 2015,” 2015.
[5] H. A. Schwartz, J. C. Eichstaedt, M. L. Kern, L. Dziurzynski, and L. H. Ungar, “Personality, gender, and age in the language of social media: The open-vocabulary approach,” PLoS ONE, vol. 8, no. 9, p. e73791, 2013.
[6] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. Lang, “Phoneme recognition using time-delay neural networks,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 3, pp. 328–339, 1989.
[7] G. Farnadi, J. Tang, M. D. Cock, and M. F. Moens, “User profiling through deep multimodal fusion,” in Eleventh ACM International Conference, 2018, pp. 171–179.
[8] W. Chen, Y. Gu, Z. Ren, X. He, H. Xie, T. Guo, D. Yin, and Y. Zhang, “Semi-supervised user profiling with heterogeneous graph attention networks,” in IJCAI, vol. 19, 2019, pp. 2116–2122.
[9] A. Rahimi, T. Cohn, and T. Baldwin, “Semi-supervised user geolocation via graph convolutional networks,” Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018.
[10] Z. Xiao, W. Song, H. Xu, Z. Ren, and Y. Sun, “Timme: Twitter ideology-detection via multi-task multi-relational embedding,” 2020.
[11] X. He, H. Zhang, M. Y. Kan, and T. S. Chua, “Fast matrix factorization for online recommendation with implicit feedback,” 2017.
[12] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, pp. 30–37, 2009.
[13] S. Wang, L. Cao, L. Hu, S. Berkovsky, X. Huang, L. Xiao, and W. Lu, “Hierarchical attentive transaction embedding with intra-and inter-transaction dependencies for next-item recommendation,” IEEE Intelligent Systems, vol. 36, no. 4, pp. 56–64, 2020.
[14] S. Wang, X. Zhang, Y. Wang, H. Liu, and F. Ricci, “Trustworthy recommender systems,” arXiv preprint arXiv:2208.06265, 2022.
[15] S. Wang, X. Xu, X. Zhang, Y. Wang, and W. Song, “Veracity-aware and event-driven personalized news recommendation for fake news mitigation,” in Proceedings of the ACM Web Conference 2022, 2022, pp. 3673–3684.
[16] Q. Zhang, L. Cao, C. Shi, and L. Hu, “Tripartite collaborative filtering with observability and selection for debiasing rating estimation on missing-not-at-random data,” in AAAI. AAAI Press, 2021, pp. 4671–4678.
[17] L. Liu, D. Preotiuc-Pietro, Z. R. Samani, M. E. Moghaddam, and L. Ungar, “Analyzing personality through social media profile picture choice,” 2016.
[18] X. Li, M. Zhang, S. Wu, Z. Liu, and P. S. Yu, “Dynamic graph collaborative filtering,” in 2020 IEEE International Conference on Data Mining (ICDM), 2020.
[19] S. Wu, Y. Tang, Y. Zhu, L. Wang, and T. Tan, “Session-based recommendation with graph neural networks,” in Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019.
[20] F. Yu, Y. Zhu, Q. Liu, S. Wu, L. Wang, and T. Tan, “Tagnn: Target attentive graph neural networks for session-based recommendation,” in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 1921–1924.
[21] Y. Dong, N. V. Chawla, and A. Swami, “metapath2vec: Scalable representation learning for heterogeneous networks,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 135–144.
[22] L. Liao, X. He, H. Zhang, and T.-S. Chua, “Attributed social network embedding,” IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 12, pp. 2257–2270, 2018.
[23] C. Zhang, D. Song, C. Huang, A. Swami, and N. V. Chawla, “Heterogeneous graph neural network,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 793–803.
[24] Q. Fang, J. Sang, C. Xu, and Y. Rui, “Topic-sensitive influencer mining in interest-based social media networks via hypergraph learning,” IEEE Transactions on Multimedia, vol. 16, no. 3, pp. 796–812, 2014.
[25] D. Li, Z. Xu, S. Li, and X. Sun, “Link prediction in social networks based on hypergraph,” in Proceedings of the 22nd International Conference on World Wide Web, 2013, pp. 41–42.
[26] L. E. Martinet, M. A. Kramer, W. Viles, L. N. Perkins, and E. D. Kolaczyk, “Robust dynamic community detection with applications to human brain functional networks,” Nature Communications, vol. 11, no. 1, p. 2785, 2020.
[27] Y. Gao, M. Wang, D. Tao, R. Ji, and Q. Dai, “3-d object retrieval and recognition with hypergraph analysis,” IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4290–4303, 2012.
[28] W. Zhao, S. Tan, Z. Guan, B. Zhang, M. Gong, Z. Cao, and Q. Wang, “Learning to map social network users by unified manifold alignment on hypergraph,” IEEE Transactions on Neural Networks Learning Systems, pp. 1–13, 2018.
[29] B. Schölkopf, J. Platt, and T. Hofmann, “Learning with hypergraphs: Clustering, classification, and embedding,” in Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference, 2007.
[30] Y. Feng, H. You, Z. Zhang, R. Ji, and Y. Gao, “Hypergraph neural networks,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 3558–3565, 2019.
[31] N. Yadati, M. Nimishakavi, P. Yadav, V. Nitin, A. Louis, and P. Talukdar, “Hypergcn: A new method for training graph convolutional networks on hypergraphs,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[32] Y. Gao, M. Wang, Z. J. Zha, J. Shen, X. Li, and X. Wu, “Visual-textual joint relevance learning for tag-based social image search,” IEEE Transactions on Image Processing, vol. 22, no. 1, pp. 363–376, 2013.
[33] T. H. Hwang, Z. Tian, R. Kuang, and J. P. Kocher, “Learning on weighted hypergraphs to integrate protein interactions and gene expressions for cancer outcome prediction,” in Eighth IEEE International Conference on Data Mining, 2008.
[34] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, “Pathsim: Meta path-based top-k similarity search in heterogeneous information networks,” Proceedings of the VLDB Endowment, vol. 4, no. 11, pp. 992–1003, 2011.
[35] S. Rosenthal and K. Mckeown, “Age prediction in blogs: A study of style, content, and online behavior in pre- and post-social media generations,” in The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June, 2011, Portland, Oregon, USA, 2011.
[36] F. A. Zamal, W. Liu, and D. Ruths, “Homophily and latent attribute inference: Inferring latent attributes of twitter users from neighbors,” 2012.
[37] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.
[38] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph attention networks,” 2017.
[39] A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov, “Bag of tricks for efficient text classification,” Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: vol. 2, 2017.