5 Community Enhanced Knowledge Graph For Recommendation
2329-924X © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Vignan's Foundation for Science Technology & Research (Deemed to be University). Downloaded on August 10, 2024 at 05:42:33 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the
exception of pagination.
TABLE I: INCOMPLETENESS AND SPARSENESS IN THE FREEBASE
Fig. 1. Overall framework of the proposed model. The symbols beginning with u, i, e, and s represent the users, the items, the entities, and the communities, respectively. Due to space limitations, the module of KG enhancement is illustrated in a separate figure.
user preference, which could improve the persuasiveness and user satisfaction of the recommender systems.

III. PRELIMINARIES

In this section, we introduce related concepts and the task formulation of KG-based recommendation.

For clarity, we summarize the notations used throughout this article in Table II. Then, the related definitions are given as follows for a better description of the task.

Definition 1 (User–Item Interaction Information): User–item interaction information can be denoted as A = {(u, v)}. Specifically, if a user u has interacted with an item v, then we use (u, v) to denote an interaction between u and v. As a result, A can be viewed as the set of all observed user–item interactions.

Definition 2 (KG): A KG is a kind of directed graph, which can be denoted as G = (E, R). Specifically, E = {e1, e2, ..., em} represents the entity set, and R = {r1, r2, ..., rn} represents the relation set.

Given the user–item interaction information A and the KG G, the task is to predict the interaction probability ŷ_uv for a user u and an item v, where v is an item that u has not interacted with before.
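As a minimal illustration of these definitions (a sketch, not the authors' implementation; the user, item, and relation names below are hypothetical), the interaction set A and the KG G can be held in plain Python structures, with the KG stored as directed (head, relation, tail) triples:

```python
# A = {(u, v)}: observed user-item interactions (hypothetical IDs)
interactions = {("u1", "v1"), ("u1", "v2"), ("u2", "v1")}

# G = (E, R): stored here as (head, relation, tail) triples over
# illustrative movie entities and relations
triples = {
    ("v1", "directed_by", "e_director"),
    ("v2", "has_genre", "e_action"),
}

# Entity set E and relation set R recovered from the triples
entities = {h for h, _, _ in triples} | {t for _, _, t in triples}
relations = {r for _, r, _ in triples}
```

The prediction task then amounts to scoring pairs (u, v) that do not appear in `interactions`.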
IV. COMMUNITY ENHANCED KG FOR RECOMMENDATION

A. Framework

In this section, the proposed method is presented in detail. The proposed method consists of three parts: 1) KG enhancement; 2) path representation learning; and 3) interaction probability prediction. The overall framework is illustrated in Fig. 1. In what follows, we describe the proposed model in detail.

B. KG Enhancement

As described above, a KG usually suffers from incompleteness and sparseness. If a KG is incomplete and sparse, little semantic information is contained in it, which may then lead to poor performance in KG-based recommendation. As a result, it is necessary to enrich the KG for better recommendation performance.

Intuitively, a possible way to enrich a KG is to add additional entities and relations. With this scheme, the semantic information of the KG would be enriched, and more paths between user–item pairs could be found. It is obvious that there are similar entities of a certain type in a KG. For example, in a movie KG, there are similar movies or actors. With clustering techniques, these similar entities can be clustered, and each generated cluster can represent a set of entities that share similar features. In other words, these generated clusters can be considered as entities that represent items' latent attributes, such as the mixture of Action and Crime for the film genre, which does not exist in the original movie KG. For the purpose of enriching the KG, the above-mentioned clusters can be added into the original KG to offer additional semantic information. To the best of our knowledge, this is the first time these clusters are added into a KG. To better illustrate this scheme, we describe the enhancement process in Fig. 2.

In the field of network mining, the nodes in the same cluster constitute a community. To make the terminology clear and understandable, we adopt the term community to refer to the node that would be added to the KG. The use of this terminology indicates that this kind of node is relevant to the result of the clustering. Next, we give the definition of community in this article.

Definition 3 (Community): A community S is a node to be added into the KG, which can be regarded as an ordinary entity in a KG. A community S represents a cluster Cl of similar entities in the original KG G. In addition, there are two types of relations associated with a community, namely the relations rSe and reS between a community and the entities belonging to it, and the relations rSl1Sl2 and rSl2Sl1 between communities Sl1 and Sl2.

As described above, a community S represents a cluster Cl in G. As a result, it is necessary to perform entity clustering in the original KG G. Actually, the KG can be considered as a heterogeneous graph. The representations of nodes in a heterogeneous graph can be learned by graph embedding methods, which can promote diverse downstream tasks such as node
classification and clustering. Intuitively, heterogeneous graph embedding methods can be applied to obtain node representations {s1, s2, ..., sm}, which are then utilized to cluster the entities in a KG. Among various kinds of heterogeneous graph embedding methods, JUST [41] is a heterogeneous graph embedding technique based on random walks. Different from those random-walk-based heterogeneous graph embedding techniques which utilize metapaths to assist random walks, JUST does not need any metapaths. In most situations, due to the lack of domain knowledge, it is difficult to design effective metapaths, which would degrade the performance of representation learning. As a result, heterogeneous embedding techniques without metapaths are a better choice. Therefore, in this article we choose JUST as the graph embedding method.

With the help of JUST, the embeddings of entities Eemd = {s1, s2, ..., sm} in the KG are learned. Based on the embeddings of entities, we then cluster the entities. Among many clustering methods, k-means is a commonly used clustering method [42]. The k-means algorithm splits the data into clusters with the goal of having small differences within the same cluster and large differences between different clusters. To achieve this goal, the sum of squared errors (SSE) is adopted as the objective function, which is defined as

SSE = Σ_{i=1}^{K} Σ_{X∈Ci} DE(Ci, X)    (1)

where Ci denotes the ith cluster center, X denotes a certain sample, and DE denotes the Euclidean distance. Then, SSE needs to be minimized to achieve the above goal. Here, we utilize k-means to cluster the entities based on their embeddings learned by JUST.
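The clustering step can be sketched as follows (a toy pure-Python k-means on 2-D points; the paper presumably applies a standard k-means implementation to the JUST embeddings, so this is illustrative only):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Toy k-means on 2-D points; returns (centers, assignment, SSE)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance
        for i, (x, y) in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2,
            )
        # Update step: move each center to the mean of its cluster
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    # SSE as in (1): total squared distance of points to their centers
    sse = sum(
        (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2
        for (x, y), c in zip(points, assign)
    )
    return centers, assign, sse

# Two obvious blobs: k-means should separate them
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
centers, assign, sse = kmeans(pts, k=2)
```

Minimizing (1) is exactly what the alternating assignment/update loop above does, one coordinate block at a time.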
After the process of clustering, clusters are obtained. Note that the KG is a kind of heterogeneous graph, so a cluster may contain various kinds of entities. As a result, we further divide each cluster into several subclusters, each of which is composed of entities of the same type. By the above process, the final clusters are formed, which can be denoted as C = {Cl1, Cl2, ..., Clj}. Each cluster in C can be mapped onto a certain community, that is, Cli → Si, where 1 ≤ i ≤ j. Based on the concept of community, we give the definition of the new type of KG, namely the enhanced KG.

Definition 4 (Enhanced KG): The enhanced KG GE is a special KG which is composed of the original KG G, the set of communities S = {S1, S2, ..., Sj}, and the relations rSe, reS, rSl1Sl2, and rSl2Sl1. Here, reS is a kind of relation indicating that a certain node ek in G has the attribute that a certain community Sl represents, which can be denoted as ek →(reS) Sl; rSe represents the relation indicating that the cluster corresponding to Sl includes the node ek, which can be denoted as Sl →(rSe) ek, where 1 ≤ k ≤ m and 1 ≤ l ≤ j. Besides, rSl1Sl2 and rSl2Sl1 denote the relations between two communities Sl1 and Sl2, where 1 ≤ l1, l2 ≤ j and l1 ≠ l2.

For any cluster Cli = {e1^i, e2^i, ..., e_{ni}^i}, where ni denotes the number of entities contained in Cli and 1 ≤ i ≤ j, it contains a certain type of entities. To enrich the original KG G, we first add all communities S = {S1, S2, ..., Sj} into the KG G. Then, new relations should be involved in the graph. Specifically, for community Sl, the relations rSe and reS are added between Sl and each entity in Cll. Then, we consider adding relations among communities. For communities Sl1 and Sl2, we can find the corresponding clusters Cll1 and Cll2. If there is an edge linking an entity pair (e^{l1}, e^{l2}), where e^{l1} ∈ Cll1 and e^{l2} ∈ Cll2, we then add the relations rSl1Sl2 and rSl2Sl1 between the communities Sl1 and Sl2. With the above process, the enhanced KG GE is formed. Based on GE, we then mine user preference for items, as will be described later.

C. Path Representation Learning
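The construction of GE described above can be sketched as follows (illustrative only; entity and relation names are simplified, and a single label "rSS" with edge direction stands in for the pair rSl1Sl2 / rSl2Sl1):

```python
def enhance_kg(edges, clusters):
    """Add community nodes and relations to a KG.

    edges: set of (head, tail) pairs in the original KG.
    clusters: dict community_id -> set of member entities (the Cl_l).
    Returns the enriched edge set as labeled (head, relation, tail) triples.
    """
    enhanced = {(h, "orig", t) for h, t in edges}
    # rSe / reS between each community and its member entities
    for s, members in clusters.items():
        for e in members:
            enhanced.add((s, "rSe", e))
            enhanced.add((e, "reS", s))
    # Community-to-community relations when an original edge crosses clusters
    for s1, m1 in clusters.items():
        for s2, m2 in clusters.items():
            if s1 == s2:
                continue
            if any((a, b) in edges for a in m1 for b in m2):
                enhanced.add((s1, "rSS", s2))
                enhanced.add((s2, "rSS", s1))
    return enhanced

# Toy KG: the edge e1 -> e2 crosses from cluster s1 to cluster s2
edges = {("e1", "e2")}
clusters = {"s1": {"e1"}, "s2": {"e2"}}
kg = enhance_kg(edges, clusters)
```

The pairwise community scan is quadratic in the number of communities, which is consistent with the complexity terms reported later for the enhancement step.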
Since paths serve as the recommendation context, it is necessary to find paths concerning all the interactions in the user–item interaction set A. For each user–item interaction (u, v) ∈ A, we mine the paths connecting u and v in the enhanced KG. The obtained path set can be denoted as P(u,v) = {p_i^(u,v) | 1 ≤ i ≤ n_(u,v)}, where p_i^(u,v) is a path connecting u and v, and n_(u,v) is the path sampling size for the pair (u, v). For the reason that the entities in the path connecting a certain user and item can be regarded as a sequence, the RNN is naturally used to learn the representation of a certain path. Among various kinds of networks, LSTM is a popular network architecture for modeling sequence
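The path-mining step can be sketched as a bounded breadth-first enumeration over the enhanced KG (an illustrative sketch, since the paper does not spell out its exact mining algorithm), with the sampling size n_(u,v) capping how many paths are kept:

```python
from collections import deque

def mine_paths(adj, u, v, max_len=3, sample_size=5):
    """Enumerate simple paths from u to v with at most `max_len` edges.

    adj: dict mapping a node to its neighbor set in the enhanced KG.
    Returns at most `sample_size` paths, each a list of nodes.
    """
    paths = []
    queue = deque([[u]])  # BFS over partial paths, shortest first
    while queue and len(paths) < sample_size:
        path = queue.popleft()
        node = path[-1]
        if node == v and len(path) > 1:
            paths.append(path)
            continue
        if len(path) - 1 >= max_len:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in path:  # keep paths simple (no revisits)
                queue.append(path + [nxt])
    return paths

# Toy enhanced KG: u -> e1 -> v and u -> s1 -> e2 -> v (s1 a community node)
adj = {"u": {"e1", "s1"}, "e1": {"v"}, "s1": {"e2"}, "e2": {"v"}}
found = mine_paths(adj, "u", "v", max_len=3, sample_size=5)
```

Note how the community node s1 opens an extra path between u and v that would not exist in the original KG, which is exactly the effect the enhancement aims for.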
data. The principle of LSTM can be formalized by the following equations:

ft = σ(Wf · [ht−1, xt] + bf)
it = σ(Wi · [ht−1, xt] + bi)
C̃t = tanh(WC · [ht−1, xt] + bC)
Ct = ft ⊙ Ct−1 + it ⊙ C̃t
ot = σ(Wo · [ht−1, xt] + bo)
ht = ot ⊙ tanh(Ct)    (2)

where σ denotes the sigmoid function, ⊙ denotes the element-wise product, xt is the input at step t, and ht is the hidden state.
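A single step of these gate equations can be sketched in NumPy (a didactic implementation of the standard LSTM cell, not the authors' code; weight shapes and the toy path are assumptions):

```python
import numpy as np

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following (2); W stacks [Wf; Wi; WC; Wo].

    x_t: (d_in,), h_prev/c_prev: (d_h,), W: (4*d_h, d_h + d_in), b: (4*d_h,).
    """
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    z = W @ np.concatenate([h_prev, x_t]) + b
    d_h = h_prev.shape[0]
    f_t = sigmoid(z[0:d_h])                 # forget gate
    i_t = sigmoid(z[d_h:2 * d_h])           # input gate
    c_tilde = np.tanh(z[2 * d_h:3 * d_h])   # candidate cell state
    o_t = sigmoid(z[3 * d_h:4 * d_h])       # output gate
    c_t = f_t * c_prev + i_t * c_tilde      # cell update
    h_t = o_t * np.tanh(c_t)                # hidden state
    return h_t, c_t

# Encode a toy path of entity embeddings with hidden size 3
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
W = rng.standard_normal((4 * d_h, d_h + d_in)) * 0.1
b = np.zeros(4 * d_h)
h = np.zeros(d_h)
c = np.zeros(d_h)
path = [rng.standard_normal(d_in) for _ in range(5)]  # one mined path
for x in path:
    h, c = lstm_step(x, h, c, W, b)  # final h is the path representation
```

The final hidden state h plays the role of the path representation h_i^(u,v) used in the aggregation below.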
The similarity weight of the ith path p_i^(u,v) is computed as follows:

sw_i^(u,v) = Σ_{j=1}^{l−1} sj^T v    (3)

where sj ∈ R^{di×1} and v ∈ R^{di×1} are the embedding vectors for entity ej and the interacted item v, respectively, l is the path length, and di denotes the embedding size of entities. Based on the similarity weight, we can obtain the path set representation H(u,v) for a certain user–item interaction (u, v) as follows:

H(u,v) = Σ_{i=1}^{n_(u,v)} sw_i^(u,v) h_i^(u,v)    (4)

where h_i^(u,v) is the representation of the ith path.

[...] the KG, the length of the random walk, and the window size of Skip-Gram, respectively. The time costs of k-means and of the addition of communities and corresponding relations are O(k|E|) and O(|E|^2 + k^2), respectively, where k is the number of clusters. In the path representation learning, the time costs of path mining and of the calculation of path representations are O((|E| + B)|A|) and O(P|A|), respectively, where B, |A|, and P denote the number of [...]
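Equations (3) and (4) amount to a dot-product weighting of path representations, sketched below (illustrative; the entity matrices, path representations, and item vector are random stand-ins for the learned embeddings):

```python
import numpy as np

def aggregate_paths(path_entities, path_reprs, item_vec):
    """Weight each path by the sum of entity-item dot products as in (3),
    then sum the weighted path representations as in (4)."""
    weights = []
    for ents in path_entities:  # ents: (l-1, d) entity embeddings on one path
        weights.append(float(sum(e @ item_vec for e in ents)))
    H = sum(w * h for w, h in zip(weights, path_reprs))
    return H, weights

rng = np.random.default_rng(1)
d = 4
v = rng.standard_normal(d)                               # item embedding
paths = [rng.standard_normal((3, d)) for _ in range(2)]  # entities per path
reprs = [rng.standard_normal(d) for _ in range(2)]       # h_i from the LSTM
H, sw = aggregate_paths(paths, reprs, v)
```

Unlike max or average pooling, the weights sw expose how much each individual path contributed to H(u,v), which is what later makes the recommendations explainable.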
V. EXPERIMENTS

In this section, extensive experiments are conducted to confirm the effectiveness of the proposed CEKGR method. First of all, we describe the experimental settings, including the datasets, evaluation measures, and baselines. Then, we report and analyze the comparison results. In addition, an ablation study and a parameter analysis are conducted. Finally, a case study is conducted. The code is available
1 http://grouplens.org/datasets/movielens/
2 http://www.imdb.com/
3 https://www.kaggle.com/c/yelp-recsys-2013/data
4 http://www2.informatik.uni-freiburg.de/cziegler/BX/
TABLE IV
TOP-K PERFORMANCE OBTAINED BY ALL THE METHODS ON IM-1M IN TERMS OF RECALL@K AND NDCG@K
Recall NDCG
Recall@1 Recall@5 Recall@10 Recall@20 NDCG@1 NDCG@5 NDCG@10 NDCG@20
RKGE 0.0567 0.2417 0.4024 0.5909 0.5589 0.5198 0.5382 0.5839
RippleNet 0.0979 0.4000 0.5448 0.6919 0.9989 0.9447 0.9041 0.8829
KGAT 0.0764 0.3413 0.5239 0.6841 0.7476 0.7002 0.7111 0.7331
CKE 0.0771 0.3354 0.5148 0.6761 0.7423 0.6949 0.7030 0.7232
ECFKG 0.0620 0.2395 0.3597 0.5001 0.5546 0.4904 0.4888 0.5126
NFM 0.0771 0.3282 0.4999 0.6595 0.7434 0.6824 0.6838 0.7063
KGCL 0.0727 0.3341 0.5134 0.6786 0.7137 0.6912 0.7018 0.7257
CEKGR 0.0980 0.4685 0.6747 0.8394 0.9981 0.9601 0.9653 0.9705
Improvement percentage 0.10% 17.13% 23.67% 21.32% -0.08% 1.63% 6.77% 9.92%
TABLE V
TOP-K PERFORMANCE OBTAINED BY ALL THE METHODS ON YELP IN TERMS OF RECALL@K AND NDCG@K
Recall NDCG
Recall@1 Recall@5 Recall@10 Recall@20 NDCG@1 NDCG@5 NDCG@10 NDCG@20
RKGE 0.3242 0.6980 0.8426 0.9243 0.5010 0.6148 0.6622 0.6881
Ripple Net 0.6743 0.9078 0.9480 0.9779 0.9760 0.9663 0.9677 0.9710
KGAT 0.3191 0.6492 0.8212 0.9304 0.4861 0.5811 0.6339 0.6659
CKE 0.2523 0.5836 0.7976 0.9183 0.4009 0.5005 0.5679 0.6039
ECFKG 0.3409 0.7193 0.8642 0.9381 0.4986 0.6225 0.6694 0.6926
NFM 0.3671 0.7446 0.8872 0.9498 0.5359 0.6555 0.7019 0.7216
KGCL 0.3350 0.6563 0.8162 0.9292 0.5121 0.5967 0.6446 0.6769
CEKGR 0.6928 0.9430 0.9766 0.9931 0.9788 0.9791 0.9802 0.9811
Improvement percentage 2.74% 3.88% 3.02% 1.55% 0.29% 1.32% 1.29% 1.04%
adopted to assess the performance. Here, we set K = 1, 5, 10, 20.
3) Baselines: To validate the effectiveness of the proposed CEKGR method, we compare CEKGR with the following methods.
1) RKGE [22]: This method utilizes a recurrent network to model paths connecting entity pairs in a KG, and the embeddings of entities are used to obtain the proximity score between user–item pairs.
2) RippleNet [24]: To represent users' preferences, this model propagates users' potential interests along the connections in a KG. In the end, the clicking probability is predicted based on the item representation and the user representation.
3) KGAT [35]: This method propagates the embeddings from the current node's neighbors in a KG, and an attention mechanism is adopted for the purpose of distinguishing the importance of neighbor nodes.
4) CKE [19]: This model obtains the item representation from structural content, textual content, and visual content with respect to a certain knowledge base, together with the item offset vector learned from historical user–item interactions. Then, collaborative joint learning is conducted to predict users' implicit feedback.
5) ECFKG [43]: Through the KG containing users and items, the representations of users and items can be learned by maximizing the generative probability of the observed relation triplets. Then, CF can be conducted to predict users' preferences for items.
6) NFM [44]: This method combines the linearity of FM in modeling the second-order feature interactions and the nonlinearity of neural networks in modeling the higher-order feature interactions.
7) KGCL [45]: This method takes inspiration from KG learning and self-supervised data augmentation to incorporate the KG context, guiding the model in refining user/item representations with new knowledge-aware contrastive objectives.

B. Comparison Results

The comparison results on the three datasets are listed in Tables IV, V, and VI, respectively. For convenience, the best results are boldfaced, and the row "improvement percentage" shows the performance improvement obtained by the proposed method compared with the best performance of the baselines. Based on the results in the three tables, we have the following observations.
1) Compared with the most similar baseline, namely RKGE, which also utilizes paths as the recommendation context, the proposed CEKGR method obtains significant performance improvement in terms of recall, i.e., average improvement percentages of 69.10%, 43.04%, and 187.73% on IM-1M, Yelp, and Book, respectively. This may be due to the following reasons: 1) RKGE uses paths' representations in the training process, but it conducts recommendation with only the representations of users and items that are refined by encoding the connected paths in the test process. In contrast, CEKGR directly utilizes path representations in the test process; and 2) CEKGR has enriched the KG for more semantic
TABLE VI
TOP-K PERFORMANCE OBTAINED BY ALL THE METHODS ON BOOK IN TERMS OF RECALL@K AND NDCG@K
Recall NDCG
Recall@1 Recall@5 Recall@10 Recall@20 NDCG@1 NDCG@5 NDCG@10 NDCG@20
RKGE 0.1837 0.3593 0.4012 0.4257 0.3368 0.3535 0.3610 0.3673
RippleNet 0.1086 0.4839 0.7274 0.8657 0.1294 0.3144 0.3985 0.4436
KGAT 0.3022 0.5775 0.7686 0.8824 0.4019 0.4892 0.5509 0.5865
CKE 0.1937 0.4359 0.6929 0.8675 0.2855 0.3541 0.4356 0.4882
ECFKG 0.3382 0.6022 0.7914 0.8920 0.4765 0.5312 0.5900 0.6205
NFM 0.3382 0.6126 0.7957 0.8959 0.4692 0.5369 0.5940 0.6249
KGCL 0.2981 0.5486 0.7493 0.8754 0.3960 0.4668 0.5321 0.5712
CEKGR 0.7482 0.9523 0.9822 0.9952 0.9011 0.9156 0.9277 0.9325
Improvement percentage 121.23% 55.45% 23.44% 11.08% 89.11% 70.53% 56.18% 49.22%
C. Ablation Study

In this section, an ablation study is conducted to demonstrate the effect of some key components of the proposed method, namely the community and the aggregation method.
1) Effectiveness Analysis on Community: The comparison results have shown that CEKGR achieves better performance than the path-based recommendation method RKGE, which proves the effectiveness of the introduction of community to a certain extent. To further validate this effectiveness, we compare the proposed model with a variant which does not introduce the communities into the original KG. We report the results in Table VII. Note that the row "CEKGR-RS" shows the results of the variant without communities, and the row "improvement percentage" shows the performance improvement of the complete model over the variant. Generally, CEKGR achieves higher recall on the three datasets than the variant without communities. In particular, on the IM-1M dataset, CEKGR shows a small performance improvement over the variant without communities, and the value of Recall@1 even decreases. Based on the statistics of the datasets in Table III, the original KG of the IM-1M dataset is relatively dense, so its semantic information is relatively abundant. Therefore, the performance improvement of adding communities is relatively limited. The possible reason
TABLE VII
ABLATION STUDY: IMPACT OF COMMUNITY
Recall
Recall@1 Recall@5 Recall@10 Recall@20
CEKGR-RS 0.1048 0.4665 0.6685 0.8345
IM-1M CEKGR 0.0980 0.4685 0.6747 0.8394
Improvement percentage -6.49% 0.43% 0.93% 0.59%
CEKGR-RS 0.5022 0.7682 0.8613 0.9430
Yelp CEKGR 0.6928 0.9430 0.9766 0.9931
Improvement percentage 37.95% 22.75% 13.39% 5.31%
CEKGR-RS 0.5997 0.9067 0.9721 0.9901
Book CEKGR 0.7482 0.9523 0.9822 0.9952
Improvement percentage 24.76% 5.03% 1.04% 0.52%
Note: Bold results in each column are the best.
TABLE VIII
ABLATION STUDY: IMPACT OF AGGREGATION METHOD
Recall
Recall@1 Recall@5 Recall@10 Recall@20
Max pooling 0.0980 0.4685 0.6747 0.8394
IM-1M Average pooling 0.0980 0.4685 0.6747 0.8394
Similarity weight 0.0980 0.4685 0.6747 0.8394
Max pooling 0.6928 0.9430 0.9766 0.9931
Yelp Average pooling 0.6928 0.9430 0.9766 0.9931
Similarity weight 0.6928 0.9430 0.9766 0.9931
Max pooling 0.3586 0.7950 0.8995 0.9509
Book Average pooling 0.3420 0.7423 0.8669 0.9323
Similarity weight 0.7482 0.9523 0.9822 0.9952
Note: Bold results in each column are the best.
for the decrease of Recall@1 is that the increase of semantic information is limited after adding communities to the KG of IM-1M, and when the list of candidate recommendations is too short (e.g., when its length is 1), the value of recall is more susceptible to unobserved interactions. On the Yelp and Book datasets, in contrast, the improvement of recall is obvious. As shown in Table III, these two datasets both come with a relatively sparse original KG. Therefore, there is relatively little semantic information in their KGs. As a result, the introduction of communities can enhance the semantic information to a greater extent, so the performance improvement is more obvious. In summary, the results have validated the effectiveness of introducing communities, especially for KGs with low graph density.
2) Effectiveness Analysis on Aggregation Method: Considering that there are several paths between a certain user–item pair, the path representations should be appropriately aggregated for the following interaction probability prediction. In this part, we compare our method with two variants that aggregate path representations by average pooling and max pooling, respectively. From the results in Table VIII, it can be seen that the recall values obtained by the proposed method are higher than those of the two variants on the Book dataset. It can also be observed that the recall values on the other two datasets stay the same among the three kinds of aggregation methods, showing no improvement over the other two aggregation methods. However, note that neither average pooling nor max pooling can indicate the importance of each path. The method with similarity weight distinguishes the diverse contribution of each path for a certain user–item pair without any performance degradation compared with average pooling or max pooling. As revealed in the literature [46], there is a tradeoff between interpretability and recommendation effect. Specifically, the recommendation performance may degrade when the interpretability is improved. However, the results show that our model becomes more explainable without any sacrifice of performance, which demonstrates the superiority of our proposed aggregation method. In a word, the utilization of similarity weight can bring improvement in recommendation effect when the KG is relatively sparse. Although there is no performance improvement with a relatively dense KG when using similarity weight, the proposed aggregation method can distinguish the importance of each path without compromising performance.

D. Parameter Analysis

In this section, a parameter analysis is conducted to show how some parameters affect the performance of the proposed method, namely, the path length, the path sample size, the embedding size, and the number of clusters. When analyzing one parameter, the other parameters are kept fixed at their default settings.
1) Impact of Path Length: When mining paths connecting user–item pairs, the length of paths can play an important role in the prediction of users' preferences. To study the effect of path length on the recommendation performance, we vary the path length to see how the performance changes with it. Due to the huge time complexity of the path-mining task in a KG, we restrict the path length to 5. Specifically, the path length is set to 3 and 5. Since the number of interactions and the scale of the KG are both relatively large for the Yelp dataset, it would take quite a long time to find paths for all the interactions on the Yelp dataset when the path length is set to 5, of which the cost is too high to be acceptable. As a result,
TABLE IX
PARAMETER ANALYSIS: THE IMPACT OF PATH LENGTH
Recall@1 Recall@5 Recall@10 Recall@20
IM-1M L3 0.0980 0.4685 0.6747 0.8394
IM-1M L5 0.0979 0.4684 0.6746 0.8394
Book L3 0.7482 0.9523 0.9822 0.9952
Book L5 0.7561 0.9568 0.9840 0.9958
Note: Bold results in each column are the best.

TABLE X
PARAMETER ANALYSIS: THE IMPACT OF PATH SAMPLE SIZE
Recall@1 Recall@5 Recall@10 Recall@20
IM-1M S2 0.0980 0.4685 0.6747 0.8394
IM-1M S5 0.0980 0.4685 0.6747 0.8394
IM-1M S8 0.0980 0.4685 0.6747 0.8394
Book S2 0.7479 0.9523 0.9822 0.9952
Book S5 0.7482 0.9523 0.9822 0.9952
Book S8 0.7482 0.9523 0.9822 0.9952
Note: Bold results in each column are the best.

TABLE XI
PARAMETER ANALYSIS: THE IMPACT OF THE EMBEDDING SIZE
Recall@1 Recall@5 Recall@10 Recall@20
IM-1M 8 0.0980 0.4685 0.6746 0.8394
IM-1M 16 0.0982 0.4687 0.6748 0.8394
IM-1M 32 0.0980 0.4685 0.6747 0.8394
IM-1M 64 0.0982 0.4685 0.6747 0.8395
IM-1M 128 0.0978 0.4679 0.6743 0.8392
Book 8 0.7464 0.9535 0.9828 0.9953
Book 16 0.7454 0.9506 0.9812 0.9948
Book 32 0.7482 0.9523 0.9822 0.9952
Book 64 0.7429 0.9511 0.9810 0.9944
Book 128 0.7490 0.9529 0.9825 0.9954
Note: Bold results in each column are the best.

TABLE XII
PARAMETER ANALYSIS: THE IMPACT OF THE NUMBER OF CLUSTERS
Recall@1 Recall@5 Recall@10 Recall@20
IM-1M 10 0.0979 0.4684 0.6746 0.8394
IM-1M 50 0.0979 0.4684 0.6746 0.8394
IM-1M 100 0.0980 0.4685 0.6747 0.8394
IM-1M 500 0.0993 0.4668 0.6724 0.8380
IM-1M 1000 0.1014 0.4648 0.6700 0.8362
Book 10 0.3830 0.8482 0.9346 0.9783
Book 50 0.3922 0.6371 0.8168 0.9034
Book 100 0.7482 0.9523 0.9822 0.9952
Book 500 0.6773 0.9294 0.9709 0.9884
Book 1000 0.6602 0.9259 0.9686 0.9873
Note: Bold results in each column are the best.

we conduct the parameter analysis on the IM-1M and Book datasets. The results are shown in Table IX, where the row "L3" denotes that the path length is 3 and the other rows have similar meanings. For the IM-1M dataset, we can observe that the performance remains almost unchanged. However, the situation is different on the Book dataset. As shown in Table IX, the performance increases when the path length increases. The reason why the situation is different on these two datasets is that the graph density of the
KG of the IM-1M dataset is relatively large, which provides
rich semantic information, so a shorter path would not bring
performance degradation. Nevertheless, the KG of the Book
dataset is so sparse that it contains little information. As a result,
the path should be longer for the purpose of completely
mining user preference.
2) Impact of the Path Sample Size: Since there are many paths linking a user and an item in a KG, the time cost of model training would be quite high if we utilized all paths between users and items. As a result, a sampling strategy is necessary. In this part, we study the impact of the sampling size on recommendation performance. The results are shown in Table X, where the row "S2" denotes that the path sample size is 2 and the other rows have similar meanings. Obviously, the results stay almost unchanged as the sample size changes. This fact confirms that our method is insensitive to the path sample size, which helps reduce the time and space cost of the model training process, since we can adopt a small sample size without performance degradation.
3) Impact of the Embedding Size: In this part, we investigate the impact of the embedding size on the recommendation performance. The results are shown in Table XI. For the IM-1M dataset, it can be seen that the values of recall increase at first and then decrease as the embedding size rises. The possible reason is that a model with too small a dimension cannot encode enough features, while too large a dimension leads to overfitting. As for the Book dataset, the recommendation performance is quite good when the embedding size is 8, 32, and 128. To reduce the cost of model training, the embedding size can be set to 8.
E. Case Study

By mining the connections between a user and an item, the proposed method yields better performance compared with other state-of-the-art methods. Besides, the paths can provide explanations of why the recommender system recommends a certain product or service to a user. To better illustrate this, we randomly select a user–item pair from the IM-1M dataset. We present the paths between this user–item pair in Fig. 3. From the figure, we can see that there are three paths linking the user u941 and the movie Heat, whose genre covers Action, Crime, and Drama. First, the lower two paths share the same linking pattern. Based on these two paths, an explanation can be generated as "The movie Heat is recommended to you since you have watched the movies Face/Off and Star Wars: Return of the
Jedi, which are both Action films." As for the path on the top, the community s100 is included in this path. According to the definition of community, s100 represents a cluster of a certain kind of users. In the scenario of movie recommendation, s100 can reflect a user group that shares a similar preference for movies. To better illustrate this, we investigate the interacted movies of u941 and u42. For user u941, we find that more than a quarter of the interacted movies belong to the Drama genre, while for u42 over half of the interacted movies are of the Drama genre. This observation implies their common preference for movies of the Drama genre. As a result, the explanation based on this path can be generated as follows: "We recommend the movie Heat to you because another user who shares a common interest with you has watched it." Finally, we can see that different factors contribute differently to a user's choice of movies, and the difference is reflected by the weight of the corresponding path. From the weight of each path, we can see that u941 tends to choose movies of the Action genre rather than movies selected by similar users.

Fig. 3. Paths connecting the user u941 and the movie Heat.
VI. CONCLUSION

KG-based recommender systems have been shown to achieve fairly good performance compared with other kinds of recommender systems, thanks to KGs containing rich semantic information about items. Because of this, the quality of the KG plays an important role in the recommendation accuracy of KG-based recommender systems. To address the issues of incompleteness and sparseness in the KG, we enrich the KG with clusters of entities and the corresponding relations. To clearly distinguish the importance of the different paths, which reflect the corresponding factors that impact users' decisions, we devise a weight mechanism to aggregate those paths, and the weights themselves can provide better explanations. Based on the above goals, we propose a recommendation model named CEKGR. Experiments on three datasets have been conducted, and the results show the superiority of our method over other state-of-the-art methods. Furthermore, the case study has demonstrated the interpretability of the proposed method.
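The KG-enrichment step summarized above, clustering similar entities and linking each entity to its cluster, can be sketched with plain k-means over entity embeddings. This is a hedged illustration, not the paper's exact design: the embedding values, the cluster count, and the relation name `belong_to` are assumptions made for the example.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means over entity embeddings; returns a cluster id per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each entity to its nearest cluster center (squared distance).
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign

def enrich_kg(triples, entity_emb, k):
    """Add one community entity per cluster and a (hypothetical) `belong_to`
    triple linking each entity to its community, e.g. ("Heat", "belong_to", "s0")."""
    entities = list(entity_emb)
    assign = kmeans([entity_emb[e] for e in entities], k)
    enriched = list(triples)
    for e, c in zip(entities, assign):
        enriched.append((e, "belong_to", f"s{c}"))
    return enriched

# Toy 2-D embeddings for four movie entities (assumed values).
entity_emb = {
    "Heat": [1.0, 0.1], "Face/Off": [0.9, 0.0],
    "Titanic": [0.1, 1.0], "Ghost": [0.0, 0.9],
}
triples = [("u941", "watched", "Heat")]
enriched = enrich_kg(triples, entity_emb, k=2)
```

Each generated community becomes a new entity in the KG, so additional paths through community nodes (like the s100 path in the case study) become available between user–item pairs.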
Jian-Huang Lai (Senior Member, IEEE) received the M.Sc. degree in applied mathematics and the Ph.D. degree in mathematics from Sun Yat-sen University, Guangzhou, China, in 1989 and 1999, respectively.

In 1989, he joined Sun Yat-sen University as an Assistant Professor, where he is currently a Professor with the School of Computer Science and Engineering. His current research interests include the areas of digital image processing, pattern recognition, multimedia communication, wavelets, and their applications. He has published more than 200 scientific papers in international journals and conferences on image processing and pattern recognition, such as IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART B, Pattern Recognition, International Conference on Computer Vision (ICCV), CVPR, IJCAI, ICDM, and SDM.

Prof. Lai serves as a Standing Member of the Image and Graphics Association of China and as the Standing Director of the Image and Graphics Association of Guangdong.

Yong Tang received the B.S. and M.Sc. degrees from Wuhan University, Wuhan, China, in 1985 and 1990, respectively, and the Ph.D. degree from the University of Science and Technology of China, Hefei, China, in 2001, all in computer science.

He is the founder of SCHOLAT, a kind of scholar social network. He is currently a Professor and the Dean of the School of Computer Science, South China Normal University, Guangzhou, China. Before joining South China Normal University in 2009, he was the Vice Dean of the School of Information Science and Technology, Sun Yat-sen University, Guangzhou, China. He has published more than 200 papers and books. He has supervised more than 40 Ph.D. students since 2003 and more than 100 master's students since 1996. His research interests include the areas of data and knowledge engineering, social networking, and collaborative computing.

Dr. Tang currently serves as the Director of the Technical Committee on Collaborative Computing at the China Computer Federation (CCF) and the Executive Vice President of the Guangdong Computer Academy. For more information, please visit https://scholat.com/ytang.