HeroGRAPH: A Heterogeneous Graph Framework for Multi-Target Cross-Domain Recommendation
ABSTRACT
Cross-Domain Recommendation (CDR) is an important task in recommender systems. Information can be transferred from other domains to a target domain to boost its performance and relieve the sparsity issue. Most previous work is single-target CDR (STCDR), and some researchers have recently proposed to study dual-target CDR (DTCDR). However, there are several limitations. These works tend to capture pair-wise relations between domains, so they would need to learn many more relations if they were extended to multi-target CDR (MTCDR). Besides, previous CDR works prefer to relieve the sparsity issue through extra information or overlapping users, which requires many pre-operations, such as feature engineering and finding common users. In this work, we propose a heterogeneous graph framework for MTCDR (HeroGRAPH). First, we construct a shared graph by collecting users and items from multiple domains. This obtains cross-domain information for each domain by modeling the graph only once, without any pairwise relation modeling. Second, we relieve the sparsity by aggregating neighbors from multiple domains for each user or item. Then, we devise a recurrent attention to model the heterogeneous neighbors of each node. This recurrent structure helps iteratively refine the process of selecting important neighbors. Experiments on real-world datasets show that HeroGRAPH can effectively transfer information between domains and alleviate the sparsity issue.

CCS CONCEPTS
• Information systems → Collaborative filtering; Recommender systems.

KEYWORDS
heterogeneous, graph, multi-target, cross-domain

Reference Format:
Qiang Cui, Tao Wei, Yafeng Zhang, and Qing Zhang. 2020. HeroGRAPH: A Heterogeneous Graph Framework for Multi-Target Cross-Domain Recommendation. In 3rd Workshop on Online Recommender Systems and User Modeling (ORSUM 2020), in conjunction with the 14th ACM Conference on Recommender Systems, September 25th, 2020, Virtual Event, Brazil.

ORSUM@ACM RecSys 2020, September 25th, 2020, Virtual Event, Brazil
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
Collaborative Filtering (CF) has become an effective and efficient technique for recommender systems [13]. However, CF methods often face the sparsity issue, as real-world datasets usually have a long tail of users and items with few feedbacks. With the development of CF, Cross-Domain Recommendation (CDR) has been proven to be a promising method to alleviate the sparsity: it can transfer rich information from one domain to another to boost performance.

According to the task, previous CDR methods can be roughly divided into two categories, i.e., single-target CDR (STCDR) and dual-target CDR (DTCDR). Most CDR methods belong to the former, which transfers information from a source domain to a target domain and not vice versa. These methods can be based on either the feedbacks [8, 19] or rich side information [2, 14] to relieve the sparsity. DTCDR has been studied more recently: information from the source domain and the target domain is mutually utilized to improve the performance of both. There are usually two approaches to dual-target modeling. The first is mostly founded on common users [17, 20], as they can clearly store information from multiple domains. The second utilizes a mapping function [7, 9] that performs as a bridge between domains.

Technically, previous works are good at STCDR and DTCDR, but few people study multi-target CDR (MTCDR). MTCDR is a generalization of DTCDR: given at least three domains along with their features and feedbacks, the goal is to boost the performance of all domains. It is a more challenging but more general task in real systems. Previous successful DTCDR methods [7, 20] would have some problems if they were extended to MTCDR. First, DTCDR generally models the pairwise relations between domains; if such methods directly handle $n$ domains, there will be at least $C_n^2$ relations. Second, most previous works transfer information through users. This is an indirect way to incorporate cross-domain information, because user behaviors in multiple domains are still processed within each domain. Instead, we can collect all behaviors into a shared structure such as a graph. Such a structure can directly model within-domain and cross-domain behaviors together, because it can acquire feedbacks from all domains as neighbors for a user or an item.

In this work, we propose a Heterogeneous GRAPH framework for MTCDR (HeroGRAPH). First, we collect the ID information of users and items from multiple domains and build a shared graph.
Nodes include users and items; if a user purchases an item, there is an edge between them in the graph. Then we use the information within each domain to conduct within-domain modeling, and use the shared graph to handle cross-domain information. Besides, we propose a recurrent attention to aggregate neighbors from multiple domains. Last, we combine the within-domain embedding and the cross-domain embedding to compute user preference and train the model. The main contributions are listed as follows:

3 METHODOLOGY
In this section, we propose a heterogeneous graph framework for multi-target cross-domain recommendation (HeroGRAPH); its diagram is in Fig. 1. We first formulate the problem. Next, we collect the feedbacks of each domain and obtain a within-domain embedding for each user and item. Then we gather all feedbacks to build a shared graph and acquire cross-domain embeddings. Finally, we compute user preference and apply Bayesian Personalized Ranking (BPR) to train the model.
Figure 1: Diagram of the HeroGRAPH model. Black and red arrows between different layers represent the within-domain
modeling and cross-domain modeling, respectively. Our model gathers information from multiple domains to construct a
heterogeneous graph to transfer knowledge and boost performance of each domain.
Figure 2: Illustration of the heterogeneous graph. Users may have feedbacks in different domains, and we collect all users and
items into one graph as a shared structure. This graph is a bridge among domains. Please note that these domains are limited
to one platform, such as Facebook or Amazon.
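As a rough sketch of how such a shared graph could be assembled (the domain names and feedback logs below are made up for illustration; the paper builds the graph from the ID information of users and items in the same spirit):

```python
from collections import defaultdict

def build_shared_graph(feedbacks_by_domain):
    """Collect (user, item) feedbacks from every domain into one
    heterogeneous graph. Nodes are users and domain-tagged items;
    an edge means the user gave feedback on the item."""
    neighbors = defaultdict(set)
    for domain, pairs in feedbacks_by_domain.items():
        for user, item in pairs:
            item_node = (domain, item)   # tag items by domain to keep them distinct
            neighbors[user].add(item_node)
            neighbors[item_node].add(user)
    return neighbors

# Hypothetical logs: user u1 appears in two domains, so the graph
# exposes cross-domain neighbors without any pairwise domain mapping.
graph = build_shared_graph({
    "Clothing": [("u1", "shirt"), ("u2", "shirt")],
    "Beauty": [("u1", "cream")],
})
print(sorted(graph["u1"]))  # [('Beauty', 'cream'), ('Clothing', 'shirt')]
```

Because every domain writes into the same adjacency structure, a node's neighborhood mixes within-domain and cross-domain feedbacks, which is exactly what the aggregation step later exploits.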
(Figure 3 legend: q is the node vector, K the neighbor's vector, and V the embedded neighbor representation; iteration k yields the output o_{V,Att-k} as a weighted average of V.)
Figure 3: Diagram of the recurrent attention. This attention acts as an aggregator of the neighbors of a node. Attention can summarize multiple factors, and our work develops a recurrent version to gradually refine this process. The recurrent operation is conducted on the node vector q and the neighbor's vector K. During each iteration, we compute the output of the neighbor aggregation from the attention weight a and the embedded neighbor representation V.
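To make the iteration concrete, here is a minimal NumPy sketch of this recurrent aggregation. The additive scoring parameters (W_q, W_k, v) and the linear map R are random stand-ins for the learned weights, so this illustrates only the data flow, not the trained model:

```python
import numpy as np

def attention_weights(q, K, W_q, W_k, v):
    """Bahdanau-style additive scores between node vector q and each
    neighbor vector (row of K), normalized with a softmax."""
    scores = np.tanh(q @ W_q + K @ W_k) @ v      # one score per neighbor
    e = np.exp(scores - scores.max())            # numerically stable softmax
    return e / e.sum()

def recurrent_attention(q, K, V, W_q, W_k, v, R, iterations=2):
    """Each iteration computes a, emits o_V = a-weighted sum of V,
    then refines q via o_K = a-weighted sum of K and q_new = R(q + o_K)."""
    outputs = []
    for _ in range(iterations):
        a = attention_weights(q, K, W_q, W_k, v)
        outputs.append(a @ V)        # o_{V,Att-k}: weighted average of V
        o_K = a @ K                  # aggregate the neighbor vectors
        q = R @ (q + o_K)            # short-cut connection + linear map R
    return outputs                   # [o_{V,Att-1}, o_{V,Att-2}, ...]

rng = np.random.default_rng(0)
d, n = 4, 3                          # embedding size, sampled neighbors
q, v = rng.normal(size=d), rng.normal(size=d)
K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d))
W_q = rng.normal(size=(d, d))
W_k = rng.normal(size=(d, d))
R = rng.normal(size=(d, d))
o_v_list = recurrent_attention(q, K, V, W_q, W_k, v, R, iterations=2)
```

Each element of `o_v_list` corresponds to one of the Att-1/Att-2 variants evaluated later; dropout on q and K is omitted here for brevity.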
3.4 Recurrent Attention for Neighbor Aggregation
In this subsection, we propose a recurrent attention to aggregate the neighbors of each node by automatically detecting the importance of each neighbor. The recurrent attention is illustrated in Fig. 3.

Here is the detail of our recurrent attention. First of all, the symbols $q$, $K$, $V$ have the same meanings as explained in subsection 3.3. Then, the attention weight $a$ is calculated between $q$ and $K$ by Bahdanau attention [1], and we obtain the aggregated output $o_V$ by

$$o_V = V \cdot a \quad (3)$$

As the neighbors come from multiple domains, we expect to gradually refine the process of obtaining the attention weight $a$. In order to do this, we aggregate $K$ and update $q$ by

$$o_K = K \cdot a, \qquad q_{\text{new}} = R \cdot (q + o_K) \quad (4)$$

where $R$ is a linear mapping and $q + o_K$ is a short-cut connection. Next, $q_{\text{new}}$ acts as the new $q$ to recalculate $a$.

Obviously, we can obtain multiple aggregated neighbor representations $o_{V,\text{Att-1}}$, $o_{V,\text{Att-2}}$ and so on, where the subscripts Att-1 and Att-2 mean that the recurrent attention is conducted once and twice, respectively. In addition, we add a dropout layer to $q$ and $K$ when we first obtain them to avoid overfitting.

3.5 Training Framework
In this subsection, we obtain the user preference and train the model. The equations are introduced based on domain A.

The positive user preference is calculated based on matrix factorization:

$$\hat{x}^t_{u i_A} = E^t_{u_A} \cdot E^t_{i_A} + G^t_{u_A} \cdot G^t_{i_A} \quad (5)$$

where the superscript $t$ represents a sample $(u_A, i_A)$ with a certain timestamp. Then we apply the widely used pair-wise Bayesian Personalized Ranking (BPR) [11] to train the model:

$$l^t_{u i j_A} = -\ln \sigma\left(\hat{x}^t_{u i_A} - \hat{x}^t_{u j_A}\right) \quad (6)$$

where $\hat{x}^t_{u j_A} = E^t_{u_A} \cdot E^t_{j_A} + G^t_{u_A} \cdot G^t_{j_A}$ is the negative preference based on the negative feedback pair $(u_A, j_A)$. Finally, the loss function for domain A is

$$\Theta^*_A = \operatorname*{arg\,min}_{\Theta} \sum_{u} \sum_{t=1}^{|u|} l^t_{u i j_A} + \frac{\lambda_\Theta}{2} \|\Theta\|^2 \quad (7)$$

where $|u|$ represents the number of samples of user $u$. The total loss is $\Theta^* = \Theta^*_A + \Theta^*_B + \Theta^*_C + \dots$, and the parameters are updated by Adam with default values [5].

4 EXPERIMENTS
In this section, we conduct experiments and analyze the sparsity issue and the proposed recurrent attention.

4.1 Experimental Settings
Datasets. The experiments are conducted on the Amazon 5-core dataset [10]. We choose six domains and divide them into two tasks, each with three domains. The statistics of each domain are listed in Table 1. Please note that the number of feedbacks is equal to the number of reviews listed on the website¹.

Evaluation Protocols. All datasets are divided into training, validation and test sets by time. Specifically, the time range of the validation set is between 1-Mar.-2014 and 30-Apr.-2014. The ratio of the amount of feedbacks in the three sets is approximately 8:1:1. The performance is evaluated on the test set by AUC.

¹ http://jmcauley.ucsd.edu/data/amazon
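A compact sketch of the scoring and loss in Eqs. (5)–(6) of Section 3.5; the variable names are ours, with E and G standing for a user's or item's within-domain and cross-domain embeddings:

```python
import numpy as np

def preference(E_u, E_i, G_u, G_i):
    """Eq. (5): dot-product preference combining within-domain (E)
    and cross-domain (G) embeddings."""
    return E_u @ E_i + G_u @ G_i

def bpr_loss(E_u, G_u, E_pos, G_pos, E_neg, G_neg):
    """Eq. (6): -ln sigma(x_pos - x_neg) for one (user, pos, neg) triple,
    computed as log(1 + exp(-(x_pos - x_neg))) for numerical stability."""
    diff = preference(E_u, E_pos, G_u, G_pos) - preference(E_u, E_neg, G_u, G_neg)
    return np.logaddexp(0.0, -diff)

# Toy 2-d embeddings: the positive item aligns with the user, so the
# loss falls below ln(2), the value at a zero preference gap.
E_u, G_u = np.array([1.0, 0.0]), np.array([0.5, 0.5])
E_pos, G_pos = np.array([1.0, 0.0]), np.array([0.5, 0.5])
E_neg, G_neg = np.array([0.0, 1.0]), np.array([0.0, 0.0])
loss = bpr_loss(E_u, G_u, E_pos, G_pos, E_neg, G_neg)
```

The per-domain summation and the L2 term of Eq. (7) would simply accumulate this loss over sampled triples and add the regularizer.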
Baselines. Our model is compared with several baselines. (1) BPR [11]: We choose BPR-MF to model implicit feedback; this is a popular and powerful single-domain method. (2) DDTCDR [7]: This is a state-of-the-art DTCDR method, and we extend it to handle three domains. (3) GraphSAGE-pool [3]: It is a popular graph method, and we select its max-pooling variant. Besides, as our proposed network has recurrent attention, we generate three variants: HeroGRAPH-Att-1, HeroGRAPH-Att-2 and HeroGRAPH-Att-3.

In this paper, we aim to explore a novel structure for MTCDR. Therefore, we choose the widely used matrix factorization to compute dot-product similarity for all methods rather than a learned similarity such as NeuMF [4], as the latter still needs to be carefully studied [12].

Parameter Settings. Our method is implemented in Tensorflow 2.2² and the hyper-parameters are chosen based on the validation set. The embedding size for each ID is 8. The regularization parameter λ_Θ is 0.01. As for the graph modeling, we apply the uniform sampling used in GraphSAGE [3] to choose neighbors, and the sample sizes for first-order and second-order neighbors are 10 and 5 respectively. Correspondingly, the output embedding sizes of the first-layer and second-layer aggregation are 64 and 16 respectively. These parameters are used across all tasks in our work.

4.2 Hyperparameter Optimization
Dropout is a powerful technique to prevent overfitting. We introduce dropout layers in subsection 3.4 and take our HeroGRAPH-Att-2 as an example to study different dropout rates. The performance on the validation set is illustrated in Fig. 4, and we choose the best dropout rate of 0.2 for all tasks in our work.

The best rates for different domains may vary. On task 1, they are 0.4, 0.2 and 0.2 for Music, Instrument and Video respectively. On task 2, Clothing, Beauty and Health have best rates of 0.1, 0.4 and 0.3 respectively. Although the best rates vary among domains, we choose the best rate from a global perspective rather than selecting domain-specific rates. This strategy affects performance but reduces the number of parameters.

4.3 Performance Comparison
The overall performance is listed in Table 2. From an overall point of view, our HeroGRAPH gains the best performance. It achieves significant improvement on task 1, while the improvement on task 2 is not big.

On task 1, HeroGRAPH performs well on all three domains. On task 2, we find that some methods are competitive; in particular, BPR gains the best performance on domain Beauty. There are several reasons for this phenomenon. First, the three domains of task 2 have much more data than those of task 1. The larger the amount of data, the easier it is to learn a good representation. In this case, the single-domain method can achieve good results and will not be affected by data from other domains. Therefore, it is difficult to improve on such domains. Second, we treat multiple domains as a whole and use the globally optimal hyperparameters. This prevents our model from achieving optimal performance on each individual domain. Even in this unfavorable situation, our model still obtains good results on domains Clothing and Health, which shows its effectiveness.

4.4 Analysis of Sparsity Issue
In this subsection, we analyze the sparsity issue. First we choose a sparse set from the whole test set. We calculate the number of occurrences of each item in the test set, and the items with no more than 5 feedbacks make up the sparse set. We count the total numbers of feedbacks in the sparse set and the test set and divide them to obtain a proportion. The higher the proportion, the more serious the sparsity issue. Statistics and experimental results are listed in Table 3.

On tasks 1 and 2, our model can achieve great improvement if a domain has a high proportion of sparse items. If there are fewer sparse items, such as in domains Video, Beauty and Health, our HeroGRAPH is not good enough because of the globally optimal hyperparameters. This means that by modeling neighbors for users and items, our model can help relieve the sparsity issue.

4.5 Analysis of Recurrent Attention
The analysis of recurrent attention is based on Tables 2 and 3, as our variants are listed in the last three lines of each table. Generally speaking, Att-2 is better than Att-1 and Att-3, and it is also better than GraphSAGE. This means that attention may be more useful, and recurrent attention can reduce the variance of the data. On the other hand, Att-2 performs comparably with Att-1 on task 2, and nearly all values of Att-3 are smaller than those of Att-2. We can conclude that with too many iterations our model may overfit. Therefore, we do not perform more iterations.

5 CONCLUSION
In this work, we propose a heterogeneous graph framework for multi-target cross-domain recommendation (HeroGRAPH). This is a challenging but promising task. We first propose to use a shared structure, such as a graph, to model the information from all the domains. Then we propose a recurrent attention to gradually refine the process of neighbor aggregation and relieve the sparsity issue. Experiments show the effectiveness of our model.

² https://github.com/cuiqiang1990/HeroGRAPH
(Figure 4: validation AUC (%) of HeroGRAPH-Att-2 under dropout rates 0.0–0.4, with one panel per task: Music/Instrument/Video and Clothing/Beauty/Health.)
Table 3: Performance comparison on the sparse set. Items in the sparse set appear no more than 5 times in the test set.
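The sparse-set construction described in Section 4.4 amounts to the following (the item log is made up for illustration, not the paper's data):

```python
from collections import Counter

def sparse_proportion(test_items, max_count=5):
    """Items with no more than `max_count` feedbacks in the test set form
    the sparse set; return its share of all test feedbacks."""
    counts = Counter(test_items)
    sparse_feedbacks = sum(c for c in counts.values() if c <= max_count)
    return sparse_feedbacks / len(test_items)

# Hypothetical test log: item "a" has 6 feedbacks (not sparse),
# "b" has 2 and "c" has 1 (both sparse) -> proportion 3/9.
items = ["a"] * 6 + ["b"] * 2 + ["c"]
proportion = sparse_proportion(items)
```

A higher returned proportion means more of the test feedbacks fall on long-tail items, i.e., a more serious sparsity issue.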
[7] Pan Li and Alexander Tuzhilin. 2020. DDTCDR: Deep Dual Transfer Cross Domain Recommendation. In WSDM. 331–339.
[8] Zhiwei Liu, Lei Zheng, Zhang Jiawei, Jiayu Han, and Philip Yu. 2019. JSCN: Joint Spectral Convolutional Network for Cross Domain Recommendation. 850–859. https://doi.org/10.1109/BigData47090.2019.9006266
[9] Tong Man, Huawei Shen, Xiaolong Jin, and Xueqi Cheng. 2017. Cross-Domain Recommendation: An Embedding and Mapping Approach. In IJCAI. 2464–2470.
[10] Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton Van Den Hengel. 2015. Image-based recommendations on styles and substitutes. In ACM SIGIR. 43–52.
[11] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In UAI. 452–461.
[12] Steffen Rendle, Walid Krichene, Li Zhang, and John Anderson. 2020. Neural Collaborative Filtering vs. Matrix Factorization Revisited. arXiv preprint arXiv:2005.09683 (2020).
[13] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In WWW. 285–295.
[14] Shulong Tan, Jiajun Bu, Xuzhen Qin, Chun Chen, and Deng Cai. 2014. Cross domain recommendation based on multi-type media fusion. Neurocomputing 127 (2014), 124–134.
[15] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. ICLR (2018).
[16] Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S Yu. 2019. Heterogeneous graph attention network. In WWW. 2022–2032.
[17] Xinghua Wang, Zhaohui Peng, Senzhang Wang, S Yu Philip, Wenjing Fu, and Xiaoguang Hong. 2018. Cross-domain recommendation for cold-start users via neighborhood based feature mapping. In International Conference on Database Systems for Advanced Applications. Springer, 158–165.
[18] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph convolutional neural networks for web-scale recommender systems. In ACM SIGKDD. 974–983.
[19] Feng Yuan, Lina Yao, and Boualem Benatallah. 2019. DARec: deep domain adaptation for cross-domain recommendation via transferring rating patterns. In IJCAI.
[20] Feng Zhu, Chaochao Chen, Yan Wang, Guanfeng Liu, and Xiaolin Zheng. 2019. DTCDR: A Framework for Dual-Target Cross-Domain Recommendation. In CIKM. 1533–1542.