
Context-aware Session-based Recommendation with Graph Neural Networks

1st Zhihui Zhang, 2nd Jianxiang Yu, 3rd Xiang Li*
School of Data Science and Engineering, East China Normal University, Shanghai, China
[email protected], [email protected], [email protected]

arXiv:2310.09593v1 [cs.IR] 14 Oct 2023

This work is supported by Shanghai Pujiang Talent Program No. 21PJ1402900, Shanghai Science and Technology Committee General Program No. 22ZR1419900 and National Natural Science Foundation of China No. 62202172.
*Corresponding author

Abstract—Session-based recommendation (SBR) is the task of predicting the next item based on anonymous sequences of user behaviors in a session. While there are methods that leverage rich context information in sessions for SBR, most of them have the following limitations: 1) they fail to distinguish item-item edge types when constructing the global graph for exploiting cross-session contexts; 2) they learn a fixed embedding vector for each item, which lacks the flexibility to reflect the variation of user interests across sessions; 3) they generally use the one-hot encoded vector of the target item as the hard label to predict, thus failing to capture the true user preference. To solve these issues, we propose CARES, a novel context-aware session-based recommendation model with graph neural networks, which utilizes different types of contexts in sessions to capture user interests. Specifically, we first construct a multi-relation cross-session graph to connect items according to intra- and cross-session item-level contexts. Further, to encode the variation of user interests, we design personalized item representations. Finally, we employ a label collaboration strategy to generate soft user preference distributions as labels. Experiments on three benchmark datasets demonstrate that CARES consistently outperforms state-of-the-art models in terms of P@20 and MRR@20. Our data and code are publicly available at https://github.com/brilliantZhang/CARES.

Index Terms—Session-based recommendation, Graph neural networks, Collaborative learning

I. INTRODUCTION

Recommendation systems play a crucial role in various fields because they provide users with personalized information for completing a task in the midst of a large amount of information. At present, many recommendation models have achieved great success, but most of them rely on user profiles. However, as the number of users on a platform grows and privacy awareness increases, user profiles may not be available in certain applications. Without user profiles or long-term historical user behaviors, it is hard to accurately model users. Consequently, session-based recommendation (SBR) has recently attracted increasing attention. Here, a session is a sequence of interactive behaviors (e.g., clicks in e-commerce scenarios) generated within a short period of time, and SBR aims to predict the next item based on an anonymous short-term behavior sequence.

To address the SBR problem, some existing methods [9]–[11], [19] utilize the rich context information in sessions, which generally includes both intra-session and cross-session contexts. The former can be further divided into item-level context, which characterizes the neighboring items of an item in the behavior sequence, and session-level context, which refers to the complete sequence information in a session. Similarly, the latter includes collaborative information from sessions with similar behavioral patterns, at both the item and the session level. Details on the division of contexts are given in Figure 1. Early studies [4], [5], [7] for SBR employ intra-session contexts only, and their performance can be adversely affected when the behavior sequence in a session is very sparse. More recently, methods [9]–[11], [19] that leverage both intra-session and cross-session contexts aim to incorporate contextual information from relevant sessions to enrich the representation of a given session. In particular, some methods [11], [19] propose to construct a global graph that links items from various sessions according to the intra- and cross-session item-level context information, and then learn item/session embeddings based on the graph. Despite their success, most of these methods suffer from three major limitations. First, when constructing the global graph, they fail to distinguish item-item edge types. Since it has been verified in [29], [30] that integrating item attributes can improve recommendation performance, the categorical attributes of items (e.g., "shirts" and "pants" belong to the apparel category) can be used to distinguish item relations. For example, if products in two categories are frequently interacted with by users, the relation between the two item categories is of more importance. Second, they learn a fixed embedding vector for each item. However, since user interest can vary across sessions, the embedding of an item should be learned to reflect the variation of user interests and be personalized w.r.t. different sessions. Third, they generally use the one-hot encoded vector of the target item as the hard label to be predicted, which may not reflect the true user preference. The true distribution of user preferences is usually unknown, as only a limited number of items are exposed to users, and simply regarding the one-hot encoded vector of the target item as the true distribution can induce bias and lead to overfitting [15].
Figure 1: Contexts referred to in this paper. Session 1: v1, v2, v3, v4; Session 2: v5, v6, v7, v2, v8. The figure illustrates the four context types: intra-session item-level context, internal-session-level context, cross-session item-level context, and external-session-level context.
In this paper, to address these problems, we propose a novel context-aware session-based recommendation model, CARES, which leverages the four types of contexts introduced above. Specifically, we first construct a multi-relation cross-session graph to connect items according to intra- and cross-session item-level contexts, where edge relations are defined based on item categories. Based on the graph, we learn general item embeddings with graph neural networks (GNNs). Further, to encode the variation of user interests, we also learn personalized item representations w.r.t. sessions with a gating mechanism. After that, we unify item embeddings with item positions and session length to learn session representations. Finally, to alleviate the bias induced by using the one-hot encoded vector of the target item as a hard label, we employ internal- and external-session-level contexts and present a label collaboration strategy, which uses the historical sessions most similar to the current session for collaborative filtering and generates soft labels of user preferences to be predicted. We summarize our main contributions as follows:
• We propose a novel context-aware session-based recommendation model, CARES.
• We design personalized item embeddings w.r.t. sessions to capture the variation of user interests across sessions.
• We propose a simple and effective label collaboration method that generates soft user preference distributions as labels.
• We conduct extensive experiments on three public benchmark datasets to show the superiority of our method over other state-of-the-art models.
II. RELATED WORK

A. Session-based Recommendation

Early studies [3] on SBR use the similarity between the last item of the session and candidate items to make recommendations, but they omit the sequential information in the session. While Markov-chain-based methods [4] can bridge this gap, their number of states and computational complexity grow exponentially with the problem scale. More recently, owing to the powerful representation capability of deep learning, many deep-learning-based methods [5]–[7] have been successfully applied to SBR. In particular, some approaches [5], [7] exploit Recurrent Neural Networks (RNNs) to characterize the sequential information of items in the session; however, these RNN-based methods are incapable of capturing long-term item dependencies.

Graph Neural Networks (GNNs) [8], [12], [16] have attracted more and more attention due to their powerful ability to learn representations of graph-structured data. To explore the complex transition relations between items in a session, GNN-based SBR methods convert sessions into graphs and apply GNNs to the session graph. For instance, SR-GNN [8] first converts the session into a graph and utilizes a Gated GNN [23] to explore the complex transition relations between items in the session. GC-SAN [13] further extends SR-GNN by adding a self-attention mechanism. However, converting sessions into graphs can introduce noise and lose the sequential order information, so several GNN-based methods aim to alleviate these problems. LESSR [14] improves the way session graphs are constructed by taking the relative order of nodes in sessions into account. SGNN-HN [12] alleviates the long-range dependency problem by introducing a Star GNN, which improves the information propagation mechanism between items. All these methods focus only on the internal information of a session.
B. Cross-session Learning in SBR

Making recommendations from the current session alone is constrained by its limited information. To incorporate collaborative information from external sessions, several collaborative filtering-based SBR methods enhance the current session representation. For example, CSRM [9] incorporates the relevant information contained in neighborhood sessions by adopting a memory module to obtain more accurate session representations. CoSAN [10] utilizes a multi-head attention mechanism to fuse item representations in collaborative sessions and build dynamic item representations. GCE-GNN [11] simultaneously constructs local session graphs and a global graph, then extracts information related to the current session from the global graph. MTD [32] constructs a global graph connecting adjacent items in each session and utilizes graphical mutual information maximization to capture global item-wise transition information that enhances the current session's representation. S2-DHCN [38] utilizes hypergraph convolutional networks to capture high-order item relations and constructs two types of hypergraphs to learn inter- and intra-session information. The view augmentation in COTREC [37] enables the model to capture beyond-pairwise relations among items across sessions.

C. Multi-relation Learning in SBR

Heterogeneous graphs have proven effective at handling information by modeling complex high-order dependencies among heterogeneous information, and they can extract user interests more accurately through global item relations across sessions [16], [34], [35]. AutoGSR [16] uses Network Architecture Search (NAS) techniques to automatically search for better GNN architectures that capture local-context relations and various item-transition semantics. MGIR [35] utilizes incompatible and co-occurrence item relations to generate enhanced session representations, while CoHHN [34] proposes a heterogeneous hypergraph network to model price preferences. These works either build multiple relation graphs or use hypergraphs to model artificial features or side information as auxiliary signals for modeling user actions. However, constructing sessions into multiple relation graphs is cumbersome. We therefore propose a new approach that models the associations between items' categories in SBR on a single heterogeneous graph.
III. PRELIMINARIES

In this section, we introduce the problem statement of SBR and the definition of item-side information.

A. Problem Statement

We formally formulate the task of session-based recommendation (SBR). Let V = {v_1, v_2, ..., v_m} be the set of all items, where m is the number of items. Assume that all sessions are denoted as U = {S_1, S_2, ..., S_n}, where n is the number of sessions. Each anonymous session S_τ ∈ U, denoted by S_τ = {v_1^τ, v_2^τ, ..., v_t^τ}, consists of a sequence of interactions in chronological order, where v_t^τ denotes the item that the user interacted with at the t-th timestamp in session S_τ, and the length of S_τ is t. The goal of SBR is to recommend the next item from V that the user is most likely to interact with given the current session S_τ. We call the item interacted with at the (t+1)-th timestamp the target item or ground-truth item of the session, i.e., ([v_1, v_2, ..., v_t], v_{t+1}) is a session and its target item pair.

B. Item-side Information

Item-side information describes the item itself and can provide extra complementary information for recommendation. For each item v_i, we use its category c_{v_i} as the item-side information to assist in learning user preferences. Let C = {c_1, c_2, ..., c_l} be the set of all item categories, where l is the number of categories. Each item category is encoded into a unified embedding space, i.e., h_{c_i} ∈ R^d.
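To make the notation concrete, the short Python sketch below (the function name is ours and purely illustrative) turns a chronological session into the ([v_1, ..., v_t], v_{t+1}) pairs defined above; the same prefix splitting doubles as the temporal-window-shifting augmentation described later in Section V.

```python
from typing import List, Tuple

def make_training_pairs(session: List[int]) -> List[Tuple[List[int], int]]:
    """Split a chronological session [v1, ..., vn] into
    ([v1, ..., vt], v_{t+1}) pairs for t = 1, ..., n-1."""
    return [(session[:t], session[t]) for t in range(1, len(session))]

# A session [v1, v2, v3, v4] yields three (prefix, target) pairs:
# [([1], 2), ([1, 2], 3), ([1, 2, 3], 4)]
print(make_training_pairs([1, 2, 3, 4]))
```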
IV. THE PROPOSED METHOD

This section elaborates on our proposed Context-aware Graph Neural Networks for Session-based Recommendation (CARES). We first give an overview of CARES, which is illustrated in Figure 2, and then describe each component in detail.
Figure 2: The overview of CARES. (An RGAT layer encodes items over the multi-relation cross-session graph, a session encoder layer produces the session representation, and the prediction layer combines item scores with top-K retrieval of target items from a recent session store.)

A. Multi-relation Cross-session Graph

Most early graph-based methods [8], [12]–[14], [16] model the item transition patterns of a single session into a graph and ignore the global-level relations between items in different sessions. We therefore propose to build connections between different sessions to further exploit global-level item transition relations. Specifically, we follow [11] and build these edge connections based on ε-neighbor sets of items in all sessions, which are formally defined as follows.

Definition 1: ε-Neighbor set [11]. Given a set of sessions U, for an item v_i^x in session S_x, its ε-neighbor set is the set of items

\[ N_\varepsilon(v_i^x) = \big\{\, v_j^y \mid j \in [k-\varepsilon,\, k+\varepsilon],\ \forall v_k^y = v_i^x,\ v_k^y \in S_y,\ S_y \in U \,\big\}, \]

where i, j, k are positions of items in the corresponding sessions. Further, ε is used to control the neighboring range of item transitions.

Based on the ε-neighbor set, items from different sessions can be linked. Some existing works [11], [15] treat all item transitions as one type of relation, while we distinguish item transitions by taking the categorical attribute of items into consideration. Intuitively, if a user successively clicks on items v_1 and v_2 whose categories differ, we cannot simply consider the two items related, because this could also indicate a drift of user interest. Conversely, if a user sequentially clicks on items v_1 and v_2 of the same category, it is more likely that the two items are highly related, because a user usually views a number of similar items before picking the one to buy. We therefore construct a multi-relation cross-session graph based on item context and category. Formally, it is defined as G = (V, E), where V denotes the node set containing all items and E = {(v_i, r_ij, v_j) | v_i ∈ V, v_j ∈ N_ε(v_i)} is the edge set. We use r_ij = (c_i, c_j) to denote the edge type, which captures the contextual relation between items of categories c_i and c_j. Further, similar to [31], we give the edge between v_i and v_j a weight

\[ e_{ij} = \frac{Freq(v_j \in N_\varepsilon(v_i))}{\big(\log(Freq(v_i)^{\alpha})+1\big)\big(\log(Freq(v_j)^{\alpha})+1\big)}, \]

where Freq(·) is the frequency counting function over all sessions. To alleviate the dominant effect of frequently occurring items, we introduce a hyper-parameter α, whose value is set to 0.75 in our experiments. To improve model efficiency, for each item v_i in the graph, we keep only the top-N neighbors with the largest weights in each relation. To further simplify the graph, we retain only the top-Q most frequent contextual relations; for all other edges, we uniformly set r_ij = Same when c_i = c_j and r_ij = Drift when c_i ≠ c_j. Figure 3 shows a toy example of converting sessions into a multi-relation cross-session graph.
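The construction is easy to prototype. The following Python sketch is our own minimal illustration under stated assumptions (function name and data layout are ours; the top-Q relation filtering and the Same/Drift fallback are omitted):

```python
import math
from collections import Counter, defaultdict

def build_cross_session_graph(sessions, category, eps=2, alpha=0.75, top_n=10):
    """Sketch of Sec. IV-A: typed, weighted edges from eps-neighbor
    co-occurrence. `sessions` is a list of item-id lists; `category`
    maps item id -> category id."""
    freq = Counter(v for s in sessions for v in s)            # Freq(v) over all sessions
    co = Counter()                                            # Freq(v_j in N_eps(v_i))
    for s in sessions:
        for k, vi in enumerate(s):
            for j in range(max(0, k - eps), min(len(s), k + eps + 1)):
                if j != k:
                    co[(vi, s[j])] += 1
    by_relation = defaultdict(lambda: defaultdict(list))      # rel -> vi -> [(w, vj)]
    for (vi, vj), n in co.items():
        w = n / ((math.log(freq[vi] ** alpha) + 1) *
                 (math.log(freq[vj] ** alpha) + 1))           # edge weight e_ij
        by_relation[(category[vi], category[vj])][vi].append((w, vj))
    edges = []                                                # (vi, rel, vj, w)
    for rel, nbrs in by_relation.items():
        for vi, lst in nbrs.items():
            for w, vj in sorted(lst, reverse=True)[:top_n]:   # keep top-N per relation
                edges.append((vi, rel, vj, w))
    return edges
```

Running this on the two toy sessions of Figure 3 with eps=2 reproduces the ε-neighbor sets shown there.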

B. Item Representation Learning

After the multi-relation cross-session graph is constructed, we next learn item representations. We first use an embedding look-up table to initialize an embedding h_i ∈ R^d for each item v_i. After that, we employ the attention mechanism [22] to generate the general embedding vector of each item based on GNNs. Then we use a gating mechanism to further learn a personalized embedding vector for each item w.r.t. a given session. The overall procedure is summarized in Algorithm 1.

Figure 3: Illustration of the construction of the cross-session graph. Here, we set ε = 2. (For S1: v5 v5 v1 v3 v4 v2 with categories (c2 c2 c1 c4 c3 c3) and S2: v8 v1 v5 v6 v7 with categories (c2 c1 c2 c1 c3), the ε-neighbor sets include N(v1) = {v5, v3, v4, v8, v6}, N(v2) = {v3, v4}, N(v3) = {v1, v5, v4, v2}, N(v4) = {v1, v2, v3}, N(v5) = {v1, v8, v3, v6, v7}, N(v6) = {v5, v1, v7}, N(v7) = {v5, v6} and N(v8) = {v5, v1}; (c2, c1) is among the top-Q relations, and the remaining edges are labeled Same or Drift.)

Algorithm 1 The procedure of item representation learning
Input: items' initial embeddings h; the global graph G
Output: personalized item representations h^{s,(k)}
 1: for each batch do
 2:     sample a subgraph Ĝ based on the sessions in the batch
 3:     h^{(0)} ← index from the item embedding table
 4:     for each session [v_1, v_2, ..., v_t] in the batch do
 5:         h̃^{(0)} = mean(h_1, h_2, ..., h_t)
 6:     end for
 7:     for k = 1 to L do
 8:         h^{(k)} = RGAT(h^{(k−1)}, Ĝ)
 9:         for each session in the batch do
10:             δ^k = Attention(h^{(k)}, h̃^{(k−1)})
11:             h^{s,(k)} = Gating(h^{(k)}, h̃^{(k−1)}, δ^k)
12:             // update the virtual node ṽ
13:             β^k = Attention(h^{s,(k)}, h̃^{(k−1)})
14:             h̃^{(k)} = WeightedSum(h^{s,(k)}, β^k)
15:         end for
16:     end for
17: end for
Learning General Item Representations. Based on the multi-relation cross-session graph, we can capture both intra-session and cross-session item-level context information. Since the ε-neighbors of an item have different importance, we introduce item-level attention to learn its representation. Note that each item can participate in various contextual relations, so we need to distinguish edge relations when computing attention scores. Specifically, in the k-th layer, the representation of item v_i is derived by neighborhood aggregation:

\[ h_i^{(k)} = \alpha_{ii} W_1 h_i^{(k-1)} + \sum_{v_j \in N(v_i)} \alpha_{ij} W_1 h_j^{(k-1)}. \tag{1} \]

Here, the attention score α_ij is computed by

\[ \alpha_{ij} = \frac{\exp\big(a^\top \sigma(W_1 [h_i^{(k-1)} \,\|\, h_j^{(k-1)} \,\|\, e_{ij} \,\|\, r_{ij}])\big)}{\sum_{v_k \in N(v_i) \cup \{v_i\}} \exp\big(a^\top \sigma(W_1 [h_i^{(k-1)} \,\|\, h_k^{(k-1)} \,\|\, e_{ik} \,\|\, r_{ik}])\big)}, \]

where σ is the LeakyReLU function, a and W_1 are trainable parameters, and ∥ denotes the concatenation operator. We also take the edge weight e_ij and the edge relation embedding r_ij ∈ R^d as edge features.
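To make the relation-aware aggregation concrete, below is a minimal PyTorch sketch of Eq. (1) and the attention score above. It is an illustrative re-implementation under our own assumptions, not the authors' released code (see the linked repository for that): the class name, the dense per-node softmax loop, and the separate aggregation weight are simplifications for readability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationAwareAttention(nn.Module):
    """Sketch of the relation-aware attention layer (Eq. 1)."""
    def __init__(self, d: int, num_relations: int):
        super().__init__()
        self.rel_emb = nn.Embedding(num_relations, d)   # r_ij in R^d
        self.W1 = nn.Linear(3 * d + 1, d, bias=False)   # W1 [h_i || h_j || e_ij || r_ij]
        self.a = nn.Linear(d, 1, bias=False)            # attention vector a
        self.W_agg = nn.Linear(d, d, bias=False)        # aggregation weight (assumed separate)

    def forward(self, h, edge_index, edge_weight, edge_type):
        # edge_index: (2, E) source/destination ids; we assume the caller
        # has added self-loops with weight 1 and a dedicated relation id.
        src, dst = edge_index
        z = torch.cat([h[dst], h[src], edge_weight.unsqueeze(-1),
                       self.rel_emb(edge_type)], dim=-1)
        logits = self.a(F.leaky_relu(self.W1(z))).squeeze(-1)
        alpha = torch.zeros_like(logits)
        for v in torch.unique(dst):                     # softmax per destination node
            mask = dst == v
            alpha[mask] = torch.softmax(logits[mask], dim=0)
        out = torch.zeros_like(h)
        out.index_add_(0, dst, alpha.unsqueeze(-1) * self.W_agg(h[src]))
        return out
```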
Learning Personalized Item Representations w.r.t. Sessions. Note that h_i^{(k)} in Eq. 1 leverages item-level context and reflects the general embedding vector of item v_i. Since an item is generally contained in various sessions, we can further enrich its representation w.r.t. a session. Given a session S and an item v_i ∈ S, S may contain many items that are not in N_ε(v_i), and all the items in S reflect the user interest in the current session. We therefore introduce an embedding vector h_i^s for item v_i that is personalized for the session S. Inspired by [28], we add a virtual node ṽ that is linked to all the items in the session S, and its embedding vector h̃ is used to capture the information of all the items in S. We then apply a gating mechanism to fuse h_i and h̃ into h_i^s:

\[ h_i^{s,(k)} = (1 - \delta_i)\, h_i^{(k)} + \delta_i\, \tilde{h}^{(k-1)}, \tag{2} \]

where the gating score δ_i is computed by

\[ \delta_i = \mathrm{Sigmoid}\left( \frac{\big(W_2 h_i^{(k)}\big)^\top \big(W_3 \tilde{h}^{(k-1)}\big)}{\sqrt{d}} \right), \tag{3} \]

where W_2, W_3 ∈ R^{d×d} are learnable parameters and √d is the scaling coefficient. In this way, we generate a personalized embedding vector h_i^s for item v_i w.r.t. the session S. When δ_i is small, h_i^s stays close to the general representation of item v_i; otherwise, h_i^s is more indicative of the information in the current session S. Finally, the embedding vector h̃^{(k)} of ṽ in the k-th layer is updated as

\[ \tilde{h}^{(k)} = \sum_{v_i \in S} \beta_i\, h_i^{s,(k)}, \tag{4} \]

where the weight β_i is calculated by

\[ \beta_i = \mathrm{Softmax}\left( \frac{\big(W_4 h_i^{s,(k)}\big)^\top \big(W_5 \tilde{h}^{(k-1)}\big)}{\sqrt{d}} \right). \tag{5} \]

Note that W_4, W_5 ∈ R^{d×d} are trainable parameters.
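Eqs. (2)-(5) amount to one small module per session. The following PyTorch sketch (names are ours; a minimal illustration, assuming one session at a time) implements the gate and the virtual-node update:

```python
import torch
import torch.nn as nn

class SessionGate(nn.Module):
    """Sketch of Eqs. (2)-(5): fuse general item embeddings with the
    virtual-node embedding via a scaled bilinear gate, then
    re-aggregate the virtual node from the personalized items."""
    def __init__(self, d: int):
        super().__init__()
        self.W2, self.W3 = nn.Linear(d, d, bias=False), nn.Linear(d, d, bias=False)
        self.W4, self.W5 = nn.Linear(d, d, bias=False), nn.Linear(d, d, bias=False)
        self.scale = d ** 0.5

    def forward(self, h_items, h_virtual):
        # h_items: (t, d) general embeddings of one session's items
        # h_virtual: (d,) virtual-node embedding h~ from the previous layer
        delta = torch.sigmoid(
            (self.W2(h_items) * self.W3(h_virtual)).sum(-1) / self.scale)     # Eq. (3)
        h_pers = (1 - delta).unsqueeze(-1) * h_items \
                 + delta.unsqueeze(-1) * h_virtual                            # Eq. (2)
        beta = torch.softmax(
            (self.W4(h_pers) * self.W5(h_virtual)).sum(-1) / self.scale, dim=0)  # Eq. (5)
        h_virtual_new = (beta.unsqueeze(-1) * h_pers).sum(0)                  # Eq. (4)
        return h_pers, h_virtual_new
```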
C. Session Representation Learning

Given a session S, although the embedding of the virtual node ṽ contains the information of all the items in S, it omits the temporal information and cannot simply be taken as the representation of the session. In the previous section, for each item v_i, we computed its general embedding h_i and its personalized embedding h_i^s w.r.t. a session S. For notational brevity, we overload the embedding of item v_i as h_i and next show how to calculate session representations based on item representations.

To leverage the item sequence in a session, in addition to item embeddings, we further incorporate the positional information of items and the length of the session. For all sessions, we use a shared position embedding look-up table P, where the r-th row p_r ∈ R^d represents the embedding vector of the (t−r)-th reverse position in a session of length t. We choose a reverse order for positions because the most recent items can be more useful for predicting the next item in the session. We also introduce a shared session length embedding look-up table L, where the t-th row l_t ∈ R^d corresponds to the embedding of length t; we limit the maximum session length to t_max. For the i-th item in a session of length t, we then unify the item position and session length information into the item embedding h_i and output an updated embedding z_i for v_i:

\[ z_i = h_i + p_{t-i} + l_t. \tag{6} \]

To calculate the representation of a session, we also employ the item categories in the session. For all items, we further define a shared item category embedding look-up table, where each row is the embedding vector of an item category. Given a session S of length t, we summarize the item categories in the session as

\[ h_c = \mathrm{Mean}\big(\{h_{c_i}\}_{i=1}^{t}\big), \ \forall v_i \in S, \tag{7} \]

where h_{c_i} is the category embedding of item v_i. Then we use the attention mechanism to fuse the information of all the items in S:

\[ \bar{z}_s = \sum_{i=1}^{t} \gamma_i z_i, \tag{8} \]

where the attention weight γ_i is calculated by a two-layer MLP:

\[ \gamma_i = \mathrm{MLP}(z_i \,\|\, z_t \,\|\, \tilde{h} \,\|\, h_c). \tag{9} \]

Here, h̃ is the embedding of the virtual node ṽ from Equation 4; we also use the embedding z_t of the last item in S because it can be highly related to the prediction of the next item.

After that, we combine z̄_s and z_t to capture the user interests in session S:

\[ z_s = W_6 [\bar{z}_s \,\|\, z_t], \tag{10} \]

where W_6 is a weight parameter. Further, inspired by the skip-connection technique in [27], we directly take the embedding of item v_i from the look-up table and rerun Equations 6-10 to generate a new z_s (denoted z'_s for distinction) without the item representation learning stage of Section IV-B. Finally, the representation of session S is computed by

\[ h_s = z_s + z'_s. \tag{11} \]
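A compact PyTorch sketch of Eqs. (6)-(10) follows. It is a minimal illustration under our assumptions: names are ours, one session is encoded at a time, and we normalize the MLP scores of Eq. (9) with a softmax so the weights in Eq. (8) sum to 1 (the paper leaves the normalization implicit).

```python
import torch
import torch.nn as nn

class SessionEncoder(nn.Module):
    """Sketch of Eqs. (6)-(10): reverse-position and length embeddings,
    MLP attention over items, fusion with the last item's embedding."""
    def __init__(self, d: int, t_max: int):
        super().__init__()
        self.pos_emb = nn.Embedding(t_max, d)      # reverse positions
        self.len_emb = nn.Embedding(t_max + 1, d)  # session lengths
        self.mlp = nn.Sequential(nn.Linear(4 * d, d), nn.ReLU(), nn.Linear(d, 1))
        self.W6 = nn.Linear(2 * d, d, bias=False)

    def forward(self, h_items, h_virtual, h_cat):
        # h_items: (t, d); h_virtual, h_cat: (d,) virtual-node / mean-category embeddings
        t = h_items.size(0)
        rev = torch.arange(t - 1, -1, -1)                             # last item -> position 0
        z = h_items + self.pos_emb(rev) + self.len_emb(torch.tensor(t))  # Eq. (6)
        zt = z[-1].unsqueeze(0).expand(t, -1)
        feats = torch.cat([z, zt, h_virtual.unsqueeze(0).expand(t, -1),
                           h_cat.unsqueeze(0).expand(t, -1)], dim=-1)
        gamma = torch.softmax(self.mlp(feats).squeeze(-1), dim=0)     # Eq. (9), normalized
        z_bar = (gamma.unsqueeze(-1) * z).sum(0)                      # Eq. (8)
        return self.W6(torch.cat([z_bar, z[-1]], dim=-1))             # Eq. (10)
```

Per Eq. (11), the paper runs this twice, once on the learned item representations and once on the raw look-up embeddings, and sums the two outputs to obtain h_s.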
D. Label Collaboration

Most existing works [8]–[12] use the one-hot encoded vector of the target item as the hard label of user preference, which may not reflect the true preference. The intuition is that users are generally exposed to only a limited number of items, so the absence of other items can bias the estimate of user interest. Further, user preference is also influenced by time periods and contextual scenarios, and can deviate from historical data over time. To address this problem, we employ the session-level contexts and propose a label collaboration strategy, which explicitly utilizes the target items of historical sessions whose behavioral patterns are most similar to the current session as collaborative label information.

Collaborative Sessions Retrieval. Given a session S, we first retrieve the K sessions most similar to S from a fixed-size candidate session pool containing the M most recent sessions. Intuitively, the more sessions we retrieve, the more accurately the user preference can be estimated, but the larger the computation cost. We therefore utilize SimHash [1] to improve retrieval efficiency. The SimHash function takes a session representation as input and generates a binary fingerprint, where each entry is either 0 or 1. It has been pointed out in [2] that SimHash outputs satisfy the locality-sensitive property: the outputs are similar if the input vectors are similar. Specifically, we first project the embeddings of S and the other candidate sessions into binary fingerprints by multiplying the input embedding vectors with a hash function, which is a fixed random projection matrix H ∈ R^{d×m} with m < d. As a result, similar session embedding vectors obtain the same hashing output. After that, we calculate the Hamming distance between the output vectors and select the top-K sessions most similar to S from the M candidates:

\[ N_S, W_S = \mathrm{topK}\big(-\mathrm{HammingDistance}(e, \hat{e})\big), \]

where e = SimHash(S), ê = SimHash(Ŝ), and Ŝ ranges over the M candidate sessions. The weights W_S are then normalized to sum to 1. We denote the set of one-hot encoded labels of the selected sessions as N_S = {y_{S_1}, y_{S_2}, ..., y_{S_K}} and the corresponding weights as W_S = {w_1^S, w_2^S, ..., w_K^S}, which are used for the label collaboration of session S. Further, the pool is updated by a sliding-window scheme: the oldest sessions are removed and the most recent ones in the next batch are added. Compared to the O(MBd) time complexity of retrieval by cosine similarity in [9], the time complexity of our retrieval is O(Bm), where B is the batch size, M is the pool size, and m is smaller than the session representation dimensionality d.
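The retrieval step can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions: the function names are ours, and the softmax weighting over negative Hamming distances is one plausible way to realize the normalization the paper describes.

```python
import numpy as np

def simhash(x, H):
    """Binary fingerprint of session embedding(s) x under a fixed
    random projection H (d x m): 1 where x @ H > 0, else 0."""
    return (x @ H > 0).astype(np.uint8)

def retrieve(query_emb, pool_embs, H, k):
    """Indices and normalized weights of the k pool sessions closest
    to the query in Hamming distance (Sec. IV-D)."""
    q, p = simhash(query_emb, H), simhash(pool_embs, H)
    dist = (q != p).sum(axis=1)               # Hamming distance to each candidate
    idx = np.argsort(dist)[:k]                # top-K most similar sessions
    w = -dist[idx].astype(float)
    w = np.exp(w - w.max()); w /= w.sum()     # assumed: softmax-normalize to sum to 1
    return idx, w

rng = np.random.default_rng(0)
d, m, M, K = 256, 64, 1500, 30                # values from Sec. V-B
H = rng.standard_normal((d, m))               # fixed random projection matrix
pool = rng.standard_normal((M, d))            # candidate session embeddings
idx, w = retrieve(rng.standard_normal(d), pool, H, K)
```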
Collaborative Label Generation. After the K most similar sessions are retrieved, we construct the soft label for session S. These K sessions help provide a more comprehensive estimate of user interests than using S alone. We therefore obtain the collaborative label for S as a weighted sum of the one-hot encoded labels of the retrieved sessions:

\[ \tilde{y} = \sum_{i=1}^{K} w_i^S\, y_{S_i}. \tag{12} \]
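Concretely, the soft label is just a sparse mixture of the retrieved sessions' target items; a minimal sketch (the helper name is hypothetical):

```python
import numpy as np

def collaborative_soft_label(target_items, weights, num_items):
    """Eq. (12): weighted sum of the one-hot target labels of the
    K retrieved sessions gives the soft preference distribution."""
    y_soft = np.zeros(num_items)
    for item, w in zip(target_items, weights):
        y_soft[item] += w          # w_i^S * y_{S_i}
    return y_soft

# Example with 3 retrieved sessions over a 5-item catalog:
print(collaborative_soft_label([2, 2, 4], [0.5, 0.3, 0.2], 5))
# [0.  0.  0.8 0.  0.2]
```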
E. Prediction Layer

The prediction layer outputs the probability distribution over items for the user's next interaction in the current session. Due to the long-tail distribution problem [24] in recommendation data, we normalize the item embeddings and session embeddings in each layer. Finally, we feed them into a prediction layer, where the inner product and the Softmax function are applied to generate the output:

\[ \hat{y}_i = \mathrm{Softmax}(h_s^\top h_i), \tag{13} \]

where ŷ_i denotes the probability of interacting with item v_i at the next timestamp. The total loss consists of two components: a cross-entropy loss based on the hard label y and a KL-divergence loss based on the soft label ỹ:

\[ L = \mathrm{CrossEntropy}(\hat{y}, y) + \lambda\, \mathrm{KLD}(\hat{y}, \tilde{y}), \tag{14} \]

where λ is a trade-off parameter that controls the importance of the two components.
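A compact PyTorch sketch of Eqs. (13)-(14) follows; it is written under our assumptions (the exact normalization and reduction choices are ours, and the function name is hypothetical), since the paper specifies only the loss composition.

```python
import torch
import torch.nn.functional as F

def cares_loss(h_s, h_items, y_hard, y_soft, lam):
    """Sketch of Eqs. (13)-(14): inner-product scores over all items,
    cross-entropy on the hard target plus a lambda-weighted KL term
    toward the collaborative soft label; L2-normalization as in NISER [24].
    h_s: (B, d) session embeddings; h_items: (|V|, d) item embeddings;
    y_hard: (B,) target item ids; y_soft: (B, |V|) soft distributions."""
    h_s = F.normalize(h_s, dim=-1)             # normalize session embeddings
    h_items = F.normalize(h_items, dim=-1)     # normalize item embeddings
    logits = h_s @ h_items.t()                 # (B, |V|) scores, Eq. (13)
    ce = F.cross_entropy(logits, y_hard)
    kld = F.kl_div(F.log_softmax(logits, dim=-1), y_soft, reduction="batchmean")
    return ce + lam * kld                      # Eq. (14)
```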
V. EXPERIMENTS

In this section, we conduct extensive experiments on three publicly available datasets to show the effectiveness of our method. We preprocess these datasets as in [8]. First, we arrange all sessions in chronological order and split the data into training and test sets by the timestamps of sessions. Second, we filter out items that appear fewer than 5 times or appear only in the test set, as well as sessions of length one. Third, we perform data augmentation with temporal-window shifting to generate more samples from a session, e.g., ([v_1, v_2, ..., v_{n−1}], v_n), ..., ([v_1, v_2], v_3), ([v_1], v_2) for session [v_1, v_2, ..., v_n]. Further, we adopt two widely used evaluation metrics in information retrieval: Precision (P@20) and Mean Reciprocal Rank (MRR@20).

Table I: Dataset statistics

Dataset           Diginetica   Tmall     Yoochoose1_64
#Train sessions   719,470      351,268   369,859
#Test sessions    60,858       25,898    55,898
#Items            43,097       40,728    16,766
Avg. length       5.12         6.69      6.16

A. Datasets

The following datasets are utilized to evaluate our model. The statistics of the processed datasets are shown in Table I.
• Diginetica (http://cikm2016.cs.iupui.edu/cikm-cup) contains anonymous user transaction information extracted from e-commerce search engine logs over five months. The dataset is from CIKM Cup 2016.
• Tmall (https://tianchi.aliyun.com/dataset/dataDetail?dataId=42) records anonymized users' shopping logs on the online shopping platform Tmall. The dataset comes from the IJCAI15 competition.
• Yoochoose1_64 (http://2015.recsyschallenge.com/challege) was built by YOOCHOOSE GmbH to support RecSys Challenge 2015. It records users' clicks on an e-commerce website. We follow Wu et al. [8] in using the most recent 1/64 of the training sessions.

B. Hyper-parameter Setup

Following [8], [12], the dimension of the latent vectors is fixed to 256, and the batch size is set to 100. We use the Adam optimizer with an initial learning rate of 0.001, which decays by 0.8 after every 3 epochs. The l2 penalty is set to 10^−5, and the dimension of the hash matrix in SimHash is set to 64. The number of candidate sessions is set to 1500 in the label collaboration strategy. We set the loss-weight parameter λ to 0.1 for Diginetica, 5 for Yoochoose1_64, and 10 for Tmall. We vary the number of retrieved target items in label collaboration over {10, 30, 50, 70, 90} and the number of frequent contextual relations over {0, 5, 10, 20, 30} to study their effects.

C. Baselines

To verify the performance of our proposed model, we compare it with 17 other methods, which can be grouped into three categories. Readers are referred to Section II for more details.

(Single-session methods): POP recommends the most popular items. Item-KNN [3] recommends items based on the cosine similarity between items in the current session and candidate items. FPMC [4] uses both Markov chains and matrix factorization to consider the user's personalized and general information. GRU4REC [7] exploits the memory of GRUs by characterizing the entire sequence. NARM [5] and STAMP [6] additionally utilize attention mechanisms, aiming to capture both the current and the general interest of the user. SR-GNN [8], LESSR [14] and SGNN-HN [12] convert each session into a graph and do not utilize cross-session information.

(Cross-session methods): CSRM [9] incorporates the relevant information in neighborhood sessions through a memory network. CoSAN [10] utilizes a multi-head attention mechanism to build dynamic item representations by fusing item representations in collaborative sessions. GCE-GNN [11] and MTD [32] simultaneously focus on cross-session and intra-session dependencies. COTREC [37] and S2-DHCN [38] employ a global augmentation view of items to mine informative self-supervision signals.

(Multi-relation methods): AutoGSR [16] and MGIR [35] both learn multi-faceted item relations to enhance session representations. Note that MGIR utilizes cross-session information while AutoGSR does not.

D. Overall Performance

From the experimental results on the three datasets in Table II, we have the following observations:

(1) Methods utilizing RNNs or attention mechanisms perform better than early methods such as Item-KNN and FPMC, because they are suited to sequential data with temporal information and do not lose the internal-session-level context. Methods such as CSRM and CoSAN achieve higher performance than single-session methods like GRU4Rec, NARM and STAMP by introducing auxiliary information from historical sessions, which confirms the effectiveness of leveraging external-session-level contexts. The current best-performing baselines such as SGNN-HN, COTREC and MGIR are GNN-based approaches, because GNNs are good at capturing complex item transitions across sessions; this shows the effectiveness of introducing cross-session item-level context by graph modeling.

(2) CARES outperforms the other GNN-based models SR-GNN, LESSR, AutoGSR and SGNN-HN, because these methods are designed for local sessions without considering cross-session information in the global view. While the cross-session method COTREC leverages self-supervision to enhance session representations, it ignores heterogeneity and is outperformed by CARES.

(3) The leading performance of CARES and COTREC over GCE-GNN implies that it is useful to capture the internal-session-level context in the global graph, because GCE-GNN only considers the cross-session item-level context of item transitions and lacks diversity in its collaborative information. COTREC employs self-supervised learning to impose a divergence constraint on the global and internal-session views of item embeddings, while CARES further introduces personalized item representations w.r.t. sessions. This demonstrates the significance of the internal-session-level context in global graph modeling.

(4) Our approach achieves the best performance on all datasets, which shows the importance of making full use of contexts in sessions. Further, our model yields a significant improvement in terms of MRR@20 on Diginetica and Yoochoose1_64, indicating that items relevant to users' interests are ranked higher, which is critical for user experience and confirms the superiority of our model.

(5) To ensure a fair comparison, we also conducted experiments with an additional variant that does not use side information to construct the graph. As shown in Table II, even without the side information of item categories (i.e., CARES_ns), our method still performs well across datasets.
Table II: Overall performance comparison on three datasets. For fairness, we directly report the results of baseline methods from their original papers, where "−" indicates the absence of corresponding results in the original papers.

             Diginetica          Tmall              Yoochoose1_64
Method       P@20    MRR@20      P@20    MRR@20     P@20    MRR@20
POP           1.18     0.28       2.00     0.90      6.71     0.58
Item-KNN     35.75    11.57       9.15     3.31     51.60    21.81
FPMC         22.14     6.66      16.06     7.32     45.62    15.01
GRU4Rec      30.79     8.22      10.93     5.89     60.64    22.89
NARM         48.32    16.00      23.30    10.70     68.32    28.63
STAMP        46.62    15.13      26.47    13.36     68.74    29.67
SR-GNN       50.73    17.59      27.57    13.72     70.57    30.94
LESSR        51.71    18.15      23.53     9.56     70.05    30.59
SGNN-HN      55.67    19.45        −        −       72.06    32.61
CSRM         48.49    17.13      29.46    13.96       −        −
CoSAN        51.97    17.92      32.68    14.09       −        −
GCE-GNN      54.22    19.04      33.42    15.42     70.91    30.63
S2-DHCN      53.18    18.44      31.42    15.05       −        −
MTD          51.82    17.26      29.12    13.73     71.88    31.32
COTREC       54.18    19.07      36.35    18.04       −        −
AutoGSR      54.56    19.20      33.71    15.87     71.77    31.02
MGIR           −        −        36.41    17.42       −        −
CARES_ns     55.29    21.04      38.17    17.79     71.82    33.05
CARES        56.49    23.22      38.77    18.37     72.21    34.40
Improv.      1.47%   19.30%      6.48%    1.82%     0.20%    5.48%
E. Ablation Study

We conduct an ablation study on CARES to understand the characteristics of its main components. One variant updates items' embeddings by directly capturing information from the intra-session context, without utilizing general information from item-transition relationships on the global graph; this helps us understand the importance of including cross-session item-level context in SBR. We call this variant CARES_ng (no general information). Another variant learns items' embeddings without personalized information w.r.t. sessions; we call it CARES_np (no personalized information), and it helps us evaluate the effectiveness of the internal-session-level context. To show the importance of the label collaboration strategy, we train the model with the cross-entropy loss only and call this variant CARES_nl (no label collaboration). CARES_ns (no side information) is the variant of CARES that ignores item category information, used to understand the effect of items' category associations in SBR.

From the experimental results in Figure 4, the following observations are made. (i) Compared with CARES_ng, CARES leverages cross-session item-level context and can thus utilize diverse collaborative information from the global graph, outperforming CARES_ng. (ii) CARES, which learns personalized information, beats CARES_np on all datasets. This indicates that the internal-session-level context can effectively preserve user intent by adding personalized information w.r.t. sessions. (iii) CARES performs better than CARES_nl, which indicates that utilizing the target items of historical sessions with similar behavioral patterns to the current session as external-session-level context can mitigate the bias in the user preference distribution. (iv) CARES also beats CARES_ns, indicating that items' categories play an important role in learning users' preferences. Additionally, although side information improves recommendation accuracy, our model still performs well without it, as shown in Table II.

Figure 4: Model performance in the ablation study (P@20 and MRR@20 of CARES, CARES_ns, CARES_ng, CARES_np and CARES_nl on Diginetica, Tmall and Yoochoose).

F. Influence of Contextual Relations

In this section, we study how contextual relations affect the performance of the proposed method. Due to limited space, we only show the results in terms of MRR@20, in Figure 5. From the results, we can see that models that do not use contextual relations always have lower performance. This is because contextual relations help the model capture more complex item context, which indicates that disentangling the relation semantics of sessions is a promising direction for further exploiting information across sessions. The optimal number of contextual relations differs across datasets: for Yoochoose1_64, the score is highest when the relation number is set to 30, while for the other two datasets the optimal relation number is 5, and increasing the number of relations does not always improve performance. This is because only relations between item categories with sufficiently high frequency can be considered as context.

Figure 5: Performance comparison on the number of contextual relations.

G. Sensitivity Analysis of Hyper-Parameters

We end this section with a sensitivity analysis of the hyper-parameters of CARES. In particular, we study two hyper-parameters: the hash matrix dimension m and the number of retrieved sessions K. In our experiments, we vary one parameter at a time with the others fixed. Figure 6 illustrates the results w.r.t. P@20 and MRR@20 on Tmall and Yoochoose1_64 (results on the other datasets exhibit similar trends and are omitted for space reasons). From the figure, we see that:

(1) A larger dimension m slightly improves the performance of the model. Since the model is not very sensitive to the hash matrix dimension, a small m can also guarantee performance.

(2) Too few retrieved sessions in label collaboration do not provide enough information for the current session, while retrieving more sessions also causes a performance drop, which shows that a large number of collaborative sessions can contain noise that adversely affects recommendation performance. An appropriate number of retrieved sessions K is therefore essential.

Figure 6: Sensitivity analysis of hyper-parameters: (a) the hash matrix dimension m; (b) the number of retrieved sessions K.

VI. CONCLUSION

In this paper, we propose a novel method named CARES for session-based recommendation based on graph neural networks. Specifically, it converts session sequences into a global graph with item attributes as context. General item representations are generated from various contextual relations through item-level attention. After that, we apply a gating mechanism to further enrich item representations with personalized information w.r.t. sessions. The intra- and cross-session context information is subsequently combined to enhance recommendation performance. Finally, CARES incorporates label collaboration to generate soft user preference distributions as labels, which helps the proposed model alleviate the overfitting problem. Comprehensive experiments demonstrate that our proposed model makes full use of contexts in sessions, especially cross-session ones, and consistently achieves state-of-the-art performance on three real-world datasets.

REFERENCES

[1] Charikar, Moses. "Similarity estimation techniques from rounding algorithms." Symposium on the Theory of Computing (2002).
[2] Chen, Qiwei, Changhua Pei, Shanshan Lv, Chao Li, Junfeng Ge and Wenwu Ou. "End-to-End User Behavior Retrieval in Click-Through Rate Prediction Model." arXiv abs/2108.04468 (2021).
[3] Sarwar, Badrul Munir, George Karypis, Joseph A. Konstan and John Riedl. "Item-based collaborative filtering recommendation algorithms." The Web Conference (2001).
[4] Rendle, Steffen, Christoph Freudenthaler and Lars Schmidt-Thieme. "Factorizing personalized Markov chains for next-basket recommendation." The Web Conference (2010).
[5] Li, Jing, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian and Jun Ma. "Neural Attentive Session-based Recommendation." Proceedings of the 2017 ACM Conference on Information and Knowledge Management (2017).
[6] Liu, Qiao, Yifu Zeng, Refuoe Mokhosi and Haibin Zhang. "STAMP: Short-Term Attention/Memory Priority Model for Session-based Recommendation." Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018).
[7] Hidasi, Balázs, Alexandros Karatzoglou, Linas Baltrunas and Domonkos Tikk. "Session-based Recommendations with Recurrent Neural Networks." CoRR abs/1511.06939 (2015).
[8] Wu, Shu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie and Tieniu Tan. "Session-based Recommendation with Graph Neural Networks." arXiv abs/1811.00855 (2018).
[9] Wang, Meirui, Pengjie Ren, Lei Mei, Zhumin Chen, Jun Ma and M. de Rijke. "A Collaborative Session-based Recommendation Approach with Parallel Memory Modules." Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (2019).
[10] Luo, Anjing, Pengpeng Zhao, Yanchi Liu, Fuzhen Zhuang, Deqing Wang, Jiajie Xu, Junhua Fang and Victor S. Sheng. "Collaborative Self-Attention Network for Session-based Recommendation." International Joint Conference on Artificial Intelligence (2020).
[11] Wang, Ziyang, Wei Wei, G. Cong, Xiaoli Li, Xian-Ling Mao and Minghui Qiu. "Global Context Enhanced Graph Neural Networks for Session-based Recommendation." Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (2020).
[12] Pan, Zhiqiang, Fei Cai, Wanyu Chen, Honghui Chen and M. de Rijke. "Star Graph Neural Networks for Session-based Recommendation." Proceedings of the 29th ACM International Conference on Information & Knowledge Management (2020).
[13] Xu, Chengfeng, Pengpeng Zhao, Yanchi Liu, Victor S. Sheng, Jiajie Xu, Fuzhen Zhuang, Junhua Fang and Xiaofang Zhou. "Graph Contextualized Self-Attention Network for Session-based Recommendation." International Joint Conference on Artificial Intelligence (2019).
[14] Chen, Tianwen and Raymond Chi-Wing Wong. "Handling Information Loss of Graph Neural Networks for Session-based Recommendation." Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2020).
[15] Pan, Zhiqiang, Fei Cai, Wanyu Chen, Chonghao Chen and Honghui Chen. "Collaborative Graph Learning for Session-based Recommendation." ACM Transactions on Information Systems (TOIS) 40 (2022): 1-26.
[16] Chen, Jingfan, Guanghui Zhu, Haojun Hou, C. Yuan and Y. Huang. "AutoGSR: Neural Architecture Search for Graph-based Session Recommendation." Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2022).
[17] Brody, Shaked, Uri Alon and Eran Yahav. "How Attentive are Graph Attention Networks?" arXiv abs/2105.14491 (2021).
[18] Zhu, Xiaojin, Zoubin Ghahramani and John D. Lafferty. "Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions." International Conference on Machine Learning (2003).
[19] Ye, Rui, Qing Zhang and Hengliang Luo. "Cross-Session Aware Temporal Convolutional Network for Session-based Recommendation." 2020 International Conference on Data Mining Workshops (ICDMW) (2020): 220-226.
[20] Bai, Shaojie, J. Zico Kolter and Vladlen Koltun. "An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling." arXiv abs/1803.01271 (2018).
[21] Kipf, Thomas and Max Welling. "Semi-Supervised Classification with Graph Convolutional Networks." arXiv abs/1609.02907 (2016).
[22] Velickovic, Petar, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio' and Yoshua Bengio. "Graph Attention Networks." arXiv abs/1710.10903 (2017).
[23] Li, Yujia, Daniel Tarlow, Marc Brockschmidt and Richard S. Zemel. "Gated Graph Sequence Neural Networks." CoRR abs/1511.05493 (2015).
[24] Gupta, Priyanka, Diksha Garg, Pankaj Malhotra, Lovekesh Vig and Gautam M. Shroff. "NISER: Normalized Item and Session Representations with Graph Neural Networks." arXiv abs/1909.04276 (2019).
[25] Tan, Yong Kiam, Xinxing Xu and Yong Liu. "Improved Recurrent Neural Networks for Session-based Recommendations." Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (2016).
[26] Hao, Junheng, Tong Zhao, Jin Li, Xin Luna Dong, Christos Faloutsos, Yizhou Sun and Wei Wang. "P-Companion: A Principled Framework for Diversified Complementary Product Recommendation." Proceedings of the 29th ACM International Conference on Information & Knowledge Management (2020).
[27] Xu, Keyulu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi and Stefanie Jegelka. "Representation Learning on Graphs with Jumping Knowledge Networks." International Conference on Machine Learning (2018).
[28] Ahn, Dasom, Sangwon Kim, Hyun Wook Hong and ByoungChul Ko. "STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition." 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2022): 3319-3328.
[29] Zhou, Kun, Haibo Wang, Wayne Xin Zhao, Yutao Zhu, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang and Ji-rong Wen. "S3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization." Proceedings of the 29th ACM International Conference on Information & Knowledge Management (2020).
[30] Hidasi, Balázs, Massimo Quadrana, Alexandros Karatzoglou and Domonkos Tikk. "Parallel Recurrent Neural Network Architectures for Feature-rich Session-based Recommendations." Proceedings of the 10th ACM Conference on Recommender Systems (2016).
[31] Linden, Greg, Brent Smith and Jeremy York. "Amazon.com Recommendations: Item-to-Item Collaborative Filtering." IEEE Distributed Systems Online 4 (2003).
[32] Huang, Chao, Jiahui Chen, Lianghao Xia, Yong Xu, Peng Dai, Yanqing Chen, Liefeng Bo, Jiashu Zhao and Xiangji Huang. "Graph-Enhanced Multi-Task Learning of Multi-Level Transition Dynamics for Session-based Recommendation." arXiv abs/2110.03996 (2021).
[33] Fan, Shaohua, Junxiong Zhu, Xiaotian Han, Chuan Shi, Linmei Hu, Biyu Ma and Yongliang Li. "Metapath-guided Heterogeneous Graph Neural Network for Intent Recommendation." Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2019).
[34] Zhang, Xiaokun, Bo Xu, Liang Yang, Chenliang Li, Fenglong Ma, Haifeng Liu and Hongfei Lin. "Price DOES Matter!: Modeling Price and Interest Preferences in Session-based Recommendation." Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2022).
[35] Han, Qilong, Chi Zhang, Rui Chen, Riwei Lai, Hongtao Song and Li Li. "Multi-Faceted Global Item Relation Learning for Session-Based Recommendation." Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2022).
[36] Agrawal, Rakesh, Tomasz Imielinski and Arun N. Swami. "Mining association rules between sets of items in large databases." ACM SIGMOD Conference (1993).
[37] Xia, Xin, Hongzhi Yin, Junliang Yu, Yingxia Shao and Li-zhen Cui. "Self-Supervised Graph Co-Training for Session-based Recommendation." Proceedings of the 30th ACM International Conference on Information & Knowledge Management (2021).
[38] Xia, Xin, Hongzhi Yin, Junliang Yu, Qinyong Wang, Li-zhen Cui and Xiangliang Zhang. "Self-Supervised Hypergraph Convolutional Networks for Session-based Recommendation." AAAI Conference on Artificial Intelligence (2020).
