1. Introduction
Session-based recommendation (SBR) has become increasingly important across various online platforms, including e-commerce, social networks, and entertainment services. Unlike traditional recommendation systems (RS) that depend on user profiles and extensive historical interaction data, SBR mines a user’s recent preferences by analyzing their anonymous browsing sessions, thereby predicting the next item they are likely to interact with [1,2]. This method is particularly valuable when user information is either entirely missing (e.g., anonymous users) or insufficient (e.g., limited historical interaction records) [3,4].
Recent SBR research has largely been based on graph neural networks (GNNs), treating each session as a graph to learn intricate representations, with promising results [5,6,7]. Many of these studies [8,9] have integrated attention mechanisms to differentiate between long-term and short-term preferences using the current session’s data. Nevertheless, existing GNN-based SBR approaches continue to struggle with data sparsity, especially in the context of limited short-term interactions.
Despite their remarkable accomplishments, current GNN-based SBR methods remain constrained by inherent limitations, particularly stemming from data sparsity caused by the ephemeral nature of user interactions. While the multi-view approach has emerged as a promising paradigm to alleviate the effects of data sparsity, it still faces two fundamental challenges that warrant deeper exploration:
Firstly, the issue of incomplete view construction persists. Existing methods typically construct session graphs based solely on isolated user interactions, treating item transitions as simplistic pairwise relationships. Such an oversimplified manner of representation fails to capture the intricate, higher-order dependencies and complex interaction patterns among items within the same session. This limitation not only results in an incomplete characterization of user preferences but also undermines the ability to model nuanced user behavior effectively.
Secondly, the problem of suboptimal view integration remains underexplored. While multiple views offer complementary information, the fusion process often lacks sophisticated strategies for integrating heterogeneous information sources. Moreover, the inherent noise in observed interactions—such as accidental clicks or exploratory browsing of irrelevant items—poses additional challenges. Conventional methods frequently struggle to effectively suppress this noise while preserving meaningful patterns, leading to potential information dilution and degraded recommendation performance.
These challenges highlight the need for approaches that better handle the complexity of user interactions and improve the robustness of multi-view SBR systems. Addressing them through more expressive view construction and more robust view integration is crucial for improving recommendation accuracy in scenarios with sparse interaction data and for providing more personalized recommendations.
To tackle the above challenges, we propose a cross-session graph and hypergraph co-guided session-based recommendation (CGH-SBR) framework. CGH-SBR is designed to capture not only cross-session item transitions but also the higher-order relationships among items within sessions. Initially, we create a directed graph to model cross-session item transitions, capturing sequential dependencies. Furthermore, we construct a hypergraph to model higher-order relationships within sessions. Subsequently, we utilize two distinct graph neural network (GNN) architectures to learn item representations on these graphs. Moreover, we have devised a co-guided learning framework that encourages the integration of diverse viewpoints, facilitating reciprocal learning between them. This method enhances the model’s ability to effectively merge information from various sources. Finally, we adopt a co-guiding mechanism to fuse the learned embeddings from multiple perspectives to derive a comprehensive user representation for generating SBR recommendations. We validate the efficacy of our proposed model through extensive experiments on two real-world datasets.
Note that the proposed CGH-SBR model differs from other multi-view approaches in how it integrates multiple graph structures and in its co-guided learning mechanism. Unlike traditional multi-view approaches that often focus on pairwise interactions or static user-item relationships, CGH-SBR explicitly models both cross-session dependencies and intra-session high-order correlations. The directed graph captures sequential dependencies across different sessions, enabling the model to understand long-term user preferences. The hypergraph models higher-order correlations among items within a single session, allowing the model to capture complex relationships that go beyond pairwise interactions. This dual perspective lets the model account for both short-term and long-term user behavior. In addition, CGH-SBR introduces a co-guided learning framework in which the two GNNs learn from each other and share knowledge, so that each can leverage the complementary strengths of the other. This framework also captures the interplay between cross-session dependencies and intra-session correlations, leading to more accurate predictions.
The major contributions of this paper are as follows:
We propose the construction of both a directed graph and a hypergraph to learn various item dependencies across entire sessions.
We introduce a co-guided learning scheme that integrates diverse perspectives and enables mutual learning among them, allowing the model to effectively assimilate information from different sources.
We present comprehensive experiments on two public datasets to showcase the effectiveness of our proposed CGH-SBR model.
The remainder of this paper is organized as follows:
Section 2 presents the foundational concepts and a precise articulation of our research problem.
Section 3 delineates the comprehensive framework and the constituent elements of our approach.
Section 4 provides an analysis of the experimental results, and
Section 5 concludes our study and offers insights into future work.
2. Preliminaries
The objective of SBR is to anticipate the subsequent actions of users based on their anonymous historical activity sequences. Below, we define the SBR problem.
Let $\mathcal{S}$ represent a collection of sessions over an item set $V = \{v_1, v_2, \ldots, v_N\}$, where $N$ is the number of items. An anonymous session $S_t$ is a sequence of items ordered by timestamps, whose $j$-th element is the $j$-th clicked item in $S_t$ and whose length is $n$; a session may contain duplicated items. Let $X^{(0)}$ be an embedding matrix randomly initialized from a uniform distribution, each row of which is the embedding of a particular item. To capture high-order representations, we apply GNNs to the two constructed graphs, the cross-session directed graph and the hypergraph, to embed each item into the latent space. In particular, each item $v_i$ has an $r$-th view-specific representation of dimension $d$ in the $l$-th layer, where $r$ indexes the view (directed graph or hypergraph). The final representation of the entire item set is denoted as $X$. Each session $S_t$ is represented by a vector that combines the embeddings from $X$ of the items consumed in $S_t$.
Formally, our model takes all session sequences as input and, given a target session $S_t$, returns a list of the top-$K$ candidate items to be consumed next.
3. Methodology
In this section, we introduce our proposed model, cross-session graph and hypergraph co-guided session-based recommendation (CGH-SBR), which harnesses a novel multi-view co-guided mechanism to optimize session-based recommendations. The model effectively leverages both cross-session information and high-order item correlations. The architecture of our model is illustrated in Figure 1.
In essence, the CGH-SBR model operates in four stages. First, a cross-session graph (CG) is generated to encapsulate the relationships between items across different sessions; concurrently, a hypergraph (HG) is developed to represent the intricate high-order interactions among items. Second, the model applies a query-aware attention mechanism along with hypergraph neural networks (HGNNs) to extract and encode item correlations from both the CG and HG into meaningful item representations (detailed in Section 3.1 and Section 3.2). Third, a co-guided mechanism is introduced to capture the mutual relationships between different views and user interests (explained in Section 3.3). Finally, the prediction layer consolidates the session, item, and influence session embeddings, culminating in the prediction score for the target session-item pair (Section 3.4).
3.1. Multi-View Construction
In this section, we aim to construct two independent and complementary graphs by jointly considering the cross-session relationship and high-order item connection, i.e., a cross-session graph and a hypergraph. We argue that the former illustrates the cross-session item-level temporal relationships, whereas the hypergraph captures the high-order item correlations.
3.1.1. Cross-Session Graph Construction
To capture the pairwise item transitions across all sessions, we construct a cross-session graph from all users’ interaction sequences. This graph encapsulates the sequential relationships between items across different sessions.
Figure 1b depicts a total of three sessions. Initially, each individual session is transformed into a basic session graph. Subsequently, session graphs that share items are merged to form a cross-session graph. The primary aim of extending the session graph to a cross-session graph is to integrate cross-session insights into the process of learning individual session representations. For instance, consider node $v_2$: within session $S_2$, the items preceding and succeeding $v_2$ are $v_8$ and $v_1$, respectively, and within session $S_3$, $v_2$ is succeeded by item $v_5$. Consequently, in the graph representation, $v_2$ has two outgoing edges, connecting $v_2$ to $v_1$ and $v_5$, and one incoming edge, linking $v_8$ to $v_2$.
Specifically, the cross-session view is a weighted directed graph that extends over the whole set of sessions. It comprises a node set containing all item nodes and an edge set containing all weighted directed edges. An edge from item $v_i$ to item $v_j$ signifies that a user clicked $v_j$ immediately after $v_i$ in some session. The edge weight is computed as the frequency with which the item pair $(v_i, v_j)$ co-occurs across different sessions, reflecting the popularity of the subsequent item.
By structuring cross-session item transitions into a directed graph, we can naturally leverage the strengths of GNNs to process sequential dependencies. GNNs excel at propagating information through nodes and edges, allowing the model to effectively capture long-term user preferences and sequential patterns that span multiple sessions. In addition, the layer-wise feature extraction performed by stacked GNN layers is critical for understanding the nuanced relationships between items and predicting the next item in a sequence.
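The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `build_cross_session_graph` and the toy sessions are hypothetical, and the sessions are not the exact contents of Figure 1b. The edge weight is taken to be the number of times one item immediately follows another in any session.

```python
from collections import Counter

def build_cross_session_graph(sessions):
    """Merge per-session transitions into one weighted directed graph.

    Returns a Counter mapping (source_item, target_item) to the number of
    times the target was clicked immediately after the source in any session.
    """
    edge_weights = Counter()
    for session in sessions:
        # Consecutive clicks (v_i, v_j) form a directed edge v_i -> v_j.
        for src, dst in zip(session, session[1:]):
            edge_weights[(src, dst)] += 1
    return edge_weights

# Hypothetical toy sessions, chosen so that v2 has the neighbors discussed above.
sessions = [
    ["v2", "v5", "v4"],
    ["v8", "v2", "v1"],
    ["v2", "v5", "v6"],
]
graph = build_cross_session_graph(sessions)
out_neighbors_v2 = sorted({dst for (src, dst) in graph if src == "v2"})
in_neighbors_v2 = sorted({src for (src, dst) in graph if dst == "v2"})
```

In this toy example the transition from `v2` to `v5` occurs in two sessions, so its weight is 2, while every other edge has weight 1.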
3.1.2. Hypergraph Construction
To capture complex relationships beyond pairwise interactions in SBR systems, following [10], we utilize a hypergraph in which each session is represented as a hyperedge. As shown in Figure 1b, each hyperedge encompasses all items of a particular session. Specifically, we define each hyperedge as the set of items within a session, where each item $v_i$ is a node in the hypergraph. Upon conversion to the hypergraph, each pair of items clicked within a session becomes connected. It is crucial to highlight that we transform session sequences into an undirected structure, in line with the view that items within a session are temporally related rather than strictly sequentially dependent. This methodology enables us to explicitly depict many-to-many high-order interactions. Consequently, the hypergraph effectively encapsulates high-order relationships at the item level.
Traditional graphs are limited in their ability to model higher-order correlations between multiple items within a single session. Hypergraphs, on the other hand, allow for the representation of complex relationships involving multiple items simultaneously, making them ideal for capturing high-order dependencies. Moreover, HGNNs enable the aggregation of information from multiple items within a session. This capability is essential for understanding the collective influence of items on user behavior and preferences. The non-linear nature of hypergraphs allows HGNNs to model intricate and non-trivial relationships between items, which are often overlooked by traditional pairwise interaction models.
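A hypergraph of this kind is commonly stored as an item-by-session incidence matrix. The sketch below is a minimal illustration under that assumption; the function name `build_incidence_matrix`, the integer item ids, and the toy sessions are hypothetical.

```python
import numpy as np

def build_incidence_matrix(sessions, num_items):
    """Return H of shape (num_items, num_sessions).

    H[i, e] = 1 if item i occurs in session (hyperedge) e, else 0.
    """
    H = np.zeros((num_items, len(sessions)), dtype=np.float64)
    for e, session in enumerate(sessions):
        # Order and repetition inside a session are ignored: a hyperedge is a set.
        for item in set(session):
            H[item, e] = 1.0
    return H

# Hypothetical sessions over items 0..4; item 2 is clicked twice in the last one.
sessions = [[0, 1, 2], [1, 3], [2, 3, 4, 2]]
H = build_incidence_matrix(sessions, num_items=5)
node_degree = H.sum(axis=1)  # number of sessions that contain each item
edge_degree = H.sum(axis=0)  # number of distinct items in each session
```

Because a hyperedge connects all of its items at once, a session with three distinct items yields one column with three non-zero entries rather than three separate pairwise edges.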
3.2. Dual-Channel Encoding
To comprehensively capture the pairwise cross-session transitions and intricate high-order correlations among items, we introduce GNNs and HGNNs that adeptly encapsulate inter-session dependencies into session-level representations, respectively.
3.2.1. Cross-Session Graph Encoding
Initially, we utilize the message passing mechanism of GCNs [11] to encapsulate the local context of the transitional signals that occur between different items within the cross-session graph. We explicitly articulate the encoding function as follows:

$$X^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X^{(l)} W^{(l)}\right)$$

where $X^{(l)}$ represents the learned item representations that are updated at the $l$-th propagation layer. Each entry $A_{ij}$ in the adjacency matrix $A$ is set to 1 if there exists a transition relation from item $v_i$ to $v_j$, and $A_{ij} = 0$ otherwise. Note that to integrate the self-propagated signals, we refresh the adjacency matrix by summing the identity matrix and the original adjacency matrix, resulting in $\tilde{A} = A + I$. We further apply the symmetric normalization strategy to conduct the information aggregation as $\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$, where $\tilde{D}$ denotes the diagonal node degree matrix of $\tilde{A}$. $\sigma(\cdot)$ is the ReLU activation function and $W^{(l)}$ is the trainable weight matrix of the $l$-th propagation layer. After stacking the above GNN layers, we generate the embeddings for the item set $V$ in the cross-session graph.
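The propagation rule described in this subsection (self-loops, symmetric degree normalization, a trainable weight matrix, and ReLU) can be sketched with NumPy as follows. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, the degree is assumed to be the row sum of the self-looped adjacency matrix, and the random inputs stand in for learned parameters.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    A_tilde = A + np.eye(A.shape[0])          # add self-loops
    degree = A_tilde.sum(axis=1)              # assumed: row sums (out-degree + 1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)        # degree >= 1 because of self-loops
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def gcn_layer(A_hat, X, W):
    """One propagation layer: ReLU(A_hat @ X @ W)."""
    return np.maximum(A_hat @ X @ W, 0.0)

# Toy directed chain v0 -> v1 -> v2 with binary adjacency entries.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=np.float64)
A_hat = normalize_adjacency(A)

rng = np.random.default_rng(0)
X0 = rng.normal(scale=0.1, size=(3, 4))       # initial item embeddings, d = 4
W0 = rng.normal(scale=0.1, size=(4, 4))       # layer weight (random stand-in)
X1 = gcn_layer(A_hat, X0, W0)
```

Stacking several calls to `gcn_layer`, each with its own weight matrix, gives the multi-layer encoder; the normalized matrix `A_hat` is computed once and reused.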
3.2.2. Hypergraph Encoding
To encapsulate complex interactions among items within a session, hypergraph neural networks (HGNNs) [12] are employed to refine item embeddings. Recognizing that diverse perspectives can impart varying significance to recommendation outcomes, it would be imprudent to broadcast the initial item embedding $X^{(0)}$ across all channels directly. Hence, to manage the transmission of information from the basic item embedding $X^{(0)}$ to the HGNN channels, we introduce prefilters equipped with self-gating units (SGUs), defined as follows:

$$X_h^{(0)} = X^{(0)} \odot \sigma\left(X^{(0)} W_g + b_g\right)$$

where $W_g$ and $b_g$ are the parameters to be learned, ⊙ is the element-wise product operation, that is, the corresponding elements of two vectors are multiplied, and σ is a non-linear activation function.
Building on the spectral hypergraph convolution presented by Feng et al. [12], the hypergraph convolution is conceptualized as follows:

$$x_i^{(l+1)} = \sum_{j=1}^{N} \sum_{\epsilon} H_{i\epsilon} H_{j\epsilon} W_{\epsilon\epsilon}\, x_j^{(l)} P^{(l)}$$

where $P^{(l)}$ is the learnable parameter matrix between two convolution layers and $H$ is the incidence matrix. For the diagonal hyperedge weight matrix $W$, we assign the same weight value of 1 to each hyperedge. The matrix form of this convolution with row normalization can be re-expressed as follows:

$$X_h^{(l+1)} = D^{-1} H W B^{-1} H^{\top} X_h^{(l)} P^{(l)}$$

where $D$ and $B$ are the diagonal degree matrices of the nodes and the hyperedges, respectively.
The hypergraph convolution process can be understood as a two-phase refinement that performs an “item-session-item” feature transformation over the hypergraph structure. The multiplication $H^{\top} X_h^{(l)}$ facilitates the aggregation of information from items to sessions, while the pre-multiplication by $H$ is perceived as the aggregation of information from sessions back to the items. Following the convolution of the foundational embedding $X_h^{(0)}$ across $L$ layers of the hypergraph, the item embeddings at each layer are combined by averaging to derive the ultimate item embedding.
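The hypergraph channel described above (a self-gating prefilter, row-normalized hypergraph convolution, and averaging over layers) might be sketched as follows. Several details are assumptions made for illustration only: the gate uses a sigmoid, every hyperedge has weight 1, the average includes the gated layer-0 embedding, and all names (`self_gate`, `hypergraph_propagation`, `hypergraph_encode`) and random parameters are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_gate(X0, W_g, b_g):
    """Self-gating prefilter: X0 * sigmoid(X0 @ W_g + b_g), element-wise."""
    return X0 * sigmoid(X0 @ W_g + b_g)

def hypergraph_propagation(H):
    """Row-normalized operator D^{-1} H B^{-1} H^T with unit hyperedge weights.

    Assumes every item occurs in at least one session and no session is empty.
    """
    D_inv = 1.0 / H.sum(axis=1)               # inverse node degrees
    B_inv = 1.0 / H.sum(axis=0)               # inverse hyperedge degrees
    return (D_inv[:, None] * H * B_inv[None, :]) @ H.T

def hypergraph_encode(H, X0, layer_weights):
    """Apply one convolution per weight matrix, then average all layers."""
    P = hypergraph_propagation(H)
    X = X0
    layers = [X0]                              # assumed: layer 0 is part of the average
    for W in layer_weights:
        X = P @ X @ W                          # items -> sessions -> items
        layers.append(X)
    return np.mean(layers, axis=0)

# Incidence matrix for three hypothetical sessions over five items.
H = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [0, 0, 1]], dtype=np.float64)
rng = np.random.default_rng(0)
d = 4
X0 = rng.normal(scale=0.1, size=(5, d))
gated = self_gate(X0, rng.normal(scale=0.1, size=(d, d)), np.zeros(d))
weights = [rng.normal(scale=0.1, size=(d, d)) for _ in range(3)]
item_embeddings = hypergraph_encode(H, gated, weights)
propagation = hypergraph_propagation(H)
```

With unit hyperedge weights, each row of the propagation operator sums to 1, so one convolution step replaces an item's embedding with a weighted average over the items that share a session with it.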
3.2.3. Session Representation Learning
To generate a session embedding that captures the preferences and dynamics of the session, we then aggregate the representations of all items in the session. Thus, the session embedding represents the collective behavior or interest of the user during that session.
For a given target session $S_t$, the next step involves combining the representations of all its items to create a session embedding. Recognizing that different items within this session may be of varying importance, we incorporate a soft-attention mechanism to refine the representation, ensuring that the session’s preferences are accurately captured and highlighted:
Here, the item representations are those learned under either the hypergraph or the cross-session graph; a linear projection vector generates the scalar attention weight of each item, and the transformation matrices and the bias vector are learnable.
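A soft-attention readout of this kind can be sketched as below. The exact scoring function is not recoverable from the text, so the sketch assumes a simple form in which each item's weight is a projection of a sigmoid-transformed item embedding; the names `session_embedding`, `W`, `c`, and `q` are hypothetical stand-ins for the learnable matrix, bias vector, and projection vector mentioned above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def session_embedding(item_embs, W, c, q):
    """Soft-attention aggregation of the items in one session.

    item_embs: (n, d) embeddings of the clicked items (from either view).
    W: (d, d) transformation matrix, c: (d,) bias, q: (d,) projection vector.
    Returns the (d,) session embedding and the (n,) attention weights.
    """
    hidden = sigmoid(item_embs @ W + c)        # (n, d)
    alpha = hidden @ q                         # one scalar weight per item
    return alpha @ item_embs, alpha            # weighted sum of item embeddings

# Toy session with n = 3 items and d = 4.
items = np.arange(12, dtype=np.float64).reshape(3, 4)
s, alpha = session_embedding(items, np.zeros((4, 4)), np.zeros(4), np.ones(4))
```

With zero weights and an all-ones projection vector, every item receives the same attention weight, so the readout reduces to a scaled sum of the item embeddings; trained parameters would instead emphasize the items most indicative of the session's intent.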
3.2.4. Design Discussion
In this section, we introduce a dual-channel encoding approach with the primary aim of fully exploiting the latent item correlation information in both views. This design offers the following distinct advantages:
Enhanced item interaction dependencies: Traditional graph-based SBR models using GNNs are often limited to single-session item transitions. To tackle this challenge, we introduce cross-session transitions to amplify the latent interactions among items. By encoding the transitions merged from all sessions in the cross-session graph, we significantly enhance the item interactions within the GNN framework, enabling a more comprehensive capture of contextual information.
High-order item correlation encoding: In contrast to conventional GNN architectures that predominantly focus on pairwise node information, the HGNNs we employ deliberately integrate high-order encoding into the embedding computation process. This inclusion supplements additional intra-session details, leading to a more holistic understanding of the underlying relationships within sessions. Through the incorporation of these design components, our proposed dual encoder surpasses the constraints of traditional GNN-based methods, providing a more comprehensive and effective approach for SBR.
3.3. Co-Guiding Schema
3.3.1. Co-Guiding Learning
As discussed in earlier sections, the cross-session embedding and the hypergraph embedding are interdependent and collectively determine the accuracy of recommendation predictions. Therefore, after obtaining the embeddings from both perspectives, it is crucial to fuse them effectively to enable the model to extract and utilize insightful information about session properties. A straightforward summation of the embeddings cannot address the intricacies of this task.
In this section, we employ a co-guided learning framework [13] to delineate the intricate relationships between these two types of embeddings and facilitate their mutual enhancement. Specifically, we facilitate the interchange of information between cross-session embeddings and hypergraph embeddings through the concurrent updating of their respective weights. We merge the cross-session embedding with the hypergraph embedding in two distinct manners:
Here, the two transformation matrices are learnable parameters, and the two merged vectors represent the interactive relations between the cross-session embedding and the hypergraph embedding under different semantic spaces.
We then utilize a gating mechanism to further model the mutual relations between these two merged vectors as follows:
Here, the gate acts as a remember gate, which controls how much of the interaction is retained when modeling the relation between the two embeddings. Additionally, we utilize the complement of the gate, namely one minus the gate, to integrate the directed graph (or hypergraph) representations, thereby guiding the learning process and enhancing the semantic relevance of these two types of representations. This mechanism enriches the model’s understanding of relationships across sessions.
Finally, we obtain the enhanced cross-session embedding and the enhanced hypergraph embedding as follows:
Here, the two outputs are the cross-session embedding and the hypergraph embedding whose semantics are enriched by each other. Note that the two embeddings extract information from each other to guide the learning process, which enables our method to model the complex relations between the two channels when predicting recommendations.
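The co-guiding steps described in this subsection (two ways of merging the embeddings, a remember gate, and a complementary gate) can be illustrated with the sketch below. The concrete formulas are assumptions, since they are not recoverable from the text: the two merges are taken to be an element-wise product and a sum, each passed through a learnable matrix and `tanh`, and the gate is a sigmoid of their transformed combination. The function name `co_guide` and the matrices `W1` to `W4` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def co_guide(s_c, s_h, W1, W2, W3, W4):
    """Mutually enhance the cross-session (s_c) and hypergraph (s_h) embeddings.

    Assumed form, for illustration only:
      m_prod = tanh(W1 (s_c * s_h)), m_sum = tanh(W2 (s_c + s_h))
      r      = sigmoid(W3 m_prod + W4 m_sum)        # remember gate
      s_c'   = r * s_c + (1 - r) * s_h
      s_h'   = r * s_h + (1 - r) * s_c
    """
    m_prod = np.tanh(W1 @ (s_c * s_h))         # interaction in a "product" space
    m_sum = np.tanh(W2 @ (s_c + s_h))          # interaction in a "sum" space
    r = sigmoid(W3 @ m_prod + W4 @ m_sum)      # how much of each view to keep
    s_c_new = r * s_c + (1.0 - r) * s_h        # hypergraph view guides s_c
    s_h_new = r * s_h + (1.0 - r) * s_c        # cross-session view guides s_h
    return s_c_new, s_h_new, r

rng = np.random.default_rng(0)
d = 4
s_c = rng.normal(size=d)
s_h = rng.normal(size=d)
W1, W2, W3, W4 = (rng.normal(scale=0.1, size=(d, d)) for _ in range(4))
s_c_new, s_h_new, gate = co_guide(s_c, s_h, W1, W2, W3, W4)
```

Under this assumed form, the gate decides per dimension how much of an embedding is retained and how much is taken from the other view, which is one way to realize the "remember gate plus complement" behavior described above.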
3.3.2. Design Discussion
In this section, we present a co-guiding learning schema with the primary aim of fully exploiting the mutual correlation information within two embeddings. This design offers several benefits:
Diverse Perspective Integration: By using multiple GNNs, each potentially capturing different aspects of the data, the co-guiding mechanism allows for a more comprehensive understanding of the items and their relationships.
Mutual Learning: The co-guiding process facilitates the exchange of knowledge between the different GNNs, leading to a more robust and nuanced learning process. This mutual learning can help in uncovering hidden patterns that a single GNN might miss.
3.4. Model Optimization
Intuitively, the relevance of an item to the current session’s preferences determines its importance for recommendation. Once we have obtained the embeddings for each session, we concatenate the embeddings of a session learned through both channels. This allows us to calculate the score for each potential item, which is defined as follows:
Next, we use the SoftMax function to obtain the model output as follows:
For each session, the loss function is defined as the cross-entropy between the predicted outcomes and the actual data. This can be expressed mathematically as:
Here, the ground-truth vector is the one-hot encoding of the item that was actually clicked next.
Finally, the objective function of the SBR task is given as follows:
Here, the regularized quantity is the set of model parameters, λ is a hyperparameter, and the L2-regularization term, weighted by λ, prevents over-fitting.
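The prediction and optimization steps can be sketched as follows. This is a simplified illustration under stated assumptions: the score of an item is taken to be the inner product between the concatenated session embedding and the concatenated item embedding of the two channels, the loss is the standard cross-entropy against a one-hot target, and the function names and toy values are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def score_items(session_c, session_h, items_c, items_h):
    """Score every candidate item for one session (assumed inner-product form)."""
    s = np.concatenate([session_c, session_h])        # (2d,)
    X = np.concatenate([items_c, items_h], axis=1)    # (N, 2d)
    return X @ s                                      # (N,)

def session_loss(scores, target, params, lam):
    """Cross-entropy with a one-hot target plus lam * sum of squared parameters."""
    y_hat = softmax(scores)
    cross_entropy = -np.log(y_hat[target])
    l2 = sum(float(np.sum(p ** 2)) for p in params)
    return cross_entropy + lam * l2

rng = np.random.default_rng(0)
N, d = 6, 4
items_c, items_h = rng.normal(size=(N, d)), rng.normal(size=(N, d))
scores = score_items(rng.normal(size=d), rng.normal(size=d), items_c, items_h)
probs = softmax(scores)
top_k = np.argsort(-probs)[:3]                        # indices of the top-3 items
loss = session_loss(scores, target=2, params=[items_c, items_h], lam=1e-5)
```

Because the target vector is one-hot, the cross-entropy reduces to the negative log-probability assigned to the item that was actually clicked next.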
4. Experiments
To verify the efficacy of the proposed CGH-SBR model and the precision of its recommendation results, a series of experiments are conducted on two datasets. These experiments aim to address the following research questions:
RQ1: What are the performance benefits of the CGH-SBR model over existing SBR techniques?
RQ2: How impactful is each component of the CGH-SBR in ensuring accurate recommendation results?
RQ3: How do varying hyperparameters influence the performance of the CGH-SBR method?
Concretely, this section is structured as follows: We first provide a detailed description of the dataset employed for the experiments. We then proceed to introduce the baseline models and the experimental framework. Lastly, we outline the experiments designed to validate the proposed CGH-SBR, analyzing the results to demonstrate its effectiveness.
4.1. Datasets
For our evaluation, we utilize two distinct datasets [14,15]:
Tmall: Originating from the IJCAI-15 competition, this dataset comprises anonymized user shopping logs from the Tmall online shopping platform.
Retailrocket: This dataset, released for a Kaggle contest by an e-commerce company, contains users’ browsing activity over a six-month period.
The comprehensive statistical details for both datasets are presented in Table 1, including the number of training sessions, test sessions, and items, as well as the average session length.
4.2. Baselines and Experimental Settings
To thoroughly evaluate the efficacy of our proposed algorithm, we conducted a comparative analysis with several well-known SBR algorithms from different research streams, detailed as follows:
(1) RNN-based Approaches:
GRU4REC [8]: This model employs multiple stacked GRU layers to encode session sequences and utilizes a ranking loss for model training.
(2) Attention-based Approaches:
NARM [16]: This is a neural attentive model that augments a recurrent network for SBR with an attention mechanism, so that sequential items are weighted differently during encoding.
STAMP [17]: This approach replaces all RNN encoders from prior work with attention layers, aiming to better capture both current user interests and general user interests.
(3) GNN-based Approaches:
SR-GNN [9]: This models every session as a graph and then employs a gated graph neural network to capture the complex item transitions inside sessions.
FGNN [18]: This model converts sessions into a global graph and employs a graph attention layer to learn item representations.
S2-DHCN [10]: It designs two types of hypergraphs to learn inter- and intra-session information and employs self-supervised learning to enhance session-based recommendation.
COTREC [19]: This model takes into account the internal and external connectivity of sessions and frames it as a contrastive learning model for session-based recommendation.
KGCL [20]: This builds an item attribute hypergraph with item knowledge and develops HCNs to establish the associations among items with common attributes and encode the complex high-order information among items.
HEML [21]: This extracts dynamic multiple interests from dual-scale sequential patterns and constructs a hypergraph to model global multi-order dependencies.
To evaluate the performance of the methods on the test set, we followed a common strategy for top-K recommendation and preference ranking. We used two widely used evaluation metrics, P@K and MRR@K, where K denotes the number of top recommendations considered.
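For reference, the two metrics can be computed as in the sketch below. It assumes the definitions commonly used in SBR work, which the text does not spell out: P@K is the fraction of test sessions whose ground-truth next item appears among the top-K recommendations, and MRR@K is the mean of the reciprocal rank of that item, counted as 0 when it falls outside the top K. The function name and the toy rankings are hypothetical.

```python
def precision_and_mrr_at_k(ranked_lists, targets, k):
    """Compute P@K (hit rate) and MRR@K over a set of test sessions.

    ranked_lists: one list of item ids per session, best candidate first.
    targets: the ground-truth next item of each session.
    """
    hits = 0
    reciprocal_rank_sum = 0.0
    for ranked, target in zip(ranked_lists, targets):
        top_k = ranked[:k]
        if target in top_k:
            hits += 1
            # Ranks are 1-based: the first recommendation has rank 1.
            reciprocal_rank_sum += 1.0 / (top_k.index(target) + 1)
    n = len(targets)
    return hits / n, reciprocal_rank_sum / n

# Three hypothetical test sessions evaluated at K = 2.
ranked_lists = [[3, 1, 2], [5, 4, 6], [7, 8, 9]]
targets = [1, 6, 0]
p_at_2, mrr_at_2 = precision_and_mrr_at_k(ranked_lists, targets, k=2)
```

In this toy case only the first session is a hit, with the target at rank 2, so P@2 is 1/3 and MRR@2 is 0.5/3.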
Our proposed CGH-SBR model is implemented with PyTorch (version 0.4.0), a widely used open-source deep learning framework. Consistent with the setup in [3,20,21], we set the embedding size $d$ to 100, the mini-batch size to 100, and the L2 regularization to $10^{-5}$ to mitigate overfitting. In our model, all parameters are initialized from a Gaussian distribution with a mean of 0 and a standard deviation of 0.1. We trained our model for 30 epochs with the Adam optimizer, using a learning rate of 0.001 and a decay rate of 0.96. A three-layer architecture yielded the best performance. For the baseline models, we adopted the best parameter configurations reported in their respective original papers and reported their results directly if they were available, as we utilized the same datasets and evaluation metrics.
4.3. Performance Comparison (RQ1)
Initially, we benchmarked the CGH-SBR model against several baseline algorithms, with the results presented in Table 2.
Our evaluation focuses on the Top-K performance metrics, following [20,21]. We have bolded the top-performing results and underlined the best results among the baseline models in each case. We have also included the improvement rates over the second-best method, providing a clearer picture of the performance gains achieved by the CGH-SBR model over the baselines across the different evaluation metrics. Overall, the following observations can be made from this table:
GRU4REC was the pioneering session-based model to utilize a recurrent architecture for capturing sequential data. However, a notable limitation of RNN-based approaches is ‘catastrophic forgetting’, where initial information is lost as sequences progress. The attention-based baselines, NARM and STAMP, integrate a self-attention mechanism that concentrates on the last item, considering it as the key element. This strategy overcomes the linear sequence processing of RNNs, as evidenced by their superior performance over GRU4REC. This is further substantiated by STAMP’s superior outcomes compared to NARM. STAMP abandons the recurrent architecture altogether and prioritizes the last item with great emphasis.
Among the GNN-based models, S2-DHCN and COTREC show consistent improvement over their respective original models, SR-GNN and FGNN, across all datasets. This indicates that contrastive learning does enhance recommendation performance. Moreover, KGCL and HEML exhibit consistently better performance than their counterparts across all datasets, suggesting that the hypergraph structure indeed contributes to the effectiveness of recommendations.
The CGH-SBR model consistently outperforms all baseline methods across four key performance metrics on two benchmark datasets. This success can be attributed to several critical factors. Firstly, CGH-SBR leverages cross-session item transitions to capture long-term user preferences and sequential dependencies. Secondly, CGH-SBR goes beyond pairwise transitions by modeling higher-order correlations among items within sessions, which can capture complex relationships involving multiple items simultaneously. This ability to account for multi-item contexts enhances the model’s capacity to understand nuanced user preferences and recommend items that align closely with their interests. Finally, the co-guided learning framework further improves performance by fostering collaboration between the GNN and the HGNN. By capturing the interplay between the cross-session graph and the hypergraph, the framework ensures that the model benefits from both local and global perspectives of user-item interactions.
4.4. Ablation Studies (RQ2)
In this subsection, we integrate FGNN as an additional framework to further evaluate the impact of each of our novel components. In particular, FGNN employs gated graph neural networks (GNNs) to encapsulate the intricate cross-session transitions between items within sessions. Thus, in our experimental setup, we perform ablation studies on the individual components of both CGH-SBR and FGNN by removing or integrating the hypergraph channel and the co-guided learning schema, respectively. The results of these model configurations are detailed in Table 3.
From the table, we make the following observations:
(1) Effectiveness of dual-channel mechanism
The dual-channel architecture indeed boosts performance by explicitly accounting for both cross-item transitions and hypergraph high-order correlations.
As demonstrated in Table 3, the dual-channel design (Model-1) surpasses the FGNN baseline in performance. This improvement stems from our model’s capacity to encompass a comprehensive range of item features, incorporating both cross-item transitions and hypergraph high-order correlations, which are crucial for accurate and reliable recommendation results. Results from Model-3 and Model-4 indicate that excluding either channel leads to a decline in performance. This highlights that focusing solely on cross-item transitions or on hypergraph high-order correlations is insufficient for the model to fully comprehend the intricate nature of item interactions. Consequently, our dual-channel approach effectively integrates both types of interactions, enabling the model to capture these complexities and enhance overall recommendation performance.
(2) Effectiveness of Co-guided Learning Schema
To evaluate the effectiveness of the co-guided learning framework, we integrate it into Model-1 to obtain Model-2. As shown in Table 3, incorporating the framework notably improves the performance of FGNN. Furthermore, when we replace the co-guided learning framework with vector concatenation (Model-5), performance decreases significantly, indicating that without the framework the model is less able to capture the reciprocal influence and synergistic effects between the two types of interactions. These results suggest that the gain comes from learning a distinct session representation for each interaction type and letting the two guide each other, rather than from merely combining independently learned representations.
In essence, the dual-channel architecture enables the accurate differentiation and representation of both interaction types within the model. Additionally, the co-guided learning framework boosts the model’s capability to capture the intricate relationships between these interactions, leading to superior performance.
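The contrast between concatenation (as in Model-5) and a fusion in which the channels influence each other can be illustrated with a small sketch. The gated function below is a hypothetical stand-in chosen for brevity, with learned projections omitted; it is not the co-guided formulation of CGH-SBR, which is defined in the method section.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def concat_fusion(h_graph, h_hyper):
    """Concatenation: the two channel outputs are joined without interacting."""
    return list(h_graph) + list(h_hyper)


def gated_mutual_guidance(h_graph, h_hyper):
    """Hypothetical stand-in for mutual guidance (not the paper's formulation).

    An element-wise gate computed from both channels decides how much of the
    other channel's signal is mixed into each representation.
    """
    gates = [sigmoid(a + b) for a, b in zip(h_graph, h_hyper)]
    guided_graph = [a + g * b for a, b, g in zip(h_graph, h_hyper, gates)]
    guided_hyper = [b + (1.0 - g) * a for a, b, g in zip(h_graph, h_hyper, gates)]
    return guided_graph, guided_hyper


# Toy 2-dimensional session representations from the two channels.
h_graph = [0.0, 0.0]
h_hyper = [1.0, 1.0]
fused = concat_fusion(h_graph, h_hyper)
guided_graph, guided_hyper = gated_mutual_guidance(h_graph, h_hyper)
```

With concatenation, each half of the fused vector is unchanged by the other channel. In the gated variant, the graph-channel representation moves toward the hypergraph signal, which is the kind of reciprocal influence that concatenation cannot express.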
4.5. The Impact of Hyper-Parameters (RQ3)
In the following subsections, we conduct experiments to examine the influence of key hyperparameters on the performance of the CGH-SBR model.
4.5.1. Hidden State Dimensionality d
We varied the embedding dimension d from 20 to 140. Figure 2 presents the experimental results in terms of P@20 and M@20, illustrating the effect of varying embedding dimensions on both datasets. Performance improves as the embedding dimension is increased from 20 to 100. However, when the embedding dimension is further increased from 100 to 140, there is a slight decline in performance on both datasets. This decline is likely attributable to overfitting, as the larger embedding space gives the model more capacity than the available training data can support. Notably, only a minor variation in performance is observed across the tested range, which suggests that CGH-SBR is relatively insensitive to the choice of hidden state dimensionality.
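For reference, the two reported metrics can be computed as in the sketch below. It assumes the definitions commonly used in session-based recommendation: P@K is the fraction of test cases whose target item appears among the top K predictions, and M@K is the mean reciprocal rank, with a contribution of zero when the target falls outside the top K. The function name and the toy data are illustrative.

```python
def precision_and_mrr_at_k(ranked_lists, targets, k=20):
    """Return (P@K, M@K) under the common session-based recommendation definitions.

    ranked_lists: one list of item ids per test case, best prediction first.
    targets: the ground-truth next item for each test case.
    """
    hits = 0
    reciprocal_rank_sum = 0.0
    for ranked, target in zip(ranked_lists, targets):
        top_k = ranked[:k]
        if target in top_k:
            hits += 1
            reciprocal_rank_sum += 1.0 / (top_k.index(target) + 1)
    n = len(targets)
    return hits / n, reciprocal_rank_sum / n


# Toy example with K = 3: targets are ranked 2nd, 3rd, and not retrieved.
ranked_lists = [[5, 3, 9], [7, 2, 4], [1, 8, 6]]
targets = [3, 4, 0]
p_at_3, mrr_at_3 = precision_and_mrr_at_k(ranked_lists, targets, k=3)
```

Here two of the three targets are retrieved, so P@3 is 2/3, and the reciprocal ranks 1/2, 1/3, and 0 average to 5/18. A hyperparameter sweep such as the one over d would evaluate these two quantities once per configuration.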
4.5.2. Depth of Graph Convolution L
To assess the model’s performance with varying layer depths, experiments were conducted on both datasets. Figure 3 presents the experimental results for each dataset in terms of P@20 and M@20.
Figure 3 shows that the best performance is achieved with a three-layer architecture on both datasets. As the number of layers increases, so does the computational cost of the model, and performance begins to decline beyond this depth. The decline is likely due to the additional embedding propagation layers, which can introduce noise when modeling associations between items and lead to over-smoothing.
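The over-smoothing effect can be reproduced in isolation with a small sketch. It assumes a simplified propagation rule, mean aggregation over each node and its neighbors with no learned weights or nonlinearities, which is not the propagation rule of CGH-SBR; the toy graph and feature values are also illustrative.

```python
def propagate(adjacency, features):
    """One mean-aggregation layer: each node averages itself and its neighbors."""
    n = len(features)
    dim = len(features[0])
    out = []
    for i in range(n):
        group = [i] + [j for j in range(n) if adjacency[i][j]]
        out.append([sum(features[j][d] for j in group) / len(group) for d in range(dim)])
    return out


def spread(features):
    """Largest per-dimension gap between any two node embeddings."""
    dim = len(features[0])
    return max(
        max(f[d] for f in features) - min(f[d] for f in features) for d in range(dim)
    )


# Toy undirected path graph over four items with 2-dimensional embeddings.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

spreads = []
hidden = features
for _ in range(8):
    hidden = propagate(adjacency, hidden)
    spreads.append(spread(hidden))
```

The recorded spread shrinks with every additional layer, meaning the item embeddings become progressively harder to distinguish. This is consistent with the observation that stacking layers beyond a moderate depth does not help.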
5. Conclusions and Future Work
In this study, we present a novel method termed cross-session graph and hypergraph co-guided session-based recommendation (CGH-SBR), which predicts the next item a user is likely to interact with. First, we structure cross-session item transitions into a directed graph that encapsulates sequential dependencies, and we model higher-order correlations among items within sessions to construct a hypergraph. Next, we develop two specialized graph neural networks (GNNs) to extract distinct item representations from these two graph structures. We then introduce a co-guided learning framework that combines the two views and enables collaborative learning between them. Experiments on two benchmark datasets show that CGH-SBR outperforms the baseline methods. However, incorporating cross-session dependencies and higher-order correlations into the model may increase computational overhead, especially on large-scale datasets, which could constrain the model’s scalability in real-world scenarios. Furthermore, although the capacity to capture higher-order correlations improves the model’s ability to represent complex relationships, it also raises the likelihood of overfitting, particularly when the training data lacks diversity or is small.
While our current approach relies on static graph structures, future work could explore dynamic graph modeling techniques to adapt to evolving user preferences and item relationships over time. Incorporating temporal information into both the directed graph and the hypergraph could further enhance the model’s ability to capture real-time user behavior. Moreover, integrating additional data sources, such as user demographics, contextual information, and item attributes, could provide richer representations for both graphs. This would enable the model to consider a wider set of factors when making recommendations, potentially improving personalization.