
Computers & Industrial Engineering 185 (2023) 109627

Contents lists available at ScienceDirect

Computers & Industrial Engineering


journal homepage: www.elsevier.com/locate/caie

Providing prediction reliability through deep neural networks for recommender systems✩

Jiangzhou Deng a, Hongtao Li b, Junpeng Guo b, Leo Yu Zhang c, Yong Wang a,∗

a School of Economics and Management, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
b College of Management and Economics, Tianjin University, Tianjin 300072, China
c School of Information and Communication Technology, Griffith University, Southport, QLD 4215, Australia

ARTICLE INFO ABSTRACT

Keywords: Deep learning-based recommendation approaches have shown significant improvement in the accuracy of
Reliability recommender systems (RSs). However, beyond accuracy, reliability measures are gaining attention to evaluate
Data pre-processing the validity of predictions and enhance user satisfaction. Such measures can ensure that the recommended
Deep neural networks
items are high-scoring items with high reliability. To integrate the native concept of reliability into a deep
Recommender systems
learning model, this paper proposes a deep neural network-based recommendation framework with prediction
reliability. This framework filters out unreliable prediction ratings according to a pre-defined reliability
threshold, ensuring the credibility and reliability of top-N recommendation. The proposed framework relies
solely on user ratings for reliability, making it highly generalizable and scalable. Additionally, we design a
data pre-processing method to address the issue of uneven distribution of ratings before model training, which
effectively improves the effectiveness and fairness. The experiments on four benchmark datasets demonstrate
that the proposed scheme is superior to other comparison methods in evaluation metrics. Furthermore, our
framework performs better on sparse datasets than on dense datasets, indicating its ability to make strong
predictions even with insufficient information.

1. Introduction

Recommender systems (RSs) are effective information filtering techniques that help online users navigate through the vast amount of complex information to find products or services that they are interested in. They effectively cope with the information overload problem and improve user satisfaction. Collaborative filtering (CF) methods, which provide personalized recommendations for users, have been widely applied to RSs due to their simplicity and efficiency. Typically, CFs are divided into two classes: memory-based and model-based CFs (Duan, Jiang, & Jain, 2022). However, many studies have confirmed that the performance of model-based CFs is generally superior to that of memory-based CFs due to their accuracy and scalability. In model-based CFs, matrix factorization (MF) is the most prevailing latent factor model (LFM) (Koren, Rendle, & Bell, 2022), predicting user preferences with a linear kernel, i.e., a dot product of user and item latent feature vectors. Nevertheless, this approach may not capture the complex links of user–item interactions effectively.

Recently, deep learning (DL) approaches have been introduced into RSs to overcome the limitations of MF-based CFs and significantly improve recommendation accuracy (Chen, Cai, Chen, & Rijke, 2019). DL-based recommendation algorithms primarily adopt deep neural networks (DNNs) to explore auxiliary information, such as textual descriptions of products and image features of videos, and automatically model latent feature representations from given inputs (Zhang, Yao, Sun, & Tay, 2019). For instance, He et al. (2017) developed a general neural network-based CF framework called NCF to learn user–item interactions instead of using an inner product. Xue et al. presented an advanced item-based CF model based on DNNs to effectively learn the higher-order relations among items for top-N recommendations (Xue et al., 2019). Chen et al. (2019) proposed a joint neural CF model (JNCF) that seamlessly integrates deep features of users and items with deep modeling of user–item interactions. Xu et al. (2022) introduced symmetric DNNs with lateral connections to capture complex mapping relations and low-rank relations between users and

✩ This work is supported by the National Natural Science Foundation of China (No. 72301050, 72171165, and 62272077), the MOE Layout Foundation of
Humanities and Social Sciences (No. 20YJAZH102 and 21YJA630021), the Natural Science Foundation of Chongqing (No. cstc2021jcyj-msxmX0557), and the
Science and Technology Research Program of Chongqing Municipal Education Commission (No. KJQN202300605).
∗ Corresponding author.
E-mail addresses: [email protected] (J. Deng), [email protected] (H. Li), [email protected] (J. Guo), [email protected] (L.Y. Zhang),
[email protected] (Y. Wang).

https://doi.org/10.1016/j.cie.2023.109627
Received 2 January 2023; Received in revised form 16 August 2023; Accepted 18 September 2023
Available online 20 September 2023
0360-8352/© 2023 Elsevier Ltd. All rights reserved.

items jointly. However, to the best of our knowledge, available DL-based recommendation approaches completely ignore the reliability of prediction results, which can significantly impact recommendation quality. Bobadilla, Gutiérrez, Ortega, and Zhu (2018) emphasized the importance of "reliable" recommendations. For instance, an RS may recommend a hotel with 5 stars to an active user, but the user may not be fully convinced by the recommendation result. This is because most RSs can provide additional information, such as the number of users who have rated the hotel and social trust relationships among users, to allow users to infer the reliability of recommended items. Typically, people prefer to choose an item with an average vote of 4.7 stars and 1000 ratings rather than an item with an average vote of 5 stars and only 10 ratings. More user ratings can enhance the reliability of recommended items.

Providing related reliability values for prediction ratings can effectively handle or filter out low-reliability predictions in bulk, resulting in improved accuracy in RSs. Moreover, reliability information is a powerful tool to enhance users' trust in and loyalty to an RS due to its explainability. Reliability values also provide adequate information to modulate the probability of predictions (Ortega, Lara-Cabrera, González-Prieto, & Bobadilla, 2021). To our knowledge, only a few proposed approaches based on KNN and MF models have focused on the reliability of recommendations. For instance, Hernando, Bobadilla, Ortega, and Tejedor (2013) introduced a memory-based CF framework that uses a reliability measurement associated with the predictions to improve the performance of RSs. Moradi and Ahmadian (2015) proposed a reliability-based trust-aware CF approach that builds trust networks of users by fusing similarities and trust statements, and then measures the quality of prediction ratings to enhance recommendation accuracy. Bobadilla et al. (2018) designed two reliability measures for various recommendation algorithms to achieve better accuracy results. Ortega et al. (2021) proposed a matrix factorization based on the Bernoulli distribution, named BeMF, to obtain both prediction ratings and their corresponding reliability values. However, it is crucial to consider the reliability of recommended items in DL-based recommendation methods to ensure their credibility to users, as most DL-based recommendations only focus on the ranking of recommended items.

Based on the above discussion and analysis, a DNN-based recommendation framework with prediction reliability is proposed to help RSs filter out unreliable prediction ratings and ensure the credibility and reliability of top-N recommendations. The main advantages of the model are as follows: (1) Universality. It does not depend on additional information (i.e., social information and reviews) other than ratings to provide prediction reliability for widely used RS datasets. Furthermore, it avoids using external reliability measures to evaluate the reliability of predictions since the related reliability values are intrinsically linked to the used DL model. (2) Non-linearity. Compared with traditional MF methods that integrate the user and item latent feature vectors linearly, the DNN model can capture the complex structure of user–item interactions effectively. (3) Flexibility. The proposed scheme can independently calculate the probability that a user assigns any discrete rating on the rating scale to an item, hence making it completely general and flexible from the perspective of probability.

Our main contributions are summarized as follows:

1. A user–item interaction matrix is divided into several binary sub-matrices according to different rating values, and a two-tower neural network is adopted to train each sub-matrix in parallel and obtain the probabilities that a user assigns different ratings to the same item.
2. The proposed scheme is based on the deep learning classification technique rather than regression methods, capturing more information and allowing the aggregation of probabilities into normalized reliability values.
3. A pre-processing solution is designed to eliminate uneven rating distribution, effectively resolving the problem of an unequal number of ratings in each sub-matrix caused by rating preference bias while improving the validity and fairness of model training.
4. Experiments on four popular datasets indicate that our scheme is better than other comparison methods in various evaluation metrics.

The remainder of this paper is organized as follows. In Section 2, previous studies relevant to DL-based recommendations and reliability measures are introduced. Section 3 presents our design framework to generate a set of predictions with the corresponding reliability probability. In Section 4, experiments are conducted to demonstrate the effectiveness of our framework. The conclusions and future work are discussed in Section 5.

2. Related works

In this section, we briefly introduce previous studies that are relevant to DL-based recommendation approaches and reliability measures for RSs.

2.1. DL-based recommendations

With the comprehensive and in-depth development of DL techniques across various industries, DL approaches have also drawn a lot of attention in the area of RSs. They have brought opportunities to address the data sparsity problem and improve the quality of recommendations. Unlike traditional neural networks, such as artificial neural networks and the multi-layer perceptron (MLP) (Jurik, 1992; Werbos, 1988), deep learning-based neural network approaches, such as convolutional neural networks (CNNs) (Kim, Park, Oh, Lee, & Yu, 2016), recurrent neural networks (RNNs) (Zhu et al., 2021), and the deep structured semantic model (DSSM) (Huang et al., 2013) (also known as the two-tower model), emphasize model depth and the importance of feature learning, enabling a more effective portrayal of the intrinsic information of data and thereby enhancing the accuracy and reliability of model training results. In recent years, many studies have proposed various DL-based recommendation models that improve the performance of RSs.

Ziarani and Ravanmehr (2021) combined a CNN with the Particle Swarm Optimization (PSO) method for generating serendipitous recommendations. Deng, Huang, Wang, Lai, and Philip (2019) proposed a simple but effective framework called DeepCF, based on the vanilla MLP model, that flexibly addresses complex matching problems and learns low-rank user–item relations effectively. Xue, Dai, Zhang, Huang, and Chen (2017) introduced deep matrix factorization (DMF), a neural network architecture that maps users and items into a common low-dimensional space by utilizing non-linear operations. To effectively capture deep semantic features of users and items, Ni, Huang, Cheng, and Gao (2021) proposed a deep representation learning-based recommendation model (RM-DRL) that makes full use of auxiliary item information. To tackle the cold-start problem, Ma, Geng, and Wang (2020) incorporated three types of interactions between services and mashups into a DNN to construct a multiplex interaction-oriented service recommendation model (MISR). Wang, He, Wang, Feng, and Chua (2019) designed neural graph collaborative filtering (NGCF) to effectively integrate the bipartite graph structure into the embedding process in an explicit manner. Sun et al. (2020) developed a Bayesian graph CNN framework to handle misleading positive interactions in an implicit manner. Xia et al. (2022) proposed a self-supervised recommendation framework, Hypergraph Contrastive Collaborative Filtering (HCCF), to jointly capture local and global collaborative relations. Li et al. (2023) advocated a Siamese Graph Contrastive Consensus Learning (SGCCL) framework to explore intrinsic correlations and alleviate bias effects for personalized recommendation. Despite the success


of these DL-based recommendation models in terms of top-N recommendations, none of them considers prediction reliability, which has a significant impact on recommendation quality and user satisfaction. Therefore, we argue that incorporating prediction reliability into DL-based models can further improve recommendation accuracy.

2.2. Reliability measures for RSs

Trust information and reliability are closely related concepts, as they often overlap (Ortega et al., 2021). In RSs, trust or reliability information can be collected explicitly from users or inferred implicitly from historical user ratings of items (Guo, Zhang, & Thalmann, 2014; Yu, Guo, Li, Wang, & Fan, 2019), which can be used by machine learning models or reliability measures to obtain reliability values of predictions. Various studies have proposed incorporating trust or reliability information into recommendation models. For example, Shen, Zhang, Yu, and Min (2019) proposed a sentiment-based MF method that considers the reliability between reviews and ratings to obtain more reliable and fine-grained prediction ratings. Gohari, Aliee, and Haghighi (2018) combined a confidence-based recommendation model with trust statements and certainty information to improve recommendation performance. Zhu, Ortega, Bobadilla, and Gutiérrez (2018) introduced an MF-based architecture and method to assign reliability values to prediction ratings, which improved the quality of results. Azadjalal, Moradi, Abdollahpouri, and Jalili (2017) proposed a trust-based CF method that identified implicit trust information by adopting a reliability measure to improve recommendation accuracy. Wu, Yuan, Duan, and Wu (2019) integrated a social recommendation model with a restricted Boltzmann machine and trust information to alleviate the data sparsity problem. Mesas and Bellogín (2020) developed a taxonomy of techniques that can embed awareness into recommendation models to determine whether a predicted item should be recommended. Margaris, Vassilakis, and Spiliotopoulos (2020) recognized the uncertainty level of reviews due to the features of natural language and calculated the reliability of results in the review-to-rating conversion procedure to ensure that the obtained ratings fully reflect user preference behaviors. Ahmadian, Afsharchi, and Meghdadi (2019) proposed a multi-view reliability measure-based recommendation method to accurately evaluate similarity results and find the most reliable neighborhood. Su, Zheng, Ai, Shang, and Shen (2019) presented a similarity confidence coefficient to balance the reliability of similarity results and enhance the accuracy of neighbor selection by accounting for information asymmetry in the similarity calculation.

However, despite the positive impact of the aforementioned studies on the development of reliable recommendations, there are still two main issues that need to be addressed. First, traditional machine learning methods like KNN and MF often fall short in achieving satisfactory results due to limitations in accuracy, flexibility, and non-linearity. Second, most existing recommendation techniques require supplementary data such as social relationships and reviews, which may not be available in all ratings-based RSs. This poses a major challenge in addressing reliability concerns in such systems. To the best of our knowledge, there are currently few recommendation models that can generate <prediction, reliability> pairs from DL approaches solely based on user rating information. This is because most DL-based recommendation models use implicit feedback to predict user preferences rather than explicit rating information, which leads to difficulties in assessing prediction reliability. Thus, we propose a novel framework that utilizes classic DL methods to provide prediction reliability for rating-based RSs.

3. Proposed framework

Fig. 1 depicts the proposed framework used to generate a set of reliable predictions. Firstly, a user–item rating matrix R is divided into several independent and parallel sub-matrices based on different rating values. Taking the MovieLens dataset as an example, where the rating domain is {1, 2, 3, 4, 5}, we obtain five sub-matrices that correspond to each possible rating value, as illustrated in Fig. 1. Each sub-matrix contains three pieces of basic information: if a user rating matches the corresponding rating value of a sub-matrix, it is labeled as 1; if different, it is labeled as 0; and the unrated areas are denoted by "-". In addition, to enhance the fairness, accuracy, and reliability of follow-up model training, we design a novel data pre-processing method in Section 4.4 to address the uneven distribution of different rating values. Subsequently, we adopt a classic DNN-based semantic modeling approach, named the two-tower model (Huang et al., 2013), rather than MF, to predict the unrated areas in each sub-matrix and enjoy its flexibility and non-linearity. Here, the feature representation method MLP (He et al., 2017) is used to learn user and item latent embedding vectors in the two-tower model. To ensure the probabilities of predictions are unbiased, we normalize the probability distributions of prediction ratings on an unrated item i for a user u returned by each sub-matrix so that they always sum to 1. Finally, we set a reliability threshold θ, which filters out unreliable predictions and keeps high-quality prediction ratings that have higher reliability probabilities.

Fig. 1. Parallel deep neural networks-based reliability prediction model framework.

3.1. Two-tower model

Firstly, we only use a user ID and an item ID as the input features, mapping them to dense, low-dimensional vectors as the feature embedding vectors; each embedding vector is initialized by a random function. The purpose of this is that, with such a generic feature representation, the model can be easily modified to alleviate the cold-start problem by using user and item features. Then, a multi-layer perceptron (MLP) is used to learn better user and item latent features by fully utilizing the non-linearity and high capacity of DNNs [19]. The main steps of the MLP model are presented as follows.

Step 1: In the input layer for users, the feature embedding vector of a user u is denoted as z_u.

Step 2: In the hidden layers (MLP layers 1, …, X − 1), the Rectifier (ReLU) is used as the activation function to avoid overfitting of the model; it has proved to be an efficient method that is well-suited for sparse datasets (Fuentes, Parra, Anthony, & Kreinovich, 2017; He et al., 2017).

ReLU(z_u) = max(0, z_u).  (1)

Then, the outputs l of all hidden layers are obtained by

l_1 = W_1^T z_u,
l_n = ReLU(W_n^T l_{n−1} + b_n),  n = 2, …, X − 1,  (2)


where W_n and b_n denote the weight matrix and bias vector for the nth hidden layer, respectively.

Step 3: In the output layer (MLP layer X), we can obtain the latent embedding vector of user u. It can be seen as the latent vector p_u of user u.

p_u = l_X = ReLU(W_X^T l_{X−1} + b_X).  (3)

Similarly, the item latent vector q_i can be obtained in the same manner. Therefore, the matching function to calculate the predicted rating probability y*_ui that user u is matched by item i is defined as

y*_ui = σ(W_out^T (p_u ⊙ q_i)),  (4)

where W_out and ⊙ denote the weight matrix and the element-wise product of vectors, respectively; the sigmoid function σ(x) = 1/(1 + e^(−x)) is used to limit the output to the range (0, 1).

In addition, selecting an appropriate objective function for learning model parameters is an important part. Let ℓ(⋅) and Ω(Θ) be a loss function and the regularizer, respectively. The generic objective function is defined as:

Φ = Σ_{u∈U} Σ_{i∈I} ℓ(y_ui, y*_ui) + λΩ(Θ).  (5)

3.2. Learning the model

For training recommender systems, two types of objective functions, point-wise and pair-wise, are widely used to learn models and obtain optimal parameters. Point-wise objective functions concentrate on predicting accurate and reliable ratings, which is well-suited for rating prediction tasks (Kabbur, Ning, & Karypis, 2013). Pair-wise objective functions pay attention to the relative order of predicted items, which is more suitable for top-N recommendation (He et al., 2017; He, Zhang, Kan, & Chua, 2016). In this paper, we mainly focus on rating predictions, so a point-wise objective function is used to optimize the model.

Squared loss is widely used in existing point-wise functions (He et al., 2016; Wang, Fu, Hao, Tao, & Wu, 2016) and is more suitable for explicit feedback than implicit feedback. It is formulated as

ℓ_sl = Σ_{u∈U} Σ_{i∈I} w_ui (y_ui − y*_ui)²,  (6)

where w_ui is a hyper-parameter denoting the weight of the training instance (u, i). While the squared loss is based on the assumption that observations obey a Gaussian distribution (Mnih & Salakhutdinov, 2007), it may not match binary data well. Therefore, we use the log loss below (Kabbur et al., 2013)

ℓ_ls = − Σ_{u∈U} Σ_{i∈I} [y_ui log y*_ui + (1 − y_ui) log(1 − y*_ui)],  (7)

to pay special attention to the binary property of implicit feedback in this paper.

3.3. Predicting rating with reliability probability

Through independently training the MLP model for each sub-matrix, we can obtain a set of probability distributions y*^r (r = r_1, r_2, …, r_max, where r_max denotes the maximum of the rating scale) of prediction ratings. In what follows, we normalize the summation of the probabilities of predictions obtained by each sub-matrix. For the probability distribution y*^r_ui of the prediction rating of user u on unrated item i, the normalized probability ŷ^r_ui that user u assigns rating r to item i is calculated by

ŷ^r_ui = y*^r_ui / Σ_{r'=1}^{r_max} y*^{r'}_ui.  (8)

In this way, the probability value ŷ^r_ui also represents the reliability that we have in predicting r. Then, we determine the rating value r̃ ∈ {r_1, r_2, …, r_max} with the maximum probability as the prediction rating p_ui of user u on unrated item i by using

p_ui = r̃ = arg max_r ŷ^r_ui.  (9)

Therefore, in recommender systems, we can set a reliability threshold θ > 0 to filter out a set of unreliable prediction ratings. If the reliability probability ŷ^r̃_ui of a prediction rating p_ui is less than the threshold θ, the rating p_ui is considered unreliable; otherwise, it is reliable.

4. Experiments

4.1. Experimental environment

All experiments are conducted in the same operating environment. The specific configuration is as follows:

• Operating system: Windows 10
• CPU: Intel(R) Xeon(R) Gold 5218B @ 2.30 GHz
• Primary memory: 384 GB
• Development platform: PyCharm
• Development language: Python 3.8

4.2. Dataset preparation

We implement experiments on four public datasets, including two MovieLens datasets, called ML-100K and ML-1M, and two Amazon product datasets, called Apps for Android (AA) and Movies & TV (MT). These datasets consist of user IDs, item IDs, and ratings, and the rating scale is 1–5. Table 1 shows the basic information about these datasets.

Table 1
Statistics for the four public datasets.

Dataset            #Users    #Items    #Ratings     Sparsity level
ML-100K               943      1682      100,000            6.305%
ML-1M                6040      3952    1,000,209            4.190%
Apps for Android   87,271    13,209      752,937            0.065%
Movies & TV       123,960    50,052    1,697,533            0.027%

To test the effectiveness of the proposed framework, each dataset was divided into two parts: a training set formed by randomly selecting 80% of the records from the rated items of each user, and a testing set formed by the remaining 20% of the data.

4.3. Evaluation metrics

To evaluate the performance of item recommendations, we choose two evaluation metrics: Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG). These metrics are introduced as follows:

HR: It measures whether the actual recommendation items in the testing set appear in the top-k predicted recommendation list, which indicates the item recommendation ability of a model. It is defined as

HR@k = (1/m) Σ_{i=1}^{m} hits(i),  (10)

where m denotes the number of users in the system, and hits(i) represents the proportion of the top-k predicted recommendation items for the ith user that appear in the set of actual recommendation items.

NDCG: It accounts for the position of a hit by assigning higher scores to hits at top ranks, which shows the item ranking quality of a model. It is formulated as

NDCG@k = DCG@k / IDCG@k,  (11)


where DCG@k = Σ_{p=1}^{k} (2^{rel_p} − 1)/log_2(p + 1) and IDCG@k = Σ_{p=1}^{|REL|} (2^{rel_p} − 1)/log_2(p + 1) denote the discounted cumulative gain and the ideal DCG, respectively, and rel_p indicates the recommended relevance of an item at position p, i.e., rel_p = 1 if a predicted recommendation item appears in the actual recommendation list, and 0 otherwise. Notably, we assume that an item is recommended if its predicted or actual rating is greater than or equal to the median of the rating scale. Moreover, additional virtual items are added to satisfy the recommendation requirement if the list of recommended items is shorter than the required length k. In our experiments, the number of recommended items k is fixed to 5.

Fig. 2. Distributions of different rating values on all datasets.
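As a concrete reference for Eqs. (10)–(11), the two metrics can be sketched in a few lines of Python. This is a self-contained sketch: the function names and the toy data are ours, and hits(i) follows the proportion-based definition given above.

```python
import math

def hr_at_k(recommended, relevant, k=5):
    """hits(i) in Eq. (10): the proportion of the top-k recommended
    items that appear in the user's actual (relevant) item set."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k

def ndcg_at_k(recommended, relevant, k=5):
    """NDCG@k per Eq. (11): DCG of the binary relevance list divided
    by the ideal DCG (all hits ranked first)."""
    rels = [1 if item in relevant else 0 for item in recommended[:k]]
    dcg = sum((2**r - 1) / math.log2(p + 2) for p, r in enumerate(rels))
    n_rel = min(len(relevant), k)
    idcg = sum(1 / math.log2(p + 2) for p in range(n_rel))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: a top-5 list against a user's held-out relevant items.
recs = [10, 4, 7, 21, 3]
relevant = {4, 3, 99}
print(hr_at_k(recs, relevant))    # 2 hits out of 5
print(ndcg_at_k(recs, relevant))
```

Averaging hr_at_k over all m users yields HR@k exactly as in Eq. (10); NDCG@k is typically averaged in the same way.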

4.4. Data pre-processing

The purpose of data pre-processing is to mitigate the imbalance exhibited in the distribution of training ratings across the sub-models. Across all datasets, user ratings are not evenly distributed: the majority of ratings fall on the rating values 4 and 5, while low ratings are comparatively rare. The distributions of all rating values on the four datasets are shown in Fig. 2. As our scheme relies on each sub-model being trained independently on discrete ratings, and prediction ratings are determined by comparing the probabilities obtained by each sub-model, we need to ensure that the training data for each sub-model is balanced. Otherwise, the unbalanced data may significantly impact the accuracy of our predictions. Thus, we design a data pre-processing method that adjusts the sample size of each sub-model to ensure balance.

Fig. 3. Heatmap of probability distributions of low ratings (≤ 3) on all datasets.
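Since sub-model r treats a rated entry as a positive label only when the rating equals r (Section 3), the rating skew translates directly into label imbalance across sub-models. A quick count on a toy rating vector (our own illustrative data, not the paper's) makes this visible:

```python
import numpy as np

# Toy ratings on a 1-5 scale, skewed toward 4s and 5s like Fig. 2.
ratings = np.array([5, 4, 4, 5, 3, 4, 5, 5, 2, 4, 3, 5, 4, 1, 5])

# Positive-label share seen by each per-rating sub-model: an entry is
# labeled 1 in sub-model r iff its rating equals r, and 0 otherwise.
for r in range(1, 6):
    labels = (ratings == r).astype(int)
    print(f"sub-model {r}: {labels.sum()}/{labels.size} positive labels")
```

Here the sub-models for ratings 1 and 2 see almost no positives while those for 4 and 5 dominate, which is exactly the imbalance the pre-processing step corrects.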
In the pre-processing, we keep the number of each individual rating value at 1/5 of the training set by populating the lower ratings (< 3) and removing the higher ratings (≥ 3), which allows each sub-model to have the same sample size while ensuring that the total number of samples in the training set remains constant.
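The cap-at-1/5 step can be sketched as below. This is a generic sketch under our own assumptions: over-represented ratings are randomly downsampled to the target share, and the remaining deficits are only reported, since the paper fills them with preference-class-based samples as described next (the function and variable names are ours).

```python
import random
from collections import defaultdict

def balance_ratings(samples, rating_scale=(1, 2, 3, 4, 5), seed=0):
    """Cap every rating value at an equal 1/len(rating_scale) share of
    the training set. Returns the downsampled training set plus, for
    each under-represented rating, how many fill samples it still needs."""
    rng = random.Random(seed)
    target = len(samples) // len(rating_scale)
    by_rating = defaultdict(list)
    for user, item, r in samples:
        by_rating[r].append((user, item, r))

    balanced, deficits = [], {}
    for r in rating_scale:
        group = by_rating[r]
        if len(group) > target:                 # e.g. ratings 4 and 5
            balanced.extend(rng.sample(group, target))
        else:                                   # e.g. ratings 1 and 2
            balanced.extend(group)
            deficits[r] = target - len(group)   # to be met by fill samples
    return balanced, deficits

# Toy training set: 50 ratings heavily skewed toward 4s and 5s.
samples = [(f"u{j}", f"i{j}", r)
           for j, r in enumerate([1]*2 + [2]*3 + [3]*5 + [4]*20 + [5]*20)]
balanced, deficits = balance_ratings(samples)
print(len(balanced), deficits)   # 30 kept; deficits for ratings 1-3
```

After the fill samples are generated (next paragraph), each rating's deficit is met and every sub-model trains on an equal share.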
The approach to populating the low ratings is based on the assumption that if a user who prefers to rate 1 is faced with an item that is easily rated as 1, there is a high probability that this user will also rate the item as 1. Thus, we can add such ratings as fill samples with a rating of 1. Our first priority is to classify users and items into more fine-grained classes according to their preference features. Our previously proposed multi-criteria classification method (Guo, Deng, Ran, Wang, & Jin, 2021) is adopted. This method assigns users and items to six classes according to their rating preferences, as follows: Very Weak (VW), Weak (W), Average (A or M), Strong (S), Very Strong (VS), and Uncertain (U). Among them, the uncertain class U is removed due to unknown preferences, and the other five classes correspond to the ratings 1–5, respectively. The heatmap of probability distributions of low ratings (< 3) is shown in Fig. 3. Here,


Table 2
Parameters of normal distribution with different user preferences (mean, variance).
Class ML-100K ML-1M AA MT
𝑉𝑊 (0.05, 0.4) (0.05, 0.4) (0.05, 0.1) (0.05, 0.1)
𝑊 (0.05, 0.4) (0.05, 0.4) (0.05, 0.1) (0.05, 0.1)
𝐴 (0.1, 0.4) (0.1, 0.4) (0.1, 0.1) (0.1, 0.1)
𝑆 (0.2, 0.4) (0.2, 0.4) (0.2, 0.1) (0.2, 0.1)
𝑉𝑆 (0.2, 0.4) (0.2, 0.4) (0.2, 0.1) (0.2, 0.1)
𝑈 (0.1, 0.4) (0.1, 0.4) (0.1, 0.1) (0.1, 0.1)

we use a set of symbols to represent users and items with different rating preferences, respectively. For example, 𝑈_{𝑉𝑊} and 𝐼_{𝐴} represent a set of users who prefer to rate 1 and a set of items that tend to be rated 3 by users, respectively.

As seen in Fig. 3, for 𝑈_{𝑉𝑊} and 𝐼_{𝑉𝑊}, the probability of a rating of 1 is 0.83 (the highest) on the ML-100K. The filling rule is as follows: if a user 𝑢 ∈ 𝑈_{𝑉𝑊} and an item 𝑖 ∈ 𝐼_{𝑉𝑊} have no interaction (viz. the rating 𝑟_{𝑢𝑖} of user 𝑢 on item 𝑖 is null), we set 𝑟_{𝑢𝑖} to 1 as a fill sample ⟨𝑢, 𝑖, 1⟩. We first fill samples in the user–item non-interactions with the highest probability; when the number of filled samples with the rating of 1 is not enough, we then fill samples in the area with the second-highest probability, i.e., 𝑈_{𝑉𝑊} and 𝐼_{𝑀} (0.66), and so on until the number of samples with the rating of 1 reaches 1∕5 of the size of the training set. By this rule, we can fill each sub-model with corresponding samples to obtain balanced training data.

For the other sub-models, which contain a large number of high ratings (≥ 3), we adopt a random deletion method that removes some of the corresponding ratings to ensure that the number of training samples is 1∕5 of the training set. This method has proven reasonable and stable through our repeated experimental validation.

4.5. Determining reliability threshold

After the model training, the reliability probabilities corresponding to the possible ratings 1–5 are obtained, and we take the rating value with the highest reliability probability as the prediction rating. Fig. 4 shows the reliability probability distributions on the four datasets. It can be seen from Fig. 4 that the probability distributions roughly conform to the normal distribution, indicating that the training results of our scheme are reasonable and reliable.

Fig. 4. Reliability probability distributions on the four datasets.

Hernando et al. (2013) argued that rating reliability and prediction error, such as MAE, are negatively correlated. Therefore, we believe that eliminating a certain percentage of unreliable prediction ratings and retaining reliable predictions for recommendation can effectively improve the recommendation accuracy. However, this does not mean that the higher the rejection rate, the better the recommendation, because rejection leaves fewer and fewer items available for recommendation, resulting in an inability to meet users' recommendation needs. Although some popular items can be recommended to active users to address this issue, doing so has a certain impact on personalized recommendations.

In our experiments, a certain percentage of unreliable prediction ratings is removed according to the reliability threshold, which may leave some users unable to receive enough recommended items. Thus, we propose a filling method based on the normal distribution that adds virtual samples to ensure that the number of samples evaluated before and after removal is the same. Here, we assume that these virtual samples are popular recommendation items and that the probability of users preferring popular items obeys a normal distribution on the 0–1 interval. Meanwhile, user groups with different preferences should obey normal distributions with different means and variances. For example, users who tend to give low scores have a relatively smaller mean of the normal distribution, and vice versa. Similarly, the criteria for classifying user preferences are determined by the classification method mentioned in Section 4.4. The parameters of the normal distribution for each user preference group are presented in Table 2.

Then, we set a reliability threshold 𝜃 for each dataset to select reliable prediction ratings. The selection rule is as follows: if the reliability probability of a prediction rating is greater than the pre-defined reliability threshold, the rating is considered reliable. Here, we use a rejection rate 𝑞 to denote the proportion of unreliable predictions to be filtered out, which directly determines the setting of the reliability threshold. In our experiments, we vary the rejection rate between 0 and 1 with a step size of 0.1 to observe how the performance of our scheme changes under different rejection rates. The performance results on all datasets are shown in Fig. 5.

Fig. 5. Performance of HR@5 and NDCG@5 w.r.t. the rejection rate 𝑞 on four datasets.

It can be seen from Fig. 5 that both HR@5 and NDCG@5 first increase and then decrease on all datasets as the rejection rate 𝑞 increases, and the proposed scheme achieves the best evaluation results at 𝑞 = 0.1. Compared with the results at 𝑞 = 0, our scheme gains a huge boost on the Amazon datasets by filtering out an appropriate proportion of predictions: the metrics HR@5 and NDCG@5 increase by about 21% (36%) and 9% (15%) on Apps for Android (Movies & TV), respectively, while our scheme improves less on the MovieLens datasets, with
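The threshold-selection rule sketched above amounts to taking the 𝑞-quantile of the reliability probabilities as the threshold 𝜃 and discarding everything at or below it. A small illustration (variable and function names are ours, not from the paper's code):

```python
import numpy as np

def threshold_from_rejection_rate(reliability, q):
    """Pick the reliability threshold so that roughly a fraction q of the
    least-reliable predictions falls below it (the q-quantile)."""
    return float(np.quantile(reliability, q))

def filter_reliable(predictions, reliability, q=0.1):
    """Keep only predictions whose reliability exceeds the threshold."""
    theta = threshold_from_rejection_rate(reliability, q)
    keep = reliability > theta
    return predictions[keep], theta

# Toy example: 10 predicted ratings with their reliability probabilities.
preds = np.array([5, 4, 3, 5, 2, 4, 1, 5, 3, 4])
rel = np.array([0.9, 0.8, 0.2, 0.7, 0.4, 0.6, 0.1, 0.95, 0.5, 0.3])
kept, theta = filter_reliable(preds, rel, q=0.1)
# With q = 0.1, roughly the least reliable 10% of predictions are dropped.
```

On the real datasets, this is the mechanism by which a rejection rate of 0.1 induces the per-dataset thresholds reported in Table 3.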

J. Deng et al. Computers & Industrial Engineering 185 (2023) 109627

Table 3
Reliability threshold chosen with 10% rejection rate.

Dataset                   ML-100K  ML-1M  AA     MT
Reliability threshold 𝜃   0.308    0.313  0.362  0.391

HR@5 and NDCG@5 increasing by about 4% and 1% for the ML-100K and about 0.6% and 0.5% for the ML-1M, respectively.

Moreover, we also found that the improvement percentage of our scheme in HR@5 is higher than in NDCG@5, although the NDCG@5 results are better than the HR@5 results on all four datasets; and the sparser the dataset, the greater the improvement brought by filtering out unreliable ratings. The main reason is that our proposed filling strategy is more suitable for sparse datasets and has a more positive impact on model performance. It is noted from Fig. 4 that the number of predictions with high reliability probability is much higher on the sparser Amazon datasets than on the denser MovieLens datasets, so the sparse datasets retain more prediction ratings with high reliability probability when the same proportion of predictions is removed. Thus, the advantage of our model is more obvious in sparse environments. Table 3 shows the reliability threshold determined on each dataset to reach the best performance when the rejection rate 𝑞 = 0.1.

4.6. Comparison methods

To verify the effectiveness of our scheme, we compare its recommendation performance against MF-based linear CFs and neural network-based nonlinear CFs, as follows:

• MF (Koren et al., 2022): The most basic model-based CF, which exploits the ALS method to update the user and item latent matrices; the prediction ratings are calculated by the inner product.
• BeMF (Ortega et al., 2021): A recently proposed MF model based on the Bernoulli distribution that returns predictions with the corresponding reliability probability.
• NCF (He et al., 2017): The classic neural CF model that integrates the GMF and MLP methods to learn the complex relationship between users and items.
• DMF (Xue et al., 2017): A deep neural network-based MF model that maps users and items into a common low-dimensional space and considers both implicit and explicit feedback for predictions.
• JNCF (Chen et al., 2019): A joint neural CF model that tightly couples deep feature learning and deep user–item interaction modeling, optimizing both through joint training.
• HCCF (Xia et al., 2022): A state-of-the-art self-supervised recommendation framework that jointly captures local and global collaborative relations with a hypergraph-enhanced cross-view contrastive learning architecture.
• SGCCL (Li et al., 2023): A novel siamese graph learning method that models high-hop connectivity with robustness between nodes and optimizes them based on the consensus learning principle.

4.7. Parameter settings

To ensure the credibility and reproducibility of our experiments, we employ the widely used grid search strategy (Caselles-Dupré, Lesaint, & Royo-Letelier, 2018) to optimize the hyperparameters and determine the combination of parameters that achieves the best performance on the testing set for each method. Deep neural networks (DNNs) have a variety of architectures, and after comparison we select the better-performing two-tower model as the baseline of our scheme. For our model, the user and item IDs are mapped into an initial embedding of 100 dimensions; three hidden layers with 64, 32, and 8 neurons are employed on both sides of the two-tower model; the L2 regularization is 0.01; and the model is optimized with Adam. The training method is mini-batch gradient descent, and the batch size of 256 and the number of iterations of 20 are determined by the number of samples in the training set. The comparison methods are trained for a maximum of 20 epochs. For the MF-based linear models, the number of latent factors is fixed to 20, and the L2 regularization and learning rate are 0.01 and 0.001. The BeMF model keeps 90% of reliable predictions, the same as our scheme, through the set reliability threshold. The neural network-based nonlinear models are learned with the Adam optimizer and a batch size of 512. The user and item embedding sizes of the NCF and DMF models are set to 50, and the learning rate is 0.005. The DI network in the NCF model employs three hidden layers with sizes of [256, 128, 64], and the DF network in the DMF employs four hidden layers with sizes of [512, 256, 128, 64]. For the JNCF, four layers in the DF network with sizes of [1024, 512, 256, 128] and two layers in the DI network with sizes of [64, 32] are employed; the loss-function parameter 𝛼 and the learning rate are set to 0.4 and 0.03, and the number of negative samples is 4. For the HCCF, two layers for graph local embedding propagation with a learning rate of 0.001 are stacked, and the numbers of hyperedges and hierarchical hypergraph mapping layers are set to 128 and 3, respectively. The SGCCL is optimized using VGAEs with user/item embedding dimensions of 32 and a degree of connectivity of 2. For all experimental results, the number of recommendations (𝑘) is fixed to 5.

4.8. Results and analysis

Unlike previous neural network-based recommendation models, the proposed framework can generate a set of discrete rating values for a prediction task, solving the problem of rating classification. Here, we introduce a visualization method, the confusion matrix, to show the prediction effect of our scheme, where each row and column of the matrix represents the actual rating and the predicted rating, respectively, and each element on the main diagonal denotes the proportion of items whose predicted and actual ratings coincide relative to the total number of items in the testing set. A higher percentage on the diagonal indicates better classification performance of the model. The confusion matrices on the four datasets are shown in Fig. 6. As seen in Fig. 6, our scheme has good prediction results, especially for high-rating predictions, which indicates that our scheme is also applicable to top-N recommendation.

Fig. 6. Confusion matrix of prediction ratings.
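The normalized confusion matrix described above can be computed directly from the test pairs; each cell holds the share of test items with a given (actual, predicted) rating combination, so the diagonal carries the proportions discussed in the text (the helper below is illustrative, not the paper's code):

```python
import numpy as np

def rating_confusion_matrix(actual, predicted, n_classes=5):
    """Rows = actual rating, columns = predicted rating; each cell is
    the share of test samples falling in that (actual, predicted) pair."""
    m = np.zeros((n_classes, n_classes))
    for a, p in zip(actual, predicted):
        m[a - 1, p - 1] += 1  # ratings 1..5 map to indices 0..4
    return m / len(actual)

# Toy test set of 8 (actual, predicted) rating pairs.
actual = [5, 4, 4, 3, 5, 1, 2, 5]
predicted = [5, 4, 3, 3, 5, 1, 1, 4]
cm = rating_confusion_matrix(actual, predicted)
# The trace of cm is the overall fraction of exactly correct predictions.
```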

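For orientation, the two-tower configuration of Section 4.7 (100-dimensional ID embeddings and hidden layers of 64, 32, and 8 neurons per side) can be sketched as a single forward pass. The weights below are random and untrained, and the output head that turns the joined towers into rating-wise reliability probabilities is our assumption, since this excerpt does not specify it:

```python
import numpy as np

rng = np.random.default_rng(0)

def tower(x, sizes=(64, 32, 8)):
    """One side of the two-tower model: ReLU hidden layers of 64, 32, 8
    neurons, with randomly initialized (untrained) weights."""
    for out_dim in sizes:
        w = rng.normal(0.0, 0.1, size=(x.shape[-1], out_dim))
        x = np.maximum(x @ w, 0.0)  # ReLU activation
    return x

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Section 4.7: user and item IDs map to 100-dimensional embeddings.
user_emb = rng.normal(size=(50, 100))   # 50 users (toy)
item_emb = rng.normal(size=(40, 100))   # 40 items (toy)

u = tower(user_emb[3:4])                # user-side representation, shape (1, 8)
v = tower(item_emb[7:8])                # item-side representation, shape (1, 8)

# Assumed output head: the joined representation is mapped to one logit
# per rating 1..5, and a softmax yields the reliability probabilities;
# the highest probability gives the predicted rating (cf. Section 4.5).
logits = np.concatenate([u, v], axis=1) @ rng.normal(0.0, 0.1, size=(16, 5))
rel_probs = softmax(logits.ravel())
pred_rating = int(np.argmax(rel_probs)) + 1
```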

Table 4
Performance comparison on four datasets.

        ML-100K                 ML-1M                    AA                       MT
Method  HR@5   NDCG@5  Time/s   HR@5   NDCG@5  Time/s    HR@5   NDCG@5  Time/s    HR@5   NDCG@5  Time/s
MF      0.390  0.418   2        0.389  0.396   20        0.209  0.305   17        0.265  0.362   55
BeMF    0.396  0.434   4        0.446  0.428   51        0.217  0.320   42        0.381  0.431   93
NCF     0.433  0.484   8        0.457  0.481   62        0.254  0.369   81        0.307  0.419   277
DMF     0.386  0.430   24       0.388  0.429   541       0.232  0.338   1625      0.282  0.385   4249
JNCF    0.480  0.498   12       0.489  0.500   128       0.265  0.383   385       0.319  0.434   1851
HCCF    0.472  0.497   8        0.444  0.460   221       0.251  0.364   126       0.294  0.401   535
SGCCL   0.469  0.488   11       0.467  0.491   210       0.295  0.427   216       0.332  0.456   600
Ours    0.497  0.519   5        0.490  0.521   23        0.365  0.425   45        0.392  0.457   160
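The HR@5 and NDCG@5 values in Table 4 follow the usual top-𝑘 definitions; a hedged sketch (the paper's evaluation code is not shown here, so details such as the leave-one-out target and tie handling are assumptions):

```python
import math

def hit_ratio_at_k(ranked_items, target, k=5):
    """1 if the held-out target item appears in the top-k list, else 0."""
    return int(target in ranked_items[:k])

def ndcg_at_k(ranked_items, target, k=5):
    """Discounted gain of the target's position in the top-k list."""
    topk = ranked_items[:k]
    if target in topk:
        rank = topk.index(target)        # 0-based position in the list
        return 1.0 / math.log2(rank + 2)
    return 0.0

# One user's top-5 recommendation list and the held-out test item.
top5 = [12, 7, 30, 4, 19]
hr = hit_ratio_at_k(top5, 30)    # target is in the list
ndcg = ndcg_at_k(top5, 30)       # target at 0-based position 2: 1/log2(4)
```

Dataset-level HR@5 and NDCG@5 are then the averages of these per-user values over all test users.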

(1) Method validation. As seen in Table 4, our method almost always performs better than the comparison methods in terms of NDCG@5 and HR@5. Among the comparison methods, MF is the most basic linear CF method; it has the worst recommendation performance because it cannot capture the complex relationship between users and items as the state-of-the-art methods do, but its simplicity gives it the shortest model training time on all datasets. The DMF and NCF methods are classic neural network-based CFs, so their performance is relatively worse than the newer JNCF because of insufficient feature extraction, although NCF obtains better results than DMF on all datasets. In particular, DMF has the slowest runtime of all comparison methods. The reliability-based MF method BeMF obtains higher recommendation accuracy than the other methods on datasets with a large number of high ratings (≥ 3), such as the Amazon datasets, because BeMF prefers to predict high ratings for items; nevertheless, its performance is still worse than the neural network-based nonlinear CFs. The closest competitor, JNCF, achieves the best evaluation results among all comparison methods on the relatively dense MovieLens datasets because its deep interactive joint neural network structure can fully extract the complex feature representations of users and items. We therefore use JNCF as the benchmark on the dense datasets to present the advantages of our framework in the evaluation metrics. On the two MovieLens datasets, our scheme improves the NDCG@5 and HR@5 by 4.22% and 3.54% on the ML-100K, and by 4.20% and 0.21% on the ML-1M, respectively. The SGCCL, which is better than the other graph-based model HCCF in terms of the recommendation metrics, is the closest to our method on the relatively sparse Amazon datasets, but its time cost is much larger than ours, illustrating that our method achieves a better balance between accuracy and efficiency. Compared with SGCCL, our method improves on average by 10.39% in the two accuracy metrics. This indicates that our scheme obtains better evaluation results on the sparse Amazon datasets than on the dense MovieLens datasets. The improved performance of our scheme is due to our proposed data pre-processing method, which addresses the issue of unbalanced data that can affect prediction accuracy. This method, combined with the ability to remove unreliable predictions and fill in popular reliable items, makes our scheme particularly effective in sparse environments. As a result, our framework is better suited to these environments than the other comparison methods.

Table 5
Performance comparison of methods using our basic framework (ML-100K, AA).

                                       With pre-processing               Without pre-processing
Method  Framework                      HR@5            NDCG@5            HR@5            NDCG@5
BeMF                                   (0.460, 0.339)  (0.467, 0.403)    (0.441, 0.335)  (0.451, 0.385)
NCF                                    (0.497, 0.339)  (0.528, 0.402)    (0.481, 0.265)  (0.514, 0.380)
HCCF    With Reliability (𝑞 = 0.1)     (0.502, 0.322)  (0.523, 0.444)    (0.491, 0.318)  (0.503, 0.437)
SGCCL                                  (0.501, 0.422)  (0.513, 0.452)    (0.462, 0.389)  (0.475, 0.430)
Ours                                   (0.497, 0.365)  (0.519, 0.425)    (0.482, 0.352)  (0.512, 0.401)
BeMF                                   (0.427, 0.244)  (0.449, 0.353)    (0.425, 0.224)  (0.441, 0.325)
NCF                                    (0.479, 0.261)  (0.510, 0.377)    (0.468, 0.256)  (0.496, 0.371)
HCCF    Without Reliability (𝑞 = 0)    (0.487, 0.313)  (0.501, 0.429)    (0.485, 0.288)  (0.500, 0.372)
SGCCL                                  (0.498, 0.250)  (0.512, 0.365)    (0.460, 0.233)  (0.472, 0.344)
Ours                                   (0.476, 0.284)  (0.509, 0.387)    (0.473, 0.258)  (0.505, 0.374)

(2) Framework validation. To demonstrate the utility of the proposed basic framework, we conduct an ablation study to observe the impact on the performance of several representative comparison methods outlined in Section 4.6 when integrated into our framework. Here, we compare four versions of the selected models: with and without pre-processing, as well as with and without reliability. The findings on two datasets with different sparsity levels (ML-100K and AA) are showcased in Table 5; all comparison methods using our proposed framework (with pre-processing and reliability) achieve better performance in all cases. Notably, the graph-based models HCCF and SGCCL in our framework obtain the most favorable recommendation results over all others. The relative improvements of our model with pre-processing are 1.48% and 5.81% for the ML-100K and AA datasets, respectively. The recommendation accuracy of the other models with pre-processing also shows a similar trend of improvement, and the DL-based models have better performance. This result justifies the usefulness of our data pre-processing method for initializing models. The models with reliability remove unreliable predictions according to the reliability threshold and utilize the proposed preference filling strategy to compensate for the rejected predictions. The overall performance of our model with reliability increases by about 2.42% (HR@5) and 20.49% (NDCG@5) compared with retaining unreliable predictions, and the other models are similarly enhanced in this case. This shows the advantage of taking prediction reliability into account for recommendations. The above findings provide empirical evidence for the rationality and effectiveness of applying our proposed framework to existing models.

5. Conclusion

In this paper, we present a deep neural network-based recommendation framework that provides the reliability probabilities of prediction ratings. Similar to CFs, the proposed scheme relies solely on user rating information to obtain the prediction reliability, which ensures good generality and scalability. Moreover, to mitigate the uneven distribution of ratings across sub-models during training, we design a novel data pre-processing method that equalizes the training sample size of each sub-model. This method effectively enhances the validity and fairness of the model training. The main contribution is to eliminate predictions with low reliability probabilities and keep those with high reliability for recommendations, improving the accuracy and reliability of the recommendation results. The experiments on four public datasets indicate that the proposed scheme is superior to the other comparison methods in terms of top-N recommendation. Additionally, we also demonstrate its ability to make strong predictions on sparse datasets.

This study faced two major challenges. One was the experimental datasets' extremely uneven rating distribution, which causes an unequal number of ratings to be trained in each sub-model and reduces the accuracy and reliability of the model training. To address this issue, we designed a data pre-processing method and verified in our experiments that it improves the effectiveness of the models used in our basic framework. The other is that although filtering out unreliable predictions improves the reliability of the recommendation results, it leaves fewer and fewer items available for recommendation, resulting in an inability to meet users' recommendation needs. To handle this issue, we proposed a filling method


based on the normal distribution, which added samples to ensure that the number of prediction ratings tested before and after filtering remained constant, improving the recommendation ability.

The limitations of this study are that, on the one hand, although the rating-filling method achieves good recommendation results, it has a relatively high computational cost and a certain randomness. On the other hand, our framework is only applicable to RSs with explicit feedback (user ratings) and currently does not work with implicit feedback (e.g., clicks and purchases).

There are two areas for potential future work. Firstly, our scheme currently only takes into consideration the absolute rating values of users while disregarding any user preference biases. As a result, we plan to reclassify the sub-matrices taking user preference features into account, so as to enhance the accuracy of the prediction results. Secondly, we will design a new scheme, such as using the soft-argmax function in our framework, to automatically learn unequal weights for each sub-model, regardless of the unbalanced number of ratings in each sub-matrix.

CRediT authorship contribution statement

Jiangzhou Deng: Conceptualization, Investigation, Formal analysis, Writing – original draft, Writing – review & editing, Funding acquisition, Project administration. Hongtao Li: Software, Resources, Data curation, Formal analysis, Writing – original draft, Writing – review & editing, Visualization. Junpeng Guo: Conceptualization, Validation, Writing – review & editing. Leo Yu Zhang: Validation, Writing – review & editing. Yong Wang: Validation, Writing – review & editing, Funding acquisition, Supervision.

Data availability

Data will be made available on request.

References

Ahmadian, S., Afsharchi, M., & Meghdadi, M. (2019). A novel approach based on multi-view reliability measures to alleviate data sparsity in recommender systems. Multimedia Tools and Applications, 78(13), 17763–17798.
Azadjalal, M. M., Moradi, P., Abdollahpouri, A., & Jalili, M. (2017). A trust-aware recommendation method based on Pareto dominance and confidence concepts. Knowledge-Based Systems, 116, 130–143.
Bobadilla, J., Gutiérrez, A., Ortega, F., & Zhu, B. (2018). Reliability quality measures for recommender systems. Information Sciences, 442, 145–157.
Caselles-Dupré, H., Lesaint, F., & Royo-Letelier, J. (2018). Word2Vec applied to recommendation: Hyperparameters matter. In Proceedings of the 12th ACM conference on recommender systems (pp. 352–356).
Chen, W., Cai, F., Chen, H., & Rijke, M. D. (2019). Joint neural collaborative filtering for recommender systems. ACM Transactions on Information Systems (TOIS), 37(4), 1–30.
Deng, Z.-H., Huang, L., Wang, C.-D., Lai, J.-H., & Philip, S. Y. (2019). DeepCF: A unified framework of representation learning and matching function learning in recommender system. In Proceedings of the AAAI conference on artificial intelligence. Vol. 33. No. 01 (pp. 61–68).
Duan, R., Jiang, C., & Jain, H. K. (2022). Combining review-based collaborative filtering and matrix factorization: A solution to rating's sparsity problem. Decision Support Systems, 156, Article 113748.
Fuentes, O., Parra, J., Anthony, E. Y., & Kreinovich, V. (2017). Why rectified linear neurons are efficient: Symmetry-based, complexity-based, and fuzzy-based explanations.
Gohari, F. S., Aliee, F. S., & Haghighi, H. (2018). A new confidence-based recommendation approach: Combining trust and certainty. Information Sciences, 422, 21–50.
Guo, J., Deng, J., Ran, X., Wang, Y., & Jin, H. (2021). An efficient and accurate recommendation strategy using degree classification criteria for item-based collaborative filtering. Expert Systems with Applications, 164, Article 113756.
Guo, G., Zhang, J., & Thalmann, D. (2014). Merging trust in collaborative filtering to alleviate data sparsity and cold start. Knowledge-Based Systems, 57, 57–68.
He, X., Liao, L., Zhang, H., Nie, L., Hu, X., & Chua, T.-S. (2017). Neural collaborative filtering. In Proceedings of the 26th international conference on world wide web (pp. 173–182).
He, X., Zhang, H., Kan, M.-Y., & Chua, T.-S. (2016). Fast matrix factorization for online recommendation with implicit feedback. In Proceedings of the 39th international ACM SIGIR conference on research and development in information retrieval (pp. 549–558).
Hernando, A., Bobadilla, J., Ortega, F., & Tejedor, J. (2013). Incorporating reliability measurements into the predictions of a recommender system. Information Sciences, 218, 1–16.
Huang, P.-S., He, X., Gao, J., Deng, L., Acero, A., & Heck, L. (2013). Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM international conference on information & knowledge management (pp. 2333–2338).
Jurik, M. (1992). Neurocomputing: Foundations of research. Artificial Intelligence, 53(2–3).
Kabbur, S., Ning, X., & Karypis, G. (2013). FISM: Factored item similarity models for top-n recommender systems. In Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 659–667).
Kim, D., Park, C., Oh, J., Lee, S., & Yu, H. (2016). Convolutional matrix factorization for document context-aware recommendation. In Proceedings of the 10th ACM conference on recommender systems (pp. 233–240).
Koren, Y., Rendle, S., & Bell, R. (2022). Advances in collaborative filtering. Recommender Systems Handbook, 91–142.
Li, B., Guo, T., Zhu, X., Li, Q., Wang, Y., & Chen, F. (2023). SGCCL: Siamese graph contrastive consensus learning for personalized recommendation. In Proceedings of the sixteenth ACM international conference on web search and data mining (pp. 589–597).
Ma, Y., Geng, X., & Wang, J. (2020). A deep neural network with multiplex interactions for cold-start service recommendation. IEEE Transactions on Engineering Management, 68(1), 105–119.
Margaris, D., Vassilakis, C., & Spiliotopoulos, D. (2020). What makes a review a reliable rating in recommender systems? Information Processing & Management, 57(6), Article 102304.
Mesas, R. M., & Bellogín, A. (2020). Exploiting recommendation confidence in decision-aware recommender systems. Journal of Intelligent Information Systems, 54(1), 45–78.
Mnih, A., & Salakhutdinov, R. R. (2007). Probabilistic matrix factorization. In Advances in neural information processing systems. Vol. 20.
Moradi, P., & Ahmadian, S. (2015). A reliability-based recommendation method to improve trust-aware recommender systems. Expert Systems with Applications, 42(21), 7386–7398.
Ni, J., Huang, Z., Cheng, J., & Gao, S. (2021). An effective recommendation model based on deep representation learning. Information Sciences, 542, 324–342.
Ortega, F., Lara-Cabrera, R., González-Prieto, Á., & Bobadilla, J. (2021). Providing reliability in recommender systems through Bernoulli Matrix Factorization. Information Sciences, 553, 110–128.
Shen, R.-P., Zhang, H.-R., Yu, H., & Min, F. (2019). Sentiment based matrix factorization with reliability for recommendation. Expert Systems with Applications, 135, 249–258.
Su, Z., Zheng, X., Ai, J., Shang, L., & Shen, Y. (2019). Link prediction in recommender systems with confidence measures. Chaos: An Interdisciplinary Journal of Nonlinear Science, 29(8), Article 083133.
Sun, J., Guo, W., Zhang, D., Zhang, Y., Regol, F., Hu, Y., et al. (2020). A framework for recommending accurate and diverse items using Bayesian graph convolutional neural networks. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 2030–2039).
Wang, M., Fu, W., Hao, S., Tao, D., & Wu, X. (2016). Scalable semi-supervised learning by efficient anchor graph regularization. IEEE Transactions on Knowledge and Data Engineering, 28(7), 1864–1877.
Wang, X., He, X., Wang, M., Feng, F., & Chua, T.-S. (2019). Neural graph collaborative filtering. In Proceedings of the 42nd international ACM SIGIR conference on research and development in information retrieval (pp. 165–174).
Werbos, P. J. (1988). Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1(4), 339–356.
Wu, X., Yuan, X., Duan, C., & Wu, J. (2019). A novel collaborative filtering algorithm of machine learning by integrating restricted Boltzmann machine and trust information. Neural Computing and Applications, 31(9), 4685–4692.
Xia, L., Huang, C., Xu, Y., Zhao, J., Yin, D., & Huang, J. (2022). Hypergraph contrastive collaborative filtering. In Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval (pp. 70–79).
Xu, R., Li, J., Li, G., Pan, P., Zhou, Q., & Wang, C. (2022). SDNN: Symmetric deep neural networks with lateral connections for recommender systems. Information Sciences, 595, 217–230.
Xue, H.-J., Dai, X., Zhang, J., Huang, S., & Chen, J. (2017). Deep matrix factorization models for recommender systems. In Proceedings of the twenty-sixth international joint conference on artificial intelligence. Vol. 17 (pp. 3203–3209). Melbourne, Australia.
Xue, F., He, X., Wang, X., Xu, J., Liu, K., & Hong, R. (2019). Deep item-based collaborative filtering for top-n recommendation. ACM Transactions on Information Systems (TOIS), 37(3), 1–25.
Yu, T., Guo, J., Li, W., Wang, H. J., & Fan, L. (2019). Recommendation with diversity: An adaptive trust-aware model. Decision Support Systems, 123, Article 113073.

Zhang, S., Yao, L., Sun, A., & Tay, Y. (2019). Deep learning based recommender system: A survey and new perspectives. ACM Computing Surveys, 52(1), 1–38.
Zhu, B., Ortega, F., Bobadilla, J., & Gutiérrez, A. (2018). Assigning reliability values to recommendations using matrix factorization. Journal of Computational Science, 26, 165–177.
Zhu, Y., Lin, Q., Lu, H., Shi, K., Qiu, P., & Niu, Z. (2021). Recommending scientific paper via heterogeneous knowledge embedding based attentive recurrent neural networks. Knowledge-Based Systems, 215, Article 106744.
Ziarani, R. J., & Ravanmehr, R. (2021). Deep neural network approach for a serendipity-oriented recommendation system. Expert Systems with Applications, 185, Article 115660.
