Computers and Industrial Engineering - 2023 - Providing Prediction Reliability Through Deep Neural Networks For Recommender Systems
Keywords: Reliability; Data pre-processing; Deep neural networks; Recommender systems

Abstract: Deep learning-based recommendation approaches have shown significant improvement in the accuracy of recommender systems (RSs). However, beyond accuracy, reliability measures are gaining attention as a way to evaluate the validity of predictions and enhance user satisfaction. Such measures can ensure that the recommended items are high-scoring items with high reliability. To integrate a native concept of reliability into a deep learning model, this paper proposes a deep neural network-based recommendation framework with prediction reliability. The framework filters out unreliable prediction ratings according to a pre-defined reliability threshold, ensuring the credibility and reliability of top-N recommendation. It relies solely on user ratings to derive reliability, making it highly generalizable and scalable. Additionally, we design a data pre-processing method to address the uneven distribution of ratings before model training, which effectively improves the validity and fairness of training. Experiments on four benchmark datasets demonstrate that the proposed scheme is superior to the comparison methods on the evaluation metrics. Furthermore, our framework performs better on sparse datasets than on dense ones, indicating its ability to make strong predictions even with insufficient information.
1. Introduction

Recommender systems (RSs) are effective information filtering techniques that help online users navigate the vast amount of complex information to find products or services they are interested in. They effectively cope with the information overload problem and improve user satisfaction. Collaborative filtering (CF) methods, which provide personalized recommendations for users, have been widely applied in RSs due to their simplicity and efficiency. Typically, CFs are divided into two classes: memory-based and model-based CFs (Duan, Jiang, & Jain, 2022). Many studies have confirmed that the performance of model-based CFs is generally superior to that of memory-based CFs in terms of accuracy and scalability. Among model-based CFs, matrix factorization (MF) is the most prevailing latent factor model (LFM) (Koren, Rendle, & Bell, 2022); it predicts user preferences with a linear kernel, i.e., the dot product of the user and item latent feature vectors. Nevertheless, this linear approach may not capture the complex links of user–item interactions effectively.

Recently, deep learning (DL) approaches have been introduced into RSs to overcome the limitations of MF-based CFs and significantly improve recommendation accuracy (Chen, Cai, Chen, & Rijke, 2019). DL-based recommendation algorithms primarily adopt deep neural networks (DNNs) to explore auxiliary information, such as textual descriptions of products and image features of videos, and automatically model latent feature representations from the given inputs (Zhang, Yao, Sun, & Tay, 2019). For instance, He et al. (2017) developed a general neural network-based CF framework called NCF to learn user–item interactions instead of using an inner product. Xue et al. (2019) presented an advanced item-based CF model based on DNNs to effectively learn the higher-order relations among items for top-N recommendations. Chen et al. (2019) proposed a joint neural CF model (JNCF) that seamlessly integrates deep feature learning for users and items with deep modeling of user–item interactions. Xu et al. (2022) introduced symmetric DNNs with lateral connections to capture complex mapping relations and low-rank relations between users and
✩ This work is supported by the National Natural Science Foundation of China (No. 72301050, 72171165, and 62272077), the MOE Layout Foundation of
Humanities and Social Sciences (No. 20YJAZH102 and 21YJA630021), the Natural Science Foundation of Chongqing (No. cstc2021jcyj-msxmX0557), and the
Science and Technology Research Program of Chongqing Municipal Education Commission (No. KJQN202300605).
∗ Corresponding author.
E-mail addresses: [email protected] (J. Deng), [email protected] (H. Li), [email protected] (J. Guo), [email protected] (L.Y. Zhang),
[email protected] (Y. Wang).
https://doi.org/10.1016/j.cie.2023.109627
Received 2 January 2023; Received in revised form 16 August 2023; Accepted 18 September 2023
Available online 20 September 2023
0360-8352/© 2023 Elsevier Ltd. All rights reserved.
J. Deng et al. Computers & Industrial Engineering 185 (2023) 109627
items jointly. However, to the best of our knowledge, available DL-based recommendation approaches completely ignore the reliability of prediction results, which can significantly impact recommendation quality. Bobadilla, Gutiérrez, Ortega, and Zhu (2018) emphasized the importance of "reliable" recommendations. For instance, an RS may recommend a 5-star hotel to an active user, yet the user may not be fully convinced by the recommendation. This is because most RSs can provide additional information, such as the number of users who have rated the hotel and the social trust relationships among users, which allows users to infer the reliability of recommended items. Typically, people prefer an item with an average vote of 4.7 stars and 1000 ratings over an item with an average vote of 5 stars and only 10 ratings: more user ratings enhance the reliability of a recommended item.

Providing reliability values for prediction ratings makes it possible to handle or filter out low-reliability predictions in bulk, improving the accuracy of RSs. Moreover, reliability information is a powerful tool to enhance users' trust in and loyalty to an RS due to its explainability. Reliability values also provide adequate information to modulate the probability of predictions (Ortega, Lara-Cabrera, González-Prieto, & Bobadilla, 2021). To our knowledge, only a few approaches, based on KNN and MF models, have focused on the reliability of recommendations. For instance, Hernando, Bobadilla, Ortega, and Tejedor (2013) introduced a memory-based CF framework that uses a reliability measure associated with the predictions to improve the performance of RSs. Moradi and Ahmadian (2015) proposed a reliability-based trust-aware CF approach that builds trust networks of users by fusing similarities and trust statements, and then measures the quality of prediction ratings to enhance recommendation accuracy. Bobadilla et al. (2018) designed two reliability measures for various recommendation algorithms to achieve better accuracy. Ortega et al. (2021) proposed BeMF, a matrix factorization model based on the Bernoulli distribution, which yields both prediction ratings and their corresponding reliability values. However, since most DL-based recommendation methods focus only on the ranking of recommended items, it is crucial to also consider the reliability of those items to ensure their credibility to users.

Based on the above discussion and analysis, a DNN-based recommendation framework with prediction reliability is proposed to help RSs filter out unreliable prediction ratings and ensure the credibility and reliability of top-N recommendations. The main advantages of the model are as follows: (1) Universality. It does not depend on information beyond ratings (e.g., social information and reviews) to provide prediction reliability, so it applies to widely used RS datasets. Furthermore, it avoids external reliability measures, since the reliability values are intrinsically produced by the DL model itself. (2) Non-linearity. Unlike traditional MF methods that combine the user and item latent feature vectors linearly, the DNN model can capture the complex structure of user–item interactions effectively. (3) Flexibility. The proposed scheme can independently calculate the probability that a user assigns any discrete rating on the rating scale to an item, making it completely general and flexible from a probabilistic perspective.

Our main contributions are summarized as follows:

1. A user–item interaction matrix is divided into several binary sub-matrices according to the different rating values, and a two-tower neural network is adopted to train each sub-matrix in parallel, yielding the probabilities that a user assigns each rating to the same item.
2. The proposed scheme is based on deep learning classification rather than regression, capturing more information and allowing probabilities to be aggregated into normalized reliability values.
3. A pre-processing solution is designed to eliminate the uneven rating distribution, resolving the unequal number of ratings in each sub-matrix caused by rating preference bias while improving the validity and fairness of model training.
4. Experiments on four popular datasets indicate that our scheme outperforms the comparison methods on various evaluation metrics.

The remainder of this paper is organized as follows. Section 2 reviews previous studies on DL-based recommendation and reliability measures. Section 3 presents our framework for generating predictions with corresponding reliability probabilities. Section 4 reports experiments that demonstrate the effectiveness of the framework. Conclusions and future work are discussed in Section 5.

2. Related works

In this section, we briefly introduce previous studies relevant to DL-based recommendation approaches and reliability measures for RSs.

2.1. DL-based recommendations

With the comprehensive and in-depth development of DL techniques across various industries, DL approaches have also drawn much attention in the area of RSs, bringing opportunities to address the data sparsity problem and improve recommendation quality. Unlike traditional neural networks, such as artificial neural networks and the multi-layer perceptron (MLP) (Jurik, 1992; Werbos, 1988), deep learning-based approaches, such as convolutional neural networks (CNNs) (Kim, Park, Oh, Lee, & Yu, 2016), recurrent neural networks (RNNs) (Zhu et al., 2021), and the deep structured semantic model (DSSM) (Huang et al., 2013), also known as the two-tower model, emphasize model depth and feature learning, which enables a more effective portrayal of the intrinsic information in data and thereby improves the accuracy and reliability of model training. In recent years, many studies have proposed DL-based recommendation models that improve the performance of RSs.

Ziarani and Ravanmehr (2021) combined a CNN with Particle Swarm Optimization (PSO) to generate serendipitous recommendations. Deng, Huang, Wang, Lai, and Philip (2019) proposed DeepCF, a simple but effective framework based on the vanilla MLP that flexibly addresses complex matching problems and learns low-rank user–item relations effectively. Xue, Dai, Zhang, Huang, and Chen (2017) introduced deep matrix factorization (DMF), a neural network architecture that maps users and items into a common low-dimensional space using non-linear operations. To capture deep semantic features of users and items, Ni, Huang, Cheng, and Gao (2021) proposed a deep representation learning-based recommendation model (RM-DRL) that makes full use of auxiliary item information. To tackle the cold-start problem, Ma, Geng, and Wang (2020) incorporated three types of interactions between services and mashups into a DNN to construct a multiplex interaction-oriented service recommendation model (MISR). Wang, He, Wang, Feng, and Chua (2019) designed neural graph collaborative filtering (NGCF) to explicitly integrate the bipartite graph structure into the embedding process. Sun et al. (2020) developed a Bayesian graph CNN framework to handle misleading positive interactions in an implicit manner. Xia et al. (2022) proposed Hypergraph Contrastive Collaborative Filtering (HCCF), a self-supervised recommendation framework that jointly captures local and global collaborative relations. Li et al. (2023) advocated a Siamese Graph Contrastive Consensus Learning (SGCCL) framework to explore intrinsic correlations and alleviate bias effects in personalized recommendation. Despite the success
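As a concrete illustration of the first contribution listed in the Introduction, dividing the user–item interaction matrix into per-rating binary sub-matrices, the splitting step can be sketched as follows. This is only a sketch: the function name and the sparse dictionary representation are our own assumptions, not the authors' code.

```python
def split_into_binary_submatrices(ratings, rating_scale=(1, 2, 3, 4, 5)):
    """Split (user, item, rating) triples into one binary sub-matrix per
    rating value: entry (u, i) of sub-matrix r is 1 iff user u rated item
    i with exactly r. Sub-matrices are stored sparsely as {(u, i): 1}
    dicts, one per rating value, ready for per-rating binary training."""
    submatrices = {r: {} for r in rating_scale}
    for u, i, r in ratings:
        submatrices[r][(u, i)] = 1
    return submatrices

# Toy example: 2 users, 3 items, ratings on the 1-5 scale.
subs = split_into_binary_submatrices([(0, 0, 5), (0, 1, 3), (1, 0, 5), (1, 2, 1)])
```

Each sub-matrix then serves as the 0/1 target for one of the parallel two-tower models, and the per-rating outputs are later aggregated into normalized reliability values.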
where $\mathbf{W}_n$ and $\mathbf{b}_n$ denote the weight matrix and bias vector of the $n$th hidden layer, respectively.

Step 3: In the output layer (MLP layer $X$), we obtain the latent embedding vector of user $u$, which serves as the latent vector $\mathbf{p}_u$ of user $u$:

$\mathbf{p}_u = \mathbf{l}_X = \mathrm{ReLU}(\mathbf{W}_X^T \mathbf{l}_{X-1} + \mathbf{b}_X).$  (3)

product of vectors, respectively; the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$ is used to limit the output to the range $(0, 1)$.

In addition, selecting an appropriate objective function for learning the model parameters is important. Let $\ell(\cdot)$ be a loss function and $\Omega(\Theta)$ the regularizer. The generic objective function is defined as:

$\Phi = \sum_{u \in U} \sum_{i \in I} \ell\left(y_{ui}, \hat{y}_{ui}\right) + \lambda \Omega(\Theta).$  (5)

3.2. Learning the model

For training recommender systems, two types of objective functions, point-wise and pair-wise, are widely used to learn models and obtain optimal parameters. Point-wise objective functions concentrate on predicting accurate and reliable ratings, which is well-suited to rating prediction tasks (Kabbur, Ning, & Karypis, 2013). Pair-wise objective functions pay attention to the relative order of predicted items, which is more suitable for top-N recommendation (He et al., 2017; He, Zhang, Kan, & Chua, 2016). In this paper, we mainly focus on rating prediction, so a point-wise objective function is used to optimize the model.

Squared loss is widely used in existing point-wise functions (He et al., 2016; Wang, Fu, Hao, Tao, & Wu, 2016) and is more suitable for explicit feedback than implicit feedback. It is formulated as

$\ell_{sl} = \sum_{u \in U} \sum_{i \in I} w_{ui} \left(y_{ui} - \hat{y}_{ui}\right)^2,$  (6)

where $w_{ui}$ is a hyper-parameter denoting the weight of the training instance $(u, i)$. Since the squared loss rests on the assumption that observations obey a Gaussian distribution (Mnih & Salakhutdinov, 2007), it may not match binary data well. Therefore, we use the log loss below (Kabbur et al., 2013)

$\ell_{ls} = -\sum_{u \in U} \sum_{i \in I} \left[ y_{ui} \log \hat{y}_{ui} + (1 - y_{ui}) \log(1 - \hat{y}_{ui}) \right],$  (7)

to pay special attention to the binary property of implicit feedback in this paper.

3.3. Predicting rating with reliability probability

By independently training the MLP model for each sub-matrix, we obtain a set of probability distributions $\hat{y}^r$ ($r = r_1, r_2, \ldots, r_{max}$, where $r_{max}$ denotes the maximum of the rating scale) for the prediction ratings. Next, we normalize the probabilities of the predictions obtained from the sub-matrices so that they sum to one. For the probability $\hat{y}^r_{ui}$ of the prediction rating of user $u$ on unrated item $i$, the normalized probability $\tilde{y}^r_{ui}$ that user $u$ assigns rating $r$ to item $i$ is calculated by

$\tilde{y}^r_{ui} = \hat{y}^r_{ui} \Big/ \sum_{r=1}^{r_{max}} \hat{y}^r_{ui}.$  (8)

In this way, the probability value $\tilde{y}^r_{ui}$ also represents the reliability we have in predicting $r$. Then, we determine the rating value

Therefore, in recommender systems, we can set a reliability threshold $\theta > 0$ to filter out unreliable prediction ratings: if the reliability probability $\tilde{y}^r_{ui}$ of a prediction rating $p_{ui}$ is less than the threshold $\theta$, the rating $p_{ui}$ is considered unreliable; otherwise, it is reliable.

4. Experiments

4.1. Experimental environment

All experiments are conducted in the same operating environment. The specific configuration is as follows:

• Operating system: Windows 10
• CPU: Intel(R) Xeon(R) Gold 5218B @ 2.30 GHz
• Primary memory: 384 GB
• Development platform: PyCharm
• Development language: Python 3.8

4.2. Dataset preparation

We run experiments on four public datasets: two MovieLens datasets, ML-100K and ML-1M, and two Amazon product datasets, Apps for Android (AA) and Movies & TV (MT). These datasets consist of user IDs, item IDs, and ratings on a 1–5 scale. Table 1 shows the basic statistics of these datasets.

Table 1
Statistics for the four public datasets.

Dataset            #User     #Item    #Rating     Sparsity level
ML-100K              943      1682      100,000   6.305%
ML-1M               6040      3952    1,000,209   4.190%
Apps for Android  87,271    13,209      752,937   0.065%
Movies & TV      123,960    50,052    1,697,533   0.027%

To test the effectiveness of the proposed framework, each dataset is divided into two parts: a training set formed by randomly selecting 80% of the records from each user's rated items, and a testing set consisting of the remaining 20% of the data.

4.3. Evaluation metrics

To evaluate the performance of item recommendation, we choose two evaluation metrics: Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG). These metrics are introduced as follows.

HR measures whether an actual recommendation item in the testing set appears in the top-$k$ predicted recommendation list, indicating the item recommendation ability of a model. It is defined as

$\mathrm{HR}@k = \frac{1}{m} \sum_{i=1}^{m} hits(i),$  (10)

where $m$ denotes the number of users in the system and $hits(i)$ represents the proportion of the top-$k$ predicted recommendation items for the $i$th user that fall in the set of actual recommendation items.

NDCG accounts for the position of each hit by assigning higher scores to hits at top ranks, reflecting the item ranking quality of a model. It is formulated as

$\mathrm{NDCG}@k = \frac{\mathrm{DCG}@k}{\mathrm{IDCG}@k},$  (11)
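The metrics in Eqs. (10) and (11) can be computed per user as in the sketch below, using the common binary-relevance convention; the helper names are our own, and system-level scores are obtained by averaging the per-user values over all users.

```python
import math

def hr_at_k(recommended, relevant, k):
    """Hit Ratio@k for one user: 1 if any relevant test item appears in
    the top-k recommendation list, else 0 (binary convention)."""
    return int(any(item in relevant for item in recommended[:k]))

def ndcg_at_k(recommended, relevant, k):
    """NDCG@k for one user: DCG over the top-k list with binary gains,
    normalized by the ideal DCG (all relevant items ranked first)."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(recommended[:k])
              if item in relevant)
    idcg = sum(1.0 / math.log2(rank + 2)
               for rank in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0
```

For example, with the ranked list ["a", "b", "c"] and a single relevant item "b", HR@3 is 1 and NDCG@3 is 1/log2(3), about 0.63, because the hit occurs at the second rank.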
Table 2
Parameters of normal distribution with different user preferences (mean, variance).

Class  ML-100K      ML-1M        AA           MT
VW     (0.05, 0.4)  (0.05, 0.4)  (0.05, 0.1)  (0.05, 0.1)
W      (0.05, 0.4)  (0.05, 0.4)  (0.05, 0.1)  (0.05, 0.1)
A      (0.1, 0.4)   (0.1, 0.4)   (0.1, 0.1)   (0.1, 0.1)
S      (0.2, 0.4)   (0.2, 0.4)   (0.2, 0.1)   (0.2, 0.1)
VS     (0.2, 0.4)   (0.2, 0.4)   (0.2, 0.1)   (0.2, 0.1)
U      (0.1, 0.4)   (0.1, 0.4)   (0.1, 0.1)   (0.1, 0.1)
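The surrounding description of the preference-based filling method is not part of this excerpt, so the sketch below covers only the sampling step suggested by Table 2: drawing a value from the normal distribution configured for a user's preference class. The dictionary and function names are our own assumptions; note that Python's `random.gauss` takes a standard deviation, hence the square root of the variance.

```python
import random

# (mean, variance) per preference class for ML-100K, taken from Table 2.
# How sampled values are mapped onto filled ratings is described in the
# paper's full text, so only the sampling step is sketched here.
TABLE2_ML100K = {
    "VW": (0.05, 0.4), "W": (0.05, 0.4), "A": (0.1, 0.4),
    "S": (0.2, 0.4), "VS": (0.2, 0.4), "U": (0.1, 0.4),
}

def sample_fill_value(user_class, params=TABLE2_ML100K, rng=random):
    """Draw one value from N(mean, variance) for the given preference
    class; random.gauss expects a standard deviation, i.e. sqrt(var)."""
    mean, var = params[user_class]
    return rng.gauss(mean, var ** 0.5)
```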
Table 4
Performance comparison on four datasets.
Dataset ML-100K ML-1M AA MT
Method HR@5 NDCG@5 Time/s HR@5 NDCG@5 Time/s HR@5 NDCG@5 Time/s HR@5 NDCG@5 Time/s
MF 0.390 0.418 2 0.389 0.396 20 0.209 0.305 17 0.265 0.362 55
BeMF 0.396 0.434 4 0.446 0.428 51 0.217 0.320 42 0.381 0.431 93
NCF 0.433 0.484 8 0.457 0.481 62 0.254 0.369 81 0.307 0.419 277
DMF 0.386 0.430 24 0.388 0.429 541 0.232 0.338 1625 0.282 0.385 4249
JNCF 0.480 0.498 12 0.489 0.500 128 0.265 0.383 385 0.319 0.434 1851
HCCF 0.472 0.497 8 0.444 0.460 221 0.251 0.364 126 0.294 0.401 535
SGCCL 0.469 0.488 11 0.467 0.491 210 0.295 0.427 216 0.332 0.456 600
Ours 0.497 0.519 5 0.490 0.521 23 0.365 0.425 45 0.392 0.457 160
(1) Method validation. As seen in Table 4, our method performs better than the comparison methods on the metrics HR@5 and NDCG@5 in almost all cases. Among the comparison methods, MF is the most basic linear CF method; it has the worst recommendation performance because it cannot capture the complex relationships between users and items as the state-of-the-art methods do, but its simplicity gives it the shortest model training time on all datasets. DMF and NCF are classic neural network-based CFs, so their performance lags behind the newer JNCF because of less sufficient feature extraction, although NCF outperforms DMF on all datasets. In particular, DMF has the slowest runtime of all comparison methods. The reliability-based MF method BeMF obtains higher recommendation accuracy than other methods on datasets with a large share of high ratings (≥ 3), such as the Amazon datasets, because BeMF tends to predict high ratings for items; its performance is nevertheless still worse than that of the neural network-based non-linear CFs. The closest competitor, JNCF, achieves the best evaluation results among the comparison methods on the relatively dense MovieLens datasets because its deep interactive joint neural network structure can fully extract the complex feature representations of users and items. We therefore use JNCF as the benchmark on the dense datasets to present the advantages of our framework. On the two MovieLens datasets, our scheme improves NDCG@5 and HR@5 by 4.22% and 3.54% on ML-100K, and by 4.20% and 0.21% on ML-1M, respectively. SGCCL, which beats the other graph-based model HCCF on the recommendation metrics, is the closest to our method on the relatively sparse Amazon datasets, but its time cost is much larger than ours, illustrating that our method achieves a better balance between accuracy and efficiency. Compared with SGCCL, our method improves the two accuracy metrics by 10.39% on average. This indicates that our scheme has better evaluation results on the sparse Amazon datasets than on the dense MovieLens datasets. The improved performance is due to our proposed data pre-processing method, which addresses the unbalanced data that can hurt prediction accuracy. This method, combined with the ability to remove unreliable predictions and fill in popular reliable items, makes our scheme particularly effective in sparse environments. As a result, our framework is better suited to such environments than the other comparison methods.

(2) Framework validation. To demonstrate the utility of the proposed basic framework, we conduct an ablation study to observe the impact on the performance of several representative comparison methods outlined in Section 4.6 when they are integrated into our framework. Here, we compare four versions of the selected models: with and without pre-processing, and with and without reliability. The findings on two datasets with different sparsity levels (ML-100K and AA) are shown in Table 5; all comparison methods using our proposed framework (with pre-processing and reliability) achieve better performance in all cases. Notably, the graph-based models HCCF and SGCCL obtain the most favorable recommendation results in our framework. The relative improvements of our model with pre-processing are 1.48% and 5.81% on ML-100K and AA, respectively. The recommendation accuracy of the other models with pre-processing shows a similar trend of improvement, and the DL-based models perform better, justifying the usefulness of our data pre-processing method for initializing models. Models with reliability remove unreliable predictions according to the reliability threshold and use the proposed preference filling strategy to compensate for the rejected predictions. The overall performance of our model with reliability increases by about 2.42% (HR@5) and 20.49% (NDCG@5) compared with retaining unreliable predictions, and the other models are similarly enhanced. This shows the advantage of taking prediction reliability into account for recommendations. These findings provide empirical evidence for the rationality and effectiveness of applying our proposed framework to existing models.

Table 5
Performance comparison of methods using our basic framework (ML-100K, AA); each cell gives the (ML-100K, AA) values.

Framework            Method   With pre-processing               Without pre-processing
                              HR@5            NDCG@5            HR@5            NDCG@5
With Reliability     BeMF     (0.460, 0.339)  (0.467, 0.403)    (0.441, 0.335)  (0.451, 0.385)
(q = 0.1)            NCF      (0.497, 0.339)  (0.528, 0.402)    (0.481, 0.265)  (0.514, 0.380)
                     HCCF     (0.502, 0.322)  (0.523, 0.444)    (0.491, 0.318)  (0.503, 0.437)
                     SGCCL    (0.501, 0.422)  (0.513, 0.452)    (0.462, 0.389)  (0.475, 0.430)
                     Ours     (0.497, 0.365)  (0.519, 0.425)    (0.482, 0.352)  (0.512, 0.401)
Without Reliability  BeMF     (0.427, 0.244)  (0.449, 0.353)    (0.425, 0.224)  (0.441, 0.325)
(q = 0)              NCF      (0.479, 0.261)  (0.510, 0.377)    (0.468, 0.256)  (0.496, 0.371)
                     HCCF     (0.487, 0.313)  (0.501, 0.429)    (0.485, 0.288)  (0.500, 0.372)
                     SGCCL    (0.498, 0.250)  (0.512, 0.365)    (0.460, 0.233)  (0.472, 0.344)
                     Ours     (0.476, 0.284)  (0.509, 0.387)    (0.473, 0.258)  (0.505, 0.374)

5. Conclusion

In this paper, we present a deep neural network-based recommendation framework that provides reliability probabilities for prediction ratings. Like CFs, the proposed scheme relies solely on user rating information to obtain the prediction reliability, which ensures good generality and scalability. Moreover, to mitigate the uneven distribution of ratings across sub-models during training, we design a novel data pre-processing method that equalizes the training sample size of each sub-model, effectively enhancing the validity and fairness of model training. The main contribution is to eliminate predictions with low reliability probabilities and keep those with high reliability, improving the accuracy and reliability of the recommendation results. Experiments on four public datasets indicate that the proposed scheme is superior to the comparison methods in top-N recommendation; they also demonstrate its ability to make strong predictions on sparse datasets.

This study faced two major challenges. The first was the extremely uneven rating distribution of the experimental datasets, which caused an unequal number of ratings to be trained in each sub-model and reduced the accuracy and reliability of model training. To address this, we designed the data pre-processing method and verified experimentally that it improves the effectiveness of the models used in our basic framework. The second was that filtering out unreliable predictions, while improving the reliability of the recommendation results, leaves fewer and fewer items available for recommendation, which can fail to meet users' recommendation needs. To handle this issue, we proposed a filling method
based on normal distribution, which added samples to ensure that the number of prediction ratings tested before and after filtering remained constant, improving the recommendation ability.

The limitations of this study are that, on the one hand, although the rating-filling method achieves good recommendation results, it has a relatively high computational cost and a certain randomness. On the other hand, our framework is only applicable to RSs with explicit feedback (user ratings) and currently does not work for implicit feedback (e.g., clicks and purchases).

There are two areas for potential future work. First, our scheme currently considers only the absolute rating values of users and disregards user preference biases; we therefore plan to reclassify the sub-matrices by taking user preference features into account, so as to enhance the accuracy of prediction results. Second, we will design a new scheme, such as using a soft-argmax function in our framework, to automatically learn unequal weights for the sub-models, regardless of the unbalanced number of ratings in each sub-matrix.

CRediT authorship contribution statement

Jiangzhou Deng: Conceptualization, Investigation, Formal analysis, Writing – original draft, Writing – review & editing, Funding acquisition, Project administration. Hongtao Li: Software, Resources, Data curation, Formal analysis, Writing – original draft, Writing – review & editing, Visualization. Junpeng Guo: Conceptualization, Validation, Writing – review & editing. Leo Yu Zhang: Validation, Writing – review & editing. Yong Wang: Validation, Writing – review & editing, Funding acquisition, Supervision.

Data availability

Data will be made available on request.

References

Ahmadian, S., Afsharchi, M., & Meghdadi, M. (2019). A novel approach based on multi-view reliability measures to alleviate data sparsity in recommender systems. Multimedia Tools and Applications, 78(13), 17763–17798.
Azadjalal, M. M., Moradi, P., Abdollahpouri, A., & Jalili, M. (2017). A trust-aware recommendation method based on Pareto dominance and confidence concepts. Knowledge-Based Systems, 116, 130–143.
Bobadilla, J., Gutiérrez, A., Ortega, F., & Zhu, B. (2018). Reliability quality measures for recommender systems. Information Sciences, 442, 145–157.
Caselles-Dupré, H., Lesaint, F., & Royo-Letelier, J. (2018). Word2Vec applied to recommendation: Hyperparameters matter. In Proceedings of the 12th ACM conference on recommender systems (pp. 352–356).
Chen, W., Cai, F., Chen, H., & Rijke, M. D. (2019). Joint neural collaborative filtering for recommender systems. ACM Transactions on Information Systems (TOIS), 37(4), 1–30.
Deng, Z.-H., Huang, L., Wang, C.-D., Lai, J.-H., & Philip, S. Y. (2019). DeepCF: A unified framework of representation learning and matching function learning in recommender system. In Proceedings of the AAAI conference on artificial intelligence. Vol. 33. No. 01 (pp. 61–68).
Duan, R., Jiang, C., & Jain, H. K. (2022). Combining review-based collaborative filtering and matrix factorization: A solution to rating's sparsity problem. Decision Support
He, X., Zhang, H., Kan, M.-Y., & Chua, T.-S. (2016). Fast matrix factorization for online recommendation with implicit feedback. In Proceedings of the 39th international ACM SIGIR conference on research and development in information retrieval (pp. 549–558).
Hernando, A., Bobadilla, J., Ortega, F., & Tejedor, J. (2013). Incorporating reliability measurements into the predictions of a recommender system. Information Sciences, 218, 1–16.
Huang, P.-S., He, X., Gao, J., Deng, L., Acero, A., & Heck, L. (2013). Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM international conference on information & knowledge management (pp. 2333–2338).
Jurik, M. (1992). Neurocomputing: Foundations of research. Artificial Intelligence, 53(2–3).
Kabbur, S., Ning, X., & Karypis, G. (2013). FISM: Factored item similarity models for top-n recommender systems. In Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 659–667).
Kim, D., Park, C., Oh, J., Lee, S., & Yu, H. (2016). Convolutional matrix factorization for document context-aware recommendation. In Proceedings of the 10th ACM conference on recommender systems (pp. 233–240).
Koren, Y., Rendle, S., & Bell, R. (2022). Advances in collaborative filtering. Recommender Systems Handbook, 91–142.
Li, B., Guo, T., Zhu, X., Li, Q., Wang, Y., & Chen, F. (2023). SGCCL: Siamese graph contrastive consensus learning for personalized recommendation. In Proceedings of the sixteenth ACM international conference on web search and data mining (pp. 589–597).
Ma, Y., Geng, X., & Wang, J. (2020). A deep neural network with multiplex interactions for cold-start service recommendation. IEEE Transactions on Engineering Management, 68(1), 105–119.
Margaris, D., Vassilakis, C., & Spiliotopoulos, D. (2020). What makes a review a reliable rating in recommender systems? Information Processing & Management, 57(6), Article 102304.
Mesas, R. M., & Bellogín, A. (2020). Exploiting recommendation confidence in decision-aware recommender systems. Journal of Intelligent Information Systems, 54(1), 45–78.
Mnih, A., & Salakhutdinov, R. R. (2007). Probabilistic matrix factorization. In Advances in neural information processing systems. Vol. 20.
Moradi, P., & Ahmadian, S. (2015). A reliability-based recommendation method to improve trust-aware recommender systems. Expert Systems with Applications, 42(21), 7386–7398.
Ni, J., Huang, Z., Cheng, J., & Gao, S. (2021). An effective recommendation model based on deep representation learning. Information Sciences, 542, 324–342.
Ortega, F., Lara-Cabrera, R., González-Prieto, Á., & Bobadilla, J. (2021). Providing reliability in recommender systems through Bernoulli Matrix Factorization. Information Sciences, 553, 110–128.
Shen, R.-P., Zhang, H.-R., Yu, H., & Min, F. (2019). Sentiment based matrix factorization with reliability for recommendation. Expert Systems with Applications, 135, 249–258.
Su, Z., Zheng, X., Ai, J., Shang, L., & Shen, Y. (2019). Link prediction in recommender systems with confidence measures. Chaos: An Interdisciplinary Journal of Nonlinear Science, 29(8), Article 083133.
Sun, J., Guo, W., Zhang, D., Zhang, Y., Regol, F., Hu, Y., et al. (2020). A framework for recommending accurate and diverse items using bayesian graph convolutional neural networks. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 2030–2039).
Wang, M., Fu, W., Hao, S., Tao, D., & Wu, X. (2016). Scalable semi-supervised learning by efficient anchor graph regularization. IEEE Transactions on Knowledge and Data Engineering, 28(7), 1864–1877.
Wang, X., He, X., Wang, M., Feng, F., & Chua, T.-S. (2019). Neural graph collaborative filtering. In Proceedings of the 42nd international ACM SIGIR conference on research and development in information retrieval (pp. 165–174).
Werbos, P. J. (1988). Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1(4), 339–356.
Wu, X., Yuan, X., Duan, C., & Wu, J. (2019). A novel collaborative filtering algorithm of machine learning by integrating restricted Boltzmann machine and trust
Systems, 156, Article 113748. information. Neural Computing and Applications, 31(9), 4685–4692.
Fuentes, O., Parra, J., Anthony, E. Y., & Kreinovich, V. (2017). Why rectified Xia, L., Huang, C., Xu, Y., Zhao, J., Yin, D., & Huang, J. (2022). Hypergraph contrastive
linear neurons are efficient: Symmetry-based, complexity-based, and fuzzy-based collaborative filtering. In Proceedings of the 45th international ACM SIGIR conference
explanations. on research and development in information retrieval (pp. 70–79).
Gohari, F. S., Aliee, F. S., & Haghighi, H. (2018). A new confidence-based recom- Xu, R., Li, J., Li, G., Pan, P., Zhou, Q., & Wang, C. (2022). SDNN: Symmetric deep
mendation approach: Combining trust and certainty. Information Sciences, 422, neural networks with lateral connections for recommender systems. Information
21–50. Sciences, 595, 217–230.
Guo, J., Deng, J., Ran, X., Wang, Y., & Jin, H. (2021). An efficient and accurate recom- Xue, H.-J., Dai, X., Zhang, J., Huang, S., & Chen, J. (2017). Deep matrix factorization
mendation strategy using degree classification criteria for item-based collaborative models for recommender systems. In Proceedings of the twenty-sixth international joint
filtering. Expert Systems with Applications, 164, Article 113756. conference on artificial intelligence. Vol. 17 (pp. 3203–3209). Melbourne, Australia.
Guo, G., Zhang, J., & Thalmann, D. (2014). Merging trust in collaborative filtering to Xue, F., He, X., Wang, X., Xu, J., Liu, K., & Hong, R. (2019). Deep item-based
alleviate data sparsity and cold start. Knowledge-Based Systems, 57, 57–68. collaborative filtering for top-n recommendation. ACM Transactions on Information
He, X., Liao, L., Zhang, H., Nie, L., Hu, X., & Chua, T.-S. (2017). Neural collaborative Systems (TOIS), 37(3), 1–25.
filtering. In Proceedings of the 26th international conference on world wide web (pp. Yu, T., Guo, J., Li, W., Wang, H. J., & Fan, L. (2019). Recommendation with diversity:
173–182). An adaptive trust-aware model. Decision Support Systems, 123, Article 113073.
J. Deng et al. Computers & Industrial Engineering 185 (2023) 109627
Zhang, S., Yao, L., Sun, A., & Tay, Y. (2019). Deep learning based recommender system: A survey and new perspectives. ACM Computing Surveys, 52(1), 1–38.
Zhu, B., Ortega, F., Bobadilla, J., & Gutiérrez, A. (2018). Assigning reliability values to recommendations using matrix factorization. Journal of Computational Science, 26, 165–177.
Zhu, Y., Lin, Q., Lu, H., Shi, K., Qiu, P., & Niu, Z. (2021). Recommending scientific paper via heterogeneous knowledge embedding based attentive recurrent neural networks. Knowledge-Based Systems, 215, Article 106744.
Ziarani, R. J., & Ravanmehr, R. (2021). Deep neural network approach for a serendipity-oriented recommendation system. Expert Systems with Applications, 185, Article 115660.