Bayesian Probabilistic Matrix Factorization with Social Relations and Item Contents for recommendation
Juntao Liu a,b, Caihua Wu c, Wenyu Liu a,⁎

a Department of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
b Department of Computer Engineering, Mechanical Engineering Institute, Shijiazhuang 050003, China
c Information Combat Commanding Teaching and Research Section, Information Countermeasure Department, Air Force Radar Academy, Wuhan 430010, China
Article history: Received 3 September 2012; Received in revised form 27 March 2013; Accepted 4 April 2013; Available online xxxx

Keywords: Recommendation system; Collaborative filtering; Social network; Item contents; Matrix factorization; Tags

Abstract

Recommendation systems have received great attention for their commercial value in today's online business world. However, most recommendation systems encounter the data sparsity problem and the cold-start problem. To improve recommendation accuracy in this circumstance, additional sources of information about the users and items should be incorporated in recommendation systems. In this paper, we modify the model in Bayesian Probabilistic Matrix Factorization, and propose two recommendation approaches fusing social relations and item contents with user ratings in a novel way. The proposed approaches are computationally efficient and can be applied to trust-aware or content-aware recommendation systems with very large datasets. Experimental results on three real-world datasets show that our method gets more accurate recommendation results with faster converging speed than other matrix factorization based methods. We also verify our method in cold-start settings, where our method gets more accurate recommendation results than the compared approaches.

Crown Copyright © 2013 Published by Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.dss.2013.04.002
Please cite this article as: J. Liu, et al., Bayesian Probabilistic Matrix Factorization with Social Relations and Item Contents for recommendation,
Decision Support Systems (2013), http://dx.doi.org/10.1016/j.dss.2013.04.002
of Bayesian Probabilistic Matrix Factorization (BPMF) [23] in a novel way which is different from regularization-based methods and factorization-based methods, and propose two novel recommendation methods cooperating with social relations and item contents.

In BPMF, the hyperparameters of the user feature vectors are the same for all users. But in practice, users' preferences vary greatly, so the hyperparameters of different users should be different. Generating user feature vectors from uniform user hyperparameters in BPMF may lead to some recommendation errors. In this paper we modify the model in BPMF, and suggest that the hyperparameters of different user vectors are different. We then propose the Bayesian Probabilistic Matrix Factorization with Social Relations (BPMFSR) recommendation method. To alleviate the data sparsity problem and the cold-start problem, we fuse social relations into this method. We argue that the posterior distribution of the user hyperparameters should be conditioned on the feature vectors of trusted users. The underlying assumption is that users' preferences may be influenced by their friends. On the other hand, similar to the uniform user hyperparameters, uniform item hyperparameters in BPMF also lead to some recommendation errors. To address this problem, we extend our BPMFSR method by fusing item contents information and propose an improved algorithm, Bayesian Probabilistic Matrix Factorization with Social Relations and Item Contents (BPMFSRIC). In BPMFSRIC, we assume that the hyperparameters of different item vectors are different, and the posterior distribution of the item feature vector parameters is conditioned on the feature vectors of linked items. The links among items can be extracted from item contents information, such as tags, categories and properties. BPMFSR can be applied to trust-aware recommender systems. If item contents are additionally known, BPMFSRIC can improve recommendation accuracy further. The proposed method is computationally efficient, and can be applied to large-scale real-life datasets. Experimental results on the Douban dataset [16], Epinions dataset [17] and Last.fm dataset [4] show that the accuracy of our method outperforms other methods based on matrix factorization. We also verify our method in cold-start settings, where our method gets better results than other methods.

The rest of this paper is organized as follows. In Section 2, a survey of major recommendation methods based on matrix factorization is provided. Section 3 introduces PMF and BPMF briefly. The proposed method is described in Section 4. The experimental results are presented and analyzed in Section 5, followed by the conclusions and further work in Section 6.

2. Related work

Rating-based recommendation methods are generally divided into collaborative filtering (CF) methods and content-based (CB) methods [19]. Matrix factorization based methods are one kind of collaborative filtering method. In this section we review several recommendation methods based on matrix factorization.

Lim and Teh [10] proposed Variational Bayesian based Matrix Factorization (VBMF) for movie recommendation. Nakajima and Sugiyama [18] analyzed VBMF theoretically. Probabilistic Matrix Factorization (PMF), proposed by Salakhutdinov and Mnih [22], models the predictive error of matrix factorization as a Gaussian distribution. A gradient descent algorithm is used to find a local maximum of the posterior probability over the user and item latent matrices. PMF gets accurate results on the Netflix dataset. The main shortcoming of PMF is that careful parameter tuning is needed to avoid overfitting, which leads to high computational complexity on large datasets. Bayesian Probabilistic Matrix Factorization (BPMF) [23] overcomes this drawback by using a Markov Chain Monte Carlo (MCMC) method and gets more accurate predictive results. As far as we know, BPMF outperforms most of the recommendation methods based on matrix factorization. Shan and Banerjee [25] extended PMF and BPMF and proposed a series of generalized PMF (GPMF) methods. Porteous et al. [20] fused side information into the BPMF model. In their model, the ratings are estimated by the product of the user latent matrix and the item matrix and the regression of user and item side information. Adams et al. [1] modified the BPMF model. The observed rating matrix, the user latent matrix and the item latent matrix are represented by time-varying functions, and the variation processes of the user latent matrix and item matrix are represented by Gaussian processes. Lu et al. [11] proposed a matrix factorization based recommendation method to predict the variation of ratings with time. Two regularization terms, a spatial term and a temporal term, are added into the objective function. Gemulla et al. [5] proposed a stratified stochastic gradient descent (SSGD) algorithm to solve the general matrix factorization problem, and gave sufficient conditions for convergence. Luo et al. [12] proposed an incremental collaborative filtering recommendation method based on regularized matrix factorization.

Generally, the main challenges for recommendation systems are the data sparsity problem and the cold-start problem [2]. To address these problems, in recent years researchers have proposed matrix factorization based recommendation methods that fuse social relations among users with rating data, which can help to improve the performance of recommender systems. These methods can be divided into two types: regularization-based methods and factorization-based methods.

Regularization-based methods typically add a regularization term to the loss function and minimize it. For example, the recommendation method proposed by Hao Ma et al. [16] adds a social regularization term to the loss function, which measures the difference between the latent feature vector of a user and those of his (or her) friends. A local minimum of the loss function is found by a gradient-based method. Jamali and Ester [6] proposed a probability model similar to model 1 in [16]. The relation regularized matrix factorization (RRMF) [9] method proposed by Li and Yeung adds the graph Laplacian regularization term of social relations into the loss function and minimizes the loss function by an alternating projection algorithm. Zhu et al. [28] used the same model as [9] and built the graph Laplacian of social relations using three kinds of kernel functions. The minimization problem is formulated as a low-rank semidefinite program (LRSDP) and is solved by the method proposed in [3]. Regularization-based methods always minimize the difference between the latent feature vector of a user and those of his (or her) friends, and give weights to the regularization terms to trade off between factorization error and regularization terms. The weights should be tuned manually to avoid overfitting, so the drawback of this kind of method is the same as that of PMF.

In factorization-based methods, social relations are represented as a social relation matrix, which is factored as well as the rating matrix. The loss function is the weighted sum of the social relation matrix factorization error and the rating matrix factorization error. For example, SoRec [13] factorizes the social relation matrix and the rating matrix simultaneously. The social relation matrix is approximated as the product of the user latent feature matrix and the factor feature matrix. SoRec can also be extended to fuse social tags and item tags with rating information [15]. Yuan et al. [27] argued that factorization-based methods outperform regularization-based methods for fusing membership information, and proposed a method fusing membership and friendship by factorization and regularization, respectively. Factorization-based methods encounter the same problem as regularization-based methods: the weight of the social relation matrix factorization error and the weight of the rating matrix factorization error should be tuned to avoid overfitting, which is computationally expensive especially on large-scale datasets. To avoid parameter tuning, Singh and Gordon proposed Hierarchical Bayesian Collective Matrix Factorization (HBCMF) [26], in which two relation matrices are factored. The model of HBCMF is very similar to that of BPMF, and a block Metropolis–Hastings algorithm is used to sample from this model.

In this paper, we propose two novel recommendation methods cooperating with social relations and item contents. Our methods are different from previous methods because the way we use social relations and item contents is neither factorization-based nor regularization-based. To fuse social relations and item contents, we modify the
model in BPMF. The differences between our methods and PMF and BPMF are shown in Fig. 1. Our methods not only avoid parameter tuning, just as BPMF does, but also improve recommendation accuracy and convergence speed.

3. Preliminaries

vectors. D is the dimension of the user feature vectors and item feature vectors, which is much less than M and N. In the Probabilistic Matrix Factorization (PMF) [22] method, the conditional probability of the observed rating matrix R is modeled as:

$$p(R \mid U, V, \sigma^2) = \prod_{i=1}^{M} \prod_{j=1}^{N} \left[\mathcal{N}\left(R_{ij} \mid U_i^T V_j, \sigma^2\right)\right]^{I_{ij}} \qquad (1)$$
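As a concrete reading of Eq. (1), the sketch below evaluates the corresponding log-likelihood over the observed entries with numpy. The matrices, the observation mask and the noiseless ratings are made-up toy data, not the paper's experimental setup.

```python
import numpy as np

def pmf_log_likelihood(R, I, U, V, sigma2):
    """Log of Eq. (1): sum over observed entries (I_ij = 1) of the Gaussian
    log-density log N(R_ij | U_i^T V_j, sigma2)."""
    pred = U @ V.T                                   # (M, N) inner products U_i^T V_j
    log_dens = -0.5 * (np.log(2.0 * np.pi * sigma2) + (R - pred) ** 2 / sigma2)
    return float(np.sum(I * log_dens))               # unobserved entries drop out

# Toy example: M = 4 users, N = 3 items, D = 2 latent dimensions
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 2))
V = rng.normal(size=(3, 2))
I = np.ones((4, 3))                                  # all entries observed
R = U @ V.T                                          # noiseless ratings for the demo
ll_exact = pmf_log_likelihood(R, I, U, V, sigma2=1.0)
ll_worse = pmf_log_likelihood(R, I, U + 1.0, V, sigma2=1.0)
```

Since the demo ratings are noiseless, the true factors give a strictly higher likelihood than any perturbed factors, which is the quantity PMF's gradient ascent maximizes.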
Fig. 1. Graphical models for PMF (a), BPMF (b), BPMFSR (c) and BPMFSRIC (d). The main difference between BPMF and BPMFSR is that BPMFSR generates user hyperparameters separately for every user vector, while BPMF uses uniform user hyperparameters. BPMFSRIC not only generates user hyperparameters separately but also generates item hyperparameters separately.
BPMF assumes that the user hyperparameters $\Theta_U = \{\mu_U, \Lambda_U\}$ and item hyperparameters $\Theta_V = \{\mu_V, \Lambda_V\}$ follow a Gaussian–Wishart distribution. The prior distributions of the user hyperparameters and item hyperparameters are given by:

$$p(\Theta_U \mid \Theta_0) = p(\mu_U \mid \Lambda_U)\, p(\Lambda_U) = \mathcal{N}\left(\mu_U \mid \mu_0, (\beta_0 \Lambda_U)^{-1}\right) \mathcal{W}(\Lambda_U \mid W_0, \nu_0) \qquad (7)$$

$$p(\Theta_V \mid \Theta_0) = p(\mu_V \mid \Lambda_V)\, p(\Lambda_V) = \mathcal{N}\left(\mu_V \mid \mu_0, (\beta_0 \Lambda_V)^{-1}\right) \mathcal{W}(\Lambda_V \mid W_0, \nu_0) \qquad (8)$$

where $\mathcal{W}(\Lambda \mid W_0, \nu_0)$ is the Wishart distribution with $\nu_0$ degrees of freedom and a $D \times D$ scale matrix $W_0$, and $\Theta_0 = \{\mu_0, \nu_0, W_0\}$. The graphical model for BPMF is shown in Fig. 1(b). BPMF uses a Gibbs algorithm to sample the user and item feature vectors. It is assumed that the posterior distribution over user feature vector $U_i$, which is conditioned on the item feature matrix $V$, the observed rating matrix $R$ and the hyperparameters, is Gaussian:

$$p(U_i \mid R, V, \Theta_U, \alpha) = \mathcal{N}\left(U_i \mid \mu_i^*, (\Lambda_i^*)^{-1}\right) \propto \prod_{j=1}^{N} \left[\mathcal{N}\left(R_{ij} \mid U_i^T V_j, \alpha^{-1}\right)\right]^{I_{ij}} p(U_i \mid \mu_U, \Lambda_U) \qquad (9)$$

where $\Lambda_i^* = \Lambda_U + \alpha \sum_{j=1}^{N} \left[V_j V_j^T\right]^{I_{ij}}$ and $\mu_i^* = (\Lambda_i^*)^{-1}\left(\alpha \sum_{j=1}^{N} \left[V_j R_{ij}\right]^{I_{ij}} + \Lambda_U \mu_U\right)$.

The conditional distribution over the user hyperparameters, conditioned on the user feature matrix $U$, is given as:

$$p(\mu_U, \Lambda_U \mid U, \Theta_0) = \mathcal{N}\left(\mu_U \mid \mu_0^*, (\beta_0^* \Lambda_U)^{-1}\right) \mathcal{W}(\Lambda_U \mid W_0^*, \nu_0^*) \qquad (10)$$

4.1. Bayesian Probabilistic Matrix Factorization with Social Relations

It is unreasonable that the hyperparameters $\Theta_U$ are the same for different users in BPMF, which may cause some recommendation errors. To address this problem, we assume every user has its own hyperparameters. Under this assumption, Eq. (5) should be modified as:

$$p(U) = \prod_{i=1}^{M} \mathcal{N}\left(U_i \mid \mu_{U,i}, \Lambda_{U,i}^{-1}\right) \qquad (11)$$

where $\Theta_{U,i} = \{\mu_{U,i}, \Lambda_{U,i}\}$ are the hyperparameters for user feature vector $U_i$.

The posterior distribution over user feature vector $U_i$ described in Eq. (9) should also be modified. In fact, under our assumption the posterior distribution over user feature vector $U_i$ is conditioned on the item feature matrix $V$, the observed rating matrix $R$ and its own hyperparameters $\Theta_{U,i} = \{\mu_{U,i}, \Lambda_{U,i}\}$. The posterior distribution over $U_i$ is given as:

$$p(U_i \mid R, V, \Theta_{U,i}, \alpha) = \mathcal{N}\left(U_i \mid \mu_{U,i}^*, (\Lambda_{U,i}^*)^{-1}\right) \propto \prod_{j=1}^{N} \left[\mathcal{N}\left(R_{ij} \mid U_i^T V_j, \alpha^{-1}\right)\right]^{I_{ij}} p(U_i \mid \mu_{U,i}, \Lambda_{U,i}) \qquad (12)$$

where $\Lambda_{U,i}^* = \Lambda_{U,i} + \alpha \sum_{j=1}^{N} \left[V_j V_j^T\right]^{I_{ij}}$ and $\mu_{U,i}^* = (\Lambda_{U,i}^*)^{-1}\left(\alpha \sum_{j=1}^{N} \left[V_j R_{ij}\right]^{I_{ij}} + \Lambda_{U,i} \mu_{U,i}\right)$.

Because a user's preference is influenced by his (or her) friends, we suppose that the conditional distribution over the user hyperparameters is
conditioned on the feature vectors of the user's friends. Under this assumption, Eq. (10) is modified accordingly (Eq. (13)).

4.2. Bayesian Probabilistic Matrix Factorization with Social Relations and Item Contents

Analogously, the conditional distribution over the hyperparameters of item $j$ is conditioned on the feature vectors of its linked items $C_j$, with Gaussian–Wishart parameters:

$$\mu_{V,j}^* = \frac{\beta_0 \mu_0 + N_j \bar{V}^{(j)}}{\beta_0 + N_j}, \quad \beta_{V,j}^* = \beta_0 + N_j, \quad \nu_{V,j}^* = \nu_0 + N_j,$$

$$W_{V,j}^* = \left(W_0^{-1} + N_j \bar{S}_{V,j} + \frac{\beta_0 N_j}{\beta_0 + N_j}\left(\mu_0 - \bar{V}^{(j)}\right)\left(\mu_0 - \bar{V}^{(j)}\right)^T\right)^{-1},$$

$$\bar{V}^{(j)} = \frac{1}{N_j} \sum_{k \in C_j} V_k, \quad \bar{S}_{V,j} = \frac{1}{N_j} \sum_{k \in C_j} \left(V_k - \bar{V}^{(j)}\right)\left(V_k - \bar{V}^{(j)}\right)^T, \quad N_j = |C_j|$$
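To make these updates concrete, here is a minimal numpy sketch (illustrative names and toy data, not the authors' implementation): it first computes the Gaussian–Wishart posterior parameters for one item from the feature vectors of its linked items, and then draws a feature vector from an Eq. (12)-style conditional Gaussian, assuming hyperparameters have already been sampled; the Wishart draw itself is elided.

```python
import numpy as np

def gw_posterior_params(V_linked, mu0, beta0, nu0, W0):
    """Updated Gaussian-Wishart parameters from the N_j linked feature
    vectors (rows of V_linked), following the update formulas above."""
    N_j = V_linked.shape[0]
    V_bar = V_linked.mean(axis=0)                        # \bar{V}^{(j)}
    diff = V_linked - V_bar
    S_bar = diff.T @ diff / N_j                          # \bar{S}_{V,j}
    mu_star = (beta0 * mu0 + N_j * V_bar) / (beta0 + N_j)
    beta_star = beta0 + N_j
    nu_star = nu0 + N_j
    d = (mu0 - V_bar).reshape(-1, 1)
    W_star = np.linalg.inv(np.linalg.inv(W0) + N_j * S_bar
                           + (beta0 * N_j / (beta0 + N_j)) * (d @ d.T))
    return mu_star, beta_star, nu_star, W_star

def sample_feature_vector(ratings, others, mu, Lam, alpha, rng):
    """Eq. (12)-style conditional draw from N(mu*, (Lam*)^{-1}) with
    Lam* = Lam + alpha * sum_k others_k others_k^T and
    mu*  = (Lam*)^{-1} (alpha * sum_k others_k ratings_k + Lam mu)."""
    Lam_star = Lam + alpha * others.T @ others
    cov = np.linalg.inv(Lam_star)
    mu_star = cov @ (alpha * others.T @ ratings + Lam @ mu)
    return rng.multivariate_normal(mu_star, cov)

# Toy run with D = 2: an item linked to N_j = 4 items, then one draw
rng = np.random.default_rng(1)
V_linked = rng.normal(size=(4, 2))
mu_s, beta_s, nu_s, W_s = gw_posterior_params(
    V_linked, mu0=np.zeros(2), beta0=2.0, nu0=2.0, W0=np.eye(2))
v_j = sample_feature_vector(ratings=np.array([4.0, 3.0, 5.0]),
                            others=rng.normal(size=(3, 2)),
                            mu=mu_s, Lam=np.eye(2), alpha=2.0, rng=rng)
```

The two steps mirror the structure of the Gibbs samplers described in Section 4: hyperparameter updates cost only O(D) per linked vector, while the feature-vector draws dominate the per-iteration cost.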
Table 1
Statistics of datasets.
This method is called Bayesian Probabilistic Matrix Factorization with Social Relations and Item Contents (BPMFSRIC). The graphical model for BPMFSRIC is shown in Fig. 1(d). In this model, item hyperparameters are generated for each item, as well as user hyperparameters. We also use a Gibbs sampler to sample the user feature vectors and item feature vectors, which is given in Algorithm 2. Similar to Algorithm 1, Algorithm 2 samples item hyperparameters for each item if there are more than D items linking to it.

Algorithm 2. Gibbs sampling for Bayesian Probabilistic Matrix Factorization with Social Relations and Item Contents (BPMFSRIC)

4.3. Complexity analysis

The main computation of Algorithms 1 and 2 is sampling user feature vectors using Eq. (12) and sampling item feature vectors using Eq. (15). The computational complexity of sampling all user feature vectors and sampling all item feature vectors is O(K), where K is the number of nonzero entries in the rating matrix R. So the computational complexity of one iteration of Algorithms 1 and 2 is O(K), which indicates that the computational complexity of our method is linear with respect to the number of observed ratings. This complexity analysis shows that our methods are very efficient and can scale up to very large datasets.

Compared with BPMF, the computational complexity of our methods is slightly higher in one iteration, because Algorithms 1 and 2 additionally sample user hyperparameters and item hyperparameters using Eqs. (13) and (16). However, the computational complexity of sampling all user hyperparameters and all item hyperparameters is much less than that of sampling the user feature vectors and item feature vectors. Furthermore, the experiments in Section 5 show that our methods converge faster than BPMF. Considering both the computational complexity of one iteration and the convergence speed, our methods can achieve the same recommendation accuracy in fewer iterations and in less time.

5. Experiments

In this section, we present the experimental results on three large-scale datasets to compare our method with other recommendation methods based on matrix factorization. We also verify our methods in cold-start settings.
Table 2
Predictive accuracy comparison on Douban Dataset (the values in brackets are the results reported in [16]).
Table 3
Predictive accuracy comparison on Epinions dataset.
5.1. Datasets

We evaluate our method on the Douban dataset [16], Epinions dataset [17] and Last.fm dataset [4]. The Douban (http://www.douban.com) dataset, crawled by Hao Ma et al. [16], contains 16,830,839 ratings of 129,490 users on 58,541 movies and 1,692,952 friend links between these users. For more details, please see [16].

The Epinions (http://www.epinions.com) dataset [17] contains 49,290 users and 139,783 items. These users issued 664,824 ratings and 487,181 trust statements. Note that the Epinions dataset collected by Massa and Avesani [17] used in this paper is different from that used in [16]; it contains more items, ratings and friend links, and fewer users.

The Last.fm (http://www.lastfm.com) dataset was released in the framework of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011) [4]. It contains 1892 users, 17,632 artists and 11,946 tags. Different from the Douban dataset and Epinions dataset, the Last.fm dataset only records the listening count of each user for each artist. Listening counts range from 1 to 352,698. To test our methods on Last.fm, we map listening counts into integer values of 1 to 5 to represent the extent of favor of artists, in a similar way to [7]. The mapping formula is given as:

$$r = \begin{cases} \lfloor \log_{10} l \rfloor + 1, & \text{if } \lfloor \log_{10} l \rfloor + 1 \le 5 \\ 5, & \text{otherwise} \end{cases} \qquad (17)$$

where $l$ is the listening count, $r$ is the mapped value, and $\lfloor \cdot \rfloor$ is the operator of rounding towards zero. To test the BPMFSRIC method, we build the links according to tags. If two artists received a same tag more than 5 times,
Fig. 2. Testing RMSE of all methods on the Douban dataset; the x-axis shows the number of epochs and the y-axis shows testing RMSE.
Fig. 3. Testing RMSE of all methods on the Epinions dataset (panels: a) 60% training data, D = 10; b) 80% training data, D = 10; c) 90% training data, D = 10); the x-axis shows the number of epochs and the y-axis shows testing RMSE.
we link these two artists. Last.fm contains 12,717 friend links, 92,834 listening counts and 186,479 tag statements.

The statistics of these datasets are summarized in Table 1.

We use Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) to measure the prediction accuracy of the recommendation methods. MAE is defined as:

$$\mathrm{MAE} = \frac{1}{T} \sum_{i,j} \left| R_{ij} - \hat{R}_{ij} \right| \qquad (18)$$

where $R_{ij}$ is the rating given by user $i$ for item $j$, and $\hat{R}_{ij}$ is the prediction of $R_{ij}$. $T$ is the total number of tested ratings. RMSE is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{i,j} \left( R_{ij} - \hat{R}_{ij} \right)^2} \qquad (19)$$

5.2. Comparisons

We compare Hao Ma's method [16] and BPMF [23] with our methods. Hao Ma's method is implemented using model 2 with PCC similarity (see [16] for details). It is reported that this implementation gets the highest prediction accuracy and outperforms NMF [8], PMF [22], RSTE [14] and other state-of-the-art methods. The initial solutions of our methods are the same as those of BPMF. In our methods, $\mu_0$, $\nu_0$ and $W_0$ are set to the same values as those in BPMF. The experiments are repeated 10 times, and the mean and standard deviation of MAE and RMSE are calculated.

For the Douban dataset, we randomly select 40%, 60% and 80% of the ratings as training data, and use the rest of the ratings to test the algorithms. For the Epinions dataset, we use 40%, 60%, 80% and 90% of the ratings as training data. Because the Douban dataset and Epinions dataset don't contain item contents information, which is needed by BPMFSRIC, we don't test BPMFSRIC on these two datasets. The experimental results are shown in Tables 2 and 3. We give the mean and standard deviation of RMSE
Table 4
Predictive accuracy comparison on Last.fm dataset.

D (dimension) Training data Metrics Hao Ma's method BPMF BPMFSR BPMFSRIC
10 40% MAE 0.4806 ± 0.0032 0.3359 ± 0.0018 0.3338 ± 0.0012 0.3341 ± 0.0007
RMSE 0.6936 ± 0.0047 0.4655 ± 0.0036 0.4613 ± 0.0028 0.4604 ± 0.0026
60% MAE 0.4539 ± 0.0070 0.3270 ± 0.0014 0.3261 ± 0.0015 0.3278 ± 0.0012
RMSE 0.6529 ± 0.0104 0.4502 ± 0.0023 0.4489 ± 0.0023 0.4502 ± 0.0021
80% MAE 0.4310 ± 0.0056 0.3234 ± 0.0012 0.3222 ± 0.0014 0.3237 ± 0.0016
RMSE 0.6210 ± 0.0080 0.4465 ± 0.0018 0.4449 ± 0.0021 0.4461 ± 0.0020
30 40% MAE 0.4849 ± 0.0020 0.3378 ± 0.0006 0.3354 ± 0.0004 0.3345 ± 0.0005
RMSE 0.6977 ± 0.0019 0.4686 ± 0.0018 0.4630 ± 0.0017 0.4596 ± 0.0013
60% MAE 0.4691 ± 0.0027 0.3283 ± 0.0016 0.3277 ± 0.0019 0.3286 ± 0.0017
RMSE 0.6708 ± 0.0038 0.4529 ± 0.0021 0.4515 ± 0.0022 0.4508 ± 0.0020
80% MAE 0.4492 ± 0.0013 0.3241 ± 0.0012 0.3235 ± 0.0012 0.3244 ± 0.0014
RMSE 0.6418 ± 0.0026 0.4467 ± 0.0024 0.4452 ± 0.0024 0.4451 ± 0.0026
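For concreteness, the listening-count mapping of Eq. (17) and the MAE/RMSE metrics of Eqs. (18) and (19) behind these tables can be sketched as follows (toy values, not the experimental pipeline):

```python
import math

def map_listening_count(l):
    """Eq. (17): r = floor(log10(l)) + 1, capped at 5."""
    return min(math.floor(math.log10(l)) + 1, 5)

def mae(actual, predicted):
    """Eq. (18): mean absolute error over the T tested ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Eq. (19): root mean square error over the T tested ratings."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Counts 1..9 map to 1, 10..99 to 2, ..., and very large counts cap at 5
counts = [1, 9, 10, 99, 100, 352698]
mapped = [map_listening_count(l) for l in counts]      # [1, 1, 2, 2, 3, 5]

# Made-up test ratings and predictions to exercise the two metrics
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 2.5]
```

Because RMSE squares the residuals, it penalizes large individual errors more heavily than MAE, which is why the two metrics can rank methods slightly differently.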
Fig. 4. Testing RMSE of all methods on the Last.fm dataset; the x-axis shows the number of epochs and the y-axis shows testing RMSE.
and MAE for each experiment. The best results that are statistically significant (at the 5% significance level) are set in bold. According to the results, it can be observed that our method outperforms Hao Ma's method and BPMF on the Epinions dataset in terms of improved RMSE and MAE. On the Douban dataset, in the cases of 60% and 80% training data with D = 10, BPMF gets lower RMSE than BPMFSR. The MAE and RMSE
Table 5
Predictive accuracy comparison in cold-start settings on Douban dataset.
Table 6
Predictive accuracy comparison in cold-start settings on Epinions dataset.
Fig. 5. Testing RMSE of all methods on the Douban dataset for the cold-start user setting; the x-axis shows the number of iterations and the y-axis shows testing RMSE.
generated by BPMF and BPMFSR on the Douban dataset are very close. The advantages of BPMFSR are more obvious on the Epinions dataset. The Epinions dataset is much sparser than the Douban dataset, so we can say that BPMFSR alleviates the data sparsity problem better than other methods. We also notice that increasing the user and item feature vector dimension doesn't improve the predictive accuracy of Hao Ma's method
Fig. 6. Testing RMSE of all methods on the Epinions dataset for the cold-start user setting; the x-axis shows the number of epochs and the y-axis shows testing RMSE.
Table 7
Predictive accuracy comparison in cold-start settings on Last.fm dataset.
D (dimension) Cold-start users/items Metrics Hao Ma's method BPMF BPMFSR BPMFSRIC
10 10%/10% MAE 0.4973 ± 0.0155 0.4213 ± 0.0183 0.4209 ± 0.0186 0.4178 ± 0.0169
RMSE 0.7150 ± 0.0180 0.6039 ± 0.0225 0.6009 ± 0.0239 0.5960 ± 0.0191
5%/5% MAE 0.4860 ± 0.0187 0.4127 ± 0.0183 0.4106 ± 0.0183 0.4082 ± 0.0198
RMSE 0.7027 ± 0.0232 0.5945 ± 0.0260 0.5885 ± 0.0276 0.5857 ± 0.0271
30 10%/10% MAE 0.4883 ± 0.0095 0.4174 ± 0.0105 0.4143 ± 0.0086 0.4131 ± 0.0087
RMSE 0.7091 ± 0.0101 0.6017 ± 0.0114 0.5972 ± 0.0100 0.5954 ± 0.0097
5%/5% MAE 0.4800 ± 0.0183 0.4093 ± 0.0165 0.4055 ± 0.0159 0.4033 ± 0.0160
RMSE 0.6970 ± 0.0211 0.5878 ± 0.0212 0.5831 ± 0.0199 0.5786 ± 0.0203
and BPMF, while BPMFSR gets more accurate results with higher dimension. This indicates that Hao Ma's method and BPMF may overfit in high dimension.

Note that the results reported in [16], shown in the brackets in Table 2, are slightly lower than the results obtained by our implementation. Because we cannot get the code used in [16], we attribute the discrepancy to some implementation details unmentioned in [16]. Although the results reported in [16] and those obtained by us are slightly different, we can draw the same conclusion that our method outperforms Hao Ma's method on the Douban dataset in the case of D = 10 even using the results reported in [16].

Figs. 2 and 3 show the testing RMSE generated by all methods at every epoch on the Douban dataset and Epinions dataset. It can be observed that Hao Ma's method overfits after several epochs, while BPMF and BPMFSR do not overfit at all. RMSE values generated by BPMFSR are higher than those generated by BPMF at the beginning, but after about 10 epochs BPMFSR outperforms BPMF. This indicates that BPMFSR converges faster than BPMF.

We compare BPMFSR and BPMFSRIC with Hao Ma's method and BPMF on the Last.fm dataset. Listening counts are mapped into 5-point ratings by Eq. (17). Tag statements are used to build links between items (artists): if two artists received a same tag more than 5 times, we link these two artists. We randomly select 40%, 60% and 80% of the ratings as training data, and use the rest of the ratings as testing data. The experiments are repeated 10 times, and the mean and standard deviation of MAE and RMSE are calculated. The experimental results are shown in Table 4. It can be observed that our methods get lower mean MAE and RMSE than Hao Ma's method and BPMF in all cases. When the dimension is high, BPMFSRIC gets lower mean MAE and RMSE than BPMFSR.

Fig. 4 shows the testing RMSE generated by all methods at every epoch on the Last.fm dataset. The trends of convergence are similar to those in Figs. 2 and 3. Furthermore, BPMFSRIC and BPMFSR converge faster than BPMF, especially in the case of 40% training data.

5.3. Comparisons in cold-start settings

We test our methods in cold-start settings. For the Douban dataset we randomly select 60% and 40% of the users as cold-start users. For the Epinions dataset, we randomly select 60%, 40%, 10% and 5% of the users as cold-start users. All the ratings stated by cold-start users are treated as testing
Fig. 7. Testing RMSE of all methods on the Last.fm dataset for the cold-start user setting (panels: a) 10% cold-start users and 10% cold-start items, D = 10; b) 10% cold-start users and 10% cold-start items, D = 30); the x-axis shows the number of epochs and the y-axis shows testing RMSE.
[5] R. Gemulla, E. Nijkamp, P. Haas, Y. Sismanis, Large-scale matrix factorization with distributed stochastic gradient descent, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2011, pp. 69–77.
[6] M. Jamali, M. Ester, A transitivity aware matrix factorization model for recommendation in social networks, Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, IJCAI'11, AAAI Press, 2011, pp. 2644–2649.
[7] O. Koyejo, J. Ghosh, A kernel-based approach to exploiting interaction-networks in heterogeneous information sources for improved recommender systems, Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, ACM, 2011, pp. 9–16.
[8] D. Lee, H. Seung, et al., Learning the parts of objects by non-negative matrix factorization, Nature 401 (6755) (1999) 788–791.
[9] W. Li, D. Yeung, Relation regularized matrix factorization, Proceedings of the 21st International Joint Conference on Artificial Intelligence, 2009, pp. 1126–1131.
[10] Y. Lim, Y. Teh, Variational Bayesian approach to movie rating prediction, Proceedings of KDD Cup and Workshop, Citeseer, 2007, pp. 15–21.
[11] Z. Lu, D. Agarwal, I. Dhillon, A spatio-temporal approach to collaborative filtering, Proceedings of the Third ACM Conference on Recommender Systems, ACM, 2009, pp. 13–20.
[12] X. Luo, Y. Xia, Q. Zhu, Incremental collaborative filtering recommender based on regularized matrix factorization, Knowledge-Based Systems 27 (2012) 271–280.
[13] H. Ma, H. Yang, M. Lyu, I. King, SoRec: social recommendation using probabilistic matrix factorization, Proceedings of the 17th ACM Conference on Information and Knowledge Management, ACM, 2008, pp. 931–940.
[14] H. Ma, I. King, M. Lyu, Learning to recommend with social trust ensemble, Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2009, pp. 203–210.
[15] H. Ma, T. Zhou, M. Lyu, I. King, Improving recommender systems by incorporating social contextual information, ACM Transactions on Information Systems (TOIS) 29 (2) (2011) 9:1–9:23.
[16] H. Ma, D. Zhou, C. Liu, M. Lyu, I. King, Recommender systems with social regularization, Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, ACM, 2011, pp. 287–296.
[17] P. Massa, P. Avesani, Trust-aware bootstrapping of recommender systems, ECAI 2006 Workshop on Recommender Systems, Citeseer, Riva del Garda, Italy, 2006, pp. 29–33.
[18] S. Nakajima, M. Sugiyama, Theoretical analysis of Bayesian matrix factorization, Journal of Machine Learning Research 12 (2011) 2583–2648.
[19] D.H. Park, H.K. Kim, I.Y. Choi, J.K. Kim, A literature review and classification of recommender systems research, Expert Systems with Applications 39 (11) (2012) 10059–10072.
[20] I. Porteous, A. Asuncion, M. Welling, Bayesian matrix factorization with side information and Dirichlet process mixtures, Proceedings of the 24th AAAI Conference on Artificial Intelligence, 2010, pp. 563–568.
[21] C. Robert, G. Casella, Monte Carlo Statistical Methods, Springer Texts in Statistics, Springer, 2004.
[22] R. Salakhutdinov, A. Mnih, Probabilistic matrix factorization, Advances in Neural Information Processing Systems 20, 2008, pp. 1257–1264.
[23] R. Salakhutdinov, A. Mnih, Bayesian probabilistic matrix factorization using Markov chain Monte Carlo, Proceedings of the 25th International Conference on Machine Learning, ACM, 2008, pp. 880–887.
[24] B. Sarwar, G. Karypis, J. Konstan, J. Reidl, Item-based collaborative filtering recommendation algorithms, Proceedings of the 10th International Conference on World Wide Web, ACM, 2001, pp. 285–295.
[25] H. Shan, A. Banerjee, Generalized probabilistic matrix factorizations for collaborative filtering, Proceedings of the 2010 IEEE 10th International Conference on Data Mining (ICDM), IEEE, 2010, pp. 1025–1030.
[26] A. Singh, G. Gordon, A Bayesian matrix factorization model for relational data, Proceedings of the Twenty-Sixth Annual Conference on Uncertainty in Artificial Intelligence (UAI-10), AUAI Press, Corvallis, Oregon, 2010, pp. 556–563.
[27] Q. Yuan, L. Chen, S. Zhao, Factorization vs. regularization: fusing heterogeneous social relationships in top-N recommendation, Proceedings of the Fifth ACM Conference on Recommender Systems, ACM, 2011, pp. 245–252.
[28] J. Zhu, H. Ma, C. Chen, J. Bu, Social recommendation using low-rank semidefinite program, AAAI, 2011, pp. 158–163.

Juntao Liu received the BS and MS degrees in Computer Science from Ordnance Engineering College, Shijiazhuang, China, in 2002 and 2005, respectively. He is a lecturer in the Department of Computer Engineering, Ordnance Engineering College. He is currently pursuing the Ph.D. degree in the Department of Electronics and Information Engineering, Huazhong University of Science and Technology. His research interests include data mining, machine learning and computer vision.

Caihua Wu received the BS, MS and PhD degrees in Computer Science from Ordnance Engineering College, Shijiazhuang, China, in 2003, 2006 and 2009, respectively. She is now a lecturer in the Department of Information Counterwork, Air Force Radar Academy. Her research interests include data mining, information counterwork and software engineering.

Wenyu Liu received the BS degree in Computer Science from Tsinghua University, Beijing, China, in 1986, and the MS and PhD degrees, both in Electronics and Information Engineering, from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1991 and 2001, respectively. He is now a professor and associate dean of the Department of Electronics and Information Engineering, HUST. His current research areas include computer graphics, multimedia information processing, and computer vision. He is a member of the IEEE Systems, Man and Cybernetics Society.