


Bayesian Probabilistic Matrix Factorization with Social Relations and Item Contents
for recommendation
Juntao Liu a, b, Caihua Wu c, Wenyu Liu a,⁎
a Department of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
b Department of Computer Engineering, Mechanical Engineering Institute, Shijiazhuang 050003, China
c Information Combat Commanding Teaching and Research Section, Information Countermeasure Department, Air Force Radar Academy, Wuhan 430010, China
⁎ Corresponding author. Tel.: +86 27 87543236. E-mail addresses: [email protected] (J. Liu), [email protected] (C. Wu), [email protected] (W. Liu).

ARTICLE INFO

Article history:
Received 3 September 2012
Received in revised form 27 March 2013
Accepted 4 April 2013
Available online xxxx

Keywords:
Recommendation system
Collaborative filtering
Social network
Item contents
Matrix factorization
Tags

ABSTRACT

Recommendation systems have received great attention for their commercial value in today's online business world. However, most recommendation systems encounter the data sparsity problem and the cold-start problem. To improve recommendation accuracy in this circumstance, additional sources of information about the users and items should be incorporated in recommendation systems. In this paper, we modify the model in Bayesian Probabilistic Matrix Factorization, and propose two recommendation approaches fusing social relations and item contents with user ratings in a novel way. The proposed approaches are computationally efficient and can be applied to trust-aware or content-aware recommendation systems with very large datasets. Experimental results on three real world datasets show that our method gets more accurate recommendation results with faster converging speed than other matrix factorization based methods. We also verify our method in cold-start settings, and our method gets more accurate recommendation results than the compared approaches.

Crown Copyright © 2013 Published by Elsevier B.V. All rights reserved.

1. Introduction

Recommendation systems have become an important research area in the past decade. Recommendation systems typically try to predict the interests of a user by collecting rating information of other users or items. Recommendation methods are generally divided into collaborative filtering (CF) methods and content-based (CB) methods [19]. In content-based recommendation methods, the rating of an item for a user is estimated based on the ratings of similar items for this user. Collaborative filtering methods try to predict the rating of an item for a particular user based on the previous ratings of this item rated by other similar users. The underlying assumption of collaborative filtering is that similar users have similar tastes. Collaborative filtering methods are widely used in large commercial systems, such as Amazon and Netflix.

Matrix factorization is one of the most popular collaborative filtering methods in recent years. It is assumed that the preference of a user can be represented by a small number of unobserved features. Formally, supposing that there are M users and N items, the M × N matrix R is the observed rating matrix. Matrix factorization based methods find an M × D user latent feature matrix U and an N × D item latent feature matrix V to minimize the loss function f_loss(R, R̂), which measures the difference between the observed rating matrix R and the predicted rating matrix R̂ = UV^T. Here D is the dimension of the user feature vectors and item feature vectors, and D is much less than M and N.
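As a concrete illustration of this setup, the sketch below (ours, not from the paper; the sizes and values are invented) predicts a single rating as the inner product of a user row of U and an item row of V:

```python
import numpy as np

M, N, D = 5, 7, 3                  # toy sizes: 5 users, 7 items, D = 3 latent features
rng = np.random.default_rng(0)

U = rng.normal(size=(M, D))        # user latent feature matrix, one row per user
V = rng.normal(size=(N, D))        # item latent feature matrix, one row per item

R_hat = U @ V.T                    # predicted rating matrix, shape (M, N)
print(R_hat[2, 4])                 # predicted rating of user 2 for item 4: U[2] . V[4]
```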
A number of algorithms have been proposed to solve matrix factorization for recommendation, such as Variational Bayesian Matrix Factorization (VBMF) [10], Probabilistic Matrix Factorization (PMF) [22], Bayesian Probabilistic Matrix Factorization (BPMF) [23], General Probabilistic Matrix Factorization (GPMF) [25] and so on. Same as other collaborative filtering methods, these methods encounter the data sparsity problem [2]. For a particular recommendation system, the density of the observed rating matrix is usually less than 1% [24]. In this case, it is difficult to find similar users or similar items. Another well-known problem in recommendation systems is the cold-start problem, that is, how to provide recommendations to new users who have expressed very few ratings. It is believed that social relations among users can alleviate these problems. For example, in our daily life, we often turn to our friends for recommendations. We share interests with close friends. Our tastes are also often affected by our friends. Cooperating with social relations in recommendation systems can improve recommendation accuracy [15]. In recent years, some recommendation methods fusing social relations by regularization [6,9,16,28] or factorization [13,15] were proposed. On the other hand, item contents, such as tags, categories and item profiles, also provide a huge opportunity to improve the accuracy of recommendation. For example, we may like movies performed by a same actor.

In this paper, to alleviate the data sparsity problem and the cold-start problem and to improve recommendation accuracy further, we integrate social relations and item contents into the framework of Bayesian Probabilistic Matrix Factorization (BPMF) [23] in a novel way which is different from regularization-based methods and factorization-based methods, and propose two novel recommendation methods cooperating with social relations and item contents. In BPMF, the hyperparameters of the user feature vectors are the same for all users. But in practice, users' preferences vary greatly, so the hyperparameters of different users should be different. Generating user feature vectors by uniform user hyperparameters in BPMF may lead to some recommendation errors. In this paper we modify the model in BPMF, and suggest that the hyperparameters of different user vectors are different. We then propose the Bayesian Probabilistic Matrix Factorization with Social Relations (BPMFSR) recommendation method. To alleviate the data sparsity problem and cold-start problem we fuse social relations in this method. We argue that the posterior distribution of the user hyperparameters should be conditioned on the feature vectors of trusted users. The underlying assumption is that a user's preference may be influenced by their friends. On the other hand, similar to the uniform user hyperparameters, uniform item hyperparameters in BPMF also lead to some recommendation errors. To address this problem, we extend our BPMFSR method by fusing item contents information and propose an improved algorithm, Bayesian Probabilistic Matrix Factorization with Social Relations and Item Contents (BPMFSRIC). In BPMFSRIC, we assume that the hyperparameters of different item vectors are different, and the posterior distribution of the item feature vector parameters is conditioned on the feature vectors of linked items. The links among items can be extracted from item contents information, such as tags, categories and properties. BPMFSR can be applied to trust-aware recommender systems. If item contents are known additionally, BPMFSRIC can improve recommendation accuracy further. The proposed method is computationally efficient, and can be applied to large-scale real life datasets. Experimental results on the Douban dataset [16], Epinions dataset [17] and Last.fm dataset [4] show that the accuracy of our method outperforms other methods based on matrix factorization. We also verify our method in cold-start settings, and our method gets better results than other methods.

The rest of this paper is organized as follows. In Section 2, a survey of major recommendation methods based on matrix factorization is provided. Section 3 introduces PMF and BPMF briefly. The proposed method is described in Section 4. The experimental results are presented and analyzed in Section 5, followed by the conclusions and further work in Section 6.

2. Related work

Rating-based recommendation methods are generally divided into collaborative filtering (CF) methods and content-based (CB) methods [19]. Matrix factorization based methods are one kind of collaborative filtering methods. In this section we review several recommendation methods based on matrix factorization.

Lim and Teh [10] proposed Variational Bayesian based Matrix Factorization (VBMF) for movie recommendation. Nakajima and Sugiyama [18] analyzed VBMF theoretically. Probabilistic Matrix Factorization (PMF), proposed by Salakhutdinov and Mnih [22], models the predictive error of matrix factorization as a Gaussian distribution. A gradient descent algorithm is used to find a local maximum of the posterior probability over the user and item latent matrices with parameters. PMF gets accurate results on the Netflix dataset. The main shortcoming of PMF is that careful parameter tuning is needed to avoid over fitting. This leads to high computational complexity on large datasets. Bayesian Probabilistic Matrix Factorization (BPMF) [23] overcomes this drawback by using a Markov Chain Monte Carlo (MCMC) method that gets more accurate predictive results. As far as we know, BPMF outperforms most of the recommendation methods based on matrix factorization. Shan and Banerjee [25] extended PMF and BPMF and proposed a series of general PMF (GPMF) methods. Porteous et al. [20] fused side information into the BPMF model. In their model, the ratings are estimated by the product of the user latent matrix and the item latent matrix and the regression of user and item side information. Adams et al. [1] modified the BPMF model. The observed rating matrix, the user latent matrix and the item latent matrix are represented by time-varying functions. The variation processes of the user latent matrix and item matrix are represented by Gaussian processes. Lu et al. [11] proposed a matrix factorization based recommendation method to predict the variation of ratings with time. Two regularization terms, a spatial term and a temporal term, are added into the objective function. Gemulla et al. [5] proposed a stratified stochastic gradient descent (SSGD) algorithm to solve the general matrix factorization problem, and gave sufficient conditions for convergence. Luo et al. [12] proposed an incremental collaborative filtering recommendation method based on regularized matrix factorization.

Generally, the main challenges for recommendation systems are the data sparsity problem and the cold-start problem [2]. To address these problems, in recent years, researchers proposed some matrix factorization based recommendation methods fusing social relations among users with rating data, which can help to improve the performance of recommender systems. These methods can be divided into two types: regularization-based methods and factorization-based methods.

Regularization-based methods typically add a regularization term to the loss function and minimize it. For example, the recommendation method proposed by Hao Ma et al. [16] adds a social regularization term to the loss function, which measures the difference between the latent feature vector of a user and those of his (or her) friends. A local minimum of the loss function is found by a gradient-based method. Jamali and Ester [6] proposed a probability model similar to model 1 in [16]. The relation regularized matrix factorization (RRMF) [9] method proposed by Li and Yeung adds the graph Laplacian regularization term of social relations into the loss function and minimizes the loss function by an alternative projection algorithm. Zhu et al. [28] used the same model as in [9] and built the graph Laplacian of social relations using three kinds of kernel functions. The minimization problem is formulated as a low-rank semidefinite program (LRSDP) and is solved by the method proposed in [3]. Regularization-based methods always minimize the difference between the latent feature vector of a user and those of his (or her) friends, and give weights to the regularization terms to trade off between factorization error and regularization terms. The weights should be tuned manually to avoid over fitting. So the drawback of this kind of methods is the same as that of PMF.

In factorization-based methods, social relations are represented as a social relation matrix, which is factored as well as the rating matrix. The loss function is the weighted sum of the social relation matrix factorization error and the rating matrix factorization error. For example, SoRec [13] factorizes the social relation matrix and the rating matrix simultaneously. The social relation matrix is approximated as the product of the user latent feature matrix and the factor feature matrix. SoRec can also be extended to fuse social tags and item tags with rating information [15]. Yuan et al. [27] argued that factorization-based methods outperform regularization-based methods for fusing membership information, and proposed a method fusing membership and friendship by factorization and regularization, respectively. Factorization-based methods encounter the same problem as regularization-based methods: the weight of the social relation matrix factorization error and the weight of the rating matrix factorization error should be tuned to avoid over fitting, which is computationally expensive especially on large-scale datasets. To avoid parameter tuning, Singh and Gordon proposed Hierarchical Bayesian Collective Matrix Factorization (HBCMF) [26], in which two relation matrices are factored. The model of HBCMF is very similar to that of BPMF, and a block Metropolis–Hastings algorithm is used to sample from this model.
In this paper, we propose two novel recommendation methods cooperating with social relations and item contents. Our methods are different from previous methods because the way we use social relations and item contents is neither factorization-based nor regularization-based. To fuse social relations and item contents, we modify the model in BPMF. The differences between our methods and PMF and BPMF are shown in Fig. 1. Our methods not only avoid parameter tuning just as BPMF, but also improve recommendation accuracy and converging speed.

[Fig. 1. Graphical models for PMF (a), BPMF (b), BPMFSR (c) and BPMFSRIC (d). The main difference between BPMF and BPMFSR is that BPMFSR generates user hyperparameters separately for every user vector, while BPMF uses uniform user hyperparameters. BPMFSRIC not only generates user hyperparameters separately but also generates item hyperparameters separately.]

3. Preliminaries

In this section, we first introduce preliminaries for matrix factorization based recommendation methods. And then, we introduce the frameworks of PMF and BPMF briefly.

3.1. Matrix factorization for recommendation

Suppose there are M users and N items. Let matrix R denote the rating matrix, where R_ij represents the rating of user i for item j. In most online systems, R_ij is a K-point integer. For example, in the Douban website (http://www.douban.com), rating values are 5-point integers, where 1 point means 'very bad' and 5 points means 'excellent'. Let U ∈ R^{M×D} and V ∈ R^{N×D} be the user and item latent feature matrices, where row vectors U_i and V_j represent the user-specific and item-specific latent feature vectors. D is the dimension of the user feature vectors and item feature vectors, which is much less than M and N. In the Probabilistic Matrix Factorization (PMF) [22] method, the conditional probability of the observed rating matrix R is modeled as:

p(R \mid U, V, \sigma^2) = \prod_{i=1}^{M} \prod_{j=1}^{N} \left[ \mathcal{N}(R_{ij} \mid U_i V_j^T, \sigma^2) \right]^{I_{ij}}    (1)

where N(x | μ, σ²) is the probability density function of the Gaussian distribution with mean μ and variance σ². I is the indicator matrix: I_ij is equal to 1 if user i rated item j and 0 otherwise. The prior distributions of the latent matrices U and V are modeled as:

p(U \mid \sigma_U^2) = \prod_{i=1}^{M} \mathcal{N}(U_i \mid 0, \sigma_U^2 \mathbf{I})    (2)

p(V \mid \sigma_V^2) = \prod_{j=1}^{N} \mathcal{N}(V_j \mid 0, \sigma_V^2 \mathbf{I}).    (3)

The graphical model for PMF is shown in Fig. 1(a).
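To make Eqs. (1)–(3) concrete, here is a minimal sketch of the PMF generative model: it draws U and V from the zero-mean Gaussian priors and then draws the observed entries of R from the Gaussian likelihood. The sizes, variances and observation density are illustrative assumptions, not values from the paper.

```python
import numpy as np

M, N, D = 100, 80, 5
sigma, sigma_U, sigma_V = 0.5, 1.0, 1.0        # illustrative noise/prior standard deviations
rng = np.random.default_rng(1)

U = rng.normal(0.0, sigma_U, size=(M, D))      # Eq. (2): U_i ~ N(0, sigma_U^2 I)
V = rng.normal(0.0, sigma_V, size=(N, D))      # Eq. (3): V_j ~ N(0, sigma_V^2 I)

I_mask = rng.random((M, N)) < 0.05             # sparse indicator matrix I (about 5% observed)
noise = rng.normal(0.0, sigma, size=(M, N))
R = np.where(I_mask, U @ V.T + noise, np.nan)  # Eq. (1): R_ij ~ N(U_i V_j^T, sigma^2) where I_ij = 1
```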
This model is learned by maximizing the posterior probability of the latent matrices U and V, which is equivalent to minimizing the sum of squares of the factorization error with quadratic regularization terms [22]:

E = \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{N} I_{ij} \left( R_{ij} - U_i V_j^T \right)^2 + \frac{\lambda_U}{2} \|U\|_{Fro}^2 + \frac{\lambda_V}{2} \|V\|_{Fro}^2    (4)

where λ_U = σ²/σ_U², λ_V = σ²/σ_V², and ‖·‖_Fro denotes the Frobenius norm. A local minimum of Eq. (4) is found through the gradient descent method in PMF [22].

Although PMF is maybe the most popular method for collaborative filtering and it was very successful in the Netflix Prize contest, the drawbacks of this method are two-fold. Firstly, it requires careful tuning of parameters to avoid over fitting. This process is computationally expensive on large datasets. Secondly, PMF assumes that user vectors and item vectors are independent and identically distributed and ignores the social relations among users. It is believed that social relations can alleviate the data sparsity problem and improve recommendation accuracy. For this reason, Hao Ma et al. [16] add a social regularization term to the loss function in Eq. (4). The social regularization term measures the difference between the feature vector of a user and those of his (or her) friends. Hao Ma's method gets more accurate recommendation results than NMF [8], PMF [22], RSTE [14] and other state-of-the-art methods on large real life datasets. In this paper, we compare our methods with Hao Ma's method.

3.2. Bayesian Probabilistic Matrix Factorization

To avoid parameter tuning, Salakhutdinov and Mnih [23] proposed a Bayesian model of Probabilistic Matrix Factorization (BPMF). The prior distributions of the latent matrices U and V are given by:

p(U \mid \mu_U, \Lambda_U) = \prod_{i=1}^{M} \mathcal{N}(U_i \mid \mu_U, \Lambda_U^{-1})    (5)

p(V \mid \mu_V, \Lambda_V) = \prod_{j=1}^{N} \mathcal{N}(V_j \mid \mu_V, \Lambda_V^{-1}).    (6)

BPMF assumes that the user hyperparameters Θ_U = {μ_U, Λ_U} and item hyperparameters Θ_V = {μ_V, Λ_V} follow a Gaussian-Wishart distribution. The prior distributions of the user hyperparameters and item hyperparameters are given by:

p(\Theta_U \mid \Theta_0) = p(\mu_U \mid \Lambda_U) \, p(\Lambda_U) = \mathcal{N}\left(\mu_U \mid \mu_0, (\beta_0 \Lambda_U)^{-1}\right) \mathcal{W}(\Lambda_U \mid W_0, \nu_0)    (7)

p(\Theta_V \mid \Theta_0) = p(\mu_V \mid \Lambda_V) \, p(\Lambda_V) = \mathcal{N}\left(\mu_V \mid \mu_0, (\beta_0 \Lambda_V)^{-1}\right) \mathcal{W}(\Lambda_V \mid W_0, \nu_0)    (8)

where W(Λ | W_0, ν_0) is the Wishart distribution with ν_0 degrees of freedom and a D × D scale matrix W_0, and Θ_0 = {μ_0, ν_0, W_0}. The graphical model for BPMF is shown in Fig. 1(b). BPMF uses the Gibbs algorithm to sample user and item feature vectors. It is assumed that the posterior distribution over user feature vector U_i, which is conditioned on the item feature matrix V, the observed rating matrix R and the hyperparameters, is Gaussian:

p(U_i \mid R, V, \Theta_U, \alpha) = \mathcal{N}\left(U_i \mid \mu_i^*, (\Lambda_i^*)^{-1}\right) \sim \prod_{j=1}^{N} \left[ \mathcal{N}(R_{ij} \mid U_i V_j^T, \alpha^{-1}) \right]^{I_{ij}} p(U_i \mid \mu_U, \Lambda_U)    (9)

where \Lambda_i^* = \Lambda_U + \alpha \sum_{j=1}^{N} I_{ij} V_j^T V_j and \mu_i^* = (\Lambda_i^*)^{-1} \left( \alpha \sum_{j=1}^{N} I_{ij} V_j R_{ij} + \mu_U \Lambda_U \right).

The conditional distribution over the user hyperparameters conditioned on the user feature matrix U is given as:

p(\mu_U, \Lambda_U \mid U, \Theta_0) = \mathcal{N}\left(\mu_U \mid \mu_0^*, (\beta_0^* \Lambda_U)^{-1}\right) \mathcal{W}(\Lambda_U \mid W_0^*, \nu_0^*)    (10)

where

\mu_0^* = \frac{\beta_0 \mu_0 + M \bar{U}}{\beta_0 + M}, \quad \beta_0^* = \beta_0 + M, \quad \nu_0^* = \nu_0 + M,

\bar{U} = \frac{1}{M} \sum_{i=1}^{M} U_i, \quad \bar{S} = \frac{1}{M} \sum_{i=1}^{M} (U_i - \bar{U})^T (U_i - \bar{U}),

(W_0^*)^{-1} = W_0^{-1} + M \bar{S} + \frac{\beta_0 M}{\beta_0 + M} (\mu_0 - \bar{U})^T (\mu_0 - \bar{U}).

BPMF gets more accurate recommendation results than PMF, and avoids parameter tuning. But BPMF also ignores the social relations among users. The distribution parameters of the user feature vectors are estimated from all of the user feature vectors. It is observed that people with social relations are more likely to share the same preferences. If we estimate the distribution parameters of a particular user's feature vector from the vectors of his (or her) friends, the recommendation accuracy will be improved further. By this idea, we propose our recommendation methods.
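The conjugacy behind Eq. (9) makes the per-user Gibbs step short to write down. The sketch below samples one user vector given the items that user rated; the variable names, the NumPy implementation and the assumption that α and the hyperparameters are given are ours, not the authors' code.

```python
import numpy as np

def sample_user_vector(rated_items, ratings, V, mu_U, Lambda_U, alpha, rng):
    """One Gibbs step for U_i (Eq. (9)): a Gaussian with precision Lambda_i*
    and mean mu_i* built from the items rated by user i."""
    Vi = V[rated_items]                          # rows V_j with I_ij = 1, shape (n_i, D)
    Lambda_star = Lambda_U + alpha * Vi.T @ Vi   # Lambda_i* = Lambda_U + alpha * sum_j I_ij V_j^T V_j
    b = alpha * Vi.T @ ratings + Lambda_U @ mu_U
    mu_star = np.linalg.solve(Lambda_star, b)    # mu_i* = Lambda_i*^{-1}(alpha * sum_j I_ij V_j R_ij + Lambda_U mu_U)
    L = np.linalg.cholesky(Lambda_star)          # draw from N(mu_i*, Lambda_i*^{-1}) via the precision's Cholesky factor
    return mu_star + np.linalg.solve(L.T, rng.standard_normal(len(mu_star)))
```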
4. Proposed method

In the BPMF framework, the hyperparameters for all users are the same (see Eq. (5)). This is unreasonable because users' preferences are different, and the hyperparameters should be different too. We assume that the user hyperparameters are different for different users, and propose Bayesian Probabilistic Matrix Factorization with Social Relations (BPMFSR), in which user hyperparameters are sampled according to the social relations. Our method uses the social relations in a novel way, which is neither regularization-based nor factorization-based, and can improve recommendation accuracy.

In BPMF, uniform item hyperparameters encounter the same problem. To improve the performance further, we fuse item contents as well as social relations, and propose Bayesian Probabilistic Matrix Factorization with Social Relations and Item Contents (BPMFSRIC), in which item hyperparameters are sampled according to the item contents. In this section, we introduce BPMFSR and BPMFSRIC in detail.

4.1. Bayesian Probabilistic Matrix Factorization with Social Relations

It is unreasonable that the hyperparameters Θ_U are the same for different users in BPMF, which may cause some recommendation errors. To address this problem, we assume every user has its own hyperparameters. By this assumption, Eq. (5) should be modified as:

p(U) = \prod_{i=1}^{M} \mathcal{N}\left(U_i \mid \mu_{U,i}, \Lambda_{U,i}^{-1}\right)    (11)

where Θ_{U,i} = {μ_{U,i}, Λ_{U,i}} are the hyperparameters for user feature vector U_i.

The posterior distribution over user feature vector U_i described in Eq. (9) should also be modified. In fact, under our assumption the posterior distribution over user feature vector U_i is conditioned on the item feature matrix V, the observed rating matrix R and its own hyperparameters Θ_{U,i} = {μ_{U,i}, Λ_{U,i}}. The posterior distribution over U_i is given as:

p(U_i \mid R, V, \Theta_{U,i}, \alpha) = \mathcal{N}\left(U_i \mid \mu_{U,i}^*, (\Lambda_{U,i}^*)^{-1}\right) \sim \prod_{j=1}^{N} \left[ \mathcal{N}(R_{ij} \mid U_i V_j^T, \alpha^{-1}) \right]^{I_{ij}} p(U_i \mid \mu_{U,i}, \Lambda_{U,i})    (12)

where \Lambda_{U,i}^* = \Lambda_{U,i} + \alpha \sum_{j=1}^{N} I_{ij} V_j^T V_j and \mu_{U,i}^* = (\Lambda_{U,i}^*)^{-1} \left( \alpha \sum_{j=1}^{N} I_{ij} V_j R_{ij} + \mu_{U,i} \Lambda_{U,i} \right).

Because a user's preference is influenced by his (or her) friends, we suppose that the conditional distribution over the user hyperparameters is conditioned on the feature vectors of the user's friends. By this assumption, Eq. (10) should be modified as:

p(\Theta_{U,i} \mid U, \Theta_0) = p(\Theta_{U,i} \mid U_{F,i}, \Theta_0) = p(\mu_{U,i}, \Lambda_{U,i} \mid U_{F,i}, \Theta_0) = \mathcal{N}\left(\mu_{U,i} \mid \mu_{U,i}^*, (\beta_{U,i}^* \Lambda_{U,i})^{-1}\right) \mathcal{W}(\Lambda_{U,i} \mid W_{U,i}^*, \nu_{U,i}^*)    (13)

where

\mu_{U,i}^* = \frac{\beta_0 \mu_0 + M_i \bar{U}^{(i)}}{\beta_0 + M_i}, \quad \beta_{U,i}^* = \beta_0 + M_i, \quad \nu_{U,i}^* = \nu_0 + M_i,

(W_{U,i}^*)^{-1} = W_0^{-1} + M_i \bar{S}_{U,i} + \frac{\beta_0 M_i}{\beta_0 + M_i} (\mu_0 - \bar{U}^{(i)})^T (\mu_0 - \bar{U}^{(i)}),

\bar{U}^{(i)} = \frac{1}{M_i} \sum_{j \in F_i} U_j, \quad \bar{S}_{U,i} = \frac{1}{M_i} \sum_{j \in F_i} (U_j - \bar{U}^{(i)})^T (U_j - \bar{U}^{(i)}), \quad M_i = |F_i|

and U_{F,i} is the matrix composed of the feature vector of user i and the feature vectors of user i's friends, F_i is the friend set of user i and himself, and |·| denotes the size of a set.

The method described above is called Bayesian Probabilistic Matrix Factorization with Social Relations (BPMFSR), and the graphical model is shown in Fig. 1(c). In this model, user feature vector U_i is generated according to its own hyperparameters Θ_{U,i} for each user. We also use the Gibbs sampling algorithm to sample user feature vectors and item feature vectors, which is given in Algorithm 1. In this algorithm, we first generate user hyperparameters Θ_{U,i} for each user by Eq. (13) and then generate user feature vector U_i with user hyperparameters Θ_{U,i}. If a user has very few friends, for example fewer than D, the hyperparameter estimation according to the feature vectors of his (or her) friends is meaningless. So in Algorithm 1, if user i has sufficient friends, hyperparameters are sampled according to the feature vectors of user i's friends; otherwise, hyperparameters are sampled according to the feature vectors of all users. Item hyperparameters and item feature vectors are generated in the same way as in BPMF.

Algorithm 1. Gibbs sampling for Bayesian Probabilistic Matrix Factorization with Social Relations (BPMFSR). [The algorithm listing appears as an image in the original and is not reproduced here.]

Because different user hyperparameters are applied to different users, BPMFSR can get rid of the prediction errors caused by uniform user hyperparameters in BPMF. Furthermore, social relations are integrated in the model, which can improve prediction accuracy and alleviate the data sparsity problem and the cold-start problem.

4.2. Bayesian Probabilistic Matrix Factorization with Social Relations and Item Contents

Using uniform hyperparameters for all of the items in BPMF encounters the same problem caused by uniform user hyperparameters. To avoid the prediction error caused by uniform item hyperparameters and to further improve the recommendation accuracy, we assume that every item has its own hyperparameters. So Eq. (6) should be modified as:

p(V) = \prod_{j=1}^{N} \mathcal{N}\left(V_j \mid \mu_{V,j}, \Lambda_{V,j}^{-1}\right)    (14)

where Θ_{V,j} = {μ_{V,j}, Λ_{V,j}} are the hyperparameters for item feature vector V_j. By this assumption, the posterior distribution over item feature vector V_j is conditioned on the user feature matrix U, the observed rating matrix R and its own hyperparameters Θ_{V,j} = {μ_{V,j}, Λ_{V,j}}. It is given as:

p(V_j \mid R, U, \Theta_{V,j}, \alpha) = \mathcal{N}\left(V_j \mid \mu_{V,j}^*, (\Lambda_{V,j}^*)^{-1}\right) \sim \prod_{i=1}^{M} \left[ \mathcal{N}(R_{ij} \mid U_i V_j^T, \alpha^{-1}) \right]^{I_{ij}} p(V_j \mid \mu_{V,j}, \Lambda_{V,j})    (15)

where \Lambda_{V,j}^* = \Lambda_{V,j} + \alpha \sum_{i=1}^{M} I_{ij} U_i^T U_i and \mu_{V,j}^* = (\Lambda_{V,j}^*)^{-1} \left( \alpha \sum_{i=1}^{M} I_{ij} U_i R_{ij} + \mu_{V,j} \Lambda_{V,j} \right).

Let C_j denote the item set in which every item links to item j. The links between items can be constructed according to contents, such as item tags, categories and properties. For example, if two items are attached with a same tag, there is a link between them. We can also link similar items by measuring item property similarity.

To fuse item contents, we assume that the conditional distribution over the item hyperparameters Θ_{V,j} = {μ_{V,j}, Λ_{V,j}} is only conditioned on the feature vectors of the items in C_j. The underlying assumption is that items with links should receive similar ratings. The conditional distribution over the hyperparameters of item j, Θ_{V,j} = {μ_{V,j}, Λ_{V,j}}, is given by:

p(\Theta_{V,j} \mid V, \Theta_0) = p(\Theta_{V,j} \mid V_{C,j}, \Theta_0) = p(\mu_{V,j}, \Lambda_{V,j} \mid V_{C,j}, \Theta_0) = \mathcal{N}\left(\mu_{V,j} \mid \mu_{V,j}^*, (\beta_{V,j}^* \Lambda_{V,j})^{-1}\right) \mathcal{W}(\Lambda_{V,j} \mid W_{V,j}^*, \nu_{V,j}^*)    (16)

where

\mu_{V,j}^* = \frac{\beta_0 \mu_0 + N_j \bar{V}^{(j)}}{\beta_0 + N_j}, \quad \beta_{V,j}^* = \beta_0 + N_j, \quad \nu_{V,j}^* = \nu_0 + N_j,

(W_{V,j}^*)^{-1} = W_0^{-1} + N_j \bar{S}_{V,j} + \frac{\beta_0 N_j}{\beta_0 + N_j} (\mu_0 - \bar{V}^{(j)})^T (\mu_0 - \bar{V}^{(j)}),

\bar{V}^{(j)} = \frac{1}{N_j} \sum_{k \in C_j} V_k, \quad \bar{S}_{V,j} = \frac{1}{N_j} \sum_{k \in C_j} (V_k - \bar{V}^{(j)})^T (V_k - \bar{V}^{(j)}), \quad N_j = |C_j|

and V_{C,j} is the matrix composed of the feature vectors of the items in C_j, and C_j is the item set in which items are linked to item j.
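The friend-conditioned draw of Eq. (13) can be sketched as below (the item-side update of Eq. (16) is identical with the linked-item set C_j in place of F_i). The fallback to all users when a user has fewer than D friends follows the description of Algorithm 1; the SciPy-based implementation and names are our reconstruction under those assumptions, not the authors' code.

```python
import numpy as np
from scipy.stats import wishart

def sample_user_hyperparams(U, friends_i, mu0, beta0, nu0, W0, rng):
    """Draw (mu_{U,i}, Lambda_{U,i}) from the Gaussian-Wishart posterior of
    Eq. (13), conditioned on the feature vectors of the friend set F_i."""
    D = U.shape[1]
    idx = friends_i if len(friends_i) >= D else np.arange(U.shape[0])  # too few friends: use all users
    UF = U[idx]
    M_i = len(idx)
    U_bar = UF.mean(axis=0)
    S_bar = (UF - U_bar).T @ (UF - U_bar) / M_i
    mu_star = (beta0 * mu0 + M_i * U_bar) / (beta0 + M_i)
    beta_star, nu_star = beta0 + M_i, nu0 + M_i
    d = (mu0 - U_bar)[:, None]
    W_star_inv = np.linalg.inv(W0) + M_i * S_bar + (beta0 * M_i / (beta0 + M_i)) * (d @ d.T)
    Lambda_i = wishart.rvs(df=nu_star, scale=np.linalg.inv(W_star_inv), random_state=rng)
    mu_i = rng.multivariate_normal(mu_star, np.linalg.inv(beta_star * Lambda_i))
    return mu_i, Lambda_i
```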

Table 1
Statistics of datasets.

Dataset                  Douban         Epinions      Last.fm
Num. of users            129,490        49,290        1892
Num. of items            58,541         139,783       17,632
Num. of ratings          16,830,839     664,824       92,834
Rating matrix density    2.220 × 10⁻³   9.649 × 10⁻⁵  2.783 × 10⁻³
Num. of friend links     1,692,952      487,181       12,717
Num. of tag statements   0              0             186,479

This method is called Bayesian Probabilistic Matrix Factorization with Social Relations and Item Contents (BPMFSRIC). The graphical model for BPMFSRIC is shown in Fig. 1(d). In this model, item hyperparameters are generated for each item, as well as user hyperparameters. We also use the Gibbs sampler to sample user feature vectors and item feature vectors, which is given in Algorithm 2. Similar to Algorithm 1, Algorithm 2 samples item hyperparameters for each item if there are more than D items linking to it.

Algorithm 2. Gibbs sampling for Bayesian Probabilistic Matrix Factorization with Social Relations and Item Contents (BPMFSRIC). [The algorithm listing appears as an image in the original and is not reproduced here.]

4.3. Complexity analysis

The main computation of Algorithms 1 and 2 is sampling user feature vectors using Eq. (12) and sampling item feature vectors using Eq. (15). The computational complexity of sampling all user feature vectors and sampling all item feature vectors is O(K), where K is the number of nonzero entries in the rating matrix R. So the computational complexity of one iteration in Algorithms 1 and 2 is O(K), which indicates that the computational complexity of our method is linear with respect to the number of observed ratings. This complexity analysis shows that our methods are very efficient and can scale up with respect to very large datasets.

Compared with BPMF, the computational complexity of our methods is slightly higher in one iteration, because Algorithms 1 and 2 additionally sample user hyperparameters and item hyperparameters using Eqs. (13) and (16). However, the computational complexity of sampling all user hyperparameters and all item hyperparameters is much less than that of sampling the user feature vectors and item feature vectors. Furthermore, the experiments in Section 5 show that our methods converge faster than BPMF. Considering both the computational complexity of one iteration and the converging speed, our methods can achieve the same recommendation accuracy in fewer iterations and in less time.
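The O(K) behavior is easiest to see from the shape of one sampling sweep. The skeleton below is our own summary of Algorithms 1 and 2: it assumes the samplers sketched earlier (sample_user_hyperparams, sample_user_vector) plus analogous item-side routines, and precomputed index lists of the nonzero entries of R, so the two loops together touch each observed rating a constant number of times per iteration (for fixed D).

```python
# One Gibbs sweep (skeleton, our reconstruction). rated_items_of[i] and
# raters_of[j] index the nonzero entries of R, so the loops cost O(K) overall.
def gibbs_sweep(U, V, R, rated_items_of, raters_of, friends, links, hyper, alpha, rng):
    for i in range(U.shape[0]):
        mu_i, Lam_i = sample_user_hyperparams(U, friends[i], *hyper, rng)    # Eq. (13)
        U[i] = sample_user_vector(rated_items_of[i], R[i, rated_items_of[i]],
                                  V, mu_i, Lam_i, alpha, rng)                # Eq. (12)
    for j in range(V.shape[0]):
        mu_j, Lam_j = sample_item_hyperparams(V, links[j], *hyper, rng)      # Eq. (16), assumed analogue
        V[j] = sample_item_vector(raters_of[j], R[raters_of[j], j],
                                  U, mu_j, Lam_j, alpha, rng)                # Eq. (15), assumed analogue
    return U, V
```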

4.4. Convergence analysis

Because we infer our model through Gibbs sampling, the convergence of the proposed method is guaranteed by that of the Gibbs sampler. It has been proved that under positivity conditions, the Markov chain generated by the Gibbs sampler will converge to its invariant distribution.

Definition 1. Positivity condition [21]. A density function f(x1, x2, …, xn) and marginal density functions fi(xi) are said to satisfy the positivity condition if fi(xi) > 0 for all x1, x2, …, xn implies that f(x1, x2, …, xn) > 0.

Lemma 1. [21] If the joint distribution f(x1, x2, …, xn) satisfies the positivity condition, the Gibbs sampler yields an irreducible and recurrent Markov chain.

The distribution in our model satisfies the positivity condition, so our methods converge as the number of iterations approaches ∞. Convergence of the proposed method is also confirmed by the experiments in Section 5.
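As a one-line gloss on why the positivity condition holds here (our own remark, following [21]): every factor of the joint density in the model is a Gaussian or a Wishart density, and these are strictly positive on their supports, e.g.

```latex
\mathcal{N}(x \mid \mu, \Lambda^{-1})
  = \frac{|\Lambda|^{1/2}}{(2\pi)^{D/2}}
    \exp\!\Big(-\tfrac{1}{2}\,(x-\mu)^{\top}\Lambda\,(x-\mu)\Big) \;>\; 0
  \quad \text{for all } x \in \mathbb{R}^{D},
```

so the joint density is positive wherever its marginals are, which is exactly Definition 1.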

5. Experiments

In this section, we present the experimental results on three large scale datasets to compare our method with other recommendation methods based on matrix factorization. We also verify our methods in cold-start settings.

Table 2
Predictive accuracy comparison on Douban dataset (the values in brackets are the results reported in [16]).

D (dimension)  Training  Metrics  Hao Ma's method            BPMF             BPMFSR
10             40%       MAE      0.5845 ± 0.0008 (0.5685)   0.5583 ± 0.0001  0.5577 ± 0.0001
                         RMSE     0.7411 ± 0.0005 (0.7125)   0.7045 ± 0.0002  0.7045 ± 0.0001
               60%       MAE      0.5665 ± 0.0005 (0.5593)   0.5512 ± 0.0001  0.5510 ± 0.0001
                         RMSE     0.7221 ± 0.0005 (0.7042)   0.6971 ± 0.0001  0.6973 ± 0.0001
               80%       MAE      0.5562 ± 0.0003 (0.5543)   0.5472 ± 0.0002  0.5467 ± 0.0002
                         RMSE     0.7099 ± 0.0004 (0.6988)   0.6930 ± 0.0002  0.6934 ± 0.0003
30             40%       MAE      0.5950 ± 0.0004            0.5588 ± 0.0001  0.5570 ± 0.0001
                         RMSE     0.7524 ± 0.0005            0.7051 ± 0.0001  0.7039 ± 0.0001
               60%       MAE      0.5789 ± 0.0002            0.5509 ± 0.0001  0.5486 ± 0.0001
                         RMSE     0.7336 ± 0.0007            0.6963 ± 0.0001  0.6948 ± 0.0001
               80%       MAE      0.5674 ± 0.0001            0.5448 ± 0.0001  0.5427 ± 0.0001
                         RMSE     0.7204 ± 0.0001            0.6896 ± 0.0001  0.6884 ± 0.0001


Table 3
Predictive accuracy comparison on Epinions dataset.

D (dimension)  Training  Metrics  Hao Ma's method   BPMF             BPMFSR
10             40%       MAE      0.9261 ± 0.0034   0.8535 ± 0.0018  0.8411 ± 0.0006
                         RMSE     1.2046 ± 0.0050   1.0858 ± 0.0026  1.0695 ± 0.0008
               60%       MAE      0.9313 ± 0.0023   0.8383 ± 0.0018  0.8359 ± 0.0006
                         RMSE     1.2003 ± 0.0047   1.0704 ± 0.0027  1.0655 ± 0.0009
               80%       MAE      0.9018 ± 0.0026   0.8144 ± 0.0020  0.8114 ± 0.0022
                         RMSE     1.1630 ± 0.0027   1.0498 ± 0.0023  1.0435 ± 0.0025
               90%       MAE      0.8915 ± 0.0036   0.8081 ± 0.0020  0.8056 ± 0.0020
                         RMSE     1.1510 ± 0.0055   1.0435 ± 0.0035  1.0334 ± 0.0040
30             40%       MAE      0.9341 ± 0.0040   0.8446 ± 0.0073  0.8381 ± 0.0010
                         RMSE     1.2043 ± 0.0021   1.0785 ± 0.0069  1.0666 ± 0.0012
               60%       MAE      0.9332 ± 0.0031   0.8423 ± 0.0057  0.8348 ± 0.0017
                         RMSE     1.2015 ± 0.0039   1.0741 ± 0.0056  1.0640 ± 0.0025
               80%       MAE      0.9135 ± 0.0019   0.8156 ± 0.0018  0.8124 ± 0.0016
                         RMSE     1.1736 ± 0.0025   1.0496 ± 0.0026  1.0403 ± 0.0024
               90%       MAE      0.9078 ± 0.0039   0.8087 ± 0.0025  0.8078 ± 0.0024
                         RMSE     1.1661 ± 0.0051   1.0434 ± 0.0034  1.0352 ± 0.0030

5.1. Datasets

We evaluate our method on the Douban dataset [16], Epinions dataset [17] and Last.fm dataset [4]. The Douban (http://www.douban.com) dataset, crawled by Hao Ma et al. [16], contains 16,830,839 ratings of 129,490 users on 58,541 movies and 1,692,952 friend links between these users. For more details, please see [16].

The Epinions (http://www.epinions.com) dataset [17] contains 49,290 users and 139,783 items. These users issued 664,824 ratings and 487,181 trust statements. Note that the Epinions dataset collected by Massa and Avesani [17] used in this paper is different from that used in [16]. The Epinions dataset collected by Massa and Avesani [17] contains more items, ratings and friend links, and fewer users.

[Fig. 2. Testing RMSE of all methods on Douban dataset; the x-axis shows the number of epochs and the y-axis shows testing RMSE. Panels: (a) 40% training data, D = 10; (b) 60% training data, D = 10; (c) 80% training data, D = 10; (d) 40% training data, D = 30; (e) 60% training data, D = 30; (f) 80% training data, D = 30.]

The Last.fm (http://www.lastfm.com) dataset was released in the framework of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011) [4]. It contains 1892 users, 17,632 artists and 11,946 tags. Different from the Douban dataset and Epinions dataset, the Last.fm dataset only records the listening count of each user for each artist. Listening counts range from 1 to 352,698. To test our methods on Last.fm, we map listening counts into integer values of 1 to 5 to represent the extent of favor of artists, in a similar way to [7]. The mapping formula is given as:

r = \begin{cases} \lfloor \log_{10} l \rfloor + 1, & \text{if } \lfloor \log_{10} l \rfloor + 1 \le 5 \\ 5, & \text{otherwise} \end{cases}    (17)

where l is the listening count, r is the mapped value, and ⌊·⌋ is the operator of rounding towards zero. To test the BPMFSRIC method, we build the links according to tags. If two artists received a same tag more than 5 times,
we link these two artists. Last.fm contains 12,717 friend links, 92,834 listening counts and 186,479 tag statements. The statistics of these datasets are summarized in Table 1.

[Fig. 3. Testing RMSE of all methods on Epinions dataset; the x-axis shows the number of epochs and the y-axis shows testing RMSE. Panels: (a) 60% training data, D = 10; (b) 80% training data, D = 10; (c) 90% training data, D = 10; (d) 60% training data, D = 30; (e) 80% training data, D = 30; (f) 90% training data, D = 30.]

We use Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) to measure the prediction accuracy of recommendation methods. MAE is defined as:

MAE = \frac{1}{T} \sum_{i,j} \left| R_{ij} - \hat{R}_{ij} \right|    (18)

where R_ij is the rating given by user i for item j, R̂_ij is the prediction of R_ij, and T is the total number of tested ratings. RMSE is defined as:

RMSE = \sqrt{ \frac{1}{T} \sum_{i,j} \left( R_{ij} - \hat{R}_{ij} \right)^2 }.    (19)

Table 4
Predictive accuracy comparison on Last.fm dataset.

D (dimension)  Training  Metrics  Hao Ma's method   BPMF             BPMFSR           BPMFSRIC
10             40%       MAE      0.4806 ± 0.0032   0.3359 ± 0.0018  0.3338 ± 0.0012  0.3341 ± 0.0007
                         RMSE     0.6936 ± 0.0047   0.4655 ± 0.0036  0.4613 ± 0.0028  0.4604 ± 0.0026
               60%       MAE      0.4539 ± 0.0070   0.3270 ± 0.0014  0.3261 ± 0.0015  0.3278 ± 0.0012
                         RMSE     0.6529 ± 0.0104   0.4502 ± 0.0023  0.4489 ± 0.0023  0.4502 ± 0.0021
               80%       MAE      0.4310 ± 0.0056   0.3234 ± 0.0012  0.3222 ± 0.0014  0.3237 ± 0.0016
                         RMSE     0.6210 ± 0.0080   0.4465 ± 0.0018  0.4449 ± 0.0021  0.4461 ± 0.0020
30             40%       MAE      0.4849 ± 0.0020   0.3378 ± 0.0006  0.3354 ± 0.0004  0.3345 ± 0.0005
                         RMSE     0.6977 ± 0.0019   0.4686 ± 0.0018  0.4630 ± 0.0017  0.4596 ± 0.0013
               60%       MAE      0.4691 ± 0.0027   0.3283 ± 0.0016  0.3277 ± 0.0019  0.3286 ± 0.0017
                         RMSE     0.6708 ± 0.0038   0.4529 ± 0.0021  0.4515 ± 0.0022  0.4508 ± 0.0020
               80%       MAE      0.4492 ± 0.0013   0.3241 ± 0.0012  0.3235 ± 0.0012  0.3244 ± 0.0014
                         RMSE     0.6418 ± 0.0026   0.4467 ± 0.0024  0.4452 ± 0.0024  0.4451 ± 0.0026

5.2. Comparisons

We compare Hao Ma's method [16] and BPMF [23] with our methods. Hao Ma's method is implemented using model 2 with PCC similarity (see [16] for details). It is reported that this implementation gets the highest prediction accuracy and outperforms NMF [8], PMF [22], RSTE [14] and other state-of-the-art methods. The initial solutions of our methods are the same as those of BPMF. In our methods, μ_0, ν_0 and W_0 are set to the same values as those in BPMF. The experiments are repeated 10 times, and the mean and standard deviation of MAE and RMSE are calculated.

For the Douban dataset, we randomly select 40%, 60% and 80% of the ratings as training data, and use the rest of the ratings to test the algorithms. For the Epinions dataset, we use 40%, 60%, 80% and 90% of the ratings as training data. Because the Douban dataset and Epinions dataset don't contain item contents information, which is needed by BPMFSRIC, we don't test BPMFSRIC on these two datasets. The experimental results are shown in Tables 2 and 3. We give the mean and standard deviation of RMSE
and MAE for each experiment. The best results which are statistically significant (at the 5% significance level) are set in bold.

[Fig. 4. Testing RMSE of all methods on Last.fm dataset; the x-axis shows the number of epochs and the y-axis shows testing RMSE. Panels: (a) 40% training data, D = 10; (b) 60% training data, D = 10; (c) 80% training data, D = 10; (d) 40% training data, D = 30; (e) 60% training data, D = 30; (f) 80% training data, D = 30.]

Table 5
Predictive accuracy comparison in cold-start settings on Douban dataset.

D (dimension)  Cold-start users  Metrics  Hao Ma's method   BPMF             BPMFSR
10             60%               MAE      0.7321 ± 0.0005   0.6337 ± 0.0004  0.6335 ± 0.0003
                                 RMSE     0.9067 ± 0.0002   0.7902 ± 0.0004  0.7885 ± 0.0004
               40%               MAE      0.7319 ± 0.0003   0.6333 ± 0.0004  0.6327 ± 0.0003
                                 RMSE     0.9071 ± 0.0006   0.7899 ± 0.0004  0.7877 ± 0.0005
30             60%               MAE      0.7320 ± 0.0008   0.6340 ± 0.0007  0.6336 ± 0.0009
                                 RMSE     0.9077 ± 0.0010   0.7906 ± 0.0010  0.7895 ± 0.0012
               40%               MAE      0.7324 ± 0.0007   0.6338 ± 0.0006  0.6334 ± 0.0007
                                 RMSE     0.9080 ± 0.0010   0.7903 ± 0.0011  0.7891 ± 0.0012

Table 6
Predictive accuracy comparison in cold-start settings on Epinions dataset.

D (dimension)  Cold-start users  Metrics  Hao Ma's method   BPMF             BPMFSR
10             60%               MAE      0.9236 ± 0.0026   0.8558 ± 0.0021  0.8558 ± 0.0017
                                 RMSE     1.2092 ± 0.0031   1.0809 ± 0.0034  1.0806 ± 0.0033
               40%               MAE      0.9250 ± 0.0008   0.8517 ± 0.0012  0.8517 ± 0.0010
                                 RMSE     1.2105 ± 0.0015   1.0769 ± 0.0018  1.0759 ± 0.0018
               10%               MAE      0.9229 ± 0.0025   0.8496 ± 0.0040  0.8469 ± 0.0049
                                 RMSE     1.2104 ± 0.0040   1.0725 ± 0.0053  1.0717 ± 0.0055
               5%                MAE      0.9192 ± 0.0049   0.8424 ± 0.0050  0.8389 ± 0.0048
                                 RMSE     1.2071 ± 0.0048   1.0643 ± 0.0055  1.0625 ± 0.0049
30             60%               MAE      0.9287 ± 0.0030   0.8551 ± 0.0025  0.8516 ± 0.0024
                                 RMSE     1.2093 ± 0.0031   1.0803 ± 0.0030  1.0794 ± 0.0029
               40%               MAE      0.9264 ± 0.0014   0.8478 ± 0.0013  0.8469 ± 0.0011
                                 RMSE     1.2064 ± 0.0023   1.0740 ± 0.0012  1.0742 ± 0.0013
               10%               MAE      0.9282 ± 0.0048   0.8498 ± 0.0040  0.8449 ± 0.0039
                                 RMSE     1.2113 ± 0.0066   1.0730 ± 0.0054  1.0702 ± 0.0052
               5%                MAE      0.9243 ± 0.0108   0.8431 ± 0.0063  0.8388 ± 0.0071
                                 RMSE     1.2082 ± 0.0152   1.0655 ± 0.0090  1.0623 ± 0.0101

According to the results, it can be observed that our method outperforms Hao Ma's method and BPMF on the Epinions dataset in terms of RMSE and MAE. On the Douban dataset, in the cases of 60% and 80% training data with D = 10, BPMF gets lower RMSE than BPMFSR. MAE and RMSE
generated by BPMF and BPMFSR on the Douban dataset are very close. The advantages of BPMFSR are more obvious on the Epinions dataset. The Epinions dataset is much sparser than the Douban dataset, so we can say that BPMFSR alleviates the data sparsity problem better than the other methods.

[Fig. 5. Testing RMSE of all methods on Douban dataset in the cold-start user setting; the x-axis shows the number of iterations and the y-axis shows testing RMSE. Panels: (a) 60% cold-start users, D = 10; (b) 60% cold-start users, D = 30; (c) 40% cold-start users, D = 10; (d) 40% cold-start users, D = 30.]

[Fig. 6. Testing RMSE of all methods on Epinions dataset in the cold-start user setting; the x-axis shows the number of epochs and the y-axis shows testing RMSE. Panels: (a) 10% cold-start users, D = 10; (b) 10% cold-start users, D = 30; (c) 5% cold-start users, D = 10; (d) 5% cold-start users, D = 30.]

We also notice that increasing the user and item feature vector dimension doesn't improve the predictive accuracy of Hao Ma's method
and BPMF, while BPMFSR gets more accurate results with higher dimension. This indicates that Hao Ma's method and BPMF may over fit in high dimension.

Note that the results reported in [16], shown in the brackets in Table 2, are slightly lower than the results obtained by our implementation. Because we cannot get the code used in [16], we attribute the discrepancy to some implementation details unmentioned in [16]. Although the results reported in [16] and those obtained by us are slightly different, we can draw the same conclusion that our method outperforms Hao Ma's method on the Douban dataset in the case of D = 10 even using the results reported in [16].

Figs. 2 and 3 show the testing RMSE generated by all methods at every epoch on the Douban dataset and Epinions dataset. It can be observed that Hao Ma's method over fits after several epochs, while BPMF and BPMFSR do not over fit at all. The RMSE values generated by BPMFSR are higher than those generated by BPMF at the beginning, but after about 10 epochs BPMFSR outperforms BPMF. This indicates that BPMFSR converges faster than BPMF.

We compare BPMFSR and BPMFSRIC with Hao Ma's method and BPMF on the Last.fm dataset. Listening counts are mapped into 5-point ratings by Eq. (17). Tag statements are used to build links between items (artists). If two artists received a same tag more than 5 times, we link these two artists. We randomly select 40%, 60% and 80% of the ratings as training data, and use the rest of the ratings as testing data. The experiments are repeated 10 times, and the mean and standard deviation of MAE and RMSE are calculated. The experimental results are shown in Table 4. It can be observed that our methods get lower mean of MAE and RMSE than Hao Ma's method and BPMF in all cases. When the dimension is high, BPMFSRIC gets lower mean of MAE and RMSE than BPMFSR.

Fig. 4 shows the testing RMSE generated by all methods at every epoch on the Last.fm dataset. The trends of convergence are similar to those in Figs. 2 and 3. Furthermore, BPMFSRIC and BPMFSR converge faster than BPMF, especially in the cases of 40% training data.

Table 7
Predictive accuracy comparison in cold-start settings on Last.fm dataset.

D (dimension)  Cold-start users/items  Metrics  Hao Ma's method   BPMF             BPMFSR           BPMFSRIC
10             10%/10%                 MAE      0.4973 ± 0.0155   0.4213 ± 0.0183  0.4209 ± 0.0186  0.4178 ± 0.0169
                                       RMSE     0.7150 ± 0.0180   0.6039 ± 0.0225  0.6009 ± 0.0239  0.5960 ± 0.0191
               5%/5%                   MAE      0.4860 ± 0.0187   0.4127 ± 0.0183  0.4106 ± 0.0183  0.4082 ± 0.0198
                                       RMSE     0.7027 ± 0.0232   0.5945 ± 0.0260  0.5885 ± 0.0276  0.5857 ± 0.0271
30             10%/10%                 MAE      0.4883 ± 0.0095   0.4174 ± 0.0105  0.4143 ± 0.0086  0.4131 ± 0.0087
                                       RMSE     0.7091 ± 0.0101   0.6017 ± 0.0114  0.5972 ± 0.0100  0.5954 ± 0.0097
               5%/5%                   MAE      0.4800 ± 0.0183   0.4093 ± 0.0165  0.4055 ± 0.0159  0.4033 ± 0.0160
                                       RMSE     0.6970 ± 0.0211   0.5878 ± 0.0212  0.5831 ± 0.0199  0.5786 ± 0.0203

[Fig. 7. Testing RMSE of all methods on Last.fm dataset in the cold-start user and item setting; the x-axis shows the number of epochs and the y-axis shows testing RMSE. Panels: (a) 10% cold-start users and 10% cold-start items, D = 10; (b) 10% cold-start users and 10% cold-start items, D = 30; (c) 5% cold-start users and 5% cold-start items, D = 10; (d) 5% cold-start users and 5% cold-start items, D = 30.]

5.3. Comparisons in cold-start settings

We test our methods in cold-start settings. For the Douban dataset we randomly select 60% and 40% of the users as cold-start users. For the Epinions dataset, we randomly select 60%, 40%, 10% and 5% of the users as cold-start users. All the ratings stated by cold-start users are treated as testing
data and all the other ratings are treated as training data. We compare Hao Ma's method and BPMF with BPMFSR. The results are shown in Tables 5 and 6. It can be observed that BPMFSR outperforms the compared methods on the Douban dataset in all cases. On the Epinions dataset, BPMFSR obtains lower mean of RMSE and MAE.

Figs. 5 and 6 show the testing RMSE generated by all methods at every epoch on the Douban dataset and Epinions dataset in cold-start-user settings. We can find that Hao Ma's method overfits at the beginning in all cases. BPMFSR converges faster than BPMF and gets better results in fewer epochs.

For the Last.fm dataset we randomly select 5% and 10% of the users and items as cold-start users and cold-start items. All the ratings for cold-start users and cold-start items are treated as testing data and all other ratings are treated as training data. We compare Hao Ma's method and BPMF with BPMFSR and BPMFSRIC. The results are shown in Table 7. It can be observed that BPMFSR and BPMFSRIC obtain lower mean of RMSE and MAE than the compared methods in all cases. Because item contents are used in BPMFSRIC, BPMFSRIC gets more accurate prediction results than BPMFSR in all cases.

Fig. 7 shows the testing RMSE generated by all methods at every epoch on the Last.fm dataset in cold-start user and item settings. We can find that Hao Ma's method overfits at the beginning in all cases. Our methods converge faster than BPMF and get better results in fewer epochs.

5.4. Impact of the feature dimension

We investigate the impact of the feature dimension. Using 60% training data, we change the feature dimension and calculate RMSE on the Douban dataset, Epinions dataset and Last.fm dataset. The results are presented in Fig. 8. We observe that the feature dimension impacts the recommendation results. As the feature dimension increases, the prediction accuracy increases quickly at first. But when the feature dimension increases further, the prediction accuracy increases slowly and even decreases on Last.fm. This phenomenon indicates that a very high feature dimension cannot help to improve the recommendation accuracy.

[Fig. 8. Impact of feature dimension on all datasets (60% training data).]

6. Conclusion and future work

To address the data sparsity problem and the cold-start problem, in this paper we modify the model in BPMF. We assume that the user hyperparameters and item hyperparameters are different for each user vector and item vector. The proposed recommendation methods, BPMFSR and BPMFSRIC, sample user hyperparameters and item hyperparameters according to the social relations and item contents. In this novel way we fuse social relations and item contents with ratings, which is different from traditional regularization-based methods and factorization-based methods. BPMFSR can be applied to trust-aware recommendation systems, while if the item contents are available, BPMFSRIC can improve recommendation accuracy further. The proposed methods are computationally efficient and can scale up with respect to very large datasets. Experimental results on three large real world datasets show that our methods get more accurate recommendation results with faster converging speed than the other state-of-the-art recommendation methods based on matrix factorization. Moreover, our methods outperform the other methods in cold-start settings.

In our methods, we only use the trust information, while distrust statements are also provided in many online social networks. How to use distrust information is one of our further research directions. An important problem should be investigated: how the distrust relations affect the user preference.

Furthermore, we only use the direct trust relations between users and ignore the indirect trust relations. Trust relations can propagate among people in real life, so indirect trust relations can also affect the preference of a user. How to fuse indirect trust relations is another research direction.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (61173120). The authors would like to thank the reviewers and editor for their helpful comments.

References

[1] R.P. Adams, G.E. Dahl, I. Murray, Incorporating side information in probabilistic matrix factorization with Gaussian processes, UAI, 2010, pp. 1–9.
[2] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions, IEEE Transactions on Knowledge and Data Engineering 17 (6) (2005) 734–749.
[3] S. Burer, R. Monteiro, A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization, Mathematical Programming 95 (2) (2003) 329–357.
[4] I. Cantador, P. Brusilovsky, T. Kuflik, Second workshop on information heterogeneity and fusion in recommender systems (HetRec 2011), Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys'11, ACM, New York, NY, USA, 2011, pp. 387–388.
[5] R. Gemulla, E. Nijkamp, P. Haas, Y. Sismanis, Large-scale matrix factorization with distributed stochastic gradient descent, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2011, pp. 69–77.
[6] M. Jamali, M. Ester, A transitivity aware matrix factorization model for recommendation in social networks, Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, IJCAI'11, AAAI Press, 2011, pp. 2644–2649.
[7] O. Koyejo, J. Ghosh, A kernel-based approach to exploiting interaction-networks in heterogeneous information sources for improved recommender systems, Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, ACM, 2011, pp. 9–16.
[8] D. Lee, H. Seung, et al., Learning the parts of objects by non-negative matrix factorization, Nature 401 (6755) (1999) 788–791.
[9] W. Li, D. Yeung, Relation regularized matrix factorization, Proceedings of the 21st International Joint Conference on Artificial Intelligence, 2009, pp. 1126–1131.
[10] Y. Lim, Y. Teh, Variational Bayesian approach to movie rating prediction, Proceedings of KDD Cup and Workshop, Citeseer, 2007, pp. 15–21.
[11] Z. Lu, D. Agarwal, I. Dhillon, A spatio-temporal approach to collaborative filtering, Proceedings of the Third ACM Conference on Recommender Systems, ACM, 2009, pp. 13–20.
[12] X. Luo, Y. Xia, Q. Zhu, Incremental collaborative filtering recommender based on regularized matrix factorization, Knowledge-Based Systems 27 (2012) 271–280.
[13] H. Ma, H. Yang, M. Lyu, I. King, SoRec: social recommendation using probabilistic matrix factorization, Proceedings of the 17th ACM Conference on Information and Knowledge Management, ACM, 2008, pp. 931–940.
[14] H. Ma, I. King, M. Lyu, Learning to recommend with social trust ensemble, Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2009, pp. 203–210.
[15] H. Ma, T. Zhou, M. Lyu, I. King, Improving recommender systems by incorporating social contextual information, ACM Transactions on Information Systems (TOIS) 29 (2) (2011) 9:1–9:23.
[16] H. Ma, D. Zhou, C. Liu, M. Lyu, I. King, Recommender systems with social regularization, Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, ACM, 2011, pp. 287–296.
[17] P. Massa, P. Avesani, Trust-aware bootstrapping of recommender systems, ECAI 2006 Workshop on Recommender Systems, Citeseer, Riva del Garda, Italy, 2006, pp. 29–33.
[18] S. Nakajima, M. Sugiyama, Theoretical analysis of Bayesian matrix factorization, Journal of Machine Learning Research 12 (2011) 2583–2648.
[19] D.H. Park, H.K. Kim, I.Y. Choi, J.K. Kim, A literature review and classification of recommender systems research, Expert Systems with Applications 39 (11) (2012) 10059–10072.
[20] I. Porteous, A. Asuncion, M. Welling, Bayesian matrix factorization with side information and Dirichlet process mixtures, Proceedings of the 24th AAAI Conference on Artificial Intelligence, 2010, pp. 563–568.
[21] C. Robert, G. Casella, Monte Carlo Statistical Methods, Springer Texts in Statistics, Springer, 2004.
[22] R. Salakhutdinov, A. Mnih, Probabilistic matrix factorization, Advances in Neural Information Processing Systems 20, 2008, pp. 1257–1264.
[23] R. Salakhutdinov, A. Mnih, Bayesian probabilistic matrix factorization using Markov Chain Monte Carlo, Proceedings of the 25th International Conference on Machine Learning, ACM, 2008, pp. 880–887.
[24] B. Sarwar, G. Karypis, J. Konstan, J. Reidl, Item-based collaborative filtering recommendation algorithms, Proceedings of the 10th International Conference on World Wide Web, ACM, 2001, pp. 285–295.
[25] H. Shan, A. Banerjee, Generalized probabilistic matrix factorizations for collaborative filtering, Data Mining (ICDM), 2010 IEEE 10th International Conference on, IEEE, 2010, pp. 1025–1030.
[26] A. Singh, G. Gordon, A Bayesian matrix factorization model for relational data, Proceedings of the Twenty-Sixth Annual Conference on Uncertainty in Artificial Intelligence (UAI-10), AUAI Press, Corvallis, Oregon, 2010, pp. 556–563.
[27] Q. Yuan, L. Chen, S. Zhao, Factorization vs. regularization: fusing heterogeneous social relationships in top-n recommendation, Proceedings of the Fifth ACM Conference on Recommender Systems, ACM, 2011, pp. 245–252.
[28] J. Zhu, H. Ma, C. Chen, J. Bu, Social recommendation using low-rank semidefinite program, AAAI, 2011, pp. 158–163.

Juntao Liu received the BS and MS degrees in Computer Science from Ordnance Engineering College, Shijiazhuang, China, in 2002 and 2005, respectively. He is a lecturer in the Department of Computer Engineering, Ordnance Engineering College. He is currently pursuing the Ph.D. degree in the Department of Electronics and Information Engineering, Huazhong University of Science and Technology. His research interests include data mining, machine learning and computer vision.

Caihua Wu received the BS, MS and PhD degrees in Computer Science from Ordnance Engineering College, Shijiazhuang, China, in 2003, 2006 and 2009, respectively. She is now a lecturer in the Department of Information Counterwork, Air Force Radar Academy. Her research interests include data mining, information counterwork and software engineering.

Wenyu Liu received the BS degree in Computer Science from Tsinghua University, Beijing, China, in 1986, and the MS and PhD degrees, both in Electronics and Information Engineering, from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1991 and 2001, respectively. He is now a professor and associate dean of the Department of Electronics and Information Engineering, HUST. His current research areas include computer graphics, multimedia information processing, and computer vision. He is a member of the IEEE System, Man and Cybernetics Society.