
2012 Ieee International Workshop On Machine Learning For Signal Processing, Sept. 23-26, 2012, Santander, Spain

The document describes a proposed hybrid collaborative filtering model that incorporates social information for recommendations. It constructs a directed graph with nodes consisting of items, users, and additional information like item content and user profiles/social networks. User ratings are incorporated into the edge settings of the graph model. The model provides personalized recommendations to individuals and groups using a random walk approach. It was evaluated on MovieLens and Epinions datasets and performed well compared to other graph-based methods, especially for cold start cases.


2012 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 23–26, 2012, SANTANDER, SPAIN

A RANDOM WALK BASED MODEL INCORPORATING SOCIAL INFORMATION FOR RECOMMENDATIONS

Shang Shang, Sanjeev R. Kulkarni, Paul W. Cuff
Department of Electrical Engineering, Princeton University
Princeton, NJ, 08540, U.S.A.

Pan Hui
Deutsche Telekom Laboratories
Ernst-Reuter-Platz 7, 10587 Berlin, Germany
arXiv:1208.0787v2 [cs.IR] 17 May 2013

ABSTRACT

Collaborative filtering (CF) is one of the most popular approaches to building a recommendation system. In this paper, we propose a hybrid collaborative filtering model based on a Markovian random walk to address the data sparsity and cold start problems in recommendation systems. More precisely, we construct a directed graph whose nodes consist of items and users, together with item content, user profile and social network information. We incorporate users’ ratings into the edge settings of the graph model. The model provides personalized recommendations and predictions to individuals and groups. The proposed algorithms are evaluated on the MovieLens and Epinions datasets. Experimental results show that the proposed methods perform well compared with other graph-based methods, especially in the cold start case.

Index Terms: Recommendation system, random walk, social networks, hybrid collaborative filtering model

1. INTRODUCTION

Over the last decade, the commercialization of early generations of recommendation systems achieved great success. Recommendation systems serve as an important component of online retail and Video on Demand (VoD) services such as Amazon and Netflix [1]. Recommenders typically provide the target user a list of customized recommendations through collaborative filtering or content-based filtering. Intensive work has been done to improve the performance of both of these techniques. Traditional recommendation systems assume that users are independent, and recommendations are given according to users’ explicit or implicit rating history and/or item content information [2][3]. Problems such as data sparsity, cold start, and shilling attacks still challenge the design of recommendation systems [3]. User profile and social information, on the other hand, provide extra information on user preference. This information is especially helpful when giving recommendations to a new user with little or no rating history. The emergence of e-commerce and online social networks provides a good opportunity to integrate user social information into the recommendation model, so as to improve the recommendation results or to alleviate the cold start problem [4][5].

Collaborative filters use the known preferences of users to make recommendations or predictions to a target user. Memory-based collaborative filtering uses the entire user-item database to calculate the similarity value between users or items, and then a weighted sum is taken as a prediction for the target user on a certain item. See, for example, GroupLens [6]. Model-based approaches such as Bayesian Belief Net CF [7] and regression-based CF [8] learn a complex pattern from training data and use the model to predict a user’s preference. The most closely related works are [2][9][10][11]. Fouss et al. [2] suggested a dissimilarity measure between nodes of a graph, the expected commute time between two nodes, which the authors applied to collaborative filtering. Specifically, they constructed an undirected bipartite graph where nodes are users and movies. A link is placed if the user watched that movie. Movies are then ranked in ascending order according to the average commute time to the target node. Gori et al. [9] built their graph model using only items as nodes. In [9], two nodes are connected if at least one user rated both nodes. The weight of the edge is set as the number of users who rated both of the nodes. A random-walk based algorithm is then used to rank items according to the target user’s preference record. In [10], the authors combine the trust-based and collaborative filtering approaches for recommendation. Target users take a finite-step random walk on a trust network, so as to use the ratings by trusted users to assist prediction. More recently, Bogers [11] proposed ContextWalk, a collaborative filtering method that includes different types of contextual information by taking random walks over the contextual graph.

In this paper, we propose a random walk based hybrid collaborative filtering model that incorporates the social information of users. It is shown in [12] that a random walk approach is very effective in link prediction on social networks. Inspired by [12] and [13], we create a recommendation graph,
as shown in Fig. 1, consisting of items, users, item genres, and user profile information as nodes. Similar to PageRank, the stable distribution resulting from a random walk on the graph is interpreted as a ranking of the nodes for the purpose of recommendation and prediction. The structure of the collaborative filtering part of the recommendation graph is similar to the graphs proposed in [2] and [11], in that a user node $u$ and an item node $i$ are connected if there is a rating record of $u$ on $i$. Unfortunately, in [11], the author did not provide experimental results to evaluate the performance, and the edge settings for constructing the network are not clear. In [2], the authors assigned unit weight to the edges in the graph, which cannot capture user preference effectively. The expected commute time between item and user nodes was taken as the similarity measure to make recommendations. In [2] and [11], the authors only gave a list of recommended items; no rating prediction is available. In this paper, the edge weights of the graph depend on the user rating score instead of simply being set to a unit value. Apart from the collaborative filtering graph, which only contains user rating information, we add user profile and social network information, which makes it possible to provide customized recommendations to new users even if no previous rating information is available. The main contributions of this paper are: (1) we propose a hybrid collaborative filtering graph model that incorporates user social network and user profile information, together with item content and user-item rating history, to give recommendations; (2) we describe in detail the construction and edge weight assignment, which reflect user preferences effectively; (3) we extend the application of the recommendation algorithm to group recommendation; (4) we design experiments on multiple data sets to evaluate the performance of the proposed algorithm.

Fig. 1. Hybrid collaborative filtering graph example.

In a typical setting, there is a list of $m$ users $\mathcal{U} = \{u_1, u_2, \ldots, u_m\}$ and a list of $n$ items $\mathcal{I} = \{i_1, i_2, \ldots, i_n\}$. Each user $u_j$ has a list of items $\mathcal{I}_{u_j}$ that the user has rated or from which the user’s preference can be inferred. The ratings can either be explicit, for example, on a 1-5 scale as in Netflix, or implicit, such as purchases or clicks. These data form an $m \times n$ rating matrix $R$, where $R_{ui}$ denotes the rating of user $u$ on item $i$. Assume that binary tagging and user social information is given. Let $\mathcal{T} = \{t_1, t_2, \ldots, t_k\}$ be the set of tagging information of items. For example, for movies, $\mathcal{T}$ can be genre, main actor, release date, etc. $T_i \in \{0, 1\}^k$ denotes the features of item $i$, where $k$ is the total number of tags. Correspondingly, let $\mathcal{P} = \{p_1, p_2, \ldots, p_l\}$ be the set of user profile information, including age, occupation, gender, etc. $P_u \in \{0, 1\}^l$ denotes the profile features of user $u$, where $l$ is the dimension of the features of all users. $S = (\mathcal{U}, \mathcal{E}_s)$ contains social network information, represented by an undirected or directed graph, where $\mathcal{U}$ is a set of nodes and $\mathcal{E}_s$ is a set of edges. For all $u, v \in \mathcal{U}$, $(u, v) \in \mathcal{E}_s$ if $v$ is a friend of $u$. We want to make recommendations for a target user or a group of users given the above information.

The rest of this paper is organized as follows. We propose our random walk based recommendation model in Section 2. The performance of the proposed model is evaluated in Section 3, followed by conclusions and acknowledgements in Sections 4 and 5.

2. A HYBRID COLLABORATIVE FILTERING MODEL BASED ON RANDOM WALKS

In this section, we describe our algorithm, which is inspired by Google’s PageRank, in detail. Specifically, we describe how to construct the graph and make recommendations.

PageRank [13] calculates a probability distribution representing the likelihood that a web surfer randomly clicking on web links will arrive at any webpage. A similar approach can be used for movie recommendation. Every time a user has watched a movie, the system may show some more movies that other users who like this one also like. As in PageRank, there is a damping factor to indicate that the movie watcher may finally stop browsing. The key issue is how to construct this recommendation graph and represent flow on the graph.

2.1. Graph construction

2.1.1. Graph settings

Let $G = \{\mathcal{V}, \mathcal{E}\}$ be a directed graph model for CF, where $\mathcal{V} := \mathcal{U} \cup \mathcal{I} \cup \mathcal{T} \cup \mathcal{P}$. The nodes of the graph consist of users, items, item information and user profiles. For $v_i, v_j \in \mathcal{V}$, $(v_i, v_j) \in \mathcal{E}$ if and only if there is an edge pointing from $v_i$ to $v_j$, which is determined as given below. The weights are specified in the next subsection.

• For $u \in \mathcal{U}$, $i \in \mathcal{I}$, $(u, i) \in \mathcal{E}$ and $(i, u) \in \mathcal{E}$ if and only if $R_{ui} \neq 0$, i.e., an item $i$ and a user $u$ are connected if there is a rating record of user $u$ on item $i$, with weights $w_{ui}$ and $w_{iu}$.
• For $i \in \mathcal{I}$, $t \in \mathcal{T}$, $(i, t) \in \mathcal{E}$ and $(t, i) \in \mathcal{E}$ if and only if $T_i^{(t)} \neq 0$, i.e., the item $i$ and tag $t$ are connected if $i$ is tagged by $t$, with weights $w_{it}$ and $w_{ti}$.

• For $u \in \mathcal{U}$, $p \in \mathcal{P}$, $(u, p) \in \mathcal{E}$ and $(p, u) \in \mathcal{E}$ if and only if $P_u^{(p)} \neq 0$, i.e., a profile feature $p$ and a user $u$ are connected if the user $u$ belongs to the profile category $p$, with weights $w_{up}$ and $w_{pu}$.

• For $u_1, u_2 \in \mathcal{U}$, $(u_1, u_2) \in \mathcal{E}$ if and only if $(u_1, u_2) \in \mathcal{E}_s$, with weight $w_{u_1 u_2}$. Note that the relationship in social networks is not necessarily mutual; it can be a unilateral relationship, as in Twitter (twitter.com) or Epinions (www.epinions.com).

2.1.2. Edge weight assignment

The main part of our rank graph is the collaborative filtering graph, which includes the user nodes, item nodes and the edges between them. The weights of edges in the collaborative filtering graph can be assigned as follows:

$$w_{ui} = w_{iu} = \exp\left( \frac{r_{ui} - \bar{r}_u}{\sqrt{\sum_{j \in \mathcal{I}_u} (r_{uj} - \bar{r}_u)^2}} \right), \tag{1}$$

$$\bar{r}_u := \frac{\sum_{i \in \mathcal{I}_u} r_{ui}}{|\mathcal{I}_u|}, \tag{2}$$

where $\mathcal{I}_u$ denotes the set of items that user $u$ has rated. Note that a larger edge weight indicates a higher chance that the random walk passes through that edge. If user $u$’s rating $r_{ui}$ on item $i$ is lower than the average rating $\bar{r}_u$, $w_{ui}$ and $w_{iu}$ are less than 1; if it is higher, they are greater than 1. Because of the normalization in Equation (1), the assignment of weights does not depend on the variance of the user’s ratings.

For the extended graph, i.e. nodes and edges containing item content, user profile or social network information, we simply assign an edge weight of 1 if an edge is present.

2.2. Rank score computation

2.2.1. Random walk on a weighted graph

A random walk is a Markov process with random variables $X_1, X_2, \ldots, X_t, \ldots$ such that the next state only depends on the current state. For a random walk on a weighted graph, $X_{t+1}$ is a vertex chosen according to the following probability distribution:

$$P_{ij} := P(X_{t+1} = j \mid X_t = i) = \frac{w_{ij}}{\sum_{k \in N_i} w_{ik}}, \tag{3}$$

where $N_i$ is the set of neighbors of $i$, $N_i := \{j \mid (i, j) \in \mathcal{E}\}$. As mentioned in Section 2.1.2, a higher weight indicates a higher chance that the random walk moves through that edge.

2.2.2. Rank score computation

For the recommendation graph $G = \{\mathcal{V}, \mathcal{E}\}$, let $v = |\mathcal{V}|$ denote the number of nodes in the graph. $\theta$ is a $v \times 1$ customized probability vector. For a target user $u$,

$$\theta = e_u, \tag{4}$$

where $e_1, e_2, \ldots, e_v$ are the standard basis column vectors. $\beta$ is a damping factor. With probability $1 - \beta$, the random walk is teleported back to node $u$. The rank score $s$ satisfies the following equation:

$$s = \beta W s + (1 - \beta)\theta, \tag{5}$$

where $W$ is the weighted transition matrix with $W_{ij} = P_{ji}$. Since the entries of $s$ sum to one ($\mathbf{1}^T s = 1$), we have

$$s = \left( \beta W + (1 - \beta)\theta \mathbf{1}^T \right) s := M s. \tag{6}$$

Hence the rank score is the principal eigenvector of $M$, which can be computed quickly and easily by iteration, as shown below:

    s_i^(0) ← 1/v for all i
    t ← 0
    repeat
        t ← t + 1
        for i = 1 to v do
            s_i^(t) ← Σ_{j=1}^{v} β W_ij s_j^(t−1) + (1 − β) θ_i
        end for
    until |s^(t) − s^(t−1)| < ε

Similar to PageRank, the rank score $s$ is interpreted as the importance of other nodes to the target user $u$. It is easy to see that we can increase the rank score of a node by shortening its distance to $u$, adding more paths, or increasing the weights on the paths to $u$. These are desired properties in a recommendation system. For example, even if item $i$ is not directly connected with $u$, if it is in the category to which many of $u$’s highly rated items belong, $i$ is very likely to have a high rank score. Or if users $u$ and $u'$ have similar opinions on a variety of items, $u'$ will have a high rank, so we can use the explicit ratings of $u'$ to make recommendations and predictions for $u$.

2.3. Recommendation

2.3.1. Direct method

Solving Equation (5) iteratively, we obtain a rank score for all nodes of the recommendation graph $G$. Since the rank score represents the importance to the target user, we then separate the nodes and sort them by category, i.e. users $\mathcal{U}$, items $\mathcal{I}$, tags $\mathcal{T}$, etc. Sorted items excluding $\mathcal{I}_u$ form a recommendation list for the target user $u$. We can compute the recommendation for every user.
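As an illustration of Sections 2.1.2 to 2.3.1, the following is a minimal Python sketch, not the authors' implementation. It computes the edge weights of Equation (1), the transition probabilities of Equation (3), and the rank score of Equation (5) by iteration, then sorts unrated items as in the direct method. The toy ratings, the node names, the function names, the damping factor `beta=0.85`, the L1 stopping test, and the handling of users whose ratings are all equal are assumptions made for this example; the paper does not specify them.

```python
import numpy as np

def edge_weights(user_ratings):
    """Equation (1) for one user: map {item: rating} to {item: weight}."""
    values = np.array(list(user_ratings.values()), dtype=float)
    mean = values.mean()
    norm = np.sqrt(((values - mean) ** 2).sum())
    if norm == 0:
        # Assumption: identical ratings carry no preference, so use weight 1.
        return {item: 1.0 for item in user_ratings}
    return {item: float(np.exp((r - mean) / norm))
            for item, r in user_ratings.items()}

def transition_matrix(weights):
    """Equation (3): row-normalise weights, where weights[i, j] is w_ij."""
    row_sums = weights.sum(axis=1, keepdims=True)
    # Assumption: a node without outgoing edges keeps an all-zero row.
    return np.divide(weights, row_sums,
                     out=np.zeros_like(weights), where=row_sums > 0)

def rank_scores(P, theta, beta=0.85, eps=1e-10, max_iter=1000):
    """Iterate s <- beta * W s + (1 - beta) * theta with W_ij = P_ji (Eq. 5)."""
    W = P.T
    s = np.full(len(theta), 1.0 / len(theta))
    for _ in range(max_iter):
        s_next = beta * W @ s + (1 - beta) * theta
        if np.abs(s_next - s).sum() < eps:  # assumed L1 convergence test
            return s_next
        s = s_next
    return s

# Hypothetical toy data: two users, four items, ratings on a 1-5 scale.
nodes = ["u0", "u1", "i0", "i1", "i2", "i3"]
index = {name: k for k, name in enumerate(nodes)}
ratings = {
    "u0": {"i0": 5, "i1": 1},
    "u1": {"i0": 5, "i1": 1, "i2": 5, "i3": 1},
}

# Collaborative filtering part of the graph: symmetric user-item weights.
weights = np.zeros((len(nodes), len(nodes)))
for user, rated in ratings.items():
    for item, w in edge_weights(rated).items():
        weights[index[user], index[item]] = w
        weights[index[item], index[user]] = w

def recommend(target):
    """Direct method: rank the items that the target user has not rated."""
    theta = np.zeros(len(nodes))
    theta[index[target]] = 1.0  # Equation (4): theta = e_u
    s = rank_scores(transition_matrix(weights), theta)
    candidates = [n for n in nodes
                  if n.startswith("i") and n not in ratings[target]]
    return sorted(candidates, key=lambda n: -s[index[n]]), s

ranked_items, scores = recommend("u0")
print(ranked_items)
```

In this toy graph, `u1` agrees with `u0` on the items they share, and the item `u1` rated highly (`i2`) ends up ahead of the item `u1` rated poorly (`i3`) in the list for `u0`. Passing the average of several basis vectors as `theta` instead of a single one would give the group variant described in Section 2.5.1.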
2.3.2. User-based recommendation

Similar to memory-based collaborative filtering, which uses Pearson correlation [6] as a similarity measure between users or items, we use the rank score as an influence measure to make predictions. Given the rank scores of the user set $\mathcal{U}$, we take the weighted sum of users’ ratings on item $i$ as a prediction for the target user $u$, as shown in Equation (7):

$$\hat{r}_{ui}^{\text{user}} = \frac{\sum_{x \in \mathcal{U}_i} s_x (r_{xi} - \bar{r}_x)}{\sum_{x \in \mathcal{U}_i} s_x} + \bar{r}_u, \tag{7}$$

where $\mathcal{U}_i$ denotes the set of users who have rated item $i$, and $s_x$ is the target user’s personalized rank score of user $x$.

2.3.3. Item-based recommendation

As in Section 2.3.2, in order to perform an item-based recommendation, we can use the rank score of the item set $\mathcal{I}$ as weights to predict the rating of item $i$ for the target user $u$, as shown in Equation (8):

$$\hat{r}_{ui}^{\text{item}} = \frac{\sum_{j \in \mathcal{I}_u} s_j r_{uj}}{\sum_{j \in \mathcal{I}_u} s_j}. \tag{8}$$

In Equation (8) we use $u$’s ratings on similar items to predict the rating on $i$. $s_j$ is the target user’s personalized rank score of item $j$.

2.4. Incremental computation

In practice, the rating information and users’ social information evolve. The recommendation graph changes when a new rating record is input, a new item goes on sale, a new user registers, or even when a user changes his or her profile. Thanks to the popularity of PageRank, incremental computation of PageRank has been studied intensively [14][15]. It is shown in [15] that with a reset probability of $\epsilon$, the total work needed to maintain an accurate estimate of the PageRank of every node at all times is $O\left(\frac{n \ln m}{\epsilon^2}\right)$ in a network with $n$ nodes and $m$ edges. Since it is beyond the scope of this paper, we do not address technical details of this problem.

2.5. Discussions

2.5.1. Recommendations for groups

Because of the special structure of the rank graph, we can naturally extend the recommendation for individual users to groups. Note that in order to give recommendations for individuals, we set the personalized vector in Section 2.2.2 as $e_u$. Similarly, for recommendation for a group of users $\hat{u}$, we can set the personalized vector $\theta$ as

$$\theta = \frac{1}{|\hat{u}|} \sum_{u \in \hat{u}} e_u. \tag{9}$$

The rest of the procedure is the same as described in the previous sections.

2.5.2. Dealing with “cold” users and “cold” items

A great challenge to recommendation systems resulting from data sparsity is the cold start problem, namely, the question of how to effectively give recommendations to new users. A naïve approach is to provide the same recommendation to everyone. Studies show that two persons connected via a social relationship tend to have similar tastes, which is known as the “homophily principle” [16]. The availability of online social networks offers us extra information about new users. Given the social network information, if a new user is connected with other nodes in our recommendation graph, we can then make personalized recommendations for the “cold” user even if we do not have any rating information from this user. Similarly, for “cold” items we connect a new item in the recommendation graph according to its tagging information, so that we can then recommend the “cold” item to users. Experimental results are shown in Section 3.

3. EXPERIMENTS

3.1. Data sets

In order to evaluate the performance of the proposed algorithm, we run experiments on the Epinions and MovieLens data sets, both of which are widely used benchmarks for recommendation systems. Epinions is a website where users can post their reviews and ratings (1-5) on a variety of items (songs, software, TVs, etc.), as well as their web of trust, i.e. “reviewers whose reviews and ratings they have consistently found to be valuable” [17]. We randomly select 946 items, 973 users and their trust network from the Epinions data set [17] to perform the experiments. The MovieLens data set consists of 1682 movies and 943 users. Movies are labeled by 19 genres. User profile information such as age, gender and occupation is also available. User rating distributions and histograms of ratings per user for both data sets are shown in Fig. 2 and Fig. 3.

Fig. 2. User rating distribution of Epinions and MovieLens datasets.
Fig. 3. Histogram of ratings per user of Epinions and MovieLens datasets.

3.2. Experimental methodology and results

We evaluate our results with two popular evaluation metrics for top-k recommendations: recall and percentile.

Recall: We count any item in the top-k recommendations that matches an item in the testing set as a “hit”, as in [18].

$$\text{recall}(k) = \frac{\#\text{hits of top-}k}{T}, \tag{10}$$

where $T$ is the size of the testing set. A higher recall value indicates a better prediction.

Percentile: The individual percentile score is simply the average position (in percentage) that the items in the test set occupy in the recommendation list. For example, if four items are ranked 1st, 9th, 10th and 20th in a recommendation list consisting of 100 items, the average rank is 10 and the percentile score is 0.1. A lower percentile indicates a better prediction.

In this experiment, the test set $T$ contains all the 5-star rating records, so we can consider them as relevant items for recommendation. The recommendation list has a length of 500 items for the Epinions data set and 900 for the MovieLens data set. We compare our methods from Section 2, UserRank CF (without social information) and UserRank, with two state-of-the-art collaborative filtering methods, L+ [2] and ItemRank [9], described in Section 1.

Experimental results for the recall score are shown in Fig. 4 and Fig. 5. We can see that UserRank has a higher recall score on both data sets compared with the baseline methods. However, in a “warm start” scenario, adding social information does not change the performance much. In Table 1 and Table 2, we compare the percentile values for both the warm start and cold start cases. It is worth noting that social information improves the performance of UserRank considerably in the “cold start” case.

Fig. 4. Epinions data set top-k recall.

Fig. 5. MovieLens data set top-k recall.

4. CONCLUSIONS

In this paper, we present a hybrid collaborative filtering model based on a random walk for recommendation systems. It incorporates item content and user social information to make recommendations and predictions for target users. Social information improves the “cold start” performance when user rating information is lacking. Experiments are performed on two standard real-world data sets. The experimental results show that the proposed method performs well compared to other state-of-the-art collaborative filtering methods.

5. ACKNOWLEDGEMENTS

This research was supported in part by the Center for Science of Information (CSoI), a National Science Foundation (NSF) Science and Technology Center, under grant agreement CCF-0939370, by NSF under the grant CCF-1116013, by the U.S. Army Research Office under grant number W911NF-07-1-0185, and by a research grant from Deutsche Telekom AG. The authors would like to thank Dr. Guanchun Wang for valuable comments, and Mr. Julien Barbot for assistance in running the experiments.
Table 1. Average percentile results obtained by 5-fold cross-validation for warm-start recommendation.

Methods                      Epinions   MovieLens
L+                           0.4192     0.1157
ItemRank                     0.3983     0.1150
UserRank CF                  0.2325     0.081
UserRank with social info.   0.2457     0.079

Table 2. Average percentile results obtained by 5-fold cross-validation for cold-start recommendation.

Methods                      Epinions   MovieLens
ItemRank                     0.4592     0.1356
UserRank CF                  0.3231     0.1204
UserRank with social info.   0.2874     0.1112

6. REFERENCES

[1] Ruslan Salakhutdinov and Andriy Mnih, “Probabilistic matrix factorization applied to the netflix rating prediction problem,” Proc. Advances in Neural Information Processing Systems 20 (NIPS 07), pp. 1257–1264, 2008.

[2] Francois Fouss, Alain Pirotte, Jean-Michel Renders, and Marco Saerens, “Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 355–369, March 2007.

[3] Xiaoyuan Su and Taghi M. Khoshgoftaar, “A survey of collaborative filtering techniques,” Advances in Artificial Intelligence, vol. 2009, no. 421425, 2009.

[4] Jianming He and Wesley W. Chu, “A social network based recommender system,” Annals of Information Systems: Special Issue on Data Mining for Social Network Data (AIS-DMSND), 2010.

[5] Jordi Palau, Miquel Montaner, Beatriz López, and Josep Lluís de la Rosa, “Collaboration analysis in recommender systems using social networks,” Cooperative Information Agents VIII: 8th International Workshop, 2004.

[6] Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl, “Grouplens: an open architecture for collaborative filtering of netnews,” Proceedings of the ACM Conference on Computer Supported Cooperative Work, pp. 175–186, 1994.

[7] Xiaoyuan Su and Taghi M. Khoshgoftaar, “Collaborative filtering for multi-class data using belief nets algorithms,” In Proceedings of the International Conference on Tools with Artificial Intelligence (ICTAI 06), pp. 497–504, 2006.

[8] Slobodan Vucetic and Zoran Obradovic, “Collaborative filtering using a regression-based approach,” Knowledge and Information Systems, vol. 7, pp. 1–22, 2005.

[9] Marco Gori and Augusto Pucci, “Itemrank: a random-walk based scoring algorithm for recommender engines,” In Proceedings of the 20th international joint conference on Artificial intelligence, 2007.

[10] Mohsen Jamali and Martin Ester, “Trustwalker: a random walk model for combining trust-based and item-based recommendation,” In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 397–406, 2009.

[11] Toine Bogers, “Movie recommendation using random walks over the contextual graph,” In Proceedings of the 2nd Workshop on Context-Aware Recommender Systems, 2010.

[12] David Liben-Nowell and Jon Kleinberg, “The link prediction problem for social networks,” In CIKM 03: Proceedings of the twelfth international conference on Information and knowledge management, pp. 556–559, 2003.

[13] S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine,” In Seventh International World-Wide Web Conference (WWW 1998), 1998.

[14] Yen-Yu Chen, Qingqing Gan, and Torsten Suel, “Local methods for estimating pagerank values,” In CIKM ’04: Proceedings of the thirteenth ACM international conference on Information and knowledge management, pp. 381–389, 2004.

[15] B. Bahmani, A. Chowdhury, and A. Goel, “Fast incremental and personalized pagerank,” http://arxiv.org/abs/1006.2880, 2010.

[16] Miller McPherson, Lynn Smith-Lovin, and James M. Cook, “Birds of a feather: Homophily in social networks,” Annual Review of Sociology, vol. 27, pp. 415–444, 2001.

[17] Paolo Massa and Paolo Avesani, “Trust-aware bootstrapping of recommender systems,” In Proceedings of ECAI Workshop on Recommender Systems, pp. 29–33, 2006.

[18] Karen H. L. Tso-Sutter, Leandro Balby Marinho, and Lars Schmidt-Thieme, “Tag-aware recommender systems by fusion of collaborative filtering algorithms,” Proceedings of the 2008 ACM symposium on Applied computing, 2008.
