Combining Memory-Based and Model-Based Collaborative Filtering in Recommender System
Abstract-Collaborative filtering (CF) has proved to be one of the most successful techniques in recommender systems. Two types of collaborative filtering algorithms have been researched: memory-based CF and model-based CF. Memory-based approaches identify the similarity between two users by comparing their ratings on a set of items, and they suffer from two fundamental problems: sparsity and scalability. Model-based approaches have been proposed to alleviate these problems, but they tend to limit the range of users. This paper presents an approach that combines the advantages of these two kinds of approaches by joining the two methods. First, it employs memory-based CF to fill the vacant ratings of the user-item matrix. Then, it uses item-based CF as the model-based component to form the nearest neighbors of every item. Finally, it produces the prediction of the target user's rating for the target item in real time. The collaborative filtering recommendation method combining memory-based CF and model-based CF can provide better recommendations than traditional collaborative filtering.

Keywords-recommender system; memory-based collaborative filtering; model-based collaborative filtering

I. INTRODUCTION

Electronic commerce sites provide millions of products for sale. Choosing among so many items is challenging for consumers. Recommender systems have emerged to solve this problem. A recommender system for an electronic commerce site recommends products that are likely to fit customer needs. Many recommender technologies exist, and one of the earliest and most successful is collaborative filtering. Collaborative filtering predicts an active customer's interest in products based on the aggregated rating information of like-minded customers in a historical database. CF has been very successful in both research and practice [1,2,3]. However, important research questions remain in overcoming two fundamental challenges for collaborative filtering recommender systems: sparsity and scalability.

Two types of algorithms for collaborative filtering have been researched to solve these problems: memory-based CF and model-based CF. Memory-based CF utilizes the entire user-item rating dataset to generate a prediction. These systems employ statistical techniques to find a set of users, known as neighbors, that have a history of agreeing with the target user. Once a neighborhood of users is formed, these systems use different algorithms to combine the preferences of neighbors to produce a prediction for the active user [4,5,6]. However, memory-based CF suffers from two basic problems: sparsity and scalability. Sparsity refers to the difficulty that most users rate only a small number of items. As a result, the accuracy of the method is often quite poor. As for scalability, memory-based approaches often cannot deal with large numbers of users and items [5].

In contrast to memory-based CF, model-based CF groups different users in the training database into a small number of classes based on their rating patterns. The model building process is performed by different machine learning algorithms such as Bayesian networks, clustering, and rule-based approaches [7,8,9,10]. Model-based CF approaches are often time-consuming to build and update, and cannot cover as diverse a user range as memory-based approaches do.

In this paper, we present an approach that combines the advantages of these two kinds of approaches by joining the two methods. First, the proposed approach employs memory-based CF to fill the vacant ratings of the user-item matrix. Then, it uses item-based CF as the model-based component to form the nearest neighbors of every item. Finally, it produces the prediction of the target user's rating for the target item in real time. The collaborative filtering recommendation method combining memory-based CF and model-based CF can provide better recommendations than traditional collaborative filtering.

II. EMPLOYING MEMORY-BASED CF

First, we employ user-based collaborative filtering as the memory-based CF to fill the vacant ratings of the users where necessary.

A. Measuring the user rating similarity

Several similarity algorithms have been used in collaborative filtering recommendation algorithms [11,12,13,14]: Pearson correlation, cosine vector similarity, adjusted cosine vector similarity, mean-squared difference and Spearman correlation.
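To make two of the measures listed above concrete, the following is a minimal Python sketch of cosine similarity and Pearson correlation between two users, computed over their co-rated items only. The dictionary representation, the function names and the sample users `alice` and `bob` are assumptions made for illustration; they do not come from the paper.

```python
from math import sqrt


def co_rated(ratings_a, ratings_b):
    """Return the items rated by both users, in a stable order."""
    return sorted(set(ratings_a) & set(ratings_b))


def cosine_similarity(ratings_a, ratings_b):
    """Cosine of the angle between the two users' co-rated rating vectors."""
    items = co_rated(ratings_a, ratings_b)
    dot = sum(ratings_a[k] * ratings_b[k] for k in items)
    norm_a = sqrt(sum(ratings_a[k] ** 2 for k in items))
    norm_b = sqrt(sum(ratings_b[k] ** 2 for k in items))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no co-rated items, or an all-zero vector
    return dot / (norm_a * norm_b)


def pearson_correlation(ratings_a, ratings_b):
    """Pearson correlation of the two users' ratings on co-rated items."""
    items = co_rated(ratings_a, ratings_b)
    if len(items) < 2:
        return 0.0  # correlation is undefined for fewer than two points
    mean_a = sum(ratings_a[k] for k in items) / len(items)
    mean_b = sum(ratings_b[k] for k in items) / len(items)
    num = sum((ratings_a[k] - mean_a) * (ratings_b[k] - mean_b) for k in items)
    den = sqrt(sum((ratings_a[k] - mean_a) ** 2 for k in items)) * sqrt(
        sum((ratings_b[k] - mean_b) ** 2 for k in items)
    )
    return num / den if den else 0.0


# Hypothetical users: item id -> rating on a 1-5 scale.
alice = {"m1": 5, "m2": 3, "m3": 4}
bob = {"m1": 4, "m2": 2, "m3": 3, "m4": 5}

print(cosine_similarity(alice, bob))
print(pearson_correlation(alice, bob))
```

In this example bob rates every co-rated item exactly one point lower than alice, so the Pearson correlation is 1 while the cosine value is slightly below 1: Pearson removes each user's offset on the rating scale, and plain cosine does not.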
The cosine similarity between user i and user j is computed as follows:

$$\mathrm{sim}(i, j) = \frac{\sum_{k=1}^{n} R_{ik} R_{jk}}{\sqrt{\sum_{k=1}^{n} R_{ik}^{2}}\,\sqrt{\sum_{k=1}^{n} R_{jk}^{2}}}$$

where $R_{ik}$ is the rating of item k by user i and n is the number of items co-rated by both users.

The adjusted cosine, given by the following formula, is used in some collaborative filtering methods for similarity among users, where the difference in each user's use of the rating scale is taken into account.

$$\mathrm{sim}(i, j) = \frac{\sum_{c \in I_{ij}} (R_{ic} - A_c)(R_{jc} - A_c)}{\sqrt{\sum_{c \in I_{ij}} (R_{ic} - A_c)^{2}} \cdot \sqrt{\sum_{c \in I_{ij}} (R_{jc} - A_c)^{2}}}$$

where $R_{ic}$ is the rating of item c by user i, $A_c$ is the average rating of user i for all the co-rated items, and $I_{ij}$ is the set of items rated by both user i and user j.

B. Selecting the target user

After calculating the similarities between users, we select the top N users with the highest similarity as the neighbors.

C. Filling the vacant values

Since the neighbors of the user are now known, we can calculate the weighted average of the neighbors' ratings, weighted by their similarity to the target user.

The rating of the target user u for the target item t is as follows:

$$P_{ut} = A_u + \frac{\sum_{i=1}^{c} (R_{it} - A_i) \cdot \mathrm{sim}(u, i)}{\sum_{i=1}^{c} \mathrm{sim}(u, i)}$$

where $A_u$ is the average rating of the target user u over the items, $R_{it}$ is the rating of the neighbour user i for the target item t, $A_i$ is the average rating of the neighbour user i over the items, $\mathrm{sim}(u, i)$ is the similarity of the target user u and the neighbour user i, and c is the number of neighbours.

$$\mathrm{sim}(t, r) = \frac{\sum_{i=1}^{m} (R_{it} - A_t)(R_{ir} - A_r)}{\sqrt{\sum_{i=1}^{m} (R_{it} - A_t)^{2}}\,\sqrt{\sum_{i=1}^{m} (R_{ir} - A_r)^{2}}}$$

where $R_{it}$ is the rating of the target item t by user i, $R_{ir}$ is the rating of the remaining item r by user i, $A_t$ is the average rating of the target item t over all the co-rated users, $A_r$ is the average rating of the remaining item r over all the co-rated users, and m is the number of users who rated both item t and item r.

B. Producing prediction

In the prediction phase, we use item-based CF to predict the rating of the target user for the item $I_j$. Because we have formed the k nearest neighbors of the item $I_j$, the prediction can be produced from these neighbor items.

Since the neighbors of the item are now known, we can calculate the weighted average of the neighbors' ratings, weighted by their similarity to the target item.

The rating of the target user u for the target item t is as follows:

$$P_{ut} = \frac{\sum_{i=1}^{c} R_{ui} \times \mathrm{sim}(t, i)}{\sum_{i=1}^{c} \mathrm{sim}(t, i)}$$

where $R_{ui}$ is the rating of the target user u for the neighbour item i, $\mathrm{sim}(t, i)$ is the similarity of the target item t and the neighbour item i, and c is the number of neighbours.

IV. DATASET AND PERFORMANCE MEASUREMENT

In this section, we describe the dataset, the performance measurement and the results of the comparison between the traditional and the proposed CF algorithms.
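For orientation, the proposed algorithm can be summarized as a minimal Python sketch: a user-based fill of the vacant ratings, followed by item similarity on the filled matrix and an item-based weighted-average prediction. This is an illustration under stated assumptions, not the authors' implementation. The list-of-lists matrix with `None` for vacant cells, the function names, the choice of cosine as the user similarity, the handling of neighbours that lack a rating, the restriction to positive similarities and the toy data are all assumptions introduced here.

```python
from math import sqrt


def mean_of_rated(row):
    """Average of the ratings a user has given (None marks a vacant cell)."""
    rated = [r for r in row if r is not None]
    return sum(rated) / len(rated)  # assumes every user has at least one rating


def user_similarity(a, b):
    """Cosine similarity over co-rated items (an assumed choice of measure)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    dot = sum(x * y for x, y in pairs)
    norm = sqrt(sum(x * x for x, _ in pairs)) * sqrt(sum(y * y for _, y in pairs))
    return dot / norm if norm else 0.0


def fill_vacant(matrix, top_n=2):
    """Fill each vacant rating with A_u plus the similarity-weighted deviation."""
    filled = [row[:] for row in matrix]
    for u, row in enumerate(matrix):
        neighbours = sorted(
            ((user_similarity(row, other), v) for v, other in enumerate(matrix) if v != u),
            reverse=True,
        )[:top_n]
        for t, rating in enumerate(row):
            if rating is not None:
                continue
            num = den = 0.0
            for sim, v in neighbours:
                # Assumption: neighbours without a rating for item t are skipped.
                if matrix[v][t] is not None and sim > 0:
                    num += (matrix[v][t] - mean_of_rated(matrix[v])) * sim
                    den += sim
            filled[u][t] = mean_of_rated(row) + (num / den if den else 0.0)
    return filled


def item_similarity(filled, t, r):
    """Correlation between the rating columns of items t and r."""
    col_t = [row[t] for row in filled]
    col_r = [row[r] for row in filled]
    a_t = sum(col_t) / len(col_t)
    a_r = sum(col_r) / len(col_r)
    num = sum((x - a_t) * (y - a_r) for x, y in zip(col_t, col_r))
    den = sqrt(sum((x - a_t) ** 2 for x in col_t)) * sqrt(sum((y - a_r) ** 2 for y in col_r))
    return num / den if den else 0.0


def predict(filled, u, t, k=2):
    """Weighted average of user u's ratings on the k items most similar to t."""
    neighbours = sorted(
        ((item_similarity(filled, t, i), i) for i in range(len(filled[u])) if i != t),
        reverse=True,
    )[:k]
    # Assumption: only positively similar items contribute to the average.
    num = sum(filled[u][i] * sim for sim, i in neighbours if sim > 0)
    den = sum(sim for sim, _ in neighbours if sim > 0)
    return num / den if den else mean_of_rated(filled[u])


# Toy user-item matrix (rows: users, columns: items, None: vacant rating).
ratings = [
    [5, 3, None, 1],
    [4, None, 4, 1],
    [1, 1, None, 5],
    [None, 1, 5, 4],
]
filled = fill_vacant(ratings)
prediction = predict(filled, u=0, t=2)
print(filled)
print(prediction)
```

Because the item similarities are computed on the filled matrix, every user counts as a co-rater of every item pair, so m in the item similarity formula equals the number of users in this sketch.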
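The accuracy measure used for the comparison in this section, mean absolute error (MAE), reduces to a few lines of code. The sketch below is illustrative only: the function name and the sample prediction and rating lists are hypothetical and are not results from the paper.

```python
def mean_absolute_error(predictions, actuals):
    """Average absolute difference between paired predicted and real ratings."""
    if len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must have the same length")
    if not predictions:
        raise ValueError("at least one rating pair is required")
    return sum(abs(p - q) for p, q in zip(predictions, actuals)) / len(predictions)


# Hypothetical predicted ratings p_i and real ratings q_i.
predicted = [4.2, 3.0, 1.5, 5.0]
actual = [4, 3, 2, 4]
mae = mean_absolute_error(predicted, actual)
print(mae)
```

The absolute errors here are 0.2, 0, 0.5 and 1, so the result is approximately 0.425; a lower value means the predictions sit closer to the real ratings.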
A. Data set

We use the MovieLens collaborative filtering data set to evaluate the performance of the proposed algorithm. The MovieLens data sets were collected by the GroupLens Research Project at the University of Minnesota [15,16,17]. MovieLens is a web-based research recommender system that debuted in Fall 1997. Each week hundreds of users visit MovieLens to rate and receive recommendations for movies. The site now has over 45000 users who have expressed opinions on 6600 different movies. We randomly selected enough users to obtain 100,000 ratings from 1000 users on 1680 movies, with every user having at least 20 ratings; simple demographic information for the users is included. The ratings are on a numeric five-point scale with 1 and 2 representing negative ratings, 4 and 5 representing positive ratings, and 3 indicating ambivalence.

B. Performance measurement

Several metrics have been proposed for assessing the accuracy of collaborative filtering methods. They are divided into two main categories: statistical accuracy metrics and decision-support accuracy metrics. In this paper, we use the statistical accuracy metrics [18,19,20].

Statistical accuracy metrics evaluate the accuracy of a prediction algorithm by comparing the numerical deviation of the predicted ratings from the respective actual user ratings. Frequently used ones are mean absolute error (MAE), root mean squared error (RMSE) and correlation between ratings and predictions. All of the above metrics were computed on result data and generally provided the same conclusions. As the statistical accuracy measure, mean absolute error (MAE) is employed.

Formally, if n is the number of actual ratings in an item set, then MAE is defined as the average absolute difference between the n pairs. Assume that $p_1, p_2, p_3, \ldots, p_n$ are the predicted ratings and that the corresponding real ratings are $q_1, q_2, q_3, \ldots, q_n$. MAE is defined as follows:

$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} |p_i - q_i|}{n}$$

The lower the MAE, the more accurate the predictions, allowing for better recommendations to be formulated. MAE has been computed for different prediction algorithms and for different levels of sparsity.

C. Comparing with the traditional CF

We compare the proposed method, which combines memory-based and model-based collaborative filtering, with traditional collaborative filtering. The performance of our proposed CF is better than that of the traditional CF in terms of the MAE measure.

V. CONCLUSIONS

Collaborative filtering has proved to be one of the most successful techniques in recommender systems. Two types of collaborative filtering algorithms have been researched: memory-based CF and model-based CF. Memory-based approaches identify the similarity between two users by comparing their ratings on a set of items, and they suffer from two fundamental problems: sparsity and scalability. Model-based approaches have been proposed to alleviate these problems, but they tend to limit the range of users. In this paper, we present an approach that combines the advantages of these two kinds of approaches by joining the two methods. First, it employs memory-based CF to fill the vacant ratings of the user-item matrix. Then, it uses item-based CF as the model-based component to form the nearest neighbors of every item. Finally, it produces the prediction of the target user's rating for the target item in real time. The collaborative filtering recommendation method combining memory-based CF and model-based CF can provide better recommendations than traditional collaborative filtering.

REFERENCES

[1] B. Sarwar, G. Karypis, J. Konstan and J. Riedl, Recommender systems for large-scale e-commerce: Scalable neighborhood formation using clustering, Proceedings of the Fifth International Conference on Computer and Information Technology, 2002.
[2] Chong-Ben Huang, Song-Jie Gong, Employing rough set theory to alleviate the sparsity issue in recommender system, In: Proceeding of the Seventh International Conference on Machine Learning and Cybernetics (ICMLC2008), IEEE Press, 2008, pp. 1610-1614.
[3] Schafer, J. B., Konstan, J., and Riedl, J. (1999). Recommender Systems in E-Commerce. In Proceedings of ACM E-Commerce 1999 conference.
[4] Resnick, P., and Varian, H. R. (1997). Recommender Systems. Special issue of Communications of the ACM. 40(3).
[5] Xue, G., Lin, C., & Yang, Q., et al. Scalable collaborative filtering using cluster-based smoothing. In Proceedings of the ACM SIGIR Conference 2005, pp. 114-121.
[6] Yu Li, Liu Lu, Li Xuefeng, A hybrid collaborative filtering method for multiple-interests and multiple-content recommendation in E-Commerce, Expert Systems with Applications 28 (2005) 67-77.
[7] Songjie Gong, Chongben Huang, Employing Fuzzy Clustering to Alleviate the Sparsity Issue in Collaborative Filtering Recommendation Algorithms, In: Proceeding of 2008 International Pre-Olympic Congress on Computer Science, World Academic Press, 2008, pp. 449-454.
[8] Huang qin-hua, Ouyang wei-min, Fuzzy collaborative filtering with multiple agents, Journal of Shanghai University (English Edition), 2007, 11(3): 290-295.
[9] Sarwar, B. M., Karypis, G., Konstan, J. A., and Riedl, J. (2000). Analysis of Recommendation Algorithms for E-Commerce. In Proceedings of the ACM EC'00 Conference. Minneapolis, MN. pp. 158-167.
[10] Schafer J.B., Konstan J., Riedl J. (2000). Electronic Commerce Recommender Applications. Journal of Data Mining and Knowledge Discovery, Vol 5 (1/2), 115-152.
[11] Herlocker, J. (2000). Understanding and Improving Automated Collaborative Filtering Systems. Ph.D. Thesis, Computer Science Dept., University of Minnesota.
[12] Miha Grcar, Dunja Mladenic, Blaz Fortuna and Marko Grobelnik, Data Sparsity Issues in the Collaborative Filtering Framework, Lecture Notes in Computer Science, Volume 4198, 2006, pp. 58-76.
[13] Breese J, Heckerman D, Kadie C. Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI'98). 1998. 43-52.
[14] Billsus, D., and Pazzani, M. J. (1998). Learning Collaborative Information Filters. In Proceedings of ICML '98. pp. 46-53.
[15] Sarwar B, Karypis G, Konstan J, Riedl J. Item-Based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International World Wide Web Conference. 2001. 285-295.
[16] Gao Fengrong, Xing Chunxiao, Du Xiaoyong, Wang Shan, Personalized Service System Based on Hybrid Filtering for Digital Library, Tsinghua Science and Technology, Volume 12, Number 1, February 2007, 1-8.
[17] M.G. Vozalis, K.G. Margaritis, Using SVD and demographic data for the enhancement of generalized Collaborative Filtering, Information Sciences 177 (2007) 3017-3037.
[18] SongJie Gong, GuangHua Cheng, Mining User Interest Change for Improving Collaborative Filtering, In: Second International Symposium on Intelligent Information Technology Application (IITA2008), IEEE Computer Society Press, 2008, Volume 3, pp. 24-27.
[19] GuangHua Cheng, SongJie Gong, An Efficient Collaborative Filtering Algorithm with Item Hierarchy, In: Second International Symposium on Intelligent Information Technology Application (IITA2008), IEEE Computer Society Press, 2008, Volume 3, pp. 28-31.
[20] George Lekakos, George M. Giaglis, A hybrid approach for improving predictive accuracy of collaborative filtering algorithms, User Model User-Adap Inter (2007) 17:5-40.