A Personalized Recommender Integrating Item-Based and User-Based Collaborative Filtering
Abstract—Recommender systems employ prediction algorithms to provide users with items that match their interests. Collaborative filtering (CF) is the most popular approach, and two of its best-known techniques are user-based CF (UBCF) and item-based CF (IBCF). Nevertheless, each of them takes only one-directional information from the user-item ratings matrix to generate recommendations: UBCF utilizes user similarities, whereas IBCF makes a prediction by utilizing item similarities. Each method may therefore use only half of the total information in the given data set. To make use of the missing part of the usable information, this paper proposes a CF algorithm integrating UBCF and IBCF, which takes both vertical and horizontal information from the user-item matrix. It produces predictions using IBCF to form a dense user-item matrix and then recommends using UBCF based on the dense matrix. The experimental results on the MovieLens dataset show that the proposed algorithm achieves better prediction accuracy and robustness to data sparseness.

Keywords-personalized recommender; sparsity; item-based collaborative filtering; user-based collaborative filtering

I. INTRODUCTION
With the rapid growth and wide application of the Internet, people are confronted with a large amount of information on web sites. The problem of obtaining the needed information from such an environment becomes more and more serious [1,2]. To solve the problem, various types of recommender systems have been developed, and collaborative filtering is becoming the most popular recommendation algorithm in the real world.
Many researchers have proposed various kinds of CF technologies to make quality recommendations. All of them make recommendations based on the same data structure, the user-item matrix, whose entries are the rating scores given by users to items. There are two methods in CF, UBCF and IBCF [1]. UBCF assumes that a good way to find items of interest to a certain user is to find other users who have similar interests. So, at first, it tries to find the user's neighbors based on user similarities and then combines the neighbor users' previously expressed rating scores by similarity-weighted averaging. IBCF fundamentally follows the same scheme as UBCF. It looks into the set of items the target user has already rated and computes how similar they are to the target item under recommendation. After that, it combines the user's previous preferences based on these item similarities. The challenges of these two CF methods are as follows [3,4]:
Sparsity: Even when users are very active, each of them rates only a few of the total number of items available in a database. As most CF algorithms are based on similarity measures computed over the co-rated set of items, large levels of sparsity can lead to lower accuracy.
Scalability: CF algorithms seem to be efficient in filtering in items that are interesting to users. However, they require computations that are very expensive and grow non-linearly with the number of users and items in a database.
Cold-start: An item cannot be recommended unless it has been rated by a number of users. This problem applies to new items and is particularly detrimental to users with eclectic interests. Likewise, a new user has to rate a sufficient number of items before the CF algorithm is able to provide accurate recommendations.
In contrast to using the user-item rating matrix in one direction only, in this paper we propose to use the matrix twice, horizontally and vertically, that is, to make two-way predictions. Sometimes one prediction looks more reliable than the other, and vice versa, according to some criteria. In this case we can make our final prediction based on these two, which are basically orthogonal to each other. The proposed algorithm produces predictions using IBCF to form a dense user-item matrix and then recommends using UBCF based on the dense matrix. The experimental results show that the proposed algorithm achieves better prediction accuracy.

II. PREDICTION USING ITEM-BASED COLLABORATIVE FILTERING
A. The sparse user-item matrix
The task of the recommender system concerns the prediction of the target user's rating for the target item, based on the users' ratings on observed items. Each user is represented by item-rating pairs, which can be summarized in a user-item table containing the ratings $R_{ij}$ provided by the ith user for the jth item.
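As a purely illustrative sketch of such a user-item table (the ratings below are made up and are not taken from the paper), the matrix can be held as a NumPy array in which NaN marks an item the user has not rated. The helper names `sparsity` and `co_rated_users` are hypothetical and are introduced only for this example.

```python
import numpy as np

# Toy ratings, invented for illustration: rows are users, columns are items.
# np.nan marks an item that the user has not rated.
R = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [np.nan, 1.0, 5.0, 4.0],
])


def sparsity(ratings: np.ndarray) -> float:
    """Fraction of user-item cells that carry no rating."""
    return float(np.isnan(ratings).mean())


def co_rated_users(ratings: np.ndarray, t: int, r: int) -> np.ndarray:
    """Indices of the users who rated both item t and item r."""
    mask = ~np.isnan(ratings[:, t]) & ~np.isnan(ratings[:, r])
    return np.flatnonzero(mask)
```

The co-rated set matters because the item similarity measures are computed only over users who rated both items; the sparser the matrix, the smaller this set becomes.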
Pearson's correlation, as in the following formula, measures the linear correlation between two vectors of ratings.

$$\mathrm{sim}(t,r) = \frac{\sum_{i=1}^{m} (R_{it} - A_t)(R_{ir} - A_r)}{\sqrt{\sum_{i=1}^{m} (R_{it} - A_t)^2}\,\sqrt{\sum_{i=1}^{m} (R_{ir} - A_r)^2}}$$

where $R_{it}$ is the rating of the target item t by user i, $R_{ir}$ is the rating of the remaining item r by user i, $A_t$ is the average rating of the target item t over all the co-rated users, $A_r$ is the average rating of the remaining item r over all the co-rated users, and m is the number of users who rated both item t and item r.
The cosine measure, as in the following formula, looks at the angle between the two vectors of ratings of the target item t and the remaining item r.

$$\mathrm{sim}(t,r) = \frac{\sum_{i=1}^{m} R_{it} R_{ir}}{\sqrt{\sum_{i=1}^{m} R_{it}^2}\,\sqrt{\sum_{i=1}^{m} R_{ir}^2}}$$

where $R_{it}$ is the rating of the target item t by user i, $R_{ir}$ is the rating of the remaining item r by user i, and m is the number of users who rated both item t and item r.
The adjusted cosine is used for similarity among items where the difference in each user's use of the rating scale is taken into account.
The predicted rating $P_{ut}$ of the target user u for the target item t is then computed as

$$P_{ut} = \frac{\sum_{i=1}^{c} R_{ui} \times \mathrm{sim}(t,i)}{\sum_{i=1}^{c} \mathrm{sim}(t,i)}$$

where $R_{ui}$ is the rating of the target user u for the neighbour item i, sim(t, i) is the similarity of the target item t and the neighbour item i, and c is the number of neighbours.

III. RECOMMENDER USING USER-BASED COLLABORATIVE FILTERING
By calculating the users' vacant ratings with the item-based CF algorithm, we obtain the complete users' ratings. Then, to generate a prediction of a user's rating, we use the user-based collaborative filtering algorithm.
A. The dense user-item matrix
After applying item-based CF, we have the complete ratings of the users for the items. So the original sparse user-item rating matrix becomes a dense user-item matrix.
B. Measuring the user rating similarity
We also use the Pearson correlation measure to compute the users' similarity, as in the following formula.

$$\mathrm{sim}(i,j) = \frac{\sum_{c \in I_{ij}} (R_{i,c} - A_i)(R_{j,c} - A_j)}{\sqrt{\sum_{c \in I_{ij}} (R_{i,c} - A_i)^2}\,\sqrt{\sum_{c \in I_{ij}} (R_{j,c} - A_j)^2}}$$

where $R_{i,c}$ is the rating of item c by user i, $A_i$ is the average rating of user i over all the co-rated items, and $I_{ij}$ is the set of items rated by both user i and user j.
C. Selecting the target user neighbors
Next, the neighbors who will serve as recommenders are selected. Two techniques have been employed in recommender systems:
(a) Threshold-based selection, according to which users whose similarity exceeds a certain threshold value are considered as neighbors of the target user.
(b) The top-n technique, in which a predefined number n of best neighbors is selected.

Evaluation metrics fall into two groups: statistical accuracy metrics and decision-support metrics. Statistical accuracy metrics evaluate the accuracy of a predictor by comparing predicted values with user-provided values. Decision-support accuracy metrics measure how well predictions help users select high-quality items. In this paper, we use decision-support accuracy measures.
Decision-support accuracy metrics evaluate how effective a prediction engine is at helping a user select high-quality items from the set of all items. The receiver operating characteristic (ROC) sensitivity is an example of a decision-support accuracy metric. The metric indicates how effectively the system can steer users towards highly rated items and away from low-rated ones. We use the ROC-4 measure as the evaluation metric. Assume that p1, p2, p3, ..., pn are the predicted users' ratings and q1, q2, q3, ..., qn are the corresponding real ratings; ROC-4 is defined over these two sequences.
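The two-stage procedure of Sections II and III (fill the sparse matrix with item-based predictions, then predict from the dense matrix with user-based CF) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy ratings, the neighbourhood sizes `c` and `n`, the restriction to positively correlated neighbours, and the fall-back to a user's mean rating are all assumptions made for the example. The user-based step uses top-n neighbour selection with the similarity-weighted average mentioned in the Introduction, and every user is assumed to have rated at least one item.

```python
import numpy as np


def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation over the positions where both vectors hold a rating."""
    both = ~np.isnan(x) & ~np.isnan(y)
    if both.sum() < 2:
        return 0.0
    dx = x[both] - x[both].mean()
    dy = y[both] - y[both].mean()
    denom = np.sqrt((dx ** 2).sum()) * np.sqrt((dy ** 2).sum())
    return float((dx * dy).sum() / denom) if denom > 0 else 0.0


def similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise Pearson similarity between the rows of `vectors`."""
    n = vectors.shape[0]
    sim = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            sim[a, b] = sim[b, a] = pearson(vectors[a], vectors[b])
    return sim


def fill_with_item_based(R: np.ndarray, c: int = 2) -> np.ndarray:
    """Stage 1: fill each missing rating from the c most similar rated items."""
    item_sim = similarity_matrix(R.T)  # rows of R.T are item rating vectors
    dense = R.copy()
    for u, t in zip(*np.where(np.isnan(R))):
        rated = np.flatnonzero(~np.isnan(R[u]))
        # Assumption: only positively similar items act as neighbours.
        rated = rated[item_sim[t, rated] > 0]
        top = rated[np.argsort(-item_sim[t, rated])[:c]]
        weights = item_sim[t, top]
        if weights.sum() > 0:
            dense[u, t] = (R[u, top] * weights).sum() / weights.sum()
        else:
            # Assumption: fall back to the user's mean when no neighbour exists.
            dense[u, t] = np.nanmean(R[u])
    return dense


def predict_user_based(dense: np.ndarray, u: int, t: int, n: int = 2) -> float:
    """Stage 2: similarity-weighted average over the top-n neighbour users."""
    user_sim = similarity_matrix(dense)
    others = np.array([v for v in range(dense.shape[0]) if v != u], dtype=int)
    # Assumption: only positively similar users act as neighbours.
    others = others[user_sim[u, others] > 0]
    top = others[np.argsort(-user_sim[u, others])[:n]]
    weights = user_sim[u, top]
    if weights.sum() == 0:
        return float(dense[u].mean())
    return float((dense[top, t] * weights).sum() / weights.sum())


# Toy ratings, invented for illustration (rows: users, columns: items).
R_toy = np.array([
    [5.0, 3.0, 4.0, np.nan],
    [4.0, np.nan, 5.0, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 2.0, 1.0, 4.0],
])
dense_toy = fill_with_item_based(R_toy, c=2)
prediction = predict_user_based(dense_toy, u=0, t=3, n=2)
```

Threshold-based neighbour selection could be obtained in this sketch by replacing the top-n slice with a filter on the similarity values; which variant performs better is not something this example addresses.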
[1] Lee, J.-S., & Olafsson, S., Two-way cooperative prediction for collaborative filtering recommendations, Expert Systems with Applications (2008), doi:10.1016/j.eswa.2008.06.106.
[2] Songjie Gong, Hongyan Pan, Personalized Recommendation in Short Message Services, In: Proceeding of 2008 International Pre-Olympic Congress on Computer Science, World Academic Press, 2008, pp.69-72.
[3] Manos Papagelis, Dimitris Plexousakis, Qualitative analysis of user-based and item-based prediction algorithms for recommendation agents, Engineering Applications of Artificial Intelligence 18 (2005) 781–789.
[4] Songjie Gong, Chongben Huang, Employing Fuzzy Clustering to Alleviate the Sparsity Issue in Collaborative Filtering Recommendation Algorithms, In: Proceeding of 2008 International Pre-Olympic Congress on Computer Science, World Academic Press, 2008, pp.449-454.
[5] Hyung Jun Ahn, A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem, Information Sciences 178 (2008) 37-51.
[6] Chong-Ben Huang, Song-Jie Gong, Employing rough set theory to alleviate the sparsity issue in recommender system, In: Proceeding of the Seventh International Conference on Machine Learning and Cybernetics (ICMLC2008), IEEE Press, 2008, pp.1610-1614.
[7] Gao Fengrong, Xing Chunxiao, Du Xiaoyong, Wang Shan, Personalized Service System Based on Hybrid Filtering for Digital Library, Tsinghua Science and Technology, Volume 12, Number 1, February 2007, 1-8.
[8] SongJie Gong, The Collaborative Filtering Recommendation Based on Similar-Priority and Fuzzy Clustering, In: Proceeding of 2008 Workshop on Power Electronics and Intelligent Transportation System (PEITS2008), IEEE Computer Society Press, 2008, pp. 248-251.