An Item-based Collaborative Filtering Recommendation Algorithm Using Slope One Scheme Smoothing
DeJia Zhang
Wenzhou Vocational and Technical College
Wenzhou 325035, China
e-mail: [email protected]
Abstract—Collaborative filtering is one of the most important technologies in electronic commerce. As recommender systems develop, the numbers of users and items grow rapidly, which results in extreme sparsity of the user rating data set. Traditional similarity measures work poorly in this situation, so the quality of the recommender system decreases dramatically. Poor quality is one major challenge in collaborative filtering recommender systems, and sparsity of users’ ratings is the major reason for it. To address this issue, an item-based collaborative filtering recommendation algorithm using slope one scheme smoothing is presented. This approach predicts the item ratings that users have not rated by means of the slope one scheme, then uses the Pearson correlation similarity measure to find the target item’s neighbors, and finally produces the recommendations. Experiments are carried out on a common data set using different recommender algorithms. The results show that the proposed approach can improve the accuracy of the collaborative filtering recommender system.

Keywords-recommender system; collaborative filtering; slope one scheme; sparsity

I. INTRODUCTION

With the rapid growth and wide application of the Internet and electronic commerce systems, information systems have provided an unprecedented abundance of information resources, which has also led to the problem of information overload. Thus, methods that help users find resources of interest have attracted much attention from both researchers and vendors. To deal with the problem, personalized recommendation systems play an increasingly important role [1]. The well-known electronic commerce websites Amazon and CD-Now have employed recommender techniques to recommend products to customers, which has improved the quality and efficiency of their services [2,3].

Although a variety of recommendation techniques have been developed recently, collaborative filtering (CF) is known to be the most successful recommendation technique and has been used in a number of different applications, such as recommending web pages, movies, tapes and products [1,4]. CF assumes that a good way to find content of interest to a certain user is to find other people who have similar interests. CF methods operate upon user ratings on observed items, making predictions concerning users’ interest in unobserved items. However, in most real-world applications, the ratio of rated items to the total number of available items is very low. The absence of a sufficient number of available ratings significantly affects CF methods, reducing the accuracy of prediction. The sparsity problem is particularly important in domains with a large or continuously updated list of items as well as a large number of users. It may occur when no or few ratings are available for the target user, for the target item that the prediction refers to, or for the entire database on average [5,6,7]. Different treatments are required and different prediction techniques must be employed depending on the sparsity conditions, making the selection of an appropriate approach a cumbersome task. Current CF approaches are limited in the sense that they address only specific aspects of the above problem.

To address the sparsity issue in the dataset, this paper presents an item-based collaborative filtering recommendation algorithm using slope one scheme smoothing. This approach predicts the item ratings that users have not rated by means of the slope one scheme, then uses the Pearson correlation similarity measure to find the target item’s neighbors, and finally produces the recommendations. Experiments are carried out on a common data set using different recommender algorithms. The results show that the proposed approach can improve the accuracy of the collaborative filtering recommender system.

II. ITEM-BASED COLLABORATIVE FILTERING USING SLOPE ONE SCHEME SMOOTHING

A. Slope one scheme [8]

The Slope One scheme works on the intuitive principle of a popularity differential between items for users. In a pairwise fashion, it determines how much better one item is liked than another. One way to measure this differential is simply to subtract the average ratings of the two items. In turn, this difference can be used to predict another user’s rating of one of those items, given their rating of the other.

B. Using slope one scheme to smooth the dataset

The slope one scheme takes into account both information from other users who rated the same item and information from the other items rated by the same user.
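The following Python sketch illustrates how slope one smoothing could fill the vacant ratings of a user-item dataset. It assumes the basic, unweighted Slope One predictor of [8]; the function names (`deviations`, `predict`, `smooth`), the dictionary layout and the toy ratings are illustrative assumptions, not part of the proposed system.

```python
from collections import defaultdict


def deviations(ratings):
    """Return dev[j][i]: the average of (rating of j - rating of i)
    over all users who rated both item j and item i."""
    total = defaultdict(lambda: defaultdict(float))
    count = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i != j:
                    total[j][i] += r_j - r_i
                    count[j][i] += 1
    return {j: {i: total[j][i] / count[j][i] for i in total[j]} for j in total}


def predict(user_ratings, j, dev):
    """Basic Slope One prediction of item j for one user.

    Returns None when none of the user's rated items is co-rated with j."""
    estimates = [dev[j][i] + r_i
                 for i, r_i in user_ratings.items()
                 if i != j and i in dev.get(j, {})]
    return sum(estimates) / len(estimates) if estimates else None


def smooth(ratings, items):
    """Copy the dataset and fill vacant cells with Slope One predictions."""
    dev = deviations(ratings)
    filled = {}
    for user, user_ratings in ratings.items():
        row = dict(user_ratings)          # observed ratings stay untouched
        for j in items:
            if j not in row:
                p = predict(user_ratings, j, dev)
                if p is not None:
                    row[j] = p
        filled[user] = row
    return filled


# Toy data, for illustration only.
ratings = {
    "john": {"A": 5.0, "B": 3.0, "C": 2.0},
    "mark": {"A": 3.0, "B": 4.0},
    "lucy": {"B": 2.0, "C": 5.0},
}
dense = smooth(ratings, ["A", "B", "C"])
print(dense["mark"]["C"], dense["lucy"]["A"])
```

On the toy data, every vacant cell that has at least one co-rated item is filled, while the observed ratings are left unchanged; the smoothed dataset can then be handed to the item similarity step.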
C. Smoothing

One of the challenges of collaborative filtering is the data sparsity problem. To predict the vacant values in the user-item rating dataset, we make explicit use of the slope one scheme as the prediction mechanism. Based on the slope one scheme results, we apply the prediction strategies to the vacant rating data.

The Pearson correlation similarity between the target item t and a remaining item r is

$$\mathrm{sim}(t,r) = \frac{\sum_{i=1}^{m} (R_{it} - A_t)(R_{ir} - A_r)}{\sqrt{\sum_{i=1}^{m} (R_{it} - A_t)^2}\,\sqrt{\sum_{i=1}^{m} (R_{ir} - A_r)^2}}$$

where R_it is the rating of the target item t by user i, R_ir is the rating of the remaining item r by user i, A_t is the average rating of the target item t over all the co-rated users, and A_r is the average rating of the remaining item r over all the co-rated users.

The rating of the target user u on the target item t is then predicted as the similarity-weighted average of the user’s ratings on the neighbour items:

$$\frac{\sum_{i=1}^{c} R_{ui}\,\mathrm{sim}(t,i)}{\sum_{i=1}^{c} \mathrm{sim}(t,i)}$$

where R_ui is the rating of the target user u on the neighbour item i, sim(t, i) is the similarity between the target item t and the neighbour item i, and c is the number of neighbours.

III. DATASET AND MEASUREMENT

B. Performance measurement

Several metrics have been proposed for assessing the accuracy of collaborative filtering methods. They are divided into two main categories: statistical accuracy metrics and decision-support accuracy metrics. In this paper, we use the statistical accuracy metrics [12,13].

Statistical accuracy metrics evaluate the accuracy of a prediction algorithm by comparing the numerical deviation of the predicted ratings from the respective actual user ratings. Frequently used ones are the mean absolute
error (MAE), root mean squared error (RMSE) and correlation between ratings and predictions. All of the above metrics were computed on the result data and generally led to the same conclusions. As the statistical accuracy measure, mean absolute error (MAE) is employed.

Formally, if n is the number of actual ratings in an item set, then MAE is defined as the average absolute difference between the n pairs of predicted and actual ratings. Assume that p_1, p_2, p_3, ..., p_n are the predicted user ratings and q_1, q_2, q_3, ..., q_n are the corresponding actual ratings. MAE is then defined as

$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} |p_i - q_i|}{n}$$

The lower the MAE, the more accurate the predictions, allowing better recommendations to be formulated. MAE has been computed for different prediction algorithms and for different levels of sparsity.

C. Comparing with the traditional CF

We compare the proposed method with traditional collaborative filtering. The size of the neighborhood has a significant effect on the prediction quality. In our experiments, we vary the number of neighbors and compute the MAE. The conclusion from Figure 1, which plots the mean absolute errors of the proposed algorithm and of traditional CF against the number of neighbors, is that the proposed algorithm performs better.

Figure 1. Comparing the proposed CF algorithm with the traditional CF algorithm (MAE versus number of neighbours, from 20 to 50, for Traditional CF and Proposed CF)

IV. CONCLUSIONS

As recommender systems develop, the numbers of users and items grow rapidly, which results in extreme sparsity of the user rating data set. Traditional similarity measures work poorly in this situation, so the quality of the recommender system decreases dramatically. Poor quality is one major challenge in collaborative filtering recommender systems, and sparsity of users’ ratings is the major reason for it. To address this issue, this paper presents an item-based collaborative filtering recommendation algorithm using slope one scheme smoothing. This approach predicts the item ratings that users have not rated by means of the slope one scheme, then uses the Pearson correlation similarity measure to find the target item’s neighbors, and finally produces the recommendations. Experiments are carried out on a common data set using different recommender algorithms. The results show that the proposed approach can improve the accuracy of the collaborative filtering recommender system.

REFERENCES

[1] Chong-Ben Huang, Song-Jie Gong, Employing rough set theory to alleviate the sparsity issue in recommender system, In: Proceeding of the Seventh International Conference on Machine Learning and Cybernetics (ICMLC2008), IEEE Press, 2008, pp. 1610-1614.
[2] Jong-Seok Lee, Chi-Hyuck Jun, Jaewook Lee, Sooyoung Kim, Classification-based collaborative filtering using market basket data, Expert Systems with Applications 29 (2005) 700-704.
[3] Hyung Jun Ahn, A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem, Information Sciences 178 (2008) 37-51.
[4] M.G. Vozalis, K.G. Margaritis, Using SVD and demographic data for the enhancement of generalized Collaborative Filtering, Information Sciences 177 (2007) 3017-3037.
[5] Zhao Liang, Hu NaiJing, Zhang ShouZhi, Algorithm design for personalized recommendation systems, Journal of Computer Research and Development, Vol. 39, No. 18, 2002.
[6] George Lekakos, George M. Giaglis, Improving the prediction accuracy of recommendation algorithms: Approaches anchored on human factors, Interacting with Computers 18 (2006) 410-431.
[7] S. Maneeroj, H. Kanai, K. Hakozaki, Combining Dynamic Agents and Collaborative Filtering without Sparsity Rating Problem for Better Recommendation Quality, Proceedings of the Second DELOS Network of Excellence Workshop, 2001, pp. 33-38.
[8] D. Lemire, A. Maclachlan, Slope one predictors for online rating-based collaborative filtering, In: Proceedings of SIAM Data Mining Conference, 2005.
[9] Z. Huang, H. Chen, D. Zeng, Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering, ACM Transactions on Information Systems 22 (1) (2004) 116-142.
[10] Manos Papagelis, Dimitris Plexousakis, Qualitative analysis of user-based and item-based prediction algorithms for recommendation agents, Engineering Applications of Artificial Intelligence 18 (2005) 781-789.
[11] George Lekakos, George M. Giaglis, A hybrid approach for improving predictive accuracy of collaborative filtering algorithms, User Model User-Adap Inter 17 (2007) 5-40.
[12] Huang Qin-hua, Ouyang Wei-min, Fuzzy collaborative filtering with multiple agents, Journal of Shanghai University (English Edition) 11 (3) (2007) 290-295.
[13] Gao Fengrong, Xing Chunxiao, Du Xiaoyong, Wang Shan, Personalized Service System Based on Hybrid Filtering for Digital Library, Tsinghua Science and Technology, Volume 12, Number 1, February 2007, 1-8.
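As a supplementary illustration of the formulas in Sections II and III, the Python sketch below computes the Pearson item similarity over co-rated users, the similarity-weighted prediction from the c nearest neighbour items, and the MAE. The function names, the dictionary-based data layout, the choice to keep only positively correlated neighbours, and the toy numbers are assumptions made for this sketch rather than details given in the paper.

```python
import math


def pearson_item_sim(col_t, col_r):
    """Pearson correlation of items t and r over their co-rated users.

    col_t and col_r map user id -> rating. Returns 0.0 when the items share
    fewer than two users or when either item has zero variance."""
    users = col_t.keys() & col_r.keys()
    if len(users) < 2:
        return 0.0
    a_t = sum(col_t[u] for u in users) / len(users)
    a_r = sum(col_r[u] for u in users) / len(users)
    num = sum((col_t[u] - a_t) * (col_r[u] - a_r) for u in users)
    den = (math.sqrt(sum((col_t[u] - a_t) ** 2 for u in users))
           * math.sqrt(sum((col_r[u] - a_r) ** 2 for u in users)))
    return num / den if den else 0.0


def predict_rating(user_ratings, sims, c):
    """Similarity-weighted average of the user's ratings on the c most
    similar items. sims maps item id -> sim(t, item) for the target item t.

    Assumption of this sketch: only items with positive similarity are used."""
    neighbours = sorted(
        ((s, user_ratings[i]) for i, s in sims.items()
         if i in user_ratings and s > 0),
        reverse=True,
    )[:c]
    weight = sum(s for s, _ in neighbours)
    if not weight:
        return None
    return sum(s * r for s, r in neighbours) / weight


def mae(predicted, actual):
    """Mean absolute error between paired predicted and actual ratings."""
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(actual)


# Toy data, for illustration only.
item_t = {"u1": 1.0, "u2": 2.0, "u3": 3.0}
item_r = {"u1": 2.0, "u2": 4.0, "u3": 6.0}
print(pearson_item_sim(item_t, item_r))
print(predict_rating({"A": 4.0, "B": 2.0}, {"A": 0.5, "B": 0.5, "C": 0.9}, c=2))
print(mae([4.0, 3.0, 5.0], [5.0, 3.0, 3.0]))
```

In the proposed approach the similarity would be computed on the dataset after slope one smoothing, so that most item pairs have enough co-rated users for the correlation to be meaningful.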