
JOURNAL OF SOFTWARE, VOL. 5, NO. 7, JULY 2010, p. 745

A Collaborative Filtering Recommendation Algorithm Based on User Clustering and Item Clustering
SongJie Gong
Zhejiang Business Technology Institute, Ningbo 315012, China
Email: [email protected]

Abstract—Personalized recommendation systems help people find interesting things, and they are widely used with the development of electronic commerce. Many recommendation systems employ collaborative filtering technology, which has proved to be one of the most successful techniques in recommender systems in recent years. With the gradual increase of customers and products in electronic commerce systems, the time-consuming nearest-neighbor search for the target customer over the whole customer space fails to meet the real-time requirement of a recommender system. At the same time, recommendation quality degrades as the number of records in the user database increases, and sparsity of the source data set is the major cause of this poor quality. To solve the scalability and sparsity problems in collaborative filtering, this paper proposes a personalized recommendation approach that joins user clustering and item clustering. Users are clustered based on their ratings on items, and each user cluster has a cluster center. Based on the similarity between the target user and the cluster centers, the nearest neighbors of the target user can be found and the predictions smoothed where necessary. Then, the proposed approach utilizes item clustering collaborative filtering to produce the recommendations. The recommendation joining user clustering and item clustering collaborative filtering is more scalable and more accurate than the traditional one.

Index Terms—recommender systems, collaborative filtering, user clustering, item clustering, scalability, sparsity, mean absolute error

I. INTRODUCTION

With the development of the internet, intranets and electronic commerce systems, more information arrives than we can deal with. Personalized recommendation services therefore exist to deliver the useful data by employing information filtering technologies. Information filtering has two main methods: one is content based filtering and the other is collaborative filtering. Collaborative filtering (CF) has proved to be one of the most effective, for its simplicity in both theory and implementation [1,2].

Many researchers have proposed various kinds of CF technologies to make quality recommendations. All of them base the recommendation on the same data structure: a user-item matrix of users, items and their rating scores. There are two methods in CF: user based collaborative filtering and item based collaborative filtering [3,4]. User based CF assumes that a good way to find a certain user's interesting items is to find other users who have similar interests. So, at first, it tries to find the user's neighbors based on user similarities and then combines the neighbor users' previously expressed rating scores by similarity-weighted averaging. Item based CF fundamentally has the same scheme as user based CF. It looks into the set of items the target user has already rated and computes how similar they are to the target item under recommendation. After that, it combines the user's previous preferences based on these item similarities.

The challenges of these two CF methods are as follows [5,6]:

Sparsity: Even when users are very active, they rate only a small fraction of the total number of items available in a user-item ratings database. As most collaborative filtering algorithms are based on similarity measures computed over the co-rated set of items, large levels of sparsity can lead to less accuracy.

Scalability: Collaborative filtering algorithms are effective at filtering in items that are interesting to users. However, they require computations that are very expensive and grow non-linearly with the number of users and items in a database.

Cold-start: An item cannot be recommended unless it has been rated by a number of users. This problem applies to new items and is particularly detrimental to users with eclectic interests. Likewise, a new user has to rate a sufficient number of items before the CF algorithm can provide accurate recommendations.

To solve the problems of scalability and sparsity in collaborative filtering, in this paper we propose a personalized recommendation approach that joins user clustering and item clustering. Users are clustered based on their ratings on items, and each user cluster has a cluster center. Based on the similarity between the target user and the cluster centers, the nearest neighbors of the target user can be found and the predictions smoothed where necessary. Then, the proposed approach utilizes item clustering collaborative filtering to produce the recommendations. The recommendation joining user clustering and item clustering collaborative filtering is more scalable and more accurate than the traditional one.
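To make the sparsity challenge concrete, the following minimal sketch builds the kind of user-item rating structure described above and measures how few of its cells are filled. The toy ratings and names are invented for this illustration, not taken from the paper.

```python
# Toy user-item ratings: each user rates only a few of the available items.
# All user/item names and rating values are invented for illustration.
ratings = {
    "u1": {"i1": 5, "i3": 4},
    "u2": {"i1": 3, "i2": 2},
    "u3": {"i4": 5},
}
items = {item for user_ratings in ratings.values() for item in user_ratings}

filled = sum(len(r) for r in ratings.values())   # cells with a rating
total = len(ratings) * len(items)                # cells in the full matrix
sparsity = 1 - filled / total                    # fraction of vacant cells

print(f"{filled} of {total} cells rated; sparsity = {sparsity:.2f}")
# prints: 5 of 12 cells rated; sparsity = 0.58
```

Even in this tiny example more than half of the matrix is vacant; real e-commerce rating databases are far sparser, which is exactly the condition under which similarity measures over co-rated items become unreliable.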

© 2010 ACADEMY PUBLISHER


doi:10.4304/jsw.5.7.745-752

II. TRADITIONAL COLLABORATIVE FILTERING ALGORITHM

A. User-Item Rating Content

The task of the traditional collaborative filtering recommendation algorithm is to predict the target user's rating for a target item that the user has not yet rated, based on the users' ratings on observed items. The user-item rating database is central. Each user is represented by item-rating pairs, which can be summarized in a user-item table containing the ratings Rij provided by the ith user for the jth item, as follows [7,8].

TABLE I
USER-ITEM RATINGS TABLE

        Item1   Item2   ……   Itemn
User1   R11     R12     ……   R1n
User2   R21     R22     ……   R2n
……      ……      ……      ……   ……
Userm   Rm1     Rm2     ……   Rmn

Here Rij denotes the score of item j rated by an active user i. If user i has not rated item j, then Rij = 0. The symbol m denotes the total number of users, and n denotes the total number of items.

B. Measuring the Rating Similarity

Collaborative filtering approaches have been popular with researchers and practitioners alike, as evidenced by the abundance of publications and actual implementation cases. Although there have been many algorithms, the basic common idea is to calculate similarity among users using some measure and to recommend items based on that similarity. The collaborative filtering algorithms that use similarities among users are called user based collaborative filtering [9,10].

A set of similarity measures provide a metric of relevance between two vectors. When the values of these vectors are associated with a user's model, the similarity is called user based similarity, whereas when they are associated with an item's model it is called item based similarity. The similarity measure can be effectively used to balance the significance of ratings in a prediction algorithm and therefore to improve accuracy.

Several similarity measures have been used in collaborative filtering recommendation algorithms [1,3]: Pearson correlation, cosine vector similarity, adjusted cosine vector similarity, mean-squared difference and Spearman correlation.

Pearson's correlation, given by the following formula, measures the linear correlation between two vectors of ratings:

sim(i, j) = \frac{\sum_{c \in I_{ij}} (R_{i,c} - A_i)(R_{j,c} - A_j)}{\sqrt{\sum_{c \in I_{ij}} (R_{i,c} - A_i)^2} \sqrt{\sum_{c \in I_{ij}} (R_{j,c} - A_j)^2}}    (1)

where Ri,c is the rating of item c by user i, Ai is the average rating of user i over all the co-rated items, and Iij is the set of items rated by both user i and user j.

The cosine measure, given by the following formula, looks at the angle between two vectors of ratings, where a smaller angle implies greater similarity:

sim(i, j) = \frac{\sum_{k=1}^{n} R_{ik} R_{jk}}{\sqrt{\sum_{k=1}^{n} R_{ik}^2} \sqrt{\sum_{k=1}^{n} R_{jk}^2}}    (2)

where Rik is the rating of item k by user i and n is the number of items co-rated by both users. If a rating is null, it can be set to zero.

The adjusted cosine, given by the following formula, is used in some collaborative filtering methods for similarity among users; it takes into account the difference in each user's use of the rating scale:

sim(i, j) = \frac{\sum_{c \in I_{ij}} (R_{i,c} - A_c)(R_{j,c} - A_c)}{\sqrt{\sum_{c \in I_{ij}} (R_{i,c} - A_c)^2} \sqrt{\sum_{c \in I_{ij}} (R_{j,c} - A_c)^2}}    (3)

where Ri,c is the rating of item c by user i, Ac is the average rating of item c, and Iij is the set of items rated by both user i and user j.

The literature provides rich evidence of the successful performance of collaborative filtering methods. However, the methods have shortcomings as well: they are known to be vulnerable to data sparsity and to suffer cold-start problems. Data sparsity refers to the problem of insufficient data, or sparseness. Cold-start problems refer to the difficulty of recommending new items, or of recommending to new users, when there are not sufficient ratings available for them.

C. Selecting Neighbors

The next step is to select the neighbors who will serve as recommenders. Two techniques have been employed in collaborative filtering recommender systems:

Threshold-based selection, in which users whose similarity exceeds a certain threshold value are considered neighbors of the target user.

The top-n technique, in which the n best neighbors are selected, with n fixed in advance.

D. Producing Prediction

Having obtained the user's neighborhood, we can calculate the weighted average of the neighbors' ratings, weighted by their similarity to the target user.
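The user based pipeline above — Pearson similarity over co-rated items (formula (1)), top-n neighbor selection, and a mean-offset weighted average prediction — can be sketched as follows. This is an illustrative implementation only: the toy ratings and function names are invented for this example.

```python
from math import sqrt

def pearson_sim(ra, rb):
    """Pearson correlation over the items co-rated by two users (formula (1))."""
    common = set(ra) & set(rb)
    if not common:
        return 0.0
    ma = sum(ra[c] for c in common) / len(common)  # user a's mean on co-rated items
    mb = sum(rb[c] for c in common) / len(common)
    num = sum((ra[c] - ma) * (rb[c] - mb) for c in common)
    den = sqrt(sum((ra[c] - ma) ** 2 for c in common)) * \
          sqrt(sum((rb[c] - mb) ** 2 for c in common))
    return num / den if den else 0.0

def predict(ratings, user, item, n=2):
    """Weighted-average prediction from the top-n most similar users
    who rated the item, offset by each user's mean rating."""
    target = ratings[user]
    a_u = sum(target.values()) / len(target)        # target user's mean rating
    sims = [(pearson_sim(target, r), v, r)
            for v, r in ratings.items() if v != user and item in r]
    sims.sort(key=lambda t: t[0], reverse=True)     # top-n selection
    top = [t for t in sims[:n] if t[0] > 0]
    if not top:
        return a_u                                  # fall back to the user's mean
    num = sum(s * (r[item] - sum(r.values()) / len(r)) for s, _, r in top)
    den = sum(s for s, _, _ in top)
    return a_u + num / den

ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "u3": {"i1": 1, "i2": 5, "i4": 2},
}
print(round(predict(ratings, "u1", "i4"), 2))  # prints 4.25
```

Here u3 rates against u1's taste (similarity −1) and is excluded by the positive-similarity filter, so the prediction leans on u2 alone — the behavior the neighbor-selection step is designed to produce.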


The rating of the target user u for the target item t is:

P_{u,t} = A_u + \frac{\sum_{i=1}^{c} (R_{i,t} - A_i) \, sim(u, i)}{\sum_{i=1}^{c} sim(u, i)}    (4)

where Au is the average rating of the target user u, Ri,t is the rating of the neighbour user i for the target item t, Ai is the average rating of the neighbour user i, sim(u, i) is the similarity of the target user u and the neighbour user i, and c is the number of neighbours.

III. RELATED WORKS

L.H. Ungar et al. [11,12] present a formal statistical model of collaborative filtering and compare different algorithms for estimating the model parameters, including variations of K-means clustering and Gibbs sampling. This formal model is easily extended to handle clustering of objects with multiple attributes, and it outperforms the traditional one.

M.O. Conner [13] reports on work in progress on applying data clustering algorithms to ratings data in collaborative filtering. They use existing data partitioning and clustering algorithms to partition the set of items based on user rating data. Predictions are then computed independently within each partition. Ideally, partitioning will improve the quality of collaborative filtering predictions and increase the scalability of collaborative filtering systems. They report preliminary results suggesting that partitioning algorithms can greatly increase scalability, but with mixed results on improving accuracy. However, partitioning based on ratings data does result in more accurate predictions than random partitioning, and the results are similar to those obtained when the data is partitioned based on a known content classification.

A. Kohrs et al. [14] identify two important situations with sparse ratings in collaborative filtering recommendation systems: bootstrapping a collaborative filtering system with few users, and providing recommendations for new users who have rated only a few items. Further, they present a novel algorithm for collaborative filtering based on hierarchical clustering, which tries to balance robustness and accuracy of predictions, and experimentally show that it is especially efficient in dealing with these situations.

Lee, W.S. et al. [15] study two online clustering methods for collaborative filtering. In the first method, they assume that each user is equally likely to belong to one of m clusters of users and that the user's rating for each item is generated randomly according to a distribution that depends on the item and the cluster the user belongs to. In the second method, they assume that each user is equally likely to belong to one of m clusters of users while each item is equally likely to belong to one of n clusters of items; the rating for a user-item pair is generated randomly according to a distribution that depends on the cluster the user belongs to and the cluster the item belongs to. They derive performance bounds for Bayesian sequential probability assignment for these two methods to elucidate the trade-offs involved in using them. Since Bayesian sequential probability assignment does not appear to be computationally tractable for these model classes, they propose heuristic approximations and perform experiments on a movie rating data set. The proposed algorithms are fast and perform well, and the experimental results agree with the insights derived from the theoretical considerations.

Automated collaborative filtering is a popular technique for reducing information overload. K. Honda et al. [16] propose a new approach for collaborative filtering using local principal components. The new method is based on a simultaneous approach to principal component analysis and fuzzy clustering with an incomplete data set including missing values. In the simultaneous approach, they extract local principal components by using a lower-rank approximation of the data matrix; the missing values are predicted using this approximation. In a numerical experiment, they apply the proposed technique to a recommendation system for background designs of word-processor stationery.

S.H.S. Chee et al. [17] develop an efficient collaborative filtering method, called RecTree, that addresses the scalability problem with a divide-and-conquer approach. The method first performs an efficient k-means-like clustering to group data and create neighborhoods of similar users, and then performs subsequent clustering based on smaller, partitioned databases. Since the progressive partitioning reduces the search space dramatically, the search for an advisory clique is faster than scanning the entire database of users. In addition, each partition contains users that are more similar to each other than to those in other partitions. This characteristic allows RecTree to avoid the dilution of opinions from good advisors by a multitude of poor advisors, and thus yields a higher overall accuracy. In their experiments and performance study, RecTree outperforms the well-known user based collaborative filtering in both execution time and accuracy. In particular, RecTree's execution time scales as O(n log2(n)) with the dataset size, while traditional user based collaborative filtering scales quadratically.

B. Sarwar et al. [18] address the performance issues by scaling up the neighborhood formation process through the use of clustering techniques.

The high cardinality and sparsity of a collaborative recommender's dataset is a challenge to its efficiency. D. Bridge et al. [19] generalize an existing clustering technique and apply it to a collaborative recommender's dataset to reduce cardinality and sparsity. They systematically test several variations, exploring the value of partitioning and grouping the data.


J. Kelleher et al. [20] present a collaborative recommender that uses a user-based model to predict user ratings for specified items. The model comprises summary rating information derived from a hierarchical clustering of the users. They compare their algorithm with several others and show that its accuracy is good and its coverage is maximal. They also show that the proposed algorithm is very efficient: predictions can be made in time that grows independently of the number of ratings and items and only logarithmically in the number of users.

Xue, G. et al. [21] present a novel approach that combines the advantages of memory based and model based collaborative filtering by introducing a smoothing-based method. In their approach, clusters generated from the training data provide the basis for data smoothing and neighborhood selection. As a result, they provide higher accuracy as well as increased efficiency in recommendations. Their empirical studies on two datasets, EachMovie and MovieLens, show that their proposed approach consistently outperforms other traditional user based collaborative filtering algorithms.

George, T. et al. [22] consider a novel collaborative filtering approach based on a recently proposed weighted co-clustering algorithm that involves simultaneous clustering of users and items. They design incremental and parallel versions of the co-clustering algorithm and use it to build an efficient real-time collaborative filtering framework. Their empirical evaluation of the proposed approach on large movie and book rating datasets demonstrates that it is possible to obtain accuracy comparable to that of correlation and matrix factorization based approaches at a much lower computational cost.

Rashid, A.M. et al. [23] propose ClustKnn, a simple and intuitive algorithm that is well suited for large data sets. The proposed method first compresses the data tremendously by building a straightforward but efficient clustering model. Recommendations are then generated quickly using a simple nearest-neighbor-based approach. They demonstrate the feasibility of ClustKnn both analytically and empirically, and also show, by comparison with a number of other popular collaborative filtering algorithms, that apart from being highly scalable and intuitive, ClustKnn provides very good recommender accuracy as well.

Cantador, I. et al. [24] propose a multilayered semantic social network model that offers different views of the common interests underlying a community of people, and empirically study the applicability of the proposed model to a collaborative filtering system. Starting from a number of ontology-based user profiles and taking into account their common preferences, they automatically cluster the domain concept space. With the obtained semantic clusters, similarities among individuals are identified at multiple semantic preference layers, and emergent, layered social networks are defined, suitable for use in collaborative environments and content recommenders.

Panagiotis Symeonidis et al. [25,26] use bi-clustering to disclose the duality between users and items by grouping them in both dimensions simultaneously. They propose a novel nearest-bi-clusters collaborative filtering algorithm, which uses a new similarity measure that achieves partial matching of users' preferences. They apply nearest bi-clusters in combination with two different types of bi-clustering algorithms, Bimax and xMotif, for constant and coherent bi-clustering, respectively. Extensive performance evaluation results on three real-life data sets are provided, which show that the proposed method substantially improves the performance of the CF process.

IV. RATING SMOOTHING BASED ON USER CLUSTERING

A. User Clustering

User clustering techniques work by identifying groups of users who appear to have similar ratings. Once the clusters are created, predictions for a target user can be made by averaging the opinions of the other users in that cluster. Some clustering techniques represent each user with partial participation in several clusters; the prediction is then an average across the clusters, weighted by degree of participation. Once the user clustering is complete, performance can be very good, since the size of the group that must be analyzed is much smaller [18].

The idea is to divide the users of a collaborative filtering system using a user clustering algorithm and use the resulting partitions as neighborhoods, as Figure 1 shows. The clustering algorithm may generate fixed-size partitions, or, based on some similarity threshold, it may generate a requested number of partitions of varying size.

Figure 1. Collaborative filtering based on user clustering.

Here Rij is the rating of user i for item j, aij is the average rating of user cluster center i for item j, m is the number of users, n is the number of items, and c is the number of user cluster centers.
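The user-clustering scheme just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact procedure: it uses cosine similarity on fixed-length rating vectors, seeds the centers from the first k users, and re-averages centers each pass; the toy data and helper names are invented.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors (0 = unrated)."""
    num = sum(x * y for x, y in zip(a, b))
    den = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def cluster_users(rows, k, iters=10):
    """k-means-like clustering: centers start from the first k users,
    each user joins its most similar center, centers are re-averaged."""
    centers = [list(r) for r in rows[:k]]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for r in rows:
            best = max(range(k), key=lambda j: cosine(r, centers[j]))
            clusters[best].append(r)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(col) for col in zip(*members)]
    return centers, clusters

# Toy 4-user x 3-item rating matrix (0 means "not rated"); invented data.
rows = [
    [5, 4, 0],
    [4, 5, 1],
    [1, 0, 5],
    [0, 1, 4],
]
centers, clusters = cluster_users(rows, k=2)
print([len(c) for c in clusters])  # the two groups of similar raters
```

On this toy matrix the procedure separates the two "likes items 1-2" users from the two "likes item 3" users, and each cluster center's per-item average is then available as the aij smoothing value used in the next subsection.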


B. Smoothing

In this paper, we use the k-means clustering algorithm to cluster the users into groups around clustering centers. The specific algorithm is as follows:

Input: clustering number k, user-item rating matrix
Output: smoothed rating matrix
Begin
  Select user set U={U1, U2, …, Um};
  Select item set I={I1, I2, …, In};
  Choose the top k rating users as the clustering centers CU={CU1, CU2, …, CUk};
  Initialize the k clusters as empty: c={c1, c2, …, ck};
  do
    for each user Ui∈U
      for each cluster center CUj∈CU
        calculate sim(Ui, CUj);
      end for
      sim(Ui, CUm)=max{sim(Ui, CU1), sim(Ui, CU2), …, sim(Ui, CUk)};
      cm=cm∪Ui;
    end for
    for each cluster ci∈c
      CUi=average of the users in ci;
    end for
  while (the clusters c still change)
End

C. New Ratings

One of the challenges of collaborative filtering is the data sparsity problem. To predict the vacant values in the user-item rating dataset, we make explicit use of the clusters as prediction mechanisms. Based on the clustering results, we apply the following prediction strategy to the vacant rating data:

R_{ij} = \begin{cases} R_{ij} & \text{if user } i \text{ rated item } j \\ c_j & \text{otherwise} \end{cases}    (5)

where cj denotes the prediction value for user i's rating of item j, computed in the algorithm above.

V. USING THE ITEM CLUSTERING METHOD TO PRODUCE RECOMMENDATIONS

By calculating the users' vacant ratings with the user clustering algorithm, we obtain dense users' ratings. Then, to generate the prediction of a user's rating, we use the item clustering based collaborative filtering algorithm.

A. The Dense User-Item Matrix

After applying the user clustering algorithm, we have dense ratings of the users for the items, so the originally sparse user-item rating matrix has become a dense user-item matrix.

B. Item Clustering

Item clustering techniques work by identifying groups of items which appear to have similar ratings. Once the clusters are created, predictions for a target item can be made by averaging the opinions of the other items in that cluster. Some clustering techniques represent each item with partial participation in several clusters; the prediction is then an average across the clusters, weighted by degree of participation. Once the item clustering is complete, performance can be very good, since the size of the group that must be analyzed is much smaller.

The idea is to divide the items of a collaborative filtering system using an item clustering algorithm and use the resulting partitions as neighborhoods, as Figure 2 shows. The clustering algorithm may generate fixed-size partitions, or, based on some similarity threshold, it may generate a requested number of partitions of varying size.

Figure 2. Collaborative filtering based on item clustering.

Here Rij is the rating of user i for item j, aij is the average rating of user i for item cluster center j, m is the number of users, n is the number of items, and c is the number of item cluster centers.

C. Algorithm

There are many algorithms that can be used to create an item clustering. In this paper, we choose the k-means algorithm as the basic clustering algorithm. The number k is an input to the algorithm that specifies the desired number of clusters. First, the algorithm takes the first k items as the centers of k unique clusters; each of the remaining items is then compared to the closest center. In the following passes, the cluster centers are re-computed from the clusters formed in the previous pass, and the cluster membership is re-evaluated. The specific algorithm is as follows:

Input: clustering number k, user-item rating matrix
Output: item-center matrix
Begin
  Select user set U={U1, U2, …, Um};
  Select item set I={I1, I2, …, In};
  Choose the top k rating items as the clustering centers CI={CI1, CI2, …, CIk};
  Initialize the k clusters as empty: c={c1, c2, …, ck};
  do
    for each item Ii∈I
      for each cluster center CIj∈CI
        calculate sim(Ii, CIj);
      end for
      sim(Ii, CIx)=max{sim(Ii, CI1), sim(Ii, CI2), …, sim(Ii, CIk)};
      cx=cx∪Ii;
    end for
    for each cluster ci∈c
      CIi=average of the items in ci;
    end for
  while (the centers CI and clusters c still change)
End

We use Pearson's correlation, given by the following formula, to measure the linear correlation between the two vectors of ratings for the target item t and a remaining item r:

sim(t, r) = \frac{\sum_{i=1}^{m} (R_{it} - A_t)(R_{ir} - A_r)}{\sqrt{\sum_{i=1}^{m} (R_{it} - A_t)^2} \sqrt{\sum_{i=1}^{m} (R_{ir} - A_r)^2}}    (6)

where Rit is the rating of the target item t by user i, Rir is the rating of the remaining item r by user i, At is the average rating of the target item t over all co-rating users, Ar is the average rating of the remaining item r over all co-rating users, and m is the number of users who rated both item t and item r.

D. Selecting Clustering Centers

An important step of an item based collaborative filtering algorithm is to search for neighbors of the target item. Traditional memory based collaborative filtering searches the whole ratings database, and it suffers from poor scalability when more and more users and items are added to the database [21].

When we cluster the items, we obtain the item centers. Each center is represented as an average rating over all items in its cluster. So we can choose the target item's neighbors within some of the item clusters. We use Pearson's correlation to compute the similarity between the target item and the item centers. After calculating these similarities, we take the items in the most similar centers as the candidates.

E. Selecting Neighbors

After we select the clustering centers nearest to the target item, we still need to calculate the similarity between the target item and the items in the selected clusters. We select the top K most similar items based on the cosine measure, given by the following formula, which looks at the angle between the two vectors of ratings for the target item t and a remaining item r:

sim(t, r) = \frac{\sum_{i=1}^{m} R_{it} R_{ir}}{\sqrt{\sum_{i=1}^{m} R_{it}^2} \sqrt{\sum_{i=1}^{m} R_{ir}^2}}    (7)

where Rit is the rating of the target item t by user i, Rir is the rating of the remaining item r by user i, and m is the number of users who rated both item t and item r.

F. Producing Recommendations

Having obtained the item membership, we can calculate the weighted average of the neighbor items' ratings, weighted by their similarity to the target item. The rating of the target user u for the target item t is:

P_{u,t} = \frac{\sum_{i=1}^{c} R_{u,i} \, sim(t, i)}{\sum_{i=1}^{c} sim(t, i)}    (8)

where Rui is the rating of the target user u for the neighbour item i, sim(t, i) is the similarity of the target item t and the neighbour item i, and c is the number of neighbour items.

VI. EXPERIMENT RESULTS

In this section, we describe the dataset, metrics and methodology for the comparison between the traditional and the proposed collaborative filtering algorithms, and present the results of our experiments.

A. Data Set

We use the MovieLens collaborative filtering data set to evaluate the performance of the proposed algorithm. The MovieLens data sets were collected by the GroupLens Research Project at the University of Minnesota; MovieLens is a web-based research recommender system that debuted in Fall 1997. Each week hundreds of users visit MovieLens to rate and receive recommendations for movies [3,27]. The site now has over 45,000 users who have expressed opinions on 6,600 different movies. We randomly selected enough users to obtain 100,000 ratings from 1,000 users on 1,680 movies, with every user having at least 20 ratings; simple demographic information for the users is included. The ratings are on a numeric five-point scale, with 1 and 2 representing negative ratings, 4 and 5 representing positive ratings, and 3 indicating ambivalence.

B. Performance Measurement

Several metrics have been proposed for assessing the accuracy of collaborative filtering methods. They are divided into two main categories: statistical accuracy metrics and decision-support accuracy metrics. In this paper, we use statistical accuracy metrics [28,29]. Statistical accuracy metrics evaluate the accuracy of a prediction algorithm by comparing the numerical

Statistical accuracy metrics evaluate the accuracy of a prediction algorithm by comparing the numerical deviation of the predicted ratings from the respective actual user ratings. Frequently used metrics include the mean absolute error (MAE), the root mean squared error (RMSE), and the correlation between ratings and predictions. All of these metrics were computed on the result data and generally led to the same conclusions. As the statistical accuracy measure, the mean absolute error is employed.

Formally, if n is the number of actual ratings in an item set, then MAE is defined as the average absolute difference between the n pairs. Assume that p1, p2, p3, ..., pn are the predictions of users' ratings, and the corresponding real ratings are q1, q2, q3, ..., qn. MAE is then defined as:

MAE = (1/n) Σ_{i=1}^{n} |p_i − q_i|    (9)

The lower the MAE, the more accurate the predictions, allowing better recommendations to be formulated. MAE has been computed for different prediction algorithms and for different levels of sparsity.

C. Sensitivity of different training-test ratio x

To determine the sensitivity to the density of the dataset, we carried out an experiment in which we varied the value of x from 0.2 to 0.8 in increments of 0.1. For each of these training-test ratio values we ran our experiments using our proposed algorithm and the traditional CF algorithm. The results are shown in Figure 3. We observe that the quality of prediction increases as we increase x, and our proposed CF is better than the traditional one.

Figure 3. MAE of the different prediction algorithms with respect to the train-test ratio x. [Plot: MAE (0.77 to 0.93) versus train-test ratio x (0.2 to 0.8) for the traditional CF and the proposed CF.]

D. Comparing with the traditional CF

We compare the proposed method, which combines user clustering and item clustering collaborative filtering, with the traditional collaborative filtering. The size of the neighborhood has a significant effect on the prediction quality. In our experiments, we vary the number of neighbors and compute the MAE. The obvious conclusion from Figure 4, which plots the mean absolute errors of the proposed algorithm and the traditional collaborative filtering for different numbers of neighbors, is that our proposed algorithm is better.

Figure 4. Comparing the proposed CF algorithm with the traditional CF algorithm. [Plot: MAE (0.78 to 0.90) versus number of neighbours (20 to 50) for the traditional CF and the proposed CF.]

VII. CONCLUSIONS

Recommender systems help people find interesting things, and with the development of electronic commerce they are widely used in everyday life. Many recommendation systems employ collaborative filtering, which has proved to be one of the most successful techniques in recommender systems in recent years. With the gradual increase of customers and products in electronic commerce systems, the time-consuming nearest-neighbor search for the target customer over the entire customer space fails to meet the real-time requirement of a recommender system. At the same time, recommendation quality degrades as the number of records in the user database increases; sparsity of the source data set is the major cause of this poor quality. To address the scalability and sparsity problems of collaborative filtering, this paper proposed a personalized recommendation approach that joins user clustering and item clustering. Users are clustered based on their ratings on items, and each user cluster has a cluster center. Based on the similarity between the target user and the cluster centers, the nearest neighbors of the target user can be found and the prediction smoothed where necessary. The proposed approach then applies item clustering collaborative filtering to produce the recommendations. The recommendation method joining user clustering and item clustering collaborative filtering is more scalable and more accurate than the traditional one.

ACKNOWLEDGMENT

This project was supported by the Scientific Research Fund of the Zhejiang Provincial Education Department (Grant No. Y200806038).

REFERENCES

[1] Breese J, Heckerman D, Kadie C. Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI'98), 1998, pp. 43-52.
[2] Chong-Ben Huang, Song-Jie Gong. Employing rough set theory to alleviate the sparsity issue in recommender system. In: Proceedings of the Seventh International Conference on Machine Learning and Cybernetics (ICMLC2008), IEEE Press, 2008, pp. 1610-1614.
[3] Sarwar B, Karypis G, Konstan J, Riedl J. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International World Wide Web Conference, 2001, pp. 285-295.
[4] Manos Papagelis, Dimitris Plexousakis. Qualitative analysis of user-based and item-based prediction algorithms for recommendation agents. Engineering Applications of Artificial Intelligence 18 (2005) 781-789.
[5] Hyung Jun Ahn. A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem. Information Sciences 178 (2008) 37-51.
[6] SongJie Gong. The Collaborative Filtering Recommendation Based on Similar-Priority and Fuzzy Clustering. In: Proceedings of the 2008 Workshop on Power Electronics and Intelligent Transportation System (PEITS2008), IEEE Computer Society Press, 2008, pp. 248-251.
[7] SongJie Gong, GuangHua Cheng. Mining User Interest Change for Improving Collaborative Filtering. In: Second International Symposium on Intelligent Information Technology Application (IITA2008), IEEE Computer Society Press, 2008, Volume 3, pp. 24-27.
[8] Duen-Ren Liu, Ya-Yueh Shih. Hybrid approaches to product recommendation based on customer lifetime value and purchase preferences. The Journal of Systems and Software 77 (2005) 181-191.
[9] Yu Li, Liu Lu, Li Xuefeng. A hybrid collaborative filtering method for multiple-interests and multiple-content recommendation in E-Commerce. Expert Systems with Applications 28 (2005) 67-77.
[10] George Lekakos, George M. Giaglis. Improving the prediction accuracy of recommendation algorithms: Approaches anchored on human factors. Interacting with Computers 18 (2006) 410-431.
[11] L. H. Ungar and D. P. Foster. Clustering Methods for Collaborative Filtering. In: Proc. Workshop on Recommendation Systems at the 15th National Conf. on Artificial Intelligence. Menlo Park, CA: AAAI Press, 1998.
[12] L. H. Ungar and D. P. Foster. A Formal Statistical Approach to Collaborative Filtering. In: Proceedings of the Conference on Automated Learning and Discovery (CONALD), 1998.
[13] M. O'Connor and J. Herlocker. Clustering Items for Collaborative Filtering. In: Proceedings of the ACM SIGIR Workshop on Recommender Systems, Berkeley, CA, August 1999.
[14] A. Kohrs and B. Merialdo. Clustering for Collaborative Filtering Applications. In: Proceedings of CIMCA'99. IOS Press, 1999.
[15] Lee, W.S. Online clustering for collaborative filtering. School of Computing Technical Report TRA8/00, 2000.
[16] K. Honda, N. Sugiura, H. Ichihashi, S. Araki. Collaborative Filtering Using Principal Component Analysis and Fuzzy Clustering. Lecture Notes in Computer Science, 2001.
[17] S.H.S. Chee, J. Han, K. Wang. RecTree: An efficient collaborative filtering method. Lecture Notes in Computer Science, 2114, 2001.
[18] B. Sarwar, G. Karypis, J. Konstan and J. Riedl. Recommender systems for large-scale e-commerce: Scalable neighborhood formation using clustering. In: Proceedings of the Fifth International Conference on Computer and Information Technology, 2002.
[19] D. Bridge and J. Kelleher. Experiments in sparsity reduction: Using clustering in collaborative recommenders. In: Procs. of the Thirteenth Irish Conference on Artificial Intelligence and Cognitive Science, pp. 144-149. Springer, 2002.
[20] J. Kelleher and D. Bridge. RecTree centroid: An accurate, scalable collaborative recommender. In: Procs. of the Fourteenth Irish Conference on Artificial Intelligence and Cognitive Science, pp. 89-94, 2003.
[21] Xue, G., Lin, C., Yang, Q., et al. Scalable collaborative filtering using cluster-based smoothing. In: Proceedings of the ACM SIGIR Conference, 2005, pp. 114-121.
[22] George, T., Merugu, S. A scalable collaborative filtering framework based on co-clustering. In: Proceedings of the IEEE ICDM Conference, 2005.
[23] Rashid, A.M., Lam, S.K., Karypis, G., Riedl, J. ClustKNN: A Highly Scalable Hybrid Model- & Memory-Based CF Algorithm. WEBKDD 2006.
[24] Cantador, I., Castells, P. Multilayered Semantic Social Networks Modelling by Ontology-based User Profiles Clustering: Application to Collaborative Filtering. EKAW 2006, pp. 334-349.
[25] Panagiotis Symeonidis, Alexandros Nanopoulos, Apostolos Papadopoulos, Yannis Manolopoulos. Nearest-Biclusters Collaborative Filtering. WEBKDD 2006.
[26] Panagiotis Symeonidis, Alexandros Nanopoulos, Apostolos N. Papadopoulos, Yannis Manolopoulos. Nearest-biclusters collaborative filtering based on constant and coherent values. Information Retrieval, 2007. DOI 10.1007/s10791-007-9038-4.
[27] Gao Fengrong, Xing Chunxiao, Du Xiaoyong, Wang Shan. Personalized Service System Based on Hybrid Filtering for Digital Library. Tsinghua Science and Technology, Volume 12, Number 1, February 2007, pp. 1-8.
[28] Huang Qin-hua, Ouyang Wei-min. Fuzzy collaborative filtering with multiple agents. Journal of Shanghai University (English Edition), 2007, 11(3): 290-295.
[29] Songjie Gong, Chongben Huang. Employing Fuzzy Clustering to Alleviate the Sparsity Issue in Collaborative Filtering Recommendation Algorithms. In: Proceedings of the 2008 International Pre-Olympic Congress on Computer Science, World Academic Press, 2008, pp. 449-454.

SongJie Gong was born in Cixi, Zhejiang Province, P.R. China, on July 1, 1979. He received the B.Sc. degree from Tongji University and the M.Sc. degree in computer application from Shanghai Jiaotong University, P.R. China, in 2003 and 2006, respectively. He is currently a teacher at Zhejiang Business Technology Institute, Ningbo, P.R. China.
His research interests include data mining, information processing and intelligent computing. He has published more than 30 papers in journals and conferences.
