JOURNAL OF SOFTWARE, VOL. 5, NO. 7, JULY 2010 745

A Collaborative Filtering Recommendation Algorithm Based on User Clustering and Item Clustering
SongJie Gong
Zhejiang Business Technology Institute, Ningbo 315012, China
Email: [email protected]

Abstract—Personalized recommendation systems help people find items of interest and have become widely used with the development of electronic commerce. Many recommendation systems employ collaborative filtering, which has proved to be one of the most successful techniques in recommender systems in recent years. With the gradual increase of customers and products in electronic commerce systems, the time-consuming nearest-neighbor search for the target customer over the total customer space makes it impossible to meet the real-time requirement of a recommender system. At the same time, recommendation quality degrades as the number of records in the user database increases; sparsity of the source data set is the major cause of this poor quality. To solve the problems of scalability and sparsity in collaborative filtering, this paper proposes a personalized recommendation approach that joins user clustering and item clustering. Users are clustered based on their ratings on items, and each user cluster has a cluster center. Based on the similarity between the target user and the cluster centers, the nearest neighbors of the target user can be found and the predictions smoothed where necessary. Then, the proposed approach uses item clustering collaborative filtering to produce the recommendations. The recommendation approach joining user clustering and item clustering is more scalable and more accurate than the traditional one.

Index Terms—recommender systems, collaborative filtering, user clustering, item clustering, scalability, sparsity, mean absolute error

I. INTRODUCTION

With the development of the internet, intranets and electronic commerce systems, the amount of information arriving is more than we can handle. Personalized recommendation services therefore exist to provide us with useful data by employing information filtering technologies. Information filtering has two main methods: content based filtering and collaborative filtering. Collaborative filtering (CF) has proved to be one of the most effective, owing to its simplicity in both theory and implementation [1,2].

Many researchers have proposed various kinds of CF technologies to make quality recommendations. All of them make recommendations based on the same data structure, the user-item matrix, which holds users, items and their rating scores. There are two methods in CF: user based collaborative filtering and item based collaborative filtering [3,4]. User based CF assumes that a good way to find items interesting to a certain user is to find other users who have similar interests. So, at first, it tries to find the user's neighbors based on user similarities, and then it combines the neighbors' previously expressed rating scores by similarity-weighted averaging. Item based CF fundamentally follows the same scheme. It looks at the set of items the target user has already rated and computes how similar they are to the target item under recommendation. After that, it combines the user's previous preferences based on these item similarities.

The challenges of these two CF methods are as follows [5,6]:

Sparsity: Even very active users rate only a few of the total number of items available in a user-item ratings database. As the main collaborative filtering algorithms are based on similarity measures computed over the co-rated set of items, high levels of sparsity can lead to lower accuracy.

Scalability: Collaborative filtering algorithms seem to be efficient in filtering in items that are interesting to users. However, they require computations that are very expensive and grow non-linearly with the number of users and items in a database.

Cold-start: An item cannot be recommended unless it has been rated by a number of users. This problem applies to new items and is particularly detrimental to users with eclectic interests. Likewise, a new user has to rate a sufficient number of items before the CF algorithm is able to provide accurate recommendations.

To solve the problems of scalability and sparsity in collaborative filtering, in this paper we propose a personalized recommendation approach that joins user clustering and item clustering. Users are clustered based on their ratings on items, and each user cluster has a cluster center. Based on the similarity between the target user and the cluster centers, the nearest neighbors of the target user can be found and the predictions smoothed where necessary. Then, the proposed approach uses item clustering collaborative filtering to produce the recommendations. The recommendation approach joining user clustering and item clustering is more scalable and more accurate than the traditional one.
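As a small illustration of the sparsity challenge described in the Introduction, the sketch below measures how empty a toy user-item matrix is and how few co-rated items two users share. The matrix values and the helper names `sparsity` and `co_rated` are assumptions made for this example only; they do not come from the paper.

```python
# Toy user-item rating matrix; 0 marks "not rated".
# All values are made up for illustration.
ratings = [
    [5, 0, 0, 1],  # user 0
    [0, 3, 0, 0],  # user 1
    [4, 0, 0, 0],  # user 2
]


def sparsity(matrix):
    """Fraction of user-item cells that hold no rating."""
    total = sum(len(row) for row in matrix)
    rated = sum(1 for row in matrix for r in row if r != 0)
    return 1 - rated / total


def co_rated(u, v):
    """Indices of the items rated by both users (hypothetical helper)."""
    return [j for j, (a, b) in enumerate(zip(u, v)) if a and b]


print(round(sparsity(ratings), 3))       # share of empty cells
print(co_rated(ratings[0], ratings[1]))  # users 0 and 1 share no item
print(co_rated(ratings[0], ratings[2]))  # users 0 and 2 share one item
```

With only four ratings in twelve cells, most user pairs have few or no co-rated items, which is why similarity measures computed over the co-rated set become unreliable as sparsity grows.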

© 2010 ACADEMY PUBLISHER

doi:10.4304/jsw.5.7.745-752

II. TRADITIONAL COLLABORATIVE FILTERING ALGORITHM

A. User Item Rating Content

The task of the traditional collaborative filtering recommendation algorithm is to predict the target user's rating for a target item that the user has not yet rated, based on the users' ratings on observed items. The user-item rating database is central to this task. Each user is represented by item-rating pairs, which can be summarized in a user-item table containing the ratings Rij provided by the ith user for the jth item, as shown in Table I [7,8].

TABLE I
USER-ITEM RATINGS TABLE

User\Item   Item1   Item2   ……   Itemn
User1       R11     R12     ……   R1n
User2       R21     R22     ……   R2n
……          ……      ……      ……   ……
Userm       Rm1     Rm2     ……   Rmn

Here Rij denotes the score of item j rated by an active user i. If user i has not rated item j, then Rij = 0. The symbol m denotes the total number of users, and n denotes the total number of items.

B. Measuring the Rating Similarity

Collaborative filtering approaches have been popular among researchers and practitioners alike, as evidenced by the abundance of publications and actual implementation cases. Although there have been many algorithms, the basic common idea is to calculate the similarity among users using some measure and to recommend items based on that similarity. The collaborative filtering algorithms that use similarities among users are called user based collaborative filtering [9,10].

A similarity measure is a metric of relevance between two vectors. When the values of these vectors are associated with a user's model, the similarity is called user based similarity, whereas when they are associated with an item's model, it is called item based similarity. The similarity measure can be used to balance the significance of ratings in a prediction algorithm and therefore to improve accuracy.

Several similarity algorithms have been used in collaborative filtering recommendation algorithms [1,3]: Pearson correlation, cosine vector similarity, adjusted cosine vector similarity, mean-squared difference and Spearman correlation.

Pearson's correlation, given by the following formula, measures the linear correlation between two vectors of ratings.

$$\mathrm{sim}(i,j) = \frac{\sum_{c \in I_{ij}} (R_{i,c} - A_i)(R_{j,c} - A_j)}{\sqrt{\sum_{c \in I_{ij}} (R_{i,c} - A_i)^2}\,\sqrt{\sum_{c \in I_{ij}} (R_{j,c} - A_j)^2}} \tag{1}$$

where $R_{i,c}$ is the rating of item c by user i, $A_i$ is the average rating of user i over all the co-rated items, and $I_{ij}$ is the set of items rated by both user i and user j.

The cosine measure, given by the following formula, looks at the angle between two vectors of ratings, where a smaller angle is regarded as implying greater similarity.

$$\mathrm{sim}(i,j) = \frac{\sum_{k=1}^{n} R_{ik} R_{jk}}{\sqrt{\sum_{k=1}^{n} R_{ik}^2}\,\sqrt{\sum_{k=1}^{n} R_{jk}^2}} \tag{2}$$

where $R_{ik}$ is the rating of item k by user i and n is the number of items co-rated by both users. If a rating is null, it can be set to zero.

The adjusted cosine, given by the following formula, is used in some collaborative filtering methods for similarity among users where the difference in each user's use of the rating scale is taken into account.

$$\mathrm{sim}(i,j) = \frac{\sum_{c \in I_{ij}} (R_{i,c} - A_c)(R_{j,c} - A_c)}{\sqrt{\sum_{c \in I_{ij}} (R_{i,c} - A_c)^2}\,\sqrt{\sum_{c \in I_{ij}} (R_{j,c} - A_c)^2}} \tag{3}$$

where $R_{i,c}$ is the rating of item c by user i, $A_c$ is the average rating given to item c, and $I_{ij}$ is the set of items rated by both user i and user j.

The literature provides rich evidence of the successful performance of collaborative filtering methods. However, the methods also have shortcomings. Collaborative filtering methods are known to be vulnerable to data sparsity and to have cold-start problems. Data sparsity refers to the problem of insufficient rating data. Cold-start problems refer to the difficulty of recommending new items, or recommending to new users, for which not enough ratings are available.

C. Selecting Neighbors

In this step, the neighbors who will serve as recommenders are selected. Two techniques have been employed in collaborative filtering recommender systems:

Threshold-based selection, in which users whose similarity exceeds a certain threshold value are considered neighbors of the target user.

Top-n selection, in which the n best neighbors are selected, with n given in advance.

D. Producing Prediction

Once the neighbors of the target user are known, we can calculate the weighted average of the neighbors' ratings, weighted by their similarity to the target user. The predicted rating of the target user u for the target item t is given by equation (4).
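The steps of this section (similarity by equations (1) and (2), top-n neighbor selection, and the weighted prediction of equation (4)) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: ratings are stored as nested dictionaries, the sample data is invented, and neighbors with non-positive similarity are discarded so that the denominator of equation (4) stays positive. The function names are hypothetical.

```python
import math

# Ratings as {user: {item: rating}}; a missing key means "not rated".
# The data is made up purely for illustration.
R = {
    "u": {"a": 5, "b": 3, "c": 4},
    "v": {"a": 4, "b": 2, "c": 3, "t": 4},
    "w": {"a": 1, "b": 5, "c": 2, "t": 2},
}


def pearson(ri, rj):
    """Equation (1): Pearson correlation over the co-rated item set I_ij."""
    common = [c for c in ri if c in rj]
    if not common:
        return 0.0
    ai = sum(ri[c] for c in common) / len(common)
    aj = sum(rj[c] for c in common) / len(common)
    num = sum((ri[c] - ai) * (rj[c] - aj) for c in common)
    den = (math.sqrt(sum((ri[c] - ai) ** 2 for c in common))
           * math.sqrt(sum((rj[c] - aj) ** 2 for c in common)))
    return num / den if den else 0.0


def cosine(ri, rj):
    """Equation (2): cosine of the angle between rating vectors (unrated = 0)."""
    num = sum(ri[c] * rj[c] for c in ri if c in rj)
    den = (math.sqrt(sum(x * x for x in ri.values()))
           * math.sqrt(sum(x * x for x in rj.values())))
    return num / den if den else 0.0


def top_n_neighbors(ratings, u, t, n, sim=pearson):
    """Top-n selection among users who rated item t.

    Assumption of this sketch: only positively similar users are kept.
    """
    cands = [(sim(ratings[u], ratings[i]), i)
             for i in ratings if i != u and t in ratings[i]]
    cands = [(s, i) for s, i in cands if s > 0]
    return sorted(cands, reverse=True)[:n]


def predict(ratings, u, t, n=2):
    """Equation (4): user mean plus similarity-weighted neighbor deviations."""
    a_u = sum(ratings[u].values()) / len(ratings[u])
    neigh = top_n_neighbors(ratings, u, t, n)
    den = sum(s for s, _ in neigh)
    if not den:
        return a_u  # no usable neighbor: fall back to the user's mean
    num = sum((ratings[i][t] - sum(ratings[i].values()) / len(ratings[i])) * s
              for s, i in neigh)
    return a_u + num / den


print(predict(R, "u", "t"))
```

In this toy data, user v rates in the same pattern as user u while user w rates in the opposite pattern, so only v contributes to the prediction. Threshold-based selection could be obtained by replacing the `[:n]` slice with a filter on the similarity value.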


$$P_{ut} = A_u + \frac{\sum_{i=1}^{c} (R_{it} - A_i)\,\mathrm{sim}(u,i)}{\sum_{i=1}^{c} \mathrm{sim}(u,i)} \tag{4}$$

where $A_u$ is the average rating of the target user u over the items, $R_{it}$ is the rating of the neighbour user i for the target item t, $A_i$ is the average rating of the neighbour user i over the items, sim(u, i) is the similarity between the target user u and the neighbour user i, and c is the number of neighbours.

III. RELATED WORKS

L.H. Ungar et al. [11,12] present a formal statistical model of collaborative filtering and compare different algorithms for estimating the model parameters, including variations of K-means clustering and Gibbs sampling. This formal model is easily extended to handle clustering of objects with multiple attributes, and it is reported to perform better than the traditional approach.

M.O. Conner [13] reports on work in progress related to applying data clustering algorithms to ratings data in collaborative filtering. They use existing data partitioning and clustering algorithms to partition the set of items based on user rating data. Predictions are then computed independently within each partition. Ideally, partitioning will improve the quality of collaborative filtering predictions and increase the scalability of collaborative filtering systems. They report preliminary results suggesting that partitioning algorithms can greatly increase scalability, but they have mixed results on improving accuracy. However, partitioning based on ratings data does result in more accurate predictions than random partitioning, and the results are similar to those obtained when the data is partitioned based on a known content classification.

A. Kohrs et al. [14] identify two important situations with sparse ratings in collaborative filtering recommendation systems: bootstrapping a collaborative filtering system with few users, and providing recommendations for new users who have rated only a few items. Further, they present a novel algorithm for collaborative filtering based on hierarchical clustering, which tries to balance robustness and accuracy of predictions, and they show experimentally that it is especially efficient in dealing with these situations.

Lee, WS et al. [15] study two online clustering methods for collaborative filtering. In the first method, they assume that each user is equally likely to belong to one of m clusters of users and that the user's rating for each item is generated randomly according to a distribution that depends on the item and the cluster that the user belongs to. In the second method, they assume that each user is equally likely to belong to one of m clusters of users while each item is equally likely to belong to one of n clusters of items. The rating for a user-item pair is generated randomly according to a distribution that depends on the cluster that the user belongs to and the cluster that the item belongs to. They derive performance bounds for Bayesian sequential probability assignment for these two methods to elucidate the trade-offs involved in using them. Bayesian sequential probability assignment does not appear to be computationally tractable for these model classes, so they propose heuristic approximations to it and perform experiments on a movie rating data set. The proposed algorithms are fast and perform well, and the results of the experiments agree with the insights derived from the theoretical considerations.

Automated collaborative filtering is a popular technique for reducing information overload. K. Honda et al. [16] propose a new approach to collaborative filtering using local principal components. The method is based on a simultaneous approach to principal component analysis and fuzzy clustering with an incomplete data set that includes missing values. In the simultaneous approach, they extract local principal components by using a lower rank approximation of the data matrix. The missing values are predicted using this approximation of the data matrix. In a numerical experiment, they apply the proposed technique to a recommendation system for background designs of stationery for a word processor.

S.H.S. Chee et al. [17] develop an efficient collaborative filtering method, called RecTree, that addresses the scalability problem with a divide-and-conquer approach. The method first performs an efficient k-means-like clustering to group data and create neighborhoods of similar users, and then performs subsequent clustering on the smaller, partitioned databases. Since the progressive partitioning reduces the search space dramatically, the search for an advisory clique is faster than scanning the entire database of users. In addition, the partitions contain users that are more similar to each other than to those in other partitions. This characteristic allows RecTree to avoid the dilution of opinions from good advisors by a multitude of poor advisors, thus yielding a higher overall accuracy. Based on their experiments and performance study, RecTree outperforms the well-known user based collaborative filtering in both execution time and accuracy. In particular, RecTree's execution time scales as O(n log2(n)) with the dataset size, while the traditional user based collaborative filtering recommendation scales quadratically.

B. Sarwar et al. [18] address the performance issues by scaling up the neighborhood formation process through the use of clustering techniques.

The high cardinality and sparsity of a collaborative recommender's dataset is a challenge to its efficiency. D. Bridge et al. [19] generalize an existing clustering technique and apply it to a collaborative recommender's dataset to reduce cardinality and sparsity. They systematically test several variations, exploring the value of partitioning and grouping the data.


J. Kelleher et al. [20] present a collaborative recommender that uses a user-based model to predict user ratings for specified items. The model comprises summary rating information derived from a hierarchical clustering of the users. They compare their algorithm with several others and show that its accuracy is good and its coverage is maximal. They also show that the proposed algorithm is very efficient: predictions can be made in time that grows independently of the number of ratings and items and only logarithmically in the number of users.

Xue, G. et al. [21] present a novel approach that combines the advantages of memory based and model based collaborative filtering approaches by introducing a smoothing-based method. In their approach, clusters generated from the training data provide the basis for data smoothing and neighborhood selection. As a result, they provide higher accuracy as well as increased efficiency in recommendations. Their empirical studies on two datasets, EachMovie and MovieLens, show that the proposed approach consistently outperforms other traditional user based collaborative filtering algorithms.

George, T. et al. [22] consider a novel collaborative filtering approach based on a recently proposed weighted co-clustering algorithm that involves simultaneous clustering of users and items. They design incremental and parallel versions of the co-clustering algorithm and use them to build an efficient real-time collaborative filtering framework. Their empirical evaluation of the proposed approach on large movie and book rating datasets demonstrates that it is possible to obtain accuracy comparable to that of the correlation and matrix factorization based approaches at a much lower computational cost.

Rashid, A.M. et al. [23] propose ClustKnn, a simple and intuitive algorithm that is well suited for large data sets. The proposed method first compresses data tremendously by building a straightforward but efficient clustering model. Recommendations are then generated quickly by using a simple nearest-neighbor-based approach. They demonstrate the feasibility of ClustKnn both analytically and empirically. They also show, by comparing with a number of other popular collaborative filtering algorithms, that apart from being highly scalable and intuitive, ClustKnn provides very good recommender accuracy as well.

Cantador, I. et al. [24] propose a multilayered semantic social network model that offers different views of the common interests underlying a community of people. The applicability of the proposed model to a collaborative filtering system is empirically studied. Starting from a number of ontology-based user profiles and taking into account their common preferences, they automatically cluster the domain concept space. With the obtained semantic clusters, similarities among individuals are identified at multiple semantic preference layers, and emergent, layered social networks are defined, suitable for use in collaborative environments and content recommenders.

Panagiotis Symeonidis et al. [25, 26] use bi-clustering to disclose the duality between users and items by grouping them in both dimensions simultaneously. They propose a novel nearest bi-clusters collaborative filtering algorithm, which uses a new similarity measure that achieves partial matching of users' preferences. They apply nearest bi-clusters in combination with two different types of bi-clustering algorithms, Bimax and xMotif, for constant and coherent biclustering, respectively. Extensive performance evaluation results on three real-life data sets are provided, which show that the proposed method substantially improves the performance of the CF process.

IV. RATING SMOOTHING BASED ON USER CLUSTERING

A. User Clustering

User clustering techniques work by identifying groups of users who appear to have similar ratings. Once the clusters are created, predictions for a target user can be made by averaging the opinions of the other users in that cluster. Some clustering techniques represent each user with partial participation in several clusters. The prediction is then an average across the clusters, weighted by degree of participation. Once the user clustering is complete, performance can be very good, since the size of the group that must be analyzed is much smaller [18].

The idea is to divide the users of a collaborative filtering system using a user clustering algorithm and to use the resulting partitions as neighborhoods, as Figure 1 shows. The clustering algorithm may generate fixed-size partitions, or, based on some similarity threshold, it may generate a requested number of partitions of varying size.

Figure 1. Collaborative filtering based on user clustering.

In Figure 1, Rij is the rating of user i for item j, aij is the average rating of user center i for item j, m is the number of all users, n is the number of all items, and c is the number of user centers.
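The user clustering of Section IV and the rating smoothing of equation (5) can be sketched as follows. This is an illustrative k-means-style loop under several assumptions that the paper does not fix: cosine similarity is used to compare a user with a center, the initial centers are the k users with the most ratings, a center's value for an item is the mean over the cluster members who rated that item, and the data is invented. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two rating rows (0 = not rated)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def cluster_users(ratings, k, max_iter=100):
    """k-means-style clustering of user rating rows."""
    # Assumed seeding: the k users who rated the most items.
    order = sorted(range(len(ratings)),
                   key=lambda u: -sum(1 for r in ratings[u] if r))
    centers = [[float(x) for x in ratings[u]] for u in order[:k]]
    assign = None
    for _ in range(max_iter):
        # Assign every user to the most similar center.
        new_assign = [max(range(k), key=lambda j: cosine(row, centers[j]))
                      for row in ratings]
        if new_assign == assign:  # memberships are stable: stop
            break
        assign = new_assign
        # Recompute each center as the per-item mean of observed ratings.
        for j in range(k):
            members = [ratings[u] for u in range(len(ratings)) if assign[u] == j]
            if not members:
                continue
            for c in range(len(centers[j])):
                rated = [m[c] for m in members if m[c]]
                centers[j][c] = sum(rated) / len(rated) if rated else 0.0
    return assign, centers


def smooth(ratings, assign, centers):
    """Equation (5): keep observed ratings, fill gaps from the user's center."""
    return [[r if r else centers[assign[u]][j] for j, r in enumerate(row)]
            for u, row in enumerate(ratings)]


# Made-up data: users 0 and 1 like items 0 and 1, users 2 and 3 like items 2 and 3.
ratings = [
    [5, 4, 1, 1],
    [4, 5, 0, 1],
    [1, 1, 5, 4],
    [0, 1, 4, 5],
]
assign, centers = cluster_users(ratings, k=2)
dense = smooth(ratings, assign, centers)
print(assign)
print(dense)
```

On this toy input the two like-minded pairs end up in separate clusters, and the two missing ratings are filled with the low values their cluster centers hold for those items, which yields the dense matrix that the item-based stage consumes.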


B. Smoothing

In this paper, we use the k-means clustering algorithm to cluster the users into groups, each with a clustering center. The specific algorithm is as follows:

Input: clustering number k, user-item rating matrix
Output: smoothing rating matrix
Begin
  Select user set U={U1, U2, …, Um};
  Select item set I={I1, I2, …, In};
  Choose the top k rating users as the clustering centers CU={CU1, CU2, …, CUk};
  Set the k clusters to empty as c={c1, c2, …, ck};
  do
    for each user Ui∈U
      for each cluster center CUj∈CU
        calculate the sim(Ui, CUj);
      end for
      sim(Ui, CUm)=max{sim(Ui, CU1), sim(Ui, CU2), …, sim(Ui, CUk)};
      cm=cm∪Ui
    end for
    for each cluster ci∈c
      for each user Uj∈ci
        CUi=average(ci, Uj);
      end for
    end for
  while (the cluster centers CU change)
End

C. New Ratings

One of the challenges of collaborative filtering is the data sparsity problem. To predict the vacant values in the user-item rating dataset, we make explicit use of the user clusters as prediction mechanisms.

Based on the clustering results, we apply the following prediction strategy to the vacant rating data:

$$R_{ij} = \begin{cases} R_{ij} & \text{if user } i \text{ rated item } j \\ c_j & \text{otherwise} \end{cases} \tag{5}$$

where $c_j$ denotes the predicted value of user i's rating for item j, as calculated by the algorithm above.

V. USING THE ITEM CLUSTERING METHOD TO PRODUCE RECOMMENDATIONS

By calculating the users' vacant ratings with the user clustering algorithm, we obtain dense user ratings. Then, to generate the prediction of a user's rating, we use the item clustering based collaborative filtering algorithm.

A. The dense user-item matrix

After applying the user clustering algorithm, we have dense ratings of the users for the items. So the original sparse user-item rating matrix has become a dense user-item matrix.

B. Item Clustering

Item clustering techniques work by identifying groups of items that appear to have similar ratings. Once the clusters are created, predictions for a target item can be made by averaging the opinions on the other items in that cluster. Some clustering techniques represent each item with partial participation in several clusters. The prediction is then an average across the clusters, weighted by degree of participation. Once the item clustering is complete, performance can be very good, since the size of the group that must be analyzed is much smaller.

The idea is to divide the items of a collaborative filtering system using an item clustering algorithm and to use the resulting partitions as neighborhoods, as Figure 2 shows. The clustering algorithm may generate fixed-size partitions, or, based on some similarity threshold, it may generate a requested number of partitions of varying size.

Figure 2. Collaborative filtering based on item clustering.

In Figure 2, Rij is the rating of user i for item j, aij is the average rating of user i for item center j, m is the number of all users, n is the number of all items, and c is the number of item centers.

C. Algorithm

There are many algorithms that can be used to create an item clustering. In this paper, we choose the k-means algorithm as the basic clustering algorithm. The number k is an input to the algorithm that specifies the desired number of clusters. First, the algorithm takes the first k items as the centers of k unique clusters. Each of the remaining items is then assigned to the closest center. In the following passes, the cluster centers are re-computed based on the clusters formed in the previous pass, and the cluster membership is re-evaluated.

The specific algorithm is as follows:

Input: clustering number k, user-item rating matrix
Output: item-center matrix
Begin
  Select user set U={U1, U2, …, Um};
  Select item set I={I1, I2, …, In};
  Choose the top k rating items as the clustering centers CI={CI1, CI2, …, CIk};
  Set the k clusters to empty as c={c1, c2, …, ck};
  do
    for each item Ii∈I
      for each cluster center CIj∈CI
        calculate the sim(Ii, CIj);
      end for
      sim(Ii, CIx)=max{sim(Ii, CI1), sim(Ii, CI2), …, sim(Ii, CIk)};
      cx=cx∪Ii
    end for
    for each cluster ci∈c
      for each item Ij∈ci
        CIi=average(ci, Ij);
      end for
    end for
  while (the cluster centers CI change)
End

We use Pearson's correlation, given by the following formula, to measure the linear correlation between the rating vectors of the target item t and the remaining item r.

$$\mathrm{sim}(t,r) = \frac{\sum_{i=1}^{m} (R_{it} - A_t)(R_{ir} - A_r)}{\sqrt{\sum_{i=1}^{m} (R_{it} - A_t)^2}\,\sqrt{\sum_{i=1}^{m} (R_{ir} - A_r)^2}} \tag{6}$$

where $R_{it}$ is the rating of the target item t by user i, $R_{ir}$ is the rating of the remaining item r by user i, $A_t$ is the average rating of the target item t over all the co-rating users, $A_r$ is the average rating of the remaining item r over all the co-rating users, and m is the number of users who rated both item t and item r.

D. Selecting Clustering Centers

An important step of the item based collaborative filtering algorithm is to search for the neighbors of the target item. Traditional memory based collaborative filtering searches the whole ratings database, and it suffers from poor scalability as more and more users and items are added to the database [21].

When we cluster the items, we obtain the item centers. Each center is represented as an average rating over all items in the cluster. So we can choose the neighbors of the target item from some of the item clusters. We use Pearson's correlation to measure the similarity between the target item and the item centers.

After calculating the similarity between the target item and the item centers, we take the items in the most similar centers as the candidates.

E. Selecting Neighbors

After we select the clustering centers nearest to the target item, we also need to calculate the similarity between the target item and the items in the selected clusters.

We select the top K most similar items based on the cosine measure, given by the following formula, which looks at the angle between the rating vectors of the target item t and the remaining item r.

$$\mathrm{sim}(t,r) = \frac{\sum_{i=1}^{m} R_{it} R_{ir}}{\sqrt{\sum_{i=1}^{m} R_{it}^2}\,\sqrt{\sum_{i=1}^{m} R_{ir}^2}} \tag{7}$$

where $R_{it}$ is the rating of the target item t by user i, $R_{ir}$ is the rating of the remaining item r by user i, and m is the number of users who rated both item t and item r.

F. Producing Recommendations

Once the neighbors of the target item are known, we can calculate the weighted average of the neighbors' ratings, weighted by their similarity to the target item.

The rating of the target user u for the target item t is as follows:

$$P_{ut} = \frac{\sum_{i=1}^{c} R_{ui} \times \mathrm{sim}(t,i)}{\sum_{i=1}^{c} \mathrm{sim}(t,i)} \tag{8}$$

where $R_{ui}$ is the rating of the target user u for the neighbour item i, sim(t, i) is the similarity between the target item t and the neighbour item i, and c is the number of neighbour items.

VI. EXPERIMENT RESULTS

In this section, we describe the dataset, metrics and methodology for the comparison between the traditional and the proposed collaborative filtering algorithms, and present the results of our experiments.

A. Data Set

We use the MovieLens collaborative filtering data set to evaluate the performance of the proposed algorithm. The MovieLens data sets were collected by the GroupLens Research Project at the University of Minnesota. MovieLens is a web-based research recommender system that debuted in Fall 1997. Each week hundreds of users visit MovieLens to rate and receive recommendations for movies [3,27]. The site now has over 45000 users who have expressed opinions on 6600 different movies. We randomly selected enough users to obtain 100,000 ratings from 1000 users on 1680 movies, with every user having at least 20 ratings; simple demographic information for the users is included. The ratings are on a numeric five-point scale, with 1 and 2 representing negative ratings, 4 and 5 representing positive ratings, and 3 indicating ambivalence.

B. Performance Measurement

Several metrics have been proposed for assessing the accuracy of collaborative filtering methods. They are divided into two main categories: statistical accuracy metrics and decision-support accuracy metrics. In this paper, we use the statistical accuracy metrics [28,29].

Statistical accuracy metrics evaluate the accuracy of a prediction algorithm by comparing the numerical
deviation of the predicted ratings from the respective actual user ratings. Frequently used metrics of this kind are mean absolute error (MAE), root mean squared error (RMSE) and the correlation between ratings and predictions. All of the above metrics were computed on the result data and generally provided the same conclusions. Mean absolute error is employed as the statistical accuracy measure.

Formally, if n is the number of actual ratings in an item set, then MAE is defined as the average absolute difference between the n pairs. Assume that p1, p2, p3, ..., pn are the predictions of the users' ratings, and that the corresponding real ratings are q1, q2, q3, ..., qn. MAE is then defined as follows:

$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} |p_i - q_i|}{n} \tag{9}$$

The lower the MAE, the more accurate the predictions are, allowing for better recommendations to be formulated. MAE has been computed for different prediction algorithms and for different levels of sparsity.

C. Sensitivity of different training-test ratio x

To determine the sensitivity to the density of the dataset, we carried out an experiment in which we varied the value of x from 0.2 to 0.8 in increments of 0.1. For each of these training-test ratio values we ran our experiments using our proposed algorithm and the traditional CF algorithm. The results are shown in Figure 3. We observe that the quality of prediction increases as we increase x, and that our proposed CF is better than the traditional one.

Figure 3. MAE of the different prediction algorithms with respect to train-test ratio x. (Plot: MAE versus train-test ratio x from 0.2 to 0.8; series: Traditional CF, Proposed CF.)

D. Comparing with the traditional CF

We compare the proposed method, which combines user clustering and item clustering collaborative filtering, with the traditional collaborative filtering. The size of the neighborhood has a significant effect on the prediction quality. The conclusion from Figure 4, which shows the mean absolute errors of the proposed algorithm and the traditional collaborative filtering for different numbers of neighbors, is that our proposed algorithm is better.

Figure 4. Comparing the proposed CF algorithm with the traditional CF algorithm. (Plot: MAE versus number of neighbours from 20 to 50; series: Traditional CF, Proposed CF.)

VII. CONCLUSIONS

Recommender systems help people find items of interest and have become widely used in our lives with the development of electronic commerce. Many recommendation systems employ collaborative filtering, which has proved to be one of the most successful techniques in recommender systems in recent years. With the gradual increase of customers and products in electronic commerce systems, the time-consuming nearest-neighbor search for the target customer over the total customer space makes it impossible to meet the real-time requirement of a recommender system. At the same time, recommendation quality degrades as the number of records in the user database increases; sparsity of the source data set is the major cause of this poor quality. To solve the problems of scalability and sparsity in collaborative filtering, this paper proposed a personalized recommendation approach that joins user clustering and item clustering. Users are clustered based on their ratings on items, and each user cluster has a cluster center. Based on the similarity between the target user and the cluster centers, the nearest neighbors of the target user can be found and the predictions smoothed where necessary. Then, the proposed approach uses item clustering collaborative filtering to produce the recommendations. The recommendation approach joining user clustering and item clustering is more scalable and more accurate than the traditional one.
quality. In our experiments, we vary the number of
neighbors and compute the MAE. The obvious ACKNOWLEDGMENT


A Project Supported by Scientific Research Fund of
Zhejiang Provincial Education Department (Grant No.
Y200806038).

REFERENCES
[1] Breese J, Heckerman D, Kadie C. Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI’98). 1998. 43~52.
[2] Chong-Ben Huang, Song-Jie Gong, Employing rough set theory to alleviate the sparsity issue in recommender system, In: Proceeding of the Seventh International Conference on Machine Learning and Cybernetics (ICMLC2008), IEEE Press, 2008, pp.1610-1614.
[3] Sarwar B, Karypis G, Konstan J, Riedl J. Item-Based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International World Wide Web Conference. 2001. 285-295.
[4] Manos Papagelis, Dimitris Plexousakis, Qualitative analysis of user-based and item-based prediction algorithms for recommendation agents, Engineering Application of Artificial Intelligence 18 (2005) 781-789.
[5] Hyung Jun Ahn, A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem, Information Sciences 178 (2008) 37-51.
[6] SongJie Gong, The Collaborative Filtering Recommendation Based on Similar-Priority and Fuzzy Clustering, In: Proceeding of 2008 Workshop on Power Electronics and Intelligent Transportation System (PEITS2008), IEEE Computer Society Press, 2008, pp. 248-251.
[7] SongJie Gong, GuangHua Cheng, Mining User Interest Change for Improving Collaborative Filtering, In: Second International Symposium on Intelligent Information Technology Application (IITA2008), IEEE Computer Society Press, 2008, Volume 3, pp.24-27.
[8] Duen-Ren Liu, Ya-Yueh Shih, Hybrid approaches to product recommendation based on customer lifetime value and purchase preferences, The Journal of Systems and Software 77 (2005) 181–191.
[9] Yu Li, Liu Lu, Li Xuefeng, A hybrid collaborative filtering method for multiple-interests and multiple-content recommendation in E-Commerce, Expert Systems with Applications 28 (2005) 67–77.
[10] George Lekakos, George M. Giaglis, Improving the prediction accuracy of recommendation algorithms: Approaches anchored on human factors, Interacting with Computers 18 (2006) 410–431.
[11] L. H. Ungar and D. P. Foster. Clustering Methods for Collaborative Filtering. In Proc. Workshop on Recommendation Systems at the 15th National Conf. on Artificial Intelligence. Menlo Park, CA: AAAI Press. 1998
[12] L. H. Ungar and D. P. Foster. A Formal Statistical Approach to Collaborative Filtering. Proceedings of Conference on Automated Learning and Discovery (CONALD), 1998.
[13] M. O. Conner and J. Herlocker. Clustering Items for Collaborative Filtering. In Proceedings of the ACM SIGIR Workshop on Recommender Systems, Berkeley, CA, August 1999.
[14] A. Kohrs and B. Merialdo. Clustering for Collaborative Filtering Applications. In Proceedings of CIMCA'99. IOS Press, 1999.
[15] Lee, WS. Online clustering for collaborative filtering. School of Computing Technical Report TRA8/00. 2000.
[16] K Honda, N Sugiura, H Ichihashi, S Araki. Collaborative Filtering Using Principal Component Analysis and Fuzzy Clustering, Lecture Notes in Computer Science, 2001
[17] S.H.S. Chee, J Han, K. Wang. Rectree: An efficient collaborative filtering method. Lecture Notes in Computer Science, 2114, 2001
[18] B. Sarwar, G. Karypis, J. Konstan and J. Riedl, Recommender systems for large-scale e-commerce: Scalable neighborhood formation using clustering, Proceedings of the Fifth International Conference on Computer and Information Technology, 2002
[19] D. Bridge and J. Kelleher, Experiments in sparsity reduction: Using clustering in collaborative recommenders, in Procs. of the Thirteenth Irish Conference on Artificial Intelligence and Cognitive Science, pp. 144–149. Springer, 2002.
[20] J. Kelleher and D. Bridge. Rectree centroid: An accurate, scalable collaborative recommender. In Procs. of the Fourteenth Irish Conference on Artificial Intelligence and Cognitive Science, pages 89–94, 2003.
[21] Xue, G., Lin, C., & Yang, Q., et al. Scalable collaborative filtering using cluster-based smoothing. In Proceedings of the ACM SIGIR Conference 2005 pp.114–121.
[22] George, T., & Merugu, S. A scalable collaborative filtering framework based on co-clustering. In Proceedings of the IEEE ICDM Conference. 2005
[23] Rashid, A.M.; Lam, S.K.; Karypis, G.; Riedl, J.; ClustKNN: A Highly Scalable Hybrid Model- & Memory-Based CF Algorithm. WEBKDD 2006.
[24] Cantador, I., Castells, P. Multilayered Semantic Social Networks Modelling by Ontology-based User Profiles Clustering: Application to Collaborative Filtering. EKAW 2006, pp. 334-349.
[25] Panagiotis Symeonidis, Alexandros Nanopoulos, Apostolos Papadopoulos, Yannis Manolopoulos, Nearest-Biclusters Collaborative Filtering, WEBKDD 2006
[26] Panagiotis Symeonidis, Alexandros Nanopoulos, Apostolos N. Papadopoulos, Yannis Manolopoulos. Nearest-biclusters collaborative filtering based on constant and coherent values. Inf Retrieval 2007 DOI 10.1007/s10791-007-9038-4.
[27] Gao Fengrong, Xing Chunxiao, Du Xiaoyong, Wang Shan, Personalized Service System Based on Hybrid Filtering for Digital Library, Tsinghua Science and Technology, Volume 12, Number 1, February 2007, 1-8.
[28] Huang qin-hua, Ouyang wei-min, Fuzzy collaborative filtering with multiple agents, Journal of Shanghai University (English Edition), 2007, 11(3):290-295.
[29] Songjie Gong, Chongben Huang, Employing Fuzzy Clustering to Alleviate the Sparsity Issue in Collaborative Filtering Recommendation Algorithms, In: Proceeding of 2008 International Pre-Olympic Congress on Computer Science, World Academic Press, 2008, pp.449-454.

SongJie Gong was born in Cixi, Zhejiang Province,
P.R. China, on July 1, 1979. He received the B.Sc. degree from
Tongji University and the M.Sc. degree in computer application
from Shanghai Jiaotong University, P.R. China, in 2003 and
2006 respectively. He is currently a teacher at Zhejiang
Business Technology Institute, Ningbo, P.R. China.
His research interests include data mining, information
processing and intelligent computing. He has published more
than 30 papers in journals and conferences.
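
As an illustration of the MAE metric of Equation (9) used in the evaluation, the following minimal Python sketch computes MAE, with RMSE added for comparison, over paired lists of predicted and actual ratings. The function names and the sample ratings are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
import math
from typing import Sequence


def _check_pairs(predicted: Sequence[float], actual: Sequence[float]) -> int:
    """Return n, the number of rating pairs, after validating the inputs."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one rating pair is required")
    return len(predicted)


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """MAE as in Eq. (9): the average of |p_i - q_i| over the n pairs."""
    n = _check_pairs(predicted, actual)
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / n


def root_mean_squared_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """RMSE: square root of the average squared deviation over the n pairs."""
    n = _check_pairs(predicted, actual)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(predicted, actual)) / n)


# Hypothetical ratings on a 1-5 scale (illustrative values, not from the paper).
predicted = [4.0, 3.5, 2.0, 5.0]
actual = [5.0, 3.0, 2.0, 4.0]

mae = mean_absolute_error(predicted, actual)
rmse = root_mean_squared_error(predicted, actual)
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```

Because RMSE squares each deviation before averaging, it penalizes large individual errors more heavily than MAE does, so the two values can differ on the same data.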
