Movie Recommendation
Abstract
The purpose of this research is to develop a movie recommender system using the collaborative filtering technique and K-means. Collaborative filtering is the most successful algorithm in the field of recommender systems. A recommender system is an intelligent system that helps a user discover interesting items. This paper treats the m users (m is the number of users) as points in an n-dimensional space (n is the number of items), and we present a new approach based on user clustering to produce recommendations for the active user. We used the k-means clustering algorithm to categorize users based on their interests. We evaluated both traditional collaborative filtering and our approach and compared them. Our results show that the proposed algorithm is more accurate than the traditional one and is also less time-consuming than the previous existing methods.
Keywords
Recommendation system, Collaborative filtering, K-means, Clustering, Data mining.
like which she hasn't yet seen. There are other methods for performing recommendation, such as content-based filtering (CBF), which finds items similar to the items liked by a user by using textual similarity in their metadata. Popular techniques for finding similarity fall into three types: adjusted cosine-based similarity, cosine-based similarity and correlation-based similarity.

CF depends on the correlation rate (Co-Rate) to find, among all users in the database, the users nearest to the target user. This causes significant problems, such as the scalability problem. It is a challenge for designers because, as the number of active users grows, computing user similarity takes a long time [15]. For a larger system, a solution is to segment users into groups before applying collaborative filtering, which reduces the time needed to compute user similarity and makes the results faster. The correlation similarity between two users is computed as:

$$Rsim(t,c)=\frac{\sum_{i}(R_{i,t}-\bar{R}_t)(R_{i,c}-\bar{R}_c)}{\sqrt{\sum_{i}(R_{i,t}-\bar{R}_t)^2}\,\sqrt{\sum_{i}(R_{i,c}-\bar{R}_c)^2}} \tag{2}$$

Where:
t is the target user and c is the user being compared;
Rsim(t,c) is the correlation similarity between users t and c;
R_{i,t} and R_{i,c} are the ratings given to item i by users t and c, where i ranges over the items rated by both users;
\bar{R}_t and \bar{R}_c are the mean ratings of users t and c.

Collaborative filtering is important for producing a forecast of satisfaction, that is, for predicting a user's preference for one piece of information [14]. These values are calculated from the similarity between pairs of items. The next step is to make predictions for the target user, which can be done with two methods, weighted sum and regression; only the weighted-sum prediction is described here, because it is the method used in this research. The weighted-sum prediction estimates the target user's satisfaction with a piece of information by weighting each neighbor's rating by that neighbor's similarity to the target user:

$$P_{t,i}=\frac{\sum_{c\in K}Rsim(t,c)\,R_{i,c}}{\sum_{c\in K}\lvert Rsim(t,c)\rvert}$$

where P_{t,i} is the satisfaction that the target user t will have with the piece of data i, and K is the set of neighbors selected for t.

3. Implementation
The steps for implementing the recommendation system with k-means and collaborative filtering are as follows.

3.1 Study the problems and needs of the system
We studied the k-means algorithm and collaborative filtering techniques for use in a web application written in PHP, and also studied the data needed to recommend movies that are close to the users' needs. As in most applications, including legacy systems that provide educational information using historical data, users rate movies on a scale of one to five (5 levels), as shown in Table 1.

Table 1 Movie liking rating scale
Liking          Rating scale
Very like       5
Like            4
Normal          3
Not like        2
Do not like     1

3.2 Data collection
The information collected relates to and meets the objectives set for the system. This part describes the basic data used to develop the system for grouping users with k-means, and the data used to create the system's database. The data was taken from the MovieLens project website, http://grouplens.org/datasets/. This data is clustered into ten groups, as shown in Table 2.

Table 2 Groups obtained by dividing the users with k-means
Group   Members of group
1       36
2       70
3       56
4       52
5       61
6       39
7       41
8       52
9       72
10      20
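To make the scalability argument concrete, the sketch below counts the user-user similarity computations needed to fill the similarity matrix with and without clustering, using the group sizes from Table 2. It assumes that, once users are clustered, similarities are computed only between members of the same group, and it ignores the cost of running k-means itself; the total of 499 users is simply the sum of the Table 2 sizes.

```python
# Group sizes from Table 2 (group number -> number of members).
GROUP_SIZES = {1: 36, 2: 70, 3: 56, 4: 52, 5: 61, 6: 39, 7: 41, 8: 52, 9: 72, 10: 20}


def pair_count(n: int) -> int:
    """Number of distinct user pairs among n users: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def similarity_workload(group_sizes: dict[int, int]) -> tuple[int, int]:
    """Return (pairs without clustering, pairs when only same-group users are compared)."""
    total_users = sum(group_sizes.values())
    without_clustering = pair_count(total_users)
    # Assumption: after clustering, similarities are only computed inside each group.
    with_clustering = sum(pair_count(size) for size in group_sizes.values())
    return without_clustering, with_clustering


if __name__ == "__main__":
    full, clustered = similarity_workload(GROUP_SIZES)
    print(f"all users: {full} pairs; within groups: {clustered} pairs")
    print(f"reduction factor: {full / clustered:.1f}x")
```

Under these assumptions the within-group workload is 13,374 pairs instead of 124,251, roughly a ninefold reduction.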
International Journal of Advanced Computer Research, Vol 7(29)
3.3 Systems analysis
The system can be divided into three parts: the user, the movie manager and the system administrator. The processing model is shown in Figure 2.

Figure 2 Processing model (stages shown: Start, Rating prediction, User profile, Ranking, Finish)

Figure 2 shows the workflow after the active user rates a movie they have watched and requests recommendations for other movies. First, the system uses k-means to assign the active user to a cluster. Next, it compares the active user's profile with those of the other users in the same cluster, using the Pearson correlation coefficient as the comparison tool, to find the users most similar to the active user. It then selects the 30 users most similar to the current user and uses their profiles to predict, with the weighted-sum prediction, the current user's ratings for movies they have not yet seen. Finally, the movies are ranked by predicted rating, and up to 10 of them are recommended to the current user.

3.4 Development system
Phongsavanh Phorasim et al.
Figure 3 explains the four phases of the system. The first phase groups the users, using the movie rating information they provide as the grouping variables. Phases 2 to 4 carry out collaborative filtering.

Collaborative filtering computes the ratings between users within the cluster and then compares the similarity values in the user-user similarity matrix. It selects the most similar users, retrieves the similarity values for these neighbors, calculates the predicted ratings, and stores them so that the highest predicted ratings can be found and recommended to the relevant user.

3.5 Processing k-means and collaborative filtering
The system uses k-means to find a group for each user from the distances between users and groups of users. The k-means clustering measures the distance of each data point from the center of each of the 10 groups using the Euclidean distance, and the results are stored in the database.

Table 3 Movie ratings given by a user
User_Id   Movie_Id   Rating
501       124        5
501       133        2
501       140        4

Table 3 shows the movie ratings given by the user. The user was compared with each group, as shown in Table 4.

Table 4 Centroid values of the ten groups for three movies
Movie_Id   K1  K2  K3  K4  K5  K6  K7  K8  K9  K10
124        3   3   3   2   1   4   5   3   3   4
133        2   3   3   4   3   3   3   2   2   3
140        3   3   3   4   2   3   4   3   2   3

The system performs k-means clustering by measuring the distance from the user's point to the centroid of each of the 10 groups. The Euclidean distance between user 501 and group 1 (K1) is:

$$D=\sqrt{(5-3)^2+(2-2)^2+(4-3)^2}=\sqrt{5}\approx 2.23$$

Similarly, the distances to groups 2 to 10 are calculated, giving the full results below:
Distance to group 1 (K1) = 2.23
Distance to group 2 (K2) = 2.44
Distance to group 3 (K3) = 2.44
Distance to group 4 (K4) = 3.60
Distance to group 5 (K5) = 4.58
Distance to group 6 (K6) = 1.73
Distance to group 7 (K7) = 1
Distance to group 8 (K8) = 2.23
Distance to group 9 (K9) = 2.82
Distance to group 10 (K10) = 1.73

These results order the groups from the nearest to the farthest. Group 7 is the nearest (distance 1), so the system assigns user 501 to group 7.
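The nearest-centroid step can be reproduced with a short sketch. It uses the ratings of user 501 from Table 3 and the centroids from Table 4, and assigns the user to the group with the smallest Euclidean distance. The helper `truncate2` is a hypothetical formatting choice: it cuts (rather than rounds) to two decimals, which is how the distances above appear to be reported.

```python
import math

MOVIES = (124, 133, 140)
# Table 3: ratings given by user 501, in the order of MOVIES.
USER_501 = (5, 2, 4)
# Table 4: one row per movie, one column per group K1..K10.
CENTROID_ROWS = {
    124: (3, 3, 3, 2, 1, 4, 5, 3, 3, 4),
    133: (2, 3, 3, 4, 3, 3, 3, 2, 2, 3),
    140: (3, 3, 3, 4, 2, 3, 4, 3, 2, 3),
}


def centroids_by_group(rows):
    """Transpose the per-movie rows into one centroid vector per group (1-based)."""
    n_groups = len(next(iter(rows.values())))
    return {k + 1: tuple(rows[m][k] for m in MOVIES) for k in range(n_groups)}


def euclidean(a, b):
    """Euclidean distance between two equal-length rating vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def truncate2(x):
    """Cut (not round) to two decimals for display."""
    return math.floor(x * 100) / 100


def nearest_group(user, centroids):
    """Return (closest group, {group: distance}) for one user's rating vector."""
    distances = {g: euclidean(user, c) for g, c in centroids.items()}
    best = min(distances, key=distances.get)
    return best, distances


if __name__ == "__main__":
    best, distances = nearest_group(USER_501, centroids_by_group(CENTROID_ROWS))
    for group, d in distances.items():
        print(f"group {group}: {truncate2(d):.2f}")
    print(f"user 501 -> group {best}")
```

The same function applies unchanged when the vectors cover every movie rather than three; only `MOVIES`, the user vector and the centroid rows grow.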
The similarity is calculated with the Pearson correlation, and the comparisons give results like these:
The similarity between User_1 and User_3 = -0.5
The similarity between User_1 and User_4 = 0.5
The similarity between User_1 and User_5 = 0

Next, the K users whose ratings are most similar to the target user are used to predict the satisfaction with the weighted-sum equation. Here, the neighborhood size is 2 (K = 2). Therefore, only User_4 and User_5, the two most similar users, are considered in the calculation:

K = {User_4, User_5}

$$P_{User_1,Casino}=\frac{0.5\times R_{Casino,User_4}+0\times R_{Casino,User_5}}{\lvert 0.5\rvert+\lvert 0\rvert}=R_{Casino,User_4}=4$$

The predicted rating of User_1 for the movie Casino is therefore equal to User_4's rating of that movie.
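The weighted-sum step can be checked with a minimal sketch. The similarities are the three values reported above, User_4's rating of Casino (4) follows from the result, and User_3's rating of 1 comes from Table 6. User_5's rating of Casino is not given, so the sketch tries every value on the 1 to 5 scale to show that a neighbor with similarity 0 does not affect the prediction. Choosing the K neighbors by highest similarity and dividing by the sum of absolute similarities are assumptions about the exact procedure.

```python
def top_k_neighbors(similarities, k):
    """Pick the k users with the highest similarity to the target user."""
    return sorted(similarities, key=similarities.get, reverse=True)[:k]


def weighted_sum_prediction(similarities, ratings, neighbors):
    """P(t, i) = sum(sim * rating) / sum(|sim|) over the chosen neighbors."""
    numerator = sum(similarities[c] * ratings[c] for c in neighbors)
    denominator = sum(abs(similarities[c]) for c in neighbors)
    if denominator == 0:
        return None  # no informative neighbor, so no prediction
    return numerator / denominator


# Similarities to User_1 as reported in the text.
SIMILARITY_TO_USER_1 = {"User_3": -0.5, "User_4": 0.5, "User_5": 0.0}

if __name__ == "__main__":
    neighbors = top_k_neighbors(SIMILARITY_TO_USER_1, k=2)
    print("K =", neighbors)
    for user_5_rating in range(1, 6):  # hypothetical: User_5's Casino rating is unknown
        casino = {"User_3": 1, "User_4": 4, "User_5": user_5_rating}
        print(user_5_rating, weighted_sum_prediction(SIMILARITY_TO_USER_1, casino, neighbors))
```

Every line of output predicts 4.0, because User_5's zero weight removes its rating from both the numerator and the denominator.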
Table 6 The calculation of the similarities between User_1 and User_3
User     Father of the bride   Golden eye   Casino   Four rooms   Money train   Get shorty   Assassins
User_1   1                     4            ?        3            ?             3            2
User_3   3                     ?            1        ?            2             1            2
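The Pearson similarity of Eq. (2), together with the step that keeps only the most similar users of the cluster (30 in the processing model), can be sketched as follows. Two assumptions are made because the text does not fix them: the sums and the user means are both taken over the items the two users have co-rated, and unrated items (the ? cells in Table 6) are simply absent from a user's dictionary. The ratings below are hypothetical, chosen to show a positive, a negative and an undefined case; they are not the MovieLens data.

```python
import math


def pearson(ratings_t, ratings_c):
    """Pearson correlation over the items both users have rated.

    Assumption: means are taken over the co-rated items only.
    Returns None with fewer than two co-rated items or when a user has zero variance.
    """
    common = ratings_t.keys() & ratings_c.keys()
    if len(common) < 2:
        return None
    mean_t = sum(ratings_t[i] for i in common) / len(common)
    mean_c = sum(ratings_c[i] for i in common) / len(common)
    num = sum((ratings_t[i] - mean_t) * (ratings_c[i] - mean_c) for i in common)
    var_t = sum((ratings_t[i] - mean_t) ** 2 for i in common)
    var_c = sum((ratings_c[i] - mean_c) ** 2 for i in common)
    if var_t == 0 or var_c == 0:
        return None
    return num / math.sqrt(var_t * var_c)


def most_similar(target, others, n):
    """Rank same-cluster users by Pearson similarity and keep the top n."""
    scored = {u: pearson(target, r) for u, r in others.items()}
    valid = {u: s for u, s in scored.items() if s is not None}
    return sorted(valid.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical ratings of one target user and three users in the same cluster.
TARGET = {"m1": 5, "m2": 3, "m3": 4, "m4": 1}
CLUSTER = {
    "u_similar": {"m1": 4, "m2": 2, "m3": 3},
    "u_opposite": {"m1": 1, "m2": 5, "m3": 3},
    "u_flat": {"m1": 3, "m2": 3},
}

if __name__ == "__main__":
    print(most_similar(TARGET, CLUSTER, n=2))
```

Users whose similarity is undefined (here `u_flat`, who gave the same rating to every co-rated movie) are dropped before ranking rather than treated as zero.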
The data is selected, converted into comma-separated values (CSV) with the CSV converter, and then imported into the WEKA program for clustering. First open the WEKA program, choose Explorer and then Preprocess, and open your file. Next click the Cluster tab, where many options are shown, and click the Choose button to select SimpleKMeans. Then left-click on the SimpleKMeans -N 2 -A "weka.core.EuclideanDistance -R first-last" text to open its settings. On this page we can set the number of clusters (numClusters) depending on how many groups we want. In our paper, we use 10 clusters, which gives the results shown in Figures 5 and 6.
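For readers without WEKA, the clustering step can be approximated by a plain-Python sketch of Lloyd's k-means, the assign-and-update loop that k-means runs with Euclidean distance. It is not WEKA's SimpleKMeans implementation: it takes the first k users as initial centroids (WEKA picks its initial centroids using a random seed), stops when no user changes group, and runs on made-up rating vectors. The paper's setting would use k = 10 on the MovieLens ratings.

```python
import math
from collections import Counter


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on equal-length rating vectors; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points seed the clusters
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed group
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a group becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical rating vectors (three movies per user), not the MovieLens data.
USERS = [(5, 1, 4), (4, 2, 5), (1, 5, 2), (2, 4, 1), (5, 2, 5), (1, 4, 1)]

if __name__ == "__main__":
    labels, centroids = kmeans(USERS, k=2)
    print("members per group:", Counter(labels))
    print("centroids:", centroids)
```

On this toy data the loop settles after three assignment passes, with users 1, 2 and 5 in one group and users 3, 4 and 6 in the other; counting labels per group gives the same kind of membership summary that Figure 5 reports.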
Figure 5 shows that group 1 includes 36 people, group 2 includes 70 people, group 3 includes 56 people, group 4 includes 52 people, group 5 includes 61 people, group 6 includes 39 people, group 7 includes 41 people, group 8 includes 52 people, group 9 includes 20 people and the last group, group 10, includes 72 people. Figure 6 shows the centroids of all groups for the movies. Furthermore, by comparing these groups using the centroid method, we found that as the group increases, the number of members increases too.
4. Conclusion and future work
Collaborative filtering is the most successful and popular algorithm in the field of recommender systems. It helps customers make better decisions by recommending interesting items. Even though this algorithm is the best, it suffers from poor accuracy and high running time. To solve these problems, this paper proposed a recommendation approach based on user clustering, which uses the Euclidean distance between users to cluster the dataset. This method combines clustering and the neighbors' votes to generate predictions. In future work, techniques such as fuzzy c-means could be used in the grouping stage, the first stage of the system, to provide a more effective segmentation.

Acknowledgment
None.

Conflicts of interest
The authors have no conflicts of interest to declare.

References
[1] Zhan J, Hsieh CL, Wang IC, Hsu TS, Liau CJ, Wang DW. Privacy-preserving collaborative recommender systems. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews). 2010; 40(4):472-6.
[2] Gong S. A collaborative filtering recommendation algorithm based on user clustering and item clustering. Journal of Software. 2010; 5(7):745-52.
[3] Manvi SS, Nalini N, Bhajantri LB. Recommender system in ubiquitous commerce. In international conference on electronics computer technology 2011 (pp. 434-8). IEEE.
[4] Pu P, Chen L, Hu R. A user-centric evaluation framework for recommender systems. In proceedings of the fifth ACM conference on recommender systems 2011 (pp. 157-64). ACM.
[5] Hu R, Pu P. Acceptance issues of personality-based recommender systems. In proceedings of the third ACM conference on recommender systems 2009 (pp. 221-4). ACM.
[6] Pathak B, Garfinkel R, Gopal RD, Venkatesan R, Yin F. Empirical analysis of the impact of recommender systems on sales. Journal of Management Information Systems. 2010; 27(2):159-88.
[7] Witten IH, Frank E, Hall MA. Data mining: practical machine learning tools and techniques. Morgan Kaufmann Publishers, Elsevier; 2011.
[8] Han J, Kamber M. Data mining: concepts and techniques. Elsevier; 2011.
[9] Hand DJ, Mannila H, Smyth P. Principles of data mining. MIT Press; 2001.
[10] Zhao Y. R and data mining: examples and case studies. Academic Press; 2012.
[11] Kużelewska U. Advantages of information granulation in clustering algorithms. In international conference on agents and artificial intelligence 2011 (pp. 131-45). Springer Berlin Heidelberg.
[12] McSherry D. Explaining the pros and cons of conclusions in CBR. In European conference on case-based reasoning 2004 (pp. 317-30). Springer Berlin Heidelberg.
[13] Felfernig A, Friedrich G, Schmidt-Thieme L. Guest editors' introduction: recommender systems. IEEE Intelligent Systems. 2007; 22(3):18-21.
[14] Adomavicius G, Tuzhilin A. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering. 2005; 17(6):734-49.
[15] Adomavicius G, Tuzhilin A. Recommendation technologies: survey of current methods and possible extensions. Information Systems Working Papers Series; 2004.

Yu Lasheng is a Vice Professor at Central South University of China. He is a member of ACM and CCF and an ACM/ICPC gold medal coach. He received the B.Sc. degree in Computer Science, and the Master's and Ph.D. degrees in Control Theory and Control Engineering, from Central South University. He is an editor of the Journal of Convergence Information Technology and of Advances in Information Sciences and Service Sciences, among others. He is also a reviewer for journals such as Future Generation Computer Systems, Journal of Parallel and Distributed Computing and Artificial Intelligence Review. He has published at least 70 papers on agent technologies or algorithms, as well as 3 books. He has organized and implemented many projects which have created great achievements in society. His main research interests include agent technologies and applications, structures and algorithms, and smart computing.
Email: [email protected]