
Movie Recommender Engine Using Collaborative Filtering

Chapter in Smart Innovation · October 2018


DOI: 10.1007/978-981-10-5547-8_62


6 authors, including:

Sadanand Howal
Rajarambapu Institute Of Technology



Movie Recommender Engine Using Collaborative Filtering

Howal Sadanand*, Desai Vrushali1, Nerlekar Rohan2, Mote Avadhut3, Vanjari Rushikesh4, Rananaware Harshada5
1Rajarambapu Institute of Technology, Information Technology, Islampur 415414, Maharashtra, India
Abstract— The purpose of this paper is to research and build a hybrid algorithm that combines different collaborative filtering algorithms to achieve smart clustering and efficient results. The volume of data, both structured and unstructured, and the knowledge derived from it have grown heavily in recent years. Recommendation systems are becoming increasingly widespread as they are used throughout the e-commerce space. The main aspects of the project are managing large amounts of data and information, and evaluating on both training and test data to give the best recommendations. The project uses Apache Spark, a large-scale framework for distributed data processing. Compared with older MapReduce-style processing, Spark handles iterative and interactive algorithms well and reduces processing time. [1]
Keywords—Collaborative filtering, Pearson Correlation coefficient, Apache Spark

1. INTRODUCTION

Recommender systems are a vital part of information websites, business websites and e-commerce systems. They also play a vital role in social media such as Facebook, Twitter and YouTube. A movie recommendation system aims to provide the most effective movie recommendations using past information, with algorithms used to evaluate the results. Recommender systems represent a strong methodology for enabling users to filter through large information and product spaces. Nearly 20 years of research on collaborative filtering have led to a varied set of algorithms and a rich collection of tools for evaluating their performance. Data mining techniques are the result of a long process of research and product development.

2. LITERATURE SURVEY

The following survey covers major work by the data science community. The book Data Mining: Concepts and Techniques by Jiawei Han describes various collaborative filtering techniques. [5] Reading and analyzing the paper "A Hybrid Distributed Collaborative Filtering Recommender Engine Using Apache Spark" by Sasmita Panigrahi, Rakesh Ku. Lenka and Ananya Stitipragyan gave us the idea of a recommendation system using collaborative filtering. [10] We intend to implement this idea on Apache Spark.
The paper "Hybrid Web Recommender Systems" by Robin Burke discusses the variety of algorithms that have been used for generating recommendations, including content-based, collaborative and other algorithms. The paper surveys the landscape of actual and possible hybrid recommenders and introduces a hybrid system that combines knowledge-based recommendation and collaborative filtering to recommend restaurants. It relies purely on the physical characteristics of the user. [3]
The paper "Clustering Methods for Collaborative Filtering" by Lyle H. Ungar and Dean P. Foster discusses new methods for grouping and clustering items. The methods used in this paper to optimize the model estimates yield the best base rates and link probabilities. Hence, appropriate input methods are necessary for the algorithm to give better results. [6]
The article "A New Parallel Item-Based Collaborative Filtering Algorithm Based on Hadoop" by Qun Liu and Xiaobing Li discusses various feature extraction methods that can be used for filtering, together with a parallelization design for item-based collaborative filtering. The authors describe various features and feature extraction methods, and the article as a whole explains how to construct users' preference vectors and compute the co-occurrence matrix. [11]
We aim to study these methods, select the best-suited ones and combine them to achieve better results.
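To make the co-occurrence idea from [11] concrete, the following is a minimal in-memory sketch, not the parallel Hadoop design of the cited article: it counts how often two movies are rated by the same user, then scores a user's unseen movies by multiplying the co-occurrence matrix with that user's preference vector. The function names (`cooccurrence_matrix`, `recommend`) and the toy ratings are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_matrix(user_prefs):
    """Count, for every pair of movies, how many users rated both."""
    co = defaultdict(lambda: defaultdict(int))
    for rated in user_prefs.values():
        # Iterating a dict yields its keys, i.e. the movies this user rated.
        for a, b in combinations(sorted(rated), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(user, user_prefs, co, top_n=3):
    """Score unseen movies as (co-occurrence matrix) x (user's preference vector)."""
    seen = user_prefs[user]
    scores = defaultdict(float)
    for movie, pref in seen.items():
        for other, count in co[movie].items():
            if other not in seen:
                scores[other] += count * pref
    # Highest score first; ties broken by movie id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical toy data: user -> {movie: rating}.
prefs = {
    "u1": {"m1": 5, "m2": 3},
    "u2": {"m1": 4, "m3": 5},
    "u3": {"m2": 4, "m3": 2, "m4": 5},
}
co = cooccurrence_matrix(prefs)
print(recommend("u1", prefs, co))  # [('m3', 8.0), ('m4', 3.0)]
```

Movie m3 wins for u1 because it co-occurs with both of u1's movies, so both of u1's ratings contribute to its score.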
In their paper "Study of Collaborative Filtering Recommendation Algorithm – Scalability Issue", Reena Pagare and Shalmali A. Patil discuss a statistical approach to the problems of filtering techniques on datasets. The paper discusses the challenges in recommendation systems, in particular the challenging problem of scalability in collaborative filtering, and it gave us the idea of analyzing the scalability of our algorithms. [9]
The paper "Recommendation System Based on Collaborative Filtering" by Zheng Wen explains how recommendations can be made by "matching" the characteristics of a user's profile with the specific characteristics of an item. Collaborative filtering algorithms such as the sparse-matrix SVD approach model both users and movies by giving them coordinates in a low-dimensional feature space. [8]
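The low-dimensional model described in [8] can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: instead of an exact SVD, it uses stochastic gradient descent on the observed ratings only (the "Funk SVD" style of matrix factorization), so the missing cells of the sparse matrix never need to be filled in. The hyperparameters (`k`, `steps`, `lr`, `reg`), the function names and the data are hypothetical.

```python
import math
import random


def factorize(ratings, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Learn k-dim user vectors P[u] and item vectors Q[i] with dot(P[u], Q[i]) ~ r_ui.

    Only observed (user, item, rating) triples are visited, so the missing
    cells of the sparse matrix are never imputed.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(0.1, 0.5) for _ in range(k)] for u in users}
    Q = {i: [rng.uniform(0.1, 0.5) for _ in range(k)] for i in items}
    for _ in range(steps):
        for u, i, r in ratings:
            pu, qi = P[u], Q[i]
            err = r - sum(a * b for a, b in zip(pu, qi))
            for f in range(k):
                pf, qf = pu[f], qi[f]
                # Gradient step on squared error with L2 regularization.
                pu[f] += lr * (err * qf - reg * pf)
                qi[f] += lr * (err * pf - reg * qf)
    return P, Q


def predict(P, Q, user, item):
    """Predicted rating: dot product of the user and item coordinates."""
    return sum(a * b for a, b in zip(P[user], Q[item]))


def train_rmse(P, Q, ratings):
    """Root-mean-square error over the observed ratings."""
    sq = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(sq / len(ratings))


# Hypothetical (user, movie, rating) triples; most cells are unobserved.
ratings = [
    ("u1", "m1", 5), ("u1", "m2", 4), ("u1", "m4", 1),
    ("u2", "m1", 4), ("u2", "m3", 1), ("u2", "m4", 1),
    ("u3", "m2", 1), ("u3", "m3", 5), ("u3", "m4", 4),
    ("u4", "m1", 1), ("u4", "m3", 4),
]
P, Q = factorize(ratings)
print(round(train_rmse(P, Q, ratings), 3))
print(round(predict(P, Q, "u4", "m2"), 2))  # estimate for an unrated cell
```

Each user and each movie ends up with `k` coordinates, and any unrated cell can be estimated from the dot product of the two vectors.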
In their paper "HRSPCA: Hybrid Recommender System for Predicting College Admission", A. H. M. Ragab, A. F. S. Mashat and A. M. Khedra discuss a hybrid recommender based on data mining techniques and knowledge discovery rules for addressing college admission problems. The paper demonstrates a high prediction accuracy rate and the flexibility gained by using clustered hybrid algorithms, which perform the attribute tasks quickly and fairly. [12]
Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently produced technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

3. PROBLEM LIFE CYCLE

A. The huge amount of information available today leads to a large number of choices and an alarming growth of available data. [1] The Internet is full of choices, and various recommendations are available to everyone in fields such as movies, books and links. [1]
B. Recommenders are used to suggest information, products and services to regular customers based on their history, transactions and feedback. [3] Here, user-user and item-item similarity is used. In the growing field of big data, the numbers of products, customers and providers are increasing tremendously. Hence recommendation systems have become a necessity, and producing good results is a growing challenge.
C. Many of the systems that recommend items to users are based on collaborative filtering (CF) technologies. [5] The major problems of CF are scalability, cold start and sparsity, which can be reduced with the help of hybrid systems that combine different algorithms.

3.1. Problem Formulation

In a movie recommender, a movie is the entity under consideration, and recommendations are made using similar entities. [1] Recommendations can be given by retrieving the most relevant similar entities from a large dataset based on the user's query. Movies are rated, which helps to retrieve other entities that are more relevant based on relevance, authority, popularity, etc. The input, output and data of the recommender problem are stated below. [1]
Input: A movie
Output: Recommended movies for the given input
Movie Data: Unstructured and semi-structured data
Movie: An object; structured data
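For the input/output formulation above, a minimal item-item sketch might look like this, assuming each movie is represented by a sparse vector of user ratings and that cosine similarity (one of several possible similarity functions) is used to rank the other movies. The names `similar_movies`, `cosine` and `item_ratings` and the example titles are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse rating vectors (dict: user -> rating).

    Unrated entries count as zero, so the norms use every rating of each movie.
    """
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def similar_movies(movie, item_ratings, top_n=2):
    """Input: a movie. Output: the top_n most similar other movies."""
    target = item_ratings[movie]
    scored = [(other, cosine(target, vec))
              for other, vec in item_ratings.items() if other != movie]
    scored.sort(key=lambda kv: (-kv[1], kv[0]))
    return [m for m, _ in scored[:top_n]]


# Hypothetical data: movie -> {user: rating}.
item_ratings = {
    "Heat":   {"u1": 5, "u2": 4, "u3": 1},
    "Ronin":  {"u1": 4, "u2": 5},
    "Amelie": {"u3": 5, "u4": 4},
    "Casino": {"u1": 5, "u2": 2, "u4": 2},
}
print(similar_movies("Heat", item_ratings))  # ['Ronin', 'Casino']
```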
The problem is to build a movie recommender using large-scale processing with Apache Spark. The goal of a movie recommender over big data is to design a system that is efficient and scalable, and that provides the most effective possible answers to a large variety of queries. The challenges faced by the recommender are listed below:
1. Unstructured Dataset: A big-data movie recommender must store and process huge amounts of unstructured data. [1]
2. Movie Disambiguation and Movie Resolution: "XYZ" may refer to an actor as well as to the movie "XYZ", and the same string can mean different things in different contexts. [1]
3. Movie Ranks: For a particular movie, the user may not be interested in all of the available results, so the results need to be ranked. Many ranking features are available, such as PageRank and click frequency.
3.2. Terminology

The following terms are used throughout this paper:

● Movie: An object with unique properties and a unique ID; a meaningful abstract concept. [1]
● Similarity: A numerical value calculated using similarity functions; it also indicates the relevance between a query and a movie. [1]
● Popularity: A measure of how popular a certain movie is compared to other entities. [1]
● Property/Field: An attribute of a movie.

3.3. Problem Analysis

Why Recommender Systems?


1. Given the increasing amount and growing variety of products, services and information made available daily on the Internet, and the introduction of new e-business services [2], choosing from such a wide range of options can be complex and difficult to manage.
2. It can be argued that, while being able to choose is good, having so many options to choose from is not always more rewarding.
3. To prevent overwhelmed users from making poor decisions, recommender systems came into focus, facilitating users' access to information about the items they are most likely to be interested in, whether such items are books, movies, music, videos, Web pages, news or services, among others.
Why a Hybrid System?
A system that combines different approaches and algorithms to increase the effectiveness of recommendation is called a hybrid system. A hybrid system also helps to correct shortcomings of the existing system. Both content-based and collaborative filtering face the cold-start problem, and a hybrid system helps to solve this problem to some extent. [1]
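One simple way for a hybrid system to soften the cold-start problem is a weighted blend whose collaborative weight grows with the amount of rating data, falling back to the content-based score when no ratings exist. The sketch below assumes both scores are already on the same rating scale; the function name and the threshold `full_trust_at` are hypothetical, and this is only one of several hybridization strategies (weighted, switching, cascade, etc.).

```python
def hybrid_score(cf_score, content_score, n_ratings, full_trust_at=20):
    """Weighted hybrid of a collaborative (CF) score and a content-based score.

    The CF weight alpha grows linearly with the number of ratings the item has
    and reaches 1 at `full_trust_at` ratings (a hypothetical threshold).
    With no CF estimate at all (cold start), the content score is used alone.
    """
    if cf_score is None or n_ratings <= 0:
        return content_score
    alpha = min(n_ratings, full_trust_at) / full_trust_at
    return alpha * cf_score + (1 - alpha) * content_score


print(hybrid_score(None, 3.8, 0))   # new movie: content-based only -> 3.8
print(hybrid_score(4.0, 3.0, 10))   # half weight each -> 3.5
print(hybrid_score(4.5, 3.0, 50))   # enough ratings: CF only -> 4.5
```

The linear ramp is an arbitrary design choice; any monotone weighting that starts at zero for unrated movies gives the same cold-start fallback.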
Why the MovieLens Dataset?
1. Due to copyright restrictions, Netflix data is not available for download, so the MovieLens data is used to evaluate recommendations in the movie domain.
2. The MovieLens dataset consists of anonymous movie ratings collected by GroupLens Research, which currently runs a movie recommendation system based on collaborative filtering.
3. The MovieLens data set contains approximately 10 million ratings from 71,567 users on 10,681 movies. Ratings are made on a 5-star scale with half-star increments [4], and each user has at least 20 ratings.
4. The data set was collected and made available by GroupLens Research on their web page.
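The figures above also show how sparse the rating matrix is, which is one of the CF problems noted in Section 3. A rough calculation, treating "approximately 10 million" as exactly 10,000,000:

```latex
\text{density} = \frac{10{,}000{,}000}{71{,}567 \times 10{,}681}
               = \frac{10{,}000{,}000}{764{,}407{,}127} \approx 0.0131,
\qquad \text{sparsity} = 1 - \text{density} \approx 0.987
```

So only about 1.3% of user-movie pairs carry a rating, and roughly 98.7% of the matrix is empty.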
Apache Spark, a large-scale framework for processing distributed data, is used in this project. It is based on the Hadoop MapReduce model and extends it to efficiently support more types of computation [1].
The Pearson correlation coefficient indicates how strongly one entity is correlated with another. Used as the similarity function, it gives a measure of the linear dependency between two variables, items or users [1]. This dependency is not computed over the whole dataset, however: only the part of the dataset that the two users (or items) have in common, such as their co-rated items, is used. [1]
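Pearson Correlation Coefficient

$$r = \frac{N\sum xy - \sum x\sum y}{\sqrt{\left[N\sum x^{2} - \left(\sum x\right)^{2}\right]\left[N\sum y^{2} - \left(\sum y\right)^{2}\right]}}$$

Where:
N = Number of pairs of scores
Σxy = Sum of the products of paired scores
Σx = Sum of x scores
Σy = Sum of y scores
Σx² = Sum of squared x scores
Σy² = Sum of squared y scores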
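The Pearson coefficient can be computed from the quantities listed above (N, Σxy, Σx, Σy and the two sums of squares) with the standard computational formula r = (NΣxy − ΣxΣy) / sqrt([NΣx² − (Σx)²][NΣy² − (Σy)²]). The sketch below does exactly that and, following the remark that only part of the dataset is used, applies it to the movies two users have both rated. The helper names and the choice to return 0.0 when there are fewer than two co-rated movies or one list has zero variance are assumptions.

```python
import math


def pearson(x, y):
    """Pearson r from N, Σxy, Σx, Σy, Σx² and Σy² (the computational formula)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length score lists with at least 2 pairs")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    var_x = n * sum_x2 - sum_x ** 2
    var_y = n * sum_y2 - sum_y ** 2
    if var_x <= 0 or var_y <= 0:  # a constant score list has no defined r
        return 0.0
    return (n * sum_xy - sum_x * sum_y) / math.sqrt(var_x * var_y)


def user_similarity(ra, rb):
    """Pearson similarity of two users over the movies both have rated."""
    common = sorted(ra.keys() & rb.keys())
    if len(common) < 2:
        return 0.0
    return pearson([ra[m] for m in common], [rb[m] for m in common])


alice = {"m1": 5, "m2": 3, "m3": 4, "m5": 1}
bob = {"m1": 4, "m2": 2, "m3": 3, "m4": 5}
print(user_similarity(alice, bob))  # 1.0: on m1..m3 Bob's ratings are Alice's minus 1
```

Because Pearson subtracts each user's mean implicitly, a user who rates everything one star lower than another still counts as perfectly similar.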

4. PROPOSED SYSTEM

Module 1: Implementation of basic algorithms in Java, using Eclipse and Mahout.
Module 2: Study and research of collaborative filtering.
Module 3: Study and implementation of effective combinations of new recommendation algorithms, such as the Tanimoto, Pearson, Slope One and SVD algorithms.
Module 4: Study and research of content-based recommendation, and implementation of a hybrid algorithm.
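The Tanimoto coefficient named in Module 3 compares two users (or items) by the overlap of their rated sets, ignoring the rating values. A minimal sketch, assuming ratings are reduced to sets of movie IDs; the function name and data are hypothetical:

```python
def tanimoto(items_a, items_b):
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| of two sets of rated movies.

    Rating values are ignored: only whether a movie was rated matters.
    """
    a, b = set(items_a), set(items_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


alice = {"m1": 5, "m2": 3, "m3": 4}
carol = {"m2": 1, "m3": 5, "m4": 2}
print(tanimoto(alice, carol))  # 0.5: 2 shared movies out of 4 distinct ones
```

Note that Alice and Carol disagree strongly on m2, yet the coefficient is unaffected; this suits implicit (watched / not watched) data better than explicit star ratings.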

Fig. 1. System Architecture


5. TEST BED

We tested these algorithms by simulating the system with the following configuration:
● Laptop with a Core i5 processor and 8 GB RAM

6. IMPLEMENTATION AND RESULT ANALYSIS

We implemented the recommendation algorithms using item similarity, the user-user neighborhood approach, the Tanimoto coefficient algorithm, the Pearson coefficient algorithm, the Slope One algorithm and the SVD algorithm, using the Mahout libraries on the Java platform in Eclipse on Linux.
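Of the algorithms listed above, Slope One is simple enough to sketch in a few lines. The version below is the weighted Slope One scheme of Lemire and Maclachlan: it learns the average rating difference between every pair of movies, then predicts a user's rating as the count-weighted average of their known ratings shifted by those differences. It is an in-memory illustration with hypothetical function names and data structures, not Mahout's implementation.

```python
from collections import defaultdict


def slope_one_train(ratings):
    """ratings: user -> {movie: rating}.

    Returns dev[j][i], the average of (r_j - r_i) over users who rated both
    movies, and freq[j][i], the number of such users.
    """
    diff = defaultdict(lambda: defaultdict(float))
    freq = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, rj in user_ratings.items():
            for i, ri in user_ratings.items():
                if i != j:
                    diff[j][i] += rj - ri
                    freq[j][i] += 1
    dev = {j: {i: diff[j][i] / freq[j][i] for i in diff[j]} for j in diff}
    return dev, freq


def slope_one_predict(user_ratings, movie, dev, freq):
    """Weighted Slope One: count-weighted mean of r_i + dev[movie][i]."""
    num, den = 0.0, 0
    for i, r in user_ratings.items():
        if movie in dev and i in dev[movie]:
            n = freq[movie][i]
            num += (r + dev[movie][i]) * n
            den += n
    return num / den if den else None  # None: no co-rated movie to learn from


ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
dev, freq = slope_one_train(ratings)
print(slope_one_predict(ratings["lucy"], "A", dev, freq))  # 13/3, about 4.33
```

Here A is rated 0.5 above B on average (two users) and 3 above C (one user), so Lucy's prediction is ((2 + 0.5) × 2 + (5 + 3) × 1) / 3.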
The following table shows the time taken to convert each dataset from asymmetrical to symmetrical data:

Dataset size          | 1K   | 10K | 100K | 10M
Time required (sec)   | 0.80 | 2.0 | 4.0  | 4.57

Table 1. Time required for processing the datasets

6.1 Graphs

Fig. 2. User-User Approach
Fig. 3. Item-Item Approach
Fig. 4. User-User Neighborhood Approach
Fig. 5. SVD Approach

The graphs above give a brief comparison of our models with different standard algorithms.

CONCLUSION AND FUTURE WORK

We successfully implemented basic algorithms in Java, using Eclipse and the Mahout libraries. The implemented algorithms are item-item similarity, user-user similarity, Tanimoto, Pearson coefficient, Slope One and SVD recommendation. We studied and filtered collaborative filtering algorithms and chose the three algorithms that gave the best results for implementation; these include the Pearson coefficient algorithm and the SVD algorithm. Further implementation of the modules is under way.

In future work, it will be worthwhile to keep examining the recommendation algorithm implementations available in newer releases of Apache Spark and the R language, since both engines for large-scale processing are evolving quickly. This will help to introduce a new clustering algorithm using Scala. Spark is currently at version 2.0.0, released in July 2016. Since release 1.3, a new DataFrames API has been available, which provides powerful and convenient operators when working with structured datasets. Since release 1.4, Spark provides SparkR, an R binding for Spark based on Spark's new DataFrame API. [7] SparkR gives R users access to Spark's scale-out parallel runtime along with all of Spark's input and output formats. [7] It also supports calling directly into Spark SQL. Finally, an immediate way to improve our work is to validate the algorithms on larger datasets, with millions or billions of ratings, and if possible from different recommendation domains such as music, movies, products and news.
REFERENCES

1. Swapna Kulkarni, "A Recommendation Engine Using Apache Spark," Master's Projects, 456, 2015.
2. Francesco Ricci, Lior Rokach and Bracha Shapira, "Introduction to Recommender Systems Handbook," Springer Science+Business Media, LLC, 2011, DOI 10.1007/978-0-387-85820-3_1.
3. Robin Burke, "Hybrid Web Recommender Systems," in The Adaptive Web, pp. 377-408, Springer Berlin Heidelberg, 2007.
4. Jianwen Chen and Ling Feng, "Efficient Pruning Algorithm for Top-K Ranking on Database with Value Uncertainty," CIKM '13, pp. 2231-2236, 2013, ISBN 978-1-4503-2263-8.
5. Jiawei Han, "Data Mining: Concepts and Techniques," 2005, ISBN 1558609016.
6. Lyle H. Ungar and Dean P. Foster, "Clustering Methods for Collaborative Filtering," in AAAI Workshop on Recommendation Systems, Vol. 1, pp. 114-129, July 26, 1998.
7. "Big Data Product Watch 8/28/15: Streaming Analytics, High Performance Computing and More," ICT Monitor Worldwide, August 29, 2015.
8. Zheng Wen, "Recommendation System Based on Collaborative Filtering," December 12, 2008.
9. Reena Pagare and Shalmali A. Patil, "Study of Collaborative Filtering Recommendation Algorithm – Scalability Issue," International Journal of Computer Applications, Vol. 67, No. 25, 2013.
10. Sasmita Panigrahi, Rakesh Ku. Lenka and Ananya Stitipragyan, "A Hybrid Distributed Collaborative Filtering Recommender Engine Using Apache Spark," Procedia Computer Science, pp. 1000-1016, 2016.
11. Qun Liu and Xiaobing Li, "A New Parallel Item-Based Collaborative Filtering Algorithm Based on Hadoop," 2014, doi:10.17706/jsw.10.4.416-426.
12. A. H. M. Ragab, A. F. S. Mashat and A. M. Khedra, "HRSPCA: Hybrid Recommender System for Predicting College Admission," pp. 107-113, 2012, doi:10.1109/ISDA.2012.6416521.
