Location Based
ABSTRACT:
Recommendation systems help users discover products and content by predicting users' ratings of those products or content and showcasing the items that users are likely to rate highly. These items can be places, books, movies, restaurants, and commodities about which users can have different opinions. As the number of online services, items, and products increases day by day, designing an efficient and effective recommendation system has become a significant task. Recent research shows that online service recommendation focuses on two prominent approaches: collaborative filtering and content-based recommendation. The content-based method relies on the characteristics of items, whereas the collaborative filtering method considers users' past behavior and ratings to make decisions. Existing web service discovery approaches rely on keyword-dominant web service search engines. These search engines have several limitations: their recommendation performance is poor, and they depend heavily on correct and complex queries from users. In this paper, an agglomerative hierarchical clustering-based collaborative filtering approach is proposed for effective recommendation. The proposed recommender system recommends similar and more accurate places to customers who belong to the same cluster and therefore share similar preferences.
1. INTRODUCTION:
The data available on the web can be structured, semi-structured, or unstructured. People's travel histories can be vast and varied [1], which results in big data sets. Big data refers to a collection of substantial data sets that grow exponentially with time and must be stored and managed. Traditional methods are inefficient at managing such big data because they are expensive, time-consuming, and poorly scalable [2][6].
In light of this challenge, this paper proposes a Clustering-based Collaborative Filtering approach (ClubCF), which groups similar services into the same clusters so that services can be recommended collaboratively. Recommender systems in location-based social networks take advantage of social and geographical influence to create personalized point-of-interest (POI) recommendations [1][4]. Social influence is derived from similar users based on matching visit histories [3]. In contrast, geographical influence is derived from users' geographic preferences [4], which can be observed from their visits to different POIs [7]. However, this approach can fall short when a user moves to a new location where there is no activity history [9].
We propose a system that models user preferences based on user ratings and POI categories. This cluster-based recommendation system uses both types of collaborative filtering, namely user-based CF [3] and item-based CF [8]. The process includes a data preprocessing stage, during which the datasets are retrieved, processed, and grouped into clusters of similar entities. Recommendations are then generated for each cluster. The advantage of this approach is that recommendations can be served quickly at runtime, because virtually everything is precomputed.
2. LITERATURE SURVEY:
Recommender systems mainly use three approaches, namely content-based, collaborative, and hybrid recommendation [2]. User-based collaborative filtering identifies users who are similar to the active user (the user for whom recommendations are to be provided); then, using the user-item matrix, it predicts ratings for the items the active user has not rated and makes recommendations accordingly [3].
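The neighbor-selection step of user-based CF can be sketched in Java, the language used for the implementation. This is a minimal illustration under stated assumptions: the similarities between the active user and every other user are taken as already computed, and the class `NeighborSelection` and its parameters are hypothetical rather than the authors' code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: selects the K users most similar to the active user. */
class NeighborSelection {

    /**
     * @param similarity similarity[u] is the (assumed precomputed) similarity between the active user and user u
     * @param activeUser index of the active user, excluded from its own neighborhood
     * @param k          neighborhood size K
     * @return indices of the K most similar users, most similar first
     */
    static List<Integer> topK(double[] similarity, int activeUser, int k) {
        List<Integer> candidates = new ArrayList<>();
        for (int u = 0; u < similarity.length; u++) {
            if (u != activeUser) {
                candidates.add(u);
            }
        }
        // Sort the other users by descending similarity to the active user.
        candidates.sort(Comparator.comparingDouble((Integer user) -> similarity[user]).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }

    public static void main(String[] args) {
        double[] similarityToUser0 = {1.0, 0.2, 0.9, -0.4, 0.7};
        System.out.println(topK(similarityToUser0, 0, 2)); // [2, 4]
    }
}
```

The ratings of these K neighbors are then the only ones consulted when predicting the active user's missing ratings, which keeps the prediction step cheap.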
Item-based collaborative filtering identifies relationships between items, whereas user-based CF considers relationships between users. The similarities between items are calculated, and recommendations are then made from them. This method is sometimes more efficient than user-based CF because relationships between items do not depend on users' fluctuating moods [8]. Many other approaches serve different purposes, such as state-of-the-art review-based and time-aware recommender systems.
The state-of-the-art review-based technique uses user-generated textual reviews and studies two branches: review-based user profile building and review-based product profile building. The limitation of this approach is that user-written reviews may be unclear, and negative comments are sometimes interpreted as positive and vice versa. This can be overcome by considering only users' ratings [2][3].
Users tend to visit different types of places at different times of day. For example, people usually visit restaurants in the afternoon and pubs at night. Time-aware point-of-interest recommendation was therefore introduced, based on geographical [6] and temporal [4][10] influences. Personalized ranking and suggestions based on the user's history and preferences are crucial [5]. Many real-world constraints, such as POI availability, travel-time limits, and diversity, can interfere with providing accurate and efficient suggestions to the user [9].
Existing recommendation systems have several limitations, such as the cold start problem, data scalability, and data sparsity [7]. The cold start problem occurs when a user is new to the system and has no history from which items can be suggested. The data sparsity problem occurs because users and items are abundant while ratings are comparatively few [11]; users do not usually rate items immediately, which causes a lack of data. The data scalability problem arises from rapidly changing data that cannot be controlled [12]: many new users and items are continually added to the datasets, and little data exists for them. The proposed clustering methodology can help overcome these limitations and increase the efficiency of the system [13].
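Data sparsity can be quantified as the fraction of empty cells in the user-item matrix. The minimal sketch below assumes that an unrated cell is stored as 0; the class `SparsityCheck` is hypothetical.

```java
/** Hypothetical helper: measures how sparse a user-item rating matrix is. */
class SparsityCheck {

    /** Returns the fraction of cells with no rating (assumption: 0 means "not rated"). */
    static double sparsity(double[][] ratings) {
        long total = 0;
        long missing = 0;
        for (double[] row : ratings) {
            for (double r : row) {
                total++;
                if (r == 0.0) {
                    missing++;
                }
            }
        }
        return total == 0 ? 0.0 : (double) missing / total;
    }

    public static void main(String[] args) {
        double[][] ratings = {
            {5, 0, 0, 1},
            {0, 0, 4, 0},
        };
        System.out.println(sparsity(ratings)); // 0.625
    }
}
```

In this 2-by-4 example, 5 of the 8 cells are empty, so the sparsity is 0.625; real rating matrices such as Yelp's are usually far sparser.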
Initially, the data is preprocessed into useful datasets from which different clusters are created. Based on the similarities, a user-item matrix is generated, which is used to predict the missing ratings. The distance between two users is measured as the Euclidean distance between their points (rating vectors). Collaborative filtering is then used to recommend items based on users' and items' ratings. The architecture of the location-based recommender system is shown in Figure 1.
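The Euclidean distance between two user points can be computed as below. This sketch assumes each user is represented by a vector of ratings over the same ordered list of places, with unrated places stored as 0; that is a simplification, since a missing rating is then treated as a rating of 0. The class `UserDistance` is hypothetical.

```java
/** Hypothetical helper: Euclidean distance between two users' rating vectors. */
class UserDistance {

    /** Both arrays hold one user's ratings over the same, identically ordered places (0 = unrated, assumed). */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("rating vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] alice = {5, 3, 0};
        double[] bob = {2, 3, 4};
        System.out.println(euclidean(alice, bob)); // 5.0
    }
}
```

A smaller distance means the two users rated the same places more alike; the per-place differences here are 3, 0, and -4, giving sqrt(9 + 0 + 16) = 5.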
In this module, users' ratings are collected into a dataset that is used for the analysis. The ratings given by users are saved in CSV format. This dataset is provided as input to the mapper class, which reads the userID, placeID, and rating given by each user into a map, with userID as the key and placeID and rating as the value. The mapper's output is then passed as input to the reducer, which finds a particular cluster for each user. The reducer groups the places rated by each user; this preprocessing step is shown in Figure 2.
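The mapper and reducer described above can be imitated in memory to show the data flow. This sketch does not use the Hadoop API; it assumes each CSV line has the form `userID,placeID,rating` with no header, and the class `RatingPreprocessor` and its methods are hypothetical. Only the grouping of places by user is shown, not the cluster assignment.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory imitation of the mapper/reducer preprocessing (hypothetical, not Hadoop code). */
class RatingPreprocessor {

    /** Value emitted by the mapper: a place and the rating the user gave it. */
    static final class PlaceRating {
        final String placeId;
        final double rating;

        PlaceRating(String placeId, double rating) {
            this.placeId = placeId;
            this.rating = rating;
        }

        @Override
        public String toString() {
            return placeId + "=" + rating;
        }
    }

    /** Map step: one well-formed CSV line "userID,placeID,rating" becomes (userID, (placeID, rating)). */
    static Map.Entry<String, PlaceRating> map(String csvLine) {
        String[] f = csvLine.split(",");
        return new SimpleEntry<>(f[0].trim(), new PlaceRating(f[1].trim(), Double.parseDouble(f[2].trim())));
    }

    /** Shuffle and reduce step: group all mapped pairs by userID; the TreeMap keeps keys sorted. */
    static Map<String, List<PlaceRating>> reduce(List<Map.Entry<String, PlaceRating>> mapped) {
        Map<String, List<PlaceRating>> byUser = new TreeMap<>();
        for (Map.Entry<String, PlaceRating> e : mapped) {
            byUser.computeIfAbsent(e.getKey(), key -> new ArrayList<>()).add(e.getValue());
        }
        return byUser;
    }

    public static void main(String[] args) {
        List<String> csv = List.of("u1,p1,5", "u2,p1,3", "u1,p2,4");
        List<Map.Entry<String, PlaceRating>> mapped = new ArrayList<>();
        for (String line : csv) {
            mapped.add(map(line));
        }
        System.out.println(reduce(mapped)); // {u1=[p1=5.0, p2=4.0], u2=[p1=3.0]}
    }
}
```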
Clustering is an essential step in our approach. The clusters provide the neighborhood of the active user (the user for whom the recommendations are made). Agglomerative clustering is a bottom-up approach that starts with many small clusters and merges them to create progressively larger clusters. The merge history of this algorithm is visualized as a dendrogram. The algorithm works as follows:
1. Treat every data point as an individual cluster at the start of the algorithm.
2. Repeatedly merge the two closest clusters into a new cluster, recomputing the dissimilarity between the merged group and every other group after each merge. Common ways to compute this dissimilarity include single, complete, and average linkage.
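A naive version of these steps can be sketched as follows. Average linkage, Euclidean distance between users' rating vectors, and stopping once a chosen number of clusters remains are illustrative assumptions, and the class `AgglomerativeClustering` is hypothetical. Because all pairwise cluster distances are recomputed on every merge, the cost is roughly cubic in the number of users, which is acceptable only for small examples.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, naive agglomerative (bottom-up) clustering with average linkage. */
class AgglomerativeClustering {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Average linkage (assumed): mean distance over all cross-cluster pairs of points. */
    static double averageLinkage(List<Integer> c1, List<Integer> c2, double[][] points) {
        double total = 0.0;
        for (int i : c1) {
            for (int j : c2) {
                total += euclidean(points[i], points[j]);
            }
        }
        return total / (c1.size() * c2.size());
    }

    /** Merges clusters until targetClusters remain; each cluster is a list of point indices. */
    static List<List<Integer>> cluster(double[][] points, int targetClusters) {
        if (targetClusters < 1) {
            throw new IllegalArgumentException("targetClusters must be at least 1");
        }
        // Step 1: every point starts as its own cluster.
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i);
            clusters.add(single);
        }
        // Step 2: repeatedly merge the two closest clusters.
        while (clusters.size() > targetClusters) {
            int bestA = -1;
            int bestB = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double d = averageLinkage(clusters.get(a), clusters.get(b), points);
                    if (d < bestDist) {
                        bestDist = d;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            clusters.get(bestA).addAll(clusters.get(bestB));
            clusters.remove(bestB); // bestB > bestA, so index bestA stays valid
        }
        return clusters;
    }

    public static void main(String[] args) {
        double[][] users = {{5, 4}, {4, 5}, {1, 1}, {1, 2}};
        System.out.println(cluster(users, 2)); // [[0, 1], [2, 3]]
    }
}
```

Users 2 and 3 (distance 1) merge first, then users 0 and 1 (distance about 1.41), leaving two clusters of users with similar tastes.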
CF systems recommend products to target users based on the opinions of other users. These systems employ statistical techniques to find a set of users, known as neighbors, who have a history of agreeing with the target user. For better recommendations, our approach uses both user-based CF and item-based CF.
J2EE is used to create a dynamic web project that uses several libraries and datasets. The front end of the system collects user information, the places users have visited, and their ratings. This information is preprocessed with the MapReduce algorithm and stored as CSV datasets. MapReduce is a technique for handling computations on large amounts of data; it is commonly implemented in Java, and applications implement its mapping and reducing steps.
The mapping step breaks the data into key/value pairs, also known as tuples. The output of the mapping step is then taken as input by the reducing step, whose task is to reduce the tuples to a smaller set of tuples. The reduce phase involves shuffling and sorting the data and then storing the results in datasets.
Algorithm 1: The Map Function
1. for each element m_ij of M do
2.     produce (key, value) pairs ((i,k), (M, j, m_ij)) for k = 1, 2, 3, ..., up to the number of columns of N
3. for each element n_jk of N do
4.     produce (key, value) pairs ((i,k), (N, j, n_jk)) for i = 1, 2, 3, ..., up to the number of rows of M
5. return the set of (key, value) pairs in which each key (i,k) has a list of values (M, j, m_ij) and (N, j, n_jk) for all possible values of j
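Algorithm 1 covers only the map side; in the standard one-pass scheme, the matching reduce step takes each key (i,k), multiplies the M and N values that share the same j, and sums the products to obtain entry p_ik of P = MN. The sketch below simulates both phases in memory for small dense matrices, using 0-based indices instead of the 1-based indices of Algorithm 1. It is not Hadoop code, and the class `MapReduceMatMul` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory simulation of one-pass MapReduce matrix multiplication P = M x N. */
class MapReduceMatMul {

    /** A value tagged with its source matrix and the shared index j. */
    static final class Tagged {
        final char matrix; // 'M' or 'N'
        final int j;
        final double value;

        Tagged(char matrix, int j, double value) {
            this.matrix = matrix;
            this.j = j;
            this.value = value;
        }
    }

    static double[][] multiply(double[][] m, double[][] n) {
        if (m.length == 0 || n.length == 0 || m[0].length != n.length) {
            throw new IllegalArgumentException("columns of M must equal rows of N");
        }
        int rowsM = m.length;
        int inner = n.length;
        int colsN = n[0].length;
        // Map phase (Algorithm 1); grouping by key "i,k" plays the role of the shuffle.
        Map<String, List<Tagged>> groups = new HashMap<>();
        for (int i = 0; i < rowsM; i++) {
            for (int j = 0; j < inner; j++) {
                for (int k = 0; k < colsN; k++) {
                    groups.computeIfAbsent(i + "," + k, key -> new ArrayList<>()).add(new Tagged('M', j, m[i][j]));
                }
            }
        }
        for (int j = 0; j < inner; j++) {
            for (int k = 0; k < colsN; k++) {
                for (int i = 0; i < rowsM; i++) {
                    groups.computeIfAbsent(i + "," + k, key -> new ArrayList<>()).add(new Tagged('N', j, n[j][k]));
                }
            }
        }
        // Reduce phase: for each key (i,k), pair M and N values sharing j, multiply, and sum.
        double[][] p = new double[rowsM][colsN];
        for (Map.Entry<String, List<Tagged>> e : groups.entrySet()) {
            String[] ik = e.getKey().split(",");
            double[] mRow = new double[inner];
            double[] nCol = new double[inner];
            for (Tagged t : e.getValue()) {
                if (t.matrix == 'M') {
                    mRow[t.j] = t.value;
                } else {
                    nCol[t.j] = t.value;
                }
            }
            double sum = 0.0;
            for (int j = 0; j < inner; j++) {
                sum += mRow[j] * nCol[j];
            }
            p[Integer.parseInt(ik[0])][Integer.parseInt(ik[1])] = sum;
        }
        return p;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2}, {3, 4}};
        double[][] n = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(m, n))); // [[19.0, 22.0], [43.0, 50.0]]
    }
}
```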
The correlation between users and places is represented as a user-item matrix. CF systems suggest places to active users based on the opinions of other users. These systems use statistical techniques to find a set of users, known as neighbors, who have a history of agreeing with the target user. Similarity measures such as cosine similarity and the Pearson correlation coefficient are used to calculate the similarities from which the neighbors are identified.
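$$\text{Similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \times \sqrt{\sum_{i=1}^{n} B_i^2}}$$

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \times (y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$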
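Both similarity measures translate directly into code. In this minimal sketch, returning 0 for an all-zero vector (cosine) or a constant vector (Pearson), where the formulas are undefined, is a convention chosen here; the class `SimilarityMeasures` is hypothetical.

```java
/** Hypothetical helpers for cosine similarity and the Pearson correlation coefficient. */
class SimilarityMeasures {

    /** cos(theta) = (A . B) / (||A|| ||B||); returns 0 if either vector is all zeros (convention). */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Pearson r: sum of co-deviations from the means divided by the product of the deviation norms. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0.0 || varY == 0.0) {
            return 0.0; // undefined for constant vectors; 0 is a convention chosen here
        }
        return cov / (Math.sqrt(varX) * Math.sqrt(varY));
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3};
        double[] b = {2, 4, 6};
        System.out.println(cosine(a, b));  // 1.0 up to floating-point rounding
        System.out.println(pearson(a, b)); // 1.0 up to floating-point rounding
    }
}
```

Pearson correlation equals cosine similarity computed after subtracting each vector's mean, which is why it is less sensitive than plain cosine to users who rate everything high or everything low.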
Here, $P_{a,i}$ denotes the prediction for the active user a for place i, $w_{a,u}$ denotes the similarity between the active user a and user u, and K denotes the neighborhood of the most similar users.
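The sketch below computes $P_{a,i}$ with the commonly used mean-centered weighted average, which is an assumed form: $P_{a,i} = \bar{r}_a + \frac{\sum_{u \in K} w_{a,u}(r_{u,i} - \bar{r}_u)}{\sum_{u \in K} |w_{a,u}|}$, where $r_{u,i}$ is user u's rating of place i and $\bar{r}_u$ is user u's mean rating. Storing unrated cells as 0 and taking the similarities $w_{a,u}$ as precomputed are further assumptions, and the class `UserBasedPrediction` is hypothetical.

```java
import java.util.List;

/** Hypothetical sketch of a user-based CF rating prediction (mean-centered weighted average, assumed). */
class UserBasedPrediction {

    static final double UNRATED = 0.0; // assumption: 0 marks a missing rating

    /** Mean of the ratings a user has actually given. */
    static double meanRating(double[] userRatings) {
        double sum = 0.0;
        int count = 0;
        for (double r : userRatings) {
            if (r != UNRATED) {
                sum += r;
                count++;
            }
        }
        return count == 0 ? 0.0 : sum / count;
    }

    /**
     * P(a,i) = mean(a) + sum_{u in K} w(a,u) * (r(u,i) - mean(u)) / sum_{u in K} |w(a,u)|,
     * using only the neighbors in K that have rated place i.
     */
    static double predict(double[][] ratings, int a, int i, List<Integer> neighbors, double[] wA) {
        double num = 0.0;
        double den = 0.0;
        for (int u : neighbors) {
            if (ratings[u][i] == UNRATED) {
                continue;
            }
            num += wA[u] * (ratings[u][i] - meanRating(ratings[u]));
            den += Math.abs(wA[u]);
        }
        double base = meanRating(ratings[a]);
        return den == 0.0 ? base : base + num / den;
    }

    public static void main(String[] args) {
        double[][] ratings = {
            {4, 0, 2}, // active user a = 0 has not rated place 1
            {5, 4, 3},
            {2, 1, 0},
        };
        double[] wA = {1.0, 0.8, 0.2}; // w(0,u), assumed precomputed
        System.out.println(predict(ratings, 0, 1, List.of(1, 2), wA));
    }
}
```

With this toy data the active user's mean rating is 3; neighbor 1 rated place 1 exactly at their own mean of 4, and neighbor 2 rated it 0.5 below their mean of 1.5, so the prediction is 3 + (0.8 · 0 + 0.2 · (-0.5)) / (0.8 + 0.2) = 2.9.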
In item-based CF, the similarity between each pair of items is computed using centered cosine similarity. The prediction function for the rating of place i by the active user a is:
where $w_{i,j}$ denotes the similarity between places i and j, and K denotes the neighborhood of the most similar places rated by user a. Using user-based and item-based CF together helps to overcome issues such as scalability problems and to achieve better performance.
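Item-based prediction with centered cosine similarity can be sketched as follows. Centering subtracts each user's mean rating from the ratings they gave, after which the similarity of two places is the cosine between their columns. The prediction uses the common weighted-average form $P_{a,i} = \frac{\sum_{j \in K} w_{i,j} r_{a,j}}{\sum_{j \in K} |w_{i,j}|}$, which is an assumption, as are storing unrated cells as 0 and keeping only positively similar places in K. The class `ItemBasedPrediction` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of item-based CF with centered cosine similarity. */
class ItemBasedPrediction {

    static final double UNRATED = 0.0; // assumption: 0 marks a missing rating

    /** Subtract each user's mean from the ratings they gave; missing ratings stay 0. */
    static double[][] centerByUser(double[][] ratings) {
        double[][] centered = new double[ratings.length][];
        for (int u = 0; u < ratings.length; u++) {
            double sum = 0.0;
            int count = 0;
            for (double r : ratings[u]) {
                if (r != UNRATED) {
                    sum += r;
                    count++;
                }
            }
            double mean = count == 0 ? 0.0 : sum / count;
            centered[u] = new double[ratings[u].length];
            for (int j = 0; j < ratings[u].length; j++) {
                centered[u][j] = ratings[u][j] == UNRATED ? 0.0 : ratings[u][j] - mean;
            }
        }
        return centered;
    }

    /** Cosine similarity between item columns i and j of the centered matrix. */
    static double itemSimilarity(double[][] centered, int i, int j) {
        double dot = 0.0, ni = 0.0, nj = 0.0;
        for (double[] row : centered) {
            dot += row[i] * row[j];
            ni += row[i] * row[i];
            nj += row[j] * row[j];
        }
        return (ni == 0.0 || nj == 0.0) ? 0.0 : dot / (Math.sqrt(ni) * Math.sqrt(nj));
    }

    /** Predicts r(a,i) from the k places most similar to i that user a rated; returns 0 if none qualify. */
    static double predict(double[][] ratings, int a, int i, int k) {
        double[][] centered = centerByUser(ratings);
        List<Integer> rated = new ArrayList<>();
        for (int j = 0; j < ratings[a].length; j++) {
            if (j != i && ratings[a][j] != UNRATED) {
                rated.add(j);
            }
        }
        // Neighborhood K: the places user a rated, most similar to place i first.
        rated.sort(Comparator.comparingDouble((Integer item) -> itemSimilarity(centered, i, item)).reversed());
        double num = 0.0;
        double den = 0.0;
        for (int j : rated.subList(0, Math.min(k, rated.size()))) {
            double w = itemSimilarity(centered, i, j);
            if (w <= 0.0) {
                continue; // assumption: only positively similar places contribute
            }
            num += w * ratings[a][j];
            den += w; // w > 0 here, so w == |w|
        }
        return den == 0.0 ? UNRATED : num / den;
    }

    public static void main(String[] args) {
        double[][] ratings = {
            {5, 3, 0}, // active user a = 0 has not rated place 2
            {4, 2, 4},
            {5, 3, 5},
            {1, 4, 2},
        };
        System.out.println(predict(ratings, 0, 2, 2));
    }
}
```

In this toy matrix, place 1 turns out to be negatively similar to place 2 after centering, so only place 0 contributes and the prediction equals user 0's rating of place 0, namely 5 (up to floating-point rounding).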
The algorithms described above are applied to a dataset of Yelp reviews. Yelp is a business directory service and crowd-sourced review forum, i.e., a platform on which users post reviews and rate the products and services provided to them. The large Yelp dataset can be downloaded from Kaggle. Data for a large number of users is retrieved; the dataset contains about 5,200,000 user reviews. During preprocessing, the data is extracted from the datasets available for the project, and each user's data is processed into a set of keys and values by a Java program. After data cleaning and preprocessing, a user-item matrix is created.
Users' ratings of the places they have visited are used to calculate the similarities between users, which are then used to cluster the users and to recommend places they would like to visit using user-based and item-based collaborative filtering. The clusters are formed on the basis of similarities calculated using centered cosine. A single matrix is created from the datasets, with rows representing unique users and columns representing unique items. This matrix is then split into two matrices based on the estimated similarities, one for user-based CF and the other for item-based CF.
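Building the single user-item matrix amounts to assigning each unique user a row index and each unique place a column index. This sketch assumes the input is a list of `{userID, placeID, rating}` triples, leaves unrated cells at 0, and keeps the last rating if a user rated the same place twice; the class `UserItemMatrixBuilder` is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: builds the user-item matrix (rows = unique users, columns = unique places). */
class UserItemMatrixBuilder {

    final Map<String, Integer> userIndex = new LinkedHashMap<>();
    final Map<String, Integer> placeIndex = new LinkedHashMap<>();
    final double[][] matrix;

    /** Each triple is {userID, placeID, rating}; unrated cells are left at 0 (assumption). */
    UserItemMatrixBuilder(List<String[]> triples) {
        // First pass: give each new user/place the next free row/column index.
        for (String[] t : triples) {
            userIndex.putIfAbsent(t[0], userIndex.size());
            placeIndex.putIfAbsent(t[1], placeIndex.size());
        }
        matrix = new double[userIndex.size()][placeIndex.size()];
        // Second pass: fill in the ratings (a repeated user/place pair keeps the last rating).
        for (String[] t : triples) {
            matrix[userIndex.get(t[0])][placeIndex.get(t[1])] = Double.parseDouble(t[2]);
        }
    }

    public static void main(String[] args) {
        List<String[]> triples = List.of(
            new String[] {"u1", "p1", "5"},
            new String[] {"u2", "p2", "3"},
            new String[] {"u1", "p2", "4"});
        UserItemMatrixBuilder b = new UserItemMatrixBuilder(triples);
        System.out.println(b.userIndex.keySet() + " x " + b.placeIndex.keySet()); // [u1, u2] x [p1, p2]
        System.out.println(Arrays.deepToString(b.matrix)); // [[5.0, 4.0], [0.0, 3.0]]
    }
}
```

Insertion-ordered maps give the same row and column indices every time the same input is processed, so the matrix can be rebuilt reproducibly during preprocessing.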
The ratings of unvisited places are then predicted from these matrices from the point of view of a particular user. New places to visit are recommended from both matrices, which improves the accuracy and efficiency of the recommendations. When both collaborative filtering algorithms are applied together on the clusters, they can recommend places to the active user efficiently. A large amount of data leads to better results, and datasets with less sparsity (which is handled by clustering) further improve the results.
6. REFERENCES:
12. Jayasuruthi L, Shalini A, Vinoth Kumar V. (2018). "Application of rough set theory in data mining market analysis using rough sets data explorer." Journal of Computational and Theoretical Nanoscience, 15(6-7), pp. 2126-2130.
13. Maithili K, Vinothkumar V, Latha P. (2018). "Analyzing the security mechanisms to prevent unauthorized access in cloud and network security." Journal of Computational and Theoretical Nanoscience, 15, pp. 2059-2063.