Restaurant Recommendation System Using Machine Learning

ISSN 2278-3091

Volume 10, No.3, May - June 2021
Ketan Mahajan et al., International Journal of Advanced Trends in Computer Science and Engineering, 10(3), May - June 2021, 1671 – 1675

International Journal of Advanced Trends in Computer Science and Engineering

Available Online at http://www.warse.org/IJATCSE/static/pdf/file/ijatcse261032021.pdf
https://doi.org/10.30534/ijatcse/2021/261032021

Restaurant Recommendation System using Machine Learning
Ketan Mahajan, Varsha Joshi, Mohini Khedkar, Jacky Galani, Mayuri Kulkarni
SVKM Institute Of Technology, India

ABSTRACT

Nowadays, when going out to a new restaurant or cafe, people usually use websites or applications to look up nearby places and then choose one based on its average rating. But most of the time the average rating is not enough to predict the quality or hygiene of the restaurant, because different people have different perspectives and priorities when evaluating a restaurant. Many online businesses have now implemented personalized recommendation systems, which try to identify user preferences and then provide relevant products to enhance the user's experience. In turn, users can explore what they might like with convenience and ease. Finding an ideal restaurant can still be a struggle because the mainstream recommender apps have not yet adopted the personalized recommender approach. We took up this challenge and aim to build the prototype of a personalized recommender system that incorporates metadata, that is, the information provided by online interactions between customers and restaurants (reviews), which gives a good idea of customers' satisfaction and taste as well as the features of the restaurant. This approach enhances the user's experience of finding a restaurant that suits their taste. This paper uses a package called LightFM (a Python library implementing popular recommendation algorithms) and a dataset from Yelp. Among the different filtering methods, we use hybrid filtering, a combination of content-based filtering (CBF) and collaborative filtering (CF), since its results are more accurate than those of CBF or CF alone. After training and testing on the data, hybrid filtering gives results in the form of personalized recommendations for users.

Key words: Restaurant recommendation system, Content-based filtering, Collaborative Filtering and Hybrid filtering

1. INTRODUCTION

Going out to a new restaurant is a big challenge nowadays: there are a lot of restaurants, and choosing the one whose taste matches a person's needs can be a nightmare. The opinions and feelings of the public about a restaurant's taste and hygiene greatly influence the user's opinion about the restaurant. If a product has many negative reviews, the user's opinion of and trust in that product suffer. While exploring the available offers for a certain product, users appreciate having access to items generated by a recommendation system, as it saves time as well as money, but the recommendation system should only contain the products which suit the user's preference the most. In this paper we have implemented a recommendation system using hybrid filtering, which is the combination of content-based filtering (CBF) and collaborative filtering (CF). CBF models recommend based on the user's own past behavior and not on other users' data. If there is not enough information, CBF cannot discriminate between the items properly and will not perform up to standard. CF, on the other hand, looks at the user's interactions and tries to recommend items similar to those the user interacted with. In CF, data sparsity problems occur because the interactions of many users are insufficient. Both methods therefore have limitations. To avoid these, we use a hybrid approach that combines both methods, and we compare the results of the CF model with those of our hybrid model to see which performs best.

2. RELATED WORK

The available recommendation systems utilize techniques from various fields, such as machine learning, data mining, databases, statistics, similarity testing, etc. Such a system generates predictions of user satisfaction and/or recommends an item to a user. Generally, the recommendation system is
implemented using three main conceptual approaches: collaborative filtering, content-based filtering, and the hybrid approach. Content-based filtering generates a prediction from attributes of items that the user prefers. Collaborative filtering generates a prediction using the similarity of users' tastes and preferences. There are some critical problems in both concepts, such as limited content analysis, overspecialization, the new user problem, and the sparsity problem. Limited content analysis is a limitation of content-based filtering, which can only draw a conclusion from features that are explicitly associated with users. Overspecialization is an overfitting problem where the recommendation system provides high accuracy only on test data, but low accuracy on real data. The new user problem arises from a lack of data with which to provide a new user with an accurate prediction. Several recommendation systems use machine learning techniques to reduce the impact of those problems. A hybrid approach combines the two concepts together. The hybrid recommendation system utilizes various approaches, such as combining separate recommenders, adding content-based characteristics to collaborative models, using multi-criteria, etc.

3. DATASET

The dataset used for this project is taken from Yelp. It consists of three files that we converted to .csv, viz. users.csv, reviews.csv and Business.csv. Given below are the column names from each of the datasets.

Figure 1: User.csv

Figure 2: Reviews.csv

For more details, please check out the dataset. [2]

4. DATA PRE-PROCESSING

To build a recommendation system, we need an interaction matrix between users and items, metadata associated with customers that indicates their taste preferences, and metadata of restaurants that summarizes their characteristics.

The business dataset contains businesses of all categories from over 100 cities. We decided to restrict the dataset to one city, because considering different cities means the items (restaurants) have fewer interactions with each other, so we selected Toronto, which has 10,093 restaurants. Then we explored the restaurant attributes that would potentially be useful for recommendations. We picked three attributes as item features: 1. the rating of the restaurant, 2. the review count of the restaurant, 3. the restaurant categories, since many of the other features have missing values. Yelp has assigned an average of 10 categories/tags to each item (restaurant), and in total, across all restaurants, 436 tags exist, such as breakfast, brunch, seafood, vegetarian, bars and so on. We selected up to 58 tags with the highest popularity, since including tags that only appear a few times among more than 10k restaurants would add more noise to the recommender. [3]

Figure 3: Data pre-processing

Then we calculated the term frequency-inverse document frequency (TF-IDF) values for each tag, which are used as weights in model fitting later.

Figure 4: TF-IDF

In the review dataset, one user may have rated the same restaurant many times over their history. To handle this, we used only the most recent review, as it reflects the user's latest preference.
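The TF-IDF weighting of tags described in the data pre-processing section can be illustrated with a minimal sketch. It assumes that each restaurant is treated as a "document" whose terms are its tags and that the plain tf x log(N/df) form is used; the paper does not state which TF-IDF variant was applied, and the function name tag_tfidf and the sample restaurants are hypothetical.

```python
import math
from collections import Counter


def tag_tfidf(restaurant_tags):
    """Return {restaurant: {tag: weight}} using tf * log(N / df)."""
    n_docs = len(restaurant_tags)
    # Document frequency: number of restaurants that carry each tag.
    df = Counter(tag for tags in restaurant_tags.values() for tag in set(tags))
    weights = {}
    for name, tags in restaurant_tags.items():
        counts = Counter(tags)
        total = sum(counts.values())
        weights[name] = {
            tag: (count / total) * math.log(n_docs / df[tag])
            for tag, count in counts.items()
        }
    return weights


# Hypothetical sample: "brunch" appears everywhere, the other tags only once.
sample = {
    "r1": ["brunch", "bars"],
    "r2": ["brunch", "seafood"],
    "r3": ["brunch", "vegetarian"],
}
tfidf = tag_tfidf(sample)
```

In this toy sample the tag shared by every restaurant receives a weight of zero, while the rarer tags receive positive weights, which is the behaviour that makes TF-IDF useful for down-weighting uninformative tags.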

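Keeping only the most recent review for each user and restaurant, as described for the review dataset, could look roughly like the sketch below. The field names user_id, business_id, stars and date are assumptions made for illustration, as are the function name and the sample rows; the paper does not list the exact columns used.

```python
from datetime import date


def keep_latest_reviews(reviews):
    """Keep only the most recent review per (user_id, business_id) pair."""
    latest = {}
    for review in reviews:
        key = (review["user_id"], review["business_id"])
        # Replace the stored review only if this one is newer.
        if key not in latest or review["date"] > latest[key]["date"]:
            latest[key] = review
    return list(latest.values())


# Hypothetical sample: user u1 reviewed restaurant b1 twice.
reviews = [
    {"user_id": "u1", "business_id": "b1", "stars": 2, "date": date(2015, 3, 1)},
    {"user_id": "u1", "business_id": "b1", "stars": 5, "date": date(2017, 6, 9)},
    {"user_id": "u2", "business_id": "b1", "stars": 4, "date": date(2016, 1, 20)},
]
deduplicated = keep_latest_reviews(reviews)
```

After this step every (user, restaurant) pair contributes a single entry to the interaction matrix.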

There were some cases where users give high ratings to every restaurant. In such cases we subtract the mean rating from each rating and classify the result as positive (value 1) or negative (value -1), with 0 for non-rated restaurants. In this paper, instead of predicting the user's rating for each item (restaurant), we focus on ranking the restaurants the user liked and disliked in the proper order, as predicted ratings would show high variance as time passes. As the final data cleansing step, we selected user characteristics. We chose 4 attributes that are not sparse, viz. the total number of written reviews, the number of useful reviews, whether the user is elite/active on Yelp, and the list of restaurants liked by the user. [3]

• Evaluation Technique (AUC) :-

In our case, the general idea is to figure out the customer's preference, which is more practical and more important than predicting the rating for each restaurant. So, to evaluate the recommenders, we chose AUC (Area Under the Curve) over the Root Mean Square Error, the more traditional measure of performance for explicit recommendation systems. AUC is a decision-support metric that checks only whether an item is preferred or not preferred by the user.

5. METHODOLOGY

I. The basic algorithm is to recommend restaurants (items) that are high in ratings or popular, regardless of the feedback from users or item features. This is a useful method for new customers, for whom only limited user/item information is available to us. Such scenarios are also called cold start scenarios. The restaurants were sorted by number of reviews and rating in descending order, and the top k items were treated as the list of recommendations for all customers in our Yelp data. The results of this model imply that there is a 50% chance for a random user to like the recommended restaurant (the AUC value turned out close to 0.5). This basic model unsurprisingly performs poorly.

II. Next, the LightFM package is used, since it incorporates a matrix factorization model. Matrix factorization decomposes a matrix into two or more matrices such that multiplying those matrices gives back (approximately) the original matrix. In a recommendation system, the typical starting point is a matrix of interactions/ratings between users and items. The matrix factorization algorithm decomposes this matrix into an item feature matrix and a user feature matrix, which are also known as embeddings. These embeddings have the same number of rows, called the latent vector dimension, but the number of columns differs depending on the number of items or users. The latent embeddings capture attributes of users and items, which also represent their taste. As an example, users who like South Indian restaurants would have embeddings similar to those of users who like North Indian restaurants, but these would not resemble the embeddings for Italian food in the vector space. The embeddings are estimated for every feature, and the sum of the embeddings across all features gives the representations for items and users.

III. Example of an interaction matrix: user-movie ratings for a movie recommender (refer to the figure given below).

Figure 5: Example

IV. LightFM offers 4 loss functions for different settings: logistic loss, BPR (Bayesian Personalized Ranking pairwise loss), WARP (Weighted Approximate-Rank Pairwise loss) and k-OS WARP. They are specified in the LightFM package as follows.

• Logistic loss: It is useful when both positive (+1) and negative (-1) interactions are available.
• BPR: Bayesian Personalized Ranking pairwise loss maximizes the prediction difference between a positive example and a randomly chosen negative example. It is useful when only positive interactions are available and optimizing ROC AUC is desired.
• WARP: Weighted Approximate-Rank Pairwise loss maximizes the rank of positive examples by repeatedly sampling negative examples until a rank-violating one is found. It is useful when only positive interactions are present and the goal is to optimize the top of the recommendation list (precision@k).
• k-OS WARP: k-th order statistic loss, a modification of WARP that uses the k-th positive example for any given user as the basis for pairwise updates.

V. When comparing WARP and BPR, WARP generally performs better than BPR. However, WARP keeps sampling negatives until it finds a violation, which means that when the rank is not violated, WARP takes more time to train. As more iterations (epochs) are trained it becomes slower, because violations become harder to find. Therefore, setting a cut-off value for the search is necessary when training with WARP loss. To know which loss gives the best result, we compare the 3 losses BPR, logistic and WARP.
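The negative sampling with a cut-off that WARP (Weighted Approximate-Rank Pairwise loss) relies on can be sketched in a few lines. This is only an illustration of the general idea under stated assumptions: the margin of 1, the rank estimate (n_items - 1) // n_sampled and the logarithmic weight follow a common formulation of WARP and are not taken from the paper; the function name warp_sample and its parameters are hypothetical, and the actual LightFM internals may differ.

```python
import math
import random


def warp_sample(scores, positive, negatives, max_sampled=10, margin=1.0, rng=random):
    """Sample negatives until one violates the margin against the positive item.

    Returns (violating_item, weight), or (None, 0.0) when the cut-off
    max_sampled is reached before any violation is found.
    """
    n_items = len(scores)
    for n_sampled in range(1, max_sampled + 1):
        candidate = rng.choice(negatives)
        if scores[candidate] > scores[positive] - margin:
            # A violation found after few draws suggests the positive item is
            # ranked low, so the update receives a larger weight.
            estimated_rank = (n_items - 1) // n_sampled
            return candidate, math.log(max(1.0, estimated_rank))
    return None, 0.0


# Positive item 0 is already far ahead: no violation within the cut-off.
well_ranked = warp_sample([5.0, 0.1, 0.2, 0.3], positive=0, negatives=[1, 2, 3])

# Positive item 0 is ranked last: the first sampled negative is a violation.
item, weight = warp_sample([0.0, 1.0, 2.0, 3.0], positive=0, negatives=[1, 2, 3])
```

The first call shows why a cut-off matters: once the positive item is well ranked, every draw fails to find a violation, and without max_sampled the search would not terminate.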
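The AUC used to evaluate the recommenders can be read as the probability that a randomly chosen liked restaurant is scored above a randomly chosen other restaurant for the same user. The sketch below computes it for a single user by counting pairs; the function name user_auc and the sample scores are hypothetical, and counting ties as one half is an assumption of this sketch.

```python
def user_auc(scores, positives):
    """Fraction of (positive, negative) item pairs ranked correctly for one user.

    scores maps item -> predicted score; positives is the set of liked items.
    Ties between a positive and a negative item count as half a correct pair.
    """
    pos = [s for item, s in scores.items() if item in positives]
    neg = [s for item, s in scores.items() if item not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return correct / (len(pos) * len(neg))


# Hypothetical scores for four restaurants; the user liked "a" and "c".
ranked = user_auc({"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}, {"a", "c"})

# A model that cannot tell items apart scores every pair as a tie.
uninformed = user_auc({"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}, {"a", "c"})
```

An uninformed model ends up at 0.5, which matches the interpretation given for the popularity baseline, whose AUC turned out close to 0.5.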

6. MODEL ESTABLISHMENT

To run LightFM in pure CF mode, the model is fitted on the interaction matrix only. This guarantees that the model accesses only the interaction information and no other metadata. We split the data into train and test interactions in the ratio of 20:80 and ensured that the two sets were completely disjoint. After that, we initialized some random parameters and fitted the model using different losses: logistic, BPR and WARP. As the parameters were randomly selected, a hyperparameter search was required to get the best model. For that we used the scikit-optimize package to search over a range of values for the various hyperparameters and judged them on the resulting average AUC score.

For hybrid matrix factorization, user features and item features were fitted along with the interaction matrix. The item feature matrix has dimensions 10093 x 10156 (no. of items x no. of features). LightFM creates a unique feature for every item (restaurant); therefore each row of the item feature matrix contains the LightFM unique feature, the rating, the total number of reviews and the tags (58). The LightFM package normalizes each row to bring each value into the range (0,1), but this can create issues for features with high values, such as review count. To account for this, we calculate log(feature value / max value of that feature) to normalize such features. Similarly, we created a matrix of users and their features (76367 x 76406). We also applied regularization to control overfitting by adding item alpha and user alpha to the model. A hyperparameter search was then performed to get the best model.

7. RESULT

I. AUC Result :-
The figure given below shows the mean AUC scores. Logistic loss did not perform well. WARP outperforms the other losses in both the collaborative and the hybrid method. BPR looks fine on training data but performs badly on test data, which indicates overfitting.

Figure 6: AUC Result.

II. Demo :-
The figure given below shows restaurant recommendations for user 1 and user 17. The number of 'Known Positive' is the number of restaurants that the user is connected with; on that basis, the model recommends 5 restaurants to each user.

III. Tag Similarity :-
The LightFM package creates user and item feature embeddings that capture the similarity between tags. This makes the model highly efficient. Embeddings produced by LightFM encode important semantic information about features (tags), which is useful in the recommendation process for grouping the tags into categories. The figure shows tags together with their semantically similar tags.

Figure 7: Tags.

8. CONCLUSION

This paper proposes a user-preference-based restaurant recommendation system using the Yelp dataset and the LightFM package. The goal of the paper is to provide the best recommendation system by considering text-based reviews. Our study shows that the hybrid filtering technique gives the best performance, as we compared the performance of the hybrid model with the collaborative filtering model under different loss functions from the LightFM package. Collaborative filtering uses just the basic interaction information between users and items. Hybrid filtering considers this basic information and also utilizes item and user metadata, which makes it perform better in the learning-to-rank setting and better for cold start problems.

REFERENCES

1. Mara-Renata Petrusel, Sergiu-George Limboi, "A Restaurants Recommendation System: Improving Rating Predictions using Sentiment Analysis", 21st International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC-2019.
2. https://www.kaggle.com/datafiniti/hotel-reviews
3. Asyush Singh, "Solving business usecases by recommender system using lightFM", Towardsdatascience.com, 2018.
4. Nanthaphat Koetphrom, Panachai Charusangvittaya, Daricha Sutivong, "Comparing Filtering Techniques in Restaurant Recommendation System", Department
of Industrial Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand, 2018.
5. R. M. Gomathi, P. Ajitha, G. Hari Satya Krishna, Harsha Pranay, "Restaurant Recommendation System for User Preference and Services Based on Rating and Amenities", Second International Conference on Computational Intelligence in Data Science, ICCIDS-2019.
6. P. Murugavel, Dr. M. Punithavalli, "Improved Hybrid Clustering and Distance-based Technique for Outlier Removal", IJCSE, 2011, Vol. 3.
