Fake Review Detection
Abstract—Fake review detection, and the elimination of fake reviews from a given dataset using different Natural Language Processing (NLP) techniques, is important in several respects. In this article, a fake review dataset is used to train two different Machine Learning (ML) models that predict how genuine the reviews in a given dataset are. The rate of fake reviews in the E-commerce industry and on other platforms is increasing, as buyers depend on product reviews for items found online on different websites and applications. A company's products are trusted before a purchase is made, so this fake review problem must be addressed: large E-commerce companies such as Flipkart, Amazon, etc. can rectify the issue by eliminating fake reviewers and spammers, preventing users from losing trust in online shopping platforms. The model presented here can be used by websites and applications with a few thousand users, where it predicts the authenticity of each review, based on which website owners can take the necessary action. The model is developed using Naïve Bayes and Random Forest methods; by applying these models, one can instantly know the number of spam reviews on a website or application. To counter such spammers, a sophisticated model trained on millions of reviews is required. In this work, the "Amazon Yelp dataset" is used to train the models; only a very small subset is used for small-scale training here, and the approach can be scaled to obtain high accuracy and flexibility.

Keywords—opinion mining, sentiment analysis, text mining

I. INTRODUCTION

Online review posting has grown at a fast rate, and people buy almost everything online to be delivered at their doorsteps. Since buyers cannot physically inspect a product when buying online, they depend heavily, wantedly or unwantedly, on the reviews of other buyers; these reviews must be made as truthful as possible so that buyers are not cheated by fake reviewers or spammers time and again. The problem is simple to state yet tiring to solve: going through every review to mark it as fake or ambiguous must be done systematically to get to the root of the problem. It can be addressed by training an ML model on the review section to flag a particular review as genuine or spam; interestingly, spammers who never used the product can be caught this way. Spam reviews posted under different customer IDs to falsely inflate a product's rating can be filtered by checking the use of words like "awesome", "so good", "fantastic", etc., since such reviews tend to hype the product or to emulate genuine reviews by repeating the same words again and again to make an impact on the buyer (a minimal sketch of this word-usage filter appears at the end of this section). Hence spam filtering requires huge amounts of training data to be effective, together with added domain knowledge: users write sarcastic sentences to show their dissent towards a product, and sometimes the product is good but the delivery or the packing is not, both of which affect review classification. Here, an NLP technique is used to identify such reviews instead of misclassifying them as negative reviews, as plain sentiment analysis would; removing unwanted or outdated product reviews is handled in data pre-processing.

The focus of this research is to create an online E-commerce environment in which consumers trust the platform: the products they purchase are genuine, the feedback posted on these websites/applications is true, and reviews are checked regularly by the company as the number of users grows day by day. Companies like Twitter, WhatsApp, and Facebook use sentiment analysis to check fake news and harmful/derogatory posts and to ban such users/organizations from their platforms. In parallel, E-commerce (Flipkart, Amazon), hotel booking (Trivago), logistics and tourism (TripAdvisor), job search (LinkedIn, Glassdoor), and food delivery (Swiggy, Zomato) platforms use algorithms to tackle fake reviews and the spammers who deceive consumers into buying below-average products/services, and users can be alerted about a spammer with a label like "not verified profile" so that they need not worry about such false users.

Manual labelling of reviews is time-consuming and less effective in practice, so hand-labelling reviews and then predicting labels with a supervised learning model is not feasible at scale. For example, Mukherjee et al. [10] manually labelled 2431 reviews over 8 weeks; automated review labelling, as proposed by Saumya et al. [8], saves this time and energy. Some businesses pay for fake reviews of their products and services, in which case it is not possible to manually label a review as spam or not spam. Amazon's "Yelp" dataset has 30% to 40% spam reviews. Feature selection is an important aspect of building and training these models. In this work, two models are compared to justify their performance on this "Amazon Yelp" dataset and their relevance for deployment in real-time software. The Random Forest (RF) model performed well compared to the Naïve Bayes algorithm, by a large margin, in the fake review data analysis. The fake review detection problem is addressed fairly and gives a fair insight into its legality and need. The purpose is to select a suitable algorithm to fulfill the task of fake review detection and its elimination.
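As promised above, here is a minimal sketch of the word-usage filter in Python; the hype-word list and the 0.3 threshold are illustrative assumptions, not values taken from the paper:

    # Minimal sketch: flag reviews dominated by generic hype words.
    # The word list and the threshold are illustrative assumptions.
    HYPE_WORDS = {"awesome", "fantastic", "amazing", "best", "perfect", "great"}

    def hype_ratio(review: str) -> float:
        """Fraction of tokens in the review that are hype words."""
        tokens = [t.strip('.,!?').lower() for t in review.split()]
        if not tokens:
            return 0.0
        return sum(t in HYPE_WORDS for t in tokens) / len(tokens)

    def flag_review(review: str, threshold: float = 0.3) -> bool:
        """Mark a review as suspicious when hype words dominate it."""
        return hype_ratio(review) >= threshold

    print(flag_review("Awesome awesome product, best best best!"))      # True
    print(flag_review("Sturdy build, but the packaging was damaged."))  # False

A real system would learn such patterns from labelled data rather than hard-code them, which is what the trained models discussed later in the paper do.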
The rest of the article is arranged as follows: Section II describes the related work; Section III explains the methods and dataset; Section IV depicts the model outline and working; Section V demonstrates the results and analysis; finally, Section VI draws the conclusion of the research work along with its future scope.

II. RELATED WORK

Previous analysis has been done on views expressed by users through text, blogs, reviews, feedback, etc.; these opinions are unique to compute and study in order to obtain relevant information, which is nothing but sentiment analysis.

Existing research used a two-step approach with an SVM classifier for classifying tweets [1]; others used emoticons, smileys, and hashtags to classify labels into multiple sentiments [2]. Another researcher trained an SVM classifier on data labelled using emoticons [3].

2.1 Existing systems:

2.1.1 Lexicon-based methods: based on counting the number of positive and negative words in a sentence, applied to Twitter (a small sketch of this idea follows below).

Neural networks have also been used to convert raw review text into vectors, which are in turn used as features to locate spam reviews [12].
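As a concrete illustration of the lexicon-based method above, here is a minimal sketch in Python; the tiny word lists are illustrative assumptions, since real systems rely on full sentiment lexicons:

    # Minimal sketch of lexicon-based scoring: count positive and
    # negative words in a sentence. The word lists are illustrative
    # assumptions; real systems use large sentiment lexicons.
    POSITIVE = {"good", "great", "love", "excellent", "happy"}
    NEGATIVE = {"bad", "poor", "hate", "terrible", "broken"}

    def lexicon_score(sentence: str) -> int:
        """Positive minus negative word count; >0 positive, <0 negative."""
        tokens = [t.strip('.,!?').lower() for t in sentence.split()]
        return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

    print(lexicon_score("Great phone, love the battery"))      # 2
    print(lexicon_score("Terrible packaging, broken screen"))  # -2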
III. METHODS

1) Dataset

The dataset used is the "Amazon academic review" dataset, which contains reviews, useful votes, ratings, user IDs, and many other attributes; the useful parameters are retrieved for feature engineering. The dataset contains thousands of original and fake reviews mixed together, which makes it easy to assess the accuracy of the model implemented on it. The Yelp dataset released for the academic challenge contains information for 11,537 businesses; it includes 8,282 check-in sets, 43,873 users, and 229,907 reviews for these businesses (www.yelp.com/dataset). The dataset is challenging since it contains a large set of varied reviews and parameters for training any algorithm.

2) Pre-processing

Pre-processing is the first step in analyzing any dataset. It includes removing unnecessary attributes, punctuation, stop words, missing words, redundant words, etc., to clean the dataset for training purposes and thereby ensure proper training of the model.
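A minimal sketch of this pre-processing step, assuming pandas and NLTK are available; the file name and the review_text column are hypothetical placeholders, not names taken from the paper's dataset:

    # Minimal pre-processing sketch: drop duplicate/missing rows,
    # strip punctuation, and remove stop words. File and column
    # names are hypothetical placeholders.
    import string

    import pandas as pd
    from nltk.corpus import stopwords  # requires nltk.download('stopwords')

    STOP_WORDS = set(stopwords.words('english'))

    def clean_text(text: str) -> str:
        """Lowercase, strip punctuation, and drop stop words."""
        text = text.lower().translate(str.maketrans('', '', string.punctuation))
        return ' '.join(t for t in text.split() if t not in STOP_WORDS)

    df = pd.read_csv('amazon_yelp_reviews.csv')    # hypothetical file
    df = df.drop_duplicates(subset='review_text')  # redundant reviews
    df = df.dropna(subset=['review_text'])         # missing text
    df['clean_text'] = df['review_text'].apply(clean_text)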
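The two models the paper trains are Naïve Bayes and Random Forest. Since the model-building details do not survive in this copy, the following is only a hedged sketch of how such a comparison is commonly set up, assuming scikit-learn, TF-IDF features, and a hypothetical label column marking spam reviews; df and clean_text continue the pre-processing sketch above:

    # Hedged sketch of a Naive Bayes vs. Random Forest comparison.
    # TF-IDF features, the split, and the 'label' column are assumptions
    # for illustration; the paper's exact setup is not shown in this copy.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB

    X_train, X_test, y_train, y_test = train_test_split(
        df['clean_text'], df['label'], test_size=0.2, random_state=42)

    vectorizer = TfidfVectorizer(max_features=5000)
    X_train_vec = vectorizer.fit_transform(X_train)
    X_test_vec = vectorizer.transform(X_test)

    for name, model in [("Naive Bayes", MultinomialNB()),
                        ("Random Forest", RandomForestClassifier(n_estimators=100))]:
        model.fit(X_train_vec, y_train)
        preds = model.predict(X_test_vec)
        print(f"{name}: accuracy = {accuracy_score(y_test, preds):.3f}")

Naïve Bayes applies Bayes' rule under a conditional-independence assumption over the word features, while the Random Forest votes over many decision trees; the paper's conclusion reports the latter winning by a large margin.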
Applying the models to diverse applications, they perform well in certain fields and are incompatible in some areas; hence their application needs some experience.

VI. CONCLUSION AND FUTURE SCOPE

The results discussed in this article compare the two models developed, in order to justify their performance on this "Amazon Yelp" dataset and their relevance for deploying these models in real-time software. The Random Forest model performed well compared to the Naïve Bayes algorithm, by a large margin. The fake review detection problem is addressed fairly and gives a fair insight into its legality and need; the purpose is to select an algorithm to fulfill the task of fake review detection and its elimination. In future work, hybrid models and new models can be tried for the fake review detection task. By using Google Colab and an NVIDIA GPU, the research can speed up the execution process.

References

[1] Barbosa, Luciano & Feng, Junlan (2010). Robust Sentiment Detection on Twitter from Biased and Noisy Data. Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference, 2, 36-44.
[2] Davidov, Dmitry & Tsur, Oren (2010). Enhanced Sentiment Learning Using Twitter Hashtags and Smileys. Institute of Computer Science, The Hebrew University.
[3] Go, Alec, Bhayani, Richa & Huang, Lei (2009). Twitter Sentiment Classification Using Distant Supervision. Processing, 150.
[4] Patel, Dhairya, Kapoor, Aishwerya & Sonawane, Sameet (2018). Fake Review Detection Using Opinion Mining. International Research Journal of Engineering and Technology (IRJET), 5(12).
[5] Ravi, K. & Ravi, V. (2015). A Survey on Opinion Mining and Sentiment Analysis: Tasks, Approaches and Applications. Knowledge-Based Systems, 89, 14-46.
[6] Khan, K. et al. (2014). Mining Opinion Components from Unstructured Reviews: A Review. Journal of King Saud University - Computer and Information Sciences. http://dx.doi.org/10.1016/j.jksuci.2014.03.009.
[7] Wahyuni, Eka Dyar & Djunaidy, Arif (2016). Fake Review Detection from Product Review Using Modified Method of Iterative Computation Framework. MATEC Web of Conferences, 58, 03003, BISSTECH 2015.
[8] Saumya, S. & Singh, J.P. (2018). Detection of Spam Reviews: A Sentiment Analysis Approach. CSIT, 6, 137-148. https://doi.org/10.1007/s40012-018-0193-0.
[9] Xie, S., Wang, G., Lin, S. & Yu, P.S. (2012). Review Spam Detection via Temporal Pattern Discovery. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp. 823-831.
[10] Mukherjee, A., Venkataraman, V., Liu, B. & Glance, N. (2013). Fake Review Detection: Classification and Analysis of Real and Pseudo Reviews. Technical Report UIC-CS-2013-03, University of Illinois at Chicago.
[11] Heydari, A., Tavakoli, M. & Salim, N. (2016). Detection of Fake Opinions Using Time Series. Expert Systems with Applications, 58, 83-92.
[12] Ren, Y. & Ji, D. (2017). Neural Networks for Deceptive Opinion Spam Detection: An Empirical Study. Information Sciences, 385, 213-224.
[13] McCallum, Andrew. Graphical Models, Lecture 2: Bayesian Network Representation (PDF). Retrieved 22 October 2019.
[14] Joseph, S. I. T. (2019). Survey of Data Mining Algorithms for Intelligent Computing Systems. Journal of Trends in Computer Science and Smart Technology (TCSST), 1(01), 14-24.