Role of Machine Learning in Fake Review Detection
Abstract— In today's culture, growing technology promotes many products and events in a very positive way, and technology usage in the current generation has reached new heights. But a technology that brings so much benefit also has negative uses, and one of them is fake reviews. Fake reviews weaken the actual worth of a product. To be more specific, reviews can be divided into two categories: legitimate reviews and fake reviews written intentionally to damage the product or brand value. Machine learning algorithms, on the other hand, are extensively used, and incorporating machine learning techniques into the classification of reviews is considered an excellent combination. In this work, various datasets from different industries, such as the airline, movie, and food industries, are considered, and fake reviews are classified using various machine learning algorithms, including K-Nearest Neighbors, Naive Bayes, Random Forest, Decision Tree, Support Vector Machine, and Logistic Regression. Some reviews can be decoded using sentiment analysis from Natural Language Processing; sentiment analysis is used to find the emotion in a text. The accuracy is analyzed for all the implemented models. The results demonstrate that the support vector machine technique gives higher accuracy than the other machine learning classification techniques.

Keywords—Fake reviews, Machine learning, Natural language processing, Sentiment analysis

reviews are polluted with fake reviews? How can a user decide what to buy and what not to buy? Hence there is a lot of demand for fake review detection mechanisms. A fake review detection system separates the reviews that are genuine from those that are not. Some reviews are truthful about the product, and some mislead the buyer. Different datasets are taken from different industries. When garbage data is fed to the algorithms, the results may not be appropriate. Raw data comes in different forms, so when several datasets are considered, the raw data should be converted into a single form. Pre-processing techniques are then applied to the raw data to convert it into a form that fits the algorithms. Data quality assessment, data cleaning, data transformation, and data reduction are a few steps in data pre-processing.

Any spoken sentence carries its own emotion, which is easy to understand. Determining the emotion of a written sentence is the real task. For that, natural language processing techniques are used to learn more about the emotion of a sentence or of a particular word. Stop-word lists are frequently employed in text mining and natural language processing (NLP) to filter out terms that are overused and provide very little valuable information; "a", "the", "is", and "are" are some of the stop words. Tokenization is another technique used: it is the process of splitting a string or text into a list of tokens. For example, a sentence is a token in a paragraph. Classification techniques such as Naive Bayes [1], Decision Tree, and SVM [11] are available; similarly, a linear SVM is also used. In some of the papers, the authors used multidimensional feature engineering for better results [9].

Many small-scale industries rely entirely on word of mouth, also called reviews, for their business. Industries such as the movie industry and Amazon shopping [5] get most of their revenue from positive word of mouth. Fake reviews mislead the common audience or user and discourage them from giving a product a try.
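The tokenization and stop-word filtering described above can be illustrated with a minimal sketch. It uses only the Python standard library; the stop-word list, the helper names (`split_sentences`, `tokenize`, `remove_stop_words`), and the sample review are assumptions made for illustration and are not taken from the paper's implementation.

```python
import re

# Hypothetical, deliberately tiny stop-word list; NLP toolkits ship much longer ones.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "it"}

def split_sentences(paragraph):
    """Treat each sentence as one token of the paragraph."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop overused words that carry little information."""
    return [t for t in tokens if t not in stop_words]

# Hypothetical sample review.
review = "The food is great. The staff are friendly and the prices are fair!"
sentences = split_sentences(review)
content_tokens = remove_stop_words(tokenize(review))

print(sentences)
print(content_tokens)
```

After filtering, only the content-bearing words of the review remain, which is the form typically passed on to sentiment analysis or a classifier.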
also the user experience and present the positive reviews in any platform. A model is proposed with the help of various machine learning techniques [2, 12]. Classifiers are applied to classify the reviews into fake and genuine reviews. Any user who goes online and checks products asks for genuine reviews. The proposed model addresses this issue for the user.

II. LITERATURE WORK

The authors of [1] used TF-IDF to efficiently distinguish false and true hotel reviews using a dataset of gold-standard hotel ratings. They discussed three classifiers in their work: Naive Bayes, logistic regression, and support vector machines. Validation-set results were obtained using a multinomial Naive Bayes classifier [1].

The authors took a movie review dataset and applied several machine learning algorithms: K*, Naive Bayes, SVM, and KNN. After testing all the algorithms, SVM achieved the best accuracy among the classification algorithms [2].

The authors surveyed various papers to find out which algorithm gives the most accurate results, covering Naive Bayes from machine learning as well as LSTM, bidirectional LSTM, and GRNN. Naive Bayes gave the highest accuracy among the machine learning algorithms, and LSTM gave the highest accuracy among the other techniques; a bidirectional LSTM reached 98.9% accuracy in deep learning for filtering words. They also examined maximum entropy, KNN, and K-star, checked various publications, and concluded that Naive Bayes had the highest accuracy among them [3, 13].

The authors used the LIAR dataset, applied pre-processing techniques for sentiment analysis, and then used several learning techniques: RNN, CNN, LSTM, GRU, logistic regression, and SVM. Among these, CNN performed best with a test accuracy of 0.270; the other test accuracies were SVM (0.255), logistic regression (0.247), Bi-LSTM (0.233), GRU (0.217), and LSTM (0.2166) [4].

The authors used the Amazon Review Data (2018) dataset to analyse up to 10M reviews from Amazon.com in an effort to identify different sorts of opinion spam. The fact that they automatically labelled totally copied and nearly replicated reviews as false reviews undermines the legitimacy of the results, even though they attained a respectable performance. To create false product reviews, they used two language models: ULMFiT (Universal Language Model Fine-tuning) and GPT-2. The authors assert that, of the four prediction sources, the fake RoBERTa model performed the best; compared to the other ML models, the OpenAI model fared much worse [5].

The WEKA tool (Waikato Environment for Knowledge Analysis) is used in data mining jobs; it is a collection of machine learning algorithms. Categorization, regression, clustering, association rules, and visualisation are all methods for processing data. The authors used the NB, DT-J48, LR, and SVM algorithms to analyse Amazon review datasets [6].

Using the Restaurant Dataset, the authors report the following test results: the Decision Tree has the best training accuracy, followed by XGBT, SVMs, Random Forest, and MLP, using Doc2Vec document embedding. After hyperparameter adjustment, stand-alone classifiers obtain up to 68.2% accuracy in the case of MLP. An AdaBoost ensemble of MLPs allows ensemble learning-based classifiers to reach up to 77.3% accuracy [7].

The authors use the Amazon dataset and study fake review detection with a convolutional neural network model that integrates the product-related features with the person who owns the product. They used a bagging model to reduce overfitting and high variance, and obtained their results using bigrams and trigrams [8].

The authors used a manually annotated dataset and studied fake review detection using multidimensional feature engineering. To recognise fake reviews, six feature criteria are created. The relativity of review items and content is determined by:
(1) Analysing the reviews for product characteristics.
(2) Creating word vectors from product reviews depending on the features of the item.
(3) Using the chi-square (χ²) statistical approach to choose the correlated product characteristics [9].

The author combined actual and seemingly fraudulent reviews. According to the author's study, behavioural and contextual aspects are crucial for spotting phoney reviews. The study made use of the crucial reviewer behaviour trait known as "reviewer deviation." The NNC, LTC, and BM25 term weighting schemes were all tested. As per the author's observation, BM25 beat the other term weighting schemes [10].

The authors took a tourism hotel review dataset. They used a Support Vector Machine model for fake review detection, and for the second analytical component a spelling checker software tool was developed according to their usage. They used Python for programming the software [11].

III. PROPOSED WORK

The literature reveals that fake review detection is an important research issue because it has a great impact on users' daily lives. The literature also demonstrates that machine learning techniques play a vital role in fake review detection. This research study applies multiple machine learning techniques to perform fake review detection. The results are verified with multiple datasets. This work demonstrates the role of machine learning techniques in fake review detection.

A. Architecture

Figure 1 represents the working model of the proposed fake review detection using machine learning techniques. The first step is to take a dataset and perform the data mining techniques, which are cleaning, clustering, and classification. The next step is to proceed with the NLP techniques, which are stop-word removal and tokenization. Then the sentiment analysis is done. Finally, training and testing are performed using six machine learning techniques.
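The training and testing stage of the architecture can be sketched as follows. This is only an illustration under stated assumptions: a multinomial Naive Bayes classifier with add-one smoothing, written from scratch with the standard library, stands in for one of the six techniques; the toy reviews, their labels, and the helper names (`train_naive_bayes`, `predict`, `evaluate`) are hypothetical; the sentiment analysis step is omitted. It is not the authors' implementation.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case a review and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train_naive_bayes(samples):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def evaluate(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return precision, recall, f1, accuracy

# Hypothetical toy data; a real experiment would load and split a review dataset.
train = [
    ("best product ever buy now amazing amazing", "fake"),
    ("amazing deal buy now best best", "fake"),
    ("battery lasts two days screen scratches easily", "genuine"),
    ("delivery was late but the battery works fine", "genuine"),
]
test = [
    ("amazing best buy now", "fake"),
    ("the screen works fine and battery lasts", "genuine"),
    ("best battery ever", "genuine"),
]

model = train_naive_bayes(train)
y_true = [label for _, label in test]
y_pred = [predict(model, text) for text, _ in test]
precision, recall, f1, accuracy = evaluate(y_true, y_pred, positive="fake")
print(y_pred, precision, recall, round(f1, 3), round(accuracy, 3))
```

The same `evaluate` helper could be reused to compare any other classifier that is trained on the same split, so that precision, recall, F1-score, and accuracy are computed in one consistent way for every model.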
TABLE I
MODELS PERFORMANCE ANALYSIS: PRECISION, RECALL, F1-SCORE AND SUPPORT

Model                    Class  Precision  Recall  F1-score  Support
Naïve Bayes Classifier   OR     0.89       0.80    0.84      7119
                         CG     0.81       0.90    0.85      7032
Support Vector Machine   OR     0.88       0.89    0.89      7119
                         CG     0.89       0.88    0.88      7032
Decision Tree            OR     0.74       0.72    0.73      7119
                         CG     0.72       0.75    0.74      7032
K-Nearest Neighbors      OR     0.88       0.18    0.29      7119
                         CG     0.54       0.97    0.69      7032

TABLE 2: PERFORMANCE ANALYSIS OF WORKING MODELS IN FAKE REVIEW DETECTION

S.No  Algorithm                Accuracy (%)
1     Logistic Regression      86.33
2     Random Forest            83.92
3     Naïve Bayes Classifier   84.65
4     Support Vector Machine   88.48
5     Decision Tree            73.27
6     K-Nearest Neighbors      57.30

Fig. 3 represents the accuracy values of all the implemented models in graphical form. The graphical representation makes it easy to understand and analyze the performance of the models. The X-axis shows the various algorithms, and the Y-axis shows the accuracies.

Fake reviews reduce the purchase of products by customers. A system that removes fake reviews and junk from the reviews by using effective machine learning methods gives an edge to product owners. We obtained the accuracies of the six machine learning techniques we used, and the best accuracy comes from the SVM classifier.

VII. FUTURE WORK

We need to develop a more efficient model that has much higher accuracy and identifies fake reviews more accurately. We will also focus on removing those fake reviews so that customers can truly rely on the reviews to buy their desired product. We therefore want to introduce deep learning techniques into the existing system to get more accurate results.

REFERENCES

[1] R. Hassan and M. R. Islam, "A Supervised Machine Learning Approach to Detect Fake Online Reviews," 2020 23rd International Conference on Computer and Information Technology (ICCIT), 2020, pp. 1-6, doi: 10.1109/ICCIT51783.2020.9392727.
[2] L. Gutierrez-Espinoza, F. Abri, A. Siami Namin, K. S. Jones and D. R. W. Sears, "Ensemble Learning for Detecting Fake Reviews," 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), 2020, pp. 1320-1325, doi: 10.1109/COMPSAC48688.2020.00-73.
[3] J. C. Rodrigues, J. T. Rodrigues, V. L. K. Gonsalves, A. U. Naik, P. Shetgaonkar and S. Aswale, "Machine & Deep Learning Techniques for Detection of Fake Reviews: A Survey," 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), 2020, pp. 1-8, doi: 10.1109/ic-ETITE47903.2020.063.
[4] Girgis, Sherry, Eslam Amer, and Mahmoud Gadallah. "Deep learning algorithms for detecting fake news in online text." 2018 13th International Conference on Computer Engineering and Systems (ICCES). IEEE, 2018.
[5] Salminen, Joni, et al. "Creating and detecting fake reviews of online products." Journal of Retailing and Consumer Services 64 (2022): 102771.
[6] Elmurngi, Elshrif & Gherbi, Abdelouahed. (2018). Unfair reviews detection on Amazon reviews using sentiment analysis with supervised learning techniques. JCS. 14. 714-726. 10.3844/jcssp.2018.714.726.
[7] L. Gutierrez-Espinoza, F. Abri, A. Siami Namin, K. S. Jones and D. R. W. Sears, "Ensemble Learning for Detecting Fake Reviews," 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), 2020, pp. 1320-1325, doi: 10.1109/COMPSAC48688.2020.00-73.
[8] Sun, Chengai, Qiaolin Du, and Gang Tian. "Exploiting product related review features for fake review detection." Mathematical Problems in Engineering 2016 (2016).
[9] Wang, Ge, et al. "Fake Review Identification Methods Based on Multidimensional Feature Engineering." Mobile Information Systems 2022 (2022).
[10] Kumar, Jay. "Fake Review Detection Using Behavioral and Contextual Features." arXiv preprint arXiv:2003.00807 (2020).
[11] Möhring, Michael, et al. "HOTFRED: A Flexible Hotel Fake Review Detection System." Information and Communication Technologies in Tourism 2021. Springer, Cham, 2021. 308
[12] Kumar, C. N., Keerthana, D., Kavitha, M., & Kalyani, M. (2022, June). Customer Loan Eligibility Prediction using Machine Learning Algorithms in Banking Sector. In 2022 7th International Conference on Communication and Electronics Systems (ICCES) (pp. 1007-1012). IEEE.
[13] Kavitha, M., Srinivas, P. V. V. S., Kalyampudi, P. L., & Srinivasulu, S. (2021, September). Machine Learning Techniques for Anomaly Detection in Smart Healthcare. In 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA) (pp. 1350-1356). IEEE.

ABOUT AUTHOR

Pemmasani Manish Kumar is studying B. Tech in the Department of Computer Science and Engineering at Koneru Lakshmaiah Education Foundation. His research interests are Artificial Intelligence, Cloud Computing, and Internet of Things. He is certified as Amazon Web Services Solution Architect Associate, Oracle Foundation Associate, and Aviatrix Multi-Cloud Networking Associate.

Shri Harrsha Samala is studying B. Tech in the Department of Computer Science and Engineering at Koneru Lakshmaiah Education Foundation. His research interests are Artificial Intelligence, Cloud Computing, and Internet of Things. He is certified as Amazon Web Services Solution Architect Associate, Aviatrix Multi-Cloud Networking Associate, and Wipro Talent Next.

Konda Abhiram is studying B. Tech in the Department of Computer Science and Engineering at Koneru Lakshmaiah Education Foundation. His research interests are Artificial Intelligence and Cloud Computing. He is certified as Amazon Web Services Solution Architect Associate.