Proceedings of the Sixth International Conference on Electronics, Communication and Aerospace Technology (ICECA 2022)

IEEE Xplore Part Number: CFP22J88-ART; ISBN: 978-1-6654-8271-4

Role of Machine Learning in Fake Review Detection
P. Manish Kumar¹, S. Shri Harrsha¹, K. Abhiram¹, Dr. M. Kavitha¹, M. Kalyani²
¹ Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, AP, India.
manishpemmasani123@gmail.com, [email protected], [email protected], [email protected]
² Department of Mathematics, PACE Institute of Technology and Sciences, Ongole, AP, India.
modepallikalyani123@gmail.com

2022 6th International Conference on Electronics, Communication and Aerospace Technology (ICECA) | 978-1-6654-8271-4/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICECA55336.2022.10009174

Abstract—In today's culture, growing technology promotes many products and events in a very positive way, and technology usage in the current generation has reached new heights. But a technology that brings so much benefit also has its negative uses, and one of them is fake reviews. Fake reviews weaken the actual worth of a product. To be more specific, reviews can be divided into two categories: legitimate reviews and fake reviews written intentionally to damage the product or brand value. Machine learning algorithms, on the other hand, are used extensively, and incorporating machine learning techniques into the classification of reviews is considered an excellent combination. In this work, various datasets from different industries such as the airline, movie and food industries are considered, and fake reviews are classified using machine learning algorithms including K-Nearest Neighbors, Naive Bayes, Random Forest, Decision Tree, Support Vector Machine and Logistic Regression. Some reviews can be decoded using sentiment analysis from Natural Language Processing; sentiment analysis is used to find the emotion in a text. The accuracy of all the implemented models is analyzed. The results demonstrate that the support vector machine technique gives higher accuracy than the other machine learning classification techniques.

Keywords—Fake reviews, Machine learning, Natural language processing, Sentiment analysis

I. INTRODUCTION

Any user who comes online to check out a product first goes to the review section to learn what other users experienced with the intended product. But what if those reviews are polluted with fake reviews? How can a user decide what to buy and what not to buy? Hence there is demand for fake review detection mechanisms.

A fake review detection system separates the reviews that are genuine from those that are not. Some reviews are truthful about the product and some mislead the buyer. Different datasets are taken from different industries. When garbage data is fed to the algorithms, the result might not be appropriate. Raw data comes in different forms, so when several datasets are considered, the raw data should be converted into a single form. Pre-processing techniques are then applied to the raw data to convert it into a form that fits the algorithms. Data quality assessment, data cleaning, data transformation and data reduction are a few of the steps in data pre-processing. Any spoken sentence carries its own emotion, which is easily understood; finding the emotion of a written sentence is the real task. For that, Natural Language Processing techniques are used to learn more about the emotion of a sentence or of a particular word. Stop words are frequently filtered out in text mining and natural language processing (NLP) because they are overused terms that provide very little valuable information; "a", "the", "is" and "are" are some of the stop words. Tokenization is another technique; it is the process of splitting a string or text into a list of tokens. For example, a sentence is a token in a paragraph. Classification techniques such as Naive Bayes [1], Decision Tree and SVM [11] are available; similarly, a linear support vector classifier is also used. Some authors used multidimensional feature engineering for better results [9].

Many small-scale industries rely completely on word of mouth, that is, on reviews. Industries like the movie industry and Amazon shopping [5] get most of their revenue from positive word of mouth, and fake reviews mislead the common audience by discouraging users from giving a product a try.

Ranking algorithms show the most-liked reviews first, so misleading fake reviews are shown at the top and the genuine ones are pushed to the bottom. This work mainly aims to clear out the bogus reviews, which are destroying not only the product owner but
also the user experience, and to present the positive reviews on any platform. A model is proposed with the help of various machine learning techniques [2, 12]. Classifiers are applied to classify the reviews into fake and genuine reviews. Any user who goes online to check products asks for genuine reviews, and the proposed model addresses this issue for the user.

II. LITERATURE WORK

The author utilised TF-IDF to efficiently distinguish false and true hotel reviews using a dataset of gold-standard hotel ratings. The author discussed three classifiers: Naive Bayes, logistic regression and support vector machines. The approach was validated using a multinomial Naive Bayes classifier [1].

The author took a movie review dataset and used various machine learning algorithms such as K*, Naive Bayes, SVM and KNN. After testing all the algorithms, SVM achieved the best accuracy among the classification algorithms [2].

The author conducted a literature survey of various papers to find which algorithm gives the most accurate results, covering Naive Bayes from machine learning as well as LSTM, bidirectional LSTM and GRNN. Naive Bayes gave the highest accuracy among the machine learning algorithms and LSTM gave the highest accuracy among the remaining techniques. In deep learning, a bidirectional LSTM reached 98.9% accuracy for filtering words. The survey also covered maximum entropy, KNN and K-star algorithms, and concluded that Naive Bayes gave the highest accuracy among them [3, 13].

The author used the LIAR dataset, applied pre-processing techniques for sentiment analysis, and then used various learning techniques: RNN, CNN, LSTM, GRU, logistic regression and SVM. Among these, CNN performed best with a test accuracy of 0.270; the other test accuracies are SVM (0.255), logistic regression (0.247), Bi-LSTM (0.233), GRU (0.217) and LSTM (0.2166) [4].

The author utilised the Amazon Review Data (2018) dataset to analyse up to 10M reviews from Amazon.com in an effort to identify different sorts of opinion spam. The fact that they automatically labelled totally copied and nearly replicated reviews as false reviews undermines the legitimacy of the results, even if they attained a respectable performance. To create false product reviews, they utilised two language models: ULMFiT (Universal Language Model Fine-tuning) and GPT-2. The author asserts that, of the four prediction sources, the fake RoBERTa model performed the best; compared to the other ML models, the OpenAI model fared much worse [5].

WEKA (Waikato Environment for Knowledge Analysis) is a tool used in data mining jobs that gathers machine learning algorithms; categorization, regression, clustering, association rules and visualisation are all methods it provides for processing data. The author used the NB, DT-J48, LR and SVM algorithms to analyse Amazon review datasets [6].

Using the Restaurant Dataset, the author reports the following test results: the Decision Tree has the best training accuracy, followed by XGBT, SVMs, Random Forest and MLP, all using Doc2Vec document embeddings. After hyperparameter adjustment, stand-alone classifiers obtain up to 68.2% accuracy in the case of MLP. The AdaBoost ensemble of MLPs allows ensemble learning-based classifiers to reach up to 77.3% accuracy [7].

The author uses the Amazon dataset and studies fake review detection using a convolutional neural network model that integrates the product-related features with the product owner. The author used a bagging model to reduce overfitting and high variance, and obtained the results using bigrams and trigrams [8].

The author used a manually annotated dataset and studies fake review detection using multidimensional feature engineering. To recognise fake reviews, six feature criteria are created. The relevance between review items and content is determined as follows:
(1) Analyse the reviews for product characteristics.
(2) Create word vectors from product reviews depending on the features of the item.
(3) Use the χ² statistical approach to choose the correlated product characteristics [9].

The author combined actual and seemingly fraudulent reviews. According to the author's study, behavioural and contextual aspects are crucial for spotting phoney reviews. The study made use of the crucial reviewer behaviour trait known as "reviewer deviation". The NNC, LTC and BM25 term weighting schemes were all tested, and per the author's observation BM25 beat the other term weighting schemes [10].

The author took a tourism hotel review dataset and used a Support Vector Machine model for fake review detection; for the second analytical component, a spelling checker software tool was developed according to their usage. Python was used for programming the software [11].

III. PROPOSED WORK

The literature reveals that fake review detection is an important research issue because it has a great impact on users' daily lives. It also demonstrates that machine learning techniques play a vital role in fake review detection. This research study applies multiple machine learning techniques to perform fake review detection, and the results are verified with multiple datasets. This work helps demonstrate the role of machine learning techniques in fake review detection.

A. Architecture

Figure 1 represents the working model of the proposed fake review detection using machine learning techniques. The first step is to take a dataset and perform the data mining techniques, which are cleaning, clustering and classification. The next step is to proceed with the NLP techniques, which are removal of stop words and tokenization. Then sentiment analysis is done. Training and testing are then performed using six machine learning
algorithms, which are discussed below. Finally, we obtain the accuracy, precision and recall values, which are shown in the results discussion.
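
The NLP and splitting steps of this architecture can be illustrated with a minimal Python sketch. It is not the authors' code: the stop-word set is a tiny illustrative subset (the paper's implementation uses NLTK's list), the helper names `tokenize`, `remove_stop_words` and `split_train_test` are hypothetical, and the default 20% test fraction simply mirrors the split reported in the Implementation section.

```python
import random
import re

# Illustrative subset only (assumption); NLTK's English stop-word list is larger.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "in", "it"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that carry very little information."""
    return [token for token in tokens if token not in STOP_WORDS]


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out test_fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]
```

For a review such as "The product is GREAT, and it works!", `remove_stop_words(tokenize(...))` keeps only the content-bearing tokens, which are then what a classifier would see after feature extraction.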

Fig. 1. Working model of proposed work

Fig. 2. Sample Data Set

B. Discussion on Working Algorithms

KNN
The K-nearest neighbors algorithm finds the training instances nearest to a query point, which helps in finding groups of similar instances. KNN is a pattern recognition algorithm and is also used in statistical estimation.

SVM
The SVM algorithm finds the best separating hyperplane that divides the training data into classes. It is a discriminative classifier and a supervised learning algorithm.

Decision Tree
Decision Tree is one of the most widely used machine learning algorithms; it makes decisions by splitting the data into nodes. It constructs a tree on the best possible features, relying mostly on criteria such as entropy and information gain.

Random Forest
Random forest is a machine learning algorithm designed to handle the overfitting problem of decision trees. It constructs a bag of trees from various samples of the dataset, and it considers a small number of features when constructing each tree in the forest.

Naive Bayes
Naive Bayes is based entirely on Bayes' theorem, which relies on the probability of occurrences. It is frequently used for tasks such as text classification and fake review detection, and is also widely used in recommendation systems.

Logistic Regression
Logistic Regression is also a supervised learning algorithm used in machine learning. It finds the hyperplane that classifies the training data.

C. Dataset Description
To run any algorithm, the first thing we need is a dataset. We have taken the dataset called "fake reviews dataset" from the OSF website. It is made up of 20k fake reviews and 20k actual product reviews. OR = original reviews (human created and authentic); CG = computer-generated fake reviews. The dataset contains the attributes Category, Rating, Text and Label. Figure 2 represents the dataset used for the implementation.

IV. IMPLEMENTATION

For implementation, the Python programming language is used on the Anaconda platform, which supports a vast number of Python libraries. The collected dataset is loaded into the platform and all the proposed models are implemented on it. The accuracy parameter is observed to assess the performance of each implemented model. Accuracy is one of the parameters used to assess model performance and to find which model is best in comparison with the others.

The execution is carried out as per the steps below:
--> The dataset of reviews is taken from OSF and contains 20,000 CG reviews and 20,000 OR reviews.
--> Pre-processing of the data is done using methods such as data quality assessment and data cleaning.
--> Text summarization is done by removal of stop words using NLTK in Python.
--> Lemmatization is done to reduce each word to its base form, so that its meaning and whether it is positive or negative can be determined.
--> Every machine learning algorithm needs training and testing data, so we created both: 80% of the dataset is used for training and 20% for testing.
--> The six machine learning algorithms listed earlier are trained and tested on the pre-processed data.
--> Finally, we obtain the accuracy, precision, recall, F1 score and support of all the algorithms, and can choose the best among them.

The accuracy of all the proposed algorithms is computed to identify the best model among them, which achieves more than 86%. The Random Forest Classifier and the Multinomial Naive Bayes algorithm achieved a precision level of approximately 84%. However, the Decision Tree Classifier predicted fake reviews with an accuracy of just over 73%. The worst-performing algorithm was K-Nearest Neighbors, which reached an accuracy of only nearly 58%. The evaluation also produced the values of precision, recall, F1 score and support. Precision is the fraction of reviews predicted as a given class that actually belong to that class. Recall is the fraction of reviews of a given class that are predicted correctly. F1 score is the harmonic mean of precision and recall. Support is the number of times the class occurs in the evaluated data.
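
To make the four evaluation measures concrete, the following is a minimal pure-Python sketch of how they can be computed per class from true and predicted labels. It is an illustration under stated assumptions rather than the authors' evaluation code: the function names `per_class_report` and `accuracy` are hypothetical, and the label names OR and CG follow the dataset description.

```python
def per_class_report(y_true, y_pred, labels=("OR", "CG")):
    """Return {label: (precision, recall, f1, support)} for each class."""
    report = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)  # correct hits
        fp = sum(1 for t, p in pairs if t != label and p == label)  # false alarms
        fn = sum(1 for t, p in pairs if t == label and p != label)  # misses
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        support = tp + fn  # number of true instances of this class
        report[label] = (precision, recall, f1, support)
    return report


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```

Calling `per_class_report(y_true, y_pred)` on the held-out test labels yields one row per class, in the same layout as a precision/recall/F1/support table.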

V. RESULTS DISCUSSION

The proposed models' performance in terms of precision, recall, F1-score and support is given in Table I, and the performance in terms of accuracy is shown in Table 2. The SVM classifier made the most accurate predictions regarding the fake nature of reviews, with a predictive accuracy of just over 88%, closely followed by Logistic Regression with a prediction accuracy of 86%; the least accurate is the KNN algorithm at 57.66%. The accuracies are displayed in Table 2.

The bar graph in Figure 3 shows the accuracies of the six algorithms listed above. From it we observe that SVM is the highest and KNN is the lowest. With the proposed system, sorting the reviews becomes much easier, which clears the way for genuine customers.
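
Accuracy and per-class recall are linked: overall accuracy equals the support-weighted average of the per-class recalls. The short sketch below applies that identity to the recall and support values of Table I and compares the result with the accuracies of Table 2. The dictionary and function names are hypothetical, and because the published recalls are rounded to two decimals, the estimates can only be expected to agree with Table 2 to within about half a percentage point.

```python
# Per-class recall (OR, CG) copied from Table I.
RECALL = {
    "Logistic Regression": (0.87, 0.86),
    "Random Forest": (0.79, 0.89),
    "Naive Bayes Classifier": (0.80, 0.90),
    "Support Vector Machine": (0.89, 0.88),
    "Decision Tree": (0.72, 0.75),
    "K-Nearest Neighbors": (0.18, 0.97),
}

# Accuracy in percent copied from Table 2.
REPORTED_ACCURACY = {
    "Logistic Regression": 86.33,
    "Random Forest": 83.92,
    "Naive Bayes Classifier": 84.65,
    "Support Vector Machine": 88.48,
    "Decision Tree": 73.27,
    "K-Nearest Neighbors": 57.30,
}

# Support (number of test reviews per class) from Table I.
SUPPORT_OR, SUPPORT_CG = 7119, 7032


def accuracy_from_recall(recall_or, recall_cg):
    """Support-weighted average of per-class recall, in percent."""
    correct = recall_or * SUPPORT_OR + recall_cg * SUPPORT_CG
    return 100.0 * correct / (SUPPORT_OR + SUPPORT_CG)


if __name__ == "__main__":
    for name, (r_or, r_cg) in RECALL.items():
        estimate = accuracy_from_recall(r_or, r_cg)
        print(f"{name}: estimated {estimate:.2f}, reported {REPORTED_ACCURACY[name]:.2f}")
```

The same identity also explains the KNN row: a recall of 0.97 on CG reviews cannot compensate for a recall of 0.18 on OR reviews, so the overall accuracy stays close to chance.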

TABLE I
MODELS PERFORMANCE ANALYSIS: PRECISION, RECALL, F1-SCORE AND SUPPORT

Algorithm                Category  Precision  Recall  F1-Score  Support
Logistic Regression      OR        0.86       0.87    0.86      7119
                         CG        0.87       0.86    0.86      7032
Random Forest            OR        0.88       0.79    0.83      7119
                         CG        0.80       0.89    0.85      7032
Naïve Bayes Classifier   OR        0.89       0.80    0.84      7119
                         CG        0.81       0.90    0.85      7032
Support Vector Machine   OR        0.88       0.89    0.89      7119
                         CG        0.89       0.88    0.88      7032
Decision Tree            OR        0.74       0.72    0.73      7119
                         CG        0.72       0.75    0.74      7032
K-Nearest Neighbors      OR        0.88       0.18    0.29      7119
                         CG        0.54       0.97    0.69      7032

TABLE 2: PERFORMANCE ANALYSIS OF WORKING MODELS IN FAKE REVIEW DETECTION

S.No  Algorithm                Accuracy (%)
1     Logistic Regression      86.33
2     Random Forest            83.92
3     Naïve Bayes Classifier   84.65
4     Support Vector Machine   88.48
5     Decision Tree            73.27
6     K-Nearest Neighbors      57.30

Fig. 3 represents the accuracy values of all the implemented models graphically. The graphical representation makes it easy to understand and analyze the performance of the models. The X-axis shows the algorithms and the Y-axis shows the accuracies.

Fig. 3. Accuracy Analysis of Working Models in Fake review detection.

VI. CONCLUSION

Fake reviews reduce customers' purchases of products. This system, which removes fake reviews and junk from the reviews using effective machine learning methods, gives an edge to product owners. We obtained the accuracies of the six machine learning techniques we used, and the best accuracy is from the SVM classifier.

VII. FUTURE WORK

We need to develop a more efficient model that has much higher accuracy and identifies fake reviews more accurately. We will also focus on removing those fake reviews so that customers can truly rely on the reviews to buy their desired product. To that end, we want to introduce deep learning techniques into the existing system to get more accurate results.

REFERENCES

[1] R. Hassan and M. R. Islam, "A Supervised Machine Learning Approach to Detect Fake Online Reviews," 2020 23rd International Conference on Computer and Information Technology (ICCIT), 2020, pp. 1-6, doi: 10.1109/ICCIT51783.2020.9392727.
[2] L. Gutierrez-Espinoza, F. Abri, A. Siami Namin, K. S. Jones and D. R. W. Sears, "Ensemble Learning for Detecting Fake Reviews," 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), 2020, pp. 1320-1325, doi: 10.1109/COMPSAC48688.2020.00-73.
[3] J. C. Rodrigues, J. T. Rodrigues, V. L. K. Gonsalves, A. U. Naik, P. Shetgaonkar and S. Aswale, "Machine & Deep Learning Techniques for Detection of Fake Reviews: A Survey," 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), 2020, pp. 1-8, doi: 10.1109/ic-ETITE47903.2020.063.
[4] S. Girgis, E. Amer and M. Gadallah, "Deep learning algorithms for detecting fake news in online text," 2018 13th International Conference on Computer Engineering and Systems (ICCES), IEEE, 2018.
[5] J. Salminen et al., "Creating and detecting fake reviews of online products," Journal of Retailing and Consumer Services, vol. 64, 2022, 102771.

[6] E. Elmurngi and A. Gherbi, "Unfair reviews detection on Amazon reviews using sentiment analysis with supervised learning techniques," JCS, vol. 14, pp. 714-726, 2018, doi: 10.3844/jcssp.2018.714.726.
[7] L. Gutierrez-Espinoza, F. Abri, A. Siami Namin, K. S. Jones and D. R. W. Sears, "Ensemble Learning for Detecting Fake Reviews," 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), 2020, pp. 1320-1325, doi: 10.1109/COMPSAC48688.2020.00-73.
[8] C. Sun, Q. Du and G. Tian, "Exploiting product related review features for fake review detection," Mathematical Problems in Engineering, vol. 2016, 2016.
[9] G. Wang et al., "Fake Review Identification Methods Based on Multidimensional Feature Engineering," Mobile Information Systems, vol. 2022, 2022.
[10] J. Kumar, "Fake Review Detection Using Behavioral and Contextual Features," arXiv preprint arXiv:2003.00807, 2020.
[11] M. Möhring et al., "HOTFRED: A Flexible Hotel Fake Review Detection System," Information and Communication Technologies in Tourism 2021, Springer, Cham, 2021, 308.
[12] C. N. Kumar, D. Keerthana, M. Kavitha and M. Kalyani, "Customer Loan Eligibility Prediction using Machine Learning Algorithms in Banking Sector," 2022 7th International Conference on Communication and Electronics Systems (ICCES), IEEE, June 2022, pp. 1007-1012.
[13] M. Kavitha, P. V. V. S. Srinivas, P. L. Kalyampudi and S. Srinivasulu, "Machine Learning Techniques for Anomaly Detection in Smart Healthcare," 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA), IEEE, September 2021, pp. 1350-1356.

ABOUT AUTHOR

Pemmasani Manish Kumar is studying B. Tech in the Department of Computer Science and Engineering at Koneru Lakshmaiah Education Foundation. His research interests are Artificial Intelligence, Cloud Computing and Internet of Things. He is certified in Amazon Web Services Solution Architect Associate, Oracle Foundation Associate and Aviatrix Multi-Cloud Networking Associate.

Shri Harrsha Samala is studying B. Tech in the Department of Computer Science and Engineering at Koneru Lakshmaiah Education Foundation. His research interests are Artificial Intelligence, Cloud Computing and Internet of Things. He is certified in Amazon Web Services Solution Architect Associate, Aviatrix Multi-Cloud Networking Associate and Wipro Talent Next.

Konda Abhiram is studying B. Tech in the Department of Computer Science and Engineering at Koneru Lakshmaiah Education Foundation. His research interests are Artificial Intelligence and Cloud Computing. He is certified in Amazon Web Services Solution Architect Associate.

Dr M Kavitha is working as Associate Professor in the Department of CSE, K L University. She has 11 years of teaching and research experience and has published more than 25 articles in reputable journals and conferences. She has participated in more than 50 global challenges on various coding platforms such as CodeChef, HackerRank, HackerEarth, LeetCode, TechGig, Codeforces and Topcoder. Her research interests are Internet of Things, Artificial Intelligence and Machine Learning techniques.

M Kalyani is working as Assistant Professor in the Department of Mathematics at PACE Institute of Technology and Sciences. She has 3 years of teaching experience and has published a good number of research articles in Scopus-indexed conferences and journals. Her research interests are mathematical and computational models.
