Deep Learning Based Model For Fake Review Detection
Digvijay Singh
Research Scholar, Dept of CSE, Uttaranchal University, Dehradun, Uttarakhand, India
[email protected]
Minakshi Memoria
Department of Computer Science and Engineering, Uttaranchal Institute of Technology, Uttaranchal University, Dehradun, India
[email protected]
Rajiv Kumar
Department of Computer Science and Engineering, Uttaranchal Institute of Technology, Uttaranchal University, Dehradun, India
[email protected]
Abstract— At present, people are increasingly inclined towards e-commerce for their purchases, and their choices are strongly influenced by the reviews available there, since reviews play an important role in their decisions. If the reviews are more positive, the likelihood of buying the product is comparatively high. This creates the need for a sustainable approach to detecting malicious reviews in order to protect customers from fraud. Many sites and agencies are hired by merchants to generate positive reviews that increase their own sales or damage the sales of a competitor’s product. Deep learning methodologies for malicious review detection, namely Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM), are proposed in this paper. We have also compared the performance of these methods with state-of-the-art techniques such as Naive Bayes (NB), K Nearest Neighbour (KNN) and Support Vector Machine (SVM) for the detection of fake reviews, and the efficiency is illustrated for both the traditional and the deep learning classifiers.
Keywords— Fake reviews, Spam review detection, Deep learning, CNN, KNN, LSTM, SVM.
ISBN: 979-8-3503-9648-5/23/$31.00 ©2023 IEEE
Authorized licensed use limited to: Alliance College of Engineering and Design Bangalore. Downloaded on February 11,2024 at 15:27:19 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION
Over the past few years, connectivity with the world wide web has become an important aspect of our day-to-day routine. People communicate and express their views or opinions in the form of blogs, ratings, reviews or posts on various online platforms such as social media sites, online trading websites, virtual commerce, blogs, or review sites. At present, people are also more attracted towards e-commerce for their purchases, where this shared opinion plays a very important role and is considered trusted information [2]; that is why customers believe that they should read the product reviews before making a purchasing decision. If the reviews are more positive, the likelihood of buying the product is comparatively high [1]. According to the survey [3], fewer than 20% of customers do not treat online reviews the same as personal recommendations. As a result, these reviews are considered a fundamental component of business and a motivator for both customers and commercial organizations. But here a question arises: “Are the opinions or reviews that people are expressing authentic or real?”
In fact, according to current statistics, one in three TripAdvisor reviews is false [4]. Additionally, because anybody may write reviews for free and anonymously, there have been instances of people misusing this anonymity by publishing unreal or even false feedback and reviews with the motive of tricking customers and gaining significant commercial advantages [5]. These reviews are referred to as "opinion spam" or, more specifically, "false reviews," and the individuals who engage in these nefarious deeds are referred to as "opinion spammers". Several significant developments have been made to enhance automatic fake review identification over the past decade. Among these, machine learning techniques have gained a good reputation for automatically detecting fake reviews with good accuracy. Generally, fake negative reviews are posted by spammers to defame a competitor's product. But reviews that include harsh comments which accurately reflect the opinions of genuine customers cannot be categorized as spam. Therefore, it has become crucial to distinguish legitimate evaluations from spam to make online reviews credible.

II. LITERATURE REVIEW
Recognition of fake or spam reviews is still considered a major challenge for online shopping. The majority of fake review detection approaches extract useful information from the review text. Bag-of-words [12], psycholinguistic word lists, and part-of-speech tagging [13] are typical representations of these properties. In [14], aspect sentiment was employed to identify misleading users. Jindal and Liu [7] performed a study using a logistic regression classifier to detect fake product reviews, based on the fake reviewer’s motive to replicate product or service reviews. Motivated by reviewer reliability, Liu et al. [8] used a bidirectional neural network with an attention mechanism to create the multimodal embedded representation of nodes in their proposed probabilistic graph classifier. To mine consumer reviews, Minqing Hu and Bing Liu [9] suggested a set of data mining and natural language processing based approaches for summarising product reviews. Semantic clustering was used by Wang et al. [10], who incorporated a new layer into the CNN architecture.
An enhanced four-layer OpCNN algorithm based on the Chinese word order problem was described by Zhao et al. [11]. Phrases with a specific word order are used as input to the input layer. The authors optimized the OpCNN model parameters and employed the k-max pooling approach. Lin et al. [15] used sentiment classification algorithms, RNN and LSTM, and showed how LSTM could be used to solve the long-term dependency issue by adding a memory to the network that may have an impact on a document's meaning and polarity. A deep attention algorithm based on recurrent neural networks (RNN) was proposed by Chen et al. [16] for selectively learning temporal characterizations of sequential reports in rumour detection. More complex features from belief clustering output and user activity trends can be examined to enhance early detection efficiency. Tang et al. [17] conducted a sentiment classification experiment using four large datasets, including three Yelp.com restaurant review datasets and one IMDB dataset of movie reviews. When it comes to classifying reviews as positive or negative, the performance comparison demonstrates that LSTM performs better than other classifiers. Numerous research works have been developed using conventional techniques, but researchers are constantly working to increase the accuracy of detecting spam reviews.

III. PROPOSED SYSTEM
The major objective of our work is to identify spam textual reviews using deep learning techniques, improving the spam identification methodology with meaningful outcomes.
Natural Language Processing (NLP) operations like tokenization, punctuation removal, handling of missing data, stop-word removal, and stemming are performed as part of pre-processing.
We use the active learning method, also used by Istiaq et al. [19], for labelling the yelp.com dataset. The SVM decision function was used to select the unlabeled data samples, and training used those with the highest and lowest average absolute confidence.
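The pre-processing operations named in this section (tokenization, punctuation removal, handling of missing data, stop-word removal, and stemming) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the small stop-word list, the suffix-stripping rule that stands in for a real stemmer, and the function names `preprocess` and `crude_stem` are assumptions made only for illustration.

```python
import string

# Tiny illustrative stop-word list (an assumption; the paper does not name its list).
STOP_WORDS = {"a", "an", "the", "is", "was", "were", "and", "or",
              "of", "to", "in", "it", "this", "i"}

# Suffixes removed by the toy stemmer below (an assumption, not a real stemming algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token):
    """Strip the first matching suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(review):
    """Return the cleaned token list for one review; missing text yields []."""
    if review is None or not review.strip():
        return []  # handling of missing data
    text = review.lower()
    # Punctuation removal: delete every character listed in string.punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()  # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [crude_stem(t) for t in tokens]  # stemming


print(preprocess("The rooms were amazing, and the staff helped us!"))
# prints ['room', 'amaz', 'staff', 'help', 'us']
```

In practice, a complete stop-word list and an established stemmer (for example the Porter stemmer) would likely replace the toy versions used here.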
adjusted the hidden layer sizes to 50, 100, and 200. The batch size, epoch count, and dropout parameters were also adjusted.
In experiment 2, the CNN and LSTM methods were applied to the “Ott Dataset”.

II. RESULT ANALYSIS AND COMPARISON
Table 6 shows the outcomes of some previous research works, all of which used the conventional techniques SVM, KNN, and NB.
Experiment 1 uses some conventional classifiers, including Naive Bayes (NB), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), to assess performance on the "Yelp Dataset”.

Table 2. Result of LSTM over Ott dataset
Train-to-Test Ratio    Embedding dimension    Hidden dimension    Accuracy    Technique
90:10                  100                    200                 92.15%      word2vec
80:20                  100                    200                 92.39%      word2vec
70:30                  100                    50                  93.89%      word2vec
60:40                  200                    50                  92.35%      word2vec

Table (caption and part of the header lost in extraction)
Train-to-Test Ratio    Dimension value(s) as extracted    Accuracy    Technique
90:10                  200, 200                           94.88%      word2vec
90:10                  100                                94.64%      word2vec
80:20                  100                                93.67%      word2vec
70:30                  100                                93.66%      word2vec
60:40                  50                                 92.18%      word2vec

Table 6. Outcomes of some previous works
Paper    Technique                                    Classifier used     Accuracy
[18]     Bigrams                                      SVM                 89.60%
[21]     Unigram + Bigram                             SVM                 86%
[22]     features (n-gram)                            SVM                 86%
[23]     Unigram                                      KNN, NB, SVM, DT    82.00%
[24]     Unigrams, bigrams, trigrams and fourgrams    SVM                 90.00%
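The train-to-test ratios reported in the tables (90:10, 80:20, 70:30 and 60:40) amount to repeating the same train-and-evaluate loop on different splits. The sketch below illustrates that loop with a small multinomial Naive Bayes classifier written in plain Python as a stand-in for the NB baseline. It is an illustration under stated assumptions, not the experimental code behind the reported accuracies: the toy reviews in `TOY_REVIEWS`, their labels, and all function names are invented for the example and do not come from the Yelp or Ott datasets.

```python
import math
from collections import Counter

# Invented, already-tokenized toy reviews (assumption; not taken from any real dataset).
TOY_REVIEWS = [
    (["amazing", "best", "hotel"], "fake"),
    (["room", "small", "clean"], "genuine"),
    (["amazing", "perfect", "stay"], "fake"),
    (["room", "noisy", "street"], "genuine"),
    (["best", "amazing", "ever"], "fake"),
    (["room", "breakfast", "fine"], "genuine"),
    (["amazing", "must", "buy"], "fake"),
    (["room", "clean", "quiet"], "genuine"),
    (["amazing", "best", "deal"], "fake"),
    (["room", "small", "fine"], "genuine"),
]


def split(samples, train_percent):
    """Use the first train_percent of the samples for training, the rest for testing."""
    cut = len(samples) * train_percent // 100
    return samples[:cut], samples[cut:]


def train_nb(train):
    """Collect class counts, per-class word counts and the training vocabulary."""
    class_counts = Counter(label for _, label in train)
    word_counts = {label: Counter() for label in class_counts}
    for tokens, label in train:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, test):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(predict_nb(model, tokens) == label for tokens, label in test)
    return correct / len(test)


def ratio_sweep(samples, train_percents=(90, 80, 70, 60)):
    """Train and evaluate once per train-to-test ratio."""
    results = {}
    for pct in train_percents:
        train, test = split(samples, pct)
        results[f"{pct}:{100 - pct}"] = accuracy(train_nb(train), test)
    return results


for ratio, acc in ratio_sweep(TOY_REVIEWS).items():
    print(ratio, f"{acc:.2%}")
```

The split here is sequential for reproducibility; a real experiment would normally shuffle (and possibly stratify) the data before splitting, and the toy accuracies say nothing about the figures reported in the tables.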
VI. FUTURE WORKS
There are many opportunities to improve our work in the future. The size of the dataset can be increased. Deep learning methods can be used to label the unlabeled data. Reviewer and product-based features can also be included for better results. Other variations of deep learning methods, such as hybrid models, can also be introduced.

REFERENCES
[1] N. Jindal and B. Liu, “Opinion spam and analysis,” In Proceedings of the 2008 international conference on web search and data mining (pp. 219-230), 2008, February.
[2] J. K Rout, A. K Dash, and N. K Ray, “A framework for fake review detection: issues and challenges,” In 2018 International Conference on Information Technology (ICIT) (pp. 7-10). IEEE, 2018, December.
[3] R. Kumar, M. Memoria, A. Gupta, and M. Awasthi, “Critical Analysis of Genetic Algorithm under Crossover and Mutation Rate,” In 2021 3rd International Conference on Advances in Computing, Communication Control and Networking (ICAC3N) (pp. 976-980). IEEE, 2021, December.
[4] Bright Local. Local consumer review survey 2023. Available at: https://www.brightlocal.com/research/local-consumer-review-survey/. Accessed on 8 Nov 2023.
[5] The Times. ‘A third of TripAdvisor reviews are fake’ as cheats buy five stars. Available at: https://www.thetimes.co.uk/article/hotel-and-caf-cheats-are-caught-trying-to-buy-tripadvisor-stars-027fbcwc8. Accessed on 22 Jan 2022.
[6] B. Liu, “Sentiment analysis and opinion mining,” Synthesis lectures on human language technologies, 5(1), 1-167, 2012.
[7] M. Ott, C. Cardie and J. T Hancock, “Negative deceptive opinion spam,” In Proceedings of the 2013 conference of the north american chapter of the association for computational linguistics: human language technologies (pp. 497-501), 2013, June.
[8] R. Kumar, “Efficient Genetic Operators Based on Permutation Encoding under OSPSP,” Int. J. Latest Res. Sci. Technol., 1(1), 55-59, 2012.
[9] N. Jindal, and B. Liu, “Analyzing and detecting review spam,” In Seventh IEEE international conference on data mining (ICDM 2007) (pp. 547-552). IEEE, 2007, October.
[10] Y. Liu, B. Pang and X. Wang, “Opinion spam detection by incorporating multimodal embedded representation into a probabilistic review graph,” Neurocomputing, 366, 276-283, 2019.
[11] M. Hu and B. Liu, “Mining and summarizing customer reviews,” Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, 2004.
[12] P. Wang, J. Xu, B. Xu, C. Liu, H. Zhang, F. Wang and H. Hao, “Semantic clustering and convolutional neural network for short text categorization,” In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) (pp. 352-357), 2015, July.
[13] S. Zhao, Z. Xu, L. Liu, M. Guo, and J. Yun, “Towards accurate deceptive opinions detection based on word order-preserving CNN,” Mathematical Problems in Engineering, 2018.
[14] N. A. Patel and R. Patel, “A survey on fake review detection using machine learning techniques,” In 2018 4th International Conference on Computing Communication and Automation (ICCCA) (pp. 1-6). IEEE, 2018, December.
[15] N. A. Patel and R. Patel, “A survey on fake review detection using machine learning techniques,” In 2018 4th International Conference on Computing Communication and Automation (ICCCA) (pp. 1-6). IEEE, 2018, December.
[16] Y. Liu, and B. Pang, “A unified framework for detecting author spamicity by modeling review deviation,” Expert Systems with Applications, 112, 148-155, 2018.
[17] T. Lin, B. G Horne, P. Tino, and C. L Giles, “Learning long-term dependencies in NARX recurrent neural networks,” IEEE Transactions on Neural Networks, 7(6), 1329-1338, 1996.
[18] T. Chen, X. Li, H. Yin, and J. Zhang, “Call attention to rumors: Deep attention based recurrent neural networks for early rumor detection,” In Trends and Applications in Knowledge Discovery and Data Mining: PAKDD 2018 Workshops, BDASC, BDM, ML4Cyber, PAISI, DaMEMO, Melbourne, VIC, Australia, June 3, 2018, Revised Selected Papers 22 (pp. 40-52). Springer International Publishing, 2018.
[19] D. Tang, B. Qin, and T. Liu, “Document modeling with gated recurrent neural network for sentiment classification," In Proceedings of the 2015 conference on empirical methods in natural language processing (pp. 1422-1432), 2015, September.
[20] E. R Kumar, and E. A Kaushik, “Premature convergence and genetic algorithm under operating system process scheduling problem,” Journal of Global Research in Computer Science, 1(5), 2010.
[21] M. Ott, Y. Choi, C. Cardie, and J. T Hancock, “Finding deceptive opinion spam by any stretch of the imagination,” arXiv preprint arXiv:1107.4557, 2011.
[22] D. Lin, Y. Matsumoto, and R. Mihalcea, “Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies,” In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011, June.
[23] M. I Ahsan, T. Nahian, A. A Kafi, M. I Hossain and F. M Shah, “Review spam detection using active learning,” In 2016 IEEE 7th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON) (pp. 1-7). IEEE, 2016, October.
[24] S. M Mohammad, and P. D Turney, “Crowdsourcing a word–emotion association lexicon,” Computational intelligence, 29(3), 436-465, 2013.
[25] M. Ott, C. Cardie and J. T Hancock, “Negative deceptive opinion spam,” In Proceedings of the 2013 conference of the north american chapter of the association for computational linguistics: human language technologies (pp. 497-501), 2013, June.
[26] R. Kumar, S. Gill and A. Kaushik, “An impact of cross over operator on the performance of genetic algorithm under operating system process scheduling problem,” In 2011 International Conference on Communication Systems and Network Technologies (pp. 704-708). IEEE, 2011, June.
[27] E. Elmurngi and A. Gherbi, “An empirical study on detecting fake reviews using machine learning techniques,” In 2017 seventh international conference on innovative computing technology (INTECH) (pp. 107-114). IEEE, 2017, August.
[28] H. Ahmed, I. Traore, and S. Saad, “Detecting opinion spams and fake news using text classification,” Security and Privacy, 1(1), e9, 2018.