Detection of Fake Online Reviews by Using Machine Learning
Mohan Babu University (Erstwhile Sree Vidyanikethan Engg. College), Tirupati, India
Sree Vidyanikethan Engg. College, Tirupati, India
Sree Vidyanikethan Engg. College, Tirupati, India
Abstract— Reviews, ratings, and personal stories written by customers on online sites and other services are helpful for both buyers and sellers. By writing reviews, users may increase brand loyalty and help other buyers better understand a product. If customers offer favourable feedback on their items, vendors can increase the sale of their products and build additional profiles. Unfortunately, suppliers may abuse these review processes. One may fabricate good reviews to boost a brand's reputation, or attempt to denigrate a rival brand's items by posting fraudulent negative reviews. Sentiment classification based on the textual information in the reviews has been incorporated. Fake reviews are identified and classified, with classification models producing the results by applying machine learning algorithms. Hence, a supervised learning model is used to label each review as a fake review or a genuine review.

Keywords— Sentiment Analysis, Text mining, Product review, Machine Learning.

I. INTRODUCTION
The advancement of internet technology has significantly changed how people live their lives. E-commerce websites such as Amazon and Flipkart provide internet users with an effective and largely dependable environment for online shopping. More and more business owners are choosing to create their online stores on various platforms. As customers gradually become accustomed to this method of buying, they share their opinions and experiences online through the e-commerce website's review system. These reviews often reflect the quality of the product or the user experience because the majority of them are written by online shoppers. More and more consumers have become accustomed to reading online reviews before placing an order. Moreover, many business owners are aware that the more favourable online reviews they have, the more transactions they make, and the faster they can grow and establish a solid reputation.

The major goal is to analyse the key review and review-centric features that have been proposed to identify fraudulent or fake reviews, particularly methods that use supervised machine learning techniques. Opinion spam detection can detect fraudulent reviews, fake stories, fake blogs, fraudulent social networking postings, and deceptive messaging. When detecting fraudulent reviews, review-focused websites like Yelp might be taken into account. Up to this point, unsupervised methods based on graphical techniques have been used to detect bogus reviews, but they are not very reliable. The supervised techniques take into account both the reviewer's behaviour and other attributes derived from the reviews. Yelp reviews, which are categorised using a few criteria, are taken into consideration as a publicly available large-scale dataset. These reviews are classified using a few well-known supervised classifiers, which categorise them as true or misleading by taking into account different data aspects.

II. LITERATURE SURVEY
Fan Cheng et al. [1] proposed a heterogeneous graph transformer model to capture the complicated interactions between customers, goods, and reviews. Utilizing user preferences and reviewer content analysis, the system seeks to increase the efficiency and accuracy of product suggestions. The study was conducted using a small dataset, which might not accurately reflect real-world circumstances and restricts the generalizability of the results. In addition, given the complexity of the suggested heterogeneous graph transformer model, significant computing resources may be needed to implement it.

J. Wang et al. [2] evaluated their strategy using two real-world datasets, comparing the outcomes with a number of cutting-edge techniques for fake review identification. They found that, in terms of accuracy and F1-score, their strategy performs better than these approaches. The authors discuss the difficulties in deep learning and provide numerous methods and algorithms to address them. They also emphasize the value of learning algorithms in resolving complicated problems and their potential for future development, and introduce a method for detecting fake reviews that incorporates a number of variables as well as a rolling collaborative training strategy to increase precision. The review of the literature emphasizes the importance of research on machine learning and its uses in a variety of sectors.

Fei et al. [3] examine the patterns of review activity and find that spammers frequently post several reviews in a short period of time. To identify review spammers, the authors create a burstiness-based method that takes into account the distribution of the time gaps between two consecutive reviews for each reviewer, and test the strategy against a variety of existing spammer detection techniques using a real-world review dataset. By utilizing the burstiness of reviews, this work makes a distinctive contribution to the field of review spammer identification. The authors' findings emphasize how crucial it is to take into account the temporal patterns of reviewer activity when looking for review spammers. This study has potential implications in e-commerce,
where unreliable and biased customer reviews can significantly influence purchasing decisions.

Luhua Jin et al. [4] aim to improve the quality of product reviews by exploiting the data in a heterogeneous network made up of customer, product, and review nodes. The study proposes a novel model, the heterogeneous graph transformer, which takes the heterogeneous graph as input and produces representations for each node. In terms of computational efficiency and recommendation accuracy, the findings demonstrate that the suggested model performs better than the alternatives and offers a fresh approach for raising the quality of product reviews, while also demonstrating the usefulness of the heterogeneous graph transformer for e-commerce review systems.

Hina Tufail et al. [5] conducted a literature review to assess prior studies on the subject and determined that fraudulent reviews can significantly affect customer behavior, sales revenue, and market dynamics. They also covered the difficulties e-commerce businesses face in identifying and reducing the effects of fake reviews and offered some viable remedies. The report shows the need for additional research in this field and offers valuable information on the problem of fraudulent reviews in e-commerce.

Z. Liu et al. [6] address the issue that, because GNNs are sensitive to the structure and characteristics of the graph data, they are unreliable in detecting fraud. The suggested method increases the accuracy and efficiency of GNNs in fraud detection by enhancing their consistency and stability. The study contributes to the expanding area of graph-based detection of fraudulent activities.

J. Wang et al. [7] combine a number of characteristics, including text-based and image-based features, to increase the effectiveness of fake review identification. Owing to the rolling collaborative training strategy, the model can adapt to new data and avoid overfitting. The authors tested the suggested technique on a data set and found that it performed better than other cutting-edge techniques in terms of precision.

Y. Wu et al. [8] describe the definitions, causes, and effects of fake reviews, and give a rundown of the prevention and detection strategies that have been created to address the problem. Among the problems and possibilities for future study mentioned in the report are the need for a precise definition of fake reviews and the scope for new technology to aid in their detection and prevention. The authors suggest a research roadmap for the study of fraudulent online reviews, highlighting the significance of multidisciplinary cooperation and the requirement for data-driven methods.

J. Salminen et al. [9] cover strategies for identifying fake reviews as well as approaches for producing them. Additionally, they provide their own contribution to the subject: a collection of authentic and fraudulent ratings and a model for identifying fraudulent reviews that combines language and metadata elements. They also give a thorough assessment of the current state of the art in this field and discuss the difficulties and prospects for further research.

S. N. Tran et al. [10] evaluate a wide range of currently used methodologies and strategies for spotting fake reviews, including network-based methods, machine learning algorithms, and approaches based on natural language processing. The authors also discuss the difficulties in detecting fake reviews, such as the abundance of data, the dynamic properties of fraudulent reviews, and the dearth of annotated data, and discuss the need for more study to address the changing nature of fake reviews and to increase the precision and efficacy of review spam detection algorithms.

Yashika Goyal et al. [11] identify fraudulent reviews on e-commerce platforms and point out the shortcomings of conventional approaches. They describe how SVM and Naive Bayes were used in earlier research to identify fake reviews and how their own approach is superior to these earlier ones. The authors also describe how the suggested method was tested on a sizable dataset and how it was able to identify fraudulent reviews with a high degree of accuracy. The findings show how well the suggested approach works at spotting fake reviews and emphasise its potential for practical use. Overall, the work surveys fake review identification at present and demonstrates how the suggested approach makes a substantial addition to the area.

III. METHODOLOGY
To sell products, people post unjustified positive reviews about them. Sometimes fake reviews are also written against competing items in an effort to harm their reputation. Some of these are non-reviews, which do not express any views about the goods. It might be challenging to predict the nature of someone's opinion when they make contradicting assertions. In a poor review, there might be a concealed positive meaning.

Nowadays people use online e-commerce sites to write reviews in their own way from their respective accounts. Additionally, opinions regarding a product can occasionally be both favourable and unfavourable. Given all of these difficulties, it becomes increasingly hard to identify reviews that are fake or that are being exploited to sway consumer opinion. Since consumers these days rely heavily on opinions and reviews, e-commerce sites and other service providers face a huge challenge with opinion spamming, which review detection can help address.

The review system categorizes reviews into fraud reviews and no-fraud reviews in order to identify such spammed fake reviews and address the major issue that online websites confront due to opinion spamming. Using the Naive Bayes, logistic regression, SVM, and Decision Tree algorithms, this method aims to classify the collected reviews more accurately. Only a few of the accessible datasets, drawn from many sources and categories, are considered. In addition to review depth, other aspects are employed to boost accuracy, such as comparing training and test accuracy, detecting product review type and ratio, and detecting product review type with the overall score. This supervised learning method uses various machine learning algorithms to identify fraud reviews and no-fraud reviews.
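The supervised labelling step described in the methodology can be illustrated with a minimal sketch. The block below is a pure-Python multinomial Naive Bayes classifier that labels a review as fake or genuine from its words. It is an illustration only and not the authors' implementation: the class name NaiveBayesReviewClassifier, the tokenizer, the add-one smoothing, and the four toy reviews are assumptions introduced here, and the other three classifiers named in the paper (logistic regression, SVM, decision tree) are not shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a review and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesReviewClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # reviews per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = sum(self.class_counts.values())
        return self

    def log_scores(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for label, n_docs in self.class_counts.items():
            n_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / self.total)  # log prior
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data, invented for this sketch (not from the paper's dataset).
train_texts = [
    "best product ever buy now amazing amazing deal",
    "amazing best best buy now five stars",
    "battery lasts two days and the screen is sharp",
    "the strap broke after a week but support replaced it",
]
train_labels = ["fake", "fake", "genuine", "genuine"]

model = NaiveBayesReviewClassifier().fit(train_texts, train_labels)
prediction_a = model.predict("amazing deal buy now")
prediction_b = model.predict("the screen is sharp and the battery lasts")
print(prediction_a, prediction_b)
```

With real data, the same fit and predict steps would be run on a training split and evaluated on a held-out test split, and a library implementation would normally replace this hand-written class.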
happy it made them rather than describing the nature of the product or what it accomplishes. Analysis of the review's sentiment contributes to determining whether it is authentic or fraudulent.

Finally, the outcomes of the four algorithms are obtained and compared; SVM has higher accuracy than the other algorithms, namely Naive Bayes, Logistic Regression, and the Decision Tree classifier.

IV. RESULTS
A. Generating accuracy for the trained and test results
The classification report of the four algorithm models is based on the review dataset.

Table 2: classification report of precision and recall
S.no  Model                Precision  Recall
1     Naive Bayes          0.98       1.00
2     SVM                  0.96       0.98
3     Logistic regression  0.92       0.99
4     DTC                  0.95       0.98

Table 2 gives the classification report of the four models in terms of precision and recall. Naive Bayes obtains the highest precision and recall. The report summarises the metric values for each model.

Table 3: classification report of F1-score and support
S.no  Model                F1-score  Support
1     Naive Bayes          0.98      195
2     SVM                  0.97      192
3     Logistic regression  0.97      192
4     DTC                  0.94      194

Table 3 gives the classification report of the four models in terms of F1-score and support. Naive Bayes obtains the highest F1-score.

C. Confusion matrix
The SVM classification algorithm performed better than the other models when the values are set to the review_Text and label attributes.
The resulting confusion matrix is shown in Fig 2.
Fig 2. Confusion Matrix

D. Classification metrics:
Classification report
Table 4: Classification Metrics for SVM
S.no  Metrics       Precision  Recall  F1-score  Support
1     Accuracy                         0.95      202
2     Macro_avg     0.58       0.55    0.56      202
3     Weighted_avg  0.93       0.95    0.94      202

Table 4 shows the classification report of the SVM algorithm, with the accuracy, macro_avg, and weighted_avg metric values.

E. Trained and Test accuracy results:
The bar plots show the training and test accuracy of the four algorithm models.
Figure 3 shows the results for the training data and test data from the dataset, visualizing the accuracy of the Naive Bayes, SVM, Logistic regression, and decision tree classifier methods.
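The metrics reported in Tables 2 to 4 (precision, recall, F1-score, support, macro average, and weighted average) all derive from confusion-matrix counts. The sketch below shows how they can be computed and why a macro average can fall well below a weighted average, as in Table 4 (0.56 against 0.94). One common cause of such a gap is class imbalance; whether that applies to the authors' dataset is an assumption. The function names and the 20-review test set are hypothetical and are not taken from the paper.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn), treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def averaged_f1(y_true, y_pred):
    """Macro average treats classes equally; weighted average uses support."""
    support = Counter(y_true)
    f1_by_class = {
        c: precision_recall_f1(y_true, y_pred, c)[2] for c in support
    }
    macro = sum(f1_by_class.values()) / len(f1_by_class)
    weighted = sum(f1_by_class[c] * n for c, n in support.items()) / len(y_true)
    return macro, weighted


# Hypothetical imbalanced test set: 18 genuine reviews and 2 fake reviews.
y_true = ["genuine"] * 18 + ["fake"] * 2
# The classifier misses one of the two fake reviews.
y_pred = ["genuine"] * 18 + ["genuine", "fake"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
fake_precision, fake_recall, fake_f1 = precision_recall_f1(y_true, y_pred, "fake")
macro_f1, weighted_f1 = averaged_f1(y_true, y_pred)
print(accuracy, fake_recall, round(macro_f1, 3), round(weighted_f1, 3))
```

In this toy case a single missed fake review halves the recall of the minority class, which pulls the macro average down while the accuracy and the weighted average stay high. For that reason per-class figures for the fake class are usually more informative than overall accuracy when fake reviews are rare.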
Fig 5: product review ratio

REFERENCES
[1] Luhua Jin, Songkai Tang and Fan Cheng, "Online Product Review Systems via Heterogeneous Graph Transformer," IEEE Entry, vol. 9, 2022.
[2] J. Wang, H. Kan, F. Meng, Q. Mu, G. Shi and X. Xiao, "Fake review detection based on multiple feature fusion and rolling collaborative training," IEEE Access, vol. 8, pp. 182625-182639, 2020.
[3] G. Fei, A. Mukherjee, B. Liu, M. Hsu, M. Castellanos and R. Ghosh, "Exploiting Burstiness in Reviews for Review Spammer Detection," vol. 7, 2021.
[4] S. Tang, L. Jin and F. Cheng, "Fraud Detection in Online Product Review Systems via Heterogeneous Graph Transformer," in IEEE Access, vol. 9, pp. 167364-167373, 2021, doi: 10.1109/ACCESS.2021.3084924.
[5] Hina Tufail, M. Usman Ashraf, Khalid Alsubhi and Hani Moaiteq Aljahdali, "The Effect of Fake Reviews on e-Commerce During and After Covid-19 Pandemic," IEEE Access, vol. 10.
[6] Z. Liu, Y. Dou, P. S. Yu, Y. Deng and H. Peng, "Alleviating the inconsistency problem of applying graph neural network to fraud detection," in SIGIR, 2020.
[7] J. Wang, H. Kan, F. Meng, Q. Mu, G. Shi and X. Xiao, "Fake review detection based on multiple feature fusion and rolling collaborative training," IEEE Access, vol. 8, pp. 182625-182639, 2020.
[8] Y. Wu, E. W. T. Ngai, P. Wu and C. Wu, "Fake online reviews: Literature review synthesis and directions for future research," Decis. Support Syst., vol. 132, May 2020.
[9] J. Salminen, C. Kandpal, A. M. Kamel, S.-G. Jung and B. J. Jansen, "Creating and detecting fake reviews of online products," J. Retailing Consum. Services, vol. 64, Jan. 2022.
[10] R. Mohawesh, S. Xu, S. N. Tran, R. Ollington, M. Springer, Y. Jararweh, et al., "Fake reviews detection: A survey," IEEE Access, vol. 9, pp. 65771-65802, 2021.
[11] D. j. S. K. Sayam Kumar Yashika Goyal, "Fake Reviews Filtering System Using Supervised Machine Learning," IEEE, vol. 9, no. 14 October 2022, p. 10, 2022.
[12] S. Alaa, M. A. Farooq and M. Younas, "Deep Learning Approaches for Fraud Detection: A Comprehensive Review," in IEEE, vol. 6, no. 14, 2020.
[13] S. Shehnepoor, R. Togneri, W. Liu and M. Bennamoun, "ScoreGAN: A Fraud Review Detector Based on Regulated GAN With Data Augmentation," in IEEE Transactions on Information Forensics and Security, vol. 17, pp. 280-291, 2022, doi: 10.1109/TIFS.2021.3139771. S. Bagga, A. Goyal, N. Gupta and A. Goyal, "Credit Card Fraud Detection using Pipeling and Ensemble Learning," Procedia Comput. Sci., vol. 173, pp. 104-112, 2020.
[14] L. P. Pracidelli and F. S. Lopes, "Fraud identification architecture using data mining and machine learning in a private transport company that operates by applications," 2020 15th Iberian Conference on Information Systems and Technologies (CISTI), Seville, Spain, 2020, pp. 1-6, doi: 10.23919/CISTI49556.2020.9140992.
[15] M. N. Ashtiani and B. Raahemi, "Intelligent Fraud Detection in Financial Statements Using Machine Learning and Data Mining: A Systematic Literature Review," in IEEE Access, vol. 10, pp. 72504-72525, 2022, doi: 10.1109/ACCESS.2021.3096799.
[16] G. J. Priya and S. Saradha, "Fraud Detection and Prevention Using Machine Learning Algorithms: A Review," 2021 7th International Conference on Electrical Energy Systems (ICEES), Chennai, India, 2021, pp. 564-568, doi: 10.1109/ICEES51510.2021.9383631.
[17] C. G. Harris, "Detecting Fake Yelp Reviews Using a Positional LSTM / K-L Divergence Ensemble Approach," 2022 1st International Conference on Information System & Information Technology (ICISIT), Yogyakarta, Indonesia, 2022, pp. 61-66, doi: 10.1109/ICISIT54091.2022.9872788.
[18] B. Al Smadi and M. Min, "A Critical review of Credit Card Fraud Detection Techniques," 2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 2020, pp. 0732-0736, doi: 10.1109/UEMCON51285.2020.9298075.
[19] S. Vyas and S. Serasiya, "Fraud Detection in Insurance Claim System: A Review," 2022 Second International Conference on Artificial Intelligence and Smart Energy (ICAIS), Coimbatore, India, 2022, pp. 922-927, doi: 10.1109/ICAIS53314.2022.9742984.
[20] D. H. Bhatt and A. Meniya, "A Review on Machine Learning Methods for Credit Card Fraud Classification," 2022 Second International Conference on Artificial Intelligence and Smart Energy (ICAIS), Coimbatore, India, 2022, pp. 312-318, doi: 10.1109/ICAIS53314.2022.9743014.
[21] Kanika and J. Singla, "A Survey of Deep Learning based Online Transactions Fraud Detection Systems," 2020 International Conference on Intelligent Engineering and Management (ICIEM), London, UK, 2020, pp. 130-136, doi: 10.1109/ICIEM48762.2020.9160200.
[22] V. Gupta, A. Aggarwal and T. Chakraborty, "Detecting and Characterizing Extremist Reviewer Groups in Online Product Reviews," in IEEE Transactions on Computational Social Systems, vol. 7, no. 3, pp. 741-750, June 2020, doi: 10.1109/TCSS.2020.2988098.
[23] B. Conlin and U. Ruhi, "Current Research Landscape of Machine Learning Algorithms in Online Identity Fraud Prediction and Detection," 2021 IEEE International Conference on Technology Management, Operations and Decisions (ICTMOD), Marrakech, Morocco, 2021, pp. 1-6, doi: 10.1109/ICTMOD52902.2021.9739308.
[24] R. S. Solitario, "Fake Delivery Bookings In Context-Aware Food Delivery Systems: A Literature And Mobile Apps Review," 2021 1st International Conference in Information and Computing Research (iCORE), Manila, Philippines, 2021, pp. 1-5, doi: 10.1109/iCORE54267.2021.00019.
[25] S. Shehnepoor, R. Togneri, W. Liu and M. Bennamoun, "HIN-RNN: A Graph Representation Learning Neural Network for Fraudster Group Detection With No Handcrafted Features," in IEEE Transactions on Neural Networks and Learning Systems, doi: 10.1109/TNNLS.2021.3123876.
[26] J. Zhou, Y.-F. Liu and H.-L. Sun, "A Reputation Ranking Method based on Rating Patterns and Rating Deviation," 2022 5th International Conference on Data Science and Information Technology (DSIT), Shanghai, China, 2022, pp. 1-6, doi: 10.1109/DSIT55514.2022.9943923.
[27] K. Joshi, S. Kumar, J. Rawat, A. Kumari, A. Gupta and N. Sharma, "Fraud App Detection of Google Play Store Apps Using Decision Tree," 2022 2nd International Conference on Innovative Practices in Technology and Management (ICIPTM), Gautam Buddha Nagar, India, 2022, pp. 243-246, doi: 10.1109/ICIPTM54933.2022.9754207.
[28] S. Shehnepoor, R. Togneri, W. Liu and M. Bennamoun, "ScoreGAN: A Fraud Review Detector Based on Regulated GAN With Data Augmentation," in IEEE Transactions on Information Forensics and Security, vol. 17, pp. 280-291, 2022, doi: 10.1109/TIFS.2021.3139771.
[29] C. G. Harris, "Combining Linguistic and Behavioral Clues to Detect Spam in Online Reviews," 2022 IEEE International Conference on e-Business Engineering (ICEBE), Bournemouth, United Kingdom, 2022, pp. 38-44, doi: 10.1109/ICEBE55470.2022.00017.
[30] P. Rathore, J. Soni, N. Prabakar, M. Palaniswami and P. Santi, "Identifying Groups of Fake Reviewers Using a Semisupervised Approach," in IEEE Transactions on Computational Social Systems, vol. 8, no. 6, pp. 1369-1378, Dec. 2021, doi: 10.1109/TCSS.2021.3085406.
[31] C. Silpa, G. Niranjana and K. Ramani, "Fraud detection of review using classification models- An Extensive Study," in G. Manogaran, A. Shanthini and G. Vadivu (eds), Proceedings of International Conference on Deep Learning, Computing and Intelligence, Advances in Intelligent Systems and Computing, vol. 1396, Springer, Singapore, 2022.
[32] Mohammed Inayathulla and C. Silpa, "An Approach to Reduce fraud review spam by Using Machine Learning Techniques," International Journal of Computer Science and Network Security (IJCSNS), vol. 15, no. 9, p. 99, 2015.
[33] V. Jyothsna, D. R. Kumar Raja, G. Hemanth Kumar and Dileep Chnadra E, "A Novel Manifold approach for fraud detection," Gongcheng Kexue Yu Jishu/Advanced Engineering Science, vol. 54, no. 02, pp. 2043-2076, 2022.
[34] V. Jyothsna, K. M. Prasad, K. Rajiv et al., "Review based system using ensemble classifier with Feature Impact Scale," Cluster Comput., vol. 24, pp. 2461-2478, 2021.