Final Report
the degree
BACHELOR OF TECHNOLOGY
In
by
Session 2022-23
DECLARATION
I hereby declare that the project entitled “Fake Customer Review Detection System” has been
prepared by me under the supervision of Mr. Vijendra Pratap Singh, Assistant Professor/ Associate
Professor, Department of Computer Science & Engineering, Institute of Technology and
Management, Gida, Gorakhpur, U.P. The contents of this project report have not been submitted to
any other college/university for the award of any degree.
Student Name- Abhishek Kumar Roy, Devsharan Singh, Imran Raeeni, Izmamul Ansari Roll
No. - 1901200100003, 1901200100044, 1901200100055, 1901200100058
Pin-273209
Date:
CERTIFICATE
I certify that Abhishek Kumar Roy, Devsharan Singh, Imran Raeeni, Izmamul Ansari have carried
out this project work entitled “Fake Customer Review Detection System”, for the award of B.Tech
degree in Computer Science & Engineering from Institute of Technology and Management, under
my supervision. They have carried out the work at the Department of Computer Science & Engineering,
Institute of Technology and Management, Gida, Gorakhpur.
Assistant Professor
Pin- 273209
Date:
ACKNOWLEDGEMENT
I would like to thank my project guide “Mr. Vijendra Pratap Singh”, Assistant Professor/Associate
Professor, Department of Computer Science & Engineering, Institute of Technology and
Management, Gida, Gorakhpur, U.P. for his valuable guidance and suggestions.
I would like to thank my project coordinator “Dr. Harvendra Kumar Patel”, Associate Professor,
Department of Computer Science & Engineering, Institute of Technology and Management, Gida,
Gorakhpur, U.P. for his valuable guidance and suggestions. I am thankful for his continual
encouragement, support, and invaluable suggestions. Without his encouragement and guidance, this
project would not have materialized. Throughout the writing of the project, I have received a
great deal of support and assistance.
I am very thankful to the HoD, Department of Computer Science and Engineering, for his kind
cooperation.
I would also like to thank the Honourable Director Sir for his kind help and support.
I would also like to thank all my friends, who continuously supported me. I want to express my
appreciation to every person who contributed either inspiration or actual work to this project.
Finally, I must express my very profound gratitude to my parents. Thanks to all.
Student Name- Abhishek Kumar Roy, Devsharan Singh, Imran Raeeni, Izmamul Ansari Roll
No. -1901200100003, 1901200100044, 1901200100055, 1901200100058
ABSTRACT
Nowadays, online shopping has become a daily activity. Before buying a product from an e-commerce
company such as Flipkart or Amazon, a customer checks the reliability of the product. Reviews are
one of the most important ways to check that reliability, so customers read the reviews posted by
other customers before buying. Reviews may be positive or negative, and they may also be fake. If a
customer buys a product on the strength of a fake review and the product is genuinely good, there
is no problem; otherwise, the product loses its reliability. We perform sentiment analysis on
restaurant reviews to find the number of correct and the number of wrong predictions made by the
classifier, which further helps to classify reviews as real or fake. The techniques used in our
project are Natural Language Processing (NLP) together with the Support Vector Machine (SVM) and
Naive Bayes classifiers. The measured results of our experiments show that the SVM algorithm
outperforms the other algorithms, and that it reaches the highest accuracy not only in text
classification but also in detecting fake reviews.
LIST OF ABBREVIATIONS
TABLE OF CONTENTS
DECLARATION ................................................................ ii
CERTIFICATE ................................................................ iii
ACKNOWLEDGEMENT ............................................................ iv
ABSTRACT ................................................................... v
LIST OF ABBREVIATIONS ...................................................... vi
TABLE OF CONTENTS .......................................................... vii
LIST OF FIGURES ............................................................ viii
LIST OF TABLES ............................................................. ix
CHAPTER 1 .................................................................. 1
INTRODUCTION ............................................................... 1
1.1 OVERVIEW ............................................................... 1
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1
INTRODUCTION
1.1 OVERVIEW
Online purchasing is rising steadily, since every service or product is easily accessible. Sellers
are receiving more feedback on their businesses. Some people, often out of frustration, mislead
others by posting false comments to promote or damage the image of a specific product or service as
they wish. Such people are known as opinion spammers, and the false reviews they post are
considered fake reviews. Although customer reviews can be beneficial, naive confidence in them is
unsafe for both buyers and sellers. Many consumers read reviews before making any online purchase.
Moreover, reviews can be deliberately misleading for someone's benefit or profit, so any buying
decision that relies on online reviews should be taken carefully.
Our work is mainly directed at sentiment analysis (SA) at the document level, more specifically on
a movie reviews dataset. Machine learning techniques and SA methods are expected to have a major
positive effect, especially on the detection of fake reviews in restaurant reviews, e-commerce,
social commerce environments, and other domains. In machine learning-based techniques, algorithms
such as the Support Vector Machine (SVM) and Naive Bayes (NB), together with Natural Language
Processing (NLP), are applied for classification. SVM is a supervised machine learning algorithm
that is both a successful prediction approach and a robust classification approach.
The main goal of our study is to classify restaurant reviews as real or fake using SA algorithms
with supervised learning techniques.
1.2 MOTIVATION
The focus of this research is to help create an online e-commerce environment in which consumers
can trust that the products they purchase are genuine and that the feedback posted on these
websites and applications is true and is checked regularly by the company. As the number of users
increases day by day, companies such as Twitter, WhatsApp, and Facebook use sentiment analysis to
check for fake news and harmful or derogatory posts and to ban such users and organizations from
their platforms. The same need applies to e-commerce (Flipkart, Amazon), hotel booking (Trivago),
logistics, tourism (TripAdvisor), job search (LinkedIn, Glassdoor), food (Swiggy, Zomato), etc.
Nowadays, online buyers are very aware of and sensitive to product reviews. Before buying any
product from an e-commerce website, they usually read its reviews and ratings. That is why it is
necessary for e-commerce website owners to keep watch on product reviews and descriptions. Users
tend to blame the e-commerce website, rather than the product's manufacturer, if it sells products
with bad reviews, which may ruin the reputation of the website's brand. Sometimes competitors post
fake reviews to improve their own sales. Hence, it becomes important for e-commerce website owners
to detect fake product reviews and remove them from the portal by means of sentiment analysis and
Natural Language Processing.
For such e-commerce website owners, we propose a system for fake review prediction using machine
learning.
1.3 OBJECTIVES
The design process turns a user-oriented description of the input into a computer system. This
design is crucial to prevent data entry errors and to show the proper way of operating the computer
system so that it receives the right information.
User-friendly interfaces are provided for data entry so that large volumes of data can be handled.
The objective of the input design is to facilitate data entry and keep it free of errors. The data
entry screen is designed so that all data handling can be performed from it. It also offers
document viewing facilities.
When data are entered, their validity is checked. Data can be entered with the aid of the screens.
Suitable notifications are supplied so that the user is not left in a maze. The goal of the input
design is thus an input layout that is easy to follow.
1.4 ANALYSIS MODEL
We use the waterfall model:
1.4.1 System Design:
In the system design phase we design a system that is easily understood by the end user, i.e. one
that is user friendly. We prepare UML diagrams and a data flow diagram to understand the system
flow, the system modules, and the sequence of execution.
1.4.2 Implementation:
In the implementation phase of our project we implemented the various modules required to obtain
the expected outcome at each module level. With inputs from the system design, the system is first
developed in small programs called units, which are integrated in the next phase.
1.4.3 Maintenance:
There are some issues which come up in the client environment. To fix those issues, patches are
released. Also, to enhance the product, better versions are released. Maintenance is done to
deliver these changes in the customer environment.
All these phases are cascaded, so that progress is seen as flowing steadily downwards, like a
waterfall, through the phases. The next phase is started only after the defined set of goals is
achieved for the previous phase and it is signed off, hence the name “Waterfall Model”. In this
model, phases do not overlap.
As more people can access services and goods online, online shopping is gradually increasing. More
individuals are responding to what sellers say about their businesses, and some people, out of
irritation, mislead others by spreading untrue information in an effort to promote or harm the
reputation of a particular product or service. These individuals are referred to as opinion
spammers, and the phony testimonials they leave are regarded as fake reviews. Although customer
reviews may be useful, placing naive trust in them could be dangerous for both consumers and
sellers. Before making any online purchase, many shoppers read reviews. Additionally, reviews might
be deceptive in order to gain additional advantage or profit, so any purchasing decision based on
online reviews should be carefully considered.
Opinion spamming comes in a few distinct forms. One type promotes certain products with
undeservedly favourable reviews, or harms the reputation of other products with misleading or
unfavourable reviews. The second type consists of advertisements that express no opinion about the
product. There has been a lot of research in the area of sentiment analysis, and models have been
developed by applying multiple sentiment analyses to data from diverse sources. However, the main
emphasis of that research is the algorithms themselves rather than the actual identification of
fake reviews.
Our work is mainly directed at SA at the document level, more specifically on a movie reviews
dataset. Machine learning techniques and SA methods are expected to have a major positive effect,
especially on the detection of fake reviews in restaurant reviews, e-commerce, social commerce
environments, and other domains. In machine learning-based techniques, algorithms such as SVM and
NB, together with NLP, are applied for classification. SVM is a supervised machine learning
algorithm that is both a successful prediction approach and a robust classification approach.
The system offers a GUI environment that makes it simple to insert
data into the database via forms. The application must be web-based and compatible with any browser
(mobile browser, PC browser). A database will serve as the information store. The program's user
interface is simply the standard Windows user interface; nothing more is needed. 99.9% of all new
users should be able to use the proposed system without any help, thanks to its intuitive user
interface. The PC on which this software will be installed must have Windows 7 and Python IDE
version 1.8 or higher. Python version 3 or higher must be installed on that Windows platform, which
is the platform used to run the software. The Python IDE and Microsoft SQL Server will exchange
data.
The frequency and fidelity of information flow inside an organisation is defined by its
communication architecture. It aids in structuring communication patterns, both within and across
departments. Each business will have its own unique set of strategies, but they all call for
proactive planning and investment.
In e-commerce, user reviews play a significant role in determining
an organization's profitability. Online users read user reviews before selecting a new product or
service. Because it directly affects a company's reputation and bottom line, the reliability of
online reviews is crucial for organizations. As a result, some businesses pay spammers to generate
fake reviews. These false reviews have an impact on consumers' purchasing decisions. Numerous
studies on how to spot fake reviews have been conducted in recent years. However, a survey that
assesses and summarizes the existing approaches is still lacking. The task of fake review detection
is described in the survey by Mohawesh et al. [1], which summarizes the existing datasets and their
collection methods.
A second survey reviews more than ten research papers that provide diverse approaches to effective
fake user identification through the use of machine learning techniques. For each research topic,
there is a paper outlining the benefits of identifying issues early on and a paper outlining the
disadvantages of the earlier paper [2].
Customers consider online reviews carefully before purchasing a good or service. Reviews are the
primary source of information, drawn from previous customers' experience, about the features of the
service we intend to acquire. A data set of hotel reviews has been used to introduce a number of
machine learning techniques, including Naive Bayes, Support Vector Machine, and Decision Tree, for
sentiment analysis of review content and the detection of fraudulent online reviews. Sentiment
analysis is currently one of the most interesting areas of text analysis. Using sentiment analysis,
we can also separate positive from negative reviews [3].
Another proposed method uses phase-wise processing to categorize user reviews into suspicious,
fake, positive, and negative categories. That study uses a variety of data mining approaches to
process hotel reviews. Additionally, user reviews are divided into positive and negative categories
so that customers may use them to decide which products to buy. Service providers can track
customer opinions by carefully examining reviews. Negative reviews of a target object may lead to
lesser demand and a decrease in sales. Fake or fraudulent reviews are deliberately written to trick
potential customers, either to promote or hype a target or to defame its reputation. Our work is
aimed at identifying whether a review is fake or truthful [4].
The classifiers used in our project are Support Vector Machine (SVM) and Naive Bayes. The measured
results of our experiments show that the SVM algorithm outperforms the other algorithms, and that
it reaches the highest accuracy not only in text classification but also in detecting fake reviews
[5].
1.4.4 Decision Trees algorithm
The decision tree is one of the most effective and popular techniques for classification and
prediction. A decision tree is a tree structure that resembles a flowchart, where each internal
node represents a test on an attribute, each branch represents an outcome of the test, and each
leaf node (terminal node) holds a class label [2].
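The flowchart-like structure described above can be illustrated with a minimal sketch. The tree below is built by hand rather than learned from data, and the attribute names (`extreme_rating_ratio`, `review_length`) and thresholds are hypothetical values chosen only for illustration; they are not taken from the project's experiments.

```python
# Hypothetical hand-built decision tree. Attribute names and thresholds are
# illustrative assumptions, not values learned from the project's data.
tree = {
    "attribute": "extreme_rating_ratio",   # internal node: test on an attribute
    "threshold": 0.9,
    "below": {"label": "real"},            # leaf node: class label
    "above": {
        "attribute": "review_length",
        "threshold": 20,
        "below": {"label": "fake"},
        "above": {"label": "real"},
    },
}


def classify(node, sample):
    """Walk from the root to a leaf, following one branch per attribute test."""
    while "label" not in node:
        value = sample[node["attribute"]]
        branch = "below" if value < node["threshold"] else "above"
        node = node[branch]
    return node["label"]


short_extreme_review = {"extreme_rating_ratio": 1.0, "review_length": 5}
balanced_review = {"extreme_rating_ratio": 0.5, "review_length": 5}
```

Calling `classify(tree, short_extreme_review)` follows the "above" branch at the root and the "below" branch at the second test, ending at a leaf. In practice the tests and thresholds would be learned from labelled reviews rather than written by hand.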
A random forest combines the results of several decision trees to produce a single outcome. Its
widespread use is motivated by its adaptability and ease of use, since it can handle both
classification and regression problems [2].
In statistics, naive Bayes classifiers are a family of straightforward "probabilistic classifiers"
based on the application of Bayes' theorem with strong (naive) independence assumptions between the
features. Despite being among the simplest Bayesian network models, they can reach high levels of
accuracy when used in conjunction with kernel density estimation. The number of parameters required
for naive Bayes classifiers is linear in the number of variables (features/predictors) in a
learning problem, which makes them extremely scalable. Rather than using an expensive iterative
approximation, which is how many other types of classifiers are trained, maximum-likelihood
training can be accomplished by evaluating a closed-form expression, which takes linear time.
Simple Bayes and independent Bayes are two other names for naive Bayes models found in the
statistics literature [1].
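The closed-form, linear-time training described above can be sketched as a simple counting pass. The following is a minimal multinomial naive Bayes with Laplace (add-one) smoothing, written with the standard library only. The toy documents, the labels, and the function names are assumptions made for illustration; this is not the project's actual classifier or data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Closed-form training: a single counting pass over the data."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log-posterior for the given text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)          # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue                              # ignore unseen words
            # Laplace smoothing avoids zero probabilities; features are
            # treated as independent given the class (the "naive" assumption).
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, assumed for illustration only.
documents = [
    "great food great service",
    "tasty food friendly staff",
    "terrible food rude staff",
    "awful service terrible taste",
]
labels = ["positive", "positive", "negative", "negative"]
model = train_naive_bayes(documents, labels)
```

With this toy model, `predict(model, "great friendly service")` scores the positive class higher because those words occur more often in the positive documents. The same structure applies when the two classes are real and fake reviews.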
Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial
intelligence concerned with how computers interact with human language, in particular how to
program computers to process and analyse large volumes of natural language data. The ultimate goal
is to create a machine that can "understand" the contents of documents, including the linguistic
nuances that arise from their context. The system can then accurately extract the knowledge and
insights contained in the documents as well as classify and organize the documents themselves [3].
1.4.9 Anaconda
Python is a high-level object-oriented programming language that is simple to learn, easy to use,
and flexible enough to do a variety of tasks (Helmus & Collis, 2016). Following its introduction in
1991 (Van Rossum & Drake, 1995), its open-source nature has significantly boosted its popularity,
and it is now recognized as one of the best programming languages to learn (Saabith, Fareez, &
Vinothraj, 2019). Python is accessible to everyone because of its low system requirements, free
availability, and cross-platform compatibility (Mac, Windows, and Linux). It already has a sizable
community made up of both regular users and eminent researchers who have produced fascinating
projects in a number of different sectors, such as data science, machine learning, artificial
intelligence, game and app development, and more. Because the open-source community constantly
works to enhance the language's capabilities, these projects are easy to find by simply adding the
word "Python" to any search. Additionally, this community provides a wealth of resources, such as
tutorial videos, source code, answers to commonly asked questions, courses, and much more. These
resources, which are typically free to use and cover all levels of complexity, address the problems
that developers most commonly run into [7].
1.4.10 NumPy
In the mid-1990s, an international team of volunteers started working on a data structure for
efficient array computation. This structure evolved into the modern N-dimensional NumPy array. The
NumPy package, which comprises the NumPy array and a variety of accompanying mathematical
functions, has been widely adopted in academia, government labs, and industry, with applications
ranging from gaming to space exploration. A NumPy array is a multidimensional, regular collection
of elements. An array is characterized by its shape and by the type of elements it contains. For
example, an array of shape (M, N) holding numbers, such as floating-point or complex numbers, could
be used to represent a matrix. Unlike matrices, NumPy arrays can have any number of dimensions.
They can also contain other kinds of elements (or even combinations of elements), such as dates or
Booleans. In effect, a NumPy array is just a convenient way to describe one or more blocks of
computer memory so that the numbers represented can be easily manipulated [8].
Four well-known machine learning classification techniques for identifying fake product reviews
will be examined using this system. Reviews that are not filtered can only receive ratings such as
"helpful," "cool," and "funny," which means that once reviews are filtered, their opinions are
buried and cannot be rated by others. Because an unbalanced dataset produced subpar results in our
experiment, the imbalance must be handled. We found during the experiment that Gaussian Naive Bayes
consistently produced poor test scores, while SVM took the longest to train. In our opinion, we
cannot say that the reviews filtered by the Yelp recommendation system are 100% fake, because there
are still other factors that may lead a machine learning model to a false prediction. Another
potentially reliable technique for filtering reviews is the verified-buyer method, which some
crowdsourced websites already use.
The design process transforms a user-oriented description of the input into a computer system. This
design is essential to prevent data entry errors and to show the proper way of operating the
computer system so that it receives accurate information. User-friendly interfaces are provided for
data entry so that large volumes of data can be handled. The purpose of the input design is to
facilitate data entry and keep it free of errors. The data entry screen is designed so that all
data handling can be performed from it. It also provides options for viewing documents. When data
are entered, their validity is checked. Data can be entered with the help of the screens. The user
is given appropriate messages so that they are not left in a maze. The goal of the input design is
a simple input layout that is easy to follow.
The extreme rating ratio of the reviewer [10], [11] is also an interesting feature. A fake reviewer
will always give either 1 or 5 stars to convince people of their opinion. Accordingly, we
calculated the extreme rating (1 star or 5 stars) ratio for every reviewer and used the ratio as
one feature of every review. For each unique reviewer, the ratio was computed by dividing the
number of extreme ratings (1 or 5) given by the reviewer by the reviewer's overall number of
reviews. We calculated this value for all unique reviewers and attached it to every review written
by the corresponding reviewer.
There is no good existing solution for differentiating fake products from original products. ML
technology can help to tackle such problems. The project's main goal is to help people identify
whether a product is original or fake using its reviews. We propose a fake product detection
system, built with this technology as a web-based application, for the detection of counterfeit
products. The proposed system supports the detection of fake products in day-to-day life. It
consists of three main parts: the customer (user) web-based application, the manufacturer's
(company's) web-based application, and the database. The first application is the manufacturer or
company side application, in which we first have to register. After registering and logging in to
the application, we have several options. One option is to add a product, in which the manufacturer
can enter the product details. Another option is to show orders, in which the manufacturer can see
customers' order details and then decide whether to accept or reject each order. The manufacturer
can also see whether the product has been delivered. The second application is the customer
application, in which we first have to register, after which we can log in using an ID and
password. In this application, there is an option to show products where customers can see the
product
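The extreme rating ratio feature described earlier in this section can be sketched as follows. The `(reviewer_id, stars)` tuple layout, the function name, and the sample data are assumptions made for illustration; the project's actual data schema is not given in the text.

```python
from collections import defaultdict


def extreme_rating_ratios(reviews):
    """Map each reviewer to (# of 1-star or 5-star reviews) / (# of reviews)."""
    totals = defaultdict(int)
    extremes = defaultdict(int)
    for reviewer_id, stars in reviews:
        totals[reviewer_id] += 1
        if stars in (1, 5):
            extremes[reviewer_id] += 1
    return {reviewer: extremes[reviewer] / totals[reviewer] for reviewer in totals}


# Hypothetical sample: (reviewer_id, star rating) pairs.
reviews = [
    ("u1", 5), ("u1", 1), ("u1", 5),
    ("u2", 3), ("u2", 5), ("u2", 4), ("u2", 2),
]
ratios = extreme_rating_ratios(reviews)

# Attach the reviewer-level ratio to every review written by that reviewer.
features = [(reviewer, stars, ratios[reviewer]) for reviewer, stars in reviews]
```

Here reviewer `u1` gives only extreme ratings, while `u2` gives one extreme rating out of four, so each of their reviews carries a different value of the feature.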
1.5.1 Mathematical Model
The system is represented by five phases, each of which works with its own dependency. The system
is S = (Q, Σ, δ, q0, F), where
Q = {Via Set [i = 0, ..., n]}, the set of generated attributes of various reviews, taken as the
initial set
Σ = {data conversion, save in DB}
δ = {Correctly classified instances * 100 / Sum(x)}
q0 = {First event generated by sensor function, Σ i = 0}
F = {Generated report according to class [a, b, c, ..., n]}
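The expression given for δ above reads as a percentage accuracy. A minimal sketch follows, under the assumption that Sum(x) denotes the total number of classified instances (the text does not define it); the function name and sample labels are hypothetical.

```python
def accuracy_percent(true_labels, predicted_labels):
    """Correctly classified instances * 100 / total number of instances.

    Assumption: Sum(x) in the model is taken to be the total instance count.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one instance is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct * 100 / len(true_labels)


# Hypothetical labels for four reviews.
actual = ["fake", "real", "real", "fake"]
predicted = ["fake", "real", "fake", "fake"]
score = accuracy_percent(actual, predicted)
```

Three of the four hypothetical predictions match the actual labels, so the number of wrong predictions is one and the score is the corresponding percentage.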
A data flow diagram (DFD) is a graphical representation of the "flow" of data through an
information system, modelling its process aspects. A DFD is often used as a preliminary step to
create an overview of the system, which can later be elaborated. DFDs can also be used for the
visualization of data processing.
DFD Level 0
DFD Level 1
1.5.3 ER Diagram
An Entity Relationship Diagram, also known as an ERD, ER Diagram, or ER Model, is a type of
structural diagram for use in database design. An ERD contains different symbols and connectors
that visualize two important pieces of information: the major entities within the system scope, and
the inter-relationships among those entities.
Figure 4.4: ER Diagram
1.5.4 Class Diagram
The class diagram is the main building block of object-oriented modelling. The classes in a class
diagram represent both the main elements and interactions in the application and the classes to be
programmed. In the design of a system, a number of classes are identified and grouped together in a
class diagram that helps to determine the static relations between them. With detailed modelling,
the classes of the conceptual design are often split into a number of subclasses.
A use case diagram at its simplest is a representation of a user's interaction with the system that
shows the relationship between the user and the different use cases in which the user is involved.
The use cases are represented by circles or ellipses. A key concept of use case modelling is that
it helps us design a system from the end user's perspective.
Figure 4.6: Use Case Diagram
A sequence diagram simply depicts the interaction between objects in sequential order, i.e. the
order in which these interactions take place. We can also use the terms event diagrams or event
scenarios to refer to a sequence diagram. Sequence diagrams describe how, and in what order, the
objects in a system function. These diagrams are widely used by businessmen and software developers
to document and understand requirements for new and existing systems.
An activity diagram is basically a flow chart that represents the flow from one activity to
another. An activity can be described as an operation of the system. The main element of an
activity diagram is the activity itself. An activity is a function performed by the system. After
identifying the activities, we need to understand how they are associated with constraints and
conditions.
1.5.8 Component Diagram
CHAPTER 2
LITERATURE REVIEW
2.1 LITERATURE SURVEY
1. Rami Mohawesh, Shuxiang Xu, Son N. Tran, Robert Ollington, Matthew Springer, Yaser
Jararweh, and Sumbal Maqsood, “Fake Reviews Detection: A Survey” [1]. In e-commerce, user
reviews can play a significant role in determining the revenue of an organization. Online users
rely on reviews before making decisions about any product or service. As such, the credibility of
online reviews is crucial for businesses and can directly affect companies' reputation and
profitability. That is why some businesses pay spammers to post fake reviews. These fake reviews
exploit consumer purchasing decisions. Consequently, techniques for detecting fake reviews have
been extensively explored in the past twelve years. However, a survey that analyses and summarizes
the existing approaches is still lacking. To bridge this gap, this survey paper details the task of
fake review detection, summing up the existing datasets and their collection methods. It analyses
the existing feature extraction techniques. It also summarizes and critically analyses the existing
techniques to identify gaps based on two groups: traditional statistical machine learning and deep
learning methods. Further, the authors conduct a benchmark study to investigate the performance of
different neural network models and transformers that have not yet been used for fake review
detection. The experimental results on two benchmark datasets show that RoBERTa performs about 7%
better than the state-of-the-art methods in a mixed domain for the deception dataset, with the
highest accuracy of 91.2%, which can be used as a baseline for future studies. Finally, the paper
highlights the current gaps in this research area and the possible future directions.
2. Pilaka Anusha, Kaki Leela Prasad, “Survey on Fake Online Reviews using Machine Learning
Algorithms” [2]. This paper conducts a survey of several ML algorithms that are used to solve the
problem of spam review detection and examines the solutions proposed by several researchers. At
present there is a huge demand for sentiment analysis for fake online review detection using ML
algorithms, and a lot of research work is going on to identify fake online reviews, but no
technique is a complete success in identifying and preventing fake online reviews in an effective
manner. The survey reviews more than 10 research papers suggesting various methods adopted for
effective fake user detection using machine learning techniques. In each research topic, there is
one part discussing the importance of identifying tumour in early stages and another paper
discussing the cons of the previous paper.
3. Sabirakarim, Dr. Kiruthiga G, “Predicting Fake online Reviews using Machine Learning” [3].
Online reviews are very important in a customer's decision whether to purchase a product or
service. They are the main source of information, drawn from past customers' experience, about the
features of the service which we are going to purchase. This paper introduces some machine learning
techniques, such as Naïve Bayes, Support Vector Machine, and Decision Tree, for sentiment
classification of reviews and for detecting fake online reviews using a data set of hotel reviews.
Sentiment analysis has become one of the most interesting areas of text analysis. Using sentiment
analysis, we can separate negative and positive reviews as well.
4. Manias Bans ode, Siddhi Pradesh, Suyasha Ovhal, Pranali Shinde, Anand kumar Birajda, “Fake
Review Prediction and Review Analysis” [4]. The proposed method classifies user reviews into
suspicious, fake, positive, and negative categories by phase-wise processing. In this paper, hotel
reviews are processed using different data mining techniques. Moreover, the reviews obtained from
users are classified as positive or negative, which a consumer can use to select a product.
Organizations providing services can monitor customer sentiment by scrutinizing and understanding
what customers think about products through reviews. This can help buyers to purchase valuable
products and spend their money on quality products. Also, in this model end users see star ratings
based on reviews for each hotel.
5. Shiplap dada, Dr. Gulbakshee Dharma, Khushali Mistry, “Fake Review Detection Using Machine
Learning Techniques”, [5] Online reviews play a very important role in today’s e-commerce for
decision-making. A large part of the population, i.e. customers, read reviews of products or stores before
deciding what to buy, where to buy it from and whether to buy at all. As writing fake/fraudulent reviews
comes with monetary gain, there has been a huge increase in deceptive opinion spam on online review
websites. Basically, a fake review, fraudulent review or opinion spam is an untruthful review. Positive
reviews of a target object may attract more customers and increase sales; negative reviews of a target
object may lead to lower demand and a decrease in sales. These fake/fraudulent reviews are deliberately
written to trick potential customers in order to promote/hype them or defame their reputations. Our work is
aimed at identifying whether a review is fake or a truthful one.
6. Li-Chen Cheng, Hsiao-Wei Hu and Chia-Chi Wu, “Spammer Group Detection Using Machine
Learning Technology for Observation of New Spammer Behavioural Features”,[6] The recent
emergence of social media as a means of social communication has had profound effects on general
communication structures and the interactions between businesses, communities and individuals. Social
media gives organizations the opportunity to target a wider audience and establish connections within a
short span of time using limited resources (Chen, De, & Hu, 2015). These changes have also meant that
organizations now have to consider new ways of marketing their products and services (Trapp, 2016). The
development of social media has led to rapid growth in the amount of user-generated content which has
not only had a big impact on purchasing behaviour, but also affects the public perception of products/services, and
thus the business development landscape. Naturally this has drawn the attention of researchers and
marketers. Online consumer reviews have proven particularly influential in shaping the purchase
decisions of potential customers. Positive reviews can ensure the success of a product while negative
reviews can doom it to failure (Zhang, Zhou, Kehoe, & Kulich, 2016). Most of the research on social
media marketing has focused on the opportunities and advantages of these developments. Relatively little
work has been done examining the negative ramifications (Shires, 2018). The negative impact of social
media marketing is illustrated by a report appearing on BBC about the fake web reviews of Samsung
products. The article made clear that Samsung was paying people to write negative reviews about HTC
products on several web forums in Taiwan. This action was judged to violate fair trade practices and thus
resulted in Samsung having to pay 350 million USD to Taiwan’s Fair Trade Commission (FTC). The
case only came to light in 2013, when a hacker released confidential marketing documents which they
had obtained from Samsung Taiwan (Elmer-DeWitt, 2013). It has been shown that this was not an isolated
incident, that other firms, in efforts to cultivate a positive company image and improve sales, have taken
steps to manufacture positive (fake) reviews of their products/services (Wang, Day, and Lin, 2016). In
short, fake reviews are a growing problem that seriously undermines consumer trust in the review system.
Although these fake reviews are skilfully crafted to avoid detection, advances in machine learning
technology are opening the door to automated detection (Jindal & Liu, 2008). Zhang, Zhou, Kehoe, and
Kulich (2016) examined the predictive features that an automated system could use to detect which
reviews are fake and which are not. They categorized these predictive features as either verbal or
nonverbal. They defined verbal features as those extracted from the text of the review. Verbal features
dominate the set of predictive features used in existing fake review detection models. In contrast, the
nonverbal features are defined as the review posting behaviours and social interactions of reviewers with
other reviewers on social media, especially on online review platforms. The focus in the detection of fake
content has been on verbal features. Ott, Choi, Cardie, and Hancock (2011) built a prediction
model using content-related features. Xie, Wang, Lin, and Yu (2012) focused upon identifying fake
quantitative social information such as fake product rankings and ratings. Mukherjee, Liu, and Glance
(2012) carried out experiments to identify fake review data which had been posted on Yelp. Past studies
have proven that it is very hard to detect spammers (in this case the people who write fake reviews)
simply by reviewing the content features because of the subtle way such reviews (opinions) are produced.
This has motivated many researchers to strive to develop machine-learning methods which can be applied
to examine the nonverbal aspects of posted reviews based on the reviewers’ behaviour-related
characteristics (Lim, Nguyen, Jindal, Liu, & Lauw, 2010; Li, Huang, Yang, & Zhu, 2011). For example, it
has been found that fake reviews can be distinguished by their temporal patterns (Xie et al., 2012).
According to Mukherjee, Venkataraman et al. (2013b), behavioural features are far more effective than
linguistic n-grams in terms of detection performance. When examining nonverbal features, it is important
to observe patterns in the way spammers work. The earnings of spammers are usually based on the
number of reviews they post. Thus, many of the fake reviews they produce (in particular, replies) do not
necessarily even express an opinion about the product under discussion. It is often the case that fake
review posts are meant only to keep the discussion alive or attract attention to the threads pertaining to the
objectives of their campaign.
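One simple nonverbal cue of the kind discussed above is a temporal one: how many reviews a reviewer posts on a single day. The sketch below is only an illustration and is not taken from the cited work; the `(reviewer, date)` record layout, the function names and the threshold of three reviews per day are all assumptions.

```python
from collections import Counter
from datetime import date


def max_reviews_per_day(reviews):
    """For each reviewer, return the largest number of reviews posted on one day."""
    per_day = Counter((reviewer, day) for reviewer, day in reviews)
    busiest = {}
    for (reviewer, _day), count in per_day.items():
        busiest[reviewer] = max(busiest.get(reviewer, 0), count)
    return busiest


def flag_bursty_reviewers(reviews, threshold=3):
    """Return reviewers whose busiest day reaches the (assumed) threshold."""
    busiest = max_reviews_per_day(reviews)
    return sorted(r for r, n in busiest.items() if n >= threshold)


# Toy posting log (assumed data)
reviews = [
    ("alice", date(2023, 1, 5)),
    ("alice", date(2023, 2, 9)),
    ("bob", date(2023, 1, 5)),
    ("bob", date(2023, 1, 5)),
    ("bob", date(2023, 1, 5)),
    ("bob", date(2023, 1, 6)),
]
print(flag_bursty_reviewers(reviews))
```

In practice such a count would be one input feature to a classifier rather than a decision rule on its own.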
7. Ahmed M. Elmogy, Usman Tariq, “Fake Reviews Detection using Supervised Machine
Learning”, When customers want to make a decision about services or products, reviews become the
main source of their information. For example, when customers take the initiative to book a hotel, they
read the reviews giving the opinions of other customers on the hotel services. Depending on the feedback
in the reviews, they decide whether to book a room or not. If they find positive feedback in the reviews, they
will probably proceed to book the room. Thus, historical reviews have become very credible sources of
information for most people in several online services. Since reviews are considered forms of sharing
authentic feedback about positive or negative services, any attempt to manipulate those reviews by writing
misleading or inauthentic content is considered a deceptive action, and such reviews are labelled as fake
[1]. Such cases lead us to ask: what if not all the written reviews are honest or credible? What if some of
these reviews are fake? Thus, detecting fake reviews has become, and still is, an active and required
research area [2]. Machine learning techniques can make a big contribution to detecting fake reviews in
web content. Generally, web mining techniques [3] find and extract useful information using several
machine learning algorithms. One of the web mining tasks is content mining. A traditional example of
content mining is opinion mining [4], which is concerned with finding the sentiment of text (positive or
negative) by machine learning, where a classifier is trained to analyse the features of the reviews together
with the sentiments. Usually, fake review detection depends not only on the category of reviews but also
on certain features that are not directly connected to the content. Building features of reviews normally
involves text and natural language processing (NLP). However, fake reviews may require building other
features linked to the reviewer himself, for example review time/date or his writing style. Thus,
successful fake review detection relies on extracting meaningful features of the
reviewers. To this end, this paper applies several machine learning classifiers to identify fake reviews
based on the content of the reviews as well as several features extracted from the reviewers. We apply the
classifiers on a real corpus of reviews taken from Yelp [5]. Besides the normal natural language processing
on the corpus to extract and feed the features of the reviews to the classifiers, the paper also applies
feature engineering on the corpus to extract various behaviours of the reviewers. The paper
compares the impact of the extracted features of the reviewers when they are taken into consideration within
the classifiers. The paper compares the results in the absence and the presence of the extracted features in
two different language models, namely TF-IDF with bi-grams and TF-IDF with tri-grams. The results
indicate that the engineered features increase the performance of the fake review detection process.
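To make the "TF-IDF with bi-grams" language model concrete, the following is a minimal sketch under stated assumptions. The weighting used here, term frequency multiplied by ln(N/df) + 1, is one common TF-IDF variant and is not necessarily the one used in the cited paper; the function names and the three toy reviews are hypothetical.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max (n_max=2 gives uni- and bi-grams)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf(docs, n_max=2):
    """Return one {term: weight} dict per document."""
    term_lists = [word_ngrams(d, n_max) for d in docs]
    # document frequency: in how many documents each term occurs
    df = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(docs)
    vectors = []
    for terms in term_lists:
        tf = Counter(terms)
        vectors.append({
            term: (count / len(terms)) * (math.log(n_docs / df[term]) + 1)
            for term, count in tf.items()
        })
    return vectors


# Toy corpus (assumed data)
docs = ["best hotel ever", "best price ever", "room was dirty"]
vectors = tfidf(docs)
# "best hotel" occurs in one document only, so it is weighted above "best"
print(vectors[0]["best hotel"] > vectors[0]["best"])
```

Setting `n_max=3` would add tri-grams in the same way; reviewer-level features can then be appended to each document vector before classification.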
8. Nidhi A. Patel; Rakesh Patel, “A Survey on Fake Review Detection using Machine Learning
Techniques”, As the Internet continues to grow in both size and importance, the quantity and impact of
online reviews continually increases. Reviews can influence people across a broad spectrum of industries,
but are particularly important in the realm of e-commerce, where comments and reviews regarding
products and services are often the most convenient, if not the only, way for a buyer to make a decision
on whether or not to buy them. Online reviews may be generated for a variety of reasons. Often, in an
effort to improve and enhance their businesses, online retailers and service providers may ask their
customers to provide feedback about their experience with the products or services they have bought, and
whether they were satisfied or not. Customers may also feel inclined to review a product or service if they
had an exceptionally good or bad experience with it. While online reviews can be helpful, blind trust of
these reviews is dangerous for both the seller and buyer. Many look at online reviews before placing any
online order; however, the reviews may be poisoned or faked for profit or gain, thus any decision based
on online reviews must be made cautiously. Furthermore, business owners might give incentives to
whoever writes good reviews about their merchandise, or might pay someone to write bad reviews about
their competitor’s products or services. These fake reviews are considered review spam and can have a
great impact in the online marketplace due to the importance of reviews. Review spam can also
negatively impact businesses due to loss in consumer trust. The issue is severe enough to have attracted
the attention of mainstream media and governments. For example, the BBC and New York Times have
reported that “fake reviews are becoming a common problem on the Web, and a photography company
was recently subjected to hundreds of defamatory consumer reviews” [1]. In 2014, the Canadian
Government issued a warning “encouraging consumers to be wary of fake online endorsements that give
the impression that they have been made by ordinary consumers” and estimated that a third of all online
reviews were fake. As review spam is a pervasive and damaging problem, developing methods
to help businesses and consumers distinguish truthful reviews from fake ones is an important, but
challenging problem. In the literature, review spam has been categorized into three groups, proposed by
Dixit et al. [2]: (1) Untruthful Reviews -- the main concern of this paper, (2) Reviews on Brands -- where
the comments are only concerned with the brand or the seller of the product and fail to review the
product, and (3) Non-Reviews -- those reviews that contain either unrelated text or advertisements. The
first category, untruthful reviews, is of most concern as they undermine the integrity of the online review
system. Detection of type 1 review spam is a challenging task as it is difficult, if not impossible, to
distinguish between fake and real reviews by manually reading them. To illustrate the difficulty of this
task, we consider a real and fake example from the dataset created by Ott et al. [3]. As a human judge it
is difficult to confidently ascertain which review is fake and which is authentic.
9. Dong Zhang, Wenwen Li, Baozhuang Niu, Chong Wu, “A deep learning approach for detecting fake
reviewers: Exploiting reviewing behaviour and textual information”, Online consumer reviews (OCRs)
play an essential role in assessing the quality of a product before consumers make informed decisions [1].
The past few years have witnessed increasing customer trust in OCRs [2]. According to a recent survey,
nearly 80% of consumers trust OCRs as much as personal recommendations from friends or family, and
more than 90% of consumers read OCRs before making a purchase decision. However, as with many cases
on the internet [3], fake online reviews are becoming increasingly prominent. An important reason is that
the benefits of trading fake reviews are evident and proven. The Federal Trade Commission (FTC) points
out that the outlay on fake reviews offers a 20 times payoff. Therefore, firms or retailers have strong
incentives to leverage fake online reviews to influence consumers, contributing to a booming market for
fake online reviews. For example, in 2019, the FTC found that Sunday Riley Skincare misled consumers by
posting fake online reviews of its products for nearly two years. Fake online reviews affect consumer trust
and thus impact their purchase decision [4,5]. Besides, early fake online reviews negatively impact
subsequent reviews [6]. In essence, fake online reviews are posted by fake reviewers (opinion spammers)
who often exhibit anomalous behaviour. Fake reviewers are the leading cause of misinformation on
e-commerce platforms. Therefore, it becomes critical and urgent to develop effective methods to detect fake
reviewers to maintain the authenticity of online reviews.
It is challenging to detect these reviewers due to the complexity of the reviewer's behaviour and textual
information. Prior studies have derived behaviour-related and text-related features and fed them into
machine learning approaches, including supervised classification [7,8] and unsupervised classification
[9,10] to detect fake reviewers automatically.
Despite their important contributions to fake reviewer detection, there are still several limitations. First,
although the importance of leveraging behavioural features in fake reviewer detection has been
demonstrated [4, 10], much of the research focuses on deriving novel behavioural features, which requires
expensive human labour and expertise. Second, in addition to behavioural features, text features, such as n-
grams (bag of words) [11], part of speech n-grams [12], and word embedding [13], have been utilized to
improve detection performance. However, these text features could negatively impact the detection
performance of fake reviewers [8]. The bag of words (BoW) assumption considers a document as a bag of
unordered words [14] and extracts features based on word frequency [15]. If an online review is full of
informal words, abbreviations, and even obfuscated words, a feature vector for such a review is often very
sparse and thus could negatively impact the detection performance. Linguistic features such as POS n-grams
can be extracted from online reviews for fake reviewer detection. Such features may have difficulty
detecting experienced fake reviewers. They attempt to sound convincing by using words or phrases that
appear almost as frequently in genuine reviews as they do in fake reviews. They only overuse a small
number of words in fake reviews, thus making them sound genuine. However, the small number of such
words may not appear in every fake review, which explains why n-grams are less effective at classifying
fake versus non-fake reviewers. Word embedding techniques such as Word2Vec capture limited semantic
information because they leverage a static embedding vector for a word in different contexts. Such
techniques may negatively impact detection performance when reviews contain words with different
semantic meanings in different contexts. To address the first challenge, the feature learning of behavioural
features can be leveraged to improve detection performance. Feature learning is characterized by learning
representations for specific tasks from raw data [16]. Compared with deriving novel behavioural features,
feature learning requires less human labour, expertise, and can learn the underlying patterns of raw
behavioural data. To address the second issue, we leverage the most advanced pre-trained language model,
Longformer [17], to generate contextualized text representations from online reviews. Compared with
traditional linguistic features, contextualized text representation can capture more semantic information
from text inputs [18]. We then can utilize deep learning models to extract valuable features from the
contextualized text representations and perform corresponding classification tasks. Therefore, we propose a
novel deep learning-based framework for fake reviewer detection. The framework has two key novelties:
a. We propose a behaviour-sensitive feature extractor that leverages the convolution filter to learn the
underlying patterns of behavioural features.
b. We design a novel context-aware attention mechanism, incorporating the most advanced pre-trained
language model (Longformer) and other deep learning classifiers to extract valuable features from online
reviews.
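The sparsity problem of the bag-of-words representation described in this entry can be seen on a toy example. The sketch below is only illustrative and is not part of the cited framework; the helper names and the three example reviews, one of them written with abbreviations, are assumptions.

```python
from collections import Counter


def bag_of_words(docs):
    """Build a shared vocabulary and one word-count vector per document."""
    vocab = sorted({tok for d in docs for tok in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vectors.append([counts[tok] for tok in vocab])
    return vocab, vectors


def sparsity(vector):
    """Fraction of entries in the vector that are zero."""
    return sum(1 for v in vector if v == 0) / len(vector)


# Toy reviews (assumed data); the second one uses informal abbreviations
docs = [
    "the room was clean and the staff was friendly",
    "gr8 htl luv it",
    "staff was rude and the room was dirty",
]
vocab, vectors = bag_of_words(docs)
for doc, vec in zip(docs, vectors):
    print(f"{sparsity(vec):.2f}  {doc}")
```

The abbreviated review shares no vocabulary entries with the other two, so its vector is the sparsest and carries no overlap a classifier could exploit.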
10. Priyanka Gupta, Bharathi Raja Chakravarthi, Shriya Gandhi, “Leveraging Transfer learning
techniques- BERT, RoBERTa, ALBERT and DistilBERT for Fake Review Detection”, In this era of the
internet, the online review system has grown tremendously, where customers share their first-hand
experiences about the products or services. These reviews influence the purchasing decisions of future
customers and have a positive or negative financial impact on businesses. Spam reviews are written with an
agenda to promote or demote a business and mislead the customers. Hence, to maintain the integrity of the
online review system, it is crucial to detect fake reviews. To overcome the limitations of traditional machine
learning and neural network-based models, we have leveraged transfer learning and used
transformer-based pre-trained models BERT, RoBERTa, ALBERT, and DistilBERT to build a fake review classifier.
The performance of all the models is evaluated, considering accuracy and weighted F1-score as the primary
metrics for evaluation. The classifier produced using RoBERTa has outperformed the baseline model in
detecting fake reviews.
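The two evaluation metrics named above can be computed as follows. This is a minimal sketch with hypothetical function names and toy labels; the weighted F1-score is taken here in its usual sense, the per-class F1 averaged with each class weighted by its number of true instances.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by class support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        total += n * f1
    return total / len(y_true)


# Toy labels (assumed data): 2 fake and 4 real reviews
y_true = ["fake", "fake", "real", "real", "real", "real"]
y_pred = ["fake", "real", "real", "real", "real", "fake"]
print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```

Weighting by support matters when fake reviews are much rarer than genuine ones, which is the usual situation in review datasets.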
11. Jayesh Soni, Nagarajan Prabaka, “Effective Machine Learning Approach to Detect Groups of Fake
Reviewers”, The problem of detecting fake reviews/reviewers has gained much interest in recent years. It
can be summarized into three categories: fake reviewer detection, fake review content detection and
detection of groups of fake reviewers. For example, Ott et al. [5] use linguistic feature analysis of review
text to identify fraudulent reviews; Liu et al. [1] employed duplicate reviews as fake reviews to train
classifiers; Xie et al. [15] used temporal analysis to detect reviewers who write singleton reviews; Lim et
al. [4] use behavioural features in rating patterns to detect fraudulent reviewers. Recently, there has been
increased interest in detecting groups of fake reviewers. Mukherjee et al. [8] introduced a frequent itemset
mining technique to generate candidate review spammer groups that takes reviewers as items and targeted
products as transactions. Based on these candidate groups, many other computing frameworks have been
proposed to evaluate the suspiciousness of spammers. Xu et al. [12] introduce a KNN-based approach to
detect the labels for each reviewer. [8] proposed GSRank to rank candidate groups, capturing the
relationship among candidate groups, target products, and individual reviewers. Leman et al. [16] propose
the FRAUDEAGLE framework, which uses the relational structure among reviewers and products to rank fake
reviewers. Shubuta et al. [9] propose SPEAGLE, which extends FRAUDEAGLE with the introduction of
review nodes and additional information (e.g., star ratings, timestamps, etc.) that greatly improves the
ranking precision. Xu et al. [17] propose the FRAUDINFORMER framework to detect a group of fake
reviewers via heterogeneous pairwise features extracted from rating behaviours and linguistic patterns.
Unlike the above-mentioned approaches to detecting a group of fake reviewers, we propose a DeepWalk-based
computing approach which is solely based on the topological structure of the reviewer graph revealing the
behavioural similarity between the reviewers. Review spamming techniques are evolving continuously.
Although many spam detection techniques are being proposed by researchers, there is no overall
success in discovering all kinds of spamming strategies. The best strategy is to use a combination of
relevant techniques.
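The idea of generating candidate spammer groups with reviewers as items and products as transactions can be sketched as follows. This is a brute-force enumeration of fixed-size reviewer sets, used here for clarity in place of a full frequent itemset mining algorithm such as Apriori; the function name, the group size, the support threshold and the toy data are assumptions.

```python
from collections import Counter
from itertools import combinations


def candidate_groups(product_reviewers, group_size=2, min_support=2):
    """Return reviewer groups that reviewed at least `min_support` products together.

    Each product is one transaction; the reviewers of that product are its items.
    """
    support = Counter()
    for reviewers in product_reviewers.values():
        for group in combinations(sorted(set(reviewers)), group_size):
            support[group] += 1
    return {group: n for group, n in support.items() if n >= min_support}


# Toy data (assumed): product id -> reviewers who reviewed it
product_reviewers = {
    "p1": ["ann", "bob", "cat"],
    "p2": ["ann", "bob"],
    "p3": ["bob", "cat", "dan"],
    "p4": ["ann", "bob", "dan"],
}
print(candidate_groups(product_reviewers))
```

Groups found this way are only candidates; the ranking frameworks surveyed above are what decide how suspicious each group is.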
12. Shwet Mani, Sneha Kumari, Ayushi Jain & Prabhat Kumar, “Spam Review Detection Using
Ensemble Machine Learning”, The importance of consumer reviews has evolved significantly with the
increasing inclination towards e-commerce. Potential consumers exhibit sincere intent in seeking the opinions
of other consumers who have had usage experience of the products they are intending to
purchase. The underlying businesses also deem it fit to ascertain common public
opinion regarding the quality of their products as well as services. However, consumer reviews have
bulked up over time to such an extent that it has become a highly challenging task to read all the reviews and
assess their genuineness. Hence, it is crucial to manage reviews, since spammers can manipulate the reviews
to demote or promote the wrong product. The paper proposes an algorithm for detecting fake reviews. Since
the proposed work concentrates only on text, n-gram (unigram + bigram) features are used. A supervised
learning technique is used for review filtering. The proposed algorithm considers the combination of
multiple learning algorithms for better predictive performance. The obtained results clearly indicate that,
using only simple features like n-grams, an ensemble can boost the efficiency of the algorithm to a significant level.
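A minimal sketch of combining several classifiers over unigram + bigram features is given below. Hard majority voting is assumed here as the combination rule, since the summary above does not state which one the paper uses, and the three base classifiers are hypothetical keyword rules standing in for trained models.

```python
from collections import Counter


def ngram_features(text):
    """Return the set of unigrams and bigrams found in the text."""
    tokens = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return set(tokens + bigrams)


def majority_vote(classifiers, features):
    """Return the label predicted by most of the base classifiers."""
    votes = Counter(clf(features) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained base learners (assumptions, not real models)
def clf_superlatives(features):
    return "fake" if {"best", "amazing"} & features else "genuine"


def clf_bigram(features):
    return "fake" if "must buy" in features else "genuine"


def clf_detail(features):
    # reviews with many distinct n-grams are treated as detailed, hence genuine
    return "genuine" if len(features) > 12 else "fake"


ensemble = [clf_superlatives, clf_bigram, clf_detail]
review = "amazing battery lasted two days and the screen is sharp"
print(majority_vote(ensemble, ngram_features(review)))
```

In the example one base classifier votes "fake" because of the word "amazing", but it is outvoted by the other two, which is the behaviour an ensemble is meant to provide.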
13. Naznin Sultana, Prof. Sellappan Palaniappan, “Deceptive Opinion Detection Using Machine
Learning Techniques”, With the widespread use of the internet and web technology, e-commerce websites and
online marketplaces play an important role in reaching a wider customer base in a very short time. So the number of
online reviews by customers is increasing as well. These e-commerce websites hold an enormous
amount of data about customers’ and consumers’ product experiences and their opinions about the products.
This information often acts as an indicator of the quality of products and thus has a great impact on the
purchasing decisions of consumers, retailers, and manufacturers. On these online platforms, customers usually
express their opinions as text reviews, which have become ubiquitous and assist buyers in making purchase
decisions. So customer reviews have now become a crucial part of doing business online. In order to boost
up and raise their businesses, business owners, retailers and service providers often expect and ask their
potential customers to provide some positive comments about the products/services they have
bought/used. While online reviews can be helpful in most cases, these reviews can sometimes
be hazardous for both the retailer and the purchaser when the reviews are fake. Business owners sometimes
hire third parties via the internet or from some other source to write fake reviews, on either a paid or unpaid basis.
They may write good reviews about their own merchandise or bad reviews about their competitors’
products/services. These fake reviews are called deceptive opinions or review spam, which has a great impact
on the e-marketplace nowadays. Deceptive opinion has a negative impact on business due to the loss of
customers’ and consumers’ trust. As review spam has become a prevalent and widespread problem, the
development of methods to help businesses and customers distinguish truthful reviews from fake ones
is a much-needed task. Review spam is somewhat related to web or email spam, but since it deals with
false opinions, it is much harder to detect than the other two kinds of spam. So the existing methods for detecting
web spam and email spam [1, 2, 3] are not suitable for review spam. Spam reviews can be of different types.
According to literature [4], opinion spam can be categorized into two types:
14. Muhammad Saad Javed, Hammad Majeed, Hasan Mustafa & Mirza Omer Beg, “Fake reviews
classification using deep learning ensemble of shallow convolutions”, Online reviews have a decisive
impact on consumers’ purchasing decisions. This opens the door for spammers and scammers to post fake
reviews to promote non-existent products or undermine competitor products to affect social behaviour.
Thus, the identification of reviews as fake or real has become ever more important. Traditional approaches
for text classification use a bag-of-words model to represent text, which causes sparsity, and word
representations learnt from neural networks have limited ability to handle unknown words. In this paper, we
propose a technique based on three different models trained on the idea of a multi-view learning technique
and create an ensemble of all models by employing an aggregation technique for generating final
predictions. The core idea of our methodology is to extract rich information from the text of reviews by
combining bag-of-n-grams and parallel convolutional neural networks (CNNs). By using an n-gram
embedding layer with small kernel sizes we can use local context with the same computation power as
required to train deep and complex CNNs. Our CNN-based architecture consumes n-gram embeddings as
input and uses the parallel convolutional blocks to extract richer feature representations from text. Our
approach for the detection of fake reviews also combines textual linguistic features and non-textual features
related to reviewer behaviour. We evaluate our approach on the publicly available Yelp Filtered Dataset and
achieve F1 scores of up to 92% for classifying fake reviews.
15. Hadeer Ahmed, Issa Traore & Sherif Saad, “Detection of Online Fake News Using N-Gram
Analysis and Machine Learning Techniques”, Fake news is a phenomenon which is having a significant
impact on our social life, in particular in the political world. Fake news detection is an emerging research
area which is gaining interest but involves some challenges due to the limited amount of resources (i.e.,
datasets, published literature) available. We propose in this paper a fake news detection model that uses
n-gram analysis and machine learning techniques. We investigate and compare two different feature
extraction techniques and six different machine classification techniques. Experimental evaluation yields the
best performance using Term Frequency-Inverse Document Frequency (TF-IDF) as the feature extraction
technique, and Linear Support Vector Machine (LSVM) as the classifier, with an accuracy of 92%. AI is the
umbrella term for ML and refers to intelligence demonstrated by machines or computer systems in contrast to
the natural intelligence displayed by humans. AI enables us to enhance the performance of mechanical,
analytical, intuitive, and empathetic tasks in the service setting (Huang and Rust, 2018). Online review
platforms and service providers must manage and analyse online review data that is constantly growing in
volume, variety, and velocity (Singh et al., 2017). Given that AI-based ML can be an essential method of
processing and analysing online reviews, recent studies in the service industry, computer science, and
information systems (IS) fields have used various ML techniques along with web-scraping, text-mining, and
sentiment analysis techniques to analyse the impact of sentiments embedded in online reviews (Bazooka et
al., 2019; Kegan and Murthy, 2019; Lee et al., 2021), predict firm bankruptcy (Kim, 2011), and predict or
detect fake online reviews (Wu et al., 2020). A few studies have recently started utilizing these ML and
text-mining techniques in the service literature to propose a methodological procedure for predicting fake online
reviews or developing ML-based fake review detection models (e.g. Martinez-Torres and Toral, 2019).
16. Petr Hajek, Aliaksandr Barushka & Michal Munk, “Fake consumer review detection using deep
neural networks integrating word embedding and emotion mining”, Fake consumer review detection has
attracted much interest in recent years owing to the increasing number of Internet purchases. Existing
approaches use the review content, product and reviewer information and
other features to detect fake reviews. However, as shown in recent studies, the semantic meaning of reviews
might be particularly important for text classification. In addition, the emotions hidden in the reviews may
represent another potential indicator of fake content. To improve the performance of fake review detection,
here we propose two neural network models that integrate traditional bag-of-words as well as the word
context and consumer emotions. Specifically, the models learn document-level representation by using three
sets of features: (1) n-grams, (2) word embedding and (3) various lexicon-based emotion indicators. Such a
high-dimensional feature representation is used to classify fake reviews into four domains. To demonstrate
the effectiveness of the presented detection systems, we compare their classification performance with
several state-of-the-art methods for fake review detection. The proposed systems perform well on all datasets,
irrespective of their sentiment polarity and product category. In investigating the linguistic differences
between truthful and spam reviews, the authors of one study observed that spam reviews that focus
on the information given on a product page are more difficult to read than true reviews [11]. In reference
[8], the authors introduced research on detecting review spammers using behavioural features. They
developed a model to classify spammers based on Amazon's product reviews dataset (11,083 labelled reviews
and reviewers) using a linear regression approach. Another study analysed the results of the fake review
filtering algorithm used on the yelp.com website, which is employed to separate fake from truthful reviews. The
dataset used in this study is a real-life Yelp dataset that consists of 5678 reviews and 5124 reviewers for hotels,
in addition to 58517 reviews and 35593 reviewers for restaurants. Two types of features were studied in this
experiment: linguistic features, which include word unigrams, word bigrams and part of speech, and
reviewers’ behavioural features, which consist of a higher number of reviews, review length, proportion of
positive reviews, maximum content similarity and reviewers’ deviation. The accuracy reported was 86% with
an implementation of the SVM technique.
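Some of the reviewer behavioural features listed above can be computed per reviewer as in the following sketch. It is an illustration only: the feature names, the rule that four or more stars counts as a positive review, and the use of cosine similarity over word counts for content similarity are assumptions, and the rating deviation feature is left out because it needs the ratings of other reviewers.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def reviewer_features(reviews):
    """reviews: non-empty list of (text, star_rating) pairs written by one reviewer."""
    texts = [text for text, _ in reviews]
    lengths = [len(text.split()) for text in texts]
    positive = sum(1 for _, stars in reviews if stars >= 4)  # assumed threshold
    # highest similarity between any two reviews by the same reviewer
    max_sim = max((cosine(a, b) for a, b in combinations(texts, 2)), default=0.0)
    return {
        "review_count": len(reviews),
        "mean_length": sum(lengths) / len(lengths),
        "positive_ratio": positive / len(reviews),
        "max_content_similarity": max_sim,
    }


# Toy reviewer history (assumed data): two identical five-star reviews
reviews = [
    ("great hotel great staff", 5),
    ("great hotel great staff", 5),
    ("room was small", 2),
]
print(reviewer_features(reviews))
```

A reviewer who repeats the same text across reviews gets a content similarity close to 1, which is the kind of signal these behavioural features are meant to expose to a classifier such as an SVM.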
17. Saleh Nagi Alsubari, Mahesh B. Shelke, Sachin N. Deshmukh, “Fake Reviews Identification Based
on Deep Computational Linguistic Features”, E-commerce platforms have become an important source of
information. They take into account the feedback of consumers about products and services purchased from the
online website; this feedback takes the form of reviews. Online websites provide consumers with the ability to
24
write product or service reviews after buying, so that when new customers make decisions to buy products or
services from the online website, they read the recommendations or reviews written by people who have
experienced the product or service. Those reviews, however, may be trusted (real) or spam (fake) reviews. E-
commerce website fraudsters who deceive potential customers and reputation businesses or defame them can
intentionally write fake reviews. Consequently, fake review detection techniques are essentially required for
classification of reviews as fake (spam) or trusted (genuine) review. Main objective of this paper is to
analyse, identify and detect the fake reviews of electronic products dataset that relate to different USA cities.
In this paper, we investigate several feature extraction techniques such as LIWC, sentiment analysis, POS
and subjectivity. Based on these methods, we extract set of features from the review text like authenticity,
analytic thinking, polarity, objective, subjective, counts of adjective, verb, nouns and adverbs. For feature
selection, we used an IG (Information Gain) to select discriminative and highest features. Three different
supervised machine-learning techniques are Decision tree, Random forest and Adaptive boosting are applied
for classification the reviews as fake or trusted and the achieved results were 96 %, 94% and 97 % in the term
of accuracy respectively. An advance in web 2.0 has increased the movement towards online purchases via
Ecommerce Website. Internet access is increasingly growing nowadays due to its availability in both rural
and urban areas making the world digital. Most of consumers procure their daily needs such as products, or
services from online Ecommerce websites, so before purchasing process takes place, they go through posted
reviews to see the experience of previous consumers towards products or services. Fake reviews posted in
Ecommerce websites represent opinions of customers in which these reviews play a crucial role in e-business
because they can indirectly affect future buying decisions. Manufacturing companies are currently using
customer reviews to detect product problems and find information about their competitors on market
intelligence. As these reviews effect the buyer’s side, several persons provide deceptive reviews to improve
the purchasing of products found on sites of e-commerce. These people are primarily known as review
spammers and their practices are called as review spamming. Review spamming involves adding misleading
or false information in reviews to misguiding customers and affecting company revenues. Fake opinions can
be classified into three types:
1) Untruthful (fake) opinions. 2) Review on brand only.3) Non-reviews. Untruthful (fake) opinions can be
written deliberately to deceive readers or opinion mining systems. Such reviews represent unworthy positive
reviews (opinions) for particular target products in order to support them and give negative reviews to worthy
products for defaming them. This type of review is known as hyper spam review. Second type of fake
opinions is review on brand only; these reviews can be posted and affected the brand of suppliers or retailers.
Third type of fake opinions is non-reviews which consists of two subsets such as (a) Announcements and (b)
unrelated reviews, both include inquiries, replies or unspecified texts [1]. Large numbers of positive reviews
encourage a customer to buy product and enhance manufacturer’s financial gain, while negative reviews lead
customers to seek alternatives and thus cause financial losses [2]. Since customer’s reviews can have a major
impact on the credibility of the brands and products, so companies will be encouraged to generate positive
deceptive reviews to their own brands and deceptive reviews on their competitors’ brands [3]. There are
25
different ways to spam the online e-commerce website with deceptive reviews for example, hiring specialist
firms specialized in generating spam reviews, employing crowd-sourcing sites to use review spammers or
using automated feedback software bots [4]. Reviews posted by those who have not experienced the topics
are known as fake reviews and the person who produces the fake reviews is named as an individual review
spammer.
18. Dibyajyoti Baishya, Joon Jyoti Deka, Gaurav Dey & Pranav Kumar Singh, “SAFER: Sentiment
Analysis-Based Fake Review Detection in E-Commerce Using Deep Learning”, The problem of fake,
deceptive reviews has become a threat to online users in recent years. With the evolution of
online markets, the trend towards fake reviews has increased, mainly to attract or distract customers. Fake
reviews have affected both customers and sellers. These reviews involve writing and spreading misleading
information and beliefs. Sentiment analysis was first introduced a few years ago in the e-commerce sector. It
is an emerging research area today due to the rapid growth of the e-commerce industry. The biggest challenge
in detecting fake reviews is the lack of an effective way to distinguish fake reviews from legitimate reviews.
The difference cannot be seen with the naked eye and is, therefore, a severe concern. In this paper, we have
applied the bag-of-words model and the GloVe embedding matrix with a focus on fake reviews. We have used two
different feature extraction techniques and three new deep-learning algorithms for text classification. The
experimental analysis on an existing public dataset showed better results than
traditional machine-learning models.
19. Michael Crawford, Taghi M. Khoshgoftaar, Joseph D. Prusa, Aaron N. Richter & Hamzah Al
Najada, “Survey of review spam detection using machine learning techniques”, As the Internet
continues to grow in both size and importance, the quantity and impact of online reviews continually
increases. Reviews can influence people across a broad spectrum of industries, but are particularly important
in the realm of e-commerce, where comments and reviews regarding products and services are often the most
convenient, if not the only, way for a buyer to make a decision on whether or not to buy them. Online
reviews may be generated for a variety of reasons. Often, in an effort to improve and enhance their
businesses, online retailers and service providers may ask their customers to provide feedback about their
experience with the products or services they have bought, and whether they were satisfied or not. Customers
may also feel inclined to review a product or service if they had an exceptionally good or bad experience
with it. While online reviews can be helpful, blind trust of these reviews is dangerous for both the seller and
buyer. Many look at online reviews before placing any online order; however, the reviews may be poisoned
or faked for profit or gain, thus any decision based on online reviews must be made cautiously. Furthermore,
business owners might give incentives to whoever writes good reviews about their merchandise, or might pay
someone to write bad reviews about their competitor’s products or services. These fake reviews are
considered review spam and can have a great impact in the online marketplace due to the importance of
reviews. Review spam can also negatively impact businesses due to loss in consumer trust. The issue is
severe enough to have attracted the attention of mainstream media and governments. For example, the BBC
and New York Times have reported that “fake reviews are becoming a common problem on the Web, and a
photography company was recently subjected to hundreds of defamatory consumer reviews” [1]. In 2014, the
Canadian Government issued a warning “encouraging consumers to be wary of fake online endorsements that
give the impression that they have been made by ordinary consumers” and estimated that a third of all online
reviews were fake. As review spam is a pervasive and damaging problem, developing methods to help
businesses and consumers distinguish truthful reviews from fake ones is an important, but challenging
problem. In the literature, review spam has been categorized into three groups, proposed by Dixit et al. [2]:
(1) Untruthful Reviews -- the main concern of this paper, (2) Reviews on Brands -- where the comments are
only concerned with the brand or the seller of the product and fail to review the product, and (3) Non-
Reviews -- those reviews that contain either unrelated text or advertisements. The first category, untruthful
reviews, is of most concern as they undermine the integrity of the online review system. Detection of type 1
review spam is a challenging task as it is difficult, if not impossible, to distinguish between fake and real
reviews by manually reading them. To illustrate the difficulty of this task, we consider a real and fake
example from the dataset created by Ott et al. [3]. As a human judge it is difficult to confidently ascertain
which review is fake and which is authentic.
20. Ms. Rajshri P. Kashti, Dr. Prakash S. Prasad, “Enhancing NLP Techniques for Fake Review
Detection”, Online shopping is increasing day by day as every product and service becomes easily available.
Vendors are getting a greater response to their business. More and more mobile apps are available for online
shopping, so it is easier for customers to purchase any item with a click and to post their
reviews without much complication. People can post their views or opinions on tens of thousands of
discussion groups, internet communities, forums, product/service review pages, blogs, etc. Collectively,
these are called user-generated content. Usually, user-generated comments are written in
natural language and people have the freedom to give their opinion as they want, as no monitoring
system has been available until now. Sharing a personal view about a particular product or service that an
individual has experienced is referred to as a review. Online reviews can have a great impact on people across a
broad range of industries, but are most important in the world of e-commerce, where personal
opinions and reviews on products or services are considered useful in deciding whether to
purchase a product or avail of a service. Some people, usually disgruntled ones, misdirect others by
posting fake reviews to promote or harm the reputation of particular products or services as they desire.
These persons are labelled opinion spammers and the misleading comments they provide are called fake
reviews. New buyers give importance to the feedback given by other users, as do the companies that sell such
products; both younger and older individuals rely extensively on reviews available online. People decide
whether or not to purchase a product by analysing and reflecting on the existing opinions about
it. There are positive and negative reviews; if the overall impression is not good, it is doubtful
that they will buy the product. Customers can now write any opinion text, which motivates some people to
give fake reviews of a particular product.
21. Hafiz Yaris Ghafoor, Arfan Jaffar, Rashid Jahangir, Muhammad Waseem Iqbal & Muhammad
Zahid Abbas, “Fake News Identification on Social Media Using Machine Learning Techniques”, The
devastating effect that fake news about politics, health, and customer reviews, spread over social media,
has on the decision-making of an individual cannot be neglected. The problem of fake news
needs the attention of social media administrators, law enforcement agencies, and academic researchers. To
handle this issue, researchers have suggested various artificial intelligence techniques. However, most of the
studies used only a specific type of news, which leads to dataset biases. This study used three different standard
datasets collected from Kaggle and GitHub. The datasets were pre-processed to remove unwanted text. These
pre-processed datasets were then applied to three classifiers (passive aggressive, machine learning, and naïve
Bayes) with data splits of 30–70, 40–60, 50–50, 60–40, and 70–30, respectively. To evaluate performance,
accuracy, precision and recall are used. Results show that this study outperforms the state-of-the-art techniques.
23. Daya L. Mevada, Prof. Viraj Daxini, in their paper “An opinion spam analyser for product Reviews
using supervised machine Learning method”, note that finding opinion spam in huge amounts
of unstructured data has become an important research problem. This research proposes an opinion spam
analyser which automatically classifies input text data into either the spam or the non-spam category. The
proposed system uses a supervised machine learning technique.
24. Miss. Rashmi Gomatesh Adike, Prof. Vivekanand Reddy, “Detection of Fake Review and Brand
Spam Using Data Mining Technique”. This system proposes a behavioral approach to identify review
spammers who are trying to manipulate the ratings of some products. The authors derive aggregated
behavior methods to rank reviewers based on the degree to which they have demonstrated spamming
behaviors. They verified the proposed methods by conducting a user evaluation on an Amazon dataset which
contains reviews of different companies’ products.
25. Arjun Mukherjee, Vivek Venkataraman, Bing Liu, Natalie Glance studied and presented a paper on
“Fake review detection: Classification and analysis of real and pseudo reviews.” This paper performed
an in-depth investigation of supervised learning for fake review detection using Amazon Mechanical Turk
(AMT) produced fake reviews and real-life fake reviews. The work in [36] showed that using AMT fake
reviews and reviews (assumed non-fake) from TripAdvisor achieved a classification accuracy of 89.6% with
bigram features and balanced data. This paper first performed a comparison using real-life filtered (fake) and
unfiltered (non-fake) reviews from Yelp. The results showed that the real-life data is much harder to classify,
with an accuracy of only 67.8%. This prompted the authors to propose a novel and principled method to uncover the
precise difference between the two types of fake reviews using KL-divergence and its asymmetric property.
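For reference, the KL-divergence mentioned in entry 25 has the standard definition below. The symbols P, Q and w are introduced here for illustration only (for example, P and Q could be the word distributions of fake and non-fake reviews); they are not taken from the cited paper.

```latex
\[
KL(P \,\|\, Q) \;=\; \sum_{w} P(w)\,\log\frac{P(w)}{Q(w)},
\qquad
KL(P \,\|\, Q) \;\neq\; KL(Q \,\|\, P) \ \text{in general.}
\]
```

The inequality on the right is the asymmetric property: the divergence of P from Q need not equal the divergence of Q from P.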
26. Jiwei Li, Myle Ott, Claire Cardie and Eduard Hovy, “Towards a General Rule for Identifying
Deceptive Opinion Spam.” In this work, they developed a multi-domain, large-scale dataset containing
gold-standard deceptive opinion spam. It includes reviews of hotels, restaurants and doctors, generated
through crowdsourcing and by domain experts. The study uses SAGE to make observations about the
respects in which truthful and deceptive text differ. The suggested model includes several domain-independent
features that shed light on these differences, which further allow some general rules to be formulated for
recognizing deceptive opinion spam.
27. Hu Minqing and Liu Bing, in “Mining and summarizing customer reviews”, extract the features of the
product. The customer’s sentiment towards each individual feature of the product is shown via a summarization
system, which includes mining the features of the product that have been commented upon by customers,
categorizing whether each opinion sentence in a review is positive or negative, and summarizing the results.
28. Liu, Pan, et al., in “Identifying Indicators of Fake Reviews Based on Spammer's Behavior
Features”, rate users of social networking websites on the basis of various factors, such as their total
consumption and user activeness, and use this rating as a basis to classify whether a review is spam or not.
29. Lim Ee-Peng, Nguyen Viet-An, Jindal Nitin, et al. identified and demonstrated some characteristic
behaviors of review spammers by proposing scoring methods for measuring the degree of spam for each
reviewer. Then, a subset of extremely suspicious reviewers is selected for additional review with the help of
web-based spammer evaluation software specially developed for user evaluation experiments.
30. M. N. Istiaq Ahsan, Tamzid Nahian, Abdullah All Kafi, Md. Ismail Hossain, Faisal Muhammad
Shah proposed “Review Spam Detection using Active Learning.” This paper explores the opportunities of
introducing active learning for detecting review spam; experiments conducted on real-life data show promising
results. During the process, they trained the model using an active learning method which learns from the best
samples over multiple iterations. The training dataset is used by the algorithm to train the model and the test
dataset is used later for evaluation. A certain number of samples from the unlabelled dataset are selected for
training and, after estimation, they are added to the existing training dataset. The model then starts training
again with the new, improved training set. The selection of unlabelled samples is based on a decision function,
which is the distance of the samples X to the separating hyperplane. Although the distance lies in [-1, 1],
absolute values are used because confidence levels are needed. The random forest algorithm has the following
properties. It is unexcelled in accuracy among current algorithms. It runs efficiently on large databases. It can
handle thousands of input variables without variable deletion. It gives estimates of which variables are
important in the classification. It generates an internal unbiased estimate of the generalization error as the
forest building progresses. It has an effective method for estimating missing data and maintains accuracy when
a large proportion of the data are missing. It has methods for balancing error in data sets with unbalanced
class populations. Generated forests can be saved for future use on other data. Prototypes are computed that
give information about the relation between the variables and the classification. It computes proximities
between pairs of cases that can be used in clustering, locating outliers, or (by scaling) giving interesting
views of the data. These capabilities can be extended to unlabelled data, leading to unsupervised clustering,
data views and outlier detection. It offers an experimental method for detecting variable interactions.
Remark: random forests do not overfit.
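The sample-selection step of the active learning approach in entry 30 can be illustrated with a minimal sketch. It assumes a linear decision function and assumes that the samples with the smallest absolute score (lowest confidence) are the ones picked for the next training round; the function names, weights and data below are hypothetical and are not taken from the cited paper.

```python
def decision_value(weights, bias, x):
    """Signed score w.x + b of sample x relative to the separating hyperplane."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def select_uncertain(weights, bias, unlabelled, k):
    """Return the indices of the k unlabelled samples with the smallest |score|.

    Assumption: a small absolute score means low confidence, so these
    samples are treated as the most informative ones to add next.
    """
    ranked = sorted(
        range(len(unlabelled)),
        key=lambda i: abs(decision_value(weights, bias, unlabelled[i])),
    )
    return ranked[:k]


# Hypothetical linear model and unlabelled feature vectors.
weights, bias = [1.0, -1.0], 0.0
pool = [[2.0, 0.0], [1.0, 0.75], [0.0, 3.0], [1.0, 1.5]]
chosen = select_uncertain(weights, bias, pool, k=2)
print(chosen)
```

The selected samples would then be labelled, appended to the training set, and the model retrained, repeating for several iterations.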
CHAPTER 3
PROBLEM DEFINITION AND OBJECTIVES
Nowadays, online buyers are highly aware of and sensitive to product reviews. Before buying any
product from an e-commerce website, they usually read the product's reviews and ratings. It is therefore
necessary for e-commerce website owners to keep watch on product reviews and descriptions. Users tend to
blame the e-commerce website, rather than the product's manufacturer, if it sells products with bad reviews,
which may ruin the reputation of the e-commerce website's brand. Sometimes competitors post fake reviews to
improve their sales. Hence, it becomes important for e-commerce website owners to detect fake product
reviews and remove them from the portal by applying sentiment analysis and natural language processing. For
such e-commerce website owners, we propose a system that predicts fake reviews using
machine learning.
3.1 Objectives
The design process turns a user-oriented description of the input into a computer-based system. This
design is crucial to prevent data entry errors and to show the proper way of operating the computer
system so that the right information is received.
User-friendly data entry screens are used to handle large volumes of data. The objective of the input
design is to make data entry easier and free of errors. The data entry screen is designed so that all
data manipulations can be performed. It also offers record viewing facilities.
When data are entered, their validity is checked. Data can be entered with the help of screens.
Suitable messages are provided when needed so that the user is not left confused. The goal of input
design is an input layout that is simple to follow.
3.2 PROJECT SCOPE LIMITATIONS
The proposed system reviews four popular machine learning classification methods for finding fake
product reviews. Review ratings such as useful, cool and funny are acquired only by non-filtered reviews,
meaning that as soon as a review is filtered, it is hidden and so cannot be rated by others.
CHAPTER 4
PROPOSED MODEL
Fake customer reviews are a significant problem for both businesses and consumers. With the rise of online
shopping and social media, online reviews have become an important source of information and an integral
part of consumer decision-making. Unfortunately, the increasing number of fake reviews makes it difficult for
consumers to trust online reviews and to make informed decisions. To address this issue, we propose a
machine learning-based fake customer review detection system that identifies fake reviews with a high degree
of accuracy. The proposed model for a fake customer review system could involve the following steps:
1. Data collection: Collect customer reviews from various sources such as e-commerce websites, social
media platforms, and other review websites.
2. Data pre-processing: Clean the data by removing irrelevant information, such as advertisements, and
extract important information such as the product name, customer name, and review text.
3. Feature extraction: Extract features from the review text, such as sentiment, relevance, and tone.
4. Classification: Use machine learning algorithms such as logistic regression, random forest, or Naive
Bayes to classify the reviews into genuine or fake.
5. Review verification: To validate the authenticity of a review, compare it with the data collected from
other sources such as social media accounts, email addresses, or phone numbers.
6. User profiling: Develop user profiles based on the reviews they have written, their behaviour, and
their social media activity to identify suspicious patterns.
7. Alert system: Set up an alert system to notify moderators of suspicious activity, such as a sudden
surge in reviews or identical reviews from multiple accounts.
8. Reporting system: Develop a reporting system to flag fake reviews and allow customers to report
any suspicious behaviour.
9. Feedback system: Provide feedback to users whose reviews have been flagged as suspicious,
including a chance to appeal or provide additional information to prove the authenticity of their
reviews.
10. Ongoing monitoring: Continuously monitor the system for new patterns of suspicious behaviour and
adapt the model accordingly to improve its accuracy.
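As an illustration of step 7 above (alerting on identical reviews from multiple accounts), the following is a minimal sketch using only the Python standard library. The function names, the normalisation rule and the sample data are assumptions made for this example, not part of the implemented system.

```python
from collections import defaultdict


def normalise(text):
    """Lower-case and collapse whitespace so trivially edited copies match."""
    return " ".join(text.lower().split())


def find_duplicate_reviews(reviews, min_accounts=2):
    """Map each review text posted by at least min_accounts accounts to those accounts.

    `reviews` is a list of (account_id, review_text) pairs.
    """
    accounts_by_text = defaultdict(set)
    for account_id, text in reviews:
        accounts_by_text[normalise(text)].add(account_id)
    return {
        text: sorted(accounts)
        for text, accounts in accounts_by_text.items()
        if len(accounts) >= min_accounts
    }


# Hypothetical sample data.
reviews = [
    ("user_a", "Great product, works perfectly!"),
    ("user_b", "great product,  works perfectly!"),
    ("user_c", "Stopped working after a week."),
    ("user_a", "Great product, works perfectly!"),
]
alerts = find_duplicate_reviews(reviews)
print(alerts)
```

Each entry in `alerts` could be forwarded to a moderator; a repeated post by the same account is not counted as a second account.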
4.1 Data Pre-processing:
We will collect a large dataset of customer reviews from various sources such as e-commerce websites and
social media platforms. Before inputting the reviews into the machine learning models, we will pre-process the
data by removing any irrelevant information and cleaning the text. We will also use natural language
processing techniques to extract features from the reviews, such as sentiment, tone, and vocabulary. These
features will be used as inputs for the machine learning models, including the RNN and SVM models. Data
processing plays a crucial role in detecting fake customer reviews. Here are some important steps involved in
data processing for a fake customer review detection system:
1. Text cleaning and pre-processing: Raw text data collected from various sources such as e-
commerce websites, social media platforms, and other review websites may contain irrelevant
information, such as advertisements or special characters. The first step in data processing is to clean
and pre-process the text data by removing irrelevant information, punctuation, and stop words.
2. Feature extraction: Once the text data is cleaned and pre-processed, features need to be extracted
from the reviews to classify them as genuine or fake. Commonly used features include sentiment,
relevance, tone, and language style.
3. Data labelling: A labelled dataset is required for supervised machine learning algorithms. The
reviews need to be labelled as genuine or fake by experts or by using automated techniques.
4. Data normalization: Normalization is the process of converting all data into a standard format to
remove any inconsistencies or variations. Data normalization techniques such as stemming and
lemmatization can be used to reduce the complexity of the data and improve accuracy.
5. Data splitting: The labelled data is split into training and testing datasets to evaluate the performance
of the model. A common ratio for splitting is 80:20 or 70:30.
6. Model training: The model is trained using machine learning algorithms such as logistic regression,
random forest, or Naive Bayes using the labelled training data.
7. Model testing and evaluation: The model's accuracy is evaluated using the testing dataset. Metrics
such as precision, recall, F1 score, and accuracy are used to evaluate the model's performance.
8. Model optimization: The model is optimized by adjusting hyperparameters, feature selection, and
algorithm selection to improve its accuracy.
9. Model deployment: The final model is deployed to detect fake customer reviews in real-time. The
model can be integrated into a website or an application to flag fake reviews and alert moderators for
further action.
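Steps 1 and 5 above (text cleaning and data splitting) can be sketched as follows. This is a minimal standard-library example; the stop-word list, function names and sample data are assumptions for illustration, and a real system would likely use a fuller stop-word list and a library splitter.

```python
import random
import string

# Tiny illustrative stop-word list (assumption); a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "it", "and", "this", "was", "i"}


def clean_text(text):
    """Lower-case, strip punctuation, and drop stop words; returns a token list."""
    table = str.maketrans("", "", string.punctuation)
    tokens = text.lower().translate(table).split()
    return [t for t in tokens if t not in STOP_WORDS]


def train_test_split(samples, test_ratio=0.2, seed=42):
    """Shuffle a copy of the labelled samples and split them, e.g. 80:20."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


tokens = clean_text("This is the BEST phone, and it was cheap!!!")
print(tokens)

# Hypothetical labelled reviews: (text, label) with 1 = fake, 0 = genuine.
labelled = [(f"review {i}", i % 2) for i in range(10)]
train_set, test_set = train_test_split(labelled, test_ratio=0.2)
print(len(train_set), len(test_set))
```

Fixing the random seed keeps the split reproducible, so that different models are compared on the same training and testing data.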
4.2 Model Selection:
We will evaluate different machine learning models to determine which one performs best for fake customer
review detection. The models we will consider include Naive Bayes, Decision Trees, Random Forest,
and Support Vector Machines. We will also experiment with ensemble techniques, such as bagging and
boosting, to improve the accuracy of the models. Selecting an appropriate model is critical to the success of a
fake customer review detection system. Here are some factors to consider when selecting a model for a fake
customer review detection system:
1. Classification problem: Fake customer review detection is a binary classification problem, where the
reviews are classified as genuine or fake. Hence, models that are suitable for binary classification
problems, such as logistic regression, random forest, and Naive Bayes, can be used.
2. Data characteristics: The performance of a model depends on the characteristics of the data. For
instance, if the dataset is imbalanced, with fewer fake reviews than genuine ones, then the model
needs to be trained on a balanced dataset or use techniques such as oversampling or undersampling.
3. Feature selection: Feature selection is important to identify the most relevant features that can help
in detecting fake reviews. Models such as decision trees and random forest can be used for feature
selection.
4. Interpretability: In some cases, it is important to understand how the model is making predictions.
Models such as decision trees and logistic regression are more interpretable than deep learning
models.
5. Scalability: The size of the dataset can impact the model's performance. Models such as logistic
regression and Naive Bayes are computationally less expensive and can handle large datasets.
6. Regularization: Overfitting is a common problem in machine learning, where the model learns the
noise in the data instead of the underlying pattern. Regularization techniques, such as L1 and L2
regularization, can be used to prevent overfitting.
7. Ensembling: Ensemble models, such as random forest and boosting, can improve the model's
performance by combining multiple models. However, ensemble models can be computationally
expensive and may not be suitable for real-time applications.
In summary, the model selection process for a fake customer review detection system depends on the
characteristics of the data, interpretability, scalability, and regularization requirements. The selection of the
appropriate model can significantly impact the accuracy of the system.
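The oversampling technique mentioned under data characteristics can be sketched as below. This is a minimal random-oversampling example under the assumption that samples are (features, label) pairs; the function name and data are hypothetical.

```python
import random


def oversample(samples, seed=0):
    """Randomly duplicate minority-class samples until every class matches the largest.

    `samples` is a list of (features, label) pairs.
    """
    rng = random.Random(seed)
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample[1], []).append(sample)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw extra copies (with replacement) for classes below the target size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical imbalanced data: 8 genuine (label 0) and 2 fake (label 1) reviews.
data = [([i], 0) for i in range(8)] + [([100 + i], 1) for i in range(2)]
balanced = oversample(data)
counts = {label: sum(1 for _, y in balanced if y == label) for label in (0, 1)}
print(counts)
```

Oversampling would normally be applied to the training split only, so that duplicated reviews do not leak into the testing set.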
4.3 Model Training:
To train the model, we will use a large dataset of both fake and genuine reviews. We will pre-process the data
and split it into training and testing sets. The models, including the SVM model, will be trained using the
labelled training set and evaluated using the testing set. We will use techniques such as cross-validation and
hyperparameter tuning to ensure that the model is robust and accurate. Model training is a crucial step in
building a fake customer review detection system. Here are some important steps involved in model training:
1. Data preparation: The first step in model training is to prepare the dataset for training. This involves
cleaning and pre-processing the data, extracting features, and labelling the data as genuine or fake.
2. Data splitting: The labelled data is split into training and testing datasets. A common ratio for
splitting is 80:20 or 70:30. The training dataset is used to train the model, and the testing dataset is
used to evaluate the model's performance.
3. Model selection: The appropriate model needs to be selected based on the problem type, data
characteristics, and interpretability requirements. Models such as logistic regression, random forest,
and Naive Bayes are commonly used for fake customer review detection.
4. Hyperparameter tuning: Hyperparameters are parameters that are not learned during training, such
as the learning rate, number of layers, or regularization strength. Hyperparameter tuning involves
selecting the optimal values for these hyperparameters to improve the model's accuracy.
5. Model training: The model is trained on the labelled training dataset using the selected algorithm
and hyperparameters. The model is updated iteratively using an optimization algorithm such as
stochastic gradient descent.
6. Model evaluation: Once the model is trained, it needs to be evaluated on the testing dataset to
estimate its performance on unseen data. Evaluation metrics such as accuracy, precision, recall, F1
score, and area under the receiver operating characteristic curve (AUC-ROC) are commonly used.
7. Model refinement: If the model's performance is not satisfactory, it needs to be refined by adjusting
hyperparameters or changing the feature set.
8. Model deployment: Once the model is trained and refined, it is deployed to detect fake customer
reviews in real-time. The model can be integrated into a website or an application to flag fake reviews
and alert moderators for further action.
In summary, model training involves data preparation, data splitting, model selection, hyperparameter tuning,
model training, evaluation, refinement, and deployment. It is an iterative process that requires careful
monitoring of the model's performance and continuous improvement to achieve high accuracy.
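The splitting, tuning, training, and evaluation steps above can be sketched in Python with scikit-learn. This is a minimal sketch and not the project's actual code: the toy reviews, the word-count features, the LinearSVC classifier, and the candidate values of C are all assumptions chosen for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy data written for this sketch (assumption); 1 = genuine, 0 = fake.
genuine = [
    "battery lasts two days and the screen is sharp",
    "arrived late but the shoes fit as described",
    "good value, though the strap feels a bit thin",
    "works fine for daily use, setup took ten minutes",
    "the fabric shrank slightly after the first wash",
    "sound is clear but the bass is weak",
    "decent laptop, fan gets loud under load",
    "colour matches the photos, stitching is neat",
    "the kettle boils fast and the lid closes well",
    "camera is average in low light, fine outdoors",
]
fake = [
    "best product ever buy now amazing amazing",
    "perfect perfect perfect everyone must buy this",
    "worst item ever do not buy total scam",
    "amazing deal best seller ever buy today",
    "five stars best best best highly recommend buy",
    "incredible unbelievable best purchase ever made buy",
    "total scam worst worst worst never buy",
    "must buy now best price ever amazing",
    "greatest product ever made buy it now",
    "terrible awful worst seller ever avoid scam",
]
reviews = genuine + fake
labels = [1] * len(genuine) + [0] * len(fake)

# Data splitting: 80:20, stratified so both classes appear in each part.
X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.2, stratify=labels, random_state=42
)

# Model selection: word-count features feeding a linear SVM.
pipeline = Pipeline([
    ("counts", CountVectorizer()),
    ("svm", LinearSVC()),
])

# Hyperparameter tuning: compare regularization strengths with
# 4-fold cross-validation on the training set only.
search = GridSearchCV(
    pipeline, {"svm__C": [0.1, 1.0, 10.0]}, cv=4, scoring="accuracy"
)
search.fit(X_train, y_train)

# Model evaluation: score the refitted best model on the held-out test set.
best_c = search.best_params_["svm__C"]
test_accuracy = search.score(X_test, y_test)
print(f"best C = {best_c}, test accuracy = {test_accuracy:.2f}")
```

Keeping the test set out of the grid search means the reported accuracy is measured on reviews the tuning procedure never saw.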
The following evaluation metrics are commonly used to measure the performance of the model:
1. Accuracy: Accuracy is the proportion of correctly classified reviews out of the total number of
reviews. It is a popular metric but can be misleading when the dataset is imbalanced.
2. Precision: Precision measures the proportion of correctly identified fake reviews out of all the
reviews classified as fake. It is calculated as the ratio of true positive to true positive plus false
positive.
3. Recall: Recall measures the proportion of correctly identified fake reviews out of all the actual fake
reviews. It is calculated as the ratio of true positive to true positive plus false negative.
4. F1 score: F1 score is the harmonic mean of precision and recall. It is a balanced metric that takes into
account both precision and recall.
5. Area under the receiver operating characteristic curve (AUC-ROC): AUC-ROC measures the ability of
the model to distinguish between genuine and fake reviews across all classification thresholds. It is a
popular metric that is less sensitive to class imbalance than accuracy.
6. Confusion matrix: A confusion matrix provides a detailed breakdown of the number of true positive,
false positive, true negative, and false negative predictions made by the model.
7. False positive rate (FPR): FPR is the proportion of genuine reviews that are classified as fake. It is
calculated as the ratio of false positive to false positive plus true negative.
8. False negative rate (FNR): FNR is the proportion of fake reviews that are classified as genuine. It is
calculated as the ratio of false negative to false negative plus true positive.
9. Precision-recall curve (PRC): PRC is a curve that plots precision versus recall for different
classification thresholds. It is a useful metric when the dataset is imbalanced.
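The ratios defined in the list above can be computed directly from the four confusion-matrix counts. The sketch below treats "fake" as the positive class, as the definitions do. The label strings, the example predictions, and the helper names are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive="fake"):
    """Return (TP, FP, FN, TN), treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred, positive="fake"):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "fpr": safe_div(fp, fp + tn),
        "fnr": safe_div(fn, fn + tp),
    }


# Made-up labels for illustration: 4 fake and 6 genuine reviews.
y_true = ["fake"] * 4 + ["genuine"] * 6
y_pred = ["fake", "fake", "fake", "genuine",
          "fake", "genuine", "genuine", "genuine", "genuine", "genuine"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this example the counts are TP = 3, FP = 1, FN = 1, and TN = 5, so accuracy is 0.8 while precision, recall, and F1 are each 0.75.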
In summary, evaluation metrics such as accuracy, precision, recall, F1 score, AUC-ROC, confusion matrix,
FPR, FNR, and PRC are commonly used to measure the performance of a fake customer review detection
system. The selection of the appropriate evaluation metric depends on the problem type, data characteristics,
and interpretability requirements.
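AUC-ROC and the precision-recall curve are computed from the model's scores rather than from hard predictions. As a rough sketch, AUC-ROC can be calculated as the fraction of (fake, genuine) pairs in which the fake review receives the higher score, and each point of the precision-recall curve comes from one threshold. The scores and the encoding 1 = fake are invented for this example.

```python
def roc_auc(y_true, scores):
    """AUC-ROC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes both classes are present.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def precision_recall_at(y_true, scores, threshold):
    """One point of the precision-recall curve for a given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    # Convention used here: precision is 1.0 when nothing is flagged.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented scores: higher means "more likely fake"; 1 = fake, 0 = genuine.
y_true = [1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]

print("AUC-ROC:", round(roc_auc(y_true, scores), 3))
for threshold in (0.25, 0.5, 0.75):
    print(threshold, precision_recall_at(y_true, scores, threshold))
```

Raising the threshold in this example trades recall for precision, which is the trade-off the precision-recall curve makes visible.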
Once the model is trained and evaluated, it can be deployed in a production environment for real-time review
analysis. The model can be integrated into existing review systems, such as those used by e-commerce
websites or social media platforms. Reviews that are flagged as fake can be reviewed by human moderators
to ensure that legitimate reviews are not mistakenly removed.
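One way to keep human moderators in the loop is to route suspicious reviews to a queue instead of deleting them automatically. The sketch below is illustrative only: the `ModerationQueue` class, the `fake_score` value between 0 and 1, and the 0.8 threshold are hypothetical and not taken from the project.

```python
from dataclasses import dataclass, field


@dataclass
class ModerationQueue:
    """Holds flagged reviews for a human decision instead of deleting them."""
    pending: list = field(default_factory=list)

    def submit(self, review_id, text, fake_score):
        self.pending.append({"id": review_id, "text": text, "score": fake_score})


def route_review(review_id, text, fake_score, queue, threshold=0.8):
    """Publish low-risk reviews; send high-risk ones to moderators."""
    if fake_score >= threshold:
        queue.submit(review_id, text, fake_score)
        return "needs_moderation"
    return "published"


queue = ModerationQueue()
first = route_review(1, "Shoes fit as described.", fake_score=0.12, queue=queue)
second = route_review(2, "Best best best buy now!", fake_score=0.93, queue=queue)
print(first, second, len(queue.pending))
```

Because nothing is removed without review, a false positive costs a moderator's time rather than a legitimate customer's review.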
Model deployment is the process of integrating the trained model into a real-world system for detecting fake
customer reviews. Here are some important steps involved in model deployment:
1. Choose a deployment environment: The first step is to select a deployment environment based on
the system's requirements. The deployment environment could be a cloud-based platform such as
AWS or Google Cloud, or an on-premises server.
2. Integrate the model into the system: The model needs to be integrated into the system using an
appropriate interface. For example, if the system is a website, the model could be integrated using an
API that takes the review text as input and returns the prediction as output.
3. Set up monitoring and logging: Monitoring and logging are essential for detecting errors and
ensuring that the model is functioning correctly. Metrics such as response time, prediction accuracy,
and error rate should be monitored, and logs should be generated for debugging purposes.
4. Test the model: The model should be thoroughly tested in the deployment environment to ensure that
it is working correctly. This involves testing the model's performance on a small subset of data and
gradually increasing the load to simulate real-world usage.
5. Maintain the model: The model needs to be updated periodically to ensure that it is performing
optimally. This involves retraining the model on new data, adjusting hyperparameters, and refining
the feature set.
6. Ensure data privacy and security: Data privacy and security are critical considerations in any
system that processes user data. The system should comply with data protection regulations and
implement appropriate security measures such as encryption and access controls.
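The integration step above describes an API that takes the review text as input and returns the prediction as output. A minimal sketch of such a request handler, using only the standard library, might look like this. The JSON field name `review`, the function names, and the keyword rule standing in for the trained model are all assumptions. The output labels follow the SPAM REVIEW / NOT SPAM REVIEW convention used elsewhere in this report.

```python
import json


def predict_review(text):
    """Placeholder for the trained model (assumption): returns 1 for
    NOT SPAM and 0 for SPAM using a crude keyword rule."""
    suspicious = {"buy", "best", "amazing", "scam"}
    hits = sum(word.strip(".,!?").lower() in suspicious for word in text.split())
    return 0 if hits >= 2 else 1


def handle_request(body):
    """Take a JSON request body and return (HTTP status, JSON response)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body must be valid JSON"})

    text = payload.get("review") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text.strip():
        return 400, json.dumps({"error": "field 'review' must be a non-empty string"})

    label = "NOT SPAM REVIEW" if predict_review(text) == 1 else "SPAM REVIEW"
    return 200, json.dumps({"prediction": label})


status, response = handle_request('{"review": "Best deal ever, buy now!"}')
print(status, response)
```

Validating the input before calling the model keeps malformed requests from reaching the prediction code. A web framework would wrap `handle_request` in a route, but the function can be tested on its own.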
In summary, model deployment involves integrating the trained model into a real-world system, setting up
monitoring and logging, testing the model, maintaining the model, and ensuring data privacy and security. It
is an ongoing process that requires careful monitoring and continuous improvement to ensure optimal
performance.
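The monitoring and logging step can be sketched with the standard `logging` and `time` modules. The `PredictionMonitor` class below is a hypothetical helper, not part of the project. It tracks response time and error rate for any prediction function passed to it. Monitoring prediction accuracy would additionally require labelled feedback, which this sketch does not cover.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("review_detector")


class PredictionMonitor:
    """Wraps prediction calls to record latency and failures."""

    def __init__(self):
        self.requests = 0
        self.errors = 0
        self.total_seconds = 0.0

    def observe(self, predict, text):
        """Call predict(text); log the outcome and update the counters."""
        start = time.perf_counter()
        self.requests += 1
        try:
            result = predict(text)
        except Exception:
            self.errors += 1
            logger.exception("prediction failed")
            return None
        finally:
            # Runs on success and on failure, so every call is timed.
            self.total_seconds += time.perf_counter() - start
        logger.info("prediction=%s chars=%d", result, len(text))
        return result

    @property
    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0

    @property
    def mean_latency(self):
        return self.total_seconds / self.requests if self.requests else 0.0


def failing_model(text):
    raise ValueError("model not loaded")


monitor = PredictionMonitor()
ok_result = monitor.observe(lambda text: 1, "Shoes fit as described.")
bad_result = monitor.observe(failing_model, "Any review text")
print(monitor.error_rate, monitor.mean_latency)
```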
CHAPTER 5
RESULT AND ANALYSIS
5.1 ANALYSIS
Fake customer review detection systems are designed to identify reviews that have been written with the
intent to deceive readers or manipulate online ratings. These systems use various techniques to analyse
reviews, such as natural language processing (NLP), machine learning algorithms, and statistical methods.
The results of fake customer review detection systems can vary depending on the specific techniques and
algorithms used. However, overall, these systems have been shown to be effective at identifying fake
reviews. For example, a study by Cornell University found that a machine learning algorithm could identify
fake hotel reviews with 90% accuracy. Another study by the University of Chicago found that a statistical
model could detect fake reviews on Amazon.com with an accuracy of 90%.
There are several factors that can influence the accuracy of fake review detection systems. These include the
quality of the data used to train the system, the complexity of the language used in reviews, and the
sophistication of the techniques used by fake reviewers to deceive readers. Overall, fake customer review
detection systems have the potential to be a valuable tool for consumers and businesses alike. By identifying
and removing fake reviews, these systems can help ensure that online ratings are a reliable source of
information and can help businesses maintain their reputation and credibility.
Fake customer review detection systems use various techniques and algorithms to identify reviews that are
not written by actual customers or are biased towards a product or service. The accuracy and effectiveness of
these systems depend on the specific techniques used and the quality of the data they are trained on. Studies
have shown that fake review detection systems can be effective in detecting fake reviews with high accuracy.
For example, in a study conducted by the Federal Trade Commission (FTC), which evaluated 23 review
detection systems, the systems were able to correctly identify fake reviews between 85% and 100% of the
time.
One of the most commonly used techniques in fake review detection systems is natural language processing
(NLP), which involves analysing the language and writing style used in reviews to identify patterns that may
indicate that the review is fake. Other techniques used include sentiment analysis, which examines the
emotional tone of the review, and machine learning algorithms, which are trained on large datasets of reviews
to identify patterns and anomalies.
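To make the idea of language and sentiment features concrete, the sketch below turns a review into a few simple numeric signals that a classifier could be trained on. The word lists and the choice of features are assumptions made for illustration. A real system would use a proper sentiment lexicon or learned features.

```python
import re

# Tiny illustrative word lists (assumption), not a real sentiment lexicon.
POSITIVE = {"good", "great", "best", "amazing", "perfect", "love"}
NEGATIVE = {"bad", "worst", "awful", "terrible", "poor", "hate"}


def review_features(text):
    """Return simple writing-style and sentiment features for one review."""
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return {
        "word_count": n,
        "exclamations": text.count("!"),
        # Low values indicate heavy word repetition.
        "unique_word_ratio": len(set(words)) / n if n else 0.0,
        # Ranges from -1 (all negative words) to +1 (all positive words).
        "sentiment": (pos - neg) / n if n else 0.0,
    }


features = review_features("Best best BEST phone ever!!!")
print(features)
```

For the example review the sketch reports five words, three exclamation marks, a unique-word ratio of 0.6, and a sentiment score of 0.6.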
However, it is important to note that fake review detection systems are not foolproof and can sometimes
make mistakes. For example, a genuine review written by a real customer may be flagged as fake if it uses
language or patterns that are similar to those used in fake reviews. Additionally, fake reviewers may become
more sophisticated in their techniques and find ways to evade detection.
In conclusion, fake review detection systems can be effective in detecting fake reviews with high accuracy,
but they are not perfect and may sometimes make mistakes. It is important to use these systems as part of a
broader strategy for evaluating products and services, rather than relying on them exclusively.
5.2 DATASET
There are several datasets available for training and testing fake customer review detection systems. Here are
some popular datasets:
1. Amazon Product Reviews: This dataset contains a collection of reviews for different products sold
on Amazon, and includes both genuine and fake reviews.
2. Yelp Reviews: Yelp provides a dataset of customer reviews for various businesses, including
restaurants, bars, and shops. This dataset contains both genuine and fake reviews.
3. TripAdvisor Reviews: TripAdvisor provides a dataset of customer reviews for hotels and other
travel-related services. This dataset includes both genuine and fake reviews.
4. IMDb Reviews: This dataset contains reviews for movies and TV shows, and includes both genuine
and fake reviews.
5. Fake Review Corpus: This dataset is specifically designed for fake review detection, and includes
both genuine and fake reviews for hotels, restaurants, and other businesses.
6. Fakespot Dataset: Fakespot is a website that uses machine learning to identify fake reviews. They
provide a dataset of reviews labelled as genuine or fake, which can be used for training and testing
fake review detection systems.
These datasets have different characteristics and biases, so it is important to choose a dataset that is
relevant to the specific domain or industry being analysed.
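Before training on any of these datasets, it helps to load the labelled reviews and check how balanced the two classes are, since a heavily imbalanced dataset makes accuracy misleading. The sketch below reads a small inline CSV. The column names `review` and `label` and the sample rows are assumptions, not the format of any dataset listed above.

```python
import csv
import io
from collections import Counter

# Stand-in for a labelled file on disk; the column names are assumptions.
SAMPLE_CSV = """review,label
"Battery lasts two days, screen is sharp",genuine
"Best product ever buy now",fake
"Shoes fit as described",genuine
"Fits well, arrived on time",genuine
"""


def load_reviews(handle):
    """Read (text, label) pairs, skipping rows with missing or unknown values."""
    rows = []
    for row in csv.DictReader(handle):
        text = (row.get("review") or "").strip()
        label = (row.get("label") or "").strip().lower()
        if text and label in {"genuine", "fake"}:
            rows.append((text, label))
    return rows


def class_balance(rows):
    """Return the share of each label among the loaded rows."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


rows = load_reviews(io.StringIO(SAMPLE_CSV))
balance = class_balance(rows)
print(balance)
```

To read a real file, replace the `io.StringIO` object with an opened file handle.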
5.3 FIGURE DATASET
5.4 RESULT DATA
5.5 WORKING
1. Prepare a labelled dataset to train the model.
2. Extract the review text and labels from the dataset.
3. Set the label to 0 for a spam review.
4. Set the label to 1 for a not-spam review.
5. Use train_test_split to split the data into training and testing sets.
6. Use CountVectorizer to count word frequencies.
7. Use predict to obtain the model's prediction.
8. If the prediction value is 1, report NOT SPAM REVIEW.
9. If the prediction value is 0, report SPAM REVIEW.
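The working steps above can be sketched with scikit-learn as follows. This is an illustrative reconstruction and not the project's source code: the toy reviews are invented, and LinearSVC is assumed as the classifier because the report describes an SVM model elsewhere. The steps themselves do not name the classifier.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Steps 1-4: toy labelled data (assumption); 0 = spam, 1 = not spam.
reviews = [
    "best product ever buy now amazing",
    "perfect perfect everyone must buy this",
    "total scam worst seller never buy",
    "amazing deal best price buy today",
    "five stars best best highly recommend buy",
    "greatest product ever made buy it now",
    "battery lasts two days and the screen is sharp",
    "arrived late but the shoes fit as described",
    "good value, though the strap feels a bit thin",
    "sound is clear but the bass is weak",
    "decent laptop, fan gets loud under load",
    "the kettle boils fast and the lid closes well",
]
labels = [0] * 6 + [1] * 6

# Step 5: split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.25, stratify=labels, random_state=0
)

# Step 6: count how often each word appears in each review.
vectorizer = CountVectorizer()
train_counts = vectorizer.fit_transform(X_train)

# Train the classifier on the word counts (classifier choice is an assumption).
model = LinearSVC()
model.fit(train_counts, y_train)


def check_review(text):
    """Steps 7-9: predict and map the 0/1 label to the displayed message."""
    prediction = model.predict(vectorizer.transform([text]))[0]
    return "NOT SPAM REVIEW" if prediction == 1 else "SPAM REVIEW"


print(check_review("best deal ever buy now"))
print("test accuracy:", model.score(vectorizer.transform(X_test), y_test))
```

The vectorizer is fitted on the training reviews only and then reused unchanged for new text, so the model sees the same vocabulary at prediction time as it did during training.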
5.6.1. HOME PAGE
5.6.2. LOGIN
5.6.3. REVIEW DETECT PAGE
5.6.5. CHECK REVIEW
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 Conclusion:
Fake customer review detection systems are becoming increasingly important as more and more businesses
rely on online reviews to attract customers and more people turn to online reviews when making purchasing
decisions. These systems use techniques such as sentiment analysis, natural language processing, machine
learning algorithms, and data mining to analyse various features of a review, such as its language, tone, and
structure, and to identify patterns that suggest whether it is likely to be genuine or fake. By identifying and
removing fake reviews, businesses can improve their credibility and build trust with their customers. The
fake customer review detection system is therefore an essential tool for online businesses to maintain the
authenticity and reliability of their reviews and to protect their reputation. The accuracy of these systems can
vary depending on the quality of the algorithms and the data used for training.
In conclusion, fake customer review detection systems have the potential to revolutionize the way we use
online reviews. With the increasing number of online reviews, it is becoming more and more difficult for
consumers to differentiate between genuine and fake reviews. By using these systems, consumers can make
more informed decisions and businesses can prevent their reputation from being damaged by fake reviews.
6.2 Future Scope:
1. Integration with social media platforms: With the growing influence of social media, it will be
important for fake review detection systems to analyse reviews posted on social media platforms as
well.
2. Use of natural language processing: Natural language processing techniques can be used to improve
the accuracy of fake review detection systems, especially when it comes to detecting reviews written
in non-standard English.
3. Improved user interface: As these systems become more widely adopted, it will be important to
create user-friendly interfaces that allow businesses to easily analyse their reviews and identify any
potential fake ones.
4. Integration with other business tools: Integrating fake review detection systems with other business
tools such as customer relationship management (CRM) software and marketing automation tools can
provide businesses with a more comprehensive view of their customers and improve their overall
marketing efforts.
The future of fake customer review detection systems is bright, with advancements in machine learning and
data analytics. The systems can be improved by incorporating more sophisticated techniques like deep
learning, neural networks, and graph theory. The integration of big data and cloud computing can also
enhance the performance of these systems.
Another area of improvement is the incorporation of contextual information, such as the time and location of
the review, the reviewer's profile, and the product category. This additional data can help to identify patterns
and detect more subtle instances of fake reviews.
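As one hypothetical example of a contextual signal, the sketch below flags reviewers who post several reviews within a short time window, a pattern that review text alone would not reveal. The function name, the one-hour window, and the limit of three reviews are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def burst_reviewers(reviews, window=timedelta(hours=1), limit=3):
    """Return reviewer IDs that posted at least `limit` reviews within `window`.

    `reviews` is an iterable of (reviewer_id, posted_at) pairs.
    """
    by_reviewer = defaultdict(list)
    for reviewer_id, posted_at in reviews:
        by_reviewer[reviewer_id].append(posted_at)

    flagged = set()
    for reviewer_id, times in by_reviewer.items():
        times.sort()
        # Check each run of `limit` consecutive reviews by this reviewer.
        for i in range(len(times) - limit + 1):
            if times[i + limit - 1] - times[i] <= window:
                flagged.add(reviewer_id)
                break
    return flagged


sample = [
    ("u1", datetime(2023, 5, 1, 10, 0)),
    ("u1", datetime(2023, 5, 1, 10, 10)),
    ("u1", datetime(2023, 5, 1, 10, 20)),
    ("u2", datetime(2023, 5, 1, 9, 0)),
    ("u2", datetime(2023, 5, 1, 12, 0)),
    ("u2", datetime(2023, 5, 1, 15, 0)),
]
flagged = burst_reviewers(sample)
print(flagged)
```

A flag like this could be fed to the classifier as an extra feature alongside the text-based ones.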
Furthermore, the techniques used in the fake customer review detection system can be extended to other
applications such as political propaganda detection, fake news detection, and social media analysis. Such
systems can help to prevent the spread of misinformation and protect the integrity of online platforms.
Looking to the future, there is still much room for improvement in these systems. For instance, the accuracy
of these systems could be improved by incorporating more advanced machine learning techniques and data
from multiple sources. Additionally, these systems could be used to identify other types of fraudulent
behaviour, such as paid endorsements and sponsored content, which can also impact consumer
decision-making.
Overall, fake customer review detection systems are a promising technology that will likely continue to grow
in importance as online reviews continue to play a critical role in consumer decision-making.