1.0 Abstract
Online spam reviews are deceptive evaluations of products and services. They are
often written as a deliberate manipulation strategy to deceive readers.
Recognizing such reviews is an important but challenging problem. In this work, I try
to solve this problem by using different data mining techniques. I explore the strengths
and weaknesses of those techniques in detecting fake reviews. I start with
different supervised techniques such as Support Vector Machine (SVM),
Multinomial Naive Bayes (MNB), and Multilayer Perceptron.
The results show that all of these supervised techniques
can detect fake reviews with more than 86% accuracy. Then, I work on a
semi-supervised technique that reduces the dimensionality of the input feature
vector but offers performance similar to existing approaches. I use a combination of
topic modeling and SVM to implement the semi-supervised technique. I
also compare the results with other approaches that consider all the words of a dataset
as input features. I find that topic words alone are enough as input features to reach
accuracy similar to that of approaches where researchers consider all the words as
input features. Finally, I propose an unsupervised learning approach named
Words Basket Analysis for fake review detection. I use five Amazon product
review datasets for the experiments and report the performance of the proposed
approach on these datasets.
Fake product review detection CPP 22508 sem v
OBJECTIVE:-
There are several issues to consider when conducting sentiment analysis (SA). In this
section, some of these issues are addressed. First, a viewpoint (or opinion) observed as
negative in one situation might be considered positive in another situation. Second,
people do not always express opinions in the same way. Most common text processing
techniques rely on the fact that minor changes between two text fragments are
unlikely to change the actual meaning.
Textual reviews
Most of the available reputation models depend on numeric data available in different
fields, for example ratings in e-commerce. Also, most of the reputation models focus
only on the overall ratings of products without considering the reviews which are
provided by customers [15]. On the other hand, most websites allow consumers to add
textual reviews to provide a detailed opinion about the product [14] [17]. The reviews
are available for users to read. Also, customers are relying on reviews rather than on
ratings. Reputation models can use SA methods to extract users' opinions and use this
data in the reputation system. This information may include consumers' opinions about different
Filtering and identification of fake reviews have substantial significance [20]. Moraes
et al. [21] proposed a technique for categorizing a single-topic textual review. A
sentiment classification at the document level is applied for stating a negative or
positive sentiment. Supervised learning methods are composed of two phases, namely
selection and extraction of reviews utilizing learning models such as SVM.
Extracting the best and most accurate approach and simultaneously categorizing the
customers' written reviews into negative or positive opinions has attracted attention as
a major research field. Although it is still in an introductory phase, there has been a lot
of work related to several languages [22]-[24]. Our work used several supervised
learning algorithms such as SVM, NB, KNN-IBK, K* and DT-J48 for sentiment
classification of text to detect fake reviews.
SVM is robust and accurate for detecting fake reviews when performance is measured
with accuracy, precision, F-measure, and recall. In our empirical study, the results in
three cases (movie reviews dataset V1.0, movie reviews dataset V2.0, and movie reviews
dataset V3.0) show that SVM is robust and accurate for detecting fake reviews.
3.0 Methodology
The data are preprocessed with the StringToWordVector filter, which is the main tool
for text analysis in Weka. The StringToWordVector filter sets the attribute value in the
transformed datasets to Positive or Negative for every single word, depending on
whether the word appears in the document or not. This filtration process is used for
configuring the different steps of the term extraction. The filtration process comprises
the following two sub-processes:
Tokenization
Stopwords Removal
Stopwords are the words we want to filter out before training the
classifier. Some of those words are commonly used (e.g., "a," "the," "of," "I," "you," "it,"
"and") but do not give any substantial information to our labeling scheme; instead,
they introduce confusion to our classifier. In this study, we used a list of 630 English
stopwords with the product reviews datasets.
Stopwords removal helps to reduce the memory requirements while classifying the
reviews.
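
The preprocessing described above can be illustrated with a minimal Python sketch. It is not the Weka StringToWordVector implementation: the regular-expression tokenizer, the function names, and the tiny stopword set are assumptions made for illustration (the study itself used a list of 630 English stopwords). Each review becomes a 0/1 vector that records whether a vocabulary word appears in it.

```python
import re

# Hypothetical, tiny stopword set for illustration only.
STOPWORDS = {"a", "the", "of", "i", "you", "it", "and"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop every token that appears in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def presence_vectors(documents):
    """Return a sorted vocabulary and one 0/1 presence vector per document."""
    cleaned = [set(remove_stopwords(tokenize(d))) for d in documents]
    vocabulary = sorted(set().union(*cleaned)) if cleaned else []
    vectors = [[1 if word in doc else 0 for word in vocabulary] for doc in cleaned]
    return vocabulary, vectors


reviews = [
    "The battery of the phone is great",
    "I returned it and the phone broke",
]
vocabulary, vectors = presence_vectors(reviews)
print(vocabulary)
print(vectors)
```

Removing stopwords before building the vocabulary keeps the vectors shorter, which is the memory saving mentioned above.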
Attribute Selection
In this step, we use sentiment classification algorithms, which have been
applied in many domains such as commerce, medicine, media, and biology. There are
many different classification techniques, such as NB, DT-J48, SVM, K-NN,
Neural Networks, and Genetic Algorithms. In this study, we use five popular
supervised classifiers: the NB, DT-J48, SVM, K-NN, and KStar algorithms.
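
To make the classification step concrete, here is a minimal sketch of one of the listed classifiers, Naive Bayes, written in plain Python. It is an illustration under stated assumptions, not the Weka implementation used in the study: the multinomial variant with add-one smoothing, the class name TinyNaiveBayes, and the toy labeled reviews are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, token_lists, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(token_lists, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                if token in self.vocabulary:  # words never seen in training are ignored
                    score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, already tokenized training reviews.
train_tokens = [
    ["great", "phone", "love"],
    ["love", "great", "battery"],
    ["terrible", "phone", "broke"],
    ["broke", "terrible", "refund"],
]
train_labels = ["positive", "positive", "negative", "negative"]

model = TinyNaiveBayes().fit(train_tokens, train_labels)
prediction = model.predict(["great", "battery"])
print(prediction)
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.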
fakes: False Negatives (FN) are fake events incorrectly classified as real events. The
confusion matrix provides numerical parameters that can be used in the
measures that evaluate the Detection Process (DP) performance. In Table III, the
confusion matrix shows the counts of real and fake predictions obtained with known
data. Each algorithm used in this study has its own performance
evaluation and confusion matrix.
The confusion matrix is a very important part of our study because it shows whether
the reviews from the datasets are classified as fake or real.
The confusion matrix is applied to each of the five algorithms discussed in Step 4.
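
The confusion-matrix evaluation can be sketched in a few lines of Python. The function names and the toy label lists are assumptions for illustration, and "fake" is treated as the positive class, so a false negative is a fake review classified as real. The accuracy, precision, recall, and F-measure formulas are the standard ones.

```python
def confusion_counts(actual, predicted, positive="fake"):
    """Return (TP, FP, FN, TN) counts, treating `positive` as the target class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1  # real review classified as fake
        elif a == positive:
            fn += 1  # fake review classified as real
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F-measure from the counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical known labels and classifier predictions.
actual = ["fake", "fake", "fake", "real", "real", "real", "real", "fake"]
predicted = ["fake", "fake", "real", "real", "real", "fake", "real", "fake"]

counts = confusion_counts(actual, predicted)
scores = evaluate(*counts)
print(counts)
print(scores)
```

Running the same two functions on the predictions of each classifier gives one confusion matrix and one set of scores per algorithm.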
Software requirements:
1.
Flow Chart:-