Fraud Detection in E-Commerce Using Machine Learning
Muhammad Ahsan Saeed et al ., International Journal of Advanced Trends in Computer Science and Engineering, 10(3), May - June 2021, 2206 – 2211
(3) Non-Reviews - those reviews or updates containing text or unrelated ads.

The first type, the fake review, is of great concern because it undermines the integrity of the online review system. Detecting this type of spam review is challenging, if not impossible, because false reviews are hard to distinguish from genuine ones by reading them alone.

2.1. Robust Algorithm:
Amazon has developed a robust algorithm to detect fake reviews, which can be either positive or negative [2]. Fake reviews are positive when bought by the seller and negative when bought by competitors to lower the rating. Detecting fake reviews is therefore a complex problem, which Amazon tries to overcome with the help of technology and a team of professionals who monitor reviews manually. Overall, Amazon's approach is good, but the manual detection of fake reviews makes it time consuming.

2.2. Fake Review Fraud Detection using Data Mining:
Another approach to fake review detection is data mining. While it is a good approach, it also has restrictions and drawbacks [3].
1. Violates user privacy: It is a well-known fact that data mining collects information about people using certain market-based strategies and information technology.
2. Additional irrelevant information.
3. Misuse of information.

2.3. Yelp filtering Algorithm:
Yelp is the largest and most popular online review site that filters untrue or suspicious reviews. It uses various filtering algorithms to find fake reviews. After studying Yelp in depth, it was decided that a machine learning approach is the better approach to fake and spam review detection.

Fake Review Detection: Classification and Analysis of Real and Pseudo Reviews analyses real and fake reviews to understand the psyche of fake reviewers and to produce data sets that give high detection accuracy with supervised learning. The KL-divergence method is used to study the data sets, and behavioral features are used along with n-gram features to check a dataset generated through AMT (Amazon Mechanical Turk). The AMT-generated dataset was not found to be representative of fake reviews, and furthermore behavioral features alone were found to give good accuracy [4-7].

Exploiting Product Related Review Features for Fake Review Detection exploits product-related review features to detect fake reviews. A convolutional neural network model is suggested to integrate the product-related features. For maximum flexibility, a bagging strategy is used to combine the network model with two efficient classifiers. Various experiments were performed to evaluate the performance of the suggested model. In future work, further features may be found that help to obtain more accurate results [2].

Fake review detection using data mining is a study whose objective was to solve the fake review problem by using different data mining techniques and to explore the strengths and weaknesses of those techniques. For this study, a supervised approach was used to detect fake reviews, including Support Vector Machine (SVM), Multinomial Naive Bayes (MNB) and Multilayer Perceptron. In this research, the authors took different approaches to spam review detection. They started with a supervised method, then tried a semi-supervised method and finally used a fully unsupervised method. The supervised approach requires large datasets, while the semi-supervised approach depends heavily on graphs. The authors suggested that other methods for validating Words Basket Analysis (WBA) can be proposed in future. One suggestion is to label the fake and real reviews manually; this would help reduce the size of the datasets and so improve the performance of the WBA approach, and a behavioral approach can be employed for labeling truthful and deceptive reviews [3].

Spam Review Detection Techniques: A Systematic Literature Review is a study in which researchers conduct a comprehensive review of existing studies on spam review detection using the Systematic Literature Review (SLR) method. In total, 76 existing studies are reviewed and analyzed. The researchers evaluated the studies based on how features are extracted from review data sets and on the different methods and techniques used to solve the spam review detection problem. The study showed that the success factors of any spam review detection method are interdependent: feature extraction depends on the review dataset, and the accuracy of spam detection methods depends on the choice of feature engineering method. Therefore, for a spam review detection model to be used successfully and achieve better accuracy, these factors need to be considered in conjunction. To the knowledge of the researchers, this is the first complete review of existing studies in the field of spam review detection using the SLR process. The study presented a systematic review of the literature on spam review detection and highlighted the contributions of recent research in the form of various feature engineering methods, methods of detecting spam reviews, and various measures used in performance evaluation. To bring out direct pragmatic evidence, the work organized a review process, focused on a search query, raised research questions, selected papers from reputable publishers, and applied a formal selection and study assessment process. The main advantage of this study is that, to the researchers' knowledge, it is the first attempt to integrate all available studies of spam review detection using the SLR method. In addition, the outcome of this study may assist further research in the field of spam review detection [8].
An Empirical Study on Detecting Fake Reviews Using Machine Learning Strategies detects fake reviews using machine learning methods. It analyzes a dataset of movie reviews in several ways and introduces sentiment classification algorithms based on supervised learning, applied with and without stop words [9]. The sentiment classification is carried out with a software tool, which is used to separate the genuine reviews in the movie review dataset from the fake ones. In future, the approach could be applied to e-commerce datasets such as Amazon or eBay, or to a different movie review dataset, using a variety of options. The method does not apply to e-commerce websites by default, because its algorithm produces manual results [10].

After thoroughly reviewing different research papers on fraud detection, we came to believe that the approaches to fraud detection are many and that different algorithms have been designed for it. Keeping real-world needs in mind, however, one factor was missing, and that is Machine Learning: an approach that covers almost every aspect of fraud detection. That is why the idea we propose is more up to date and more efficient in dealing with fraud detection. Table 1 illustrates the comparison of different research papers.

Table 1: Summary of literature comparison
Research Article    IP Address    Mac Address    Location    Gmail ID    Spam word detection
Mukherjee et al
Sun et al
Hossain et al
Mirza et al
Elmurngi et al
Bajaj et al

3. PROBLEM STATEMENT
The proposed software solution will help the user to identify fake, unauthentic and spam reviews of any product. This has always been a major problem that buyers face when buying from e-commerce webstores. Fake and spam comments often manipulate buyers, which severely harms the seller's productivity. Our research confirms that fake reviews are still a major issue for the whole e-commerce community. With the proposed software, store owners will be able to detect these fake reviews and remove them from the database.
As discussed above, this system is being developed mainly to target unauthentic, spurious and spam reviews in order to make the e-commerce environment better. People like to read reviews before they make up their minds to buy anything from webstores. Detecting and eliminating fake reviews will lead to the point where buyers are no longer manipulated by unauthentic and spurious reviews. Buying and selling of products will become more frequent and easier.

According to the literature review, systems like this exist, but with different attributes and domains. Some address credit card fraud detection and some are based on fake review detection, but we intend to create a product that not only finds those reviews but also removes them from the database. This will let us maintain the webstores and keep a check and balance on reviews.

4. PROPOSED METHOD
Usually, a genuine buyer reviews the quality of a service or product only once, unless he or she wants to respond to other customers or deliberately wants to mislead others with exaggerated or fake reviews.
The attributes of spam and non-spam reviews, discussed in detail in the next section, are used to build the database. A web application has been created that captures user reviews, stores the data in a database and detects spam according to the proposed method. A customer's identity is tracked using the login email, the location, and the IP address and MAC address of the device. The process is shown in Figure 1.

Figure 1: Flow of proposed method
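
The tracking idea described in this section (one review per product per buyer, with the buyer identified by login email, IP address, MAC address and location) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `Review`, `ReviewStore` and `submit` are hypothetical, the database is replaced by an in-memory dictionary, and a review is simply reported as suspicious when any identity attribute has already been used for the same product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Review:
    """One submitted review together with the identity attributes of its sender."""
    product_id: str
    email: str
    ip_address: str
    mac_address: str
    location: tuple  # (latitude, longitude), already rounded by the caller
    text: str


@dataclass
class ReviewStore:
    """Hypothetical in-memory stand-in for the review database."""
    # Maps (product_id, attribute name, attribute value) -> reviews seen so far.
    seen: dict = field(default_factory=dict)
    # Every stored review with the list of attributes that repeated.
    reviews: list = field(default_factory=list)

    def submit(self, review):
        """Store the review and return the identity attributes that repeat."""
        attributes = {
            "email": review.email.strip().lower(),
            "ip_address": review.ip_address,
            "mac_address": review.mac_address.lower(),
            "location": review.location,
        }
        repeated = []
        for name, value in attributes.items():
            key = (review.product_id, name, value)
            if self.seen.get(key, 0) > 0:
                # This attribute was already used to review the same product.
                repeated.append(name)
            self.seen[key] = self.seen.get(key, 0) + 1
        self.reviews.append((review, repeated))
        return repeated
```

An empty return value means no attribute has been seen before for that product; a non-empty list names the attributes that repeat, which a store owner could use to decide whether the review should be withdrawn from the database.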
ii. IP Address
iii. EMAIL ID
Every website allows a customer to access its services and make transactions only after verifying their identity, i.e., the user must create an account on that website. One piece of information required by every online service website is the customer's email ID. In the proposed web application, a buyer may submit only one review of a specific product per email address. If a customer logs in with one email address, for example [email protected], and the next time tries to log in using [email protected], existing e-commerce websites treat them as two separate accounts or customers. In reality, however, these IDs refer to the same email account. It is therefore easy for spammers to sign in with different accounts and post multiple reviews under different identities. To avoid this, we suggest removing the names and comparing whether the IDs are the same or different. A better idea would be to let the email provider handle authentication: instead of requiring a separate registration, simply allow the customer to use the website with his or her email account. The proposed web application implements both methods. But what if a spammer creates another email address? Although it is difficult to obtain a new ID for every review, it is possible. To combat this attack, we can track the IP address of the device.

iv. LOCATION
Another feature used is the customer's location. To determine it, we only need the longitude and latitude of the place from which the person submits a review. Once the location is found, we look at how many reviews of the same product come from the same location. A fake reviewer trying to raise or lower a product's rating will post more than one review, which may also adversely affect other customers' purchase decisions. However, the location can also be changed using a VPN.

Figure 2: Flow of review and buying of products

Figure 3: Reviews of products

5. CONCLUSION & RESULT
This work aimed to propose a new way of finding fake reviews that affect the buying behavior of customers. The proposed strategy covers several attributes of a reviewer and thereby keeps a unique identity for every customer. It is able to detect spam activities under certain assumptions. The preliminary experiments have shown promising results.
After reviewing a variety of research on fraud detection, it was found that work has been done on spam words using different techniques, but detection was still lacking in aspects such as IP addresses, MAC addresses and email accounts. In addition, machine learning approaches were missing, and very little work has been done so far using them. Some researchers suggested using IP addresses, but MAC addresses were still not covered, so the proposed approach covers all these aspects, which involve IP addresses, MAC addresses,
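
The email ID and location checks of the proposed method could be sketched along the following lines. Everything here is an assumption made for illustration: the function names `canonical_email` and `crowded_locations` are hypothetical, two addresses are treated as the same ID when they differ only in letter case or surrounding whitespace (the paper does not spell out its comparison rule), and coordinates are rounded to three decimal places to decide what counts as "the same location".

```python
from collections import Counter


def canonical_email(address):
    """Return a comparison key for an email address.

    Assumption: addresses that differ only in letter case or surrounding
    whitespace belong to the same mailbox. Provider-specific aliasing
    rules are deliberately not modelled.
    """
    local, _, domain = address.strip().lower().rpartition("@")
    if not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}@{domain}"


def crowded_locations(reviews, precision=3, max_per_location=1):
    """Find (product, latitude, longitude) keys with more reviews than allowed.

    reviews: iterable of (product_id, latitude, longitude) tuples.
    precision: decimal places kept when rounding coordinates (assumption).
    max_per_location: reviews tolerated per product and location (assumption).
    """
    counts = Counter(
        (product_id, round(lat, precision), round(lon, precision))
        for product_id, lat, lon in reviews
    )
    return {key: n for key, n in counts.items() if n > max_per_location}
```

Because a VPN or a fresh email address can defeat either check on its own, such signals would presumably be combined with the IP and MAC address tracking rather than used in isolation.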