
Article · October 2019
DOI: 10.14569/IJACSA.2019.0100943


(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 10, No. 9, 2019

Fraud Detection using Machine Learning in e-Commerce

Adi Saputra1, Suharjito2
Computer Science Department, BINUS Graduate Program–Master of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480

Abstract—The growing volume of internet users is causing transactions on e-commerce to increase as well, and the quantity of fraud in online transactions is increasing with it. Fraud prevention in e-commerce can be developed using machine learning; this work analyzes which machine learning algorithm is suitable, comparing the Decision Tree, Naïve Bayes, Random Forest, and Neural Network. Because the available data are imbalanced, the Synthetic Minority Over-sampling Technique (SMOTE) is used to create balanced data. Evaluation using a confusion matrix shows the highest accuracy for the neural network at 96 percent, with random forest at 95 percent, Naïve Bayes at 95 percent, and decision tree at 91 percent. SMOTE is able to increase the average F1-score from 67.9 percent to 94.5 percent and the average G-mean from 73.5 percent to 84.6 percent.

Keywords—Machine learning; random forest; Naïve Bayes; SMOTE; neural network; e-commerce; confusion matrix; G-Mean; F1-score; transaction; fraud

I. INTRODUCTION

According to research on internet users in Indonesia released in the October 2019 edition of Marketeers Magazine [1], the number of internet users in Indonesia had grown from 132 million users to 143.2 million users year over year, as shown in Fig. 1.

The increasing number of internet users in Indonesia has prompted market players to pursue opportunities to develop their businesses through internet media. One common approach is to build an e-commerce business [3].

Based on statistical data obtained from Statista.com, retail e-commerce (electronic commerce) sales in Indonesia are projected to grow 133.5% from their 2017 level to US$16.5 billion (around IDR 219 trillion) in 2022. This growth is supported by rapid advances in technology that make shopping convenient for consumers.

The huge number of transactions in e-commerce raises the potential for a new problem, namely fraud in e-commerce transactions, shown in Fig. 2. The number of e-commerce-related frauds has increased every year since 1993. According to a 2013 report, 5.65 cents were lost to fraud for every $100 of e-commerce trading turnover, and fraud losses had reached more than 70 trillion dollars by 2019 [5]. Fraud detection is one way to reduce the amount of fraud that occurs in e-commerce transactions.

The fraud detection that has developed most rapidly is fraud detection on credit cards, ranging from methods using machine learning to methods using deep learning [6]. Unfortunately, fraud detection for transactions on e-commerce remains underexplored; research on e-commerce fraud detection has so far been limited to determining the features or attributes [7] that characterize fraud and non-fraud transactions in e-commerce.

The dataset used in this paper has a total of 151,112 records, of which 14,151 records are classified as fraud, a fraud ratio of 0.093 (roughly 9.3 percent). Datasets with such small ratios are imbalanced, and imbalanced data yields accuracy results that lean toward the majority class rather than the minority class: the classifier labels most transactions as the non-fraud majority rather than fraud, which makes the classification results worse. The imbalance is handled using SMOTE (Synthetic Minority Oversampling Technique).

Recent research about fraud detection in e-commerce transactions still focuses on feature extraction [8]; the purpose of this paper is to find the best model to detect fraud in e-commerce transactions.

This paper studies fraud transactions in e-commerce using a dataset from Kaggle. Classification is improved with SMOTE, which handles the imbalanced data; after SMOTE is applied, the dataset is used to train machine learning models, namely decision tree, Naïve Bayes, random forest, and neural network, which are evaluated on accuracy, precision, recall, G-mean, and F1-score.

Fig. 1. Growth of Internet users [2].
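As a quick arithmetic check, the fraud ratio quoted above is simply the fraction of fraud records among all records in the dataset:

```python
# Record counts as stated in the paper: 14,151 fraud records
# out of 151,112 total records in the Kaggle dataset.
total_records = 151_112
fraud_records = 14_151

fraud_ratio = fraud_records / total_records
print(f"{fraud_ratio:.4f}")  # 0.0936, i.e. roughly 9.3 percent of records
```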


Fig. 2. Sales of e-Commerce, Statista.com [4].

II. RELATED WORKS

Fraud detection has developed most rapidly for fraud on credit cards, and many studies discuss fraud methods. One study carried out using deep learning employed an auto-encoder and a restricted Boltzmann machine [9]. Deep learning is used to build a fraud detection model that runs like a human neural network, where the data pass through several tiered layers, from the encoder at layer 1 through the decoder at layer 4. The researchers compared the deep learning method with other algorithms such as the Hidden Markov Model (HMM) [10].

Credit card fraud detection research has also used machine learning [11], namely the decision tree algorithm, naïve Bayes, neural networks, and random forests. The decision tree is widely used in fraud detection because it is easy to use; it is a prediction model with a tree or hierarchical structure.

Naïve Bayes is used in credit card fraud detection because it is a classification method based on probability and statistics; it is very fast and achieves quite high accuracy in real-world conditions. Neural networks for credit card fraud detection use genetic algorithms to determine the number of hidden layers in the network architecture [12]; the genetic algorithm produces the most optimal number of hidden layers [13]. Fraud detection on credit cards also uses random forest [14], which combines many good trees into one model. Random Forest relies on random vectors with the same distribution across all trees, where each decision tree has a maximum depth [15].

Research on fraud detection in e-commerce is still scarce. Existing work is limited to determining the features or attributes that indicate whether a transaction is fraud or non-fraud [16]. That study describes the extraction of the attributes/features used to characterize behavior in e-commerce transactions; these attributes serve as fraud indicators and determine the transaction conditions.

Another study of fraud detection in e-commerce reasons about transactions based on the attributes or features present in e-commerce transactions. The features used are transaction features, namely invalid rating, confirmation interval, and average stay time on commodities, and buyer features, namely real name, positive rating ratio, and transaction frequency.

Imbalanced data results in suboptimal classification. The dataset in this paper has a total of 151,112 records, of which 14,151 are classified as fraud, a fraud ratio of 0.093. The Synthetic Minority Oversampling Technique (SMOTE) [17] is one of the oversampling methods used to balance data; it increases the number of positive (minority) class samples by generating synthetic replicas so that the amount of positive data equals the negative data. The synthetic data replicate samples in the small class: the SMOTE algorithm finds the k closest neighbors of a positive-class sample and then constructs synthetic duplicates, up to the desired percentage, between the sample and randomly chosen neighbors among those k.

Recent papers about fraud detection are limited to determining features or attributes. This paper improves fraud detection in e-commerce using machine learning, namely the Decision Tree, Naïve Bayes, Random Forest, and Neural Network.

III. RESEARCH METHODOLOGY

This paper aims to classify e-commerce transactions into fraud and non-fraud using machine learning, namely Decision Tree, Naïve Bayes, Random Forest, and Neural Network. The research process is carried out as shown in Fig. 3. The classification process begins with feature selection on the dataset. After the features are determined, the data are preprocessed using PCA, which performs transformation, normalization, and scaling of the features so that they can be used for classification. Before classification, the SMOTE (Synthetic Minority Oversampling Technique) process is applied. SMOTE turns imbalanced data into balanced data and deals with the imbalance problem in fraud cases, where fraud is usually below 1 percent, by offsetting the majority class in the dataset. A dominant majority class steers the classification toward the majority class, so that predictions are not as expected; after the SMOTE process the fraud transaction dataset is balanced [18].

The machine learning models used in the classification process are decision tree, random forest, artificial neural network, and naïve Bayes. These algorithms are compared to find the best accuracy on the e-commerce transaction dataset.
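The nearest-neighbor interpolation that SMOTE performs can be sketched in a few lines of Python. This is a minimal illustration of the technique, not the authors' implementation; real experiments would typically use a library such as imbalanced-learn:

```python
import math
import random

def smote(minority, k=3, n_synthetic=None, rng=None):
    """Generate synthetic minority samples by interpolating between each
    sample and one of its k nearest minority-class neighbors (SMOTE)."""
    rng = rng or random.Random(0)
    n_synthetic = len(minority) if n_synthetic is None else n_synthetic
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest minority-class neighbors of x (excluding x itself)
        neighbors = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

# Tiny demo: 3 minority (fraud) samples, 3 synthetic points added.
fraud = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1)]
new_points = smote(fraud, k=2, n_synthetic=3)
print(len(fraud) + len(new_points))  # 6 minority samples after oversampling
```

Each synthetic point lies on the segment between a real fraud sample and one of its neighbors, so the oversampled minority class stays inside the region the real samples occupy.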


Fig. 3. Research Steps.

A. Preprocessing Data
Preprocessing is used to extract, transform, normalize, and scale the new features that will be used by the machine learning algorithms; it converts raw data into quality data. In this study, preprocessing uses PCA (Principal Component Analysis) [19] for extraction, transformation, normalization, and scaling.

PCA is a linear transformation commonly used in data compression and a technique commonly used to extract features from high-dimensional data. PCA can reduce complex data to smaller dimensions to reveal unknown parts and simplify the structure of the data. PCA calculations involve computing covariance matrices so as to minimize redundancy and maximize variance.

B. Decision Tree
Decision trees are useful for exploring fraud data and finding hidden relationships between a number of potential input variables and a target variable. The decision tree [20] combines fraud data exploration and modeling, so it is very good as a first step in the modeling process, even when used as the final model of several other techniques [21].

The decision tree is a type of supervised learning algorithm well suited to classification. It divides the dataset into several branching segments based on decision rules, which are determined by identifying the relationship between input and output attributes.

 Root Node: Represents the entire population or sample, and is further divided into two or more sub-nodes.
 Splitting: The process of dividing a node into two or more sub-nodes.
 Decision Node: A sub-node that is divided into further sub-nodes.
 Leaf / Terminal Node: Nodes that are not split further are called leaf or terminal nodes.
 Pruning: When a sub-node is removed from a decision tree.
 Branch / Sub-Tree: A subdivision of the whole tree is called a branch or sub-tree.
 Parent and Child Node: A node that is divided into sub-nodes is the parent of those sub-nodes [22].

The fraud detection architecture using a decision tree consists of the root node, internal nodes, and leaf nodes, as shown in Fig. 4.

Fig. 4. Architecture of Decision Trees.

C. Naïve Bayes
Naïve Bayes predicts future probabilities based on past experience [23], using the calculation formula below.

P(A|B) = (P(B|A) × P(A)) / P(B) (1)

Where:
B: Data with an unknown class
A: The hypothesis that the data belong to a specific class
P(A|B): Probability of the hypothesis given the condition (posterior probability)
P(A): Probability of the hypothesis (prior probability)
P(B|A): Probability of the condition given the hypothesis
P(B): Probability of B

Using the formula above, the probabilities of fraud transactions and non-fraud transactions can be obtained.

D. Random Forest
Random forest (RF) is an algorithm used for the classification of large amounts of data. It is a development of the Classification and Regression Tree (CART) method that applies bootstrap aggregating (bagging) and random feature selection. The random forest architecture is shown in Fig. 5.

A random forest combines individual good transaction-fraud trees into one model. Random Forest relies on random vectors with the same distribution across all trees, where each decision tree in the e-commerce fraud detection has a maximum depth. The class produced by the classification process is the class chosen by the most decision trees (majority vote).
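Equation (1) can be illustrated with a small numeric sketch. The prior and likelihood values below are hypothetical, chosen only to show the calculation:

```python
def posterior(prior_a, likelihood_b_given_a, prob_b):
    """Bayes' rule, equation (1): P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood_b_given_a * prior_a / prob_b

# Hypothetical numbers: 10% of transactions are fraud (prior), and a
# suspicious pattern B appears in 80% of fraud and 20% of non-fraud.
p_fraud = 0.10
p_b_given_fraud = 0.80
p_b = p_b_given_fraud * p_fraud + 0.20 * (1 - p_fraud)  # total probability of B

p_fraud_given_b = posterior(p_fraud, p_b_given_fraud, p_b)
print(round(p_fraud_given_b, 3))  # 0.308
```

Even though the pattern is four times more common in fraud, the low prior keeps the posterior fraud probability at about 31 percent, which is why priors matter for rare-event detection.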


Fig. 5. Architecture of Random Forest.

E. Neural Network
The neural network algorithm is an artificial intelligence method whose concept is to mimic the neural network system of the human body, in which nodes are connected to each other. The neural network architecture is shown in Fig. 6.

The input layer has 11 input nodes before training and 17 input nodes after preprocessing. In addition, a genetic algorithm is used to determine the hidden layer of the neural network [24]. The GA-NN [25] algorithm process is as follows:

 Initialization: count = 0, fitness = 0, number of cycles.
 Initial population generation: individual chromosomes are formulated as successive gene sequences, each encoding the input.
 Design a suitable network.
 Assign weights.
 Conduct training with backpropagation; compute cumulative errors and fitness values, then evaluate based on the fitness value.
 If the previous fitness < the current fitness value, save the current value.
 count = count + 1.
 Selection: two parents are selected using a roulette wheel mechanism.
 Genetic operations: crossover, mutation, and reproduction to produce new feature sets.
 If (number of cycles <= count), return to step four.
 Train the network with the selected features.
 Evaluate performance on the test data.

Fig. 6. Architecture of Neural Network.

F. Confusion Matrix
The confusion matrix is a method that can be used to evaluate classification performance. Table I shows a dataset with only two classes [26].

True Positive (TP) and True Negative (TN) are the numbers of positive and negative samples that are classified correctly; False Positive (FP) and False Negative (FN) are the numbers that are classified incorrectly. Based on the confusion matrix, performance criteria such as Accuracy, Precision, Recall, F-Measure, and G-Mean can be determined.

Accuracy is the most common criterion for measuring classification performance, but when working with an imbalanced class it is not appropriate, because the minority class contributes little to the accuracy. The recommended evaluation criteria are recall, precision, F1-score, and G-mean. The F1-score measures the classification of the minority class in unbalanced data, and the G-mean index measures overall classification performance.

In this study, classification performance is measured using Accuracy, Recall, Precision, G-Mean, and F1-Score:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (2)

Recall = TP / (TP + FN) (3)

Precision = TP / (TP + FP) (4)

G-Mean = √(Recall × Specificity), with Specificity = TN / (TN + FP) (5)

F1-Score = 2 × (Precision × Recall) / (Precision + Recall) (6)
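The criteria in equations (2) through (6) can be computed directly from the four confusion matrix counts. A small sketch, with illustrative counts that are not taken from the paper's tables:

```python
import math

def metrics(tp, fn, fp, tn):
    """Compute the criteria of equations (2)-(6) from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # (2)
    recall = tp / (tp + fn)                              # (3)
    precision = tp / (tp + fp)                           # (4)
    specificity = tn / (tn + fp)
    g_mean = math.sqrt(recall * specificity)             # (5)
    f1 = 2 * precision * recall / (precision + recall)   # (6)
    return accuracy, recall, precision, g_mean, f1

# Illustrative counts only:
acc, rec, prec, gm, f1 = metrics(tp=8, fn=2, fp=1, tn=9)
print(round(acc, 2), round(rec, 2), round(f1, 2))  # 0.85 0.8 0.84
```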


TABLE. I. CONFUSION MATRIX

Class           | Predictive Positive | Predictive Negative
Actual Positive | TP                  | FN
Actual Negative | FP                  | TN

IV. RESULTS AND DISCUSSION

A. Dataset
This study uses an e-commerce fraud dataset sourced from Kaggle. The dataset consists of 151,112 records, of which 14,151 are classified as fraud, a fraud ratio of 0.093, as shown in Fig. 7. SMOTE (Synthetic Minority Oversampling Technique) [27] minimizes the class imbalance in the fraud transaction dataset by generating synthetic data.

Fig. 7. Ratio Fraud.

After oversampling, shown in Fig. 8, the SMOTE process has generated synthetic data so that the dataset is balanced.

Fig. 8. Ratio Fraud after over Sampling.

B. Decision Trees
The experiment using the decision tree model is done by preparing the data produced by the preprocessing process. After preprocessing, the data are oversampled; classification with the decision tree is performed both on the oversampled data and on data that has not been oversampled. The results of these two experiments show the classification results with and without the SMOTE (Synthetic Minority Oversampling Technique) oversampling process.

The decision tree without SMOTE produces an accuracy of 91%, recall of 59.8%, precision of 54.1%, F1-score of 56.8%, and G-mean of 75.2%. Table II shows the confusion matrix of the decision tree without SMOTE.

The decision tree with SMOTE produces an accuracy of 91%, recall of 60.4%, precision of 91.6%, F1-score of 91.2%, and G-mean of 75.3%. Table III shows the confusion matrix of the decision tree with SMOTE.

C. Naïve Bayes
The test using the Naïve Bayes model is done by preparing the data produced by the preprocessing process. After preprocessing, the data are oversampled; Naïve Bayes classification is performed both on the oversampled data and on data that has not been oversampled. The results of these two experiments show the classification results of Naïve Bayes with and without the SMOTE (Synthetic Minority Oversampling Technique) oversampling process.

Naïve Bayes without SMOTE produces an accuracy of 95%, recall of 54.1%, precision of 91.1%, F1-score of 67.9%, and G-mean of 73.3%. Table IV shows the confusion matrix of naïve Bayes without SMOTE.

Naïve Bayes with SMOTE produces an accuracy of 95%, recall of 54.2%, precision of 94.9%, F1-score of 94.5%, and G-mean of 73.4%. Table V shows the confusion matrix of Naïve Bayes with SMOTE.

TABLE. II. CONFUSION MATRIX DECISION TREE WITHOUT SMOTE

Class           | Predictive Positive | Predictive Negative
Actual Positive | 38782               | 38782
Actual Negative | 1746                | 2595

TABLE. III. CONFUSION MATRIX DECISION TREE WITH SMOTE

Class           | Predictive Positive | Predictive Negative
Actual Positive | 38651               | 2342
Actual Negative | 1724                | 2617

TABLE. IV. CONFUSION MATRIX NAÏVE BAYES WITHOUT SMOTE

Class           | Predictive Positive | Predictive Negative
Actual Positive | 40764               | 229
Actual Negative | 1993                | 2348

TABLE. V. CONFUSION MATRIX NAÏVE BAYES WITH SMOTE

Class           | Predictive Positive | Predictive Negative
Actual Positive | 40760               | 233
Actual Negative | 1988                | 2353
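As noted in the dataset discussion, accuracy alone is misleading on data this imbalanced: a classifier that always predicts the majority (non-fraud) class looks accurate while detecting no fraud at all. A tiny sketch, using a made-up 10% fraud rate to stand in for the dataset's roughly 9:91 split:

```python
# Labels: 1 = fraud (minority), 0 = non-fraud (majority). 10% fraud.
labels = [1] * 10 + [0] * 90

# A degenerate "classifier" that always predicts the majority class.
predictions = [0] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
fn = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))
recall = tp / (tp + fn)

print(accuracy, recall)  # 0.9 0.0 -- high accuracy, zero fraud detected
```

This is exactly why the paper reports recall, F1-score, and G-mean alongside accuracy.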


D. Random Forest
The trial using the Random Forest model is carried out by preparing the data produced by the preprocessing process. After preprocessing, the data are oversampled; Random Forest classification is performed both on the oversampled data and on data that has not been oversampled. The results of these two experiments show the classification results of Random Forest with and without the SMOTE (Synthetic Minority Oversampling Technique) oversampling process.

Random forest without SMOTE produces an accuracy of 95%, recall of 55%, precision of 95.5%, F1-score of 69.8%, and G-mean of 74.0%. Table VI shows the confusion matrix of random forest without SMOTE.

Random Forest with SMOTE produces an accuracy of 95%, recall of 58.1%, precision of 80.5%, F1-score of 94.3%, and G-mean of 75.7%. Table VII shows the confusion matrix of random forest with SMOTE.

E. Neural Network
The research using the Neural Network model is done by preparing the data produced by the preprocessing process. After preprocessing, the data are oversampled; Neural Network classification is performed both on the oversampled data and on data that has not been oversampled. The results of these two experiments show the classification results of the Neural Network with and without the SMOTE (Synthetic Minority Oversampling Technique) oversampling process.

The neural network without SMOTE produces an accuracy of 96%, recall of 54%, precision of 97.1%, F1-score of 69.8%, and G-mean of 73.5%. Table VIII shows the confusion matrix of the neural network without SMOTE.

The neural network with SMOTE produces an accuracy of 85%, recall of 76.7%, precision of 92.5%, F1-score of 85.1%, and G-mean of 84.6%. Table IX shows the confusion matrix of the neural network with SMOTE.

TABLE. VI. CONFUSION MATRIX RANDOM FOREST WITHOUT SMOTE

Class           | Predictive Positive | Predictive Negative
Actual Positive | 40881               | 112
Actual Negative | 1954                | 2387

TABLE. VII. CONFUSION MATRIX RANDOM FOREST WITH SMOTE

Class           | Predictive Positive | Predictive Negative
Actual Positive | 40383               | 610
Actual Negative | 1820                | 2521

TABLE. VIII. CONFUSION MATRIX NEURAL NETWORK WITHOUT SMOTE

Class           | Predictive Positive | Predictive Negative
Actual Positive | 41113               | 24
Actual Negative | 1932                | 2265

TABLE. IX. CONFUSION MATRIX NEURAL NETWORK WITH SMOTE

Class           | Predictive Positive | Predictive Negative
Actual Positive | 38566               | 2539
Actual Negative | 9585                | 31487

The experiments with the several algorithms produce the accuracy values shown in Fig. 9. The highest accuracy, 96%, is obtained by the neural network algorithm.

Fig. 9. Accuracy Result.

The experiments produce the recall values shown in Fig. 10: recall increases when the machine learning algorithms are combined with the Synthetic Minority Over Sampling Technique (SMOTE) compared with using the decision tree, random forest, Naïve Bayes, and neural network algorithms alone. The highest increase occurs for the neural network combined with SMOTE.

Fig. 10. Recall Result.

The experiments produce the precision values shown in Fig. 11: precision decreases when the machine learning algorithms are combined with SMOTE compared with using the decision tree, random forest, Naïve Bayes, and neural network algorithms alone. The largest change occurs for the neural network combined with SMOTE.

The experiments produce the F1-score values shown in Fig. 12: the F1-scores increase when the machine learning algorithms are combined with SMOTE compared with using the algorithms alone. The F1-score is used to measure the classification of the minority class in unbalanced data.
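The neural-network-with-SMOTE figures can be cross-checked against Table IX. Under one consistent reading of that table, with the fraud class taken as the "negative"-labelled row and column, the reported accuracy, recall, and precision follow directly (the F1 and G-mean values differ slightly, presumably from rounding in the paper):

```python
# One consistent reading of Table IX (neural network with SMOTE),
# taking the fraud class as the "negative"-labelled row/column:
tp, fn = 31487, 9585   # frauds detected / frauds missed
fp, tn = 2539, 38566   # false alarms / non-frauds correctly passed

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
precision = tp / (tp + fp)

# Matches the reported 85% accuracy, 76.7% recall and 92.5% precision.
print(round(accuracy, 2), round(recall, 3), round(precision, 3))
```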


Fig. 11. Precision Result.

Fig. 12. F1-Score Result.

Fig. 13. G-Mean Result.

The G-mean values also increase when the machine learning algorithms are combined with the Synthetic Minority Over Sampling Technique (SMOTE), as shown in Fig. 13; the G-mean is used to measure overall classification performance.

V. CONCLUSION

The e-commerce transaction fraud dataset is a dataset with a class imbalance. This study applies the Synthetic Minority Over Sampling Technique (SMOTE) method to deal with the class imbalance in the e-commerce transaction fraud dataset; the algorithms used are the decision tree, Naïve Bayes, random forest, and neural network.

The results showed that the highest accuracy was the neural network's 96%, followed by random forest and Naïve Bayes at 95%, and the decision tree at 91%. The neural network has the best accuracy because of the GA (genetic algorithm): genetic algorithms can be used to improve ANN performance by determining the number of hidden nodes and hidden layers and selecting relevant features for the neural network. The SMOTE method in the experiments increased the recall, F1-score, and G-mean values. Neural network recall increased from 54% to 76.7%, Naïve Bayes recall from 41.2% to 41.3%, random forest recall from 55% to 58%, and decision tree recall from 59.8% to 60.4%. The F1-score also increased for all machine learning methods: for the neural network from 69.8% to 85.1%, Naïve Bayes from 67.9% to 94.5%, random forest from 69.8% to 94.3%, and the decision tree from 56.8% to 91.2%. With SMOTE, the G-mean also increased: for the neural network from 73.5% to 84.6%, Naïve Bayes from 73.3% to 73.4%, random forest from 74% to 75.7%, and the decision tree from 75.2% to 75.3%.

Based on these experimental results, it is concluded that applying SMOTE with neural networks, random forests, decision trees, and Naïve Bayes is able to handle the imbalance of the e-commerce fraud dataset, producing higher G-mean and F1-scores than neural networks, random forest, decision tree, and Naïve Bayes alone. This shows that the SMOTE method is effective in increasing the performance of classification on unbalanced data.

VI. FUTURE WORK

Future studies are expected to use other algorithms or deep learning for fraud detection in e-commerce, and to improve the neural network's accuracy when the SMOTE (Synthetic Minority Over Sampling Technique) process is used.

REFERENCES
[1] Asosiasi Penyelenggara Jasa Internet Indonesia. "Magazine APJII (Asosiasi Penyelenggara Jasa Internet Indonesia)" (2019): 23 April 2018.
[2] Asosiasi Penyelenggara Jasa Internet Indonesia. "Mengawali integritas era digital 2019 - Magazine APJII (Asosiasi Penyelenggara Jasa Internet Indonesia)" (2019).
[3] Laudon, Kenneth C., and Carol Guercio Traver. E-commerce: Business, Technology, Society. 2016.
[4] Statista.com. Retail e-commerce revenue forecast from 2017 to 2023 (in billion U.S. dollars). (2018). Retrieved April 2018, from: https://www.statista.com/statistics/280925/e-commerce-revenue-forecast-in-indonesia/.
[5] Renjith, S. "Detection of Fraudulent Sellers in Online Marketplaces using Support Vector Machine Approach." International Journal of Engineering Trends and Technology (2018).
[6] Roy, Abhimanyu, et al. "Deep learning detecting fraud in credit card transactions." 2018 Systems and Information Engineering Design Symposium (SIEDS). IEEE, 2018.
[7] Zhao, Jie, et al. "Extracting and reasoning about implicit behavioral evidences for detecting fraudulent online transactions in e-Commerce." Decision Support Systems 86 (2016): 109-121.
[8] Zhao, Jie, et al. "Extracting and reasoning about implicit behavioral evidences for detecting fraudulent online transactions in e-Commerce." Decision Support Systems 86 (2016): 109-121.
[9] Pumsirirat, Apapan, and Liu Yan. "Credit card fraud detection using deep learning based on auto-encoder and restricted Boltzmann machine." International Journal of Advanced Computer Science and Applications 9.1 (2018): 18-25.


[10] Srivastava, Abhinav, et al. "Credit card fraud detection using hidden Markov model." IEEE Transactions on Dependable and Secure Computing 5.1 (2008): 37-48.
[11] Lakshmi, S. V. S. S., and S. D. Kavilla. "Machine Learning For Credit Card Fraud Detection System." International Journal of Applied Engineering Research 13.24 (2018): 16819-16824.
[12] Aljarah, Ibrahim, Hossam Faris, and Seyedali Mirjalili. "Optimizing connection weights in neural networks using the whale optimization algorithm." Soft Computing 22.1 (2018): 1-15.
[13] Bouktif, Salah, et al. "Optimal deep learning LSTM model for electric load forecasting using feature selection and genetic algorithm: Comparison with machine learning approaches." Energies 11.7 (2018): 1636.
[14] Xuan, Shiyang, Guanjun Liu, and Zhenchuan Li. "Refined weighted random forest and its application to credit card fraud detection." International Conference on Computational Social Networks. Springer, Cham, 2018.
[15] Hong, Haoyuan, et al. "Landslide susceptibility mapping using J48 Decision Tree with AdaBoost, Bagging and Rotation Forest ensembles in the Guangchang area (China)." Catena 163 (2018): 399-413.
[16] Zhao, Jie, et al. "Extracting and reasoning about implicit behavioral evidences for detecting fraudulent online transactions in e-Commerce." Decision Support Systems 86 (2016): 109-121.
[17] Sharma, Shiven, et al. "Synthetic oversampling with the majority class: A new perspective on handling extreme imbalance." 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 2018.
[18] Kim, Jaekwon, Youngshin Han, and Jongsik Lee. "Data imbalance problem solving for SMOTE based oversampling: Study on fault detection prediction model in semiconductor manufacturing process." Advanced Science and Technology Letters 133 (2016): 79-84.
[19] Sadaghiyanfam, Safa, and Mehmet Kuntalp. "Comparing the Performances of PCA (Principle Component Analysis) and LDA (Linear Discriminant Analysis) Transformations on PAF (Paroxysmal Atrial Fibrillation) Patient Detection." Proceedings of the 2018 3rd International Conference on Biomedical Imaging, Signal Processing. ACM, 2018.
[20] Harrison, Paula A., et al. "Selecting methods for ecosystem service assessment: A decision tree approach." Ecosystem Services 29 (2018): 481-498.
[21] Randhawa, Kuldeep, et al. "Credit card fraud detection using AdaBoost and majority voting." IEEE Access 6 (2018): 14277-14284.
[22] Lakshmi, S. V. S. S., and S. D. Kavilla. "Machine Learning For Credit Card Fraud Detection System." International Journal of Applied Engineering Research 13.24 (2018): 16819-16824.
[23] Li, Tong, et al. "Differentially private Naïve Bayes learning over multiple data sources." Information Sciences 444 (2018): 89-104.
[24] Suganuma, Masanori, Shinichi Shirakawa, and Tomoharu Nagao. "A genetic programming approach to designing convolutional neural network architectures." Proceedings of the Genetic and Evolutionary Computation Conference. ACM, 2017.
[25] Ruehle, Fabian. "Evolving neural networks with genetic algorithms to study the string landscape." Journal of High Energy Physics 2017.8 (2017): 38.
[26] Ting, Kai Ming. "Confusion matrix." Encyclopedia of Machine Learning and Data Mining (2017): 260-260.
[27] Siringoringo, Rimbun. "Klasifikasi Data Tidak Seimbang Menggunakan Algoritma SMOTE Dan K-Nearest Neighbor." Journal Information System Development (ISD) 3.1 (2018).
