Comparative Evaluation of Credit Card Fraud Detection
4 authors, including:
Hemant Jaiman
Rajasthan Technical University
All content following this page was uploaded by Julius Wosowei on 02 April 2022.
Abstract—Credit card fraud is a serious problem that is growing with the increase in e-commerce and online transactions. Through identity theft and financial loss, such practices can affect millions of people around the world. Criminal activity is a rising threat to the financial sector with far-reaching implications.
Data mining has played an essential role in the detection of online payment fraud. The efficiency of fraud detection in credit card purchases is significantly affected by the sampling strategy applied to the dataset, the choice of variables and the detection techniques used.
This paper examines the performance of Support Vector Machine, Naive Bayes, Logistic Regression and K-Nearest Neighbor classifiers on highly skewed credit card fraud data.
The performance of these techniques is evaluated based on accuracy, sensitivity, precision and specificity. The results show that the best accuracies for the logistic regression, naive Bayes, k-nearest neighbor and support vector machine classifiers are 99.07%, 95.98%, 96.91% and 97.53%, respectively. The comparative results show that logistic regression performs better than the other algorithms.

Keywords— Credit card fraud, Data Mining, Machine Learning, Naive Bayes, Logistic Regression, Support Vector Machine, K-Nearest Neighbor

I. INTRODUCTION

Financial fraud is a steadily growing threat with far-reaching consequences for financial services, businesses and government organizations. Fraud is defined as illegal deception with the intention of obtaining financial gain. The significant increase in credit card transactions has come with a heavy reliance on internet technology. As card transactions become the predominant form of payment for both physical and digital purchases, the rate of credit card fraud also increases rapidly.
Card fraud may be internal or external. Internal card fraud occurs through collusion between cardholders and the financial institution, using a false identity to cheat the system, whereas external fraud involves the use of a credit or debit card to obtain money by dubious or fraudulent means. Much research has focused on detecting external card fraud, which accounts for most credit card fraud. External credit card fraud can be classified into two types: card-not-present fraud and card-present fraud. Card-not-present fraud occurs when a customer's card details, including the card number, expiration date and card verification code (CVC), are compromised and then used without physically presenting the card to a merchant, for example in online transactions. Card-present fraud occurs when card data is stolen directly from a physical credit card [1]. Detecting fraudulent transactions with traditional manual techniques is time-consuming and costly, and the advent of big data has made manual methods increasingly impractical. As a result, financial institutions have turned their attention to recent computational methods for dealing with credit card fraud.
Data mining is one of the most prominent techniques used for fraud detection. Because the true motive and legitimacy behind a transaction can never be known with certainty, the most effective option is to search for possible evidence of fraud in the available data using statistical algorithms. Card fraud detection can therefore be modeled as the classification of transactions into two classes, genuine and fraudulent [2].
Credit card fraud detection is based on analyzing the spending behavior of a card. Numerous methods have been applied to card fraud detection, including support vector machines [3], genetic algorithms [4], decision trees [5], artificial neural networks [6] and naïve Bayes [7].
Credit card companies currently attempt to predict a purchase's authenticity by evaluating discrepancies in attributes such as purchase location, transaction amount and the user's purchase history. With the recent rise in credit card fraud cases, however, optimizing algorithmic solutions is crucial for credit card companies [8].
Credit card fraud detection faces a number of challenges. First, fraudulent behavior patterns are dynamic, which means that fraudulent transactions will, in general,
The Manhattan distance between two points $(x_i, y_i)$ and $(x_n, y_n)$ is a metric in which the distance between the points is the sum of the absolute differences of their Cartesian coordinates:

$$M = |x_i - x_n| + |y_i - y_n| \tag{4}$$

E. Naïve Bayes:

The naïve Bayes classifier is based on Bayes' theorem and selects the decision with the highest probability. Bayesian probability estimates unknown probabilities from known values and known probabilities.

It is a supervised machine learning algorithm represented by

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{5}$$

Bayes' theorem provides a method of calculating the posterior probability $P(A \mid B)$, the probability of outcome $A$ given certain conditions $B$. The theorem calculates this posterior probability by multiplying the prior probability of the outcome, $P(A)$, which is its probability without any knowledge of the influencing conditions, by the ratio $P(B \mid A)/P(B)$.
Naïve Bayes is based on the assumption that each factor affects the outcome independently and is

Figure 2. Support Vector Machine

IV. EVALUATION AND RESULT

To evaluate these machine learning models, we considered two methods:
(1) Classification accuracy, which is the ratio of the number of correct predictions to the number of input samples, as shown in equation (6). This measure is reliable only when each class contains an equal number of samples.

$$\text{Accuracy} = \frac{\text{number of correct predictions}}{\text{total number of predictions made}} \tag{6}$$

(2) Confusion matrix: this produces a matrix as output that describes the complete performance of the model. Four essential measurements are used in the evaluation, namely the True Positive Rate (TPR), True Negative Rate (TNR), False Positive Rate (FPR) and False Negative Rate (FNR). Here, TP, TN, FP and FN denote the numbers of true-positive, true-negative, false-positive and false-negative predictions, respectively.
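To make these rates concrete, the following Python sketch computes the confusion-matrix counts and the four rates for a binary problem. It is a minimal illustration under stated assumptions, not the implementation used in this study: fraud is assumed to be the positive class (label 1), the helper names `confusion_counts`, `safe_div` and `rates` and the toy labels are hypothetical, and FPR and FNR use the standard definitions FP/(FP+TN) and FN/(FN+TP). The example also illustrates the caveat on equation (6): on skewed data, a classifier that labels every transaction as genuine still attains high accuracy while detecting no fraud.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, TN, FP and FN; assumes binary labels with 1 = fraud (positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["TP"] += 1  # fraud correctly flagged
        elif actual == 0 and predicted == 0:
            counts["TN"] += 1  # genuine transaction correctly passed
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1  # genuine transaction wrongly flagged as fraud
        elif actual == 1 and predicted == 0:
            counts["FN"] += 1  # fraud missed
        else:
            raise ValueError("labels must be 0 or 1")
    return counts


def safe_div(num: int, den: int) -> float:
    """Return num / den, or 0.0 when the denominator is zero (a convention of this sketch)."""
    return num / den if den else 0.0


def rates(c: Dict[str, int]) -> Dict[str, float]:
    """Accuracy plus TPR, TNR, FPR and FNR computed from the four counts."""
    total = c["TP"] + c["TN"] + c["FP"] + c["FN"]
    return {
        "accuracy": safe_div(c["TP"] + c["TN"], total),
        "TPR": safe_div(c["TP"], c["TP"] + c["FN"]),
        "TNR": safe_div(c["TN"], c["TN"] + c["FP"]),
        "FPR": safe_div(c["FP"], c["FP"] + c["TN"]),
        "FNR": safe_div(c["FN"], c["FN"] + c["TP"]),
    }


if __name__ == "__main__":
    # Hypothetical skewed sample: 2 frauds among 100 transactions.
    y_true = [1, 1] + [0] * 98
    # A trivial classifier that labels every transaction as genuine.
    y_pred = [0] * 100
    counts = confusion_counts(y_true, y_pred)
    print(counts)  # {'TP': 0, 'TN': 98, 'FP': 0, 'FN': 2}
    r = rates(counts)
    print(r["accuracy"], r["TPR"])  # 0.98 0.0 -> high accuracy, yet no fraud detected
```

Because fraudulent transactions are rare, TPR (sensitivity) and FNR are the more informative figures in this toy case; the zero-denominator guard simply returns 0.0 when a class is absent from the sample, which is a design choice of this sketch rather than a standard.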
The performance of the support vector machine, naïve Bayes, k-nearest neighbor and logistic regression classifiers is evaluated based on accuracy, sensitivity (recall), specificity and precision.

Actual \ Predicted    Predicted No      Predicted Yes
Actual No             True Negative     False Positive
Actual Yes            False Negative    True Positive

Table 1. Confusion Matrix

Accuracy is the ratio of the sum of true positives and true negatives to the total number of predicted samples, as shown in equation (7).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$

Sensitivity, also called recall, is the ratio of true-positive predictions to the sum of true positives and false negatives. Recall evaluates the completeness of the classifier, that is, how many of the actual positives were detected as positive, as shown in equation (8).

$$\text{Sensitivity (recall)} = \frac{TP}{TP + FN} \tag{8}$$

Specificity is the ratio of true negatives to the sum of true negatives and false positives:

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{9}$$

Precision is the ratio of the number of true positives to the sum of true positives and false positives. It can be regarded as a measure of the quality of the positive predictions. The equation for precision is given in equation (10).

$$\text{Precision} = \frac{TP}{TP + FP} \tag{10}$$

Four classifier systems are developed in this research, based on logistic regression, SVM, naive Bayes and k-nearest neighbor. 80% of the dataset is used for training, while the remaining 20% is set aside for testing. The classifiers are evaluated using specificity, precision, accuracy and sensitivity.

                      Classifiers (%)
Metrics    Support Vector Machine    K-Nearest Neighbor    Naïve Bayes    Logistic Regression

Table 2 above shows the performance assessment of the four models on this data distribution. The stronger performance was obtained with this data distribution. Logistic regression gave the most accurate results across the evaluation metrics used.

V. CONCLUSION

In this study, four classifier models are developed based on Support Vector Machine, Naïve Bayes, K-Nearest Neighbor and Logistic Regression. 80% of the dataset is used for training and the remaining 20% for testing. Precision, sensitivity, specificity and accuracy are used to assess performance. However, the presence of balanced training and testing datasets with the same distribution is an unrealistic expectation.
According to Table 2, when tested under realistic conditions, logistic regression was the most accurate in detecting credit card fraud.
Based on this study, a credit card company should consider implementing a logistic regression algorithm that analyzes the time of purchase to determine whether a credit card transaction is fraudulent.

A. FUTURE WORK

This research on detecting credit card fraud has significant potential for future work. If a dataset with decoded fields were released to the public, the actual features that are most useful for credit card fraud detection could be identified. Moreover, the results of this project were limited by the small number of fraudulent cases in the dataset. Using a larger dataset with a greater number of fraudulent cases, the algorithms could be trained to make more accurate predictions. Pursuing these goals may require more computing power. Other strategies for bias avoidance, such as alternative resampling strategies, cost-sensitive learning methods and ensemble learning techniques, could also be tested on future datasets to find the best strategy for handling a skewed dataset.

ACKNOWLEDGMENT
The authors gratefully acknowledge project mentor Bokefode Jayant for his expert guidance in machine learning and data science, and research coordinator Yogesh Kakde for his valuable assistance.

REFERENCES