A Review Credit Card Fraud Detection in Banks Using Machine Learning Algorithms


Article title: A Review: Credit Card Fraud Detection in Banks using Machine Learning Algorithms

Authors: Muhammad Hazeel Ahmed[1]


Affiliations: [1]
Orcid ids: 0000-0003-3869-8558[1]
Contact e-mail: [email protected]
License information: This work has been published open access under Creative Commons Attribution License
http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, distribution, and reproduction in any
medium, provided the original work is properly cited. Conditions, terms of use and publishing policy can be found at
https://www.scienceopen.com/.
Preprint statement: This article is a preprint and has not been peer-reviewed, under consideration and submitted to
ScienceOpen Preprints for open peer review.
Links to data: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
DOI: 10.14293/S2199-1006.1.SOR-.PPFI7P0.v2
Preprint first posted online: 06 February 2023
Keywords: Fraud detection, Random forest, Fraudulent behavior detection, Machine Learning
A Review: Credit Card Fraud Detection in Banks using Machine Learning
Algorithms
February, 2023

Correspondence
University of Management and Technology
Dr. Ahmed Hassan Butt
[email protected]
M. Hazeel Ahmed Butt
[email protected]

ABSTRACT
Fraud events are increasing day by day. Credit card fraud is the most frequent type of fraud and causes large financial losses at a global level. Researchers have implemented many machine learning algorithms to detect credit card fraud. This research briefly describes several algorithms and compares the performance of Random Forest, Naïve Bayes, K-Nearest Neighbor, Logistic Regression and Multilayer Perceptron. These algorithms are used to classify transactions as genuine or fraudulent. The algorithms are compared on the basis of accuracy, precision, recall and false positive rate. Comparison of the results shows that Random Forest performs best among them on the credit card fraud detection dataset. The research also indicates that any of these ML algorithms can be used for fraud classification.

KEYWORDS
Fraud detection, Credit card fraud detection, Random forest, Machine learning,
fraudulent behavior detection.

LAYOUT DESCRIPTION

• Fraud is a criminal activity, such as the illegal transfer of money from someone's account without their knowledge. In [1], the author describes fraud as the misuse of someone's money or assets for one's own advancement.
• Nowadays, security is the major issue in transactional fraud. According to researchers, organizations around the world lose 10% to 15% of their revenue to fraudulent activities.
• Fraud detection is a technique for protecting a system from fraud. It can also be used to enhance the security of a system in order to prevent fraudulent activity.

WHAT THIS RESEARCH ADDS ON TO THE FRAUD DETECTION IN BANKS?

1. We have categorized and explored the scenarios in which researchers use fraud detection techniques, and discuss them in detail.
2. We also highlight different fraud detection techniques and some issues and challenges related to them. Further, we discuss the analysis of different machine learning algorithms in credit card fraud detection.
3. Future optimization and enhancement of fraud detection systems.

We summarize different techniques and algorithms of fraud detection from research published between 2016 and 2022. This review also provides knowledge about the transactional fraud that currently occurs through credit cards.

I. INTRODUCTION
In today's world, e-commerce systems are trending and almost everyone is aware of them. With the growth of electronic commerce systems, fraud is also increasing day by day, because these e-commerce systems have security issues regarding the privacy of customers' credit cards.

These e-commerce websites have two types of users, i.e. authorized users and scammers.

Advancement in these systems has made users aware of online transactions through credit cards, which are the easiest and most popular way of transferring money on a commercial scale [2]. There are two types of credit card fraud: a fraudster can create an unknown identity to make a fraudulent online transaction, or a fraudster can use a stolen or lost credit card to transfer money or obtain cash.

However, transactional fraud is increasing rapidly, and every year it causes a great loss of money globally [3]. Today a person does not need a physical credit card to make a transaction online; only the information on the credit card is needed to process the transaction. A credit card carries very sensitive information, i.e. card number, CVV code and expiry date. If a fraudster obtains these three private details, he can make a large amount of purchases.

To overcome transactional fraud, organizations use different machine learning algorithms for fraud detection. We have analyzed several machine learning algorithms: Random Forest, which classifies normal and fraudulent behavior by voting among base classifiers [4], Naïve Bayes (NB), Logistic Regression (LR), Multilayer Perceptron (MLP) and decision trees [5], to identify which algorithm fits best in credit card fraud detection.

This paper also highlights the consequences of financial literacy [6], which mainly concerns the knowledge of finance and financial decisions needed to carefully manage money and credit card transactions. By combining the results of the above analysis across all techniques and algorithms, this paper shows which algorithm performs better on a commercial scale. Section 2 presents the literature review and related work of previous researchers. Section 3 describes the research methods and techniques adopted. Section 4 presents the results and outcomes of the research. Finally, Section 5 contains the conclusion of this paper.

II. LITERATURE REVIEW
Fraud detection has long been a hot research topic, and researchers have worked in many areas of fraud. In [1], the authors refer to the techniques of Artificial Intelligence and explain the working of a neural network by analogy with the central nervous system of animals, because it is capable of learning and recognizing easily. Researchers have described many types of fraud, i.e. bankruptcy fraud, theft fraud and credit card fraud. Paper [1] highlights some of the issues and challenges related to a fraud detection system, such as concept drift, real-time detection, skewed distribution and large amounts of data. It also provides an overview of different state-of-the-art fraud detection system approaches and methods, such as artificial neural networks (ANN), support vector machines (SVM), decision trees and meta-heuristics. The authors conclude by highlighting some challenges that are hard to model and result in weak accuracy.

According to the data mining concept of classification, fraud detection is a classification problem [2], since it uses data mining algorithms to classify a credit card transaction as genuine or fraudulent. The authors of [2] state that credit card fraud detection is a data mining problem and that there are two major reasons why it is becoming more complex and challenging. They also ran a comparative performance test on a dataset of European cardholders containing 284,807 transactions, using three techniques: K-Nearest Neighbor, Naïve Bayes and Logistic Regression. They conclude by showing the effect of hybrid sampling.

There are two types of credit card fraud: application fraud, in which a fraudster obtains a card by using false information, and behavioral fraud, in which a fraudster steals the card number and password of the card owner and makes transactions [7]. In [8], misuse detection and anomaly detection are the two techniques used for detecting fraudulent transactions. Misuse detection usually identifies known kinds of fraudulent transactions because it is trained on fraudulent transactions, whereas anomaly detection identifies novel ones because it is trained on normal transactions.

Researchers have classified machine learning algorithms that are helpful for analyzing the results. In [9], researchers combined LR, SVM, GB and RD on the European dataset, achieving 91% overall. In [4], three techniques were applied: LR, DT and RF. After analyzing and exploring the techniques, the authors reported the best performance for RF with 95.5%, followed by DT with 94.3% and LR with 90%. Paper [4] focuses on classifying credit card transactions as genuine or fraudulent.

Credit card fraud detection is based on the behavior of the card owner, i.e. how the card is used. Normally, variables are selected to capture this behavior, and this selection has a great effect on the performance of fraud detection systems [5]. Empirically, machine learning algorithms and Bayesian network algorithms performed better in terms of economic efficiency. In [5], the authors describe two major problems that make fraud detection more complex and challenging, i.e. changes in the profile of fraudulent and normal behavior, and highly skewed data. They also compare different methods to find the best one; these include quadratic discriminant analysis, pipelining and ensemble learning for credit card fraud detection (CCFD).

III. METHODOLOGY
We defined a search space that includes different e-databases to acquire valid papers about fraud detection. We assessed different online research papers and explored the work of previous researchers to provide a review that will help future researchers gain a better understanding of fraud detection techniques.

a. DATASET DESCRIPTION
For this research we used a dataset provided on Kaggle named Credit Card Fraud Detection, which is available at [10]. It is an unbalanced dataset containing 284,807 transactions in total. The dataset contains attributes such as Class, Amount and Time.

A Principal Component Analysis (PCA) transformation was performed on this dataset. Out of 284,807 transactions, only 492 were labeled as fraudulent and 284,315 as genuine.

For further processing, the dataset was preprocessed using the feature selector tool provided at [11], which removes features that do not contribute significantly to the results. Because the dataset is unbalanced, we have reviewed data sampling techniques such as ADASYN [12] and SMOTE [13], which can be applied to this dataset to obtain balanced data in future research.

b. MACHINE LEARNING ALGORITHMS

RANDOM FOREST:
Random forest is a machine learning algorithm that can be used for classification as well as regression. It works by building decision trees: it draws samples from the dataset and produces a decision tree on every sample. Each tree produces its own result, and finally the results of the different trees are combined to provide an accurate prediction. This technique is used to classify the data into two classes, i.e. normal behavior and fraudulent behavior [3].
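The sample-then-vote idea behind random forest can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this paper: one-split decision stumps stand in for full decision trees, the function names (`fit_stump`, `fit_forest`, `predict`) are hypothetical, and the toy rows of (amount, hour) values are invented for the example rather than taken from the Kaggle dataset.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Fit a one-split tree (a stump) using only the given feature indices."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            # Each side of the split predicts its majority label.
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    k = max(1, int(n_features ** 0.5))  # features considered per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feats = rng.sample(range(n_features), k)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Combine the trees by majority vote."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (amount, hour of day); label 1 = fraudulent, 0 = normal.
rows = [(12.0, 10), (30.5, 14), (8.2, 9), (55.0, 18), (20.0, 12),
        (950.0, 3), (1200.0, 2), (990.0, 4)]
labels = [0, 0, 0, 0, 0, 1, 1, 1]

forest = fit_forest(rows, labels)
print(predict(forest, (1500.0, 1)))  # large night-time transaction
print(predict(forest, (5.0, 20)))    # small evening transaction
```

In practice a library implementation with full decision trees would replace the stumps; the sketch only shows how bootstrap sampling and voting fit together.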
K NEAREST NEIGHBORS:
It uses different distance measures to find the data points most similar to an input. Three main measures are used, i.e. Manhattan distance, Minkowski distance and Euclidean distance. These three measures of similarity are also discussed in [2]. In this paper Euclidean distance is used to find the nearest neighbors. The Euclidean distance between two points in space is the length of the line segment between them:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$$

It is calculated between the new input and every single data point; the results are sorted in ascending order, and the points with the lowest distance to the input are selected.

MULTILAYER PERCEPTRON:
It is a multilayer artificial neural network with three kinds of layers, i.e. input, hidden and output layers. The input layer processes the input signals, the hidden layer is placed between the input and output layers, and the output layer performs the classification [14]. It uses functions that apply weights to the input and add a bias. Its function looks like:

$$f(x) = f(Wx^{T} + b)$$

Figure 1: Pictorial representation of MLP

NAIVE BAYES:
It is a decision maker that makes decisions on the basis of the highest probability [2]. It is an algorithm in which dependencies between attributes are not considered, and its probability always uses the values that are known in order to estimate the probability of unknowns. In this paper the Naïve Bayes classifier is used to classify transactions as fraudulent or non-fraudulent.

LOGISTIC REGRESSION:
It is a machine learning algorithm whose dependent variable is binary: it calculates probabilities to predict whether something will happen or not. It uses a function called the sigmoid to produce its predictions. The output of the function is treated as 0 if it is less than 0.5 and as 1 if it is greater than 0.5.

IV. RESULTS & DISCUSSIONS
We have discussed five techniques for the detection of fraudulent transactions. We use the concept of classification evaluation described in [15], which includes a confusion matrix with four sets of values, i.e. True Positive (TP), False Positive (FP), False Negative (FN) and True Negative (TN). For comparison, we consider Recall, Precision, Accuracy and False Positive Rate. We also use the ROC curve and the precision-recall curve for pictorial representation of the results on the dataset.

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Accuracy} = \frac{TP + TN}{N}$$

$$\text{False Positive Rate} = \frac{FP}{FP + TN}$$

where N = TP + TN + FP + FN is the total number of transactions.

We applied the classification models discussed above to the dataset. In the table below, Precision, Recall and False Positive Rate describe the detection of fraudulent transactions in the dataset.
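The four evaluation measures used in this paper, Recall, Precision, Accuracy and False Positive Rate, follow directly from the confusion-matrix counts. The sketch below is an illustration only: the helper names (`confusion_counts`, `evaluate`) and the ten toy labels are assumptions made for the example, not part of the original experiment. The last call uses the class counts reported for the dataset (492 fraudulent, 284,315 genuine) to show what a model that never flags fraud would score.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN, treating `positive` as the fraudulent class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def evaluate(tp, fp, fn, tn):
    """Compute the four measures; a ratio with a zero denominator is reported as 0.0."""
    n = tp + fp + fn + tn
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "accuracy": (tp + tn) / n if n else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical toy labels: 1 = fraudulent, 0 = genuine.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
toy_scores = evaluate(*counts)
print(counts, toy_scores)

# A model that labels every transaction genuine, using the dataset's class counts.
all_genuine = evaluate(tp=0, fp=0, fn=492, tn=284315)
print(all_genuine)
```

Because only 492 of the 284,807 transactions are fraudulent, the always-genuine model still reaches an accuracy of about 99.8% while its recall is zero, which is why accuracy is best read together with recall, precision and false positive rate on this dataset.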

Table 1: Results of Classification Evaluation

CLASSIFIERS              RECALL   PRECISION   FPR    ACCURACY
Random Forest            0.89     0.31        0.46   99.7%
K-Nearest Neighbor       0.55     0.02        0.03   94.4%
Multilayer Perceptron    0.90     0.08        0.15   98.4%
Naïve Bayes              0.85     0.26        0.40   99.6%
Logistic Regression      0.91     0.07        0.14   98.2%

As we can see, Random Forest gives the highest accuracy and K-Nearest Neighbor the lowest among these classification models. In our experiment Random Forest performs better than the other four models, but for another dataset some other model might work better than Random Forest. We have created a bar chart comparing the models on the basis of accuracy, and we have constructed the ROC curve of the Random Forest classifier.

Figure 2: Comparison between Machine Learning Algorithms

Figure 3: ROC Curve between TPR and FPR

V. CONCLUSION
Credit card fraud has been increasing in recent years. These fraudulent transactions are causing serious financial losses at a global level to people who use bank accounts for the safety of their money. Many fraud detection systems have been proposed to prevent fraudulent activities. Our findings are based on the original credit card fraud detection dataset. The main goal of this paper is to compare different ML techniques and models. This research shows that Random Forest gives better results than the other ML algorithms.
We have presented the results of classification evaluation in tabular form to provide better understanding. Furthermore, we have created a bar chart comparing accuracy to give a pictorial representation of the results.
All our findings are actual and reality-based. Future researchers should make comparisons with other models in order to provide a wider view of fraud detection techniques. We hope that this paper will help researchers take a positive view of credit card fraud detection.

VI. ACKNOWLEDGEMENT
I'm thankful to Dr. Ahmed Hassan Butt, who has vast knowledge of Machine Learning & Bioinformatics, for providing his valuable time and
guidance. I'm pleased with the support of Dr. Ahmed Hassan Butt because he helped me to do quality research. He also provided me with complete guidance on how to write a quality paper using different tools and techniques. This paper is written by Muhammad Hazeel Ahmed from the University of Management and Technology.

VII. REFERENCES
[1] Aisha Abdallah, M. A. Maarof, and A. Zainal, “Fraud detection system: A survey,” J. Netw. Comput. Appl., vol. 68, pp. 90–113, Jun. 2016, doi: 10.1016/j.jnca.2016.04.007.
[2] John O. Awoyemi, A. O. Adetunmbi, and S. A. Oluwadare, “Credit card fraud detection using machine learning techniques: A comparative analysis,” in 2017 International Conference on Computing Networking and Informatics (ICCNI), Oct. 2017, pp. 1–9. doi: 10.1109/ICCNI.2017.8123782.
[3] Shiyang Xuan, G. Liu, Z. Li, L. Zheng, S. Wang, and C. Jiang, “Random forest for credit card fraud detection,” in 2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC), Mar. 2018, pp. 1–6. doi: 10.1109/ICNSC.2018.8361343.
[4] D. Varmedja, M. Karanovic, S. Sladojevic, M. Arsenovic, and A. Anderla, “Credit Card Fraud Detection - Machine Learning methods,” in 2019 18th International Symposium INFOTEH-JAHORINA (INFOTEH), East Sarajevo, Bosnia and Herzegovina, Mar. 2019, pp. 1–5. doi: 10.1109/INFOTEH.2019.8717766.
[5] Siddhant Bagga, A. Goyal, N. Gupta, and A. Goyal, “Credit Card Fraud Detection using Pipeling and Ensemble Learning,” Procedia Comput. Sci., vol. 173, pp. 104–112, Jan. 2020, doi: 10.1016/j.procs.2020.06.014.
[6] A. Saleh Hussein, R. Salah Khairy, S. M. Mohamed Najeeb, and H. Th. S. Alrikabi, “Credit Card Fraud Detection Using Fuzzy Rough Nearest Neighbor and Sequential Minimal Optimization with Logistic Regression,” Int. J. Interact. Mob. Technol. IJIM, vol. 15, no. 05, p. 24, Mar. 2021, doi: 10.3991/ijim.v15i05.17173.
[7] “[PDF] Unsupervised Profiling Methods for Fraud Detection | Semantic Scholar.” https://www.semanticscholar.org/paper/Unsupervised-Profiling-Methods-for-Fraud-Detection-Bolton-Hand/5b640c367ae9cc4bd072006b05a3ed7c2d5f496d (accessed Jun. 11, 2022).
[8] A. Kundu, S. Sural, and A. Majumdar, “Two-Stage Credit Card Fraud Detection Using Sequence Alignment,” 2006. doi: 10.1007/11961635_18.
[9] “Amusan et al. - 2021 - Credit Card Fraud Detection on Skewed Data using M.pdf.”
[10] “Credit Card Fraud Detection.” https://www.kaggle.com/mlg-ulb/creditcardfraud (accessed Jun. 11, 2022).
[11] W. Koehrsen, Feature Selector: Simple Feature Selection in Python. 2022. Accessed: Jun. 11, 2022. [Online]. Available: https://github.com/WillKoehrsen/feature-selector
[12] “ADASYN: Adaptive synthetic sampling approach for imbalanced learning | IEEE Conference Publication | IEEE Xplore.” https://ieeexplore.ieee.org/document/4633969 (accessed Jun. 11, 2022).
[13] “SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary | Request PDF.” https://www.researchgate.net/publication/325789071_SMOTE_for_Learning_from_Imbalanced_Data_Progress_and_Challenges_Marking_the_15-year_Anniversary (accessed Jun. 11, 2022).
[14] “Multilayer Perceptron - an overview | ScienceDirect Topics.” https://www.sciencedirect.com/topics/computer-science/multilayer-perceptron (accessed Jun. 12, 2022).
[15] “Confusion Matrix - an overview | ScienceDirect Topics.” https://www.sciencedirect.com/topics/engineering/confusion-matrix (accessed Jun. 12, 2022).
