Khatri 2020


Supervised Machine Learning Algorithms for Credit Card Fraud Detection: A Comparison


Samidha Khatri, Department of Computer Science & Engineering, Amity University Uttar Pradesh
Aishwarya Arora, Department of Computer Science & Engineering, Amity University Uttar Pradesh
Arun Prakash Agrawal, Department of Computer Science & Engineering, Sharda University Greater Noida
[email protected] [email protected] [email protected]

Abstract: In today's economic scenario, credit card use has become extremely commonplace. These cards allow the user to make payments of large sums of money without the need to carry large sums of cash. They have revolutionized cashless payments and made payments of any sort convenient for the buyer. This electronic form of payment is extremely useful but comes with its own set of risks. With the increasing number of users, credit card frauds are also increasing at a similar pace. The credit card information of a particular individual can be collected illegally and used for fraudulent transactions. Some machine learning algorithms can be applied to collected data to tackle this problem. This paper presents a comparison of some established supervised learning algorithms to differentiate between genuine and fraudulent transactions.

Keywords: Credit Card, Credit Card Fraud, Machine Learning, Supervised Learning.

I. INTRODUCTION

Fraud can be described as an intentional deceit perpetrated for some kind of gain, mostly monetary. It is an unfair practice whose occurrences are increasing by the day. There has been a sharp increase in the usage of electronic payment methods like credit and debit cards, and this has in turn led to a rise in credit card frauds. These cards may be used in both online and offline modes to make payments [7]. In the online mode of payment, the card may not have to be physically presented. In such cases the card data is prone to attack by hackers or cyber criminals. These kinds of frauds result in millions being lost every year. To overcome this obstacle, many algorithms have been and are being developed. Various detection approaches are being worked upon to solve this issue most efficiently [8].

Credit card transactions are extremely commonplace now, but they also come with their own set of problems, and fraud detection faces several difficulties. The acceptance or rejection of a transaction happens within a very small time frame, which may range from microseconds to milliseconds. Therefore, the process adopted for detecting a fraudulent transaction has to be extremely quick and effective. Another problem is that a vast number of similar transactions happen at the same time. This makes it difficult to monitor each transaction individually and hence to identify a fraud. Thus, an efficient Fraud Detection System must be put to work to differentiate between a genuine and a fraudulent transaction. Such a system works on the principle of learning user-specific card usage behavior. Thus, existing supervised as well as unsupervised machine learning techniques can be applied to the data.

The objective of this paper is to evaluate an imbalanced dataset with the help of various supervised machine learning models and to determine which of them is best suited for detecting credit card frauds. We make use of 5 supervised machine learning models to evaluate a dataset on the basis of various predefined criteria.

II. RELATED WORK

Specific algorithms based on artificial intelligence and neural networks are also being proposed and implemented to predict credit card frauds with increased accuracy. The distribution of the datasets used for fraud detection is highly imbalanced. To overcome this obstacle, under-sampling and oversampling techniques are being designed to obtain comparatively balanced data. Data mining techniques are also being implemented in order to create a more efficient Fraud Detection System [9]. Another important area of development is the emergence of new hybrid models. These are derived from preexisting supervised as well as unsupervised machine learning techniques. Hybrid models may be able to produce a more accurate result as they capture the capabilities of both supervised and unsupervised machine learning [15].

It is observed that the performance of all machine learning models is hindered by the skewness of the available datasets, which are usually unbalanced. To overcome this problem, the unbalanced datasets are to be converted to balanced ones. This can be done in mainly two ways: the Intrinsic Features Method and the Network-based Features Method. In the Intrinsic Features Method, a pattern in the customer activity is observed, whereas in the Network-based Features Method, the network of users and card merchants is exploited. These techniques may significantly improve the functioning of certain models as they work on a more balanced dataset [5].

III. MACHINE LEARNING

Machine Learning is an application of Artificial Intelligence techniques that makes systems learn by themselves. This means that the system automatically learns, improves and adapts through experience without being explicitly programmed for a particular operation. This field

978-1-7281-2791-0/20/$31.00 ©2020 IEEE


deals with the development of programs that can handle data on their own, that is, programs which can access and modify the provided data according to the need of the user. Machine Learning can be classified into 3 main categories: Supervised Learning, Unsupervised Learning and Reinforcement Learning.

A. Machine Learning in Credit Card Fraud Detection

Machine Learning provides the system with the "ability to learn". The machine is able to use previously procured data and analyze it further without being explicitly commanded to. This feature is beneficial in the detection of credit card frauds. It enables machine learning algorithms to be successfully implemented in the banking domain to identify potentially risky transactions [13]. More than a million transactions occur daily, and all of them need to be checked for authenticity. To carry out this task, the system can be trained to separate the fraudulent transactions from the non-fraudulent ones. This is mostly done by feeding it past transaction data, especially data from non-authentic transactions, so that newly arriving transactions can be labelled as normal or suspicious. Subsequently, the suspicious ones are set apart for further investigation.

B. Supervised Learning

This type of learning is also referred to as predictive learning, as it predicts the class of unknown objects based on prior class-related information about similar objects. The main idea behind this type of learning is to learn from information about the task that has been provided in the past. A machine requires basic data about the task to be provided to it. This basic input, or experience, is given to it in the form of 'training data', which is the past information or data of a particular task. In this paper, we use the supervised approach to detect fraud and analyze various algorithms based on supervised machine learning [5]. In this kind of supervised approach, a database of past cases of fraudulent and genuine transactions is stored. This database acts as a reference point for the various algorithms. The process starts with the analysis of a provided dataset; the selected algorithm then produces an inferred function in order to make predictions about the possible output values. Irrespective of the model chosen, supervised learning works only as well as the data used to train it: the prediction will be as accurate as the quality of the data provided to the machine allows. In this paper, we use various classification models of supervised machine learning to predict wrongful transactions with the help of an imbalanced dataset. After they are run, these models are compared on the basis of the outputs that they provide.

IV. DATASET USED

A dataset is a collection of related data. In this paper, we make use of a publicly available imbalanced dataset. An imbalanced dataset is one in which disparity occurs in the dependent variable, that is, there is an unequal distribution of classes. This particular dataset contains the record of transactions made by European cardholders. It has the records of 284,807 transactions made over a span of two days, out of which 492 were found to be fraudulent. The percentage of fraudulent transactions is therefore extremely low (about 0.17%). This dataset was made and further analyzed during a joint effort of Worldline and the Machine Learning Group of ULB (Universite Libre de Bruxelles) [14]. There are 28 features obtained through principal component analysis of the actual attributes. The Time and Amount features are not transformed and are provided as they are. Accuracy and some other metrics cannot be used as they are not sensitive to imbalanced data [1].

V. CRITERIA FOR COMPARISON

In order to evaluate the performance of a particular model, we make use of various parameters. The trained models are applied to the dataset, and the outputs obtained with each model are compared systematically to those produced by the other models. Based on these comparisons, a conclusion is formed as to which is the best suited model for a particular dataset or a particular type of problem. In this paper we make use of the parameters Sensitivity, Precision and Time to compare the various models being used [1][2].

A. Sensitivity

It is a measure of the proportion of actual positive cases that are predicted as positive (true positives). This implies that the remaining proportion of actual positive cases is incorrectly predicted as negative (false negatives). Sensitivity is also sometimes referred to as Recall. Mathematically, sensitivity can be calculated as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{1}$$

where TP = True Positive and FN = False Negative.

B. Precision

It gives a measure of the proportion of positive predictions that are truly positive. It indicates the reliability of a model in predicting a class of interest. Precision is the ratio of correctly positively labelled cases to all positively labelled cases. Mathematically, precision can be calculated as follows:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

where TP = True Positive and FP = False Positive.

C. Time

Time is used as a parameter for performance evaluation of the various models that are used. We calculate the time for training the model and for predicting the test data. The time calculated is not the actual time, but the approximate time taken by a particular model. This parameter is used to compare the various models based upon the time taken by them in handling the data.
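As a minimal sketch of the criteria in Section V, the following Python snippet computes sensitivity and precision from confusion counts, following equations (1) and (2). It also illustrates why accuracy is uninformative for the class counts reported in Section IV. The function names, the all-genuine baseline and the example confusion counts are illustrative assumptions and are not taken from the paper's experiments.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Equation (1): TP / (TP + FN). Returns 0.0 if there are no actual positives."""
    total = tp + fn
    return tp / total if total else 0.0


def precision(tp: int, fp: int) -> float:
    """Equation (2): TP / (TP + FP). Returns 0.0 if nothing was predicted positive."""
    total = tp + fp
    return tp / total if total else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all transactions that were labelled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Class counts reported in Section IV.
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492
GENUINE_TRANSACTIONS = TOTAL_TRANSACTIONS - FRAUD_TRANSACTIONS

fraud_share = FRAUD_TRANSACTIONS / TOTAL_TRANSACTIONS

# Hypothetical baseline (assumption): a classifier that labels every
# transaction as genuine, so it never produces a positive prediction.
baseline_accuracy = accuracy(tp=0, tn=GENUINE_TRANSACTIONS, fp=0, fn=FRAUD_TRANSACTIONS)
baseline_sensitivity = sensitivity(tp=0, fn=FRAUD_TRANSACTIONS)

# Made-up confusion counts, used only to exercise the two formulas.
example_sensitivity = sensitivity(tp=80, fn=20)
example_precision = precision(tp=80, fp=10)

if __name__ == "__main__":
    print(f"fraud share:          {fraud_share:.4%}")
    print(f"baseline accuracy:    {baseline_accuracy:.4%}")
    print(f"baseline sensitivity: {baseline_sensitivity:.4%}")
    print(f"example sensitivity:  {example_sensitivity:.4f}")
    print(f"example precision:    {example_precision:.4f}")
```

Under these assumptions the baseline labels more than 99.8% of transactions correctly while detecting no fraud at all, which is the sense in which accuracy is not sensitive to imbalanced data.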

10th International Conference on Cloud Computing, Data Science & Engineering (Confluence)
VI. MODELS USED

A. Decision Tree

This is one of the most widely used predictive modelling approaches. As the name suggests, the model is built in the form of a tree-like structure [16]. This model may be used in case of a multi-dimensional analysis where there are multiple classes present. The past data, also known as the past vector, is used to create a model that can predict the value of the output based on the input provided. There are multiple nodes in a tree and each node corresponds to one or another vector. The tree terminates at leaf nodes, each of which represents a possible outcome or output.

B. kNN

The k-Nearest Neighbor model is one of the simplest but most effective models. In this model, the class label of a test data element is assigned on the basis of the class labels of the neighboring training data elements. The similarity between two elements is measured using Euclidean distance [4][16]. It is also known as an instance-based learning or lazy learning model. The value of 'k' is the number of nearest neighbors that have to be considered.

A suitable value for 'k' should be chosen. An appropriate distance metric is also a requirement. Sometimes, the 'Minkowski' distance may be used. It is a generalization of the Euclidean and Manhattan distances, which correspond to p = 2 and p = 1 respectively. Mathematically, it can be represented as:

$$d\left(x^{(i)}, x^{(j)}\right) = \sqrt[p]{\sum_{k} \left| x_k^{(i)} - x_k^{(j)} \right|^{p}} \tag{3}$$

C. Logistic Regression

It is a statistical model which makes use of a logistic function to model a binary dependent variable. This model is mainly used for binary classification problems. It works well on linearly separable classes [4]. The odds ratio is one concept with which we can define the logit function. It is the ratio of the probability of an event occurring to the probability of it not occurring:

$$\text{Odds Ratio} = \frac{p}{1 - p} \tag{4}$$

where p = probability of the positive event.

The logit function is the logarithm of the odds ratio. It takes inputs in the range (0, 1) and transforms them to values over the entire real-number range. The logit function can be defined as follows:

$$\text{logit}(p) = \log \frac{p}{1 - p} \tag{5}$$

The sigmoid function, which is the inverse of the logit function, is also used in this model:

$$\phi(z) = \frac{1}{1 + e^{-z}} \tag{6}$$

D. Random Forest

This model is an ensemble classifier, i.e. a combining classifier that uses and combines many decision tree classifiers. The idea behind using multiple trees is to train each tree well enough that each contributes a model of its own. After the trees are generated, their outputs are combined through majority voting. Each tree depends on a dataset that is sampled independently and with the same distribution for all trees in the forest [6]. This model has the quality of efficiently balancing errors in the class population of unbalanced datasets. It can be used to solve both classification and regression problems.

E. Naive Bayes

It is a form of probabilistic classifier model, which implies that it has the ability to make predictions for multiple classes at once. It is based on Bayes' Theorem, and the decision is made based on conditional probability. This model uses a set of algorithms instead of a single algorithm, but all of these share a common principle. In this model, it is assumed that each feature makes an equal and independent contribution to the output. This model has a certain advantage over other models as it requires only a small amount of training data [4].

VII. OBSERVATIONS

TABLE I
PERFORMANCE EVALUATION OF VARIOUS MODELS AT THRESHOLD VALUE OF 0.5

MODEL                  SENSITIVITY (%)   PRECISION (%)
DECISION TREE          79.21             85.11
KNN                    81.19             91.11
LOGISTIC REGRESSION    63.34             87.67
RANDOM FOREST          75.25             93.83
NAIVE BAYES            85.15             6.56

We use the imbalanced dataset to analyze the 5 supervised learning models and find the values of sensitivity and precision for each of these models. The default threshold value is taken as 0.5, following common practice.

TABLE II
PERFORMANCE EVALUATION OF VARIOUS MODELS AT THRESHOLD VALUE OF 0.4

MODEL                  SENSITIVITY (%)   PRECISION (%)
DECISION TREE          79.21             85.11
KNN                    81.19             91.11
LOGISTIC REGRESSION    69.31             87.5
RANDOM FOREST          78.22             89.77
NAIVE BAYES            85.15             6.52

The threshold value is changed from 0.5 to 0.4 for calculating sensitivity and precision for each model. The analysis was performed at different values, but the best output was obtained when the threshold value was taken as 0.4. As Tables I and II show, lowering the threshold increased the sensitivity of Logistic Regression and Random Forest, while the precision of Logistic Regression, Random Forest and Naive Bayes decreased slightly. The values for Decision Tree and kNN did not change.

TABLE III
TIME TAKEN FOR TRAINING AND PREDICTING DATA USING VARIOUS MODELS

MODEL                  TRAINING TIME (SECONDS)   PREDICTION TIME ON TEST DATA (SECONDS)
DECISION TREE          5                         0
KNN                    1                         462
LOGISTIC REGRESSION    3                         0
RANDOM FOREST          23                        0
NAIVE BAYES            0                         0

The time taken by all 5 models for training the model and predicting the test data was recorded. These values are not the actual values, but the approximate values of the time taken by them.

VIII. CONCLUSION AND FUTURE WORK

In this study, we used an imbalanced dataset to check the suitability of different supervised machine learning models for predicting the occurrence of a fraudulent transaction. We used sensitivity, precision and time as the deciding parameters to come to a conclusion. Accuracy was not used as a parameter as it is not sensitive to imbalanced data and does not give a conclusive answer. We analyzed the kNN, Naive Bayes, Decision Tree, Logistic Regression and Random Forest models in this study. We used these models to predict the occurrence of a fraudulent credit card transaction among a given number of transactions. Credit card frauds are a modern-day issue, and we came to the conclusion that the best suited model for predicting such frauds is the Decision Tree model. The analysis shows that the sensitivity of the kNN model is greater than that of Decision Tree, but as the time taken by kNN for testing the data is very large, we choose Decision Tree over kNN. In fraud detection, we need to ensure that minimum time is taken for prediction; therefore, Decision Tree is the preferred model.

Future researchers in this field may apply resampling techniques to the respective datasets being used. Such techniques help to reduce the imbalance ratio of datasets, which in turn produces better classification results.

After the comparative analysis of the various supervised learning models, we can infer that the Decision Tree model is the best approach for detecting credit card fraud. However, the performance of the Decision Tree model must also be compared with that of unsupervised machine learning models in the future to produce a more conclusive result. This would tell us whether the chosen model is the better option or whether unsupervised machine learning techniques perform better.

REFERENCES

[1] https://www.analyticsvidhya.com/blog/2016/03/practical-guide-deal-imbalanced-classification-problems/. [Accessed: Oct 12, 2019].
[2] https://www.ritchieng.com/machine-learning-evaluate-classification-model/. [Accessed: Oct 12, 2019].
[3] A. Dal Pozzolo, G. Boracchi, O. Caelen, C. Alippi and G. Bontempi, "Credit Card Fraud Detection: A Realistic Modeling and a Novel Learning Strategy," in IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 8, pp. 3784-3797, Aug. 2018.
[4] J. O. Awoyemi, A. O. Adetunmbi and S. A. Oluwadare, "Credit card fraud detection using machine learning techniques: A comparative analysis," 2017 International Conference on Computing Networking and Informatics (ICCNI), Lagos, 2017, pp. 1-9.
[5] S. Dhankhad, E. Mohammed and B. Far, "Supervised Machine Learning Algorithms for Credit Card Fraudulent Transaction Detection: A Comparative Study," 2018 IEEE International Conference on Information Reuse and Integration (IRI), Salt Lake City, UT, 2018, pp. 122-125.
[6] S. Xuan, G. Liu, Z. Li, L. Zheng, S. Wang and C. Jiang, "Random forest for credit card fraud detection," 2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC), Zhuhai, 2018, pp. 1-6.
[7] S. Bhattacharyya, S. Jha, K. Tharakunnel and J. C. Westland, "Data mining for credit card fraud: A comparative study," Decis. Support Syst., vol. 50, no. 3, pp. 602-613, 2011.
[8] K. Chaudhary, J. Yadav and B. Mallick, "A review of Fraud Detection Techniques: Credit Card," Int. J. Comput. Appl., vol. 45, no. 1, pp. 975-8887, 2012.
[9] F. N. Ogwueleka, "Data Mining Application in Credit Card Fraud Detection System," vol. 6, no. 3, pp. 311-322, 2011.
[10] O. S. Yee, S. Sagadevan, N. Hashimah and A. Hassain, "Credit Card Fraud Detection Using Machine Learning As Data Mining Technique," vol. 10, no. 1, pp. 23-27.
[11] C. Phua, D. Alahakoon and V. Lee, "Minority report in fraud detection," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, p. 50, 2004.
[12] N. Sethi and A. Gera, "A Revived Survey of Various Credit Card Fraud Detection Techniques," International Journal of Computer Science and Mobile Computing, vol. 3, no. 4, pp. 780-791, 2014.
[13] J. Awoyemi, A. Adetunmbi and S. Oluwadare, "Credit card fraud detection using machine learning techniques: A comparative analysis," 2017 International Conference on Computing Networking and Informatics (ICCNI), 2017.
[14] http://www.ulb.ac.be/di/map/adalpozz/imbalancedatasets.zip. [Accessed: Oct 10, 2019].
[15] S. Mittal and S. Tyagi, "Performance Evaluation of Machine Learning Algorithms for Credit Card Fraud Detection," 2019 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence), 2019.
[16] S. Dutt, A. K. Das and S. Chandramouli, Machine Learning. Pearson Education India, 2018.
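As a supplementary illustration of the threshold comparison reported in Tables I and II, the sketch below applies two decision thresholds, 0.5 and 0.4, to a handful of predicted fraud probabilities and recomputes sensitivity and precision. It also includes the logit and sigmoid functions described for Logistic Regression. The probabilities and labels are made up for illustration and are not the paper's data, and the helper names are assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Sigmoid function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Logarithm of the odds p / (1 - p), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def evaluate(probs, labels, threshold):
    """Return (sensitivity, precision) when p >= threshold is labelled fraud (1)."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, y_hat in zip(labels, preds) if y == 1 and y_hat == 1)
    fp = sum(1 for y, y_hat in zip(labels, preds) if y == 0 and y_hat == 1)
    fn = sum(1 for y, y_hat in zip(labels, preds) if y == 1 and y_hat == 0)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    return sens, prec


# Made-up predicted fraud probabilities and true labels (1 = fraud).
probs = [0.95, 0.80, 0.45, 0.30, 0.42, 0.10, 0.05, 0.55]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens_05, prec_05 = evaluate(probs, labels, threshold=0.5)
sens_04, prec_04 = evaluate(probs, labels, threshold=0.4)

if __name__ == "__main__":
    print(f"threshold 0.5: sensitivity={sens_05:.2f}, precision={prec_05:.2f}")
    print(f"threshold 0.4: sensitivity={sens_04:.2f}, precision={prec_04:.2f}")
    # The sigmoid undoes the logit, so this recovers the original probability.
    print(f"sigmoid(logit(0.3)) = {sigmoid(logit(0.3)):.2f}")
```

On this toy data, lowering the threshold catches one more fraudulent case but also flags one more genuine transaction, so sensitivity rises while precision falls. This matches the direction of the changes seen for Logistic Regression and Random Forest between the two tables.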
