Performance Evaluation of Machine Learning
Abstract—Credit card transactions have become commonplace today, and so have the frauds associated with them. One of the most common modus operandi for carrying out fraud is to obtain the card information illegally and use it to make online purchases. For credit card companies and merchants, it is infeasible to detect these fraudulent transactions among thousands of normal transactions. If sufficient data is collected and made available, machine learning algorithms can be applied to solve this problem. In this work, popular supervised and unsupervised machine learning algorithms have been applied to detect credit card frauds in a highly imbalanced dataset. It was found that unsupervised machine learning algorithms can handle the skewness and give the best classification results.

Keywords—credit card, credit card fraud, fraud detection system, machine learning, supervised learning, unsupervised learning.

I. INTRODUCTION

Due to their ease of use and the option of borrowing money, credit cards are widely used as a payment instrument by both online and offline buyers [1]. However, this convenience has come with its own share of troubles. Credit card-based transactions have become a major target for criminals, hackers and other perpetrators. Online use of a credit card requires only that the card information be entered; the card need not be presented physically. In some cases, an extra authentication factor, such as a One-Time Password (OTP), is required. In all other cases, specifically for international transactions, the card information alone can be used for unauthorized purchases. Such usage is called Card-Not-Present, because only the details of the card are required instead of the physical card. With methods such as card theft, shoulder surfing, buying credit card information and web traffic sniffing available, it is very easy to steal card information.

The card holder, the issuing bank and the merchant can all become victims of a credit card fraud, as one of them has to bear the burden of the fraud. Generally, it is the duty of the card holder to detect the fraud and report fraudulent transactions to the issuing bank. The bank then investigates the issue and, if evidence of fraud is found, initiates the process for reversing the credit for the transaction. This process is not real-time and offers no guarantee of successfully resolving the issue [2].

The primary stakeholder is the credit card issuing company, since its reputation suffers greatly as frauds committed on its cards increase. Thus, it is up to the issuer to implement a fraud prevention and detection mechanism. To prevent frauds, companies issue periodic advisories to their customers on the dos and don'ts of safe card usage. In some cases, extra factors of authentication such as OTPs and security questions are employed to deter fraudulent usage. However, fraud cases are inevitable despite these prevention mechanisms [3]. Thus, when a fraud occurs and is reported, the bank must put resources into post-mortem analysis and try to recover and punish the perpetrator. The turnaround time for this detection has been several days, which does not help deter the frauds.

Fraud Detection Systems (FDS) are automated, machine learning based solutions that credit card companies employ to detect fraudulent transactions even before end users' feedback [4]. The goal of such a system is to detect a fraudulent transaction before it is committed to the database and thus prevent the fraud from taking place. An ideal FDS should also minimize false detections, where a genuine transaction is interrupted, causing inconvenience to the end user.

Machine learning based algorithms work with large amounts of example data from the underlying domain to define a computational model that classifies future data seen in the domain. One class of these algorithms, called Supervised Learning Algorithms, requires the example data to be pre-labeled with classes. The other class of algorithms uses Unsupervised Learning, where the data is clustered into groups of similar items and each group is treated as belonging to one class. Many algorithms based on both approaches have been proposed in the literature [5-12]. FDS collect a lot of historical data to apply computations on them. However, transaction data sets are typically imbalanced, with normal transactions far outnumbering fraudulent ones. In this paper, we outline and evaluate various popular machine learning algorithms with respect to their capability to correctly classify fraudulent transactions in a real-world imbalanced dataset.

The rest of the paper is organized in four more sections. Section II describes the credit card fraud detection problem and the challenges in solving it. In Section III, the machine learning algorithms chosen for this study are briefly described. Results obtained on various metrics of comparison have been presented and
978-1-5386-5933-5/19/$31.00 ©2019 IEEE
Authorized licensed use limited to: University of Queensland. Downloaded on September 14,2023 at 23:32:14 UTC from IEEE Xplore. Restrictions apply.
discussed in Section IV. Section V concludes the paper with final remarks.

II. CREDIT CARD FRAUD DETECTION

Credit cards are small, rectangular plastic cards issued to an authenticated user by a financial institution such as a bank. The end user can use a card by presenting it physically at point-of-sale terminals or by entering its details on online e-commerce sites to purchase goods and services. Any unauthorized use of the card in either of these forms is termed Credit Card Fraud.

A. Credit Card Fraud Detection

The credit card fraud detection mechanism depends on the fraud mechanism itself. There are several vectors by which a fraud can be committed [13]. However, these can be classified into two main categories. The first comprises frauds carried out by illegally obtaining possession of the physical card. This can be done not only by stealing the card from its actual owner before or after delivery, but also by other methods in which a cloned card is created to act as a counterfeit of the actual card. The second category comprises frauds carried out by obtaining credit card information illegally. This can be done by taking imprints of cards at hotels, shoulder surfing and phishing, among others [14]. Sometimes, the actual allottee may try to bluff the company by denying a transaction that he or she made. In all cases, if an extra authentication measure such as OTP has been implemented, its security must also be circumvented. A computational Fraud Detection System (FDS) is designed to detect all types of frauds by differentiating the behavior of a fraudster from that of the actual user.

B. Challenges

A desirable FDS is one that can detect all types of credit card frauds. It works on the principle of learning user-specific card usage behavior and fraudsters' spending patterns instead of focusing on the fraud vector [15]. If long-term card usage data of many users, together with the fraudulent transactions occurring within that period, are available, FDS creation becomes a binary classification problem. The two classes of interest here are Normal and Fraud transactions. Existing approaches of supervised and unsupervised machine learning can thus be applied to these datasets. However, a few challenges stand in the way of good classification results from these algorithms. Some of these challenges are class skewness, changes in fraudster behavior to avoid getting caught, seasonal changes in users' behavior, domain metrics, lack of truth labels, and real-time classification requirements [16].

III. MACHINE LEARNING ALGORITHMS FOR FRAUD DETECTION

Some popular machine learning algorithms in the supervised and unsupervised categories were chosen to be evaluated for the underlying problem.

A. Supervised Learning

A spectrum of supervised learning algorithms, from classical to recent ones, has been considered. These include tree-based algorithms, Bayesian approaches, neural networks (both classical and deep learning) and hybrid algorithms.

• Random Forest - The Random Forest (RF) is an ensemble classifier that combines various tree predictors. The advantage of using RF is that it is robust to noise and outliers [17].
• Neural Networks (NN) and Artificial Neural Networks (ANN) - NNs are a commonly used method for fraud detection. An NN recognizes related patterns and predicts values based on an associative memory of the patterns it has learnt [18]. An ANN consists of interconnected artificial neurons. The ANNs considered here are feed-forward neural networks that use the backpropagation algorithm for training [16].
• Deep Learning (DL) - DL is based on a multi-layer perceptron network trained using stochastic gradient descent. In [19], the authors proposed a deep learning approach based on the auto-encoder (AE), an unsupervised learning algorithm that applies backpropagation by setting the target outputs equal to the inputs.
• Support Vector Machine (SVM) - SVMs are used for supervised learning where the data is linearly classified and analyzed [16] [18].
• Naïve Bayes - Naïve Bayes (NB) classifiers are based on Bayesian theory and select the decision based on conditional probability [7].
• Logistic Regression - Logistic Regression (LR) finds the best-fit parameters to estimate the probability of the binary response based on one or more features [7].
• Extended Gradient Boosted Tree (XGBT) - It is based on iterated decision tree classification. In each iteration, the incorrect classifications of previous iterations act as feedback to improve the results of newer ones.
• Quadratic Discriminant Analysis (QDA) - It classifies instances by learning a non-linear, quadratic decision boundary that differentiates instances of different classes.
• K-Nearest Neighbours - K-Nearest Neighbour (KNN) classifiers are instance-based supervised learning methods that classify based on a similarity measure, such as the Euclidean, Manhattan or Minkowski distance functions [7].
• Hybrid Supervised Approaches - In these types of systems, accurate fraud detection has been obtained by applying multiple approaches in different phases [16]. XGBT is an ensemble approach.

B. Unsupervised Learning

Unsupervised techniques cluster similar data rows and consider them to belong to the same class. They are thus very useful in detecting outliers, that is, rows that do not belong to any of the clusters. Thus, unsupervised methods are particularly useful here because fraud cases, being far fewer than normal cases,
9th International Conference on Cloud Computing, Data Science & Engineering (Confluence) 321
can be considered as outliers. The unsupervised detection methods used in this study are Self-Organizing Maps, K-means, Isolation Forest and Local Outlier Factor.

• Self-Organizing Maps (SOM) - It is a neural network that employs unsupervised learning to configure its neurons according to the topological structure of the input data. This process, called self-organization, is an iterative tuning of the weights of the neurons to approximate the input data. SOM provides a clustering method that is appropriate for constructing and analyzing customer profiles in credit card fraud detection, as suggested in [16].
• K-means - The K-means clustering algorithm is a popular unsupervised clustering technique. It is a simple and efficient method to cluster the data [6].
• Isolation Forest (IF) - It is an ensemble regressor that uses the concept of isolation to explain/separate away anomalies. It requires no profiling of normal instances and no point-based distance calculation. Instead, IF builds an ensemble of random trees for a given data set, and anomalies are the points with the shortest average path length.
• Local Outlier Factor (LOF) - It is based on the concept of local density, where locality is given by the nearest neighbors, whose distance is used to estimate the density. Points whose local density is substantially lower than that of their neighbors are considered outliers.

In the next section, the chosen algorithms are applied to a real credit card data set and their performance is evaluated.

IV. PERFORMANCE EVALUATION

A. Dataset

We used a publicly available, processed real dataset for evaluation. The dataset was collected and analyzed during a research collaboration of Worldline and the Machine Learning Group of ULB (Université Libre de Bruxelles) on big data mining and fraud detection by Andrea Dal Pozzolo and his peers [20]. The dataset has a total of 284,807 transactions made in September 2013 by European cardholders. It contains 492 fraud transactions, which makes it highly imbalanced. Due to confidentiality issues, a total of 28 features obtained after principal component analysis of the actual attributes are provided. Only the time and the amount data are not transformed and are provided as such. The feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The attribute 'Amount' is the transaction amount. Finally, the attribute 'Class' is the transaction label; it takes the value 1 in case of fraud and 0 otherwise.

B. Metrics

The chosen algorithms treat the underlying fraud detection issue as a classification problem. We have considered the confusion matrix given in Table I for evaluating metrics. However, the classical metrics of accuracy and confusion matrix will not be able to capture the actual fraud identification rate due to the skewness in the instances of each class. Thus, metrics that balance the detection of both classes have been considered.

TABLE I
CONFUSION MATRIX FOR EVALUATING CLASSIFICATION

                     Actual Normal           Actual Fraud
Predicted Normal     True Negatives (TN)     False Negatives (FN)
Predicted Fraud      False Positives (FP)    True Positives (TP)

• Positive Predictive Value or Precision: The positive predictive value (PPV) is defined as:

$$\mathrm{PPV} = \frac{TP}{TP + FP} \tag{1}$$

PPV is a measure of correct positive results among all positive predictions.

• Negative Predictive Value: The negative predictive value (NPV) is defined as:

$$\mathrm{NPV} = \frac{TN}{TN + FN} \tag{2}$$

NPV is the proportion of correctly identified negative values (0 here) among all negative predictions.

• Specificity: measures the proportion of actual negatives that are correctly identified.

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{3}$$

• Sensitivity: measures the proportion of actual positives that are correctly identified.

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{4}$$

• Balanced Accuracy: is the average detection rate obtained on either class.

$$\mathrm{BalancedAccuracy} = \frac{\mathrm{Sensitivity} + \mathrm{Specificity}}{2} \tag{5}$$

• Prevalence: is the term used for how often the "yes" (positive) condition actually occurs.

$$\mathrm{Prevalence} = \frac{TP + FN}{TP + FP + FN + TN} \tag{6}$$

• Diagnostic Odds Ratio (DOR): is a term taken from the medical domain. It checks the overall efficacy of a classification test.

$$\mathrm{DOR} = \frac{\mathrm{PPV} \cdot \mathrm{NPV}}{(1 - \mathrm{PPV}) \cdot (1 - \mathrm{NPV})} \tag{7}$$

C. Results

Tables II-IV present the results of the performance evaluation of the selected supervised, hybrid and unsupervised machine learning algorithms over the chosen metrics. For each run, a randomly selected subset was used as the test set.

There are a few NaNs in the tables where the classifier could not detect even a single true positive or true negative for the result set. Figure 1 shows the results for balanced accuracy. The KNN, NB and K-means based classifiers are not shown, as their balanced accuracy was very low at 0.50 due to almost nil true positives.
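The metric definitions of Section IV-B can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in the study: the function names are hypothetical, fraud is taken as the positive class as in Table I, sensitivity is used in its standard form TP/(TP + FN), and returning NaN for a zero denominator is an assumption made here to mirror the NaN entries mentioned in the text. The baseline below applies a hypothetical classifier that labels every transaction as normal to the class counts of the full dataset, whereas the study evaluated on randomly selected test subsets.

```python
import math


def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero (assumed convention)."""
    return num / den if den else math.nan


def dor_from_predictive_values(ppv, npv):
    """Diagnostic odds ratio computed from PPV and NPV, following Eq. (7)."""
    return safe_div(ppv * npv, (1 - ppv) * (1 - npv))


def confusion_metrics(tp, fp, fn, tn):
    """Compute the Section IV-B metrics from confusion-matrix counts (fraud = positive)."""
    total = tp + fp + fn + tn
    ppv = safe_div(tp, tp + fp)
    npv = safe_div(tn, tn + fn)
    specificity = safe_div(tn, tn + fp)
    sensitivity = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, total),
        "ppv": ppv,
        "npv": npv,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "prevalence": safe_div(tp + fn, total),
        "dor": dor_from_predictive_values(ppv, npv),
    }


# Hypothetical baseline: every transaction is predicted "normal".
N_TOTAL, N_FRAUD = 284_807, 492  # class counts quoted in the Dataset subsection
baseline = confusion_metrics(tp=0, fp=0, fn=N_FRAUD, tn=N_TOTAL - N_FRAUD)

for name, value in baseline.items():
    print(f"{name:>17}: {value:.4f}")

# Cross-check of Eq. (7) with a PPV/NPV pair taken from the results tables.
print("DOR for PPV=0.92, NPV=0.84:", dor_from_predictive_values(0.92, 0.84))
```

For this baseline, accuracy stays above 0.998 while balanced accuracy is exactly 0.5 and PPV is NaN, which is the pattern the text describes for classifiers with almost no true positives and the reason plain accuracy is not used for comparison.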
TABLE II
PERFORMANCE EVALUATION OF SUPERVISED MACHINE LEARNING ALGORITHMS

Classifier              Metrics Values
Techniques Used         Positive      Negative     Prevalence   True Negative   True Positive   Diagnostic
                        Predictive    Predictive                Rate            Rate            Odds Ratio
                        Value         Value                     (Specificity)   (Sensitivity/   (DOR)
                        (Precision)                                             Recall)
SOM Hybrid              0.92          0.84         0.99         0.83            0.92            60.375
Isolation Forest        0.99          0.99         0.99         1.0             1.0             9801
Local Outlier Factor    0.99          0.99         0.998        1.0             1.0             9801
K-Means                 0.99          NaN          0.998        1.0             0.0             0

Specificity is best for KNN and least for NN. It is, however, very high at 0.99 and 1 for most of the hybrid and unsupervised learning models.

4) What is the Normal Transaction Detection rate?

Sensitivity results show that NN, the hybrid models, and the unsupervised IF and LOF models give perfect detection rates for normal transactions. K-means could not correctly identify any of the cases.

5) Testing overall effectiveness of classification
two measures to detect classification effectiveness in imbalanced datasets. Out of all the chosen methods, NN, ANN, XGBT, LR, the ensemble model, IF and LOF gave near-perfect results. DOR is another balanced metric, where higher values are interpreted as better results. XGBT gave the best DOR, while all hybrid models gave similar results. The unsupervised learners IF and LOF scored best among the unsupervised methods as well as overall.

Therefore, unsupervised methods, specifically IF and LOF, are the overall winners for dealing with highly imbalanced datasets.

V. CONCLUSION

In this paper, the suitability of machine learning algorithms for detecting credit card frauds has been evaluated. Credit card fraud detection is a peculiar classification problem due to the very high imbalance between instances of normal and fraudulent transactions. A number of popular algorithms in the supervised, ensemble and unsupervised categories were evaluated on different metrics. It is concluded that unsupervised algorithms handle the dataset skewness better and hence perform well over all metrics, both in absolute terms and relative to the other techniques.

There were a few NaN values in the result tables where the classifier could not detect even a single true positive or true negative value. Future work should be directed towards resampling techniques that help reduce the imbalance ratio of the datasets and, furthermore, remove the NaN values for classifiers, in order to mitigate the skewness of imbalanced datasets for better classification results.

REFERENCES

[1] The importance of credit cards: https://budgeting.thenest.com/importance-credit-cards-29514.html
[2] The chargeback process in a credit card: https://chargebacks911.com/chargeback-process/
[3] Low and Slow Is How the Credit Card Fraudsters Roll: https://www.threatmetrix.com/digital-identity-blog/fraud-prevention/low-and-slow-is-how-the-credit-card-fraudsters-roll/
[4] A. Dal Pozzolo, G. Boracchi, O. Caelen, C. Alippi and G. Bontempi, "Credit Card Fraud Detection: A Realistic Modeling and a Novel Learning Strategy," in IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 8, pp. 3784-3797, Aug. 2018.
[5] L. Zheng, G. Liu, C. Yan and C. Jiang, "Transaction Fraud Detection Based on Total Order Relation and Behavior Diversity," in IEEE Transactions on Computational Social Systems, vol. 5, no. 3, pp. 796-806, Sept. 2018.
[6] Vaishali, "Fraud Detection in Credit Card by Clustering Approach," International Journal of Computer Applications, 98(3):29-32, July 2014.
[7] J. O. Awoyemi, A. O. Adetunmbi and S. A. Oluwadare, "Credit card fraud detection using machine learning techniques: A comparative analysis," 2017 International Conference on Computing Networking and Informatics (ICCNI), Lagos, 2017, pp. 1-9.
[8] L. Zheng et al., "A new credit card fraud detecting method based on behavior certificate," 2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC), Zhuhai, 2018, pp. 1-6.
[9] Suraj Patil, Varsha Nemade, Piyush Kumar Soni, "Predictive Modelling for Credit Card Fraud Detection Using Data Analytics," International Conference on Computational Intelligence and Data Science (ICCIDS 2018).
[10] A. Roy, J. Sun, R. Mahoney, L. Alonzi, S. Adams and P. Beling, "Deep learning detecting fraud in credit card transactions," 2018 Systems and Information Engineering Design Symposium (SIEDS), Charlottesville, VA, 2018, pp. 129-134.
[11] Alex G. C. de Sá, Adriano C. M. Pereira, Gisele L. Pappa, "A customized classification algorithm for credit card fraud detection," Engineering Applications of Artificial Intelligence, Volume 72, June 2018, Pages 21-29.
[12] S. Dhankhad, E. Mohammed and B. Far, "Supervised Machine Learning Algorithms for Credit Card Fraudulent Transaction Detection: A Comparative Study," 2018 IEEE International Conference on Information Reuse and Integration (IRI), Salt Lake City, UT, 2018, pp. 122-125.
[13] Zareapoor, Masoumeh; K. R., Seeja; Alam, Afshar. (2012). Analysis on Credit Card Fraud Detection Techniques: Based on Certain Design Criteria. International Journal of Computer Applications. 52. 35-42. 10.5120/8184-1538.
[14] Rajeshwari U and B. S. Babu, "Real-time credit card fraud detection using Streaming Analytics," 2016 2nd International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT), Bangalore, 2016, pp. 439-444.
[15] Credit Card Fraud Detection: What Payment Gateways Can Do for You: https://www.chargebee.com/blog/credit-card-fraud-detection-tools/
[16] Sethi, Neha and Anju Gera. A Revived Survey of Various Credit Card Fraud Detection Techniques. (2014).
[17] S. Xuan, G. Liu, Z. Li, L. Zheng, S. Wang and C. Jiang, "Random forest for credit card fraud detection," 2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC), Zhuhai, 2018, pp. 1-6.
[18] Jain, Rajni; Gour, Bhupesh; Dubey, Surendra. (2016). A Hybrid Approach for Credit Card Fraud Detection using Rough Set and Decision Tree Technique. International Journal of Computer Applications. 139. 1-6. 10.5120/ijca2016909325.
[19] Pumsirirat, Apapan; Yan, Liu. (2018). Credit Card Fraud Detection using Deep Learning based on Auto-Encoder and Restricted Boltzmann Machine. International Journal of Advanced Computer Science and Applications. 9. 10.14569/IJACSA.2018.090103.
[20] D. J. Newman, A. Asuncion. UCI machine learning repository, 2007. Transformed datasets are available at http://www.ulb.ac.be/di/map/adalpozz/imbalanced-datasets.zip.