
This article has been accepted for publication in IEEE Access. This is the author's version, which has not been fully edited, and content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3232287

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.DOI

Fraud Detection in Banking Data by Machine Learning Techniques

SEYEDEH KHADIJEH HASHEMI (1), SEYEDEH LEILI MIRTAHERI (1), AND SERGIO GRECO (2)
(1) Department of Electrical and Computer Engineering, Faculty of Engineering, Kharazmi University, Tehran, Iran (e-mail: [email protected], [email protected])
(2) Department of Informatics, Modeling, Electronics and System Engineering, University of Calabria, Italy (e-mail: [email protected])
Corresponding author: Seyedeh Leili Mirtaheri (e-mail: [email protected]).

ABSTRACT As technology has advanced and e-commerce services have expanded, credit cards have become one of the most popular payment methods, resulting in an increase in the volume of banking transactions. Furthermore, the significant increase in fraud imposes high costs on banking transactions. As a result, detecting fraudulent activities has become an important research topic. In this study, we consider the use of class weight-tuning hyperparameters to control the weight of fraudulent and legitimate transactions. We use Bayesian optimization in particular to optimize the hyperparameters while taking practical issues such as unbalanced data into account. We propose weight-tuning as a pre-process for unbalanced data, as well as CatBoost and XGBoost to improve the performance of the LightGBM method by accounting for the voting mechanism. Finally, in order to improve performance even further, we use deep learning to fine-tune the hyperparameters, particularly our proposed weight-tuning one. We perform experiments on real-world data to test the proposed methods. To better cover unbalanced datasets, we use recall-precision metrics in addition to the standard ROC-AUC. CatBoost, LightGBM, and XGBoost are evaluated separately using 5-fold cross-validation. Furthermore, the majority voting ensemble learning method is used to assess the performance of the combined algorithms. According to the results, LightGBM and XGBoost achieve the best results with ROC-AUC = 0.95, precision = 0.79, recall = 0.80, F1-score = 0.79, and MCC = 0.79. By using deep learning and the Bayesian optimization method to tune the hyperparameters, we also achieve ROC-AUC = 0.94, precision = 0.80, recall = 0.82, F1-score = 0.81, and MCC = 0.81. This is a significant improvement over the state-of-the-art methods we compared with.

INDEX TERMS Bayesian Optimization, Data Mining, Deep Learning, Ensemble Learning, Hyperparameter, Unbalanced Data, Machine Learning

I. INTRODUCTION
In recent years, there has been a significant increase in the volume of financial transactions due to the expansion of financial institutions and the popularity of web-based e-commerce. Fraudulent transactions have become a growing problem in online banking, and fraud detection has always been challenging [1], [2].
Along with credit card development, the pattern of credit card fraud is constantly being updated. Fraudsters do their best to make fraudulent transactions look legitimate. They try to learn how fraud detection systems work and continue to simulate these systems, making fraud detection more complicated. Therefore, researchers are constantly trying to find new ways or improve the performance of the existing methods [3].
People who commit fraud usually exploit security, control, and monitoring weaknesses in commercial applications to achieve their goals. However, technology can also be a tool to combat fraud [4]. To prevent further possible fraud, it is important to detect fraud right away after its occurrence [5].
Fraud can be defined as wrongful or criminal deception intended to result in financial or personal gain. Credit card fraud is the illegal use of credit card information for purchases in a physical or digital manner. In digital transactions, fraud can happen over the phone or the web, since cardholders usually provide the card number, expiration date, and card verification number by telephone or website [6].
There are two mechanisms, fraud prevention and fraud detection, that can be exploited to avoid fraud-related losses.


Fraud prevention is a proactive method that stops fraud from happening in the first place. On the other hand, fraud detection is needed when a fraudster attempts a fraudulent transaction [7].
Fraud detection in banking is considered a binary classification problem in which data is classified as legitimate or fraudulent [8]. Because banking data is large in volume, with datasets containing a large number of transactions, manually reviewing and finding patterns for fraudulent transactions is either impossible or takes a long time. Therefore, machine learning-based algorithms play a pivotal role in fraud detection and prediction [9]. Machine learning algorithms and high processing power increase the capability of handling large datasets and detecting fraud in a more efficient manner. Machine learning and deep learning also provide fast and efficient solutions to real-time problems [10].
In this paper, we propose an efficient approach for detecting credit card fraud that is evaluated on publicly available datasets and uses the optimized LightGBM, XGBoost, CatBoost, and logistic regression algorithms individually, as well as majority-voting combinations of them, together with deep learning and hyperparameter tuning. An ideal fraud detection system should detect as many fraudulent cases as possible, and the precision of detecting fraudulent cases should be high, i.e., the reported cases should actually be fraudulent; this leads to the trust of customers in the bank, and at the same time the bank does not suffer losses due to incorrect detection.
The main contributions of this paper are summarized as follows:
• We adopt Bayesian optimization for fraud detection and propose to use the weight-tuning hyperparameter to solve the unbalanced data issue as a pre-processing step. We also suggest using CatBoost and XGBoost alongside LightGBM to improve performance. We use the XGBoost algorithm due to its high training speed on big data as well as its regularization term, which overcomes overfitting by measuring the complexity of the tree, and because it does not require much time to set the hyperparameters. We also use the CatBoost algorithm because there is no need to adjust hyperparameters for overfitting control, and it obtains good results without changing hyperparameters compared to other machine learning algorithms.
• We propose a majority-voting ensemble learning approach to combine CatBoost, XGBoost, and LightGBM and review the effect of the combined methods on the performance of fraud detection on real, unbalanced data. We also propose to use deep learning for adjusting and fine-tuning the hyperparameters.
• To evaluate the performance of the proposed methods, we perform extensive experiments on real-world data. To better cover the unbalanced datasets, we use recall-precision in addition to the typically used ROC-AUC. We also evaluate the performance using the F1-score and MCC metrics. According to the results, the proposed methods outperform the existing baseline methods. For evaluations, we use publicly available datasets and also publish the source code with public access (https://github.com/khadijehHashemi/Fraud-Detection-in-Banking-Data-by-Machine-Learning-Techniques) so that it can be used by other researchers.
The remainder of this paper is organized as follows: In Section II we review the related state-of-the-art. The proposed approach for credit card fraud detection, including the dataset, pre-processing, feature extraction and feature selection, algorithms, framework, and evaluation metrics, is presented in Section III. Section IV discusses the evaluation results of the experiments performed, and finally Section V concludes the paper.

II. RELATED WORKS
In order to prevent fraudulent transactions and detect credit card fraud, several methods have been proposed by researchers. A review of state-of-the-art related works is presented in the following.
Halvaiee and Akbari study a new model called the AIS-based fraud detection model (AFDM). They use the Immune System Inspired Algorithm (AIRS) to improve fraud detection accuracy. The presented results of their paper show that their proposed AFDM improves accuracy by up to 25%, reduces costs by up to 85%, and reduces system response time by up to 40% compared to basic algorithms [11].
Bahnsen et al. developed a transaction aggregation strategy and created a new set of features based on the periodic behaviour analysis of the transaction time by using the von Mises distribution. In addition, they propose a new cost-based criterion for evaluating credit card fraud detection models and then, using a real credit card dataset, examine how different feature sets affect the results. More precisely, they extend the transaction aggregation strategy to create new features based on an analysis of the periodic behaviour of transactions [12].
Randhawa et al. study the application of machine learning algorithms to detect fraud in credit cards. They first use Naive Bayes, random forest and decision trees, neural networks, linear regression (LR), and logistic regression, as well as standard support vector machine models, to evaluate the available datasets. Further, they propose a hybrid method by applying AdaBoost and majority voting. In addition, they add noise to the data samples for robustness evaluation. They perform experiments on publicly available datasets and show that majority voting is effective in detecting credit card fraud cases [6].
Porwal and Mukund propose an approach that uses clustering methods to detect outliers in a large dataset and is resistant to changing patterns [13]. The idea behind their proposed approach is based on the assumption that the good behaviour of users does not change over time and that the data points that represent good behaviour have a consistent spatial signature under different groupings.
MCC metrics. According to the results, the proposed Detection-in-Banking-Data-by-Machine-Learning-Techniques


They show that fraudulent behaviours can be detected by identifying the changes in this data, and that the area under the precision-recall curve is a better evaluation criterion than ROC [13].
The authors in [14] propose a group learning framework based on partitioning and clustering of the training set. Their proposed framework has two goals: 1) to ensure the integrity of the sample features, and 2) to address the high imbalance of the dataset. The main feature of their proposed framework is that every base estimator can be trained in parallel, which improves its effectiveness.
Itoo et al. use three different ratios of datasets and an oversampling method to deal with the problem of data imbalance. The authors use three machine learning algorithms: logistic regression, Naive Bayes, and K-nearest neighbor. The performance of the algorithms is measured based on accuracy, sensitivity, specificity, precision, F1-score, and area under the curve. They show that the logistic regression-based model outperforms the other commonly used fraud detection algorithms in the paper [15].
The authors in [16] propose a framework that combines the potential of meta-learning ensemble techniques and a cost-sensitive learning paradigm for fraud detection. They perform several evaluations, and the results obtained from classifying unseen data show that the cost-sensitive ensemble classifier has an acceptable AUC value and is efficient compared to ordinary ensemble classifiers.
Altyeb et al. propose an intelligent approach for detecting fraud in credit card transactions [17]. Their proposed Bayesian-based hyperparameter optimization algorithm is used to tune the parameters of a LightGBM model. They perform experiments on publicly available credit card transaction datasets consisting of fraudulent and legitimate transactions. Their evaluation results are reported in terms of accuracy, area under the receiver operating characteristic curve (ROC-AUC), precision, and F1-score.
Xiong et al. propose a learning-based approach to tackle the fraud detection problem. They use feature engineering techniques to boost the proposed model's performance. The model is trained and evaluated on the IEEE-CIS fraud dataset. Their experiments show that the model outperforms traditional machine-learning-based methods like Bayes and SVM on the used dataset [18].
Vairam et al. evaluate the performance of Naive Bayes and voting classifier algorithms. They demonstrate that, in terms of the evaluated metrics, particularly accuracy, the voting classifier outperforms the Naive Bayes algorithm [19].
Verma and Tyagi investigate machine learning algorithms in order to determine the best supervised ML-based algorithm for credit card fraud detection in the presence of an imbalanced dataset. They evaluate five classification techniques and show that the support vector classifier and the logistic regression classifier outperform the other algorithms on an imbalanced dataset [20]. The summary of the literature review is presented in Fig. 1.

III. PROPOSED APPROACH TO DETECTING CREDIT CARD FRAUD
The proposed framework for fraud detection is presented in Fig. 2. As this figure shows, we first apply the desired pre-processing on the data and then divide the data into two sections, training and testing, followed by performing Bayesian optimization on the training data to find the best hyperparameters that lead to the improvement of the performance. We use the cross-validation method to obtain a performance comparison on an unbalanced set and then examine the algorithms using different evaluation metrics, including accuracy, precision, recall, the Matthews correlation coefficient (MCC), the F1-score, and AUC diagrams. These steps are explained in detail as follows.

A. DATASET
In this paper, we use a real dataset so that the outcome of the proposed algorithm can be used in practice. We consider a dataset named "creditcard" that contains 284,807 records of two days of transactions made by credit card holders in September 2013. There are 492 fraudulent transactions, and the rest of the transactions are legitimate. The positive class (frauds) accounts for 0.172% of all transactions; hence, the dataset is highly imbalanced. The dataset is available at https://www.kaggle.com/mlg-ulb/creditcardfraud.
This dataset contains only numerical input variables resulting from a principal component analysis (PCA) transformation. Unfortunately, the original features and background information about the data are not given due to confidentiality and privacy considerations. PCA yielded the principal components V1, V2, ..., V28. The features not transformed with PCA are "Time" and "Amount". The "Time" column contains the time (in seconds) elapsed between each transaction and the first transaction in the dataset. The feature "Amount" is the transaction amount. The feature "Class" is the response variable, and it takes the value 1 in case of fraud and 0 otherwise. The summary of the variables and features is presented in Table 1.

TABLE 1. The features of the credit-card fraud dataset used in this paper.

Variable Name      | Description                                                        | Type
V1, V2, ..., V28   | Transaction features after the PCA transformation                  | Numeric
Time               | Seconds elapsed between each transaction and the first transaction | Numeric
Amount             | Transaction value                                                   | Numeric
Class              | Legitimate or fraudulent                                            | 0 or 1

B. DATA PRE-PROCESSING
As illustrated in Table 2, the total number of fraudulent transactions is significantly lower than the total number of legitimate transactions, indicating that the data distribution is unbalanced. In real datasets for credit card fraud detection, unbalanced data is expected.
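For concreteness, the loading and splitting steps above can be reproduced with a short script. The following is a minimal sketch (not the authors' released code) that reads the Kaggle "creditcard" file, inspects the class imbalance, and produces a stratified train/test split; the file name creditcard.csv and the 80/20 split ratio are illustrative assumptions.

# Minimal sketch (not the authors' code): load the "creditcard" data,
# inspect the class imbalance, and make a stratified train/test split.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("creditcard.csv")          # file name assumed for illustration

# Class distribution: 492 frauds out of 284,807 transactions (~0.172%)
counts = df["Class"].value_counts()
print(counts)
print(counts / len(df) * 100)

X = df.drop(columns=["Class"])
y = df["Class"]

# Stratified split so the rare fraud class appears in both subsets;
# the 80/20 ratio is an assumption, not stated by the paper for this step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)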

FIGURE 1. The summary of the related works on fraud detection in the banking industry with machine learning techniques (a timeline of the reviewed studies from 2014 to 2022).

FIGURE 2. The proposed framework for credit card fraud detection: raw data, data preprocessing, feature extraction, feature selection, splitting the data into training and testing sets, Bayesian optimization, applying the algorithms with 5-fold cross-validation, and model evaluation (accuracy, precision, recall, F1-score, MCC, ROC-AUC).


TABLE 2. The transaction label distribution in the "creditcard" dataset. This unbalanced distribution is expected in real-life datasets.

No. of Transactions | No. of Legitimate Transactions | No. of Fraudulent Transactions | Legitimate (%) | Fraudulent (%)
284,807             | 284,315                        | 492                            | 99.83%         | 0.17%

This data imbalance causes performance issues in machine learning algorithms, and having a class with the majority of the samples influences the evaluation results [6]. Therefore, in many studies, under-sampling and over-sampling methods are used to solve the data imbalance problem [15]. Using under-sampling methods leads to data loss [21]. Besides, using over-sampling methods leads to the production of duplicate data that does not provide additional information (data and information are not the same; this subject is discussed under the notion of entropy). Some researchers use synthetic minority oversampling (SMOTE) as a solution, which avoids the drawbacks of under- and over-sampling [5], [17], [22]. However, the SMOTE method causes an increase in the false-positive rate, which is not acceptable in banking for customer orientation. To solve this problem, in this study, we use the class weight-tuning hyperparameter to address the mentioned disadvantages.

C. FEATURE EXTRACTION
The "Time" feature includes the time (in seconds) elapsed between each transaction and the first transaction. To make the most of this feature, we expand it to extract a transaction-hour feature, which gives us more information than the time feature itself.

D. FEATURE SELECTION
The features are unknown except for "Time" and "Amount", and we have no additional information. Feature selection tries to find a subset of features that improves the classifier's performance in effectively detecting credit card fraud [23]. The information gain (IG) method is used to select the most important features, which leads to a dimension reduction of the training data. Information gain works by extracting similarities between credit card transactions and then awarding the greatest weight to the most significant features based on the class of legitimate and fraudulent credit card transactions [17], [24]. The information gain method has been proven to be computationally efficient and shows leading performance in terms of precision [17]. Therefore, we also consider the IG method for feature selection in the proposed framework. Figure 3 shows the IG diagram, and the top six features extracted by this method have been used to evaluate the proposed algorithm.

FIGURE 3. Feature importance diagram showing the IG for the unknown features of the "creditcard" dataset. The top six features are used in the evaluations.

E. ALGORITHMS
Hyperparameters have a significant effect on the performance of machine learning models. We refer to optimization as the process of finding the best set of hyperparameters that configure a machine learning algorithm during its training. Recently, it was shown that the Bayesian method is capable of finding the optimized values in a much smaller number of training rounds compared with evolutionary optimization methods [25], [26]. In this paper, we use the Bayesian optimization algorithm to tune the hyperparameters, which leads to a reduction in computational time and an improvement in performance.
Logistic regression: Logistic regression is a predictive analysis that finds out whether two or more variables are related to each other. This method determines whether there is a relationship between one binary dependent variable and one or more ordinal, nominal, interval, or ratio-level independent variables [27]. This algorithm cannot be used directly on unbalanced data. Therefore, we use the class-weight hyperparameter to address the class imbalance prior to applying logistic regression. We show that the ROC-AUC curve cannot be used for the evaluation of unbalanced data and leads to false interpretations.
LightGBM: The LightGBM algorithm is built on the GBDT framework and aims to improve computational efficiency, particularly on big data prediction problems [28]. The high-performance LightGBM algorithm can quickly handle large amounts of data and supports distributed processing of data [17]. In LightGBM, the histogram-based algorithm and the trees' leaf-wise growth strategy with a maximum depth limit are adopted to increase the training speed and reduce memory consumption. The tuned hyperparameters include "num_leaves", which is the number of leaves per tree, "max_depth", which denotes the maximum depth of the tree, and "learning_rate"; these are balanced together with tuning the weight of the class. With an excessive increase in the number of leaves, the model tends to overfit. Therefore, we need to consider a suitable range for this hyperparameter to obtain good optimization results.
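The combination of class weight-tuning and Bayesian optimization described above can be sketched in a few lines. The example below is not the authors' released implementation: it uses the open-source bayes_opt package, and the search ranges, the number of iterations, and the average-precision objective are illustrative assumptions. X_train and y_train come from the data-loading sketch in Section III-A.

# Sketch: Bayesian optimization of LightGBM hyperparameters, including the
# weight of the fraud class. Ranges and scoring are illustrative assumptions.
from bayes_opt import BayesianOptimization
from lightgbm import LGBMClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

def lgbm_score(num_leaves, max_depth, learning_rate, pos_weight):
    model = LGBMClassifier(
        num_leaves=int(num_leaves),
        max_depth=int(max_depth),
        learning_rate=learning_rate,
        # class weight-tuning: up-weight the rare fraud class
        class_weight={0: 1.0, 1: pos_weight},
        n_estimators=200,
    )
    # average precision (area under the precision-recall curve) suits the
    # unbalanced setting better than plain accuracy
    return cross_val_score(model, X_train, y_train, cv=cv,
                           scoring="average_precision").mean()

optimizer = BayesianOptimization(
    f=lgbm_score,
    pbounds={"num_leaves": (16, 128), "max_depth": (3, 12),
             "learning_rate": (0.01, 0.3), "pos_weight": (1, 20)},
    random_state=42,
)
optimizer.maximize(init_points=5, n_iter=25)
print(optimizer.max)   # best score and the hyperparameters that produced it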

XGBoost: eXtreme Gradient Boosting (XGBoost) has become a dominant algorithm in the field of applied machine learning. XGBoost is a gradient-boosted decision tree algorithm. It is preferred over other gradient boosting machines (GBMs) due to its fast execution speed, model performance, and efficient use of memory resources [28]. This algorithm is an additive technique in which new models are added to correct the errors made by existing models. XGBoost includes parallel computation to construct trees using all the CPU cores during training. Instead of a traditional criterion-first stopping rule, it makes use of the "max_depth" parameter and starts tree pruning in the backward direction, which significantly improves the computational performance and speed of XGBoost [28]. XGBoost employs a more regularized technique, called "formalization", to control overfitting and achieve better performance [29]. The tuned hyperparameters include the learning rate, the number of trees, and the maximum tree depth, as well as the class weights.
CatBoost: Category Boosting (CatBoost) is a new gradient boosting algorithm proposed by Prokhorenkova et al. [29]. CatBoost is a competitive candidate in the realm of classifiers for highly unbalanced data [30]. The CatBoost machine learning algorithm is a particular type of gradient boosting on decision trees, as it can handle categorical and ordered features, and the overfitting of the model is taken care of by Bayesian estimators [31]. CatBoost does not require extensive data preparation like other machine learning models and can be successfully applied to diverse types and formats of data [29], [30]. CatBoost has both CPU and GPU implementations; the GPU implementation allows for much faster training and is faster than both state-of-the-art open-source GBDT GPU implementations, XGBoost and LightGBM, on ensembles of similar sizes [32]. CatBoost uses a more efficient strategy that reduces overfitting and allows the use of the whole dataset for training. We perform a random permutation of the dataset, and for the data imbalance problem we use a class weight hyperparameter.
Majority voting: Ensemble learning (EL), which is a type of machine learning, combines several classifiers, minimizes the error of the classifiers, and achieves more reasonable results than a single technique. A majority voting classifier is not a real classifier, but a method in which several classifiers are trained and evaluated in parallel in order to exploit the different strengths of each algorithm. We can train the data using different hybrid algorithms to predict the final output. The final result of the prediction is determined by a majority of votes according to two different strategies: hard voting and soft voting. If voting is hard, the predicted class labels are used for a majority-rule vote. Otherwise, if the vote is soft, the class label is predicted based on the argmax of the sum of the predicted probabilities, which is recommended for a set of well-calibrated classifiers. In this case, the probability vector is averaged over all classifiers for each predicted class, and the winning class is the one with the highest value [27], [33]:

ŷ = argmax ( (1 / N_Classifiers) · Σ_Classifiers (p_1, ..., p_n) )   (1)

Deep learning: Deep learning algorithms are a class of machine learning algorithms where multiple hidden layers are used to improve the outcome. Deep learning is shown to be a very promising solution to deal with fraud in financial transactions, making the best use of banks' big data [34]. Deep learning is a generic term that refers to machine learning using a deep multi-layer artificial neural network (ANN). It is a biologically inspired model of human neurons, composed of multiple hidden layers of nonlinear processing units, where each neuron is able to send data to connected neurons within the hidden layers. These processing units discover intermediate representations in a hierarchical manner. The features discovered in one layer form the basis for the processing of the succeeding layer. In this way, deep learning algorithms learn intermediate concepts between the raw input and the target knowledge [34].
In this paper, we use a sequential model, which is a linear stack of layers, to construct an artificial neural network model. Our model uses dense layers, which are very common and often used. In the neural network, an activation function is used to increase the predictive power; this function maps input signals to output signals. We use the ReLU activation function, and in the last layer we use the sigmoid function, since our output is binary. The sigmoid function generates values in the range between zero and one. In the ReLU function, if the value x is smaller than or equal to zero, the output is zero. The behaviour of the ReLU activation function is in many ways similar to that of our biological neurons.
Neural networks require initial weighting. We use a kernel initializer, which defines the method of determining the random initial weights of the Keras layers. To overcome the unbalanced data problem, we consider a ratio of 1 to 4 for the weight of the majority class to the minority class. This increases the processing speed as well as the efficiency of the model. The size of the input layer is equal to the number of features plus the extracted features. We also remove the "Time" feature. To build the Keras model, we optimize the number of layers and neurons, the number of epochs, and the batch size, which leads to an increase in speed. Commonly, the batch size is set to 32 or 128. However, our dataset is highly unbalanced, and by choosing a common batch size there may be no fraud cases in a batch during training. Therefore, our range is chosen so that we can see fraudulent samples in each batch. Also, by choosing a larger batch size, the processing is faster and less memory is needed. Large epoch counts can result in either over- or under-fitting. Therefore, selecting an appropriate range for the optimization not only increases the efficiency of the algorithm but also reduces the time required to find the optimal points.
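The sequential model described above can be expressed in a few lines of Keras. The following is a hedged sketch rather than the authors' exact implementation: it uses the layer sizes and training settings reported in Table 3 and the next paragraph (86/44/22/1 neurons, ReLU and sigmoid activations, Adam with binary cross-entropy, a 1:4 class weight, 117 epochs, batch size 1563), while the specific kernel initializer and the validation data are illustrative assumptions.

# Sketch of the described sequential model; layer sizes and training settings
# follow the values reported in the paper (Table 3 and Section III-E).
from tensorflow import keras
from tensorflow.keras import layers

n_features = X_train.shape[1]   # selected features plus the extracted hour feature

model = keras.Sequential([
    layers.Dense(86, activation="relu", input_shape=(n_features,),
                 kernel_initializer="he_uniform"),   # initializer choice is an assumption
    layers.Dense(44, activation="relu"),
    layers.Dense(22, activation="relu"),
    layers.Dense(1, activation="sigmoid"),           # binary output: fraud probability
])

model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC(name="auc"),
                       keras.metrics.Precision(), keras.metrics.Recall()])

# 1:4 weight ratio of the majority (legitimate) to the minority (fraud) class
model.fit(X_train, y_train,
          epochs=117, batch_size=1563,
          class_weight={0: 1.0, 1: 4.0},
          validation_data=(X_test, y_test))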

By performing the Bayesian optimization, the number of neurons in the first hidden layer is set to 86, the number of epochs is set to 117, and the batch size is set to 1563. The details of our model are presented in Table 3. Following Keras, with the help of the compile method and the Adam optimizer, we perform the weight updates and use binary cross-entropy as the loss function, which finalizes the configuration of the learning and training process.

TABLE 3. Details of the deep learning model used in the paper. The total number of parameters is 7,593, and all are trainable.

Layer (Type)    | Output Shape | Param No.
dense (Dense)   | (None, 86)   | 2752
dense-1 (Dense) | (None, 44)   | 3828
dense-2 (Dense) | (None, 22)   | 990
dense-3 (Dense) | (None, 1)    | 23

F. EVALUATION METRICS
We apply a cross-validation test to evaluate the performance of the proposed model for credit card fraud detection. Similar to [6], [17], we use a stratified 5-fold validation test to obtain a reliable performance comparison on the unbalanced set. The dataset is divided randomly into five separate subsets of equal size, where the samples of each class are divided in equal proportions across the subsets. In each step of the validation, a single subset (20% of the dataset) is reserved as the validation data to test the performance of the proposed approach, while the remaining four subsets (80% of the dataset) are employed as the training data. We repeat this process five times until all subsets have been used. The average performance over the five test subsets is calculated, and the final result is the performance of the proposed approach on a 5-fold cross-validation test.
To be fair in our comparisons, we use the common metrics for our evaluations, including accuracy, precision, recall, the Matthews correlation coefficient (MCC), the F1-score, and AUC diagrams. Positives represent fraudulent transactions in our experiments, while negatives represent legitimate ones. A true positive (TP) represents a fraudulent transaction that has been classified as such. False positives (FP) indicate the number of legitimate transactions misclassified as fraudulent. A true negative (TN) represents a legitimate transaction classified as legitimate, and a false negative (FN) indicates a fraudulent transaction misclassified as legitimate [15]. The mathematical expressions for the metrics used are given in Eq. (2) to Eq. (6).

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (2)

Recall = TP / (TP + FN)   (3)

Precision = TP / (TP + FP)   (4)

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)   (5)

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))   (6)

Accuracy quantifies the total performance of the classifier and is defined as the fraction of correct predictions made by the model. When dealing with data that is not balanced, this criterion does not give good results, because it still yields a high value even when hardly any fraudulent transactions are found. Recall shows the efficiency of the classifier in detecting actual fraudulent transactions. Precision measures the reliability of the classifier, and the F1-score is the harmonic mean of recall and precision, which considers both false negatives and false positives.
ROC-AUC is a measure of separability that demonstrates the model's ability to differentiate between classes [15]. The ROC curve is a graphical plot of the false positive rate (FPR) against the true positive rate (TPR) at different possible thresholds [17]. The area under the ROC curve is not a suitable criterion for evaluating fraud detection methods, since it mainly reflects the positive class.
The precision and recall curves are commonly used to compare classifiers in terms of precision and recall. Usually, in this two-dimensional graph, the precision rate is plotted on the y-axis and the recall on the x-axis. There is no single good way to describe the true and false positives and negatives using one indicator. One good solution is to use MCC, which measures the quality of a two-class problem, taking into account the true and false positives and negatives. It is a balanced measure, even when the classes are of different sizes [6].

IV. EXPERIMENTAL RESULTS AND DISCUSSION
We use the stratified 5-fold cross-validation method and the boosting algorithms with the Bayesian optimization method to evaluate the performance of the proposed framework. We extract the hyperparameters and evaluate each algorithm individually before using the majority voting method. We examine the algorithms in triple and double combinations. The comparison results are presented in Table 4.
Most studies in the literature rely on AUC diagrams to evaluate performance. However, as can be seen from the ROC-AUC curve in Fig. 4, the value of AUC on severely unbalanced data is not a good evaluation metric: it is influenced by the real positives and considers the negatives irrelevant. According to the ROC-AUC curve in Fig. 4, the logistic regression algorithm, with 0.9583, has the highest value, but it has the lowest values on the other criteria.
The precision-recall curve is illustrated in Fig. 5 and shows the system performance in a more precise manner compared with the ROC-AUC curve. However, these results alone cannot be relied upon, because false negatives are outside the view of this diagram. As Fig. 5 shows, the highest value belongs to the combination of the CatBoost and LightGBM algorithms with a value of 0.7672, and the lowest value belongs to logistic regression with 0.7361.
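The figures reported in the following tables can be reproduced from out-of-fold predictions of the stratified 5-fold protocol of Section III-F using scikit-learn's metric functions, which implement Eqs. (2)-(6). A minimal sketch, assuming `model` is any of the classifiers of Section III-E and X, y come from the data-loading sketch in Section III-A:

# Sketch: stratified 5-fold evaluation with the metrics of Eqs. (2)-(6).
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import (accuracy_score, roc_auc_score, recall_score,
                             precision_score, f1_score, matthews_corrcoef,
                             average_precision_score)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# out-of-fold class labels and fraud probabilities (retrains per fold)
y_pred = cross_val_predict(model, X, y, cv=cv)
y_score = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]

print("Accuracy :", accuracy_score(y, y_pred))
print("ROC-AUC  :", roc_auc_score(y, y_score))
print("Recall   :", recall_score(y, y_pred))
print("Precision:", precision_score(y, y_pred))
print("F1-score :", f1_score(y, y_pred))
print("MCC      :", matthews_corrcoef(y, y_pred))
print("PR-AUC   :", average_precision_score(y, y_score))  # precision-recall curve area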

TABLE 4. Performance evaluation of the algorithms.

Model          | Accuracy | AUC    | Recall | Precision | F1-score | MCC
Log_Reg        | 0.97477  | 0.9578 | 0.8730 | 0.0617    | 0.1143   | 0.2248
LGBM           | 0.99919  | 0.9472 | 0.7990 | 0.7534    | 0.7699   | 0.7727
XGB            | 0.99923  | 0.9517 | 0.7949 | 0.7862    | 0.7830   | 0.7864
CatBoost       | 0.99880  | 0.9390 | 0.8096 | 0.6431    | 0.7066   | 0.7158
Vot_Lg, Xg, Ca | 0.99924  | 0.9501 | 0.8033 | 0.7720    | 0.7825   | 0.7847
Vot_Lg, Xg     | 0.99927  | 0.9522 | 0.8012 | 0.7901    | 0.7901   | 0.7925
Vot_Xg, Ca     | 0.99923  | 0.9492 | 0.8097 | 0.7681    | 0.7823   | 0.7852
Vot_Lg, Ca     | 0.99912  | 0.9459 | 0.8075 | 0.7260    | 0.7581   | 0.7620

FIGURE 4. ROC-AUC curve.

FIGURE 5. Precision-recall curve.

FIGURE 6. Performance comparison of the algorithms with different evaluation criteria.

TABLE 5. Deep learning model results.

Model | Accuracy | AUC    | Recall | Precision | F1-score | MCC
Keras | 0.9994   | 0.9401 | 0.8222 | 0.8043    | 0.8132   | 0.8129

Comparing the precision, recall, and F1-score as well as the MCC, the algorithms used are shown in Fig. 6. The best performance is related to the combination of the LightGBM and XGBoost algorithms, which has an MCC value of 0.79 and an F1-score of 0.79. Among the individual algorithms, XGBoost has the highest values.
According to the figures reported in Table 5, deep learning has achieved better performance compared with the individual algorithms and the majority voting ensemble learning. The MCC and F1-score metrics have values of 0.8129 and 0.8132, respectively. The ROC curve for the deep learning method is illustrated in Fig. 7 and shows an area of 0.9401. The precision-recall curve is shown in Fig. 8 and shows a value of 0.7922.
The evaluation results of the proposed approach, using different pre-processing and class weight hyperparameter tuning to deal with the problem of data unbalance, compared to the paper [17], are shown in Fig. 9. The results show an improvement of both methods compared to the method presented in [17].
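The voting combinations in Table 4 (e.g., Vot_Lg, Xg) can be assembled with scikit-learn's VotingClassifier. The sketch below is not the authors' exact configuration; the class-weight values and the choice to combine all three boosters are illustrative, and X_train, y_train, X_test come from the data-loading sketch in Section III-A.

# Sketch of a soft-voting ensemble combining LightGBM, XGBoost, and CatBoost,
# mirroring the Vot_* rows of Table 4. Hyperparameter values are illustrative.
from sklearn.ensemble import VotingClassifier
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier

lgbm = LGBMClassifier(class_weight={0: 1.0, 1: 4.0})
xgb = XGBClassifier(scale_pos_weight=4.0)
cat = CatBoostClassifier(class_weights=[1.0, 4.0], verbose=0)

# soft voting averages the predicted probabilities, as in Eq. (1)
voting = VotingClassifier(
    estimators=[("lgbm", lgbm), ("xgb", xgb), ("cat", cat)],
    voting="soft",
)
voting.fit(X_train, y_train)
fraud_proba = voting.predict_proba(X_test)[:, 1]   # fraud probability per transaction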

TABLE 6. Performance comparison of the proposed approach and the method presented in [17].

Model                    | Accuracy | AUC   | Recall | Precision | F1-score
Method presented in [17] | 0.984    | 0.909 | 0.406  | 0.973     | 0.569
Proposed LightGBM        | 0.9992   | 0.947 | 0.799  | 0.753     | 0.769
Proposed Approach        | 0.9993   | 0.952 | 0.801  | 0.79      | 0.79

FIGURE 7. ROC curve of the deep learning model.

FIGURE 8. Precision-recall curve of the deep learning model.

FIGURE 9. Performance comparison of the proposed approach with the paper [17] based on the different evaluation criteria.

According to Table 6, the proposed methods outperform the intelligent method presented in [17] using common metrics and a public dataset.

V. CONCLUSION AND FUTURE WORK
In this paper, we studied the credit card fraud detection problem on a real unbalanced dataset. We proposed a machine-learning approach to improve the performance of fraud detection. We used the publicly available "creditcard" dataset with 28 features and 0.17 percent fraudulent data. We proposed two methods. In the proposed LightGBM, we used class weight tuning to choose the proper hyperparameters. We used the common evaluation metrics, including accuracy, precision, recall, F1-score, and AUC. Our experimental results showed that the proposed LightGBM method improved the detected fraud cases by 50% and the F1-score by 20% compared with the recently presented method in [17]. We further improved the performance of the algorithm with the help of the majority voting algorithm, and we also improved the criteria by using the deep learning method. The results confirmed that MCC is a stronger evaluation criterion for unbalanced data than the other criteria. In this paper, by combining the LightGBM and XGBoost methods we obtained an MCC of 0.79, and 0.81 with the deep learning method. Using hyperparameters to address the data unbalance, compared to sampling methods, reduces the memory and time needed to evaluate the algorithms and also gives better results. For future work, we propose using other hybrid models as well as working specifically on CatBoost by tuning more hyperparameters, especially the number of trees. Also, due to hardware limitations in this study, the use of stronger and better hardware may bring better results that can ultimately be compared with the results of this study.

REFERENCES
[1] Jay Nanduri, Yung-Wen Liu, Kiyoung Yang, and Yuting Jia. Ecommerce fraud detection through fraud islands and multi-layer machine learning model. In Future of Information and Communication Conference, pages 556-570. Springer, 2020.
[2] Irum Matloob, Shoab Ahmed Khan, Rukaiya Rukaiya, Muazzam A Khan Khattak, and Arslan Munir. A sequence mining-based novel architecture for detecting fraudulent transactions in healthcare systems. IEEE Access, 10:48447-48463, 2022.
[3] Haonan Feng. Ensemble learning in credit card fraud detection using boosting methods. In 2021 2nd International Conference on Computing and Data Science (CDS), pages 7-11. IEEE, 2021.


[4] Mohammad Soltani Delgosha, Nastaran Hajiheydari, and Sayed Mahmood Fahimi. Elucidation of big data analytics in banking: a four-stage delphi study. Journal of Enterprise Information Management, 34(6):1577-1596, 2020.
[5] Maja Puh and Ljiljana Brkić. Detecting credit card fraud using selected machine learning algorithms. In 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pages 1250-1255. IEEE, 2019.
[6] Kuldeep Randhawa, Chu Kiong Loo, Manjeevan Seera, Chee Peng Lim, and Asoke K Nandi. Credit card fraud detection using adaboost and majority voting. IEEE Access, 6:14277-14284, 2018.
[7] Nishamathi Kumaraswamy, Mia K Markey, Tahir Ekin, Jamie C Barner, and Karen Rascati. Healthcare fraud data mining methods: A look back and look ahead. Perspectives in Health Information Management, 19(1), 2022.
[8] Esraa Faisal Malik, Khai Wah Khaw, Bahari Belaton, Wai Peng Wong, and XinYing Chew. Credit card fraud detection using a new hybrid machine learning architecture. Mathematics, 10(9):1480, 2022.
[9] Kavya Gupta, Kirtivardhan Singh, Gaurav Vikram Singh, Mohd. Hassan, Himani, and Upasana Sharma. Machine learning based credit card fraud detection - a review. In 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC), pages 362-368, 2022.
[10] Raghad Almutairi, Abhishek Godavarthi, Arthi Reddy Kotha, and Ebrima Ceesay. Analyzing credit card fraud detection based on machine learning models. In 2022 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), pages 1-8. IEEE, 2022.
[11] Neda Soltani Halvaiee and Mohammad Kazem Akbari. A novel model for credit card fraud detection using artificial immune systems. Applied Soft Computing, 24:40-49, 2014.
[12] Alejandro Correa Bahnsen, Djamila Aouada, Aleksandar Stojanovic, and Björn Ottersten. Feature engineering strategies for credit card fraud detection. Expert Systems with Applications, 51:134-142, 2016.
[13] Utkarsh Porwal and Smruthi Mukund. Credit card fraud detection in e-commerce: An outlier detection approach. arXiv preprint arXiv:1811.02196, 2018.
[14] Hongyu Wang, Ping Zhu, Xueqiang Zou, and Sujuan Qin. An ensemble learning framework for credit card fraud detection based on training set partitioning and clustering. In 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pages 94-98. IEEE, 2018.
[15] Fayaz Itoo, Satwinder Singh, et al. Comparison and analysis of logistic regression, naive bayes and knn machine learning algorithms for credit card fraud detection. International Journal of Information Technology, 13(4):1503-1511, 2021.
[16] Toluwase Ayobami Olowookere and Olumide Sunday Adewale. A framework for detecting credit card fraud with cost-sensitive meta-learning ensemble approach. Scientific African, 8:e00464, 2020.
[17] Altyeb Altaher Taha and Sharaf Jameel Malebary. An intelligent approach to credit card fraud detection using an optimized light gradient boosting machine. IEEE Access, 8:25579-25587, 2020.
[18] Xiong Kewei, Binhui Peng, Yang Jiang, and Tiying Lu. A hybrid deep learning model for online fraud detection. In 2021 IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE), pages 431-434. IEEE, 2021.
[19] T Vairam, S Sarathambekai, S Bhavadharani, A Kavi Dharshini, N Nithya Sri, and Tarika Sen. Evaluation of naive bayes and voting classifier algorithm for credit card fraud detection. In 2022 8th International Conference on Advanced Computing and Communication Systems (ICACCS), volume 1, pages 602-608, 2022.
[20] Pradeep Verma and Poornima Tyagi. Analysis of supervised machine learning algorithms in the context of fraud detection. ECS Transactions, 107(1):7189, 2022.
[21] Junyi Zou, Jinliang Zhang, and Ping Jiang. Credit card fraud detection using autoencoder neural network. arXiv preprint arXiv:1908.11553, 2019.
[22] Doaa Almhaithawi, Assef Jafar, and Mohamad Aljnidi. Example-dependent cost-sensitive credit cards fraud detection using smote and bayes minimum risk. SN Applied Sciences, 2(9):1-12, 2020.
[23] Jipeng Cui, Chungang Yan, and Cheng Wang. Learning transaction cohesiveness for online payment fraud detection. In The 2nd International Conference on Computing and Data Science, pages 1-5, 2021.
[24] Morteza Rakhshaninejad, Mohammad Fathian, Babak Amiri, and Navid Yazdanjue. An ensemble-based credit card fraud detection algorithm using an efficient voting strategy. The Computer Journal, 2021.
[25] A Helen Victoria and G Maragatham. Automatic tuning of hyperparameters using bayesian optimization. Evolving Systems, 12(1):217-223, 2021.
[26] Hyunghun Cho, Yongjin Kim, Eunjung Lee, Daeyoung Choi, Yongjae Lee, and Wonjong Rhee. Basic enhancement strategies when using bayesian optimization for hyperparameter tuning of deep neural networks. IEEE Access, 8:52588-52608, 2020.
[27] Fairoz Nower Khan, Amit Hasan Khan, and Lamiah Israt. Credit card fraud prediction and classification using deep neural network and ensemble learning. In 2020 IEEE Region 10 Symposium (TENSYMP), pages 114-119. IEEE, 2020.
[28] Weizhang Liang, Suizhi Luo, Guoyan Zhao, and Hao Wu. Predicting hard rock pillar stability using gbdt, xgboost, and lightgbm algorithms. Mathematics, 8(5):765, 2020.
[29] Sami Ben Jabeur, Cheima Gharib, Salma Mefteh-Wali, and Wissal Ben Arfi. Catboost model and artificial intelligence techniques for corporate failure prediction. Technological Forecasting and Social Change, 166:120658, 2021.
[30] John Hancock and Taghi M Khoshgoftaar. Medicare fraud detection using catboost. In 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI), pages 97-103. IEEE, 2020.
[31] B Dhananjay and J Sivaraman. Analysis and classification of heart rate using catboost feature ranking model. Biomedical Signal Processing and Control, 68:102610, 2021.
[32] Yeming Chen and Xinyuan Han. Catboost for fraud detection in financial transactions. In 2021 IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE), pages 176-179. IEEE, 2021.
[33] Anil Goyal and Jihed Khiari. Diversity-aware weighted majority vote classifier for imbalanced data. In 2020 International Joint Conference on Neural Networks (IJCNN), pages 1-8. IEEE, 2020.
[34] Abhimanyu Roy, Jingyi Sun, Robert Mahoney, Loreto Alonzi, Stephen Adams, and Peter Beling. Deep learning detecting fraud in credit card transactions. In 2018 Systems and Information Engineering Design Symposium (SIEDS), pages 129-134. IEEE, 2018.

SEYEDEH KHADIJEH HASHEMI is a former student of the Electrical and Computer Engineering Department at Kharazmi University. She received her MSc and BSc in computer engineering. Her Master's thesis was on fraud detection in banking with machine learning techniques. Her research interests include the application of machine learning techniques, with a focus on banking.


SEYEDEH LEILI MIRTAHERI is a faculty member of the Electrical and Computer Engineering Department at Kharazmi University. She is researching next-generation high-performance computing systems and GPU computing. Her research interests are distributed and parallel systems, exascale computing, cluster computing, mathematics, and scientific computing. She has worked on distributed systems and carried out several successful industrial projects in these areas. She was named Exemplary Professor of Kharazmi University, Tehran, Iran, in 2020, and also received the Leading Young Researcher award of Alborz Province in 2020. She received the first award for inventions at the National Science Foundation Invention Festival in 2011, the IUST (Iran University of Science and Technology) Award for Excellence in Research in 2009, the second-level reward of the National Science Foundation during her Ph.D. in 2009, the first award for presenting "CSharifi: Kernel Level Cluster Management System Software" at the Khwarizmi Young Awards in 2008, a grant as Excellent Researcher of the National Science Foundation in 2008, and the appreciation of the Iranian Organization of Scientific and Industrial Research for cooperating and presenting "A Cluster Management System Software" at the Khwarizmi International Awards in 2007. She has published more than 50 papers in credible conferences and journals.

SERGIO GRECO is a Full Professor with the DIMES Department, University of Calabria, Rende, Italy. His research interests include database theory, data integration and exchange, inconsistent data, incomplete data, data mining, knowledge representation, logic programming, computational logic, and argumentation theory. He has written over 220 papers, including more than 60 journal papers, in prestigious conferences and journals.
