Credit Card Fraud Detection Model Based on LSTM Recurrent Neural Networks
3 authors:
Bouabid Ouahidi
Mohammed V University
All content following this page was uploaded by Samira Douzi on 16 July 2021.
Abstract—With the increasing use of credit cards in electronic payments, financial institutions and service providers are vulnerable to fraud, which causes huge losses every year. The design and implementation of an efficient fraud detection system is essential to reduce such losses. However, machine learning techniques used to detect card fraud automatically do not consider fraud sequences or behavior changes, which may lead to false alarms. In this paper, we develop a credit card fraud detection system that employs Long Short-Term Memory (LSTM) networks as a sequence learner to include transaction sequences. The proposed approach aims to capture the historic purchase behavior of credit card holders with the goal of improving fraud detection accuracy on new incoming transactions. Experiments show that the proposed model gives strong results, with an AUC above 0.99 over the last 10 training epochs.

Index Terms—credit card, fraud detection, sequence learning, recurrent neural networks, LSTM

Manuscript received August 1, 2020; revised February 8, 2021.

I. INTRODUCTION
In recent years, credit card transactions have become the most popular payment mode thanks to the improvement of technology and the emergence of new e-service payment solutions, such as e-commerce and mobile payments. However, credit card fraud has also increased with the advent of these new technologies.
The security of card payments and the trust of consumers in making card payments are a matter of concern for any bank in the world. According to the statistics published by the Nilson Report in 2017 [1], the financial losses caused by credit card fraud amounted to $24.71 billion in 2016 and $27.69 billion in 2017. It is also reported that the actual amount of losses will increase by 2020.
Despite the development of advanced technologies to prevent fraud, such as chip and PIN verification, 3-D Secure for online transactions and security questions for internet banking, traditional machine learning models used to automate fraud detection are inadequate, as they fail to predict whether a transaction is fraudulent or not for the following reasons:
• Fraudsters invent new fraud patterns and continuously change their strategies to avoid being detected.
• Machine learning models that are never updated are inadequate as they do not adapt to new fraud strategies.
• Static machine learning models do not take account of changes and trends in consumer spending behavior, for example during holiday seasons and across geographical regions.
In these situations, the implementation of an accurate fraud detection system that adapts to new fraud behaviors and evolves continuously is of crucial importance for financial institutions in order to prevent fraud before it occurs, protect consumers’ interests and reduce the damages caused by fraud [2], [3].
In this paper, we propose a new credit card fraud detection system based on Long Short-Term Memory (LSTM) networks to predict the fraudulent behavior of credit card transactions and deliver good fraud detection performance. We provide experimental results to validate the effectiveness of our approach.
The structure of this work is as follows. Section II gives a general introduction to sequence classification. Section III presents a review of credit card fraud literature. In Section IV, the structure of our proposed method is described. Section V details the dataset used in this study and discusses the results obtained. Finally, Section VI concludes the paper and suggests ideas for future research.

II. SEQUENCE CLASSIFICATION FOR CREDIT CARD FRAUD DETECTION
In credit card fraud detection, traditional fraud detection systems aim to identify transactions with a high probability of being fraudulent, based only on individual transaction information such as amount, time and transaction location. Such systems are inadequate, since they do not consider consumer spending behavior, which is useful for discovering relevant fraud patterns [4].
A fraud is not just a property of the transaction itself, but a property of both the transaction and the particular context in which it occurred, i.e., the account and the merchant. Therefore, identical buying behaviors may represent entirely legitimate behavior in the context of some customers and obvious anomalies in the context of others [5].
To construct such a context that defines consumers’ profile, it is very important to summarize the history of
consumer spending patterns, in order to capture the sequential dependency between consecutive credit card transactions. The objective is to allow a classifier to better detect very dissimilar transactions within the purchases of a consumer.
Therefore, in the following section, we construct such a context by using the sequence learner LSTM recurrent neural networks as a dynamic pattern recognition classifier to model long term dependencies within transaction sequences.

III. LITERATURE REVIEW
Credit card fraud detection is a challenging problem that attracts the attention of the machine learning and artificial intelligence communities for several reasons. For instance, credit card fraud data sets are highly imbalanced, since the number of fraudulent transactions is much lower than the number of legitimate ones. Thus, many traditional classifiers fail to detect minority class objects in these skewed data sets [4], [6], [7]. In addition, a credit card fraud detection system has to respond within a very short time to be useful in real scenarios. Another critical aspect is that the conditional distribution of the data evolves over time because of seasonality and new attack strategies [8].
Therefore, many modern techniques based on supervised learning, unsupervised learning, anomaly detection and ensemble learning have been devoted to payment card fraud detection [9]. In particular, supervised classification techniques have proved to be extremely effective for this challenge: pre-classified datasets containing labeled historical transactions are used to train a classifier that builds a detection model capable of predicting whether a new transaction is fraudulent or genuine. Some of these algorithms are support vector machines [10], [11], hidden Markov models [11], [12], logistic regression algorithms [10], [13], decision trees [14], [15], random forests [10], [16]-[19], and k-nearest neighbors [20], [21].
Unsupervised classification methods are used to detect unusual behavior of a system and to identify transactions that do not conform to the model as potentially fraudulent cases [22]-[24]. They can help to detect new patterns of fraud that have not been detected before.
However, most of these approaches handle each transaction as a single object and neglect the relationship between transactions. This sequential information may have a major impact on the outcome of a credit card fraud detection model.
Recently, deep learning methods based on Recurrent Neural Networks (RNN) have been used in the fraud detection field, given their reputation as one of the most accurate learning algorithms in sequence analysis work [25]-[27]. An RNN is a dynamic machine learning approach capable of analyzing the dynamic temporal behaviors of various bank accounts by modeling the sequential dependency between consecutive transactions of credit card holders.
In this paper, we propose a novel sequence learner for credit card fraud detection by using LSTM recurrent neural networks to model long term dependencies within transaction sequences.

IV. PROPOSED MODEL
In this section, we describe our proposed model based on the LSTM architecture for credit card fraud detection. The steps of this model are detailed below.

A. Data Preparation
The values and types of the dataset’s features that will ultimately be used as input to neural networks are different. Such differences can vary widely, affecting the performance of the classifier. Data normalization is therefore applied to bring the input features onto a common scale. In addition, all categorical features must be converted to numerical values in order to use neural networks and other classifier algorithms that deal only with numerical data. Thus, each input value is normalized to the range [0, 1]. We choose the Min-Max normalization technique because it reduces noise effects and ensures that neural networks update parameters efficiently, which accelerates network training [28]. We use the following formula (1):

$$ x(t)' = \frac{x(t) - x(t)_{\min}}{x(t)_{\max} - x(t)_{\min}} \qquad (1) $$

where $x(t)'$ is the normalized value of $x(t)$, and $x(t)_{\max}$ and $x(t)_{\min}$ are the maximum and minimum values of the whole sequence, respectively.
The neural network is trained by using historical credit card data that includes details about the card holder’s purchases. Using these data, the neural network compares the transaction information with the previously stored information. If the data fits the pattern, then the card is most likely being used by its owner. If there is no match, the probability of fraud is high.
In order to group the observations and transform them into sequences that are appropriate for network presentation and classification, we follow the steps below:
• Group the transactions by account and count the number of transactions for each account.
• Split the accounts into different sets according to their transaction counts.
• Order the transactions by time for each account in each set.
Therefore, each transaction $i$ at time $t$ can then be extended into a sequential vector

$$ X_i = \left( x_{i1}, x_{i2}, x_{i3}, \ldots, x_{i(t-1)}, x_{it} \right). $$

B. Long Short Term Memory Networks
Long Short-Term Memory (LSTM) is a special type of artificial Recurrent Neural Network (RNN) architecture used to model time series information in the field of deep learning (Fig. 1).
In contrast to standard feedforward neural networks, LSTM has feedback connections between hidden units that
are associated with discrete time steps, which allow long-term sequence dependencies to be learned and a transaction label to be predicted given the sequence of past transactions. LSTMs were developed to overcome the problem of vanishing and exploding gradients that can be observed during the training of traditional RNNs [29].

$$ \tilde{c}_t = \tanh\left( W_c \cdot [h_{t-1}, x_t] + b_c \right) \qquad (2) $$

• Calculate the value of the input gate $i_t$. The input gate controls the update of the current input data to the state value of the memory cell, $\sigma$ is the sigmoid function, $W_i$ is the weight matrix and $b_i$ is the bias. The equation for the input gate is given by (3):

$$ i_t = \sigma\left( W_i \cdot [h_{t-1}, x_t] + b_i \right) \qquad (3) $$

• Calculate the value of the forget gate $f_t$. The forget gate controls the update of the historical data to the state value of the memory cell, $W_f$ is the weight matrix and $b_f$ is the bias. The equation for the forget gate is given by (4):

$$ f_t = \sigma\left( W_f \cdot [h_{t-1}, x_t] + b_f \right) \qquad (4) $$

• Calculate the value of the current moment memory cell $c_t$, where $c_{t-1}$ is the state value of the last LSTM unit. We use the following Eq. (5):

$$ c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \qquad (5) $$

$$ h_t = o_t \odot \tanh(c_t) \qquad (7) $$

This section describes the dataset and provides the evaluation metrics used in this work. The results of the experiments of the proposed method are then presented.

A. Dataset Description
Datasets provide a way to train and validate the efficacy of the proposed methods, hence playing an important role in motivating research. One of the challenges with studying credit card fraud detection systems is that real transaction data is considered highly confidential and is not publicly disclosed [30], [31]. Researchers have therefore suggested using synthetic data that is modeled after a real data set to contain similar patterns. For this work, we use BankSim software, a simulation tool specifically designed to emulate fraud data
[32]. BankSim generated data is obtained from the Kaggle website.
BankSim uses a multi-agent-based simulation methodology based on a sample of aggregated real transaction data provided by a bank in Spain. The original bank data is made up of thousands of transactional data records from November 2012 to April 2013. BankSim uses multiple agents of three different categories to mimic this original bank data: traders, customers, and fraudsters. These agents communicate with each other over a sequence of simulated days, resulting in a purchase transaction log closely resembling the original bank data.
The dataset used in this work contains transactions corresponding to card purchases made during 180 simulated days (a six-month period) and consists of 594,643 different transactions. There is a significant class imbalance problem associated with this dataset: only 7,200 transactions (≈1.2%) are labeled as “Fraud”, while the remaining 587,443 transactions are labeled as “Genuine”. Fig. 3 illustrates the class distribution of the dataset used in our experiments. Raw data provides information about transaction and account details. Each transaction message is represented as a feature vector composed of 10 features described in Table I.

Figure 3. Class distribution of credit card DataSet.

TABLE I. FEATURE VECTORS DESCRIPTION
Name | Description
Step | The day the transaction took place, from 1 to 180
Customer ID | A number identifying the customer account involved in the transaction
Age Category | A categorical value putting the customer into one of 8 different age groups
Gender | A categorical variable indicating the gender of the customer
Zip Code of account | The zip code associated with the customer
Merchant ID | A number identifying the merchant involved in the transaction
Zip Code of Merchant | The zip code of the merchant
Category purchase | A categorical variable indicating what type of good or service was purchased
Amount of purchase | The total amount that the transaction cost
Fraud status | A binary variable indicating if the transaction was fraudulent or not

B. Building the Model
We build a pattern recognition LSTM network with 9 input neurons, since each input feature present in our dataset is represented by its own input neuron. The feature 'Fraud status' is used as the output neuron. One hidden layer with 15 neurons was used to analyze the structure of the network. Table II presents the parameter values used in the proposed LSTM model.

TABLE II. LSTM TRAINING PARAMETERS
Parameters | LSTM values
Number of features | 9
LSTM memory size | 15
Epoch number | 100
Learning rate | [0.1, 0.4]
Loss function | Cross Entropy
Optimiser | Adam Optimiser

This model is based on the Keras deep learning framework. The implementation steps of the proposed model are detailed below:
• Reshape dataset into three-dimensional tensor (samples, number of timesteps, number of features).
• Define learning parameters (memory size, learning rate, batch size and epochs).
• Define LSTM cell.
• Set tensor variables for weight and bias vectors.
• Divide dataset into training, validation, and testing.
• Compute the output based on softmax activation function.
• Define cross entropy loss function.
• Add Adam optimization function to minimize the cross-entropy loss function.
• Repeat:
  Compute training error.
  Compute validation error.
  Update weights and biases using back propagation.
• Predict for testing dataset using trained LSTM.

Figure 4. LSTM loss function.
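The grouping, ordering and reshaping steps listed above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary keys (`account`, `step`, `features`, `fraud`), the window length `timesteps`, and the helper names `min_max_scale` and `build_sequences` are hypothetical, and fixed-length sliding windows are only one possible way to obtain a (samples, timesteps, features) structure.

```python
from collections import defaultdict


def min_max_scale(rows):
    """Rescale each feature column to [0, 1], as in Eq. (1)."""
    n_features = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_features)]
    highs = [max(r[j] for r in rows) for j in range(n_features)]
    scaled = []
    for r in rows:
        scaled.append([
            # A constant column has no spread; map it to 0.0 (assumption).
            (r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_features)
        ])
    return scaled


def build_sequences(transactions, timesteps):
    """Group by account, order by time, and cut sliding windows.

    Returns (X, y): X is a nested list shaped (samples, timesteps, features);
    y holds the fraud label of the last transaction in each window.
    """
    by_account = defaultdict(list)
    for tx in transactions:
        by_account[tx["account"]].append(tx)

    X, y = [], []
    for account_txs in by_account.values():
        account_txs.sort(key=lambda tx: tx["step"])  # chronological order
        for end in range(timesteps, len(account_txs) + 1):
            window = account_txs[end - timesteps:end]
            X.append([tx["features"] for tx in window])
            y.append(window[-1]["fraud"])
    return X, y


# Toy data with hypothetical values: two accounts, two numeric features each.
raw = [
    {"account": "C1", "step": 2, "features": [20.0, 1.0], "fraud": 0},
    {"account": "C1", "step": 1, "features": [10.0, 0.0], "fraud": 0},
    {"account": "C1", "step": 3, "features": [90.0, 1.0], "fraud": 1},
    {"account": "C2", "step": 1, "features": [50.0, 0.0], "fraud": 0},
    {"account": "C2", "step": 2, "features": [30.0, 1.0], "fraud": 0},
]
scaled = min_max_scale([tx["features"] for tx in raw])
for tx, feats in zip(raw, scaled):
    tx["features"] = feats

X, y = build_sequences(raw, timesteps=2)
print(len(X), len(X[0]), len(X[0][0]))  # samples, timesteps, features
```

Labeling each window with the status of its most recent transaction is a design choice made for this sketch; it matches the idea of predicting a transaction label from the sequence of past transactions, but other labeling schemes are possible.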
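The "Define LSTM cell" step relies on the gate equations of Section IV.B. One time step of such a cell can be sketched in plain Python as below. This is an illustrative sketch rather than the Keras implementation used in the paper: the helper names (`sigmoid`, `affine`, `lstm_step`) and the parameter dictionary layout are hypothetical, and the output gate `o_t` is assumed to take the standard LSTM form with its own weights `W_o` and bias `b_o`, mirroring the input and forget gates.

```python
import math


def sigmoid(z):
    """Logistic function used by the gates."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W.v + b for a weight matrix W (list of rows) and bias vector b."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Eqs. (2)-(5) and (7)."""
    concat = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    c_tilde = [math.tanh(z) for z in affine(params["W_c"], params["b_c"], concat)]  # Eq. (2)
    i_t = [sigmoid(z) for z in affine(params["W_i"], params["b_i"], concat)]  # Eq. (3)
    f_t = [sigmoid(z) for z in affine(params["W_f"], params["b_f"], concat)]  # Eq. (4)
    # Output gate: assumed standard form, same structure as the other gates.
    o_t = [sigmoid(z) for z in affine(params["W_o"], params["b_o"], concat)]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]  # Eq. (5)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]  # Eq. (7)
    return h_t, c_t


# Toy check with 2 hidden units and 1 input feature. All weights and biases
# are zero, so every gate equals sigmoid(0) = 0.5 and the candidate is 0.
hidden, n_in = 2, 1
zeros_W = [[0.0] * (hidden + n_in) for _ in range(hidden)]
zeros_b = [0.0] * hidden
params = {name: zeros_W for name in ("W_c", "W_i", "W_f", "W_o")}
params.update({name: zeros_b for name in ("b_c", "b_i", "b_f", "b_o")})

h_t, c_t = lstm_step(x_t=[0.7], h_prev=[0.0, 0.0], c_prev=[2.0, -4.0], params=params)
print(c_t)  # the forget gate keeps half of the previous cell state
```

With zero weights the forget gate halves the previous cell state and the input gate adds nothing, which makes the role of each term in Eq. (5) easy to see; in a trained network the gate values depend on the current input and the previous hidden state.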
The loss function used for the pattern recognition network is cross-entropy. Fig. 4 shows the performance plot of the train and validation data subsets. In our case the network is well trained, since the loss function decreases for both the training and validation data.

C. Performance Metrics
In this study, we trained the LSTM network with our dataset divided into three sets. The first subset, 70% of the data, is the training set; the second subset, 15% of the data, is the validation set; and the last subset, 15% of the data, is used to test the network generalization.
To assess the performance of our model more precisely, we introduce the following evaluation metrics represented by Eqs. (8), (9) and (10):
The Mean Square Error (MSE):

$$ \mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} \left( y_n - y'_n \right)^2 \qquad (8) $$

where $y_n$ is the original value associated with the $n$th sample and $y'_n$ is the predicted value.
The Mean Absolute Error (MAE):

$$ \mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left| y_t - y'_t \right| \qquad (9) $$

The Root Mean Square Error (RMSE):

$$ \mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{t=1}^{N} \left( y_t - y'_t \right)^2 } \qquad (10) $$

In the above formulas, $y_t$ represents the original value at moment $t$, $y'_t$ represents the predicted value at moment $t$, and $N$ is the total number of test samples. The smaller the values of MSE, MAE, and RMSE, the smaller the deviation between the predicted value and the original value. Table III lists the results obtained for the LSTM model over the last 10 epochs.

TABLE III. LIST OF 10 LAST EPOCHS RESULTS
Epoch | AUC | MSE | MAE
1 | 0.9953 | 0.0037 | 0.0067
2 | 0.9949 | 0.0042 | 0.0078
3 | 0.9956 | 0.0034 | 0.0063
4 | 0.9951 | 0.0039 | 0.0069
5 | 0.9955 | 0.0036 | 0.0066
6 | 0.9951 | 0.0038 | 0.0069
7 | 0.9953 | 0.0037 | 0.0067
8 | 0.9954 | 0.0036 | 0.0065
9 | 0.9951 | 0.0038 | 0.0069
10 | 0.9955 | 0.0035 | 0.0065

VI. CONCLUSION
In this study, we have proposed a sequence classifier based on LSTM networks to capture the behavior of individual cardholders when constructing a credit card fraud detection model. Future work will be dedicated to studying other variants of RNN and comparing their performance with our approach.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

AUTHOR CONTRIBUTIONS
Ibtissam Benchaji conducted the research and wrote the paper; Samira Douzi and Bouabid El Ouahidi directed, guided, and provided suggestions for each stage of the research. All authors approved the final version.

REFERENCES
[1] The Nilson Report, Trade Publication on Consumer Payment Systems, issue 1118, October 2017.
[2] T. P. Bhatla, V. Prabhu, and A. Dua, “Understanding credit card frauds,” Cards Business Review, pp. 1-15, June 2003.
[3] J. L. Liu, C. Chen, and H. Yang, “Efficient evolutionary data mining algorithms applied to the insurance fraud prediction,” International Journal of Machine Learning and Computing, vol. 2, no. 3, pp. 308-313, June 2012.
[4] J. T. Quah and M. Sriganesh, “Real-Time credit card fraud detection using computational intelligence,” Expert Systems with Applications, vol. 35, pp. 1721-1732, November 2008.
[5] J. Jurgovsky, M. Granitzer, K. Ziegler, S. Calabretto, P. E. Portier, L. He-Guelton, and O. Caelen, “Sequence classification for credit-card fraud detection,” Expert Systems with Applications, vol. 100, pp. 234-245, June 2018.
[6] M. Hlosta, R. Stríž, J. Kupčík, J. Zendulka, and T. Hruška, “Constrained classification of large imbalanced data by logistic regression and genetic algorithm,” International Journal of Machine Learning and Computing, vol. 3, no. 2, pp. 214-218, April 2013.
[7] I. Benchaji, S. Douzi, and B. E. Ouahidi, “Novel learning strategy based on genetic programming for credit card fraud detection in big data,” in Proc. International Conference Big Data Analytics, Data Mining and Computational Intelligence, July 2019, pp. 3-10.
[8] A. D. Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G. Bontempi, “Credit card fraud detection: A realistic modeling and a novel learning strategy,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 8, pp. 3784-3797, August 2018.
[9] A. Abdallah, A. M. Maarof, and A. Zainal, “Fraud detection system: A survey,” Journal of Network and Computer Applications, vol. 68, pp. 90-113, June 2016.
[10] S. Bhattacharyya, S. Jha, K. Tharakunnel, and J. C. Westland, “Data mining for credit card fraud: A comparative study,” Decision Support Systems, vol. 50, no. 3, pp. 602-613, February 2011.
[11] S. S. Dhok and G. R. Bamnote, “Credit card fraud detection using hidden Markov model,” International Journal of Advanced Research in Computer Science, vol. 3, no. 3, pp. 816-820, 2012.
[12] A. Srivastava, A. Kundu, S. Sural, and A. K. Majumdar, “Credit card fraud detection using hidden Markov model,” IEEE Transactions on Dependable and Secure Computing, vol. 5, no. 1, pp. 37-48, February 2008.
[13] A. D. Pozzolo, R. A. Johnson, O. Caelen, S. Waterschoot, N. V. Chawla, and G. Bontempi, “Using HDDT to avoid instances propagation in unbalanced and evolving data streams,” in Proc. International Joint Conference on Neural Networks, July 2014, pp. 588-594.
[14] C. Phua, V. Lee, K. Smith, and R. Gayler, “A comprehensive survey of data mining-based fraud detection research,” arXiv: 1009.6119, 2010.
[15] Y. Sahin, S. Bulkan, and E. Duman, “A cost-sensitive decision tree approach for fraud detection,” Expert Systems with Applications, vol. 40, no. 15, pp. 5916-5923, November 2013.
[16] A. D. Pozzolo, O. Caelen, Y. A. L. Borgne, S. Waterschoot, and G. Bontempi, “Learned lessons in credit card fraud detection from a
practitioner perspective,” Expert Systems with Applications, vol. 41, no. 10, pp. 4915-4928, August 2014.
[17] A. C. Bahnsen, D. Aouada, A. Stojanovic, and B. Ottersten, “Feature engineering strategies for credit card fraud detection,” Expert Systems with Applications, vol. 51, no. 1, pp. 134-142, June 2016.
[18] A. C. Bahnsen, A. Stojanovic, and D. Aouada, “Cost sensitive credit card fraud detection using Bayes minimum risk,” in Proc. the 12th International Conference on Machine Learning and Applications, December 2013, pp. 333-338.
[19] V. V. Vlasselaer, C. Bravo, O. Caelen, T. Eliassi-Rad, L. Akoglu, M. Snoeck, and B. Baesens, “APATE: A novel approach for automated credit card transaction fraud detection using network-based extensions,” Decision Support Systems, vol. 75, pp. 38-48, July 2015.
[20] V. R. Ganji and S. N. R. Mannem, “Credit card fraud detection using anti-k nearest neighbor algorithm,” International Journal on Computer Science and Engineering, vol. 4, no. 6, pp. 1035-1039, June 2012.
[21] J. Pun and Y. Lawryshyn, “Improving credit card fraud detection using a meta-classification strategy,” International Journal of Computer Applications, vol. 56, no. 10, pp. 41-46, October 2012.
[22] F. T. Liu, K. M. Ting, and Z. H. Zhou, “Isolation forest,” in Proc. the Eighth IEEE International Conference on Data Mining, 2008, pp. 413-422.
[23] X. Zhao, J. Zhang, and X. Qin, “Loma: A local outlier mining algorithm based on attribute relevance analysis,” Expert Systems with Applications, vol. 84, no. 30, pp. 272-280, October 2017.
[24] C. S. Hemalatha, V. Vaidehi, and R. Lakshmi, “Minimal infrequent pattern based approach for mining outliers in data streams,” Expert Systems with Applications, vol. 42, no. 4, pp. 1998-2012, March 2015.
[25] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533-536, 1986.
[26] J. L. Elman, “Finding structure in time,” Cognitive Science, vol. 14, no. 2, pp. 179-211, June 1990.
[27] A. Graves and N. Jaitly, “Towards end-to-end speech recognition with recurrent neural networks,” in Proc. the 31st International Conference on Machine Learning, June 2014, pp. 1764-1772.
[28] I. A. Basheer and M. Hajmeer, “Artificial neural networks: Fundamentals, computing, design, and application,” Journal of Microbiological Methods, vol. 43, no. 1, pp. 3-31, December 2000.
[29] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735-1780, November 1997.
[30] A. D. Pozzolo, O. Caelen, Y. A. L. Borgne, S. Waterschoot, and G. Bontempi, “Learned lessons in credit card fraud detection from a practitioner perspective,” Expert Systems with Applications, vol. 41, no. 10, pp. 4915-4928, August 2014.
[31] E. A. Lopez-Rojas and S. Axelsson, “A review of computer simulation for fraud detection research in financial datasets,” in Proc. Future Technologies Conference, December 2016, pp. 932-935.
[32] G. Vaughan, “Efficient big data model selection with applications to fraud detection,” International Journal of Forecasting, June 2018.

Copyright © 2021 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.

Ibtissam Benchaji received the engineer’s degree from the National School of Applied Sciences of Tangier, Morocco in 2009. She is currently a predoctoral researcher at the Computer Science Department of the Faculty of Sciences Rabat Agdal at the University Mohammed V, Rabat, Morocco under the supervision of Prof. El Ouahidi Bouabid. Her current research interests include machine learning techniques for anomaly and fraud detection.

Samira Douzi received the master’s degree in development quality in 2013 from the Department of Computer Science at the Faculty of Sciences Rabat Agdal. Since 2016 she has been a predoctoral researcher in the Department of Computer Science at the Faculty of Sciences Rabat Agdal, where she is pursuing a Ph.D. degree. Her main research interests include big data, deep learning and cyber security.

Bouabid El Ouahidi is a university professor and former head of the Computer Science Department. He received the Ph.D. degree in computer security from the University of Caen, France. His research interests include open distributed systems, quality of services of distributed applications, big data, cyber security and machine learning.