Itmconf Icdsia2023 02012
ICDSIA-2023
1 Introduction
In the time of a pandemic, online shopping has proven beneficial to the public, as people can
purchase anything they want from the comfort of their homes. Online payment is
comfortable, convenient, and easy to use. Nowadays, many people pay by credit card when
shopping. A credit card is a thin, rectangular piece of plastic or metal issued to a user
as a mode of payment. Generally, a credit card offers a certain credit limit, which can be
used to make purchases, transfer balances, or take cash advances, and the user must pay
back the borrowed amount in the future.
A credit card user needs to pay at least a minimum amount of the balance every month by the
due date. Credit card usage has a good number of advantages, but we cannot ignore the
financial losses that result from online payments made with credit cards. Nowadays,
criminals use credit cards to commit fraud. Fraud can be described as obtaining money,
goods, or services in an illegal manner. Credit card fraud occurs when a person uses
another person’s credit card for personal reasons while the card owner and the card issuer
are both unaware of it.
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons
Attribution License 4.0 (https://creativecommons.org/licenses/by/4.0/).
ITM Web of Conferences 53, 02012 (2023) https://doi.org/10.1051/itmconf/20235302012
The person who is using another person's credit card does not have any connection with
the cardholder. The number of credit card users has increased in several countries, but
due to a lack of trust in the payment system, many users do not use credit cards for
payment or have abandoned them. Therefore, there is a need for a reliable fraud detection
system so that credit card users can use their cards safely. Fraud detection can be
described as a classification problem: a large number of transactions are examined and
each one is categorised as fraudulent or genuine. Different types of credit card fraud
exist; a few of them are application fraud, duplication fraud, identity fraud, skimming,
card-not-present (CNP) fraud, lost and stolen card fraud, mail non-receipt card fraud,
account takeover, triangulation, merchant collusion, and site cloning.
2 Literature Review
been concluded that ANN gives the best accuracy [8]. Dejan Varmedja, Mirjana Karanovic,
Srdjan Sladojevic, Marko Arsenovic, and Andras Anderla have compared various algorithms
like Logistic Regression, Random Forest, Naive Bayes, and Multilayer Perceptron on the
basis of accuracy, recall, and precision and concluded that Random Forest is a more suitable
algorithm for credit card fraud detection [9]. Navanshu Khare and Saad Yunus Sait have
compared the decision tree, support vector machine, logistic regression, and random forest
algorithms on the basis of various metrics and concluded that random forest is more
accurate than the other three [10].
3 Experimental Methodology
3.1 Data Analysis
The dataset and the reference analysis have been taken from the Kaggle site [12]. In this
paper, we present various graphs that provide more insight into the data in a more
user-friendly manner than the reference material. The dataset is a simulated credit card
transaction dataset containing legitimate and fraudulent transactions from January 1, 2019
to December 31, 2020. It can be seen from the following figure that the dataset is highly
imbalanced.
As part of the data cleaning, the dataset has been checked for null values as well as
duplicate records. Type casting has been applied wherever it was required. As part of the
data preprocessing, the OneHotEncoding method is used for the category and gender
features. The Target Guided Mean encoding method is used for the state and trans_dayofweek
features.
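The two encodings named above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `one_hot_encode` and `target_mean_encode` and the toy values are hypothetical, and "Target Guided Mean encoding" is assumed here to mean replacing each category with the mean of `is_fraud` within that category.

```python
from collections import defaultdict


def one_hot_encode(values):
    """Return (categories, rows); each row is a 0/1 indicator list."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


def target_mean_encode(values, targets):
    """Replace each category with the mean target value of that category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, target in zip(values, targets):
        sums[value] += target
        counts[value] += 1
    mapping = {cat: sums[cat] / counts[cat] for cat in counts}
    return mapping, [mapping[value] for value in values]


# Toy data (hypothetical values, not taken from the dataset).
gender = ["F", "M", "F", "M"]
state = ["NY", "NY", "TX", "TX"]
is_fraud = [1, 0, 0, 0]

gender_categories, gender_rows = one_hot_encode(gender)
state_mapping, state_encoded = target_mean_encode(state, is_fraud)
```

In practice the mean for each category would be computed on the training data only and then applied to the test data, so that test labels do not leak into the features.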
The number of records in the training dataset is 1,296,675. The number of records in the
test dataset is 555,719. It can be seen that the ratio between the training and test
datasets is approximately 70%:30%.
The following figure shows the gender distribution of card holders for fraudulent as well
as normal transactions. It can also be observed that the fraudulent transactions are
equally distributed among male and female card owners.
From the following figure, it can be observed that the majority of fraudulent transactions
take place between hour 21 and hour 04, when genuine card owners are likely to be sleeping.
From the following figure, it can be seen that fraudulent transactions occur more often in
categories like grocery_pos, shopping_net, and misc_net. The attribute “is_fraud” with a
value of 1 indicates that the transaction is fraudulent, and 0 indicates that the
transaction is not fraudulent.
As we have seen in the previous section, the dataset is imbalanced. ML algorithms have
difficulty learning when the classes are not equally represented. Because of the large
volume of data, parameter tuning may also take a long time. Therefore, we have taken
samples from both the train and test datasets to work with the different models. To
balance the imbalanced dataset, we have used the SMOTE and RandomUnderSampler techniques.
For parameter tuning, we have used HalvingRandomSearchCV because it is faster than
GridSearchCV and RandomizedSearchCV.
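The idea behind the two balancing techniques can be sketched without any library. This is a simplified illustration, not the imbalanced-learn implementation used in the experiments: the functions `random_undersample` and `smote_like_oversample` and the toy feature vectors are hypothetical. Undersampling discards majority rows at random, while SMOTE-style oversampling creates synthetic minority rows by interpolating between a minority row and one of its nearest minority neighbours.

```python
import random


def random_undersample(majority, minority, rng):
    """Keep all minority rows and an equally sized random subset of majority rows."""
    kept = rng.sample(majority, len(minority))
    return kept, list(minority)


def smote_like_oversample(minority, n_new, k, rng):
    """Create n_new synthetic rows on segments between minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


rng = random.Random(0)
# Hypothetical 2-D feature vectors: 8 genuine rows and 3 fraudulent rows.
genuine = [(float(i), float(i % 3)) for i in range(8)]
fraud = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

under_genuine, under_fraud = random_undersample(genuine, fraud, rng)
new_fraud = smote_like_oversample(fraud, n_new=5, k=2, rng=rng)
```

Resampling of this kind is normally applied to the training split only, so that the test split keeps the original class distribution.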
Here, the data has been tested using various ML algorithms. For any data science work,
a few packages are essential. During the implementation, NumPy (Numerical Python) is used
for numeric calculation, Pandas is used for reading and storing data, Matplotlib is used
for visualizing the data, and Seaborn is used for plot customization such as colour
settings. Anaconda Navigator is used as the environment for implementing the machine
learning algorithms, and Jupyter Notebook is used to write and run the code. For
implementation purposes, the decision tree and random forest algorithms are used.
A decision tree is a supervised learning algorithm used to solve classification and
regression problems. A decision tree always begins with a root node at the top, which is
the starting point. The root is followed by splits that produce branches. A leaf node is
a terminal node: it does not produce any new branches. Decision trees use the concept of
entropy, which measures the impurity of a set of records, that is, how mixed the classes
in it are.
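As a small worked illustration of entropy, the sketch below computes the Shannon entropy of a set of class labels and the information gain of a candidate split. The function names and the toy label lists are hypothetical and are not taken from the paper's implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, left, right):
    """Reduction in entropy obtained by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Toy labels: 0 = genuine, 1 = fraudulent (hypothetical values).
parent = [0, 0, 0, 0, 1, 1, 1, 1]

# A split that separates the classes perfectly removes all impurity.
perfect_gain = information_gain(parent, [0, 0, 0, 0], [1, 1, 1, 1])

# A split that leaves both children mixed removes only part of it.
partial_gain = information_gain(parent, [0, 0, 0, 1], [0, 1, 1, 1])
```

A tree-building algorithm that uses entropy chooses, at each node, the split with the highest information gain.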
The random forest classifier is an improved version of the decision tree classifier.
Whereas a single decision tree learns one sequence of conditions, a random forest combines
many trees, each trained on a random sample of the records and features, and predicts by
majority vote.
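The ensemble idea can be sketched as follows. This is a deliberately simplified illustration, not the classifier used in the experiments: each "tree" here is a single-split stump on one randomly chosen feature, trained on a bootstrap sample, and the forest predicts by majority vote. All names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows."""
    best_errors, best_threshold = None, None
    for threshold in sorted({row[feature] for row in rows}):
        predictions = [1 if row[feature] > threshold else 0 for row in rows]
        errors = sum(p != y for p, y in zip(predictions, labels))
        if best_errors is None or errors < best_errors:
            best_errors, best_threshold = errors, threshold
    return feature, best_threshold


def fit_forest(rows, labels, n_trees, rng):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    forest = []
    n_rows, n_features = len(rows), len(rows[0])
    for _ in range(n_trees):
        sample = [rng.randrange(n_rows) for _ in range(n_rows)]  # with replacement
        feature = rng.randrange(n_features)
        forest.append(
            fit_stump([rows[i] for i in sample], [labels[i] for i in sample], feature)
        )
    return forest


def predict(forest, row):
    """Majority vote over the stumps."""
    votes = Counter(1 if row[feature] > threshold else 0 for feature, threshold in forest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D rows: label 0 = genuine, label 1 = fraudulent.
rows = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (2.0, 2.0),
        (8.0, 9.0), (9.0, 8.0), (8.5, 8.5), (9.0, 9.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(rows, labels, n_trees=11, rng=random.Random(0))
low_prediction = predict(forest, (0.5, 0.5))
high_prediction = predict(forest, (9.5, 9.5))
```

Averaging over many trees trained on different samples is what reduces the variance of a single tree, which is the usual motivation for preferring a forest.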
Because the dataset is highly imbalanced, even a classifier that labels every transaction
as genuine achieves close to 100% accuracy. Accuracy is therefore not a proper metric, and
other metrics are used for evaluating the performance of the model. In the paper [11],
the results of various classifiers have been compared and analysed, and using that
reference, we have decided to use the decision tree and random forest algorithms for our
dataset. The following figure shows the result of the performance evaluation of the
decision tree classifier before parameter tuning.
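The point about accuracy can be illustrated with a small calculation. The sketch below uses hypothetical counts (1000 transactions, 5 of them fraudulent) and a trivial classifier that predicts "genuine" for everything; the function names are illustrative and not part of the paper's code.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) with class 1 (fraud) as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 score; a zero denominator yields 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical imbalanced sample: 5 fraudulent and 995 genuine transactions.
y_true = [1] * 5 + [0] * 995
y_pred = [0] * 1000  # trivial classifier: everything is "genuine"

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

Here the accuracy is very high although not a single fraudulent transaction is detected, which is why recall, precision, and the F1 score on the fraud class are more informative for this problem.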
As can be seen from figures 6 and 7, the classifier suffers from overfitting, as it
performs well on the training data but not on the test data. The following figure shows
the result of the performance evaluation of the random forest classifier before parameter
tuning.
Again, after hyperparameter tuning, we have the best values for parameters like
'n_estimators', 'min_samples_split', 'min_samples_leaf', 'max_features' and 'max_depth'.
The model has been fitted again with these best parameters. The following figures show the
random forest classifier’s performance and confusion matrix after parameter tuning.
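The principle behind a halving random search over such parameters can be sketched in plain Python. This is an assumption-laden illustration of successive halving, not scikit-learn's HalvingRandomSearchCV: the search space values, the `toy_score` function (a stand-in for a cross-validated score that ignores the budget), and the budget numbers are all hypothetical.

```python
import random


def successive_halving(candidates, score_fn, min_budget, max_budget, factor=3):
    """Score all candidates on a small budget, keep the best 1/factor,
    multiply the budget by factor, and repeat until one candidate is left
    or the budget limit is exceeded."""
    budget = min_budget
    survivors = list(candidates)
    while len(survivors) > 1 and budget <= max_budget:
        ranked = sorted(survivors, key=lambda c: score_fn(c, budget), reverse=True)
        survivors = ranked[: max(1, len(ranked) // factor)]
        budget *= factor
    return survivors[0]


def toy_score(params, budget):
    """Hypothetical stand-in for a validation score; a real score would train
    a model on `budget` samples and evaluate it."""
    return -abs(params["max_depth"] - 8) - abs(params["n_estimators"] - 200) / 100


# Hypothetical search space using the parameter names mentioned in the text.
search_space = {
    "n_estimators": [50, 100, 200, 400],
    "max_depth": [4, 8, 16, 32],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2"],
}

rng = random.Random(0)
candidates = [
    {name: rng.choice(values) for name, values in search_space.items()}
    for _ in range(9)
]
best = successive_halving(candidates, toy_score, min_budget=100, max_budget=10_000)
```

Because most candidates are evaluated only on a small budget and discarded early, far fewer full-size model fits are needed than in an exhaustive grid search, which is the source of the speed advantage.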
As can be seen from figures 9 and 10, the result has improved slightly in comparison with
the decision tree algorithm, but the random forest classifier still performs well on the
training data and not on the test data.
Fig. 11. Confusion Matrix for Random Forest After Parameter Tuning
4 Conclusion
This paper emphasizes the significance of technological advancements and the widespread
availability of online shopping. It acknowledges the time-saving benefits and convenience of
online shopping, particularly the elimination of the need to physically visit stores. Credit card
payment emerges as a popular mode of transaction in this digital era, with a substantial
number of credit card users worldwide. However, the increasing prevalence of fraudulent
credit card transactions poses challenges for both banks and customers, resulting in financial
losses. To address these issues, the paper underscores the importance of implementing a
secure credit card fraud detection system. It explores the application of various machine
learning algorithms, including Naïve Bayes, Logistic Regression, SVM, Decision Trees,
Random Forest, Genetic Algorithm, J48, and AdaBoost, for credit card fraud detection. These
algorithms play a crucial role in analyzing datasets and identifying fraudulent transactions
accurately.
References
1. Ms. K. RamaKalyani and Prof. Dr. D. Uma Devi, Fraud Detection of Credit Card
Payment System by Genetic Algorithm in the International Journal of Scientific &
Engineering Research, Volume 3, Issue 7, July-2012, ISSN 2229-5518
2. Ms. Rimpal R. Popat and Mr. Jayesh Chaudhary, A Survey on Credit Card Fraud
Detection Using Machine Learning in the Proceedings of the 2nd International
Conference on Trends in Electronics and Informatics (ICOEI 2018) IEEE Conference
Record: # 42666; IEEE Xplore ISBN: 978-1-5386-3570-4
3. S P Maniraj, Aditya Saini, Swarna Deep Sarkar, Shadab Ahmed, Credit Card Fraud
Detection Using Machine Learning and Data Science in the International Journal of
Engineering Research and Technology (IJERT), ISSN: 2278-0181, Vol. 8 Issue 09,
September-2019
4. Aman Gulati, Prakash Dubey, Md Fuzail C, Jasmine Norman and Mangayarkarasi R,
Credit card fraud detection Using neural network and geolocation in the 14th ICSET-2017,
IOP Conf. Series: Materials Science and Engineering 263 (2017) 042039,
doi:10.1088/1757-899X/263/4/042039
5. Andhavarapu Bhanusri, K. Ratna Sree Valli, P. Jyothi, G. Varun Sai, R. Rohith Sai
Subash, Credit card fraud detection Using Machine learning algorithms in the Quest
Journals, Journal of Research in Humanities and Social Science, Volume 8, Issue 2
(2020), pp. 04-11, ISSN (Online): 2321-9467
6. Mr. John O. Awoyemi, Mr. Adebayo O. Adetunmbi and Mr. Samuel A. Oluwadare,
Credit card fraud detection using Machine Learning Techniques: A Comparative
Analysis, 978-1-5090-4642-3/17/$31.00 ©2017 IEEE
7. Ms. Heta Naik, Credit Card Fraud Detection for Online Banking Transactions in the
International Journal for Research in Applied Science and Engineering Technology
(IJRASET) ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 6.887 Volume 6 Issue
IV, April 2018
8. Mr. Varun Kumar K S, Mr. Vijaya Kumar V G, Mr. Vijay Shankar A and Ms. Pratibha
K, Credit Card Fraud Detection using Machine Learning Algorithms in the International
Journal of Engineering Research & Technology (IJERT) http://www.ijert.org, ISSN:
2278-0181, Vol. 9 Issue 07, July-2020
9. Dejan Varmedja, Mirjana Karanovic, Srdjan Sladojevic, Marko Arsenovic, Andras
Anderla, Credit Card Fraud Detection - Machine Learning methods in the Conference
Paper, March 2019, DOI: 10.1109/INFOTEH.2019.8717766,
978-1-5386-7073-6/19/$31.00 ©2019 IEEE
10. Navanshu Khare and Saad Yunus Sait, Credit Card Fraud Detection Using Machine
Learning Models and Collating Machine Learning Models in the International Journal of
Pure and Applied Mathematics, Volume 118 No. 20 2018, 825-838, ISSN: 1314-3395
(on-line version)
11. Mr. Dhwanir Shah and Dr. Lokesh Kumar Sharma, A Survey on Credit Card Fraud
Detection Using Machine Learning in the National Conference on Contemporary
Practices in Management & Information Technology KSCON2021 (Virtual mode)
(November 2021), published in the E-Book with ISBN No. 978-93-92008-00-9
12. Kaggle.com. Credit Card Fraud Detection. [online] Available at:
https://www.kaggle.com/code/rahulrajml/fraud-detection-systematic-approach/data