
https://doi.org/10.22214/ijraset.2022.47432
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue XI Nov 2022- Available at www.ijraset.com

Credit Card Fraud Detection Using Machine Learning
Disha A. Date¹, Rugvedi Y. More², Rutuja P. Harne³, Mr. Dilip M. Dalgade⁴
¹ ² ³UG Student, ⁴Assistant Professor, Department of Computer Engineering, Rajiv Gandhi Institute of Technology, University of Mumbai

Abstract: Credit card fraud has existed ever since credit cards were introduced, resulting in financial losses, identity theft, severe
security threats, and misuse of personal information. Such a situation, already dire at an individual level, only worsens when an
organization is involved. With the COVID-19 pandemic taking over the world and the introduction of quarantine, online
transactions have surged exponentially. Naturally, credit cards have become one of the main means to process these
transactions. With an extraordinary number of people making transactions every second, it becomes difficult to keep track of
fraudulent ones. An increase in online transactions also increases the risk of cybercrimes, in this case, fraudulent
transactions. In this paper, we proposed a fast yet light system to detect fraudulent transactions in the IEEE-CIS dataset. We
used feature engineering and data mining techniques to analyze the dataset and make it usable for the model. Then, we fit the
dataset to LightGBM and XGBoost to classify the transactions as fraud or non-fraud. Finally, we compared the performance of
the two models. Since we were dealing with a heavily skewed dataset, we gauged the performance of our model with the F1 score
and ROC AUC score. The obtained ROC AUC score was 95% for our proposed model.
Keywords: Credit card fraud, online transaction, feature engineering, machine learning, LightGBM.

I. INTRODUCTION
The number of e-commerce users has been steadily increasing in recent years, as has the size of online transactions. Fraudsters
frequently employ a variety of channels to steal card information and transfer large sums of money in a short period of time, resulting
in significant property losses for both customers and banks. As a result, machine learning and data mining can be used to create
fraud detection systems. Techniques used for this purpose are primarily classification-based. Data mining is applied to the dataset in
question, then, classification algorithms are implemented to detect fraudulent transactions.
Credit card fraud detection is a heavily researched problem. This allowed us to survey the popularly used datasets and
algorithms and to devise an approach that improves on them. SVM [1], Naïve Bayes, Logistic
Regression, Artificial Neural Networks, Decision Trees, and K-Nearest Neighbours were initially widely used for classification in
this domain [7][17]. They provided moderate accuracy, between 80% and 90%; with the incorporation of data mining
techniques and hybrid models [3][5], the scores went up further. Over the years, Random Forest was observed to be the
preferred choice to classify fraudulent data [2][4][6]. It was better at overcoming the errors caused by highly imbalanced data in the
fraud detection dataset. This occurs because each tree is generated by a random vector and each tree votes for the most popular
category to classify the inputs. Random Forest’s generalization performance is superior. Although it did extremely well, it had a
high training time and didn’t provide excellent results while working with a huge dataset. XGBoost, a step above Random Forest,
significantly reduced training times and increased the efficiency of memory usage [24][9]. However, in the year 2017, a faster
algorithm based on decision trees was released by Microsoft under the name Light Gradient Boosting Machine, or LightGBM. It is
lighter and faster than XGBoost; hence, we selected it as our classifier [8].
In this paper, we establish a fraud detection system based on the Gradient Boosting Decision Trees (GBDT) techniques of
LightGBM and XGBoost. The data processing centres on feature engineering: the hidden knowledge behind the data is
gained through the cleaning, correction, extraction, selection, and summarization of a vast number of data attributes. Our goal is to
surface the information that best helps the Gradient Boosting Decision Trees (GBDT) models detect fraudulent transactions, and to
find the behaviour patterns of genuine users and of fraudulent activity in the data.

A. Organisation of the Paper


Section 2 discusses the related work – the base concepts and the reasons for choosing those concepts. Section 3 talks about the
proposed system – a brief breakdown of our system, analysis of the dataset, and application of feature engineering techniques.

©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 745

Section 4 presents the results of our machine-learning models and compares LightGBM with XGBoost with respect to recall,
F1 score, and ROC AUC score. Lastly, in Section 5 we conclude the paper.

II. RELATED WORK


A. Ensemble Learning
Ensemble learning involves combining the strengths of a number of simpler base models to generate a prediction model. It can be
broken down into two tasks: developing a population of base learners from the training data, and then combining them to form the
composite predictor [11].
Ensemble learning methods exploit multiple machine learning algorithms to produce weak predictive results based on features
extracted through a diversity of projections on data, and fuse results with various voting mechanisms to achieve better performances
than that obtained from any constituent algorithm alone [10].
Ensemble learning has two umbrella families of methods. The first is averaging methods, whose driving principle is to build
several estimators independently and average their predictions; the combined estimator is usually better than any single base
estimator because its variance is reduced [12]. This family includes methods such as Random Forests and bagging ensembles.
The second, in contrast to the first, is boosting methods. Here, base estimators are built sequentially, and one tries to reduce the bias
of the combined estimator. The motivation is to combine several weak models to produce a powerful ensemble [12]. Examples of
these are AdaBoost and gradient tree boosting (LightGBM, XGBoost, etc.).
Ensembles have been shown to be an efficient way of improving predictive accuracy and/or decomposing a complex, difficult
learning problem into easier sub-problems [13].
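As a toy illustration of the voting idea (a minimal sketch, not taken from the paper or any cited work; the threshold rules and values are invented):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model predictions for one sample by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

# Three deliberately weak "classifiers": each labels a transaction amount
# as suspicious (1) or not (0). Rules and thresholds are illustrative only.
rules = [
    lambda amt: int(amt > 100),
    lambda amt: int(amt > 250),
    lambda amt: int(amt > 400),
]

amounts = [50, 120, 300, 500]
ensemble = [majority_vote([r(a) for r in rules]) for a in amounts]
print(ensemble)  # [0, 0, 1, 1]
```

Each sample is labelled by the category most of the constituent rules agree on, which is the fusion mechanism described above.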

B. XGBoost
XGBoost or Extreme Gradient Boosting is a large-scale, distributed general-purpose Gradient Boosting library, or a Boosting
iterative method [18]. The XGBoost model has the following benefits: great generalization, high expandability, and quick
processing speed. A series of base classifiers make up the XGBoost algorithm. Decision trees, KNN, SVM, logistic regression, and
other algorithms are among the available base classifiers. Once the base classifier is determined, the base classifiers are
linearly combined to optimize the method.

C. LightGBM
Gradient Boosting Decision Trees (GBDT) techniques have been shown to be among the top machine learning algorithms, and
LightGBM is one of the superior ones. Both LightGBM and XGBoost are decision tree-based algorithms; the difference between
them arises from how the trees are grown. LightGBM splits the tree leaf-wise according to the best fit, unlike XGBoost, which
splits the tree depth-wise. This means that when growing on the same leaf, the leaf-wise algorithm can reduce more loss than
the level-wise algorithm, leading to better accuracy that is rarely achieved by existing level-wise boosting algorithms
[21-23]. It is intended to be distributed and efficient. It offers the benefits of increased training efficiency and speed, reduction in
memory utilization and better precision. Parallel, distributed and GPU learning are all supported, and it can deal with massive
amounts of data.

D. Performance Metrics
Each prediction is labelled according to its correctness relative to the actual value:
1) True Positive (TP): The prediction and actual value are both positive.
2) True Negative (TN): The prediction and actual value are both negative.
3) False Positive (FP): The model’s prediction for a negative value is wrong. It predicts the actual negative value as positive.
4) False Negative (FN): The model’s prediction for a positive value is wrong. It predicts the actual positive value as negative.
5) Accuracy: Percentage of correctly predicted values (positive and negative) over total values.
Accuracy = (TP+TN)/(TP+TN+FP+FN)

6) Precision: Percentage of correct positive predictions out of all positive predictions.


Precision = TP/(TP+FP)


7) Recall or True Positive Rate (TPR): Percentage of correct positive predictions out of all the actual positive values.
Recall = TP/(TP+FN)

8) F1-Score: Combines precision and recall into one single metric. Harmonic mean of precision and recall or the weighted average
of precision and recall.
F1 score = 2*(Precision*Recall)/(Precision+Recall)

9) False Positive Rate (FPR): Percentage of incorrect negative predictions out of all the actual negative values (actual negative
values predicted as positive).
False Positive Rate = FP/(FP+TN)

10) ROC AUC Score: ROC stands for Receiver Operating Characteristic, and AUC refers to the area under the curve. To speak
about the ROC AUC score, we must first establish the ROC curve: a graph that depicts the trade-off between the true positive
rate (TPR) and the false positive rate (FPR). We calculate the TPR and FPR at each threshold and plot them on a single chart.
The higher the TPR and the lower the FPR at each threshold, the better; therefore, classifiers whose curves bend towards the
top-left are better, since a more top-left curve encloses a larger area and hence yields a higher ROC AUC score.
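The formulas above can be computed directly from the confusion-matrix counts. The sketch below is a minimal illustration; the counts used are invented, not the paper's results:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics defined above from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "fpr": fpr}

# Illustrative counts only (not from the paper's experiments).
m = classification_metrics(tp=80, tn=900, fp=20, fn=40)
print(round(m["recall"], 4), round(m["f1"], 4))  # 0.6667 0.7273
```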
We used Recall and ROC AUC as our performance metrics. Recall is computed only from the positive class (TP and FN), so the
overwhelming number of true negatives in a skewed dataset cannot inflate it; it directly measures how many of the rare
fraudulent transactions the model actually catches. This makes Recall a good measure for an imbalanced dataset.
AUC is a better performance measure than accuracy for comparing learning algorithms [19]; the closer the ROC AUC score is to 1,
the better the classifier. Unlike accuracy, ROC AUC is not biased by the class distribution of the test or evaluation data.
Another reason to use metrics other than accuracy for imbalanced data is that accuracy is insensitive to it [20]: as the skew in our
data increases, accuracy also increases, giving the illusion that the model is doing well. If a dataset has 1 fraudulent transaction
and 99 non-fraudulent ones, a model that always predicts non-fraudulent achieves 99% accuracy. Since our data is heavily skewed,
judging the model's performance on the basis of accuracy would not be correct.

III. PROPOSED SYSTEM


To get the most efficient results, every stage of our project had to be analyzed in detail. Our project is divided into the following
steps:
1) The IEEE-CIS Dataset
2) Data Pre-processing
3) Data Analysis
4) Building the Model

Collection of Data → Data Pre-processing (Data Cleaning) → Data Analysis (Data Mining) → Model Building (Feature Engineering) → Evaluation

Fig. 1 System Flow


A. IEEE-CIS Dataset
The IEEE-CIS dataset consists of two tables: the transaction table and the identity table. The key TransactionID connects
them; however, not all transactions have identity information. The data is labelled into two categories: isFraud = 0 or
isFraud = 1. However, because some characteristic information is missing, we clean and pre-process the data before combining
the tables to create the final training data. We can examine the two tables separately in order to better handle the data.
Table I - Transaction Table

Name           | Description             | Type
TransactionID  | ID of transaction       | ID
isFraud        | Binary values           | Categorical
TransactionDT  | Transaction date        | Time
TransactionAmt | Transaction amount      | Numerical
card1-card6    | Card (credit card)      | Categorical
ProductCD      | Product code            | Categorical
addr1-addr2    | Address                 | Categorical
M1-M9          | Anonymous features      | Categorical
P_emaildomain  | Purchaser email domain  | Categorical
R_emaildomain  | Receiver email domain   | Categorical
dist1-dist2    | Country distance        | Numerical
C1-C14         | Anonymous features      | Numerical
D1-D15         | Anonymous features      | Numerical
V1-V339        | Anonymous features      | Numerical, Categorical

Table II - Identity Table

Name          | Description         | Type
TransactionID | ID of transaction   | ID
DeviceType    | Type of device      | Categorical
DeviceInfo    | Device information  | Categorical
id01-id11     | Identification data | Numerical
id12-id38     | Identification data | Numerical, Categorical

The above-mentioned dataset is highly skewed, meaning the ratio of fraud to non-fraud samples is drastically uneven.


Fig. 2 Comparison of fraudulent and non-fraudulent data

To deal with data of such nature, it is essential to clean and analyze it to extract the useful attributes. By performing
feature engineering, we were able to remove many features that were redundant or highly correlated, which could easily have
biased our model and led to overfitting. We elaborate on these techniques below.

B. Data Pre-processing
This stage deals with data loading and integration. We focused on reducing the memory footprint of the dataset and eliminating
highly correlated columns. Removing collinear features can aid in the generalization of a model while also improving its
interpretability. If the correlation coefficient exceeds the threshold, those collinear features are removed. Finally, we used frequency
encoding for training and testing. Frequency encoding is a good fit for this data since one-hot encoding and label encoding
are not suitable for the nominal categorical data.
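The two steps above, dropping collinear features past a correlation threshold and frequency-encoding categoricals, can be sketched as follows. This is a minimal illustration on invented toy columns; the 0.9 threshold and the column names are assumptions, not the paper's actual configuration:

```python
from collections import Counter
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient (assumes non-constant columns)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_collinear(columns, threshold=0.9):
    """Drop the later column of every pair correlated above the threshold."""
    names = list(columns)
    dropped = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a not in dropped and b not in dropped:
                if abs(pearson(columns[a], columns[b])) > threshold:
                    dropped.add(b)
    return {k: v for k, v in columns.items() if k not in dropped}

def frequency_encode(values):
    """Replace each category by its occurrence count in the column."""
    counts = Counter(values)
    return [counts[v] for v in values]

cols = {"amt": [1, 2, 3, 4], "amt_x2": [2, 4, 6, 8], "noise": [5, 1, 4, 2]}
print(sorted(drop_collinear(cols)))              # ['amt', 'noise']
print(frequency_encode(["visa", "visa", "mc"]))  # [2, 2, 1]
```

Frequency encoding preserves a category's prevalence as a single numeric value, avoiding the column explosion of one-hot encoding on nominal data.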

C. Data Analysis
Since the dataset has over 1 million samples, it is necessary for us to get rid of unnecessary data and extract valuable information
from usable data. Data mining and feature engineering are performed on the dataset to clean the data and handle missing values.
While cleaning the data, we deleted columns having only one unique value or more than 90% missing values.
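A minimal sketch of this column-pruning rule (the toy columns below are invented, not actual dataset attributes):

```python
def prune_columns(columns, max_missing=0.9):
    """Drop columns with one unique value or too many missing entries."""
    kept = {}
    for name, values in columns.items():
        missing = sum(v is None for v in values) / len(values)
        uniques = {v for v in values if v is not None}
        if missing <= max_missing and len(uniques) > 1:
            kept[name] = values
    return kept

cols = {
    "useful": [1, 2, None, 3],
    "constant": [7, 7, 7, 7],
    "all_missing": [None, None, None, None],
}
print(sorted(prune_columns(cols)))  # ['useful']
```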
A very important attribute in the dataset is TransactionDT, but on its own, it is not very helpful. To obtain more accurate
information from it, we used a time series split. Because raw time-series data usually only has one single column to describe the
time attribute, namely date-time, feature engineering is critical in this area. Regarding this date-time data, feature engineering can be
seen as extracting useful information from such data as standalone (distinct) features - day of the week, month, hour, etc. With the
help of data visualization, we can get an insight of the data and visual graphs of the same.
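This kind of date-time feature extraction can be sketched as below. Note that the reference start date is an assumption made for illustration only; in the IEEE-CIS data, TransactionDT is a seconds offset from an undisclosed reference point.

```python
from datetime import datetime, timedelta

# TransactionDT is a seconds offset from an undisclosed reference point;
# this start date is an assumption for illustration, not the real one.
START = datetime(2017, 12, 1)

def time_features(transaction_dt_seconds):
    """Derive standalone date-time features from the raw offset."""
    ts = START + timedelta(seconds=transaction_dt_seconds)
    return {"month": ts.month, "day_of_week": ts.strftime("%A"),
            "hour": ts.hour}

print(time_features(86400 * 3 + 3600 * 14))
# {'month': 12, 'day_of_week': 'Monday', 'hour': 14}
```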
To give an example, we can observe from Fig. 3 that the fraud rate decreased in December. The month of December is prominent
in both the train and test datasets.

Fig. 3 Month-wise fraud rate

Similarly, in the following Fig. 4, we can see that the fraud rate tends to increase from the beginning of the week and tends to
decrease on Friday and Sunday.


Fig. 4 Day and week wise fraud rate

Further, we individually analysed categorical variables to either reduce their size, combine columns, or extract meaningful data by
splitting them further. We found that the DeviceType attribute in the test data contained devices that the train data did not have.
To handle that anomaly, we replaced those unseen values with NaN. Then, to get an idea of the device types, we represented them
visually. Interestingly, most fraudulent transactions have taken place from mobile phones (Fig. 5).

Fig. 5 Device type distribution of fraud


For the attributes R_emaildomain and P_emaildomain, we split them into the mail server and domain name (.com, .net, .jp, .fr, etc.).
Hence, two useful attributes were created out of one, which made processing easier.
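The split can be sketched as follows (a minimal illustration; the example domains are hypothetical, not drawn from the dataset):

```python
def split_email_domain(domain):
    """Split e.g. 'gmail.com' into the mail server and the domain suffix."""
    if domain is None:
        return None, None
    server, _, suffix = domain.partition(".")
    return server, "." + suffix if suffix else None

print(split_email_domain("yahoo.co.jp"))  # ('yahoo', '.co.jp')
print(split_email_domain("gmail.com"))    # ('gmail', '.com')
```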
This stage dealt with dropping highly correlated columns, creating aggregated features, handling missing values and reducing the
memory and dimensionality. Reduction of dimensionality was done by using frequency encoding for nominal categorical variables.
Finally, the datasets were saved in pickle format.

D. Model Building and Evaluation


PCA is a popular strategy for speeding up machine learning algorithms by removing correlated variables that do not aid decision
making. With fewer features, the training time of the algorithms is considerably decreased. Furthermore, when a dataset has
too many variables, overfitting occurs; PCA helps to alleviate overfitting by reducing the number of features. PCA
was applied to the 'V' columns, reducing them to 62 features.
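A minimal SVD-based PCA sketch (random stand-in data, not the actual 'V' columns; the component count here is illustrative, not the 62 used in the paper):

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)              # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T      # scores in the reduced space

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))           # stand-in for the 'V' columns
Z = pca_reduce(X, n_components=3)
print(Z.shape)  # (200, 3)
```

The singular vectors come out sorted by explained variance, so keeping the first rows of Vt keeps the directions that matter most.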
Section 2 elaborated on ensemble learning, the LightGBM and XGBoost models, and the reasons we chose them; LightGBM is
particularly good at dealing with a large amount of data.


IV. RESULTS
The confusion matrix and the ROC AUC curve for our LightGBM model are shown in Fig. 6 and Fig. 7 respectively.
Since we were dealing with a heavily skewed dataset, we gauged the performance of our model with the F1 and ROC AUC scores. Table III
shows the comparison of ROC, Recall and F1-score for LightGBM and XGBoost. As seen from the table, LightGBM outperforms
XGBoost in all the above metrics for the same dataset.

Fig. 6 Confusion Matrix Fig. 7 AUC and ROC curve

Table III - Comparative Analysis

Model    | ROC AUC | Recall | F1 Score
LightGBM | 0.9547  | 0.8337 | 0.4478
XGBoost  | 0.8824  | 0.2704 | 0.4117

LightGBM (Fig. 8) performs better than XGBoost (Fig. 9) in terms of training time and memory use as well.

Fig. 8 Training time for LightGBM Fig. 9 Training time for XGBoost

V. CONCLUSIONS
In this paper, a machine learning based model, LightGBM, has been used for the effective detection of financial fraud, and its
effectiveness is verified on the IEEE-CIS dataset. The imbalanced, skewed nature of the dataset has been investigated in this project.
Data pre-processing, particularly feature extraction, has been crucial to the performance of this model, as it is for any machine
learning workflow. The model handles the imbalanced credit card fraud data effectively, achieving an ROC AUC score of 0.95,
Recall of 0.83, and F1-score of 0.45, and performing better than the XGBoost model on the same dataset, which gave an
ROC AUC score of 0.88, Recall of 0.27, and F1-score of 0.41.

REFERENCES
[1] I. M. Mary, M. Priyadharsini, K. K and M. S. F, "Online Transaction Fraud Detection System," 2021 International Conference on Advance Computing and
Innovative Technologies in Engineering (ICACITE), 2021, pp. 14-16, doi: 10.1109/ICACITE51222.2021.9404750
[2] D. Shaohui, G. Qiu, H. Mai and H. Yu, "Customer Transaction Fraud Detection Using Random Forest," 2021 IEEE International Conference on Consumer
Electronics and Computer Engineering (ICCECE), 2021, pp. 144-147, doi: 10.1109/ICCECE51280.2021.9342259.
[3] O. Vynokurova, D. Peleshko, O. Bondarenko, V. Ilyasov, V. Serzhantova and M. Peleshko, "Hybrid Machine Learning System for Solving Fraud Detection
Tasks," 2020 IEEE Third International Conference on Data Stream Mining & Processing (DSMP), 2020, pp. 1-5, doi: 10.1109/DSMP47368.2020.9204244.


[4] D. Devi, S. K. Biswas and B. Purkayastha, "A Cost-sensitive weighted Random Forest Technique for Credit Card Fraud Detection," 2019 10th International
Conference on Computing, Communication and Networking Technologies (ICCCNT), 2019, pp. 1-6, doi: 10.1109/ICCCNT45670.2019.8944885
[5] M. R. Dileep, A. V. Navaneeth and M. Abhishek, "A Novel Approach for Credit Card Fraud Detection using Decision Tree and Random Forest Algorithms,"
2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), 2021, pp. 1025-1028, doi:
10.1109/ICICV50876.2021.9388431.
[6] W. Deng, Z. Huang, J. Zhang and J. Xu, "A Data Mining Based System For Transaction Fraud Detection," 2021 IEEE International Conference on Consumer
Electronics and Computer Engineering (ICCECE), 2021, pp. 542-545, doi: 10.1109/ICCECE51280.2021.9342376
[7] Aisha Abdallah, Mohd Aizaini Maarof, Anazida Zainal, Fraud detection system: A survey, Journal of Network and Computer Applications, Volume 68, 2016, Pages 90-113, ISSN 1084-8045, https://doi.org/10.1016/j.jnca.2016.04.007.
[8] Z. Song, "A Data Mining Based Fraud Detection Hybrid Algorithm in E-bank," 2020 International Conference on Big Data, Artificial Intelligence and Internet
of Things Engineering (ICBAIE), 2020, pp. 44-47, doi: 10.1109/ICBAIE49996.2020.00016.
[9] F. Wan, "XGBoost Based Supply Chain Fraud Detection Model," 2021 IEEE 2nd International Conference on Big Data, Artificial Intelligence and Internet of
Things Engineering (ICBAIE), 2021, pp. 355-358, doi: 10.1109/ICBAIE52039.2021.9390041.
[10] Zhou Z H. Ensemble Methods: Foundations and Algorithms. Chapman and Hall/CRC, 2012
[11] Hastie, T., Tibshirani, R., Friedman, J. (2009). Ensemble Learning. In: The Elements of Statistical Learning. Springer Series in Statistics. Springer, New York, NY. https://doi.org/10.1007/978-0-387-84858-7_16
[12] https://scikit-learn.org/stable/modules/ensemble.html
[13] Bartosz Krawczyk, Leandro L. Minku, João Gama, Jerzy Stefanowski, Michał Woźniak, Ensemble learning for data stream analysis: A survey, Information Fusion, Volume 37, 2017, Pages 132-156, ISSN 1566-2535, https://doi.org/10.1016/j.inffus.2017.02.004.
[14] Siddhant Bagga, Anish Goyal, Namita Gupta, Arvind Goyal, Credit Card Fraud Detection using Pipeling and Ensemble Learning, Procedia Computer Science, Volume 173, 2020, Pages 104-112, ISSN 1877-0509, https://doi.org/10.1016/j.procs.2020.06.014.
[15] Siddhartha Bhattacharyya, Sanjeev Jha, Kurian Tharakunnel, J. Christopher Westland, Data mining for credit card fraud: A comparative study, Decision Support Systems, Volume 50, Issue 3, 2011, Pages 602-613, ISSN 0167-9236, https://doi.org/10.1016/j.dss.2010.08.008.
[16] F. Ahmed and R. Shamsuddin, "A Comparative Study of Credit Card Fraud Detection Using the Combination of Machine Learning Techniques with Data
Imbalance Solution," 2021 2nd International Conference on Computing and Data Science (CDS), 2021, pp. 112-118, doi: 10.1109/CDS52072.2021.00026.
[17] Khaled Gubran Al-Hashedi, Pritheega Magalingam, Financial fraud detection applying data mining techniques: A comprehensive review from 2009 to 2019, Computer Science Review, Volume 40, 2021, 100402, ISSN 1574-0137, https://doi.org/10.1016/j.cosrev.2021.100402.
[18] Li, S., Zhang, X. Research on orthopedic auxiliary classification and prediction model based on XGBoost algorithm. Neural Comput & Applic 32, 1971-1979 (2020). https://doi.org/10.1007/s00521-019-04378-4
[19] https://www.site.uottawa.ca/~stan/csi7162/presentations/William-presentation.pdf
[20] https://towardsdatascience.com/selecting-the-right-metric-for-skewed-classification-problems-6e0a4a6167a7
[21] M. R. Machado, S. Karray and I. T. de Sousa, "LightGBM: an Effective Decision Tree Gradient Boosting Method to Predict Customer Loyalty in the Finance
Industry," 2019 14th International Conference on Computer Science & Education (ICCSE), 2019, pp. 1111-1116, doi: 10.1109/ICCSE.2019.8845529.
[22] D. Zhang and Y. Gong, "The Comparison of LightGBM and XGBoost Coupling Factor Analysis and Prediagnosis of Acute Liver Failure," in IEEE Access,
vol. 8, pp. 220990-221003, 2020, doi: 10.1109/ACCESS.2020.3042848
[23] Xiaojun Ma, Jinglan Sha, Dehua Wang, Yuanbo Yu, Qian Yang, Xueqi Niu, Study on a prediction of P2P network loan default based on the machine learning LightGBM and XGboost algorithms according to different high dimensional data cleaning, Electronic Commerce Research and Applications, Volume 31, 2018, Pages 24-39, ISSN 1567-4223, https://doi.org/10.1016/j.elerap.2018.08.002.
[24] V. Jain, M. Agrawal and A. Kumar, "Performance Analysis of Machine Learning Algorithms in Credit Cards Fraud Detection," 2020 8th International
Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), 2020, pp. 86-88, doi:
10.1109/ICRITO48877.2020.9197762.
