Payments Fraud Detection using ML methods: Exploring Performance, Ethical and Real-World Considerations in Machine Learning based Fraud Detection for Secure Payments
1 Introduction
The number and severity of cyber-attacks are increasing rapidly these days, with mali-
cious parties often targeting transactions made by banks, financial institutions and pay-
ment service providers (PSPs). The European Central Bank found that in 2019 alone,
the value of fraudulent transactions amounted to over €1.8 billion [1]. Credit card fraud
has thus become a major issue for banks and financial institutions leading to theft of
funds, stolen personal data and massive financial damages to customers and business
owners. Traditional fraud detection methods are often limited by their lack of adaptability to new and evolving patterns of fraud. The aim of this project is to develop
a fraud detection system using machine learning (ML) techniques to classify fraudulent
transactions. This can serve to improve monitoring, response and support many of the
current threat mitigation practices in place at financial institutions. However, there are
challenges associated with the use of ML for fraud detection, namely imbalanced datasets, where legitimate transactions far outnumber fraudulent payments, which can lead to model bias [8, 14]. While there is a growing body of research on
the issue of class imbalance, there is a lack of studies that address the problem using
anomaly detection algorithms [2, 6]. This work was carried out in collaboration with Deloitte Cyber and involves leveraging their expertise to understand the security land-
scape for payments, including the types of threats faced, tactics used by attackers, indi-
cators for fraud and the selection of appropriate threat mitigation strategies. Through
this paper we are looking to answer the following research question: To what extent
can machine learning (ML) techniques be used to develop methods supporting fraud
detection and security of payments?
2 Related Work
Previous works have shown that ML-based fraud detection can help improve accuracy
and efficiency of threat mitigation by analyzing large amounts of data, identifying fea-
tures and patterns in the data and adapting to them. For example, Mathew et al. inves-
tigate common fraud interface processes and propose a methodology for classifying
fraudulent credit card transactions, applying Logistic Regression, K-NN, Random For-
est and Decision tree methods for fraud detection on a credit card dataset [10]. A similar
study by Asha and Kumar [4] also evaluated SVMs and K-NN alongside artificial neu-
ral networks, achieving high accuracy, precision and recall. Khine and
Khin propose a novel boosting approach using ensemble methods and apply it for fraud
detection with sample benchmark datasets [7]. However, as introduced above, there are challenges associated with the use of ML for fraud detection. For instance, both Kulatilleke [8] and Tomar et al. [14] address the challenge of imbalanced fraud data, where the number of fraudulent transactions is far smaller than the number of legitimate payments, which can lead to bias. Attempts to solve this problem have mainly looked to resampling
techniques [3, 11], cost-sensitive learning [13] and ensemble methods such as bagging
[14]. However, as mentioned in Section 1, there is a lack of research addressing the
problem of class imbalance using anomaly detection algorithms. These techniques have
various advantages as they can be used to identify instances of the minority class and
give them more weight in the model, effectively balancing the class distribution. This
provides a more balanced dataset for training a machine learning model and helps re-
duce model bias. They can also be used to identify unusual patterns in data and flag
transactions that deviate from normal behavior, which can be a strong indicator for
fraud. Finally, there is a need for more research on the interpretability of ML models, as
it is important to understand how a model makes predictions in order to ensure legiti-
mate transactions are not incorrectly flagged as fraudulent. Explainable AI (XAI) tech-
niques can help provide insights into the decision-making process of ML models and
make it more transparent, so that the decisions made by a model can be justified and
trusted by humans.
3 Methodology
The method followed for this study relies on a machine learning process for classifying
credit card transactions, followed by research on current approaches to fraud detection
and how they can be supported by these ML techniques.
Data preprocessing. The preprocessing steps included imputing null values for numerical columns with the bounded mean, accounting for outliers, and imputing null values in categorical columns using the mode. The categorical columns were then encoded with one-hot encoding to ensure an overall quantitative representation.
Correlation analysis was performed to assess the correlation between the feature col-
umns of the Vesta dataset as well as with the target variable. This allowed us to identify
the most correlated feature pairs. The results of this analysis were used to perform fea-
ture engineering on the dataset, i.e. dropping features based on correlation and gener-
ating new polynomial variables that can help improve performance. At the end of this
phase, the dataset to be used contained 144,233 transactions and 609 feature columns
(including transactionID, transactionDT timestamp, payment amount
(TransactionAmt) and 'Class' label). This dataset is used for subsequent experimentation and is referred to as the 'base' dataset. In this process, the PCA
(Principal Components Analysis) algorithm was also applied to transform data to a
lower-dimensional dataset by identifying the most important underlying features. This
is done by finding the directions, called principal components, that capture the
maximum amount of variation in the data [14].
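The following is a minimal sketch of this preprocessing pipeline using pandas and scikit-learn. Column names such as TransactionAmt and Class follow the Vesta dataset, while the clipping quantiles used to form the bounded mean and the retained PCA variance are illustrative assumptions rather than the study's exact settings.

```python
import pandas as pd
from sklearn.decomposition import PCA

# Illustrative sketch of the preprocessing steps described above (assumed column names).
df = pd.read_csv("vesta_transactions.csv")

num_cols = df.select_dtypes(include="number").columns.drop("Class", errors="ignore")
cat_cols = df.select_dtypes(exclude="number").columns

# Impute numerical nulls with a bounded ("clipped") mean to limit outlier influence,
# and categorical nulls with the mode. The 1%/99% clipping bounds are an assumption.
for col in num_cols:
    bounded = df[col].clip(df[col].quantile(0.01), df[col].quantile(0.99))
    df[col] = df[col].fillna(bounded.mean())
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

# One-hot encode categorical columns for a fully quantitative representation.
df = pd.get_dummies(df, columns=list(cat_cols))

# Project the features onto their principal components; the retained variance is an assumption.
features = df.drop(columns=["Class"])
pca = PCA(n_components=0.95)          # keep components explaining 95% of the variance
X = pca.fit_transform(features)
y = df["Class"].to_numpy()
```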
Outlier detection techniques. In order to account for outliers in the dataset, different
methods were applied to the preprocessed dataset and their results compared to one
another. To start off, the Z-score method (also known as standard score method) was
used to find outliers statistically. As per this approach, the Z-score value is calculated
for each data point by subtracting the mean from the data point and dividing the result
by the standard deviation. This method was used with threshold=4, which corresponds
to 4 standard deviations away from the mean. Data points with Z-scores exceeding this
are classified as outliers [22]. The second technique used to identify outliers in the da-
taset is the Isolation Forest algorithm [17]. This is an unsupervised learning algorithm
that isolates data points using a binary tree structure, calculates the path length required
to isolate each data point in the tree and assigns an anomaly score for each data point
based on average path length [17]. This method was used with basic parameters such
as estimators=100, contamination=0.05, random_state=42. Finally, the third tech-
nique to identify and filter outliers is the Local Outlier Factor (LOF) algorithm [8].
Unlike the previous two, the LOF method chooses outliers based on the density distri-
bution of the data points, measuring local deviation of a point with respect to neighbor-
ing points. A high LOF value would indicate that the data point has lower density com-
pared to its neighbors, suggesting that it is an outlier. This approach was used with
specified parameters neighbors=20 and contamination=0.1.
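A minimal sketch of the three outlier detection approaches with the parameters reported above is given below, assuming scikit-learn implementations; applying the Z-score filter per feature column is an interpretive assumption, and the scikit-learn parameter names (e.g. n_estimators) are used where the text abbreviates them.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# X is the preprocessed feature matrix (numpy array) produced in the previous step.

# 1) Z-score method: flag rows containing any value more than 4 standard deviations
#    from that column's mean (per-column application is an assumption).
z_scores = np.abs((X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9))
zscore_outliers = (z_scores > 4).any(axis=1)

# 2) Isolation Forest with the parameters reported above.
iso = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
iso_outliers = iso.fit_predict(X) == -1      # -1 marks outliers

# 3) Local Outlier Factor, based on local density deviation from neighboring points.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
lof_outliers = lof.fit_predict(X) == -1

# Example: keep only inliers for subsequent training (here using the Isolation Forest mask).
X_filtered, y_filtered = X[~iso_outliers], y[~iso_outliers]
```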
Data resampling. The steps taken in this phase are to reduce the impact of class im-
balance as well as ensure we use the most important features of the dataset for model
training. SMOTE (Synthetic Minority Over-sampling Technique) [9] was applied to
the preprocessed dataset, to oversample the underrepresented class. The algorithm
works by generating synthetic examples of the minority class (i.e. fraudulent payments)
to balance the label distribution. This helps in reducing bias [9] towards the majority
class (i.e. legitimate payments) and improving the model's ability to reliably classify
minority class instances. This resampling method was used with sampling_strategy=0.5, which results in SMOTE producing synthetic samples such that, in the end, there is one minority class instance for every two instances of the majority class. This leads to a larger number of fraudulent payments in the dataset, which ML models can use to better learn and classify.
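A sketch of this resampling step with imbalanced-learn's SMOTE at the reported sampling strategy follows; the input variable names carry over from the sketches above and the random seed is an assumption.

```python
from collections import Counter
from imblearn.over_sampling import SMOTE

# Oversample the fraudulent (minority) class until it reaches half the size
# of the legitimate (majority) class, i.e. a 1:2 ratio.
smote = SMOTE(sampling_strategy=0.5, random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_filtered, y_filtered)

print("Class distribution after SMOTE:", Counter(y_resampled))
```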
Model training and evaluation. In this phase, a set of machine learning (ML) algo-
rithms are trained on the dataset and used to classify transactions - either as 'legitimate'
(0) or 'fraudulent' (1). The models that are used include Logistic regression, Linear
SVC, Multi-layer perceptron (MLP), Decision Tree, Random Forest, AdaBoost and
Gradient Boosting classifiers. Other gradient boosting algorithms namely CatBoost,
LightGBM and XGBoost, are also used for classification. The selection of these models
in our experiment is due to their diverse characteristics and proven success in various
machine learning tasks.
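A minimal sketch of training and evaluating this pool of classifiers on a held-out split is shown below; the 80-20 split mirrors the experimental setup described later, and the hyperparameters shown are library defaults rather than the tuned values used in the study.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)
from sklearn.metrics import f1_score
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

X_train, X_test, y_train, y_test = train_test_split(
    X_resampled, y_resampled, test_size=0.2, random_state=42, stratify=y_resampled)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "LinearSVC": LinearSVC(),
    "MLP": MLPClassifier(),
    "DecisionTree": DecisionTreeClassifier(),
    "RandomForest": RandomForestClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "GradientBoosting": GradientBoostingClassifier(),
    "XGBoost": XGBClassifier(),
    "LightGBM": LGBMClassifier(),
    "CatBoost": CatBoostClassifier(verbose=0),
}

# Fit each classifier and report a headline metric on the held-out data.
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(f"{name}: F1 = {f1_score(y_test, preds):.3f}")
```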
Model refinement. The selected ML classifiers are fine-tuned to optimize for perfor-
mance on the chosen metrics. This involves adjusting hyperparameter settings of the
algorithms and selecting them such that the model performs optimally. GridSearch [7]
is the preferred tool for such experimentation as it can systematically search through a
predefined hyperparameter space and select optimal combinations yielding the best per-
formance. For each of the previously mentioned classifiers, a grid of hyperparameters
is defined, encompassing different values or ranges.
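As an illustration, a grid search over a small Random Forest hyperparameter grid might look as follows; the grid values and scoring choice are illustrative assumptions, not the ranges actually used in the study.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Hypothetical grid; the actual per-classifier ranges are defined analogously.
param_grid = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5, 10],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="f1",          # optimize for F1, suited to imbalanced data
    cv=5,
    n_jobs=-1,
)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)
print("Best cross-validated F1:", search.best_score_)
```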
Model interpretation and explainability. To better understand the predictions made
by these ML classifiers, explainable AI (XAI) methods have been used to better inter-
pret the underlying decision-making process. Through these techniques, we will also
investigate which features in the dataset contribute the most for predictions made by
the algorithms. Firstly, LIME [24] generates local explanations by approximating
black-box model predictions, allowing us to identify the most relevant features. Then,
SHAP [18] is a game-theoretic approach to explain individual predictions, by assigning
a "credit" score to each feature based on the impact on the model's final prediction. The
approach to validate these trained ML models revolves around understanding real-
world practices and how they can be deployed to support these processes. This can be
broken down into the following steps:
Evaluating models for fairness, accountability & bias. To evaluate these ML classi-
fiers for fairness and bias, the implications they have on algorithmic parity measures
(such as demographic parity, predictive parity, equalized odds etc.) can be studied. It is
important to consider the impact of a model's predictions on protected groups of people.
Since the Vesta dataset is largely black-box and anonymized to conceal personal details
such as gender, age, region etc., salient groups can instead be created based on the
transaction amount (TransactionAmt). If the results indicate the classifiers are biased,
debiasing methods [12] can be considered such as data augmentation techniques, mod-
ifying the algorithms used or post-processing the output prediction of an algorithm.
Critical Assessment and Deployment Feasibility. This phase involves a study of cur-
rent fraud detection practices in the payments industry and mitigation strategies that are
used by financial institutions to prevent crimes such as money laundering, identity theft,
card skimming etc.
The data strategy followed in this research includes the following stages: data collection
and exploration, preprocessing for learning experiments, model training and evaluation
of results.
Exploratory Data Analysis. The Vesta dataset was explored to discover insights on
the transactions recorded and their features. Over 85,000 transactions in the original
dataset were found to be made on desktop compared to mobile devices, which saw
around 55,000 payments. This reflects general consumer habits as well since growth in
mobile conversion rate is sluggish and lagging behind desktop: people are happier to browse on mobile but ultimately buy on desktop [27]. As likely another indicator for
this preference, larger screen sizes such as ‘1920 x 1080’ and ‘1366 x 768’ were found
more often in making transactions. It was also found that most transactions are made
on Windows 10 and 7 operating systems which again are desktop-oriented, compared
to mobile OS versions. Notably, among mobile OS, iOS versions recorded a higher
share of transactions compared to Android. Most transactions were made using Visa
(385k) and Mastercard (189k) cards, with American Express (8.3k) and Discover (6.7k)
coming in at distant 3rd and 4th spots respectively. Most transactions were made using a debit card (441k), with significantly fewer through credit (149k) and charge (15) cards.
Interestingly, the number of fraudulent transactions is roughly the same for both debit
and credit (10k). This would suggest that on average, credit cards are nearly 3x more
susceptible to fraud. The main reason for this could be the size of transactions made
with credit cards. The average transaction amount was found to be 64% higher for credit
card transactions compared to debit payments in the Vesta dataset. This may be because
credit cards are typically more used for work, business or commercial purposes. Addi-
tionally, it was found that much of the original dataset was sparse as many of the col-
umns had missing values. The most sparse features are M3, V4, M1, M2, dist1, M5,
M6, M7, M8, M9, V2, V3, V1, V5, V9, D11, V6, V10, V11, V8 and V7, with close to 100% null values. In order to find the most relevant features in the dataset, the corre-
lation coefficient scores between the feature and the ‘Class’ column were used. The top
10 important features found were V87, V45, V86, V257, V246, V244, V242, V44,
V201 and V200 respectively, with the first 8 all having a correlation above 35% with the target variable 'Class'. A correlation matrix was also derived from these scores to find pairs of features that are correlated to one another. The top 10 highly inter-corre-
lated feature pairs found are D12-D4, V322-V95, V323-V96, V324-V97, C6-C4,
V293-V279, V101-V95, V322-V101, V322-V279 and V324-V280. After preparation
using correlation analysis results, the missing values were cleared from the dataset and
it was preprocessed to contain 144,233 payments (out of the original 590,540) and 609
columns in total, including the Class label. Out of these, only 11,318 (7.85%) transac-
tions are labelled fraudulent, which signifies that the data is highly imbalanced.
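A sketch of how such correlation scores can be computed with pandas is given below; the feature and label column names follow the Vesta dataset, and the variable df refers to the preprocessed dataframe from the earlier sketch.

```python
import numpy as np
import pandas as pd

# df is the preprocessed Vesta dataframe with a binary 'Class' label.
corr = df.corr(numeric_only=True)

# Features most correlated with the target variable.
target_corr = corr["Class"].drop("Class").abs().sort_values(ascending=False)
print("Top 10 features by correlation with Class:\n", target_corr.head(10))

# Highly inter-correlated feature pairs: keep only the upper triangle so each pair appears once.
feat_corr = corr.drop(columns="Class").drop(index="Class").abs()
upper = feat_corr.where(np.triu(np.ones(feat_corr.shape, dtype=bool), k=1))
pairs = upper.stack().sort_values(ascending=False)
print("Top 10 inter-correlated feature pairs:\n", pairs.head(10))
```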
Experimental Setup. In order to collect results, four different experiments were de-
signed to train and evaluate the chosen supervised learning algorithms. These are based
on different combinations of the outlier detection techniques and data resampling ap-
proach discussed in Section 3. In experiment 1, the preprocessed dataset is transformed
using PCA and all ten ML models are subsequently applied with an 80-20 split. In experi-
ment 2, three different outlier detection techniques are first used (separately) to remove
outliers in the preprocessed dataset before PCA and then the supervised learning algo-
rithms are applied. Experiment 3 sees the preprocessed data resampled with SMOTE
prior to PCA and the set of ML algorithms. Finally, experiment 4 involves a combina-
tion of the previous experiments by first removing outliers, then resampling with
SMOTE prior to PCA and ML models. In all these experiments, different performance
metrics are measured namely precision, recall, F1, area under ROC curve (AUROC)
and R2 score. While accuracy is widely used in classification problems, it is not a reli-
able metric when the data in question suffers from class imbalance.
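A sketch of how these metrics can be computed per model on the held-out test split follows; the helper function name is illustrative.

```python
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             roc_auc_score, r2_score)

def evaluate(model, X_test, y_test):
    """Compute the metrics measured across all four experiments for one fitted model."""
    preds = model.predict(X_test)
    # AUROC needs a continuous score; fall back to hard predictions if probabilities are unavailable.
    scores = (model.predict_proba(X_test)[:, 1]
              if hasattr(model, "predict_proba") else preds)
    return {
        "precision": precision_score(y_test, preds),
        "recall": recall_score(y_test, preds),
        "f1": f1_score(y_test, preds),
        "auroc": roc_auc_score(y_test, scores),
        "r2": r2_score(y_test, preds),
    }
```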
5 Results
This section presents the findings from the experimental setup and running the set of
ML classifiers on an unseen subset of the data in order to evaluate performance. The
following subsections examine results found for each of the four experiments con-
ducted to evaluate how well ML models can be used for fraud detection purposes.
SHAP values were used in this experiment to evaluate, for each input feature, the impact it has on the final prediction. Figure 3 displays the summary plot for the SHAP analysis of
the XGBoost classifier tested. As per this analysis, the top 5 most important features
found are V0, V11, V3, V1 and V51 with SHAP values of 0.69, 0.21, 0.17, 0.15 and
0.148 respectively.
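Such a SHAP analysis for the XGBoost model can be reproduced along these lines; this is a minimal sketch assuming the shap package and a fitted classifier (xgb_model), while the feature names and values shown in Figure 3 come from the actual study.

```python
import shap

# Explain the fitted XGBoost classifier on the test set using TreeExplainer.
explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X_test)

# Summary plot of per-feature impact on the model output (as in Figure 3).
shap.summary_plot(shap_values, X_test)
```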
Demographic parity [10] is achieved when an equal proportion of positive predictions is made by a model for each group category. Figure 4 displays an overview
of both metrics for the ML classifiers selected. From the results, it can be seen that the
Random Forest and LightGBM models achieved the highest demographic parity on the
base dataset, with values of 0.545 and 0.548 respectively, meaning that only a little over half the time is the same fraction of transactions predicted 'fraudulent' regardless
of transaction size. This implies the transaction size has a rather large impact on the
proportion of payments that are predicted fraudulent, and the trained models are not immune to the impact of payment amount on classification performance.
Fig. 3. SHAP summary for XGBoost model. Fig. 4. Demographic Parity and Equalized Odds
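A minimal sketch of how demographic parity and equalized odds ratios could be computed for groups formed by transaction amount is shown below; the binary split at the median amount and the ratio formulation are illustrative assumptions, as the exact grouping used in the study is not detailed here.

```python
import numpy as np

def parity_metrics(y_true, y_pred, transaction_amt):
    """Compare positive-prediction and true-positive rates across two amount-based groups."""
    # y_true, y_pred and transaction_amt are 1-D numpy arrays of equal length.
    group = transaction_amt > np.median(transaction_amt)   # high vs. low amounts (assumption)

    # Demographic parity: ratio of positive prediction rates between the two groups.
    rate_high = y_pred[group].mean()
    rate_low = y_pred[~group].mean()
    demographic_parity = min(rate_high, rate_low) / max(rate_high, rate_low)

    # Equalized odds (true positive rate component): ratio of TPRs between the two groups.
    tpr_high = y_pred[group & (y_true == 1)].mean()
    tpr_low = y_pred[~group & (y_true == 1)].mean()
    equalized_odds = min(tpr_high, tpr_low) / max(tpr_high, tpr_low)

    return demographic_parity, equalized_odds
```

A ratio close to 1 indicates that both transaction-amount groups are treated similarly by the classifier, while values such as 0.545 indicate a substantial disparity.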
Table 2 (b) displays performance of the selected ML classifiers after applying the Iso-
lation Forest algorithm (as outlined in Section 3) on the dataset. As can be seen above,
the performance of ML classifiers is marginally higher on the dataset with outliers re-
moved using the Isolation Forest algorithm. This may be mainly because the
Isolation Forest algorithm (with contamination=0.05) only picked 5% as outliers, while
the Z-score method (with threshold of 4) identified 59.5% as outliers. As a result, there
is less data for the ML models to be trained on in the latter case, which can explain the worse performance on the test set. Moreover, the Z-score method assumes the data fol-
lows an underlying Gaussian distribution which is not likely the case, since only 40%
of the data here lies within 4 standard deviations of the mean. Table 3 displays perfor-
mance of the selected ML classifiers after applying the Local Outlier Factor algorithm
(as outlined in Section 3) on the preprocessed dataset.
Table 3. Exp. 2 - LOF method results. Fig. 5. Exp. 2 - Plot of metrics for all approaches.
As evidenced from the results, application of LOF algorithm on the data to find outliers
led to superior performance by the selected ML classifiers. This approach with basic
parameters identified roughly 10% of the transactions as outliers. Since it takes into
account the density of neighboring points to determine if a point is an outlier or not,
LOF is more effective for datasets with complex structures. Figure 5 displays an over-
view of all performance metrics for one of the better classifiers, Random Forest, across
the three outlier detection techniques discussed. Interestingly, the recall, F1 and R2
scores can be seen to be highest with the LOF approach (in comparison to the other two
techniques). This could be due to LOF's ability to capture local anomalies which can
result in better recall, as it can detect a higher proportion of true outliers in the dataset.
With improved recall, the ML models are able to identify more positive instances cor-
rectly, which is particularly important in scenarios where the focus is on detecting rare
events or anomalies. LOF's ability to accurately identify local outliers also helps in
reducing false positives (high precision) while still capturing true positives (high re-
call), leading to an improved overall F1 score. Removing or down-weighting identified
outliers allows the model to focus on the meaningful patterns and relationships within
the data, leading to a better fit and improved R2 score.
As we can observe, there is not much change in the results upon applying outlier detec-
tion and removal techniques on the resampled dataset. There are a few reasons why this
might be happening; firstly, SMOTE generates synthetic samples to increase the repre-
sentation of the minority class. These synthetic samples are often located in regions
between existing minority class instances. Since outlier detection techniques typically
focus on identifying observations that deviate significantly from the majority of the
data, the synthetic samples generated by SMOTE might not be considered outliers. As
a result, the distribution and characteristics of the outliers in the dataset may remain
relatively unchanged, leading to limited impact on the outlier detection results. The
primary objective of applying SMOTE is to address the issue of class imbalance and
improve representation of the minority class. Outlier detection techniques primarily aim
to identify and handle extreme values or observations that deviate significantly from
the norm. While both techniques deal with anomalies in the dataset, their focus and
objectives are quite distinct. As a result, the impact of applying outlier detection tech-
niques on classification performance is overshadowed by the substantial improvements
achieved through SMOTE in addressing class imbalance.
Fig. 6. (a) Exp. 3 - Plot of metrics post-resampling. (b) Exp. 4 - Random Forest classifier.
As shown in Figures 6 (a) and (b), this can be seen in the numbers of both Random
Forest and XGBoost classifiers trained on the resampled dataset, where values for pre-
cision, recall and F1 remain unchanged for different outlier detection techniques while
area under ROC curve and R2 scores shift slightly. Interestingly, for both of these clas-
sifiers, the Z-score method for removing outliers seems to now marginally outperform
the Isolation Forest and LOF algorithms. This could be because resampling algorithms
such as SMOTE can potentially alter the distribution of the data. The Z-score method
calculates the Z-score for each data point based on the mean and standard deviation of
the original data distribution. If the resampling process significantly affects the mean
and standard deviation, the Z-score values of the data points may change such that their deviation from the mean becomes more apparent. This can
improve the Z-score method's ability to identify and remove outliers effectively.
6 Conclusion
This paper investigated the use of machine learning (ML) techniques in fraud detection
and payment security. It explored various aspects, including current fraud prevention
approaches, the effectiveness of ML algorithms, the impact of outlier detection and data
resampling techniques, and the applicability of explainable AI (XAI) and Responsible
AI principles. The results indicate that MLPs, Random Forest, CatBoost, and XGBoost
perform well in predicting fraudulent transactions, while Decision Trees perform
poorly. SMOTE effectively addresses class imbalance, and XAI techniques enhance
transparency and understanding of ML models. Responsible AI principles are crucial
for evaluating fairness and bias. The study emphasizes the need for a comprehensive
approach that combines technology, data analysis, and human expertise, highlighting
real-time monitoring, anomaly detection, verification processes, collaboration with law
enforcement, and sharing threat intelligence as important strategies. Leveraging tech-
nology and AI can strengthen fraud detection capabilities, minimize losses, and ensure
secure payment transactions.
References
1. Fraud Detection: An Ultimate Guide for Protecting & Preventing Fraud. https://www.inscribe.ai/fraud-detection.
2. Deep Dive: How AI and ML Improve Fraud Detection Rates And Reduce False Positives. https://www.fintechnews.org/how-ai-and-machine-learning-can-turn-the-tide-of-fraud
3. Seventh Report on Card Fraud. (29 Oct. 2021). https://www.ecb.europa.eu/pub/cardfraud/html/ecb.cardfraudreport202110~cac4c418e8.en.html.
4. Charu C Aggarwal. 2015. Outlier analysis. Springer.
5. Noor Alfaiz and Suliman Fati. 2022. Enhanced Credit Card Fraud Detection Model Using Machine Learning.
Electronics 11 (02 2022), 662.
6. R. B. Asha and K.R. Suresh Kumar. 2021. Credit card fraud detection using artificial neural network. Global
Transitions Proceedings 2, 1 (2021), 35–41. 1st International Conference on Advances in Information, Com-
puting and Trends in Data Engineering (AICDE - 2020).
7. James Bergstra and Yoshua Bengio. 2012. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res. 13 (Feb. 2012), 281–305.
8. Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. 2000. LOF: Identifying Density-
Based Local Outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management
of Data (Dallas, Texas, USA) (SIGMOD '00). Association for Computing Machinery, New York, NY, USA,
93–104.
9. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. 2002. SMOTE: Synthetic Minority Over-
sampling Technique. Journal of Artificial Intelligence Research 16 (jun 2002), 321–357.
10. Eleni Digalaki. 2022. The impact of artificial intelligence in the banking sector how AI is being used in 2022.
11. Kat Edwards. 2023. Banking Fraud Investigations – How Do Banks Detect Fraud?
12. Michael Feldman, Sorelle A Friedler, Jeremy Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian.
2015. Certifying and removing disparate impact. In Proceedings of the 21th ACM SIGKDD International Con-
ference on Knowledge Discovery and Data Mining. ACM, 259–268.
13. Zhi-Min Huang, Huan Liu, and Artyom Sedrakyan. 2010. Outlier detection in large databases: a wavelet-based approach. In Proceedings of the 2010 ACM Symposium on Applied Computing. ACM, 3107–3111.
14. Ian T. Jolliffe and Jorge Cadima. 2016. Principal component analysis: A review and recent developments.
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374,
2065 (2016), 20150202.
15. Aye Aye Khine and Hint Wint Khin. 2020. Credit Card Fraud Detection Using Online Boosting with Extremely
Fast Decision Tree. In 2020 IEEE Conference on Computer Applications (ICCA). 1–4.
16. Gayan K. Kulatilleke. 2022. Challenges and Complexities in Machine Learning based Credit Card Fraud De-
tection.
17. Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. 2008. Isolation Forest. In 2008 Eighth IEEE International
Conference on Data Mining. 413–422.
18. Scott M Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Advances
in Neural Information Processing Systems. 4765–4774.
19. Jincy C Mathew, B Nithya, C R Vishwanatha, Prathiksha Shetty, H Priya, and G Kavya. 2022. An Analysis
on Fraud Detection in Credit Card Transactions using Machine Learning Techniques. In 2022 Second Interna-
tional Conference on Artificial Intelligence and Smart Energy (ICAIS). 265–272.
20. Niccolo Mejia. 2020. AI-Based Fraud Detection in Banking – Current Applications and Trends.
21. Jelle Oorebeek. 2023. Efficient Bank Fraud Investigations | A Complete Guide.
22. Keith Ord. 1996. Outliers in statistical data: V. Barnett and T. Lewis, 1994, 3rd edition (John Wiley & Sons, Chichester), £55.00, ISBN 0-471-93094-6. International Journal of Forecasting 12, 1 (1996), 175–176.
23. Andrea Dal Pozzolo and Gianluca Bontempi. 2015. Adaptive Machine Learning for Credit Card Fraud Detec-
tion.
24. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. “Why Should I Trust You?”: Explaining the
Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining. ACM, 1135–1144.
25. Yusuf Sahin, Serol Bulkan, and Ekrem Duman. 2013. A cost-sensitive decision tree approach for fraud detec-
tion. Expert Systems with Applications 40 (11 2013), 5916–5923.
26. Pooja Tomar, Sonika Shrivastava, and Urjita Thakar. 2021. Ensemble Learning based Credit Card Fraud De-
tection System. In 2021 5th Conference on Information and Communication Technology (CICT). 1–5.
27. Casey Turnbull. 2023. Why are Mobile Conversion Rates behind Desktop?