Credit Card Fraud Detection Comparing Multiple Supervised Learning Algorithms For Optimal Accuracy
Abstract: Supervised machine learning algorithms are widely used for classification problems across various domains.
However, selecting the best model requires a thorough evaluation of accuracy, robustness, and generalization ability. This
research compares multiple supervised learning techniques using real-world datasets, focusing on evaluation metrics such
as accuracy, sensitivity, specificity, and AUC-ROC. The study also considers the risk of overfitting, using cross-validation
techniques to strengthen the conclusions. Results indicate that AdaBoost achieves near-perfect accuracy while Stochastic
Gradient Descent (SGD) provides balanced performance and generalization, making their hybrid or combination a
preferable choice for fraud detection.
How to Cite: Ayan Kumar Mahato; Cezan Mendonca; Harita Jasani; Hariharan B. (2025). Credit Card Fraud Detection Comparing
Multiple Supervised Learning Algorithms for Optimal Accuracy. International Journal of Innovative Science and
Research Technology, 10(3), 2641-2653. https://doi.org/10.38124/ijisrt/25mar1761.
II. LITERATURE REVIEW

For this research, we have referred to many informative papers and studies on topics related to our research. These include {1} an investigation of the performance of Naive Bayes, KNN, and Logistic Regression on highly skewed credit card fraud data, undertaking a hybrid technique of under- and over-sampling of the data, with evaluations done using confusion matrix outputs. The next study proposes a {2} 3-stage fraudulent card detection system which relies on a) detection of invalid and fake cards from legitimate ones by using the Luhn algorithm for card number validation, b) dynamic verification of the card expiry date, and c) a script code validation for the Card Verification Value (CVV) or Card Verification Code (CVC) that computes the total number of digits, which should be within the specified range. The next study proposes a model to {3} handle imbalanced data using an XGBoost classifier to detect fraudulent transactions. The typical technique pre-determines the threshold value, resulting in inefficiency, whereas here several threshold values are computed and compared to identify the ideal value that provides an optimal outcome and high efficiency. The next study {4} compares 3 ML algorithms (Random Forest, Logistic Regression, and AdaBoost) based on their Accuracy and Matthews Correlation Coefficient (MCC) score. Of these three algorithms, the Random Forest algorithm achieved the best Accuracy and MCC score; the Streamlit framework is used to create the machine learning web application.

The next study {5} investigates the application of advanced machine learning techniques to effectively detect fraudulent transactions. The findings highlight the need for resilient, scalable, and real-time mechanisms to combat evolving fraud strategies. The next study {6} examines the latest advances and applications in the field of machine learning-based credit card fraud detection, where four machine learning algorithms have been analyzed and compared on the basis of their accuracies. It is found that the CatBoost algorithm works best to detect credit card fraud, with an accuracy of 99.87 percent; the dataset for credit card fraud detection was taken from Kaggle. The next study {7} proposes a method to overcome the problem of credit card fraud identification by combining Deep Learning with Machine Learning techniques. To reduce the number of false negatives, the study performed data-matching trials with the implementation of Deep Learning techniques. Utilizing the suggested strategy, it is possible to locate Credit Card Fraud (CCF) remotely from any location. The next study {8} discusses the problem of credit card fraud detection, which includes modelling past credit card transactions with the data of those that turned out to be fraudulent; the model is then used to determine whether new transactions are fraudulent or not. In this method, the focus was on the analysis and preprocessing of several anomaly detection algorithms and data sets, such as the "local outlier factor" and "isolation forest" algorithms, applied to PCA-transformed credit card transaction data.

The next study {9} shows how algorithms like Logistic Regression and Random Forest were used in creating Fraud Fort, an advanced system designed to detect credit card fraud. The study illustrates the efficacy of integrating both models in Fraud Fort. The results indicate the combined advantages of Logistic Regression and Random Forest, so that the fraud detection system can become strong, eventually leading to a more secure and reliable economic ecosystem. The next study {10} depicts how popular supervised and unsupervised machine learning algorithms have been applied to detect credit card fraud in a highly imbalanced dataset. It was found that unsupervised machine learning algorithms can handle the skewness and give the best classification results. The next study {11} is about how financial institutions aim to secure credit card transactions and allow their customers to use e-banking services safely and efficiently. To reach this goal, they try to develop more relevant fraud detection techniques that can identify more fraudulent transactions and decrease fraud. The study defines the fundamental aspects of fraud detection, the current systems of fraud detection, the issues and challenges of fraud related to the banking sector, and the existing solutions based on machine learning techniques. The next study {12} implements six widely used machine learning techniques for credit card fraud detection. Their efficacy is analysed based on parameters such as accuracy, precision, recall, specificity, misclassification, and F1 score. Results show that machine learning techniques are helpful for credit card fraud detection, and the authors strongly recommend using multiple machine learning techniques to detect fraud before it occurs or while it is in progress. The next study {13} performs a comparative experimental study to detect credit card fraud, as well as to tackle the imbalanced classification problem by applying different machine learning algorithms for handling imbalanced datasets. The study shows that there is no need to process the imbalanced dataset with resampling techniques in order to measure the performance of the classifiers; it is sufficient to measure the performance through three performance measurements (Accuracy, Sensitivity, and Area Under the Precision/Recall Curve (PRC)) to prove the accuracy of the classification prediction.

III. MODULE DESCRIPTION

This section provides an overview of the different components that make up the fraud detection system. Each module is responsible for a specific task, ensuring the pipeline runs smoothly from raw data input to fraud classification. Explained below are the different modules and their key purposes:

A. Data Preparation and Preprocessing
This is the starting or initial phase of the process. It covers the steps given below:

Import Libraries: Loads essential Python libraries such as NumPy, Pandas, Scikit-Learn, Matplotlib, and Seaborn for data handling, model training, and visualization.
Load Dataset: Ensures the correct dataset is loaded for all the algorithms so that all of their models are trained on the same dataset. We make use of a balanced dataset with an equal share of fraud and legitimate records. The dataset is in the form of a CSV file and contains over half a million rows of data, with many of the sensitive columns given as PCA-transformed values.
Preprocess Data: Handles instances of missing data points with mean, median, or mode replacement, performs encoding, and detects outliers using methods like Z-score analysis.
Feature Scaling: Standardises or normalises the data to a uniform scale across all features, which aids convergence in machine learning models.
Data Splitting: Uses splitting functions like train_test_split to split the dataset into training and testing data in a specified ratio; in our case, we have used 80% of the dataset for training and 20% to test against the trained and fitted model.
ROC AUC Score and curve plotting: Computes and plots the Receiver Operating Characteristic curve and the Area Under Curve score. This gives us an understanding of the trade-off between sensitivity (recall) and specificity.
Determining Optimal Threshold and Prediction Recalculation: Identifies the best classification threshold for improving model accuracy, adjusts predictions based on the optimal threshold, and re-evaluates performance.
Calculating Optimal Test Accuracy: Measures the new accuracy score after threshold optimisation.
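To make the preparation and evaluation steps above concrete, a minimal sketch is given below. The file name creditcard_balanced.csv, the label column "Class", and the use of Logistic Regression as a stand-in classifier are illustrative assumptions, not the exact configuration used in this study.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score, accuracy_score

# Load the balanced, PCA-transformed dataset (file and column names assumed)
df = pd.read_csv("creditcard_balanced.csv")
df = df.fillna(df.median(numeric_only=True))   # simple missing-value handling

X = df.drop(columns=["Class"])
y = df["Class"]                                # 1 = fraud, 0 = legitimate (assumed)

# 80/20 split, stratified so the fraud/legitimate share stays equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Feature scaling to aid convergence
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Stand-in classifier; any of the algorithms compared in this study could be used here
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

# ROC curve, AUC score, and optimal threshold (Youden's J statistic)
fpr, tpr, thresholds = roc_curve(y_test, proba)
auc = roc_auc_score(y_test, proba)
best_threshold = thresholds[np.argmax(tpr - fpr)]
y_pred_opt = (proba >= best_threshold).astype(int)

print("ROC-AUC:", auc)
print("Optimal threshold:", best_threshold)
print("Optimal test accuracy:", accuracy_score(y_test, y_pred_opt))

plt.plot(fpr, tpr, label=f"AUC = {auc:.3f}")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Sensitivity)")
plt.legend()
plt.show()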
E. Data Insights and Visualisation
The following steps involve the use of visual representations to analyse key data characteristics.

Final prediction:
Backpropagation updates weights using gradient descent.

D. Gaussian Naive Bayes
A probabilistic classifier based on Bayes' theorem. Assumes independence among features and a normal (Gaussian) distribution. Mathematical formulation:

Given features $x = (x_1, \dots, x_n)$, the predicted class is
$\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)$

Gaussian probability for a feature:
$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$

G. XGBoost
Gradient Boosting algorithm optimized for speed and accuracy. Uses tree pruning and regularization to prevent overfitting. Mathematical formulation:

Boosted trees minimize the regularized loss:
$\mathcal{L} = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}$

H. Random Forest
Ensemble of Decision Trees. Reduces variance compared to a single tree. Mathematical formulation:

Predictions are aggregated across trees:
$\hat{y} = \operatorname{majority\ vote}\{\,h_1(x), h_2(x), \dots, h_T(x)\,\}$

Gini Impurity:
$G = 1 - \sum_{c} p_c^{2}$

Entropy:
$H = -\sum_{c} p_c \log_2 p_c$

J. K-Nearest Neighbours
Classifies based on the majority of the K nearest data points. Mathematical formulation:

Distance Metric (Euclidean):
$d(x, x') = \sqrt{\sum_{i} (x_i - x'_i)^{2}}$

L. Logistic Regression
Predicts probability using the sigmoid function. Mathematical formulation:

Sigmoid Function:
$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad P(y = 1 \mid x) = \sigma(w^{\top} x + b)$
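As a hedged illustration only, the classifiers formulated above map onto standard library implementations as shown below; the hyperparameters are common defaults rather than the exact settings used in this research, the xgboost package is assumed to be installed, and X_train/X_test refer to the split from the earlier preprocessing sketch.

from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier   # assumes the separate xgboost package

models = {
    "Gaussian Naive Bayes": GaussianNB(),
    "XGBoost": XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss"),
    "Random Forest": RandomForestClassifier(n_estimators=200, criterion="gini"),
    "K-Nearest Neighbours": KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Every model exposes the same fit/score interface, so each is trained and
# evaluated on the identical train/test split used throughout the study.
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))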
A factor that affects the scale at which research can be done is resource availability and feasibility; the table given below addresses the cost comparison of the interactive environments.

The environment of choice must have acceptable levels of support for libraries and frameworks, so that we can ensure that all of our algorithms are able to run and provide us with models.

Another major factor that will produce different values for us, depending on which environment is chosen, is the performance level at which each environment operates; the comparison of performance capabilities is given below.

We primarily use and refer to the data and results we obtain from Jupyter Notebook because it caters to the needs of the research team: it is best for local execution, has a persistent environment, gives full control over dependencies, and is more power efficient on our systems.

V. RESULTS AND EVALUATION

We have done a 2-stage analysis and evaluation of all the algorithms based on their output values for each performance evaluation metric. In Stage 1, we simply compared all 12 algorithms based on their training, testing, and overall accuracies and their fit time. We managed to eliminate 7 algorithms from the running for multiple reasons, such as being unable to finish processing (LRM, SVM, MLPC); we also eliminated algorithms based on overfitting (XGBoost, Decision Trees, Random Forest, Extra Trees Classifier). This left us with 5 remaining algorithms to compare in Stage 2 using more metrics.
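A minimal sketch of this kind of Stage 1 screening is shown below; it uses cross-validation to collect training accuracy, testing accuracy, and fit time for a few candidates, with the model list kept short for illustration rather than covering all 12 algorithms.

from sklearn.model_selection import cross_validate
from sklearn.linear_model import SGDClassifier
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

candidates = {
    "SGD": SGDClassifier(loss="log_loss"),
    "AdaBoost": AdaBoostClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Gaussian NB": GaussianNB(),
}

for name, clf in candidates.items():
    cv = cross_validate(clf, X_train, y_train, cv=5,
                        scoring="accuracy", return_train_score=True)
    # A large gap between train and test accuracy flags overfitting,
    # which was one of the Stage 1 elimination criteria.
    print(f"{name}: train={cv['train_score'].mean():.3f} "
          f"test={cv['test_score'].mean():.3f} "
          f"fit_time={cv['fit_time'].mean():.2f}s")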
Stage 2 is the phase where we compare the algorithms that did not get eliminated in Stage 1. Here, we compare the performance of the algorithms' models across a more sensitive and larger set of evaluation metrics. We must also consider over-fitting for most of these metrics.

A. Highly Crucial Metrics
Four basic rates are used in evaluating the experiments, namely the True Positive Rate (TPR), True Negative Rate (TNR), False Positive Rate (FPR), and False Negative Rate (FNR).
Precision:
$\text{Precision} = \frac{TP}{TP + FP}$

F1-score:
$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$

Sensitivity (Recall):
$\text{Sensitivity} = \frac{TP}{TP + FN}$

Matthews Correlation Coefficient:
$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$

ROC-AUC: the area under the curve obtained by plotting the TPR against the FPR across all classification thresholds.
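These quantities can be computed directly from the test-set predictions; the snippet below is a sketch using scikit-learn's metric functions, assuming the y_test, y_pred_opt, and proba variables from the earlier evaluation sketch.

from sklearn.metrics import (precision_score, f1_score,
                             matthews_corrcoef, roc_auc_score, confusion_matrix)

tn, fp, fn, tp = confusion_matrix(y_test, y_pred_opt).ravel()
print("TPR (sensitivity):", tp / (tp + fn))
print("TNR (specificity):", tn / (tn + fp))
print("Precision:", precision_score(y_test, y_pred_opt))
print("F1-score:", f1_score(y_test, y_pred_opt))
print("MCC:", matthews_corrcoef(y_test, y_pred_opt))
print("ROC-AUC:", roc_auc_score(y_test, proba))   # uses predicted probabilities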
B. Moderately Important Metrics
Consists of:

Balanced Classification Rate (BCR), which ensures the model performs well across both classes (legitimate & fraud) and also balances sensitivity and specificity.
Cohen's Kappa measures agreement between predicted & actual fraud cases and accounts for chance agreement.
Log Loss penalizes incorrect confident predictions. Helps in optimizing probabilistic models like Logistic Regression & Neural Networks.
Average Precision (AP Score) summarizes the precision-recall tradeoff at different thresholds. Useful for comparing models.
Optimal Threshold, based on the type of experimental setup being made, can matter even more than Cohen's Kappa, which is true in our case.
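These moderately important metrics also have direct library counterparts; the sketch below uses the same assumed variables as the earlier snippets, with balanced accuracy standing in for the Balanced Classification Rate.

from sklearn.metrics import (balanced_accuracy_score, cohen_kappa_score,
                             log_loss, average_precision_score)

print("Balanced Classification Rate:", balanced_accuracy_score(y_test, y_pred_opt))
print("Cohen's Kappa:", cohen_kappa_score(y_test, y_pred_opt))
print("Log Loss:", log_loss(y_test, proba))
print("Average Precision:", average_precision_score(y_test, proba))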
C. Less Important Metrics
These metrics are still useful and referred to, but in comparison to the other metrics their significance diminishes greatly in the case of a credit card fraud detection system:

Training Accuracy and Testing Accuracy can be misleading on an imbalanced dataset and are overall not crucial.
Optimal Test Accuracy is not compulsory either, because it does not pertain specifically to our needs in fraud detection.

The remainder of the metrics are not of utmost importance in the case of our experiment, which pertains to credit card fraud detection. We now switch over to representing the usable levels of our algorithms in the form of graphs.
Stochastic Gradient Descent (SGD)

Strengths:
Efficient for large datasets.
Works well for high-dimensional data.
Fast training time.

Weaknesses:
Highly sensitive to learning rate and hyperparameters.
Low accuracy, precision, and recall.
Does not generalize well to complex fraud detection patterns.

With the results we have tallied for each of the algorithms' models, we can come up with a table that sums up the best points and the impending drawbacks of each algorithm with respect to our research on which would be ideal for a credit card fraud detection system. We will discuss alternative approaches alongside picking the best one for the job, while catering to industry metrics and values: we must ensure that we do not simply follow the largest, most pleasing values, but rather those that fall within the industry-standard ranges for what the individual values must be.
Combining SGD and AdaBoost is a promising approach because it balances generalization (SGD) with high accuracy (AdaBoost). This type of hybrid model can leverage the strengths of both algorithms:

SGD: Works well with large datasets, avoids overfitting, and complies with industry standards.
AdaBoost: Offers high accuracy, strong recall, and precision for detecting fraudulent cases.

Now we shall further discuss how to approach this idea of combining the two algorithms and their use cases.

C. Combining SGD with AdaBoost
Now that we have established that SGD gives us the most stable values and a very small margin of error, considering its immensely low Log Loss value in comparison to the other algorithms present in this analysis, we choose it as a potential model for a credit card fraud detection system alongside a model that works with far more efficiency and a better handle on imbalanced data, that is, AdaBoost.

This approach ensures workable security in a real-life scenario, where the industry-acceptable fit values of SGD ensure reliability and no over-fitting, combined with the high levels of precision, sensitivity, and imbalanced-data handling ability of the AdaBoost classifier.

Given below are some methods by which we can combine the two algorithms to make the most of their strong points while having minimal problems in implementation complexity:

Boosting SGD as a Weak Learner:
AdaBoost traditionally works with weak classifiers (like decision stumps), but we can use SGD as a weak learner. Since SGD is fast and performs well in high-dimensional spaces, we can apply AdaBoost to iteratively improve it:

Step 1: Train an SGD model on the dataset.
Step 2: Use AdaBoost to assign more weight to misclassified samples.
Step 3: Boost multiple weak SGD classifiers into a stronger ensemble model.
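A hedged sketch of this boosting setup is given below; it assumes a recent scikit-learn (where the constructor argument is named estimator rather than base_estimator) and uses a log-loss SGD so the weak learner exposes class probabilities. The hyperparameters shown are illustrative, not tuned values.

from sklearn.linear_model import SGDClassifier
from sklearn.ensemble import AdaBoostClassifier

# SGD with a probabilistic (log) loss acts as the weak learner; each boosting
# round reweights the samples the previous SGD models misclassified.
weak_sgd = SGDClassifier(loss="log_loss", max_iter=1000, tol=1e-3)

boosted_sgd = AdaBoostClassifier(
    estimator=weak_sgd,      # named base_estimator in older scikit-learn versions
    n_estimators=50,
    learning_rate=0.5,
)

boosted_sgd.fit(X_train, y_train)
print("Boosted SGD test accuracy:", boosted_sgd.score(X_test, y_test))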
Pros:
Retains SGD's generalization power, reducing overfitting.
Boosts performance where SGD alone struggles.

Cons:
Training time is higher due to boosting multiple SGD classifiers.

Hybrid Stacking Model:
Instead of boosting alone, we can train SGD and AdaBoost separately, then use a meta-classifier (like Logistic Regression or a simple Neural Network) to make the final decisions (a hedged sketch of this stacking setup is given at the end of this section):

Model 1: Train an SGD classifier to capture generalization and prevent overfitting.
Model 2: Train an AdaBoost classifier to maximize recall and precision.
Meta-Classifier: Combine predictions from both models to make a final decision.

Pros:
Balances bias and variance (SGD reduces overfitting, AdaBoost improves accuracy).
Industry compliance while boosting detection power.

Cons:
More computationally expensive (training two models).

… AdaBoost's high accuracy, recall, and fraud detection capability.

Table 15: Conditional Approach Table
Condition                       | Approach
If computation is not an issue  | Stacking (SGD + AdaBoost + Meta-Classifier)
If efficiency is required       | Dynamic Switching (SGD for general cases, AdaBoost for high-risk cases)

FUTURE SCOPE

Combining SGD (Stochastic Gradient Descent) and AdaBoost (Adaptive Boosting) for credit card fraud detection creates a balanced and robust system, leveraging SGD's compliance with industry standards and AdaBoost's high accuracy in complex data scenarios. Below are the key future scopes for such a system:

A. Improved Fraud Detection Efficiency

The hybrid model can adaptively learn from new fraud patterns while avoiding overfitting, ensuring long-term effectiveness.
SGD prevents overfitting, keeping the model aligned with real-world data, while AdaBoost enhances feature selection and identifies subtle fraud patterns.
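To close, here is a hedged sketch of the stacking approach referenced above, combining an SGD model and an AdaBoost model under a Logistic Regression meta-classifier with scikit-learn's StackingClassifier; the hyperparameters are illustrative assumptions rather than tuned values, and X_train/X_test are the split from the earlier sketches.

from sklearn.ensemble import StackingClassifier, AdaBoostClassifier
from sklearn.linear_model import SGDClassifier, LogisticRegression

stack = StackingClassifier(
    estimators=[
        ("sgd", SGDClassifier(loss="log_loss", max_iter=1000)),  # generalization, low overfitting
        ("ada", AdaBoostClassifier(n_estimators=100)),           # high recall/precision on fraud cases
    ],
    final_estimator=LogisticRegression(max_iter=1000),           # meta-classifier for the final decision
    stack_method="predict_proba",
    cv=5,
)

stack.fit(X_train, y_train)
print("Stacked model test accuracy:", stack.score(X_test, y_test))

The dynamic-switching option from Table 15 could be layered on top of either model by routing low-risk transactions to the lightweight SGD model and escalating high-risk ones to AdaBoost.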