Credit Card Fraud Detection Comparing Multiple Supervised Learning Algorithms For Optimal Accuracy
Abstract: Supervised machine learning algorithms are widely used for classification problems across various domains.
However, selecting the best model requires a thorough evaluation of accuracy, robustness, and generalization ability. This
research compares multiple supervised learning techniques using real-world datasets, focusing on evaluation metrics such
as accuracy, sensitivity, specificity, and AUC-ROC. The study also considers the risk of overfitting, using cross-validation
techniques to strengthen the conclusions. Results indicate that AdaBoost achieves near-perfect accuracy while Stochastic
Gradient Descent (SGD) provides balanced performance and generalization, making their hybrid or combination a
preferable choice for fraud detection.
How to Cite: Ayan Kumar Mahato; Cezan Mendonca; Harita Jasani; Hariharan B. (2025). Credit Card Fraud Detection Comparing
Multiple Supervised Learning Algorithms for Optimal Accuracy. International Journal of Innovative Science and
Research Technology, 10(3), 2641-2653. https://doi.org/10.38124/ijisrt/25mar1761.
II. LITERATURE REVIEW

For this research, we have referred to many informative papers and studies on topics related to our research. These include {1} an investigation of the performance of Naive Bayes, KNN, and Logistic Regression on highly skewed credit card fraud data, undertaking a hybrid technique of under- and over-sampling of the data, with evaluations done using confusion matrix outputs. The next study proposes a {2} 3-stage fraudulent card detection system which relies on a) detection of invalid and fake cards from legitimate ones by using the Luhn algorithm for card number validation, b) dynamic verification of the card expiry date, and c) a script code validation for the Card Verification Value (CVV) or Card Verification Code (CVC) that computes the total number of digits, which should be within the specified range. The next study proposes a model to {3} handle imbalanced data using an XGBoost classifier to detect fraudulent transactions. The typical technique pre-determines the threshold value, resulting in inefficiency, whereas here several threshold values are computed and compared to identify the ideal value that provides an optimal outcome and high efficiency. The next study {4} compares 3 ML algorithms (Random Forest, Logistic Regression, and AdaBoost) based on their Accuracy and Matthews Correlation Coefficient (MCC) score. Of these three algorithms, the Random Forest algorithm achieved the best Accuracy and MCC score; the Streamlit framework is used to create the machine learning web application.

The next study {5} investigates the application of advanced machine learning techniques to effectively detect fraudulent transactions. The findings highlight the need for resilient, scalable, and real-time mechanisms to combat evolving fraud strategies. The next study {6} examines the latest advances and applications in the field of machine learning-based credit card fraud detection, where four machine learning algorithms have been analyzed and compared on the basis of their accuracies. It is found that the CatBoost algorithm works best to detect credit card fraud, with an accuracy of 99.87 percent; the dataset for credit card fraud detection was taken from Kaggle. The next study {7} proposes a method to overcome the problem of credit card fraud identification by combining Deep Learning with Machine Learning techniques. To reduce the number of false negatives, the study performed data-matching trials with the implementation of Deep Learning techniques. Utilizing the suggested strategy, it is possible to locate Credit Card Fraud (CCF) remotely from any location. The next study {8} discusses the problem of credit card fraud detection, which includes modelling past credit card transactions with the data of those that turned out to be fraudulent; the model is then used to determine whether new transactions are fraudulent or not. In this method, the focus was on the analysis and preprocessing of several anomaly detection algorithms and data sets, such as the "local outlier factor" and "isolation forest" algorithms, applied to PCA-transformed credit card transaction data.

The next study {9} shows how algorithms like Logistic Regression and Random Forest were used in creating Fraud Fort, an advanced system designed to detect credit card fraud. The study illustrates the efficacy of integrating both models in Fraud Fort. The results indicate the combined advantages of Logistic Regression and Random Forest, so that the fraud detection system can become strong, eventually leading to a more secure and reliable economic ecosystem. The next study {10} depicts how popular supervised and unsupervised machine learning algorithms have been applied to detect credit card fraud in a highly imbalanced dataset. It was found that unsupervised machine learning algorithms can handle the skewness and give the best classification results. The next study {11} is about how financial institutions aim to secure credit card transactions and allow their customers to use e-banking services safely and efficiently. To reach this goal, they try to develop more relevant fraud detection techniques that can identify more fraudulent transactions and decrease fraud. The study defines the fundamental aspects of fraud detection, the current systems of fraud detection, the issues and challenges of fraud related to the banking sector, and the existing solutions based on machine learning techniques. The next study {12} implements six widely used machine learning techniques for credit card fraud detection. Their efficacy is analysed based on parameters such as accuracy, precision, recall, specificity, misclassification, and F1 score. Results show that machine learning techniques are helpful for credit card fraud detection, and the authors strongly recommend using multiple machine learning techniques to detect fraud before it occurs or while it is in progress. The next study {13} performs a comparative experimental study to detect credit card fraud, as well as to tackle the imbalanced classification problem by applying different machine learning algorithms for handling imbalanced datasets. The study shows that there is no need to process the imbalanced dataset with resampling techniques in order to measure the performance of the classifiers; it is sufficient to measure the performance through three performance measurements (Accuracy, Sensitivity, and Area Under the Precision/Recall Curve (PRC)) to prove the accuracy of the classification prediction.

III. MODULE DESCRIPTION

This section provides an overview of the different components that make up the fraud detection system. Each module is responsible for a specific task, ensuring the pipeline runs smoothly from raw data input to fraud classification. Explained below are the different modules and their key purposes:

A. Data Preparation and Preprocessing
This is the starting or initial phase of the process. It covers the steps given below:

Import Libraries: Loads essential Python libraries such as NumPy, Pandas, Scikit-Learn, Matplotlib, and Seaborn for data handling, model training, and visualization.
Load Dataset: Ensures the correct dataset is loaded for all the algorithms so that all of their models are trained on the same dataset. We make use of a balanced dataset with an equal share of fraud and legitimate records. The dataset is in the form of a CSV file and contains over half a million rows of data, with many of the sensitive columns given as PCA-transformed values.
Preprocess Data: Handles instances of missing data points with mean, median, or mode replacement, performs encoding, and detects outliers using methods like Z-score analysis.
Feature Scaling: Standardises or normalises the data to a uniform scale across all features, which aids convergence in machine learning models.
Data Splitting: Uses splitting functions like train_test_split to split the dataset into training and testing data in a specified ratio; in our case, we have used 80% of the dataset for training and 20% to test against the trained and fitted model.
ROC AUC Score and curve plotting: Computes and plots the Receiver Operating Characteristic curve and the Area Under Curve score. This gives us an understanding of the trade-off between sensitivity (recall) and specificity.
Determining Optimal Threshold and Prediction Recalculation: Identifies the best classification threshold for improving model accuracy, adjusts predictions based on the optimal threshold, and re-evaluates performance.
Calculating Optimal Test Accuracy: Measures the new accuracy score after threshold optimisation.
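To make the preparation and evaluation steps above concrete, a minimal sketch is given below. The file name creditcard_balanced.csv, the label column "Class", and the use of Logistic Regression as a stand-in classifier are illustrative assumptions, not the exact configuration used in this study.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score, accuracy_score

# Load the balanced, PCA-transformed dataset (file and column names assumed)
df = pd.read_csv("creditcard_balanced.csv")
df = df.fillna(df.median(numeric_only=True))   # simple missing-value handling

X = df.drop(columns=["Class"])
y = df["Class"]                                # 1 = fraud, 0 = legitimate (assumed)

# 80/20 split, stratified so the fraud/legitimate share stays equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Feature scaling to aid convergence
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Stand-in classifier; any of the algorithms compared in this study could be used here
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

# ROC curve, AUC score, and optimal threshold (Youden's J statistic)
fpr, tpr, thresholds = roc_curve(y_test, proba)
auc = roc_auc_score(y_test, proba)
best_threshold = thresholds[np.argmax(tpr - fpr)]
y_pred_opt = (proba >= best_threshold).astype(int)

print("ROC-AUC:", auc)
print("Optimal threshold:", best_threshold)
print("Optimal test accuracy:", accuracy_score(y_test, y_pred_opt))

plt.plot(fpr, tpr, label=f"AUC = {auc:.3f}")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Sensitivity)")
plt.legend()
plt.show()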
E. Data Insights and Visualisation
The following steps involve the use of visual representations to analyse key data characteristics.

Final prediction:
Backpropagation updates weights using gradient descent.

D. Gaussian Naive Bayes
A probabilistic classifier based on Bayes' theorem. Assumes independence among features and a normal (Gaussian) distribution. Mathematical formulation:

Given features $x = (x_1, \dots, x_n)$, the predicted class is
$\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)$

Gaussian probability for a feature:
$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$

G. XGBoost
Gradient Boosting algorithm optimized for speed and accuracy. Uses tree pruning and regularization to prevent overfitting. Mathematical formulation:

Boosted trees minimize the regularized loss:
$\mathcal{L} = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}$

H. Random Forest
Ensemble of Decision Trees. Reduces variance compared to a single tree. Mathematical formulation:

Predictions are aggregated across trees:
$\hat{y} = \operatorname{majority\ vote}\{\,h_1(x), h_2(x), \dots, h_T(x)\,\}$

Gini Impurity:
$G = 1 - \sum_{c} p_c^{2}$

Entropy:
$H = -\sum_{c} p_c \log_2 p_c$

J. K-Nearest Neighbours
Classifies based on the majority of the K nearest data points. Mathematical formulation:

Distance Metric (Euclidean):
$d(x, x') = \sqrt{\sum_{i} (x_i - x'_i)^{2}}$

L. Logistic Regression
Predicts probability using the sigmoid function. Mathematical formulation:

Sigmoid Function:
$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad P(y = 1 \mid x) = \sigma(w^{\top} x + b)$
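As a hedged illustration only, the classifiers formulated above map onto standard library implementations as shown below; the hyperparameters are common defaults rather than the exact settings used in this research, the xgboost package is assumed to be installed, and X_train/X_test refer to the split from the earlier preprocessing sketch.

from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier   # assumes the separate xgboost package

models = {
    "Gaussian Naive Bayes": GaussianNB(),
    "XGBoost": XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss"),
    "Random Forest": RandomForestClassifier(n_estimators=200, criterion="gini"),
    "K-Nearest Neighbours": KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Every model exposes the same fit/score interface, so each is trained and
# evaluated on the identical train/test split used throughout the study.
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))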
A factor that affects the scale at which research can be done is resource availability and feasibility; the table given below addresses the cost comparison of the interactive environments.

The environment of choice must have acceptable levels of support for libraries and frameworks, so that we can ensure that all of our algorithms are able to run and provide us with models.

Another major factor that will produce different values for us, depending on which environment is chosen, is the performance level at which each environment operates; the comparison of performance capabilities is given below.

We primarily use and refer to the data and results we obtain from Jupyter Notebook because it caters to the needs of the research team: it is best for local execution, has a persistent environment, gives full control over dependencies, and is more power efficient on our systems.

V. RESULTS AND EVALUATION

We have done a 2-stage analysis and evaluation of all the algorithms based on their output values for each performance evaluation metric. In Stage 1, we simply compared all 12 algorithms based on their training, testing, and overall accuracies and their fit time. We managed to eliminate 7 algorithms from the running for multiple reasons, such as being unable to finish processing (LRM, SVM, MLPC); we also eliminated algorithms based on overfitting (XGBoost, Decision Trees, Random Forest, Extra Trees Classifier). This left us with 5 remaining algorithms to compare in Stage 2 using more metrics.
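A minimal sketch of this kind of Stage 1 screening is shown below; it uses cross-validation to collect training accuracy, testing accuracy, and fit time for a few candidates, with the model list kept short for illustration rather than covering all 12 algorithms.

from sklearn.model_selection import cross_validate
from sklearn.linear_model import SGDClassifier
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

candidates = {
    "SGD": SGDClassifier(loss="log_loss"),
    "AdaBoost": AdaBoostClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Gaussian NB": GaussianNB(),
}

for name, clf in candidates.items():
    cv = cross_validate(clf, X_train, y_train, cv=5,
                        scoring="accuracy", return_train_score=True)
    # A large gap between train and test accuracy flags overfitting,
    # which was one of the Stage 1 elimination criteria.
    print(f"{name}: train={cv['train_score'].mean():.3f} "
          f"test={cv['test_score'].mean():.3f} "
          f"fit_time={cv['fit_time'].mean():.2f}s")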
Stage 2 is the phase where we compare the algorithms that did not get eliminated in Stage 1. Here, we compare the performance of the algorithms' models across a more sensitive and larger set of evaluation metrics. We must also consider over-fitting for most of these metrics.

A. Highly Crucial Metrics
Four basic rates are used in evaluating the experiments, namely the True Positive Rate (TPR), True Negative Rate (TNR), False Positive Rate (FPR), and False Negative Rate (FNR).
Precision:
$\text{Precision} = \frac{TP}{TP + FP}$

F1-score:
$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$

Sensitivity (Recall):
$\text{Sensitivity} = \frac{TP}{TP + FN}$

Matthews Correlation Coefficient:
$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$

ROC-AUC: the area under the curve obtained by plotting the TPR against the FPR across all classification thresholds.
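These quantities can be computed directly from the test-set predictions; the snippet below is a sketch using scikit-learn's metric functions, assuming the y_test, y_pred_opt, and proba variables from the earlier evaluation sketch.

from sklearn.metrics import (precision_score, f1_score,
                             matthews_corrcoef, roc_auc_score, confusion_matrix)

tn, fp, fn, tp = confusion_matrix(y_test, y_pred_opt).ravel()
print("TPR (sensitivity):", tp / (tp + fn))
print("TNR (specificity):", tn / (tn + fp))
print("Precision:", precision_score(y_test, y_pred_opt))
print("F1-score:", f1_score(y_test, y_pred_opt))
print("MCC:", matthews_corrcoef(y_test, y_pred_opt))
print("ROC-AUC:", roc_auc_score(y_test, proba))   # uses predicted probabilities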
B. Moderately Important Metrics
Consists of:

Balanced Classification Rate (BCR), which ensures the model performs well across both classes (legitimate & fraud) and also balances sensitivity and specificity.
Cohen's Kappa measures agreement between predicted & actual fraud cases and accounts for chance agreement.
Log Loss penalizes incorrect confident predictions. Helps in optimizing probabilistic models like Logistic Regression & Neural Networks.
Average Precision (AP Score) summarizes the precision-recall tradeoff at different thresholds. Useful for comparing models.
Optimal Threshold, based on the type of experimental setup being made, can matter even more than Cohen's Kappa, which is true in our case.
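These moderately important metrics also have direct library counterparts; the sketch below uses the same assumed variables as the earlier snippets, with balanced accuracy standing in for the Balanced Classification Rate.

from sklearn.metrics import (balanced_accuracy_score, cohen_kappa_score,
                             log_loss, average_precision_score)

print("Balanced Classification Rate:", balanced_accuracy_score(y_test, y_pred_opt))
print("Cohen's Kappa:", cohen_kappa_score(y_test, y_pred_opt))
print("Log Loss:", log_loss(y_test, proba))
print("Average Precision:", average_precision_score(y_test, proba))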
C. Less Important Metrics
These metrics are still useful and referred to, but in comparison to the other metrics their significance diminishes greatly in the case of a credit card fraud detection system:

Training Accuracy and Testing Accuracy can be misleading on an imbalanced dataset and are overall not crucial.
Optimal Test Accuracy is not compulsory either, because it does not pertain specifically to our needs in fraud detection.

The remainder of the metrics are not of utmost importance in the case of our experiment, which pertains to credit card fraud detection. We now switch over to representing the usable levels of our algorithms in the form of graphs.
Stochastic Gradient Descent (SGD)

Strengths:
Efficient for large datasets.
Works well for high-dimensional data.
Fast training time.

Weaknesses:
Highly sensitive to learning rate and hyperparameters.
Low accuracy, precision, and recall.
Does not generalize well to complex fraud detection patterns.

With the results we have tallied for each of the algorithms' models, we can come up with a table that sums up the best points and the impending drawbacks of each algorithm with respect to our research on which would be ideal for a credit card fraud detection system. We will discuss alternative approaches alongside picking the best one for the job, while catering to industry metrics and values: we must ensure that we do not simply follow the largest, most pleasing values, but rather those that fall within the industry-standard ranges for what the individual values must be.
Combining SGD and AdaBoost is a promising approach because it balances generalization (SGD) with high accuracy (AdaBoost). This type of hybrid model can leverage the strengths of both algorithms:

SGD: Works well with large datasets, avoids overfitting, and complies with industry standards.
AdaBoost: Offers high accuracy, strong recall, and precision for detecting fraudulent cases.

Now we shall further discuss how to approach this idea of combining the two algorithms and their use cases.

C. Combining SGD with AdaBoost
Now that we have established that SGD gives us the most stable values and a very small margin of error, considering its immensely low Log Loss value in comparison to the other algorithms present in this analysis, we choose it as a potential model for a credit card fraud detection system alongside a model that works with far more efficiency and a better handle on imbalanced data, that is, AdaBoost.

This approach ensures workable security in a real-life scenario, where the industry-acceptable fit values of SGD ensure reliability and no over-fitting, combined with the high levels of precision, sensitivity, and imbalanced-data handling ability of the AdaBoost classifier.

Given below are some methods by which we can combine the two algorithms to make the most of their strong points while having minimal problems in implementation complexity:

Boosting SGD as a Weak Learner:
AdaBoost traditionally works with weak classifiers (like decision stumps), but we can use SGD as a weak learner. Since SGD is fast and performs well in high-dimensional spaces, we can apply AdaBoost to iteratively improve it:

Step 1: Train an SGD model on the dataset.
Step 2: Use AdaBoost to assign more weight to misclassified samples.
Step 3: Boost multiple weak SGD classifiers into a stronger ensemble model.
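A hedged sketch of this boosting setup is given below; it assumes a recent scikit-learn (where the constructor argument is named estimator rather than base_estimator) and uses a log-loss SGD so the weak learner exposes class probabilities. The hyperparameters shown are illustrative, not tuned values.

from sklearn.linear_model import SGDClassifier
from sklearn.ensemble import AdaBoostClassifier

# SGD with a probabilistic (log) loss acts as the weak learner; each boosting
# round reweights the samples the previous SGD models misclassified.
weak_sgd = SGDClassifier(loss="log_loss", max_iter=1000, tol=1e-3)

boosted_sgd = AdaBoostClassifier(
    estimator=weak_sgd,      # named base_estimator in older scikit-learn versions
    n_estimators=50,
    learning_rate=0.5,
)

boosted_sgd.fit(X_train, y_train)
print("Boosted SGD test accuracy:", boosted_sgd.score(X_test, y_test))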
Pros:
Retains SGD's generalization power, reducing overfitting.
Boosts performance where SGD alone struggles.

Cons:
Training time is higher due to boosting multiple SGD classifiers.

Hybrid Stacking Model:
Instead of boosting alone, we can train SGD and AdaBoost separately, then use a meta-classifier (like Logistic Regression or a simple Neural Network) to make the final decisions (a hedged sketch of this stacking setup is given at the end of this section):

Model 1: Train an SGD classifier to capture generalization and prevent overfitting.
Model 2: Train an AdaBoost classifier to maximize recall and precision.
Meta-Classifier: Combine predictions from both models to make a final decision.

Pros:
Balances bias and variance (SGD reduces overfitting, AdaBoost improves accuracy).
Industry compliance while boosting detection power.

Cons:
More computationally expensive (training two models).

… AdaBoost's high accuracy, recall, and fraud detection capability.

Table 15: Conditional Approach Table
Condition                       | Approach
If computation is not an issue  | Stacking (SGD + AdaBoost + Meta-Classifier)
If efficiency is required       | Dynamic Switching (SGD for general cases, AdaBoost for high-risk cases)

FUTURE SCOPE

Combining SGD (Stochastic Gradient Descent) and AdaBoost (Adaptive Boosting) for credit card fraud detection creates a balanced and robust system, leveraging SGD's compliance with industry standards and AdaBoost's high accuracy in complex data scenarios. Below are the key future scopes for such a system:

A. Improved Fraud Detection Efficiency

The hybrid model can adaptively learn from new fraud patterns while avoiding overfitting, ensuring long-term effectiveness.
SGD prevents overfitting, keeping the model aligned with real-world data, while AdaBoost enhances feature selection and identifies subtle fraud patterns.
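To close, here is a hedged sketch of the stacking approach referenced above, combining an SGD model and an AdaBoost model under a Logistic Regression meta-classifier with scikit-learn's StackingClassifier; the hyperparameters are illustrative assumptions rather than tuned values, and X_train/X_test are the split from the earlier sketches.

from sklearn.ensemble import StackingClassifier, AdaBoostClassifier
from sklearn.linear_model import SGDClassifier, LogisticRegression

stack = StackingClassifier(
    estimators=[
        ("sgd", SGDClassifier(loss="log_loss", max_iter=1000)),  # generalization, low overfitting
        ("ada", AdaBoostClassifier(n_estimators=100)),           # high recall/precision on fraud cases
    ],
    final_estimator=LogisticRegression(max_iter=1000),           # meta-classifier for the final decision
    stack_method="predict_proba",
    cv=5,
)

stack.fit(X_train, y_train)
print("Stacked model test accuracy:", stack.score(X_test, y_test))

The dynamic-switching option from Table 15 could be layered on top of either model by routing low-risk transactions to the lightweight SGD model and escalating high-risk ones to AdaBoost.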