0% found this document useful (0 votes)
8 views

Credit Card Fraud Detection Using Machine Learning

The document presents a research project on credit card fraud detection using machine learning techniques, highlighting the inadequacies of traditional rule-based systems in adapting to evolving fraud patterns. It details the implementation of various algorithms, including K-Nearest Neighbors, Logistic Regression, Support Vector Machine, and Decision Tree, to improve detection rates while addressing challenges such as class imbalance and real-time prediction. The study aims to develop a scalable and explainable system that enhances financial fraud prevention through effective machine learning applications.

Uploaded by

Dani Jojo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views

Credit Card Fraud Detection Using Machine Learning

The document presents a research project on credit card fraud detection using machine learning techniques, highlighting the inadequacies of traditional rule-based systems in adapting to evolving fraud patterns. It details the implementation of various algorithms, including K-Nearest Neighbors, Logistic Regression, Support Vector Machine, and Decision Tree, to improve detection rates while addressing challenges such as class imbalance and real-time prediction. The study aims to develop a scalable and explainable system that enhances financial fraud prevention through effective machine learning applications.

Uploaded by

Dani Jojo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 11

International Journal of Scientific Research in Engineering and Management (IJSREM)

Volume: 09 Issue: 05 | May - 2025 SJIF Rating: 8.586 ISSN: 2582-3930

CREDIT CARD FRAUD DETECTION USING MACHINE LEARNING


Hemalatha D1,Jarom Jerwin P2,Genga Ganesh L3,Ahamed Abdul Kadhar J4,Magesh Kumar P5
1
Assistant Professor -Department of Information Technology & Kings Engineering College-India.
2,3,4,5
Department of Information Technology & Kings Engineering College-India

---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - With the rise in online transactions and  Lost/Stolen Card Fraud: Physical possession of the
digital banking, credit card fraud has emerged as a critical card is misused.
threat to financial institutions and customers. Traditional rule-
 Phishing and Social Engineering: Users are tricked
based systems fail to adapt to evolving fraud patterns,
into revealing sensitive information.
resulting in poor detection rates and financial losses.This
project presents a credit card fraud detection system using  Skimming: Card information is captured using
machine learning algorithms—K-Nearest Neighbors (KNN), devices at ATMs or terminals.
Logistic Regression, Support Vector Machine (SVM), and
Decision Tree. We utilized a publicly available dataset of real While many fraud detection systems are already in place,
credit card transactions, applied preprocessing and balancing traditional rule-based approaches are insufficient. They
techniques to address class imbalance, and trained each model operate on pre-defined rules (e.g., "flag all transactions over
on the transformed data.Model performance was evaluated $5,000 from foreign IPs") which:
using accuracy, precision, recall, and F1-score. The Decision
Tree and SVM classifiers demonstrated high recall values,  Fail to adapt to evolving fraud patterns.
suitable for minimizing false negatives in fraud detection.
 Often result in high false positive rates.
This research contributes to financial fraud prevention by
implementing efficient ML techniques to detect fraudulent  Struggle with scalability and non-linear patterns.
behavior in real-time.
Furthermore, the class imbalance problem poses a
Keywords: Credit card fraud, machine learning, KNN, SVM, substantial challenge—fraudulent transactions are often <1%
Logistic Regression, Decision Tree, anomaly detection, of total data. Standard classifiers become biased toward the
imbalanced data majority (legitimate transactions), leading to high accuracy
but poor fraud detection.

Therefore, the key challenges are:


1.INTRODUCTION
 Detecting fraud in real-time.
1.1 Problem Statement
 Minimizing false negatives, which directly result in
The rapid digital transformation across the globe has financial loss.
revolutionized the financial industry. With increasing
adoption of online banking, e-commerce, and mobile  Handling massive, imbalanced datasets.
payments, the use of credit cards has skyrocketed. While this
convenience benefits consumers and businesses, it has also  Ensuring the explainability and fairness of decisions.
created significant vulnerabilities—particularly with respect to
The emergence of machine learning (ML) offers a robust
fraud. Credit card fraud refers to unauthorized use of credit
alternative. ML models can learn complex patterns from
card information for transactions without the owner's consent.
historical data, adapt to new threats, and predict anomalies
Such fraud has become a global menace, with the Nilson
effectively. This project proposes the design and development
Report (2023) estimating annual losses surpassing $35 billion
of a machine learning-based fraud detection framework
worldwide.
tailored for real-world financial environments.
The types of fraud are diverse:

 Identity Theft: Criminals use stolen credentials to


open or access accounts.

 Card-Not-Present Fraud: Fraudulent transactions


done online or over the phone.
Fig.1.Methodology diagram

© 2025, IJSREM | www.ijsrem.com DOI: | Page 1


International Journal of Scientific Research in Engineering and Management (IJSREM)
Volume: 09 Issue: 05 | May - 2025 SJIF Rating: 8.586 ISSN: 2582-3930

Specific Objectives:

1.2 Motivation 1. Design an ML pipeline from data acquisition and


preprocessing to evaluation.
The motivation behind this project is twofold:
practical significance and technological opportunity. 2. Apply and compare multiple ML classifiers (KNN,
Logistic Regression, SVM, Decision Tree).

3. Use SMOTE to address data imbalance effectively.


Practical Importance:
4. Analyze and visualize model performance using
 Rising Fraud Rates: With increased digital payment precision, recall, F1-score, ROC/PR curves.
adoption post-COVID-19, cyber fraud has also
increased. Financial institutions are constantly under 5. Deploy the best model using Flask or FastAPI for
attack from sophisticated fraudsters using real-time prediction.
automation, bots, and even AI.
6. Ensure interpretability using explainability tools like
 Customer Trust: False alarms (false positives) SHAP.
annoy genuine users, while undetected fraud (false
negatives) erodes trust and causes financial damage. 7. Incorporate fairness, data privacy, and ethical
safeguards into the system.
 Regulatory Pressure: Governments and regulatory
bodies mandate secure, transparent systems for 8. Explore integration and future scalability, including
customer protection (e.g., GDPR, RBI, GLBA). real-time streaming and ensemble models.

Technological Motivation: These objectives are focused not only on algorithmic


performance but also on real-world applicability.
 Machine Learning Capabilities: ML can detect
subtle correlations and anomalies in high-
dimensional data.

 Open Datasets and Tooling: Availability of


benchmark datasets (e.g., Kaggle), libraries (scikit-
learn, XGBoost), and tools (Jupyter, Streamlit) make
experimentation and deployment accessible.

 Explainability Techniques: SHAP and LIME


provide the transparency needed in regulated
environments.

Academic Motivation: Fig.2.DfD Diagram

 Contributes to the growing literature on practical ML 1.4 Expected Outcomes


applications.
At the end of the project, the following deliverables
 Serves as a proof-of-concept for integrating classical are expected:
algorithms in operational systems.
1. A Functional Prototype
This project aligns with the broader goal of creating
intelligent, responsive, and responsible AI systems in fintech.  A web service/API that can receive transaction inputs
and return fraud predictions.
1.3 Objectives
 Integration of a simple dashboard (e.g., Streamlit or
This project has the following major and minor objectives: Power BI) for fraud analysis.

Primary Goal: 2. Comparative Performance Report

To develop, evaluate, and deploy a machine learning-  Tabulated and visual comparison of models across
based credit card fraud detection system that is scalable, key metrics.
explainable, and capable of real-time fraud prediction.

© 2025, IJSREM | www.ijsrem.com DOI: | Page 2


International Journal of Scientific Research in Engineering and Management (IJSREM)
Volume: 09 Issue: 05 | May - 2025 SJIF Rating: 8.586 ISSN: 2582-3930

 Confusion matrix and ROC/PR curves for in-depth  The dataset is anonymized, limiting feature
analysis. engineering.

3. Deployment Strategy  Real-world fraud may involve behavioral/contextual


features (IP, device ID) which are absent here.
 Documentation and architecture for integrating the
ML model into existing banking systems.  The model may degrade over time due to concept
drift.
 Deployment via Docker/Kubernetes for scalability.
 Interpretability may be limited for some models like
4. Explainability Module SVM.

 SHAP or LIME-based visualizations to justify model


2. PROPOSED SYSTEM
predictions to analysts or regulators.
This chapter presents the design and development of the
5. Research Contribution
credit card fraud detection system based on machine learning
 A comprehensive academic report for use in future techniques. The proposed system is structured to ensure data
studies or implementation. quality, address data imbalance, optimize predictive accuracy,
and support real-time fraud detection requirements.
 Possibility of journal publication or conference
submission. 2.1 System Architecture

6. Ethical and Legal Framework The system architecture of a credit card fraud
detection platform must be designed to process large volumes
 A checklist of compliance with GDPR, GLBA, of transaction data rapidly and accurately while being capable
DPDPA, and other relevant laws. of adapting to evolving fraud patterns. The proposed
architecture for this project is structured into modular layers,
 Bias mitigation strategies and fairness metrics. each with a dedicated function to ensure efficiency,
scalability, and maintainability.
1.5 Scope of the Project
2.1.1. Data Acquisition Layer
The scope includes:
This layer is responsible for collecting transaction data in
 Only binary classification (fraud vs genuine).
real-time or batch mode from multiple sources such as:
 Use of supervised learning techniques.
 Bank transaction databases.
 Focus on classical ML algorithms (not deep
 Third-party APIs.
learning).
 Payment gateways.
 Use of a publicly available Kaggle dataset.
The system must support both streaming data (real-time
 Real-time prediction via REST API.
transactions) and historical data (for training and evaluation).
 Exclusion of financial or legal liability from This layer ensures that the data is captured securely and with
prediction outcomes. minimal latency.

1.6 Assumptions and Limitations 2.1.2. Data Processing Layer

Assumptions: Once data is acquired, it undergoes preprocessing before


being fed into any model. This includes:
 The dataset is representative of real-world behavior.
 Data Cleaning: Handling missing values, removing
 Fraud patterns are somewhat learnable from duplicates, and treating outliers.
historical data.
 Feature Scaling: Normalizing 'Time' and 'Amount' to
 Transaction features remain similar over time. ensure all features contribute equally.

Limitations:

© 2025, IJSREM | www.ijsrem.com DOI: | Page 3


International Journal of Scientific Research in Engineering and Management (IJSREM)
Volume: 09 Issue: 05 | May - 2025 SJIF Rating: 8.586 ISSN: 2582-3930

 Encoding: Although the original dataset contains This layer ensures seamless integration with banking
numerical values (PCA-transformed), any additional systems and enables real-time decision-making.
categorical data can be one-hot encoded.
2.1.6. Monitoring and Feedback Layer
 Data Balancing: Using SMOTE to address the class
imbalance by generating synthetic examples of To maintain accuracy over time, models must adapt. This
minority (fraud) class. layer:

This layer transforms raw data into a clean, consistent,  Monitors prediction accuracy and drift in
and structured format suitable for modeling. transaction patterns.

2.1.3. Modeling Layer  Enables periodic retraining using labeled data.

This is the core layer where machine learning algorithms  Collects user feedback from analysts for
are applied. The models implemented in this project include: continuous learning.

 K-Nearest Neighbors (KNN). This feedback loop transforms the system into a self-
improving fraud detection platform.
 Logistic Regression.
3.METHODOLOGY
 Support Vector Machine (SVM).
This chapter describes the step-by-step methodology followed
 Decision Tree Classifier. for developing, training, and evaluating the credit card fraud
detection models. It includes dataset details, preprocessing
Each model is trained on the preprocessed and balanced strategies, model selection, hyperparameter tuning, and
dataset. Cross-validation and hyperparameter tuning are evaluation criteria.
applied to optimize performance. The layer also includes
version control and model validation mechanisms. 3.1 Dataset Description

2.1.4. Evaluation Layer The dataset used in this study is sourced from
Kaggle's publicly available Credit Card Fraud Detection
After training, models are evaluated based on multiple repository, originally made available by a European card
metrics: issuer. This dataset has become a standard benchmark for
evaluating fraud detection models due to its real-world origin
 Accuracy
and the challenges it presents, such as extreme class
 Precision imbalance and anonymized features.

 Recall Key Characteristics:

 F1-Score  Total Transactions: 284,807

 ROC-AUC  Fraudulent Transactions: 492 (approximately


0.17%)
This layer generates performance reports and
visualizations (e.g., confusion matrices, ROC curves),  Genuine Transactions: 284,315
allowing comparison and selection of the best-performing
 Features: 30 in total:
model.
o V1 to V28: These are principal components
2.1.5. Deployment Layer
obtained using Principal Component
Once a model is selected, it is deployed using: Analysis (PCA). The actual feature names
are not disclosed due to privacy and
 Flask or Django (Python web frameworks) for confidentiality concerns.
REST API serving.
o ‘Time’: Represents the time elapsed in
 Kafka or AWS Kinesis for transaction stream seconds between each transaction and the
integration. first transaction in the dataset.

 Dashboards (e.g., Grafana, Power BI) for fraud o ‘Amount’: The transaction amount in euros.
analysts.

© 2025, IJSREM | www.ijsrem.com DOI: | Page 4


International Journal of Scientific Research in Engineering and Management (IJSREM)
Volume: 09 Issue: 05 | May - 2025 SJIF Rating: 8.586 ISSN: 2582-3930

o ‘Class’: The target variable (0 = genuine, 1  SMOTE was applied only to the training set to avoid
= fraud). data leakage.

Reasons for Dataset Selection:  We used imblearn's SMOTE() function in Python.

 It represents real-world conditions with naturally 3.2.2 Feature Scaling


occurring fraud patterns.
While PCA-transformed features (V1–V28) are
 The extreme imbalance mimics actual industry data. already standardized, the ‘Time’ and ‘Amount’ features are
not. Since algorithms like KNN and SVM are sensitive to
 Anonymized yet rich feature set allows feature scale, normalization is crucial.
experimentation without breaching privacy.
Steps Taken:
Although anonymization limits interpretability, the dataset
still enables the construction and validation of high-  StandardScaler from sklearn.preprocessing was used
performance machine learning models. to scale ‘Time’ and ‘Amount’.

3.2 Data Preprocessing  The transformation ensures these features have zero
mean and unit variance.
Preprocessing is a critical step in any machine
learning project. It ensures that the data fed into algorithms is Why not normalize all features?
clean, consistent, and optimized for learning. For fraud The PCA components already have unit variance due to
detection, preprocessing must also tackle class imbalance and the nature of dimensionality reduction. Re-scaling them could
feature scale issues. distort their meaning.

3.2.1 Handling Imbalance: SMOTE 3.2.3 Train-Test Split

The original dataset contains 492 fraudulent For supervised learning, it is essential to evaluate the
transactions out of 284,807, making it highly imbalanced. A model’s performance on unseen data. Therefore, we split the
naïve model trained on this data may classify all transactions dataset as follows:
as genuine and still achieve 99.8% accuracy—yet such a
model would be useless in practice.  Training Set: 70%

To address this, we apply the Synthetic Minority  Testing Set: 30%


Oversampling Technique (SMOTE).
Stratification was applied to ensure that both classes
What is SMOTE? (fraud and genuine) are proportionally represented in both
SMOTE creates synthetic samples for the minority class sets.
(fraud) by:
Reasoning:
 Selecting a minority class instance.
 Prevents overfitting by isolating a portion of the data
 Identifying its k nearest minority neighbors. for unbiased testing.

 Generating a new sample along the line  Stratification prevents the testing set from being
segment connecting the selected sample and dominated by the majority class.
its neighbors.
Further Improvements:
Advantages: For final evaluation, k-fold cross-validation was used
during model training to ensure robustness and
 Reduces class imbalance without duplicating data. generalizability.

 Preserves important characteristics of fraudulent 3.2.4 Additional Preprocessing Steps


transactions.
 Null Checks: No missing values detected.
 Improves recall and F1-score by enabling models to
learn fraud patterns more effectively.  Duplicate Removal: Verified via hashing—none
detected.
Implementation:

© 2025, IJSREM | www.ijsrem.com DOI: | Page 5


International Journal of Scientific Research in Engineering and Management (IJSREM)
Volume: 09 Issue: 05 | May - 2025 SJIF Rating: 8.586 ISSN: 2582-3930

 Correlation Heatmaps: No strong multicollinearity  Penalty: L2 regularization (Ridge)


due to PCA transformation.
Pros:
 Outlier Handling: Not required due to data
normalization.  Fast and efficient.

3.3 Algorithms Used  Produces interpretable outputs (probabilities).

We selected four widely recognized classification  Works well on linearly separable data.
algorithms to compare their effectiveness in detecting credit
Cons:
card fraud. Each has different strengths and computational
characteristics.  Less effective with non-linear data.

 Might require manual feature transformations.


3.3.1 K-Nearest Neighbors (KNN) Use Case in Fraud:
Highly suited for real-time systems where decisions must
Description:
be explained to regulators or auditors.
KNN is a lazy learning algorithm. It makes predictions
by calculating the distance between the query point and all 3.3.3 Support Vector Machine (SVM)
instances in the training set, choosing the most frequent label
among the k-nearest neighbors. Description:
SVM seeks the optimal hyperplane that separates the
Hyperparameters: classes in feature space. It uses kernel functions to handle
non-linear boundaries.
 k = 5 (number of neighbors)
Hyperparameters:
 Distance Metric: Euclidean
 Kernel: Radial Basis Function (RBF)
Pros:
 C (Regularization): 1.0
 Simple, non-parametric.
Pros:
 Performs well when the decision boundary is
irregular.  Strong performance in high-dimensional spaces.
Cons:  Effective at minimizing false negatives.
 Computationally expensive at inference time. Cons:
 Performance degrades with high-dimensional or  Computationally expensive to train.
noisy data.
 Difficult to interpret.
Use Case in Fraud:
Useful as a benchmark. Despite its simplicity, it Use Case in Fraud:
sometimes identifies subtle fraud clusters not detected by Excellent for batch fraud detection scenarios where
linear models. latency is acceptable and accuracy is paramount.

3.3.2 Logistic Regression 3.3.4 Decision Tree

Description: Description:
A linear model that estimates the probability that a A hierarchical structure that splits the data based on
given input belongs to the positive class using a logistic feature values, leading to decisions at the leaf nodes.
function. It is particularly suited for binary classification
problems. Hyperparameters:

Hyperparameters:  Criterion: Gini impurity

 Solver: liblinear  Max Depth: None (splits until leaves are pure)

Pros:

© 2025, IJSREM | www.ijsrem.com DOI: | Page 6


International Journal of Scientific Research in Engineering and Management (IJSREM)
Volume: 09 Issue: 05 | May - 2025 SJIF Rating: 8.586 ISSN: 2582-3930

 Fast training and prediction.  Recall is critical to reduce undetected frauds.

 Easy to interpret and visualize.  Precision ensures that flagged transactions are indeed
fraudulent.
Cons:
 F1-Score balances both.
 Prone to overfitting.
 AUC-ROC evaluates performance across thresholds.
 Less stable (small data changes can cause large tree
structure changes). 3.4.1 Confusion Matrix

Used to visualize:

Use Case in Fraud:  How many frauds were correctly/incorrectly


Useful in decision support systems where analysts predicted
must understand the rationale behind model predictions.
 Effectiveness of model classification boundaries
3.4 Evaluation Metrics
3.4.2 Cross-Validation Strategy
Evaluating fraud detection models requires metrics
beyond accuracy due to class imbalance. In this project, we k-fold Cross-Validation (k = 5) was used during model
employed the following metrics: training to:

Metric Formula Purpose  Avoid overfitting

 Ensure generalization across different data subsets


Overall correctness (but
(TP + TN) / (TP + TN
Accuracy misleading for imbalanced 34.3 Threshold Tuning
+ FP + FN)
data)
The default decision threshold (0.5) was adjusted based on
Proportion of positive Precision-Recall tradeoffs using ROC and PR curves.
Precision TP / (TP + FP)
predictions that were correct
4.5 Tools and Frameworks
Proportion of actual frauds
Recall TP / (TP + FN)
that were detected Function Tool Used

2 × (Precision × ML Modeling scikit-learn


Harmonic mean of Precision
F1-Score Recall) / (Precision +
and Recall Data Preprocessing pandas, numpy
Recall)

Trade-off between True Oversampling imbalanced-learn


AUC- Area under the ROC
Positive Rate and False
ROC Curve Visualization matplotlib, seaborn
Positive Rate

Experiment Tracking MLflow (optional)

Where: IDE Jupyter Notebook, VSCode

 TP: True Positives (correct fraud detections)

 TN: True Negatives (correct non-fraud detections) 4.RESULTS AND DISCUSSION


 FP: False Positives (legitimate transactions flagged The goal of this chapter is to present a comprehensive
as fraud) evaluation of the machine learning models applied to the
credit card fraud detection problem. We assess each model
 FN: False Negatives (missed frauds)
using a variety of performance metrics, compare their
Why Use Multiple Metrics? strengths and limitations, analyze confusion matrices, and
discuss their suitability for real-world deployment. Special
 Accuracy may be high even when the model fails to attention is given to minimizing false negatives while
detect fraud. maintaining acceptable false positive rates.

© 2025, IJSREM | www.ijsrem.com DOI: | Page 7


International Journal of Scientific Research in Engineering and Management (IJSREM)
Volume: 09 Issue: 05 | May - 2025 SJIF Rating: 8.586 ISSN: 2582-3930

4.1 Overall Results 4.2.1 Precision-Recall Comparison

After preprocessing and training, the four selected The Precision-Recall Curve is especially useful in
machine learning models were evaluated on the test dataset fraud detection because it focuses on the positive (fraud)
using accuracy, precision, recall, F1-score, and AUC-ROC class. A model that can achieve high recall without
metrics. Each model exhibited different strengths and compromising much on precision is considered optimal.
weaknesses, offering trade-offs in terms of detection ability,
interpretability, and real-time feasibility Area Under Precision-Recall
Model
Curve (PR-AUC)
Below is a summary table of performance metrics:
KNN 0.942
Model Accuracy Precision Recall F1-Score
Logistic
0.963
KNN 94.1% 92.3% 90.8% 91.5% Regression

Logistic SVM 0.981


95.6% 94.8% 91.7% 93.2%
Regression
Decision Tree 0.955
SVM 96.2% 93.5% 95.4% 94.4%
4.2.2 ROC Curve Analysis
Decision Tree 95.3% 91.2% 94.1% 92.6%
The Receiver Operating Characteristic (ROC)
Curve plots the True Positive Rate (Recall) against the False
Positive Rate. The Area Under the ROC Curve (AUC-
Observation: ROC) reflects the model's ability to distinguish between fraud
and legitimate transactions.
 SVM achieved the highest overall performance,
particularly excelling in Recall (95.4%), which is  SVM and Logistic Regression demonstrated AUC
critical for fraud detection because it minimizes the values close to 1.0, indicating excellent separability.
number of undetected fraudulent transactions.
 Decision Tree followed closely, showing robust
 Logistic Regression maintained a strong balance generalization.
between Precision and Recall, making it suitable for
real-time fraud detection systems that demand fast  KNN, while strong in recall, had a slightly lower
and interpretable results. AUC due to its sensitivity to noisy and high-
dimensional data.
 KNN showed reasonable results but had higher
computation time during inference due to distance Conclusion:
calculations, making it less ideal for real-time While all models performed admirably, SVM and
deployment. Logistic Regression consistently outperformed others across
both PR and ROC curve analyses.
 Decision Tree achieved good recall but slightly lower
precision, indicating a tendency toward higher false 4.3 Confusion Matrix (SVM Example)
positives.
To illustrate real-world performance, we present
4.2 Model Comparison the confusion matrix for the Support Vector Machine model

 SVM showed the highest recall, making it ideal for Predicted Predicted
minimizing false negatives (missing actual frauds). Genuine Fraud

 Logistic Regression performed well with high Actual 83,450


precision, making it suitable where false alarms are 52 (FP)
Genuine (TN)
costly.
Actual Fraud 21 (FN) 438 (TP)
 KNN had reasonable performance but high
computation time for large datasets.

 Decision Tree was fast and interpretable but prone to Insights:


overfitting.

© 2025, IJSREM | www.ijsrem.com DOI: | Page 8


International Journal of Scientific Research in Engineering and Management (IJSREM)
Volume: 09 Issue: 05 | May - 2025 SJIF Rating: 8.586 ISSN: 2582-3930

 True Positives (TP): 438 transactions were correctly Even with SMOTE, some fraud patterns may
flagged as fraud. remain underrepresented, especially rare or highly novel
schemes.
 False Positives (FP): 52 genuine transactions were
incorrectly flagged. 2. Generalization to Real-World Systems:

 False Negatives (FN): 21 frauds were missed, which  The dataset lacks contextual features (e.g., IP
is relatively low. address, location, device ID).

 True Negatives (TN): Over 83,000 genuine  Models trained on static datasets may not generalize
transactions were correctly classified. well unless continuously updated.

Business Implication: 3. Concept Drift:

 Low FN is critical: Missing frauds can lead to large Fraud techniques evolve rapidly. Static models
financial losses. degrade in accuracy over time unless retrained frequently with
updated data.
 Moderate FP is manageable: False positives can be
reviewed manually or verified through OTPs, 4. Scalability Issues:
ensuring minimal customer inconvenience.
 KNN is slow at prediction time due to distance
Analysis: calculations.

 Only 21 fraudulent transactions were missed out of  SVM, although highly accurate, is computationally
459 frauds. expensive for very large datasets.

 52 legitimate transactions were incorrectly flagged as 5. Explainability:


fraud (manageable in banking systems where manual
review is possible). While models like Logistic Regression and
Decision Trees are interpretable, SVM and KNN lack
 Overall, the SVM model shows a strong ability to transparency, posing challenges in highly regulated industries.
detect fraudulent behavior with minimal
misclassification. 4.7 Key Takeaways

4..4 Time and Resource Efficiency The model evaluation reveals valuable insights
into the practical applicability of each algorithm:
4.5Training Time
Support Vector Machine (SVM):
Training Time Inference Time (avg per 1000
Model  Best Overall Performance: Achieved highest recall
(s) samples)
and F1-score.
KNN <5 1.2 seconds
 Use Case: Ideal for high-stakes fraud detection
Logistic systems where minimizing false negatives is crucial.
<2 0.01 seconds
Reg.
Logistic Regression:
SVM ~30 0.05 seconds  Best Interpretability: High precision and low
latency.
Decision
<3 0.01 seconds
Tree  Use Case: Suitable for deployment in financial
institutions where decisions must be explainable and
fast.
4.6 Limitations Identified Decision Tree:
Despite strong performance, some limitations  Fast and Transparent: Slight trade-off in precision.
were observed in this study:
 Use Case: Great for rule-based augmentation or as
1. Data Imbalance: part of ensemble models.

© 2025, IJSREM | www.ijsrem.com DOI: | Page 9


International Journal of Scientific Research in Engineering and Management (IJSREM)
Volume: 09 Issue: 05 | May - 2025 SJIF Rating: 8.586 ISSN: 2582-3930

K-Nearest Neighbors:  Balancing data with SMOTE helps but must be done
carefully.
 Effective but Inefficient: High computational cost  Transparency and deployment readiness are
makes it less practical for real-time fraud detection. essential.
 Regulations guide building trustworthy AI.
 Use Case: Academic benchmarks or systems with
small datasets. 5.5 Final Thoughts
Fraud detection is an ongoing battle. This project lays the
4.8 Recommendations Based on Results foundation for strong machine learning solutions, but
continuous improvements and ethical considerations are
 Implement SVM with fallback logic to Logistic needed to keep systems effective and fair.
Regression if latency exceeds threshold.

 Set dynamic fraud thresholds based on:


6.ACKNOWLEDGEMENT
o User risk profiles
We thank God for His blessings and also for giving as good
o Transaction type knowledge and strength in enabling us to finish our project.
Our deep gratitude goes to our founder late Dr.D.
o Historical fraud probability SELVARAJ, M.A., M.Phil., for his patronage in the
completion of our project. We like to take this opportunity to
 Integrate human-in-the-loop decision system for
thank our honourable chairperson Dr.S. NALINI
edge cases.
SELVARAJ, M.COM., MPhil., Ph.D. and honourable
director, MR.S. AMIRTHARAJ, M.Tech. , M.B.A for their
5. CONCLUSION AND FUTURE support given to us to finish our project successfully. Also we
ENHANCEMENTS would like to extend my sincere thanks to our respected
Principal, Dr .C. RAMESH BABU DURAI, M.E., Ph.D. for
5.1 Conclusion having provided me with all the necessary facilities to
This project focused on detecting credit card fraud using
undertake this project. We extend our deepest gratitude to our
machine learning. We tested four algorithms—KNN, Logistic
Regression, SVM, and Decision Tree—on a real dataset with Head of the Department and our Project Guide,
imbalanced classes. SVM performed best overall, especially Mrs.Hemalatha D B.TECH.,M.E., whose invaluable
in detecting fraud. Logistic Regression was good for quick, suggestions, guidance, and encouragement were instrumental
interpretable results. We also addressed ethical concerns and in the success of our project. Her expertise and direction not
real-world deployment. only steered us through challenges but also elevated our
project to a remarkable achievement. Additionally, we express
5.2 Future Enhancements heartfelt thanks to our parents, friends, and staff members
To improve the system, future work can include:
whose unwavering support and encouragement sustained us
throughout the entirety of this project. Their belief in our
 Ensemble Methods: Combining models like Random
Forest and XGBoost for better accuracy. capabilities fueled our determination, and their assistance
 Deep Learning: Using autoencoders and LSTMs to ensured the smooth progress of our work. Together, their
catch complex fraud patterns. contributions have been integral to the realization of our
 Real-Time Detection: Building systems that catch fraud project's goals, and we are profoundly grateful for their
instantly with streaming data. unwavering support and belief in our endeavors.
 Explainable AI: Making model decisions easy to
understand using SHAP or LIME. REFERENCES
 Adaptive and Multimodal Detection: Updating models
continuously and using more data types like user [1] Y. Abakarim, M. Lahby, and A. Attioui, "An efficient real-time
behavior and location. model for credit card fraud detection based on deep learning," in
Proc. 12th Int. Conf. Intelligent Systems: Theories and Applications
5.3 Importance (SITA), Oct. 2018, pp. 1–7.
This work offers practical and ethical solutions for financial
fraud detection that can help both researchers and industry. [2] H. Abdi and L. J. Williams, "Principal component analysis,"
Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no.
5.4 Key Learnings 4, pp. 433–459, Jul. 2010.

[3] V. Arora, R. S. Leekha, K. Lee, and A. Kataria, "Facilitating user


 Recall is more important than accuracy in fraud
authorization from imbalanced data logs of credit cards using
detection.
artificial intelligence," Mobile Information Systems, vol. 2020, pp. 1–
13, Oct. 2020.

© 2025, IJSREM | www.ijsrem.com DOI: | Page 10


International Journal of Scientific Research in Engineering and Management (IJSREM)
Volume: 09 Issue: 05 | May - 2025 SJIF Rating: 8.586 ISSN: 2582-3930

[4] A. O. Balogun, S. Basri, S. J. Abdulkadir, and A. S. Hashim,


"Performance analysis of feature selection methods in software
defect prediction: A search method approach," Applied Sciences, vol.
9, no. 13, pp. 2764, Jul. 2019.

[5] B. Bandaranayake, "Fraud and corruption control at education


system level: A case study of the Victorian department of education
and early childhood development in Australia," Journal of Cases in
Educational Leadership, vol. 17, no. 4, pp. 34–53, Dec. 2014.

[6] J. Błaszczyński, A. T. de Almeida Filho, A. Matuszyk, M.


Szelński, and R. Słowiński, "Auto loan fraud detection using
dominance-based rough set approach versus machine learning
methods," Expert Systems with Applications, vol. 163, Jan. 2021.

[7] B. Branco, P. Abreu, A. S. Gomes, M. S. C. Almeida, J. T.


Ascensão, and P. Bizarro, "Interleaved sequence RNNs for fraud
detection," in Proc. 26th ACM SIGKDD Int. Conf. Knowledge
Discovery & Data Mining, 2020, pp. 3101–3109.

[8] F. Cartella, O. Anunciacao, Y. Funabiki, D. Yamaguchi, T.


Akishita, and O. Elshocht, "Adversarial attacks for tabular data:
Application to fraud detection and imbalanced data,"
arXiv:2101.08030, 2021.

[9] S. S. Lad and A. C. Adamuthe, "Malware classification with


improved convolutional neural network model," International
Journal of Computer Network and Information Security, vol. 12, no.
6, pp. 30–43, Dec. 2021.

[10] V. N. Dornadula and S. Geetha, "Credit card fraud detection


using machine learning algorithms," Procedia Computer Science,
vol. 165, pp. 631–641, Jan. 2019.

© 2025, IJSREM | www.ijsrem.com DOI: | Page 11

You might also like