Pdsreport
1. Introduction
With the rapid growth of online transactions and digital payments, fraud detection has become a critical
issue for financial institutions and businesses. Fraudulent transactions not only cause financial losses but
also damage an organization's reputation. This project focuses on developing a machine learning-based
fraud detection system using various algorithms to classify transactions as legitimate or fraudulent.
The goal was to build models capable of identifying fraudulent activities from transactional datasets while
minimizing false positives. This report covers the dataset used, data preprocessing steps, feature
engineering, machine learning models applied, evaluation metrics, and performance analysis.
2. Problem Statement
The project aims to detect fraudulent financial transactions using machine learning models. The challenge
is to accurately classify transactions as legitimate or fraudulent, ensuring high recall (to catch fraud cases)
and high precision (to reduce false alarms).
Objectives:
Build and compare machine learning models that classify transactions as legitimate or fraudulent.
Handle the severe class imbalance between legitimate and fraudulent transactions.
Maximize recall on fraud cases while keeping false positives low.
Select, tune, and evaluate the best-performing model for potential deployment.
3. Dataset Description
The dataset used in this project consists of historical transactional data, including various features related
to transaction details. This dataset is highly imbalanced, with fraudulent transactions being only a small
fraction of the total data.
Columns:
Account Balance (Before and After): Account balance before and after the transaction.
Fraud Label: Indicates whether the transaction is fraudulent (1) or legitimate (0).
Merchant ID: Identifier for the merchant where the transaction took place.
Customer ID: Identifier for the customer involved in the transaction.
Device ID: Identifier for the device used to make the transaction.
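As a rough illustration of this dataset description, the short sketch below loads the data with pandas and checks how rare the fraud class is. The file name transactions.csv and the exact column label "Fraud Label" are assumptions made for this sketch, not details taken from the project.

import pandas as pd

# Load the historical transaction data (file name is an assumption).
df = pd.read_csv("transactions.csv")

# List the available columns described above.
print(df.columns.tolist())

# Fraudulent transactions are only a small fraction of the data, so it is
# worth inspecting the class distribution before modelling.
print(df["Fraud Label"].value_counts(normalize=True))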
4. Data Preprocessing
Given the complexity of transactional data, several preprocessing steps were applied to clean and
transform the dataset for machine learning:
1. Feature Scaling:
Numerical features such as transaction amounts and balances were normalized using Min-Max
scaling so that features on larger scales do not dominate model training (see the sketch after this list).
2. Data Splitting:
The dataset was split into training and testing sets using an 80:20 ratio to evaluate the models
on unseen data.
3. Data Balancing:
Since the dataset was highly imbalanced, SMOTE (Synthetic Minority Oversampling
Technique) was applied to balance the classes and ensure fair model training.
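The sketch below shows one way these preprocessing steps could be wired together with scikit-learn and imbalanced-learn, continuing from the dataframe loaded in the earlier sketch. The stratified split, the random seeds, and applying SMOTE only to the training split are assumptions for illustration rather than details taken from the project code.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from imblearn.over_sampling import SMOTE

# Numerical features and the fraud label (column names are assumptions).
X = df.drop(columns=["Fraud Label"]).select_dtypes(include="number")
y = df["Fraud Label"]

# 80:20 train/test split; stratification keeps the fraud ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Min-Max scaling: fit on the training data only, then apply to both splits.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# SMOTE oversampling of the minority (fraud) class in the training data.
smote = SMOTE(random_state=42)
X_train_bal, y_train_bal = smote.fit_resample(X_train_scaled, y_train)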
5. Feature Selection
Feature selection was performed using the Boruta algorithm, which identifies the most important features
for the classification task by iteratively removing features that perform no better than random 'shadow'
copies of the real features. A minimal code sketch of this step follows the feature list below.
The following columns were selected using Boruta for the final model:
step
oldbalance_org
newbalance_orig
newbalance_dest
diff_new_old_balance
diff_new_old_destiny
type_TRANSFER
These selected features were used in training the machine learning models, as they were determined to
have the highest relevance to detecting fraudulent transactions.
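A minimal sketch of this selection step, using the BorutaPy implementation with a random forest estimator, is shown below. The estimator settings and the reuse of the balanced training data from the preprocessing sketch are assumptions for illustration, not the project's exact configuration.

import numpy as np
from boruta import BorutaPy
from sklearn.ensemble import RandomForestClassifier

# Random forest used by Boruta to score feature importance.
rf = RandomForestClassifier(n_jobs=-1, class_weight="balanced", max_depth=5)

# Boruta compares each real feature against randomly permuted 'shadow' copies.
selector = BorutaPy(rf, n_estimators="auto", random_state=42)
selector.fit(np.asarray(X_train_bal), np.asarray(y_train_bal))

# Columns confirmed as important (e.g. step, oldbalance_org, ...).
selected = [col for col, keep in zip(X.columns, selector.support_) if keep]
print(selected)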
6. Machine Learning Models
Multiple machine learning models were implemented, evaluated, and compared based on their
performance metrics. Each model is summarized below with its fraud-detection performance; a
comparison sketch in code follows the list.
1. Dummy Classifier: A baseline model that performed poorly, with a balanced accuracy of 0.5 and
no predictive power (precision, recall, F1, and Kappa all at 0.0).
2. Logistic Regression: Showed perfect precision (1.0) but low recall (0.129); the transactions it
flagged as fraud were genuinely fraudulent, but it missed the large majority of fraud cases.
3. LightGBM: Achieved a balanced accuracy of 0.681 but low precision (0.27) and recall (0.364),
making it less effective for fraud detection.
4. Support Vector Machine (SVM): High precision (1.0) but low recall (0.192), similar to logistic
regression, failing to detect many fraudulent transactions.
5. K-Nearest Neighbors (KNN): Showed strong precision (0.943) but only moderate recall (0.411);
its fraud alerts were usually correct, yet it still missed more than half of the fraud cases.
6. Random Forest: Performed well with high balanced accuracy (0.861), precision (0.969), and
recall (0.721), making it suitable for fraud detection.
7. XGBoost: The best-performing model with the highest balanced accuracy (0.887), precision
(0.938), and recall (0.775), making it the most effective for detecting fraud.
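The comparison above could be reproduced along the lines of the sketch below, which trains each listed classifier on the balanced training data from the preprocessing sketch and scores it on the held-out test split with the metrics quoted in the list. All hyperparameters shown are library defaults or assumptions, not the project's exact settings.

from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from sklearn.metrics import (balanced_accuracy_score, precision_score,
                             recall_score, f1_score, cohen_kappa_score)

models = {
    "Dummy": DummyClassifier(strategy="most_frequent"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "LightGBM": LGBMClassifier(),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
}

for name, model in models.items():
    model.fit(X_train_bal, y_train_bal)
    pred = model.predict(X_test_scaled)
    print(f"{name}: "
          f"bal_acc={balanced_accuracy_score(y_test, pred):.3f}, "
          f"precision={precision_score(y_test, pred, zero_division=0):.3f}, "
          f"recall={recall_score(y_test, pred):.3f}, "
          f"f1={f1_score(y_test, pred):.3f}, "
          f"kappa={cohen_kappa_score(y_test, pred):.3f}")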
7. Hyperparameter Tuning
Hyperparameter tuning was performed to optimize the XGBoost model by adjusting parameters such as
the learning rate, maximum depth, and number of estimators. This tuning significantly improved the
model's performance.
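One plausible way to carry out this tuning is a randomized search over the named parameters, sketched below. The parameter grid, number of iterations, cross-validation setup, and scoring choice are assumptions, since the report does not specify the exact search procedure.

from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import balanced_accuracy_score
from xgboost import XGBClassifier

# Search space over the parameters mentioned above (values are assumptions).
param_dist = {
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "max_depth": [3, 5, 7, 9],
    "n_estimators": [100, 300, 500],
}

search = RandomizedSearchCV(
    XGBClassifier(eval_metric="logloss", random_state=42),
    param_distributions=param_dist,
    n_iter=20,
    scoring="balanced_accuracy",
    cv=3,
    random_state=42,
)
search.fit(X_train_bal, y_train_bal)
print(search.best_params_)

# Evaluate the tuned model on the held-out test split.
best_xgb = search.best_estimator_
pred = best_xgb.predict(X_test_scaled)
print(balanced_accuracy_score(y_test, pred))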
8. Final Model Evaluation
After training the model with the selected hyperparameters, we evaluated its performance on the unseen
test data. The final model's predictions were compared against the actual labels from the test set, and the
following performance metrics were obtained:
Balanced Accuracy: 0.912
Precision: 0.957
Recall: 0.823
F1-Score: 0.885
Kappa: 0.885
These results indicate that the tuned model generalizes well to unseen data. The high precision (0.957)
means that almost all transactions flagged as fraud are genuinely fraudulent, while the recall of 0.823
shows that most, though not all, fraud cases are detected, leaving some room for improvement. The
F1-score and Kappa both indicate strong agreement between predicted and actual labels, demonstrating
good overall model performance and robustness.
9. Conclusion
This project successfully developed and evaluated various machine learning models for detecting
fraudulent transactions. After extensive testing and evaluation, XGBoost emerged as the best model due
to its high balanced accuracy, precision, and recall, making it suitable for deployment in a real-world
fraud detection system.
Future improvements could include real-time prediction capabilities, integration with a live payment
processing system, and continuous model retraining to handle evolving fraud patterns.