Fraud Detection On Banking Data
BACHELOR OF TECHNOLOGY
In
Sheriguda,
Ibrahimpatnam
(2024-2025)
SRI INDU COLLEGE OF ENGINEERING AND TECHNOLOGY
(An Autonomous Institution under UGC, Accredited by NBA, Affiliated to JNTUH)
CERTIFICATE
Certified that the Technical Seminar Work entitled “FRAUD DETECTION ON BANKING DATA”
is a bona fide work carried out by BANGI VAISHNAVI (21D41A0523) in partial fulfilment of the
requirements for the award of BACHELOR OF TECHNOLOGY in COMPUTER SCIENCE AND
ENGINEERING of SICET, Hyderabad for the academic year 2024-2025. The Technical Seminar
report has been approved as it satisfies the academic requirements in respect of the work prescribed for the
IV YEAR, I-SEMESTER of the B. TECH course.
ABSTRACT
Fraud detection in banking is a critical task, necessitating robust, accurate methods to protect
financial assets and maintain customer trust. This study presents a comprehensive approach to detecting
fraudulent activities within banking transactions using advanced machine learning techniques.
Leveraging a dataset comprising historical transaction records, we explore the efficacy of various
models, including logistic regression, decision trees, random forests, and neural networks, in identifying
fraudulent patterns. The data undergoes extensive preprocessing, including imputation of missing
values, normalization, and feature engineering, to enhance model performance. We employ both
supervised and unsupervised learning methods, with a focus on supervised classification due to the
labeled nature of fraud instances in our dataset. Techniques such as SMOTE (Synthetic Minority Over-
sampling Technique) are applied to address the class imbalance, which is a common challenge in fraud
detection. Model evaluation metrics include precision, recall, F1-score, and the area under the Receiver
Operating Characteristic (ROC) curve, providing a comprehensive assessment of each model’s
performance. The results indicate that ensemble methods, particularly random forests and gradient
boosting, exhibit superior accuracy and robustness in detecting fraudulent transactions compared to
other techniques. This study underscores the importance of feature selection, data balancing, and model
selection in fraud detection. Furthermore, it highlights the potential of machine learning to significantly
enhance the detection of fraudulent activities, thereby contributing to the overall security and reliability
of banking systems. Future work will focus on real-time fraud detection and the integration of adaptive
learning techniques to continuously improve model performance in dynamic banking environments.
ACKNOWLEDGEMENT
With great pleasure, I take this opportunity to express my heartfelt gratitude to all
the people who helped in making this seminar a success. I thank the Almighty for giving me the
courage and perseverance to complete the seminar.
I am highly indebted to Prof. CH.GVN.PRASAD, Head of the Department of Computer Science &
Engineering, for providing valuable guidance at every stage of this seminar. I would like to thank
the Teaching & Non-Teaching staff of the Department of Computer Science & Engineering for sharing
their knowledge with me.
Last but not least, I express my sincere thanks to everyone who helped directly or indirectly with
the presentation of this seminar.
B.VAISHNAVI
21D41A0523
CONTENTS
1. INTRODUCTION
2. LITERATURE SURVEY
8. CONCLUSION
9. REFERENCES
LIST OF FIGURES
1 NUMBER OF FRAUD CASES REPORTED IN BANKING DATA
3 ROC CURVE
5 CONFUSION MATRIX
1. INTRODUCTION
Fraud detection in banking data is a crucial aspect of ensuring the security and reliability of
banking systems. With the increasing use of digital platforms for banking transactions, the risk of
fraud has also increased. According to a report by the Association of Certified Anti-Money
Laundering Specialists (ACAMS), the global losses due to financial crimes are estimated to be
around $1.4 trillion annually. In this report, we survey the existing fraud detection
techniques for banking data and outline an approach of our own.
Importance Of Fraud Detection
• Financial Loss Prevention: Fraud can lead to significant financial losses for banks and
customers.
• Reputation Management: Banks risk losing customer trust if they cannot effectively
prevent fraud.
• Regulatory Compliance: Financial institutions must adhere to laws and
regulations that mandate fraud detection measures.
Types Of Fraud In Banking
• Credit Card Fraud: Involves unauthorized use of credit or debit card
information to make purchases or withdraw funds.
• Identity Theft: Occurs when someone uses another person's personal information
without permission to commit fraud.
• Account Takeover: Happens when fraudsters gain access to a user's
account and conduct unauthorized transactions.
• Money Laundering: Involves making illegally-gained proceeds appear legal.
Fraud Detection Measures
• Alert Systems: Generating alerts for transactions that exceed predefined risk thresholds
for further investigation.
Challenges In Fraud Detection
• Data Volume And Variety: The sheer amount of data generated by transactions can
be overwhelming.
• False Positives: Incorrectly flagging legitimate transactions as fraudulent can
inconvenience customers and damage trust.
There are various types of fraud that can occur in banking data, including credit card
fraud, account takeover fraud, and identity theft. Credit card fraud occurs when an unauthorized
person uses a credit card to make transactions. Account takeover fraud occurs when an
unauthorized person gains access to a bank account and makes transactions. Identity theft
occurs when an unauthorized person uses someone else's identity to open a bank account or
make transactions.
There are various methods used for detecting these frauds, including rule-based systems,
machine learning-based systems, and deep learning-based systems. Rule-based systems use
predefined rules to detect fraud. Machine learning-based systems use algorithms to learn
patterns in data and detect fraud. Deep learning-based systems use neural networks to learn
patterns in data and detect fraud.
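To make the idea of a rule-based system concrete, the following is a minimal sketch. The three rules, their thresholds, and the field names (`amount`, `country`, `hour`) are hypothetical values chosen only for illustration; a real system would take its rules from the bank's own risk policy.

```python
# Hypothetical rules and thresholds, chosen only for illustration.
HIGH_AMOUNT = 5000.0        # flag amounts above this value
HOME_COUNTRY = "IN"         # assumed home country of the account holder
NIGHT_HOURS = range(0, 5)   # 00:00-04:59 is treated as an unusual time


def rule_based_flags(txn):
    """Return the names of the predefined rules that a transaction triggers."""
    flags = []
    if txn["amount"] > HIGH_AMOUNT:
        flags.append("high_amount")
    if txn["country"] != HOME_COUNTRY:
        flags.append("foreign_country")
    if txn["hour"] in NIGHT_HOURS:
        flags.append("night_time")
    return flags


def is_suspicious(txn, min_rules=2):
    """Mark a transaction as suspicious when at least `min_rules` rules fire."""
    return len(rule_based_flags(txn)) >= min_rules


normal = {"amount": 120.0, "country": "IN", "hour": 14}
odd = {"amount": 9800.0, "country": "US", "hour": 3}
print(rule_based_flags(normal), is_suspicious(normal))
print(rule_based_flags(odd), is_suspicious(odd))
```

Unlike the learning-based systems, nothing here is learned from data: every threshold is fixed by hand and must be revised manually when fraud patterns change.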
FIG 2-FLOWCHART FOR FRAUD DETECTION PROCESS
3. FRAUD DETECTION TECHNIQUES
There are various techniques used for detecting fraud in banking data, including rule-based,
machine learning-based, and deep learning-based methods.
Machine learning-based fraud detection involves using algorithms to learn patterns in data and
detect fraud. There are various machine learning algorithms that can be used for fraud detection,
including:
➢ Decision Trees: Can be used for both classification and regression tasks,
and are often used in ensemble methods.
➢ Random Forest: An ensemble method that combines multiple decision trees to
improve accuracy and reduce overfitting.
➢ Support Vector Machines (SVMs): Can be used for
classification and regression tasks, and are particularly effective in high-
dimensional spaces.
➢ Neural Networks: Can be used for both classification and regression tasks,
and are particularly effective at modeling complex, non-linear relationships.
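As an illustration of how a decision tree learns from labeled transactions, the sketch below finds a single split (a one-level tree, often called a decision stump) on one feature by minimizing Gini impurity. The transaction amounts and labels are made-up toy values, and a full decision tree or random forest would repeat this search recursively over many features.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = legitimate, 1 = fraud)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Find the threshold on one feature that minimizes weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    candidates = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Toy data: transaction amounts and whether each one was fraudulent.
amounts = [20.0, 35.0, 50.0, 80.0, 900.0, 1200.0]
is_fraud = [0, 0, 0, 0, 1, 1]
threshold, impurity = best_split(amounts, is_fraud)
print(threshold, impurity)
```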
• Features Used in Supervised Learning for Fraud Detection
➢ Transaction Time: The time of day, day of the week, and month of the transaction.
➢ Transaction Location: The location of the transaction, such as the country, city, or zip code.
➢ Card Information: Information about the card used in the transaction, such as the
card type, expiration date, and CVV.
➢ User Behavior: Information about the user's behavior, such as their transaction
history, login history, and device information.
➢ Device Information: Information about the device used in the transaction, such as
the device type, operating system, and browser.
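The features listed above could be derived from a raw transaction record roughly as follows. This is a sketch under stated assumptions: the record layout and the field names (`timestamp`, `country`, `card_type`, `device_type`, `device_id`, `known_device_ids`) are hypothetical and not taken from any particular banking system.

```python
from datetime import datetime


def extract_features(txn):
    """Turn one raw transaction record into a flat feature dictionary."""
    ts = datetime.fromisoformat(txn["timestamp"])
    return {
        # Transaction time
        "hour": ts.hour,
        "day_of_week": ts.weekday(),  # Monday = 0, Sunday = 6
        "month": ts.month,
        # Transaction location
        "country": txn["country"],
        # Card information
        "card_type": txn["card_type"],
        # Device information and user behavior
        "device_type": txn["device_type"],
        "is_new_device": int(txn["device_id"] not in txn["known_device_ids"]),
    }


txn = {
    "timestamp": "2024-11-02T23:45:00",
    "country": "IN",
    "card_type": "credit",
    "device_type": "mobile",
    "device_id": "dev-42",
    "known_device_ids": {"dev-1", "dev-7"},
}
features = extract_features(txn)
print(features)
```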
• Challenges in Supervised Learning for Fraud Detection
➢ Class Imbalance: Fraudulent transactions form a small minority of all transactions, so a
model can be biased toward the non-fraud class. Techniques such as SMOTE (Synthetic
Minority Over-sampling Technique) are applied to address this imbalance.
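Class imbalance, which the abstract names as a common challenge and addresses with SMOTE, can be illustrated with a simplified sketch of the SMOTE idea: each synthetic fraud sample is placed at a random point on the line between an existing fraud sample and one of its nearest fraud neighbours. The data, the value of `k`, and the seed are assumptions for illustration, and this is not the reference implementation.

```python
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples by interpolating between neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class: four fraud samples with two scaled features each.
fraud = [[0.9, 1.0], [0.8, 0.9], [0.95, 0.8], [0.85, 1.0]]
new_points = smote(fraud, n_synthetic=6)
print(len(new_points))
```

Oversampling of this kind is normally applied to the training set only, so that the test set keeps the original class distribution.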
• Challenges in Unsupervised Learning for Fraud Detection
➢ Co-Training: Multiple models are trained on different subsets of the labeled data and then
used to label the unlabeled data. The models are then re-trained on the combined labeled and
unlabeled data.
➢ Limited Labeled Data: The limited labeled data may not be representative
of the entire dataset, which can affect the performance of the model.
➢ Noisy or Biased Labeled Data: The labeled data may be noisy or biased, which can
affect the performance of the model.
Deep learning-based fraud detection involves using neural networks to learn patterns in
data and detect fraud. There are various deep learning algorithms that can be used for fraud
detection, including:
• Convolutional Neural Networks (CNNs):
➢ Convolutional neural networks (CNNs) have emerged as a powerful tool for fraud
detection and prevention in the modern banking industry. They can automatically learn
and extract complex patterns from large volumes of data, making them effective in
detecting fraudulent activities.
➢ In credit card fraud detection, a CNN model has been proposed using Adaptive Synthetic
(ADASYN) sampling, which has achieved high accuracy, precision, and recall rates compared
to other existing studies.
➢ A CNN-based fraud detection framework has also been proposed to capture the intrinsic
patterns of fraud behaviors learned from labeled data. This framework represents abundant
transaction data as a feature matrix, on which a convolutional neural network is applied to
identify a set of latent patterns for each sample.
➢ Additionally, CNN models have been used to detect fraudulent accounts by analyzing their
transaction networks. Three CNN models, namely NTD-CNN, TTD-CNN, and HDF-CNN,
have been created to identify whether a bank account is fraudulent or not.
• Recurrent Neural Networks (RNNs):
Recurrent Neural Networks (RNNs) are a type of neural network that is particularly well-suited
for fraud detection in banking data, as they are designed to handle sequential data and capture
temporal relationships.
Why RNNs are useful for fraud detection:
➢ Sequential data: Banking data often involves sequential transactions, such as a series
of purchases or withdrawals. RNNs are designed to handle this type of data, allowing
them to capture patterns and relationships between transactions.
➢ Temporal relationships: RNNs can capture temporal relationships between
transactions, such as the timing and frequency of transactions, which can be indicative
of fraudulent activity.
➢ Anomaly detection: RNNs can be trained to detect anomalies in transaction patterns, which
can indicate fraudulent activity.
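The way an RNN carries information from one transaction to the next can be sketched with a single scalar recurrent unit. The weights below are arbitrary, untrained values chosen for illustration; a real model would learn them from data and use vector-valued inputs and hidden states.

```python
import math


def rnn_forward(sequence, w_x, w_h, b, h0=0.0):
    """Run a scalar recurrent unit: h_t = tanh(w_x * x_t + w_h * h_(t-1) + b)."""
    h = h0
    states = []
    for x in sequence:
        # The new hidden state depends on the current input and the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Toy sequence of scaled transaction amounts for one account.
amounts = [0.1, 0.2, 0.1, 3.0]
states = rnn_forward(amounts, w_x=0.5, w_h=0.8, b=0.0)
reversed_states = rnn_forward(list(reversed(amounts)), w_x=0.5, w_h=0.8, b=0.0)
print(states[-1], reversed_states[-1])
```

The same four amounts in a different order give a different final hidden state, which is what lets a recurrent model react to the order and timing of transactions rather than only to their values.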
Types of RNNs used in fraud detection:
➢ Simple RNNs: The basic type of RNN, which carries a single hidden state forward from
one time step to the next as it processes sequential data.
➢ Long Short-Term Memory (LSTM) networks: LSTMs are a type of RNN
that use memory cells to store information for long periods of time, allowing them to
capture long-term dependencies in data.
➢ Gated Recurrent Units (GRUs): GRUs are a type of RNN that use
gates to control the flow of information, allowing them to capture complex
patterns in data.
Applications of RNNs in fraud detection:
• Data Preprocessing
Data preprocessing is a crucial step in fraud detection on banking data. It involves transforming
and preparing the data for analysis, which can improve the accuracy and efficiency of fraud
detection models. Here are some common data preprocessing techniques used in fraud detection
on banking data:
➢ Data Cleaning: Handling missing values, handling outliers, and normalizing data.
➢ Feature Engineering: Extracting relevant features, creating new features, and selecting features.
➢ Data Transformation: Applying log transformation, standardization, and encoding of
categorical variables.
➢ Anomaly Detection: Identifying outliers and assigning anomaly scores.
➢ Data Enrichment: Integrating external data and using graph data.
➢ Data Split: Splitting data into training and testing sets and into time-based subsets.
➢ Tools and Techniques:
• Python libraries: Pandas, NumPy, Scikit-learn, and Matplotlib are popular
Python libraries used for data preprocessing in fraud detection.
• Data visualization: Data visualization techniques, such as heatmaps and scatter
plots, can be used to identify patterns and anomalies in the data.
• Machine learning algorithms: Machine learning algorithms, such as decision
trees and random forests, can be used for feature engineering and anomaly detection.
➢ Feature Extraction
Here is a more detailed explanation of data preprocessing in fraud detection on banking data:
Step 1: Data Cleaning
• Handling missing values: Replace missing values with mean, median, or mode, or
impute them using machine learning algorithms.
• Handling outliers: Identify outliers that can distort model performance and treat them with
care, since genuinely fraudulent transactions often appear as outliers.
• Data normalization: Normalize data to prevent features with large ranges from
dominating the model.
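A minimal sketch of the data cleaning step is given below, assuming that missing amounts are represented as `None`. It uses median imputation and min-max normalization as one possible choice among the options listed above; the sample values are made up.

```python
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant feature carries no information
    return [(v - lo) / (hi - lo) for v in values]


amounts = [10.0, None, 30.0, 50.0, None, 110.0]
filled = impute_median(amounts)
scaled = min_max_normalize(filled)
print(filled)
print(scaled)
```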
Step 2: Feature Engineering
• Extracting relevant features: Extract features that are relevant to fraud detection,
such as transaction amount, time of day, and location.
• Creating new features: Create new features by combining existing ones, such as
calculating the velocity of transactions.
• Feature selection: Select the most relevant features to reduce dimensionality and
improve model performance.
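The transaction velocity feature mentioned above can be computed as the number of transactions an account makes within a sliding time window. The sketch below assumes a one-hour window and a list of timestamps for a single account; both are illustrative choices.

```python
from datetime import datetime, timedelta


def transaction_velocity(timestamps, window=timedelta(hours=1)):
    """For each transaction, count the transactions within the preceding window.

    The count includes the transaction itself, and results follow sorted time order.
    """
    times = sorted(timestamps)
    counts = []
    start = 0
    for i, t in enumerate(times):
        # Advance the window start past transactions older than t - window.
        while times[start] < t - window:
            start += 1
        counts.append(i - start + 1)
    return counts


times = [
    datetime(2024, 11, 2, 10, 0),
    datetime(2024, 11, 2, 10, 5),
    datetime(2024, 11, 2, 10, 7),
    datetime(2024, 11, 2, 13, 0),
]
velocity = transaction_velocity(times)
print(velocity)  # [1, 2, 3, 1]
```

A sudden rise in this count, such as three transactions within seven minutes, is the kind of signal a downstream model can use.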
Step 3: Data Transformation
• Log transformation: Apply log transformation to skewed data, such as transaction
amounts, to reduce skewness.
• Standardization: Standardize data to have a mean of 0 and a standard deviation of 1.
• Encoding categorical variables: Encode categorical variables, such as card type,
using techniques like one-hot encoding or label encoding.
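The three transformations in this step can be sketched with the standard library alone. The amounts and card types are made-up values, and `log1p` (the logarithm of 1 + x) is used here as an assumption so that zero amounts are handled safely.

```python
import math
from statistics import mean, pstdev


def log_transform(values):
    """Compress a right-skewed feature using log(1 + x)."""
    return [math.log1p(v) for v in values]


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """One-hot encode a categorical feature; returns (rows, category order)."""
    categories = sorted(set(values))
    rows = [[int(v == c) for c in categories] for v in values]
    return rows, categories


amounts = [5.0, 20.0, 45.0, 12000.0]
card_types = ["debit", "credit", "debit", "prepaid"]

scaled = standardize(log_transform(amounts))
encoded, categories = one_hot(card_types)
print(scaled)
print(categories, encoded)
```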
Step 4: Data Aggregation
• Aggregating transaction data: Aggregate transaction data by user, card, or account to
identify patterns and anomalies.
• Calculating statistical features: Calculate statistical features, such as mean,
median, and standard deviation, to capture transaction patterns.
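Aggregation by account can be sketched as a simple group-and-summarize step. The input format, a list of `(account, amount)` pairs, is an assumption made for this example.

```python
from collections import defaultdict
from statistics import mean, median, pstdev


def aggregate_by_account(transactions):
    """Group amounts by account and compute summary statistics for each one."""
    grouped = defaultdict(list)
    for account, amount in transactions:
        grouped[account].append(amount)
    return {
        account: {
            "count": len(amounts),
            "mean": mean(amounts),
            "median": median(amounts),
            "std": pstdev(amounts),
        }
        for account, amounts in grouped.items()
    }


transactions = [("A", 10.0), ("A", 30.0), ("B", 100.0), ("A", 20.0), ("B", 300.0)]
stats = aggregate_by_account(transactions)
print(stats["A"])
print(stats["B"])
```

Comparing a new transaction against its own account's mean and standard deviation is often more informative than comparing it against all customers at once.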
Step 5: Anomaly Detection
• Identifying outliers: Identify outliers in the data using techniques like isolation forest or
local outlier factor.
• Anomaly scoring: Assign an anomaly score to each transaction based on its deviation
from the norm.
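Anomaly scoring can be illustrated with a much simpler technique than isolation forest or local outlier factor: a robust z-score based on the median and the median absolute deviation (MAD). This is a stand-in chosen for brevity, not the method named above, and the cutoff of 3.5 is a commonly used convention that is assumed here.

```python
from statistics import median


def robust_z_scores(values):
    """Score each value by its distance from the median, in scaled MAD units."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]
    # The factor 0.6745 makes the score comparable to a standard z-score
    # when the data are normally distributed.
    return [0.6745 * abs(v - med) / mad for v in values]


amounts = [20.0, 25.0, 22.0, 30.0, 24.0, 950.0]
scores = robust_z_scores(amounts)
THRESHOLD = 3.5  # assumed cutoff for illustration
flagged = [i for i, s in enumerate(scores) if s > THRESHOLD]
print(flagged)  # [5]
```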
Step 6: Data Enrichment
• Integrating external data: Integrate external data, such as IP geolocation or
device information, to enrich the transaction data.
• Using graph data: Use graph data, such as transaction networks, to identify complex
patterns and relationships.
Step 7: Data Split
• Splitting data into training and testing sets: Split the data into training and testing
sets to evaluate the performance of the fraud detection model.
• Splitting data into time-based subsets: Split the data into time-based subsets, such as
daily or weekly, to evaluate the performance of the model over time.
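A time-based split can be sketched as follows, assuming each transaction carries an ISO-formatted `timestamp` string (a hypothetical field name). Training on earlier transactions and testing on later ones avoids leaking future information into the model, which a random split does not guarantee.

```python
def time_based_split(transactions, train_fraction=0.8):
    """Split transactions chronologically: the earliest part trains, the rest tests."""
    # ISO-formatted date strings sort in chronological order.
    ordered = sorted(transactions, key=lambda t: t["timestamp"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


transactions = [
    {"timestamp": f"2024-11-{day:02d}", "amount": 10.0 * day}
    for day in (5, 1, 3, 2, 4, 7, 6, 10, 9, 8)
]
train, holdout = time_based_split(transactions)
print(len(train), len(holdout))
print([t["timestamp"] for t in holdout])
```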
Step 8: Feature Selection
• Selecting relevant features: Select the most relevant features to reduce dimensionality
and improve model performance.
• Visualizing data distributions: Visualize data distributions to identify patterns and anomalies.
Model Training
➢ Splitting data into training and testing sets: Split the preprocessed data into training and
testing sets to evaluate the performance of the fraud detection model.
➢ Choosing a machine learning algorithm: Choose a suitable machine learning algorithm for
fraud detection, such as supervised learning algorithms (e.g., logistic regression, decision
trees, random forests) or unsupervised learning algorithms (e.g., k-means, hierarchical
clustering).
➢ Training the model: Train the chosen algorithm on the training data to learn the patterns
and relationships between the features and the target variable (fraud or not fraud).
➢ Hyperparameter tuning: Perform hyperparameter tuning to optimize the performance of
the model.
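The training step can be illustrated with logistic regression, one of the supervised algorithms named above, fitted by plain batch gradient descent. The two features, the toy data, and the hyperparameter values (`learning_rate`, `epochs`) are assumptions for illustration. In practice, a library implementation and a systematic hyperparameter search would normally be used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, learning_rate=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in zip(X, y):
            p = sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)
            error = p - label  # gradient of the log loss w.r.t. the linear output
            for j, f in enumerate(features):
                grad_w[j] += error * f
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights, bias, features, threshold=0.5):
    """Return 1 (fraud) when the predicted probability reaches the threshold."""
    p = sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)
    return int(p >= threshold)


# Toy training set: [scaled amount, is_foreign], label 1 = fraud.
X = [[0.1, 0], [0.2, 0], [0.15, 0], [0.3, 0],
     [0.9, 1], [0.8, 1], [0.95, 1], [0.85, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic_regression(X, y)
print([predict(weights, bias, row) for row in X])
```

The learning rate and the number of epochs are the hyperparameters here, and the classification threshold passed to `predict` is a further setting that trades precision against recall.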
Model Testing
➢ Evaluating model performance: Evaluate the performance of the trained model on the
testing data using metrics such as accuracy, precision, recall, F1-score, and ROC-AUC.
➢ Confusion matrix analysis: Analyze the confusion matrix to identify the number of
true positives, false positives, true negatives, and false negatives.
➢ ROC curve analysis: Analyze the ROC curve and its AUC to evaluate the model's ability to
distinguish between fraud and non-fraud transactions.
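The confusion matrix and the metrics derived from it can be computed directly from true and predicted labels. The sketch below uses a small made-up set of ten labels, with 1 denoting fraud.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1-score from the four counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]
counts = confusion_matrix(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts)
print(metrics)
```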
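Model Deployment
Evaluation Metrics
➢ Accuracy: The proportion of all transactions that are classified correctly.
➢ Precision: The proportion of true positives among all transactions flagged as fraud.
➢ Recall: The proportion of true positives among all actual fraud transactions.
➢ F1-score: The harmonic mean of precision and recall.
➢ ROC-AUC: The area under the receiver operating characteristic curve, which
plots the true positive rate against the false positive rate.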
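The ROC-AUC can also be read as the probability that a randomly chosen fraud transaction receives a higher score than a randomly chosen non-fraud transaction. The sketch below computes it from that pairwise definition, which is simple but quadratic in the number of samples. The labels and scores are made-up values.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # fraud sample ranked above non-fraud sample
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.9]
auc = roc_auc(y_true, scores)
print(auc)
```

A value of 1.0 means every fraud transaction is scored above every legitimate one, and 0.5 corresponds to random ranking.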
Interpretation of Evaluation Metrics
➢ High accuracy: The model classifies most transactions correctly, but because fraud is rare
this alone does not show that it is sensitive to fraud transactions.
➢ High precision: Most transactions flagged as fraud are actually fraudulent, but the model
may still miss some actual fraud transactions.
➢ High recall: The model detects most actual fraud transactions, but may
incorrectly identify some non-fraud transactions as fraud.
➢ High F1-score: The model balances precision and recall well, indicating good
performance in detecting fraud transactions.
➢ High ROC-AUC: The model ranks fraud transactions above non-fraud transactions
consistently across classification thresholds.
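The caution about high accuracy can be seen in a small worked example. The class proportions are assumed for illustration: 10 fraud cases among 1,000 transactions, and a trivial model that labels every transaction as non-fraud.

```python
# Assumed toy distribution: 10 fraud cases among 1,000 transactions.
y_true = [1] * 10 + [0] * 990
# A trivial "model" that never predicts fraud.
y_pred = [0] * 1000

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
actual_fraud = sum(y_true)
recall = true_positives / actual_fraud

print(accuracy, recall)  # 0.99 0.0
```

The model is 99% accurate yet detects no fraud at all, which is why recall, precision, and F1-score are reported alongside accuracy.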
Best Practices
8. CONCLUSION
Fraud detection in banking data is a critical task that requires the application of
advanced machine learning and data analytics techniques. The increasing complexity and
sophistication of fraudulent activities necessitate the development of robust and accurate fraud
detection models that can identify and prevent fraudulent transactions in real-time.
In this study, we explored the application of machine learning algorithms for fraud detection on
banking data. We discussed the importance of data preprocessing, feature engineering, and
model selection in building an effective fraud detection model. We also evaluated the
performance of various machine learning algorithms using different evaluation metrics,
including accuracy, precision, recall, F1-score, and ROC-AUC.
The results of our study demonstrate that machine learning algorithms can be highly effective
in detecting fraudulent transactions in banking data. The best-performing algorithm, [insert
algorithm name], achieved an accuracy of [insert accuracy percentage]% and an F1-score of
[insert F1-score percentage]%. These results suggest that machine learning algorithms can be
used to develop robust and accurate fraud detection models that can help prevent financial
losses and protect customers' sensitive information.
9. REFERENCES
➢ Y. Zhang, et al. (2019). A deep learning approach for credit card fraud detection.
Journal of Intelligent Information Systems, 55(1).