
A

Seminar Report
On
Machine Learning Algorithms for Credit Card Fraud Detection
Submitted
in partial fulfillment
of the requirements for the award of the degree of
Bachelor of Technology
In
Computer Science and Engineering

Submitted To:
Ms. Preeti Sharma
Assistant Professor
Department of Computer Science

Submitted By:
Altaf Maniyar (3rd year)
College Roll No: 22/259
University Roll No: 22EUCCS007

Department of Computer Science and Engineering
University Department
Rajasthan Technical University, Kota
November 2024

Student’s Declaration

I hereby declare that the seminar report submitted by me to the Computer Science and
Engineering Department, Rajasthan Technical University, Kota, in partial fulfillment of the
requirements for the award of the degree of Bachelor of Technology in Computer Science and
Engineering is a record of bona fide work undertaken by me under the guidance of Ms. Preeti
Sharma.

I further declare that, to the best of my knowledge, the work reported in this report has not been
submitted and will not be submitted, either in part or in full, for the award of any other degree in
this institute or any other institute or university.

ALTAF MANIYAR

ABSTRACT
1. Payment fraud, particularly credit card fraud, is a growing concern, causing
substantial financial losses for organizations, governments, and individuals.
2. Global losses due to payment fraud are projected to increase significantly, with credit
card fraud being a major contributor to these losses.
3. The research aims to leverage machine learning techniques to address the problem
of credit card fraud, which can also be extended to other types of fraud detection.
4. The paper compares the performance of various machine learning models, including
logistic regression, decision trees, random forest classifier, isolation forest, local
outlier factor, and one-class support vector machines, based on metrics such as AUC
and F1-score.
5. To handle the imbalanced nature of the data, SMOTE (Synthetic Minority Over-sampling
Technique) is applied to oversample the dataset, and the performance of the models on both
the raw and the oversampled data is compared.
6. Results show that the Random Forest classifier outperforms other models in terms of
AUC score and F1-score, both on actual and oversampled data.
7. Oversampling the data significantly improves the performance of the Random Forest
classifier.
8. One-class Support Vector Machines perform better than Isolation Forest in terms of
AUC but have lower F1-scores compared to Isolation Forest.
9. Local Outlier Factor exhibits the poorest performance among the models evaluated.
10. The research underscores the importance of using machine learning for fraud
detection and highlights the need for continued research in this area.

List of Contents

1. Introduction
a. Payment Fraud
b. Financial Fraud
2. Related Work
a. Machine Learning Approaches
b. Ensemble Learning
c. Unsupervised Methods
3. Data Set Description
a. Class Imbalance
b. Features
c. Response Variable
d. Chart
4. Models
a. Supervised Models
b. Unsupervised Models
5. Dataset Processing
6. Evaluation Metrics
a. Area Under ROC
b. F1 Score
7. Results and Discussion
8. Conclusion
9. References

1. INTRODUCTION

1. Credit card fraud has become one of the most prevalent forms of financial crime in today's digital age,
costing billions annually worldwide.

2. With the rapid growth of online transactions, credit card fraud is an ever-evolving threat, targeting
individuals and businesses alike.

3. Every year, millions of people fall victim to credit card fraud, highlighting the urgent need for enhanced
security measures.

4. Credit card fraud is a global challenge that exploits technological advancements to steal sensitive financial
information.

5. The convenience of cashless payments has also opened the door to sophisticated credit card fraud schemes,
impacting consumers and institutions.

6. Understanding how credit card fraud works is the first step in protecting yourself from becoming a victim
of this widespread crime.

7. As credit card fraud becomes more advanced, detecting and preventing these crimes is critical for
maintaining financial security in the modern world.

1.1 Payment Fraud
Payment fraud refers to any unauthorized transaction or manipulation of payment systems to gain financial
advantage. Common examples include:

Credit card fraud: Unauthorized use of credit card details for transactions.

Chargeback fraud: Customers dispute legitimate transactions to avoid payment.

Account takeover: Fraudsters gain access to a user’s account and conduct unauthorized transactions.

Fake invoices: Criminals send fraudulent invoices to trick businesses into paying for goods or services
they never received.

The rise of e-commerce, mobile payments, and digital wallets has created new opportunities for fraudsters.
Phishing, malware, and social engineering attacks are frequently used to steal sensitive payment information.

1.2 Financial Fraud

Financial fraud encompasses a broader range of deceptive activities targeting financial assets or services. It
includes:

Identity theft: Using someone else's personal information to commit fraud.

Investment scams: Fraudsters promise high returns to lure victims into fake investment schemes.

Money laundering: Concealing the origins of illegally obtained money by passing it through legitimate financial
channels.

Corporate fraud: Manipulating financial statements or engaging in insider trading to gain unfair advantages.

2. Related work

Research and advancements in credit card fraud detection have focused on developing systems that can
identify fraudulent activities effectively while minimizing false positives. The key areas include:

• Machine Learning Approaches: Previous research has compared various machine learning
algorithms to detect credit card fraud, including Random Forest, Support Vector Machine
(SVM), and Logistic Regression [2].
• Ensemble Learning: Researchers have explored ensemble learning techniques that combine
neural networks and random forests to improve fraud detection [4].
• Unsupervised Methods: Studies have examined the effectiveness of unsupervised methods
such as Isolation Forest and Local Outlier Factor (LOF) for identifying fraudulent transactions [3].
• Comparison of Traditional Models: A study compared the performance of traditional machine
learning models, including SVM, Decision Trees, Logistic Regression, and Random Forest, using
a real-world dataset of credit card transactions. Its evaluation used accuracy, sensitivity,
specificity, and precision as metrics.
• Context-Aware Learning: Some researchers have proposed context-aware learning approaches for
fraud detection. These methods account for the evolving patterns of fraudsters and aim to adapt
models to changing tactics, addressing the dynamic nature of credit card fraud.

3. Data Set Description

• Source: The dataset was obtained from Kaggle, a reputable platform for data science datasets.
• Data Completeness: The dataset contains no missing data, NA values, or empty rows, so no
imputation is required.
• Size: It contains a total of 284,807 transactions, making it substantial for analysis.
• Class Imbalance: The dataset exhibits a significant class imbalance: only 0.172% of
transactions are labeled as fraudulent.
• Features: It includes 31 columns: Time, Amount, the 28 PCA-transformed features V1-V28
(anonymized for privacy protection), and the Class label.
• Response Variable: The response variable is "Class", which distinguishes normal (0) from
fraudulent (1) transactions.
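The class counts and completeness checks above can be reproduced with a few lines of pandas. This is a minimal sketch, assuming the Kaggle file uses the label column `Class`; the helper name `summarize_classes` and the tiny demo frame are hypothetical stand-ins, not part of the original study.

```python
import pandas as pd


def summarize_classes(df: pd.DataFrame, label: str = "Class") -> dict:
    """Count normal (0) and fraudulent (1) rows and report the fraud percentage."""
    counts = df[label].value_counts()
    n_fraud = int(counts.get(1, 0))
    n_normal = int(counts.get(0, 0))
    return {
        "rows": len(df),
        "normal": n_normal,
        "fraud": n_fraud,
        "fraud_pct": 100.0 * n_fraud / len(df),
        "missing_values": int(df.isna().sum().sum()),
    }


# Tiny synthetic frame with the same column names, standing in for the real CSV.
demo = pd.DataFrame({
    "Time": [0, 1, 2, 3],
    "Amount": [10.0, 5.5, 120.0, 3.2],
    "Class": [0, 0, 0, 1],
})
print(summarize_classes(demo))
```

On the full data, calling `summarize_classes(pd.read_csv("creditcard.csv"))` (the file name is an assumed local path) should reflect the size and imbalance figures listed above.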

4. Models

• Supervised Models

   • Logistic Regression
   • Decision Trees
   • Random Forest Classifier

• Unsupervised Models

   • Isolation Forest
   • Local Outlier Factor (LOF)
   • One-Class Support Vector Machine (SVM)

Logistic Regression:
Mathematical Expression:
The logistic regression model estimates the probability of a binary outcome as follows:

$$ P(Y=1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n)}} $$

Explanation:
P(Y=1∣X) represents the probability of class 1 (fraud) given the input features X = (X1, ..., Xn).
e is the base of the natural logarithm.
β0, β1, ..., βn are the model parameters to be learned.
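The logistic formula can be evaluated directly once the parameters are known. The sketch below uses NumPy with made-up parameter values (they are illustrative assumptions, not values fitted to the fraud data); in practice the β values are learned, for example with scikit-learn's `LogisticRegression`.

```python
import numpy as np


def fraud_probability(x: np.ndarray, beta0: float, beta: np.ndarray) -> np.ndarray:
    """P(Y=1 | X) = 1 / (1 + exp(-(beta0 + X @ beta))) for each row of x."""
    z = beta0 + x @ beta             # linear combination of the features
    return 1.0 / (1.0 + np.exp(-z))  # logistic (sigmoid) function


# Hypothetical parameters and two transactions with two features each.
beta0 = -1.0
beta = np.array([2.0, -0.5])
x = np.array([[0.5, 0.0],    # z = -1 + 1.0 = 0, so the probability is 0.5
              [3.0, 1.0]])   # z = -1 + 6.0 - 0.5 = 4.5
print(fraud_probability(x, beta0, beta))
```

A threshold (commonly 0.5) turns the probability into a class label; on heavily imbalanced data this threshold is often tuned rather than left at the default.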

Decision Trees:

Decision trees make binary decisions at each node based on feature values, leading to a tree-
like structure.

Explanation:

At each node, a decision is made based on a feature and a threshold value.

The tree splits data into different branches until a stopping criterion is met.

Leaf nodes contain class labels or class probabilities.
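A minimal scikit-learn sketch of these ideas, assuming synthetic imbalanced data from `make_classification` in place of the Kaggle features. The `max_depth=5` stopping criterion and the roughly 1% minority share are arbitrary illustrative choices. `export_text` prints the feature/threshold test learned at each node, and `predict_proba` reads the class probabilities stored in the leaves.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the transaction data: about 1% positive ("fraud") class.
X, y = make_classification(
    n_samples=5000, n_features=6, n_informative=4,
    weights=[0.99], random_state=0,
)

# max_depth acts as the stopping criterion; each internal node tests
# one feature against one threshold.
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X, y)

# Show the top two levels of the learned rules.
print(export_text(tree, max_depth=2))

# Leaf nodes hold class probabilities; column 1 is P(fraud).
proba = tree.predict_proba(X[:5])
print(proba[:, 1])
```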

Random Forest:
Random forests are an ensemble of decision trees. The final prediction is based on a majority vote
(classification) or averaging (regression) of individual tree predictions.
Explanation:

Multiple decision trees are trained on bootstrapped subsets of the data.

At each split, a tree considers only a random subset of the features, which decorrelates the trees.
The trees' predictions are combined, improving accuracy and reducing overfitting compared with a single decision tree.
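The same ingredients map onto scikit-learn's `RandomForestClassifier`, shown here on synthetic data (an assumption, not the report's setup). `bootstrap=True` trains each tree on a bootstrap sample, and `max_features="sqrt"` limits the features considered at each split. The number of trees (200) is an illustrative value. Note that scikit-learn combines the trees by averaging their predicted class probabilities rather than by a strict majority vote, which the last line checks.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5,
    weights=[0.99], random_state=1,
)

forest = RandomForestClassifier(
    n_estimators=200,      # number of trees in the ensemble
    bootstrap=True,        # each tree sees a bootstrap sample of the rows
    max_features="sqrt",   # random feature subset considered at each split
    random_state=1,
)
forest.fit(X, y)

# The forest's probability is the mean of the individual trees' probabilities.
per_tree = np.stack([t.predict_proba(X[:3]) for t in forest.estimators_])
print(np.allclose(per_tree.mean(axis=0), forest.predict_proba(X[:3])))
```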

Unsupervised Learning Algorithms:


Isolation Forest:
Isolation Forest is an ensemble-based anomaly detection algorithm that identifies anomalies by isolating them
in a binary tree structure. It works by recursively partitioning the data into subsets and measuring the number
of splits required to isolate an anomaly.
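Mathematical Expression:

The anomaly score s(x, n) of a data point x, for isolation trees built from samples of n data points, can be expressed as:

$$ s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}, \qquad c(n) = 2H(n-1) - \frac{2(n-1)}{n} $$

E(h(x)) is the average path length of data point x over all trees in the forest.
c(n) is the average path length of an unsuccessful search in a binary search tree built from n points; it normalizes E(h(x)). H(i) is the harmonic number, approximately ln(i) + 0.5772156649.
Scores close to 1 indicate anomalies (they are isolated after few splits), while scores well below 0.5 indicate normal points.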
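To make the score concrete, the sketch below implements the Isolation Forest anomaly score s(x, n) = 2^(-E(h(x))/c(n)), with c(n) = 2H(n-1) - 2(n-1)/n and the usual approximation H(i) ≈ ln(i) + 0.5772156649. The mean path lengths passed in are hypothetical inputs; scikit-learn's `IsolationForest` computes them internally, and its `score_samples` method returns the negated version of this score.

```python
import math

EULER_GAMMA = 0.5772156649


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search over n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, n: int) -> float:
    """s(x, n) = 2 ** (-E(h(x)) / c(n)); values near 1 suggest an anomaly."""
    return 2.0 ** (-mean_path_length / c(n))


n = 256  # sub-sample size per tree (scikit-learn's default upper limit)
print(round(c(n), 3))
print(anomaly_score(2.0, n))    # isolated after few splits: close to 1
print(anomaly_score(c(n), n))   # average path length: exactly 0.5
print(anomaly_score(15.0, n))   # deep in the tree: below 0.5
```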

Local Outlier Factor (LOF):

LOF is an algorithm that measures the local density deviation of a data point with respect to its neighbors,
making it sensitive to local variations in data density. It identifies outliers based on their relative density
compared to their neighbors.
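Mathematical Expression:

The LOF score for a data point x can be defined as:

$$ \mathrm{LOF}(x) = \frac{1}{|N(x)|} \sum_{o \in N(x)} \frac{\mathrm{lrd}(o)}{\mathrm{lrd}(x)}, \qquad \mathrm{lrd}(x) = \frac{|N(x)|}{\sum_{o \in N(x)} \text{reach-dist}(x, o)} $$

N(x) represents the set of k nearest neighbors of data point x, and |N(x)| is the number of neighbors of x.
lrd(o) is the local reachability density of a neighboring data point o (lrd(x) is defined the same way for x).
reach-dist(x, o) = max{k-distance(o), d(x, o)}, where k-distance(o) is the distance from o to its k-th nearest neighbor.
A score close to 1 means x is about as dense as its neighbors; a score substantially greater than 1 marks x as an outlier.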
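A minimal sketch with scikit-learn's `LocalOutlierFactor` on a toy two-dimensional dataset (an illustrative assumption, not the transaction data). `fit_predict` labels outliers as -1, and `negative_outlier_factor_` stores the negated LOF values, so negating it again gives LOF(x): about 1 for points in the dense cluster and much larger for the isolated point. `n_neighbors=20` is the library default, written out explicitly; `contamination=0.01` is an assumed outlier share.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Dense cluster of "normal" points plus one isolated point at (8, 8).
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X = np.vstack([normal, [[8.0, 8.0]]])

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.01)
labels = lof.fit_predict(X)             # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_  # LOF(x); about 1 for inliers

print(labels[-1], round(float(scores[-1]), 2))
print(round(float(np.median(scores[:-1])), 2))
```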

One-Class Support Vector Machine (SVM):


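One-Class SVM is an unsupervised anomaly detection algorithm that is trained on data from a single class (the inliers, here normal
transactions) and learns a boundary around it; points falling outside the boundary are treated as outliers. In feature space it finds a
hyperplane that separates the training points from the origin with the maximum possible margin.

Mathematical Expression:

In a linear One-Class SVM, the objective is to find the hyperplane w · x - b = 0 that maximizes the margin while allowing only a small
fraction of training points to fall on the wrong side of it. This is typically formulated as:

$$ \min_{w,\,\xi,\,b} \; \frac{1}{2}\lVert w \rVert^2 + \frac{1}{\nu m}\sum_{i=1}^{m} \xi_i - b \quad \text{subject to} \quad w \cdot x_i \ge b - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \dots, m $$

Here x1, ..., xm are the training points, ξi are slack variables for points that violate the boundary, and ν ∈ (0, 1] is an upper bound on
the fraction of training points treated as outliers. A new point x is classified as an inlier if w · x - b ≥ 0 and as an outlier otherwise;
kernel versions replace x with a feature map φ(x).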
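A minimal scikit-learn sketch, assuming the model is trained only on (mostly) normal transactions, simulated here with Gaussian noise. The `nu` parameter is the upper bound on the fraction of training points allowed outside the boundary (ν in the standard formulation); the value 0.05 and the RBF kernel are illustrative choices, not settings taken from the report.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
train = rng.normal(size=(500, 2))  # normal behavior only

ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
ocsvm.fit(train)

# Fraction of training points left outside the learned boundary (should be near nu).
train_outlier_rate = float(np.mean(ocsvm.predict(train) == -1))

# One typical point and one far-away point.
test = np.array([[0.1, -0.2], [6.0, 6.0]])
print(ocsvm.predict(test))            # 1 = inlier, -1 = outlier
print(ocsvm.decision_function(test))  # negative values lie outside the boundary
print(train_outlier_rate)
```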

5. Dataset Processing
Effective fraud detection relies heavily on the proper preparation and processing of datasets. The key steps
involved are:

• Data Splitting: The dataset was divided into training and test sets with a 70:30 ratio.

• Oversampling: Synthetic Minority Over-sampling Technique (SMOTE) was applied to handle the
imbalanced nature of the data, generating synthetic fraud samples to balance the classes.

• Data Transformation: Columns V1-V28 are the output of a Principal Component Analysis (PCA)
transformation, applied to protect user identities and sensitive features.

• Data Summary Statistics: Summary statistics of the dataset were presented, including the number of
fraud and normal transactions and key features such as Amount and Time.
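The splitting and oversampling steps can be sketched as follows. This is a minimal from-scratch illustration of the SMOTE idea: each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbors. It uses NumPy and scikit-learn's `NearestNeighbors`; the function name `smote`, `k=5`, and the synthetic data are assumptions. In practice the `imbalanced-learn` package provides `SMOTE().fit_resample(X, y)`. The sketch oversamples only the training portion, a common precaution so that synthetic points do not leak into the 30% test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate n_new synthetic minority samples by interpolating between neighbors."""
    rng = np.random.default_rng(seed)
    # k + 1 neighbors because each point's nearest neighbor is itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)             # random minority samples
    neighbor = idx[base, rng.integers(1, k + 1, size=n_new)]   # one of their k neighbors
    gap = rng.random((n_new, 1))                               # position on the segment
    return X_min[base] + gap * (X_min[neighbor] - X_min[base])


X, y = make_classification(n_samples=10000, n_features=8, weights=[0.99], random_state=0)

# 70:30 split; stratify keeps the fraud ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Oversample the training minority class until both classes have equal counts.
X_min = X_train[y_train == 1]
n_new = int((y_train == 0).sum() - (y_train == 1).sum())
X_syn = smote(X_min, n_new)
X_bal = np.vstack([X_train, X_syn])
y_bal = np.concatenate([y_train, np.ones(n_new, dtype=int)])

print(np.bincount(y_train), np.bincount(y_bal), np.bincount(y_test))
```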

6. Evaluation Metrics

Several evaluation metrics were used to assess the performance of the machine learning models. They are
listed below along with their functions:

Area Under the ROC Curve (AUC):

Function: AUC measures the area under the Receiver Operating Characteristic (ROC) curve, which is a
plot of the true positive rate (sensitivity) against the false positive rate (1-specificity). AUC quantifies the
model's ability to distinguish between classes.

Use: Higher AUC values indicate better model performance. AUC ranges from 0 to 1, where 0.5
represents random guessing, and 1 represents perfect classification.
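AUC also has a useful probabilistic reading: it equals the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen normal one, with ties counted as one half. The sketch below checks this pairwise computation against scikit-learn's `roc_auc_score` on four made-up scores; the helper name `pairwise_auc` is hypothetical.

```python
from itertools import product

from sklearn.metrics import roc_auc_score


def pairwise_auc(y_true, scores):
    """Fraction of (fraud, normal) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(y_true, scores))   # 3 of 4 pairs ordered correctly: 0.75
print(roc_auc_score(y_true, scores))  # same value from scikit-learn
```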

F1-Score:

Function: The F1-score is the harmonic mean of precision and recall. It balances precision (the number of
true positives divided by the total predicted positives) and recall (the number of true positives divided by
the total actual positives).

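Use: The F1-score, F1 = 2 × Precision × Recall / (Precision + Recall), provides a single metric that considers both
false positives and false negatives. It is useful for imbalanced datasets, where accuracy can be misleading: on this
dataset, a model that labels every transaction as normal would be about 99.83% accurate yet detect no fraud.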
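A worked example with hypothetical counts shows why F1 is more informative than accuracy here. Suppose 8 frauds are caught (TP), 2 normal transactions are wrongly flagged (FP), 4 frauds are missed (FN), and 986 normal transactions are correctly passed. Precision is 8/10 = 0.8, recall is 8/12 ≈ 0.667, and F1 = 2·TP / (2·TP + FP + FN) = 16/22 ≈ 0.727, while accuracy is still 994/1000 = 0.994. The code recomputes these values and compares them with scikit-learn's metrics; the helper `f1_from_counts` is a hypothetical name.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall computed from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Label vectors that reproduce the hypothetical counts:
# 8 TP, 2 FP, 4 FN, plus 986 true negatives.
y_true = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 986
y_pred = [1] * 8 + [1] * 2 + [0] * 4 + [0] * 986

print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
print(f1_from_counts(8, 2, 4), f1_score(y_true, y_pred))
print(accuracy_score(y_true, y_pred))  # high despite missing a third of the frauds
```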

Receiver Operating Characteristic (ROC) Curve:

Function: The ROC curve is a graphical representation of a classifier's performance across various threshold
settings. It shows the trade-off between the true positive rate (sensitivity) and the false positive rate
(1 - specificity) at different classification thresholds.

Use: ROC curves help visualize and compare the performance of different models. A model with a
curve closer to the top-left corner indicates better performance.
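The sketch below traces an ROC curve by hand. It sweeps the threshold over each distinct score, flags a transaction as fraud when its score is at least the threshold, and records the resulting (FPR, TPR) pair; integrating the points with the trapezoidal rule gives the AUC. The four scores are made up for illustration, and the helper names are hypothetical; `sklearn.metrics.roc_curve` performs the same sweep.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, starting at (0, 0), for thresholds in descending order."""
    P = sum(y_true)
    N = len(y_true) - P
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 0)
        points.append((fp / N, tp / P))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
pts = roc_points(y_true, scores)
print(pts)                 # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_auc(pts))  # 0.75
```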

These evaluation metrics are commonly used in binary classification tasks like credit card fraud detection to
assess the model's accuracy, ability to detect fraud cases, and control false positives.

7. Results and discussion

Results of AUC and F1-Score Comparison Among Models:
• Test Data Performance: The predictive performance of the models was assessed on a test dataset comprising 30
percent of the entire data.
• Model Performance Comparison: The results in Table 1 reveal that the supervised algorithms achieved notably
high AUC scores and demonstrated good F1-scores.
• Random Forest Classifier Excellence: Among the models, the Random Forest Classifier stood out with the
highest AUC score and F1-score. This suggests that it excels in distinguishing fraudulent from non-fraudulent
transactions.
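The comparison protocol described above (train on 70% of the data, report AUC and F1-score on the held-out 30%) can be sketched as below. This is an illustrative harness on synthetic data, not the report's code, and its numbers are not comparable to Table 1. One practical detail it shows: scikit-learn's unsupervised detectors return higher `decision_function` values for more normal points and predict -1 for outliers, so their outputs are negated and remapped before computing AUC and F1 for the fraud class. All hyperparameters here are assumed values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

X, y = make_classification(n_samples=4000, n_features=10, n_informative=5,
                           weights=[0.97], random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, stratify=y, random_state=7)

results = {}

# Supervised models: positive-class probability as the score, predict() for F1.
for name, model in {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=7),
}.items():
    model.fit(X_tr, y_tr)
    results[name] = (roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]),
                     f1_score(y_te, model.predict(X_te)))

# Unsupervised detectors: fitted without labels; higher decision_function = more normal.
for name, det in {
    "isolation_forest": IsolationForest(random_state=7),
    "one_class_svm": OneClassSVM(nu=0.05, gamma="scale"),
    "local_outlier_factor": LocalOutlierFactor(novelty=True),
}.items():
    det.fit(X_tr)
    fraud_score = -det.decision_function(X_te)          # larger = more anomalous
    fraud_pred = (det.predict(X_te) == -1).astype(int)  # outlier (-1) becomes fraud (1)
    results[name] = (roc_auc_score(y_te, fraud_score), f1_score(y_te, fraud_pred))

for name, (auc, f1) in results.items():
    print(f"{name:22s} AUC={auc:.3f} F1={f1:.3f}")
```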

8. Conclusion

• Credit Card Fraud Challenge: Credit card fraud remains a significant challenge, causing
substantial financial losses.

• Machine Learning Applications: This study demonstrates the application of machine learning to
address credit card fraud, with potential implications for other fraud detection domains.

• Model Comparison: The paper compared various machine learning models, including supervised
and unsupervised algorithms, using performance metrics such as AUC and F1-score.


9. References

[1] Merchant Savvy (2020). Global Payment Fraud Statistics, Trends and Forecasts.
https://www.merchantsavvy.co.uk/payment-fraud-statistics/. Updated: October 2020.

[2] Puh, M., & Brkić, L. (2019). "Detecting Credit Card Fraud Using Selected Machine Learning Algorithms,"
42nd International Convention on Information and Communication Technology, Electronics and
Microelectronics (MIPRO), pp. 1250-1255.

[3] John, H., & Naaz, S. (2019). "Credit Card Fraud Detection using Local Outlier Factor and Isolation Forest,"
International Journal of Computer Sciences and Engineering, Vol. 7, Issue 4, pp. 1060-1064.

[4] Sohony, I., Pratap, R., & Nambiar, U. (2018). "Ensemble Learning for Credit Card Fraud Detection,"
In Proceedings of the ACM India Joint International Conference on Data Science and Management of Data
(CoDS-COMAD '18), Association for Computing Machinery, New York, NY, USA, pp. 289-294.
https://doi.org/10.1145/3152494.3156815

[5] Campus, Kattankulathur (2018). "Credit card fraud detection using machine learning models and collating
machine learning models," International Journal of Pure and Applied Mathematics, 118(20), pp. 825-838.

[6] Patil, S., Nemade, V., & Soni, P. K. (2018). "Predictive Modelling for Credit Card Fraud Detection Using
Data Analytics," Procedia Computer Science, 132, pp. 385-395.
