Machine Learning CRE
Seminar Report
On
Machine Learning Algorithms for Credit Card Fraud Detection
Submitted
In Partial Fulfillment
For the requirements for the award of the degree of
Bachelor of Technology
In
Computer Science and Engineering
I hereby declare that the seminar report submitted by me to the Computer Science and
Engineering Department, Rajasthan Technical University, Kota, in partial fulfillment of the
requirements for the award of the degree of Bachelor of Technology in Computer Science and
Engineering is a record of bona fide work undertaken by me under the guidance of Preeti
Sharma Ma'am.
I further declare that the work reported in this report has not been submitted, and will not be
submitted, either in part or in full, for the award of any other degree in this institute or any
other institute or university, to the best of my knowledge.
ALTAF MANIYAR
1. Introduction
a. Payment Fraud
b. Financial Fraud
2. Related Work
a. Machine learning approaches
b. Ensemble learning
c. Unsupervised methods
3. Data Set Description
a. Class Imbalance
b. Features
c. Response Variable
d. Chart
4. Models
a. Supervised Models
b. Unsupervised Models
5. Dataset Processing
6. Evaluation Metrics
a. Area Under ROC
b. F1 Score
7. Results and Discussion
8. Conclusion
9. References
1. Credit card fraud has become one of the most prevalent forms of financial crime in today's digital age,
costing billions annually worldwide.
2. With the rapid growth of online transactions, credit card fraud is an ever-evolving threat, targeting
individuals and businesses alike.
3. Every year, millions of people fall victim to credit card fraud, highlighting the urgent need for enhanced
security measures.
4. Credit card fraud is a global challenge that exploits technological advancements to steal sensitive financial
information.
5. The convenience of cashless payments has also opened the door to sophisticated credit card fraud schemes,
impacting consumers and institutions.
6. Understanding how credit card fraud works is the first step in protecting yourself from becoming a victim
of this widespread crime.
7. As credit card fraud becomes more advanced, detecting and preventing these crimes is critical for
maintaining financial security in the modern world.
Credit card fraud: Unauthorized use of credit card details for transactions.
Account takeover: Fraudsters gain access to a user’s account and conduct unauthorized transactions.
Fake invoices: Criminals send fraudulent invoices to trick businesses into paying for goods or services
they never received.
The rise of e-commerce, mobile payments, and digital wallets has created new opportunities for fraudsters.
Phishing, malware, and social engineering attacks are frequently used to steal sensitive payment information.
Financial fraud encompasses a broader range of deceptive activities targeting financial assets or services. It
includes:
Investment scams: Fraudsters promise high returns to lure victims into fake investment schemes.
Money laundering: Concealing the origins of illegally obtained money by passing it through legitimate financial
channels.
Corporate fraud: Manipulating financial statements or insider trading to gain unfair advantages.
2. Related Work
Research and advancements in credit card fraud detection have focused on developing systems that can
identify fraudulent activities effectively while minimizing false positives. The key areas include:
Machine Learning Approaches: Previous research has compared various machine learning
algorithms to detect credit card fraud, including Random Forest, Support Vector Machine
(SVM), and Logistic Regression.
Ensemble Learning: Researchers have explored ensemble learning techniques that combine
neural networks and random forests to improve fraud detection.
Unsupervised Methods: Studies have examined the effectiveness of unsupervised methods
like Isolation Forest and Local Outlier Factor (LOF) for identifying fraudulent transactions.
Comparison of Traditional Models: A study compared the performance of traditional machine
learning models, including SVM, Decision Trees, Logistic Regression, and Random Forest, using
a real-world dataset of credit card transactions. Their evaluation included accuracy, sensitivity,
specificity, and precision as metrics.
Context-Aware Learning: Some researchers have proposed context-aware learning approaches for
fraud detection. These methods consider the evolving patterns of fraudsters and aim to adapt
models to changing tactics. This approach addresses the dynamic nature of credit card fraud.
3. Data Set Description
Source: The dataset was obtained from Kaggle, a reputable platform for data science datasets.
Data Completeness: The dataset is free from missing data, NA values, or empty rows, ensuring
data integrity.
Size: It contains a total of 284,807 transactions, making it substantial for analysis.
Class Imbalance: The dataset exhibits a significant class imbalance, with a small fraction of
transactions labeled as fraudulent (0.172%).
Features: It includes 31 columns, comprising the transaction time, the amount, 28 PCA-transformed
features (transformed for privacy protection), and the class label.
Response Variable: The primary response variable is "class," distinguishing between normal (0)
and fraudulent (1) transactions.
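To make the class imbalance concrete, the following minimal sketch counts the labels and reports each class as a percentage. The helper name `class_balance` and the toy label list are illustrative assumptions, not part of the original study. The last lines only redo the arithmetic implied by the figures quoted above (0.172% of 284,807 transactions).

```python
from collections import Counter


def class_balance(labels):
    """Return {class: (count, percentage)} for a sequence of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: (n, 100.0 * n / total) for cls, n in counts.items()}


# Hypothetical toy labels (0 = normal, 1 = fraud); NOT the real dataset.
toy_labels = [0] * 997 + [1] * 3
balance = class_balance(toy_labels)
for cls, (count, pct) in sorted(balance.items()):
    print(f"class {cls}: {count} rows ({pct:.3f}%)")

# Arithmetic on the figures quoted in the text: 0.172% of 284,807 rows.
approx_fraud_rows = round(284_807 * 0.172 / 100)
print("approximate number of fraudulent rows:", approx_fraud_rows)
```

With so few positive rows, a model that predicts "normal" for every transaction would still be correct more than 99% of the time, so plain accuracy says little about how well fraud is detected.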
4. Models
Supervised Models
Decision Trees
Random Forest Classifier
Unsupervised Models
Isolation Forest
Local Outlier Factor (LOF)
One-Class Support Vector Machine (SVM)
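Logistic Regression:
Mathematical Expression:
The logistic regression model estimates the probability of a binary outcome as follows:
$$P(Y=1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p)}}$$
Explanation:
P(Y=1∣X) represents the probability of class 1 given the input features X.
β_0 is the intercept and β_1, ..., β_p are the coefficients of the p input features X_1, ..., X_p.
e is the base of the natural logarithm.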
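As an illustration of how a logistic regression model turns features into a probability, here is a minimal sketch. The function name `predict_proba` and the coefficient values are hypothetical and are not fitted to the fraud dataset.

```python
import math


def predict_proba(features, coefficients, intercept):
    """Probability of class 1 under a logistic model.

    z is the linear score: intercept + sum of coefficient * feature.
    The sigmoid is evaluated in a numerically stable way for large |z|.
    """
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients for two features (illustration only).
coefficients = [0.8, -1.5]
intercept = -2.0
p = predict_proba([3.0, 0.2], coefficients, intercept)
print(f"P(Y=1 | X) = {p:.4f}")
```

A transaction would be flagged as fraudulent when this probability exceeds a chosen threshold. The threshold is often 0.5 by default, but it can be lowered when missing a fraud case is more costly than a false alarm.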
Decision Trees:
Decision trees make binary decisions at each node based on feature values, leading to a tree-like
structure.
Explanation:
The tree splits data into different branches until a stopping criterion is met.
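The report does not say which splitting criterion was used. The sketch below assumes Gini impurity, a common choice, and shows how a single node could pick a threshold on one feature. The names `gini` and `best_threshold` and the toy amounts are illustrative assumptions.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'."""
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must send rows to both branches
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: transaction amounts and fraud labels.
amounts = [5, 12, 20, 950, 1200, 1500]
labels = [0, 0, 0, 1, 1, 1]
print(best_threshold(amounts, labels))
```

A full tree repeats this search over every feature at every node and stops when a criterion such as a maximum depth or a minimum number of rows per leaf is reached.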
Random Forest:
Random forests are an ensemble of decision trees. The final prediction is based on a majority vote
(classification) or averaging (regression) of individual tree predictions.
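The aggregation step can be sketched as follows. The per-tree predictions are hypothetical values standing in for the outputs of already trained trees.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Classification: return the label predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


def average(tree_predictions):
    """Regression: return the mean of the individual tree outputs."""
    return sum(tree_predictions) / len(tree_predictions)


# Hypothetical outputs of five trees for one transaction (1 = fraud).
votes = [0, 1, 1, 0, 1]
print("forest label:", majority_vote(votes))
print("fraction of trees voting fraud:", average(votes))
```

The fraction of trees voting for fraud can also serve as a score, which is useful when a classification threshold other than a simple majority is wanted.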
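Isolation Forest:
Mathematical Expression:
The anomaly score s(x,n) for a data point x in an isolation forest built from n data points can be expressed as:
$$s(x, n) = 2^{-\frac{E(h(x))}{C(n)}}$$
Explanation:
E(h(x)) is the average path length of data point x across the isolation trees.
C(n) is the average path length for an unsuccessful search in a binary search tree with n nodes. It normalizes E(h(x)).
Scores close to 1 indicate anomalies, while scores well below 0.5 indicate normal points.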
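The score can be sketched numerically as follows. This assumes the standard form from the Isolation Forest literature, s(x, n) = 2^(-E(h(x)) / C(n)) with C(n) = 2H(n-1) - 2(n-1)/n, where the harmonic number H(i) is approximated by ln(i) plus Euler's constant. The function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def average_path_length(n):
    """C(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """s(x, n) = 2 ** (-E(h(x)) / C(n)); requires n > 1."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


n = 256  # hypothetical sub-sample size used to build each tree
print(f"C({n}) = {average_path_length(n):.3f}")
print(f"short path (2.0):  score = {anomaly_score(2.0, n):.3f}")
print(f"long path  (15.0): score = {anomaly_score(15.0, n):.3f}")
```

Points that are isolated after only a few splits have short average paths and scores near 1. Points that need many splits have long paths and scores below 0.5.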
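Local Outlier Factor (LOF):
LOF is an algorithm that measures the local density deviation of a data point with respect to its neighbors,
making it sensitive to local variations in data density. It identifies outliers based on their relative density
compared to their neighbors.
Mathematical Expression:
$$\mathrm{LOF}_k(x) = \frac{\sum_{o \in N_k(x)} \mathrm{lrd}_k(o)}{|N_k(x)| \cdot \mathrm{lrd}_k(x)}$$
Explanation:
N_k(x) is the set of the k nearest neighbors of x.
lrd_k(x) is the local reachability density of x, the inverse of the average reachability distance from x to its neighbors.
A value close to 1 means x is about as dense as its neighbors. Values well above 1 indicate an outlier.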
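A minimal, unoptimized sketch of the LOF computation follows, assuming the standard definition: the LOF of a point is the average local reachability density of its k nearest neighbors divided by its own local reachability density. For simplicity it takes exactly k neighbors (ties are not handled) and assumes no duplicate points. The function name `lof_scores` and the toy points are illustrative.

```python
import math


def lof_scores(points, k):
    """Return the LOF value of every point (brute force, O(n^2) distances)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])                 # k nearest neighbours of i
        k_distance.append(dist[i][order[k - 1]])     # distance to the k-th one

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical toy data: a tight square of points plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = lof_scores(points, k=2)
for p, s in zip(points, scores):
    print(p, round(s, 2))
```

The four clustered points get a value of 1, while the distant point gets a value far above 1, which is how LOF separates outliers from points in dense regions.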
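One-Class Support Vector Machine (SVM):
Mathematical Expression:
In a linear One-Class SVM, the objective is to find the hyperplane w · x - b = 0 that maximizes the margin while
minimizing the number of data points outside the margin. This is typically formulated as:
$$\min_{w,\, \xi,\, b} \; \frac{1}{2}\lVert w \rVert^2 + \frac{1}{\nu m} \sum_{i=1}^{m} \xi_i - b \quad \text{subject to} \quad w \cdot x_i \ge b - \xi_i, \;\; \xi_i \ge 0$$
Explanation:
m is the number of training points and ξ_i is the slack of point x_i, which is positive only when x_i falls on the wrong side of the hyperplane.
ν ∈ (0, 1] bounds the fraction of training points that may lie outside the margin.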
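The sketch below assumes the common ν-formulation of the linear One-Class SVM, where the slack of a point is max(0, b - w · x) and the objective is 0.5·‖w‖² + (1 / (ν·m))·Σ slack - b over m training points. It only evaluates the decision function and the objective for a given hyperplane and does not solve the optimization. All names and values are hypothetical.

```python
def decision(w, b, x):
    """Signed score w . x - b; negative values fall outside the learned region."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b


def objective(w, b, data, nu):
    """Value of the assumed nu-formulation objective for hyperplane (w, b)."""
    slacks = [max(0.0, -decision(w, b, x)) for x in data]
    return 0.5 * sum(wi * wi for wi in w) + sum(slacks) / (nu * len(data)) - b


# Hypothetical hyperplane and toy points (illustration only).
w, b = [1.0, 0.0], 1.0
data = [[2.0, 0.0], [1.5, 1.0], [0.5, 0.0]]
for x in data:
    label = "normal" if decision(w, b, x) >= 0 else "anomaly"
    print(x, label)
print("objective:", objective(w, b, data, nu=0.5))
```

In practice the optimization is carried out by a library solver, usually with a non-linear kernel. The sketch only shows how a fitted hyperplane is used to separate normal points from anomalies.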
5. Dataset Processing
Effective fraud detection relies heavily on the proper preparation and processing of datasets. Below are
the key steps involved:
Data Splitting: The dataset was divided into training and test sets with a 70:30 ratio.
Oversampling: Synthetic Minority Over-sampling Technique (SMOTE) was applied to handle the
imbalanced nature of the data, generating synthetic samples to balance the classes.
Data Transformation: Principal Component Analysis (PCA) was used to transform the dataset,
particularly columns v1-v28, to protect user identities and sensitive features.
Data Summary Statistics: Summary statistics of the dataset were presented, which included details about
the number of fraud and normal transactions, as well as key features like amount and time.
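The splitting and oversampling steps above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions and not the code used in the study: `stratified_split` keeps the class ratio in both parts of a 70:30 split, and `smote_like` imitates the core idea of SMOTE by interpolating between a minority point and one of its nearest minority neighbors. Both function names and the toy data are hypothetical.

```python
import math
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_indices, test_indices), preserving the class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]   # one of the k nearest
        gap = rng.random()                             # position on the segment
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, neighbour)))
    return synthetic


# Hypothetical toy labels: 90 normal rows and 10 fraud rows.
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), "training rows,", len(test_idx), "test rows")

# Hypothetical minority points in two dimensions.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_like(minority, n_new=5)
print(new_points)
```

A common practice is to oversample only the training portion, so that the test set keeps the original class ratio and the evaluation reflects real conditions.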
6. Evaluation Metrics
Several evaluation metrics were used to assess the performance of the machine learning models. The
metrics and their functions are described below:
Area Under the ROC Curve (AUC):
Function: AUC measures the area under the Receiver Operating Characteristic (ROC) curve, which is a
plot of the true positive rate (sensitivity) against the false positive rate (1-specificity). AUC quantifies the
model's ability to distinguish between classes.
Use: Higher AUC values indicate better model performance. AUC ranges from 0 to 1, where 0.5
represents random guessing, and 1 represents perfect classification.
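AUC can also be read as the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen normal one. The sketch below computes it directly from that pairwise definition, counting ties as one half. The function name `roc_auc` and the toy scores are illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

In this toy case three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75.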
F1-Score:
Function: The F1-score is the harmonic mean of precision and recall. It balances precision (the number of
true positives divided by the total predicted positives) and recall (the number of true positives divided by
the total actual positives).
Use: The F1-score provides a single metric that considers both false positives and false negatives. It is
useful when dealing with imbalanced datasets.
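A minimal sketch of the F1 computation follows, with a toy example showing why accuracy alone can mislead on imbalanced data. The function names and the toy predictions are illustrative.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical labels: 3 fraud cases among 10 transactions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_model = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]   # finds 2 of 3 frauds, 1 false alarm
y_all_normal = [0] * 10                     # never predicts fraud

print("model:      accuracy", accuracy(y_true, y_model), "F1", f1_score(y_true, y_model))
print("all normal: accuracy", accuracy(y_true, y_all_normal), "F1", f1_score(y_true, y_all_normal))
```

The classifier that never predicts fraud still reaches 70% accuracy on this toy data, but its F1-score is 0, which exposes that it detects nothing.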
ROC Curve:
Function: The ROC curve is a graphical representation of a classifier's performance across various threshold
settings. It shows the trade-off between true positive rate (sensitivity) and false positive rate
(1-specificity) at different classification thresholds.
Use: ROC curves help visualize and compare the performance of different models. A model with a
curve closer to the top-left corner indicates better performance.
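The points of a ROC curve can be obtained by sweeping the classification threshold over the observed scores, as in this minimal sketch. The function name `roc_points` and the toy scores are illustrative.

```python
def roc_points(labels, scores):
    """Return (false_positive_rate, true_positive_rate) pairs, one per threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical labels and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(labels, scores)
for fpr, tpr in curve:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting these pairs with the false positive rate on the horizontal axis and the true positive rate on the vertical axis gives the ROC curve. The area under it is the AUC described earlier.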
These evaluation metrics are commonly used in binary classification tasks like credit card fraud detection to
assess the model's accuracy, ability to detect fraud cases, and control false positives.
8. Conclusion
Credit Card Fraud Challenge: Credit card fraud remains a significant challenge, causing
substantial financial losses.
Machine Learning Applications: This study demonstrates the application of machine learning to
address credit card fraud, with potential implications for other fraud detection domains.
Model Comparison: The paper compared various machine learning models, including supervised
and unsupervised algorithms, using performance metrics like AUC and F1-score.
9. References
[1] Merchant Savvy, (2020) Global Payment Fraud Statistics, Trends and Forecasts.
https://www.merchantsavvy.co.uk/payment-fraud-statistics/. Updated: October 2020.
[2] Puh, M., & Brkić, L. (2019). “Detecting Credit Card Fraud Using Selected Machine Learning
Algorithms,” 42nd International Convention on Information and Communication Technology,
Electronics and Microelectronics (MIPRO), 1250-1255.
[3] Hyder John, Sameena Naaz, (2019) “Credit Card Fraud Detection using Local Outlier Factor and
Isolation Forest,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.4,
pp.1060-1064.
[4] Ishan Sohony, Rameshwar Pratap, and Ullas Nambiar, (2018) “Ensemble learning for credit card fraud
detection,” In Proceedings of the ACM India Joint International Conference on Data Science and Management
of Data (CoDS-COMAD '18). Association for Computing Machinery, New York, NY, USA, 289–294.
https://doi.org/10.1145/3152494.3156815
[5] Campus, kattankulathur, (2018). “Credit card fraud detection using machine learning models and
collating machine learning models.” International Journal of Pure and Applied Mathematics 118.20,
825-838.
[6] Patil, S., Nemade, V., & Soni, P.K., (2018) “Predictive Modelling For Credit Card Fraud Detection
Using Data Analytics,” Procedia Computer Science, 132, 385-395.