
Credit Card Fraud

Project II Checkpoint I - Artificial Intelligence

Group 04 1A
[email protected] Marcelo Henriques Couto
[email protected] Francisco Pinto de Oliveira
Faculdade de Engenharia da Universidade do Porto

June 6, 2022

1 / 12
Problem Specification

Credit card fraud is a crime that can easily be stopped if detected. However, this detection must happen quickly, as transactions are supposed to be fast. The objective of this project is to develop a machine learning model, based on supervised classification algorithms, that is able to distinguish, from some input data, whether a given transaction on a bank account is fraudulent or not.
In order to come up with the ideal model, we must experiment with at least three different classification algorithms, as well as carry out an Exploratory Data Analysis.

2 / 12
Problem Specification
The dataset contains the following data:
• Distance from home - continuous
• Distance from last transaction - continuous
• Ratio to median purchase price: ratio of the value of the transaction to the median transaction value - continuous
• Repeat retailer: whether the retailer where the transaction happened had other transactions registered for the same person - discrete, binary
• Used chip: whether the transaction was made using the card's chip - discrete, binary
• Used PIN number - discrete, binary
• Online order - discrete, binary
• Fraud - discrete, binary - label
We considered that all distances are expressed in kilometers.
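
A minimal sketch of loading and inspecting this dataset with Pandas; the file name card_transdata.csv and the column name fraud are assumptions, not identifiers confirmed by these slides:

```python
import pandas as pd

# Hypothetical file name; adjust to wherever the dataset is stored.
df = pd.read_csv("card_transdata.csv")

print(df.dtypes)                                 # continuous vs. binary columns
print(df["fraud"].value_counts(normalize=True))  # share of fraudulent transactions
```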
3 / 12
Related Work
Code
• Imbalanced-Learn - oversampling and undersampling tools
• Scikit-Learn - machine learning algorithms

Websites
• https://machinelearningmastery.com/what-is-imbalanced-classification/
• https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data

Articles
• Bekkar, M., Djemaa, H. K., & Alitouche, T. A. (2013). Evaluation measures for models assessment over imbalanced data sets. J Inf Eng Appl, 3(10).
• Tharwat, A. (2021). "Classification assessment methods". Applied Computing and Informatics, 17(1), 168-192.
4 / 12
Tools and Algorithms

Tools
We will use Python as the programming language, working in a Jupyter Notebook environment. For machine learning algorithms we will resort to the Scikit-Learn and Imbalanced-Learn libraries, as well as Pandas to read and handle the data, and Seaborn and Matplotlib to visualize it.
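
For reference, a sketch of the import block such a notebook might start with (the project's exact imports are not shown in these slides):

```python
import pandas as pd                                   # reading and handling the data
import seaborn as sns                                 # visualization
import matplotlib.pyplot as plt                       # visualization
from sklearn.linear_model import LogisticRegression  # a Scikit-Learn classifier
from imblearn.over_sampling import SMOTE              # Imbalanced-Learn resampling
```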

Algorithms
Because our dataset is imbalanced, we plan to explore Synthetic Minority Oversampling (SMOTE) and undersampling techniques to handle this issue, as sketched below.
For classification we plan to use Logistic Regression, Random Forest, Decision Trees, Neural Networks, and possibly others.
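
A minimal sketch of both resampling strategies with Imbalanced-Learn, using synthetic stand-in data rather than the real transactions (the 8.7% minority share matches the imbalance reported later in these slides):

```python
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Toy imbalanced data standing in for the transaction features.
X, y = make_classification(n_samples=10_000, weights=[0.913, 0.087], random_state=42)

# SMOTE synthesizes new minority-class samples...
X_over, y_over = SMOTE(random_state=42).fit_resample(X, y)
# ...while undersampling discards majority-class samples instead.
X_under, y_under = RandomUnderSampler(random_state=42).fit_resample(X, y)

print(y_over.mean(), y_under.mean())  # both resampled sets are now roughly 50/50
```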

5 / 12
Implementation Already Carried Out
Language: Python - Anaconda
Environment: Visual Studio Code using Jupyter Notebooks
Data Structures: The dataset is represented as a pandas.DataFrame
Data Preprocessing and Exploratory Data Analysis:
• Analysis of dataset’s validity and integrity
• Analysis of dataset balance
• Analysis of outliers, missing values and other errors
• Analysis of correlations between the multiple variables

Model Training
• Logistic Regression with cross-validation, using the Area Under the ROC Curve as the evaluation measure
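
A sketch of this step with Scikit-Learn's cross_val_score and ROC AUC scoring, on stand-in data (the fold count and solver settings here are assumptions, not the project's actual choices):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in data with the same imbalance as the real dataset.
X, y = make_classification(n_samples=10_000, weights=[0.913, 0.087], random_state=42)

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=5, scoring="roc_auc")
print(scores.mean())  # mean Area Under the ROC Curve across the folds
```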
6 / 12
Data Pre-Processing

Our data set proved to be ready to use, as:

• There were no missing values
• There were no apparent measurement errors
• There were many outliers, but they did not represent any errors and were, in all likelihood, the exact object of our analysis

Figure: Boxplots
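
A sketch of these checks; the file and column names (card_transdata.csv, distance_from_home, and so on) are assumptions rather than identifiers confirmed by the slides:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("card_transdata.csv")       # hypothetical file name

print(df.isna().sum())                       # missing values per column
print((df["distance_from_home"] < 0).any())  # a negative distance would be a measurement error

# Boxplots of the continuous features expose the outliers discussed above.
sns.boxplot(data=df[["distance_from_home",
                     "distance_from_last_transaction",
                     "ratio_to_median_purchase_price"]])
plt.show()
```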

7 / 12
Exploratory Data Analysis

Performing an Exploratory Data Analysis, we concluded that:
• As previously mentioned, our data set was moderately imbalanced (the classification label is binary and 8.7% of the entries represent fraudulent transactions).
• Many of the features had little relation to the label, as their values did not vary much between the two label classes and their correlation with it was very low.
Figure: Correlation Matrix
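
A sketch of both EDA checks, under the same assumed file and column names as before:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("card_transdata.csv")           # hypothetical file name

print(df["fraud"].value_counts(normalize=True))  # ~8.7% fraudulent per the slide

# Correlation matrix of all features, including the label.
sns.heatmap(df.corr(), annot=True, cmap="coolwarm")
plt.show()
```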

8 / 12
Machine Learning Models
The algorithms we chose to train our models were:
• Logistic Regression (LR): good with binary data, easy and simple to understand
• Decision Tree (DT): decently robust to imbalanced datasets and irrelevant features
• K Nearest Neighbours (KNN): simple overall
• Random Forest (RF): handles imbalanced datasets well and is resistant to overfitting
• Multilayer Perceptron (MLP): very powerful all around
Notes:
• KNN was only tested with smaller data sets due to its long training and testing times (lack of processing power)
• Logistic Regression was also tested with a dataset from which a column had been removed, in an attempt to counter the algorithm's sensitivity to irrelevant features
• All algorithms except MLP were tuned with grid search on portions of the data set, in order to find the best parameters for the final model (see the sketch below)
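
A sketch of the grid search step on a small subset of stand-in data; the grid values shown here are illustrative assumptions, not the grids actually searched in the project:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# A small portion of stand-in data, mirroring the use of data subsets for tuning.
X, y = make_classification(n_samples=5_000, weights=[0.913, 0.087], random_state=42)

grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 200],
                "class_weight": [None, "balanced"]},
    scoring="roc_auc",
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)  # parameters carried over to the final model
```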
9 / 12
Model Evaluation and Comparison
Model | Time to train (s) | Time to test (s) | Recall | ROC AUC | Precision | Accuracy | Balanced Accuracy
LR control full | 19.497249 | 0.005409 | 0.599163 | 0.796077 | 0.890613 | 0.95874 | 0.796077
LR smoted full | 55.498035 | 0.015769 | 0.980637 | 0.943067 | 0.497086 | 0.912032 | 0.943067
LR weight full | 120.202866 | 0.016395 | 0.97291 | 0.94429 | 0.523563 | 0.920648 | 0.94429
DT control full | 12.073779 | 0.008223 | 0.999908 | 0.999952 | 0.999954 | 0.999988 | 0.999952
DT smoted full | 26.15653 | 0.016174 | 0.999862 | 0.999922 | 0.999816 | 0.999972 | 0.999922
DT weight full | 6.897408 | 0.015349 | 0.999908 | 0.999952 | 0.999954 | 0.999988 | 0.999952
MLP normal full | 960.845785 | 0.208947 | 0.892052 | 0.945036 | 0.977226 | 0.988804 | 0.945036
RF control full | 361.549095 | 1.325936 | 0.999908 | 0.999954 | 1 | 0.999992 | 0.999954
RF smoted full | 132.806744 | 0.117373 | 0.999908 | 0.99995 | 0.999908 | 0.999984 | 0.99995
RF weight full | 303.098518 | 0.771723 | 0.999908 | 0.999952 | 0.999954 | 0.999988 | 0.999952
KNN control 10p | 19.848561 | 47.906985 | 0.828354 | 0.904025 | 0.79605 | 0.966481 | 0.904025
KNN smoted 10p | 233.760107 | 181.564677 | 0.989553 | 0.993826 | 0.980322 | 0.997353 | 0.993826
KNN none 10p | 119.013859 | 121.487901 | 0.983827 | 0.991484 | 0.990951 | 0.997803 | 0.991484

Note: control models were trained with default parameters; weight models were trained on the original data set with the best grid-search parameters (focusing on fixing the imbalance with class weights); smoted models were trained with the best grid-search parameters and with SMOTE applied to the training data set; none is the same as weight but without class weights (for parameter details, check the notebook)
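
A sketch of how one row of the table's metrics could be computed with Scikit-Learn, on stand-in data. Note that for binary labels, roc_auc_score applied to hard 0/1 predictions reduces to balanced accuracy, which would explain why those two columns are identical in every row of the table:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data with the same imbalance as the real dataset.
X, y = make_classification(n_samples=10_000, weights=[0.913, 0.087], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

pred = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr).predict(X_te)

print("Recall:           ", recall_score(y_te, pred))
print("ROC AUC:          ", roc_auc_score(y_te, pred))  # equals balanced accuracy on hard labels
print("Precision:        ", precision_score(y_te, pred))
print("Accuracy:         ", accuracy_score(y_te, pred))
print("Balanced accuracy:", balanced_accuracy_score(y_te, pred))
```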
10 / 12
Model Evaluation and Comparison

Figure: Model Scores and Times


11 / 12
Conclusions
From the graphs and table we are able to conclude that:
• The best models were the ones based on the Random Forest and Decision Tree algorithms, which is consistent with their resistance to imbalance. Random Forest did not present much of an advantage, as the Decision Tree models already featured near-perfect scores and clearly did not suffer from overfitting.
• The worst models were the ones based on the Logistic Regression algorithm, most likely because of its sensitivity to imbalance and overall lack of robustness.
• K Nearest Neighbours models were far too slow to be a viable option, so much so that we only trained and tested them with 10% of the data set.
• Multilayer Perceptron based models were not worth the complexity, as their results were not that impressive and their training time was very long.
• SMOTE and class weights had a decent impact on LR, yet they did not show much effect on the other algorithms.
12 / 12
