
Summer of Code 2022

Credit Card Fraud Detection


Mentee: Ram Kandalkar (20D110018) Mentor: Ramkrishna

Introduction

Credit Card Fraud Detection with Machine Learning is a process of data investigation by a Data Science team and the development of a model that provides the best results in revealing and preventing fraudulent transactions. This is achieved by bringing together all the meaningful features of card users' transactions, such as Date, User Zone, Product Category, Amount, Provider, and the Client's Behavioral Patterns. The information is then run through a suitably trained model that finds patterns and rules, so that it can classify whether a transaction is fraudulent or legitimate. All big banks, such as Chase, use fraud monitoring and detection systems.
About Dataset:

The dataset contains transactions made by credit cards in September 2013 by European cardholders. It presents transactions that occurred over two days, in which we have 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

It contains only numerical input variables, which are the result of a PCA transformation. Features V1, V2, …, V28 are the principal components obtained with PCA; the only features which have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable, and it takes the value 1 in case of fraud and 0 otherwise.

Evaluation Metrics:

Given the class imbalance ratio, we recommend measuring the accuracy using the Area Under the Precision-Recall Curve (AUPRC). Confusion matrix accuracy is not meaningful for unbalanced classification.

EDA and Data Pre-processing:

Using the .skew() function, we found that some columns have skewed values, so we applied a cube root transformation to remove the skewness. The Amount column is not standardized, so we use StandardScaler() from sklearn to standardize it.
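The pre-processing code itself is not reproduced in this write-up; the snippet below is a minimal sketch of the described steps, assuming the standard Kaggle creditcard.csv file and an assumed |skew| > 1 threshold for choosing which columns to transform.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the Kaggle credit card fraud dataset (file name assumed).
df = pd.read_csv("creditcard.csv")

# Inspect the skewness of each feature column.
skewness = df.drop(columns="Class").skew()
print(skewness)

# Cube root transform on the heavily skewed columns
# (the |skew| > 1 cutoff is an assumption, not stated in the report).
skewed_cols = skewness[skewness.abs() > 1].index
df[skewed_cols] = np.cbrt(df[skewed_cols])

# Standardize the Amount column with sklearn's StandardScaler.
df["Amount"] = StandardScaler().fit_transform(df[["Amount"]]).ravel()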
Building the model:

First, we create a function in which we define and evaluate a model. We split the dataset into train and test sets with a test fraction of 0.3, then call the LogisticRegression(), GaussianNB(), SVC(), XGBClassifier(), and DecisionTreeClassifier() classifiers, fit each one, predict on the test data, and print the confusion matrix, the classification report, and the precision, recall, F1 score, and accuracy of each model.
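The model-building function is likewise not reproduced here; the following sketch shows one way to implement the described loop (random_state, stratified splitting, and max_iter are assumptions added for reproducibility and convergence).

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score,
                             precision_score, recall_score)
from xgboost import XGBClassifier

# 70/30 train-test split, as described above.
X = df.drop(columns="Class")
y = df["Class"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

def evaluate(model, X_tr, y_tr):
    """Fit one classifier and print the metrics listed in the report."""
    model.fit(X_tr, y_tr)
    pred = model.predict(X_test)
    print(type(model).__name__)
    print(confusion_matrix(y_test, pred))
    print(classification_report(y_test, pred))
    print("Precision:", precision_score(y_test, pred))
    print("Recall   :", recall_score(y_test, pred))
    print("F1 score :", f1_score(y_test, pred))
    print("Accuracy :", accuracy_score(y_test, pred))

for clf in [LogisticRegression(max_iter=1000), GaussianNB(), SVC(),
            XGBClassifier(), DecisionTreeClassifier()]:
    evaluate(clf, X_train, y_train)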

It is clear from the results that the accuracy of our models is very good, but as this is an imbalanced dataset, accuracy is not a good parameter to judge them; we should judge each model by its F1 score. The F1 score is very poor compared to the accuracy of the model.

Using SMOTE:

As our dataset is imbalanced, the models are not able to predict well, so we need to balance the dataset. For this we use SMOTE, the Synthetic Minority Oversampling Technique. SMOTE is an oversampling technique in which synthetic samples are generated for the minority class. The algorithm helps to overcome the overfitting problem posed by random oversampling: it works in feature space, generating new instances by interpolating between positive (minority-class) instances that lie close together.
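A minimal sketch of the SMOTE step with imbalanced-learn, reusing the evaluate() helper and the train/test split from the previous sketch; applying SMOTE only to the training data (so the test set stays untouched) is an assumption, as the report does not specify this.

import pandas as pd
from imblearn.over_sampling import SMOTE

# Oversample only the training split; the test set is left as-is.
smote = SMOTE(random_state=42)
X_train_res, y_train_res = smote.fit_resample(X_train, y_train)

print(pd.Series(y_train).value_counts())      # before: heavily imbalanced
print(pd.Series(y_train_res).value_counts())  # after: classes balanced

# Re-fit the same classifiers on the resampled data, e.g.
evaluate(LogisticRegression(max_iter=1000), X_train_res, y_train_res)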
After using the SMOTE technique, the F1 score improved a lot. Now our models are performing even better, but let us use Deep Learning to try to improve the score again.

Using Deep Learning:

Using a deep neural network, our score improved even more: Accuracy 0.999695, Precision 0.999392, Recall 1.000000, F1 score 0.999696. We achieved Recall = 1, i.e. we did not predict any fraudulent transaction as non-fraudulent. [Code]
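The report only links the network code, so the Keras model below is a guessed architecture (layer sizes, epochs, and batch size are assumptions), trained on the SMOTE-resampled data from the previous sketch.

import tensorflow as tf

# A small fully connected binary classifier; the exact architecture used
# in the report is not specified, so these layer sizes are assumptions.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train_res.shape[1],)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy",
                       tf.keras.metrics.Precision(),
                       tf.keras.metrics.Recall()])

model.fit(X_train_res, y_train_res,
          epochs=10, batch_size=2048, validation_split=0.1)

# Threshold the sigmoid output at 0.5 to get hard fraud / non-fraud labels.
pred = (model.predict(X_test) > 0.5).astype(int).ravel()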

