10 XI November 2022

https://doi.org/10.22214/ijraset.2022.47456
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue XI Nov 2022- Available at www.ijraset.com

Credit Card Fraud Detection


Sanskriti Shevgaonkar1, Priyanka Khadse2, Omkar Shinde3, Tanmay Kulkarni4, Prof. Avinash Gondal5
1, 2, 3, 4, 5 Department of Information Technology, Watumull College of Engineering, Ulhasnagar

I. INTRODUCTION
'Fraud' in credit card transactions is the unauthorized and unwanted use of an account by someone other than its owner. Preventive measures can be taken to stop this abuse, and the behavior of fraudsters can be studied to minimize it and to protect against similar occurrences in the future. In other words, credit card fraud is a case where a person uses someone else's credit card for personal gain while the owner and the card-issuing authorities are unaware that the card is being used. The problem is particularly challenging from a learning perspective because of factors such as class imbalance: valid transactions far outnumber fraudulent ones. In addition, the statistical properties of transaction patterns often change over time.
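
To see why class imbalance makes the learning problem hard, consider the following minimal sketch in R. The fraud rate of 0.2% and the sample size are assumptions chosen purely for illustration; they are not figures taken from our dataset. A trivial classifier that labels every transaction as valid reaches a very high accuracy while detecting no fraud at all.

```r
# Illustrative only: synthetic labels with an ASSUMED fraud rate of 0.2%
set.seed(42)
n <- 100000
labels <- rbinom(n, size = 1, prob = 0.002)   # 1 = fraud, 0 = valid

# A "classifier" that always predicts the majority class (valid)
always_valid <- rep(0, n)

# Accuracy looks excellent, but recall on the fraud class is zero
accuracy <- mean(always_valid == labels)
recall   <- sum(always_valid == 1 & labels == 1) / sum(labels == 1)

cat("Accuracy:", accuracy, "\n")
cat("Recall on fraud:", recall, "\n")
```

For this reason, accuracy alone is a poor yardstick for fraud detection, and measures such as recall or the area under the ROC curve are more informative.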

II. SCOPE
Fraud detection involves monitoring the activities of populations of users in order to estimate, perceive, or avoid objectionable behavior, which includes fraud, intrusion, and defaulting. This is a highly relevant problem for communities such as machine learning and data science, where the solution can be automated.

III. PLATFORM: GOOGLE COLAB


Google Colab was developed by Google to provide free access to GPUs and TPUs to anyone who needs them to build a machine learning or deep learning model. It can be regarded as a hosted, enhanced version of Jupyter Notebook.
As programmers, we can do the following with Google Colab:
1) Write and execute code in Python
2) Document code with text that supports mathematical equations
3) Create, upload, and share notebooks
4) Import and save notebooks from and to Google Drive
5) Import and publish notebooks from GitHub
6) Import external datasets, e.g. from Kaggle
7) Integrate PyTorch, TensorFlow, Keras, and OpenCV
8) Use a free cloud service with a free GPU
Colab, or Colaboratory, is an interactive notebook provided by Google, primarily for writing and running Python through a browser. We can perform data analysis, create models, and evaluate these models in Colab. The processing is done on Google-owned servers in the cloud, so we only need a browser and a fairly stable internet connection. Colab is a useful tool for students, professionals, and researchers alike. Although Colab is primarily used for coding in Python, it can also be used for R: we can run R in Google Colab and mount Google Drive or access BigQuery from an R notebook.

A. Software Specifications
1) Google Colaboratory

B. Hardware Specifications
1) Microsoft® Windows® 7/8/10 (32- or 64-bit)
2) 3 GB RAM minimum, 8 GB RAM recommended
3) 2 GB of available disk space minimum
4) Intel Core i3 processor or above

C. Dataset
1) creditcard.csv, which is available on Kaggle (https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud)

D. Packages Required
1) ranger
2) caret
3) data.table
4) caTools
5) rpart.plot
6) neuralnet
7) gbm
8) pROC
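
The packages listed above can be checked for in one step. The sketch below is only one possible way to do this; the variable names are our own, and the install line is left commented out because it needs internet access.

```r
# Packages listed above; check which of them are already installed
required_pkgs <- c("ranger", "caret", "data.table", "caTools",
                   "rpart.plot", "neuralnet", "gbm", "pROC")

# requireNamespace() returns FALSE (without an error) when a package is absent
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install from CRAN (needs internet access):
  # install.packages(missing_pkgs)
}
```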

©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 988

IV. LITERATURE REVIEW


Fraud is an unlawful or criminal deception intended to result in financial or personal benefit. It is a deliberate act against a law, rule, or policy, carried out with the aim of obtaining unauthorized financial benefit. Numerous studies on anomaly and fraud detection in this domain have already been published and are publicly available. A comprehensive survey conducted by Clifton Phua and his associates revealed that the techniques employed in this domain include data mining applications, automated fraud detection, and adversarial detection.
In another paper, Suman, Research Scholar, GJUS&T at Hisar HCE, presented techniques such as supervised and unsupervised learning for credit card fraud detection. Even though these methods and algorithms achieved unexpected success in some areas, they failed to provide a permanent and consistent solution to fraud detection. In a similar line of research, Wen-Fang Yu and Na Wang used outlier mining, outlier detection mining, and distance sum algorithms to accurately predict fraudulent transactions in an emulation experiment on the credit card transaction data set of a commercial bank. Outlier mining is a field of data mining used mainly in the monetary and internet domains. It deals with detecting objects that are detached from the main system, i.e. transactions that are not genuine. The authors took attributes of customer behaviour and, based on the values of those attributes, calculated the distance between the observed value of each attribute and its predetermined value.
Unconventional techniques, such as a hybrid data mining/complex network classification algorithm, are able to perceive illegal instances in an actual card transaction data set. This approach is based on a network reconstruction algorithm that creates representations of the deviation of one instance from a reference group, and it has proved efficient typically on medium-sized online transaction data. There have also been efforts to approach the problem from a completely new angle. Attempts have been made to improve the alert-feedback interaction in the case of a fraudulent transaction: the authorized system is alerted, and feedback is sent to deny the ongoing transaction. The Artificial Genetic Algorithm, one of the approaches that shed new light on this domain, countered fraud from a different direction.
In 2015, J. Esmaily and R. Moradinezhad proposed a hybrid of an artificial neural network and a decision tree. Their model uses a two-phase approach. In the first phase, the classification results of a decision tree and a multilayer perceptron are used to generate a new dataset, which in the second phase is fed into a multilayer perceptron to produce the final classification. This model promises reliability by giving a very low false detection rate. In 2011, Siddhartha Bhattacharyya and four co-authors carried out a detailed comparative study of support vector machines and random forests along with logistic regression. They concluded from their experiments that random forest was the most accurate technique, followed by logistic regression and support vector machines.

V. IMPLEMENTATION
In the first step of this data science project, we perform data exploration. We import the essential packages required for this task and then read our data. Finally, we go through the input data to gain the necessary insights about it.

VI. READING EVENTS FROM CREDITCARD.CSV


Before starting the credit card fraud detection (CCFD) analysis, the first step is to read the data to be analyzed. The data is stored in a file named creditcard.csv. This dataset contains about 0.28 million records with various features. The events saved in the dataset are unstructured. To perform the analysis, the dataset is read using the read.csv() function.
creditcard_data <- read.csv("/content/creditcard.csv")

Figure 1. Credit Card data CSV (1)


Figure 2. Credit Card data CSV (2)

A. Data Exploration
First, we imported the dataset that contains the transactions made by credit cards. We then explored the data contained in the creditcard_data dataframe. After displaying creditcard_data using the head() and tail() functions, we proceeded to explore the other components of this dataframe.
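
The exploration step can be sketched as follows. So that the example runs on its own, it builds a small synthetic stand-in for creditcard_data; the column values are made up, and only a few of the columns of the real file are imitated. The particular summaries shown (class counts, spread of the amounts) are our suggestion of what "other components" might cover.

```r
# Small synthetic stand-in for creditcard_data (ASSUMPTION: the real data
# is read with read.csv as shown earlier; these values are made up)
set.seed(1)
creditcard_data <- data.frame(
  Time   = seq_len(1000),
  V1     = rnorm(1000),
  V2     = rnorm(1000),
  Amount = round(rexp(1000, rate = 1 / 80), 2),
  Class  = rbinom(1000, size = 1, prob = 0.02)
)

dim(creditcard_data)              # number of rows and columns
head(creditcard_data, 6)          # first six transactions
tail(creditcard_data, 6)          # last six transactions
table(creditcard_data$Class)      # counts of valid (0) and fraud (1)
summary(creditcard_data$Amount)   # quartiles of the transaction amounts
var(creditcard_data$Amount)       # variance of the amounts
sd(creditcard_data$Amount)        # standard deviation of the amounts
```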

B. Data Manipulation
In this section of the project, we scaled the data using the scale() function. We applied it to the Amount component of creditcard_data. Scaling standardizes the feature: it is centered at zero mean and rescaled to unit variance. As a result, the large raw values of Amount do not dominate the other features and interfere with the functioning of the model.
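
As a small worked example, scale() computes z = (x - mean(x)) / sd(x) for every value. The amounts below are made-up values standing in for the Amount column; the sketch checks that scale() and the hand-written formula agree.

```r
# Made-up transaction amounts (ASSUMPTION: stand-in for creditcard_data$Amount)
amount <- c(2.69, 149.62, 378.66, 123.50, 69.99, 3.67)

# scale() returns a one-column matrix; as.numeric() flattens it to a vector
scaled_amount <- as.numeric(scale(amount))

# The same result computed by hand: z = (x - mean) / sd
manual <- (amount - mean(amount)) / sd(amount)

all.equal(scaled_amount, manual)   # TRUE: the two computations agree
mean(scaled_amount)                # approximately 0
sd(scaled_amount)                  # 1
```

Note that standardization changes the units of the feature but not the shape of its distribution.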

C. Data Modelling
After standardizing the data, we split it into a training set and a test set with a split ratio of 0.80. This means that 80% of the data is assigned to train_data, whereas 20% is assigned to test_data. We then checked the dimensions of both sets using the dim() function.
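
One way to perform such a split is sketched below in base R on synthetic data. The stratification by class is our own assumption, made so that both sets keep roughly the same fraud ratio; the caTools package listed earlier provides sample.split(), which offers a similar label-aware split.

```r
# Synthetic stand-in data (ASSUMPTION); column names imitate the real file
set.seed(123)
n <- 5000
data <- data.frame(
  V1     = rnorm(n),
  Amount = rnorm(n),
  Class  = rbinom(n, size = 1, prob = 0.05)
)

# Draw 80% of the row indices within each class so that both sets keep
# roughly the same fraud ratio (a stratified split)
train_idx <- unlist(lapply(split(seq_len(n), data$Class), function(idx) {
  idx[sample.int(length(idx), size = floor(0.80 * length(idx)))]
}))

train_data <- data[train_idx, ]
test_data  <- data[-train_idx, ]

dim(train_data)   # about 80% of the rows
dim(test_data)    # the remaining rows
```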

VII. FITTING LOGISTIC REGRESSION MODEL


In this section of the project, we fit the first model, beginning with logistic regression. We used it to model the probability of the outcome fraud/not fraud. We fitted the model on the training data and then applied it to the test data. Once we had summarized the model, we visualized it through plots. To assess the performance of the model, we plotted the Receiver Operating Characteristic (ROC) curve. For this, we first loaded the pROC package and then plotted the ROC curve to analyze the model's performance.
Code:
# Fitting Logistic Regression Model (fit on the training set)
Logistic_Model <- glm(Class ~ ., data = train_data, family = binomial())
summary(Logistic_Model)
# Visualizing summarized model through the following plots
plot(Logistic_Model)
# ROC Curve to assess the performance of the model on the test set
library(pROC)
# type = "response" returns predicted probabilities
lr.predict <- predict(Logistic_Model, newdata = test_data, type = "response")
auc.lr <- roc(test_data$Class, lr.predict, plot = TRUE, col = "blue")
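
Besides the ROC curve, the predicted probabilities can be turned into class labels and compared with the true labels. The following sketch does this on synthetic data with a single feature; the data, the variable names, and the threshold of 0.5 are assumptions made for illustration and are not part of the original experiment.

```r
# Synthetic stand-in data (ASSUMPTION): one informative feature x
set.seed(7)
n <- 4000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-3 + 2 * x))
d <- data.frame(x = x, Class = y)

train <- d[1:3000, ]
test  <- d[3001:4000, ]

fit  <- glm(Class ~ x, data = train, family = binomial())
prob <- predict(fit, newdata = test, type = "response")  # probabilities in [0, 1]

# Classify at an assumed threshold of 0.5 and tabulate against the truth
pred  <- factor(ifelse(prob > 0.5, 1, 0), levels = c(0, 1))
truth <- factor(test$Class, levels = c(0, 1))
cm <- table(Predicted = pred, Actual = truth)
print(cm)

recall    <- cm["1", "1"] / sum(cm[, "1"])   # share of frauds detected
precision <- cm["1", "1"] / sum(cm["1", ])   # share of alerts that are fraud
```

With imbalanced classes, recall and precision on the fraud class are usually more telling than the overall accuracy.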

A. Fitting a Decision Tree Model


Next, we implemented a decision tree algorithm to plot the outcomes of a decision, from which we could conclude which class an object belongs to. We fitted the decision tree model using recursive partitioning (the rpart package) and plotted it using the rpart.plot() function.
Code:
# Fitting a Decision Tree Model
library(rpart)

library(rpart.plot)
decisionTree_model <- rpart(Class ~ . , creditcard_data, method = 'class')
predicted_val <- predict(decisionTree_model, creditcard_data, type = 'class')
probability <- predict(decisionTree_model, creditcard_data, type = 'prob')
rpart.plot(decisionTree_model)
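
The listing above fits the tree and predicts on the full creditcard_data. A held-out variant, in which the tree is fitted on training rows and evaluated on unseen rows, might look like the following sketch. The synthetic data and the variable names are assumptions made for illustration.

```r
library(rpart)

# Synthetic stand-in data (ASSUMPTION): fraud is more likely when V1 is large
set.seed(11)
n <- 3000
d <- data.frame(V1 = rnorm(n), V2 = rnorm(n))
d$Class <- factor(rbinom(n, size = 1, prob = plogis(-3 + 2.5 * d$V1)))

train <- d[1:2400, ]
test  <- d[2401:3000, ]

# Fit on the training rows only, then predict the held-out rows
tree <- rpart(Class ~ ., data = train, method = "class")
pred <- predict(tree, newdata = test, type = "class")

table(Predicted = pred, Actual = test$Class)
mean(pred == test$Class)   # held-out accuracy
```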

B. Artificial Neural Network


Artificial Neural Networks (ANNs) are a type of machine learning algorithm modeled after the human nervous system. ANN models learn patterns from historical data and can then classify input data. We imported the neuralnet package, which allowed us to implement the ANN, and then plotted the model using the plot() function. The output of the network lies between 0 and 1. We set a threshold of 0.5: values above 0.5 correspond to 1 and the rest to 0.
Code:
# Artificial Neural Network
library(neuralnet)
ANN_model <- neuralnet(Class ~ ., train_data, linear.output = FALSE)
plot(ANN_model)
predANN <- compute(ANN_model, test_data)
resultANN <- predANN$net.result
resultANN <- ifelse(resultANN > 0.5, 1, 0)
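
The thresholding step can be illustrated in isolation. With linear.output = FALSE, neuralnet applies its activation function (logistic by default) to the output neuron, so the result lies between 0 and 1. In the sketch below, the input values z are hypothetical and are not outputs of our trained network.

```r
# Logistic (sigmoid) activation: maps any real number into (0, 1)
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical pre-activation values of the output neuron
z <- c(-4, -0.5, 0, 0.5, 4)
net_result <- sigmoid(z)

# Values above 0.5 become class 1, the rest (including exactly 0.5) class 0
predicted_class <- ifelse(net_result > 0.5, 1, 0)
data.frame(z, net_result = round(net_result, 3), predicted_class)
```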

C. Gradient Boosting (GBM)


Gradient Boosting is a popular machine learning algorithm used for classification and regression tasks. The model is an ensemble of weak learners, such as shallow decision trees, which are combined to form a strong gradient boosting model. The model is trained with a gradient descent procedure: each new tree is fitted to reduce the loss of the current ensemble.
Code:
# Gradient Boosting (GBM)
library(gbm, quietly=TRUE)
# Get the time to train the GBM model
system.time(
model_gbm <- gbm(Class ~ .
, distribution = "bernoulli"
, data = rbind(train_data, test_data)
, n.trees = 500
, interaction.depth = 3
, n.minobsinnode = 100
, shrinkage = 0.01
, bag.fraction = 0.5
, train.fraction = nrow(train_data) / (nrow(train_data) + nrow(test_data))
)
)
# Determine best iteration based on test data
gbm.iter = gbm.perf(model_gbm, method = "test")
model.influence = relative.influence(model_gbm, n.trees = gbm.iter, sort. = TRUE)
#Plot the gbm model
plot(model_gbm)
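
To make the idea behind boosting concrete, the following sketch builds a tiny gradient boosting loop by hand from rpart trees. It is a simplified illustration under our own assumptions (synthetic data, 50 trees of depth 2, shrinkage 0.1) and is not how the gbm package is implemented; gbm differs in details such as row subsampling (bag.fraction). Each round fits a small regression tree to the negative gradient of the Bernoulli loss, y - p, and adds a shrunken copy of its prediction to the current score.

```r
library(rpart)

# Synthetic stand-in data (ASSUMPTION)
set.seed(3)
n <- 2000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- rbinom(n, size = 1, prob = plogis(-2 + 2 * d$x1 - d$x2))

# Bernoulli loss (negative log-likelihood) for scores f on the log-odds scale
log_loss <- function(y, f) mean(log(1 + exp(f)) - y * f)

shrinkage <- 0.1
n_trees   <- 50
f    <- rep(qlogis(mean(y)), n)   # start from the log-odds of the base rate
loss <- numeric(n_trees)

for (m in seq_len(n_trees)) {
  d$neg_grad <- y - plogis(f)     # negative gradient of the loss w.r.t. f
  weak_tree <- rpart(neg_grad ~ x1 + x2, data = d, method = "anova",
                     control = rpart.control(maxdepth = 2, cp = 0))
  f <- f + shrinkage * predict(weak_tree, newdata = d)  # small step downhill
  loss[m] <- log_loss(y, f)
}

plot(loss, type = "l", xlab = "Number of trees", ylab = "Training loss")
```

The training loss falls as trees are added, which is why a separate test fraction (train.fraction in the listing above) is used to choose the number of trees.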

D. AUC-ROC Curve
In the last section of the project, we calculated and plotted the ROC curve, which shows the trade-off between the sensitivity and specificity of the model. The roc() call with plot = TRUE draws the curve, and the print command reports the area under the curve (AUC). The AUC can be used as a measure of how well the model separates the two classes.
Code:
# Plot and calculate AUC on test data
library(pROC)
gbm_test = predict(model_gbm, newdata = test_data, n.trees = gbm.iter)
gbm_auc = roc(test_data$Class, gbm_test, plot = TRUE, col = "red")
print(gbm_auc)
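
The AUC has a simple interpretation: it is the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen valid one. The sketch below computes it in base R from ranks (the Mann-Whitney identity) on four made-up scores; the function name auc_rank is our own and is not part of pROC.

```r
# AUC via the rank-sum (Mann-Whitney) identity, base R only
auc_rank <- function(labels, scores) {
  r <- rank(scores)                 # average ranks handle tied scores
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)
auc_rank(labels, scores)   # 0.75: 3 of the 4 fraud/valid pairs are ordered correctly
```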


VIII. VISUALIZING THE RESULTS OF THE VARIOUS ALGORITHMS

ANN Model

Decision Tree model

GBM Model, Normal Model


ROC Curve, AUC-ROC Curve

IX. CONCLUSION
In this R data science project, we learnt how to develop a credit card fraud detection model using machine learning. We used a variety of ML algorithms to implement this model and plotted the respective performance curves for the models. We also learnt how data can be analyzed and visualized to distinguish fraudulent transactions from legitimate ones.
