
Special Issue - 2019 International Journal of Engineering Research & Technology (IJERT)

ISSN: 2278-0181
NCRACES - 2019 Conference Proceedings

Volume 7, Issue 10 Published by, www.ijert.org

Credit Card Fraud Detection Using Machine Learning

Swaroop K, Amruta D, Sanath J, Pooja G
Dept. of ISE, SDMCET, Dharwad, India

Abstract— In today's world, the credit card is the easiest mode of payment, both online and offline. It enables cashless shopping across the globe. Fraud events occur only during online payment, because the credit card number alone is sufficient to make an online transaction. For offline payment a password is requested, so frauds cannot occur during offline transactions. In the existing system, a fraudulent transaction is detected only after the transaction has been completed. Companies hold detailed transactional and fraud data, and frauds tend to appear in patterns. With billions of credit card transactions, it is quite difficult to analyse each one in isolation. Predictive algorithms can help to detect fraudulent transactions, and this is where data mining comes into play. The data consists of a combination of continuous and nominal attributes, and a variety of statistical tests can be used to prevent fraud events. Detecting credit card fraud is still not a perfect science. Fraud remains a major financial issue for banks, yet the distribution of fraudulent to non-fraudulent transactions is severely skewed towards non-fraudulent transactions. Of an estimated 12 billion transactions made annually, 10 million are fraudulent, which is about one in every 1,200 transactions. To analyse and predict fraud events, we used the Local Outlier Factor and Isolation Forest algorithms to calculate the number of fraudulent transactions. We also calculated the accuracy and number of errors of both algorithms.

Keywords: Credit card, Isolation forest, Local outlier factor, Fraud detection, Data mining.

I. INTRODUCTION

In our daily routine we use credit cards to buy goods and services, either through online transactions or with the physical card for offline transactions. In a card-based purchase, the cardholder hands the card to the merchant to make the payment, so a fraudster has to steal the card to make a fraudulent transaction. If the user is not aware of the loss of the card, this leads to financial loss for both the user and the credit card company. When the payment is made online, attackers need only a little information, such as the card number, to make a false transaction. The only way to detect this kind of fraud is to analyse the spending pattern of every card and identify irregularities with respect to its normal pattern. Detecting fraud from the cardholder's existing purchase data is one way to reduce the rate of fraud. Every cardholder is characterised by patterns that contain information such as typical purchase categories, the time since the last purchase and the money spent. Deviations from such patterns are flagged as fraud.

Fraud in finance is an ever-growing issue with far-reaching consequences. Fraud can be defined as criminal deception with the aim of financial gain. The emergence of the internet has led to an increase in online credit card transactions. The credit card is the most prevalent payment method because it attracts more discounts and offers both in stores and in e-commerce, and this also makes it more vulnerable to fraud. Credit card fraud detection is the science and the art of detecting unusual activity in credit card transactions. Fraud occurs when an individual's credit card information is stolen and used to make unauthorized purchases and/or withdrawals from the original holder's account. A major challenge for credit card fraud detection research is the limited availability of real-world data, due to privacy and legal concerns. Online shopping is one of the largest and fastest-growing trends, and payment is made by credit card, debit card or net banking. Online payment does not require the physical card, so card details that become known to others pose a major risk. Currently, the cardholder learns of a fraud only after the fraudulent transaction has been carried out, and no mechanism exists to track fraudulent transactions. This project addresses exactly that problem. We use a dataset of nearly 28,500 credit card transactions and multiple unsupervised anomaly detection algorithms to identify transactions with a high probability of being credit card fraud. We also use metrics such as precision, recall and F1-score to investigate why the classification accuracy of these algorithms can be misleading.

In addition, we explore data visualization techniques common in data science, such as parameter histograms and correlation matrices, to gain a better understanding of the underlying distribution of the data in our dataset.

II. LITERATURE SURVEY

In [2], the authors begin by explaining the method used for credit card transactions. They propose a system that integrates their algorithm with the payment gateway to detect fraud in real time. The authors used seven techniques to develop the algorithm: Neural Networks, Rule Induction, Case-Based Reasoning, Genetic Algorithms, Inductive Logic Programming, Expert Systems and Regression. They determined that the artificial neural network (ANN) method would best serve this problem statement. The output
of the neural network is a probability that indicates the degree to which a transaction is fraudulent. The neural network is trained on information about the cardholder in various categories, such as the cardholder's profession, earnings and where large purchases are placed. The system uses the back-propagation learning algorithm in this phase to train the network. A transaction is classified into one of the following categories according to the numeric value of this probability, which lies between 0 and 1: Non-Fraudulent, Doubtful, Suspicious and Fraudulent. The system focuses particularly on the merchant side of the industry, where it reduces the losses the merchant has to bear when a transaction is fraudulent. It is therefore limited by the availability of merchant-side transaction data, which is hard to obtain at scale.

The authors of [3] focused on the Chinese market, as it is rapidly growing and fast paced. They proposed a data mining technique that uses outlier detection based on distance sums to identify fraudulent transactions. They preferred this method over traditional statistical methods such as Regression and Discriminant Analysis because they regarded outlier detection as independent of the dataset distribution. The paper used the Euclidean distance formula to calculate the distance sum used to detect outliers. The authors calculated a threshold value for the distance. If the distance is above the threshold, the object is classified as an anomaly, in this case a fraudulent transaction. The authors collected 16,000 observations from a domestic bank in China and achieved their highest accuracy, 89.4%, with a threshold value of 12. However, the method's results are highly dependent on the nature of the data distribution and may vary for data from different banks.

III. SYSTEM DESIGN

The fraud detection module works in the following steps:

1) The incoming set of transactions and their amounts are treated as credit card transactions.
2) The credit card transactions are given to the machine learning algorithms as input.
3) The algorithms analyse the data and observe patterns to classify each transaction as either fraudulent or valid. Anomaly detection algorithms such as Local Outlier Factor and Isolation Forest are used for this step.
4) Fraudulent transactions are passed to an alarm, which alerts the user that a fraudulent transaction has occurred. The user can then block the card to prevent further financial loss to themselves and to the credit card company.
5) Valid transactions are treated as genuine transactions.

Figure 1. System block diagram of credit card fraud detection.

IV. SOFTWARE IMPLEMENTATION

We collected the dataset from Kaggle [1] and the source code from GitHub [4]. The dataset contains transactions made with credit cards by European cardholders in September 2013, as shown in Figure 2.

We imported the libraries, printed their versions and then imported the necessary packages. We loaded the dataset from the CSV file using pandas and explored it. The dataset has 31 columns, as shown in Figure 3. Columns V1 to V28 are the result of PCA dimensionality reduction, which protects sensitive information in the dataset, such as the identity and location of an individual. Class 0 indicates a valid transaction and class 1 indicates a fraudulent transaction. There are 284,807 transactions with 31 columns. While exploring the dataset further, we noticed that the mean of the class column is close to 0, as shown in Figure 4. This means there are far more valid transactions than fraudulent ones. Because the dataset is large, we take only 10% of the data to save time and computation, which leaves 28,481 transactions. Next, we plot a histogram of each parameter to check visually for any unusual parameters, as shown in Figure 5. We then calculated the number of fraud and valid cases and the outlier fraction, as shown in Figure 7. The outlier fraction is the number of fraudulent transactions divided by the number of valid transactions. We constructed a correlation matrix with a heat map, shown in Figure 6, to check for strong correlations between the variables in the dataset. The matrix reveals any strong linear relationships and indicates which features are important for the overall classification. However, most of the values were close to 0, so there were no strong relationships between the V parameters.

Next, we format the dataset. We get all columns from the data frame and filter out the data we do not want. X holds all columns except the class label. Y holds what we want to predict, a one-dimensional array containing the class label of each sample, as shown in Figure 8. This is unsupervised learning, as is usual for anomaly detection, so the labels are not fed to the algorithms. They are used only to evaluate the predictions.
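The exploration and formatting steps described above can be sketched in pandas roughly as follows. This is a minimal illustration under stated assumptions, not the code from [4]. The column names `Time`, `V1` to `V28`, `Amount` and `Class` are assumed to match the public Kaggle file, and the file name `creditcard.csv` is also assumed. `explore` and `split_features_target` are hypothetical helper names. The demo builds a small synthetic frame with the same 31-column layout so that it runs without the download.

```python
import numpy as np
import pandas as pd


def explore(df: pd.DataFrame, frac: float = 0.1, seed: int = 1):
    """Sample a fraction of the rows; return counts, outlier fraction and correlations."""
    # A random 10% sample keeps time and memory requirements low.
    sample = df.sample(frac=frac, random_state=seed)
    fraud = sample[sample["Class"] == 1]
    valid = sample[sample["Class"] == 0]
    # Outlier fraction = number of fraud cases / number of valid cases.
    outlier_fraction = len(fraud) / float(len(valid))
    # Pairwise Pearson correlations; plot as a heat map if desired.
    corr = sample.corr()
    return sample, len(fraud), len(valid), outlier_fraction, corr


def split_features_target(df: pd.DataFrame, target: str = "Class"):
    """X: every column except the class label; Y: 1-D series of class labels."""
    columns = [c for c in df.columns if c != target]
    return df[columns], df[target]


if __name__ == "__main__":
    # Real data (assumed file name): df = pd.read_csv("creditcard.csv")
    rng = np.random.default_rng(0)
    n = 2000
    df = pd.DataFrame(rng.normal(size=(n, 28)),
                      columns=[f"V{i}" for i in range(1, 29)])
    df.insert(0, "Time", np.arange(n, dtype=float))
    df["Amount"] = rng.exponential(50.0, size=n)
    df["Class"] = (rng.random(n) < 0.01).astype(int)  # synthetic ~1% fraud rate

    print(df.shape, df["Class"].mean())  # mean of Class is close to 0
    sample, n_fraud, n_valid, outlier_fraction, corr = explore(df)
    print(len(sample), n_fraud, n_valid, outlier_fraction)
    # Histograms of every parameter: sample.hist(figsize=(20, 20)) (needs matplotlib)
    X, Y = split_features_target(sample)
    print(X.shape, Y.shape)
```

With the real file loaded, `explore(df)` returns the 10% sample together with the fraud and valid counts, the outlier fraction and the correlation matrix. The heat-map plot is left out to keep the sketch dependency-light.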

Figure 2. Contents of the dataset.

Figure 3. The 31 columns of the dataset.

Figure 4. Summary statistics of the dataset, such as mean and count.

Figure 5. Histogram of each parameter.

Figure 8. X and Y values.
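Because the labels are kept out of training, both detectors can be fitted on X alone, with Y used only for scoring. The sketch below shows that flow with scikit-learn. It is an illustration, not the authors' script. The synthetic data, `n_estimators=100`, `n_neighbors=20` and setting `contamination` to the outlier fraction are all assumptions, and `to_class_labels` and `run_detectors` are hypothetical helper names. In scikit-learn, `fit_predict` on `IsolationForest` and `LocalOutlierFactor` returns -1 for outliers and 1 for inliers. The predictions are therefore mapped to the dataset's convention (1 = fraud, 0 = valid) before errors are counted.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import accuracy_score, classification_report
from sklearn.neighbors import LocalOutlierFactor


def to_class_labels(pred: np.ndarray) -> np.ndarray:
    """Map scikit-learn output (1 = inlier, -1 = outlier) to 0 = valid, 1 = fraud."""
    return np.where(pred == -1, 1, 0)


def run_detectors(X: np.ndarray, Y: np.ndarray, outlier_fraction: float,
                  seed: int = 1) -> dict:
    """Fit both detectors on X only; use Y solely to count errors and report metrics."""
    detectors = {
        "Isolation Forest": IsolationForest(
            n_estimators=100, contamination=outlier_fraction, random_state=seed),
        "Local Outlier Factor": LocalOutlierFactor(
            n_neighbors=20, contamination=outlier_fraction),
    }
    errors = {}
    for name, model in detectors.items():
        y_pred = to_class_labels(model.fit_predict(X))  # labels never passed to fit
        errors[name] = int((y_pred != Y).sum())
        print(f"{name}: {errors[name]} errors, accuracy {accuracy_score(Y, y_pred):.6f}")
        print(classification_report(Y, y_pred, zero_division=0))
    return errors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    valid = rng.normal(0.0, 1.0, size=(980, 5))   # synthetic "valid" rows
    fraud = rng.normal(6.0, 1.0, size=(20, 5))    # synthetic, well-separated "fraud" rows
    X = np.vstack([valid, fraud])
    Y = np.array([0] * 980 + [1] * 20)
    run_detectors(X, Y, outlier_fraction=20 / 980)
```

Setting `contamination` to the outlier fraction tells each detector roughly how many points to flag. On real fraud data, where fraud overlaps with normal behaviour, the error counts will be far less clean than on this synthetic example.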



Figure 6. Correlation matrix with heat map.

V. IMPLEMENTATION AND WORKING

Support vector machines (SVMs) were used for outlier detection in earlier work, but they took more time on complex datasets. Isolation Forest and Local Outlier Factor are anomaly detection methods provided by the scikit-learn package. In the Local Outlier Factor method, the anomaly score of each sample is called its local outlier factor. It measures the local deviation of the density of a given sample with respect to its neighbors, so the score depends on how isolated the object is within its surrounding neighborhood. The Isolation Forest algorithm isolates observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of that feature. Recursive partitioning can be represented by a tree structure. The number of splits required to isolate a sample is therefore equal to the path length from the root node to the terminating node. This path length, averaged over a forest of such random trees, is a measure of normality and serves as the decision function. Random partitioning produces noticeably shorter paths for anomalies. When a forest of random trees collectively produces shorter path lengths for particular samples, those samples are highly likely to be anomalies. The predicted values y are -1 for outliers and 1 for inliers. This is very useful information, but it must be processed before it can be compared with the class label, which is 1 for a fraud event and 0 for a valid case. We therefore classify all inliers as 0, i.e. valid transactions, and all outliers as 1, i.e. fraudulent transactions. We then run classification metrics, which report the method name, number of errors, precision, F1 and recall scores.

Figure 7. Number of valid and fraud cases, and the outlier fraction, when we explored the dataset.

Results

For complex datasets like ours, Isolation Forest is a good method. The Local Outlier Factor method makes 97 errors in total, which is relatively high, and has an accuracy of 99.65942207%. Its precision and F1-score are not as good. For class 0 its precision is 100%, but for class 1 it correctly detects very few fraudulent transactions. The Isolation Forest method makes 71 errors in total, which is relatively low, and has an accuracy of 99.750711%. Its class 1 precision is 30%, meaning that about 30% of the transactions it flags as fraud are actually fraudulent. Its F1-scores are also better than those of Local Outlier Factor. Isolation Forest therefore produced better results, as shown in Figure 9.

Figure 9. Method name, total number of errors, precision, F1 and recall scores.

VI. CONCLUSION AND FUTURE WORK

We imported the CSV dataset, preprocessed it, and explored and described the data. We plotted histograms to check for unusual parameters and computed a correlation matrix to see which parameters are important for the class. We applied two algorithms, Isolation Forest and Local Outlier Factor, to perform anomaly detection on the dataset. We realized the importance of understanding the data and of using precision as an evaluation metric.
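A few lines of plain Python make the point about precision concrete. The sketch recomputes accuracy as 1 - errors / samples from the reported error counts (71 for Isolation Forest, 97 for Local Outlier Factor). The sample size of 28,481 is an assumption: it is 10% of the 284,807 transactions, rounded, and it reproduces both reported accuracies. The sketch then applies the same formula to a hypothetical detector that never flags anything, using the fraud rate quoted in the abstract (10 million frauds in 12 billion transactions). The helper names are illustrative only.

```python
def accuracy_from_errors(n_errors: int, n_samples: int) -> float:
    """Accuracy = share of samples classified correctly."""
    return 1.0 - n_errors / n_samples


def fraud_recall(true_positives: int, false_negatives: int) -> float:
    """Recall for class 1: share of actual frauds that were flagged."""
    actual = true_positives + false_negatives
    return true_positives / actual if actual else 0.0


# Assumed sample size: 10% of the 284,807 transactions, rounded (28,480.7 -> 28,481).
N_SAMPLE = round(284_807 * 0.1)
REPORTED_ERRORS = {"Isolation Forest": 71, "Local Outlier Factor": 97}

for name, n_errors in REPORTED_ERRORS.items():
    print(f"{name}: accuracy = {accuracy_from_errors(n_errors, N_SAMPLE):.8%}")

# Hypothetical detector that labels every transaction valid, at the abstract's
# fraud rate (10 million frauds out of 12 billion): every fraud is an error.
N_TOTAL, N_FRAUD = 12_000_000_000, 10_000_000
baseline_acc = accuracy_from_errors(N_FRAUD, N_TOTAL)
baseline_recall = fraud_recall(true_positives=0, false_negatives=N_FRAUD)
print(f"flag-nothing detector: accuracy = {baseline_acc:.4%}, fraud recall = {baseline_recall:.0%}")
```

Run as is, the sketch prints 99.75071100% and 99.65942207% for the two methods. It also shows that at the abstract's fraud rate, a detector that catches no fraud at all still scores 99.9167% accuracy with 0% fraud recall. This is why precision and recall on class 1 are more informative than accuracy here.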

We observed that Isolation Forest performs better than Local Outlier Factor in terms of accuracy, number of errors, precision, F1 and recall scores. Fraud detection is a complex issue that requires a substantial amount of planning before machine learning algorithms are applied to it. Nonetheless, it is also an application of data science and machine learning for good, one that helps keep customers' money safe and hard to tamper with. Future work will include implementing the system with neural networks to achieve still higher accuracy and efficiency [5]. A dataset with non-anonymized features would make this particularly interesting, because outputting the feature importances would show which specific factors matter most for detecting fraudulent transactions.

Some of the advantages are:

• Reduction in the number of fraudulent transactions.
• Users can safely use their credit cards for online transactions.
• An added layer of security.

Some drawbacks that can be further improved upon are:

• Machine learning algorithms work well only on large datasets, and for smaller amounts of data the results may not be accurate. It takes a significant amount of data for machine learning models to become accurate. This data volume is not an issue for large organizations, but others must have enough data points to identify legitimate cause-and-effect relations.

REFERENCES
[1] Kaggle, "Datasets." [Online]. Available: https://www.kaggle.com/datasets
[2] A. Srivastava, M. Yadav, S. Basu, S. Salunkhe and M. Shabad, "Credit card fraud detection at merchant side using neural networks," 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, 2016, pp. 667-670.
[3] W. Yu and N. Wang, "Research on Credit Card Fraud Detection Model Based on Distance Sum," 2009 International Joint Conference on Artificial Intelligence, Hainan Island, 2009, pp. 353-356. doi: 10.1109/JCAI.2009.146
[4] Eduonix, "Eduonix/creditcardML," GitHub, Jul. 26, 2018. [Online]. Available: https://github.com/eduonix/creditcardML
[5] pythonprogramming.net. [Online]. Available: https://pythonprogramming.net/neural-networks-machine-learning-tutorial/
