Credit Card Fraud Detection Using Machine Learning: Swaroop K Amruta D Sanath J Pooja G
ISSN: 2278-0181
NCRACES - 2019 Conference Proceedings
Abstract: In today's world, the credit card is the easiest mode of payment, both online and offline. It enables cashless shopping across the globe. Fraud occurs only during online payment, because the credit card number alone is sufficient to make a transaction, whereas offline payment asks for a password, so fraud cannot occur during offline transactions. In existing fraud detection systems, fraud is detected only after the transaction is done. Companies hold detailed analyses of transactional and fraud data, and fraud tends to appear in patterns. Among billions of credit card transactions, it is difficult to analyse each one in isolation. Predictive algorithms can help to detect fraudulent transactions; this is where data mining comes into play. The data consists of a combination of continuous and nominal data. A variety of statistical tests can be used to prevent fraud events. Detecting credit card fraud is still not a perfect science. While fraud remains a major financial issue for banks, the distribution of fraudulent to non-fraudulent transactions is severely skewed towards non-fraudulent transactions. Out of an estimated 12 billion transactions made annually, 10 million are fraudulent (that is, about one transaction in every 1,200 is fraudulent). To analyse and predict fraud events we have used the local outlier factor and isolation forest algorithms to calculate the number of fraudulent transactions. We have calculated the accuracy and the number of errors of both algorithms.

Keywords: Credit card, Isolation forest, Local outlier factor, Fraud detection, Data mining.

I. INTRODUCTION

In daily routine we use credit cards to buy goods and services, either through online transactions or with the physical card for offline transactions. In a credit card based purchase, the card holder hands the card to the merchant to make the payment, so a person has to steal the card to make a fraudulent transaction. If the user is not aware of the loss of the card, it leads to financial loss for the user as well as the credit card company. When the payment mode is online, attackers require only a little information, for example the card number, to make a false transaction. The only way to detect this kind of fraud is to analyse the spending patterns on every card and identify irregularities with respect to the normal pattern. Detecting fraud using the existing purchase data of the card holder is a way to reduce the rate of fraud. Every card holder is characterised by patterns containing information about the distinctive purchase category, the time since the last purchase, the money spent and other things. Deviation from such patterns is sensed as fraud.

Fraud in finance is an ever growing issue with far reaching consequences. Fraud can be defined as criminal cheating with the aim of financial gain. The emergence of the internet has led to an increase in online credit card transactions. As the credit card is the most prevalent payment method, attracting more discounts and offers in both stores and e-commerce, it is more vulnerable to fraud events. Credit card fraud detection is the science and art of detecting unusual activity in credit transactions. Fraud occurs when the credit card information of an individual is stolen and used to make unauthorized purchases and/or withdrawals from the original holder's account. A major challenge to credit fraud detection research is the limited availability of real world data, due to privacy and legal concerns.

Online shopping is one of the largest and fastest growing trends, and the mode of payment is credit card, debit card or net banking. Online payment does not require the physical card. If credit card details become known to others, that becomes a major risk. Currently, the card holder comes to know only after the fraudulent transaction is carried out. No mechanism exists to track fraudulent transactions, and providing one is what this project sets out to do. Using a dataset of nearly 28,500 credit card transactions and multiple unsupervised anomaly detection algorithms, we are going to identify transactions with a high probability of being credit card fraud. Furthermore, using metrics such as precision, recall, and F1-scores, we will investigate why the classification accuracy for these algorithms can be misleading. In addition, we will explore the use of data visualization techniques common in data science, such as parameter histograms and correlation matrices, to gain a better understanding of the underlying distribution of data in our data set.

II. LITERATURE SURVEY

In [2] the authors begin by explaining the method used for transactions through credit cards. They propose a system in which their algorithm is integrated with the payment gateway to detect fraud in real time. The authors used seven techniques to develop the algorithm: Neural Networks, Rule Induction, Case-based Reasoning, Genetic Algorithms, Inductive Logic Programming, Expert Systems and Regression. They determined that the ANN (artificial neural network) method would best serve this problem statement. The output of the neural network is a probability that indicates the degree to which a transaction is fraudulent. The neural network is trained on several categories of information about the card holder, such as profession, earnings and the large purchases placed. The system uses the back propagation learning algorithm in this phase to train the network. Depending on the numeric value of the probability, between 0 and 1, a transaction is classified into one of the following categories: Non-Fraudulent, Doubtful, Suspicious and Fraudulent. The system focuses on the merchant side of the industry and benefits the merchant by reducing the losses the merchant has to bear if a transaction is fraudulent. It is therefore limited by the availability of merchant-side transaction data, which is hard to obtain at scale.

forest to do anomaly detection.

4) The fraud transactions are given to an alarm, which alerts the user that a fraudulent transaction has occurred, so the user can block the card to prevent further financial loss to the user as well as the credit card company.

5) The valid transactions are treated as genuine transactions.
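
The four-way classification that [2] derives from the network's output probability can be illustrated with a short sketch. The survey names the four categories but does not state the probability boundaries, so the cut-offs 0.25, 0.50 and 0.75 below, as well as the names `THRESHOLDS`, `CATEGORIES` and `categorize`, are assumptions chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical cut-offs: the text names the four categories but not the
# probability boundaries, so these values are assumptions.
THRESHOLDS = (0.25, 0.50, 0.75)
CATEGORIES = ("Non-Fraudulent", "Doubtful", "Suspicious", "Fraudulent")


def categorize(probability: float) -> str:
    """Map a fraud probability in [0, 1] to one of the four categories."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    # bisect_right counts how many thresholds are <= probability,
    # which is the index of the matching category.
    return CATEGORIES[bisect_right(THRESHOLDS, probability)]


if __name__ == "__main__":
    for p in (0.05, 0.40, 0.60, 0.95):
        print(f"{p:.2f} -> {categorize(p)}")
```

In a real deployment the boundaries would presumably be tuned on labelled transactions rather than fixed by hand.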
parameters. We need to format our dataset. We take all the columns from the data frame and filter out the ones we do not want. We then store the variables we will be predicting on: X holds all columns except the class label, and Y is the target we want, i.e. a one-dimensional array holding the class label for each sample, as shown in Figure 8. Since this is unsupervised anomaly detection, we do not want the labels to be fed into our algorithms.
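
The split into X and Y described here can be sketched with pandas. The tiny data frame below stands in for the real transaction table, and the column names `V1`, `V2`, `Amount` and `Class` are assumptions; the text itself only speaks of a "class label" column.

```python
import pandas as pd

# Tiny stand-in for the transaction table. The column names are
# assumptions for illustration, not taken from the text.
data = pd.DataFrame(
    {
        "V1": [0.1, -1.3, 0.4, 2.2],
        "V2": [1.0, 0.3, -0.7, -2.5],
        "Amount": [12.5, 3.0, 49.9, 900.0],
        "Class": [0, 0, 0, 1],  # 1 = fraud, 0 = valid
    }
)

# Keep every column except the class label as a feature.
feature_columns = [c for c in data.columns if c != "Class"]

X = data[feature_columns]  # two-dimensional table of features
Y = data["Class"]          # one-dimensional series of class labels

print(X.shape)  # (4, 3)
print(Y.shape)  # (4,)
```

Y is kept aside only for evaluating the detectors afterwards; it is not passed to them during fitting.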
Earlier, support vector machines (SVM) were used for outlier detection, but they took more time on complex datasets. Isolation forest and local outlier factor are anomaly detection methods provided by the scikit-learn (sklearn) package. In the local outlier factor method, the anomaly score of each sample is called the Local Outlier Factor. It measures the local deviation of the density of a given sample with respect to its neighbors, so the anomaly score depends on how isolated the sample is with respect to its surrounding neighborhood. The isolation forest algorithm isolates observations by randomly selecting a feature and then randomly selecting a split value between the highest and lowest values of the selected feature. Since recursive partitioning can be represented by a tree structure, the number of splits required to isolate a sample is equal to the path length from the root to the terminating node. This path length is a measure of normality and serves as the decision function. Random partitioning produces noticeably shorter paths for anomalies, so samples for which a forest of random trees produces shorter path lengths are more likely to be anomalies. The predicted y values are -1 for outliers and 1 for inliers. This is very useful information, but we need to process it before we compare it to the class label: the class label is 1 for a fraud event and 0 for a valid case. We take all inliers and classify them as 0, i.e. it indicates valid

Figure 9. Method name, total number of errors, and precision, recall and F1 scores.

VI. CONCLUSION AND FUTURE WORK

We imported the CSV data set, preprocessed it, and explored and described the data. We plotted histograms to check for unusual parameters, and computed a correlation matrix to find which parameters are important for our class. The two algorithms used are
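
The detection and evaluation steps described in this section can be sketched as follows, using scikit-learn's `IsolationForest` and `LocalOutlierFactor`. This is not the authors' code: the data is synthetic, and the values `n_neighbors=20`, `random_state=1` and a contamination equal to the labelled fraud fraction are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import classification_report
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(1)

# Synthetic stand-in data (an assumption, not the paper's data set):
# 200 "valid" points near the origin and 4 "fraud" points far away.
X_valid = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_fraud = rng.uniform(low=6.0, high=8.0, size=(4, 2))
X = np.vstack([X_valid, X_fraud])
Y = np.array([0] * 200 + [1] * 4)  # 1 = fraud, 0 = valid

# Expected share of outliers, taken here from the known labels.
outlier_fraction = float(Y.sum()) / len(Y)

detectors = {
    "Isolation Forest": IsolationForest(
        contamination=outlier_fraction, random_state=1
    ),
    "Local Outlier Factor": LocalOutlierFactor(
        n_neighbors=20, contamination=outlier_fraction
    ),
}

results = {}
for name, detector in detectors.items():
    # fit_predict returns 1 for inliers and -1 for outliers.
    raw = detector.fit_predict(X)
    # Remap to the class-label convention: 0 = valid, 1 = fraud.
    y_pred = np.where(raw == 1, 0, 1)
    n_errors = int((y_pred != Y).sum())
    results[name] = (y_pred, n_errors)
    print(f"{name}: {n_errors} errors")
    print(classification_report(Y, y_pred, zero_division=0))
```

The labels in `Y` are used only to set the contamination and to score the predictions. Because valid cases dominate, accuracy alone stays high even when fraud is missed, so the report also prints precision, recall and F1 per class.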
REFERENCES
[1] Datasets. (n.d.). Retrieved from https://www.kaggle.com/datasets
[2] A. Srivastava, M. Yadav, S. Basu, S. Salunkhe and M. Shabad, "Credit
card fraud detection at merchant side using neural networks," 2016 3rd
International Conference on Computing for Sustainable Global
Development (INDIACom), New Delhi, 2016, pp. 667-670.
[3] W. Yu and N. Wang, "Research on Credit Card Fraud Detection Model
Based on Distance Sum," 2009 International Joint Conference on
Artificial Intelligence, Hainan Island, 2009, pp. 353-356.
doi: 10.1109/JCAI.2009.146
[4] Eduonix. (2018, July 26). Eduonix/creditcardML. Retrieved from
https://github.com/eduonix/creditcardML
[5] https://pythonprogramming.net/neural-networks-machine-learning-tutorial/