A Comparative Analysis of Credit Card Fraud Detection Using Machine Learning Techniques
ABSTRACT
Financial fraud is an ever-growing menace with far-reaching consequences in the financial industry.
Data mining has played an important role in detecting credit card fraud in online
transactions. Credit card fraud detection, which is a data mining problem, is challenging
for two major reasons: first, the profiles of normal and fraudulent behaviour change
constantly, and second, credit card fraud data sets are highly skewed. The performance of fraud
detection in credit card transactions is greatly affected by the sampling approach applied to the
dataset, the selection of variables, and the detection technique(s) used. This work investigates the
performance of different classification algorithms on highly skewed credit card fraud data. The
dataset of credit card transactions is sourced from European cardholders and contains 284,807
transactions. A hybrid technique of under-sampling and over-sampling is applied to the skewed
data. Three techniques (logistic regression, k-means clustering, and a neural network) are applied
to the raw and pre-processed data. The work is implemented in Python.
The performance of the techniques is evaluated based on accuracy, sensitivity, specificity, and
precision.
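The hybrid sampling step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the procedure used in this work: it assumes plain random under-sampling of the legitimate class and random over-sampling (with replacement) of the fraud class. The function name hybrid_resample, the target sizes, and the toy data are all hypothetical.

```python
import numpy as np

def hybrid_resample(X, y, n_majority, n_minority, seed=0):
    """Random under-sampling of class 0 plus random over-sampling of class 1.

    Assumption: labels are 0 (legitimate) and 1 (fraud). The target sizes
    n_majority and n_minority are illustrative choices, not values from the text.
    """
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == 0)
    min_idx = np.flatnonzero(y == 1)
    # Under-sample: keep a random subset of the majority class, without replacement.
    maj_keep = rng.choice(maj_idx, size=min(n_majority, maj_idx.size), replace=False)
    # Over-sample: draw minority rows with replacement, so rows may repeat.
    min_keep = rng.choice(min_idx, size=n_minority, replace=True)
    idx = np.concatenate([maj_keep, min_keep])
    rng.shuffle(idx)  # avoid a block of one class followed by the other
    return X[idx], y[idx]

# Toy data standing in for the real transactions (hypothetical values).
toy_rng = np.random.default_rng(42)
X_toy = toy_rng.normal(size=(10_000, 3))
y_toy = np.zeros(10_000, dtype=int)
y_toy[:20] = 1  # 0.2% positives, roughly as skewed as the real data

X_res, y_res = hybrid_resample(X_toy, y_toy, n_majority=500, n_minority=200)
print(X_res.shape, int(y_res.sum()))
```

Resampling of this kind would normally be applied to the training split only, so that the evaluation data keeps the original class ratio.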
INTRODUCTION
Financial fraud is an ever-growing menace with far-reaching consequences for the finance
industry, corporate organizations, and government. Fraud can be defined as criminal deception
with the intent of acquiring financial gain. High dependence on internet technology has led to
an increase in credit card transactions. As credit cards become the most prevalent mode of
payment for both online and offline transactions, the credit card fraud rate also accelerates.
Credit card fraud is either inner card fraud or external card fraud. Inner card fraud occurs as a
result of collusion between cardholders and the bank, using a false identity to commit fraud, while
external card fraud involves the use of a stolen credit card to obtain cash through dubious means.
Much research has been devoted to the detection of external card fraud, which accounts for the
majority of credit card fraud. Detecting fraudulent transactions using traditional manual methods
is time-consuming and inefficient, and the advent of big data has made manual methods even more
impractical. Financial institutions have therefore turned their attention to recent computational
methodologies to handle the credit card fraud problem. Data mining is one notable method
used for credit card fraud detection. Credit card fraud detection is the process of
classifying transactions into two classes: legitimate (genuine) and fraudulent. It is based on
analysis of a card's spending behaviour.
PROBLEM STATEMENT
Our goal is to implement three different machine learning models to classify, as accurately as
possible, credit card fraud in a dataset gathered in Europe over two days
in September 2013. After initial data exploration, we decided to implement a logistic
regression model, a k-means clustering model, and a neural network. One challenge we
observed from the start was the huge imbalance in the dataset: frauds account for only 0.172% of
all transactions. In this setting, false negatives are much worse than false positives
because a false negative means that someone gets away with credit card fraud. A false
positive, on the other hand, merely causes a complication and possible hassle when a cardholder
must verify that they (and not a thief) did, in fact, complete the transaction.
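To make the imbalance concrete, the sketch below computes accuracy, sensitivity, specificity, and precision from confusion-matrix counts. It then applies them to a trivial baseline that labels every transaction as legitimate. The helper name fraud_metrics is hypothetical, and the fraud count is only an approximation derived from the 0.172% rate and the 284,807 total quoted in this document.

```python
def fraud_metrics(tp, fp, tn, fn):
    """Return the four evaluation metrics from confusion-matrix counts.

    tp/fn count fraudulent transactions (caught / missed);
    tn/fp count legitimate transactions (passed / wrongly flagged).
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity (recall): share of frauds that are caught.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: share of legitimate transactions left alone.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Precision: share of flagged transactions that are really fraud.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }

n_total = 284_807
n_fraud = round(0.00172 * n_total)  # approximate count implied by the 0.172% rate

# Baseline that predicts "legitimate" for everything: no frauds caught,
# no false alarms raised.
baseline = fraud_metrics(tp=0, fp=0, tn=n_total - n_fraud, fn=n_fraud)
for name, value in baseline.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the baseline's accuracy exceeds 99.8% while its sensitivity is zero. This is why accuracy alone is a poor guide here and why false negatives deserve separate attention.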
SYSTEM ARCHITECTURE
HARDWARE REQUIREMENTS
• System : Pentium IV 2.4 GHz.
• Hard Disk : 500 GB.
• RAM : 4 GB
• Any desktop / Laptop system with above configuration or higher level
SOFTWARE REQUIREMENTS
• Operating system : Windows XP / 7
• Coding Language : Python 3.x
• Scripting tool : Jupyter Notebook
• Libraries : pandas, NumPy, scikit-learn, Matplotlib, Keras.