Volume 8, Issue 6, June – 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Credit Card Fraud Detection Using Machine Learning


Sarthak Aggarwal, Vibhuti Nautiyal, Garima Joshi, Nishit Galhotra
UG Students, SOC, DIT UNIVERSITY,
Dehradun, Uttarakhand, INDIA

Abstract: It is difficult for credit card firms to detect malicious activities such as fraudulent transactions, in which payments are made from users' accounts without their knowledge for items they did not purchase, causing them financial loss. As the world moves towards digitalization, the use of digital money has increased, and the fraud associated with it has risen in parallel. Several methods are applied to stop fraudulent activities, but fraudsters keep finding new ways to break security mechanisms and commit fraudulent transactions, causing billions in losses to banks and credit card users globally. Therefore, there is great demand for a technique that not only prevents fraudulent credit card transactions but also anticipates them accurately and efficiently before they happen. This paper explains and applies various techniques for detecting credit card fraud, analyses both the existing models and the proposed model, and compares these techniques on the basis of achieved accuracy, false alarm rate, and detection rate.

Keywords:- Random Forest, Logistic Regression, Decision Tree, SVM (Support Vector Machine), False Alarm Rate (FAR), Detection Rate.

I. INTRODUCTION

As the world moves towards digitalization, the use of digital money and Internet banking has become very common. Any eligible individual can easily get a credit card issued by their bank to make any kind of online transaction. A credit card is a thin plastic card that carries a unique credit card number, the cardholder’s name, a signature, a CVC code, and the card's validity information, all of which are required to make an online transaction. However, as the number of credit card users has grown, fraudulent credit card activity has grown in parallel. Today, banks, retail readers, ATMs, and online Internet banking systems all read information from credit cards. A card's security relies on both the physical security of the plastic card and the confidentiality of the credit card number, which is of the utmost significance. Credit card fraud essentially refers to any activity carried out with the intention of deceiving the card's owner as well as the issuing bank in order to gain personal information for other fraudulent activities. To prevent such transactions, we need a powerful detection system that combats such activities in their initial stages, before they succeed.

Fraud detection is a procedure that identifies and stops scammers from making money in dubious ways. It is a collection of actions performed to expose and thwart fraudsters' attempts to gain money or property fraudulently. Detecting credit card fraud with machine learning refers to building a model that produces the best outcomes in detecting and preventing fraudulent transactions.

Detecting fraud is quite challenging: many characteristics must be chosen and categorized, and the categorization of these parameters determines the effectiveness of any detection system. Furthermore, current models can only estimate the possibility that a transaction is fraudulent based on an analysis of user behaviours and activities. They attempt to identify patterns in the way users spend their money and evaluate whether a transaction is legitimate.

Credit card fraud usually happens through:

1. Clone Transactions: - As the name suggests, this refers to the duplication of a transaction. Copying all the information from an existing transaction is easy, and it is a popular technique for making transactions that resemble genuine ones.
2. Account Theft: - This often occurs when a person's private information, such as login credentials, the answer to a secret question, their birthdate, or any other confidential information, is taken by the culprit, who can then use it to carry out money transactions.
3. False Application Fraud: - Account theft, discussed above, is generally coupled with application fraud. It refers to a fake account, that is, one that has been applied for using another individual's name and identity.
4. Credit Card Skimming (electronic or manual): - Skimming a credit card refers to producing an unauthorized copy of a credit card using a skimmer, a device that reads and copies information from the original card. Using skimmers, fraudsters may copy card numbers and other account information, store it, and then sell it to other criminals. Skimming can be carried out both manually and electronically.
5. Account Takeover: - This is one of the most widely used fraud techniques. Fraudsters make deceptive calls and send deceptive emails to cardholders; the messages feel genuine, as if they were sent by the bank or another official body, and are used to steal a person's credentials, bank account numbers, and other confidential data, CVC code or

IJISRT23JUN819 www.ijisrt.com 321


birthdate in order to make a financial profit.

II. DIFFERENT TECHNIQUES FOR DETECTING CREDIT CARD FRAUD

Since almost all fraudulent transactions follow a related pattern, we can classify transactions as fraudulent or legitimate using any pattern recognition algorithm, such as SVM (“Support Vector Machine”), LR (“Logistic Regression”), ANN (“Artificial Neural Network”), “Naive Bayesian Network”, KNN (“K-Nearest Neighbor”), “Random Forest”, “Hidden Markov Models”, “Fuzzy-Logic-Based Systems”, and “Decision Trees”. In our proposed model we have used the following techniques:

1. Support Vector Machine: - This is one of the well-known statistical learning approaches and has proven highly successful in a range of classification tasks; it may also be used for regression problems. It is a supervised learning algorithm in which the dataset is split into distinct classes using a hyperplane whose dimension depends on the number of features. The data points closest to the hyperplane in each class, known as support vectors, determine its position.
2. Logistic Regression: - This is another frequently employed strategy when the dependent variable is categorical. Logistic regression is often used for classification: as a transaction is processed, it looks at the values of the transaction's characteristics to determine whether or not the transaction should be committed. Logistic regression is a supervised classification process that explains the relationship between a categorical outcome and predictors that may be continuous, binary, or categorical.
3. Decision Tree: - This is among the effective computational tools used for classification and prediction. It builds a tree-like structure in which each internal node denotes a test on an attribute, each branch denotes a result of that test, and each leaf node holds a class label. It divides the dataset recursively, in depth-first or breadth-first order, and stops when every element has been assigned a class.
4. Random Forest: - This is also a supervised learning algorithm, one that creates multiple decision trees and merges them into one forest. The main goal is to rely on a collection of decision models rather than a single model in order to improve accuracy. The major difference between the “Decision Tree” and the “Random Forest” is that the decision tree produces a single model using the whole dataset, whereas the random forest builds several models using randomly chosen attributes from the dataset. This is the main justification for using the random forest model instead of the decision tree model.

III. LITERATURE REVIEW

R. M. Jamail Esmaily [1] [2015] presented an anomaly detection approach based on an ANN (“Artificial Neural Network”) and a “Decision Tree”. The approach is divided into two stages: a decision tree is first used to create a new dataset, and a multilayer neural network is then used to categorize the data. This two-level method produces very few false detections.

Yashvi Jain, Shripriya Dubey, Namrata Tiwari, and Sarika Jain [2] [2019] thoroughly examined several machine learning techniques, including ANN. They found that “Artificial Neural Networks” provide more exact results than the “Decision Tree”, “Logistic Regression”, “Support Vector Machine” and “K-Nearest Neighbor” techniques.

M. Ramya, K. Anandh Raja and S. Ajith Kumar [11] [2020] reviewed multiple commonly used fraud detection methods and concluded that, by using an API module and predictive analytics, the user could be notified in real time.

S P Maniraj, Shadab Ahmed, Aditya Saini, and Swarna Deep Sarkar [4] [2019] proposed a technique for detecting fraudulent transactions using various anomaly detection algorithms.

E. Duman and Y. Sahin [5] [2011] claimed that the decision tree strategy outperforms the SVM approach on this problem.

N. Malini and M. Pushpa [6] [2017] employed KNN and outlier detection to improve outcomes in fraud detection scenarios. The main objective was to reduce false alarms and raise the rate of fraud detection.

A.S.Malini, J.M Shajitha Banu, and M.I Sharmila Fathima [7] [2022] used Isolation Forests along with the Area Under the Precision-Recall Curve and observed an accuracy of 98.72%.

Kartik Madkaikar, Preity Parab, Manthan Nagvekar, Riya Raikar, and Supriya Patil [8] [2021] compared the implementations of multiple classification techniques and observed Gradient Boosting to be the best, with an accuracy of 95.90%.

Anuruddha Thennakoon, Shalitha Mihiranga, Chee Bhagyani, Sasitha Premadasa and Nuwan Kuruwitaarachchi [9] [2019] addressed four main types of fraud in real-world transactions using a series of ML models, where the highest accuracy observed was 91%, using SVM.

Naresh Kumar Trivedi, Umesh Kumar Lilhore, Sarita Simaiya, and Sanjeev Kumar Sharma [10] [2020] tested multiple supervised learning algorithms and observed that Random Forest gave the maximum accuracy of 94.99%.

IV. METHODOLOGY

Dataset Description
The dataset used for our analysis is publicly available as a CSV file, obtained from Kaggle. The dataset has 284807 rows and 31 columns. It is a collection of 284807 real-world transactions, of which 492 are fraudulent. The dataset consists of numerical variables produced by a Principal Component Analysis transformation, held in columns V1 to V28. These variables provide information on the different features of a user’s credit card transactions. The only untransformed features in the dataset are ‘Time’, ‘Amount’ and ‘Class’. ‘Class’ takes the values 0 and 1, where 0 stands for a valid credit card transaction and 1 stands for a fraudulent credit card transaction.
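
The class counts quoted in the dataset description imply a strong imbalance, which is worth quantifying before reading any accuracy figure. The short sketch below uses only the two numbers stated above; the "all-legitimate baseline" is a hypothetical reference point introduced here for illustration and is not one of the models in this paper.

```python
# Counts taken from the dataset description.
TOTAL_TRANSACTIONS = 284_807
FRAUDULENT = 492

legitimate = TOTAL_TRANSACTIONS - FRAUDULENT
fraud_share = FRAUDULENT / TOTAL_TRANSACTIONS

# Hypothetical baseline: label every transaction as legitimate (Class 0).
# Its accuracy equals the share of legitimate transactions.
baseline_accuracy = legitimate / TOTAL_TRANSACTIONS

print(f"legitimate transactions: {legitimate}")
print(f"fraud share: {fraud_share:.4%}")
print(f"all-legitimate baseline accuracy: {baseline_accuracy:.4%}")
```

Fraudulent transactions make up roughly 0.17% of the data, so a classifier can score a very high accuracy while missing most fraud. This is one reason the evaluation also reports detection rate and false alarm rate.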

Fig 1 Flow Diagram

The dataset is obtained from Kaggle. We then analyse the dataset and perform the relevant pre-processing to make it suitable for our machine learning models, removing the unwanted feature ‘Time’. After cleaning the dataset, we split it into training and testing data. We then train and test different models on it, including SVM, Logistic Regression, Decision Tree and Random Forest. We also obtain the confusion matrix for each model. Finally, we analyse and compare the accuracies of all these models.
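
The splitting step can be sketched as follows. The paper does not state the split ratio or whether the split was stratified, so the 80/20 ratio, the stratification by class, the fixed seed, and the function name `stratified_split` are all assumptions made for illustration. Stratifying is a common choice for heavily imbalanced data because it keeps some fraudulent cases in both parts.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# Toy labels: 0 = legitimate, 1 = fraudulent (the real dataset is far more imbalanced).
labels = [0] * 95 + [1] * 5
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), "training rows,", len(test_idx), "testing rows")
print(sum(labels[i] for i in test_idx), "fraud case(s) in the testing part")
```

The returned index lists can then be used to select the matching feature rows for training and testing each model.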

Fig 2 Relation between Features and Importance using Random Forest Model.

Fig 3 Relation between Features and Importance using Decision Tree
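
The paper does not say how the feature importances in Figs 2 and 3 were computed. A common approach for tree models is impurity-based importance, where a feature is credited with the drop in impurity produced by the splits that test it. The sketch below assumes Gini impurity and a single threshold split; the toy feature names and values are hypothetical and are not drawn from the dataset.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p_fraud = sum(labels) / len(labels)
    return 1.0 - p_fraud ** 2 - (1.0 - p_fraud) ** 2

def impurity_decrease(values, labels, threshold):
    """Weighted drop in Gini impurity when splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children

# Toy data: one feature that separates the classes cleanly and one that does not.
labels = [0, 0, 0, 0, 1, 1]
informative = [0.1, 0.2, 0.3, 0.4, 2.5, 3.0]
noisy = [0.1, 2.5, 0.3, 3.0, 0.2, 0.4]

print("informative feature:", impurity_decrease(informative, labels, threshold=1.0))
print("noisy feature:", impurity_decrease(noisy, labels, threshold=1.0))
```

Under this assumption, a feature that yields larger impurity decreases across the tree (or, for a random forest, averaged across its trees) receives a higher importance.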

Comparative Analysis
We compute the “true positive”, “false positive”, “true negative”, and “false negative” values produced by a system, together known as the confusion matrix, and use them as quantitative measures to assess and compare the performance of the various models.

The number of transactions in the dataset that were both fraudulent and classified as such is known as the "true positive" (TP). The number of genuine transactions in the dataset that the system incorrectly identified as fraudulent is known as the "false positive" (FP), while the number of fraudulent transactions in the dataset that the system incorrectly identified as legitimate is known as the "false negative" (FN). The number of transactions in the dataset that were both legitimate and correctly categorized as genuine is known as the "true negative" (TN). The metrics employed to evaluate our model are:
1. Accuracy is the proportion of transactions that the model classifies correctly. It is among the most popular and widely applied assessment measures. The equation of accuracy (ACC) is given as: -
ACC = (TN + TP) / (TP + FP + FN + TN)
2. Detection Rate, also known as Precision, is the proportion of transactions classified as fraudulent that were actually fraudulent. The equation of Detection Rate (DR) is given as: -
DR = TP / (TP + FP)
3. False Alarm Rate measures how many legitimate transactions were wrongly classified as fraudulent out of all legitimate transactions. The equation of False Alarm Rate (FAR) is given as: -
FAR = FP / (FP + TN)

The performance of every machine learning model on a set of test data is summarized in a matrix called a “confusion matrix”. It is frequently used to assess the effectiveness of classification algorithms, which try to predict a category label for each input instance.

Fig. 4 shows the confusion matrices obtained after applying Logistic Regression (a), Random Forest (b), Decision Tree (c) and SVM (d), giving the values of “True Positives” (TP), “False Positives” (FP), “True Negatives” (TN), and “False Negatives” (FN) generated by these techniques on the test data.
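
The three metrics defined in this section can be computed directly from the four confusion-matrix counts. The sketch below follows the equations for ACC, DR and FAR as given; the function name `evaluate` and the example counts are hypothetical and are not taken from Fig. 4.

```python
def evaluate(tp, fp, tn, fn):
    """Accuracy, detection rate (precision) and false alarm rate from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "ACC": (tn + tp) / total,
        # Guard against division by zero when nothing is flagged as fraudulent.
        "DR": tp / (tp + fp) if (tp + fp) else 0.0,
        # Guard against division by zero when there are no legitimate transactions.
        "FAR": fp / (fp + tn) if (fp + tn) else 0.0,
    }

# Hypothetical counts for illustration only.
metrics = evaluate(tp=80, fp=20, tn=9880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because legitimate transactions dominate the denominator of ACC, accuracy stays high in this example even though one in five flagged transactions is a false alarm, which is why DR and FAR are reported alongside it.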

Fig. 4 Confusion Matrix

After obtaining the confusion matrices, we compare the model accuracies. Fig. 5 shows the training and testing scores for the different methods that we have implemented in our model.

Fig. 5 Model Accuracy

Table 1 Comparative Analysis
Techniques                      Accuracy (%)   Detection Rate (Precision) (%)   False Alarm Rate (FAR)
Support Vector Machine (SVM)    99.81          99.97                            0.158
Random Forest                   99.93          99.96                            0.047
Decision Tree                   99.94          99.99                            0.042
Logistic Regression             99.92          99.98                            0.063

Table 1 compares the accuracy, detection rate (also known as precision) and false alarm rate values that we achieved using the different techniques. The decision tree achieved the highest accuracy (99.94 %), compared with SVM (99.81 %), random forest (99.93 %), and logistic regression (99.92 %). Similarly, the decision tree achieved the highest detection rate (99.99 %) and the lowest false alarm rate (0.042) among the models.

V. CONCLUSION AND FUTURE SCOPE

In this project, we have implemented different techniques in order to get the best accuracy for fraud detection. As Table 1 shows, where we have compared the techniques on several measures, Decision Tree has the best accuracy amongst the implemented techniques. It also has a low false alarm rate and a high precision rate.

Although different techniques exist for detecting card fraud, none can detect it completely. The models generally detect fraud after it has been committed, and each technique works best in a particular environment; e.g., Decision Tree works best on already processed and sampled data, whereas logistic regression gives the best results on raw and unsampled data. Hence the best solution for this problem is to use hybrid techniques in order to offset the environment constraints and obtain better performance.

ACKNOWLEDGEMENT

We would like to extend our deepest appreciation to Mr. Ashok Kumar (Assistant Professor, SOC, DIT UNIVERSITY), who has been our guide and mentor and who made this project possible. The advice and guidance he provided enabled us to get through each phase of preparing this paper. We would also like to add our sincere thanks to DIT University, the institution that gave us both the opportunity and the resources we needed to finish this research project.

REFERENCES

[1]. R. M. Jamail Esmaily, “Intrusion detection system based on Multilayer perceptron neural networks and decision tree,” in International Conference on Information and Knowledge Technology, 2015.
[2]. Y. Jain, N. Tiwari, S. Dubey, Sarika Jain, “A comparative analysis of various credit card fraud detection techniques,” International Journal of Recent Technology and Engineering, vol. 7, pp. 402-407, 2019.
[3]. Siddhartha Bhattacharyya, Sanjeev Jha, Kurian Tharakunnel, J. Christopher Westland, “Data Mining for credit card fraud: A comparative study,” Elsevier, vol. 50, no. 3, pp. 602–613, 2011.
[4]. S P Maniraj, Aditya Saini, Shadab Ahmed, Swarna Deep Sarkar, “Credit Card Fraud Detection using Machine Learning and Data Science,” International Journal of Engineering Research & Technology (IJERT), Volume 08, Issue 09, September 2019.
[5]. Y. Sahin and E. Duman, “Detecting Credit Card Fraud by Decision Trees and Support Vector Machines,” Int. Multi-conference Eng. Computer Science, vol. I, pp. 442–447, 2011.
[6]. N. Malini and M. Pushpa, “Analysis on credit card fraud identification techniques based on KNN and outlier detection,” 2017 Third International Conference on Advances in Electrical, Electronics, Information, Communication and Bio-Informatics (AEEICB), Chennai, 2017, pp. 255-258.
[7]. A.S.Malini, J.M Shajitha Banu, M.I Sharmila Fathima, “Credit Card Fraud Detection Using Machine Learning,” IJIRT, Volume 9, Issue 1, ISSN: 2349-6002, June 2022.
[8]. Kartik Madkaikar, Manthan Nagvekar, Preity Parab, Riya Raikar, Supriya Patil, “Credit Card Fraud Detection System,” International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, Volume-10, Issue-2, July 2021.
[9]. Anuruddha Thennakoon, Chee Bhagyani, Sasitha Premadasa, Shalitha Mihiranga, Nuwan Kuruwitaarachchi, “Real-time Credit Card Fraud Detection Using Machine Learning,” 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence).
[10]. Naresh Kumar Trivedi, Sarita Simaiya, Umesh Kumar Lilhore, Sanjeev Kumar Sharma, “An Efficient Credit Card Fraud Detection Model Based on Machine Learning Methods,” International Journal of Science and Technology, Vol. 29, No. 5, (2020), pp. 3414–3424.
