Fraud Detection in Cashless Transactions using Machine Learning
Shruti N Joshi
Mechanical Engineering, PES University,
Bangalore, Karnataka, India

Aadeesh Jadishkumar
Mechanical Engineering, PES University,
Bangalore, Karnataka, India

Anush D
Mechanical Engineering, PES University,
Bangalore, Karnataka, India

S. L. Shabareesh
Mechanical Engineering, PES University,
Bangalore, Karnataka, India

Akshay Raj
Mechanical Engineering, PES University,
Bangalore, Karnataka, India

Abstract—Due to the increasing number of customers and companies that use credit cards to complete transactions, the number of possible fraud cases has also increased dramatically. Dealing with noisy and imbalanced data, as well as with outliers, has accentuated this problem. In this work, fraud detection using artificial intelligence is proposed. The proposed system uses logistic regression to build the classifier to prevent fraud in credit card transactions. To handle unclean data and to ensure higher detection accuracy, a pre-processing step is used. The pre-processing step uses the following main methods to clean the data: linear regression, logistic regression and the clustering-based method.

Keywords—linear regression; logistic regression; cluster; artificial intelligence

I. INTRODUCTION
The aim of online fraud is to obtain personal or financial gain through deception. Based on this, the two main methods for avoiding loss to fraud are to detect fraud and to prevent it beforehand. Fraud prevention is an active technique for avoiding the occurrence of a fraudulent act, and fraud detection is the technique of detecting a fraudulent act by a fraudster.
Currently we are introduced to a number of cashless transaction methods such as credit card, debit card and UPI, which are widely popular and convenient methods of payment. Indeed, advances in digital technologies have set a different path for how we handle money, especially in how we have moved from physical transactions to mostly digital ones using electronic means. Online payment fraud is the fraudulent use of someone else’s card details to perform a transaction, either to buy a product or to purchase a service. During a physical transaction the cardholder is present, hence the chance of fraud is minimal. But in online payment the buyer has to either enter his card details over the internet or use his UPI ID for the transaction, which could be risky.
With the rise in online payment usage the number of fraudulent activities has also increased, because of which we now have many verification methods such as one-time passwords, authentication numbers, email verification and many others. Yet the number of fraudulent cases has not reduced significantly, as shown in Figure 1. The funds from card fraud are usually used for funding criminal activities, which is very hard to prevent. These fraudsters mainly hide on the internet, as they are able to conceal their identity and location. Recent studies show that these online fraudulent activities have directly hit the financial sector.

Figure 1: Bar graph of the number of online fraud cases and the amount in lakhs during the 2016 financial year.
Source: Ultra News

Losses due to online fraud mainly impact merchants because they bear most of the expenses. All the losses are borne by the merchants, leading to increases in the prices of



goods and decreases in discounts. Hence, reducing this loss is highly important. An effective fraud detection system is required to minimize the number of cases of fraud.

A. Problem Statement
According to a study, the number of online transactions is in lakhs per second worldwide. Along with such a high transaction volume, the number of frauds, which have card transactions as their primary target, has also increased exponentially, as shown in Table 1.

Table 1: Frauds in Indian Banking Industry for financial years 2013-2018.
Source: RBI Financial Stability Report 2018

Since the early days of cashless payment, card companies have been fighting against fraud. Every year, billions of dollars are lost directly because of online fraud. Fraud cases occur under different conditions, such as at the point of transaction, during online transactions, or in transactions made with a stolen card.
The increasing capabilities of attackers or hackers have accentuated the problem, since these people can exploit security gaps to obtain sensitive information about users or their credit information to perform malicious activities, such as fraud. To define this problem accurately, Fig. 2 shows a pictorial representation of performing credit card fraud.

Fig 2: Pictorial representation of performing online fraud.
Source: ResearchGate 2020

As shown in Figure 2, an attacker can perform fraudulent activities at many points of the online process. To solve this problem, we need a highly accurate fraud detection system. Artificial intelligence is defined as the research field that aims at performing machine learning to obtain an intelligent machine that can perform tasks on behalf of the user. This can be done through two main steps: training and testing. AI is employed to build systems for fraud detection, such as classification-based systems, clustering-based systems, neural network-based systems, and support vector machine-based systems. Although AI-based systems can perform well, they suffer from some critical issues such as “unbalanced data” or “noisy data”, which might result in poor detection accuracy.

B. Research Questions
On the basis of the empirical evidence, the following research questions are developed to guide this study and meet its objectives.
• How can a fraud detection system be built using AI that can deal with imbalanced data effectively?
• How can we smooth (or clean) the data before using it for training the machine to ensure high detection accuracy?
• How can the system detect fraud by adapting to the behaviour of the user?

C. Solutions
The solutions to the questions in Section B can be summarized as follows:
• An AI-based system for fraud detection is proposed. The system uses logistic regression to build a classifier called the class classifier. The class classifier has the ability to deal with imbalanced data and adapt to the behavior of the user by employing the cross-validation technique.
• To ensure high detection accuracy, two main methods are used to clean the data. The mean-based method deals with missing values, and the clustering-based method deals with outliers.
• Extensive experiments are conducted to train and test the proposed classifier using a standard database.

II. LITERATURE SURVEY

A. Techniques for Fraud detection:
a. Supervised Classification:
• Needs examples of past fraudulent and legitimate activities
• Highly effective at detecting known fraud types
• Futile in detecting novel types
b. Unsupervised Classification:
• Needs examples of past legitimate activity only
• Highly effective at detecting novel fraud types
• Effective at known types that deviate from the “norm”
• The “norm” can be based on a customer compared with self or other customers at previous times.
In contrast to supervised methods, unsupervised methods simply seek those accounts, customers, etc. whose behavior is unusual. Unsupervised methods are useful in applications where there is no prior knowledge as to the particular class of observations in a data set.
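The idea of comparing a customer with their own past behaviour can be illustrated with a very small sketch. It is not the method of any surveyed system: the function name, the threshold of three standard deviations and the sample amounts are all assumptions chosen for illustration, and only legitimate past activity is needed as input.

```python
from statistics import mean, stdev


def unusual_transactions(history, new_amounts, threshold=3.0):
    """Return the new amounts that lie more than `threshold` sample
    standard deviations away from the mean of the customer's own
    past amounts (the customer's "norm")."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation, needs >= 2 values
    if sigma == 0:
        # All past amounts are identical: any different amount is unusual.
        return [a for a in new_amounts if a != mu]
    return [a for a in new_amounts if abs(a - mu) / sigma > threshold]


# Hypothetical past amounts for one customer (illustrative values only).
past = [120.0, 95.0, 110.0, 130.0, 105.0, 98.0, 115.0, 125.0]
flagged = unusual_transactions(past, [118.0, 900.0, 101.0])
print(flagged)  # [900.0]
```

No fraud labels are used here, which is why such a score can react to novel fraud types; the price is that any unusual but legitimate purchase is flagged as well.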
c. Problems in Online transaction fraud detection:
• Huge transaction data sets:
• Most variables will be irrelevant
• Most cases are not fraud: the classic data mining needle-in-a-haystack problem, only 0.1% of transactions are fraudulent
• Delay in learning class labels
• Mislabeled classes
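The effect of the 0.1% fraud rate can be made concrete with a short calculation. The total of 100,000 transactions is an assumed figure used only to make the arithmetic readable; the "classifier" below is a deliberately trivial one that labels every transaction as legitimate.

```python
# Assumed illustration: 100,000 transactions, 0.1% of them fraudulent.
total = 100_000
fraud = total // 1000          # 0.1% -> 100 fraudulent transactions
legit = total - fraud          # 99,900 legitimate transactions

# A trivial "classifier" that labels every transaction as legitimate.
true_positives = 0             # no fraud case is ever flagged
true_negatives = legit         # every legitimate case is labelled correctly

accuracy = (true_positives + true_negatives) / total
sensitivity = true_positives / fraud   # share of fraud cases detected

print(f"accuracy = {accuracy:.3f}, sensitivity = {sensitivity:.3f}")
# prints: accuracy = 0.999, sensitivity = 0.000
```

A system that never detects fraud still reaches 99.9% accuracy on such data, which is why accuracy alone is a weak measure for imbalanced fraud data sets.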
III. PROPOSED APPROACH
A. Selecting Dataset
B. Data Cleaning
The objective of data preprocessing is to prepare the debit card transaction data so that it can be analyzed quantitatively. The data processing involves the following:
1). Aggregate data computation: the frequency and accumulated amount of transactions for each debit card sample, and the category of the tendency of executing transactions, over the last 3 months.
2). Quantization of non-numerical data: converting non-numerical data into numeric form. For example, the binary values “yes” and “no” are converted to “1” and “0” respectively.
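The two preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the record layout, the field names and the sample values are hypothetical, and "the last 3 months" is approximated by a 90-day window.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical raw records: (card_id, transaction_date, amount, is_online).
# Field names and values are illustrative, not taken from the paper's data.
transactions = [
    ("card_A", date(2020, 1, 5), 250.0, "yes"),
    ("card_A", date(2020, 2, 17), 100.0, "no"),
    ("card_A", date(2019, 9, 1), 999.0, "no"),   # older than the window
    ("card_B", date(2020, 3, 2), 40.0, "yes"),
]


def aggregate(records, as_of, window_days=90):
    """Per-card transaction count and accumulated amount inside the window.

    The window length of 90 days is an assumption standing in for
    "the last 3 months".
    """
    start = as_of - timedelta(days=window_days)
    summary = defaultdict(lambda: {"frequency": 0, "total_amount": 0.0})
    for card_id, when, amount, _ in records:
        if start <= when <= as_of:
            summary[card_id]["frequency"] += 1
            summary[card_id]["total_amount"] += amount
    return dict(summary)


def quantize(value):
    """Map the binary strings "yes"/"no" to the integers 1/0."""
    mapping = {"yes": 1, "no": 0}
    return mapping[value.strip().lower()]


features = aggregate(transactions, as_of=date(2020, 3, 31))
flags = [quantize(record[3]) for record in transactions]
print(features)
print(flags)  # [1, 0, 0, 1]
```

With the sample records, card_A contributes two transactions totalling 350.0 inside the window and card_B one transaction of 40.0; the 2019 record falls outside the window and is ignored.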
C. Database Division
D. Building the Classifier
E. Testing the classifier
F. Evaluating the classifier
G. Examining the value of accuracy
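The paper evaluates the classifier on accuracy, sensitivity and error rate. A minimal sketch of how these three values could be computed from true and predicted labels is given below; the function name and the ten sample labels are assumptions for illustration, with 1 standing for a fraudulent and 0 for a legitimate transaction.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, sensitivity and error rate for binary labels
    (1 = fraudulent, 0 = legitimate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity (recall): share of actual fraud cases detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "error_rate": (fp + fn) / total,
    }


# Hypothetical labels for ten transactions (illustrative values only).
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

For these labels two of the three fraud cases are caught, one legitimate transaction is wrongly flagged and one fraud is missed, giving an accuracy of 0.8, a sensitivity of 2/3 and an error rate of 0.2.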
IV. CONCLUSION
The detection of credit card fraud is a vital research field. This is because of the increasing number of fraud cases in financial institutions. This issue opens the door for employing artificial intelligence to build systems that can detect fraud. Building an AI-based system to detect fraud requires a database to train the system (or classifier). The data in reality are dirty and have missing values, noisy data, and outliers. Such issues negatively affect the accuracy rate of the system. To overcome these problems, a logistic regression-based classifier is proposed. The data are first cleaned using two methods: the mean-based method and clustering-based method. Second, the classifier is trained based on the cross-validation technique (folds=10), which ensures that the whole database is used as both the training data set and testing data set. Finally, the proposed classifier is evaluated based on the accuracy, sensitivity, and error rate metrics. The proposed logistic regression-based classifier is compared to well-known classifiers, which are the k-nearest neighbours classifier and the voting classifier. The logistic regression-based classifier generates the best results.
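The training procedure summarized above (mean-based filling of missing values followed by 10-fold cross-validation of a logistic regression classifier) can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the synthetic two-feature data, the learning rate, the number of epochs and all function names are assumptions, the folds are not stratified, and the clustering-based outlier step is omitted. In practice a library such as scikit-learn would normally be used.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def column_means(rows):
    """Mean of each feature, ignoring missing values (None)."""
    means = []
    for j in range(len(rows[0])):
        present = [r[j] for r in rows if r[j] is not None]
        means.append(sum(present) / len(present))
    return means


def impute(rows, means):
    """Mean-based cleaning: replace each missing value by its column mean."""
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def train(rows, labels, epochs=200, lr=0.1):
    """Fit logistic regression weights by batch gradient descent."""
    n = len(rows[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


def predict(w, b, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


def cross_validate(rows, labels, folds=10, seed=0):
    """Each sample is used for testing exactly once and for training
    in the other folds. Returns one accuracy value per fold."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    scores = []
    for k in range(folds):
        test_idx = idx[k::folds]
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        # Column means come from the training fold only, to avoid leakage.
        means = column_means([rows[i] for i in train_idx])
        x_train = impute([rows[i] for i in train_idx], means)
        x_test = impute([rows[i] for i in test_idx], means)
        w, b = train(x_train, [labels[i] for i in train_idx])
        hits = sum(predict(w, b, x) == labels[i] for x, i in zip(x_test, test_idx))
        scores.append(hits / len(test_idx))
    return scores


# Synthetic, imbalanced data (10% positives) with a few missing values.
rng = random.Random(42)
rows, labels = [], []
for i in range(200):
    y = 1 if i % 10 == 0 else 0
    centre = 2.0 if y else -2.0
    x = [rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)]
    if rng.random() < 0.05:
        x[0] = None  # simulate a missing value
    rows.append(x)
    labels.append(y)

scores = cross_validate(rows, labels)
print(len(scores), round(sum(scores) / len(scores), 3))
```

The per-fold scores here are plain accuracy; on strongly imbalanced data the sensitivity of each fold would be the more informative figure to report alongside it.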

