
© 2024 JETIR April 2024, Volume 11, Issue 4 www.jetir.org (ISSN-2349-5162)

Implementation Paper on UPI Fraud Detection using Machine Learning

Miss. Sayalee S. Bodade¹, Prof. P. P. Pawade²

¹PG Scholar, ²Professor

¹,²Computer Science & Engineering,
P. R. Pote (Patil) College of Engineering & Management,
Amravati, Maharashtra, INDIA

Abstract:-
The objectives of the UPI fraud detection system are to enhance the security and reliability of digital payment transactions and, ultimately, to safeguard users from fraudulent activities. First, the paper aims to employ machine learning algorithms and data analytics to analyze transaction patterns and detect anomalies that may indicate potential fraud. Second, it seeks to develop a robust system that can identify and mitigate various types of UPI fraud, including phishing, identity theft, and unauthorized transactions. The paper also aims to create a real-time monitoring mechanism that promptly identifies suspicious activities and triggers alerts for immediate intervention.

The scope for developing a UPI fraud detection system is broad, with significant potential for addressing emerging challenges in the digital payment landscape. The paper applies technologies such as machine learning, artificial intelligence, and data analytics to build a fraud detection model. This model is intended to analyze large volumes of UPI transactions in real time, identifying patterns, anomalies, and trends associated with fraudulent activities.

JETIR2404299 Journal of Emerging Technologies and Innovative Research (JETIR) www.jetir.org c947

Keywords:-
Clustering Algorithms, Taxonomy of Clustering Algorithms, Challenges in Clustering Algorithms

I. Introduction

This introduction gives an overview of the key components and challenges involved in UPI fraud detection using machine learning, and highlights the importance of staying ahead in the ongoing battle against financial fraud in the digital age. With the increasing popularity of digital payment systems such as UPI (Unified Payments Interface), there is growing concern about fraud on these platforms. This paper aims to develop a robust fraud detection system for UPI transactions using machine learning techniques. UPI fraud detection using machine learning is a proactive approach to safeguarding financial transactions with artificial intelligence: machine learning algorithms analyze large volumes of transaction data, patterns, and user behaviors to identify and prevent fraudulent activities in real time. This technology has the potential to minimize financial losses, protect user privacy, and enhance the overall security of digital payment ecosystems.

In this era of constant technological evolution, it is crucial for financial institutions, fintech companies, and payment service providers to implement advanced machine learning models and algorithms to stay ahead of fraudsters. This approach not only helps detect known fraud patterns but also adapts to emerging threats through continuous learning and optimization. The project focuses on the development of a machine learning model that can analyze UPI transaction data in real time to identify fraudulent activities. The primary objective is to create a system that enhances the security of UPI transactions and reduces financial losses due to fraud.

II. Literature Survey

In fraud detection, datasets are often highly imbalanced. On the chosen dataset (PaySim), the approaches proposed in [1] detect fraudulent transactions with very high accuracy and few false positives, especially for TRANSFER transactions. Fraud detection often involves a trade-off between correctly detecting fraudulent samples and not misclassifying too many non-fraudulent ones; this is a design choice, or business decision, that every digital payments company needs to make. Oza [1] addresses this problem with a class-weight-based approach. The techniques could be improved further by using algorithms such as decision trees to leverage the categorical features associated with accounts/users in the PaySim dataset. The PaySim dataset can also be interpreted as a time series, a property that could be used to build time-series models with algorithms such as CNNs. The approach in [1] trains its models on the entire set of transactions as a whole; user-specific models, based on each user's previous transactional behavior, could be created to further improve decision making. According to [1], all of these could be very effective in improving classification quality on this dataset.
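The class-weight idea attributed to [1] can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy are available and uses a synthetic dataset from `make_classification` with roughly 1% positives as a stand-in for PaySim (the real PaySim columns are not reproduced); the helper names `balanced_weights` and `compare_class_weighting` are hypothetical, and this is not the implementation from [1]. With `class_weight="balanced"`, each class is weighted by n_samples / (n_classes × class_count), so errors on the rare fraud class cost more during training.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight


def balanced_weights(y):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    classes = np.unique(y)
    weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
    return dict(zip(classes.tolist(), weights.tolist()))


def compare_class_weighting(random_state=42):
    """Train the same model with and without class weights on imbalanced data."""
    # Synthetic stand-in for PaySim (assumption): about 1% of rows are fraud (label 1).
    X, y = make_classification(
        n_samples=20_000, n_features=10, n_informative=5,
        weights=[0.99, 0.01], random_state=random_state,
    )
    # Stratify so the rare fraud class appears in both the train and test splits.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    results = {}
    for name, cw in [("unweighted", None), ("balanced", "balanced")]:
        clf = LogisticRegression(class_weight=cw, max_iter=1000).fit(X_tr, y_tr)
        pred = clf.predict(X_te)
        results[name] = {
            "precision": precision_score(y_te, pred, zero_division=0),
            "recall": recall_score(y_te, pred, zero_division=0),
        }
    return results


if __name__ == "__main__":
    # 990 legitimate vs 10 fraud labels: the fraud class gets a much larger weight.
    print(balanced_weights(np.array([0] * 990 + [1] * 10)))
    for name, metrics in compare_class_weighting().items():
        print(name, metrics)
```

Comparing the two entries returned by `compare_class_weighting()` typically shows the balanced model giving up some precision for higher fraud recall, which is the trade-off described above; the exact numbers depend on the synthetic data and are not results from [1].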

Nowadays, digital transactions are increasing rapidly, and online payment frauds are increasing along with them. In fact, according to the Reserve Bank of India, comparing March 2022 to March 2019, digital payments have risen in volume and value by 216% and 10%, respectively. People are starting to go all-in with digital transactions, but one cannot ignore the looming security issues and the know-how required when it comes to online payments. A few years ago online payments were rarely seen, but today UPI payment QR codes are installed at doorsteps. This has invited hoaxers and attackers to create fraudulent transactions and trick people out of money. Fortunately, online transactions are monitored and can therefore be analyzed using the latest tools. In the system described in [2], an attempt is made to develop a machine learning model that identifies fraudulent transactions in a transaction dataset [2].

Fraud detection for credit/debit cards, loan defaulters, and similar cases can be achieved with machine learning (ML) algorithms, because they are capable of learning from previous fraud trends or historical data and spotting those trends in current or future transactions. In almost all datasets, fraudulent cases are scarce compared with non-fraudulent observations, which makes detecting fraudulent transactions quite difficult. The most effective way to prevent loan default is to identify non-performing loans as soon as possible, and machine learning algorithms are emerging as adept at handling such data given enough computing power. In [3], the performance of different machine learning algorithms, namely Decision Tree, Random Forest, linear regression, and Gradient Boosting, is compared for detecting and predicting fraud cases from loan fraud manifestations. Model performance is further evaluated with a confusion matrix and the calculation of accuracy, precision, recall, and F1 score, along with Receiver Operating Characteristic (ROC) curves [3].

Financial fraud, understood as deceptive tactics for gaining financial benefits, has recently become a widespread menace in companies and organizations. Conventional techniques such as manual verifications and inspections are imprecise, costly, and time-consuming for identifying such fraudulent activities. With the advent of artificial intelligence, machine-learning-based approaches can detect fraudulent transactions by analyzing large amounts of financial data. The systematic literature review (SLR) in [4] reviews and synthesizes the existing literature on ML-based fraud detection. The review employs the Kitchenham approach, which uses well-defined protocols to extract and synthesize the relevant articles and then report the results. Studies were gathered from popular electronic database libraries using specified search strategies; after applying inclusion/exclusion criteria, 93 articles were chosen, synthesized, and analyzed. The review summarizes the popular ML techniques used for fraud detection, the most common fraud type, and the evaluation metrics. The reviewed articles showed that support vector machines (SVM) and artificial neural networks (ANN) are popular ML algorithms for fraud detection, and that credit card fraud is the fraud type most commonly addressed using ML techniques. Finally, the review presents the main issues, gaps, and limitations in the financial fraud detection area and suggests possible areas for future research [4].
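The kind of model comparison summarized for [3] can be sketched as follows, assuming scikit-learn. The dataset is synthetic (about 5% positives), hyperparameters are library defaults, and logistic regression stands in for the linear model because it yields class probabilities for the ROC analysis; none of this reproduces the setup or results of [3], and `compare_models` is a hypothetical helper. For each model the sketch reports the confusion-matrix counts, accuracy, precision, recall, F1 score, and ROC AUC.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def compare_models(random_state=0):
    """Fit four classifiers on synthetic imbalanced data and collect metrics."""
    # Synthetic data (assumption): about 5% of rows belong to the fraud class (1).
    X, y = make_classification(n_samples=5_000, n_features=12, n_informative=6,
                               weights=[0.95, 0.05], random_state=random_state)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state)
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=random_state),
        "random_forest": RandomForestClassifier(n_estimators=100,
                                                random_state=random_state),
        # Logistic regression used as the linear model (substitution, see lead-in).
        "logistic_regression": LogisticRegression(max_iter=1000),
        "gradient_boosting": GradientBoostingClassifier(random_state=random_state),
    }
    report = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)                 # hard labels at the default threshold
        score = model.predict_proba(X_te)[:, 1]    # fraud probability, used for ROC AUC
        tn, fp, fn, tp = confusion_matrix(y_te, pred, labels=[0, 1]).ravel()
        report[name] = {
            "tn": int(tn), "fp": int(fp), "fn": int(fn), "tp": int(tp),
            "accuracy": accuracy_score(y_te, pred),
            "precision": precision_score(y_te, pred, zero_division=0),
            "recall": recall_score(y_te, pred, zero_division=0),
            "f1": f1_score(y_te, pred, zero_division=0),
            "roc_auc": roc_auc_score(y_te, score),
        }
    return report


if __name__ == "__main__":
    for name, metrics in compare_models().items():
        print(name, {k: round(v, 3) for k, v in metrics.items()})
```

ROC AUC is computed from predicted probabilities rather than hard labels, so it summarizes ranking quality across all thresholds, while the other metrics describe a single threshold.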

III. System Diagram

Fig: Transaction History

Fig: Home page

Fig: Payment Receipt Upload

Fig: Sign-Up Page

Fig: Results for Transaction Receipt

Fig: Fraud Detection


IV. Working Methodology

Data Cleaning: Some preprocessing of the data was necessary, because our chosen method could not handle all comments from the datasets without failing. Since the data files were read line by line, newlines (\n) within the comments had to be removed. Certain emojis could not be properly encoded in our chosen file format (UTF-8), so those emoji characters had to be deleted. This did not affect the results, since the word preprocessing and tokenization we implemented through Scikit-learn (CountVectorizer) only considers alphanumeric characters as words with the parameters we used [39]. Regex and character replacement were used to make all datasets adhere to the same format.

Training: All classifiers were trained on the training datasets with an 80/20 train/test split. This enabled us to see the accuracy of the classifiers on the training datasets. The same random state was used for all classifiers so that training is reproducible across classifiers. Text feature extraction was done with the bag-of-words model using the CountVectorizer in Scikit-learn. As mentioned in section 2.5 Sentiment Analysis, this is a popular approach to feature extraction.

Classifiers: All classifiers were used with the standard parameters in Scikit-learn, except for logistic regression, where the max_iter parameter was increased from the default value of 100 to 1000. This was done because the logistic regression classifier reached the maximum allowed number of iterations before the optimal solution to the classification problem was found. Classifiers were selected based on what is suitable for text and social media sentiment analysis and what has been used in previous work in the field. Naive Bayes classifiers such as multinomial and complement naive Bayes are commonly used in text classification because they are fast and simple to implement [18]. A stochastic gradient descent classifier was recommended for use on tweets by Bifet and Frank [32]. Since YouTube comments are also part of social media and, like tweets, tend to be short, we believe this to be appropriate for this study. Support vector machines are used since they are effective at a variety of traditional text categorization tasks and generally outperform naive Bayes classifiers [18], [40]. Logistic regression is another classifier commonly used in sentiment analysis [41]. Between 2013 and 2018, the International Workshop on Semantic Evaluation (SemEval) included a task on sentiment analysis on Twitter. In several years this task included variations of classifying a tweet on a scale from positive to negative, and SVM- and logistic regression-based classifiers were used by several teams attempting the task [42].

Prediction: Four formulas for making the prediction were tested, as explained below. In each formula, $N_{\text{positive}}$, $N_{\text{neutral}}$ and $N_{\text{negative}}$ are the numbers of comments classified as positive, neutral and negative, respectively.

Prediction 1, the base prediction, assumes that only the numbers of comments classified as positive and negative contribute to the like proportion:

$$\text{predicted like proportion} = \frac{N_{\text{positive}}}{N_{\text{positive}} + N_{\text{negative}}}$$

A consequence of this formula is that videos whose comments are all labeled neutral had to be excluded, since the denominator would be 0. This causes the size of the testing dataset to vary by small amounts between the classifiers for the base prediction.

The following three predictions consider neutral comments to some extent. Any factor for the neutral comments could be used in the numerator of the predicted like proportion, but we have only considered the cases we believe make reasonable assumptions.

Prediction 2 assumes that all comments labeled neutral contribute to dislikes:

$$\text{predicted like proportion} = \frac{N_{\text{positive}}}{N_{\text{positive}} + N_{\text{neutral}} + N_{\text{negative}}}$$

Prediction 3 assumes that half of the neutral comments contribute to likes and the other half contribute to dislikes:

$$\text{predicted like proportion} = \frac{N_{\text{positive}} + 0.5\,N_{\text{neutral}}}{N_{\text{positive}} + N_{\text{neutral}} + N_{\text{negative}}}$$

Prediction 4 assumes that all neutral comments contribute to likes:

$$\text{predicted like proportion} = \frac{N_{\text{positive}} + N_{\text{neutral}}}{N_{\text{positive}} + N_{\text{neutral}} + N_{\text{negative}}}$$

Evaluation: The accuracy of all classifiers on the training dataset was calculated. Using the actual and predicted like proportions on the YouTube trending dataset, we calculated the Pearson correlation, the p-value for the Pearson correlation, the mean absolute error, and the standard deviation of the differences. In this way, the performance of the four predictions could be compared across all configurations of classifiers and training datasets.

Result Interpretation

Result analysis is a critical phase in building a UPI fraud detection system, as it assesses the effectiveness and performance of the implemented solution.

Accuracy Assessment:
Evaluate the overall accuracy of the UPI fraud detection system by comparing the total number of correctly identified fraudulent and non-fraudulent transactions with the total number of transactions processed. This provides a high-level understanding of the system's efficacy.

Precision and Recall:
Calculate precision and recall to understand the trade-off between false positives and false negatives. Precision measures the accuracy of positive predictions (the fraction of flagged transactions that are actually fraudulent), while recall measures the system's ability to capture all actual positives (the fraction of fraudulent transactions that are flagged). Striking a balance between these metrics is crucial for a reliable fraud detection system.

False Positive Rate:
Analyze the false positive rate, which indicates the proportion of legitimate transactions incorrectly flagged as fraudulent. A low false positive rate is essential to minimize disruptions for genuine users while maintaining effective fraud detection.
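The metrics in this subsection all derive from the four confusion-matrix counts. The following is a minimal, dependency-free sketch, assuming fraud is encoded as label 1 and legitimate transactions as 0; the `ConfusionCounts` class and the helper names are hypothetical, and returning 0.0 for an undefined ratio is a convention chosen here, not something the paper specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # fraudulent transactions correctly flagged
    fp: int  # legitimate transactions wrongly flagged as fraud
    fn: int  # fraudulent transactions that were missed
    tn: int  # legitimate transactions correctly passed


def _safe_div(num, den):
    # Convention (assumption): report 0.0 when the ratio is undefined.
    return num / den if den else 0.0


def counts_from_labels(y_true, y_pred) -> ConfusionCounts:
    """Tally confusion-matrix cells from 0/1 true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionCounts(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
    )


def summarize(c: ConfusionCounts) -> dict:
    """Accuracy, precision, recall and false positive rate from the counts."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": _safe_div(c.tp + c.tn, total),           # correct / all processed
        "precision": _safe_div(c.tp, c.tp + c.fp),           # flagged that are fraud
        "recall": _safe_div(c.tp, c.tp + c.fn),              # fraud that gets flagged
        "false_positive_rate": _safe_div(c.fp, c.fp + c.tn), # legitimate wrongly flagged
    }


if __name__ == "__main__":
    print(summarize(ConfusionCounts(tp=40, fp=10, fn=20, tn=930)))
    print(summarize(ConfusionCounts(tp=0, fp=0, fn=60, tn=940)))
```

With the illustrative counts `tp=40, fp=10, fn=20, tn=930`, `summarize` gives an accuracy of 0.97, a precision of 0.8, a recall of about 0.667 and a false positive rate of about 0.0106. A model that never flags fraud on the same 1,000 transactions (`tp=0, fp=0, fn=60, tn=940`) still reaches 0.94 accuracy with zero recall, which is why accuracy alone is a weak guide on imbalanced fraud data.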

Receiver Operating Characteristic (ROC) Curve:
Plot an ROC curve to visualize the trade-off between the true positive rate and the false positive rate at various thresholds. The area under the ROC curve (AUC) provides a comprehensive measure of the model's performance, with a higher AUC indicating better overall performance.

Confusion Matrix Analysis:
Break down the results using a confusion matrix to understand the numbers of true positives, true negatives, false positives, and false negatives. This detailed analysis helps identify specific areas for improvement and fine-tuning of the model.

V. Conclusion
As we progress into an increasingly digitized world, the importance of securing digital payment systems cannot be overstated. The UPI fraud detection implementation described in this paper serves as a proactive measure to mitigate risks, protect users, and foster the widespread adoption of digital transactions. Hence, we conclude that the current landscape demands innovative solutions such as UPI fraud detection using machine learning, and that the development of a UPI fraud detection system aligns with the imperative to create a secure and trustworthy environment for financial transactions.

VI. Acknowledgement
First and foremost, I would like to express my sincere gratitude to Prof. P. P. Pawade, who has, in the literal sense, guided and supervised me. I am deeply grateful for the constant inspiration and valuable guidance provided throughout the work.

Reference
[1] Aditya Oza, "Fraud Detection using Machine Learning," https://github.com/aadityaoza/CS-229-project.
[2] Kishori Dhanaji Kadam, Mrunal Rajesh Omanna, Sakshi Sunil Neje, Shraddha Suresh Nandai, "Online Transactions Fraud Detection using Machine Learning," Volume 5, Issue 6, June 2023, pp. 545-548, www.ijaem.net.
[3] M. Valavan and S. Rita, "Predictive-Analysis-based Machine Learning Model for Fraud Detection with Boosting Classifiers," Computer Systems Science & Engineering.
[4] Abdulalem Ali, Shukor Abd Razak, Siti Hajar Othman, Taiseer Abdalla Elfadil Eisa, Arafat Al-Dhaqm, Maged Nasser, Tusneem Elhassan, Hashim Elshafie and Abdu Saif, "Financial Fraud Detection Based on Machine Learning: A Systematic Literature Review," https://doi.org/10.3390/app12199637.


[5] A. K. Jain, M. N. Murty, P. J. Flynn, "Data clustering: a review," ACM Computing Surveys (CSUR), 31(3):264-323, 1999.
[6] S. J. Roberts, "Parametric and non-parametric unsupervised cluster analysis," Pattern Recognition, 30(2):261-272, 1997.
[7] G. Gan, C. Ma, J. Wu, Data Clustering: Theory, Algorithms, and Applications, vol. 20, SIAM, Philadelphia, 2007.
[8] T. S. Madhulatha, "An overview on clustering methods," arXiv preprint arXiv:1205.1117, 2012.
[9] K. Pearson, "Contributions to the mathematical theory of evolution," Philos. Trans. R. Soc. Lond. A, 185:71-110, 1894.
