Credit Card Fraud Detection Using Machine Learning and Blockchain
https://doi.org/10.22214/ijraset.2023.52214
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue V May 2023- Available at www.ijraset.com
Abstract: This paper proposes a secure fraud detection model that combines machine learning algorithms and blockchain
technology. Although blockchain is considered a secure foundation for financial applications, fraud
and anomalies are still common in the network. To address this, we use two machine learning algorithms, XGBoost and
K-means clustering, to classify transactions based on patterns of fraudulent and legitimate transactions. The transaction data is
also classified by supervised machine learning algorithms such as random forest and decision tree classifiers. The proposed
model integrates these algorithms with blockchain technology to detect fraudulent transactions in the Ethereum network. The
paper includes a security analysis of the proposed smart contract and an attacker model to protect the system from potential
attacks and vulnerabilities. The precision and AUC of the models are also calculated to measure the performance of the system.
Overall, the paper proposes an innovative approach to addressing fraud and anomalies in the network using machine learning
and blockchain technology. The integration of these technologies has the potential to improve the security of online transactions
and e-banking systems.
Keywords: anomaly detection; blockchain; fraud detection; machine learning; random forest; XGBoost
I. INTRODUCTION
Technological advancements have led to the modernization of various industries, including the financial sector, where
traditional currencies are being replaced by digital currencies. However, these transactions are vulnerable to digital attacks, and
detecting fraud and anomalies in digital transactions is critical for maintaining the integrity of the financial system. Anomaly
detection techniques are used to detect illegal and fraudulent activities in financial transactions, but existing methods are designed
for centralized systems. Blockchain technology offers a decentralized and immutable ledger that can address security issues in
centralized systems. Blockchain-based systems have the potential to provide secure and transparent financial transactions while
maintaining privacy. However, malicious actors can still exploit vulnerabilities in the blockchain network, and detecting fraudulent
transactions in the blockchain is a challenging task. The application of machine learning techniques, such as XGBoost and random
forest, to blockchain data has the potential to improve the detection of fraudulent transactions. In this proposed system, machine
learning models are directly linked to the blockchain, and a blockchain-based smart contract is deployed to classify incoming
transactions as fraudulent or legitimate. The proposed system also includes two attacker models to protect against blockchain attacks.
Overall, the proposed system has the potential to improve the security and integrity of financial transactions in the digital age.
However, it is important to note that the effectiveness of the system will depend on the quality and quantity of data used to train the
machine learning models, as well as the ability to adapt to evolving attack patterns.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 2745
A. Related Work
Blockchain technologies are being deployed across public and private sectors for various objectives, particularly
for protecting and monitoring auditing systems. Blockchain allows for secure and private queries from auditors without exposing
their identities to unauthorized users. However, blockchain alone may not be sufficient for fraud detection, as it may not
efficiently identify fraudulent transactions. To address this problem, complementary solutions such as machine learning algorithms are being
used.
Supervised machine learning techniques are particularly useful in detecting fraudulent transactions. Different methods have been
tested, and a comparative analysis of these methods has been presented in various studies. For instance, in [8], the authors proposed
different supervised machine learning solutions for detecting fake businesses and tested them using random forest and XGBoost
classifiers on over 300,000 accounts. XGBoost was also used in [9] for accurate results. In [10], the authors addressed the problem
of an imbalanced dataset, which is a common issue in fraud detection, by applying specific techniques to balance the dataset.
Overall, combining blockchain technologies with machine learning algorithms can provide more robust solutions for fraud detection
and auditing systems.
Detecting fraudulent credit card transactions is a data mining problem because identifying fraudulent
transactions requires analyzing large amounts of data. However, real-world data for fraud detection is often not readily available to
researchers due to the confidential nature of customer data and banks' privacy policies.
To address these challenges, various approaches have been proposed. In [13], a distributed data mining model was used to address
the problems of skewed distribution of credit card data and non-uniform costs. In [14], a fraud detection algorithm was presented that
can identify fraud without relying on any fraudulent historical instances, overcoming the cold-start problem. In [15], the authors
proposed and demonstrated the application of fuzzy association rule mining to extract useful knowledge from credit card transactions.
Other techniques, such as support vector machine models [16] and a combination of Bayesian learning, rule-based learning, and
Dempster–Shafer theory [17], have also been used to reduce false identifications of fraud. In [18], a transaction aggregation
technique was used to model customer behavior before any transaction is performed and then used to identify fraudulent transactions.
This model can work with unknown datasets and can identify fraudulent transactions while maintaining customer privacy.
It is crucial to ensure the privacy and security of data in cloud-based systems, especially for cyber-physical systems. In [23], the
authors proposed an anomaly detection system that uses machine learning algorithms to detect both insider and outsider attacks. The
system is based on a distributed architecture that preserves the privacy of the data by encrypting it before sharing it with other nodes
in the network. The authors in [24] proposed a privacy-preserving machine learning framework for edge computing. The framework
uses differential privacy and federated learning to train machine learning models on data that is distributed across different edge
devices. The authors demonstrated that the proposed framework can achieve high accuracy while preserving the privacy of the data.
In [25], the authors addressed the issue of privacy in location-based services (LBS) by proposing a privacy-preserving framework
based on blockchain technology. The framework uses a combination of homomorphic encryption and smart contracts to ensure the
privacy of user data while still allowing LBS providers to offer personalized services. In [26], the authors proposed a secure and
privacy-preserving data sharing framework for healthcare applications. The framework uses blockchain technology to ensure data
integrity and privacy, and differential privacy to protect sensitive information. The authors demonstrated the effectiveness of the
proposed framework on a real-world dataset.
Finally, in [27], the authors proposed a privacy-preserving data analysis framework for smart grids. The framework uses
homomorphic encryption and secret sharing to ensure the privacy of the data while allowing the utility company to perform various
data analysis tasks. The authors demonstrated the effectiveness of the proposed framework on a real-world dataset from a smart grid
testbed.
Adversarial attacks pose a significant threat to the security and robustness of machine learning models, especially in
sensitive domains such as finance and cyber security. In recent years, researchers have proposed various techniques to mitigate the
impact of such attacks. For example, in [31], the authors proposed a model-agnostic defense approach called Adversarial Training
with Ensemble Diversity (ATED), which combines adversarial training and ensemble learning to improve the model's robustness
against adversarial attacks. The authors of [32] proposed a defense mechanism that uses a conditional generative adversarial
network (cGAN) to generate adversarial examples that are indistinguishable from real examples, thus fooling the attacker's model.
In [33], the authors proposed a method based on gradient regularization to improve the robustness of deep neural networks against
adversarial attacks. Despite these efforts, adversarial attacks remain a challenging problem, and new defense mechanisms need to be
developed to mitigate their impact. In [34], the authors proposed a novel approach that combines multiple defense mechanisms,
including adversarial training, feature squeezing, and gradient masking, to improve the model's robustness against various types of
attacks. In [35], the authors proposed a method based on gradient obfuscation, which modifies the gradient of the model to make it
harder for the attacker to generate effective adversarial examples.
Other researchers have explored the use of game theory and reinforcement learning to develop more effective defense mechanisms
against adversarial attacks [36].
Overall, adversarial attacks pose a significant challenge to the security and robustness of machine learning models. Although
researchers have proposed various defense mechanisms to mitigate their impact, this remains an active research area with significant
room for improvement.
B. Proposed Model
The proposed system model integrates blockchain and machine learning for fraud and anomaly detection in the financial sector.
The blockchain layer initiates transactions, and then machine learning models are used to classify them as
legitimate or malicious, based on their characteristics.
Binary classification is used to determine if a transaction is fraudulent or not. The machine learning models are trained on a dataset
of transactions in Bitcoin, a popular cryptocurrency used in the financial sector. The dataset is used to identify unusual and
suspicious events that deviate from the normal data patterns.
The random forest and XGBoost classifiers are used to classify transactions as legitimate or malicious. These classifiers are also
used to classify incoming transactions, which can help prevent fraudulent activities in the financial sector.
The proposed model is trained and tested using the given dataset to identify legitimate and malicious data patterns. The model's
performance can be evaluated based on metrics such as precision, recall, and F1-score.
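As a minimal illustration of the metrics named above, the following Python sketch computes precision, recall, and F1-score for the fraud class from predicted and true labels. The function name and the toy labels are assumptions made for illustration; they are not results or code from the proposed system.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for the positive (fraud = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels: 1 = fraudulent, 0 = legitimate.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case three frauds are caught, one is missed, and one legitimate transaction is flagged, so all three metrics equal 0.75.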
Overall, the proposed system model can be a useful tool for fraud and anomaly detection in the financial sector, particularly for
cryptocurrency transactions. The integration of blockchain and machine learning can provide an added layer of security and help
prevent fraudulent activities.
1) SMOTE Analysis
Algorithm 1: Data balancing through SMOTE
1: Initialization
2: Inputs: Minority data M(D) = mi ∈ X, where i = 1, 2, 3
3: Outputs: Synthetic Data S
4: Number of minority samples (D)
5: Percentage of SMOTE (P)
6: Number of (k) nearest neighbors
7: for n = 1 to D do
8: Find the K nearest neighbors of Di
9: Check P = P/100
10: while P ≠ 0 do
11: Select a random sample m in minority class
12: Find neighbor of m
13: Pick a random number a ∈ [0, 1]
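The listing above breaks off after step 13. As a hedged sketch of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbors), the following Python code may help. It is not the authors' implementation: the function name smote, the parameter n_synthetic (playing the role of P/100 times D), and the toy points are assumptions for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and randomly chosen members of their k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        m = rng.choice(minority)  # random minority sample
        others = [x for x in minority if x is not m]
        # k nearest neighbours of m within the minority class
        neighbours = sorted(others, key=lambda x: math.dist(m, x))[:k]
        nb = rng.choice(neighbours)
        a = rng.random()  # random number a in [0, 1)
        # new point on the line segment between m and its neighbour
        synthetic.append(tuple(mi + a * (ni - mi) for mi, ni in zip(m, nb)))
    return synthetic


# Hypothetical minority-class (fraud) samples with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
synthetic = smote(minority, n_synthetic=6, k=2)
print(len(synthetic), "synthetic samples generated")
```

Because every synthetic point lies on a segment between two existing minority samples, the new data stays inside the region already occupied by the minority class.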
3) XGBoost
XGBoost is a gradient boosting algorithm that builds a sequence of decision trees. Each subsequent tree is fitted to
the residual error left by the trees before it, so that adding it to the ensemble reduces the overall error. Because trees are built
sequentially, each new tree learns from the errors of the previous trees. In the proposed model, XGBoost is used as a classifier to differentiate between
legitimate and malicious transactions. The algorithm is trained on a dataset of Bitcoin transactions and can accurately classify new
transactions based on their features. Furthermore, the XGBoost algorithm can be connected to a blockchain smart contract to predict
new incoming transactions. This can be useful in real-time fraud detection systems, where quick identification of suspicious
transactions is critical. Overall, the use of XGBoost in the proposed model is an effective approach for fraud detection in
blockchain-based financial transactions. It provides a powerful tool for detecting fraudulent activities, which can help organizations
minimize the impact of such activities on their business.
Algorithm 2: Fraud detection using XGBoost
1: Inputs: Balanced Dataset S
2: Outputs: Transactions in Blockchain B
3: Initialization of Dataset
4: Splitting of S into training and testing
5: Xtrain ← input variables from dataset
6: Ytrain ← target variables from dataset
7: Xtest ← input variables from test dataset
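The listing above stops at step 7. To illustrate the residual-fitting principle behind boosting, here is a deliberately simplified Python sketch that boosts one-split decision stumps under squared error. It is a stand-in and not XGBoost itself, which additionally uses regularization and second-order gradient information. All function names and the toy data (amount, hour) are assumptions for illustration.

```python
def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, threshold, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)


def stump_predict(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_boosting(X, y, n_trees=20, learning_rate=0.5):
    """Sequentially fit stumps to the residuals of the current ensemble."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, stumps, learning_rate


def predict(model, row):
    base, stumps, lr = model
    score = base + lr * sum(stump_predict(s, row) for s in stumps)
    return 1 if score >= 0.5 else 0  # 1 = malicious, 0 = legitimate


# Hypothetical toy data: (amount, hour of day); label 1 = malicious.
X_train = [(20, 14), (35, 10), (15, 16), (900, 3),
           (1200, 2), (40, 12), (1100, 4), (25, 9)]
y_train = [0, 0, 0, 1, 1, 0, 1, 0]
X_test = [(30, 13), (1000, 3)]

model = fit_boosting(X_train, y_train)
predictions = [predict(model, row) for row in X_test]
print(predictions)
```

In practice a library implementation would replace these functions; the sketch only shows how each new learner is trained on what the previous ones got wrong.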
4) KMeans Clustering
K-means is a clustering algorithm that groups similar data points together. In the context of fraud detection, K-means can be used to
group together transactions that have similar patterns. By doing so, it can help to identify groups of transactions that are anomalous
or suspicious. Suppose we have a dataset of credit card transactions. We can apply K-means clustering to group together transactions that
have similar attributes such as transaction amount, time of day, and merchant category code. Once we have these clusters, we can
analyze them to identify any patterns that are unusual or suspicious. For instance, if we find a cluster of transactions with unusually
large transaction amounts or with a high frequency of transactions at merchants with a high risk of fraud, we may flag those
transactions for further investigation. K-means can be a useful tool in fraud detection, especially when used in conjunction with
other techniques such as anomaly detection and supervised learning algorithms. It is important to note, however, that K-means
clustering is not perfect and can have limitations such as sensitivity to initial conditions and the need for the number of clusters to be
specified beforehand. Therefore, it is important to use K-means as part of a comprehensive fraud detection system that includes
multiple techniques and approaches.
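The clustering procedure described above can be sketched in a few lines of Python. This is a minimal version of the standard assignment/update iteration, assuming already-scaled numeric features; the function name, the init parameter, and the toy points are hypothetical. The initial centroids are passed explicitly here because, as noted, the result depends on initialization.

```python
import math
import random


def kmeans(points, k, init=None, seed=0, max_iter=100):
    """Plain K-means: alternate assignment and centroid update steps."""
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
    return labels, centroids


# Hypothetical scaled features (amount, frequency) for six transactions.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels, centroids = kmeans(points, k=2, init=[points[0], points[3]])
print(labels)

# Flag the cluster with the largest mean first feature for manual review.
suspicious_cluster = max(range(2), key=lambda c: centroids[c][0])
print("cluster to review:", suspicious_cluster)
```

Which cluster counts as suspicious is a judgment made after clustering; the rule used here (largest mean amount) is only one possible choice.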
The proposed system combines the use of blockchain and machine learning for fraud detection. The system receives a new
transaction from the Ethereum network, and the transaction pattern is analyzed and compared to the pattern of Bitcoin transactions
stored in the database. The machine learning model is trained on the Bitcoin transaction-based dataset and predicts if the new
transaction is legitimate or malicious. If the prediction result is legitimate, the transaction is added to the blockchain. Otherwise, it is
rejected, and the transaction is not added to the blockchain. The system provides a robust mechanism for detecting fraudulent
transactions and ensures the security and privacy of the blockchain network.
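The accept-or-reject flow described above can be sketched in plain Python. This is not the smart contract of the proposed system: the hash-linked list stands in for the blockchain, toy_classifier is a hypothetical threshold rule standing in for the trained model, and all names and transaction fields are assumptions for illustration.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 hash of a block's canonical JSON representation."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def screen_and_append(tx, classify, chain):
    """Append tx to the chain only if the classifier labels it legitimate."""
    if classify(tx) != "legitimate":
        return False  # rejected: the transaction is not added
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "tx": tx, "prev_hash": prev_hash})
    return True


def toy_classifier(tx):
    # Hypothetical stand-in for the trained model: a fixed amount threshold.
    return "malicious" if tx["amount"] > 1000 else "legitimate"


chain = []
incoming = [
    {"sender": "a", "receiver": "b", "amount": 50},
    {"sender": "c", "receiver": "d", "amount": 5000},
    {"sender": "b", "receiver": "a", "amount": 20},
]
accepted = [screen_and_append(tx, toy_classifier, chain) for tx in incoming]
print(accepted, len(chain))
```

Each accepted block stores the hash of its predecessor, so altering an earlier transaction would break the link to every later block.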
We obtained an accuracy of 99.95% for detecting credit card fraud. Given that our data was heavily imbalanced in favour of one class,
this figure should not come as a surprise. The confusion matrix indicates that our model is not overfitted, which is a positive finding. Finally,
XGBoost is the best-performing model in our setting based on the accuracy score. The only limitation lies in the data we were given for model training:
the features are available only in PCA-transformed form. If the real features follow a similar
pattern, the model should perform comparably well.
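Why a very high accuracy is unsurprising on imbalanced data can be shown with a short calculation. The numbers below are hypothetical and chosen only for illustration (100,000 transactions of which 50 are fraudulent); they are not the dataset used in this work.

```python
# Hypothetical class counts, chosen for illustration only.
n_total = 100_000
n_fraud = 50

y_true = [1] * n_fraud + [0] * (n_total - n_fraud)
y_pred = [0] * n_total  # trivial model: predict "legitimate" every time

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / n_total

caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / n_fraud

print(f"accuracy = {accuracy:.2%}, fraud recall = {recall:.2%}")
```

A model that never flags fraud reaches 99.95% accuracy on such data while detecting no fraud at all, which is why metrics such as precision, recall, and AUC are more informative than accuracy alone.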
IV. CONCLUSION
Fraud detection protects financial systems from fraudulent attacks. Therefore, a blockchain-based machine learning model is proposed to
secure digital transactions. In this project, various supervised learning approaches, including support vector machines, AdaBoost, and
random forest classifiers, were used. The proposed model predicts whether an incoming transaction in the blockchain is fraudulent or
not. The supervised learning
algorithms allow the model to distinguish between fraudulent and legitimate data. The simulation results show that the proposed
algorithm works adequately to find transaction fraud.
V. FUTURE SCOPE
Our model could be made more precise and accurate by using deep learning algorithms in place of classical supervised machine learning
algorithms. It could also be made more robust against Sybil attacks, in which a malicious attacker uses multiple identities. As
we are new to blockchain technology and have limited knowledge of it, with more time we will be able to explore
it further and use it more effectively in our project. We will also try to design and build our model for very
large datasets in the future.