Comparative Study of Machine Learning Algorithms F
Comparative Study of Machine Learning Algorithms F
net/publication/373613287
Article in International Journal for Research in Applied Science and Engineering Technology · August 2023
DOI: 10.22214/ijraset.2023.55567
CITATIONS READS
0 96
2 authors, including:
Pooja Sri
Jawaharlal Nehru Technological University, Hyderabad
1 PUBLICATION 0 CITATIONS
SEE PROFILE
All content following this page was uploaded by Pooja Sri on 17 November 2023.
https://doi.org/10.22214/ijraset.2023.55567
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue VIII Aug 2023- Available at www.ijraset.com
Abstract: In the present world, advancement in technologies like e-commerce and financial technology (FinTech) has led to a
surge in the daily volume of online card transactions and there are a lot of issues with credit cards. Identifying CC scams is
currently among the most typical problems worldwide. An individual's credit card information may be fraudulently obtained and
used by criminals for fraudulent purposes. Due to this significant increase in credit card fraud that impacts banks, merchants,
and card issuers, it is essential to develop mechanisms that ensure the security and integrity of credit card transactions. This
research examines divergent approaches to detecting credit card fraud using machine learning (ML) Models. The algorithms
used are the Random Forest algorithm, Decision Trees, and the AdaBoost algorithm. The outcomes of these algorithms are
based on accuracy, precision, recall, and F1-score and AUC-ROC score. Different models are compared and the algorithm that
has good evaluation metrics is considered the best algorithm that is used to detect fraud.
Keywords: Credit card fraud, Machine Learning, Random Forest, Accuracy, Confusion Matrix
I. INTRODUCTION
E-payment systems are crucial in today's culture of intense financial competition. The majority of financial institutions have
improved the public's access to business services through Internet banking in the twenty-first century. The process of purchasing
goods has been greatly simplified. Financial institutions frequently give their customers cards that make it simple for them to make
purchases when they don't have cash on hand. Customers also benefit from credit cards because they are shielded from products that
might be damaged, misplaced, or even stolen. Customers must verify the transaction with the merchant before making any
purchases with their credit card. Credit card fraud occurs when a thief makes purchases using another person's credit card
information without that person's permission. As a result of business migration to the Internet and the electronic financial
transactions that take place in fostering the cashless economy, accurate fraud detection has become essential in protecting such
transactions. There have been billion-dollar losses from credit card theft as a result of the widespread use of credit cards and
insufficient security measures. To reduce their losses, all credit card issuing organizations must now implement reliable fraud
detection systems. Because financial institutions are frequently reluctant to disclose such information, it is challenging to estimate
losses with any degree of precision. As per the Nilson report [6], $408 billion in worldwide losses are projected over the next ten
years, which is based on the global network payment card transactions projected worldwide.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 2252
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue VIII Aug 2023- Available at www.ijraset.com
Since the card-related information would be the same as that from a well-known, reliable source, it is very challenging to distinguish
the fraudulent transaction from the original transaction.
The architecture diagram in Fig. 1 above provides information on the elements that make it possible for this research's comparative
study. Data is first gathered from the source and sent for preprocessing, where it is cleaned up and processed by having redundant
data, empty spaces, and null values removed. The following step involves extracting the features from the data that are crucial to
building a model for additional classification. After data preparation, it will be divided into training and testing datasets, which are
provided to ML models for fraud prediction.
When splitting the data, various strategies could be used, with the training dataset being used to train the model using the known
statistics and the testing dataset being used to test the trained model's ability to predict fraud.
The results are then assessed using the metrics after a classifier is chosen to distinguish between fraudulent and legitimate
transactions in the following step. The performance analysis is then completed to compare which algorithm performs better based on
metrics such as accuracy score.
IV. IMPLEMENTATION
A. Data Acquisition
We have used real-time credit card transaction data from European cardholders to evaluate and contrast the ML models. In 2 days,
there were 284,807 transactions, 492 of which were marked as fraudulent. Users conduct credit card transactions for online
purchases, and these transactions need to be secure. To protect the confidentiality of user transaction data using the PCA (Principal
Component Analysis) algorithm, people have converted transaction data into numerical format on Kaggle [4].
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 2253
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue VIII Aug 2023- Available at www.ijraset.com
The Random Forest, Decision Forest, and AdaBoost algorithms are trained using the file "creditcard.csv," and after validation, it is
tested to determine whether the dataset contains legitimate or fraudulent transaction transactions. The dataset is split into two
categories as a result: preparation data and experiment data[5], where preparation data is given to the model for training, and
experiment data is used to test the model.
B. Data Preprocessing
Data cleaning, which involves making the raw data ready (i.e., clean and formatted) so that the machine learning model can use it to
make predictions, is the most important stage in the preprocessing of data. In addition to eliminating null values and blank spaces, it
also entails separating the dataset into train and test sets, formatting the data as needed, and performing feature extraction for
analysis.
PCA (principal components analysis) is one of the feature extraction methods used in this study. By converting the transactional
data into a numerical format, PCA helped to reduce the data's dimensionality. The dependence of variables on one another is then
determined by plotting a correlation matrix, as seen in Fig. 2.
The majority of financial transactions, including money transfers and payments made with a debit or credit card, are legitimate. Due
to the fact that only a small percentage of occurrences are fraudulent, the dataset is unbalanced. The dataset's sampling strategy, the
variables selected, and the detecting technique(s) employed have a significant impact on how well card transaction fraud is detected
[8].
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 2254
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue VIII Aug 2023- Available at www.ijraset.com
In this research, the dataset is balanced using the under-sampling technique by reducing the number of transactions from the
majority class, which has typical transactions. After under-sampling, the dataset is visualized in Fig. 4.
2) Decision Tree
A supervised ML model used for decision-making is a decision tree classifier. Both classification and regression problems can be
solved using it. It divides a data set into smaller subsets and has a hierarchy tree-like structure. Every decision tree consists of a set
of nodes and branches, where internal nodes stand in for the decision's features and external or leaf nodes for its results. Branches
are the decision-making criteria or rules that resemble human thought. It will analyze the effects of each branch and visually outline
the results of the complex decision-making problem. Decidual and categorical data are both handled by decision trees. A decision
tree's process moves forward from the root node to predict the input dataset's class. This algorithm tracks the branch and moves to
the following node by comparing the attributes of the actual dataset with those of the root attribute. The algorithm verifies the
attribute value with the other sub-nodes twice before moving. A precise, condensed decision tree is what the Decision Tree model
aims to create. For the purpose of detecting credit card fraud, the decision tree has two stages. The decision tree is first built using
the provided training data, and then decision rules are used to classify incoming credit card transactions.
3) AdaBoost Algorithm
Adaptive Boosting, also known as the AdaBoost algorithm, is a boosting technique that is applied as an ensemble method in
machine learning. By combining several straightforward models, the boosting method in machine learning aims to produce highly
accurate models. The AdaBoost algorithm's working methodology is shown in Fig. 6.5, where the ensemble method takes into
account all of N models' predecessors and combines the previous model with the weights of its successors. When teaching weak
learners, it works best. As a result, AdaBoost is used to solve the classification problem by combining several weak learners into a
single strong learner. ML Models are combined with the AdaBoost method [10] to enhance the classification problem and have a
positive effect.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 2255
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue VIII Aug 2023- Available at www.ijraset.com
E. Evaluation Metrics
The effectiveness and performance of the statistical or machine learning model are quantified using evaluation metrics. These
metrics provide information on how well the model is performing and help in comparing different models or algorithms. Here are a
few metrics that were employed in this study to assess how well the models performed.
1) Accuracy
2) Confusion Matrix
3) Precision
4) Recall
5) F1-Score
6) AUC-ROC
V. EXPERIMENTAL RESULTS
The three machine learning methods are compared in this section by taking into account all the performance evaluation metrics
(such as accuracy, precision, recall, F1 Score, Roc-AUC score, and confusion matrix), and the results are presented.
Fig. 5 to Fig 7, show how a confusion matrix for three different models functions based on the occurrences of False Positive and
False Negative values, improving the model's accuracy.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 2256
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue VIII Aug 2023- Available at www.ijraset.com
The comparative outcomes of the machine intelligence algorithms, which are based on the values of evaluation metrics like
Precision, Recall, F1 Score, and AUC-ROC Score, are shown in Figs. 8 and 9.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 2257
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue VIII Aug 2023- Available at www.ijraset.com
The Random Forest algorithm outperforms the AdaBoost and Decision Tree algorithms in terms of making predictions of CC fraud,
according to the statistics comparison of ML models shown in Table 1.
The accuracy of Random Forest is 96%, which is best compared to the accuracy of the AdaBoost model (94%), and the accuracy of
the Decision tree (93%), as shown in the classification Performance Fig. 10.
VI. CONCLUSIONS
In terms of performance values for the evaluation metrics, Random Forest, AdaBoost, and Decision Tree are the next best accuracy
models, according to the comparative analysis of the various machine learning algorithms used in this research. Statistics of the
evaluation metrics are presented along with a comparison accuracy graph to help evaluate the models and choose the best model for
the CCFD system. In conclusion, it has been demonstrated that data science intelligence techniques based on machine learning
models are effective at identifying credit card fraud and saving millions of dollars by lowering cybersecurity attacks in the financial
and banking sectors.
REFERENCES
[1] D. Tanouz, R. R. Subramanian, D. Eswar, G. V. P. Reddy, A. R. Kumar, and C. V. N. M. Praneeth, "Credit Card Fraud Detection Using Machine Learning,"
2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 2021, pp. 967-972, doi:
10.1109/ICICCS51141.2021.9432308.
[2] K. Randhawa, C. K. Loo, M. Seera, C. P. Lim and A. K. Nandi, "Credit Card Fraud Detection Using AdaBoost and Majority Voting," in IEEE Access, vol. 6,
pp. 14277-14284, 2018, doi: 10.1109/ACCESS.2018.2806420.
[3] Bin Sulaiman, R., Schetinin, V. & Sant, P. Review of Machine Learning Approach on Credit Card Fraud Detection. Hum-Cent Intell Syst 2, 55–68 (2022).
https://doi.org/10.1007/s44230-022-00004-0
[4] https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
[5] A. Priya, A. S. Narayanan, S. Madhu Bala and B. D. Patel, "Optimal Algorithm for Credit Card Fraud Detection," 2022 Second International Conference on
Artificial Intelligence and Smart Energy (ICAIS), Coimbatore, India, 2022, pp. 1091-1098, doi: 10.1109/ICAIS53314.2022.9742922
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 2258
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue VIII Aug 2023- Available at www.ijraset.com
[6] https://nilsonreport.com/
[7] T. C. Tran and T. K. Dang, "Machine Learning for Prediction of Imbalanced Data: Credit Fraud Detection," 2021 15th International Conference on Ubiquitous
Information Management and Communication (IMCOM), Seoul, Korea (South), 2021, pp. 1-7, doi: 10.1109/IMCOM51814.2021.9377352.
[8] S. K. S, K. K. Shah, K. Kumar, K. K. Patel, and A. R. Sah, "Credit Card Fraud Detection Using Machine Learning Model," 2022 IEEE 2nd Mysore Sub
Section International Conference (MysuruCon), Mysuru, India, 2022, pp. 1-7, doi: 10.1109/MysuruCon55714.2022.9972647.
[9] Jemima Jebaseeli, T., Venkatesan, R., Ramalakshmi, K. (2021). Fraud Detection for Credit Card Transactions Using Random Forest Algorithm. In: Peter, J.,
Fernandes, S., Alavi, A. (eds) Intelligence in Big Data Technologies—Beyond the Hype. Advances in Intelligent Systems and Computing, vol 1167. Springer,
Singapore. https://doi.org/10.1007/978-981-15-5285-4_18
[10] E. Ileberi, Y. Sun and Z. Wang, "Performance Evaluation of Machine Learning Methods for Credit Card Fraud Detection Using SMOTE and AdaBoost," in
IEEE Access, vol. 9, pp. 165286-165294, 2021, doi: 10.1109/ACCESS.2021.3134330
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 2259
View publication stats