Optimizing Fraud Detection in Financial Transactions With
DOI: 10.1111/exsy.13682
ORIGINAL ARTICLE
Ezaz Mohammed Al-dahasi 1 | Rama Khaled Alsheikh 1 | Fakhri Alam Khan 1,2,3 | Gwanggil Jeon 4
1 Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
2 SDAIA-KFUPM Joint Research Center for Artificial Intelligence, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
3 Interdisciplinary Research Center of Intelligent Secure Systems (IRC-ISS), King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
4 Department of Embedded Systems Engineering, Incheon National University, Incheon, Korea
Correspondence
Gwanggil Jeon, Department of Embedded Systems Engineering, Incheon National University, 119 Academy-ro, Yeonsu-gu, Incheon 22012, Korea.
Email: [email protected]
Funding information
Saudi Data and AI Authority (SDAIA) and King Fahd University of Petroleum and Minerals (KFUPM), Grant/Award Number: JRC-AI-RFP-12
Abstract
The rapid advancement of the Internet and digital payments has transformed the landscape of financial transactions, leading to both technological progress and an alarming rise in cybercrime. This study addresses the critical issue of financial fraud detection in the era of digital payments, focusing on enhancing operational risk frameworks to mitigate the increasing threats. The objective is to improve the predictive performance of fraud detection systems using machine learning techniques. The methodology involves a comprehensive data preprocessing and model creation process, including one-hot encoding, feature selection, sampling, standardization, and tokenization. Six machine learning models are employed for fraud detection, and their hyperparameters are optimized. Evaluation metrics such as accuracy, precision, recall, and F1-score are used to assess model performance. Results reveal that XGBoost and Random Forest outperform other models, achieving a balance between false positives and false negatives. The study meets the requirements for fraud detection systems, ensuring accuracy, scalability, adaptability, and explainability. This paper provides valuable insights into the efficacy of machine learning models for financial fraud detection and emphasizes the importance of striking a balance between false positives and false negatives.
KEYWORDS
financial transactions, fraud detection, machine learning, Random Forest, XGBoost
1 | INTRODUCTION
The technological advancement of the Internet has brought about the rise of contemporary services, particularly in e-commerce and
money transfer, improving corporate management, cost-cutting, and productivity (Iranian Joint Congress on Fuzzy and Intelligent
Systems, 2015). The ability for businesses and organizations to perform transactions electronically has changed financial operations.
Cybercrime has increased as a result, with e-banking services being a particular target. These services are targeted by malicious
attackers, who cause annual losses in the billions of dollars and a sharp increase in fraud (Abdallah et al., 2016). Governments, organiza-
tions, and private citizens all suffer major losses as a result of financial fraud, which has a significant impact on the financial industry and
daily life (Choi & Lee, 2018). Businesses globally lose about $4 trillion annually due to fraud, which equals 5% of their revenues. The
Association of Certified Fraud Examiners (ACFE) reports that this includes financial fraud like payment fraud, identity theft, and embez-
zlement. Electronic banking services are highly prone to cyber-attacks, leading to losses worth billions of dollars every year. In 2020, the
Internet Crime Complaint Center (IC3) of the FBI received over 790,000 complaints related to cybercrime, resulting in reported losses
exceeding $4.2 billion. Financial fraud has far-reaching consequences that go beyond the immediate victims and businesses affected. It
Expert Systems. 2024;e13682. wileyonlinelibrary.com/journal/exsy © 2024 John Wiley & Sons Ltd. 1 of 18
has a significant impact on the global economy at large. According to a report by the International Monetary Fund (IMF), financial crimes
such as money laundering, corruption, and tax evasion constitute approximately 2%–5% of the global GDP, which amounts to trillions of
dollars annually. The critical need for effective fraud detection systems becomes evident in safeguarding financial transactions and miti-
gating cybercrime's impact on individuals, businesses, and the global economy. The current fraud detection systems are facing multiple
challenges that need improvement. First, with the rapid growth of digital payments and electronic services, traditional rule-based fraud
detection systems find it difficult to keep up with the evolving tactics of cybercriminals. These systems rely on predetermined rules and
thresholds, which make them less adaptable to new and sophisticated fraud patterns. The increase in the volume and complexity of
financial transactions makes it increasingly challenging to manually review every transaction for potential fraud. This situation creates a
need for more efficient and accurate automated fraud detection systems.
Moreover, cybercriminals continuously refine their techniques to avoid detection, using advanced evasion tactics or blending fraudulent activities with legitimate transactions. Thus, there is a growing demand for fraud detection systems that can effectively identify such intricate fraudulent behaviours while minimizing false positives to avoid disrupting legitimate transactions. The goal of fraud detection techniques is to spot suspicious activity in past transactions that indicates misuse of an electronic service. Given the unprecedented increase in fraud and financial crimes that has accompanied the rapid development of digital payments (Iranian Joint Congress on Fuzzy and Intelligent Systems, 2015), improving the management of operational risk in financial fraud detection systems is very important (Alfaiz & Fati, 2022). Researchers have taken a strong interest in anti-fraud measures, which has driven the creation of risk assessment and detection tools.
Traditional fraud detection methods may also struggle to handle the vast amounts of data generated by modern digital payment systems. Therefore, it is essential to leverage advanced technologies such as machine learning to analyse large datasets and extract meaningful insights to detect fraudulent activities more effectively. Machine learning (ML) is a potent tool for creating new products and services. Its effectiveness has been studied in numerous domains, which has resulted in significant advances. Cross-domain study is essential for examining the generalizability and transferability of ML approaches. It helps identify the strengths and weaknesses of ML algorithms, facilitating quicker development with less effort. For instance, ML algorithms that have proven highly accurate in the healthcare industry may be adapted to identify financial fraud. ML algorithms should therefore be designed and implemented with cross-domain analysis in mind (Khetani et al., 2023). Financial fraud, swindles, and extortion are serious risks to financial services, enterprises, and governments. They have wide-ranging repercussions and put people, organizations, and governments at risk (Jessica et al., 2023). Identifying and implementing efficient controls is essential for corporate governance bodies such as Audit and Risk Committees and Senior Management to prevent ongoing and unabated fraudulent activity (Rehman & Hashim, 2020).
In recent years, fraud and financial crimes have grown at an unprecedented rate as a result of the enormous development in digital payments (Leo et al., 2019). Fraudulent activities are one of the main sources of operational risk for businesses, especially those in the finance industry (Mashrur et al., 2020).
Financial fraud detection technologies are therefore becoming increasingly crucial for preventing fraud effectively and quickly. Within the area of operational risk, we focus on fraud detection and suspicious transaction detection in electronic services, with the aim of safeguarding the confidentiality and integrity of customers' transactions in line with the principles of the cyber security framework of the Saudi Arabian Monetary Authority (SAMA) (Al Sheikh, 2017). Accordingly, the contributions of our work are the following:
• Enhancing the operational risk framework for improving the predictive performance of financial fraud detection systems.
• Providing a possible security solution via a simulation framework that utilizes Machine Learning techniques on a large dataset to detect suspi-
cious transactions in financial electronic services.
• Enhancing the accuracy of the proposed model using feature engineering methods.
The structure of this paper is as follows: Section 2 reviews the related work on financial fraud detection systems. Section 3 sets out the requirements for fraud detection systems, and Section 4 identifies the research gap. We describe our methodology in Section 5. The subsequent sections report the results, discuss suggested future directions, and conclude the paper.
2 | LITERATURE REVIEW
One of the most significant challenges for banks is being able to go through all the transactions and pinpoint the ones that are of a questionable
nature. Financial institutions employ financial fraud technologies to screen and categorize transactions according to varying levels of suspicion
(Kannan & Srinath, 2018).
In this section, we provide an overview of the literature on financial fraud detection systems and the mechanisms used to improve the predictive performance of those systems. We conclude with a summary of the different approaches at the end of this section.
Operational risk, as defined by the Basel Committee on Banking Supervision (BCBS), refers to the potential for financial loss arising from deficiencies or failures in internal processes, personnel, and systems, or from external events. It is considered a vital component of risk management within the banking sector (AL-kiyumi et al., 2021). Annual reports exhibit varied forms of operational risk, encompassing several sub-risks: fraud risk, cyber security risk, risk associated with clients' products and business practices, information and resiliency risk, risk of money laundering and financial crime, vendor and outsourcing risk, technology risk, and risk of business disruption (Goodhart, 2011). Within the realm of operational risk, there is only a limited body of literature on the identification of fraud risk in credit cards and online banking. Much of the existing research addresses credit card fraud in domains that are not explicitly associated with bank risk management or the banking sector (Leo et al., 2019). Leo et al. (2019) also collected the types of risk and the literature that applied machine learning algorithms to them, and we provide a summary in Table 1. The successful mitigation of these risks is crucial to the overall performance of a financial institution.
Studies have shown the impact of operational risk management practices on the financial performance of commercial banks in different countries. These studies recommend allocating resources towards understanding operational risk, incorporating risk management practices into corporate strategy, and investing in operational risk management software. Fadun et al. (2020) found that operational risk has significant effects on the performance of commercial banks. Ondigo (2019) examined how market risk, credit risk, liquidity risk, and operational risk affect the financial performance of commercial banks. According to the findings of that study, credit risk, liquidity risk, and operational risk each have a significant negative impact on the financial performance of Kenyan commercial banks, with operational risk having the largest impact. In addition, research conducted in Indonesia revealed that both operational risk and liquidity risk had a sizeable and detrimental impact on the financial performance of banks listed on the Indonesian stock exchange (Sari et al., 2021).
Those studies, taken as a whole, illustrate how important it is for commercial banks to manage their operational risk, as this type of risk can
have a substantial impact on the financial performance of banks.
Kurshan et al. (2020) discuss the use of graph-based solutions for detecting fraud in digital transaction data, highlighting the potential complexity of these solutions given the size, speed, and nature of financial crime detection applications. They argue that adversarial tactics will be a major challenge and that the focus should be on improving the performance of existing and emerging graph-based solutions. With growing internet usage and an increasing number of companies operating online, financial fraud is becoming more prevalent, negatively impacting the economy. The authors note that machine learning and data mining approaches are being used to address this issue, but that improvements are needed in computation speed, the handling of large data volumes, and the identification of previously unseen attack patterns.
TABLE 1 Summary of methods and algorithms for mitigating operational risks (Leo et al., 2019).
Reference | Risk type | Risk management method/tool | Algorithm
Ngai et al. (2011) | Fraud risk | Risk monitoring | Neural networks, Bayesian belief network, Decision trees
Sudjianto et al. (2010) | Fraud risk | Risk monitoring | SVM, Classification Trees, Ensemble Learning, CART, C4.5, Bayesian belief networks, HMM
Khrestina et al. (2017) | Financial crime/money laundering | Risk monitoring | Logistic regression
Sharma and Choudhury (2016) | Fraud risk | Operational risk losses | SOM
Pun and Lawryshyn (2012) | Fraud risk | Operational risk losses | Neural networks, k-Nearest Neighbour, Naïve Bayesian, Decision tree
Peters et al. (2018) | Cybersecurity | Risk assessment (RCSA) | Non-linear clustering method
Alghofaili et al. (2020) used the Long Short-Term Memory (LSTM) technique to offer a deep learning-based strategy for the detection of financial fraud. Drawing on big data, the model aims to improve on the detection efficiency of existing methods. The proposed model was assessed on a real dataset of credit card frauds, and the outcomes were compared with another deep learning model, the auto-encoder, as well as with other machine learning methods. The experimental results demonstrated the LSTM's strong performance: it reached 99.95% accuracy in less than a minute. Financial fraud is a major issue worldwide and has significantly harmed the sustainable expansion of financial markets. Because non-fraudulent organizations greatly outnumber fraudulent ones, the available datasets are highly skewed, which makes fraud difficult to detect. As a result, intelligent algorithms for detecting financial statement fraud have been developed. Most existing approaches, especially those that deal with associated Chinese comments, take into account only the quantitative portion of the financial statement ratios.
The referenced study (Xiuguo & Shengyong, 2022) aims to enhance financial fraud detection using advanced deep learning techniques
applied to a blend of numerical features from financial statements and textual data from managerial remarks in 5130 Chinese listed businesses'
annual reports. By constructing a system of comprehensive financial indices, including previously overlooked non-financial sectors, the
researchers filled gaps in prior studies. Textual data from the Management Discussion and Analysis (MD&A) section was extracted using word
vectors. Deep learning models analysed this textual data alongside numerical features, achieving significant improvements over conventional
machine learning methods. Notably, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models achieved high correct classifica-
tion rates of 94.98% and 94.62% respectively on testing samples. These findings highlight the efficacy of textual features from the MD&A
section in enhancing financial fraud detection.
Traditional statistical methods, as well as more modern machine learning techniques, have been the focus of substantial study in the detection of
financial crimes (Leo et al., 2019). Machine learning is a type of artificial intelligence that analyses data from the past to enhance performance in a
given activity (or tasks) or to make more accurate predictions (Nathani & Singh, 2021). Machine learning is utilized in operational domains that
enable the mitigation of risk, that is, the identification and/or prevention of dangers. Within the area of operational risk, machine learning primar-
ily concentrates on issues pertaining to fraud detection and the identification of suspicious transactions, aside from cybersecurity situations
(Khrestina et al., 2017).
Bayesian algorithms, K-Nearest Neighbour, Support Vector Machines (SVM), and bagging ensemble classifiers based on decision trees have been employed to varying degrees in fraud detection systems (Zareapoor & Shamsolmoali, 2015). Mashrur et al. (2020) surveyed the most prominent machine learning research directed at financial risk management, concluding that many risk management tasks rely on machine learning. Alfaiz and Fati (2022) also used machine learning, with the aim of blocking fraudulent credit card transactions, and evaluated it on a real dataset. Theirs was a two-stage process in which nine machine learning algorithms were tested to detect fraudulent transactions, and the best three algorithms were carried forward to the second stage.
Their best-performing model combines the All K-Nearest Neighbours (AllKNN) undersampling technique with CatBoost (AllKNN-CatBoost). The AllKNN-CatBoost model was compared with related works, and the results indicated that it outperformed previous models, with an AUC of 97.94%, a recall of 95.91%, and an F1-score of 87.40%.
Other authors have worked on filtering the watch list of anti-money laundering systems by applying machine learning algorithms (Alkhalili et al., 2021). They proposed a model that automates the checking of blocked transactions and compared the performance of different machine learning algorithms. Support vector machines (SVM) were found to outperform the other algorithms, and a high-level architecture for integrating the machine learning component into existing systems was proposed. The model consists of three phases (monitoring, advising, and acting) and aims to improve the efficiency and accuracy of watch-list filtering by leveraging historical transaction data and additional information about transactions and blacklisted entities. The ML component can provide suggestions for blocked transactions, reducing the need for human involvement, and the model can be easily added to an existing watch-list filtering system.
In another study, the authors developed a Deep Neural Network (DNN) algorithm to predict Bitcoin's price, using previously extracted features of Bitcoin and removing uninformative features. The results showed an accuracy of 53.4% and a mean squared error (MSE) of 1.02 (Ngai et al., 2011).
The authors in Khosravi et al. (2023) focused on the development of fraud prediction models for the banking transaction network. They
accomplished this by implementing many supervised machine-learning techniques. To test the performance of these algorithms, they utilized a
dataset comprising 46,316 client transactions. Additionally, the dataset included 25 features that were retrieved from the transaction network.
Features with a Pearson correlation coefficient greater than 0.8 were eliminated. The evaluation of the models' performance was conducted using
metrics like accuracy, recall, precision, and F1-score. The study findings revealed the efficacy of machine learning models in the detection of orga-
nized fraud in banking transactions. The models exhibited high accuracy, recall, precision, and F1-score values, showing their potential for effec-
tive fraud detection in the banking sector.
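The correlation-based filtering step described above can be illustrated with a minimal sketch. It is not the implementation used by Khosravi et al. (2023): the column names and values are hypothetical, the absolute value of the coefficient is compared against the 0.8 threshold, and the convention of keeping the first feature of each correlated pair is an assumption made here for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation; treated as 0 here
    return cov / sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.8):
    """Keep a feature only if |r| <= threshold against every feature already kept.

    Assumption: when two features are highly correlated, the one that appears
    later in the dictionary is the one that is dropped.
    """
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy columns, not taken from the cited dataset.
features = {
    "amount": [10.0, 20.0, 30.0, 40.0, 50.0],
    "amount_with_fee": [11.0, 21.5, 32.0, 42.5, 53.0],  # linear in "amount"
    "hour_of_day": [3.0, 14.0, 9.0, 22.0, 1.0],
}
selected = drop_correlated(features)
print(selected)
```

In this toy example, "amount_with_fee" is an exact linear function of "amount" and is therefore removed, while "hour_of_day" is only weakly correlated with "amount" and is retained.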
The authors in Taneja et al. (2019) used balancing techniques in combination with ensemble models for credit card fraud detection. They
compared different balancing techniques and evaluated their performance using classifiers such as Random Forest, XGBoost, and LGBoost.
The study's findings indicate that the most effective approach for credit card fraud detection was balancing the dataset using SVM SMOTE
and training it with the Random Forest classifier. The combination yielded a recall rate of 0.80, a precision rate of 0.91, and an F-score
of 0.85.
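As a consistency check on the figures reported above, the F-score can be recomputed from the stated precision and recall. The sketch below assumes that the reported F-score is the standard F1 measure, that is, the harmonic mean of precision and recall; the function name is ours.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported for SVM SMOTE with Random Forest (Taneja et al., 2019).
reported_precision = 0.91
reported_recall = 0.80

f1 = f_score(reported_precision, reported_recall)
print(round(f1, 2))
```

Rounded to two decimals, the computed value agrees with the reported F-score of 0.85.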
In Sadgali et al. (2020), the authors presented a framework that serves as a credit card fraud detection tool. Their work contributed to considering
human behavioural factors and imbalanced data, as well as discovering unusual transactions that could not be considered using traditional
methods. Their framework included different detection algorithms with the aim of improving accuracy, and its design consisted of four compo-
nents to deal with (1) data storage, (2) decision-making, (3) behaviour analysis, and (4) ensuring a good authentication filter.
We conclude that their work exemplifies an adaptive strategy for detecting credit card fraud, one that achieves a high level of accuracy while considering the nature of the transaction and the customer's profile. It is also a multi-level framework because it incorporates the transaction file itself, the client profile, and the banking security component.
Ashfaq et al. (2022) discussed the issues of fraud and anomalies in the Bitcoin network. The financial industry is facing increasing fraud, particularly in online transactions and e-banking. Blockchain technology, once considered safe, has also seen an increase in fraud. To combat this, a secure blockchain and machine learning-based fraud detection strategy was proposed. XGBoost and Random Forest (RF) machine learning algorithms were used for transaction classification and prediction. The model was evaluated in terms of accuracy, precision, and AUC.
A security analysis was conducted to demonstrate the system's reliability, and an attacker model was proposed to protect the system from attacks.
A multi-feature behaviour approximation model was developed for efficient botnet detection, analysing user and intermediate node behaviour (Amala Dhaya & Ravi, 2021). The model incorporates a preprocessing phase in which the traces of transactions conducted by various users are collected and validated to ensure the inclusion of all relevant attributes. By considering these features and estimating the Mean First Trust Score (MFTS), the model can assess the reliability of source accounts and identify instances of botnet attacks. Mitigation involves measuring the Backward Trust Score (BTS) and then eliminating compromised nodes. By leveraging the MFTS, it is possible to detect the presence of a botnet and eliminate the malicious node to lessen the botnet's effects. The algorithm improved performance and decreased false classification ratios by considering a variety of characteristics and trust measures. This helps to enhance the confidentiality of financial transactions and mitigate financial fraud.
3 | REQUIREMENTS FOR FRAUD DETECTION SYSTEMS
A good fraud detection system should have the following main requirements (Kalbande et al., 2021):
• Accuracy: The system should be able to accurately identify fraudulent transactions while minimizing false positives and false negatives.
• Real-time detection: The system should be capable of detecting fraud in real-time or near real-time to prevent financial losses and mitigate risks.
• Scalability: The system should be able to handle large volumes of data and transactions, as the number of transactions in banking systems can
be substantial.
• Adaptability: The system should be adaptable to changing fraud patterns and techniques. It should be able to learn from new data and update
its models to detect emerging fraud patterns.
• Integration: The system should be easily integrated into existing banking systems and processes, allowing for seamless monitoring and detec-
tion of fraudulent activities.
• Explainability: The system should provide explanations or insights into the factors or features that contribute to the classification of a transac-
tion as fraudulent. This helps in understanding the reasoning behind the system's decisions and aids in fraud investigations.
• Compliance: The system should comply with regulatory requirements and standards, such as anti-money laundering (AML) regulations, to
ensure legal and ethical practices.
Table 2 provides a summary of related papers in terms of fraud detection systems requirements, with symbols indicating whether each
requirement is verified (√), not achieved (), or not mentioned in the paper (-).
The literature review covers several studies that focus on managing operational risks at banks, detecting financial fraud, and using machine
learning algorithms and other techniques for fraud detection. In this discussion, we will explore the common trends, gaps, and areas of disagree-
ment identified in these studies.
The studies point out the importance of understanding operational risks, integrating risk management practices into corporate strategy, and
investing in operational risk management software to improve the financial performance of commercial banks. However, there is limited literature
specifically addressing fraud risk identification in credit cards and online banking within the operational risk management realm. Further research is
necessary in this area to develop effective strategies for mitigating fraud risk. While some studies highlight the significant negative impact of opera-
tional risk on the financial performance of commercial banks, others may have different perspectives on the extent of this impact.
As financial fraud continues to be a major concern, researchers are turning to machine learning and data mining to improve fraud detection
systems. Deep learning techniques like Long Short-Term Memory and Gated Recurrent Units are becoming popular for their ability to analyse
both numerical and textual data and achieve high accuracy rates. However, challenges remain, such as handling large datasets, identifying previously unseen attack patterns, and computation speed. Further research is needed to address these challenges and enhance the efficiency of fraud detection systems. Some studies focus on improving existing fraud detection solutions, while others explore alternative approaches or emphasize different aspects of fraud detection, leading to varying perspectives on the most effective strategies.
Various machine learning algorithms, such as Support Vector Machines (SVM), ensemble classifiers, and deep learning models, are widely used for detecting fraudulent activities. These algorithms have shown high accuracy in identifying organized fraud in banking transactions. However, different studies recommend different machine learning algorithms or techniques based on their own findings and experimental results. It is therefore essential to evaluate and compare these algorithms comprehensively on factors such as performance metrics, scalability, and adaptability to different types of fraud, so that the most appropriate algorithm for fraud detection can be selected.
Various studies have been conducted to develop innovative approaches for fraud detection, such as blockchain technology, behavioural anal-
ysis, and botnet detection. The main objective of these techniques is to improve the accuracy and reliability of fraud detection systems. However,
the effectiveness of these approaches in practical applications may vary, and more empirical studies and real-world evaluations are necessary to
validate their efficacy. Different studies may prioritize different techniques or methodologies for fraud detection, which can lead to varying opin-
ions on the most effective strategies to combat financial fraud.
4 | RESEARCH GAP
The financial sector is a crucial component of the global economy, serving as a fundamental catalyst for economic expansion and stability. Nevertheless, the growing complexity of financial transactions and the extensive integration of digital technology have substantially increased the sector's vulnerability to financial fraud. Detecting fraud in financial systems requires certain requirements to be fulfilled, and Table 2 shows that several articles have addressed some of them. Prior research also shows that machine learning algorithms have emerged as an essential means of addressing financial fraud, as reflected in the related works summarized in Table 1. It is important to recognize that multiple factors can influence the effectiveness of machine learning models in identifying fraudulent activities. These factors include the limited availability of authentic datasets, class imbalance, the sheer size of the datasets, the quality and completeness of the data, the selection of features, the choice of hyperparameters, and the dynamic nature of fraud.
Our contribution is a framework that outlines how machine learning can be used to detect fraud in the financial system and strengthen the system's overall security. The primary focus of our work is on:
• Optimizing the detection rate while limiting the occurrence of false positives. Models are trained on samples of fraudulent and valid transactions using supervised detection methods.
• Using a different dataset and comparing the results with those of the related studies.
• Implementing the requirements needed to increase the efficiency of fraud detection systems in financial systems.
TABLE 2 The summary of related papers in terms of fraud detection systems requirements.
Financial institutions can greatly improve their security measures by combining several techniques, including data preprocessing, anomaly detection, and supervised learning. Machine learning techniques not only help mitigate financial fraud but also help ensure that lawful transactions are processed smoothly. As financial systems continue to evolve, preserving integrity and confidence within the financial sector requires the continued use of machine learning-based fraud detection.
5 | METHODOLOGY
This section provides an explanation of the methodology employed in this study. The data has been preprocessed, and a classifier model has been
created. The performance of the model is assessed to evaluate its effectiveness.
When choosing a dataset for fraud detection, it is crucial to consider several criteria, such as the pertinence of the domain, the quality of the data, and the particular type of fraud to be detected. The present study uses a dataset sourced from Vesta's authentic e-commerce transactions conducted in 2019. This dataset contains 1,042,574 instances and serves as the basis for predicting the likelihood of fraudulent online transactions. It comprises a diverse group of features, including variables such as device type and product characteristics (Kaggle, 2019). Table 3 presents the percentage of samples in each class, and Figure 1 shows the distribution of the dataset based on our target feature (isFraud).
TABLE 3 Percentage of samples in each class.
Class            Percentage of samples (%)
1 (fraud)        0.108194
0 (non-fraud)    99.891806
Feature extraction: The dataset under consideration comprises 11 distinct features. Table 4 provides a comprehensive representation of each indi-
vidual feature alongside its corresponding description.
One-hot encoding: This is a method used in machine learning and data preprocessing to convert categorical variables into binary vectors, a form that machine learning models can handle directly. We apply one-hot encoding to the ‘type’ column, as shown in Figure 2. This column holds categorical data describing the nature of the transaction. One-hot encoding converts it into a binary matrix in which each column represents a distinct transaction type, which suits machine learning models that require numerical input.
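As an illustration of the encoding step, the following minimal sketch one-hot encodes a ‘type’ column with pandas. The transaction type values and the amount column are assumptions made for the example, not values taken from the study's data.

```python
import pandas as pd

# Toy transactions; the 'type' values below are illustrative assumptions.
df = pd.DataFrame({
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "TRANSFER"],
    "amount": [120.0, 9000.0, 450.5, 30.0],
})

# Replace 'type' with one binary indicator column per distinct category.
encoded = pd.get_dummies(df, columns=["type"], prefix="type", dtype=int)
print(encoded)
```

Each row has exactly one indicator column set to 1, so the categorical information is preserved without implying any ordering between transaction types.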
Feature selection: In this step, the appropriate variables are selected from the dataset. Feature selection is essential because it can improve the performance and accuracy of the models, and it helps extract the crucial information from a large dataset in order to reduce processing time (Kumar, 2014). The dataset contains 11 features. After examining the correlation among them, two features (newbalanceDest and newbalanceOrig) were removed because their Pearson correlation coefficient exceeded the threshold of 0.9. A threshold of 0.9 aligns with common practice in feature selection and aims to balance feature informativeness against collinearity. Figure 3 displays the correlation matrix.
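A correlation filter of this kind can be sketched as follows. This is one possible implementation under stated assumptions rather than the study's code: the helper name drop_highly_correlated, the synthetic data, and the column oldbalance are hypothetical, and only the 0.9 threshold and the name newbalanceOrig come from the text.

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df, threshold=0.9):
    """Drop one column from every pair whose absolute Pearson r exceeds threshold."""
    corr = df.corr(method="pearson").abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

# Synthetic example: the second column is almost a copy of the first.
rng = np.random.default_rng(0)
old_balance = rng.uniform(0, 1000, size=200)
toy = pd.DataFrame({
    "oldbalance": old_balance,                                   # hypothetical column
    "newbalanceOrig": old_balance + rng.normal(0, 1, size=200),  # near-duplicate
    "amount": rng.uniform(0, 500, size=200),
})

reduced, dropped = drop_highly_correlated(toy, threshold=0.9)
print(dropped)
```

The filter keeps the first column of each highly correlated pair and drops the later one, which is a common convention but not the only reasonable choice.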
Sampling: Imbalanced datasets, which are common across many domains, have a heavily skewed class distribution. In tasks such as fraud detection, using the original dataset without sampling may result in biased model performance and inaccurate results. The imbalance between the majority class (non-fraudulent transactions) and the minority class (fraudulent transactions) can skew the model towards predicting the majority class, so that it fails to detect fraud accurately. To address this, a range of sampling methods is available, including oversampling, undersampling, and hybrid approaches. Undersampling reduces the number of instances of the majority class, whereas oversampling increases the number of instances of the minority class. Hybrid approaches combine the two, for example the Synthetic Minority Oversampling Technique (SMOTE) with Tomek links. In each case, the objective is to reduce class imbalance.
We applied undersampling to the majority class because our dataset is imbalanced, with 1,041,446 non-fraudulent instances and just 1,128 fraudulent ones. The balanced target labelling that distinguishes between ‘Not Fraud’ and ‘Fraud’ is visually represented in Figures 4 and 5.
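Random undersampling of the majority class can be sketched in a few lines of NumPy. This is a stand-in written for illustration and not the sampler used in the study; the function name random_undersample and the toy class sizes are assumptions.

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)  # avoid returning the classes in contiguous blocks
    return X[keep], y[keep]

# Toy data: 990 legitimate rows (label 0) and 10 fraudulent rows (label 1).
y = np.array([0] * 990 + [1] * 10)
X = np.arange(1000).reshape(-1, 1)

X_bal, y_bal = random_undersample(X, y)
print(np.bincount(y_bal))
```

All minority rows are kept and only the majority class is thinned, which yields the 50/50 split at the cost of discarding most legitimate transactions.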
[Figure: class distribution after undersampling. Not Fraud: 50%; Fraud: 50%.]
Standardization: Standardization is a preprocessing step that rescales numerical features to have zero mean and unit variance: each feature's mean is subtracted, and the result is divided by the feature's standard deviation. It is crucial when the features in the dataset have varying scales, especially for machine learning techniques that depend on gradients or on distance-based measures. We use it here to rescale the numerical features.
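The rescaling described above amounts to a z-score transform, sketched below in NumPy. Fitting the mean and standard deviation on the training rows only, and reusing them for the test rows, is an assumption of this sketch and is not stated in the text; the function name and the toy values are also illustrative.

```python
import numpy as np

def standardize(train, test):
    """Z-score both arrays using the column means and deviations of `train`."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # leave constant columns unscaled
    return (train - mean) / std, (test - mean) / std

# Two features on very different scales (illustrative values).
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
test = np.array([[2.0, 400.0]])

train_z, test_z = standardize(train, test)
print(train_z.mean(axis=0), train_z.std(axis=0))
```

After the transform both training columns have mean 0 and standard deviation 1, so neither dominates a distance or gradient computation merely because of its units.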
Tokenization: Because many machine learning models require numerical input, text must be transformed into a format suitable for training; this process is called tokenization. In our context, tokenization turns customer names into sequences of tokens, which could be words or characters, and then pads these sequences to a fixed length. This lets us provide customer names to machine learning models in a uniform fashion. After that, unnecessary columns (nameOrig, nameDest, and isFlaggedFraud) are dropped.
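A character-level version of this tokenize-and-pad step might look like the sketch below. The helper names, the example identifiers, and the choice of 0 as the padding id are assumptions for illustration; the text does not specify the tokenizer that was actually used.

```python
def build_vocab(names):
    """Map every character seen in `names` to a positive integer id (0 is padding)."""
    chars = sorted({ch for name in names for ch in name})
    return {ch: idx for idx, ch in enumerate(chars, start=1)}

def encode(name, vocab, max_len):
    """Convert a name to character ids, truncate to max_len, then right-pad with 0."""
    ids = [vocab.get(ch, 0) for ch in name][:max_len]
    return ids + [0] * (max_len - len(ids))

names = ["C123", "M45"]  # hypothetical customer identifiers
vocab = build_vocab(names)
padded = [encode(name, vocab, max_len=6) for name in names]
print(padded)  # [[6, 1, 2, 3, 0, 0], [7, 4, 5, 0, 0, 0]]
```

Every name becomes a list of the same length, which is the uniform shape that downstream models expect. Characters missing from the vocabulary fall back to the padding id in this sketch.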
It is noteworthy that the algorithms mentioned earlier (MBO, EWA, EHO, MS, SMA, and HHO) are not commonly endorsed or widely employed for fraud detection in financial transactions. Rather, they are nature-inspired optimization algorithms, intended primarily for solving optimization problems in fields such as engineering and operations research. They could potentially be adapted to fraud detection tasks, but that is not their primary application.
Fraud detection in financial transactions typically entails the examination and analysis of patterns, anomalies, and statistical indicators within the transactional data. The present study applies six machine learning techniques, namely Naive Bayes, Random Forest, Support Vector Machine (SVM), Logistic Regression, Decision Tree, and KNN, to analyse a dataset of digital transactions. All of these are supervised machine learning approaches, in which the model is trained using labelled data. They were chosen for their capacity to learn from historical transactional data, discern patterns, and classify transactions as either fraudulent or legitimate based on a range of features and indicators.
In order to effectively execute and deploy the models, it is necessary to first find the appropriate hyperparameters for each model. This process
involves finding the best combination of hyperparameters that can improve the performance of the model on a given dataset. To achieve this, we
define a grid of hyperparameters specifically tailored to the characteristics of the algorithm. For example, in logistic regression, we experiment
with different values of the regularization parameter and penalty term. Similarly, for decision trees, we vary parameters such as the split criterion,
maximum depth, and minimum samples per leaf node. The dataset was divided into two separate sets, specifically the training set and the testing
set. The process of identifying the optimal hyperparameters was carried out solely on the training dataset. The optimized hyperparameters for
each model are displayed in Table 5.
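The tuning procedure described above can be sketched with scikit-learn's GridSearchCV for the decision tree case. The synthetic data, the grid values, the 80/20 split, and the F1 scoring choice are assumptions made for this example; the optimized values reported in Table 5 are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed transaction features.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Grid over the split criterion, maximum depth, and minimum samples per leaf.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5, 10],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="f1",
    cv=5,
)
search.fit(X_train, y_train)  # the search only ever sees the training set

print(search.best_params_)
print(search.score(X_test, y_test))  # F1 of the refitted best model on held-out data
```

Keeping the test set out of the search means the final score is not inflated by the hyperparameter selection itself.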
In this work, the model findings are validated using 5-fold and 10-fold cross-validation. SHapley Additive exPlanations (SHAP), a unified framework for explaining the output of any machine learning model, is also applied. SHAP attributes the prediction for an instance to its features, thereby explaining how each feature contributes to the prediction.
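The 5-fold and 10-fold validation mentioned above can be sketched with scikit-learn's cross_val_score. The synthetic data, the Random Forest settings, the stratified shuffled folds, and the F1 scoring are assumptions of this example rather than details taken from the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the balanced transaction dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

scores = {}
for k in (5, 10):
    # Stratified folds keep the class ratio the same in every fold.
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    scores[k] = cross_val_score(model, X, y, cv=cv, scoring="f1")
    print(f"{k}-fold F1: mean={scores[k].mean():.3f}, std={scores[k].std():.3f}")
```

Reporting the spread across folds alongside the mean gives a rough indication of how stable the model is across different subsets of the data.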
In this work, the ML models and their validation are assessed using Accuracy, Precision, Recall, and F1-Score. Accuracy is the proportion of all predictions that the model gets right. Precision is the proportion of positive predictions that are true positives. Recall, also known as sensitivity, is the proportion of actual positive instances that the model correctly identifies. F1-Score is the harmonic mean of Precision and Recall and provides a balanced measure of a model's performance, particularly in scenarios with imbalanced class distributions. Tables 6 and 7 display the best performance results of the models on the dataset, obtained with undersampling and with 5-fold and 10-fold cross-validation. The ROC curve illustrates the trade-off between True Positive Rate (TPR) and False Positive Rate (FPR) across different threshold settings. A higher area under the ROC curve (AUC) generally indicates better discrimination between positive and negative instances. Figures 6 and 7 show the ROC curves for the models with 5-fold and 10-fold cross-validation, respectively.
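The four metrics follow directly from the confusion-matrix counts, as the short sketch below shows. The counts are invented for illustration and are not results from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative counts: 90 frauds caught, 30 missed, 10 false alarms.
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts accuracy is 0.96 while recall is only 0.75, which illustrates why accuracy alone can look reassuring even when a quarter of the fraudulent cases are missed.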
In fraud detection for a financial system, the major objective is usually to reduce the number of cases in which fraudulent conduct goes undiscovered (false negatives) while keeping the number of cases in which normal activity is reported as fraudulent (false positives) at an acceptable level. A false positive occurs when the system mistakenly flags a legitimate transaction as fraudulent. Reducing false positives spares clients unnecessary inconveniences such as blocked transactions or fraud notifications. A false negative occurs when the system fails to recognize a genuinely fraudulent transaction, which therefore goes undiscovered. Reducing false negatives enhances the system's ability to identify and stop fraudulent activity. Failing to detect fraudulent activity accurately can lead to financial losses and damage the financial institution's reputation.
The incidence of false positives and false negatives must regularly be balanced in the field of fraud detection. For the following reasons, it is
crucial to reach the ideal equilibrium:
• High False Positives: The possible drop in business and the inconvenience that clients are experiencing.
• High False Negatives: Losses of money and damage to the reputation of the financial institution.
The balance will vary based on the financial institution's risk tolerance and the nature of its operations. We therefore applied machine learning to improve and refine the fraud detection models in order to achieve the required balance between false positive and false negative identifications.
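One common way to see this trade-off is to vary the decision threshold applied to a model's fraud scores and count the resulting errors. The sketch below is illustrative only: the scores and labels are invented, and threshold sweeping is an assumption of this example rather than a procedure described in the study.

```python
import numpy as np

def fp_fn_at_threshold(y_true, scores, threshold):
    """Count false positives and false negatives when flagging scores >= threshold."""
    flagged = scores >= threshold
    false_pos = int(np.sum(flagged & (y_true == 0)))
    false_neg = int(np.sum(~flagged & (y_true == 1)))
    return false_pos, false_neg

# Invented labels (1 = fraud) and model scores for eight transactions.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
scores = np.array([0.05, 0.20, 0.35, 0.55, 0.45, 0.70, 0.80, 0.95])

results = {t: fp_fn_at_threshold(y_true, scores, t) for t in (0.3, 0.5, 0.9)}
for threshold, (fp, fn) in results.items():
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

In this toy case, raising the threshold from 0.3 to 0.9 removes all three false alarms but lets two frauds through, so an institution with low tolerance for missed fraud would favour the lower setting.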
The metrics offer a thorough assessment of our model's performance for each individual class as well as overall. Some models have strong
performance, as seen by the high values for precision, recall, and F1-score. Our focus was on the indicators that are specifically pertinent to our
objective, based on our work.
The elevated F1-Score signifies a commendable equilibrium between precision and recall. Recall measures the model's capacity to catch all positive cases: it is the proportion of correctly identified positive cases in relation to the total number of actual positive cases, that is, true positives plus false negatives. A greater recall implies a reduced number of false negatives.
XGBoost and Random Forest perform better than the other models in all four evaluation categories. The DecisionTreeClassifier and Logistic Regression follow in terms of efficacy. The GaussianNB and KNeighborsClassifier methods, on the other hand, appear to be relatively less effective in producing the intended results.
Upon implementing SHAP, as depicted in Figure 8, we acquired the following insights:
• 678 it: It suggests that 678 iterations (features or samples) are being processed.
• 00:12 (from the progress readout [00:12, 4.94 it/s]): The total time taken for the 678 iterations is 12 s.
• 4.94 it/s: The speed of iterations, indicating that approximately 4.94 iterations are processed per second.
To summarize, this result is a byproduct of interpreting our machine learning model with SHAP's PermutationExplainer. Shapley values, which
show how each feature contributes to the model's prediction, are computed by iterating over features or samples. To help us understand the com-
putational efficiency, the speed and amount of time required are given. After analysing the SHAP values, we can gain significant insights into the
factors that influence the model's decisions. By interpreting our models using SHAP values, we have identified several crucial features that greatly
impact the model's decisions in identifying fraudulent transactions. For instance, we have found that transaction amount, transaction type, and
account balances are among the most influential features that affect the model's predictions. SHAP summary plots and feature importance rank-
ings provide easy-to-understand visualizations of these key features, making their interpretation more accessible.
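To make the idea of a Shapley value concrete, the sketch below computes exact values for a tiny hand-written scoring function by averaging each feature's marginal contribution over all feature orderings. It does not use the SHAP library and is not the study's model: the function toy_model, its coefficients, the feature names, and the baseline are all hypothetical. Permutation-based explainers typically sample orderings instead of enumerating them, which is what makes them practical for more than a handful of features.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)      # start from the reference input
        previous = predict(current)
        for i in order:
            current[i] = x[i]         # switch feature i to its actual value
            updated = predict(current)
            phi[i] += updated - previous
            previous = updated
    return [total / len(orders) for total in phi]

def toy_model(features):
    """Hypothetical fraud score with one interaction between type and amount."""
    amount, is_transfer, balance = features
    return (0.002 * amount + 0.5 * is_transfer - 0.001 * balance
            + 0.3 * is_transfer * (amount > 1000))

x = [2000.0, 1.0, 100.0]        # transaction being explained
baseline = [100.0, 0.0, 500.0]  # reference transaction
phi = shapley_values(toy_model, x, baseline)

for name, value in zip(["amount", "is_transfer", "balance"], phi):
    print(f"{name}: {value:+.3f}")
```

The contributions sum to the difference between the score for the explained transaction and the score for the baseline, which is the property that lets per-feature values be read as a decomposition of a single prediction.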
In addition to identifying key features, our approach enables us to provide explanations for the model's decisions. By highlighting suspicious
patterns or behaviours in transactions, our interpretable models generate explanations that are easy for humans to read, which facilitates under-
standing and validation of the model's predictions. These explanations can greatly assist investigators and financial analysts in detecting and miti-
gating fraudulent activities effectively.
We conducted further research and implemented the direction proposed in Gupta et al. (2021) for addressing imbalanced data. Gupta et al. (2021) applied random sampling to their imbalanced dataset of 284,807 instances and built models using the SVC, LogisticRegression, RandomForestClassifier, and GaussianNB ML algorithms. Their results are shown in Table 8.
They provided a suggestion involving constructing a more efficient model utilizing alternative sampling strategies, which demonstrated
improved accuracy. Random sampling is a broad method used to construct samples that accurately represent populations. However, Random
Under-Sampling, which we employed, is a specific application of this technique that addresses imbalanced datasets. The latter concentrates on
diminishing the size of the majority class to achieve a more equitable distribution of classes, which is essential for training models to prevent
prejudiced predictions in favour of the majority class. The efficacy of our findings is evident to us, whereas their work reported results of 100%, which suggests a noticeable bias.
The study by Gupta et al. (2021) relies mainly on the naive Bayes technique for dataset classification. Our findings, however, suggest that this technique is relatively less effective in attaining the targeted results. In addition, LogisticRegression and RandomForestClassifier perform better in our experiments than in their reported results, as evidenced by the F1-score and Recall metrics.
The comparison may be biased by the disparity in dataset size and the difference in approaches to handling imbalanced datasets. Nevertheless, our work improves on previous studies by employing an alternative technique for addressing imbalanced datasets and by leveraging powerful machine learning algorithms such as XGBoost, which is known for its computational efficiency and is engineered to scale to large datasets with many features (Prihandini, 2023).
6.2 | Fraud detection system requirements: A comparative analysis with previous work
Based on the requirements specified earlier, and after comparing with the studies presented in Table 2, the achievements within the scope of this work are listed below:
Accuracy: The work employs a range of machine learning models, such as logistic regression, decision tree, Gaussian Naive Bayes, random for-
est, XGBoost, and k-nearest neighbours (KNN). The cross-validation process evaluates the precision of each model by utilizing the cross_val_score
function.
Real-time detection: The work does not focus on the detection of events in real-time.
Scalability: The work addresses the issue of class imbalance by employing random under-sampling, specifically using the
RandomUnderSampler technique. This technique helps to balance the target variable. The effects of scalability can be apparent when training
models and tuning hyperparameters. Scalability considerations encompass the effective management of substantial datasets.
Adaptability: The work involves conducting hyperparameter tuning for each model using GridSearchCV, a technique that enables the models
to be adjusted to the data.
Integration: The work involves utilizing established machine learning libraries such as scikit-learn and XGBoost, hence ensuring compatibility
with pre-existing machine learning ecosystems.
Explainability: The work involves utilizing the SHAP technique to achieve model explainability. The capacity to comprehend the variables that
determine the designation of a transaction as fraudulent is of utmost importance.
Compliance: Although our dataset originates from Kaggle, a platform for hosting and sharing datasets, the primary duty for adhering to legal
and ethical standards resides with the dataset producers and competition organizers. Kaggle promotes the observance of ethical rules and
legal standards in the creation, uploading, and utilization of datasets.
We performed an exploratory data analysis on a big dataset with skewed data to identify the patterns associated with fraudulent actions. A
machine learning model was constructed to categorize transactions as either fraudulent or non-fraudulent, with the objective of minimizing the
occurrence of false negatives, as explained previously. Additionally, we are focused on applying the requisite criteria to enhance the efficacy of
fraud detection systems within financial systems, which constitutes the main objective of our work.
7 | THREAT TO VALIDITY
The ability of a fraud detection system to work effectively on unseen data beyond the training dataset is crucial. This is known as generalizability,
and it can be enhanced by relying on a diverse and representative dataset along with robust validation techniques. By addressing these aspects,
the proposed approach can demonstrate its effectiveness not only in the specific context of the study but also in real-world applications where
the models need to generalize well to new data and scale efficiently with growing datasets and transaction volumes.
8.1 | Conclusion
In conclusion, this work delves into the imperative realm of financial fraud detection within the evolving landscape of digital payments. The
study's objective, centred around enhancing operational risk frameworks, is systematically approached through a meticulous methodology
encompassing dataset selection, preprocessing, feature engineering, and model creation. Notably, XGBoost and Random Forest emerge as
frontrunners in effectively balancing precision and recall, crucial for minimizing both false positives and false negatives. The application of the
SHAP technique adds a layer of interpretability to the models, offering insights into feature contributions. The study fulfils key requirements for
fraud detection systems, ensuring accuracy, scalability, and explainability. Overall, the work contributes to the ongoing discourse on mitigating
financial fraud risks, shedding light on the applicability and efficacy of machine learning models in safeguarding digital transactions.
Future directions in fraud detection systems should focus on enhancing model robustness through ensemble methods and deeper architectures,
enabling real-time detection with streaming analytics. Ensuring transparency and interpretability in machine learning models is critical, and
research should evolve towards adaptive learning systems and blockchain integration for heightened security. Cross-industry collaboration, inte-
gration of behavioural biometrics, and advanced authentication methods are essential, along with a focus on ethical AI considerations. Preparing
for the impact of quantum computing on security and prioritizing user education will contribute to a more resilient and proactive defence against
emerging cyber threats in the digital finance landscape.
ORCID
Fakhri Alam Khan https://orcid.org/0000-0002-9130-1874
Gwanggil Jeon https://orcid.org/0000-0002-0651-4278
REFERENCES
Abdallah, A., Maarof, M. A., & Zainal, A. (2016). Fraud detection system: A survey. Journal of Network and Computer Applications, 68, 90–113. https://doi.org/10.1016/j.jnca.2016.04.007
Al Sheikh, A. (2017). Cyber security framework Saudi Arabian monetary authority.
Alfaiz, N. S., & Fati, S. M. (2022). Enhanced credit card fraud detection model using machine learning. Electron, 11(4), 662. https://doi.org/10.3390/electronics11040662
Alghofaili, Y., Albattah, A., & Rassam, M. A. (2020). A financial fraud detection model based on LSTM deep learning technique. Journal of Applied Security Research, 15, 498–516. https://api.semanticscholar.org/CorpusID:225056799
Alkhalili, M., Qutqut, M. H., & Almasalha, F. (2021). Investigation of applying machine learning for watch-list filtering in anti-money laundering. IEEE Access, 9, 18481–18496. https://doi.org/10.1109/ACCESS.2021.3052313
AL-kiyumi, R. K., AL-hattali, Z. N., & Ahmed, E. R. (2021). Operational risk management and customer complaints in Omani banks. Journal of Governance and Integrity, 5(1), 200–210. https://doi.org/10.15282/jgi.5.1.2021.7031
Amala Dhaya, M. D., & Ravi, R. (2021). Multi feature behavior approximation model based efficient botnet detection to mitigate financial frauds. Journal of Ambient Intelligence and Humanized Computing, 12(3), 3799–3806. https://doi.org/10.1007/s12652-020-01677-w
Ashfaq, T., Khalid, R., Yahaya, A. S., Aslam, S., Azar, A. T., Alsafari, S., & Hameed, I. A. (2022). Detection mechanism (pp. 1–20). MDPI.
Chen, S. (2022). Cryptocurrency financial risk analysis based on deep machine learning. Complexity, 2022, 1–8. https://doi.org/10.1155/2022/2611063
Choi, D., & Lee, K. (2018). An artificial intelligence approach to financial fraud detection under IoT environment: A survey and implementation. Security and Communication Networks, 2018, 5483472:1–5483472:15. https://doi.org/10.1155/2018/5483472
Fadun, D., Olajide, S., & Oye, D. (2020). Impacts of operational risk management on financial performance: A case of commercial banks in Nigeria. International Journal of Finance & Banking Studies, 9, 22–35.
Goodhart, C. A. E. (2011). The Basel committee on banking supervision: A history of the early years 1974–1997. https://api.semanticscholar.org/CorpusID:202245837
Gupta, A., Lohani, M. C., & Manchanda, M. (2021). Financial fraud detection using Naive Bayes algorithm in highly imbalance data set. Journal of Discrete Mathematical Sciences and Cryptography, 24(5), 1559–1572. https://doi.org/10.1080/09720529.2021.1969733
Iranian Joint Congress on Fuzzy and Intelligent Systems. (2015). 4th Iranian joint congress on fuzzy and intelligent systems (CFIS): 15th conference on fuzzy systems and 13th conference on intelligent systems, Zahedan, 9–11 September 2015. University of Sistan and Baluchestan.
Jessica, A., Raj, F. V., & Sankaran, J. (2023). Credit card fraud detection using machine learning techniques. In ViTECoN 2023—2nd IEEE int. conf. vis. towar. emerg. trends commun. netw. technol. proc IEEE. https://doi.org/10.1109/ViTECoN58111.2023.10157162
Kaggle. (2019). Online-payments-fraud-detection. IEEE Computational Intelligence Society. https://www.kaggle.com/code/stbnlen/online-payments-fraud-detection/input
Kalbande, D., Prabhu, P., Gharat, A., & Rajabally, T. (2021). A fraud detection system using machine learning. In 2021 12th int. conf. comput. commun. netw. technol. ICCCNT (pp. 1–7). IEEE. https://doi.org/10.1109/ICCCNT51525.2021.9580102
Kannan, S., & Srinath, M. V. (2018). Autoregressive-based outlier algorithm to detect money laundering activities. International Journal of Analysis and Applications, 5(3), 29–38. http://www.ijrar.com/upload_issue/ijrar_issue_1810.pdf
Khetani, V., Gandhi, Y., Bhattacharya, S., Ajani, S. N., & Limkar, S. (2023). Cross-domain analysis of ML and DL: Evaluating their impact in diverse domains. International Journal of Intelligent Systems and Applications in Engineering, 11, 253–262. www.ijisae.org
Khosravi, S., Kargari, M., Teimourpour, B., Eshghi, A., & Aliabdi, A. (2023). Using supervised machine learning approaches to detect fraud in the banking transaction network. In 2023 9th int. conf. web res. ICWR (pp. 115–119). IEEE. https://doi.org/10.1109/ICWR57742.2023.10139083
Khrestina, M. P., Dorofeev, D. I., Kachurina, P. A., Usubaliev, T. R., & Dobrotvorskiy, A. S. (2017). Development of algorithms for searching, analyzing and detecting fraudulent activities in the financial sphere. European Research Studies Journal, 20, 484–498. https://api.semanticscholar.org/CorpusID:64711976
Kumar, V. (2014). Feature selection: A literature review. Smart Computing Reviews, 4(3), 211–229. https://doi.org/10.6029/smartcr.2014.03.007
Kurshan, E., Shen, H., & Yu, H. (2020). Financial crime fraud detection using graph computing: Application considerations outlook. In Proc. – 2020 2nd int. conf. transdiscipl. AI, transAI (pp. 125–130). IEEE. https://doi.org/10.1109/TransAI49837.2020.00029
Leo, M., Sharma, S., & Maddulety, K. (2019). Machine learning in banking risk management: A literature review. Risks, 7(1), 29. https://doi.org/10.3390/risks7010029
Mashrur, A., Luo, W., Zaidi, N. A., & Robles-Kelly, A. (2020). Machine learning for financial risk management: A survey. IEEE Access, 8, 203203–203223. https://doi.org/10.1109/ACCESS.2020.3036322
Nathani, N., & Singh, A. (2021). Foundations of machine learning. In Introd. to AI tech. renew. energy syst. CRC Press. https://api.semanticscholar.org/CorpusID:38553870
Ngai, E. W. T., Hu, Y., Wong, Y. H., Chen, Y., & Sun, X. (2011). The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature. Decision Support Systems, 50, 559–569. https://api.semanticscholar.org/CorpusID:27434345
Ondigo, H. (2019). The joint effect of corporate governance, risk management and firm characteristics on financial performance of commercial banks in Kenya. International Journal of Economics, Management and Media Studies, 6, 91–111.
Peters, G., Shevchenko, P. V., Cohen, R., & Maurice, D. (2018). Statistical machine learning analysis of cyber risk data: Event case studies. SSRN Electronic Journal, 75–99. https://doi.org/10.2139/ssrn.3200155
Prihandini, T. (2023). Interactive mobile technologies. International Journal of Interactive Mobile Technologies, 17(15), 135–154.
Pun, J., & Lawryshyn, Y. A. (2012). Improving credit card fraud detection using a meta-classification strategy. International Journal of Computers and Applications, 56, 41–46. https://api.semanticscholar.org/CorpusID:7187171
Rehman, S. A., & Hashim, F. (2020). Impact of fraud risk assessment on good corporate governance: Case of public listed companies in Oman. Business Systems Research, 11(1), 16–30. https://doi.org/10.2478/bsrj-2020-0002
Sadgali, I., Sael, N., & Benabbou, F. (2020). Adaptive model for credit card fraud detection. International Journal of Interactive Mobile Technologies, 14(3), 54–65. https://doi.org/10.3991/ijim.v14i03.11763
Sari, A. A. P. A. M. P., Suindari, N. M., & Lestari, N. L. P. R. W. (2021). Effect of credit risk and market risk on financial performance with liquidity as a mediation on Lpd in Badung regency, Indonesia. Russian Journal of Agricultural and Socio-Economic Sciences, 117(9), 55–63. https://doi.org/10.18551/rjoas.2021-09.07
Sharma, S., & Choudhury, A. R. (2016). Fraud analytics: A survey on bank fraud and fraud prediction using unsupervised learning based approach. International Journal of Innovations in Engineering Research and Technology, 3, 1–9. https://api.semanticscholar.org/CorpusID:611470
Sudjianto, A., Nair, S., Yuan, M., Zhang, A., Kern, D., & Cela Diaz, F. (2010). Statistical methods for fighting financial crimes. Technometrics, 52, 19–25.
https://api.semanticscholar.org/CorpusID:34785896
Taneja, S., Suri, B., & Kothari, C. (2019). Application of balancing techniques with ensemble approach for credit card fraud detection. In 2019 Int. conf. com-
put. power commun. technol. GUCON (pp. 753–758). IEEE.
Xiuguo, W., & Shengyong, D. (2022). An analysis on financial statement fraud detection for Chinese listed companies using deep learning. IEEE Access, 10,
22516–22532. https://doi.org/10.1109/ACCESS.2022.3153478
Zareapoor, M., & Shamsolmoali, P. (2015). Application of credit card fraud detection: Based on bagging ensemble classifier. Procedia Computer Science,
48(C), 679–685. https://doi.org/10.1016/j.procs.2015.04.201
AUTHOR BIOGRAPHIES
Ezaz M. Al‐Dahasi is a full‐time Ph.D. student at the Information and Computer Science Department, King Fahd University of Petroleum and
Minerals, Saudi Arabia. She specialises in the area of net‐centric computing.
Rama K. Alsheikh is a full‐time Ph.D. student at the Information and Computer Science Department, King Fahd University of Petroleum and
Minerals, Saudi Arabia. She specialises in the area of net‐centric computing.
Fakhri Alam Khan is an Associate Professor with the Department of Information and Computer Science at King Fahd University of Petroleum
and Minerals. He is also a research fellow with the Saudi Data and AI Authority (SDAIA) under the SDAIA‐KFUPM Joint Research Center for
Artificial Intelligence. He received his Ph.D. in computer science from the University of Vienna, Austria, in 2010 and completed a post‐doctorate
at the Vienna University of Technology in 2017. Before joining KFUPM, he worked in different capacities for over ten years at the
Institute of Management Sciences (IMSciences), Peshawar, Pakistan. He has published several research articles in reputed, internationally
recognized peer‐reviewed journals and has supervised numerous M.S. and Ph.D. students. His research interests include the IoT,
data analytics, data provenance, distributed systems, machine learning, multimedia technologies, and nature‐inspired metaheuristic
algorithms.
Gwanggil Jeon received the B.S., M.S., and Ph.D. (summa cum laude) degrees from the Department of Electronics and Computer Engineering,
Hanyang University, Seoul, Korea, in 2003, 2005, and 2008, respectively. From September 2009 to August 2011, he was with the School of
Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada, as a Post‐Doctoral Fellow. From September 2011 to
February 2012, he was with the Graduate School of Science and Technology, Niigata University, Niigata, Japan, as an Assistant Professor.
From December 2014 to February 2015 and from June 2015 to July 2015, he was a Visiting Scholar at Centre de Mathématiques et Leurs
Applications (CMLA), École Normale Supérieure Paris‐Saclay (ENS‐Cachan), France. From 2019 to 2020, he was a Prestigious Visiting
Professor at Dipartimento di Informatica, Università degli Studi di Milano Statale, Italy. From 2019 to 2020 and 2023 to 2024, he was a
Visiting Professor at Faculdade de Ciência da Computação, Universidade Federal de Uberlândia, Brazil. He is currently a professor at
Incheon National University, Incheon, Korea. He was a general chair of IEEE SITIS 2023 and has served as a workshop chair at numerous
conferences. Dr. Jeon is an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), Elsevier
Sustainable Cities and Society, IEEE Access, Springer Real‐Time Image Processing, Journal of System Architecture, and Wiley Expert
Systems. Dr. Jeon received the IEEE Chester Sall Award in 2007, the ETRI Journal Paper Award in 2008, the Industry‐Academic Merit Award
from the Minister of SMEs and Startups of Korea in 2020, and was named an ACM Distinguished Speaker in 2022.
How to cite this article: Al-dahasi, E. M., Alsheikh, R. K., Khan, F. A., & Jeon, G. (2024). Optimizing fraud detection in financial transactions
with machine learning and imbalance mitigation. Expert Systems, e13682. https://doi.org/10.1111/exsy.13682