
This article has been accepted for publication in IEEE Access.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3385781

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.DOI

Enhancing Medicare Fraud Detection through Machine Learning: Addressing Class Imbalance with SMOTE-ENN

RAYENE BOUNAB (1), KARIM ZAROUR (1), BOUCHRA GUELIB (1), NAWRES KHLIFA (2)
(1) Constantine 2 University – Abdelhamid Mehri, New Information Technologies, LIRE Laboratory, Nouvelle Ville Ali Mendjli BP67A, Constantine, Algeria
(2) University of Tunis El Manar, Higher Institute of Medical Technologies of Tunis, Research Laboratory of Biophysics and Medical Technologies, Tunis, Tunisia (e-mail: [email protected])
Corresponding author: Rayene Bounab ([email protected]).

ABSTRACT The healthcare fraud detection field is constantly evolving and faces significant challenges,
particularly when addressing imbalanced data issues. Previous studies mainly focused on traditional
machine learning (ML) techniques, often struggling with imbalanced data. This problem arises in various
aspects. It includes the risk of overfitting with Random Oversampling (ROS), noise introduction by the
Synthetic Minority Oversampling Technique (SMOTE), and potential crucial information loss with Random
Undersampling (RUS). Moreover, improving model performance, exploring hybrid resampling techniques,
and enhancing evaluation metrics are crucial for achieving higher accuracy with imbalanced datasets. In
this paper, we present a novel approach to tackle the issue of imbalanced datasets in healthcare fraud
detection, with a specific focus on the Medicare Part B dataset. First, we carefully extract the categorical
feature "Provider Type" from the dataset. This allows us to generate new, synthetic instances by randomly
replicating existing types, thereby increasing the diversity within the minority class. Then, we apply a
hybrid resampling method named SMOTE-ENN, which combines the Synthetic Minority Over-sampling
Technique (SMOTE) with Edited Nearest Neighbors (ENN). This method aims to balance the dataset by
generating synthetic samples and removing noisy data to improve the accuracy of the models. We use
six machine learning (ML) models to categorize the instances. When evaluating performance, we rely
on common metrics like accuracy, F1 score, recall, precision, and the AUC-ROC curve. We highlight
the significance of the Area Under the Precision-Recall Curve (AUPRC) for assessing performance in
imbalanced dataset scenarios. The experiments show that Decision Trees (DT) outperformed all the
classifiers, achieving a score of 0.99 across all metrics.

INDEX TERMS Healthcare fraud, Imbalanced data, Machine Learning (ML), Noisy data.

I. INTRODUCTION
HEALTHCARE systems globally face a significant challenge due to fraud, which impacts both their financial stability and moral principles. In particular, the U.S. Medicare program, a key element of the healthcare sector, experiences substantial financial loss from such fraudulent practices. According to the Federal Bureau of Investigation, healthcare fraud represents 3–10% of the total healthcare costs, leading to yearly losses between $19 billion and $65 billion [1]. These illegal activities not only deplete financial resources but also affect the operational efficiency and trustworthiness of healthcare systems. Therefore, it is imperative to implement effective and strong fraud detection strategies, especially in Medicare, which serves a broad and diverse population. Ensuring efficient fraud detection is vital for the protection of public funds and guaranteeing that resources are distributed fairly for necessary healthcare services and patient care. The challenge in healthcare fraud detection lies in the evolving nature of fraud schemes, which are complex and diverse. Traditional, rule-based detection methods fall short in this dynamic environment, lacking the necessary adaptability and scalability to address the sophisticated nature of modern healthcare fraud. Machine learning (ML), a subfield of Artificial Intelligence (AI), has demonstrated exceptional proficiency in healthcare fraud detection, particularly in processing the Medicare dataset released annually by the U.S. government [2]. This dataset is a crucial resource for researchers focusing on healthcare fraud detection.


This reflects the government's commitment to combating fraud by equipping specialists with vital data, thereby facilitating the development of more sophisticated fraud detection strategies based on ML. The strength of ML lies in its ability to learn from historical data and adapt to emerging fraudulent patterns, making it effective in analyzing large datasets to identify anomalies and fraud indicators. This adaptability renders ML indispensable in creating efficient, responsive detection systems for large-scale operations like Medicare, and makes it a vital asset in the ongoing fight against healthcare fraud [3].

Recent studies, such as those by [4]–[8], demonstrate the successful application of ML techniques using the Medicare dataset to uncover fraudulent activities. The Medicare datasets [9], disseminated by the Centers for Medicare and Medicaid Services, exhibit a pronounced class imbalance characterized by a disproportionate representation of non-fraudulent cases relative to fraudulent instances. This class imbalance presents a formidable impediment to the efficacy of ML algorithms deployed for fraud detection. Predominantly, ML models are predisposed to a bias towards the majority class, in this case non-fraudulent transactions, leading to a heightened incidence of false negatives. This phenomenon occurs when the algorithm erroneously categorizes fraudulent activities as legitimate, a direct consequence of the skewed training data [2], [10]. Such imbalance in the dataset precipitates the development of ML models that demonstrate suboptimal performance in the accurate detection of fraudulent activities. This deficiency critically undermines the overarching effectiveness and reliability of the fraud detection mechanism within the healthcare domain. To ameliorate this situation, it is imperative to establish datasets that are balanced, thereby ensuring that ML algorithms are more adept at discerning the minority class, which in this context refers to fraudulent transactions. A balanced dataset is instrumental in enabling the algorithm to detect nuanced patterns and anomalies that are indicative of fraudulent activities [5].

A notable gap in current research within healthcare fraud detection is the inadequate focus on addressing the challenges posed by imbalanced data. The preponderance of research has been directed towards classification tasks, with insufficient attention to the intricate issue of data imbalance. Nevertheless, some researchers have begun to address this gap using resampling techniques. These methodologies include Random Oversampling (ROS) [5], the Adaptive Synthetic sampling approach (ADASYN) [11], and the Synthetic Minority Over-sampling Technique (SMOTE) [12]. Concurrently, undersampling of the majority class is executed using Random Undersampling (RUS) [13] to achieve a balanced dataset. Despite the efficacy of these techniques, challenges persist. ROS methods, for instance, may be susceptible to overfitting, potentially compromising the generalizability of the model. Meanwhile, the application of SMOTE carries the risk of introducing noise to the dataset. Moreover, the implementation of RUS comes with its own set of concerns, notably the risk of discarding crucial data, potentially leading to a loss of important information. The intricate trade-offs associated with each resampling technique underscore the complexity of addressing class imbalance in healthcare fraud detection datasets. The limitations identified in prior research can be summarized as follows:
• Limited research into advanced techniques for handling imbalanced datasets
• Existing resampling approaches such as ROS may result in overfitting, whereas strategies like SMOTE could contribute noise to the dataset
• The Random Undersampling technique poses the risk of eliminating important data, which could lead to the exclusion of critical patterns that indicate fraud.

To surmount these limitations, this paper introduces a novel approach to address imbalanced datasets in healthcare fraud detection, particularly focusing on the Medicare Part B dataset. A key innovation lies in the meticulous separation of the categorical features from the numerical features, enabling the random generation of synthetic instances to enrich minority class diversity. Our proposed Synthetic Minority Over-sampling Technique with Edited Nearest Neighbors (SMOTE-ENN) hybrid resampling method contributes significantly by simultaneously rebalancing the dataset and eliminating noisy data, which is then evaluated using various ensemble classifiers. To the best of our knowledge, this paper is the first to combine the separate generation of categorical features with the SMOTE-ENN technique and a variety of ensemble learning classifiers. Additionally, we incorporate the Area Under the Precision-Recall Curve (AUPRC) metric for evaluation, enhancing the robustness and comprehensiveness of our analysis.

The main contributions of this paper can be summarized as follows:
• Randomly generating the categorical feature "Provider Type" based on existing categories in the dataset
• Applying the SMOTE-ENN hybrid resampling method to balance the dataset and remove noisy data
• Evaluating the effectiveness of the proposed approach using ensemble learning classifiers
• Employing the Area Under the Precision-Recall Curve (AUPRC) metric for a more effective evaluation of model performance in the context of an imbalanced dataset.

The structure of this paper is organized as follows: Section II provides an overview of the related work, with an emphasis on studies that utilized ML and data balancing techniques. The problem is formulated in Section III. The proposed system is detailed in Section IV. The experimental results and a discussion are presented in Section V. Finally, the paper concludes with Section VI, summarizing the main outcomes.

II. RELATED WORK
Detecting fraud in healthcare has been the subject of extensive exploration in the literature. This section presents and evaluates different papers in the field of healthcare fraud detection based on two principal aspects that align with the objectives of our study. Firstly, there is a significant amount of literature that focuses on the utilization of AI methodologies to detect healthcare fraud; many studies highlight the effectiveness of ML techniques in identifying fraudulent behavior within healthcare systems [14]. Another area of research examines the challenge of imbalanced data in healthcare fraud detection. Researchers have explored various strategies to handle this problem, aiming to improve the effectiveness of ML models in accurately detecting healthcare fraud [15].

A. WORKS ADDRESSING THE USE OF AI IN HEALTHCARE FRAUD DETECTION
Recent advancements in AI, especially ML, have led to diverse and innovative approaches to detecting healthcare fraud. The authors in [16] aimed to improve decision-tree-based ensemble techniques for healthcare fraud detection, utilizing the large Part D Medicare dataset with around 175 million records. The authors in [17] introduced an ML framework that transforms prescription claims into statistical modeling features, focusing on business heuristics, provider-prescriber relationships, and client demographics. The study by [2] employed an ensemble feature selection technique in ML models for Medicare fraud detection; this approach improved explainability and reduced data complexity. The work proposed by [18] introduced a Bayesian Belief Network (BBN) model for healthcare fraud detection, involving preprocessing and feature engineering of Texas Medicaid prescription claims. This approach outperformed baseline models in scalability and interpretability.
In [19], the authors concentrated on applying a data-centric AI approach to detect U.S. Medicare fraud, which significantly enhanced ML models' performance through careful data preparation and feature engineering. Their approach showed superior results compared to traditional datasets in Medicare fraud classification tasks. [6] proposed a study to detect healthcare fraud instances by applying four ML algorithms; they identified 19 essential features, which they organized into four primary categories.
Upon examining these studies, we can observe the use of diverse methods for detecting fraud, such as ensemble methods, decision-tree-based techniques, and BBNs. Moreover, several works emphasize the important role of data preparation, feature engineering, and feature selection in enhancing model performance. However, a common limitation observed is the reliance on the significantly imbalanced Medicare dataset for experimentation, an issue that remains largely unaddressed and could potentially result in misclassification outcomes.

B. WORKS ADDRESSING THE PROBLEM OF IMBALANCED DATA IN HEALTHCARE FRAUD DETECTION
The following studies present some of the common methods applied in the field of healthcare fraud detection to handle imbalanced data.
The paper [20] tackled the problem of imbalanced data by experimenting with different class distributions in their ML models. Using the Medicare Part B dataset, the authors applied six ML models across seven class distributions to address the data imbalance. The results indicate that a 90:10 ratio of non-fraud to fraud cases outperformed the other settings. In their study [21], the authors addressed the challenge of imbalanced data in the Medicare dataset by employing ML models for classification and six sampling techniques to balance the dataset; the findings demonstrated that RUS consistently gave strong results across all ML models. In [22], the author proposed a semantic embedding approach to convert healthcare procedure codes (HCPCS) from the Medicare fraud dataset into semantic embeddings; to address the imbalanced data issue, the work employed a simple undersampling method. Another semantic embedding approach was proposed in [23]. The authors developed semantic embeddings for medical provider types using both pre-trained (Global Vectors for Word Representation (GloVe), Medical Word2Vec (Med-W2V)) and custom (HcpcsVec, RxVec) embeddings from Medicare claims data. This method improved the representation of provider specialties and was validated using various ML algorithms. Additionally, the study tackled the issue of imbalanced data by employing random over-sampling (ROS) and under-sampling techniques. The authors in [14] applied ML and DL techniques to identify financial fraud in healthcare credit card transactions; they tackled the challenge of imbalanced data by recommending a hybrid resampling approach, although the study did not specify the particular methods used for this resampling.
In their study [24], the authors proposed unsupervised DL techniques to detect procedure code overutilization in medical claims; to tackle the imbalanced data, the test set was composed of outliers representing potential fraudulent cases. The paper [25] focused on assessing the performance of ML classifiers on the imbalanced Medicare dataset; the authors applied the RUS method with various ensemble learning techniques to address class imbalance. Another paper, [26], explored the classification of healthcare fraud using the highly imbalanced Medicare dataset by employing the RUS method to address the imbalance issue; the results show that RUS enhanced the AUC scores while reducing the training data size. In the paper [11], the authors proposed the use of two data balancing techniques, namely the Class Weighing Scheme (CWS) and ADASYN. Moreover, to classify instances, the authors applied a range of ML algorithms.

C. DISCUSSION OF RELATED WORK
The reviewed papers demonstrate a focus on employing ML techniques for detecting various forms of healthcare fraud. A significant challenge across these studies is the management of imbalanced datasets, a prevalent issue in fraud detection. [22], [25], [26], and [21] applied the RUS method, which randomly removes samples from the majority class to match the number of samples in the minority class. While it reduces time complexity and computational load, RUS significantly limits healthcare fraud detection: its main drawback is the potential loss of critical information, as it randomly removes majority-class instances. On the other hand, ROS, as applied by [23], can be effective in providing a balanced dataset without losing information; however, it can lead to overfitting. By duplicating minority class samples, ROS can make the model specific to the existing fraud instances, reducing its generalizability to new or slightly different types of fraud. Thus, it is important to apply methods that generate new instances, such as SMOTE. One paper in the related works adopted a hybrid resampling approach [14], which combines under-sampling and over-sampling methods to mitigate their drawbacks; however, the lack of information regarding the methods used leaves a gap in understanding its efficacy and applicability in diverse healthcare fraud scenarios. Moreover, in [11], two methods were applied, namely ADASYN and CWS. ADASYN generates synthetic samples for the minority class based on a density distribution, which helps in creating more diverse and representative samples; however, in complex Medicare fraud datasets, this method can introduce noise.
Overall, the major gaps in the studies on Medicare fraud detection using ML largely stem from an inadequate exploration of more sophisticated techniques to handle imbalanced datasets. There is a need for methods that can manage the complex, high-dimensional nature of Medicare data. The SMOTE method, known for its effectiveness in generating representative minority class samples, is notably under-explored in this field. Additionally, the problem of noisy data when generating new instances is not discussed; taking this challenge into account is important when dealing with imbalanced datasets. Leveraging the power of hybrid methods should also be considered. Furthermore, these resampling methods could be significantly enhanced when combined with ensemble learning classifiers, known for their robustness and generalizability. Addressing these gaps with such advanced methodologies could significantly improve the accuracy and efficiency of healthcare fraud detection in Medicare systems.
Table 1 provides a comprehensive comparative overview of the related works in the field of healthcare fraud detection. It details the datasets used, the ML methods applied, the data balancing techniques employed, and the evaluation metrics achieved in each study.

III. PROBLEM FORMULATION
Given the significant class imbalance in the Medicare Part B dataset, as illustrated in Figure 1, with a ratio of 1:11,312 between fraudulent (minority class) and non-fraudulent claims (majority class), traditional ML models face substantial challenges in accurately detecting instances of fraud. This imbalance biases models towards the majority class, severely undermining their capability to generalize and identify fraudulent activities effectively.
To address this imbalance, we propose the use of the SMOTE-ENN algorithm. We denote the set of fraud detection models as $\{f_m\}_{m=1}^{M}$, each trained on its respective subset of data $D_m$, where $D_m = \{(x_i^m, y_i^m)\}_{i=1}^{N_m}$. Here, $x_i^m$ represents the feature vector for the $i$-th claim, and $y_i^m$ indicates its corresponding class label.
The SMOTE-ENN algorithm (see Algorithm 1) is applied to each subset $D_m$ to generate a balanced dataset $D'_m$ through synthetic sample generation and noise reduction. This process is formulated as:

$$D'_m = \text{SMOTE-ENN}(D_m) \quad (1)$$

The primary challenge is to validate the effectiveness of the SMOTE-ENN approach in balancing the dataset and improving the detection accuracy of the models $f_m$. The performance of the models trained on the balanced dataset $D'_m$ will be assessed and compared to their performance on the original imbalanced dataset $D_m$, with a focus on their accuracy and generalization in detecting fraudulent activities.

IV. PROPOSED SOLUTION
This section outlines the proposed solution, focusing on achieving dataset balance. It begins by generating categorical data and applying the SMOTE-ENN technique to the numerical data, specifically for the classification task. The approach is detailed in two phases: first, a comprehensive overview of the entire architecture, followed by a thorough explanation of each component.

A. OVERALL OVERVIEW
The significant disparity in class distribution in the Medicare dataset poses a difficult obstacle to effectively identifying fraudulent claims. This challenge highlights the need for a strong technique that can correct the imbalance to enhance model performance. Equation (1) presents the SMOTE-ENN algorithm as our recommended solution, which tackles the imbalance by increasing the size of the minority class and refining the dataset.
Figure 1 presents the proposed architecture for healthcare fraud detection based on generating categorical data and the SMOTE-ENN method formulated in Equation (1). In our architecture for classifying healthcare fraud claims, we first partition the dataset based on data type and then apply a series of preprocessing steps to enhance data quality. To tackle data imbalance, we utilize SMOTE-ENN, a hybrid resampling method. Along with this, we augment the categorical data using the Random Sampling without Replacement method. Finally, we employ various ensemble learning classifiers for classification.

TABLE 1. Comparative table of related works

Ref | Dataset | ML Methods | Data Balancing Method | Evaluation
[21] | Medicare | Logistic Regression (LR), Random Forest (RF), Gradient Boosting Trees (GBT) | ROS, RUS, SMOTE, SMOTE variants, ADASYN | Area Under the Curve (AUC) = 0.82
[20] | Medicare Part B | Naive Bayes (NB), LR, Decision Trees (DT), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), RF | RUS | AUC
[22] | Medicare | Word2Vec (Skip-gram, Continuous Bag Of Words (CBOW)) | Undersampling | AUC = 0.870, Geometric Mean (G-mean) = 0.783
[23] | Medicare | LR, RF, GBT, Multi-Layer Perceptron (MLP) | ROS, RUS | AUC = 0.830
[14] | Healthcare Transactions | NB, LR, KNN, RF, Convolutional Neural Network (CNN) | Hybrid Resampling | Accuracy = 97.58
[16] | Part D Medicare | eXtreme Gradient Boosting (XGBoost), RF | - | AUC = 0.97
[17] | Prescription Claims | LR, RF, Principal Component Analysis (PCA) | - | Receiver Operating Characteristic (ROC) = 0.76, F1-score = 0.88
[11] | Healthcare insurance | LR, DT, RF, XGBoost | CWS, ADASYN | AUC = 0.95
[2] | Medicare | Category Boosting (CatBoost), XGBoost, RF, Extremely Randomized Trees (ET), Light Gradient Boosting Machine (LightGBM), DT, LR, Ensemble Feature Selection | - | AUC = 0.95, Area Under the Precision-Recall Curve (AUPRC) = 0.78
[25] | Medicare | CatBoost, XGBoost, LightGBM, RF, ET | RUS | AUC = 0.97, AUPRC = 0.92
[26] | Medicare | CatBoost, XGBoost, RF, ET | RUS | AUC = 0.99
[19] | U.S. Medicare | XGBoost, RF | - | G-mean = 0.90, AUC = 0.962
[18] | Texas Medicaid | Bayesian Belief Network (BBN) | - | F-score = 0.94
[6] | Healthcare Insurance | SVM, DT, RF, MLP | - | F-score = 0.95
[24] | Healthcare Claims | Deep Autoencoders | - | Precision = 0.87, Recall = 1.00, F-score = 0.93

B. DATA COLLECTION
The datasets used for this study include the publicly accessible Medicare Physician and Other Practitioners (Part B) data for the year 2020, provided by the Centers for Medicare & Medicaid Services (CMS) [27], and the List of Excluded Individuals and Entities (LEIE).
The Medicare dataset was obtained in a comma-delimited format (CSV), making it suitable for additional data processing procedures. To facilitate an in-depth comprehension of the data, the CMS provides methodological documentation that clarifies its techniques for collecting and processing data. This is further supported by data dictionaries that outline the definitions of all attributes present in the datasets. The proposed study is specifically centered on the dataset known as "Medicare Part B Summary by Provider and Service 2020." This dataset contains approximately 9,449,361 records and 29 distinct features. Many of the attributes are provider demographic data, which we do not use for modeling purposes; nevertheless, the dataset serves as a valuable resource for our analytical investigations.
The List of Excluded Individuals and Entities (LEIE) is managed by the Office of Inspector General (OIG) in compliance with Sections 1128 and 1156 of the Social Security Act [28]. The OIG maintains the authority to exclude healthcare providers from engaging in federally financed healthcare programs for a range of legitimate reasons. It is worth mentioning that individuals who are placed on the exclusion list cannot receive payments from Federal healthcare programs for any services that they provide. To pursue reinstatement, those who have been excluded must adhere to a prescribed procedure after successfully fulfilling the duration of their exclusion. The current structure of the LEIE data consists of 18 attributes that give relevant information regarding the provider under investigation and outline the precise reasons for their exclusion.

[Figure 1: the Medicare Part B and LEIE datasets are merged for labeling; the imbalanced data then pass through preprocessing, data splitting (train set / test set), data balancing, and classification.]
FIGURE 1. Proposed Architecture for Healthcare Fraud Detection Based on SMOTE-ENN

C. PREPROCESSING
Data were carefully preprocessed in accordance with the Centers for Medicare & Medicaid Services (CMS) methodological documentation, which provides valuable insights into their data processing procedures, and the comprehensive data dictionaries that meticulously define the dataset's attributes. We follow the data preparation method proposed in [19]. We start by adding a new column, denoted "Year," that serves later for labeling; the value "2020" is assigned, representing the year of the dataset. Following this, we move on to identifying and rectifying any instances of missing values. To facilitate this procedure, we utilized the methodology described in the data dictionary files supplied by the CMS [29]. The process of imputing missing data was conducted systematically. Specifically, when faced with missing information regarding the gender of providers, we introduced a third category, denoted "U," to represent unknown values. Next, the gender value was encoded numerically, with the assignment M=1, F=0, and U=2. A comprehensive assessment was conducted to analyze the characteristics of the missing values in the remaining columns, particularly concerning the absence of provider names and geographic details; due to their low relevance to our study, we have chosen to eliminate these columns from further consideration. Our subsequent step involved selecting the rows having the value 'N' in the 'HCPCS_Drug_Ind' column, as recommended in the CMS documentation.
The second dataset used in this study, which plays an essential role in the labeling procedure of the Part B dataset, is the LEIE dataset. It is formatted as a character-separated value (CSV) file. The relevant features of this dataset are the NPI, the exclusion type, the exclusion date, waiver data, and the reinstatement date; in the LEIE CSV file, these elements are named NPI, EXCLTYPE, EXCLDATE, REINDATE, WAIVERDATE, and WVRSTATE, respectively. We followed the same methodology presented in [20] to prepare the LEIE data. After preparing the two datasets, we proceed to the labeling step using the LEIE dataset. During the labeling process, two crucial criteria are used for the detection of fraudulent activities: first, the National Provider Identifier (NPI) from a Part B claim should be present in the LEIE dataset; second, the year of the Part B claim precedes the year in which the exclusion period concludes. When these conditions are met, the record is labeled as fraud; otherwise, it is labeled non-fraud. The labeling technique we describe here is the same technique outlined in [8], [30], [31], [32]. After labeling the dataset, we remove the columns with low pertinence, namely NPI, YEAR, and HCPCS_Drug_Ind, and keep only 9 features. Table 2 presents the features used in the experiments, based on the work in [31].
Finally, we normalize the dataset to ensure that each feature contributes equally during analysis or modeling. This step protects statistical learning methods by preventing larger numeric values from overwhelming smaller ones [33].

D. SPLITTING DATASET BASED ON DATA TYPE
In this step, the dataset is divided according to data type, whether numerical or categorical, to facilitate separate treatment and preservation of the local structure of the information. The dataset comprises eight numerical features and one categorical feature, denoted "Rndrng_Prvdr_Type". Figure 2 represents the splitting into numerical and categorical data.
The provider type attribute is a categorical variable that describes the provider or supplier's medical speciality, which encompasses 102 distinct types (e.g., Internal Medicine, Family Medicine, Cardiology). The objective is to generate instances based on the existing provider types. To accomplish this, the Random Sampling without Replacement method is employed. Initially, the 102 provider types are shuffled to ensure a random order. Subsequently, each provider type is selected sequentially from this shuffled list, ensuring every type is selected once before any type is selected again. After exhausting all 102 types, the list is reshuffled and the selection process is repeated. This process continues until the desired number of k instances is reached. This method guarantees that each provider type is represented fairly and equally across the total instances, preventing any bias towards certain types. Figure 3 illustrates the generation of "Provider Type" based on the 102 existing types.
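A minimal sketch of this sampling scheme, under the assumption that `provider_types` holds the 102 distinct specialties and `k` is the number of categorical values to generate (illustrative code, not the authors' implementation):

```python
import random

def generate_provider_types(provider_types, k, seed=42):
    """Draw k provider types in shuffled rounds: within each round every type is
    selected exactly once (sampling without replacement), then the list is reshuffled."""
    rng = random.Random(seed)
    generated, pool = [], []
    while len(generated) < k:
        if not pool:                      # start a new round when the pool is exhausted
            pool = list(provider_types)
            rng.shuffle(pool)
        generated.append(pool.pop())      # draw without replacement within the round
    return generated

# Example with a handful of the 102 specialties.
types = ["Internal Medicine", "Family Medicine", "Cardiology", "Nurse Practitioner"]
print(generate_provider_types(types, k=10))
```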

TABLE 2. Description of Medicare Data Features

Feature | Description | Type
Rndrng_Prvdr_Gndr | Provider Gender | Categorical
Rndrng_Prvdr_Type | Type of Provider | Categorical
Tot_Benes | Number of Medicare Beneficiaries | Numerical
Tot_Srvcs | Number of Services | Numerical
Tot_Bene_Day_Srvcs | Number of Distinct Medicare Beneficiary/Per Day Services | Numerical
Avg_Sbmtd_Chrg | Average Submitted Charge Amount | Numerical
Avg_Mdcr_Alowd_Amt | Average Medicare Allowed Amount | Numerical
Avg_Mdcr_Pymt_Amt | Average Medicare Payment Amount | Numerical
Avg_Mdcr_Stdzd_Amt | Average Medicare Standardized Amount | Numerical
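The labeling rule described in the preprocessing step can be sketched as follows. This is a simplified illustration rather than the authors' exact code: the Part B NPI column is assumed to be named `Rndrng_NPI`, the LEIE dates are assumed to be stored as YYYYMMDD, and the reinstatement date is used as a stand-in for the end of the exclusion period.

```python
import pandas as pd

def label_fraud(partb: pd.DataFrame, leie: pd.DataFrame) -> pd.DataFrame:
    """Label a Part B row as fraud (1) when its NPI appears in the LEIE and the claim
    year precedes the year in which the exclusion period concludes; otherwise 0."""
    leie = leie.copy()
    leie["excl_end_year"] = pd.to_datetime(
        leie["REINDATE"], format="%Y%m%d", errors="coerce"
    ).dt.year
    end_year_by_npi = leie.groupby("NPI")["excl_end_year"].max()

    partb = partb.copy()
    end_year = partb["Rndrng_NPI"].map(end_year_by_npi)   # NaN when the NPI is not in the LEIE
    partb["fraud"] = (end_year.notna() & (partb["YEAR"] < end_year)).astype(int)
    return partb
```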

[Figure 2: the dataset is split by data type into categorical and numerical features.]
FIGURE 2. Splitting the dataset based on data type

E. TRAIN-TEST-SPLIT
To accurately evaluate our models' performance, we divide our dataset into training and test sets using the "Train_test_split" method. This approach enables assessing the models' ability to perform effectively on new, unseen data and determining their overall efficacy. We split the dataset based on the ratio 80:20, where 80% of the dataset was assigned to the training set, while the remaining 20% constitutes the test set.

F. SYNTHETIC MINORITY OVER-SAMPLING TECHNIQUE WITH EDITED NEAREST NEIGHBORS (SMOTE-ENN)
SMOTE-ENN is a composite resampling approach, amalgamating the principles of both oversampling and undersampling to tackle the challenge of imbalanced datasets, as proposed by [34]. The initial phase involves augmenting the minority class representation through the SMOTE algorithm, which synthesizes instances by linear interpolation between existing minority class samples and their nearest neighbors [35]. Nevertheless, the randomness in selection intrinsic to SMOTE can introduce noise, potentially impeding the model's ability to generalize [36]. To mitigate such effects, the Edited Nearest Neighbors (ENN) method is employed after SMOTE. This subsequent step aims to purify the dataset by discarding instances that introduce noise or redundancy; the procedure involves examining each instance to ensure the consistency of its class label with those of its nearest neighbors, thus enhancing the dataset's overall quality for subsequent modeling. Using the SMOTE-ENN approach, we generated 7,119,172 synthetic instances, thereby balancing the dataset and achieving a better class distribution.
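To make the interpolation step concrete, the following NumPy sketch synthesizes a single SMOTE sample according to the rule x_new = x_i + λ(x_ni − x_i) shown in Figure 4. It is illustrative only; the experiments rely on the library implementation of SMOTE-ENN rather than this code.

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_sample(x_i, minority, k=5):
    """Create one synthetic minority sample by interpolating between x_i and a
    randomly chosen one of its k nearest minority-class neighbors."""
    d = np.sum((minority - x_i) ** 2, axis=1)   # squared distances to all minority points
    neighbors = np.argsort(d)[1:k + 1]          # skip the zero-distance entry (x_i itself)
    x_ni = minority[rng.choice(neighbors)]      # random neighbor among the k nearest
    lam = rng.random()                          # lambda drawn uniformly from [0, 1)
    return x_i + lam * (x_ni - x_i)

minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(smote_sample(minority[0], minority, k=2))
```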

[Figure 3: randomly generating "Provider Type" values (15,983,879 in the illustration) based on the 102 existing types, e.g., Internal Medicine, Nurse Practitioner, Cardiology, Adult Congenital Heart Disease.]
FIGURE 3. Generation of Provider Type.

[Figure 4: the SMOTE-ENN process. SMOTE: randomly select x_i from the minority class, identify its k nearest neighbors K_xi, and generate a new instance x_new = x_i + λ × (x_ni − x_i). ENN: identify the k nearest neighbors of x_i and, if the majority of them belong to the majority class, remove x_i from the dataset. The imbalanced dataset is thereby transformed into a balanced dataset.]
FIGURE 4. SMOTE-ENN process.

Figure 4 demonstrates the detailed steps of this procedure, showing how SMOTE generates synthetic instances and ENN eliminates noisy examples. Having a strong methodological foundation is essential for creating reliable fraud detection algorithms that can generalize effectively across various types of claims. The relationship between the problem statement presented in Equation (1) and the methodological framework shown in Figures 1 and 4 demonstrates our thorough strategy for addressing the class imbalance problem. This interaction is the foundation of our technique, improving the accuracy and generalization capacities of the detection models and tackling the core difficulty posed by the Medicare Part B dataset.
Algorithm 1 presents the SMOTE-ENN method, specifically tailored for balancing the Medicare Part B dataset through a combined approach of oversampling the minority class and undersampling the majority class. In its initial phase, the algorithm focuses on oversampling. It randomly selects a minority class instance $x_i$ and determines its $k$ nearest neighbors, thereby creating a subset $S_k$. A synthetic instance $p$ is then interpolated between $x_i$ and a random member of $S_k$, which is subsequently labeled as part of the minority class and integrated into the dataset $S$. This process enhances the minority class's presence, mitigating the imbalance.
The algorithm then transitions to undersampling, aiming to refine the majority class by excising instances likely to introduce classification noise. It selects a random instance $x_r$ from $S$ and identifies its $k$ nearest neighbors. Should $x_r$ predominantly associate with the opposite class, it is pruned, reducing the risk of overfitting and bolstering the classifier's generalizability.
The culmination of this two-phase procedure is a balanced dataset $S'$, primed for training resilient machine learning models. By leveraging SMOTE for enrichment and ENN for purification, the SMOTE-ENN algorithm significantly elevates the dataset's utility, thus serving as an essential instrument in optimizing classifier efficacy amidst the complex terrain of healthcare fraud detection.


Algorithm 1 SMOTE-ENN Algorithm for Balancing the Medicare Part B Dataset

1: function SMOTE-ENN($D_m$)
2:   Input: Training dataset $D_m = \{(x_i^m, y_i^m)\}_{i=1}^{N_m}$
3:   Output: Balanced dataset $D'_m$
4:   Oversampling                                ▷ Step 1: Oversampling the minority class
5:   Select a sample $(x_i^m, y_i^m)$ randomly from the minority class instances in $D_m$
6:   $S_k \leftarrow$ Find the $k$ nearest minority class neighbors of $x_i^m$
7:   $p \leftarrow$ Generate a synthetic sample by interpolation between $x_i^m$ and a randomly selected $x_k$ from $S_k$
8:   Assign the minority class label to the new sample $p$
9:   Add the new sample $p$ to the dataset $D_m$
10:  Undersampling                               ▷ Step 2: Undersampling the majority class
11:  Select a sample $(x_r^m, y_r^m)$ randomly from $D_m$
12:  $S_k \leftarrow$ Find the $k$ nearest neighbors of $x_r^m$
13:  if the majority of $x_r^m$'s neighbors are from the majority class then
14:    Remove $(x_r^m, y_r^m)$ from $D_m$
15:  end if
16:  $D'_m \leftarrow$ Balanced dataset
17:  return $D'_m$
18: end function
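For readers who wish to reproduce this step, a resampling of this kind is available in the imbalanced-learn library as `SMOTEENN`. The sketch below is a simplified, self-contained stand-in: a synthetic imbalanced dataset replaces the Medicare features, the 80:20 split of Section IV-E is applied, and only the training portion is resampled.

```python
from collections import Counter

from imblearn.combine import SMOTEENN
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the preprocessed numerical Part B features (highly imbalanced).
X, y = make_classification(n_samples=20_000, n_features=8,
                           weights=[0.995, 0.005], random_state=42)

# 80:20 split; stratify keeps the class ratio in the held-out test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# SMOTE-ENN is fitted on the training portion only, so the test set stays untouched.
X_res, y_res = SMOTEENN(random_state=42).fit_resample(X_train, y_train)
print("before:", Counter(y_train), "after:", Counter(y_res))

# Any of the six classifiers can then be trained on the balanced data; DT is shown here.
clf = DecisionTreeClassifier(random_state=42).fit(X_res, y_res)
print("held-out accuracy:", clf.score(X_test, y_test))
```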

G. CLASSIFICATION
To classify data as fraud or legitimate, we employ six ML classifiers, namely: Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Light Gradient Boosting Machine (LGBM), Decision Trees (DT), Logistic Regression (LR), and Random Forest (RF). Ensemble approaches like XGBoost, AdaBoost, LightGBM, and RF are widely recognized for their resilience and effectiveness within the field of ML [37].
• Extreme Gradient Boosting (XGBoost): a highly efficient and scalable variant of gradient boosting, recognized for its exceptional performance and speed, making it a fundamental component of our ensemble of classifiers [38].
• Adaptive Boosting (AdaBoost): this method improves the performance of basic models by concentrating on instances that were incorrectly identified by earlier models [39].
• Light Gradient Boosting Machine (LightGBM): a popular gradient boosting framework known for its efficiency in handling large-scale data while reducing computational resources [40].
• Decision Trees (DT): used for their simplicity and interpretability, classifying data by dividing the dataset recursively [41].
• Logistic Regression (LR): utilizes a logistic function to predict probabilities, offering an accurate model for binary classification tasks like distinguishing between fraudulent and legitimate transactions [42].
• Random Forest (RF): improves predictive accuracy and prevents overfitting by combining predictions from many decision trees, each trained on different subsets of the data; this makes it a crucial component of our ensemble strategy [43].
Our choice of these classifiers is based on their combined robustness and effectiveness in the field of machine learning [37]. Ensemble approaches like XGBoost, AdaBoost, LightGBM, and RF excel at combining different models to capture a wider range of patterns and linkages in the data. This characteristic is especially beneficial in healthcare fraud detection, where the intricate and ever-changing fraudulent patterns require advanced, flexible analytical approaches [44]. We utilize the distinct capabilities of each classifier to tackle the difficulties involved in identifying healthcare fraud, guaranteeing the ongoing effectiveness of our model despite changing fraud patterns.

V. EXPERIMENTAL RESULTS
This section provides an evaluation and validation of the performance of the presented models in the detection of healthcare fraud. We utilize a variety of libraries available in the Python programming language, including Pandas, NumPy, and Matplotlib, together with the scikit-learn (sklearn) library. To assess the effectiveness of the models, a series of performance experiments are conducted on the dataset outlined in Section IV. This dataset comprises both valid and fraudulent healthcare claims. The following subsections describe the validation methodologies and evaluation metrics employed.

A. VALIDATION
To assess the performance of the proposed models, we employ two common methods: Train_Test_Split and Cross-validation. The Train-Test Split method involves dividing the dataset into two separate subsets: a training set and a testing set. This partitioning enables the evaluation of the model's performance on unseen data. In our work, we adopt an 80:20 split ratio, allocating 80% of the data for training and 20% for testing purposes. On the other hand, k-fold Cross-validation is crucial in the context of healthcare fraud detection and involves partitioning the data into five distinct subsets. Cross-validation significantly mitigates false positives and negatives, enhancing the model's accuracy in identifying fraud and ensuring a more robust and reliable evaluation.

B. EVALUATION METRICS
Evaluation metrics are important when it comes to assessing the efficacy of ML models in the detection of healthcare fraud. Accuracy, precision, recall, F1-score, and the Area Under the Curve (AUC) are frequently used metrics. Specifically, the AUC metric plots the true positive rate against the false positive rate at various threshold settings [45]. Additionally, in the context of the imbalanced dataset, we use the Area Under the Precision-Recall Curve (AUPRC), which offers better insight into the classification performance. It measures the relationship between precision and recall and summarizes it in a single value; a higher AUPRC value indicates good performance in correctly identifying positive cases [46]. The rest of the metrics are described as follows:

• True positive (TP): a fraud sample is correctly identified as fraud.
• True negative (TN): a non-fraud sample is correctly identified as non-fraud.
• False positive (FP): a non-fraud sample is incorrectly identified as fraud.
• False negative (FN): a fraud sample is incorrectly identified as non-fraud.
• Total positives (P): TP + FN.
• Total negatives (N): TN + FP.
• Accuracy: represents the percentage of occurrences that are correctly classified. It is calculated using the following formula [47]:

$$\text{Accuracy} = \frac{TP + TN}{P + N} \quad (2)$$

• Recall: also referred to as sensitivity or the true positive rate, it measures the proportion of correctly classified instances in the positive class. It is computed using the following formula [48]:

$$\text{Recall} = \frac{TP}{TP + FN} \quad (3)$$

• Precision: indicates the ratio of the samples predicted as positive that are actually fraud. It is calculated as follows [49]:

$$\text{Precision} = \frac{TP}{TP + FP} \quad (4)$$

• F1-score: the harmonic mean of Precision and Recall. It is computed using the following formula [50]:

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (5)$$

C. RESULTS
The objective of our study is to assess different models and their implications for enhancing fraud detection within the healthcare industry. This section provides a comprehensive analysis and discussion of the outcomes obtained through the implementation of the suggested approach on the Medicare Part B dataset. We apply Logistic Regression (LR), Decision Trees (DT), Random Forest (RF), XGBoost, AdaBoost, and LGBM as classifiers, with accuracy, F1-score, precision, and recall as evaluation metrics.

1) Classification results at baseline
Table 3 presents the results obtained by the different ML methods for detecting healthcare fraud using the Train_Test_split method. While the classifiers show a remarkable accuracy of 0.9999, they exhibit shortcomings in reliably identifying positive instances, as demonstrated by precision, recall, and F1-score values at or near 0.0000. The AUC values range from a low of 0.5030 for DT to a high of 0.8337 for XGBoost; these values indicate only a moderate capability for distinguishing between the two classes. Table 4 displays the baseline classification results of the various ML algorithms using Cross-validation. All the classifiers achieve a near-perfect accuracy of 0.9999, which at first glance suggests good classification of the instances. Nevertheless, the F1-score, precision, and recall metrics for all classifiers consistently exhibit a value of 0.0000. For the AUC values, XGBoost achieves the highest at 0.7444 and RF the lowest at 0.4966; these AUC values still present a significant challenge to effectively discriminating between the classes.

TABLE 3. Baseline classification using Train_Test_split

Classifier | Accuracy | F1-Score | Precision | Recall | AUC
LR | 0.9999 | 0.0000 | 0.0000 | 0.0000 | 0.6200
DT | 0.9999 | 0.0045 | 0.0062 | 0.0052 | 0.5030
RF | 0.9999 | 0.0000 | 0.0000 | 0.0000 | 0.5467
XGBoost | 0.9999 | 0.0000 | 0.0000 | 0.0000 | 0.8337
Adaboost | 0.9999 | 0.0000 | 0.0000 | 0.0000 | 0.8073
LGBM | 0.9998 | 0.0000 | 0.0000 | 0.0000 | 0.7986

TABLE 4. Baseline classification using Cross-validation

Classifier | Accuracy | F1-Score | Precision | Recall | AUC
LR | 0.9999 | 0.0000 | 0.0000 | 0.0000 | 0.6044
DT | 0.9998 | 0.0000 | 0.0000 | 0.0000 | 0.4999
RF | 0.9999 | 0.0000 | 0.0000 | 0.0000 | 0.4966
XGBoost | 0.9990 | 0.0000 | 0.0000 | 0.0000 | 0.7444
Adaboost | 0.9999 | 0.0000 | 0.0000 | 0.0000 | 0.7374
LGBM | 0.9998 | 0.0000 | 0.0000 | 0.0000 | 0.7039

2) Classification results using SMOTE-ENN
Table 5 presents the classification results obtained using SMOTE-ENN and the train-test-split method. We can observe that the DT classifier exhibits the highest performance across all metrics, with accuracy, F1-score, precision, recall, and AUC each at 0.99. XGBoost also presents good results, with an accuracy, F1-score, and AUC of 0.95, a precision of 0.94, and a recall of 0.96. The RF and LGBM classifiers achieve a similar accuracy of 0.90; RF achieves an F1-score of 0.90, a precision of 0.82, an AUC of 0.94, and a high recall of 0.99, whereas LGBM obtains an F1-score of 0.89, a precision of 0.88, and a recall and AUC of 0.90. Conversely, LR and AdaBoost demonstrate relatively poor results. LR obtains an accuracy of 0.65, an F1-score of 0.68, a precision of 0.57, an AUC of 0.67, and a high recall of 0.83. For AdaBoost, the results show a low accuracy of 0.64, an F1-score of 0.69, a precision of 0.56, an AUC of 0.67, and a notably high recall of 0.89.

TABLE 5. Classification results using SMOTE-ENN and Train_Test_split

Classifier | Accuracy | F1-Score | Precision | Recall | AUC
LR | 0.65 | 0.68 | 0.57 | 0.83 | 0.67
DT | 0.99 | 0.99 | 0.99 | 0.99 | 0.99
RF | 0.90 | 0.90 | 0.82 | 0.99 | 0.94
XGBoost | 0.95 | 0.95 | 0.94 | 0.96 | 0.95
Adaboost | 0.64 | 0.69 | 0.56 | 0.89 | 0.67
LGBM | 0.90 | 0.89 | 0.88 | 0.90 | 0.90
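For reference, the metrics reported in these tables can be computed with scikit-learn as in the self-contained sketch below (toy data again stands in for the Medicare features; this is illustrative, not the authors' code). Wrapping the resampler and classifier in an imbalanced-learn `Pipeline` ensures that, under 5-fold cross-validation, SMOTE-ENN is re-fitted on each training fold only, so no synthetic samples leak into the validation folds.

```python
from imblearn.combine import SMOTEENN
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.metrics import average_precision_score, classification_report, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=20_000, n_features=8,
                           weights=[0.995, 0.005], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)

model = Pipeline([("resample", SMOTEENN(random_state=42)),
                  ("clf", DecisionTreeClassifier(random_state=42))])
model.fit(X_tr, y_tr)

proba = model.predict_proba(X_te)[:, 1]
print(classification_report(y_te, model.predict(X_te), digits=4))  # precision, recall, F1, accuracy
print("AUC-ROC:", roc_auc_score(y_te, proba))
print("AUPRC  :", average_precision_score(y_te, proba))            # area under the precision-recall curve

# 5-fold cross-validated AUPRC, mirroring the validation protocol of Section V-A.
print("CV AUPRC:", cross_val_score(model, X, y, cv=5, scoring="average_precision").mean())
```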

[Figure 5 plots the ROC curves (true positive rate vs. false positive rate) on the train-test split: Logistic Regression (area = 0.67), Decision Trees (area = 1.00), Random Forest (area = 0.95), XGBoost (area = 0.96), AdaBoost (area = 0.67), LGBM (area = 0.91).]
FIGURE 5. ROC curve for each model in Train-test-split.

[Figure 6 plots the precision-recall curves on the train-test split: Logistic Regression (AUPRC = 0.74), Decision Trees (AUPRC = 1.00), Random Forest (AUPRC = 0.94), XGBoost (AUPRC = 0.96), AdaBoost (AUPRC = 0.75), LGBM (AUPRC = 0.92).]
FIGURE 6. AUPRC curve for each model in Train-test-split.

DT outperforms the other classifiers, with accuracy, F1-score, and recall of 1.00, precision of 0.99, and an AUC of 0.95. Following this, XGBoost attains remarkable results, with an accuracy, F1-score, precision, and recall of 0.96 and a good AUC value of 0.99. RF presents good results, with 0.95 for all metrics and 0.99 for AUC. LGBM also performs well, with 0.91 accuracy, F1-score, and recall; it further achieves a precision of 0.90 and an AUC of 0.97. Meanwhile, the LR and AdaBoost classifiers attain closely similar outcomes: LR reaches an accuracy and F1-score of 0.65, a precision of 0.69, a recall of 0.67, and an AUC of 0.73, while AdaBoost records an accuracy of 0.65, an F1-score of 0.64, a precision of 0.70, a recall of 0.67, and an AUC of 0.68.

TABLE 6. Classification results using SMOTE-ENN and Cross-validation

Classifier Accuracy F1-Score Precision Recall AUC
LR 0.65 0.65 0.69 0.67 0.73
DT 1.00 1.00 0.99 1.00 0.95
RF 0.95 0.95 0.95 0.95 0.99
XGBoost 0.96 0.96 0.96 0.96 0.99
Adaboost 0.65 0.64 0.70 0.67 0.68
LGBM 0.91 0.91 0.90 0.91 0.97

D. DISCUSSION
This paper introduces a new ML approach, employing the hybrid resampling method SMOTE-ENN to tackle the imbalanced data problem within the Medicare dataset. Additionally, it investigates the distinct treatment of categorical features alongside numerical data to enhance the efficacy of the fraud detection process. This approach demonstrated efficacy compared to traditional techniques such as ROS, RUS, and the basic SMOTE method, particularly in its ability to handle imbalanced data while reducing noise. The experiments show significant performance variations among the different ML models.

1) Discussion of the baseline classification results
The initial baseline results from both the Train_test_split and cross-validation methods exhibit high accuracy (0.9999) across all classifiers. Nevertheless, these high accuracy values are misleading, especially when facing a highly imbalanced dataset. When the majority class dominates the minority class, the models always predict the majority class without actually learning to identify the characteristics of the minority class, in this case the fraudulent instances. This is highlighted by the low precision, recall, and F1-score obtained by all classifiers, underscoring their ineffectiveness in accurately identifying fraud instances. Moreover, while the AUC scores show some improvement, reaching a highest value of 0.8337 with XGBoost, they are still not sufficient for reliable fraud detection.

In Appendix A, we present the baseline classification results obtained with train-test splits at two ratios, specifically 25:75 and 30:70. The baseline classifications for the 25:75 and 30:70 ratios showed high accuracy for the LR, DT, RF, XGBoost, Adaboost, and LGBM classifiers. However, they had notably low F1-score, precision, recall, and AUC values, suggesting a limited ability to predict the minority class effectively. These findings indicate that a severe class imbalance can significantly impact the results of the classification task.

2) Discussion of the classification results using SMOTE-ENN
The SMOTE-ENN technique, which generates new minority class instances and removes overlapping samples from the dataset, has varying effects on the different classifiers. It notably improves tree-based and ensemble methods but has limited influence on LR and AdaBoost.
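Before examining the per-classifier behaviour in detail, the following minimal Python sketch illustrates how this resampling step can be wired into the evaluation. It is not the paper's implementation: it assumes the scikit-learn and imbalanced-learn packages, and the make_classification call merely stands in for the preprocessed Medicare Part B features. Placing SMOTE-ENN inside an imbalanced-learn Pipeline ensures that resampling is re-fit on the training folds only, so the validation folds keep their original class distribution.

# Minimal sketch: SMOTE-ENN + Decision Tree under stratified cross-validation
# (illustrative only; synthetic data stands in for the Medicare Part B features).
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier
from imblearn.combine import SMOTEENN
from imblearn.pipeline import Pipeline

X, y = make_classification(n_samples=20_000, n_features=20,
                           weights=[0.99, 0.01], random_state=42)

model = Pipeline(steps=[
    ("resample", SMOTEENN(random_state=42)),          # oversample minority class, then clean noisy samples
    ("clf", DecisionTreeClassifier(random_state=42)),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(model, X, y, cv=cv,
                        scoring=["f1", "precision", "recall",
                                 "roc_auc", "average_precision"])
for metric, values in scores.items():
    if metric.startswith("test_"):
        print(f"{metric}: {values.mean():.3f}")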

FIGURE 7. ROC curve for each model in Cross-validation (AUC: Decision Tree = 1.00, Random Forest = 1.00, Logistic Regression = 0.73, XGBoost = 0.99, AdaBoost = 0.68, LGBM = 0.97).

FIGURE 8. AUPRC curve for each model in Cross-validation.

For instance, DT exhibits near-perfect results of 0.99 across all metrics with Train_test_split. Similarly, with cross-validation, the model attains perfect scores of 1.00 for accuracy, F1-score, and recall. These strong results affirm the effectiveness of the proposed approach in boosting the classifiers' performance on imbalanced data sets. The difference in how the algorithms handle imbalanced data and synthesized instances is due to LR's linearity and AdaBoost's sensitivity to noise, which prevent them from fully exploiting the benefits of over-sampling. Our strategy focuses on enhancing fraud detection by adjusting the class distribution and strengthening the models' capacity to learn from the minority class. Creating new fraudulent instances and removing noisy data helps improve generalization from training to unseen data, enhancing the effectiveness of fraud detection across the different classification methods.

3) Discussion of the AUPRC curve
Another important point when working with imbalanced data is the analysis of the ROC-AUC and AUPRC curves, which play a crucial role in understanding the performance of ML models on imbalanced datasets for healthcare fraud detection. The ROC curves in Figures 5 and 7 reinforce our initial findings, notably highlighting the perfect results of DT. This consistency between the ROC curves and our initial findings provides a comprehensive validation of the models' performance, particularly in the context of imbalanced datasets in healthcare fraud detection. Moreover, the AUPRC curves in Figures 6 and 8, which are especially relevant in the case of imbalanced data, confirm the obtained results. These findings from the ROC-AUC and AUPRC curves are important for understanding our models, since they give a clear overview of each model's strengths and weaknesses.
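To make the role of these two curves concrete, the short Python sketch below computes both the ROC-AUC and the AUPRC (average precision) on a held-out test set with scikit-learn. The synthetic data and the logistic model are placeholders for the classifiers evaluated above; roc_curve and precision_recall_curve return the points from which plots in the style of Figures 5-8 can be drawn.

# Minimal sketch: ROC-AUC versus AUPRC on an imbalanced test set (illustrative only).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (average_precision_score, precision_recall_curve,
                             roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, weights=[0.99, 0.01], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

proba = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

# ROC-AUC can look optimistic under heavy imbalance because the false positive rate
# is diluted by the large majority class; AUPRC focuses on the minority (fraud) class.
print("ROC-AUC:", roc_auc_score(y_test, proba))
print("AUPRC  :", average_precision_score(y_test, proba))

fpr, tpr, _ = roc_curve(y_test, proba)                        # points for a ROC plot
precision, recall, _ = precision_recall_curve(y_test, proba)  # points for a PR plot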
4) Comparison with State-of-the-Art
Our research significantly contributes to the healthcare fraud field by addressing the imbalanced data problem. By utilizing the SMOTE-ENN technique, along with the generation of the categorical feature, we have enhanced the ability of ML models to accurately identify instances of fraud. This method surpasses traditional techniques such as ROS, RUS, and basic SMOTE, as well as other studies that focus on different sampling techniques or embeddings. Our research therefore contributes to the ongoing efforts to develop effective systems for detecting fraud in healthcare.

Tables 10, 11, and 12 in Appendix B present the results obtained by the different classifiers (LR, DT, RF, XGBoost, Adaboost, and LGBM) using three distinct data sampling methods, RUS, ROS, and SMOTE, combined with Train_Test_split for the classification task. For instance, when using RUS, the performance measures show a respectable level of accuracy, with the LGBM classifier slightly outperforming the others in accuracy (0.74), recall (0.77), and AUC (0.75). Nevertheless, all classifiers fail to achieve useful precision and F1-score, with both metrics registering zero. RUS may enhance model sensitivity but significantly reduces precision, resulting in a high rate of false positives. Conversely, ROS significantly improves accuracy for the RF and XGBoost models but fails to enhance precision or recall for minority class predictions, revealing a crucial limitation in detecting the minority class. SMOTE provides the highest accuracy, especially for DT and RF, yet does not address the problem of near-zero precision and recall.

SMOTE-ENN excels at handling imbalanced datasets by both balancing the classes and removing noisy data, surpassing approaches such as RUS, ROS, and SMOTE.
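The sketch below shows how such a side-by-side comparison can be scripted: it loops RUS, ROS, SMOTE, and SMOTE-ENN over the same train/test split with a fixed classifier. It assumes scikit-learn and imbalanced-learn, uses a Random Forest purely as an example, and runs on synthetic placeholder data rather than the Medicare Part B claims, so the printed numbers will not match Tables 7 and 10-12.

# Minimal sketch: comparing resampling strategies with one fixed classifier (illustrative only).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from imblearn.combine import SMOTEENN
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler

X, y = make_classification(n_samples=20_000, weights=[0.99, 0.01], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)

samplers = {
    "RUS": RandomUnderSampler(random_state=1),
    "ROS": RandomOverSampler(random_state=1),
    "SMOTE": SMOTE(random_state=1),
    "SMOTE-ENN": SMOTEENN(random_state=1),
}

for name, sampler in samplers.items():
    X_res, y_res = sampler.fit_resample(X_tr, y_tr)        # resample the training split only
    clf = RandomForestClassifier(random_state=1).fit(X_res, y_res)
    pred = clf.predict(X_te)
    print(name,
          "F1:", round(f1_score(y_te, pred, zero_division=0), 3),
          "Precision:", round(precision_score(y_te, pred, zero_division=0), 3),
          "Recall:", round(recall_score(y_te, pred), 3),
          "AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))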

FIGURE 9. Comparison with traditional methods (Accuracy, F1-Score, Precision, Recall, and AUC scores for RUS, ROS, SMOTE, and our work).

SMOTE-ENN thus effectively addresses the limitations of the individual balancing methods, providing a more sophisticated approach to improving the classifiers' performance in the presence of class imbalance. Table 7 compares the best outcomes achieved by each method, specifically RUS, ROS, SMOTE, and our proposed methodology. Our methodology outperformed the standard methods in all evaluated measures, demonstrating its effectiveness.

Figure 9 illustrates the comparison between RUS, ROS, SMOTE, and our proposed approach, highlighting the superior performance of our technique in terms of accuracy, F1-score, precision, recall, and AUC, which significantly surpasses the traditional methods.

TABLE 7. Comparison with traditional methods

Method Accuracy F1-Score Precision Recall AUC
RUS 0.74 0.00 0.00 0.77 0.75
ROS 0.99 0.00 0.00 0.00 0.50
SMOTE 0.99 0.00 0.00 0.01 0.50
Our work 0.99 0.99 0.99 0.99 0.99

5) Limitations
Despite these results, our research methodology has certain limitations that need to be addressed. Although the approach demonstrates remarkable results, they were not uniformly observed across all ML models. In addition, the study relies on the Medicare Part B dataset, which might limit the generalizability of our findings to other datasets. Future work could explore several promising directions. One direction is the exploration of a new dataset, such as Medicare Part D, which encompasses prescription drug benefits and presents a different set of challenges and fraud patterns compared to other parts of Medicare. Moreover, advanced deep learning models such as Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs) show potential for addressing imbalanced datasets [15]. These models are adept at capturing intricate patterns in extensive datasets, making them especially well-suited for applications where the minority class is vital, such as fraud detection. LSTM networks, a type of RNN, are highly effective at assessing sequential data, such as time-series medical billing information or patient treatment sequences, to identify anomalies or fraudulent trends [51]. Similarly, CNNs excel at handling class imbalance by leveraging their feature extraction capabilities, particularly in high-dimensional data spaces such as images and signals. This makes them appropriate for complex tasks that challenge traditional techniques, such as medical imaging for diagnosing rare conditions among many normal cases. Their hierarchical learning efficiently reveals important patterns in imbalanced datasets, demonstrating their adaptability in various fields, such as natural language processing and healthcare diagnostics [52]. By exploring these avenues, future research can build upon our findings, potentially leading to more robust and comprehensive fraud detection systems in healthcare.
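Purely to illustrate this future direction (it is not part of the present study's pipeline), the PyTorch sketch below shows one common way of biasing a deep sequence model toward the rare fraud class: weighting the positive class in the loss function. The small LSTM, the tensors X and y, the feature dimensions, and the roughly 1% positive rate are hypothetical placeholders.

# Minimal sketch: a class-weighted LSTM for an imbalanced binary problem (illustrative only).
import torch
import torch.nn as nn

class FraudLSTM(nn.Module):
    """Tiny LSTM classifier over sequences of claim features."""
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)           # one logit per sequence

    def forward(self, x):                           # x: (batch, seq_len, n_features)
        _, (h_n, _) = self.lstm(x)                  # h_n: (1, batch, hidden)
        return self.head(h_n[-1]).squeeze(-1)

torch.manual_seed(0)
X = torch.randn(256, 12, 8)                         # 256 hypothetical billing sequences
y = (torch.rand(256) < 0.01).float()                # ~1% positives to mimic severe imbalance

# Weight the positive (fraud) class by the inverse class ratio so the rare class
# contributes comparably to the loss.
pos_weight = (y == 0).sum() / (y == 1).sum().clamp(min=1)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

model = FraudLSTM(n_features=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):                              # short illustrative training loop
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()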
VI. CONCLUSION
This study emphasizes the need to address imbalanced data in healthcare fraud detection by introducing a novel ML framework based on the SMOTE-ENN hybrid resampling method. This method effectively balances datasets by creating synthetic samples while eliminating noisy data, thereby enhancing the models' accuracy. Another aspect of our study is the application of the AUC and AUPRC as evaluation metrics. These metrics facilitated a thorough analysis of the models' performance, with the AUPRC proving to be especially critical in the context of imbalanced datasets. This approach thus serves as a basis on which other researchers can build new approaches for detecting healthcare fraud.

Future research directions include evaluating SMOTE-ENN's performance in diverse healthcare fraud scenarios and combining it with innovative AI technologies such as deep learning (DL) to enhance the effectiveness of fraud detection methods.

APPENDIX A BASELINE CLASSIFICATION WITH DIFFERENT TRAIN_TEST_SPLIT RATIOS
This appendix provides two tables demonstrating the performance of six different classifiers (LR, DT, RF, XGBoost, Adaboost, and LGBM) across two train-test split ratios (25:75 and 30:70) on the Medicare Part B dataset. Tables 8 and 9 present a comparison across various metrics: Accuracy, F1-Score, Precision, Recall, and AUC.

TABLE 8. Baseline classification using 25:75 ratio Train_Test_split

Classifier Accuracy F1-Score Precision Recall AUC
LR 0.9999 0.0000 0.0000 0.0000 0.4999
DT 0.9999 0.0045 0.0062 0.0052 0.4999
RF 0.9999 0.0000 0.0000 0.0000 0.4999
XGBoost 0.9999 0.0000 0.0000 0.0000 0.5
Adaboost 0.9999 0.0000 0.0000 0.0000 0.4999
LGBM 0.9998 0.0000 0.0000 0.0000 0.4999

TABLE 9. Baseline classification using 30:70 ratio Train_Test_split

Classifier Accuracy F1-Score Precision Recall AUC
LR 0.9999 0.0000 0.0000 0.0000 0.4999
DT 0.9999 0.0003 0.0003 0.0004 0.5
RF 0.9999 0.0000 0.0000 0.0000 0.4999
XGBoost 0.9999 0.0000 0.0000 0.0000 0.5
Adaboost 0.9999 0.0000 0.0000 0.0000 0.4999
LGBM 0.9998 0.0003 0.0003 0.0004 0.502
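For completeness, the sketch below indicates how baseline metrics at the two split ratios can be produced with scikit-learn. X and y stand in for the preprocessed Medicare Part B features, Logistic Regression is used only as an example, and the 25:75 and 30:70 ratios are interpreted here as test:train proportions (adjust test_size if the reverse split is intended).

# Minimal sketch: baseline (no resampling) evaluation at two split ratios (illustrative only).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=50_000, weights=[0.999, 0.001], random_state=7)

for test_size in (0.25, 0.30):                     # assumed 25:75 and 30:70 test:train ratios
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=7)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)   # baseline: no resampling
    pred = clf.predict(X_te)
    print(f"test_size={test_size}:",
          "Accuracy:", round(accuracy_score(y_te, pred), 4),
          "F1:", round(f1_score(y_te, pred, zero_division=0), 4),
          "AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 4))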
APPENDIX B CLASSIFICATION RESULTS USING STATE-OF-THE-ART RESAMPLING METHODS
This appendix presents the classification results obtained with the six algorithms and the traditional resampling methods, namely RUS, ROS, and SMOTE. Table 10 presents the classification results using the RUS method; Table 11 shows the experiment results using ROS; and Table 12 outlines the results obtained using the SMOTE method.

TABLE 10. Classification results using RUS and Train_Test_split

Classifier Accuracy F1-Score Precision Recall AUC
LR 0.45 0.00 0.00 0.85 0.65
DT 0.69 0.00 0.00 0.66 0.67
RF 0.72 0.00 0.00 0.77 0.75
XGBoost 0.73 0.00 0.00 0.75 0.74
Adaboost 0.68 0.00 0.00 0.76 0.72
LGBM 0.74 0.00 0.00 0.77 0.75

TABLE 11. Classification results using ROS and Train_Test_split

Classifier Accuracy F1-Score Precision Recall AUC
LR 0.52 0.00 0.00 0.83 0.67
DT 0.52 0.00 0.00 0.66 0.67
RF 0.99 0.00 0.00 0.00 0.5
XGBoost 0.96 0.00 0.00 0.21 0.58
Adaboost 0.71 0.00 0.00 0.75 0.72
LGBM 0.74 0.00 0.00 0.77 0.75

TABLE 12. Classification results using SMOTE and Train_Test_split

Classifier Accuracy F1-Score Precision Recall AUC
LR 0.52 0.00 0.00 0.74 0.63
DT 0.99 0.00 0.00 0.01 0.50
RF 0.99 0.00 0.00 0.00 0.49
XGBoost 0.95 0.00 0.00 0.21 0.58
Adaboost 0.67 0.00 0.00 0.70 0.69
LGBM 0.88 0.00 0.00 0.37 0.62

REFERENCES
[1] L. Morris, “Combating fraud in health care: an essential component of any cost containment strategy,” Health Affairs, vol. 28, no. 5, pp. 1351–1356, 2009.
[2] J. T. Hancock, R. A. Bauder, H. Wang, and T. M. Khoshgoftaar, “Explainable machine learning models for medicare fraud detection,” Journal of Big Data, vol. 10, no. 1, p. 154, 2023.
[3] A. Alanazi, “Using machine learning for healthcare challenges and opportunities,” Informatics in Medicine Unlocked, vol. 30, p. 100924, 2022.
[4] R. A. Bauder and T. M. Khoshgoftaar, “The detection of medicare fraud using machine learning methods with excluded provider labels,” in The Thirty-First International Flairs Conference, 2018.
[5] R. A. Bauder and T. M. Khoshgoftaar, “Medicare fraud detection using machine learning methods,” in 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, Dec. 2017, pp. 858–865. [Online]. Available: http://ieeexplore.ieee.org/document/8260744/
[6] V. Nalluri, J.-R. Chang, L.-S. Chen, and J.-C. Chen, “Building prediction models and discovering important factors of health insurance fraud using machine learning methods,” Journal of Ambient Intelligence and Humanized Computing, vol. 14, no. 7, pp. 9607–9619, 2023.
[7] P. Dua and S. Bais, “Supervised learning methods for fraud detection in healthcare insurance,” Machine Learning in Healthcare Informatics, pp. 261–285, 2014.
[8] R. Bauder, R. da Rosa, and T. Khoshgoftaar, “Identifying medicare provider fraud with unsupervised machine learning,” in 2018 IEEE International Conference on Information Reuse and Integration (IRI). IEEE, 2018, pp. 285–292.
[9] Centers for Medicare and Medicaid Services, “Research, statistics, data, and systems,” 2017. [Online]. Available: https://www.cms.gov/research-statistics-data-and-systems/research-statistics-data-and-systems.html
[10] P. Brennan, “A comprehensive survey of methods for overcoming the class imbalance problem in fraud detection,” Institute of Technology Blanchardstown, Dublin, Ireland, 2012.
[11] N. Agrawal and S. Panigrahi, “A comparative analysis of fraud detection in healthcare using data balancing & machine learning techniques,” in 2023 International Conference on Communication, Circuits, and Systems (IC3S). IEEE, 2023, pp. 1–4.
[12] M. Herland, R. A. Bauder, and T. M. Khoshgoftaar, “The effects of class rarity on the evaluation of supervised healthcare fraud detection models,” Journal of Big Data, vol. 6, pp. 1–33, 2019.
[13] J. Hancock, T. M. Khoshgoftaar, and J. M. Johnson, “The effects of random undersampling for big data medicare fraud detection,” in 2022 IEEE International Conference on Service-Oriented System Engineering (SOSE). IEEE, 2022, pp. 141–146.
[14] A. Mehbodniya, I. Alam, S. Pande, R. Neware, K. P. Rane, M. Shabaz, and M. V. Madhavan, “Financial fraud detection in healthcare using machine learning and deep learning techniques,” Security and Communication Networks, vol. 2021, pp. 1–8, 2021.
[15] G. Haixiang, L. Yijing, J. Shang, G. Mingyun, H. Yuanyue, and G. Bing, “Learning from class-imbalanced data: Review of methods and applications,” Expert Systems with Applications, vol. 73, pp. 220–239, 2017.
[16] J. Hancock and T. M. Khoshgoftaar, “Optimizing ensemble trees for big data healthcare fraud detection,” in 2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science (IRI). IEEE, 2022, pp. 243–249.
[17] N. Kumaraswamy, M. K. Markey, J. C. Barner, and K. Rascati, “Feature engineering to detect fraud using healthcare claims data,” Expert Systems with Applications, vol. 210, p. 118433, 2022.
[18] N. Kumaraswamy, T. Ekin, C. Park, M. K. Markey, J. C. Barner, and K. Rascati, “Using a bayesian belief network to detect healthcare fraud,” Expert Systems with Applications, p. 122241, 2023.
[19] J. M. Johnson and T. M. Khoshgoftaar, “Data-centric ai for healthcare fraud detection,” SN Computer Science, vol. 4, no. 4, p. 389, 2023.
[20] R. A. Bauder and T. M. Khoshgoftaar, “The effects of varying class distribution on learner behavior for medicare fraud detection with imbalanced big data,” Health Information Science and Systems, vol. 6, pp. 1–14, 2018.
[21] R. A. Bauder, T. M. Khoshgoftaar, and T. Hasanin, “Data sampling approaches with severely imbalanced big data for medicare fraud detection,” in 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 2018, pp. 137–142.
[22] J. M. Johnson and T. M. Khoshgoftaar, “Hcpcs2vec: Healthcare procedure embeddings for medicare fraud prediction,” in 2020 IEEE 6th International Conference on Collaboration and Internet Computing (CIC), 2020, pp. 145–152.
[23] J. M. Johnson and T. M. Khoshgoftaar, “Medical provider embeddings for healthcare fraud detection,” SN Computer Science, vol. 2, no. 4, p. 276, Jul. 2021. [Online]. Available: https://link.springer.com/10.1007/s42979-021-00656-y
[24] M. Suesserman, S. Gorny, D. Lasaga, J. Helms, D. Olson, E. Bowen, and S. Bhattacharya, “Procedure code overutilization detection from healthcare claims using unsupervised deep learning methods,” BMC Medical Informatics and Decision Making, vol. 23, no. 1, p. 196, 2023.
[25] J. T. Hancock, T. M. Khoshgoftaar, and J. M. Johnson, “Evaluating classifier performance with highly imbalanced big data,” Journal of Big Data, vol. 10, no. 1, p. 42, 2023.
[26] J. T. Hancock III and T. M. Khoshgoftaar, “Exploring maximum tree depth and random undersampling in ensemble trees to optimize the classification of imbalanced big data,” SN Computer Science, vol. 4, no. 5, p. 462, 2023.
[27] CMS, “Medicare Physician & Other Practitioners - by Provider - Centers for Medicare & Medicaid Services Data.” [Online]. Available: https://data.cms.gov/provider-summary-by-type-of-service/medicare-physician-other-practitioners/medicare-physician-other-practitioners-by-provider
[28] OIG, “LEIE Downloadable Databases | Office of Inspector General | U.S. Department of Health and Human Services.” [Online]. Available: https://oig.hhs.gov/exclusions/exclusions_list.asp
[29] CMS, “Medicare Physician & Other Practitioners Methodology - Centers for Medicare & Medicaid Services Data.” [Online]. Available: https://data.cms.gov/resources/medicare-physician-other-practitioners-methodology
[30] R. Bauder and T. Khoshgoftaar, “Medicare fraud detection using random forest with class imbalanced big data,” in 2018 IEEE International Conference on Information Reuse and Integration (IRI). IEEE, 2018, pp. 80–87.
[31] J. Hancock and T. M. Khoshgoftaar, “Medicare fraud detection using catboost,” in 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI). IEEE, 2020, pp. 97–103.
[32] M. Herland, T. M. Khoshgoftaar, and R. A. Bauder, “Big data fraud detection using multiple medicare data sources,” Journal of Big Data, vol. 5, no. 1, pp. 1–21, 2018.
[33] M. Rashid, J. Kamruzzaman, T. Imam, S. Wibowo, and S. Gordon, “A tree-based stacking ensemble technique with feature selection for network intrusion detection,” Applied Intelligence, vol. 52, no. 9, pp. 9768–9781, 2022.
[34] G. E. Batista, R. C. Prati, and M. C. Monard, “A study of the behavior of several methods for balancing machine learning training data,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.
[35] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “Smote: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[36] I. D. Mienye and Y. Sun, “A deep learning ensemble with data resampling for credit card fraud detection,” IEEE Access, vol. 11, pp. 30628–30638, 2023.
[37] J. Ye, J.-H. Chow, J. Chen, and Z. Zheng, “Stochastic gradient boosted distributed decision trees,” in Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009, pp. 2061–2064.
[38] T. Chen, T. He, M. Benesty, V. Khotilovich, Y. Tang, H. Cho, K. Chen, R. Mitchell, I. Cano, T. Zhou et al., “Xgboost: extreme gradient boosting,” R package version 0.4-2, vol. 1, no. 4, pp. 1–4, 2015.
[39] C. Ying, M. Qi-Guang, L. Jia-Chen, and G. Lin, “Advance and prospects of adaboost algorithm,” Acta Automatica Sinica, vol. 39, no. 6, pp. 745–758, 2013.
[40] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, “Lightgbm: A highly efficient gradient boosting decision tree,” Advances in Neural Information Processing Systems, vol. 30, 2017.
[41] S. B. Kotsiantis, “Decision trees: a recent overview,” Artificial Intelligence Review, vol. 39, pp. 261–283, 2013.
[42] S. Lemeshow, R. X. Sturdivant, and D. W. Hosmer Jr., Applied Logistic Regression. John Wiley & Sons, 2013.
[43] G. Biau and E. Scornet, “A random forest guided tour,” Test, vol. 25, pp. 197–227, 2016.
[44] J. T. Hancock and T. M. Khoshgoftaar, “Gradient boosted decision tree algorithms for medicare fraud detection,” SN Computer Science, vol. 2, no. 4, p. 268, 2021.
[45] S. Wu and P. Flach, “A scored auc metric for classifier evaluation and selection,” in Second Workshop on ROC Analysis in ML, Bonn, Germany, 2005.
[46] K. Boyd, K. H. Eng, and C. D. Page, “Area under the precision-recall curve: point estimates and confidence intervals,” in Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2013, Prague, Czech Republic, September 23-27, 2013, Proceedings, Part III. Springer, 2013, pp. 451–466.
[47] P. Y. Prasad, A. S. Chowdarv, C. Bavitha, E. Mounisha, and C. Reethika, “A comparison study of fraud detection in usage of credit cards using machine learning,” in 2023 7th International Conference on Trends in Electronics and Informatics (ICOEI). IEEE, 2023, pp. 1204–1209.
[48] M. Bekkar, H. K. Djemaa, and T. A. Alitouche, “Evaluation measures for models assessment over imbalanced data sets,” J Inf Eng Appl, vol. 3, no. 10, 2013.
[49] P. Gupta, A. Varshney, M. R. Khan, R. Ahmed, M. Shuaib, and S. Alam, “Unbalanced credit card fraud detection data: A machine learning-oriented comparative study of balancing techniques,” Procedia Computer Science, vol. 218, pp. 2575–2584, 2023.
[50] B. Guelib, K. Zarour, H. Hermessi, R. Bounab, and K. Nawres, “Same-subject-modalities-interactions: A novel framework for mri and pet multimodality fusion for alzheimer’s disease classification,” IEEE Access, vol. 11, pp. 48715–48738, 2023.
[51] R. Ghosh, S. Phadikar, N. Deb, N. Sinha, P. Das, and E. Ghaderpour, “Automatic eyeblink and muscular artifact detection and removal from eeg signals using k-nearest neighbor classifier and long short-term memory networks,” IEEE Sensors Journal, vol. 23, no. 5, pp. 5422–5436, 2023.
[52] D. Dablain, K. N. Jacobson, C. Bellinger, M. Roberts, and N. V. Chawla, “Understanding cnn fragility when learning with imbalanced data,” Machine Learning, pp. 1–26, 2023.

RAYENE BOUNAB received her computer science degree from the University of Abdelhamid Mehri Constantine 2 in 2016. She continued her studies at the same university and received her Master’s degree within the Faculty of ‘Nouvelles Technologies de l’Information et de la Communication’ (NTIC) in 2019. She is now a PhD student at the Department of TLSI, LIRE Laboratory, University of Abdelhamid Mehri Constantine 2, where she works on machine learning for healthcare fraud detection.


KARIM ZAROUR is a Professor of Computer Science at the University Constantine 2, Algeria. He received his Ph.D. degree from Mentouri University of Constantine and his habilitation qualification from Abdelhamid Mehri University. He supervises many PhD and Master’s students and has published many articles in international journals and conferences. His current research interests include health informatics, AI, privacy and security in healthcare, multi-agent systems, and the Cloud.

BOUCHRA GUELIB received her B.S. degree in 2015 from the University of Abdelhamid Mehri Constantine 2, within the Faculty of ‘Nouvelles Technologies de l’Information et de la Communication’ (NTIC). In 2018, she received her M.S. degree in Information Systems from the same university. She is now a PhD student at the Department of TLSI, LIRE Laboratory, University of Abdelhamid Mehri Constantine 2, working on multimodal fusion using machine learning. She is interested in medical image processing, multimodal fusion, and the bioinformatics field.

NAWRES KHLIFA received her engineering degree and Ph.D. from the National School of Engineers of Tunis (ENIT). She is currently a Professor at the Higher Institute of Medical Technologies of Tunis, Tunis El Manar University, where she coordinates the TIMEd team (Medical Image Processing) of the BTM Laboratory. Her research work focuses on Artificial Intelligence and CAD design in medical imaging, emotion recognition, and gaze tracking.
