
Received 13 July 2023, accepted 26 July 2023, date of publication 7 August 2023, date of current version 11 August 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3302339

Credit Card Fraud Detection Based on Improved Variational Autoencoder Generative Adversarial Network
YUANMING DING, WEI KANG, JIANXIN FENG, BO PENG, AND ANNA YANG
Communication and Network Key Laboratory, Dalian University, Dalian 116622, China
Corresponding author: Yuanming Ding ([email protected])
This work was supported in part by the National Natural Science Foundation of China under Grant 61901079.

ABSTRACT The rapid spread of mobile banking and e-commerce has coincided with a dramatic increase in fraudulent online payments in recent years. Although machine learning and deep learning are widely used in credit card fraud detection, typical credit card transaction datasets are imbalanced: fraudulent transactions are far fewer than normal transactions, which limits the effectiveness of traditional binary classification algorithms. To overcome this issue, researchers oversample the minority class and use ensemble learning classification algorithms. However, existing oversampling methods still have shortcomings. Hence, we improve the generator part of the Variational Autoencoder Generative Adversarial Network (VAEGAN) and propose a new oversampling method that generates convincing and diverse minority class data. The training set is enhanced with the generated minority class (fraud) data and then used to train the ensemble learning classification model. The method is tested on a public credit card dataset, and the experimental results demonstrate that oversampling with the improved VAEGAN is superior to oversampling with a Generative Adversarial Network (GAN), a Variational Autoencoder (VAE), or the Synthetic Minority Oversampling Technique (SMOTE) in terms of Precision, F1_score, and other indicators. The oversampling method based on the improved VAEGAN effectively deals with the classification problem of imbalanced data.

INDEX TERMS Credit card fraud, ensemble learning, variational autoencoder generative adversarial
network, oversampling.

The associate editor coordinating the review of this manuscript and approving it for publication was Tyson Brooks.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
83680 VOLUME 11, 2023
Y. Ding et al.: Credit Card Fraud Detection Based on Improved VAEGAN

I. INTRODUCTION
Imbalanced data refers to the situation where the number of samples of different classes in a dataset varies significantly. For example, a binary classification dataset is imbalanced if the number of positive samples is much smaller than the number of negative samples. The class with many samples is usually called the majority class, and the class with few samples is the minority class [1]. In practical applications, imbalanced datasets appear in various fields, such as medicine, natural language processing, image recognition, industrial defect detection, and finance [2]. In the financial field, the incidence of fraudulent transactions is very low, so fraud detection datasets contain far fewer positive samples than negative samples. Because positive samples are scarce in an imbalanced dataset, a model may tend to predict the negative class and ignore the positive class, which degrades classification performance, especially for the minority class [3]. At the same time, the model's generalization performance declines and its performance evaluation becomes biased [4], [5].
The credit card fraud detection problem studied in this paper is a classification problem on imbalanced data. Credit card fraud detection refers to identifying and preventing fraudulent behavior in credit card transactions based on relevant characteristic variables in the customer's past transaction records. Although fraudulent transactions are a minority, the losses caused by misjudging fraudulent transactions are often greater than those caused by misjudging non-fraudulent transactions [6]. Currently, the main approaches to credit card fraud detection are supervised machine learning algorithms [6], [7], [8], semi-supervised machine learning algorithms [9], [10], and unsupervised machine learning algorithms [8]. There are two main strategies for addressing the class imbalance problem in credit card fraud detection. At the data processing level, oversampling and undersampling techniques are used to balance the original data [11], [12]; at the algorithm level, ensemble learning and cost-sensitive learning further improve the effectiveness of classifiers [8], [13]. However, although many studies exist, they have various problems and require further refinement and improvement.
This paper uses supervised learning, the most widely used approach, to detect fraud in credit card transaction data [14]; it compares and analyzes the classification performance of five classification algorithms and selects the classification model with the best precision and F1 score [15]. To mitigate the negative impact of data imbalance, researchers use undersampling or oversampling methods to improve fraud detection classification results [16], [17], [18], [19]. Undersampling reduces the number of majority class samples, which discards useful hidden information and thus degrades the model's classification performance. Typically, researchers therefore adopt oversampling [20], [21].
This paper uses an improved Variational Autoencoder Generative Adversarial Network (VAEGAN) deep learning method to generate positive (fraud) data. Specifically, the minority class data in the original training set serve as the training set of the deep learning model, which then generates synthetic minority class data. The generated data and the original training set are combined to form an enhanced training set. Experimental evaluations demonstrate that classification models trained on the augmented training set achieve better classification performance than models trained solely on the original training set. Although our framework is developed for credit card fraud detection, it is general and can be extended to other application domains.
The remainder of this paper is structured as follows. Section II reviews related work on credit card fraud detection, and Section III presents the theoretical background of the models. Section IV elaborates on the research methodology, and Section V discusses the experiments. Finally, Section VI summarizes our objectives and findings.

II. RELATED WORK
Credit card fraud detection has long been a concern for many researchers, and supervised learning methods based on machine learning and deep learning are widely applied to it. To reduce the impact of class imbalance in credit card data on classification results, researchers have proposed two kinds of solutions: one improves the classifier or selects a better-performing classification model, and the other deals with the imbalanced data directly. In [22], the authors combined manual and automatic classification, compared different machine learning algorithms, and used data mining techniques to solve fraud detection and similar problems. In [23], eight machine learning algorithms were compared for credit card fraud detection, and Logistic Regression (LR), the C5.0 decision tree algorithm, and the Support Vector Machine (SVM) were selected as the final classification methods. In [13], researchers compared two random forests with different base classifiers and analyzed their credit card fraud detection performance. Other solutions apply artificial neural networks to credit card fraud detection. For instance, Asha RB [24] used various machine learning algorithms and an artificial neural network (ANN) to predict the occurrence of fraud, and the experimental results show that this approach achieves higher accuracy than unsupervised learning. The work of [25] formulated fraud detection as a sequence classification task and used a long short-term memory (LSTM) network to incorporate transaction sequences.
For the data imbalance problem, studies have shown that oversampling and undersampling methods perform well with ensemble classification models such as AdaBoost, XGBoost, and Random Forest [26]. However, the All K-Nearest Neighbors (AllKNN) undersampling technique proposed in [27], although it improved classification performance on some indicators, discarded important information in the data, leading to flawed trained classifiers [28]. Currently, oversampling has become the main data preprocessing method for imbalanced data; for example, [29] oversampled the minority class using SMOTE. In addition, Majzoub et al. [30] proposed the Hybrid Cluster Affinity Boundary Line SMOTE (HCAB-SMOTE) oversampling technique, which improves on SMOTE. Recently, deep learning models have also been applied to data oversampling. Fiore et al. [31] expanded the credit card fraud data by using a GAN to generate synthetic fraud samples, and their results show that this method is superior to SMOTE oversampling. Additionally, Tingfei et al. [32] proposed using a VAE model as an oversampling module to augment the original training data with generated data; their experimental results show that VAE oversampling slightly outperformed the GAN.
This work employs the VAEGAN model as an oversampling module that rebalances the training set by generating synthetic minority class data and injecting them into the original training set. In addition, we improve and optimize the VAEGAN model, enhancing its expressive ability so that it outputs more realistic and diverse data.

III. RELATED THEORY
A. SMOTE
SMOTE is an algorithm for class-imbalanced datasets that balances the class distribution by generating synthetic samples. Specifically, SMOTE selects minority samples, finds several nearest neighbors of each selected sample within the minority class, and generates new samples by random interpolation. By changing the interpolation ratio, SMOTE adjusts the influence of the synthetic samples on the training set. This process is represented by the following formula:

$$x' = x + \mathrm{rand}(0, 1) \cdot \lvert a - b \rvert \tag{1}$$

FIGURE 2. Input raw data x to train encoder E, whose output z is used to train decoder D and generate data x′.
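A minimal sketch of the SMOTE interpolation step follows. It uses the standard SMOTE formulation, in which a synthetic point is placed at a random position on the segment between a minority sample x and one of its k nearest minority-class neighbors x_nn, that is x′ = x + rand(0, 1)·(x_nn − x). Reading a and b in Eq. (1) as that neighbor and the sample, and using the signed difference rather than the absolute value, is our assumption. The function name, Euclidean distance, and plain-Python style are illustrative choices; libraries such as imbalanced-learn provide production implementations.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation (illustrative)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbours of x, excluding x itself (Euclidean distance).
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        x_nn = rng.choice(neighbours)
        gap = rng.random()  # rand(0, 1)
        # New point lies on the segment between x and x_nn.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, x_nn)])
    return synthetic
```

In the setting of this paper, `minority` would be the 337 fraud rows of the training set and `n_new` the chosen expansion ratio times 337. Because every synthetic point is a convex combination of two real fraud samples, SMOTE cannot produce points outside the region spanned by the existing minority data, which is one motivation for the generative oversamplers discussed next.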

B. GAN
GAN is a deep learning model comprising two neural networks, a generator (G) and a discriminator (D), as depicted in Figure 1. The generator receives a random noise vector z as input and produces fake data x′ through a series of transformations. The discriminator (D) receives real data x and fake data x′ and tries to distinguish which data are real and which are fake.
GAN's innovation lies in its adversarial training concept, which drives the generator to produce increasingly realistic data. Equations (2) and (3) present the objective functions of the GAN:

$$\max_{D}\; \mathbb{E}_{x\sim p_d(x)}\big[D(x)\big] + \mathbb{E}_{z\sim p_z(z)}\big[1 - D(G(z))\big] \tag{2}$$

$$\min_{G}\; \mathbb{E}_{z\sim p_z(z)}\big[1 - D(G(z))\big] \tag{3}$$

where p_d(x) represents the distribution of real data, p_z(z) is the distribution of the noise, G(z) denotes a generated sample, and D(x) is the discriminator's output probability that x is real.

FIGURE 1. Input random noise z to train the generator G, whose output x′ is combined with the original training set x to train the discriminator.

C. VAE
The VAE is a generative model that generates new data by learning the distribution of latent variables. As illustrated in Figure 2, it comprises an encoder (E) and a decoder (D). The encoder maps the input data x to a probability distribution over z in the latent space, and the decoder samples from the latent space and reconstructs the original data as x′. The training objective of the VAE is to minimize the reconstruction error plus the KL divergence between the latent distribution and its prior. The VAE learns a continuous latent representation that can be inferred through variational inference, which gives it a degree of interpretability. However, the quality of data generated by a VAE is inferior to that of generative models such as the GAN. The encoding principle of the encoder (the reparameterization trick) is presented in Equation (4):

$$z = \sigma(x) \cdot \varepsilon + \mu(x), \qquad \varepsilon \sim \mathcal{N}(0, 1) \tag{4}$$

where σ(x) and µ(x) are the standard deviation and mean that the encoder outputs for the real input x. In this way, a sample from the encoder's distribution can be decoded directly to generate data.

D. VAEGAN
VAEGAN is a generative model that combines the advantages of VAE and GAN, learning both a latent space representation and the distribution of the generated samples through two-stage training. VAEGAN uses the GAN discriminator to assist training, and the discrimination result is used in the VAEGAN loss function. Compared with VAE and GAN, VAEGAN has three main advantages: the learned latent representation is more discriminative and better distinguishes different samples; the generator inherits the strengths of both VAE and GAN and can generate more realistic and diverse samples; and mapping latent vectors to interpretable feature spaces helps to analyze and understand the model's representation ability. These advantages give VAEGAN broad application prospects in image and video generation.

FIGURE 3. Input the original data x to train the encoder E. Its output z is used to train the generator G in the generative adversarial network. The generated data x′ is merged with the original training set x, and then the discriminator D is trained using the enhanced training set.

As illustrated in Figure 4, the real data are fed into the VAEGAN encoder, which encodes them into mean and variance codes. These codes are reparameterized to produce latent codes, which the VAEGAN decoder decodes into fake data. Finally, the real data and the generated fake data are fed into the VAEGAN discriminator, which determines whether each input is real or fake.

FIGURE 4. Flowchart of VAEGAN: Input real data to generate false data and judge the authenticity of the data.
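The two building blocks that VAEGAN combines can be illustrated without training any network. The sketch below implements the reparameterization step of Eq. (4) and batch estimates of the objectives in Eqs. (2) and (3), taking the discriminator outputs as given probabilities. It follows Eqs. (2)-(3) exactly as printed above; note that the original GAN formulation places logarithms on both terms. All function names and the toy inputs are hypothetical.

```python
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], sigma: Sequence[float], rng: random.Random) -> List[float]:
    """Eq. (4): z = sigma(x) * eps + mu(x), with eps ~ N(0, 1) drawn per latent dimension."""
    return [s * rng.gauss(0.0, 1.0) + m for m, s in zip(mu, sigma)]


def discriminator_objective(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Batch estimate of Eq. (2), which D maximizes: E[D(x)] + E[1 - D(G(z))]."""
    real_term = sum(d_real) / len(d_real)
    fake_term = sum(1.0 - d for d in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_objective(d_fake: Sequence[float]) -> float:
    """Batch estimate of Eq. (3), which G minimizes: E[1 - D(G(z))]."""
    return sum(1.0 - d for d in d_fake) / len(d_fake)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical encoder outputs (mean, standard deviation) for a 2-D latent space.
    z = reparameterize(mu=[0.5, -1.0], sigma=[0.1, 0.2], rng=rng)
    print("latent code:", z)
    # Hypothetical discriminator outputs for two real and two generated samples.
    print("D objective:", discriminator_objective([0.9, 0.8], [0.2, 0.4]))
    print("G objective:", generator_objective([0.2, 0.4]))
```

Sampling ε separately from µ(x) and σ(x) is what lets gradients flow through the encoder outputs during training; in a VAEGAN the decoder output for such a z would then be scored by the discriminator as in Figure 4.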


IV. RESEARCH METHODOLOGY
A. CREDIT CARD FRAUD DETECTION FRAMEWORK
This paper studies four oversampling models (SMOTE, GAN, VAE, and VAEGAN) and proposes an improved VAEGAN model. Five classification models, namely logistic regression, decision tree, random forest, neural network, and XGBoost, are compared and analyzed with respect to their effectiveness for credit card fraud detection. The credit card fraud detection framework studied in this article comprises two parts. In the first part, the five classification models are trained separately on the credit card data without balancing, keeping the original class ratio, and the classification model best suited to credit card fraud detection is selected based on several evaluation metrics. In the second part, the five oversampling models studied (including the improved VAEGAN) are used to balance the original training set, and the most effective oversampling method is selected to improve classification performance. The framework can be summarized as follows:
1) Train logistic regression, decision tree, random forest, neural network, and XGBoost models, using cross-validation and grid search. The classification model C with the best classification performance is selected as the baseline method.
2) Extract all fraudulent samples from the original training set T to form a set F.
3) Use SMOTE, GAN, VAE, VAEGAN, and the improved VAEGAN to oversample F, increasing the number and diversity of fraudulent samples, and generate a new set of synthetic instances F′.
4) Construct an enhanced training set T′ by merging the synthetic samples F′ generated by the oversampling method with the original training set T.
5) Retrain the classification model on the enhanced training set T′ to obtain the enhanced classification model C′.
6) Compare the performance indicators of the original classification model C and the enhanced classification model C′ on the independent test set S to verify how much each oversampling method and enhanced training set improves the classification model.
Through this experimental process, the impact of oversampling methods and enhanced training sets on credit card fraud detection can be evaluated systematically, with the aim of improving the classification model's effectiveness and robustness and providing a reliable and effective solution for practical applications.

B. DATASET DESCRIPTION
This study uses the credit card fraud detection dataset released on the Kaggle platform, which contains transactions made by European cardholders over two days in September 2013. The dataset contains 30 features: 28 numerical features V1, V2, ..., V28 that have undergone PCA dimensionality reduction, and two features, "Time" and "Amount", that have not undergone PCA transformation. The target variable is "Class", which marks whether a transaction is fraudulent: 0 indicates a normal transaction and 1 a fraudulent transaction. The dataset consists of 284,807 transactions, of which only 492 are fraudulent (0.17% of the total), and the remaining 284,315 are normal. This is a typical imbalanced classification problem. Figure 5 shows the class distribution of fraudulent and non-fraudulent transactions in the dataset, revealing an extremely imbalanced distribution between the two classes.

FIGURE 5. The credit card data distribution; 1 denotes fraudulent data and 0 denotes normal data.

To improve the performance and efficiency of the fraud detection algorithm, the transaction amount feature is normalized so that its large scale does not dominate the model weights. This paper adopts normalization based on the median and the interquartile range, with the following rule:

$$x_i' = \frac{x_i - \mathrm{median}}{\mathrm{IQR}} \tag{5}$$

where x_i is a sample value, median is the median of the samples, and IQR is the interquartile range of the samples.
Finally, the dataset is divided into a training set containing 70% of the samples (199,365 transactions, of which 337 are fraudulent, a 0.169% incidence rate) and a test set containing the remaining 30% (85,442 transactions, of which 155 are fraudulent, a 0.181% incidence rate).

C. FRAUD DETECTION CLASSIFICATION ALGORITHMS
We evaluated five fraud detection classification algorithms, covering machine learning, ensemble learning, and neural networks, and selected the best classification algorithm as the baseline model for credit card fraud detection through experimental comparative analysis.
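The median/IQR scaling of Eq. (5) and steps 2-4 of the framework in Section IV-A can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, quartiles are computed by linear interpolation (the paper does not state its quantile method), and `oversample` is a placeholder for any of the five oversampling models.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

Row = List[float]
Sample = Tuple[Row, int]  # (feature vector, label); label 1 = fraud, 0 = normal


def robust_scale(values: Sequence[float]) -> List[float]:
    """Eq. (5): x' = (x - median) / IQR, e.g. for the "Amount" column (assumes IQR > 0)."""
    # method="inclusive" interpolates linearly between order statistics;
    # the middle cut point is the median.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]


def build_enhanced_training_set(
    train: Sequence[Sample],
    oversample: Callable[[List[Row], int], List[Row]],
    ratio: float,
) -> List[Sample]:
    """Steps 2-4: F = fraud rows of T, F' = oversample(F), T' = T merged with F'."""
    fraud = [x for x, y in train if y == 1]           # step 2: set F
    n_synthetic = round(ratio * len(fraud))           # e.g. ratio 2 -> 674 for 337 frauds
    synthetic = oversample(fraud, n_synthetic)        # step 3: set F'
    return list(train) + [(x, 1) for x in synthetic]  # step 4: T'


if __name__ == "__main__":
    print(robust_scale([1, 2, 3, 4, 5]))
    # Hypothetical oversampler that simply repeats fraud rows, for illustration only.
    repeat = lambda rows, n: [rows[i % len(rows)] for i in range(n)]
    toy = [([0.0], 0), ([1.0], 1), ([2.0], 1)]
    print(build_enhanced_training_set(toy, repeat, ratio=2))
```

Python's `round` uses round-half-to-even, so a ratio of 0.5 with 337 training frauds yields 168 synthetic rows; the paper does not say how fractional counts were handled.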


1) XGBOOST
XGBoost (eXtreme Gradient Boosting) is an algorithm based on the gradient boosting decision tree (GBDT). It builds decision trees iteratively, each new tree reducing the loss of the current ensemble, and uses gradient-boosting techniques to speed up training and improve accuracy. Equation (6) is the objective function of XGBoost:

$$\mathcal{L}(\Phi) = \sum_{i} l\left(y_i, \hat{y}_i\right) + \sum_{k} \Omega\left(f_k\right) \tag{6}$$

where L(Φ) is the regularized objective, i indexes the samples, k indexes the trees, ŷ_i is the predicted value of the i-th sample x_i and y_i its label, l is a differentiable loss between prediction and label, and Ω(f_k) is a regularization term that penalizes the complexity of the k-th tree f_k.

2) OTHER CLASSIFICATION ALGORITHMS
Logistic Regression is a classic machine learning algorithm for binary or multi-class classification problems. It combines the input features linearly and then uses the sigmoid function to map the result to a probability between 0 and 1.
A Decision Tree is a classification model with a tree structure, built by selecting the best feature for splitting at each node. At test time, a sample starts at the root node, follows the branches according to its feature values, and finally reaches a leaf node, whose category is the prediction result.
Random Forest is an ensemble learning algorithm based on decision trees that reduces overfitting and improves prediction accuracy by building multiple decision trees on different random subsets of samples and features. At test time, all decision trees vote to determine the final prediction.
A Neural Network is a machine learning model that imitates the structure and function of the human nervous system. It comprises multiple layers of neurons, with weighted connections between layers. The network passes the input signal to the output layer through forward propagation and uses the backpropagation algorithm to adjust the weights, learning a nonlinear transformation from the input to the predicted output.

D. IMPROVED VAEGAN OVERSAMPLING METHOD
To achieve better oversampling results, we tested adding extra encoders or increasing the number of layers. The credit card fraud data have only 30 dimensions, and their features are not very complex. Using two encoders can improve the model's representation ability, because each encoder can learn different feature representations. However, a deeper encoder leads to overfitting and reduces sampling effectiveness.
The original VAEGAN model has only one encoder, which cannot easily capture the complex structure and multi-level features of the data, limiting the model's performance and generalization. The insufficient representation ability of the latent space may reduce the diversity and realism of the generated samples and impair the model's generalization. Additionally, a single encoder limits the model's scalability, making it difficult to handle more complex data and tasks. Therefore, it is necessary to improve the VAEGAN model by increasing the number of encoders, improving the expressiveness of the latent space, and enhancing the model's scalability.

FIGURE 6. Input the original data x to train the encoder E1 and the encoder E2 respectively, and the fusion output z is used to train the generator G in the Generative Adversarial Networks. The generated data x′ is merged with the original training set x, and the combined enhanced training set is used to train the discriminator D.

Motivated by these problems, this paper improves the VAEGAN model by adding an encoder to the VAE part of the original VAEGAN model (Figure 6). The fraud data are fed into encoder E1 and encoder E2 separately, and each encoder encodes the real input data into mean and variance codes. The mean and variance codes from both encoders are merged to produce the latent code, which the decoder then decodes into fake data.
The key step in realizing this idea is fusing the means and variances produced by the two encoders. In the improved VAEGAN, the two encodings are fused by multiplying the two normal probability density functions. Assume that the probability density functions of the two normal distributions are:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} \tag{7}$$

$$g(x) = \frac{1}{\sqrt{2\pi}\,\sigma_2} e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}} \tag{8}$$

Multiplying the two gives:

$$h(x) = A \cdot \frac{1}{\sqrt{2\pi}\,\sigma_0} e^{-\frac{(x-\mu_0)^2}{2\sigma_0^2}} \tag{9}$$

The value of A is:

$$A = \frac{1}{\sqrt{2\pi\left(\sigma_1^2+\sigma_2^2\right)}}\, e^{-\frac{(\mu_1-\mu_2)^2}{2\left(\sigma_1^2+\sigma_2^2\right)}} \tag{10}$$

The value of µ0 is:

$$\mu_0 = \frac{\mu_1\sigma_2^2 + \mu_2\sigma_1^2}{\sigma_1^2+\sigma_2^2} \tag{11}$$

The value of σ0² is:

$$\sigma_0^2 = \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2} \tag{12}$$
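Equations (7)-(12) can be checked numerically, and the fusion of the two encoders' outputs can be sketched per latent dimension. The sketch below is a minimal illustration under our own assumptions: each encoder outputs one mean and one variance per latent dimension, dimensions are fused independently, and the function names are hypothetical. It is not the authors' implementation. `fuse` applies Eqs. (11) and (12) and ignores the scale factor A, as the text goes on to do when it replaces h(x) with h′(x), so the fused code can be sampled with Eq. (4).

```python
import math
import random
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mu: float, var: float) -> float:
    """Density of N(mu, var) at x, as in Eqs. (7)-(8)."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fuse(mu1: float, var1: float, mu2: float, var2: float) -> Tuple[float, float]:
    """Eqs. (11)-(12): mean and variance of the normalized product of two Gaussians."""
    mu0 = (mu1 * var2 + mu2 * var1) / (var1 + var2)
    var0 = (var1 * var2) / (var1 + var2)
    return mu0, var0


def scale_factor(mu1: float, var1: float, mu2: float, var2: float) -> float:
    """Eq. (10): A equals the N(mu2, var1 + var2) density evaluated at mu1."""
    return normal_pdf(mu1, mu2, var1 + var2)


def fused_latent_code(
    enc1: Sequence[Tuple[float, float]],
    enc2: Sequence[Tuple[float, float]],
    rng: random.Random,
) -> List[float]:
    """Fuse per-dimension (mean, variance) codes of E1 and E2, then sample z via Eq. (4)."""
    z = []
    for (m1, v1), (m2, v2) in zip(enc1, enc2):
        m0, v0 = fuse(m1, v1, m2, v2)  # A is dropped, cf. Eq. (13)
        z.append(math.sqrt(v0) * rng.gauss(0.0, 1.0) + m0)
    return z


if __name__ == "__main__":
    mu1, var1, mu2, var2 = 0.5, 1.0, -1.0, 4.0
    mu0, var0 = fuse(mu1, var1, mu2, var2)
    A = scale_factor(mu1, var1, mu2, var2)
    for x in (-2.0, 0.0, 1.5):
        lhs = normal_pdf(x, mu1, var1) * normal_pdf(x, mu2, var2)  # f(x) * g(x)
        rhs = A * normal_pdf(x, mu0, var0)                           # Eq. (9)
        assert math.isclose(lhs, rhs, rel_tol=1e-9)
    print(mu0, var0, A)
```

Equation (12) is equivalent to adding precisions, 1/σ0² = 1/σ1² + 1/σ2², so the fused variance is always smaller than either encoder's variance and µ0 is weighted toward the more confident encoder.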


FIGURE 7. (a) The result of multiplying the normal distributions N(µ1, σ1²) and N(µ2, σ2²) is the distribution N(µ0, σ0²). (b) After N(µ1, σ1²) is multiplied by N(µ2, σ2²) and the scaling factor is deleted, the result is N(µ0, σ0²).

where h(x) is the normal distribution N(µ0, σ0²) multiplied by the scaling factor A, µ0 is the mean of that normal distribution, and σ0² is its variance. We conclude that multiplying the probability density functions of two normal distributions N(µ1, σ1²) and N(µ2, σ2²) is equivalent to the normal distribution N(µ0, σ0²) multiplied by the scaling factor A.
Therefore, the product of two Gaussian densities is a scaled Gaussian density. The scaling factor A changes the density value at each point but does not change the mean or the variance of the product; that is, the shape of the distribution after the product is unaffected by the scaling factor. When fusing the means and variances encoded by the two encoders, we therefore delete the scale factor A so that the fused result is again a properly normalized normal distribution (h(x) itself integrates to A rather than to 1), using h′(x) in place of h(x):

$$h'(x) = \frac{1}{\sqrt{2\pi}\,\sigma_0} e^{-\frac{(x-\mu_0)^2}{2\sigma_0^2}} \tag{13}$$

V. EXPERIMENTAL RESULTS AND DISCUSSION
A. BASELINE MODEL
To select the best classification model, this paper applies the logistic regression, decision tree, random forest, neural network, and XGBoost algorithms to credit card fraud detection. We used cross-validation, grid search, and related strategies on the five classification models to ensure the generalization ability and performance of the selected models. After considering all five indicators, we determined the final parameter settings of each model (Table 1).

TABLE 1. Model parameters.

Table 2 records the classification performance indicators of the five models, revealing that the XGBoost model has clear advantages in Recall and F1_score. To evaluate model effectiveness more comprehensively, precision-recall (PR) curves are drawn for the different classifiers (Figure 8), and the receiver operating characteristic (ROC) curves are shown in Figure 9. Together, Figures 8 and 9 show that the XGBoost classifier outperforms the other classifiers.
Finally, we choose the XGBoost classification model as the baseline method for fraud detection. This baseline serves as the reference model for subsequent performance improvements.
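The indicators used in these experiments (Precision, Recall, F1_score, Specificity, and AUC, as listed in Section V-B) can be computed from true labels and classifier scores as in the sketch below. It uses the standard definitions with fraud (label 1) as the positive class; the 0.5 decision threshold and the function name are our assumptions, and AUC is computed with the pairwise (Mann-Whitney) formulation, counting ties as one half.

```python
from typing import Dict, Sequence


def fraud_metrics(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Precision, Recall, F1, Specificity and AUC with fraud (label 1) as the positive class."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    # AUC = probability that a random fraud case scores higher than a random normal case.
    # Pairwise comparison is quadratic in time; fine for illustration, slow for large test sets.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")

    return {"precision": precision, "recall": recall, "f1": f1,
            "specificity": specificity, "auc": auc}
```

With roughly 0.18% fraud in the test set, Specificity stays close to 1 for almost any reasonable classifier, which is why Precision, Recall, and F1 carry most of the comparison in the results that follow.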


TABLE 2. Base model classification indicator results.
FIGURE 8. Basic model classification precision-recall curve.
FIGURE 9. Basic model classification receiver operating characteristic curve.

B. ANALYSIS OF OVERSAMPLING METHODS
This article compares and analyzes five oversampling methods; Table 3 reports the GAN and VAE parameters. The generator and discriminator in the GAN are three-layer networks, and the encoder and decoder in the VAE are two-layer networks. Table 4 presents the parameters of VAEGAN and the improved VAEGAN. In VAEGAN, the VAE part uses a two-layer network and the GAN part a three-layer network; in the improved VAEGAN, the VAE part likewise uses a two-layer network and the GAN part a three-layer network.

TABLE 3. Model parameters.
TABLE 4. Model parameters.

In the oversampling experiments, synthetic fraud samples were generated from the 337 fraud samples of the original training set. The ratio of synthetic fraud samples to real fraud samples in the original training set was set to 0.25, 0.5, 1, 2, 3, 4, 8, 10, 20, and 100. The synthetic samples and the original training set form an enhanced training set for model training.
Finally, five runs were conducted on the same test set, and the averages of Precision, Recall, F1_score, Specificity, and AUC were taken as the final experimental results.
Figure 10 and Table 5 show that the SMOTE and GAN oversampling methods reduce the model's precision, and that precision tends to decrease as the number of generated samples increases. The VAE, VAEGAN, and improved VAEGAN oversampling methods significantly improve classification precision: at every expansion ratio tested, the precision of these three methods is higher than that of the baseline model. The improved VAEGAN model improves precision more than the other models at each ratio. The precision of the improved VAEGAN model is 0.0281 higher than that of the baseline model, while VAE and VAEGAN are 0.0184 and 0.0159 higher, respectively.
Figure 11 and Table 5 show that the VAE, VAEGAN, and improved VAEGAN methods improve the F1 value more significantly, and that the improved VAEGAN has the best effect, far better than the other


FIGURE 10. Oversampling model and baseline model classification precision under different training set
expansion ratios.

TABLE 5. Precision and F1_score as the number Ng of generated examples vary.

oversampling models in all proportions. Indeed, the F1 value the improved VAEGAN oversampling method achieves the
increased from 0.863 to 0.884. The VAE and VAEGAN meth- best effect on Specificity. Moreover, the improved VAEGAN
ods performed similarly regarding the F1 value. Additionally, does not perform equally well to SMOTE on the classification
the GAN method promotes the F1 value when the expansion indicator AUC but has improved performance compared to
ratio is less than three, and the effect is not as good as the other oversampling and baseline models.
baseline model when the expansion ratio is greater than three. In reference [32], Deep Neural Network (DNN) was used
The performance of the SMOTE method is generally inferior as the classification algorithm and VAE was used as the over-
to the baseline model in terms of the F1 value. Figure 12 and sampling algorithm for credit card fraud detection. We com-
Table 6 highlight that the oversampling method has improved pared our experimental results with those of Tingfei et al. The
recall, and the overall trend is wavy. In some cases, SMOTE results show that our method achieved a higher precision at all
and GAN models are slightly lower than the recall of the augmentation ratios with an increase of 0.0203 in precision.
baseline model. For instance, improving VAEGAN is more In terms of F1-score, our method has a significant advantage
stable and prominent. Compared with the baseline method, at most augmentation ratios. However, in some cases, the
the Recall of improved VAEGAN, VAE, VAEGAN, and GAN recall of our method is slightly lower than that of Tingfei et al.
methods improved by 0.0256, 0.0194, 0.0161, and 0.0211, F1-score combines precision and recall, so it can more com-
respectively. prehensively evaluate its performance. The proposed method
The experimental results on Specificity and AUC are in this paper is more suitable for imbalanced data classifica-
reported in Tables 6 and 7, respectively. The results suggest tion problems.
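The metrics compared above are all functions of the binary confusion matrix, with fraud as the positive class: precision, recall, specificity, and F1 score. The Python sketch below computes them. The `ConfusionCounts` type, the helper names, and the example counts are hypothetical illustrations, not the authors' code. The sketch also checks that the raw-data precision and recall reported in this section for XGBoost (0.9197, 0.8129) and for the DNN (0.8562, 0.8065) reproduce their stated F1 scores of 0.8630 and 0.8306. AUC is left out because it needs ranked classifier scores rather than a single confusion matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts with fraud as the positive class."""
    tp: int  # frauds correctly flagged
    fp: int  # legitimate transactions wrongly flagged as fraud
    tn: int  # legitimate transactions correctly passed
    fn: int  # frauds that were missed


def precision(c: ConfusionCounts) -> float:
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    # Also called sensitivity or true positive rate.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # True negative rate: share of legitimate transactions left unflagged.
    return c.tn / (c.tn + c.fp)


def f1_from_pr(p: float, r: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    # Precision/recall reported for the raw-data baselines in this section.
    print(f"XGBoost F1 = {f1_from_pr(0.9197, 0.8129):.4f}")  # 0.8630
    print(f"DNN F1     = {f1_from_pr(0.8562, 0.8065):.4f}")  # 0.8306
    # Hypothetical confusion matrix, for illustration only.
    c = ConfusionCounts(tp=80, fp=7, tn=56_800, fn=20)
    print(precision(c), recall(c), specificity(c))
```

Because F1 is a harmonic mean, it is pulled toward the smaller of precision and recall. A method with slightly lower recall can therefore still reach a higher F1 score when its precision gain is larger.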


FIGURE 11. Oversampling model and baseline model classification F1 score under different training set expansion ratios.

FIGURE 12. Oversampling model and baseline model classification recall under different training set expansion ratios.
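Figures 10 to 12 and Tables 5 to 10 report results against the training-set expansion ratio, that is, the number Ng of generated fraud samples. The Python sketch below shows how adding Ng synthetic minority samples changes the class balance. It assumes, for illustration only, that an expansion ratio r means Ng = r times the number of real fraud samples in the training set. The class counts and the names `augment` and `AugmentedCounts` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AugmentedCounts:
    """Class counts of a training set after synthetic minority samples are added."""
    n_majority: int       # legitimate transactions (unchanged by oversampling)
    n_minority_real: int  # real fraud samples
    n_generated: int      # synthetic fraud samples, i.e. Ng

    @property
    def n_minority(self) -> int:
        return self.n_minority_real + self.n_generated

    @property
    def imbalance_ratio(self) -> float:
        # Majority-to-minority ratio; 1.0 would mean perfectly balanced classes.
        return self.n_majority / self.n_minority


def augment(n_majority: int, n_minority_real: int, expansion_ratio: float) -> AugmentedCounts:
    # ASSUMPTION: Ng = expansion_ratio * n_minority_real (illustrative definition only).
    if expansion_ratio < 0:
        raise ValueError("expansion_ratio must be non-negative")
    n_generated = round(expansion_ratio * n_minority_real)
    return AugmentedCounts(n_majority, n_minority_real, n_generated)


if __name__ == "__main__":
    # Hypothetical counts, not the statistics of the paper's dataset.
    for r in (0, 1, 3, 5):
        counts = augment(200_000, 400, r)
        print(f"r={r}: Ng={counts.n_generated}, "
              f"imbalance={counts.imbalance_ratio:.1f}:1")
```

With these hypothetical counts, the imbalance falls from 500:1 to about 83:1 at r = 5. The augmented set is still heavily imbalanced, which is consistent with the observation in this section that augmentation reduces, but does not remove, the class imbalance.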

The DNN was initially trained on the imbalanced (smaller) dataset, but DNNs are well known to perform better when more data is available. To further verify that the improved VAEGAN model achieves better results, we trained the DNN on the augmented data. The experimental results are shown in Tables 8, 9, and 10.

On the raw data, the DNN achieves a Recall of 0.8065, a Precision of 0.8562, and an F1 score of 0.8306, whereas XGBoost achieves 0.8129, 0.9197, and 0.8630, respectively. XGBoost therefore outperforms the DNN model on all three metrics.

With the DNN as the classifier, the comparative experiments show that data augmentation with the improved VAEGAN model still yields better results. Recall improved by up to 0.0276 and Precision by up to 0.0221, and the F1 score improved at every augmentation ratio, by up to 0.0235. However, the DNN trained on augmented data still does not perform as well as the XGBoost model. XGBoost is better suited to classifying imbalanced data, and the augmented data remains imbalanced, although the degree of imbalance has been reduced.

TABLE 6. Recall and Specificity as the number Ng of generated examples varies.

TABLE 7. AUC as the number Ng of generated examples varies.

TABLE 8. Recall as the number Ng of generated examples varies.

TABLE 9. Precision as the number Ng of generated examples varies.

TABLE 10. F1 score as the number Ng of generated examples varies.

VI. CONCLUSION AND FUTURE WORK

This paper proposes a new credit card fraud detection method that combines the improved VAEGAN oversampling method with the XGBoost classification algorithm. The improved VAEGAN oversampling model is trained using the minority
class samples in the original training set and is then used to generate a large amount of minority class data. Although our model is proposed in the context of credit card fraud detection, it can easily be extended to other application domains involving class imbalance. The experimental results show that XGBoost, used as the baseline model for credit card fraud detection, achieves better classification results than logistic regression, decision tree, random forest, and neural network classifiers. This suggests that ensemble methods may be more effective for class-imbalanced classification problems. Oversampling is also an effective way to improve performance on imbalanced classification problems.

Overall, the improved VAEGAN method achieved excellent precision and F1 scores. However, at certain expansion ratios its improvements in recall and AUC were not significant compared with the GAN and VAE methods, and its complexity is higher than that of the VAEGAN model. In future work, we will study how to further stabilize the improvement in Recall and AUC while steadily improving precision and F1 score.

REFERENCES

[1] H. Liu, M. Zhou, and Q. Liu, “An embedded feature selection method for imbalanced data classification,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 3, pp. 703–715, May 2019.
[2] G. Haixiang, L. Yijing, J. Shang, G. Mingyun, H. Yuanyue, and G. Bing, “Learning from class-imbalanced data: Review of methods and applications,” Exp. Syst. Appl., vol. 73, pp. 220–239, May 2017.
[3] H. Patel, D. S. Rajput, G. T. Reddy, C. Iwendi, A. K. Bashir, and O. Jo, “A review on classification of imbalanced data for wireless sensor networks,” Int. J. Distrib. Sensor Netw., vol. 16, no. 4, Apr. 2020, Art. no. 155014772091640.
[4] C. Jian, J. Gao, and Y. Ao, “A new sampling method for classifying imbalanced data based on support vector machine ensemble,” Neurocomputing, vol. 193, pp. 115–122, Jun. 2016.
[5] Y. Liu, Y. Wang, X. Ren, H. Zhou, and X. Diao, “A classification method based on feature selection for imbalanced data,” IEEE Access, vol. 7, pp. 81794–81807, 2019.
[6] A. Correa Bahnsen, D. Aouada, A. Stojanovic, and B. Ottersten, “Feature engineering strategies for credit card fraud detection,” Exp. Syst. Appl., vol. 51, pp. 134–142, Jun. 2016.
[7] A. Roy, J. Sun, R. Mahoney, L. Alonzi, S. Adams, and P. Beling, “Deep learning detecting fraud in credit card transactions,” in Proc. Syst. Inf. Eng. Design Symp. (SIEDS), Apr. 2018, pp. 129–134.
[8] F. Carcillo, Y.-A. Le Borgne, O. Caelen, Y. Kessaci, F. Oblé, and G. Bontempi, “Combining unsupervised and supervised learning in credit card fraud detection,” Inf. Sci., vol. 557, pp. 317–331, May 2021.
[9] A. Salazar, G. Safont, and L. Vergara, “Semi-supervised learning for imbalanced classification of credit card transaction,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2018, pp. 1–7.
[10] N. R. Dzakiyullah, A. Pramuntadi, and A. K. Fauziyyah, “Semi-supervised classification on credit card fraud detection using autoencoders,” J. Appl. Data Sci., vol. 2, no. 1, pp. 1–7, 2021.
[11] L. Ni, J. Li, H. Xu, X. Wang, and J. Zhang, “Fraud feature boosting mechanism and spiral oversampling balancing technique for credit card fraud detection,” IEEE Trans. Computat. Social Syst., pp. 1–16, 2023.
[12] F. Zhang, G. Liu, Z. Li, C. Yan, and C. Jiang, “GMM-based undersampling and its application for credit card fraud detection,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2019, pp. 1–8.
[13] S. Xuan, G. Liu, Z. Li, L. Zheng, S. Wang, and C. Jiang, “Random forest for credit card fraud detection,” in Proc. IEEE 15th Int. Conf. Netw., Sens. Control (ICNSC), Mar. 2018, pp. 1–6.
[14] K. Randhawa, C. K. Loo, M. Seera, C. P. Lim, and A. K. Nandi, “Credit card fraud detection using AdaBoost and majority voting,” IEEE Access, vol. 6, pp. 14277–14284, 2018.
[15] F. K. Alarfaj, I. Malik, H. U. Khan, N. Almusallam, M. Ramzan, and M. Ahmed, “Credit card fraud detection using state-of-the-art machine learning and deep learning algorithms,” IEEE Access, vol. 10, pp. 39700–39715, 2022.
[16] J. O. Awoyemi, A. O. Adetunmbi, and S. A. Oluwadare, “Credit card fraud detection using machine learning techniques: A comparative analysis,” in Proc. Int. Conf. Comput. Netw. Informat. (ICCNI), Oct. 2017, pp. 1–9.
[17] H. Shamsudin, U. K. Yusof, A. Jayalakshmi, and M. N. A. Khalid, “Combining oversampling and undersampling techniques for imbalanced classification: A comparative study using credit card fraudulent transaction dataset,” in Proc. IEEE 16th Int. Conf. Control Autom. (ICCA), Oct. 2020, pp. 803–808.
[18] A. K. Gangwar and V. Ravi, “WIP: Generative adversarial network for oversampling data in credit card fraud detection,” in Proc. 15th Int. Conf. (ICISS), Hyderabad, India: Springer, Dec. 2019, pp. 123–134.
[19] Y.-J. Lee, Y.-R. Yeh, and Y. F. Wang, “Anomaly detection via online oversampling principal component analysis,” IEEE Trans. Knowl. Data Eng., vol. 25, no. 7, pp. 1460–1470, Jul. 2013.
[20] B. Prasetiyo, Alamsyah, M. A. Muslim, and N. Baroroh, “Evaluation performance recall and F2 score of credit card fraud detection unbalanced dataset using SMOTE oversampling technique,” J. Physics: Conf. Ser., vol. 1918, no. 4, Jun. 2021, Art. no. 042002.
[21] H. Zhu, M. Zhou, G. Liu, Y. Xie, S. Liu, and C. Guo, “NUS: Noisy-sample-removed undersampling scheme for imbalanced classification and application to credit card fraud detection,” IEEE Trans. Computat. Social Syst., early access, Mar. 7, 2023, doi: 10.1109/TCSS.2023.3243925.
[22] N. Carneiro, G. Figueira, and M. Costa, “A data mining based system for credit-card fraud detection in e-tail,” Decis. Support Syst., vol. 95, pp. 91–101, Mar. 2017.
[23] S. Makki, Z. Assaghir, Y. Taher, R. Haque, M.-S. Hacid, and H. Zeineddine, “An experimental study with imbalanced classification approaches for credit card fraud detection,” IEEE Access, vol. 7, pp. 93010–93022, 2019.
[24] A. Rb and S. K. Kr, “Credit card fraud detection using artificial neural network,” Global Transitions Proc., vol. 2, no. 1, pp. 35–41, Jun. 2021.
[25] J. Jurgovsky, M. Granitzer, K. Ziegler, S. Calabretto, P.-E. Portier, L. He-Guelton, and O. Caelen, “Sequence classification for credit-card fraud detection,” Exp. Syst. Appl., vol. 100, pp. 234–245, Jun. 2018.
[26] A. Singh, R. K. Ranjan, and A. Tiwari, “Credit card fraud detection under extreme imbalanced data: A comparative study of data-level algorithms,” J. Experim. Theor. Artif. Intell., vol. 34, no. 4, pp. 571–598, Jul. 2022.
[27] N. S. Alfaiz and S. M. Fati, “Enhanced credit card fraud detection model using machine learning,” Electronics, vol. 11, no. 4, p. 662, Feb. 2022.
[28] H. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Trans. Knowl. Data Eng., vol. 21, no. 9, pp. 1263–1284, Sep. 2009.
[29] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321–357, Jun. 2002.
[30] H. Al Majzoub, I. Elgedawy, Ö. Akaydın, and M. Köse Ulukök, “HCAB-SMOTE: A hybrid clustered affinitive borderline SMOTE approach for imbalanced data binary classification,” Arabian J. Sci. Eng., vol. 45, no. 4, pp. 3205–3222, Apr. 2020.
[31] U. Fiore, A. De Santis, F. Perla, P. Zanetti, and F. Palmieri, “Using generative adversarial networks for improving classification effectiveness in credit card fraud detection,” Inf. Sci., vol. 479, pp. 448–455, Apr. 2019.
[32] H. Tingfei, C. Guangquan, and H. Kuihua, “Using variational auto encoding in credit card fraud detection,” IEEE Access, vol. 8, pp. 149841–149853, 2020.

YUANMING DING received the Ph.D. degree from Keio University, Japan, in 2004. From November 2004 to November 2016, he was a Postdoctoral Fellow with JSPS. Since 2009, he has been a Professor with the Information Engineering College, Dalian University, China. His research interests include communication signal processing, network technologies, machine learning, and information security.


WEI KANG received the B.S. degree in network engineering from Chuzhou University, Chuzhou, China, in 2021. He is currently pursuing the master’s degree with the Communication and Network Key Laboratory, Dalian University. His current research interests include machine learning and information security.

JIANXIN FENG received the Ph.D. degree from Northeastern University, China, in 2005. From 1999 to 2012, she was a Teacher with the Institute of Information Science and Engineering, Northeastern University. From 2018 to 2019, she was a Visiting Scholar with the Department of Computer Science, Liverpool John Moores University. She is currently an Associate Professor with the College of Information Engineering, Dalian University, China. Her current research interests include network protocol, wireless communication, machine learning, and information security.

BO PENG received the B.Sc. degree from the Jiangxi University of Science and Technology, in 2021. He is currently pursuing the master’s degree with the Communication and Network Key Laboratory, Dalian University. His current research interests include neural networks and machine learning.

ANNA YANG received the B.S. degree in network engineering from Chuzhou University, Chuzhou, China, in 2021. She is currently pursuing the master’s degree with the Communication and Network Key Laboratory, Dalian University. Her current research interests include machine learning and image processing.
