
DBNex Deep Belief Network and Explainable AI Based Financial Fraud Detection

The paper presents DBNex, a novel framework for financial fraud detection that utilizes Deep Belief Networks and explainable AI to address the challenges posed by imbalanced datasets. The methodology includes data preprocessing techniques such as up-sampling and sub-sampling, achieving a detection accuracy of 94%. The study highlights the importance of explainability in AI models and demonstrates that DBNex outperforms existing solutions in terms of accuracy and interpretability.


2022 IEEE International Conference on Big Data (Big Data)

DBNex: Deep Belief Network and Explainable AI based Financial Fraud Detection

Abhimanyu Bhowmik∗, Department of Artificial Intelligence, Amity University Kolkata, Kolkata, India. Email: [email protected]
Madhushree Sannigrahi∗, Department of Artificial Intelligence, Amity University Kolkata, Kolkata, India. Email: [email protected]
Deepraj Chowdhury, Department of ECE, IIIT Naya Raipur, Chhattisgarh, India. Email: [email protected]
Ashutosh Dhar Dwivedi, Centre for Business Data Analytics, Department of Digitalization, Copenhagen Business School, 2000 Frederiksberg, Denmark. Email: [email protected]
Raghava Rao Mukkamala, Centre for Business Data Analytics, Department of Digitalization, Copenhagen Business School, 2000 Frederiksberg, Denmark; Department of Technology, Kristiania University College, Kirkegata 24-26, 0153 Oslo, Norway. Email: [email protected]

2022 IEEE International Conference on Big Data (Big Data) | 978-1-6654-8045-1/22/$31.00 ©2022 IEEE | DOI: 10.1109/BigData55660.2022.10020494

Abstract—The majority of financial transactions around the world are now conducted virtually. The widespread use of credit cards and online transactions encourages fraudulent activity, which makes fraud detection one of the most demanding real-world challenges. Unbalanced datasets, in which non-fraud samples vastly outnumber incidents of fraud, are one of the key obstacles to effective fraud detection. A further factor complicating the learning process for cutting-edge machine learning classifiers is how quickly fraud behaviour changes. In this study, we therefore propose an efficient fraud detection methodology: a novel nonlinear embedded clustering that resolves imbalances in the dataset, followed by a Deep Belief Network that detects fraudulent transactions. The proposed model achieved an accuracy of 94% with a 70:30 training-validation split.

Index Terms—Financial Fraud Detection, Deep Belief Network, UMAP, DBSCAN, CTGAN, Explainable AI

I. INTRODUCTION

One of the biggest worries today is financial fraud, which affects not only the financial industry but the entire market wherever some form of digital payment service is used [1]. The credit card is one of the most significant financial tools, used every day by hundreds of thousands of people, and as a result of this widespread use the credit card industry deals with numerous fraudulent transactions every day. The ability to gather data from fraud cases fuels the expansion of fraud detection in the modern era: machine learning and deep learning techniques play a critical role in identifying the patterns that distinguish fraudulent transactions from their non-fraudulent peers. In 2020, there were approximately 400k instances of credit card fraud, according to the FTC. With losses of $11 billion in 2020 alone, the US is home to over one-third of all credit card fraud instances worldwide [2]. The highest rates of credit card fraud in Europe are seen in France, the United Kingdom, and Ireland. Compared to the year ending 2019, a 32% increase in fraud cases was anticipated in the UK alone by the end of 2021, bringing the total that year to $5 million [3]. Financial fraud detection has developed into a critical issue and one of the most significant research problems.

Data imbalance is the main issue in fraud detection, since the vast majority of financial transactions are not fraudulent. To build a strong fraud detection system, it is crucial to handle this imbalance, and the uneven distribution of data can be handled in a variety of ways. The first strategy deals with data up-sampling or down-sampling; the second concerns the choice of machine learning model. The data down-sampling approach shrinks the dominant class until its size equals that of the minority class. Because of the limited amount of data left after down-sampling, this method is at a disadvantage when training the model [7]. In the data up-sampling methodology, by contrast, classic machine learning or deep learning models are used to generate an accurate representation of the minority class data in order to match the number of majority class samples. The second viewpoint highlights the necessity of choosing the correct model for learning. There are two types of machine learning models: supervised and unsupervised. Unsupervised learning works with clustering and anomaly detection, whereas supervised learning deals with classification and regression problems. Supervised classification approaches to fraud detection, which label samples as fraud (1) or non-fraud (0), have grown in popularity relative to unsupervised approaches, even though the majority of fraud detection algorithms are based on an anomaly or intrusion detection framework. To address imbalances in the dataset, we suggest a novel nonlinear embedded clustering based on the sub-sampling of the up-sampled data. To surpass other cutting-edge approaches, we fine-tuned a Deep Belief Network model for the categorization of fraud and non-fraud cases. Finally, we explain the reasoning behind the classifications with the assistance of explainable AI.

A. Key Contribution
1) We propose a new framework that uses Deep Belief Networks for classification and nonlinear embedded clustering based on sub-sampling for data preprocessing.
2) Our new data preprocessing methodology up-samples the data and then sub-samples it based on clusters of the fraud as well as the non-fraud samples.
3) We provide the explainability of our model using the industry-standard Explainable AI framework SHAP.
4) Our experimental results demonstrate that our proposed methodology outperforms other state-of-the-art machine learning frameworks in terms of accuracy.

The original code for DBNex is available for open access on GitHub [8].

The rest of this paper is organized as follows: Section II discusses related works, and Section III presents the proposed methodology. Section IV discusses the results. Section V concludes the paper with the conclusion, future works and acknowledgement.

Authorized licensed use limited to: New Horizon College of Engineering - Bengaluru. Downloaded on November 09,2024 at 04:10:59 UTC from IEEE Xplore. Restrictions apply.

TABLE I
COMPARISON OF PROPOSED MODEL WITH EXISTING SOLUTIONS

Related Works         Data Pre-processing   Model           Precision   Recall   AUC    Explainability
Erfani et al. [1]     ✓                     Deep SVDD       0.90        -        0.93   ✗
Raghavan et al. [4]   ✗                     Ensemble        -           -        0.89   ✗
Khedmati et al. [5]   ✓                     SVM             0.84        0.75     0.94   ✗
Li et al. [6]         ✓                     Decision Tree   0.87        0.89     0.95   ✗
DBNex                 ✓                     DBN             0.95        0.94     0.96   ✓

II. RELATED WORKS

Fraud detection systems are typically of two types: (1) those based on predefined rules and (2) those involving intelligent learning. The former establishes pre-defined rules to detect fraud patterns, whereas the latter learns from sample cases. However, rule-based fraud detection systems are no longer actively used in the industry. The enormous amount of data generated nowadays helps AI models learn fraud patterns more efficiently and make better decisions. AI-based techniques can be broadly divided into machine learning and deep learning algorithms.

A. Classic Machine Learning approach
Machine learning is a sub-field of artificial intelligence that automates the development of analytical models. Depending on the availability of labels, fraud detection models that use statistical machine learning approaches can be categorized into two main groups, namely unsupervised and supervised models. The data for unsupervised models comes from either public or internal fraud datasets, and no labels need to be specified by the user.

An unsupervised ML framework based on k-nearest neighbours (KNN) was implemented by Malini et al. for credit card fraud identification [9]. Li et al. applied a similar behaviour-based clustering to handle imbalanced data by clustering the data into major and minor training classes [6]. After training, elements of the opposite clusters were introduced into each group, and the model treated these opposite-class transactions as noise transactions. To find kernel parameters that accommodate imbalanced data, Wu et al. recommended using the proximity between subspaces and clusters (grid search and SVM) [10]. In another work, Zhang et al. performed hierarchical clustering with K-means and a Gaussian Mixture Model (GMM) to handle the imbalance of the dataset [11]. Density-based clustering (DBSCAN) combined with Support Vector Data Description (SVDD) was suggested by Khedmati et al. [5] for fraud detection as an improved version of the classic DBSCAN model, and later a Hidden Markov Model (HMM) was used by Lucas et al. to determine the probability of each transaction [12]. Additionally, Srivastava et al. used HMMs to simulate the steps involved in processing credit card transactions and to spot fraud [13]. Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR) are examined by Bhattacharyya et al. using real-world data from worldwide credit card transactions [14]. In addition, a cost-sensitive decision tree-based approach is suggested for detecting card fraud and tested on real-world data [15]. An updated Fisher discriminant function is suggested in another paper [16] to make credit card fraud recognition highly responsive to significant incidents.

Several anomaly detection techniques, such as clustering-based, classification-based and hybrid ones, are deployed by Agrawal & Agrawal [17]. Among them, hybrid approaches surpass the others in terms of accuracy. Nevertheless, long training time and high computational complexity are the foremost drawbacks of these architectures. Bhowmik et al. proposed a novel clustering method that takes the state of transactions over time into consideration [18].

Researchers have also created and studied other metrics for fraud detection algorithms, including discriminant power, G-mean, and likelihood ratio, in addition to the well-known and often used measures of AUC score, precision-recall and accuracy [19] [20]. Although accuracy can be deceiving, the authors did not recommend any of these metrics as an obvious winner.
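Why accuracy can be deceiving is easy to see on data as skewed as the dataset used later in this paper (492 frauds among 284,807 transactions). The following minimal Python sketch is not part of DBNex; it is an illustration that compares accuracy with precision, recall and G-mean (the geometric mean of sensitivity and specificity) for a trivial classifier that labels every transaction as non-fraud. The helper names `confusion_counts` and `imbalance_metrics` are hypothetical, and reporting precision as 0.0 when nothing is flagged is a convention chosen here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 marks fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def imbalance_metrics(y_true, y_pred):
    """Accuracy next to class-aware metrics (precision, recall, G-mean)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0       # convention: 0.0 if nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity on the fraud class
    specificity = tn / (tn + fp) if tn + fp else 0.0     # true-negative rate
    g_mean = math.sqrt(recall * specificity)             # geometric mean of the two rates
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "g_mean": g_mean}


# Class counts of the dataset described in Section III-A: 492 frauds in 284,807 rows.
n_total, n_fraud = 284_807, 492
y_true = [1] * n_fraud + [0] * (n_total - n_fraud)
y_pred = [0] * n_total   # a trivial model that never flags fraud

print(imbalance_metrics(y_true, y_pred))
```

Accuracy comes out at about 0.998 even though not a single fraud is caught, while recall and G-mean are both 0; this gap is the reason class-aware measures are studied alongside accuracy on skewed fraud data.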

Fig. 1. Overall view of the proposed model: DBNex

B. Advanced Deep Learning approach
Deep learning is a subset of machine learning that combines artificial neural networks and representation learning. It has recently enabled several practical applications. Deep learning algorithms can be divided into two broad categories: discriminative and generative models.

A Self-Organizing Map (SOM) neural network-based method was put forth in a study by Olszewski et al. [21]. They represented the high-dimensional data with a SOM on a two-dimensional plane for easier visual comprehension, and distinguished fake samples from real ones using threshold detection and data visualisation. Despite its strong visualisation capability, a SOM is not capable of parallel processing and suffers from slow computation.

Conventional neural networks have already been used for credit card fraud detection by Dorronsoro et al. [22]. In a study by Roy et al., the efficiency of DL methods was tested on about 80 million credit card transactions worldwide that were pre-determined as fraudulent [23]. The study compared the performance of several methodologies, such as Long Short-Term Memory (LSTM), Gated Recurrent Unit, Artificial Neural Networks and Recurrent Neural Networks, in terms of data imbalance, scalability and sensitivity analysis. They found that LSTM gave the best results on large, complex datasets. LSTM was also used by Jurgovsky et al. [24] to frame credit card fraud detection as a supervised sequence classification problem. Convolutional Neural Networks (CNN) are a type of neural network that was initially developed for image processing and computer vision (CV). A CNN is made up of three different types of layers: convolutional layers, pooling layers, and a fully connected neural network. An online transaction fraud detection model employing a CNN was proposed by Zhang et al. [25]; for training the network, they used a commercial bank's B2C transactions. Raghavan et al. [4] experimented with ensemble learning using SVM, RNN and Random Forest and achieved an AUC of about 0.89.

Deep Auto Encoders (DAEs) and Restricted Boltzmann Machines (RBMs) [26] are unsupervised deep learning models used to reconstruct normal credit card transactions and recognise their peculiarities. A DAE mainly comprises two symmetrical Deep Belief Networks (DBN) [27], termed the encoder and the decoder. Pumsirirat et al. employed DAE and RBM as fraud detection algorithms [28].

Erfani et al. implemented Deep SVDD (support vector data description) along with a novel data preprocessing technique comprising PCA and clustering-based subsampling for detecting fraud samples. The authors implemented two versions of SVDD, namely one-class SVDD and soft-boundary Deep SVDD. Based on the proportions of the training dataset, they provided a trend analysis for the skewed class distribution [1].

C. Summary and Motivation
In the past, a variety of machine learning methods, including SVMs, KNN, DBSCAN, Random Forest and GMM, and deep learning methods such as SVDD, SOM, CNN and many more, have been used to identify fraud. Most of them must deal with imbalanced datasets in which fraudulent data is very scarce. DBNex therefore counters this dataset bias by using CTGAN to generate pseudo-fraud cases. In addition, a nonlinear dimensionality reduction technique is implemented in our suggested model to produce better outcomes than the linear models mentioned above. Section III further discusses the model's detailed architecture and specifications.

III. PROPOSED METHODOLOGY

This section elaborates on the procedure implemented in this research work, as shown in Figure 1. The entire methodology is divided into four parts. The first part discusses the dataset. In the second phase, the imbalanced data is pre-processed using a novel methodology employing UMAP. The third phase involves the Deep Belief Network for the classification of data into fraud and non-fraud samples. The final phase explains the role of each feature in classification
using SHAP. The methodology is summarized in Algorithm 1.

A. Dataset
The Credit Card Fraud Detection dataset by Worldline and the Machine Learning Group of ULB (Université Libre de Bruxelles) is used in this paper [29]. It contains labelled, anonymized credit card payments made by European cardholders in September 2013: 492 frauds out of 284,807 transactions that took place over two days. Only 0.172% of the transactions are fraudulent, which indicates a severe class imbalance. Due to privacy concerns, the original features and further context of the data have not been provided. The dataset contains only numerical input variables: the PCA (Principal Component Analysis)¹ transformed features V1, V2, ..., V28, plus 'Time', 'Amount' and 'Class'.

¹ PCA is an unsupervised statistical method used for dimensionality reduction of data by means of its principal components.

B. Data Pre-processing
In this phase, the raw dataset is processed into a final training dataset for the detection algorithm. The four successive stages constituting this phase are: generating synthetic data, normalisation, dimensionality reduction, and stratified sampling. The resultant dataset is far better suited to the deep learning procedure.

1) Synthesizing and Sampling: Only 5000 non-fraudulent data instances were randomly sampled² for the experiments. This reduction was carried out because using a generative adversarial network (GAN) to produce an excessive amount of synthetic data might cause the data to deviate from its original form. 500 synthetic rows of fraud cases were created using CTGAN and merged with the original data to partly balance the dataset.

CTGAN, or Conditional Tabular GAN [30], was used in this research to produce realistic data that reflects the input data distribution. This GAN-based approach can produce highly accurate synthetic clones by learning from real single-table data and rows from a given distribution. A generative adversarial network, or GAN, is composed of a generator network and a discriminator network trained in an adversarial relationship, each trying to outwit the other. The generator's job is to create samples that cannot be distinguished from the genuine ones, while the discriminator's job is to tell the fake samples apart from the real ones. As both networks learn to outperform each other in this adversarial training, the generator produces increasingly realistic synthetic examples. GANs have traditionally been used to generate images; however, several recent breakthroughs in tabular data generation have also used GANs. TGAN, or Tabular GAN [31], is one of these algorithms. CTGAN is a conditional implementation of TGAN that allows one of the discrete columns in the training dataset to be used as a condition during the production of synthetic rows. Through training-by-sampling, the condition and training data are sampled according to the log frequency of each category, which lets CTGAN explore all potential discrete values evenly and further increases accuracy.

2) Feature Scaling: The resultant dataset is then standardized. Standardization is a feature scaling process that centres each feature on a zero mean with a standard deviation of one:

$$X' = \frac{X - \mu}{\sigma} \tag{1}$$

where $\mu$ is the mean and $\sigma$ the standard deviation of the feature.

3) Dimensionality Reduction: Uniform manifold approximation and projection (UMAP) [32], a nonlinear dimensionality reduction technique, makes three assumptions: that the observations are uniformly distributed on a Riemannian manifold, that the Riemannian metric is locally constant, and that the manifold is locally connected. UMAP is a graph-based technique, as opposed to t-SNE [33], which makes use of a probabilistic model. Its main goal is to represent every original high-dimensional data point in a predetermined k-dimensional space through a weighted UMAP graph, while minimising the edge-wise cross-entropy between the original data and the weighted graph. The original data points are then individually represented by k-dimensional eigenvectors of the UMAP graph [34].

UMAP takes input data $X = \{x_1, x_2, \dots, x_N \mid x_i \in \mathbb{R}^M\}$ and finds an optimal lower-dimensional embedding $\{y_1, y_2, \dots, y_N \mid y_i \in \mathbb{R}^k\}$, where $k < M$. The first step is to create a weighted k-neighbour graph. Let $d : X \times X \to \mathbb{R}_{\ge 0}$ be a metric and $k \ll M$ a hyper-parameter. Taking the k nearest neighbours of every $x_i$ under the metric $d$, define

$$\rho_i = \min\{\, d(x_i, x_j) \mid 1 \le j \le k,\ d(x_i, x_j) > 0 \,\} \tag{2}$$

and choose $\sigma_i$ such that

$$\sum_{j=1}^{k} \exp\!\left( \frac{-\max\bigl(0,\ d(x_i, x_j) - \rho_i\bigr)}{\sigma_i} \right) = \log_2 k \tag{3}$$

The value $\rho_i$ ensures that $x_i$ is connected to at least one data point with an edge weight of 1, and $\sigma_i$ is a length-scale parameter defined via (3). A weighted directed graph can then be defined as $G = (V, E, \omega)$, where $V$ is the set of vertices, $E = \{(x_i, x_j) \mid 1 \le j \le k,\ 1 \le i \le N\}$ is the set of edges, and $\omega$ assigns each edge its weight:

$$\omega(x_i, x_j) = \exp\!\left( \frac{-\max\bigl(0,\ d(x_i, x_j) - \rho_i\bigr)}{\sigma_i} \right) \tag{4}$$
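Equations (1) to (4) can be made concrete with a small NumPy sketch. It is written for this explanation and is neither the umap-learn implementation nor the authors' code: it standardizes the features as in (1), finds each point's k nearest neighbours by brute force under the Euclidean metric (an assumption; the text only requires some metric d), computes $\rho_i$ from (2), solves (3) for $\sigma_i$ by bisection, and returns the directed edge weights of (4). The helper names `standardize` and `fuzzy_knn_weights` and the bisection bracket are illustrative choices.

```python
import numpy as np


def standardize(X):
    """Eq. (1): shift every feature to zero mean and scale it to unit standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mu) / sigma


def fuzzy_knn_weights(X, k, n_iter=64):
    """Hypothetical helper: directed k-NN graph weights of Eqs. (2)-(4).

    Returns neighbour indices, rho_i, sigma_i and the edge weights omega(x_i, x_j).
    Euclidean distance is assumed for the metric d.
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)                   # a point is not its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]           # indices of the k nearest neighbours
    knn_d = np.take_along_axis(dist, nbrs, axis=1)   # distances d(x_i, x_j) to them

    n = X.shape[0]
    rho = np.empty(n)
    sigma = np.empty(n)
    target = np.log2(k)
    for i in range(n):
        positive = knn_d[i][knn_d[i] > 0]
        rho[i] = positive.min() if positive.size else 0.0   # Eq. (2)
        shifted = np.maximum(0.0, knn_d[i] - rho[i])
        lo, hi = 1e-8, 1e3                                  # assumed search bracket for sigma_i
        for _ in range(n_iter):                             # bisection on Eq. (3)
            mid = 0.5 * (lo + hi)
            if np.exp(-shifted / mid).sum() > target:
                hi = mid   # the sum grows with sigma, so this sigma is too large
            else:
                lo = mid
        sigma[i] = 0.5 * (lo + hi)

    weights = np.exp(-np.maximum(0.0, knn_d - rho[:, None]) / sigma[:, None])  # Eq. (4)
    return nbrs, rho, sigma, weights


rng = np.random.default_rng(0)
X = standardize(rng.normal(size=(50, 4)))
nbrs, rho, sigma, w = fuzzy_knn_weights(X, k=5)
# Each row of weights sums to log2(k), and the nearest neighbour gets weight 1.
print(np.allclose(w.sum(axis=1), np.log2(5), atol=1e-6), np.allclose(w.max(axis=1), 1.0))  # expected output: True True
```

The brute-force distance matrix is quadratic in the number of rows, so this only suits small examples; production UMAP libraries use approximate nearest-neighbour search instead.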
² Random sampling is a sampling technique that gives each row an equal chance of being chosen.

Through symmetrization, UMAP generates an undirected weighted graph from the directed graph G. Let A be the adjacency matrix of G. A symmetric matrix can then be obtained as

$$B = A + A^{\top} - A \circ A^{\top},$$

where $A^{\top}$ is the transpose of A and $\circ$ denotes the Hadamard (element-wise) product. The adjacency matrix B defines the undirected weighted graph (the UMAP graph). In essence, UMAP builds an analogous weighted graph H over low-dimensional points $\{y_i\}_{i=1}^{N}$ by applying attractive and repulsive forces. The attractive force at coordinates $y_i, y_j$ is given by

$$\frac{-2ab\,\lVert y_i - y_j \rVert_2^{2(b-1)}}{1 + \lVert y_i - y_j \rVert_2^{2}}\;\omega(x_i, x_j)\,(y_i - y_j) \tag{5}$$

and the repulsive force by

$$\frac{2b}{\bigl(\epsilon + \lVert y_i - y_j \rVert_2^{2}\bigr)\bigl(1 + a\,\lVert y_i - y_j \rVert_2^{2b}\bigr)}\;\bigl(1 - \omega(x_i, x_j)\bigr)\,(y_i - y_j) \tag{6}$$

where a and b are hyperparameters and $\epsilon$ is a small number that keeps the denominator non-zero. The aim is to find the low-dimensional positions $\{y_i\}_{i=1}^{N}$, $y_i \in \mathbb{R}^k$, that minimise the edge-wise cross-entropy with the original high-dimensional data at each location.

4) Stratified Sampling using DBSCAN: Stratified random sampling divides a population into smaller homogeneous subgroups, known as strata (singular: stratum), based on traits or attributes shared by their members, such as gender or age [35]. After the total population has been divided into strata, a random sample is chosen from each stratum. This ensures that the sampled training distribution appropriately represents each subgroup of the given dataset. In this study, the strata were formed by clustering the nonlinear embedding of the data using density-based spatial clustering of applications with noise (DBSCAN).

When the distribution of the data is uncertain, density-based clustering [36] can still cluster the dataset. DBSCAN, the conventional density-based clustering technique, is often used for cluster analysis because of its simple yet effective characteristics [37]. Based on how densely a point is surrounded by other points, DBSCAN assigns each point to one of three categories: Core, Boundary, or Noise. Core points have at least a minimum number of points (the threshold value) within radius $\epsilon$ of them. A point is a border point if it can be reached from a core point but has fewer nearby points than the threshold value. A point is Noise when it is neither a core point nor reachable from one. A viable cluster is composed of central core points surrounded by peripheral boundary points, and every point is labelled according to its cluster membership.

Algorithm 1: DBNex
Data: Transaction Dataset
Result: Fraud / Non-Fraud detection
Std ← Standardization
RndSam ← Random Sampling
StratSam ← Stratified Sampling
Ct ← CTGAN Tabular Data Generator
Um ← UMAP: Uniform Manifold Approximation and Projection
DBSc ← DBSCAN Clustering
DBN ← Deep Belief Network
n ← No. of samples
gen ← No. of generated samples
for i ∈ Data do
    if label(i) = 1 then
        add i to FraudData
    else
        add i to NonFraudData
    end
end
Data1 ⇐ Std(FraudData + Ct(FraudData, gen))
Data2 ⇐ Std(RndSam(NonFraudData, n))
Data1 ⇐ Um(Data1); Data2 ⇐ Um(Data2)
Data1 ⇐ DBSc(Data1); Data2 ⇐ DBSc(Data2)
Data1 ⇐ StratSam(Data1); Data2 ⇐ StratSam(Data2)
TotData ⇐ Data1 + Data2
Output ⇐ DBN(TotData)

C. Classification using Deep Belief Network
A Deep Belief Network (DBN) is an advanced generative framework that uses a deep architecture [38]. Restricted Boltzmann Machines (RBMs) are stacked together to construct the deep network, as depicted in Figure 2.

Fig. 2. Architecture of Deep Belief Network.

A Restricted Boltzmann Machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its inputs. It is a technique useful for dimensionality reduction, collaborative filtering, pattern
recognition, classification, regression, and topic modelling. It was named after the well-known Austrian scientist Ludwig Boltzmann, who developed the Boltzmann distribution in the late 19th century. Paul Smolensky created RBMs in 1986 [26] under the name Harmonium, and they gained popularity after Geoffrey Hinton [39] and others developed fast learning algorithms for them in the mid-2000s. An RBM is a shallow neural network consisting of one visible and one hidden layer. The "restriction" in a restricted Boltzmann machine is the absence of intra-layer connections. Each node in an RBM is a unit of computation that processes its input and makes a stochastic decision about whether to transmit it.

Each RBM layer in the stack communicates with both the layer below it and the one above it, so the network is composed of several single-layer networks. Except for the first and last layers, each layer in a DBN serves both as the hidden layer for the nodes that come before it and as the input layer for the nodes that come after it [27]. Backpropagation and gradient descent are used to optimise the deep network. Deep Belief Networks can be employed for detection, image clustering and generation, and for modelling motion-capture data. In the DBN model, the visible vector v and the hidden layers $h^k$ have the following joint distribution:

$$P(v, h^1, \dots, h^l) = \Bigl( \prod_{k=0}^{l-2} P(h^k \mid h^{k+1}) \Bigr)\, P(h^{l-1}, h^l) \tag{7}$$

where $v = h^0$, $P(h^k \mid h^{k+1})$ is the conditional distribution of the visible units given the hidden units at the k-th level RBM, and $P(h^{l-1}, h^l)$ is the joint distribution of the visible and hidden layers of the top-most RBM.

DBN learning is divided into two stages: pretraining and fine-tuning. Unsupervised learning is favoured for the layer-wise pretraining of the stacked RBMs, while supervised learning with backpropagation (BP) is used during fine-tuning to adjust the initial weights and biases. Unsupervised DBN training is highly recommended for refining the RBMs and extracting features from the data. For a joint configuration (v, h) of n visible and m hidden units, the RBM energy function is

$$E(v, h \mid \theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n}\sum_{j=1}^{m} v_i w_{ij} h_j \tag{8}$$

where $\theta = \{w_{ij}, a_i, b_j\}$ are the RBM parameters: $w_{ij}$ is the weight between visible unit i and hidden unit j, and $a_i$ and $b_j$ are the biases of the visible and hidden units, respectively. The joint probability of (v, h) is

$$P(v, h \mid \theta) = \frac{e^{-E(v, h \mid \theta)}}{Z(\theta)}, \qquad Z(\theta) = \sum_{v, h} e^{-E(v, h \mid \theta)} \tag{9}$$

D. Explainability using SHapley Additive exPlanations (SHAP)
SHAP explains the prediction for an instance x by calculating the contribution of every feature to the final prediction [40]. The SHAP explanation approach computes Shapley values using coalitional game theory: the feature values of a data instance act as players in a coalition, and the Shapley values divide the prediction fairly among the features. A novel aspect of SHAP is that the Shapley value explanation is modelled as a linear model with additive feature attribution. According to SHAP, the "explanation" may be represented as

$$g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j \tag{10}$$

where g is the explanation model, $z' \in \{0, 1\}^M$ is the coalition vector, M is the maximum coalition size, and $\phi_j \in \mathbb{R}$ is the feature attribution for feature j, i.e. its Shapley value. A value of 1 in the coalition vector indicates that the corresponding feature value is "present", whereas a value of 0 indicates that it is "missing". To compute the $\phi$'s, SHAP expresses coalitions as a linear model. For the instance x of interest, the coalition vector x' is a vector of all 1s, indicating that every feature value is "present" for x. The equation then simplifies to

$$g(x') = \phi_0 + \sum_{j=1}^{M} \phi_j \tag{11}$$

Only Shapley values satisfy the properties of Efficiency, Symmetry, Dummy, and Additivity. Because it calculates Shapley values, SHAP meets these requirements as well. The three desirable properties described by SHAP are as follows:

1) Local accuracy:
$$\hat f(x) = g(x') = \phi_0 + \sum_{j=1}^{M} \phi_j x'_j$$
If we set $\phi_0 = E_X(\hat f(X))$ and $x'_j = 1$ for all j, this is the Shapley efficiency property expressed with the coalition vector under a different name:

$$\hat f(x) = \phi_0 + \sum_{j=1}^{M} \phi_j x'_j = E_X\bigl(\hat f(X)\bigr) + \sum_{j=1}^{M} \phi_j \tag{12}$$

2) Missingness:
$$x'_j = 0 \;\Rightarrow\; \phi_j = 0$$
According to missingness, a missing feature receives a zero attribution. Here $x'_j = 1$ marks a feature that is present for the instance to be explained, and $x'_j = 0$ marks a feature value that is absent for it. Because it is multiplied by $x'_j = 0$, a missing feature could in principle have any Shapley value without affecting the local accuracy property; the Missingness property therefore requires missing features to have a Shapley value of 0. In practice, this only matters for constant features.
3) Consistency:
Let $\hat f_x(z') = \hat f(h_x(z'))$, and let $z'_{\setminus j}$ denote the coalition $z'$ with $z'_j = 0$. For any two models $\hat f$ and $\hat f'$ that satisfy

$$\hat f'_x(z') - \hat f'_x(z'_{\setminus j}) \ge \hat f_x(z') - \hat f_x(z'_{\setminus j})$$

for all inputs $z' \in \{0, 1\}^M$, it follows that

$$\phi_j(\hat f', x) \ge \phi_j(\hat f, x).$$

According to the consistency principle, if a model is adjusted so that the marginal contribution of a feature value increases or stays the same (regardless of the other features), its Shapley value also increases or stays the same.
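The properties above can be checked numerically on a toy model. The following Python sketch computes exact Shapley values by enumerating every coalition, which is only feasible for a handful of features; it is not the SHAP library or the authors' explainer. The value of a coalition is defined here (an assumption, in the style of a baseline explanation) as the model output when absent features take the values of a single `baseline` vector, so $\phi_0$ is the output at the baseline rather than the expectation $E_X(\hat f(X))$ used in (12); averaging over a background dataset would be needed for that. The functions `shapley_values` and `toy_model` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every coalition (feasible only for few features).

    Illustrative value function: features in the coalition take their values from x,
    absent ("missing") features take the corresponding baseline value.
    """
    m = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(m)]
        return model(z)

    phi = [0.0] * m
    for j in range(m):
        others = [i for i in range(m) if i != j]
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M! for coalitions S that exclude j
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    phi0 = value(set())   # base value: every feature missing
    return phi0, phi


def toy_model(z):
    """Hypothetical model: features 0 and 1 interact, feature 2 is ignored."""
    return 3.0 * z[0] + 2.0 * z[1] + z[0] * z[1]


x = [1.0, 2.0, 5.0]
baseline = [0.0, 0.0, 0.0]
phi0, phi = shapley_values(toy_model, x, baseline)
print(phi0, phi)
# Local accuracy: phi0 + sum(phi) reproduces toy_model(x) = 9.0 (up to rounding).
print(phi0 + sum(phi), toy_model(x))
```

The interaction term $x_0 x_1 = 2$ is split equally, so feature 0 receives 3 + 1 = 4 and feature 1 receives 4 + 1 = 5, while feature 2, which the model ignores, receives 0. Enumeration costs grow as $2^M$, which is why SHAP relies on approximations such as KernelSHAP for real models.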
IV. RESULT AND DISCUSSION
A. Data pre-processing and clustering results
To balance the imbalanced credit card fraud data, the data is first split into pre-determined fraud and non-fraud groups. Only 0.172% of the transactions in the dataset were fraudulent. To partly balance the dataset, 500 fraud cases were synthesised using CTGAN, and the non-fraud data was randomly sampled down to 5000 cases. The ratio of non-fraud to fraud samples in the resulting dataset is therefore roughly 80:20 (5000 non-fraud against 992 fraud samples). This was done to maintain a rough balance between the new and the original dataset, as discussed in Section III-B1.

Fig. 4. Silhouette score of DBSCAN clusters for non-fraud samples, with ε ranging from 0.1 to 0.6.

for DBSCAN is taken as 5 and the distance metric as "euclidean"; both are the scikit-learn library's default settings. The silhouette scores for epsilon values ranging from 0.1 to 0.6 were computed for the non-fraud transaction samples and are displayed in Figure 4. The maximum silhouette score obtained is 0.43742403 for ε ≥ 0.57. Similarly, the silhouette scores for the fraud transaction samples were computed for epsilon values ranging from 0.1 to 1.5; the highest silhouette score obtained is 0.7805387 for ε ≥ 1.2.

Stratified sampling, i.e. random sampling from all the clusters identified by DBSCAN, is performed on both sample classes, and 40% of the total data is chosen. The sampled data from both classes are merged and shuffled randomly (so that the model cannot learn any spurious ordering of fraud and non-fraud samples during training) to form the final dataset used to train and test the DBN model.

B. Fraud detection classification results


A 70:30 train-test dataset is created from the final dataset.
It uses a supervised DBN classifier with a hidden layer
structure of 50 × 50, with the learning rate of RBM taken
as 0.05, and the overall learning rate of the deep network
is set to be 0.09. 500 iterations of backpropagation of the
Fig. 3. Clustering of UMAP-based 2D embedding of non-fraud samples using
DBSCAN. deep ANN were performed after the 30 epoch training of
RBM is completed. ReLU was used as the model’s Activation
Both classes are standardised before the data is non-linearly function, and the batch size of the dataset was set at 16. 20% of
embedded using UMAP. Here, a 2-dimensional embedding has the neurons were randomly removed during regularisation to
been carried out to visualise the data [Figure 3]. However, avoid overfitting the network. The Deep Belief Network model
the dimensionality of the latent space can be changed as per achieved an overall accuracy of 94%. For the 30th epoch, the
the user’s requirements and desires. It is strongly advised to RBM Reconstruction error is 0.137217, and the ANN training
perform a linear dimensionality reduction, such as PCA, on the loss is 0.362541.
dataset before passing it through the UMAP for better model The model is analysed with performance matrices as por-
performance if the dimension of the data is too large. trayed in Table II. Precision (also called positive predictive
The resulting dataset is then divided into five and six value) is the fraction of relevant instances among the retrieved
distinct clusters using DBSCAN for non-fraud and fraud instances, while recall (also known as sensitivity) is the frac-
classes respectively. The minimum sample (threshold value) tion of relevant instances that were retrieved. Both precision

and recall are therefore based on relevance. Precision and recall are defined as:

$$\text{precision} = \frac{\text{true positive}}{\text{true positive} + \text{false positive}}$$

$$\text{recall} = \frac{\text{true positive}}{\text{true positive} + \text{false negative}}$$

The balanced F-score, often known as the classic F-measure or F1 score, is the harmonic mean of precision and recall:

$$F = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

TABLE II
CLASSIFICATION REPORT FOR 70:30 TRAIN-VALIDATION RATIO

                     precision   recall   f1-score
False (non-fraud)      0.99       0.93      0.96
True (fraud)           0.77       0.98      0.86
accuracy                                    0.94
macro avg              0.88       0.95      0.91
weighted avg           0.95       0.94      0.94

A confusion matrix tabulates the numbers of true positives, false positives, true negatives and false negatives; the model's confusion matrix is shown in Figure 5. The false-negative rate, i.e. the fraction of actual fraud cases that the model predicts as non-fraud, measures how many fraudulent transactions are missed.

Fig. 5. Confusion matrix: Fraud-NonFraud classes.

The ROC curve for the complete framework is depicted in Figure 6. A ROC (receiver operating characteristic) curve plots the true positive rate against the false positive rate at various classification thresholds. Lowering the classification threshold classifies more items as positive, which raises the number of both false positives and true positives. The Area Under the ROC Curve, often known as AUC, is an aggregate indicator of performance across all possible classification thresholds. One way to interpret AUC is as the probability that the model ranks a random positive example higher than a random negative example.

Fig. 6. Receiver operating characteristic curves: Fraud-NonFraud classes.

C. Classification Explainability

The trained DBN model computes the class probabilities for the test data, and the SHAP explainer is then used to explain these probabilities. Here, an explanation describes how individual features contribute to the final prediction. Since the original dataset is sampled and embedded into a 2-dimensional latent space, the SHAP explainer explains the probabilities with respect to the 2 embedded features.

The additive feature attributions would typically be better mapped to the model's probabilistic output space using a logit link function, but the DBN may output probabilities of exactly 0 or 1, whose log-odds are infinite, so the logit link is not used in this study. The relative influence of each feature over the whole test set can be assessed with a beeswarm plot, as shown in Figure 7, in which features are ordered by the mean absolute value of their SHAP values across all samples.

Fig. 7. Beeswarm plot of testing dataset for both the classes.

A heatmap plot, shown in Figure 8, offers a different and more comprehensive look at the model's behaviour, with an emphasis on population subgroups.
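The ε sweep described in Section IV-A can be sketched as follows. The text relies on scikit-learn's DBSCAN (default `min_samples=5`, Euclidean metric) and the silhouette score; to stay dependency-free, this sketch re-implements both in plain Python, so it is an illustration rather than the library code. Treating DBSCAN's noise label (-1) as an ordinary cluster when scoring, as a direct `silhouette_score(X, labels)` call would, is an assumption, and the toy points are hypothetical.

```python
import math


def dbscan(points, eps, min_samples=5):
    """Minimal Euclidean DBSCAN returning one label per point (-1 = noise).

    Like scikit-learn, a point counts towards its own neighbourhood.
    """
    n = len(points)
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1                  # noise for now; may become a border point later
            continue
        cluster += 1                        # unvisited core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster         # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j]) # only core points grow the cluster
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient, treating every distinct label (even -1) as a cluster."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    if not 2 <= len(groups) <= len(points) - 1:
        return None                         # undefined; scikit-learn raises an error here
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in groups[lab] if j != i]
        if not own:
            scores.append(0.0)              # singleton cluster: coefficient defined as 0
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(sum(math.dist(points[i], points[j]) for j in members) / len(members)
                for other, members in groups.items() if other != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def sweep_eps(points, eps_values, min_samples=5):
    """Silhouette score per eps value (None when it is undefined)."""
    return {eps: silhouette(points, dbscan(points, eps, min_samples)) for eps in eps_values}


# Hypothetical 2-D embedding: two tight groups of six points each.
blob = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (0.2, 0.0)]
points = blob + [(x + 5.0, y + 5.0) for x, y in blob]

scores = sweep_eps(points, [0.01, 0.3, 1.0])
valid = {eps: s for eps, s in scores.items() if s is not None}
best = max(valid.values())
plateau = [eps for eps, s in valid.items() if s == best]
print(scores[0.01], min(plateau), round(best, 3))
```

Once ε exceeds the largest within-group distance (about 0.22 here), every point is a core point, the partition stops changing and the score stays flat; a plateau of this kind is consistent with the text reporting its maxima as ranges (ϵ ≥ 0.57 and ϵ ≥ 1.2).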

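The cluster-wise stratified sampling of Section IV-A can be sketched as follows: draw 40% of each DBSCAN cluster for each class, tag every row with its class, then shuffle the merged rows. Treating DBSCAN's noise label (-1) as its own stratum and keeping at least one point per cluster are assumptions not stated in the text; the cluster labels below are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(cluster_labels, fraction, rng):
    """Indices of a random `fraction` of the points, drawn separately from every cluster."""
    strata = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        strata[label].append(idx)            # DBSCAN noise (-1) forms its own stratum (assumption)
    chosen = []
    for label in sorted(strata):
        members = strata[label]
        k = max(1, round(fraction * len(members)))   # keep every cluster represented
        chosen.extend(rng.sample(members, k))
    return chosen


def build_dataset(nonfraud_labels, fraud_labels, fraction=0.4, seed=0):
    """Sample both classes cluster by cluster, tag rows (0 = non-fraud, 1 = fraud), then shuffle."""
    rng = random.Random(seed)
    rows = [(0, i) for i in stratified_sample(nonfraud_labels, fraction, rng)]
    rows += [(1, i) for i in stratified_sample(fraud_labels, fraction, rng)]
    rng.shuffle(rows)                        # mix classes so their order carries no signal
    return rows


# Hypothetical DBSCAN labels for each class.
nonfraud_labels = [0] * 100 + [1] * 50 + [-1] * 10
fraud_labels = [0] * 20 + [1] * 10

rows = build_dataset(nonfraud_labels, fraud_labels)
print(len(rows), Counter(cls for cls, _ in rows))   # 76 rows: 64 non-fraud, 12 fraud
```

Sampling within each cluster, rather than from the class as a whole, guarantees that small clusters are still represented in the reduced training set.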
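Eqs. (8) and (9) define the energy and joint distribution of each RBM that the DBN in Section IV-B pre-trains. For a very small binary RBM the partition function Z(θ) can be computed by brute force, which makes the definitions easy to check. The sizes and parameter values below are hypothetical; realistic RBMs (such as the 50-unit layers used here) are trained with approximations such as contrastive divergence precisely because this enumeration becomes intractable.

```python
import itertools
import math


def energy(v, h, w, a, b):
    """Eq. (8): E(v, h | theta) for binary visible v and hidden h."""
    visible_term = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_term = sum(b_j * h_j for b_j, h_j in zip(b, h))
    interaction = sum(v[i] * w[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - interaction


def joint_distribution(w, a, b):
    """Eq. (9): P(v, h | theta) for every binary configuration, with brute-force Z(theta)."""
    states = [(v, h)
              for v in itertools.product([0, 1], repeat=len(a))
              for h in itertools.product([0, 1], repeat=len(b))]
    unnormalised = {s: math.exp(-energy(s[0], s[1], w, a, b)) for s in states}
    z = sum(unnormalised.values())            # partition function Z(theta)
    return {s: u / z for s, u in unnormalised.items()}


def hidden_given_visible(v, w, b):
    """p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i w_ij), the factorised conditional implied by Eq. (8)."""
    return [1.0 / (1.0 + math.exp(-(b[j] + sum(v[i] * w[i][j] for i in range(len(v))))))
            for j in range(len(b))]


# Hypothetical tiny RBM: 3 visible units, 2 hidden units.
w = [[0.5, -1.0], [0.2, 0.3], [-0.4, 0.8]]   # w[i][j]: visible unit i <-> hidden unit j
a = [0.1, -0.2, 0.0]                         # visible biases
b = [0.3, -0.1]                              # hidden biases

p = joint_distribution(w, a, b)
print(round(sum(p.values()), 10))            # 1.0: Eq. (9) is a proper distribution

# The same conditional obtained by marginalising the brute-force joint distribution.
v = (1, 0, 1)
p_v = sum(prob for (vv, _), prob in p.items() if vv == v)
p_h0 = sum(prob for (vv, hh), prob in p.items() if vv == v and hh[0] == 1) / p_v
print(round(p_h0, 6), round(hidden_given_visible(v, w, b)[0], 6))   # the two values agree
```

The closed-form conditional is what makes RBM training practical: hidden units can be sampled independently given the visible layer, without ever computing Z(θ).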
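The evaluation quantities of Section IV-B can be reproduced from raw counts and scores. The sketch below computes precision, recall and F1 (the harmonic mean 2PR/(P + R)) from confusion-matrix counts, recomputes the f1-score column of Table II from its precision and recall columns, and estimates AUC with the ranking interpretation given above (ties count one half). The counts and scores are hypothetical; in practice scikit-learn's `classification_report` and `roc_auc_score` provide these values.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 (harmonic mean) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def auc_by_ranking(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts for the fraud (positive) class.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=1)
print(round(precision, 3), round(recall, 3), round(f1, 3))   # 0.8 0.889 0.842

# Table II: each f1-score equals the harmonic mean of that row's precision and recall.
for label, prec, rec in [("False", 0.99, 0.93), ("True", 0.77, 0.98)]:
    print(label, round(2 * prec * rec / (prec + rec), 2))    # False 0.96 / True 0.86

# Hypothetical fraud probabilities for fraud and non-fraud test samples.
auc = auc_by_ranking([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
print(round(auc, 4))                                          # 0.9167 (11 of 12 pairs)
```

The pairwise count equals the area under the step-wise ROC curve traced by sweeping the threshold over the scores, which is why AUC can be read as a ranking probability.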
Fig. 8. Heatmap plot of testing dataset for both the classes.

V. PRIME LIMITATIONS
The absence of real data was the main obstacle our research had to overcome. Most readily available datasets are either manipulated, purposely produced, or pseudo-data. For safety reasons, the data we used had been processed with principal component analysis. In this work, Explainable AI was used to explain the role of different features in decision making, which would have worked best with the original data.

VI. CONCLUSION
An innovative framework for fraud detection was put forward in this research. Dataset imbalance is the biggest difficulty this work faces. CTGAN, a recent deep generative model for tabular data, was employed to synthesise fraud samples, which contributed to the highly accurate results. The novelty of this work lies in its data preprocessing strategy: it maps the data into a low-dimensional latent space using non-linear embedding, and it uses stratified sampling to produce small training samples. Because the training sample is smaller than the full dataset, the method is efficient and can be scaled to the situation at hand.

For classification, a Deep Belief Network, a deep architecture built from stacked Restricted Boltzmann Machines, was used. As a deep learning model, it can be further tuned to improve accuracy, precision, or recall. To reduce overfitting and help the model generalise to a large dataset, dropout regularisation was applied. Model performance was assessed with widely accepted tools such as the ROC curve with its AUC and the confusion matrix. SHAP, a widely used model explainability framework, was employed to explain the classification model's predictions. Overall, with the aid of the proposed preprocessing techniques, the model is able to address the issues caused by data imbalance and distinguish fraud samples from their non-fraud counterparts.

VII. FUTURE WORKS
In the future, the model can be further extended to process very large-scale data, i.e., big data. Alternatively, the proposed framework may also be integrated with a Fog architecture. Another extension of this work could be distributed learning of this model using a cryptographic methodology to securely transmit embedded data between client and server.