
Improve Profiling Bank Customer Behavior Using ML

This document summarizes a research article that evaluates different machine learning techniques for profiling bank customers' behavior using a dataset from a Taiwanese bank. The techniques evaluated are k-means clustering, improved k-means, fuzzy c-means clustering, and neural networks. The researchers compare the accuracy of each technique by applying them to the labeled bank customer transaction and demographic data and determining which technique most accurately classifies customers.


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2019.2934644, IEEE Access


Improve profiling bank customer's behavior using machine learning
EMAD ABD ELAZIZ DAWOOD (a), ESSAM ELFAKHRANY (b), FAHIMA A. MAGHRABY (c)
(a) Department of Information Systems, Arab Academy for Science, Technology, and Maritime Transport, Cairo, Egypt.
(b, c) Department of Computer Science, Arab Academy for Science, Technology, and Maritime Transport, Cairo, Egypt.

Corresponding author: Emad Abd Elaziz ([email protected])

ABSTRACT In the banking industry, the growth of credit cards is a noticeable development. Every banking system holds a huge dataset of its customers' credit card transactions, so banks need customer profiling. Profiling bank customers informs the issuer's decisions about whom to grant banking facilities and what credit limit to provide. It also helps issuers gain a better understanding of their potential and current customers. In previous research, customer profiling depended mainly on either transaction data or demographic data; in this research, we merge both kinds of data in order to get a more accurate result and minimize the risk. Finding the best technique improves accuracy and helps banks achieve higher profitability through customer satisfaction, by focusing on the valuable customers (companies), which are considered the main engine of a bank's profitability. This study uses k-means, improved k-means, fuzzy c-means, and neural networks. The dataset used is labeled; creating a new label as a target for neural network classification is the main aspect of this study, and it helps reduce the clustering execution time and obtain the best accuracy results. Finally, a comparison of the accuracy ratios shows that the neural network is the most accurate technique.

INDEX TERMS profiling, banking, machine learning, k-mean, fuzzy c-mean, neural network classifier.

I. INTRODUCTION
In the modern banking sector, banks hold large datasets containing their customers' information and transaction histories. Banks need to divide these large datasets into small clusters in order to analyze customer behavior and use it to choose a suitable strategy that attains the highest benefits and customer satisfaction and thereby increases profitability. Customer profiling, or customer segmentation, is used for this purpose. Profiling produces customer profiles, which give the bank a full description of its customers based on a set of attributes. Customer segmentation characterizes groups of customers based either on specific characteristics (e.g. region, age, and income for demographic segmentation) or on their behavior (for behavioral segmentation). 'Customer segmentation' and 'profiling' are, however, considered two sides of the same coin.

Banks face many challenges, such as default prediction, risk management, customer retention, and customer profiling for different purposes, in order to achieve higher profitability and reduce risk. Solving these challenges requires identifying customers well. Machine learning is the science of enabling computers to act without being explicitly programmed. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it [1]. Machine learning teaches the computer how to learn: how to find equations and functions that work not only on the examples it has seen but also on future, unknown ones. Machine learning not only helps improve the relationship with current customers, it also plays an important role in predicting customer behavior from a group of occurrences or patterns, which informs the bank's future strategy and its planning for offering targeted credit products to customers. It has shifted the focus to the customer and modified the role played by banks in their current form. The four machine learning techniques used in this research (k-means, improved k-means, fuzzy c-means, and artificial neural networks) are applied to a real dataset from a bank in Taiwan, and their accuracy ratios are then compared. These techniques are used to profile customer behavior into clusters.

The rest of this paper is organized as follows. Section II presents related work on profiling customers with machine learning techniques. Section III explains the four machine learning techniques and the accuracy measures used in our research. Section IV describes the dataset and its attributes. Section V presents the proposed model and the application of the techniques to the dataset. Section VI shows the results of our experiment and compares them with the results of earlier research. Section VII presents the conclusion and future work.


This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.

II. LITERATURE SURVEY
Many researchers have worked on the problem of profiling bank customers using different techniques and different datasets. The following papers focus on bank customer profiling and the machine learning techniques used.

In 2015, Majid Sharahi [2] presented a classification model for the dataset of the Sepah Bank branches in Tehran using the two-step and k-means clustering algorithms. The segmentation of 60 companies that were customers of Sepah Bank was both demographic and behavioral, and it helped to identify the loyal customers.

In 2016, M. Ayoubi [3] described a customer segmentation model based on the two-step algorithm and the Kohonen neural network. Customers were segmented according to the factors that affect Customer Lifetime Value (CLV). A dataset of about 56000 customers of "Taavon bank" was used in this research. First, the optimum number of clusters was determined with the two-step approach. Then the Kohonen neural network was applied. The value of each cluster was calculated with the WRFM (weighted Recency, Frequency, and Monetary) model.

In 2017, Shamala Palaniappan [4] presented a profiling model for the customers of a Portuguese retail bank over a period of five years (2008 to 2013). The paper focused on helping banks increase the accuracy of their customer profiling through classification, and on identifying a group of customers with a high probability of subscribing to a long-term deposit. Three classification algorithms were used: Naïve Bayes, Random Forest, and Decision Tree.

In 2017, Arpit Bansal [5] presented a modification of the k-means clustering algorithm based on normalization. The Cancer Dataset was used to obtain the results. The original data were high-dimensional, but only five attributes were finally considered, based on requirements. The paper reported an accuracy rate of 57.14% for the existing algorithm and 92.86% for the improved algorithm.

In 2017, P. S. Patil and N. V. Dharwadkar [6] produced a prediction and classification model for two datasets of bank customer data. They used an Artificial Neural Network (ANN) in this model and then weighted the results. Applying the ANN algorithm and the proposed model showed that the ANN works efficiently for the two datasets, with an accuracy rate of 72% for dataset 1 and 98% for dataset 2.

In 2018, Shenghui Yang [7] presented a classification model for the credit card default dataset of a bank in Taiwan using five clustering algorithms. 10-fold cross-validation was used to obtain the average area under the curve (AUC) and the correct rate of the model. LightGBM (a high-performance gradient boosting framework built by Microsoft) had the highest accuracy; the LightGBM model achieved an F1-measure of 89.34%.

In 2018, NH Niloy [8] presented a classification model for the credit card default dataset of a bank in Taiwan. A Naïve Bayes classifier and decision trees were used to classify whether or not a client is a defaulting credit card holder. The results showed that Naïve Bayes achieved the best accuracy.

In 2019, Ali Arshad [9] presented a multi-class classification model for eighteen datasets from the UCI repository. Semi-Supervised Deep Fuzzy C-Mean (DFCM-MC) was used for clustering semi-supervised data. They introduced a new label for the unlabeled data using fuzzy c-means, and used the labeled (supervised) data together with the unlabeled (unsupervised) data and the new label to extract discriminative information for classification. The accuracy rate of DFCM-MC was 80.82% and the F-measure was 78.16%.

The literature survey above shows that many authors have used various machine learning algorithms for predicting and clustering different datasets. All of them clustered the original datasets with the existing label; in this work, we create a new label with the unsupervised techniques and use it as the target for the neural network algorithm. A profiling model was built for the dataset of bank customers using a supervised machine learning algorithm that takes the result of the unsupervised techniques as its input.

2.1. The impact of the dollar crisis on credit cards in Egyptian banks:
Some customers switch from one bank to another because the bank does not give them the best rating, which leaves them dissatisfied. Recently, due to the high price of the dollar against the Egyptian pound (the dollar crisis), customers have tended to use credit cards, which need a good rating so that the customer is satisfied, the bank gets the best profit, and the risk is reduced.
Egypt's largest listed bank, Commercial International Bank (CIB), told customers in July 2016 that it was reducing the amount of foreign currency customers can spend and withdraw when using their debit and credit cards abroad. Egypt's central bank wrote to bank chiefs asking that they "ensure that debit cards, including pre-paid cards, issued in local currency by Egyptian banks are only used within the country." CIB did not specify which cards would be affected or give the new limits, but several bank staff told Reuters that the move would affect both credit and debit cards, with limits cut by about 50 percent. CIB cut Classic Card owners' maximum purchases outside of

Egypt to $2,500 a month from $5,000, and to $3,500 a month from $7,500 for Gold Card owners [10]. HSBC Egypt (The Hong Kong and Shanghai Banking Corporation) says that all credit and debit cards have a limit of $100 per month, though it does not specify whether this is for cash withdrawals or purchases, according to the bank's website [11, 12]. Other Egyptian banks have put limits on debit and credit card purchases and ATM withdrawals abroad. Ahmed Aboul Dahab, head of retail at SAIB Bank (Arab International Banking Company) [13], says that the bank registered a 70-percent drop in credit card usage in January and February compared to the same period a year earlier. Because of this crisis, many customers moved from their banks to others in search of a higher limit. Any bank may therefore lose a large number of customers, so we suggest re-profiling bank customers to place them in a suitable cluster, in order to increase customer retention and profitability.

III. METHODS
In a world of information explosion, individual banks produce and collect a huge volume of data every day. Machine learning is now an indispensable tool in decision support systems and plays a key role in customer segmentation, customer services, fraud detection, credit and behavior scoring, and benchmarking [14]. Machine learning lets you take your segmentation to the next level. Machine learning segments are effective because they can update in real time. This makes it possible to automate personalization methods, which is necessary if you want to deploy them widely.
The four machine learning techniques employed in this study are discussed below.

A. K-MEANS ALGORITHM
K-means clustering has been one of the most commonly used techniques for years because of its stability and simplicity. The k-means clustering algorithm, proposed by MacQueen in 1967, is a partition-based cluster analysis method. K-means divides objects into clusters whose members are "similar" to one another and "dissimilar" to the objects belonging to other clusters. It is widely used in cluster analysis because it is efficient and scalable and converges fast when dealing with large data sets. K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of the algorithm is to find groups in the data, with the number of groups represented by the variable K. K-means is a fast and efficient method because the complexity of one iteration is k*n*d, where k is the number of clusters, n is the number of examples, and d is the time needed to compute the Euclidean distance between two points [15]. The following equation represents the k-means clustering algorithm:

$J(V) = \sum_{i=1}^{c} \sum_{j=1}^{n} \left( \lVert x_i - v_j \rVert \right)^2$    (1)

where $\lVert x_i - v_j \rVert$ is the Euclidean distance between a point, $x_i$, and a centroid, $v_j$, iterated over all c points in the ith cluster, for all n clusters.

B. IMPROVED K-MEANS CLUSTERING ALGORITHM
An improved k-means clustering algorithm was used because it can define the number of clusters automatically and assign the required cluster to un-clustered points. The proposed improvement achieves high accuracy and reduces the clustering time through the assignment of members to clusters. The improved k-means clustering algorithm is based on dissimilarity: it selects the initial centroids using a Huffman tree built from the dissimilarity matrix. Many experiments confirm that the improved algorithm is efficient, with better clustering accuracy at the same time complexity [16].

C. FUZZY C-MEANS CLUSTERING
Fuzzy clustering (also referred to as soft clustering) is a form of clustering in which each data point can belong to more than one cluster. One of the most widely used fuzzy clustering algorithms is the fuzzy c-means (FCM) algorithm. FCM clustering was developed by J.C. Dunn in 1973 and improved by J.C. Bezdek in 1981. The algorithm focuses on improving the clustering or centroid computation without considering noise and outliers [17].

Algorithmic steps for fuzzy c-means clustering:
Let X = {x1, x2, x3, ..., xn} be the set of data points and V = {v1, v2, v3, ..., vc} be the set of centers.
1) Randomly select 'c' cluster centers.
2) Calculate the fuzzy membership 'µij' using:

$\mu_{ij} = 1 \Big/ \sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}$    (2)

3) Compute the fuzzy centers 'vj' using:

$v_j = \left( \sum_{i=1}^{n} (\mu_{ij})^m x_i \right) \Big/ \left( \sum_{i=1}^{n} (\mu_{ij})^m \right), \quad \forall j = 1, 2, \ldots, c$    (3)

where
'n' is the number of data points.
'vj' represents the jth cluster center.
'm' is the fuzziness index, m ∈ (1, ∞).
'c' represents the number of cluster centers.
'µij' represents the membership of the ith data point in the jth cluster center.
'dij' represents the Euclidean distance between the ith data point and the jth cluster center.

D. ARTIFICIAL NEURAL NETWORKS (ANN)
A neural network is a simplified model of the way the human brain processes information. It works by simulating the interconnections between neurons. Warren McCulloch and Walter Pitts (1943) created


a computational model for neural networks based on mathematics and algorithms called threshold logic. This model paved the way for neural network research to split into two approaches: one focused on biological processes in the brain, while the other focused on the application of neural networks to artificial intelligence. A common use of the phrase "ANN model" is really the definition of a class of such functions (where members of the class are obtained by varying parameters, connection weights, or specifics of the architecture such as the number of neurons or their connectivity) [18]. This methodology makes it possible to create a large combination of different structures based on:
• Number of layers
• Selection of activation function
• Number of perceptrons
• Normalization layers
• Dropout adjustments

$a_j^l = \sigma\left( \sum_k \omega_{jk}^l \, a_k^{l-1} + b_j^l \right)$    (4)

where the activation $a_j^l$ of the jth neuron in the lth layer is related to the activations in the (l−1)th layer. $w^l$ is the weight matrix for each layer l; the entry in its jth row and kth column is $\omega_{jk}^l$.

E. EVALUATION METRICS
After building a machine learning profiling model, its performance should be measured with different accuracy measures. This paper uses different techniques (supervised and unsupervised), so their classification performance was evaluated with the measures shown in Equations (5) to (11).

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$    (5)

$\text{Sensitivity} = \frac{TP}{TP + FN}$    (6)

$\text{Specificity} = \frac{TN}{TN + FP}$    (7)

$\text{Precision} = \frac{TP}{TP + FP}$    (8)

$\text{Recall} = \text{Sensitivity} = \frac{TP}{TP + FN}$    (9)

$\text{F-measure} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$    (10)

$\text{G-mean} = \sqrt{\text{sensitivity} \cdot \text{specificity}}$    (11)

where
• True Positive (TP): Observation is positive, and it is predicted to be positive.
• False Negative (FN): Observation is positive, but it is predicted negative.
• True Negative (TN): Observation is negative, and it is predicted to be negative.
• False Positive (FP): Observation is negative, but it is predicted positive.

IV. DATA SET
The data set ('default of credit card clients') is obtained from the UCI (University of California, Irvine) Machine Learning Repository [19]. It is a recently published dataset (obtained in 2015). The attribute details are given in Table 1. The data set contains 30000 observations and 23 variables, and it has no missing data. All explanatory variables were normalized. Standardizing data is a pre-processing step applied to variables to scale them to a similar range. This research addresses the case of customers' default payments in Taiwan and compares the accuracy rate of profiling customers among four machine learning techniques. Among the four techniques, the artificial neural network is the only one that can accurately profile the data set.

Table 1. Description of the attributes in the dataset

Attribute no.  Attribute name          Description
X1             Limit_BAL               Amount of the given credit (NT dollar)
X2             Sex                     Gender (1 = male; 2 = female)
X3             Education               Education (1 = graduate school; 2 = university; 3 = high school; 4 = others)
X4             Marital status          Marital status (1 = married; 2 = single; 3 = others)
X5             Age                     Age (year)
X6-X11         Pay_0 to Pay_6          April to September
X12-X17        Bill_AMT1 to BILL_AMT6  Amount of bill statement (NT dollar)
X18-X23        Pay_AMT1 to PAY_AMT6    Amount of previous payment (NT dollar)
X24            Y                       Default payment (Yes = 1, No = 0)

V. PROPOSED FRAMEWORK
The main idea of our proposed model, shown in Figure 1, is to improve the profiling of bank customers' behavior using different machine learning techniques. The model starts with the data set, which is obtained from the UCI machine learning repository. The data then goes through data preprocessing. After that, the machine learning techniques are applied to build the customer profile. In machine learning, the profiling phase recognizes the items in a group and places them


under target categories. In this paper, the accuracy of the unsupervised techniques is evaluated with the Gini coefficient; their results are then used as input for the supervised technique, the Artificial Neural Network (ANN), whose results are evaluated and compared to find the best technique.

FIGURE 1. The proposed model for profiling bank customers

1. Data preprocessing
Data preprocessing is the first important step in the data mining process. If the data contains much irrelevant and redundant information, or is noisy and unreliable, analyzing it without carefully checking for such problems can produce inaccurate results. The quality and representation of the data therefore come first, before the analysis is applied. Data preprocessing was often the most important phase of our machine learning project. First, normalization is applied to the data. In most problems, normalizing the data starts by eliminating the units of measurement, so that data from different places can be compared easily. One of the most common ways to normalize data is re-scaling it to values between 0 and 1, which is usually called feature scaling. One possible formula to achieve this is [20]:

$X_{new} = \frac{X - X_{min}}{X_{max} - X_{min}}$    (12)

2. Classification using machine learning algorithms:
The result of data preprocessing is the final training set. The four machine learning techniques are then applied to it. The first technique applied is the k-means algorithm. The number of clusters is determined from the researcher's prior knowledge; in this paper, it was set to three.
The second classifier is improved k-means, which determines the number of clusters as five through the following steps [21]:
1. Use the intra-cluster distance measure, which is simply the distance between a point and its cluster center, averaged over all points:

$intra = \frac{1}{N} \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - z_i \rVert^2$    (13)

where N is the number of data points (the number of pixels, in the image-segmentation setting of [21]), K is the number of clusters, and $z_i$ is the cluster center of cluster $C_i$. We obviously want to minimize this measure.
2. Measure the inter-cluster distance, i.e. the distance between clusters, which should be as large as possible. It is calculated as the distance between cluster centers, taking the minimum of these values:

$inter = \min\left( \lVert z_i - z_j \rVert^2 \right), \quad i = 1, 2, \ldots, K-1, \quad j = i+1, \ldots, K$    (14)

where $z_i$ and $z_j$ are cluster centers and K is the number of clusters.
3. Only the minimum of these values is taken because the smallest distance is the one to be maximized; the other, larger values will automatically be bigger than it.
4. Finally, calculate the ratio of intra to inter, defined as the validity:

$validity = \frac{intra}{inter}$    (15)

5. The clustering that gives the minimum value of the validity measure therefore indicates the ideal value of K (the number of clusters).
The third classifier is fuzzy c-means, which is applied to the data set with five clusters.

The next step is calculating the Gini coefficient for each of the three unsupervised algorithms to find the one with the best accuracy for profiling the dataset.
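The selection of K by the validity measure in Eqs. (13) to (15) can be illustrated with a minimal Python sketch. It is not the authors' Matlab implementation: the toy data, the helper names (`min_max_scale`, `kmeans`, `validity`), and the deterministic farthest-point seeding are assumptions made here for illustration, and plain k-means stands in for the Huffman-tree initialisation of [16].

```python
def min_max_scale(rows):
    """Rescale every column to [0, 1] as in Eq. (12)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch, not the
    Huffman-tree initialisation of [16])."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations: assign points, then recompute the means."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:      # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:               # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def validity(points, centers, labels):
    """Eq. (15): intra (Eq. 13) divided by inter (Eq. 14)."""
    intra = sum(sq_dist(p, centers[lab])
                for p, lab in zip(points, labels)) / len(points)
    inter = min(
        sq_dist(centers[i], centers[j])
        for i in range(len(centers) - 1)
        for j in range(i + 1, len(centers))
    )
    return intra / inter


# Toy data: three well-separated groups (illustrative values only).
raw = [[0, 0], [0, 1], [1, 0],
       [10, 10], [10, 11], [11, 10],
       [20, 0], [20, 1], [21, 0]]
scaled = min_max_scale(raw)

scores = {}
for k in range(2, 5):
    centers, labels = kmeans(scaled, k)
    scores[k] = validity(scaled, centers, labels)

best_k = min(scores, key=scores.get)   # K with the smallest validity
print(best_k)
```

On this toy data the validity is smallest for K = 3, which matches the three groups the points were drawn from.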

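The fuzzy c-means updates of Eqs. (2) and (3), which the third classifier relies on, can be sketched in the same way. This is a hedged illustration rather than the implementation used in the experiment: the function names, the toy one-dimensional data, the fixed 20 iterations, and the handling of a point that coincides with a center are all assumptions.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Eq. (2): mu[i][j] = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))."""
    power = 2.0 / (m - 1.0)
    mu = []
    for x in points:
        d = [math.dist(x, v) for v in centers]
        if min(d) == 0.0:
            # The point coincides with a center; Eq. (2) would divide by
            # zero, so give it full membership there (assumed convention).
            hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
            mu.append([h / sum(hits) for h in hits])
        else:
            mu.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return mu


def fcm_centers(points, mu, m=2.0):
    """Eq. (3): membership-weighted mean of the data for each cluster."""
    n_clusters = len(mu[0])
    centers = []
    for j in range(n_clusters):
        weights = [row[j] ** m for row in mu]
        total = sum(weights)
        centers.append([
            sum(w * x[dim] for w, x in zip(weights, points)) / total
            for dim in range(len(points[0]))
        ])
    return centers


# Toy 1-D data and starting centers (illustrative values only).
points = [[0.0], [1.0], [9.0], [10.0]]
centers = [[0.0], [10.0]]
for _ in range(20):
    mu = fcm_memberships(points, centers)   # step 2
    centers = fcm_centers(points, mu)       # step 3

print(centers)
```

Each row of `mu` sums to one, so a point near a boundary is shared between clusters instead of being forced into a single one, which is the difference from the hard assignment of k-means.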

Finally, the ANN is applied. We take the results of the unsupervised techniques as the target for a neural network in order to measure its accuracy. By taking the results of k-means, improved k-means, and fuzzy c-means as targets, we introduce a new label for the dataset. We then try them and obtain their accuracy by evaluating seven accuracy measures. The classifier that best helps to improve the profiling of bank customers is the one with the highest accuracy.

FIGURE 2. Proposed Model pseudo code steps

VI. EXPERIMENTS AND ANALYSIS
The experiment was run on the Matlab platform (R2015b) on a PC with the following specifications: Intel(R) Core(TM) i7-2400 CPU @ 3.10 GHz, 6.00 GB RAM, and a 64-bit Windows operating system.

A. Analysis and Comparison:
Table 2 shows the classification performance of the different unsupervised machine learning classifiers, with accuracy measured by the Gini coefficient. These results are taken as a new label for the dataset, in place of the old label, for the next step of our experiment, where the new label is used as the target for the artificial neural network algorithm.

TABLE 2. The results of the Gini coefficient for unsupervised techniques

Machine learning technique        Best Gini obtained  Rank
Unsupervised (k-means)            26.37%              3
Unsupervised (improved k-means)   37.61%              1
Fuzzy C-means                     29.04%              2

Table 2 describes the results of applying the three unsupervised techniques to the dataset, with performance evaluated by the Gini coefficient. It shows that improved k-means is the most accurate technique, at 37.61%.

Neural network evaluation:
In this phase, the result of the improved k-means clustering algorithm, which had the highest accuracy, is taken as the target for the neural network algorithm. The results in Table 3 show that the neural network had the best accuracy rate in classifying the dataset. We therefore achieved the aim of this experiment: to improve the profiling of bank customers' behavior by creating a new label with unsupervised machine learning techniques.

Figure 3 shows the variation in the gradient coefficient with respect to the number of epochs. As the figure shows, after epoch 170 errors occurred 6 times and the test was stopped at epoch 176. The final value of the gradient coefficient at epoch 176 is 0.073403, which is close to zero. The smaller the gradient coefficient, the better the training and testing of the network.

FIGURE 3. The training state plot of the proposed model.

Table 3 shows the accuracy measures obtained by applying the proposed ANN algorithm to the dataset in Matlab. It achieved a high accuracy ratio on the different measures: an accuracy rate of 98.08%, an F-measure of 95.19%, and a G-mean of 97.96%.

TABLE 3. The evaluation of the proposed neural network model.

Measure        Value
Accuracy rate  0.9808
Sensitivity    0.9777
Specificity    0.9816
Precision      0.9275
Recall         0.9777
F-Measure      0.9519
G-mean         0.9796
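As a quick consistency check, the F-measure and G-mean in Table 3 can be recomputed from the reported precision, recall, and specificity with Eqs. (10) and (11). The sketch below is illustrative Python, not the authors' Matlab code; the helper `scores_from_counts` and the confusion-matrix counts passed to it are hypothetical.

```python
import math


def scores_from_counts(tp, tn, fp, fn):
    """Eqs. (5)-(11) computed from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)      # Eq. (6), identical to recall, Eq. (9)
    specificity = tn / (tn + fp)      # Eq. (7)
    precision = tp / (tp + fp)        # Eq. (8)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),              # Eq. (5)
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": 2 * precision * sensitivity
                     / (precision + sensitivity),                 # Eq. (10)
        "g_mean": math.sqrt(sensitivity * specificity),           # Eq. (11)
    }


# Values reported in Table 3.
precision, recall, specificity = 0.9275, 0.9777, 0.9816
f_measure = 2 * precision * recall / (precision + recall)
g_mean = math.sqrt(recall * specificity)
print(round(f_measure, 4), round(g_mean, 4))   # 0.9519 0.9796

# Hypothetical counts, used only to exercise the helper.
example = scores_from_counts(tp=90, tn=880, fp=20, fn=10)
```

Both recomputed values agree with Table 3 to four decimal places.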

The confusion matrix is shown in fig. 4 is a table that is


used to represent the performance of our classification TABLE 5. the results of the earlier founding researches:
model or (“classifier ANN”) on a set of test data to show Name of Date of Used Accuracy
the true values. By this matrix, the algorithm visualization researcher publishing technique rate
of the performance was detected. It produces an easy Shenghui Sept. 2018 neural 88.83
Yang[22] network
determination of confusion between classes. The Sharjeel 2017 neural 90.99
performance measures are calculated from this confusion Imtiaz[23] network
matrix. Furrakh 2017 Neural 81.7
Shahzad[24] network
(MLP)
Vladislav May 2017 neural 81.1
Pyzhov[25] network

FIGURE 4. The confusion matrix of the neural network classifier.

Reading the confusion matrix, the neural network implemented in
Matlab achieves an accuracy rate of 98.08%.
This confusion matrix shows that there are five clusters, each with a
different number of customers. We can profile them as follows:

TABLE 4. The clusters resulting from the proposed neural network model.
Cluster   Cluster name   No. of customers   Details
1         Platinum       5765               Top class
2         Golden         5580               2nd rank
3         Bronze         5171               3rd rank
4         Silver         6832               4th rank
5         Classic        5858               5th rank

Table 4 shows the five clusters and the number of customers in each
cluster. Analyzing the results against the dataset showed that the
Platinum cluster, with 5765 customers, is the best. It is followed by
the Golden, Bronze, Silver, and Classic clusters, with 5580, 5171, 6832,
and 5858 customers, respectively.

Results of earlier research:
Table 5 compares our results with those of papers [22], [23], [24], and
[25]; our proposed model achieved the best accuracy. The earlier
studies we found used the same dataset and the same technique (ANN).

vii. CONCLUSION AND FUTURE WORK:

Profiling has allowed banks to build an interactive relationship with
their customers based on humanistic experience and trust. Clustering
techniques are used to divide large datasets into clusters. The
proposed modification of K-Means clustering eliminated the two major
drawbacks of K-Means clustering, namely the accuracy level and the
calculation time consumed in clustering the dataset. The profiling
environment should be analyzed carefully to ensure effective and
efficient segmentation of the bank's customer pool, which helps the
bank design its service and product offerings to win customer loyalty
and satisfaction. Supervised machine learning, which relied on
creating a new target label for the dataset, produced more accurate
profiling results than the unsupervised techniques. The artificial
neural network showed the highest accuracy across seven different
measures. Any bank can therefore use this model and technique in the
future to improve the profiling of its customers, increase
profitability, and reduce risk.
In future work, we will try to improve the effectiveness and
performance of our proposed approach by applying deep learning
algorithms in medical informatics.

REFERENCES
[1] S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[2] M. Sharahi and M. Aligholi. "Classify the Data of Bank Customers Using Data Mining and Clustering Techniques." Journal of Applied Environmental and Biological Sciences, February 11, 2015.
[3] M. Ayoubi. "Customer segmentation based on CLV model and neural network." International Journal of Computer Science Issues (IJCSI) 13.2 (2016): 31.
[4] S. Palaniappan, A. Mustapha, et al. "Customer Profiling using Classification Approach for Bank Telemarketing." JOIV: International Journal on Informatics Visualization 1.4-2 (2017): 214-217.
[5] A. Bansal, M. Sharma, and S. Goel. "Improved k-means clustering algorithm for prediction analysis using classification technique in data mining." International Journal of Computer Applications 157.6 (2017): 0975-8887.
[6] P. S. Patil and N. V. Dharwadkar. "Analysis of banking data using machine learning." 2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics, and Cloud) (I-SMAC), Palladam, 2017, pp. 876-881.
[7] S. Yang and H. Zhang. "Comparison of Several Data Mining Methods in Credit Card Default Prediction." Intelligent Information Management 10.05 (2018): 115.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.

[8] N. H. Niloy and M. A. I. Navid. "Naïve Bayesian Classifier and Classification Trees for the Predictive Accuracy of Probability of Default Credit Card Clients." American Journal of Data Mining and Knowledge Discovery 3.1 (2018): 1.
[9] A. Arshad, S. Riaz, and L. Jiao. "Semi-Supervised Deep Fuzzy C-Mean Clustering for Imbalanced Multi-class Classification." IEEE Access (2019).
[10] Ahram Online. "Egypt's Banque Misr conditionally suspends card usage abroad amid currency crisis." Economy - Business - Ahram Online. [Online]. Available: http://english.ahram.org.eg/News/246079.aspx. [Accessed: 10-Apr-2019].
[11] N. M. El Agroudy, F. A. Shafiq, and S. Mokhtar. "The effect of the rise in the dollar rate on the Egyptian economy." Sciences 5.02 (2015): 509-514.
[12] H. Hassan and A. Jreisat. "Does bank efficiency matter? A case of Egypt." International Journal of Economics and Financial Issues 6.2 (2016): 473-478.
[13] T. Hafez. "IN DEPTH-The ups and downs of the Egyptian pound." AmCham. [Online]. Available: https://www.amcham.org.eg/publications/business-monthly/issues/256/April-2017/3568/the-ups-and-downs-of-the-egyptian-pound. [Accessed: 09-Apr-2019].
[14] T. Perraju. "Artificial intelligence and decision support systems." International Journal of Advanced Research in IT and Engineering 2.4 (2013): 17-26.
[15] M. Kaur and N. Kaur. "Adaptive K-Means Clustering Techniques For Data Clustering." International Journal of Innovative Research in Science, Engineering, and Technology (2014).
[16] J. Wang and X. Su. "An improved K-Means clustering algorithm." Communication Software and Networks (ICCSN), 2011 IEEE 3rd International Conference on. IEEE, 2011.
[17] F. Baser, S. Gokten, and P. O. Gokten. "Using fuzzy c-means clustering algorithm in financial health scoring." Audit Financiar 15.147 (2017): 385-394.
[18] S. Deb. "Application of Artificial Neural Networks (ANN)-In Designing SODEPUS (Study of Dynamic Earth Processes using Software)."
[19] Default of credit card clients Data Set, UCI Machine Learning Repository.
[20] B. K. Singh, K. Verma, and A. S. Thoke. "Investigations on impact of feature normalization techniques on classifier's performance in breast tumor classification." International Journal of Computer Applications 116.19 (2015).
[21] S. Ray and R. H. Turi. "Determination of number of clusters in k-means clustering and application in colour image segmentation." Proceedings of the 4th International Conference on Advances in Pattern Recognition and Digital Techniques. 1999.
[22] S. Yang and H. Zhang. "Comparison of Several Data Mining Methods in Credit Card Default Prediction." Intelligent Information Management 10.05 (2018): 115.
[23] S. Imtiaz and A. J. Brimicombe. "A Better Comparison Summary of Credit Scoring Classification." International Journal of Advanced Computer Science and Applications 8.7 (2017): 1-4.
[24] M. Pasha, et al. "Performance comparison of data mining algorithms for the predictive accuracy of credit card defaulters." Int. J. Comput. Sci. Netw. Secur 17.3 (2017): 178-183.
[25] V. Pyzhov and S. Pyzhov. "Comparison of methods of data mining techniques for the predictive accuracy." (2017).

Emad Abd Elaziz Dawood was born in Sharkia, Egypt, in 1989. He
received a bachelor's degree in information systems from the Science
Valley Academy in 2010. He is a teaching assistant in the Higher Valley
Institute of Information Systems and the Head of the Youth Welfare
Authority in the Science Valley Academy. He is currently pursuing a
master's degree in information systems with the Arab Academy for
Science, Technology and Maritime Transport (AASTMT), Cairo, Egypt. He
is a Research Scholar with the Department of Computing and Information
Technology, AASTMT. His main research interests are data mining and
machine learning.

Essamedean Elfakhrany received the B.S. and M.S. degrees from the
Military Technical College (MTC), Cairo, Egypt, in 1986 and 1991,
respectively, and the Ph.D. degree in System Engineering from The Ohio
State University in December 1999. He is an Associate Professor in the
Computer Science department at the Arab Academy for Science,
Technology and Maritime Transport. His research interests include data
science, ontological knowledge representation, semantic web, and IoT
streaming data analytics. He is interested in teaching artificial
intelligence, knowledge management, decision support systems, and
theory of computation.

FAHIMA A. MAGHRABY received the B.S., M.S., and Ph.D. degrees in
Computer Science from Ain Shams University, Cairo, Egypt, in 2003, 2008,
and 2014, respectively. From 2004 to 2014, she was a Lecturer Assistant
in the Institute of Computer Science, Shorouk Academy, Cairo, Egypt.
Since 2014, she has been a Lecturer in the Faculty of Computing and
Information Technology, Arab Academy for Science, Technology and
Maritime Transport (AASTMT), Cairo, Egypt. Her research interests
include bioinformatics, image processing, artificial intelligence, and
blockchain.
