Improve Profiling Bank Customer Behavior Using ML

This document summarizes a research article that evaluates different machine learning techniques for profiling bank customers' behavior using a dataset from a Taiwanese bank. The techniques evaluated are k-means clustering, improved k-means, fuzzy c-means clustering, and neural networks. The researchers compare the accuracy of each technique by applying them to the labeled bank customer transaction and demographic data and determining which technique most accurately classifies customers.


This article has been accepted for publication in a future issue of this journal, but has not been

fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2019.2934644, IEEE Access

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.Doi Number

Improve profiling bank customer's behavior using machine learning
EMAD ABD ELAZIZ DAWOOD (a), ESSAM ELFAKHRANY (b), FAHIMA A. MAGHRABY (c)
(a) Department of Information Systems, Arab Academy for Science, Technology, and Maritime Transport, Cairo, Egypt.
(b, c) Department of Computer Science, Arab Academy for Science, Technology, and Maritime Transport, Cairo, Egypt.

Corresponding author: Emad Abd Elaziz ([email protected])

ABSTRACT The growth of credit cards is a noticeable development in the banking industry. Every banking system holds a huge dataset of its customers' credit card transactions, so banks need customer profiling. Profiling bank customers informs the issuer's decisions about whom to grant banking facilities and what credit limit to provide. It also helps issuers better understand their potential and current customers. In previous research, customer profiling depended mainly on either transaction data or demographic data; in this research, we merge both kinds of data to obtain more accurate results and minimize risk. Finding the best technique improves accuracy and helps banks achieve higher profitability through customer satisfaction, by focusing on valuable customers (companies), which are considered the main engine of a bank's profitability. This study applies k-means, improved k-means, fuzzy c-means, and neural networks. The dataset used is labeled; the main aspect of this study is creating a new label as the target for neural network classification, which helps reduce the clustering execution time and obtain the best accuracy. Finally, a comparison of the accuracy ratios shows that the neural network performs best among the four techniques.

INDEX TERMS profiling, banking, machine learning, k-mean, fuzzy c-mean, neural network classifier.

I. INTRODUCTION
In the modern banking sector, banks hold large datasets containing customers' information and transaction histories. Banks need to divide these large datasets into small clusters in order to analyze customer behavior and use it to choose a suitable strategy that attains the highest benefits and customer satisfaction and thereby increases profitability. Customer profiling, or customer segmentation, serves this purpose. Profiling produces customer profiles, which give the bank a full description of its customers based on a set of attributes. Customer segmentation refers to characterizing groups of customers based either on specific characteristics (e.g., region, age, and income for demographic segmentation) or on their behavior (for behavioral segmentation). 'Customer segmentation' and 'profiling' are, however, considered two sides of the same coin.

Banks face many challenges, such as default prediction, risk management, customer retention, and customer profiling for different purposes, in order to achieve higher profitability and reduce risk. Solving these challenges requires identifying customers well. Machine learning is the science of enabling computers to act without being explicitly programmed. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it [1]. Machine learning teaches the computer how to learn: how to find equations and functions that work not only on the examples it has seen but also on unknown future ones. Machine learning not only helps improve relationships with current customers; it also plays an important role in predicting customer behavior from a certain group of occurrences or patterns, which informs future strategy and the planning of targeted credit products for customers. It has shifted the focus to the customer and modified the role played by banks in their current format. The four machine learning techniques used in this research (k-means, improved k-means, fuzzy c-means, and artificial neural networks) are applied to a real dataset from a bank in Taiwan, and their accuracy ratios are then compared. These techniques are used to profile customer behavior into clusters.

The rest of this paper is organized as follows. Section II presents related work that focuses on profiling customers using machine learning techniques. Section III explains the four machine learning techniques and the accuracy measures used in our research. Section IV describes the dataset and its attributes. Section V describes the proposed model and the application of the techniques to the dataset. Section VI shows the results of our experiment and compares them with the results of earlier research. Section VII presents the conclusion and future work.


This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.

II. LITERATURE SURVEY
Many researchers are working on the problem of profiling bank customers using different techniques and different datasets. The following papers focus on bank customer profiling and the machine learning techniques used.

In 2015, Majid Sharahi [2] presented a classification model for the dataset of the Sepah Bank branches in Tehran using the two-step and k-means clustering algorithms. The segmentation of 60 companies that were customers of Sepah Bank was both demographic and behavioral, and it helped to identify loyal customers.

In 2016, M. Ayoubi [3] described a customer segmentation model based on the two-step algorithm and the Kohonen neural network. Customers were segmented according to the factors affecting Customer Lifetime Value (CLV). The dataset covered about 56,000 customers of "Taavon bank". First, the two-step approach was used to determine the optimum number of clusters. Then the Kohonen neural network was applied. The value of each cluster was calculated with the WRFM (weighted Recency, Frequency, and Monetary) model.

In 2017, Shamala Palaniappan [4] presented a profiling model for the customers of a Portuguese retail bank over a period of five years (2008 to 2013). The paper focused on helping banks increase the accuracy of their customer profiling through classification, and on identifying a group of customers with a high probability of subscribing to a long-term deposit. Three classification algorithms were used: Naïve Bayes, Random Forest, and Decision Tree.

In 2017, Arpit Bansal [5] presented a modification of the k-means clustering algorithm based on normalization. The researchers used the Cancer Dataset to obtain their results. The original data were high-dimensional, but only five attributes were finally considered, based on requirements. The paper reported an accuracy rate of 57.14% for the existing algorithm and 92.86% for the improved algorithm.

In 2017, P. S. Patil and N. V. Dharwadkar [6] produced a prediction and classification model for two datasets of bank customer data. They used an Artificial Neural Network (ANN) in this model and then weighted the results. Applying the ANN algorithm and the proposed model showed that the ANN works efficiently for both datasets, with an accuracy rate of 72% for dataset 1 and 98% for dataset 2.

In 2018, Shenghui Yang [7] presented a classification model for the credit card default dataset of a bank in Taiwan using five clustering algorithms. 10-fold cross-validation was used to obtain the average area under the curve (AUC) and the correct rate of the model. LightGBM (a high-performance gradient boosting framework built by Microsoft) had the highest accuracy rate, achieving an F1-measure of 89.34%.

In 2018, N. H. Niloy [8] presented a classification model for the credit card default dataset of a bank in Taiwan. A Naïve Bayesian classifier and decision trees were used to classify whether a client is a default credit cardholder or not. The results showed that the Naïve Bayesian classifier achieved the best accuracy.

In 2019, Ali Arshad [9] presented a multi-class classification model for eighteen datasets from the UCI repository. Semi-Supervised Deep Fuzzy C-Mean (DFCM-MC) was used to cluster semi-supervised data. The authors introduced a new label for the unlabeled data using fuzzy c-means. They used the labeled (supervised) data and the unlabeled (unsupervised) data together with the new label to extract the discriminatory information used for classification. The accuracy rate of DFCM-MC was 80.82% and the F-measure was 78.16%.

The survey above shows that many authors have used various machine learning algorithms for predicting and clustering different datasets. All of them clustered the original datasets with the existing label. In this work, we instead create a new label using unsupervised techniques and use it as the target for the neural network algorithm. A profiling model was built for the bank customer dataset using a supervised machine learning algorithm that takes the result of the unsupervised techniques as its input.

2.1. The impact of the dollar crisis on credit cards in Egyptian banks:
Some customers switch from one bank to another because the bank does not give them the best rating, which leaves them dissatisfied. Recently, because of the high price of the dollar against the Egyptian pound (the dollar crisis), customers tend to use credit cards, which require a good rating so that the customer is satisfied, the bank gets the best profit, and risk is reduced.
Egypt's largest listed bank, Commercial International Bank (CIB), told customers in July 2016 that it was reducing the amount of foreign currency customers can spend and withdraw when using their debit and credit cards abroad. Egypt's central bank wrote to bank chiefs asking that they "ensure that debit cards, including pre-paid cards, issued in local currency by Egyptian banks are only used within the country." CIB did not specify which cards would be affected or give the new limits, but several bank staff told Reuters that the move would affect both credit and debit cards with limits cut by about 50 percent. CIB cut Classic Card owners' maximum purchases outside of

Egypt to $2,500 a month from $5,000, and to $3,500 a month from $7,500 for Gold Card owners [10]. HSBC Egypt (The Hongkong and Shanghai Banking Corporation) says that all credit and debit cards have a limit of $100 per month, though it does not specify whether this is for cash withdrawals or purchases, according to the bank's website [11, 12]. Other Egyptian banks have put limits on debit and credit card purchases and ATM withdrawals abroad. Ahmed Aboul Dahab, head of retail at SAIB Bank (Arab International Banking Company) [13], says that the bank registered a 70-percent drop in credit card usage in January and February compared to the same period a year earlier.
Because of this crisis, many customers moved from their banks to others in search of a higher limit. Any bank may therefore lose a large number of customers, so we suggest re-profiling bank customers and placing them in suitable clusters to increase customer retention and profitability.

III. METHODS

In a world of information explosion, individual banks produce and collect a huge volume of data every day. Machine learning is now an indispensable tool in decision support systems and plays a key role in customer segmentation, customer services, fraud detection, credit and behavior scoring, and benchmarking [14]. Machine learning lets you take your segmentation to the next level. Machine learning segments are effective because they can update in real time. This makes it possible to automate personalization methods, which is necessary if you want to deploy them widely.
The four machine learning techniques employed in this study are discussed below.

A. K-MEANS ALGORITHM

K-means clustering has been one of the most commonly used techniques for years because of its stability and simplicity. The k-means clustering algorithm, proposed by MacQueen in 1967, is a partition-based cluster analysis method. K-means divides objects into clusters whose members are "similar" to each other and "dissimilar" to the objects belonging to other clusters. It is widely used in cluster analysis because it has high efficiency and scalability and converges quickly on large datasets. K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of the algorithm is to find groups in the data, with the number of groups represented by the variable K. K-means is fast and efficient because the complexity of one iteration is k*n*d, where k is the number of clusters, n is the number of examples, and d is the time needed to compute the Euclidean distance between two points [15]. The following equation represents the k-means clustering objective:

$$J(V) = \sum_{i=1}^{c} \sum_{j=1}^{n} \left( \lVert x_i - v_j \rVert \right)^2 \tag{1}$$

where ||x_i − v_j|| is the Euclidean distance between a point, x_i, and a centroid, v_j, iterated over all c points in the ith cluster, for all n clusters.

B. IMPROVED K-MEANS CLUSTERING ALGORITHM
An improved k-means clustering algorithm was used because it can define the number of clusters automatically and assign un-clustered points to the required cluster. The proposed improvement achieves high accuracy and reduces the clustering time through the assignment of members to clusters. The improved k-means clustering algorithm is based on dissimilarity: it selects the initial centroids using a Huffman tree built from the dissimilarity matrix. Many experiments confirm that the improved algorithm is efficient, with better clustering accuracy at the same time complexity [16].

C. FUZZY C-MEANS CLUSTERING

Fuzzy clustering (also referred to as soft clustering) is a form of clustering in which each data point can belong to more than one cluster. One of the most widely used fuzzy clustering algorithms is the fuzzy c-means (FCM) algorithm. FCM was developed by J. C. Dunn in 1973 and improved by J. C. Bezdek in 1981. The algorithm focuses on improving the clustering or centroid computation without considering noise and outliers [17].

Algorithmic steps for fuzzy c-means clustering:

Let X = {x1, x2, x3, ..., xn} be the set of data points and V = {v1, v2, v3, ..., vc} be the set of centers.
1) Randomly select 'c' cluster centers.
2) Calculate the fuzzy membership 'µij' using:

$$\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}} \tag{2}$$

3) Compute the fuzzy centers 'vj' using:

$$v_j = \frac{\sum_{i=1}^{n} (\mu_{ij})^m x_i}{\sum_{i=1}^{n} (\mu_{ij})^m}, \quad \forall j = 1, 2, \ldots, c \tag{3}$$

where
'n' is the number of data points.
'vj' represents the jth cluster center.
'm' is the fuzziness index, m ∈ [1, ∞).
'c' represents the number of cluster centers.
'µij' represents the membership of the ith data point in the jth cluster center.
'dij' represents the Euclidean distance between the ith data point and the jth cluster center.

D. ARTIFICIAL NEURAL NETWORKS (ANN)
A neural network is a simplified model of how the human brain processes information. It works by simulating the interconnections between neurons. Warren McCulloch and Walter Pitts (1943) created

a computational model for neural networks based on mathematics and algorithms called threshold logic. This model paved the way for neural network research to split into two approaches. One approach focused on biological processes in the brain, while the other focused on the application of neural networks to artificial intelligence. A common use of the phrase "ANN model" is really the definition of a class of such functions (where members of the class are obtained by varying parameters, connection weights, or specifics of the architecture such as the number of neurons or their connectivity) [18]. This methodology makes it possible to create a large combination of different structures based on:
• Number of layers
• Selection of activation function
• Number of perceptrons
• Normalization layers
• Dropout adjustments

$$a_j^l = \sigma\left( \sum_k \omega_{jk}^l a_k^{l-1} + b_j^l \right) \tag{4}$$

where the activation a_j^l of the jth neuron in the lth layer is related to the activations in the (l−1)th layer. Each layer l has a weight matrix ω^l whose entry in the jth row and kth column is ω_jk^l, and b_j^l is the bias of the jth neuron.

E. EVALUATION METRICS:
After building a machine learning profiling model, its performance should be evaluated with different accuracy measures. This paper uses different techniques (supervised and unsupervised), so their classification performance was measured using the measures in equations (5) to (11).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{6}$$
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{7}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{8}$$
$$\text{Recall} = \text{Sensitivity} = \frac{TP}{TP + FN} \tag{9}$$
$$\text{F-measure} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{10}$$
$$\text{G-mean} = \sqrt{\text{sensitivity} \cdot \text{specificity}} \tag{11}$$

where
• True Positive (TP): Observation is positive, and it is predicted to be positive.
• False Negative (FN): Observation is positive, but it is predicted negative.
• True Negative (TN): Observation is negative, and it is predicted to be negative.
• False Positive (FP): Observation is negative, but it is predicted positive.

IV. DATA SET:
The dataset ('default of credit card clients') was obtained from the UCI (University of California, Irvine) Machine Learning Repository [19]. It is a recently published dataset (obtained in 2015). The attributes are detailed in Table 1. The dataset contains 30,000 observations and 23 variables, and it has no missing data. All explanatory variables were normalized. Standardizing is a data pre-processing step applied to variables to scale them to a similar range. This research addresses the case of customers' default payments in Taiwan and compares the accuracy of customer profiling among four machine learning techniques. Among the four techniques, the artificial neural network is the only one that can accurately profile the dataset.

Table 1. Description of the attributes in the dataset

Attribute no. | Attribute name | Description
X1 | Limit_BAL | Amount of the given credit (NT dollar)
X2 | Sex | Gender (1 = male; 2 = female)
X3 | Education | Education (1 = graduate school; 2 = university; 3 = high school; 4 = others)
X4 | Marital status | Marital status (1 = married; 2 = single; 3 = others)
X5 | Age | Age (year)
X6-X11 | Pay_0 to Pay_6 | Past payment history (April to September)
X12-X17 | Bill_AMT1 to BILL_AMT6 | Amount of bill statement (NT dollar)
X18-X23 | Pay_AMT1 to PAY_AMT6 | Amount of previous payment (NT dollar)
X24 | Y | Default payment (Yes = 1, No = 0)

V. PROPOSED FRAMEWORK:
The main idea of our proposed model, shown in Figure 1, is to improve the profiling of bank customers' behavior using different machine learning techniques. The model starts with the dataset, which was obtained from the UCI Machine Learning Repository. The data then go through data preprocessing. After that, the machine learning techniques are applied to build the customer profile. In machine learning, the profiling phase recognizes the items in a group and places them

under target categories. In this paper, the accuracy of the unsupervised techniques is evaluated through the Gini coefficient; their results are then used as input for the supervised technique, the Artificial Neural Network (ANN), whose results are evaluated and compared to find the best technique.

FIGURE 1. The proposed model for profiling bank customers

1. Data preprocessing
Data preprocessing is the first important step in the data mining process. If the data contain much irrelevant and superfluous information, or are noisy and unreliable, analyzing data that has not been carefully checked for such problems can produce inaccurate results. The quality and representation of the data therefore come first, before the analysis is applied. Data preprocessing has often been the most important phase in our machine learning project. First, the normalization process is confirmed in the database. In most problems, normalizing the data first eliminates the units of measurement so that data from different sources can be compared easily. One of the most common ways to normalize data is re-scaling the data to have values between 0 and 1. This is usually called feature scaling. One possible formula to achieve this is [20]:

$$X_{new} = \frac{X - X_{min}}{X_{max} - X_{min}} \tag{12}$$

2. Classification using machine learning algorithms:
The result of data preprocessing is the final training set. The four machine learning techniques are then applied to it. The first technique applied is the k-means algorithm. The number of clusters is determined from the researcher's prior knowledge; in this paper, the number of clusters was set to three.
The second classifier, improved k-means, determines the number of clusters as five through the following steps [21]:
1. Use the intra-cluster distance measure, which is simply the distance between a point and its cluster center, averaged over all points:

$$intra = \frac{1}{N} \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - z_i \rVert^2 \tag{13}$$

where N is the number of data points (pixels, in the image-clustering setting of [21]), K is the number of clusters, and z_i is the cluster center of cluster C_i. We obviously want to minimize this measure.
2. Next, measure the inter-cluster distance, i.e. the distance between clusters, which should be as large as possible. Calculate it as the distance between cluster centers and take the minimum of these values:

$$inter = \min \left( \lVert z_i - z_j \rVert^2 \right), \quad i = 1, 2, \ldots, K-1, \quad j = i+1, \ldots, K \tag{14}$$

where z_i and z_j are cluster centers and K is the number of clusters.
3. Only the minimum of these values is taken, because we want the smallest of these distances to be maximized; the other, larger values will automatically be bigger than this value.
4. Finally, calculate the ratio of intra to inter, defined as the validity:

$$Validity = \frac{intra}{inter} \tag{15}$$

5. The clustering that gives the minimum value of the validity measure tells us the ideal value of K (the number of clusters).
The third classifier is fuzzy c-means, applied to the dataset with five clusters.

The next step is calculating the Gini coefficient for each of the three unsupervised algorithms to find the one with the best accuracy for profiling the dataset.
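
As an illustration of the preprocessing and cluster-count selection described in this section, the following Python sketch implements the min-max re-scaling of Eq. (12) and the validity measure of Eqs. (13) to (15). It is a minimal sketch under stated assumptions, not the authors' Matlab implementation: the function names `min_max_scale`, `sq_dist`, and `validity` are hypothetical, data points are plain lists of numbers, and constant columns are mapped to 0 (a choice the paper does not discuss).

```python
def min_max_scale(rows):
    """Rescale every column of `rows` to [0, 1] following Eq. (12)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    # Assumption: a constant column has span 0, so divide by 1 instead
    # and the whole column becomes 0.
    spans = [(max(col) - low) or 1.0 for col, low in zip(columns, lows)]
    return [
        [(value - low) / span for value, low, span in zip(row, lows, spans)]
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def validity(points, labels, centers):
    """Return intra / inter as in Eqs. (13)-(15); smaller is better."""
    # Eq. (13): mean squared distance from each point to its own center.
    intra = sum(
        sq_dist(x, centers[label]) for x, label in zip(points, labels)
    ) / len(points)
    # Eq. (14): smallest squared distance between any two centers.
    k = len(centers)
    inter = min(
        sq_dist(centers[i], centers[j])
        for i in range(k - 1)
        for j in range(i + 1, k)
    )
    # Eq. (15): ratio of the two measures.
    return intra / inter


# Toy data: two well-separated groups of two points each.
points = [[0, 0], [0, 2], [10, 0], [10, 2]]
labels = [0, 0, 1, 1]
centers = [[0, 1], [10, 1]]
score = validity(points, labels, centers)
scaled = min_max_scale([[1, 10], [3, 20], [5, 30]])
```

Following step 5, one would run the clustering for several candidate values of K, compute `validity` for each result, and keep the K with the smallest value.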

Finally, the ANN is applied. We take the results of the unsupervised techniques as the target for a neural network and obtain its accuracy. By taking the results of k-means, improved k-means, and fuzzy c-means as targets, we introduce a new label for the dataset. We then try each of them and evaluate its accuracy with seven accuracy measures. The classifier that best improves the profiling of bank customers is the one with the highest accuracy.

FIGURE 2. Proposed Model pseudo code steps

VI. EXPERIMENTS AND ANALYSIS

The experiment was run on the Matlab platform (R2015b) on a PC with an Intel(R) Core(TM) i7-2400 CPU @ 3.10 GHz and 6.00 GB of RAM, under a 64-bit Windows operating system.
A. Analysis and Comparison:
The results in Table 2 show the classification performance of the different unsupervised machine learning classifiers, with accuracy measured by the Gini coefficient. These results are taken as a new label for the dataset, in place of the old label, for the next step of the experiment, where the new label is used as the target for the artificial neural network algorithm.

TABLE 2. The results of Gini coefficient for unsupervised techniques

Machine learning technique | Best Gini obtained | Rank
Unsupervised (k-means) | 26.37% | 3
Unsupervised (improved k-means) | 37.61% | 1
Fuzzy C-means | 29.04% | 2

Table 2 gives the results of applying the three unsupervised techniques to the dataset and evaluating their performance with the Gini coefficient. It shows that improved k-means is the most accurate technique, at 37.61%.

Neural network evaluation:
In this phase, the result of the improved k-means clustering algorithm, which had the highest accuracy, is taken as the target for the neural network algorithm. The results in Table 3 show that the neural network had the best accuracy rate in classifying the dataset. We therefore achieved the aim of this experiment: to improve the profiling of bank customers' behavior by creating a new label with unsupervised machine learning techniques.

Figure 3 shows the variation in the gradient coefficient with respect to the number of epochs. As the figure shows, after epoch 170 the errors occurred 6 times and the test was stopped at epoch 176. The final value of the gradient coefficient at epoch 176 is 0.073403, which is close to zero. The smaller the value of the gradient coefficient, the better the training and testing of the network.

FIGURE 3. The training state plot of the proposed model.

Table 3 shows the accuracy measures obtained by applying the proposed ANN algorithm to the dataset in Matlab. It achieved a high accuracy by different measures: an accuracy rate of 98.08%, an F-measure of 95.19%, and a G-mean of 97.96%.

TABLE 3. The evaluation of the proposed neural network model.

Measure | Value
Accuracy rate | 0.9808
Sensitivity | 0.9777
Specificity | 0.9816
Precision | 0.9275
Recall | 0.9777
F-Measure | 0.9519
G-mean | 0.9796
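
The values in Table 3 can be cross-checked against the definitions in Eqs. (5) to (11). The Python sketch below is illustrative only: `binary_metrics` is a hypothetical helper for a two-class confusion matrix (the paper does not state how the measures are aggregated over the five clusters), and the check at the end uses only the precision, recall, and specificity reported in Table 3.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute the measures of Eqs. (5)-(11) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # Eq. (6), identical to recall, Eq. (9)
    specificity = tn / (tn + fp)          # Eq. (7)
    precision = tp / (tp + fp)            # Eq. (8)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),            # Eq. (5)
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        "f_measure": 2 * precision * sensitivity
        / (precision + sensitivity),                            # Eq. (10)
        "g_mean": math.sqrt(sensitivity * specificity),         # Eq. (11)
    }


# Cross-check of Table 3: recompute F-measure and G-mean from the
# reported precision, recall (= sensitivity), and specificity.
precision, recall, specificity = 0.9275, 0.9777, 0.9816
f_measure = 2 * precision * recall / (precision + recall)
g_mean = math.sqrt(recall * specificity)
print(round(f_measure, 4), round(g_mean, 4))  # prints 0.9519 0.9796
```

Both recomputed values agree with the F-measure and G-mean listed in Table 3, so the table is internally consistent with Eqs. (10) and (11).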

The confusion matrix shown in Fig. 4 is a table used to represent the performance of our classification model (the ANN classifier) on a set of test data for which the true values are known. The matrix visualizes the performance of the algorithm and makes it easy to identify confusion between classes. The performance measures are calculated from this confusion matrix.

FIGURE 4. The confusion matrix of the neural network classifier.

According to the confusion matrix, the neural network in Matlab achieves an accuracy rate of 98.08%.
The confusion matrix shows that there are five clusters with different numbers of customers. We can profile them as follows.

TABLE 4. The clusters resulting from the proposed neural network model.

Cluster | Cluster name | N. customer | Details
1 | Platinum | 5765 | Top class
2 | Golden | 5580 | 2nd rank
3 | Bronze | 5171 | 3rd rank
4 | Silver | 6832 | 4th rank
5 | Classic | 5858 | 5th rank

Table 4 shows the five clusters and the number of customers in each. Analysis of the results against the dataset showed that the Platinum cluster, with 5765 customers, is the best. It is followed by the Golden, Bronze, Silver, and Classic clusters, with 5580, 5171, 6832, and 5858 customers respectively.

Results of earlier research:
Table 5 compares our results with papers [22], [23], [24], and [25]; our proposed model achieved the best result in the accuracy measures. The earlier studies used the same dataset and the same technique (ANN).

TABLE 5. The results of earlier research:

Name of researcher | Date of publishing | Used technique | Accuracy rate
Shenghui Yang [22] | Sept. 2018 | Neural network | 88.83
Sharjeel Imtiaz [23] | 2017 | Neural network | 90.99
Furrakh Shahzad [24] | 2017 | Neural network (MLP) | 81.7
Vladislav Pyzhov [25] | May 2017 | Neural network | 81.1

VII. CONCLUSION AND FUTURE WORK:

Profiling has allowed banks to build an interactive relationship with customers based on humanistic experience and trust. Clustering techniques are used to divide large datasets into clusters. The proposed modification of k-means clustering removed its two major drawbacks: the accuracy level and the calculation time consumed in clustering the dataset. The profiling environment should be analyzed carefully to ensure effective and efficient segmentation of the bank's customer pool, which helps the bank design its service and product offering to win customer loyalty and satisfaction. By creating a new label as the target for the dataset, supervised machine learning produced more accurate profiling results than the unsupervised techniques. The artificial neural network showed the highest accuracy by seven different measures. Any bank can therefore use this model and technique in the future to improve the profiling of its customers, increase profitability, and reduce risk.
In future work, we will try to improve the effectiveness and performance of our proposed approach by applying some deep learning algorithms in medical informatics.

REFERENCES
[1] S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[2] M. Sharahi and M. Aligholi. "Classify the Data of Bank Customers Using Data Mining and Clustering Techniques." Journal of Applied Environmental and Biological Sciences, February 11, 2015.
[3] M. Ayoubi. "Customer segmentation based on CLV model and neural network." International Journal of Computer Science Issues (IJCSI) 13.2 (2016): 31.
[4] S. Palaniappan, A. Mustapha, et al. "Customer Profiling using Classification Approach for Bank Telemarketing." JOIV: International Journal on Informatics Visualization 1.4-2 (2017): 214-217.
[5] A. Bansal, M. Sharma, and S. Goel. "Improved k-means clustering algorithm for prediction analysis using classification technique in data mining." International Journal of Computer Applications 157.6 (2017): 0975-8887.
[6] P. S. Patil and N. V. Dharwadkar. "Analysis of banking data using machine learning." 2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics, and Cloud) (I-SMAC), Palladam, 2017, pp. 876-881.
[7] S. Yang and H. Zhang. "Comparison of Several Data Mining Methods in Credit Card Default Prediction." Intelligent Information Management 10.05 (2018): 115.

[8] N. H. Niloy and M. A. I. Navid. "Naïve Bayesian Classifier and Classification Trees for the Predictive Accuracy of Probability of Default Credit Card Clients." American Journal of Data Mining and Knowledge Discovery 3.1 (2018): 1.
[9] A. Arshad, S. Riaz, and L. Jiao. "Semi-Supervised Deep Fuzzy C-Mean Clustering for Imbalanced Multi-class Classification." IEEE Access (2019).
[10] Ahram Online. "Egypt's Banque Misr conditionally suspends card usage abroad amid currency crisis." Economy - Business - Ahram Online. [Online]. Available: http://english.ahram.org.eg/News/246079.aspx. [Accessed: 10-Apr-2019].
[11] N. M. El Agroudy, F. A. Shafiq, and S. Mokhtar. "The effect of the rise in the dollar rate on the Egyptian economy." Sciences 5.02 (2015): 509-514.
[12] H. Hassan and A. Jreisat. "Does bank efficiency matter? A case of Egypt." International Journal of Economics and Financial Issues 6.2 (2016): 473-478.
[13] T. Hafez. "IN DEPTH-The ups and downs of the Egyptian pound." AmCham. [Online]. Available: https://www.amcham.org.eg/publications/business-monthly/issues/256/April-2017/3568/the-ups-and-downs-of-the-egyptian-pound. [Accessed: 09-Apr-2019].
[14] T. Perraju. "Artificial intelligence and decision support systems." International Journal of Advanced Research in IT and Engineering 2.4 (2013): 17-26.
[15] M. Kaur and N. Kaur. "Adaptive K-Means Clustering Techniques For Data Clustering." International Journal of Innovative Research in Science, Engineering, and Technology (2014).
[16] J. Wang and X. Su. "An improved K-Means clustering algorithm." Communication Software and Networks (ICCSN), 2011 IEEE 3rd International Conference on. IEEE, 2011.
[17] F. Baser, S. Gokten, and P. O. Gokten. "Using fuzzy c-means clustering algorithm in financial health scoring." Audit Financiar 15.147 (2017): 385-394.
[18] S. Deb. "Application of Artificial Neural Networks (ANN)-In Designing SODEPUS (Study of Dynamic Earth Processes using Software)."
[19] Default of credit card clients Data Set, UCI Machine Learning Repository.
[20] B. K. Singh, K. Verma, and A. S. Thoke. "Investigations on impact of feature normalization techniques on classifier's performance in breast tumor classification." International Journal of Computer Applications 116.19 (2015).
[21] S. Ray and R. H. Turi. "Determination of number of clusters in k-means clustering and application in colour image segmentation." Proceedings of the 4th International Conference on Advances in Pattern Recognition and Digital Techniques. 1999.
[22] S. Yang and H. Zhang. "Comparison of Several Data Mining Methods in Credit Card Default Prediction." Intelligent Information Management 10.05 (2018): 115.
[23] S. Imtiaz and A. J. Brimicombe. "A Better Comparison Summary of Credit Scoring Classification." International Journal of Advanced Computer Science and Applications 8.7 (2017): 1-4.
[24] M. Pasha, et al. "Performance comparison of data mining algorithms for the predictive accuracy of credit card defaulters." Int. J. Comput. Sci. Netw. Secur 17.3 (2017): 178-183.
[25] V. Pyzhov and S. Pyzhov. "Comparison of methods of data mining techniques for the predictive accuracy." (2017).

Emad Abd Elaziz Dawood was born in Sharkia, Egypt, in 1989. He received a bachelor's degree in information systems from the science valley academy in 2010. He is a teaching assistant in the higher valley institute of information systems. He is the Head of the Youth Welfare Authority in the science valley academy. He is currently pursuing a master's degree in information systems with the Arab Academy for Science, Technology and Maritime Transport (AASTMT), Cairo, Egypt. He is a Research Scholar with the Department of Computing and Information Technology, AASTMT. His main research interests are data mining and machine learning.

Essamedean Elfakhrany received the B.S. and M.S. degrees from the Military Technical College (MTC), Cairo, Egypt, in 1986 and 1991, respectively, and the Ph.D. degree in System Engineering from The Ohio State University in December 1999. He is an Associate Professor in the Computer Science department at the Arab Academy for Science, Technology and Maritime Transport. His research interests include data science, ontological knowledge representation, semantic web, and IoT streaming data analytics. He is interested in teaching artificial intelligence, knowledge management, decision support systems, and theory of computation.

FAHIMA A. MAGHRABY received the B.S. degree in Computer Science from Ain Shams University, Cairo, Egypt, in 2003, the M.S. degree in Computer Science from Ain Shams University in 2008, and the Ph.D. degree in Computer Science from Ain Shams University in 2014. From 2004 to 2014, she was a Lecturer Assistant in the Institute of Computer Science, Shorouk Academy, Cairo, Egypt. Since 2014, she has been a lecturer in the Faculty of Computing and Information Technology, Arab Academy for Science, Technology and Maritime Transport (AASTMT), Cairo, Egypt. Her research interests include Bioinformatics, Image Processing, Artificial Intelligence, and Blockchain.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
