Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol II

IMECS 2018, March 14-16, 2018, Hong Kong

Application of Data Mining in Term Deposit Marketing

Q.R. Zhuang, Y.W. Yao, and O. Liu

Manuscript received January 1, 2018; revised February 1, 2018.
Q.R. Zhuang is with the International Business School Suzhou, Xi’an Jiaotong-Liverpool University, Suzhou, China (e-mail: [email protected]).
Y.W. Yao is with the Department of Accountancy, Hang Seng Management College, Hong Kong (e-mail: [email protected]).
O. Liu is with the International Business School Suzhou, Xi’an Jiaotong-Liverpool University, Suzhou, China (corresponding author, e-mail: [email protected]).

Abstract—Term deposits are facing challenges from both economic pressure and marketing competition. There are a number of valuable studies concerning bank and deposit marketing. These studies mention the significance of customers and customer segmentation in bank and deposit marketing. However, problems such as obsolete data, inadequate maps, and a lack of data and specific methods are encountered in the practical application of deposit market segmentation. This research adopts data mining techniques through SPSS Modeler to predict customers’ term deposit subscription behaviors and understand customers’ features, in order to improve the effectiveness and accuracy of bank marketing.

Index Terms—data mining, customer segmentation, term deposits, SPSS Modeler

I. INTRODUCTION

The banking industry is an important sector of the social economy. Bank sectors provide various products and services for clients. Deposits constitute one of the most traditional and fundamental operations of banks, and they are also a primary source of bank financing [1]. There are many types of deposit accounts; the major types include checking accounts, savings accounts, term deposit accounts and money market deposit accounts [2]. This study focuses on term deposit accounts, because they provide bank sectors with the most stable sources of credit and profit. However, the global financial crisis in 2008 raised people’s distrust of banks, and this suspicion caused deposits to shrink [3]. In addition, with the rapid development of the capital market, the emergence of a large number of financial intermediaries and financial instruments provides more investment channels and opportunities for residents. Both economic pressure and marketing competition drive bank sectors to improve the effectiveness of marketing campaigns.

There are two typical kinds of marketing campaign that companies use to promote services and/or products: mass campaigns and direct marketing [4]. Mass campaigns aim at the general, indiscriminate public, whereas direct marketing campaigns target a specific group. According to a study implemented by [5], positive responses to mass campaigns are less than 1%; conversely, direct marketing campaigns are more effective. As a result, this research will mainly concentrate on direct marketing campaigns for term deposit accounts.

Nevertheless, direct marketing might cause negative attitudes toward banks due to the intrusion of privacy. Therefore, pinpointing the target customer groups is the most important marketing strategy when adopting direct marketing campaigns. This study will predict customers’ term deposit subscription behaviors and understand customers’ features to improve the effectiveness and accuracy of bank marketing.

II. LITERATURE REVIEW

There are a number of valuable studies concerning bank and deposit marketing. Different recommendations are put forward from different marketing aspects, based on qualitative methods or quantitative analysis. Data mining techniques have been widely applied in bank marketing as well. Wu proposed that association rules can be applied to the cross-selling of bank products and to customer risk control [6]. However, many studies just compare the performance of different classification algorithms in predicting the success rate of bank marketing campaigns. For example, Moro, Cortez and Laureano used the rminer package and the R tool to test three classification models (Decision Trees, Naïve Bayes and Support Vector Machines) and compared their performance through Receiver Operating Characteristic (ROC) curve and Lift curve analysis [3]. Similarly, Moro, Cortez and Rita tested four data mining models: logistic regression, decision trees (DT), neural network (NN) and support vector machine [7]. After evaluating the area under the receiver operating characteristic curve (AUC) and the area under the LIFT cumulative curve (ALIFT), the neural network presented the best performance. Nachev combined cross-validation and multiple runs to partition the data set into train and test sets [8]. He also explored the impact of different neural network designs on performance.

All of the research above focuses on predicting customers’ behaviors resulting from bank marketing. In order to avoid marketing campaigns being annoying rather than attractive, the right promotional messages should be delivered to the right customer groups. As early as 1974, Robert put forward the idea of using census data in bank marketing [9]. He mentioned that census data can be applied to location analysis and marketing segmentation. Wang, Song and Fang

ISBN: 978-988-14048-8-6 IMECS 2018


ISSN: 2078-0958 (Print); ISSN: 2078-0966 (Online)
mentioned that the banking industry lacks scientific marketing management and that banks generally adopt traditional marketing methods, including relationship marketing (using employees’ personal relationships to find deposit clients), self-interest marketing (obtaining deposits by satisfying clients’ individual interests, such as gifts), passive marketing (attracting customers to increase deposits by offering warm and thoughtful counter service) and simple service marketing (attracting deposits by meeting customers’ low-level requirements, such as providing door-to-door services) [10]. They proposed that segmenting the deposit market and selecting marketing targets is the scientific way of marketing management. However, problems such as obsolete data, inadequate maps, and a lack of data and specific methods are encountered in the practical application of deposit market segmentation.

This study will adopt data mining techniques to predict customers’ term deposit subscription behaviors and understand customers’ features to improve the effectiveness and accuracy of bank marketing. In order to achieve this objective, the following questions will be addressed.
I. How to predict whether a bank client will subscribe to a term deposit or not?
II. Which determinants would indicate a client is ready to subscribe to a term deposit through direct marketing?
III. How to segment the term deposit market?
IV. Are there any common features of clients who have subscribed to a term deposit?

III. METHODOLOGY

In this research, classification models and clustering models will be built through SPSS Modeler. IBM SPSS Modeler includes a number of machine learning algorithms and modeling techniques for solving different types of problems.

Classification algorithms are used to establish a predictive model by learning and discovering the relationship between a set of feature variables and a target variable. Classification algorithms typically contain two phases [11]. In the first phase, models are constructed from the training instances. In the second phase, unlabelled testing instances are predicted and assigned through the model established in the training phase. Several indicators are typically used to evaluate the performance of a binary classifier. For example, accuracy is the proportion of outcomes that are predicted correctly. Moreover, AUC is the area under the ROC (Receiver Operating Characteristic) curve; it can be read as the probability that the classifier ranks a randomly chosen positive instance above a randomly chosen negative one [12]. Furthermore, the Gini coefficient is related to AUC by Gini = 2*AUC - 1. A Gini coefficient above 60% corresponds to a good classification model.

Clustering algorithms are applied to customer segmentation. Clustering, which is an unsupervised learning scheme, divides instances into natural groups [13]. Instances with strong resemblance will be in the same cluster. There are different types of clustering algorithms, including partitioning approaches, hierarchical methods, density-based methods, grid-based methods, model-based methods, etc. The quality of clustering algorithms can be evaluated by the average silhouette coefficient of all instances [14]. A higher silhouette coefficient indicates that the instances are better matched to their own clusters.

IV. DATA UNDERSTANDING

A secondary dataset related to direct marketing campaigns on term deposit accounts of a Portuguese banking institution is obtained from the Internet [15]. The dataset contains 41188 observations and 21 variables. The detailed attribute information is shown in Table I.

TABLE I
ATTRIBUTE INFORMATION

Name             Data type     Description

Bank Client Data
age              numeric       age
job              categorical   type of job
marital          categorical   marital status
education        categorical   education background
default          categorical   has credit in default?
housing          categorical   has housing loan?
loan             categorical   has personal loan?

Contact/Campaign Data
contact          categorical   contact communication type
month            categorical   last contact month of year
day_of_week      categorical   last contact day of the week
duration         numeric       last contact duration, in seconds
campaign         numeric       number of contacts performed during this campaign and for this client
pdays            numeric       number of days that passed by after the client was last contacted from a previous campaign
previous         numeric       number of contacts performed before this campaign and for this client
poutcome         categorical   outcome of the previous marketing campaign

Social and Economic Context Attributes
emp.var.rate     numeric       employment variation rate - quarterly indicator
cons.price.idx   numeric       consumer price index - monthly indicator
cons.conf.idx    numeric       consumer confidence index - monthly indicator
euribor3m        numeric       euribor 3 month rate - daily indicator
nr.employed      numeric       number of employees - quarterly indicator

Output Variable
y                binary        has the client subscribed a term deposit?

V. MODELING

Classification and clustering models are established on the processed data.

A. Classification

Classification algorithms are used to establish a predictive model of whether a client will subscribe to a term deposit or not. The Auto Classifier node of SPSS Modeler automatically creates and compares multiple classification models. Among them, the C5.0 model shows the best performance, with the highest accuracy.
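
The evaluation indicators described in the methodology (accuracy, AUC, and Gini = 2*AUC - 1) can be sketched in a few lines of Python. This is only a minimal illustration, not the computation performed by SPSS Modeler in this study; the function names and the toy labels and scores are assumptions made up for the example. AUC is computed here directly from its probabilistic reading: the share of positive/negative pairs in which the positive instance receives the higher score.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of instances whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative score counts as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(auc_value: float) -> float:
    """Gini coefficient derived from AUC: Gini = 2 * AUC - 1."""
    return 2.0 * auc_value - 1.0


# Made-up labels (1 = subscribed) and model scores, for illustration only.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8, 0.3, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

acc = accuracy(y_true, y_pred)
auc_value = auc(y_true, scores)
gini_value = gini(auc_value)
```

On this toy data, 14 of the 15 positive/negative pairs are ranked correctly, so the AUC is 14/15 and the Gini coefficient is 13/15, which would clear the 60% threshold mentioned in the methodology. The pairwise loop is quadratic in the number of instances; a rank-based formulation would be preferable for a dataset of the size used in the paper.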


Therefore, a boosted C5.0 model is built to further improve the performance of the C5.0 model. Figure 1 shows that boosting improves the accuracy of the model to 97%. In addition, both the AUC and the Gini coefficient indicate that the boosted C5.0 classifier generates more accurate classification results and a better predictive model.

Fig. 1. Boosted C5.0 Model

In this boosted C5.0 predictive model, contact duration (‘duration’) is the most important predictor. Among the rules generated by the boosted C5.0, one predicts that if the contact duration is no more than 77 seconds, the client will not subscribe to a term deposit.

Fig. 2. Significant Predictor

B. Clustering

Clustering algorithms are applied to segment clients who have subscribed to a term deposit. Therefore, the dataset is filtered by the condition y = yes, which leaves 3859 instances. In order to discover and understand customers’ behaviors and characteristics, the social and economic context attributes and the output variable are not used to generate clusters, because according to the classification results, economic context attributes have less impact on clients’ deposit subscription behavior.

Firstly, the Auto Cluster node is attached to compare different clustering models. Two clusters are automatically created by the TwoStep model, which has the highest silhouette coefficient (0.389). Significant differences between the two clusters are displayed in Table II.

TABLE II
DISTINCT ATTRIBUTES OF TWO CLUSTERS

Attribute   New Customer (Cluster 1)         Regular Customer (Cluster 2)
pdays       999 (not previously contacted)   281.96
poutcome    nonexistent                      success
previous    0.05                             1.58
duration    long (598.74)                    short (370.56)
campaign    more (2.16)                      less (1.72)
contact     telephone                        cellular

Clients in the first cluster were not previously contacted, and essentially no earlier marketing campaign was offered to them. Clients in the second cluster were in the opposite situation. Therefore, the TwoStep model segments clients into two clusters: new customers and regular customers. The duration of contact with new customers is much longer than with regular customers. Meanwhile, more contacts are performed for new customers during the marketing campaign, with an average of 2.16. Furthermore, the usual communication type for new customers is telephone, while the commonly used communication type for regular customers is cellular.

However, there seems to be no significant difference between the two clusters when comparing the bank client attributes. As a result, clustering techniques are re-applied to the bank client data alone, to focus only on the customers’ characteristics. Five clusters are generated by the K-means clustering algorithm, with the highest silhouette of 0.37. The differences between the five clusters can be summarized as the distinct attributes shown in Table III.

TABLE III
DISTINCT ATTRIBUTES OF FIVE CLUSTERS

Clusters    Distinct Attributes
Cluster 1   No housing loan; Married or divorced; Age: 43
Cluster 2   4-year basic education background; Retired, housemaid; Divorced or married; Age: 66
Cluster 3   Single; No housing loan; Student; Age: 30
Cluster 4   Single or divorced; Housing loan; Administration; Age: 33
Cluster 5   Married; Housing loan; Entrepreneur, management; Illiterate; Age: 40

In the first cluster, people have stable lives and incomes and no housing-loan pressure. They may want to subscribe to a term deposit to prepare in advance for their retirement or for their children. In the second cluster, people are mainly retired or housemaids. Their jobs, ages and education background may lead to a habit of risk-averse investment and a preference for saving in banks. In the third cluster, the typical group is students who are about to step into society; they may start to subscribe to a deposit to prepare for their future life. In the fourth cluster, the major group is young people who have jobs as well as housing loans. The pressure of life may drive them to choose a more secure way to invest, which is saving in banks. In the fifth cluster, people are generally not well educated and have become entrepreneurs or managers in middle age. People in this cluster may have experienced challenges and understand that life is not easy. They may like subscribing to a term deposit to safeguard their gains.

VI. CONCLUSION

To conclude, term deposits of bank sectors are facing challenges from both economic pressure and marketing competition. This study adopts data mining techniques to predict customers’ term deposit subscription behaviors and understand customers’ features to improve the effectiveness and accuracy of bank marketing.

The results generated by the classification and clustering algorithms have practical meaning for the objectives of this research. Some feasible suggestions are put forward as follows. Firstly, marketing staff should be patient when implementing direct marketing,


especially telemarketing for new customers. The contact duration has a significant impact on the success rate of telemarketing. Secondly, the number of contacts performed during the marketing campaign should be controlled. It is better to keep the number of contacts below 3; otherwise, overly frequent contacts may cause aversion. Thirdly, it is better to call customers’ telephone numbers (such as an office number) rather than their cellular phones, to avoid the feeling of intrusion of privacy, especially when telemarketing to new customers. Fourthly, bank sectors can launch targeted marketing campaigns to attract specific customers in accordance with the results of clustering, such as a children’s growth deposit scheme, a retirement term deposit scheme, a student term deposit scheme, a housing loan deposit scheme, etc.
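
As a supplementary illustration of the cluster-quality measure used in Section V.B, the average silhouette coefficient can be sketched in plain Python. This is a hedged sketch under stated assumptions, not the SPSS Modeler implementation: the function names are hypothetical, Euclidean distance is assumed, and the four one-dimensional "age" values are invented for the example rather than taken from the dataset.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(points: List[Sequence[float]], labels: List[int]) -> float:
    """Average silhouette coefficient over all instances.

    For instance i, a is the mean distance to the other members of its own
    cluster, b is the smallest mean distance to the members of any other
    cluster, and s = (b - a) / max(a, b). Instances in singleton clusters
    contribute s = 0.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [
            euclidean(p, q)
            for j, (q, lab) in enumerate(zip(points, labels))
            if lab == own and j != i
        ]
        if not same:
            continue  # singleton cluster: s = 0
        a = sum(same) / len(same)
        b = min(
            sum(euclidean(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in clusters
            if other != own
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Invented ages, for illustration only: two young and two older clients.
ages = [[30.0], [31.0], [65.0], [66.0]]
good = mean_silhouette(ages, [0, 0, 1, 1])  # groups that match the data
bad = mean_silhouette(ages, [0, 1, 0, 1])   # groups that mix young and old
```

The grouping that separates the young from the older clients scores close to 1, while the mixed grouping scores below 0, which matches the reading that a higher silhouette coefficient means instances are better matched to their own clusters. Comparing candidate clusterings this way is the kind of selection the paper reports (0.389 for two clusters, 0.37 for five).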

REFERENCES
[1] Khir, K., Gupta, L., & Shanmugam, B. (2008) ‘Islamic banking: A
practical perspective’.
[2] Islam, M.A. and Ghosh, P. (2014) ‘A comparative analysis of deposit
products in banking industry: an opportunity for eastern bank Ltd.’,
Journal of Investment and Management, 3(1), January, pp.7-20.
[3] Moro, S., Cortez, P. & Laureano, R. (2013) A data mining approach for
bank telemarketing using the rminer package and r tool [Online].
Available from:
https://www.researchgate.net/publication/256464440_A_data_mining
_approach_for_bank_telemarketing_using_the_rminer_package_and_
r_tool?enrichId=rgreq-ef9c19b19ab77f6e64e62c02ff6bdc5c-XXX&e
nrichSource=Y292ZXJQYWdlOzI1NjQ2NDQ0MDtBUzoxMTkyMz
AyMzgzMDIyMTFAMTQwNTQzODExMjgxNQ%3D%3D&el=1_x
_2&_esc=publicationCoverPdf (Accessed: 4 September 2017).
[4] Ling, X. and Li, C. (1998) ‘Data Mining for Direct Marketing:
Problems and Solutions’. Proceedings of the 4th KDD conference,
AAAI Press, pp.73–79.
[5] Ou, C., Liu, C., Huang, J. & Zhong, N. (2003) ‘On Data Mining for
Direct Marketing’. Proceedings of the 9th RSFDGrC conference, 2639,
pp.491–498.
[6] Wu, Q.H. (2008) ‘Some Issues with Applying Association Rules in
Commercial Bank’, Journal of System Simulation, 20 (8), April,
pp.2206-2209.
[7] Moro, S., Cortez, P. & Rita, P. (2014) A Data-Driven Approach to
Predict the Success of Bank Telemarketing [Online]. Available from:
https://pdfs.semanticscholar.org/4a27/709545cfa225d8983fb4df8061f
b205b9116.pdf (Accessed: 14 September 2017).
[8] Nachev, A. (2015) Application of data mining techniques for direct
marketing [Online]. Available from:
http://www.foibg.com/ibs_isc/ibs-30/ibs-30-p09.pdf (Accessed: 14
September 2017).
[9] Predue, R.T. (1974) ‘SOME TYPICAL USES OF CENSUS DATA IN
BANK MARKETING RESEARCH’, Review of Public Data Use,
2(2), pp.31-36.
[10] Wang, B.Z., Song, J.L., & Fang, C. (2002) ‘Opinions on Deposit
Marketing of Commercial Banks’, Financial Theory and Practice, 2002
(9), August, pp.32-33.
[11] Aggarwal, C.C. (2015) ‘Data Classification Algorithms and
Applications’, CRC Press, EBSCOhost [Online]. Available from:
http://10.7.1.204:81/read.php?resid=99673992 (Accessed: 25
November 2017).
[12] Fawcett, T. (2006) ‘An introduction to ROC analysis’, Pattern
recognition letters, 27(8), pp. 861-874.
[13] Witten, I., Frank, E. & Hall, M.A. (2011) Data Mining – Practical
Machine Learning Tools and Techniques. Burlington: Elsevier.
[14] Chen, X. and Li, Z. (2013) ‘Effectiveness Analysis of The Application
of Clustering in Student Grouping’, International Conference on
Education Technology and Information System, Atlantis Press,
pp.988-991.
[15] UCI Machine Learning Repository (2014). Bank Marketing Data Set
[Online]. Available from:
http://archive.ics.uci.edu/ml/datasets/Bank+Marketing (Accessed: 4
September 2017).
