Research On A Customer Churn Combination Prediction Model Based On Decision Tree and Neural Network
Abstract—Customer churn is a prominent issue facing companies. Preventing churn and retaining customers has become an important issue for business operations and development. Most current customer churn predictions use a single prediction model, which makes it difficult to predict customer churn accurately. Based on the prediction results and confidence of a decision tree model and a neural network model, this paper designs a combined prediction model of customer churn and conducts empirical research on the effectiveness of the model. The prediction results show that, compared with a single customer churn prediction model, the combined prediction model has higher accuracy and a better prediction effect, and can display the basic characteristics of churn customers more intuitively.

Keywords-customer churn; decision tree; prediction model; neural network

I. INTRODUCTION

Since China's entry into the WTO, many industries have opened to the outside world, which has led to fiercer market competition for enterprises. This intensified competition has made the problem of customer churn faced by enterprises more and more serious. Generally, customer churn refers to existing customers of an enterprise who no longer purchase the company's products or use its services. Different industries have different definitions of customer churn [1]. Generally speaking, it can be divided into two categories: active customer churn and passive customer churn. Excessive customer churn usually has a significant impact on a company's performance. How to retain existing customers while developing new ones has become a subject that the relevant staff of a company have to study [2].

Data mining technology is the most commonly used method to predict customer churn. It uses decision trees, neural networks, classification and regression trees and other techniques to build a model for predicting customer churn. Through the model, information useful for predicting customer loss can be extracted from a large amount of customer data, so that enterprises can formulate relevant customer work plans based on this information [3]. At present, domestic and foreign customer churn prediction algorithms include predictions based on traditional statistics and predictions based on combined classifiers [4]. Based on machine learning methods and statistical theory, Y. Hang et al. [5] used indicators correlated with customers' demographic statistics to predict churn customers. Based on the transaction time of retail customers, Miguéis et al. [6] established a predictive model based on logistic regression. Yin Ting et al. [7] combined the prior information method of Bayesian classification with the information entropy gain method of decision tree classification and applied it to the analysis of customer churn in the telecommunications industry. Du Gang and Huang Zhenyu [8] used an improved decision tree model to predict customer purchase behavior, and compared the results before and after optimization to verify the effectiveness and efficiency of the improved algorithm. Cui Yongzhe [9] used the C4.5 decision tree algorithm to establish a churn early warning model for telecommunications customers.

However, a traditional parametric model or a single artificial intelligence-based method cannot achieve relatively high-precision prediction, so establishing a combined prediction model to improve prediction accuracy is an inevitable trend in solving the problem of customer churn [10]. On this basis, this paper designs a combined prediction model based on two models, the C5.0 decision tree and the neural network. The confidence of the two prediction models is used as the weight, and the weighted score is used as the customer churn probability. The combined prediction model is used to predict customer churn in a supermarket. By comparing the prediction accuracy of the three models, the validity of the combined prediction model is verified.
Authorized licensed use limited to: UNIVERSIDADE FEDERAL DO CEARA. Downloaded on April 03,2023 at 17:25:19 UTC from IEEE Xplore. Restrictions apply.
II. CUSTOMER CHURN PREDICTION MODEL

A. Decision Tree Customer Churn Prediction Model

Based on information gain theory, the decision tree is one of the most widely used data classification algorithms at present. The decision tree structure contains several nodes and branches, where nodes represent tests on a certain attribute and branches represent the results of the tests. Common decision tree algorithms include ID3, C5.0, etc., which are mainly used for predictive analysis of events. The decision tree prediction process is performed in two steps: one is to build and evolve a decision tree using the training set; the other is to test the attribute values at each node, classify the input data, and use the attribute values of this class to complete the estimation of the prediction object [11].

This article adopts the C5.0 decision tree method, which is widely used in the industry for its tolerance of noisy data and strong interpretability, to build a customer churn model. The C5.0 model uses the field with the largest information gain to split the sample. Each sample subset obtained from the first split is usually split again according to another attribute. This is repeated until the sample subsets can no longer be split. Finally, the sample subsets that do not contribute significantly to the model are removed or pruned.

The model uses the bootstrap method to improve the accuracy of the C5.0 algorithm. The entire model set is used for sample classification, and the individual predictions are integrated into a comprehensive prediction through a weighted voting process. When the stream containing the node is executed, two new fields are added: the predicted value of each record and its confidence.

B. Neural Network Customer Churn Prediction Model

As a data analysis approach that simulates human brain thinking, neural networks are based on massively parallel data processing and computation, and are used to describe cognition, decision-making and other intelligent control behaviors. The model structure of a typical neural network includes an input layer, a hidden layer, and an output layer, which are connected by several neurons. The BP neural network is the most widely used neural network algorithm, and its output expression [12] is:

$$H = f_i\left(\sum \omega_{ij} x_i - \theta_j\right)$$

where $\omega_{ij}$ is the connection weight coefficient, $f_i$ is the excitation function, $\theta_j$ is the threshold of the neuron, and $x_i$ is the input of the neuron. The BP neural network is trained using a supervised (teacher-guided) learning method and can implement any complex non-linear mapping function. The training process is based on the principle of minimum output error, and the connection weight coefficients and thresholds are modified layer by layer.

This paper selects the customer-related characteristic attributes as the input of the neural network, and the output is whether the customer is lost. To facilitate the effective evaluation of the model later, 2/3 of the data is randomly selected as the training set and 1/3 as the test set. By default, every time a neural network node is executed, a brand new network is created. But when the "Continue to train the existing model" option is selected, training continues with the network successfully generated by the previous node. When a stream containing the generated neural network model is executed, a new field is added to the stream for each output field in the original training data. This new field contains the network prediction for the corresponding output field. For a symbolic output field, another new field is added that contains the confidence of the prediction.

C. Combined Customer Churn Prediction Model

The so-called combined prediction model takes the confidence of the decision tree and the neural network prediction models as the weights and their prediction results as the variables, recalculates the probability of customer churn by weighting, and gives the prediction result of customer churn based on that probability. The formula for calculating the probability of customer churn is as follows:

$$P = \frac{1}{2}\left(aX_1 + bX_2\right) \tag{1}$$

where $a$ and $b$ are the confidence of each sample record in the decision tree and neural network customer churn predictions respectively, and $X_1$ and $X_2$ are the prediction results of customer churn in the decision tree and neural network models respectively. The value of $X_1$ and $X_2$ is either 0 or 1. If the customer is lost, the value is 1; if the customer is not lost, the value is 0.

According to formula (1), if the prediction results of both models are not churn, that is, the values of $X_1$ and $X_2$ are both 0, then the customer churn probability $P$ of the combined prediction is also 0, that is, the customer is predicted not to churn. If the prediction results of both models are churn, that is, the values of $X_1$ and $X_2$ are both 1, and the accuracy of the predictions of both models is above 90%, then the customer churn probability $P$ of the combined prediction will be greater than 0.5, that is, the customer is predicted to churn. If the prediction result of one model is churn and the prediction result of the other model is not churn, then the churn probability $P$ of the combined prediction will be less than 0.5. In that case, if the confidence of the churn prediction is greater than 0.8 (so that $P$ exceeds 0.4), indicating that the model predicts the customer churn with a greater probability, then the combined model gives the probability of customer churn. If the confidence of the churn prediction is less than 0.8, the possibility of customer churn is considered small, and the combined model predicts that the customer will not churn.

It can be seen that the biggest advantage of the combined customer churn prediction model is that it can integrate the results of the two models to clearly distinguish between churn customers and non-churn customers, and for customers in between, it gives the probability of churn. This can help
marketers to more accurately identify the churn status of customers with different values, and select customers with different churn probabilities for marketing purposes.

III. EMPIRICAL ANALYSIS OF THE COMBINED PREDICTION MODEL

A. Data Collection and Selection of Attributes

... customer purchases is increasing and the trend of purchases is 0.112.

TABLE I. CHARACTERISTIC ATTRIBUTES OF A MEMBER CUSTOMER

| Member card number | Membership card level | Age | Non-shopping points | Average shopping interval | Unit shopping points | Trend value of purchases |
|---|---|---|---|---|---|---|
| 1800193 | Ordinary member | 34 | 0 | 41.6 | 0.504 | 0.112 |
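As a hedged illustration of formula (1) from Section II.C, the weighted combination of the two model outputs can be sketched in Python. The names `ModelOutput`, `churn_probability` and `combined_decision` are hypothetical and not taken from the paper, whose own implementation is not given; the cut-offs 0.4 and 0.5 are the ones stated in this section, and the three sample rows are those listed in Table IV.

```python
from dataclasses import dataclass


@dataclass
class ModelOutput:
    """Output of one base model for one customer (hypothetical container)."""
    prediction: int    # X in formula (1): 1 = churn, 0 = not churn
    confidence: float  # a or b in formula (1), between 0 and 1


def churn_probability(tree: ModelOutput, net: ModelOutput) -> float:
    """Formula (1): P = (a * X1 + b * X2) / 2."""
    return 0.5 * (tree.confidence * tree.prediction
                  + net.confidence * net.prediction)


def combined_decision(p: float, low: float = 0.4, high: float = 0.5) -> str:
    """Map P to a label using the cut-offs stated in the text.

    P <= low  -> "not churn"
    P >= high -> "churn"
    otherwise -> the probability itself, formatted to four decimals
    """
    if p <= low:
        return "not churn"
    if p >= high:
        return "churn"
    return f"{p:.4f}"


# Rows of Table IV: (card number, X1, a, X2, b)
sample_rows = [
    ("1800200", 0, 0.912, 0, 0.878),
    ("1800371", 1, 0.721, 0, 0.543),
    ("1800433", 1, 0.823, 0, 0.521),
]

for card, x1, a, x2, b in sample_rows:
    p = churn_probability(ModelOutput(x1, a), ModelOutput(x2, b))
    print(card, f"{p:.4f}", combined_decision(p))
```

Under these assumptions the three rows give probabilities of 0.0000, 0.3605 and 0.4115, matching the values reported in Table IV, with the third customer falling in the intermediate band where only the probability is reported.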
Then the prediction results and confidence of the decision tree and neural network customer churn models are summarized, and the prediction results are converted: not churn is recorded as 0, churn is recorded as 1. Substituting these into formula (1) for the weighted calculation gives a customer churn probability $P$ between 0 and 0.99. When $P$ does not exceed 0.4, the prediction result of the combined prediction model is that the customer will not churn. When $P$ is not less than 0.5, the prediction result of the combined prediction model is that the customer will churn. When $P$ is between 0.4 and 0.5, the combined prediction model gives the probability value of the customer churn. Table IV shows the churn prediction results for some member customers.

TABLE IV. PREDICTION RESULTS OF CHURN OF SOME MEMBER CUSTOMERS

| Member card number | Decision tree prediction results | Decision tree confidence | Neural network prediction results | Neural network confidence | Customer churn probability | Whether churn |
|---|---|---|---|---|---|---|
| 1800200 | 0 | 0.912 | 0 | 0.878 | 0.0000 | not churn |
| 1800371 | 1 | 0.721 | 0 | 0.543 | 0.3605 | not churn |
| 1800433 | 1 | 0.823 | 0 | 0.521 | 0.4115 | 0.4115 |

D. Model Evaluation

The evaluation index for the quality of the customer churn prediction model is the accuracy rate, that is, the ratio of the number of correctly predicted customers to the total number of customers. The combined churn model is used to predict member customers, and the results are shown in Table V. Of the 2681 customers, 21 have a churn probability between 0.4 and 0.5. Excluding these 21 customers, there are 2660 customers left. Among these 2660 customer churn predictions, there are 2630 correct predictions and 30 incorrect predictions. The accuracy of the model prediction is as high as 98.87%.

TABLE V. PREDICTION RESULTS OF THE COMBINED PREDICTION MODEL

| | Predicted churn customers | Predicted non-churn customers | Customers with a churn probability in (0.4, 0.5) |
|---|---|---|---|
| Actual churn customers | 640 | 17 | 12 |
| Actual non-churn customers | 13 | 1990 | 9 |

TABLE VI. COMPARISON OF PREDICTION ACCURACY OF THE THREE MODELS

| Prediction model | Prediction accuracy |
|---|---|
| Decision tree customer churn prediction model | 93.47% |
| Neural network customer churn prediction model | 96.42% |
| Combined customer churn prediction model | 98.87% |

To further analyze the prediction accuracy of the different model algorithms, the prediction results of the decision tree churn model, the neural network churn model and the combined churn model are compared, as shown in Table VI. The results show that the accuracy of customer churn prediction using the combined model is higher than that of the decision tree and neural network customer churn prediction models. This is mainly because the combined model makes full use of the advantages of each single prediction model and improves the accuracy of the prediction. Enterprises can formulate corresponding countermeasures to avoid customer churn based on the prediction results.

IV. CONCLUSION

To address the problem that a single model has difficulty achieving high-precision customer churn prediction, this paper uses the prediction results and confidence of a decision tree prediction model and a neural network prediction model to build a combined customer churn prediction model. The empirical results show that the combined prediction model offers both the good interpretability of a decision tree model and the higher prediction accuracy of a neural network model, which better makes up for the shortcomings of a single prediction model and yields more stable and accurate prediction results.

ACKNOWLEDGMENT

Grateful acknowledgement is made to Ms. Shi Ziyan, who provided me with the data, and Mr. Fang Chensheng, who gave me considerable help by means of suggestions, comments and criticism. Without their help, the completion of this article would have been impossible. In addition, I deeply appreciate the contributions to this thesis made in various ways by my friends and colleagues.

REFERENCES

[1] Chen Mingliang. Discussion on the framework of the basic theoretical system of customer relationship management[J]. Journal of Management Engineering, 2006, 20(4): 36-41.
[2] Li Yang. Neural network-based customer churn data mining prediction model[J]. Journal of Computer Applications, 2006, 33(S1): 48-51.
[3] Zeng Yaohui. Application of Data Mining in Customer Relationship Management of Communication Industry[J]. Telecommunication Engineering Technology and Standardization, 2006, 19(7): 63-66.
[4] Yu Xiaobin, Cao Jie, Kong Zaiwu. A review on customer churn[J]. Computer Integrated Manufacturing System, 2012(10).
[5] Zhang X, Zhu J, Xu S, et al. Predicting Customer Churn through Interpersonal Influence[J]. Knowledge-Based Systems, 2012, 28(6): 97-104.
[6] Miguéis V L, Van Den Poel D, Camanho A S, et al. Modeling Partial Customer Churn: On the Value of First Product-category Purchase Sequences[J]. Expert Systems with Applications, 2012, 39(12): 11250-11256.
[7] Yi Ting, Ma Jun, Qin Xizhong, et al. Application of Bayes Decision Tree in Prediction of Customer Churn[J]. Computer Engineering and Applications, 2014(7).
[8] Du Gang, Huang Zhenyu. Prediction of Customer Buying Behavior in Big Data Environment[J]. Management Modernization, 2015(1).
[9] Cui Yongzhe. Application of data mining technology in early warning of customer churn[J]. Journal of Yanbian University: Natural Science Edition, 2008, 34(2).
[10] Chen Chen. Analysis of Telecom Customer Churn Forecast Based on Combination Forecast[D]. Hunan University, 2011.
[11] Yu Lu. A Combined Prediction Model of Telecommunications Customer Churn[J]. Journal of Huaqiao University (Natural Science Edition), 2016, 37(5): 637-640.
[12] Li Aiqun, Qiao Yan, Wang Ruchuan, et al. Analysis of Telecom Customer Churn Based on Distributed Hybrid Data Mining[J]. Computer Technology and Development, 2010, 20(10): 43-46.