A Neuro-Fuzzy Classifier for Customer Churn Prediction
Hossein Abbasimehr, Mostafa Setak, M. J. Tarokh
ABSTRACT
Churn prediction is a useful tool for identifying customers at risk of churning. By accurately predicting churners and non-churners, a company can use its limited marketing resources efficiently to target likely churners in a retention marketing campaign. Accuracy is not the only important aspect in evaluating churn prediction models: such models should be both accurate and comprehensible. Therefore, the Adaptive Neuro-Fuzzy Inference System (ANFIS), a neuro-fuzzy classifier, is applied to churn prediction modeling and benchmarked against traditional rule-based classifiers such as C4.5 and RIPPER. In this paper, we build two ANFIS models: ANFIS-Subtractive (a fuzzy inference system (FIS) based on subtractive clustering) and ANFIS-FCM (an FIS based on fuzzy C-means (FCM) clustering). The results show that both ANFIS-Subtractive and ANFIS-FCM have acceptable performance in terms of accuracy, specificity, and sensitivity. In addition, ANFIS-Subtractive and ANFIS-FCM induce far fewer rules than C4.5 and RIPPER, which makes them the most comprehensible techniques tested in the experiments. These results indicate that ANFIS offers acceptable performance in terms of both accuracy and comprehensibility and is an appropriate choice for churn prediction applications.
General Terms
Data Mining & Churn
Keywords
Churn prediction, Data mining, ANFIS, Fuzzy C-means, Subtractive clustering
1. INTRODUCTION
In recent years, due to saturated markets and a competitive business environment, customer churn has become a focal concern of most firms in all industries. Neslin et al. [1] defined customer churn as the tendency of customers to stop doing business with a company in a given time period. Churn prediction is a useful tool for identifying customers at risk of churning. Technically speaking, the purpose of churn prediction is to classify customers into two types: customers who churn (leave the company) and customers who continue doing business with the company [2]. By accurately predicting churners and non-churners, a company can use its limited marketing resources efficiently to target likely churners in a retention marketing campaign.
Acquiring a new customer costs 12 times more than retaining an existing one [3]; therefore, a small improvement in the accuracy of churn prediction can result in a large profit for a company [4].
Data mining techniques have been widely used in the churn prediction context, for example support vector machines (SVM) [5, 6, 7], decision trees [8], artificial neural networks (ANN) [9, 10],
2. METHODS
2.1 Fuzzy c-means (FCM) clustering algorithm
Fuzzy c-means (FCM) is a data clustering method in which each data point belongs to a cluster to a degree specified by a membership grade. This method was originally introduced by Jim Bezdek in 1981 [13].
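FCM partitions a data set $X = \{x_1, x_2, \ldots, x_N\}$ in a p-dimensional space into C clusters by minimizing the objective function

$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \lVert x_i - c_j \rVert^{2}, \qquad m > 1 \tag{1}$$

where m is any real number greater than 1, $u_{ij}$ is the degree of membership of $x_i$ in cluster j, $x_i$ is the ith p-dimensional data point, $c_j$ is the p-dimensional center of cluster j, and $\lVert \cdot \rVert$ is any norm expressing the similarity between a measured data point and a cluster center. Fuzzy partitioning is carried out through an iterative optimization of the objective function shown above, updating the memberships $u_{ij}$ and the cluster centers $c_j$ by:

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{2/(m-1)}}, \qquad c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}} \tag{2}$$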
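The alternating center and membership updates above can be sketched in a few lines of pure Python. This is a minimal illustration, assuming the Euclidean norm, m = 2, random initial memberships, and a stop once no membership changes by more than a small tolerance; the function name and these settings are illustrative assumptions, not the configuration used in the paper.

```python
import math
import random

def fcm(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Minimal fuzzy c-means sketch; returns (centers, u) with u[i][j] for point i, cluster j.

    Assumptions: Euclidean norm (math.dist); m, tol and max_iter are illustrative.
    """
    rng = random.Random(seed)
    n, p = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Center update: c_j = sum_i u_ij^m x_i / sum_i u_ij^m
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / sw
                            for d in range(p)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        change = 0.0
        for i in range(n):
            # A tiny floor avoids division by zero if a point sits on a center.
            dist = [max(math.dist(points[i], cj), 1e-12) for cj in centers]
            for j in range(c):
                new = 1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                                for k in range(c))
                change = max(change, abs(new - u[i][j]))
                u[i][j] = new
        if change < tol:
            break
    return centers, u
```

On two well-separated groups of points, for example `fcm([[0, 0], [0.1, 0.2], [0.2, 0.1], [5, 5], [5.1, 4.9], [4.9, 5.2]], c=2)`, each point should receive its largest membership in the cluster of its own group, while every membership row still sums to 1.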
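In subtractive clustering, each data point $x_i$ is first assigned a potential that measures the density of the surrounding data:

$$P_i = \sum_{j=1}^{N} \exp\left(-\alpha \lVert x_i - x_j \rVert^{2}\right) \tag{3}$$

where $\alpha = 4/r_a^{2}$ and $r_a$ is a positive constant defining the neighborhood radius for each cluster center. After calculating the potential for each vector, the one with the highest potential is selected as the first cluster center. Let $x_1^{*}$ be the center of the first group and $P_1^{*}$ its potential. Then the potential of each $x_i$ is reduced according to Eq. (4):

$$P_i \leftarrow P_i - P_1^{*} \exp\left(-\beta \lVert x_i - x_1^{*} \rVert^{2}\right), \qquad \beta = 4/r_b^{2} \tag{4}$$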
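The select-and-reduce loop of subtractive clustering can be sketched as below. This is a simplified version under stated assumptions: the data are already scaled to [0, 1], r_b = 1.5 r_a, and a single stopping threshold (stop when the best remaining potential drops below a fraction eps of the first center's potential) replaces the fuller accept/reject criteria of the original method. The function name and defaults are hypothetical.

```python
import math

def subtractive_clustering(points, ra=0.5, rb=None, eps=0.15):
    """Simplified subtractive clustering sketch on data assumed scaled to [0, 1]."""
    # Assumption: r_b = 1.5 * r_a when not given (a common choice, not from the paper).
    rb = 1.5 * ra if rb is None else rb
    alpha, beta = 4.0 / ra ** 2, 4.0 / rb ** 2

    def sq_dist(a, b):
        return sum((s - t) ** 2 for s, t in zip(a, b))

    # Potential of each point: density of neighbours within roughly r_a.
    potential = [sum(math.exp(-alpha * sq_dist(xi, xj)) for xj in points)
                 for xi in points]
    first_peak = max(potential)
    centers = []
    while True:
        k = max(range(len(points)), key=lambda i: potential[i])
        peak = potential[k]
        # Simplified stopping rule (assumption): stop once the best remaining
        # potential falls below a fraction of the first center's potential.
        if peak < eps * first_peak:
            break
        centers.append(points[k])
        # Eq. (4): subtract the new center's influence from every potential.
        potential = [pot - peak * math.exp(-beta * sq_dist(x, points[k]))
                     for pot, x in zip(potential, points)]
    return centers
```

Because the chosen point's own potential drops to zero and no potential ever increases, the loop ends after at most one pass per point. The number of centers it returns is what determines the size of the rule base when the centers seed an FIS.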
$$O_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \qquad i = 1, 2 \tag{7}$$

where the output of this layer represents the firing strength of a rule.
Layer 3: Every node i in this layer computes the ratio of the ith rule's firing strength to the sum of all rules' firing strengths:

$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \qquad i = 1, 2 \tag{8}$$

The outputs of this layer are the normalized firing strengths.
Layer 4: Every node i in this layer is an adaptive node with a node function of the form

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i \left(p_i x + q_i y + r_i\right) \tag{9}$$

where $\bar{w}_i$ is the output of layer 3 and $\{p_i, q_i, r_i\}$ is the parameter set. Parameters in this layer are referred to as the consequent parameters.
Layer 5: This layer consists of a single node that computes the overall output as the summation of all incoming signals from layer 4:

$$\text{Overall output} = O_{5,1} = \sum_{i} \bar{w}_i f_i = \frac{\sum_{i} w_i f_i}{\sum_{i} w_i} \tag{10}$$
3. EMPIRICAL ANALYSIS
3.1 Dataset
All algorithms used in this paper are applied to a publicly available dataset downloaded from the UCI Repository of Machine Learning Databases at the University of California, Irvine¹. The dataset contains 20 variables describing 5000 customers, along with an indication of whether or not each customer churned (left the company). The proportion of churners in the dataset is 14.3%. For a full description of the dataset, see [21]. We first split the dataset into a training set (67%) and a test set (33%). Churners were oversampled in the training set to give the predictive models a better ability to discern discriminating patterns, so that churners and non-churners each make up 50% of the training data. The test set was not oversampled, to keep it realistic; its churn rate remained 14.3%. All models constructed in this work are evaluated on this test set.
¹ http://www.ics.uci.edu/mlearn/MLRepository.html
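The split-and-oversample protocol can be sketched as follows. The paper does not state how churners were oversampled, so this sketch assumes simple random duplication of training churners until the classes balance, and a plain shuffled (unstratified) split; the function and variable names are hypothetical.

```python
import random

def split_and_oversample(records, labels, train_frac=0.67, seed=0):
    """67%/33% split; churners (label 1) are oversampled in the training set only."""
    rng = random.Random(seed)
    idx = list(range(len(records)))
    rng.shuffle(idx)
    cut = round(train_frac * len(idx))
    train_idx, test_idx = idx[:cut], idx[cut:]
    churners = [i for i in train_idx if labels[i] == 1]
    stayers = [i for i in train_idx if labels[i] == 0]
    # Assumption: random duplication of churners until both classes are 50%.
    extra = [rng.choice(churners) for _ in range(len(stayers) - len(churners))]
    balanced = stayers + churners + extra
    rng.shuffle(balanced)
    train = [(records[i], labels[i]) for i in balanced]
    # The test set is left untouched, so it keeps the original churn rate.
    test = [(records[i], labels[i]) for i in test_idx]
    return train, test
```

With 5000 customers of which 14.3% (715) churn, this yields a 1650-customer test set with the original class mix and a balanced training set in which each churner may appear several times.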
Variable            Type                      What
International Plan  Dichotomous categorical   Subscriber (0=no, 1=yes)
VoiceMail Plan      Dichotomous categorical   Subscriber (0=no, 1=yes)
TotalDayMins        Continuous                Daytime usage
TotalEveMins        Continuous                Evening usage
TotalEveCharge      Continuous
TotalNightCharge    Continuous
TotalInterMins      Continuous
TotalInterCalls     Continuous                Number of international calls
NumberofCalltoCS    Continuous                Number of calls to customer service
4.2 Comprehensibility
Accuracy, sensitivity, and specificity are not the only important aspects in evaluating churn prediction models [28]. A churn prediction model should be both comprehensible and accurate. A comprehensible model reveals knowledge about the drivers of customer churn. Such knowledge can be extracted in the form of if-then rules, which helps in developing a more effective retention strategy. Therefore, comprehensibility of the classification model is an important requirement in churn prediction modeling.
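When a fuzzy inference system is generated by clustering, as with MATLAB's genfis2 (subtractive clustering) and genfis3 (FCM), each cluster center yields one rule, so the rule base is only as large as the number of clusters. The sketch below merely illustrates that one-rule-per-center structure in readable if-then form; the centers, the choice of variables, the "is near" wording, and the function name are hypothetical and are not rules reported in this paper.

```python
def rules_from_centers(centers, names):
    """Render one Sugeno-style if-then rule per cluster center (illustrative only)."""
    rules = []
    for k, center in enumerate(centers, start=1):
        # Each antecedent is a fuzzy set centred on the cluster coordinate.
        antecedent = " AND ".join(
            f"{name} is near {value:g}" for name, value in zip(names, center)
        )
        # Each rule has its own linear consequent f_k (as in ANFIS layer 4).
        rules.append(f"IF {antecedent} THEN churn score = f{k}(inputs)")
    return rules

# Hypothetical centers over two of the dataset's variables.
centers = [[300.0, 4.0], [150.0, 1.0]]
for rule in rules_from_centers(centers, ["TotalDayMins", "NumberofCalltoCS"]):
    print(rule)
```

Two clusters give exactly two rules regardless of how many training records there are, which is the structural reason clustering-based ANFIS models tend to be compact.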
Model                Accuracy  Specificity  Sensitivity  #rules
C4.5                 94%       95.6%        87%          25
RIPPER               95%       97.5%        85.7%        18
Logistic regression  77.3%     76.6%        82%          ----
ANFIS-Subtractive    92%       93%          84%
ANFIS-FCM            91%       92%          84%
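Because the test set keeps the original 14.3% churn rate, accuracy equals the churn rate times sensitivity plus the non-churn rate times specificity, so the three measures can be cross-checked against each other. The sketch below assumes churners are the positive class (sensitivity is the share of churners detected); the helper names are hypothetical.

```python
def metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity, with churners as the positive class."""
    sensitivity = tp / (tp + fn)  # share of churners that are detected
    specificity = tn / (tn + fp)  # share of non-churners correctly kept
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity

def implied_accuracy(sensitivity, specificity, churn_rate=0.143):
    """Accuracy implied by sensitivity and specificity at a given churn rate."""
    return churn_rate * sensitivity + (1 - churn_rate) * specificity

# (sensitivity, specificity) pairs taken from the table above.
table = {
    "C4.5": (0.87, 0.956),
    "RIPPER": (0.857, 0.975),
    "Logistic regression": (0.82, 0.766),
    "ANFIS-Subtractive": (0.84, 0.93),
    "ANFIS-FCM": (0.84, 0.92),
}
for model, (sens, spec) in table.items():
    print(f"{model}: implied accuracy {implied_accuracy(sens, spec):.1%}")
```

The implied accuracies (about 94.4%, 95.8%, 77.4%, 91.7% and 90.9%) agree with the reported accuracies to within one percentage point.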
5. CONCLUSIONS
Accuracy and comprehensibility are two important requirements in churn prediction modeling. This paper presents an application of ANFIS in the churn prediction context. In particular, we compared ANFIS, as a neuro-fuzzy classifier, with two state-of-the-art crisp classifiers: C4.5 and the RIPPER rule learner. The results showed that both ANFIS-Subtractive and ANFIS-FCM have acceptable performance in terms of accuracy, specificity, and sensitivity. In addition, ANFIS-Subtractive and ANFIS-FCM induce far fewer rules than C4.5 and RIPPER. Hence ANFIS-Subtractive and ANFIS-FCM, which yield a comparable number of rules, are the most comprehensible techniques tested in the experiments. These results indicate that ANFIS shows acceptable performance in terms of both accuracy and comprehensibility and is an appropriate choice for churn prediction applications.
6. ACKNOWLEDGMENTS
We thank the Iran Telecommunication Research Center for
financial support.
7. REFERENCES
[1] Neslin, S. A., Gupta, S., Kamakura, W., Lu, J., Mason, C., 2006. Defection detection: Measuring and understanding the predictive accuracy of customer churn models. Journal of Marketing Research, 43(2), 204-211.
[2] Coussement, K., Benoit, D. F., Van den Poel, D., 2010. Improved marketing decision making in a customer churn prediction context using generalized additive models. Expert Systems with Applications, 37, 2132-2143.
[22] Han, J., Kamber, M., 2006. Data Mining: Concepts and Techniques. Morgan Kaufmann.
[23] Witten, I. H., Frank, E., 2005. Data Mining: Practical Machine Learning Tools and Techniques. San Francisco: Morgan Kaufmann. ISBN 0-12-088407-0.
[24] Berger, H., Merkl, D., Dittenbach, M., 2006. Exploiting partial decision trees for feature subset selection in e-mail categorization. In Proceedings of the ACM Symposium on Applied Computing (SAC).
[25] Burez, J., Van den Poel, D., 2009. Handling class imbalance in customer churn prediction. Expert Systems with Applications, 36, 4626-4636.
[26] Weiss, G. M., 2004. Mining with rarity: A unifying framework. SIGKDD Explorations, 6(1), 7-19.
[27] Fuzzy Logic Toolbox User's Guide for use with MATLAB, 2010.
[28] Verbeke, W., et al., 2011. Building comprehensible customer churn prediction models with advanced rule induction techniques. Expert Systems with Applications, 38, 2354-2364.