Customer Retention Data Mining
Keywords: customer retention, data mining, deviation analysis, feature selection, multiple
level association rules
1. Introduction
In the last decade, the increased dependency and widespread use of data-
bases in almost every business, scientific and government organization has
led to an explosive growth of data. Instead of being blessed with more
information to aid decision making, the overwhelming amounts of data have
inevitably resulted in the problem of “information overloading but knowledge
starvation”, as the human analysts are unable to keep pace to digest the
data and turn it into useful knowledge for application purposes. This situ-
ation has motivated some scientists and researchers in the fields of artificial
intelligence, machine learning, statistics and databases to put their expertise
together to form the field of knowledge discovery in databases (KDD). KDD
seeks to intelligently analyze voluminous amounts of information in data-
bases and extract previously unknown and useful knowledge (nuggets) from
them (Fayyad et al., 1996).
Active research in these fields has produced a wide range of effective
knowledge discovery techniques like ID3 (Quinlan, 1986) for classification
(used in C4.5 (Quinlan, 1993)) and Apriori (Agrawal and Srikant, 1994) for
association rule mining (used in DBMiner (Kamber et al., 1997)), to cater
to various applications in data mining. This tremendous success achieved
570 KIANSING NG AND HUAN LIU
in the research domain has also spun off a wide repertoire of high-quality,
off-the-shelf commercial data mining software/tools, like C5.0 by RuleQuest,
MineSet by Silicon Graphics and Intelligent Miner by IBM, to name a few.
Many people saw these tools as the catalyst for the success of data mining
applications. After all, many organizations are facing problems coping with
overwhelming amounts of data in their databases and are attracted by the
potential competitive advantages from data mining applications.
The availability of real-world problems and the wealth of data from the
organizations’ databases provide an excellent test-bed for us to perform
practical data mining. Our work was motivated by a real-world problem
involving a collaboration with a large company to tackle the pressing issue
of “customer retention”. Such collaborations between academia and applica-
tion domains to solve real-world problems represent a positive step towards
the success of data mining applications. The proposed approach showcases
an effective application of data mining in the sales and services related
industries, and reveals the complex and intertwined process of practical data
mining. More importantly, we demonstrate that real-world data mining is
an art of combining careful study of the domain, intelligent analysis of the
problems, and skillful use of various tools from machine learning, statistics,
and databases.
One truth about data mining applications is that even when the problem or
goal is clear and focused, the mining process still remains a complicated one,
involving multiple tasks across multiple stages. In our application, although
we are clear that we want to retain customers and the goal is to “identify
the potential defectors way before they actually defect”, it is still difficult to
know where to start. Without studying the domain, it is impossible for us
to go further. The problem and goal statement specified often gives no clue
as to “what” tasks of data mining are involved, “which” techniques are to
be applied, and “how” they are applied. Obviously, the problem of customer
retention must be further decomposed into several sub-problems such that
the knowledge derived from a task in one phase can serve as the input to the
next phase. Since the available data mining techniques and tools are designed
to be task-specific (according to the framework given in Figure 1) rather
than problem-specific, they cannot be applied directly to solve real-world
problems.
Many challenges can only be found in real-world applications. The
changing environment can cause the data to fluctuate and make the previously
discovered patterns partially invalid. This phenomenon is referred to as
“concept drift” (Widmer, 1996). The possible solutions include incremental
methods for updating the patterns and treating such drifts as an opportunity
for “interesting” discovery by using it to cue the search for patterns of
CUSTOMER RETENTION VIA DATA MINING 571
solving real-world problems and further justifies that practical data mining is
an art that requires more than just directly applying off-the-shelf techniques
and tools. The remainder of this article is organized as follows. In Section 2,
we begin with a domain analysis and task discovery of the customer reten-
tion problem faced by the company. We then perform a top-down problem
decomposition and list various sub-problems. This is important because each
sub-problem must map to only one specific task of data mining, so that
the existing data mining tools and techniques can effectively be applied. In
Section 3, we illustrate the use of feature selection via induction to choose
the objective “indicators” (or salient features) about customer loyalty. With
this technique, “concept drifts”, in which the definition of a concept changes over
time (Clearwater et al., 1989), can be captured as they take place. This
is followed by the use of deviation analysis and forecasting to monitor
these indicators for the potentially defecting customers in Section 4. Next,
in Section 5, we elaborate on the employment of multiple-level association
rule mining in predicting customers who are likely to follow the previ-
ously identified defecting customers and leave the company. These “early
warnings” of possible chain effect will enable the marketing division to
take actions or tailor special packages to retain important customers and
their potential followers before the defection takes place. Finally, Section 6
suggests some implications of this project.
Because of the confidentiality of the databases used and the sensitivity
of services provided by the company, we deliberately use “the company”
throughout the paper and describe applications through some intuitive
examples as much as possible. In explaining basic concepts, we use the
“credit” database from the UC Irvine Machine Learning Repository (Merz
and Murphy, 1996) for illustrative purposes.
can choose the company over others or vice versa as different relayers provide
varying services and charges with contracts of various periods. Companies
(senders and receivers) can form consortiums or groups to enjoy discounts
of various sorts offered by the company. The goal of this work is to help a
relayer keep as many of its customers as possible using its relaying services.
The goal of customer retention is to retain customers before they switch to
other relayers.
As in many organizations, the dependency on information technology has
inevitably resulted in an explosive growth of data, far beyond the human
analyst’s ability to understand and make use of the data for competitive
advantages. This is also due to the fact that conventional databases and
spreadsheets used by these analysts are not designed for identifying patterns
from the databases. Nor do they possess the capability to select or
consolidate the different sources of information from a large number of
multiple databases of heterogeneous sources. In view of these inadequa-
cies, the company involved sees “data warehousing” and “data mining” as
two intuitive solutions. Maintaining a data warehouse separately from the
transactional database allows special organization, access methods and imple-
mentation methods to support multi-dimensional views and operations typical
of OLAP. In fact, OLAP tools can be integrated into the data warehouse
to support complex OLAP queries involving multi-dimensional data
representation, visualization and interactive viewing, while not degrading the
performance of the operational databases.
The first step in our analysis involves identifying opportunities for data
mining applications. This step is important because not every problem can be
solved by data mining. Some guidelines for selecting a potential data mining
application include “the potential for significant impact”, “availability of
sufficient data with low noise level”, “relevance of attributes”, and “presence
of domain knowledge”. In fact, nearly one-fifth of the whole development
time was spent on identifying the “right” problems for application, as well
as justifying the use of data mining over the conventional approaches. The
possibility that an application can be generalized to solve other similar prob-
lems in related industries is also taken into consideration. With these factors
in mind, our feasibility study has identified the problem of customer retention
as a potentially useful data mining application.
The motivation of our work comes from the fact that the problem
of customer retention is becoming an increasingly pressing issue for
organizations in the sales (e.g., departmental stores, banking, insurance, etc.)
and services (e.g., providers of Internet and/or Telecommunications services)
In this application, the goal is focused and clear. We are concerned about
customer defection and the goal is to identify the potentially defecting
customers so that steps can be taken to retain them before they actually
defect. At first glance, it is difficult to know where to start. The key to finding a solution
is to iteratively decompose the problem into solvable sub-problems. The problem
analysis and task decomposition for our application are briefly summarized in
Figure 2.
As we can see in the figure, the main problem of customer retention is
decomposed into three sub-problems or sub-goals. In the first sub-problem,
The key to solving the problem of customer retention is to identify the list
of potential defectors and predict the consequences following each potential
defection even before they actually take place. Intuitively, this gives rise to
the need for us to first identify a set of relevant attributes or indicators that are
representative of the target concept of “customer loyalty and their likelihood
of defection”. The knowledge found can then be cross-validated against the
existing knowledge, and employed to capture concept drifts.
Because of the clear importance of this task, most organizations have had to
rely on the judgments of their human experts to devise a set of “subjective
Classification is one of the most important and frequently seen tasks in data
mining: given a large set of training data of the form $\{A_1, A_2, \ldots, A_n, C\}$,
its objective is to learn an accurate model of how the attribute values $A_i$ can
determine the class label $C$. A decision tree is one possible model (Quinlan,
1986) from which a set of disjunctive if-then classification rules can be
derived. Classification rules having high predictive accuracy (or confidence)
are employed for various tasks. First, the model can be used to perform
classification for future data having unknown class outcome – prediction.
For example, a bank manager can check a future application against the
classification model obtained from historical data to determine whether an
application should be granted a credit – a screening process. Second, since
those attributes appearing in the classification rules are influential to the
eventual outcome of the classification, the user can have a better under-
standing and insight into the characteristics for each target class. This is
Our actual data, sampled from transactional databases residing in Oracle, has
more than 40 attributes and 60,000 periodical records. Because of its confid-
entiality, we choose the “credit” data to illustrate the idea of classification
using decision trees. The data is partially shown in Table 1. The last column
shows the class values. C4.5 is applied to the data to derive the updated
classification rules about customers of the following form:
The attributes that appear in the classification rules are objective indicators
as they are found in the data and are considered influential to the target
concept “Granted”. For instance, from the above classification rule, the user
can conclude that attributes “Jobless”, “Bought” and “Saving” are influential
and relevant to the target class of “Granted”, while other attributes such as
“Married”, “Age”, “Sex” are not.
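The idea of reading indicators off an induced tree can be sketched as follows. This is only a minimal illustration under stated assumptions: it grows a plain ID3-style tree using information gain (not C4.5 with gain ratio and pruning), the eight records are hypothetical and do not reproduce Table 1, and the helper names `entropy`, `information_gain` and `attributes_used` are ours. The attributes tested somewhere in the tree are taken as the objective indicators.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr, target):
    """Reduction in entropy of `target` obtained by splitting `rows` on `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def attributes_used(rows, attrs, target):
    """Grow an ID3-style tree and return the set of attributes it tests."""
    if len({r[target] for r in rows}) <= 1 or not attrs:
        return set()  # pure node, or nothing left to split on
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    if information_gain(rows, best, target) <= 0:
        return set()  # no attribute separates the classes any further
    used = {best}
    remaining = [a for a in attrs if a != best]
    for value in {r[best] for r in rows}:
        branch = [r for r in rows if r[best] == value]
        used |= attributes_used(branch, remaining, target)
    return used


# Hypothetical toy records (Jobless, Saving, Married, Sex, Granted).
ATTRS = ["Jobless", "Saving", "Married", "Sex"]
RAW = [
    ("no", "low", "yes", "M", "yes"),
    ("no", "high", "no", "F", "yes"),
    ("no", "low", "no", "M", "yes"),
    ("no", "high", "yes", "F", "yes"),
    ("yes", "high", "yes", "M", "yes"),
    ("yes", "high", "no", "F", "yes"),
    ("yes", "low", "yes", "F", "no"),
    ("yes", "low", "no", "M", "no"),
]
ROWS = [dict(zip(ATTRS + ["Granted"], r)) for r in RAW]

INDICATORS = attributes_used(ROWS, ATTRS, "Granted")
print(sorted(INDICATORS))
```

On this toy data the tree tests only “Jobless” and “Saving”, so “Married” and “Sex” would be discarded as irrelevant to the target concept.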
Periodic application of this method allows the Marketing Department
to objectively identify the most recent set of influencing indicators in order
to capture possible concept drifts. The set of objective indicators is then
compared with the set of subjective indicators identified by the domain
experts. As a result of the cross-validation process, the eventual set of merged
indicators is more up-to-date and more reliable for gauging the loyalty of
customers and their likelihood of defecting. Monitors are then placed on
these loyalty-indicators in the data warehouse so that if any customer shows
significant deviations beyond a certain minimum deviation threshold δMin, an
exception report of defection will be triggered. This is described in the
next section.
\[ \delta_t = \frac{A_t - E_t}{E_t} \]
where $A_t$ is the Actual value for the indicator, and $E_t$ is the Expected value
for the indicator over a time period of the time-series.
If the analysis detects any deviation δt exceeding a certain user-specified
“minimum deviation threshold” δMin, i.e., δt > δMin, in some pre-defined
measures in the temporal database, it suggests that a significant deviation has
occurred, and exception reports are generated. Since significant deviations
from the norms are unexpected, they should be “interesting” to the user.
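The deviation test can be sketched in a few lines. This is a minimal illustration, not the implementation used in the project: the function names and the sample figures are hypothetical, and we assume that the threshold is compared in magnitude, so that a δMin quoted as a negative percentage flags drops (and rises) larger than that size.

```python
def relative_deviation(actual, expected):
    """Relative deviation (actual - expected) / expected for one period."""
    if expected == 0:
        raise ValueError("expected value must be non-zero")
    return (actual - expected) / expected


def flag_deviations(actuals, expecteds, delta_min):
    """Return (period index, deviation) pairs whose magnitude exceeds |delta_min|.

    Comparing magnitudes is an assumption made for this sketch.
    """
    flagged = []
    for t, (a, e) in enumerate(zip(actuals, expecteds)):
        delta = relative_deviation(a, e)
        if abs(delta) > abs(delta_min):
            flagged.append((t, delta))
    return flagged


# Hypothetical monthly volumes for one indicator of one customer.
EXPECTED = [1000, 1000, 1000, 1000]
ACTUAL = [980, 1040, 850, 1000]

FLAGGED = flag_deviations(ACTUAL, EXPECTED, delta_min=-0.10)
print(FLAGGED)
```

Only the third period, where the volume falls 15% below its expected value, is reported; the smaller fluctuations stay below the threshold.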
Such statistical analysis is widely employed in data mining to
discover a few really important and relevant deviations among a multitude
of potentially interesting changes in the temporal databases. Without such a
method, most of the changes are normally “drowned out” by the mass of data
(Matheus et al., 1994) and will remain unnoticed. Even if human analysts
were able to detect the more abrupt pattern changes in the time-series, it
would be extremely difficult to monitor such a large number of deviations
over a long period of time. Nevertheless, finding these patterns is interesting
in discovering higher-level relationships.
4.3. Forecasting
\[ F_t = \left[ \left( A + B\,(t - 1) \right) + B \right] S_t \]
Usually, these smoothing constants are left to the control of the end-user
in a dynamic environment, although empirical experiments have shown that
a value between 0.10 and 0.30 for all the smoothing constants often results
in reliable forecasts. However, if the user expects the level of the estimate to
change permanently in the immediate future because of some special circum-
stances, then a larger value of a smoothing constant (like 0.7) should be used
for a short period of time. Once the computed level of the forecasting model
has changed in accordance with these special circumstances, the user should
then switch back to a smaller value of the smoothing constant.
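The effect of the smoothing constant can be illustrated with a small sketch. It uses single exponential smoothing only, which is a simplification of the trend-seasonal model discussed here; the function name `exponential_smoothing`, the parameter `alpha` and the series are hypothetical. The series drops permanently from 100 to 60, and the two runs show how quickly the smoothed level follows the change.

```python
def exponential_smoothing(series, alpha):
    """Single exponential smoothing: level = alpha * x + (1 - alpha) * level."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    levels = [level]
    for x in series[1:]:
        level = alpha * x + (1.0 - alpha) * level
        levels.append(level)
    return levels


# Hypothetical indicator with a permanent level shift after five periods.
SERIES = [100.0] * 5 + [60.0] * 5

SLOW = exponential_smoothing(SERIES, alpha=0.2)  # within the 0.10 to 0.30 range
FAST = exponential_smoothing(SERIES, alpha=0.7)  # larger constant for a level change

print(round(SLOW[-1], 4), round(FAST[-1], 4))
```

Five periods after the shift, the level computed with 0.2 is still about 13 units above the new value of 60, while the level computed with 0.7 is within 0.1 of it. This is why a larger constant suits a short period of expected permanent change, and a smaller one suits stable periods where noise should be damped.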
In our implementation, the normative values Et for each of the five to seven
“indicators” are first developed through a trend-seasonal forecasting model
(Levin et al., 1992), based on the customer’s historical performance in the
temporal databases. With an annual size of more than 50,000 records avail-
able for each indicator to train the forecasting model, the accuracy of the
predicted normative values can be increased. This should also be credited to
the additional factors taken into consideration in our approach:
− the use of seasonal indices adjusts the forecast according to the annual
seasonal pattern in the time-series, and
− the use of exponential smoothing allows more weight to be assigned to
the recent data, thus taking into account the current circumstances, like
a recent economic downturn in South-East Asia.
With the normative values of every indicator forecasted for every
customer, deviation analysis is performed to detect those customers who
show significant deviations, or δt > δMin. These customers are deemed to
be potentially defecting and this warrants a further “interestingness valid-
ation” (Matheus et al., 1994). In other words, their deviations are further
compared with those δSR of the (aggregated) customers operating in the
same service-route. By doing so, we take into account the trends in the
external environment and the profiles of the subject under consideration. This
will ensure that a “real” deviation is exclusive only to the specific subject
and not some general phenomenon experienced by other subjects too in the
same service-route. For example, the Asian boom in the middle of the last decade
generally boosted the volumes of those in the Asia service-route. Similarly,
the recent Asian economic crisis caused an overall reduction in
the volumes of those senders and receivers in the Asia service-route. An
illustration of one such analysis is shown in Figure 3.
In the analysis, senders S1, S4 and consortium C2 showed significant
deviations δ1 which satisfied the δMin of –10%. These are further compared
with the average deviation δSR of the aggregated customers in their respective
service-routes. In S1’s case, the general population in SR1 performs reasonably
well (a positive deviation of +1%), suggesting that S1’s deviation is
unexpected and thus interesting. In S4’s case, the general population in SR5
performs equally badly (a deviation of –10%), suggesting that S4’s deviation
should be expected and thus not interesting. As mentioned above, we can
usually relate S4’s kind of deviation to some regional event like the current
economic turmoil in Asia that affects all the relaying operations in the Asia
service-routes. If no such explanation can be found, then it would mean that
all the customers in the service-route are declining.
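The interestingness validation can be sketched as a comparison between a customer's deviation and that of its service-route. This is a hedged illustration: the rule used here (a customer is interesting only if its deviation in excess of the route's deviation is itself larger than the threshold in magnitude) is our assumption about how “equally badly” might be operationalized, the function name `is_interesting` is hypothetical, and the customer deviations of –15% and –12% are invented. Only the route deviations of +1% and –10% and the threshold of –10% come from the example above.

```python
def is_interesting(customer_delta, route_delta, delta_min):
    """Decide whether a customer's deviation is specific to that customer.

    Assumed rule: the customer must first exceed the threshold in magnitude,
    and the part of its deviation not shared with the aggregated
    service-route must exceed the threshold as well.
    """
    if abs(customer_delta) <= abs(delta_min):
        return False  # not a significant deviation in the first place
    excess = customer_delta - route_delta
    return abs(excess) > abs(delta_min)


DELTA_MIN = -0.10

# S1: hypothetical deviation of -15%; its route SR1 deviates by +1%.
S1_INTERESTING = is_interesting(-0.15, 0.01, DELTA_MIN)

# S4: hypothetical deviation of -12%; its route SR5 deviates by -10%.
S4_INTERESTING = is_interesting(-0.12, -0.10, DELTA_MIN)

print(S1_INTERESTING, S4_INTERESTING)
```

Under these assumptions S1 is flagged, because its route is doing well, while S4 is not, because almost all of its decline is shared by the other customers on SR5.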
If consistent deviations are also observed across the set of indicators for
“deviating” customers like S1 and C2, then a periodic exception report is
produced to alert the domain experts to these possible “defectors”. Domain
knowledge and insights are then applied to verify the findings for each of
these cases and the suspected potential defectors will be monitored closely
for the subsequent periods. Persistent deviations are strong signs of likely
defection. Besides performing deviation analysis on the Customer concept,
similar analysis can also be applied to investigate and identify upcoming
or weakening Markets (continents and countries) and service-routes for the
purpose of marketing.
major players. Since they carry very large relaying volumes, they have great
influence over the smaller companies. Hence, the defection of a major sender
will encourage similar behavior in its associated business partners, who
will attempt to preserve established relationships. This can put a severe
dent in the financial health of a relayer. Hence, there is much incentive for
the Marketing Department to have a full picture of the consequences from
an identified potential defection. If we wait until a “chain effect” becomes
observable to the human analysts, it would be too late. In short, preventive
measures should also be taken to take care of the followers when a potential
defector is detected as they can also influence the major players to change
their stands.
Although the mining of association rules at the message level will give
a good idea of the association relationships between senders and receivers,
knowledge of this level does not provide much business value for our applic-
ation. This is because knowledge at too low a level (over-specific) will end
up looking like the raw data and having little general meaning. Since most
6. Conclusions
In the course of our work, we have identified some interesting objective indi-
cators among a large number of attributes. The finding has verified our earlier
conjecture on the limitations of human capabilities. In addition, preliminary
experiments on the historical data sets have successfully identified some
already defected customers long before they showed prominent signs of
defection. This work is significant because our approach can be generalized
to solve similar problems in the sales and services related industries, like
Telecommunications, Internet Service Providers, Insurance, Cargo Trans-
shipment, etc. For instance, a popular strategy used by many companies in the
services industry is using attractive promotions and discounts to “lure” new
customers into short-term services under them. Even department stores in the
sales industry come up with their own VIP smart cards in a bid to retain their
customers. We would like to highlight that the information from the logs and
databases can potentially be turned into valuable knowledge for competitive
advantage. For example, the customers’ particulars and their profiles (like
mobile-phone or Internet usage patterns) could be mined for predicting a list
of potential defectors among them. Since most customers are bound to the
services of a company for at least a period of time (usually around a year),
special offers can be made to those who show signs of dissatisfaction.
Many ideas presented here can in fact be modified to suit various applica-
tions of similar needs. For instance, although the third sub-task in our work
is made possible by the availability of the transactional associations, unique
to a business of hierarchical structure, there are many other kinds of asso-
ciations in different problem domains. Spatial associations can be identified
and applied in some property-related problems while sequential associations
can be found in a sales transactions database and applied to predicting future
purchases in E-business. These different associations in different problem
domains can help infer valuable knowledge.
One of the goals of this work is to show that the maturity of data mining
has reached a point where large-scale applications to practical problems are
desirable and feasible. This work will hopefully create some sort of chain
effect in motivating the strategic use of data mining in business applications
where conventional approaches fall short. The success of practical appli-
Acknowledgements
We would like to thank Farhad Hussain and Manoranjan Dash for helping us
finalize this version of the paper, and the company involved in the project for
making this application possible, although it is unfortunate that the identity of
the company cannot be mentioned. We are also indebted to the anonymous
reviewers and the editor for their detailed constructive suggestions and
comments.
References
Agrawal, R., Imielinski, T. & Swami, A. (1993). Database Mining: A Performance Perspective.
IEEE Trans. on Knowledge and Data Engineering 5(6): 914–925.
Agrawal, R. & Srikant, R. (1994). Fast Algorithms for Mining Association Rules in Large
Databases. In: Proceedings of the 20th VLDB Int’l Conference, Santiago, Chile, 487–499.
Blum, A. & Langley, P. (1997). Selection of Relevant Features and Examples in Machine
Learning. Artificial Intelligence 97: 245–271.
Clearwater, S., Cheng, T., Hirsh, H. & Buchanan, B. (1989). Incremental Batch Learning. In
Segre, A. (ed.) Proceedings of The Sixth International Workshop on Machine Learning,
366–370. Morgan Kaufmann Publishers, Inc.
Dash, M. & Liu, H. (1997). Feature Selection Methods for Classifications. Intelligent Data
Analysis: An International Journal 1(3). http://www-east.elsevier.com/ida/free.htm.
Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P. (1996). From Data Mining to Knowledge
Discovery: An Overview. In Fayyad, U., Piatetsky-Shapiro, G., Smyth, P. & Uthurusamy,
R. (eds.) Advances in Knowledge Discovery and Data Mining, 495–515. AAAI Press /
The MIT Press.
Han, J. & Fu, Y. (1996). Attribute-Oriented Induction in Data Mining. In Fayyad, U.,
Piatetsky-Shapiro, G., Smyth, P. & Uthurusamy, R. (eds.) Advances in Knowledge
Discovery and Data Mining, 399–421. AAAI Press / The MIT Press.
John, G., Kohavi, R. & Pfleger, K. (1994). Irrelevant Features and the Subset Selection Problem.
In Cohen, W. & Hirsh, H. (eds.) Machine Learning: Proceedings of the Eleventh International
Conference, 121–129. New Brunswick, N.J.