
2012 IEEE 36th International Conference on Computer Software and Applications

Customer Churn Prediction for Telecom Services

Utku Yabas
School of Engineering and Computer Science
Izmir University of Economics
Izmir, Turkey
e-mail: [email protected]

Hakki Candan Cankaya
Computer Science and Engineering Department
Southern Methodist University
Dallas, Texas
e-mail: [email protected]

Turker Ince
School of Engineering and Computer Science
Izmir University of Economics
Izmir, Turkey
e-mail: [email protected]

Abstract—Customer churn is a major concern for telecom service providers because of its associated costs. This short paper summarizes our ongoing work on customer churn prediction for telecom services. We are working on data mining methods to accurately predict which customers will leave and turn to another provider for the same or a similar service. The sample dataset we use in our experiments was compiled by Orange Telecom from real data and was posted for the 2009 Knowledge Discovery and Data Mining competition. IBM scored highest on this dataset, at the cost of a significant amount of computational resources. We aim to find alternative methods that match or improve on the recorded highest score while using resources more efficiently. The dataset has a very large number of features and examples as well as many incomplete values. As a first step, we apply several methods to preprocess the dataset and correct its imperfections. We then compare and contrast various ensemble and single classifiers, and we conclude the paper with future directions for the study.

Keywords—churn prediction; machine learning; data mining; pattern recognition

I. INTRODUCTION

Rapid improvements and dynamics in the technology marketplace make customer retention a competitive effort. Especially in the saturated telecommunications market, incumbent service providers and newcomers offer deals and packages to consumers who are willing to churn to their services. On the defending end, strategies and counteroffers have to be prepared for potential churners, since it is more expensive to win a customer back once he or she has churned. According to a SAS Institute report [1], the annual rate of customer churn in the telecommunications industry is currently about 30%, with an upward trend that correlates with the growth of the market.

In this study, we concentrate on evaluating and analyzing the performance of different machine learning methods for accurate churn prediction. In 2009, the French telecom company Orange sponsored a competition in knowledge discovery and data mining (KDD) and posted three problems [2]. One of the problems was churn analysis and prediction, and a sample of real data was provided for use in the competition. In our study, we use this dataset for our experiments. Although the competition is over, improvements can still be made. The rankings are evaluated by the area under the Receiver Operating Characteristic (ROC) curve. We are working on ensemble methods to improve the solution to the churn prediction problem. First place on the KDD 2009 churn problem is still held by IBM [2]. We chose the same dataset because it includes all the features and challenges of churn prediction while revealing no information about individual customers. We can also compare against the other scores on the competition website, which helps us judge how we are doing.

II. METHODS USED AND POTENTIAL CONTRIBUTIONS

To tackle the churn prediction problem, we have been using machine learning algorithms and data mining tools. One of the popular tools in the field is Weka [4], an open-source data mining package developed by the University of Waikato in New Zealand. We also use additional libraries for the methods that are not implemented in Weka, and we built and implemented these additional methods ourselves; a minimal sketch of a Weka-driven training and evaluation workflow is shown below.
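The sketch below illustrates how such a Weka-based experiment could be wired together in Java; it is not the authors' code. It assumes a Weka 3.x-era API (RandomForest with setNumTrees, setMaxDepth, and setNumFeatures), a hypothetical ARFF export of the Orange training data named orange_churn_train.arff, and illustrative parameter values; the ROC-area scoring mirrors the KDD Cup evaluation described above.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ChurnExperiment {
    public static void main(String[] args) throws Exception {
        // Load a (hypothetical) ARFF export of the Orange churn training data.
        Instances data = new DataSource("orange_churn_train.arff").getDataSet();
        // The churn label is assumed to be the last attribute.
        data.setClassIndex(data.numAttributes() - 1);

        // Random Forest classifier; the parameter values here are illustrative.
        RandomForest rf = new RandomForest();
        rf.setNumTrees(50);    // number of trees in the forest
        rf.setMaxDepth(15);    // maximum depth of each tree
        rf.setNumFeatures(6);  // random features considered per split

        // 10-fold cross-validation, scored by ROC area as in the KDD Cup.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(rf, data, 10, new Random(1));
        System.out.println("ROC area: " + eval.areaUnderROC(1)); // class index 1 assumed positive
    }
}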

We have encountered several challenges in the study so far. The first challenge is the size of the dataset: it includes 100,000 examples described by 15,000 variables, divided equally into training and test sets, and the class labels of the test set are not published. The second concern is preprocessing of the variables, which are polluted by a large number of missing values and outliers. Some variables are not normalized and have very different dynamic ranges, some categorical variables have huge vocabularies, and some numerical variables take only a few distinct values. The required computational resources are also high: because of the large number of variables and examples, model building takes a long time. The other major challenge is the imbalance between positive and negative examples in the churn data; positive (churn) examples account for less than 10% of the total. A short sketch of how these properties can be inspected is given below.
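This is a hedged illustration, not part of the paper: it uses Weka's per-attribute statistics to report heavily missing variables and the class distribution. The file name, the 98% threshold, and the assumption that the churn label is the last attribute are ours.

import java.util.Arrays;
import weka.core.AttributeStats;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class DatasetInspection {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF export of the Orange churn training data.
        Instances data = new DataSource("orange_churn_train.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1); // assume churn label is last

        // Report attributes whose missing-value ratio exceeds 98%.
        for (int i = 0; i < data.numAttributes(); i++) {
            AttributeStats stats = data.attributeStats(i);
            double missingRatio = (double) stats.missingCount / data.numInstances();
            if (missingRatio > 0.98) {
                System.out.printf("%s: %.1f%% missing%n",
                        data.attribute(i).name(), 100 * missingRatio);
            }
        }

        // Class imbalance: counts of negative and positive (churn) examples.
        int[] classCounts = data.attributeStats(data.classIndex()).nominalCounts;
        System.out.println("class counts: " + Arrays.toString(classCounts));
    }
}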
After the preprocessing, we have focused mostly on ensemble methods. The algorithm that scores highest in our experiments is Random Forests [5]. A Random Forest is built from many decision trees: at each node of a single decision tree, m features are chosen at random and the best split over these m features is used to split that node. Each tree votes for a prediction, and the most popular prediction among the trees is the output of the forest; the voting step is illustrated in the short sketch below.
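As a small, self-contained illustration of that majority vote, the snippet below aggregates the predictions of individual trees. The DecisionTree interface and the toy trees are hypothetical stand-ins for the models Weka builds internally; they are not part of the paper.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ForestVoting {
    /** Hypothetical stand-in for one trained decision tree. */
    interface DecisionTree {
        String classify(double[] features);
    }

    /** The forest's output is the most popular prediction among its trees. */
    static String predict(List<DecisionTree> forest, double[] features) {
        Map<String, Integer> votes = new HashMap<>();
        for (DecisionTree tree : forest) {
            votes.merge(tree.classify(features), 1, Integer::sum);
        }
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .get().getKey(); // forest is assumed non-empty
    }

    public static void main(String[] args) {
        DecisionTree alwaysChurn = f -> "churn";
        DecisionTree threshold = f -> f[0] > 0.5 ? "churn" : "stay";
        List<DecisionTree> forest = List.of(alwaysChurn, threshold, threshold);
        System.out.println(predict(forest, new double[]{0.2})); // "stay" wins 2 to 1
    }
}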
III. CURRENT RESULTS

IBM's churn score was 0.7651, as can be seen on the leader board of the competition website [2]. In this study, our main aim is to improve on IBM's ROC area score. Several of the teams in the KDD 2009 competition decided to use ensemble methods, and from our own experimental results we observe that ensemble methods outperform individual classifiers on this dataset. Orange Telecom also posted a small version of the dataset, which includes 50,000 examples and 230 variables [2]. We have been performing experiments with this small dataset, and we plan to apply the best-performing set of methods to the larger dataset after improving the result on the small one.

Because the dimension of the feature set is significantly large, we decided to implement feature selection to improve the overall classification performance. We have experimented with the effects of different feature subsets on the performance of a classifier. We excluded features that have a single value or no values at all, and numeric features with very few distinct values were also excluded. Figure 1 shows the histogram of the values of a chosen sample numeric variable; a large percentage of the values of variable 9 are gathered around 0. A sketch of this kind of filtering follows the figure.

Figure 1. Histogram of Variable 9
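The following is one plausible way to express that filtering step with Weka's Remove filter; it is an illustration under our own assumptions (the file name and the rule "drop attributes with fewer than two distinct values"), not the authors' exact procedure.

import java.util.ArrayList;
import java.util.List;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class DropUninformativeFeatures {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("orange_churn_train.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Collect attributes that are constant or entirely empty.
        List<Integer> toDrop = new ArrayList<>();
        for (int i = 0; i < data.numAttributes(); i++) {
            if (i == data.classIndex()) continue;
            if (data.attributeStats(i).distinctCount < 2) {
                toDrop.add(i);
            }
        }

        // Remove them with Weka's Remove filter.
        Remove remove = new Remove();
        remove.setAttributeIndicesArray(toDrop.stream().mapToInt(Integer::intValue).toArray());
        remove.setInputFormat(data);
        Instances reduced = Filter.useFilter(data, remove);
        System.out.println("attributes kept: " + reduced.numAttributes());
    }
}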

About 50% of the variables have more than 98% of their values missing. Missing values in numerical variables are replaced with the mean, and missing values in categorical variables are treated as an additional category. Eleven of the 40 categorical variables have more than 1,000 distinct values; for these high-vocabulary variables we keep only the 10 most frequent values and merge all others into a single value named "Others". A small sketch of this merging rule follows.
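The "keep the 10 most frequent values, map everything else to Others" rule can be expressed compactly. The sketch below is a plain-Java illustration, independent of Weka; the toy column values and the top-2 cutoff in the example are made up for brevity.

import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RareCategoryMerger {
    /** Builds a value-to-value mapping that keeps the topK most frequent
     *  categories and sends every other category to "Others". */
    static Map<String, String> buildMapping(List<String> values, int topK) {
        Map<String, Long> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1L, Long::sum);
        }
        Set<String> keep = new LinkedHashSet<>();
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .limit(topK)
                .forEach(e -> keep.add(e.getKey()));
        Map<String, String> mapping = new HashMap<>();
        for (String v : counts.keySet()) {
            mapping.put(v, keep.contains(v) ? v : "Others");
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Toy example with topK = 2 instead of 10, just to show the behaviour.
        List<String> column = Arrays.asList("A", "A", "A", "B", "B", "C", "D");
        System.out.println(buildMapping(column, 2)); // A and B are kept; C and D become "Others"
    }
}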
Following the preprocessing and feature selection, we studied the parameters of the algorithms, focusing on the performance of ensemble methods. So far we have tried several ensemble and single methods, building churn prediction models with decision trees, Random Forests, and decision trees with bagging and boosting; we have also worked on bags of single classifiers. Ensemble methods such as boosting and Random Forests outperform single decision trees, and our goal is to better the existing churn prediction scores by using ensemble methods. From our observations, the Random Forests algorithm has the best score in our experiments so far. We ran these algorithms on a personal computer with a 2.66 GHz dual-core Intel processor and 4 GB of RAM; building a Random Forest with 50 trees takes about an hour on this platform.

Random Forests has parameters for the number of trees, the maximum depth of the trees, and the number of random features considered at each split. Some important results on the test set are shown in Table I. Each score is the ROC area of a Random Forest model with the given number of trees and maximum tree depth, where each tree is built with 6 random features. We observe that a Random Forest with 50 trees and a maximum depth of 15 gives the best result of 0.6533 (Table I). We could further improve the score with a re-sampling technique: re-sampling increases the number of positive examples and decreases the number of negative examples, which effectively assigns extra cost to the positive examples and counteracts the bias in classifier performance caused by the class imbalance. A sketch of such a parameter sweep, with an optional re-sampling step, follows Table I.

TABLE I. RANDOM FORESTS ROC AREA SCORES

                            Number of Trees
  Depth of the trees          25        50
  15                        0.6338    0.6533
  20                        0.6292    0.6503
  No limit                  0.6367    0.6530
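The sketch below shows how a grid like the one in Table I could be run with Weka, together with an optional supervised re-sampling step that biases the class distribution toward uniform. The file name, random seed, sample size, and filter settings are our assumptions; the paper does not specify them, and it scores models on the competition test set rather than by cross-validation.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.instance.Resample;

public class ParameterSweep {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("orange_churn_train.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Optional re-sampling: bias the class distribution toward uniform,
        // which raises the share of positive (churn) examples.
        Resample resample = new Resample();
        resample.setBiasToUniformClass(1.0);
        resample.setSampleSizePercent(100.0);
        resample.setInputFormat(data);
        Instances balanced = Filter.useFilter(data, resample);

        int[] treeCounts = {25, 50};
        int[] depths = {15, 20, 0}; // 0 means unlimited depth in Weka
        for (int trees : treeCounts) {
            for (int depth : depths) {
                RandomForest rf = new RandomForest();
                rf.setNumTrees(trees);
                rf.setMaxDepth(depth);
                rf.setNumFeatures(6); // 6 random features per split, as in Table I
                // Cross-validation on the re-sampled data is only a rough proxy
                // for the leaderboard score reported in the paper.
                Evaluation eval = new Evaluation(balanced);
                eval.crossValidateModel(rf, balanced, 10, new Random(1));
                System.out.printf("trees=%d depth=%d AUC=%.4f%n",
                        trees, depth, eval.areaUnderROC(1));
            }
        }
    }
}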
IV. OUTLOOK AND CONCLUSIONS

Churn is the hardest of the three problems posted for the KDD 2009 competition; IBM holds the highest ROC area score of 0.7651. In this study, we evaluate alternative machine learning methods aiming to match or improve the best scores recorded at the KDD 2009 competition, while also focusing on efficient use of computational resources. As a continuation of this study, we will include other ensemble methods and complete the comparative analysis, and we will work on intelligent ways of dealing with the dataset impurities and of reducing the overall complexity. We believe that this analysis will also lay a good foundation for other churn prediction problems, to which the proposed methods should be applicable with little or no modification.

REFERENCES

[1] SAS Institute, "Best Practice in Churn Prediction," SAS Institute White Paper, 2000.
[2] 2009 Knowledge Discovery and Data Mining competition, http://www.kddcup-orange.com
[3] I. Guyon, V. Lemaire, M. Boulle, G. Dror, and D. Vogel, "Analysis of the KDD Cup 2009: Fast Scoring on a Large Customer Database," JMLR: Workshop and Conference Proceedings, vol. 7, pp. 1-22, 2009.
[4] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA Data Mining Software: An Update," SIGKDD Explorations, vol. 11, no. 1, 2009.
[5] L. Breiman, "Random Forests," Machine Learning, vol. 45, pp. 5-32, 2001.
