
2012 IEEE 36th International Conference on Computer Software and Applications

Customer Churn Prediction for Telecom Services

Utku Yabas
School of Engineering and Computer Science
Izmir University of Economics
Izmir, Turkey
e-mail: [email protected]

Hakki Candan Cankaya
Computer Science and Engineering Department
Southern Methodist University
Dallas, Texas
e-mail: [email protected]

Turker Ince
School of Engineering and Computer Science
Izmir University of Economics
Izmir, Turkey
e-mail: [email protected]

Abstract—Customer churn is a major concern for telecom service providers because of its associated costs. This short paper summarizes our ongoing work on customer churn prediction for telecom services. We are working on data mining methods to accurately predict which customers will leave and turn to another provider for the same or a similar service. The sample dataset we use in our experiments was compiled by Orange Telecom from real data and was posted for the 2009 Knowledge Discovery and Data Mining competition. IBM scored highest on this dataset, at the cost of a significant amount of computational resources. We aim to find alternative methods that match or improve on the recorded highest score while using resources more efficiently. The dataset has a very large number of features and examples as well as many incomplete values. As a first step, we apply several methods to preprocess the dataset and correct its imperfections. We then compare and contrast various ensemble and single classifiers, and we conclude the paper with future directions for the study.

Keywords—churn prediction; machine learning; data mining; pattern recognition

I. INTRODUCTION

Rapid improvements and dynamics in the technology marketplace make customer retention a competitive effort. Especially in the saturated telecommunications market, incumbent service providers and newcomers offer deals and packages to consumers who are willing to churn to their services. On the defending end, strategies and counteroffers have to be prepared for potential churners, since it is more expensive to win a customer back once he or she has churned. According to a SAS Institute report [1], the annual rate of customer churn in the telecommunications industry is currently about 30%, with an upward trend that correlates with the growth of the market.

In this study, we concentrate on evaluating and analyzing the performance of different machine learning methods for accurate churn prediction. In 2009, the French telecom company Orange sponsored a competition in knowledge discovery and data mining (KDD) and posted three problems [2]. One of the problems was churn analysis and prediction, and a sample of real data was provided for use in the competition. In our study, we use this dataset for our experiments. Although the competition is over, improvements can still be made. The rankings are evaluated by the area under the Receiver Operating Characteristic (ROC) curve. We are working on ensemble methods to improve the solution to the churn prediction problem. First place on the KDD 2009 churn problem is still held by IBM [2]. We chose the same dataset because it includes all the features and challenges of churn prediction while revealing no information about individual customers. We can also compare against the other scores on the competition website, which helps us judge how we are doing.

II. METHODS USED AND POTENTIAL CONTRIBUTIONS

To tackle the churn prediction problem, we have been using machine learning algorithms and data mining tools. One of the popular tools in the field is Weka [4], an open-source data mining package developed by the University of Waikato in New Zealand. We also use additional libraries for the methods that are not implemented in Weka, and we built and implemented these additional methods ourselves; a minimal sketch of a Weka-driven training and evaluation workflow is shown below.
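The sketch below illustrates how such a Weka-based experiment could be wired together in Java; it is not the authors' code. It assumes a Weka 3.x-era API (RandomForest with setNumTrees, setMaxDepth, and setNumFeatures), a hypothetical ARFF export of the Orange training data named orange_churn_train.arff, and illustrative parameter values; the ROC-area scoring mirrors the KDD Cup evaluation described above.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ChurnExperiment {
    public static void main(String[] args) throws Exception {
        // Load a (hypothetical) ARFF export of the Orange churn training data.
        Instances data = new DataSource("orange_churn_train.arff").getDataSet();
        // The churn label is assumed to be the last attribute.
        data.setClassIndex(data.numAttributes() - 1);

        // Random Forest classifier; the parameter values here are illustrative.
        RandomForest rf = new RandomForest();
        rf.setNumTrees(50);    // number of trees in the forest
        rf.setMaxDepth(15);    // maximum depth of each tree
        rf.setNumFeatures(6);  // random features considered per split

        // 10-fold cross-validation, scored by ROC area as in the KDD Cup.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(rf, data, 10, new Random(1));
        System.out.println("ROC area: " + eval.areaUnderROC(1)); // class index 1 assumed positive
    }
}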

We have encountered several challenges in the study so far. The first challenge is the size of the dataset: it includes 100,000 examples described by 15,000 variables, divided equally into training and test sets, and the class labels of the test set are not published. The second concern is preprocessing of the variables, which are polluted by a large number of missing values and outliers. Some variables are not normalized and have very different dynamic ranges, some categorical variables have huge vocabularies, and some numerical variables take only a few distinct values. The required computational resources are also high: because of the large number of variables and examples, model building takes a long time. The other major challenge is the imbalance between positive and negative examples in the churn data; positive (churn) examples account for less than 10% of the total. A short sketch of how these properties can be inspected is given below.
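This is a hedged illustration, not part of the paper: it uses Weka's per-attribute statistics to report heavily missing variables and the class distribution. The file name, the 98% threshold, and the assumption that the churn label is the last attribute are ours.

import java.util.Arrays;
import weka.core.AttributeStats;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class DatasetInspection {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF export of the Orange churn training data.
        Instances data = new DataSource("orange_churn_train.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1); // assume churn label is last

        // Report attributes whose missing-value ratio exceeds 98%.
        for (int i = 0; i < data.numAttributes(); i++) {
            AttributeStats stats = data.attributeStats(i);
            double missingRatio = (double) stats.missingCount / data.numInstances();
            if (missingRatio > 0.98) {
                System.out.printf("%s: %.1f%% missing%n",
                        data.attribute(i).name(), 100 * missingRatio);
            }
        }

        // Class imbalance: counts of negative and positive (churn) examples.
        int[] classCounts = data.attributeStats(data.classIndex()).nominalCounts;
        System.out.println("class counts: " + Arrays.toString(classCounts));
    }
}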
After the preprocessing, we have focused mostly on ensemble methods. The algorithm that scores highest in our experiments is Random Forests [5]. A Random Forest is built from many decision trees: at each node of a single decision tree, m features are chosen at random and the best split over these m features is used to split that node. Each tree votes for a prediction, and the most popular prediction among the trees is the output of the forest; the voting step is illustrated in the short sketch below.
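As a small, self-contained illustration of that majority vote, the snippet below aggregates the predictions of individual trees. The DecisionTree interface and the toy trees are hypothetical stand-ins for the models Weka builds internally; they are not part of the paper.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ForestVoting {
    /** Hypothetical stand-in for one trained decision tree. */
    interface DecisionTree {
        String classify(double[] features);
    }

    /** The forest's output is the most popular prediction among its trees. */
    static String predict(List<DecisionTree> forest, double[] features) {
        Map<String, Integer> votes = new HashMap<>();
        for (DecisionTree tree : forest) {
            votes.merge(tree.classify(features), 1, Integer::sum);
        }
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .get().getKey(); // forest is assumed non-empty
    }

    public static void main(String[] args) {
        DecisionTree alwaysChurn = f -> "churn";
        DecisionTree threshold = f -> f[0] > 0.5 ? "churn" : "stay";
        List<DecisionTree> forest = List.of(alwaysChurn, threshold, threshold);
        System.out.println(predict(forest, new double[]{0.2})); // "stay" wins 2 to 1
    }
}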
III. CURRENT RESULTS

IBM's churn score was 0.7651, as can be seen on the leader board of the competition website [2]. In this study, our main aim is to improve on IBM's ROC area score. Several of the teams in the KDD 2009 competition decided to use ensemble methods, and from our own experimental results we observe that ensemble methods outperform individual classifiers on this dataset. Orange Telecom also posted a small version of the dataset, which includes 50,000 examples and 230 variables [2]. We have been performing experiments with this small dataset, and we plan to apply the best-performing set of methods to the larger dataset after improving the result on the small one.

Because the dimension of the feature set is significantly large, we decided to implement feature selection to improve the overall classification performance. We have experimented with the effects of different feature subsets on the performance of a classifier. We excluded features that have a single value or no values at all, and numeric features with very few distinct values were also excluded. Figure 1 shows the histogram of the values of a chosen sample numeric variable; a large percentage of the values of variable 9 are gathered around 0. A sketch of this kind of filtering follows the figure.

Figure 1. Histogram of Variable 9
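The following is one plausible way to express that filtering step with Weka's Remove filter; it is an illustration under our own assumptions (the file name and the rule "drop attributes with fewer than two distinct values"), not the authors' exact procedure.

import java.util.ArrayList;
import java.util.List;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class DropUninformativeFeatures {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("orange_churn_train.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Collect attributes that are constant or entirely empty.
        List<Integer> toDrop = new ArrayList<>();
        for (int i = 0; i < data.numAttributes(); i++) {
            if (i == data.classIndex()) continue;
            if (data.attributeStats(i).distinctCount < 2) {
                toDrop.add(i);
            }
        }

        // Remove them with Weka's Remove filter.
        Remove remove = new Remove();
        remove.setAttributeIndicesArray(toDrop.stream().mapToInt(Integer::intValue).toArray());
        remove.setInputFormat(data);
        Instances reduced = Filter.useFilter(data, remove);
        System.out.println("attributes kept: " + reduced.numAttributes());
    }
}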

About 50% of the variables have more than 98% of their values missing. Missing values in numerical variables are replaced with the mean, and missing values in categorical variables are treated as an additional category. Eleven of the 40 categorical variables have more than 1,000 distinct values; for these high-vocabulary variables we keep only the 10 most frequent values and merge all others into a single value named "Others". A small sketch of this merging rule follows.
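The "keep the 10 most frequent values, map everything else to Others" rule can be expressed compactly. The sketch below is a plain-Java illustration, independent of Weka; the toy column values and the top-2 cutoff in the example are made up for brevity.

import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RareCategoryMerger {
    /** Builds a value-to-value mapping that keeps the topK most frequent
     *  categories and sends every other category to "Others". */
    static Map<String, String> buildMapping(List<String> values, int topK) {
        Map<String, Long> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1L, Long::sum);
        }
        Set<String> keep = new LinkedHashSet<>();
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .limit(topK)
                .forEach(e -> keep.add(e.getKey()));
        Map<String, String> mapping = new HashMap<>();
        for (String v : counts.keySet()) {
            mapping.put(v, keep.contains(v) ? v : "Others");
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Toy example with topK = 2 instead of 10, just to show the behaviour.
        List<String> column = Arrays.asList("A", "A", "A", "B", "B", "C", "D");
        System.out.println(buildMapping(column, 2)); // A and B are kept; C and D become "Others"
    }
}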
Following the preprocessing and feature selection, we studied the parameters of the algorithms, focusing on the performance of ensemble methods. So far we have tried several ensemble and single methods, building churn prediction models with decision trees, Random Forests, and decision trees with bagging and boosting; we have also worked on bags of single classifiers. Ensemble methods such as boosting and Random Forests outperform single decision trees, and our goal is to better the existing churn prediction scores by using ensemble methods. From our observations, the Random Forests algorithm has the best score in our experiments so far. We ran these algorithms on a personal computer with a 2.66 GHz dual-core Intel processor and 4 GB of RAM; building a Random Forest with 50 trees takes about an hour on this platform.

Random Forests has parameters for the number of trees, the maximum depth of the trees, and the number of random features considered at each split. Some important results on the test set are shown in Table I. Each score is the ROC area of a Random Forest model with the given number of trees and maximum tree depth, where each tree is built with 6 random features. We observe that a Random Forest with 50 trees and a maximum depth of 15 gives the best result of 0.6533 (Table I). We could further improve the score with a re-sampling technique: re-sampling increases the number of positive examples and decreases the number of negative examples, which effectively assigns extra cost to the positive examples and counteracts the bias in classifier performance caused by the class imbalance. A sketch of such a parameter sweep, with an optional re-sampling step, follows Table I.

TABLE I. RANDOM FORESTS ROC AREA SCORES

                            Number of Trees
  Depth of the trees          25        50
  15                        0.6338    0.6533
  20                        0.6292    0.6503
  No limit                  0.6367    0.6530
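The sketch below shows how a grid like the one in Table I could be run with Weka, together with an optional supervised re-sampling step that biases the class distribution toward uniform. The file name, random seed, sample size, and filter settings are our assumptions; the paper does not specify them, and it scores models on the competition test set rather than by cross-validation.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.instance.Resample;

public class ParameterSweep {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("orange_churn_train.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Optional re-sampling: bias the class distribution toward uniform,
        // which raises the share of positive (churn) examples.
        Resample resample = new Resample();
        resample.setBiasToUniformClass(1.0);
        resample.setSampleSizePercent(100.0);
        resample.setInputFormat(data);
        Instances balanced = Filter.useFilter(data, resample);

        int[] treeCounts = {25, 50};
        int[] depths = {15, 20, 0}; // 0 means unlimited depth in Weka
        for (int trees : treeCounts) {
            for (int depth : depths) {
                RandomForest rf = new RandomForest();
                rf.setNumTrees(trees);
                rf.setMaxDepth(depth);
                rf.setNumFeatures(6); // 6 random features per split, as in Table I
                // Cross-validation on the re-sampled data is only a rough proxy
                // for the leaderboard score reported in the paper.
                Evaluation eval = new Evaluation(balanced);
                eval.crossValidateModel(rf, balanced, 10, new Random(1));
                System.out.printf("trees=%d depth=%d AUC=%.4f%n",
                        trees, depth, eval.areaUnderROC(1));
            }
        }
    }
}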
IV. OUTLOOK AND CONCLUSIONS

Churn is the hardest of the three problems posted for the KDD 2009 competition; IBM holds the highest ROC area score of 0.7651. In this study, we evaluate alternative machine learning methods aiming to match or improve the best scores recorded at the KDD 2009 competition, while also focusing on efficient use of computational resources. As a continuation of this study, we will include other ensemble methods and complete the comparative analysis, and we will work on intelligent ways of dealing with the dataset impurities and of reducing the overall complexity. We believe that this analysis will also lay a good foundation for other churn prediction problems, to which the proposed methods should be applicable with little or no modification.

REFERENCES

[1] SAS Institute, "Best Practice in Churn Prediction," SAS Institute White Paper, 2000.
[2] 2009 Knowledge Discovery and Data Mining competition, http://www.kddcup-orange.com
[3] I. Guyon, V. Lemaire, M. Boulle, G. Dror, and D. Vogel, "Analysis of the KDD Cup 2009: Fast Scoring on a Large Customer Database," JMLR: Workshop and Conference Proceedings, vol. 7, pp. 1-22, 2009.
[4] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA Data Mining Software: An Update," SIGKDD Explorations, vol. 11, no. 1, 2009.
[5] L. Breiman, "Random Forests," Machine Learning, vol. 45, pp. 5-32, 2001.
