Comparative Study of Customer Churn Prediction Based On Data Ensemble Approach
churn. One such industry is the credit card business, where a number of researchers have carried out extensive research on customer churn; Rajamohamed and Manokaran (2018) used rough clustering together with supervised learning algorithms to predict the churn of a credit card company. Customer churn in the service industry is considered a matter of survival for many organizations, so researchers have adopted various methods and techniques to predict churn. One such benchmark prediction was made by Coussement and Van den Poel (2008), who applied support vector machines to a subscription-based service company. The customer churn prediction study in telecommunications by Huang, B., et al. (2012) established models that future researchers could use as base models for their own work.

Machine Learning Algorithm:
A Naive Bayes classifier is a probabilistic machine learning model used for classification tasks. The crux of the classifier relies on Bayes' theorem.
Discriminant Function Analysis is a dimensionality reduction technique that is commonly used for supervised classification problems.
Decision trees are constructed via an algorithmic approach that identifies ways to split a data set under various conditions.
A random forest is a machine learning technique used to solve regression and classification problems. It utilizes ensemble learning, a technique that combines many classifiers to provide solutions to complex problems.
Logistic regression is a method for building machine learning models where the dependent variable is dichotomous, i.e. binary. Logistic regression is employed to describe data and the relationship between one dependent variable and one or more independent variables.
Ensemble methods are a machine learning technique that combines several base models in order to produce one optimal predictive model. Ensemble modeling is a process in which multiple diverse models are created to predict an outcome, either by using many different modeling algorithms or by using different training data sets. The ensemble model then aggregates the prediction of every base model and produces one final prediction for the unseen data.

Related Review of Literature:
The cardinal way to arrive at solutions for these problems is to forecast the probable churners (Hadden J, et al., 2007), and it becomes pertinent for organizations to use churn prediction as a tool (Abbasimehr H, 2011) to identify the customers who are at risk of churning. Most churn prediction work has used machine learning platforms to analyze the existing dataset (Ahmad AK, 2019). Various machine learning models have been compared for churn prediction using the AUC curve and the confusion matrix (Vafeiadis, T., et al., 2015). The Random Forest model has been used for churn prediction in the telecom industry in many cases (Idris, A., 2012). After the repeated use of Random Forest by many researchers where the data suited it, Xie, Y. (2009) used an improved balanced random forest model for the prediction, i.e., the imbalanced data was balanced by using a bootstrap method. Churn prediction using a data mining approach (Wei CP, Chiu IT 2002) shows how the data is to be wrangled before use, and this approach has provided results similar to those of machine learning.
Various researchers have also tried in various forums (Nath, S. V., & Behara, R. S., 2003) to discuss diverse ML algorithms for customer churn in different industries and eventually to arrive at an appropriate model for the pressing issue of churn prediction. Models such as linear regression, support vector machine, naïve Bayes and decision tree have been used by multiple researchers to predict potential customer churn (Lalwani, P., 2022). Ensemble modeling has been adopted in certain research to attest the robustness of the prediction, i.e., combining more than two models and evaluating the best-fit model to arrive at the best possible outcome (De Bock, K. W., 2011).

Problem statement:
Owing to the advancement of technology, people prefer to purchase products from the comfort of their homes, and post pandemic the share of consumers choosing the online mode of purchase has gone up significantly. In connection with that, this study concentrates on an E-Commerce company in India which is facing huge customer churn due to hyper-competition. The company is also under tremendous pressure to retain its existing consumers. Above all, the most prominent aspect of the problem is that losing one account means losing multiple users, i.e., more than one customer uses the same account, which is quite common in India.

Need of the Study:
The idea of this research is to understand the customers based on the provided data, segment them accordingly in order to develop various churn prediction models, choose an appropriate model which addresses the problem, and provide recommendations for organizing campaigns and offers that are profitable for the company.

Research Objectives:
• To divide the customers into various segments and understand the reasons for churning
• To understand the customers' expectations and develop a recommendation model
• To analyse the customers' behaviour towards the company in order to strategize for customer satisfaction

Empirical Methodology
This section elaborates the various analyses performed to work towards the stated objectives. As per the problem statement, the outcome of interest is whether a customer has churned or not, so classification algorithms are adopted for this problem. To further authenticate the results, the data has been run through several models: Naive Bayes, Discriminant Analysis, Decision Tree, Random Forest, and Logistic Regression. To further attest the outcome, data ensemble modeling with Ada Boost, Gradient Boosting and Bagging has been used, and cross-validation has been adopted for tuning the proposed models. The analysis started with understanding the given data: from the dataset provided it was straightforward to understand the attributes and to get a glimpse of the number of variables and the total number of rows. From the problem statement and the dataset it was evident that the target variable was 'Churn' and that there were 19 other variables provided for modeling the churn outcome.
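As an illustration of this methodology, the sketch below scores the five candidate classifiers with 5-fold cross-validation. It is a minimal sketch only: the synthetic data generated by make_classification stands in for the real 19-variable e-commerce dataset, and the estimator settings are plain defaults rather than the tuned configuration used in this study.

# Minimal sketch: cross-validated comparison of the five candidate classifiers.
# The synthetic data below is only a stand-in for the real e-commerce dataset.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=19, n_informative=8,
                           weights=[0.83], random_state=42)  # imbalanced stand-in data

models = {
    "Naive Bayes": GaussianNB(),
    "Discriminant Analysis": LinearDiscriminantAnalysis(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

for name, model in models.items():
    recall = cross_val_score(model, X, y, cv=5, scoring="recall")
    print(f"{name:22s} mean cross-validated recall = {recall.mean():.3f}")

Recall is used as the scoring metric in this sketch because, as the tables later in the paper show, it is the value that most often rules a model out.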
The work then proceeded with exploratory data analysis, where both univariate and bivariate analysis (Fig. 1) were carried out to understand the given dataset.

Figure 1: Univariate Analysis

It was evident that the target variable 'Churn' was not evenly spread across its classes, i.e., Yes was 86.16% and No was 16.83%; one of the two classes is larger than the other, i.e., Yes is more than No. In order to balance the data by oversampling the minority class, the SMOTE technique was adopted. To take the research further, the missing values for each variable were identified, and it was found that almost all the variables have missing values except four: 'Account ID', 'Churn', 'revenue growth year on year' and 'coupon used for payment'. Categorical variables such as 'payment', 'gender', 'account segment', 'marital status' and 'login device' were imputed using the mode, and all other numerical variables were imputed using the mean. To check the dataset for extreme values, outlier treatment (Fig. 2) was applied; most of the variables have outliers present, with the variable 'Churn' showing the highest share of outliers at 16.38%, whereas the rest of the data has a negligible percentage of outliers.

Figure 2: Outliers
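A minimal sketch of this preprocessing step follows. The file name and column names are illustrative assumptions rather than the exact dataset schema, the SMOTE step requires the imbalanced-learn package, and 'Churn' is assumed to be coded as 0/1.

# Sketch: mode imputation for categorical columns, mean imputation for numerical
# columns, then SMOTE oversampling of the minority class (needs imbalanced-learn).
import pandas as pd
from sklearn.impute import SimpleImputer
from imblearn.over_sampling import SMOTE

df = pd.read_csv("ecommerce_churn.csv")        # hypothetical file name
cat_cols = ["payment", "gender", "account_segment", "marital_status", "login_device"]
num_cols = [c for c in df.columns if c not in cat_cols + ["Churn", "AccountID"]]

df[cat_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[cat_cols])
df[num_cols] = SimpleImputer(strategy="mean").fit_transform(df[num_cols])

X = pd.get_dummies(df.drop(columns=["Churn", "AccountID"]), columns=cat_cols)
y = df["Churn"]                                # assumed to be 0/1 encoded

X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print(pd.Series(y).value_counts(normalize=True))      # class shares before balancing
print(pd.Series(y_res).value_counts(normalize=True))  # class shares after balancing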
Figure 3: Pearson Correlation via Heat Map

Naive Bayes Model (Test): Accuracy 87%, Recall/Sensitivity 50%, Precision/Specificity 66%, AUC 82%, F1 Score 87%

Figure 4: Train Data & Test Data
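Figure 4 and the tables that follow report each model on a train/test split using Accuracy, Recall, Precision, AUC and F1 Score. The sketch below continues from the preprocessing sketch above (X_res, y_res) and evaluates a Naive Bayes baseline with those five metrics; the 70/30 split ratio is an assumption, not a figure stated in the paper.

# Sketch: hold-out evaluation of a Naive Bayes baseline with the five reported metrics.
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             roc_auc_score, f1_score)

X_train, X_test, y_train, y_test = train_test_split(
    X_res, y_res, test_size=0.3, stratify=y_res, random_state=42)

nb = GaussianNB().fit(X_train, y_train)
pred = nb.predict(X_test)
proba = nb.predict_proba(X_test)[:, 1]

print("Accuracy :", round(accuracy_score(y_test, pred), 3))
print("Recall   :", round(recall_score(y_test, pred), 3))
print("Precision:", round(precision_score(y_test, pred), 3))
print("AUC      :", round(roc_auc_score(y_test, proba), 3))
print("F1 Score :", round(f1_score(y_test, pred), 3))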
Table 3 above, along with Figure 6, shows that Accuracy is high whereas the Recall value is not good enough for the model to be used. Even though the train and test sets look similar, the model cannot be chosen because of the low Recall values, even though the Precision value seems good.

Table 2 and Figure 5 likewise show that Accuracy is high whereas the Recall value is not good enough for the model to be used. Even though the train and test sets look similar, the model cannot be chosen.

Table 4. Random Forest
Models | Train/Test | Accuracy | Recall/Sensitivity | Precision/Specificity | AUC | F1 Score
Random Forest | Train | 92% | 64% | 86% | 97% | 92%
From Table 4 above and Figure 7, one can conclude that even though Accuracy and Specificity are high, the model cannot be chosen due to the moderate Recall value.

Table 5. Logistic Regression
Models | Train/Test | Accuracy | Recall/Sensitivity | Precision/Specificity | AUC | F1 Score
Logistic Regression | Train | 88% | 42% | 76% | 86% | 88%
Logistic Regression | Test | 88% | 43% | 77% | 86% | 88%

Figure 8. Train Data & Test Data

From Table 5 and Figure 8, it is evident that these results are obtained on oversampled data; this method (SMOTE) overcomes the imbalance in the dataset by generating artificial observations for the minority class, shifting the minority towards the majority. Four ensemble modeling approaches (1. Ada Boost, 2. Gradient Boosting, 3. Bagging and 4. Random Forest) have then been used.

Table 6. Comparison of Ensemble Modeling
Models | Train/Test | Accuracy | Recall/Sensitivity | Precision/Specificity | AUC | F1 Score
Ensemble Learning - Ada Boost | Train | 90% | 59% | 76% | 92% | 90%
Ensemble Learning - Ada Boost | Test | 90% | 58% | 76% | 92% | 90%
Ensemble Learning - Gradient Boosting | Train | 92% | 63% | 83% | 95% | 92%
Ensemble Learning - Gradient Boosting | Test | 91% | 61% | 82% | 95% | 91%
Ensemble Learning - Bagging | Train | 100% | 100% | 100% | 100% | 100%
Ensemble Learning - Bagging | Test | 97% | 86% | 93% | 100% | 97%
Ensemble Learning - Random Forest | Train | 100% | 100% | 100% | 100% | 100%
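A minimal sketch mirroring the comparison in Table 6 is given below. It continues from the train/test split sketched earlier, and the estimators are left at their defaults, which are assumptions rather than the tuned settings behind the reported figures.

# Sketch: fit the ensemble models and compare train vs. test performance,
# in the spirit of Table 6 (default hyperparameters, not the tuned ones).
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.metrics import accuracy_score, recall_score

ensembles = {
    "Ada Boost": AdaBoostClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Bagging": BaggingClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}

for name, model in ensembles.items():
    model.fit(X_train, y_train)
    for split, (Xs, ys) in [("Train", (X_train, y_train)), ("Test", (X_test, y_test))]:
        pred = model.predict(Xs)
        print(f"{name:18s} {split}: accuracy={accuracy_score(ys, pred):.0%}, "
              f"recall={recall_score(ys, pred):.0%}")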
For the Ada Boost and Gradient Boosting models in Table 6, Accuracy and Specificity are high whereas the Recall value is only moderate for the model to be considered fit. In the case of the Bagging model, however, Accuracy, Specificity and Recall values are all high, both AUC and F1 Score are good enough for the model to be considered, and the train and test sets look similar; this makes it a model that can be chosen to predict the churn outcome. In the case of the Ensemble Random Forest model, Accuracy, Specificity and Recall values are likewise high, both AUC and F1 Score are good enough, and the train and test sets look similar, so it too can be chosen to predict the churn outcome.

Best Model:
The above models give a clear idea of the appropriate model to use based on Accuracy, Recall, Precision, AUC and F1 Score. Out of the 5 models, 2 models, namely Decision Tree and Random Forest, are considered better than the others. The reason for choosing these 2 models is their high accuracy, moderate sensitivity and reasonable specificity; for both, AUC and F1 Score were good enough for the model to be considered. Of the 2 models, the Random Forest model was considered the best fit because it is higher in accuracy, specificity, AUC and F1 Score while only moderate in sensitivity compared with the Decision Tree model. So the Random Forest model is considered the best-fit model. After ensemble modeling, the Bagging and Random Forest models were considered almost similar to each other; of the two, Bagging is considered the best fit based on its high Accuracy, Sensitivity, Specificity, AUC and F1 Score. Out of all the models, before and after tuning by ensemble modeling, Bagging was considered to be the best-fit model.

Insights:
Comparing by gender, male customers (59.6%) have a higher probability of churning than female customers (39.5%), and in the case of login device, mobile users (66%) have a higher probability of churning than computer users (27%). More customers have churned when the number of calls to customer care has been in the range of 7 to 17 in the last 12 months, and more customers have churned within the initial 2 years, a pattern that subsides drastically after the 11th year. Customers whose marital status was married (53%) were the most frequent churners, followed by single customers (29%), and customers with less than 3 years of tenure (25%) were also frequent churners. Not receiving cashback (8%) prompted customers to churn, and the number of days without contact with customer care (7%) also has a substantial influence on customer churn. How often customers contacted customer care in the last 12 months (7%) likewise influences the target variable, and finally, when there are few or no complaints in the last 12 months (6%), customer churn is minimal.
Recommendations:
The following recommendations would be appropriate to implement considering the insights obtained from the model. Concentrating on women-centric promotions will bring in more female customers, and the reasons for churn among mobile device users should be identified and rectified. Try to minimize the number of calls a customer makes to customer care by solving the issues and responding with a "happy to serve you" gesture. Concentrate on acquiring long-term customer profiles, for example through a referral bonus. Try to minimize complaints, since with few or no complaints in the last 12 months (6%) customer churn is minimal. Try providing substantial offers, such as 30% - 60% discounts, for customers who register for a longer tenure. Strategize loyalty points or benefits to retain customers for a longer period by providing maximum satisfaction. Providing a minimum cashback benefit (of 3% - 10%) for all customers who have renewed on time would be ideal. Identify the reasons for which customers call and devise a process to maximize the satisfaction level by rectifying the problems. Evaluate the reasons for complaints over the last 3 years or so and find an appropriate solution to provide a satisfactory service.

Conclusion:
The company needs a churn prediction model to address the segmented customers and provide offers in order to retain them. It is pertinent to have clear, outcome-based recommendations for the campaigns and the offers to be provided in them. The outcome of the research should suggest appropriate offers that are profitable for the company. Only by knowing its customers well (why customers churn) can the company meet its financial goals as well as its societal objectives. This report gives the company a clear understanding of the behaviour of its customers, which in turn can improve its offerings and satisfy them. Since customers are part of society, this gives the company an immense opportunity to satisfy the customer by knowing their tastes and preferences while achieving its financial goals.

References:
➢ Abbasimehr, H., Setak, M., & Tarokh, M. (2011). A neuro-fuzzy classifier for customer churn prediction. International Journal of Computer Applications, 19(8), 35–41.
➢ Ahmad, A. K., Jafar, A., & Aljoumaa, K. (2019). Customer churn prediction in telecom using machine learning in big data platform. Journal of Big Data, 6(1), 28.
➢ Coussement, K., & Van den Poel, D. (2008). Churn prediction in subscription services: An application of support vector machines while comparing two parameter-selection techniques. Expert Systems with Applications, 34(1), 313-327.
➢ De Bock, K. W., & Van den Poel, D. (2011). An empirical evaluation of rotation-based ensemble classifiers for customer churn prediction. Expert Systems with Applications, 38(10), 12293-12301.
➢ Hadden, J., Tiwari, A., Roy, R., & Ruta, D. (2007). Computer assisted customer churn management: State-of-the-art and future trends. Computers & Operations Research, 34(10), 2902–2917.
➢ Huang, B., Kechadi, M. T., & Buckley, B. (2012). Customer churn prediction in telecommunications. Expert Systems with Applications, 39(1), 1414-1425.
➢ Idris, A., Rizwan, M., & Khan, A. (2012). Churn prediction in telecom using Random Forest and PSO based data balancing in combination with various feature selection strategies. Computers & Electrical Engineering, 38(6), 1808-1819.
➢ Lalwani, P., Mishra, M. K., Chadha, J. S., & Sethi, P. (2022). Customer churn prediction system: a machine learning approach. Computing, 104(2), 271-294.
➢ Nath, S. V., & Behara, R. S. (2003, November). Customer churn analysis in the wireless industry: A data mining approach. In Proceedings - Annual Meeting of the Decision Sciences Institute (Vol. 561, pp. 505-510).
➢ Rajamohamed, R., & Manokaran, J. (2018). Improved credit card churn prediction based on rough clustering and supervised learning techniques. Cluster Computing, 21(1), 65–77.
➢ Kumar, S. D., Soundarapandiyan, K., & Meera, S. (2022). Commiserating customers' purchasing pattern using market basket analysis. 2022 1st International Conference on Computational Science and Technology (ICCST), Chennai, India, pp. 1-4. doi: 10.1109/ICCST55948.2022.10040320.
➢ Vafeiadis, T., Diamantaras, K. I., Sarigiannidis, G., & Chatzisavvas, K. C. (2015). A comparison of machine learning techniques for customer churn prediction. Simulation Modelling Practice and Theory, 55, 1-9.
➢ Wei, C. P., & Chiu, I. T. (2002). Turning telecommunications call details to churn prediction: a data mining approach. Expert Systems with Applications, 23(2), 103–112.
➢ Xie, Y., Li, X., Ngai, E. W. T., & Ying, W. (2009). Customer churn prediction using improved balanced random forests. Expert Systems with Applications, 36(3), 5445-5449.