
2024 10th International Conference on Communication and Signal Processing (ICCSP)
979-8-3503-5306-8/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICCSP60870.2024.10543502

Enhancing Customer Retention Strategies:
Predicting Churn Rate in Telecom Sectors Using Machine Learning Ensemble Techniques

Apoorva Vikrant Kulkarni, Chinmay Chourdia, Stuti Roy
Symbiosis Centre for Information Technology, Symbiosis International (Deemed University), Pune, India
[email protected], [email protected], [email protected]

Bhuvaneswari Arunagiri
Anna University & Adhiparasakthi Engineering College, India
[email protected]

Abstract— In the competitive telecommunications industry, losing clients puts revenue and marketing investment at risk. This study focuses on churn prediction using ensemble methods such as RandomForest, ExtraTrees, LightGBM, and XGBoost. Evaluating eight algorithms, including Decision Tree and Gradient Boosting, AdaBoost emerges as the best performer with the highest F1 score, ensuring a balanced trade-off between precision and recall. The study advises telecom companies on strategically important retention improvements enabled by machine learning. The findings provide a strategy for telecommunication companies to improve customer retention in a dynamic market.

Keywords—Churn, Machine Learning, Accuracy, Precision, Decision Tree, K-Nearest Neighbors (KNN), Gradient Boosting, AdaBoost, Random Forest, Extra Trees, LightGBM, XGBoost.

I. INTRODUCTION
In the telecommunications industry, which is characterized by intense competition and many players, customer turnover is a major challenge: customers are offered many options, which accelerates the battle for market share. This phenomenon reflects the risk of customers switching to other service providers. As a growing problem in the rapidly expanding and competitive telecommunications industry, the focus has shifted from acquiring new customers to retaining existing customers due to the high costs involved [1]. The purpose of this study is to explore the complexities of customer attrition, investigate the influencing factors, and propose strategic solutions for telecommunications companies amid this dynamic and challenging environment.

Customer churn, the termination of a telecom subscription, poses a significant threat to revenue, urging companies to prioritize retention efforts in today's commercial landscape. Acquiring new customers is reported to be up to 10 times more expensive than retaining existing ones. Churn, influenced by factors like low satisfaction and aggressive competition, necessitates effective retention strategies. In subscription-based industries like insurance and telecommunications, calculating churn is paramount; customer churn analysis becomes crucial for tasks like profiling customers, scrutinizing defections, and estimating attrition likelihood.

The telecommunications industry's landscape has been further shaped by the increasing adoption of Over-The-Top (OTT) channels like Netflix and Amazon Video. This digital shift, amplified by the global COVID-19 pandemic, has not only led to a surge in digital media consumption but has also heightened the demand for higher bandwidths and efficient network management by telecom companies [2]. Governments and regulatory bodies are intervening to ensure seamless internet traffic flow, presenting additional challenges for telecom service providers.

In the recent past, advancements in knowledge and technology have brought about transformative changes across various sectors. These developments, often referred to as the "Fourth Industrial Revolution" [2], have led to a convergence of the physical and digital realms.

The primary objective of this research is to employ machine learning algorithms, including RandomForest, ExtraTrees, LightGBM, and XGBoost, to predict customer churn in the telecom sector [3]. Focusing on Customer Relationship Management (CRM) strategies, the study goes beyond predicting churn to identify influential factors. By leveraging data mining and machine learning, the research aims to establish an efficient churn model, recognizing the importance of understanding the distinctive characteristics of churned customers. Intending to enhance customer retention, the study emphasizes advising the company on effective strategies, considering churn prediction as a central theme crucial for telecom success [4][5].

II. LITERATURE REVIEW
Customer churn poses a significant threat to subscription-based businesses, impacting enterprise value. Traditional satisfaction data often falls short in addressing root causes, necessitating a shift in understanding the customer experience. This challenge is exemplified by Alex, a typical consumer facing many competing choices in today's market.

Recognizing customer retention as crucial, businesses invest strategically for efficient growth.

Churn events arise when the customer experience falls below a critical threshold. Existing methods like satisfaction surveys prove inadequate in capturing impulsive customer decisions. Proactive measures are essential to identify nuanced dissatisfaction factors.

The research employs supervised learning algorithms to predict customer churn, utilizing eight algorithms: KNN, Random Forest, XGBoost, AdaBoost, Decision Tree, Gradient Boosting, Extra Trees, and LightGBM.
• The KNN (K-Nearest Neighbors) algorithm is a machine learning method that makes classifications or predictions by examining the majority class of its nearest neighbours in the feature space. It is notable for its simplicity and versatility [6].
• Random Forest, an ensemble algorithm, prevents overfitting and efficiently handles high-dimensional data, demonstrating lower classification errors [7].
• XGBoost, recognized for speed and accuracy, outperforms other gradient boosting implementations, excelling in classification and regression tasks [8].
• AdaBoost improves the performance of other algorithms by focusing on misclassified data, demonstrating effectiveness in applications like fingerprint classification [9].
• Decision Tree classifiers, a popular way of representing classifiers, find applications in diverse fields due to their interpretability and ease of understanding [10].
• Gradient Boosting, a technique for regression and classification, constructs a predictive model through an ensemble of weak prediction models [11].
• Extra Trees, an extension of Random Forest, enhances model robustness by introducing additional randomization during tree-building. It considers multiple random splits for each feature, guarding against overfitting and improving generalization compared to traditional Random Forests [12].
• LightGBM, an efficient gradient boosting framework, excels in tasks like classification and regression and is optimized for distributed and efficient training [13].

The literature review highlights the significance of customer churn prediction in the telecom sector. Existing studies explore various machine learning methods, emphasizing the superiority of boosted versions and ensemble techniques, particularly Random Forest. Proposed models leverage Random Forest, achieving 88.63% accuracy in classifying instances and identifying specific churn factors for targeted strategies. However, research gaps suggest further exploration of clustering techniques for a nuanced understanding of customer dynamics and enhanced predictive strategies [14].

The challenge of customer churn requires advanced data analysis and predictive methodologies. Logistic regression, outperforming other methods, provides valuable insights into churn factors. Yet, there is a gap in exploring hybrid models for a more nuanced understanding, suggesting future research opportunities [15].

III. METHODOLOGY

Fig. 1. Process Flow Diagram

A. Dataset
The telecom customer dataset, sourced from Kaggle and originally uploaded by user Blastchar, provides valuable insight into a California-based telecom company in the USA and comes with a comprehensive data dictionary. With a total of 21 columns, comprising 19 features and 1 target column (Churn), it contains 7043 rows, each representing a distinct customer. The dataset is a balanced mix of categorical and numerical features, making it versatile for in-depth analysis. Its documentation highlights its importance in predicting customer behaviour and developing targeted customer retention programs. The dataset is a reliable resource for understanding the dynamics of telecommunications customer relationships and provides valuable information for both research analysis and predictive modelling.

B. Loading Dataset
The telecom customer prediction dataset was loaded into Python by specifying the path of the CSV file stored locally.

C. Importing Required Libraries
The next step involved importing the necessary libraries to simplify tasks and keep the code concise. Prewritten code in Python libraries helps solve common problems and makes the data preprocessing steps more efficient. The imported libraries, shown in Figure 2, provide ready access to the functionality used in the rest of the pipeline.
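As a rough illustration of the loading step described in Sections III-B and III-C, the following minimal sketch assumes a local copy of the Kaggle Telco churn CSV; the file name and the sanity checks are illustrative rather than the authors' exact code:

# Illustrative setup: imports and dataset loading.
# The file name is an assumption; point it at the local copy of the
# Kaggle "Telco Customer Churn" CSV uploaded by Blastchar.
import pandas as pd

df = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")

# Quick sanity checks against the figures reported in the paper:
# 7043 rows, 21 columns, and a binary Churn target.
print(df.shape)
print(df["Churn"].value_counts())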

Fig. 2. Importing Libraries

D. Importing Data
Next, the dataset, a CSV file containing tabular data, was imported. The read_csv() function was used to create a data frame that provides a structured representation of the dataset for further analysis. This systematic approach ensures that the necessary tools and resources are readily available in the later stages of data exploration and churn forecasting.

E. Classification Report
• Accuracy:
Accuracy is a key performance measure, expressed as a percentage, that indicates the proportion of correctly predicted cases in the test data. It estimates the overall closeness of the predictions to the actual values:
Accuracy = (TP + TN) / (TP + TN + FP + FN)

Fig. 3. Accuracy Formula
• Precision:
Precision measures the model's ability to correctly predict a given class, focusing on the proportion of true positives among all positive predictions:
Precision = TP / (TP + FP)

Fig. 4. Precision Formula
• Recall:
Recall, sometimes referred to as sensitivity or the true positive rate, calculates the proportion of actual positive samples that are correctly identified:
Recall = TP / (TP + FN)

Fig. 5. Recall Formula
• F1-Score:
The F1-Score is a comprehensive metric for evaluating binary classification that strikes a balance between recall and precision. It gives an overview of the prediction performance of a model for both balanced and imbalanced datasets:
F1 = 2 × (Precision × Recall) / (Precision + Recall)

Fig. 6. F1-Score Formula
A high F1 score indicates a model with few false positives and few false negatives, providing a reliable estimate of its performance.

IV. DATA ANALYSIS
• Tenure Matters: the bar chart shows a key factor affecting customer turnover, contract length. It shows the number of customers in each tenure group (contract duration) who churned (cancelled service). A clear trend is visible: turnover is highest in the shortest tenure group (0-12 months) and decreases continuously as tenure increases. This suggests that customers with shorter contracts are more likely to leave, which may be due to:
- Lower perceived value: customers feel less invested in the service with a shorter commitment.
- More flexibility: an easier escape route if they are not happy.
- Promotional rates: introductory offers on shorter contracts may lead to higher renewal costs, which increase turnover.

Fig. 7. Bar Graph (Churn-Tenure)

• Longer Tenures Linked to Higher Monthly Charges: the scatter plot displays the distribution of monthly charges across different customer tenures (0-12, 12-24, 24-48, and over 48 months). Longer tenures exhibit a broader range and higher median monthly charges; for instance, the median for tenures over 48 months is $80, compared to $60 for 0-12 months. Potential reasons include higher usage levels among longer-tenured customers or older, costlier rate plans. Outliers indicate customers with significantly higher or lower charges, possibly due to heavy data use or promotional rates. The plot suggests a correlation between tenure and monthly charges, highlighting factors that influence customer billing variations.
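A rough sketch of how the tenure-based summaries behind Figs. 7 and 8 could be reproduced with pandas (the column names follow the public Kaggle Telco dataset, while the binning and the summaries themselves are illustrative assumptions, not the authors' exact analysis):

# Illustrative tenure-group analysis; assumes df from the loading sketch above.
import pandas as pd

bins = [0, 12, 24, 48, df["tenure"].max()]
labels = ["0-12", "12-24", "24-48", "48+"]
df["tenure_group"] = pd.cut(df["tenure"], bins=bins, labels=labels, include_lowest=True)

# Churn counts per tenure group (basis for a Fig. 7-style bar chart).
print(df.groupby("tenure_group")["Churn"].value_counts().unstack(fill_value=0))

# Median monthly charges per tenure group (basis for a Fig. 8-style comparison).
print(df.groupby("tenure_group")["MonthlyCharges"].median())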

Fig. 8. Scatter Plot (Charges-Tenure)

• High spenders are more likely to churn: this scatter plot highlights the correlation between increases in monthly and total spending and higher customer turnover, represented by the orange dots. The trends indicate that turnover is affected by both immediate monthly costs and the cumulative financial burden. This could be due to budget constraints, dissatisfaction with expensive plans, or perceived lower value, especially among customers with a higher total spend.

Fig. 9. Scatter Plot (Churn-Charges)

• Higher total charges are associated with churn: the box plot illustrates the wider range of total charges for one- and two-year contracts compared to month-to-month contracts, with higher medians. For example, the median total charge for two-year contracts is around 6,000, while for month-to-month contracts it is around 2,000. In addition, there are significant differences in the distribution of total charges between churned and retained customers, especially for one-year and two-year contracts. The median total charge of churned customers is higher, for example about 7,000 for two-year contracts compared to about 5,000 for retained customers. In summary, the box plot highlights the differences in total charges between contract types and customer outcomes, as well as the potential financial impacts on customer behaviour.

Fig. 10. Box Plot (Charges-Tenure)

• Contract duration and monthly payments jointly increase turnover: the graph shows a double effect on telecommunications customer turnover, from longer contracts and increases in monthly payments. When both factors increase, turnover increases, which suggests that customers carrying more financial responsibility and stress are more likely to leave. Key strategies to address this include offering competitive renewal contracts, offering flexible plan options to accommodate changes in usage, and ensuring transparent billing to avoid bill shock and build trust. Fundamentally, understanding and managing the interplay between contract length and monthly payments is critical to minimizing turnover in the telecommunications industry.

Fig. 11. Bar Graph (Correlation)

V. RESULT AND DISCUSSION
Comparison of Performance Metrics
• Accuracy:
The graph compares the accuracy of eight machine learning models on the telecom churn classification task, ranging from 73.05% (KNN) to 78.72% (AdaBoost). High performers include AdaBoost and Gradient Boosting (78%+), mid-range performers are Decision Tree, LightGBM, and XGBoost (76-77.5%), while lower performers are KNN, Random Forest, and Extra Trees (<77%). High accuracy benefits telecom operators by reducing churn, enhancing fraud detection, optimizing network performance, and providing personalized customer experiences. Considerations include data quality, cost-benefit analysis, and interpretability.
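As a rough sketch of how such a comparison could be produced (scikit-learn, XGBoost, and LightGBM classifiers with default settings; the preprocessing, train/test split, and macro-averaged scoring below are illustrative assumptions rather than the authors' exact pipeline):

# Illustrative eight-model comparison; assumes df from the loading sketch above.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              AdaBoostClassifier, GradientBoostingClassifier)
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Simple illustrative preprocessing: coerce TotalCharges (stored as text in the
# public CSV) to numeric, one-hot encode categoricals, and binarize the target.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce").fillna(0)
y = (df["Churn"] == "Yes").astype(int)
X = pd.get_dummies(df.drop(columns=["customerID", "Churn"]), drop_first=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "K-Nearest Neighbour": KNeighborsClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Extra Trees": ExtraTreesClassifier(random_state=42),
    "LightGBM": LGBMClassifier(random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rows.append({
        "Model": name,
        "Precision": precision_score(y_test, pred, average="macro"),
        "Recall": recall_score(y_test, pred, average="macro"),
        "F1-Score": f1_score(y_test, pred, average="macro"),
        "Accuracy": accuracy_score(y_test, pred),
    })

# Comparable in spirit to Table I; exact numbers depend on preprocessing and splits.
print(pd.DataFrame(rows).round(2))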

AdaBoost leads with a precision of 0.74, recall of 0.70, F1-score of 0.71, and accuracy of 0.79, excelling in identifying at-risk customers. Gradient Boosting follows with balanced performance (precision 0.73, recall 0.68, F1-score 0.69, accuracy 0.77). Random Forest, LightGBM, and XGBoost offer competitive results across metrics. Decision Tree and Extra Trees prioritize interpretability, while KNN exhibits lower precision (0.66) and recall (0.63). AdaBoost is recommended for precision-focused applications, while the other models offer balanced trade-offs between precision, recall, and accuracy, aligned with diverse business objectives. Regular monitoring ensures continued effectiveness in predicting customer churn.

Fig. 12. Bar Graph (Model Accuracy)

TABLE I. SCORES OBTAINED FROM DIFFERENT MODELS

Model               | Precision | Recall | F1-Score | Accuracy
Decision Tree       | 73%       | 68%    | 70%      | 78%
K-Nearest Neighbour | 66%       | 63%    | 64%      | 73%
Gradient Boosting   | 73%       | 68%    | 69%      | 77%
AdaBoost            | 74%       | 70%    | 71%      | 79%
Random Forest       | 72%       | 68%    | 69%      | 77%
Extra Trees         | 69%       | 66%    | 67%      | 75%
LightGBM            | 72%       | 69%    | 70%      | 77%
XGBoost             | 71%       | 68%    | 69%      | 77%

Fig. 13. Table (Confusion Matrix Scores)

• ROC-AUC Curves:
The ROC curve evaluates binary classification models by describing the trade-off between the true positive rate (TPR) and the false positive rate (FPR) at different thresholds. In the graph comparing the machine learning models on the telecom data, AdaBoost stands out with the largest area under the curve (AUC), indicating better overall performance. AdaBoost distinguishes well between positive and negative cases and minimizes errors. This brings benefits to telecom operators, including reduced customer attrition, better fraud detection, and optimized network performance. In summary, AdaBoost's high AUC demonstrates its effectiveness in key telecommunications applications and provides valuable information for strategic decision-making.

Fig. 14. ROC-AUC Curve

VI. LIMITATION AND FUTURE DIRECTION
This study has several limitations that deserve to be acknowledged. In particular, the imbalance of the dataset represents an important limitation. A ratio of roughly 100:36 between the majority class (customers who stay) and the minority class (customers who leave) can prevent machine learning models from learning effectively from the minority class. This imbalance makes it difficult to accurately predict churn and can bias the results in favour of the majority class. Furthermore, the dataset used in this study covers only some of the characteristics related to customers switching telecommunication companies. Excluding various possible influencing factors limits the generalizability of the results to other data and scenarios. It is important to understand that the research and findings may not be universally applicable due to these limitations.

VII. CONCLUSION
A comparison of eight machine learning models for churn prediction shows the strength of AdaBoost, which leads in precision (0.74) and accuracy (0.79). With accuracies ranging from 73.05% (KNN) to 78.72% (AdaBoost), the models offer the possibility to significantly reduce turnover. AdaBoost's precision-focused performance highlights its effectiveness in identifying high-risk customers. The results show how machine learning can boost network performance, reduce turnover, and improve individual customer experiences. Continuous monitoring and consideration of factors such as data quality ensure continued effectiveness in preventing and controlling customer turnover in the telecommunications sector.

VIII. REFERENCES
[1] John Hadden, Ashutosh Tiwari, Rajkumar Roy, Dymitr Ruta, "Computer-assisted customer churn management: State-of-the-art and future trends," Computers & Operations Research, Volume 34, Issue 10, 2007, Pages 2902-2917, ISSN 0305-0548. https://doi.org/10.1016/j.cor.2005.11.007.
[2] Keun Lee, Franco Malerba, "Catch-up cycles and changes in industrial leadership: Windows of opportunity and responses of firms and countries in the evolution of sectoral systems," Research Policy, Volume 46, Issue 2, 2017, Pages 338-351, ISSN 0048-7333. https://doi.org/10.1016/j.respol.2016.09.006.

[3] Alzubi, J., Nayyar, A., & Kumar, A. (2018). Machine Learning from Theory to Algorithms: An Overview. Journal of Physics: Conference Series, 1142, 012012. https://doi.org/10.1088/1742-6596/1142/1/012012.
[4] Bin, L., Peiji, S., & Juan, L. (2007). Customer Churn Prediction Based on the Decision Tree in Personal Handyphone System Service. 2007 International Conference on Service Systems and Service Management. https://doi.org/10.1109/icsssm.2007.4280145.
[5] Özdemir, O., Batar, M., & Işık, A. H. (2020). Churn Analysis with Machine Learning Classification Algorithms in Python. Artificial Intelligence and Applied Mathematics in Engineering Problems, 844–852. https://doi.org/10.1007/978-3-030-36178-5_73.
[6] Taunk, K., De, S., Verma, S., & Swetapadma, A. (2019). A Brief Review of Nearest Neighbor Algorithm for Learning and Classification. 2019 International Conference on Intelligent Computing and Control Systems (ICCS). https://doi.org/10.1109/iccs45141.2019.9065747.
[7] Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32. https://doi.org/10.1023/a:1010933404324.
[8] Peng, Z., Huang, Q., & Han, Y. (2019). Model Research on Forecast of Second-Hand House Price in Chengdu Based on XGBoost Algorithm. 2019 IEEE 11th International Conference on Advanced Infocomm Technology (ICAIT). https://doi.org/10.1109/icait.2019.8935894.
[9] Zhang, Y., Ni, M., Zhang, C., Liang, S., Fang, S., Li, R., & Tan, Z. (2019). Research and Application of AdaBoost Algorithm Based on SVM. 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC). https://doi.org/10.1109/itaic.2019.8785556.
[10] B. Charbuty and A. Abdulazeez, "Classification Based on Decision Tree Algorithm for Machine Learning," JASTT, vol. 2, no. 01, pp. 20-28, Mar. 2021. https://doi.org/10.38094/jastt20165.
[11] O. González-Recio, J. A. Jiménez-Montero, and R. Alenda, "The gradient boosting algorithm and random boosting for genome-assisted evaluation in large data sets," Journal of Dairy Science, Volume 96, Issue 1, 2013, Pages 614-624, ISSN 0022-0302. https://doi.org/10.3168/jds.2012-5630.
[12] Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63, 3–42. https://doi.org/10.1007/s10994-006-6226-1.
[13] Y. Deng, D. Li, L. Yang, J. Tang and J. Zhao, "Analysis and prediction of bank user churn based on the ensemble learning algorithm," 2021 IEEE International Conference on Power Electronics, Computer Applications (ICPECA), Shenyang, China, 2021, pp. 288-291, doi: 10.1109/ICPECA51329.2021.9362520.
[14] I. Ullah, B. Raza, A. K. Malik, M. Imran, S. U. Islam and S. W. Kim, "A Churn Prediction Model Using Random Forest: Analysis of Machine Learning Techniques for Churn Prediction and Factor Identification in Telecom Sector," IEEE Access, vol. 7, pp. 60134-60149, 2019, doi: 10.1109/ACCESS.2019.2914999.
[15] ICVISP 2019: Proceedings of the 3rd International Conference on Vision, Image and Signal Processing, August 2019, Article No. 34, pp. 1–7. https://doi.org/10.1145/3387168.3387219.

