Churn Prediction Using Machine Learning
Abstract— Customer churn, or attrition, is a major problem for telecom operators: it lowers customer satisfaction, reduces market share, and results in revenue loss. To apply effective retention tactics and keep a competitive edge in a fast-changing market, telecom businesses need to understand the root causes and trends of customer attrition. This study investigates telecom customer attrition from a machine learning perspective, with the aim of improving business sustainability in the telecom sector. Machine learning techniques are applied to telecom data to examine the trends and causes of customer churn. Predictive modelling lets us determine the main factors driving attrition and gain insight into the dynamics of customer churn. We build predictive models of customer attrition through a methodical assessment of machine learning methods such as gradient boosting, random forests, and decision trees. The study advances predictive analytics in the telecom industry and offers practical implications for business growth and sustainability.

Keywords: Telecom, Customer Attrition, Customer Churn, Machine Learning, Predictive Analytics, Business Sustainability.

I. INTRODUCTION

Customer attrition, in which users stop using a service, reduces revenue and market share and is a major issue in the telecommunications sector. As technology advances and competition intensifies, understanding, anticipating, and addressing customer attrition becomes increasingly important for telecom operators that want to remain sustainable. Our study explores this problem by using machine learning methods to identify the underlying causes and trends of customer attrition.

Using an extensive dataset that includes customer demographic data, consumption metrics (such as talk time and charges), and customer contact indicators (such as service calls), our goal is to extract meaningful insights that can guide successful retention tactics.

In recent years, the proliferation of competing offerings, combined with changing customer expectations, has made it more difficult to retain subscribers in the telecommunications business. As a result, understanding the root causes and patterns of customer attrition has become critical for telecom operators looking to preserve a competitive advantage and ensure business sustainability.

By employing feature selection and engineering techniques, we extract relevant features that capture customer churn dynamics. The core of the project is the application of various machine learning algorithms, including logistic regression, decision trees, random forests, and gradient boosting machines, to construct predictive models. By examining the complex interactions between these features, we aim to understand the nuances of customer attrition and to give telecom operators insights for proactively addressing churn and fostering long-term customer loyalty.

Furthermore, our research extends beyond analysis to real-world application: we incorporate the results of our predictive modelling into an intuitive user interface. Using the Flask framework, we provide a streamlined interface that gives telecom operators immediate access to customer churn probabilities. This bridges the gap between predictive analytics and real-world implementation, giving telecom operators a tool to predict customer attrition and take proactive steps to keep customers.

II. LITERATURE REVIEW

The use of machine learning approaches for predicting customer attrition in the telecom industry is examined in [1]. The study seeks to predict customer churn using four different machine learning classifiers and applies the Synthetic Minority Oversampling Technique (SMOTE) to address dataset imbalance [1]. The paper by M. Galal, S. Rady, and M. Aref focuses on predicting customer churn in digital banking platforms. They present a classifier-based model for customer profile data and compare several supervised classification methods, including KNN, Logistic Regression, AdaBoost, Gradient Boosting, and Random Forest [2].
The paper by Xiaowei Zhang and Juanqiong Gou explores the relationship between customer emotions and service purchases. Using customer complaints from an information service enterprise, the study establishes an emotion-based warning model for customer churn. Through data mining techniques, it aims to predict customer churn by identifying emotional indicators in complaints [4]. The study in [6] defines churn customers, selects relevant features, and trains SVM, AdaBoost, Random Forest, and XGBoost models to identify customers at risk of churn. It aims to help airlines adopt personalized marketing strategies that address customer churn, maintain market share, and boost profitability [6].

III. PROPOSED SYSTEM

To predict telecom customer churn, the proposed system comprises a pipeline of data cleaning and preprocessing, exploratory data analysis (EDA), feature selection, model training and evaluation, and model deployment as a Flask web application. First, to ensure data quality and consistency, the dataset, which includes customer attributes such as account duration, international plan status, voicemail messages, call statistics, and churn labels, is cleaned and preprocessed. Next, EDA is applied, and the insights it yields about the dataset's features and distributions guide feature selection. The most pertinent predictors of churn are then identified using feature selection techniques, such as assessing the impact of each attribute on the target variable. The system uses a range of machine learning methods, namely MLP Classifier, XGBoost, Decision Tree, and Random Forest, to forecast customer attrition in the telecom sector. These algorithms are trained and evaluated on the dataset to determine how well they predict churn.

Fig 1: System Architecture

IV. WORKING PROCESS

Procedure – Data collection
The first stage is to obtain the Telecom Customer Churn dataset, which is the main source of data for the study. This dataset comes either from the internal database of the telecom firm under investigation or from a reliable third-party source. It includes the pertinent customer data: call statistics, churn labels, voicemail messages, international plan status, and account length. Before further analysis, it is essential to confirm the integrity and completeness of the dataset.

Procedure – Preprocessing the dataset
To achieve balanced representation of churn and non-churn cases, we correct the class imbalance inherent in the dataset during preprocessing. This helps to prevent model bias and improve prediction accuracy. On an imbalanced dataset, a model tends to favour the majority class, which leads to poor performance in identifying minority-class instances such as churn cases. To achieve this balance, methods such as resampling and the Synthetic Minority Oversampling Technique (SMOTE) are used, either duplicating instances of the minority class or creating synthetic samples that resemble the minority class. Preprocessing also includes data transformation and cleaning to improve data quality and ensure feature consistency. This involves handling outliers, inconsistencies, and missing values, all of which can degrade model performance if ignored.

Exploratory data analysis (EDA) is essential for understanding the properties of the dataset and guiding later modelling decisions. With EDA we can learn more about feature distributions, spot outliers, and investigate relationships with the target variable, churn. Univariate analysis helps to identify outliers and anomalies in individual features, while bivariate analysis examines correlations between pairs of features and their relationship with churn. This analysis informs feature and model selection and offers insight into the structure of the data. Recursive Feature Elimination (RFE), which iteratively removes the least informative features until the most informative ones remain, is one feature selection technique used in preprocessing. Choosing the most pertinent features can improve prediction accuracy and lower the risk of overfitting, which improves the model's ability to generalise to new data.
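The class-balancing step in the preprocessing procedure can be illustrated with a minimal, dependency-free sketch of SMOTE-style oversampling. It is not the authors' implementation, and the source does not say which library was used; in practice a packaged implementation such as the SMOTE class in imbalanced-learn is a common choice. The function name smote_like_oversample, the toy feature values, and the parameter k are assumptions made for illustration. Each synthetic row is placed on the line segment between a minority-class row and one of its nearest minority-class neighbours, so the new rows resemble the minority (churn) class.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating towards neighbours.

    minority: list of numeric feature vectors from the minority (churn) class.
    Each synthetic row lies on the segment between a randomly chosen minority
    row and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: [account_length, customer_service_calls].
churn = [[100.0, 4.0], [120.0, 5.0], [90.0, 6.0], [110.0, 7.0]]
non_churn = [[50.0 + i, 1.0] for i in range(10)]

# Generate enough synthetic churn rows to match the majority class size.
new_rows = smote_like_oversample(churn, n_new=len(non_churn) - len(churn))
balanced_churn = churn + new_rows
print(len(balanced_churn), len(non_churn))
```

Oversampling of this kind is normally applied to the training split only, so that the evaluation data keeps the original class distribution.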
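Recursive Feature Elimination, mentioned in the working process, can be sketched in a few lines. The example below is a hedged illustration under stated assumptions, not the authors' code: it uses a small logistic regression fitted by gradient descent as the ranking model, treats the absolute weight as the importance score, and runs on hypothetical toy data in which only the first feature carries signal. The function and feature names are invented for the example. A library implementation, for instance the RFE class in scikit-learn's feature_selection module, would normally be used instead.

```python
import math
import random


def fit_logistic(rows, labels, epochs=200, lr=0.5):
    """Fit logistic regression by batch gradient descent; return the weights."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights


def recursive_feature_elimination(rows, labels, names, n_keep):
    """Repeatedly refit the model and drop the feature with the smallest |weight|.

    Assumes features are on comparable scales, so weights can be compared.
    """
    kept = list(range(len(names)))
    while len(kept) > n_keep:
        subset = [[row[j] for j in kept] for row in rows]
        weights = fit_logistic(subset, labels)
        weakest = min(range(len(kept)), key=lambda j: abs(weights[j]))
        del kept[weakest]
    return [names[j] for j in kept]


# Hypothetical toy data: the label depends only on the first feature.
rng = random.Random(42)
names = ["service_calls", "noise_a", "noise_b"]
rows = [[rng.uniform(-1, 1) for _ in names] for _ in range(200)]
labels = [1 if row[0] > 0 else 0 for row in rows]

selected = recursive_feature_elimination(rows, labels, names, n_keep=1)
print(selected)
```

Refitting after every elimination is what distinguishes RFE from a one-off ranking: a feature's weight can change once a correlated feature has been removed.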
V. MODEL PERFORMANCE
VII. CONCLUSION
[1] Aishwarya M, Bindhiya T, Tanisha S, Soundarya B, and Shanuja C, "Customer Churn Prediction Using Synthetic Minority Oversampling Technique," 2023, pp. 1-5, doi: 10.1109/C2I659362.2023.10430989.

[2] M. Galal, S. Rady and M. Aref, "Enhancing Customer Churn Prediction in Digital Banking using Ensemble Modeling," 2022 4th Novel Intelligent and Leading Emerging Sciences Conference (NILES), Giza, Egypt, 2022, pp. 21-25, doi: 10.1109/NILES56402.2022.9942408.

[3] H. Karamollaoğlu, İ. Yücedağ and İ. A. Doğru, "Customer Churn Prediction Using Machine Learning Methods: A Comparative Analysis," 2021 6th International Conference on Computer Science and Engineering (UBMK), 2021, pp. 139-144, doi: 10.1109/UBMK52708.2021.9558876.

[4] Xiaowei Zhang and Juanqiong Gou, "Warning model of customer churn based on emotions," 2015 International Conference on Logistics, Informatics and Service Sciences (LISS), Barcelona, 2015, pp. 1-3, doi: 10.1109/LISS.2015.7369683.

[10] F. Alhaqui, M. Elkhechafi and A. Elkhadimi, "Machine learning for telecoms: From churn prediction to customer relationship management," 2022 IEEE International Conference on Machine Learning and Applied Network Technologies (ICMLANT), Soyapango, El Salvador, 2022, pp. 1-5, doi: 10.1109/ICMLANT56191.2022.9996496.

[11] S. D. Kumar, K. Soundarapandiyan and S. Meera, "Comparative Study of Customer Churn Prediction Based on Data Ensemble Approach," 2023 Intelligent Computing and Control for Engineering and Business Systems (ICCEBS), Chennai, India, 2023, pp. 1-10, doi: 10.1109/ICCEBS58601.2023.10449139.

[12] D. Azzam, M. Hamed, N. Kasiem, Y. Eid and W. Medhat, "Customer Churn Prediction Using Apriori Algorithm and Ensemble Learning," 2023 5th Novel Intelligent and Leading Emerging Sciences Conference (NILES), Giza, Egypt, 2023, pp. 377-381, doi: 10.1109/NILES59815.2023.10296608.