
Churn Prediction Using Machine Learning

This article examines predicting customer attrition in the telecommunications industry using machine learning techniques. It proposes a system including data collection, preprocessing, exploratory data analysis, feature selection, model training and evaluation, and deploying models through a web application. The goal is to effectively predict customer churn and provide useful insights for sustainability in the telecom business.

Uploaded by

FIZA S

Understanding Customer Attrition: A Machine Learning Perspective for Business Sustainability
Nithyashree T (1), Fiza S (2), Abinayasri S (3), Aravind C (4)
Department of Computer Science and Engineering,
Sri Ramakrishna Engineering College, Coimbatore, India
Email: [email protected], [email protected], [email protected], [email protected]

Abstract— Telecom operators face a major problem with customer churn, or attrition, which lowers customer satisfaction, reduces market share, and results in revenue loss. To apply efficient retention tactics and keep a competitive edge in the ever-changing telecom market, telecom businesses need to understand the root causes and trends of customer attrition. To improve business sustainability in the telecom sector, this study provides a thorough investigation of telecom customer attrition from a machine learning perspective. The study applies machine learning techniques to telecom data to examine the trends and causes of customer turnover. Using predictive modelling, we determine the main factors driving attrition and gain insights into the dynamics of customer churn. We build predictive models of customer attrition through a methodical assessment of machine learning methods such as gradient boosting, random forests, and decision trees. This study advances predictive analytics in the telecom industry by using machine learning to evaluate telecom customer attrition. It also offers useful implications for the expansion and sustainability of businesses.

Keywords: Telecom, Customer Attrition, Customer Churn, Machine Learning, Predictive Analytics, Business Sustainability.

I. INTRODUCTION

Customer attrition, in which users stop using services and thereby reduce revenue and market share, is a major issue in the telecommunications sector. For telecom operators hoping to preserve business sustainability, understanding, anticipating, and resolving customer attrition becomes increasingly important as technology advances and competition intensifies. Our study explores this problem by using machine learning methods to identify the underlying causes and trends of customer attrition.

Using an extensive dataset that includes customer demographic data, consumption metrics (like talk time and charges), and customer contact indicators (like service calls), we aim to extract meaningful insights that can guide successful retention tactics.

In recent years, the proliferation of competitor solutions, combined with changing customer expectations, has made it more difficult to retain subscribers in the telecommunications business. As a result, understanding the root causes and patterns of customer attrition has become critical for telecom operators looking to preserve a competitive advantage and ensure corporate sustainability.

By employing feature selection and engineering techniques, we extract relevant features that capture the essence of customer churn dynamics. The core of the project involves the application of various machine learning algorithms, including logistic regression, decision trees, random forests, and gradient boosting machines, to construct predictive models. We aim to understand the nuances of customer attrition dynamics by examining the complicated interactions between these factors, and to give telecom operators insights they can use to proactively address churn and foster long-term customer loyalty.

Furthermore, our research extends beyond analysis to real-world application, as we incorporate the results of our predictive modelling into an intuitive user interface. Using the Flask framework, we provide a streamlined user interface that gives telecom operators instant access to customer churn probabilities. This solution bridges the gap between predictive analytics and real-world implementation, giving telecom operators a practical tool to predict customer attrition and take proactive steps to keep customers.

II. LITERATURE REVIEW

Aishwarya et al. examine the use of machine learning approaches for predicting customer attrition in the telecom industry. Their study seeks to predict customer turnover by utilising four different machine learning classifiers and applying the Synthetic Minority Oversampling Technique (SMOTE) to overcome imbalanced dataset concerns [1]. M. Galal, S. Rady, and M. Aref's paper focuses on predicting client turnover in digital banking platforms. They present a classifier-based model for consumer profile data and compare various supervised classification methods such as KNN, Logistic Regression, AdaBoost, Gradient Boosting, and Random Forest [2].
The paper by Xiaowei Zhang and Juanqiong Gou explores the relationship between customer emotions and service purchases. Using customer complaints from an information service enterprise, the study establishes a warning model for customer churn based on emotions. Through data analysis techniques like data mining, the research aims to predict customer churn by identifying emotional indicators in complaints [4]. In a study of airline customers, the authors define churn customers, select relevant features, and train SVM, AdaBoost, Random Forest, and XGBoost models to identify customers at risk of churn. The study aims to assist airlines in adopting personalized marketing strategies to maximize profits, address customer churn, maintain market share, and boost profitability [6].

III. PROPOSED SYSTEM

In order to predict telecom customer churn, the proposed system for this research project is a pipeline that includes data cleaning and preprocessing, exploratory data analysis (EDA), feature selection, model training and evaluation, and model deployment as a Flask web application. First, to guarantee data quality and consistency, the dataset, which includes customer data such as account duration, international plan status, voicemail messages, call statistics, and churn labels, is carefully cleaned and preprocessed. Subsequently, feature selection is guided by the insights into the dataset's features and distributions obtained through EDA. The most pertinent predictors for churn prediction are then found using feature selection techniques, such as assessing the impact of various attributes on the target variable. The system uses a range of machine learning methods, such as MLP Classifier, XGBoost, Decision Tree, and Random Forest, to forecast customer attrition in the telecom sector. The dataset is used to train and assess these algorithms in order to find out how well they predict churn.

Fig 1: System Architecture

IV. WORKING PROCESS

Procedure – Data collection

The first stage in the process is to obtain the Telecom Customer Churn dataset, which is the main source of data for the study. This dataset comes from the internal database of the telecom firm that is the subject of the investigation, or from a reliable third-party source. The dataset includes the pertinent customer data, including call statistics, churn labels, voicemail messages, international plan status, and account length. To guarantee the accuracy of the data for further analysis, it is essential to confirm the integrity and completeness of the dataset.

Procedure – Preprocessing the dataset

In order to achieve equitable representation of both churn and non-churn cases, we correct inherent imbalances in the dataset during the preprocessing step. This helps to prevent model bias and improve prediction accuracy. Unbalanced datasets can skew model performance: the model becomes biased towards predicting the majority class, which leads to poor performance in identifying instances of the minority class, such as churn cases. To achieve this balance, methods such as resampling and the Synthetic Minority Oversampling Technique (SMOTE) are used, either by duplicating instances of the minority class or by creating synthetic samples of the minority class. Preprocessing also includes data transformation and cleaning to improve data quality and guarantee feature consistency. This includes addressing outliers, inconsistencies, and missing values, all of which, if ignored, can negatively impact the performance of the model.

Exploratory data analysis (EDA) is essential for understanding the properties of the dataset and guiding further modelling decisions. Using EDA, we can learn more about feature distributions, spot outliers, and investigate relationships with the target variable, churn. Univariate analysis helps identify outliers and anomalies in individual features, while bivariate analysis examines correlations between pairs of features and their link with churn. This analysis helps with feature and model selection and offers useful information about the data structure. Recursive Feature Elimination (RFE), which iteratively selects the most informative features for model training, is one feature selection technique used in preprocessing. By choosing the most pertinent features, we can improve prediction accuracy and lower the chance of overfitting, which improves the model's capacity to generalise to new data.
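The balancing and feature selection steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data is synthetic (generated with make_classification), the helper name oversample_minority is hypothetical, balancing is done by plain duplication of minority rows rather than SMOTE, and the Random Forest estimator and the choice of 10 features inside RFE are illustrative settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.utils import resample

# Synthetic stand-in for the churn table (assumption): roughly 15% churn rows.
X, y = make_classification(
    n_samples=600, n_features=15, n_informative=6,
    weights=[0.85, 0.15], random_state=42,
)


def oversample_minority(X, y, random_state=0):
    """Duplicate minority-class rows (sampling with replacement) until the
    two classes have the same number of rows. Binary labels only."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min, y_min = X[y == minority], y[y == minority]
    X_extra, y_extra = resample(
        X_min, y_min, replace=True,
        n_samples=counts.max() - len(y_min), random_state=random_state,
    )
    return np.vstack([X, X_extra]), np.concatenate([y, y_extra])


X_bal, y_bal = oversample_minority(X, y)

# Recursive Feature Elimination: repeatedly drop the least important feature
# until 10 remain (the estimator and the count are illustrative choices).
selector = RFE(
    RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=10,
)
selector.fit(X_bal, y_bal)
selected_columns = np.flatnonzero(selector.support_)
```

In practice, oversampling is usually applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.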

SYSTEM SPECIFICATIONS

2.1 SOFTWARE REQUIREMENTS

• Language: Python
• Libraries: pandas, NumPy, scikit-learn, XGBoost, seaborn, MGD_outliers, matplotlib
• Flask web framework
• HTML/CSS web interface
• Jupyter Notebook

Procedure – Model building and evaluation

Building dependable predictive models for telecom customer churn prediction involves several steps. First, a collection of machine learning algorithms is chosen according to how well they fit the dataset's properties and the problem at hand. These algorithms include XGBoost, Decision Tree, Random Forest, K-Nearest Neighbours (KNN), Gradient Boosting, Support Vector Classifier (SVC), and Logistic Regression. Subsequently, the chosen algorithms are implemented, creating a dictionary of models for assessment. The performance of each model is then determined by training and evaluating it using cross-validation. In particular, 5-fold cross-validation is used, in which each model is trained on four subsets (or folds) of the dataset and tested on the remaining fold. Every fold serves as the test set once during the five repetitions of this process.
Because it balances recall and precision, the F1 score is employed as the evaluation metric during cross-validation to assess each model's performance. This makes it appropriate for imbalanced classification problems such as churn prediction. A scoring function is created with scikit-learn's make_scorer function so that the same F1 metric is applied in every fold.
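For reference, the standard definition of the F1 score for the positive (churn) class is given below. The symbols TP, FP, and FN (true positives, false positives, and false negatives) are notation introduced here and do not appear in the original text.

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]
```

Because F1 is a harmonic mean, it is high only when precision and recall are both high, which is why a model that always predicts the majority class scores poorly on it.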
For additional analysis, the cross-validation scores for every model are kept in a dictionary called cv_scores_models. These scores represent the performance of each model over the different dataset folds. The distribution of cross-validation scores for each model is displayed in a boxplot, which makes it simple to compare the models' performances and helps determine which ones are the most promising for further assessment. After training, the models are tested on the validation set using measures such as receiver operating characteristic (ROC) curves, accuracy, precision, recall, and F1 score. This makes it possible to compare several machine learning methods and choose the best model or models for further work.

In general, the model development process uses cross-validation to systematically assess a variety of machine learning algorithms and selects the top-performing models according to their F1 scores. This iterative process is intended to make the final predictive model robust, dependable, and appropriate for forecasting telecom customer attrition.

Fig 2: Performance of Random forest

Fig 3: F1 score for Random forest
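The cross-validation procedure described in this section can be sketched roughly as below. The names make_scorer and cv_scores_models come from the text; everything else is an assumption made for illustration: the data is synthetic, only three scikit-learn models are included (XGBoost is left out to keep the sketch dependency-free), and the stratified fold splitter and the 80/20 validation split are choices of this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the churn data (assumption).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6,
    weights=[0.8, 0.2], random_state=1,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# Dictionary of candidate models, keyed by a display name.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(random_state=1),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=1),
}

f1_scorer = make_scorer(f1_score)  # F1 of the positive (churn) class
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# One array of five F1 scores per model, one score per fold.
cv_scores_models = {
    name: cross_val_score(model, X_train, y_train, cv=folds, scoring=f1_scorer)
    for name, model in models.items()
}

# Pick the model with the best mean F1 and check it on the held-out set.
best_name = max(cv_scores_models, key=lambda name: cv_scores_models[name].mean())
best_model = models[best_name].fit(X_train, y_train)
val_f1 = f1_score(y_val, best_model.predict(X_val))
val_auc = roc_auc_score(y_val, best_model.predict_proba(X_val)[:, 1])
```

The per-model score arrays in cv_scores_models are in the shape a boxplot needs: one group of five values per model.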

Procedure – Model deployment with Flask UI


To integrate the churn prediction model with a Flask UI, the trained model is deployed to a web-based environment for practical use. The Flask application loads the churn prediction model, which enables it to make predictions on new data that users submit via the user interface. Users can input customer information and receive churn forecasts through the interface created for the model. Feedback mechanisms and visualizations can be integrated to improve the user experience. The performance and dependability of the integrated system are verified through extensive testing and validation. Through an easily navigable online interface, this implementation enables telecom firms to leverage machine learning for customer retention initiatives.

Through Flask's lightweight web application framework, decision-makers can access the trained model via an intuitive interface. Users can input customer data and receive real-time attrition forecasts, which supports proactive retention strategies and well-informed decision-making.

Fig 4: Performance of XGBoost
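A minimal sketch of such a Flask endpoint is shown below. It is an illustration under assumptions rather than the application described in the paper: the feature names in FEATURES and the /predict route are hypothetical, the endpoint accepts JSON instead of an HTML form, and a small model trained on synthetic data stands in for the saved churn model that a real deployment would load.

```python
from flask import Flask, jsonify, request
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names, loosely based on the fields the paper lists.
FEATURES = [
    "account_length",
    "international_plan",
    "voicemail_messages",
    "total_day_minutes",
    "customer_service_calls",
]

# Stand-in model trained on synthetic data (assumption); a real deployment
# would load the previously trained churn model instead.
X, y = make_classification(
    n_samples=300, n_features=len(FEATURES), n_informative=3,
    n_redundant=0, random_state=0,
)
model = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    """Return the churn probability for one customer sent as a JSON object."""
    payload = request.get_json(silent=True) or {}
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify({"error": f"missing fields: {missing}"}), 400
    try:
        row = [[float(payload[name]) for name in FEATURES]]
    except (TypeError, ValueError):
        return jsonify({"error": "all fields must be numeric"}), 400
    probability = float(model.predict_proba(row)[0, 1])
    return jsonify({"churn_probability": probability, "churn": probability >= 0.5})
```

The request handler validates the input before calling the model, so a malformed submission produces a 400 response instead of a server error. The 0.5 decision threshold is an illustrative default and could be tuned to favour recall on churn cases.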

V. MODEL PERFORMANCE

Fig 5: F1 score for XGBoost


VI. RESULT

Out of all the models that were assessed, logistic regression had comparatively lower F1 scores, ranging from 0.7716 to 0.7951. Support vector classifiers (SVC) and K-nearest neighbours (KNN) both perform consistently well; their F1 scores range from 0.8818 to 0.9163 and from 0.8983 to 0.9392, respectively. The decision tree also performs consistently, attaining F1 scores in the range of 0.9101 to 0.9351.

As more features are selected, Random Forest and XGBoost display progressively higher F1 scores, so achieving the best F1 score depends on choosing a suitable number of features. In this instance, selecting 10 features gives both models their highest F1 scores. The higher scores across the various feature selections suggest that XGBoost performs better than Random Forest in terms of F1 score. These findings imply that the selected features have a major influence on how well both models perform, and that choosing the right features can increase a model's predictive power. Using XGBoost with 10 selected features is therefore advised, since it consistently produces higher F1 scores.

Fig 8: Prediction for Churned customer

Fig 9: Prediction for Non-Churned customer
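The comparison of F1 scores across different numbers of selected features, as reported in the results, could be reproduced along the following lines. This is a sketch under assumptions: the data is synthetic, the helper name f1_by_feature_count and the candidate counts are hypothetical, and a Random Forest is used for both selection and classification. Wrapping RFE and the classifier in a Pipeline means feature selection is refitted inside each cross-validation fold.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic, imbalanced stand-in for the churn data (assumption).
X, y = make_classification(
    n_samples=400, n_features=15, n_informative=6,
    weights=[0.8, 0.2], random_state=0,
)


def f1_by_feature_count(X, y, counts):
    """Mean 5-fold F1 score for each candidate number of selected features."""
    results = {}
    for k in counts:
        pipeline = Pipeline([
            ("select", RFE(
                RandomForestClassifier(n_estimators=25, random_state=0),
                n_features_to_select=k,
            )),
            ("model", RandomForestClassifier(n_estimators=25, random_state=0)),
        ])
        results[k] = cross_val_score(pipeline, X, y, cv=5, scoring="f1").mean()
    return results


scores = f1_by_feature_count(X, y, [6, 8, 10, 12])
best_k = max(scores, key=scores.get)
```

On real data, best_k is the feature count to compare against the value of 10 reported above. On this synthetic data the result carries no such meaning.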

VII. CONCLUSION

In conclusion, our project's churn prediction model offers No-Churn Telecom a significant opportunity to proactively retain clients and raise customer satisfaction levels. By identifying high-risk clients, the organisation can employ tailored retention tactics and offer customised promotions to reduce customer turnover. Increased client loyalty, lower churn rates, and eventually higher business profitability are all possible outcomes of this strategy. Throughout the project, we followed best practices in data science, which include careful feature engineering, thorough performance evaluation, rigorous data preprocessing, and judicious model selection. These procedures emphasised the significance of understanding organisational goals and customising analytical techniques to tackle practical issues effectively. The experiment also showed that XGBoost is a strong tool for churn prediction tasks.

In the future, telecom churn prediction research will focus on data augmentation to create a larger dataset, sophisticated feature engineering to gain deeper understanding, and ensemble learning to combine models for higher accuracy. Investigating deep learning techniques such as RNNs and CNNs may improve predictive performance even more. Real-time deployment of models coupled with telecom systems makes timely intervention possible, and ongoing monitoring helps keep the models relevant. Through the collection of varied data, the use of sophisticated techniques, and the implementation of models in production, telecom firms can improve churn prediction, strengthen customer retention tactics, and cultivate long-term customer loyalty.

Fig 6: Cross validation scores

Fig 7: Input Interface for Churn Prediction Model
REFERENCES

[1] M, Aishwarya & T, Bindhiya & Tanisha, S & B, Soundarya & Shanuja, C. (2023). Customer Churn Prediction Using Synthetic Minority Oversampling Technique. pp. 1-5. doi: 10.1109/C2I659362.2023.10430989.

[2] M. Galal, S. Rady and M. Aref, "Enhancing Customer Churn Prediction in Digital Banking using Ensemble Modeling," 2022 4th Novel Intelligent and Leading Emerging Sciences Conference (NILES), Giza, Egypt, 2022, pp. 21-25, doi: 10.1109/NILES56402.2022.9942408.

[3] H. Karamollaoğlu, İ. Yücedağ and İ. A. Doğru, "Customer Churn Prediction Using Machine Learning Methods: A Comparative Analysis," 2021 6th International Conference on Computer Science and Engineering (UBMK), 2021, pp. 139-144, doi: 10.1109/UBMK52708.2021.9558876.

[4] Xiaowei Zhang and Juanqiong Gou, "Warning model of customer churn based on emotions," 2015 International Conference on Logistics, Informatics and Service Sciences (LISS), Barcelona, 2015, pp. 1-3, doi: 10.1109/LISS.2015.7369683.

[5] M. D. S. Rahman, M. D. S. Alam and M. D. I. Hosen, "To Predict Customer Churn By Using Different Algorithms," 2022 International Conference on Decision Aid Sciences and Applications (DASA), 2022, pp. 601-604, doi: 10.1109/DASA54658.2022.9765155.

[6] J. Ran and X. Cheng, "Airline Customer Value Analysis and Customer Churn Prediction Based on LRFMC Model and K-means Algorithm," 2021 2nd International Conference on Computer Science and Management Technology (ICCSMT), Shanghai, China, 2021, pp. 185-193, doi: 10.1109/ICCSMT54525.2021.00044.

[7] K. Kim and J.-H. Lee, "Bayesian Optimization of Customer Churn Predictive Model," 2018 Joint 10th International Conference on Soft Computing and Intelligent Systems (SCIS) and 19th International Symposium on Advanced Intelligent Systems (ISIS), Toyama, Japan, 2018, pp. 85-88, doi: 10.1109/SCIS-ISIS.2018.00024.

[8] P. Hemalatha and G. M. Amalanathan, "A Hybrid Classification Approach for Customer Churn Prediction using Supervised Learning Methods: Banking Sector," 2019 International Conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN), Vellore, India, 2019, pp. 1-6, doi: 10.1109/ViTECoN.2019.8899692.

[9] J. Yang, "Design of E-commerce Customer Churn Prediction System Based on Data Mining Techniques," 2023 IEEE 3rd International Conference on Social Sciences and Intelligence Management (SSIM), Taichung, Taiwan, 2023, pp. 114-118, doi: 10.1109/SSIM59263.2023.10468983.

[10] F. Alhaqui, M. Elkhechafi and A. Elkhadimi, "Machine learning for telecoms: From churn prediction to customer relationship management," 2022 IEEE International Conference on Machine Learning and Applied Network Technologies (ICMLANT), Soyapango, El Salvador, 2022, pp. 1-5, doi: 10.1109/ICMLANT56191.2022.9996496.

[11] S. D. Kumar, K. Soundarapandiyan and S. Meera, "Comparative Study of Customer Churn Prediction Based on Data Ensemble Approach," 2023 Intelligent Computing and Control for Engineering and Business Systems (ICCEBS), Chennai, India, 2023, pp. 1-10, doi: 10.1109/ICCEBS58601.2023.10449139.

[12] D. Azzam, M. Hamed, N. Kasiem, Y. Eid and W. Medhat, "Customer Churn Prediction Using Apriori Algorithm and Ensemble Learning," 2023 5th Novel Intelligent and Leading Emerging Sciences Conference (NILES), Giza, Egypt, 2023, pp. 377-381, doi: 10.1109/NILES59815.2023.10296608.
