Customer Churn Prediction Using Machine Learning
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - In today’s highly competitive telecom sector, customer churn (the loss of clients to competitors) poses a major threat to revenue and growth. This project tackles churn prediction using machine learning, focusing on the Random Forest algorithm to identify customers likely to leave. The Telco Customer Churn dataset, containing customer demographics, service usage, and account details, serves as the foundation. The workflow begins with exploratory data analysis (EDA) to uncover key trends and indicators of churn. A robust preprocessing pipeline is then applied, including handling missing data, encoding categories, scaling, and addressing class imbalance. Random Forest is chosen for its accuracy and interpretability, and its performance is compared against models like Logistic Regression, SVM, and XGBoost using metrics such as precision, recall, F1-score, and ROC-AUC. Results show that contract type, tenure, and billing-related features significantly influence churn. The model not only predicts churn with high accuracy but also provides actionable insights through feature importance and visualization tools. This supports data-driven retention strategies like targeted offers or improved services. Ultimately, the project showcases how machine learning enhances customer relationship management (CRM) and can be adapted for similar use cases in banking, insurance, and e-commerce.

Key Words: Customer churn, churn prediction, Random Forest, machine learning, telecom industry, predictive modeling, supervised learning

1. INTRODUCTION

In today’s competitive business landscape, retaining existing customers is as crucial as acquiring new ones, especially in subscription-based industries like telecommunications, banking, insurance, and internet services. One of the biggest challenges faced by companies is customer churn, which occurs when customers stop using a company’s product or service. Predicting churn in advance allows businesses to take proactive measures to retain customers, ultimately reducing revenue loss and increasing customer satisfaction. With the advent of data-driven strategies, machine learning has emerged as a powerful tool to analyze historical data and forecast customer behavior.

By performing exploratory data analysis (EDA), we aim to identify patterns that differentiate loyal customers from those likely to leave. The goal is not only to predict churn but also to understand the underlying factors contributing to it.

The implementation of this churn prediction model serves as a decision-support system for marketing and customer service departments. By identifying high-risk customers early, targeted interventions such as discounts, personalized communication, or service improvements can be deployed to improve retention. This project demonstrates how leveraging machine learning can turn raw customer data into actionable business insights, helping organizations become more proactive, customer-centric, and competitive in the digital age.

1.1 DOMAIN INTRODUCTION

The telecommunication industry is one of the most dynamic and data-intensive sectors in the world, providing essential services such as mobile communication, broadband, cable television, and internet connectivity to billions of users globally. With rapid technological advancements and increasing competition, telecom companies are constantly striving to improve service quality, reduce operational costs, and enhance customer satisfaction. In such a saturated market, one of the major concerns is customer churn, where subscribers switch from one service provider to another, often due to dissatisfaction, better offers, or perceived lack of value.

Customer churn not only results in immediate revenue loss but also affects a company's long-term profitability and market share. Acquiring new customers often costs significantly more than retaining existing ones. Hence, understanding and predicting churn behavior is vital for telecom operators. Factors influencing churn can range from poor network quality and high service charges to limited service features and ineffective customer support. Telecom companies now rely heavily on data analytics and machine learning to extract meaningful patterns from customer data, enabling smarter business decisions and customer retention strategies.
10 | Availability | The system should be available for use as needed and should not require constant maintenance.

3.4 REQUIRED LIBRARIES AND FRAMEWORKS:

Category | Library / Framework | Purpose
Core Language | Python 3.8+ | Primary programming language for model development and data processing
Data Handling | Pandas | Data manipulation and preprocessing (loading, cleaning, ...
Data Serialization (optional) | joblib or pickle | For saving and loading trained models
Hyperparameter Tuning | GridSearchCV from sklearn.model_selection | To optimize algorithm parameters
Environment Management (optional) | Anaconda or virtualenv | Manages project dependencies and isolated environments

4. MODELS AND METHODS
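To make the tooling listed in Section 3.4 concrete, the following minimal sketch shows how GridSearchCV and joblib might be combined around a Random Forest classifier. It is an illustration under stated assumptions, not the configuration used in this project: the data is synthetic (generated with make_classification as a stand-in for the preprocessed churn features), and the parameter grid, the F1 scoring choice, and the file name churn_rf.joblib are hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the preprocessed churn features (assumption).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4,
    weights=[0.75, 0.25], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative grid only; these values are not taken from the paper.
param_grid = {"n_estimators": [50, 100], "max_depth": [4, None]}

# Cross-validated search over the grid, scored on F1 for the positive class.
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid,
    scoring="f1",
    cv=3,
)
search.fit(X_train, y_train)
best_model = search.best_estimator_

# Save the tuned model to disk and load it back, as a deployment step might.
model_path = os.path.join(tempfile.mkdtemp(), "churn_rf.joblib")
joblib.dump(best_model, model_path)
reloaded = joblib.load(model_path)

print("best params:", search.best_params_)
print("test accuracy:", round(reloaded.score(X_test, y_test), 3))
```

The reloaded model should produce the same predictions as the in-memory one, which is the property that makes serialization useful for serving a trained model without retraining.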
4.1 Random Forest Algorithm

Random Forest is a supervised ensemble learning method used for classification and regression. It builds multiple decision trees on random subsets of the data and combines their outputs (majority vote for classification, averaging for regression) for improved accuracy. In Learnwise, it's used to predict learner success, dropout risk, and engagement levels, offering robust and reliable educational insights.

The data includes parameters such as:

- Time spent on different modules
- Quiz scores and attempts
- Session durations
By understanding how learners interact with the platform, we can better personalize content recommendations and predict user outcomes such as course completion or dropout likelihood.

5.1.2 Training Dataset Acquisition

The effectiveness of the AI models powering Learnwise relies heavily on two factors: the richness of collected interaction parameters and the quality of the training dataset.

To achieve this, we curated datasets by collecting user logs from:

- Pilot Learnwise platform usage sessions.
- Publicly available learning behavior datasets (e.g., from Kaggle educational repositories).

Linear Regression was initially used to model simple predictive tasks, such as:

- Predicting expected course completion time.
- Estimating learner engagement scores based on early activity patterns.

Working:

- Independent variables: Time spent, number of modules accessed, quiz attempts.
- Dependent variable: Engagement score or course completion probability.
- The model fits a linear equation to the input features and predicts continuous outcomes.
- Used scikit-learn's LinearRegression() class.
- Training involved minimizing Mean Squared Error (MSE) between predicted and actual engagement outcomes.
- Post-training, model predictions were sorted to highlight at-risk learners needing additional support.

using one-hot encoding to convert categories into binary columns. For binary variables like Partner or Dependents, we used label encoding to map Yes and No values to 1 and 0. This made the data suitable for machine learning models that require numerical input.

To enhance the predictive power of the model, we created new derived features. One such feature was tenure_group, which grouped customers into bins based on their tenure (e.g., "0–12 months", "13–24 months", etc.). This captured customer loyalty stages and revealed strong correlations with churn probability. Another useful feature was MonthlyCharges_to_TenureRatio, which helped identify customers who were paying a high monthly fee but had low tenure, often an early indicator of dissatisfaction.
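The encoding and feature-engineering steps described above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual pipeline. The rows are invented, Contract is assumed to be one of the one-hot encoded columns, the bin edges beyond 24 months are assumed, and the ratio is assumed to be MonthlyCharges divided by tenure (with tenure clipped to at least 1 to avoid division by zero). Bin labels use plain hyphens for simplicity.

```python
import pandas as pd

# Toy rows with Telco-style column names; the values are invented for illustration.
df = pd.DataFrame({
    "tenure": [1, 8, 15, 30, 60],
    "MonthlyCharges": [70.0, 89.5, 45.0, 99.0, 20.0],
    "Partner": ["Yes", "No", "No", "Yes", "Yes"],
    "Dependents": ["No", "No", "Yes", "Yes", "No"],
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Two year"],
})

# Label-encode binary Yes/No columns as 1/0.
for col in ["Partner", "Dependents"]:
    df[col] = df[col].map({"Yes": 1, "No": 0})

# tenure_group: bucket tenure (in months) into loyalty stages.
# The first two bins follow the text; the later edges are assumptions.
bins = [0, 12, 24, 48, 72]
labels = ["0-12 months", "13-24 months", "25-48 months", "49-72 months"]
df["tenure_group"] = pd.cut(
    df["tenure"], bins=bins, labels=labels, include_lowest=True
)

# High monthly fee relative to tenure; the exact formula is an assumption.
df["MonthlyCharges_to_TenureRatio"] = (
    df["MonthlyCharges"] / df["tenure"].clip(lower=1)
)

# One-hot encode the multi-category columns into binary indicator columns.
encoded = pd.get_dummies(df, columns=["Contract", "tenure_group"], dtype=int)
print(encoded.head())
```

In this toy data, the first customer (1 month of tenure at 70.0 per month) gets the highest ratio, which matches the intuition in the text that new customers on expensive plans stand out.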
Despite its success, the project also acknowledged certain limitations, such as the model's complexity and longer prediction times compared to lighter algorithms. Additionally, interpretability remains a challenge with ensemble models like Random Forest. Addressing these limitations in future iterations, for example by using explainable AI techniques or exploring gradient boosting models, can further enhance the system's transparency and effectiveness.

[5] W. Verbeke, D. Martens, B. Baesens, "Social network analysis for customer churn prediction," Applied Soft Computing, 2014.

[6] N. Lu, H. Lin, J. Lu, G. Zhang, "A customer churn prediction model in telecom industry using boosting," IEEE Transactions on Industrial …, 2012.
ACKNOWLEDGEMENT