Churn Prediction in Telecom Using Machine Learning in R
Table of Contents
1. Introduction
   a. Background
   b. Problem Identification
2. Data
   a. Data Structure
   b. Preprocessing
3. Methodology
   a. Data Splitting
   b. Model Training
   c. Model Evaluation
   d. Feature Importance
4. Conclusion
1. Introduction
a. Background
In the highly competitive telecom industry, retaining existing customers has become more important than attracting new ones. Customer churn is one of the biggest challenges telecom companies face: revenue falls as subscribers move to competitors. Because retaining a customer is cheaper than acquiring a new one, reducing churn has become critical. Over the past decade, the rise of machine learning (ML) has greatly improved customer behaviour analysis, revealing patterns that support early action to retain customers. These analytics allow operators to recognise when customers are likely to leave and to develop customised retention strategies, increasing customer loyalty and satisfaction.
b. Problem Identification
Customer churn directly affects firm profitability, customer lifetime value, and overall market stability. Classical analytical approaches are often unable to capture the complex relationships between customer characteristics such as billing patterns, contract details, usage patterns, and demographics and the likelihood of churn. This gap leads to delayed responses, or to churn risks not being forecast at all. Telecom companies hold massive pools of data spanning client backgrounds, service engagements, and billing history, yet they cannot take full advantage of it without churn prediction systems. The major challenges are predicting churn accurately, understanding why customers churn, and identifying which factors have the greatest impact on churn.
The project also examines feature importance, explaining the key drivers of churn, and visualises model outputs through ROC curves and confusion matrices. The work is confined to binary churn classification on structured data and uses the R programming language and its ML ecosystem, including caret, randomForest, nnet, pROC, and ggplot2. The main purpose is to provide actionable insights that help telecom businesses create successful retention strategies and make data-driven decisions.
2. Data
a. Data Structure
The dataset for this study contains 7043 records and 21 attributes, each record describing a single telecom customer. The attributes cover demographics (gender, senior citizen status), account details (tenure, contract type), the services a customer uses (phone, internet), and financial metrics (monthly and total charges). The main objective of this research is to model these features in order to predict the binary target variable Churn, which indicates whether a customer has defected from the service (Yes) or not (No). Categorical variables such as InternetService and gender are stored as factors, alongside continuous variables such as MonthlyCharges and tenure. An initial review using the R function str() found that most columns had been assigned the correct data types, except for the TotalCharges column, which was read as a character field because of blank entries.
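A minimal sketch of this initial inspection, assuming the data are read into a data frame named telco (the file name is illustrative):

# Read the raw data and inspect the column types
telco <- read.csv("Telco-Customer-Churn.csv")
str(telco)   # TotalCharges appears as character because of blank entries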
b. Preprocessing
Handling Missing Values
Two primary procedures were carried out during data cleaning. First, missing values were addressed. The TotalCharges column contained empty fields, which became NA values once the column was converted to numeric. The na.omit() function was then used to remove these rows, deleting only a minuscule percentage of the data while ensuring that the modeling records were complete and consistent. A minimal sketch of this step, assuming the data frame is named telco:
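# Coerce TotalCharges to numeric; blank entries become NA (with a warning)
telco$TotalCharges <- as.numeric(telco$TotalCharges)

# Remove the small number of rows containing NA values
telco <- na.omit(telco)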
Outlier Detection
Second, outlier detection was performed. Boxplots of MonthlyCharges and TotalCharges were used to look for extreme values that could bias the models. Although some unusually high values were detected, they were consistent with genuine customer behavior (for example, long-tenured customers accumulating high total charges). Consequently, these observations were not treated as outliers and were retained in the dataset. Instead of excluding them, the numerical features were normalized prior to training to support the neural network. The boxplots show that MonthlyCharges mostly falls in the range from 20 to 90, while TotalCharges is more widely spread. These checks confirmed that the dataset was free of obvious data-entry errors and problematic outliers, providing a good basis for building the models.
Boxplot for MonthlyCharges:
Boxplot for TotalCharges:
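A minimal sketch of how such boxplots can be drawn with base R graphics (variable names follow the dataset):

boxplot(telco$MonthlyCharges, main = "Boxplot for MonthlyCharges", ylab = "Monthly charges")
boxplot(telco$TotalCharges, main = "Boxplot for TotalCharges", ylab = "Total charges")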
3. Methodology
a. Data Splitting
To evaluate model effectiveness, the dataset was divided into two sets: 70% for training and 30% for validation. The createDataPartition() function from the caret package was used so that the class balance of the target variable Churn was preserved in both the training and validation sets. Such stratified sampling is important for reducing sampling bias and keeping churned and retained customers proportionally represented during model evaluation.
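A minimal sketch of this split with caret (the seed value and object names are illustrative):

library(caret)

# Convert the categorical text columns (including the target Churn) to factors
telco[sapply(telco, is.character)] <- lapply(telco[sapply(telco, is.character)], factor)

set.seed(123)  # illustrative seed, for reproducibility
train_index <- createDataPartition(telco$Churn, p = 0.70, list = FALSE)
train_data <- telco[train_index, ]
test_data  <- telco[-train_index, ]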
b. Model Training
Three classification algorithms were developed to predict customer churn. The chosen models balance interpretability and reliability with the capacity to capture non-linear relationships in the data.
Logistic Regression: One of the most widely used approaches for binary classification problems. By applying a logistic function to a linear combination of the input features, it estimates the probability that a customer will leave the service. Its main strength is interpretability: it shows clearly how individual factors contribute to churn.
Random Forest: This ensemble technique constructs many decision trees during training and takes the predicted class as the mode of the individual tree outputs. The method improves accuracy and reduces overfitting, and it also quantifies the role each feature plays in determining the outcome.
Neural Network: A feedforward model with a single hidden layer of five nodes was used, trained with the nnet package on standardized numerical features. Although the inputs must be preprocessed carefully (in this case, by scaling the features), neural networks are useful for learning fine-grained patterns.
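A minimal sketch of how the three models can be fitted (object names are illustrative; the customer identifier column name is assumed):

library(randomForest)
library(nnet)

# Drop the customer identifier (column name assumed); it carries no predictive signal
train_data$customerID <- NULL
test_data$customerID  <- NULL

# Logistic regression on the full set of predictors
log_model <- glm(Churn ~ ., data = train_data, family = binomial)

# Random forest with the package default of 500 trees
rf_model <- randomForest(Churn ~ ., data = train_data)

# Standardize the numeric predictors before fitting the neural network
num_cols <- c("tenure", "MonthlyCharges", "TotalCharges")
train_scaled <- train_data
train_scaled[num_cols] <- scale(train_scaled[num_cols])

# Feedforward network with a single hidden layer of five nodes
nn_model <- nnet(Churn ~ ., data = train_scaled, size = 5, maxit = 200)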
c. Model Evaluation
Performance was measured quantitatively using three main evaluation metrics:
• Accuracy: the proportion of correct predictions out of all predictions made.
• Sensitivity (Recall): the proportion of customers who actually churned that the model correctly identifies.
• Specificity: the proportion of customers who did not churn that the model correctly identifies.
To allow a direct comparison, each model was evaluated on the same validation dataset. The results for each model are given in the table below.
Model                  Accuracy    Sensitivity    Specificity
Logistic Regression    0.8140      0.5911         0.8947
Random Forest          0.8027      0.5482         0.8947
Neural Network         0.8050      0.5393         0.9012
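A minimal sketch of how these metrics can be obtained with caret's confusionMatrix(), using the logistic regression model as an example (object names follow the earlier sketches; the 0.5 cut-off is an assumption):

library(caret)
library(pROC)

# Predicted churn probabilities and classes on the validation set
log_prob <- predict(log_model, newdata = test_data, type = "response")
log_pred <- factor(ifelse(log_prob > 0.5, "Yes", "No"), levels = levels(test_data$Churn))

# Accuracy, sensitivity and specificity, with "Yes" (churn) as the positive class
confusionMatrix(log_pred, test_data$Churn, positive = "Yes")

# ROC curve for the same predictions
plot(roc(test_data$Churn, log_prob))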
d. Feature Importance
Feature importance was calculated from the random forest model using the mean decrease in Gini impurity. This score indicates how much each feature contributes to reducing the impurity, and hence the unpredictability, of the classification task. The analysis showed that TotalCharges, MonthlyCharges, tenure, and Contract contributed most to predicting the churn outcome. These findings are intuitive: they show that a customer's financial commitment and loyalty to the firm are closely linked to churn.
To display the importance ranking, the randomForest package's built-in helpers can be used; a minimal sketch, assuming the fitted model is named rf_model:
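# Mean decrease in Gini impurity for each feature, and a plot of the ranking
importance(rf_model)
varImpPlot(rf_model)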
4. Conclusion
Among the three models, Logistic Regression achieved the best overall performance on the validation set, with the highest accuracy (0.8140) and sensitivity (0.5911). Random Forest displayed strong results, but its accuracy and sensitivity were both inferior to those reached by Logistic Regression.
Because Logistic Regression also offers clear insights into how individual features influence churn, it is of particular value to business stakeholders, who can use these insights to make strategic decisions. This clarity makes it possible to design workable business strategies and to explain the results to stakeholders without technical knowledge. Consequently, Logistic Regression is the preferred model for predicting customer churn on this telecom dataset.