OTT Subscriber Churn Prediction Using Machine Learning
CSUSB ScholarWorks
5-2023
Recommended Citation
Senthil Kumar, Needhi Devan, "OTT SUBSCRIBER CHURN PREDICTION USING MACHINE LEARNING"
(2023). Electronic Theses, Projects, and Dissertations. 1660.
https://scholarworks.lib.csusb.edu/etd/1660
This Project is brought to you for free and open access by the Office of Graduate Studies at CSUSB ScholarWorks.
It has been accepted for inclusion in Electronic Theses, Projects, and Dissertations by an authorized administrator
of CSUSB ScholarWorks. For more information, please contact [email protected].
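OTT SUBSCRIBER CHURN PREDICTION USING MACHINE LEARNING
A Project
Presented to the
Faculty of
California State University,
San Bernardino
In Partial Fulfillment
of the Requirements for the Degree
Master of Science
in
by
Needhi Devan Senthil Kumar
May 2023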
Approved by:
Dr. Conrad Shayo, Member, Reader & Department Chair, Information and
Decision Sciences
© 2023 Needhi Devan Senthil Kumar
ABSTRACT
Learning algorithms can be used to predict churn and develop targeted retention
strategies to address the specific needs and concerns of at-risk subscribers. The
customer targeting? The dataset was collected from the Kaggle repository and
Then, the performance of each algorithm is evaluated to find the model with the highest
accuracy. The findings and conclusions for each question are: 1) Logistic
analysis. 2) By passing the test data to a model trained on the historical dataset,
boosting machine model was found to have the highest accuracy and maximum
AUROC, making it a powerful tool in the fight against customer churn. Areas for
techniques, and integrating real-time data sources to improve the accuracy and
ACKNOWLEDGEMENTS
I would like to thank my parents and my friends for their support and
Also, I would like to thank Dr. William Butler and Dr. Conrad Shayo for
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
Multi-Layer Perceptron
Decision Tree
Implementation
CHAPTER FIVE: DISCUSSION, CONCLUSION, AND AREAS OF FURTHER STUDY
Discussion
Conclusion
REFERENCES
LIST OF TABLES
LIST OF FIGURES
CHAPTER ONE
INTRODUCTION
OTT (Over the Top) is a term used to describe the delivery of audio, video,
and other media content over the Internet without the need for a traditional cable or
satellite television provider. OTT services provide users with an alternative source for
content that is normally only available through a cable or satellite provider. They also
offer a cost-effective way to access content, often eliminating the need for expensive
contracts and monthly fees. OTT services typically require users to sign up for an
account, either through a website or an app, and then they can access the content
they desire. OTT services can provide a range of options, from streaming live TV
channels to on-demand shows and movies. These services also often provide
additional features, such as the ability to save favorite shows, get recommendations,
and access exclusive content (Fitzgerald, 2019). This chapter first states the problem,
then presents the research questions and describes how this culminating experience
project is organized.
Problem Statement
Subscriber churn is a major issue for online streaming services such as over-the-top
(OTT) platforms (Madden et al., 1999b). Churn is the rate at which
subscribers to a particular OTT service cancel their subscriptions, and reducing it is
central to customer retention for OTT services. Churn is a complex issue, with potential
well as the effectiveness of marketing and promotional efforts. A high churn rate can
indicate that customers are not satisfied with the service, or that the marketing efforts
are not effective. It can also indicate that the service is too expensive, or that there
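Churn rate, as defined above, is normally measured over a fixed period. The sketch below is a minimal illustration, assuming the common convention of dividing cancellations during a period by the number of subscribers at the start of that period (other conventions use the average subscriber count). The function name and the figures are hypothetical, not taken from the project's data.

```python
def churn_rate(subscribers_at_start: int, cancellations: int) -> float:
    """Fraction of the subscribers present at the start of a period who cancelled during it.

    Assumes the start-of-period convention; some teams divide by the
    average subscriber count over the period instead.
    """
    if subscribers_at_start <= 0:
        raise ValueError("subscribers_at_start must be positive")
    return cancellations / subscribers_at_start


# Hypothetical monthly figures for an OTT service.
rate = churn_rate(subscribers_at_start=20_000, cancellations=900)
print(f"Monthly churn: {rate:.1%}")
```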
Research Questions
Learning?
Organization of Project
the goal is to explore how Machine Learning algorithms can be used to decrease
the existing churn analysis of various industries. Chapter 3 will provide research
CHAPTER TWO
LITERATURE REVIEW
To learn more about this issue and come up with a workable solution,
research and implementations done in the past by other authors were examined.
The findings related to research question Q1 are presented in this section: Q1. What
Agrawal et al., (2018) discussed the problem of customer churn and analyzed
previous works to identify gaps in the solutions implemented. Agrawal et al., (2018)
predict customer churn in the Syriatel telecom company. The XGBoost algorithm,
which had an AUC value of 93.301%, produced the best results. XGBoost
continued to produce the highest results, with an AUC of 89%, when tested on a fresh
dataset pertaining to various time periods. Churn prediction outcomes in the
telecom business have been shown to improve with the use of Social Network
used in churn prediction in various industries, which is a part of our project but their
De Caigny et al., (2018) aimed to explore the viability of using the Logistic
prediction. The results showed that LLM performed better than Logistic
Regression (LR) and Decision Tree (DT) used as standalone techniques, and at least
equally well as two homogeneous ensemble methods, Random Forest (RF) and
Logistic Model Tree (LMT). The LLM provides a comprehensible and actionable method,
and it can enrich both DT and LR by adding the coefficients of logistic
regression to the leaves of a decision tree and by fitting several logistic regressions
new feature set for predicting customer turnover in the telecom sector, which
includes call details, account information, bill information, and other forms of data.
used to predict customer churn in the financial industry, it lacks the methods in
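The hybrid idea described above for De Caigny et al. (2018), in which a decision tree segments customers and a logistic regression is fitted within each leaf, can be sketched with scikit-learn as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the class name `LeafLogisticHybrid`, the tree depth, the minimum leaf size, and the synthetic data are all hypothetical, and the original model's variable selection within each segment is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


class LeafLogisticHybrid:
    """Hypothetical sketch: a shallow tree segments the customers, then one
    logistic regression is fitted per leaf (in the spirit of a logit leaf model)."""

    def __init__(self, max_depth=2, min_samples_leaf=50):
        self.tree = DecisionTreeClassifier(
            max_depth=max_depth, min_samples_leaf=min_samples_leaf, random_state=0
        )
        self.leaf_models = {}

    def fit(self, X, y):
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)  # leaf index reached by every training row
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            if len(np.unique(y[mask])) < 2:
                # Pure leaf: store the constant probability of class 1 instead of a model.
                self.leaf_models[leaf] = float(y[mask][0])
            else:
                self.leaf_models[leaf] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])
        return self

    def predict_proba(self, X):
        """Return a 1-D array of churn (class 1) probabilities."""
        leaves = self.tree.apply(X)
        proba = np.empty(len(X))
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            model = self.leaf_models[leaf]
            if isinstance(model, float):
                proba[mask] = model
            else:
                proba[mask] = model.predict_proba(X[mask])[:, 1]
        return proba


# Synthetic stand-in data; labels are 0 (stayed) and 1 (churned).
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
hybrid = LeafLogisticHybrid(max_depth=2, min_samples_leaf=50).fit(X, y)
churn_proba = hybrid.predict_proba(X[:5])
print(churn_proba.round(3))
```

The design keeps the tree shallow so that each segment stays large enough for a stable logistic regression and remains easy to describe to a marketing team.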
To answer the research question Q3: How to retain subscribers and improve
customer targeting?
Ullah et al., (2019) aimed to build a churn prediction model for a telecom
company to improve its CRM and retain valuable customers. They used machine
learning techniques to analyze customer data and identify the main factors
contributing to churn. The results showed that the proposed model performed better
than other techniques and produced a better F-measure result of 88% using
Random Forest and J48. Sung Won Kim et al., (2019) also performed cluster
profiling to better understand the risk of churn for different groups of customers and
provided guidelines for customer retention. Their research focuses on
churn prediction models for customer retention in telecom, but it is less
Network model that identified attributes related to churn rate and achieved an
transformation, and selection to prepare data for four tree-based algorithms and
found that the XGBoost algorithm produced the best results. De Caigny et al., (2018)
explored the viability of using the logit leaf model (LLM) as a classification technique
and found that it performed better than other classifiers. Huang et al., (2012)
presented a new feature set for churn prediction and evaluated seven modeling
techniques, finding that the Logistic Regression and DT/SVM were suitable for
predicting true churn rate and false churn rate. Finally, Ullah et al., (2019) used
measure result of 88% using Random Forest and J48. These studies highlight the
importance of feature selection and modeling technique selection for effective churn
prediction in the telecom, financial, and social network industries; building on them, this project implements
CHAPTER THREE
RESEARCH METHODOLOGY
have been explained. To answer the research question: Q1. What Machine Learning
algorithms are used to overcome subscriber churn? From the literature review, this
research concludes that the Machine Learning algorithms used to overcome
Forest, Decision Trees, and Gradient Boosting Machines. These methods are applied
here to the OTT platform industry to predict subscriber churn. The following subsections
introduce the machine learning algorithms used for subscriber churn prediction and
Logistic Regression
classification problems is logistic regression. The sigmoid function is used to predict the
effective, and appropriate for small datasets. To comprehend the underlying causes
linear relationship between the input variables and the result. As a foundational
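The sigmoid step in logistic regression can be illustrated in a few lines of plain Python: a linear score is computed from the features and then squashed into a probability between 0 and 1. This is a minimal sketch; the two features and their coefficients are hypothetical, whereas a fitted model would learn the coefficients from training data (for example, by maximizing the log-likelihood).

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def churn_probability(features, weights, bias):
    """Logistic regression prediction: a linear score passed through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical coefficients for two illustrative features:
# months since last login, and number of support tickets.
weights = [0.9, 0.4]
bias = -3.0
p = churn_probability([2.0, 1.0], weights, bias)
predicted_churn = p >= 0.5  # default decision threshold
print(f"Churn probability: {p:.3f}, predicted churn: {predicted_churn}")
```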
Multi-Layer Perceptron
network, for a variety of tasks like classification, regression, and prediction. It has
many layers of neurons, which combine the inputs in a weighted way and then pass
them via an activation function. Using non-linear activation functions, MLP can
manage relationships between inputs and outputs that are not linear. It is an effective
tool for difficult classification and regression problems, but it can be computationally
expensive for big datasets and prone to overfitting. MLP is widely utilized in
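The MLP configuration reported in Chapter Four (a hidden layer of 100 neurons, ‘relu’ activation, the ‘adam’ solver, alpha of 0.0001, and max_iter of 1000) can be expressed as follows. The sketch assumes scikit-learn's `MLPClassifier`, whose hidden-layer parameter is named `hidden_layer_sizes`; the synthetic data, the 80/20 stratified split, and the `StandardScaler` step are assumptions added for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn data (roughly 20% churners).
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

mlp = make_pipeline(
    StandardScaler(),  # MLPs usually train better on standardized inputs
    MLPClassifier(
        hidden_layer_sizes=(100,),  # one hidden layer of 100 neurons
        activation="relu",
        solver="adam",
        alpha=0.0001,  # L2 regularization strength
        max_iter=1000,
        random_state=42,
    ),
)
mlp.fit(X_train, y_train)
accuracy = mlp.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Scaling is placed inside the pipeline so that its statistics are learned from the training split only.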
Random Forest
decision trees to increase accuracy and lessen the chance of overfitting. A group of
divide on at each node depending on a parameter like information gain or Gini index.
The outcome is then determined by the class that receives the most votes from the decision trees (De
In order to obtain high accuracy and lower the risk of overfitting, it integrates
numerous decision trees. It is helpful for churn prediction studies where data quality
may be an issue because it can manage missing data and is robust to outliers. Yet,
variables with more levels and may not be appropriate for small datasets (Ullah et al.,
2019).
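The split criteria (Gini index and information gain) and the voting step described above can be made concrete with a small plain-Python sketch. The labels form a hypothetical toy node (1 = churned, 0 = stayed). Note that scikit-learn's `RandomForestClassifier` averages the trees' predicted probabilities rather than counting hard votes, so `majority_vote` here illustrates the textbook hard-voting form.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def information_gain(parent, left, right):
    """Reduction in entropy obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_child_entropy = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted_child_entropy


def majority_vote(tree_predictions):
    """Hard voting: return the class predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical node: 1 = churned, 0 = stayed.
parent = [1, 1, 1, 0, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [0, 0, 0, 0]  # one candidate split
print(f"Gini of parent node: {gini(parent):.4f}")
print(f"Information gain of split: {information_gain(parent, left, right):.4f}")
print(f"Forest vote: {majority_vote([1, 0, 1, 1, 0])}")
```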
Decision Tree
depict decisions and their potential outcomes based on the characteristics of the
data. The tree structure is produced by recursively dividing the data into subsets
depending on the most significant predictors, and the algorithm selects the optimal
feature to split on based on a criterion like information gain or Gini index at each
variables can all be handled via decision trees. They can be unstable with slight
changes in data, are biased toward variables with more levels, and are prone to
overfitting. Some implementations also handle only binary outcomes, which may not be
useful in cases where there are several alternative outcomes, such as
learning algorithm that can be used for regression, classification, and ranking tasks.
GBM is capable of handling high-dimensional data and effectively reducing bias and
such as decision trees, in a stepwise manner. The algorithm starts with a simple
model and then iteratively improves upon it by fitting new models to the residual
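errors of the previous models. GBM minimizes a loss function that measures the
discrepancy between predicted and actual values by gradient descent in function space:
each new weak learner is fitted to the negative gradient of the loss with respect to the
current predictions, and its contribution is scaled by a learning rate. The process is repeated until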
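The stepwise residual-fitting procedure described above can be sketched for the simplest case, squared-error loss, where the negative gradient of the loss is simply the residual. This is an illustrative sketch, assuming NumPy and scikit-learn's `DecisionTreeRegressor` as the weak learner; the number of rounds, the learning rate, and the synthetic data are hypothetical. A churn classifier such as scikit-learn's `GradientBoostingClassifier` follows the same pattern but fits the gradient of the log-loss instead of raw residuals.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def gradient_boost_fit(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    """Gradient boosting with squared-error loss 0.5 * (y - F)^2.

    The negative gradient with respect to F is the residual y - F, so each
    round fits a shallow regression tree to the current residuals.
    """
    init = float(np.mean(y))  # initial constant model
    pred = np.full(len(y), init)
    trees, losses = [], []
    for _ in range(n_rounds):
        residuals = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, residuals)
        pred += learning_rate * tree.predict(X)  # small step in function space
        trees.append(tree)
        losses.append(float(np.mean((y - pred) ** 2)))
    return init, trees, losses


def gradient_boost_predict(X, init, trees, learning_rate=0.1):
    """Sum the initial constant and the scaled predictions of every tree."""
    pred = np.full(X.shape[0], init)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred


rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)
init, trees, losses = gradient_boost_fit(X, y)
print(f"Training MSE after round 1: {losses[0]:.3f}, after round {len(losses)}: {losses[-1]:.3f}")
```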
To answer the research question Q2: How to predict subscribers’ churn in
will be collected from the OTT platform. The data may also include customer
feedback, preferences, and engagement with the platform. The collected data will be
engineering techniques will also be applied to create new features that can be useful
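As an illustration of the feature-engineering step, the sketch below derives per-subscriber usage features (session count, average watch time, and recency) from a raw session log with pandas. The table, its column names (`user_id`, `watch_minutes`, `session_date`), and the snapshot date are all hypothetical; the project's Kaggle dataset may already contain aggregated features.

```python
import pandas as pd

# Hypothetical viewing-session log; column names are illustrative only.
sessions = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "watch_minutes": [45, 30, 60, 10, 5, 120],
    "session_date": pd.to_datetime([
        "2023-01-02", "2023-01-10", "2023-01-28",
        "2023-01-03", "2023-01-04", "2023-01-30",
    ]),
})
snapshot = pd.Timestamp("2023-02-01")  # date at which features are computed

# One row per subscriber, built with named aggregations.
features = sessions.groupby("user_id").agg(
    n_sessions=("watch_minutes", "count"),
    avg_watch_minutes=("watch_minutes", "mean"),
    last_session=("session_date", "max"),
)
# Recency: days since the subscriber's most recent session.
features["days_since_last_session"] = (snapshot - features["last_session"]).dt.days
features = features.drop(columns="last_session")
print(features)
```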
Data Collection
To perform churn analysis, there are several characteristics that are important for subscriber
1. User demographics: Age, gender, location, and other demographic data can
2. Usage patterns: Information on how frequently and how long a user accesses
the service, which features are used, and when they are used can provide
subscription, their billing history, and their payment method can be used to
predict churn.
4. Content preferences: The types of content a user consumes, the ratings and
reviews they provide, and their engagement with content can be used to
predict churn.
6. Technical data: Information on the user's device type, network quality, and
It is important to note that not all these characteristics may be relevant for
every OTT service or every subscriber churn analysis. The specific characteristics
that are most important may depend on the type of service, the target audience, and
the specific factors that drive churn for that service. Since the research papers
discussed in the literature review were from various industries like telecom, financial,
and social networks, it is not appropriate to use their datasets. So, after searching
for OTT datasets on repository websites such as UCI, Kaggle, Data.gov,
Google Dataset Search, Data.world, Reddit, and OpenML, the churn modeling
dataset was obtained from the Kaggle repository, which satisfies the requirements to be used to
To answer the research question Q3: How to retain subscribers and improve
customer targeting?
Based on the insights gained from the models, customer retention strategies
models and retention strategies will be implemented in the OTT platform and
improve the performance of the models and strategies. The effectiveness of the
metrics such as customer churn rate, customer lifetime value, and revenue. The
evaluation will help in identifying the strengths and weaknesses of the implemented
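Customer lifetime value, one of the evaluation metrics listed above, is often approximated with a simple formula when churn is assumed constant: the monthly margin per subscriber divided by the monthly churn rate, since the expected lifetime under a constant monthly churn rate c is 1/c months. The sketch below uses that simplified formula with hypothetical prices and margins; it ignores discounting and any change in revenue over time.

```python
def simple_clv(monthly_revenue_per_user: float, gross_margin: float, monthly_churn_rate: float) -> float:
    """Simplified customer lifetime value: monthly margin times expected lifetime.

    Assumes a constant monthly churn rate, so expected lifetime is
    1 / monthly_churn_rate months; no discounting is applied.
    """
    if not 0 < monthly_churn_rate <= 1:
        raise ValueError("monthly_churn_rate must be in (0, 1]")
    return monthly_revenue_per_user * gross_margin / monthly_churn_rate


# Hypothetical plan: $12 per month at a 40% gross margin.
clv_before = simple_clv(12.0, 0.40, monthly_churn_rate=0.05)
clv_after = simple_clv(12.0, 0.40, monthly_churn_rate=0.04)
print(f"CLV at 5% churn: ${clv_before:.2f}; at 4% churn: ${clv_after:.2f}")
```

Under these assumptions, lowering monthly churn from 5% to 4% raises the estimated value of each subscriber by a quarter, which is one way to express the revenue effect of a retention strategy.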
In summary, choosing the best algorithm for churn analysis depends
on various factors such as data size and quality, problem complexity, desired
logistic regression, which is computationally efficient and easy to interpret. For larger
datasets, random forests and gradient boosting can handle missing data and outliers
but may be slower. Performance metrics such as accuracy, precision, recall, and F1
considered. Ensemble methods like random forests and gradient boosting can be
used if no single algorithm performs well on the data. Ultimately, the best algorithm is
characteristics of the problem and the data. The following chapter will begin with an
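The performance metrics listed above can be computed directly from true and predicted labels. This plain-Python sketch (function name and data hypothetical) also shows why accuracy alone can mislead on imbalanced churn data: a model that never predicts churn still reaches 90% accuracy when 10% of subscribers churn, yet its precision, recall, and F1 are all zero. Precision is reported as 0 when no positive predictions are made, which is a common convention.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test set: 10 churners (1) and 90 non-churners (0).
y_true = [1] * 10 + [0] * 90
always_stay = [0] * 100  # a "model" that never predicts churn
print(classification_metrics(y_true, always_stay))
```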
CHAPTER FOUR
This chapter presents the detailed data description and the system
requirements needed to run the machine learning program. Finally, the
experimental results of the models are discussed. The dataset, which
System Requirements
Preprocessing the dataset, implementing the solution, and training the models
were all carried out in a Jupyter notebook. Jupyter Notebook version 6.1.4 was used,
and Python 3.8 was the programming language. Data science and
language. For activities like data preprocessing, model training, and evaluation, it
has a sizable and vibrant community that offers considerable help and resources for
Implementation
Experimental Results
In this project, after uploading the dataset to the environment, two variables,
"X" and "Y," were created: all of the features of the dataset were allocated to "X,"
with the exception of the column "Target," which was assigned to "Y." After the
dataset was divided, the appropriate algorithm was employed, in this instance, logistic
imported. After the training data had been fitted to the model, predictions were made
on the testing dataset. This produced the Classification Report in Fig.1 and an accuracy
of 86.50%. The AUROC value of the model is 0.539, as shown in Fig.6.
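The workflow described above (every column except "Target" in X, "Target" in Y, a train/test split, fitting, and evaluation on the test set) can be sketched with pandas and scikit-learn. The split ratio is not stated in the text, so the 80/20 stratified split here is an assumption, as are the synthetic stand-in data and its column names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Kaggle churn data; feature names are hypothetical.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 60, n),
    "monthly_fee": rng.uniform(5, 20, n),
    "support_calls": rng.poisson(1.5, n),
})
logit = -1.0 - 0.05 * df["tenure_months"] + 0.6 * df["support_calls"]
df["Target"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

# All columns except "Target" go into X; "Target" goes into Y.
X = df.drop(columns="Target")
Y = df["Target"]
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, stratify=Y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, Y_train)
Y_pred = model.predict(X_test)
Y_score = model.predict_proba(X_test)[:, 1]  # churn probabilities, used for AUROC

print(classification_report(Y_test, Y_pred))
print(f"Accuracy: {accuracy_score(Y_test, Y_pred):.4f}")
auc = roc_auc_score(Y_test, Y_score)
print(f"AUROC: {auc:.3f}")
```

AUROC is computed here from predicted probabilities rather than hard 0/1 labels, because the ROC curve is traced by varying the decision threshold over those scores.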
hidden_layer, activation, solver, alpha, and max_iter, with the values 100,
‘relu’, ‘adam’, 0.0001, and 1000, respectively. After the training data had been fitted to the
model, the outcome was predicted using the testing dataset. This produced the
Classification Report in Fig.2 and an accuracy of 87.50%. The AUROC value of the
This produced the Classification Report in Fig.3 and an accuracy of 91%. The AUROC
datasets into the model, the outcome was predicted using the testing dataset. This produced the
trained datasets into the model, the outcome was predicted using the testing dataset. This produced
the Classification Report in Fig.5 and an accuracy of 93.25%. The AUROC
criteria, the true positive rate (TPR) is plotted versus the false positive rate (FPR).
true positives (positive samples that were properly predicted to be positive) among
all positive samples. The FPR is defined as the percentage of false positives
(negative samples that were mistakenly predicted to be positive) among all negative
samples.
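The TPR and FPR definitions above, combined with a sweep of the decision threshold, are enough to build the ROC curve and its area. The plain-Python sketch below uses each distinct score as a threshold, computes TPR and FPR at each one, and approximates the area with the trapezoidal rule. The labels and scores are hypothetical; sweeping the threshold upward or downward traces the same curve, and in practice scikit-learn's `roc_curve` and `roc_auc_score` perform this computation.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold through every distinct score."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auroc(y_true, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(y_true, scores)
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


# Hypothetical labels (1 = churned) and model scores.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(f"AUROC: {auroc(y_true, scores):.4f}")
```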
determine AUROC, and the threshold is steadily raised. TPR and FPR are calculated
for each threshold and plotted on the ROC curve. The AUROC score, which goes
classification, whereas a score of 0.5 shows that the classifier performs no better
than random guessing. Unlike threshold-dependent measures such as accuracy, precision, and
recall, the AUROC does not depend on the choice of threshold and is less sensitive to class imbalance. This
changed to maximize the statistic that matters most in the particular context when
the costs of false positives and false negatives differ. The AUROC of the model used
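When the costs of false positives and false negatives differ, the decision threshold can be chosen to minimize total misclassification cost instead of being left at 0.5. A minimal plain-Python sketch, assuming hypothetical per-error costs (for example, the cost of an unnecessary retention offer versus the revenue lost from an undetected churner) and hypothetical toy scores:

```python
def expected_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Total misclassification cost when churn is predicted for score >= threshold."""
    cost = 0.0
    for y, s in zip(y_true, scores):
        predicted_churn = s >= threshold
        if predicted_churn and y == 0:
            cost += cost_fp  # e.g. a retention offer sent to a loyal subscriber
        elif not predicted_churn and y == 1:
            cost += cost_fn  # e.g. revenue lost from an undetected churner
    return cost


def best_threshold(y_true, scores, cost_fp, cost_fn):
    """Lowest-cost threshold among the distinct scores (plus 'never predict churn')."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda t: expected_cost(y_true, scores, t, cost_fp, cost_fn))


y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(best_threshold(y_true, scores, cost_fp=1.0, cost_fn=1.0))   # equal costs
print(best_threshold(y_true, scores, cost_fp=10.0, cost_fn=1.0))  # expensive offers
```

Making false positives ten times more expensive pushes the threshold up, so only the subscribers the model is most confident about are targeted.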
CHAPTER FIVE
Discussion
churn?
Learning?
After the results are discussed and a conclusion is drawn, suggestions for
areas of further study follow. The findings and conclusions for the research
questions are as follows. In reference to the 1st question, from the literature review and
experimental results, the research concludes that the Machine Learning
Moving on to the 2nd question, by passing the test data to a model trained on the
historical dataset, it is possible to predict which customers are likely to leave a company (i.e., churn) based on
Lastly, referring to the 3rd question, here are some strategies that can be used
based on the insights obtained from churn prediction analysis: Identify at-risk
models can help identify subscribers who are at risk of churning. Once these
subscribers are identified, targeted retention strategies can be developed to address
their specific needs and concerns. Moving on to personalized offers and promotions
can help companies develop personalized offers, discounted subscription plans, free
Churn prediction models can help identify common issues that lead to
subscriber churn, such as poor customer service or long wait times. Companies can
use this information to improve their customer service and address subscriber
concerns in a timely manner. By rewarding loyal subscribers with exclusive offers and
discounts, companies can increase subscriber retention and reduce churn. The
models can help identify which subscribers are most likely to respond to loyalty
programs and what types of rewards are most effective. Analyzing subscriber behavior
and subscription history can help companies optimize their pricing strategies.
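Identifying at-risk subscribers, the first strategy above, usually amounts to ranking subscribers by predicted churn probability and targeting those above a cutoff, within the available retention budget. The sketch below is illustrative only: the function name, the 0.5 cutoff, the budget, and the subscriber IDs and probabilities are hypothetical.

```python
def select_at_risk(churn_probabilities, budget, min_probability=0.5):
    """Return subscriber IDs to target, highest predicted churn risk first.

    churn_probabilities: mapping of subscriber ID -> predicted churn probability.
    budget: maximum number of subscribers the retention campaign can reach.
    """
    at_risk = [(sid, p) for sid, p in churn_probabilities.items() if p >= min_probability]
    at_risk.sort(key=lambda item: item[1], reverse=True)
    return [sid for sid, _ in at_risk[:budget]]


# Hypothetical model output.
scores = {"u01": 0.92, "u02": 0.15, "u03": 0.67, "u04": 0.55, "u05": 0.81}
print(select_at_risk(scores, budget=3))
```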
Conclusion
and the Gradient Boosting Machine model achieved the highest accuracy (93.25%) and
the maximum AUROC (0.793). Therefore, in churn prediction,
GBM can be used to identify the most significant predictors of churn and build a
model that predicts which subscribers are at a high risk of leaving. GBM can handle
many features, including both numerical and categorical variables, making it suitable
for churn prediction analysis. The algorithm can also deal with imbalanced datasets
and handle missing values, which are common in real-world churn prediction
scenarios. GBM can provide accurate and robust predictions, making it a powerful
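Using GBM to identify the most significant predictors of churn can be sketched with scikit-learn's `GradientBoostingClassifier`, which exposes impurity-based feature importances after fitting. The synthetic data and feature names are hypothetical, and the hyperparameters shown are illustrative rather than the project's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data: with shuffle=False, the 3 informative features come first.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0)
gbm.fit(X, y)

# Rank predictors by impurity-based importance (the importances sum to 1).
ranking = sorted(zip(feature_names, gbm.feature_importances_), key=lambda t: t[1], reverse=True)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances can favor features with many distinct values; `permutation_importance` from `sklearn.inspection` is a common cross-check on held-out data.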
Areas for Further Study
Churn prediction is a complex and dynamic field, and there are several areas
of further study that could help improve the accuracy and effectiveness of churn
sources, such as social media and customer reviews, into churn prediction models.
Another area of study is the use of deep learning techniques, such as convolutional
neural networks (CNNs) and recurrent neural networks (RNNs), to improve the
sources, such as clickstream data and mobile app usage data, could enable more
accurate and timely identification of customers at risk of churn. Overall, there are
many exciting areas of further study in churn prediction, and continued research and
innovation in this field have the potential to greatly improve customer retention
REFERENCES
Agrawal, S., Das, A., Gaikwad, A., & Dhage, S. (2018). Customer Churn Prediction
Ahmad, A. K., Jafar, A., & Aljoumaa, K. (2019). Customer churn prediction in
telecom using machine learning in big data platform. Journal of Big Data, 6(1).
https://doi.org/10.1186/s40537-019-0191-6
De Caigny, A., Coussement, K., & De Bock, K. W. (2018). A new hybrid classification
https://doi.org/10.1016/j.ejor.2018.02.009
Huang, B., Kechadi, M. T., & Buckley, B. (2012). Customer churn prediction in
https://doi.org/10.1016/j.eswa.2011.08.024
Kuldeep, C., Rojhe, V., Singh, N., & Rao, A. (n.d.). {YICCISS-2021} HJ Emerging
www.houseofjournals.com
Madden, G., Savage, S. J., & Coble-Neal, G. (1999a). Subscriber churn in the
https://doi.org/10.1016/S0167-6245(99)00015-3
Madden, G., Savage, S. J., & Coble-Neal, G. (1999b). Subscriber churn in the
https://doi.org/10.1016/S0167-6245(99)00015-3
Ullah, I., Raza, B., Malik, A. K., Imran, M., Islam, S. U., & Kim, S. W. (2019). A Churn