Churn Buster: Uncovering Patterns and Predicting Churn in OTT Platforms
Dipti Vatsa, Sunita Adarsh Yadwad, Shreya M. Dholariya
Department of Computer Science and Engineering
Parul Institute of Engineering and Technology
2024 Parul International Conference on Engineering and Technology (PICET) | 979-8-3503-6974-8/24/$31.00 ©2024 IEEE | DOI: 10.1109/PICET60765.2024.10716193
Abstract—Churn, a common concern among business leaders, poses a significant challenge in today's fast-paced world where consumer attention spans are short. This research focuses on churn analysis within the OTT streaming platform sector, essential for retaining subscribers and ensuring business longevity. Utilising data preprocessing methods and machine learning algorithms, this study aims to identify factors influencing churn and provide insights for targeted retention strategies. The dataset includes 16 variables, with churn serving as the binary dependent variable. Data preprocessing involved addressing missing values and converting categorical data into numerical format. Four machine learning models (Random Forest, XGBoost, K-Nearest Neighbors, and Support Vector Machine) were applied, with Random Forest achieving the highest accuracy of 89%. Evaluation metrics such as confusion matrices, ROC curves, and Precision-Recall curves were used to assess model performance, revealing Random Forest's effectiveness in churn prediction. Principal Component Analysis (PCA) was employed to handle the complexity of user data, aiding in the identification of influential churn factors. This study emphasises the importance of understanding customer behaviour and utilising machine learning to forecast subscription choices. The insights gained provide valuable guidance to OTT companies for enhancing customer satisfaction and retention strategies, ensuring competitiveness in a dynamic industry landscape.

Keywords—Churn analysis, OTT streaming platforms, Data preprocessing, Machine learning algorithms, Random Forest, XGBoost, K-Nearest Neighbors, Support Vector Machine, Evaluation metrics, Principal Component Analysis (PCA), Customer behavior, Retention strategies.

I. INTRODUCTION

In the contemporary high-speed business landscape, accurately predicting churn rates is essential, particularly for companies in the rapidly expanding Over-The-Top (OTT) industry. Failing to manage churn effectively can lead to significant financial losses and hinder a platform's ability to compete in a crowded market. Customer retention is crucial for business sustainability, impacting a company's reputation and market position. Understanding and addressing churn rates, which measure customer disengagement over time, are especially vital for OTT platforms reliant on subscriber renewals for financial stability. Churn analysis within the OTT sector encompasses various strategies, including analysing subscriber behaviour, calculating churn rates, implementing personalised marketing, and enhancing user experience. The goal is to identify the factors driving subscriber attrition and develop data-driven strategies to mitigate it, ensuring financial stability and competitiveness. This research paper aims to predict customer retention rates using data visualisation and emphasises the importance of churn rate prediction in determining overall company success.

II. LITERATURE SURVEY

Numerous studies across various industries investigate customer churn's impact on revenue. In the Malaysian telecommunications sector, low Net Promoter Scores (NPS) were linked to churn, highlighting the role of prompt helpdesk support in enhancing satisfaction. Utilizing CART algorithms, a churn prediction model achieved 98% accuracy [1]. In Korean influencer commerce, a Decision Tree algorithm predicted churn with 90% accuracy. Chinese telecoms employed segmentation, revealing a logistic regression model with 93.5% accuracy [2]. XGBoost showed significant improvement in churn prediction accuracy, from 48.1% to 82.1% [3]. A study applied the RFM model to UK web retailer data, successfully identifying churn risk by segmenting customers based on purchasing behavior [4]. Interpretability using SHAP values in machine learning models for churn prediction was emphasized [5]. In the mobile gaming industry, survival analysis and Random Forest provided insights for churn modeling [6]. A novel churn prediction method based on social network analysis was proposed in the telecom sector [7]. XGBoost was favored over SVM, Random Forest, and AdaBoost for airline churn prediction [8]. Deep learning models were introduced for airline churn prediction, emphasizing parameter optimization [9]. Support Vector Machine with balanced data sampling delivered optimal results in the banking sector [10].
Challenges in telecom churn prediction included imbalanced data and real-time prediction [11]. In digital banking, a combined model achieved 87% accuracy for early churn detection [12]. Logistic regression was utilized in the Chinese telecom industry for churn prediction [13]. Logistic regression performed best with balanced data in health insurance churn prediction [14]. Neural networks achieved 99.99% accuracy in ISP churn prediction in Surabaya [15]. A separate investigation developed a churn risk prediction mechanism for cloud service providers using random forest [16]. Integrating customer feedback into churn prediction analysis improved prediction accuracy [17].

III. METHODOLOGY

In contemporary research and practical applications, the systematic analysis of data serves as a fundamental pillar for informed decision-making. In this study, a comprehensive exploration of the data analysis process is undertaken, aiming to elucidate its significance and methodologies. Through a structured study flow, a detailed understanding of the sequential steps involved in data analysis, from initial data collection to the interpretation and reporting of findings, is provided.

4) Feature Selection: Shifts focus to feature selection to identify the most relevant variables, using methods like ANOVA and PCA to prioritize informative features while reducing dimensionality (a short code sketch follows this list).
5) Model Implementation: Implements predictive models such as Random Forest, XGBoost, KNN, and SVM, training them on the dataset to extract insights and make predictions.
6) Model Evaluation: Evaluates the models' performance using standard metrics like accuracy, precision, recall, and F1 score to gauge their effectiveness.
7) Data Visualization: Generates visual representations of data findings using tools like Power BI to enhance comprehension and streamline communication of insights.
8) Interpretation & Reporting: Interprets analyzed data and reports key findings to stakeholders, emphasizing clear and concise reporting for informed decision-making and strategic planning.
B. Algorithms

The analysis of customer churn in this study entails the utilization of four distinct algorithms, listed as follows:
● Extreme Gradient Boosting (XGBoost)
● Random Forest
● K-Nearest Neighbors
● Support Vector Machine
outlier detection. Its ability to provide feature importance rankings and handle missing values adds to its appeal, making it a favored option among data scientists and practitioners.

Training Flow: In training, Random Forest constructs decision trees independently, optimizing subsets of the training data through bagging [17]. After constructing all decision trees, predictions are combined or aggregated, typically by averaging for regression tasks or using the mode for classification tasks.

Purpose: Random Forest is primarily used for supervised learning tasks like classification and regression, aiming to provide accurate predictions while mitigating overfitting and handling high-dimensional data. It is valued for its flexibility, scalability, and user-friendly nature, making it a popular choice for interpretable models.

Challenges: Despite its effectiveness, Random Forest may face challenges such as managing computational complexity, adjusting hyperparameters, and interpreting complex interactions between features. These challenges can require meticulous experimentation and may limit interpretability in some scenarios. Nonetheless, Random Forest remains a widely used and powerful algorithm in machine learning due to its robustness and versatility.
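As an illustrative sketch only (not the authors' exact configuration), a Random Forest of the kind described above can be trained and inspected with scikit-learn; the hyperparameter values and the variables X_reduced and y (from the feature-selection sketch earlier) are assumptions.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumes X_reduced and y were produced by the feature-selection sketch above.
X_train, X_test, y_train, y_test = train_test_split(
    X_reduced, y, test_size=0.2, stratify=y, random_state=42)

# Each tree is grown on a bootstrap sample (bagging); class predictions are
# aggregated across trees by majority vote.
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))
print("Feature importances:", rf.feature_importances_)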
3) K-Nearest Neighbors: K-Nearest Neighbors (KNN) is a simple yet powerful algorithm used for both classification and regression tasks in machine learning. It predicts the class or value of a new data point based on the majority vote or average of its nearest neighbors in the feature space. Unlike other algorithms, KNN does not require an explicit training phase but instead relies on stored instances from the training dataset.

Training Flow: In KNN, the entire training dataset is stored initially. When a new data point is encountered, distances to all training points are computed using a specified metric (e.g., Euclidean distance), and the K nearest neighbors are identified. For classification, the majority class among these neighbors is predicted; for regression, the average of their target values is forecast.

Purpose: KNN serves both classification and regression tasks within supervised learning. Its primary purpose is to provide predictions based on the similarity of instances in the training data. KNN is particularly useful when decision boundaries are complex and non-linear, and when interpretability is important, given its reliance on the local structure of the data to generate predictions [17].

Challenges: KNN faces challenges in computational efficiency, especially with large datasets, as it requires calculating distances between the query point and all training points. Selecting the optimal value of K and the appropriate distance metric can also be challenging and may require careful experimentation. Additionally, KNN's performance may degrade in noisy feature spaces or with irrelevant features. Nonetheless, it remains widely used, particularly where interpretability and simplicity are valued.
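A minimal KNN sketch with scikit-learn, reusing the train/test split assumed above; feature scaling is included because distance-based methods are sensitive to feature ranges, and K = 5 is an arbitrary illustrative choice rather than the paper's tuned value.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Distances to all stored training points are computed (Euclidean by default)
# and the majority class among the K nearest neighbours is returned.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
print("KNN test accuracy:", knn.score(X_test, y_test))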
4) Support Vector Machine: SVM, a widely used supervised learning algorithm in machine learning, is primarily utilized for classification tasks. Its operation involves identifying the optimal hyperplane to separate data points of different classes, with the objective of maximizing the margin between classes while minimizing classification errors. Renowned for its ability to manage both linearly and non-linearly separable data using various kernel functions, SVM demonstrates robustness against overfitting and presents strong generalization capabilities.

Training Flow: SVM selects a kernel function and initializes model parameters. It identifies the hyperplane that maximizes the margin between classes and adjusts parameters to minimize errors. For non-linear data, SVM uses the kernel trick to map it into a higher-dimensional space for separation. SVM iteratively updates parameters until convergence.

Purpose: SVM serves in binary and multi-class classification tasks, aiming to find the hyperplane that maximizes class separation for better generalization performance. It is effective for linearly separable data and can transform non-linear data into separable forms using the kernel trick, making it suitable for various applications.

Challenges: Despite its effectiveness, SVM faces challenges such as computational complexity, especially with large datasets and high-dimensional spaces. Performance may degrade with imbalanced data or non-linearly separable classes, requiring techniques like class weighting or kernel parameter tuning. Additionally, SVM's decision boundary is defined by support vectors, limiting interpretability [17]. Nonetheless, SVM remains a powerful tool for optimal class separation.

C. Dataset Description

The dataset, sourced from Kaggle, contains 2000 records and 16 columns about OTT platform users. Key features include:
● User Information: Attributes like customer ID, phone number, gender, and age.
● Subscription Details: Data such as days subscribed, multiscreen usage, and email subscription status.
● Viewing Behavior: Metrics like weekly minutes watched, daily viewing times, and videos watched.
● Customer Support: Information about support calls and days of inactivity.
● Churn Status: Indicator for users who discontinued subscriptions.
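Tying the dataset description to the modelling steps, the sketch below shows one plausible end-to-end flow: loading the Kaggle data, dropping identifiers, filling missing values, encoding categoricals, and comparing the four classifiers on held-out data. The file name, column names, and hyperparameters are assumptions for illustration, not the authors' exact setup, and availability of scikit-learn and xgboost is assumed.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

df = pd.read_csv("ott_churn.csv")                                    # hypothetical file name
df = df.drop(columns=["customer_id", "phone_no"], errors="ignore")   # identifiers carry no predictive signal
df = df.fillna(df.median(numeric_only=True))                         # simple missing-value handling for numeric columns

y = df["churn"]                                                      # binary dependent variable (assumed 0/1)
X = pd.get_dummies(df.drop(columns=["churn"]), drop_first=True)      # categorical -> numerical

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf", probability=True, random_state=42),
}

for name, model in models.items():
    model.fit(X_train_s, y_train)
    pred = model.predict(X_test_s)
    print(f"{name}: acc={accuracy_score(y_test, pred):.3f} "
          f"prec={precision_score(y_test, pred):.3f} "
          f"rec={recall_score(y_test, pred):.3f} "
          f"f1={f1_score(y_test, pred):.3f}")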
Fig. 3. SMOTE
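Figure 3 references SMOTE, which suggests oversampling was used to balance the churned and retained classes before training; the snippet below is a hedged sketch of that step using the imbalanced-learn package (an assumption, since the text itself does not name the library), applied to the training split from the end-to-end sketch above.

import pandas as pd
from imblearn.over_sampling import SMOTE

# SMOTE synthesises new minority-class (churn) samples by interpolating between
# existing minority points; it should be applied to the training data only.
smote = SMOTE(random_state=42)
X_train_bal, y_train_bal = smote.fit_resample(X_train_s, y_train)
print(pd.Series(y_train).value_counts())
print(pd.Series(y_train_bal).value_counts())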
Fig. 7. Precision-Recall curve for Random Forest
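Figure 7 shows the Precision-Recall curve for Random Forest; the abstract also mentions confusion matrices and ROC curves. A short sketch of computing these with scikit-learn and matplotlib, reusing the fitted models and test split from the end-to-end sketch above (variable names are assumptions), could look like this.

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, roc_auc_score, precision_recall_curve

rf_model = models["Random Forest"]                 # fitted in the end-to-end sketch above
proba = rf_model.predict_proba(X_test_s)[:, 1]     # probability of the positive (churn) class

print(confusion_matrix(y_test, rf_model.predict(X_test_s)))
print("ROC AUC:", roc_auc_score(y_test, proba))

precision, recall, _ = precision_recall_curve(y_test, proba)
plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall curve (Random Forest)")
plt.show()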
Fig. 9. Dashboard
and generalization. Investigating advanced sampling and dimensionality reduction approaches may lead to more effective churn prediction models for OTT platforms. This study provides a foundation for future work addressing customer churn challenges in OTT platforms using advanced machine learning methods.
ACKNOWLEDGMENT
We extend our heartfelt thanks to Professor Dr. Sunita
Adarsh Yadwad and Assistant Professor Shreya Dholariya for
their invaluable guidance and support. We are grateful to the
Department of Computer Science and Engineering (CSE) at
Parul University for fostering research opportunities. Our
appreciation also goes to the research participants and
colleagues for their cooperation and feedback.
REFERENCES
[1] Mustafa, N., Ling, L.S. & Razak, S.F.A., 2021. Customer Churn
Prediction for Telecommunication Industry: A Malaysian Case Study,
F1000Research 2021 10(1274), pp. 20.
[2] Kassem, E.A., Hussein, S.A., Mostafa, A. & Alsheref, F.K., 2020,
Customer Churn Prediction Model and Identifying Features to Increase
Customer Retention based on User Generated Content, (IJACSA)
International Journal of Advanced Computer Science and Applications
11(5), pp. 522-531
[3] Vaudevan, M., Narayanan, R.S., Fatima, S. & Abhishek., 2021,
Customer Churn Analysis using XGBoosted decision trees, Indonesian
Journal of Electrical Engineering and Computer Science 25(1), pp. 488-
495.
[4] Bagul, N., Berad, P., Surana, P. & Khachane, C., 2021, Retail Customer
Churn Analysis using RFM Model and K-Means Clustering,
International Journal of Engineering Research & Technology (IJERT)
10(3), pp. 349-354.
[5] Guliyev, H. & Tatoğlu, F.Y., 2021. Customer Churn Analysis in
Banking Sector: Evidence from explainable machine learning models,
Journal of Applied Microeconometrics (JAME) 1(2), pp. 85-99.
[6] Wang, G.Y., 2022. Churn Prediction for High-Value Players in
Freemium Mobile Games: Using Random Under-Sampling, Statistika
2020 102(4), pp. 443- 453.
[7] Kosti´c, S.M., Simi´c , M.I. & Kosti´c, M. V., 2020. Social Network
Analysis and Churn Prediction in Telecommunications Using Graph
Theory, Entropy 2020 22(753), pp. 23.
[8] Cheng, X., Ran, J., 2021. Airline Customer Value Analysis and
Customer Churn Prediction Based on LRFMC Model and K-Means
Algorithm, 2nd International Conference on Computer Science and
Management Technology (ICCSMT), Wuhan, China, 2021. IEEE.
[9] State Key Laboratory of Integrated Services Networks, Xidian
University, 2020. Xi’an 710071,China.
[10] Karvana, K.G.M., Yazid, S., Syalim, A. & Mursanto, P., 2019.
Customer Churn Analysis and Prediction Using Data Mining Models in
Banking Industry, International Workshop on Big Data and Information
Security (IWBIS), Depok, 2019. IEEE.
[11] Faculty of Computing and Informatics Multimedia University, 2019.
Cyberjaya Selangor, Malaysia.
[12] Galal, M., Rady, S. & Aref, M., 2022. Enhancing Customer Churn
Prediction in Digital Banking using Ensemble Modeling, 4th Novel
Intelligent and Leading Emerging Sciences Conference (NILES), Cairo,
Egypt, 2022. IEEE.
[13] Masuichi, H., Onishi, T., Satake, K., Wang, Y., 2015. Improving Churn
Prediction with Voice of the Customer. Available at:
https://www.anlp.jp/proceedings/annual_meeting/2015/pdf_dir/B6-
1.pdf. [Accessed 27 February 2023].
[14] Jamjoon, A.A., 2021. The Use of Knowledge Extraction in Predicting
Customer Churn in B2B. Available at:
https://journalofbigdata.springeropen.com/articles/10.1186/s40537-
021-00500-3. [Accessed 27 February 2023].
[15] Yudhistira, A.R. & Sampoa, F., 2016. Customer Churn Analysis With
Back-Propagation Neural Network: Case Internet Service Provider Xyz
Available at: https://ijret.org. [Accessed 27 February 2023].
[16] Saias, J., Rato, L. & Goncalves, T., 2022. An Approach to Churn
Prediction for Cloud Services Recommendation and User Retention
(Updated 23 April 2022). Available at: https://www.mdpi.com/2078-2489/13/5/227. [Accessed 27 February 2023].
[17] McKinney, W., 2012. Python for Data Analysis. O'Reilly Media, Inc.