International Journal of Scientific Research in Engineering and Management (IJSREM)

Volume: 09 Issue: 05 | May - 2025 SJIF Rating: 8.586 ISSN: 2582-3930

CUSTOMER CHURN PREDICTION USING MACHINE LEARNING


JENIFA J¹, ADHITHIYA S², DINAKARA PANDIAN B³, MANIKANDAN G⁴, KEERTHIVASAN V⁵
¹ Assistant Professor, Department of Information Technology, Kings Engineering College, India.
²,³,⁴,⁵ Department of Information Technology, Kings Engineering College, India.

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - In today’s highly competitive telecom sector, customer churn (the loss of clients to competitors) poses a major threat to revenue and growth. This project tackles churn prediction using machine learning, focusing on the Random Forest algorithm to identify customers likely to leave. The Telco Customer Churn dataset, containing customer demographics, service usage, and account details, serves as the foundation. The workflow begins with exploratory data analysis (EDA) to uncover key trends and indicators of churn. A robust preprocessing pipeline is then applied, including handling missing data, encoding categories, scaling, and addressing class imbalance. Random Forest is chosen for its accuracy and interpretability, and its performance is compared against models like Logistic Regression, SVM, and XGBoost using metrics such as precision, recall, F1-score, and ROC-AUC. Results show that contract type, tenure, and billing-related features significantly influence churn. The model not only predicts churn with high accuracy but also provides actionable insights through feature importance and visualization tools. This supports data-driven retention strategies like targeted offers or improved services. Ultimately, the project showcases how machine learning enhances customer relationship management (CRM) and can be adapted for similar use cases in banking, insurance, and e-commerce.

Key Words: Customer churn, churn prediction, Random Forest, machine learning, telecom industry, predictive modeling, supervised learning

1.INTRODUCTION

In today’s competitive business landscape, retaining existing customers is as crucial as acquiring new ones, especially in subscription-based industries like telecommunications, banking, insurance, and internet services. One of the biggest challenges faced by companies is customer churn, which refers to customers discontinuing their use of a company’s product or service. Predicting churn in advance allows businesses to take proactive measures to retain customers, ultimately reducing revenue loss and increasing customer satisfaction. With the advent of data-driven strategies, machine learning has emerged as a powerful tool to analyze historical data and forecast customer behavior.

This project focuses on developing a machine learning model that predicts customer churn using the Random Forest algorithm, a robust ensemble technique known for its high accuracy and resistance to overfitting. The model is trained on the Telco Customer Churn dataset, which includes features like customer tenure, monthly charges, contract type, and service usage patterns. By preprocessing the data, handling missing values, encoding categorical variables, and performing exploratory data analysis (EDA), we aim to identify patterns that differentiate loyal customers from those likely to leave. The goal is not only to predict churn but also to understand the underlying factors contributing to it.

The implementation of this churn prediction model serves as a decision-support system for marketing and customer service departments. By identifying high-risk customers early, targeted interventions such as discounts, personalized communication, or service improvements can be deployed to improve retention. This project demonstrates how leveraging machine learning can turn raw customer data into actionable business insights, helping organizations become more proactive, customer-centric, and competitive in the digital age.

1.1 DOMAIN INTRODUCTION

The telecommunication industry is one of the most dynamic and data-intensive sectors in the world, providing essential services such as mobile communication, broadband, cable television, and internet connectivity to billions of users globally. With rapid technological advancements and increasing competition, telecom companies are constantly striving to improve service quality, reduce operational costs, and enhance customer satisfaction. In such a saturated market, one of the major concerns is customer churn, where subscribers switch from one service provider to another, often due to dissatisfaction, better offers, or perceived lack of value.

Customer churn not only results in immediate revenue loss but also affects a company's long-term profitability and market share. Acquiring new customers often costs significantly more than retaining existing ones. Hence, understanding and predicting churn behavior is vital for telecom operators. Factors influencing churn can range from poor network quality and high service charges to limited service features and ineffective customer support. Telecom companies now rely heavily on data analytics and machine learning to extract meaningful patterns from customer data, enabling smarter business decisions and customer retention strategies.

Fig. 1. Intro Template

© 2025, IJSREM | www.ijsrem.com DOI: | Page 1



By analyzing historical data such as call records, billing information, customer complaints, service usage, and demographic attributes, telecom providers can build predictive models that forecast which customers are likely to churn. These insights help companies implement preemptive actions such as targeted marketing campaigns, loyalty programs, and personalized offers to retain valuable customers. This project falls under this domain and leverages machine learning techniques, particularly Random Forest, to build an efficient churn prediction model, helping telecom companies transform raw data into actionable insights and competitive advantage.

1.2 OBJECTIVES

The primary objective of this project is to develop a predictive model that can accurately identify customers who are likely to churn in the near future. By leveraging the power of machine learning, specifically the Random Forest algorithm, this project aims to uncover hidden patterns and significant indicators from customer behavior and usage data that contribute to churn. The project also focuses on enhancing the interpretability of the model, so that business stakeholders can understand which factors influence customer decisions. Furthermore, the model is intended to serve as a valuable decision-making tool for customer retention teams by allowing early intervention through personalized strategies. Overall, the goal is to use data-driven insights to improve customer retention, reduce revenue loss, and increase business profitability.

• To predict customer churn using machine learning techniques with high accuracy and reliability.
• To apply the Random Forest algorithm for classification, due to its robustness and effectiveness in handling both categorical and numerical data.
• To analyze customer data (e.g., tenure, monthly charges, contract type) and identify key features that contribute to churn behavior.
• To perform exploratory data analysis (EDA) and data preprocessing to ensure data quality and uncover hidden patterns.
• To evaluate model performance using metrics such as accuracy, precision, recall, F1-score, and AUC-ROC.
• To assist decision-makers in identifying high-risk customers early and developing retention strategies accordingly.
• To reduce operational costs by minimizing customer turnover and increasing customer lifetime value.
• To demonstrate the business value of machine learning in solving real-world customer management problems.
• To visualize results and insights in a clear and interpretable manner for stakeholders and management.

The scope of this project lies in the application of machine learning techniques to develop a predictive model capable of identifying customers who are likely to churn from a subscription-based service. Using historical customer data, the project focuses on training a classification model, specifically the Random Forest algorithm, to detect patterns and risk factors associated with customer attrition. The model is designed to handle both categorical and numerical variables, allowing for a more comprehensive understanding of customer behavior. This system can be used by telecom companies, internet service providers, and similar industries where customer retention is a key performance indicator.

The project includes several critical steps within its scope: data collection, preprocessing, exploratory data analysis (EDA), model training and validation, and performance evaluation. Techniques such as label encoding, feature selection, and cross-validation are employed to ensure a reliable and accurate predictive outcome. The results from the model are then interpreted to provide actionable insights into the most influential factors contributing to churn, such as contract type, customer tenure, and monthly charges. This helps business analysts and customer success teams prioritize their engagement strategies for high-risk customers.

However, the scope of the project is limited to building and validating the prediction model and does not extend to deploying the model in a real-time production environment. Additionally, while the model is developed on a specific dataset (e.g., Telco Customer Churn), its structure allows for future adaptation to other domains or datasets with similar churn-related patterns. The predictive insights gained through this project provide a strong foundation for building customer-focused business strategies and improving service quality to reduce churn.

In addition to building a predictive model, the project also emphasizes the importance of interpretability and transparency in machine learning. Understanding why a customer is likely to churn is just as important as predicting who will churn. Therefore, tools such as feature importance analysis and visualizations (e.g., bar plots, heatmaps, and decision trees) are used to make the model’s decisions more interpretable for business users. This allows stakeholders to trust and act upon the predictions with confidence. The insights drawn from these analyses can guide improvements in marketing strategies, customer experience design, and service delivery.

Furthermore, the scope of the project extends to creating reusable and scalable code modules that can be applied to other datasets or similar business use cases. By ensuring that the codebase is modular and well-documented, the project serves as a blueprint for similar predictive analytics tasks across different domains such as banking, insurance, or e-commerce. While real-time deployment and integration with CRM systems are out of scope for this project, the structure and methodologies used lay the groundwork for such enhancements in future iterations.

2.SYSTEM ANALYTICS

System analytics in the context of telecom churn prediction involves analyzing customer behavior, usage patterns, and service preferences to extract meaningful insights. The system uses large-scale customer data, including demographic information, account details, and service usage logs, to identify trends and risk factors associated with customer churn. Visualization tools such as graphs, heatmaps, and
dashboards help present the data in an interpretable manner, making it easier for analysts to spot key patterns. These analytics support business intelligence by uncovering which features (e.g., payment type, contract length, internet service usage) most strongly correlate with churn. By integrating machine learning models into the analytics framework, the system can move from descriptive analysis to predictive insights, enabling real-time churn forecasting.

2.1 EXISTING PROBLEM

One of the major problems in the telecom industry is the inability to effectively anticipate and manage customer churn. With millions of subscribers, manually tracking customer dissatisfaction or potential churn is impractical. Companies often rely on generic retention strategies that are neither cost-efficient nor targeted. Furthermore, traditional data analysis techniques may fail to capture complex, non-linear relationships in customer behavior. Another issue is data quality: telecom datasets often contain noise, missing values, or imbalanced class distributions where churned customers are a minority. These limitations hinder accurate modeling and lead to suboptimal decision-making, resulting in lost revenue and customer dissatisfaction. There is a clear need for a scalable, intelligent system that can automate and enhance the churn prediction process.

2.2 PROPOSED METHODOLOGY

The proposed methodology involves building a machine learning-based system to predict customer churn with high accuracy. The process begins with data acquisition from telecom databases, followed by data preprocessing, which includes cleaning, feature selection, and transformation. The preprocessed data is then used to train several machine learning models such as Logistic Regression, Support Vector Machine (SVM), Random Forest, and XGBoost. These models are chosen for their effectiveness in classification tasks. The system evaluates model performance using metrics like accuracy and ROC-AUC score, selecting the best-performing model for deployment. Additionally, data visualization tools are used throughout the process to provide clarity on feature importance and model output. This methodology not only automates churn prediction but also enables telecom companies to act preemptively by identifying high-risk customers and applying personalized retention strategies.

3.SYSTEM REQUIREMENTS

3.1 SOFTWARE REQUIREMENTS :

| Category | Requirement |
|---|---|
| Operating System | Windows 10 / Linux Ubuntu 20.04 / macOS (any OS supporting Python) |
| Programming Language | Python 3.8 or above |
| IDE / Code Editor | Jupyter Notebook / VS Code / PyCharm |
| Libraries & Frameworks | NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn, XGBoost |
| Database | CSV file input (optionally: MySQL / MongoDB for dynamic datasets) |
| Visualization Tools | Matplotlib, Seaborn, Plotly |
| Model Evaluation Tools | Scikit-learn metrics module (accuracy_score, confusion_matrix, ROC) |
| Version Control | Git (optional, for collaborative development) |
| Documentation | MS Word / LaTeX / Google Docs (for report writing) |
| Other Tools | Anaconda (recommended for managing environments and dependencies) |

3.2 FUNCTIONAL REQUIREMENTS :

| Category | Requirement |
|---|---|
| Operating System | Windows 10 / Linux Ubuntu 20.04 / macOS (any OS supporting Python) |
| Programming Language | Python 3.8 or above |
| IDE / Code Editor | Jupyter Notebook / VS Code / PyCharm |
| Libraries & Frameworks | NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn, XGBoost |
| Database | CSV file input (optionally: MySQL / MongoDB for dynamic datasets) |
| Visualization Tools | Matplotlib, Seaborn, Plotly |
| Model Evaluation Tools | Scikit-learn metrics module (accuracy_score, confusion_matrix, ROC) |
| Version Control | Git (optional, for collaborative development) |
| Documentation | MS Word / LaTeX / Google Docs (for report writing) |
| Other Tools | Anaconda (recommended for managing environments and dependencies) |
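The model comparison described in Section 2.2 can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the synthetic data from `make_classification` stands in for the preprocessed Telco features, the 80/20 split and every hyperparameter are assumptions, and XGBoost is left out so that the sketch depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed churn features (assumption):
# roughly 27% positives to mimic a minority "churned" class.
X, y = make_classification(
    n_samples=1000, n_features=12, n_informative=6,
    weights=[0.73, 0.27], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale-sensitive models get a StandardScaler; the forest does not need one.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVM": make_pipeline(
        StandardScaler(), SVC(probability=True, random_state=42)
    ),
    "Random Forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=42
    ),
}

# Fit each candidate and record accuracy and ROC-AUC on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    churn_probability = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "roc_auc": roc_auc_score(y_test, churn_probability),
    }

# Pick the candidate with the highest ROC-AUC.
best_model = max(results, key=lambda name: results[name]["roc_auc"])
for name, scores in results.items():
    print(f"{name:20s} accuracy={scores['accuracy']:.3f} "
          f"roc_auc={scores['roc_auc']:.3f}")
print("Selected:", best_model)
```

Selecting by ROC-AUC rather than accuracy is a choice made for this sketch; with imbalanced churn labels, accuracy alone can favor a model that rarely predicts churn.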


3.3 NON-FUNCTIONAL REQUIREMENTS


| S. No. | Requirement | Description |
|---|---|---|
| 1 | Performance | The system should provide predictions within a reasonable time (e.g., a few seconds per input). |
| 2 | Scalability | The system should be able to handle larger datasets (e.g., 100,000+ customer records) efficiently. |
| 3 | Accuracy | The prediction accuracy of the model should be above a defined threshold (e.g., 80%). |
| 4 | Reliability | The system should deliver consistent and repeatable results for the same input data. |
| 5 | Maintainability | The code should be modular and well-documented to allow easy updates or enhancements. |
| 6 | Usability | The system should have a simple, user-friendly interface for ease of use by analysts or operators. |
| 7 | Portability | The application should run on different platforms (Windows/Linux/macOS) with minimal changes. |
| 8 | Security (optional) | If hosted online, the system should ensure data privacy and secure access to sensitive customer data. |
| 9 | Extensibility | The system should allow easy integration of new algorithms or external data sources. |
| 10 | Availability | The system should be available for use as needed and should not require constant maintenance. |

3.4 REQUIRED LIBRARIES AND FRAMEWORKS :

| Category | Library / Framework | Purpose |
|---|---|---|
| Core Language | Python 3.8+ | Primary programming language for model development and data processing |
| Data Handling | Pandas | Data manipulation and preprocessing (loading, cleaning, transforming data) |
| Numerical Ops | NumPy | Numerical operations and array handling |
| Data Visualization | matplotlib, seaborn, plotly | To create graphs, heatmaps, and interactive charts |
| Machine Learning | scikit-learn | Provides algorithms like Logistic Regression, SVM, Random Forest, etc. |
| Machine Learning | XGBoost | Advanced boosting algorithm for classification |
| Model Evaluation | sklearn.metrics | For metrics like accuracy, ROC-AUC, confusion matrix |
| Development Environment | Jupyter Notebook / Google Colab | IDE for running and testing code blocks interactively |
| Data Serialization (optional) | joblib or pickle | For saving and loading trained models |
| Hyperparameter Tuning | GridSearchCV from sklearn.model_selection | To optimize algorithm parameters |
| Environment Management (optional) | Anaconda or virtualenv | Manages project dependencies and isolated environments |

4. MODELS AND METHODS
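As a rough sketch of how the hyperparameter tuning and serialization entries in the Section 3.4 table might be combined for a Random Forest classifier, the example below runs `GridSearchCV` over a small grid and saves the best estimator with `joblib`. The synthetic data, the parameter grid, the ROC-AUC scoring choice, and the file name `churn_rf.joblib` are all assumptions made for illustration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for preprocessed customer features (assumption).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)

# A deliberately small, hypothetical grid so the search finishes quickly.
param_grid = {"n_estimators": [50, 100], "max_depth": [4, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=3,
)
search.fit(X, y)
print("Best parameters:", search.best_params_)

# Persist the tuned model and load it back, as one would before reuse.
model_path = os.path.join(tempfile.mkdtemp(), "churn_rf.joblib")
joblib.dump(search.best_estimator_, model_path)
restored = joblib.load(model_path)

# The restored model should reproduce the original predictions exactly.
same_predictions = bool(
    (restored.predict(X) == search.best_estimator_.predict(X)).all()
)
print("Restored model matches:", same_predictions)
```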


4.1 Random Forest Algorithm
Random Forest is a supervised ensemble learning method used for classification and regression. It builds multiple decision trees on random data subsets and combines their outputs for improved accuracy. In Learnwise, it's used to predict learner success, dropout risk, and engagement levels, offering robust and reliable educational insights.

4.2 Decision Tree
A Decision Tree splits data based on feature values to form a tree structure of decisions. Internal nodes represent features, branches show decisions, and leaf nodes indicate outcomes. In Learnwise, it personalizes content paths and quiz difficulties based on learner performance.

4.3 Support Vector Machine (SVM)
SVM classifies learners into categories like highly engaged or at-risk by finding optimal boundaries between classes. It handles complex, high-dimensional, and noisy educational data effectively, making it ideal for behavioral analysis on the Learnwise platform.

4.4 K-Nearest Neighbors (KNN)
KNN classifies learners based on similarity to their peers. It’s used for personalized recommendations, identifying peer learning groups, and adapting to evolving data. Learnwise uses KNN to deliver collaborative, behavior-based learning paths.

5.IMPLEMENTATION

5.1 DATA ANALYSIS

5.1.1 Exploring User Data Patterns

As we commence the implementation phase, our first step involves performing a thorough analysis of user interaction data. The objective is to uncover key patterns and relationships between various user activities, such as course engagement, test performance, time spent per module, and resource usage.

By understanding how learners interact with the platform, we can better personalize content recommendations and predict user outcomes such as course completion or dropout likelihood.

5.1.2 Training Dataset Acquisition

The effectiveness of the AI models powering Learnwise relies heavily on two factors: the richness of collected interaction parameters and the quality of the training dataset.

To achieve this, we curated datasets by collecting user logs from:

• Pilot Learnwise platform usage sessions.
• Publicly available learning behavior datasets (e.g., from Kaggle educational repositories).
• Synthetic data generation mimicking realistic learner activities.

The data includes parameters such as:

• Time spent on different modules
• Quiz scores and attempts
• Session durations
• Number of resources accessed
• Engagement levels based on clickstream data

Both behavioral data (engagement metrics) and performance data (quiz scores, completion rates) were integrated to train predictive models for learning path recommendation and dropout prediction.

5.2 DATA PRE-PROCESSING

Before feeding the collected data into machine learning models, several preprocessing steps were applied:

• Data Cleaning: Removed missing values and erroneous entries from logs.
• Feature Engineering: Created new features such as engagement scores and content interaction frequencies.
• Data Encoding: Categorical variables like course names and session types were label-encoded for compatibility with models.
• Data Normalization: Standardized numeric values (e.g., quiz scores, session times) using Min-Max Scaling to improve model convergence.
• Train-Test Split: Divided the preprocessed data into training and testing sets (80:20) to enable effective model evaluation.

5.3 MACHINE LEARNING APPROACH

5.3.1 Linear Regression Model

Linear Regression was initially used to model simple predictive tasks, such as:

• Predicting expected course completion time.
• Estimating learner engagement scores based on early activity patterns.

Working:

• Independent variables: Time spent, number of modules accessed, quiz attempts.
• Dependent variable: Engagement score or course completion probability.
• The model fits a linear equation to the input features and predicts continuous outcomes.

Implementation:
• Used scikit-learn's LinearRegression() class.
• Training involved minimizing Mean Squared Error (MSE) between predicted and actual engagement outcomes.
• Post-training, model predictions were sorted to highlight at-risk learners needing additional support.

6.MODEL COMPARISON

Feature engineering is a critical step in the machine learning pipeline, especially in a churn prediction problem where the success of the model heavily depends on how well the raw data is transformed into meaningful inputs. In this project, we performed several feature engineering techniques to extract additional value from the Telco Customer dataset and improve the model's ability to distinguish between customers who churn and those who stay.

The first step in feature engineering involved dealing with missing and anomalous values. Certain columns, such as TotalCharges, contained missing or non-numeric entries that were converted to NaN during data ingestion. We imputed missing values using appropriate strategies: numerical columns were filled with median values, while categorical ones used the mode. This ensured the dataset remained balanced and representative.

Next, we addressed categorical variable encoding. Many features in the Telco dataset are categorical, such as Contract, PaymentMethod, and InternetService. These were transformed using one-hot encoding to convert categories into binary columns. For binary variables like Partner or Dependents, we used label encoding to map Yes and No values to 1 and 0. This made the data suitable for machine learning models that require numerical input.

To enhance the predictive power of the model, we created new derived features. One such feature was tenure_group, which grouped customers into bins based on their tenure (e.g., "0–12 months", "13–24 months", etc.). This captured customer loyalty stages and revealed strong correlations with churn probability. Another useful feature was MonthlyCharges_to_TenureRatio, which helped identify customers who were paying a high monthly fee but had low tenure, often an early indicator of dissatisfaction.

We also calculated interaction features between relevant columns. For instance, combining OnlineSecurity and InternetService helped reveal service bundle patterns that influenced churn. Customers with fiber optic internet but no online security were more likely to churn compared to those with DSL and basic add-ons. These interaction terms added granularity to the model's understanding of feature relationships.

An essential part of feature engineering was scaling numerical features. Columns like MonthlyCharges and TotalCharges were normalized using StandardScaler, which centers the data and scales it to unit variance. This is particularly beneficial for models sensitive to feature scale, such as Logistic Regression or Support Vector Machines.

We also performed feature selection using a combination of techniques. Initially, we calculated the correlation matrix to identify redundant features. Then, we used model-based feature importance scores from a Random Forest classifier to prioritize features. Additionally, Recursive Feature Elimination (RFE) was used to iteratively select features that contributed most to model accuracy.

Another advanced technique we explored was target encoding on high-cardinality categorical variables, although cautiously to avoid overfitting. For instance, if the dataset contained a feature like City, it might have been encoded based on the average churn rate per city. However, in this dataset, most categorical variables were low-cardinality, so this method had limited application.

We also ensured that the features were logically consistent and interpretable. For example, we checked for data leakage by ensuring that no features derived from post-churn behavior were included in the training data. Additionally, we maintained a clean feature naming convention and documented the transformations in the preprocessing pipeline to ensure reproducibility.

In summary, the feature engineering process significantly improved the quality and expressiveness of the data. Through thoughtful transformation, encoding, binning, and interaction creation, we were able to turn raw tabular data into a rich set of inputs that could power an accurate and interpretable churn prediction model. This step laid the foundation for the success of the downstream modeling and evaluation stages.
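The feature engineering steps described in this section can be sketched on a tiny hand-made sample as follows. This is an illustration under stated assumptions, not the project's pipeline: the six rows are invented, the bin edges beyond "0–12" and "13–24" months are guesses, and the column name `fiber_no_security` is hypothetical. The names `tenure_group` and `MonthlyCharges_to_TenureRatio` are taken from the text above.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Invented sample rows using Telco-style column names (assumption).
df = pd.DataFrame({
    "tenure": [1, 5, 14, 30, 60, 72],
    "MonthlyCharges": [70.0, 95.5, 55.2, 80.1, 20.3, 110.0],
    "TotalCharges": ["70.0", " ", "772.8", "2403.0", "1218.0", "7920.0"],
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "One year", "Two year", "Two year"],
    "InternetService": ["Fiber optic", "Fiber optic", "DSL",
                        "Fiber optic", "No", "Fiber optic"],
    "OnlineSecurity": ["No", "No", "Yes", "Yes",
                       "No internet service", "Yes"],
    "Partner": ["No", "Yes", "No", "Yes", "Yes", "No"],
})

# 1. Non-numeric TotalCharges entries become NaN, then get the median.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())

# 2. Derived features: tenure bins and a charge-to-tenure ratio.
df["tenure_group"] = pd.cut(
    df["tenure"], bins=[0, 12, 24, 48, 72],
    labels=["0-12", "13-24", "25-48", "49-72"], include_lowest=True,
)
# clip(lower=1) avoids division by zero for brand-new customers.
df["MonthlyCharges_to_TenureRatio"] = (
    df["MonthlyCharges"] / df["tenure"].clip(lower=1)
)

# 3. Interaction feature: fiber optic internet without online security.
df["fiber_no_security"] = (
    (df["InternetService"] == "Fiber optic") & (df["OnlineSecurity"] == "No")
).astype(int)

# 4. Encoding: Yes/No mapped to 1/0, multi-valued categories one-hot encoded.
df["Partner"] = df["Partner"].map({"Yes": 1, "No": 0})
df = pd.get_dummies(
    df,
    columns=["Contract", "InternetService", "OnlineSecurity", "tenure_group"],
    dtype=int,
)

# 5. Scaling: center to zero mean and unit variance.
numeric_columns = ["MonthlyCharges", "TotalCharges"]
df[numeric_columns] = StandardScaler().fit_transform(df[numeric_columns])

print(df.head())
```

In a real train/test workflow, the median and the scaler would be fitted on the training split only and then applied to the test split, which is one way to respect the data-leakage check mentioned above.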
PaymentMethod, and InternetService. These were transformed


7.CONCLUSION

The customer churn prediction project successfully demonstrates how machine learning can be leveraged to tackle a critical business challenge: identifying customers likely to discontinue a service. By applying the Random Forest algorithm, a powerful ensemble method, the project achieved reliable and accurate results in predicting churn behavior based on customer attributes. This approach provides businesses with a proactive mechanism to engage at-risk customers before they churn, thus preserving revenue and improving customer satisfaction.

Throughout the project, essential steps such as data cleaning, exploratory data analysis, and feature engineering were carefully executed to ensure that the model received high-quality inputs. Visualizations revealed key patterns, such as the influence of contract type, tenure, and monthly charges on churn likelihood. These insights not only improved the model's predictive power but also offered valuable business intelligence to guide retention strategies.

The Random Forest classifier proved to be an effective choice due to its ability to handle both categorical and numerical features, its robustness to overfitting, and its capability to rank feature importance. It outperformed simpler models in terms of both accuracy and generalization, making it suitable for real-world deployment. Moreover, the model was successfully integrated into a Flask API, enabling real-time predictions and seamless integration into customer relationship management systems.

Despite its success, the project also acknowledged certain limitations, such as the model's complexity and longer prediction times compared to lighter algorithms. Additionally, interpretability remains a challenge with ensemble models like Random Forest. Addressing these limitations in future iterations, for example by using explainable AI techniques or exploring gradient boosting models, can further enhance the system's transparency and effectiveness.

In conclusion, this project not only meets the objective of predicting customer churn using machine learning but also lays a foundation for actionable business strategies. With ongoing monitoring, retraining, and integration into customer support workflows, the model can play a key role in improving customer retention and driving long-term business growth.

ACKNOWLEDGEMENT

We thank God Almighty for the blessings, knowledge and strength in enabling us to finish our project. Our deep gratitude goes to our founder Late. Dr. D. SELVARAJ, M.A., M.Phil., for his patronage in completion of our project. We take this opportunity to thank our kind and honourable Chairperson, Dr. S. NALINI SELVARAJ, M.Com., M.Phil., Ph.D., and our Honourable Director, Mr. S. AMIRTHARAJ, B.Tech., M.B.A., for their support to finish our project successfully. We wish to express our sincere thanks to our beloved Principal, Dr. C. RAMESH BABU DURAI, M.E., Ph.D., for his kind encouragement and his interest toward us. We are grateful to Dr. D. C. JULLIE JOSPHINE, M.E., Ph.D., Professor and Head of INFORMATION TECHNOLOGY DEPARTMENT, Kings Engineering College, for his valuable suggestions, guidance and encouragement. We wish to express our deep sense of gratitude and sincere thanks to our SUPERVISOR, MS. J. JENIFA, M.E., Assistant Professor, Information Technology Department, for her internal guidance. We express our sincere thanks to our parents, friends and staff members who have helped and encouraged us during the entire course of completing this project work successfully.

REFERENCES

[1] P Lalwani, MK Mishra, JS Chadha, P Sethi. Customer churn prediction system: a machine learning approach. Computing, 2022. Springer.

[2] CF Tsai, YH Lu. Customer churn prediction by hybrid neural networks. Expert Systems with Applications, 2009. Elsevier.

[3] Y Xie, X Li, EWT Ngai, W Ying. Customer churn prediction using improved balanced random forests. Expert Systems with Applications, 2009. Elsevier.

[4] W Verbeke, D Martens, B Baesens. Social network analysis for customer churn prediction. Applied Soft Computing, 2014. Elsevier.

[5] N Lu, H Lin, J Lu, G Zhang. A customer churn prediction model in telecom industry using boosting. IEEE Transactions on Industrial …, 2012. ieeexplore.ieee.org
