DataScience Project-New

This project report presents a Customer Churn Prediction Model developed using machine learning techniques to identify factors contributing to customer attrition in competitive industries. The model employs Random Forest and Decision Tree Classifiers, utilizing data such as demographics and service usage to predict churn and inform proactive retention strategies. Key insights include the importance of contract types, service offerings, and customer demographics in reducing churn rates and enhancing customer loyalty.

Data-Driven Customer Churn Analysis

PROJECT REPORT
Submitted by:

Mariam Ghani RA2211042020017

B S Sanath RA2211042020020

Swathika M RA2211042020045

In partial satisfaction of the requirements for the degree of

BACHELOR OF TECHNOLOGY
In
COMPUTER SCIENCE AND BUSINESS SYSTEMS ENGINEERING

SRM INSTITUTE OF SCIENCE AND TECHNOLOGY


RAMAPURAM, CHENNAI-600089
MAY 2025
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY,
RAMAPURAM

BONAFIDE CERTIFICATE

Certified that this project report “POLICE CRIME ANALYSIS” is the bonafide
work of “SANJAY A N (RA2211042020022), AKHIL AHMED
(RA2211042020008), SIDDHARTH S (RA2211042020054)”, who carried out the
21CSC355T – Data Mining and Analytics project work under my supervision.

SIGNATURE

Ms. S. Vaishnavii,M.E

Assistant Professor – CSE

School of Computer Science Engineering,

SRM Institute of Science & Technology,

Ramapuram, Chennai - 600089


1. ABSTRACT
2. INTRODUCTION
3. PROBLEM STATEMENT
4. SYSTEM ANALYSIS
5. SYSTEM REQUIREMENTS
6. SYSTEM ARCHITECTURE
7. SYSTEM MODULES
8. SYSTEM IMPLEMENTATION
9. PERFORMANCE ANALYSIS
10. CONCLUSION
11. FUTURE ENHANCEMENT
12. REFERENCES

APPENDIX
1. SAMPLE CODING
2. SAMPLE OUTPUT

LIST OF FIGURE NAME/TABLE NAME


LIST OF ABBREVIATIONS
ABSTRACT
CHAPTER – 1
Introduction

Customer retention is a critical factor in the success of any business, particularly in industries with
high competition, such as telecommunications, banking, and subscription-based services. Customer
churn refers to the rate at which customers stop using a company’s product or service over a given
period. Understanding and predicting churn can help businesses take proactive measures to enhance
customer satisfaction, improve services, and optimize marketing strategies.

This report presents a Customer Churn Prediction Model developed using machine learning
techniques to identify key factors contributing to customer attrition. The model uses Random
Forest and Decision Tree Classifiers to predict whether a customer is likely to churn based on
demographic, service usage, and contract-related features. Because churners form the minority
class, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to balance the classes
so that the classifiers are less biased toward the majority (non-churn) class. Hyperparameters are
tuned using grid search with cross-validation, and performance is evaluated with accuracy,
precision, recall, F1-score, and ROC-AUC.
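
The core idea behind SMOTE can be sketched in a few lines: a synthetic minority sample is placed at a random position on the line segment between an existing minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified illustration in plain Python, not the project's code; the function name `smote_sketch`, the parameter defaults, and the data points are assumptions made for the example. In practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create `n_new` synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Made-up minority-class points: (tenure in months, monthly charges).
churners = [(2.0, 80.0), (4.0, 95.0), (3.0, 70.0), (6.0, 100.0)]
new_points = smote_sketch(churners, n_new=4)
for point in new_points:
    print(point)
```

Every synthetic point lies between two real churners, so no values outside the observed range are invented. Oversampling of this kind is usually applied to the training split only, so that the evaluation data keeps its original class balance.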

The primary objective of this model is to provide businesses with predictive insights into customer
behavior, enabling them to take preemptive actions before a customer decides to leave. By analyzing
patterns in customer demographics, service usage, contract types, and support interactions, the model
helps organizations pinpoint the major causes of churn. This allows for the development of targeted
customer engagement strategies, such as personalized offers, loyalty programs, and improved
customer service interventions, ultimately leading to higher retention rates and reduced revenue
losses.

Beyond prediction, this model also serves as a strategic tool for business optimization. By identifying
key factors driving customer churn, businesses can refine their service offerings, pricing strategies,
and customer support systems to better meet customer needs. The insights derived from this analysis
can also inform marketing strategies, helping companies allocate resources efficiently to maximize
customer lifetime value (CLV). With this approach, businesses can shift from a reactive churn
management model to a proactive, data-driven decision-making framework, ultimately fostering
long-term customer loyalty and business growth.

2. Dataset Description
The Customer Churn Prediction Model is built on an extensive dataset containing customer
demographic details, service subscription attributes, and account-related information. The dataset
consists of thousands of customer records, each characterized by multiple numerical and categorical
features that influence customer retention and churn behavior. The primary goal of this dataset is to
provide a comprehensive view of customer engagement and service usage patterns, enabling
machine learning models to identify key factors contributing to churn.

This mix of feature types lets the model capture interactions between customer demographics,
service preferences, and churn tendencies. By analyzing this data, the model aims to predict
whether a customer is likely to leave the service provider, allowing businesses to take proactive
retention measures.

Below are some key features of the dataset:

1. Tenure: A numerical value representing the number of months a customer has been with the
company. Longer tenure generally indicates higher loyalty.
2. Contract Type: Categorical data indicating whether the customer has a month-to-month, one-
year, or two-year contract. Customers with longer contracts tend to have lower churn rates.
3. Monthly Charges: The amount a customer is billed each month. Higher charges may
influence churn, especially if customers perceive a lack of value in the services provided.
4. Total Charges: The cumulative amount paid by a customer over their entire tenure. It can
indicate long-term customer value and spending behavior.
5. Payment Method: The mode of payment used by customers, such as electronic check, mailed
check, bank transfer, or credit card. Certain payment methods may correlate with higher
churn rates.
6. Internet Service Type: Indicates whether a customer has DSL, Fiber Optic, or No Internet
Service. Fiber optic users may have different churn tendencies compared to DSL users.
7. Phone Service: A binary feature specifying whether a customer has a phone service.
8. Multiple Lines: Whether the customer has multiple phone lines, which may suggest higher
engagement with the provider.
9. Online Security: Indicates whether the customer has subscribed to an additional online
security service. Customers who use such services may have a higher perceived value of the
provider.
10. Online Backup: A feature indicating whether customers have cloud-based backup services.
11. Tech Support: Whether a customer has access to technical support services, which could
impact customer satisfaction and churn likelihood.
12. Streaming TV and Streaming Movies: Specifies whether the customer has access to
streaming services as part of their subscription package. Entertainment-based services can
affect customer retention.
13. Device Protection: Indicates whether a customer has opted for device protection plans, which
may add value to their subscription.
14. Dependents: A categorical feature specifying whether the customer has dependents.
Customers with families may have different usage behaviors and churn patterns.
15. Partner Status: Indicates whether a customer has a spouse or partner. This demographic
attribute can impact the likelihood of churn.
16. Senior Citizen: A binary feature identifying whether a customer is a senior citizen (1 = Yes, 0
= No). Age demographics can influence churn rates.
17. Churn: The target variable, indicating whether the customer has churned (1 = Yes, 0 = No).
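
Categorical features such as Contract Type have to be converted to numbers before a classifier can use them. The following is a minimal sketch of one common approach, one-hot encoding, in plain Python. The field names, the `CONTRACT_TYPES` list, the `encode_customer` helper, and the example record are illustrative assumptions, not taken from the project's code or data.

```python
# Hypothetical category list, based on the Contract Type description above.
CONTRACT_TYPES = ["Month-to-month", "One year", "Two year"]


def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def encode_customer(record):
    """Turn one customer record (a dict) into a flat numeric feature vector."""
    features = [
        float(record["tenure"]),           # months with the company
        float(record["monthly_charges"]),  # amount billed per month
        int(record["senior_citizen"]),     # already binary: 1 = Yes, 0 = No
    ]
    features += one_hot(record["contract"], CONTRACT_TYPES)
    return features


# Made-up example record, for illustration only.
customer = {
    "tenure": 5,
    "monthly_charges": 79.5,
    "senior_citizen": 0,
    "contract": "Month-to-month",
}
vector = encode_customer(customer)
print(vector)
```

Numerical features pass through unchanged, while each categorical value becomes its own 0/1 column, which avoids implying an order between categories.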

3. Sample dataset

4. Trend Analysis

Churn Rate Distribution


One of the key insights from the dataset is the distribution of customer churn. Analysis of churn rates
shows:

• A significant portion of customers remain loyal, with the majority classified as non-churners.

• However, a substantial number of customers churn within the first 6 to 12 months,
  highlighting early dissatisfaction as a critical factor.

• Month-to-month contract holders exhibit a higher churn rate, whereas customers with long-term
  contracts (one or two years) tend to stay longer.

These trends indicate that contract type and early engagement strategies play a crucial role in
customer retention.
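
A comparison like the one above comes down to computing the churn rate within each group. The sketch below shows the calculation in plain Python on a handful of made-up records; the data and the `churn_rate_by_group` helper are assumptions for illustration and do not reproduce the project's figures.

```python
from collections import defaultdict

# Made-up toy records (contract type, churned flag); not the project's data.
records = [
    ("Month-to-month", 1),
    ("Month-to-month", 1),
    ("Month-to-month", 0),
    ("Month-to-month", 0),
    ("One year", 0),
    ("One year", 1),
    ("One year", 0),
    ("One year", 0),
    ("Two year", 0),
    ("Two year", 0),
]


def churn_rate_by_group(rows):
    """Return {group: fraction of customers in that group who churned}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for group, churn_flag in rows:
        totals[group] += 1
        churned[group] += churn_flag
    return {group: churned[group] / totals[group] for group in totals}


rates = churn_rate_by_group(records)
for contract, rate in rates.items():
    print(f"{contract}: {rate:.0%}")
```

The same function works for any categorical feature, such as payment method or internet service type, by changing the first element of each record.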

Service Usage Trends


Examining the dataset’s service usage features reveals the most common service preferences. Key
observations include:

• Internet service type is a major factor in churn behavior:
    ◦ Fiber optic users have a higher churn rate compared to DSL users, likely due to pricing
      concerns or competition.
    ◦ Customers without internet service have the lowest churn rate, indicating they are less likely
      to switch providers.

• Additional services impact retention:
    ◦ Customers subscribed to online security, tech support, and backup services show lower churn
      rates.
    ◦ Customers who do not opt for these add-ons tend to churn more frequently, suggesting that
      bundled services enhance retention.

These findings highlight the importance of offering value-added services to improve customer
loyalty.

Monthly Charges vs. Churn

A scatter plot analysis of monthly charges and churn rates suggests:

• Higher monthly charges correlate with increased churn, indicating that cost-sensitive
  customers are more likely to leave.

• Customers with lower monthly charges tend to stay longer, possibly due to perceived
  affordability.

• Subscription-based add-ons like streaming services and device protection plans can influence
  churn, as customers may reconsider expenses over time.

This trend suggests that pricing strategies and discounts for high-risk customers could be
effective in reducing churn.
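
One simple numerical check of this relationship is to compare the average monthly charge of churned and retained customers. The snippet below is a sketch on made-up values; the numbers are assumptions for illustration and are not the project's results.

```python
from statistics import mean

# Made-up (monthly charge, churned flag) pairs; not the project's data.
customers = [
    (29.85, 0), (56.95, 0), (42.30, 0), (70.70, 1),
    (99.65, 1), (89.10, 0), (104.80, 1), (20.25, 0),
]

churned_charges = [charge for charge, churned in customers if churned == 1]
retained_charges = [charge for charge, churned in customers if churned == 0]

mean_churned = mean(churned_charges)
mean_retained = mean(retained_charges)

print(f"Mean monthly charge, churned:  {mean_churned:.2f}")
print(f"Mean monthly charge, retained: {mean_retained:.2f}")
```

A clearly higher mean among churned customers is consistent with the trend described above, although a difference in means alone does not show that price causes churn.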

Demographic Trends in Churn

Understanding customer demographics can reveal significant patterns in churn behavior:

• Senior citizens have a higher churn rate, possibly due to lower engagement with digital
  services or financial constraints.

• Customers with dependents and partners tend to have a lower churn rate, suggesting that
  family plans contribute to retention.

• Customers paying via electronic check show a significantly higher churn rate compared to
  those using bank transfers or credit cards, indicating that payment method could be an early
  predictor of churn.

These trends emphasize the role of personalized retention strategies, such as tailored offers
for senior customers or incentives for stable payment methods.

Key Insights from the Analysis


1. Customers on month-to-month contracts are more likely to churn, while long-term contract
holders show higher retention rates.

2. Fiber optic internet users have a higher churn rate than DSL users, indicating potential
dissatisfaction with pricing or service reliability.

3. Value-added services like tech support and online security reduce churn, suggesting
businesses should promote these features as retention tools.

4. High monthly charges are linked to higher churn, reinforcing the need for cost-effective
pricing models and discount offers for at-risk customers.

5. Demographics influence churn, with senior citizens and electronic check users showing
higher churn rates, highlighting the importance of targeted engagement strategies.

5. Model building

Customer churn prediction involves supervised machine learning techniques, where historical
customer data is used to train a model to classify customers as churners or non-churners. The
objective is to develop a robust model that accurately identifies at-risk customers, enabling
businesses to take proactive retention measures. Below is an explanation of different machine
learning approaches and their relevance to this dataset.

1. Decision Tree Classifier

• A rule-based supervised learning algorithm that creates a tree-like structure to classify
  customers based on feature splits.

• Each node represents a decision based on a customer attribute (e.g., contract type, monthly
  charges).

• The model continues to split the data until reaching a leaf node (churn or non-churn).

Application in this project:

• Provides an interpretable model that shows which features are most important in determining
  churn.

• Useful for identifying thresholds where customers become high-risk, such as monthly charges
  exceeding a certain amount.
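
To illustrate how a tree finds such a threshold, the sketch below searches for the single split on monthly charges that minimises the weighted Gini impurity, which is one common splitting criterion. The data is made up, and the helper names `gini` and `best_threshold` are assumptions for the example rather than the project's implementation.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Find the split `value <= t` that minimises weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split possible between identical values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Made-up monthly charges and churn labels, for illustration only.
charges = [20, 35, 50, 65, 80, 95, 110]
churned = [0, 0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(charges, churned)
print(f"Split at monthly charges <= {threshold} (weighted Gini = {impurity})")
```

A full decision tree repeats this search over every feature at every node and recurses on the two resulting subsets until a stopping rule is met.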

2. Random Forest Classifier

• An ensemble learning technique that combines multiple decision trees to improve prediction
  accuracy.

• Reduces overfitting by averaging the predictions of several trees.

Application in this project:

• Increases prediction robustness by mitigating noise in individual decision trees.

• Helps in understanding which customer attributes contribute most to churn.
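
The averaging step can be illustrated in isolation. In the sketch below, five hypothetical trees each output a churn probability for the same customer, and the forest's prediction is their mean. The probabilities, the 0.5 threshold, and the `forest_predict` helper are assumptions for the example.

```python
def forest_predict(tree_probabilities, threshold=0.5):
    """Average per-tree churn probabilities and apply a decision threshold."""
    avg = sum(tree_probabilities) / len(tree_probabilities)
    return avg, int(avg >= threshold)


# Made-up churn probabilities from five hypothetical trees for one customer.
votes = [0.9, 0.6, 0.2, 0.7, 0.6]
avg_probability, label = forest_predict(votes)
print(f"Average churn probability: {avg_probability:.2f}, predicted label: {label}")
```

One tree in this example disagrees strongly with the others, but its influence is diluted by the average, which is how the ensemble dampens the noise of any single tree.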

3. Logistic Regression

• A linear model for binary classification, predicting the probability of churn.

• Uses a sigmoid function to output a probability score between 0 and 1.

Application in this project:

• Useful for understanding the relationship between customer attributes and churn probability.

• Can quantify how a unit increase in monthly charges changes the odds of churn.
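
The link between a coefficient and churn likelihood can be made concrete: in logistic regression, raising a feature by one unit multiplies the odds of churn by the exponential of that feature's coefficient. The sketch below checks this numerically. The intercept and coefficient are hypothetical values chosen for illustration, not fitted results from the project.

```python
import math


def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only for illustration (not fitted values).
INTERCEPT = -3.0
BETA_MONTHLY_CHARGES = 0.03  # change in log-odds per unit of monthly charge


def churn_probability(monthly_charges):
    """Churn probability for a model with monthly charges as its only feature."""
    return sigmoid(INTERCEPT + BETA_MONTHLY_CHARGES * monthly_charges)


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


p_low = churn_probability(50)
p_high = churn_probability(51)
odds_ratio = odds(p_high) / odds(p_low)

print(f"P(churn | charges=50) = {p_low:.4f}")
print(f"P(churn | charges=51) = {p_high:.4f}")
print(f"Odds ratio for +1 unit: {odds_ratio:.4f}")
print(f"exp(coefficient):       {math.exp(BETA_MONTHLY_CHARGES):.4f}")
```

The last two printed values agree, which is why logistic regression coefficients are often reported as odds ratios when explaining results to non-technical stakeholders.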

4. Support Vector Machine (SVM)

• A boundary-based classifier that separates churners and non-churners using a hyperplane.

• Works well for datasets with clear margins of separation.

Application in this project:

• Can be effective when combined with kernel tricks to capture complex, nonlinear
  relationships.

• Useful if customer churn data exhibits a clear decision boundary based on key features.

5. XGBoost (Extreme Gradient Boosting)

• A gradient boosting algorithm that combines weak learners (decision trees) sequentially to
  create a strong predictive model.

• Highly efficient and scalable, often outperforming other models in structured data tasks.

Application in this project:

• Supports class weighting for imbalanced datasets, which makes it well suited to churn
  prediction.

• Handles missing values natively and can capture complex interactions between variables.

6. Artificial Neural Networks (ANNs)

• A model loosely inspired by biological neurons that learns complex patterns in data through
  layers of weighted connections.

• Can capture nonlinear relationships that traditional machine learning models might miss.

Application in this project:

• Suitable for large-scale churn datasets with intricate dependencies between features.

• Can be fine-tuned with techniques like dropout and batch normalization to improve
  performance.

Optimal Case:

The optimal case for a customer churn prediction model is when it achieves high predictive accuracy,
strong generalization to unseen data, and actionable insights that enable businesses to retain at-risk
customers effectively.

6. Challenges Faced

1. Handling Class Imbalance: The dataset had a significantly lower number of churned
customers compared to retained customers. To address this, techniques like SMOTE
(Synthetic Minority Over-sampling Technique) and class weighting were implemented.
2. Feature Selection and Engineering: Identifying the most relevant features that contribute to
churn prediction required multiple rounds of feature importance analysis and correlation
studies.
3. Optimizing Model Performance: Balancing between precision and recall was challenging,
as a model with high precision but low recall would miss many actual churners, while high
recall but low precision would generate too many false positives.
4. Hyperparameter Tuning: Finding the right combination of hyperparameters (e.g., number
of estimators, max depth, learning rate) for Random Forest and XGBoost required
extensive fine-tuning with Grid Search and Cross-Validation.
5. Interpreting Model Decisions: Business stakeholders needed understandable insights, so
SHAP values and feature importance visualizations were used to explain the model's
decisions.
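
The precision and recall trade-off described in point 3 can be seen on a small worked example. The sketch below computes precision, recall, and F1-score by hand for two made-up sets of predictions, one cautious and one eager. The labels and helper names are assumptions for illustration, not the project's results.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN, treating churn (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels: 4 actual churners among 10 customers.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
cautious = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # flags very few customers
eager = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]     # flags many customers

for name, predictions in [("cautious", cautious), ("eager", eager)]:
    p, r, f1 = precision_recall_f1(actual, predictions)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The cautious model is never wrong when it flags a customer but misses three of the four churners, while the eager model catches every churner at the cost of three false alarms. F1-score summarises the balance between the two.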

7. Future Enhancements

• Real-time Prediction Systems: Scoring customers as new data arrives, for example through
  integration with CRM systems, so that churn alerts reach retention teams while there is still
  time to act.
• Advanced Feature Engineering: Deriving additional features from the existing demographic,
  service, and billing attributes to improve predictive performance.
• Reinforcement Learning-based Retention Strategies: Learning from the outcomes of past
  interventions which retention action to offer each at-risk customer.

8. Recommendations

• Improve Customer Support and Engagement: Customers with frequent service issues or low
  engagement levels should receive personalized assistance and proactive customer service.

• Offer Tailored Retention Programs: High-risk churners identified by the model can be offered
  loyalty rewards, discounts, or better subscription plans to encourage retention.

• Monitor Billing and Payment Trends: Customers who frequently miss payments or use high-cost
  payment methods (like one-time electronic checks) are at higher risk of churn. Offering
  flexible billing options can help retain them.

• Targeted Marketing Campaigns: Use churn predictions to develop personalized marketing
  campaigns, focusing on at-risk customers with customized offers, exclusive deals, or feature
  recommendations.
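
Acting on these recommendations requires turning model output into a list of customers to contact. A minimal sketch, assuming the model produces a churn probability per customer: the customer IDs, the probabilities, the 0.6 threshold, and the `select_for_retention` helper are all hypothetical.

```python
def select_for_retention(scores, threshold=0.5, top_n=None):
    """Return customer IDs whose churn probability is at least `threshold`,
    highest risk first, optionally limited to `top_n` customers."""
    at_risk = [(cid, p) for cid, p in scores.items() if p >= threshold]
    at_risk.sort(key=lambda item: item[1], reverse=True)
    ids = [cid for cid, _ in at_risk]
    return ids if top_n is None else ids[:top_n]


# Made-up churn probabilities keyed by hypothetical customer IDs.
predicted = {"C001": 0.82, "C002": 0.15, "C003": 0.64, "C004": 0.47, "C005": 0.91}
targets = select_for_retention(predicted, threshold=0.6)
print(targets)
```

The threshold is a business decision: lowering it reaches more potential churners but spends retention budget on customers who would have stayed anyway.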

9. Conclusion

The Customer Churn Prediction Model provides valuable insights into the key factors driving
customer attrition and offers actionable strategies to enhance customer retention. By leveraging
machine learning techniques such as Random Forest, Decision Trees, and XGBoost, businesses can
identify at-risk customers before they churn and implement targeted interventions to improve
retention rates.

One of the most significant findings from the model is that billing and contract-related factors play a
crucial role in churn. Customers on month-to-month contracts and those using electronic check
payments are more likely to leave. This indicates that businesses can reduce churn by offering long-
term contract incentives and introducing more flexible payment options.

Additionally, service quality and engagement levels strongly influence churn risk. Customers with
frequent technical issues or limited engagement with services are at higher risk. This highlights the
importance of proactive customer support, personalized recommendations, and service quality
improvements to retain customers effectively.

From a business strategy perspective, integrating this model into customer relationship management
(CRM) systems can provide real-time churn alerts and enable data-driven decision-making. AI-
driven churn prediction can help businesses prioritize high-risk customers, optimize marketing
efforts, and design better retention programs, leading to increased customer satisfaction and revenue
growth.

Looking ahead, future improvements such as real-time prediction systems, advanced feature
engineering, and reinforcement learning-based retention strategies can further refine the model’s
effectiveness. By continuously adapting to evolving customer behaviors, businesses can stay ahead
of churn risks and build stronger, long-term customer relationships.
