Project Report
2) Client
3) Dataset Description
5) Outliers Treatment
6) Feature Encoding
9) Data Splitting
10) Feature Scaling
18) Cross-Validation
22) Conclusion
Problem Statement
In today's competitive business world, it's important to keep customers happy so they don't
stop using our products or services. We want to develop a model that can predict which
customers are likely to stop using our service, so we can take steps to keep them.
Customer churn reduces revenue and shrinks the customer base. We want to use
machine learning to build a model that can accurately predict which customers are likely to
churn based on their past behaviour, demographics, and subscription details. This will help
us target high-risk customers with personalized retention strategies.
We want to create a solution that will help us keep customers happy and using our products
or services for the long term.
Client
Proactive retention: The model can help the client identify customers who are likely
to churn before they actually do. This allows the client to take steps to retain those
customers, such as offering them discounts or special deals.
Cost savings: By focusing on high-risk customers, the client can allocate their
resources more effectively and save money on marketing and customer acquisition
costs.
Enhanced customer experience: Personalized retention efforts can improve the
overall customer experience, leading to increased satisfaction and loyalty. This can
make customers less likely to churn in the future.
Optimized marketing: Targeted marketing efforts can be tailored to specific
customer segments, improving the effectiveness of marketing campaigns. This can
help the client attract new customers and retain existing ones.
Business insights: The project can provide insights into factors that influence churn.
This information can be used to improve the client's products and services, making
them more appealing to customers.
Competitive edge: Effective churn prediction can help the client differentiate
themselves from their competitors. This can give the client an advantage in
attracting and retaining customers.
Revenue growth: Reduced churn rates mean a higher retention of paying customers.
This can lead to increased revenue growth and profitability.
Data-driven decisions: The model's insights can help the client make informed
decisions based on historical customer data. This can help the client improve their
products, services, and marketing campaigns.
Resource allocation: The model can help the client allocate customer service
resources more efficiently. This can help the client resolve customer issues more
quickly and effectively.
Long-term value: Improved customer retention can help the client build a
foundation for sustainable business growth and long-term success.
Data Description
The dataset consists of customer information for a customer churn prediction problem. It
includes the following columns:
Location: Location where the customer is based, with options including Houston, Los
Angeles, Miami, Chicago, and New York.
Churn: A binary indicator (1 or 0) representing whether the customer has churned (1) or not
(0).
* All variables have the correct data type, and there are no missing values or duplicate
records.
* Descriptive statistics were generated for each variable, revealing insights into customer
demographics, subscription details, billing, usage, and churn behavior.
* Gender and Location distributions were analysed to show how customers are spread across
genders and locations.
Outliers Treatment
Outliers can affect model performance, so identifying and treating them is crucial.
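The report does not state which outlier rule was applied. As an illustration only, the sketch below assumes the common interquartile range (IQR) rule, where values beyond 1.5 times the IQR from the quartiles are capped at the nearest fence. The function names and the sample usage figures are hypothetical.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def clip_outliers(values, k=1.5):
    """Cap values that fall outside the IQR fences at the nearest fence."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]

# Hypothetical monthly usage figures (GB); 900 is an obvious outlier.
usage_gb = [50, 60, 70, 80, 90, 100, 110, 900]
lower, upper = iqr_bounds(usage_gb)
capped = clip_outliers(usage_gb)
```

Capping keeps every customer record in the dataset, whereas dropping outlier rows would shrink it. Which option suits this project depends on whether the extreme values are errors or genuine heavy users.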
Feature Encoding
Categorical variables were encoded to numerical values to enable machine learning
algorithms to process them effectively.
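The encoding method is not named in the report. Column names such as Gender_Male and Location_Houston in the feature importance listing suggest one-hot encoding with the first category dropped, so the sketch below assumes that. The helper one_hot_encode and the sample rows are hypothetical.

```python
def one_hot_encode(rows, column, drop_first=True):
    """Replace `column` in each row dict with 0/1 indicator columns."""
    categories = sorted({row[column] for row in rows})
    if drop_first:
        # The dropped category becomes the implicit baseline (all zeros).
        categories = categories[1:]
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = int(row[column] == cat)
        encoded.append(new_row)
    return encoded

# Hypothetical customer records.
customers = [
    {"Gender": "Male", "Location": "Houston"},
    {"Gender": "Female", "Location": "Miami"},
    {"Gender": "Male", "Location": "Chicago"},
]
encoded = one_hot_encode(one_hot_encode(customers, "Gender"), "Location")
```

Dropping the first category avoids a redundant column, since its value can be inferred from the others being zero.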
* Histograms and density plots were used to assess the distribution of numerical variables.
Feature Scaling
Feature scaling was applied to ensure all variables were on the same scale, aiding model
convergence.
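The report does not say which scaling method was used. The sketch below assumes standardization (zero mean, unit standard deviation) and learns the scaling parameters from the training portion only, so that no information from the test data leaks into training. The function names and sample ages are hypothetical.

```python
import statistics

def fit_scaler(train_values):
    """Learn the mean and population standard deviation from training data."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)

def transform(values, mean, std):
    """Apply z = (x - mean) / std; a constant column maps to all zeros."""
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical ages, split into training and test portions.
train_ages = [20, 30, 40, 50, 60]
test_ages = [25, 70]

mean, std = fit_scaler(train_ages)
scaled_train = transform(train_ages, mean, std)
scaled_test = transform(test_ages, mean, std)
```

Tree-based models such as Random Forest and XGBoost are largely insensitive to feature scale, so scaling mainly benefits any distance-based or gradient-based algorithms that were compared.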
* Random Forest feature importance was used to rank features by their contribution to
predicting the target variable (Churn).
Feature                       Importance
Total_Usage_GB                0.290353
Age                           0.194396
Subscription_Length_Months    0.142624
Gender_Male                   0.016683
Location_Houston              0.010007
Location_Miami                0.009792
* Training and test data performance metrics were calculated, revealing the strengths and
weaknesses of each algorithm.
Hyperparameter Tuning
Hyperparameter tuning was explored to improve the model's performance, but no
substantial gains were achieved.
Cross-Validation
Cross-validation was performed to validate the model's performance and ensure it
generalized well to new data.
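The number of folds and the splitting strategy are not given in the report. The sketch below assumes plain shuffled k-fold splitting with k = 5 and shows only how the row indices are partitioned. The function name k_fold_indices is hypothetical.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(n_samples=10, k=5))
```

Each row is used for testing exactly once and for training in the remaining folds. The model is then scored on every test fold and the scores are averaged. For an imbalanced churn label, a stratified split that preserves the class ratio in each fold may be preferable.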
Model Evaluation
(I) Train & Test Data Metrics
The final XGBoost model's performance was evaluated using various metrics on both the
training and test datasets.
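The exact metrics are not listed here, although the conclusion mentions accuracy and recall. The sketch below assumes the usual binary classification metrics (accuracy, precision, recall, F1) computed from true and predicted churn labels. The labels shown are made up for illustration and are not results from this project.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = churn)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels, not taken from the project data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Running the same function on the training and test predictions and comparing the two sets of scores is a simple way to spot overfitting: a large gap suggests the model memorized the training data.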
Saving Model
The final XGBoost model was saved as a pickle file for future use.
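A minimal sketch of saving and reloading with Python's pickle module is shown below. A plain dictionary stands in for the trained XGBoost model so that the example runs without extra dependencies. The file name churn_model.pkl and the dictionary contents are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for the trained model object (hypothetical contents).
model = {"name": "xgboost_churn_model", "params": {"max_depth": 3}}

# Write to a temporary directory; a real project would use a fixed path.
model_path = Path(tempfile.mkdtemp()) / "churn_model.pkl"

with model_path.open("wb") as f:
    pickle.dump(model, f)

with model_path.open("rb") as f:
    restored = pickle.load(f)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code. The reloading environment should also have the same library versions as the one used for training.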
Conclusion
The customer churn prediction project involved thorough exploratory data analysis,
preprocessing, and the evaluation of various machine learning algorithms. The XGBoost
Classifier was selected as the final model due to its superior performance across different
metrics. While achieving optimal accuracy and recall is challenging, the insights gained from
this project can guide the company's strategies for customer retention and business growth.
Further analysis may involve gathering more data and exploring advanced techniques to
improve model performance.