Customer Churn Prediction

This document discusses developing a customer churn prediction model for ABC Bank. It analyzes customer data from the bank to predict whether customers will remain with the bank or churn. The data contains information on 10,000 customers. Exploratory analysis finds the data is normally distributed except estimated salary. Various classification models are tested, and random forest is found to perform best with 83% accuracy. The random forest model can correctly predict customer churn without overfitting or underfitting.

Uploaded by

mbibachris
CUSTOMER CHURN PREDICTION

1. INTRODUCTION

Customer churn (or customer attrition) refers to the loss of customers or subscribers for any
reason. Businesses measure churn as the percentage of customers lost relative to the total
number of customers over a given time period. The metric is usually tracked monthly and
reported at the end of the month. For a bank, churn describes how likely customers are to stop
using the bank's services. It is therefore the complement of the bank's customer retention
rate: the lower the churn, the higher the retention.
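
The relationship between churn and retention can be illustrated with a small calculation. This is a minimal sketch: the function name `churn_rate` and the customer counts are hypothetical, and it assumes the number of customers at the start of the period is used as the denominator, which is one common convention.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return the fraction of customers lost over one period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


# Hypothetical month: 2,000 customers at the start, 50 of them left.
rate = churn_rate(customers_at_start=2000, customers_lost=50)
retention = 1 - rate  # retention is the complement of churn

print(f"churn: {rate:.1%}, retention: {retention:.1%}")
```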

This project analyses and predicts customer churn for ABC BANK.

2. DATASET

The data used is a dataset downloaded from Kaggle.

This dataset is for ABC Multistate Bank and has the following columns:

1. customer_id, unused variable.

2. credit_score, used as input.

3. country, used as input.

4. gender, used as input.

5. age, used as input.

6. tenure, used as input.

7. balance, used as input.

8. products_number, used as input.

9. credit_card, used as input.

10. active_member, used as input.

11. estimated_salary, used as input.

12. churn, used as the target: 1 if the client left the bank during the observed period, 0 if
he/she did not.

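
The column roles listed above can be sketched in pandas as follows. This is an illustration only, not the project's actual code: the four sample rows (including the country and gender values) are made up, and one-hot encoding with `pd.get_dummies` is an assumed way of turning the text columns into numeric inputs.

```python
import pandas as pd

# Hypothetical sample rows using the column names listed above; values are made up.
df = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "credit_score": [619, 608, 502, 699],
    "country": ["France", "Spain", "France", "Germany"],
    "gender": ["Female", "Female", "Male", "Male"],
    "age": [42, 41, 42, 39],
    "tenure": [2, 1, 8, 1],
    "balance": [0.0, 83807.86, 159660.80, 0.0],
    "products_number": [1, 1, 3, 2],
    "credit_card": [1, 0, 1, 0],
    "active_member": [1, 1, 0, 0],
    "estimated_salary": [101348.88, 112542.58, 113931.57, 93826.63],
    "churn": [1, 0, 1, 0],
})

# customer_id is an identifier rather than a predictor, so it is dropped;
# churn is separated out as the target.
features = df.drop(columns=["customer_id", "churn"])
target = df["churn"]

# One-hot encode the text columns so that classifiers receive numeric inputs.
features = pd.get_dummies(features, columns=["country", "gender"])

print(features.columns.tolist())
```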
3. PROJECT OBJECTIVE

This project aims to develop a robust customer churn prediction model for ABC Bank. This
model will identify customers at high risk of leaving the bank, enabling ABC to implement
targeted retention strategies and improve customer lifetime value.

This objective incorporates the following elements:

● Focuses on a specific outcome: Develop a churn prediction model.
● Benefits ABC Bank: Identifies at-risk customers for targeted retention efforts.
● Impact: Improves customer lifetime value by reducing churn.
● Measurable: Model performance can be measured by accuracy and other metrics.

4. EXPLORATORY ANALYSIS
The categorical variables are fairly evenly distributed across their classes, with the
exception of products_number, where some classes are heavily under-represented.
The continuous variables are approximately normally distributed, with the exception of
estimated_salary, which is roughly uniformly distributed. A few outliers appear, mainly in
the age and balance variables.

MACHINE LEARNING MODELS (CLASSIFICATION)

Since the target variable is categorical and binary, classification algorithms are
appropriate. The project tested the following algorithms:
1. Logistic regression: accuracy of 71%. Despite this score, the model predicts class 0
correctly more often than class 1, which makes it a poor choice for identifying
churners.
2. KNN: accuracy of 72%. This model does not predict churn well either.
3. Decision tree classifier: accuracy of 80%. This model is fairly good, but other models
are worth exploring.
4. Random forest: accuracy of 83%.
5. Gaussian Naive Bayes: accuracy of 71%.
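
A comparison of these five algorithms could be set up with scikit-learn roughly as shown here. This is a sketch under stated assumptions, not the project's code: the data is a synthetic stand-in generated with `make_classification`, the 80/20 class balance and 70/30 train/test split are assumed, and all models use default hyperparameters apart from `max_iter` and `random_state`. The scores it prints will not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the bank data (assumption): 1,000 rows, 10 numeric
# features, with the positive class (churn) as the minority.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Hold out 30% of the rows for testing, keeping the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
}

# Fit each model on the training part and score it on the held-out part.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from highest to lowest test accuracy.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Because the classes are imbalanced, accuracy alone can look good even when few churners are identified, which is the concern raised for logistic regression above.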
KEY FINDINGS

From the results presented above, the random forest algorithm best predicts customer
churn. It predicts 2039 zeros correctly against 689 predicted wrongly, and 443 ones correctly
against 178 predicted wrongly. The model does not appear to overfit or underfit.
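
The counts quoted above can be turned into summary metrics with a few lines of arithmetic. This is a sketch: the variable names are illustrative, and the text does not say whether the "wrong" counts are grouped by actual class or by predicted class, so the per-class ratio below is a recall in the first case and a precision in the second. Taken at face value, the counts give an overall accuracy of about 74%, which is lower than the 83% score quoted above, so the counts and the score may come from different evaluation splits.

```python
# Counts as reported in the text for the random forest model.
zeros_correct, zeros_wrong = 2039, 689
ones_correct, ones_wrong = 443, 178

total = zeros_correct + zeros_wrong + ones_correct + ones_wrong

# Accuracy is the share of all predictions that were correct; it does not
# depend on how the wrong predictions are grouped.
accuracy = (zeros_correct + ones_correct) / total

# Per-class ratio of correct to all predictions in each reported group.
# Assumption: the grouping (by actual or by predicted class) is not stated,
# so this is either per-class recall or per-class precision.
per_class = {
    "stay (0)": zeros_correct / (zeros_correct + zeros_wrong),
    "churn (1)": ones_correct / (ones_correct + ones_wrong),
}

print(f"total predictions: {total}")
print(f"accuracy: {accuracy:.3f}")
for label, ratio in per_class.items():
    print(f"{label}: {ratio:.3f}")
```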

CONCLUSION

The data was clean and had 10,000 rows and 12 columns. The columns, or variables, are
customer_id, credit_score, country, gender, age, tenure, balance, products_number,
credit_card, active_member, estimated_salary and churn. The customer_id column was
dropped because it carries no predictive information. The churn column was the target
variable, that is, the variable to be predicted.

The categorical variables were well distributed, with only small differences in the counts of
the classes within each variable. The exception was products_number, which had classes that
were heavily under-represented. The continuous variables were all approximately normally
distributed, with the exception of estimated_salary, which had a seemingly uniform
distribution. There were a few outliers in the age and balance variables, which may not affect
the modelling significantly. When churn was compared against each of the other categorical
variables, the number of customers staying was significantly higher than the number leaving.

This is a classification problem: the model seeks to predict whether a customer will leave or
stay at ABC BANK. Logistic Regression, K-Nearest Neighbor, Decision Tree, Random Forest and
Naive Bayes were all used in the modelling in order to select the best algorithm for the
prediction. Random Forest was the best, with a score of 83%. It predicts 2039 zeros correctly
against 689 predicted wrongly, and 443 ones correctly against 178 predicted wrongly.
