Bank Customer Churn Analysis - Jupyter Notebook

The document is a Jupyter notebook analyzing customer churn for a bank. It imports necessary libraries, reads in a dataset on bank customers, and drops unnecessary columns. It then performs exploratory data analysis on the data, including plotting pie charts of categorical variables, bar plots of churn rates by geography and gender, and counts of churned vs not churned customers. It also encodes categorical variables as numeric and checks for null values before assigning variables as dependent (churn) and independent for modeling.

Uploaded by

akash.050501

11/12/2023, 10:49 Bank Customer Churn Analysis - Jupyter Notebook

Importing necessary libraries


In [1]: import pandas as pd
        import numpy as np
        import matplotlib.pyplot as plt
        import seaborn as sns

Read dataset
In [2]: df = pd.read_csv('Bank Churn_Modelling.csv')
        df.head()

Out[2]:
   RowNumber  CustomerId   Surname  CreditScore Geography  Gender  Age  Tenure    Balance  ...
0          1    15634602  Hargrave          619    France  Female   42       2       0.00  ...
1          2    15647311      Hill          608     Spain  Female   41       1   83807.86  ...
2          3    15619304      Onio          502    France  Female   42       8  159660.80  ...
3          4    15701354      Boni          699    France  Female   39       1       0.00  ...
4          5    15737888  Mitchell          850     Spain  Female   43       2  125510.82  ...

In [3]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   RowNumber        10000 non-null  int64
 1   CustomerId       10000 non-null  int64
 2   Surname          10000 non-null  object
 3   CreditScore      10000 non-null  int64
 4   Geography        10000 non-null  object
 5   Gender           10000 non-null  object
 6   Age              10000 non-null  int64
 7   Tenure           10000 non-null  int64
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64
 10  HasCrCard        10000 non-null  int64
 11  IsActiveMember   10000 non-null  int64
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB

Dropping unwanted columns

localhost:8888/notebooks/KINGS LABS/4. Customer churn analysis/Bank Customer Churn Analysis.ipynb 1/11



In [4]: # Identifier columns carry no predictive signal, so drop them by name
        unnecessary_cols = ['RowNumber', 'CustomerId', 'Surname']
        df = df.drop(columns=unnecessary_cols)

EDA


In [5]: for column in df.columns:
            unique_values = df[column].value_counts()
            # Only low-cardinality columns make readable pie charts
            if df[column].nunique() < 6:
                plt.figure()
                # autopct is cut off in the source after '%1.'; '%1.1f%%' is assumed
                plt.pie(unique_values, labels=unique_values.index, autopct='%1.1f%%')
                plt.title(f'Distribution of {column}')
                plt.axis('equal')

        # Display all the pie charts
        plt.show()


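In [6]: # Churn rate by geography and gender
        plt.figure(figsize=(10, 6))

        # The next two statements are cut off in the source (after '.me' and after
        # y='Churn Rate'); their endings are assumed from the axis labels and title
        churn_rate_geo_gender = (df.groupby(['Geography', 'Gender'])['Exited']
                                   .mean().reset_index(name='Churn Rate'))
        sns.barplot(data=churn_rate_geo_gender, x='Geography', y='Churn Rate', hue='Gender')
        plt.xlabel('Geography')
        plt.ylabel('Churn Rate')
        plt.title('Churn Rate by Geography & Gender')
        plt.show()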
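
The churn rate plotted here is just the mean of the 0/1 Exited flag within each group. A minimal sketch of that calculation on a made-up six-row frame (the rows below are illustrative and not taken from the bank dataset):

```python
import pandas as pd

# Toy rows for illustration only; not from the bank dataset
toy = pd.DataFrame({
    'Geography': ['France', 'France', 'Germany', 'Germany', 'Spain', 'Spain'],
    'Gender':    ['Female', 'Male', 'Female', 'Female', 'Male', 'Male'],
    'Exited':    [1, 0, 1, 0, 0, 0],
})

# Mean of a 0/1 column within a group is the share of rows equal to 1
churn_rate = (toy.groupby(['Geography', 'Gender'])['Exited']
                 .mean()
                 .reset_index(name='Churn Rate'))
print(churn_rate)
```

Only group combinations that actually occur appear in the result, so this toy frame yields four rows rather than six.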


In [7]: churn_counts = df['Exited'].value_counts()
        colors = ['#7B68EE', '#483D8B']
        plt.figure(figsize=(8, 6))
        plt.bar(churn_counts.index, churn_counts.values, color=colors)
        plt.xlabel('Churn (Exited)')
        plt.ylabel('Count')
        plt.xticks(churn_counts.index, labels=['Not Churned', 'Churned'])
        plt.title('Count of Customers Churned vs. Not Churned')
        plt.show()


In [8]: plt.figure(figsize=(10, 6))
        # The closing of this call is cut off in the source after palette='Set1'
        sns.countplot(data=df, x='IsActiveMember', hue='Exited', palette='Set1')
        plt.xlabel('Active Membership')
        plt.ylabel('Count')
        plt.title('Active Membership Distribution by Churn')
        plt.legend(['Not Churned', 'Churned'])
        plt.xticks([0, 1], ['Inactive', 'Active'])
        plt.show()

Categorical values
In [9]: for i in df:
            if df[i].dtypes == object:
                print(df[i].value_counts(), "\n")

Geography
France 5014
Germany 2509
Spain 2477
Name: count, dtype: int64

Gender
Male 5457
Female 4543
Name: count, dtype: int64

Encoding categorical variables


In [10]: from sklearn.preprocessing import LabelEncoder
         le = LabelEncoder()

In [11]: df['Geography'] = le.fit_transform(df['Geography'])
         for i, j in enumerate(le.classes_):
             print(i, j)

0 France
1 Germany
2 Spain

In [12]: df['Gender'] = le.fit_transform(df['Gender'])
         for i, j in enumerate(le.classes_):
             print(i, j)

0 Female
1 Male
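
LabelEncoder maps Geography to the integers 0, 1 and 2, which a linear model such as logistic regression will read as an ordered quantity. One common alternative, not used in this notebook, is one-hot encoding. A minimal sketch with pandas.get_dummies on a made-up frame (the rows and the drop_first choice are assumptions for illustration):

```python
import pandas as pd

# Toy rows for illustration only; not from the bank dataset
toy = pd.DataFrame({
    'Geography': ['France', 'Spain', 'Germany', 'France'],
    'Gender':    ['Female', 'Male', 'Male', 'Female'],
})

# One 0/1 indicator column per category; drop_first removes the redundant
# first category of each variable (France and Female here)
encoded = pd.get_dummies(toy, columns=['Geography', 'Gender'],
                         drop_first=True, dtype=int)
print(encoded)
```

Tree-based models are largely indifferent to this choice, so the difference matters mainly for the logistic regression fit.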

In [13]: df.head()

Out[13]:
   CreditScore  Geography  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  ...
0          619          0       0   42       2       0.00              1          1  ...
1          608          2       0   41       1   83807.86              1          0  ...
2          502          0       0   42       8  159660.80              3          1  ...
3          699          0       0   39       1       0.00              2          0  ...
4          850          2       0   43       2  125510.82              1          1  ...

In [14]: df.isnull().sum()

Out[14]: CreditScore        0
         Geography          0
         Gender             0
         Age                0
         Tenure             0
         Balance            0
         NumOfProducts      0
         HasCrCard          0
         IsActiveMember     0
         EstimatedSalary    0
         Exited             0
         dtype: int64

Assigning dependent and independent variables
In [15]: X = df.drop('Exited', axis=1)
         y = df['Exited']


Splitting dataset to training and testing set

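In [16]: from sklearn.model_selection import train_test_split

         # The call is cut off in the source after train_size=.70; any further
         # arguments (such as random_state) are not recoverable
         X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=.70)
         X_train.shape, y_test.shape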

Out[16]: ((7000, 10), (3000,))
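
With an imbalanced target, it can help to keep the class proportions equal in both splits and to fix the random seed so the split is repeatable. The sketch below uses the stratify and random_state parameters of train_test_split on synthetic data; whether the original call used either argument is unknown, and the seed value 42 is an arbitrary assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 20 of them positive (illustrative only)
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([1] * 20 + [0] * 80)

# stratify keeps the 20% positive share in both splits;
# random_state makes the shuffle reproducible
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.70, stratify=y_demo, random_state=42)

print(X_tr.shape, y_te.shape)
print(y_tr.mean(), y_te.mean())
```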

Models
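In [17]: from sklearn.metrics import classification_report, confusion_matrix

         # The signature is cut off in the source after 'X_test = X_'; the remaining
         # defaults are assumed from the names used in the function body
         def print_metrics(model, X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test):
             # Fit on the training split, then report on the test split
             model.fit(X_train, y_train)
             y_pred = model.predict(X_test)
             print(classification_report(y_test, y_pred))
             print(confusion_matrix(y_test, y_pred))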

In [18]: from sklearn.linear_model import LogisticRegression
         print_metrics(LogisticRegression())

              precision    recall  f1-score   support

           0       0.81      0.97      0.89      2416
           1       0.44      0.08      0.14       584

    accuracy                           0.80      3000
   macro avg       0.63      0.53      0.51      3000
weighted avg       0.74      0.80      0.74      3000

[[2354   62]
 [ 536   48]]
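
The class 1 row of the report can be recomputed by hand from the confusion matrix, whose rows are true labels and whose columns are predicted labels. A short sketch using the logistic regression numbers printed above (the variable names are illustrative):

```python
# Confusion matrix printed for LogisticRegression:
# rows are true labels (0, 1), columns are predicted labels (0, 1)
tn, fp = 2354, 62
fn, tp = 536, 48

precision = tp / (tp + fp)          # share of predicted churners who churned
recall = tp / (tp + fn)             # share of actual churners who were caught
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f'precision={precision:.2f} recall={recall:.2f} '
      f'f1={f1:.2f} accuracy={accuracy:.2f}')
```

The 0.80 accuracy comes almost entirely from the majority class: only 48 of the 584 churners in the test split are identified.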

In [19]: from sklearn.tree import DecisionTreeClassifier
         print_metrics(DecisionTreeClassifier())

              precision    recall  f1-score   support

           0       0.88      0.87      0.87      2416
           1       0.48      0.50      0.49       584

    accuracy                           0.80      3000
   macro avg       0.68      0.68      0.68      3000
weighted avg       0.80      0.80      0.80      3000

[[2096  320]
 [ 294  290]]


In [20]: from xgboost import XGBClassifier
         print_metrics(XGBClassifier())

              precision    recall  f1-score   support

           0       0.88      0.95      0.92      2416
           1       0.70      0.48      0.57       584

    accuracy                           0.86      3000
   macro avg       0.79      0.71      0.74      3000
weighted avg       0.85      0.86      0.85      3000

[[2299  117]
 [ 305  279]]

In [21]: from sklearn.ensemble import RandomForestClassifier
         print_metrics(RandomForestClassifier())

              precision    recall  f1-score   support

           0       0.88      0.97      0.92      2416
           1       0.78      0.46      0.58       584

    accuracy                           0.87      3000
   macro avg       0.83      0.72      0.75      3000
weighted avg       0.86      0.87      0.86      3000

[[2338   78]
 [ 314  270]]
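
Accuracy alone hides how many churners each model actually catches. The sketch below ranks the four models by recall on the churn class, using only the confusion matrices printed in the cells above (the dictionary keys are shorthand labels, not identifiers from the notebook):

```python
# Confusion matrices copied from the outputs above:
# rows are true labels (0, 1), columns are predicted labels (0, 1)
matrices = {
    'LogisticRegression': [[2354, 62], [536, 48]],
    'DecisionTree':       [[2096, 320], [294, 290]],
    'XGBoost':            [[2299, 117], [305, 279]],
    'RandomForest':       [[2338, 78], [314, 270]],
}

# Recall on class 1 = true positives / all actual churners
churn_recall = {name: m[1][1] / (m[1][0] + m[1][1])
                for name, m in matrices.items()}

for name, value in sorted(churn_recall.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{name:20s} {value:.2f}')

best = max(churn_recall, key=churn_recall.get)
```

On these numbers the decision tree catches the most churners (290 of 584) while the random forest has the highest accuracy, so which model is preferable depends on whether missed churners or false alarms cost more.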

In [22]: # Factors contributing to customer attrition:
         # 1. Female (Gender)
         # 2. Germany (Geography)
