Bank Customer Segmentation
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited
Table of Contents
Executive Summary
Introduction
Data Description
  Sample of the dataset
Exploratory Data Analysis
  Let us check the types of variables in the data frame
  Check for missing values in the dataset
  Descriptive Statistics
1.1 Read the data, do the necessary initial steps, and exploratory data analysis (Univariate, Bi-variate, and multivariate analysis)
  Histplot, Univariate Analysis
  Skewness in data, distplot
  Bivariate Analysis, pairplot
  Correlation Plot
  Check Outliers
1.2 Do you think scaling is necessary for clustering in this case? Justify
1.3 Apply hierarchical clustering to scaled data. Identify the number of optimum clusters using Dendrogram and briefly describe them
1.4 Apply K-Means clustering on scaled data and determine optimum clusters. Apply elbow curve and silhouette score. Explain the results properly. Interpret and write inferences on the finalized clusters
List of Figures
Fig. 1 - Histplot, Distplot
Fig. 2 - Histplot, Distplot
Fig. 3 - Pair plot
Fig. 4 - Heatmap
Fig. 5 - Boxplot
Fig. 6 - Dendrogram
Fig. 7 - Elbow plot
List of Tables
Table 1. Dataset Sample
Table 2. Descriptive Statistics
Table 3. Skewness of data
Table 4. Correlation between observations
Table 5. Scaled data
Table 6. Number of clusters and frequency table
Table 7. K-Means and silhouette width
Table 8. Grouping as per clusters
Executive Summary
A leading bank wants to develop a customer segmentation to give promotional offers to its customers. They
collected a sample that summarizes the activities of users during the past few months. You are given the task to
identify the segments based on credit card usage.
Introduction
The purpose is to explore the data set and find the spending areas of the customers in accordance with
their credit profile, so that promotional offers can be provided based on their transaction history.
Data Description
Exploratory Data Analysis
Descriptive Statistics:
1.1 Read the data, do the necessary initial steps, and exploratory data analysis
(Univariate, Bi-variate, and multivariate analysis)
Univariate Analysis:
Calculate the skewness in the dataset:
The data is right-skewed for all variables except probability_of_full_payment, which is left-skewed.
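As a minimal sketch of how the sign of the skewness is read, the snippet below computes a simple moment-based skewness for two toy columns. The function name `sample_skewness` and the toy values are hypothetical and are not taken from the bank dataset; pandas' `DataFrame.skew()` reports a bias-adjusted variant, so its numbers would differ slightly while the sign carries the same meaning.

```python
import math


def sample_skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores (population form)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / std) ** 3 for v in values) / n


# Hypothetical toy columns, not values from the report's dataset.
long_right_tail = [1, 2, 2, 3, 3, 3, 4, 15]
long_left_tail = [-v for v in long_right_tail]

right = sample_skewness(long_right_tail)  # positive value: right-skewed
left = sample_skewness(long_left_tail)    # negative value: left-skewed
print(right, left)
```

A positive result indicates a long right tail and a negative result a long left tail, which is the reading applied to the variables above.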
Bivariate Analysis
Correlation Plot
Check Outliers:
1.2 Do you think scaling is necessary for clustering in this case? Justify
Yes, scaling is necessary. The variables are measured on different scales and ranges, and distance-based
clustering would otherwise be dominated by the variables with the largest values.
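A minimal sketch of z-score scaling, assuming standardization to mean 0 and standard deviation 1 is the scaling meant here (the report does not name the scaler). The helper `zscore` and the sample values are hypothetical; only the two column names come from the report.

```python
import statistics


def zscore(column):
    """Standardize a column to mean 0 and population standard deviation 1."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]


# Hypothetical values on very different scales (not from the dataset).
max_spent_in_single_shopping = [4.9, 5.6, 6.3, 5.1]
probability_of_full_payment = [0.81, 0.87, 0.90, 0.85]

scaled_spent = zscore(max_spent_in_single_shopping)
scaled_probability = zscore(probability_of_full_payment)
print(scaled_spent)
print(scaled_probability)
```

After scaling, both columns have comparable spread, so neither dominates a Euclidean distance simply because of its units.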
1.3 Apply hierarchical clustering to scaled data. Identify the number of optimum
clusters using Dendrogram and briefly describe them
Cluster Frequency:
The optimal number of clusters would nominally be 3. Based on the hierarchical clustering, there is a pattern
of high, medium and low spending, seen in the variables max_spent_in_single_shopping and
probability_of_full_payment.
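The steps in this section can be sketched with SciPy as follows. This is an illustration under stated assumptions, not the report's actual code: the data are synthetic, and Ward linkage is assumed because the report does not name the linkage method.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Synthetic stand-in for the scaled data: three well-separated groups in 2-D.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 6.0], [-6.0, 6.0]])
scaled = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Ward linkage is an assumption; the report does not state the method used.
merges = linkage(scaled, method="ward")

# no_plot=True returns the dendrogram layout without drawing it.
tree = dendrogram(merges, no_plot=True)

# Cut the tree into at most 3 clusters and build the frequency table.
labels = fcluster(merges, t=3, criterion="maxclust")
cluster_ids, counts = np.unique(labels, return_counts=True)
print(dict(zip(cluster_ids.tolist(), counts.tolist())))
```

Dropping `no_plot=True` draws the dendrogram when matplotlib is available; the number of clusters is then read off where the vertical merge distances become long.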
1.4 Apply K-Means clustering on scaled data and determine optimum clusters. Apply
elbow curve and silhouette score. Explain the results properly. Interpret and write
inferences on the finalized clusters.
Within-cluster sum of squares (WSS) for k = 1 to 14 clusters:
[1469.9999999999998,
659.171754487041,
430.6589731513006,
371.38509060801096,
327.21278165661346,
289.31599538959495,
262.98186570162267,
241.81894656086033,
223.91254221002725,
206.39612184786694,
193.2835133180646,
182.97995389115258,
175.11842017053073,
166.02965682631788]
The elbow is observed at around 3 to 4 clusters; however, we will go with 3 clusters for this
calculation.
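One way to make the elbow concrete is to look at how much the WSS falls with each added cluster. The sketch below uses the 14 values listed above, read as k = 1 to 14 (an assumption based on the number of values); the variable names are illustrative.

```python
# WSS values as reported above, read as k = 1, 2, ..., 14.
wss = [
    1469.9999999999998, 659.171754487041, 430.6589731513006,
    371.38509060801096, 327.21278165661346, 289.31599538959495,
    262.98186570162267, 241.81894656086033, 223.91254221002725,
    206.39612184786694, 193.2835133180646, 182.97995389115258,
    175.11842017053073, 166.02965682631788,
]

# Reduction in WSS gained by moving from k to k + 1 clusters.
drops = [before - after for before, after in zip(wss, wss[1:])]

for k, drop in enumerate(drops, start=1):
    print(f"k={k} -> k={k + 1}: WSS falls by {drop:.1f}")
```

The first reductions are about 810.8, 228.5, 59.3 and 44.2, so the gain shrinks sharply after the third cluster, which is consistent with choosing 3.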
The optimal number of clusters here would be 3.
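A minimal sketch of how a silhouette score can be used to compare candidate cluster counts with scikit-learn. The data are synthetic (three well-separated groups), and the range of k and the `random_state` values are assumptions made for illustration, not settings taken from the report.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the scaled data: three well-separated groups.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
scaled = np.vstack([c + rng.normal(scale=1.0, size=(50, 2)) for c in centers])

# Fit K-Means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(scaled)
    scores[k] = silhouette_score(scaled, labels)

# The candidate with the highest mean silhouette is preferred.
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

The silhouette score is only defined for at least 2 clusters, which is why the loop starts at k = 2 rather than 1.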
The KMeans is the lowest spending group
There are 3 clusters, corresponding to high, medium and low spending.