Clustering Analysis: Reading The Data

The document discusses clustering analysis performed on customer data from a bank to identify customer segments for targeted promotional offers. Exploratory data analysis was conducted on the variables, which found correlations between spending, payments, and credit limits. Both hierarchical and K-means clustering were applied to scaled data and identified 3 optimal clusters. The clusters were profiled as high, medium, and low spending groups. Different promotional strategies were recommended for each cluster focused on rewards, discounts, credit limits, and partner brands based on their spending behaviors and payment histories.

Uploaded by

KATHIRVEL S

CLUSTERING ANALYSIS
A leading bank wants to develop a customer segmentation to give promotional offers to its
customers. They collected a sample that summarizes the activities of users during the past
few months. You are given the task to identify the segments based on credit card usage.
Data Dictionary for Market Segmentation:
1. spending: Amount spent by the customer per month (in 1000s)
2. advance_payments: Amount paid by the customer in advance by cash (in 100s)
3. probability_of_full_payment: Probability of payment done in full by the customer to the bank
4. current_balance: Balance amount left in the account to make purchases (in 1000s)
5. credit_limit: Limit of the amount in credit card (in 10000s)
6. min_payment_amt: Minimum amount paid by the customer while making payments for purchases made monthly (in 100s)
7. max_spent_in_single_shopping: Maximum amount spent in one purchase (in 1000s)

1.1 Read the data and do exploratory data analysis. Describe the data briefly.
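First, we import the libraries needed for the cluster analysis:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score
from sklearn.preprocessing import StandardScaler  # scaling (1.2)
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster  # hierarchical clustering (1.3)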
Reading the data:

The data appears clean and complete.

The shape of the data is (210, 7).
The output of info() shows that all values are of type float.
There are no null or missing values in the data.

Description of the Data

There are 7 variables.
No null values are present in any variable.
The mean and median values are almost equal for each variable.
The standard deviation of spending is higher than that of the other variables.
There are no duplicate rows in the dataset.
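The checks above can be reproduced with a few pandas calls. This is a minimal sketch, assuming the data has already been loaded into a pandas DataFrame (the data file name is not given in this report, so a placeholder is used); the helper names basic_checks and mean_vs_median are hypothetical.

```python
import pandas as pd


def basic_checks(df: pd.DataFrame) -> dict:
    """Shape, dtype, missing-value and duplicate checks for a DataFrame."""
    return {
        "shape": df.shape,
        # True only if every column has a floating-point dtype
        "all_float": all(pd.api.types.is_float_dtype(dtype) for dtype in df.dtypes),
        # total number of null/NaN cells across all columns
        "missing": int(df.isnull().sum().sum()),
        # number of rows that exactly repeat an earlier row
        "duplicates": int(df.duplicated().sum()),
    }


def mean_vs_median(df: pd.DataFrame) -> pd.DataFrame:
    """One row per variable with its mean, median and standard deviation."""
    return df.agg(["mean", "median", "std"]).T


# Usage, assuming the data file has been read into df (placeholder file name):
# df = pd.read_csv("<data file>.csv")
# print(basic_checks(df))
# print(mean_vs_median(df))
```

For this dataset, basic_checks(df) should report a shape of (210, 7) with zero missing values and zero duplicates, as described above.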

Exploratory Data Analysis

Univariate / Bivariate analysis
Univariate and bivariate analysis help us understand the distribution of data in the dataset. With univariate analysis we can find patterns, summarize the data, and build an understanding of it before solving our business problem.
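The skewness values quoted below can be computed per column. A minimal sketch, assuming the data is in a pandas DataFrame; pandas' DataFrame.skew uses the bias-adjusted sample skewness, and it is an assumption that this is how the values in this report were obtained. skew_summary is a hypothetical helper name.

```python
import pandas as pd


def skew_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Skewness of each column and the direction of its longer tail."""
    skew = df.skew()  # pandas' bias-adjusted sample skewness, one value per column

    def direction(value: float) -> str:
        if value > 0:
            return "positive"  # longer right tail
        if value < 0:
            return "negative"  # longer left tail
        return "symmetric"

    return pd.DataFrame({"skewness": skew, "direction": skew.map(direction)})
```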

The box plot of spending shows no outliers.

Spending is positively skewed (skewness = 0.399889).
The distribution may also have more than one mode.
The distribution plot shows values ranging from about 10 to 22.

The box plot of advance_payments shows no outliers.

advance_payments is positively skewed (skewness = 0.386573).
This distribution may also have more than one mode.
The distribution plot shows values ranging from about 12 to 17.

The box plot of probability_of_full_payment shows a few outliers.
probability_of_full_payment is negatively skewed (skewness = -0.537954).

The distribution plot shows values ranging from about 0.80 to 0.92.

The probability values are good, mostly above 0.80 (80%).

The box plot of current_balance shows no outliers.

current_balance is positively skewed (skewness = 0.525482).

The distribution plot shows values ranging from about 5.0 to 6.5.

The box plot of credit_limit shows no outliers.

credit_limit is positively skewed (skewness = 0.134378).

The distribution plot shows values ranging from about 2.5 to 4.0.

The box plot of min_payment_amt shows a few outliers.
min_payment_amt is positively skewed (skewness = 0.401667).

The distribution plot shows values ranging from about 2 to 8.

The box plot of max_spent_in_single_shopping shows no outliers.
max_spent_in_single_shopping is positively skewed (skewness = 0.561897).

The distribution plot shows values ranging from about 4.5 to 6.5.

No outlier treatment: only 3 to 4 values are observed as outliers, so we are not treating them.
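To see how many points a box plot would flag, the usual 1.5 × IQR whisker rule can be applied directly. A sketch; iqr_outlier_counts is a hypothetical helper, and k = 1.5 matches the default whisker length of matplotlib and seaborn box plots.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in each column."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # comparing a DataFrame with a Series aligns the Series on the column names
    outside = (df < lower) | (df > upper)
    return outside.sum()  # number of flagged values per column
```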

Multivariate analysis
Check for multicollinearity

Heatmap for Better Visualization

Observations
Strong positive correlation between:
- spending & advance_payments
- advance_payments & current_balance
- credit_limit & spending
- spending & current_balance
- credit_limit & advance_payments
- max_spent_in_single_shopping & current_balance
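The pairs listed above can be pulled from the correlation matrix that the heatmap visualizes. A sketch; the 0.8 cut-off for a "strong" correlation is an assumption, not a value stated in this report, and strong_correlations is a hypothetical helper.

```python
import pandas as pd


def strong_correlations(df: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """List each pair of variables whose Pearson correlation is at least `threshold`."""
    corr = df.corr()  # Pearson correlation matrix, the same matrix a heatmap would show
    cols = list(corr.columns)
    rows = []
    for i, first in enumerate(cols):
        # only look at pairs to the right of the diagonal, so each pair appears once
        for second in cols[i + 1:]:
            value = corr.loc[first, second]
            if value >= threshold:
                rows.append((first, second, value))
    result = pd.DataFrame(rows, columns=["var_1", "var_2", "correlation"])
    return result.sort_values("correlation", ascending=False, ignore_index=True)
```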

1.2 Do you think scaling is necessary for clustering in this case? Justify
Yes. Both hierarchical and K-means clustering are based on distance computations, so scaling is necessary for this data.
The variables are on different scales: for example, spending ranges from about 10 to 22, while probability_of_full_payment lies between about 0.80 and 0.92.
Without scaling, variables with larger values, such as spending and advance_payments, would get more weight in the distance calculation.
Scaling brings all the variables into a comparable range.
I have used StandardScaler for scaling.
Below is a snapshot of the scaled data.
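StandardScaler replaces each value x with (x - mean) / std, computed per column, so every variable ends up with mean 0 and standard deviation 1 (population standard deviation, ddof = 0). A minimal sketch wrapping it so the column names are kept; scale_features is a hypothetical name.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_features(df: pd.DataFrame) -> pd.DataFrame:
    """Z-score every column: (x - column mean) / column standard deviation."""
    scaler = StandardScaler()  # uses the population standard deviation (ddof=0)
    scaled = scaler.fit_transform(df)  # returns a NumPy array
    # wrap the array back into a DataFrame so column names and row index are kept
    return pd.DataFrame(scaled, columns=df.columns, index=df.index)
```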

1.3 Apply hierarchical clustering to scaled data. Identify the number of optimum
clusters using Dendrogram and briefly describe them
Hierarchical clustering

For visualization, I have used a dendrogram.

The dendrogram above shows how the data points are merged into clusters using Ward's method.

To find the optimal number of clusters for our business objective, we truncate the dendrogram with truncate_mode = 'lastp'.

Here we set p = 10, so that only the last 10 merged clusters are shown; this is a commonly used starting value.

The truncated dendrogram shows that the data points group into 3 clusters.

Next, to map these clusters back to our dataset, we use fcluster.

We can then look at the cluster frequencies in our dataset:
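The Ward-linkage steps above (linkage, truncated dendrogram with p = 10, fcluster, cluster frequency) can be sketched with scipy. This assumes the scaled data is a pandas DataFrame; hierarchical_clusters is a hypothetical helper, and no_plot=True is used only so the sketch runs without drawing (drop it to render the dendrogram with matplotlib).

```python
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage


def hierarchical_clusters(scaled: pd.DataFrame, method: str = "ward", n_clusters: int = 3, p: int = 10):
    """Build the merge tree, truncate the dendrogram, and cut into flat clusters."""
    # linkage() builds the full merge tree; Ward's method assumes Euclidean distances
    Z = linkage(scaled.to_numpy(), method=method, metric="euclidean")
    # truncate_mode="lastp" keeps only the last p merged clusters as leaves
    dend = dendrogram(Z, truncate_mode="lastp", p=p, no_plot=True)
    # criterion="maxclust" cuts the tree so at most n_clusters clusters remain (labels start at 1)
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    # cluster frequency: number of observations in each cluster
    freq = pd.Series(labels, index=scaled.index, name="cluster").value_counts().sort_index()
    return labels, freq, dend
```

Passing method="average" gives the average-linkage variant with the same truncation and cut.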

Cluster profiling helps us relate the clusters to the business problem:
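A cluster profile is typically the mean of each variable within each cluster, plus the cluster size. A sketch, assuming the profile is computed on the original (unscaled) data so the means stay in the units of the data dictionary; profile_clusters is a hypothetical helper.

```python
import pandas as pd


def profile_clusters(df: pd.DataFrame, labels, label_col: str = "cluster") -> pd.DataFrame:
    """Mean of each variable per cluster, with the cluster size in a 'freq' column."""
    out = df.assign(**{label_col: labels})  # attach one cluster label per row
    profile = out.groupby(label_col).mean()
    # value_counts is indexed by cluster label, so it aligns with the profile rows
    profile["freq"] = out[label_col].value_counts().sort_index()
    return profile
```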

Applying the average linkage method to the scaled data:

The dendrogram above shows how the data points are merged into clusters using the average linkage method.

As with Ward's method, we truncate the dendrogram with truncate_mode = 'lastp' and p = 10; the data points again group into 3 clusters.

We map these clusters back to our dataset with fcluster and look at the cluster frequencies:

Observation
Both methods give very similar cluster means; there is only minor variation between them, which is expected.
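One way to check how similar the Ward and average-linkage solutions are is to cross-tabulate their labels. The sketch below also reports the adjusted Rand index from scikit-learn, an extra check not used in this report (1.0 means identical partitions up to relabelling). compare_linkages is a hypothetical helper.

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import adjusted_rand_score


def compare_linkages(scaled: pd.DataFrame, n_clusters: int = 3):
    """Cross-tabulate Ward and average-linkage cluster labels and report their agreement."""
    labels = {}
    for method in ("ward", "average"):
        Z = linkage(scaled.to_numpy(), method=method, metric="euclidean")
        labels[method] = fcluster(Z, t=n_clusters, criterion="maxclust")
    # rows: Ward clusters, columns: average-linkage clusters
    table = pd.crosstab(labels["ward"], labels["average"], rownames=["ward"], colnames=["average"])
    # 1.0 = identical partitions (up to relabelling); values near 0 = chance-level agreement
    ari = adjusted_rand_score(labels["ward"], labels["average"])
    return table, ari
```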

Based on the dendrogram, either 3 or 4 clusters looks reasonable. After further analysis of the dataset, I chose a 3-cluster solution.
The three-cluster solution shows a clear pattern of high, medium, and low spending, together with max_spent_in_single_shopping (high-value items) and probability_of_full_payment (payments made in full).

1.4 Apply K-Means clustering on scaled data and determine optimum clusters. Apply
elbow curve and silhouette score.
K-means clustering
As an initial choice, we set n_clusters = 3 and look at the distribution of observations across the clusters.
We apply the K-means technique to the scaled data.
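A minimal sketch of fitting K-means with n_clusters = 3 on the scaled data. random_state=1 and n_init=10 are assumptions (the report does not state them); fixing random_state makes the cluster labels reproducible. fit_kmeans is a hypothetical helper.

```python
from sklearn.cluster import KMeans


def fit_kmeans(scaled, n_clusters: int = 3, random_state: int = 1):
    """Fit K-means on the scaled data and return the model and a label per observation."""
    # n_init=10 runs K-means from 10 different centroid seeds and keeps the lowest-inertia run
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = km.fit_predict(scaled)  # labels are 0 .. n_clusters - 1
    return km, labels
```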

Cluster output for all the observations in the dataset:

We have 3 clusters, labelled 0, 1, and 2.

To find the optimal number of clusters, we use the elbow method.

I used a for loop to compute the inertia (within-cluster sum of squares) for k = 1 to 11 clusters.
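The loop described above might look like the following sketch, which records KMeans.inertia_ for each k. inertia_curve and its defaults are hypothetical; the report does not give its random_state or n_init.

```python
from sklearn.cluster import KMeans


def inertia_curve(scaled, k_max: int = 11, random_state: int = 1) -> list:
    """Inertia (within-cluster sum of squared distances to centroids) for k = 1 .. k_max."""
    inertias = []
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(scaled)
        inertias.append(km.inertia_)
    return inertias
```

Plotting the returned list against k (for example with plt.plot) gives the elbow curve; the elbow is the k after which inertia stops dropping sharply.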

The silhouette score for 3 clusters is good
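To compare silhouette scores across several values of k, a sketch like the following can be used; silhouette_score needs at least 2 clusters, so the range starts at k = 2. The range 2 to 6, random_state=1 and the helper name silhouette_by_k are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(scaled, k_values=range(2, 7), random_state: int = 1) -> dict:
    """Average silhouette score of the K-means solution for each k (k must be >= 2)."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(scaled)
        scores[k] = silhouette_score(scaled, labels)  # mean silhouette over all points
    return scores
```

The k with the highest score is the best-separated solution by this measure.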

The elbow curve also shows that after 3 clusters there is no large drop in inertia, so we select 3 clusters.

We then add the cluster labels to our dataset to address the business objective.

The table shows the cluster assigned to each observation in the dataset, along with its individual silhouette width (sil_width).

Cluster frequency

The frequency table shows how many observations fall into each cluster.

The 3-group K-means solution splits the observations fairly evenly across the clusters.

Cluster 0: Medium spending
Cluster 1: Low spending
Cluster 2: High spending

Observation
By K-Means, … values. The elbow curve also shows similar results.

The minimum individual silhouette width (sil_width) of the K-means solution is small but positive, which indicates that every data point is, on average, closer to its own cluster than to a neighbouring one. There is no mismatch in the data points with regard to clustering.
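The per-observation silhouette widths (sil_width) and the check for negative values can be computed with silhouette_samples. A sketch; silhouette_widths is a hypothetical helper. A width close to 1 means a point sits well inside its cluster, a value near 0 means it lies between clusters, and a negative value means it is, on average, closer to a neighbouring cluster.

```python
import pandas as pd
from sklearn.metrics import silhouette_samples


def silhouette_widths(scaled: pd.DataFrame, labels):
    """Per-observation silhouette widths, their minimum, and how many are negative."""
    widths = pd.Series(silhouette_samples(scaled, labels), index=scaled.index, name="sil_width")
    # a negative width means the point is, on average, closer to a neighbouring cluster
    n_negative = int((widths < 0).sum())
    return widths, float(widths.min()), n_negative
```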

1.5 Describe cluster profiles for the clusters defined. Recommend different promotional
strategies for different clusters.

Group 1: High Spending Group

Giving any reward points might increase their purchases.
max_spent_in_single_shopping is high for this group, so they can be offered a discount or offer on their next transaction upon full payment.
Increase their credit limit and encourage more spending.
Give loans against the credit card, as they are customers with a good repayment record.
Tie up with luxury brands, which will drive more high-value single purchases (max_spent_in_single_shopping).

Group 2: Low Spending Group

Customers should be given reminders for payments. Offers can be provided on early payments to improve their payment rate.
Increase their spending habits by tying up with grocery stores and utilities (electricity, phone, gas, others).

Group 3: Medium Spending Group

They are potential target customers who are paying bills, making purchases, and maintaining a comparatively good credit score.
So we can increase their credit limit or lower the interest rate.
Promote premium cards/loyalty cards to increase transactions.
Increase spending habits by tying up with premium e-commerce sites, travel portals, airlines, and hotels, as this will encourage them to spend more.
