UL Coded Project Report - KC

The AllLife Bank Project Case Study aims to segment customers based on their spending patterns and interactions with the bank using clustering algorithms. The analysis employs K-means and hierarchical clustering to identify distinct customer groups, leading to actionable recommendations for targeted marketing and improved service delivery. Key findings reveal varying customer profiles, including high-end clients with significant online engagement and low-end clients who prefer in-person visits.

Uploaded by

kart238

UL Graded Project - Coded

AllLife Bank Project Case Study

Submitted By
Karthik Chandrasekaran

Contents
Problem Statement – Coded Project
Context
Objective
Data Description
Data Dictionary
Data Overview
Data Preprocessing
Dataset Summary Statistics
Exploratory Data Analysis
Univariate Analysis
Outlier Treatment
Scaling the dataset before clustering
K-means clustering algorithms
Apply K-means - Elbow curve
Silhouette Score
Cluster Profiling
Hierarchical clustering
Compare K-means clusters and Hierarchical clusters
Actionable Recommendations
Using PCA to reduce the number of variables
Actionable Recommendations

List of Figures

Sl. No  List of Figures
1 Fig 1: Avg Credit Limit
2 Fig 2: Total Credit Cards
3 Fig 3: Total Bank Visits
4 Fig 4: Total Online Visits
5 Fig 5: Total Calls made
6 Fig 6: Histogram and Boxplot for Avg_Credit_Limit
7 Fig 7: Histogram and Boxplot for Total_Credit_Cards
8 Fig 8: Histogram and Boxplot for Total_Visits_Bank
9 Fig 9: Histogram and Boxplot for Total_Visits_Online
10 Fig 10: Histogram and Boxplot for Total_Calls_Made
11 Fig 11: Correlation Matrix
12 Fig 12: PairPlot
13 Fig 13: Outliers
14 Fig 13: Removal of Outliers
15 Fig 14: Scaled Dataset
16 Fig 15: Clusters and Average Distortions
17 Fig 16: Selecting k with the Elbow Method
18 Fig 17: Silhouette Score
19 Fig 18: Silhouette score for 3 is the highest.
20 Fig 19: Finding optimal no. of clusters with silhouette coefficients
21 Fig 20: Cluster Profiling
22 Fig 21: Checking the groups for Avg_Credit_Limit
23 Fig 21: Checking the groups for the remainder features
24 Fig 22: Boxplot of numerical variables for each cluster: K_means_segments
25 Fig 23: Dendrograms for each linkage method
26 Fig 24: Cophenetic correlation for each linkage method
27 Fig 25: Cophenetic correlation for each linkage method
28 Fig 26: Creating 3 HC clusters
29 Fig 27: Checking the groups for Avg_Credit_Limit
30 Fig 28: Checking the groups for the remainder features
31 Fig 29: Boxplot of numerical variables for each cluster: HC_Clusters
32 Fig 30: Boxplot of numerical variables for each cluster: K_means_segments
33 Fig 31: Boxplot of numerical variables for each cluster: HC_Clusters
34 Fig 32: Scaled Dataset
35 Fig 33: Cumulative Explained Variance by Components1
36 Fig 34: Cumulative Explained Variance by Components2
37 Fig 35: Dendrograms with Linkage Methods
38 Fig 36: PCA_HC_Clusters
39 Fig 37: Boxplot of numerical variables for each cluster: PCA_HC_Clusters

List of Tables

Sl. No  List of Tables
1 Table 1: Data Description
2 Table 2: Sample records in the dataset
3 Table 3: Data Info
4 Table 4: Unique values in each column
5 Table 5: Data Info after dropping columns
6 Table 5: Dataset Summary Statistics

Problem Statement – Coded Project
Context
AllLife Bank wants to focus on its credit card customer base in the next financial year. Its marketing research
team has advised that market penetration can be improved. Based on this input, the Marketing team proposes to
run personalized campaigns to target new customers as well as upsell to existing customers. Another insight from
the market research was that customers perceive the bank's support services poorly. Based on this, the
Operations team wants to upgrade the service delivery model to ensure that customer queries are resolved
faster. The Head of Marketing and the Head of Delivery both decide to reach out to the Data Science team for
help.

Objective
To identify different segments in the existing customers, based on their spending patterns as well as past
interaction with the bank, using clustering algorithms, and provide recommendations to the bank on how to
better market to and service these customers.

Data Description
The data provided is of various customers of a bank and their financial attributes like credit limit, the total
number of credit cards the customer has, and different channels through which customers have contacted the
bank for any queries (including visiting the bank, online, and through a call center).

Data Dictionary

Variable | Description
Sl_No | Primary key of the records
Customer Key | Customer identification number
Average Credit Limit | Average credit limit of each customer for all credit cards
Total credit cards | Total number of credit cards possessed by the customer
Total visits bank | Total number of visits that the customer made (yearly) personally to the bank
Total visits online | Total number of visits or online logins made by the customer (yearly)
Total calls made | Total number of calls made by the customer to the bank or its customer service department (yearly)

Table 1: Data Description

Data Overview

Table 2: Sample records in the dataset

Table 3: Data Info

Observation

• There are 660 observations and 7 columns in the dataset.
• All columns have 660 non-null values.
• All columns are of int64 data type.
• There are no missing values.

Table 4: Unique values in each column

Observation
The Customer Key column, which is an identifier, contains repeated values.

Data Pre-processing
Drop the columns that are not needed for the analysis ('Sl_No' and 'Customer_Key') and drop the records with
duplicate customer keys.
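The pre-processing step can be sketched in plain Python as follows. This is a minimal illustration, not the report's own code: the `preprocess` helper and the sample records are hypothetical, and the order of operations (duplicate keys first, then ID columns, then duplicate rows) is an assumption based on the description.

```python
def preprocess(rows):
    """Drop duplicate customer keys, then ID columns, then duplicate rows."""
    # Keep only the first record seen for each Customer_Key.
    seen_keys = set()
    unique_customers = []
    for row in rows:
        if row["Customer_Key"] not in seen_keys:
            seen_keys.add(row["Customer_Key"])
            unique_customers.append(row)

    # Remove identifier columns, then drop rows whose remaining values repeat.
    id_columns = {"Sl_No", "Customer_Key"}
    seen_rows = set()
    cleaned = []
    for row in unique_customers:
        features = {k: v for k, v in row.items() if k not in id_columns}
        signature = tuple(sorted(features.items()))
        if signature not in seen_rows:
            seen_rows.add(signature)
            cleaned.append(features)
    return cleaned


# Hypothetical records, not taken from the AllLife dataset.
sample = [
    {"Sl_No": 1, "Customer_Key": 101, "Avg_Credit_Limit": 100000, "Total_Credit_Cards": 2},
    {"Sl_No": 2, "Customer_Key": 102, "Avg_Credit_Limit": 50000, "Total_Credit_Cards": 3},
    {"Sl_No": 3, "Customer_Key": 101, "Avg_Credit_Limit": 30000, "Total_Credit_Cards": 7},
    {"Sl_No": 4, "Customer_Key": 103, "Avg_Credit_Limit": 50000, "Total_Credit_Cards": 3},
]
cleaned = preprocess(sample)
print(len(cleaned), "rows remain")
```

In this toy input, one record is dropped for a repeated key and one for repeated feature values, so both kinds of duplicate reduce the row count.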

Table 5: Data Info after dropping columns

Dataset Summary Statistics

Observation

After removing duplicated customer keys, duplicated rows, and unnecessary columns, 644 unique observations and 5
columns remain in the data.

• The average credit limit is around 35K, while 50% of customers have a credit limit below 18K. A mean this far
above the median indicates strong positive skewness.
• The standard deviation shows considerably high variation in credit limits as well.
• On average, each customer owns about 5 credit cards. Some customers have 10.
• On average, most customer interactions are through calls, followed by online visits. Some customers never
contacted or visited the bank.

Exploratory Data Analysis

Univariate Analysis

Fig 1: Avg Credit Limit

Fig 2: Total Credit Cards

The most common number of credit cards is 4 (22.8% of customers), followed by 6 (17.2% of customers).

Fig 3: Total Bank Visits
23.9% of customers visited the bank 2 times. 15.1% never visited the bank.

Fig 4: Total Online Visits

28.4% of customers used the online facility 2 times. 21.9% never used the online facility.

Fig 5: Total Calls made
16.1% of customers called the bank 4 times. 14.6% never called the bank.

Fig 6: Histogram and Boxplot for Avg_Credit_Limit

Fig 7: Histogram and Boxplot for Total_Credit_Cards

Fig 8: Histogram and Boxplot for Total_Visits_Bank

Fig 9: Histogram and Boxplot for Total_Visits_Online

Fig 10: Histogram and Boxplot for Total_Calls_Made

Observations

• Avg_Credit_Limit has many outliers. High-credit customers are causing the skewness.
• Online visits are mostly between 1 and 4, with some outliers above 8.

Multivariate analysis

Fig 11: Correlation Matrix

Observations

Variable Variable Correlation

Avg_Credit_Limit Total_calls_made Negative
Avg_Credit_Limit Total_visits_bank Negative
Avg_Credit_Limit Total_visits_online Positive
Avg_Credit_Limit Total_Credit_Cards Positive
Total_calls_made Total_Credit_Cards Negative
Total_visits_bank Total_Credit_Cards Positive
Total_visits_bank Total_visits_online Negative
Total_visits_bank Total_calls_made Negative

• Avg_Credit_Limit is positively correlated with Total_Credit_Cards.

• Total_visits_bank, Total_visits_online, and Total_calls_made are negatively correlated with one another, which
suggests that most customers rely mainly on one of these channels to contact the bank.

Fig 12: PairPlot

Outlier Treatment
Visually checking distributions

Fig 13: Outliers

Treating outliers by flooring and capping

Fig 13: Removal of Outliers
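Flooring and capping can be sketched as below. The report does not state which bounds were used, so the 1.5 x IQR whisker rule here is an assumption; the `floor_and_cap` helper and the sample values are hypothetical.

```python
import statistics


def floor_and_cap(values, whisker=1.5):
    """Clip values to [Q1 - whisker*IQR, Q3 + whisker*IQR] (assumed rule)."""
    # method="inclusive" interpolates linearly between observed data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    # Values below the lower bound are floored; values above the upper bound are capped.
    return [min(max(v, lower), upper) for v in values]


# Hypothetical values with one extreme observation.
limits = [5, 8, 10, 12, 14, 18, 20, 100]
capped = floor_and_cap(limits)
print(capped)
```

Unlike dropping outlier rows, this keeps every customer in the dataset and only pulls extreme values back to the whisker bounds.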

Scaling the dataset before clustering

Fig 14: Scaled Dataset
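Scaling matters because K-means and hierarchical clustering are distance-based, so an unscaled Avg_Credit_Limit (in the tens of thousands) would dominate the visit and call counts. The report does not name the scaler it used; the sketch below assumes z-score standardization, and the `zscore` helper and sample values are hypothetical.

```python
import statistics


def zscore(values):
    """Standardize a column to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical column: mean 5, population standard deviation 2.
column = [2, 4, 4, 4, 5, 5, 7, 9]
scaled = zscore(column)
print(scaled)
```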

K-means clustering algorithms

Apply K-means - Elbow curve

Fig 15: Clusters and Average Distortions

Fig 16: Selecting k with the Elbow Method

The appropriate k appears to be 2 or 3.
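The elbow method compares the average distortion (mean distance from each point to its nearest centroid) across values of k and looks for the point where the improvement flattens. The sketch below is a small pure-Python K-means written for illustration only: the deterministic farthest-point initialization is a simplification chosen to make the example reproducible, and the sample points are hypothetical, not the bank data.

```python
import math


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def average_distortion(points, centroids):
    """Mean Euclidean distance from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)


# Hypothetical 2-D points forming three well-separated groups.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.1),
          (10.0, 0.0), (10.1, 0.2), (9.8, 0.1)]
distortions = {k: average_distortion(points, kmeans(points, k)) for k in range(1, 6)}
for k, d in distortions.items():
    print(k, round(d, 3))
```

On this toy data the distortion drops sharply up to k = 3 and then changes little, which is the "elbow" shape the method looks for.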
Silhouette Score

Fig 17: Silhouette Score

Fig 18: Silhouette score for 3 is the highest.

Fig 19: Finding optimal no. of clusters with silhouette coefficients

Observations

• Cluster 0 seems to be too thin in comparison to the others.

• Cluster 1 seems to be too large in comparison to the others.

We take 3 as the appropriate number of clusters, since its silhouette score is the highest.

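The silhouette coefficient of a point is (b - a) / max(a, b), where a is its mean distance to the other points in its own cluster and b is its mean distance to the points of the nearest other cluster; the score for a clustering is the average over all points. The sketch below implements that definition directly. The `silhouette` helper, the sample points, and both labelings are hypothetical.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient; requires at least two distinct labels."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for single-point clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical 2-D points forming three well-separated groups.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.1),
          (10.0, 0.0), (10.1, 0.2), (9.8, 0.1)]
three_clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2]
two_clusters = [0, 0, 0, 1, 1, 1, 1, 1, 1]  # merges two of the groups
score_three = silhouette(points, three_clusters)
score_two = silhouette(points, two_clusters)
print(round(score_three, 3), round(score_two, 3))
```

The labeling that matches the three natural groups scores higher than the one that merges two of them, which is the comparison used to pick k.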
Cluster Profiling

Fig 20: Cluster Profiling

Checking the groups for Avg_Credit_Limit

Fig 21: Checking the groups for Avg_Credit_Limit

Checking the groups for the remainder features

Fig 21: Checking the groups for the remainder features

Boxplot of numerical variables for each cluster: K_means_segments

Fig 22: Boxplot of numerical variables for each cluster: K_means_segments

Insights K-means

• Cluster 0:
  - Avg_Credit_Limit: Mid-end type of client.
  - Total_Credit_Cards: Mid-end type of client.
  - Total_visits_bank: Visits the bank the most.
  - Total_visits_online: Rarely uses online banking.
  - Total_calls_made: Calls less often than expected.
• Cluster 1:
  - Avg_Credit_Limit: High-end type of client.
  - Total_Credit_Cards: High-end type of client.
  - Total_visits_bank: Low-end type of client.
  - Total_visits_online: High-end type of client.
  - Total_calls_made: Low-end type of client.
• Cluster 2:
  - Avg_Credit_Limit: Low-end type of client.
  - Total_Credit_Cards: Low-end type of client.
  - Total_visits_bank: Rarely visits the bank.
  - Total_visits_online: Mid-end type of client.
  - Total_calls_made: High-end type of client.

Hierarchical clustering
Apply hierarchical clustering with different linkage methods and plot a dendrogram for each linkage method.

Fig 23: Dendrograms for each linkage method

The dendrograms for weighted, centroid, and average linkage show distinct, well-separated clusters, which is
consistent with these methods having the highest cophenetic correlation scores.
Cophenetic correlation measures the correlation between the pairwise distances of points in feature space and
their distances on the dendrogram. The closer it is to 1, the better the clustering preserves the original distances.

Fig 24: Cophenetic correlation for each linkage method

The highest cophenetic correlation is 0.8924686891600743 (about 0.892), obtained with the Euclidean distance
metric and the average linkage method.
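To make the cophenetic correlation concrete, the sketch below runs average-linkage clustering on three points, records the height at which each pair is first merged (its cophenetic distance), and correlates those heights with the original Euclidean distances. It is a naive pure-Python illustration with hypothetical helper names and data, not the report's implementation, and it is not intended for datasets of realistic size.

```python
import math
from itertools import combinations


def average_linkage_cophenetic(points):
    """Return {(i, j): merge height} for every pair, using average linkage."""
    def avg_dist(a, b):
        # Average linkage: mean of all pairwise distances between two clusters.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(len(points))]
    cophenetic = {}
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        height = avg_dist(a, b)
        # Every pair split across the two merged clusters joins at this height.
        for i in a:
            for j in b:
                cophenetic[(min(i, j), max(i, j))] = height
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return cophenetic


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Hypothetical 1-D points: two close together and one far away.
points = [(0.0,), (1.0,), (5.0,)]
coph = average_linkage_cophenetic(points)
pairs = sorted(coph)
original = [math.dist(points[i], points[j]) for i, j in pairs]
c = pearson(original, [coph[p] for p in pairs])
print(coph, round(c, 4))
```

Here the two nearby points merge at height 1.0 and the third joins at 4.5 (the average of 5 and 4), so the dendrogram distorts the original distances only slightly and the correlation stays close to 1.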

Create and print a dataframe to compare the cophenetic coefficient for each linkage method.

Fig 25: Cophenetic correlation for each linkage method

Creating 3 HC clusters

Fig 26: Creating 3 HC clusters

For the hierarchical approach, 2 clusters seems to be a better choice.

Checking the groups for Avg_Credit_Limit

Fig 27: Checking the groups for Avg_Credit_Limit

Checking the groups for the remainder features

Fig 28: Checking the groups for the remainder features

Boxplot of numerical variables for each cluster: HC_Clusters

Fig 29: Boxplot of numerical variables for each cluster: HC_Clusters

Insights Hierarchical Clustering

• Cluster 0:
  - Avg_Credit_Limit: Low-end type of client.
  - Total_Credit_Cards: Mid-end type of client.
  - Total_visits_bank: High-end type of client.
  - Total_visits_online: Mid-end type of client.
  - Total_calls_made: High-end type of client.
• Cluster 1:
  - Avg_Credit_Limit: High-end type of client.
  - Total_Credit_Cards: High-end type of client.
  - Total_visits_bank: Low-end type of client.
  - Total_visits_online: High-end type of client.
  - Total_calls_made: Mid-end type of client.
• Cluster 2:
  - Avg_Credit_Limit: High-end type of client.
  - Total_Credit_Cards: Low-end type of client.
  - Total_visits_bank: Low-end type of client.
  - Total_visits_online: Low-end type of client.
  - Total_calls_made: Low-end type of client.

Compare K-means clusters and Hierarchical clusters

Boxplot of numerical variables for each cluster: K_means_segments

Fig 30: Boxplot of numerical variables for each cluster: K_means_segments

Conclusions K-means

• Cluster 0: Clients with a mid-range credit limit who are more willing to visit the bank.
• Cluster 1: High-end clients with more credit cards and high online activity.
• Cluster 2: Clients with the lowest credit limit who are more willing to call the bank.

Boxplot of numerical variables for each cluster: HC_Clusters

Fig 31: Boxplot of numerical variables for each cluster: HC_Clusters

Conclusions Hierarchical clusters

• Cluster 0: Clients with the lowest credit limit who prefer visiting the bank.
• Cluster 1: Clients with the highest credit limit who prefer online contact.
• Cluster 2: Clients with a mid-range credit limit who neither visit the bank nor use online banking or the
calling facility.

Actionable Recommendations
One cluster represents customers with high average credit limits and high average balances.
Recommendation: This cluster could be targeted with personalized offers for high-value products and
services. The bank could consider offering these customers higher credit limits or lower interest rates.

One cluster represents customers with low average credit limits and low average balances.
Recommendation: This cluster could be targeted with offers for basic banking products and services. The
bank could consider offering these customers lower fees or higher interest rates on savings accounts.

One cluster represents customers with average credit limits and average balances.
Recommendation: This cluster could be targeted with offers for a variety of banking products and services.
The bank could consider offering these customers personalized offers based on their individual needs and
preferences.

In-person and phone customers should be contacted to promote online banking.

Using PCA to reduce the number of variables
Let us use PCA to reduce the dimensions so that the retained components explain 80% of the variance.

Fig 32: Scaled Dataset

Show variance explained by individual components

Fig 33: Cumulative Explained Variance by Components1

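For 90% variance, the cumulative curve is crossed at about 3.5 components. Since the number of components must
be a whole number, 4 components would be needed to reach that level.
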
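Choosing the number of components from a cumulative explained-variance curve amounts to finding the first component count whose running total reaches the target. The sketch below shows that selection step only; the `components_for_variance` helper is hypothetical, and the variance ratios are made-up values for illustration, not the ones plotted in Fig 33.

```python
from itertools import accumulate


def components_for_variance(explained_ratios, threshold):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    for count, cumulative in enumerate(accumulate(explained_ratios), start=1):
        # Small tolerance so floating-point rounding does not skip an exact match.
        if cumulative >= threshold - 1e-12:
            return count
    return len(explained_ratios)


# Hypothetical explained-variance ratios for 5 components (sorted, summing to 1).
ratios = [0.46, 0.37, 0.06, 0.06, 0.05]
n_for_80 = components_for_variance(ratios, 0.80)
n_for_90 = components_for_variance(ratios, 0.90)
print(n_for_80, n_for_90)
```

With these illustrative ratios, 2 components pass 80% while 4 are needed for 90%, which shows how strongly the chosen threshold affects the reduced dimensionality.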
Fig 34: Cumulative Explained Variance by Components2

Perform Clustering

Fig 35: Dendrograms with Linkage Methods
The Ward linkage dendrogram suggests 4 clusters.

Fig 36: PCA_HC_Clusters

Boxplot of numerical variables for each cluster: PCA_HC_Clusters

Fig 37: Boxplot of numerical variables for each cluster: PCA_HC_Clusters

Insights PCA_HC_Clusters

• Cluster 0
  - Lowest Avg_Credit_Limit.
  - Lowest number in Total_Credit_Cards.
  - Second lowest Total_visits_bank.
  - Second highest Total_visits_online.
  - Total_calls_made avg of 7.
  - Clients prefer calls.
• Cluster 1
  - Second highest Avg_Credit_Limit.
  - Second highest Total_Credit_Cards.
  - Second highest Total_visits_bank.
  - Low Total_visits_online.
  - Total_calls_made avg of 2.
  - Clients visit in person.
• Cluster 2
  - Highest Avg_Credit_Limit.
  - Highest number in Total_Credit_Cards.
  - Smallest Total_visits_bank.
  - Highest Total_visits_online.
  - Total_calls_made avg of 1.
  - Clients would visit online.
• Cluster 3
  - Second lowest Avg_Credit_Limit.
  - Second lowest Total_Credit_Cards.
  - Highest Total_visits_bank.
  - Low Total_visits_online.
  - Total_calls_made avg of 2.
  - Clients visit in person.

Actionable Recommendations
Cluster 0

This type of customer has a low Avg_Credit_Limit and likes to call the bank. It is important to identify whether
they are the type of customer the bank wants to invest in, mainly because developing a better call center
experience can be expensive and customers in this cluster prefer the phone channel.

Cluster 1

This type of customer has a good Avg_Credit_Limit and likes to visit the bank in person. It is important to
identify their visiting patterns and improve their in-person experience.

Cluster 2

This type of customer has a good Avg_Credit_Limit and likes to use online banking. It is important to identify
their patterns of online visits and improve their experience by tracking their online activity and showing them
new products and services.

Cluster 3

This type of customer has a decent Avg_Credit_Limit and likes to visit the bank in person. It is important to
identify their visiting patterns and improve their experience.
