
Final Group 1

Bencare wants to move beyond simplistic customer segmentation strategies to identify segments using variables that influence customer choice. The analysis will identify customer segments using k-means clustering on factors related to satisfaction, pricing, reputation, and short/long term values. The objectives are to analyze loyalty across segments and study the impact of culture on loyalty. The analytical plan involves removing outliers, hierarchical clustering to identify cluster seeds, ANOVA to estimate cluster centroids, splitting the data to validate clusters internally, and Cohen's Kappa to assess cluster agreement. K-means clustering will further refine the clusters.

Uploaded by Akash Gupta


CLUSTER ANALYSIS: GROUP 1

PROBLEM STATEMENT
Stimulus:

Marketers use demographic variables such as age, gender and education to identify distinctive segments that they can attract and retain and whose loyalty they can win. In many instances, however, these variables fail to provide insights that are useful for sound decision making. Bencare wants to move away from simplistic segmentation strategies towards ones that use the variables consumers actually rely on when choosing insurance products.
The KCPs of interest are defined as follows:

DESCRIPTION OF SEGMENTS
We'll be considering four dependent variables for external validation:

Trust Company: rep17 to rep20
Trust Agent: prac17 to prac20
Behavioural Loyalty: loy1 to loy4
Cognitive Loyalty: loy5 to loy8

The segmentation variables and their items are:

Satisfaction: SAT1 to SAT3
Pricing: PRICE1 to PRICE3
Reputation: REPU1 to REPU4
Short-term Value: VAL1 to VAL3
Long-term Value: VAL4 to VAL6

The segmentation is based on customers' perceptions of Bencare in terms of satisfaction, pricing, reputation, and short- and long-term value. The segments will display trust or loyalty depending on their loadings on the corresponding attributes.

Bencare aims to go beyond conventional measures, to derive insights about loyalty from the segments, and to identify the most appropriate segment to target. It also aims to study the impact of cultural differences on loyalty.

OBJECTIVES
To identify distinct customer segments for insurance products.
To analyse groups of similar respondents instead of individual observations.
To analyse the loyalty of each segment.
To study the impact of cultural differences on each segment.

ANALYTICAL PLAN
1. Prepare the data by removing the outliers.
2. Identify the cluster seeds using hierarchical clustering.
3. Identify the cluster centroids by comparing the means using ANOVA.
4. Split the dataset into two halves, a test sample and an internal validation sample, and apply K-means clustering to obtain new centroids.
5. Run external validation using ANOVA to test the differences between the cluster means within each sample.
6. Run Cohen's Kappa to determine the agreement between the cluster memberships derived from the cluster seeds.
7. Run Update and No Update to obtain cluster membership.
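The Cohen's Kappa step compares two sets of cluster memberships for the same respondents and corrects their raw agreement for the agreement expected by chance. A minimal pure-Python sketch of the statistic, assuming the cluster labels of the two solutions have already been matched (cluster 1 in one run corresponds to cluster 1 in the other); the function name `cohens_kappa` is illustrative, not taken from the original analysis.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two cluster assignments of the same cases.

    Assumes the labels are already aligned across the two solutions.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of cases placed in the same cluster by both runs.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each cluster's marginal shares, summed.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_expected == 1:
        return 1.0  # both runs put every case in the same single cluster
    return (p_observed - p_expected) / (1 - p_expected)


first_run = [1, 1, 2, 2, 3, 3]
second_run = [1, 1, 2, 3, 3, 3]
print(cohens_kappa(first_run, second_run))
```

Values near 1 indicate that the two solutions place respondents in essentially the same clusters; values near 0 indicate agreement no better than chance.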

ASSUMPTIONS
We'll use the compositional method of assessing similarity, in which a defined set of attributes is used to develop the similarity between objects.

We're using squared Euclidean distance because it increases the importance of large distances while weakening the importance of small ones.

The dataset is free from multicollinearity bias because factor scores, rather than the raw items, are provided in the dataset.

We assume that all dependent variables are correlated with one another and thus behave similarly, so checking for outliers against any one of them suffices.
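To illustrate what "increases the importance of large distances" means, the sketch below compares ordinary and squared Euclidean distance. The function names are illustrative only.

```python
import math


def euclidean(p, q):
    """Ordinary Euclidean distance between two score vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def squared_euclidean(p, q):
    """Squared Euclidean distance: the sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


origin = [0, 0]
near, far = [1, 0], [3, 0]
# Euclidean: the far point is 3 times as distant as the near one.
print(euclidean(origin, far) / euclidean(origin, near))
# Squared Euclidean: the same gap now counts 9 times as much.
print(squared_euclidean(origin, far) / squared_euclidean(origin, near))
```

Squaring does not change which centroid is nearest, because squaring preserves the ordering of distances. What it changes is how heavily large gaps weigh in sums such as the within-cluster error that clustering tries to minimise.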

Step 1: Identifying Outliers

We will use Mahalanobis distance, computed through a regression, to screen for multivariate outliers (departures from multivariate normality).
We use a significance level of 0.01 (99% confidence) with five degrees of freedom, one for each factor score. The critical chi-square value is 15.086, so any case whose Mahalanobis distance exceeds this cut-off is treated as an outlier.
There are 29 outliers in total, representing around 3.6% of all responses.
When these outliers are removed, the variance captured increases from 38.1% to 56.5%, which clearly shows that the outliers were distorting the results. We therefore eliminate all 29 of them.
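The analysis obtains Mahalanobis distances from a regression run; the sketch below shows the underlying calculation directly in pure Python: the squared distance of each case from the mean vector, scaled by the inverse of the sample covariance matrix, compared against the 15.086 cut-off quoted above. All function names are hypothetical, and the small two-dimensional dataset in the example is synthetic (in the real analysis each row would hold a respondent's five factor scores).

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator)."""
    mu = mean_vector(rows)
    p = len(mu)
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [x - m for x, m in zip(r, mu)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j]
    return [[v / (len(rows) - 1) for v in row] for row in cov]


def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    p = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def mahalanobis_d2(rows):
    """Squared Mahalanobis distance of every row from the mean vector."""
    mu = mean_vector(rows)
    inv = invert(covariance_matrix(rows))
    p = len(mu)
    out = []
    for r in rows:
        d = [x - m for x, m in zip(r, mu)]
        out.append(sum(d[i] * inv[i][j] * d[j] for i in range(p) for j in range(p)))
    return out


# Cut-off quoted in the text: chi-square, 5 degrees of freedom, alpha = 0.01.
CHI2_CUTOFF = 15.086


def flag_outliers(rows, cutoff=CHI2_CUTOFF):
    """Indices of rows whose squared Mahalanobis distance exceeds the cut-off."""
    return [i for i, d2 in enumerate(mahalanobis_d2(rows)) if d2 > cutoff]


# Synthetic example: a 7 x 7 grid of points plus one extreme case.
data = [[float(i % 7), float(i // 7)] for i in range(49)] + [[40.0, 40.0]]
print(flag_outliers(data))
```

The cut-off is hard-coded from the slide rather than computed, to keep the sketch free of external libraries.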

Step 2: Cluster Seeds using Hierarchical Clustering
There is a significant increase in the percentage change as we move from four clusters to three, so three cluster seeds is the most appropriate choice.

The graph shows the percentage change used to decide the number of cluster seeds. The first elbow appears at 4, so the number of cluster seeds deduced is 3. We will run hierarchical clustering for 3 to 7 clusters.

[Figure: % Change. Stress plotted against Number of Clusters (2 to 15)]
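The percentage-change rule can be sketched as follows. Given a heterogeneity measure for each k-cluster solution (the slide plots a value labelled Stress; its exact definition and numbers are not recoverable here), the percentage change when merging from k clusters to k - 1 is computed and the largest jump is located. The function names and the input values are invented purely to show the arithmetic; choosing the final number of seeds remains a judgment call, as in the text.

```python
def percentage_changes(coefficients):
    """Percentage change in heterogeneity between successive solutions.

    `coefficients` maps number of clusters k -> heterogeneity of the
    k-cluster solution. Returns {k: % change when merging k -> k - 1}.
    """
    changes = {}
    for k in sorted(coefficients):
        if k - 1 in coefficients and coefficients[k] != 0:
            changes[k] = 100.0 * (coefficients[k - 1] - coefficients[k]) / coefficients[k]
    return changes


def largest_jump(coefficients):
    """(k, % change) for the merge k -> k - 1 with the biggest jump."""
    changes = percentage_changes(coefficients)
    k = max(changes, key=changes.get)
    return k, changes[k]


# Made-up heterogeneity values for 3 to 7 clusters, for illustration only.
example = {3: 15.0, 4: 10.0, 5: 9.5, 6: 9.0, 7: 8.6}
print(percentage_changes(example))
print(largest_jump(example))
```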

Step 3: Estimating Cluster Centroids using ANOVA (for Cluster Seed 3)

Descriptives: REGR factor scores for analysis 1, by cluster

Factor  Cluster  N    Mean        Std. Deviation
1       1        286  -.5244758   .87593233
1       2        247   .2247202   .55471251
1       3        144   .8042042   .61893394
1       Total    677   .0314789   .88774481
2       1        286  -.1453884   .65155437
2       2        247   .0531066   1.06215779
2       3        144   .3734897   .76418553
2       Total    677   .0373985   .86680447
3       1        286  -.1023924   .97457065
3       2        247   .1551692   .67344599
3       3        144  -.1218794   1.05996505
3       Total    677  -.0125673   .90519522
4       1        286  -.3958749   .77777667
4       2        247   .4470345   .61640115
4       3        144   .1610676   .82835820
4       Total    677   .0301197   .82607392
5       1        286  -.1905843   .76591607
5       2        247   .7239336   .69727932
5       3        144  -.8020325   .85476796
5       Total    677   .0130160   .96047909

The cluster means of the standardized factor scores will be used as the centroids for the cluster analysis.
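The cluster means in the table serve as seeds for K-means. The plan's "Update" and "No Update" runs appear to correspond to the UPDATE and NOUPDATE options of SPSS's QUICK CLUSTER (K-means) procedure. As we understand those options, NOUPDATE recomputes the centres only after each full pass over the cases, while UPDATE moves a centre as each case joins or leaves it (running means). That mapping, the exact running-mean variant, and the function names below are our assumptions; this is a minimal pure-Python sketch of both variants, not the package's implementation.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two score vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(row, centroids):
    """Index of the centroid closest to `row`."""
    return min(range(len(centroids)), key=lambda c: squared_distance(row, centroids[c]))


def kmeans_batch(rows, seeds, max_iter=100):
    """'No Update' style (assumed): centroids recomputed after each full pass."""
    centroids = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(r, centroids) for r in rows]
        if new_labels == labels:
            break  # memberships are stable
        labels = new_labels
        for c in range(len(centroids)):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def kmeans_running(rows, seeds, max_iter=100):
    """'Update' style (assumed): a centroid moves as soon as a case joins or leaves."""
    centroids = [list(s) for s in seeds]
    counts = [0] * len(centroids)
    labels = [None] * len(rows)
    for _ in range(max_iter):
        changed = False
        for i, r in enumerate(rows):
            c = nearest(r, centroids)
            old = labels[i]
            if c == old:
                continue
            changed = True
            if old is not None:
                counts[old] -= 1
                if counts[old] > 0:  # take the case out of the old running mean
                    centroids[old] = [m - (x - m) / counts[old] for m, x in zip(centroids[old], r)]
            counts[c] += 1  # add the case to the new running mean
            centroids[c] = [m + (x - m) / counts[c] for m, x in zip(centroids[c], r)]
            labels[i] = c
        if not changed:
            break
    return labels, centroids


# Seeds: cluster means of factor scores 1-5 from the descriptives table.
SEEDS = [
    [-.5244758, -.1453884, -.1023924, -.3958749, -.1905843],  # cluster 1
    [.2247202, .0531066, .1551692, .4470345, .7239336],  # cluster 2
    [.8042042, .3734897, -.1218794, .1610676, -.8020325],  # cluster 3
]
```

With the respondents' five factor scores as `rows` (not reproduced here), `kmeans_batch(rows, SEEDS)` and `kmeans_running(rows, SEEDS)` each return a membership vector, and the two vectors can then be compared for agreement.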

Judging by the F values, factors 1 and 5 contribute the most to the formation of these clusters.
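As a cross-check, a one-way ANOVA F ratio can be recomputed from the per-cluster N, mean and standard deviation alone: the between-cluster sum of squares is the sum of n_k times the squared deviation of the cluster mean from the grand mean, and the within-cluster sum of squares is the sum of (n_k - 1) times the squared standard deviation. A minimal sketch, assuming the reported standard deviations use the n - 1 denominator, with N taken as 286, 247 and 144 for clusters 1 to 3 (the ordering that reproduces the table's Total means). The function name is illustrative.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F from per-group (n, mean, sd) summaries.

    Returns (F, df_between, df_within).
    """
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# (N, mean, SD) for clusters 1-3, copied from the descriptives table.
DESCRIPTIVES = {
    1: [(286, -.5244758, .87593233), (247, .2247202, .55471251), (144, .8042042, .61893394)],
    2: [(286, -.1453884, .65155437), (247, .0531066, 1.06215779), (144, .3734897, .76418553)],
    3: [(286, -.1023924, .97457065), (247, .1551692, .67344599), (144, -.1218794, 1.05996505)],
    4: [(286, -.3958749, .77777667), (247, .4470345, .61640115), (144, .1610676, .82835820)],
    5: [(286, -.1905843, .76591607), (247, .7239336, .69727932), (144, -.8020325, .85476796)],
}

if __name__ == "__main__":
    for factor, groups in DESCRIPTIVES.items():
        f, df1, df2 = one_way_anova_f(groups)
        print(f"factor {factor}: F({df1}, {df2}) = {f:.1f}")
```

Applied to the table, factors 1 and 5 give the two largest F ratios, in line with the reading above. Because the table's means and SDs are rounded, the recomputed values may differ slightly from the package output. Since the clusters were formed to separate cases on these very variables, the F ratios are best read descriptively rather than as significance tests.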

Sum of Squares
REGR factor score 1 for analysis 1