
Cluster Analysis
• Used to classify objects (cases) into homogeneous groups called clusters.
• Objects within a cluster tend to be similar to one another and dissimilar to objects in other clusters.
• In cluster analysis, the groups are suggested by the data rather than defined in advance.
An Ideal Clustering Situation
[Scatterplot of Variable 1 vs. Variable 2: clusters are compact and clearly separated.]

More Common Clustering Situation
[Scatterplot of Variable 1 vs. Variable 2: clusters overlap, with no clear boundaries.]
Statistics Associated with Cluster Analysis
• Agglomeration schedule. Gives information on the objects or cases being combined at each stage of a hierarchical clustering process.
• Cluster centroid. Mean values of the variables for all the cases in a particular cluster.
• Cluster centers. Initial starting points in nonhierarchical clustering. Clusters are built around these centers, or seeds.
• Cluster membership. Indicates the cluster to which each object or case belongs.
Statistics Associated with Cluster Analysis
• Dendrogram (tree graph). A graphical device for displaying clustering results.
  - Vertical lines represent clusters that are joined together.
  - The position of the line on the scale indicates the distance at which clusters were joined.
• Distances between cluster centers. These distances indicate how separated the individual pairs of clusters are. Clusters that are widely separated are distinct, and therefore desirable.
Conducting Cluster Analysis
Formulate the Problem

Select a Distance Measure

Select a Clustering Procedure

Decide on the Number of Clusters

Interpret and Profile Clusters

Assess the Validity of Clustering


Formulating the Problem
• The most important task is selecting the variables on which the clustering is based.
• Inclusion of even one or two irrelevant
variables may distort a clustering solution.
• Variables selected should describe the
similarity between objects in terms that are
relevant to the marketing research problem.
• Should be selected based on past research,
theory, or a consideration of the hypotheses
being tested.
Select a Similarity Measure
• Similarity can be measured by correlation coefficients or by distances.
• The most commonly used measure of similarity is the Euclidean distance. The city-block (Manhattan) distance is also used.
• If the variables are measured in vastly different units, the data must be standardized. Outliers should also be eliminated.
• Use of different similarity/distance measures may lead to different clustering results.
• Hence, it is advisable to use different measures and compare the results.
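The two distance measures and z-score standardization can be sketched in a few lines of pure Python (an illustration, not part of the original slides; the two cases come from the attitudinal data shown later in the deck):

```python
import math

def euclidean(x, y):
    # Square root of the sum of squared differences across variables.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    # Sum of absolute differences (Manhattan distance).
    return sum(abs(a - b) for a, b in zip(x, y))

def standardize(column):
    # Z-scores: mean 0, standard deviation 1, so no variable dominates
    # the distance simply because of its measurement units.
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / sd for v in column]

case_1 = [6, 4, 7, 3, 2, 3]  # case 1 of the attitudinal data
case_2 = [2, 3, 1, 4, 5, 4]  # case 2
print(euclidean(case_1, case_2))   # 8.0
print(city_block(case_1, case_2))  # 16
```

As the slides note, the two measures can order pairs of cases differently, which is one reason to compare results across measures.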
Hierarchical Clustering Methods
• Hierarchical clustering is characterized by the
development of a hierarchy or tree-like structure.
  - Agglomerative clustering starts with each object in a separate cluster. Clusters are formed by grouping objects into bigger and bigger clusters.
  - Divisive clustering starts with all the objects grouped in a single cluster. Clusters are divided or split until each object is in a separate cluster.
• Agglomerative methods are commonly used in marketing
research. They consist of linkage methods, variance
methods, and centroid methods.
Hierarchical Agglomerative Clustering - Linkage Methods
• The single linkage method is based on the minimum distance, or the nearest-neighbor rule.
• The complete linkage method is based on the maximum distance, or the furthest-neighbor approach.
• In the average linkage method, the distance between two clusters is defined as the average of the distances between all pairs of objects, one drawn from each cluster.
Linkage Methods of Clustering
[Diagram: single linkage joins Cluster 1 and Cluster 2 by the minimum distance between their members; complete linkage by the maximum distance; average linkage by the average distance over all pairs of members.]
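The three linkage rules can be sketched as follows (a minimal illustration, not from the slides; the one-dimensional points are made up for the demo):

```python
# Two small clusters of one-dimensional points (made up for illustration).
cluster_1 = [1.0, 2.0, 3.0]
cluster_2 = [6.0, 8.0]

# All distances between one object from each cluster.
pair_distances = [abs(a - b) for a in cluster_1 for b in cluster_2]

single_linkage = min(pair_distances)                         # nearest neighbors
complete_linkage = max(pair_distances)                       # furthest neighbors
average_linkage = sum(pair_distances) / len(pair_distances)  # mean over all pairs
print(single_linkage, complete_linkage, average_linkage)     # 3.0 7.0 5.0
```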
Hierarchical Agglomerative Clustering - Variance and Centroid Methods
• Variance methods generate clusters so as to minimize the within-cluster variance.
• Ward's procedure is a commonly used variance method. For each cluster, the sum of squared distances to the centroid is calculated. At each step, the two clusters whose merger produces the smallest increase in the overall within-cluster sum of squares are combined.
• In the centroid methods, the distance between two clusters is the distance between their centroids (the means for all the variables).
• Of the hierarchical methods, average linkage and Ward's procedure have been shown to perform better than the other procedures.
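Ward's merge criterion can be sketched as the increase in total within-cluster sum of squares caused by a candidate merge (an illustration, not from the slides; the two-dimensional clusters are made up for the demo):

```python
def centroid(cluster):
    # Mean of each variable across the cases in the cluster.
    n = len(cluster)
    return [sum(case[j] for case in cluster) / n for j in range(len(cluster[0]))]

def within_ss(cluster):
    # Sum of squared Euclidean distances from each case to the centroid.
    c = centroid(cluster)
    return sum(sum((v - m) ** 2 for v, m in zip(case, c)) for case in cluster)

def ward_merge_cost(c1, c2):
    # Increase in total within-cluster SS if c1 and c2 were merged.
    # Ward's procedure merges the pair with the smallest such increase.
    return within_ss(c1 + c2) - within_ss(c1) - within_ss(c2)

a = [[1.0, 1.0], [2.0, 2.0]]  # a tight cluster (made-up points)
b = [[8.0, 8.0], [9.0, 9.0]]  # a distant tight cluster
print(ward_merge_cost(a, b))  # 98.0: merging distant clusters is costly
```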
Other Agglomerative Clustering Methods
[Diagrams of Ward's procedure and the centroid method.]
Idea Behind K-Means
• Algorithm for K-means clustering:
1. Partition the items into K initial clusters.
2. Assign each item to the cluster with the nearest centroid (mean).
3. Recalculate the centroids of the clusters receiving and losing the item.
4. Repeat steps 2 and 3 until no more reassignments occur.
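The steps above can be sketched as a minimal K-means in pure Python (not from the slides; this is the common "batch" variant that recalculates all centroids once per pass rather than after every single reassignment, and the example points and starting seeds are assumptions for the demo):

```python
import math

def kmeans(items, k, seeds, max_iter=100):
    # seeds: assumed starting centroids standing in for step 1's partition.
    centroids = [list(s) for s in seeds]
    for _ in range(max_iter):
        # Step 2: assign each item to the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for item in items:
            j = min(range(k), key=lambda c: math.dist(item, centroids[c]))
            clusters[j].append(item)
        # Step 3: recalculate the centroid of every cluster.
        new = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        # Step 4: repeat until no centroid (hence no assignment) changes.
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]  # made-up data
centroids, clusters = kmeans(points, 2, seeds=[(0, 0), (10, 10)])
print(centroids)  # one centroid near each tight group of points
```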
Select a Clustering Procedure
• The hierarchical and nonhierarchical methods should be used in tandem.
  - First, an initial clustering solution is obtained using a hierarchical procedure (e.g., Ward's).
  - The number of clusters and cluster centroids so obtained are then used as inputs to the optimizing partitioning method.
• The choice of a clustering method and the choice of a distance measure are interrelated. For example, squared Euclidean distances should be used with Ward's and the centroid methods. Several nonhierarchical procedures also use squared Euclidean distances.
Decide on the Number of Clusters
• Theoretical, conceptual, or practical considerations can suggest the number of clusters.
• In hierarchical clustering, the distances at which clusters are combined (from the agglomeration schedule) can be used.
• Stop at the step where the distance measure makes a sudden jump: a large jump means two relatively dissimilar clusters are being merged.
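The "sudden jump" rule can be sketched as follows (an illustration, not from the slides; the agglomeration-schedule distances are hypothetical):

```python
# Distance at which two clusters were combined at each step
# (n cases produce n - 1 merge steps). Hypothetical schedule:
merge_distances = [0.5, 0.7, 0.9, 1.1, 1.4, 4.8, 6.0]

# Size of the jump between consecutive merge steps.
jumps = [b - a for a, b in zip(merge_distances, merge_distances[1:])]

# Stop just before the largest jump: that 1-based step is the last "cheap"
# merge, and cutting the dendrogram there leaves n_cases - step clusters.
step = jumps.index(max(jumps)) + 1
n_cases = len(merge_distances) + 1
n_clusters = n_cases - step
print(n_clusters)  # 3
```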
Interpreting and Profiling Clusters
• Involves examining the cluster centroids. The centroids enable us to describe each cluster by assigning it a name or label.
• Profile the clusters in terms of variables that were not used for clustering. These may include demographic, psychographic, product-usage, media-usage, or other variables.
Assess Reliability and Validity
1. Perform cluster analysis on the same data using different
distance measures. Compare the results across measures
to determine the stability of the solutions.
2. Use different methods of clustering and compare the results.
3. Split the data randomly into halves. Perform clustering
separately on each half. Compare cluster centroids across
the two subsamples.
4. Delete variables randomly. Perform clustering based on the
reduced set of variables. Compare the results with those
obtained by clustering based on the entire set of variables.
5. In nonhierarchical clustering, the solution may depend on the order of cases in the data set. Make multiple runs using different orderings of the cases until the solution stabilizes.
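One simple way to compare two solutions (steps 1, 2, and 3 above) is the Rand index, the fraction of case pairs on which the two solutions agree. This index is a standard tool rather than one prescribed by the slides, and the membership vectors below are made up for illustration:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    # A pair of cases "agrees" if both solutions put them in the same
    # cluster, or both solutions put them in different clusters.
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)

# Hypothetical memberships for 8 cases from two different procedures.
ward_labels = [1, 1, 1, 2, 2, 3, 3, 3]
kmeans_labels = [1, 1, 2, 2, 2, 3, 3, 3]
print(rand_index(ward_labels, kmeans_labels))  # close to 1 → stable solution
```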
Example of Cluster Analysis
• Consumers were asked about their attitudes toward shopping. Six variables were selected:
• V1: Shopping is fun
V2: Shopping is bad for your budget
V3: I combine shopping with eating out
V4: I try to get the best buys when shopping
V5: I don’t care about shopping
V6: You can save money by comparing prices
• Responses were on a 7-pt scale (1=disagree;
7=agree)
Attitudinal Data For Clustering
Case No. V1 V2 V3 V4 V5 V6
1 6 4 7 3 2 3
2 2 3 1 4 5 4
3 7 2 6 4 1 3
4 4 6 4 5 3 6
5 1 3 2 2 6 4
6 6 4 6 3 3 4
7 5 3 6 3 3 4
8 7 3 7 4 1 4
9 2 4 3 3 6 3
10 3 5 3 6 4 6
11 1 3 2 3 5 3
12 5 4 5 4 2 4
13 2 2 1 5 4 4
14 4 6 4 6 4 7
15 6 5 4 2 1 4
16 3 5 4 6 4 7
17 4 4 7 2 2 5
18 3 7 2 6 4 3
19 4 6 3 7 2 7
20 2 3 2 4 7 2
Dendrogram
[Dendrogram of the hierarchical clustering solution for the 20 cases.]
The Elbow Method
• The elbow method runs the clustering for a range of values of K and computes the Within-Cluster Sum of Squares (WCSS) for each: the sum of squared distances between each point and the centroid of the cluster it belongs to.
• As K increases, WCSS falls; the value of K at the "elbow," where the rate of decrease drops sharply, is taken as the number of clusters.
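WCSS for a given clustering can be sketched as follows (an illustration, not from the slides; the point assignments are made up for the demo):

```python
import math

def wcss(clusters):
    # Within-Cluster Sum of Squares: for each cluster, add the squared
    # distance from every point to that cluster's centroid.
    total = 0.0
    for cluster in clusters:
        centroid = [sum(col) / len(cluster) for col in zip(*cluster)]
        total += sum(math.dist(p, centroid) ** 2 for p in cluster)
    return total

points_tight = [[(1, 1), (1, 2)], [(8, 8), (9, 8)]]  # good 2-cluster split
points_loose = [[(1, 1), (9, 8)], [(1, 2), (8, 8)]]  # same points, bad split
print(wcss(points_tight))  # small
print(wcss(points_loose))  # much larger
```

Plotting WCSS against K and looking for the bend in the curve gives the elbow.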
Cluster Centroids (Means of Variables)

Cluster No.   V1      V2      V3      V4      V5      V6
1             5.750   3.625   6.000   3.125   1.750   3.875
2             1.667   3.000   1.833   3.500   5.500   3.333
3             3.500   5.833   3.333   6.000   3.500   6.000