
Cluster Analysis

-Prof. Chitvan Mehrotra


What is Cluster Analysis?
 Cluster Analysis is a multivariate interdependence technique whose primary objective is to classify objects into relatively homogeneous groups.

 A cluster is a collection of data objects:

 Similar (cohesive) to one another within the same cluster (high intra-class similarity)

 Dissimilar to the objects in other clusters (low inter-class similarity)


 Cluster Analysis involves finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups

(Figure: intra-cluster distances are minimized; inter-cluster distances are maximized.)
Examples of Clustering Applications
 Medicine – What are the diagnostic clusters? To answer this question the researcher
would devise a diagnostic questionnaire that includes possible symptoms (for
example, in psychology, anxiety, depression etc.). The cluster analysis can then
identify groups of patients that have similar symptoms.

 Marketing – What are the customer segments? To answer this question a market
researcher may conduct a survey covering needs, attitudes, demographics, and
behavior of customers. The researcher may then use cluster analysis to identify
homogeneous groups of customers that have similar needs and attitudes.

 Education – What are student groups that need special attention? Researchers may
measure psychological, aptitude, and achievement characteristics. A cluster analysis
then may identify what homogeneous groups exist among students (for example, high
achievers in all subjects, or students that excel in certain subjects but fail in others).

 Fraud Detection – Combinations of rules are used to explore 'fraudulent' and 'non-fraudulent' transactions and group them into clusters. These clusters are then used to train a supervised machine learning algorithm that classifies new records as either fraudulent or non-fraudulent.
Clustering procedures
 Hierarchical procedures – a tree-like structure for understanding the levels of observation
 – Agglomerative (start from n clusters to get to 1 cluster)
 – Divisive (start from 1 cluster to get to n clusters)

 Non-hierarchical procedures – a centroid is chosen and the distance from the centroid is used to form clusters
 – K-means clustering
Concept of Euclidean Distance
 The objective of clustering is to group similar objects together.
 Euclidean distance (based on the Pythagorean theorem) is used to assess how similar or different the objects are.
 It measures similarity in terms of the distance between pairs of objects.
 Objects with a smaller distance between them are more similar to each other than those with a larger distance (a small numeric example follows below).
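
To make this concrete, here is a minimal sketch in Python using NumPy; the two score vectors are invented purely for illustration.

```python
# Euclidean distance between two hypothetical objects measured on four variables.
import numpy as np

a = np.array([22.0, 15.0, 8.0, 11.0])   # scores for object A (made-up values)
b = np.array([25.0, 13.0, 9.0, 14.0])   # scores for object B (made-up values)

# Pythagoras in n dimensions: square root of the sum of squared differences
distance = np.sqrt(np.sum((a - b) ** 2))
print(distance)   # the smaller the distance, the more similar the two objects
```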
SPSS Example 1 – Psychiatric Treatment
 We wanted to look at clusters of cases referred for psychiatric
treatment.
 We measured each subject on four questionnaires: Spielberger Trait
Anxiety Inventory (STAI), the Beck Depression Inventory (BDI), a
measure of Intrusive Thoughts and Rumination (IT) and a measure of
Impulsive Thoughts and Actions (Impulse).
 The rationale behind this analysis is that people with the same
disorder should report a similar pattern of scores across the measures
(so the profiles of their responses should be clustered).
 To check the analysis, trained psychologists were asked to agree on a diagnosis for each case based on the DSM-IV (GAD – Generalized Anxiety Disorder, DEP – Depression, OCD – Obsessive-Compulsive Disorder)
SPSS Commands
 First perform a hierarchical method to define the number of clusters.
 Then use the k-means procedure to actually form the clusters (a rough scripted equivalent of the hierarchical step is sketched after this list).
 Analyze/Classify/Hierarchical Cluster Analysis
 Select the four diagnostic questionnaires from the list on the left-
hand side and drag them to the box labelled Variables.
 Statistics (Agglomeration Schedule)
 Plots (Dendrogram, Vertical)
 Method (Ward's Method; Squared Euclidean Distance; Standardize – Z Scores, By Variables)
 Save (Single Solution; Number of clusters = 3)
 Ok
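
For readers who want to reproduce the hierarchical step outside SPSS, here is a minimal sketch using Python's scipy; the file name and the variable names (stai, bdi, it, impulse) are assumptions for illustration, not part of the original example.

```python
# Hierarchical (agglomerative) clustering: Ward's method on z-scored variables.
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

raw = pd.read_csv("psychiatric_test.csv")                              # hypothetical data file
scores = raw[["stai", "bdi", "it", "impulse"]].apply(zscore, ddof=1)   # standardize by variables

merge_history = linkage(scores, method="ward")                   # analogue of the agglomeration schedule
membership = fcluster(merge_history, t=3, criterion="maxclust")  # save a 3-cluster solution

print(merge_history[:5])   # first merge steps: items joined and the fusion distance
print(membership)          # cluster number assigned to each case
```

The fusion distances scipy reports are on a slightly different scale from SPSS's squared Euclidean coefficients, but the merge order, and hence the resulting clusters, should be the same.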
Interpretation 1
 The 'Stage Cluster First Appears' columns show the stage at which each cluster first appears; zeroes indicate single cases that existed before the analysis.
 The Coefficients column indicates the Euclidean distance between the two clusters (or cases) joined at each stage.
 Reading from the bottom to the top, look for the biggest jump in the coefficients. The largest jump (56 - 19 = 37) corresponds to a one-cluster solution, which is not preferred, so the next biggest jump, between stage 13 and stage 12 (19 - 4 = 15), is used; stage 13 is therefore where we should stop.
 Number of clusters = 2
Agglomeration Schedule

This table shows how the cases are clustered together at each stage of the cluster analysis.

Stage   Cluster Combined        Coefficients   Stage Cluster First Appears   Next Stage
        Cluster 1   Cluster 2                  Cluster 1   Cluster 2
  1         14          16          2.000          0           0                 3
  2          6           7          2.000          0           0                 7
  3         10          14          2.667          0           1                 8
  4          2          13          3.000          0           0                15
  5          5          11          3.000          0           0                 9
  6          3           8          3.000          0           0                16
  7          6          12          3.333          2           0                10
  8          4          10          3.500          0           3                13
  9          5           9          4.000          5           0                11
 10          1           6          4.167          0           7                12
 11          5          20          5.667          9           0                15
 12          1          17          5.800         10           0                14
 13          4          19          6.000          8           0                17
 14          1          15          7.933         12           0                16
 15          2           5          8.200          4          11                18
 16          1           3          9.714         14           6                19
 17          4          18         10.067         13           0                18
 18          2           4         25.212         15          17                19
 19          1           2         34.589         16          18                 0
For instance, in this example, cases 14 and 16 are joined at stage 1. This is shown in the Cluster Combined columns. When clusters or cases are joined, they are subsequently labelled with the smaller of the two cluster numbers.
The Coefficients column indicates the Euclidean distance between the two clusters (or cases) joined at each stage.
STEPS TO FIND THE NUMBER OF CLUSTERS USING THE AGGLOMERATION SCHEDULE

• The agglomeration schedule shows the order in which cases or clusters combine with each other. Reading upwards from the bottom two rows, find the maximum difference between the coefficients at consecutive stages. The last row corresponds to a one-cluster solution, the row before it to a two-cluster solution, and so on.
• Wherever the maximum difference between coefficients occurs, the lower of the two rows indicates the number of clusters (a small sketch of this rule follows below).
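
Continuing the hypothetical scipy sketch from earlier, the same "biggest jump" rule can be applied to the linkage output; the helper below is illustrative only.

```python
# Suggest a number of clusters from the biggest jump in the fusion coefficients.
import numpy as np

def suggest_n_clusters(merge_history):
    coefficients = merge_history[:, 2]        # fusion distance at each stage of the linkage matrix
    jumps = np.diff(coefficients)             # increase in the coefficient from stage to stage
    stop_stage = int(np.argmax(jumps)) + 1    # 1-based stage just before the largest jump
    n_cases = merge_history.shape[0] + 1
    return n_cases - stop_stage               # stopping after that stage leaves this many clusters

print(suggest_n_clusters(merge_history))      # merge_history from the hierarchical sketch
```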
For a good cluster solution, you will see a sudden jump in the distance coefficient (or a sudden drop in the similarity coefficient), reading from the bottom to the top. Here the stopping stage is 18, which means a two-cluster solution.
The next part of the table shows the stage at which each cluster first appears. Single cases existed before we started the analysis, so they are indicated by zeroes here.
For example, consider case 14 at stage 1 and stage 3: the 1 shown at stage 3 indicates that cluster 14 had already appeared (it was formed at stage 1).
The last column (Next Stage) shows the subsequent stage at which the newly merged cluster is combined with yet another cluster. For example, consider case 14 at stages 1 and 3.
Interpretation 2
 The dendrogram additionally provides a rescaled distance measure between the various clusters combined at the various stages.
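
A minimal sketch of producing the equivalent plot from the earlier (hypothetical) linkage output:

```python
# Draw the dendrogram for the hierarchical solution computed above.
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram

dendrogram(merge_history)          # merge_history from the hierarchical sketch
plt.xlabel("Case number")
plt.ylabel("Fusion distance")
plt.show()
```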
SPSS – K-means Cluster Analysis
 K-means performs the clustering, assigns each respondent to a cluster, and describes the clusters based on the dimensions (a rough scripted equivalent is sketched after this list).
 Open the Psychiatric Test – CA file.
 Analyze/Classify/K-Means Cluster Analysis
 Move the clustering variables into the Variables box; Number of Clusters = 3 (from the earlier hierarchical clustering)
 Iterate (number of iterations = 99)
 Save (Cluster Membership, Distance from Cluster Centers)
 Options (Initial Clusters, ANOVA Table, Exclude Cases Pairwise)
 OK
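
As a rough scripted counterpart of this step, scikit-learn's KMeans can be run on the same hypothetical z-scored variables used in the hierarchical sketch; the settings below only approximate the SPSS options.

```python
# K-means with k = 3 (taken from the hierarchical solution) and up to 99 iterations.
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3, max_iter=99, n_init=10, random_state=0)
labels = kmeans.fit_predict(scores)     # scores: the z-scored variables from the earlier sketch

print(kmeans.cluster_centers_)          # final cluster centers (mean of each variable per cluster)
print(kmeans.n_iter_)                   # how many iterations were needed to converge
print(labels)                           # saved cluster membership for each case
```

SPSS seeds the procedure with k well-spaced cases as initial centers; n_init and random_state here simply make the sketch reproducible rather than replicating that choice.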
Interpretation 1
 The initial cluster centers are the variable values of the k well-spaced observations. (A cluster center is the arithmetic mean of all the points belonging to that cluster.)

• The iteration history shows the progress of the clustering process at each step.
• In early iterations, the cluster centers shift.
• By the 3rd iteration, the cluster centers have converged; the process stops when there is no further change in the cluster centers.
Interpretation 2
 Each case is allotted to one of the clusters here. With the help of this table we can see how many cases fall in each cluster (the cluster size).
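
Continuing the sketch, the cluster sizes can be read off the saved membership:

```python
# Number of cases in each cluster (labels from the k-means sketch).
import numpy as np

cluster_ids, sizes = np.unique(labels, return_counts=True)
for cluster_id, size in zip(cluster_ids, sizes):
    print(f"Cluster {cluster_id + 1}: {size} cases")   # +1 to mimic SPSS's 1-based numbering
```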
Interpretation 3
 The ANOVA table indicates which variables contribute the most to your cluster solution. If the 'Sig.' value for a variable is less than 0.05, that variable's contribution to differentiating the clusters is significant.
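
The same per-variable check can be sketched with one-way ANOVAs across the clusters (again continuing the hypothetical example):

```python
# One-way ANOVA of each variable across the k-means clusters.
from scipy.stats import f_oneway

for variable in scores.columns:                       # scores, labels from the sketches above
    groups = [scores.loc[labels == k, variable] for k in sorted(set(labels))]
    f_value, p_value = f_oneway(*groups)
    print(f"{variable}: F = {f_value:.2f}, p = {p_value:.4f}")
```

Keep in mind that these F tests are descriptive only, because the clusters were chosen precisely to maximise the differences between cases in different clusters; SPSS notes this beneath its ANOVA table.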
Interpretation 4

The final cluster centers are computed as the mean for each variable within each final
cluster. The final cluster centers reflect the characteristics of the typical case for each
cluster.
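
A one-line sketch of the same computation on the hypothetical data:

```python
# Final cluster centers: per-cluster mean of each variable.
print(scores.groupby(labels).mean())    # scores, labels from the earlier sketches
```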
Interpretation 5
 Cluster size – the number of cases in each cluster.
Thank you.
