Cluster Analysis Finalllll
Cluster Analysis Finalllll
Similar (cohesiv e) to one another within the same cluster (high intra-class similarity)
Inter-cluster
Intra-cluster
distances are
distances are
maximized
minimized
Examples of Clustering Applications
Medicine – What are the diagnostic clusters? To answer this question the researcher
would devise a diagnostic questionnaire that includes possible symptoms (for
example, in psychology, anxiety, depression etc.). The cluster analysis can then
identify groups of patients that have similar symptoms.
Marketing – What are the customer segments? To answer this question a market
researcher may conduct a survey covering needs, attitudes, demographics, and
behavior of customers. The researcher then may use cluster analysis to identify
homogenous groups of customers that have similar needs and attitudes.
Education – What are student groups that need special attention? Researchers may
measure psychological, aptitude, and achievement characteristics. A cluster analysis
then may identify what homogeneous groups exist among students (for example, high
achievers in all subjects, or students that excel in certain subjects but fail in others).
Non hierarchical procedures- A centroid is chosen and distance from the centroid is
used to form clusters
– K-means clustering
Concept of Euclidean Distance
Objective of clustering is to group the similar objects together.
E.D. (based on the Pythagoras Theorem) is used to assess how
similar or different the objects are.
It measures similarity in terms of distance between pairs of
object.
Object with smaller distance are similar to each other than those
with larger distance
SPSS Example 1- Psychiatric
Treatment
We wanted to look at clusters of cases referred for psychiatric
treatment.
We measured each subject on four questionnaires: Spielberger Trait
Anxiety Inventory (STAI), the Beck Depression Inventory (BDI), a
measure of Intrusive Thoughts and Rumination (IT) and a measure of
Impulsive Thoughts and Actions (Impulse).
The rationale behind this analysis is that people with the same
disorder should report a similar pattern of scores across the measures
(so the profiles of their responses should be clustered).
To check the analysis, trained psychologists were asked to agree a
diagnosis based on the DSM-IV (GAD - Generalized Anxiety Disorder,
DEP – Depression & OCD - Obsessive Compulsive Disorder)
SPSS Commands
The final cluster centers are computed as the mean for each variable within each final
cluster. The final cluster centers reflect the characteristics of the typical case for each
cluster.
Interpretation -5
Cluster size
Thank you.