Cluster Analysis: Mala Srivastava

This document discusses different cluster analysis techniques that can be used to group similar data points together, including hierarchical cluster analysis and k-means cluster analysis. It provides information on how each technique works, the parameters that must be specified to perform the clustering, and how to interpret the results, including looking at agglomeration schedules to determine the optimal number of clusters and using ANOVA tables to identify which variables contribute most to cluster separation. The goal of cluster analysis is to efficiently group data points into a minimal number of meaningful clusters.


Cluster analysis

Mala Srivastava
Data
• Dell
Clustering variables
What is your opinion about Dell?
• And how much do you agree that Dell Computers makes ordering a computer system easy?
• And how much do you agree that Dell lets customers order computer systems customized to their specifications?
• And how much do you agree that Dell Computers delivers its products quickly?
• And how much do you agree that Dell Computers prices its products competitively?
• And how much do you agree that Dell Computers features attractively designed computer system components?
• And how much do you agree that Dell has computers that run programs quickly?
• And how much do you agree that Dell Computers has high-quality computers with no technical problems?
• And how much do you agree that Dell Computers has high-quality peripherals (e.g., monitor, keyboard, mouse, speakers, disk
drives)?
• And how much do you agree that Dell Computers bundles its computers with appropriate software?
• And how much do you agree that Dell Computers bundles its computers with Internet access?
• And how much do you agree that Dell Computers allows users to easily assemble components?
• And how much do you agree that Dell Computers has computer systems that users can readily upgrade?
• And how much do you agree that Dell Computers offers easily accessible technical support?
Descriptive
• Age
• Education
• Income
• Gender
• Recommend
• Satisfied
• Repurchase
Hierarchical Cluster Analysis

• This procedure attempts to identify relatively homogeneous groups of cases (or variables) based on selected characteristics, using an algorithm that starts with each case (or variable) in a separate cluster and combines clusters until only one is left. You can analyze raw variables, or you can choose from a variety of standardizing transformations. Distance or similarity measures are generated by the Proximities procedure. Statistics are displayed at each stage to help you select the best solution.
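The procedure described above can be sketched in a few lines of plain Python: every case starts in its own cluster, and the two closest clusters are merged at each stage until one remains, recording the schedule of merges. This is an illustrative toy (single-linkage on 1-D data), not the SPSS implementation:

```python
# Minimal sketch of agglomerative (hierarchical) clustering: every case
# starts in its own cluster; at each step the two closest clusters are
# merged until a single cluster remains. Single linkage on toy 1-D data
# for illustration only (not the Dell survey data).
points = [1.0, 1.2, 5.0, 5.3, 9.0]
clusters = [[p] for p in points]   # each case begins as its own cluster
schedule = []                      # records each merge and its distance

def linkage(a, b):
    """Single-linkage distance: nearest pair of points across clusters."""
    return min(abs(x - y) for x in a for y in b)

while len(clusters) > 1:
    # find the two closest clusters
    i, j = min(((i, j) for i in range(len(clusters))
                       for j in range(i + 1, len(clusters))),
               key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
    d = linkage(clusters[i], clusters[j])
    schedule.append((clusters[i][:], clusters[j][:], d))
    clusters[i] = clusters[i] + clusters.pop(j)

print(schedule[0])   # first merge: the two nearest cases
```

The recorded `schedule` is the analogue of the agglomeration schedule discussed later.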
Hierarchical cluster
• Hierarchical cluster analysis begins by separating each object into a cluster by itself. At each
stage of the analysis, the criterion by which objects are separated is relaxed in order to link the
two most similar clusters until all of the objects are joined in a complete classification tree.
• The basic criterion for any clustering is distance. Objects that are near each other should belong
to the same cluster, and objects that are far from each other should belong to different clusters.
For a given set of data, the clusters that are constructed depend on your specification of the
following parameters:
• Cluster method defines the rules for cluster formation. For example, when calculating the distance between two clusters, you can use the pair of nearest objects between clusters, the pair of furthest objects, or a compromise between these methods.
• Measure defines the formula for calculating distance. For example, the Euclidean distance measure calculates the distance as a "straight line" between two clusters. Interval measures assume that the variables are scale; count measures assume that they are discrete numeric; and binary measures assume that they take only two values.
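The two measures named in the text are straightforward to write out; squared Euclidean distance (used with Ward's method below) is simply Euclidean distance without the square root. The ratings here are hypothetical:

```python
import math

# Illustrative sketch of two common distance measures between cases.
def euclidean(a, b):
    """"Straight-line" distance between two cases."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_euclidean(a, b):
    """Euclidean distance without the square root; emphasizes large gaps."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

case1 = [4, 5, 3]   # hypothetical agreement ratings for one respondent
case2 = [2, 5, 1]
print(euclidean(case1, case2))
print(squared_euclidean(case1, case2))
```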
Step
• To run the cluster analysis, from the menus choose:
• Analyze > Classify > Hierarchical Cluster...
• Method: select Ward's method; Measure: squared Euclidean distance.
• Statistics: select Agglomeration schedule.
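Outside SPSS, the same steps can be sketched with SciPy (an assumption; the original uses the SPSS menus). `linkage()` with `method="ward"` applies Ward's method and returns the agglomeration schedule as a matrix. The data here are synthetic stand-ins for the Dell ratings:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Toy data: three loose groups of 10 cases each in 3 variables.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (10, 3)),
               rng.normal(3, 0.5, (10, 3)),
               rng.normal(6, 0.5, (10, 3))])

# Each row of Z: clusters merged, merge distance, resulting cluster size.
Z = linkage(X, method="ward")
print(Z[:3])       # the first merges (smallest distances)
print(Z[-3:, 2])   # the last distances; the jumps suggest 3 clusters
```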
Agglomeration schedule
• The agglomeration schedule is a numerical summary of the
cluster solution.
• At the first stage, cases 299 and 368 are combined because they have the smallest distance.
• At stage 2, cases 194 and 338 are combined because they have the next smallest distance. The cluster created by their joining next appears at stage 4.
• At stage 4, case 1 joins this cluster, forming a cluster with three members: 194, 338, and 1.
Agglomeration schedule
• A good cluster solution is signaled by a sudden jump (gap) in the distance coefficient. The solution just before the gap is the good solution.
• The largest gaps in the coefficients column occur between stages 326 and 327, indicating a 3-cluster solution, and between stages 327 and 328, indicating a 2-cluster solution.
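The "largest gap" rule above can be automated: scan the coefficients column, find the biggest jump, and count how many clusters remain at the stage just before it. The coefficients below are illustrative, not the actual SPSS output:

```python
# Sketch of the largest-gap rule on made-up agglomeration coefficients.
coefficients = [0.5, 0.9, 1.4, 2.1, 8.5, 20.0]   # one value per stage
n_cases = len(coefficients) + 1                  # stages = cases - 1

# gap between consecutive stages
gaps = [b - a for a, b in zip(coefficients, coefficients[1:])]
stage_before_gap = gaps.index(max(gaps)) + 1     # 1-indexed stage
n_clusters = n_cases - stage_before_gap          # clusters left at that stage
print(n_clusters)
```

Here the biggest jump (8.5 to 20.0) comes after stage 5 of 6, leaving a 2-cluster solution.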
K-means cluster analysis
• K-means cluster analysis is a tool designed to assign cases to a
fixed number of groups (clusters) whose characteristics are not
yet known but are based on a set of specified variables. It is
most useful when you want to classify a large number
(thousands) of cases.
• A good cluster analysis is:
• Efficient. Uses as few clusters as possible.
• Effective. Captures all statistically and commercially important
clusters. For example, a cluster with five customers may be
statistically different but not very profitable.
K-Means Cluster
• The K-Means Cluster Analysis procedure begins with the construction
of initial cluster centers. You can assign these yourself or have the
procedure select k well-spaced observations for the cluster centers.
• After obtaining initial cluster centers, the procedure:
• Assigns cases to clusters based on distance from the cluster centers.
• Updates the locations of cluster centers based on the mean values of
cases in each cluster.
• These steps are repeated until any reassignment of cases would
make the clusters more internally variable or externally similar.
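The assign/update loop above can be sketched directly. This is a minimal pure-Python illustration on 1-D toy data with k = 2 and well-spaced initial centers; it stops when the centers no longer move:

```python
# Minimal sketch of the two K-Means steps: assign each case to the
# nearest center, then move each center to the mean of its cases;
# repeat until nothing changes. Toy 1-D data for illustration only.
data = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centers = [1.0, 9.0]                  # initial, well-spaced centers

while True:
    # assignment step: index of the nearest center for each case
    labels = [min(range(len(centers)), key=lambda k: abs(x - centers[k]))
              for x in data]
    # update step: each center becomes the mean of its assigned cases
    new_centers = [sum(x for x, l in zip(data, labels) if l == k) /
                   max(1, labels.count(k))
                   for k in range(len(centers))]
    if new_centers == centers:        # converged: no center moved
        break
    centers = new_centers

print(centers, labels)
```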
Steps
• To run the cluster analysis, from the menus choose:
• Analyze > Classify > K-Means Cluster...
• Type 100 as the maximum iterations.
• Select ANOVA table and Cluster information for each
group in the Statistics group.
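The same steps can be sketched with scikit-learn (an assumption; the original uses the SPSS menus). `max_iter=100` mirrors "Type 100 as the maximum iterations", and the synthetic data stands in for the Dell ratings:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: three well-separated groups of 20 cases in 2 variables.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.4, (20, 2)) for c in (0.0, 3.0, 6.0)])

# n_clusters and max_iter correspond to the SPSS dialog settings.
km = KMeans(n_clusters=3, max_iter=100, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)        # final cluster centers
print(np.bincount(km.labels_))    # cases per cluster
```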
ANOVA
• The ANOVA table indicates which variables contribute the most
to your cluster solution.
• Variables with large F values provide the greatest separation
between clusters.
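What the ANOVA table reports can be sketched per variable with SciPy's one-way ANOVA: larger F means the clusters differ more on that variable. The ratings below are hypothetical:

```python
from scipy.stats import f_oneway

# Hypothetical ratings on one variable, split by cluster membership.
cluster1 = [4, 5, 4, 5]
cluster2 = [2, 1, 2, 2]
cluster3 = [3, 3, 4, 3]

# One-way ANOVA across the clusters for this variable.
f_stat, p_value = f_oneway(cluster1, cluster2, cluster3)
print(f_stat, p_value)   # high F, low p: the variable separates the clusters
```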
ANOVA
Final cluster
• The final cluster centers are computed as the mean for each
variable within each final cluster. The final cluster centers reflect
the characteristics of the typical case for each cluster.
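Computing final cluster centers as per-variable means is a one-liner per cluster. A minimal sketch on toy 2-variable data with known memberships:

```python
# Sketch: a final cluster center is the mean of each variable over the
# cases assigned to that cluster. Toy data, two clusters.
cases = [(1.0, 4.0), (2.0, 6.0), (8.0, 1.0), (9.0, 3.0)]
labels = [0, 0, 1, 1]            # cluster membership of each case

centers = {}
for k in set(labels):
    members = [c for c, l in zip(cases, labels) if l == k]
    # mean of each variable across the cluster's members
    centers[k] = tuple(sum(v) / len(members) for v in zip(*members))

print(centers)
```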
Final cluster centre
Difference in group
                                                      Cluster 1   Cluster 2   Cluster 3
Overall, how satisfied are you with your
Dell computer system?                                   1.4205      1.3611      2.0000
How likely would you be to recommend
Dell to a friend or relative?                           1.7045      1.5000      2.4754
If you could make your computer purchase
decision again, how likely would you be
to choose Dell?                                         1.3864      1.3111      1.6557
Age                                                     6.6136      5.2944      4.9508
Distance between cluster
• This table shows the Euclidean distances between the final
cluster centers. Greater distances between clusters correspond
to greater dissimilarities.
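These distances can be recomputed from the final cluster centres tabulated above (satisfaction, recommend, repurchase, age):

```python
import math
from itertools import combinations

# Final cluster centres from the table above:
# [satisfaction, recommend, repurchase, age] per cluster.
centers = {
    1: [1.4205, 1.7045, 1.3864, 6.6136],
    2: [1.3611, 1.5000, 1.3111, 5.2944],
    3: [2.0000, 2.4754, 1.6557, 4.9508],
}

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Euclidean distance between every pair of cluster centres.
dist = {(i, j): euclid(centers[i], centers[j])
        for i, j in combinations(centers, 2)}
print(dist)   # clusters 1 and 3 lie furthest apart
```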
