Cluster Analysis: Mala Srivastava

This document discusses different cluster analysis techniques that can be used to group similar data points together, including hierarchical cluster analysis and k-means cluster analysis. It provides information on how each technique works, the parameters that must be specified to perform the clustering, and how to interpret the results, including looking at agglomeration schedules to determine the optimal number of clusters and using ANOVA tables to identify which variables contribute most to cluster separation. The goal of cluster analysis is to efficiently group data points into a minimal number of meaningful clusters.


Cluster analysis

Mala Srivastava
Data
• Dell
Clustering variables
What is your opinion about Dell?
• And how much do you agree that Dell Computers makes ordering a computer system easy?
• And how much do you agree that Dell lets customers order computer systems customized to their specifications?
• And how much do you agree that Dell Computers delivers its products quickly?
• And how much do you agree that Dell Computers prices its products competitively?
• And how much do you agree that Dell Computers features attractively designed computer system components?
• And how much do you agree that Dell has computers that run programs quickly?
• And how much do you agree that Dell Computers has high-quality computers with no technical problems?
• And how much do you agree that Dell Computers has high-quality peripherals (e.g., monitor, keyboard, mouse, speakers, disk
drives)?
• And how much do you agree that Dell Computers bundles its computers with appropriate software?
• And how much do you agree that Dell Computers bundles its computers with Internet access?
• And how much do you agree that Dell Computers allows users to easily assemble components?
• And how much do you agree that Dell Computers has computer systems that users can readily upgrade?
• And how much do you agree that Dell Computers offers easily accessible technical support?
Descriptive
• Age
• Education
• Income
• Gender
• Recommend
• Satisfied
• Repurchase
Hierarchical Cluster Analysis

• This procedure attempts to identify relatively homogeneous groups of cases (or variables) based on selected characteristics, using an algorithm that starts with each case (or variable) in a separate cluster and combines clusters until only one is left. You can analyze raw variables, or you can choose from a variety of standardizing transformations. Distance or similarity measures are generated by the Proximities procedure. Statistics are displayed at each stage to help you select the best solution.
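The procedure described above can be sketched in a few lines of plain Python: every case starts in its own cluster, and the two closest clusters are merged at each stage until one remains, recording the schedule of merges. This is an illustrative toy (single-linkage on 1-D data), not the SPSS implementation:

```python
# Minimal sketch of agglomerative (hierarchical) clustering: every case
# starts in its own cluster; at each step the two closest clusters are
# merged until a single cluster remains. Single linkage on toy 1-D data
# for illustration only (not the Dell survey data).
points = [1.0, 1.2, 5.0, 5.3, 9.0]
clusters = [[p] for p in points]   # each case begins as its own cluster
schedule = []                      # records each merge and its distance

def linkage(a, b):
    """Single-linkage distance: nearest pair of points across clusters."""
    return min(abs(x - y) for x in a for y in b)

while len(clusters) > 1:
    # find the two closest clusters
    i, j = min(((i, j) for i in range(len(clusters))
                       for j in range(i + 1, len(clusters))),
               key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
    d = linkage(clusters[i], clusters[j])
    schedule.append((clusters[i][:], clusters[j][:], d))
    clusters[i] = clusters[i] + clusters.pop(j)

print(schedule[0])   # first merge: the two nearest cases
```

The recorded `schedule` is the analogue of the agglomeration schedule discussed later.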
Hierarchical cluster
• Hierarchical cluster analysis begins by separating each object into a cluster by itself. At each
stage of the analysis, the criterion by which objects are separated is relaxed in order to link the
two most similar clusters until all of the objects are joined in a complete classification tree.
• The basic criterion for any clustering is distance. Objects that are near each other should belong
to the same cluster, and objects that are far from each other should belong to different clusters.
For a given set of data, the clusters that are constructed depend on your specification of the
following parameters:
• Cluster method defines the rules for cluster formation. For example, when calculating the distance between two clusters, you can use the pair of nearest objects between clusters, the pair of furthest objects, or a compromise between these methods.
• Measure defines the formula for calculating distance. For example, the Euclidean distance measure calculates the distance as a "straight line" between two clusters. Interval measures assume that the variables are scale; count measures assume that they are discrete numeric; and binary measures assume that they take only two values.
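The two measures named in the text are straightforward to write out; squared Euclidean distance (used with Ward's method below) is simply Euclidean distance without the square root. The ratings here are hypothetical:

```python
import math

# Illustrative sketch of two common distance measures between cases.
def euclidean(a, b):
    """"Straight-line" distance between two cases."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_euclidean(a, b):
    """Euclidean distance without the square root; emphasizes large gaps."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

case1 = [4, 5, 3]   # hypothetical agreement ratings for one respondent
case2 = [2, 5, 1]
print(euclidean(case1, case2))
print(squared_euclidean(case1, case2))
```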
Step
• To run the cluster analysis, from the menus choose:
• Analyze > Classify > Hierarchical Cluster...
• Method: select Ward's method; Measure: squared Euclidean distance.
• Statistics: select Agglomeration schedule.
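Outside SPSS, the same steps can be sketched with SciPy (an assumption; the original uses the SPSS menus). `linkage()` with `method="ward"` applies Ward's method and returns the agglomeration schedule as a matrix. The data here are synthetic stand-ins for the Dell ratings:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Toy data: three loose groups of 10 cases each in 3 variables.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (10, 3)),
               rng.normal(3, 0.5, (10, 3)),
               rng.normal(6, 0.5, (10, 3))])

# Each row of Z: clusters merged, merge distance, resulting cluster size.
Z = linkage(X, method="ward")
print(Z[:3])       # the first merges (smallest distances)
print(Z[-3:, 2])   # the last distances; the jumps suggest 3 clusters
```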
Agglomeration schedule
• The agglomeration schedule is a numerical summary of the
cluster solution.
• At the first stage, cases 299 and 368 are combined because they have the smallest distance.
• At stage 2, cases 194 and 338 are combined because they have the next smallest distance. The cluster created by their joining next appears at stage 4.
• At stage 4, case 1 joins this cluster, forming a cluster with three members: 194, 338, and 1.
Agglomeration schedule
• A good cluster solution is signaled by a sudden jump (gap) in the distance coefficient. The solution just before the gap is the good solution.
• The largest gaps in the coefficients column occur between stages 326 and 327, indicating a 3-cluster solution, and between stages 327 and 328, indicating a 2-cluster solution.
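The "largest gap" rule above can be automated: scan the coefficients column, find the biggest jump, and count how many clusters remain at the stage just before it. The coefficients below are illustrative, not the actual SPSS output:

```python
# Sketch of the largest-gap rule on made-up agglomeration coefficients.
coefficients = [0.5, 0.9, 1.4, 2.1, 8.5, 20.0]   # one value per stage
n_cases = len(coefficients) + 1                  # stages = cases - 1

# gap between consecutive stages
gaps = [b - a for a, b in zip(coefficients, coefficients[1:])]
stage_before_gap = gaps.index(max(gaps)) + 1     # 1-indexed stage
n_clusters = n_cases - stage_before_gap          # clusters left at that stage
print(n_clusters)
```

Here the biggest jump (8.5 to 20.0) comes after stage 5 of 6, leaving a 2-cluster solution.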
K-means cluster analysis
• K-means cluster analysis is a tool designed to assign cases to a
fixed number of groups (clusters) whose characteristics are not
yet known but are based on a set of specified variables. It is
most useful when you want to classify a large number
(thousands) of cases.
• A good cluster analysis is:
• Efficient. Uses as few clusters as possible.
• Effective. Captures all statistically and commercially important
clusters. For example, a cluster with five customers may be
statistically different but not very profitable.
K-Means Cluster
• The K-Means Cluster Analysis procedure begins with the construction
of initial cluster centers. You can assign these yourself or have the
procedure select k well-spaced observations for the cluster centers.
• After obtaining initial cluster centers, the procedure:
• Assigns cases to clusters based on distance from the cluster centers.
• Updates the locations of cluster centers based on the mean values of
cases in each cluster.
• These steps are repeated until any reassignment of cases would
make the clusters more internally variable or externally similar.
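The assign/update loop above can be sketched directly. This is a minimal pure-Python illustration on 1-D toy data with k = 2 and well-spaced initial centers; it stops when the centers no longer move:

```python
# Minimal sketch of the two K-Means steps: assign each case to the
# nearest center, then move each center to the mean of its cases;
# repeat until nothing changes. Toy 1-D data for illustration only.
data = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centers = [1.0, 9.0]                  # initial, well-spaced centers

while True:
    # assignment step: index of the nearest center for each case
    labels = [min(range(len(centers)), key=lambda k: abs(x - centers[k]))
              for x in data]
    # update step: each center becomes the mean of its assigned cases
    new_centers = [sum(x for x, l in zip(data, labels) if l == k) /
                   max(1, labels.count(k))
                   for k in range(len(centers))]
    if new_centers == centers:        # converged: no center moved
        break
    centers = new_centers

print(centers, labels)
```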
Steps
• To run the cluster analysis, from the menus choose:
• Analyze > Classify > K-Means Cluster...
• Type 100 as the maximum iterations.
• Select ANOVA table and Cluster information for each
group in the Statistics group.
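The same steps can be sketched with scikit-learn (an assumption; the original uses the SPSS menus). `max_iter=100` mirrors "Type 100 as the maximum iterations", and the synthetic data stands in for the Dell ratings:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: three well-separated groups of 20 cases in 2 variables.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.4, (20, 2)) for c in (0.0, 3.0, 6.0)])

# n_clusters and max_iter correspond to the SPSS dialog settings.
km = KMeans(n_clusters=3, max_iter=100, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)        # final cluster centers
print(np.bincount(km.labels_))    # cases per cluster
```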
ANOVA
• The ANOVA table indicates which variables contribute the most
to your cluster solution.
• Variables with large F values provide the greatest separation
between clusters.
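What the ANOVA table reports can be sketched per variable with SciPy's one-way ANOVA: larger F means the clusters differ more on that variable. The ratings below are hypothetical:

```python
from scipy.stats import f_oneway

# Hypothetical ratings on one variable, split by cluster membership.
cluster1 = [4, 5, 4, 5]
cluster2 = [2, 1, 2, 2]
cluster3 = [3, 3, 4, 3]

# One-way ANOVA across the clusters for this variable.
f_stat, p_value = f_oneway(cluster1, cluster2, cluster3)
print(f_stat, p_value)   # high F, low p: the variable separates the clusters
```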
ANOVA
Final cluster
• The final cluster centers are computed as the mean for each
variable within each final cluster. The final cluster centers reflect
the characteristics of the typical case for each cluster.
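Computing final cluster centers as per-variable means is a one-liner per cluster. A minimal sketch on toy 2-variable data with known memberships:

```python
# Sketch: a final cluster center is the mean of each variable over the
# cases assigned to that cluster. Toy data, two clusters.
cases = [(1.0, 4.0), (2.0, 6.0), (8.0, 1.0), (9.0, 3.0)]
labels = [0, 0, 1, 1]            # cluster membership of each case

centers = {}
for k in set(labels):
    members = [c for c, l in zip(cases, labels) if l == k]
    # mean of each variable across the cluster's members
    centers[k] = tuple(sum(v) / len(members) for v in zip(*members))

print(centers)
```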
Final cluster centre
Difference in group
                                                      Cluster 1   Cluster 2   Cluster 3
Overall, how satisfied are you with your
Dell computer system?                                   1.4205      1.3611      2.0000
How likely would you be to recommend
Dell to a friend or relative?                           1.7045      1.5000      2.4754
If you could make your computer purchase
decision again, how likely would you be
to choose Dell?                                         1.3864      1.3111      1.6557
Age                                                     6.6136      5.2944      4.9508
Distance between cluster
• This table shows the Euclidean distances between the final
cluster centers. Greater distances between clusters correspond
to greater dissimilarities.
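These distances can be recomputed from the final cluster centres tabulated above (satisfaction, recommend, repurchase, age):

```python
import math
from itertools import combinations

# Final cluster centres from the table above:
# [satisfaction, recommend, repurchase, age] per cluster.
centers = {
    1: [1.4205, 1.7045, 1.3864, 6.6136],
    2: [1.3611, 1.5000, 1.3111, 5.2944],
    3: [2.0000, 2.4754, 1.6557, 4.9508],
}

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Euclidean distance between every pair of cluster centres.
dist = {(i, j): euclid(centers[i], centers[j])
        for i, j in combinations(centers, 2)}
print(dist)   # clusters 1 and 3 lie furthest apart
```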
