
Cluster Analysis Detail Steps

This document provides steps for performing cluster analysis, including: (1) formulating the problem by identifying dependent and independent variables, (2) deciding which variables to use as the basis for clustering, (3) selecting a distance measure such as Euclidean distance, (4) choosing a clustering procedure such as hierarchical or k-means, and (5) deciding on the optimal number of clusters. Key factors discussed include avoiding multicollinearity among variables, standardizing data measured on different scales, interpreting results, and evaluating solutions using metrics such as iteration counts, agglomeration coefficients, and ANOVA. The document uses examples from burger and NFL datasets to illustrate the steps and the interpretation of cluster analysis outputs.



MKTG3850 Cluster Analysis Page 1

Steps to Cluster Analysis

Formulate the Problem

Approach the problem with a dependent-variable (DV) and independent-variable (IV) mindset: the DV is the group assignment, and the IVs are the reasons observations (rows) belong to one group and not the other group(s).

In the Burger dataset, the groups would separate on calories for two reasons. One, there is a range of values from less than 200 calories to more than 1,000 calories. Two, calories reflect the content of the item: as sodium, protein, or carbohydrates increase, the calorie count increases.

In addition to calories, I would consider what items would explain membership in a cluster. A wheat flour wrap compared to an almond flour wrap would not explain much, if anything. Bun compared to bunless could explain something. On the other hand, from a nutrition view, carbohydrates would serve as a better variable than bun compared to bunless. From an analytics view, carbohydrates measured in grams serves as the better measure because carbohydrates is a ratio variable while bun versus bunless is a nominal variable.

In the NFL dataset, I would look for groups based on wins because we have been using
wins throughout the semester. Separately, we could perform cluster analysis around (a)
offensive variables, (b) defensive variables, or (c) special teams variables. For our
purposes, though, we have been considering wins.

Decide Which Variables to Use as Bases for Clustering

Select variables for inclusion. This is the most critical of all the steps: the inclusion and exclusion of variables drives your entire cluster effort. Too often we rely on intuition and data availability to drive this decision.

There are several approaches to this step. One, review the managerial question to
understand the objective that you (as the analyst) are trying to accomplish through
cluster analysis. This objective should serve as your guide throughout the analysis.

Two, compute a correlation matrix. As you add more variables or columns to the analysis, multicollinearity becomes a problem. In the Burger dataset, in reviewing the correlation coefficients for the three measures of fat, I observe coefficients high enough (greater than .8) to consider removing at least two from the analysis (see Correlation tab).

In reviewing the correlation coefficients involving sugar, I see that the relationship between sugar and protein appears weak rather than strong, so collinearity is not a concern for that pair.
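As a sketch of this screening step, the pair-wise correlations can be computed and any pair above the .8 threshold flagged. The column names and values here are hypothetical stand-ins for the Burger dataset's fat and sugar measures.

```python
import numpy as np

# Hypothetical stand-ins for Burger dataset columns: saturated fat is
# constructed to track total fat closely, while sugar is unrelated.
rng = np.random.default_rng(0)
total_fat = rng.normal(20, 5, 100)
saturated_fat = 0.4 * total_fat + rng.normal(0, 0.5, 100)
sugar = rng.normal(8, 3, 100)

corr = np.corrcoef([total_fat, saturated_fat, sugar])

# Flag any pair of variables with |r| > .8 as a removal candidate.
high_pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)
              if abs(corr[i, j]) > 0.8]
```

With these inputs, only the (total_fat, saturated_fat) pair exceeds the threshold, so one of the two could be dropped before clustering.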

Three, perform a factor analysis. Items that load on a factor above .8 or below .6 could be considered for removal: very high loadings signal redundancy, while low loadings signal a lack of explanatory power.

Select Distance Measure

Euclidean distance is the most common. Squared Euclidean distance amplifies the dissimilarity between groups. City-block (Manhattan) distance appears in textbooks because it is easier to calculate with less robust packages.

We need to standardize our data because distance serves as our measure. Standardizing the data allows us to include variables measured on different scales or magnitudes. A variable measured in dollars, such as household income for a market, and a variable measured as a percentage, such as the percent of households with at least one person holding a college degree, reflect different magnitudes. To include them in a distance-based analysis such as cluster analysis, I should standardize both variables.

Remember, when standardizing variables, you (as the analyst) lose or forfeit the ability to make interpretations or inferences about those variables. To make interpretations or inferences about them, you (as the analyst) should look at the unstandardized values AFTER a cluster solution has been reached.
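A minimal sketch of the standardization step, using hypothetical income and degree-attainment values; z-scoring puts both on a common scale before distances are computed.

```python
import numpy as np

# Hypothetical market-level variables on very different scales.
income = np.array([52_000.0, 87_000.0, 61_000.0, 45_000.0, 73_000.0])
pct_degree = np.array([0.31, 0.52, 0.38, 0.24, 0.44])

def standardize(x):
    """Z-score: subtract the mean, divide by the sample standard deviation."""
    return (x - x.mean()) / x.std(ddof=1)

z_income = standardize(income)
z_degree = standardize(pct_degree)
# Both now have mean 0 and unit variance, so neither variable dominates a
# distance-based procedure simply because of its magnitude.
```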

Select Clustering Procedure

Two types of clustering procedures exist: (1) hierarchical and (2) nonhierarchical. Hierarchical is good for smaller datasets or when the number of groups is unknown. Hierarchical clustering also generates the dendrogram that looks good in presentations.

Several techniques exist to generate hierarchical clusters. Ward's method focuses on variance; you (as the analyst) should know the relationship between variance and distance. Ward's typically generates clusters with roughly equal membership. Centroid linkage is good for smaller samples. You (as the analyst) should develop a solution with one method, such as Ward's, and verify it with another, such as centroid. Nearest neighbor and farthest neighbor possess a similar complementary relationship to Ward's and centroid.
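The develop-with-one-method, verify-with-another workflow can be sketched with SciPy; the two-blob data here is a hypothetical stand-in for a small standardized dataset.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: two well-separated blobs standing in for, say,
# low-calorie and high-calorie items.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (10, 2)),   # "low" items
               rng.normal(5, 0.3, (10, 2))])  # "high" items

# Develop a two-group solution with Ward's (variance-based) linkage ...
ward_labels = fcluster(linkage(X, method="ward"), t=2, criterion="maxclust")
# ... then verify it with a second method, here centroid linkage.
cent_labels = fcluster(linkage(X, method="centroid"), t=2, criterion="maxclust")

# The two methods agree if the partitions match (labels may be swapped).
agree = ((ward_labels == cent_labels).all()
         or (ward_labels == 3 - cent_labels).all())
```

When the two methods disagree on membership, that disagreement itself is a signal to revisit the variables or the number of groups.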

Nonhierarchical clustering has become known as k-means; the k refers to the number of groups. Unlike hierarchical clustering, you (as the analyst) must specify the number of groups at the outset. Nonhierarchical clustering will not generate a dendrogram, but it performs better than hierarchical clustering when dealing with large datasets.
MKTG3850 Cluster Analysis Page 3

In this approach, the package picks a random observation to serve as the initial or seed value and then groups observations so as to minimize distance to that seed value. The package repeats the process until the solution converges.
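The seed-assign-update loop just described can be sketched in plain NumPy. This is a simplified Lloyd's algorithm; real packages add safeguards such as multiple random starts and empty-cluster handling.

```python
import numpy as np

def kmeans(X, k, max_iter=50, seed=0):
    """Minimal k-means sketch: pick k random rows as seed centers, assign
    each observation to its nearest center, recompute the centers, and
    repeat until the change in the centers reaches zero."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random seeds
    for _ in range(max_iter):
        # Squared Euclidean distance from every row to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # change reached zero: converged
            break
        centers = new_centers
    return labels, centers

# Hypothetical data: two well-separated groups of observations.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(8, 0.5, (20, 2))])
labels, centers = kmeans(X, k=2)
```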

In terms of groups, three to eight typically serves as a good rule of thumb. In the NFL dataset, we are probably looking at three to five groups if we think about wins as the variable separating groups. In the Dunnhumby dataset, we are probably considering no more than nine, though that many would be unusual. In the Burger dataset, we are most likely looking at three to five groups.

Two groups usually generate a high group and a low group and, as a consequence, are probably not interesting. A two-group cluster solution could work for subsetting a dataset so that we have one cluster of observations that are A and another cluster of observations that are B. A two-group solution should lead to more analysis on each group.

More than eight groups would be difficult to justify and require additional analysis for
support.

To verify a nonhierarchical cluster solution, running an ANOVA on the dependent variable should provide sufficient support. Alternatively, a hierarchical approach should verify the number of clusters established in the nonhierarchical approach.
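As a sketch of that verification, a one-way ANOVA on the dependent variable across cluster assignments should yield a significant F. The standardized-calorie values here are hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical standardized calories for items assigned to three clusters.
rng = np.random.default_rng(3)
cal_c1 = rng.normal(-1.0, 0.3, 15)
cal_c2 = rng.normal(0.0, 0.3, 15)
cal_c3 = rng.normal(1.2, 0.3, 15)

f_stat, p_value = f_oneway(cal_c1, cal_c2, cal_c3)
# A significant result (p < .05) supports the claim that the clusters
# differ on the dependent variable.
```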

Decide on Number of Clusters

In hierarchical clustering, as the distance at which groups merge increases, you lose information. Distance is your guide in these procedures. The more distance needed, the more heterogeneous the membership within the groups becomes, because you are losing information related to the uniqueness of each group.

The dendrogram will display distance in relation to the number of groups formed. Typically, the greatest distance shown occurs when combining the final two groups into one large group.

Some packages, including Enginius (née Marketing Engineering), will display a stress test based on the amount of distance (y-axis) in relation to the number of groups (x-axis). In these graphs, you (as the analyst) are looking for an inflection point where the amount of distance shown on the y-axis flattens or plateaus along the x-axis.

In other packages, including SPSS, such a line graph can be created by charting the coefficients from the Agglomeration Schedule. Alternatively, you (as the analyst) can subtract the distances between successive stages where two observations or groups are merged.

In the Burger dataset, I included the standardized scores of calories, total fat, sodium, cholesterol, carbohydrates, fiber, and sugars. I then ran a hierarchical cluster analysis using centroid linkage and squared Euclidean distance.

In looking at the agglomeration schedule (see Agg Schedule tab), the coefficient to form the first group (observation 129 and observation 142) is .004, as shown in the coefficient column. The next group (observation 149 and observation 150) is then formed, and the coefficient is .013. The amount of distance from the first group to the second group is .009 (.013 - .004). That is not a lot of distance.

Scroll to the end of the schedule. The distance to form one group by combining the remaining two groups is 4.634 (52.730 - 48.096). That is a lot of distance compared to the amount needed to form the first two groups. To form two groups by combining the remaining three groups takes 20.675 (48.096 - 27.421). Looking at the differences, I am considering either a six-group, a four-group, or a three-group solution. Remember, I want to work with as few groups as possible while retaining as much information as possible.
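The subtraction this walkthrough performs by hand can be scripted: difference successive agglomeration coefficients and look for the large jumps. The list below abridges the schedule to the stages quoted in the text, so the jump between the second and third entries spans many omitted stages.

```python
# Coefficients quoted from the Burger agglomeration schedule (abridged:
# the first two stages, then the last three).
coeffs = [0.004, 0.013, 27.421, 48.096, 52.730]

# Distance added at each successive stage.
jumps = [round(b - a, 3) for a, b in zip(coeffs, coeffs[1:])]
# jumps[0] is the .009 between the first two merges; jumps[-1] is the
# 4.634 required to combine the final two groups into one. Large jumps
# late in the schedule mark where further merging loses too much
# information, suggesting where to stop.
```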

In looking at the cluster membership (see Cluster Membership tab), cluster 2 from a seven-cluster solution merges with cluster 1. In looking at the table, it would seem reasonable for those two types of food items to group.

Alternatively, you (as the analyst) could work with a nonhierarchical approach. I prefer that you (as the student) take this approach for this course so that you gain experience with it. Nonhierarchical clustering has gained popularity among digital analysts because it performs better with large datasets.

Using the Burger dataset, there are several pieces of the output that I want to look at. First, I want to consider the Iteration History. Some researchers will set the iteration limit at 10, others at 20, and some at 50. In the Iteration History, you (as the analyst) want to know how many iterations it took for the change in the cluster centers to reach zero (0). The fewer the iterations, the more support for that cluster solution.

At two clusters, it takes 10 iterations for the values to reach zero in both groups (see 2
Cluster tab). At three clusters, the solution does not converge at the tenth iteration (see 3
Cluster tab). At six clusters, the solution converges at the fourth iteration (see 6 Cluster
tab).

Second, the Final Cluster Centers will provide the centroid value of each variable for each cluster. Looking across variables, you (as the analyst) can get a sense of whether each cluster is high, medium, or low on each variable. Comparing clusters variable by variable remains possible because you (as the analyst) relied on standardized values.
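A minimal sketch of what the Final Cluster Centers table reports: the mean of each (standardized) variable within each cluster, using hypothetical labels and data.

```python
import numpy as np

# Hypothetical standardized observations and their cluster assignments.
X = np.array([[-1.2, -0.9],
              [-0.8, -1.1],
              [ 1.0,  0.9],
              [ 1.1,  1.2]])
labels = np.array([0, 0, 1, 1])

# Final cluster centers: the within-cluster mean of each variable.
centers = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])
# Because the inputs are standardized, a positive center value means the
# cluster sits above the overall average on that variable.
```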

Graphing the values provided in the Final Cluster Centers table as a bar chart will provide a visual display that makes interpretation easier. In the two-cluster solution, I see that I have a high group and a low group.

Third, you (as the analyst) can determine the heterogeneity or dissimilarity between groups using the values provided in the Distance between Final Cluster Centers. In the two-cluster solution, the distance between clusters is 3.37, which provides enough support that heterogeneity has been achieved.

At four clusters, the clusters appear sufficiently heterogeneous or dissimilar (see 4 Cluster tab). At six clusters, cluster 2 and cluster 6 could be homogeneous or similar based on calories.

To resolve this issue, you (as the analyst) should conduct an ANOVA with cluster membership serving as the independent variable and standardized calories as the dependent variable. The post-hoc analysis must be included. In the post-hoc analysis, clusters 1 and 2 do not appear different from the other clusters (see 6 Cluster ANOVA). Therefore, a six-cluster solution should no longer be considered. Support exists for both the four-cluster and five-cluster solutions.

Finally, back in the cluster analysis output, the ANOVA table provides two needed elements. One, if a variable lacks significance, then the variable should be removed from the cluster analysis. Two, based on the F-ratio values, a comparison between variables can be made. In the two-cluster solution, calories (F = 240.967) carries more weight than sugars (F = 22.67), fiber (F = 28.134), and carbohydrates (F = 53.611). That is, the package relies more on calories than on the other variables to form the groups (see 2 Cluster tab).
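The variable-weight comparison can be sketched the same way: compute a one-way F per variable across the cluster groups and compare magnitudes. The data here is hypothetical, with calories built to separate the two groups far more strongly than sugars.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(4)
labels = np.repeat([0, 1], 30)
# Hypothetical standardized variables: calories separates the clusters
# strongly, sugars only weakly.
calories = np.where(labels == 0, -1.0, 1.0) + rng.normal(0, 0.3, 60)
sugars = np.where(labels == 0, -0.2, 0.2) + rng.normal(0, 0.3, 60)

f_cal, _ = f_oneway(calories[labels == 0], calories[labels == 1])
f_sug, _ = f_oneway(sugars[labels == 0], sugars[labels == 1])
# A larger F-ratio means the procedure leans on that variable more
# heavily to form the groups.
```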

Interpret

You (as the analyst) need to decide on the number of clusters and support that decision.
Then, you (as the analyst) should develop the mean values of each unstandardized
variable for each cluster level, and finish by naming each cluster group. Finally, provide
a recommendation.

In the Burger dataset, I arrived at a five-cluster solution because clusters 2 through 4 appear unique enough that a managerial recommendation would exist for each group (see 5 Cluster Means tab). At a four-cluster solution, which is also defensible, too much information would be lost.

A fast-casual outlet could introduce a sandwich with (a) flavored cheeses, sauce, and
three beef hamburger patties (Big and Beefy), (b) cheese, sauce, and bacon with either
fried chicken or a lot of turkey (Sweet and Salty), or (c) unbreaded chicken breast or
turkey with no cheese or sauce (Flat Fiber).
