
Session # 15 to 18
BRM
Section E
PGP 2024-26, Term III, IIM Nagpur
DISCRIMINANT ANALYSIS: Recap

Discriminant Analysis

• Durability and performance have significance values below 0.05, which means these variables contribute significantly to differentiating between the two groups.
• In other words, the means of these two variables differ significantly between the groups.
Key Terms

• Eigenvalue - The basic principle in the estimation of a discriminant function is that the variance between the groups relative to the variance within the groups should be maximized. The ratio of between-group variance to within-group variance is called the eigenvalue.
• Canonical Correlation - Canonical correlation is the simple correlation coefficient between the discriminant score and group membership (coded 0, 1 or 1, 2, etc.).
• Wilks' Lambda - It is the ratio of the within-group sum of squares to the total sum of squares. Wilks' lambda takes a value between 0 and 1, and the lower its value, the higher the significance of the discriminant function. A statistically significant function increases confidence that the differentiation between the groups exists.
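As a rough illustration of these definitions, the sketch below computes the eigenvalue and Wilks' lambda for a single discriminant function from group-wise discriminant scores. The two small score arrays are hypothetical values used only to show the arithmetic; they are not from the example in these slides.

```python
import numpy as np

# Hypothetical discriminant scores for two groups (illustrative values only)
scores_g1 = np.array([1.2, 0.8, 1.5, 0.9, 1.1])       # e.g., "purchase" group
scores_g2 = np.array([-1.0, -1.3, -0.7, -1.1, -0.9])  # e.g., "does not purchase" group

all_scores = np.concatenate([scores_g1, scores_g2])
grand_mean = all_scores.mean()

# Within-group sum of squares: deviations from each group's own mean
ss_within = ((scores_g1 - scores_g1.mean())**2).sum() + ((scores_g2 - scores_g2.mean())**2).sum()
# Between-group sum of squares: group means vs. the grand mean, weighted by group size
ss_between = (len(scores_g1) * (scores_g1.mean() - grand_mean)**2
              + len(scores_g2) * (scores_g2.mean() - grand_mean)**2)
ss_total = ss_within + ss_between

eigenvalue = ss_between / ss_within   # between-group to within-group variance ratio
wilks_lambda = ss_within / ss_total   # between 0 and 1; lower = more significant function

print(f"Eigenvalue: {eigenvalue:.3f}, Wilks' lambda: {wilks_lambda:.3f}")
```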
Discriminant Analysis

• Purchase = 1 (positive); does not purchase = 2
Discriminant Analysis

• Buying decision = -3.440 + 0.063*durability + 0.022*performance - 0.027*looking
• Purchase cases fall on the positive side of the function; does-not-purchase cases contribute to the negative side.
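A minimal sketch of how the estimated function above can be applied to a new respondent. The coefficients come from the slide; the cutoff at zero and the example ratings are assumptions made only for illustration.

```python
def discriminant_score(durability, performance, looking):
    """Unstandardized discriminant function taken from the slide above."""
    return -3.440 + 0.063 * durability + 0.022 * performance - 0.027 * looking

# Hypothetical respondent ratings (illustrative values only)
score = discriminant_score(durability=60, performance=55, looking=40)

# Assuming a cutoff at zero: positive scores lean toward "purchase" (group 1),
# negative scores toward "does not purchase" (group 2)
group = "purchase" if score > 0 else "does not purchase"
print(f"Discriminant score = {score:.3f} -> predicted group: {group}")
```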
Discriminant Analysis

• 'Classification results' is a simple summary of the numbers and percentages of subjects classified correctly and incorrectly. Here 33 + 39 = 72 and 72/80 = 0.90, which means that 90% of the cases were correctly classified into their respective groups.
• Classification accuracy should exceed the hit ratio expected by chance.
• Here 90% > 62.5% [50 + 50(1/4)].
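The arithmetic on this slide can be reproduced directly; the 25% improvement margin over the 50% chance rate is the convention the slide implies, stated here as an assumption.

```python
# Classification results from the slide: 33 + 39 cases correctly classified out of 80
correct = 33 + 39
total = 80
accuracy = correct / total                 # 0.90 -> 90% correctly classified

# Chance criterion with the conventional 25% margin: 50% + 50%*(1/4) = 62.5%
chance = 0.50
hit_ratio_threshold = chance + chance * 0.25

print(f"Accuracy = {accuracy:.1%}, threshold = {hit_ratio_threshold:.1%}, "
      f"acceptable = {accuracy > hit_ratio_threshold}")
```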
FACTOR ANALYSIS
Introduction to Factor Analysis

• Factor analysis is a general name denoting a class of procedures primarily used for data reduction and summarization.
• Factor analysis is an interdependence technique in which an entire set of interdependent relationships is examined without making the distinction between dependent and independent variables.
Introduction to Factor Analysis

• In factor analysis, all variables under investigation are analyzed together to extract the underlying factors.
• It is a very useful method to reduce a large number of variables (which cause data complexity) to a few manageable factors.
• These factors explain most of the variation in the original set of data.
• A factor is a linear combination of variables.
• It is a construct that is not directly observable but must be inferred from the input variables.
• The factors are statistically independent.
Uses of Factor Analysis

• To identify underlying dimensions, or factors, that explain the correlations among a set of variables.
• To identify a new, smaller set of uncorrelated variables to replace the original set of correlated variables in subsequent multivariate analysis (regression or discriminant analysis).
Applications

• Scale construction: Factor analysis can be used to develop concise multiple-item scales for measuring various constructs.
• Establish antecedents: This method reduces multiple input variables into grouped factors; thus, the independent variables can be grouped into broad factors.
• Psychographic profiling: Different independent variables are grouped to measure independent factors, which are then used for identifying personality types.
• Segmentation analysis: Factor analysis can also be used for segmentation. For example, there could be different sets of two-wheeler customers because of the different importance they give to factors like prestige, economy and functional features.
Applications

Marketing studies: The technique has extensive use in the field of marketing and can be successfully applied to new product development, product acceptance research, development of advertising copy, pricing studies and branding studies.
For example, we can use it to:
• identify the attributes of brands that influence consumers' choice;
• get an insight into the media habits of various consumers;
• identify the characteristics of price-sensitive customers.
Conditions for Factor Analysis

• Factor analysis requires metric data, i.e., data measured on an interval or ratio scale.
• The variables for factor analysis are identified through exploratory research, which may be conducted by reviewing the literature on the subject and earlier research in the area, through informal interviews with knowledgeable persons, through qualitative methods such as focus group discussions held with a small sample of the respondent population, through analysis of case studies, and through the judgement of the researcher.
• As the responses to different statements are obtained on different scales, all the responses need to be standardized. Standardization allows responses from such different scales to be compared.
Conditions for Factor Analysis

• The sample size should be at least four to five times the number of variables (number of statements).
• The basic principle behind the application of factor analysis is that the initial set of variables should be highly correlated. If the correlation coefficients between all the variables are small, factor analysis may not be an appropriate technique.
• The significance of the correlation matrix is tested using Bartlett's test of sphericity. The hypotheses to be tested are:
• H0: The correlation matrix is insignificant, i.e., it is an identity matrix in which the diagonal elements are one and the off-diagonal elements are zero.
• H1: The correlation matrix is significant.
Conditions for Factor Analysis

• Bartlett's test of sphericity. Bartlett's test of sphericity is a test statistic used to examine the hypothesis that the variables are uncorrelated in the population. In other words, the population correlation matrix is an identity matrix: each variable correlates perfectly with itself (r = 1) but has no correlation with the other variables (r = 0).
• Correlation matrix. A correlation matrix is a lower-triangle matrix showing the simple correlations, r, between all possible pairs of variables included in the analysis. The diagonal elements, which are all 1, are usually omitted.
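A sketch of Bartlett's test computed directly from a correlation matrix using the standard chi-square approximation. The 3-variable matrix `R` and the sample size `n = 100` are hypothetical placeholders for your own data.

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity: H0 -> the correlation matrix is an identity matrix.

    R : (p x p) correlation matrix of the observed variables
    n : number of observations used to compute R
    """
    p = R.shape[0]
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    p_value = stats.chi2.sf(chi2, df)
    return chi2, p_value

# Illustrative 3-variable correlation matrix and sample size (hypothetical values)
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
chi2, p = bartlett_sphericity(R, n=100)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")  # small p -> reject H0, factor analysis is workable
```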
Conditions for Factor Analysis

• Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. The KMO measure of sampling adequacy is an index used to examine the appropriateness of factor analysis. High values (between 0.5 and 1.0) indicate that factor analysis is appropriate; values below 0.5 imply that it may not be.
• The KMO statistic compares the magnitudes of the observed correlation coefficients with the magnitudes of the partial correlation coefficients.
• It indicates the proportion of variance in the variables that might be caused by underlying factors. High values (close to 1.0) generally indicate that a factor analysis may be useful with the data.
• A small value of KMO shows that the variables are uncorrelated and there may not be a common factor influencing them.
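A sketch of the overall KMO index built from the comparison just described: squared simple correlations versus squared partial correlations, the latter obtained from the inverse of the correlation matrix. The matrix `R` is the same hypothetical placeholder used for the Bartlett sketch.

```python
import numpy as np

def kmo_overall(R):
    """Overall Kaiser-Meyer-Olkin measure computed from a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    R_inv = np.linalg.inv(R)
    # Partial correlations (anti-image correlations) from the inverse of R
    d = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    partial = -R_inv / d
    # Use only the off-diagonal elements
    mask = ~np.eye(R.shape[0], dtype=bool)
    r2 = (R[mask] ** 2).sum()        # sum of squared simple correlations
    p2 = (partial[mask] ** 2).sum()  # sum of squared partial correlations
    return r2 / (r2 + p2)

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
print(f"KMO = {kmo_overall(R):.3f}")  # values above ~0.5 suggest factor analysis is appropriate
```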
Statistics Associated with Factor Analysis

• Communality. Communality is the amount of variance a variable shares with all the other variables being considered. It is also the proportion of a variable's variance explained by the common factors, i.e., a measure of the percentage of the variable's variation that is explained by the factors.
• Factor scores. These are the composite scores estimated for each respondent on the extracted factors.
• Factor loading. Factor loadings are simple correlations between the variables and the factors.
Statistics Associated with Factor Analysis

• Eigenvalue. The eigenvalue represents the total variance explained by each factor. The eigenvalue of a factor is obtained by taking the sum of the squared loadings of all the variables on that factor.
• Factor matrix (component matrix). A factor matrix contains the factor loadings of all the variables on all the factors extracted.
Statistics Associated with Factor Analysis

• Percentage of variance. The percentage of the total variance attributed to each factor.
• Residuals. Residuals are the differences between the observed correlations, as given in the input correlation matrix, and the reproduced correlations, as estimated from the factor matrix.
• Scree plot. A scree plot is a plot of the eigenvalues against the number of factors in order of extraction.
Conducting Factor Analysis: Formulate the Problem

• The objectives of factor analysis should be identified.
• The variables to be included in the factor analysis should be specified based on past research, theory, and the judgment of the researcher. It is important that the variables be appropriately measured on an interval or ratio scale.
• An appropriate sample size should be used. As a rough guideline, there should be at least four or five times as many observations (sample size) as there are variables.
An example
• Bartlett's test of sphericity can be used to test the null hypothesis that the variables are uncorrelated in the population.
• Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. Small values of the KMO statistic indicate that the correlations between pairs of variables cannot be explained by the other variables and that factor analysis may not be appropriate.
Determine the Method of Factor Analysis

• In principal components analysis, the total variance in the data is considered. The diagonal of the correlation matrix consists of unities, and the full variance is brought into the factor matrix. Principal components analysis is recommended when the primary concern is to determine the minimum number of factors that will account for the maximum variance in the data for use in subsequent multivariate analysis. The factors are called principal components.
• In common factor analysis, the factors are estimated based only on the common variance. Communalities are inserted in the diagonal of the correlation matrix. This method is appropriate when the primary concern is to identify the underlying dimensions and the common variance is of interest. This method is also known as principal axis factoring.
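The two extraction approaches can be contrasted in scikit-learn, assuming its PCA and FactorAnalysis estimators stand in for principal components and common factor extraction respectively (note that sklearn's FactorAnalysis fits the common factor model by maximum likelihood rather than principal axis factoring, so treat it as an approximation). The data matrix is a random placeholder.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA, FactorAnalysis

# Hypothetical data matrix: 100 respondents x 6 survey items (random placeholder)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X_std = StandardScaler().fit_transform(X)   # factor analysis works on standardized inputs

# Principal components analysis: total variance, unities on the diagonal
pca = PCA(n_components=2).fit(X_std)
print("PCA explained variance ratio:", pca.explained_variance_ratio_.round(3))

# Common factor analysis: only the shared (common) variance is modeled
fa = FactorAnalysis(n_components=2).fit(X_std)
print("Loadings (variables x factors):\n", fa.components_.T.round(3))
```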
• Communality. Communality is the amount of variance a variable shares with all the other variables being considered. This is also the proportion of variance explained by the common factors.
• Eigenvalue. The eigenvalue represents the total variance explained by each factor. The eigenvalue of a factor is obtained by taking the sum of the squared loadings of all the variables on that factor.
• Factor loading. Factor loadings are simple correlations between the variables and the factors.
Results of Principal Components Analysis

Initial Eigenvalues
Factor   Eigenvalue   % of Variance   Cumulative %
1        2.731        45.520          45.520
2        2.218        36.969          82.488
3        0.442         7.360          89.848
4        0.341         5.688          95.536
5        0.183         3.044          98.580
6        0.085         1.420         100.000

Extraction Sums of Squared Loadings
Factor   Eigenvalue   % of Variance   Cumulative %
1        2.731        45.520          45.520
2        2.218        36.969          82.488
Determine the Number of Factors

• A priori determination. Sometimes, because of prior knowledge, the researcher knows how many factors to expect and can specify the number of factors to be extracted beforehand.
• Determination based on eigenvalues. In this approach, only factors with eigenvalues greater than 1.0 are retained. An eigenvalue represents the amount of variance associated with the factor; hence, only factors with a variance greater than 1.0 are included. Factors with variance less than 1.0 are no better than a single variable, since, due to standardization, each variable has a variance of 1.0. If the number of variables is less than 20, this approach results in a conservative number of factors.
Determine the Number of Factors

• Determination based on scree plot. A scree plot is a plot of the eigenvalues against the number of factors in order of extraction. Experimental evidence indicates that the point at which the scree begins denotes the true number of factors. Generally, the number of factors determined by a scree plot will be one or a few more than that determined by the eigenvalue criterion.
• Determination based on percentage of variance. In this approach the number of factors extracted is determined so that the cumulative percentage of variance extracted by the factors reaches a satisfactory level. It is recommended that the extracted factors account for at least 60% of the variance. A short sketch applying these criteria follows below.
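A minimal sketch of the extraction rules just described, applied to the eigenvalues reported in the principal components table above; matplotlib is assumed to be available for the scree plot.

```python
import numpy as np
import matplotlib.pyplot as plt

# Eigenvalues from the principal components results table shown earlier
eigenvalues = np.array([2.731, 2.218, 0.442, 0.341, 0.183, 0.085])

# Kaiser criterion: retain factors with eigenvalue > 1.0
n_kaiser = int((eigenvalues > 1.0).sum())
print(f"Factors retained by the eigenvalue (>1) criterion: {n_kaiser}")

# Percentage-of-variance criterion: retain enough factors to reach ~60% cumulative variance
cum_pct = np.cumsum(eigenvalues) / eigenvalues.sum() * 100
n_variance = int(np.argmax(cum_pct >= 60) + 1)
print(f"Factors needed to reach 60% of the variance: {n_variance}")

# Scree plot: eigenvalues against the factor number, in order of extraction
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```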
Rotate Factors

• Through rotation, the factor matrix is transformed into a simpler one that is easier to interpret.
• In rotating the factors, we would like each factor to have nonzero, or significant, loadings or coefficients for only some of the variables. Likewise, we would like each variable to have nonzero or significant loadings on only a few factors, if possible on only one.
• The rotation is called orthogonal rotation if the axes are maintained at right angles.
Rotate Factors

• The most commonly used method for rotation is the varimax procedure. This is an orthogonal method of rotation that minimizes the number of variables with high loadings on a factor, thereby enhancing the interpretability of the factors. Orthogonal rotation results in factors that are uncorrelated.
• The rotation is called oblique rotation when the axes are not maintained at right angles, and the factors are correlated. Sometimes, allowing for correlations among factors can simplify the factor pattern matrix. Oblique rotation should be used when factors in the population are likely to be strongly correlated.
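A sketch of an orthogonal (varimax) rotation. It assumes a recent scikit-learn version in which FactorAnalysis accepts a rotation argument, and it uses a random placeholder data matrix; the point is only to compare unrotated and rotated loadings.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))   # hypothetical standardized data: 100 cases x 6 variables

# Unrotated vs. varimax-rotated loadings (rotation argument available in recent scikit-learn)
fa_unrotated = FactorAnalysis(n_components=2).fit(X)
fa_varimax = FactorAnalysis(n_components=2, rotation="varimax").fit(X)

print("Unrotated loadings (variables x factors):\n", fa_unrotated.components_.T.round(3))
print("Varimax-rotated loadings (variables x factors):\n", fa_varimax.components_.T.round(3))
```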
Factor Matrix Before and After Rotation

Determine the Model Fit

• The correlations between the variables can be deduced, or reproduced, from the estimated correlations between the variables and the factors.
• The differences between the observed correlations (as given in the input correlation matrix) and the reproduced correlations (as estimated from the factor matrix) can be examined to determine model fit. These differences are called residuals.
• Residuals are the differences between the observed correlations, as given in the input correlation matrix, and the reproduced correlations, as estimated from the factor matrix.
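A sketch of this model-fit check: reproduce the correlation matrix from the loadings, then inspect the residuals. The two-factor loading matrix and the observed correlation matrix below are hypothetical examples, not values from the slides.

```python
import numpy as np

# Hypothetical loadings for 4 variables on 2 factors (illustrative values only)
L = np.array([[0.9, 0.10],
              [0.8, 0.20],
              [0.1, 0.85],
              [0.2, 0.75]])

# Reproduced correlations: R_hat = L L', with the diagonal set back to 1
communalities = (L ** 2).sum(axis=1)
R_hat = L @ L.T
np.fill_diagonal(R_hat, 1.0)

# Observed (input) correlation matrix would come from the data; placeholder here
R_obs = np.array([[1.00, 0.70, 0.18, 0.25],
                  [0.70, 1.00, 0.22, 0.28],
                  [0.18, 0.22, 1.00, 0.62],
                  [0.25, 0.28, 0.62, 1.00]])

residuals = R_obs - R_hat   # small residuals indicate a good model fit
print("Communalities:", communalities.round(3))
print("Residual matrix:\n", residuals.round(3))
```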
Interpret Factors

Applications of Factor Analysis in other Techniques

• Multiple regression - Factor scores can be used in place of independent variables in a multiple regression estimation. This way we can overcome the problem of multicollinearity.
• Simplifying the discriminant solution - A number of independent variables in a discriminant model can be replaced by a set of manageable factors before estimation.
• Simplifying the cluster analysis solution - To make the data manageable, the variables selected for clustering can be reduced to a more manageable number using factor analysis, and the obtained factor scores can then be used to cluster the objects/cases under study.
• Perceptual mapping in multidimensional scaling - The factors that result from factor analysis can be used as dimensions, with the factor scores as the coordinates, to develop attribute-based perceptual maps in which one can comprehend the placement of brands or products according to the identified factors under study.
Factor Analysis

• Factor scores can be used in place of independent variables in a multiple regression estimation. This way we can overcome the problem of multicollinearity.
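A sketch of that idea: extract factor scores from a set of correlated predictors and regress the dependent variable on the scores instead of the original variables. The data, the two-factor choice and scikit-learn usage are all illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

# Hypothetical correlated predictors (6 collinear survey items) and a dependent variable
base = rng.normal(size=(200, 2))
X = np.hstack([base + 0.3 * rng.normal(size=(200, 2)) for _ in range(3)])  # 6 collinear columns
y = base @ np.array([1.5, -0.8]) + rng.normal(scale=0.5, size=200)

# Step 1: reduce the correlated predictors to two factor scores
fa = FactorAnalysis(n_components=2)
scores = fa.fit_transform(X)          # factor scores, one row per respondent

# Step 2: regress y on the factor scores instead of the original collinear variables
reg = LinearRegression().fit(scores, y)
print("R^2 using factor scores:", round(reg.score(scores, y), 3))
```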
• How to Detect These Issues?
• 1. Detecting Multicollinearity
• Variance Inflation Factor (VIF):
• If VIF > 5 (or 10 in some cases) → High multicollinearity.
• Correlation Matrix:
• If two independent variables have a high correlation (> 0.7 or 0.8), multicollinearity is likely.
• Eigenvalues & Condition Index:
• A condition index > 30 indicates severe multicollinearity.
• 2. Detecting Autocorrelation
• Durbin-Watson Test:
• Values close to 2 → No autocorrelation.
• < 1.5 → Positive autocorrelation (errors follow a pattern).
• > 2.5 → Negative autocorrelation (errors oscillate).
• Residual Plot:
• If residuals show a clear pattern over time, autocorrelation is present.
• Ljung-Box Test:
• Tests whether residuals are independent. If p-value < 0.05, autocorrelation exists.
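A sketch of the two most common checks from the list above, using statsmodels; the regression data is a random placeholder with one deliberately collinear column.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
X[:, 2] = 0.9 * X[:, 0] + rng.normal(scale=0.3, size=100)  # make column 3 collinear with column 1
y = X @ np.array([1.0, 0.5, -0.2]) + rng.normal(size=100)

X_const = sm.add_constant(X)
model = sm.OLS(y, X_const).fit()

# Multicollinearity: VIF per predictor (skipping the constant); VIF > 5-10 signals trouble
vifs = [variance_inflation_factor(X_const, i) for i in range(1, X_const.shape[1])]
print("VIFs:", [round(v, 2) for v in vifs])

# Autocorrelation: Durbin-Watson on the residuals; values near 2 indicate none
print("Durbin-Watson:", round(durbin_watson(model.resid), 2))
```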
Difference Between R² and Adjusted R²

• Both R² (coefficient of determination) and Adjusted R² measure the goodness of fit in regression analysis, but they differ in how they handle the number of predictors.
• Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k is the number of predictors.
• Key Takeaway
• R² tells how well the model explains the data but does not consider the number of predictors.
• Adjusted R² adjusts for unnecessary predictors, making it more reliable for multiple regression models.
• Example
• If you run a regression model:
• R² = 0.85 → The model explains 85% of the variance in the dependent variable.
• Adjusted R² = 0.80 → After accounting for unnecessary predictors, the model effectively explains 80%.
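The adjustment can be written out directly; the sample size and number of predictors below are hypothetical, chosen only to show how extra predictors pull Adjusted R² below R².

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).

    n: number of observations, k: number of predictors.
    """
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical example: R^2 = 0.85 with 50 observations and 10 predictors
print(round(adjusted_r2(0.85, n=50, k=10), 3))   # noticeably lower than 0.85
```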
Overcoming the problem of multicollinearity

• Using Factor Analysis
Cluster Analysis

• Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters. Objects in each cluster tend to be similar to each other and dissimilar to objects in the other clusters. Cluster analysis is also called classification analysis, or numerical taxonomy.
• The advantage of the technique is that it is applicable to both metric and non-metric data.
• The grouping can be done post hoc, i.e., after the primary data survey is over. The technique has wide applications in all branches of management; however, it is most often used for market segmentation analysis.
• Both cluster analysis and discriminant analysis are concerned with classification. However, discriminant analysis requires prior knowledge of the cluster or group membership for each object or case included, in order to develop the classification rule. In contrast, in cluster analysis there is no a priori information about the group or cluster membership for any of the objects. Groups or clusters are suggested by the data, not defined a priori.
Cluster analysis - basic tenets

• Can be used to cluster objects, individuals and entities
• Similarity is based on multiple variables
• Measures proximity between study variables
• Objects grouped into one cluster are homogeneous compared with those in other clusters
• Can be conducted on metric, non-metric as well as mixed data
Usage

• Market segmentation - customers/potential customers can be split into smaller, more homogeneous groups by using the method.
• Segmenting industries - the same grouping principle can be applied to industrial consumers.
• Segmenting markets - cities or regions with similar or common traits can be grouped on the basis of climatic or socio-economic conditions.
Statistics associated with cluster analysis

• Metric data analysis: the distance between two objects is typically the Euclidean distance,

d_ij = √( Σ_{k=1}^{p} (X_ik − X_jk)² )

Where,
• d_ij = distance between person i and j
• k = variable (interval / ratio), k = 1, ..., p
• i, j = objects
• Non-metric data
• Simple matching coefficient = (P + N) / (P + N + M)
• Jaccard coefficient = P / (P + M)

Where
• P = positive matches
• N = negative matches
• M = mismatches
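A sketch of the three measures defined above: Euclidean distance for metric data, and the simple matching and Jaccard coefficients for binary (non-metric) data. The two example respondents are hypothetical.

```python
import numpy as np

# Metric data: Euclidean distance between respondents i and j across the k variables
x_i = np.array([5.0, 3.0, 4.0])
x_j = np.array([2.0, 4.0, 1.0])
d_ij = np.sqrt(((x_i - x_j) ** 2).sum())
print("Euclidean distance:", round(d_ij, 3))

# Non-metric (binary) data: 1 = attribute present, 0 = absent
b_i = np.array([1, 0, 1, 1, 0, 0])
b_j = np.array([1, 0, 0, 1, 0, 1])
P = int(((b_i == 1) & (b_j == 1)).sum())   # positive matches
N = int(((b_i == 0) & (b_j == 0)).sum())   # negative matches
M = int((b_i != b_j).sum())                # mismatches

simple_matching = (P + N) / (P + N + M)
jaccard = P / (P + M)
print("Simple matching:", round(simple_matching, 3), "Jaccard:", round(jaccard, 3))
```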
Statistics Associated with Cluster Analysis

• Agglomeration schedule. A hierarchical method provides information on the objects, starting with the most similar pair and then, at each later stage, on the object joining the pair. An agglomeration schedule gives information on the objects or cases being combined at each stage of a hierarchical clustering process.
• ANOVA table. The univariate (one-way) ANOVA statistics for each clustering variable. The higher the ANOVA value, the greater the difference between the clusters on that variable.
Statistics Associated with Cluster Analysis

• Cluster centroid. The cluster centroid is the mean values of the variables for all the cases or objects in a particular cluster.
• Cluster centers. The cluster centers are the initial starting points in nonhierarchical clustering. Clusters are built around these centers, or seeds.
• Cluster membership. Cluster membership indicates the cluster to which each object or case belongs.
• Dendrogram. This is a tree-like diagram used to graphically present the cluster results. The vertical axis represents the objects and the horizontal axis the inter-respondent distance. The figure is read from left to right.
• Distances between final cluster centres. These are the distances between the individual pairs of clusters. A robust solution that is able to demarcate the groups distinctly is one where the inter-cluster distance is large; the larger the distance, the more distinct the clusters.
Statistics Associated with Cluster Analysis

• Entropy group. The individuals or small groups that do not seem to fit into any cluster.
• Hierarchical methods. A step-wise process that starts with the most similar pair and formulates a tree-like structure composed of separate clusters.
• Non-hierarchical methods. Cluster seeds or centres are the starting points, and individual clusters are built around them based on some pre-specified distance from the seeds.
Statistics Associated with Cluster Analysis

• Proximity matrix. A data matrix that consists of pair-wise distances/similarities between the objects. It is an N x N matrix, where N is the number of objects being clustered.
• Summary. The number of cases in each cluster, reported in the non-hierarchical clustering method.
• Icicle diagram. Quite similar to the dendrogram, it is a graphical method of demonstrating the composition of the clusters.
Conducting Cluster Analysis

Cluster Analysis

1. When to Use Hierarchical Clustering
• Hierarchical clustering is best when:
• You have a small to medium dataset (less than a few thousand observations).
• You don't know the optimal number of clusters beforehand.
• You want a visual representation (dendrogram) of how clusters are formed.
• Your data has a nested or hierarchical structure.
• You need interpretability, as it provides a hierarchy of clusters.
• Example of Hierarchical Clustering Usage:
• A hospital wants to group patients based on their medical conditions and health profiles. Since they are unsure how many clusters exist, they use hierarchical clustering to build a dendrogram, which helps them decide on meaningful patient groups.
Cluster Analysis

2. When to Use K-Means Clustering
• K-Means clustering is best when:
• You have a large dataset (thousands or millions of observations).
• You already have an idea of the number of clusters (K).
• You need a fast and efficient clustering algorithm.
• Your data is spherically distributed (round-shaped clusters).
• You want to apply clustering in real-time applications.
• Example of K-Means Clustering Usage:
• An e-commerce company wants to segment its customers based on their purchase behavior. They suspect there are 3 or 4 customer segments and use K-Means clustering to quickly divide them into clusters like:
• Frequent Buyers
• Discount Seekers
• Occasional Shoppers
• High-Value Customers
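A sketch of both approaches on the same hypothetical data, using scipy for the hierarchical solution (Ward's method, as in the SPSS steps later in these slides) and scikit-learn for K-Means. The three loose groups are simulated purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
# Hypothetical customer data: three loose groups in two variables
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 2)) for c in ((0, 0), (3, 3), (0, 4))])

# Hierarchical clustering: build the full tree, inspect the dendrogram, then cut it
Z = linkage(X, method="ward")
labels_hier = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 clusters
dendrogram(Z)
plt.title("Dendrogram (Ward's method)")
plt.show()

# K-Means: fast, but the number of clusters K must be chosen up front
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("K-Means cluster centers:\n", km.cluster_centers_.round(2))
print("Hierarchical cluster sizes:", np.bincount(labels_hier)[1:])
```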
Exercise: two-Cluster Analysis class exercise 1

• Case Study - Shopping Mall
• Significance denotes a large difference between the clusters on a particular variable.
Perceptual Maps

SPSS Windows

To select this procedure using SPSS for Windows, click:

Analyze > Classify > Hierarchical Cluster …

Analyze > Classify > K-Means Cluster …

Analyze > Classify > Two-Step Cluster …
SPSS Windows: Hierarchical Clustering (1 of 2)

1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then HIERARCHICAL CLUSTER.
3. Move "Fun [v1]," "Bad for Budget [v2]," "Eating Out [v3]," "Best Buys [v4]," "Don't Care [v5]," and "Compare Prices [v6]" into the VARIABLES box.
4. In the CLUSTER box, check CASES (default option). In the DISPLAY box, check STATISTICS and PLOTS (default options).
5. Click on STATISTICS. In the pop-up window, check AGGLOMERATION SCHEDULE. In the CLUSTER MEMBERSHIP box, check RANGE OF SOLUTIONS. Then, for MINIMUM NUMBER OF CLUSTERS, enter 2 and for MAXIMUM NUMBER OF CLUSTERS, enter 4. Click CONTINUE.
SPSS Windows: Hierarchical Clustering (2 of 2)

6. Click on PLOTS. In the pop-up window, check DENDROGRAM. In the ICICLE box, check ALL CLUSTERS (default). In the ORIENTATION box, check VERTICAL or HORIZONTAL. Click CONTINUE.
7. Click on METHOD. For CLUSTER METHOD, select WARD'S METHOD. In the MEASURE box, check INTERVAL and select SQUARED EUCLIDEAN DISTANCE. Click CONTINUE.
8. Click OK.
SPSS Windows: K-Means Clustering

1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then K-MEANS CLUSTER.
3. Move "Fun [v1]," "Bad for Budget [v2]," "Eating Out [v3]," "Best Buys [v4]," "Don't Care [v5]," and "Compare Prices [v6]" into the VARIABLES box.
4. For NUMBER OF CLUSTERS, select 3.
5. Click on OPTIONS. In the pop-up window, in the STATISTICS box, check INITIAL CLUSTER CENTERS and CLUSTER INFORMATION FOR EACH CASE. Click CONTINUE.
6. Click OK.
SPSS Windows: Two-Step Clustering

1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then TWO-STEP CLUSTER.
3. Move "Fun [v1]," "Bad for Budget [v2]," "Eating Out [v3]," "Best Buys [v4]," "Don't Care [v5]," and "Compare Prices [v6]" into the CONTINUOUS VARIABLES box.
4. For DISTANCE MEASURE, select EUCLIDEAN.
5. For NUMBER OF CLUSTERS, select DETERMINE AUTOMATICALLY.
6. For CLUSTERING CRITERION, select AKAIKE'S INFORMATION CRITERION (AIC).
7. Click OK.