BRM
Session # 15 to 18
Section E
PGP 2024-26, Term III, IIM Nagpur
DISCRIMINANT ANALYSIS: Recap
Discriminant Analysis
• Durability and performance have significance values below 0.05, which means these variables contribute significantly to differentiating between the two groups.
• So, the means of these two variables differ significantly across the two groups.
Key Terms
• Eigenvalue – The basic principle in the estimation of a discriminant function is that the variance between the groups relative to the variance within the groups should be maximized. The ratio of between-group variance to within-group variance is called the eigenvalue.
• Canonical Correlation – Canonical correlation is the simple correlation coefficient between the discriminant score and the group membership (coded 0, 1 or 1, 2, etc.).
• Wilks’ Lambda – It is given by the ratio of the within-group sum of squares to the total sum of squares. Wilks’ lambda takes a value between 0 and 1, and the lower the value, the higher is the significance of the discriminant function. A statistically significant function enhances the confidence that the differentiation between the groups exists.
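As an illustration of how these three statistics relate to one another, the sketch below computes them directly from a set of discriminant scores and group labels. The scores and the 1/2 group coding are made-up toy values, not the case data discussed in class:

```python
import numpy as np

# Hypothetical discriminant scores and group membership (1 = purchase, 2 = no purchase)
scores = np.array([1.2, 0.8, 1.5, 0.9, -1.1, -0.7, -1.4, -0.9])
groups = np.array([1, 1, 1, 1, 2, 2, 2, 2])

grand_mean = scores.mean()
ss_total = ((scores - grand_mean) ** 2).sum()

# Within-group and between-group sums of squares of the discriminant scores
ss_within = sum(((scores[groups == g] - scores[groups == g].mean()) ** 2).sum()
                for g in np.unique(groups))
ss_between = ss_total - ss_within

eigenvalue = ss_between / ss_within                  # between-group / within-group variance
wilks_lambda = ss_within / ss_total                  # within-group SS / total SS
canonical_corr = np.corrcoef(scores, groups)[0, 1]   # correlation of score with group code

print(eigenvalue, wilks_lambda, abs(canonical_corr))
# For two groups, canonical_corr**2 equals ss_between / ss_total = 1 - wilks_lambda
```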
Discriminant Analysis
• Purchase = 1 (positive); does not purchase = 2
Discriminant Analysis
• Buying decision = -3.440 + .063*durability + .022*performance - .027*looking
• Purchase cases fall on the positive side of the function, while does-not-purchase cases contribute to the negative side.
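A minimal sketch of how this estimated function could be applied to a new respondent. The attribute ratings below are invented for illustration, and classifying by the sign of the score assumes a cutting score of zero (roughly appropriate when the two groups are of equal size):

```python
def discriminant_score(durability, performance, looking):
    """Estimated discriminant function from the class example."""
    return -3.440 + 0.063 * durability + 0.022 * performance - 0.027 * looking

# Hypothetical ratings for one respondent
score = discriminant_score(durability=60, performance=40, looking=30)
group = "purchase" if score > 0 else "does not purchase"
print(round(score, 3), group)   # positive score -> purchase group
```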
Discriminant Analysis
• ‘Classification results’ is a simple summary of the numbers and percentages of subjects classified correctly and incorrectly. 33 + 39 = 72, and 72/80 = 0.90, meaning that 90% of the cases were correctly classified into their respective groups.
• The classification accuracy (hit ratio) should exceed the chance criterion by at least 25%.
• Here 90% > 62.5% [= 50% + 50% × (1/4)].
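The same arithmetic as a short sketch, assuming the classification table from the class example (33 and 39 correct classifications out of 80 cases, with two equal-sized groups):

```python
# Correctly classified cases on the diagonal of the classification table
correct = 33 + 39
total = 80

hit_ratio = correct / total      # 0.90 -> 90% correctly classified
chance = 0.50                    # proportional chance with two equal groups
threshold = chance * 1.25        # chance criterion + 25% = 0.625

print(hit_ratio, threshold, hit_ratio > threshold)
```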
FACTOR ANALYSIS
Introduction to Factor Analysis
• Factor analysis is a general name denoting a class of procedures primarily used for data reduction and summarization.
• Factor analysis is an interdependence technique in which an entire set of interdependent relationships is examined without making the distinction between dependent and independent variables.
Introduction to Factor Analysis
• In factor analysis, all variables under investigation are analyzed together to extract the underlying factors.
• It is a very useful method to reduce a large number of variables (resulting in data complexity) to a few manageable factors.
• These factors explain most of the variation in the original set of data.
• A factor is a linear combination of variables.
• It is a construct that is not directly observable but needs to be inferred from the input variables.
• The factors are statistically independent.
Uses of Factor Analysis
• To identify underlying dimensions, or factors, that explain the correlations among a set of variables.
• To identify a new, smaller set of uncorrelated variables to replace the original set of correlated variables in subsequent multivariate analysis (regression or discriminant analysis).
Applications
• Scale construction: Factor analysis can be used to develop concise multiple-item scales for measuring various constructs.
• Establish antecedents: This method reduces multiple input variables into grouped factors. Thus, the independent variables can be grouped into broad factors.
• Psychographic profiling: Different independent variables are grouped to measure independent factors. These are then used for identifying personality types.
• Segmentation analysis: Factor analysis can also be used for segmentation. For example, there could be different sets of two-wheeler customers because of the different importance they give to factors like prestige, economy considerations and functional features.
Applications
• Marketing studies: The technique has extensive use in the field of marketing and can be successfully used for new product development, product acceptance research, development of advertising copy, pricing studies and branding studies.
For example, we can use it to:
• identify the attributes of brands that influence consumers’ choice;
• get an insight into the media habits of various consumers;
• identify the characteristics of price-sensitive customers.
Conditions for Factor Analysis
• Factor analysis requires metric data, i.e., the data should be interval or ratio scaled.
• The variables for factor analysis are identified through exploratory research, which may be conducted by reviewing the literature on the subject and earlier research in the area, through informal interviews of knowledgeable persons, through qualitative methods such as focus group discussions held with a small sample of the respondent population, through the analysis of case studies, and through the judgement of the researcher.
• As the responses to different statements are obtained through different scales, all the responses need to be standardized. Standardization helps in comparing responses obtained on such different scales.
Conditions for Factor Analysis
• The sample size should be at least four to five times the number of variables (number of statements).
• The basic principle behind the application of factor analysis is that the initial set of variables should be highly correlated. If the correlation coefficients between all the variables are small, factor analysis may not be an appropriate technique.
• The significance of the correlation matrix is tested using Bartlett’s test of sphericity. The hypotheses to be tested are:
• H0: The correlation matrix is insignificant, i.e., it is an identity matrix whose diagonal elements are one and off-diagonal elements are zero.
• H1: The correlation matrix is significant.
Conditions for Factor Analysis
• Bartlett’s test of sphericity. Bartlett’s test of sphericity is a test statistic used to examine the hypothesis that the variables are uncorrelated in the population. In other words, the population correlation matrix is an identity matrix; each variable correlates perfectly with itself (r = 1) but has no correlation with the other variables (r = 0).
• Correlation matrix. A correlation matrix is a lower triangular matrix showing the simple correlations, r, between all possible pairs of variables included in the analysis. The diagonal elements, which are all 1, are usually omitted.
Conditions for Factor Analysis
• Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. The KMO measure of sampling adequacy is an index used to examine the appropriateness of factor analysis. High values (between 0.5 and 1.0) indicate that factor analysis is appropriate; values below 0.5 imply that it may not be.
• The KMO statistic compares the magnitudes of the observed correlation coefficients with the magnitudes of the partial correlation coefficients.
• It indicates the proportion of variance in the variables that might be caused by underlying factors. High values (close to 1.0) generally indicate that a factor analysis may be useful with the data.
• A small value of KMO shows that the variables are uncorrelated and there may not be a common factor influencing them.
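As a rough illustration of how these two diagnostics can be computed outside SPSS, here is a minimal Python sketch using only NumPy and SciPy. The matrix `X` is a hypothetical respondents-by-variables placeholder, not the class data set, and the functions implement the standard textbook formulas:

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(X):
    """Chi-square test of H0: the population correlation matrix is an identity matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi2, stats.chi2.sf(chi2, df)

def kmo_overall(X):
    """Overall KMO: observed correlations vs. partial correlations."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    Q = -R_inv / d                       # partial correlations, controlling for the rest
    np.fill_diagonal(Q, 0)
    R0 = R.copy()
    np.fill_diagonal(R0, 0)
    return (R0 ** 2).sum() / ((R0 ** 2).sum() + (Q ** 2).sum())

X = np.random.default_rng(1).normal(size=(80, 6))   # placeholder data: 80 cases, 6 variables
print(bartlett_sphericity(X))   # a small p-value means the correlations are significant
print(kmo_overall(X))           # values above 0.5 suggest factor analysis is appropriate
```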
Statistics Associated with Factor Analysis
• Communality. Communality is the amount of variance a variable shares with all the other variables being considered. It is also the proportion of a variable’s variance explained by the common factors.
• Factor Scores. Factor scores are the composite scores estimated for each respondent on the extracted factors.
• Factor Loading. Factor loadings are the simple correlations between the variables and the factors.
Statistics Associated with Factor Analysis
• Eigenvalue. The eigenvalue represents the total variance explained by each factor. The eigenvalue of any factor is obtained by taking the sum of squares of the loadings of the variables on that factor.
• Factor matrix (Component Matrix). A factor matrix contains the factor loadings of all the variables on all the factors extracted.
Statistics Associated with Factor Analysis
• Percentage of variance. The percentage of the total variance attributed to each factor.
• Residuals. Residuals are the differences between the observed correlations, as given in the input correlation matrix, and the reproduced correlations, as estimated from the factor matrix.
• Scree plot. A scree plot is a plot of the eigenvalues against the number of factors in order of extraction.
Conducting Factor Analysis: Formulate the Problem
• The objectives of factor analysis should be identified.
• The variables to be included in the factor analysis should be specified based on past research, theory, and the judgment of the researcher. It is important that the variables be appropriately measured on an interval or ratio scale.
• An appropriate sample size should be used. As a rough guideline, there should be at least four or five times as many observations (sample size) as there are variables.
An example
• Bartlett’s test of sphericity can be used to test the null hypothesis that the variables are uncorrelated in the population.
• Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. Small values of the KMO statistic indicate that the correlations between pairs of variables cannot be explained by the other variables and that factor analysis may not be appropriate.
Determine the Method of Factor Analysis
• In principal components analysis, the total variance in the data is considered. The diagonal of the correlation matrix consists of unities, and the full variance is brought into the factor matrix. Principal components analysis is recommended when the primary concern is to determine the minimum number of factors that will account for maximum variance in the data for use in subsequent multivariate analysis. The factors are called principal components.
• In common factor analysis, the factors are estimated based only on the common variance. Communalities are inserted in the diagonal of the correlation matrix. This method is appropriate when the primary concern is to identify the underlying dimensions and the common variance is of interest. This method is also known as principal axis factoring.
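For readers who want to reproduce the extraction step outside SPSS, the sketch below runs principal components analysis on a standardized data matrix with scikit-learn. The matrix `X` and the choice of six variables are hypothetical placeholders, not the class data set:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(size=(80, 6))    # placeholder: 80 respondents, 6 items

Z = StandardScaler().fit_transform(X)   # standardize so each variable has variance 1
pca = PCA().fit(Z)

eigenvalues = pca.explained_variance_                # variance explained by each component
pct_variance = pca.explained_variance_ratio_ * 100   # % of total variance
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)  # variable-component correlations

print(eigenvalues)            # components with eigenvalue > 1 are usually retained
print(pct_variance.cumsum())  # cumulative % of variance extracted
```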
• Communality. Communality is the amount of variance a variable shares with all the other variables being considered. This is also the proportion of variance explained by the common factors.
• Eigenvalue. The eigenvalue represents the total variance explained by each factor. The eigenvalue of any factor is obtained by taking the sum of squares of the loadings of the variables on that factor.
• Factor Loading. Factor loadings are the simple correlations between the variables and the factors.
Results of Principal Components Analysis

Initial Eigenvalues
Factor   Eigenvalue   % of Variance   Cumulative %
1        2.731        45.520          45.520
2        2.218        36.969          82.488
3        0.442        7.360           89.848
4        0.341        5.688           95.536
5        0.183        3.044           98.580
6        0.085        1.420           100.000

Extraction Sums of Squared Loadings
Factor   Eigenvalue   % of Variance   Cumulative %
1        2.731        45.520          45.520
2        2.218        36.969          82.488
Determine the Number of Factors
• A Priori Determination. Sometimes, because of prior knowledge, the researcher knows how many factors to expect and can thus specify the number of factors to be extracted beforehand.
• Determination Based on Eigenvalues. In this approach, only factors with eigenvalues greater than 1.0 are retained. An eigenvalue represents the amount of variance associated with the factor. Hence, only factors with a variance greater than 1.0 are included. Factors with variance less than 1.0 are no better than a single variable, since, due to standardization, each variable has a variance of 1.0. If the number of variables is less than 20, this approach will result in a conservative number of factors.
Determine the Number of Factors
• Determination Based on Scree Plot. A scree plot is a plot of the eigenvalues against the number of factors in order of extraction. Experimental evidence indicates that the point at which the scree begins denotes the true number of factors. Generally, the number of factors determined by a scree plot will be one or a few more than that determined by the eigenvalue criterion.
• Determination Based on Percentage of Variance. In this approach, the number of factors extracted is determined so that the cumulative percentage of variance extracted by the factors reaches a satisfactory level. It is recommended that the factors extracted should account for at least 60% of the variance.
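These stopping rules can also be checked programmatically. The sketch below applies the eigenvalue-greater-than-1 rule, the 60% cumulative-variance rule, and a scree plot to the eigenvalues from the results table shown earlier:

```python
import numpy as np
import matplotlib.pyplot as plt

eigenvalues = np.array([2.731, 2.218, 0.442, 0.341, 0.183, 0.085])   # from the table above

n_kaiser = int((eigenvalues > 1.0).sum())          # eigenvalue > 1 criterion
cum_pct = eigenvalues.cumsum() / eigenvalues.sum() * 100
n_variance = int(np.argmax(cum_pct >= 60) + 1)     # smallest number of factors reaching 60%

plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.xlabel("Factor number"); plt.ylabel("Eigenvalue"); plt.title("Scree plot")
plt.show()

print(n_kaiser, n_variance)   # both rules point to 2 factors for these eigenvalues
```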
Rotate Factors
• Through rotation, the factor matrix is transformed into a simpler one that is easier to interpret.
• In rotating the factors, we would like each factor to have nonzero, or significant, loadings or coefficients for only some of the variables. Likewise, we would like each variable to have nonzero or significant loadings with only a few factors, if possible with only one.
• The rotation is called orthogonal rotation if the axes are maintained at right angles.
Rotate Factors
• The most commonly used method of rotation is the varimax procedure. This is an orthogonal method of rotation that minimizes the number of variables with high loadings on a factor, thereby enhancing the interpretability of the factors. Orthogonal rotation results in factors that are uncorrelated.
• The rotation is called oblique rotation when the axes are not maintained at right angles, and the factors are correlated. Sometimes, allowing for correlations among factors can simplify the factor pattern matrix. Oblique rotation should be used when factors in the population are likely to be strongly correlated.
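If the analysis is run outside SPSS, varimax rotation is available, for example, through scikit-learn's FactorAnalysis (version 0.24 or later). The snippet below is a minimal sketch using the same hypothetical six-variable placeholder data as before:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(size=(80, 6))   # placeholder data matrix
Z = StandardScaler().fit_transform(X)

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(Z)
rotated_loadings = fa.components_.T     # variables x factors, after varimax rotation
print(np.round(rotated_loadings, 3))    # look for each variable loading highly on one factor
```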
Factor Matrix Before and After Rotation
Determine the Model Fit
• The correlations between the variables can be deduced or reproduced from the estimated correlations between the variables and the factors.
• The differences between the observed correlations (as given in the input correlation matrix) and the reproduced correlations (as estimated from the factor matrix) can be examined to determine model fit. These differences are called residuals.
• Residuals are the differences between the observed correlations, as given in the input correlation matrix, and the reproduced correlations, as estimated from the factor matrix.
Interpret Factors
Applications of Factor Analysis in other Techniques
• Multiple regression – Factor scores can be used in place of the independent variables in a multiple regression estimation. This way we can overcome the problem of multicollinearity.
• Simplifying the discriminant solution – A number of independent variables in a discriminant model can be replaced by a set of manageable factors before estimation.
• Simplifying the cluster analysis solution – To make the data manageable, the variables selected for clustering can be reduced to a more manageable number using factor analysis, and the obtained factor scores can then be used to cluster the objects/cases under study.
• Perceptual mapping in multidimensional scaling – The factors that result from a factor analysis can be used as dimensions, with the factor scores as the coordinates, to develop attribute-based perceptual maps where one is able to comprehend the placement of brands or products according to the identified factors under study.
Factor Analysis
• Factor scores can be used in place of the independent variables in a multiple regression estimation. This way we can overcome the problem of multicollinearity.
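As a rough illustration of this idea, the sketch below replaces a set of correlated predictors with their factor scores and then regresses the dependent variable on those scores. The arrays `X` and `y` are hypothetical placeholders:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))    # placeholder: 6 (possibly correlated) predictors
y = rng.normal(size=100)         # placeholder dependent variable

Z = StandardScaler().fit_transform(X)
scores = FactorAnalysis(n_components=2).fit_transform(Z)    # factor scores per respondent

model = LinearRegression().fit(scores, y)    # regress y on the factor scores
print(model.coef_, model.score(scores, y))   # coefficients and R-squared
```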
How to Detect These Issues?
1. Detecting Multicollinearity
• Variance Inflation Factor (VIF): if VIF > 5 (or 10 in some cases) → high multicollinearity.
• Correlation Matrix: if two independent variables have a high correlation (> 0.7 or 0.8), multicollinearity is likely.
• Eigenvalues & Condition Index: a condition index > 30 indicates severe multicollinearity.
2. Detecting Autocorrelation
• Durbin-Watson Test: values close to 2 → no autocorrelation; < 1.5 → positive autocorrelation (errors follow a pattern); > 2.5 → negative autocorrelation (errors oscillate).
• Residual Plot: if residuals show a clear pattern over time, autocorrelation is present.
• Ljung-Box Test: tests whether the residuals are independent; if the p-value < 0.05, autocorrelation exists.
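Both diagnostics are available in statsmodels. The sketch below computes VIFs for a hypothetical predictor matrix and the Durbin-Watson statistic for the residuals of an OLS fit; the data frame `df` and its column names are assumptions made for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x1", "x2", "x3"])   # placeholder predictors
df["y"] = df["x1"] + 0.5 * df["x2"] + rng.normal(size=100)                 # placeholder response

X = sm.add_constant(df[["x1", "x2", "x3"]])
vifs = [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])]
print(vifs)                       # VIF > 5 (or 10) flags multicollinearity

ols = sm.OLS(df["y"], X).fit()
print(durbin_watson(ols.resid))   # values near 2 -> no autocorrelation
```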
Difference Between R² and Adjusted R²
• Both R² (Coefficient of Determination) and Adjusted R² measure the goodness of fit in regression analysis, but they differ in how they handle the number of predictors.
• Key Takeaway
• R² tells how well the model explains the data but does not consider the number of predictors.
• Adjusted R² adjusts for unnecessary predictors, making it more reliable for multiple regression models.
• Example
• If you run a regression model:
• R² = 0.85 → the model explains 85% of the variance in the dependent variable.
• Adjusted R² = 0.80 → after accounting for unnecessary predictors, the model effectively explains 80%.
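To make the adjustment explicit: adjusted R² penalizes R² for the number of predictors k relative to the sample size n. A short worked calculation follows; the n = 30 and k = 5 figures are illustrative assumptions, not values from the slide's example:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adjusted_r2(r2=0.85, n=30, k=5), 3))   # ~0.819 for 30 cases and 5 predictors
```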
Overcoming the problem of multicollinearity
• Using Factor Analysis
Cluster Analysis
Cluster Analysis
• Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters. Objects in each cluster tend to be similar to each other and dissimilar to objects in the other clusters. Cluster analysis is also called classification analysis or numerical taxonomy.
• The advantage of the technique is that it is applicable to both metric and non-metric data.
• The grouping can be done post hoc, i.e., after the primary data survey is over. The technique has wide applications in all branches of management; however, it is most often used for market segmentation analysis.
• Both cluster analysis and discriminant analysis are concerned with classification. However, discriminant analysis requires prior knowledge of the cluster or group membership for each object or case included, to develop the classification rule. In contrast, in cluster analysis there is no a priori information about the group or cluster membership for any of the objects. Groups or clusters are suggested by the data, not defined a priori.
Cluster analysis – basic tenets
• Can be used to cluster objects, individuals and entities
• Similarity is based on multiple variables
• Measures proximity between study variables
• Objects grouped in one cluster are homogeneous compared with those in other clusters
• Can be conducted on metric, non-metric as well as mixed data
Usage
• Market segmentation – customers/potential customers can be split into smaller, more homogeneous groups by using the method.
• Segmenting industries – the same grouping principle can be applied for industrial consumers.
• Segmenting markets – cities or regions with similar or common traits can be grouped on the basis of climatic or socio-economic conditions.
Statistics associated with cluster analysis
• Metric data analysis – the distance between two objects is measured by the Euclidean distance:

  d_ij = √( Σ_{k=1}^{3} (X_ik − X_jk)² )

Where,
• d_ij = distance between person i and j
• k = variable (interval / ratio)
• i = object
• j = object
• Non-metric data
• Simple matching coefficient = (P + N) / (P + N + M)
• Jaccard coefficient = P / (P + M)
Where
• P = positive matches
• N = negative matches
• M = mismatches
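A minimal sketch of both kinds of proximity measure, using made-up profiles for two respondents (metric ratings for the Euclidean distance, binary yes/no answers for the matching coefficients):

```python
import numpy as np

# Metric profiles on 3 interval-scaled variables
x_i = np.array([5.0, 3.0, 4.0])
x_j = np.array([2.0, 4.0, 1.0])
euclidean = np.sqrt(((x_i - x_j) ** 2).sum())    # d_ij as defined above

# Binary (yes/no) profiles on 6 attributes
b_i = np.array([1, 1, 0, 0, 1, 0])
b_j = np.array([1, 0, 0, 1, 1, 0])
P = int(((b_i == 1) & (b_j == 1)).sum())   # positive matches
N = int(((b_i == 0) & (b_j == 0)).sum())   # negative matches
M = int((b_i != b_j).sum())                # mismatches

simple_matching = (P + N) / (P + N + M)
jaccard = P / (P + M)
print(euclidean, simple_matching, jaccard)
```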
Statistics Associated with Cluster Analysis
• Agglomeration schedule. A hierarchical method provides information on the objects, starting with the most similar pair and then, at each later stage, on the object joining the pair. An agglomeration schedule gives information on the objects or cases being combined at each stage of a hierarchical clustering process.
• ANOVA table: the univariate or one-way ANOVA statistics for each clustering variable. The higher the ANOVA value, the greater the difference between the clusters on that variable.
Statistics Associated with Cluster Analysis
• Cluster centroid. The cluster centroid is the mean value of the variables for all the cases or objects in a particular cluster.
• Cluster centers. The cluster centers are the initial starting points in nonhierarchical clustering. Clusters are built around these centers, or seeds.
• Cluster membership. Cluster membership indicates the cluster to which each object or case belongs.
• Dendrogram: a tree-like diagram used to graphically present the cluster results. The vertical axis represents the objects and the horizontal axis the inter-respondent distance. The figure is read from left to right.
• Distances between final cluster centres: these are the distances between the individual pairs of clusters. A robust solution that is able to demarcate the groups distinctly is one where the inter-cluster distance is large; the larger the distance, the more distinct are the clusters.
Statistics Associated with Cluster Analysis
• Entropy group: the individuals or small groups that do not seem to fit into any cluster.
• Hierarchical methods: a step-wise process that starts with the most similar pair and formulates a tree-like structure composed of separate clusters.
• Non-hierarchical methods: cluster seeds or centres are the starting points, and one builds individual clusters around them based on some pre-specified distance from the seeds.
Statistics Associated with Cluster Analysis
• Proximity matrix: a data matrix that consists of pair-wise distances/similarities between the objects. It is an N × N matrix, where N is the number of objects being clustered.
• Summary: the number of cases in each cluster, reported in the non-hierarchical clustering method.
• Icicle diagram: quite similar to the dendrogram, it is a graphical method to demonstrate the composition of the clusters.
Conducting Cluster Analysis
Cluster Analysis
1. When to Use Hierarchical Clustering
• Hierarchical clustering is best when:
• You have a small to medium dataset (less than a few thousand observations).
• You don’t know the optimal number of clusters beforehand.
• You want a visual representation (dendrogram) of how clusters are formed.
• Your data has a nested or hierarchical structure.
• You need interpretability, as it provides a hierarchy of clusters.
• Example of Hierarchical Clustering Usage:
• A hospital wants to group patients based on their medical conditions and health profiles. Since they are unsure how many clusters exist, they use hierarchical clustering to build a dendrogram, which helps them decide on meaningful patient groups.
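A minimal Python sketch of this workflow using SciPy: Ward's method on a hypothetical standardized data matrix, with a dendrogram to help judge the number of clusters (loosely mirroring the SPSS steps listed later in the deck):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.default_rng(4).normal(size=(20, 6))   # placeholder: 20 respondents, 6 variables

Z = linkage(X, method="ward")    # agglomeration schedule (Ward's method, Euclidean distance)
dendrogram(Z)                    # visual aid for choosing the number of clusters
plt.xlabel("Respondent"); plt.ylabel("Distance")
plt.show()

labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 clusters
print(labels)
```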
Cluster Analysis
2. When to Use K-Means Clustering
• K-Means clustering is best when:
• You have a large dataset (thousands or millions of observations).
• You already have an idea of the number of clusters (K).
• You need a fast and efficient clustering algorithm.
• Your data is spherically distributed (round-shaped clusters).
• You want to apply clustering in real-time applications.
Example of K-Means Clustering Usage:
• An e-commerce company wants to segment its customers based on their purchase behavior. They suspect there are 3 or 4 customer segments and use K-Means clustering to quickly divide them into clusters like:
• Frequent Buyers
• Discount Seekers
• Occasional Shoppers
• High-Value Customers
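And the corresponding K-Means sketch with scikit-learn, again on a hypothetical feature matrix; K = 4 here simply matches the four segments named above:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(5).normal(size=(500, 4))   # placeholder purchase-behavior features
Z = StandardScaler().fit_transform(X)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(Z)
print(km.cluster_centers_)        # final cluster centres
print(np.bincount(km.labels_))    # number of cases in each cluster
```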
Exercise: two – Cluster Analysis class exercise 1
• Case Study – Shopping Mall
• A significant value denotes a large difference between the clusters on that particular variable.
Perceptual Maps
SPSS Windows
To select this procedure using SPSS for Windows, click:
Analyze>Classify>Hierarchical Cluster …
Analyze>Classify>K-Means Cluster …
Analyze>Classify>Two-Step Cluster …
SPSS Windows: Hierarchical Clustering (1 of 2)
1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then HIERARCHICAL CLUSTER.
3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the VARIABLES box.
4. In the CLUSTER box, check CASES (default option). In the DISPLAY box, check STATISTICS and PLOTS (default options).
5. Click on STATISTICS. In the pop-up window, check AGGLOMERATION SCHEDULE. In the CLUSTER MEMBERSHIP box, check RANGE OF SOLUTIONS. Then, for MINIMUM NUMBER OF CLUSTERS, enter 2 and for MAXIMUM NUMBER OF CLUSTERS, enter 4. Click CONTINUE.
SPSS Windows: Hierarchical Clustering (2 of 2)
6. Click on PLOTS. In the pop-up window, check DENDROGRAM. In the ICICLE box, check ALL CLUSTERS (default). In the ORIENTATION box, check VERTICAL or HORIZONTAL. Click CONTINUE.
7. Click on METHOD. For CLUSTER METHOD, select WARD’S METHOD. In the MEASURE box, check INTERVAL and select SQUARED EUCLIDEAN DISTANCE. Click CONTINUE.
8. Click OK.
SPSS Windows: K-Means Clustering
1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then K-MEANS CLUSTER.
3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the VARIABLES box.
4. For NUMBER OF CLUSTERS, select 3.
5. Click on OPTIONS. In the pop-up window, in the STATISTICS box, check INITIAL CLUSTER CENTERS and CLUSTER INFORMATION FOR EACH CASE. Click CONTINUE.
6. Click OK.
SPSS Windows: Two-Step Clustering
1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then TWO-STEP CLUSTER.
3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the CONTINUOUS VARIABLES box.
4. For DISTANCE MEASURE, select EUCLIDEAN.
5. For NUMBER OF CLUSTERS, select DETERMINE AUTOMATICALLY.
6. For CLUSTERING CRITERION, select AKAIKE’S INFORMATION CRITERION (AIC).
7. Click OK.