Comparison of Segmentation Approaches

By Beth Horn and Wei Huang

You attended the alignment meeting with all key stakeholders, during which business and research objectives were thoroughly discussed. All agreed that segmentation was the appropriate research approach to fulfill your goals.

Qualitative research was conducted to illuminate the end-user’s experience with the product or service. Insightful questionnaire items were constructed and implemented in the quantitative survey. The survey was fielded to a sufficient sample of respondents.

The analysis was conducted. The results were reported, and the stakeholders are happy. The project was a success, but you are left wondering whether the best segmentation approach was used. What if another approach had been implemented? How would the segments have differed? Which segmentation method would have been most appropriate to use?

As we consider these questions, let’s review some popular approaches to segmentation.

Overview of Selected Segmentation Approaches
Segmentation approaches range from throwing darts at the data, to human judgment, to advanced cluster modeling. We will explore four such methods: factor segmentation, k-means clustering, TwoStep cluster analysis, and latent class cluster analysis.

Factor Segmentation
Factor segmentation is based on factor analysis. The first step is to factor-analyze the attributes, that is, to form groups of attributes that express a common theme. The number of factors is determined using a combination of statistics and knowledge of the category. Once the number of factors has been determined, each respondent receives a score for each of the factors. Respondents are then assigned to the factor on which they have the highest score.

K-Means Clustering
This method attempts to identify similar groups of respondents based on selected characteristics. Like most segmentation techniques, k-means clustering requires the analyst to specify the desired number of clusters or segments. During the procedure, the distance of each respondent from each cluster center is calculated, and each respondent is assigned to the cluster with the nearest center. The cluster centers are then recomputed, and the procedure repeats until the within-cluster distances can no longer be reduced, which is equivalent to maximizing the separation between clusters (or until another specified criterion is reached). In the final solution, each respondent belongs to the cluster with the nearest center.
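The assign-and-update loop of k-means described above can be sketched in a few lines. What follows is a minimal Python sketch of the standard (Lloyd-style) k-means iteration, assuming numeric inputs already on a common metric such as z-scores; it is not the SAS FASTCLUS or SPSS implementation, and the function name `kmeans` and its parameters are illustrative only. The seeded choice of starting centers is one reason why different seeds or record orderings can produce different solutions.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative Lloyd-style k-means on a list of equal-length numeric tuples.

    Assumes inputs are already standardized (e.g., z-scores).
    Returns (centers, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    # Start from k distinct records chosen with a fixed seed; a different
    # seed or record order can lead to a different final solution.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each respondent goes to the nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the procedure has converged
        labels = new_labels
        # Update step: move each center to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Two obviously separated groups of hypothetical respondents (two inputs each).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(pts, k=2, seed=0)
print(labels)
```

Rerunning with several `seed` values (or with the records shuffled) and comparing the resulting labels is a cheap robustness check of the kind the authors recommend for k-means.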

1.817.640.6166 or 1.800. ANALYSIS • www.decisionanalyst.com

Copyright © 2016 Decision Analyst. All rights reserved.


The procedure provides statistics that indicate how well each variable differentiates the segments. K-means is simple to execute because most statistical software packages include this procedure, and it can be used with a large number of respondents or data records.

TwoStep Cluster Analysis
TwoStep cluster analysis is based on hierarchical clustering (SPSS Inc., 2001; Zhang, et al., 1996; and Chiu et al., 2001). The algorithm identifies groups of cases that exhibit similar response patterns. Typically, cases are assigned to the cluster with the nearest center. However, the analyst can also specify a noise percentage (cases that do not belong to any cluster). Segment membership is then determined by the distance of the respondent from the closest nonnoise cluster and from the noise cluster. Respondents who are nearest to the noise cluster are considered outliers.

The algorithm contains two stages: (1) preclustering and (2) hierarchical clustering. The precluster stage groups the respondents into several small clusters. The cluster stage uses the small clusters as input and groups them into larger clusters. Based on well-defined statistics, the procedure can automatically select the optimal number of clusters given the input variables. The algorithm is able to handle both continuous and categorical segmentation variables.

Latent Class Cluster Analysis
Latent class cluster analysis uses probability modeling to maximize the overall fit of the model to the data. The model can identify patterns in multiple dependent variables (such as attitudes and needs) and quantify the correlation of dependent variables with related variables (such as buying behaviors). For each survey respondent, the analysis delivers the probability of belonging to each cluster (segment). Respondents are assigned to the cluster to which they have the highest probability of belonging.

This method includes statistics to guide the analyst in selecting the optimal number of clusters, and it can incorporate segmentation variables of mixed metrics. Latent class cluster analysis can include respondents who have missing values for some of the dependent variables, which reduces the rate of misclassification (assigning consumers or businesses to the wrong segment).

Comparison of Segmentation Methods Based on Actual Data
A head-to-head comparison was devised to more fully understand the advantages and disadvantages of each segmentation approach discussed: factor segmentation, k-means cluster analysis, TwoStep cluster analysis, and latent class cluster analysis. The data set consisted of 4,156 respondents from Health and Nutrition Strategist™ (HANS™), a Decision Analyst syndicated research study. The data were collected online in 2006 using a U.S. nationally representative sample from the American Consumer Opinion® online panel.

Table 1: Segmentation Items

Attribute Battery: How satisfied are you currently with each of the following things in your life? (Each item was rated on a three-point scale: not satisfied, somewhat satisfied, and completely satisfied.)

1. Amount of exercise I get
2. My current weight
3. My breakfast choices
4. My circle of friends
5. Clothes in my closet
6. My coworkers
7. My dinner choices
8. My faith
9. My financial situation
10. My fitness level
11. My health
12. My hobbies or leisure activities
13. My home
14. My home’s yard or landscaping
15. My job or livelihood
16. My last vacation
17. My level of education
18. My level of energy
19. My level of happiness
20. My lifestyle
21. My lunch choices
22. My reflection in the mirror
23. My security and personal safety
24. My social activities
25. My spouse (or significant other or close friend)
26. Community I live in
27. My success at following a diet
28. My travel opportunities
29. Vehicle I drive

Related Items

| Item | Question | Scale Rated |
|---|---|---|
| 30 | How would you describe your physical health overall? | Excellent, Very good, Good, Fair, Poor |
| 31 | How would you describe your emotional health overall? | Excellent, Very good, Good, Fair, Poor |
| 32 | How would you describe the level of stress in your life? | A lot of stress, Moderate stress, Minor stress, No stress |
| 33 | How would you best describe the quality of your diet (i.e., what you eat and drink) overall? | Very healthy, Somewhat healthy, Somewhat unhealthy, Very unhealthy |

We selected an attribute battery containing 29 items plus an additional four items (overall physical health, overall emotional health, level of stress, and overall quality of diet). Each item in the attribute battery related to satisfaction with components of the respondent’s life and was rated on a three-point satisfaction scale (not satisfied, somewhat satisfied, and completely satisfied). The four additional items were rated on either 4-point or 5-point categorical scales. The segmentation items appear in Table 1.

A factor score was computed for each respondent for each of the five factors from Table 2 on page 4 using the regression method. Factor scores are standardized values with a mean of zero and a standard deviation of one. Higher factor scores indicate that the respondents are more satisfied with the items in the factor or have rated the items in the factor more positively.

Each respondent was then assigned to the factor for which he or she had the highest (most positive) score. The results of the factor segmentation classification are shown in Table 2 on page 4.

Table 2: Factor Segmentation, Average Factor Scores by Segment

| Factor / Segment | Fitness | Home and Work Environment | Social Support | Diet | Health |
|---|---|---|---|---|---|
| Percent of Respondents | 25% | 23% | 18% | 19% | 21% |
| Fitness | 0.984 | -0.450 | -0.419 | -0.271 | -0.212 |
| Home and Work Environment | -0.166 | 0.872 | -0.087 | -0.204 | -0.256 |
| Social Support | -0.184 | -0.135 | 0.906 | -0.233 | -0.237 |
| Diet | -0.121 | -0.272 | -0.252 | 0.935 | -0.262 |
| Health | -0.114 | -0.305 | -0.283 | -0.326 | 0.931 |

Note: The values in the table are standard normal scores (z-scores), which have a standard deviation of one; most fall between -1 and +1. A higher factor score indicates higher levels of satisfaction with the items contained within the factor.

Factor Segmentation Conclusions
An advantage of this segmentation method is that the results are very clear. The respondents in the “Fitness” segment have the highest standardized score on the “Fitness” factor across all segments. We can say that these respondents are satisfied with the attributes of the “Fitness” factor (such as my current weight and my fitness level) but not as satisfied with Home and Work Environment, Social Support, Diet, and Health. A similar pattern emerges across all segments. Another plus is that the method is relatively simple to execute, as most statistical software packages perform factor analysis.

As an artifact of the method, respondents tend to have a high score on the one factor that describes the segment to which they have been assigned and low scores on the other factors. This may not be realistic. For example, we can probably think of people we know who are satisfied with both Fitness and Social Support, or with both Diet and Health, or who are dissatisfied with all five factors. Factor segmentation might fail to capture the multifaceted nature of consumers.

K-Means Cluster Analysis
This method can use as input factor scores (such as those developed using factor analysis), the individual attributes, or a combination of both. In this paper, the 33 individual attributes were used as the segmentation variables.

Because k-means does not handle variables of different scales very well, the individual attributes were transformed into a common metric, the z-score. These standardized scores have a mean of zero and a standard deviation of one. The higher a variable’s score, the higher the actual rating on that particular variable. These standardized attributes were then used as input into a k-means procedure.

The algorithm is affected by the order of the records in the data set; thus, various seed numbers and sorting schemes were explored. A five-cluster solution was selected in which many of the attributes’ standard scores were significantly different across the clusters. To aid interpretation, the clusters (segments) were named.

Unlike factor segmentation, k-means clustering will often reveal segments of respondents who are highly satisfied or dissatisfied on more than one attribute dimension. To further illustrate, factor scores were calculated for each of the k-means clusters.

In Table 3 on page 5, we can see that members of the Satisfied With Environment But Not With Fitness segment are satisfied with Home and Work Environment and Social Support but are not satisfied with their Fitness. Members of the Ultra Satisfied With Life segment are satisfied with everything, but especially with their Fitness and Diet.

Table 3: K-Means, Average Factor Scores by Segment

| Factor / Segment | Ultra Dissatisfied With Life | Dissatisfied With Fitness & Health | Satisfied With Fitness But Not With Environment | Satisfied With Environment But Not With Fitness | Ultra Satisfied With Life |
|---|---|---|---|---|---|
| Percent of Respondents | 16% | 23% | 26% | 19% | 15% |
| Fitness | -0.433 | -0.802 | 0.563 | -0.315 | 1.121 |
| Home and Work Environment | -0.712 | 0.039 | -0.389 | 0.623 | 0.564 |
| Social Support | -0.854 | 0.097 | -0.349 | 0.673 | 0.491 |
| Diet | -0.615 | -0.144 | -0.059 | 0.216 | 0.695 |
| Health | -0.627 | -0.342 | 0.169 | 0.258 | 0.566 |

Note: The values in the table are standard normal scores (z-scores), which have a standard deviation of one; most fall between -1 and +1. A higher factor score indicates higher levels of satisfaction with the items contained within the factor.

K-Means Cluster Analysis Conclusions
K-means cluster analysis overcomes one of the potential shortfalls of factor segmentation by describing the multidimensionality of attitudes and behaviors. Consumers can be satisfied or dissatisfied with more than one lifestyle area, for example. K-means also offers F-statistics that provide information about each attribute’s contribution to differentiating the clusters. These statistics can be used to simplify the segmentation by allowing the analyst to omit attributes that have a small impact on the cluster solution.

K-means, though, assumes that all underlying variables are continuous (interval-level data). Segmentation inputs that are count, ordinal, or ranked variables are not appropriate. Such attributes must be transformed to a common metric before clustering.

Another disadvantage of k-means is that the outcome is affected by the order of the data records. Various ordering schemes can be explored to test the robustness of the k-means solutions.

K-means also requires the analyst to specify the number of clusters desired. In some statistical packages, the procedure provides limited statistics to guide the analyst in identifying the optimal number of clusters. For example, the FASTCLUS procedure in SAS® (SAS Institute Inc., 2008) prints the approximate expected overall R² and the cubic clustering criterion, which can be used to evaluate cluster solutions. Unfortunately, both statistics are rendered useless if the segmentation inputs are correlated (which is true in many cases). In the end, the analyst must use additional statistical testing, plotting of differences among the attributes across clusters, and a good dose of personal judgment to arrive at the optimal segmentation solution.

TwoStep Cluster Analysis
Factor scores or individual attributes can serve as input into TwoStep cluster analysis. Additionally, TwoStep can handle categorical variables, such as demographics (e.g., gender, ethnicity) or items rated on a satisfaction scale. For the current analysis, the 33 individual attributes, classified as categorical, were used as the segmentation variables.

To determine the number of clusters, the analyst can either specify the number or have the procedure select it based on the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC). There is also a provision for handling respondents who do not meet the criteria for inclusion in any cluster.
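The BIC and AIC mentioned above both trade goodness of fit against model complexity, and lower values are preferred. As a minimal sketch, assuming a candidate solution’s maximized log-likelihood and parameter count are already available, the standard definitions are BIC = -2 ln L + p ln n and AIC = -2 ln L + 2p, where L is the likelihood, p the number of parameters, and n the number of respondents. SPSS TwoStep computes its own likelihood for mixed continuous and categorical inputs, which is not reproduced here; the helper names and the example numbers below are hypothetical and only show how the criteria rank candidate numbers of clusters.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: -2 ln L + p ln n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 ln L + 2p (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def pick_k(fits, n_obs, criterion="bic"):
    """Hypothetical helper: fits maps number of clusters k -> (log_likelihood, n_params).

    Returns the k whose fit has the lowest BIC (or AIC).
    """
    def score(k):
        ll, p = fits[k]
        return bic(ll, p, n_obs) if criterion == "bic" else aic(ll, p)

    return min(fits, key=score)


# Illustrative numbers only, not results from this study.
fits = {2: (-5000.0, 10), 3: (-4900.0, 15), 4: (-4890.0, 20)}
print(pick_k(fits, n_obs=1000, criterion="bic"))  # 3
print(pick_k(fits, n_obs=1000, criterion="aic"))  # 4
```

Because BIC’s per-parameter penalty, ln n, exceeds AIC’s constant 2 for any sample of eight or more respondents, BIC tends to favor solutions with fewer clusters, as the two calls above illustrate.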

Table 4: TwoStep Cluster, Average Factor Scores by Segment

| Factor / Segment | Ultra Dissatisfied With Life | Dissatisfied With Fitness & Health | Satisfied With Fitness But Not With Environment | Satisfied With Environment But Not With Fitness | Ultra Satisfied With Life |
|---|---|---|---|---|---|
| Percent of Respondents | 10% | 30% | 28% | 24% | 8% |
| Fitness | -0.466 | -0.749 | 0.450 | 0.173 | 1.355 |
| Home and Work Environment | -0.733 | -0.057 | -0.265 | 0.465 | 0.727 |
| Social Support | -0.970 | -0.024 | -0.278 | 0.596 | 0.547 |
| Diet | -0.778 | -0.171 | -0.102 | 0.414 | 0.796 |
| Health | -0.747 | -0.330 | 0.142 | 0.332 | 0.729 |

Note: The values in the table are standard normal scores (z-scores), which have a standard deviation of one; most fall between -1 and +1. A higher factor score indicates higher levels of satisfaction with the items contained within the factor.

These “outlier” respondents are grouped together so that shown in Table 4. The five segments were assigned
they can be excluded from further profiling. the same names used in the k-means profile to aid
comparison.
The number of clusters produced by each procedure
was intended to be the same to facilitate comparisons The profile of the cluster produced by TwoStep was
among methods. Yet the automatic determination of similar to the profile of the clusters developed by
clusters was implemented in TwoStep to identify what k-means. For example, both profiles showed a segment
the “optimal” statistical solution might be, assuming no of respondents, Ultra Satisfied With Life, whose
outliers. The optimal number of clusters ranged from two members are happy with most aspects of life, and
to three, based on different orderings of the records in another segment, Ultra Dissatisfied With Life, whose
the data file. members are woefully depressed.

A five-cluster solution, in contrast, produced more As shown in Table 4, TwoStep also reveals segments of
interesting differentiation among the clusters. TwoStep respondents who are satisfied or dissatisfied on more
provides statistics (chi-square statistics for categorical than one factor. Respondents who are in the Satisfied
variables and t-statistics for continuous variables) that With Fitness But Not With Environment segment, for
quantify the relative contribution of each variable to the example, are satisfied with Fitness, but dissatisfied with
formation of a cluster. In the five-cluster solution, all Home and Work Environment and Social Support.
except five of the attributes were significant contributors. Members of the Ultra Dissatisfied With Life segment
Using this information, we omitted the five attributes (my are very unhappy with everything.
faith, my last vacation, my spouse [or significant
other or close friend], community I live in, and TwoStep Cluster Analysis Conclusions
vehicle I drive) and ran the analysis again to refine the TwoStep cluster analysis has advantages versus the
segmentation solution. The profile of the segments is methods previously discussed. One advantage deals

6 Decision Analyst: Comparison of Segmentation Approaches Copyright © 2016 Decision Analyst. All rights reserved.
with the range of cluster sizes. Factor segmentation used as a starting point for further consideration as the
and k-means tend to produce clusters that are very analyses proceed with additional clusters.
similar in size, as shown previously (ranging from 15%
Overall, TwoStep represents a mathematical
to 26%). TwoStep yielded clusters that had a larger size
improvement over factor segmentation and k-means with
range (8% to 30%). Having a segmentation solution that
handling of categorical variables and providing statistics
contains clusters of different sizes has more face validity.
to guide in determining the number of clusters.
For example, we could imagine that consumers who are
really happy with life and those who are very unhappy
Latent Class (LC) Cluster Analysis
with life comprise a smaller group than those who are
LC cluster analysis, as implemented by Latent GOLD®
more middle-of-the-road.
4.5 (Statistical Innovations Inc., 2008), allows the analyst

Another advantage is that TwoStep can use variables to select any number of segmentation inputs or indicators

that have differing scale types. Factor segmentation and covariates (such as demographics) for the model.

and k-means cannot treat variables as categorical; the The indicators are dependent variables that are used

variables must be considered continuous or transformed to define or measure the latent classes in an LC cluster

in some manner (i.e., standard score). In TwoStep, model. They are the primary drivers that determine the

though, categorical attributes can be specified as segmentation. The secondary drivers are the covariates,

such. This can encourage better separation among the which can be demographics or critical outcome

segments and easier interpretation of the results. variables, such as purchase intent for a new product.
Covariates can be treated as either active (allowed to
Yet there are disadvantages to the TwoStep method. influence the clustering) or inactive (serve as profiling
Like k-means clustering, TwoStep is influenced by the variables only) in the analysis.
order of the records in the data set. Sorting the data
Segment solutions for two different model structures
records in several ways can help the analyst understand
are reported. The first model used the 29 satisfaction
how the cluster profiles change with different orderings.
attributes as indicators, and (the four additional items
In addition, respondents with any missing values are overall physical health, overall emotional health, level of
excluded from the analysis altogether. This could stress, and overall quality of diet) as active covariates.
decrease the sample size available for segmentation if In the second model the 29 satisfaction attributes were
a large number of respondents skip or refuse to answer considered covariates, while the other four variables
critical segmentation questions. became nominal indicators. (Transformation of the
data is not needed in LC cluster analysis; the model
TwoStep gives some guidance as to the optimal
treats each variable according to its own type—nominal,
number of clusters via the BIC and AIC, whereas
ordinal, count, rank, and continuous.)
factor segmentation and k-means do not. However,
in this paper and in the experience of the authors, the Similar to TwoStep cluster, LC cluster analysis provides
automatic-clustering routine yields too few clusters and a set of cluster model selection tools, including the BIC.
is not usually useful. However, the AIC or BIC can be Statistically, the lower the BIC, the better the model
describes the data. The BIC value was still decreasing

Copyright © 2016 Decision Analyst. All rights reserved. Decision Analyst: Comparison of Segmentation Approaches 7
Table 5: LC Cluster Analysis Approach 1—Average Factor Scores by Segment
Segments
Satisfied with Satisfied With
Ultra Dissatisfied Fitness But Environment
Dissatisfied With Fitness & Not With But Not With Ultra Satisfied
With Life Health Environment Fitness With Life
Percent of Respondents 14% 23% 30% 21% 12%
Fitness -0.439 -0.882 0.502 -0.100 1.182
Home and Work Environment -0.783 0.084 -0.357 0.539 0.673
Social Support -0.909 0.105 -0.352 0.656 0.555
Diet -0.657 -0.173 -0.062 0.301 0.722
Health -0.594 -0.356 0.126 0.261 0.614
Note: The values in the table are standard normal scores (z-scores) that have a standard deviation of one and range from -1 to +1. A
higher factor score indicates higher levels of satisfaction with the items contained within the factor. Scores that are relatively high across
segments are highlighted in yellow. Scores that are relatively low across segments are highlighted in blue.

Table 6: LC Cluster Analysis Approach 2—Average Factor Scores by Segment


Segments
Satisfied With Satisfied With
Ultra Dissatisfied Fitness But Environment Ultra
Dissatisfied With Fitness & Not With But Not With Satisfied With
With Life Health Environment Fitness Life
Percent of Respondents 13% 22% 34% 13% 19%
Fitness -0.283 -0.582 0.193 -0.353 0.767
Home and Work Environment -0.325 -0.076 -0.113 0.383 0.255
Social Support -0.926 0.095 -0.249 0.957 0.321
Diet -0.350 -0.258 -0.005 0.181 0.428
Health -1.025 -0.636 0.194 0.139 1.000
Note: The values in the table are standard normal scores (z-scores) that have a standard deviation of one and range from -1 to +1. A
higher factor score indicates higher levels of satisfaction with the items contained within the factor. Scores that are relatively high across
segments are highlighted in yellow. Scores that are relatively low across segments are highlighted in blue.

Table 7: Cross-tabulation of LC Cluster Analysis Approach 2 With Approach 1


LC Cluster Analysis Approach 2 — Model includes the four variables that measures health, stress, and
diet as indicators and the 29 satisfaction attributes as active covariates.
Satisfied With Satisfied With
Ultra Dissatisfied Fitness But Environment Ultra
Dissatisfied With Fitness Not With But Not With Satisfied With
With Life & Health Environment Fitness Life
LC Cluster Ultra Dissatisfied 64% 19% 5% 0% 0%
Analysis With Life
Approach
Dissatisfied With
1 —Model 22% 62% 15% 15% 1%
includes the
Fitness & Health
29 satisfaction Satisfied With
attributes as Fitness But Not 13% 14% 59% 11% 19%
indicators and With Environment
the four variables
Satisfied With
that measure
Environment But 1% 6% 18% 65% 28%
health, stress,
Not With Fitness
and diet as active
covariates. Ultra Satisfied 0% 0% 3% 8% 52%
With Life

8 Decision Analyst: Comparison of Segmentation Approaches Copyright © 2016 Decision Analyst. All rights reserved.
for models that contained more than five clusters for on page 8), there is some overlap among segment
each of the three LC cluster models tested. Thus, membership (52% to 65%) between Latent Class
statistically, more than five clusters would be optimal Approach 1 and Approach 2. Yet classifying overall
for this data. To facilitate comparison with the other physical health, overall emotional health, level of
techniques reported in this paper, however, the five- stress, and overall quality of diet as indicators and
cluster model solution was selected for each of the LC classifying the satisfaction attributes as covariates
cluster models tested. (Approach 2) did yield segments with somewhat stronger
profiles than did Approach 1, especially in the Satisfied
LC Cluster Analysis—Approach 1 With Environment But Not With Fitness segment.
In this approach, overall physical health, overall
The Satisfied With Fitness But Not With Environment
emotional health, level of stress, and overall quality
segment is neither strongly satisfied nor dissatisfied in
of diet were used as active covariates in the model. The
any dimension. However, because these respondents
model’s covariates play a less important role (i.e., show
are moderately dissatisfied about their Social Support,
less differentiation among the segments) in the analysis
it indicates they could be on the verge of a downslide
than do the indicators (the 29 satisfaction attributes).
and might respond favorably to products/services that
Likewise the average scores for the factors in Table 5 on increase their emotional well-being. Satisfied With
page 8 are very similar to the factor scores shown for the Environment But Not With Fitness respondents have
k-means and TwoStep. the highest home and work satisfaction, yet they feel
their fitness level is lacking. These respondents might be

LC Cluster Analysis—Approach 2 career-oriented, for example, and desire fitness options

For the final variation on the LC cluster analysis, overall and products for weight loss that fit with their busy

physical health, overall emotional health, level of schedules.


stress, and overall quality of diet were considered
indicators in the cluster model, while the 29 attributes LC Cluster Analysis Conclusions
were active covariates. As shown in the cluster profile in LC cluster analysis has the most compelling
Table 6 on page 8, the segmentation solution using this methodological advantage in that it is based on
approach is similar to earlier solutions, especially to the probability modeling, unlike other segmentation methods
TwoStep; however, stronger, more pronounced profiles discussed in this paper. For this reason, one might
are evident. conclude that these segments are most likely to be
“real” and not just an interesting way of looking at the
For example, the Satisfied With Environment But Not
data. A model-based analysis allows the analyst to
With Fitness segment is much more decisively satisfied
find segments that have real linkages among attributes
with Social Support.
and behaviors with critical outcome measures, such as
As shown in the cross-tabulation of LC Cluster purchase intent or frequency of category usage. This
Analysis—Approach 2 With Approach 1 (Table 7 increases the likelihood that the resulting segments will
be useful for targeting. The model-based approach also

Copyright © 2016 Decision Analyst. All rights reserved. Decision Analyst: Comparison of Segmentation Approaches 9
yields for each respondent the probability of belonging to Implications for Marketing and
each segment. Respondents are assigned to the cluster Research
to which they have the highest probability of belonging. Within the confines of our empirical test, each

Indeed, respondents could be assigned to more than one segmentation method yielded a different segmentation

cluster, based on their probabilities. solution. Indeed, within the same method, different
variable classifications and ordering of data records can
The ability to consider segmentation inputs as either produce dissimilar solutions. Consider that there are
indicators or covariates allows the analyst to uncover even more techniques available with which to segment
potentially useful segments that may not be identified and endless permutations of variables that can be
using other methods. For example, in LC Cluster included in the analysis. The options are overwhelming.
Analysis—Approach 2, somewhat stronger segments
were found by modeling several overarching outcome
variables as covariates and attitudes as indicators.

LC cluster analysis provides model selection criteria,
as does TwoStep cluster analysis. Yet in our data,
TwoStep’s automatic cluster selection feature identified
two or three clusters as optimal, whereas LC cluster
analysis found that more than five clusters were
statistically optimal. Relying on TwoStep’s automatic
selection of clusters might lead the analyst to overlook
key marketing segments.

LC cluster analysis, however, can take longer to run
than other approaches, especially with data sets that
contain thousands of respondents. For large, complex
segmentation projects, the authors have experienced run
times of several hours on a high-speed computer. LC
cluster analysis also requires advanced knowledge of
statistics to help the analyst wade through its many
options. Because LC cluster analysis can handle so
many variables, it is tempting to add more segmentation
inputs than are really necessary. The analyst must guard
against the urge to place “everything but the kitchen sink”
into the model. Undue complexity makes the segmentation
solution more difficult to interpret.

Taking a step back, though, it can be helpful to consider
how the segmentation solution will be used before
selecting a technique. Each of the segmentation methods
discussed in this paper offers distinct benefits for
particular business objectives.

If the objective is marketing communications, factor
segmentation might be the approach to use. The
analysis is simple to execute, and the results are fairly
straightforward. Respondents are assigned to the
segment for which they have the highest factor score;
each segment is represented by one attitudinal or
behavioral theme. This makes it easier to target a
particular consumer group. Consumers in the Diet segment
might be targeted with a message such as “Product X is
a healthful lunch choice,” while consumers in the Fitness
segment might receive messages such as “Product X will
help you maintain optimal fitness.”

If the business objective is new product development,
it is vital to understand how consumers group together
according to needs. The three cluster analyses (k-means,
TwoStep, and latent class) are best at grouping
respondents according to their patterns of needs.
The resulting segments are based on multiple needs,
attitudes, and behaviors. Segments defined by various
need states allow product developers to create new
products or line extensions that can meet core needs of
consumers within a particular segment. Product Y might
be developed, for instance, to address several need
states among consumers in the Sort of Dissatisfied
With Life segment—improve health and fitness,
successfully follow a diet, and decrease weight.
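
A minimal sketch of the contrast drawn above, written in Python with only the standard library. The six respondents and their scores on three themes (diet, fitness, taste) are hypothetical toy values, not data from the study, and the columns are simply treated as if they were already-computed factor scores. The first function assigns each respondent to the theme with the highest score, as in factor segmentation; the second runs plain k-means (Lloyd’s algorithm) on the full score profile. The function names and the farthest-point seeding rule are illustrative assumptions, not the procedure or software used in this paper.

```python
from math import dist

# Hypothetical respondent scores on three illustrative themes
# (diet, fitness, taste); treated here as precomputed factor scores.
THEMES = ["diet", "fitness", "taste"]
SCORES = [
    [0.90, 0.80, 0.10],  # health-oriented respondents
    [0.80, 0.90, 0.20],
    [0.85, 0.75, 0.15],
    [0.10, 0.20, 0.90],  # taste-oriented respondents
    [0.20, 0.10, 0.80],
    [0.15, 0.25, 0.85],
]


def highest_score_segments(scores):
    """Factor-segmentation style: put each respondent in the theme
    (column index) on which he or she scores highest."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in scores]


def kmeans_segments(scores, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on the whole score profile."""
    # Farthest-point seeding (an illustrative choice): start from the first
    # respondent, then add the respondent farthest from all chosen centroids.
    centroids = [list(scores[0])]
    while len(centroids) < k:
        far = max(scores, key=lambda row: min(dist(row, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: dist(row, centroids[c]))
                      for row in scores]
        if new_labels == labels:  # converged: no respondent changed segment
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(scores, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    factor_labels = highest_score_segments(SCORES)
    print([THEMES[j] for j in factor_labels])
    # ['diet', 'fitness', 'diet', 'taste', 'taste', 'taste']
    print(kmeans_segments(SCORES, k=2))  # [0, 0, 0, 1, 1, 1]
```

With these toy inputs, the highest-score rule splits the three health-oriented respondents between the diet and fitness themes, while k-means with k = 2 keeps them together as one group defined by several needs at once. A real project would start from factor-analysis output and a justified choice of k.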

Once the appropriateness of various approaches has
been assessed (given the objectives of the research),
consider also the data, the strengths and limitations of
the techniques, and how market segments will be linked
to market outcomes.

Examine the data.
Are there many different types of scales represented
in your segmentation inputs? Select the method that
best accounts for the differences in variable types. Do
you have long attribute lists? Try factoring or other data
reduction methods to decrease the number of variables
that enter into the segmentation. There are countless
ways in which variables can be combined and factored.

Know the techniques.
We discussed four methods in this paper. There are
others as well, such as discriminant analysis, principal
components analysis, and so forth. Review the strengths
and weaknesses of each technique and understand the
software to which you have access.

Try more than one.
As illustrated in this paper, different solutions can be
found depending upon the underlying assumptions
of the techniques used. If one technique does not
produce a solution that seems usable, try another
for comparison.

Link segments to important market outcomes.
Some clients shy away from market segmentation
because previous research yielded groups that had weak
relationships with key measures, such as purchase intent
for new or existing products and messaging components
for promotion strategy. At the initial planning stage, it
is vital to understand which key metrics are important
to the client and to craft an analysis plan that includes
these metrics. In LC cluster analysis, for instance,
attitudinal and behavioral variables can be selected as
cluster model indicators, and new product purchase intent
and demographics can be covariates in the model.
Modeling the data in this way can increase the likelihood
that certain attitudes and behaviors are “linked” to
different levels of purchase intent. Such results can help
the client company determine which segments to target
first (groups that are likely to purchase the product) and
how to communicate with them.

Never forget the basics.
Segments need to be distinct on easily measured
variables; large enough to impact revenue; reachable
through marketing, advertising, and distribution; relatively
stable over time; and able to respond to targeted
marketing. If, for example, your client cannot locate
segment members to communicate with them, then the
segments are not useful. Segmentation solutions that
meet these criteria should be favored over other
solutions.

Although there can be a great deal of sophistication in
the analysis stage, segmentation is not a purely scientific
pursuit. Sadly, there are no magic buttons to press to
generate the “best” segments. Once the data have been
modeled with the most appropriate technique(s) available
and the basics have been addressed, category experience
and expert judgment are the final guides to selecting the
“best” segmentation solution.

Data Set
The dataset consisted of 4,156 respondents from
Decision Analyst’s Health and Nutrition Strategist™
research study. The data were collected online in 2006
using a nationally representative sample of adults
in the U.S. recruited from the American Consumer
Opinion® panel. The Health and Nutrition Strategist™
is a massive, integrated knowledge base of food and
beverage consumption, restaurant usage, health habits,
and nutritional trends.

References
T. Chiu, D. Fang, J. Chen, Y. Wang, and C. Jeris (2001).
A Robust and Scalable Clustering Algorithm for Mixed Type
Attributes in Large Database Environment. Proceedings of
the Seventh ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, San Francisco, CA:
ACM, pp. 263–268.

SAS Institute Inc. (2008). SAS/STAT® 9.2 User’s Guide.
Cary, NC: SAS Institute Inc.

SPSS Inc. (2001). The SPSS TwoStep Cluster Component:
A Scalable Component Enabling More Efficient Customer
Segmentation. Technical report, Chicago, IL.

Statistical Innovations, Inc. (2008). Latent GOLD® 4.5.
Belmont, MA.

T. Zhang, R. Ramakrishnan, and M. Livny (1996). BIRCH:
An Efficient Data Clustering Method for Very Large
Databases. Proceedings of the ACM SIGMOD Conference on
Management of Data, Montreal, Canada: ACM, pp. 103–114.

About the Author

Beth Horn is a Vice President at Decision Analyst. Wei Huang
is a Senior Statistical Analyst at Decision Analyst. The authors
may be reached at 1-800-262-5974 or 1-817-640-6166.

Decision Analyst is a leading international marketing research and analytical consulting firm. The company
specializes in advertising testing, strategy research, new product ideation, new product research, and
advanced modeling for marketing-decision optimization.

604 Avenue H East, Arlington, TX 76011-3100, USA
1.817.640.6166 or 1.800.ANALYSIS
www.decisionanalyst.com

Copyright © 2016 Decision Analyst. All rights reserved.
