DIY MaxDiff 2023 04 27
Who is this eBook for?
• Statistical analysis
• Reporting
When to use MaxDiff
A MaxDiff study involves presenting a sample of respondents with a series of questions, in which
each question contains a list of alternatives and the respondent is asked which alternative they like
the most (best) and which the least (worst). An example is shown below. The list of alternatives
changes from question to question.
MaxDiff is used to resolve two practical problems with traditional rating scales:
• Poor discrimination between alternatives, with respondents in surveys often rating multiple
alternatives as very important, or 10, on a 10-point scale
• Yeah-saying biases, which are a type of response bias, whereby some respondents typically
give much higher ratings than others
Consider the problem of working out what capabilities people would most like in the President of the
United States. Asking people to rate the importance of each of the following characteristics would
likely not be so useful. We all want a decent and ethical president. But we also want a president who
is healthy. And the President needs to be good in a crisis. We would end up with a whole lot of the
capabilities shown in the table on the next page being rated as 10 out of 10 important. Some people
may give an average rating of 9, whereas others may give an average rating of 5, just because they
differ in terms of how strongly they like to state things. It is for such problems that MaxDiff is ideal.
[Table of presidential capabilities, including Decent/ethical, Good in a crisis, Concerned about global warming, and Entertaining]
• A ranking of alternatives in order of preference. For example, if the study is being used for
product-concept testing, the goal is to work out the relative appeal of the concepts.
Creating a list of alternatives to be evaluated
Typically, MaxDiff is used to compare attributes, advertising messages, product claims, promotional offers, or brands, although in principle it can be used to compare any objects.
It is usually straightforward to come up with a list of brands to be compared. In a recent study where
we were interested in the relative preference for Google and Apple, we used the following list of
brands:
Apple
Google
Samsung
Sony
Microsoft
Intel
Dell
Nokia
IBM
Yahoo
The logic was to have a good cross-section of consumer software and hardware brands. Is it a good
list? We return to that in the discussion of context effects, below.
When conducting studies involving product concepts, the key is to ensure that short descriptions are
used, as a MaxDiff question becomes problematic if users cannot easily read and compare the
alternatives.
An example of attributes is the list of capabilities of a president listed in the previous chapter. More
commonly, MaxDiff studies involving attributes focus on preferences for the attributes of a product.
There are some mistakes to avoid when choosing alternatives for a MaxDiff study: having too many
alternatives, vaguely-worded alternatives, and context effects.
Too many alternatives
The more alternatives, the worse the quality of the resulting data. With more alternatives you have
only two choices: You can ask more questions, which increases fatigue and reduces the quality of the
data, or you can collect less data on each alternative, which reduces the reliability of the data. The
damage of adding alternatives grows the more alternatives that you have (e.g., the degradation of
quality from having 14 over 13 alternatives is greater than that of having 11 over 10). Techniques for
dealing with large numbers of alternatives are described in Designs with many alternatives.
Vague wording
Vaguely-worded alternatives are impossible to interpret. For example, if you are evaluating gender as an attribute, it is better to use specific levels, such as Male and Female, as separate alternatives. Otherwise, if the results show that ‘Gender’ is important, you will not know which gender was most appealing.
If you ask about attributes without specifying the level of the attribute, such as asking people about
the importance of price, the resulting data will be of limited meaning. For some people "price" will just
mean “not too expensive”, while others will interpret it as a deep discount. This can partially be
improved by nominating a specific price point, such as Price of $100, but even then, the attribute has
some ambiguity when it comes to interpretation time, as one respondent may have a reference price
of $110 and another a reference price of $90. A better way to address price may be in terms of
discounts (e.g., $10 discount). However, the issue of “Compared to what?” remains, which is why
choice modeling (rather than MaxDiff) is the methodology better suited to understanding trade-offs
between attributes with different levels.
Context effects
A context effect in MaxDiff occurs when the level of preference that a person feels towards an alternative is conditional upon the alternatives it is compared against. In the two studies described
in this eBook, there is a reasonable likelihood of context effects. In the case of the technology study, if
a user sees a question comparing Samsung, Nokia, Google, Sony, and Apple, there is a good chance
they will be thinking about Google in terms of its hardware. By contrast, if Google is shown against
Yahoo, it will more likely be thought of in terms of its search engine. Similarly, in the study of presidential capabilities, if Successful in business and Understands economics are not in the same question, each will become a bit more important, as people will treat them as surrogates for each other.
In practice, context effects are something to be thought about and lived with. It is often impractical to
avoid alternatives simply because of the possibility of context effects. Moreover, experimental designs
typically present each alternative with each other alternative, so context effects can be averaged out.
This is discussed further in the next chapter.
Standard experimental designs
As discussed, MaxDiff involves a series of questions – typically, six or more. Each of the questions
has the same basic structure, as shown below, and each question shows the respondent a subset of
the list of alternatives (e.g., five). People are asked to indicate which option they prefer the most, and
which they prefer the least. Each question is identical in structure but shows a different list of
alternatives. The experimental design is the term for the instructions that dictate which alternatives to
show in each question.
What an experimental design looks like
The most straightforward designs involve showing each person the same questions. The table below
shows an experimental design for the technology study. This study had ten alternatives. This design
involved asking people six questions, where each question had five alternatives (options). The design
dictated that the first question should show brands 1, 3, 5, 6, and 10. The second shows brands 1, 5,
7, 8, and 9. And so on.
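To make the structure concrete, the sketch below shows how such a design can be stored in R: one row per question, one column per position, and cells holding alternative numbers. Only the first two rows follow the design described above; the remaining rows are made-up placeholders rather than the actual design.

    # A single-version design: one row per question, cells hold alternative
    # (brand) numbers. Rows 1 and 2 match the text; rows 3 to 6 are placeholders.
    design <- matrix(c(1, 3, 5, 6, 10,
                       1, 5, 7, 8, 9,
                       2, 3, 4, 7, 10,
                       2, 4, 6, 8, 9,
                       3, 5, 6, 8, 9,
                       1, 2, 4, 7, 10),
                     ncol = 5, byrow = TRUE,
                     dimnames = list(paste("Question", 1:6),
                                     paste("Option", 1:5)))
    design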
Multiple version designs are designs where some or all respondents are asked different questions.
Two additional columns are added at the beginning of the design. The first shows the version and the
second shows the question number within the version. The remaining columns are the same as in a
standard design. The table below shows a design with two versions (ten or more is the norm if using
multiple versions). Note that the first six rows contain the single-version design shown above, and the
next six show a new design, created by randomly swapping around aspects of the original design.
Multiple-version designs come about for two different reasons. Sometimes they are used because
researchers wish to have different designs to deal with context effects, which are discussed later in
this chapter. Other times they are used to deal with large numbers of alternatives, which are
discussed in the chapter Designs with many alternatives.
The previous chapter introduced a MaxDiff study looking at ten technology brands. Here we show
how to create an experimental design for it. This is done using Displayr/Q, but the same logic can be
applied using any other software.
In this example, there are ten alternatives (brands). As the brands are relatively easy to evaluate, five
alternatives were shown per question. It is always nice to have a small number of questions, so we
start with a small number of questions – four – and a single version. It is possible to create a design
with these inputs, but not a good design: Displayr returned a list of warnings, which are reproduced
below.
The first of the warnings tells us that we have too few questions and suggests we should have six.
The other warnings are specific problems with the design that are consequences of having too few
questions (or too many alternatives – these are two sides of the same coin). If we increase the
number of questions to five, we still get lots of warnings. At six, the warnings go away.
More detail about how to create the designs – as well as the characteristics of a good design – is discussed throughout the rest of this chapter.
Number of alternatives
Typically, this is just the number of alternatives that you want to research (see the previous chapter).
For example, in the technology study described in the previous chapter, this was ten, and in the study
of presidential capabilities there were 16 alternatives.
The more alternatives, the worse the analysis in terms of the reliability of the results.
Alternatives per question
This is the number of alternatives that a respondent is asked to compare in a question. Nearly all
studies use four to six alternatives. It is difficult to envisage a situation where more than seven would
be appropriate. Considerations when choosing the number of alternatives include:
• If you have fewer than four alternatives it is not really a MaxDiff study, as three alternatives is
a complete ranking, and two alternatives prevents you from asking the “worst” question.
• Cognitive difficulty: In the example question shown above this is five. With studies where the
alternatives are lengthy or difficult to understand it is often better to have four alternatives per
question. Where the alternatives are very easy to understand, six may be appropriate.
• The number of alternatives in each question should be no more than half the number of
alternatives in the entire study. Otherwise, it becomes difficult to create a good experimental
design, and a straightforward ranking exercise is likely to be sufficient.
Number of questions
A rule of thumb provided by Sawtooth Software states that the ideal number of questions is at least:
3 × (number of alternatives) / (number of alternatives per question)
which leads to each alternative being shown to each respondent at least three times. This would suggest that in the technology study, with its ten alternatives and five alternatives per question, we should have used at least 3 × 10 / 5 = 6 questions.
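As a quick check, the rule of thumb can be written as a one-line helper in R (the function name is ours, not something built into Q or Displayr):

    # Minimum number of questions so each alternative is seen about three times.
    min_questions <- function(n.alternatives, alternatives.per.question) {
      ceiling(3 * n.alternatives / alternatives.per.question)
    }
    min_questions(10, 5)  # technology study: 6 questions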
There are two conflicting factors to trade off when setting the number of questions. The more
questions, the more respondent fatigue, and the worse your data becomes. The fewer questions, the
less data, and the harder it is to work out the relative appeal of alternatives that have a similar level of
overall appeal. We return to this topic in the discussion of checking designs, below.
Number of versions
Should a MaxDiff study have multiple versions of the experimental design with different respondents
seeing different versions, or a single experimental design seen by all respondents? The answer to this
depends on several factors:
• If you use modern analysis techniques, such as latent class analysis and hierarchical Bayes
(both described in later chapters), then a single version is usually okay. If you are using
counting analysis, also described in a later chapter, it is generally advisable to have lots of
versions (e.g., one for every respondent), although even then counting analysis is not an
appropriate way to analyze MaxDiff data.
• Whether context effects are likely to be problematic: With a single-version MaxDiff study, the design ensures that context effects are averaged to an extent. If we use multiple versions, this averaging will be more comprehensive. While some people interpret this as a strong argument for multiple versions, it is not quite as strong an argument as it appears. Context effects do not cancel out; having multiple versions just means that whatever bias they add is estimated consistently. A simpler approach than using multiple versions is to randomize, where this randomization is automatically performed by the data collection software. In particular:
o Randomization of question ordering: If each respondent sees the questions in a
different order, then the context effects are likely to be averaged in much the same
way as will be achieved by using multiple versions.
o Randomizing the order of alternatives: This can be done either on a question-by-
question basis, or between respondents.
• Whether the goal of the study is to compare people, or rank the alternatives: If we believe that context effects exist, using different versions will cause them to be averaged across respondents. If our goal is segmentation, then this is undesirable, as it will cause results to differ between people due to differences in the versions they saw, rather than due to genuine differences between the people.
Sawtooth Software suggests that when utilizing multiple versions, as few as ten are sufficient to
minimize order and context effects. However, if administering the study online, it might be better to
have one per respondent or, if that is impractical due to how the study is administered, 100 different
versions. While there is clearly a diminishing marginal utility to adding extra designs, the cost is zero.
Repeats
When software creates an experimental design it usually starts by generating a design using
randomization and then seeks to improve this design using various algorithms. Occasionally the initial
randomly-generated design has some quirk that makes it hard for the algorithms to improve it. Most
experimental design packages have an option to repeat the process multiple times to see if it can be
improved. Generally, the “number of repeats” setting in the software should be left at its default level,
unless you have identified a problem with your design.
Increasing the number of repeats helps only occasionally. Problems with experimental designs are
usually best addressed by increasing the number of questions.
Software
Experimental designs in Q are generated by selecting Create > Marketing > MaxDiff >
Experimental Design. The options correspond to the categories in the previous section.
Experimental designs in Displayr are generated by selecting Anything > Advanced Analysis >
MaxDiff > Experimental Design. The options correspond to the categories in the previous section.
In an ideal world, a MaxDiff experimental design has the following characteristics, where each
alternative appears:
1. At least three times.
2. The same number of times.
3. With each other alternative the same number of times (e.g., each alternative appears with
each other alternative twice).
Due to a combination of mathematical constraints and a desire to avoid respondent fatigue, few MaxDiff experimental designs satisfy all three requirements (the last one is particularly tough).
Earlier in the chapter we showed a single
version design for ten alternatives, five
alternatives per question, and six questions. The
output shown to the right is for a modification of
this design, where the number of alternatives per
question is reduced from five to four, which
causes the design to become poor. How can we
see it is poor?
The pairwise.frequencies table shows us how often each alternative appears in questions with
each other alternative (the main diagonal contains the frequencies). We can see that many pairs of
alternatives never appear together (e.g., 1 and 3, 1 and 7). Ideally, each alternative will appear the
same number of times with each other alternative. Such designs are sometimes also referred to as
being balanced (see the section on jargon at the end of the chapter).
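If you want to compute these pairwise frequencies yourself, a minimal R sketch is shown below. It assumes a design stored as a matrix with one row per question and cells holding alternative numbers (like the illustrative design sketched earlier); it is not the code Q or Displayr use internally.

    # How often each pair of alternatives appears together in the same question;
    # the diagonal counts how often each alternative appears at all.
    pairwise_frequencies <- function(design, n.alternatives) {
      counts <- matrix(0, n.alternatives, n.alternatives)
      for (q in seq_len(nrow(design))) {
        shown <- design[q, ]
        counts[shown, shown] <- counts[shown, shown] + 1
      }
      counts
    }
    pairwise_frequencies(design, 10)  # zeros off the diagonal are pairs never shown together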
As the whole purpose of a MaxDiff study is to understand the relative appeal of different options, it is
not ideal that we have pairs of alternatives that the respondent never explicitly encounters. It should
be emphasized that while it is not ideal to have two alternatives never appearing together, neither is it
a catastrophe. If using latent class analysis or hierarchical Bayes, the analysis itself is often able to
compensate for weaknesses in the experimental design, although it would be foolhardy to rely on this
without very carefully checking the design using a small sample (discussed below).
1 2 3 4 5 6 7 8 9 10
1 1.00 0.00 -0.50 0.00 0.25 0.25 -0.50 0.00 -0.71 0.25
2 0.00 1.00 0.00 -0.33 0.00 0.00 -0.71 0.33 0.33 -0.71
3 -0.50 0.00 1.00 0.00 -0.50 -0.50 0.25 0.00 0.00 0.25
4 0.00 -0.33 0.00 1.00 0.00 0.00 0.00 -1.00 0.33 0.00
5 0.25 0.00 -0.50 0.00 1.00 -0.50 -0.50 0.00 0.00 0.25
6 0.25 0.00 -0.50 0.00 -0.50 1.00 0.25 0.00 0.00 -0.50
7 -0.50 -0.71 0.25 0.00 -0.50 0.25 1.00 0.00 0.00 0.25
8 0.00 0.33 0.00 -1.00 0.00 0.00 0.00 1.00 -0.33 0.00
9 -0.71 0.33 0.00 0.33 0.00 0.00 0.00 -0.33 1.00 -0.71
10 0.25 -0.71 0.25 0.00 0.25 -0.50 0.25 0.00 -0.71 1.00
The table above shows the binary correlations. This correlation matrix shows the correlations between
each of the columns of the experimental design (i.e., of the binary.design shown on the previous
page). Looking at row 4 and column 8 we see a problem. Alternatives 4 and 8 are perfectly negatively correlated. That is, whenever alternative 4 appears in the design, alternative 8 does not appear, and
whenever 8 appears, 4 does not appear. One of the useful things about MaxDiff is that it can
sometimes still work even with such a flaw in the experimental design (although, again, it is a
dangerous design that needs to be carefully checked).
Another concerning aspect of this design is the large range in the correlations. In most other areas
where experimental designs are used, the ideal design results in a correlation of 0 between all the
variables. MaxDiff designs differ from this, as, on average, there will always be a negative correlation
between the variables. However, the basic idea is the same: We strive for designs where the
correlations are as close to 0 as possible. Correlations in the range of -0.5 and 0.5 are usually no
cause for concern in MaxDiff studies.
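The binary correlations can be reproduced in the same spirit: recode the design as an indicator matrix (one column per alternative, 1 if the alternative is shown in a question) and correlate the columns. This is a sketch using the illustrative design object from earlier, not the design behind the tables above.

    # Binary (indicator) coding of the design: one row per question, one column
    # per alternative, 1 = shown in that question.
    binary_design <- function(design, n.alternatives) {
      b <- matrix(0, nrow(design), n.alternatives)
      for (q in seq_len(nrow(design)))
        b[q, design[q, ]] <- 1
      b
    }
    round(cor(binary_design(design, 10)), 2)  # look for values near -1 or +1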
A few things can ruin a MaxDiff study. One is a poor design which cannot be used to estimate relative
preference. Another is errors in how the data is captured. A third is a poorly constructed set of
alternatives that contain one or more alternatives that are preferred or hated by everybody due just to
lazy wording. Each of these problems can be detected by checking the design using a small sample.
Good practice is to get a data file after about 10% of the data has been collected. This will be either
10% of the final sample, or just a pilot study. When working with a team that has limited experience
with MaxDiff, it is also a good idea to complete a few questions yourself, remember your responses,
and check that they are recorded correctly in the data file. The data file is checked as follows:
1. Create tables to check the balance of the design (i.e., how many times each alternative has
been shown). If the design has been programmed incorrectly this will usually be clear from
these tables.
2. Look at the raw data and check that the design in the raw data is as expected.
3. Estimate a latent class analysis model (discussed in a later chapter). This is the model most
likely to detect a fundamental problem of some kind. If a subset of the respondents received
experimental designs that were flawed in some way, this can show up as an error in the latent
class analysis, whereas hierarchical Bayes models are a little less likely to detect such
problems due to the way they pool data between respondents (i.e., if one respondent has an
experimental design that is very poor, that respondent’s coefficients are assigned based on
what other respondents have answered).
4. Form preliminary conclusions. That is, check that the model is telling you what you need to
know for the project to be a success. Yes, the sampling error will be relatively high, but key
conclusions should still be making sense at this time.
This is the gold standard for checking a design. You can conduct this process along with all the other
approaches. If you are brave, you can do just this step and skip the earlier approaches; but skipping
testing on a small sample is foolhardy, as reviewing the results from a small sample of real
respondents checks things much more thoroughly than the other approaches.
One last comment on checking designs: Experienced researchers check more than novice
researchers – they have learned from pain. If this is your first MaxDiff experiment, make sure you
check everything very carefully.
When you set the number of versions to more than one, this will not change any of the warnings
described in the previous section. All these warnings relate to the quality of the design for an
individual person. Increasing the number of versions improves the design for estimating results for the
total sample, but this does not mean the designs change in any way for individual respondents. So, if
you are doing any analysis at the respondent level, changing the number of versions does not help in
any way.
Additional detailed outputs are provided when using multiple versions, which show the properties of
the whole design. These show the binary correlations across the entire design, and the pairwise
frequencies. Their interpretation is as described above, except that it relates to the entire design,
rather than to the design of each version.
Software instructions
In Q and Displayr, check the Detailed outputs option to see all the outputs.
Fixing a poor design
The first thing to do when you have a poor design is to increase the setting for the number of
repeats. Start by setting it to ten. Then, if you have patience, try 100, and then bigger numbers. This
works only occasionally, and when it does, it is a good outcome.
If changing the number of repeats does not work, you need to change something else. Reducing the
number of alternatives and/or increasing the number of questions is usually effective.
Jargon
The standard experimental designs are created using incomplete block designs, where block refers to
the questions, incomplete refers to an incomplete set of alternatives appearing in each question,
alternatives are referred to as treatments, and the binary design is referred to as an incidence matrix.
A balanced incomplete block design is one where each alternative appears the same number of times
with each other alternative.
Designs with many alternatives
• Bridging designs
• Hybrid MaxDiff
• Bandit MaxDiff
With a standard design, as you increase the number of alternatives you need to increase the number
of questions. This can lead to situations where there are too many questions for it to be practical to
get respondents to answer all the questions. This section lists strategies that can be used in such
situations.
The issue of how many alternatives is too many has no correct answer. The more alternatives, the
lower the quality of the data. There is no magical tipping point. This is really the same issue as the
one of how big grid questions can be in surveys, and how long a questionnaire should be. Some
experienced MaxDiff researchers use more than 100 alternatives in studies. Others never use more
than 20.
Before explaining the design strategies, a note of caution: If each respondent does not see each
alternative three or more times, we may not get reliable data at the respondent level (e.g.,
segmentation may become unreliable). A design with many alternatives is typically appropriate only
when the focus is on ranking the appeal of the alternatives.
Sparse MaxDiff / Random question allocation
The simplest and often best strategy is to create a large design and then randomly allocate respondents to different questions. For example, if a design has 20 questions, each respondent may be randomly allocated ten.
While this strategy is simple, it tends to produce good-quality designs.
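A minimal sketch of this allocation in R is below; the number of respondents and the 20-question, ten-per-respondent split are the illustrative figures from the example above.

    # Randomly allocate 10 questions from a 20-question master design to each
    # respondent (the respondent count is illustrative).
    n.respondents <- 300
    allocation <- t(replicate(n.respondents, sort(sample(20, 10))))
    head(allocation)  # row i lists the questions respondent i answers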
Bridging designs
This strategy splits the alternatives into overlapping sets, creating a separate experimental design for
each. For example, one design may be for alternatives 1 through 10, and another for alternatives 8
through 17. Or you can have separate designs testing, say, {1…10}, {11…20}, {21…30}, {31…40}, and {1, 5, 9, 13, 17, 21, 25, 29, 33, 37}, etc.
This strategy makes most sense when there are natural groupings of alternatives. In general, it is
better to allocate questions randomly instead, as bridging works well only if the alternatives that are
common cover a good range from low to high appeal.
Another approach is to show each respondent only a subset of the alternatives, where one experimental design is used to work out which respondents see which alternatives and a second design is used to determine which alternatives appear in which question.
Either randomly assign subsets of alternatives to respondents or use an incomplete block design to
allocate alternatives to respondents. Then use an incomplete block design for each set of alternatives.
For example, if there are 100 alternatives, you could (see the sketch after this list):
• Create a design with 100 alternatives, 20 blocks (questions) and ten alternatives per block
(Design A).
• Create a second design with ten alternatives, six questions and five alternatives per block
(Design B).
• Randomly allocate each respondent to one of the 20 blocks in Design A (i.e., so that the
respondent only sees the ten alternatives in this block).
• Use Design B to create the questions to be shown to the respondent (where the alternatives
used are dictated by the respondent's block in Design A).
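The sketch below illustrates this two-stage allocation in R. The designs are generated randomly purely for illustration (a real Design A and Design B would be proper incomplete block designs produced by your design software), and the object names are ours.

    set.seed(123)
    # Design A: 20 blocks of 10 alternatives drawn from the 100 (random here,
    # purely for illustration).
    designA <- t(replicate(20, sort(sample(100, 10))))
    # Design B: 6 questions of 5 alternatives chosen from the 10 in a block
    # (again, random here for illustration only).
    designB <- t(replicate(6, sort(sample(10, 5))))
    # Allocate a respondent to one of the 20 blocks, then map Design B's
    # indices (1 to 10) onto that block's alternatives.
    block <- sample(20, 1)
    block.alternatives <- designA[block, ]
    respondent.questions <- matrix(block.alternatives[designB], nrow = nrow(designB))
    respondent.questions  # 6 questions x 5 alternatives, numbered 1 to 100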
Hybrid MaxDiff
Respondents are asked to provide ratings of a larger number of alternatives (e.g., ratings of how
much they like them on a scale of 0 to 10), and then MaxDiff is used for a subset of the alternatives.
If using random allocation, this can be analyzed in the same way as any MaxDiff experiment and the ratings ignored. However, a better solution, which is also the required solution for the other two approaches,[1] is to set it up as an anchored MaxDiff (see Setting up the analysis of anchored MaxDiff in Q and Displayr):
• Set up the MaxDiff as a Ranking Variable Set in Displayr or Question in Q (see the instructions in https://docs.displayr.com/wiki/Setting_Up_a_MaxDiff_Experiment_as_a_Ranking).
• Treat the importance data as ranking data.
It is usually a bad idea to choose which brands to show which respondents based on their ratings.
The reason that this is a bad idea is that the resulting data contains something called endogeneity,
which makes valid statistical analysis very difficult (as the residuals of the models cease to be
independent). The analysis of this data is described in the chapter on anchored MaxDiff.
[1] If the other two approaches are analyzed with a standard model, such as the traditional hierarchical Bayes model, the model will suffer from an endogeneity bias.
Constructed MaxDiff / Relevant Items MaxDiff
This is the same idea as hybrid MaxDiff, except that options that are irrelevant to each respondent are also excluded.[2]
Bandit MaxDiff
For the initial respondents this works like Sparse MaxDiff / Random question allocation, but once it
becomes clear which alternatives are most popular, only these are shown to subsequent
respondents.
This approach is useful where the goal is to identify the most preferred alternatives and the preferences between them, but it does not collect data that permits conclusions about:
• Segmentation
• Preferences among the less popular alternatives
[2] Bahna, Eric and Christopher Chapman (2018), “Constructed, Augmented MaxDiff,” 2018 Sawtooth Software Conference, Provo, UT. Accessed at: https://www.sawtoothsoftware.com/download/techpap/2018Proceedings.pdf
Prohibitions
In a MaxDiff study, prohibitions are rules regarding
which alternatives cannot be shown with which
other alternatives.
A prohibition in a MaxDiff study is a rule regarding specific sets of alternatives that should not appear
in the same question. For example, in a product-testing study there may be two very similar versions
of a concept. A goal of the study may be to see which is preferred, but it may be believed that the
study will be undermined if both alternatives appear in the same question. Or, alternatives may
represent attributes, where some relate to different levels of the same attribute (e.g., Price $4, Price $5, or Price $6).
Below we list some strategies for addressing prohibitions, but prior to doing so we emphasize that
prohibitions in MaxDiff are often a poor idea. One problem is that they tend to lead to bloating of the
number of alternatives, as often prohibitions are wanted to address the use of very similar
alternatives. As mentioned earlier, the fewer the alternatives the better, but if you end up having
multiple levels for each attribute, respondent fatigue will grow. A second problem with prohibitions is
that they always either result in less reliable estimates of preference or increase context effects. This
is discussed in more detail with the design strategies below. Often prohibitions are wanted when the
real solution is to use choice modeling instead.
For the situation where there are two alternatives that should not be seen together, the rigorous
approach is to randomize across respondents, so that no respondent sees both alternatives. The
process for generating the design is:
• Generate a standard experimental design, where you include only one of the alternatives to be
substituted.
• Randomly swap one of the alternatives. For example, if you have generated a design for ten
alternatives, for a random selection of respondents, replace all the alternative 10s in the design
with 11s. Alternatively, the replacement can be done sequentially (e.g., changing every second
respondent’s design).
This strategy has the disadvantage that you end up with a smaller sample size for each of the
alternatives that you do not want to appear together. Another method is to allow people to see both
alternatives. However, this runs the risk of causing context effects. For example, if you test $4 and $5,
whether somebody sees $4 first may change how they feel when they see $5, due to the anchoring
heuristic.
Random swapping of alternatives
Wherever a prohibited combination appears, manipulate the design in such a way as to remove the
prohibition by replacing one of the alternatives that conflicts with the prohibition with another (and, if
possible, doing a reverse substitution elsewhere, so as to limit the reduction of balance).
Create question sets by randomly selecting alternatives (e.g., using R's sample function) and discard any sets that contain a prohibited combination. Note that this approach is sensible only if different respondents see different sets of alternatives.
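A minimal sketch of this approach in R, assuming ten alternatives, five per question, and a single prohibited pair (the pair is hypothetical):

    # Draw random sets of 5 alternatives from 10, discarding any set that
    # contains both members of the prohibited pair (4 and 8 here).
    prohibited <- c(4, 8)
    draw_question <- function() {
      repeat {
        s <- sort(sample(10, 5))
        if (!all(prohibited %in% s))
          return(s)
      }
    }
    t(replicate(6, draw_question()))  # one respondent's six questions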
Sawtooth Software
Sawtooth Software’s MaxDiff experimental design software has options for generating experimental
designs with prohibitions regarding alternatives that should not appear together. These designs will, in
a statistical sense, outperform the other strategies in this section, as they simultaneously trade off the
considerations in the earlier chapter about good designs (e.g., balance) while also addressing the
prohibitions.
While this may sound appealing, a disadvantage is that if, say, you have tested three price points,
such as $3, $4, and $5, a price alternative appears in the design up to three times as often as other
attributes. In the related field of choice modeling there is a lot of evidence that such things can
increase the importance of attributes. Furthermore, as mentioned earlier, context effects are likely a
bigger issue.
If you do find yourself in a situation where you feel you need software that has this functionality,
please contact us and we will consider adding the feature.
Data collection
This is the easy step. You need:
• Check.
Include MaxDiff in a longer questionnaire
Typically, MaxDiff questions are asked as part of longer questionnaires. It can be helpful to make
sure you have other questions in the study which can be used to cross-check the results of the
MaxDiff. For example, if doing a MaxDiff of the cell phone market, you would collect data on current
cell phone plans and carriers, and would expect that the MaxDiff results would be correlated with this
other data.
Not all survey software supports MaxDiff, so check this before committing to a platform. We also recommend that you use software that will allow you to:
• Have a code list of all the alternatives and then use the design to filter these by respondent (rather than using piping, which can be error prone and hard to analyze).
• Collect some other profiling data (e.g., age, gender, etc.).
• Check, before you program the questionnaire, that the data is going to come out in a format that is analyzable (see the next chapter).
Data files
Perhaps the most painful part of a MaxDiff study is dealing with data files. The requirements are
simple, but the difficulty is that the data files provided by some data collection software platforms have
not been created with much thought about how the data needs to be used.
Qualtrics
If you are using data from Qualtrics, both Q and Displayr have special-purpose tools for reading their
MaxDiff data, and these are automatically applied by selecting the Qualtrics option when importing
the data.
This results in MaxDiff data automatically being set up as an Experiment question/variable set.
If using data from Survey Gizmo, MaxDiff can be set up in Q using Automate > Browse Online
Library > Marketing > MaxDiff > Convert Alchemer (Survey Gizmo) MaxDiff Data for Analysis,
and in Displayr using Anything > Advanced Analysis > MaxDiff > Convert Alchemer (Survey
Gizmo) MaxDiff Data for Analysis.
All other data formats
The experimental design file is typically a CSV file or an Excel file containing a single sheet, where
the design is as described in What an experimental design looks like.
Data files
There are two aspects to getting an appropriate data file. The simple bit is getting a data file. The
most useful file format is normally one specifically developed for survey data, such as an SPSS data
file (file extension of *.sav). Simpler file formats, like CSV files and Excel files, can be used but will typically add a lot of pain (the reasons are discussed below).
Statistical analysis
“Keep it simple, stupid!” is good advice in many areas of life, but not when it comes to analyzing MaxDiff data, where simple = wrong – and not merely wrong in a pedantic way. The case study we will use in this chapter shows how simple methods can lead to massively wrong conclusions.
Case study
In this chapter we will work our way through a MaxDiff data set of 302 respondents, asking about the ten technology brands listed in earlier chapters. The data file is here: https://wiki.q-researchsoftware.com/images/f/f1/Technology_2017.sav. If you are trying to reproduce all the outputs in this eBook, please note that this study was also conducted in 2012, and some of the outputs are from that study (in particular, Anchored MaxDiff). The design is https://wiki.q-researchsoftware.com/images/7/78/Technology_MaxDiff_Design.csv. This design is different from the one described in the earlier chapter on experimental design.
Although Apple is the most popular brand, it has its fair share of detractors. So, focusing just on its best scores does not tell the true story.
The next table shows the differences. It now shows that Apple and Google are almost tied in preference. But we know from looking at the best scores that this is not correct.
What is going on here? First, Apple is the most popular brand. This last table is just misleading. Second, and less obviously, the reason that the last table tells us a different story is that Apple is a divisive brand, with lots of adherents and a fair number of detractors. This means that we need to focus on measuring preferences at the respondent level and on grouping similar respondents (i.e., segmentation). As we will soon see, there is a third problem lurking in this simplistic analysis.
To understand data at the respondent level we will start by looking at the experimental design and
responses for a single person. The table below shows the MaxDiff experimental design used when
collecting the data. The choices of the first respondent in the data set are shown by color. Blue shows
which alternative was chosen as best, red as worst. The question that we are trying to answer is, what
is the respondent’s rank ordering of preference between the ten tech brands?
The simplest solution is to count the number of times each option is chosen, giving a score of 1 for
each time it is chosen as best and -1 for each time it is chosen as worst. This leads to the following
scores, and rank ordering, of the brands:
Microsoft 3 > Google 1 ≈ Samsung 1 ≈ Dell 1 > Apple 0 ≈ Intel 0 ≈ Sony 0 > Yahoo -1 > Nokia -2 > IBM -3
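These scores can be reproduced with a few lines of R. The best and worst vectors below are reconstructed from the counts just described (which question each choice came from does not matter for counting analysis), so treat them as illustrative.

    brands <- c("Apple", "Google", "Samsung", "Sony", "Microsoft",
                "Intel", "Dell", "Nokia", "IBM", "Yahoo")
    # Respondent 1's best and worst choices across the six questions.
    best  <- c("Microsoft", "Microsoft", "Microsoft", "Google", "Samsung", "Dell")
    worst <- c("IBM", "IBM", "IBM", "Nokia", "Nokia", "Yahoo")
    counting.scores <- sapply(brands, function(b) sum(best == b) - sum(worst == b))
    sort(counting.scores, decreasing = TRUE)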
This approach is very simple, and far from scientific. Look at Yahoo. Yes, it was chosen as worst once, and our counting analysis suggests it is the third-worst brand, less appealing to the respondent than each of Apple, Intel, and Sony. However, look more carefully at Question 5, where Yahoo was chosen as worst. There, Yahoo was compared with Microsoft, Google, Samsung, and Dell. These are the brands that the respondent chose as most preferred in the experiment, and the data suggests that they are all better than Apple, Intel, and Sony. So, all we have learned is that Yahoo is less appealing than these four strong brands; there is no evidence that Yahoo is worse than Apple, Intel, and Sony. The counting analysis is simple – but wrong.
We make the analysis more rigorous by considering which alternative was compared with which
others. This makes a difference because not all combinations of alternatives can be tested, as it
would lead to enormous fatigue. We have already concluded that there is no evidence that Yahoo differs from Apple, Intel, and Sony, which leads to:
Microsoft > Google ≈ Samsung ≈ Dell > Apple ≈ Intel ≈ Sony ≈ Yahoo > Nokia > IBM
Consider the key implication of this. Counting analysis fails because it ignores the experimental
design. If we use a single version of the design, employing counting analysis can lead to misleading
conclusions.
Returning to the first respondent’s data, which brand is the second most preferred? Each of
Samsung, Google, and Dell have been chosen as best once. Does this mean they are all in equal
second? No, it does not. In Question 4, Dell was against Google, and Google was preferred. Thus, we
know that:
Microsoft > Google > Dell > Apple ≈ Intel ≈ Sony ≈ Yahoo > Nokia > IBM
But note that Samsung has been removed. Samsung is a problem. It may be between Microsoft and
Google. It may be between Google and Dell. Or it may be less than Dell. There is no way to tell. We
can guess that it has the same appeal as Dell. While the guess is not silly, it is nevertheless not a super-educated guess; Samsung appears below in its guessed position:
Microsoft > Google > Samsung ≈ Dell > Apple ≈ Intel ≈ Sony ≈ Yahoo > Nokia > IBM
A more difficult problem is posed by respondent 13’s data, shown below. She chose Apple twice as best, Samsung twice, and Google and IBM once each. Which is her favorite? Here it gets ugly. Some of her choices imply that:
Google > Samsung (Question 5), Samsung > Apple (Question 6), Samsung > IBM (Question 6)
Yet other choices she made tell us that:
Apple > IBM > Google
This data is contradictory.
Most people’s instinct, when confronted by data like this, is to say that the data is bad and to chuck
it. Unfortunately, it is not so simple. It turns out most of us give inconsistent data in surveys. We get
distracted and bored, taking less care than we perhaps should. We change our minds as we think.
The interesting thing about MaxDiff is not that it leads to inconsistent data; rather, it is that it allows us
to see that the data is contradictory. This is a good thing: had we instead asked the respondent to
rank the data, it would still have contained errors, but we would never have seen them, as we would
have no opportunity to see the inconsistencies.
To summarize, computing scores for each respondent by summing up the best scores and
subtracting the worst scores is not valid, because:
1. Analysis that ignores the experimental design aspects of which alternatives are shown with
which other alternatives will be misleading.
2. Respondents provide inconsistent data and this needs to be modeled in some way.
3. We do not have enough data to get a complete ordering of the alternatives from a single
person.
4. People differ in their preferences (compare respondents 1 and 13), so we should not pool all
the data when performing analyses.
Fortunately, a bit of statistical wizardry can help us with these problems, each of which is well
understood.
The problem of respondents providing inconsistent data is not new. It has been an area of active academic research since the 1920s.[3] The area of research that deals with this is known as random utility models, and you may already be familiar with this class of models (e.g., multinomial logit, latent class logit, and random parameters logit are all models that solve this problem). These models also take into account which alternatives were shown in which questions, so they solve the first issue as well.
The problem of needing to pool data across respondents while still allowing for differences between them has been one of the most active areas in statistics for the past 20 or so years. Consequently, good solutions exist for it as well. The next two chapters introduce two techniques that address all four issues above: latent class analysis and hierarchical Bayes.
Earlier we saw that the data for respondent 1 was consistent with the following preferences:
Microsoft > Google > Samsung ≈ Dell > Apple ≈ Intel ≈ Sony ≈ Yahoo > Nokia > IBM
[3] Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273-286.
Such an analysis is useful for a single respondent, but there is no straightforward way to summarize
such findings across a sample of respondents. The solution is to assign a score to each of the
alternatives such that the score is consistent with the preferences. These scores are referred to as
coefficients. One convention when assigning coefficients is to assign a score of 0 to the first brand
and then assign the other coefficients relative to that. The first column of numbers below is consistent
with the data above. An even better way of doing this is to compute the average and subtract that
from all the scores, so that the average coefficient becomes 0. This is shown in the second column.
Both conventions are in widespread use.
              Apple is 0   Average is 0
Apple              0           -0.4
Google             2            1.6
Samsung            1            0.6
Sony               0           -0.4
Microsoft          3            2.6
Intel              0           -0.4
Dell               1            0.6
Nokia             -1           -1.4
IBM               -2           -2.4
Yahoo              0           -0.4
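The second column can be computed from the first with one line of R, since zero-centering just subtracts the average (the values below are those in the table above):

    # Coefficients using the "Apple is 0" convention.
    apple.zero <- c(Apple = 0, Google = 2, Samsung = 1, Sony = 0, Microsoft = 3,
                    Intel = 0, Dell = 1, Nokia = -1, IBM = -2, Yahoo = 0)
    # Convert to the "average is 0" (zero-centered) convention.
    apple.zero - mean(apple.zero)  # Apple -0.4, Google 1.6, ..., Yahoo -0.4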
In the coefficients above, a value of 1 has been used as the difference between all ranks. We can do
better than that. If a respondent is consistent in their preferences, we would want to give a bigger
difference, whereas if they are inconsistent, a smaller difference. This is achieved by logit scaling.
Logit scaling involves estimating coefficients that can be used to predict the probability that a person
will choose an alternative in a question, such that these probabilities best align with their actual
choices.
The basic idea of logit scaling is best appreciated by a simple example. The first column of the table below shows the proportion of people who chose each option as best in the first question of the tech study. The Coefficient column contains coefficients. The third column contains exp(Coefficient). The preference share[4] is computed by dividing the values in this column by the total. For example, 3.6702 / 7.7511 = .4735 = 47.35%. Some key aspects of this are:
• We assign coefficients to each respondent.
• The coefficients tell us about the respondent’s preference between the alternatives: If one
alternative has a higher coefficient than another, it implies that the respondent prefers the
alternative with the higher coefficient.
• The coefficients also allow us to compute preference shares, such that these shares are consistent with the best and worst choices in the data.
[Table: the proportion choosing each alternative as best in the first question, the corresponding Coefficient, exp(Coefficient), and the resulting Preference share]
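The arithmetic is the logit (softmax) transformation: exponentiate the coefficients and divide by their sum. The sketch below uses hypothetical coefficients, as the original table's values are not reproduced here; the final line repeats the worked example quoted above.

    # Hypothetical coefficients for the five alternatives in a question.
    coefficients <- c(A = 1.3, B = 0.2, C = -0.3, D = -0.6, E = -0.6)
    preference.shares <- exp(coefficients) / sum(exp(coefficients))
    round(100 * preference.shares, 1)
    # The worked example in the text uses the same arithmetic:
    3.6702 / 7.7511  # = 0.4735, i.e. a 47.35% preference share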
In this example, the coefficients exactly match the preference shares observed for the first question.
However, in practice, we have a few more challenges that we need to address:
• There are inconsistencies in answers across questions. We can resolve this by computing the
coefficients that most closely match the preference shares across all of the questions, which
means that they do not perfectly match with the observed preferences for each question.
• Respondents differ in their preferences. We saw this with respondents 1 and 13 above. To
address this, we need to compute a separate set of coefficients for each respondent.
[4] Sawtooth Software has popularized an alternative formula for computing preference shares. For the formulas, see https://wiki.q-researchsoftware.com/wiki/Marketing_-_MaxDiff_-_Analyze_as_a_Ranking_Question_-_Compute_Sawtooth-Style_Preference_Shares_from_Individual-Level_Parameter_Means_(K_Alternatives)
• We do not have enough data to estimate accurately the preferences of a single respondent.
For example, for the first respondent, we had insufficient data to work out preferences for
Google versus Samsung versus Dell. The solution to this problem is to pool the data, so that
we can estimate one respondent’s preferences by borrowing information from other
respondents. For example, if we found that most respondents preferred Samsung to Dell, we
could assume that this is true for Respondent 1, even though we have no direct data about
this from respondent 1. While this idea may sound a bit strange if you are new to it, it has long
been established that models that pool data in such a way outperform models that are
estimated separately for each respondent (and, as mentioned, we do not have enough data
to estimate models for each respondent anyway).
Utilities
Typically, the word utility refers to transformed coefficients from a choice model. When discussed in
the context of a MaxDiff model, it typically refers to any of the following:
• Coefficients at the respondent level
• Average coefficients
• Preference shares at the respondent level
• Average preference shares
• Some transformation of the coefficients or shares, at the average or respondent level (e.g.,
scaled so that a 0 is given for the least preferred alternative and 100 for the most preferred
alternative)
Using preference shares, and referring to them as preference shares, is often best for non-technical
audiences. Alternatively, use utilities or coefficients, but report them using a bumps chart (ranking
plot), which focuses on relativities rather than the actual values.
You can extract the utilities (zero-centered coefficients) for each person by selecting a MaxDiff model
and:
• In Q: Create > Marketing > MaxDiff > Save variable(s) > Zero-Centered Utilities
• In Displayr: by scrolling to the bottom of the object inspector and choosing SAVE VARIABLE(S) > Zero-Centered Utilities.
You can then transform these into other formats using the various tools for transforming variables. For
example, if you want to modify them to have a minimum of 0 and a maximum of 100:
• In Displayr:
o Select the variables
o Click the + to insert new variables and select Ready-Made New Variable(s) > Scale
Variables > Unit Scale Within Case.
o Select the new variables, click the + to insert new variables and select Ready-Made New Variable(s) > Multiply and set The single numeric value to Multiply by to 100.
• In Q:
o Select the variables
o Automate > Browse Online Library > New Variables > Scale Variable(s) > Unit
Scale Within Case.
o Right click on the first of the newly created variables and select Edit R Variable
o Enter at the beginning of the first line of code 100*
o Press the play button (the blue triangle).
o Press Update R Variable.
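If you prefer to do the rescaling with a few lines of R instead, the same transformation looks like this; utilities stands for one respondent's zero-centered utilities (an assumed input, not a variable created by the steps above):

    # Rescale so the least preferred alternative is 0 and the most preferred is
    # 100 (unit scale within case, then multiply by 100).
    rescale_0_100 <- function(utilities) {
      100 * (utilities - min(utilities)) / (max(utilities) - min(utilities))
    }
    rescale_0_100(c(-2.4, 1.6, 0.6, -0.4, 2.6))  # 0, 80, 60, 40, 100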
MaxDiff questions can be a bit boring. When bored, some respondents will select options without
taking the time to evaluate them with care. Fortunately, this is easy to detect with MaxDiff: we look at
how well the model predicts a person's actual choices. The simplest way to do this is to count up the
number of correctly predicted best choices. However, an even better approach is to use the RLH statistic.
You can extract the RLH for each person by selecting a MaxDiff model and:
• In Q: Create > Marketing > MaxDiff > Save variable(s) > RLH (Root Likelihood)
• In Displayr: by scrolling to the bottom of the object inspector and choosing SAVE VARIABLE(S) > RLH (Root Likelihood).
The RLH statistic is computed as follows:
• For each question, compute the estimated probability that the person chooses the option that they chose. (This is a computation performed by the model you use, such as hierarchical Bayes.)
• Multiply the probabilities together. For example, in a study involving four MaxDiff questions, if
a person chooses an option as best that the model predicted they had a 0.4 probability of
choosing, and their choices in the remaining three questions had probabilities of 0.2, 0.4, and
0.3 respectively, then the overall probability is 0.0096. This is technically known as the
person's likelihood.
• Compute likelihood^(1/k), where k is the number of questions. In this example, the result is
0.31. The value is known as the root likelihood (RLH). It is better than just looking at the
percentage of the choices that the model predicts correctly, as it rewards situations where the
model was close and penalizes situations where the model was massively wrong. Note that
the RLH value of 0.31 is close to the mean of the values (technically, it is the geometric
mean).
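The RLH arithmetic from this example can be checked with a few lines of R (the probabilities are the ones quoted above):

    # Probabilities the model assigned to the choices the person actually made.
    choice.probabilities <- c(0.4, 0.2, 0.4, 0.3)
    likelihood <- prod(choice.probabilities)        # 0.0096
    likelihood^(1 / length(choice.probabilities))   # 0.31, the RLH (geometric mean)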
Two main techniques in widespread use today for estimating coefficients from MaxDiff are
hierarchical Bayes and latent class analysis. These are the focus of the next two chapters.
Latent class analysis
Latent class analysis is used to find groups of people who have revealed similar preferences, which can then be used to form segments.
Latent class analysis is like cluster analysis. You put in a whole lot of data and tell it how many
classes (i.e., clusters) you want. Latent class analysis is something of a catch-all expression covering
many different techniques for forming groups in data. There are two specific variants which can be
used for analyzing MaxDiff data: one is the tricked-logit model and another is the rank-ordered logit model.[5] The tricked-logit model is the one popularized by Sawtooth. Both models give pretty much the
same results, and both are available in Q and Displayr.
Latent class analysis is used with MaxDiff studies for two quite different reasons:
1. To create segments. The output of latent class analysis is a small number of groups of respondents with different preferences. These groups can be treated as segments.
2. As an alternative to hierarchical Bayes for estimating preferences.
[5] See https://www.displayr.com/tricked-vs-rank-ordered-logit/ for a discussion about the differences between the two.
Case study
The output below shows the results from the 5-class model, which is to say a latent class analysis model
that assumes there are five types of people in terms of their preferences as revealed by the MaxDiff
study. (Why five? We’ll return to this later.)
The Mean column shows the average coefficient estimated for each respondent. These coefficients
have some special properties we’ll explore later, but for the moment, the way to read them is that the
average coefficient across all the respondents and all the alternatives is 0. Reading down the Mean
column, we can see that Google is, on average, the most preferred brand, followed by Apple and
Samsung.
The histograms show the variation among the respondents. The actual values are not shown because
they are not particularly meaningful. The key things to appreciate are the patterns and relativities. We
can see, for example, that preferences for Apple are very diverse relative to most of the other brands,
whereas there is limited disagreement regarding Google. This is communicated quantitatively via the
Standard Deviation column.
It may have occurred to you that there is something a bit odd about this output. We have told you we
are estimating a latent class model with five classes, but the histograms reveal that there are more than
five unique values for each coefficient. This is because some respondents have data that is best
described as being an average of multiple classes.
The colors show the values by segment. Looking at Apple, we can see that the rightmost column is orange, which corresponds to Class 2. That is, the second class has a strong preference for Apple. We can also see that they like Google, are in the middle of the pack for IBM, and dislike Sony and Nokia relative to the other groups.
The table below shows the coefficients estimated in each of the five classes and the size of the classes.
Looking at Class 2, we can see that it tells the same story in terms of preferences for Apple, Microsoft,
Sony, and Nokia that we could see in the histograms.
The next table shows the preference share by class. This is often the most useful output when the goal
is to describe the segments. Reading the first row, for example, we can see that the analysis has indeed
identified that a fundamental difference between segments relates to their preference for Apple, with
Google, by contrast, having some popularity in all segments and Samsung being strong in one segment
(Class 2).
It is also possible to compute coefficients and preference shares for each respondent, but this will be
the focus of the next chapter (the process and interpretation are the same).
As with cluster analysis, a key input to latent class analysis is the required number of classes, where
class means the same thing as cluster in cluster analysis and segment in segmentation.
Our eBook How to do Market Segmentation contains a more detailed discussion of issues involved in
how to select the number of classes from a segmentation, so this chapter will focus purely on the
technical side of selecting the number of classes.
If the goal of the analysis is to create segments, the basic approach to choosing the number of
segments is to trade off statistical and managerial considerations. If our focus is on creating an
alternative to a hierarchical Bayes model, then we just take statistical considerations into account.
In the header and footer to the latent class analysis outputs you will see the Prediction accuracy and
BIC. The output below shows these measures for models from one to ten classes, and for a
hierarchical Bayes model, shown in the bottom row.
This study asked each respondent six questions. We have used four of their questions, randomly
selected, to estimate the models. We then predict the best choice for the remaining two questions,
which is shown as Out-of-sample accuracy in the table above. We can see that the best of the
latent class analysis models is the one with 9 segments. The hierarchical Bayes model is clearly the
best of all the models.
The root likelihood (RLH) is a measure of accuracy that takes into account the probability of the
prediction (e.g., if the model predicts that an alternative has a 90% chance of being selected, but it is
not selected, the statistic is worse than if the model had predicted a 50% probability). It suggests that
the 10-class solution is best, which implies that we might want to investigate even larger numbers of
classes if our focus is purely on predictive accuracy.
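For reference, the RLH is the geometric mean of the predicted probabilities of the choices that respondents actually made in the held-out questions (this is the standard definition; n is the number of held-out choices and p_j the predicted probability of the j-th observed choice):
\mathrm{RLH} = \Big( \prod_{j=1}^{n} p_j \Big)^{1/n}
A perfect model would have an RLH of 1, while randomly guessing the best choice among five alternatives would give an RLH of roughly 0.2.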
An alternative statistic for working out the number of classes is the BIC: the smaller, the better. This
statistic suggests that the best model is the one with seven classes.
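For reference, the BIC is computed in the usual way from the maximized likelihood, the number of estimated parameters k, and the number of observations n (smaller values indicate a better trade-off between fit and complexity):
\mathrm{BIC} = k \ln(n) - 2 \ln(\hat{L})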
Which statistic should we believe? The out-of-sample RLH is the best of these in that it relies on few
assumptions and is relatively sensitive. However, it is still affected by the randomness of selecting
which questions are used in the analysis. The BIC statistic, by contrast, is more consistent, so it can
be useful. The lack of consistency across the different statistics means that there is no clear way of
making a choice, so we should instead focus on non-statistical considerations or use the hierarchical
Bayes model.
Other considerations that can be taken into account when selecting the number of classes are:
• The stability of the Mean column: Keep increasing the number of classes until the means
do not change much between solutions with different numbers of classes.
• The usefulness of the solution in inspiring segmentation strategy: With a study about
brand, this reduces to whether the brands of interest to the stakeholders have differing
preference shares by segment. The above segmentation may be useful for Apple,
Samsung, Sony, and Google but has limited value for the other brands. Admittedly this
segmentation is not so interesting anyway, as brand is rarely a very useful segmentation
variable.
• The complexity of the solution for stakeholders: The fewer classes, the more intelligible.
Start points
Latent class models start by randomly generating a solution and then improving on it. However, the
initial random configuration may not be the best, meaning that the final solution may be what is known
as a local optimum. To check whether this is a problem, it is possible to re-run the analysis from a
different random start point. To do this:
Re-estimating the model
All the models estimated in this chapter have used two questions for cross-validation. Once we have
selected the desired number of classes we should re-estimate the model without cross-validation, so
that we make full use of all the available data. This final model should be used for reporting purposes.
Once we have created our segmentation we can allocate each person to a class and then profile the
classes by creating tables. The table below, for example, shows the 5-class solution by product
ownership. If you compare this table with the latent class solution itself, you will see that the product
ownership lines up with the preferences exhibited in the MaxDiff questions.
Software – Q
Q has two entirely different systems for estimating latent class models.
If the data has been set up as a Ranking Question,[6] the general latent
class analysis approach can be used (Create > Segments > Latent
Class Analysis). This approach is particularly useful if your focus is
market segmentation, as it contains automated tools for selecting the
number of segments and for creating segments with multiple different
types of data. See our eBook How to do Market Segmentation for
more information about this.
The rest of this section (and eBook) instead assumes you are using
the special-purpose latent class analysis routines designed for
MaxDiff.
Go to File > Data Sets > Add to Project and import the experimental
design however you want. If you wish to replicate the examples in this
book, you can download the design from here:
https://wiki.q-researchsoftware.com/images/7/78/Technology_MaxDiff_Design.csv
Import the data file using File > Data Sets > Add to Project. If you
wish to replicate the examples in this book, you can download the data
from here: https://wiki.q-researchsoftware.com/images/f/f1/Technology_2017.sav
[6] https://docs.displayr.com/wiki/Setting_Up_a_MaxDiff_Experiment_as_a_Ranking
Estimating the model
Select Create > Marketing > MaxDiff > Latent Class Analysis, changing Inputs >
EXPERIMENTAL DESIGN > Design source to Variables and:
• Click into the Alternatives box and type “alt”. Then select all the “Alt” variables from the
experimental design file. Be sure to select them in the right order (1 to 5). As this design is a
single-version design, there is no need to specify the version.
• For the RESPONDENT DATA:
o Click in the Best selections field in the Object Inspector (far right), and type Most,
which filters the variables that contain the word “most” in the name. Select the six
variables, being careful to choose them in order.
o Select all the Worst selections by searching for Least.
• Under MODEL:
o Set the Number of Classes to 5.
o Set Questions left out for cross-validation to 2.
Copy and paste the latent class analysis output in the Report, changing the number of classes. Do
this as many times as you like to create different models. Then:
Class parameters
Click on the model in the Report and select Inputs > DIAGNOSTICS > Create > Marketing >
MaxDiff > Diagnostic > Class Parameters Table on the right.
Click on the model in the Report and select Inputs > SAVE VARIABLE(S) > Class Preference
Shares on the right.
Class membership
Click on the model output and select Inputs > SAVE VARIABLE(S) > Class Membership on the
right. This adds a new variable to the project.
Profiling segments
Press Create > Tables > Table, and select a class membership question (e.g. Class memberships from
max.diff) in the Blue Dropdown menu and another question in the Brown Dropdown menu (e.g. Q6
Device ownership).
Software – Displayr
Select + Add a data set in the bottom left and import the experimental design whichever way you
wish. If you want to replicate the examples in this book, you can download the design from here:
https://wiki.q-researchsoftware.com/images/7/78/Technology_MaxDiff_Design.csv
Click the + button under Data Sets in the bottom left and import the data file. If you wish to replicate
the examples in this book, you can download the data from here:
https://wiki.q-researchsoftware.com/images/f/f1/Technology_2017.sav
Estimating the model
Choosing the number of classes
Copy and paste the page with a latent class analysis output on it, changing the number of classes. Do
this as many times as you wish to produce new models. Then:
• Select Anything > Advanced Analysis > MaxDiff > Compare Models
• Select your MaxDiff models in Inputs > Input Models
Class parameters
Click on the model output, and, in the object inspector (right-side of the screen), select Inputs >
DIAGNOSTICS > Class Parameters Table (you may have to scroll down to see this option).
Click on the model output, and, in the object inspector (right-side of the screen), select Inputs >
DIAGNOSTICS > Class Preference Shares Table (you may have to scroll down to see this option).
Class membership
Click on the model output, and, in the object inspector (right-side of the screen), select Inputs >
SAVE VARIABLE(S) > Class Membership (you may have to scroll down to see this option).
Profiling segments
On a new page, press the Table button, and drag across a class membership variable (e.g. class
memberships from max.diff) from Data to the Rows box and Q6 Device ownership to Columns.
Hierarchical Bayes
Hierarchical Bayes is the state-of-the-art technique
for estimating coefficients for respondents in a
MaxDiff study.
Hierarchical Bayes for MaxDiff is a similar idea to latent class analysis. There is no accurate way of
describing how it works without getting into the math in some detail, but a simple way of understanding
it is to contrast it with latent class analysis:
• Latent class analysis essentially pools the data of respondents together by assuming that
there are a small number of different types of respondents.
• Hierarchical Bayes pools the data together by effectively assuming that there are an infinite
number of segments, with the diversity between respondents described, a priori, by a
multivariate normal distribution.
When computing coefficients for each respondent, both models can be understood as assigning each
respondent coefficients that are a weighted average of those of other, similar respondents. As the
hierarchical Bayes model assumes an infinite number of segments, in practice it is able to approximate
the data of each respondent more closely, even though the multivariate normal assumption is not really
a good description of the true variation in the population.
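In symbols, this a priori assumption can be sketched as follows (a simplified statement: the full model also places priors on the mean vector and covariance matrix):
\beta_i \sim \mathcal{N}(\mu, \Sigma), \qquad i = 1, \dots, n
where \beta_i is the vector of coefficients for respondent i, \mu is the population mean of the coefficients, and \Sigma is their covariance matrix.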
As you can see from the output below, the resulting histograms of coefficients for the respondents are
clearly not overly influenced by the normality assumption (as none of these distributions is normal).
Respondent-level coefficients
Coefficients for each respondent, also known as respondent-level parameters, can be extracted from
both hierarchical Bayes and latent class analysis models. You may recall that two chapters earlier we
analyzed the data for respondent 1 and concluded that the data showed that:
Microsoft > Google > Samsung = Dell > Apple, Intel, Sony, Yahoo > Nokia > IBM
Samsung is in blue because it was more of a guesstimate. The estimated coefficients for the
respondent are broadly consistent with this ordering, although we see Samsung being a little more
preferred than Google, and Yahoo being more negatively positioned than Nokia. This is because
where there is limited information, the analysis borrows preferences from other respondents.
Preference shares
Preference shares can also be estimated for each respondent. The following donut plot shows the
average preference shares.
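A respondent's preference shares are obtained by passing their coefficients through a logit (softmax) transformation, which is the same prop.table(exp(...)) calculation shown later in this chapter. A minimal sketch for one respondent, using made-up coefficient values:
# Made-up coefficients for one respondent (illustration only, not real output).
coefs <- c(Apple = 1.2, Google = 0.8, Microsoft = 1.5, IBM = -2.0)
# Logit (softmax) transformation: exponentiate and rescale so the shares sum to 100%.
shares <- exp(coefs) / sum(exp(coefs))
round(100 * shares, 1)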
These are useful for analyses involving other data, such as comparing average preference shares
across groups defined by other variables. For example, the plot below compares the preference
shares by product ownership.
Technical parameters
The software used to estimate hierarchical Bayes has a few technical settings, which are described in
this section. Typically, unless there is a warning to do otherwise (e.g., the maximum tree depth
has been exceeded), there is no need to modify these settings.
Number of classes
As with latent class analysis, it is possible to estimate multiple classes for hierarchical Bayes, where a
multivariate normal distribution is assumed within each class. However, this tends not to make a big
difference. The basic process for selecting the number of classes is to focus only on the cross-
validation accuracy and out-of-sample RLH, as the BIC statistic is problematic when comparing
numbers of classes in this situation and the various managerial issues are irrelevant (as the resulting
classes do not correspond to segments, because the classes overlap).
Iterations
By default, the hierarchical Bayes software described in this book runs for 100 iterations, where each
iteration is an attempt to improve the model. Sometimes this will be too few (i.e., the model has yet to
converge after 100 iterations), and it is appropriate to increase the number of iterations. The software
will automatically give a warning if it thinks that more iterations are required. You can find out more
about this here: https://www.displayr.com/convergence-hb-maxdiff/
Chains
This option specifies how many separate chains (independent analyses) to run, where the chains run
in parallel, given the availability of multiple cores. Increasing the number of chains increases the
quality of the results. It does, however, result in a longer running time if chains are queued up. If using
Q or Displayr, it is recommended to leave this option at its default value of 8, which is optimal for the
R servers that we use.
Maximum tree depth
This is a very technical option. The practical implication is that it should only be changed if a warning
appears indicating that the maximum tree depth has been exceeded. The default value is 10, and a
warning should not appear under most circumstances. If such a warning does appear, try increasing
the maximum tree depth to 12 rather than to a much larger value, as larger values can increase
computation time significantly.
Software – Q
If you have not already created a latent class analysis, follow the steps in the previous chapter (see
Software – Q), but choose Hierarchical Bayes instead of Latent Class Analysis.
If you have already created a latent class analysis, instead click on it and any associated outputs in
the Report and copy-and-paste them. Then, on the latent class analysis output, change Type (under
Inputs > MODEL on the right) to Hierarchical Bayes and set the Number of classes to 1.
The remaining instructions describe how to create the Hierarchical Bayes model if you have first
created a latent class analysis model.
Respondent-level coefficients
Click on the model in the Report and select Inputs > SAVE VARIABLE(S) > Individual-level
Coefficients or Zero-centered Utilities. These options are identical (coefficients is the term that
people from an academic background will use; utilities is the term that former Sawtooth users tend to
use).
Click on the model in the Report and select Inputs > SAVE VARIABLE(S) > Preference Shares.
Donut chart
Press Create > Charts > Visualization > Donut Chart, and click into the Variables box in DATA
SOURCE. Then, select the preference share variables and choose your preferred formatting options
(under Chart).
Ranking plot
Create > Tables > Table and select the preference share variables in the blue dropdown menu and
Q6 in the brown dropdown menu. Then:
• Right-click on the table and select Statistics – Cells, change the statistic to % Column
Share (turning off Averages).
• From Show data as in the toolbar, select Ranking Plot.
• In the Object Inspector (right-side of the screen), select Chart > FORMATTING > Show
Values > Yes - Below.
Preference simulation
The most straightforward way to simulate preference share is to change the preference shares for
brands you wish to exclude to 0, and use % Share, % Column Share, or % Row Share as Statistics
– Cells, as this automatically rebases any values and converts them to a percentage. Then, if
desired, the alternatives can also be deleted from the table underlying the plot (by first converting it to
a table).
To remove an item from the computation of the shares in a table (for example, to remove Apple), the
R code of the preference share variables is edited (do this via the Variables and Questions tab by
right-clicking on the question > Edit R Variable). In the R CODE, replace:
prop.table(exp(as.matrix((flipMaxDiff::RespondentParameters(max.diff)))), 1)
with:
z = prop.table(exp(as.matrix((flipMaxDiff::RespondentParameters(max.diff)))), 1)
z[, 1] = 0  # set the excluded brand's column to zero (column 1 is Apple here)
z
Note that max.diff can be replaced with whatever your model is named, and that we have set
column 1 to zero because it is the column for Apple.
Software - Displayr
If you have not already created a latent class analysis, follow the steps in the previous chapter (see
Software – Displayr), but choose Hierarchical Bayes instead of Latent Class Analysis.
If you have already created a latent class analysis, instead click on the page in the Pages tree that
contains latent class analysis output and select Duplicate. Then, change Type to Hierarchical
Bayes and set the Number of classes to 1.
The remaining instructions describe how to create the hierarchical Bayes model if you have first
created a latent class analysis model.
Respondent-level coefficients
Click on the model output, and, in the object inspector (right-side of the screen), select Inputs >
SAVE VARIABLE(S) > Individual-Level Coefficients or Zero-Centered Utilities. You may have to
scroll down to see these options. These options are identical (coefficients is the term that people from
an academic background will use; utilities is the term that former Sawtooth users tend to use).
Click on the model output, and, in the object inspector (right-side of the screen), select Inputs >
SAVE VARIABLE(S) > Preference Shares (you may have to scroll down to see this option).
Donut chart
On a new page, select Visualization > Donut Chart. For DATA SOURCE (in the Object Inspector)
use the box Variables in 'Data'. Drag across the preference share variables into the Variables box,
and choose your preferred formatting options (under Chart).
Ranking plot
On a new page, select Table and drag across the preference share variables as Rows and Q6 as
Columns. Then:
• In the object inspector, using Statistics – Cells, change the statistic to % Column Share
(turning off Average).
• With the table selected, press Visualization > Ranking Plot.
• In the Object Inspector (right side of the screen), select Chart > FORMATTING > Show
Values > Yes - Below.
Preference simulation
The most straightforward way to simulate preference share is to change the preference shares for
brands you wish to exclude to 0, and use % Share, % Column Share, or % Row Share as
Statistics – Cells, as this automatically rebases any values and converts them to a percentage.
Then, if desired, the alternatives can also be deleted from the table underlying the plot (by first
converting it to a table).
To remove an item from the computation of the shares in a table (for example, to remove Apple), the
R code of the preference share variables is edited. Select a variable from Preference shares from
max.diff and the R CODE will be exposed in the right-hand panel.
Replace:
prop.table(exp(as.matrix((flipMaxDiff::RespondentParameters(max.diff)))), 1)
with:
z = prop.table(exp(as.matrix((flipMaxDiff::RespondentParameters(max.diff)))), 1)
z[, 1] = 0  # set the excluded brand's column to zero (column 1 is Apple here)
z
where max.diff is the name of the model and column 1 is the column for Apple. Drag the preference
shares question onto the page to see the effect.
Software – R
# my.design: the experimental design (e.g., read in from the design file linked earlier).
# my.data: the respondent data (e.g., read in from the data file linked earlier).
library(flipMaxDiff)
HB <- FitMaxDiff(design = my.design,
                 best = my.data[, grep("left", names(my.data))],
                 worst = my.data[, grep("right", names(my.data))],
                 tasks.left.out = 2,
                 is.tricked = TRUE,
                 algorithm = "HB-Stan")
Multivariate analyses of
coefficients
It is common to use respondent-level coefficients as
inputs into other analyses.
It is commonplace for people to analyze coefficients using other multivariate techniques, such as
factor analysis, cluster analysis, regression, and TURF. There are a number of considerations to keep
in mind when doing this:
• The individual-level coefficients are themselves the result of models with assumptions. These
assumptions can have a significant impact on the resulting analyses, as discussed below.
• The coefficients are estimates with noise attached to them. When you conduct another
statistical analysis, using these coefficients as inputs, the level of noise compounds.
• The coefficients are not independent. The precise value of one person’s coefficient for one
alternative is related to the precise values of the others. For example, if the coefficients have
been mean-centered and the first alternative has a coefficient of -1, then the remaining
coefficients will sum to 1. This can lead to obscure errors when applying traditional statistical
techniques (e.g., PCA), as illustrated in the sketch below.
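A tiny illustration of this dependence, using hypothetical mean-centered coefficients for one respondent:
# Hypothetical mean-centered coefficients for one respondent: they sum to
# zero by construction, so the last value is fully determined by the others.
coefs <- c(A = -1.0, B = 0.2, C = 0.3, D = 0.5)
sum(coefs)      # 0
sum(coefs[-1])  # 1, i.e., minus the coefficient of the first alternative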
A two-step approach is widespread in market research, whereby first the coefficients are estimated for
each respondent using hierarchical Bayes, and then either cluster analysis or latent class analysis is
used to create segments. The problem with this two-step approach is that it leads to a compounding
of errors. The solution is instead to use latent class analysis to estimate the coefficients and create
the segments simultaneously.
Hierarchical Bayes (HB) generally achieves a higher predictive accuracy than latent class analysis.
This is because it does not assume a small number of archetypal respondents. If you are doing
segmentation the whole point is to assign people to a small number of groups, so all the predictive
accuracy gains of hierarchical Bayes relative to latent class analysis are likely lost. Furthermore, as
hierarchical Bayes makes a different set of assumptions, and these assumptions are both wrong to
some unknown extent and inconsistent with latent class analysis, it is highly likely that the two-step
approach will be substantially inferior in terms of the true predictive accuracy (i.e., if computed after
forming the segments).
As discussed in our eBook How to do Market Segmentation, it is not always the case that the solution
that is technically best will also be the best from a managerial perspective, so it is often worthwhile to
use both latent class analysis and hierarchical Bayes.
Statistical tests
The assumptions of all standard statistical tests are violated by respondent-level coefficients. As each
person’s coefficient is a weighted average of those of the others, this means that the assumption of
observations being independent (aka random sampling) is never met. Further, the structure of the
dependency between the coefficients is not consistent with any of the common complex sampling
models (e.g., stratification, clustering).
A solution to this is to conduct tests using the coefficients of the archetypal respondents rather than
the respondent-level coefficients. The easiest way to do this is to use Q and set up the MaxDiff as
Ranking questions[7] and then use Q’s automated statistical tests.
The automated significance tests from Q are shown below. Color-coding shows relative performance.
Thus, if we wish to test whether the difference in preference for a particular brand by gender is
significant, we need to remove all the other brands from the analysis (in Q, right-click on the cells and
select Remove; in Displayr, click on them and select Delete; Reset or Undo adds all the categories
back into the analysis). This is shown in the table below. The table on the right below contains only the
data for Intel; it shows that Intel is significantly more preferred, in an absolute sense, among men than
among women.
[7] https://wiki.q-researchsoftware.com/wiki/MaxDiff_Specifications
In each of these tables the automatic significance testing focuses on relativities. To see how the
brands perform in an absolute sense, we need to use gender as a filter rather than as the columns
(when gender is used in the columns, Q interprets this as meaning you want to compare the
genders).
The table output below shows the table with a filter for women and with a Planned Test of Statistical
Significance[8] explicitly comparing Apple with the benchmark value of 0, which reveals that the
absolute performance of Apple among women is significantly below the benchmark rating of 7 out of
10 (from an Anchored MaxDiff study, discussed in the next chapter).
[8] https://wiki.q-researchsoftware.com/wiki/Planned_Tests_Of_Statistical_Significance
Dimension reduction
Dimension reduction approaches like Principal Components Analysis and Factor Analysis are, at a
mathematical level, usually computed from correlations.[9] When we assign coefficients to respondents
we do so with error, and the correlations between the coefficients become biased. If we are using
hierarchical Bayes, it estimates correlations between variables, and these can be used as inputs to
the dimension reduction techniques. To extract these correlations, run the following R code:
library(rstan)
# Extract the posterior draws of the Cholesky factor of the correlation matrix.
cho.mat <- extract(max.diff$stan.fit, pars = "L_omega")[[1]]
# Reconstruct the correlation matrix for each draw (L %*% t(L)) and average over draws.
cor.mat <- matrix(0, dim(cho.mat)[2], dim(cho.mat)[2])
for (i in 1:dim(cho.mat)[1])
    cor.mat <- cor.mat + cho.mat[i, , ] %*% t(cho.mat[i, , ])
cor.mat <- cor.mat / dim(cho.mat)[1]
where max.diff in the code above is the name of the output of a hierarchical Bayes calculation. To run
R code in Q, select Create > R Output, paste the code into the editor on the right-hand side and click
the Calculate button. Running R code in Displayr is the same, except that you begin by clicking the
Calculation button and drawing a box for the calculation on your page.
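The estimated correlation matrix can then be fed into a dimension-reduction routine. As a sketch, principal components can be obtained directly from the eigendecomposition of the cor.mat computed above:
# Principal components of the estimated correlation matrix: the eigenvectors
# are the component loadings and the eigenvalues show how much variance
# each component explains.
pca <- eigen(cor.mat)
round(pca$values / sum(pca$values), 2)  # proportion of variance per component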
If using latent class analysis, high correlations are inevitable due to the math of the model,[10] so using
respondent-level coefficients as an input to dimension reduction will inevitably lead to false
conclusions.
Note that even if you resolve these issues, a practical problem with using correlations is that the
coefficients are not linearly independent, so the correlation matrix can be singular and you will often
get an error message.
[9] Although they can often be equivalently defined in other ways. For example, PCA can be defined relative to the SVD of raw data.
[10] For example, if we have two classes and everybody has a 100% probability of being assigned to one of the two classes, then there will be a correlation of 1 or -1 between every pair of variables that contain coefficients.
Regression
Regression has all the problems described in the previous two sections. There are two solutions to
this: regression can be estimated via a sweep operator[11] using the estimated correlations, or the
regression relationships can be built into the MaxDiff estimation process. It seems unlikely that either
of these approaches is ever worth the hassle.
TURF
Below we describe how to do TURF with MaxDiff, and then provide a caution about its use.
TURF assumes that the data is binary (i.e., 0s and 1s). People like an alternative (1) or not (0).
MaxDiff instead estimates numeric values for each alternative. Consequently, in order to use TURF
we need to first discretize the coefficients (i.e., the utilities).
There are two standard approaches: using rankings or thresholds. With rankings, we assign a 1 to,
say, the two alternatives with the highest coefficients for each respondent. Or the three highest, or
the four highest; it is an arbitrary decision.
With thresholds, we assign a 1 to all coefficients above some threshold (e.g., above the average). The
advantage that this approach has over the rankings is that it allows respondents to vary in terms of
how many 1s they have in the data, and in the real world we would expect such variation to also exist.
[11] James H. Goodnight (1979), “A Tutorial on the SWEEP Operator,” American Statistician 33: 149-58.
For example, if one respondent loves three of 10 alternatives but hates the rest, she will have three
1s, whereas an average respondent will have five 1s. However, the flip side is that where the
threshold is set is completely arbitrary, and even more difficult to explain than choices regarding
rankings.
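A minimal R sketch of both discretization approaches, assuming coefs is a respondents-by-alternatives matrix of respondent-level coefficients (the values below are made up for illustration):
# Hypothetical respondent-level coefficients (zero-centered utilities).
set.seed(1)
coefs <- matrix(rnorm(5 * 4), nrow = 5,
                dimnames = list(NULL, c("A", "B", "C", "D")))
# Ranking approach: flag each respondent's two highest coefficients.
top2 <- t(apply(coefs, 1, function(x) as.integer(rank(-x) <= 2)))
# Threshold approach: flag coefficients above 0 (the average for zero-centered
# utilities); respondents can differ in how many 1s they end up with.
above.average <- (coefs > 0) * 1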
Software - Q
To run the TURF, use Automated > Browse Online Library > TURF > Total Unduplicated Reach
and Frequency. Please see our TURF eBook for more information.
Software - Displayr
• On the right of the screen, select Properties > Transformations > Binary Variables. This
will automatically set values greater than 0 to 1; with the coefficients/utilities, 0 is the
average, so this flags above-average appeal.
• If you wish to use a threshold other than 0, expand out the variable set and click on the first
variable, and then modify the threshold in the R CODE (see the yellow in the screenshot
below).
To run the TURF, use Anything > Advanced Analysis > TURF > TURF Analysis. Please see our
TURF eBook for more information.
There are numerous ways of using MaxDiff outputs in a TURF analysis. Regardless of which
approach you use, there is always a fundamental incompatibility between TURF and MaxDiff. The
point of using MaxDiff with TURF in market research is typically to find the smallest number of
products or offers that appeals to the largest proportion of people. However, MaxDiff does not tell you
anything about the absolute appeal of products. It only tells you about relative appeal.
Consider a TURF study of preferences for Major League Baseball teams with the goal of working out
which are the key teams for a fast food chain to sponsor. Further, assume that you do an international
study without any screeners and use MaxDiff. There is a good chance that the Yankees will end up
having the highest preference share from the MaxDiff, as the most well-known team. The TURF will
then likely have the Yankees as the number one team to sponsor. However, it turns out that among
American baseball teams the Yankees are the most hated,[12] so performing a TURF on the MaxDiff
results will give you precisely the wrong answer.
[12] https://fivethirtyeight.com/features/america-has-spoken-the-yankees-are-the-worst/
Of course, if you were really trying to do a TURF analysis of baseball teams, you would be smart
enough to filter the sample to exclude people who are not fans of the sport; but in a normal MaxDiff
study you do not have this luxury. If you are evaluating alternatives that you know little about – which
is usually why people do MaxDiff – you do not know which people dislike all of them and so run the
risk of making precisely this type of mistake.
The solution to these problems is just to use simpler data. If you know that you need to perform a
TURF, you will get a much better TURF if the input data is much simpler (e.g., a multiple-response
question asking people which of the alternatives they like).
Anchored MaxDiff
The table below shows the preference shares for a MaxDiff experiment in which the attributes being
compared are technology brands.[13]
Preference shares necessarily add up to 100%, so they show relativities. Thus, while a naïve reading
of the data would lead one to conclude that women like Apple more than men do, the data does not
actually tell us this (i.e., it is possible that the men like every single brand more than the women do;
because the analysis is expressed as a percentage, such a conclusion cannot be drawn).
[13] This has been set up as a Ranking question in Q. The preference shares are computed under the assumption of a single-class latent class model and are labeled as Probability % in Q and Displayr.
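A small illustration of why preference shares only show relativities: if every one of the men's coefficients is higher than the women's by the same amount, the resulting shares are identical (hypothetical values):
# Hypothetical coefficients: men like every brand more than women by a
# constant amount, yet the preference shares are identical because the
# logit transformation only reflects relativities.
women <- c(Apple = 1.0, Google = 0.5, IBM = -0.5)
men   <- women + 2
rbind(women = exp(women) / sum(exp(women)),
      men   = exp(men) / sum(exp(men)))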
The table on the left shows the same analysis but in terms of the coefficients. This is also
uninformative, as the coefficients are indexed relative to the first brand, which in this case is
Apple.[14] Thus, that men and women both have a score of 0 for Apple is an assumption of the
analysis rather than an insight (the color-coding appears because the significance test is comparing
the Preference Shares).
[14] Specifying which coefficient is 0 is done in Q and Displayr by dragging the category (on the table) to be the first category.
Types of anchored MaxDiff experiments
There are two common types of anchored MaxDiff experiments: dual-response format, and MaxDiff
combined with rating scales.
Dual-response format
The dual-response format involves following each MaxDiff question with another question asking
something like:
Considering the four features shown above, would you say that...
○ All are important
○ Some are important, some are not
○ None of these are important.
MaxDiff combined with rating scales
Before or after the MaxDiff experiment, the respondent provides traditional ratings (e.g., rates all the
alternatives on a scale from 0 to 10 according to their likelihood of recommending each of the
alternatives).
When to use anchored MaxDiff
When should you use anchored MaxDiff? It is not obvious to us that it is a useful technique (we
discuss why below). It is included in this eBook for two reasons:
• Some people like it.
• One can use Hybrid MaxDiff, which is analyzed in the same way.
What is the problem with anchored MaxDiff? We use MaxDiff in situations where we feel that
traditional questions will give poor data, either due to poor discrimination between respondents or due
to yea-saying biases. However, anchored MaxDiff makes use of the same styles of question that
MaxDiff was invented to avoid, so we end up with the worst of both worlds: we have all the problems
of simple rating scales and all the complexity of MaxDiff.
Anchored MaxDiff is most easily analyzed in Q and Displayr by setting up the data as a Ranking
question in Q or structure in Displayr.[15] If doing this, hierarchical Bayes is not available, but similar
models can be used instead.[16]
[15] See https://wiki.q-researchsoftware.com/wiki/MaxDiff_Specifications
[16] See https://wiki.q-researchsoftware.com/wiki/MaxDiff_Case_Study
The easiest way to think about anchored MaxDiff is as a ranking with ties. If a respondent was given a
question showing options A, B, C, and F, and chose B as most preferred and F as least preferred, this
means that: B > A ≈ C > F.[17]
[17] Alternatively, we could assume that the difference between the appeal of B versus either A and C is equal to the difference between either A and C versus F, as is implicit in the tricked logit model.
Setting up anchored MaxDiff data in Q and Displayr
Note that this is a different model from that used by Sawtooth Software.
When each of the choice questions is set up in Q,[18] a 1 is used for the most preferred item, a -1 for
the least preferred item, 0 for the other items that were shown but not chosen, and NaN for the items
not shown. Thus, B > A ≈ C > F is encoded as:
A      B      C      D      E      F
0      1      0      NaN    NaN    -1
Note that when analyzing this data, Q only looks at the relative ordering, and any other values could
be used instead, if they imply the same ordering.
Anchoring is accommodated in Q by introducing a new alternative. In the case of the dual response,
we will call this new alternative Zero. Consider again the situation where the respondent has been
faced with a choice of A, B, C, and F, and has chosen B as best and F as worst, which leads to B > A
≈ C > F.
The Zero alternative is always assigned a value of 0. The value assigned to the other alternatives is
then relative to these zero values.
[18] https://docs.displayr.com/wiki/Setting_Up_a_MaxDiff_Experiment_as_a_Ranking
All are important
Where all of the items are important this implies that all are ranked higher than the Zero option:
A      B      C      D      E      F      Zero
2      3      2      NaN    NaN    1      0
Some are important, some are not
Where some of the items are important, this implies that the most preferred item must be more
important than Zero and the least preferred item must be less important than Zero, but we do not
know the preference of the remaining items relative to Zero. This is therefore coded as:
A      B      C      D      E      F      Zero
0      1      0      NaN    NaN    -1     0
Note that although this coding implies that A = C = Zero, the underlying algorithm does not explicitly
assume these things are equal. Rather, it simply treats this set of data as not providing any evidence
about the relative ordering of these alternatives.
None of these are important
Where none of the items are important, this implies that all of the items shown are ranked lower than
the Zero option:
A      B      C      D      E      F      Zero
-2     -1     -2     NaN    NaN    -3     0
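Purely to illustrate the encoding (Q constructs this internally when the data is set up as a Ranking question), the three codings of B > A ≈ C > F with the Zero alternative can be written out in R as:
# Encodings of B > A ~ C > F plus the Zero alternative for each of the
# three dual-response answers (NaN = alternative not shown in the question).
alts <- c("A", "B", "C", "D", "E", "F", "Zero")
all.important  <- c( 2,  3,  2, NaN, NaN,  1, 0)
some.important <- c( 0,  1,  0, NaN, NaN, -1, 0)
none.important <- c(-2, -1, -2, NaN, NaN, -3, 0)
coding <- rbind(all.important, some.important, none.important)
colnames(coding) <- alts
coding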
Setting up the combined MaxDiff with ratings in Q
Prior to explaining how to use ratings to anchor the MaxDiff it is useful first to understand how ratings
data can be combined with the MaxDiff experiment without anchoring. Again, keep in mind the
situation where a MaxDiff task reveals that B > A ≈ C > F. Consider a rating question where the six
alternatives are rated, respectively, 9, 10, 7, 7, 7 and 3. Thus, the ratings imply that: B > A > C ≈ D ≈
E > F and this information can be incorporated into Q as just another question in the MaxDiff
experiment:
A      B      C      D      E      F
9      10     7      7      7      3
Anchoring is achieved by using the scale points. We can use some or all of the scale points as
anchors. From an interpretation perspective it is usually most straightforward to choose a specific
point as the anchor value. For example, consider the case where we decide to use a rating of 7 as the
anchor point. We create a new alternative for the analysis which we will call Seven.
In the case of the MaxDiff tasks, as they only focus on relativities, they are set up in the standard way.
Thus, where a MaxDiff question reveals that B > A ≈ C > F, we include the new anchoring alternative
but assign it a value of NaN, as nothing is learned about its relative appeal from this task.
A      B      C      D      E      F      Seven
-2     -1     -2     NaN    NaN    -3     NaN
The setup of the ratings data is then straightforward. It is just the actual ratings provided by
respondents, but with an additional item containing the benchmark value:
A      B      C      D      E      F      Seven
9      10     7      7      7      3      7
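The same idea, written out in R purely to illustrate the two rows (the MaxDiff task is uninformative about the Seven anchor, while the ratings question pins it to 7):
# MaxDiff task: reveals B > A ~ C > F; the Seven anchor is not shown, so NaN.
# Ratings question: the actual ratings, with the anchor fixed at 7.
alts <- c("A", "B", "C", "D", "E", "F", "Seven")
maxdiff.task <- c(-2, -1, -2, NaN, NaN, -3, NaN)
ratings      <- c( 9, 10,  7,   7,   7,  3,   7)
setup <- rbind(maxdiff.task, ratings)
colnames(setup) <- alts
setup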