
Testing for IIA in the Multinomial Logit Model

Simon Cheng
University of Connecticut, Storrs
J. Scott Long
Indiana University, Bloomington

Sociological Methods & Research, Volume 35, Number 4, May 2007, 583-600
© 2007 Sage Publications
DOI: 10.1177/0049124106292361
http://smr.sagepub.com, hosted at http://online.sagepub.com

The multinomial logit model is perhaps the most commonly used regression
model for nominal outcomes in the social sciences. A concern raised by
many researchers, however, is the assumption of the independence of irrele-
vant alternatives (IIA) that is implicit in the model. In this article, the
authors undertake a series of Monte Carlo simulations to evaluate the three
most commonly discussed tests of IIA. Results suggest that the size proper-
ties of the most common IIA tests depend on the data structure for the inde-
pendent variables. These findings are consistent with an earlier impression
that, even in well-specified models, IIA tests often reject the assumption
when the alternatives seem distinct and often fail to reject IIA when the
alternatives can reasonably be viewed as close substitutes. The authors con-
clude that tests of the IIA assumption that are based on the estimation of a
restricted choice set are unsatisfactory for applied work.

Keywords: IIA; independence of irrelevant alternatives; multinomial logit

Authors' Note: Direct all correspondence to Simon Cheng, University of Connecticut, Department of Sociology, Storrs, CT 06269-2068; e-mail: [email protected]. We thank Tim Fry, Mark Harris, David Weakliem, and two reviewers for their comments. For computing code related to this article, see web.uconn.edu/simoncheng/research iia.htm.

The multinomial logit model (MNLM) is perhaps the most commonly used regression model for nominal outcomes. The model is easy to esti-
mate, and interpretation is straightforward, albeit complicated due to the
large number of parameters involved. A concern raised by many researchers
is the assumption of the independence of irrelevant alternatives that is
implicit in the model (e.g., Alvarez and Nagler 1995; Dow and Endersby
2004; Fry and Harris 1996, 1998; Keane 1992; Lacy and Burden 1999;
Mokhtarian and Bagley 2000; Pels, Nijkamp, and Rietveld 2001). The inde-
pendence of irrelevant alternatives (IIA) means that, all else being equal, a person's choice between two alternative outcomes is unaffected by what other choices are available. McFadden's (1974) commonly used example
that illustrates why this assumption can be unrealistic involves a commu-
ter’s choice among modes of transportation. Suppose that a person can tra-
vel to work either by car or by a red bus. Assume that the probability of
each mode of travel is 1/2, so that the odds of taking a car rather than a red bus are (1/2)/(1/2) = 1. IIA requires that if a new alternative becomes available, the
probabilities for the prior choices must adjust in precisely the amount neces-
sary to retain the original odds. Now, suppose that the alternatives are
expanded to include travel on a blue bus, where this bus is identical to the
red bus except for color. We would expect that the probability of taking a
red bus would equal that of taking a blue bus. In such a case, the only way
to maintain the original odds of taking a car versus a red bus would be if a
car is chosen with a probability of 1/3, a red bus 1/3, and a blue bus 1/3. By this logic,
we could effectively eliminate the use of cars by increasing the number of
colors used by bus companies. Obviously, it is more likely that the original
bus riders would divide evenly between taking red and blue buses. But this
more realistic scenario violates the IIA assumption since the odds of a car
versus a red bus would become (1/2)/(1/4) = 2 ≠ 1.
Train (2003) points out that the above example is rather extreme and
unlikely to occur in serious, substantive research. It is also important to
keep in mind that violations of IIA are not inherent in the choices them-
selves. That is to say, for a given set of choices, the IIA property could be
violated for one specification of the independent variables but not in some
other specification. As discussed by McFadden, Train, and Tye (1981), the
IIA property implies that those variables omitted from the model are inde-
pendent random variables in a way that is analogous to the assumption of
independent error terms in the linear regression model.
Two basic types of tests can be used to test for violations of IIA: choice
set partitioning tests and model-based tests. Choice set partitioning tests
compare the results from the full MNLM estimated with all outcomes (i.e.,
choices) to the results from a restricted estimation that includes only some
of the outcomes. IIA holds when the estimated coefficients of the full
model are statistically similar to those of the restricted one. If the test sta-
tistic is significant, the assumption of IIA is rejected, and the conclusion is
that the MNLM is inappropriate. The first test of IIA was proposed by
McFadden et al. (1981). This likelihood ratio test, hereafter MTT, com-
pares the value of the log-likelihood equation from the restricted estima-
tion to the value obtained by substituting the estimates from the full model
into the log-likelihood equation for the restricted estimation. Small and
Hsiao (1985) demonstrated that the MTT test is asymptotically biased and
proposed an alternative likelihood ratio test, known as the Small and Hsiao
test, that eliminates this bias. A third IIA test, proposed by Hausman and
McFadden (1984), compares the estimates from the full and restricted
model. The most commonly used tests are the Hausman and McFadden
(HM) test and the Small and Hsiao (SH) test, which are frequently dis-
cussed in econometrics texts (e.g., Greene 2003; Train 2003) and can be
easily computed using standard software (Zhang and Hoffman 1993).
Model-based tests are computed by estimating a more general model that
does not impose the IIA assumption and testing constraints that lead to
IIA. The most commonly discussed alternative models are multinomial
probit, nested logit, and mixed logit (see Train 2003 for an excellent dis-
cussion of these models). When these alternative models are used, IIA can
be tested by comparing the unrestricted model to a model that imposes
constraints leading to IIA. Unfortunately, these models are computation-
ally more difficult and are less familiar to applied researchers. As a conse-
quence, these tests are rarely seen in substantive applications. In the case
of the multinomial probit, issues of identification also make application
difficult (Keane 1992). These tests are not considered further in our article.
Evaluation of statistical tests typically involves assessment of their size
and power properties. In assessing size properties, the nominal signifi-
cance level of a test (e.g., .05, .10) is compared with the empirical signifi-
cance level in the data structure that does not violate the assumption being
evaluated. The empirical significance level is defined as the proportion
of times that the correct null hypothesis is rejected over a large number of
replications. If the size properties of a test are appropriate, the power of
the test is evaluated by assessing the proportion of times that the test
rejects the null hypothesis using a data structure that violates the assump-
tion. The more powerful the test, the higher the proportion of tests that
detects a violation of the assumption.
In two recent articles, Fry and Harris (1996, 1998) use Monte Carlo
simulations to evaluate six choice set partitioning tests of IIA, including the
MTT, SH, and HM tests. The first article provided evidence that these tests
have poor size properties and that critical values based on asymptotic theory
may be inappropriate. In their second article, they find that the SH test is
oversized and that the HM test is reasonably well sized. Although the MTT
test is found to be undersized, it has the greatest power when using empiri-
cal critical values. These values are the 95th percentile of the test statistics
from 1,000 simulations on samples from a population in which IIA is not
violated. Fry and Harris conclude that multiple tests should be used, that
inference should be based on empirical critical values, and that a size-
adjusted MTT be used. They also point out that their findings from the simu-
lations in the 1998 article to some degree contradict their findings in the
1996 article, a point that we address below.
Our own experience with these tests, reinforced by responding to
researchers who encountered anomalies when using the IIA tests imple-
mented in Stata by Long and Freese (2005), suggests that problems with
IIA tests cannot be corrected with size adjustments or by using alternative
forms of the test. In a variety of substantive applications, we have found
that even with reasonable model specifications, these tests often reject IIA
when the alternatives seem distinct and that they do not reject IIA when
the alternatives can reasonably be viewed as close substitutes. Moreover,
variations of IIA tests applied to the same data using the same model often
provide inconsistent results regarding the violation of IIA in the full model
(see Fry and Harris 1998 for an example). To assess these impressions
more formally, we ran a series of Monte Carlo simulations to evaluate the
MTT, SH, and HM tests. Because our simulations show that the size prop-
erties of these tests are inadequate, we do not consider the power of these
tests. We note, however, that even if a test has adequate size properties, its
power properties could still be poor. See Brooks, Fry, and Harris (1997,
1998) for a discussion of the power properties of IIA tests.
As shown below, our simulations suggest that the size properties of these
IIA tests depend on the data structure for the independent variables. With
some structures, the size properties are reasonable, while in others, they are
extremely inflated. Since in substantive applications it is not possible to
determine if the data structure leads to unreasonable results, we conclude
that the tests are not useful for assessing IIA (see Note 1). We also consider the use of
size-adjusted critical values, as suggested by Fry and Harris (1996, 1998).
Our simulations find that these values depend on the data structure. This
makes the application of the test computationally intensive and largely
impractical. We begin with a formal statement of the MNLM and three IIA
tests. We then describe our simulations and present the results. We conclude
with a discussion of the implications of these findings.

The Multinomial Logit Model

Let y be the dependent variable with J outcomes numbered from 1 to J. Let x be a vector of K independent variables plus a constant for the intercept. The probability of observing outcome m for a given x is

$$\Pr(y = m \mid \mathbf{x}) = \frac{\exp(\mathbf{x}\boldsymbol{\beta}_m)}{\sum_{j=1}^{J}\exp(\mathbf{x}\boldsymbol{\beta}_j)} \quad \text{for } m = 1, \ldots, J. \tag{1}$$

The vector $\boldsymbol{\beta}_m = (\beta_{0m} \cdots \beta_{km} \cdots \beta_{Km})'$ includes the intercept $\beta_{0m}$ and coefficients $\beta_{km}$ for the effect of $x_k$ on outcome m. To identify the model, we assume without loss of generality that $\boldsymbol{\beta}_1 = \mathbf{0}$. The model can also be written in terms of the odds for each pair of options m and n:

$$\Omega_{m|n} = \exp(\mathbf{x}[\boldsymbol{\beta}_m - \boldsymbol{\beta}_n]). \tag{2}$$

Equation (2) shows that the odds of choosing m versus n do not depend on which other outcomes are possible. That is, the odds are determined only by the coefficient vectors for m and n, namely $\boldsymbol{\beta}_m$ and $\boldsymbol{\beta}_n$. This is the independence of irrelevant alternatives property, or simply IIA.
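The following minimal numerical sketch (with illustrative coefficient values of our own choosing, not taken from the article) computes the probabilities in equation (1) for a single observation and checks that the odds of one outcome versus another are unchanged when a third alternative is removed from the choice set, which is the IIA property stated in equation (2).

```python
import numpy as np

# One observation: a constant plus K = 3 covariates.
x = np.array([1.0, 1.5, 0.5, 2.0])
# Illustrative coefficients; row 0 is beta_1 = 0, the identifying restriction.
beta = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.5, -0.3, 0.8, 0.1],
    [-0.2, 0.4, -0.6, 0.3],
])

def mnlm_probs(x, beta):
    """Equation (1): Pr(y = m | x) for every outcome in the choice set."""
    util = beta @ x                      # x * beta_m for each outcome m
    expu = np.exp(util - util.max())     # subtract the max for numerical stability
    return expu / expu.sum()

p_full = mnlm_probs(x, beta)
odds_23_full = p_full[1] / p_full[2]     # odds of outcome 2 versus outcome 3

# Drop outcome 1 from the choice set and renormalize: under IIA the odds
# of 2 versus 3 are unchanged.
p_restricted = mnlm_probs(x, beta[1:])
odds_23_restricted = p_restricted[0] / p_restricted[1]

print(odds_23_full, odds_23_restricted)   # identical up to rounding
print(np.exp(x @ (beta[1] - beta[2])))    # equation (2) gives the same value
```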

Testing IIA

The MNLM can be viewed as the simultaneous estimation of binary logits for all pairs of outcome categories. While efficient estimation of the
model requires that all pairs be estimated simultaneously, which imposes
certain logical constraints among parameters, Begg and Gray (1984) show
that consistent but inefficient estimates can be obtained by estimating a
series of binary logits. For example, an MNLM with three outcomes could
be estimated by estimating two binary logits, the first comparing outcomes
1 to 2 and the second comparing 1 to 3. Choice set partitioning tests of
IIA essentially involve comparing the estimates using all outcomes simul-
taneously to those based on a restricted choice set. We now formally
describe the tests.
The full model is given in equation (1), with estimates $\hat{\boldsymbol{\beta}}_m^f$. The superscript f indicates that the estimates are from the full model that includes all outcomes. The restricted estimation is identical to the full model except that the equation for outcome J is excluded:

$$\Pr(y = m \mid \mathbf{x}) = \frac{\exp(\mathbf{x}\boldsymbol{\beta}_m)}{\sum_{j=1}^{J-1}\exp(\mathbf{x}\boldsymbol{\beta}_j)} \quad \text{for } m = 1, \ldots, J-1, \tag{3}$$

where we assume that $\boldsymbol{\beta}_1 = \mathbf{0}$. While we have dropped outcome J, any other outcome could have been dropped. Under IIA, estimates $\hat{\boldsymbol{\beta}}_m^r$ from the restricted choice set are consistent but inefficient, while estimates $\hat{\boldsymbol{\beta}}_m^f$ from the full model are consistent and efficient. The various tests of IIA involve comparing the estimates from the full model to those from the restricted estimation. To define these tests, the estimates from the restricted choice set are stacked in the vector $\hat{\boldsymbol{\beta}}^r = (\hat{\boldsymbol{\beta}}_2^{r\prime} \cdots \hat{\boldsymbol{\beta}}_{J-1}^{r\prime})'$, with the corresponding estimates from the full model $\hat{\boldsymbol{\beta}}^f = (\hat{\boldsymbol{\beta}}_2^{f\prime} \cdots \hat{\boldsymbol{\beta}}_{J-1}^{f\prime})'$. Note that $\hat{\boldsymbol{\beta}}^f$ does not include $\hat{\boldsymbol{\beta}}_J^f$ since it was not estimated in the restricted estimation.

McFadden, Train, and Tye Test


The approximate likelihood ratio test of IIA proposed by McFadden et al. (1981) is defined as

$$\mathrm{MTT} = -2\left[L^r\!\left(\hat{\boldsymbol{\beta}}^f\right) - L^r\!\left(\hat{\boldsymbol{\beta}}^r\right)\right],$$

where $L^r$ is the log-likelihood function for the restricted estimation. Quite simply, the test compares the value of the log-likelihood equation from the restricted estimation to the value obtained by plugging estimates from the full model into the log-likelihood from the restricted model. When IIA holds, MTT is distributed as chi-square with degrees of freedom equal to the number of rows in $\hat{\boldsymbol{\beta}}^r$.
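A sketch of the MTT computation is given below. The helper function, argument names, and array layout are our own conventions (the article's computations were carried out in Stata); the statistic itself follows the definition above, evaluating the restricted-choice-set log likelihood at both the full-model and the restricted-model estimates.

```python
import numpy as np
from scipy import stats

def mnl_loglik(beta_stack, X, y, categories):
    """MNLM log likelihood over the listed categories; the first category is the base.

    beta_stack has one column of coefficients per non-base category and one row
    per column of X (which includes the constant).
    """
    keep = np.isin(y, categories)
    Xk, yk = X[keep], y[keep]
    util = np.column_stack([np.zeros(len(Xk))] + [Xk @ b for b in beta_stack.T])
    util -= util.max(axis=1, keepdims=True)
    logp = util - np.log(np.exp(util).sum(axis=1, keepdims=True))
    cat_index = {c: i for i, c in enumerate(categories)}
    idx = np.array([cat_index[v] for v in yk])
    return logp[np.arange(len(yk)), idx].sum()

def mtt_test(b_full, b_restricted, X, y, restricted_categories):
    """MTT = -2 [ L^r(b_full) - L^r(b_restricted) ]; chi-square under IIA.

    Both coefficient stacks cover only the outcomes kept in the restricted set.
    """
    lr_at_full = mnl_loglik(b_full, X, y, restricted_categories)
    lr_at_rest = mnl_loglik(b_restricted, X, y, restricted_categories)
    stat = -2.0 * (lr_at_full - lr_at_rest)
    df = np.asarray(b_restricted).size
    return stat, stats.chi2.sf(stat, df)
```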

Small and Hsiao Test


Small and Hsiao (1985) show that MTT is asymptotically biased toward accepting the null hypothesis, which has been empirically confirmed in studies such as Fry and Harris (1996, 1998). Small and Hsiao proposed a modified version of MTT to avoid this bias. First, the sample is randomly divided into subsamples A and B of roughly equal size. The full model from equation (1) is estimated on both subsamples, with estimates contained in $\hat{\boldsymbol{\beta}}_A^f$ and $\hat{\boldsymbol{\beta}}_B^f$. The weighted average of the coefficients from the two samples is defined as

$$\hat{\boldsymbol{\beta}}_{AB}^{f} = \frac{1}{\sqrt{2}}\,\hat{\boldsymbol{\beta}}_A^f + \left(1 - \frac{1}{\sqrt{2}}\right)\hat{\boldsymbol{\beta}}_B^f.$$

A restricted subsample is created from subsample B by eliminating all cases with a given value of the dependent variable—in our case, category J. The restricted choice set is estimated using the restricted subsample, yielding the estimates $\hat{\boldsymbol{\beta}}_B^r$ with the likelihood function $L^r$. The Small-Hsiao statistic is

$$\mathrm{SH} = -2\left[L^r\!\left(\hat{\boldsymbol{\beta}}_{AB}^{f}\right) - L^r\!\left(\hat{\boldsymbol{\beta}}_B^r\right)\right].$$

SH is asymptotically distributed as chi-square with degrees of freedom equal to the number of parameters in the restricted choice set.
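In code, the Small-Hsiao statistic reduces to the weighted average and a likelihood comparison, as in the following sketch; the argument names are ours, and loglik_restricted stands for the restricted-choice-set log likelihood evaluated on subsample B (for instance, the mnl_loglik helper above applied to that subsample).

```python
import numpy as np
from scipy import stats

def small_hsiao(b_A_full, b_B_full, b_B_restricted, loglik_restricted):
    """SH = -2 [ L^r(b_AB) - L^r(b_B_restricted) ]; chi-square under IIA."""
    w = 1.0 / np.sqrt(2.0)
    b_AB = w * b_A_full + (1.0 - w) * b_B_full   # weighted average of half-sample estimates
    stat = -2.0 * (loglik_restricted(b_AB) - loglik_restricted(b_B_restricted))
    df = np.asarray(b_B_restricted).size
    return stat, stats.chi2.sf(stat, df)
```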

Hausman and McFadden Test


Hausman and McFadden (1984) proposed a Hausman test (Hausman 1978) that compares the estimates $\hat{\boldsymbol{\beta}}^f$, which are consistent and efficient if the null hypothesis is true, to the consistent but inefficient estimates $\hat{\boldsymbol{\beta}}^r$. The HM test is defined as

$$\mathrm{HM} = \left(\hat{\boldsymbol{\beta}}^r - \hat{\boldsymbol{\beta}}^f\right)'\left[\widehat{\mathrm{Var}}\!\left(\hat{\boldsymbol{\beta}}^r\right) - \widehat{\mathrm{Var}}\!\left(\hat{\boldsymbol{\beta}}^f\right)\right]^{-1}\left(\hat{\boldsymbol{\beta}}^r - \hat{\boldsymbol{\beta}}^f\right),$$

where $\widehat{\mathrm{Var}}(\hat{\boldsymbol{\beta}}^r)$ and $\widehat{\mathrm{Var}}(\hat{\boldsymbol{\beta}}^f)$ are the estimated covariance matrices. If IIA holds, HM is asymptotically distributed as chi-square with df equal to the number of rows in $\hat{\boldsymbol{\beta}}^r$. Significant values of HM indicate that the IIA assumption has been violated. Hausman and McFadden (1984:1226) note that HM can be negative if $\widehat{\mathrm{Var}}(\hat{\boldsymbol{\beta}}^r) - \widehat{\mathrm{Var}}(\hat{\boldsymbol{\beta}}^f)$ is not positive semidefinite, but they conclude that this is evidence that IIA holds. We use this decision rule in the results we present below.
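A sketch of the HM computation follows, assuming the stacked estimates and their estimated covariance matrices are available. Using a generalized inverse for the (possibly ill-conditioned) covariance difference is a common practical choice, not something prescribed by the article.

```python
import numpy as np
from scipy import stats

def hausman_mcfadden(b_r, b_f, V_r, V_f):
    """HM = (b_r - b_f)' [Var(b_r) - Var(b_f)]^{-1} (b_r - b_f).

    b_r and b_f exclude the coefficients for the outcome dropped from the
    restricted choice set; V_r and V_f are the matching covariance matrices.
    """
    diff = b_r - b_f
    V = V_r - V_f                                  # need not be positive semidefinite
    stat = float(diff @ np.linalg.pinv(V) @ diff)
    df = np.asarray(b_r).size
    # Hausman and McFadden (1984) treat a negative statistic as evidence for IIA;
    # the simulations below record it as 0 with a p value of 1.
    if stat < 0:
        return 0.0, 1.0
    return stat, stats.chi2.sf(stat, df)
```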

Alternative Forms of Each Test


Multiple variants of each test are created by eliminating different alter-
natives to create the restricted choice set. For example, if we use a
restricted estimation that excludes a single category, as we do in our simu-
lations, there are J versions of each test. Version 1 excludes the first cate-
gory to create the restricted estimation, version 2 excludes the second
category, and so on. While the resulting tests are asymptotically equiva-
lent, results can differ substantially in finite samples, as shown below.

Generation of Data

To examine the size properties of the MTT, HM, and SH tests, we conducted Monte Carlo simulations using eight artificial data sets in which the
and categorical covariates, with different degrees of collinearity among the
covariates, different values of the βs, and small cells in the cross-tabulation
between the outcome variable and dichotomous covariates. For each data
structure, we generated 150,000 observations with a three-category outcome variable and three independent variables. The independent variables were
constructed as follows.

1. x1 is drawn from a uniform distribution on the interval from 1 to 2.


2. x2c is a continuous variable constructed by adding the uniform random vari-
able used for x1 to a normal random variable. The relative weights for the
uniform random variable and the normal random variable varied across data
sets to change the amount of collinearity between x1 and x2 . To create cate-
gorical covariates and sparse cells in the cross-tabulation between outcome
y and categorical covariates, we dichotomized x2c to create the binary vari-
able x2d . These are further discussed later.
3. x3 is a skewed variable constructed by adding a random variable drawn from
a chi-square distribution with one degree of freedom and the uniform ran-
dom variable used to construct x1 . Again, we varied the relative weights.

The outcome y was constructed as follows.

1. Select values for the βs in equation (1).


2. Compute predicted probabilities for each of 150,000 observations using the
probability equation
$$\Pr(y = m \mid \mathbf{x}) = \frac{\exp(\beta_{m0} + \beta_{m1}x_1 + \beta_{m2}x_2 + \beta_{m3}x_3)}{\sum_{j=1}^{3}\exp(\beta_{j0} + \beta_{j1}x_1 + \beta_{j2}x_2 + \beta_{j3}x_3)} \quad \text{for } m = 2, 3, \tag{4}$$

where Pr(y = 1 | x) = 1 − Pr(y = 2 | x) − Pr(y = 3 | x).


3. Generate a uniform random number on the interval from 0 to 1 for each observation in each data set. If this random number is less than Pr(y = 1) computed with equation (4), then y = 1. If the number is between Pr(y = 1) and Pr(y = 1) + Pr(y = 2), then y = 2; otherwise, y = 3. A sketch of these steps follows.
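The data-generating steps can be sketched as follows. The mixing weights, the dichotomization cut point, and the β values shown are illustrative assumptions on our part; the article varies them across the eight data structures and does not list them here.

```python
import numpy as np

rng = np.random.default_rng(2006)
N = 150_000

x1 = rng.uniform(1.0, 2.0, N)                      # step 1: uniform on [1, 2]
x2c = 0.7 * x1 + 0.3 * rng.normal(size=N)          # step 2: weights control collinearity with x1
x2d = (x2c > np.median(x2c)).astype(int)           # dichotomized version for the binary-x2 data sets
x3 = 0.4 * x1 + 0.6 * rng.chisquare(df=1, size=N)  # step 3: skewed covariate

X = np.column_stack([np.ones(N), x1, x2c, x3])
beta = np.array([[0.0, 0.0, 0.0, 0.0],             # outcome 1 (reference)
                 [0.4, 0.5, -0.3, 0.2],            # outcome 2 (assumed values)
                 [-0.3, -0.2, 0.4, 0.1]])          # outcome 3 (assumed values)

util = X @ beta.T                                  # linear predictors in equation (4)
util -= util.max(axis=1, keepdims=True)
probs = np.exp(util) / np.exp(util).sum(axis=1, keepdims=True)

u = rng.uniform(size=N)                            # outcome step 3: compare a uniform draw
cum = np.cumsum(probs, axis=1)                     # with the cumulative probabilities
y = 1 + (u[:, None] > cum).sum(axis=1)             # y takes values 1, 2, or 3
```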

In Data Sets 1, 2, and 3, all of the xs are continuous. These data sets
differ in the degree of collinearity among the xs, with the maximum corre-
lations ranging from .62 in Data Set 1 to .82 in Data Set 3. Data Set 4 was
created by dichotomizing x2 with 47 percent of the cases equal to 1. Data
Sets 5 through 8 are discussed in the ‘‘IIA Tests in Data With Sparse
Cells’’ section. Table 1 summarizes the data sets used in our simulations.

Design of Simulations

For each data set, simulations were run for sample sizes of n = 150, 250, 350, 500, 1,000, and 2,000 (see Note 2). The simulations involved these steps:
Table 1
Descriptive Statistics for Data Sets Used in Simulations

                Percentage in             Means            Correlations
Data Set    y=1    y=2    y=3      x1     x2     x3     rx1x2  rx1x3  rx2x3
    1       21.6   57.7   20.8    1.50   1.50   1.60    .82    .56    .46
    2       15.6   69.0   15.4    1.50   2.00   2.10    .89    .72    .63
    3       12.1   76.9   11.1    1.50   2.49   2.60    .92    .81    .74
    4       41.0   33.9   25.1    1.50   0.47   1.60    .73    .56    .41
    5       30.7   40.9   28.4    1.50   0.88   1.60    .45    .56    .26
    6       15.6   70.1   14.4    1.50   0.88   1.60    .45    .56    .26
    7       32.8   34.2   33.0    1.50   0.88   1.60    .46    .56    .26
    8       48.8   25.1   26.1    1.50   0.88   1.60    .45    .56    .26

1. Draw a random sample of size N with replacement from the population.


2. For this sample, estimate the MNLM with outcome y and predictors x1, x2, and x3.
3. Using estimates from Step 2, compute three variations of the MTT, HM, and
SH tests, excluding the first category for the restricted estimation, the second
category, and the third. The test statistics and p values are saved for later
analysis.

These steps were repeated 500 times for each sample size in each data set.
To determine the empirical size for each test, we computed the percentage
of times that each test rejected the null hypothesis that IIA held in the
population at the .05 and .10 levels of significance. Since the results at the
.10 level are consistent with those at the .05 level, they are not reported.
For the HM test, we used Hausman and McFadden's (1984) suggestion that negative chi-squared values be recorded as 0 with the corresponding p value of 1 (see Note 3).
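The simulation loop itself reduces to repeatedly resampling from the population, applying an IIA test to each sample, and recording the rejection rate, as in the sketch below. Here iia_test_pvalue is a placeholder for any one version of the MTT, SH, or HM tests applied to a single sample; the names and structure are ours, not the authors' Stata code.

```python
import numpy as np

def empirical_size(X_pop, y_pop, iia_test_pvalue, n, reps=500, level=0.05, seed=0):
    """Share of replications in which a correct null of IIA is rejected."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        idx = rng.integers(0, len(y_pop), size=n)    # sample of size n with replacement
        p = iia_test_pvalue(X_pop[idx], y_pop[idx])  # fit the MNLM and test IIA on this sample
        rejections += (p < level)
    return rejections / reps                         # close to `level` for a well-sized test
```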
Our analysis begins by examining the three IIA tests using the first four
data structures and shows that the size properties are affected by the
amount of collinearity and depend on which version of the test is used.
Because the undersized properties of the MTT test are highly consistent
with those suggested in earlier research, we only present the results for the
HM and SH tests. While the SH test has seemingly reasonable size proper-
ties with samples of 500 or more in data structures with different degrees
of collinearity, we show that the presence of sparse cells can lead to severe
size distortion for sample sizes up to 2,000, the largest we present. Using
these findings as a guide, we consider the MTT and illustrate the practical
problems with using empirical critical values.

IIA Tests in Data With Varying Degrees of Collinearity

The results of the simulations for the HM test are presented in Figure 1,
which shows the percentage of times the HM test rejected the null hypoth-
esis of no violation of the IIA assumption using the .05 level (see Note 4). For each data
structure, three versions of the HM test were computed, excluding either the
first, second, or third outcome category. The percentage listed in the title for
the graph using Data Set 4 indicates that 10.6 percent of the cases were
found in the smallest cell of the cross-tabulation between y and x2 . The
numbers on the lines within each graph indicate the deleted category for the
test being presented. The results illustrate that the HM test does not reliably
converge to its appropriate size even when the sample is 2,000. Second, the
properties of the test depend on which outcome category is deleted in the
restricted estimation. For example, in Data Set 2, the test approaches its
nominal .05 level when Category 2 is excluded but levels off around .15
when Category 1 is excluded. In a substantial proportion of the samples, the
resulting HM test was negative. Even with a sample size of 1,000, 21 to 49
percent of the test statistics were negative. Overall, our results indicate that
the HM test is not a viable test for assessing IIA.
As shown in Figure 2, the SH test approximates its nominal size as the
sample increases to 500 or 1,000. The magnitude of departures from the
nominal size and the sample size at which these distortions are largely
removed depends on the degree of collinearity in the data. For example,
with high collinearity, the size properties are quite poor with samples
smaller than 500 and require a sample of at least 1,000 before they are
nearly eliminated. We also found evidence of a practical problem that is
often encountered when applying these tests with real-world data. There
are six ways to compute the SH test in our example. Each outcome cate-
gory can be the base category in the MNLM used to compute the test. For
each base category, there are two variations of the test, depending on
which nonbase category is removed. While using Category 1 as the base
category when excluding Category 3 is the same model as using Category
2 as the base when excluding Category 3, the results from the SH test will
differ due to their dependence on a particular draw of random numbers. In
more than 33 percent of samples of 500, at least one of the six possible SH
tests provided inconsistent conclusions compared to the other tests. Even in samples of 1,000, inconsistencies were found in 28 percent of the samples. Even greater problems were encountered when we explored data structures with sparse cells.

Figure 1
Size Properties of the Hausman-McFadden Test of the Independence of Irrelevant Alternatives
(Four panels: Data 1, low collinearity; Data 2, medium collinearity; Data 3, high collinearity; Data 4, binary x2 with 10.6 percent in the smallest cell. Each panel plots the percentage of samples in which each version of the test, identified by its excluded category 1, 2, or 3, rejects IIA against sample sizes from 0 to 2,000.)

IIA Tests in Data With Sparse Cells

In our early experiments with a variety of data structures, we occasionally obtained results that showed severe size distortion, such as illustrated
in Figure 3 for Data Structures 7 and 9. In these cases, the size distortion
for the SH test increased with sample size for some variations of the test,
and there were substantial differences in the percentage of times different versions of the test rejected the true null. Further analysis revealed that these results were due to the presence of sampling zeros in the cross-tabulation between the outcome y and the dichotomous variable x2d. This problem is similar to the size distortion for the likelihood chi-squared statistics in contingency tables with sparse cells (Larntz 1978).

Figure 2
Size Properties of the Small-Hsiao Test of the Independence of Irrelevant Alternatives
(Four panels: Data 1, low collinearity; Data 2, medium collinearity; Data 3, high collinearity; Data 4, binary x2 with 10.6 percent in the smallest cell. Each panel plots the percentage of samples in which each version of the test, identified by its excluded category, rejects IIA against sample sizes from 0 to 2,000.)
To explore this finding more systematically, we constructed four data sets in which the percentage of cases in the smallest cell of the cross-tabulation of y and x2 varied from 1.8 percent to less than .1 percent. These small percentages could easily occur in data where one of the independent variables indicates membership in an underrepresented group, where an outcome category has few cases, or where a combination of multiple binary variables makes some outcome category rare for certain combinations of independent characteristics.

Figure 3
Illustration of Severe Size Distortion
(Two panels, HM test and SH test, plot the percentage of samples in which each version of the test, identified by its excluded category, rejects IIA against sample sizes from 0 to 2,000. Note: HM = Hausman-McFadden; SH = Small-Hsiao.)
In drawing small samples from data structures in which there were
sparse cells, it was common to draw a sample in which there was a zero in
the y × x2 table. In such cases, the MNLM can be estimated, but a singularity occurs in $\widehat{\mathrm{Cov}}(\hat{\boldsymbol{\beta}})$. A researcher who encounters this situation when
building a model is likely to respecify the model to remove the singularity,
either dropping one of the independent variables or collapsing categories.
We adapted our simulations to reflect this scenario. If a zero cell was
encountered, we drew a replacement sample. Figure 4 presents the results
of our simulations for the SH test in data sets with sparse cells. Again, the
percentages listed in the title for each graph indicate the percentage of
cases in the smallest y × x2 cell in the population data structure. The per-
centage of tests that reject the null depends greatly on the excluded cate-
gory. In some cases, the tests have extreme size distortion, rejecting the
correct null 50 percent of the time, even with samples of 1,000. In supple-
mentary analyses, we extended the simulations to larger sample sizes (6,000, 8,000, and 10,000) and restricted the analysis to random samples with at least five observations in the smallest cell. In both cases, the size distortion persisted, again confirming our early finding that the size properties of the IIA tests are highly dependent on the data structure for the independent variables. The results for the HM test (not shown) are similar to those for Data Structures 1 through 4: The size of the test does not converge as the sample size increases, and the percentage rejected depends on the category excluded in the restricted model. These findings could explain why Fry and Harris (1996, 1998) found contradictory results for the size properties of the HM and SH tests in their two simulations.

Figure 4
Size Properties of the Small-Hsiao Test of the Independence of Irrelevant Alternatives in the Presence of Sparse Cells
(Four panels: Data 5, binary x2 with 1.8 percent in the smallest cell; Data 6, .64 percent; Data 7, .10 percent; Data 8, .02 percent. Each panel plots the percentage of samples in which each version of the test, identified by its excluded category, rejects IIA against sample sizes from 0 to 2,000.)

Table 2
Empirical Critical Values by Data Structure for n = 500

              Excluded Category
Data Set       1       2       3
    1        0.16    1.03    0.70
    2        0.11    1.59    0.69
    3        0.08    1.81    0.70
    4        0.68    0.25    0.23
    5        0.29    0.13    0.13
    6        0.26    1.19    0.09
    7        0.87    0.24    0.14
    8        0.31    0.19    0.14

Empirical Critical Values for the MTT Test

Fry and Harris (1998) explored the use of size-adjusted tests. For these
tests, the critical value is set to be the 95th percentile of the test statistic
computed in the simulation. Based on power, they recommend the MTT as
the preferred test. They state, ‘‘Furthermore, where possible, we would
recommend that a simulation experiment be conducted to obtain empirical
(size-corrected) critical values for use in inference concerning the IIA
property’’ (p. 419). Our results suggest that the sampling distribution of
MTT is highly dependent on the data structure. For example, Table 2
shows the empirical critical values generated for the MTT test in our eight
data structures. Even though Structures 1 through 3 are very similar, dif-
fering only in their degree of collinearity, the values differ substantially
relative to the small variances in the distributions of the MTT tests (e.g.,
for Data Set 1, the standard deviations for the three tests are .03, .36, and
.21). The variability in the computed empirical critical values suggests
that the MTT test may not be effective even with size adjustments.
Furthermore, even if a researcher decides that the size-adjusted MTT test
is appropriate, we caution that Fry and Harris’s advice requires that
researchers obtain the empirical critical values from their own simulations
using their data. We believe that this makes the size-adjusted MTT imprac-
tical in most substantive applications.
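For concreteness, the following sketch shows how such empirical critical values would be obtained for one data structure, sample size, and test version; mtt_statistic is a placeholder for the MTT computation applied to a single sample. As noted above, the exercise has to be repeated by each researcher with their own data-generating structure, which is what makes the size-adjusted test impractical.

```python
import numpy as np

def empirical_critical_value(X_pop, y_pop, mtt_statistic, n, reps=1000,
                             percentile=95, seed=0):
    """95th percentile of the MTT statistic over samples drawn under IIA."""
    rng = np.random.default_rng(seed)
    stats_under_null = np.empty(reps)
    for r in range(reps):
        idx = rng.integers(0, len(y_pop), size=n)
        stats_under_null[r] = mtt_statistic(X_pop[idx], y_pop[idx])
    # Reject IIA in the observed data only if its MTT exceeds this value.
    return np.percentile(stats_under_null, percentile)
```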

Conclusion

Our overall conclusion, based on the simulations shown above and our
evaluation of other data structures, is that tests of the IIA assumption that
are based on the estimation of a restricted choice set are unsatisfactory for
applied work. The Hausman-McFadden test shows substantial size distor-
tion that is unaffected by sample size in our simulations. The Small-Hsiao
test has reasonable size properties in some data sets but has severe size dis-
tortion even in large samples when there are sparse cells in the table of the
outcome variable with a binary independent variable. While our simula-
tions are based on relatively simple models with three outcomes and three
independent variables, we suspect that simulations using more complex
models that more closely approximate real-world models would uncover
additional problems with these tests. Furthermore, even if a researcher
decided to use these tests, the problem of inconsistent results based on dif-
ferent variations of the test is likely. The MTT test with empirically based
critical values, as suggested by Fry and Harris (1996, 1998), also has lim-
itations that make its use impractical in substantive applications.
Overall, it appears that the best advice regarding concern about IIA
goes back to an early statement by McFadden (1974), who wrote that the
multinomial and conditional logit models should only be used in cases
where the outcome categories ‘‘can plausibly be assumed to be distinct
and weighed independently in the eyes of each decision maker.’’ Similarly,
Amemiya (1981:1517) suggests that the MNLM works well when the
alternatives are dissimilar. Care in specifying the model to involve distinct
outcomes that are not substitutes for one another seems to be reasonable,
albeit unfortunately ambiguous, advice. The generalized extreme value
(e.g., nested logit, paired combinatorial logit, etc.) and mixed logit model
(see Train 2003) show great promise for models that do not impose the IIA
assumption but require intensive calculation to estimate and involve more
complicated data structures.

Notes

1. While our simulations are based on the multinomial logit model, the results for the IIA
tests should also apply to the conditional logit model.
2. We also ran simulations with sample sizes of 200, 300, 400, and 450. The results were
consistent with those presented in our figures.
3. Supplementary analyses suggest that 20 to 60 percent of the resulting chi-square values
from the Hausman-McFadden test were negative, but the incidence decreases as sample size
increases. There is no clear relationship between the type of data structure and the percentage
of tests with negative chi-square values.
4. The scales of the figures are fixed to make comparisons across figures easier.

References
Alvarez, R. Michael and Jonathan Nagler. 1995. ‘‘Economics, Issues and the Perot Candi-
dacy: Voter Choice in the 1992 Presidential Election.’’ American Journal of Political
Science 39:714-44.
Amemiya, Takeshi. 1981. ‘‘Qualitative Response Models: A Survey.’’ Journal of Economic
Literature 19:1483-1536.
Begg, Colin B. and Robert Gray. 1984. ‘‘Calculation of Polychotomous Logistic Regression
Parameters Using Individualized Regressions.’’ Biometrika 71:11-8.
Brooks, Robert D., Tim R. L. Fry, and Mark N. Harris. 1997. ‘‘The Size and Power Properties
of Combining Choice Set Partition Tests for the IIA Property in the Logit Model.’’ Journal
of Quantitative Economics 13:45-61.
———. 1998. ‘‘Combining Choice Set Partition Tests for IIA: Some Results in the Four
Alternative Setting.’’ Journal of Quantitative Economics 14:1-9.
Dow, Jay K. and James W. Endersby. 2004. ‘‘Multinomial Probit and Multinomial Logit: A
Comparison of Choice Models for Voting Research.’’ Electoral Studies 23:107-22.
Fry, Tim R. L. and Mark N. Harris. 1996. ‘‘A Monte Carlo Study of Tests for the Indepen-
dence of Irrelevant Alternatives Property.’’ Transportation Research Part B: Methodolo-
gical 30:19-30.
———. 1998. ‘‘Testing for Independence of Irrelevant Alternatives: Some Empirical
Results.’’ Sociological Methods & Research 26:401-23.
Greene, William H. 2003. Econometric Analysis. 5th ed. New York: Prentice Hall.
Hausman, Jerry A. 1978. ‘‘Specification Tests in Econometrics.’’ Econometrica 46:1251-71.
Hausman, Jerry A. and Daniel McFadden. 1984. ‘‘Specification Tests for the Multinomial
Logit Model.’’ Econometrica 52:1219-40.
Keane, Michael P. 1992. ‘‘A Note on Identification in the Multinomial Probit Model.’’ Journal
of Business and Economic Statistics 10:193-200.
Lacy, Dean and Barry C. Burden. 1999. ‘‘The Vote-Stealing and Turnout Effects of Ross
Perot in the 1992 U.S. Presidential Election.’’ American Journal of Political Science
43: 233-55.
Larntz, Kinley. 1978. ‘‘Small Sample Comparisons of Exact Levels of Chi-Squared Goodness-
of-Fit Statistics.’’ Journal of the American Statistical Association 73:253-63.
Long, J. Scott and Jeremy Freese. 2005. Regression Models for Categorical Dependent Vari-
ables Using Stata. 2nd ed. College Station, TX: Stata Press.
McFadden, Daniel. 1974. ‘‘Conditional Logit Analysis of Qualitative Choice Behavior.’’
Pp. 105-42 in Frontiers of Econometrics, edited by P. Zarembka. New York: Academic Press.
McFadden, Daniel, Kenneth Train, and William B. Tye. 1981. ‘‘An Application of Diagnos-
tic Tests for the Independence From Irrelevant Alternatives Property of the Multinomial
Logit Model.’’ Transportation Research Board Record 637:39-46.
Mokhtarian, Patricia L. and Michael N. Bagley. 2000. ‘‘Modeling Employees’ Perceptions
and Proportional Preferences of Work Locations: The Regular Workplace and Telecom-
muting Alternatives.’’ Transportation Research Part A-Policy and Practice 34:223-42.
Pels, Eric, Peter Nijkamp, and Piet Rietveld. 2001. ‘‘Airport and Airline Choice in a Multiple
Airport Region: An Empirical Analysis for the San Francisco Bay Area.’’ Regional Studies
35:1-9.
Small, Kenneth A. and Cheng Hsiao. 1985. ‘‘Multinomial Logit Specification Tests.’’ Inter-
national Economic Review 26:619-27.
Train, Kenneth. 2003. Discrete Choice Methods With Simulation. New York: Cambridge
University Press.
Zhang, Junsen and Saul D. Hoffman. 1993. ‘‘Discrete-Choice Logit Models: Testing the IIA
Property.’’ Sociological Methods & Research 22:193-213.

Simon Cheng is an assistant professor of sociology at the University of Connecticut. His research focuses on race and ethnicity, sociology of education, family, quantitative methods,
and political economy. He is currently working on a new mixture model that allows research-
ers to adjust for potential misidentification of group membership in survey data containing
drastically unequal subsample sizes, as well as on research that examines multiracial
students’ schooling experiences. His most recent publication focuses on resource allocation
to young children from biracial families (forthcoming in American Journal of Sociology).

J. Scott Long is Chancellor's Professor of Sociology and Statistics at Indiana University–Bloomington. His research focuses on gender differences in the scientific career, stigma and
mental health, aging and labor force participation, human sexuality, and statistical methods.
His recent research on the scientific career was published as From Scarcity to Visibility by
the National Academy of Sciences. He is past editor of Sociological Methods & Research
and the recipient of the American Sociological Association's Paul F. Lazarsfeld Memorial
Award for Distinguished Contributions in the Field of Sociological Methodology. He is
author of Confirmatory Factor Analysis, Covariance Structure Analysis, Regression Models
for Categorical and Limited Dependent Variables, Regression Models for Categorical and
Limited Dependent Variables With Stata (with Jeremy Freese), and several edited volumes.
