Multinomial Logistic Regression Models: Newsom Psy 525/625 Categorical Data Analysis, Spring 2021 1
Multinomial logistic regression models estimate the association between a set of predictors and a
multicategory nominal (unordered) outcome. Examples of such an outcome might include “yes,” “no,”
and “don’t know”; “Apple iPhone,” “Android,” and “Samsung Galaxy”; or “walk,” “bike,” “car,” “public
transit.” The most common form of the model is a logistic model that is a generalization of the binary
outcome of standard logistic regression involving comparisons of each category of the outcome to a
referent category. There are J total categories of the outcome, indexed by the subscript j, and the
number of comparisons is then J – 1. The equation for the model is written in terms of the logit of the
outcome, which compares a particular category, with probability πj, to the referent category, with
probability πJ.
\[ \ln\left(\frac{\pi_j}{\pi_J}\right) = \alpha_j + \beta_j X \]
The natural log of the ratio of the two proportions is the same as the logit in standard logistic regression,
where ln(πj/πJ) replaces ln[π/(1 – π)], and is sometimes referred to as the generalized logit. The binary
logistic model is therefore a special case of the multinomial model. In generalized linear modeling terms,
the link function is the generalized logit and the random component is the multinomial distribution. The
model differs from the standard logistic model in that the comparisons are all estimated simultaneously
within the same model. The j subscript on both the intercept, αj, and slope, βj, indicates that there is an
intercept and a slope for the comparison of each category to the referent category. Note that in the
ordinal logistic model, there is only one slope coefficient for each predictor. Odds ratios for each
coefficient (each comparing one response category with the referent) are computed as
usual, with OR = exp(βj), and represent the odds increase (or decrease) for category j compared with the
referent category for each one-unit increase in X.
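As a quick check of the odds ratio computation, the sketch below exponentiates the age coefficients reported in the R output later in this handout. The object names (b_age, or_age) are illustrative and not part of the original analysis.

```r
# Age slopes for categories 2, 3, and 4 versus the referent (category 1),
# copied from the summary(model) output shown later in this handout
b_age <- c("2" = 0.07788846, "3" = 0.05759751, "4" = 0.17462753)

# Odds ratio for a one-unit increase in age: OR = exp(beta_j)
or_age <- exp(b_age)
round(or_age, 6)
```

The results should agree with the age column of the odds ratio output in the R section (1.081002, 1.059289, and 1.190803).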
The predicted probabilities can be computed from the model parameters for a specific value of X. For
the standard logistic regression, we used the logistic transformation to find the probability according to
the logistic cumulative distribution function (cdf; see the “Logistic Regression” handout). For a simple
logistic regression with one predictor, we used
\[ \pi = \frac{1}{1 + e^{\alpha + \beta X}} \]
Entering in a specific value of X and the model estimates of α and β and using the exponential function,
the estimate of the expected probability can be computed for the specific value of X. (For additional
predictors, the values for X and β for those variables are added to the exponent in the denominator.)
Predicted probabilities (y-axis) are then often plotted with a separate line for each comparison as a
function of the X variable values (x-axis).
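The calculation can be sketched in R as follows. The intercept, slope, and X value are hypothetical numbers chosen only for illustration. With 1 in the numerator, the expression gives the probability of the referent category; the probability of the other category is its complement, which is what R's plogis() returns for the same linear predictor.

```r
# Hypothetical estimates and predictor value (not from the handout's data)
alpha <- -1.5
beta  <- 0.8
x     <- 2

eta <- alpha + beta * x                # linear predictor (the logit)

p_referent <- 1 / (1 + exp(eta))       # e^0 = 1 in the numerator
p_other    <- exp(eta) / (1 + exp(eta))

# plogis() is the logistic cdf, so it should match p_other
c(referent = p_referent, other = p_other, plogis = plogis(eta))
```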
The cdf transformation for the multinomial distribution must add the exponential functions of the intercepts
and the coefficients for each of the comparisons to the referent category.¹ For a single predictor, the
predicted probability can be computed by generalizing the above equation for standard logistic regression,
with one additional exponential term in the denominator for each of the J – 1 comparisons to the
referent category:
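\[ \pi_J = \frac{1}{1 + \sum_{j=1}^{J-1} e^{\alpha_j + \beta_j X}} = \frac{1}{1 + e^{\alpha_1 + \beta_1 X} + e^{\alpha_2 + \beta_2 X} + \cdots + e^{\alpha_{J-1} + \beta_{J-1} X}} \]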
¹ The referent category in the logistic or the multinomial logistic model has e^0, which equals 1. This is why 1 appears in the numerator and denominator.
The result is the estimated proportion for the referent category relative to the total of the proportions of all
categories combined (1.0), given a specific value of X and the intercept and slope coefficient(s).
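A minimal sketch of this calculation with several predictors, using the coefficients from the summary(model) output later in this handout. The predictor values (age 65, srh 3, married 1) are hypothetical, since the coding of srh and married is not described here, and the object names are illustrative.

```r
# Coefficients copied from the summary(model) output in this handout;
# each row compares category 2, 3, or 4 with the referent (category 1)
B <- rbind(
  "2" = c(-3.311567, 0.07788846, -0.6846427,  0.02574789),
  "3" = c(-2.291723, 0.05759751, -0.6545843, -0.56346491),
  "4" = c(-7.577511, 0.17462753, -0.7710237, -0.18024777)
)
colnames(B) <- c("(Intercept)", "age", "srh", "married")

# Hypothetical case: 1 for the intercept, then age, srh, married
x <- c(1, 65, 3, 1)

eta   <- drop(B %*% x)          # one linear predictor per comparison
denom <- 1 + sum(exp(eta))      # the leading 1 is e^0 for the referent

# Referent probability first, then the J - 1 comparison categories
p <- c("1" = 1 / denom, exp(eta) / denom)
p
sum(p)                          # the probabilities should sum to 1
```

With a fitted nnet model, predict(model, newdata = ..., type = "probs") should return the same probabilities for the same predictor values.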
Maximum likelihood is the most common estimation method for multinomial logistic regression. As with
logistic regression, model fit tests, such as the likelihood ratio test with degrees of freedom equal to
J – 1 for a single predictor,² are used to determine whether all of the comparisons to the referent,
taken together, are significant.
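The likelihood ratio test can be sketched by comparing the fitted model with an intercept-only model. The example below uses simulated data (not the handout's data) with a three-category outcome and one predictor, so the test has J – 1 = 2 degrees of freedom; with k predictors the overall test has k(J – 1) degrees of freedom. All object names and the simulation settings are assumptions made for illustration.

```r
library(nnet)

# Simulate a 3-category outcome from a multinomial logit model
set.seed(1)
n    <- 300
x    <- rnorm(n)
eta2 <- -0.5 + 1.0 * x                  # category 2 vs. referent
eta3 <-  0.2 - 0.7 * x                  # category 3 vs. referent
den  <- 1 + exp(eta2) + exp(eta3)
P    <- cbind(1 / den, exp(eta2) / den, exp(eta3) / den)
y    <- factor(apply(P, 1, function(p) sample(1:3, 1, prob = p)))
sim  <- data.frame(y = y, x = x)

# Intercept-only model and model with the predictor
fit0 <- multinom(y ~ 1, data = sim, trace = FALSE)
fit1 <- multinom(y ~ x, data = sim, trace = FALSE)

# Likelihood ratio chi-square: difference in residual deviances
lr      <- fit0$deviance - fit1$deviance
df      <- fit1$edf - fit0$edf          # difference in estimated parameters
p_value <- pchisq(lr, df, lower.tail = FALSE)
c(LR = lr, df = df, p = p_value)
```

nnet also provides an anova() method for multinom fits, so anova(fit0, fit1) should report the same comparison.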
The multinomial logistic model assumes independence of irrelevant alternatives (IIA). The
assumption is that if an additional category were added to the outcome, the proportions for the
original categories would be affected equally in relative terms, leaving the odds between any two
original categories unchanged (e.g., adding a third-party candidate would draw votes proportionally
from the two major-party candidates).³ As this example suggests,
the IIA assumption is not particularly realistic in many situations, even though it is needed for truly
unbiased estimates of the observed and predicted proportions. Although tests have been suggested to
investigate violation of the assumption, they do not appear to perform well (e.g., Cheng & Long, 2005;
Fry & Harris, 1996).
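One way to see where the assumption comes from is to write the ratio of two category probabilities under the model. The index k (a second non-referent category) and the symbol D (the common denominator) are notation introduced here for illustration.

```latex
\[
\frac{\pi_j}{\pi_k}
  = \frac{e^{\alpha_j + \beta_j X} / D}{e^{\alpha_k + \beta_k X} / D}
  = e^{(\alpha_j - \alpha_k) + (\beta_j - \beta_k) X},
\qquad
D = 1 + \sum_{m=1}^{J-1} e^{\alpha_m + \beta_m X}
\]
```

Because D cancels, the odds of category j relative to category k depend only on the parameters for those two categories and not on which other categories are available.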
² SAS prints the score and Wald tests for the model as well.
³ The widely used example is the assumption that the original transportation choices of car and red bus would be equally affected if the choice
were among a car, a red bus, and a blue bus.
R
> #use lessR routine for listwise deletion
> library(lessR)
> d <- Subset(work!='NA' & age!='NA' & srh!='NA' & married!='NA')
> #make sure dv is a factor
> d$work <- factor(d$work)
>
> library(nnet)
> d$work <- relevel(d$work, ref = 1)
> model <- multinom(work ~ age + srh + married, data = d)
# weights: 20 (12 variable)
initial value 709.782713
iter 10 value 325.755760
iter 20 value 297.417808
iter 30 value 297.133699
iter 40 value 297.125872
final value 297.125651
converged
> summary(model)
Call:
multinom(formula = work ~ age + srh + married, data = d)
Coefficients:
  (Intercept)        age        srh     married
2   -3.311567 0.07788846 -0.6846427  0.02574789
3   -2.291723 0.05759751 -0.6545843 -0.56346491
4   -7.577511 0.17462753 -0.7710237 -0.18024777

Std. Errors:
  (Intercept)        age       srh   married
2    4.090534 0.05608322 0.2636701 0.5429907
3    4.752491 0.06532525 0.3052976 0.6251271
4    3.438072 0.04744140 0.2216583 0.4359652

Residual Deviance: 594.2513
AIC: 618.2513
>
> #obtain odds ratios
> exp(cbind(OR=coef(model), confint(model)))
   (Intercept)      age       srh   married
2 0.0364589851 1.081002 0.5042704 1.0260822 0.00001202087
3 0.1010921469 1.059289 0.5196580 0.5692333 0.96847498115
4 0.0005118338 1.190803 0.4625393 0.8350633 0.30076485647
SAS
proc logistic data=one ;
model work (ref=first) = age srh married / link=glogit;
run;
Model Information
Response Profile
Ordered                     Total
  Value   work          Frequency
      1   full                 29
      2   part                 38
      3   retired             426
      4   unemployed           19