5.2) Multinomial Logistic Regression
March 2013
Multinomial Logistic Regression
Multinomial logistic regression is used to analyze relationships between a
non-metric dependent variable and metric or dichotomous independent
variables.
It is typically used to determine the factors that affect membership in
one of the categories when the dependent variable has three or more levels.
Level of measurement requirements
Multinomial logistic regression analysis requires that the
dependent variable be non-metric (nominal).
Procedure
In the dialog box, you select one dependent variable and your
independent variables, which may be factors or covariates. Some of
the submenus are given below:
Model: By default, a main effect model is fitted. In this submenu, you can
specify a custom model or a variable selection method.
Statistics: In this submenu, you can request many statistics, including
the classification table for the model.
Criteria: This allows you to specify the criteria for the iterations during
model estimation.
Save: This allows you to save some variables to the working data file or to
an external data file.
Assumptions and outliers
Multinomial logistic regression does not make any
assumptions of normality, linearity, and
homogeneity of variance for the independent
variables.
Sample size requirements
The minimum number of cases per independent variable is
10, using a guideline provided by Hosmer and Lemeshow,
authors of Applied Logistic Regression, one of the main
resources for Logistic Regression.
Methods for including variables
• The only method for selecting independent
variables in SPSS is simultaneous or direct
entry.
Overall test of relationship
• The overall test of relationship between the independent
variables and the groups defined by the dependent variable is
based on the reduction in the likelihood value between a model
that does not contain any independent variables (the intercept-only
model) and the model that contains the independent variables.
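The computation behind this test can be sketched in a few lines of Python. The -2 log-likelihood values below are hypothetical placeholders (in practice they come from the SPSS 'Model Fitting Information' table), shown only to illustrate how the chi-square statistic is formed:

```python
# Hypothetical -2 log-likelihood values, standing in for the numbers
# SPSS reports in the 'Model Fitting Information' table.
neg2ll_intercept_only = 1604.0   # model with no independent variables
neg2ll_final = 1400.0            # model with the independent variables

# The overall test statistic is the reduction in -2LL; it follows a
# chi-square distribution with df = (groups - 1) * (number of predictors).
chi_square = neg2ll_intercept_only - neg2ll_final
df = (3 - 1) * 2   # 3 brand groups, 2 predictors (age, sex)
print(chi_square, df)
```

A significant chi-square at these degrees of freedom indicates that the full model fits better than the intercept-only model.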
The outcome variable is brand (coded as
A, B and C).
Brand

                Frequency   Percent   Valid Percent   Cumulative Percent
Valid   A             207      28.2            28.2                 28.2
        B             307      41.8            41.8                 69.9
        C             221      30.1            30.1                100.0
        Total         735     100.0           100.0
Descriptive Statistics

                Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Female        466      63.4            63.4                 63.4
        Male          269      36.6            36.6                100.0
        Total         735     100.0           100.0
Model Fitting Information
• The presence of a relationship between the dependent
variable and the combination of independent variables is
based on the statistical significance of the final model
chi-square in the SPSS table titled "Model Fitting Information".
Computing by chance accuracy
The percentage of cases in each group defined by the dependent
variable is found in the 'Case Processing Summary' table.

Case Processing Summary

                                 N   Marginal Percentage
Brand                 A        207        28.2%
                      B        307        41.8%
                      C        221        30.1%
Sex of participants   Female   466        63.4%
                      Male     269        36.6%
Valid                          735       100.0%
Missing                          0
Total                          735
Chance accuracy rate (CAR)
The proportional by chance accuracy rate is
computed by taking the proportion of cases in each
group (from the number of cases per group in the
'Case Processing Summary' table), squaring each
proportion, and summing the squared proportions.
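Using the brand group sizes from the 'Case Processing Summary' above, the computation looks like this (a minimal sketch in Python rather than SPSS):

```python
# Group sizes for brand, from the 'Case Processing Summary' table.
counts = {"A": 207, "B": 307, "C": 221}
total = sum(counts.values())  # 735

# Proportional by chance accuracy rate: sum of squared group proportions.
car = sum((n / total) ** 2 for n in counts.values())
print(round(car, 3))  # ~0.344, i.e. a 34.4% by-chance accuracy rate
```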
Comparing accuracy rates
• To characterize our model as useful, we compare the
overall percentage accuracy rate produced by SPSS at the
last step in which variables are entered to 1.25 times (i.e.,
25% more than) the proportional by chance accuracy rate.
Comparing accuracy rates
While we will accept most of the SPSS defaults for
the analysis, we need to specifically request the
classification table.
Classification

                        Predicted
Observed            A       B       C   Percent Correct
A                  58     136      13        28.0%
B                  18     238      51        77.5%
C                  10     101     110        49.8%
Overall
Percentage      11.7%   64.6%   23.7%        55.2%
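The per-group and overall accuracy rates can be recomputed from the counts in the classification table, and the overall rate compared against the 1.25 x chance criterion (a sketch in Python, not SPSS output):

```python
# Rows = observed brand, columns = predicted brand in order (A, B, C),
# taken from the classification table above.
confusion = {
    "A": [58, 136, 13],
    "B": [18, 238, 51],
    "C": [10, 101, 110],
}
order = ["A", "B", "C"]

total = sum(sum(row) for row in confusion.values())          # 735 cases
correct = sum(confusion[g][order.index(g)] for g in order)   # 58+238+110
overall = correct / total
print(round(100 * overall, 1))  # 55.2 (%), matching the SPSS table

# Compare with 1.25 times the proportional by chance accuracy rate
# (about 0.344 for these data): 0.552 > 1.25 * 0.344 ~ 0.430, so the
# model's accuracy is judged useful by this criterion.
assert overall > 1.25 * 0.344
```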
Numerical problems
• The maximum likelihood method used to estimate a multinomial
logistic regression is an iterative fitting process that repeats
estimation cycles until it converges on a solution.
Parameter Estimates

                                                                       95% Confidence Interval for Exp(B)
Brand(a)             B   Std. Error      Wald   df   Sig.   Exp(B)     Lower Bound   Upper Bound
B   Intercept  -11.775        1.775    44.024    1   .000
    Age           .368         .055    44.813    1   .000    1.445           1.297         1.610
    [Sex=1.00]    .524         .194     7.272    1   .007    1.688           1.154         2.471
    [Sex=2.00]    0(b)            .         .    0      .        .               .             .
C   Intercept  -22.721        2.058   121.890    1   .000
    Age           .686         .063   119.954    1   .000    1.986           1.756         2.245
    [Sex=1.00]    .466         .226     4.247    1   .039    1.594           1.023         2.482
    [Sex=2.00]    0(b)            .         .    0      .        .               .             .

a. The reference category is: A.
b. This parameter is set to zero because it is redundant.
Relationship of individual independent
variables and the dependent variable
SPSS identifies the comparisons it makes for groups defined
by the dependent variable in the table of ‘Parameter
Estimates,’ using either the value codes or the value labels.
• In this example, there is a statistically
significant relationship between the
independent variables (sex and age) and
the dependent variable (brand type),
based on the likelihood ratio tests.
• The table titled Parameter Estimates has two parts, labeled
with the categories of the outcome variable brand. They
correspond to two equations:

ln(P(brand=B)/P(brand=A)) = -11.775 + 0.368*age + 0.524*[Sex=1.00]
ln(P(brand=C)/P(brand=A)) = -22.721 + 0.686*age + 0.466*[Sex=1.00]

For example, we can say that for a one-unit change in the
variable age, the log of the ratio of the two probabilities,
P(brand=B)/P(brand=A), will increase by 0.368.
We can say that for a one-unit change in the variable age, we
expect the relative risk (in this example, preference for brand B)
of choosing brand B over A to increase by exp(0.368) = OR =
1.445. So we can say that the relative risk (in this example,
preference) is higher for older people.
In general, the older a person is, the more he/she will prefer
brand B or C.
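The odds ratios in the Exp(B) column of the Parameter Estimates table are simply the exponentiated coefficients, which can be verified with Python's math module:

```python
import math

# Coefficients (B) from the Parameter Estimates table.
coefs = {
    ("B vs A", "Age"): 0.368,
    ("B vs A", "[Sex=1.00]"): 0.524,
    ("C vs A", "Age"): 0.686,
    ("C vs A", "[Sex=1.00]"): 0.466,
}

for key, b in coefs.items():
    print(key, round(math.exp(b), 3))
# exp(0.368) ~ 1.445: each additional year of age multiplies the
# odds of choosing brand B over brand A by about 1.445.
```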
• Both the female dummy variable and age are statistically
significant across the two models. Females are more likely to
prefer brands B or C compared to brand A.
Cautions
• Pseudo-R-Squared: These do not convey the same
information as the R-square for linear regression, even though
it is still "the higher, the better".
Cautions
Multicollinearity in multinomial logistic regression is
detected by examining the standard errors for the b
coefficients; unusually large standard errors signal a problem.
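As a rough illustration, the standard errors from the Parameter Estimates table can be scanned for suspiciously large values. The 2.0 cutoff below is a common rule of thumb for flagging possible numerical problems, not a value from the slides or from SPSS:

```python
# Standard errors from the Parameter Estimates table above.
std_errors = {
    "B: Intercept": 1.775, "B: Age": 0.055, "B: [Sex=1.00]": 0.194,
    "C: Intercept": 2.058, "C: Age": 0.063, "C: [Sex=1.00]": 0.226,
}

# Rule-of-thumb cutoff (an assumption for illustration): standard
# errors above 2.0 may indicate multicollinearity or other
# numerical problems in the estimation.
CUTOFF = 2.0
flagged = [name for name, se in std_errors.items() if se > CUTOFF]
print(flagged)  # only the brand C intercept exceeds the cutoff
```

Large standard errors on the predictor coefficients (rather than the intercepts) would be the more worrying sign here.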