
Multinomial Logistic Regression Basic Relationships

- Multinomial logistic regression is used to analyze relationships between a non-metric dependent variable and metric or dichotomous independent variables through a combination of binary logistic regressions.
- It provides coefficients and equations to compute the probability of membership in each group; each case is predicted to belong to the group with the highest probability, which allows classification accuracy to be assessed.
- A significant overall test statistic indicates a relationship between the dependent variable and the independent variables. Classification accuracy at least 25% higher than the by chance accuracy rate indicates that the model is useful.

Uploaded by

libremd

SW388R7
Data Analysis & Computers II

Slide 1

Multinomial Logistic Regression
Basic Relationships

Multinomial Logistic Regression

Describing Relationships

Classification Accuracy

Sample Problems

Multinomial logistic regression
Slide 2

➢ Multinomial logistic regression is used to analyze relationships
between a non-metric dependent variable and metric or
dichotomous independent variables.

➢ Multinomial logistic regression compares multiple groups
through a combination of binary logistic regressions.

➢ The group comparisons are equivalent to the comparisons for a
dummy-coded dependent variable, with the group with the
highest numeric score used as the reference group.

➢ For example, if we wanted to study differences in BSW, MSW,
and PhD students using multinomial logistic regression, the
analysis would compare BSW students to PhD students and MSW
students to PhD students. For each independent variable, there
would be two comparisons.
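
As a minimal sketch of how these comparisons are formed, the Python snippet below pairs each group with the reference group. The numeric codes 1, 2, and 3 for BSW, MSW, and PhD are an assumption made for illustration; the slide only states that the group with the highest numeric score is the reference group.

```python
# Hypothetical coding of the dependent variable (the codes are an assumption).
groups = {1: "BSW", 2: "MSW", 3: "PhD"}


def comparisons(groups):
    """Pair every non-reference group with the reference group (highest code)."""
    reference = max(groups)
    return [(groups[code], groups[reference])
            for code in sorted(groups) if code != reference]


pairs = comparisons(groups)
print(pairs)
```

With three groups the list holds two pairs, which is why each independent variable gets two comparisons.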

What multinomial logistic regression predicts
Slide 3

➢ Multinomial logistic regression provides a set of coefficients for
each of the two comparisons. The coefficients for the
reference group are all zeros, similar to the coefficients for the
reference group for a dummy-coded variable.

➢ Thus, there are three equations, one for each of the groups
defined by the dependent variable.

➢ The three equations can be used to compute the probability
that a subject is a member of each of the three groups. A case
is predicted to belong to the group associated with the highest
probability.

➢ Predicted group membership can be compared to actual group
membership to obtain a measure of classification accuracy.
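
The step from coefficients to a predicted group can be sketched in Python as follows. The coefficients are borrowed from the highways and bridges Parameter Estimates table shown later in this deck; the case values (age 40, 12 years of school, confidence score 2) and the function name are hypothetical and serve only to illustrate the calculation.

```python
import math

# Coefficients (intercept, AGE, EDUC, CONLEGIS) for each group.
# The reference group (code 3) has all-zero coefficients.
coefficients = {
    1: [3.240, 0.019, 0.071, -1.373],
    2: [3.639, 0.003, 0.172, -1.657],
    3: [0.0, 0.0, 0.0, 0.0],
}


def group_probabilities(coefficients, case):
    """Return {group: probability} for one case (a list of predictor values)."""
    x = [1.0] + list(case)  # the leading 1.0 multiplies the intercept
    scores = {g: math.exp(sum(b * v for b, v in zip(beta, x)))
              for g, beta in coefficients.items()}
    total = sum(scores.values())
    return {g: s / total for g, s in scores.items()}


case = [40, 12, 2]  # hypothetical respondent: age, educ, conlegis
probabilities = group_probabilities(coefficients, case)
predicted_group = max(probabilities, key=probabilities.get)
print(probabilities, predicted_group)
```

The three probabilities sum to 1, and the case is assigned to the group with the largest one.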

Level of measurement requirements
Slide 4

➢ Multinomial logistic regression analysis requires that the
dependent variable be non-metric. Dichotomous, nominal, and
ordinal variables satisfy the level of measurement requirement.

➢ Multinomial logistic regression analysis requires that the
independent variables be metric or dichotomous. Since SPSS
will automatically dummy-code nominal level variables, they
can be included since they will be dichotomized in the analysis.

➢ In SPSS, non-metric independent variables are included as
“factors.” SPSS will dummy-code non-metric IVs.

➢ In SPSS, metric independent variables are included as
“covariates.” If an independent variable is ordinal, we will
attach the usual caution.

Assumptions and outliers
Slide 5

➢ Multinomial logistic regression does not make any assumptions
of normality, linearity, and homogeneity of variance for the
independent variables.

➢ Because it does not impose these requirements, it is preferred
to discriminant analysis when the data does not satisfy these
assumptions.

➢ SPSS does not compute any diagnostic statistics for outliers. To
evaluate outliers, the advice is to run multiple binary logistic
regressions and use those results to test the exclusion of
outliers or influential cases.

Sample size requirements
Slide 6

➢ The minimum number of cases per independent variable is 10,
using a guideline provided by Hosmer and Lemeshow, authors of
Applied Logistic Regression, one of the main resources for
Logistic Regression.

➢ For preferred case-to-variable ratios, we will use 20 to 1.
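
A quick check of these ratios might look like the sketch below. The inputs (167 valid cases and 3 independent variables) are taken from the highways and bridges example used later in this deck; the function name is hypothetical.

```python
def case_to_variable_ratio(n_cases, n_predictors):
    """Number of valid cases available per independent variable."""
    return n_cases / n_predictors


ratio = case_to_variable_ratio(n_cases=167, n_predictors=3)
meets_minimum = ratio >= 10    # 10 to 1 minimum
meets_preferred = ratio >= 20  # 20 to 1 preferred ratio
print(round(ratio, 1), meets_minimum, meets_preferred)
```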


Methods for including variables
Slide 7

➢ The only method for selecting independent variables in SPSS is
simultaneous or direct entry.

Overall test of relationship - 1
Slide 8

➢ The overall test of relationship among the independent
variables and groups defined by the dependent variable is based
on the reduction in the -2 log likelihood values between a model
which does not contain any independent variables and the model
that contains the independent variables.

➢ This difference in -2 log likelihood follows a chi-square
distribution, and is referred to as the model chi-square.

➢ The significance test for the final model chi-square (after the
independent variables have been added) is our statistical
evidence of the presence of a relationship between the
dependent variable and the combination of the independent
variables.

Overall test of relationship - 2
Slide 9

Model Fitting Information

Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   284.429
Final            265.972             18.457       6    .005

The presence of a relationship between the dependent
variable and combination of independent variables is
based on the statistical significance of the final model
chi-square in the SPSS table titled "Model Fitting
Information".

In this analysis, the probability of the model chi-square
(18.457) was 0.005, less than or equal to the level of
significance of 0.05. The null hypothesis that there was
no difference between the model without independent
variables and the model with independent variables
was rejected. The existence of a relationship between
the independent variables and the dependent variable
was supported.
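
The arithmetic behind the model chi-square can be reproduced with the short Python sketch below. It uses only the standard library; the helper `chi2_sf_even_df` is a hypothetical name, and its closed-form tail probability is a shortcut that is valid only when the degrees of freedom are even (as they are here, with df = 6). SPSS itself is not involved.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this shortcut only handles positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))


neg2ll_intercept_only = 284.429  # -2 log likelihood, intercept-only model
neg2ll_final = 265.972           # -2 log likelihood, final model

model_chi_square = neg2ll_intercept_only - neg2ll_final
p_value = chi2_sf_even_df(model_chi_square, df=6)
print(round(model_chi_square, 3), round(p_value, 3))
```

The difference of the two -2 log likelihood values gives the chi-square of 18.457, and its tail probability rounds to the .005 reported in the table.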

Strength of multinomial logistic regression relationship
Slide 10

➢ While multinomial logistic regression does compute correlation
measures to estimate the strength of the relationship (pseudo R
square measures, such as Nagelkerke's R²), these correlation
measures do not really tell us much about the accuracy or
errors associated with the model.

➢ A more useful measure to assess the utility of a multinomial
logistic regression model is classification accuracy, which
compares predicted group membership based on the logistic
model to the actual, known group membership, which is the
value for the dependent variable.

Evaluating usefulness for logistic models
Slide 11

➢ The benchmark that we will use to characterize a multinomial
logistic regression model as useful is a 25% improvement over
the rate of accuracy achievable by chance alone.

➢ Even if the independent variables had no relationship to the
groups defined by the dependent variable, we would still
expect to be correct in our predictions of group membership
some percentage of the time. This is referred to as by chance
accuracy.

➢ The estimate of by chance accuracy that we will use is the
proportional by chance accuracy rate, computed by squaring the
proportion of cases in each group and summing the squares. The
only difference between by chance accuracy for binary logistic
models and by chance accuracy for multinomial logistic models
is the number of groups defined by the dependent variable.

Computing by chance accuracy
Slide 12

The percentage of cases in each group defined by the dependent
variable is found in the ‘Case Processing Summary’ table.

Case Processing Summary

                           N     Marginal Percentage
HIGHWAYS AND BRIDGES   1   62    37.1%
                       2   93    55.7%
                       3   12    7.2%
Valid                      167   100.0%
Missing                    103
Total                      270
Subpopulation              153a

a. The dependent variable has only one value observed
in 146 (95.4%) subpopulations.

The proportional by chance accuracy rate was
computed by calculating the proportion of cases for
each group based on the number of cases in each
group in the 'Case Processing Summary', and then
squaring and summing the proportion of cases in each
group (0.371² + 0.557² + 0.072² = 0.453).

The proportional by chance accuracy criterion is 56.6%
(1.25 x 45.3% = 56.6%).
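
The same calculation can be sketched in Python, starting from the group counts in the Case Processing Summary rather than the rounded percentages. The function name is hypothetical.

```python
# Valid cases in each group of the dependent variable.
counts = {1: 62, 2: 93, 3: 12}


def proportional_by_chance(counts):
    """Sum of squared group proportions."""
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())


chance_rate = proportional_by_chance(counts)
criterion = 1.25 * chance_rate  # 25% improvement over chance
print(round(chance_rate, 3), round(criterion, 3))
```

Working from counts avoids the small rounding error that comes from squaring percentages already rounded to one decimal place.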

Comparing accuracy rates
Slide 13

➢ To characterize our model as useful, we compare the overall
percentage accuracy rate produced by SPSS at the last step in which
variables are entered to 25% more than the proportional by chance
accuracy. (Note: SPSS does not compute a cross-validated accuracy
rate for multinomial logistic regression.)

Classification

                      Predicted
Observed              1       2       3      Percent Correct
1                     15      47      0      24.2%
2                     7       86      0      92.5%
3                     5       7       0      .0%
Overall Percentage    16.2%   83.8%   .0%    60.5%

The classification accuracy rate was 60.5%
which was greater than or equal to the
proportional by chance accuracy criterion of
56.6% (1.25 x 45.3% = 56.6%).

The criterion for classification accuracy is
satisfied in this example.
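
As a rough illustration, the overall and per-group accuracy rates can be recomputed from the cell counts of the Classification table. The variable names are hypothetical, and the 0.566 criterion is the value stated on the slide.

```python
# Rows are observed groups 1-3, columns are predicted groups 1-3.
confusion = [
    [15, 47, 0],
    [7, 86, 0],
    [5, 7, 0],
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(confusion)))  # diagonal cells
accuracy = correct / total

# Percent correct within each observed group.
row_accuracy = [row[i] / sum(row) for i, row in enumerate(confusion)]

criterion = 0.566  # 1.25 x proportional by chance accuracy, from the slide
is_useful = accuracy >= criterion
print(round(accuracy, 3), [round(r, 3) for r in row_accuracy], is_useful)
```

Note that no case is predicted to be in group 3, so the overall rate is carried entirely by groups 1 and 2.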

Numerical problems
Slide 14

➢ The maximum likelihood method used to calculate multinomial
logistic regression is an iterative fitting process that attempts
to cycle through repetitions to find an answer.

➢ Sometimes, the method will break down and not be able to
converge or find an answer.

➢ Sometimes the method will produce wildly improbable results,
reporting that a one-unit change in an independent variable
increases the odds of the modeled event by hundreds of
thousands or millions. These implausible results can be
produced by multicollinearity, categories of predictors having
no cases or zero cells, and complete separation whereby the
two groups are perfectly separated by the scores on one or
more independent variables.

➢ The clue that we have numerical problems and should not
interpret the results is a standard error larger than 2.0 for
one or more independent variables.
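
A simple screen for this rule of thumb could be written as below. The standard errors are those of the independent variables in the first comparison of the Parameter Estimates table shown later in this deck (the intercept is left out because the rule refers to independent variables); the function name is hypothetical.

```python
# Standard errors of the independent variables, first comparison.
standard_errors = {"AGE": 0.020, "EDUC": 0.108, "CONLEGIS": 0.620}


def flag_numerical_problems(standard_errors, threshold=2.0):
    """Return the variables whose standard error exceeds the threshold."""
    return [name for name, se in standard_errors.items() if se > threshold]


flagged = flag_numerical_problems(standard_errors)
print(flagged)
```

An empty list means no independent variable shows the warning sign described above.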

Relationship of individual independent variables and the dependent variable
Slide 15

➢ There are two types of tests for individual independent
variables:
  ▪ The likelihood ratio test evaluates the overall relationship
    between an independent variable and the dependent
    variable
  ▪ The Wald test evaluates whether or not the independent
    variable is statistically significant in differentiating between
    the two groups in each of the embedded binary logistic
    comparisons.

➢ If an independent variable has an overall relationship to the
dependent variable, it might or might not be statistically
significant in differentiating between pairs of groups defined by
the dependent variable.
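
To make the Wald test concrete, the sketch below recomputes it for one coefficient: CONLEGIS in the comparison of category 1 to category 3, with B = -1.373 and standard error .620 taken from the Parameter Estimates table later in this deck. The function name is hypothetical, and because B and the standard error are rounded in the table, the result differs slightly from the 4.913 that SPSS reports.

```python
import math


def wald_test(b, se):
    """Wald chi-square (df = 1) and its p-value for one coefficient."""
    wald = (b / se) ** 2
    # Upper-tail chi-square probability for df = 1.
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value


wald, p_value = wald_test(b=-1.373, se=0.620)
print(round(wald, 2), round(p_value, 3))
```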

Relationship of individual independent variables and the dependent variable
Slide 16

➢ The interpretation for an independent variable focuses on its
ability to distinguish between pairs of groups and the
contribution which it makes to changing the odds of being in
one dependent variable group rather than the other.

➢ We should not interpret the significance of an independent
variable’s role in distinguishing between pairs of groups unless
the independent variable also has an overall relationship to the
dependent variable in the likelihood ratio test.

➢ The interpretation of an independent variable’s role in
differentiating dependent variable groups is the same as we
used in binary logistic regression. The difference in
multinomial logistic regression is that we can have multiple
interpretations for an independent variable in relation to
different pairs of groups.

Relationship of individual independent variables and the dependent variable
Slide 17

Parameter Estimates (labeled with value codes)

                                                                    95% Confidence Interval for Exp(B)
HIGHWAYS AND BRIDGES(a)   B        Std. Error   Wald    df   Sig.   Exp(B)   Lower Bound
1   Intercept             3.240    2.478        1.709   1    .191
    AGE                   .019     .020         .906    1    .341   1.019    .980
    EDUC                  .071     .108         .427    1    .514   1.073    .868
    CONLEGIS              -1.373   .620         4.913   1    .027   .253     .075
2   Intercept             3.639    2.456        2.195   1    .138
    AGE                   .003     .020         .017    1    .897   1.003    .963
    EDUC                  .172     .110         2.463   1    .117   1.188    .958
    CONLEGIS              -1.657   .613         7.298   1    .007   .191     .057
a. The reference category is: 3.

SPSS identifies the comparisons it makes for groups defined by the
dependent variable in the table of ‘Parameter Estimates,’ using either
the value codes or the value labels, depending on the options settings
for pivot table labeling.

The reference category is identified in the footnote to the table.

In this analysis, two comparisons will be made:
• the TOO LITTLE group (coded 1, shaded blue) will be compared to the
  TOO MUCH group (coded 3, shaded purple)
• the ABOUT RIGHT group (coded 2, shaded orange) will be compared to the
  TOO MUCH group (coded 3, shaded purple).

Parameter Estimates (labeled with value labels)

HIGHWAYS AND BRIDGES(a)     B        Std. Error   Wald    df   Sig.   Exp(B)
TOO LITTLE    Intercept     3.240    2.478        1.709   1    .191
              AGE           .019     .020         .906    1    .341   1.019
              EDUC          .071     .108         .427    1    .514   1.073
              CONLEGIS      -1.373   .620         4.913   1    .027   .253
ABOUT RIGHT   Intercept     3.639    2.456        2.195   1    .138
              AGE           .003     .020         .017    1    .897   1.003
              EDUC          .172     .110         2.463   1    .117   1.188
              CONLEGIS      -1.657   .613         7.298   1    .007   .191
a. The reference category is: TOO MUCH.

The reference category plays the same role in multinomial logistic
regression that it plays in the dummy-coding of a nominal variable: it is
the category that would be coded with zeros for all of the dummy-coded
variables that all other categories are interpreted against.

Relationship of individual independent variables and the dependent variable
Slide 18

Likelihood Ratio Tests

            -2 Log Likelihood
Effect      of Reduced Model    Chi-Square   df   Sig.
Intercept   268.323             2.350        2    .309
AGE         268.625             2.652        2    .265
EDUC        270.395             4.423        2    .110
CONLEGIS    275.194             9.221        2    .010
The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model is
formed by omitting an effect from the final model. The null hypothesis
is that all parameters of that effect are 0.

In this example, there is a statistically significant relationship
between the independent variable CONLEGIS and the dependent
variable. (0.010 < 0.05)

Parameter Estimates

                                                                    95% Confidence Interval for Exp(B)
HIGHWAYS AND BRIDGES(a)   B        Std. Error   Wald    df   Sig.   Exp(B)   Lower Bound   Upper Bound (cut off)
1   Intercept             3.240    2.478        1.709   1    .191
    AGE                   .019     .020         .906    1    .341   1.019    .980          1.0
    EDUC                  .071     .108         .427    1    .514   1.073    .868          1.3
    CONLEGIS              -1.373   .620         4.913   1    .027   .253     .075          .8
2   Intercept             3.639    2.456        2.195   1    .138
    AGE                   .003     .020         .017    1    .897   1.003    .963          1.0
    EDUC                  .172     .110         2.463   1    .117   1.188    .958          1.4
    CONLEGIS              -1.657   .613         7.298   1    .007   .191     .057          .6
a. The reference category is: 3.

As well, the independent variable CONLEGIS is significant in
distinguishing both category 1 of the dependent variable from
category 3 of the dependent variable. (0.027 < 0.05)

And the independent variable CONLEGIS is significant in
distinguishing category 2 of the dependent variable from
category 3 of the dependent variable. (0.007 < 0.05)
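
The likelihood ratio chi-squares can be recomputed from the -2 log likelihood values, as in the sketch below. The final model value of 265.972 comes from the Model Fitting Information table earlier in the deck. The function name is hypothetical, and the p-value formula exp(-x/2) is a shortcut that holds only because each effect here has df = 2.

```python
import math

neg2ll_final = 265.972  # final model, from the Model Fitting Information table
neg2ll_reduced = {"AGE": 268.625, "EDUC": 270.395, "CONLEGIS": 275.194}


def lr_test_df2(reduced, final):
    """Likelihood ratio chi-square and p-value, valid for df = 2 only."""
    chi_square = reduced - final
    p_value = math.exp(-chi_square / 2.0)  # chi-square upper tail when df = 2
    return chi_square, p_value


results = {name: lr_test_df2(value, neg2ll_final)
           for name, value in neg2ll_reduced.items()}
significant = [name for name, (_, p) in results.items() if p <= 0.05]
print(results, significant)
```

Only CONLEGIS falls at or below the 0.05 level, which matches the reading of the table above.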

Interpreting relationship of individual independent variables to the dependent variable
Slide 19

(The Likelihood Ratio Tests and Parameter Estimates tables are the same as on Slide 18.)

Survey respondents who had less confidence in Congress (higher
values correspond to lower confidence) were less likely to be in the
group of survey respondents who thought we spend too little money
on highways and bridges (DV category 1), rather than the group of
survey respondents who thought we spend too much money on
highways and bridges (DV category 3).

For each unit increase in the confidence in Congress score (that is,
each step toward lower confidence), the odds of being in the group of
survey respondents who thought we spend too little money on highways
and bridges decreased by 74.7%. (Exp(B) = 0.253; 0.253 - 1.0 = -0.747)

Interpreting relationship of individual independent variables to the dependent variable
Slide 20

(The Likelihood Ratio Tests and Parameter Estimates tables are the same as on Slide 18.)

Survey respondents who had less confidence in Congress (higher
values correspond to lower confidence) were less likely to be in the
group of survey respondents who thought we spend about the right
amount of money on highways and bridges (DV category 2), rather
than the group of survey respondents who thought we spend too
much money on highways and bridges (DV category 3).

For each unit increase in the confidence in Congress score (that is,
each step toward lower confidence), the odds of being in the group of
survey respondents who thought we spend about the right amount of money
on highways and bridges decreased by 80.9%. (Exp(B) = 0.191; 0.191 - 1.0 = -0.809)
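
The percentage changes in odds quoted for CONLEGIS follow directly from the B coefficients, as this small Python sketch shows. The function name is hypothetical; the coefficients -1.373 and -1.657 are the CONLEGIS values from the Parameter Estimates table.

```python
import math


def percent_change_in_odds(b):
    """Percent change in the odds for a one-unit increase in the predictor."""
    return (math.exp(b) - 1.0) * 100.0


too_little_vs_too_much = percent_change_in_odds(-1.373)
about_right_vs_too_much = percent_change_in_odds(-1.657)
print(round(too_little_vs_too_much, 1), round(about_right_vs_too_much, 1))
```

A negative result is a decrease in the odds, so Exp(B) values below 1.0 always translate into a percentage reduction.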

Relationship of individual independent variables and the dependent variable
Slide 21

Likelihood Ratio Tests

            -2 Log Likelihood
Effect      of Reduced Model    Chi-Square   df   Sig.
Intercept   327.463a            .000         0    .
AGE         333.440             5.976        2    .050
EDUC        329.606             2.143        2    .343
POLVIEWS    334.636             7.173        2    .028
SEX         338.985             11.521       2    .003
The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model
is formed by omitting an effect from the final model. The null
hypothesis is that all parameters of that effect are 0.
a. This reduced model is equivalent to the final model because
omitting the effect does not increase the degrees of freedom.

In this example, there is a statistically significant relationship
between SEX and the dependent variable, spending on childcare
assistance. (0.003 < 0.05)

Parameter Estimates

                                                                     95% Confidence Interval for Exp(B)
NATCHLD(a)                 B        Std. Error   Wald     df   Sig.   Exp(B)   Lower Bound   Upper Bound (cut off)
TOO LITTLE    Intercept    8.434    2.233        14.261   1    .000
              AGE          -.023    .017         1.756    1    .185   .977     .944          1.
              EDUC         -.066    .102         .414     1    .520   .936     .766          1.
              POLVIEWS     -.575    .251         5.234    1    .022   .563     .344          .
              [SEX=1]      -2.167   .805         7.242    1    .007   .115     .024          .
              [SEX=2]      0b       .            .        0    .      .        .
ABOUT RIGHT   Intercept    4.485    2.255        3.955    1    .047
              AGE          -.001    .018         .003     1    .955   .999     .965          1.
              EDUC         .011     .104         .011     1    .916   1.011    .824          1.
              POLVIEWS     -.397    .257         2.375    1    .123   .673     .406          1.
              [SEX=1]      -1.606   .824         3.800    1    .051   .201     .040          1.
              [SEX=2]      0b       .            .        0    .      .        .
a. The reference category is: TOO MUCH.

As well, SEX plays a statistically significant role in
differentiating the TOO LITTLE group from the TOO MUCH
(reference) group. (0.007 < 0.05)

However, SEX does not differentiate the ABOUT RIGHT group from
the TOO MUCH (reference) group. (0.051 > 0.05)

Interpreting relationship of individual independent variables and the dependent variable
Slide 22

(The Likelihood Ratio Tests and Parameter Estimates tables are the same as on Slide 21.)

Survey respondents who were male (code 1 for sex) were less likely
to be in the group of survey respondents who thought we spend too
little money on childcare assistance (DV category 1), rather than the
group of survey respondents who thought we spend too much
money on childcare assistance (DV category 3).

For survey respondents who were male, the odds of being in the group
of survey respondents who thought we spend too little money on
childcare assistance were 88.5% lower. (Exp(B) = 0.115; 0.115 - 1.0 = -0.885)

Interpreting relationships for independent variable in problems
Slide 23

➢ In the multinomial logistic regression problems, the problem
statement will ask about only one of the independent variables.
The answer will be true or false based on only the relationship
between the specified independent variable and the dependent
variable. The individual relationships between other
independent variables and the dependent variable are not used
in determining whether or not the answer is true or false.

Problem 1
Slide 24
11. In the dataset GSS2000, is the following statement true, false, or an incorrect application
of a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on highways and bridges" [natroad]. These predictors
differentiate survey respondents who thought we spend too little money on highways and
bridges from survey respondents who thought we spend too much money on highways and
bridges and survey respondents who thought we spend about the right amount of money on
highways and bridges from survey respondents who thought we spend too much money on
highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the
odds of being in the group of survey respondents who thought we spend about the right amount
of money on highways and bridges decreased by 80.9%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Dissecting problem 1 - 1
Slide 25

(The problem statement from Slide 24 is shown again, with these annotations.)

For these problems, we will assume that there is no problem
with missing data, outliers, or influential cases, and that the
validation analysis will confirm the generalizability of the
results.

In this problem, we are told to use 0.05 as alpha for the
multinomial logistic regression.

Dissecting problem 1 - 2
Slide 26

(The problem statement from Slide 24 is shown again, with these annotations.)

The variables listed first in the problem
statement are the independent variables
(IVs): "age" [age], "highest year of school
completed" [educ] and "confidence in
Congress" [conlegis].

The variable used to define groups is the dependent
variable (DV): "opinion about spending on highways and
bridges" [natroad].

SPSS only supports direct or simultaneous entry of independent
variables in multinomial logistic regression, so we have no choice of
method for entering variables.
ters II

Slide Dissecting problem 1 - 3


27

SPSS multinomial logistic regression models the relationship by


comparing each of the groups defined by the dependent variable to the
group with the highest code value.
11. In the dataset GSS2000,
The responses is the following
to opinion statement
about spending true, false,
on highways andor an incorrect
bridges were: application
of a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the1= validation analysis will confirm the generalizability of the results. Use a level of
Too little, 2 = About right, and 3 = Too much.
significance of 0.05 for evaluating the statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on highways and bridges" [natroad]. These predictors
differentiate survey respondents who thought we spend too little money on highways and
bridges from survey respondents who thought we spend too much money on highways and
bridges and survey respondents who thought we spend about the right amount of money on
highways and bridges from survey respondents who thought we spend too much money on
highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the
odds of being in the group of survey respondents who thought we spend about the right amount
of money on highways and bridges decreased by 80.9%.

The analysis will result in two comparisons:
• survey respondents who thought we spend too little money
  versus survey respondents who thought we spend too much
  money on highways and bridges
• survey respondents who thought we spend about the right
  amount of money versus survey respondents who thought we
  spend too much money on highways and bridges.
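
The way the comparisons are formed can be sketched in a few lines of Python. This is only an illustration of the rule described on the slide (every group is compared to the group with the highest code value); the function name `comparisons_against_reference` is made up for this sketch and is not part of SPSS.

```python
# Category codes and labels for natroad, as listed on the slide.
categories = {1: "Too little", 2: "About right", 3: "Too much"}


def comparisons_against_reference(categories):
    """Pair every non-reference group with the reference group.

    The reference group is taken to be the one with the highest
    code value, mirroring the SPSS behavior described on the slide.
    """
    reference = max(categories)
    return [(code, reference) for code in sorted(categories) if code != reference]


pairs = comparisons_against_reference(categories)
for group, reference in pairs:
    print(f"{categories[group]} versus {categories[reference]}")
```

With three groups this yields two comparisons, which is why the Parameter Estimates table later in the analysis has two blocks of coefficients.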

Slide 28

Dissecting problem 1 - 4

Each problem includes a statement about the relationship between
one independent variable and the dependent variable. The answer
to the problem is based on the stated relationship, ignoring the
relationships between the other independent variables and the
dependent variable.

This problem identifies a difference for both of the comparisons
among groups modeled by the multinomial logistic regression.

The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on highways and bridges" [natroad]. These predictors
differentiate survey respondents who thought we spend too little money on highways and
bridges from survey respondents who thought we spend too much money on highways and
bridges and survey respondents who thought we spend about the right amount of money on
highways and bridges from survey respondents who thought we spend too much money on
highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of
survey respondents who thought we spend too little money on highways and bridges, rather
than the group of survey respondents who thought we spend too much money on highways
and bridges. For each unit increase in confidence in Congress, the odds of being in the
group of survey respondents who thought we spend too little money on highways and
bridges decreased by 74.7%. Survey respondents who had less confidence in congress were
less likely to be in the group of survey respondents who thought we spend about the right
amount of money on highways and bridges, rather than the group of survey respondents
who thought we spend too much money on highways and bridges. For each unit increase in
confidence in Congress, the odds of being in the group of survey respondents who thought
we spend about the right amount of money on highways and bridges decreased by 80.9%.

Slide 29

Dissecting problem 1 - 5

11. In the dataset GSS2000, is the following statement true, false, or an incorrect application
of a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on highways and bridges" [natroad]. These predictors
differentiate survey respondents who thought we spend too little money on highways and
bridges from survey respondents who thought we spend too much money on highways and
bridges and survey respondents who thought we spend about the right amount of money on
highways and bridges from survey respondents who thought we spend too much money on
highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the
odds of being in the group of survey respondents who thought we spend about the right amount
of money on highways and bridges decreased by 80.9%.

In order for the multinomial logistic regression
question to be true, the overall relationship must
be statistically significant, there must be no
evidence of numerical problems, the classification
accuracy rate must be substantially better than
could be obtained by chance alone, and the
stated individual relationship must be statistically
significant and interpreted correctly.

Slide 30

Request multinomial logistic regression

Select the Regression | Multinomial Logistic…
command from the Analyze menu.

Slide 31

Selecting the dependent variable

First, highlight the dependent variable
natroad in the list of variables.

Second, click on the right arrow button to
move the dependent variable to the
Dependent text box.

Slide 32

Selecting metric independent variables

Metric independent variables are specified as covariates
in multinomial logistic regression. Metric variables can
be either interval or, by convention, ordinal.

Move the metric independent variables,
age, educ and conlegis to
the Covariate(s) list box.

In this analysis, there are no non-metric
independent variables. Non-metric
independent variables would be
moved to the Factor(s) list box.

Slide 33

Specifying statistics to include in the output

While we will accept most of
the SPSS defaults for the
analysis, we need to specifically
request the classification table.

Click on the Statistics… button
to make a request.

Slide 34

Requesting the classification table

First, keep the SPSS defaults for Summary
statistics, Likelihood ratio test, and
Parameter estimates.

Second, mark the checkbox for the
Classification table.

Third, click on the Continue button to
complete the request.

Slide 35

Completing the multinomial logistic regression request

Click on the OK
button to request
the output for the
multinomial logistic
regression.

The multinomial logistic procedure supports
additional commands to specify the model
computed for the relationships (we will use the
default main effects model), additional
specifications for computing the regression,
and saving classification results. We will not
make use of these options.

Slide 36

LEVEL OF MEASUREMENT - 1

11. In the dataset GSS2000, is the following statement true, false, or an incorrect application
of a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on highways and bridges" [natroad]. These predictors
differentiate survey respondents who thought we spend too little money on highways and
bridges from survey respondents who thought we spend too much money on highways and
bridges and survey respondents who thought we spend about the right amount of money on
highways and bridges from survey respondents who thought we spend too much money on
highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the
odds of being in the group of survey respondents who thought we spend about the right amount
of money on highways and bridges decreased by 80.9%.
1. True
2. True with caution

Multinomial logistic regression requires that the
dependent variable be non-metric and the
independent variables be metric or dichotomous.

"Opinion about spending on highways and
bridges" [natroad] is ordinal, satisfying the non-metric
level of measurement requirement for the
dependent variable.

It contains three categories: survey respondents
who thought we spend too little money, about
the right amount of money, and too much
money on highways and bridges.

Slide 37

LEVEL OF MEASUREMENT - 2

"Age" [age] and "highest year of
school completed" [educ] are interval,
satisfying the metric or dichotomous
level of measurement requirement for
independent variables.

"Confidence in Congress" [conlegis] is ordinal,
satisfying the metric or dichotomous level of
measurement requirement for independent
variables. If we follow the convention of treating
ordinal level variables as metric variables, the level
of measurement requirement for the analysis is
satisfied. Since some data analysts do not agree
with this convention, a note of caution should be
included in our interpretation.

11. In the dataset GSS2000, is the following statement true, false, or an incorrect application
of a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.

The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on highways and bridges" [natroad]. These predictors
differentiate survey respondents who thought we spend too little money on highways and
bridges from survey respondents who thought we spend too much money on highways and
bridges and survey respondents who thought we spend about the right amount of money on
highways and bridges from survey respondents who thought we spend too much money on
highways and bridges.

Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the
odds of being in the group of survey respondents who thought we spend about the right amount
of money on highways and bridges decreased by 80.9%.

Slide 38

Sample size – ratio of cases to variables

Case Processing Summary

                                   N    Marginal Percentage
HIGHWAYS AND BRIDGES    1         62          37.1%
                        2         93          55.7%
                        3         12           7.2%
Valid                            167         100.0%
Missing                          103
Total                            270
Subpopulation                    153a
a. The dependent variable has only one value observed in 146 (95.4%) subpopulations.

Multinomial logistic regression requires that the minimum ratio
of valid cases to independent variables be at least 10 to 1. The
ratio of valid cases (167) to number of independent variables
(3) was 55.7 to 1, which was equal to or greater than the
minimum ratio. The requirement for a minimum ratio of cases
to independent variables was satisfied.

The preferred ratio of valid cases to independent variables is
20 to 1. The ratio of 55.7 to 1 was equal to or greater than the
preferred ratio. The preferred ratio of cases to independent
variables was satisfied.
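
The ratio check can be reproduced with a short calculation. The sketch below uses the counts from the Case Processing Summary; the constant names are chosen for this example only.

```python
# Figures from the Case Processing Summary on this slide.
valid_cases = 167
independent_variables = 3  # age, educ, conlegis

MINIMUM_RATIO = 10    # minimum cases per independent variable
PREFERRED_RATIO = 20  # preferred cases per independent variable

ratio = valid_cases / independent_variables
meets_minimum = ratio >= MINIMUM_RATIO
meets_preferred = ratio >= PREFERRED_RATIO

print(f"Ratio of cases to variables: {ratio:.1f} to 1")
print(f"Meets minimum (10 to 1): {meets_minimum}")
print(f"Meets preferred (20 to 1): {meets_preferred}")
```

Note that the ratio uses the 167 valid cases, not the 270 total cases, because cases with missing data are excluded from the analysis.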

Slide 39

OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES

Model Fitting Information

Model             -2 Log Likelihood    Chi-Square    df    Sig.
Intercept Only         284.429
Final                  265.972           18.457       6    .005

The presence of a relationship between the dependent
variable and combination of independent variables is
based on the statistical significance of the final model
chi-square in the SPSS table titled "Model Fitting
Information".

In this analysis, the probability of the model chi-square
(18.457) was 0.005, less than or equal to the level of
significance of 0.05. The null hypothesis that there was
no difference between the model without independent
variables and the model with independent variables
was rejected. The existence of a relationship between
the independent variables and the dependent variable
was supported.
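
As a check on the table, the model chi-square is the difference between the two -2 log likelihood values, and its probability comes from the chi-square distribution with 6 degrees of freedom. The sketch below uses only the Python standard library; `chi2_sf_even_df` is a helper written for this example (it relies on the closed form of the chi-square upper tail that holds for even degrees of freedom) and is not how SPSS computes the value.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square statistic for even df.

    For df = 2k the upper tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Values from the Model Fitting Information table.
neg2ll_intercept_only = 284.429
neg2ll_final = 265.972
df = 6  # 3 independent variables x 2 comparisons

chi_square = neg2ll_intercept_only - neg2ll_final
p_value = chi2_sf_even_df(chi_square, df)

print(f"Model chi-square = {chi_square:.3f}, p = {p_value:.3f}")
print("Overall relationship supported:", p_value <= 0.05)
```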

Slide 40

NUMERICAL PROBLEMS

Parameter Estimates

HIGHWAYS AND                                                       95% Confidence Interval
BRIDGES (a)                 B   Std. Error    Wald   df   Sig.   Exp(B)   for Exp(B): Lower Bound
1   Intercept           3.240        2.478   1.709    1   .191
    AGE                  .019         .020    .906    1   .341    1.019   .980
    EDUC                 .071         .108    .427    1   .514    1.073   .868
    CONLEGIS           -1.373         .620   4.913    1   .027     .253   .075
2   Intercept           3.639        2.456   2.195    1   .138
    AGE                  .003         .020    .017    1   .897    1.003   .963
    EDUC                 .172         .110   2.463    1   .117    1.188   .958
    CONLEGIS           -1.657         .613   7.298    1   .007     .191   .057
a. The reference category is: 3.

Multicollinearity in the multinomial
logistic regression solution is
detected by examining the standard
errors for the b coefficients. A
standard error larger than 2.0
indicates numerical problems, such
as multicollinearity among the
independent variables, zero cells for
a dummy-coded independent
variable because all of the subjects
have the same value for the
variable, and 'complete separation'
whereby the two groups in the
dependent event variable can be
perfectly separated by scores on
one of the independent variables.
Analyses that indicate numerical
problems should not be interpreted.

None of the independent variables
in this analysis had a standard error
larger than 2.0. (We are not
interested in the standard errors
associated with the intercept.)
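
The screening rule can be applied mechanically to the standard errors in the Parameter Estimates table. The following is a minimal sketch; the dictionary layout and the name `flagged` are chosen for this example, and the intercepts are left out as the slide instructs.

```python
# Standard errors from the Parameter Estimates table, keyed by
# (comparison group, variable). Intercepts are deliberately omitted.
std_errors = {
    (1, "AGE"): 0.020,
    (1, "EDUC"): 0.108,
    (1, "CONLEGIS"): 0.620,
    (2, "AGE"): 0.020,
    (2, "EDUC"): 0.110,
    (2, "CONLEGIS"): 0.613,
}

THRESHOLD = 2.0  # rule of thumb used in these slides

# Collect any coefficient whose standard error exceeds the threshold.
flagged = [key for key, se in std_errors.items() if se > THRESHOLD]

if flagged:
    print("Possible numerical problems for:", flagged)
else:
    print("No standard error larger than 2.0; no evidence of numerical problems.")
```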

Slide 41

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 1

Likelihood Ratio Tests

              -2 Log Likelihood
Effect        of Reduced Model     Chi-Square    df    Sig.
Intercept         268.323             2.350       2    .309
AGE               268.625             2.652       2    .265
EDUC              270.395             4.423       2    .110
CONLEGIS          275.194             9.221       2    .010
The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model is
formed by omitting an effect from the final model. The null hypothesis
is that all parameters of that effect are 0.

The statistical significance of the relationship between
confidence in Congress and opinion about spending on
highways and bridges is based on the statistical significance of
the chi-square statistic in the SPSS table titled "Likelihood
Ratio Tests".

For this relationship, the probability of the chi-square statistic
(9.221) was 0.010, less than or equal to the level of
significance of 0.05. The null hypothesis that all of the b
coefficients associated with confidence in Congress were equal
to zero was rejected. The existence of a relationship between
confidence in Congress and opinion about spending on
highways and bridges was supported.
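
The likelihood ratio tests can be recomputed from the -2 log likelihood values. The sketch below assumes the final model value of 265.972 from the "Model Fitting Information" table and uses the fact that, for 2 degrees of freedom, the chi-square upper-tail probability is exactly exp(-x/2). Differences in the last decimal place relative to the SPSS table come from rounding in the printed values.

```python
import math

NEG2LL_FINAL = 265.972  # final model, from "Model Fitting Information"

# -2 log likelihood of each reduced model (effect omitted).
neg2ll_reduced = {"AGE": 268.625, "EDUC": 270.395, "CONLEGIS": 275.194}

ALPHA = 0.05
results = {}
for effect, neg2ll in neg2ll_reduced.items():
    chi_square = neg2ll - NEG2LL_FINAL
    # Exact upper tail of a chi-square with df = 2 (two comparisons).
    p_value = math.exp(-chi_square / 2.0)
    results[effect] = (chi_square, p_value)
    verdict = "significant" if p_value <= ALPHA else "not significant"
    print(f"{effect}: chi-square = {chi_square:.3f}, p = {p_value:.3f} ({verdict})")
```

Only confidence in Congress has a probability at or below 0.05, which matches the conclusion drawn on the slide.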

Slide 42

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 2

Parameter Estimates

HIGHWAYS AND                                                       95% Confidence Interval
BRIDGES (a)                 B   Std. Error    Wald   df   Sig.   Exp(B)   for Exp(B): Lower Bound
1   Intercept           3.240        2.478   1.709    1   .191
    AGE                  .019         .020    .906    1   .341    1.019   .980
    EDUC                 .071         .108    .427    1   .514    1.073   .868
    CONLEGIS           -1.373         .620   4.913    1   .027     .253   .075
2   Intercept           3.639        2.456   2.195    1   .138
    AGE                  .003         .020    .017    1   .897    1.003   .963
    EDUC                 .172         .110   2.463    1   .117    1.188   .958
    CONLEGIS           -1.657         .613   7.298    1   .007     .191   .057
a. The reference category is: 3.

In the comparison of survey respondents who thought we spend
too little money on highways and bridges to survey respondents
who thought we spend too much money on highways and
bridges, the probability of the Wald statistic (4.913) for the
variable confidence in Congress [conlegis] was 0.027. Since the
probability was less than or equal to the level of significance of
0.05, the null hypothesis that the b coefficient for confidence in
Congress was equal to zero for this comparison was rejected.
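
The Wald statistic in the table is the squared ratio of the coefficient to its standard error, tested against a chi-square distribution with 1 degree of freedom. The sketch below recomputes it from the rounded B and Std. Error values, so it lands slightly below the 4.913 that SPSS derives from unrounded values. The p-value uses the standard identity that the upper tail of a chi-square with 1 degree of freedom equals erfc(sqrt(x/2)).

```python
import math

# CONLEGIS row for comparison 1 (too little vs. too much).
b = -1.373
std_error = 0.620

# Wald statistic: squared ratio of coefficient to standard error.
wald = (b / std_error) ** 2

# Upper-tail probability for a chi-square with 1 degree of freedom.
p_value = math.erfc(math.sqrt(wald / 2.0))

print(f"Wald = {wald:.3f}, p = {p_value:.3f}")
print("Reject H0 (b = 0):", p_value <= 0.05)
```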

Slide 43

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 3

Parameter Estimates

HIGHWAYS AND                                                       95% Confidence Interval
BRIDGES (a)                 B   Std. Error    Wald   df   Sig.   Exp(B)   for Exp(B): Lower Bound
1   Intercept           3.240        2.478   1.709    1   .191
    AGE                  .019         .020    .906    1   .341    1.019   .980
    EDUC                 .071         .108    .427    1   .514    1.073   .868
    CONLEGIS           -1.373         .620   4.913    1   .027     .253   .075
2   Intercept           3.639        2.456   2.195    1   .138
    AGE                  .003         .020    .017    1   .897    1.003   .963
    EDUC                 .172         .110   2.463    1   .117    1.188   .958
    CONLEGIS           -1.657         .613   7.298    1   .007     .191   .057
a. The reference category is: 3.

The value of Exp(B) was 0.253 which implies that for each unit
increase in confidence in Congress the odds decreased by 74.7%
(0.253 - 1.0 = -0.747).

The relationship stated in the problem is supported. Survey
respondents who had less confidence in congress were less likely
to be in the group of survey respondents who thought we spend
too little money on highways and bridges, rather than the group of
survey respondents who thought we spend too much money on
highways and bridges. For each unit increase in confidence in
Congress, the odds of being in the group of survey respondents
who thought we spend too little money on highways and bridges
decreased by 74.7%.
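
The conversion from a b coefficient to a percentage change in the odds can be written out explicitly. The sketch below applies it to both conlegis coefficients in the Parameter Estimates table; the helper name `percent_change_in_odds` is made up for this example.

```python
import math


def percent_change_in_odds(b):
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (math.exp(b) - 1.0) * 100.0


# b coefficients for CONLEGIS in the two comparisons against group 3.
conlegis_b = {"too little vs. too much": -1.373,
              "about right vs. too much": -1.657}

changes = {}
for comparison, b in conlegis_b.items():
    odds_ratio = math.exp(b)  # this is Exp(B) in the SPSS table
    changes[comparison] = percent_change_in_odds(b)
    print(f"{comparison}: Exp(B) = {odds_ratio:.3f}, "
          f"change in odds = {changes[comparison]:.1f}%")
```

A negative b gives Exp(B) below 1.0 and therefore a decrease in the odds; a positive b gives Exp(B) above 1.0 and an increase.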

Slide 44

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 4

Parameter Estimates

HIGHWAYS AND                                                       95% Confidence Interval
BRIDGES (a)                 B   Std. Error    Wald   df   Sig.   Exp(B)   for Exp(B): Lower Bound
1   Intercept           3.240        2.478   1.709    1   .191
    AGE                  .019         .020    .906    1   .341    1.019   .980
    EDUC                 .071         .108    .427    1   .514    1.073   .868
    CONLEGIS           -1.373         .620   4.913    1   .027     .253   .075
2   Intercept           3.639        2.456   2.195    1   .138
    AGE                  .003         .020    .017    1   .897    1.003   .963
    EDUC                 .172         .110   2.463    1   .117    1.188   .958
    CONLEGIS           -1.657         .613   7.298    1   .007     .191   .057
a. The reference category is: 3.

In the comparison of survey respondents who thought we spend
about the right amount of money on highways and bridges to
survey respondents who thought we spend too much money on
highways and bridges, the probability of the Wald statistic
(7.298) for the variable confidence in Congress [conlegis] was
0.007. Since the probability was less than or equal to the level
of significance of 0.05, the null hypothesis that the b coefficient
for confidence in Congress was equal to zero for this comparison
was rejected.

Slide 45

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 5

Parameter Estimates

HIGHWAYS AND                                                       95% Confidence Interval
BRIDGES (a)                 B   Std. Error    Wald   df   Sig.   Exp(B)   for Exp(B): Lower Bound
1   Intercept           3.240        2.478   1.709    1   .191
    AGE                  .019         .020    .906    1   .341    1.019   .9
    EDUC                 .071         .108    .427    1   .514    1.073   .8
    CONLEGIS           -1.373         .620   4.913    1   .027     .253   .0
2   Intercept           3.639        2.456   2.195    1   .138
    AGE                  .003         .020    .017    1   .897    1.003   .9
    EDUC                 .172         .110   2.463    1   .117    1.188   .9
    CONLEGIS           -1.657         .613   7.298    1   .007     .191   .0
a. The reference category is: 3.

The value of Exp(B) was 0.191 which implies that for each unit increase in
confidence in Congress the odds decreased by 80.9% (0.191-1.0=-0.809).

The relationship stated in the problem is supported. Survey respondents
who had less confidence in congress were less likely to be in the group of
survey respondents who thought we spend about the right amount of
money on highways and bridges, rather than the group of survey
respondents who thought we spend too much money on highways and
bridges. For each unit increase in confidence in Congress, the odds of
being in the group of survey respondents who thought we spend about the
right amount of money on highways and bridges decreased by 80.9%.
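
The two sets of coefficients can also be combined into predicted probabilities of group membership, which is what the classification step relies on. The sketch below uses the standard multinomial logit formula with group 3 as the reference category (its logit is fixed at 0). The respondent's values (age 40, 12 years of school, conlegis code 2) are hypothetical and chosen only for illustration; they do not come from the GSS2000 data.

```python
import math

# Coefficients from the Parameter Estimates table (reference category: 3).
coefficients = {
    1: {"Intercept": 3.240, "AGE": 0.019, "EDUC": 0.071, "CONLEGIS": -1.373},
    2: {"Intercept": 3.639, "AGE": 0.003, "EDUC": 0.172, "CONLEGIS": -1.657},
}

# Hypothetical respondent, for illustration only.
respondent = {"AGE": 40, "EDUC": 12, "CONLEGIS": 2}


def predicted_probabilities(coefficients, case, reference=3):
    """Multinomial logit probabilities with a fixed reference category."""
    exp_logits = {reference: 1.0}  # exp(0) for the reference group
    for group, b in coefficients.items():
        logit = b["Intercept"] + sum(b[name] * value for name, value in case.items())
        exp_logits[group] = math.exp(logit)
    total = sum(exp_logits.values())
    return {group: value / total for group, value in exp_logits.items()}


probabilities = predicted_probabilities(coefficients, respondent)
predicted_group = max(probabilities, key=probabilities.get)

for group in sorted(probabilities):
    print(f"P(group {group}) = {probabilities[group]:.3f}")
print("Predicted group:", predicted_group)
```

The predicted group is simply the one with the highest probability, and the three probabilities always sum to 1.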

Slide 46

CLASSIFICATION USING THE MULTINOMIAL LOGISTIC REGRESSION MODEL: BY CHANCE ACCURACY RATE

The independent variables could be characterized as useful
predictors distinguishing survey respondents who thought we
spend too little money on highways and bridges, survey
respondents who thought we spend about the right amount
of money on highways and bridges and survey respondents
who thought we spend too much money on highways and
bridges if the classification accuracy rate was substantially
higher than the accuracy attainable by chance alone.
Operationally, the classification accuracy rate should be at
least 25% higher than the proportional by chance accuracy
rate.

Case Processing Summary

                                   N    Marginal Percentage
HIGHWAYS AND BRIDGES    1         62          37.1%
                        2         93          55.7%
                        3         12           7.2%
Valid                            167         100.0%
Missing                          103
Total                            270
Subpopulation                    153a
a. The dependent variable has only one value observed in 146 (95.4%) subpopulations.

The proportional by chance accuracy rate was computed by
calculating the proportion of cases for each group based on
the number of cases in each group in the 'Case Processing
Summary', and then squaring and summing the proportion of
cases in each group
(0.371² + 0.557² + 0.072² = 0.453).
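
The by chance accuracy rate can be computed directly from the group counts. The sketch below works from the counts in the Case Processing Summary rather than the rounded percentages, which gives the same result to three decimal places.

```python
# Number of valid cases in each natroad group (Case Processing Summary).
group_counts = {1: 62, 2: 93, 3: 12}

total = sum(group_counts.values())
proportions = {group: n / total for group, n in group_counts.items()}

# Proportional by chance accuracy: sum of the squared group proportions.
by_chance_accuracy = sum(p ** 2 for p in proportions.values())

print(f"Proportional by chance accuracy rate: {by_chance_accuracy:.3f}")
```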

Slide 47

CLASSIFICATION USING THE MULTINOMIAL LOGISTIC REGRESSION MODEL: CLASSIFICATION ACCURACY

Classification

                               Predicted
Observed                  1        2        3    Percent Correct
1                        15       47        0         24.2%
2                         7       86        0         92.5%
3                         5        7        0           .0%
Overall Percentage     16.2%    83.8%     .0%         60.5%

The classification accuracy rate was 60.5%,
which was greater than or equal to the
proportional by chance accuracy criterion of
56.6% (1.25 x 45.3% = 56.6%).

The criterion for classification accuracy is
satisfied.
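
The accuracy rate and the comparison with the by chance criterion can be reproduced from the Classification table. In the sketch below, the matrix layout (rows are observed groups, columns are predicted groups) follows the SPSS table, and the by chance rate of 0.453 is taken from the previous slide.

```python
# Classification table: rows = observed group, columns = predicted group.
classification = [
    [15, 47, 0],  # observed 1
    [7, 86, 0],   # observed 2
    [5, 7, 0],    # observed 3
]

BY_CHANCE_RATE = 0.453  # proportional by chance accuracy rate
IMPROVEMENT = 1.25      # criterion: 25% better than chance

# Correct classifications sit on the diagonal of the table.
correct = sum(classification[i][i] for i in range(len(classification)))
total = sum(sum(row) for row in classification)

accuracy = correct / total
criterion = IMPROVEMENT * BY_CHANCE_RATE

print(f"Classification accuracy: {accuracy * 100:.1f}%")
print(f"By chance accuracy criterion: {criterion * 100:.1f}%")
print("Criterion satisfied:", accuracy >= criterion)
```

The table also shows that no case was predicted to be in group 3, so the overall accuracy comes entirely from groups 1 and 2.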

Slide 48

Answering the question in problem 1 - 1

11. In the dataset GSS2000, is the following statement true, false, or an incorrect application
of a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on highways and bridges" [natroad]. These predictors
differentiate survey respondents who thought we spend too little money on highways and
bridges from survey respondents who thought we spend too much money on highways and
bridges and survey respondents who thought we spend about the right amount of money on
highways and bridges from survey respondents who thought we spend too much money on
highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the
odds of being in the group of survey respondents who thought we spend about the right amount
of money on highways and bridges decreased by 80.9%.
1. True
2. True with caution
3. False

We found a statistically significant overall
relationship between the combination of
independent variables and the dependent
variable.

There was no evidence of numerical problems in
the solution.

Moreover, the classification accuracy surpassed
the proportional by chance accuracy criteria,
supporting the utility of the model.

Slide 49

Answering the question in problem 1 - 2

We verified that each statement about the relationship
between an independent variable and the dependent
variable was correct in both direction of the relationship
and the change in likelihood associated with a one-unit
change of the independent variable, for both of the
comparisons between groups stated in the problem.

The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on highways and bridges" [natroad]. These predictors
differentiate survey respondents who thought we spend too little money on highways and
bridges from survey respondents who thought we spend too much money on highways and
bridges and survey respondents who thought we spend about the right amount of money on
highways and bridges from survey respondents who thought we spend too much money on
highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the
odds of being in the group of survey respondents who thought we spend about the right amount
of money on highways and bridges decreased by 80.9%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

The answer to the question is true
with caution.

A caution is added because of the
inclusion of ordinal level variables.
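
The sequence of checks used to answer problem 1 can be summarized as a small decision function. This is a sketch of the reasoning in these slides, not an SPSS feature. The function name and arguments are invented for this example, and the mapping of a failed level-of-measurement check to answer 4 and of any other failed check to answer 3 is an assumption about how the answer choices are meant to be used.

```python
def answer_problem(level_of_measurement_ok, overall_p, max_std_error,
                   accuracy_rate, by_chance_criterion, individual_ps,
                   interpretation_correct, ordinal_treated_as_metric,
                   alpha=0.05, se_threshold=2.0):
    """Combine the checks from the slides into one of the four answers."""
    if not level_of_measurement_ok:
        # Assumed mapping: wrong level of measurement means the
        # statistic should not have been used at all.
        return "Inappropriate application of a statistic"
    checks = [
        overall_p <= alpha,                       # overall relationship
        max_std_error <= se_threshold,            # no numerical problems
        accuracy_rate >= by_chance_criterion,     # useful classification
        all(p <= alpha for p in individual_ps),   # individual relationship
        interpretation_correct,                   # direction and size stated correctly
    ]
    if not all(checks):
        return "False"
    return "True with caution" if ordinal_treated_as_metric else "True"


# Values taken from the SPSS output for problem 1.
answer = answer_problem(
    level_of_measurement_ok=True,
    overall_p=0.005,
    max_std_error=0.620,
    accuracy_rate=0.605,
    by_chance_criterion=0.566,
    individual_ps=[0.010, 0.027, 0.007],
    interpretation_correct=True,
    ordinal_treated_as_metric=True,  # natroad and conlegis are ordinal
)
print(answer)
```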

Slide 50

Problem 2

1. In the dataset GSS2000, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.

The variables "highest year of school completed" [educ], "sex" [sex] and "total family income"
[income98] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on space exploration" [natspac]. These predictors differentiate survey
respondents who thought we spend too little money on space exploration from survey
respondents who thought we spend too much money on space exploration and survey
respondents who thought we spend about the right amount of money on space exploration from
survey respondents who thought we spend too much money on space exploration.

Among this set of predictors, total family income was helpful in distinguishing among the
groups defined by responses to opinion about spending on space exploration. Survey
respondents who had higher total family incomes were more likely to be in the group of survey
respondents who thought we spend about the right amount of money on space exploration,
rather than the group of survey respondents who thought we spend too much money on space
exploration. For each unit increase in total family income, the odds of being in the group of
survey respondents who thought we spend about the right amount of money on space
exploration increased by 6.0%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Slide 51

Dissecting problem 2 - 1

For these problems, we will
assume that there is no problem
with missing data, outliers, or
influential cases, and that the
validation analysis will confirm
the generalizability of the
results.

In this problem, we are told to
use 0.05 as alpha for the
multinomial logistic regression.

1. In the dataset GSS2000, is the following statement true, false, or an incorrect
application of a statistic? Assume that there is no problem with missing data, outliers, or
influential cases, and that the validation analysis will confirm the generalizability of the
results. Use a level of significance of 0.05 for evaluating the statistical relationships.

The variables "highest year of school completed" [educ], "sex" [sex] and "total family income"
[income98] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on space exploration" [natspac]. These predictors differentiate survey
respondents who thought we spend too little money on space exploration from survey
respondents who thought we spend too much money on space exploration and survey
respondents who thought we spend about the right amount of money on space exploration from
survey respondents who thought we spend too much money on space exploration.

Among this set of predictors, total family income was helpful in distinguishing among the
groups defined by responses to opinion about spending on space exploration. Survey
respondents who had higher total family incomes were more likely to be in the group of survey
respondents who thought we spend about the right amount of money on space exploration,
rather than the group of survey respondents who thought we spend too much money on space
exploration. For each unit increase in total family income, the odds of being in the group of
survey respondents who thought we spend about the right amount of money on space
exploration increased by 6.0%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 52

Dissecting problem 2 - 2

The variables listed first in the problem statement are the independent variables (IVs):
"highest year of school completed" [educ], "sex" [sex] and "total family income" [income98].

The variable used to define groups is the dependent variable (DV): "opinion about spending on
space exploration" [natspac].

SPSS only supports direct or simultaneous entry of independent variables in multinomial
logistic regression, so we have no choice of method for entering variables.
Slide 53

Dissecting problem 2 - 3

SPSS multinomial logistic regression models the relationship by comparing each of the groups
defined by the dependent variable to the group with the highest code value.

The responses to opinion about spending on the space program were:
1 = Too little, 2 = About right, and 3 = Too much.

The analysis will result in two comparisons:
• survey respondents who thought we spend too little money versus survey respondents who
  thought we spend too much money on space exploration
• survey respondents who thought we spend about the right amount of money versus survey
  respondents who thought we spend too much money on space exploration.
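
As a small illustration of the reference-group rule described for slide 53, the sketch below
lists the comparisons produced when the highest code value serves as the reference group. The
labels come from the slide; the function name is hypothetical and the code is not part of SPSS.

```python
# Sketch: list the comparisons made when the group with the highest code
# value is used as the reference group. Function name is hypothetical.

def comparisons_to_reference(labels):
    """Return (group, reference) label pairs, using the highest code as reference."""
    reference_code = max(labels)
    return [
        (labels[code], labels[reference_code])
        for code in sorted(labels)
        if code != reference_code
    ]

# Codes and labels for natspac as given on the slide.
natspac_labels = {1: "Too little", 2: "About right", 3: "Too much"}

for group, reference in comparisons_to_reference(natspac_labels):
    print(f"{group} versus {reference}")
```

With three groups there are always two comparisons, which is why each independent variable
has two coefficients in the output.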
Slide 54

Dissecting problem 2 - 4

Each problem includes a statement about the relationship between one independent variable and
the dependent variable. The answer to the problem is based on the stated relationship,
ignoring the relationships between the other independent variables and the dependent variable.

This problem identifies a difference for only one of the two comparisons based on the three
values of the dependent variable.

Other problems will specify both of the possible comparisons.
Slide 55

Dissecting problem 2 - 5

The variables "highest year of school completed" [educ], "sex" [sex] and "total family income"
[income98] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on space exploration" [natspac]. These predictors differentiate survey
respondents who thought we spend too little money on space exploration from survey
respondents who thought we spend too much money on space exploration and survey
respondents who thought we spend about the right amount of money on space exploration from
survey respondents who thought we spend too much money on space exploration.

Among this set of predictors, total family income was helpful in distinguishing among the
groups defined by responses to opinion about spending on space exploration. Survey
respondents who had higher total family incomes were more likely to be in the group of survey
respondents who thought we spend about the right amount of money on space exploration,
rather than the group of survey respondents who thought we spend too much money on space
exploration. For each unit increase in total family income, the odds of being in the group of
survey respondents who thought we spend about the right amount of money on space
exploration increased by 6.0%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

In order for the multinomial logistic regression question to be true, the overall relationship
must be statistically significant, there must be no evidence of numerical problems, the
classification accuracy rate must be substantially better than could be obtained by chance
alone, and the stated individual relationship must be statistically significant and
interpreted correctly.
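
One way to make "substantially better than chance" concrete is sketched below. It assumes the
proportional by chance accuracy rate is the sum of the squared group proportions and that the
criterion is a 25% improvement over that rate; this section does not spell out the formula, so
treat both as assumptions. The group sizes are the natspac counts from the Case Processing
Summary for this problem.

```python
# Sketch under stated assumptions: by chance accuracy = sum of squared
# proportions of cases in each group; criterion = 1.25 times that rate.

def proportional_by_chance_accuracy(counts):
    """Sum of squared group proportions."""
    total = sum(counts)
    return sum((n / total) ** 2 for n in counts)

# Cases in natspac groups 1, 2 and 3 (Case Processing Summary).
group_counts = [33, 90, 85]

chance_rate = proportional_by_chance_accuracy(group_counts)
criterion = 1.25 * chance_rate

print(f"Proportional by chance accuracy rate: {chance_rate:.3f}")
print(f"Accuracy rate needed (25% improvement): {criterion:.3f}")
```

The overall accuracy rate from the SPSS classification table would then be compared with the
criterion value.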
Slide 56

LEVEL OF MEASUREMENT - 1

Multinomial logistic regression requires that the dependent variable be non-metric and the
independent variables be metric or dichotomous.

"Opinion about spending on space exploration" [natspac] is ordinal, satisfying the non-metric
level of measurement requirement for the dependent variable.

It contains three categories: survey respondents who thought we spend too little money, about
the right amount of money, and too much money on space exploration.
Slide 57

LEVEL OF MEASUREMENT - 2

"Highest year of school completed" [educ] is interval, satisfying the metric or dichotomous
level of measurement requirement for independent variables.

"Sex" [sex] is dichotomous, satisfying the metric or dichotomous level of measurement
requirement for independent variables.

"Total family income" [income98] is ordinal. If we follow the convention of treating ordinal
level variables as metric variables, it satisfies the metric or dichotomous level of
measurement requirement for independent variables, and the level of measurement requirement
for the analysis is satisfied. Since some data analysts do not agree with this convention, a
note of caution should be included in our interpretation.
Slide 58

Request multinomial logistic regression

Select the Regression | Multinomial Logistic… command from the Analyze menu.
Slide 59

Selecting the dependent variable

First, highlight the dependent variable natspac in the list of variables.

Second, click on the right arrow button to move the dependent variable to the Dependent text
box.
Slide 60

Selecting non-metric independent variables

Non-metric independent variables are specified as factors in multinomial logistic regression.
Non-metric variables can be dichotomous, nominal, or ordinal.

These variables will be dummy coded as needed and each value will be listed separately in the
output.

Select the dichotomous variable sex.

Move the non-metric independent variables listed in the problem to the Factor(s) list box.
Slide 61

Selecting metric independent variables

Metric independent variables are specified as covariates in multinomial logistic regression.
Metric variables can be either interval or, by convention, ordinal.

Move the metric independent variables, educ and income98, to the Covariate(s) list box.
Slide 62

Specifying statistics to include in the output

While we will accept most of the SPSS defaults for the analysis, we need to specifically
request the classification table.

Click on the Statistics… button to make a request.
Slide 63

Requesting the classification table

First, keep the SPSS defaults for Summary statistics, Likelihood ratio test, and Parameter
estimates.

Second, mark the checkbox for the Classification table.

Third, click on the Continue button to complete the request.
Slide 64

Completing the multinomial logistic regression request

Click on the OK button to request the output for the multinomial logistic regression.

The multinomial logistic procedure supports additional commands to specify the model computed
for the relationships (we will use the default main effects model), additional specifications
for computing the regression, and saving classification results. We will not make use of
these options.
Slide 65

Sample size – ratio of cases to variables

Case Processing Summary

                                      N        Marginal Percentage
SPACE EXPLORATION PROGRAM    1        33       15.9%
                             2        90       43.3%
                             3        85       40.9%
RESPONDENTS SEX              1        94       45.2%
                             2        114      54.8%
Valid                                 208      100.0%
Missing                               62
Total                                 270
Subpopulation                         138(a)

a. The dependent variable has only one value observed in 112 (81.2%) subpopulations.

Multinomial logistic regression requires that the minimum ratio of valid cases to independent
variables be at least 10 to 1. The ratio of valid cases (208) to number of independent
variables (3) was 69.3 to 1, which was equal to or greater than the minimum ratio. The
requirement for a minimum ratio of cases to independent variables was satisfied.

The preferred ratio of valid cases to independent variables is 20 to 1. The ratio of 69.3 to 1
was equal to or greater than the preferred ratio. The preferred ratio of cases to independent
variables was satisfied.
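
The ratio check can be reproduced with a few lines of arithmetic. This is only a sketch; the
variable names are illustrative, and the counts are the ones reported in the Case Processing
Summary.

```python
# Sketch: ratio of valid cases to independent variables, checked against
# the minimum (10 to 1) and preferred (20 to 1) ratios named on the slide.

valid_cases = 208
independent_variables = 3  # educ, sex, income98

ratio = valid_cases / independent_variables
meets_minimum = ratio >= 10
meets_preferred = ratio >= 20

print(f"Ratio of cases to independent variables: {ratio:.1f} to 1")
print(f"Meets minimum ratio (10 to 1): {meets_minimum}")
print(f"Meets preferred ratio (20 to 1): {meets_preferred}")
```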
Slide 66

OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES

Model Fitting Information

Model             -2 Log Likelihood    Chi-Square    df    Sig.
Intercept Only    354.268
Final             334.967              19.301        6     .004

The presence of a relationship between the dependent variable and combination of independent
variables is based on the statistical significance of the final model chi-square in the SPSS
table titled "Model Fitting Information".

In this analysis, the probability of the model chi-square (19.301) was 0.004, less than or
equal to the level of significance of 0.05. The null hypothesis that there was no difference
between the model without independent variables and the model with independent variables
was rejected. The existence of a relationship between the independent variables and the
dependent variable was supported.
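
The model chi-square is the drop in -2 log likelihood from the intercept-only model to the
final model. The sketch below recomputes it and its probability from the table values. The
helper chi2_sf_even_df is a hypothetical name; it uses the closed-form upper-tail probability
that holds only for even degrees of freedom, which is enough here (df = 6, that is, three
predictors in each of two comparisons).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms

# Values from the Model Fitting Information table.
neg2ll_intercept_only = 354.268
neg2ll_final = 334.967
df = 6
alpha = 0.05

model_chi_square = neg2ll_intercept_only - neg2ll_final
p_value = chi2_sf_even_df(model_chi_square, df)

print(f"Model chi-square: {model_chi_square:.3f}")
print(f"Sig.: {p_value:.3f}")
print(f"Overall relationship supported: {p_value <= alpha}")
```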
Slide 67

NUMERICAL PROBLEMS

Parameter Estimates

SPACE EXPLORATION                                                       95% Confidence Interval
PROGRAM(a)                                                              for Exp(B)
                    B        Std. Error   Wald     df   Sig.   Exp(B)   Lower Bound
1   Intercept       -4.136   1.157        12.779   1    .000
    EDUC            .101     .089         1.276    1    .259   1.106    .929
    INCOME98        .097     .050         3.701    1    .054   1.102    .998
    [SEX=1]         .672     .426         2.488    1    .115   1.959    .850
    [SEX=2]         0(b)     .            .        0    .      .        .
2   Intercept       -2.487   .840         8.774    1    .003
    EDUC            .108     .068         2.521    1    .112   1.114    .975
    INCOME98        .058     .034         2.932    1    .087   1.060    .992
    [SEX=1]         .501     .317         2.492    1    .114   1.650    .886
    [SEX=2]         0(b)     .            .        0    .      .        .

a. The reference category is: 3.
b. This parameter is set to zero because it is redundant.

Multicollinearity in the multinomial logistic regression solution is detected by examining the
standard errors for the b coefficients. A standard error larger than 2.0 indicates numerical
problems, such as multicollinearity among the independent variables, zero cells for a
dummy-coded independent variable because all of the subjects have the same value for the
variable, and 'complete separation' whereby the two groups in the dependent variable can be
perfectly separated by scores on one of the independent variables. Analyses that indicate
numerical problems should not be interpreted.

None of the independent variables in this analysis had a standard error larger than 2.0.
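
The standard error screen, and the link between a b coefficient and the percentage change in
odds quoted in the problem statement, can be sketched as follows. The dictionary layout and
names are illustrative; the numbers are copied from the Parameter Estimates table.

```python
import math

# (comparison, term): (B, Std. Error) for the independent variables.
estimates = {
    ("1 vs 3", "EDUC"): (0.101, 0.089),
    ("1 vs 3", "INCOME98"): (0.097, 0.050),
    ("1 vs 3", "SEX=1"): (0.672, 0.426),
    ("2 vs 3", "EDUC"): (0.108, 0.068),
    ("2 vs 3", "INCOME98"): (0.058, 0.034),
    ("2 vs 3", "SEX=1"): (0.501, 0.317),
}

SE_LIMIT = 2.0  # rule of thumb used on the slide

# Any term whose standard error exceeds the limit signals numerical problems.
flagged = [key for key, (_, se) in estimates.items() if se > SE_LIMIT]
print(f"Terms with standard error > {SE_LIMIT}: {flagged}")

# Exp(B) is the odds ratio; (Exp(B) - 1) * 100 is the percent change in odds
# for a one-unit increase in the predictor.
b_income = estimates[("2 vs 3", "INCOME98")][0]
odds_ratio = math.exp(b_income)
percent_change = (odds_ratio - 1) * 100
print(f"Exp(B) for INCOME98, group 2 vs 3: {odds_ratio:.3f}")
print(f"Change in odds per unit of income: {percent_change:.1f}%")
```

The 6.0% figure in the problem statement corresponds to this Exp(B) of 1.060 for the "about
right" versus "too much" comparison.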
Slide 68

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 1

Likelihood Ratio Tests

             -2 Log Likelihood
Effect       of Reduced Model     Chi-Square    df    Sig.
Intercept    334.967(a)           .000          0     .
EDUC         337.788              2.821         2     .244
INCOME98     340.154              5.187         2     .075
SEX          338.511              3.544         2     .170

The chi-square statistic is the difference in -2 log-likelihoods between the final model and a
reduced model. The reduced model is formed by omitting an effect from the final model. The
null hypothesis is that all parameters of that effect are 0.
a. This reduced model is equivalent to the final model because omitting the effect does not
increase the degrees of freedom.

The statistical significance of the relationship between total family income and opinion about
spending on space exploration is based on the statistical significance of the chi-square
statistic in the SPSS table titled "Likelihood Ratio Tests".

For this relationship, the probability of the chi-square statistic (5.187) was 0.075, greater
than the level of significance of 0.05. The null hypothesis that all of the b coefficients
associated with total family income were equal to zero was not rejected. The existence of a
relationship between total family income and opinion about spending on space exploration was
not supported.
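
The likelihood ratio tests can be recomputed from the -2 log likelihoods in the table. The
sketch below relies on the fact that, for 2 degrees of freedom, the upper-tail chi-square
probability is exp(-x/2); the variable names are illustrative.

```python
import math

ALPHA = 0.05
neg2ll_final = 334.967

# -2 log likelihood of each reduced model (Likelihood Ratio Tests table).
neg2ll_reduced = {"EDUC": 337.788, "INCOME98": 340.154, "SEX": 338.511}

results = {}
for effect, neg2ll in neg2ll_reduced.items():
    chi_square = neg2ll - neg2ll_final
    p_value = math.exp(-chi_square / 2.0)  # upper tail for df = 2
    results[effect] = (chi_square, p_value)
    verdict = "supported" if p_value <= ALPHA else "not supported"
    print(f"{effect}: chi-square = {chi_square:.3f}, Sig. = {p_value:.3f}, {verdict}")
```

Each effect has 2 degrees of freedom because dropping a predictor removes one coefficient from
each of the two comparisons.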
Slide 69

Answering the question in problem 2

1. In the dataset GSS2000, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.

The variables "highest year of school completed" [educ], "sex" [sex] and "total family income"
[income98] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on space exploration" [natspac]. These predictors differentiate survey
respondents who thought we spend too little money on space exploration from survey
respondents who thought we spend too much money on space exploration and survey
respondents who thought we spend about the right amount of money on space exploration from
survey respondents who thought we spend too much money on space exploration.

Among this set of predictors, total family income was helpful in distinguishing among the
groups defined by responses to opinion about spending on space exploration. Survey
respondents who had higher total family incomes were more likely to be in the group of survey
respondents who thought we spend about the right amount of money on space exploration,
rather than the group of survey respondents who thought we spend too much money on space
exploration. For each unit increase in total family income, the odds of being in the group of
survey respondents who thought we spend about the right amount of money on space
exploration increased by 6.0%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

We found a statistically significant overall relationship between the combination of
independent variables and the dependent variable.

There was no evidence of numerical problems in the solution.

However, the individual relationship between total family income and spending on space was
not statistically significant.

The answer to the question is false.
Slide 70

Steps in multinomial logistic regression: level of measurement and initial sample size

The following is a guide to the decision process for answering problems about the basic
relationships in multinomial logistic regression:

Dependent non-metric? Independent variables metric or dichotomous?
    No:  Inappropriate application of a statistic
    Yes: continue

Ratio of cases to independent variables at least 10 to 1?
    No:  Inappropriate application of a statistic
    Yes: Run multinomial logistic regression

Slide 71

Steps in multinomial logistic regression: overall relationship and numerical problems

Overall relationship statistically significant? (model chi-square test)
    No:  False
    Yes: continue

Standard errors of coefficients indicate no numerical problems (s.e. <= 2.0)?
    No:  False
    Yes: continue
Slide 72

Steps in multinomial logistic regression: relationships between IV's and DV

Overall relationship between specific IV and DV is statistically significant?
(likelihood ratio test)
    No:  False
    Yes: continue

Role of specific IV and DV groups statistically significant and interpreted correctly?
(Wald test and Exp(B))
    No:  False
    Yes: continue
Slide 73

Steps in multinomial logistic regression: classification accuracy and adding cautions

Overall accuracy rate is 25% greater than the proportional by chance accuracy rate?
    No:  False
    Yes: continue

Satisfies preferred ratio of cases to IV's of 20 to 1?
    No:  True with caution
    Yes: continue

One or more IV's are ordinal level treated as metric?
    Yes: True with caution
    No:  True
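
The decision steps on slides 70 to 73 can be written out as a single function. This is a
sketch only: the Evidence class, its field names, and the answer function are hypothetical, and
the inputs for problem 2 are the findings reported on the earlier slides. Classification
accuracy is left unset because the problem 2 decision path ends before that step.

```python
from dataclasses import dataclass
from typing import Optional

INAPPROPRIATE = "Inappropriate application of a statistic"

@dataclass
class Evidence:
    dv_non_metric: bool
    ivs_metric_or_dichotomous: bool
    cases_per_iv: float
    overall_significant: bool           # model chi-square test
    max_standard_error: float           # largest s.e. among coefficients
    iv_significant: bool                # likelihood ratio test for the IV
    iv_role_significant_and_correct: bool = False  # Wald test and Exp(B)
    accuracy_rate: Optional[float] = None
    by_chance_accuracy_rate: Optional[float] = None
    ordinal_iv_treated_as_metric: bool = False

def answer(e: Evidence) -> str:
    """Walk the decision steps in order and return the answer category."""
    if not (e.dv_non_metric and e.ivs_metric_or_dichotomous):
        return INAPPROPRIATE
    if e.cases_per_iv < 10:
        return INAPPROPRIATE
    if not e.overall_significant or e.max_standard_error > 2.0:
        return "False"
    if not e.iv_significant or not e.iv_role_significant_and_correct:
        return "False"
    if e.accuracy_rate is None or e.by_chance_accuracy_rate is None:
        raise ValueError("classification accuracy is needed at this step")
    if e.accuracy_rate < 1.25 * e.by_chance_accuracy_rate:
        return "False"
    if e.cases_per_iv < 20 or e.ordinal_iv_treated_as_metric:
        return "True with caution"
    return "True"

# Problem 2: the likelihood ratio test for income98 was not significant.
problem_2 = Evidence(
    dv_non_metric=True,
    ivs_metric_or_dichotomous=True,   # income98 ordinal, treated as metric
    cases_per_iv=208 / 3,
    overall_significant=True,
    max_standard_error=1.157,
    iv_significant=False,
    ordinal_iv_treated_as_metric=True,
)
print(answer(problem_2))
```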
