
Logistic Regression

Logistic regression allows prediction of categorical outcomes with two or more categories. This analysis uses survey data to predict whether respondents have a sleep problem based on gender, age, hours of sleep, and difficulty falling or staying asleep. The logistic regression model correctly classified 75.1% of cases, with difficulty getting to sleep, staying asleep, and fewer hours of sleep increasing the likelihood of reporting a sleep problem.

Uploaded by

Waqar Ahmad

Logistic Regression

Generalized Regression
• A family of regression analyses in which the
dependent variable (DV) is categorical is called
generalized regression.
• If the DV has two categories, it is called
Binomial (Binary) Regression.
• If the DV has more than two categories, it is called
Multinomial Regression.
Introduction
• There are many research situations in which the
dependent variable of interest is categorical
(e.g. win/lose; fail/pass; dead/alive).
• Multiple Regression is not suitable when you
have a categorical dependent variable.
• For multiple regression, your dependent
variable (the thing that you are trying to
explain or predict) must be a continuous variable.
Introduction
• Logistic regression allows you to test models to
predict categorical outcomes with two categories.
• Your predictor (independent) variables can be
either categorical or continuous, or a mix of both
in the one model.
• There is a family of logistic regression techniques
available in SPSS that will allow you to explore the
predictive ability of sets or blocks of variables,
and to specify the entry of variables.
Binary Logistic Regression
• In this approach, all predictor variables are
tested in one block to assess their predictive
ability while controlling for the effects of other
predictors in the model.
Example
• Use the file sleep4ED.sav
• In a survey, respondents were asked whether
they considered that they had a sleep-related
problem (yes/no).
• This variable will be used as the dependent
variable in this analysis.
• The set of predictors (independent variables)
includes sex, age, the number of hours of sleep
the person gets per weeknight, whether they
have trouble falling asleep, and whether they
have difficulty staying asleep.
Procedure
• Each variable was recoded from its original
scores to ensure its suitability for this
analysis. The categorical variables were
recoded so that 0=no and 1=yes.
Procedure
• In order to make sense of the results of logistic regression,
it is important that you set up the coding of responses to
each of your variables carefully.
• For the dichotomous dependent variable, you should code
the responses as 0 and 1 (or recode existing values using
the Recode procedure in SPSS).
• The value of 0 should be assigned to whichever response
indicates a lack or absence of the characteristic of interest.
• In this example, 0 is used to code the answer No to the
question ‘Do you have a problem with your sleep?’ The
value of 1 is used to indicate a Yes answer. A similar
approach is used when coding the independent variables.
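The 0/1 coding rule above can be sketched outside SPSS as a small Python function. This is only an illustration: the original codes (1 = yes, 2 = no) and the missing-value code 9 are assumptions, since the slides do not state how the raw survey responses were coded.

```python
def recode_yes_no(value, yes_code=1, no_code=2):
    """Map an original survey code onto 0 = no, 1 = yes.

    yes_code and no_code are assumed values; any other code is
    treated as missing and returned as None.
    """
    if value == yes_code:
        return 1
    if value == no_code:
        return 0
    return None


# Hypothetical raw responses; 9 stands in for a missing-value code.
original = [1, 2, 2, 1, 9]
recoded = [recode_yes_no(v) for v in original]
print(recoded)  # [1, 0, 0, 1, None]
```

Coding absence of the characteristic as 0 means a positive B value later reads directly as "more likely to answer yes".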
Procedure
• Variables:
• Problem with sleep recoded (probsleeprec): score
recoded to 0=no, 1=yes.
• Sex: 0=female, 1=male.
• Age: age in years.
• Hours sleep/weeknight (hourweeknight): in hours.
• Problem getting to sleep recoded (getsleeprec): score
recoded to: 0=no, 1=yes.
• Problem staying asleep recoded (staysleeprec): score
recoded to: 0=no, 1=yes.
Procedure
• Example of research question: What factors
predict the likelihood that respondents would
report that they had a problem with their
sleep?
• What you need:
• one categorical (dichotomous) dependent
variable (problem with sleep: No/Yes, coded
0/1)
Procedure for logistic regression
1. From the menu at the top of the screen, click on
Analyze, then click on Regression and then Binary
Logistic.
2. Choose your categorical dependent variable (e.g.
problem sleep recoded 01: probsleeprec) and move it
into the Dependent box.
• Click on your predictor variables (sex, age, problem
getting sleep recoded 01: getsleeprec, problem stay
asleep recoded 01: staysleeprec, hours sleep per
weeknight: hourweeknight) and move them into the
box labelled Covariates.
• For Method, make sure that Enter is displayed.
3. If you have any categorical predictors (nominal or
ordinal measurement), you will need to click on the
Categorical button. Highlight each of the categorical
variables (sex, getsleeprec, staysleeprec) and move
them into the Categorical covariates box.
• Highlight each of your categorical variables in
turn and click on the button labelled First in the
Change contrast section. Click on the Change
button and you will see the word (first) appear
after the variable name. This sets the first group
listed as the reference group.
Repeat for all categorical variables.
• Click on Continue
4. Click on the Options button. Select
Classification plots, Hosmer-Lemeshow goodness of
fit, Casewise listing of residuals, and CI for
Exp(B).
5. Click on Continue and then OK
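For readers without SPSS, the Enter method (all predictors tested in one block) can be sketched in plain Python. This is a minimal illustration that maximises the log-likelihood by gradient ascent; it is not SPSS's estimation routine. The ten cases are invented toy data, the function names are hypothetical, and hours of sleep is centred at 7 purely to keep the arithmetic stable.

```python
import math


def sigmoid(z):
    """Logistic function: converts log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit intercept and coefficients by gradient ascent on the log-likelihood.

    Returns [intercept, b1, b2, ...]. All predictors enter together.
    """
    n_features = len(X[0])
    coefs = [0.0] * (n_features + 1)  # coefs[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (n_features + 1)
        for row, target in zip(X, y):
            z = coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))
            err = target - sigmoid(z)  # observed minus predicted
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        coefs = [b + lr * g / len(X) for b, g in zip(coefs, grad)]
    return coefs


# Invented toy data: (hours of sleep minus 7, trouble getting to sleep 0/1)
X = [(-2, 1), (-1, 1), (-1, 0), (0, 1), (0, 0),
     (0, 1), (1, 0), (1, 1), (2, 0), (-2, 0)]
y = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0]  # 1 = reports a sleep problem

intercept, b_hours, b_getsleep = fit_logistic(X, y)
print(b_hours < 0, b_getsleep > 0)  # True True
```

In this toy data the fitted signs follow the same pattern the slides report: more hours of sleep lowers the likelihood of a yes answer, and trouble getting to sleep raises it.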
Results
Interpretation
• The first thing to check is the details
concerning sample size provided in the Case
Processing Summary table. Make sure you
have the number of cases that you expect.
• The next table, Dependent Variable Encoding,
tells you how SPSS has dealt with the coding
of your dependent variable (in this case,
whether people consider they have a problem
with their sleep).
Result
• Block 0 shows the results of the analysis without
any of our independent variables used in the
model. This will serve as a baseline later for
comparing the model with our predictor
variables included.
Interpretation
• In the Classification Table, the overall percentage of
correctly classified cases is 57.3 per cent. In this
case, SPSS classified (guessed) that all cases
would not have a problem with their sleep (only
because there was a higher percentage of people
answering No to the question).
• We hope that later, when our set of predictor
variables is entered, we will be able to improve
the accuracy of these predictions.
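The Block 0 "guess" amounts to assigning every case to the most frequent category. A minimal sketch of that baseline follows, using ten invented cases rather than the survey data; the function name is hypothetical.

```python
from collections import Counter


def baseline_accuracy(outcomes):
    """Percentage correct when every case is assigned to the modal category."""
    counts = Counter(outcomes)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, 100.0 * majority_count / len(outcomes)


# Invented example: 6 of 10 respondents answer No (0).
outcomes = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
label, pct = baseline_accuracy(outcomes)
print(label, pct)  # 0 60.0
```

By the same logic, the 57.3 per cent in the Block 0 table is simply the share of respondents who answered No.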
Interpretation
• Block 1 is where our model (set of predictor
variables) is tested.
• The Omnibus Tests of Model Coefficients gives
us an overall indication of how well the model
performs, over and above the results obtained
for Block 0, with none of the predictors
entered into the model.
Interpretation
• This is referred to as a ‘goodness of fit’ test.
For this set of results, we want a highly
significant value (the Sig. value should be less
than .05). In this case, the value is .000 (which
really means p<.0005). Therefore, the model
(with our set of variables used as predictors) is
better than SPSS’s original guess shown in
Block 0.
Interpretation
• The Hosmer and Lemeshow Test also supports our model
as being worthwhile. This test, which SPSS states is the
most reliable test of model fit available in SPSS, is
interpreted very differently from the omnibus test.
• For the Hosmer-Lemeshow Goodness of Fit Test, poor
fit is indicated by a significance value less than .05, so
to support our model we actually want a value greater
than .05.
• In our example, the chi-square value for the Hosmer-
Lemeshow Test is 10.019 with a significance level of
.264. This value is larger than .05, therefore indicating
support for the model.
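The reported significance level can be checked by hand. The sketch below computes the upper-tail chi-square probability for 10.019, assuming 8 degrees of freedom. The slides do not state the degrees of freedom; 8 is assumed here because the test is commonly run on ten groups (df = groups minus 2). The closed form used is valid only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum (x/2)^i / i! for i = 0 .. df/2 - 1
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(10.019, 8)  # df = 8 is an assumption
print(round(p_value, 3))  # 0.264
```

Because .264 is greater than .05, the test gives no evidence of poor fit.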
Interpretation
• Model Summary gives us another piece of
information about the usefulness of the model.
The Cox & Snell R Square and the Nagelkerke R
Square values provide an indication of the
amount of variation in the dependent variable
explained by the model (from a minimum value
of 0 to a maximum of approximately 1).
• When IVs are quantitative we prefer Cox & Snell.
• When IVs are mixed we use Nagelkerke.
• These are pseudo R square statistics.
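Both pseudo R square statistics can be computed from the log-likelihoods of the Block 0 (null) model and the fitted model. The sketch below uses the standard formulas; the log-likelihood values and sample size are invented for illustration and are not taken from the SPSS output.

```python
import math


def pseudo_r_squares(ll_null, ll_model, n):
    """Return (Cox & Snell, Nagelkerke) pseudo R squares.

    ll_null  : log-likelihood of the intercept-only model
    ll_model : log-likelihood of the model with predictors
    n        : number of cases
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Cox & Snell cannot reach 1; Nagelkerke rescales by its maximum.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return cox_snell, nagelkerke


# Invented values for illustration only.
cox_snell, nagelkerke = pseudo_r_squares(ll_null=-68.0, ll_model=-55.0, n=100)
print(round(cox_snell, 3), round(nagelkerke, 3))
```

Nagelkerke is always at least as large as Cox & Snell, because it divides by a maximum that is below 1.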
Interpretation
• Classification Table provides us with an
indication of how well the model is able to
predict the correct category (sleep
problem/no sleep problem) for each case. We
can compare this with the Classification Table
shown for Block 0
• The model correctly classified 75.1 per cent of
cases overall
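A classification table can be rebuilt from predicted probabilities by applying a cut-off and counting agreements with the observed outcome. The sketch below assumes a cut-off of .5; the eight cases and their probabilities are invented, and the function name is hypothetical.

```python
def classification_table(y_true, probs, cutoff=0.5):
    """Cross-tabulate observed vs predicted category and return % correct."""
    # Keys are (observed, predicted)
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for observed, p in zip(y_true, probs):
        predicted = 1 if p >= cutoff else 0
        table[(observed, predicted)] += 1
    correct = table[(0, 0)] + table[(1, 1)]
    return table, 100.0 * correct / len(y_true)


# Invented observed outcomes and model probabilities.
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
probs = [0.2, 0.4, 0.6, 0.7, 0.3, 0.9, 0.1, 0.8]
table, pct_correct = classification_table(y_true, probs)
print(table[(0, 0)], table[(0, 1)], table[(1, 0)], table[(1, 1)], pct_correct)
# 3 1 1 3 75.0
```

The overall percentage is the figure to compare against the Block 0 baseline.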
Interpretation
• The Variables in the Equation table gives us
information about the contribution or importance of
each of our predictor variables. The test that is used
here is known as the Wald test, and you will see the
value of the statistic for each predictor in the column
labelled Wald.
• In this example, the major factors influencing whether
a person reports having a sleep problem are: difficulty
getting to sleep, trouble staying asleep and the number
of hours sleep per weeknight. Gender and age did not
contribute significantly to the model.
Interpretation
• The B values provided in the second column are
equivalent to the B values obtained in a multiple
regression analysis. These are the values that you
would use in an equation to calculate the
probability of a case falling into a specific
category.
• You should check whether your B values are
positive or negative. This will tell you about the
direction of the relationship (which factors
increase the likelihood of a yes answer and which
factors decrease it).
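To show how B values feed into a probability, the sketch below plugs coefficients into the logistic equation. The intercept and coefficients are invented for illustration (they are not the values from the Variables in the Equation table); only the signs mirror the direction of effect described in the slides.

```python
import math


def predicted_probability(intercept, coefs, values):
    """Probability of a yes (1) answer from B values and predictor scores."""
    log_odds = intercept + sum(b * x for b, x in zip(coefs, values))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Invented B values: hours of sleep (negative), trouble getting to sleep (positive)
intercept = 2.0
coefs = [-0.5, 1.2]

p_six_hours = predicted_probability(intercept, coefs, [6, 0])
p_eight_hours = predicted_probability(intercept, coefs, [8, 0])
p_eight_trouble = predicted_probability(intercept, coefs, [8, 1])
print(p_six_hours > p_eight_hours, p_eight_trouble > p_eight_hours)  # True True
```

A negative B pulls the probability down as the predictor increases; a positive B pushes it up.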
Interpretation
• In this example, the variable measuring the number of
hours slept each weeknight showed a negative B value
(–.448). This indicates that the more hours a person
sleeps per night, the less likely it is that they will report
having a sleep problem.
• For the two other significant categorical variables
(trouble getting to sleep, trouble staying asleep), the
B values are positive. This suggests that people who
say they have difficulty getting to sleep or staying
asleep are more likely to answer yes to the question
of whether they consider they have a sleep problem.
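A B value is on the log-odds scale, so exponentiating it gives the odds ratio that SPSS labels Exp(B). A short check for the hours-of-sleep coefficient reported above (–.448):

```python
import math

b_hours = -0.448                 # B value reported for hours sleep per weeknight
odds_ratio = math.exp(b_hours)   # Exp(B)
pct_change = (odds_ratio - 1.0) * 100.0

print(round(odds_ratio, 3))  # 0.639
print(round(pct_change, 1))  # -36.1
```

Read this as: each additional hour of sleep per weeknight multiplies the odds of reporting a sleep problem by about .64, a drop of roughly 36 per cent, with the other predictors held constant.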
