
PSY 512: Advanced Statistics for Psychological and Behavioral Research
Logistic Regression

When and why do we use logistic regression?
Binary
Multinomial
Theory behind logistic regression
Assessing the model
Assessing predictors
Things that can go wrong
Interpreting logistic regression

Logistic regression is used to predict a categorical outcome variable from predictor variables that are continuous and/or categorical.
It is used because having a categorical outcome variable violates the assumption of linearity in normal regression.
The only real limitation for logistic regression is that the outcome variable must be discrete.
Logistic regression deals with this problem by applying a logarithmic transformation to the outcome variable, which allows us to model a nonlinear association in a linear way.
It expresses the linear regression equation in logarithmic terms (called the logit).
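
To see why the transformation helps, here is a minimal Python sketch (illustrative only; the coefficients are made up, and the course's examples use SPSS): a straight line fitted to a 0/1 outcome can predict impossible values below 0 or above 1, while the logistic transformation of the same linear equation always yields a valid probability.

import math

# Hypothetical coefficients, for illustration only
a, b = -0.5, 0.02

# A straight line's predictions escape the [0, 1] probability range;
# the logistic transformation of the same linear equation cannot.
for x in (0, 50, 100):
    linear = a + b * x
    logistic = 1 / (1 + math.exp(-linear))
    print(f"x = {x:3d}  linear = {linear:+.2f}  logistic = {logistic:.2f}")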


Can the categories be correctly predicted given a set of predictors?
What is the relative importance of each predictor?
Are there interactions among predictors?
How good is the model at classifying cases for which the outcome is known?

Absence of multicollinearity.
No outliers.
Independence of errors: this assumes a between-subjects design. There are other forms of logistic regression if the design is within subjects.
Ratio of cases to variables: using discrete predictors requires enough responses in every given category. If there are too many cells with no responses, the model will not fit the data.

Odds vs. probability: odds of 1 to 4 in favor are equivalent to 1 out of 5, a .20 probability, a 20% chance, etc.
The problem with probabilities is that they are non-linear: going from .10 to .20 doubles the probability, but going from .80 to .90 increases it only slightly, even though both are changes of .10.
Odds: the probability of success over the probability of failure, p / (1 - p).
A probability of .20 equates to odds of .20/.80 = .25.
Logit: the natural log of the odds, ln(p / (1 - p)); often called the log odds.
The logit scale is linear and functions much like a z-score scale; logits are continuous, like z scores.
p = 0.50, then logit = 0
p = 0.70, then logit = 0.85
p = 0.30, then logit = -0.85
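
A short Python check of these values (illustrative; not part of the SPSS workflow):

import math

def logit(p):
    """Natural log of the odds, ln(p / (1 - p))."""
    return math.log(p / (1 - p))

# Unequal relative changes on the probability scale become
# equal-sized steps on the logit scale
print(f"{logit(0.20) - logit(0.10):.2f}")   # 0.81
print(f"{logit(0.90) - logit(0.80):.2f}")   # 0.81

# Reference values from above
for p in (0.50, 0.70, 0.30):
    print(f"p = {p:.2f} -> logit = {logit(p):+.2f}")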


An ogive function is a curved, S-shaped function, and the most common is the logistic function:

Y = 1 / (1 + e^-U)

where Y is the estimated probability that the ith case is in a category and U is the regular linear regression equation:

U = A + B1X1 + B2X2 + ... + BKXK
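
A minimal sketch of this function in Python (the coefficient values are borrowed from the demonstration figures below and are illustrative only):

import math

def predicted_probability(a, b, x):
    """Y = 1 / (1 + e^-U), where U is the ordinary linear regression equation."""
    u = a + sum(bk * xk for bk, xk in zip(b, x))
    return 1 / (1 + math.exp(-u))

# One predictor with A = -4.00, B1 = 0.05, evaluated at X = 60
print(predicted_probability(-4.00, [0.05], [60]))   # ~0.27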

For a response variable y with p(y = 1) = P and p(y = 0) = 1 - P.

[Figure: probability of coronary heart disease (CHD) plotted against age]

Logistic regression allows for the estimation of an equation that fits a curve to the age/probability-of-CHD relationship.

It is a regression method for dealing with the case where the dependent variable y is binary (dichotomous).


The change in probability is not constant (linear) with constant changes in X.
This means that the probability of a success (Y = 1) given the predictor variable (X) is a non-linear function, specifically a logistic function.
It is not obvious how the regression coefficients for X are related to changes in the dependent variable (Y) when the model is written this way: the change in Y (in probability units) for a given change in X depends on the value of X. Look at the S-shaped function.
The values A and B1 in the regression equation take on slightly different meanings:
A: the regression constant (moves the curve left and right)
B1: the regression slope (steepness of the curve)

[Figure: constant regression constant, different slopes.
v2: A = -4.00, B1 = 0.05
v3: A = -4.00, B1 = 0.15
v4: A = -4.00, B1 = 0.025]

[Figure: constant slopes, different regression constants.
v2: A = -3.00, B1 = 0.05
v3: A = -4.00, B1 = 0.05
v4: A = -5.00, B1 = 0.05]
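
Both demonstrations can be reproduced with a short Python/matplotlib sketch (illustrative; the X range is an assumption):

import numpy as np
import matplotlib.pyplot as plt

def logistic(x, a, b1):
    """P(Y) for a one-predictor logistic model with intercept a and slope b1."""
    return 1 / (1 + np.exp(-(a + b1 * x)))

x = np.linspace(0, 200, 400)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Same constant (A = -4.00), different slopes: steepness changes
for b1 in (0.05, 0.15, 0.025):
    ax1.plot(x, logistic(x, -4.00, b1), label=f"B1 = {b1}")
ax1.set_title("Constant A, varying B1")

# Same slope (B1 = 0.05), different constants: curve shifts left/right
for a in (-3.00, -4.00, -5.00):
    ax2.plot(x, logistic(x, a, 0.05), label=f"A = {a}")
ax2.set_title("Constant B1, varying A")

for ax in (ax1, ax2):
    ax.set_xlabel("X"); ax.set_ylabel("P(Y)"); ax.legend()
plt.show()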


The logistic regression equation can be written in terms of the odds of success.
Odds range from 0 to positive infinity.
Odds = P/Q: odds less than 1 mean a probability below .50; odds greater than 1 mean a probability above .50.
P = probability of success; Q = 1 - P = probability of failure.
Log-odds are a linear function of the predictors, so the regression coefficients go back to their old interpretation (kind of):
A: the expected value of the logit (log-odds) when X = 0
B1: called a logit difference; the amount the logit (log-odds) changes with a one-unit change in X, i.e., in going from X to X + 1
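
A small numerical check (Python, reusing the illustrative coefficients from the earlier figures): adding 1 to X adds B1 to the logit, which multiplies the odds by e^B1.

import math

A, B1 = -4.00, 0.05   # illustrative coefficients from the figures above

def odds(x):
    # Exponentiating the logit A + B1*X gives the odds
    return math.exp(A + B1 * x)

print(odds(61) / odds(60))   # 1.0513...
print(math.exp(B1))          # 1.0513... -- the same factor, e^B1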

Outcome: we predict the probability of the outcome occurring:

P(Y) = 1 / (1 + e^-(A + B1X1))

A and B1 can be thought of in much the same way as in multiple regression; note that the normal regression equation forms part of the logistic regression equation.
P(Y) is the probability of Y occurring.
e is the base of natural logarithms, a constant approximately equal to 2.718281828. The natural logarithm of a number X is the power to which e must be raised to equal X. It is very helpful for estimating the area under a curve.
A + B1X1 is the simple linear regression model: the Y-intercept (A) moves the curve left or right, and the slope (B1) influences the steepness of the curve.
With several predictors we still predict the probability of the outcome occurring. The difference is that the multiple regression equation now forms part of the logistic regression equation, and this part expands to accommodate additional predictors:

P(Y) = 1 / (1 + e^-(A + B1X1 + B2X2 + ... + BKXK))

The log-likelihood statistic is analogous to the residual sum of squares in multiple regression: it is an indicator of how much unexplained information there is after the model has been fitted. Large values indicate poorly fitting statistical models.
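
The statistic is computed from the observed outcomes and the model's fitted probabilities. A minimal Python sketch, with made-up values, of the standard formula LL = sum of [Y * ln(P) + (1 - Y) * ln(1 - P)]:

import math

def neg2_log_likelihood(y, p):
    """-2LL for observed 0/1 outcomes y and predicted probabilities p.

    Each case contributes ln(p) if the outcome occurred and ln(1 - p)
    if it did not; poorly predicted cases make -2LL larger.
    """
    ll = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
             for yi, pi in zip(y, p))
    return -2 * ll

# Hypothetical outcomes and fitted probabilities
print(neg2_log_likelihood([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.3]))   # ~1.82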


Exp(B) indicates the change in odds resulting from a unit change in the predictor.
Odds Ratio > 1: as the predictor increases, the probability of the outcome occurring increases.
Odds Ratio < 1: as the predictor increases, the probability of the outcome occurring decreases.

Simultaneous: all variables entered at the same time.
Hierarchical: variables entered in blocks; blocks should be based on past research or the theory being tested (best method).
Stepwise: variables entered on the basis of statistical criteria (i.e., relative contribution to predicting the outcome); should be used only for exploratory analysis.

Example: predictors of a treatment intervention.
Participants: 113 adults with a medical problem.
Outcome: cured (1) or not cured (0).
Predictor: intervention (1) or no treatment (0).
SPSS syntax:
compute a=intervention.
LOGISTIC REGRESSION VAR=cured
/METHOD=ENTER a
/CRITERIA PIN(.05) POUT(.10) ITERATE(20) CUT(.5).
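
For readers outside SPSS, here is a hedged sketch of the same model fitted in Python with statsmodels. The data are simulated stand-ins (the study's raw data are not reproduced here); the intervention effect is set from the odds ratio reported later (3.41), and the intercept is an assumed value.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical stand-in data (the real study had 113 adults; the
# coefficients below are chosen only to generate plausible 0/1 data)
rng = np.random.default_rng(0)
intervention = rng.integers(0, 2, size=113)
p_cured = 1 / (1 + np.exp(-(-0.3 + np.log(3.41) * intervention)))
df = pd.DataFrame({"cured": rng.binomial(1, p_cured),
                   "intervention": intervention})

X = sm.add_constant(df["intervention"])   # adds the intercept term (A)
model = sm.Logit(df["cured"], X).fit()    # maximum-likelihood logistic regression

print(model.summary())                    # B, SE, z tests, log-likelihood
print(np.exp(model.params))               # Exp(B): the odds ratios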


This tells us how SPSS has coded our outcome variable. If we used 0 and 1, then it will be the same as we used; if we used something else (e.g., 1 and 2), then SPSS will convert it to 0 and 1.

This tells us how SPSS has coded our categorical predictor variable. If we used 0 and 1, then it will be the same as we used.

This assesses model fit, with larger values corresponding to poorer-fitting models. The log-likelihood is multiplied by -2 because this gives it an approximate chi-square distribution.


The initial model involves the outcome variable without any predictors, so SPSS defaults to predicting the most likely outcome. 65 participants were cured and 48 were not cured, so it will choose cured as the default (correctly classifying 65/113 = 57.5% of cases).

This represents the Y-intercept when there are no predictors in the model.

This table presents the information for the variables that were not included in the Step 0 model.


This model includes intervention as a predictor variable. The -2 log-likelihood assesses model fit (lower values indicate better fit). The chi-square test compares the fit of this model with the Step 0 model.

This table identifies the accuracy of the predictive model when intervention was included as a predictor variable.

This is a pseudo-R2, which allows us to estimate how much of the variability in the outcome variable can be explained by the model.


This value is the unstandardized regression coefficient that represents the slope of the model. It represents the change in the logit of the outcome variable (the natural logarithm of the odds of Y occurring) associated with a one-unit change in the predictor variable.

The Wald statistic is the crucial value because it tells us whether the B coefficient is significantly different from 0. If it is, then we can assume that the predictor is making a significant contribution to the prediction of the outcome variable.

This is the odds ratio, Exp(B): the ratio of the odds of the outcome at one level of the predictor to the odds at the other. In this example, we can say that the odds of a patient who is treated being cured are 3.41 times those of a patient who is not treated.
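
A quick check of how B and Exp(B) relate here (Python; the baseline odds value is hypothetical):

import math

odds_ratio = 3.41                    # Exp(B) reported for the intervention
b = math.log(odds_ratio)             # the B coefficient on the logit scale
print(f"B = {b:.2f}")                # ~1.23

odds_untreated = 0.5                 # hypothetical baseline odds, for illustration
print(odds_untreated * odds_ratio)   # odds of being cured if treated: 1.705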


The overall fit of the final model is shown by the -2 log-likelihood statistic.
If the significance of the chi-square statistic is less than .05, then the model provides a significant fit to the data.
Check the table labelled "Variables in the Equation" to see which variables significantly predict the outcome.
Use the Wald statistic or the odds ratio, Exp(B), for interpretation:
If the odds ratio > 1, then as the predictor increases, the odds of the outcome occurring increase.
If the odds ratio < 1, then as the predictor increases, the odds of the outcome occurring decrease.

Multinomial logistic regression predicts membership of more than two categories.
It (basically) works in the same way as binary logistic regression: the analysis breaks the outcome variable down into a series of comparisons between two categories.
Example: if you have three outcome categories (A, B, and C), then the analysis will consist of two comparisons that you choose:
compare everything against your first category (e.g., A vs. B and A vs. C),
or your last category (e.g., A vs. C and B vs. C),
or a custom category (e.g., B vs. A and B vs. C).
The important parts of the analysis and output are much the same as we have just seen for binary logistic regression.
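
A hedged sketch of the same idea in Python (statsmodels' MNLogit, on made-up data): with three outcome categories, the model estimates two sets of coefficients, one per comparison against the reference category.

import numpy as np
import statsmodels.api as sm

# Made-up data: a continuous predictor and an outcome with three categories
rng = np.random.default_rng(1)
x = rng.normal(size=300)
outcome = rng.integers(0, 3, size=300)   # 0 = A, 1 = B, 2 = C

X = sm.add_constant(x)
model = sm.MNLogit(outcome, X).fit()

# Two coefficient columns: B vs. A and C vs. A (category 0 is the reference)
print(model.params)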
