Logistic Regression
Linear regression: the independent variables could be correlated with each other (especially in multiple linear regression).
Logistic regression: the independent variables should not be correlated with each other (no multicollinearity should exist).
Logistic regression
When we are trying to predict membership of only two categorical outcomes the
analysis is known as binary logistic regression, but when we want to predict
membership of more than two categories we use multinomial (or polychotomous)
logistic regression.
Categorical Response Variables
Examples:
▪ Whether or not a person smokes (a binary response): Y = Non-smoker or Smoker
▪ Success of a medical treatment: Y = Survives or Dies
In simple linear regression, the outcome is modelled as Y = b0 + b1X1 + ε, where
b0 = the Y intercept
b1 = the gradient of the straight line
X1 = the value of the predictor variable
ε = the residual term
Expressing the multiple linear regression equation in logarithmic terms (called the logit) overcomes the problem of violating the assumption of linearity.
Assumptions
1. Linearity: the assumption is that there is a linear relationship between any continuous predictors and the logit of the outcome variable.
2. Independence of errors: cases of data should not be related; for example, the same people cannot be measured at different points in time. Violating this assumption produces overdispersion.
One way around this problem is to transform the data using the logarithmic
transformation. This transformation is a way of expressing a non-linear
relationship in a linear way. The logistic regression equation described above is
based on this principle: it expresses the multiple linear regression equation in
logarithmic terms (called the logit) and thus overcomes the problem of violating
the assumption of linearity.
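To make this concrete, here is a minimal Python sketch (the function name is my own) showing how the logit transform re-expresses a bounded probability on an unbounded scale that a linear equation can model:

```python
import math

def logit(p):
    """Log-odds transform: log(p / (1 - p)) maps (0, 1) onto the real line."""
    return math.log(p / (1 - p))

# Equal steps in probability are stretched near 0 and 1, which is why a
# linear equation in the predictors can model the logit but not p itself.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}  logit(p) = {logit(p):+.3f}")
```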
Why use logistic regression?
▪ There are many important research topics for which the dependent
variable is "limited."
Suppose you shot 100 free throws and made 70. Based on this sample, the
probability of making a free throw is 70%. The odds of making a free throw can
be calculated as:
Odds = 0.70 / (1 − 0.70) = 2.333
More crucial to the interpretation of logistic regression is the value of the odds ratio
(Exp(B) in the SPSS output), which is an indicator of the change in odds resulting from a
unit change in the predictor.
The odds of an event occurring are defined as the probability of an event occurring
divided by the probability of that event not occurring.
Odds
Definition:
odds = P / (1 − P) = P(Yes) / P(No), the odds of Yes.
Conversely, a probability can be recovered from the odds: P = odds / (1 + odds).
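A two-line Python sketch of these conversions (the function names are illustrative):

```python
def prob_to_odds(p):
    """odds = P / (1 - P)"""
    return p / (1 - p)

def odds_to_prob(odds):
    """P = odds / (1 + odds)"""
    return odds / (1 + odds)

print(prob_to_odds(0.70))   # 2.333..., matching the free-throw example above
print(odds_to_prob(7 / 3))  # 0.7, recovering the original probability
```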
Odds
Logit form of the model:
log( p / (1 − p) ) = b0 + b1X
odds = p / (1 − p) = e^(b0 + b1X)
Odds ratio
The odds of an event happening are defined as the likelihood that the event will occur, expressed as a proportion of the likelihood that the event will not occur. Therefore, if A is the probability of subjects affected and B is the probability of subjects not affected, then odds = A / B.
OR = 1 indicates no effect
OR >1 indicates increased occurrence of event (Risk factor)
OR <1 indicates decreased occurrence of event (protective exposure)
Calculating Odds Ratio
OR = (a / b) / (c / d) = (a × d) / (b × c)
where
a = number of exposed cases
b = number of exposed non-cases
c = number of unexposed cases
d = number of unexposed non-cases
Ill people: people who ate ice cream / people who did not = 13/17
People who are not ill: people who ate ice cream / people who did not = 32/23
OR = (13/17) / (32/23) = (13 × 23) / (17 × 32) ≈ 0.55
An odds ratio of exactly 1 means that exposure to property A does not affect the odds
of property B.
An odds ratio of more than 1 means that there is a higher odds of property B
happening with exposure to property A.
An odds ratio of less than 1 means there are lower odds of property B happening with exposure to property A; in the ice cream example, OR ≈ 0.55, so eating ice cream was associated with lower odds of illness.
Example: TMS for Migraines
Transcranial Magnetic Stimulation vs. Placebo
Pain Free? TMS Placebo
YES 39 22
NO 61 78
Total 100 100
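Applying the a, b, c, d layout above with TMS as the exposure and "pain free" as the event, a quick Python check (the helper name is my own):

```python
def odds_ratio(a, b, c, d):
    """OR = (a/b) / (c/d) = (a*d) / (b*c) for a 2x2 exposure-outcome table."""
    return (a * d) / (b * c)

# a = 39 (TMS, pain free), b = 61 (TMS, not pain free),
# c = 22 (placebo, pain free), d = 78 (placebo, not pain free)
print(round(odds_ratio(39, 61, 22, 78), 2))  # 2.27
```

So the odds of being pain free are roughly 2.3 times higher with TMS than with placebo.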
In the study, 186 of the 263 adolescents previously judged as having experienced a
suicidal behavior requiring immediate psychiatric consultation did not exhibit
suicidal behavior (non-suicidal, NS) at six months follow-up. Of this group, 86 young
people had been assessed as having depression at baseline. Of the 77 young people
with persistent suicidal behavior at follow-up (suicidal behavior, SB), 45 had been
assessed as having depression at baseline.
First we determine the numbers to use for (a), (b), (c), (d):
a = exposed cases (baseline depression, SB at follow-up) = 45
b = exposed non-cases (baseline depression, NS at follow-up) = 86
c = unexposed cases (no baseline depression, SB) = 77 − 45 = 32
d = unexposed non-cases (no baseline depression, NS) = 186 − 86 = 100
OR = (a × d) / (b × c) = (45 × 100) / (86 × 32) ≈ 1.63
Thus, the odds of persistent suicidal behaviour are 1.63 times higher given a baseline depression diagnosis than with no baseline depression.
Log-likelihood
The logistic regression model predicts the probability of an event occurring for a
given person (we would denote this as P(Yi) the probability that Y occurs for the ith
person), based on observations of whether or not the event did occur for that person
(we could denote this as Yi, the actual outcome for the ith person). For a given person,
Y will be either 0 (the outcome didn’t occur) or 1 (the outcome did occur), and the
predicted value, P(Y), will be a value between 0 (there is no chance that the outcome
will occur) and 1 (the outcome will certainly occur).
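The slides do not print the formula itself; the standard binary log-likelihood built from these quantities is:

log-likelihood = Σ [ Yi × ln(P(Yi)) + (1 − Yi) × ln(1 − P(Yi)) ]

Each case contributes the log of the probability the model assigned to its actual outcome, so larger (less negative) values indicate a better-fitting model.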
Observed odds of deciding to continue the research = 128 / 187 = 0.684
Block 1
Exp(B) = odds(continue | men) / odds(continue | women) = 1.44 / 0.429 ≈ 3.35
The Exp(B) or odds ratio tells us that the model predicts that the odds of deciding to
continue the research are 3.35 times higher for men than they are for women. For
the men, the odds are 1.44, and for the women they are 0.429.
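As a sanity check, both odds follow directly from the predicted probabilities reported below (0.59 for men, 0.30 for women); a small Python sketch:

```python
p_men, p_women = 0.59, 0.30            # model-predicted P(continue)

odds_men = p_men / (1 - p_men)         # ~1.44
odds_women = p_women / (1 - p_women)   # ~0.429
print(round(odds_men / odds_women, 2))
# 3.36 with these rounded probabilities; SPSS reports Exp(B) = 3.35
# because it works from the unrounded model estimates.
```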
Classification table
The results of our logistic regression can be used to classify subjects with respect to
what decision we think they will make. The model leads to the prediction that the
probability of deciding to continue the research is 30% for women and 59% for men.
Before we can use this information to classify subjects, we need to have a decision
rule. Our decision rule will take the following form: If the probability of the event is
greater than or equal to some threshold, we shall predict that the event will take
place. By default, SPSS sets this threshold to .5. While that seems reasonable, in
many cases we may want to set it higher or lower than .5. Using the default
threshold, SPSS will classify a subject into the “Continue the Research” category if
the estimated probability is .5 or more, which it is for every male subject. SPSS will
classify a subject into the “Stop the Research” category if the estimated probability is
less than .5, which it is for every female subject.
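In code, the decision rule is nothing more than a probability cutoff; a minimal sketch (threshold 0.5, matching the SPSS default):

```python
def classify(p_event, threshold=0.5):
    """Predict the event when its estimated probability meets the threshold."""
    if p_event >= threshold:
        return "Continue the Research"
    return "Stop the Research"

print(classify(0.59))  # men   -> Continue the Research
print(classify(0.30))  # women -> Stop the Research
```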
Two terms:
Test sensitivity is the ability of a test to correctly identify those with the disease (true
positive rate), whereas test specificity is the ability of the test to correctly identify
those without the disease (true negative rate).
The Classification Table shows us that this rule allows us to correctly classify 68 / 128 = 53% of the
subjects where the predicted event (deciding to continue the research) was observed. This is
known as the sensitivity of prediction, the P(correct | event did occur), that is, the percentage of
occurrences correctly predicted. We also see that this rule allows us to correctly classify 140 / 187
= 75% of the subjects where the predicted event was not observed. This is known as the
specificity of prediction, the P(correct | event did not occur), that is, the percentage of
nonoccurrences correctly predicted. Overall our predictions were correct 208 out of 315 times,
for an overall success rate of 66%. Recall that it was only 59% for the model with intercept only.
We could focus on error rates in classification. A false positive would be predicting that the event
would occur when, in fact, it did not. Our decision rule predicted a decision to continue the
research 115 times. That prediction was wrong 47 times, for a false positive rate of 47 / 115 =
41%. A false negative would be predicting that the event would not occur when, in fact, it did
occur. Our decision rule predicted a decision not to continue the research 200 times. That
prediction was wrong 60 times, for a false negative rate of 60 / 200 = 30%.
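All of these rates can be reproduced from the four cells of the classification table; a short Python check using the counts quoted above:

```python
tp, fn = 68, 60    # event occurred: correctly / incorrectly predicted
tn, fp = 140, 47   # event did not occur: correctly / incorrectly predicted

sensitivity = tp / (tp + fn)          # 68 / 128  = 0.53
specificity = tn / (tn + fp)          # 140 / 187 = 0.75
false_positive_rate = fp / (tp + fp)  # 47 / 115  = 0.41 (as defined in the text)
false_negative_rate = fn / (fn + tn)  # 60 / 200  = 0.30 (as defined in the text)
overall_accuracy = (tp + tn) / 315    # 208 / 315 = 0.66

print(sensitivity, specificity, false_positive_rate,
      false_negative_rate, overall_accuracy)
```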
Example
The first table above gives the overall test for the model that includes the predictors.
The chi-square value of 41.46 with a p-value of less than 0.0005 tells us that our
model as a whole fits significantly better than an empty model (i.e., a model with no
predictors).
The -2*log likelihood (458.517) in the Model Summary table can be used in
comparisons of nested models.
In the table labeled Variables in the Equation we see the coefficients, their standard
errors, the Wald test statistic with associated degrees of freedom and p-values, and the
exponential coefficient (also known as an odds ratio). Both gre and gpa are statistically
significant. The overall (i.e., multiple degree of freedom) test for rank is given first,
followed by the terms for rank=1, rank=2, and rank=3. The overall effect of rank is
statistically significant, as are the terms for rank=1 and rank=2. The logistic regression
coefficients give the change in the log odds of the outcome for a one unit increase in
the predictor variable.
For every one unit change in gre, the log odds of admission (versus non-admission)
increases by 0.002.
For a one unit increase in gpa, the log odds of being admitted to graduate school
increases by 0.804.
The indicator variables for rank have a slightly different interpretation. For example,
having attended an undergraduate institution with rank of 1, versus an institution with
a rank of 4, increases the log odds of admission by 1.551.
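The odds ratios SPSS reports as Exp(B) are simply the exponentials of these log-odds coefficients; a quick check in Python (coefficients taken from the text above):

```python
import math

for name, b in [("gre", 0.002), ("gpa", 0.804), ("rank=1 vs rank=4", 1.551)]:
    # Exp(B): multiplicative change in the odds for a one-unit increase
    print(f"{name}: Exp(B) = {math.exp(b):.3f}")
# gre: 1.002, gpa: 2.234, rank=1 vs rank=4: 4.716
```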
Building Model: What is the equation of the binary logistic regression model
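The slides break off here; the standard form of the model, consistent with the logit equation given earlier, is

P(Y) = 1 / (1 + e^−(b0 + b1X1 + b2X2 + … + bnXn))

or equivalently, log( P(Y) / (1 − P(Y)) ) = b0 + b1X1 + b2X2 + … + bnXn.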