Logistic Regression Playbook
1. Theory
2. Example
3. Interpretation
Logistic regression is used when the dependent variable has only two possible values.
For example: whether a person buys or does not buy a particular product, or whether a disease is present or not.
We could study the influence of age, gender and smoking status on that particular disease.
We could now investigate what influence the independent variables have on the disease. If there is an influence, then we can predict how likely a person is to have a certain disease.
[Figure: the dependent variable y, which takes only the values 0 and 1, plotted against x]
No matter which value we have for the independent variables, only 0 or 1 results.
With a linear regression, however, values between minus and plus infinity can occur.
The goal of logistic regression, though, is to estimate the probability of occurrence. The value range for the prediction should therefore be between 0 and 1.
[Figure: the logistic function, y plotted against x from -∞ to +∞, with y = 1/2 at x = 0 and y bounded by 0 and 1]
No matter where we are on the x-axis, between minus and plus infinity only values between 0 and 1 result.
And that is exactly what we want!
The logistic function is now used by the logistic regression.
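As a quick numerical check of this value range, here is a minimal sketch. It assumes the standard logistic function 1 / (1 + e^(-z)); the helper name `logistic` is chosen here for illustration only.

```python
import math

def logistic(z: float) -> float:
    """Standard logistic function; maps any real z into the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form to avoid overflow in exp().
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Inputs far to the left give values near 0, inputs far to the right give values near 1.
for z in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"z = {z:6.1f} -> {logistic(z):.4f}")
```

At z = 0 the function returns exactly 1/2, which matches the midpoint of the curve.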
For z, the input of the logistic function, the equation of the linear
regression is now simply inserted.
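Written out, this step could look as follows. The sketch assumes the standard form of the logistic function; the symbols b_0, ..., b_k (coefficients) and x_1, ..., x_k (independent variables) are notation introduced here, not taken from the slides.

```latex
\[
f(z) = \frac{1}{1 + e^{-z}},
\qquad
z = b_0 + b_1 x_1 + \dots + b_k x_k
\]
\[
P(y = 1 \mid x_1, \dots, x_k)
  = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + \dots + b_k x_k)}}
\]
```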
In our example, the probability of having a certain disease is modeled as a function of age, gender and smoking status.
If you like, you can download the example dataset for free.
The first thing that is displayed is the results table. In the results
table you can see that a total of 36 people were examined.
Note:
To decide whether a person is classified as diseased or not, a
threshold of 50% is used.
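The threshold rule can be sketched in a few lines. The function name `classify` and the example probabilities are hypothetical, and treating a probability of exactly 50% as "diseased" is an assumption made here.

```python
def classify(probability: float, threshold: float = 0.5) -> int:
    """Return 1 (diseased) if the predicted probability reaches the threshold, else 0."""
    return 1 if probability >= threshold else 0

# Hypothetical predicted probabilities, for illustration only.
for p in (0.12, 0.49, 0.50, 0.69):
    label = "diseased" if classify(p) == 1 else "not diseased"
    print(f"{p:.2f} -> {label}")
```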
[Figure: the logistic function from -∞ to +∞ with the 50% threshold marked]
Two models are compared here: in one model the independent variables are used,
and in the other model the independent variables are not used.
With the help of the Chi2 test we compare how good the prediction is
when the independent variables are used and how good it is when they
are not used. The Chi2 test “tells us” whether there is a
significant difference between these two results.
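The comparison of the two models can be sketched as follows, assuming the Chi2 test here is the usual likelihood-ratio test: the test statistic is the drop in -2 log likelihood between the model without and the model with independent variables, and the degrees of freedom equal the number of independent variables. The -2 log likelihood values are hypothetical, and `chi2_sf` is a small helper written for this sketch using only the standard library.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_j x^(j - 1/2) / (1*3*...*(2j-1))
    total, term = 0.0, math.sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total

def likelihood_ratio_test(neg2ll_null: float, neg2ll_full: float, n_predictors: int):
    """Compare the model without predictors against the model with predictors."""
    chi2 = neg2ll_null - neg2ll_full  # drop in -2 log likelihood
    return chi2, chi2_sf(chi2, n_predictors)

# Hypothetical -2 log likelihood values, not taken from the results table.
chi2, p_value = likelihood_ratio_test(neg2ll_null=49.9, neg2ll_full=42.0, n_predictors=3)
print(f"Chi2 = {chi2:.2f}, df = 3, p = {p_value:.3f}")
```

A small p-value suggests that the model with the independent variables predicts significantly better than the model without them.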
In this table we see on the one hand the -2 log likelihood value and on the
other hand we are given different coefficients of determination R2.
R2 is used to find out how well the regression model explains the
dependent variable. In a linear regression, the R2 indicates the
proportion of the variance that can be explained by the independent
variables. The more variance can be explained, the better the regression
model.
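For logistic regression, such R2 values are usually computed from the log likelihoods rather than from explained variance. The sketch below assumes three commonly reported variants (McFadden, Cox and Snell, Nagelkerke); which ones the results table actually shows is not stated here. The -2 log likelihood values are hypothetical, and n = 36 is the number of people examined.

```python
import math

def pseudo_r2(neg2ll_null: float, neg2ll_full: float, n: int) -> dict:
    """Pseudo-R2 measures from the -2 log likelihoods of the null and the full model."""
    ll_null = -neg2ll_null / 2.0
    ll_full = -neg2ll_full / 2.0
    mcfadden = 1.0 - ll_full / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    # Nagelkerke rescales Cox and Snell so that the maximum possible value is 1.
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    return {"McFadden": mcfadden, "Cox & Snell": cox_snell, "Nagelkerke": nagelkerke}

# Hypothetical -2 log likelihood values, not taken from the results table.
r2 = pseudo_r2(neg2ll_null=49.9, neg2ll_full=42.0, n=36)
for name, value in r2.items():
    print(f"{name}: {value:.3f}")
```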
Coefficients B
In the first column we can read the calculated
coefficients from our model.
Example:
We want to know how likely it is that a person who is 55 years old,
female, and a smoker is diseased.
We insert:
55 for the age
0, because the person is female
and 1, as the person is a smoker.
This gives us 0.69 or 69%.
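The calculation can be sketched like this. The coefficient values are illustrative assumptions, not read from the results table; they were picked so that the result is in line with the 69% quoted above. The names `COEFFICIENTS` and `predict_probability` are hypothetical, and gender is coded as 0 = female, 1 = male.

```python
import math

# Illustrative coefficients (assumptions, not read from the results table).
COEFFICIENTS = {"constant": -2.73, "age": 0.04, "gender_male": 0.87, "smoker": 1.34}

def predict_probability(age: float, male: int, smoker: int, b: dict = COEFFICIENTS) -> float:
    """Insert the linear equation into the logistic function to get a probability."""
    z = b["constant"] + b["age"] * age + b["gender_male"] * male + b["smoker"] * smoker
    return 1.0 / (1.0 + math.exp(-z))

# 55 years old, female (male = 0), smoker (smoker = 1)
p = predict_probability(age=55, male=0, smoker=1)
print(f"{p:.2f}")  # about 0.69 with the illustrative coefficients above
```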
…
But now back to the table!
Odds ratio
In this column we can then read the odds ratio.
For example, the odds ratio of 1.04 means that a one unit increase in the
variable age increases the odds that a person is sick by a factor of 1.04.
Note that this refers to the odds p / (1 - p), not to the probability itself.
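To see what the odds ratio actually multiplies, here is a minimal sketch. It uses the general relation odds ratio = exp(B); the coefficient 0.04 for age and the starting value of z are illustrative assumptions, not values read from the table.

```python
import math

def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def odds(p: float) -> float:
    """Odds that correspond to a probability p."""
    return p / (1.0 - p)

b_age = 0.04                     # illustrative coefficient (assumption)
odds_ratio = math.exp(b_age)     # the odds ratio is exp(B)

z = 0.81                         # hypothetical linear predictor for some person
p_now = logistic(z)
p_older = logistic(z + b_age)    # same person, one year older

print(f"odds ratio exp(B):    {odds_ratio:.2f}")
print(f"ratio of odds:        {odds(p_older) / odds(p_now):.2f}")
print(f"ratio of probability: {p_older / p_now:.2f}")
```

The ratio of the odds equals exp(B) regardless of the starting point, while the ratio of the probabilities is smaller and depends on where on the curve the person starts.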
If you liked this Playbook
feel free to share it!
Of course we are also happy if you
visit us on datatab.net.