
Logistic Regression Playbook

1. Theory
2. Example
3. Interpretation

Author: Dr. Mathias Jesussek
©DATAtab e.U. | Graz | 2023
What is a regression?

A regression analysis is a method for modeling relationships between variables. It makes it possible to infer or predict a variable based on one or more other variables. The variable we want to infer or predict is called the dependent variable or criterion.

What is the difference between a linear regression and a logistic regression?

In a linear regression, the dependent variable is a metric variable, e.g. salary or electricity consumption. In a logistic regression, the dependent variable is a dichotomous variable.
What is a dichotomous variable?

Dichotomous variables are variables with only two values. For example: whether a person buys or does not buy a particular product, or whether a disease is present or not.

The variables we use for prediction are called independent variables or predictors.
How can logistic regression be used?

With the help of logistic regression, we can determine what has an influence on whether a certain disease is present or not. We could study the influence of age, gender and smoking status on that particular disease. Our data set might look like this:

Age | Gender | Smoker status | Disease
22  | female | Non-smoker    | 1
25  | female | Smoker        | 1
18  | male   | Smoker        | 0
45  | male   | Non-smoker    | 0
12  | female | Smoker        | 0
43  | male   | Smoker        | 1
23  | male   | Smoker        | 0
33  | male   | Smoker        | 1
…   | …      | …             | …

Here, age, gender and smoker status are the independent variables, and disease is the dependent variable, coded with 0 and 1. In this case 0 stands for not diseased and 1 for diseased.

We could now investigate what influence the independent variables have on the disease. If there is an influence, then we can predict how likely a person is to have a certain disease.
Now, of course, the question arises: why do we need logistic regression in this case? Why can't we just use linear regression?

A quick recap: in linear regression, this is our regression equation:

y = b0 + b1·x1 + b2·x2 + … + bk·xk

We have the dependent variable y, the independent variables x1 to xk, and the regression coefficients b0 to bk.

However, we now have a dependent variable that is either 0 or 1. No matter which value we have for the independent variables, only 0 or 1 results. A linear regression would now simply put a straight line through the points, and in the case of linear regression, values between plus and minus infinity can occur.

However, the goal of logistic regression is to estimate the probability of occurrence, i.e. the probability for the occurrence of the characteristic 1 (= characteristic present). The value range for the prediction should therefore be between 0 and 1.

So we need a function that only takes values between 0 and 1, and that is exactly what the logistic function does: no matter where we are on the x-axis, between minus and plus infinity, only values between 0 and 1 result. And that is exactly what we want!

The equation for the logistic function looks like this:

f(z) = 1 / (1 + e^(−z))

The logistic function is now used by the logistic regression: for z, the equation of the linear regression is simply inserted. Thus, the probability that the dependent variable is 1 is given by:

P(y = 1) = 1 / (1 + e^−(b0 + b1·x1 + … + bk·xk))

What does this look like for our example? In our example, the probability of having a certain disease is a function of age, gender and smoking status.
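As a small sketch of this transformation in Python (the coefficient values below are made-up placeholders, not estimates from the example data):

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(age, gender, smoker, b0=-2.0, b1=0.05, b2=0.5, b3=1.0):
    """P(disease = 1) with z = b0 + b1*age + b2*gender + b3*smoker.

    The coefficients b0..b3 are purely illustrative placeholders."""
    z = b0 + b1 * age + b2 * gender + b3 * smoker
    return logistic(z)

print(logistic(0))                 # 0.5: the curve crosses 1/2 at z = 0
print(predict_probability(55, 0, 1))
```

Whatever values the linear part z takes, the output always stays strictly between 0 and 1, which is exactly the property motivated above.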



Now we need to determine the coefficients so that our model best represents the given data. To solve this problem, the so-called maximum likelihood method is used. For this purpose, there are good numerical methods that can solve the problem efficiently.
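The maximum likelihood idea can be sketched with a toy gradient-ascent fit; this is only a stand-in for the efficient numerical optimizers mentioned above, and the data is made up:

```python
import math

def fit_logistic(X, y, lr=0.1, steps=5000):
    """Fit coefficients by maximizing the log-likelihood with plain
    gradient ascent (a toy stand-in for production optimizers)."""
    n_features = len(X[0])
    b = [0.0] * (n_features + 1)           # intercept + one weight per feature
    for _ in range(steps):
        grad = [0.0] * len(b)
        for xi, yi in zip(X, y):
            z = b[0] + sum(bj * xj for bj, xj in zip(b[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = yi - p                    # gradient of the log-likelihood
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        b = [bj + lr * g / len(y) for bj, g in zip(b, grad)]
    return b

# Tiny made-up data: one feature that clearly separates the classes.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
b = fit_logistic(X, y)
p_low  = 1.0 / (1.0 + math.exp(-(b[0] + b[1] * 0.0)))
p_high = 1.0 / (1.0 + math.exp(-(b[0] + b[1] * 5.0)))
print(p_low, p_high)   # low probability at x = 0, high at x = 5
```

The fit pushes the coefficients so that observed 1s get high predicted probabilities and observed 0s get low ones, which is exactly what "best represents the given data" means here.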
But how do you interpret the results of a logistic regression?

Let's take a look at this fictitious example:

Age | Gender | Smoker status | Disease
22  | female | Non-smoker    | 1
25  | female | Smoker        | 1
18  | male   | Smoker        | 0
45  | male   | Non-smoker    | 0
12  | female | Smoker        | 0
43  | male   | Smoker        | 1
23  | male   | Smoker        | 0
33  | male   | Smoker        | 1
…   | …      | …             | …

If you like, you can download the example dataset for free and follow the steps in parallel. Please just use this link, or load it from the logistic regression tutorial. When you use the link, the data is automatically loaded.

We want to calculate a logistic regression, so we just click on Regression. When we copy our data in, the variables show up below.

Depending on how your dependent variable is scaled, DATAtab will calculate either a logistic or a linear regression under the tab Regression. We choose disease as the dependent variable and age, gender, and smoking status as the independent variables. DATAtab now calculates a logistic regression for us.
If you don't know how to interpret the results, you can click on the corresponding option in DATAtab. We will now go through all the tables slowly and understandably. Let's start at the top.

The first thing that is displayed is the results table. In the results table you can see that a total of 36 people were examined. With the help of the calculated regression model, 26 of 36 persons could be correctly assigned. That is 72.22%!

Then comes the classification table. Here you can see how often the categories not diseased and diseased were observed, and how often they were predicted.

In total, "not diseased" was observed 16 times. Of these 16 individuals, the regression model correctly scored 11 as not diseased and incorrectly scored 5 as diseased. Of the 20 diseased individuals, 15 were correctly scored as diseased and 5 were incorrectly scored as not diseased.

To be noted: for deciding whether a person is diseased or not, a threshold of 50% is used. If the regression model estimates a probability greater than 50%, the person is assigned "diseased", otherwise "not diseased".
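A classification table like this one can be sketched as follows; the observed outcomes and predicted probabilities below are made up for illustration and do not reproduce the 36-person example:

```python
def classification_table(observed, probs, threshold=0.5):
    """Cross-tabulate observed 0/1 outcomes against predictions made
    with the given probability threshold. Keys are (observed, predicted)."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for obs, p in zip(observed, probs):
        pred = 1 if p > threshold else 0
        table[(obs, pred)] += 1
    return table

# Made-up observations and predicted probabilities.
observed = [0, 0, 0, 1, 1, 1]
probs    = [0.2, 0.4, 0.7, 0.9, 0.6, 0.3]
t = classification_table(observed, probs)
correct = t[(0, 0)] + t[(1, 1)]
print(t)
print(f"accuracy: {correct / len(observed):.2%}")
```

The diagonal cells (0, 0) and (1, 1) are the correctly assigned persons; their share of the total is the percentage reported in the results table.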
Now comes the Chi² test. Here we can read whether the model as a whole is significant or not.

Two models are compared for this purpose: in one model all independent variables are used, and in the other model the independent variables are not used. With the help of the Chi² test we compare how good the prediction is when the independent variables are used and how good it is when they are not used, and the Chi² test "tells us" if there is a significant difference between these two results.

The null hypothesis is that both models are the same. If the p-value is less than 0.05, this null hypothesis is rejected. In our example, the p-value is less than 0.05 and we assume that there is a significant difference between the models. Thus, the model as a whole is significant.
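A minimal sketch of this model comparison, using hypothetical predicted probabilities for six people (computing the p-value itself would additionally require the Chi² distribution, e.g. `scipy.stats.chi2.sf`):

```python
import math

def log_likelihood(y, probs):
    """Log-likelihood of observed 0/1 outcomes under predicted probabilities."""
    return sum(math.log(p) if yi == 1 else math.log(1.0 - p)
               for yi, p in zip(y, probs))

y = [0, 0, 0, 1, 1, 1]

# Null model: every person just gets the overall disease rate (3/6 = 0.5).
ll_null = log_likelihood(y, [0.5] * len(y))

# Full model: hypothetical fitted probabilities for the same six people.
probs_full = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ll_full = log_likelihood(y, probs_full)

# The Chi^2 statistic is the drop in -2 log-likelihood between the models.
chi2 = -2.0 * (ll_null - ll_full)
print(chi2)
```

The better the independent variables predict the outcome, the larger this statistic gets, and the smaller the resulting p-value.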
Next comes the model summary. In this table we see on the one hand the -2 log-likelihood value, and on the other hand we are given different coefficients of determination R².

R² is used to find out how well the regression model explains the dependent variable. In a linear regression, the R² indicates the proportion of the variance that can be explained by the independent variables. The more variance can be explained, the better the regression model.

In the case of logistic regression, however, the meaning is different and there are different ways to calculate the R². Unfortunately, there is also no agreement yet on which way is the "best" way. DATAtab gives you the R² according to Cox and Snell, according to Nagelkerke, and according to McFadden.
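The three pseudo-R² variants can be sketched from the log-likelihoods of the null and full model; the numbers below are hypothetical, not taken from the example output:

```python
import math

# Hypothetical log-likelihoods (LL) of the two models, for n = 6 observations.
n = 6
ll_null = -4.158883   # intercept-only model
ll_full = -1.370349   # model with all independent variables

# McFadden: share of the null log-likelihood "explained" by the predictors.
r2_mcfadden = 1.0 - ll_full / ll_null

# Cox & Snell: based on the likelihood ratio, but cannot reach 1.
r2_cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)

# Nagelkerke: Cox & Snell rescaled so the maximum possible value is 1.
r2_nagelkerke = r2_cox_snell / (1.0 - math.exp(2.0 * ll_null / n))

print(r2_mcfadden, r2_cox_snell, r2_nagelkerke)
```

All three move toward 1 as the full model fits better than the intercept-only model, but their numeric values differ, which is why DATAtab reports them side by side.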
And now comes the most important table: the table with the model coefficients. The most important parameters are the coefficient B, the p-value and the odds ratio.

Coefficients B
In the first column we can read the calculated coefficients from our model. We can insert these into the regression equation; doing so gives us a concrete equation with which we can calculate the probability that a person is diseased.

Example:
We want to know how likely a person who is 55 years old, female, and a smoker is to be diseased. We insert: 55 for the age, 0 because the person is female, and 1 because the person is a smoker. This gives us 0.69, or 69%. Thus, it is 69% likely that a 55-year-old female smoker is diseased.

Based on this prediction, it could now be decided whether to do another extensive investigation. The example is purely fictitious; in reality, there would certainly be many other and different independent variables.
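The worked calculation above can be sketched as follows; since the fitted coefficients from the example are not reproduced here, the values of b0 to b3 are placeholders, and the resulting probability will not match the 69% from the text:

```python
import math

# Placeholder coefficients -- NOT the ones fitted in the example.
b0, b_age, b_gender, b_smoker = -3.0, 0.05, 0.4, 0.8

# 55 years old, female (gender = 0), smoker (smoker = 1):
z = b0 + b_age * 55 + b_gender * 0 + b_smoker * 1
p = 1.0 / (1.0 + math.exp(-z))
print(f"P(diseased) = {p:.2f}")
```

With the real coefficients from the output table inserted for b0 to b3, this computation reproduces the predicted probability DATAtab reports.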


But now back to the table!

p-value
In this column we can read whether the coefficient is significantly different from zero. The following null hypothesis is tested: the coefficient is zero in the population. So, if the p-value is smaller than 0.05, the respective coefficient has a significant influence. In our example, we see that none of the coefficients have a significant impact, as all p-values are greater than 0.05.

Odds ratio
In this column we can read the odds ratio. For example, an odds ratio of 1.04 for age means that a one-unit increase in age increases the odds that a person is diseased by a factor of 1.04.
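The odds ratio is just e raised to the coefficient B, as this small sketch shows (the coefficient value is hypothetical, chosen to match the 1.04 from the text):

```python
import math

b_age = 0.0392                 # hypothetical coefficient B for age
odds_ratio = math.exp(b_age)   # odds ratio = e^B
print(round(odds_ratio, 2))

# Conversely, an odds ratio of 1.04 corresponds to B = ln(1.04).
print(round(math.log(1.04), 4))
```

A coefficient B of 0 therefore corresponds to an odds ratio of exactly 1, i.e. no influence, which is why the p-value column tests whether B differs from zero.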
If you liked this Playbook, feel free to share it! Of course, we are also happy if you visit us on datatab.net.
