Logistic Regression

Binary logistic regression is used when the response variable is dichotomous (binary). It can be used for classification and to estimate the relationship between predictor variables and the likelihood of the response variable being 1 or 0. Key aspects include using the logit link function to transform the probability into odds, interpreting coefficients as log odds ratios, and assessing model fit using -2 log likelihood and pseudo R-squared values.

Uploaded by

tsandrasanal
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views

Logistic Regression

Binary logistic regression is used when the response variable is dichotomous (binary). It can be used for classification and to estimate the relationship between predictor variables and the likelihood of the response variable being 1 or 0. Key aspects include using the logit link function to transform the probability into odds, interpreting coefficients as log odds ratios, and assessing model fit using -2 log likelihood and pseudo R-squared values.

Uploaded by

tsandrasanal
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 25

Binary Logistic Regression

• Logistic regression, more commonly called logit regression, is used when the response variable is dichotomous (i.e., binary or 0-1). The predictor variables may be quantitative, categorical, or a mixture of the two.
• Logistic regression does not require the strict assumptions of multivariate normality and equal variance-covariance matrices across groups. Even when those assumptions are met, many researchers prefer logistic regression because it is more similar to multiple regression.
Assumptions
1. The outcome variable is binary. If the dependent variable has three or more outcomes, multinomial or ordinal logistic regression should be used instead.
2. The observations must be independent of each other.
3. The logistic model assumes a linear relationship between the logit and the independent variables. The Box-Tidwell test is used to check this assumption.
4. Absence of multicollinearity among the independent variables.
Objectives
1. Explanation: providing estimates of the ability of a set of independent variables, collectively and individually, to distinguish between the two outcome groups.
2. Classification: providing a means for classifying cases into the outcome groups, along with a range of diagnostic measures of predictive accuracy.
Assigning binary values
• Logistic regression begins by assigning binary values to the dependent variable. It does not matter which group is assigned the value of 1 versus 0, but this assignment (coding) must be noted for the interpretation of the coefficients.
• Suppose the groups represent outcomes or events (e.g., success or failure), with success coded as 1 and failure coded as 0. The coefficients then reflect the impact of the independent variables on the likelihood of the outcome coded as 1 (success).
The logistic function
• The basic form of the logistic function is

P = 1 / (1 + exp(−z))        (1)

where z is the predictor variable.
If the numerator and denominator of the right-hand side of equation (1) are multiplied by exp(z), then

P = exp(z) / (1 + exp(z))        (2)
• Instead of a straight line, the logistic function fits an S-shaped (sigmoid) curve to the observed points.
• The tails of the sigmoid curve level off before reaching P = 0 or P = 1, so the problem of impossible values of P is avoided.
• A property of the logistic function is that as z becomes infinitely negative, exp(−z) becomes infinitely large, so that P approaches 0. As z becomes infinitely positive, exp(−z) becomes infinitesimally small, so that P approaches unity.
• When z = 0, exp(−z) = 1, so that P = 0.5. Thus the logistic curve has its center at (z, P) = (0, 0.5). This point is called an inflection point, and the logistic curve is symmetric about it.
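These properties can be checked numerically with a short sketch (Python here, purely illustrative):

```python
import math

def logistic(z):
    """Logistic function P = 1 / (1 + exp(-z)), as in equation (1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Center of the curve: P = 0.5 at z = 0.
assert logistic(0.0) == 0.5

# Tails level off toward 0 and 1 without ever reaching them.
assert logistic(-10) < 0.001 and logistic(10) > 0.999

# Symmetry about the inflection point: P(z) + P(-z) = 1.
assert abs(logistic(2.0) + logistic(-2.0) - 1.0) < 1e-12
```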
The multivariate logistic function
• Here z is a linear function of a set of predictor variables:

z = b0 + b1X1 + b2X2 + … + bkXk

Then

P = 1 / (1 + exp[−(b0 + b1X1 + b2X2 + … + bkXk)])        (3)

This function ranges between 0 and 1.
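Equation (3) can be computed directly; the coefficients and predictor values below are hypothetical, for illustration only:

```python
import math

def predict_prob(coefs, x):
    """P from equation (3): coefs = [b0, b1, ..., bk], x = [X1, ..., Xk]."""
    z = coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients b0 = -1.0, b1 = 0.8, b2 = 0.5 and values X1 = 2, X2 = 1,
# so z = -1.0 + 0.8*2.0 + 0.5*1.0 = 1.1.
p = predict_prob([-1.0, 0.8, 0.5], [2.0, 1.0])
assert abs(p - 1.0 / (1.0 + math.exp(-1.1))) < 1e-12
assert 0.0 < p < 1.0  # the function stays between 0 and 1
```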
The Logit Link Function
• The basic form of the logistic function, as given in equation (1), is

P = 1 / (1 + exp(−z))        (1)

It follows that

1 − P = 1 − 1 / (1 + exp(−z)) = exp(−z) / (1 + exp(−z))        (4)

Dividing (1) by (4) yields

P / (1 − P) = exp(z)        (5)

Taking the natural logarithm of both sides of (5), we get

log [P / (1 − P)] = z
• The quantity P / (1 − P) is called the odds, denoted more concisely as Ω, and the quantity log [P / (1 − P)] is called the log odds, or the logit of P. Thus

Odds = P / (1 − P) = Ω

and

Logit P = log [P / (1 − P)] = log Ω

A link function is simply a function of the mean of the response variable Y that we use as the response instead of Y itself. Here, the logit of Y serves as the response in the regression equation.
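A quick numerical check of these definitions (Python, illustrative only):

```python
import math

def logit(p):
    """Log odds: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Inverse of the logit: the logistic function of equation (1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Odds and log odds for P = 0.8: odds = 0.8 / 0.2 = 4.
p = 0.8
odds = p / (1.0 - p)
assert abs(odds - 4.0) < 1e-12
assert abs(logit(p) - math.log(4.0)) < 1e-12

# The logit and the logistic function are inverses of each other.
assert abs(inv_logit(logit(p)) - p) < 1e-12
```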
• The multivariate equation becomes

logit P = log [P / (1 − P)] = b0 + b1X1 + b2X2 + … + bkXk        (6)

Equation (6) has the form of a multiple regression equation, and its coefficients are interpreted in a similar way. Multiple regression employs the method of least squares for estimating parameters; logistic regression, however, uses a maximum likelihood procedure.
Odds Ratios as Measures of Effect on the Odds
• The trouble is that logit P is not a familiar quantity, so the meaning of effects on the logit scale is not very clear.
• We therefore use another measure, the odds ratio, to interpret the impact of the independent variables.
Consider the model

logit P = a + bE + cU + dI        (7)

where
P: estimated probability of health awareness
E: number of completed years of education
U: 1 if urban, 0 otherwise
I: 1 if Indian, 0 otherwise
The model may be expressed as

log Ω = a + bE + cU + dI        (10)

Taking the exponential of both sides, we obtain

Ω = e^(a + bE + cU + dI)

Suppose we increase E by one unit, holding U and I constant. Denoting the new value of Ω as Ω*, we have

Ω* = e^(a + b(E+1) + cU + dI)
   = e^(a + bE + cU + dI + b)
   = e^(a + bE + cU + dI) e^b
   = Ω e^b        (11)

which can be written alternatively as

Ω*/Ω = e^b        (12)
From (11), it is evident that a one-unit increase in E, holding the other predictor variables constant, multiplies the odds by the factor e^b. The quantity e^b is called an odds ratio.

The original coefficient b represents the additive effect of a one-unit change in E on the log odds of the outcome (here, health awareness). Equivalently, the odds ratio e^b represents the multiplicative effect of a one-unit change in E on the odds. Insofar as the odds is a more intuitively meaningful concept than the log odds, e^b is more readily understandable than b as a measure of effect.
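The multiplicative effect in (11)-(12) can be verified numerically; the coefficients a, b, c, d below are hypothetical, for illustration only:

```python
import math

# Hypothetical coefficients for logit P = a + bE + cU + dI (illustration only).
a, b, c, d = -2.0, 0.3, 0.5, 0.2

def odds(E, U, I):
    """Odds = exp(a + bE + cU + dI), as obtained by exponentiating (10)."""
    return math.exp(a + b * E + c * U + d * I)

# A one-unit increase in E, holding U and I constant, multiplies the odds
# by exactly exp(b) -- the odds ratio of equations (11)-(12).
ratio = odds(E=11, U=1, I=0) / odds(E=10, U=1, I=0)
assert abs(ratio - math.exp(b)) < 1e-12
```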
Assessing the goodness-of-fit of the estimated model
• There are two primary approaches to evaluating model fit. The first uses an overall measure of the statistical significance of the model fit, together with "pseudo" R2 values.
• The second approach examines predictive accuracy: the ability of the model to correctly classify the outcome measure is computed in what is termed a classification matrix.
Model estimation fit: the basic measure of how well the maximum likelihood estimation procedure fits is the likelihood value. Logistic regression measures model estimation fit with −2 times the log of the likelihood value, referred to as −2LL or −2 log likelihood. The minimum value of −2LL is 0, which corresponds to a perfect fit (likelihood = 1, so −2LL = 0). Thus, the lower the −2LL value, the better the fit of the model.
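For binary outcomes, −2LL can be computed directly from the fitted probabilities; the outcomes and probabilities below are hypothetical, for illustration only:

```python
import math

def neg2_log_likelihood(y, p):
    """-2LL for binary outcomes y (0/1) and fitted probabilities p."""
    ll = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
             for yi, pi in zip(y, p))
    return -2.0 * ll

# Hypothetical outcomes and fitted probabilities (illustration only).
y = [1, 0, 1, 1, 0]
p = [0.9, 0.2, 0.8, 0.7, 0.1]
assert neg2_log_likelihood(y, p) > 0.0

# As the fit approaches perfection (probabilities near 1 for events and
# near 0 for non-events), -2LL is driven toward its minimum of 0.
assert neg2_log_likelihood([1, 0], [0.999999, 0.000001]) < 1e-4
```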
Between-model comparisons: the likelihood value can be compared between equations to assess the difference in predictive fit from one equation to another, with statistical tests for the significance of these differences. The basic approach follows three steps:
1. Estimate a null model: the first step is to estimate a null model, which acts as the baseline for comparisons of improvement in model fit. The most common null model is one without any independent variables; it serves as a baseline against which any model containing independent variables can be compared.
2. Estimate the proposed model: this model contains the independent variables to be included in the logistic regression. Ideally, model fit will improve over the null model, resulting in a lower −2LL value. Any number of proposed models can be estimated.
3. −2LL difference: the final step is to assess the statistical significance of the difference in −2LL between the two models (null versus proposed). If the statistical test indicates a significant difference, we can state that the set of independent variables in the proposed model significantly improves model estimation fit and that the model as a whole is statistically significant.
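The three steps reduce to a likelihood-ratio test on the −2LL difference; the −2LL values below are hypothetical, for illustration only:

```python
# Likelihood-ratio (-2LL difference) test sketch. The -2LL values below are
# hypothetical, for illustration only.
neg2ll_null = 120.5      # null model: intercept only
neg2ll_model = 101.2     # proposed model with k = 3 added predictors

# The -2LL difference follows a chi-square distribution with df = k = 3.
lr_statistic = neg2ll_null - neg2ll_model
assert abs(lr_statistic - 19.3) < 1e-9

# Compare against the chi-square critical value for df = 3 at alpha = 0.05 (7.815):
assert lr_statistic > 7.815  # the added predictors significantly improve fit
```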
Pseudo R2 measures
• Pseudo R2 measures are interpreted in a manner similar to the coefficient of determination in multiple regression. A pseudo R2 value can be derived for logistic regression analogous to the R2 value in regression analysis. The pseudo R2 for a logit model (R2_logit) is calculated as:

R2_logit = [−2LL_null − (−2LL_model)] / (−2LL_null)

The R2_logit value ranges from 0.0 to 1.0. As the proposed model improves model fit, the −2LL value decreases. A perfect fit has a −2LL value of 0.0 and an R2_logit of 1.0.
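The formula is a one-liner; the −2LL values below are hypothetical, for illustration only:

```python
# Pseudo R2 for a logit model, using hypothetical -2LL values (illustration only).
neg2ll_null = 120.5
neg2ll_model = 101.2

r2_logit = (neg2ll_null - neg2ll_model) / neg2ll_null
assert 0.0 <= r2_logit <= 1.0

# A perfect fit (-2LL_model = 0) gives R2_logit = 1.0.
assert (neg2ll_null - 0.0) / neg2ll_null == 1.0
```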
• Two other measures are similar in design and are generally categorized as pseudo R2 measures as well. The Cox and Snell R2 operates in the same manner, with higher values indicating better model fit. However, this measure is limited in that it cannot reach the maximum value of 1, so Nagelkerke proposed a modification with a range of 0 to 1.
• Both of these additional measures are interpreted as reflecting the amount of variation accounted for by the logistic model, with 1.0 indicating perfect model fit.
Hosmer-Lemeshow Test
• The Hosmer-Lemeshow (HL) test is a goodness-of-fit test for logistic regression, used especially for risk prediction models.
• The test is used only for binary response variables.
• It assesses whether the observed event rates match the expected event rates in subgroups of the population.
• If the p-value is less than 0.05, the model is a poor fit.
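The grouping logic behind the HL statistic can be sketched as follows (a simplified illustration with hypothetical data; a full test would compare the statistic to a chi-square distribution to obtain the p-value):

```python
def hosmer_lemeshow_statistic(y, p, groups=10):
    """HL chi-square statistic (sketch): sort cases by predicted probability,
    split them into groups, and compare observed vs. expected event counts.
    The p-value would come from a chi-square with (groups - 2) df."""
    pairs = sorted(zip(p, y))                      # order cases by predicted risk
    size = max(1, len(pairs) // groups)
    stat = 0.0
    for i in range(0, len(pairs), size):
        chunk = pairs[i:i + size]
        n = len(chunk)
        observed = sum(yi for _, yi in chunk)      # observed events in the group
        expected = sum(pi for pi, _ in chunk)      # expected events in the group
        if 0 < expected < n:
            stat += (observed - expected) ** 2 / (expected * (1 - expected / n))
    return stat

# Hypothetical outcomes and fitted probabilities (illustration only).
y = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
p = [0.1, 0.2, 0.3, 0.35, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]
assert hosmer_lemeshow_statistic(y, p, groups=5) >= 0.0
```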
Interpreting coefficients
• Interpreting the direction of original coefficients: the sign of an original coefficient (positive or negative) indicates the direction of the relationship, just as with regression coefficients. A positive coefficient increases the predicted probability, whereas a negative one decreases it.
• Interpreting the direction of exponentiated coefficients: exponentiated coefficients must be interpreted differently because they are the antilogs of the original coefficients. By exponentiating, we state the coefficient in terms of odds, which means that exponentiated coefficients cannot be negative. Because the exponential of 0 (no effect) is 1.0, an exponentiated coefficient of 1.0 corresponds to a relationship with no direction. Thus, exponentiated coefficients above 1.0 reflect a positive relationship and values below 1.0 a negative relationship.
• Percentage change in odds = (exponentiated coefficient − 1.0) × 100
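The percentage-change formula can be checked directly (Python, illustrative only):

```python
import math

def pct_change_in_odds(b):
    """Percentage change in the odds per one-unit change in a predictor,
    computed from its logistic coefficient b: (exp(b) - 1) * 100."""
    return (math.exp(b) - 1.0) * 100.0

assert pct_change_in_odds(0.0) == 0.0    # exp(0) = 1.0: no effect
assert pct_change_in_odds(0.5) > 0.0     # exponentiated value above 1.0: odds increase
assert pct_change_in_odds(-0.5) < 0.0    # exponentiated value below 1.0: odds decrease
```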
Testing for the significance of the coefficients
• We use a statistical test called the Wald statistic to see whether a logistic coefficient is different from 0.
• The Wald statistic is used to test hypotheses about parameters that are estimated by the maximum likelihood method.
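For a single coefficient, the Wald test divides the estimate by its standard error; the coefficient and standard error below are hypothetical, for illustration only:

```python
# Wald test sketch: z = b / SE(b); z squared follows a chi-square with 1 df.
# The coefficient and standard error below are hypothetical (illustration only).
b = 0.42
se_b = 0.15

z = b / se_b
wald = z ** 2
assert abs(wald - (0.42 / 0.15) ** 2) < 1e-12

# Compare |z| against the standard normal critical value at alpha = 0.05 (1.96):
assert abs(z) > 1.96  # reject H0: b = 0 at the 5% level
```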
