Chapter 8
Chapter 8
Chapter 8
Learning Objectives
Introduction
Objectives of Logistic Regression Models
Common Uses of Logistic Regression Models
Logistic Regression Example
Logistic Regression
Logistic Regression Model
Odds of an Event
Model Fit and Testing for Significance of the Coefficients
Odds Ratio in Logistic Regression
Example
Logistic regression with one predictor variable
Logistic regression with multiple predictor variables
Summary
Takeaways
Know what the odds ratio is, how it is calculated and used
Learning Objectives
Introduction
Objectives of Logistic Regression Models
Common Uses of Logistic Regression Models
Logistic Regression Example
Logistic Regression
Logistic Regression Model
Odds of an Event
Model Fit and Testing for Significance of the Coefficients
Odds Ratio in Logistic Regression
Example
Logistic regression with one predictor variable
Logistic regression with multiple predictor variables
Summary
Takeaways
As part of his modeling approach, Max first pulled data from the local
company server on all of MyFriend’s current and former insurance policy
holders, including whether they had churned sometime during the last 5
years (captured by a binary variable: 1 = yes, 0 = no), the customer’s
complaint behavior, gender, age, higher education, annual income,
geographic information etc.
They then sorted the customers by (1) customer worth and (2)
probability of churning. Many of MyFriend’s most valuable customers had
a high probability of churning.
Armed with these insights, Max’s Manager assembled a team and tasked it
with identifying interventions they could target at MyFriend’s most
valuable customers with high probability of churning.
She hoped these interventions would keep these customers happy and
thus not cancel their insurance policy.
Learning Objectives
Introduction
Objectives of Logistic Regression Models
Common Uses of Logistic Regression Models
Logistic Regression Example
Logistic Regression
Logistic Regression Model
Odds of an Event
Model Fit and Testing for Significance of the Coefficients
Odds Ratio in Logistic Regression
Example
Logistic regression with one predictor variable
Logistic regression with multiple predictor variables
Summary
Takeaways
where e is the base of the natural logarithm, i.e., 2.71828, p is the probability
of the event of interest happening (e.g., probability of a customer to churn),
X are the predictor (or x) variables, and β are the coefficients to be
estimated.
Probability of churning
event happening follows .8
an s-shape with .6
asymptotes at 0 and 1.
.4
to the right. .0
1 2 3 4 5 6 7 8
x (number of complaints)
Probability and odds are often used interchangeably. However, they are
not equivalent mathematically.
Probability ranges between 0 and 1, where 0 is an impossible event and 1 an
inevitable event.
Probability is usually reported as a percentage ranging from 0% to 100%
(which is equivalent to ranging from 0 and 1).
Probability captures the ratio of how often an event of interest occurred (e.g.,
customers who churned) over all possible events (e.g., customers who did and
did not churn).
Odds on the other hand can range from 0 to infinity. Odds capture the ratio of
how often an event of interest occurred (e.g., customers who churned) over how
often it did not occur (e.g., customer who did not churn).
A likelihood value (i.e., - 2 times the log of the likelihood value, or -2LL) is
used to determine how well a logistic regression model fits overall.
Models that fit well have a small -2LL value, and a -2LL value of 0
indicates perfect model fit.
Other R2- like measures used in logistic regression include Cox and Snell’s
R2 as well as the Nagelkerke value.
Logistic regression will also provide the odds ratio for each predictor
variable x. The odds ratio is calculated as follows:
where βi is the coefficient of predictor variable xi.
The odds ratio captures by how much the predicted odds of the event of
interest happening (e.g., customer churning) increases as the focal
predictor variable x (e.g., number of customer complaints) increases by
one unit.
Returning to the MyFriend example, the table below shows the logistic
regression results based on one of the predictor variables Max had pulled,
number of customer complaints.
Predictor Standard
Coefficient z-value p-value Odds Ratio
variable Error
Constant -2.57 .296 -8.66 .000 .07
Complaints 1.71 .181 9.43 .000 5.52
# of times complained 0 1 2 3 4
Estimated probability of churning 7% 30% 70% 93% 98%
Estimated odds of churning .076 .42 2.3 12.9 71.5
Predictor Standard
Coefficient z-value p-value Odds Ratio
variable Error
Constant 1.24 .69 1.85 .064 3.45
Complaints 1.74 .23 7.73 .000 5.69
Age -.103 .02 -5.72 .000 .902
Gender -.00 .45 -.000 1.000 .999
Looking across the coefficients, we can see that two predictor variables
are statistically significant: The number of complaints and age. In
contrast, gender is not significant.
Likewise, the probability of a customer who has complained 1 time and is
25 years old churning is 60%.
© Palmatier, Petersen, and Germann 22
Agenda
Learning Objectives
Introduction
Objectives of Logistic Regression Models
Common Uses of Logistic Regression Models
Logistic Regression Example
Logistic Regression
Logistic Regression Model
Odds of an Event
Model Fit and Testing for Significance of the Coefficients
Odds Ratio in Logistic Regression
Example
Logistic regression with one predictor variable
Logistic regression with multiple predictor variables
Summary
Takeaways
Learning Objectives
Introduction
Objectives of Logistic Regression Models
Common Uses of Logistic Regression Models
Logistic Regression Example
Logistic Regression
Logistic Regression Model
Odds of an Event
Model Fit and Testing for Significance of the Coefficients
Odds Ratio in Logistic Regression
Example
Logistic regression with one predictor variable
Logistic regression with multiple predictor variables
Summary
Takeaways
Probability and odds are often used interchangeably, but they are not
equivalent mathematically. Probability captures the ratio of how often a
binary event of interest occurs (e.g., customers who churn) over all
possible events (e.g., customers who do and do not churn).
Logistic regression provides model fit measures that tell the researcher
how good the predictor variables predict the outcome variable.