
Chapter 10 – Logistic Regression

Data Mining for Business Intelligence


Shmueli, Patel & Bruce

© Galit Shmueli and Peter Bruce 2010


Logistic Regression

• Powerful model-based classification tool
• Extends the idea of linear regression to situations where the outcome variable is categorical
• The model relates the predictors to the outcome
  – Example: Y denotes the recommendation on holding/selling/buying a stock – a categorical variable with 3 categories
• We focus on binary classification (Y = 0 or Y = 1), but the predictors can be categorical or continuous
• Widely used, particularly where a structured model is useful
The Logit

Goal: find a function of the predictor variables that relates them to a 0/1 outcome

• Instead of Y as the outcome variable (as in linear regression), we use a function of Prob(Y = 1) called the logit
• The logit can be modeled as a linear function of the predictors
• The logit can be mapped back to a probability, which, in turn, can be mapped to a class
  – Using a cutoff value on the probability of belonging to class 1, P(Y = 1)
Step 1: Logistic Response Function

• Let p = probability of belonging to class 1
• Logistic regression relates p to the predictors with a function that guarantees 0 ≤ p ≤ 1
• The standard linear function (below, with q = number of predictors) does not:

  p = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q

• The fix: use the logistic response function (eq. 10.2 in the textbook; a code sketch follows below):

  p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q)}}
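To make the shape of eq. 10.2 concrete, here is a minimal Python sketch (NumPy and the coefficient values are illustrative assumptions, not from the slides):

```python
import numpy as np

def logistic(x, beta0, beta1):
    """Logistic response function (eq. 10.2) for one predictor:
    p = 1 / (1 + exp(-(beta0 + beta1 * x))); always returns a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))

# Hypothetical coefficients, chosen only to show the S-shaped squeeze into (0, 1)
x = np.linspace(-10, 10, 5)
print(logistic(x, beta0=0.0, beta1=1.0))
```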


Step 2: The Odds

The odds of an event are defined as:

  \text{Odds} = \frac{p}{1 - p}    (eq. 10.3, where p = probability of the event)

Or, given the odds of an event, the probability of the event can be computed by:

  p = \frac{\text{Odds}}{1 + \text{Odds}}    (eq. 10.4)
We can also relate the odds to the predictors:

  \text{Odds} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q}    (eq. 10.5)

Recall that p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q)}} (eq. 10.2); substituting this into eq. 10.3 yields eq. 10.5. A code sketch of these conversions follows below.
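A minimal Python sketch of eqs. 10.3–10.5 (the coefficient and predictor values are hypothetical, chosen only to exercise the formulas):

```python
import numpy as np

def odds_from_prob(p):
    """eq. 10.3: Odds = p / (1 - p)"""
    return p / (1.0 - p)

def prob_from_odds(odds):
    """eq. 10.4: p = Odds / (1 + Odds)"""
    return odds / (1.0 + odds)

# eq. 10.5: odds as an exponentiated linear function of the predictors
beta = np.array([-6.0, 0.04])   # hypothetical beta0 and beta1
x = np.array([1.0, 150.0])      # leading 1 multiplies the intercept
odds = np.exp(beta @ x)
print(odds, prob_from_odds(odds))
```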
Step 3: Take the Log of Both Sides

• This gives us the logit:

  \log(\text{Odds}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q    (eq. 10.6)

• log(Odds) is called the logit, and it takes values from –∞ to +∞
• The logit is the dependent variable, and it is a linear function of the predictors x1, x2, …, xq
• This linearity helps make interpretation easier
Example: Acceptance of Personal Loan Offer

• Outcome variable: accept bank loan (0/1)
• Predictors: demographics (age, income, etc.) and information about the customer's bank relationship (mortgage, securities account, etc.)
• Data: 5,000 customers – 480 (9.6%) accepted the loan offer previously
• Goal: find characteristics of customers who are most likely to accept the loan offer in future mailings

Data preprocessing:
• Partition 60% training, 40% validation
• Create 0/1 dummy variables for categorical predictors (see the sketch below)
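A minimal pandas sketch of this preprocessing (the column names, example values, and random seed are assumptions, not taken from the slides):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical slice of the customer data with one categorical predictor
df = pd.DataFrame({
    "Income": [49, 34, 11, 100],
    "Education": ["UG", "Grad", "UG", "Prof"],
    "PersonalLoan": [0, 0, 0, 1],
})

# 0/1 dummy variables for the categorical predictor (drop one level as the baseline)
df = pd.get_dummies(df, columns=["Education"], drop_first=True)

# 60% training / 40% validation partition
train, valid = train_test_split(df, train_size=0.6, random_state=1)
```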
Single Predictor Model

• Modeling loan acceptance as a function of income (x)
• Fitted coefficients: b0 = -6.3525, b1 = 0.0392, so the fitted logit is
  log(Odds) = -6.3525 + 0.0392 x (see the sketch below)
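Plugging the fitted coefficients into the logistic response function gives the estimated probability of acceptance at any income level (a minimal sketch; the example income value and its units are assumptions):

```python
import numpy as np

b0, b1 = -6.3525, 0.0392   # fitted coefficients from the single-predictor model

def p_accept(income):
    """Estimated P(accept loan | income) from the fitted logit."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * income)))

print(p_accept(100))   # e.g. income of 100 -> roughly 0.08
```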


Last Step: Classification

• The model produces an estimated probability of being a "1"
  – Example: P(accept loan | income)
• Convert to a classification by establishing a cutoff level
• If the estimated probability > cutoff, classify as "1"
• Thus the model helps with classification as well as with predicting the probability of belonging to one class
• Default cutoff value: 0.50, but it can be changed, e.g. to maximize classification accuracy (see the sketch below)
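A minimal sketch of the cutoff step (the probability values are hypothetical):

```python
import numpy as np

probs = np.array([0.03, 0.48, 0.52, 0.91])   # estimated P(Y=1) for four customers

cutoff = 0.50                        # default cutoff; adjust to suit the task
classes = (probs > cutoff).astype(int)
print(classes)                       # -> [0 0 1 1]
```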
Example: Parameter Estimation

• Estimates of the β's are derived through an iterative process called maximum likelihood estimation (MLE)
• Let us now include all 12 predictors in the model (a fitting sketch follows below)
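A sketch of fitting such a model in Python (statsmodels is one common choice, not necessarily the tool used in the textbook; the toy data, column names, and values are assumptions):

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical training frame: two of the predictors plus the 0/1 outcome
train = pd.DataFrame({
    "Income": [49, 34, 11, 100, 45, 29, 72, 22],
    "Family": [4, 3, 1, 1, 4, 4, 2, 1],
    "PersonalLoan": [0, 1, 0, 1, 0, 0, 1, 0],
})

X = sm.add_constant(train[["Income", "Family"]])   # intercept + predictors
y = train["PersonalLoan"]

# Maximum likelihood estimation iterates until the log-likelihood converges
model = sm.Logit(y, X).fit()
print(model.params)   # the estimated beta coefficients
```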


Estimated Equation for the Logit

• Interpreting binary predictor effects:
  – The odds of accepting the loan offer for those who already have a CD account with the bank are 32.1 times the odds for those who do not have a CD account (p-value < 0.001).

• Interpreting continuous predictor effects:
  – The odds of accepting the loan offer increase by 77.1% if family size increases by one (p-value < 0.001).
  – The odds of accepting the loan offer decrease by 4.4% if a client is 1 year older (p-value = 0.624).
http://www.ats.ucla.edu/stat/mult_pkg/faq/general/odds_ratio.htm
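Each interpretation above is just e^β for the corresponding coefficient. A sketch (the coefficient values below are back-calculated from the stated odds ratios, not read off the textbook output):

```python
import numpy as np

# Approximate coefficients implied by the interpretations above
b_cd_account = 3.47     # exp(3.47)   ~ 32.1  -> odds multiplied by 32.1
b_family     = 0.57     # exp(0.57)   ~ 1.77  -> odds increase by ~77%
b_age        = -0.045   # exp(-0.045) ~ 0.956 -> odds decrease by ~4.4%

for name, b in [("CD account", b_cd_account), ("Family", b_family), ("Age", b_age)]:
    print(f"{name}: odds ratio = {np.exp(b):.3f}")
```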
Variable Selection

Problems:
• As in linear regression, correlated predictors cause trouble: they make the coefficient estimates unstable and hard to interpret
• Overly complex models run the risk of overfitting

Solution: remove extreme redundancies by dropping predictors via automated selection of variable subsets (as in linear regression) or by data reduction methods such as PCA (see the sketch below)
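A sketch of automated subset selection with recursive feature elimination (scikit-learn is an implementation choice; the simulated data and feature counts are assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE

# Simulated design matrix with a deliberately redundant column
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
X[:, 5] = X[:, 0] + 0.01 * rng.normal(size=200)   # near-duplicate of column 0
y = (X[:, 0] + X[:, 1] + rng.normal(size=200) > 0).astype(int)

# Recursive feature elimination: repeatedly drop the weakest predictor
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)
print(selector.support_)   # mask of the retained predictor subset
```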
P-values for Predictors

• Test the null hypothesis that a coefficient = 0
• P-values reported with the coefficients display the results of these tests
• Coefficients with low p-values (close to 0) are statistically significant
• Useful for reviewing whether to include a variable in the model (see the sketch below)
• Key in profiling tasks, but less important in predictive classification
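Continuing the statsmodels sketch from the parameter-estimation slide (same toy data, same caveats), the fitted model reports these tests directly:

```python
import pandas as pd
import statsmodels.api as sm

train = pd.DataFrame({
    "Income": [49, 34, 11, 100, 45, 29, 72, 22],
    "Family": [4, 3, 1, 1, 4, 4, 2, 1],
    "PersonalLoan": [0, 1, 0, 1, 0, 0, 1, 0],
})
model = sm.Logit(train["PersonalLoan"],
                 sm.add_constant(train[["Income", "Family"]])).fit(disp=0)

print(model.pvalues)    # one p-value per coefficient (H0: coefficient = 0)
```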
Summary

• Logistic regression is similar to linear regression, except that it is used with a categorical response
• It can be used for explanatory tasks (= profiling) or predictive tasks (= classification)
• The predictors are related to the response Y via a nonlinear function called the logit
• As in linear regression, reducing the number of predictors can be done via variable selection
• Logistic regression can be generalized to more than two classes
