Chapter 10 Logistic Reg (Python)

Chapter 10 discusses logistic regression, which extends linear regression to situations with categorical outcome variables, focusing on binary classification. It introduces the logit function to relate predictor variables to a 0/1 outcome and explains the process of fitting a logistic regression model, including variable selection and performance evaluation. The chapter emphasizes the importance of understanding odds and probabilities in predictive classification and addresses issues like multicollinearity in predictor variables.


Chapter 10 – Logistic Regression

Logistic Regression
⚫Extends the idea of linear regression to situations where the outcome variable is categorical

⚫Widely used, particularly where a structured model is useful to explain (=profiling) or to predict
⚫ Example: finding the factors that differentiate between male and female top executives

⚫We focus on binary classification, i.e. Y = 0 or Y = 1
The Logit
Goal: Find a function of the predictor variables that relates them to a 0/1 outcome

⚫Instead of Y as the outcome variable (as in linear regression), we use a function of Y called the logit
⚫The logit can be modeled as a linear function of the predictors
⚫The logit can be mapped back to a probability, which, in turn, can be mapped to a class
Step 1: Logistic Response Function

p = probability of belonging to class 1

Need to relate p to the predictors with a function that guarantees 0 ≤ p ≤ 1

A standard linear function (q = number of predictors) does not:

p = β0 + β1x1 + β2x2 + ... + βqxq

The fix: use the logistic response function (eq. 10.2 in the textbook):

p = 1 / (1 + e^-(β0 + β1x1 + β2x2 + ... + βqxq))
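Below is a minimal sketch (not from the textbook code) of the logistic response function, using illustrative, hypothetical coefficient values:

import numpy as np

def logistic_response(x, beta0, beta):
    """Map a linear combination of predictors to a probability in (0, 1) (eq. 10.2)."""
    linear = beta0 + np.dot(x, beta)       # β0 + β1x1 + ... + βqxq
    return 1 / (1 + np.exp(-linear))

# Illustrative (hypothetical) coefficients and predictor values
x = np.array([100.0, 2.0])                 # e.g. Income, Family
print(logistic_response(x, beta0=-6.0, beta=np.array([0.04, 0.5])))  # always in (0, 1)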
Step 2: The Odds

The odds of an event are defined as:

Odds = p / (1 - p)    (eq. 10.3, where p = probability of the event)

Or, given the odds of an event, the probability of the event can be computed by:

p = Odds / (1 + Odds)    (eq. 10.4)

We can also relate the odds to the predictors:

Odds = e^(β0 + β1x1 + β2x2 + ... + βqxq)    (eq. 10.5)

To get this result, substitute eq. 10.2 into eq. 10.4
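A tiny sketch (not from the textbook code) of eq. 10.3 and eq. 10.4, converting between probability and odds:

def prob_to_odds(p):
    return p / (1 - p)          # eq. 10.3

def odds_to_prob(odds):
    return odds / (1 + odds)    # eq. 10.4

print(prob_to_odds(0.8))   # 4.0, i.e. "4 to 1" odds
print(odds_to_prob(4.0))   # 0.8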
Step 3: Take log on both sides

This gives us the logit:

logit = log(Odds) = β0 + β1x1 + β2x2 + ... + βqxq    (eq. 10.6)

Logit, cont.

So, the logit is a linear function of the predictors x1, x2, …
⚫ Takes values from -infinity to +infinity

Review the relationship between logit, odds and probability (see Chapter 10)

Odds (a) and Logit (b) as functions of p
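A short sketch (assuming matplotlib is available) that reproduces curves like the ones in the figure above:

import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.01, 0.99, 200)
odds = p / (1 - p)
logit = np.log(odds)

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))
axes[0].plot(p, odds)
axes[0].set_title('(a) Odds')
axes[0].set_xlabel('p')
axes[1].plot(p, logit)
axes[1].set_title('(b) Logit')
axes[1].set_xlabel('p')
plt.show()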
Example: Personal Loan Offer (UniversalBank.csv)

Outcome variable: accept bank loan (0/1)

Predictors: demographic info, and info about their bank relationship
Single Predictor Model
Modeling loan acceptance on income (x)

Assume fitted coefficients (more later): b0 = -6.3525, b1 = 0.0392
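A small sketch (not from the slides) that evaluates this single-predictor model at one income value; the income unit ($000s) is an assumption:

import numpy as np

b0, b1 = -6.3525, 0.0392                   # fitted coefficients from the slide
income = 100                               # hypothetical income (assumed to be in $000s)
p = 1 / (1 + np.exp(-(b0 + b1 * income)))  # eq. 10.2 with a single predictor
print(p)                                   # estimated probability of accepting the loan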
Seeing the Relationship
Last step - classify
The model produces an estimated probability of being a “1”

⚫Convert to a classification by establishing a cutoff level

⚫If estimated prob. > cutoff, classify as “1”
© Galit Shmueli and Peter Bruce 2017


Ways to Determine Cutoff
⚫0.50 is a popular initial choice (applied in the sketch below)

⚫Additional considerations (see Chapter 5):
⚫ Maximize classification accuracy
⚫ Maximize sensitivity (subject to a minimum level of specificity)
⚫ Minimize false positives (subject to a maximum false negative rate)
⚫ Minimize expected cost of misclassification (need to specify costs)
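A minimal sketch (not from the slides) of applying a cutoff; it assumes the logit_reg model and valid_X validation set that are fitted later in the chapter:

import numpy as np

cutoff = 0.5                                    # popular initial choice
p1 = logit_reg.predict_proba(valid_X)[:, 1]     # estimated probability of being a "1"
predicted_class = np.where(p1 > cutoff, 1, 0)   # classify as "1" if prob. > cutoff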
Example, cont.

⚫Estimates of the β's are derived through an iterative process called maximum likelihood estimation

⚫Let's include all 12 predictors in the model now
Data Prep

import pandas as pd

bank_df = pd.read_csv('UniversalBank.csv')
bank_df.drop(columns=['ID', 'ZIP Code'], inplace=True)
bank_df.columns = [c.replace(' ', '_') for c in bank_df.columns]

# Treat education as categorical, convert to dummy variables
bank_df['Education'] = bank_df['Education'].astype('category')
new_categories = {1: 'Undergrad', 2: 'Graduate', 3: 'Advanced/Professional'}
bank_df.Education.cat.rename_categories(new_categories, inplace=True)
bank_df = pd.get_dummies(bank_df, prefix_sep='_', drop_first=True)

y = bank_df['Personal_Loan']
X = bank_df.drop(columns=['Personal_Loan'])
Fitting Model

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from dmba import AIC_score   # utility from the dmba package that accompanies the textbook

# partition data
train_X, valid_X, train_y, valid_y = train_test_split(X, y, test_size=0.4, random_state=1)

# fit a logistic regression (set penalty=l2 and C=1e42 to avoid regularization)
logit_reg = LogisticRegression(penalty="l2", C=1e42, solver='liblinear')
logit_reg.fit(train_X, train_y)

print('intercept ', logit_reg.intercept_[0])
print(pd.DataFrame({'coeff': logit_reg.coef_[0]}, index=X.columns).transpose())
print('AIC', AIC_score(valid_y, logit_reg.predict(valid_X), df=len(train_X.columns) + 1))
Results
intercept  -12.61895521314035

        Age  Experience  Income  Family  CCAvg  Mortgage
coeff  -0.032549  0.03416  0.058824  0.614095  0.240534  0.001012

        Securities_Account  CD_Account  Online  CreditCard
coeff  -1.026191  3.647933  -0.677862  -0.95598

        Education_Graduate  Education_Advanced/Professional
coeff  4.192204  4.341697

AIC -709.1524769205962

(These are the coefficients of the logit.)


Converting from logit to probabilities

logit_reg_pred = logit_reg.predict(valid_X)
logit_reg_proba = logit_reg.predict_proba(valid_X)
logit_result = pd.DataFrame({'actual': valid_y,
                             'p(0)': [p[0] for p in logit_reg_proba],
                             'p(1)': [p[1] for p in logit_reg_proba],
                             'predicted': logit_reg_pred})

# display four different cases
interestingCases = [2764, 932, 2721, 702]
print(logit_result.loc[interestingCases])

      actual   p(0)   p(1)  predicted
2764       0  0.976  0.024          0
932        0  0.335  0.665          1
2721       1  0.032  0.968          1
702        1  0.986  0.014          0
Interpreting Odds, Probability
For predictive classification, we typically use the probability with a cutoff value

For explanatory purposes, the odds have a useful interpretation:
⚫If we increase x1 by one unit, holding x2, x3, …, xq constant, then e^b1 is the factor by which the odds of belonging to class 1 increase
⚫ Recall: Odds = e^(b0 + b1x1 + b2x2 + ... + bqxq)
⚫ Consider a single predictor, "Income", with the remaining predictors held constant:
⚫ Odds(Personal Loan = Yes | Income) = e^(b0 + b1 * Income)
⚫ So, e^b1 is the multiplicative factor by which the odds (of belonging to class 1) increase when the value of x1 is increased by 1 unit, holding all other predictors constant. If b1 < 0, an increase in x1 is associated with a decrease in the odds of belonging to class 1, whereas a positive value of b1 is associated with an increase in the odds.
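A small sketch (not from the slides) that turns the fitted Income coefficient from the model above into its multiplicative effect on the odds:

import numpy as np
import pandas as pd

coef = pd.Series(logit_reg.coef_[0], index=X.columns)
print(np.exp(coef['Income']))   # about 1.06: each additional unit of Income multiplies the odds by ~1.06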
Loan Example: Evaluating Classification Performance

Performance measures: confusion matrix and % of misclassifications

More useful in this example: gains (lift) charts (the terms are sometimes used interchangeably)
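A sketch (not from the slides) of the confusion matrix and misclassification rate on the validation set, using scikit-learn's metrics:

from sklearn.metrics import accuracy_score, confusion_matrix

print(confusion_matrix(valid_y, logit_reg_pred))
acc = accuracy_score(valid_y, logit_reg_pred)
print('accuracy', acc)
print('misclassification rate', 1 - acc)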
Python’s Gains Chart

import matplotlib.pyplot as plt
from dmba import gainsChart, liftChart

df = logit_result.sort_values(by=['p(1)'], ascending=False)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))
gainsChart(df.actual, ax=axes[0])
liftChart(df['p(1)'], title=False, ax=axes[1])
plt.show()

The gains curve shows the number of 1’s yielded by the model, moving through the records sorted by predicted probability of being a 1, compared to the number of 1’s yielded by selecting records randomly.
Python’s Lift Chart
df = logit_result.sort_values(by=['p(1)'], ascending=False)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))
gainsChart(df.actual, ax=axes[0])
liftChart(df['p(1)'], title=False, ax=axes[1])
plt.show()

The top decile (i.e. the 10% most probable to be 1’s) is 7.8 times as likely to be 1’s, compared to random selection.
Multicollinearity

Problem: As in linear regression, if one predictor is a linear combination of other predictor(s), model estimation will fail
⚫Note that in such a case, we have at least one redundant predictor

Solution: Remove extreme redundancies (by dropping predictors via variable selection, or by data reduction methods such as PCA)
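One way to screen for near-redundant predictors before fitting (not shown in the slides) is to compute variance inflation factors with statsmodels, assuming it is installed:

import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

X_num = train_X.astype(float)
vif = pd.Series(
    [variance_inflation_factor(X_num.values, i) for i in range(X_num.shape[1])],
    index=X_num.columns)
print(vif.sort_values(ascending=False))   # very large values flag near-redundant predictors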
Variable Selection
This is the same issue as in linear regression:
⚫ The number of correlated predictors can grow when we create derived variables such as interaction terms (e.g. Income x Family) to capture more complex relationships
⚫ Problem: Overly complex models have the danger of overfitting
⚫ Solution: Reduce the number of variables via automated selection of variable subsets (as with linear regression); see the sketch below
⚫ See Chapter 6
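As an illustration of automated subset selection (a sketch, not the chapter's own code; Chapter 6 covers the book's approach), scikit-learn's SequentialFeatureSelector can run forward selection for the logistic model:

from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

selector = SequentialFeatureSelector(
    LogisticRegression(penalty="l2", C=1e42, solver='liblinear'),
    n_features_to_select=6,          # illustrative choice of subset size
    direction='forward', scoring='roc_auc', cv=5)
selector.fit(train_X, train_y)
print(train_X.columns[selector.get_support()])   # the selected predictors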
Summary
⚫Logistic regression is similar to linear regression, except that it is used with a categorical response
⚫It can be used for explanatory tasks (=profiling) or predictive tasks (=classification)
⚫The predictors are related to the response Y via a nonlinear function called the logit
⚫As in linear regression, reducing the number of predictors can be done via variable selection
⚫Logistic regression can be generalized to more than two classes
