Chapter 10 Logistic Reg (Python)

Chapter 10 discusses logistic regression, which extends linear regression to situations with categorical outcome variables, focusing on binary classification. It introduces the logit function to relate predictor variables to a 0/1 outcome and explains the process of fitting a logistic regression model, including variable selection and performance evaluation. The chapter emphasizes the importance of understanding odds and probabilities in predictive classification and addresses issues like multicollinearity in predictor variables.


Chapter 10 – Logistic Regression

Logistic Regression
⚫Extends the idea of linear regression to situations where the outcome variable is categorical

⚫Widely used, particularly where a structured model is useful to explain (=profiling) or to predict
⚫ Example: finding the factors that differentiate between male and female top executives

⚫We focus on binary classification, i.e. Y = 0 or Y = 1
The Logit
Goal: Find a function of the predictor variables that relates them to a 0/1 outcome

⚫Instead of Y as the outcome variable (as in linear regression), we use a function of Y called the logit
⚫The logit can be modeled as a linear function of the predictors
⚫The logit can be mapped back to a probability, which, in turn, can be mapped to a class
Step 1: Logistic Response Function

p = probability of belonging to class 1

Need to relate p to the predictors with a function that guarantees 0 ≤ p ≤ 1

A standard linear function (q = number of predictors) does not:

p = β0 + β1x1 + β2x2 + ... + βqxq

The fix: use the logistic response function (eq. 10.2 in the textbook):

p = 1 / (1 + e^-(β0 + β1x1 + β2x2 + ... + βqxq))
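Below is a minimal sketch (not from the textbook code) of the logistic response function, using illustrative, hypothetical coefficient values:

import numpy as np

def logistic_response(x, beta0, beta):
    """Map a linear combination of predictors to a probability in (0, 1) (eq. 10.2)."""
    linear = beta0 + np.dot(x, beta)       # β0 + β1x1 + ... + βqxq
    return 1 / (1 + np.exp(-linear))

# Illustrative (hypothetical) coefficients and predictor values
x = np.array([100.0, 2.0])                 # e.g. Income, Family
print(logistic_response(x, beta0=-6.0, beta=np.array([0.04, 0.5])))  # always in (0, 1)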
Step 2: The Odds

The odds of an event are defined as:

Odds = p / (1 - p)    (eq. 10.3, where p = probability of the event)

Or, given the odds of an event, the probability of the event can be computed by:

p = Odds / (1 + Odds)    (eq. 10.4)

We can also relate the odds to the predictors:

Odds = e^(β0 + β1x1 + β2x2 + ... + βqxq)    (eq. 10.5)

To get this result, substitute eq. 10.2 into eq. 10.4
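A tiny sketch (not from the textbook code) of eq. 10.3 and eq. 10.4, converting between probability and odds:

def prob_to_odds(p):
    return p / (1 - p)          # eq. 10.3

def odds_to_prob(odds):
    return odds / (1 + odds)    # eq. 10.4

print(prob_to_odds(0.8))   # 4.0, i.e. "4 to 1" odds
print(odds_to_prob(4.0))   # 0.8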
Step 3: Take log on both sides

This gives us the logit:

logit = log(Odds) = β0 + β1x1 + β2x2 + ... + βqxq    (eq. 10.6)

Logit, cont.

So, the logit is a linear function of the predictors x1, x2, …
⚫ Takes values from -infinity to +infinity

Review the relationship between logit, odds and probability (see Chapter 10)

Odds (a) and Logit (b) as functions of p
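A short sketch (assuming matplotlib is available) that reproduces curves like the ones in the figure above:

import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.01, 0.99, 200)
odds = p / (1 - p)
logit = np.log(odds)

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))
axes[0].plot(p, odds)
axes[0].set_title('(a) Odds')
axes[0].set_xlabel('p')
axes[1].plot(p, logit)
axes[1].set_title('(b) Logit')
axes[1].set_xlabel('p')
plt.show()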
Example: Personal Loan Offer (UniversalBank.csv)

Outcome variable: accept bank loan (0/1)

Predictors: demographic info, and info about their bank relationship
Single Predictor Model
Modeling loan acceptance on income (x)

Assume fitted coefficients (more later): b0 = -6.3525, b1 = 0.0392
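A small sketch (not from the slides) that evaluates this single-predictor model at one income value; the income unit ($000s) is an assumption:

import numpy as np

b0, b1 = -6.3525, 0.0392                   # fitted coefficients from the slide
income = 100                               # hypothetical income (assumed to be in $000s)
p = 1 / (1 + np.exp(-(b0 + b1 * income)))  # eq. 10.2 with a single predictor
print(p)                                   # estimated probability of accepting the loan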
Seeing the Relationship
Last step - classify
The model produces an estimated probability of being a “1”

⚫Convert to a classification by establishing a cutoff level

⚫If estimated prob. > cutoff, classify as “1”
© Galit Shmueli and Peter Bruce 2017


Ways to Determine Cutoff
⚫0.50 is a popular initial choice (applied in the sketch below)

⚫Additional considerations (see Chapter 5):
⚫ Maximize classification accuracy
⚫ Maximize sensitivity (subject to a minimum level of specificity)
⚫ Minimize false positives (subject to a maximum false negative rate)
⚫ Minimize expected cost of misclassification (need to specify costs)
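A minimal sketch (not from the slides) of applying a cutoff; it assumes the logit_reg model and valid_X validation set that are fitted later in the chapter:

import numpy as np

cutoff = 0.5                                    # popular initial choice
p1 = logit_reg.predict_proba(valid_X)[:, 1]     # estimated probability of being a "1"
predicted_class = np.where(p1 > cutoff, 1, 0)   # classify as "1" if prob. > cutoff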
Example, cont.

⚫Estimates of the β's are derived through an iterative process called maximum likelihood estimation

⚫Let's include all 12 predictors in the model now
Data Prep

import pandas as pd

bank_df = pd.read_csv('UniversalBank.csv')
bank_df.drop(columns=['ID', 'ZIP Code'], inplace=True)
bank_df.columns = [c.replace(' ', '_') for c in bank_df.columns]

# Treat education as categorical, convert to dummy variables
bank_df['Education'] = bank_df['Education'].astype('category')
new_categories = {1: 'Undergrad', 2: 'Graduate', 3: 'Advanced/Professional'}
bank_df.Education.cat.rename_categories(new_categories, inplace=True)
bank_df = pd.get_dummies(bank_df, prefix_sep='_', drop_first=True)

y = bank_df['Personal_Loan']
X = bank_df.drop(columns=['Personal_Loan'])
Fitting Model

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from dmba import AIC_score   # utility from the dmba package that accompanies the textbook

# partition data
train_X, valid_X, train_y, valid_y = train_test_split(X, y, test_size=0.4, random_state=1)

# fit a logistic regression (set penalty=l2 and C=1e42 to avoid regularization)
logit_reg = LogisticRegression(penalty="l2", C=1e42, solver='liblinear')
logit_reg.fit(train_X, train_y)

print('intercept ', logit_reg.intercept_[0])
print(pd.DataFrame({'coeff': logit_reg.coef_[0]}, index=X.columns).transpose())
print('AIC', AIC_score(valid_y, logit_reg.predict(valid_X), df=len(train_X.columns) + 1))
Results
intercept  -12.61895521314035

        Age  Experience  Income  Family  CCAvg  Mortgage
coeff  -0.032549  0.03416  0.058824  0.614095  0.240534  0.001012

        Securities_Account  CD_Account  Online  CreditCard
coeff  -1.026191  3.647933  -0.677862  -0.95598

        Education_Graduate  Education_Advanced/Professional
coeff  4.192204  4.341697

AIC -709.1524769205962

(These are the coefficients of the logit.)


Converting from logit to probabilities

logit_reg_pred = logit_reg.predict(valid_X)
logit_reg_proba = logit_reg.predict_proba(valid_X)
logit_result = pd.DataFrame({'actual': valid_y,
                             'p(0)': [p[0] for p in logit_reg_proba],
                             'p(1)': [p[1] for p in logit_reg_proba],
                             'predicted': logit_reg_pred})

# display four different cases
interestingCases = [2764, 932, 2721, 702]
print(logit_result.loc[interestingCases])

      actual   p(0)   p(1)  predicted
2764       0  0.976  0.024          0
932        0  0.335  0.665          1
2721       1  0.032  0.968          1
702        1  0.986  0.014          0
Interpreting Odds, Probability
For predictive classification, we typically use the probability with a cutoff value

For explanatory purposes, the odds have a useful interpretation:
⚫If we increase x1 by one unit, holding x2, x3, …, xq constant, then e^b1 is the factor by which the odds of belonging to class 1 increase
⚫ Recall: Odds = e^(b0 + b1x1 + b2x2 + ... + bqxq)
⚫ Consider a single predictor, "Income", with the remaining predictors held constant:
⚫ Odds(Personal Loan = Yes | Income) = e^(b0 + b1 * Income)
⚫ So, e^b1 is the multiplicative factor by which the odds (of belonging to class 1) increase when the value of x1 is increased by 1 unit, holding all other predictors constant. If b1 < 0, an increase in x1 is associated with a decrease in the odds of belonging to class 1, whereas a positive value of b1 is associated with an increase in the odds.
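A small sketch (not from the slides) that turns the fitted Income coefficient from the model above into its multiplicative effect on the odds:

import numpy as np
import pandas as pd

coef = pd.Series(logit_reg.coef_[0], index=X.columns)
print(np.exp(coef['Income']))   # about 1.06: each additional unit of Income multiplies the odds by ~1.06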
Loan Example: Evaluating Classification Performance

Performance measures: confusion matrix and % of misclassifications

More useful in this example: gains (lift) charts (the terms are sometimes used interchangeably)
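A sketch (not from the slides) of the confusion matrix and misclassification rate on the validation set, using scikit-learn's metrics:

from sklearn.metrics import accuracy_score, confusion_matrix

print(confusion_matrix(valid_y, logit_reg_pred))
acc = accuracy_score(valid_y, logit_reg_pred)
print('accuracy', acc)
print('misclassification rate', 1 - acc)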
Python’s Gains Chart

import matplotlib.pyplot as plt
from dmba import gainsChart, liftChart

df = logit_result.sort_values(by=['p(1)'], ascending=False)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))
gainsChart(df.actual, ax=axes[0])
liftChart(df['p(1)'], title=False, ax=axes[1])
plt.show()

The gains curve shows the number of 1’s yielded by the model, moving through the records sorted by predicted probability of being a 1, compared to the number of 1’s yielded by selecting records randomly.
Python’s Lift Chart
df = logit_result.sort_values(by=['p(1)'], ascending=False)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))
gainsChart(df.actual, ax=axes[0])
liftChart(df['p(1)'], title=False, ax=axes[1])
plt.show()

The top decile (i.e. the 10% most probable to be 1’s) is 7.8 times as likely to be 1’s, compared to random selection.
Multicollinearity

Problem: As in linear regression, if one predictor is a linear combination of other predictor(s), model estimation will fail
⚫Note that in such a case, we have at least one redundant predictor

Solution: Remove extreme redundancies (by dropping predictors via variable selection, or by data reduction methods such as PCA)
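One way to screen for near-redundant predictors before fitting (not shown in the slides) is to compute variance inflation factors with statsmodels, assuming it is installed:

import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

X_num = train_X.astype(float)
vif = pd.Series(
    [variance_inflation_factor(X_num.values, i) for i in range(X_num.shape[1])],
    index=X_num.columns)
print(vif.sort_values(ascending=False))   # very large values flag near-redundant predictors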
Variable Selection
This is the same issue as in linear regression:
⚫ The number of correlated predictors can grow when we create derived variables such as interaction terms (e.g. Income x Family) to capture more complex relationships
⚫ Problem: Overly complex models have the danger of overfitting
⚫ Solution: Reduce the number of variables via automated selection of variable subsets (as with linear regression); see the sketch below
⚫ See Chapter 6
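As an illustration of automated subset selection (a sketch, not the chapter's own code; Chapter 6 covers the book's approach), scikit-learn's SequentialFeatureSelector can run forward selection for the logistic model:

from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

selector = SequentialFeatureSelector(
    LogisticRegression(penalty="l2", C=1e42, solver='liblinear'),
    n_features_to_select=6,          # illustrative choice of subset size
    direction='forward', scoring='roc_auc', cv=5)
selector.fit(train_X, train_y)
print(train_X.columns[selector.get_support()])   # the selected predictors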
Summary
⚫Logistic regression is similar to linear regression, except that it is used with a categorical response
⚫It can be used for explanatory tasks (=profiling) or predictive tasks (=classification)
⚫The predictors are related to the response Y via a nonlinear function called the logit
⚫As in linear regression, reducing the number of predictors can be done via variable selection
⚫Logistic regression can be generalized to more than two classes
