Logistic Regression

Logistic regression can be used for classification problems where the target variable is categorical. The logistic regression model estimates the probability of an observation belonging to a particular class based on predictor variables. Several metrics can evaluate the classification performance of logistic regression models, including accuracy, confusion matrices, and information criteria scores. Variable selection methods may help identify the most predictive variables and reduce overfitting.

PHUONG NGUYEN

LOGISTIC REGRESSION
CONTENT
1. INTRODUCTION

2. LOGISTIC REGRESSION MODEL

3. EVALUATING CLASSIFICATION PERFORMANCE


INTRODUCTION


LOGISTIC RESPONSE FUNCTION

$$p = \frac{1}{1 + e^{-x}}$$
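
A minimal sketch of the logistic response function in Python, to show how it maps any real input to a value between 0 and 1. The function name `logistic` and the sample inputs are illustrative choices, not part of the original slides.

```python
import math

def logistic(x: float) -> float:
    """Logistic response function p = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Sample inputs chosen for illustration only.
for x in (-4.0, 0.0, 4.0):
    print(f"x = {x:+.1f}  ->  p = {logistic(x):.4f}")
```

An input of 0 gives exactly p = 0.5, and inputs of opposite sign give probabilities that sum to 1.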

PROBABILITY

$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q)}}$$
ODDS

$$\text{Odds} = \frac{p}{1 - p}$$

$$p = \frac{\text{Odds}}{1 + \text{Odds}} = \frac{1}{1 + \text{Odds}^{-1}}$$
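
The two conversions above can be sketched as a pair of small Python helpers. The names `odds_from_p` and `p_from_odds` and the example probability 0.75 are assumptions made for illustration.

```python
def odds_from_p(p: float) -> float:
    """Odds = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1.0 - p)

def p_from_odds(odds: float) -> float:
    """p = Odds / (1 + Odds), the inverse of odds_from_p."""
    return odds / (1.0 + odds)

p = 0.75                      # illustrative probability
odds = odds_from_p(p)         # 0.75 / 0.25
print("odds:", odds)
print("back to p:", p_from_odds(odds))
```

A probability of 0.75 corresponds to odds of 3 (three to one), and converting back recovers 0.75.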
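LOGIT

$$\text{Odds} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q}$$

$$\ln(\text{Odds}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q$$

$$\text{Logit} = \ln\left(\frac{p}{1 - p}\right)$$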
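
As a rough check of how the logit, the odds, and the probability fit together, the sketch below walks through the chain for a single predictor. The coefficients `beta0`, `beta1` and the predictor value `x1` are hypothetical numbers picked for illustration; they are not fitted to any data.

```python
import math

# Hypothetical single-predictor model (illustrative values, not fitted).
beta0, beta1 = -6.0, 0.04
x1 = 100.0

logit = beta0 + beta1 * x1               # the logit is linear in the predictor
odds = math.exp(logit)                   # Odds = e^logit
p = odds / (1.0 + odds)                  # p = Odds / (1 + Odds)
recovered_logit = math.log(p / (1.0 - p))  # ln(p / (1 - p)) gives the logit back

print(f"logit = {logit:.2f}, odds = {odds:.4f}, p = {p:.4f}")
print(f"logit recovered from p = {recovered_logit:.2f}")
```

The point of the sketch is that the model is linear on the logit scale even though the probability itself is a nonlinear function of the predictor.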
LOGISTIC REGRESSION MODEL


PERSONAL LOAN OFFER
UNIVERSALBANK.CSV



SINGLE PREDICTOR MODEL

 

PYTHON FUNCTIONALITY NEEDED
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import train_test_split
import statsmodels.api as sm
from mord import LogisticIT
import matplotlib.pylab as plt
import seaborn as sns
from dmba import classificationSummary, gainsChart, liftChart
from dmba.metric import AIC_score

https://github.com/nnbphuong/datascience4biz/blob/master/Logistic_Regression.ipynb
DATA PREPROCESSING
bank_df = pd.read_csv('UniversalBank.csv')
bank_df.drop(columns=['ID', 'ZIP Code'], inplace=True)
bank_df.columns = [c.replace(' ', '_') for c in bank_df.columns]

# Treat education as categorical, convert to dummy variables
bank_df['Education'] = bank_df['Education'].astype('category')
new_categories = {1: 'Undergrad', 2: 'Graduate', 3: 'Advanced/Professional'}
# rename_categories returns a new Series (its inplace argument was removed in pandas 2.0)
bank_df['Education'] = bank_df['Education'].cat.rename_categories(new_categories)
bank_df = pd.get_dummies(bank_df, prefix_sep='_', drop_first=True)

y = bank_df['Personal_Loan']
X = bank_df.drop(columns=['Personal_Loan'])

# partition data: 60% training, 40% validation
train_X, valid_X, train_y, valid_y = train_test_split(X, y, test_size=0.4, random_state=1)
FITTING THE MODEL
# fit a logistic regression; the very large C makes the L2 penalty negligible
logit_reg = LogisticRegression(penalty="l2", C=1e42, solver='liblinear')
logit_reg.fit(train_X, train_y)
print('intercept ', logit_reg.intercept_[0])
print(pd.DataFrame({'coeff': logit_reg.coef_[0]}, index=X.columns).transpose())
print('AIC', AIC_score(valid_y, logit_reg.predict(valid_X), df=len(train_X.columns) + 1))
FITTING THE MODEL OUTPUT
intercept  -12.61895521314035

            Age  Experience    Income    Family     CCAvg  Mortgage
coeff -0.032549     0.03416  0.058824  0.614095  0.240534  0.001012

       Securities_Account  CD_Account    Online  CreditCard
coeff           -1.026191    3.647933 -0.677862    -0.95598

       Education_Graduate  Education_Advanced/Professional
coeff            4.192204                         4.341697

AIC -709.1524769205962
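
Because the coefficients are on the log-odds scale, exponentiating a coefficient gives the factor by which the odds change for a one-unit increase in that predictor, holding the others fixed. The sketch below applies this to four of the coefficients printed above; the dictionary and variable names are illustrative.

```python
import math

# Coefficients copied from the fitted-model output above (log-odds scale).
coefficients = {
    "Income": 0.058824,
    "Family": 0.614095,
    "CD_Account": 3.647933,
    "Online": -0.677862,
}

# exp(beta) = multiplicative change in the odds per one-unit increase
# in the predictor, with all other predictors held constant.
odds_multipliers = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, factor in odds_multipliers.items():
    print(f"{name:12s} odds multiplier = {factor:.3f}")
```

A positive coefficient gives a multiplier above 1 (for CD_Account, roughly 38), while a negative coefficient such as Online gives a multiplier below 1.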
CONVERTING FROM LOGIT TO PROBABILITY
$$\text{Odds} = e^{\text{logit}} \quad \rightarrow \quad p = \frac{\text{Odds}}{1 + \text{Odds}}$$

logit_reg_pred = logit_reg.predict(valid_X)
logit_reg_proba = logit_reg.predict_proba(valid_X)
logit_result = pd.DataFrame({'actual': valid_y,
                             'p(0)': [p[0] for p in logit_reg_proba],
                             'p(1)': [p[1] for p in logit_reg_proba],
                             'predicted': logit_reg_pred})

# display four different cases
interestingCases = [2764, 932, 2721, 702]
print(logit_result.loc[interestingCases])

OUTPUT
      actual   p(0)   p(1)  predicted
2764       0  0.976  0.024          0
932        0  0.335  0.665          1
2721       1  0.032  0.968          1
702        1  0.986  0.014          0
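
The predicted column in this output is consistent with a 0.5 cutoff on p(1). A small sketch, using only the four rows shown and assuming that cutoff, reproduces the predictions and flags the misclassified cases. The variable names and the `cutoff` value are illustrative assumptions.

```python
# Rows copied from the output above: case id -> (actual class, p(1)).
cases = {
    2764: (0, 0.024),
    932: (0, 0.665),
    2721: (1, 0.968),
    702: (1, 0.014),
}

cutoff = 0.5  # assumed cutoff; a different value would change the predictions

# Classify as 1 when the estimated probability reaches the cutoff.
predicted = {cid: int(p1 >= cutoff) for cid, (_, p1) in cases.items()}
misclassified = [cid for cid, (actual, _) in cases.items() if predicted[cid] != actual]

print("predicted:", predicted)
print("misclassified cases:", misclassified)
```

Cases 932 and 702 are the two errors: one non-acceptor predicted as an acceptor and one acceptor predicted as a non-acceptor.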
INTERPRETING PROBABILITY AND ODDS

EVALUATING CLASSIFICATION PERFORMANCE
classificationSummary(train_y, logit_reg.predict(train_X))
classificationSummary(valid_y, logit_reg.predict(valid_X))

OUTPUT
Confusion Matrix (Accuracy 0.9080)

       Prediction
Actual    0    1
     0 2632   81
     1  195   92
Confusion Matrix (Accuracy 0.9110)

       Prediction
Actual    0    1
     0 1763   44
     1  134   59
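
Accuracy alone can hide how well the minority class is captured. As a sketch, the counts from the second confusion matrix above can be turned into accuracy, sensitivity, and specificity by hand; the variable names are illustrative, and class 1 is treated as the positive class.

```python
# Counts copied from the second confusion matrix above (rows = actual, columns = predicted).
tn, fp = 1763, 44   # actual 0: predicted 0, predicted 1
fn, tp = 134, 59    # actual 1: predicted 0, predicted 1

total = tn + fp + fn + tp
accuracy = (tp + tn) / total        # share of all records classified correctly
sensitivity = tp / (tp + fn)        # share of actual 1s predicted as 1
specificity = tn / (tn + fp)        # share of actual 0s predicted as 0

print(f"n = {total}")
print(f"accuracy    = {accuracy:.4f}")
print(f"sensitivity = {sensitivity:.4f}")
print(f"specificity = {specificity:.4f}")
```

The accuracy matches the 0.9110 reported in the output, while the sensitivity shows that fewer than a third of the actual class 1 records are identified.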
VARIABLE SELECTION




MODEL SELECTION


SUMMARY
