Logistic Regression

Logistic regression is used for classification problems where the response variable is categorical. It models the probability of an event occurring versus not occurring. Some examples include predicting loan defaults, fraud detection, customer churn, and propensity to buy models. Unlike linear regression, which predicts absolute values, logistic regression predicts probabilities. It uses a sigmoid function to map predictor variable values to a probability between 0 and 1. Model parameters are estimated using maximum likelihood estimation to minimize the error between predicted and actual probabilities. Thresholds can be selected using methods like ROC curves to optimize sensitivity and specificity for classification.


Logistic regression model

Case studies for choice models

Choice models cater to cases where the response variable is categorical:
- Home loan / credit card / consumer loan defaults {default vs. no default}
- Fraud detection {fraud vs. no fraud}
- Customer churn analysis {churn vs. no churn}
- Propensity-to-buy models {buy vs. no buy}

Linear regression is a bad choice when response variables are categorical

- Clearly the simplest model could be y = 1 when tumor size is greater than 5
- In the first model one could do that by saying y_predicted > 0.5
- Adding a few more grey points should not result in a new model or a new line, because in reality the cut has not changed (see the sketch below)
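
A minimal sketch of this point, using made-up tumor-size data (the sizes, labels, and the boundary at 5 are illustrative): thresholding a least-squares line at y_predicted > 0.5 gives a cut near 5 at first, but adding a few clearly malignant points far to the right moves the fitted line, and with it the implied cut, even though the true boundary has not changed.

```python
import numpy as np

def linear_cutoff(sizes, labels):
    """Fit y = a*size + b by least squares and return the size where y_hat = 0.5."""
    a, b = np.polyfit(sizes, labels, 1)
    return (0.5 - b) / a

# Illustrative data: benign (0) below size 5, malignant (1) above it
sizes = np.array([1, 2, 3, 4, 6, 7, 8, 9], dtype=float)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
print("implied cut:", linear_cutoff(sizes, labels))          # 5.0 by symmetry

# Add a few more obviously malignant points far to the right
sizes2 = np.append(sizes, [20.0, 22.0, 25.0])
labels2 = np.append(labels, [1.0, 1.0, 1.0])
print("implied cut after adding points:", linear_cutoff(sizes2, labels2))  # drifts above 6
```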

General structure for choice models

Example: home loan default. The predictor variables X are mapped to a probability of the event (a sketch follows the list).

X (predictors):
- Income
- Debt to income
- Default on other loans
- Salaried vs. business
- Expense to income
- Credit score

Output: probability of default
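
As an illustration only, here is how that structure might look in code: a hypothetical home-loan dataset with the predictors listed above, fitted with scikit-learn's LogisticRegression to produce a probability of default. All column names and values are invented for the sketch.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical applicants described by the slide's predictors
X = pd.DataFrame({
    "income":            [45_000, 120_000, 30_000, 80_000, 25_000, 95_000],
    "debt_to_income":    [0.60,   0.15,    0.75,   0.30,   0.80,   0.20],
    "defaulted_before":  [1,      0,       1,      0,      1,      0],
    "salaried":          [0,      1,       0,      1,      0,      1],   # 1 = salaried, 0 = business
    "expense_to_income": [0.90,   0.40,    0.95,   0.55,   0.92,   0.45],
    "credit_score":      [580,    760,     540,    700,    520,    740],
})
y = [1, 0, 1, 0, 1, 0]   # 1 = default, 0 = no default

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict_proba(X)[:, 1])   # probability of default for each applicant
```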

Logistic regression model

Instead of predicting an absolute value, we predict the probability of an event. The sigmoid function maps any score z to a probability between 0 and 1:

P(z) = 1/(1+exp(-z))

[Figure: sigmoid curve of probability of cancer vs. tumor size, rising from 0 toward 1 as tumor size increases]
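
A small sketch of the sigmoid above: it squashes any real-valued score z into a probability between 0 and 1. The coefficients below are hypothetical, chosen only so the curve crosses 0.5 near tumor size 5 as in the earlier example.

```python
import numpy as np

def sigmoid(z):
    """P(z) = 1 / (1 + exp(-z)) from the slide."""
    return 1.0 / (1.0 + np.exp(-z))

b0, b1 = -5.0, 1.0                      # illustrative coefficients: z = b0 + b1 * size
for size in [0, 2, 4, 5, 6, 8, 10, 14]:
    p = sigmoid(b0 + b1 * size)
    print(f"tumor size {size:>2} -> P(cancer) = {p:.3f}")
```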

Error function (analogy)

- When y = 0 the error is (p - 0) = p; when y = 1 the error is (1 - p)
- Both cases combine into a single expression per observation: minimize p^(1-y) * (1-p)^y
- Equivalently, maximize p^y * (1-p)^(1-y), which is roughly the likelihood used in MLE
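
A quick numeric check of the analogy (the probabilities are chosen arbitrarily): the error term p^(1-y) * (1-p)^y is large when the predicted probability p disagrees with the label y, while the likelihood term p^y * (1-p)^(1-y) is large when they agree, so minimizing one is the same as maximizing the other.

```python
for y in (0, 1):
    for p in (0.1, 0.9):
        error = p**(1 - y) * (1 - p)**y           # small when p matches y
        likelihood = p**y * (1 - p)**(1 - y)      # large when p matches y
        print(f"y={y}, p={p}: error={error:.1f}, likelihood={likelihood:.1f}")
```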

MLE (Maximum Likelihood)

Estimate the parameters using maximum likelihood:

Max over β:  Σ_i [ y_i ln(p(z_i)) + (1 - y_i) ln(1 - p(z_i)) ]

where z_i = β·x_i
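
A minimal sketch of this estimation step, assuming simple gradient ascent on the log-likelihood (real work would use statsmodels or scikit-learn; the toy tumor data is illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_mle(X, y, lr=0.1, steps=5000):
    """Maximize sum_i [y_i*ln(p(z_i)) + (1-y_i)*ln(1-p(z_i))] with z_i = beta . x_i."""
    X = np.column_stack([np.ones(len(X)), X])    # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ beta)
        gradient = X.T @ (y - p)                 # gradient of the log-likelihood
        beta += lr * gradient / len(y)
    return beta

# Toy data: tumor sizes and malignant (1) / benign (0) labels
X = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
beta = fit_mle(X, y)
print("beta:", beta)
print("P(y=1 | size=5):", sigmoid(beta[0] + beta[1] * 5.0))   # 0.5 by symmetry
```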

Churn Model Example

Setting a threshold for classification

Cases scoring above the threshold are classified as positive, those below as negative.

- High threshold -> high accuracy, low capture
- Low threshold -> low accuracy, high capture (see the sketch below)
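
A sketch of this trade-off, reading "accuracy" as precision among the cases flagged positive and "capture" as the share of actual positives caught; the scores and labels are made up, only the direction of the effect matters.

```python
import numpy as np

scores = np.array([0.95, 0.90, 0.80, 0.70, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10])
actual = np.array([1,    1,    1,    0,    1,    0,    1,    0,    0,    0])

for threshold in (0.75, 0.35):
    flagged = scores >= threshold
    precision = (actual[flagged] == 1).mean()    # "accuracy" among flagged cases
    capture = flagged[actual == 1].mean()        # share of actual positives captured
    print(f"threshold={threshold}: accuracy={precision:.2f}, capture={capture:.2f}")
```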

Picking a threshold: KS chart

- Divide the population into deciles
- Take the upper limit of each decile and plot the cumulative percentage of good and bad examples
- Pick the score/threshold of the decile where the separation between good and bad is at its maximum (a sketch follows)
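
A sketch of the KS procedure above using pandas: split the scored population into deciles, accumulate the percentage of bad (y = 1) and good (y = 0) cases from the highest-scoring decile downward, and take the cut-off of the decile where the gap is widest. The scores and labels here are simulated stand-ins for real model output.

```python
import numpy as np
import pandas as pd

def ks_threshold(scores, y):
    df = pd.DataFrame({"score": scores, "y": y})
    df["decile"] = pd.qcut(df["score"], 10, labels=False, duplicates="drop")
    table = (df.groupby("decile")
               .agg(cutoff=("score", "min"), bad=("y", "sum"), total=("y", "size"))
               .sort_index(ascending=False))           # highest-score decile first
    table["good"] = table["total"] - table["bad"]
    table["cum_bad_pct"] = table["bad"].cumsum() / table["bad"].sum()
    table["cum_good_pct"] = table["good"].cumsum() / table["good"].sum()
    table["ks"] = table["cum_bad_pct"] - table["cum_good_pct"]
    best = table["ks"].idxmax()
    return table.loc[best, "cutoff"], table.loc[best, "ks"]

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)
scores = np.clip(0.3 + 0.4 * y + rng.normal(0, 0.15, 1000), 0, 1)   # simulated scores
cutoff, ks = ks_threshold(scores, y)
print(f"max KS = {ks:.2f} at score cutoff = {cutoff:.2f}")
```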

Truth table to measure accuracy

True Positive Rate = True Positives / Total Actual True (sensitivity)
True Negative Rate = True Negatives / Total Actual False (specificity)

                    Actual True       Actual False
Predicted True      True Positive     False Positive
Predicted False     False Negative    True Negative
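
A short sketch of the truth table and the two rates above, assuming predicted labels have already been produced by thresholding the model's probabilities (the labels below are illustrative):

```python
import numpy as np

actual    = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
predicted = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])

tp = np.sum((predicted == 1) & (actual == 1))
fp = np.sum((predicted == 1) & (actual == 0))
tn = np.sum((predicted == 0) & (actual == 0))
fn = np.sum((predicted == 0) & (actual == 1))

sensitivity = tp / (tp + fn)   # true positive rate: TP / total actual true
specificity = tn / (tn + fp)   # true negative rate: TN / total actual false
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```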

Max sensitivity and specificity

Choose the threshold where sensitivity and specificity are jointly maximized (one common rule is sketched below).
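
One common way to make "jointly maximized" concrete is to pick the threshold with the highest sensitivity + specificity (Youden's J); the slide does not name a specific rule, so this is one reasonable reading. A sketch with scikit-learn's roc_curve on simulated scores:

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 500)
y_score = np.clip(0.3 + 0.4 * y_true + rng.normal(0, 0.2, 500), 0, 1)  # simulated scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)
j = tpr + (1 - fpr) - 1                    # sensitivity + specificity - 1
best = np.argmax(j)
print(f"chosen threshold = {thresholds[best]:.2f} "
      f"(sensitivity = {tpr[best]:.2f}, specificity = {1 - fpr[best]:.2f})")
```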

Goodness of fit: ROC curve

- The dotted line represents the case where the model has not learnt anything, i.e. it picks the same percentage of false positives as of true positives
- The area under the blue curve therefore represents the goodness of fit (0.5 < Area < 1)
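
A sketch of this check with scikit-learn's AUC on simulated data: an informative model scores well above the 0.5 "learnt nothing" baseline, while random scores sit near it.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, 1000)
informative = np.clip(0.3 + 0.4 * y_true + rng.normal(0, 0.2, 1000), 0, 1)
random_guess = rng.random(1000)

print("AUC, informative scores:", round(roc_auc_score(y_true, informative), 2))   # well above 0.5
print("AUC, random scores:     ", round(roc_auc_score(y_true, random_guess), 2))  # about 0.5
```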
