Binary Logistic (5)
$$p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \qquad (1)$$
$$\frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X} \qquad (2)$$
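To see why (1) and (2) are equivalent, note that

$$1 - p(X) = 1 - \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} = \frac{1}{1 + e^{\beta_0 + \beta_1 X}},$$

so dividing $p(X)$ by $1 - p(X)$ cancels the common denominator and leaves $e^{\beta_0 + \beta_1 X}$.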
• The quantity $p(X)/[1 - p(X)]$ is called the odds, and can take on any value between 0 and ∞.
Odds
• Values of the odds close to 0 and ∞ indicate very low and very high
probabilities of default, respectively.
• For example, on average 1 in 5 people with an odds of 1/4 will default, since $p(X) = 0.2$ implies an odds of $0.2/(1 - 0.2) = 1/4$.
• Likewise on average nine out of every ten people with an odds of 9
will default, since 𝑝 𝑋 = 0.9 implies an odds of 0.9/(1 − 0.9) = 9.
• Odds are traditionally used instead of probabilities in horse-racing,
since they relate more naturally to the correct betting strategy.
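The probability-odds conversion is simple enough to check directly; a minimal Python sketch (standalone, no data assumed):

```python
def odds(p):
    """Convert a probability p into odds p / (1 - p)."""
    return p / (1 - p)

def prob(o):
    """Convert odds o back into a probability o / (1 + o)."""
    return o / (1 + o)

print(odds(0.2))  # 0.25 -> the "1 in 5 default" example
print(odds(0.9))  # 9.0  -> the "9 in 10 default" example
print(prob(9.0))  # 0.9
```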
Logistic Model
• Alternatively, (2) can be written as
$$\ln\!\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X \qquad (3)$$
• For the Default data, the fitted coefficients (student, coded 0/1, as the only predictor) are $\hat\beta_0 = -3.5041$ and $\hat\beta_1 = 0.4049$, so the predicted probability of default for a non-student is
$$\Pr(\text{default} = \text{Yes} \mid \text{student} = \text{No}) = \frac{e^{-3.5041 + 0.4049 \times 0}}{1 + e^{-3.5041 + 0.4049 \times 0}} = 0.0292.$$
• Also, the odds of default for students are $e^{0.4049} \approx 1.5$ times the odds for those who are not students.
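These numbers are easy to verify from the quoted coefficients; a small sketch:

```python
import math

b0, b1 = -3.5041, 0.4049  # coefficient estimates quoted above

def p_default(student):
    """Predicted default probability from the fitted model (student coded 0/1)."""
    eta = b0 + b1 * student                     # linear predictor
    return math.exp(eta) / (1 + math.exp(eta))  # inverse-logit

print(p_default(0))   # non-student: ~0.0292
print(p_default(1))   # student:     ~0.0431
print(math.exp(b1))   # odds ratio:  ~1.50
```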
Multiple Logistic Regression
• We now consider the problem of predicting a binary response using
multiple predictors.
• By analogy with the extension from simple to multiple linear
regression, we can generalize Equation (3) as follows:
$$\ln\!\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p \qquad (4)$$
where $X = (X_1, X_2, \ldots, X_p)$ are $p$ predictors.
• Equation (4) can be rewritten as
$$p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}} \qquad (5)$$
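A minimal sketch of fitting (5) with statsmodels, assuming a pandas DataFrame df with numeric predictor columns balance, income, student (0/1) and a 0/1 response default (the column names are placeholders, not confirmed by the slides):

```python
import statsmodels.api as sm

# Assumed: df holds numeric predictor columns and a 0/1 response column.
X = sm.add_constant(df[["balance", "income", "student"]])  # prepend intercept
y = df["default"]

model = sm.Logit(y, X).fit()  # maximum-likelihood fit of the logistic model
print(model.summary())        # estimates, standard errors, z-statistics
```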
Confusion Matrix
The confusion matrix for the Default data implied by the counts discussed below (rows: predicted class; columns: true class):

Predicted \ True       No      Yes    Total
No                   9627      228     9855
Yes                    40      105      145
Total                9667      333    10000

$$\text{Sensitivity} = \frac{105}{333} = 31.53\%$$
$$\text{Specificity} = \frac{9627}{9667} = 99.59\%$$
$$\text{Total Error Rate} = \frac{268}{10000} = 2.68\%$$
• The table reveals that the logistic regression predicted that a total of
145 people would default.
• Of these people, 105 actually defaulted and 40 did not.
• Hence only 40 out of the 9,667 individuals who did not default were incorrectly labelled.
• This looks like a pretty low error rate!
• However, of the 333 individuals who defaulted, 228 (or 68.47%) were missed by the logistic regression classifier.
• So while the overall error rate is low, the error rate among individuals
who defaulted is very high.
• From the perspective of a credit card company that is trying to
identify high-risk individuals, an error rate of 228/333 = 68.47%
among individuals who default may well be unacceptable.
Sensitivity and Specificity
• The terms sensitivity and specificity characterize the performance of a classifier.
• Here the sensitivity is the percentage of true defaulters that are identified: a low $105/333 = 31.53\%$.
• The specificity is the percentage of non-defaulters that are correctly identified: here $(1 - 40/9{,}667) \times 100 = 99.59\%$.
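These quantities are simple functions of the confusion-matrix counts; a quick check in Python:

```python
# Confusion-matrix counts from the Default data discussion above
TP, FN = 105, 228   # defaulters classified correctly / missed
TN, FP = 9627, 40   # non-defaulters classified correctly / mislabelled

sensitivity = TP / (TP + FN)                   # 105 / 333
specificity = TN / (TN + FP)                   # 9627 / 9667
error_rate = (FP + FN) / (TP + FN + TN + FP)   # 268 / 10000

print(f"{sensitivity:.2%}  {specificity:.2%}  {error_rate:.2%}")
# 31.53%  99.59%  2.68%
```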
Improving Logistic Regression Classifier
• By default, an observation is assigned to the default class when its predicted probability exceeds 0.5; a credit card company trying to catch more defaulters may prefer a lower threshold.
• For example, we might assign any customer with a predicted probability of default above 20% to the default class.
• That is, we assign an observation to the default class if
$$\Pr(\text{default} = \text{Yes} \mid X = x) > 0.2 \qquad (7)$$
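A sketch of applying the lower threshold to the predicted probabilities, continuing the hypothetical statsmodels fit from earlier:

```python
p_hat = model.predict(X)            # predicted default probabilities
y_pred = (p_hat > 0.2).astype(int)  # label as default when Pr > 0.2
print(int(y_pred.sum()), "customers labelled as defaulters at the 0.2 threshold")
```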
Deciding the Optimal Threshold
ROC Curve
• The ROC curve is a graphic used for simultaneously displaying the two types of errors for all possible thresholds.
• The name “ROC” comes from communications theory. It is an
acronym for receiver operating characteristics.
• The overall performance of a classifier, summarized over all possible
thresholds, is given by the area under the (ROC) curve (AUC).
• An ideal ROC curve should touch the top left corner, so the larger the
AUC the better the classifier.
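A sketch of tracing the ROC curve and computing the AUC with scikit-learn, reusing the y and p_hat from the sketches above:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

fpr, tpr, thresholds = roc_curve(y, p_hat)  # one point per threshold
print("AUC =", roc_auc_score(y, p_hat))

plt.plot(fpr, tpr)              # ROC curve
plt.plot([0, 1], [0, 1], "--")  # chance line (AUC = 0.5)
plt.xlabel("False positive rate (1 - specificity)")
plt.ylabel("True positive rate (sensitivity)")
plt.show()
```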
General Rule
AUC                    Decision
AUC = 0.5              No discrimination
0.7 ≤ AUC < 0.8        Acceptable discrimination
0.8 ≤ AUC < 0.9        Excellent discrimination
AUC ≥ 0.9              Outstanding discrimination
How well does the model fit the data?
• The Hosmer-Lemeshow (HL) test is widely used to address the
question “How well does my model fit the data?”
• It serves as a goodness-of-fit (GOF) test for the logistic regression
model.
• This test is used to find out whether there is any significant evidence
against the model fitting the data well.
• If the 𝑝-value is small, this is indicative of poor fit.
• For the Default data set, the observed 𝑝-value is 0.8846, indicating
that there is no evidence of poor fit.
• So there is no evidence that the model is misspecified.
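Neither scikit-learn nor statsmodels ships a Hosmer-Lemeshow test, so here is a minimal sketch of the usual deciles-of-risk construction (one common variant; details such as the grouping rule vary):

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y_true, p_hat, g=10):
    """HL goodness-of-fit test: chi-square over g groups of fitted risk."""
    order = np.argsort(p_hat)
    y_s = np.asarray(y_true)[order]
    p_s = np.asarray(p_hat)[order]
    stat = 0.0
    for idx in np.array_split(np.arange(len(p_s)), g):  # ~equal-size groups
        n, obs, exp = len(idx), y_s[idx].sum(), p_s[idx].sum()
        stat += (obs - exp) ** 2 / exp + ((n - obs) - (n - exp)) ** 2 / (n - exp)
    return stat, chi2.sf(stat, df=g - 2)  # statistic, p-value (g - 2 df)

# e.g. stat, p = hosmer_lemeshow(y, model.predict(X))
```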
McFadden's R-squared
• The logistic regression model is fitted using the method of maximum likelihood.
• The parameter estimates are the values that maximize the likelihood of the observed data.
• McFadden's R squared measure is given by
$$R^2 = 1 - \frac{\log L_C}{\log L_{\text{NULL}}},$$
where 𝐿𝐶 denotes the (maximized) likelihood value from the current fitted model,
and 𝐿𝑁𝑈𝐿𝐿 denotes the corresponding value for the null model - the model with
only an intercept and no predictors.
• McFadden's R-squared also takes values between 0 and 1.
• For the Default data set, McFadden's R-squared is 46.19%, indicating that the model may be useful in practice.
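With the statsmodels fit from the earlier sketch, McFadden's measure can be computed directly (statsmodels also reports it as prsquared):

```python
# model is the fitted statsmodels Logit result from the earlier sketch
mcfadden = 1 - model.llf / model.llnull  # 1 - log L_C / log L_NULL
print(mcfadden)
print(model.prsquared)  # statsmodels' built-in McFadden pseudo R-squared
```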
Training Error Rate and Test Error Rate
• The misclassification error rate calculated earlier with the optimal
threshold was 13.81%.
• However, we have used the same data to train and test our model.
• This error rate is therefore the training error rate.
• In order to assess the accuracy of the model, we should first fit a
model using a part of the data and then should examine the
performance on the “hold-out” data.
• This error rate is called the test error rate.
• Next, we use 80% of the observations to fit the model and keep the remaining 20% aside for validating it.
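A sketch of this 80/20 split with scikit-learn, using the same hypothetical column names as before (note that sklearn's LogisticRegression applies L2 regularization by default):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_all = df[["balance", "income", "student"]]
y_all = df["default"]
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.2, random_state=1)  # 80% train / 20% hold-out

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
test_error = 1 - clf.score(X_test, y_test)  # misclassification rate on hold-out
print(f"Test error rate: {test_error:.2%}")
```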