
Classification Methods

Logistic Regression Model


Introduction
• The linear regression model assumes that the response variable Y is
quantitative.
• In many situations, the response variable is instead qualitative.
• Often qualitative variables are referred to as categorical; we will use
these terms interchangeably.
• Here we study approaches for predicting qualitative responses, a
process that is known as classification.
Introduction
• Predicting a qualitative response for an observation can be referred to
as classifying that observation, since it involves assigning the
observation to a category, or class.
• The methods used for classification generally predict the probability
of each of the categories of the qualitative variable, as the basis for
making the classification. In this sense they also behave like
regression methods.
• Here we will use logistic regression in modelling a binary response
variable.
Logistic Regression
• Instead of modelling the response Default directly, logistic
regression models the probability that Default belongs to a particular
category.
• For example, the probability of default given balance can be written
as Pr(Default = Yes | Balance).
• The values of Pr(Default = Yes | Balance), which we denote by
p(balance), will range between 0 and 1.
Logistic Regression
• Then for any given value of balance, a prediction can be made for
default.
• For example, one might predict default = Yes for any individual for
whom p(balance) > 0.5.
• Alternatively, if a company wishes to be conservative in predicting
individuals who are at risk for default, then it may choose to use a
lower threshold, such as p(balance) > 0.1.
Logistic Model
• We need to model p(X) using a function that gives outputs between
0 and 1 for all values of X.
• Many functions meet this criterion.
• In logistic regression, we use the logistic function,

$$p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \quad \ldots\ldots (1)$$

• β0 and β1 are the parameters of the model.
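The logistic function in Equation (1) is easy to sketch in code. This is a minimal illustration; the parameter values passed in below are arbitrary placeholders, not estimates from the Default data:

```python
import math

def logistic(x, beta0, beta1):
    """Logistic function p(X) = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)).

    Written in the algebraically equivalent form 1 / (1 + e^(-z)),
    which avoids overflow for large positive z.
    """
    z = beta0 + beta1 * x
    return 1 / (1 + math.exp(-z))

# The output always lies strictly between 0 and 1, whatever x is.
print(logistic(0.0, 0.0, 1.0))  # 0.5 at z = 0
```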


Logistic Model
• After a bit of manipulation of Equation (1), we find that

$$\frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X} \quad \ldots\ldots (2)$$

• The quantity p(X)/[1 − p(X)] is called the odds, and can take on any
value between 0 and ∞.
Odds
• Values of the odds close to 0 and ∞ indicate very low and very high
probabilities of default, respectively.
• For example, on average 1 in 5 people with an odds of 1/4 will
default, since p(X) = 0.2 implies an odds of 0.2/(1 − 0.2) = 1/4.
• Likewise, on average nine out of every ten people with an odds of 9
will default, since p(X) = 0.9 implies an odds of 0.9/(1 − 0.9) = 9.
• Odds are traditionally used instead of probabilities in horse-racing,
since they relate more naturally to the correct betting strategy.
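The probability-to-odds conversion in the two worked examples above can be reproduced directly:

```python
def odds(p):
    """Convert a probability p into odds p / (1 - p)."""
    return p / (1 - p)

print(odds(0.2))  # 0.25, i.e. odds of 1/4: "1 in 5 people default"
print(odds(0.9))  # 9.0: "nine out of every ten default"
```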
Logistic Model
• Alternatively, Equation (2) can be written as

$$\ln\!\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X \quad \ldots\ldots (3)$$

• The left-hand side is called the log-odds or logit.


• We see that the logistic regression model (1) has a logit that is linear
in X.
Interpreting the Coefficient Table
• We see that β1 = 0.0055; this indicates that an increase in balance is
associated with an increase in the probability of default.
• To be precise, a one-unit increase in balance is associated with an
increase in the log odds of default by 0.0055 units.
• Also note that the estimated odds ratio for a one-unit increase in
balance is e^0.0055 ≈ 1.005.
• This indicates that for every one-unit increase in balance, the odds of
defaulting are multiplied by approximately 1.005.
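The link between the coefficient and the odds ratio is just exponentiation; this one-liner reproduces the quoted value:

```python
import math

beta1 = 0.0055                 # estimated coefficient for balance (from the slide)
odds_ratio = math.exp(beta1)   # multiplicative change in odds per one-unit increase
print(odds_ratio)              # ≈ 1.0055 (reported as 1.005 in the slide)
```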
Making Predictions
• Once the coefficients have been estimated, it is a simple matter to
compute the probability of default for any given credit card balance.
• For example, using the coefficient estimates given in the Table, we
predict that the default probability for an individual with a balance of
$1,000 is

$$p(X) = \frac{e^{\hat\beta_0 + \hat\beta_1 \times 1000}}{1 + e^{\hat\beta_0 + \hat\beta_1 \times 1000}} = 0.00576.$$

• This is below 1%.

• In contrast, the predicted probability of default for an individual with a
balance of $2,000 is much higher, and equals 0.586 or 58.6%.
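As a check on these numbers: the intercept estimate is not shown in this excerpt, but an intercept of about −10.6513 (an assumed value, the standard estimate for this example in the Introduction to Statistical Learning textbook, which these slides appear to follow) together with β1 = 0.0055 reproduces both quoted probabilities:

```python
import math

# Assumed coefficient estimates -- the slide's coefficient table is not shown
# here, so the intercept is taken from the ISLR treatment of the same example.
b0, b1 = -10.6513, 0.0055

def p_default(balance):
    """Predicted default probability from Equation (1)."""
    z = b0 + b1 * balance
    return math.exp(z) / (1 + math.exp(z))

print(round(p_default(1000), 5))  # 0.00576, below 1%
print(round(p_default(2000), 3))  # 0.586
```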
Qualitative Predictors
• One can use qualitative predictors with the logistic regression model
using the dummy variable approach.
• As an example, the Default data set contains the qualitative variable
student.
• To fit the model we simply create a dummy variable that takes on a
value of 1 for students and 0 for non-students.
• The logistic regression model that results from predicting probability
of default from student status can be seen in the next Table.
Qualitative Predictors
              Coefficient   Std. error   z-statistic   p-value    Odds Ratio
Intercept       −3.5041       0.0707       −49.55      <0.0001
Student[Yes]     0.4049       0.1150         3.52       0.0004      1.499
Interpreting the Coefficients
• The coefficient associated with the dummy variable is positive, and the
associated p-value is statistically significant.
• This indicates that students tend to have higher default probabilities than
non-students:
$$\Pr(\text{default} = \text{Yes} \mid \text{student} = \text{Yes}) = \frac{e^{-3.5041 + 0.4049 \times 1}}{1 + e^{-3.5041 + 0.4049 \times 1}} = 0.0431,$$

$$\Pr(\text{default} = \text{Yes} \mid \text{student} = \text{No}) = \frac{e^{-3.5041 + 0.4049 \times 0}}{1 + e^{-3.5041 + 0.4049 \times 0}} = 0.0292.$$
• Also, the odds of default for students are approximately 1.5 times the
odds for non-students.
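Both probabilities follow directly from the coefficients in the table above:

```python
import math

# Coefficients from the table: intercept and the Student[Yes] dummy.
b0, b_student = -3.5041, 0.4049

def p_default(is_student):
    """Predicted default probability; is_student is the 0/1 dummy variable."""
    z = b0 + b_student * is_student
    return math.exp(z) / (1 + math.exp(z))

print(round(p_default(1), 4))  # 0.0431 for students
print(round(p_default(0), 4))  # 0.0292 for non-students
```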
Multiple Logistic Regression
• We now consider the problem of predicting a binary response using
multiple predictors.
• By analogy with the extension from simple to multiple linear
regression, we can generalize Equation (3) as follows:

$$\ln\!\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p, \quad \ldots\ldots (4)$$

where X = (X_1, X_2, …, X_p) are p predictors.
Multiple Logistic Regression
• Equation (4) can be rewritten as

$$p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}} \quad \ldots\ldots (5)$$

• As before, we use the maximum likelihood method to estimate the
parameters β0, …, βp.
• The next Table shows the coefficient estimates for a logistic
regression model that uses balance, income (in thousands of dollars),
and student status to predict probability of default.
Making Predictions
• By substituting estimates for the regression coefficients from the
Table into Equation (5), we can make predictions.
• For example, a student with a credit card balance of $1,500 and an
income of $40,000 has an estimated probability of default of

$$p(X) = \frac{e^{-10.869 + 0.00574 \times 1500 + 0.003 \times 40 - 0.6468 \times 1}}{1 + e^{-10.869 + 0.00574 \times 1500 + 0.003 \times 40 - 0.6468 \times 1}} = 0.058.$$

• A non-student with the same balance and income has an estimated
probability of default of

$$p(X) = \frac{e^{-10.869 + 0.00574 \times 1500 + 0.003 \times 40 - 0.6468 \times 0}}{1 + e^{-10.869 + 0.00574 \times 1500 + 0.003 \times 40 - 0.6468 \times 0}} = 0.105.$$
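The same calculation generalizes to any number of predictors; this sketch uses the coefficient estimates quoted above:

```python
import math

# Coefficient estimates from the slide: intercept, balance, income (in $1000s),
# and the student dummy.
b0, b_bal, b_inc, b_stu = -10.869, 0.00574, 0.003, -0.6468

def p_default(balance, income_k, student):
    """Predicted default probability from Equation (5)."""
    z = b0 + b_bal * balance + b_inc * income_k + b_stu * student
    return math.exp(z) / (1 + math.exp(z))

print(round(p_default(1500, 40, 1), 3))  # 0.058 for a student
print(round(p_default(1500, 40, 0), 3))  # 0.105 for a non-student
```

Note that income enters in thousands of dollars, so $40,000 is passed as 40.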
Confusion Matrix
• In practice, a binary classifier such as logistic regression can make two
types of errors.
• It can incorrectly assign an individual who defaults to the no default
category, or it can incorrectly assign an individual who does not
default to the default category.
• It is often of interest to determine which of these two types of errors
is being made.
• A confusion matrix, shown for the Default data in the next Table, is a
convenient way to display this information.
Confusion Matrix
                            True default status
Predicted default status    No        Yes       Total
No                          9627      228        9855
Yes                           40      105         145
Total                       9667      333       10000

Sensitivity = 105/333 = 31.53%
Specificity = 9627/9667 = 99.59%
Total Error Rate = 268/10000 = 2.68%
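The three summary measures follow from the four cell counts of the matrix:

```python
# Cell counts from the confusion matrix above.
tn, fn = 9627, 228   # predicted No:  truly No, truly Yes (missed defaulters)
fp, tp = 40, 105     # predicted Yes: truly No (false alarms), truly Yes

sensitivity = tp / (tp + fn)                    # 105 / 333
specificity = tn / (tn + fp)                    # 9627 / 9667
total_error = (fn + fp) / (tn + fn + fp + tp)   # 268 / 10000

print(f"{sensitivity:.2%} {specificity:.2%} {total_error:.2%}")  # 31.53% 99.59% 2.68%
```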
Confusion Matrix
• The table reveals that the logistic regression predicted that a total of
145 people would default.
• Of these people, 105 actually defaulted and 40 did not.
• Hence only 40 out of 9,667 of the individuals who did not default
were incorrectly labelled.
• This looks like a pretty low error rate!
Confusion Matrix
• However, of the 333 individuals who defaulted, 228 (or 68.47%) were
missed by Logistic Regression.
• So while the overall error rate is low, the error rate among individuals
who defaulted is very high.
• From the perspective of a credit card company that is trying to
identify high-risk individuals, an error rate of 228/333 = 68.47%
among individuals who default may well be unacceptable.
Sensitivity and Specificity
• The terms sensitivity and specificity characterize the performance of a
classifier.
• In this case the sensitivity is the percentage of true defaulters that are
identified: a low 31.53%.
• The specificity is the percentage of non-defaulters that are correctly
identified, here (1 − 40/9,667) × 100 = 99.59%.
Improving Logistic Regression Classifier
• For example, we might assign any customer with a probability of
default above 20% to the default class.
• That is, we may assign an observation to the default class if

Pr(default = Yes | X = x) > 0.2.   … … … (7)
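The thresholding rule itself is a one-line comparison; this sketch shows how lowering the cutoff changes a prediction:

```python
def classify(p, threshold=0.5):
    """Assign to the default class when the predicted probability exceeds threshold."""
    return "Yes" if p > threshold else "No"

# Lowering the threshold from 0.5 to 0.2 flags more at-risk customers.
print(classify(0.3))                  # "No" under the usual 0.5 rule
print(classify(0.3, threshold=0.2))   # "Yes" under the conservative 0.2 rule
```

This trades specificity for sensitivity: more true defaulters are caught, at the cost of more false alarms.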
Deciding the Optimal Threshold
ROC Curve
• The ROC curve is used for simultaneously displaying the two types of
errors for all possible thresholds.
• The name “ROC” comes from communications theory; it is an
acronym for receiver operating characteristic.
• The overall performance of a classifier, summarized over all possible
thresholds, is given by the area under the (ROC) curve (AUC).
• An ideal ROC curve should touch the top left corner, so the larger the
AUC the better the classifier.
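A bare-bones sketch of how an ROC curve and its AUC are computed by sweeping over all thresholds. The scores and labels here are made-up illustrations, and ties in scores are not handled:

```python
# Hypothetical predicted probabilities and true labels (1 = default).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    0,   1,   0]

def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering the threshold one score at a time."""
    pos = sum(labels)
    neg = len(labels) - pos
    pts = [(0.0, 0.0)]
    tp = fp = 0
    for _, y in sorted(zip(scores, labels), reverse=True):
        tp += y          # a positive crosses the threshold: TPR rises
        fp += 1 - y      # a negative crosses the threshold: FPR rises
        pts.append((fp / neg, tp / pos))
    return pts

def auc(pts):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

print(auc(roc_points(scores, labels)))  # 0.75 for this toy example
```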
General Rule
AUC                   Decision
AUC = 0.5             No discrimination
0.7 ≤ AUC < 0.8       Acceptable discrimination
0.8 ≤ AUC < 0.9       Excellent discrimination
AUC ≥ 0.9             Outstanding discrimination
How well does the model fit the data?
• The Hosmer-Lemeshow (HL) test is widely used to address the
question “How well does my model fit the data?”
• It serves as a goodness-of-fit (GOF) test for the logistic regression
model.
• This test is used to find out whether there is any significant evidence
against the model fitting the data well.
• If the 𝑝-value is small, this is indicative of poor fit.
• For the Default data set, the observed 𝑝-value is 0.8846, indicating
that there is no evidence of poor fit.
• So there is no significant evidence that the model is misspecified.
How well does the model fit the data?
• The logistic regression model is fitted using the method of maximum likelihood.
• The parameter estimates are those values which maximize the likelihood of the
data which have been observed.
• McFadden's R-squared measure is given by

$$R^2 = 1 - \frac{\log L_C}{\log L_{NULL}},$$

where L_C denotes the (maximized) likelihood value from the current fitted model,
and L_NULL denotes the corresponding value for the null model: the model with
only an intercept and no predictors.
• McFadden's R-squared measure also takes values between 0 and 1.
• For the Default data set, McFadden's R-squared value is 46.19%, indicating that
the model may be useful in practice.
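Given the two maximized log-likelihoods from the software output, the measure is a single ratio. The log-likelihood values below are illustrative assumptions (not taken from the slides), chosen so that the result matches the quoted 46.19%:

```python
# Hypothetical maximized log-likelihoods; in practice these come from the
# fitted logistic regression output (e.g. the model and null deviances).
ll_model = -785.8    # log L_C, current fitted model (illustrative value)
ll_null = -1460.3    # log L_NULL, intercept-only model (illustrative value)

mcfadden_r2 = 1 - ll_model / ll_null
print(round(mcfadden_r2, 4))  # 0.4619, i.e. about 46.19%
```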
Training Error Rate and Test Error Rate
• The misclassification error rate calculated earlier with the optimal
threshold was 13.81%.
• However, we have used the same data to train and test our model.
• In reality, this error rate is in fact the training error rate.
• In order to assess the accuracy of the model, we should first fit a
model using a part of the data and then should examine the
performance on the “hold-out” data.
• This error rate is called the test error rate.
• Next, we use 80% of the observations to fit the model and keep the
remaining 20% aside for validating the model.
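The hold-out split itself can be sketched as follows; the model is then fitted on the training indices only, and the test error rate is the misclassification rate on the held-out 20%:

```python
import random

n = 10_000                # observations in the Default data set
random.seed(42)           # fixed seed so the split is reproducible

indices = list(range(n))
random.shuffle(indices)   # random assignment to train/test

cut = int(0.8 * n)
train_idx, test_idx = indices[:cut], indices[cut:]

print(len(train_idx), len(test_idx))  # 8000 2000
```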
