
Topic 7 Regression (cont)

Logistic regression
Vincent Hoang (2022), Lecture 11
Logistic regression analysis
Logistic regression for a categorical dependent variable.
How to assess a model's performance using different metrics, depending on the research purpose, for
classification models.
Loan application modelling
• There are many factors that determine whether a loan application is approved
or not approved by the bank
◦ What is the dependent variable?

• Note that our dependent variable is not continuous but rather
dichotomous (the original data are ordinal)
• Now we can use logistic regression to build this specific dichotomous
classification model.
Classification models
• Classification methods seek to classify a categorical outcome into one of
two or more categories based on various data attributes.
• For each record (observation) in a dataset, we have a categorical variable
of interest and additional predictor variables.
• For a given set of predictor variables, we would like to assign the best
value of the categorical variable (dependent variable).
◦ Should we approve this loan or not? Is this person high risk or not?
• Very useful practical implications: efficient allocation of resources,
targeted marketing, better decisions, etc.
Visual classification methods
• A simple plot says a lot – can you see any distinct points where loans go from
being approved to not approved, looking at credit history and credit score?

[Bubble plots of years of credit history against credit score; large bubbles
indicate rejected applications]

Rule illustrated: Reject if Credit Score < 640
Rule illustrated: Reject if Years + 0.095 × (Credit Score) < 74.66
Logistic regression (1)
• It uses predictor variables to estimate the log-odds (not the probabilities) of an
observation belonging to the category of interest (Y = 1):

ln(p / (1 − p)) = β0 + β1X1 + β2X2 + … + βkXk

where ln is the natural log and p is the probability that the event of interest occurs.
• Once estimated, we can use algebra to rearrange the above equation for the
predicted probability (p):

p = 1 / (1 + e^−(β0 + β1X1 + … + βkXk))
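To make the algebra concrete, here is a minimal Python sketch of the log-odds-to-probability conversion; the coefficient values are made up for illustration and are not from the lecture's model:

```python
import math

def predict_probability(intercept, coefs, x):
    """Convert the log-odds from a logistic model into a probability."""
    # Log-odds: ln(p / (1 - p)) = b0 + b1*x1 + ... + bk*xk
    log_odds = intercept + sum(b * xi for b, xi in zip(coefs, x))
    # Rearranged: p = 1 / (1 + e^(-log_odds))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Illustrative values only (not the fitted loan model)
p = predict_probability(intercept=-2.0, coefs=[0.05], x=[60])
print(round(p, 3))  # 0.731
```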


Logistic regression vs linear regression
• We could use simple or multiple regression to estimate the probabilities – this tends to work well
near the centre of the data, but does poorly in the tails

• Predicted probabilities from logistic regression are bounded between 0 and 1, while predicted
probabilities from linear regression can fall outside these bounds

• Logistic regression uses maximum likelihood estimation to estimate parameters, while linear
regression uses ordinary least squares.
Credit approval data
[Slide shows the credit approval dataset]
Probabilities vs odds
• p: the probability that an event will occur, i.e. the fraction of times you expect to see
that event in many trials.
• The odds are defined as the probability that the event will occur divided by the
probability that the event will not occur.
• Example: If the horse runs 100 races and wins 80, the probability of winning is
80/100 = 0.80 or 80%, and the odds of winning are 80/20 = 4 to 1.
◦ The probability of success is p = 0.8, so the probability of failure is 1 − p = 0.2
◦ The odds of success are p / (1 − p) = 0.8 / 0.2 = 4, i.e. the odds of success are 4 to 1
◦ The odds of failure are (1 − p) / p = 0.2 / 0.8 = 0.25 to 1
Odds vs odds ratios
• Example: seven out of 10 males are admitted to an engineering school while three of 10
females are admitted.
• For males:
◦ the probability of admission is p = 0.7, so the probability of failure is 1 − p = 0.3
◦ the odds of success are 0.7 / 0.3 = 2.333
• For females:
◦ the probability of admission is p = 0.3, so the probability of failure is 1 − p = 0.7
◦ the odds of success are 0.3 / 0.7 = 0.42857
• The odds ratio for admission is OR = 2.333 / 0.42857 = 5.44. So for a male, the odds
of being admitted are 5.44 times as large as the odds for a female being admitted.
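A minimal Python sketch reproducing the arithmetic above:

```python
def odds(p):
    """Odds of an event that occurs with probability p."""
    return p / (1 - p)

odds_male = odds(0.7)     # 0.7 / 0.3 = 2.333...
odds_female = odds(0.3)   # 0.3 / 0.7 = 0.42857...
print(round(odds_male / odds_female, 2))  # odds ratio: 5.44
```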
Logistic regression (2)
• If we know the odds value, we can calculate the probability value.
• For example, re-arranging the definition of the odds gives the predicted probability (p):

p = odds / (1 + odds)

For the horse example above, odds of 4 give p = 4 / (1 + 4) = 0.8.
• The predicted probabilities smoothly approach 0 and 1 as values of the independent
variables go to their own extremes.
• If we plot the predictions against a single continuously-distributed independent
variable while holding the others fixed, we see an S-shaped or reverse-S-shaped
"logistic curve."
Logistic regression (3)
• There are applications of logistic regression in which the objective is to build a
model to generate a predicted probability of the dependent event under a given set
of values for the independent variables.
• In other settings the objective is to make categorical predictions: a definite 1 or a
definite 0 in each case (as in our loan application case study)
◦ Usually this is done by adding one more parameter to the model: a cut-off value.
◦ If the predicted probability of a positive outcome is greater than the cut-off value, the
observation is categorically predicted to be a 1, otherwise 0.
◦ The chosen cut-off value depends on the relative consequences of type 1 and type 2 errors (false
positives and false negatives), although a value of 0.5 is often used by default (see the sketch below).
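A minimal Python sketch of applying a cut-off value; the predicted probabilities here are made-up illustrations:

```python
def classify(probabilities, cutoff=0.5):
    """Label each predicted probability as 1 (positive) or 0 (negative)."""
    return [1 if p > cutoff else 0 for p in probabilities]

predicted = [0.92, 0.35, 0.61, 0.08, 0.55]  # illustrative probabilities
print(classify(predicted))                  # [1, 0, 1, 0, 1]
print(classify(predicted, cutoff=0.7))      # [1, 0, 0, 0, 0]
```

Raising the cut-off makes the model stricter: fewer cases are predicted positive, which typically lowers sensitivity and raises specificity.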
Measuring classification performance
• How well did our chosen model do? We normally look at three measures:
1. Accuracy – the percentage of observations that were correctly classified
2. Sensitivity – the percentage of positives (Dependent Variable = 1) that were correctly classified
3. Specificity – the percentage of negatives (Dependent Variable = 0) that were correctly classified

• We can also estimate our model using a subset of our data and evaluate its
performance out-of-sample in a test sample (see the sketch below).
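A minimal Python sketch computing the three measures from confusion-matrix counts; the counts match the classification results reported later in these slides:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct / all observations
    sensitivity = tp / (tp + fn)                # correct positives / actual positives
    specificity = tn / (tn + fp)                # correct negatives / actual negatives
    return accuracy, sensitivity, specificity

# 21 of 23 approvals and 24 of 27 rejections correctly classified
acc, sens, spec = classification_metrics(tp=21, tn=24, fp=3, fn=2)
print(f"accuracy={acc:.1%} sensitivity={sens:.1%} specificity={spec:.1%}")
# accuracy=90.0% sensitivity=91.3% specificity=88.9%
```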
Partitioning data
• Partitioning a data set is splitting the data randomly into two,
sometimes three smaller data sets: Training, Validation and Test.
◦ Training: The subset of data used to create (build) a model
◦ Validation: the subset of data that remains unseen when building the model
and is used to tune the model parameter estimates.
◦ Test (hold-out): A subset of data used to measure overall model performance
and compare the performance among different candidate models.
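A minimal Python sketch of a random partition; the 60/20/20 split proportions are an assumption for illustration, not from the lecture:

```python
import random

def partition(records, train=0.6, validation=0.2, seed=42):
    """Randomly split records into training, validation, and test subsets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (shuffled[:n_train],                 # training: build the model
            shuffled[n_train:n_train + n_val],  # validation: tune the model
            shuffled[n_train + n_val:])         # test (hold-out): compare models

training, validation, test = partition(list(range(50)))
print(len(training), len(validation), len(test))  # 30 10 10
```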
Classification performance - Accuracy
• Accuracy is simply the percentage of observations that are correctly classified:
Accuracy = (True Negatives + True Positives) / Overall Total

                  Predicted: No = 0            Predicted: Yes = 1           Total
Actual: No = 0    True Negative                False Positive               Total Actual Negatives
Actual: Yes = 1   False Negative               True Positive                Total Actual Positives
Total             Total Predicted Negatives    Total Predicted Positives    Overall Total
Classification performance - Sensitivity
• Sensitivity is the percentage of positives (Dummy = 1) that are correctly classified:
Sensitivity = True Positives / Total Actual Positives

                  Predicted: No = 0            Predicted: Yes = 1           Total
Actual: No = 0    True Negative                False Positive               Total Actual Negatives
Actual: Yes = 1   False Negative               True Positive                Total Actual Positives
Total             Total Predicted Negatives    Total Predicted Positives    Overall Total
Classification performance - Specificity
• Specificity is the percentage of negatives (Dummy = 0) that are correctly classified:
Specificity = True Negatives / Total Actual Negatives

                  Predicted: No = 0            Predicted: Yes = 1           Total
Actual: No = 0    True Negative                False Positive               Total Actual Negatives
Actual: Yes = 1   False Negative               True Positive                Total Actual Positives
Total             Total Predicted Negatives    Total Predicted Positives    Overall Total
Classification threshold – cut-off value
• The cut-off value is the threshold at which we classify an observation as belonging to the
category of interest (1) rather than the reference group (0)
• Common cut-off values include:
◦ 0.5
◦ The mid-point between the smallest and largest predicted scores/probabilities

• There is a trade-off between sensitivity and specificity – given the context, are you more
willing to have false positives or false negatives?
◦ For example, are you more willing to approve a loan that shouldn't be approved (false positive),
or not approve a loan that should be approved (false negative)?
An example – loan approval data
Logistic regression to predict loan approval

Spreadsheet: Topic 7 Credit Approval Data.xlsx

We will use logistic regression to predict loan approval using the following predictors:
◦ Homeowner status
◦ Credit score
◦ Years of credit history
◦ Balance of existing credit facilities the applicant has and can use
◦ % of existing credit facility utilised by applicant
Logistic regression with “RegressIt” in Excel
• You will need to download the "RegressIt" Excel add-in at the link below,
then activate it
◦ https://regressit.com/logistic.html
• To activate, go to File > Options > Add-Ins > Manage Add-ins, then search for
the RegressIt file that you downloaded (it will likely be in your "Downloads"
folder)
• RegressIt toolbar: [screenshot on slide]
Data preparation
• The data provided has two parts:
◦ 50 loan applicants, their details, and whether their loan was approved or not.
◦ 6 potential applicants that have yet to be assessed.

• In the RegressIt tab, click on "Select Data", then click on "Create Names"
and follow the prompts. This will tell Excel what data you are using.
Make sure all the data is selected.
The Logistic Regression interface

• To run a logistic regression, click on the "Logistic Regression" button
• Take the time to familiarise yourself with the instruction manual
Running the regression

• Let's start with a simpler model – predicting loan approval with credit score
and homeowner status
Logistic regression with "Real Statistics" in Excel
• You will need to download the "XRealStats" Excel add-in at the link
below, then activate it
◦ https://real-statistics.com/free-download/
◦ To activate, go to File > Options > Add-Ins > Manage Add-ins, then search for the
XRealStats file that you downloaded (it will likely be in your "Downloads" folder)

• Real Statistics toolbar: [screenshot on slide]
The Logistic Regression interface

• To run a logistic regression, click on the "Logistic Regression" button
• Then add the input range
Results
• The log-odds of approval are positively and significantly related to credit
score and homeowner status:

ln(p / (1 − p)) = −30.6003 + 0.0413 × Credit Score + 4.7397 × Homeowner
Classification performance
• 45/50 (90%) correctly classified (accuracy)
• 21/23 (91.3%) of approvals correctly classified (sensitivity)
• 24/27 (88.9%) of rejections correctly classified (specificity)
Out-of-sample testing – thoughts?
• Let's evaluate the classification in the test sample
Predicted probabilities

Observation   Homeowner   Credit Score
51            1           700
52            0           520
53            1           650
54            0           602
55            0           549
56            1           742

Let's predict the probabilities of getting approval for observations 51-55 (see the sketch below)
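A minimal Python sketch applying the fitted equation from the Results slide to these applicants; treat the outputs as approximations, since the Excel add-ins carry more decimal places in the coefficients:

```python
import math

def approval_probability(homeowner, credit_score):
    """Predicted approval probability from the fitted logistic model."""
    log_odds = -30.6003 + 0.0413 * credit_score + 4.7397 * homeowner
    return 1.0 / (1.0 + math.exp(-log_odds))

applicants = {51: (1, 700), 52: (0, 520), 53: (1, 650), 54: (0, 602), 55: (0, 549)}
for obs, (homeowner, score) in applicants.items():
    print(obs, round(approval_probability(homeowner, score), 3))
# Homeowners with high credit scores get probabilities near 1;
# non-homeowners with low scores get probabilities near 0.
```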


Classification threshold – cut-off value
• The cut-off value is the threshold at which we classify an observation as belonging to the
category of interest (1) rather than the reference group (0)
• Common cut-off values include:
◦ 0.5
◦ The mid-point between the smallest and largest predicted scores/probabilities (not
discussed here)

• There is a trade-off between sensitivity and specificity – given the context, are you more
willing to have false positives or false negatives?
Model 1: using different cut-off values
• Cut-off value of 0.5 → 0.7 → 0.8
• We can change the cut-off value; the classification tables will change accordingly.
[Slide shows the classification tables and the resulting sensitivity, specificity and
accuracy for each cut-off value]
Comparing models (using a cut-off of 0.5)
[Slide shows classification results for Model 1 and Model 2 side by side]
Systematic modelling approach
• Analysts can take an approach in which:
◦ They start with a simple model with several key variables
◦ How do we determine the key variables? Often we rely on:
   ◦ Theories or theoretical foundations: e.g. prices going up means less consumption
   ◦ Widely accepted views or norms: e.g. managers earn more than non-managerial employees, or
     experienced staff get paid more
   ◦ Your own experience or research understanding (e.g. your work using multiple regression analysis)
   ◦ Research objectives
◦ Then they can add "more" variables into the model, in doing so watching out for:
   ◦ Increases in adjusted R-squared
   ◦ Issues of multicollinearity
   ◦ Statistical significance of the predictor / explanatory / independent X variables (together with the
     signs and magnitudes of the coefficients)
Systematic modelling (2)
• Over the last few weeks, we have been following another approach in which:
◦ We start with a complicated model with many independent variables, acknowledging
potential issues related to multicollinearity
◦ Then we remove those variables that are not statistically significant to get to a simpler model

• In general, both approaches are good for the purpose of examining a
certain hypothesis or understanding more about the situation.
• In case we are more concerned about "predictive" power, there are
other measures of model performance, as discussed today.
