
Logistic Regression

Classification Problems

Classification is an important category of problems in which the decision maker would like to classify cases, entities, or customers into two or more groups.

Examples of classification problems:

• Customer profiling (customer segmentation)
• Customer churn
• Credit classification (low, medium, and high risk)
• Employee attrition
• Fraud detection (classifying a transaction as fraud/no-fraud)
• Stress levels
• Text classification (sentiment analysis)
• The outcome of any binomial or multinomial experiment

Challenging Classification Problems

• Ransomware
• Anomaly detection
• Image classification (medical devices, satellite images)
• Text classification

Logistic Regression - A Supervised Learning Algorithm

Logistic Regression

• Classification
• Discrete choice
• Class probability

Probability Modelling

In regression analysis, the dependent variable is assumed to be a metric (interval or ratio) scale variable, while the independent variables may be a combination of metric and non-metric variables.

There is a special class of regression models in which the dependent variable takes only two values, 0 and 1, where 0 represents the absence of a condition and 1 represents its presence. The dependent variable in such a model is dichotomous; the independent variables are a combination of metric and non-metric variables, just as in ordinary regression analysis.

These models are known as probability models. The objective is to determine the impact of the input (independent) variables on the probability of occurrence of the output (dependent) variable.
Probability Models (binary dependent variable) with one explanatory variable

$$P_i = \alpha + \beta_1 X_i + \varepsilon_i$$

Assume that OLS is used to estimate the above model. How should the estimated value (the predicted probability that $Y_i = 1$) be interpreted?
We may have the graph of the estimated model as follows:

[Figure: plot of the estimated linear probability model.]
Such a model is therefore called a Linear Probability Model. The estimated value of the model above is interpreted as the probability that $Y_i$ takes the value one. The coefficient of $X_i$ is interpreted as the marginal increase in the probability that Y takes the value one for a one-unit increase in X.
Problem No. 1: The Linear Probability Model

Under the Linear Probability Model, the predicted probability changes linearly with the values of X, so at times the fitted probabilities can be negative (< 0) or greater than one (> 1). In that case, it is difficult to interpret the results as probabilities; the sketch below demonstrates the issue.
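A minimal sketch with synthetic data (all names and values here are illustrative, not from the slides): fitting a binary outcome by OLS and checking that some fitted "probabilities" escape the [0, 1] interval.

```python
# Illustrative sketch: an OLS fit to a binary outcome (a linear
# probability model) can produce fitted "probabilities" outside [0, 1].
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 200)
y = rng.binomial(1, 1 / (1 + np.exp(-(x - 5))))  # binary outcome

# OLS: regress y on a constant and x
X = np.column_stack([np.ones_like(x), x])
alpha, beta = np.linalg.lstsq(X, y, rcond=None)[0]
p_hat = alpha + beta * x

print(p_hat.min(), p_hat.max())  # typically below 0 and above 1
```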

Problem No. 2: The Linear Probability Model

• The error term does not follow a normal probability distribution; because the outcome is binary, it follows a binomial probability distribution.

Problem No. 3: The Linear Probability Model

• The error terms are not homoscedastic; the model suffers from heteroscedasticity.

Problem No. 4: The Linear Probability Model

• In the Linear Probability Model, the probability continues to increase or decrease at a constant rate as X increases. Since probabilities must lie between 0 and 1, that is 0 < p < 1, a constant rate of increase or decrease cannot hold everywhere.
• We need a non-linear relation to restrict the probability values to the range between 0 and 1.

Now, let's start with the LOGIT Model

• The LOGIT model assumes that the underlying distribution is a logistic function, or sigmoid function. Note that its S-shaped curve is non-linear and its value always lies between 0 and 1.

[Figure: the sigmoid curve.]

LOGIT Model

• The LOGIT model expresses the probability p that the dependent variable Y takes the value 1 given $X_i$.
• For the LOGIT model, a particular type of logistic function is used, called the SIGMOID FUNCTION:

$$f(z) = \frac{1}{1 + e^{-z}}$$

• Using this functional form, and assuming there is only one explanatory variable, we may express the probability p that Y takes the value 1 given $X_i$ as

$$p = \frac{1}{1 + e^{-z}}, \quad \text{where } z = \alpha + \beta X_i + \varepsilon_i$$
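A minimal sketch of the sigmoid function (Python, with illustrative inputs): whatever value z takes, the output stays strictly between 0 and 1.

```python
# The sigmoid (logistic) function f(z) = 1 / (1 + exp(-z)).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))  # approx [0.0000 0.2689 0.5000 0.7311 1.0000]
```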

Estimation of the Logit Model

The LOGIT model is not estimated by OLS; it is estimated by the Maximum Likelihood method, and the estimates we obtain are therefore known as MLEs.
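A hedged sketch of maximum likelihood estimation using statsmodels on synthetic data (the variable names and true parameter values are illustrative); `Logit(...).fit()` maximises the log-likelihood numerically and returns the MLEs.

```python
# Fit a logit model by maximum likelihood (synthetic data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=500)
p_true = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))  # true alpha = 0.5, beta = 2.0
y = rng.binomial(1, p_true)

X = sm.add_constant(x)                    # prepend the intercept column
result = sm.Logit(y, X).fit(disp=False)   # numerical MLE
print(result.params)                      # MLEs, close to [0.5, 2.0]
```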

Logistic Regression

What is logistic regression?

A logit is the natural log of the odds of an event occurring. Logistic regression was developed by the statistician David Cox in 1958.

Logistic Model: Odds Ratio

$$\text{Odds} = \frac{\text{Probability of the event occurring}}{\text{Probability of the event not occurring}}$$

The odds ratio compares the odds before and after a one-unit increase in a predictor:

If the odds ratio = 1, increasing the predictor leaves the odds of the event unchanged.

If the odds ratio > 1, the probability of occurrence of the event increases as the predictor increases.

If the odds ratio < 1, the probability of occurrence of the event decreases as the predictor increases.
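A minimal sketch (the probability and coefficient values are made up for illustration): computing the odds from a probability, and the odds ratio implied by a logit coefficient.

```python
# Odds from a probability, and exp(beta) as an odds ratio.
import numpy as np

p = 0.8
odds = p / (1 - p)         # 4.0: the event is four times as likely as not
beta = 0.69                # hypothetical logit coefficient
print(odds, np.exp(beta))  # exp(0.69) ~ 2.0: the odds double per unit of X
```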

Odds Ratio as a Measure of Effect Size

Odds ratios also act as a measure of effect size, since they tell us the relative contribution of each variable.

Requirements

• DV: nominal
• IVs: either scale or nominal

Coding of Variables

Coding of a nominal DV: generally, 0 means the absence of the property of interest (e.g., absence of heart disease); the negative response is coded as zero.

Coding of a categorical IV: the reference category should be coded as zero. For example, if you want to say "compared to females, males are more likely to smoke", code females as 0 and males as 1.

Assumptions

• There is a linear relationship between the IVs and the logit (log-odds) of the DV.
• The error terms should be independent.
• Rule of 10: there should be at least 10 cases per IV, though some suggest 15.
• There should be no multicollinearity in the data. Multicollinearity occurs when the correlation between two variables is very high (above .85); you can check it via tolerance and VIF, as in the sketch below.
• There should be no significant outliers in the data.
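A hedged sketch of the multicollinearity check with variance inflation factors, using statsmodels' `variance_inflation_factor` on deliberately collinear synthetic data; a common rule of thumb flags VIF values above 10.

```python
# Check multicollinearity via VIF (synthetic, deliberately collinear data).
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)    # nearly a copy of x1
X = np.column_stack([np.ones(100), x1, x2])  # intercept + predictors

for i in (1, 2):  # indices 1 and 2 are the predictors; 0 is the intercept
    print(f"VIF for x{i}: {variance_inflation_factor(X, i):.1f}")
```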

Wald Test

A Wald test is used to evaluate the statistical significance of each coefficient (b) in the model, as illustrated below.
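A minimal sketch of the Wald test with made-up numbers: the statistic is (b / SE(b))², referred to a chi-square distribution with one degree of freedom.

```python
# Wald test for a single coefficient (illustrative values).
from scipy import stats

b, se = 1.2, 0.4
wald = (b / se) ** 2                 # 9.0
p_value = stats.chi2.sf(wald, df=1)  # ~0.0027
print(wald, p_value)                 # reject b = 0 at the 5% level
```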

Model Fit

Log-likelihood (LL): Model fit is assessed using the log-likelihood function, which indicates the extent of unexplained information after the model has been fitted. In this sense it is the opposite of R² in linear regression: larger unexplained information means worse fit.

Deviance: it is defined as
Deviance = -2LL

We can compare the baseline model (a model that has no predictors and contains only the constant) against the models we specify, and calculate a chi-square goodness-of-fit statistic:

Omnibus test = chi-square difference, or likelihood ratio* = -2LL(baseline) - (-2LL(default model)) = 2LL(default model) - 2LL(baseline model)

df = number of parameters in the default model - number of parameters in the baseline model

*Note: It is called a "ratio" because a subtraction of logs is the same as a division of the underlying numbers: log(8/3) = log 8 - log 3.
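A hedged sketch of the omnibus (likelihood-ratio) chi-square on synthetic data; statsmodels exposes the baseline log-likelihood as `llnull`, the fitted log-likelihood as `llf`, and the LR statistic and its p-value as `llr` and `llr_pvalue`.

```python
# Deviance and the likelihood-ratio (omnibus) test (synthetic data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=300)
y = rng.binomial(1, 1 / (1 + np.exp(-x)))

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=False)
print(-2 * fit.llnull)          # deviance of the intercept-only baseline
print(-2 * fit.llf)             # deviance of the fitted model
print(fit.llr, fit.llr_pvalue)  # LR chi-square = the difference, and its p
```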
Statistics Associated with Logistic Regression

There are several statistics which can be used for comparing alternative models or for evaluating the performance of a single model:

Model chi-square. Use the model chi-square statistic to determine whether the overall model is statistically significant. The model chi-square tests the null hypothesis that all population logistic regression coefficients except the constant are zero. When p(model chi-square) <= .05, we reject the null hypothesis that knowing the independents makes no difference in predicting the dependent, and conclude that at least one coefficient b_j ≠ 0.

Hosmer-Lemeshow test of goodness of fit. The Hosmer-Lemeshow test is an alternative method for assessing model fit. If the test is not significant (p > α), the model has adequate fit; by the same token, if the test is significant, the model does not adequately fit the data.
Wald statistic (test): The Wald statistic is commonly used to test the significance of individual logistic regression coefficients for each independent variable (that is, to test the null hypothesis that a particular logit coefficient is zero). The null hypothesis b_j = 0 is rejected when p ≤ α.

The percentage of correct predictions: The percentage of correct predictions is an important measure of a model's usefulness. The "percent correct predictions" statistic assumes that if the estimated probability is greater than or equal to 0.5 the event is predicted to occur, and otherwise it is predicted not to occur.

Links

https://www.youtube.com/watch?v=FG6FRZLtCMs
https://www.linkedin.com/pulse/checks-logistic-regressions-sray-agarwal
https://www.linkedin.com/pulse/confusion-matrix-type-i-ii-error-swaroop-shinde

Errors in Classification

Confusion Matrix

[Figure: confusion matrix of actual vs. predicted classes.]
Misclassifying a true negative as positive (a false positive) is a Type I error; misclassifying a true positive as negative (a false negative) is a Type II error.

Sensitivity = TPR = TP / (TP + FN)

Specificity = TNR = TN / (TN + FP)
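A minimal sketch with illustrative cell counts: sensitivity and specificity computed directly from the confusion-matrix cells.

```python
# Sensitivity and specificity from confusion-matrix counts (illustrative).
tp, fn, tn, fp = 80, 20, 90, 10

sensitivity = tp / (tp + fn)  # TPR = 0.80
specificity = tn / (tn + fp)  # TNR = 0.90
print(sensitivity, specificity)
```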
Sensitivity, Specificity and Precision

The ability of the model to correctly classify positives and negatives is called sensitivity and specificity, respectively. The terms sensitivity and specificity originated in medical diagnostics.

In the generic case:
Sensitivity = P(model classifies Y_i as positive | Y_i is positive)

Sensitivity is calculated using the following equation:

$$\text{Sensitivity (Recall)} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP)} + \text{False Negative (FN)}}$$

where True Positive (TP) is the number of positives correctly classified as positive by the model and False Negative (FN) is the number of positives misclassified as negative by the model. Sensitivity is also called recall.
Specificity

Specificity is the ability of a diagnostic test to correctly classify the test as negative when the disease is not present. That is:
Specificity = P(diagnostic test is negative | patient has no disease)

In general:
Specificity = P(model classifies Y_i as negative | Y_i is negative)

Specificity can be calculated using the following equation:

$$\text{Specificity} = \frac{\text{True Negative (TN)}}{\text{True Negative (TN)} + \text{False Positive (FP)}}$$

where True Negative (TN) is the number of negatives correctly classified as negative by the model and False Positive (FP) is the number of negatives misclassified as positive by the model.
The decision maker has to consider the trade-off between sensitivity and specificity to arrive at an optimal cut-off probability.

Precision measures the accuracy of the positives classified by the model:
Precision = P(patient has disease | diagnostic test is positive)

$$\text{Precision} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP)} + \text{False Positive (FP)}}$$

The F-score (F-measure) is another measure used in binary logistic regression; it combines both precision and recall and is given by:

$$F\text{-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
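A hedged sketch computing precision, recall, and the F-score with scikit-learn on made-up labels; `precision_score`, `recall_score`, and `f1_score` implement exactly the formulas above.

```python
# Precision, recall, and F-score for toy labels.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(precision_score(y_true, y_pred))  # TP/(TP+FP) = 3/4
print(recall_score(y_true, y_pred))     # TP/(TP+FN) = 3/4
print(f1_score(y_true, y_pred))         # 2PR/(P+R) = 0.75
```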
Concordant and Discordant Pairs

Discordant pairs: a pair of positive and negative observations for which the model has no cut-off probability that classifies both of them correctly.

Concordant pairs: a pair of positive and negative observations for which the model has a cut-off probability that classifies both of them correctly.
Concordance & Discordance

A pair is said to be concordant if the observation with outcome 1 has a higher predicted probability than the observation with outcome 0, discordant if the observation with outcome 0 has the higher predicted probability, and tied if both probabilities are the same.

Steps: after calculating the estimated probabilities using logistic regression, divide the data set into two groups, one with all the 1s and one with all the 0s. Then form every pair of one 1 and one 0 and compare their probabilities to decide which pairs are concordant and which are discordant, as in the sketch below.
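A minimal sketch of this pair-comparison step, using the outcomes and probabilities from the four-mushroom example that appears a few slides below.

```python
# Count concordant, discordant, and tied pairs (mushroom example below).
import numpy as np

y = np.array([1, 0, 1, 0])               # actual outcomes: A, B, C, D
p = np.array([0.70, 0.63, 0.23, 0.15])   # predicted probabilities

pos, neg = p[y == 1], p[y == 0]
concordant = sum(pp > pn for pp in pos for pn in neg)
discordant = sum(pp < pn for pp in pos for pn in neg)
tied = len(pos) * len(neg) - concordant - discordant
print(concordant, discordant, tied)      # 3, 1, 0
```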
Percent Concordant

Percentage of pairs where the observation with the desired outcome (event) has a higher predicted probability than the observation without the outcome (non-event):
(Number of concordant pairs) / (Total number of pairs)
Percent Discordant

Percentage of pairs where the observation with the desired outcome (event) has a lower predicted probability than the observation without the outcome (non-event):
(Number of discordant pairs) / (Total number of pairs)
Percent Tied

Percentage of pairs where the observation with the desired outcome (event) has the same predicted probability as the observation without the outcome (non-event):
(Number of tied pairs) / (Total number of pairs)
Area under the curve (c statistic) = Percent Concordant + 0.5 × Percent Tied

Example: there are four mushrooms (A, B, C and D), two of which are non-poisonous (coded 1) and two poisonous (coded 0), as (A:1, B:0, C:1, D:0), with predicted probabilities of 0.70, 0.63, 0.23 and 0.15 respectively.

Thus we have four pairs:
[A,B] = [1,0] = [0.70, 0.63] : concordant
[A,D] = [1,0] = [0.70, 0.15] : concordant
[B,C] = [0,1] = [0.63, 0.23] : discordant
[C,D] = [1,0] = [0.23, 0.15] : concordant

Here c = (3 + 0.5 × 0) / 4 = 0.75.

Somers' D = (% concordant pairs - % discordant pairs) = 0.75 - 0.25 = 0.50. The higher the Somers' D, the better the model.
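A hedged check of the arithmetic above with scikit-learn: `roc_auc_score` should reproduce c = 0.75 for the four mushrooms.

```python
# The c statistic equals the ROC AUC: (3 + 0.5 * 0) / 4 = 0.75.
from sklearn.metrics import roc_auc_score

y = [1, 0, 1, 0]
p = [0.70, 0.63, 0.23, 0.15]
print(roc_auc_score(y, p))  # 0.75
```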
Threshold

The best threshold (or cut-off) point to use in glm models is the point which maximises both specificity and sensitivity. This threshold point might not give the highest overall prediction accuracy in your model, but it is not biased towards positives or negatives. The ROCR package in R contains functions that can help you do this; a Python equivalent is sketched below.

In some applications of ROC curves, you want the point closest to a TPR of 1 and an FPR of 0. This cut point is "optimal" in the sense that it weighs sensitivity and specificity equally.
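The slides point to R's ROCR package; here is an equivalent hedged sketch in Python with scikit-learn (synthetic labels and scores), picking the threshold that maximises Youden's J = sensitivity + specificity - 1, which corresponds to the point on the ROC curve farthest above the diagonal.

```python
# Pick the cut-off maximising Youden's J from the ROC curve.
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
p_hat = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.50])

fpr, tpr, thresholds = roc_curve(y_true, p_hat)
best = np.argmax(tpr - fpr)   # J = TPR - FPR = sensitivity + specificity - 1
print(thresholds[best])       # the threshold balancing both error types
```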

Case 1
File: LPM
Dependent variable: Probability (1: loan given; 0: not given)
Independent variable: Income

1. Run a logistic regression analysis.
2. What is the odds ratio?
3. Write the logistic regression equation.
4. Interpret the logistic regression equation with respect to exponential beta and the dependent variable. What is the use of exponential beta, the Wald statistic, and the significance value?
5. What is the classification table?
6. What is the cut-off value?
7. What is the hit ratio?
8. Calculate sensitivity, specificity, precision, and the F ratio.

Case 2: Challenger Crash Data
Read the chapter "Logistic Regression" from Business Analytics by U Dinesh Kumar.
https://bookdown.org/egarpor/SSS2-UC3M/logreg-examps.html

1. Run a logistic regression analysis.
2. What is the odds ratio?
3. Write the logistic regression equation.
4. Interpret the logistic regression equation with respect to exponential beta and the dependent variable.
5. What is the classification table?
6. What is the cut-off value? Calculate two classification tables, at cut-off values of .5 and .2.
7. What is the hit ratio for cut-off values of .5 and .2?
8. Calculate sensitivity, specificity, precision, and the F ratio from the classification tables at the .5 and .2 cut-off values.
Case 3: Titanic Data
Collect the Titanic data from Kaggle/GitHub and solve the case.

1. Run a logistic regression analysis.
2. What is the odds ratio?
3. Write the logistic regression equation.
4. Interpret the logistic regression equation with respect to exponential beta and the dependent variable.
5. What is the classification table?
6. What is the cut-off value?
7. What is the hit ratio?
8. Calculate sensitivity, specificity, precision, and the F ratio.