
Logistic Regression

Classification Problems

Classification is an important category of problems in which the decision maker would like to classify cases, entities, or customers into two or more groups.

Examples of classification problems:

• Customer profiling (customer segmentation)
• Customer churn
• Credit classification (low, medium, and high risk)
• Employee attrition
• Fraud detection (classifying a transaction as fraud/no-fraud)
• Stress levels
• Text classification (sentiment analysis)
• The outcome of any binomial or multinomial experiment

Challenging Classification Problems

• Ransomware
• Anomaly detection
• Image classification (medical devices, satellite images)
• Text classification

Logistic Regression - A Supervised Learning Algorithm

Logistic Regression

• Classification
• Discrete choice
• Class probability

Probability Modelling

In regression analysis, the dependent variable is assumed to be a metric (interval or ratio) scale variable, while the independent variables may be a combination of metric and non-metric variables.

There is a special class of regression models in which the dependent variable takes only two values, 0 and 1, where 0 represents the absence of a condition and 1 represents its presence. The dependent variable in such a model is dichotomous; the independent variables are a combination of metric and non-metric variables, just as in ordinary regression analysis.

These models are known as probability models. The objective is to determine the impact of the input (independent) variables on the probability of occurrence of the output (dependent) variable.
Probability Models (binary dependent variable) with one explanatory variable

$$P_i = \alpha + \beta_1 X_i + \varepsilon_i$$

Assume that OLS is used to estimate the above model. How should the estimated value (the predicted probability that $Y_i = 1$) be interpreted?
We may have the graph of the estimated model as follows:

[Figure: plot of the estimated linear probability model.]
Such a model is therefore called a Linear Probability Model. The estimated value of the model above is interpreted as the probability that $Y_i$ takes the value one. The coefficient of $X_i$ is interpreted as the marginal increase in the probability that Y takes the value one for a one-unit increase in X.
Problem No. 1: The Linear Probability Model

Under the Linear Probability Model, the predicted probability changes linearly with the values of X, so at times the fitted probabilities can be negative (< 0) or greater than one (> 1). In that case, it is difficult to interpret the results as probabilities; the sketch below demonstrates the issue.
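A minimal sketch with synthetic data (all names and values here are illustrative, not from the slides): fitting a binary outcome by OLS and checking that some fitted "probabilities" escape the [0, 1] interval.

```python
# Illustrative sketch: an OLS fit to a binary outcome (a linear
# probability model) can produce fitted "probabilities" outside [0, 1].
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 200)
y = rng.binomial(1, 1 / (1 + np.exp(-(x - 5))))  # binary outcome

# OLS: regress y on a constant and x
X = np.column_stack([np.ones_like(x), x])
alpha, beta = np.linalg.lstsq(X, y, rcond=None)[0]
p_hat = alpha + beta * x

print(p_hat.min(), p_hat.max())  # typically below 0 and above 1
```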

Problem No. 2: The Linear Probability Model

• The error term does not follow a normal probability distribution; because the outcome is binary, it follows a binomial probability distribution.

Problem No. 3: The Linear Probability Model

• The error terms are not homoscedastic; the model suffers from heteroscedasticity.

Problem No. 4: The Linear Probability Model

• In the Linear Probability Model, the probability continues to increase or decrease at a constant rate as X increases. Since probabilities must lie between 0 and 1, that is 0 < p < 1, a constant rate of increase or decrease cannot hold everywhere.
• We need a non-linear relation to restrict the probability values to the range between 0 and 1.

Now, let's start with the LOGIT Model

• The LOGIT model assumes that the underlying distribution is a logistic function, or sigmoid function. Note that its S-shaped curve is non-linear and its value always lies between 0 and 1.

[Figure: the sigmoid curve.]

LOGIT Model

• The LOGIT model expresses the probability p that the dependent variable Y takes the value 1 given $X_i$.
• For the LOGIT model, a particular type of logistic function is used, called the SIGMOID FUNCTION:

$$f(z) = \frac{1}{1 + e^{-z}}$$

• Using this functional form, and assuming there is only one explanatory variable, we may express the probability p that Y takes the value 1 given $X_i$ as

$$p = \frac{1}{1 + e^{-z}}, \quad \text{where } z = \alpha + \beta X_i + \varepsilon_i$$
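A minimal sketch of the sigmoid function (Python, with illustrative inputs): whatever value z takes, the output stays strictly between 0 and 1.

```python
# The sigmoid (logistic) function f(z) = 1 / (1 + exp(-z)).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))  # approx [0.0000 0.2689 0.5000 0.7311 1.0000]
```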

Estimation of the Logit Model

The LOGIT model is not estimated by OLS; it is estimated by the Maximum Likelihood method, and the estimates we obtain are therefore known as MLEs.
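A hedged sketch of maximum likelihood estimation using statsmodels on synthetic data (the variable names and true parameter values are illustrative); `Logit(...).fit()` maximises the log-likelihood numerically and returns the MLEs.

```python
# Fit a logit model by maximum likelihood (synthetic data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=500)
p_true = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))  # true alpha = 0.5, beta = 2.0
y = rng.binomial(1, p_true)

X = sm.add_constant(x)                    # prepend the intercept column
result = sm.Logit(y, X).fit(disp=False)   # numerical MLE
print(result.params)                      # MLEs, close to [0.5, 2.0]
```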

Logistic Regression

What is logistic regression?

A logit is the natural log of the odds of an event occurring. Logistic regression was developed by the statistician David Cox in 1958.

Logistic Model: Odds Ratio

$$\text{Odds} = \frac{\text{Probability of the event occurring}}{\text{Probability of the event not occurring}}$$

The odds ratio compares the odds before and after a one-unit increase in a predictor:

If the odds ratio = 1, increasing the predictor leaves the odds of the event unchanged.

If the odds ratio > 1, the probability of occurrence of the event increases as the predictor increases.

If the odds ratio < 1, the probability of occurrence of the event decreases as the predictor increases.
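A minimal sketch (the probability and coefficient values are made up for illustration): computing the odds from a probability, and the odds ratio implied by a logit coefficient.

```python
# Odds from a probability, and exp(beta) as an odds ratio.
import numpy as np

p = 0.8
odds = p / (1 - p)         # 4.0: the event is four times as likely as not
beta = 0.69                # hypothetical logit coefficient
print(odds, np.exp(beta))  # exp(0.69) ~ 2.0: the odds double per unit of X
```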

Odds Ratio as a Measure of Effect Size

Odds ratios also act as a measure of effect size, since they tell us the relative contribution of each variable.

Requirements

• DV: nominal
• IVs: either scale or nominal

Coding of Variables

Coding of a nominal DV: generally, 0 means the absence of the property of interest (e.g., absence of heart disease); the negative response is coded as zero.

Coding of a categorical IV: the reference category should be coded as zero. For example, if you want to say "compared to females, males are more likely to smoke", code females as 0 and males as 1.

Assumptions

• There is a linear relationship between the IVs and the logit (log-odds) of the DV.
• The error terms should be independent.
• Rule of 10: there should be at least 10 cases per IV, though some suggest 15.
• There should be no multicollinearity in the data. Multicollinearity occurs when the correlation between two variables is very high (above .85); you can check it via tolerance and VIF, as in the sketch below.
• There should be no significant outliers in the data.
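A hedged sketch of the multicollinearity check with variance inflation factors, using statsmodels' `variance_inflation_factor` on deliberately collinear synthetic data; a common rule of thumb flags VIF values above 10.

```python
# Check multicollinearity via VIF (synthetic, deliberately collinear data).
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)    # nearly a copy of x1
X = np.column_stack([np.ones(100), x1, x2])  # intercept + predictors

for i in (1, 2):  # indices 1 and 2 are the predictors; 0 is the intercept
    print(f"VIF for x{i}: {variance_inflation_factor(X, i):.1f}")
```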

Wald Test

A Wald test is used to evaluate the statistical significance of each coefficient (b) in the model, as illustrated below.
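A minimal sketch of the Wald test with made-up numbers: the statistic is (b / SE(b))², referred to a chi-square distribution with one degree of freedom.

```python
# Wald test for a single coefficient (illustrative values).
from scipy import stats

b, se = 1.2, 0.4
wald = (b / se) ** 2                 # 9.0
p_value = stats.chi2.sf(wald, df=1)  # ~0.0027
print(wald, p_value)                 # reject b = 0 at the 5% level
```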

Model Fit

Log-likelihood (LL): Model fit is assessed using the log-likelihood function, which indicates the extent of unexplained information after the model has been fitted. In this sense it is the opposite of R² in linear regression: larger unexplained information means worse fit.

Deviance: it is defined as
Deviance = -2LL

We can compare the baseline model (a model that has no predictors and contains only the constant) against the models we specify, and calculate a chi-square goodness-of-fit statistic:

Omnibus test = chi-square difference, or likelihood ratio* = -2LL(baseline) - (-2LL(default model)) = 2LL(default model) - 2LL(baseline model)

df = number of parameters in the default model - number of parameters in the baseline model

*Note: It is called a "ratio" because a subtraction of logs is the same as a division of the underlying numbers: log(8/3) = log 8 - log 3.
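A hedged sketch of the omnibus (likelihood-ratio) chi-square on synthetic data; statsmodels exposes the baseline log-likelihood as `llnull`, the fitted log-likelihood as `llf`, and the LR statistic and its p-value as `llr` and `llr_pvalue`.

```python
# Deviance and the likelihood-ratio (omnibus) test (synthetic data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=300)
y = rng.binomial(1, 1 / (1 + np.exp(-x)))

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=False)
print(-2 * fit.llnull)          # deviance of the intercept-only baseline
print(-2 * fit.llf)             # deviance of the fitted model
print(fit.llr, fit.llr_pvalue)  # LR chi-square = the difference, and its p
```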
Statistics Associated with Logistic Regression

There are several statistics which can be used for comparing alternative models or for evaluating the performance of a single model:

Model chi-square. Use the model chi-square statistic to determine whether the overall model is statistically significant. The model chi-square tests the null hypothesis that all population logistic regression coefficients except the constant are zero. When p(model chi-square) <= .05, we reject the null hypothesis that knowing the independents makes no difference in predicting the dependent, and conclude that at least one coefficient b_j ≠ 0.

Hosmer-Lemeshow test of goodness of fit. The Hosmer-Lemeshow test is an alternative method for assessing model fit. If the test is not significant (p > α), the model has adequate fit; by the same token, if the test is significant, the model does not adequately fit the data.
Wald statistic (test): The Wald statistic is commonly used to test the significance of individual logistic regression coefficients for each independent variable (that is, to test the null hypothesis that a particular logit coefficient is zero). The null hypothesis b_j = 0 is rejected when p ≤ α.

The percentage of correct predictions: The percentage of correct predictions is an important measure of a model's usefulness. The "percent correct predictions" statistic assumes that if the estimated probability is greater than or equal to 0.5 the event is predicted to occur, and otherwise it is predicted not to occur.

Links

https://www.youtube.com/watch?v=FG6FRZLtCMs
https://www.linkedin.com/pulse/checks-logistic-regressions-sray-agarwal
https://www.linkedin.com/pulse/confusion-matrix-type-i-ii-error-swaroop-shinde

Errors in Classification

Confusion Matrix

[Figure: confusion matrix of actual vs. predicted classes.]
Misclassifying a true negative as positive (a false positive) is a Type I error; misclassifying a true positive as negative (a false negative) is a Type II error.

Sensitivity = TPR = TP / (TP + FN)

Specificity = TNR = TN / (TN + FP)
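A minimal sketch with illustrative cell counts: sensitivity and specificity computed directly from the confusion-matrix cells.

```python
# Sensitivity and specificity from confusion-matrix counts (illustrative).
tp, fn, tn, fp = 80, 20, 90, 10

sensitivity = tp / (tp + fn)  # TPR = 0.80
specificity = tn / (tn + fp)  # TNR = 0.90
print(sensitivity, specificity)
```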
Sensitivity, Specificity and Precision

The ability of the model to correctly classify positives and negatives is called sensitivity and specificity, respectively. The terms sensitivity and specificity originated in medical diagnostics.

In the generic case:
Sensitivity = P(model classifies Y_i as positive | Y_i is positive)

Sensitivity is calculated using the following equation:

$$\text{Sensitivity (Recall)} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP)} + \text{False Negative (FN)}}$$

where True Positive (TP) is the number of positives correctly classified as positive by the model and False Negative (FN) is the number of positives misclassified as negative by the model. Sensitivity is also called recall.
Specificity

Specificity is the ability of a diagnostic test to correctly classify the test as negative when the disease is not present. That is:
Specificity = P(diagnostic test is negative | patient has no disease)

In general:
Specificity = P(model classifies Y_i as negative | Y_i is negative)

Specificity can be calculated using the following equation:

$$\text{Specificity} = \frac{\text{True Negative (TN)}}{\text{True Negative (TN)} + \text{False Positive (FP)}}$$

where True Negative (TN) is the number of negatives correctly classified as negative by the model and False Positive (FP) is the number of negatives misclassified as positive by the model.
The decision maker has to consider the trade-off between sensitivity and specificity to arrive at an optimal cut-off probability.

Precision measures the accuracy of the positives classified by the model:
Precision = P(patient has disease | diagnostic test is positive)

$$\text{Precision} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP)} + \text{False Positive (FP)}}$$

The F-score (F-measure) is another measure used in binary logistic regression; it combines both precision and recall and is given by:

$$F\text{-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
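A hedged sketch computing precision, recall, and the F-score with scikit-learn on made-up labels; `precision_score`, `recall_score`, and `f1_score` implement exactly the formulas above.

```python
# Precision, recall, and F-score for toy labels.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(precision_score(y_true, y_pred))  # TP/(TP+FP) = 3/4
print(recall_score(y_true, y_pred))     # TP/(TP+FN) = 3/4
print(f1_score(y_true, y_pred))         # 2PR/(P+R) = 0.75
```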
Concordant and Discordant Pairs

Discordant pairs: a pair of positive and negative observations for which the model has no cut-off probability that classifies both of them correctly.

Concordant pairs: a pair of positive and negative observations for which the model has a cut-off probability that classifies both of them correctly.
Concordance & Discordance

A pair is said to be concordant if the observation with outcome 1 has a higher predicted probability than the observation with outcome 0, discordant if the observation with outcome 0 has the higher predicted probability, and tied if both probabilities are the same.

Steps: after calculating the estimated probabilities using logistic regression, divide the data set into two groups, one with all the 1s and one with all the 0s. Then form every pair of one 1 and one 0 and compare their probabilities to decide which pairs are concordant and which are discordant, as in the sketch below.
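A minimal sketch of this pair-comparison step, using the outcomes and probabilities from the four-mushroom example that appears a few slides below.

```python
# Count concordant, discordant, and tied pairs (mushroom example below).
import numpy as np

y = np.array([1, 0, 1, 0])               # actual outcomes: A, B, C, D
p = np.array([0.70, 0.63, 0.23, 0.15])   # predicted probabilities

pos, neg = p[y == 1], p[y == 0]
concordant = sum(pp > pn for pp in pos for pn in neg)
discordant = sum(pp < pn for pp in pos for pn in neg)
tied = len(pos) * len(neg) - concordant - discordant
print(concordant, discordant, tied)      # 3, 1, 0
```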
Percent Concordant

Percentage of pairs where the observation with the desired outcome (event) has a higher predicted probability than the observation without the outcome (non-event):
(Number of concordant pairs) / (Total number of pairs)
Percent Discordant

Percentage of pairs where the observation with the desired outcome (event) has a lower predicted probability than the observation without the outcome (non-event):
(Number of discordant pairs) / (Total number of pairs)
Percent Tied

Percentage of pairs where the observation with the desired outcome (event) has the same predicted probability as the observation without the outcome (non-event):
(Number of tied pairs) / (Total number of pairs)
Area under the curve (c statistic) = Percent Concordant + 0.5 × Percent Tied

Example: there are four mushrooms (A, B, C and D), two of which are non-poisonous (coded 1) and two poisonous (coded 0), as (A:1, B:0, C:1, D:0), with predicted probabilities of 0.70, 0.63, 0.23 and 0.15 respectively.

Thus we have four pairs:
[A,B] = [1,0] = [0.70, 0.63] : concordant
[A,D] = [1,0] = [0.70, 0.15] : concordant
[B,C] = [0,1] = [0.63, 0.23] : discordant
[C,D] = [1,0] = [0.23, 0.15] : concordant

Here c = (3 + 0.5 × 0) / 4 = 0.75.

Somers' D = (% concordant pairs - % discordant pairs) = 0.75 - 0.25 = 0.50. The higher the Somers' D, the better the model.
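A hedged check of the arithmetic above with scikit-learn: `roc_auc_score` should reproduce c = 0.75 for the four mushrooms.

```python
# The c statistic equals the ROC AUC: (3 + 0.5 * 0) / 4 = 0.75.
from sklearn.metrics import roc_auc_score

y = [1, 0, 1, 0]
p = [0.70, 0.63, 0.23, 0.15]
print(roc_auc_score(y, p))  # 0.75
```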
Threshold

The best threshold (or cut-off) point to use in glm models is the point which maximises both specificity and sensitivity. This threshold point might not give the highest overall prediction accuracy in your model, but it is not biased towards positives or negatives. The ROCR package in R contains functions that can help you do this; a Python equivalent is sketched below.

In some applications of ROC curves, you want the point closest to a TPR of 1 and an FPR of 0. This cut point is "optimal" in the sense that it weighs sensitivity and specificity equally.
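The slides point to R's ROCR package; here is an equivalent hedged sketch in Python with scikit-learn (synthetic labels and scores), picking the threshold that maximises Youden's J = sensitivity + specificity - 1, which corresponds to the point on the ROC curve farthest above the diagonal.

```python
# Pick the cut-off maximising Youden's J from the ROC curve.
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
p_hat = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.50])

fpr, tpr, thresholds = roc_curve(y_true, p_hat)
best = np.argmax(tpr - fpr)   # J = TPR - FPR = sensitivity + specificity - 1
print(thresholds[best])       # the threshold balancing both error types
```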

Case 1
File: LPM
Dependent variable: Probability (1: loan given; 0: not given)
Independent variable: Income

1. Run a logistic regression analysis.
2. What is the odds ratio?
3. Write the logistic regression equation.
4. Interpret the logistic regression equation with respect to exponential beta and the dependent variable. What is the use of exponential beta, the Wald statistic, and the significance value?
5. What is the classification table?
6. What is the cut-off value?
7. What is the hit ratio?
8. Calculate sensitivity, specificity, precision, and the F ratio.

Case 2: Challenger Crash Data
Read the chapter "Logistic Regression" from Business Analytics by U Dinesh Kumar.
https://bookdown.org/egarpor/SSS2-UC3M/logreg-examps.html

1. Run a logistic regression analysis.
2. What is the odds ratio?
3. Write the logistic regression equation.
4. Interpret the logistic regression equation with respect to exponential beta and the dependent variable.
5. What is the classification table?
6. What is the cut-off value? Calculate two classification tables, at cut-off values of .5 and .2.
7. What is the hit ratio for cut-off values of .5 and .2?
8. Calculate sensitivity, specificity, precision, and the F ratio from the classification tables at the .5 and .2 cut-off values.
Case 3: Titanic Data
Collect the Titanic data from Kaggle/GitHub and solve the case.

1. Run a logistic regression analysis.
2. What is the odds ratio?
3. Write the logistic regression equation.
4. Interpret the logistic regression equation with respect to exponential beta and the dependent variable.
5. What is the classification table?
6. What is the cut-off value?
7. What is the hit ratio?
8. Calculate sensitivity, specificity, precision, and the F ratio.