Logistic Regression
Classification Problems
• Anomaly Detection
• Text Classification
Logistic Regression - Supervised Learning Algorithm
Logistic Regression
Classification
Discrete Choice
Class probability
Probability Modelling
In regression analysis, it is assumed that the dependent variable is a metric (interval
and ratio) scale variable and independent variables are a combination of metric as well
as non-metric variables.
There is a special class of models for regression analysis in which the dependent
variable has only two values 0 and 1, where 0 represents the absence of a condition
and 1 represents the presence of a condition.
Here the dependent variable is a dichotomous variable. The independent
variables are a combination of metric and non-metric variables, the same as in a normal
regression analysis.
These types of models are known as probability models. The objective is to determine
the impact of input (independent) variables on the probability of occurrence of the
output (dependent) variable.
Probability Models (binary dependent variable) with one explanatory variable
$$P_i = \alpha + \beta_1 X_i + \varepsilon_i$$
If OLS is used to estimate the above model, how should the estimated value of $Y_i$ be interpreted?
• The graph of the estimated model may look like this:
[Figure: graph of the estimated model]
Therefore, such a model is called a Linear Probability Model (LPM).
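To make the interpretation problem concrete, here is a minimal sketch that fits a straight line by OLS to a binary outcome. The income and loan values are hypothetical, made up for illustration only, and the helper names `ols_fit` and `lpm_predict` are not from the slides. The point it illustrates is that the fitted "probabilities" of a linear probability model can fall below 0 or above 1.

```python
# Hypothetical data: income (in thousands) and loan decision (1 = given, 0 = not given).
income = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
loan = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def ols_fit(x, y):
    """Return (alpha, beta) of the least-squares line y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


alpha, beta = ols_fit(income, loan)


def lpm_predict(x):
    """Fitted value of the linear probability model, read as an estimate of P(Y = 1 | X = x)."""
    return alpha + beta * x


# Fitted values inside and outside the observed income range.
for x in (5, 30, 70):
    print(f"income={x}: fitted P(loan)={lpm_predict(x):.3f}")
```

With this data the fitted value is negative at a low income and greater than 1 at a high income, which is why a function bounded between 0 and 1 is preferred for modelling probabilities.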
Note that the logistic curve is non-linear and its value lies between 0 and 1.
LOGIT Model…
• The LOGIT Model expresses the probability p that a dependent variable
Y takes the value 1 given Xi.
• For the LOGIT Model, a particular type of logistic function is used, called the SIGMOID FUNCTION:
$$f(z) = \frac{1}{1 + e^{-z}}$$
• Using this functional form, we may express the probability p that a dependent variable Y takes the value 1 given $X_i$, assuming that there is only one explanatory variable:
$$p = f(z_i) = \frac{1}{1 + e^{-z_i}}, \quad \text{where } z_i = \alpha + \beta X_i + \varepsilon_i$$
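The sigmoid function can be sketched in a few lines of Python. The function names and the coefficient values used in the usage line are hypothetical, and the error term is left out, so this only illustrates how the linear part is mapped to a value between 0 and 1.

```python
import math


def sigmoid(z):
    """Logistic (sigmoid) function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def logit_probability(x, alpha, beta):
    """P(Y = 1 | X = x) for a one-variable logit model with z = alpha + beta * x."""
    return sigmoid(alpha + beta * x)


# Hypothetical coefficients, for illustration only.
print(logit_probability(30, alpha=-4.0, beta=0.1))
```

Whatever the value of z, the result stays between 0 and 1, and z = 0 corresponds to a probability of 0.5.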
Logistic Regression
What is a logistic regression:
Logistic Model
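Odds Ratio
If the odds ratio = 1, the odds of the event do not change as the predictor increases, so the predictor has no effect on the outcome. (Odds equal to 1, by contrast, mean that the probability of occurrence of an event = the probability of non-occurrence of the event.)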
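The link between the odds ratio and the logit coefficient can be checked numerically. The sketch below uses hypothetical coefficients and helper names; it shows that raising the predictor by one unit multiplies the odds by e^β, so β = 0 gives an odds ratio of 1.

```python
import math


def probability(x, alpha, beta):
    """P(Y = 1 | X = x) under a one-variable logit model."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def odds(p):
    """Odds of an event with probability p."""
    return p / (1.0 - p)


def odds_ratio_per_unit(x, alpha, beta):
    """Ratio of the odds at x + 1 to the odds at x."""
    return odds(probability(x + 1, alpha, beta)) / odds(probability(x, alpha, beta))


# Hypothetical coefficients, for illustration only.
alpha, beta = -4.0, 0.1
print(odds_ratio_per_unit(30, alpha, beta), math.exp(beta))
```

The two printed values agree, and the ratio does not depend on the starting value of x.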
Requirements
DV (dependent variable): nominal
IV (independent variable): either scale or nominal
Coding of variables
Coding of nominal DV: generally 0 means absence of property of
interest (absence of heart disease). The negative response is
coded as zero.
Assumptions
• There is a linear relationship between the IVs and the logit (log_e of the odds) of the DV.
• The error terms should be independent.
• Rule of 10: there should be at least 10 cases per IV, though some suggest 15.
• There should be no multicollinearity in the data. Multicollinearity occurs when the correlation between two IVs is very high (above .85). You can check it via tolerance and VIF.
• There should be no significant outliers in the data.
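The multicollinearity check can be sketched for the simplest case of two predictors, where the R² from regressing one predictor on the other equals the squared correlation, so tolerance = 1 - r² and VIF = 1 / tolerance. The data and function names below are hypothetical; with more than two predictors each VIF would need a full auxiliary regression instead.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tolerance_and_vif(x1, x2):
    """Return (r, tolerance, VIF) for a model with exactly two predictors."""
    r = pearson_r(x1, x2)
    tolerance = 1.0 - r ** 2
    return r, tolerance, 1.0 / tolerance


# Hypothetical predictors that move almost together.
income = [20, 25, 30, 35, 40, 45]
spending = [15, 19, 24, 26, 31, 35]

r, tolerance, vif = tolerance_and_vif(income, spending)
print(f"r={r:.3f}, tolerance={tolerance:.4f}, VIF={vif:.1f}")
```

Here the correlation is well above .85, so the tolerance is close to 0 and the VIF is large, which signals multicollinearity.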
Wald Test
A Wald test is used to evaluate the statistical
significance of each coefficient (b) in the model.
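As a rough sketch of the calculation, the Wald statistic for one coefficient is commonly computed as the squared ratio of the estimate to its standard error and compared with a chi-square distribution with 1 degree of freedom. The coefficient and standard error below are hypothetical, and `wald_test` is an illustrative helper name.

```python
import math


def wald_test(b, se):
    """Return (Wald statistic, p-value) for H0: coefficient = 0.

    Wald = (b / se)^2 follows a chi-square distribution with 1 df under H0,
    whose upper tail equals P(|Z| > |b / se|) for a standard normal Z.
    """
    z = b / se
    wald = z ** 2
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return wald, p_value


# Hypothetical estimate and standard error.
wald, p_value = wald_test(b=0.98, se=0.5)
print(f"Wald={wald:.3f}, p={p_value:.4f}")
```

A p-value at or below the chosen significance level (for example .05) suggests that the coefficient differs from zero.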
Model Fit
Log-likelihood (LL):
Model fit is assessed using the log-likelihood function, which indicates the extent of unexplained information
after the model has been fitted. It plays the opposite role to R² in linear regression: the larger the value of -2LL, the poorer the fit.
Deviance: it is defined as
Deviance = -2LL
We can compare the baseline model (a model that has no variables and contains only the
constant) with the other models that we specify, and calculate a chi-square goodness-of-fit statistic from the difference.
Omnibus Test = chi-square difference or Likelihood Ratio* = (-2LL baseline model) - (-2LL default model)
= 2LL default model - 2LL baseline model
df = number of parameters in default model - number of parameters in baseline model
*Note: It is called a ratio because a subtraction of logs is the same as the log of a division: log(8/3) is the same as log 8
- log 3.
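The deviance and the chi-square difference can be sketched as follows. The outcomes and the "model" probabilities are hypothetical, not the output of a real fit; the baseline model predicts the overall proportion of 1s for every case. The p-value shortcut shown is valid only for 1 degree of freedom.

```python
import math


def log_likelihood(y, p):
    """Log-likelihood of binary outcomes y given predicted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))


# Hypothetical outcomes and hypothetical predicted probabilities from a one-predictor model.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p_model = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

# Baseline (constant-only) model: every case gets the overall proportion of 1s.
p_baseline = [sum(y) / len(y)] * len(y)

deviance_baseline = -2 * log_likelihood(y, p_baseline)
deviance_model = -2 * log_likelihood(y, p_model)

chi_square = deviance_baseline - deviance_model  # likelihood ratio statistic
df = 1  # one extra parameter (the slope) compared with the baseline

# Upper tail of a chi-square distribution with 1 df.
p_value = math.erfc(math.sqrt(chi_square / 2))
print(f"-2LL baseline={deviance_baseline:.3f}, -2LL model={deviance_model:.3f}")
print(f"chi-square={chi_square:.3f}, df={df}, p={p_value:.4f}")
```

A smaller -2LL for the specified model than for the baseline, with a significant chi-square difference, indicates that the predictor improves the fit.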
Statistics Associated with Logistic Regression
There are several statistics which can be used for comparing alternative models
or evaluating the performance of a single model:
Model Chi-Square. Use the “Model Chi-Square” statistic to determine if the
overall model is statistically significant. Model Chi-square tests the null
hypothesis that all population logistic regression coefficients except the constant
are zero. When probability (model chi-square) <= .05, we reject the null
hypothesis that knowing the independents makes no difference in predicting the
dependent in logistic regression, and conclude that at least one coefficient b_j ≠ 0.
Hosmer-Lemeshow test of goodness of fit. The Hosmer-Lemeshow test of goodness
of fit is an alternative way to assess the overall fit of the model. If the Hosmer-
Lemeshow test of goodness of fit is not significant (p > .05), then the model has
adequate fit. By the same token, if the test is significant, the model does not
adequately fit the data.
Introduction
Links
https://www.youtube.com/watch?v=FG6FRZLtCMs
https://www.linkedin.com/pulse/checks-logistic-regressions-sray-agarwal
https://www.linkedin.com/pulse/confusion-matrix-type-i-ii-error-swaroop-shinde
Errors in Classification
Confusion Matrix:
Misclassifying an actual negative as positive (a false positive) is a Type I error;
misclassifying an actual positive as negative (a false negative) is a Type II error.
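The four cells of a confusion matrix, including the two error types, can be counted with a short sketch. The labels below are hypothetical and `confusion_counts` is an illustrative helper name.

```python
def confusion_counts(actual, predicted):
    """Count the four cells of a binary confusion matrix (1 = positive, 0 = negative)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1  # positive correctly classified
        elif a == 0 and p == 1:
            counts["FP"] += 1  # Type I error
        elif a == 0 and p == 0:
            counts["TN"] += 1  # negative correctly classified
        else:
            counts["FN"] += 1  # Type II error
    return counts


# Hypothetical actual and predicted classes.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)
```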
Sensitivity, Specificity and Precision
The ability of the model to correctly classify positives and negatives is called
sensitivity and specificity, respectively. The terms sensitivity and specificity
originated in medical diagnostics.
In the generic case:
Sensitivity = P(model classifies Yi as positive | Yi is positive)
Sensitivity is calculated using the following equation:
$$\text{Sensitivity (Recall)} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP)} + \text{False Negative (FN)}}$$
where True Positive (TP) is the number of positives correctly classified as positives
by the model and False Negative (FN) is the number of positives misclassified as negatives by the
model. Sensitivity is also called recall.
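Specificity
Specificity is the ability of the diagnostic test to correctly report a negative
result when the disease is not present. That is:
Specificity = P(diagnostic test is negative | patient has no disease)
In general:
Specificity = P(model classifies Yi as negative | Yi is negative)
Specificity can be calculated using the following equation:
$$\text{Specificity} = \frac{\text{True Negative (TN)}}{\text{True Negative (TN)} + \text{False Positive (FP)}}$$
where True Negative (TN) is the number of negatives correctly classified as negatives by the model and False Positive (FP) is the number of negatives misclassified as positives.
F Score
$$\text{F Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
where Precision = TP / (TP + FP).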
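The measures can be computed directly from the four confusion-matrix counts. The counts in this sketch are hypothetical and `classification_metrics` is an illustrative helper name.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity (recall), specificity, precision and F score from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of actual positives classified as positive
    specificity = tn / (tn + fp)  # share of actual negatives classified as negative
    precision = tp / (tp + fp)    # share of predicted positives that are actual positives
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
    }


# Hypothetical counts.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The F score is the harmonic mean of precision and recall, so it is high only when both are high.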
Concordant and Discordant Pairs
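Split the observations into the group of actual 1s and the group of actual 0s, form every (1, 0) pair, and compare
the predicted probabilities within each pair. A pair is concordant if the 1 has the higher predicted probability, discordant if the 0 has the higher predicted probability, and tied if the two probabilities are equal.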
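Percent Concordant: (number of concordant pairs / total number of pairs) × 100
Percent Discordant: (number of discordant pairs / total number of pairs) × 100
Percent Tied: (number of tied pairs / total number of pairs) × 100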
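The pairwise comparison can be sketched as below: every actual 1 is paired with every actual 0 and their predicted probabilities are compared. The outcomes and probabilities are hypothetical and `concordance` is an illustrative helper name.

```python
def concordance(actual, prob):
    """Return percent concordant, discordant and tied over all (1, 0) pairs."""
    ones = [p for a, p in zip(actual, prob) if a == 1]
    zeros = [p for a, p in zip(actual, prob) if a == 0]
    concordant = discordant = tied = 0
    for p_one in ones:
        for p_zero in zeros:
            if p_one > p_zero:
                concordant += 1  # the actual 1 received the higher probability
            elif p_one < p_zero:
                discordant += 1  # the actual 0 received the higher probability
            else:
                tied += 1
    total = len(ones) * len(zeros)
    return {
        "concordant": 100 * concordant / total,
        "discordant": 100 * discordant / total,
        "tied": 100 * tied / total,
    }


# Hypothetical outcomes and predicted probabilities.
actual = [1, 1, 1, 0, 0]
prob = [0.9, 0.6, 0.3, 0.4, 0.3]

result = concordance(actual, prob)
print(result)
```

With three 1s and two 0s there are six pairs; in this data four are concordant, one is discordant and one is tied.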
The best threshold (or cutoff) point to be used in glm models is the
point which maximises the sum of sensitivity and specificity. This
threshold might not give the highest prediction accuracy for your
model, but it will not be biased towards positives or negatives.
The ROCR package in R contains functions that can help you do this.
In some applications of ROC curves, you want the point closest to TPR = 1 and FPR = 0, the top-left corner of the ROC plot. This cut
point is “optimal” in the sense that it weighs sensitivity and specificity equally.
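Both cutoff rules can be sketched in plain Python rather than with ROCR. The data is hypothetical, every observed probability is tried as a candidate cutoff, and a case is classified as positive when its probability is at or above the cutoff; the function names are illustrative.

```python
import math


def tpr_fpr(actual, prob, cutoff):
    """True positive rate and false positive rate when predicting 1 for prob >= cutoff."""
    tp = sum(1 for a, p in zip(actual, prob) if a == 1 and p >= cutoff)
    fn = sum(1 for a, p in zip(actual, prob) if a == 1 and p < cutoff)
    fp = sum(1 for a, p in zip(actual, prob) if a == 0 and p >= cutoff)
    tn = sum(1 for a, p in zip(actual, prob) if a == 0 and p < cutoff)
    return tp / (tp + fn), fp / (fp + tn)


def best_cutoffs(actual, prob):
    """Return (cutoff maximising sensitivity + specificity, cutoff closest to FPR = 0 and TPR = 1)."""
    candidates = sorted(set(prob))

    def sens_plus_spec(cutoff):
        tpr, fpr = tpr_fpr(actual, prob, cutoff)
        return tpr + (1 - fpr)

    def distance_to_corner(cutoff):
        tpr, fpr = tpr_fpr(actual, prob, cutoff)
        return math.hypot(fpr, 1 - tpr)

    return max(candidates, key=sens_plus_spec), min(candidates, key=distance_to_corner)


# Hypothetical outcomes and predicted probabilities.
actual = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
prob = [0.1, 0.2, 0.3, 0.35, 0.5, 0.6, 0.65, 0.7, 0.8, 0.9]

max_sum_cutoff, closest_cutoff = best_cutoffs(actual, prob)
print(max_sum_cutoff, closest_cutoff)
```

In this data the two rules pick the same cutoff, but in general they can disagree, so it is worth checking both against the costs of each type of error.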
Case 1
File: LPM
Dependent Variable: Probability (1: loan given; 0: loan not given)
Independent Variable: Income
1. Run a logistic regression analysis.
2. What is the odds ratio?
3. Write the logistic regression equation.
4. Interpret the logistic regression equation with respect to the exponential beta and the
dependent variable. What is the use of the exponential beta, the Wald statistic and the
significance value?
5. What is the classification table?
6. What is the cut-off value?
7. What is the hit ratio?
8. Calculate sensitivity, specificity, precision and the F score.
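For readers without a statistics package at hand, the first steps of such a case can be sketched in plain Python. The data below is hypothetical and is not the LPM file; the fit uses Newton-Raphson iterations for a one-predictor logit model, which is one common way to maximise the likelihood, and `fit_logistic` is an illustrative helper name.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(Y = 1) = 1 / (1 + exp(-(a + b * x))) by Newton-Raphson; return (a, b)."""
    a = b = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to a and b.
        g_a = sum(yi - pi for yi, pi in zip(y, p))
        g_b = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix [[h_aa, h_ab], [h_ab, h_bb]] with weights p * (1 - p).
        w = [pi * (1 - pi) for pi in p]
        h_aa = sum(w)
        h_ab = sum(wi * xi for wi, xi in zip(w, x))
        h_bb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h_aa * h_bb - h_ab ** 2
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Hypothetical data: income (in thousands) and loan decision (1 = given, 0 = not given).
income = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
loan = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

a, b = fit_logistic(income, loan)
odds_ratio = math.exp(b)  # exponential beta

# Classification table at a cut-off of 0.5 and the resulting hit ratio.
predicted = [1 if 1.0 / (1.0 + math.exp(-(a + b * xi))) >= 0.5 else 0 for xi in income]
hit_ratio = sum(1 for yi, pi in zip(loan, predicted) if yi == pi) / len(loan)

print(f"logit(p) = {a:.3f} + {b:.3f} * income")
print(f"odds ratio = {odds_ratio:.3f}, hit ratio = {hit_ratio:.2f}")
```

The exponential beta printed here is the factor by which the odds of a loan being given change for a one-unit increase in income.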
Case 2: Challenger Crash data
Read the chapter "Logistic Regression" in Business Analytics by U Dinesh Kumar.
1. https://bookdown.org/egarpor/SSS2-UC3M/logreg-examps.html