Logistic Regression
Chapter - 1
Business Analytics
Linear Regression
Recall:
Linear regression is the process of
finding a best fitting straight line
passing through points with the
objective of being able to use the
equation of the line as a model for
prediction.
The key assumption is that both
the predictor and target variables
are continuous as seen in this chart
below. Intuitively, one can state that
when X increases, Y increases along
the slope of the line.
Logistic Regression - Introduction
In binary logistic regression, the outcome can have only two
possible types of values, e.g. "Yes or No", "Success or Failure".
The target variable is Class, which is binary with levels Bad and Good:
Bad: Customer with high credit risk
Good: Customer with low credit risk
We can get the descriptive statistics for all the variables by giving
the following command:
Output
We need to convert the target variable Class from Bad/Good to 0/1 to
apply logistic regression. This can be done by giving the following
command.
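The slides carry out this step in R, and the original command is not reproduced here. As a language-neutral illustration, the recoding can be sketched in Python. The sample labels below are made up, and mapping Good to 1 is an assumption, chosen because the coefficients are later read as log-odds of good credit.

```python
# Hypothetical sample of the Class column; not taken from the real data set.
classes = ["Good", "Bad", "Good", "Good", "Bad"]

# Assumed coding: Good -> 1 (low credit risk), Bad -> 0 (high credit risk).
CLASS_CODES = {"Bad": 0, "Good": 1}

def encode_class(labels):
    """Map Bad/Good labels to 0/1, raising KeyError on any unexpected label."""
    return [CLASS_CODES[label] for label in labels]

encoded = encode_class(classes)
print(encoded)
```

Using an explicit lookup table rather than a comparison makes an unexpected level fail loudly instead of being silently coded as 0.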
Partitioning the Data Set
One issue that arises when fitting a model is checking how
well the newly created model behaves when applied to new
data.
To address this issue, the data set can be divided into two
partitions: a training partition used to build the model and a
test partition used to validate how well the model performs.
Now we will partition our German Credit data set into training and
testing sets using the createDataPartition function.
The training data set contains 700 observations and the testing data
set contains 300 observations.
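A 70/30 split of this kind can be sketched in Python as follows. This is not the createDataPartition call used in the slides; it is a minimal stand-in that, like that function, samples within each class so both partitions keep the same class proportions. The labels are synthetic, and the 700 Good / 300 Bad class counts are an assumption made for the illustration.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)

# Synthetic labels standing in for a 1000-row data set (class counts assumed).
labels = [1] * 700 + [0] * 300
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Fixing the seed makes the partition reproducible, which matters when the reported deviances and test-set metrics need to be regenerated.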
We will now take the training data set and use logistic regression to
model Class as a function of seven predictor variables (Age, Amount,
ForeignWorker, Property.RealEstate, Housing.Own,
CreditHistory.Critical and Purpose.NewCar) using the glm function.
Model Building
Interpreting Output
All the variables in our model are significant.
The null deviance is the deviance of a model with no predictors
(intercept only), whereas the residual deviance is that of our model.
The further the residual deviance falls below the null deviance, the
more the predictors improve the fit.
Here the null deviance is 839.40 and the residual deviance is 769.93,
a reduction of only 69.47 (about 8.3% of the null deviance), so the
difference between them is not large.
Estimates from logistic regression characterize the relationship
between the predictor and response variable on a log-odds scale.
The estimate for variable Age is 0.01839. This implies that for every 1
unit increase in Age, the log odds of the consumer having good credit
increase by 0.01839, i.e. the odds are multiplied by
exp(0.01839) = 1.01856, holding the other variables fixed.
The other variables are interpreted similarly.
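The arithmetic behind this interpretation can be checked directly. A coefficient is a change in log-odds, and exponentiating it gives the multiplicative change in the odds. The short Python sketch below uses only the Age estimate quoted above.

```python
import math

age_estimate = 0.01839  # coefficient for Age reported on the slide

# One extra unit of Age adds age_estimate to the log-odds,
# which multiplies the odds by exp(age_estimate).
odds_ratio = math.exp(age_estimate)
percent_change = (odds_ratio - 1) * 100

print(f"odds ratio: {odds_ratio:.5f}")
print(f"change in odds per unit of Age: {percent_change:.2f}%")
```

Reading the result as a percentage change in the odds (a little under 2% per unit of Age) is usually easier to communicate than the raw log-odds value.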
Prediction
Since the p-value is greater than 0.05, we do not reject H0, i.e. the
current model can be considered a good fit.
Wald Test
A Wald test is used to evaluate the statistical significance of each
coefficient in the model and is calculated by taking the ratio of the
square of the regression coefficient to the square of the standard
error of the coefficient.
We test whether the coefficient of the independent variable in the
model is significantly different from zero.
If the test fails to reject the null hypothesis, this suggests that
removing the variable from the model will not substantially harm the
fit of that model.
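For a single coefficient, the statistic described above can be computed by hand. The Python sketch below is only an illustration of the formula: the estimate and standard error passed in are hypothetical, and the p-value uses the fact that the statistic is compared against a chi-square distribution with 1 degree of freedom.

```python
import math

def wald_test(estimate, std_error):
    """Return (W, p_value) for H0: coefficient = 0, with W ~ chi-square(1)."""
    w = (estimate / std_error) ** 2
    # For 1 degree of freedom, P(chi2 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2))
    return w, p_value

# Hypothetical estimate and standard error, for illustration only.
w, p = wald_test(estimate=0.9, std_error=0.3)
print(f"W = {w:.2f}, p-value = {p:.4f}")
```

A p-value below 0.05 from this calculation corresponds to rejecting the hypothesis that the coefficient is zero.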
To apply the Wald test, the regTermTest function from the survey
library will be used.
Since the p-value is less than 0.05, we reject H0, i.e. removing the
variable ForeignWorker from the model will harm the fit of the model.
McFadden R2
Unlike linear regression, logistic regression has no R2 statistic that
gives the proportion of variation in the dependent variable explained
by the predictors.
Instead, we use McFadden's R2, a pseudo-R2 based on the log-likelihood.
The predictors in our model explain only 8.27% of the variation in the
data.
This suggests that one or more variables are missing from our model.
We can try a model that considers all the variables.
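The figure quoted above can be reproduced approximately from the two deviances reported earlier. The Python sketch below assumes 0/1 responses, for which the deviance equals -2 times the log-likelihood, so the ratio of log-likelihoods equals the ratio of deviances. The small gap from 8.27% comes from the rounding of the deviances.

```python
null_deviance = 839.40      # intercept-only model, from the slide
residual_deviance = 769.93  # fitted model, from the slide

# Assumption: 0/1 responses, so deviance = -2 * log-likelihood and
# McFadden's R2 = 1 - logLik(model) / logLik(null) = 1 - residual / null.
mcfadden_r2 = 1 - residual_deviance / null_deviance

print(f"McFadden R2 = {mcfadden_r2:.4f}")
```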
ROC Curve
Receiver Operating Characteristic (ROC) curve
is a plot of the true positive rate against the
false positive rate.
Since the area under the curve is 0.6944, we can say the discrimination
ability of our model is fair.
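The area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The Python sketch below computes it that way on a handful of hypothetical labels and scores. It is a didactic stand-in, not the routine used to produce the 0.6944 above.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = Good) and predicted probabilities.
labels = [1, 1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.7, 0.3]
print(roc_auc(labels, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the model's AUC is judged.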
Kolmogorov Smirnov Chart
Kolmogorov Smirnov chart measures the performance of classification
models.
The model line represents the case of capturing the responders if you
go by the model-generated probability scores, where you begin by
targeting data points with the highest probability scores.
As the separation measured between the Goods and the Bads is very
small, we can say that the performance of the model is not good.
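The separation between Goods and Bads is commonly summarised as the largest gap between the cumulative score distributions of the two classes. The Python sketch below computes that gap on hypothetical labels and scores. It assumes this standard two-sample definition and is not the routine behind the chart.

```python
def ks_statistic(labels, scores):
    """Largest gap between the cumulative score distributions of the two classes."""
    goods = [s for y, s in zip(labels, scores) if y == 1]
    bads = [s for y, s in zip(labels, scores) if y == 0]
    if not goods or not bads:
        raise ValueError("need at least one observation in each class")
    best = 0.0
    for threshold in sorted(set(scores)):
        cdf_good = sum(s <= threshold for s in goods) / len(goods)
        cdf_bad = sum(s <= threshold for s in bads) / len(bads)
        best = max(best, abs(cdf_good - cdf_bad))
    return best

# Hypothetical labels (1 = Good) and predicted probabilities.
labels = [1, 1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.7, 0.3]
print(ks_statistic(labels, scores))
```

The statistic ranges from 0 (the two classes have identical score distributions) to 1 (the scores separate them completely).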
Kolmogorov Smirnov Statistic
Since all the paths are mutually exclusive and there are three paths
which lead to a defective bulb, to answer the original question we
must add the probabilities for the three paths,
i.e. 4/30 + 1/3*1/6 + 1/3*3/8 = 2/15 + 1/18 + 1/8 = 0.314.
Tree Diagram Example 2