Introduction To Logistic Regression: Implement Linear Equation
If we have one explanatory variable (x1) and one response variable (z), then the linear equation
is given mathematically by the following equation-
z = β0 + β1x1
If there are multiple explanatory variables, then the above equation can be extended to-
z = β0 + β1x1 + β2x2 + … + βnxn
Here, the coefficients β0, β1, β2, …, βn are the parameters of the model.
So, the predicted response value is given by the above equation and is denoted by z.
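As a concrete sketch, the linear step z = β0 + β1x1 + … + βnxn can be computed directly. The coefficient and feature values below are made up for illustration, not fitted values.

```python
# Linear step of logistic regression: z = beta_0 + sum(beta_i * x_i).
# All numeric values here are illustrative, not fitted coefficients.
beta_0 = 0.5                 # intercept (beta_0)
betas = [1.2, -0.7]          # coefficients beta_1, beta_2
xs = [2.0, 3.0]              # explanatory variables x_1, x_2

z = beta_0 + sum(b * x for b, x in zip(betas, xs))
# z = 0.5 + (1.2 * 2.0) + (-0.7 * 3.0) = 0.8
```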
Sigmoid Function
This predicted response value, denoted by z, is then converted into a probability value that lies
between 0 and 1. In machine learning, the sigmoid function is used to map these predicted
values to probabilities: it maps any real value to a probability value between 0 and 1. The
sigmoid function has an S-shaped curve, which is also called the sigmoid curve.
A sigmoid function is a special case of the logistic function. It is given by the following
mathematical formula-
σ(z) = 1 / (1 + e^(−z))
[Figure: the S-shaped curve of the sigmoid function]
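As a sketch, the sigmoid σ(z) = 1 / (1 + e^(−z)) translates directly into code:

```python
import math

def sigmoid(z):
    """Map any real value z to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))     # 0.5 (a raw score of zero maps to an even chance)
print(sigmoid(4))     # close to 1
print(sigmoid(-4))    # close to 0
```

Note that large positive scores saturate towards 1 and large negative scores towards 0, which is exactly the S-shape described above.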
Decision boundary
The sigmoid function returns a probability value between 0 and 1. This probability value is then
mapped to a discrete class, either “0” or “1”. In order to map the probability value to a
discrete class (pass/fail, yes/no, true/false), we select a threshold value. This threshold value is
called the decision boundary. Probability values above the threshold are mapped to class 1, and
values below it are mapped to class 0.
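Applying the decision boundary is a one-line comparison. The 0.5 threshold below is a common default, not the only possible choice:

```python
def to_class(probability, threshold=0.5):
    """Map a probability in [0, 1] to the discrete class 1 or 0."""
    return 1 if probability >= threshold else 0

print(to_class(0.8))   # 1  (above the threshold -> class 1)
print(to_class(0.3))   # 0  (below the threshold -> class 0)
```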
Making predictions
Now that we know about the sigmoid function and the decision boundary, we can combine
them to write a prediction function. A prediction function in logistic regression returns the
probability of the observation being positive (Yes or True). We call this class 1, and its
probability is denoted by P(class = 1). The closer this probability is to one, the more confident
we are that the observation belongs to class 1; otherwise, it is assigned to class 0.
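Putting the pieces together, a prediction function might be sketched as follows. The coefficients here are illustrative, not values fitted to any data:

```python
import math

def predict(xs, betas, beta_0, threshold=0.5):
    """Return (P(class = 1), predicted class) for one observation."""
    z = beta_0 + sum(b * x for b, x in zip(betas, xs))   # linear equation
    probability = 1.0 / (1.0 + math.exp(-z))             # sigmoid
    return probability, (1 if probability >= threshold else 0)

# Illustrative coefficients and inputs:
prob, label = predict(xs=[2.0, 3.0], betas=[1.2, -0.7], beta_0=0.5)
# z = 0.8, so prob is roughly 0.69 and the predicted class is 1
```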
5. The success of a Logistic Regression model depends on the sample size. Typically, it
requires a large sample size to achieve high accuracy.
4. Types of Logistic Regression
Logistic Regression models can be classified into three groups based on the categories of the
target variable. These three groups are described below:-
Binary Logistic Regression - In Binary Logistic Regression, the target variable has two
possible categories. The common examples of categories are yes or no, good or bad,
true or false, spam or no spam and pass or fail.
Multinomial Logistic Regression - In Multinomial Logistic Regression, the target variable
has three or more nominal categories with no intrinsic order. For example, the fruit type
can be categorized as apple, mango or orange.
Ordinal Logistic Regression - In Ordinal Logistic Regression, the target variable has three
or more ordinal categories. So, there is intrinsic order involved with the categories. For
example, the student performance can be categorized as poor, average, good and
excellent.
Confusion matrix
Consider a binary classifier with two possible predicted classes: “yes” and “no”. If we were
predicting that employees would leave an organisation, for example, “yes” would mean they
will leave, and “no” would mean they won’t.
The classifier made a total of 165 predictions (e.g., 165 employees were being studied).
Out of those 165 cases, the classifier predicted “yes” 110 times, and “no” 55 times.
In reality, 105 employees in the sample leave the organisation, and 60 do not.
Sensitivity/Recall = TP / (TP + FN). When it’s actually yes, how often does it predict yes?
i.e., 100 / (100 + 5) ≈ 0.95
Precision = TP / (TP + FP). When it predicts yes, how often is it correct? i.e., 100 / (10 + 100) ≈ 0.91
Specificity = TN / (TN + FP). When it’s actually no, how often does it predict no? i.e.,
50 / (50 + 10) ≈ 0.83
F1 Score = 2 * ((Precision * Recall) / (Precision + Recall)).
It is also called the F Score or the F Measure. The F1 score conveys the balance between
precision and recall; here, F1 = 2 * ((0.91 * 0.95) / (0.91 + 0.95)) ≈ 0.93.
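The calculations above can be checked with a few lines of code, using the counts from the example (TP = 100, FP = 10, FN = 5, TN = 50):

```python
TP, FP, FN, TN = 100, 10, 5, 50

# Sanity checks against the totals given in the example:
assert TP + FP + FN + TN == 165   # total predictions
assert TP + FP == 110             # predicted "yes"
assert TP + FN == 105             # actually "yes"

sensitivity = TP / (TP + FN)     # about 0.95
precision   = TP / (TP + FP)     # about 0.91
specificity = TN / (TN + FP)     # about 0.83
f1_score    = 2 * (precision * sensitivity) / (precision + sensitivity)
# f1_score works out to about 0.93
```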