Mathematics Behind Logistic Regression Model
Logistic regression is one of the most widely used machine learning algorithms for
classification problems. In its original form it handles the binary classification
problem, where there are only two classes to predict. However, with a little extension
and some human ingenuity, logistic regression can easily be used for multi-class
classification as well. In this post I will explain binary classification,
and I will also explain the reason behind maximizing the log likelihood function.
Sigmoid Function
The sigmoid function gives us the probability of an observation belonging to class 1
or class 0. Generally we take the threshold as 0.5 and say that if p > 0.5 the
observation belongs to class 1, and if p < 0.5 it belongs to class 0. However, this
threshold is not fixed; it varies with the business problem. A suitable threshold can
be chosen with the help of an ROC curve (AIC, by contrast, is used to compare models
rather than to pick a threshold). I will explain those topics later; in this post I
will focus mostly on how logistic regression works.
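As a minimal sketch of this idea (the coefficients beta0 and beta1 below are hypothetical values chosen for illustration, not taken from a fitted model), here is how the sigmoid turns a linear score into a probability and how the 0.5 threshold turns that probability into a class label:

import numpy as np

def sigmoid(z):
    # Maps any real-valued score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients of the linear part, for illustration only
beta0, beta1 = -1.0, 2.0
x = np.array([-2.0, 0.0, 0.5, 3.0])

p = sigmoid(beta0 + beta1 * x)    # predicted probability of class 1
labels = (p > 0.5).astype(int)    # apply the 0.5 threshold
print(p)       # approximately [0.007 0.269 0.5 0.993]
print(labels)  # [0 0 0 1]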
As I have already written above, logistic regression uses the sigmoid function to
turn the output of a linear model into a probability. Equivalently, the logit, which
is nothing but the log of odds, is modeled as a linear function of the inputs, and
from the log of odds the model calculates the required probability. So let's first
understand what the log of odds is.
Log of Odds:
In the infographic below I have explained the complete working of the logistic model.
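In symbols (a compact summary of the standard setup behind the infographic, using p for the probability of class 1 and a single feature x for simplicity):

\text{Odds} = \frac{p}{1 - p}, \qquad \text{logit}(p) = \ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x

Solving the right-hand equation for p recovers the sigmoid applied to the linear score:

p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}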
One more thing to note here is that logistic regression uses maximum likelihood
estimation (MLE) instead of the least squares method of minimizing the error that is
used in linear models.
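Concretely, for n independent observations with labels y_i in {0, 1} and predicted probabilities y_i hat, the quantity MLE maximizes is the Bernoulli likelihood:

L(\beta) = \prod_{i=1}^{n} \hat{y}_i^{\,y_i} \, (1 - \hat{y}_i)^{\,1 - y_i}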
Now let's understand how the log likelihood function behaves for the two classes, 1
and 0, of the target variable.
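Taking the log of the likelihood turns the product into a sum; this is the equation both cases below refer to, with one term per observation:

\ell(\beta) = \sum_{i=1}^{n} \left[ \, y_i \ln(\hat{y}_i) + (1 - y_i) \ln(1 - \hat{y}_i) \, \right]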
Case 1: when the actual target class is 1, we would like the predicted value y_i hat
to be as close to 1 as possible. Let's understand how the log likelihood function
achieves this.
Putting y_i = 1 makes the second part of the equation (after the +) zero, and only
ln(y_i hat) remains. Now y_i hat lies between 0 and 1: ln(1) is 0, and the log of
anything less than 1 is negative. Hence the maximum value of the log likelihood term
is 0, and it is approached only as y_i hat gets close to 1. So maximizing the log
likelihood is equivalent to driving y_i hat toward 1, which means the model will
clearly identify the predicted target as 1, the same as the actual target.
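As a quick numeric check of Case 1 (plain Python; the probability values are arbitrary examples), with y_i = 1 the per-observation term reduces to ln(y_i hat), which climbs toward its maximum of 0 as y_i hat approaches 1:

import math

# Case 1: y_i = 1, so the per-observation log likelihood reduces to ln(y_hat)
for y_hat in [0.1, 0.5, 0.9, 0.99, 0.999]:
    print(y_hat, math.log(y_hat))
# 0.1    -2.303
# 0.5    -0.693
# 0.9    -0.105
# 0.99   -0.010
# 0.999  -0.001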
Case 2: when the actual target class is 0, we would like the predicted value y_i hat
to be as close to 0 as possible. Let's again understand how maximizing the log
likelihood in this case produces a y_i hat closer to 0.
Putting y_i = 0 makes the first part of the equation (before the + sign) zero, and
only (1 - y_i) ln(1 - y_i hat) remains. Since y_i is 0, the factor 1 - y_i equals 1,
so the term reduces to ln(1 - y_i hat). Again, 1 - y_i hat is at most 1 because
y_i hat lies between 0 and 1, so the maximum value of ln(1 - y_i hat) is 0. Reaching
it requires 1 - y_i hat to be close to 1, which implies y_i hat should be close to 0,
exactly as expected since the actual value y_i is also 0. This is the reason we
maximize the log likelihood.
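To tie the two cases together, here is a minimal sketch in plain Python (the labels and probabilities below are made-up illustrations, not outputs of a fitted model): predictions that agree with the labels yield a strictly higher log likelihood.

import math

def log_likelihood(y, y_hat):
    # Sum of y_i*ln(y_hat_i) + (1 - y_i)*ln(1 - y_hat_i) over all observations
    return sum(yi * math.log(p) + (1 - yi) * math.log(1 - p)
               for yi, p in zip(y, y_hat))

y = [1, 1, 0, 0]              # actual classes
good = [0.9, 0.8, 0.2, 0.1]   # probabilities that match the labels
bad = [0.4, 0.5, 0.6, 0.7]    # probabilities that disagree

print(log_likelihood(y, good))  # approximately -0.66 (higher is better)
print(log_likelihood(y, bad))   # approximately -3.73

Training simply searches for the coefficients that make this sum as large as possible.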
Thank You
For more articles, please visit -> https://ashutoshtripathi.com